The Porter stemming algorithm (or ‘Porter stemmer’) is a process for removing the more common morphological and inflectional endings from English words. It is mainly used for term normalization, a step usually performed when setting up Information Retrieval systems. The rules of the algorithm are grouped into five phases, numbered 1 to 5. Each word is passed through the phases in order, from phase 1 to phase 5, and the rules are applied sequentially, one after the other, like commands in a program.
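To make the "rules applied like commands in a program" idea concrete, here is a minimal sketch in C of a single rule group: the plural-removal rules that Porter's paper lists as Step 1a (SSES -> SS, IES -> I, SS -> SS, S -> nothing). It is an illustration only, not the full five-phase algorithm and not the author's own code; the function names `ends_with` and `step1a` are hypothetical, and the word is assumed to be lowercase and NUL-terminated.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if word (of length len) ends with suffix, else 0. */
static int ends_with(const char *word, size_t len, const char *suffix)
{
    size_t slen = strlen(suffix);
    return len >= slen && memcmp(word + len - slen, suffix, slen) == 0;
}

/*
 * Hypothetical helper: applies the Step 1a rules in place.
 * The rules are tried in order and only the first match fires:
 *   SSES -> SS, IES -> I, SS -> SS, S -> (nothing)
 */
static void step1a(char *word)
{
    size_t len = strlen(word);

    if (ends_with(word, len, "sses")) {
        word[len - 2] = '\0';   /* caresses -> caress */
    } else if (ends_with(word, len, "ies")) {
        word[len - 2] = '\0';   /* ponies -> poni */
    } else if (ends_with(word, len, "ss")) {
        /* caress -> caress: matched, so the S rule below is skipped */
    } else if (ends_with(word, len, "s")) {
        word[len - 1] = '\0';   /* cats -> cat */
    }
}
```

Calling `step1a` on a writable buffer holding "ponies" leaves "poni" in the buffer. In a full stemmer, the output of this step would then be handed to the next rule group, and so on through phase 5.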

Originally written in 1979 at the Computer Laboratory, Cambridge (England), the algorithm was reprinted in 1997 in the book “Readings in Information Retrieval”. It was first implemented in the BCPL language. Here is a list of implementations in other programming languages, including the C, Java and Perl implementations written by the author himself.

Porter's Algorithm in C

This is the Porter stemming algorithm, coded in ANSI C by the author himself. You can compile it on Unix with ‘gcc -O3 -o stem stem.c’, after which ‘stem’ takes a list of inputs and sends the stemmed equivalents to stdout.