What is Word Stemming and How is it Done? : 5ubliminal's TellinYa

Keyword stemming … hmmm … stem cells?

Yes, the analogy is a good one. Strictly speaking, the term comes from the linguistic notion of a word's stem rather than from biology, but the concept is the same: just as stem cells are the root of all other cells, words have stems too.

This can be best explained by an example:

  • photograph
  • photographs
  • photographer
  • photography
  • photographic
  • photographed
  • photographing

As you can see, all of the words above share a root: a word that is the prefix of every one of them (here, photograph). That is the stem word.

The stem does not always have to be a valid word. It is simply the sequence of letters left after known suffixes are stripped from a word. Known suffixes include: ing, es, ed, li, less, ful and so on.
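
To make the idea concrete, here is a minimal Python sketch of plain suffix stripping. It is not the Porter algorithm, just an illustration: the function name naive_stem and the min_stem_len parameter are made up for this example, and the suffixes "er", "ic", "s" and "y" are assumptions added so that the photograph list works.

```python
# Toy suffix list. "ing", "es", "ed", "li", "less" and "ful" come from the text;
# "er", "ic", "s" and "y" are assumptions added for the photograph example.
SUFFIXES = ["less", "ing", "ful", "es", "ed", "li", "er", "ic", "s", "y"]


def naive_stem(word, min_stem_len=3):
    """Strip the longest matching suffix, keeping at least min_stem_len letters."""
    word = word.lower()
    # Try longer suffixes first so that "less" is tested before "s".
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word  # no known suffix found, the word is its own stem


words = ["photograph", "photographs", "photographer", "photography",
         "photographic", "photographed", "photographing"]
stems = {naive_stem(w) for w in words}
print(stems)  # all seven words collapse to a single stem
```

With this toy list, "houses" becomes "hous", which shows why a stem is not always a valid word.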

Why word stemming?

The concept is very simple but the result is worth every penny! Stemming lets you find related words that differ only in their form, and sometimes misspellings that differ only in their ending. Search engines love stems because a stem lets a query match every form of a word.

Keyword stemming implementations

I use the Porter Stemming Algorithm in its PHP implementation, though I tweaked it a bit. I have my own custom version ;)
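
The tweaked PHP version is not shown here, so what follows is only a rough Python sketch of the first rule set (Step 1a) of the published Porter algorithm, the one that deals with plural endings. The full algorithm has several more steps; the function name porter_step_1a is made up for this illustration.

```python
def porter_step_1a(word):
    """Apply only Step 1a of the Porter algorithm (plural endings).

    This is a fragment for illustration, not a complete stemmer.
    """
    word = word.lower()
    if word.endswith("sses"):
        return word[:-2]  # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # ponies -> poni
    if word.endswith("ss"):
        return word       # caress -> caress
    if word.endswith("s"):
        return word[:-1]  # cats -> cat
    return word


for w in ["caresses", "ponies", "caress", "cats"]:
    print(w, "->", porter_step_1a(w))
```

Note that "ponies" becomes "poni": even this first step can already produce a stem that is not a dictionary word.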

Using keyword stemming to detect similar keyphrases

By stemming every word in a keyphrase and then sorting the stems alphabetically, you can detect similar phrases such as "feeling well" and "wellness feeling", or any other variation. Trust me, you'll put it to good use!
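
A minimal Python sketch of this trick could look like the following. The stemmer here is a stand-in with three hard-coded suffixes (an assumption, just enough for the example); in practice you would plug in a real Porter implementation. The names toy_stem, phrase_key and group_similar are made up for this illustration.

```python
from collections import defaultdict


def toy_stem(word):
    """Stand-in stemmer: strips a few suffixes. Replace with a real stemmer."""
    for suffix in ("ness", "ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def phrase_key(phrase):
    """Stem every word, then sort the stems so word order no longer matters."""
    return tuple(sorted(toy_stem(w) for w in phrase.lower().split()))


def group_similar(phrases):
    """Group keyphrases that share the same sorted list of stems."""
    groups = defaultdict(list)
    for phrase in phrases:
        groups[phrase_key(phrase)].append(phrase)
    return dict(groups)


groups = group_similar(["feeling well", "wellness feeling", "word stemming"])
for key, members in groups.items():
    print(key, members)
```

Phrases that end up under the same key are the similar keyphrases; the first two in the example share the key ('feel', 'well').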

3 Comments Posted By Readers :

#1 david hoffman from India
Posted on Saturday, 26 April, 2008
This is a good article and it defines stemming very well.
#2 nogenius from United States
Posted on Wednesday, 30 April, 2008
I've been playing around with Porter Stemming, and can definitely think of some good uses for it, thanks for sharing this gem.

However, I've run into a bit of a dilemma that I was hoping I could get some help with. I have a list of words (some plural, some singular), and I would like to transform that list into all singular forms.

Porter Stemming sometimes works for this, but as you mentioned, it sometimes will return a word that is not actually a word.

I guess it would be possible to query a dictionary like www.m-w.com with the word to try to find the singular form, but before I hack up a script to do that, do you know of any easier (or maybe more algorithmic) ways to transform a plural word into a singular one?
#3 5ubliminal web
Posted on Thursday, 01 May, 2008
An English plural usually ends in -s, -es or -ves (wolf - wolves), and there are some irregular forms (goose - geese).
I guess you could build an algo using the usual endings and correctly determine 95% of your list's singulars.
Then manually check the rest that don't match any rules.
Stemming is best for collapsing words that belong to the same family: you get the root and keep one match. It is not meant for getting a singular. It will only give you a root, which is sometimes not a valid word but a seed found throughout the lexical family.
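
As a rough idea of what such an algo might look like, here is a small Python sketch built from the usual endings plus a lookup table for irregular forms. The rule set is an assumption for illustration (the -ies rule and the irregular entries other than goose are additions), and the name singularize is made up. It will get some words wrong, which is where the manual check comes in.

```python
# Irregular plurals have to be listed by hand; this table is only a sample.
IRREGULAR = {"geese": "goose", "men": "man", "children": "child", "mice": "mouse"}


def singularize(word):
    """Guess the singular of an English noun with a few ending rules."""
    w = word.lower()
    if w in IRREGULAR:
        return IRREGULAR[w]
    if w.endswith("ves"):
        return w[:-3] + "f"   # wolves -> wolf (wrong for knives, gloves)
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"   # ponies -> pony
    if w.endswith(("ches", "shes", "sses", "xes")):
        return w[:-2]         # boxes -> box
    if w.endswith("s") and not w.endswith("ss"):
        return w[:-1]         # cats -> cat (wrong for words like bus)
    return w                  # already singular, or no rule matched


for w in ["wolves", "geese", "ponies", "boxes", "cats", "glass"]:
    print(w, "->", singularize(w))
```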

I'll be back at the 'office' over the weekend and I'll look into this challenge too. If you could send me your list, or a part of it, by mail so I can run some tests, I'll give you an algo that works out most of it.
5ubliminal's TellinYa.com SEM & SEO Blog © 2007 - All rights reserved unless mentioned otherwise .
Tellinya.com is relocating to blog.5ubliminal.com