
Basic Idea on How The Search Engines Work : 5ubliminal's TellinYa

Many people wonder how search engines work.

The concept is actually simple, but part of the execution is rocket science. I'll try to explain in everyday words how search engines work.

The structure of a search engine

A search engine is composed of three distinct parts that work together as a whole: the Research Department, the Indexing Department and the PR Department. Together these three `departments` cooperate for the benefit of the end user: you!

I'll now detail each of them.

The Search Engine's Research Department

This is popularly known as the crawler, and it is by far the easiest part of a search engine's job. The web robot's task is simple: get the content from there to here. It downloads pages, scans them for new links and downloads those too. The crawler keeps a URL queue sorted by importance and downloads the most important URLs first.
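To make the queue idea concrete, here is a minimal sketch of a crawler that always downloads the most important known URL next. It is only an illustration: the in-memory `FAKE_WEB`, the `crawl` function and the hand-made `scores` are all assumptions of mine, and a real crawler would fetch over HTTP, respect robots.txt and use a far better importance measure.

```python
import heapq
import re

# Hypothetical in-memory "web" (URL -> HTML) so the sketch runs offline.
FAKE_WEB = {
    "http://a.example/": '<a href="http://b.example/">B</a> <a href="http://c.example/">C</a>',
    "http://b.example/": '<a href="http://c.example/">C</a>',
    "http://c.example/": '<a href="http://a.example/">A</a>',
}

LINK_RE = re.compile(r'href="([^"]+)"')


def crawl(seeds, importance, fetch=FAKE_WEB.get, limit=10):
    """Download pages in order of importance, queueing newly found links."""
    # heapq is a min-heap, so negate the score to pop the most important URL first.
    queue = [(-importance(url), url) for url in seeds]
    heapq.heapify(queue)
    seen = set(seeds)
    pages = {}
    while queue and len(pages) < limit:
        _, url = heapq.heappop(queue)
        html = fetch(url)
        if html is None:  # page could not be downloaded: skip it
            continue
        pages[url] = html
        for link in LINK_RE.findall(html):
            if link not in seen:  # queue every URL only once
                seen.add(link)
                heapq.heappush(queue, (-importance(link), link))
    return pages


# Assumed importance scores, invented for this example.
scores = {"http://c.example/": 2.0, "http://b.example/": 1.0}
pages = crawl(["http://a.example/"], importance=lambda url: scores.get(url, 0.0))
print(list(pages))
```

Page C is downloaded before page B even though B was found first, because the queue is ordered by importance and not by discovery time.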

The Search Engine's Indexing Department

This is the rocket science part of the search engine's job. All the data retrieved by the crawler has to be organized and sorted by importance and relevance.

The downloaded pages are parsed in two ways: as entities and as text. Entities are the links and HTML structure elements, which are used to decide the importance of each section of a page. At this stage the indexer knows to give more weight to the text in the H1 and the page TITLE than to the rest.

Then the page is parsed as text. Each chunk of text carries the importance previously assigned to it. Content is split into paragraphs, sentences and words. Then word stemming is performed, which reduces each word to its root so that "searching" and "searched" both count as "search".

By using its large datasets, the search engine eventually learns to associate certain words or phrases. By processing many pages it will learn that health is associated with disease and even with names of diseases, then that diseases are associated with cures and names of cures, so health has to do with cures! In the same way it learns which words belong to the same context from the distance between them. Distance means the number of words between them in a phrase, or the related words that link them.
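Here is a rough sketch of that first pass, using Python's standard `html.parser`. The class name `WeightedTextParser` and the numbers in `TAG_WEIGHTS` are assumptions of mine; search engines do not publish the weights they really use.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Assumed weights, purely illustrative.
TAG_WEIGHTS = {"title": 5.0, "h1": 3.0}
DEFAULT_WEIGHT = 1.0


class WeightedTextParser(HTMLParser):
    """Collects links (entities) and gives each word the weight of its section."""

    def __init__(self):
        super().__init__()
        self.open_tags = []            # stack of tags we are currently inside
        self.links = []                # href values found in <a> tags
        self.term_weights = Counter()  # word -> accumulated weight

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating sloppy HTML.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        # Text gets the highest weight among the tags that enclose it.
        weight = max(
            (TAG_WEIGHTS.get(tag, DEFAULT_WEIGHT) for tag in self.open_tags),
            default=DEFAULT_WEIGHT,
        )
        for word in re.findall(r"\w+", data.lower()):
            self.term_weights[word] += weight


parser = WeightedTextParser()
parser.feed(
    "<html><head><title>Health tips</title></head>"
    '<body><h1>Health</h1><p>Eat well. <a href="/cures">Cures</a></p></body></html>'
)
print(parser.links)
print(parser.term_weights.most_common(3))
```

The word "health" ends up with the highest weight because it appears in both the TITLE and the H1, while the words in the plain paragraph only get the default weight.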
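The splitting and stemming steps could look roughly like this. Mind that `stem` below is a deliberately naive suffix stripper I made up for the example (real engines use proper stemmers such as Porter's algorithm), and the `SUFFIXES` list and `tokenize` name are assumptions too.

```python
import re

# Naive suffix list, checked in this order. This is an assumption for the sketch.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word):
    """Strip one common English suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Split text into paragraphs -> sentences -> stemmed lowercase words."""
    result = []
    for paragraph in re.split(r"\n\s*\n", text):
        if not paragraph.strip():
            continue
        # A sentence ends at ., ! or ? followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
        result.append(
            [[stem(w) for w in re.findall(r"[a-z]+", s.lower())] for s in sentences if s]
        )
    return result


text = "Crawlers downloaded links. Indexers sorted them.\n\nUsers keep searching."
for paragraph in tokenize(text):
    print(paragraph)
```

A stripper this crude makes mistakes on many words ("pages" becomes "pag", for instance), which is exactly why real stemmers are more elaborate.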
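A toy version of "association by distance" can be sketched as follows. The scoring rule (add 1/distance for every pair of words within a small window) and the names `cooccurrence` and `related` are my own assumptions for illustration; what the engines really do is not public.

```python
from collections import defaultdict


def cooccurrence(sentences, window=3):
    """Score word pairs: the closer two words appear, the bigger the score."""
    scores = defaultdict(float)
    for words in sentences:
        for i, left in enumerate(words):
            # Only look at the next `window` words after position i.
            for j in range(i + 1, min(i + 1 + window, len(words))):
                right = words[j]
                if left != right:
                    pair = tuple(sorted((left, right)))
                    scores[pair] += 1.0 / (j - i)
    return scores


def related(scores, word, top=3):
    """Return the words most strongly associated with `word`."""
    neighbours = {}
    for (a, b), score in scores.items():
        if word == a:
            neighbours[b] = score
        elif word == b:
            neighbours[a] = score
    return sorted(neighbours, key=lambda w: (-neighbours[w], w))[:top]


# Tiny made-up corpus of already tokenized sentences.
corpus = [
    ["health", "disease"],
    ["disease", "cure"],
    ["health", "disease", "cure"],
]
scores = cooccurrence(corpus)
print(related(scores, "health"))
```

With this tiny corpus "health" is tied most strongly to "disease" and more weakly to "cure", which is the kind of chain described above.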

Here the brain surgery begins. Search engines start looking at how documents are related and what they say about each other. They rank the good pages by looking at how pages vote for each other with links. This algorithm is easy to comprehend as an overview, but it can work in so many ways. This is why search engines are such a hassle for SEO: the exact paths of their ranking systems are unknown.
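One well-known way to turn link votes into a ranking is a PageRank-style iteration. I'm not claiming this is what any particular engine runs today; it is a simplified sketch, and the function name `rank_by_links`, the `damping` value of 0.85 and the three-page link graph are assumptions for the example.

```python
def rank_by_links(links, damping=0.85, iterations=50):
    """Simplified PageRank-style voting: links maps page -> pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal rank everywhere
    for _ in range(iterations):
        # Every page keeps a small base rank regardless of incoming links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            # A page without outgoing links spreads its vote over all pages.
            targets = links.get(page) or pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Made-up link graph: a and b both vote for c, and c votes for a.
links = {"a": ["c"], "b": ["c"], "c": ["a"]}
rank = rank_by_links(links)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 3))
```

Page c comes out on top because two pages vote for it, a comes second because its single vote comes from the strongest page, and b, which nobody links to, comes last.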

The Public Relations Department
It's public. Do not misspell this word ;)

This is the part that interacts with you. The data is already ranked and organized and the user interface will just show it to you! This is what you actually see in the search results. The PR department has to find the nearest PR officer (data center) and then has to send you the info you requested.

This is it … explained in a light way

For the heavy version, search for topics such as search engine functionality, document ranking in search engines, word stemming and so on.

I hope that the next time you use a search engine you will better appreciate the enormous amount of work behind the scenes! They deserve credit … at least for trying to do a good job.

5ubliminal's TellinYa.com SEM & SEO Blog © 2007 - All rights reserved unless mentioned otherwise .
Rendered On : [Friday, 21 November, 2008 - 08:54:51 GMT]