Search Engine Crawling

This section of the site features articles published between 2002 and 2004. They remain here for reference purposes and may contain information that is out of date.

In this short article I explain what is meant by a Search Engine "crawling" or "spidering" your web site.

Background

First, a little background about Search Engines. Search Engines attempt to bring order to the chaos of the Internet. Using a Search Engine, you can locate web sites related to your chosen topic.

You do this by entering one or more words or phrases that relate to the topic you are looking for. For you, using a Search Engine is very simple; for the Search Engine, it is a lot harder.

To give you an idea of the scale of running a Search Engine, Google currently runs on 10,000 servers and employs 50 or more PhD-level Software Engineers to work constantly on the Search Engine Software.

And what does the Software do? We are only looking at a small part of it, the part that actually goes out and finds the web pages on the Internet.

The Bot

A program called the GoogleBot visits your website and reads each page by following the links within your site. It makes a note of all the words used on your pages, so that those pages can later be found when someone searches for those words.
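To make the idea of following links concrete, here is a minimal sketch of a crawler in Python. It is not how GoogleBot is actually written; it only illustrates the general technique. The three-page site in SITE, the PageParser class and the crawl function are all made up for this example, and the pages are held in memory rather than fetched over the network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import re


class PageParser(HTMLParser):
    """Collects the link targets and the words found on one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        # Remember the href of every <a> tag so it can be followed later.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Make a note of every word in the text of the page.
        self.words.extend(re.findall(r"[a-z0-9]+", data.lower()))


def crawl(start_url, fetch):
    """Visit every page reachable from start_url on the same site.

    fetch is any function that returns the HTML for a URL, or None.
    Returns a dict mapping each visited URL to its list of words.
    """
    site = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = PageParser()
        parser.feed(html)
        pages[url] = parser.words
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            # Stay within the site and never visit a page twice.
            if urlparse(target).netloc == site and target not in seen:
                seen.add(target)
                queue.append(target)
    return pages


# Hypothetical site, kept in memory so the example needs no network access.
SITE = {
    "http://example.com/": '<a href="/about.html">About</a> Welcome home',
    "http://example.com/about.html": '<a href="/">Home</a> <a href="contact.html">Contact</a> About us',
    "http://example.com/contact.html": '<a href="http://other.example/">Elsewhere</a> Contact details',
}

pages = crawl("http://example.com/", SITE.get)
```

Starting from the home page, the sketch reaches all three pages by following links, and it ignores the link to the other site. A real robot would also fetch pages over HTTP, wait politely between requests and obey each site's robots.txt file, none of which is shown here.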

This process is called "crawling" or "spidering" because of the way in which the robot (GoogleBot in this case) finds its way through your site.

This does not just apply to Google's robot; most Internet Search Engines work in the same way.

Once GoogleBot has examined your site, the information is passed on to another part of the software, which analyses the words and phrases to get them ready to be added to the index.
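As a rough illustration of what "the index" might look like, the Python sketch below builds a simple table that maps each word to the pages containing it, and then looks words up in that table. This is a textbook technique, usually called an inverted index, and is only an assumption about the general approach, not a description of Google's software. The page addresses, word lists and function names are invented for the example.

```python
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of page addresses that contain it."""
    index = defaultdict(set)
    for url, words in pages.items():
        for word in words:
            index[word].add(url)
    return index


def search(index, query):
    """Return the pages that contain every word in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Copy the first set so the index itself is never modified.
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical output of a crawl: each page with the words found on it.
crawled_pages = {
    "/": ["about", "welcome", "home"],
    "/about.html": ["home", "contact", "about", "us"],
    "/contact.html": ["elsewhere", "contact", "details"],
}

index = build_index(crawled_pages)
contact_pages = search(index, "contact")
about_home_pages = search(index, "about home")
```

Here a search for "contact" finds the about and contact pages, while a search for "about home" finds only the pages that contain both words. A real Search Engine also has to rank the matching pages, which this sketch does not attempt.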

After your site has been crawled, you just need to wait for the next Google Update.