Search Engine Crawling


Technical Level: Basic/Beginner
Published: 5th November 2002
Author: Nigel Peck
Last Updated: -

In this short article I will be explaining what is meant by a Search Engine "crawling" or "spidering" your web site.

Background

First, a little background about Search Engines. Search Engines attempt to bring order to the chaos of the Internet. Using a Search Engine, you can locate web sites related to your chosen topic.

You do this by entering one or more words or phrases that relate to the topic you are looking for. For you, using a Search Engine is very simple; for the Search Engine, it's a lot harder.

To give you an idea of the scale of running a Search Engine, Google currently runs on 10,000 servers and employs 50 or more PhD-level Software Engineers who work constantly on the Search Engine Software.

And what does the Software do? Here we are only looking at a small part of it: the part that actually goes out and finds the web pages on the Internet.

The Bot

A program called the GoogleBot visits your website and reads each page by following the links within your site. It makes a note of all the words used on your pages, so that those pages can later be found when someone searches for those words.

This process is called "crawling" or "spidering" because of the way in which the robot (GoogleBot in this case) finds its way through your site.

This does not just apply to Google's robot; most Internet Search Engines work in the same way.
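
To make the idea concrete, here is a minimal sketch in Python of a robot that follows the links within one site and notes the words on each page. It is only an illustration of the general technique, not how GoogleBot itself is written. The names `PageParser`, `crawl` and `SITE` are my own, and the three example.com pages are a made-up site held in memory so that the sketch runs without touching the network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PageParser(HTMLParser):
    """Collects the link targets and the words of one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        # Remember the target of every <a href="..."> on the page.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Note every word of text, lower-cased.
        self.words.extend(data.lower().split())


def crawl(start_url, fetch):
    """Breadth-first crawl of one site; returns {url: [words on that page]}."""
    site = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page could not be fetched, skip it
            continue
        parser = PageParser()
        parser.feed(html)
        parser.close()
        pages[url] = parser.words
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            # Stay within the site and never queue a page twice.
            if urlparse(target).netloc == site and target not in seen:
                seen.add(target)
                queue.append(target)
    return pages


# Hypothetical three-page site held in memory so the sketch runs offline.
SITE = {
    "http://example.com/": '<a href="/about.html">About</a> Welcome home',
    "http://example.com/about.html": '<a href="/">Home</a> <a href="contact.html">Contact</a> About us',
    "http://example.com/contact.html": '<a href="http://other.example/">Elsewhere</a> Email us',
}

pages = crawl("http://example.com/", SITE.get)
```

Here `fetch` is simply the dictionary's `get` method. Passing a function that downloads pages over HTTP instead would let the same loop walk a real site, although a real robot would also need to respect robots.txt and pace its requests.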

Once GoogleBot has examined your site, the information is passed on to another part of the software, which analyses the words and phrases to get them ready to be added to the index.
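
As a rough picture of what "the index" might look like, the sketch below builds a simple lookup from each word to the pages that contain it, and then answers a search by finding the pages that contain every word entered. This is a simplified illustration under my own assumptions, not a description of Google's software. The function names `build_index` and `search` and the sample pages in `crawled` are invented for the example.

```python
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of page URLs on which it appears."""
    index = defaultdict(set)
    for url, words in pages.items():
        for word in words:
            index[word.lower()].add(url)
    return index


def search(index, query):
    """Return the pages that contain every word of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start with the pages for the first word, then narrow down.
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical output of a crawl: the words noted on each page.
crawled = {
    "http://example.com/": ["welcome", "to", "our", "css", "articles"],
    "http://example.com/css.html": ["css", "positioning", "properties"],
}

index = build_index(crawled)
```

With this data, `search(index, "css positioning")` narrows the two pages that mention "css" down to the one that also mentions "positioning". A real Search Engine does far more than this, such as ranking the results, but the word-to-pages lookup is the basic idea.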

After your site has been crawled, you just need to wait for the next Google Update.
