Okay, so you know the basic concept of a search engine. Type a word or phrase into a search
box and click a button. Wait a few seconds, and references to thousands (or hundreds of
thousands) of pages will appear. Then all you have to do is click through those results to find
what you want. But what exactly is a search engine, beyond this general concept of "seek and ye shall find"?
It’s a little complicated. On the back end, a search engine is a piece of software that uses
algorithms to find and collect information about web pages. The information collected is usually
keywords or phrases that are possible indicators of what is contained on the web page as a
whole, the URL of the page, the code that makes up the page, and links into and out of the
page. That information is then indexed and stored in a database.
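To make the back end concrete, here is a minimal sketch in Python of the kind of index it might build: a map from each word to the set of pages containing it, often called an inverted index. The URLs and page text below are made-up examples, not real crawl data.

```python
# Minimal sketch of a back-end index: map each word to the set of
# page URLs that contain it (an "inverted index").
from collections import defaultdict

# Hypothetical pages a crawler might have collected.
pages = {
    "http://example.com/cats": "cats are small furry animals",
    "http://example.com/dogs": "dogs are loyal animals",
}

index = defaultdict(set)
for url, text in pages.items():
    for word in text.lower().split():
        index[word].add(url)  # record that this word appears on this page

# Every page mentioning "animals" is now one lookup away.
print(sorted(index["animals"]))
```

A real engine stores far more than word occurrences, of course, but this lookup-by-word structure is what makes answering a search in seconds possible.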
On the front end, the software has a user interface where users enter a search term — a word or
phrase — in an attempt to find specific information. When the user clicks a search button, an
algorithm then examines the information stored in the back-end database and retrieves links to
web pages that appear to match the search term the user entered.
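The front-end step can be sketched the same way: split the user's search term into words, look each word up in the stored index, and keep only the pages that match every word. The index data here is hypothetical, and real ranking algorithms are far more involved than this intersection.

```python
# Sketch of the front-end lookup against a stored word -> URLs index.
# The index contents below are made-up examples.
index = {
    "search": {"http://example.com/engines", "http://example.com/tips"},
    "engine": {"http://example.com/engines"},
}

def search(term):
    words = term.lower().split()
    if not words:
        return []
    results = index.get(words[0], set()).copy()
    for word in words[1:]:
        results &= index.get(word, set())  # keep pages matching all words
    return sorted(results)

print(search("search engine"))  # only pages containing both words
```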
The process of collecting information about web pages is performed by an agent called a crawler, spider, or robot. The crawler visits every URL on the Web that it isn't blocked from accessing and collects keywords and phrases from each page, which are then included in the database that
powers a search engine. Considering that the number of sites on the Web exceeded 100 million
some time ago and is increasing by more than 1.5 million sites each month, that’s like your
brain cataloging every single word you read, so that when you need to know something, you
think of that word and every reference to it comes to mind.
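What the crawler pulls out of a single page can be sketched with Python's standard html.parser: the words in the visible text (to index) and the links out of the page (to crawl next). The HTML snippet here is a made-up example, and no network access is involved.

```python
# Sketch of what a crawler extracts from one page: visible words
# and outgoing links. The HTML below is a hypothetical example.
from html.parser import HTMLParser

class PageScanner(HTMLParser):
    def __init__(self):
        super().__init__()
        self.words, self.links = [], []

    def handle_starttag(self, tag, attrs):
        if tag == "a":  # outgoing links for the crawler to follow
            self.links += [v for k, v in attrs if k == "href"]

    def handle_data(self, data):
        self.words += data.lower().split()  # visible words to index

html = '<p>Search engines index the Web.</p> <a href="/about">About</a>'
scanner = PageScanner()
scanner.feed(html)
print(scanner.words)   # words to add to the index
print(scanner.links)   # URLs queued for the next crawl
```

Repeat that extraction across hundreds of millions of sites, following each discovered link in turn, and you have the overwhelming cataloging job the paragraph above describes.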
In a word . . . overwhelming.