Monday, March 2, 2009

Web Spider

A Web spider is a computer program that browses the WWW in a disciplined, automated manner. Other terms for Web spiders are ants, automatic indexers, bots, worms, Web robots, or, especially in the FOAF community, Web scutters.[1]

This process is called Web crawling or spidering. Many sites, in particular search engines, use spidering as a means of providing up-to-date data. Web spiders are mostly used to create a copy of all the visited pages for later processing by a search engine that will index the downloaded pages to offer fast searches. Spiders can also be used to automate maintenance tasks on a Web site, such as checking links or validating HTML code. Spiders can also be used to gather specific types of information from Web pages, such as harvesting e-mail addresses.
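As a rough illustration of one such maintenance task, the short sketch below checks whether the links on a page still respond. It uses only the Python standard library; the URLs are illustrative placeholders, not taken from any particular site.

import urllib.request

def check_link(url):
    """Send a HEAD request and report whether the target answers."""
    # The URL passed in is assumed to be absolute (e.g. already resolved
    # against the page it was found on).
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as reply:
            return f"OK  {reply.status}  {url}"
    except Exception as err:
        return f"BAD {url} ({err})"

if __name__ == "__main__":
    # Placeholder URLs for demonstration only.
    for url in ["https://example.com/", "https://example.com/missing-page"]:
        print(check_link(url))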

A Web spider is one type of software agent. In general, it starts with a list of URLs to visit, called the seeds. As the spider visits these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit, called the crawl frontier. URLs from the frontier are recursively visited according to a set of policies.
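The sketch below shows this seed-and-frontier loop in minimal form, using only the Python standard library. The seed list, page limit, and politeness delay are assumptions made for the example, not part of any particular spider, and real crawlers add many more policies (robots.txt handling, URL normalization, per-host scheduling, and so on).

import time
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seeds, max_pages=20, delay=1.0):
    frontier = deque(seeds)   # URLs waiting to be visited (the crawl frontier)
    visited = set()           # URLs already fetched

    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        if url in visited:
            continue
        visited.add(url)
        try:
            with urllib.request.urlopen(url, timeout=10) as response:
                html = response.read().decode("utf-8", errors="replace")
        except Exception as err:
            print(f"failed: {url} ({err})")
            continue

        extractor = LinkExtractor()
        extractor.feed(html)
        for link in extractor.links:
            absolute = urljoin(url, link)   # resolve relative links
            if absolute not in visited:
                frontier.append(absolute)   # grow the frontier

        print(f"visited: {url} ({len(extractor.links)} links found)")
        time.sleep(delay)                   # very simple politeness policy

    return visited

if __name__ == "__main__":
    # Placeholder seed URL for demonstration only.
    crawl(["https://example.com/"])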