This topic provides information about how to manage Web spiders, Web crawlers, and robots.
Web spiders, Web crawlers, and robots are programs that traverse the Internet, retrieving documents and following the links in those documents. You may have noticed entries in your log files that record requests for a /robots.txt file or requests for many of your Web documents. These requests may come from a robot. Most robots adhere to the Robots Exclusion Protocol. To control which portions of your Web site robots attempt to visit, you can use either a robots.txt file or the robots meta tag.
The robots.txt file
The robots.txt file must be placed in the document root directory of the server. The following is an example of a robots.txt file:
User-agent: *
Disallow: /cgi-bin/
This file tells all robots (User-agent: *) not to visit any URL whose path begins with /cgi-bin/.
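As an illustration only, the following Python sketch shows how a robot that honors the exclusion protocol might interpret the example robots.txt file. It uses the standard library's urllib.robotparser module and parses the rules from a string instead of fetching them over HTTP. The host www.example.com, the user agent ExampleBot, and the helper is_allowed are hypothetical names chosen for the example.

```python
import urllib.robotparser

# The example robots.txt rules, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def is_allowed(url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url.

    ExampleBot is a hypothetical user agent; any name matches "User-agent: *".
    """
    parser = urllib.robotparser.RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is made.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # www.example.com is a placeholder host.
    for url in ("http://www.example.com/index.html",
                "http://www.example.com/cgi-bin/search.cgi"):
        print(url, "->", "allowed" if is_allowed(url) else "disallowed")
```

With these rules, a request for /index.html is allowed, while anything under /cgi-bin/ is disallowed. A real robot would instead fetch http://<host>/robots.txt, which is why the file has to live in the document root.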
Robots meta tag
The robots meta tag can be placed in an HTML document to tell robots not to index that document (NOINDEX) or not to follow the links it contains (NOFOLLOW):
<META NAME="ROBOTS" CONTENT="NOINDEX">
<META NAME="ROBOTS" CONTENT="NOFOLLOW">
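The following Python sketch illustrates how a robot might read these tags from a page it has retrieved. It is a minimal example built on the standard library's html.parser module, not a description of any particular robot. The class RobotsMetaParser and the function robots_directives are hypothetical names. The sketch also assumes that the CONTENT value may hold a comma-separated list such as "NOINDEX, NOFOLLOW" and that matching is case-insensitive.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not their values.
        if tag != "meta":
            return
        # An attribute written without a value is reported as None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        # Assumption: CONTENT may be a comma-separated list of directives.
        for token in attr_map.get("content", "").split(","):
            token = token.strip()
            if token:
                self.directives.add(token.upper())


def robots_directives(html):
    """Return the set of robots directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    page = ('<html><head><META NAME="ROBOTS" CONTENT="NOINDEX">'
            '<META NAME="ROBOTS" CONTENT="NOFOLLOW"></head><body></body></html>')
    found = robots_directives(page)
    print("index page:", "NOINDEX" not in found)
    print("follow links:", "NOFOLLOW" not in found)
```

A robot that honors both tags in this page would neither index it nor follow its links.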