<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang="en-us" xml:lang="en-us">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="security" content="public" />
<meta name="Robots" content="index,follow" />
<meta http-equiv="PICS-Label" content='(PICS-1.1 "http://www.icra.org/ratingsv02.html" l gen true r (cz 1 lz 1 nz 1 oz 1 vz 1) "http://www.rsac.org/ratingsv01.html" l gen true r (n 0 s 0 v 0 l 0) "http://www.classify.org/safesurf/" l gen true r (SS~~000 1))' />
<meta name="DC.Type" content="topic" />
<meta name="DC.Title" content="Manage Web spiders, Web crawlers, and robots on HTTP Server" />
<meta name="abstract" content="This topic provides information about how to manage Web spiders, Web crawlers, and robots." />
<meta name="description" content="This topic provides information about how to manage Web spiders, Web crawlers, and robots." />
<meta name="DC.Relation" scheme="URI" content="rzaieparsearch.htm" />
<meta name="copyright" content="(C) Copyright IBM Corporation 2002,2006" />
<meta name="DC.Rights.Owner" content="(C) Copyright IBM Corporation 2002,2006" />
<meta name="DC.Format" content="XHTML" />
<meta name="DC.Identifier" content="rzaiespiders" />
<meta name="DC.Language" content="en-us" />
<!-- All rights reserved. Licensed Materials Property of IBM -->
<!-- US Government Users Restricted Rights -->
<!-- Use, duplication or disclosure restricted by -->
<!-- GSA ADP Schedule Contract with IBM Corp. -->
<link rel="stylesheet" type="text/css" href="./ibmdita.css" />
<link rel="stylesheet" type="text/css" href="./ic.css" />
<title>Manage Web spiders, Web crawlers, and robots on HTTP Server</title>
</head>
<body id="rzaiespiders"><a name="rzaiespiders"><!-- --></a>
<!-- Java sync-link --><script language="Javascript" src="../rzahg/synch.js" type="text/javascript"></script>
<h1 class="topictitle1">Manage Web spiders, Web crawlers, and robots on HTTP Server</h1>
<div><p>This topic provides information about how to manage Web spiders,
Web crawlers, and robots.</p>
<div class="important"><span class="importanttitle">Important:</span> Information
for this topic supports the latest PTF levels for HTTP Server for i5/OS.
Install the latest PTFs to upgrade to the latest level of the HTTP Server
for i5/OS. Some of the topics documented here are not available prior to
this update. See <a href="http://www-03.ibm.com/servers/eserver/iseries/software/http/services/service.html" target="_blank">http://www.ibm.com/servers/eserver/iseries/software/http/services/service.htm</a> <img src="www.gif" alt="Link outside Information Center" /> for more information.</div>
<p>Web spiders, Web crawlers, and robots are programs that traverse the Internet,
retrieving documents and following the links in those documents. You may have
noticed entries in your log files that record requests for /robots.txt
or requests for many of your Web documents. These requests may be from a robot.
Most robots adhere to the Robots Exclusion Protocol. To control which portions
of your Web site robots attempt to visit, you can use either a robots.txt file
or the robots meta tag.</p>
<p><strong>The robots.txt file</strong></p>
<p>The robots.txt file must be placed in the document root directory of the
server so that robots can retrieve it as /robots.txt. The following example
asks all robots (User-agent: *) not to visit any URL whose path begins
with /cgi-bin/:</p>
<pre class="block">User-agent: *
Disallow: /cgi-bin/</pre>
<div class="note"><span class="notetitle">Note:</span> Anyone can read the robots.txt file, so make sure that you do not
alert hackers to important directories or files by listing them in it.</div>
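<p>As a further illustration, a robots.txt file can contain more than one record,
with records separated by blank lines. The following is only a sketch: the robot
name ExampleBot and the /tmp/ path are hypothetical and are not taken from any
real configuration. A robot uses the record whose User-agent line matches its own
name and otherwise falls back to the * record. A Disallow line with an empty value
excludes nothing.</p>
<pre class="block"># Hypothetical robot: nothing is disallowed for it
User-agent: ExampleBot
Disallow:

# All other robots: stay out of these two directory trees
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/</pre>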
<p><strong>Robots meta tag</strong></p>
<p>The robots meta tag can be placed in an HTML document to give a robot the
following instructions:</p>
<ul><li>Do not index a document <pre>&lt;META NAME="ROBOTS" CONTENT="NOINDEX"&gt;</pre>
</li>
<li>Do not follow links in a document <pre>&lt;META NAME="ROBOTS" CONTENT="NOFOLLOW"&gt;</pre>
</li>
</ul>
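<p>As a sketch of where the tag goes, the robots meta tag belongs in the head
section of the document, and both directives can be combined in one tag by
separating them with a comma. The title and body text in this example are
placeholders invented for illustration. The tag is written in lowercase XHTML
style, on the assumption that robots treat the name and content values as
case insensitive.</p>
<pre class="block">&lt;html&gt;
&lt;head&gt;
&lt;title&gt;Example page&lt;/title&gt;
&lt;!-- Ask robots not to index this page and not to follow its links --&gt;
&lt;meta name="robots" content="noindex, nofollow" /&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;p&gt;Placeholder content.&lt;/p&gt;
&lt;/body&gt;
&lt;/html&gt;</pre>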
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="rzaieparsearch.htm" title="This topic provides step-by-step tasks for the Webserver search engine.">Search tasks</a></div>
</div>
</div>
</body>
</html>