<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang="en-us" xml:lang="en-us">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="security" content="public" />
<meta name="Robots" content="index,follow" />
<meta http-equiv="PICS-Label" content='(PICS-1.1 "http://www.icra.org/ratingsv02.html" l gen true r (cz 1 lz 1 nz 1 oz 1 vz 1) "http://www.rsac.org/ratingsv01.html" l gen true r (n 0 s 0 v 0 l 0) "http://www.classify.org/safesurf/" l gen true r (SS~~000 1))' />
<meta name="DC.Type" content="topic" />
<meta name="DC.Title" content="Set up a document list for the Webserver search engine on HTTP Server" />
<meta name="abstract" content="This topic provides information about how to create a document list for the Webserver search engine with the IBM Web Administration for i5/OS interface." />
<meta name="description" content="This topic provides information about how to create a document list for the Webserver search engine with the IBM Web Administration for i5/OS interface." />
<meta name="DC.Relation" scheme="URI" content="rzaieparsearch.htm" />
<meta name="copyright" content="(C) Copyright IBM Corporation 2002,2006" />
<meta name="DC.Rights.Owner" content="(C) Copyright IBM Corporation 2002,2006" />
<meta name="DC.Format" content="XHTML" />
<meta name="DC.Identifier" content="rzaiedoclstsrch" />
<meta name="DC.Language" content="en-us" />
<!-- All rights reserved. Licensed Materials Property of IBM -->
<!-- US Government Users Restricted Rights -->
<!-- Use, duplication or disclosure restricted by -->
<!-- GSA ADP Schedule Contract with IBM Corp. -->
<link rel="stylesheet" type="text/css" href="./ibmdita.css" />
<link rel="stylesheet" type="text/css" href="./ic.css" />
<title>Set up a document list for the Webserver search engine on HTTP Server</title>
</head>
<body id="rzaiedoclstsrch"><a name="rzaiedoclstsrch"><!-- --></a>
<!-- Java sync-link --><script language="Javascript" src="../rzahg/synch.js" type="text/javascript"></script>
<h1 class="topictitle1">Set up a document list for the Webserver search engine on HTTP Server</h1>
<div><p>This topic provides information about how to create a document
list for the Webserver search engine with the <span>IBM<sup>®</sup> Web Administration for i5/OS™ interface</span>.</p>
<div class="important"><span class="importanttitle">Important:</span> Information
for this topic supports the latest PTF levels for HTTP Server for i5/OS.
It is recommended that you install the latest PTFs to upgrade to the latest
level of the HTTP Server for i5/OS. Some of the topics documented here are
not available prior to this update. See <a href="http://www-03.ibm.com/servers/eserver/iseries/software/http/services/service.html" target="_blank">http://www.ibm.com/servers/eserver/iseries/software/http/services/service.htm</a> <img src="www.gif" alt="Link outside Information Center" /> for more information. </div>
<p>A document list is a file that contains a list of documents used to create
or update a search index. When a request for a search title or description
is sent, it is compared to the document list for possible matches.</p>
<p>To set up a document list for use with the Webserver search engine, complete
the following steps:</p>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="rzaieparsearch.htm" title="This topic provides step-by-step tasks for the Webserver search engine.">Search tasks</a></div>
</div>
</div><div class="nested1" id="create"><a name="create"><!-- --></a><h2 class="topictitle2">Create a document list</h2>
<div><p>To create a document list, do the following: </p>
<ol><li>Click the <strong>Advanced</strong> tab.</li>
<li>Click the <span class="uicontrol">Search Setup</span> subtab.</li>
<li>Expand <strong>Search Engine Setup</strong>. </li>
<li>Click <strong>Build document list</strong>.</li>
<li>Choose one of the two options:<dl class="block"><dt class="dlterm"><a href="#server">Build a document list from documents on this server</a></dt>
<dd>Select this option if the documents to be included in the document list
are in a local directory.</dd>
<dt class="dlterm"><a href="#crawl">Build the document list by crawling a URL</a></dt>
<dd>Select this option if the documents to be included in the document list
reside on a remote server.</dd>
</dl>
<p>There are two additional options if you choose to build
the document list using the Web crawler. These are:</p>
<dl class="block"><dt class="dlterm">Build the document list by crawling a URL</dt>
<dd>Select this option to crawl a single URL.</dd>
<dt class="dlterm"><a href="#url">Build the document list from selected URL and options objects</a></dt>
<dd>Select this option only if you have previously created a URL object and an options object. See <a href="rzaieurlobjsrch.htm">Set up a URL object for the Webserver search engine on HTTP Server</a> and <a href="rzaieoptobjsrch.htm">Set up an options object for the Webserver search engine on HTTP Server</a>.</dd>
</dl>
</li>
<li>Click <strong>Apply</strong>.</li>
</ol>
</div>
</div>
<div class="nested1" id="server"><a name="server"><!-- --></a><h2 class="topictitle2">Build a document list from documents on this server </h2>
<div><p>If you opted to build a document list from a local directory, follow these
instructions to complete your document list: </p>
<ol><li>Choose one of the two document list file name options:<dl class="block"><dt class="dlterm">Create a new document list file</dt>
<dd>Select this option to create a new document list file. Replace the asterisk
(*) with a new name for your document list file.</dd>
<dt class="dlterm">Use the document list in this file</dt>
<dd>Select this option to use an existing document list file. Select the document
list file from the list.</dd>
</dl>
<p>There are two additional options if you choose to use an
existing document list file. These are:</p>
<dl class="block"><dt class="dlterm">Replace the document list file</dt>
<dd>Select this option to overwrite the existing document list file.</dd>
<dt class="dlterm">Append the new list to the document list file</dt>
<dd>Select this option to add any new information to the existing document
list file. This option will not delete existing information.</dd>
</dl>
</li>
<li>Enter the directory from which the document list will be built in the <strong>Build
a document list from this directory</strong> field. For example, <em>/www/mydocs/public/info</em>.<p>There
are two additional options that you can select. These are:</p>
<dl class="block"><dt class="dlterm">Traverse subdirectories in this directory</dt>
<dd>Select to include any documents in subdirectories of the directory you
provided in the field above.</dd>
<dt class="dlterm">Document filter</dt>
<dd>Select this option if you want the document list to contain only specific
file types. For example, entering <em>*.htm*</em> builds a document
list that contains only files of type <em>htm</em> and <em>html</em>.</dd>
</dl>
</li>
<li>Click <strong>Apply</strong>.</li>
</ol>
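<p>The effect of the <strong>Traverse subdirectories in this directory</strong> and <strong>Document
filter</strong> options can be illustrated with a minimal Python sketch. It is not the
HTTP Server implementation: the function name <em>build_document_list</em> is hypothetical,
and the sketch assumes the filter behaves like a case-sensitive shell-style wildcard
applied to file names only.</p>

```python
import fnmatch
import os


def build_document_list(root, pattern="*", traverse=True):
    """Return sorted paths under root whose file names match pattern.

    Hypothetical helper for illustration; wildcard semantics are assumed.
    """
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            # Match against the file name only, not the full path.
            if fnmatch.fnmatchcase(name, pattern):
                matches.append(os.path.join(dirpath, name))
        if not traverse:
            # Emptying dirnames stops os.walk from descending further.
            dirnames.clear()
    return sorted(matches)
```

<p>Under these assumptions, a filter of <em>*.htm*</em> selects <em>index.htm</em> and <em>index.html</em>,
but it would also select a name such as <em>notes.htm.bak</em>, so a narrower pattern
may be preferable when a directory holds backup copies.</p>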
</div>
</div>
<div class="nested1" id="crawl"><a name="crawl"><!-- --></a><h2 class="topictitle2">Build the document list by crawling a URL</h2>
<div><p>If you opted to build a document list with the Web crawler that will crawl
a URL, follow these instructions to complete your document list: </p>
<ol><li>Choose one of the two document list file name options:<dl class="block"><dt class="dlterm">Create a new document list file</dt>
<dd>Select this option to create a new document list file. Replace the asterisk
(*) with a new name for your document list file. </dd>
<dt class="dlterm">Use the document list in this file</dt>
<dd>Select this option to use an existing document list file. Select the document
list file from the list.</dd>
</dl>
<p>There are two additional options if you choose to use an existing
document list file. These are:</p>
<dl class="block"><dt class="dlterm">Replace the document list file</dt>
<dd>Select this option to overwrite the existing document list file.</dd>
<dt class="dlterm">Append the new list to the document list file</dt>
<dd>Select this option to add any new information to the existing document
list file. This option will not delete existing information.</dd>
</dl>
</li>
<li>Enter the Web crawler options:<dl class="block"><dt class="dlterm">URL</dt>
<dd>Enter the URL the Web crawler will visit to add documents to your document
list. For example, <em>http://www.ibm.com</em>.</dd>
<dt class="dlterm">URL domain filter</dt>
<dd>Enter the domain that the Web crawler must stay within. For example, <em>ibm.com</em>.</dd>
<dt class="dlterm">Maximum crawling depth</dt>
<dd>Enter the depth of the crawl from the starting URL. For example, entering
a depth of 0 downloads only the starting URL page. Entering a depth of
1 continues the crawl to the first layer of links from the starting URL.</dd>
<dt class="dlterm">Support robot exclusion</dt>
<dd>If you select <em>Yes</em>, any site or pages that contain robot exclusion
META tags or files will not be downloaded. Excluded files do not usually contain
HTML or text. See <a href="rzaiespiders.htm">Manage Web spiders, Web crawlers, and robots on HTTP Server</a> for
more information.</dd>
</dl>
</li>
<li>Choose crawling options:<dl class="block"><dt class="dlterm"><strong>Directory to store documents</strong></dt>
<dd>Enter the directory to store the documents the Web crawler finds. For
example, <em>/www/mydocs/public/crawl</em>.</dd>
<dt class="dlterm">Document language</dt>
<dd>Select the language of the documents being retrieved by the Web crawler.</dd>
<dt class="dlterm">Proxy server for HTTP</dt>
<dd>Enter the proxy server for HTTP requests. Possible values include any
valid server name. </dd>
<dt class="dlterm">Proxy port for HTTP</dt>
<dd>Enter the port number for the above proxy server. A port is required if
a proxy server for HTTP is specified. </dd>
<dt class="dlterm">Proxy server for HTTPS</dt>
<dd>Enter the proxy server for HTTPS requests.</dd>
<dt class="dlterm">Proxy port for HTTPS</dt>
<dd>Enter the port number for the above proxy server.</dd>
<dt class="dlterm">Maximum file size to download</dt>
<dd>Enter the maximum size for a downloaded file (in KB).</dd>
<dt class="dlterm">Maximum storage for files</dt>
<dd>Enter the maximum storage space for all downloaded files (in MB).</dd>
<dt class="dlterm">Maximum threads</dt>
<dd>Enter the maximum number of threads used during web crawling. Set this
value based on the system resources that are available. </dd>
<dt class="dlterm">Maximum run time</dt>
<dd>Enter the maximum amount of time, in hours and minutes, that the crawling
session remains active.</dd>
<dt class="dlterm">Activity log file</dt>
<dd>Enter the action to take for an activity log file. This file contains
information about the crawling session plus any errors that occur. This file
must be in a directory of the IFS. You can choose to run a crawling session
with or without an activity log file. You also have the option of replacing
the log file each time a crawling session is started or appending information
to the existing file.</dd>
</dl>
<p>There are two additional options if you choose to write
an activity log. These are:</p>
<dl class="block"><dt class="dlterm">Create or replace the logging file</dt>
<dd>Select this option if the log file does not exist or you want to overwrite
an existing log file.</dd>
<dt class="dlterm">Append to the existing logging file</dt>
<dd>Select this option to add any new information to the existing log file.
This option will not delete existing information. </dd>
</dl>
</li>
<li>Click <strong>Apply</strong>.</li>
</ol>
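<p>The interaction between <strong>Maximum crawling depth</strong> and <strong>URL domain filter</strong> can be
sketched as a breadth-first walk over a set of links. This is an illustration only, not
the Web crawler's actual algorithm: the function names, the in-memory <em>links</em>
dictionary, and the rule that a host matches when it equals the filter or is a
subdomain of it are all assumptions.</p>

```python
from collections import deque
from urllib.parse import urlparse


def in_domain(url, domain_filter):
    """True if the URL's host equals the filter or is a subdomain of it."""
    host = urlparse(url).hostname or ""
    return host == domain_filter or host.endswith("." + domain_filter)


def crawl(start_url, links, max_depth, domain_filter):
    """Return pages in visit order; links maps a page to its outgoing URLs."""
    seen = {start_url}
    order = [start_url]
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth limit reached: do not follow this page's links
        for target in links.get(url, []):
            if target not in seen and in_domain(target, domain_filter):
                seen.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order
```

<p>With a depth of 0 the sketch returns only the starting URL, and with a depth of 1
it adds the pages linked directly from the starting URL that fall within the domain
filter, which mirrors the behavior described for the depth setting above.</p>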
</div>
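<p>As a rough illustration of what <strong>Support robot exclusion</strong> means for exclusion
files, the following Python sketch uses the standard library's <em>urllib.robotparser</em>
module to drop URLs that a <em>robots.txt</em> file disallows. The sample file contents, the
user agent name <em>example-crawler</em>, and the helper <em>allowed_urls</em> are hypothetical,
and the sketch does not cover robot exclusion META tags or claim to match how the
HTTP Server Web crawler evaluates exclusions.</p>

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for this illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed_urls(urls, robots_txt, user_agent="example-crawler"):
    """Return the URLs that the given robots.txt text permits fetching."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]
```

<p>A crawler that honors exclusion files would apply a check of this kind before
downloading each page, so excluded pages never reach the document list.</p>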
</div>
<div class="nested1" id="url"><a name="url"><!-- --></a><h2 class="topictitle2">Build the document list from selected URL and options objects</h2>
<div><p>If you opted to build a document list with the Web crawler using selected
URL and options objects, follow these instructions to complete your document
list: </p>
<ol><li>Choose one of the two document list file name options:<dl class="block"><dt class="dlterm">Create a new document list file</dt>
<dd>Select this option to create a new document list file. Replace the asterisk
(*) with a new name for your document list file.</dd>
<dt class="dlterm">Use the document list in this file</dt>
<dd>Select this option to use an existing document list file. Select the document
list file from the list.</dd>
</dl>
<p>There are two additional options if you choose to use an existing
document list file. These are:</p>
<dl class="block"><dt class="dlterm">Replace the document list file</dt>
<dd>Select this option to overwrite the existing document list file.</dd>
<dt class="dlterm">Append the new list to the document list file</dt>
<dd>Select this option to add any new information to the existing document
list file. This option will not delete existing information.</dd>
</dl>
</li>
<li>Select the URL object. See <a href="rzaieurlobjsrch.htm">Set up a URL object for the Webserver search engine on HTTP Server</a>.</li>
<li>Select the options object. See <a href="rzaieoptobjsrch.htm">Set up an options object for the Webserver search engine on HTTP Server</a>.</li>
<li>Select a validation list option. See <a href="rzaievallstsrch.htm">Set up validation lists for the Webserver search engine on HTTP Server</a>:<dl class="block"><dt class="dlterm">Validation list</dt>
<dd>Select <strong>Do not use a validation list</strong> if you know the server the
Web crawler will visit does not use a validation list for authentication.
Otherwise, select <strong>Use this validation list for sites requiring a userid
and password</strong> and select the validation list to be used from the list.</dd>
</dl>
</li>
<li>Click <strong>Apply</strong>.</li>
</ol>
</div>
</div>
</body>
</html>