Where allowed to run: All environments (*ALL)
Threadsafe: No
The Configure Search (CFGHTTPSCH) command allows you to do various search administration tasks that include working with an index, a document list, a URL mapping rules file, or a thesaurus, plus building objects used for web crawling.
You can create or delete an index, add documents to or remove documents from an index, create, update, or delete a document list, and create or update a mapping rules file.
To create an index, or to add or remove documents from an index, you need to provide a document list. Specify *CRTDOCL for the Option (OPTION) parameter to create a document list.
To create an index, specify *CRTIDX for the OPTION parameter.
To create a document list, specify *CRTDOCL for the OPTION parameter. The document list can be used when you create (*CRTIDX) or update (*ADDDOC or *RMVDOC) an index.
To append additional document paths to a document list, specify *UPDDOCL for the OPTION parameter. The document list can be used when you create (*CRTIDX) or update (*ADDDOC or *RMVDOC) an index.
To add documents to an index, specify *ADDDOC for the OPTION parameter. All new or changed documents in the document list will be added to the index.
To remove documents from the index, specify *RMVDOC for the OPTION parameter.
To delete a document list, specify *DLTDOCL for the OPTION parameter.
To delete an index, specify *DLTIDX for the OPTION parameter.
To create a mapping rules file, specify *CRTMAPF for the OPTION parameter.
To add additional configuration directives to a mapping rules file, specify *UPDMAPF for the OPTION parameter.
To create a thesaurus dictionary that can be used on a search, specify *CRTTHSDCT for the OPTION parameter.
To delete a thesaurus dictionary, specify *DLTTHSDCT for the OPTION parameter.
To retrieve a thesaurus definition file from a thesaurus dictionary, specify *RTVTHSDFNF for the OPTION parameter.
The next set of OPTION values is used for working with objects that are used when crawling remote web sites.
To create a URL object that contains a list of URLs to crawl, specify *CRTURLOBJ for the OPTION parameter.
To update a URL object, specify *UPDURLOBJ for the OPTION parameter.
To delete a URL object, specify *DLTURLOBJ for the OPTION parameter.
To create an options object, specify *CRTOPTOBJ for the OPTION parameter.
To update an options object, specify *UPDOPTOBJ for the OPTION parameter.
To delete an options object, specify *DLTOPTOBJ for the OPTION parameter.
To print the status of an index, specify *PRTIDXSTS for the OPTION parameter.
To print the status of a document list, specify *PRTDOCLSTS for the OPTION parameter.
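As an illustration of how these options combine, the following sketch adds new or changed documents to an existing index and then prints the index status. It reuses the index name and document list path from the examples later in this topic. Which additional parameters each option requires is not stated here, so treat the parameter sets as assumptions and prompt the command (F4) to confirm them.

```cl
/* Sketch only: add new or changed documents from the document list */
CFGHTTPSCH OPTION(*ADDDOC) IDX(myindex) +
           DOCLIST('/QIBM/USERDATA/HTTPSVR/INDEX/myindex.DOCUMENT.LIST')

/* Print the status of the index after the update (assumes IDX is enough) */
CFGHTTPSCH OPTION(*PRTIDXSTS) IDX(myindex)
```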
Restrictions
Parameters
Keyword | Description | Choices | Notes |
---|---|---|---|
OPTION | Option | *CRTIDX, *MRGIDX, *DLTIDX, *ADDDOC, *RMVDOC, *CRTDOCL, *UPDDOCL, *REGDOCL, *DLTDOCL, *CRTMAPF, *UPDMAPF, *CRTTHSDCT, *DLTTHSDCT, *RTVTHSDFNF, *CRTURLOBJ, *UPDURLOBJ, *DLTURLOBJ, *CRTOPTOBJ, *UPDOPTOBJ, *DLTOPTOBJ, *CRTVLDL, *ADDVLDLDTA, *RMVVLDLDTA, *DLTVLDL, *PRTIDXSTS, *PRTDOCLSTS | Required, Positional 1 |
IDX | Index name | Character value | Optional |
IDXDIR | Index directory | Path name, '/QIBM/USERDATA/HTTPSVR/INDEX' | Optional |
TEXT | Index description | Character value, *BLANK | Optional |
DOCLIST | Document list file | Path name | Optional |
STRDIR | Start directory | Path name | Optional |
SUBTREE | Traverse directory | *ALL, *NONE | Optional |
PATTERN | Filter | Character value, '*.HTM*' | Optional |
CONTENT | Document content | *HTML, *TEXT | Optional |
ALWERR | Allow file errors | *YES, *NO | Optional |
ENBCASE | Enable case sensitive search | *YES, *NO | Optional |
ALWCHAR | Valid characters | *ALPHANUM, *ALPHA | Optional |
IDXHTML | Index HTML fields | Single values: *NONE Other values (up to 5 repetitions): *TITLE, *AUTHOR, *ABSTRACT, *DESCRIPTION, *KEYWORDS, *ALLMETA | Optional |
CFG | HTTP server | Name | Optional |
URLPFX | Prefix for URL | Character value, *NONE | Optional |
MAPFILE | Mapping rules file | Path name | Optional |
DLTTYPE | Delete type | *ALL, *SUPP | Optional |
THSDCT | Thesaurus dictionary name | Character value | Optional |
THSDCTDIR | Thesaurus directory | Path name, '/QIBM/USERDATA/HTTPSVR/SEARCH' | Optional |
THSDFNF | Thesaurus definition file | Path name | Optional |
URLOBJ | URL object | Character value | Optional |
DOCDIR | Document storage directory | Path name | Optional |
LANG | Language of documents | *ARABIC, *BALTIC, *CENTEUROPE, *CYRILLIC, *ESTONIAN, *GREEK, *HEBREW, *JAPANESE, *KOREAN, *SIMPCHINESE, *TRADCHINESE, *THAI, *TURKISH, *WESTERN | Optional |
URLACT | URL list action | *NONE, *ADD, *REMOVE | Optional |
URLLST | URL list entries | Values (up to 100 repetitions): Element list | Optional |
| | Element 1: URL | Character value | |
| | Element 2: URL filter | Character value, *NONE | |
| | Element 3: Maximum crawling depth | 0-100, 3, *NOMAX | |
| | Element 4: Enable robots | *YES, *NO | |
RMVURLLST | Remove URL list entries | Values (up to 100 repetitions): Element list | Optional |
| | Element 1: URL | Character value | |
OPTOBJ | Options object | Character value | Optional |
PRXSVR | Proxy server for HTTP | Character value, *NONE, *SAME | Optional |
PRXPORT | Proxy port for HTTP | 1-65535, *SAME | Optional |
PRXSVRSSL | Proxy server for HTTPS | Character value, *NONE, *SAME | Optional |
PRXPORTSSL | Proxy port for HTTPS | 1-65535, *SAME | Optional |
MAXSIZE | Maximum file size | 1-6000, 1000, *SAME | Optional |
MAXSTGSIZE | Maximum storage size | 1-65535, 100, *NOMAX, *SAME | Optional |
MAXTHD | Maximum threads | 1-50, 20, *SAME | Optional |
MAXRUNTIME | Maximum run time | Single values: *NOMAX, *SAME Other values: Element list | Optional |
| | Element 1: Hours | 0-1000, 2 | |
| | Element 2: Minutes | 0-59, 0 | |
LOGFILE | Logging file | Path name, *NONE, *SAME | Optional |
CLRLOG | Clear logging file | *YES, *NO, *SAME | Optional |
LSTTYPE | Document list type | *LOCAL, *REMOTE | Optional |
VLDL | Validation list | Name | Optional |
VLDLE | Validation list entries | Values (up to 100 repetitions): Element list | Optional |
| | Element 1: URL | Character value | |
| | Element 2: User ID | Character value | |
| | Element 3: Password | Character value | |
RMVVLDLE | Remove validation list entries | Values (up to 100 repetitions): Element list | Optional |
| | Element 1: URL | Character value | |
Option (OPTION)
Specifies the administrative task to be performed.
This is a required parameter.
Index name (IDX)
Specifies the index to be created or updated.
Index directory (IDXDIR)
Specifies the index directory that is used for several files created during index administration.
Index description (TEXT)
Specifies the text that describes the index.
Document list file (DOCLIST)
Specifies the document list file that contains a list of the documents to be indexed.
Start directory (STRDIR)
Specifies the starting directory to use to find documents to add to the document list.
Traverse directory (SUBTREE)
Specifies whether to traverse subdirectories of the starting directory when building the document list file.
Filter (PATTERN)
Specifies the pattern or filter to use when building the document list. To find HTML files, use the filter *.HTM*.
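The start directory, subtree, and filter parameters work together when a document list is built or extended. The sketch below appends plain-text files from a single directory, without descending into subdirectories, to an existing document list. The directory /mydocs/text and the filter *.TXT are hypothetical values chosen for illustration.

```cl
/* Sketch only: append *.TXT files from one directory to a document list. */
/* '/mydocs/text' is a hypothetical directory.                            */
CFGHTTPSCH OPTION(*UPDDOCL) +
           DOCLIST('/QIBM/USERDATA/HTTPSVR/INDEX/myindex.DOCUMENT.LIST') +
           STRDIR('/mydocs/text') SUBTREE(*NONE) PATTERN('*.TXT')
```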
Document content (CONTENT)
Specifies the contents of the documents to be indexed.
Allow file errors (ALWERR)
Specifies whether to skip document file errors and continue processing the request or to stop processing on a document file error.
Enable case sensitive search (ENBCASE)
Specifies whether a case sensitive search is allowed for this index.
Valid characters (ALWCHAR)
Specifies the characters that are valid for a search on this index.
Index HTML fields (IDXHTML)
Specifies the HTML tags that are used to find additional character strings to index. If *NONE is selected, all HTML tags are removed from the document before indexing, and all searches are done on the entire document.
Any tag field that is selected will be indexed separately and will also be included in the indexing of the entire document. Tagged fields or the entire document can be selected for a search.
This parameter is ignored unless CONTENT(*HTML) is also specified.
Single values
Other values (up to 5 repetitions)
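To show how the indexing parameters fit together, the following sketch creates an index over HTML documents and indexes the title and keywords fields separately. All values come from the choices listed in the parameter table; the index name and document list path are the ones used in the examples in this topic, and the combination shown is an assumption rather than a recommended setting.

```cl
/* Sketch only: create an HTML index with two separately indexed fields. */
/* IDXHTML is honored only because CONTENT(*HTML) is also specified.     */
CFGHTTPSCH OPTION(*CRTIDX) IDX(myindex) +
           DOCLIST('/QIBM/USERDATA/HTTPSVR/INDEX/myindex.DOCUMENT.LIST') +
           CONTENT(*HTML) ALWERR(*YES) ENBCASE(*NO) ALWCHAR(*ALPHANUM) +
           IDXHTML(*TITLE *KEYWORDS)
```

With this definition, a search could target the title field, the keywords field, or the entire document.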
HTTP server (CFG)
Specifies the HTTP server that contains routing directives. The appropriate directives are added to the URL mapping rules file and used with the URL prefix to define the URLs that are displayed for search results.
Prefix for URL (URLPFX)
Specifies the prefix to use for the URL for documents found on a search.
Mapping rules file (MAPFILE)
Specifies the name of the mapping rules file that contains routing information to use for creating URLs for documents found on a search.
Delete type (DLTTYPE)
Specifies whether to delete all of the index or only the supplemental index. The supplemental index is temporarily created when new or modified documents are added to the index.
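A minimal sketch of the two delete types follows, assuming the index myindex lives in the default index directory. Whether other parameters are needed for *DLTIDX is not stated in this topic.

```cl
/* Sketch only: delete just the supplemental index, keeping the main index */
CFGHTTPSCH OPTION(*DLTIDX) IDX(myindex) DLTTYPE(*SUPP)

/* Sketch only: delete the entire index */
CFGHTTPSCH OPTION(*DLTIDX) IDX(myindex) DLTTYPE(*ALL)
```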
Thesaurus dictionary name (THSDCT)
Specifies the thesaurus dictionary that can be used on a search.
Thesaurus directory (THSDCTDIR)
Specifies the directory to use for the thesaurus dictionary. Specify a directory that is not used for search indexes.
Thesaurus definition file (THSDFNF)
Specifies the thesaurus definition file used to create a thesaurus dictionary.
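The three thesaurus parameters might be combined as in the sketch below, which creates a dictionary from a definition file. The dictionary name mythes and the definition file path /mydocs/mythes.def are hypothetical, and the format of the definition file is not described in this topic.

```cl
/* Sketch only: create a thesaurus dictionary from a definition file. */
/* 'mythes' and '/mydocs/mythes.def' are hypothetical names.          */
CFGHTTPSCH OPTION(*CRTTHSDCT) THSDCT(mythes) +
           THSDCTDIR('/QIBM/USERDATA/HTTPSVR/SEARCH') +
           THSDFNF('/mydocs/mythes.def')
```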
URL object (URLOBJ)
Specifies the URL object to use for web crawling. This parameter is used when *CRTURLOBJ or *UPDURLOBJ is specified for the OPTION parameter. This object contains a list of URLs that you want to crawl.
Document storage directory (DOCDIR)
Specifies the directory to use to store documents found when crawling remote web sites. This parameter is used when *CRTURLOBJ or *UPDURLOBJ is specified for the OPTION parameter.
Language of documents (LANG)
Specifies the language of the documents that are to be downloaded. These language choices are similar to the character sets or encodings that can be selected on a browser. This parameter is used when *CRTURLOBJ or *UPDURLOBJ is specified for the OPTION parameter.
URL list action (URLACT)
Specifies the action to take on the URL list for the specified URL object. This parameter is used when *UPDURLOBJ is specified for the OPTION parameter.
URL list entries (URLLST)
Specifies the list of URLs and URL attributes that are used in a crawling session. This parameter is used when *CRTURLOBJ is specified for the OPTION parameter, or when *UPDURLOBJ is specified for the OPTION parameter and URLACT is *ADD.
You can specify up to 100 values for this parameter.
Element 1: URL
Element 2: URL filter
Element 3: Maximum crawling depth
The maximum depth to crawl from the starting URL. Zero means to stop crawling at the starting URL site. Each additional layer refers to following referenced links within the current URL.
Element 4: Enable robots
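Because each URL list entry is a four-element list, the CL syntax nests one set of parentheses per entry. The sketch below creates a URL object with a single entry: no URL filter, a crawling depth of 2, and robots enabled. The object name myurlobj and the directory /mydocs/crawled are hypothetical.

```cl
/* Sketch only: create a URL object with one URL list entry.       */
/* Entry elements: URL, URL filter, maximum depth, enable robots.  */
/* 'myurlobj' and '/mydocs/crawled' are hypothetical names.        */
CFGHTTPSCH OPTION(*CRTURLOBJ) URLOBJ(myurlobj) +
           DOCDIR('/mydocs/crawled') LANG(*WESTERN) +
           URLLST(('http://www.myserver.com' *NONE 2 *YES))
```

A depth of 2 here means the crawl follows links from the starting URL and then links from those pages, but goes no further.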
Remove URL list entries (RMVURLLST)
Specifies the list of URLs to remove from the URL object. This parameter is used when *UPDURLOBJ is specified for the OPTION parameter and URLACT is *REMOVE. Enter up to a maximum of 100 URLs to remove from the URL list.
Options object (OPTOBJ)
Specifies the options object to use for crawling. The options object contains crawling session attributes. This parameter is used when *CRTOPTOBJ, *UPDOPTOBJ, or *DLTOPTOBJ is specified for the OPTION parameter.
Proxy server for HTTP (PRXSVR)
Specifies the HTTP proxy server to be used. This parameter is used when *CRTOPTOBJ or *UPDOPTOBJ is specified for the OPTION parameter.
Proxy port for HTTP (PRXPORT)
Specifies the HTTP proxy server port. This parameter is used when *CRTOPTOBJ or *UPDOPTOBJ is specified for the OPTION parameter. A proxy server port is required if a proxy server is also specified.
Proxy server for HTTPS (PRXSVRSSL)
Specifies the HTTPS proxy server for using SSL support. This parameter is used when *CRTOPTOBJ or *UPDOPTOBJ is specified for the OPTION parameter.
Proxy port for HTTPS (PRXPORTSSL)
Specifies the HTTPS proxy server port for SSL support. This parameter is used when *CRTOPTOBJ or *UPDOPTOBJ is specified for the OPTION parameter. An HTTPS proxy server port is required if an HTTPS proxy server is also specified.
Maximum file size (MAXSIZE)
Specifies the maximum file size, in kilobytes, to download. This parameter is used when *CRTOPTOBJ or *UPDOPTOBJ is specified for the OPTION parameter.
Maximum storage size (MAXSTGSIZE)
Specifies the maximum storage size, in megabytes, to allocate for downloaded files. Crawling will end when this limit is reached. This parameter is used when *CRTOPTOBJ or *UPDOPTOBJ is specified for the OPTION parameter.
Maximum threads (MAXTHD)
Specifies the maximum number of threads to start for crawling web sites. Set this value based on the system resources that are available. This parameter is used when *CRTOPTOBJ or *UPDOPTOBJ is specified for the OPTION parameter.
Maximum run time (MAXRUNTIME)
Specifies the maximum time for crawling to run, in hours and minutes. This parameter is used when *CRTOPTOBJ or *UPDOPTOBJ is specified for the OPTION parameter.
Single values
Element 1: Hours
Element 2: Minutes
Logging file (LOGFILE)
Specifies the activity logging file to be used. This file contains information about the crawling session plus any errors that occur. This file must be in a directory. This parameter is used when *CRTOPTOBJ or *UPDOPTOBJ is specified for the OPTION parameter.
Clear logging file (CLRLOG)
Specifies whether to clear the activity log file before starting the crawling session. This parameter is used when *CRTOPTOBJ or *UPDOPTOBJ is specified for the OPTION parameter.
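The crawling session attributes described above are all stored in one options object. The following sketch creates such an object with an HTTP proxy, size and thread limits, a run time of 2 hours 30 minutes, and a log file that is cleared before each session. The object name, proxy host, port, and log path are hypothetical; the numeric values are chosen from within the documented ranges.

```cl
/* Sketch only: create an options object for crawling.                 */
/* 'myoptobj', 'proxy.myserver.com', 8080, and the log path are        */
/* hypothetical. PRXPORT is given because a proxy server is specified. */
CFGHTTPSCH OPTION(*CRTOPTOBJ) OPTOBJ(myoptobj) +
           PRXSVR('proxy.myserver.com') PRXPORT(8080) +
           PRXSVRSSL(*NONE) +
           MAXSIZE(1000) MAXSTGSIZE(100) MAXTHD(20) +
           MAXRUNTIME(2 30) +
           LOGFILE('/mydocs/logs/crawl.log') CLRLOG(*YES)
```

MAXSIZE is in kilobytes and MAXSTGSIZE is in megabytes, so this sketch limits each downloaded file to 1000 KB and the total download to 100 MB.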
Document list type (LSTTYPE)
Specifies whether the document list file to be registered contains paths for files on this system or for files on remote web sites that have been downloaded to this system using the web crawling function. This parameter is used when *REGDOCL is specified for the OPTION parameter.
Validation list (VLDL)
Specifies the name of the validation list to use for SSL sessions. A validation list contains a URL, a user ID, and a password. The validation list object is owned by the signed-on user and excludes public use. This parameter is used when *CRTVLDL, *ADDVLDLDTA, *RMVVLDLDTA, or *DLTVLDL is specified for the OPTION parameter.
Restrictions: Passwords will be stored in the validation list object in encrypted form. In order to store and decrypt the passwords for authentication, the system value QRETSVRSEC (Retain Server Security) must be set to 1 before the validation list is created. If the system value is changed from 1 to 0 once the validation list exists, the encrypted passwords will be removed and authentication will fail. In this case, the system value will need to be reset to 1 and the validation list deleted and created again.
Validation list entries (VLDLE)
Specifies the list of URLs, user IDs, and passwords to use for SSL sessions. The user ID and password pair will be used for the specified URL and any other URLs encountered while crawling within the same domain. This parameter is used when *CRTVLDL or *ADDVLDLDTA is specified for the OPTION parameter.
A maximum of 100 entries can be added to the validation list. Each validation list entry contains the following:
Element 1: URL
Element 2: User ID
Element 3: Password
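As a sketch of the validation list setup, the commands below first set the QRETSVRSEC system value to 1, as the restriction on the VLDL parameter requires, and then create a validation list with one entry. The list name, URL, user ID, and password are hypothetical placeholders, and changing a system value requires appropriate authority on the system.

```cl
/* Sketch only: passwords can be stored only when QRETSVRSEC is 1 */
CHGSYSVAL SYSVAL(QRETSVRSEC) VALUE('1')

/* Create a validation list with one entry: URL, user ID, password. */
/* 'myvldl' and the entry values are hypothetical placeholders.     */
CFGHTTPSCH OPTION(*CRTVLDL) VLDL(myvldl) +
           VLDLE(('https://www.myserver.com' 'myuserid' 'mypassword'))
```

The character values are quoted so that CL does not fold them to uppercase.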
Remove validation list entries (RMVVLDLE)
Specifies the list of URLs to remove from an existing validation list. A maximum of 100 entries can be removed from the validation list. This parameter is used when *RMVVLDLDTA is specified for the OPTION parameter.
Examples
Example 1: Create a Document List
CFGHTTPSCH OPTION(*CRTDOCL) DOCLIST('/QIBM/USERDATA/HTTPSVR/INDEX/myindex.DOCUMENT.LIST') STRDIR('/QIBM/ProdData/HTTP/Public/HTTPSVR/HTML')
This example will create a document list called /QIBM/USERDATA/HTTPSVR/INDEX/myindex.DOCUMENT.LIST from the directory /QIBM/ProdData/HTTP/Public/HTTPSVR/HTML using the defaults SUBTREE(*ALL) PATTERN('*.HTM*'). The subdirectories will be searched and only files matching the pattern *.HTM* will be included in the list.
Example 2: Create an Index
CFGHTTPSCH OPTION(*CRTIDX) IDX(myindex) DOCLIST('/QIBM/USERDATA/HTTPSVR/INDEX/myindex.DOCUMENT.LIST') IDXHTML(*ABSTRACT)
This example will create an index called myindex in index directory /QIBM/USERDATA/HTTPSVR/INDEX. The document list is in the file /QIBM/USERDATA/HTTPSVR/INDEX/myindex.DOCUMENT.LIST.
In this example the following is defined:
Example 3: Create a Mapping Rules File
CFGHTTPSCH OPTION(*CRTMAPF) CFG('MYCFG') URLPFX('http://www.myserver.com') MAPFILE('/QIBM/USERDATA/HTTPSVR/INDEX/myindex.MAP_FILE')
This example will create a mapping rules file called '/QIBM/USERDATA/HTTPSVR/INDEX/myindex.MAP_FILE'. The URL prefix 'http://www.myserver.com' plus all of the Pass directives from the MYCFG configuration will be copied to the mapping rules file. When documents are found on a search, the URLPFX will be followed by the path determined from the actual file path and the Pass directive.
If a document is physically located at /root/clothing/doc1.htm, and there is a Pass /clothing/* /root/clothing/* directive in the configuration file, the URL for the document on the search results will be http://www.myserver.com/clothing/doc1.htm.
Error messages
*ESCAPE Messages