Start HTTP Crawling (STRHTTPCRL)
Where allowed to run: All environments (*ALL)
Threadsafe: No
The Start HTTP Crawling (STRHTTPCRL) command creates or appends to a document list by crawling remote web sites, downloading the files found, and saving their path names in the specified document list.
To create a document list, specify *CRTDOCL for the Option (OPTION) parameter.
To update a document list, specify *UPDDOCL for the OPTION parameter.
Parameters
Keyword | Description | Choices | Notes |
---|---|---|---|
OPTION | Option | *CRTDOCL, *UPDDOCL | Required, Positional 1 |
METHOD | Crawling method | *OBJECTS, *DETAIL | Optional |
OBJECTS | URL and options objects | Element list | Optional |
| Element 1: URL object | Character value | |
| Element 2: Options object | Character value | |
DOCLIST | Document list file | Path name | Optional |
DOCDIR | Document storage directory | Path name, '/QIBM/USERDATA/HTTPSVR/INDEX/DOC' | Optional |
LANG | Language of documents | *ARABIC, *BALTIC, *CENTEUROPE, *CYRILLIC, *ESTONIAN, *GREEK, *HEBREW, *JAPANESE, *KOREAN, *SIMPCHINESE, *TRADCHINESE, *THAI, *TURKISH, *WESTERN | Optional |
URL | URL | Character value | Optional |
URLFTR | URL filter | Character value, *NONE | Optional |
MAXDEPTH | Maximum crawling depth | 0-100, 3, *NOMAX | Optional |
ENBROBOT | Enable robots | *YES, *NO | Optional |
PRXSVR | Proxy server for HTTP | Character value, *NONE | Optional |
PRXPORT | Proxy port for HTTP | 1-65535 | Optional |
PRXSVRSSL | Proxy server for HTTPS | Character value, *NONE | Optional |
PRXPORTSSL | Proxy port for HTTPS | 1-65535 | Optional |
MAXSIZE | Maximum file size | 1-6000, 1000 | Optional |
MAXSTGSIZE | Maximum storage size | 1-65535, 100, *NOMAX | Optional |
MAXTHD | Maximum threads | 1-50, 20 | Optional |
MAXRUNTIME | Maximum run time | Single values: *NOMAX, Other values: Element list | Optional |
| Element 1: Hours | 0-1000, 2 | |
| Element 2: Minutes | 0-59, 0 | |
LOGFILE | Logging file | Path name, *NONE | Optional |
CLRLOG | Clear logging file | *YES, *NO | Optional |
VLDL | Validation list | Name, *NONE | Optional |
Option (OPTION)
Specifies the document list task to perform.
This is a required parameter.
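For instance, a later session could append newly found documents to an existing list by specifying *UPDDOCL. This is a sketch; the path and URL are illustrative:

```
STRHTTPCRL OPTION(*UPDDOCL) DOCLIST('/mydir/my.doclist') URL('http://www.example.com')
```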
Crawling method (METHOD)
Specifies the crawling method to use.
URL and options objects (OBJECTS)
Specifies the objects to use for crawling. Both elements must be specified. Use the Configure HTTP Search (CFGHTTPSCH) command to create these objects.
Element 1: URL object
Element 2: Options object
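A sketch of a session driven by previously configured objects. The object names MYURL and MYOPTS are illustrative and are assumed to have been created earlier with the Configure HTTP Search (CFGHTTPSCH) command:

```
STRHTTPCRL OPTION(*CRTDOCL) METHOD(*OBJECTS) OBJECTS(MYURL MYOPTS) DOCLIST('/mydir/my.doclist')
```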
Document list file (DOCLIST)
Specifies the document list file to hold the path names of the documents found by crawling remote web sites.
Document storage directory (DOCDIR)
Specifies the directory to use to store the documents that are downloaded.
Language of documents (LANG)
Specifies the language of the documents that are to be downloaded. These language choices are similar to the character sets or encodings that can be selected on a browser.
URL (URL)
Specifies the name of the URL (Uniform Resource Locator) to crawl.
URL filter (URLFTR)
Specifies the domain filter that limits crawling to sites within the specified domain.
Maximum crawling depth (MAXDEPTH)
Specifies the maximum depth to crawl from the starting URL. A depth of zero stops crawling at the starting URL site; each additional layer follows the referenced links found at the current depth.
Enable robots (ENBROBOT)
Specifies whether to enable support for robot exclusion. If robot exclusion is supported, any site or page that contains robot exclusion META tags or files is not downloaded.
Proxy server for HTTP (PRXSVR)
Specifies the HTTP proxy server to be used.
Proxy port for HTTP (PRXPORT)
Specifies the HTTP proxy server port.
Proxy server for HTTPS (PRXSVRSSL)
Specifies the HTTPS proxy server for using SSL support.
Proxy port for HTTPS (PRXPORTSSL)
Specifies the HTTPS proxy server port for SSL support.
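For example, a session crawling through a firewall would name the proxy server and port together. This is a sketch; the host name and port are illustrative:

```
STRHTTPCRL OPTION(*CRTDOCL) DOCLIST('/mydir/my.doclist') URL('http://www.example.com') PRXSVR('proxy.example.com') PRXPORT(8080)
```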
Maximum file size (MAXSIZE)
Specifies the maximum file size, in kilobytes, to download.
Maximum storage size (MAXSTGSIZE)
Specifies the maximum storage size, in megabytes, to allocate for downloaded files. Crawling will end when this limit is reached.
Maximum threads (MAXTHD)
Specifies the maximum number of threads to start for crawling web sites. Set this value based on the system resources that are available.
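The three resource limits above can be combined. This sketch (the values are illustrative) restricts the session to files of 500 KB or less, 50 MB of total storage, and 5 crawling threads:

```
STRHTTPCRL OPTION(*CRTDOCL) DOCLIST('/mydir/my.doclist') URL('http://www.example.com') MAXSIZE(500) MAXSTGSIZE(50) MAXTHD(5)
```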
Maximum run time (MAXRUNTIME)
Specifies the maximum time for crawling to run, in hours and minutes.
Single value: *NOMAX (no maximum run time)
Element 1: Hours (0-1000; default 2)
Element 2: Minutes (0-59; default 0)
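For example, the hours and minutes elements are specified together to limit a session to 1 hour and 30 minutes. This is a sketch with illustrative values:

```
STRHTTPCRL OPTION(*CRTDOCL) DOCLIST('/mydir/my.doclist') URL('http://www.example.com') MAXRUNTIME(1 30)
```

Specify MAXRUNTIME(*NOMAX) to let the session run until another limit ends it.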
Logging file (LOGFILE)
Specifies the activity logging file to be used. This file contains information about the crawling session, plus any errors that occur during the session. The file must be located in a directory.
Clear logging file (CLRLOG)
Specifies whether to clear the activity log file before starting the crawling session.
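A sketch that records session activity in a log file, clearing any previous contents first (the paths are illustrative):

```
STRHTTPCRL OPTION(*UPDDOCL) DOCLIST('/mydir/my.doclist') URL('http://www.example.com') LOGFILE('/mydir/crawl.log') CLRLOG(*YES)
```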
Validation list (VLDL)
Specifies the validation list to use for SSL sessions. Use the Configure HTTP Search (CFGHTTPSCH) command to create a validation list object.
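A sketch of an SSL crawling session; the validation list name MYVLDL is illustrative and is assumed to have been created with the Configure HTTP Search (CFGHTTPSCH) command:

```
STRHTTPCRL OPTION(*CRTDOCL) DOCLIST('/mydir/my.doclist') URL('https://www.example.com') VLDL(MYVLDL)
```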
Examples
STRHTTPCRL OPTION(*CRTDOCL) DOCLIST('/mydir/my.doclist') URL('http://www.ibm.com') MAXDEPTH(2)
This command starts a new crawling session that follows referenced links up to 2 layers from the starting URL at www.ibm.com. The document list is created as '/mydir/my.doclist' and contains pairs consisting of a local directory path, for example '/QIBM/USERDATA/HTTPSVR/INDEX/DOC/www.ibm.com/us/index.html', and the actual URL of the page, for example 'http://www.ibm.com/us/'. Use the Configure HTTP Search (CFGHTTPSCH) command to create an index from this document list.
Error messages
*ESCAPE Messages