This topic provides information about the Webserver search engine and national language considerations.
The Webserver search engine allows you to perform full text searches on HTML and text files. You can control what options are available to the user and how the search results are displayed through customized Net.Data® macros. You can enhance search results by using the thesaurus support. For information on configuring the search engine with the HTTP Server (powered by Apache), see Set up the Webserver search engine on HTTP Server (powered by Apache).
Before you can search, you must have an index. The index is a set of files that contain the contents of the documents to be searched, stored in a searchable form. The search engine queries the index instead of scanning the actual documents.
A search index is created based upon a document list. A document list contains a list of fully qualified path names of all the documents that you want to index.
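As a rough illustration of what a document list holds, the following Python sketch walks a directory tree and collects the fully qualified path names of HTML and text files. The function names, the file extensions, and the one-path-per-line output layout are assumptions made for this example; they are not taken from the product, which builds document lists through its own browser form and CL command.

```python
import os


def build_document_list(root_dir, extensions=(".html", ".htm", ".txt")):
    """Return the sorted, fully qualified paths of matching files under root_dir.

    The extension filter is an assumption for this sketch.
    """
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            if name.lower().endswith(extensions):
                # abspath yields a fully qualified path name
                paths.append(os.path.abspath(os.path.join(dirpath, name)))
    return sorted(paths)


def write_document_list(paths, list_file):
    """Write one path per line (assumed layout, for illustration only)."""
    with open(list_file, "w", encoding="utf-8") as f:
        for path in paths:
            f.write(path + "\n")
```

Every entry is an absolute path, which matches the requirement that a document list contain fully qualified path names.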
Documents satisfying a search request are returned by default in their order of ranking. A document's ranking specifies the relevance with respect to the specified search conditions. The following factors determine a document's ranking:
It is possible that a document with one search term appearing toward the beginning of the document can have a higher ranking than a document with multiple search terms appearing near the end of the document. The search function assumes that words indicating the subject or topic of the document usually appear near the beginning of the document. The highest ranking a document can have is 100%. A document can achieve a ranking of 100% if relatively few of the documents in the index contain the search terms. If many documents in the index contain the search terms, it is likely that none of the documents would achieve a ranking of 100%.
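The behavior described above can be pictured with a toy model. The Python sketch below is not the search engine's actual ranking formula, which this topic does not give. It is a hypothetical scoring function that rewards search terms appearing early in a document and terms that few documents in the index contain, and that caps the result at 100%.

```python
def rank(query_terms, docs):
    """Toy ranking: returns (name, percent) pairs, highest score first.

    docs maps a document name to its text. The weighting scheme is an
    assumption made for illustration only.
    """
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    terms = [t.lower() for t in query_terms]
    total_docs = len(tokenized)

    # Number of documents that contain each term
    doc_freq = {t: sum(1 for toks in tokenized.values() if t in toks) for t in terms}

    results = []
    for name, toks in tokenized.items():
        total = 0.0
        for t in terms:
            if t in toks:
                # 1.0 when the term is the first word, approaching 0 near the end
                position_weight = 1.0 - toks.index(t) / len(toks)
                # 1.0 when only this document contains the term
                rarity = 1.0 - (doc_freq[t] - 1) / total_docs
                total += position_weight * rarity
        results.append((name, 100.0 * total / len(terms)))
    return sorted(results, key=lambda item: item[1], reverse=True)
```

In this model a document that opens with a single search term can outscore a document that contains several search terms near its end, and a score of 100% is reachable only when the terms are rare across the index.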
You can provide the following search functions through the customized Net.Data macros:
You can enhance search results through the use of the thesaurus support. A thesaurus contains words that are synonyms or related terms of a search word. For example, searching for Ping-Pong without thesaurus support results only in documents containing the string Ping-Pong. Using thesaurus support that includes synonyms for Ping-Pong, such as table tennis, results in documents containing either the string Ping-Pong or table tennis.
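The Ping-Pong example can be sketched in a few lines of Python. This is only a conceptual model of synonym expansion; the dictionary layout and function names are hypothetical and do not reflect the product's thesaurus definition file format or its internal matching.

```python
# Hypothetical in-memory thesaurus: search word -> synonyms or related terms
THESAURUS = {"ping-pong": ["table tennis"]}


def expand_query(term, thesaurus):
    """Return the search term followed by any synonyms the thesaurus lists."""
    key = term.lower()
    return [key] + thesaurus.get(key, [])


def search(term, docs, thesaurus=None):
    """Return names of documents whose text contains the term or a synonym."""
    phrases = expand_query(term, thesaurus) if thesaurus else [term.lower()]
    return [
        name
        for name, text in docs.items()
        if any(phrase in text.lower() for phrase in phrases)
    ]
```

Calling `search("Ping-Pong", docs)` matches only documents containing that string, while `search("Ping-Pong", docs, THESAURUS)` also matches documents containing table tennis.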
The URL mapping rules file, built from your selected HTTP Server, is used to set the URL for each document found on a search. It can specify the server port number (or instance) to use and can also map resulting file path names to external path names.
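To make the idea of URL mapping concrete, the Python sketch below turns an indexed file path into an external URL by applying prefix rules and a port number. The rule representation, the host name, and the function name are all assumptions for this example; the real mapping rules file is generated from your HTTP Server configuration and its format is not shown in this topic.

```python
def map_path_to_url(file_path, rules, host="www.example.com", port=80):
    """Map a file path to a URL using (path_prefix, url_prefix) rules.

    rules is a hypothetical list of pairs; the first matching prefix wins.
    Returns None when no rule applies.
    """
    for path_prefix, url_prefix in rules:
        if file_path.startswith(path_prefix):
            # Swap the internal directory prefix for the external URL prefix
            external_path = url_prefix + file_path[len(path_prefix):]
            netloc = host if port == 80 else f"{host}:{port}"
            return f"http://{netloc}{external_path}"
    return None
```

For example, with the rule `("/www/mysite/htdocs/", "/")` and port 8080, the path `/www/mysite/htdocs/docs/a.html` maps to `http://www.example.com:8080/docs/a.html`.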
The product ships several sample files that you can use to customize your own Web search function:
File | Description |
---|---|
/QIBM/ProdData/HTTP/Public/HTTPSVR/sample_search.ndm | Sample Net.Data macro that you can customize. |
/QIBM/ProdData/HTTP/Public/HTTPSVR/thesaurus_sample_search.ndm | Sample Net.Data macro with thesaurus support that you can customize. |
/QIBM/ProdData/HTTP/Public/HTTPSVR/sample_search.html | Sample search HTML file. |
/QIBM/ProdData/HTTP/Public/HTTPSVR/HTML/ | Directory of sample HTML files that you can use to build a test search index. |
/QIBM/ProdData/HTTP/Public/HTTPSVR/sample_thesaurus.txt | Sample thesaurus definition file. |
Documents that you are indexing can be encoded in most ASCII codepages and EBCDIC CCSIDs. Because the search engine does not support all CCSIDs, your documents might be converted to one of the supported CCSIDs during the indexing process. To see the CCSID used to index your documents, view the status of the search index.
Wildcard characters are not allowed in search strings for double-byte languages, because a wildcard search is already implied for those languages. Both the index name and the index directory name must be specified in single-byte characters. The contents of documents are often converted to one of the index CCSIDs listed below.
Documents in languages from the included character sets can all be contained in the same index, as long as the documents are indexed separately. For example, an index can contain English and French documents. Create the index including just the English documents, then update the index with the French documents. If you attempt to index Italian and Russian documents in the same index, an error will occur since the two languages cannot be converted to a common index CCSID. In this case you would need to create two separate indexes. The following table describes the supported CCSIDs for indexes.
Index CCSID | Code page name | Included character sets (CCSIDs) |
---|---|---|
500 | Latin 1 | International Albanian, Belgian English, Belgian French, Canadian French MNCS, Danish, Dutch, Dutch MNCS, English International, English US, Finnish, French (France), French MNCS, German (Germany), German MNCS, Icelandic, Italian, Latin 1/Open Systems, Norwegian, Portuguese (Brazil), Portuguese (Portugal), Swedish |
838 | Thai | Thai |
870 | Latin 2 | Croatian, Czech, Hungarian, Polish, Romanian, Serbian (Latin), Slovak, Slovenian |
1025 | Cyrillic | Bulgarian, Macedonian, Russian, Serbian (Cyrillic) |
1026 | Latin 5 | Turkish |
875 | Greek | Greek |
424 | Hebrew | Hebrew |
420 | Arabic | Arabic |
1112 | Baltic | Latvian, Lithuanian |
1122 | Estonian | Estonian |
935 | Simplified Chinese (GB) | Simplified Chinese (GB) |
1388 | Simplified Chinese (GBK) | Simplified Chinese (GBK) |
937 | Traditional Chinese | Traditional Chinese |
5026 (930) | Japanese Katakana | Japanese Katakana |
5035 (939) | Japanese Latin | Japanese Latin |
1364 (933) | Korean | Korean |
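The rule that documents can share an index only when their languages convert to a common index CCSID can be checked with a small lookup. The Python sketch below encodes a subset of the table above; the dictionary and function names are hypothetical, and the search engine performs its own check during indexing.

```python
# Subset of the table above: language -> index CCSID
INDEX_CCSID = {
    "English US": 500,
    "French (France)": 500,
    "German (Germany)": 500,
    "Italian": 500,
    "Polish": 870,
    "Czech": 870,
    "Russian": 1025,
    "Bulgarian": 1025,
    "Turkish": 1026,
    "Greek": 875,
}


def common_index_ccsid(languages):
    """Return the shared index CCSID, or None if the languages need separate indexes.

    Raises KeyError for a language that is not in the subset above.
    """
    ccsids = {INDEX_CCSID[lang] for lang in languages}
    return ccsids.pop() if len(ccsids) == 1 else None
```

With this lookup, English and French resolve to CCSID 500 and can share an index, while Italian and Russian return `None`, which corresponds to the case where two separate indexes are required.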
This table shows the browser and CL command interface to all of the search engine and web crawling tasks.
Task | Browser form | CL command |
---|---|---|
Create an index | Create search index | CFGHTTPSCH OPTION(*CRTIDX) |
Update an index | Update search index | CFGHTTPSCH OPTION(*ADDDOC), CFGHTTPSCH OPTION(*RMVDOC) |
Merge an index | Merge search index | CFGHTTPSCH OPTION(*MRGIDX) |
Delete an index | Delete search index | CFGHTTPSCH OPTION(*DLTIDX) |
View the status of an index | View status of search index | CFGHTTPSCH OPTION(*PRTIDXSTS) (see spooled file QPZHASRCH) |
Create a document list or start the web crawler | Build a document list | CFGHTTPSCH OPTION(*CRTDOCL) for local documents; STRHTTPCRL OPTION(*CRTDOCL) for the web crawler |
Add documents to a document list | Build a document list | CFGHTTPSCH OPTION(*UPDDOCL) for local documents; STRHTTPCRL OPTION(*UPDDOCL) for documents found with the web crawler |
Stop a web crawling session | Work with document list status | ENDHTTPCRL |
Pause a web crawling session | Work with document list status | ENDHTTPCRL |
Resume a web crawling session | Work with document list status | RSMHTTPCRL |
Register a document list created before V4R5 | Register document list | CFGHTTPSCH OPTION(*REGDOCL) |
Delete a document list | Delete document list | CFGHTTPSCH OPTION(*DLTDOCL) |
Display information about a document list | Work with document list status | CFGHTTPSCH OPTION(*PRTDOCLSTS) (see spooled file QPZHASRCH) |
Create a URL mapping rules file | Build URL mapping rules file | CFGHTTPSCH OPTION(*CRTMAPF) |
Append a URL mapping rules file | Build URL mapping rules file | CFGHTTPSCH OPTION(*UPDMAPF) |
Build a thesaurus dictionary | Build thesaurus dictionary | CFGHTTPSCH OPTION(*CRTTHSDCT) |
Test a thesaurus dictionary | Test thesaurus dictionary | None |
Retrieve a thesaurus definition from a dictionary | Retrieve thesaurus definition | CFGHTTPSCH OPTION(*RTVTHSDFNF) |
Delete a thesaurus dictionary | Delete thesaurus dictionary | CFGHTTPSCH OPTION(*DLTTHSDCT) |
Create a list of URLs to crawl | Build URL object | CFGHTTPSCH OPTION(*CRTURLOBJ) |
Update a list of URLs to crawl | Build URL object | CFGHTTPSCH OPTION(*UPDURLOBJ) |
Delete a list of URLs to crawl | Delete URL object | CFGHTTPSCH OPTION(*DLTURLOBJ) |
Create an object containing crawling attributes | Build options object | CFGHTTPSCH OPTION(*CRTOPTOBJ) |
Update an object containing crawling attributes | Build options object | CFGHTTPSCH OPTION(*UPDOPTOBJ) |
Build an object with user IDs and passwords for authentication | Build validation list | CFGHTTPSCH OPTION(*CRTVLDL) |
Add user IDs and passwords for authentication | Build validation list | CFGHTTPSCH OPTION(*ADDVLDLDTA) |
Remove user IDs and passwords for authentication | Build validation list | CFGHTTPSCH OPTION(*RMVVLDLDTA) |
Search an index | Search index | None |