Usually CGI programs are referred to from within HTML documents. In general, the HTML document format defines the environment variables that specify the passing of information. When you design the layout of your HTML document, you must keep in mind how a CGI program might affect the look of your document. Developing the CGI program along with the HTML document will help you avoid many design mistakes.
The CGI process involves three players: The web browser, the web server, and the CGI program. To exemplify how CGI programs for online forms work, let us assume that the web browser has already requested and obtained an HTML form.
The following HTML form illustrates the various types of fields:
<HTML>
<HEAD>
<TITLE>CGIXMP Test Case</TITLE>
</HEAD>
<BODY>
<H1>CGI Sample Test Case</H1>
Fill in the following fields and press APPLY. The values you enter will be read by the CGIXMP.EXE program and displayed in a simple HTML form which is generated dynamically by the program.
<P>
<HR>
<form method=POST action="/cgi-bin/cgixmp">
<P>
<H3>Checkbox Field</H3>
<P>
<PRE>
<input type="checkbox" name="var1" value="123"> Check to set variable VAR1 to 123<BR>
<input type="checkbox" name="var2" value="XyZ" checked> Check to set variable VAR2 to XyZ<BR>
</PRE>
<P>
<H3>Radio Button Field</H3>
<P>
<PRE>
<input type="radio" name="var3" value="1"> Select to set variable VAR3 to 1<BR>
<input type="radio" name="var3" value="2"> Select to set variable VAR3 to 2<BR>
<input type="radio" name="var3" value="3" checked> Select to set variable VAR3 to 3<BR>
<input type="radio" name="var3" value="4"> Select to set variable VAR3 to 4<BR>
</PRE>
<P>
<H3>Single selection List Field</H3>
<P>
<PRE>
Select a value for variable VAR4
<select size=1 name="var4">
<option>0<option>1<option>2<option>3
<option>4<option>5</select>
</PRE>
<P>
<H3>Text Entry Field</H3>
<P>
<PRE>
Enter value for variable VAR5
<input type="text" name="var5" size=20 maxlength=256 value="TEST value">
</PRE>
<P>
<H3>Multiple selection List Field</H3>
<P>
<PRE>
Select a value for variable VAR6
<select multiple size=2 name="var6">
<option>Ford<option>Chevrolet<option>Chrysler<option> Ferrari<option>Porsche
</select>
</PRE>
<P>
<H3>Password Field</H3>
<P>
<PRE>
Enter Password <input type="password" name="pword" size=10 maxlength=10>
</PRE>
<P>
<H3>Hidden Field</H3>
<P>
<input type="hidden" name="hidden" value="Text not shown on form...">
<P>
<PRE>
<input type="submit" name="pushbutton" value="Apply">
<input type="reset" name="pushbutton" value="Reset">
<HR>
</PRE>
</FORM>
</BODY>
</HTML>
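When you fill out a form, the web browser sends the request to the server in a format that is described as URL-encoded. The web browser also performs this function whenever you enter a phrase in a search field and click the submission button. In URL-encoded information, each field is sent as a name=value pair, an ampersand (&) separates the fields, a plus sign (+) represents a space, and a percent sign (%) followed by the ASCII hexadecimal code represents a special character such as a period (.) or an at sign (@).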
The main advantage of using the GET method is that you can access the CGI program with a query without using a form. In other words, you can create canned queries that pass parameters to the program. For example, if you want to send the previous query to the program directly, you can do the following:
<A HREF="/cgi-bin/program.pgm?Name=John&LName=Richard&user=Smith&Company=IBM"> Your CGI Program</a>
The main advantage of the POST method is that the query length is unlimited, so you do not have to worry about the client or server truncating data. In contrast, the query string of the GET method cannot exceed 8 KB.
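A canned GET query has to be URL-encoded by hand and must stay within the query string limit. The following is a minimal sketch of how that could be done in C. The function names `url_encode` and `fits_get_limit` are hypothetical, the choice of which characters to leave unencoded is a simplification, the code assumes an ASCII execution character set, and it assumes that 8 KB means 8192 bytes.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Assumption: "8 KB" is taken to mean 8192 bytes. */
#define GET_QUERY_LIMIT 8192

/* Hypothetical helper: URL-encodes src into dst (capacity dst_size bytes).
 * Letters, digits, and - _ . ~ are copied, a space becomes '+', and every
 * other byte becomes %XX. Returns the encoded length, or -1 if dst is too
 * small. Assumes an ASCII execution character set. */
int url_encode(const char *src, char *dst, size_t dst_size)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;

        if (isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            if (n + 1 >= dst_size)
                return -1;
            dst[n++] = (char)c;
        } else if (c == ' ') {
            if (n + 1 >= dst_size)
                return -1;
            dst[n++] = '+';
        } else {
            if (n + 3 >= dst_size)
                return -1;
            dst[n++] = '%';
            dst[n++] = hex[c >> 4];
            dst[n++] = hex[c & 0x0F];
        }
    }
    if (n >= dst_size)
        return -1;
    dst[n] = '\0';
    return (int)n;
}

/* Returns 1 if an already encoded query string fits within the GET limit. */
int fits_get_limit(const char *query)
{
    assert(query != NULL);
    return strlen(query) <= GET_QUERY_LIMIT;
}
```

Each value is encoded separately and then joined with `name=` prefixes and ampersands, so that an ampersand inside a value is not mistaken for a field separator.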
The server can perform ASCII to EBCDIC conversions before sending data to CGI programs. This is needed because the Internet is primarily ASCII-based and the iSeries™ server is an extended binary-coded decimal interchange code (EBCDIC) server. The server can also perform EBCDIC to ASCII conversions before sending data back to the browser. Data is provided to a CGI program through environment variables and standard input (stdin). HTTP and HTML specifications allow you to tag text data with a character set (the charset parameter on the Content-Type header). However, this practice is not widely in use today (although technically required for HTTP/1.0 and HTTP/1.1 compliance). According to this specification, text data that is not tagged can be assumed to be in the default character set ISO-8859-1 (Latin-1, a superset of US-ASCII). The server correlates this character set with ASCII coded character set identifier (CCSID) 819.
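The defaulting rule for untagged text can be illustrated with a small C sketch. It extracts the charset parameter from a Content-Type value and falls back to ISO-8859-1 when no tag is present. The function name `charset_of` is hypothetical, the parsing is deliberately simplified, and where the Content-Type value comes from (for example, a CONTENT_TYPE environment variable) is an assumption to verify against your server.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Case-insensitive substring search, because parameter names such as
 * "charset" and "Charset" are equivalent. */
static const char *find_ci(const char *hay, const char *needle)
{
    size_t n = strlen(needle);

    for (; *hay != '\0'; hay++) {
        size_t i = 0;

        while (i < n &&
               tolower((unsigned char)hay[i]) == tolower((unsigned char)needle[i]))
            i++;
        if (i == n)
            return hay;
    }
    return NULL;
}

/* Hypothetical helper: copies the charset tag of a Content-Type value into
 * out, or "ISO-8859-1" when the value is missing or carries no tag. */
void charset_of(const char *content_type, char *out, size_t size)
{
    const char *p;
    size_t n = 0;

    assert(out != NULL && size > 0);
    p = (content_type != NULL) ? find_ci(content_type, "charset=") : NULL;
    if (p == NULL) {
        snprintf(out, size, "%s", "ISO-8859-1");
        return;
    }
    p += strlen("charset=");
    if (*p == '"')
        p++; /* skip an optional opening quote */
    while (*p != '\0' && *p != ';' && *p != '"' && *p != ' ' && n + 1 < size)
        out[n++] = *p++;
    out[n] = '\0';
}
```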
For more information related to CCSIDs, national language support, and international application development, see the Globalization topic in the iSeries Information Center.
You can configure HTTP Server (powered by Apache) to control which mode is used by specifying the CGIConvMode directive in different contexts, such as server config or directory:
CGIConvMode Mode
Where Mode is one of the following:
In addition, the system provides the following CGI environment variables to the CGI program:
The supported environment variables for HTTP Server can be found in the IBM® iSeries Information Center. The environment variables have been divided into two groups: Non-SSL and SSL.
CGI_MODE | Conversion | Stdin encoding | Environment variable | Query_String encoding | argv encoding |
---|---|---|---|---|---|
BINARY or %%BINARY%% | None | No conversion | CGI job CCSID | No conversion | No conversion |
EBCDIC or %%EBCDIC%% | CGI NetCCSID to CGI job CCSID | CGI job CCSID | CGI job CCSID | CGI job CCSID | CGI job CCSID |
%%EBCDIC%% or %%EBCDIC_JCD%% with charset tag received | Calculate target EBCDIC CCSID based on received ASCII charset tag | EBCDIC equivalent of received charset | CGI job CCSID | CGI job CCSID | CGI job CCSID |
EBCDIC_JCD or %%EBCDIC_JCD%% | Detect input based on received data. Convert data to CGI job CCSID | Detect ASCII input based on received data. Convert data to CGI job CCSID. | CGI job CCSID | Detect ASCII input based on received data. Convert data to CGI job CCSID. | Detect ASCII input based on received data. Convert data to CGI job CCSID |
%%MIXED%% (Compatibility mode) | CGI NetCCSID to CGI job CCSID (receive charset tag is ignored) | CGI job CCSID with ASCII escape sequence | CCSID 37 | CCSID 37 with ASCII escape sequence | CCSID 37 with ASCII escape sequence |
URL-encoded forms containing DBCS data could contain ASCII octets that represent parts of DBCS characters. The server can only convert non-encoded character data. This means that it must un-encode the double-byte character set (DBCS) stdin and QUERY_STRING data before performing the conversion. In addition, it has to reassemble and re-encode the resulting EBCDIC representation before passing it to the CGI program. Because of this extra processing, CGI programs that you write to handle DBCS data may choose to receive the data as BINARY and perform all conversions to streamline the entire process.
Using the EBCDIC_JCD mode: The EBCDIC_JCD mode determines what character set is being used by the browser for a given request. This mode is also used to automatically adjust the ASCII/EBCDIC code conversions used by the web server as the request is processed.
After auto detection, the %%EBCDIC_JCD%% or EBCDIC_JCD mode converts the stdin and QUERY_STRING data from the detected network CCSID into the correct EBCDIC CCSID for Japanese. The default conversions configured for the CGI job are overridden. The DefaultFsCCSID directive or the -fsccsid startup parameter specifies the default conversions. The startup FsCCSID must be a Japanese CCSID. Alternately, the CGIJobCCSID can be set to a Japanese CCSID.
The possible detected network code pages are Shift JIS, eucJP, and ISO-2022-JP. The following are the associated CCSIDs for each code page:
Shift JIS
=========
CCSID 932: IBM PC (old JIS sequence, OS/2 J3.X/4.0, IBM Windows J3.1)
CCSID 942: IBM PC (old JIS sequence, OS/2 J3.X/4.0)
CCSID 943: MS Shift JIS (new JIS sequence, OS/2 J4.0 MS Windows J3.1/95/NT)
eucJP
=====
CCSID 5050: Extended UNIX Code (Japanese)
ISO-2022-JP
===========
CCSID 5052: Subset of RFC 1468 ISO-2022-JP (JIS X 0201 Roman and JIS X 0208-1983) plus JIS X 0201 Katakana.
CCSID 5054: Subset of RFC 1468 ISO-2022-JP (ASCII and JIS X 0208-1983) plus JIS X 0201 Katakana.
The detected network CCSID is available to the CGI program. The CCSID is stored in the CGI_ASCII_CCSID environment variable. When JCD cannot detect the code page, the default code conversion is done as configured (between NetCCSID and FsCCSID or CGIJobCCSID).
Because stdin and QUERY_STRING data are encoded according to the web client’s outbound code page, we recommend using the following configuration value combinations when you use the EBCDIC_JCD or %%EBCDIC_JCD%% mode.
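A CGI program can inspect the detected CCSID and map it to one of the code page families listed above. The following C sketch shows one way to do this. The enum and function names are hypothetical, and the sketch assumes that CGI_ASCII_CCSID holds the CCSID as decimal text.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical classification of the CCSIDs listed in the text. */
enum jcd_family {
    JCD_UNKNOWN,
    JCD_SHIFT_JIS,
    JCD_EUC_JP,
    JCD_ISO_2022_JP
};

/* Maps a CCSID given as decimal text to its code page family.
 * Assumption: the value is plain decimal text such as "943". */
enum jcd_family jcd_family_of(const char *ccsid_text)
{
    assert(ccsid_text != NULL);
    switch (strtol(ccsid_text, NULL, 10)) {
    case 932:
    case 942:
    case 943:
        return JCD_SHIFT_JIS;
    case 5050:
        return JCD_EUC_JP;
    case 5052:
    case 5054:
        return JCD_ISO_2022_JP;
    default:
        return JCD_UNKNOWN;
    }
}

/* Reads CGI_ASCII_CCSID from the environment; unknown when it is not set. */
enum jcd_family detected_family(void)
{
    const char *value = getenv("CGI_ASCII_CCSID");

    return (value != NULL) ? jcd_family_of(value) : JCD_UNKNOWN;
}
```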
Startup (FsCCSID)/CGI job CCSID (CGIJobCCSID) | Startup (DefaultNetCCSID)/CGI Net CCSID (DefaultNetCCSID) | Description |
---|---|---|
5026/5035 (See note 4) | 943 Default | MS Shift JIS |
5026/5035 (See note 4) | 942 Default | IBM PC |
5026/5035 (See note 4) | 5052/5054 Default | ISO-2022-JP |
Using CCSID 5050 (eucJP) for the startup NetCCSID is not recommended. When 5050 is specified for the startup NetCCSID, the default code conversion is done between FsCCSID and 5050. This means that if JCD cannot detect a code page, JCD returns 5050 as the default network CCSID. Most browsers use a default outbound code page of Shift JIS or ISO-2022-JP, not eucJP.
If the web client sends a charset tag, JCD gives priority to the charset tag. Stdout works the same way: if a charset or CCSID tag is specified in the Content-Type field, stdout gives priority to that tag and ignores the JCD-detected network CCSID.
Startup NetCCSID | Shift JIS (JCD detected CCSID) |
---|---|
932 | 932 |
942 | 942 |
943 | 943 |
5052 | 943 |
5054 | 943 |
5050 | 943 |
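#include <stdio.h>

int main(void)
{
    /* Tag the response so that the charset tag takes priority on stdout. */
    printf("Content-Type: text/html; Charset=ISO-2022-JP\n\n");
    ...
}
#include <stdio.h>

int main(void)
{
    printf("Content-Type: text/html\n\n");
    /* String literals between the pragmas are converted to CCSID 930. */
#pragma convert(930)
    printf("<html>");
    printf("This is katakana code page\n");
#pragma convert(0)
    ...
}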
The CGIConvMode setting includes an output mode. This section explains CGI output conversion modes in more detail.
CGI Stdout CCSID/Charset in HTTP header | Conversion action | Server reply charset tag |
---|---|---|
EBCDIC CCSID/Charset | Calculate EBCDIC to ASCII conversion based on supplied EBCDIC CCSID/Charset | Calculated ASCII charset |
ASCII CCSID/Charset | No conversion | Stdout CCSID/Charset as Charset |
65535 | No conversion | None |
None (CGIConvMode= %%BINARY%%, %%BINARY/MIXED%%, or %%BINARY/EBCDIC%%) | Default Conversion - job CCSID to NetCCSID | NetCCSID as charset |
None (CGIConvMode= BINARY or %%BINARY/BINARY%%) | No conversion | None |
None (CGIConvMode= EBCDIC, %%EBCDIC%%, %%EBCDIC/MIXED%%, or %%EBCDIC/EBCDIC%%) | Default Conversion - job CCSID to NetCCSID | NetCCSID as charset |
None (CGIConvMode= EBCDIC, EBCDIC_JCD, %%EBCDIC%%, %%EBCDIC/MIXED%%, or %%EBCDIC/EBCDIC%% with charset tag received on HTTP request) | Use inverse of conversion calculated for stdin | Charset as received on HTTP request |
None (CGIConvMode= %%EBCDIC_JCD%%, %%EBCDIC_JCD/MIXED%%, or %%EBCDIC_JCD/EBCDIC%%) | Use inverse of conversion calculated by the Japanese codepage detection | ASCII CCSID as charset |
None (CGIConvMode= %%MIXED%% or %%MIXED/MIXED%%) | Default Conversion - job CCSID to NetCCSID | None (compatibility mode) |
Invalid | CGI error 500 generated by server | |
HTTP Server (powered by Apache) also sets an environment variable, CGI_OUTPUT_MODE, to reflect the setting for the CGI output mode. It contains the CGI output conversion mode the server is using for this request. Valid values are %%EBCDIC%%, %%MIXED%%, %%BINARY%%, EBCDIC, BINARY, or EBCDIC_JCD. The program can use this information to determine what conversion, if any, the server performs on CGI output.
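As an illustration, a CGI program could read CGI_OUTPUT_MODE and check it against the values listed above before deciding how to encode its output. The following C sketch shows only that check. The function names are hypothetical, and what the program does for each mode is left to the application.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if mode is one of the six values the text lists for
 * CGI_OUTPUT_MODE, otherwise 0. */
int is_known_output_mode(const char *mode)
{
    static const char *const known[] = {
        "%%EBCDIC%%", "%%MIXED%%", "%%BINARY%%",
        "EBCDIC", "BINARY", "EBCDIC_JCD"
    };
    size_t i;

    assert(mode != NULL);
    for (i = 0; i < sizeof known / sizeof known[0]; i++) {
        if (strcmp(mode, known[i]) == 0)
            return 1;
    }
    return 0;
}

/* Returns the current output mode, or NULL if the variable is not set
 * or holds a value that is not in the list above. */
const char *current_output_mode(void)
{
    const char *mode = getenv("CGI_OUTPUT_MODE");

    return (mode != NULL && is_known_output_mode(mode)) ? mode : NULL;
}
```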
When the CGI program is finished, it passes the resulting response to the server by using standard output (stdout). The server interprets the response and sends it to the browser.
A CGI program writes a CGI header that is followed by an entity body to standard output. The CGI header is the information that describes the data in the entity body. The entity body is the data that the server sends to the client. Each header line ends with a newline character, and an empty line (a second newline character) ends the CGI header. The newline character for ILE C is \n. For ILE RPG or ILE COBOL, it is hexadecimal ’15’. The following are some examples of Content-Type headers:
Content-Type: text/html\n\n
Content-Type: text/html; charset=iso-8859-2\n\n
If the response is a static document, the CGI program returns either the URL of the document using the CGI Location header or returns a Status header. The CGI program does not have an entity body when using the Location header. If the host name is the local host, HTTP Server will retrieve the specified document that the CGI program sent. It will then send a copy to the web browser. If the host name is not the local host, HTTP Server processes it as a redirect to the web browser. For example:
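The header and entity body layout can be sketched in C as follows. The function name `format_html_response` is hypothetical. It builds the response in a buffer so that it can be checked before it is written to standard output, and it rejects a charset value that contains a newline because that would end the header line early.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats a text/html CGI response into buf.
 * charset may be NULL to send no charset tag. Returns the number of
 * characters written, or -1 on truncation or an unusable charset. */
int format_html_response(char *buf, size_t size,
                         const char *charset, const char *body)
{
    int n;

    assert(buf != NULL && body != NULL);
    if (charset != NULL && strchr(charset, '\n') != NULL)
        return -1; /* a newline would terminate the header line early */

    if (charset != NULL)
        n = snprintf(buf, size,
                     "Content-Type: text/html; charset=%s\n\n%s",
                     charset, body);
    else
        n = snprintf(buf, size, "Content-Type: text/html\n\n%s", body);

    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

A program would typically pass the resulting buffer to `fputs(buf, stdout)` once the call succeeds.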
Location: http://www.acme.com/products.html\n\n
A response that uses the Status header should include both a Content-Type line and a Status line in the CGI header. When Status is in the CGI header, an entity body should be sent with the data to be returned by the server. The entity body data contains information that the CGI program provides to a client for error processing. The Status line consists of the word Status, an HTTP 3-digit status code, and a string of alphanumeric characters (A-Z, a-z, 0-9 and space). The HTTP status code must be a valid 3-digit number from the HTTP/1.1 specification.
Content-Type: text/html\n
Status: 400 Invalid data\n
\n
<html><head><title>Invalid data</title>
</head><body>
<h1>Invalid data typed</h1>
<br><pre>
The data entered must be valid numeric digits for id number
<br></pre>
</body></html>
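The rules for the Status line can be checked before the header is written. The following C sketch is one possible approach. The function names are hypothetical, and the range check only confirms that the code falls in the 100 to 599 range used by HTTP/1.1, not that the specific code is defined there.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if code is a 3-digit value in the HTTP/1.1 range and reason
 * is a non-empty string of letters, digits, and spaces; otherwise 0. */
int is_valid_status(int code, const char *reason)
{
    assert(reason != NULL);
    if (code < 100 || code > 599 || strlen(reason) == 0)
        return 0;
    for (; *reason != '\0'; reason++) {
        if (!isalnum((unsigned char)*reason) && *reason != ' ')
            return 0;
    }
    return 1;
}

/* Hypothetical helper: formats the CGI header for a Status response.
 * The entity body is written by the caller after this header.
 * Returns the header length, or -1 if the input is invalid or
 * the buffer is too small. */
int format_status_header(char *buf, size_t size, int code, const char *reason)
{
    int n;

    assert(buf != NULL);
    if (!is_valid_status(code, reason))
        return -1;
    n = snprintf(buf, size, "Content-Type: text/html\nStatus: %d %s\n\n",
                 code, reason);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```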
Most CGI programs include the following three stages:
Parsing is the first stage of a CGI program. In this stage, the program takes the data from the QUERY_STRING environment variable, from command line arguments by using argv, or from standard input. When the method is GET, the system reads the data from the QUERY_STRING environment variable or from command line arguments by using argv. There is no way to determine the length of data in QUERY_STRING. The system encodes the QUERY_STRING data in the request header.
An example of data read in the QUERY_STRING variable (%%MIXED%% mode):
NAME=Eugene+T%2E+Fox&ADDR=etfox%40ibm.net&INTEREST=RCO
Parsing breaks the fields at the ampersands and decodes the ASCII hexadecimal characters. The results look like this:
NAME=Eugene T. Fox
ADDR=etfox@ibm.net
INTEREST=RCO
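The split-and-decode step can be sketched in C as follows. This is a minimal illustration, not the server's own parser: the function names are hypothetical, the decoding is done in place, and the code assumes an ASCII execution character set, so it does not cover the EBCDIC conversions that apply on the server.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the value of a hexadecimal digit, or -1 if c is not one. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decodes one URL-encoded field in place: '+' becomes a space and
 * %XX becomes the byte with that hexadecimal value. Returns s. */
char *url_decode(char *s)
{
    char *in = s;
    char *out = s;

    assert(s != NULL);
    while (*in != '\0') {
        if (*in == '+') {
            *out++ = ' ';
            in++;
        } else if (*in == '%' && hex_value((unsigned char)in[1]) >= 0 &&
                   hex_value((unsigned char)in[2]) >= 0) {
            *out++ = (char)(hex_value((unsigned char)in[1]) * 16 +
                            hex_value((unsigned char)in[2]));
            in += 3;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
    return s;
}

/* Splits query at the ampersands, decodes each field, and stores pointers
 * to the fields in fields[]. The query buffer is modified. Returns the
 * number of fields stored (at most max_fields). */
size_t parse_query(char *query, char *fields[], size_t max_fields)
{
    size_t count = 0;
    char *start = query;

    assert(query != NULL && fields != NULL);
    while (start != NULL && *start != '\0' && count < max_fields) {
        char *amp = strchr(start, '&');

        if (amp != NULL)
            *amp = '\0'; /* split first, so a decoded %26 is never a separator */
        fields[count++] = url_decode(start);
        start = (amp != NULL) ? amp + 1 : NULL;
    }
    return count;
}
```

Splitting before decoding matters: an encoded ampersand inside a value would otherwise be treated as a field boundary.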
You can use the QtmhCvtDb API to parse the information into a structure. The CGI program can refer to the structure fields. If using %%MIXED%% input mode, the “%xx” encoding values are in ASCII and must be converted into the “%xx” EBCDIC encoding values before calling QtmhCvtDb. If using %%EBCDIC%% mode, the server will do this conversion for you. The system converts ASCII “%xx” first to the ASCII character and then to the EBCDIC character. Ultimately, the system sets the EBCDIC character to the “%xx” in the EBCDIC CCSID.
When the method is POST, the system reads the data from standard input. Before the CGI program attempts to read standard input, it must check the environment variables REQUEST_METHOD and CONTENT_LENGTH. Read standard input only when the REQUEST_METHOD is POST. The read must specify no more than CONTENT_LENGTH bytes. The result of attempting to read more than CONTENT_LENGTH bytes from standard input is undefined.
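The checks described above can be sketched in C as follows. The function name `read_post_body` and the `MAX_BODY_BYTES` cap are hypothetical. The method and length are passed in as strings so that the caller can supply the values of REQUEST_METHOD and CONTENT_LENGTH obtained with `getenv`.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical safety cap on the request body; not from the source text. */
#define MAX_BODY_BYTES (1024UL * 1024UL)

/* Reads at most CONTENT_LENGTH bytes from in, and only when the method
 * is POST. Returns a NUL-terminated buffer that the caller must free,
 * or NULL if the request should not be read. *out_len receives the
 * number of bytes actually read. */
char *read_post_body(FILE *in, const char *method,
                     const char *content_length, size_t *out_len)
{
    char *end;
    char *body;
    unsigned long length;
    size_t got;

    assert(in != NULL && out_len != NULL);
    *out_len = 0;

    if (method == NULL || strcmp(method, "POST") != 0)
        return NULL;
    if (content_length == NULL || !isdigit((unsigned char)content_length[0]))
        return NULL;

    length = strtoul(content_length, &end, 10);
    if (*end != '\0' || length > MAX_BODY_BYTES)
        return NULL;

    body = malloc(length + 1);
    if (body == NULL)
        return NULL;

    got = fread(body, 1, length, in); /* never asks for more than length */
    body[got] = '\0';
    *out_len = got;
    return body;
}
```

A typical call would be `read_post_body(stdin, getenv("REQUEST_METHOD"), getenv("CONTENT_LENGTH"), &len)`.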
Data manipulation is the second stage of a CGI program. In this stage, the program takes the parsed data and performs the appropriate action. For example, a CGI program designed to process an application form might perform one of the following functions:
Response generation is the final stage of a CGI program. In this stage, the program formulates its response to the web server, which forwards it to the browser. The response contains MIME headers that vary depending on the type of response. With a search, the response might be the URLs of all the documents that met the search value. With a request that results in e-mail, the response might be a message that confirms that the system actually sent the e-mail.