The CGI Process

Usually CGI programs are referred to from within HTML documents. The CGI specification, not the HTML document format, defines the environment variables through which the server passes request information to the program. When you design the layout of your HTML document, keep in mind how a CGI program might affect the look of your document. Developing the CGI program along with the HTML document helps you avoid many design mistakes.

Overview

The CGI process involves three players: The web browser, the web server, and the CGI program. To exemplify how CGI programs for online forms work, let us assume that the web browser has already requested and obtained an HTML form.

  1. The user clicks buttons or enters information in fields, and then clicks on an HTML button to submit the request.
  2. The web browser then sends the data to the web server in an encoded format. For example, the data might consist of responses on an HTML form.
  3. When the web server receives data, it converts the data to a format compliant with the CGI specification for input and sends it to the CGI program.
  4. The CGI program then decodes and processes the data.
  5. The CGI program sends its response back to the web server in a format that is compliant with the CGI specification for output.
  6. The web server then interprets the response and forwards it to the web browser.
Note: If a CGI application program must change HTTP Server job attributes while processing, it must restore the attributes to their initial values before it ends. Failure to restore changed job attributes results in unpredictable responses to future server requests. For example, a CGI program that requires a library in the library list must add the library to the library list, and must then remove that library from the library list before the CGI program ends.

The following HTML form illustrates the various types of fields:

Note: The CGIXMP.EXE program referred to in this sample is just an example; it is not shipped with the server product.
<HTML>
<HEAD>
<TITLE>CGIXMP Test Case</TITLE>
</HEAD>
<BODY>
<H1>CGI Sample Test Case</H1>
Fill in the following fields and press APPLY.
The values you enter will
be read by the CGIXMP.EXE program and displayed in a simple HTML
form which is generated dynamically by the program.
<P> <HR>
<form method=POST action="/cgi-bin/cgixmp">
<P>
<H3>Checkbox Field</H3>
<P>
<PRE>
<input type="checkbox" name="var1" value="123">
Check to set variable VAR1 to 123<BR>
<input type="checkbox" name="var2" value="XyZ" checked>
Check to set variable VAR2 to XyZ<BR>
</PRE>
<P>
<H3>Radio Button Field</H3>
<P>
<PRE>
<input type="radio" name="var3" value="1">
Select to set variable VAR3 to 1<BR>
<input type="radio" name="var3" value="2">
Select to set variable VAR3 to 2<BR>
<input type="radio" name="var3" value="3" checked>
Select to set variable VAR3 to 3<BR>
<input type="radio" name="var3" value="4">
Select to set variable VAR3 to 4<BR>
</PRE>
<P>
<H3>Single selection List Field</H3>
<P>
<PRE>
Select a value for variable VAR4 <select size=1 name="var4">
<option>0<option>1<option>2<option>3
<option>4<option>5</select>
</PRE>
<P>
<H3>Text Entry Field</H3>
<P>
<PRE>
Enter value for variable VAR5 <input type="text" name="var5"
size=20 maxlength=256 value="TEST value">
</PRE>
<P>
<H3>Multiple selection List Field</H3>
<P>
<PRE>
Select a value for variable VAR6
<select multiple size=2 name="var6">
<option>Ford<option>Chevrolet<option>Chrysler<option>
Ferrari<option>Porsche
</select>
</PRE>
<P>
<H3>Password Field</H3>
<P>
<PRE>
Enter Password
<input type="password" name="pword" size=10 maxlength=10>
</PRE>
<P>
<H3>Hidden Field</H3>
<P>
<input type="hidden" name="hidden" value="Text not shown on form...">
<P>
<PRE>
<input type="submit" name="pushbutton" value="Apply">
<input type="reset" name="pushbutton" value="Reset">
<HR>
</PRE>
</FORM>
</BODY>
</HTML>

Sending Information to the Server

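When you fill out a form, the web browser sends the request to the server in a format that is described as URL-encoded. The web browser also does this whenever you enter a phrase in a search field and click the submit button. In URL-encoded information:

  • Each field is sent as a name=value pair, and the pairs are separated by ampersands (&).
  • Spaces are replaced by plus signs (+).
  • Other special characters are replaced by a percent sign followed by two hexadecimal digits (%xx).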

Note: The method attribute of the form specifies how the browser sends the form information to the server, and therefore how the server passes it to the program. HTML forms use either the GET method or the POST method. With the GET method, the information is appended to the URL after the "?" character, and the server passes it to the program through the QUERY_STRING environment variable. With the POST method, the server passes the data to the program through standard input.

The main advantage of the GET method is that you can access the CGI program with a query without using a form. In other words, you can create canned queries that pass parameters to the program. For example, to send a query to the program directly, you can code a link such as the following:

<A HREF="/cgi-bin/program.pgm?Name=John&LName=Richard&user=Smith&Company=IBM">
Your CGI Program</a>

The main advantage of the POST method is that the length of the data is not limited, so you do not have to worry about the client or server truncating it. The query string of the GET method cannot exceed 8 KB.
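Because a canned query bypasses the form, whoever writes the link must URL-encode the parameter values. The following C function is a minimal sketch of that encoding, assuming the usual form-encoding rules (letters and digits pass through, a space becomes +, other bytes become %xx). The name url_encode is hypothetical, and the sketch assumes the input string is ASCII, as on an ASCII host; in an EBCDIC job the %xx values it produced would be EBCDIC code points.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: URL-encode src into dst (dst_size bytes, including
   the terminating NUL). Letters, digits and "-_.*" are copied, a space
   becomes '+', and every other byte becomes %xx in uppercase hex.
   Returns the encoded length, or -1 if dst is too small. */
long url_encode(const char *src, char *dst, size_t dst_size)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t out = 0;

    for (const unsigned char *p = (const unsigned char *)src; *p != '\0'; p++) {
        if (isalnum(*p) || strchr("-_.*", *p) != NULL) {
            if (out + 1 >= dst_size) return -1;   /* room for char and NUL */
            dst[out++] = (char)*p;
        } else if (*p == ' ') {
            if (out + 1 >= dst_size) return -1;
            dst[out++] = '+';
        } else {
            if (out + 3 >= dst_size) return -1;   /* room for %xx and NUL */
            dst[out++] = '%';
            dst[out++] = hex[*p >> 4];
            dst[out++] = hex[*p & 0x0F];
        }
    }
    if (dst_size == 0) return -1;
    dst[out] = '\0';
    return (long)out;
}
```

For example, encoding etfox@ibm.net yields etfox%40ibm.net, the same form that appears in the QUERY_STRING parsing example.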

Data Conversions on CGI Input and Output

The server can perform ASCII to EBCDIC conversions before sending data to CGI programs. This is needed because the Internet is primarily ASCII-based, while the iSeries™ server is an extended binary-coded decimal interchange code (EBCDIC) server. The server can also perform EBCDIC to ASCII conversions before sending data back to the browser. Data is provided to a CGI program through environment variables and standard input (stdin). The HTTP and HTML specifications allow you to tag text data with a character set (the charset parameter on the Content-Type header). However, this practice is not widely used today, although it is technically required for HTTP/1.0 and HTTP/1.1 compliance. According to the HTTP specification, text data that is not tagged can be assumed to be in the default character set ISO-8859-1 (Latin-1). The server correlates this character set with ASCII coded character set identifier (CCSID) 819.

For more information related to CCSIDs, national language support, and international application development, see the Globalization topic in the iSeries Information Center.

National Language Support Directives

You can configure HTTP Server (powered by Apache) to control which CGI conversion mode is used by specifying the CGIConvMode directive in different contexts, such as server config or directory:

CGIConvMode Mode

Where Mode is one of the following:

  • BINARY
  • EBCDIC
  • EBCDIC_JCD
You can configure HTTP Server (powered by Apache) to set the ASCII and EBCDIC CCSIDs that are used for conversions by specifying the directives DefaultNetCCSID and CGIJobCCSID in different contexts, such as server config or directory. For example:
  • DefaultNetCCSID 819
  • CGIJobCCSID 37
You can configure HTTP Server (powered by Apache) to set the locale environment variable by specifying the CGIJobLocale directive in different contexts, such as server config or directory. For example:
  • CGIJobLocale /QSYS.LIB/EN_US.LOCALE

CGI Environment Variables

In addition, the system provides the following CGI environment variables to the CGI program:

  • CGI_MODE - which input conversion mode the server is using (%%MIXED%%, %%EBCDIC%%, %%BINARY%%, %%EBCDIC_JCD%%, EBCDIC, BINARY, or EBCDIC_JCD)
  • CGI_ASCII_CCSID - which ASCII CCSID the data was converted from
  • CGI_EBCDIC_CCSID - which EBCDIC CCSID the data was converted into
  • CGI_OUTPUT_MODE - which output conversion mode the server is using (%%MIXED%%, %%EBCDIC%%, %%BINARY%%, EBCDIC, or BINARY)
  • CGI_JOB_LOCALE - which locale to use in the CGI program. This environment variable is set only if the CGIJobLocale directive is set.
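A CGI program can read these variables with the standard getenv function. The sketch below collects them into a structure; the structure and function names are hypothetical, and the sketch assumes that the CCSID variables hold decimal numbers such as 819 or 37.

```c
#include <stdlib.h>
#include <string.h>

/* Conversion settings reported by the server. A NULL string or a value of
   -1 means the variable was not set; CGI_JOB_LOCALE, for example, is set
   only when the CGIJobLocale directive is used. */
struct cgi_conv_info {
    const char *mode;         /* CGI_MODE: input conversion mode         */
    const char *output_mode;  /* CGI_OUTPUT_MODE: output conversion mode */
    long ascii_ccsid;         /* CGI_ASCII_CCSID                         */
    long ebcdic_ccsid;        /* CGI_EBCDIC_CCSID                        */
    const char *job_locale;   /* CGI_JOB_LOCALE                          */
};

/* Decimal value of an environment variable, or -1 if it is missing,
   empty, or not entirely numeric. */
static long env_long(const char *name)
{
    const char *v = getenv(name);
    char *end;
    long n;

    if (v == NULL || strlen(v) == 0)
        return -1;
    n = strtol(v, &end, 10);
    return (*end == '\0') ? n : -1;
}

void get_cgi_conv_info(struct cgi_conv_info *info)
{
    info->mode = getenv("CGI_MODE");
    info->output_mode = getenv("CGI_OUTPUT_MODE");
    info->ascii_ccsid = env_long("CGI_ASCII_CCSID");
    info->ebcdic_ccsid = env_long("CGI_EBCDIC_CCSID");
    info->job_locale = getenv("CGI_JOB_LOCALE");
}
```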

The supported environment variables for HTTP Server can be found in the IBM® iSeries Information Center. They are divided into two groups: Non-SSL and SSL.

CGI Input Conversion Modes

Table 1. Conversion action for text in CGI stdin. This table summarizes the type of conversion that the server performs for each CGI mode.

CGI_MODE: BINARY or %%BINARY%%
    Conversion: None
    Stdin encoding: No conversion
    Environment variable encoding: CGI job CCSID
    QUERY_STRING encoding: No conversion
    argv encoding: No conversion

CGI_MODE: EBCDIC or %%EBCDIC%%
    Conversion: CGI NetCCSID to CGI job CCSID
    Stdin encoding: CGI job CCSID
    Environment variable encoding: CGI job CCSID
    QUERY_STRING encoding: CGI job CCSID
    argv encoding: CGI job CCSID

CGI_MODE: %%EBCDIC%% or %%EBCDIC_JCD%% with charset tag received
    Conversion: Calculate the target EBCDIC CCSID based on the received ASCII charset tag
    Stdin encoding: EBCDIC equivalent of the received charset
    Environment variable encoding: CGI job CCSID
    QUERY_STRING encoding: CGI job CCSID
    argv encoding: CGI job CCSID

CGI_MODE: EBCDIC_JCD or %%EBCDIC_JCD%%
    Conversion: Detect the input based on the received data; convert the data to the CGI job CCSID
    Stdin encoding: Detect the ASCII input based on the received data; convert the data to the CGI job CCSID
    Environment variable encoding: CGI job CCSID
    QUERY_STRING encoding: Detect the ASCII input based on the received data; convert the data to the CGI job CCSID
    argv encoding: Detect the ASCII input based on the received data; convert the data to the CGI job CCSID

CGI_MODE: %%MIXED%% (compatibility mode)
    Conversion: CGI NetCCSID to CGI job CCSID (a received charset tag is ignored)
    Stdin encoding: CGI job CCSID with ASCII escape sequences
    Environment variable encoding: CCSID 37
    QUERY_STRING encoding: CCSID 37 with ASCII escape sequences
    argv encoding: CCSID 37 with ASCII escape sequences
Note: If the directive CGIJobCCSID is present, the CGI job runs under its specified CCSID value. Otherwise, the DefaultFsCCSID value is used (the default job CCSID).
BINARY
The BINARY mode delivers QUERY_STRING and stdin data to the CGI program in ASCII, exactly as it was received from the client. The environment variables are in the CGI job CCSID. If CGIJobCCSID is present, the job CCSID has its value; otherwise, the value associated with DefaultFsCCSID (the default job CCSID) is used.
EBCDIC
The EBCDIC mode delivers all of the information to the CGI program in the job CCSID. The ASCII CCSID of the QUERY_STRING or stdin data is determined from a charset tag on the Content-Type header, if present. If CGIJobCCSID is present, the job CCSID has its value; otherwise, the value associated with DefaultFsCCSID (the default job CCSID) is used.
EBCDIC_JCD
The EBCDIC_JCD mode is the same as the EBCDIC mode, except that a well-known Japanese code page detection algorithm is used to determine the ASCII CCSID when the charset tag is not present. Japanese browsers can send data in one of three code pages: JIS (ISO-2022-JP), S-JIS (PC-Windows), or EUC (UNIX®).
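Table 1 implies three different shapes for incoming text, depending on CGI_MODE. The following hypothetical helper is a sketch of the table's stdin column: it maps a CGI_MODE value to the form in which a program should expect stdin data. When a charset tag is received, the EBCDIC modes still deliver EBCDIC text, but in the EBCDIC equivalent of the received charset rather than the job CCSID.

```c
#include <string.h>

/* How stdin data arrives, per the stdin column of Table 1. */
enum cgi_input_form {
    INPUT_UNKNOWN,
    INPUT_AS_RECEIVED,       /* BINARY: bytes exactly as the client sent them */
    INPUT_EBCDIC,            /* EBCDIC and EBCDIC_JCD variants: EBCDIC text   */
    INPUT_EBCDIC_ASCII_ESC   /* %%MIXED%%: EBCDIC text, but %xx escapes ASCII */
};

enum cgi_input_form classify_cgi_mode(const char *mode)
{
    static const char *const ebcdic_modes[] = {
        "EBCDIC", "%%EBCDIC%%", "EBCDIC_JCD", "%%EBCDIC_JCD%%"
    };

    if (mode == NULL)
        return INPUT_UNKNOWN;          /* CGI_MODE not set */
    if (strcmp(mode, "BINARY") == 0 || strcmp(mode, "%%BINARY%%") == 0)
        return INPUT_AS_RECEIVED;
    if (strcmp(mode, "%%MIXED%%") == 0)
        return INPUT_EBCDIC_ASCII_ESC;
    for (size_t i = 0; i < sizeof ebcdic_modes / sizeof ebcdic_modes[0]; i++)
        if (strcmp(mode, ebcdic_modes[i]) == 0)
            return INPUT_EBCDIC;
    return INPUT_UNKNOWN;
}
```

A program would typically call classify_cgi_mode(getenv("CGI_MODE")) once and choose its decoding path from the result.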

DBCS Considerations

URL-encoded forms containing double-byte character set (DBCS) data could contain ASCII octets that represent parts of DBCS characters. The server can only convert character data that is not URL-encoded. This means that it must decode the DBCS stdin and QUERY_STRING data before performing the conversion, and then reassemble and re-encode the resulting EBCDIC representation before passing it to the CGI program. Because of this extra processing, CGI programs that you write to handle DBCS data may choose to receive the data in BINARY mode and perform all conversions themselves, which streamlines the entire process.
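The following sketch shows why decoding has to happen at the byte level before any character-set conversion: each %xx escape becomes exactly one byte, so the two octets of a Shift JIS character such as %82%A0 are reassembled intact. The function name is hypothetical, and the sketch assumes ASCII character literals ('%', '+', and the hexadecimal digits), as on an ASCII host; an ILE C program receiving BINARY data in an EBCDIC job would have to compare against the ASCII code points (for example, 0x25 for % and 0x2B for +) instead.

```c
#include <stddef.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper: decode '+' and %xx escapes in place, byte by byte.
   Each escape yields exactly one byte, so DBCS byte pairs are rebuilt
   intact. Returns the decoded length; use it instead of strlen, because
   an escaped %00 produces an embedded NUL byte. */
size_t url_decode_bytes(char *s)
{
    const unsigned char *in = (const unsigned char *)s;
    unsigned char *out = (unsigned char *)s;
    int hi, lo;

    while (*in != '\0') {
        if (*in == '%' && (hi = hex_value(in[1])) >= 0
                       && (lo = hex_value(in[2])) >= 0) {
            *out++ = (unsigned char)(hi * 16 + lo);
            in += 3;
        } else if (*in == '+') {
            *out++ = ' ';
            in++;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
    return (size_t)(out - (unsigned char *)s);
}
```

The decoded buffer can then be handed to a character-set conversion as a whole, so no DBCS character is split across conversion calls.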

Using the EBCDIC_JCD mode: The EBCDIC_JCD mode determines which character set the browser is using for a given request, and automatically adjusts the ASCII/EBCDIC code conversions that the web server uses as it processes the request.

After auto detection, the %%EBCDIC_JCD%% or EBCDIC_JCD mode converts the stdin and QUERY_STRING data from the detected network CCSID into the correct EBCDIC CCSID for Japanese, overriding the default conversions configured for the CGI job. The DefaultFsCCSID directive or the -fsccsid startup parameter specifies the default conversions. The startup FsCCSID must be a Japanese CCSID; alternatively, CGIJobCCSID can be set to a Japanese CCSID.

The possible detected network code pages are Shift JIS, eucJP, and ISO-2022-JP. The associated CCSIDs for each code page are:

Shift JIS
=========
CCSID 932: IBM PC (old JIS sequence, OS/2 J3.X/4.0, IBM Windows J3.1)
CCSID 942: IBM PC (old JIS sequence, OS/2 J3.X/4.0)
CCSID 943: MS Shift JIS (new JIS sequence, OS/2 J4.0, MS Windows J3.1/95/NT)
eucJP
=====
CCSID 5050: Extended UNIX Code (Japanese)
ISO-2022-JP
===========
CCSID 5052: Subset of RFC 1468 ISO-2022-JP (JIS X 0201 Roman and JIS X 0208-1983) plus JIS X 0201 Katakana.
CCSID 5054: Subset of RFC 1468 ISO-2022-JP (ASCII and JIS X 0208-1983) plus JIS X 0201 Katakana.

The detected network CCSID is available to the CGI program in the CGI_ASCII_CCSID environment variable. When JCD cannot detect a code page, the default code conversion is done as configured (between NetCCSID and FsCCSID or CGIJobCCSID).
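The server's detection algorithm itself is not described here. As a rough illustration only, the following heuristic distinguishes the three code pages by their byte patterns: ISO-2022-JP is 7-bit and switches character sets with escape sequences, EUC-JP pairs bytes in the range 0xA1 to 0xFE, and Shift JIS uses lead bytes (0x81 to 0x9F) and trail bytes (0x40 to 0xA0) that EUC-JP does not. This is not the server's JCD algorithm; the names are hypothetical, the input is assumed to be already URL-decoded, and, like the real detection (see note 3 below), it cannot reliably recognize SBCS Katakana.

```c
#include <stddef.h>

/* Possible results of the hypothetical detector below. */
enum jp_codepage { JP_UNKNOWN, JP_ISO2022JP, JP_SHIFT_JIS, JP_EUC_JP };

/* Simplified heuristic, NOT the server's JCD algorithm: guess which
   Japanese code page produced the decoded bytes p[0..n-1]. */
enum jp_codepage guess_jp_codepage(const unsigned char *p, size_t n)
{
    int saw_euc_pair = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned char c = p[i];

        /* ISO-2022-JP is 7-bit and switches sets with ESC $ or ESC ( */
        if (c == 0x1B && i + 1 < n && (p[i + 1] == 0x24 || p[i + 1] == 0x28))
            return JP_ISO2022JP;

        /* 0x81-0x9F are Shift JIS lead bytes; EUC-JP uses only 0x8E and
           0x8F (single shifts) in that range */
        if (c >= 0x81 && c <= 0x9F && c != 0x8E && c != 0x8F)
            return JP_SHIFT_JIS;

        if (c >= 0xA1 && c <= 0xFE && i + 1 < n) {
            unsigned char t = p[i + 1];

            /* EUC-JP trail bytes are 0xA1-0xFE; 0x40-0xA0 means Shift JIS */
            if (t >= 0x40 && t <= 0xA0 && t != 0x7F)
                return JP_SHIFT_JIS;
            if (t >= 0xA1 && t <= 0xFE) {
                saw_euc_pair = 1;
                i++;                    /* skip the trail byte of the pair */
            }
        }
    }
    return saw_euc_pair ? JP_EUC_JP : JP_UNKNOWN;
}
```

Mapping a result to a CCSID is left out deliberately: Shift JIS could be 932, 942, or 943, and ISO-2022-JP could be 5052 or 5054, depending on the configuration described in this section.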

Because stdin and QUERY_STRING data are encoded in the web client's outbound code page, we recommend the following configuration value combinations when you use the EBCDIC_JCD or %%EBCDIC_JCD%% mode.

Table 2.

    Startup FsCCSID or CGIJobCCSID   DefaultNetCCSID   Description
    ------------------------------   ---------------   ---------------------
    5026/5035 (see note 4)           943               Default: MS Shift JIS
    5026/5035 (see note 4)           942               Default: IBM PC
    5026/5035 (see note 4)           5052/5054         Default: ISO-2022-JP

Using CCSID 5050 (eucJP) for the startup NetCCSID is not recommended. When 5050 is specified for the startup NetCCSID, the default code conversion is done between FsCCSID and 5050. This means that if JCD cannot detect a code page, it returns 5050 as the default network CCSID. Most browsers use a default outbound code page of Shift JIS or ISO-2022-JP, not eucJP.

If the web client sends a charset tag, JCD gives priority to the charset tag. Stdout behaves the same way: if a charset/CCSID tag is specified in the Content-Type field, stdout gives priority to that tag and ignores the JCD-detected network CCSID.

Notes:
  1. If the startup NetCCSID is 932 or 942, the CCSID of detected Shift JIS data is the same as the startup NetCCSID. Otherwise, the Shift JIS CCSID is 943.
    Startup NetCCSID Shift JIS (JCD detected CCSID)
    ---------------- ------------------------------
    932 932
    942 942
    943 943
    5052 943
    5054 943
    5050 943
  2. Netscape Navigator 3.x sends the alphanumeric characters by using JIS X 0201 Roman escape sequence (CCSID 5052) for ISO-2022-JP. Netscape Communicator 4.x sends the alphanumeric characters by using ASCII escape sequence (CCSID 5054) for ISO-2022-JP.
  3. The JCD function can detect EUC and SBCS Katakana, but detecting them reliably is difficult. IBM recommends that you do not use SBCS Katakana or EUC in CGI programs.
  4. CCSID 5026 assigns lowercase alphabetic characters to special code points, which often causes problems with lowercase letters. To avoid this problem, do one of the following:
    • Do not use lowercase alphabet literals in CGI programs if the FsCCSID is 5026.
    • Use CCSID 5035 for FsCCSID.
    • Use the Charset/CCSID tag as illustrated in the following excerpt of a CGI program:
      #include <stdio.h>
      int main(void){
      printf("Content-Type: text/html; Charset=ISO-2022-JP\n\n");
      /* ... */
      }
    • Do the code conversions in the CGI program. The following sample ILE C program converts the literals into CCSID 930 (the equivalent to CCSID 5026):
      #include <stdio.h>
      int main(void){
      printf("Content-Type: text/html\n\n");
      #pragma convert(930)
      printf("<html>");
      printf("This is katakana code page\n");
      #pragma convert(0)
      /* ... */
      }
    • When the web client sends a charset tag, the network CCSID becomes the ASCII CCSID associated with the Multipurpose Internet Mail Extensions (MIME) charset header, and the charset tag overrides the JCD-detected CCSID. Likewise, when the CGI program puts a Charset/CCSID tag in the Content-Type header it generates, that tag overrides the JCD-detected CCSID. Stdout does not perform a conversion if the charset is the same as the MIME charset, or if the CCSID is an ASCII CCSID; it does perform a conversion if the CCSID is an EBCDIC CCSID. Because the environment variables and stdin are already in the job CCSID, make sure that the job CCSID and the CCSID in the Content-Type header are consistent.
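Note 1 reduces to a simple rule. As a sketch, the following hypothetical helper returns the CCSID that JCD reports for detected Shift JIS data, given the startup NetCCSID; it only restates the table in note 1.

```c
/* Hypothetical helper: CCSID reported for detected Shift JIS data,
   given the startup NetCCSID (note 1). */
int jcd_shift_jis_ccsid(int startup_net_ccsid)
{
    if (startup_net_ccsid == 932 || startup_net_ccsid == 942)
        return startup_net_ccsid;   /* keep the configured IBM PC Shift JIS CCSID */
    return 943;                     /* any other startup NetCCSID: MS Shift JIS */
}
```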

CGI Output Conversion Modes

The CGIConvMode setting also determines how the server converts CGI output. This section explains the CGI output conversion modes in more detail.

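Table 3. Conversion action and charset tag generation for text in CGI stdout. This table summarizes the type of conversion that the server performs and the charset tag that the server returns to the browser.

Stdout CCSID/charset in HTTP header: EBCDIC CCSID/charset
    Conversion action: Calculate the EBCDIC to ASCII conversion based on the supplied EBCDIC CCSID/charset
    Server reply charset tag: Calculated ASCII charset

Stdout CCSID/charset in HTTP header: ASCII CCSID/charset
    Conversion action: No conversion
    Server reply charset tag: Stdout CCSID/charset as charset

Stdout CCSID/charset in HTTP header: 65535
    Conversion action: No conversion
    Server reply charset tag: None

Stdout CCSID/charset in HTTP header: None (CGIConvMode = %%BINARY%%, %%BINARY/MIXED%%, or %%BINARY/EBCDIC%%)
    Conversion action: Default conversion, job CCSID to NetCCSID
    Server reply charset tag: NetCCSID as charset

Stdout CCSID/charset in HTTP header: None (CGIConvMode = BINARY or %%BINARY/BINARY%%)
    Conversion action: No conversion
    Server reply charset tag: None

Stdout CCSID/charset in HTTP header: None (CGIConvMode = EBCDIC, %%EBCDIC%%, %%EBCDIC/MIXED%%, or %%EBCDIC/EBCDIC%%)
    Conversion action: Default conversion, job CCSID to NetCCSID
    Server reply charset tag: NetCCSID as charset

Stdout CCSID/charset in HTTP header: None (CGIConvMode = EBCDIC, EBCDIC_JCD, %%EBCDIC%%, %%EBCDIC/MIXED%%, or %%EBCDIC/EBCDIC%%, with a charset tag received on the HTTP request)
    Conversion action: Use the inverse of the conversion calculated for stdin
    Server reply charset tag: Charset as received on the HTTP request

Stdout CCSID/charset in HTTP header: None (CGIConvMode = %%EBCDIC_JCD%%, %%EBCDIC_JCD/MIXED%%, or %%EBCDIC_JCD/EBCDIC%%)
    Conversion action: Use the inverse of the conversion calculated by the Japanese code page detection
    Server reply charset tag: ASCII CCSID as charset

Stdout CCSID/charset in HTTP header: None (CGIConvMode = %%MIXED%% or %%MIXED/MIXED%%)
    Conversion action: Default conversion, job CCSID to NetCCSID
    Server reply charset tag: None (compatibility mode)

Stdout CCSID/charset in HTTP header: Invalid
    Conversion action: CGI error 500 generated by the server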

HTTP Server (powered by Apache) also sets an environment variable, CGI_OUTPUT_MODE, that contains the CGI output conversion mode the server is using for this request. Valid values are %%EBCDIC%%, %%MIXED%%, %%BINARY%%, EBCDIC, BINARY, or EBCDIC_JCD. The program can use this information to determine what conversion, if any, the server performs on CGI output.

HTTP Server (powered by Apache)

BINARY
In this mode, HTTP header output is in CCSID 819, and escape sequences also give the ASCII code point. An example of an HTTP header that may contain escape sequences is the Location header. The body is always treated as binary data, and the server performs no conversion.
EBCDIC
In this mode, HTTP header output is assumed to be in the CGI job CCSID, unless the CGI program specifies otherwise in a charset or CCSID tag. However, an escape sequence must give the EBCDIC code point in the two characters following the "%". An example of an HTTP header that may contain escape sequences is the Location header. The body (if the MIME type is text/*) is assumed to be in the job CCSID, unless the CGI program specifies otherwise in a charset or CCSID tag. If CGIJobCCSID is present, the CGI job CCSID has its value; otherwise, the value associated with DefaultFsCCSID (the default job CCSID) is used.
EBCDIC_JCD
In this mode, HTTP header output is assumed to be in the job CCSID, unless the CGI program specifies otherwise in a charset or CCSID tag. However, an escape sequence must give the EBCDIC code point in the two characters following the "%". An example of an HTTP header that may contain escape sequences is the Location header. The body (if the MIME type is text/*) is assumed to be in the job CCSID, unless the CGI program specifies otherwise in a charset or CCSID tag. If CGIJobCCSID is present, the job CCSID has its value; otherwise, the value associated with DefaultFsCCSID (the default job CCSID) is used.

Returning Output from the Server

When the CGI program is finished, it passes the resulting response to the server by using standard output (stdout). The server interprets the response and sends it to the browser.

A CGI program writes a CGI header followed by an entity body to standard output. The CGI header is the information that describes the data in the entity body; the entity body is the data that the server sends to the client. Each header line ends with a newline character, and an empty line (a newline character on its own) ends the CGI header. The newline character for ILE C is \n; for ILE RPG or ILE COBOL, it is hexadecimal '15'. The following are some examples of Content-Type headers:

Content-Type: text/html\n\n
Content-Type: text/html; charset=iso-8859-2\n\n
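The following sketch writes such a response from C. The function name is hypothetical; it writes to a FILE * so that it can be tested, and a CGI program would pass stdout. Passing a NULL charset omits the charset parameter.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write a CGI header followed by an entity body.
   The header line ends with '\n', and an empty line ends the header.
   Returns 0 on success, -1 on a write error. */
int write_html_response(FILE *out, const char *charset, const char *body)
{
    int rc;
    size_t len = strlen(body);

    if (charset != NULL)
        rc = fprintf(out, "Content-Type: text/html; charset=%s\n\n", charset);
    else
        rc = fprintf(out, "Content-Type: text/html\n\n");
    if (rc < 0 || fwrite(body, 1, len, out) != len)
        return -1;
    return 0;
}
```

For example, write_html_response(stdout, "iso-8859-2", "<p>ok</p>") produces the second header shown above followed by the body.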

If the response is a static document, the CGI program returns either the URL of the document in a CGI Location header, or a Status header. The CGI program does not send an entity body when it uses the Location header. If the host name is the local host, HTTP Server retrieves the specified document and sends a copy to the web browser. If the host name is not the local host, HTTP Server processes it as a redirect to the web browser. For example:

Location: http://www.acme.com/products.html\n\n

A response that uses the Status header should include both a Content-Type header and a Status header in the CGI header. When Status is in the CGI header, an entity body should be sent with the data to be returned by the server; this entity body contains information that the CGI program provides to the client for error processing. The Status line consists of Status: followed by a three-digit HTTP status code and a string of alphanumeric characters (A-Z, a-z, 0-9, and space). The HTTP status code must be a valid three-digit code from the HTTP/1.1 specification.

Note: The newline character \n ends each header line, and an empty line ends the CGI header.
Content-Type: text/html\n
Status: 400 Invalid data\n
\n
<html><head><title>Invalid data</title>
</head><body>
<h1>Invalid data typed</h1>
<br><pre>
The data entered must be valid numeric digits for id number
<br></pre>
</body></html>
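The following sketch produces a Status response like the example above. The function is hypothetical; it rejects codes outside the 100 to 599 range (the status classes defined by HTTP/1.1) and reason strings containing anything other than letters, digits, and spaces, the character set this section allows. The detail text is written as-is, so a program must HTML-escape it if it comes from user input.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write a CGI response that carries an HTTP status.
   Returns 0 on success, -1 on invalid input or a write error. */
int write_status_response(FILE *out, int code, const char *reason,
                          const char *detail)
{
    static const char allowed[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 ";

    if (code < 100 || code > 599)
        return -1;                       /* not a valid HTTP status code */
    if (reason[strspn(reason, allowed)] != '\0')
        return -1;                       /* only A-Z, a-z, 0-9 and space */
    if (fprintf(out,
                "Content-Type: text/html\n"
                "Status: %d %s\n"
                "\n"                     /* empty line ends the CGI header */
                "<html><head><title>%s</title></head><body>\n"
                "<h1>%s</h1>\n"
                "<p>%s</p>\n"            /* detail is not HTML-escaped */
                "</body></html>\n",
                code, reason, reason, reason, detail) < 0)
        return -1;
    return 0;
}
```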

How CGI Programs Work

Most CGI programs include the following three stages:

Note: Any CGI program with a name that begins with nph_ is considered a no-parse-header CGI program. This means that the server does no conversions on the data and adds no headers to the response from the CGI program. The CGI programmer is in full control and is responsible for parsing the request and sending all of the necessary headers with the response.

Parsing

Parsing is the first stage of a CGI program. In this stage, the program takes the data from the QUERY_STRING environment variable, from the command-line arguments (argv), or from standard input. When the method is GET, the program reads the data from the QUERY_STRING environment variable or from the command-line arguments (argv). Unlike standard input, QUERY_STRING has no accompanying length variable such as CONTENT_LENGTH. The QUERY_STRING data arrives URL-encoded, as the browser placed it in the request URL.

An example of data read in the QUERY_STRING variable (%%MIXED%% mode):

NAME=Eugene+T%2E+Fox&ADDR=etfox%40ibm.net&INTEREST=RCO

Parsing breaks the fields at the ampersands and decodes the plus signs and the ASCII hexadecimal escape sequences (%xx). The results look like this:

NAME=Eugene T. Fox
ADDR=etfox@ibm.net
INTEREST=RCO
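The parsing just described can be sketched in C as follows. The function and structure names are hypothetical, and the sketch does not use the QtmhCvtDb API. It modifies the query string in place and assumes that the decoded %xx values are meaningful in the program's character set; that holds on an ASCII host, but in %%MIXED%% mode the %xx values are ASCII code points, as the next paragraph explains.

```c
#include <stddef.h>
#include <string.h>

struct form_field {
    char *name;
    char *value;
};

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decode '+' to a space and %xx to the byte it names, in place. */
static void form_decode(char *s)
{
    char *out = s;
    int hi, lo;

    while (*s != '\0') {
        if (*s == '%' && (hi = hex_value((unsigned char)s[1])) >= 0
                      && (lo = hex_value((unsigned char)s[2])) >= 0) {
            *out++ = (char)(hi * 16 + lo);
            s += 3;
        } else {
            *out++ = (*s == '+') ? ' ' : *s;
            s++;
        }
    }
    *out = '\0';
}

/* Split s at '&', split each pair at '=', decode both parts, and store up
   to max fields. Returns the number of fields stored. */
size_t parse_form(char *s, struct form_field *fields, size_t max)
{
    size_t count = 0;
    char *pair = s;

    while (pair != NULL && count < max) {
        char *next = strchr(pair, '&');
        char *value;

        if (next != NULL)
            *next++ = '\0';               /* end this pair at the '&' */
        if (*pair != '\0') {              /* skip empty pairs, as in "a=1&&b=2" */
            value = strchr(pair, '=');
            if (value != NULL)
                *value++ = '\0';          /* split name from value */
            else
                value = pair + strlen(pair);  /* no '=': empty value */
            form_decode(pair);
            form_decode(value);
            fields[count].name = pair;
            fields[count].value = value;
            count++;
        }
        pair = next;
    }
    return count;
}
```

Applied to the QUERY_STRING example above, parse_form returns three fields whose values are Eugene T. Fox, etfox@ibm.net, and RCO.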

You can use the QtmhCvtDb API to parse the information into a structure, and the CGI program can then refer to the structure fields. In %%MIXED%% input mode, the "%xx" encoding values are in ASCII and must be converted into "%xx" EBCDIC encoding values before you call QtmhCvtDb. In %%EBCDIC%% mode, the server does this conversion for you: it converts each ASCII "%xx" escape first to the ASCII character, then to the EBCDIC character, and finally encodes that EBCDIC character as a "%xx" escape in the EBCDIC CCSID.

When the method is POST, the program reads the data from standard input. Before the CGI program attempts to read standard input, it must check the environment variables REQUEST_METHOD and CONTENT_LENGTH. Read standard input only when REQUEST_METHOD is POST, and never request more than CONTENT_LENGTH bytes. The result of attempting to read more than CONTENT_LENGTH bytes from standard input is undefined.
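A minimal sketch of these rules follows. The function name is hypothetical; it takes the input stream and the two environment variable values as parameters so that it can be tested, and a CGI program would call it as read_post_body(stdin, getenv("REQUEST_METHOD"), getenv("CONTENT_LENGTH")).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read the POST body only when method is "POST", and
   never request more than CONTENT_LENGTH bytes. Returns a NUL-terminated
   buffer that the caller must free, or NULL if nothing should be read. */
char *read_post_body(FILE *in, const char *method, const char *content_length)
{
    char *end, *buf;
    long len;
    size_t got;

    if (method == NULL || strcmp(method, "POST") != 0 || content_length == NULL)
        return NULL;
    len = strtol(content_length, &end, 10);
    if (end == content_length || *end != '\0' || len < 0)
        return NULL;                  /* CONTENT_LENGTH is not a valid number */
    buf = malloc((size_t)len + 1);
    if (buf == NULL)
        return NULL;
    got = fread(buf, 1, (size_t)len, in);
    buf[got] = '\0';                  /* a short read leaves a shorter string */
    return buf;
}
```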

Data manipulation

Data manipulation is the second stage of a CGI program. In this stage, the program takes the parsed data and performs the appropriate action. For example, a CGI program designed to process an application form might perform the following steps:

  1. Take the input from the parsing stage
  2. Convert abbreviations into more meaningful information
  3. Plug the information into an e-mail template
  4. Use SNDDST to send the e-mail.

Response generation

Response generation is the final stage of a CGI program. In this stage, the program formulates its response to the web server, which forwards it to the browser. The response contains MIME headers that vary depending on the type of response. With a search, the response might be the URLs of all the documents that met the search value. With a request that results in e-mail, the response might be a message that confirms that the system actually sent the e-mail.