subselect

Click to skip syntax diagram
Read syntax diagramSkip visual syntax diagram>>-select-clause--from-clause--+--------------+----------------->
                               '-where-clause-'
 
>--+-----------------+--+---------------+----------------------><
   '-group-by-clause-'  '-having-clause-'
 

The subselect is a component of the fullselect.

A subselect specifies a result table derived from the tables or views identified in the FROM clause. The derivation can be described as a sequence of operations in which the result of each operation is input for the next. (This is only a way of describing the subselect. The method used to perform the derivation may be quite different from this description. If portions of the subselect do not actually need to be executed for the correct result to be obtained, they may or may not be executed.)

A scalar-subselect is a subselect, enclosed in parentheses, that returns a single result row and a single result column. If the result of the subselect is no rows, then the null value is returned. An error is returned if there is more than one row in the result.

The sequence of the (hypothetical) operations is:

  1. FROM clause
  2. WHERE clause
  3. GROUP BY clause
  4. HAVING clause
  5. SELECT clause

select-clause

Click to skip syntax diagram
Read syntax diagramSkip visual syntax diagram           .-ALL------.
>>-SELECT--+----------+----------------------------------------->
           '-DISTINCT-'
 
>--+-*-------------------------------------------+-------------><
   | .-,---------------------------------------. |
   | V                                         | |
   '---+-expression--+---------------------+-+-+-'
       |             | .-AS-.              | |
       |             '-+----+--column-name-' |
       +-table-name.*------------------------+
       +-view-name.*-------------------------+
       '-correlation-name.*------------------'
 

The SELECT clause specifies the columns of the final result table. The column values are produced by the application of the select list to R. The select list is the names or expressions specified in the SELECT clause, and R is the result of the previous operation of the subselect. For example, if the only clauses specified are SELECT, FROM, and WHERE, R is the result of that WHERE clause.

ALL
Selects all rows of the final result table and does not eliminate duplicates. This is the default.
DISTINCT
Eliminates all but one of each set of duplicate rows of the final result table. Two rows are duplicates of one another only if each value in the first row is equal to the corresponding value in the second row. (For determining duplicate rows, two null values are considered equal.) Sort sequence is also used for determining distinct values.

DISTINCT is not allowed if the select list contains a LOB or DATALINK column.

Select list notation

*
Represents a list of columns of table R in the order the columns are produced by the FROM clause. The list of names is established when the statement containing the SELECT clause is prepared. Therefore, * does not identify any columns that have been added to a table after the statement has been prepared.
expression
Specifies the values of a result column. Each column-name in the expression must unambiguously identify a column of R.
column-name or AS column-name
Names or renames the result column. The name must not be qualified and does not have to be unique.
name.*
Represents a list of columns of name in the order the columns are produced by the FROM clause. The name can be a table name, view name, or correlation name, and must designate an exposed table, view, or correlation name in the FROM clause immediately following the SELECT clause. The first name in the list identifies the first column of the table or view, the second name in the list identifies the second column of the table or view, and so on.

The list of names is established when the statement containing the SELECT clause is prepared. Therefore, * does not identify any columns that have been added to a table after the statement has been prepared.

Normally, when SQL statements are implicitly rebound, the list of names is not re-established. Therefore, the number of columns returned by the statement does not change. However, there are four cases where the list of names is established again and the number of columns can change:

The number of columns in the result of SELECT is the same as the number of expressions in the operational form of the select list (that is, the list established at prepare time), and cannot exceed 8000. The result of a subquery must be a single expression, unless the subquery is used in the EXISTS predicate.

Applying the select list

The results of applying the select list to R depend on whether or not GROUP BY or HAVING is used:

If GROUP BY or HAVING is used
If neither GROUP BY nor HAVING is used

In either case the nth column of the result contains the values specified by applying the nth expression in the operational form of the select list.

Null attributes of result columns

Result columns allow null values if they are derived from:

Names of result columns

Data types of result columns

Each column of the result of SELECT acquires a data type from the expression from which it is derived.

When the expression is: The data type of the result column is:
the name of any numeric column the same as the data type of the column, with the same precision and scale for decimal columns.
an integer constant INTEGER or BIGINT (if the value of the constant is outside the range of INTEGER, but within the range of BIGINT).
a decimal or floating-point constant the same as the data type of the constant, with the same precision and scale for decimal constants.
the name of any numeric variable the same as the data type of the variable, with the same precision and scale for decimal variables. If the data type of the variable is not identical to an SQL data type (for example, DISPLAY SIGN LEADING SEPARATE in COBOL), the result column is decimal.
an expression the same as the data type of the result, with the same precision and scale for decimal results as described under Expressions.
any function the data type of the result of the function. For a built-in function, see Built-in functions to determine the data type of the result. For a user-defined function, the data type of the result is what was defined in the CREATE FUNCTION statement for the function.
the name of any string column the same as the data type of the column, with the same length attribute.
the name of any string variable the same as the data type of the variable, with a length attribute equal to the length of the variable. If the data type of the variable is not identical to an SQL data type (for example, a NUL-terminated string in C), the result column is a varying-length string.
a character-string constant of length n VARCHAR(n)
a graphic-string constant of length n VARGRAPHIC(n)
the name of a datetime column, or an ILE RPG compiler or ILE COBOL compiler datetime host variable the same as the data type of the column or variable.
the name of a datalink column a datalink, with the same length attribute.
the name of a row ID column or a row ID variable ROWID
the name of a distinct type column the same as the distinct type of the column, with the same length, precision, and scale attributes, if any.

from-clause

Click to skip syntax diagram
Read syntax diagramSkip visual syntax diagram         .-,---------------.
         V                 |
>>-FROM----table-reference-+-----------------------------------><
 

The FROM clause specifies an intermediate result table.

If only one table-reference is specified, the intermediate result table is simply the result of that table-reference. If more than one table-reference is specified in the FROM clause, the intermediate result table consists of all possible combinations of the rows of the specified table-references (the Cartesian product). Each row of the result is a row from the first table-reference concatenated with a row from the second table-reference, concatenated in turn with a row from the third, and so on. The number of rows in the result is the product of the number of rows in all the individual table-references.

table-reference

 

Click to skip syntax diagram
Read syntax diagramSkip visual syntax diagram>>-+-single-table------------+---------------------------------><
   +-nested-table-expression-+
   +-table-function----------+
   '-joined-table------------'
 
single-table:
 
|--+-table-name-+--+--------------------+-----------------------|
   '-view-name--'  '-correlation-clause-'
 
nested-table-expression:
 
|--+---------+--(--fullselect--+-----------------+--+--------------------+--)-->
   '-LATERAL-'                 '-order-by-clause-'  '-fetch-first-clause-'
 
>--correlation-clause-------------------------------------------|
 
table-function:
 
|--TABLE--(--function-invocation--)--correlation-clause---------|
 
correlation-clause:
 
   .-AS-.
|--+----+--correlation-name--+-----------------------+----------|
                             |    .-,-----------.    |
                             |    V             |    |
                             '-(----column-name-+--)-'
 

 

A table-reference specifies an intermediate result table.

If function-name is specified, the TABLE or LATERAL keyword is specified, or a table-reference identifies a distributed table, a table that has a read trigger, or logical file built over multiple physical file members; the query cannot contain:

The list of names in the FROM clause must conform to these rules:

Each correlation-name is defined as a designator of the intermediate result table specified by the immediately preceding table-reference. A correlation-name must be specified for nested table expressions and table functions.

The exposed names of all table references should be unique. An exposed name is:

Any qualified reference to a column for a table, view, nested table expression, or table function must use the exposed name. If the same table name or view name is specified twice, at least one specification should be followed by a correlation-name. The correlation-name is used to qualify references to the columns of the table or view. When a correlation-name is specified, column-names can also be specified to give names to the columns of the table-name, view-name, nested-table-expression or table-function. If a column list is specified, there must be a name in the column list for each column in the table or view and for each result column in the nested-table-expression or table-function. For more information, see Correlation names.

In general, nested-table-expressions and table-functions can be specified in any FROM clause. Columns from the nested table expressions and table functions can be referenced in the select list and in the rest of the subselect using the correlation name which must be specified. The scope of this correlation name is the same as correlation names for other table or view names in the FROM clause. A nested table expression can be used:

Correlated references in table-references

Correlated references can be used in nested-table-expressions. The basic rule that applies is that the correlated reference must be from a table-reference at a higher level in the hierarchy of subqueries. This hierarchy includes the table-references that have already been resolved in the left-to-right processing of the FROM clause. For nested table expressions, the TABLE or LATERAL keyword must appear before the fullselect. For more information see Column name qualifiers to avoid ambiguity

A table function can contain one or more correlated references to other tables in the same FROM clause if the referenced tables precede the reference in the left-to-right order of the tables in the FROM clause. The same capability exists for nested table expressions if the optional keyword TABLE or LATERAL is specified. Otherwise, only references to higher levels in the hierarchy of subqueries is allowed.

A nested table expression or table function that contains correlated references to other tables in the same FROM clause:

A nested table expression cannot contain a correlated reference to other tables in the same FROM clause when:

Syntax Alternatives: TABLE can be specified in place of LATERAL.

Example 1

The following example is valid:

SELECT D.DEPTNO, D.DEPTNAME, EMPINFO.AVGSAL, EMPINFO.EMPCOUNT
FROM DEPARTMENT D,
  (SELECT AVG(E.SALARY) AS AVGSAL,COUNT (*) AS EMPCOUNT
     FROM EMPLOYEE E
     WHERE E.WORKDEPT =
       (SELECT X.DEPTNO
         FROM DEPARTMENT X
         WHERE X.DEPTNO = E.WORKDEPT ) ) AS EMPINFO   

The following example is not valid because the reference to D.DEPTNO in the WHERE clause of the nested-table-expression attempts to reference a table that is outside the hierarchy of subqueries:

SELECT D.DEPTNO, D.DEPTNAME,
        EMPINFO.AVGSAL, EMPINFO.EMPCOUNT                      ***INCORRECT***
  FROM DEPARTMENT D,
  (SELECT AVG(E.SALARY) AS AVGSAL,COUNT (*) AS EMPCOUNT
     FROM EMPLOYEE E
     WHERE E.WORKDEPT = D.DEPTNO ) AS EMPINFO

The following example is valid because the reference to D.DEPTNO in the WHERE clause of the nested-table-expression references DEPT, which precedes the nested-table-expression and the LATERAL keyword was specified:

SELECT D.DEPTNO, D.DEPTNAME,
        EMPINFO.AVGSAL, EMPINFO.EMPCOUNT
  FROM DEPARTMENT D,
  LATERAL (SELECT AVG(E.SALARY) AS AVGSAL,COUNT (*) AS EMPCOUNT
     FROM EMPLOYEE E
     WHERE E.WORKDEPT = D.DEPTNO ) AS EMPINFO
Example 2

The following example of a table function is valid:

SELECT t.c1, z.c5
FROM t, TABLE(tf3 (t.c2 ) ) ASWHERE t.c3 = z.c4

The following example is not valid because the reference to t.c2 is for a table that is to the right of the table function in the FROM clause:

SELECT t.c1, z.c5
FROM TABLE(tf6 (t.c2 ) ) AS z, t                  ***INCORRECT***
WHERE t.c3 = z.c4

joined-table

 

Click to skip syntax diagram
Read syntax diagramSkip visual syntax diagram                        .-INNER----------------.
|--+-table-reference--+-+----------------------+--JOIN--table-reference--+-ON--join-condition-----------+-+-+--|
   |                  | |            .-OUTER-. |                         |           .-,-----------.    | | |
   |                  | +-+-LEFT--+--+-------+-+                         |           V             |    | | |
   |                  | | '-RIGHT-'            |                         '-USING--(----column-name-+--)-' | |
   |                  | | .-LEFT--.            |                                                          | |
   |                  | '-+-------+--EXCEPTION-'                                                          | |
   |                  |   '-RIGHT-'                                                                       | |
   |                  '-CROSS JOIN--table-reference-------------------------------------------------------' |
   '-(--joined-table--)-------------------------------------------------------------------------------------'
 

 

A joined-table specifies an intermediate result table that is the result of either an inner, outer, cross, or exception join. The table is derived by applying one of the join operators: INNER, LEFT OUTER, RIGHT OUTER, LEFT EXCEPTION, RIGHT EXCEPTION or CROSS to its operands.

If a join operator is not specified, INNER is implicit. The order in which multiple joins are performed can affect the result. Joins can be nested within other joins. The order of processing for joins is generally from left to right, but based on the position of the required join-condition or USING clause. Parentheses are recommended to make the order of nested joins more readable. For example:

   TB1 LEFT JOIN TB2 ON TB1.C1=TB2.C1
       LEFT JOIN TB3 LEFT JOIN TB4 ON TB3.C1=TB4.C1
               ON TB1.C1=TB3.C1

is the same as

   (TB1 LEFT JOIN TB2 ON TB1.C1=TB2.C1)
        LEFT JOIN (TB3 LEFT JOIN TB4 ON TB3.C1=TB4.C1)
              ON TB1.C1=TB3.C1

An inner join combines each row of the left table with every row of the right table keeping only the rows where the join-condition (or USING clause) is true. Thus, the result table may be missing rows of from either or both of the joined tables. Outer joins include the rows produced by the inner join as well as the missing rows, depending on the type of outer join. Exception joins include only the missing rows, depending on the type of exception join as follows:

A joined table can be used in any context in which any form of the SELECT statement is used. A view or a cursor is read-only if its SELECT statement includes a joined table.

Join condition

The join-condition is a search-condition that must conform to these rules:

For any type of join, column references in an expression of the join-condition are resolved using the rules for resolution of column name qualifiers specified in Column names before any rules about which tables the columns must belong to are applied.

Join USING

The USING clause specifies a shorthand way of defining the join condition. This form is known as a named-columns-join.

column-name
Must unambiguously identify a column that exists in both table-references of the joined table. The column must not be a DATALINK column.

The USING clause is equivalent to a join-condition in which each column from the left table-reference is compared equal to a column of the same name in the right table-reference. For example, a named-columns-join of the form:

   TB1 INNER JOIN TB2
       USING (C1, C2, ... Cn)

is equivalent to:

   TB1 INNER JOIN TB2 
       ON TB1.C1 = TB2.C1 AND
          TB1.C2 = TB2.C2 AND
          ...
          TB1.Cn = TB2.Cn 
Join operations

A join-condition (or USING clause) specifies pairings of T1 and T2, where T1 and T2 are the left and right operand tables of the JOIN operator of the join-condition (or USING clause). For all possible combinations of rows of T1 and T2, a row of T1 is paired with a row of T2 if the join-condition (or USING clause) is true. When a row of T1 is joined with a row of T2, a row in the result consists of the values of that row of T1 concatenated with the values of that row of T2. In the case of OUTER joins, the execution might involve the generation of a null row. The null row of a table consists of a null value for each column of the table, regardless of whether the columns allow null values.

INNER JOIN or JOIN
The result of T1 INNER JOIN T2 consists of their paired rows.

Using the INNER JOIN syntax with a join-condition (or USING clause) will produce the same result as specifying the join by listing two tables in the FROM clause separated by commas and using the where-clause to provide the condition.

LEFT JOIN or LEFT OUTER JOIN
The result of T1 LEFT OUTER JOIN T2 consists of their paired rows and, for each unpaired row of T1, the concatenation of that row with the null row of T2. All columns derived from T2 allow null values.
RIGHT JOIN or RIGHT OUTER JOIN
The result of T1 RIGHT OUTER JOIN T2 consists of their paired rows and, for each unpaired row of T2, the concatenation of that row with the null row of T1. All columns derived from T1 allow null values.
LEFT EXCEPTION JOIN and EXCEPTION JOIN
The result of T1 LEFT EXCEPTION JOIN T2 consists only of each unpaired row of T1, the concatenation of that row with the null row of T2. All columns derived from T2 allow null values.
RIGHT EXCEPTION JOIN
The result of T1 RIGHT EXCEPTION JOIN T2 consists only of each unpaired row of T2, the concatenation of that row with the null row of T1. All columns derived from T1 allow null values.
CROSS JOIN
The result of T1 CROSS JOIN T2 consists of each row of T1 paired with each row of T2. CROSS JOIN is also known as cartesian product.

where-clause

Click to skip syntax diagram
Read syntax diagramSkip visual syntax diagram>>-WHERE--search-condition-------------------------------------><
 

The WHERE clause specifies an intermediate result table that consists of those rows of R for which the search-condition is true. R is the result of the FROM clause of the statement.

The search-condition must conform to the following rules:

Any subquery in the search-condition is effectively executed for each row of R and the results are used in the application of the search-condition to the given row of R. A subquery is executed for each row of R if it includes a correlated reference to a column of R. A subquery with no correlated reference is typically executed just once.

If a sort sequence other than *HEX is in effect when the statement that contains the WHERE clause is executed and if the search-condition contains operands that are SBCS data, mixed data, or Unicode data, then the comparison for those predicates is done using weighted values. The weighted values are derived by applying the sort sequence to the operands of the predicate.

group-by-clause

 

Click to skip syntax diagram
Read syntax diagramSkip visual syntax diagram             .-,-------------------.
             V                     |
>>-GROUP BY----grouping-expression-+---------------------------><
 

 

The GROUP BY clause specifies an intermediate result table that consists of a grouping of the rows of R. R is the result of the previous clause of the subselect.

A grouping-expression is an expression that defines the grouping of R. The following restrictions apply to grouping-expression.

The result of the GROUP BY clause is a set of groups of rows. In each group of more than one row, all values of each grouping-expression are equal, and all rows with the same set of values of the grouping-expressions are in the same group. For grouping, all null values for a grouping-expression are considered equal.

Because every row of a group contains the same value of any grouping-expression, grouping-expressions can be used in a search condition in a HAVING clause, in the SELECT clause, or in a sort-key-expression of an ORDER BY clause (see order-by-clause for details). In each case, the reference specifies only one value for each group. The grouping-expression specified in these clauses must exactly match the grouping-expression in the GROUP BY clause, except that blanks are not significant. For example, a grouping-expression of

SALARY*.10

will match the expression in a having-clause of

HAVING SALARY*.10

but will not match

HAVING .10 *SALARY
 or
HAVING (SALARY*.10)+100

If the grouping-expression contains varying-length strings with trailing blanks, the values in the group can differ in the number of trailing blanks and may not all have the same length. In that case, a reference to the grouping-expression still specifies only one value for each group, but the value for a group is chosen arbitrarily from the available set of values. Thus, the actual length of the result value is unpredictable.

The number of grouping-expressions must not exceed 120 and the sum of their length attributes must not exceed 32766-n bytes, where n is the number of grouping-expressions specified that allow nulls.

If a sort sequence other than *HEX is in effect when the statement that contains the GROUP BY clause is executed, and the grouping-expressions are SBCS data, mixed data, or Unicode data, then the rows are placed into groups using the weighted values. The weighted values are derived by applying the sort sequence to the grouping-expressions.

having-clause

 

Click to skip syntax diagram
Read syntax diagramSkip visual syntax diagram>>-HAVING--search-condition------------------------------------><
 

 

The HAVING clause specifies an intermediate result table that consists of those groups of R for which the search-condition is true. R is the result of the previous clause of the subselect. If this clause is not GROUP BY, R is considered a single group with no grouping expressions.

Each expression that contains a column-name in the search condition must do one of the following:

The RRN, DATAPARTITIONNAME, DATAPARTITIONNUM, DBPARTITIONNAME, DBPARTITIONNUM, and HASHED_VALUE functions cannot be specified in the HAVING clause unless it is within an aggregate function. See "Functions" in Chapter 3 for restrictions that apply to the use of aggregate functions.

A group of R to which the search condition is applied supplies the argument for each aggregate function in the search condition, except for any function whose argument is a correlated reference.

If the search condition contains a subquery, the subquery can be thought of as being executed each time the search condition is applied to a group of R, and the results used in applying the search condition. In actuality, the subquery is executed for each group only if it contains a correlated reference. For an illustration of the difference, see examples 6 and 7 under Examples of a subselect.

A correlated reference to a group of R must either identify a grouping column or be contained within an aggregate function.

When HAVING is used without GROUP BY, any column name in the select list must appear within an aggregate function.

If a sort sequence other than *HEX is in effect when the statement that contains the HAVING clause is executed and if the search-condition contains operands that have SBCS data, mixed data, or Unicode data, the comparison for those predicates is done using weighted values. The weighted values are derived by applying the sort sequence to the operands in the predicate.

Examples of a subselect

Example 1

Select all columns and rows from the EMPLOYEE table.

   SELECT * FROM EMPLOYEE

Example 2

Join the EMPPROJACT and EMPLOYEE tables, select all the columns from the EMPPROJACT table and add the employee's surname (LASTNAME) from the EMPLOYEE table to each row of the result.

   SELECT EMPPROJACT.*, LASTNAME
     FROM EMPPROJACT, EMPLOYEE
     WHERE EMPPROJACT.EMPNO = EMPLOYEE.EMPNO

Example 3

Join the EMPLOYEE and DEPARTMENT tables, select the employee number (EMPNO), employee surname (LASTNAME), department number (WORKDEPT in the EMPLOYEE table and DEPTNO in the DEPARTMENT table) and department name (DEPTNAME) of all employees who were born (BIRTHDATE) earlier than 1930.

  SELECT EMPNO, LASTNAME, WORKDEPT, DEPTNAME
    FROM EMPLOYEE, DEPARTMENT
    WHERE WORKDEPT = DEPTNO
    AND YEAR(BIRTHDATE) < 1930

This subselect could also be written as follows:

  SELECT EMPNO, LASTNAME, WORKDEPT, DEPTNAME
    FROM EMPLOYEE INNER JOIN DEPARTMENT
    ON WORKDEPT = DEPTNO
    WHERE YEAR(BIRTHDATE) < 1930

Example 4

Select the job (JOB) and the minimum and maximum salaries (SALARY) for each group of rows with the same job code in the EMPLOYEE table, but only for groups with more than one row and with a maximum salary greater than or equal to 27000.

   SELECT JOB, MIN(SALARY), MAX(SALARY)
     FROM EMPLOYEE
     GROUP BY JOB
     HAVING COUNT(*) > 1 AND MAX(SALARY) >= 27000

Example 5

Select all the rows of EMPPROJACT table for employees (EMPNO) in department (WORKDEPT) 'E11'. (Employee department numbers are shown in the EMPLOYEE table.)

   SELECT * FROM EMPPROJACT
     WHERE EMPNO IN (SELECT EMPNO FROM EMPLOYEE
                     WHERE WORKDEPT = 'E11')

Example 6

From the EMPLOYEE table, select the department number (WORKDEPT) and maximum departmental salary (SALARY) for all departments whose maximum salary is less than the average salary for all employees.

   SELECT WORKDEPT, MAX(SALARY)
     FROM EMPLOYEE
     GROUP BY WORKDEPT
     HAVING MAX(SALARY) < (SELECT AVG(SALARY)
                           FROM EMPLOYEE)

The subquery in the HAVING clause would only be executed once in this example.

Example 7

Using the EMPLOYEE table, select the department number (WORKDEPT) and maximum departmental salary (SALARY) for all departments whose maximum salary is less than the average salary in all other departments.

   SELECT WORKDEPT, MAX(SALARY)
     FROM EMPLOYEE EMP_COR
     GROUP BY WORKDEPT
     HAVING MAX(SALARY) < (SELECT AVG(SALARY)
                           FROM EMPLOYEE
                           WHERE NOT WORKDEPT = EMP_COR.WORKDEPT)

In contrast to example 6, the subquery in the HAVING clause would need to be executed for each group.

Example 8

Join the EMPLOYEE and EMPPROJACT tables, select all of the employees and their project numbers. Return even those employees that do not have a project number currently assigned.

   SELECT EMPLOYEE.EMPNO, PROJNO
     FROM EMPLOYEE LEFT OUTER JOIN EMPPROJACT
     ON EMPLOYEE.EMPNO = EMPPROJACT.EMPNO

Any employee in the EMPLOYEE table that does not have a project number in the EMPPROJACT table will return one row in the result table containing the EMPNO value and the null value in the PROJNO column.


53.
A nested table expression is also called a derived table.
54.
If ALWCPYDTA(*NO) is specified, the length attribute must not be more than 2000, or 1999 if the expression is nullable.



[ Top of Page | Previous Page | Next Page | Contents | Index ]