>>-select-clause--from-clause--+--------------+-----------------> '-where-clause-' >--+-----------------+--+---------------+---------------------->< '-group-by-clause-' '-having-clause-'
The subselect is a component of the fullselect.
A subselect specifies a result table derived from the tables or views identified in the FROM clause. The derivation can be described as a sequence of operations in which the result of each operation is input for the next. (This is only a way of describing the subselect. The method used to perform the derivation may be quite different from this description. If portions of the subselect do not actually need to be executed for the correct result to be obtained, they may or may not be executed.)
A scalar-subselect is a subselect, enclosed in parentheses, that returns a single result row and a single result column. If the result of the subselect is no rows, then the null value is returned. An error is returned if there is more than one row in the result.
The sequence of the (hypothetical) operations is:
.-ALL------. >>-SELECT--+----------+-----------------------------------------> '-DISTINCT-' >--+-*-------------------------------------------+------------->< | .-,---------------------------------------. | | V | | '---+-expression--+---------------------+-+-+-' | | .-AS-. | | | '-+----+--column-name-' | +-table-name.*------------------------+ +-view-name.*-------------------------+ '-correlation-name.*------------------'
The SELECT clause specifies the columns of the final result table. The column values are produced by the application of the select list to R. The select list is the names or expressions specified in the SELECT clause, and R is the result of the previous operation of the subselect. For example, if the only clauses specified are SELECT, FROM, and WHERE, R is the result of that WHERE clause.
DISTINCT is not allowed if the select list contains a LOB or DATALINK column.
The list of names is established when the statement containing the SELECT clause is prepared. Therefore, * does not identify any columns that have been added to a table after the statement has been prepared.
Normally, when SQL statements are implicitly rebound, the list of names is not re-established. Therefore, the number of columns returned by the statement does not change. However, there are four cases where the list of names is established again and the number of columns can change:
The number of columns in the result of SELECT is the same as the number of expressions in the operational form of the select list (that is, the list established at prepare time), and cannot exceed 8000. The result of a subquery must be a single expression, unless the subquery is used in the EXISTS predicate.
The results of applying the select list to R depend on whether or not GROUP BY or HAVING is used:
In either case the nth column of the result contains the values specified by applying the nth expression in the operational form of the select list.
Result columns allow null values if they are derived from:
Each column of the result of SELECT acquires a data type from the expression from which it is derived.
When the expression is: | The data type of the result column is: |
---|---|
the name of any numeric column | the same as the data type of the column, with the same precision and scale for decimal columns. |
an integer constant | INTEGER or BIGINT (if the value of the constant is outside the range of INTEGER, but within the range of BIGINT). |
a decimal or floating-point constant | the same as the data type of the constant, with the same precision and scale for decimal constants. |
the name of any numeric variable | the same as the data type of the variable, with the same precision and scale for decimal variables. If the data type of the variable is not identical to an SQL data type (for example, DISPLAY SIGN LEADING SEPARATE in COBOL), the result column is decimal. |
an expression | the same as the data type of the result, with the same precision and scale for decimal results as described under Expressions. |
any function | the data type of the result of the function. For a built-in function, see Built-in functions to determine the data type of the result. For a user-defined function, the data type of the result is what was defined in the CREATE FUNCTION statement for the function. |
the name of any string column | the same as the data type of the column, with the same length attribute. |
the name of any string variable | the same as the data type of the variable, with a length attribute equal to the length of the variable. If the data type of the variable is not identical to an SQL data type (for example, a NUL-terminated string in C), the result column is a varying-length string. |
a character-string constant of length n | VARCHAR(n) |
a graphic-string constant of length n | VARGRAPHIC(n) |
the name of a datetime column, or an ILE RPG compiler or ILE COBOL compiler datetime host variable | the same as the data type of the column or variable. |
the name of a datalink column | a datalink, with the same length attribute. |
the name of a row ID column or a row ID variable | ROWID |
the name of a distinct type column | the same as the distinct type of the column, with the same length, precision, and scale attributes, if any. |
The FROM clause specifies an intermediate result table.
If only one table-reference is specified, the intermediate result table is simply the result of that table-reference. If more than one table-reference is specified in the FROM clause, the intermediate result table consists of all possible combinations of the rows of the specified table-references (the Cartesian product). Each row of the result is a row from the first table-reference concatenated with a row from the second table-reference, concatenated in turn with a row from the third, and so on. The number of rows in the result is the product of the number of rows in all the individual table-references.
>>-+-single-table------------+--------------------------------->< +-nested-table-expression-+ +-table-function----------+ '-joined-table------------' single-table: |--+-table-name-+--+--------------------+-----------------------| '-view-name--' '-correlation-clause-' nested-table-expression: |--+---------+--(--fullselect--+-----------------+--+--------------------+--)--> '-LATERAL-' '-order-by-clause-' '-fetch-first-clause-' >--correlation-clause-------------------------------------------| table-function: |--TABLE--(--function-invocation--)--correlation-clause---------| correlation-clause: .-AS-. |--+----+--correlation-name--+-----------------------+----------| | .-,-----------. | | V | | '-(----column-name-+--)-'
A table-reference specifies an intermediate result table.
If function-name is specified, the TABLE or LATERAL keyword is specified, or a table-reference identifies a distributed table, a table that has a read trigger, or logical file built over multiple physical file members; the query cannot contain:
The list of names in the FROM clause must conform to these rules:
Each correlation-name is defined as a designator of the intermediate result table specified by the immediately preceding table-reference. A correlation-name must be specified for nested table expressions and table functions.
The exposed names of all table references should be unique. An exposed name is:
Any qualified reference to a column for a table, view, nested table expression, or table function must use the exposed name. If the same table name or view name is specified twice, at least one specification should be followed by a correlation-name. The correlation-name is used to qualify references to the columns of the table or view. When a correlation-name is specified, column-names can also be specified to give names to the columns of the table-name, view-name, nested-table-expression or table-function. If a column list is specified, there must be a name in the column list for each column in the table or view and for each result column in the nested-table-expression or table-function. For more information, see Correlation names.
In general, nested-table-expressions and table-functions can be specified in any FROM clause. Columns from the nested table expressions and table functions can be referenced in the select list and in the rest of the subselect using the correlation name which must be specified. The scope of this correlation name is the same as correlation names for other table or view names in the FROM clause. A nested table expression can be used:
Correlated references can be used in nested-table-expressions. The basic rule that applies is that the correlated reference must be from a table-reference at a higher level in the hierarchy of subqueries. This hierarchy includes the table-references that have already been resolved in the left-to-right processing of the FROM clause. For nested table expressions, the TABLE or LATERAL keyword must appear before the fullselect. For more information see Column name qualifiers to avoid ambiguity
A table function can contain one or more correlated references to other tables in the same FROM clause if the referenced tables precede the reference in the left-to-right order of the tables in the FROM clause. The same capability exists for nested table expressions if the optional keyword TABLE or LATERAL is specified. Otherwise, only references to higher levels in the hierarchy of subqueries is allowed.
A nested table expression or table function that contains correlated references to other tables in the same FROM clause:
A nested table expression cannot contain a correlated reference to other tables in the same FROM clause when:
Syntax Alternatives: TABLE can be specified in place of LATERAL.
The following example is valid:
SELECT D.DEPTNO, D.DEPTNAME, EMPINFO.AVGSAL, EMPINFO.EMPCOUNT FROM DEPARTMENT D, (SELECT AVG(E.SALARY) AS AVGSAL,COUNT (*) AS EMPCOUNT FROM EMPLOYEE E WHERE E.WORKDEPT = (SELECT X.DEPTNO FROM DEPARTMENT X WHERE X.DEPTNO = E.WORKDEPT ) ) AS EMPINFO
The following example is not valid because the reference to D.DEPTNO in the WHERE clause of the nested-table-expression attempts to reference a table that is outside the hierarchy of subqueries:
SELECT D.DEPTNO, D.DEPTNAME, EMPINFO.AVGSAL, EMPINFO.EMPCOUNT ***INCORRECT*** FROM DEPARTMENT D, (SELECT AVG(E.SALARY) AS AVGSAL,COUNT (*) AS EMPCOUNT FROM EMPLOYEE E WHERE E.WORKDEPT = D.DEPTNO ) AS EMPINFO
The following example is valid because the reference to D.DEPTNO in the WHERE clause of the nested-table-expression references DEPT, which precedes the nested-table-expression and the LATERAL keyword was specified:
SELECT D.DEPTNO, D.DEPTNAME, EMPINFO.AVGSAL, EMPINFO.EMPCOUNT FROM DEPARTMENT D, LATERAL (SELECT AVG(E.SALARY) AS AVGSAL,COUNT (*) AS EMPCOUNT FROM EMPLOYEE E WHERE E.WORKDEPT = D.DEPTNO ) AS EMPINFO
The following example of a table function is valid:
SELECT t.c1, z.c5 FROM t, TABLE(tf3 (t.c2 ) ) AS z WHERE t.c3 = z.c4
The following example is not valid because the reference to t.c2 is for a table that is to the right of the table function in the FROM clause:
SELECT t.c1, z.c5 FROM TABLE(tf6 (t.c2 ) ) AS z, t ***INCORRECT*** WHERE t.c3 = z.c4
.-INNER----------------. |--+-table-reference--+-+----------------------+--JOIN--table-reference--+-ON--join-condition-----------+-+-+--| | | | .-OUTER-. | | .-,-----------. | | | | | +-+-LEFT--+--+-------+-+ | V | | | | | | | '-RIGHT-' | '-USING--(----column-name-+--)-' | | | | | .-LEFT--. | | | | | '-+-------+--EXCEPTION-' | | | | '-RIGHT-' | | | '-CROSS JOIN--table-reference-------------------------------------------------------' | '-(--joined-table--)-------------------------------------------------------------------------------------'
A joined-table specifies an intermediate result table that is the result of either an inner, outer, cross, or exception join. The table is derived by applying one of the join operators: INNER, LEFT OUTER, RIGHT OUTER, LEFT EXCEPTION, RIGHT EXCEPTION or CROSS to its operands.
If a join operator is not specified, INNER is implicit. The order in which multiple joins are performed can affect the result. Joins can be nested within other joins. The order of processing for joins is generally from left to right, but based on the position of the required join-condition or USING clause. Parentheses are recommended to make the order of nested joins more readable. For example:
TB1 LEFT JOIN TB2 ON TB1.C1=TB2.C1 LEFT JOIN TB3 LEFT JOIN TB4 ON TB3.C1=TB4.C1 ON TB1.C1=TB3.C1
is the same as
(TB1 LEFT JOIN TB2 ON TB1.C1=TB2.C1) LEFT JOIN (TB3 LEFT JOIN TB4 ON TB3.C1=TB4.C1) ON TB1.C1=TB3.C1
An inner join combines each row of the left table with every row of the right table keeping only the rows where the join-condition (or USING clause) is true. Thus, the result table may be missing rows of from either or both of the joined tables. Outer joins include the rows produced by the inner join as well as the missing rows, depending on the type of outer join. Exception joins include only the missing rows, depending on the type of exception join as follows:
A joined table can be used in any context in which any form of the SELECT statement is used. A view or a cursor is read-only if its SELECT statement includes a joined table.
The join-condition is a search-condition that must conform to these rules:
For any type of join, column references in an expression of the join-condition are resolved using the rules for resolution of column name qualifiers specified in Column names before any rules about which tables the columns must belong to are applied.
The USING clause specifies a shorthand way of defining the join condition. This form is known as a named-columns-join.
The USING clause is equivalent to a join-condition in which each column from the left table-reference is compared equal to a column of the same name in the right table-reference. For example, a named-columns-join of the form:
TB1 INNER JOIN TB2 USING (C1, C2, ... Cn)
is equivalent to:
TB1 INNER JOIN TB2 ON TB1.C1 = TB2.C1 AND TB1.C2 = TB2.C2 AND ... TB1.Cn = TB2.Cn
A join-condition (or USING clause) specifies pairings of T1 and T2, where T1 and T2 are the left and right operand tables of the JOIN operator of the join-condition (or USING clause). For all possible combinations of rows of T1 and T2, a row of T1 is paired with a row of T2 if the join-condition (or USING clause) is true. When a row of T1 is joined with a row of T2, a row in the result consists of the values of that row of T1 concatenated with the values of that row of T2. In the case of OUTER joins, the execution might involve the generation of a null row. The null row of a table consists of a null value for each column of the table, regardless of whether the columns allow null values.
Using the INNER JOIN syntax with a join-condition (or USING clause) will produce the same result as specifying the join by listing two tables in the FROM clause separated by commas and using the where-clause to provide the condition.
The WHERE clause specifies an intermediate result table that consists of those rows of R for which the search-condition is true. R is the result of the FROM clause of the statement.
The search-condition must conform to the following rules:
Any subquery in the search-condition is effectively executed for each row of R and the results are used in the application of the search-condition to the given row of R. A subquery is executed for each row of R if it includes a correlated reference to a column of R. A subquery with no correlated reference is typically executed just once.
If a sort sequence other than *HEX is in effect when the statement that contains the WHERE clause is executed and if the search-condition contains operands that are SBCS data, mixed data, or Unicode data, then the comparison for those predicates is done using weighted values. The weighted values are derived by applying the sort sequence to the operands of the predicate.
The GROUP BY clause specifies an intermediate result table that consists of a grouping of the rows of R. R is the result of the previous clause of the subselect.
A grouping-expression is an expression that defines the grouping of R. The following restrictions apply to grouping-expression.
The result of the GROUP BY clause is a set of groups of rows. In each group of more than one row, all values of each grouping-expression are equal, and all rows with the same set of values of the grouping-expressions are in the same group. For grouping, all null values for a grouping-expression are considered equal.
Because every row of a group contains the same value of any grouping-expression, grouping-expressions can be used in a search condition in a HAVING clause, in the SELECT clause, or in a sort-key-expression of an ORDER BY clause (see order-by-clause for details). In each case, the reference specifies only one value for each group. The grouping-expression specified in these clauses must exactly match the grouping-expression in the GROUP BY clause, except that blanks are not significant. For example, a grouping-expression of
SALARY*.10
will match the expression in a having-clause of
HAVING SALARY*.10
but will not match
HAVING .10 *SALARY or HAVING (SALARY*.10)+100
If the grouping-expression contains varying-length strings with trailing blanks, the values in the group can differ in the number of trailing blanks and may not all have the same length. In that case, a reference to the grouping-expression still specifies only one value for each group, but the value for a group is chosen arbitrarily from the available set of values. Thus, the actual length of the result value is unpredictable.
The number of grouping-expressions must not exceed 120 and the sum of their length attributes must not exceed 32766-n bytes, where n is the number of grouping-expressions specified that allow nulls.
If a sort sequence other than *HEX is in effect when the statement that contains the GROUP BY clause is executed, and the grouping-expressions are SBCS data, mixed data, or Unicode data, then the rows are placed into groups using the weighted values. The weighted values are derived by applying the sort sequence to the grouping-expressions.
The HAVING clause specifies an intermediate result table that consists of those groups of R for which the search-condition is true. R is the result of the previous clause of the subselect. If this clause is not GROUP BY, R is considered a single group with no grouping expressions.
Each expression that contains a column-name in the search condition must do one of the following:
The RRN, DATAPARTITIONNAME, DATAPARTITIONNUM, DBPARTITIONNAME, DBPARTITIONNUM, and HASHED_VALUE functions cannot be specified in the HAVING clause unless it is within an aggregate function. See "Functions" in Chapter 3 for restrictions that apply to the use of aggregate functions.
A group of R to which the search condition is applied supplies the argument for each aggregate function in the search condition, except for any function whose argument is a correlated reference.
If the search condition contains a subquery, the subquery can be thought of as being executed each time the search condition is applied to a group of R, and the results used in applying the search condition. In actuality, the subquery is executed for each group only if it contains a correlated reference. For an illustration of the difference, see examples 6 and 7 under Examples of a subselect.
A correlated reference to a group of R must either identify a grouping column or be contained within an aggregate function.
When HAVING is used without GROUP BY, any column name in the select list must appear within an aggregate function.
If a sort sequence other than *HEX is in effect when the statement that contains the HAVING clause is executed and if the search-condition contains operands that have SBCS data, mixed data, or Unicode data, the comparison for those predicates is done using weighted values. The weighted values are derived by applying the sort sequence to the operands in the predicate.
Select all columns and rows from the EMPLOYEE table.
SELECT * FROM EMPLOYEE
Join the EMPPROJACT and EMPLOYEE tables, select all the columns from the EMPPROJACT table and add the employee's surname (LASTNAME) from the EMPLOYEE table to each row of the result.
SELECT EMPPROJACT.*, LASTNAME FROM EMPPROJACT, EMPLOYEE WHERE EMPPROJACT.EMPNO = EMPLOYEE.EMPNO
Join the EMPLOYEE and DEPARTMENT tables, select the employee number (EMPNO), employee surname (LASTNAME), department number (WORKDEPT in the EMPLOYEE table and DEPTNO in the DEPARTMENT table) and department name (DEPTNAME) of all employees who were born (BIRTHDATE) earlier than 1930.
SELECT EMPNO, LASTNAME, WORKDEPT, DEPTNAME FROM EMPLOYEE, DEPARTMENT WHERE WORKDEPT = DEPTNO AND YEAR(BIRTHDATE) < 1930
This subselect could also be written as follows:
SELECT EMPNO, LASTNAME, WORKDEPT, DEPTNAME FROM EMPLOYEE INNER JOIN DEPARTMENT ON WORKDEPT = DEPTNO WHERE YEAR(BIRTHDATE) < 1930
Select the job (JOB) and the minimum and maximum salaries (SALARY) for each group of rows with the same job code in the EMPLOYEE table, but only for groups with more than one row and with a maximum salary greater than or equal to 27000.
SELECT JOB, MIN(SALARY), MAX(SALARY) FROM EMPLOYEE GROUP BY JOB HAVING COUNT(*) > 1 AND MAX(SALARY) >= 27000
Select all the rows of EMPPROJACT table for employees (EMPNO) in department (WORKDEPT) 'E11'. (Employee department numbers are shown in the EMPLOYEE table.)
SELECT * FROM EMPPROJACT WHERE EMPNO IN (SELECT EMPNO FROM EMPLOYEE WHERE WORKDEPT = 'E11')
From the EMPLOYEE table, select the department number (WORKDEPT) and maximum departmental salary (SALARY) for all departments whose maximum salary is less than the average salary for all employees.
SELECT WORKDEPT, MAX(SALARY) FROM EMPLOYEE GROUP BY WORKDEPT HAVING MAX(SALARY) < (SELECT AVG(SALARY) FROM EMPLOYEE)
The subquery in the HAVING clause would only be executed once in this example.
Using the EMPLOYEE table, select the department number (WORKDEPT) and maximum departmental salary (SALARY) for all departments whose maximum salary is less than the average salary in all other departments.
SELECT WORKDEPT, MAX(SALARY) FROM EMPLOYEE EMP_COR GROUP BY WORKDEPT HAVING MAX(SALARY) < (SELECT AVG(SALARY) FROM EMPLOYEE WHERE NOT WORKDEPT = EMP_COR.WORKDEPT)
In contrast to example 6, the subquery in the HAVING clause would need to be executed for each group.
Join the EMPLOYEE and EMPPROJACT tables, select all of the employees and their project numbers. Return even those employees that do not have a project number currently assigned.
SELECT EMPLOYEE.EMPNO, PROJNO FROM EMPLOYEE LEFT OUTER JOIN EMPPROJACT ON EMPLOYEE.EMPNO = EMPPROJACT.EMPNO
Any employee in the EMPLOYEE table that does not have a project number in the EMPPROJACT table will return one row in the result table containing the EMPNO value and the null value in the PROJNO column.