UDF concepts

This topic is a discussion of the important concepts you need to know before coding UDFs.

Types of function

There are several types of functions:

Built-in. These are functions provided by and shipped with the database. SUBSTR() is an example.
System-generated. These are functions implicitly generated by the database engine when a DISTINCT TYPE is created. These functions provide casting operations between the DISTINCT TYPE and its base type.
User-defined. These are functions created by users and registered to the database.

In addition, each function can be further classified as a scalar, column, or table function.

A scalar function returns a single value answer each time it is called. For example, the built-in function SUBSTR() is a scalar function, as are many built-in functions. System-generated functions are always scalar functions. Scalar UDFs can either be external (coded in a programming language such as C), written in SQL, or sourced (using the implementation of an existing function).

A column function receives a set of like values (a column of data) and returns a single value answer from this set of values. These are also called aggregating functions in DB2^®. Some built-in functions are column functions. An example of a column function is the built-in function AVG(). An external UDF cannot be defined as a column function. However, a sourced UDF is defined to be a column function if it is sourced on one of the built-in column functions. The latter is useful for distinct types. For example, if a distinct type SHOESIZE exists that is defined with base type INTEGER, you can define a UDF, AVG(SHOESIZE), as a column function sourced on the existing built-in column function, AVG(INTEGER).

A table function returns a table to the SQL statement that references it. It must be referenced in the FROM clause of a SELECT. A table function can be used to apply SQL language processing power to data that is not DB2 data, or to convert such data into a DB2 table. It can, for example, take a file and convert it to a table, sample data from the World Wide Web and tabularize it, or access a Lotus Notes^® database and return information about mail messages, such as the date, sender, and the text of the message. This information can be joined with other tables in the database. A table function can be defined as a external function or an SQL function; it cannot be defined as a sourced function.

Full name of a function

The full name of a function using *SQL naming is <schema-name>.<function-name>.

The full name of a function in *SYS naming is <schema-name>/<function-name>. Function names cannot be qualified using *SYS naming in DML statements.

You can use this full name anywhere you refer to a function. For example:

     QGPL.SNOWBLOWER_SIZE    SMITH.FOO    QSYS2.SUBSTR    QSYS2.FLOOR

However, you may also omit the <schema-name>., in which case, DB2 must determine the function to which you are referring. For example:

     SNOWBLOWER_SIZE    FOO    SUBSTR    FLOOR

Path

The concept of path is central to DB2's resolution of unqualified references that occur when schema-name is not specified. The path is an ordered list of schema names that is used for resolving unqualified references to UDFs and UDTs. In cases where a function reference matches a function in more than one schema in the path, the order of the schemas in the path is used to resolve this match. The path is established by means of the SQLPATH option on the precompile commands for static SQL. The path is set by the SET PATH statement for dynamic SQL. When the first SQL statement that runs in an activation group runs with SQL naming, the path has the following default value:

     "QSYS","QSYS2","<ID>"

This applies to both static and dynamic SQL, where <ID> represents the current statement authorization ID.

When the first SQL statement in an activation group runs with system naming, the default path is *LIBL.

Overloaded function names

Function names can be overloaded. Overloaded means that multiple functions, even in the same schema, can have the same name. Two functions cannot, however, have the same signature. A function signature is the qualified function name and the data types of all the function parameters in the order in that they are defined.

Function resolution

It is the function resolution algorithm that takes into account the facts of overloading and function path to choose the best fit for every function reference, whether it is a qualified or an unqualified reference. All functions, even built-in functions, are processed through the function selection algorithm. The function resolution algorithm does not take into account the type of a function. So a table function may be resolved to as the best fit function, even though the usage of the reference requires an scalar function, or vice versa.

Length of time that the UDF runs

UDFs are called from within an SQL statement execution, which is normally a query operation that potentially runs against thousands of rows in a table. Because of this, the UDF needs to be called from a low level of the database.

As a consequence of being called from such a low level, there are certain resources (locks and seizes) being held at the time the UDF is called and for the duration of the UDF execution. These resources are primarily locks on any tables and indexes involved in the SQL statement that is calling the UDF. Due to these held resources, it is important that the UDF not perform operations that may take an extended period of time (minutes or hours). Because of the critical nature of holding resources for long periods of time, the database only waits for a certain period of time for the UDF to finish. If the UDF does not finish in the time allocated, the SQL statement calling the UDF will fail.

The default UDF wait time used by the database should be more than sufficient to allow a normal UDF to run to completion. However, if you have a long running UDF and want to increase the wait time, this can be done using the UDF_TIME_OUT option in the query INI file. Note, however, that there is a maximum time limit that the database will not exceed, regardless of the value specified for UDF_TIME_OUT.

Since resources are held while the UDF is run, it is important that the UDF not operate on the same tables or indexes allocated for the original SQL statement or, if it does, that it does not perform an operation that conflicts with the one being performed in the SQL statement. Specifically, the UDF should not try to perform any insert, update, or delete row operation on those tables.