The Filter editor is used to define how data is modified by a Filter bean or Translate filter. The Filter editor creates a translate template for each field. You select a filter operator from a list appropriate for the data type, and provide any required parameters for the operator. Operators range from simple mathematical functions such as round to more complex functions such as scale and symbol conversion. Data type conversion may be specified in a translate template too. When a Filter bean is defined, you must specify a set of Translate templates, either programmatically or by using the Filter editor.
The translation process consists of three steps.
The Translate filter supports the following logical data types:
Data Type | Description |
---|---|
Binary code | A standard binary representation for integers. In the simplest case, a binary code of length 1 is 0 or 1 and maps to the integers 0 and 1. If a binary code has length 8, it can represent 256 unique non-negative integer values ranging from 0 to 255. |
One-of-N code | A set of zeros with a single value of one. The position of the one in the string indicates its value or category. For example, with a length of 4, the value 3 is represented as 0 1 0 0. |
Number | A numeric field with values represented by a 4-byte floating-point number. |
Symbol | A character token delimited by spaces. The maximum length allowed is 25 characters. No embedded spaces are allowed. |
Thermometer code | A binary string with the number of ones equal to the value or category. For example, with a length of 4, the value 3 is represented as 0 1 1 1. |
Vector | A one-dimensional array (vector) of real values. Each vector element is processed as a Number field. |
One of the key issues in developing neural networks is the presentation of the training data to the network. Most neural network models require your data to be converted to a range of 0.0 to 1.0, or, for symmetric networks, -0.5 to +0.5. Data representation influences network accuracy, network size, and the time required to train the network. You can also select a data representation to increase the influence of a parameter that you know is significantly more important than others in your data.
Following an import bean, a filter bean is normally used to
convert your raw data into a format the network can best utilize.
As an example, assume your input data contains a character
variable with five possible states called A, B, C, D, and E. The
following table shows several alternative data representations
that could be selected.
Data Representation Comparison
Category | Number | One-of-N Code | Thermometer Code | Binary Code |
---|---|---|---|---|
A | 0.1 | 0 0 0 0 1 | 0 0 0 0 1 | 0 0 0 |
B | 0.3 | 0 0 0 1 0 | 0 0 0 1 1 | 0 0 1 |
C | 0.5 | 0 0 1 0 0 | 0 0 1 1 1 | 0 1 0 |
D | 0.7 | 0 1 0 0 0 | 0 1 1 1 1 | 0 1 1 |
E | 0.9 | 1 0 0 0 0 | 1 1 1 1 1 | 1 0 0 |
The Number representation uses a single network input, which is the minimum possible. This keeps the network size small, but it is more difficult for the network to learn, and consequently training time is longer. If there are relationships between the categories, a numeric representation implies that the categories are sorted according to those relationships. If the parameter to be represented is of relatively low importance and network size is a concern, the numeric representation could be used.
The One-of-N Code representation is shown with a length of five since there are five categories. It will create larger networks, since there are now five network inputs compared to one with the number data type. More inputs mean that this parameter will have more connections in the network and enhance the importance of this parameter in the results obtained. Also, the network will learn more easily, and train more quickly. Note, though, that four of these inputs are always zero. Use this data type for categories where there are no relationships between the values. For example, A-E could represent broad occupational categories such as Technical, Management, Sales, Manufacturing, and Administrative. This is the data type chosen when a filter is generated for a categorical field.
The Thermometer Code representation is similar to the One-of-N but implies a relationship between the categories. For example, A-E could represent a scale from best to worst.
The Binary Code is shown with a length of 3 as that is the smallest size needed to represent five separate categories. Up to eight categories could be represented with a length of 3 (2**3 = 8), 16 with a length of 4, and so on. If there are more than ten categories, use a binary representation rather than Thermometer or One-of-N. In this five-category example, binary code saves just two network inputs. Had there been 16 different states, only four network inputs would represent those states in binary code, while sixteen would have been needed for Thermometer or One-of-N.
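The input counts discussed above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the Filter editor; the function name `inputs_needed` and the dictionary keys are hypothetical and chosen only for illustration.

```python
def inputs_needed(n_categories: int) -> dict:
    """Count the network inputs each representation needs for n categories."""
    if n_categories < 1:
        raise ValueError("need at least one category")
    return {
        "number": 1,                  # one scaled value per field
        "one_of_n": n_categories,     # one input per category
        "thermometer": n_categories,  # one input per category
        # Smallest bit count b with 2**b >= n_categories (at least one bit).
        "binary": max(1, (n_categories - 1).bit_length()),
    }


for n in (5, 8, 16):
    print(n, inputs_needed(n))
```

For five categories the sketch reports three binary inputs against five for One-of-N or Thermometer; for sixteen categories it reports four against sixteen, matching the figures in the text.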
Just as your raw data is translated for input to a neural network, the network's output may need to be translated back from its output range to symbols. Typically this is the mirror image of the translation used to provide the input. For example, if you want your answers back in the categories A through E and selected the numeric representation, the output translation must map values near 0.1 to A, values near 0.3 to B, and so on. Because the network rarely outputs these values exactly, first use the Threshold operator to convert numbers from 0.0 to 0.2 to the integer 1, 0.2 to 0.4 to the integer 2, and so on; then define a Translate table to convert the integers to alphabetic categories.
You may wish to artificially increase the number of inputs to emphasize a parameter such as gender. While a binary field of length one could be used to represent male and female, increasing the length will create more network nodes and connections between this attribute and other parameters in your data. Making the length four, and mapping male to 0000 and female to 1111, quadruples the number of connections and provides a stronger distinction between records which differ in this value.
Data representation is a key consideration in designing neural network applications. Some experimentation should be expected to determine the best representation based on the results from both the network output and training experience.
The following operators are supported in the Translate filter for use with the Number and Vector data types:
Operator | Description |
---|---|
Bit AND | Performs a Bit AND operation. |
Bit COMP | Performs a Bit Complement operation. |
Bit OR | Performs a Bit OR operation. |
Bit XOR | Performs a Bit XOR (exclusive-OR) operation. |
Acos | Returns the arccosine of the data. |
Add | Adds a value you specify to the data. |
Asin | Returns the arcsine of the data. |
Atan | Returns the arctangent of the data. |
Cos | Returns the cosine of the data. |
Cosh | Returns the hyperbolic cosine of the data. |
Sin | Returns the sine of the data. |
Sinh | Returns the hyperbolic sine of the data. |
Tan | Returns the tangent of the data. |
Tanh | Returns the hyperbolic tangent of the data. |
Abs | Returns the absolute value of the data. |
Ceil | Returns the smallest integer that is greater than or equal to the data value. |
Discretize | Returns the index of a set of value ranges which include the data value. |
Div | Returns the integer portion after a division of integers. |
Exp | Returns the exponential function of the data. |
Exp10 | Returns the base 10 exponential of the data (10 raised to the power of the data value). |
Floor | Returns the value representing the largest integer that is less than or equal to the data value. |
Log | Returns the natural logarithm of the data. |
Log10 | Returns the base 10 logarithm of the data. |
Modulo | Returns the remainder after a division of integers. |
None | No operation is performed on the data. This is the default. |
Power | Raise the data value to a power you specify. |
PowerN | Raises the mantissa (base) value you define to the power supplied by the data. |
Round | Round a real-valued number to the closest integer value. |
Scale | Scales data in a piece-wise linear fashion around a midpoint you define. |
Sqrt | Returns the square root of the data. |
Square | Returns the square of the data. |
Threshold | Sets numeric data in specified ranges to specific values. A threshold operator has one or more ranges, each with a lower boundary, an upper boundary, and a set-to-value. When an input value is greater than a lower boundary and less than or equal to an upper boundary, the set-to-value is the output of the threshold operator. A value that does not fall within one of the ranges is unchanged. If the boundaries overlap, the first boundary pair satisfied is selected. |
Truncate | Truncates a real number to its integer value by dropping any fractional part. |
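The Threshold rules in the table (exclusive lower boundary, inclusive upper boundary, first matching range wins, unmatched values pass through) can be sketched as follows. This is an illustrative reading of the description, not the product's implementation; the function name `threshold` and the tuple layout are assumptions. The ranges reuse the 0.0 to 0.2, 0.2 to 0.4 bins from the category example earlier in this section.

```python
from typing import List, Tuple

# Hypothetical layout for one range: (lower boundary, upper boundary, set-to-value).
Range = Tuple[float, float, float]


def threshold(value: float, ranges: List[Range]) -> float:
    """Return the set-to-value of the first range with lower < value <= upper."""
    for lower, upper, set_to in ranges:
        if lower < value <= upper:
            return set_to
    return value  # no range matched: the value is unchanged


bins = [(0.0, 0.2, 1), (0.2, 0.4, 2), (0.4, 0.6, 3), (0.6, 0.8, 4), (0.8, 1.0, 5)]
print(threshold(0.33, bins))  # falls in the second range
print(threshold(0.2, bins))   # upper boundary is inclusive, so the first range matches
print(threshold(1.5, bins))   # outside every range, returned unchanged

overlapping = [(0.0, 0.5, 10), (0.3, 1.0, 20)]
print(threshold(0.4, overlapping))  # both ranges match; the first one is selected
```

Note that under these rules an input of exactly 0.0 is not greater than the first lower boundary, so it passes through unchanged; choose a lower boundary below your smallest expected value if that matters.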
The following operators are valid only on Vector data types:
Operator | Description |
---|---|
NormL1 | Divide each element of the vector by the sum of all of the elements. |
NormL2 | Performs a Euclidean normalization of the vector field. |
NormMax | Divide each element of the vector by the largest element in the vector. |
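As a rough illustration of the three vector operators, the sketch below follows the table's wording literally: NormL1 divides by the sum of the elements, NormL2 by the Euclidean length, and NormMax by the largest element. The function names are hypothetical, and returning the vector unchanged when the divisor is zero is an assumption; the source does not say how that case is handled.

```python
import math
from typing import List


def _divide(vector: List[float], divisor: float) -> List[float]:
    # Assumption: a zero divisor leaves the vector unchanged.
    if divisor == 0:
        return list(vector)
    return [x / divisor for x in vector]


def norm_l1(vector: List[float]) -> List[float]:
    """Divide each element by the sum of all elements."""
    return _divide(vector, sum(vector))


def norm_l2(vector: List[float]) -> List[float]:
    """Divide each element by the Euclidean length of the vector."""
    return _divide(vector, math.sqrt(sum(x * x for x in vector)))


def norm_max(vector: List[float]) -> List[float]:
    """Divide each element by the largest element."""
    return _divide(vector, max(vector)) if vector else []


print(norm_l1([1.0, 1.0, 2.0]))
print(norm_l2([3.0, 4.0]))
print(norm_max([1.0, 2.0, 4.0]))
```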
The following operators are valid only on Symbol data types:
Operator | Description |
---|---|
LowerCase | Forces all characters in a symbol field to lowercase. |
Translate | Substitutes a defined output string for the input strings defined in the translation table. If the input string is not found, the default output is returned. The input data type can be symbol or numeric, which is converted internally to symbol. The output data type is always symbol. |
UpperCase | Forces all characters in a symbol field to uppercase. |
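The Translate operator behaves like a table lookup with a default. The sketch below is one way to picture it; the function name `translate` is hypothetical, and using Python's `str()` to convert numeric input to a symbol is an assumption, since the source does not describe the internal number formatting.

```python
from typing import Dict, Union


def translate(value: Union[str, int, float], table: Dict[str, str], default: str) -> str:
    """Look up the input in the translation table; fall back to the default output."""
    # Numeric input is converted to a symbol first (formatting here is an assumption).
    key = value if isinstance(value, str) else str(value)
    return table.get(key, default)


# Integers 1 to 5 mapped back to the categories A to E.
categories = {"1": "A", "2": "B", "3": "C", "4": "D", "5": "E"}

print(translate(2, categories, "?"))
print(translate("5", categories, "?"))
print(translate(9, categories, "?"))
```

A table like `categories` is the second half of the output translation described earlier: a Threshold step produces the integers, and Translate turns them into symbols.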
The following operators are valid only on the One-of-N data type:
Operator | Description |
---|---|
Max | Examines the number of elements given by the source size from the input buffer and returns the position of the element that has the highest value. The following examples show the result given a source size of five for the field elements shown. Data: 0.4 0.3 0.5 0.1 0.7, One-of-N code: 0 0 0 0 1, Max field value: 1. Data: 0.4 0.1 0.9 0.9 0.7, One-of-N code: 0 0 1 0 0, Max field value: 3. Data: 0.0 0.0 0.0 0.0 0.0, One-of-N code: 0 0 0 0 0, Max field value: 0. |
Min | Examines the number of elements given by the source size from the input buffer and returns the position of the element that has the lowest value. The following examples show the result given a source size of five for the field elements shown. Data: 0.4 0.3 0.5 0.1 0.7, One-of-N code: 0 0 0 1 0, Min field value: 2. Data: 0.4 0.1 0.9 0.9 0.7, One-of-N code: 0 1 0 0 0, Min field value: 4. Data: 0.0 0.0 0.0 0.0 0.0, One-of-N code: 0 0 0 0 0, Min field value: 0. |
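The Max and Min examples can be reproduced with the sketch below. The rules it encodes are inferred from those examples rather than stated in the source: positions are counted from the rightmost element starting at 1, the leftmost element wins a tie, and an all-zero buffer yields a field value of 0. The function names are hypothetical.

```python
from typing import Callable, List


def _extreme_field_value(data: List[float], pick: Callable) -> int:
    # Inferred from the examples: an all-zero (or empty) buffer gives field value 0.
    if all(x == 0 for x in data):
        return 0
    index = data.index(pick(data))  # leftmost occurrence of the extreme value
    return len(data) - index        # position counted from the right, starting at 1


def max_field_value(data: List[float]) -> int:
    return _extreme_field_value(data, max)


def min_field_value(data: List[float]) -> int:
    return _extreme_field_value(data, min)


for row in ([0.4, 0.3, 0.5, 0.1, 0.7],
            [0.4, 0.1, 0.9, 0.9, 0.7],
            [0.0, 0.0, 0.0, 0.0, 0.0]):
    print(row, max_field_value(row), min_field_value(row))
```

The three rows print Max/Min field values of 1 and 2, 3 and 4, and 0 and 0, the same as the table. Inputs not covered by the examples (for instance a buffer that mixes zeros with positive values under Min) may behave differently in the product.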
The source data is converted into the destination logical type data using the predefined conversion operations shown in the following table:
Source \ Destination | Binary Code | Number | One-of-N Code | Symbol | Thermometer Code | Vector |
---|---|---|---|---|---|---|
Binary Code | None | (icon) | (icon) | (icon) | (icon) | (icon) |
Number | (icon) | None | (icon) | (icon) | (icon) | Invalid |
One-of-N Code | (icon) | (icon) | None | (icon) | (icon) | (icon) |
Symbol | (icon) | (icon) | (icon) | None | (icon) | Invalid |
Thermometer Code | (icon) | (icon) | (icon) | (icon) | None | (icon) |
Vector | (icon) | Invalid | (icon) | Invalid | (icon) | None |
Symbol | Name | Description |
---|---|---|
(icon) | One to One Mapping | When converting from a code type to vector, the data passes through without being transformed. When converting from a vector to a code type (code types are binary supporting 1's and 0's only), any non-zero vector element becomes a 1 in the code. |
(icon) | Conversion Tables | A conversion table contains entries defining symbols and corresponding numeric values. When a conversion is required, the table is searched for a match. When the match is found, the corresponding number or symbol is produced as the output. If the input is not matched, the default output is the result. |
(icons) | Field Values | If a source or destination data type is a code type (One-of-N, binary, or thermometer), the code data is converted using a single numeric value called a field value. |
Here is an example of field values ranging from 0 to 5 and the corresponding codes:
Field Value | Binary Code | One-of-N Code | Thermometer Code |
---|---|---|---|
0 | 0, 0, 0, 0 | 0, 0, 0, 0 | 0, 0, 0, 0 |
1 | 0, 0, 0, 1 | 0, 0, 0, 1 | 0, 0, 0, 1 |
2 | 0, 0, 1, 0 | 0, 0, 1, 0 | 0, 0, 1, 1 |
3 | 0, 0, 1, 1 | 0, 1, 0, 0 | 0, 1, 1, 1 |
4 | 0, 1, 0, 0 | 1, 0, 0, 0 | 1, 1, 1, 1 |
5 | 0, 1, 0, 1 | - | 1, 1, 1, 1 |
When a field value less than or equal to 0 is converted to a One-of-N code, the code is represented by all zeros. Also, field values greater than the length of the One-of-N code are represented by zeros.
When a field value less than or equal to 0 is converted to a thermometer code, the code is represented by all zeros. Field values greater than the thermometer code length are set to all ones.
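The field value table and the two out-of-range rules above can be sketched in code. This is an illustrative reconstruction, not the product's converter: the function names are hypothetical, and raising an error for a field value that does not fit in the binary code is an assumption, because the source describes out-of-range behavior only for One-of-N and thermometer codes.

```python
from typing import List


def binary_code(field_value: int, length: int) -> List[int]:
    """Most significant bit first. Out-of-range handling is an assumption."""
    if not 0 <= field_value < 2 ** length:
        raise ValueError("field value does not fit in the binary code")
    return [(field_value >> shift) & 1 for shift in range(length - 1, -1, -1)]


def one_of_n_code(field_value: int, length: int) -> List[int]:
    """A single one, counted from the right; all zeros if out of range."""
    code = [0] * length
    if 1 <= field_value <= length:
        code[length - field_value] = 1
    return code


def thermometer_code(field_value: int, length: int) -> List[int]:
    """As many ones as the field value, filled from the right and capped at length."""
    ones = min(max(field_value, 0), length)
    return [0] * (length - ones) + [1] * ones


for value in range(6):
    print(value, binary_code(value, 4), one_of_n_code(value, 4), thermometer_code(value, 4))
```

Run with a length of 4, the loop prints the same rows as the table, with an all-zero One-of-N code for field value 5 as the rule above describes.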
The Menu Bar consists of these submenus:
The File menu contains menu items that apply to the container agent.
The Edit menu contains menu items that apply to a selected bean or to the clipboard.
The Template view is the only view currently supported.
The Help menu opens your browser with one of these topics:
The Template table consists of one row for each field in the data buffer. Each template row contains these values: