Filter Editor

The Filter editor is used to define how data is modified by a Filter bean or Translate filter. The Filter editor creates a translate template for each field. You select a filter operator from a list appropriate for the data type and provide any parameters the operator requires. Operators range from simple mathematical functions such as Round to more complex functions such as Scale and symbol conversion. A translate template can also specify a data type conversion. When a Filter bean is defined, you must specify a set of translate templates, either programmatically or by using the Filter editor.

Translation Process

The translation process consists of three steps.

  1. Input data of a logical data type is processed by an optional pre-operator.
  2. This data is then converted to the destination logical data type.
  3. The destination data is processed by an optional post-operator before it is output.
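The three steps can be modeled as a composition of three functions. This is a minimal sketch, not the Translate filter's API: the class name TranslateSketch and the use of double[] for every data type are assumptions for illustration.

```java
import java.util.function.UnaryOperator;

/** Hypothetical model of one translate template: pre-operator, conversion, post-operator. */
final class TranslateSketch {
    private final UnaryOperator<double[]> preOperator;
    private final UnaryOperator<double[]> conversion;
    private final UnaryOperator<double[]> postOperator;

    TranslateSketch(UnaryOperator<double[]> preOperator,
                    UnaryOperator<double[]> conversion,
                    UnaryOperator<double[]> postOperator) {
        this.preOperator = orNone(preOperator);
        this.conversion = orNone(conversion);
        this.postOperator = orNone(postOperator);
    }

    // A missing step behaves like the "None" operator: the data passes through unchanged.
    private static UnaryOperator<double[]> orNone(UnaryOperator<double[]> op) {
        if (op == null) {
            return data -> data;
        }
        return op;
    }

    double[] translate(double[] input) {
        double[] afterPre = preOperator.apply(input);      // step 1: optional pre-operator
        double[] converted = conversion.apply(afterPre);   // step 2: convert to the destination type
        return postOperator.apply(converted);              // step 3: optional post-operator
    }
}
```

For example, passing an absolute-value pre-operator, no conversion, and a floor post-operator turns -2.5 into 2.0.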

Data Types

The Translate filter supports the following logical data types:

Data Type Description
Binary code A standard binary representation for integers. In the simplest case, a binary code of length 1 is 0 or 1 and maps to the integers 0 and 1. A binary code of length 8 can represent 256 unique non-negative integer values, ranging from 0 to 255.
One-of-N code A set of zeros with a single value of one. The position of the one in the string indicates its value or category. For example:
    0, 0, 0, 1 represents red
    0, 0, 1, 0 represents green
    0, 1, 0, 0 represents blue
    1, 0, 0, 0 represents yellow 
Number A numeric field with values represented by a 4-byte floating-point number.
Symbol A character token delimited by spaces. The maximum length allowed is 25 characters. No embedded spaces are allowed.
Thermometer code A binary string with the number of ones equal to the value or category. For example:
    0, 0, 0, 1 represents fair
    0, 0, 1, 1 represents okay
    0, 1, 1, 1 represents good
    1, 1, 1, 1 represents best 
Vector A one-dimensional array (vector) of real values. Each vector element is processed as a Number field.
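The Symbol rules above (at most 25 characters, no embedded spaces) are simple enough to check before data reaches a filter. The following is a minimal sketch; the class name SymbolRules and the decision to reject empty tokens are assumptions, not part of the Translate filter.

```java
/** Hypothetical check of the Symbol data type rules. */
final class SymbolRules {
    static final int MAX_SYMBOL_LENGTH = 25;

    /** True if the token could be a Symbol value: 1 to 25 characters and no embedded spaces. */
    static boolean isValidSymbol(String token) {
        return token != null
            && !token.isEmpty()                      // assumption: an empty token is not a symbol
            && token.length() <= MAX_SYMBOL_LENGTH
            && token.indexOf(' ') < 0;
    }
}
```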

Using Data Types

One of the key issues in developing neural networks is the presentation of the training data to the network. Most neural network models require your data to be converted to a range of 0.0 to 1.0, or, for symmetric networks, -0.5 to +0.5. Data representation influences network accuracy, network size, and the time required to train the network. You can also choose a data representation that increases the influence of a parameter you know to be significantly more important than the others in your data.

Following an import bean, a filter bean is normally used to convert your raw data into a format the network can best utilize. As an example, assume your input data contains a character variable with five possible states called A, B, C, D, and E. The following table shows several alternative data representations that could be selected.

Data Representation Comparison

Category  Number  One-of-N Code  Thermometer Code  Binary Code
A  0.1  0, 0, 0, 0, 1  0, 0, 0, 0, 1  0, 0, 0
B  0.3  0, 0, 0, 1, 0  0, 0, 0, 1, 1  0, 0, 1
C  0.5  0, 0, 1, 0, 0  0, 0, 1, 1, 1  0, 1, 0
D  0.7  0, 1, 0, 0, 0  0, 1, 1, 1, 1  0, 1, 1
E  0.9  1, 0, 0, 0, 0  1, 1, 1, 1, 1  1, 0, 0
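The comparison table can be regenerated for any number of categories. This sketch assumes the pattern visible in the table: the Number value is the midpoint of one of N equal slices of the range 0.0 to 1.0, One-of-N and thermometer positions are counted from the right, and the binary code holds the zero-based category index. The class and method names are hypothetical.

```java
/** Hypothetical helper that rebuilds one row of the comparison table for category index i of n. */
final class RepresentationTable {
    /** Number column: midpoint of the i-th of n equal slices of [0, 1] (assumed rule). */
    static double number(int i, int n) {
        return (2.0 * i + 1.0) / (2.0 * n);
    }

    /** One-of-N column: a single 1 at position i + 1, counted from the right. */
    static String oneOfN(int i, int n) {
        StringBuilder sb = new StringBuilder();
        for (int pos = n; pos >= 1; pos--) {
            sb.append(pos == i + 1 ? '1' : '0');
        }
        return sb.toString();
    }

    /** Thermometer column: i + 1 ones, filled from the right. */
    static String thermometer(int i, int n) {
        StringBuilder sb = new StringBuilder();
        for (int pos = n; pos >= 1; pos--) {
            sb.append(pos <= i + 1 ? '1' : '0');
        }
        return sb.toString();
    }

    /** Binary column: the zero-based index i written with the given number of bits. */
    static String binary(int i, int bits) {
        StringBuilder sb = new StringBuilder();
        for (int b = bits - 1; b >= 0; b--) {
            sb.append(((i >> b) & 1) == 1 ? '1' : '0');
        }
        return sb.toString();
    }
}
```

Row D, for instance, is number(3, 5), oneOfN(3, 5), thermometer(3, 5), and binary(3, 3).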

The Number representation uses a single input, the minimum possible. This keeps the network small, but the network will find the data harder to learn, so training will take longer. A numeric representation also implies that the categories are ordered; if there are relationships between the categories, assign the values in the order those relationships define. If the parameter to be represented is of relatively low importance and network size is a concern, the numeric representation could be used.

The One-of-N Code representation is shown with a length of five because there are five categories. It creates a larger network, since there are now five network inputs compared to one with the Number data type. More inputs give this parameter more connections in the network, which increases its influence on the results. The network will also learn more easily and train more quickly. Note, though, that four of these inputs are always zero. Use this data type for categories that have no relationship to one another. For example, A-E could represent broad occupational categories such as Technical, Management, Sales, Manufacturing, and Administrative. This is the data type chosen when a filter is generated for a categorical field.

The Thermometer Code representation is similar to the One-of-N but implies a relationship between the categories. For example, A-E could represent a scale from best to worst.

The Binary Code is shown with a length of 3, the smallest length that can represent five separate categories. A length of 3 can represent up to eight categories (2^3 = 8), a length of 4 up to 16, and so on. If there are more than ten categories, use a binary representation rather than thermometer or One-of-N. In this example, binary saves just two network inputs. Had there been 16 different states, binary code would need only four network inputs, while thermometer or One-of-N code would need sixteen.
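The length arithmetic above amounts to finding the smallest length b for which 2^b is at least the number of categories. A minimal sketch with hypothetical names; treating a single category as needing one input is an assumption.

```java
/** Hypothetical sizing helper for binary codes. */
final class BinarySizing {
    /** Smallest code length b with 2^b >= categories (at least 1). */
    static int binaryLength(int categories) {
        if (categories < 1) {
            throw new IllegalArgumentException("categories must be >= 1");
        }
        if (categories == 1) {
            return 1; // assumption: one category still occupies one input
        }
        // Number of bits needed to write the largest index, categories - 1.
        return 32 - Integer.numberOfLeadingZeros(categories - 1);
    }

    /** Inputs saved relative to One-of-N or thermometer code, whose length equals the category count. */
    static int inputsSaved(int categories) {
        return categories - binaryLength(categories);
    }
}
```

For the five-category example, binaryLength(5) is 3 and inputsSaved(5) is 2; for 16 categories the figures are 4 and 12.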

Just as your raw data is translated for input to a neural network, the network's output may need to be translated back from its output range to symbols. Typically this is the mirror image of the translation used to provide the input. For example, if you selected the numeric representation and want your answers back in the categories A through E, 0.1 must map back to A, 0.3 to B, and so on. Because a network output rarely equals these values exactly, first use the Threshold operator to convert numbers from 0.0 to 0.2 to the integer 1, from 0.2 to 0.4 to the integer 2, and so on; then define a translate table to convert the integers to the alphabetic categories.
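The two-step decoding (Threshold, then a translate table) can be sketched as follows. The range test follows the Threshold operator described in the operator table (lower boundary exclusive, upper boundary inclusive, first match wins, unmatched values unchanged). Widening the outer ranges to negative and positive infinity, so that outputs of exactly 0.0 or above 1.0 still decode, is an assumption of this sketch, as are the class and method names.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical decoder: Threshold ranges, then a translate table from integers to categories. */
final class OutputDecoder {
    /** One threshold range: lower < value <= upper produces setTo. */
    record Range(double lower, double upper, double setTo) {}

    /** First matching range wins; a value outside every range is returned unchanged. */
    static double threshold(double value, List<Range> ranges) {
        for (Range r : ranges) {
            if (value > r.lower() && value <= r.upper()) {
                return r.setTo();
            }
        }
        return value;
    }

    // Outer bounds widened to +/- infinity (assumption) so every non-NaN output matches a range.
    static final List<Range> RANGES = List.of(
        new Range(Double.NEGATIVE_INFINITY, 0.2, 1),
        new Range(0.2, 0.4, 2),
        new Range(0.4, 0.6, 3),
        new Range(0.6, 0.8, 4),
        new Range(0.8, Double.POSITIVE_INFINITY, 5));

    static final Map<Integer, String> TABLE = Map.of(1, "A", 2, "B", 3, "C", 4, "D", 5, "E");

    /** Maps a network output back to A..E; anything not in the table yields the default "?". */
    static String decode(double output) {
        double level = threshold(output, RANGES);    // 1.0 .. 5.0 for any non-NaN output
        return TABLE.getOrDefault((int) level, "?"); // translate table with a default output
    }
}
```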

You may wish to artificially increase the number of inputs to emphasize a parameter such as gender. While a binary field of length one could be used to represent male and female, increasing the length will create more network nodes and connections between this attribute and other parameters in your data. Making the length four, and mapping male to 0000 and female to 1111, quadruples the number of connections and provides a stronger distinction between records which differ in this value.

Data representation is a key consideration in designing neural network applications. Some experimentation should be expected to determine the best representation based on the results from both the network output and training experience.

Number and Vector Operators

The following operators are supported in the Translate filter for use with the Number and Vector data types:

Operator Description
Bit AND Performs a Bit AND operation.
Bit COMP Performs a Bit Complement operation.
Bit OR Performs a Bit OR operation.
Bit XOR Performs a Bit XOR (exclusive-OR) operation.
Acos Returns the arccosine of the data.
Add Adds a specified value to the data.
Asin Returns the arcsine of the data.
Atan Returns the arctangent of the data.
Cos Returns the cosine of the data.
Cosh Returns the hyperbolic cosine of the data.
Sin Returns the sine of the data.
Sinh Returns the hyperbolic sine of the data.
Tan Returns the tangent of the data.
Tanh Returns the hyperbolic tangent of the data.
Abs Returns the absolute value of the data.
Ceil Returns the smallest integer that is greater than or equal to the data value.
Discretize Returns the index of the range, within a set of value ranges, that includes the data value.
Div Returns the integer quotient of an integer division.
Exp Returns the exponential function of the data.
Exp10 Returns 10 raised to the power of the data.
Floor Returns the value representing the largest integer that is less than or equal to the data value.
Log Returns the natural logarithm of the data.
Log10 Returns the base 10 logarithm of the data.
Modulo Returns the remainder after a division of integers.
None No operation is performed on the data. This is the default.
Power Raises the data value to a power you specify.
PowerN Raises a base value you define (the mantissa) to the power supplied by the data.
Round Rounds a real-valued number to the closest integer value.
Scale Scales data in a piece-wise linear fashion around a midpoint you define.
Sqrt Returns the square root of the data.
Square Returns the square of the data.
Threshold Sets numeric data in specified ranges to specific values. A threshold operator has one or more ranges, each with a lower boundary, an upper boundary, and a set-to value. When an input value is greater than a lower boundary and less than or equal to the corresponding upper boundary, the set-to value is the output of the threshold operator. A value that does not fall within any of the ranges is unchanged. If ranges overlap, the first range whose boundaries are satisfied is selected.
Truncate Truncates a real number to its integer value by dropping any fractional part.
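Several Number operators correspond directly to java.lang.Math methods. The sketch below only illustrates the described behavior and is not the Translate filter's implementation: the tie-breaking rule for Round is not stated above, so using Math.rint (halves round to the even integer) is an assumption, and the names are hypothetical. Negative inputs show how Floor, Ceil, Truncate, and Round differ.

```java
import java.util.Map;
import java.util.function.DoubleBinaryOperator;
import java.util.function.DoubleUnaryOperator;

/** Hypothetical Java equivalents of several Number operators. */
final class NumberOperators {
    static final Map<String, DoubleUnaryOperator> UNARY = Map.<String, DoubleUnaryOperator>of(
        "Abs", Math::abs,
        "Ceil", Math::ceil,
        "Floor", Math::floor,
        // Truncate drops the fractional part, which rounds toward zero.
        "Truncate", x -> x < 0 ? Math.ceil(x) : Math.floor(x),
        // Assumption: Math.rint rounds exact halves to the even integer.
        "Round", Math::rint,
        "Exp10", x -> Math.pow(10.0, x),
        "Log10", Math::log10,
        "Sqrt", Math::sqrt,
        "Square", x -> x * x);

    /** Power: the data raised to a parameter p. */
    static final DoubleBinaryOperator POWER = (data, p) -> Math.pow(data, p);

    /** PowerN: a parameter base (the "mantissa") raised to the data value. */
    static final DoubleBinaryOperator POWER_N = (data, base) -> Math.pow(base, data);

    static double apply(String operator, double value) {
        DoubleUnaryOperator op = UNARY.get(operator);
        if (op == null) {
            throw new IllegalArgumentException("unknown operator: " + operator);
        }
        return op.applyAsDouble(value);
    }
}
```

For -2.7, Floor and Round give -3.0 while Ceil and Truncate give -2.0; for positive values Truncate matches Floor.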

The following operators are valid only on Vector data types:

Operator Description
NormL1 Divide each element of the vector by the sum of all of the elements.
NormL2 Performs a Euclidean normalization of the vector field: divides each element by the square root of the sum of the squares of all of the elements.
NormMax Divide each element of the vector by the largest element in the vector.
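The three Vector normalizations can be written in a few lines. This is a sketch under assumptions: the behavior when the divisor is zero is not described above, so here the vector is returned unchanged, and the names are hypothetical. NormL1 follows the text literally and divides by the plain sum, not the sum of absolute values.

```java
import java.util.Arrays;

/** Hypothetical implementations of the Vector-only normalization operators. */
final class VectorNorms {
    /** NormL1: divide each element by the sum of all elements. */
    static double[] normL1(double[] v) {
        return divideAll(v, Arrays.stream(v).sum());
    }

    /** NormL2: divide each element by the Euclidean length, sqrt(sum of squares). */
    static double[] normL2(double[] v) {
        double sumSq = 0.0;
        for (double x : v) {
            sumSq += x * x;
        }
        return divideAll(v, Math.sqrt(sumSq));
    }

    /** NormMax: divide each element by the largest element (throws on an empty vector). */
    static double[] normMax(double[] v) {
        return divideAll(v, Arrays.stream(v).max().orElseThrow());
    }

    // Assumption: a zero divisor leaves the vector unchanged.
    private static double[] divideAll(double[] v, double divisor) {
        if (divisor == 0.0) {
            return v.clone();
        }
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] / divisor;
        }
        return out;
    }
}
```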

Symbolic Operators

The following operators are valid only on Symbol data types:

Operator Description
LowerCase Forces all characters in a symbol field to lowercase.
Translate Substitutes a defined output string for the input strings defined in the translation table. If the input string is not found, the default output is returned. The input data type can be symbol or numeric, which is converted internally to symbol. The output data type is always symbol.
UpperCase Forces all characters in a symbol field to uppercase.
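The Translate operator is a table lookup with a default. The sketch assumes exact, case-sensitive matching and formats integral numbers without a fractional part (so 2.0 becomes the symbol 2) when numeric input is converted to a symbol; neither detail is specified above, and the names are hypothetical.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of the Symbol operators. */
final class SymbolOperators {
    static String lowerCase(String symbol) {
        return symbol.toLowerCase(Locale.ROOT);
    }

    static String upperCase(String symbol) {
        return symbol.toUpperCase(Locale.ROOT);
    }

    /** Translate: look the input up in the table; unmatched inputs yield the default output. */
    static String translate(String input, Map<String, String> table, String defaultOutput) {
        return table.getOrDefault(input, defaultOutput);
    }

    /** Numeric input is converted to a symbol first; this formatting rule is an assumption. */
    static String translate(double input, Map<String, String> table, String defaultOutput) {
        String symbol = (input == Math.rint(input) && !Double.isInfinite(input))
            ? Long.toString((long) input)
            : Double.toString(input);
        return translate(symbol, table, defaultOutput);
    }
}
```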

One-of-N Operators

The following functions are valid only for the One-of-N data type.

Operator Description
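Max Examines the number of elements given by the source size, taken from the input buffer, and returns the position of the element that has the highest value. Positions are counted from the right, starting at 1, in the same way as field values; if all the elements are zero, the result is 0. The following examples show the result given a source size of five for the field elements shown: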
    Data:           0.4  0.3  0.5  0.1  0.7
    One-of-N code:  0    0    0    0    1
    Max field value:          1
    Data:           0.4  0.1  0.9  0.9  0.7
    One-of-N code:  0    0    1    0    0
    Max field value:          3
    Data:           0.0  0.0  0.0  0.0  0.0
    One-of-N code:  0    0    0    0    0
    Max field value:          0
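Min Examines the number of elements given by the source size, taken from the input buffer, and returns the position of the element that has the lowest value. Positions are counted from the right, starting at 1, in the same way as field values; if all the elements are zero, the result is 0. The following examples show the result given a source size of five for the field elements shown: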
    Data:           0.4  0.3  0.5  0.1  0.7
    One-of-N code:  0    0    0    1    0
    Min field value:          2
    Data:           0.4  0.1  0.9  0.9  0.7
    One-of-N code:  0    1    0    0    0
    Min field value:          4
    Data:           0.0  0.0  0.0  0.0  0.0
    One-of-N code:  0    0    0    0    0
    Min field value:          0
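The Max and Min examples can be reproduced by counting positions from the right starting at 1, keeping the leftmost element on ties (as the second Max example does), and returning 0 when every element is zero. These rules are inferred from the examples rather than from a specification, and the class is hypothetical.

```java
/** Hypothetical sketch of the One-of-N Max and Min operators. */
final class OneOfNOperators {
    /** Field value (position from the right, starting at 1) of the largest element. */
    static int max(double[] data) {
        return extreme(data, true);
    }

    /** Field value of the smallest element. */
    static int min(double[] data) {
        return extreme(data, false);
    }

    private static int extreme(double[] data, boolean wantMax) {
        boolean allZero = true;
        for (double x : data) {
            if (x != 0.0) {
                allZero = false;
                break;
            }
        }
        if (allZero) {
            return 0; // matches the all-zero examples
        }
        int best = 0; // index counted from the left
        for (int i = 1; i < data.length; i++) {
            // Strict comparison keeps the leftmost element when values tie.
            if (wantMax ? data[i] > data[best] : data[i] < data[best]) {
                best = i;
            }
        }
        return data.length - best; // convert a left index to a right-based position
    }
}
```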

Data Conversions

The source data is converted into the destination logical type data using the predefined conversion operations shown in the following table:

Source \ Destination  Binary Code  Number  One-of-N Code  Symbol  Thermometer Code  Vector
Binary Code  None  cd2num  cd2cd  cd2sym  cd2cd  vect
Number  num2cd  None  num2cd  cnvtab  num2cd  Invalid
One-of-N Code  cd2cd  cd2num  None  cd2sym  cd2cd  vect
Symbol  sym2cd  cnvtab  sym2cd  None  sym2cd  Invalid
Thermometer Code  cd2cd  cd2num  cd2cd  cd2sym  None  vect
Vector  vect  Invalid  vect  Invalid  vect  None
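The conversion table follows a compact rule: Vector converts only to and from code types, Number and Symbol convert to each other through a conversion table (cnvtab), and the remaining pairs use the cd2cd, cd2num, cd2sym, num2cd, and sym2cd conversions. The sketch below encodes the table as a lookup, for example to reject an invalid SrcType and DestType pair before a template is used; the names are hypothetical.

```java
/** Hypothetical lookup that reproduces the Data Conversions table. */
final class ConversionTable {
    enum Type { BINARY_CODE, NUMBER, ONE_OF_N_CODE, SYMBOL, THERMOMETER_CODE, VECTOR }

    static boolean isCode(Type t) {
        return t == Type.BINARY_CODE || t == Type.ONE_OF_N_CODE || t == Type.THERMOMETER_CODE;
    }

    /** Returns the table entry: None, Invalid, vect, cnvtab, cd2cd, cd2num, cd2sym, num2cd, or sym2cd. */
    static String conversion(Type src, Type dst) {
        if (src == dst) {
            return "None";
        }
        if (src == Type.VECTOR || dst == Type.VECTOR) {
            Type other = (src == Type.VECTOR) ? dst : src;
            return isCode(other) ? "vect" : "Invalid"; // Vector pairs only with code types
        }
        if (isCode(src) && isCode(dst)) {
            return "cd2cd";
        }
        if (isCode(src)) {
            return dst == Type.NUMBER ? "cd2num" : "cd2sym";
        }
        if (isCode(dst)) {
            return src == Type.NUMBER ? "num2cd" : "sym2cd";
        }
        return "cnvtab"; // Number <-> Symbol
    }
}
```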

Key to Symbols

Symbol Name Description
vect One to One Mapping When converting from a code type to vector, the data passes through without being transformed. When converting from a vector to a code type (code types are binary supporting 1's and 0's only), any non-zero vector element becomes a 1 in the code.
cnvtab Conversion Tables A conversion table contains entries defining symbols and corresponding numeric values. When a conversion is required, the table is searched for a match. When the match is found, the corresponding number or symbol is produced as the output. If the input is not matched, the default output is the result.
cd2num and num2cd Field Values If a source or destination data type is a code type (One-of-N, binary, or thermometer), the code data is converted using a single numeric value called a field value.
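Decoding a code to its field value follows from the Field Values Examples table: a binary code is read as a base-2 integer, a One-of-N code gives the position of its 1 counted from the right, and a thermometer code gives its count of ones. The vector-to-code rule from the One to One Mapping entry is included as well. This sketch assumes well-formed codes stored as int arrays written left to right; the names are hypothetical.

```java
/** Hypothetical sketch of code-to-field-value decoding and the vector-to-code rule. */
final class FieldValueDecoding {
    // Codes are written left to right, as in the tables; the rightmost element is position 1.

    static int fromBinary(int[] code) {
        int value = 0;
        for (int bit : code) {
            value = value * 2 + bit;
        }
        return value;
    }

    /** Position of the 1, counted from the right; an all-zero code decodes to 0. */
    static int fromOneOfN(int[] code) {
        for (int i = 0; i < code.length; i++) {
            if (code[i] == 1) {
                return code.length - i;
            }
        }
        return 0;
    }

    /** Number of ones in the code. */
    static int fromThermometer(int[] code) {
        int ones = 0;
        for (int bit : code) {
            ones += bit;
        }
        return ones;
    }

    /** Vector to code: any non-zero element becomes 1. */
    static int[] vectorToCode(double[] vector) {
        int[] code = new int[vector.length];
        for (int i = 0; i < vector.length; i++) {
            code[i] = vector[i] != 0.0 ? 1 : 0;
        }
        return code;
    }
}
```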

Field Values Examples

Here is an example of field values ranging from 0 to 5 and the corresponding four-element codes:

Field Value Binary Code One-of-N Code Thermometer Code
0 0, 0, 0, 0 0, 0, 0, 0 0, 0, 0, 0
1 0, 0, 0, 1 0, 0, 0, 1 0, 0, 0, 1
2 0, 0, 1, 0 0, 0, 1, 0 0, 0, 1, 1
3 0, 0, 1, 1 0, 1, 0, 0 0, 1, 1, 1
4 0, 1, 0, 0 1, 0, 0, 0 1, 1, 1, 1
5 0, 1, 0, 1 0, 0, 0, 0 1, 1, 1, 1

When a field value less than or equal to 0 is converted to a One-of-N code, the code is represented by all zeros. Also, field values greater than the length of the One-of-N code are represented by zeros.

When a field value less than or equal to 0 is converted to a thermometer code, the code is represented by all zeros. Field values greater than the thermometer code length are set to all ones.
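The conversion from a field value to a code, including the out-of-range rules just stated, can be sketched as follows. The behavior for binary values that do not fit in the code length is not described, so this sketch rejects them; field values are modeled as int, and the names are hypothetical.

```java
/** Hypothetical sketch of field-value-to-code conversion. */
final class FieldValueEncoding {
    static int[] toBinary(int value, int length) {
        if (length < 1 || length > 31) {
            throw new IllegalArgumentException("length must be 1..31");
        }
        // Assumption: values that do not fit in the code are rejected.
        if (value < 0 || value >= (1L << length)) {
            throw new IllegalArgumentException("value does not fit in " + length + " bits");
        }
        int[] code = new int[length];
        for (int i = 0; i < length; i++) {
            code[length - 1 - i] = (value >> i) & 1; // rightmost element is the lowest bit
        }
        return code;
    }

    /** Values <= 0 or greater than the length give all zeros. */
    static int[] toOneOfN(int value, int length) {
        int[] code = new int[length];
        if (value >= 1 && value <= length) {
            code[length - value] = 1;
        }
        return code;
    }

    /** Values <= 0 give all zeros; values greater than the length give all ones. */
    static int[] toThermometer(int value, int length) {
        int[] code = new int[length];
        int ones = Math.min(Math.max(value, 0), length);
        for (int i = 0; i < ones; i++) {
            code[length - 1 - i] = 1;
        }
        return code;
    }
}
```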

Menu Bar

The Menu Bar consists of these submenus:

File   Edit   View   Help

File

The File menu contains menu items that apply to the container agent.

New
Removes all filter templates from the table.
Properties
Displays the customizer for the filter bean. For example, you can use it to remove connections.
Exit
Leaves the Filter Editor.

Edit

The Edit menu contains menu items that apply to a selected bean or to the clipboard.

Add template
Add a new template row to the table.
Cut
Remove the selected template row from the filter and place it on the clipboard.
Copy
Copy the selected template row to the clipboard.
Paste
Paste the template from the clipboard into the table.
Delete
Remove the selected template from the filter.

View

The Template view is the only view currently supported.

Template
Show the field templates.
Buffer
Show the buffer connections.
Sort by...
Sort by selected fields.

Help

The Help menu opens your browser with one of these topics:

Editor
Displays Filter editor help.
API reference
Displays package JavaDoc.

Template Table

The Template table consists of one row for each field in the data buffer. Each template row contains these values:

Name
The name of the field from its data definition.
Usage
The field usage from its data definition.
Replications
The number of adjacent fields to which this conversion applies.
PreOperator
A combo box from which you select the desired pre-operator.
SrcType
The type of source data expected.
SrcLen
The length of the source data.
Table
The conversion table that maps symbolic values to discrete values for categorical data.
DestType
The type of data to output.
DestLen
The length of the data to output.
PostOperator
A combo box from which you select the desired post-operator.