Self-Organizing Map Bean

The self-organizing feature map (SOFM) develops a probability density map of the input vector space that is useful for data reduction. It builds a map that projects from an n-dimensional input space onto a two-dimensional output space. This output space is constrained to a rectangular area. The training data consists of real-valued input vectors.

Implementation

Feature maps are implemented as described in papers by Kohonen and DeSieno. The ABLE implementation uses DeSieno's conscience mechanism so that all units have an approximately equal probability of winning.

Architecture

The number of inputs for this algorithm is simply the dimensionality of the input data. Deciding how to set up the output layer, however, is a crucial part of using feature maps.

The number of output units is the major factor in how a map self-organizes. Each output unit responds to a hypersphere in the input space. The number of output units required depends on whether you primarily want data reduction or topological mapping.
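
As an illustration, here is a minimal Java sketch of the winner computation: each output unit holds a weight vector in the input space, and the unit whose weights lie closest to the input vector wins. The class and method names are hypothetical; this is not the ABLE API.

    // Hypothetical sketch of winner selection; not the ABLE API.
    public class SofmWinner {
        // Returns the index of the output unit whose weight vector is
        // closest (in squared Euclidean distance) to the input vector.
        static int findWinner(double[][] weights, double[] input) {
            int winner = 0;
            double best = Double.MAX_VALUE;
            for (int u = 0; u < weights.length; u++) {
                double dist = 0.0;
                for (int i = 0; i < input.length; i++) {
                    double d = weights[u][i] - input[i];
                    dist += d * d;
                }
                if (dist < best) {
                    best = dist;
                    winner = u;
                }
            }
            return winner;
        }
    }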

All ABLE-provided neural networks share some common attributes.

The following additional parameters relate to a self-organizing feature map network:

Architecture Parameters

Number of inputs
Sets the number of units allocated for the input layer. This must be an integer value between 1 and 2000.
Number of output rows
Sets the number of rows for the units allocated for the output layer. This must be an integer value between 1 and 100. For example, if you are projecting to a single dimension, you would set output rows equal to 1 and output columns equal to the number of categories.
Number of output columns
Sets the number of columns for the units allocated for the output layer. This must be an integer value between 1 and 100. For example, if you are projecting to a two-dimensional output space, you can set the output to be any M-by-N rectangular space.
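
The output rows and columns together define a rectangular grid of output units. The following hypothetical sketch shows one way a unit index can map to a grid position and how the grid distance to the winner can be measured; the row-major layout and the square neighborhood shape are assumptions, not the ABLE implementation.

    // Hypothetical grid-layout sketch; row-major ordering is an assumption.
    public class SofmGrid {
        final int rows, cols;

        SofmGrid(int rows, int cols) {
            this.rows = rows;
            this.cols = cols;
        }

        int rowOf(int unit) { return unit / cols; }   // row in row-major order
        int colOf(int unit) { return unit % cols; }   // column in row-major order

        // True if the unit lies within the given neighborhood of the winner,
        // using a square (Chebyshev) neighborhood on the grid.
        boolean inNeighborhood(int unit, int winner, int neighborhood) {
            int dr = Math.abs(rowOf(unit) - rowOf(winner));
            int dc = Math.abs(colOf(unit) - colOf(winner));
            return Math.max(dr, dc) <= neighborhood;
        }
    }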

Other Parameters

Winner
Is the index of the winning output unit.
Winner Activation
The activation value of the winning processing unit.
Learn Rate
Controls how much the weights are changed during a weight update. The larger the value, the more the weights are changed. This must be a real value between 0.0 and 10.0.
Beta
Is usually set to a small value like 0.0001. Use it to calculate the percentage of time that a unit has won the competition. This must be a real value between 0.0 and 1.0.
Conscience
Represents the bias factor that determines the distance a losing unit can reach in order to win a competition. This must be a real value between 0.0 and 1.0.
Neighborhood
Controls how many units surrounding the winning unit (the unit closest to the input pattern) are modified during a weight update. This must be an integer between 0 and the sum of the number of output rows and columns. If the neighborhood parameter is set to 1, every unit adjacent to the winner has its weights changed. A rule of thumb is that the neighborhood should start at one-third the maximum of the number of output rows and columns; for example, a 9-by-12 map would start with a neighborhood of 4.
Learn Rate Delta
Controls how much the learn rate parameter is decreased after each training epoch. This must be a real value between 0.0 and 1.0.
Neighborhood Rate
Controls how many training epochs must pass before the neighborhood parameter value is decreased by one (see the schedule sketch after this list). This must be a real value between 0.0 and 10,000.0. The size of this parameter depends on the number of input patterns in your training data set: approximately 1000 patterns should be presented to the network before the neighborhood is decremented. For example, with a training set of 250 patterns, a neighborhood rate of 4 epochs presents roughly 1000 patterns between decrements.
Initial Learn Rate
The initial learn rate used whenever the network is reset. This value is then decremented during training.
Initial Neighborhood
The initial neighborhood value used whenever the network is reset. This value is then decremented during training.
Maximum Number of Epochs
The maximum number of epochs used when training the network. If set to zero, training continues until a breakpoint is reached or training is otherwise halted. Values range from 0 to 10,000 with a default of zero. A positive value halts training after that number of epochs has been processed.
Sigma
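
The schedule parameters above interact over the course of training. The following hypothetical sketch shows one plausible reading of how the learn rate delta, neighborhood rate, and maximum number of epochs could drive a training loop; the decay arithmetic, the assumed starting values, and the Kohonen-style weight update in the comment are illustrative assumptions, not the ABLE implementation.

    // Hypothetical training-schedule sketch; not the ABLE implementation.
    public class SofmSchedule {
        public static void main(String[] args) {
            double learnRate = 0.3;        // Initial Learn Rate (assumed value)
            double learnRateDelta = 0.01;  // amount removed from the learn rate per epoch
            int neighborhood = 4;          // Initial Neighborhood (assumed value)
            int neighborhoodRate = 4;      // epochs between neighborhood decrements
                                           // (documented as a real value; treated as an
                                           // integer epoch count here for simplicity)
            int maxEpochs = 20;            // Maximum Number of Epochs

            for (int epoch = 1; epoch <= maxEpochs; epoch++) {
                // ... present every training pattern; for each pattern, find the
                // winner and move the weights of every unit inside the current
                // neighborhood toward the input, e.g.
                //     w[i] += learnRate * (input[i] - w[i]);

                // Decrease the learn rate after each epoch, keeping it non-negative.
                learnRate = Math.max(learnRate - learnRateDelta, 0.0);

                // Decrease the neighborhood by one every neighborhoodRate epochs.
                if (neighborhood > 0 && epoch % neighborhoodRate == 0) {
                    neighborhood--;
                }
                System.out.printf("epoch %d: learn rate %.3f, neighborhood %d%n",
                                  epoch, learnRate, neighborhood);
            }
        }
    }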

Training

The training data consists of real-valued input vectors scaled to values between 0.0 and 1.0. Several parameters must be adjusted to ensure good training of a feature map. The implementation requires a bias factor called Conscience, which allows losing units to win sometimes, even though they are not the closest units to the input vector. The Conscience term determines how far a losing output unit can be from the input and still be declared the winner.

Use the constant term beta to calculate the percentage of time that a unit wins the competition. It should be set to a very small value, between 0.0001 and 0.001.
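
The following is a hedged sketch of a conscience mechanism in the style described in DeSieno's paper: each unit tracks the fraction of competitions it wins (smoothed with beta), and a bias scaled by the conscience factor favors units that win less than their fair share. The names and arithmetic here are illustrative assumptions, not the ABLE code.

    // Hypothetical DeSieno-style conscience sketch; not the ABLE code.
    public class Conscience {
        final double beta;        // Beta: smoothing rate for win frequencies
        final double conscience;  // Conscience: scale of the bias term
        final double[] winFreq;   // running fraction of wins per output unit

        Conscience(int numUnits, double beta, double conscience) {
            this.beta = beta;
            this.conscience = conscience;
            this.winFreq = new double[numUnits];
            java.util.Arrays.fill(winFreq, 1.0 / numUnits);  // start at the fair share
        }

        // Picks the winner from the units' distances to the input, biased so
        // that units winning less than 1/numUnits of the time can still win.
        int pickWinner(double[] dist) {
            int n = dist.length;
            int winner = 0;
            double best = Double.MAX_VALUE;
            for (int u = 0; u < n; u++) {
                double bias = conscience * (1.0 / n - winFreq[u]);
                if (dist[u] - bias < best) {
                    best = dist[u] - bias;
                    winner = u;
                }
            }
            // Update each unit's win frequency using the constant beta.
            for (int u = 0; u < n; u++) {
                double won = (u == winner) ? 1.0 : 0.0;
                winFreq[u] += beta * (won - winFreq[u]);
            }
            return winner;
        }
    }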

Running

Feature maps are very fast at run time, requiring only a single pass of the input vector through the weight matrix to find the winning unit.
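
For example, recall can be sketched as one distance computation per output unit and nothing more. The code below is hypothetical, not the ABLE API; the grid dimensions and input pattern are made-up values for illustration.

    // Hypothetical recall sketch; a single pass over the weight matrix
    // maps an input pattern to a grid coordinate.
    public class SofmRecall {
        public static void main(String[] args) {
            int rows = 2, cols = 3, numInputs = 4;
            double[][] weights = new double[rows * cols][numInputs]; // trained weights
            double[] pattern = {0.1, 0.9, 0.4, 0.7};                 // scaled to [0.0, 1.0]

            int winner = 0;
            double best = Double.MAX_VALUE;
            for (int u = 0; u < weights.length; u++) {
                double dist = 0.0;
                for (int i = 0; i < numInputs; i++) {
                    double d = weights[u][i] - pattern[i];
                    dist += d * d;
                }
                if (dist < best) { best = dist; winner = u; }
            }
            System.out.printf("winner: unit %d (row %d, column %d)%n",
                              winner, winner / cols, winner % cols);
        }
    }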