Radial Basis Function Network

The radial basis function (RBF) network is a feed-forward neural network with a single layer of hidden units. Radial basis function networks differ from back propagation networks in that they have only one layer of hidden units and do not use the sigmoid activation function in the hidden layer units. Instead, the radial basis function network has fixed-feature detectors in the hidden layer that use a specified basis function to detect and respond to localized portions of the input vector space. One advantage of radial basis networks over back propagation is that, if the input signal is non-stationary, the localized nature of the hidden layer response makes the networks less susceptible to "memory loss" or, as some would say, "weight loss."

Implementation

The radial basis function network as implemented has a single hidden layer of units, which can use one of three basis functions: Gaussian, thin plate spline, or multi-quadratic. The basis centers, or hidden layer weight vectors, are learned during an initial stage of self-organized learning, currently fixed at 15 epochs. Once the basis vectors are set, the output layer weights are adjusted using back propagation.

Architecture Parameters

When creating a radial basis network, you must specify the following architecture parameters. A configuration sketch follows the list.

Auto Center
Controls whether the first layer weights are learned or set explicitly. A value of 1 indicates the center weights are determined using self-organized learning. A value of 0 indicates the center weights must be explicitly set.
Number of inputs
Sets the number of units allocated for the input layer. This must be an integer value greater than or equal to 1.
Number of basis units in the hidden layer
Sets the number of basis units allocated for the hidden layer. This must be an integer value greater than or equal to 0.
Number of outputs
Sets the number of units allocated for the output layer. This must be an integer value greater than or equal to 1.
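
For illustration, the sketch below mirrors these parameters and their constraints in Python. The RBFConfig name and its fields are hypothetical, not the product's API.

    from dataclasses import dataclass

    @dataclass
    class RBFConfig:
        auto_center: int = 1   # 1 = centers found by self-organized learning, 0 = set explicitly
        n_inputs: int = 2      # integer >= 1
        n_basis: int = 4       # integer >= 0
        n_outputs: int = 1     # integer >= 1

        def __post_init__(self):
            # Enforce the ranges listed above
            if self.auto_center not in (0, 1):
                raise ValueError("Auto Center must be 0 or 1")
            if self.n_inputs < 1:
                raise ValueError("Number of inputs must be at least 1")
            if self.n_basis < 0:
                raise ValueError("Number of basis units must be at least 0")
            if self.n_outputs < 1:
                raise ValueError("Number of outputs must be at least 1")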

Other Parameters

Learn Rate
Controls how much the weights are changed during a weight update. The larger the value, the more the weights are changed. This must be a real value between 0.0 and 10.0.
All Widths
Controls the width, or selectivity, of the selected radial basis function. If set to a non-zero value, all of the hidden units use the parameter value; smaller values produce narrower, more focused responses. A zero value is not currently supported; in some implementations, a 0 value signals that each hidden unit takes a unique width parameter from the Widths array.
Basis Function
Selects which radial basis function is used for the activation function in the hidden units. Three functions are currently supported (a code sketch follows this parameter list):
0. Gaussian = exp(-v**2 / (2 * width**2))
1. Thin Plate Spline = v**2 * log(v)
2. Multi-quadratic = sqrt(v**2 + width**2)

where v is the Euclidean norm, the distance between the Input vector and the hidden unit Center, calculated as the square root of the sum of the squared element differences:
sqrt(Σ(Input[i] - Center[i])**2)

Normalized
Controls whether the hidden unit activations are normalized so that they sum to 1. If Normalized = 1, then each hidden unit activation is divided by the sum of all hidden unit activations. If Normalized = 0, then the hidden unit activation values are set to the values returned by the selected radial basis function. Normalization tends to force a single result rather than hedging between several responses.
Momentum
Controls how much the weights are changed during a weight update by factoring in previous weight updates. It acts as a smoothing parameter that reduces oscillation and helps attain convergence. This must be a real value between 0.0 and 1.0; a typical value for momentum is 0.5.
Last RMS Error
Indicates the root-mean-square (RMS) error for a single training pattern. When the number of output units is n and ε is the error at each output unit, the formula is:
sqrt((Σε**2)/n)
Ave RMS Error
Indicates the average RMS error of the patterns in the previous epoch.
Tolerance
Sets the acceptable difference between the desired output value and the actual output value. This must be a real value between 0.0 and 1.0. For example, if your training data set contains expected values of 0 and 1 and the tolerance is set to 0.1 (the default), then the average pattern error goes to 0 once all of the outputs are within 0.1 of the desired values.
Epoch Updates
Controls whether the network weights are updated after every pattern presentation (False) or only after a complete training epoch (True).
Last Num Bad Outputs
Indicates the number of output units that are out of the specified tolerance for a single training pattern.
Bad Pattern Ratio
Indicates the number of patterns in the previous epoch which have errors above tolerance divided by the total number of patterns.
Max RMS Error
Indicates the maximum RMS error of the patterns in the previous epoch.
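
The formulas above translate directly into code. The following minimal Python sketch covers the three basis functions, the Euclidean norm, normalization, and the per-pattern RMS error; the function names are illustrative, not the product's API.

    import math

    def euclidean_norm(inputs, center):
        # v = sqrt(Σ(Input[i] - Center[i])**2)
        return math.sqrt(sum((x - c) ** 2 for x, c in zip(inputs, center)))

    def gaussian(v, width):
        return math.exp(-v ** 2 / (2 * width ** 2))

    def thin_plate_spline(v):
        # v**2 * log(v); the limit as v goes to 0 is 0, so guard log(0)
        return 0.0 if v == 0.0 else v ** 2 * math.log(v)

    def multi_quadratic(v, width):
        return math.sqrt(v ** 2 + width ** 2)

    def normalize(activations):
        # Normalized = 1: divide each activation by the sum of all activations
        total = sum(activations)
        return [a / total for a in activations]

    def rms_error(errors):
        # Last RMS Error: sqrt((Σε**2)/n) over the n output unit errors
        return math.sqrt(sum(e ** 2 for e in errors) / len(errors))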

Training

There are several parameters to set during the initial and final training phases. To train the basis weights, you must set the Auto Center parameter to TRUE (1). When this parameter is set, only the hidden layer weights (the basis vectors) are adjusted. The training parameters used during this phase include the learn rate and the error tolerance.
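
The exact self-organizing rule is not specified above, so the sketch below assumes a common winner-takes-all update: for each pattern, the nearest basis center is moved toward the input by the learn rate, for the fixed 15 epochs.

    def nearest(x, centers):
        # Index of the center closest to the input pattern x
        dists = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centers]
        return dists.index(min(dists))

    def learn_centers(patterns, centers, learn_rate=0.5, epochs=15):
        # Phase one: adjust only the hidden layer weights (the basis vectors)
        for _ in range(epochs):
            for x in patterns:
                k = nearest(x, centers)
                centers[k] = [c + learn_rate * (xi - c) for c, xi in zip(centers[k], x)]
        return centers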

After the basis weights have been set or trained, a second training phase is required. This phase is quite similar to training a back propagation network with a single hidden layer. The learn rate and momentum settings are used, and modifying them can have a large effect on the training performance of the network. These values are commonly set from 0.5 to 0.7 for the learn rate and 0.0 to 0.9 for momentum.
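
As an illustration, here is a minimal sketch of one weight update in this phase, assuming a linear output layer trained by the delta rule with momentum; the exact update used by the product is not spelled out above.

    def update_output_weights(hidden, target, output, weights, velocity,
                              learn_rate=0.5, momentum=0.5):
        # weights[j][i] connects hidden unit i to output unit j; velocity
        # holds the previous weight changes for the momentum term.
        for j, row in enumerate(weights):
            err = target[j] - output[j]
            for i in range(len(row)):
                delta = learn_rate * err * hidden[i] + momentum * velocity[j][i]
                row[i] += delta
                velocity[j][i] = delta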

The error tolerance setting controls the final training process. If the data set contains binary targets (0,1), then the tolerance parameter is usually set to 0.1. This means that an output is considered "good" when it is within 0.1 of the desired value (that is, at least 0.9 for a 1, at most 0.1 for a 0). When every output is within the tolerance range of the desired output value, the network status is changed to LOCKED and weight updates are stopped.
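
The stopping rule reduces to a small check, sketched here:

    def within_tolerance(outputs, targets, tolerance=0.1):
        # True when every output is within tolerance of its desired value,
        # at which point the network is locked and updates stop
        return all(abs(o - t) <= tolerance for o, t in zip(outputs, targets))

    within_tolerance([0.93, 0.06], [1, 0])  # True with the default 0.1 tolerance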

You can also set the epoch update flag. If set to TRUE, the weights are changed only after every complete cycle through the training set. This is true gradient descent. When the epoch update flag is set to FALSE, weights are updated after each pattern is presented. For most problems, the network converges more quickly with the epoch update flag set to FALSE.
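
The two modes can be sketched as follows, with the weights flattened to a single list and compute_deltas standing in for any per-pattern rule such as the delta-rule sketch above (names illustrative).

    def train_one_epoch(patterns, weights, compute_deltas, epoch_update=False):
        pending = [0.0] * len(weights)
        for x, target in patterns:
            deltas = compute_deltas(x, target, weights)
            if epoch_update:
                # TRUE: accumulate and apply once per epoch (true gradient descent)
                pending = [p + d for p, d in zip(pending, deltas)]
            else:
                # FALSE: apply after each pattern presentation
                weights = [w + d for w, d in zip(weights, deltas)]
        if epoch_update:
            weights = [w + p for w, p in zip(weights, pending)]
        return weights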

An additional factor in training a radial basis network is the order in which training patterns are presented. By setting the randomize flag on the Import object feeding the Network object, you can ensure that the network is presented with a random ordering of the training patterns. This often helps the network avoid local minima and speeds training.

Running

The run-time performance of a radial basis network is relatively fast. The input vector is propagated through the network by first computing its distance to each basis center and passing the distances through the selected basis function to produce the hidden unit activations. This activation vector is then multiplied by the output weight matrix to produce the network outputs array, which is returned to the application program.
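
Putting the pieces together, a minimal sketch of the run-time pass, assuming the Gaussian basis function (names illustrative):

    import math

    def run(x, centers, widths, out_weights):
        # Hidden layer: distance to each basis center, passed through the basis function
        hidden = []
        for center, width in zip(centers, widths):
            v = math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, center)))
            hidden.append(math.exp(-v ** 2 / (2 * width ** 2)))
        # Output layer: multiply the hidden vector by the output weight matrix
        return [sum(w * h for w, h in zip(row, hidden)) for row in out_weights]

    outputs = run([0.2, 0.8], centers=[[0.0, 1.0], [1.0, 0.0]],
                  widths=[0.5, 0.5], out_weights=[[1.0, -1.0]])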