Back Propagation Bean

The back propagation network is the most commonly used neural network model. It is a feed-forward model that uses pairs of real-valued input/output vectors to train the network. An error-minimization method called gradient descent is used to reduce the output error by adjusting the network weights.

The algorithm used to train back propagation networks is computationally expensive. The training data set consists of input/output pairs of real-valued vectors. Usually the values are constrained between 0 and 1. The ABLE implementation allows from 0 to 3 layers of hidden units.

Implementation

The back propagation model implemented in ABLE meets the standard Rumelhart and McClelland specification, with some enhancements. The activation function is the logistic function, ranging from either 0.0 to 1.0 or from -0.5 to +0.5. Each unit receives inputs from all units in the preceding layer, along with a bias or threshold weight.
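The per-unit computation just described can be sketched in Java as follows. This is a minimal illustration, not the ABLE source; the method names are hypothetical.

// Sketch of the unit computation described above (illustrative, not the ABLE API).
// Net input: weighted sum of the preceding layer's activations plus the
// bias (threshold) weight.
static double netInput(double[] prevActivations, double[] weights, double bias) {
    double sum = bias;
    for (int i = 0; i < prevActivations.length; i++) {
        sum += weights[i] * prevActivations[i];
    }
    return sum;
}

// Logistic activation: ranges over (0, 1) in the standard form, or over
// (-0.5, +0.5) when shifted for the symmetric form.
static double activation(double net, boolean symmetric) {
    double f = 1.0 / (1.0 + Math.exp(-net));
    return symmetric ? f - 0.5 : f;
}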

You can update weights either after each pattern is presented or only after a complete epoch is presented.

A tolerance parameter sets a threshold on the acceptable error between the actual output activation and the desired value. Error information consists of root mean square (RMS) errors on the last pattern, the worst pattern in the epoch, and the average of all patterns. The number of bad outputs (|output - desired| > tolerance) for the last pattern and the number of bad patterns (patterns with at least one bad output) are also provided.
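This error bookkeeping can be sketched as follows, where the per-output error is the difference between the actual and desired values. The helper methods are illustrative, not the ABLE API.

// RMS error for one pattern: square root of the mean squared error over
// the n output units.
static double patternRmsError(double[] output, double[] desired) {
    double sumSq = 0.0;
    for (int i = 0; i < output.length; i++) {
        double e = output[i] - desired[i];
        sumSq += e * e;
    }
    return Math.sqrt(sumSq / output.length);
}

// An output is bad when its absolute error exceeds the tolerance; a pattern
// is bad when it has at least one bad output.
static int badOutputCount(double[] output, double[] desired, double tolerance) {
    int bad = 0;
    for (int i = 0; i < output.length; i++) {
        if (Math.abs(output[i] - desired[i]) > tolerance) bad++;
    }
    return bad;
}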

Architecture

The number of inputs and outputs is determined by your problem definition and your knowledge representation decisions. An additional parameter is the number of hidden units. It is common practice to keep this number small in order to improve the neural network's ability to generalize. However, if you select a number that is too small, the network may not converge on the training set, or training may take a very long time. Setting this parameter is part of the art of specifying and using back propagation networks in applications.

The number of hidden layers is usually set to one. In some cases, you may not need any hidden layers, resulting in improved training time. If the function you are trying to learn is very complex, you may have to use two or three hidden layers, with a corresponding degradation in training speed.

All ABLE-provided neural networks share some common attributes.

The following additional parameters relate to a back propagation network:

Architecture Parameters

Number of inputs
Sets the number of units allocated for the input layer. This must be an integer value greater than or equal to 1.
Number of hidden units in layer 1
Sets the number of units allocated for the first hidden layer (if any). This must be an integer value greater than or equal to 0.
Number of hidden units in layer 2
Sets the number of units allocated for the second hidden layer (if any). This must be an integer value greater than or equal to 0.
Number of hidden units in layer 3
Sets the number of units allocated for the third hidden layer (if any). This must be an integer value greater than or equal to 0.
Number of outputs
Sets the number of units allocated for the output layer. This must be an integer value greater than or equal to 1.
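Taken together, these parameters determine one weight matrix per pair of adjacent layers. The following sketch (illustrative only, not ABLE code) shows how the layer sizes could translate into weight storage, with empty hidden layers simply omitted:

// Illustrative sketch (not the ABLE API): one weight matrix per pair of
// adjacent layers; hidden layers with 0 units are simply absent.
static double[][][] allocateWeights(int inputs, int hidden1, int hidden2,
                                    int hidden3, int outputs) {
    java.util.List<Integer> sizes = new java.util.ArrayList<>();
    sizes.add(inputs);
    if (hidden1 > 0) sizes.add(hidden1);
    if (hidden2 > 0) sizes.add(hidden2);
    if (hidden3 > 0) sizes.add(hidden3);
    sizes.add(outputs);
    double[][][] weights = new double[sizes.size() - 1][][];
    for (int l = 0; l < weights.length; l++) {
        // one extra column per unit holds the bias (threshold) weight
        weights[l] = new double[sizes.get(l + 1)][sizes.get(l) + 1];
    }
    return weights;
}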

Other Parameters

LearnRate
Controls how much the weights are changed during a weight update. The larger the value, the more the weights are changed. This must be a real value between 0.0 and 10.0.
Activation Function
Selects whether the activation function is standard, ranging from 0 to 1 (OFF), or symmetric, ranging from -0.5 to +0.5 (ON).
Momentum
Controls how much the weights are changed during a weight update by factoring in previous weight updates. It acts as a smoothing parameter that reduces oscillation and helps attain convergence. This must be a real value between 0.0 and 1.0; a typical value for momentum is 0.9.
Last RMS Error
Indicates the root-mean-square (RMS) of the error for a single training pattern. When the number of output units is n, the formula is:

sqrt((Σ ε²) / n)

where ε is the difference between the actual and desired value on each output unit.
Ave RMS Error
Indicates the average RMS error of the patterns in the previous epoch.
Error Tolerance
Sets the value for the acceptable difference between the desired output value and the actual output value. This must be a real value between 0.0 and 1.0. For example, if your training data set contains expected values of 0 and 1 and the tolerance is set to 0.1 (the default), then the average pattern error goes to 0 when all of the outputs are within 0.1 of the desired values.
Epoch Updates
Controls whether the network weights are updated after every pattern presentation (False) or only after a complete training epoch (True).
Last Num Bad Outputs
Indicates the number of output units that are out of the specified tolerance for a single training pattern.
Bad Pattern Ratio
Indicates the number of patterns in the previous epoch which have errors above tolerance divided by the total number of patterns.
Max RMS Error
Indicates the maximum RMS error of the patterns in the previous epoch.
Compute Sensitivity
Controls whether the accumulated error index is computed during training. This information indicates the relative importance of the inputs to the error.
ExplicitError
Controls whether the data in the input buffer during training is treated as desired (target) values or as the actual error value. This is used in back propagating error for control applications.

Training

There are several important parameters to set during training. The learning rate and momentum settings are complementary, and the results of modifying them can have a large effect on the training performance of the network. These values are commonly set from 0.5 to 0.7 for the learning rate and 0.9 for the momentum. However, some people have had success setting the learn rate to 2 or 5 with the momentum value at 0.001. Other techniques include altering the learn rate dynamically as the RMS error changes (termed adaptive learning), and beginning training with a near-zero momentum and raising it to near 1 after some number of passes through the data. The best approach to setting these parameters is often determined by trial and error.
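In the standard delta-rule formulation the two parameters combine as follows. This is a generic sketch of the back propagation weight update, not ABLE's internal code:

// Generic back propagation weight update (not ABLE's internals):
// delta(t) = -learnRate * gradient + momentum * delta(t-1).
// The remembered previous deltas provide the smoothing effect described above.
static void updateWeights(double[] weights, double[] gradient,
                          double[] prevDelta, double learnRate, double momentum) {
    for (int i = 0; i < weights.length; i++) {
        double delta = -learnRate * gradient[i] + momentum * prevDelta[i];
        weights[i] += delta;
        prevDelta[i] = delta;   // remembered for the next update
    }
}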

The error tolerance setting controls the training process. For a data set containing binary targets (0,1), the tolerance parameter is usually set to 0.1. This means that the output is considered "good" when it is within 0.1 of the desired output (that is, at least 0.9 for a 1, at most 0.1 for a 0). When every output is within the tolerance range of the desired output value, the network status is changed to LOCKED and weight updates are stopped.

You can also set the epoch update flag. If set to TRUE, the weights are changed only after every complete cycle through the training set. This is true gradient descent. When the epoch update flag is set to FALSE, weights are updated after each pattern is presented. For most problems, the network converges more quickly with the epoch update flag set to FALSE.
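The difference between the two settings can be sketched as follows, reusing the updateWeights sketch above (again illustrative, not ABLE's code):

// Epoch update (true gradient descent): accumulate the per-pattern gradients
// and apply them once per epoch. Pattern update: apply after every pattern.
static void trainOneEpoch(double[] weights, double[][] patternGradients,
                          double[] prevDelta, double learnRate,
                          double momentum, boolean epochUpdate) {
    double[] accumulated = new double[weights.length];
    for (double[] gradient : patternGradients) {
        if (epochUpdate) {
            for (int i = 0; i < weights.length; i++) {
                accumulated[i] += gradient[i];   // defer the weight change
            }
        } else {
            updateWeights(weights, gradient, prevDelta, learnRate, momentum);
        }
    }
    if (epochUpdate) {
        updateWeights(weights, accumulated, prevDelta, learnRate, momentum);
    }
}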

An additional factor in training a back propagation network is the order in which training patterns are presented. By setting the randomize flag on the Import object feeding the Network object, you can ensure that the network is presented with a random ordering of the training patterns. This often helps the network avoid local minima and can also speed up training.
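The effect of the randomize flag is equivalent to shuffling the presentation order before each epoch, as in this small sketch:

// Equivalent in effect to the randomize flag: shuffle the index order in
// which training patterns are presented (Fisher-Yates shuffle).
static void shuffleOrder(int[] order, java.util.Random rng) {
    for (int i = order.length - 1; i > 0; i--) {
        int j = rng.nextInt(i + 1);
        int tmp = order[i]; order[i] = order[j]; order[j] = tmp;
    }
}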

Running

While training may be slow, the run-time performance of a back propagation network is relatively fast. The input vector is propagated through the network by multiplying it by a weight matrix and then passing it through an activation function. This vector is then multiplied by the succeeding weight matrices in a similar manner, depending on the number of hidden layers in the network. The outputs of the last layer of units are returned to the application program in the output array.
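The run-time computation amounts to repeated matrix-vector products, each followed by the activation function. A generic sketch, using the weight layout and activation sketches above (again not the ABLE implementation):

// Generic forward pass (illustrative, not the ABLE implementation):
// propagate the input through each weight matrix in turn, applying the
// logistic activation at every layer. The bias weight occupies the last
// column of each row, matching the allocateWeights sketch above.
static double[] propagate(double[] input, double[][][] weights, boolean symmetric) {
    double[] activations = input;
    for (double[][] layer : weights) {
        double[] next = new double[layer.length];
        for (int u = 0; u < layer.length; u++) {
            double net = layer[u][activations.length];   // bias weight
            for (int i = 0; i < activations.length; i++) {
                net += layer[u][i] * activations[i];
            }
            next[u] = activation(net, symmetric);        // logistic sketch above
        }
        activations = next;
    }
    return activations;   // the output array returned to the application
}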