Temporal Difference Learning Bean Properties and Use

Properties

The Temporal Difference Learning Bean panel provides these options:

Architecture
The network architecture consists of these parameters:
  1. Inputs, which must match the number of outputs provided by any bean connected through a data buffer connection.
  2. Hidden1, which is the number of hidden units in the first layer.
  3. Hidden2, which is the number of hidden units in the second layer.
  4. Hidden3, which is the number of hidden units in the third layer.
  5. Outputs, which is calculated when beans are generated from the Training File.
  6. Feedback, which adds value to time series forecasting. Select one of the following choices:
    None
    No feedback.
    Hidden layer
    Map the first hidden layer units back to the input layer, acting as an internal state or memory.
    Output layer
    Map the output layer units back to the input layer, using the previous network output as prior state information.
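
The three Feedback choices can be pictured with a small Python sketch. It assumes that "mapping back to the input layer" means appending the previous hidden or output activations to the external inputs, as in a simple recurrent network. The function name `effective_input` and the string labels are hypothetical and are not part of the bean.

```python
def effective_input(external_inputs, hidden_prev, output_prev, feedback="none"):
    """Build the vector presented to the input layer for one record.

    Assumption: feedback units are appended after the external inputs.
    """
    inputs = list(external_inputs)
    if feedback == "hidden":
        # First hidden layer activations from the previous step act as memory.
        return inputs + list(hidden_prev)
    if feedback == "output":
        # Previous network output is supplied as prior state information.
        return inputs + list(output_prev)
    # "none": the network sees only the external inputs.
    return inputs


if __name__ == "__main__":
    print(effective_input([0.1, 0.2], [0.7, 0.8, 0.9], [0.5], feedback="hidden"))
```

Under this assumption, choosing a feedback mode widens the input layer by the size of the layer that is fed back.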
Learn Rate
Enter a value to control how much the network weights are changed during a weight update. Larger values cause more change. Learn rate is a real value between 0.0 and 10.0, with a typical starting value of 0.2.
Momentum
Enter a value to control the amount that previous network weight updates should influence the current network weight update. This acts as a smoothing parameter that reduces oscillation and helps attain convergence. Momentum is a real value between 0.0 and 1.0, with a typical value of 0.9.
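
How Learn Rate and Momentum interact can be illustrated with a minimal Python sketch of a standard gradient-descent update with momentum. This is the textbook form, assumed here for illustration; the bean's exact update rule is not documented in this topic, and the name `momentum_update` is hypothetical.

```python
def momentum_update(weight, gradient, prev_delta, learn_rate=0.2, momentum=0.9):
    """Return (new_weight, delta) for one weight update with momentum."""
    # The change is the learn-rate-scaled gradient step plus a fraction
    # (momentum) of the change applied on the previous update.
    delta = -learn_rate * gradient + momentum * prev_delta
    return weight + delta, delta


if __name__ == "__main__":
    w, d = 1.0, 0.0
    for g in (0.5, 0.5, -0.5):
        w, d = momentum_update(w, g, d)
        print(round(w, 4), round(d, 4))
```

In the third step the gradient changes sign, but the accumulated momentum keeps the weight moving in the earlier direction by a smaller amount, which is the smoothing effect described above.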
Tolerance
During training the error is calculated for each record and compared to the Tolerance value. Errors greater than the tolerance value indicate a bad calculation. If the error is within the tolerance, it is treated as 0. Tolerance must be a real value between 0.0 and 1.0. A typical value is 0.1.
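
The Tolerance rule can be sketched in Python as follows. The function name `apply_tolerance` is hypothetical, and treating an error exactly equal to the tolerance as "within" it is an assumption.

```python
def apply_tolerance(target, output, tolerance=0.1):
    """Return the error for one record, or 0.0 if it is within tolerance."""
    error = target - output
    # Assumption: the boundary case |error| == tolerance counts as within tolerance.
    if abs(error) <= tolerance:
        return 0.0
    return error


if __name__ == "__main__":
    print(apply_tolerance(1.0, 0.95))  # small error, treated as 0
    print(apply_tolerance(1.0, 0.5))   # error exceeds the tolerance
```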
Decay Factor
Enter a value to control the decay of the context unit activations. The context unit activation is computed as
                 (decay factor * previous context unit activation) + unit activation

where unit is the network hidden or output unit. The smaller the decay factor, the more the current hidden or output unit activation is reflected in the context value.
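
The decay computation can be sketched in Python. It assumes that the "activation" multiplied by the decay factor is the context unit's own value from the previous step. The name `update_context` is hypothetical.

```python
def update_context(context, unit_activation, decay_factor):
    """Return the new context unit activation.

    Assumption: `context` is the context unit's value from the previous step.
    """
    return decay_factor * context + unit_activation


if __name__ == "__main__":
    activations = [1.0, 0.0, 0.0]
    for decay in (0.0, 0.5):
        context = 0.0
        history = []
        for a in activations:
            context = update_context(context, a, decay)
            history.append(context)
        print(decay, history)
```

With a decay factor of 0.0 the context simply copies the current unit activation. With 0.5, the initial activation of 1.0 fades to 0.5 and then 0.25 over the following steps.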

Mode
Select one of the following agent modes:
Train implies that the network bean's weights are unlocked, and network weights will be adjusted as data is processed.
Test implies that the network bean's weights are locked, and that error calculations will be performed as data is processed.
Run implies that the network bean's weights are locked and no error calculations are made.
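
The three modes can be summarized in a small Python sketch. The names `Mode` and `mode_flags` are hypothetical. The assumption that Train also computes errors is an inference, since adjusting weights requires an error signal.

```python
from enum import Enum


class Mode(Enum):
    TRAIN = "train"
    TEST = "test"
    RUN = "run"


def mode_flags(mode):
    """Return (weights_locked, computes_error) for the given mode."""
    if mode is Mode.TRAIN:
        # Weights are unlocked and adjusted; error is assumed to be computed.
        return (False, True)
    if mode is Mode.TEST:
        # Weights are locked, but errors are still calculated.
        return (True, True)
    # RUN: weights are locked and no error calculations are made.
    return (True, False)


if __name__ == "__main__":
    for m in Mode:
        print(m.name, mode_flags(m))
```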
Lambda
Controls how the errors between successive predictions are passed back in time. Lambda is an exponential weighting factor that controls temporal credit assignment, which is the basis of reinforcement learning. A typical value for lambda is 0.5.
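
The role of Lambda can be illustrated with a minimal Python sketch of one TD(lambda) update using an eligibility trace. To stay short, it assumes a linear predictor (prediction = weights . inputs) rather than the bean's multilayer network. The name `td_lambda_step` is hypothetical.

```python
def td_lambda_step(weights, trace, x, next_prediction, learn_rate=0.2, lam=0.5):
    """Apply one TD(lambda) update and return (new_weights, new_trace)."""
    prediction = sum(w * xi for w, xi in zip(weights, x))
    # For a linear predictor, the gradient of the prediction with respect to
    # the weights is the input vector x. The trace decays older gradients by lam.
    trace = [lam * e + xi for e, xi in zip(trace, x)]
    # The error is the difference between successive predictions.
    td_error = next_prediction - prediction
    weights = [w + learn_rate * td_error * e for w, e in zip(weights, trace)]
    return weights, trace


if __name__ == "__main__":
    w, e = [0.0, 0.0], [0.0, 0.0]
    w, e = td_lambda_step(w, e, [1.0, 0.0], next_prediction=1.0)
    w, e = td_lambda_step(w, e, [0.0, 1.0], next_prediction=1.0)
    print(w, e)
```

In the second step the first weight is still adjusted even though its input is 0, because the trace carries half (lambda = 0.5) of the earlier input forward. With lambda set to 0.0, only the weights for the current input would change.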
Gamma
Controls the operating mode of the temporal difference network. If set to 0.0, the network is a regular temporal difference learning network that calculates its errors by taking the difference between successive predictions. If set to a value greater than 0.0, it is an Adaptive Critic network, and the target output is taken to be Reinforcement + (Gamma * NetOutput). The Reinforcement value is taken from the network input buffer: rather than being ignored, the target value in the buffer is used as the reinforcement value.
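
The two operating modes selected by Gamma can be sketched in Python. It assumes that NetOutput in the target formula refers to the network's output on the following step. The name `td_error` and its parameters are hypothetical.

```python
def td_error(prediction, next_prediction, reinforcement, gamma):
    """Return the error signal for one step under the given Gamma setting."""
    if gamma == 0.0:
        # Regular temporal difference learning: the error is the difference
        # between successive predictions; the reinforcement value is not used.
        return next_prediction - prediction
    # Adaptive Critic: target = Reinforcement + (Gamma * NetOutput).
    # Assumption: NetOutput is the network output on the following step.
    target = reinforcement + gamma * next_prediction
    return target - prediction


if __name__ == "__main__":
    print(td_error(0.4, 0.6, reinforcement=1.0, gamma=0.0))
    print(td_error(0.4, 0.6, reinforcement=1.0, gamma=0.5))
```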
Adaptive learn rate
Select adaptive learning if the Learn Rate is to be lowered as training progresses.

Use

The Temporal Difference Learning Bean panel is used to create a network with a specified architecture and training parameters. The Mode is set so that the network bean can be trained, or run against an independent data source to test that training is sufficient.

Steps in using the panel for training include:

  1. Set the architecture input value to the number of outputs from the bean providing data.
  2. Set the values for hidden unit layers.
  3. Set the number of outputs.
  4. Select the feedback mechanism.
  5. Press the Set Architecture button.
  6. Train the network by pressing the Step, Cycle, or Run buttons on the Agent Editor toolbar.
  7. Optionally, press the Stop toolbar button, change a parameter such as Tolerance or Lambda, and start again. If you change the network architecture, press Set Architecture for the changes to take effect. To re-initialize the network weights before training again, press the Reset Weights button.