Temporal Difference Learning Bean Properties and Use
The Temporal Difference Learning Bean panel provides these
options:
- Architecture
- The network architecture consists of these parameters:
- Inputs, which must match the number of
outputs provided by any bean with a data buffer
connection.
- Hidden1, which is the number of hidden
units in the first layer.
- Hidden2, which is the number of hidden
units in the second layer.
- Hidden3, which is the number of hidden
units in the third layer.
- Outputs, which is calculated when beans
are generated from the Training File.
- Feedback, which adds value to time series
forecasting. Select one of the following choices:
- None
- No feedback.
- Hidden layer
- Map the first hidden layer units back
to the input layer, acting as an
internal state or memory.
- Output layer
- Map the output layer units back to
the input layer, using the previous
network output as prior state
information (an internal state or
memory).
- Learn Rate
- Enter a value to control how much the network weights are
changed during a weight update. Larger values cause more
change. Learn rate is a real value between 0.0 and 10.0,
with a typical starting value of 0.2.
- Momentum
- Enter a value to control the amount that previous network
weight updates should influence the current network
weight update. This acts as a smoothing parameter that
reduces oscillation and helps attain convergence.
Momentum is a real value between 0.0 and 1.0, with a
typical value of 0.9.
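As a rough illustration of how Learn Rate and Momentum interact, the following Python sketch applies the standard gradient-descent-with-momentum rule to a single weight. The function name `momentum_step` and the gradient values are hypothetical, and the rule shown is the textbook form, assumed here for illustration; the bean's actual update rule may differ.

```python
def momentum_step(weight, gradient, prev_delta, learn_rate=0.2, momentum=0.9):
    """Return (new_weight, delta) for one weight update.

    Assumed rule (textbook backpropagation with momentum, not taken
    from the bean's source):
        delta = -learn_rate * gradient + momentum * prev_delta
    """
    delta = -learn_rate * gradient + momentum * prev_delta
    return weight + delta, delta


# Hypothetical gradients that alternate in sign, as on an oscillating
# error surface. The momentum term damps each swing.
weight, delta = 0.5, 0.0
for gradient in [1.0, -1.0, 1.0, -1.0]:
    weight, delta = momentum_step(weight, gradient, delta)
    print(round(weight, 4), round(delta, 4))
```

With momentum set to 0.0 each update depends only on the current gradient; raising it blends in the previous update, which is the smoothing effect described above.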
- Tolerance
- During training the error is calculated for each record
and compared to the Tolerance value. Errors
greater than the tolerance value indicate a bad
calculation. If the error is within the tolerance, it is
treated as 0. Tolerance must be a real value between 0.0
and 1.0. A typical value is 0.1.
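The Tolerance check can be sketched in a few lines of Python. The function name `effective_error` is hypothetical, and treating an error exactly equal to the tolerance as "within tolerance" is an assumption; the text above does not say how the boundary case is handled.

```python
def effective_error(target, output, tolerance=0.1):
    """Return the error used for training one record.

    Errors whose magnitude is within the tolerance are treated as 0.
    Assumption: the boundary (error == tolerance) counts as within tolerance.
    """
    error = target - output
    return 0.0 if abs(error) <= tolerance else error


print(effective_error(1.0, 0.95))  # small error, treated as 0
print(effective_error(1.0, 0.5))   # large error, kept as-is
```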
- Decay Factor
- Enter a value to control the decay of the context unit
activations. The context unit activation is computed as
(decay factor * activation) + unit activation
where unit is the network hidden or output unit. The
smaller the decay factor, the more the current hidden or
output unit activation is reflected in the context value.
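The context computation above can be sketched as follows. This assumes that "activation" in the formula refers to the context unit's previous activation; the function name `update_context` and the sample values are hypothetical.

```python
def update_context(context, unit_activations, decay_factor):
    """Compute new context unit activations.

    Assumed reading of the formula:
        new_context[i] = decay_factor * context[i] + unit_activations[i]
    where context[i] is the context unit's previous activation and
    unit_activations[i] is the current hidden or output unit activation.
    """
    return [decay_factor * c + a for c, a in zip(context, unit_activations)]


previous = [1.0, 2.0]
current = [0.5, 0.5]
print(update_context(previous, current, 0.0))  # context mirrors the current activations
print(update_context(previous, current, 0.5))  # half of the prior context is carried over
```

With a decay factor of 0.0 the context simply copies the current unit activations; larger values let older activations persist in the context.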
- Mode
- Select one of the following agent modes:
- Train implies that the network bean's weights are
unlocked, and network weights will be adjusted as data is
processed.
- Test implies that the network bean's weights are
locked, and that error calculations will be performed as
data is processed.
- Run implies that the network bean's weights are
locked and no error calculations are made.
- Lambda
- Controls how the errors between successive predictions
are passed back in time. It is an exponential weighting
factor that controls temporal credit assignment, which
is the basis of reinforcement learning. A typical value
for lambda is 0.5.
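To make the exponential weighting concrete, the sketch below shows how much credit each earlier prediction receives when an error is passed back, following the usual TD(lambda) formulation. The function names `credit_weights` and `update_trace` are hypothetical, and the bean's internal bookkeeping may be organized differently.

```python
def credit_weights(num_steps, lam=0.5):
    """Weight applied to each step when the latest error is passed back.

    Index 0 is the oldest step; the last index is the current step,
    which always receives weight 1.0. A step that lies n steps in the
    past receives weight lam ** n.
    """
    return [lam ** (num_steps - 1 - k) for k in range(num_steps)]


def update_trace(trace, gradient, lam=0.5):
    """Incremental form of the same weighting (an eligibility trace)."""
    return lam * trace + gradient


print(credit_weights(4))  # [0.125, 0.25, 0.5, 1.0]

# With a constant gradient of 1.0, the trace equals the sum of the weights.
trace = 0.0
for _ in range(4):
    trace = update_trace(trace, 1.0)
print(trace)  # 1.875
```

A lambda of 0.0 credits only the most recent prediction, while values closer to 1.0 spread the error further back in time.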
- Gamma
- Controls the operating mode of the temporal difference
network. If set to 0.0, the network is a regular temporal
difference learning network that calculates its errors by
taking the difference between successive predictions. If
set to a value greater than 0.0, it is an Adaptive
Critic network, and the target output is taken to be
Reinforcement + (Gamma * NetOutput). The Reinforcement
value is read from the network input buffer: rather than
being ignored, the target value is used as the
reinforcement value.
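The two operating modes can be contrasted with a short Python sketch. The target formula comes from the description above; taking NetOutput to be the next (successive) prediction, and computing the error as target minus current prediction, are assumptions. The function names and sample values are hypothetical.

```python
def td_target(reinforcement, net_output, gamma):
    """Adaptive Critic target: Reinforcement + (Gamma * NetOutput)."""
    return reinforcement + gamma * net_output


def td_error(current_prediction, next_prediction, reinforcement, gamma):
    """Error signal for one time step under the two modes.

    Assumptions: NetOutput is the next prediction, and the error is
    the target minus the current prediction.
    """
    if gamma == 0.0:
        # Regular temporal difference learning: difference between
        # successive predictions; the reinforcement value is not used.
        return next_prediction - current_prediction
    # Adaptive Critic: the target combines the reinforcement value
    # with the discounted next prediction.
    return td_target(reinforcement, next_prediction, gamma) - current_prediction


print(td_error(0.4, 0.6, reinforcement=1.0, gamma=0.0))
print(td_error(0.4, 0.6, reinforcement=1.0, gamma=0.5))
```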
- Adaptive learn rate
- Select adaptive learning if the Learn Rate is to
be lowered as training progresses.
The Temporal Difference Learning Bean panel is used to create
a network with a specified architecture and training parameters.
The Mode is set so that the network bean can be trained, or
used with an independent data source to test that training
is sufficient.
Steps in using the panel for training include:
- Set the architecture input value to the number of outputs
from the bean providing data.
- Set the values for hidden unit layers.
- Set the number of outputs.
- Select the feedback mechanism.
- Press the Set Architecture button.
- Train the network by pressing the Step, Cycle, or Run
buttons on the Agent Editor toolbar.
- You may wish to press the Stop toolbar button,
change a parameter such as Tolerance or Lambda, and start
again. If you change the network architecture,
press Set Architecture for the changes to take
effect. Press the Reset Weights button to re-initialize
the network weights before starting training again if you
wish.