Decision Tree Bean

A decision-tree learning algorithm builds a hypothesis, or decision rule, in disjunctive normal form (DNF) and represents it as a tree. Each internal node of the tree tests a feature value (e.g., is feature f = value v?). Each path from the root to a leaf is a rule formed by the conjunction of the tests along that path, and the leaf specifies the class assigned to any example that satisfies the rule. The rules whose paths end in the same class, taken together, form the disjunction for that class.

Training examples are represented as pairs (x, c), where x is a feature vector and c is the class assigned to x. Features can be nominal or numeric; numeric features are discretized before the learning phase. Decision-tree algorithms are popular because they are relatively fast and because the learned hypothesis can be interpreted, which may give insight into how the features relate to the target concept.
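As a rough illustration of discretization, the sketch below maps a numeric feature to nominal bin labels using equal-width binning. The text above does not say which discretization method is used, so equal-width binning, the function name equal_width_bins, and the sample temperatures are all assumptions chosen for the example.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to a nominal label 'bin0' .. 'bin{n_bins-1}'.

    Assumption: equal-width binning over the observed range; other schemes
    (equal-frequency, entropy-based) are equally plausible.
    """
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / n_bins or 1.0
    labels = []
    for v in values:
        # Clamp so that the maximum value falls into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        labels.append(f"bin{idx}")
    return labels

temperatures = [64, 68, 70, 75, 80, 85]
print(equal_width_bins(temperatures, 3))
```

After this step the feature can be treated like any other nominal feature, with one branch per bin label.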

Implementation

A decision tree is built by recursively partitioning the training set until the examples in each partition belong to the same class. At each node the algorithm selects the feature that best divides the training set into class-uniform regions. The algorithm is then applied recursively to each region (for details see "C4.5: Programs for Machine Learning" by J. R. Quinlan, Morgan Kaufmann, 1993).
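The recursive partitioning described above can be sketched as follows. This is a minimal illustration, not the bean's implementation: it selects features by plain information gain (the ID3-style criterion) rather than the gain ratio used in C4.5, handles nominal features only, and does no pruning. The function names and the small data set are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def best_feature(examples, features):
    """Pick the feature whose split leaves the lowest weighted entropy,
    which is the same as the highest information gain."""
    def remainder(feature):
        groups = {}
        for x, c in examples:
            groups.setdefault(x[feature], []).append(c)
        return sum(len(g) / len(examples) * entropy(g) for g in groups.values())
    return min(features, key=remainder)

def build_tree(examples, features):
    """Return a class label (leaf) or (feature, {value: subtree})."""
    labels = [c for _, c in examples]
    # Stop when the partition is class uniform or no features remain;
    # in the latter case fall back to the majority class.
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    feature = best_feature(examples, features)
    partitions = {}
    for x, c in examples:
        partitions.setdefault(x[feature], []).append((x, c))
    remaining = [f for f in features if f != feature]
    return (feature, {v: build_tree(part, remaining) for v, part in partitions.items()})

# Hypothetical training set of (x, c) pairs.
examples = [
    ({"outlook": "sunny", "wind": "weak"}, "no"),
    ({"outlook": "sunny", "wind": "strong"}, "no"),
    ({"outlook": "overcast", "wind": "weak"}, "yes"),
    ({"outlook": "rain", "wind": "weak"}, "yes"),
    ({"outlook": "rain", "wind": "strong"}, "no"),
]
tree = build_tree(examples, ["outlook", "wind"])
print(tree)
```

In this data set, splitting on outlook leaves only the rain partition mixed, so outlook is chosen at the root and wind is used to separate the rain examples.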

Training Phase

A decision-tree algorithm operates in two phases. In the training, or learning, phase the algorithm recursively builds the decision tree in search of regions of examples that are class uniform. Learning works only in batch mode here, although incremental versions of decision-tree algorithms do exist.

Testing Phase

In the testing phase, each example is run through the decision tree by following the path that makes the test at each node true. The example is then assigned the class of the last (leaf) node. Statistics such as percentCorrect are maintained.
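A minimal sketch of the testing phase follows, assuming a tree stored as nested (feature, {value: subtree}) tuples with class labels at the leaves. The functions classify and evaluate and the dictionary keys are hypothetical; the keys only mirror the kind of counts the bean exposes (correct, incorrect, total, percent correct) and are not its API.

```python
# Hypothetical tree: internal node is (feature, {value: subtree}); leaf is a class label.
tree = ("outlook", {
    "sunny": "no",
    "overcast": "yes",
    "rain": ("wind", {"weak": "yes", "strong": "no"}),
})

def classify(node, example):
    """Follow the branch whose test is true at each node until a leaf is reached.

    Note: a feature value not seen in the tree raises KeyError in this sketch.
    """
    while isinstance(node, tuple):
        feature, branches = node
        node = branches[example[feature]]
    return node

def evaluate(tree, test_set):
    """Classify every labeled example and keep simple accuracy statistics."""
    correct = sum(1 for x, c in test_set if classify(tree, x) == c)
    total = len(test_set)
    return {
        "correct": correct,
        "incorrect": total - correct,
        "total": total,
        "percent_correct": 100.0 * correct / total if total else 0.0,
    }

test_set = [
    ({"outlook": "sunny", "wind": "weak"}, "no"),
    ({"outlook": "overcast", "wind": "strong"}, "yes"),
    ({"outlook": "rain", "wind": "strong"}, "yes"),
    ({"outlook": "rain", "wind": "weak"}, "yes"),
]
stats = evaluate(tree, test_set)
print(stats)
```

The same classify traversal applies when no true class is available; only the comparison against the known class, and therefore the statistics, is specific to testing.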

Run Phase

When the tree is in run mode, an example is run through the decision tree by following the path that makes the test at each node true. The example is then assigned the class of the last (leaf) node.

Properties for Inspection

The following inspectable properties are available:

correctPredClass
The number of examples whose class was predicted correctly in the test phase.
currentExample
The features of the current example processed by the decision tree.
decisionTree
A representation of the tree that shows the test or conclusion at each node.
incorrectPredClass
The number of examples whose class was predicted incorrectly in the test phase.
netMode
The decision tree mode: train, test, or run.
percentCorrect
The number of correctly predicted examples expressed as a percentage of all examples classified in the test phase.
predictClass
The value of the class predicted by the decision tree for the current example.
predictClassIndex
The index of the class predicted by the decision tree for the current example.
totalPredClass
The total number of examples processed by the decision tree.