A decision-tree learning algorithm builds a hypothesis, or decision rule, in DNF (Disjunctive Normal Form), represented as a tree. Each internal node of the tree tests a feature value (e.g., is feature f = value v?). Each path from the root to a leaf is a rule formed by the conjunction of all feature tests along the path, and the leaf specifies the class assigned to an example.
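To make the path-to-rule correspondence concrete, here is a minimal sketch in Python; the Node class and the feature names are illustrative, not this implementation's actual data structures:

```python
class Node:
    def __init__(self, feature=None, children=None, label=None):
        self.feature = feature           # feature tested at this node
        self.children = children or {}   # feature value -> child Node
        self.label = label               # class label, set only at leaves

def paths_to_rules(node, conditions=()):
    """Yield (conjunction, class) for every root-to-leaf path."""
    if node.label is not None:           # leaf: one complete rule
        yield conditions, node.label
        return
    for value, child in node.children.items():
        yield from paths_to_rules(child, conditions + ((node.feature, value),))

# Tiny hand-built tree: test outlook; under sunny, also test humidity.
tree = Node(feature="outlook", children={
    "overcast": Node(label="yes"),
    "sunny": Node(feature="humidity", children={
        "high": Node(label="no"),
        "normal": Node(label="yes"),
    }),
})
for rule, label in paths_to_rules(tree):
    print(" AND ".join(f"{f}={v}" for f, v in rule), "=>", label)
```

The rules ending in the same class, taken together, form that class's DNF description.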
Training examples are represented as pairs (x, c), where x = (x1, …, xn) is a vector of n feature values and c is the class of the example.
A decision tree is built by recursively partitioning the training set until the examples in each partition belong to the same class. At each node, the algorithm selects the feature that best divides the training set into class-uniform regions, then applies itself recursively to each region (for details see "C4.5: Programs for Machine Learning" by J. R. Quinlan, Morgan Kaufmann, 1993).
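A minimal ID3-style sketch of this recursion, using class entropy to pick the splitting feature (Quinlan's C4.5 adds gain ratio, pruning, and continuous-feature handling beyond what is shown here):

```python
from collections import Counter
from math import log2

def entropy(examples):
    """Shannon entropy of the class distribution in a partition."""
    counts = Counter(c for _, c in examples)
    total = len(examples)
    return -sum(n / total * log2(n / total) for n in counts.values())

def split(examples, f):
    """Partition the examples by their value of feature f."""
    parts = {}
    for x, c in examples:
        parts.setdefault(x[f], []).append((x, c))
    return parts

def build(examples, features):
    """Recursively partition until each region is class-uniform."""
    classes = {c for _, c in examples}
    if len(classes) == 1 or not features:   # uniform region: make a leaf
        return Counter(c for _, c in examples).most_common(1)[0][0]
    def remainder(f):                       # weighted entropy after splitting on f
        return sum(len(p) / len(examples) * entropy(p)
                   for p in split(examples, f).values())
    best = min(features, key=remainder)     # best feature = lowest remainder
    return (best, {v: build(part, features - {best})
                   for v, part in split(examples, best).items()})

data = [(("sunny", "high"), "no"), (("sunny", "normal"), "yes"),
        (("overcast", "high"), "yes"), (("rain", "high"), "no")]
print(build(data, {0, 1}))  # (0, {'sunny': (1, {...}), 'overcast': 'yes', 'rain': 'no'})
```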
The implementation of the decision-tree algorithm operates in three phases: training, test, and run. In the training phase, the algorithm constructs the decision tree from the training data. During the test and run phases, the algorithm uses this tree to classify new, unseen examples. In the test phase the correct class of each new example is known and is used to assess the accuracy of the algorithm; in the run phase it is not known and must be output.
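The three phases, sketched with scikit-learn's DecisionTreeClassifier standing in for this implementation (the interfaces differ, but the phases map directly):

```python
from sklearn.tree import DecisionTreeClassifier

X_train, y_train = [[0, 0], [0, 1], [1, 0], [1, 1]], [0, 1, 1, 0]
X_test,  y_test  = [[0, 1], [1, 1]], [1, 0]  # test phase: correct classes known
X_run            = [[1, 0]]                  # run phase: correct class unknown

clf = DecisionTreeClassifier()               # training phase: build the tree
clf.fit(X_train, y_train)

print("test accuracy:", clf.score(X_test, y_test))  # assess correctness
print("run-phase output:", clf.predict(X_run))      # output the predicted class
```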
The algorithm has two tunable parameters: the metric parameter, used to select the best attribute at each node, and the discretization parameter, the number of intervals into which continuous features are discretized. All other parameters (e.g., the number of attributes, the number of possible classes, and the number of examples/records) are determined from the input file and cannot be changed unless the input file changes. Data files provided during the test or run phases must be consistent with the input file.
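A sketch of what the two tunable parameters control; the names `metric` and `bins`, and the equal-width binning scheme, are illustrative assumptions, not this implementation's actual option names:

```python
from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())

def information_gain(labels, partitions):
    """Drop in class entropy achieved by a candidate split."""
    rem = sum(len(p) / len(labels) * entropy(p) for p in partitions)
    return entropy(labels) - rem

def gain_ratio(labels, partitions):
    """C4.5's metric: information gain normalized by split information."""
    split_info = entropy([i for i, p in enumerate(partitions) for _ in p])
    gain = information_gain(labels, partitions)
    return gain / split_info if split_info else 0.0

METRICS = {"gain": information_gain, "gain_ratio": gain_ratio}  # metric parameter

def discretize(value, lo, hi, bins):
    """Map a continuous value to one of `bins` equal-width intervals."""
    if value >= hi:
        return bins - 1
    return int((value - lo) / (hi - lo) * bins)

print(METRICS["gain"](["a", "a", "b", "b"], [["a", "a"], ["b", "b"]]))  # 1.0
print(discretize(37.2, lo=0.0, hi=100.0, bins=4))                       # interval 1
```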
The following inspectable properties are available: