Naive Bayes Bean

Naive Bayes learns a probabilistic model from training data, and then uses Bayes' rule to assign the most likely class to a previously unseen instance. An instance is represented by a vector of features, or attributes. Features can be either discrete (taking a finite number of values) or continuous, in which case the data must be discretized before learning using some discretization method. Naive Bayes makes the simplifying assumption that features are independent given the class. Although this assumption is often violated, naive Bayes is surprisingly successful in practice and is competitive with other state-of-the-art learning approaches.
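
Concretely, for an instance with feature values f1, ..., fn, Bayes' rule combined with the independence assumption yields the decision rule below; the denominator P(f1, ..., fn) is the same for every class and can be dropped:

    c* = argmax over c of  P(c) * P(f1|c) * P(f2|c) * ... * P(fn|c)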

Implementation

Naive Bayes operates in two phases: learning and classification. In the learning phase, Naive Bayes learns a probabilistic model from a set of training instances together with their class labels. It can learn both in batch mode, where all instances are given at once, and in incremental mode, where training instances are added sequentially. Both are performed by the buildHypothesis() method.
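
As a minimal sketch of the two modes, assume a hypothetical NaiveBayesLearner class (the class name and signatures below are illustrative assumptions, not the bean's actual API); since the model only needs running counts, batch learning reduces to repeated single-instance updates:

    // Hypothetical sketch: incremental vs. batch learning for naive Bayes.
    class NaiveBayesLearner {
        int[] classCounts;       // count of training instances per class
        int[][][] featureCounts; // count of feature f taking value v in class c
        int total;               // total training instances seen

        NaiveBayesLearner(int numFeatures, int numValues, int numClasses) {
            classCounts = new int[numClasses];
            featureCounts = new int[numFeatures][numValues][numClasses];
        }

        // Incremental mode: fold one labeled instance into the counts.
        void update(int[] features, int label) {
            total++;
            classCounts[label]++;
            for (int f = 0; f < features.length; f++)
                featureCounts[f][features[f]][label]++;
        }

        // Batch mode: all instances at once, as buildHypothesis() supports.
        void buildHypothesis(int[][] instances, int[] labels) {
            for (int i = 0; i < instances.length; i++)
                update(instances[i], labels[i]);
        }
    }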

The learning algorithm buildHypothesis() estimates the prior probability P(C=c) of each class c, and the conditional probability P(f=v|C=c) that feature f takes value v given class c. The maximum-likelihood estimates are simply the observed relative frequencies. Because sparse data can leave some of these counts at zero, we use the equivalent sample size technique, a Bayesian method that combines prior beliefs about the probabilities with the evidence in the data. For details, see "Machine Learning" by T. Mitchell.
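
Concretely, the m-estimate described by Mitchell replaces the raw frequency with

    P(f = v | C = c) = (n_vc + m*p) / (n_c + m)

where n_c is the number of training instances of class c, n_vc is the number of those in which feature f takes value v, p is a prior estimate of the probability (commonly the uniform 1/k for a feature with k values), and m is the equivalent sample size, which controls how heavily the prior is weighted against the observed data. With m = 0 this reduces to the maximum-likelihood frequency n_vc / n_c.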

In the classification phase, Naive Bayes assigns the most likely class label to a given data instance or set of instances. It then compares the assigned class with the correct class for each instance and computes the classification accuracy.
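
Continuing the hypothetical learner sketched above, classification can be a single method that scores every class and returns the best one; summing log probabilities rather than multiplying avoids floating-point underflow (m and p are the smoothing parameters from the estimate above):

    // Pick the most likely class for one instance (a sketch, not the bean's API).
    int classify(int[] features, double m, double p) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < classCounts.length; c++) {
            double score = Math.log((double) classCounts[c] / total); // log P(c)
            for (int f = 0; f < features.length; f++) {
                // m-estimate of P(f = v | c)
                double cond = (featureCounts[f][features[f]][c] + m * p)
                            / (classCounts[c] + m);
                score += Math.log(cond);
            }
            if (score > bestScore) { bestScore = score; best = c; }
        }
        return best;
    }

Classification accuracy is then simply the fraction of test instances for which classify() returns the given correct label.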

Training

The class field values must range from 0 to n-1. It may be necessary to use a translate filter to map the data into this range: if the class labels are 1-origin (1 to n), the ADD -1 filter function can be applied. Training can continue incrementally indefinitely, but a single pass through the data is usually sufficient.
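
For illustration, the effect of the ADD -1 filter on 1-origin labels amounts to the following (a hypothetical sketch, not the filter's implementation):

    // Shift 1-origin class labels (1..n) into the required 0..n-1 range,
    // which is what applying the ADD -1 filter function accomplishes.
    for (int i = 0; i < labels.length; i++)
        labels[i] -= 1;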

Running

Training is relatively slow compared to other learning techniques, even though it needs only one pass through the data. The run-time performance of a trained naive Bayes network, however, is relatively fast.