ABLE 2.0.0 07/02/2003 10:25:01

com.ibm.able.beans.bayes
Class NaiveBayes

java.lang.Object
  |
  +--com.ibm.able.beans.bayes.NaiveBayes
All Implemented Interfaces:
java.io.Serializable

public class NaiveBayes
extends java.lang.Object
implements java.io.Serializable

See Also:
Serialized Form

Field Summary
protected static long serialVersionUID
           
 
Constructor Summary
NaiveBayes()
           
NaiveBayes(int ncls, int nftr, int[] nval, double m, double[] cpriors, double[][][] ppriors)
          Constructs a NaiveBayes with explicitly specified parameters: ncls - number of class labels; nftr - number of features; nval - number of values for each feature (assuming nominal, i.e. discrete finite-valued, features); m - equivalent sample size; cpriors - prior probability distribution over class labels; ppriors - prior estimates of the probabilities P(f|C) (used for Bayesian parameter estimation with the equivalent sample size method).
 
Method Summary
 void buildHypothesis(int[][] data, int[] labels, int ninst, int ncls, int nftr, int[] nval)
          Builds a hypothesis using explicit parameters. This function learns a naive Bayes model from a set of labeled instances.
 double classify(int[][] data, int[] labels, int ninst, boolean[] selectedFeatures)
          Classifies a record. Input: data - training data in table format, where each row represents an instance and each column represents an attribute (feature).
 int classifyExample(int[] instance, boolean[] selectedFeatures)
          Selects the maximum-likelihood class label for a data instance. Input: instance - feature vector (discrete finite feature values represented by integers); selectedFeatures - boolean array specifying the selected features (default null, meaning all features are included).
 double[] findClassProbability(int[] instance, boolean[] selectedFeatures)
          Returns the posterior probability distribution over class labels for a data instance, using Bayes' rule: P(class|instance) = P(instance|class) P(class) / sum_i P(instance|class_i) P(class_i). Input: instance - feature vector (discrete finite feature values represented by integers); selectedFeatures - boolean array specifying the selected features (default null, meaning all features are included).
 double getAccuracy()
           
 double getAvgLikelihood()
           
 double getAvgLogLikelihood()
           
 double[] getClassPriors()
           
 double[] getClassProb()
           
 int[][] getConfusionMatrix()
           
 double[][][] getCPT()
           
 double[] getEqSampleSize()
           
 int getNClasses()
           
 int getNFeatures()
           
 int[] getNFValues()
           
 void initializeNB(int ncls, int nftr, int[] nval, double eqss, double[] cpriors, double[][][] ppriors)
          Internal function that implements class construction from an explicit list of parameters: ncls - number of class labels; nftr - number of features; nval - number of values for each feature (assuming nominal, i.e. discrete finite-valued, features); eqss - equivalent sample sizes for each class (by default, each class was seen at least once); cpriors - prior probability distribution over class labels; ppriors - prior estimates of the probabilities P(f|C) (used for Bayesian parameter estimation with the equivalent sample size method).
 double likelihood(int[] instance, int classlabel)
          Compute the likelihood of an instance given a class label
 void setClassPriors(double[] cpriors)
           
 void setCPT(double[][][] cptpriors)
           
 void setNClasses(int ncls)
           
 void setNFeatures(int nftr, int[] nfv)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

serialVersionUID

protected static final long serialVersionUID
Constructor Detail

NaiveBayes

public NaiveBayes()

NaiveBayes

public NaiveBayes(int ncls,
                  int nftr,
                  int[] nval,
                  double m,
                  double[] cpriors,
                  double[][][] ppriors)
           throws AbleException
Constructs a NaiveBayes with explicitly specified parameters.
Parameters:
ncls - number of class labels
nftr - number of features
nval - number of values for each feature (assuming nominal, i.e. discrete finite-valued, features)
m - equivalent sample size
cpriors - prior probability distribution over class labels
ppriors - prior estimates of the probabilities P(f|C) (used for Bayesian parameter estimation with the equivalent sample size method)
Method Detail

initializeNB

public void initializeNB(int ncls,
                         int nftr,
                         int[] nval,
                         double eqss,
                         double[] cpriors,
                         double[][][] ppriors)
                  throws AbleException
Internal function that implements class construction from an explicit list of parameters.
Parameters:
ncls - number of class labels
nftr - number of features
nval - number of values for each feature (assuming nominal, i.e. discrete finite-valued, features)
eqss - equivalent sample sizes for each class (by default, each class was seen at least once)
cpriors - prior probability distribution over class labels
ppriors - prior estimates of the probabilities P(f|C) (used for Bayesian parameter estimation with the equivalent sample size method)

getNClasses

public int getNClasses()

getNFeatures

public int getNFeatures()

getNFValues

public int[] getNFValues()

getEqSampleSize

public double[] getEqSampleSize()

getCPT

public double[][][] getCPT()

getClassPriors

public double[] getClassPriors()

getAvgLikelihood

public double getAvgLikelihood()

getAvgLogLikelihood

public double getAvgLogLikelihood()

getAccuracy

public double getAccuracy()

getClassProb

public double[] getClassProb()

getConfusionMatrix

public int[][] getConfusionMatrix()

setNClasses

public void setNClasses(int ncls)
                 throws java.lang.IllegalArgumentException

setNFeatures

public void setNFeatures(int nftr,
                         int[] nfv)
                  throws java.lang.IllegalArgumentException

setCPT

public void setCPT(double[][][] cptpriors)
            throws AbleException

setClassPriors

public void setClassPriors(double[] cpriors)

buildHypothesis

public void buildHypothesis(int[][] data,
                            int[] labels,
                            int ninst,
                            int ncls,
                            int nftr,
                            int[] nval)
Builds a hypothesis using explicit parameters. This function learns a naive Bayes model from a set of labeled instances. Given class value C=c, for each attribute (feature) f, it computes an estimate of the parameter P(f|C=c) using the m-estimate of probability, also called the equivalent sample size method (see Mitchell, Machine Learning):

    P(f=a|C=c) = (n + mp)/(N + m)

where N is the total number of instances for which C=c, n is the number of those instances for which f=a as well, p is a prior estimate of the probability P(f=a|C=c), and m is a constant called the equivalent sample size. The m-estimate can be interpreted as an empirical probability estimate (i.e. a frequency) that combines the "current" N instances with m additional "prior" instances distributed according to p. When there is no prior information about p, a uniform distribution over all possible values of each feature is assumed, i.e. p=1/nFValues.
Parameters:
data - training data in table format, where each row represents an instance and each column represents an attribute (feature)
labels - the class labels
ninst - number of training instances
ncls - number of class labels
nftr - number of features
nval - number of values for each feature (assuming nominal, i.e. discrete finite-valued, features)
Class labels and feature indexes are always assumed to start from 0.
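The m-estimate described above can be sketched in a few lines of standalone Java. This is an illustration of the formula only, not the ABLE source: the class name MEstimateSketch, the method names, and the [class][feature][value] table layout are assumptions made for the example, and the uniform prior p=1/nval[f] is used for every feature.

```java
import java.util.Arrays;

/** Illustrative sketch of m-estimate training; not the ABLE implementation. */
final class MEstimateSketch {

    /** m-estimate (n + m*p) / (N + m) for a single count. */
    static double mEstimate(int n, int total, double p, double m) {
        return (n + m * p) / (total + m);
    }

    /**
     * Estimates cpt[c][f][a] = P(f=a | C=c) from labeled data.
     * The [class][feature][value] layout is an assumption of this sketch.
     */
    static double[][][] estimate(int[][] data, int[] labels, int ncls, int[] nval, double m) {
        int nftr = nval.length;
        int[] classCount = new int[ncls];          // N for each class
        int[][][] counts = new int[ncls][nftr][];  // n for each (class, feature, value)
        for (int c = 0; c < ncls; c++) {
            for (int f = 0; f < nftr; f++) {
                counts[c][f] = new int[nval[f]];
            }
        }
        for (int i = 0; i < data.length; i++) {
            int c = labels[i];
            classCount[c]++;
            for (int f = 0; f < nftr; f++) {
                counts[c][f][data[i][f]]++;
            }
        }
        double[][][] cpt = new double[ncls][nftr][];
        for (int c = 0; c < ncls; c++) {
            for (int f = 0; f < nftr; f++) {
                cpt[c][f] = new double[nval[f]];
                double p = 1.0 / nval[f];          // uniform prior over feature values
                for (int a = 0; a < nval[f]; a++) {
                    cpt[c][f][a] = mEstimate(counts[c][f][a], classCount[c], p, m);
                }
            }
        }
        return cpt;
    }

    public static void main(String[] args) {
        int[][] data = {{0, 1}, {0, 0}, {1, 1}, {1, 1}};
        int[] labels = {0, 0, 1, 1};
        double[][][] cpt = estimate(data, labels, 2, new int[] {2, 2}, 1.0);
        System.out.println(Arrays.toString(cpt[0][0]));
    }
}
```

Because the m "prior" instances are spread according to p, a value that never occurs with a class still receives a nonzero probability, and the estimates for each feature sum to 1 within a class.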

classify

public double classify(int[][] data,
                       int[] labels,
                       int ninst,
                       boolean[] selectedFeatures)
Classifies a record.
Parameters:
data - training data in table format, where each row represents an instance and each column represents an attribute (feature)
labels - the class labels
ninst - number of training instances
selectedFeatures - boolean array specifying the selected features (default null, meaning all features are included)
Class labels and feature indexes are always assumed to start from 0.

classifyExample

public int classifyExample(int[] instance,
                           boolean[] selectedFeatures)
Selects the maximum-likelihood class label for a data instance.
Parameters:
instance - feature vector (discrete finite feature values represented by integers)
selectedFeatures - boolean array specifying the selected features (default null, meaning all features are included)
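As a rough illustration of how a feature mask and a maximum-likelihood choice can fit together, the sketch below scores each class by the log-likelihood of the selected features and returns the best index. It is not the ABLE implementation: the class ArgMaxSketch, the [class][feature][value] table layout, and the use of log-probabilities are assumptions of this example, and the documentation does not say whether ABLE also weights the score by the class prior.

```java
/** Illustrative sketch of masked maximum-likelihood classification; not the ABLE implementation. */
final class ArgMaxSketch {

    /**
     * Returns the index of the class whose likelihood over the selected
     * features is largest. A null mask means every feature is used.
     * cpt[c][f][a] is assumed to hold P(f=a | C=c).
     */
    static int classify(double[][][] cpt, int[] instance, boolean[] selected) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < cpt.length; c++) {
            double logLik = 0.0;
            for (int f = 0; f < instance.length; f++) {
                if (selected != null && !selected[f]) {
                    continue; // feature excluded by the mask
                }
                logLik += Math.log(cpt[c][f][instance[f]]);
            }
            if (logLik > bestScore) {
                bestScore = logLik;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][][] cpt = {
            {{0.9, 0.1}, {0.2, 0.8}},
            {{0.3, 0.7}, {0.5, 0.5}}
        };
        int[] instance = {0, 0};
        System.out.println(classify(cpt, instance, null));
        System.out.println(classify(cpt, instance, new boolean[] {false, true}));
    }
}
```

Summing logarithms instead of multiplying probabilities avoids underflow when there are many features; the ranking of the classes is the same either way.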

findClassProbability

public double[] findClassProbability(int[] instance,
                                     boolean[] selectedFeatures)
Returns the posterior probability distribution over class labels for a data instance, using Bayes' rule:

    P(class|instance) = P(instance|class) P(class) / sum_i P(instance|class_i) P(class_i)

Parameters:
instance - feature vector (discrete finite feature values represented by integers)
selectedFeatures - boolean array specifying the selected features (default null, meaning all features are included)
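The normalization step of Bayes' rule can be sketched as follows. The class PosteriorSketch and its method are hypothetical names for this example, not part of ABLE; the inputs are assumed to be the per-class likelihoods P(instance|class) and the class priors P(class), already computed elsewhere.

```java
import java.util.Arrays;

/** Illustrative sketch of posterior normalization; not the ABLE implementation. */
final class PosteriorSketch {

    /**
     * Combines per-class likelihoods and priors into a posterior distribution:
     * post[c] = likelihoods[c] * priors[c] / sum_i likelihoods[i] * priors[i].
     */
    static double[] posterior(double[] likelihoods, double[] priors) {
        if (likelihoods.length != priors.length) {
            throw new IllegalArgumentException("likelihoods and priors differ in length");
        }
        double[] post = new double[priors.length];
        double evidence = 0.0;
        for (int c = 0; c < priors.length; c++) {
            post[c] = likelihoods[c] * priors[c]; // unnormalized joint P(instance, class)
            evidence += post[c];
        }
        if (evidence == 0.0) {
            throw new IllegalArgumentException("instance has zero probability under every class");
        }
        for (int c = 0; c < post.length; c++) {
            post[c] /= evidence;
        }
        return post;
    }

    public static void main(String[] args) {
        double[] post = posterior(new double[] {0.2, 0.1}, new double[] {0.25, 0.75});
        System.out.println(Arrays.toString(post));
    }
}
```

The denominator is the same for every class, so it changes the scale of the result but not which class has the highest posterior.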

likelihood

public double likelihood(int[] instance,
                         int classlabel)
                  throws java.lang.IllegalArgumentException
Compute the likelihood of an instance given a class label
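Under the naive Bayes independence assumption, the likelihood of an instance given a class is the product of the per-feature conditional probabilities. The sketch below shows that product; the class LikelihoodSketch, the [class][feature][value] table layout, and the specific range checks that raise IllegalArgumentException are assumptions of this example, since the documentation does not state when ABLE throws.

```java
/** Illustrative sketch of the naive Bayes likelihood; not the ABLE implementation. */
final class LikelihoodSketch {

    /**
     * Returns P(instance | C=classLabel) as the product over features of
     * cpt[classLabel][f][instance[f]], assuming cpt[c][f][a] = P(f=a | C=c).
     */
    static double likelihood(double[][][] cpt, int[] instance, int classLabel) {
        if (classLabel < 0 || classLabel >= cpt.length) {
            throw new IllegalArgumentException("class label out of range: " + classLabel);
        }
        if (instance.length != cpt[classLabel].length) {
            throw new IllegalArgumentException("instance has wrong number of features");
        }
        double product = 1.0;
        for (int f = 0; f < instance.length; f++) {
            int value = instance[f];
            if (value < 0 || value >= cpt[classLabel][f].length) {
                throw new IllegalArgumentException("value out of range for feature " + f);
            }
            product *= cpt[classLabel][f][value];
        }
        return product;
    }

    public static void main(String[] args) {
        double[][][] cpt = {
            {{0.9, 0.1}, {0.2, 0.8}},
            {{0.3, 0.7}, {0.5, 0.5}}
        };
        System.out.println(likelihood(cpt, new int[] {0, 1}, 0));
    }
}
```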

(C) Copyright IBM Corporation 1999, 2003