com.ibm.able.beans.bayes
Class NaiveBayes
java.lang.Object
|
+--com.ibm.able.beans.bayes.NaiveBayes
- All Implemented Interfaces:
- java.io.Serializable
- public class NaiveBayes
- extends java.lang.Object
- implements java.io.Serializable
- See Also:
- Serialized Form
Constructor Summary |
NaiveBayes()
|
NaiveBayes(int ncls,
int nftr,
int[] nval,
double m,
double[] cpriors,
double[][][] ppriors)
Constructs a NaiveBayes with explicitly specified parameters.
ncls - number of class labels
nftr - number of features
nval - number of values for each feature (features are assumed to be nominal, i.e. discrete and finite-valued)
m - equivalent sample size
cpriors - prior probability distribution over class labels
ppriors - prior estimates of the probabilities P(f|C) (used for Bayesian parameter estimation
with the equivalent sample size method) |
Method Summary |
void |
buildHypothesis(int[][] data,
int[] labels,
int ninst,
int ncls,
int nftr,
int[] nval)
Build a hypothesis using explicit parameters
This function learns a naive Bayes model given a set of labeled instances. |
double |
classify(int[][] data,
int[] labels,
int ninst,
boolean[] selectedFeatures)
Classify a set of records
Input:
data - data in a table format where each row represents
an instance, and each column represents an attribute (feature). |
int |
classifyExample(int[] instance,
boolean[] selectedFeatures)
This function selects the maximum-likelihood class label given a data instance
Input:
instance - feature vector (discrete finite feature values represented by integers)
selectedFeatures - boolean array specifying the selected features (null, the default, includes all features) |
double[] |
findClassProbability(int[] instance,
boolean[] selectedFeatures)
This function returns the posterior probability distribution over class labels
given a data instance, using Bayes' rule:
P(class|instance) = P(instance|class) P(class) / sum_i P(instance|class_i) P(class_i)
Input:
instance - feature vector (discrete finite feature values represented by integers)
selectedFeatures - boolean array specifying the selected features (null, the default, includes all features) |
double |
getAccuracy()
|
double |
getAvgLikelihood()
|
double |
getAvgLogLikelihood()
|
double[] |
getClassPriors()
|
double[] |
getClassProb()
|
int[][] |
getConfusionMatrix()
|
double[][][] |
getCPT()
|
double[] |
getEqSampleSize()
|
int |
getNClasses()
|
int |
getNFeatures()
|
int[] |
getNFValues()
|
void |
initializeNB(int ncls,
int nftr,
int[] nval,
double eqss,
double[] cpriors,
double[][][] ppriors)
Internal function that implements construction of the class from an explicit list of parameters.
ncls - number of class labels
nftr - number of features
nval - number of values for each feature (features are assumed to be nominal, i.e. discrete and finite-valued)
eqss - equivalent sample size used for each class (by default, each class is assumed to have been seen at least once)
cpriors - prior probability distribution over class labels
ppriors - prior estimates of the probabilities P(f|C) (used for Bayesian parameter estimation
with the equivalent sample size method) |
double |
likelihood(int[] instance,
int classlabel)
Compute the likelihood of an instance given a class label |
void |
setClassPriors(double[] cpriors)
|
void |
setCPT(double[][][] cptpriors)
|
void |
setNClasses(int ncls)
|
void |
setNFeatures(int nftr,
int[] nfv)
|
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
serialVersionUID
protected static final long serialVersionUID
NaiveBayes
public NaiveBayes()
NaiveBayes
public NaiveBayes(int ncls,
int nftr,
int[] nval,
double m,
double[] cpriors,
double[][][] ppriors)
throws AbleException
- Constructs a NaiveBayes with explicitly specified parameters.
ncls - number of class labels
nftr - number of features
nval - number of values for each feature (features are assumed to be nominal, i.e. discrete and finite-valued)
m - equivalent sample size
cpriors - prior probability distribution over class labels
ppriors - prior estimates of the probabilities P(f|C) (used for Bayesian parameter estimation
with the equivalent sample size method)
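The shapes of cpriors and ppriors are easier to see in code. The following is a minimal sketch of how uniform inputs for these two parameters could be built. The class UniformPriorsSketch and its methods are hypothetical helpers, and the [class][feature][value] layout of ppriors is an assumption that this page does not confirm.

```java
import java.util.Arrays;

// Sketch only: UniformPriorsSketch is a hypothetical helper, not part of the ABLE API.
final class UniformPriorsSketch {

    /** Uniform prior distribution over ncls class labels. */
    static double[] classPriors(int ncls) {
        double[] cpriors = new double[ncls];
        Arrays.fill(cpriors, 1.0 / ncls);
        return cpriors;
    }

    /** Uniform P(f|C) table; the [class][feature][value] layout is an assumption. */
    static double[][][] featurePriors(int ncls, int nftr, int[] nval) {
        double[][][] ppriors = new double[ncls][nftr][];
        for (int c = 0; c < ncls; c++) {
            for (int f = 0; f < nftr; f++) {
                ppriors[c][f] = new double[nval[f]];
                Arrays.fill(ppriors[c][f], 1.0 / nval[f]);
            }
        }
        return ppriors;
    }

    public static void main(String[] args) {
        int[] nval = {2, 3}; // feature 0 is binary, feature 1 has three values
        System.out.println(Arrays.toString(classPriors(2)));
        System.out.println(Arrays.deepToString(featurePriors(2, 2, nval)));
    }
}
```

Arrays built this way would then be passed as the cpriors and ppriors arguments, provided the real class uses the same layout.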
initializeNB
public void initializeNB(int ncls,
int nftr,
int[] nval,
double eqss,
double[] cpriors,
double[][][] ppriors)
throws AbleException
- Internal function that implements construction of the class from an explicit list of parameters.
ncls - number of class labels
nftr - number of features
nval - number of values for each feature (features are assumed to be nominal, i.e. discrete and finite-valued)
eqss - equivalent sample size used for each class (by default, each class is assumed to have been seen at least once)
cpriors - prior probability distribution over class labels
ppriors - prior estimates of the probabilities P(f|C) (used for Bayesian parameter estimation
with the equivalent sample size method)
getNClasses
public int getNClasses()
getNFeatures
public int getNFeatures()
getNFValues
public int[] getNFValues()
getEqSampleSize
public double[] getEqSampleSize()
getCPT
public double[][][] getCPT()
getClassPriors
public double[] getClassPriors()
getAvgLikelihood
public double getAvgLikelihood()
getAvgLogLikelihood
public double getAvgLogLikelihood()
getAccuracy
public double getAccuracy()
getClassProb
public double[] getClassProb()
getConfusionMatrix
public int[][] getConfusionMatrix()
setNClasses
public void setNClasses(int ncls)
throws java.lang.IllegalArgumentException
setNFeatures
public void setNFeatures(int nftr,
int[] nfv)
throws java.lang.IllegalArgumentException
setCPT
public void setCPT(double[][][] cptpriors)
throws AbleException
setClassPriors
public void setClassPriors(double[] cpriors)
buildHypothesis
public void buildHypothesis(int[][] data,
int[] labels,
int ninst,
int ncls,
int nftr,
int[] nval)
- Build a hypothesis using explicit parameters.
This function learns a naive Bayes model from a set of labeled instances.
For each class value C=c and each attribute (feature) f, it estimates
the parameter P(f=a|C=c) using the m-estimate of probability, also called the
equivalent sample size method (see Mitchell, Machine Learning):
P(f=a|C=c) = (n + m*p)/(N + m),
where N is the total number of instances for which C=c, n is the number of these
instances for which also f=a, p is a prior estimate of the probability P(f=a|C=c),
and m is a constant called the equivalent sample size. The m-estimate can be interpreted
as an empirical probability estimate (i.e. a frequency) computed as if m additional "prior"
instances distributed according to p had been combined with the "current" N instances.
When there is no prior information about p, a uniform distribution over all possible
values of each feature is assumed, i.e. p = 1/nFValues, where nFValues is the number of values of that feature.
Input:
data - training data in a table format where each row represents
an instance, and each column represents an attribute (feature).
labels - the class label of each training instance
ninst - the number of training instances
ncls - number of class labels
nftr - number of features
nval - number of values for each feature (features are assumed to be nominal, i.e. discrete and finite-valued)
Class labels and feature number indexes are always assumed to start from 0.
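The m-estimate formula above can be sketched for a single feature as follows. This is an illustration under stated assumptions, not the ABLE implementation: MEstimateSketch, its method names, and the [class][value] layout of the returned table are all hypothetical.

```java
import java.util.Arrays;

// Sketch only: MEstimateSketch is a hypothetical name, not part of the ABLE API.
final class MEstimateSketch {

    /** m-estimate (n + m*p) / (N + m) for one feature value within one class. */
    static double mEstimate(int n, int total, double m, double p) {
        return (n + m * p) / (total + m);
    }

    /**
     * Estimates P(f=a | C=c) for one feature from labeled data, using the
     * uniform prior p = 1/numValues. Returns a table indexed [class][value].
     */
    static double[][] estimateFeature(int[][] data, int[] labels, int feature,
                                      int numClasses, int numValues, double m) {
        int[][] counts = new int[numClasses][numValues]; // n for each (class, value)
        int[] classTotals = new int[numClasses];         // N for each class
        for (int i = 0; i < data.length; i++) {
            counts[labels[i]][data[i][feature]]++;
            classTotals[labels[i]]++;
        }
        double p = 1.0 / numValues; // uniform prior when nothing better is known
        double[][] table = new double[numClasses][numValues];
        for (int c = 0; c < numClasses; c++) {
            for (int a = 0; a < numValues; a++) {
                table[c][a] = mEstimate(counts[c][a], classTotals[c], m, p);
            }
        }
        return table;
    }

    public static void main(String[] args) {
        int[][] data = {{0}, {0}, {1}, {1}}; // four instances, one binary feature
        int[] labels = {0, 0, 0, 1};
        System.out.println(Arrays.deepToString(estimateFeature(data, labels, 0, 2, 2, 2.0)));
    }
}
```

In this toy data set, class 1 never occurs together with feature value 0, yet its estimate is (0 + 2*0.5)/(1 + 2) rather than zero. Avoiding zero probabilities for unseen combinations is the practical benefit of the m "prior" instances.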
classify
public double classify(int[][] data,
int[] labels,
int ninst,
boolean[] selectedFeatures)
- Classify a set of records
Input:
data - data in a table format where each row represents
an instance, and each column represents an attribute (feature).
labels - the class label of each instance
ninst - the number of instances
selectedFeatures - boolean array specifying the selected features (null, the default, includes all features)
Class labels and feature number indexes are always assumed to start from 0.
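This page does not say what value classify returns, nor how getAccuracy() and getConfusionMatrix() are populated. As a hedged illustration of the bookkeeping that scoring predictions against known labels usually involves, the sketch below computes a confusion matrix and an accuracy figure. EvaluationSketch is a hypothetical helper, and the row = actual, column = predicted convention is an assumption.

```java
import java.util.Arrays;

// Sketch only: EvaluationSketch is a hypothetical helper, not part of the ABLE API.
final class EvaluationSketch {

    /** Counts instances by (actual, predicted) label; rows are actual labels (assumed convention). */
    static int[][] confusionMatrix(int[] actual, int[] predicted, int ncls) {
        int[][] matrix = new int[ncls][ncls];
        for (int i = 0; i < actual.length; i++) {
            matrix[actual[i]][predicted[i]]++;
        }
        return matrix;
    }

    /** Fraction of instances on the diagonal; 0.0 for an empty matrix. */
    static double accuracy(int[][] matrix) {
        int correct = 0;
        int total = 0;
        for (int r = 0; r < matrix.length; r++) {
            for (int c = 0; c < matrix[r].length; c++) {
                total += matrix[r][c];
                if (r == c) {
                    correct += matrix[r][c];
                }
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    public static void main(String[] args) {
        int[] actual = {0, 0, 1, 1};
        int[] predicted = {0, 1, 1, 1};
        int[][] matrix = confusionMatrix(actual, predicted, 2);
        System.out.println(Arrays.deepToString(matrix));
        System.out.println(accuracy(matrix));
    }
}
```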
classifyExample
public int classifyExample(int[] instance,
boolean[] selectedFeatures)
- This function selects the maximum-likelihood class label given a data instance
Input:
instance - feature vector (discrete finite feature values represented by integers)
selectedFeatures - boolean array specifying the selected features (null, the default, includes all features)
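A minimal sketch of label selection with a feature mask follows. It scores each class in log space to avoid underflow and returns the index with the highest score. ArgmaxSketch, the [class][feature][value] table layout, and the inclusion of the class prior in the score are assumptions; with uniform priors the result reduces to a pure maximum-likelihood choice.

```java
// Sketch only: ArgmaxSketch is a hypothetical name, not part of the ABLE API.
final class ArgmaxSketch {

    /**
     * Returns the class index with the highest log score.
     * priors[c] = P(class c); cpt[c][f][v] = P(feature f = v | class c) (assumed layout).
     * A null mask means every feature is used. Ties go to the lowest class index.
     */
    static int classify(int[] instance, boolean[] selected,
                        double[] priors, double[][][] cpt) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < priors.length; c++) {
            double score = Math.log(priors[c]);
            for (int f = 0; f < instance.length; f++) {
                if (selected == null || selected[f]) {
                    score += Math.log(cpt[c][f][instance[f]]);
                }
            }
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] priors = {0.5, 0.5};
        double[][][] cpt = {
            {{0.9, 0.1}, {0.5, 0.5}}, // class 0: feature 0, feature 1
            {{0.2, 0.8}, {0.5, 0.5}}  // class 1: feature 0, feature 1
        };
        System.out.println(classify(new int[]{0, 1}, null, priors, cpt));
        System.out.println(classify(new int[]{1, 0}, null, priors, cpt));
    }
}
```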
findClassProbability
public double[] findClassProbability(int[] instance,
boolean[] selectedFeatures)
- This function returns the posterior probability distribution over class labels
given a data instance, using Bayes' rule:
P(class|instance) = P(instance|class) P(class) / sum_i P(instance|class_i) P(class_i)
Input:
instance - feature vector (discrete finite feature values represented by integers)
selectedFeatures - boolean array specifying the selected features (null, the default, includes all features)
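Bayes' rule for this model can be sketched as below: the joint P(instance|class) P(class) is computed for each class and then divided by the sum of the joints. PosteriorSketch and the [class][feature][value] table layout are assumptions made for illustration, not the ABLE implementation.

```java
import java.util.Arrays;

// Sketch only: PosteriorSketch is a hypothetical name, not part of the ABLE API.
final class PosteriorSketch {

    /**
     * priors[c] = P(class c); cpt[c][f][v] = P(feature f = v | class c) (assumed layout).
     * A null mask means every feature is used.
     */
    static double[] posterior(int[] instance, boolean[] selected,
                              double[] priors, double[][][] cpt) {
        double[] post = new double[priors.length];
        double evidence = 0.0; // sum_i P(instance|class_i) P(class_i)
        for (int c = 0; c < priors.length; c++) {
            double joint = priors[c];
            for (int f = 0; f < instance.length; f++) {
                if (selected == null || selected[f]) {
                    joint *= cpt[c][f][instance[f]];
                }
            }
            post[c] = joint;
            evidence += joint;
        }
        for (int c = 0; c < post.length; c++) {
            post[c] /= evidence; // assumes at least one class has a nonzero joint
        }
        return post;
    }

    public static void main(String[] args) {
        double[] priors = {0.5, 0.5};
        double[][][] cpt = {
            {{0.9, 0.1}, {0.5, 0.5}}, // class 0: feature 0, feature 1
            {{0.2, 0.8}, {0.5, 0.5}}  // class 1: feature 0, feature 1
        };
        System.out.println(Arrays.toString(posterior(new int[]{0, 1}, null, priors, cpt)));
    }
}
```

For long feature vectors, the product of many small probabilities can underflow, so a production version would more likely accumulate log probabilities and normalize afterwards.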
likelihood
public double likelihood(int[] instance,
int classlabel)
throws java.lang.IllegalArgumentException
- Compute the likelihood of an instance given a class label
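Under the naive Bayes independence assumption, the likelihood of an instance given a class is the product of the per-feature conditional probabilities. The sketch below illustrates that product. LikelihoodSketch and the [class][feature][value] table layout are assumptions, and the range check is only a guess at why the documented method may throw IllegalArgumentException.

```java
// Sketch only: LikelihoodSketch is a hypothetical name, not part of the ABLE API.
final class LikelihoodSketch {

    /**
     * P(instance | class) as a product over features.
     * cpt[c][f][v] = P(feature f = v | class c) (assumed layout).
     */
    static double likelihood(int[] instance, int classLabel, double[][][] cpt) {
        if (classLabel < 0 || classLabel >= cpt.length) {
            // Assumed failure condition; the source does not state when it throws.
            throw new IllegalArgumentException("class label out of range: " + classLabel);
        }
        double product = 1.0;
        for (int f = 0; f < instance.length; f++) {
            product *= cpt[classLabel][f][instance[f]];
        }
        return product;
    }

    public static void main(String[] args) {
        double[][][] cpt = {
            {{0.9, 0.1}, {0.5, 0.5}}, // class 0: feature 0, feature 1
            {{0.2, 0.8}, {0.5, 0.5}}  // class 1: feature 0, feature 1
        };
        System.out.println(likelihood(new int[]{0, 1}, 0, cpt));
    }
}
```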
(C) Copyright IBM Corporation 1999, 2003