- The foundation: Probabilistic model
- Knowledge and Process
- Constructing a classifier from the probability model
- Parameter estimation and event models
- Code
- Multinomial Naive Bayes
Naive Bayes Algorithms
The foundation: Probabilistic model
Abstractly, naive Bayes is a conditional probability model: given a problem instance to be classified, represented by a feature vector $x = (x_1, \dots, x_n)$, it assigns to the instance a probability $p(C_k \mid x_1, \dots, x_n)$ for each of $K$ possible classes $C_k$. By Bayes' theorem, this conditional probability decomposes as

$$p(C_k \mid x) = \frac{p(C_k)\, p(x \mid C_k)}{p(x)}$$

or, in Bayesian probability terminology,

$$\text{posterior} = \frac{\text{prior} \times \text{likelihood}}{\text{evidence}}$$
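As a purely illustrative arithmetic check (the numbers below are made up, not taken from any dataset), the decomposition can be evaluated directly:

prior = 0.3        # p(C_k): fraction of training samples in class C_k
likelihood = 0.5   # p(x | C_k): probability of the features given the class
evidence = 0.4     # p(x): total probability of the features across all classes
posterior = prior * likelihood / evidence
print(posterior)   # 0.375 = p(C_k | x)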
Knowledge and Process
A sample is assigned to the class with the highest posterior probability. The procedure is:
a. $x = \{a_1, a_2, \dots, a_n\}$ is the feature data of an instance, where each $a_i$ is one feature of $x$.
b. $C = \{y_1, y_2, \dots, y_m\}$ is the set of classes.
c. Calculate the conditional probabilities $P(y_1 \mid x), P(y_2 \mid x), \dots, P(y_m \mid x)$.
d. If $P(y_k \mid x) = \max\{P(y_1 \mid x), P(y_2 \mid x), \dots, P(y_m \mid x)\}$, then $x$ belongs to class $y_k$.
The key question is how to calculate the probabilities in step c (a minimal from-scratch sketch follows this list):
c.1 Collect a training set.
c.2 Estimate, from the training set, the conditional probability of each feature value under each class: $P(a_1 \mid y_1), \dots, P(a_n \mid y_1); \dots; P(a_1 \mid y_m), \dots, P(a_n \mid y_m)$.
c.3 Since the features are assumed to be conditionally independent, Bayes' theorem gives

$$P(y_i \mid x) = \frac{P(x \mid y_i)\, P(y_i)}{P(x)}$$

Because the denominator $P(x)$ is the same for every class, it suffices to maximize the numerator; and because the features are conditionally independent,

$$P(x \mid y_i)\, P(y_i) = P(y_i) \prod_{j=1}^{n} P(a_j \mid y_i)$$
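A minimal from-scratch sketch of steps c.1–c.3 for categorical features; the toy samples and function names below are made up for illustration, not a library implementation:

from collections import Counter, defaultdict

def train(samples, labels):
    # c.1/c.2: estimate P(y) and P(a_j | y) from a training set of categorical features.
    class_counts = Counter(labels)
    # feature_counts[y][j][value] = number of class-y samples whose j-th feature equals value
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    for x, y in zip(samples, labels):
        for j, value in enumerate(x):
            feature_counts[y][j][value] += 1
    priors = {y: n / len(labels) for y, n in class_counts.items()}
    return priors, feature_counts, class_counts

def classify(x, priors, feature_counts, class_counts):
    # c.3/d: maximize P(y) * prod_j P(a_j | y); the denominator P(x) is dropped.
    best_class, best_score = None, -1.0
    for y, prior in priors.items():
        score = prior
        for j, value in enumerate(x):
            score *= feature_counts[y][j][value] / class_counts[y]
        if score > best_score:
            best_class, best_score = y, score
    return best_class

# Toy usage with two binary features:
samples = [(1, 0), (1, 1), (0, 0), (0, 1)]
labels = ["spam", "spam", "ham", "ham"]
priors, fc, cc = train(samples, labels)
print(classify((1, 0), priors, fc, cc))  # "spam"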
Constructing a classifier from the probability model
The discussion so far has derived the independent feature model, that is, the naive Bayes probability model. The naive Bayes classifier combines this model with a decision rule. One common rule is to pick the hypothesis that is most probable; this is known as the maximum a posteriori (MAP) decision rule. The corresponding classifier, a Bayes classifier, is the function that assigns a class label $\hat{y} = C_k$ for some $k$ as follows:

$$\hat{y} = \underset{k \in \{1, \dots, K\}}{\operatorname{argmax}}\; p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k)$$
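In practice this product is computed in log space to avoid floating-point underflow. A small sketch under assumed, hand-set parameters (two classes, one binary feature; nothing here comes from real data):

import math

# Hypothetical precomputed parameters: log p(C_k) and log p(x_i | C_k).
log_priors = {"A": math.log(0.6), "B": math.log(0.4)}
log_likelihoods = {
    "A": [{0: math.log(0.9), 1: math.log(0.1)}],
    "B": [{0: math.log(0.2), 1: math.log(0.8)}],
}

def map_predict(x):
    # argmax_k [ log p(C_k) + sum_i log p(x_i | C_k) ]
    return max(log_priors, key=lambda k: log_priors[k]
               + sum(log_likelihoods[k][i][v] for i, v in enumerate(x)))

print(map_predict((1,)))  # "B": the strong likelihood for x_1 = 1 outweighs the prior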
Parameter estimation and event models
- Gaussian Naive Bayes
The likelihood of the features is assumed to be Gaussian:

$$P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\!\left(-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}\right)$$

where the parameters $\mu_y$ and $\sigma_y^2$ are estimated per class using maximum likelihood.
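As a sketch, the density itself is straightforward to evaluate; the mean and variance below are hypothetical class statistics that would normally be estimated from training data:

import math

def gaussian_likelihood(x_i, mu, sigma2):
    # P(x_i | y) for a class with mean mu and variance sigma2
    return math.exp(-(x_i - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

print(gaussian_likelihood(5.1, mu=5.0, sigma2=0.12))  # density of x_i = 5.1 under N(5.0, 0.12)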
Code
Using the scikit-learn package:
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()
gnb = GaussianNB()
y_pred = gnb.fit(iris.data, iris.target).predict(iris.data)
print("Number of mislabeled points out of a total %d points : %d"
      % (iris.data.shape[0], (iris.target != y_pred).sum()))
Multinomial Naive Bayes
MultinomialNB implements the naive Bayes algorithm for multinomially distributed data, and is one of the two classic naive Bayes variants used in text classification (where the data are typically represented as word vector counts, although tf-idf vectors are also known to work well in practice).
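A minimal sketch of this text-classification use case; the four-document corpus below is made up for illustration:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["free prize money", "cheap money offer",
        "meeting at noon", "project meeting notes"]
labels = ["spam", "spam", "ham", "ham"]

vectorizer = CountVectorizer()               # word count vectors, as described above
X = vectorizer.fit_transform(docs)
clf = MultinomialNB().fit(X, labels)
print(clf.predict(vectorizer.transform(["free money offer"])))  # ['spam']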
- Bernoulli Naive Bayes
BernoulliNB implements the naive Bayes training and classification algorithms for data that is distributed according to multivariate Bernoulli distributions; i.e., there may be multiple features but each one is assumed to be a binary-valued (Bernoulli, boolean) variable. Therefore, this class requires samples to be represented as binary-valued feature vectors; if handed any other kind of data, a BernoulliNB instance may binarize its input (depending on the binarize parameter).
The decision rule for Bernoulli naive Bayes is based on

$$P(x_i \mid y) = P(i \mid y)\, x_i + (1 - P(i \mid y))(1 - x_i)$$

which differs from the multinomial rule in that it explicitly penalizes the non-occurrence of a feature $i$ that is an indicator for class $y$.
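A minimal sketch with made-up binary (presence/absence) features:

import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Made-up binary feature vectors, e.g. word presence/absence per document.
X = np.array([[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]])
y = np.array([0, 0, 1, 1])

clf = BernoulliNB(binarize=0.0)  # binarize=0.0 thresholds any non-binary input at 0
clf.fit(X, y)
print(clf.predict([[1, 0, 0]]))  # [0]: feature 0 is an indicator for class 0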