Naive Bayes

Naive Bayes (sometimes referred to as the Bayes classifier) is a probabilistic classifier that has been widely used for both clustering and classification. Its probabilistic model is based on Bayes' theorem, and it is termed naive because of its strong independence assumptions. Naive Bayes is also called a generative model because it models the distribution of the data points within each class. By contrast, a discriminative model such as logistic regression or a support vector machine attempts to separate the data set with a line (or, more generally, a hyperplane), whereas a generative model such as Naive Bayes attempts to model how the data points themselves are distributed, including those that lie far from that line.

Derivation

Suppose that, given ${\textbf {y}}$ , the random variables ${\textbf {X}}={\textbf {x}}_{1},\dots ,{\textbf {x}}_{m}$ are conditionally independent, where ${\textbf {y}}$ takes one of $n$ possible values $y_{1},\dots ,y_{n}$ . Applying the definition of conditional independence repeatedly, we have:

$P({\textbf {X}}|{\textbf {y}})=P({\textbf {x}}_{1},\dots ,{\textbf {x}}_{m}|{\textbf {y}})=P({\textbf {x}}_{1}|{\textbf {y}})P({\textbf {x}}_{2},\dots ,{\textbf {x}}_{m}|{\textbf {y}})=P({\textbf {x}}_{1}|{\textbf {y}})P({\textbf {x}}_{2}|{\textbf {y}})P({\textbf {x}}_{3},\dots ,{\textbf {x}}_{m}|{\textbf {y}})=\dots =P({\textbf {x}}_{1}|{\textbf {y}})P({\textbf {x}}_{2}|{\textbf {y}})\cdots P({\textbf {x}}_{m}|{\textbf {y}})=\prod _{i=1}^{m}P({\textbf {x}}_{i}|{\textbf {y}}).$

Now, assuming ${\textbf {y}}$ is a discrete variable, by Bayes' rule and the factorization above we have:

$P({\textbf {y}}=y_{j}|{\textbf {x}}_{1},\dots ,{\textbf {x}}_{m})={\dfrac {P({\textbf {y}}=y_{j})P({\textbf {x}}_{1},\dots ,{\textbf {x}}_{m}|{\textbf {y}}=y_{j})}{\sum _{k=1}^{n}P({\textbf {y}}=y_{k})P({\textbf {x}}_{1},\dots ,{\textbf {x}}_{m}|{\textbf {y}}=y_{k})}}={\dfrac {P({\textbf {y}}=y_{j})\prod _{i=1}^{m}P({\textbf {x}}_{i}|{\textbf {y}}=y_{j})}{\sum _{k=1}^{n}P({\textbf {y}}=y_{k})\prod _{i=1}^{m}P({\textbf {x}}_{i}|{\textbf {y}}=y_{k})}}.$

Therefore, to obtain the value of ${\textbf {y}}$ with the highest posterior probability (denoted by ${\hat {\textbf {y}}}$ ), it suffices to solve the maximization problem:

${\hat {\textbf {y}}}=\arg \max _{y_{j}}{\bigg \lbrace }{\dfrac {P({\textbf {y}}=y_{j})\prod _{i=1}^{m}P({\textbf {x}}_{i}|{\textbf {y}}=y_{j})}{\sum _{k=1}^{n}P({\textbf {y}}=y_{k})\prod _{i=1}^{m}P({\textbf {x}}_{i}|{\textbf {y}}=y_{k})}}{\bigg \rbrace }.$

Notice that the denominator does not depend on $y_{j}$ , so it has no effect on which value attains the maximum, and the expression above reduces to:

${\hat {\textbf {y}}}=\arg \max _{y_{j}}{\bigg \lbrace }P({\textbf {y}}=y_{j})\prod _{i=1}^{m}P({\textbf {x}}_{i}|{\textbf {y}}=y_{j}){\bigg \rbrace }.$

This is called the Naive Bayes classification rule.
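
The rule can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `classify`, the dictionary layout, and the toy probabilities for a two-class spam example are all assumptions made for this sketch.

```python
from math import prod

def classify(priors, likelihoods, observation):
    """Return the class y_j maximizing P(y_j) * prod_i P(x_i | y_j).

    priors:       {class: P(y = class)}
    likelihoods:  {class: {feature: {value: P(feature = value | class)}}}
    observation:  {feature: value}
    """
    scores = {
        label: prior * prod(
            likelihoods[label][feature][value]
            for feature, value in observation.items()
        )
        for label, prior in priors.items()
    }
    # The shared denominator is omitted because it does not change the argmax.
    return max(scores, key=scores.get), scores

# Hypothetical probabilities, chosen for illustration only.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"has_link": {True: 0.8, False: 0.2},
             "has_greeting": {True: 0.1, False: 0.9}},
    "ham": {"has_link": {True: 0.3, False: 0.7},
            "has_greeting": {True: 0.6, False: 0.4}},
}

label, scores = classify(
    priors, likelihoods, {"has_link": True, "has_greeting": False}
)
print(label, scores)
```

With these assumed numbers the score for "spam" is 0.4 × 0.8 × 0.9 and the score for "ham" is 0.6 × 0.3 × 0.4, so the sketch picks "spam". Dividing each score by the sum of the scores would recover the normalized posterior probabilities.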

Controversies

The adjective naive comes from the unrealistic assumption that the features in a data set are conditionally independent given the class. In practice, this assumption is often violated. Nevertheless, naive Bayes classifiers tend to perform very well even when the assumption does not hold, particularly for small sample sizes. ^[1]

Additionally, consider a binary response taking values in $\{0,1\}$ and suppose that, for some level of a categorical predictor, every training observation with that level has response $0$ . In this case, the estimated conditional probability of that level of ${\textbf {x}}$ given ${\textbf {y}}=1$ is zero, and because the formula above involves a product, this single factor zeroes out the posterior probability of ${\textbf {y}}=1$ regardless of the other predictors. In other words, given training data in which a predictor level occurs only with response 0, the Naive Bayes model will always classify new observations with that level as 0. Depending on the application, this may or may not be desirable.
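
The effect is easy to reproduce. The sketch below uses a small hypothetical training set (the predictor levels "red", "green" and "blue" and the helper names `conditional` and `score` are assumptions for illustration) in which the level "blue" only ever occurs with response 0.

```python
from collections import Counter

# Hypothetical training data: (predictor level, binary response).
data = [
    ("red", 0), ("red", 1),
    ("green", 1), ("green", 1),
    ("blue", 0), ("blue", 0),
]

class_counts = Counter(y for _, y in data)
pair_counts = Counter(data)  # missing (level, y) pairs count as 0

def conditional(level, y):
    """Relative-frequency estimate of P(x = level | y)."""
    return pair_counts[(level, y)] / class_counts[y]

def score(level, y):
    """Unnormalized posterior P(y) * P(x = level | y)."""
    prior = class_counts[y] / len(data)
    return prior * conditional(level, y)

scores_blue = {y: score("blue", y) for y in (0, 1)}
print(scores_blue)
```

Because "blue" never appears with response 1, the estimate of P(x = blue | y = 1) is exactly zero, so the score for class 1 is zero. Multiplying in further conditional probabilities from other predictors cannot raise it again.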

History

1763: The work of Reverend Thomas Bayes (1702–61), who studied how to compute a distribution for the probability parameter of a binomial distribution, was published posthumously. Bayes' theorem is named after him.

Early 1960s: Naive Bayes was introduced into the text retrieval community under a different name.

Business problems that could be solved with Naive Bayes techniques

Naive Bayes can be used for purposes such as predicting customer behavior, ^[2] predicting customer preferences, ^[3] predicting customer churn, ^[4] predicting fraudulent financial reporting, ^[5] spam detection, ^[6] network security, ^[7] and sentiment analysis. ^[8] Naive Bayes may also be used as a distance measurement between categorical variables. ^[9]

Example

Consider the following dataset:

Fever	Headache	Sore Throat	Flu
Yes	No	No	No
No	No	Yes	Yes
Yes	Yes	No	Yes
Yes	No	Yes	Yes
No	Yes	Yes	No
No	No	No	No
Yes	No	No	Yes
No	Yes	Yes	No
No	Yes	Yes	Yes
No	No	Yes	Yes

Suppose we want to classify a new patient with the following observation:

Fever	Headache	Sore Throat
Yes	No	Yes

Now, we have:

$P(Flu=Yes)=0.6$	$P(Flu=No)=0.4$
$P(Fever=Yes|Flu=Yes)=0.5$	$P(Fever=Yes|Flu=No)=0.25$
$P(Headache=Yes|Flu=Yes)\approx 0.33$	$P(Headache=Yes|Flu=No)=0.5$
$P(SoreThroat=Yes|Flu=Yes)\approx 0.67$	$P(SoreThroat=Yes|Flu=No)=0.5$
$P(Fever=No|Flu=Yes)=0.5$	$P(Fever=No|Flu=No)=0.75$
$P(Headache=No|Flu=Yes)\approx 0.67$	$P(Headache=No|Flu=No)=0.5$
$P(SoreThroat=No|Flu=Yes)\approx 0.33$	$P(SoreThroat=No|Flu=No)=0.5$

From this, we have

$P(Flu=Yes)P(Fever=Yes|Flu=Yes)P(Headache=No|Flu=Yes)P(SoreThroat=Yes|Flu=Yes)=(0.6)(0.5)(0.67)(0.67)\approx 0.13$

and

$P(Flu=No)P(Fever=Yes|Flu=No)P(Headache=No|Flu=No)P(SoreThroat=Yes|Flu=No)=(0.4)(0.25)(0.5)(0.5)=0.025$.

Therefore, as 0.13 is larger than 0.025, this new patient would be classified as being sick with the flu.
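
The two scores can also be derived directly from the data set by counting. The following is a minimal sketch under simple assumptions: probabilities are plain relative frequencies with no smoothing, and the helper names `likelihood` and `score` are chosen for this example only.

```python
from collections import Counter
from math import prod

# Rows of the data set above: (Fever, Headache, Sore Throat, Flu).
ROWS = [
    ("Yes", "No", "No", "No"),
    ("No", "No", "Yes", "Yes"),
    ("Yes", "Yes", "No", "Yes"),
    ("Yes", "No", "Yes", "Yes"),
    ("No", "Yes", "Yes", "No"),
    ("No", "No", "No", "No"),
    ("Yes", "No", "No", "Yes"),
    ("No", "Yes", "Yes", "No"),
    ("No", "Yes", "Yes", "Yes"),
    ("No", "No", "Yes", "Yes"),
]

class_counts = Counter(row[-1] for row in ROWS)

def likelihood(index, value, label):
    """Relative-frequency estimate of P(feature[index] = value | Flu = label)."""
    matches = sum(1 for row in ROWS if row[index] == value and row[-1] == label)
    return matches / class_counts[label]

def score(observation, label):
    """Unnormalized posterior P(Flu = label) * prod_i P(x_i | Flu = label)."""
    prior = class_counts[label] / len(ROWS)
    return prior * prod(
        likelihood(i, value, label) for i, value in enumerate(observation)
    )

patient = ("Yes", "No", "Yes")  # Fever, Headache, Sore Throat
scores = {label: score(patient, label) for label in class_counts}
prediction = max(scores, key=scores.get)

# Dividing by the sum of the scores restores the denominator of Bayes' rule.
posterior_yes = scores["Yes"] / sum(scores.values())
print(scores, prediction, round(posterior_yes, 2))
```

Using unrounded fractions, the sketch gives a score of about 0.133 for Flu = Yes and 0.025 for Flu = No, so the prediction is "Yes". Normalizing the two scores gives a posterior probability of about 0.84 that the patient has the flu.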

[1] http://software.ucv.ro/~cmihaescu/ro/teaching/AIR/docs/Lab4-NaiveBayes.pdf

[2] Mauser, Arne, et al. "Predicting customer behavior using naive bayes and maximum entropy–winning the data-mining-cup 2004." Proc. Informatiktage. 2005.

[3] Balaji, S., and S. K. Srivatsa. "Naïve Bayes Classification Approach for Mining Life Insurance Databases for Effective Prediction of Customer Preferences over Life Insurance Products." International Journal of Computer Applications 51.3 (2012).

[4] Nath, Shyam V., and Ravi S. Behara. "Customer churn analysis in the wireless industry: A data mining approach." Proceedings-annual meeting of the decision sciences institute. 2003.

[5] Shmueli, Galit, Nitin R. Patel, and Peter C. Bruce. Data mining for business intelligence: Concepts, techniques, and applications in Microsoft Office Excel with XLMiner. John Wiley and Sons, 2011.

[6] Metsis, Vangelis, Ion Androutsopoulos, and Georgios Paliouras. "Spam filtering with naive bayes-which naive bayes?." CEAS. 2006.

[7] Panda, Mrutyunjaya, and Manas Ranjan Patra. "Network intrusion detection using naive bayes." International journal of computer science and network security 7.12 (2007): 258-263.

[8] Pang, Bo, Lillian Lee, and Shivakumar Vaithyanathan. "Thumbs up?: sentiment classification using machine learning techniques." Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10. Association for Computational Linguistics, 2002.

[9] Li, Chaoqun, Liangxiao Jiang, and Hongwei Li. "Naive Bayes for value difference metric." Frontiers of Computer Science 8.2 (2014): 255-264.
