Naive Bayes

Naive Bayes (also called the naive Bayes classifier) is a probabilistic classifier that is widely used for classification and has also been applied to clustering. Its probabilistic model is based on Bayes' theorem, and it is termed naive because of its strong independence assumptions. Naive Bayes is a generative model: it models the joint distribution of the features and the class label through the class prior $P({\textbf {y}})$ and the class-conditional distribution $P({\textbf {X}}|{\textbf {y}})$. By contrast, a discriminative model such as logistic regression or a support vector machine tries to separate the classes directly with a line (or, more generally, a hyperplane), without modeling how the data points themselves are distributed.

Derivation

Suppose that, given ${\textbf {y}}$ , the random variables ${\textbf {X}}={\textbf {x}}_{1},\dots ,{\textbf {x}}_{m}$ are conditionally independent, where ${\textbf {y}}$ takes one of $n$ values $y_{1},\dots ,y_{n}$ . By the definition of conditional independence we have:

$P({\textbf {X}}|{\textbf {y}})=P({\textbf {x}}_{1},\dots ,{\textbf {x}}_{m}|{\textbf {y}})=P({\textbf {x}}_{1}|{\textbf {y}})P({\textbf {x}}_{2},\dots ,{\textbf {x}}_{m}|{\textbf {y}})=P({\textbf {x}}_{1}|{\textbf {y}})P({\textbf {x}}_{2}|{\textbf {y}})P({\textbf {x}}_{3},\dots ,{\textbf {x}}_{m}|{\textbf {y}})=\dots =P({\textbf {x}}_{1}|{\textbf {y}})P({\textbf {x}}_{2}|{\textbf {y}})\cdots P({\textbf {x}}_{m}|{\textbf {y}})=\prod _{i=1}^{m}P({\textbf {x}}_{i}|{\textbf {y}}).$

Now, assuming ${\textbf {y}}$ is a discrete variable, by Bayes' rule and the factorization above we have:

$P({\textbf {y}}=y_{j}|{\textbf {x}}_{1},\dots ,{\textbf {x}}_{m})={\dfrac {P({\textbf {y}}=y_{j})P({\textbf {x}}_{1},\dots ,{\textbf {x}}_{m}|{\textbf {y}}=y_{j})}{\sum _{k=1}^{n}P({\textbf {y}}=y_{k})P({\textbf {x}}_{1},\dots ,{\textbf {x}}_{m}|{\textbf {y}}=y_{k})}}={\dfrac {P({\textbf {y}}=y_{j})\prod _{i=1}^{m}P({\textbf {x}}_{i}|{\textbf {y}}=y_{j})}{\sum _{k=1}^{n}P({\textbf {y}}=y_{k})\prod _{i=1}^{m}P({\textbf {x}}_{i}|{\textbf {y}}=y_{k})}}.$

Therefore, to find the value of ${\textbf {y}}$ with the highest posterior probability (denoted by ${\hat {\textbf {y}}}$ ), it suffices to solve:

${\hat {\textbf {y}}}=\arg \max _{y_{j}}{\bigg \lbrace }{\dfrac {P({\textbf {y}}=y_{j})\prod _{i=1}^{m}P({\textbf {x}}_{i}|{\textbf {y}}=y_{j})}{\sum _{k=1}^{n}P({\textbf {y}}=y_{k})\prod _{i=1}^{m}P({\textbf {x}}_{i}|{\textbf {y}}=y_{k})}}{\bigg \rbrace }$ .

Since the denominator sums over all classes, it is the same for every $y_{j}$ and does not affect the maximization, so the expression above reduces to:

${\hat {\textbf {y}}}=\arg \max _{y_{j}}{\bigg \lbrace }P({\textbf {y}}=y_{j})\prod _{i=1}^{m}P({\textbf {x}}_{i}|{\textbf {y}}=y_{j}){\bigg \rbrace }$ .

This is called the Naive Bayes classification rule.
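As a rough illustration of the rule, the following Python sketch scores each class by its prior times the product of per-feature conditional probabilities and returns the arg max. The function names, the class labels `"a"` and `"b"`, the feature values, and all probabilities are made up for illustration; they do not come from any dataset in this article.

```python
from math import prod


def naive_bayes_scores(priors, likelihoods, observation):
    """Return the unnormalized score P(y) * prod_i P(x_i | y) for each class y.

    priors:      {class_label: P(y)}
    likelihoods: {class_label: [{value: P(x_i = value | y)} for each feature i]}
    observation: list of observed feature values, one per feature
    """
    return {
        y: priors[y] * prod(likelihoods[y][i][v] for i, v in enumerate(observation))
        for y in priors
    }


def naive_bayes_classify(priors, likelihoods, observation):
    """Return the class label with the largest unnormalized score."""
    scores = naive_bayes_scores(priors, likelihoods, observation)
    return max(scores, key=scores.get)


# Hypothetical two-class, two-feature model (illustrative numbers only).
priors = {"a": 0.7, "b": 0.3}
likelihoods = {
    "a": [{"u": 0.2, "v": 0.8}, {"s": 0.5, "t": 0.5}],
    "b": [{"u": 0.9, "v": 0.1}, {"s": 0.6, "t": 0.4}],
}
observation = ["u", "s"]

scores = naive_bayes_scores(priors, likelihoods, observation)
evidence = sum(scores.values())  # the denominator in Bayes' rule
posteriors = {y: s / evidence for y, s in scores.items()}

# Dividing every score by the same positive constant leaves the arg max unchanged.
assert max(scores, key=scores.get) == max(posteriors, key=posteriors.get)
print(naive_bayes_classify(priors, likelihoods, observation))
```

The normalized posteriors are only needed when calibrated probabilities are wanted; for picking a class, the unnormalized scores suffice.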

Controversies

The adjective naive comes from the unrealistic assumption that the features in a data set are conditionally independent. In practice, this assumption is often violated. However, it should be noted that naive Bayes classifiers still tend to perform very well under this unrealistic assumption for small sample sizes. ^[1]

Additionally, consider a binary response taking the values $0$ and $1$ , and suppose that in the training data every observation with some particular level of a predictor has response $0$ . The estimated conditional probability of that level given ${\textbf {y}}=1$ is then zero, and because the classification rule is a product, this single zero factor wipes out the contribution of every other predictor to the score for class $1$ . In other words, the Naive Bayes model will always predict $0$ for new data with that predictor level, whatever the other predictors say. Depending on the application, this may or may not be desirable.
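The effect can be seen in a small Python sketch. The training pairs below are hypothetical, and `other_factors` is an assumed stand-in for the product of the conditional probabilities of all remaining predictors; neither comes from the cited sources.

```python
from collections import Counter

# Hypothetical training data: (predictor level, binary response).
# Level "c" only ever occurs with response 0.
data = [("a", 0), ("a", 1), ("b", 1), ("b", 0), ("c", 0), ("c", 0), ("b", 1)]

class_counts = Counter(y for _, y in data)
pair_counts = Counter(data)  # missing (level, response) pairs count as 0


def conditional(level, y):
    """Relative-frequency estimate of P(x = level | y)."""
    return pair_counts[(level, y)] / class_counts[y]


def score(level, y, other_factors=1.0):
    """Prior times likelihood; `other_factors` stands in for the other predictors."""
    prior = class_counts[y] / len(data)
    return prior * conditional(level, y) * other_factors


# Even if all other predictors strongly favored response 1,
# the zero factor for level "c" forces the class-1 score to zero.
score_1 = score("c", 1, other_factors=0.99)
score_0 = score("c", 0, other_factors=0.01)
print(score_1, score_0)
```

However small the class-0 score is, it still beats an exact zero, so the prediction for level "c" is always 0.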

History

1763: Reverend Thomas Bayes (1702–61) studied how to compute a distribution for the probability parameter of a binomial distribution; his work was published posthumously in 1763, and Bayes' theorem is named after him.

Early 1960s: Naive Bayes was introduced, under a different name, into the text retrieval community.

Business problems that could be solved with Naive Bayes techniques

Naive Bayes can be used for purposes such as predicting customer behavior, ^[2] predicting customer preferences, ^[3] predicting customer churn, ^[4] predicting fraudulent financial reporting, ^[5] spam detection, ^[6] network security, ^[7] and sentiment analysis. ^[8] Naive Bayes may also be used as a distance measure between categorical variables. ^[9]

Example

Consider the following dataset:

Fever	Headache	Sore Throat	Flu
Yes	No	No	No
No	No	Yes	Yes
Yes	Yes	No	Yes
Yes	No	Yes	Yes
No	Yes	Yes	No
No	No	No	No
Yes	No	No	Yes
No	Yes	Yes	No
No	Yes	Yes	Yes
No	No	Yes	Yes

Suppose we want to classify a new patient with the following observation:

Fever	Headache	Sore Throat
Yes	No	Yes

Now, we have:

$P(Flu=Yes)=0.60$	$P(Flu=No)=0.40$
$P(Fever=Yes|Flu=Yes)=0.5$	$P(Fever=Yes|Flu=No)=0.25$
$P(Headache=Yes|Flu=Yes)=0.33$	$P(Headache=Yes|Flu=No)=0.5$
$P(SoreThroat=Yes|Flu=Yes)=0.67$	$P(SoreThroat=Yes|Flu=No)=0.5$
$P(Fever=No|Flu=Yes)=0.5$	$P(Fever=No|Flu=No)=0.75$
$P(Headache=No|Flu=Yes)=0.67$	$P(Headache=No|Flu=No)=0.5$
$P(SoreThroat=No|Flu=Yes)=0.33$	$P(SoreThroat=No|Flu=No)=0.5$

From this, we have

$P(Flu=Yes)P(Fever=Yes|Flu=Yes)P(Headache=No|Flu=Yes)P(SoreThroat=Yes|Flu=Yes)=(0.6)(0.5)(0.67)(0.67)\approx 0.13$

and

$P(Flu=No)P(Fever=Yes|Flu=No)P(Headache=No|Flu=No)P(SoreThroat=Yes|Flu=No)=(0.4)(0.25)(0.5)(0.5)=0.025$ .

Therefore, as 0.13 is larger than 0.025, this new patient would be classified as being sick with the flu.
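The calculation can be checked with a short Python sketch that estimates the prior and the conditional probabilities by counting rows of the dataset and then scores the new patient. The variable and function names are illustrative choices, and because the code works with exact counts rather than rounded probabilities, its scores may differ slightly from the hand-computed figures.

```python
from collections import Counter
from math import prod

# Rows of the dataset: (Fever, Headache, Sore Throat, Flu).
rows = [
    ("Yes", "No", "No", "No"),
    ("No", "No", "Yes", "Yes"),
    ("Yes", "Yes", "No", "Yes"),
    ("Yes", "No", "Yes", "Yes"),
    ("No", "Yes", "Yes", "No"),
    ("No", "No", "No", "No"),
    ("Yes", "No", "No", "Yes"),
    ("No", "Yes", "Yes", "No"),
    ("No", "Yes", "Yes", "Yes"),
    ("No", "No", "Yes", "Yes"),
]

class_counts = Counter(row[-1] for row in rows)
# feature_counts[(i, value, label)]: rows where feature i equals value and Flu equals label
feature_counts = Counter(
    (i, value, row[-1]) for row in rows for i, value in enumerate(row[:-1])
)


def score(observation, label):
    """Unnormalized score P(Flu = label) * prod_i P(x_i | Flu = label)."""
    prior = class_counts[label] / len(rows)
    likelihood = prod(
        feature_counts[(i, value, label)] / class_counts[label]
        for i, value in enumerate(observation)
    )
    return prior * likelihood


new_patient = ("Yes", "No", "Yes")  # Fever, Headache, Sore Throat
scores = {label: score(new_patient, label) for label in class_counts}
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

Dividing each score by the sum of the two scores would give the posterior probabilities, but the predicted class is the same either way.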


[1] http://software.ucv.ro/~cmihaescu/ro/teaching/AIR/docs/Lab4-NaiveBayes.pdf
[2] Mauser, Arne, et al. "Predicting customer behavior using naive bayes and maximum entropy–winning the data-mining-cup 2004." Proc. Informatiktage. 2005.
[3] Balaji, S., and S. K. Srivatsa. "Naïve Bayes Classification Approach for Mining Life Insurance Databases for Effective Prediction of Customer Preferences over Life Insurance Products." International Journal of Computer Applications 51.3 (2012).
[4] Nath, Shyam V., and Ravi S. Behara. "Customer churn analysis in the wireless industry: A data mining approach." Proceedings-annual meeting of the decision sciences institute. 2003.
[5] Shmueli, Galit, Nitin R. Patel, and Peter C. Bruce. Data mining for business intelligence: Concepts, techniques, and applications in Microsoft Office Excel with XLMiner. John Wiley and Sons, 2011.
[6] Metsis, Vangelis, Ion Androutsopoulos, and Georgios Paliouras. "Spam filtering with naive bayes-which naive bayes?." CEAS. 2006.
[7] Panda, Mrutyunjaya, and Manas Ranjan Patra. "Network intrusion detection using naive bayes." International journal of computer science and network security 7.12 (2007): 258-263.
[8] Pang, Bo, Lillian Lee, and Shivakumar Vaithyanathan. "Thumbs up?: sentiment classification using machine learning techniques." Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10. Association for Computational Linguistics, 2002.
[9] Li, Chaoqun, Liangxiao Jiang, and Hongwei Li. "Naive Bayes for value difference metric." Frontiers of Computer Science 8.2 (2014): 255-264.
