Naive Bayes
Naive Bayes (also known as the Bayes Classifier) is a probabilistic classifier that has been widely used for both clustering and classification. The probabilistic model of Naive Bayes is based on Bayes' theorem and is termed naive due to its strong independence assumptions. Additionally, Naive Bayes is called a generative model because it models the distribution of the data points themselves. By contrast, a discriminative model such as logistic regression or a support vector machine attempts to separate the data set via a line (or, more generally, a hyperplane), whereas a generative model such as Naive Bayes attempts to model how the data points in each class are distributed, including those that are distant from that line.
Derivation
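Suppose that, given $Y$, we have $X_1, \ldots, X_n$ as conditionally independent random variables, with $X = (X_1, \ldots, X_n)$ a vector of length $n$. By the definition of conditional independence we have:
$$P(X_1, \ldots, X_n \mid Y) = \prod_{i=1}^{n} P(X_i \mid Y).$$
Now, assuming $Y$ is a discrete variable, by Bayes' rule and the above we have:
$$P(Y = y_k \mid X_1, \ldots, X_n) = \frac{P(Y = y_k) \prod_{i=1}^{n} P(X_i \mid Y = y_k)}{\sum_{j} P(Y = y_j) \prod_{i=1}^{n} P(X_i \mid Y = y_j)}.$$
Therefore, to attain the value of $Y$ with the highest probability (denoted by $\hat{y}$), it suffices to solve the equation:
$$\hat{y} = \arg\max_{y_k} \frac{P(Y = y_k) \prod_{i=1}^{n} P(X_i \mid Y = y_k)}{\sum_{j} P(Y = y_j) \prod_{i=1}^{n} P(X_i \mid Y = y_j)}.$$
Notice that as the denominator does not depend on $y_k$, it follows that the equation above reduces to:
$$\hat{y} = \arg\max_{y_k} P(Y = y_k) \prod_{i=1}^{n} P(X_i \mid Y = y_k).$$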
This is called the Naive Bayes classification rule.
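As a minimal sketch of this rule in Python, the function below picks the class that maximizes the prior times the product of per-feature conditional probabilities. The function name `naive_bayes_classify`, the dictionary layout, and all of the probabilities are hypothetical choices made for illustration; they do not come from any particular library or data set.

```python
from math import prod


def naive_bayes_classify(priors, likelihoods, x):
    """Return (best_class, scores) under the Naive Bayes classification rule.

    priors:      dict mapping class label -> P(Y = label)
    likelihoods: dict mapping class label -> list with one dict per feature,
                 each mapping a feature value -> P(X_i = value | Y = label)
    x:           sequence of observed feature values, one per feature
    """
    scores = {
        y: priors[y] * prod(likelihoods[y][i][v] for i, v in enumerate(x))
        for y in priors
    }
    # The denominator of Bayes' rule is the same for every class, so it is skipped.
    return max(scores, key=scores.get), scores


# Hypothetical two-class, two-feature model (numbers invented for illustration).
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": [{"yes": 0.8, "no": 0.2}, {"yes": 0.3, "no": 0.7}],
    "ham": [{"yes": 0.1, "no": 0.9}, {"yes": 0.5, "no": 0.5}],
}

label, scores = naive_bayes_classify(priors, likelihoods, ["yes", "no"])
print(label, scores)
```

Here the unnormalized score for "spam" is 0.4 × 0.8 × 0.7 and for "ham" is 0.6 × 0.1 × 0.5, so the sketch selects "spam". With many features, implementations commonly sum logarithms instead of multiplying probabilities to avoid numerical underflow; that variant is not shown here.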
Controversies
The adjective naive comes from the unrealistic assumption that the features in a data set are conditionally independent given the class. In practice, this assumption is often violated. Nevertheless, naive Bayes classifiers still tend to perform very well under this unrealistic assumption for small sample sizes.^{[1]}
Additionally, consider a binary response $Y \in \{0, 1\}$ and suppose that for some level $x$ of a predictor factor $X_i$, every training observation at that level is assigned to $Y = 0$. In this case, the estimated conditional probability of $X_i = x$ given $Y = 1$ will be zero and therefore (as the formula above involves a product) will zero out the posterior probability of $Y = 1$, regardless of the information contained in the other predictors. In other words, given data with a predictor level that is assigned only to 0, the Naive Bayes model will always predict new data at that level as 0. Depending on the application, this may or may not be desirable.
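The zero-count effect described above can be reproduced with a short Python sketch. The training pairs, the level names "a", "b", "c", and the second-feature likelihoods are all hypothetical values invented for illustration; the probabilities are plain relative frequencies.

```python
from collections import Counter

# Hypothetical training data: (level of a categorical predictor, binary response).
# Every observation at level "c" has response 0.
data = [("a", 0), ("a", 1), ("b", 1), ("b", 0), ("c", 0), ("c", 0), ("a", 1)]

class_counts = Counter(y for _, y in data)
pair_counts = Counter(data)


def prior(y):
    """Relative-frequency estimate of P(Y = y)."""
    return class_counts[y] / len(data)


def likelihood(level, y):
    """Relative-frequency estimate of P(X = level | Y = y)."""
    return pair_counts[(level, y)] / class_counts[y]


# Hypothetical likelihoods of a second feature that strongly favours class 1.
second_feature = {0: 0.01, 1: 0.99}

# Unnormalized posterior for a new observation at level "c".
scores = {y: prior(y) * likelihood("c", y) * second_feature[y] for y in (0, 1)}
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

Because level "c" never occurs with response 1 in the training data, the estimate of P(X = "c" | Y = 1) is zero, the whole product for class 1 is zero, and the prediction is 0 even though the second feature points strongly the other way.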
History
1763: Reverend Thomas Bayes (1702–61) studied how to compute a distribution for the probability parameter of a binomial distribution; Bayes' theorem was named after him.
1960: Naive Bayes was introduced under a different name into the text retrieval community in the early 1960s.
Business problems that could be solved with Naive Bayes techniques
Naive Bayes can be used for purposes such as predicting customer behavior,^{[2]} predicting customer preferences,^{[3]} predicting customer churn,^{[4]} predicting fraudulent financial reporting,^{[5]} spam detection,^{[6]} network security,^{[7]} and sentiment analysis.^{[8]} Naive Bayes may also be used as a distance measurement between categorical variables.^{[9]}
Example
Consider the following dataset:
Fever  Headache  Sore Throat  Flu 

Yes  No  No  No 
No  No  Yes  Yes 
Yes  Yes  No  Yes 
Yes  No  Yes  Yes 
No  Yes  Yes  No 
No  No  No  No 
Yes  No  No  Yes 
No  Yes  Yes  No 
No  Yes  Yes  Yes 
No  No  Yes  Yes 
Suppose we want to classify a new patient with the following observation:
Fever  Headache  Sore Throat 

Yes  No  Yes 
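Now, we have:
$$P(\text{Flu} = \text{Yes}) = \tfrac{6}{10} = 0.6, \qquad P(\text{Flu} = \text{No}) = \tfrac{4}{10} = 0.4,$$
$$P(\text{Fever} = \text{Yes} \mid \text{Flu} = \text{Yes}) = \tfrac{3}{6}, \qquad P(\text{Fever} = \text{Yes} \mid \text{Flu} = \text{No}) = \tfrac{1}{4},$$
$$P(\text{Headache} = \text{No} \mid \text{Flu} = \text{Yes}) = \tfrac{4}{6}, \qquad P(\text{Headache} = \text{No} \mid \text{Flu} = \text{No}) = \tfrac{2}{4},$$
$$P(\text{Sore Throat} = \text{Yes} \mid \text{Flu} = \text{Yes}) = \tfrac{4}{6}, \qquad P(\text{Sore Throat} = \text{Yes} \mid \text{Flu} = \text{No}) = \tfrac{2}{4}.$$
From this, we have
$$P(\text{Flu} = \text{Yes}) \, P(\text{Fever} = \text{Yes} \mid \text{Flu} = \text{Yes}) \, P(\text{Headache} = \text{No} \mid \text{Flu} = \text{Yes}) \, P(\text{Sore Throat} = \text{Yes} \mid \text{Flu} = \text{Yes}) = 0.6 \times \tfrac{3}{6} \times \tfrac{4}{6} \times \tfrac{4}{6} \approx 0.13$$
and
$$P(\text{Flu} = \text{No}) \, P(\text{Fever} = \text{Yes} \mid \text{Flu} = \text{No}) \, P(\text{Headache} = \text{No} \mid \text{Flu} = \text{No}) \, P(\text{Sore Throat} = \text{Yes} \mid \text{Flu} = \text{No}) = 0.4 \times \tfrac{1}{4} \times \tfrac{2}{4} \times \tfrac{2}{4} = 0.025.$$
Therefore, as 0.13 is larger than 0.025, this new patient would be classified as being sick with the flu.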
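The calculation for this example can be checked with a short Python sketch that estimates every probability by counting rows of the table above. The variable names and the use of exact fractions are illustrative choices, not part of the original example.

```python
from fractions import Fraction

# Rows of the data set above: (fever, headache, sore_throat, flu).
rows = [
    ("Yes", "No", "No", "No"),
    ("No", "No", "Yes", "Yes"),
    ("Yes", "Yes", "No", "Yes"),
    ("Yes", "No", "Yes", "Yes"),
    ("No", "Yes", "Yes", "No"),
    ("No", "No", "No", "No"),
    ("Yes", "No", "No", "Yes"),
    ("No", "Yes", "Yes", "No"),
    ("No", "Yes", "Yes", "Yes"),
    ("No", "No", "Yes", "Yes"),
]
new_patient = ("Yes", "No", "Yes")  # fever, headache, sore throat


def score(flu):
    """P(Flu = flu) times the product of P(feature_i = value_i | Flu = flu)."""
    in_class = [r for r in rows if r[3] == flu]
    result = Fraction(len(in_class), len(rows))  # prior P(Flu = flu)
    for i, value in enumerate(new_patient):
        matches = sum(1 for r in in_class if r[i] == value)
        result *= Fraction(matches, len(in_class))  # P(feature_i = value | flu)
    return result


scores = {flu: score(flu) for flu in ("Yes", "No")}
prediction = max(scores, key=scores.get)
print({flu: float(s) for flu, s in scores.items()}, prediction)
```

Using `Fraction` keeps the counts exact, so the two scores can be compared without rounding; the larger score belongs to Flu = Yes.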
References
[1] http://software.ucv.ro/~cmihaescu/ro/teaching/AIR/docs/Lab4NaiveBayes.pdf
[2] Mauser, Arne, et al. "Predicting customer behavior using naive bayes and maximum entropy–winning the dataminingcup 2004." Proc. Informatiktage. 2005.
[3] Balaji, S., and S. K. Srivatsa. "Naïve Bayes Classification Approach for Mining Life Insurance Databases for Effective Prediction of Customer Preferences over Life Insurance Products." International Journal of Computer Applications 51.3 (2012).
[4] Nath, Shyam V., and Ravi S. Behara. "Customer churn analysis in the wireless industry: A data mining approach." Proceedings-annual meeting of the decision sciences institute. 2003.
[5] Shmueli, Galit, Nitin R. Patel, and Peter C. Bruce. Data mining for business intelligence: Concepts, techniques, and applications in Microsoft Office Excel with XLMiner. John Wiley and Sons, 2011.
[6] Metsis, Vangelis, Ion Androutsopoulos, and Georgios Paliouras. "Spam filtering with naive bayes-which naive bayes?." CEAS. 2006.
[7] Panda, Mrutyunjaya, and Manas Ranjan Patra. "Network intrusion detection using naive bayes." International journal of computer science and network security 7.12 (2007): 258-263.
[8] Pang, Bo, Lillian Lee, and Shivakumar Vaithyanathan. "Thumbs up?: sentiment classification using machine learning techniques." Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10. Association for Computational Linguistics, 2002.
[9] Li, Chaoqun, Liangxiao Jiang, and Hongwei Li. "Naive Bayes for value difference metric." Frontiers of Computer Science 8.2 (2014): 255-264.