Naive Bayes (also known as the naive Bayes classifier) is a probabilistic classifier that has been widely used for both clustering and classification. Its probabilistic model is based on Bayes' theorem, and it is termed naive because of its strong independence assumptions. Naive Bayes is also called a generative model because it models the distribution of the data points within each class. By contrast, a discriminative model such as logistic regression or a support vector machine attempts to separate the data set with a line (or, more generally, a hyperplane), whereas a generative model such as Naive Bayes attempts to model how the data points themselves are distributed, including those that lie far from that line.
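Suppose that, given $Y$, the random variables $X_1, \dots, X_n$ are conditionally independent, with $X = (X_1, \dots, X_n)$ a vector of length $n$. By the definition of conditional independence we have:

$$P(X_1 = x_1, \dots, X_n = x_n \mid Y = y) = \prod_{i=1}^{n} P(X_i = x_i \mid Y = y)$$

Now, assuming $Y$ is a discrete variable, by Bayes' rule and the above we have:

$$P(Y = y_k \mid X_1 = x_1, \dots, X_n = x_n) = \frac{P(Y = y_k) \prod_{i=1}^{n} P(X_i = x_i \mid Y = y_k)}{\sum_{j} P(Y = y_j) \prod_{i=1}^{n} P(X_i = x_i \mid Y = y_j)}$$

Therefore, to obtain the value of $Y$ with the highest probability (denoted by $\hat{y}$), it suffices to solve:

$$\hat{y} = \arg\max_{y_k} \frac{P(Y = y_k) \prod_{i=1}^{n} P(X_i = x_i \mid Y = y_k)}{\sum_{j} P(Y = y_j) \prod_{i=1}^{n} P(X_i = x_i \mid Y = y_j)}$$

Notice that, as the denominator does not depend on $y_k$, the expression above reduces to:

$$\hat{y} = \arg\max_{y_k} P(Y = y_k) \prod_{i=1}^{n} P(X_i = x_i \mid Y = y_k)$$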
This is called the Naive Bayes classification rule.
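The rule can be sketched in a few lines of Python for categorical features: estimate the class prior and each per-feature conditional probability by relative frequency, multiply them together, and pick the class with the largest product. This is only a minimal illustration, not a reference implementation. The function names (`fit`, `class_scores`, `predict`) and the spam/ham toy data are hypothetical and do not come from the text.

```python
from collections import Counter, defaultdict


def fit(rows, labels):
    """Estimate P(Y = y) and P(X_i = x | Y = y) by relative frequency."""
    class_counts = Counter(labels)
    priors = {y: count / len(labels) for y, count in class_counts.items()}

    # value_counts[(i, y)][x] = number of class-y rows whose i-th feature equals x
    value_counts = defaultdict(Counter)
    for row, y in zip(rows, labels):
        for i, x in enumerate(row):
            value_counts[(i, y)][x] += 1

    def likelihood(i, x, y):
        return value_counts[(i, y)][x] / class_counts[y]

    return priors, likelihood


def class_scores(priors, likelihood, row):
    """Return P(Y = y) * prod_i P(X_i = x_i | Y = y) for every class y."""
    result = {}
    for y, prior in priors.items():
        score = prior
        for i, x in enumerate(row):
            score *= likelihood(i, x, y)
        result[y] = score
    return result


def predict(priors, likelihood, row):
    """Apply the classification rule: pick the class with the largest score."""
    result = class_scores(priors, likelihood, row)
    return max(result, key=result.get)


# Hypothetical toy data: (contains_link, has_attachment) -> label
rows = [
    ("yes", "no"), ("yes", "yes"), ("no", "no"), ("yes", "no"),
    ("no", "yes"), ("yes", "yes"), ("no", "yes"), ("no", "no"),
]
labels = ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham"]

priors, likelihood = fit(rows, labels)
new_row = ("yes", "no")
toy_scores = class_scores(priors, likelihood, new_row)
toy_prediction = predict(priors, likelihood, new_row)
print(toy_scores, toy_prediction)
```

On this toy data the score for spam is 0.5 × 0.75 × 0.75 = 0.28125 and the score for ham is 0.5 × 0.25 × 0.25 = 0.03125, so the new row is classified as spam. The scores are left unnormalized because the denominator is the same for every class and does not affect which one is largest.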
The adjective naive comes from the unrealistic assumption that the features in a data set are conditionally independent. In practice, this assumption is often violated. However, it should be noted that naive Bayes classifiers still tend to perform very well under this unrealistic assumption for small sample sizes. 
Additionally, consider a binary response $Y$ and suppose that, for some level of a predictor factor, every observation with that level is assigned to $Y = 0$. In this case, the estimated conditional probability of that level given $Y = 1$ will be zero and therefore (as the formula above involves a product) will zero out the posterior probability of $Y = 1$, whatever the other predictors say. In other words, given data with a predictor level that is assigned only to 0, the Naive Bayes model will always predict 0 for new data with that level. Depending on the application, this may or may not be desirable.
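The zero-out effect is easy to reproduce. The sketch below uses a small made-up data set (the levels "a", "b", "c" and "u", "v" are hypothetical) in which level "c" of the first predictor only ever occurs with response 0. The helper `zero_demo_scores` is likewise an illustrative name; it computes the prior times the product of relative-frequency conditional probabilities for each class.

```python
from collections import Counter

# Hypothetical data: ((first predictor, second predictor), binary response).
# Level "c" of the first predictor appears only with response 0.
data = [
    (("a", "u"), 1),
    (("b", "u"), 1),
    (("a", "v"), 1),
    (("a", "u"), 1),
    (("c", "v"), 0),
    (("c", "v"), 0),
    (("b", "v"), 0),
    (("c", "u"), 0),
]


def zero_demo_scores(data, new_row):
    """Prior times product of per-feature relative frequencies, per class."""
    class_counts = Counter(label for _, label in data)
    result = {}
    for y, n_y in class_counts.items():
        score = n_y / len(data)
        for i, x in enumerate(new_row):
            matches = sum(1 for row, label in data if label == y and row[i] == x)
            score *= matches / n_y
        result[y] = score
    return result


# "u" is three times as common under response 1 as under response 0,
# but the single zero factor for "c" wipes out that evidence.
zero_scores = zero_demo_scores(data, ("c", "u"))
print(zero_scores)
```

Here the score for response 1 is 0.5 × 0 × 0.75 = 0, while the score for response 0 is 0.5 × 0.75 × 0.25 = 0.09375, so the model predicts 0 even though the second predictor on its own points toward 1.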
1763: The work of Reverend Thomas Bayes (1702–61), who studied how to compute a distribution for the probability parameter of a binomial distribution, is published posthumously. Bayes' theorem is named after him.
1960s: Naive Bayes is introduced, under a different name, into the text retrieval community in the early 1960s.
Business problems that could be solved with Naive Bayes techniques
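Naive Bayes can be used for purposes such as predicting customer behavior (Mauser et al., 2005), predicting customer preferences (Balaji and Srivatsa, 2012), predicting customer churn (Nath and Behara, 2003), predicting fraudulent financial reporting (Shmueli et al., 2011), spam detection (Metsis et al., 2006), network security (Panda and Patra, 2007), and sentiment analysis (Pang et al., 2002). Naive Bayes may also be used as a distance measurement between categorical variables (Li et al., 2014).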
Consider the following dataset:
Suppose we want to classify a new patient with the following observation:
Now, we have:
From this, we have
Therefore, as 0.13 is larger than 0.05, this new patient would be classified as being sick with the flu.
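The final comparison can be written out as a short calculation. The sketch below assumes that 0.13 and 0.05 are the unnormalized scores (prior times product of conditional probabilities) for the flu and no-flu classes respectively. That reading is inferred from the conclusion above, and the class labels used as dictionary keys are illustrative. Dividing each score by their sum turns them into posterior probabilities, which does not change which class wins.

```python
# Scores quoted in the text; their assignment to classes is an assumption
# read off the stated conclusion (the larger score belongs to "flu").
flu_scores = {"flu": 0.13, "no flu": 0.05}

# Normalizing by the sum recovers posterior probabilities that add up to 1.
total = sum(flu_scores.values())
flu_posteriors = {label: score / total for label, score in flu_scores.items()}

# The classification rule only needs the largest score.
flu_prediction = max(flu_scores, key=flu_scores.get)
print(flu_prediction, flu_posteriors)
```

Under that assumption, the posterior probability of flu is 0.13 / 0.18, or roughly 0.72.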
References
- Mauser, Arne, et al. "Predicting customer behavior using naive bayes and maximum entropy–winning the data-mining-cup 2004." Proc. Informatiktage. 2005.
- Balaji, S., and S. K. Srivatsa. "Naïve Bayes Classification Approach for Mining Life Insurance Databases for Effective Prediction of Customer Preferences over Life Insurance Products." International Journal of Computer Applications 51.3 (2012).
- Nath, Shyam V., and Ravi S. Behara. "Customer churn analysis in the wireless industry: A data mining approach." Proceedings-annual meeting of the decision sciences institute. 2003.
- Shmueli, Galit, Nitin R. Patel, and Peter C. Bruce. Data mining for business intelligence: Concepts, techniques, and applications in Microsoft Office Excel with XLMiner. John Wiley and Sons, 2011.
- Metsis, Vangelis, Ion Androutsopoulos, and Georgios Paliouras. "Spam filtering with naive bayes-which naive bayes?." CEAS. 2006.
- Panda, Mrutyunjaya, and Manas Ranjan Patra. "Network intrusion detection using naive bayes." International journal of computer science and network security 7.12 (2007): 258-263.
- Pang, Bo, Lillian Lee, and Shivakumar Vaithyanathan. "Thumbs up?: sentiment classification using machine learning techniques." Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10. Association for Computational Linguistics, 2002.
- Li, Chaoqun, Liangxiao Jiang, and Hongwei Li. "Naive Bayes for value difference metric." Frontiers of Computer Science 8.2 (2014): 255-264.