Naive Bayes

#naivebayes #machinelearning #bayes

Antony Christopher Mar 01 2021 · 3 min read

Naive Bayes is a probabilistic machine learning algorithm. It is widely used to solve classification problems. It also works well on natural language processing (NLP) problems.

About Bayes Theorem

The naive Bayes algorithm is based on Bayes' theorem. Let's see what the theorem says.

In simple terms, Bayes' theorem is a way of finding a probability when we know certain other probabilities:

P(A|B) = P(A)*P(B|A)/P(B)

In other words, it uses the probability that B has already occurred to determine the probability of A. Let's walk through an example to get a better understanding.


A group of people is planning a picnic today, but the morning is cloudy:

  • Oh no! 50% of all rainy days start off cloudy!
  • But cloudy mornings are common (about 40% of days start cloudy)
  • And this is usually a dry month (only 3 of 30 days tend to be rainy, or 10%)
  • What is the chance of rain during the day?

Here we relate the two variables, rain and cloud. Using them, the formula becomes:

P(Rain|Cloud) = P(Rain)*P(Cloud|Rain)/P(Cloud)

From the given data we can read off:

P(Cloud|Rain), the probability of a cloudy morning given rain, is 50%

P(Cloud), the probability of a cloudy morning, is 40%

P(Rain), the probability of rain, is 10%

P(Rain|Cloud) = 0.1*0.5/0.4 = 0.125

The chance of rain is 12.5%. Let's plan for a picnic!
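The picnic calculation above can be checked with a few lines of Python (a direct transcription of the formula, nothing more):

```python
# Bayes' theorem: P(Rain|Cloud) = P(Rain) * P(Cloud|Rain) / P(Cloud)
p_rain = 0.10              # only 3 of 30 days tend to be rainy
p_cloud = 0.40             # about 40% of days start cloudy
p_cloud_given_rain = 0.50  # 50% of rainy days start off cloudy

p_rain_given_cloud = p_rain * p_cloud_given_rain / p_cloud
print(round(p_rain_given_cloud, 3))  # 0.125, i.e. a 12.5% chance of rain
```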

Naive Bayes classifiers are a collection of classification algorithms based on Bayes' theorem. It is not a single algorithm but a family of algorithms that all share a common principle: every pair of features being classified is independent of each other. For example, a fruit may be considered an apple if it is red, round, and about 10 cm in diameter. A naive Bayes classifier considers each of these features to contribute independently to the probability that the fruit is an apple, regardless of any possible correlations between the colour, roundness, and diameter features.

Naive Bayes classifiers have been used heavily for text classification and text analysis in machine learning problems.

Assumptions made by Naive Bayes

The fundamental Naive Bayes assumption is that each feature makes an independent and equal contribution to the outcome.

Let us take an example to get some better intuition. Consider a car theft problem with the attributes Color, Type, and Origin, and the target Stolen, which can be either Yes or No.

The dataset is represented below.

We need to classify whether the car is stolen, given its features. The columns represent the features and the rows represent individual entries. From the first row of the dataset we can observe that the car was stolen when the Color was Red, the Type was Sports, and the Origin was Domestic. We want to classify whether a Red Domestic SUV gets stolen or not. Note that there is no example of a Red Domestic SUV in our dataset.

The posterior probability P(y|X) can be calculated by first creating a Frequency Table for each attribute against the target, then converting the frequency tables into Likelihood Tables, and finally using the naive Bayes equation to calculate the posterior probability for each class. The class with the highest posterior probability is the outcome of the prediction.

  • Frequency and Likelihood tables of ‘Color’
  • Frequency and Likelihood tables of ‘Type’
  • Frequency and Likelihood tables of ‘Origin’

The features to consider are:

today = (Red, SUV, Domestic)

To determine the probability that the car is stolen (Yes):

P(Yes|today) = P(Red|Yes) * P(SUV|Yes) * P(Domestic|Yes) * P(Yes)/P(today)

To determine the probability that the car is not stolen (No):

P(No|today) = P(Red|No) * P(SUV|No) * P(Domestic|No) * P(No)/P(today)

Since P(today) is common to both probabilities, and the priors P(Yes) and P(No) are equal in this dataset, we can drop them and compare the likelihood products:

P(Yes|today) ∝ 3/5 * 1/5 * 2/5 = 0.048

P(No|today) ∝ 2/5 * 3/5 * 3/5 = 0.144

Since 0.144 > 0.048, given the features Red, SUV, and Domestic, our example is classified as ‘No’: the car is not stolen.
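The whole procedure can be reproduced in plain Python. The article's dataset table is not reproduced here, so the rows below are a hypothetical reconstruction chosen to be consistent with the likelihoods used above (e.g. P(Red|Yes) = 3/5); treat it as an illustration, not the original data:

```python
# Hypothetical dataset: (Color, Type, Origin, Stolen), chosen so the
# frequency tables match the likelihoods quoted in the text.
data = [
    ("Red",    "Sports", "Domestic", "Yes"),
    ("Red",    "Sports", "Domestic", "No"),
    ("Red",    "Sports", "Domestic", "Yes"),
    ("Yellow", "Sports", "Domestic", "No"),
    ("Yellow", "Sports", "Imported", "Yes"),
    ("Yellow", "SUV",    "Imported", "No"),
    ("Yellow", "SUV",    "Imported", "Yes"),
    ("Yellow", "SUV",    "Domestic", "No"),
    ("Red",    "SUV",    "Imported", "No"),
    ("Red",    "Sports", "Imported", "Yes"),
]

def likelihood(feature_index, value, label):
    """P(feature = value | class = label), read off the frequency table."""
    rows = [r for r in data if r[3] == label]
    return sum(r[feature_index] == value for r in rows) / len(rows)

def score(features, label):
    """Unnormalised posterior: the product of the per-feature likelihoods
    (P(today) is common to both classes, so it is ignored)."""
    s = 1.0
    for i, value in enumerate(features):
        s *= likelihood(i, value, label)
    return s

today = ("Red", "SUV", "Domestic")
print(round(score(today, "Yes"), 3))  # 3/5 * 1/5 * 2/5 = 0.048
print(round(score(today, "No"), 3))   # 2/5 * 3/5 * 3/5 = 0.144
```

The class with the larger score wins, so the Red Domestic SUV is classified as not stolen.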

The method discussed above is applicable to discrete data. In the case of continuous data, we need to make an assumption about the distribution of the values of each feature.

Gaussian Naive Bayes classifier

In Gaussian Naive Bayes, the continuous values associated with each feature are assumed to follow a Gaussian distribution. A Gaussian distribution is also called a normal distribution. When plotted, it gives a bell-shaped curve that is symmetric about the mean of the feature values.
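Concretely, Gaussian Naive Bayes replaces the frequency-table likelihoods with the normal density, using a mean and standard deviation estimated per feature and class. A minimal sketch (the feature values are hypothetical):

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Likelihood of x under a Normal(mu, sigma) distribution."""
    coeff = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# Hypothetical continuous feature values observed for one class
values = [170.0, 165.0, 180.0, 175.0]
mu = sum(values) / len(values)  # sample mean: 172.5
sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))

# The density peaks at the mean and falls off symmetrically (bell curve)
print(gaussian_pdf(mu, mu, sigma))
print(gaussian_pdf(mu + sigma, mu, sigma))
```

These per-feature densities are then multiplied together exactly as in the discrete case.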

Other popular Naive Bayes classifiers are:

  • Multinomial Naive Bayes: Feature vectors represent the frequencies with which certain events have been generated by a multinomial distribution. This is the event model typically used for document classification.
  • Bernoulli Naive Bayes: In the multivariate Bernoulli event model, features are independent booleans (binary variables) describing inputs. Like the multinomial model, this model is popular for document classification tasks, where binary term occurrence features (i.e. whether a word occurs in a document or not) are used rather than term frequencies (i.e. how often a word occurs in the document).
Hope this article gives you a better idea of Naive Bayes.
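As a sketch of the Bernoulli event model described above, here is a toy spam filter over a hypothetical three-word vocabulary, using presence/absence only (all documents and labels are made up for illustration):

```python
# Binary term-occurrence vectors over a hypothetical vocabulary
# ["free", "win", "meeting"]; label 1 = spam, 0 = ham (toy data).
docs = [([1, 1, 0], 1), ([1, 0, 0], 1), ([0, 0, 1], 0), ([0, 1, 1], 0)]

def bernoulli_params(docs, label):
    """P(word present | class), with Laplace smoothing to avoid zeros."""
    rows = [x for x, y in docs if y == label]
    n = len(rows)
    return [(sum(col) + 1) / (n + 2) for col in zip(*rows)]

def bernoulli_score(x, theta):
    """Likelihood of binary vector x: absent words contribute (1 - p)."""
    s = 1.0
    for xi, p in zip(x, theta):
        s *= p if xi else (1 - p)
    return s

theta_spam = bernoulli_params(docs, 1)
theta_ham = bernoulli_params(docs, 0)

x = [1, 0, 0]  # a document containing only the word "free"
print(bernoulli_score(x, theta_spam) > bernoulli_score(x, theta_ham))  # True
```

Note that, unlike the multinomial model, absent words also contribute to the score here, via the (1 - p) factors.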
