Naïve Bayes Classifier

Naïve Bayes is a family of simple yet effective probabilistic classifiers that apply Bayes’ Theorem with a strong (naïve) assumption: the features are independent of one another given the class. It is especially effective for text classification tasks such as spam detection and sentiment analysis.


What You'll Learn

  • What Naïve Bayes is and how it works
  • Types of Naïve Bayes classifiers
  • Real-world applications
  • Example using Python (with sklearn)

What is Naïve Bayes?

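Naïve Bayes classifiers apply Bayes' Theorem:

$$P(C \mid x_1, \dots, x_n) = \frac{P(C)\, P(x_1, \dots, x_n \mid C)}{P(x_1, \dots, x_n)}$$

In simple terms, it calculates the probability of a class C given the input features x_1, ..., x_n. The “naïve” assumption is that the features are independent of one another given the class, so the likelihood factors into a product of per-feature terms, which simplifies computation:

$$P(x_1, \dots, x_n \mid C) = \prod_{i=1}^{n} P(x_i \mid C)$$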
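
To make the calculation concrete, here is a minimal sketch in plain Python. The priors and per-word likelihoods are made-up numbers chosen for illustration, not estimates from any dataset, and the `posterior` helper is a hypothetical name.

```python
# Hypothetical, hand-picked probabilities (not learned from data).
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.30, "meeting": 0.02},
    "ham": {"free": 0.05, "meeting": 0.20},
}


def posterior(words):
    """Return P(class | words), treating words as independent given the class."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for word in words:
            # Naive assumption: multiply the per-word likelihoods.
            score *= likelihoods[cls][word]
        scores[cls] = score
    # Dividing by the evidence P(words) makes the scores sum to 1.
    evidence = sum(scores.values())
    return {cls: score / evidence for cls, score in scores.items()}


one_word = posterior(["free"])
two_words = posterior(["free", "meeting"])
print(one_word)
print(two_words)
```

With these numbers, seeing only "free" gives spam a posterior of about 0.8 (0.12 against 0.03 before normalizing). Adding "meeting" pulls the spam posterior down to roughly 0.29, because that word is much more likely under the "ham" class.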


Types of Naïve Bayes

  • Gaussian Naïve Bayes – For continuous features, assumed to follow a normal distribution within each class
  • Multinomial Naïve Bayes – For discrete counts like word occurrences
  • Bernoulli Naïve Bayes – For binary features (yes/no, 0/1)
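
As a rough illustration of how the three variants map to data types, the sketch below fits each scikit-learn class on a tiny hand-made array. The feature values (heights and weights, word counts, presence flags) are invented for the example.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Same made-up labels for all three toy datasets.
y = np.array([0, 0, 1, 1])

# Continuous measurements (say, height in cm and weight in kg): GaussianNB.
X_continuous = np.array(
    [[150.0, 50.0], [155.0, 54.0], [180.0, 80.0], [185.0, 86.0]]
)
gaussian = GaussianNB().fit(X_continuous, y)

# Non-negative counts (say, occurrences of three words): MultinomialNB.
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])
multinomial = MultinomialNB().fit(X_counts, y)

# Binary flags (word present or absent): BernoulliNB.
X_binary = np.array([[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]])
bernoulli = BernoulliNB().fit(X_binary, y)

pred_gaussian = gaussian.predict([[152.0, 51.0]])
pred_multinomial = multinomial.predict([[2, 0, 1]])
pred_bernoulli = bernoulli.predict([[1, 0, 0]])
print(pred_gaussian, pred_multinomial, pred_bernoulli)
```

All three estimators share the same `fit` / `predict` interface, so switching variants is mostly a question of which distribution matches the features.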

Example: Naïve Bayes for Text Classification (Spam Detection)

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

# Sample data
texts = ["Win money now", "Hello friend", "Buy cheap meds", "See you tomorrow", "Free entry now"]
labels = [1, 0, 1, 0, 1]  # 1 = spam, 0 = not spam

# Convert text to features
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
y = labels

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train model
model = MultinomialNB()
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))

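Output:

Accuracy: 0.0

With only five messages, the 20% test split holds a single message, so the accuracy can only be 0.0 or 1.0. The score reflects the tiny sample rather than the algorithm; a meaningful evaluation needs many more labeled messages.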
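
The example above fits the vectorizer on every message before splitting. One possible variation, sketched here rather than prescribed, chains the vectorizer and the classifier with scikit-learn's `make_pipeline` so both are fitted together and can score raw strings directly. The two new messages are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Same toy data as the example above (1 = spam, 0 = not spam).
texts = ["Win money now", "Hello friend", "Buy cheap meds", "See you tomorrow", "Free entry now"]
labels = [1, 0, 1, 0, 1]

# The pipeline vectorizes the text and then fits the classifier in one step.
spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())
spam_filter.fit(texts, labels)

# Hypothetical unseen messages, passed as raw strings.
new_messages = ["Free money now", "See you tomorrow friend"]
predictions = spam_filter.predict(new_messages)
probabilities = spam_filter.predict_proba(new_messages)

for message, label, proba in zip(new_messages, predictions, probabilities):
    print(f"{message!r}: predicted={label}, P(spam)={proba[1]:.3f}")
```

On this toy data the first message is classified as spam and the second as not spam. `predict_proba` returns one column per class in the order of `classes_`, so column 1 is the spam probability here.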

When to Use Naïve Bayes

  • Spam filtering
  • Sentiment analysis
  • Document categorization
  • Medical diagnosis based on symptoms
  • Recommendation systems (e.g., movie genres)

Advantages

  • Simple and fast
  • Works well with high-dimensional data
  • Often works reasonably well with relatively little training data
  • Handles both binary and multiclass classification

Limitations

  • The assumption of feature independence is rarely true in real-world data
  • Probability estimates become overconfident, and accuracy can suffer, when features are highly correlated
  • Cannot capture complex relationships between features
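
The effect of correlated features can be seen in a small experiment. The sketch below, built on made-up data, trains `BernoulliNB` once on a single binary feature and once on three identical copies of it. The copies carry no new information, yet the model counts the same evidence three times.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Made-up data: the feature is 1 for three of four class-1 samples
# and for one of four class-0 samples.
x = np.array([1, 1, 1, 0, 1, 0, 0, 0]).reshape(-1, 1)
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])

# Model trained on the single feature.
single = BernoulliNB().fit(x, y)
p_single = single.predict_proba([[1]])[0, 1]

# Model trained on three perfectly correlated copies of the same feature.
x_tripled = np.hstack([x, x, x])
tripled = BernoulliNB().fit(x_tripled, y)
p_tripled = tripled.predict_proba([[1, 1, 1]])[0, 1]

print(f"one copy:     P(class 1) = {p_single:.3f}")
print(f"three copies: P(class 1) = {p_tripled:.3f}")
```

With the default smoothing, the class-1 probability rises from about 0.67 to about 0.89 even though the input contains no additional evidence, which is the overconfidence the independence assumption can cause.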

Summary

Naïve Bayes is a foundational machine learning algorithm that combines simplicity with effectiveness. While its independence assumption may not hold in all scenarios, it often performs surprisingly well, especially in text classification.