Naïve Bayes Classifier
Naïve Bayes is a family of simple yet powerful probabilistic classifiers that apply Bayes’ Theorem with a strong (naïve) assumption of independence between features. It’s especially effective for text classification tasks such as spam detection and sentiment analysis.
What You'll Learn
- What Naïve Bayes is and how it works
- Types of Naïve Bayes classifiers
- Real-world applications
- Example using Python (with sklearn)
What is Naïve Bayes?
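Naïve Bayes classifiers use the principles of Bayes' Theorem:
$$P(C \mid X) = \frac{P(X \mid C)\, P(C)}{P(X)}$$
Here $C$ is a class and $X$ is the set of input features. In simple terms, the classifier calculates the probability of a class given the input features and picks the most probable class. The “naïve” assumption is that all features are independent of each other given the class, so $P(X \mid C)$ becomes a product of per-feature probabilities, which simplifies computation.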
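To make the calculation concrete, here is a minimal sketch of the posterior computation under the independence assumption. The function name `naive_bayes_posterior` and all probabilities below are hypothetical, chosen only for illustration; they do not come from a real dataset.

```python
import math

def naive_bayes_posterior(priors, likelihoods, features):
    """Return P(class | features) for each class, assuming features are
    independent given the class (hypothetical helper for illustration)."""
    scores = {}
    for cls, prior in priors.items():
        # Sum logs instead of multiplying probabilities to avoid underflow
        log_score = math.log(prior)
        for feature in features:
            log_score += math.log(likelihoods[cls][feature])
        scores[cls] = math.exp(log_score)
    # Dividing by the total plays the role of P(features) in Bayes' Theorem
    total = sum(scores.values())
    return {cls: score / total for cls, score in scores.items()}

# Made-up numbers: P(class) and P(word | class)
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.5, "now": 0.4},
    "ham": {"free": 0.1, "now": 0.2},
}

posterior = naive_bayes_posterior(priors, likelihoods, ["free", "now"])
print(posterior)  # spam ≈ 0.87, ham ≈ 0.13
```

The unnormalised scores are 0.4 × 0.5 × 0.4 = 0.08 for spam and 0.6 × 0.1 × 0.2 = 0.012 for ham, so the message is classified as spam.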
Types of Naïve Bayes
- Gaussian Naïve Bayes – For continuous features, assumed to follow a normal distribution within each class
- Multinomial Naïve Bayes – For discrete counts like word occurrences
- Bernoulli Naïve Bayes – For binary features (yes/no, 0/1)
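The three variants map onto three scikit-learn classes. The sketch below fits each one on a tiny, made-up array of the matching feature type; the data and variable names are assumptions for illustration only.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

y = np.array([0, 0, 1, 1])  # two samples per class

# Continuous measurements -> GaussianNB
X_continuous = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 4.0], [4.1, 4.2]])
gaussian_pred = GaussianNB().fit(X_continuous, y).predict([[1.1, 2.0]])

# Discrete counts (e.g. word occurrences) -> MultinomialNB
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])
multinomial_pred = MultinomialNB().fit(X_counts, y).predict([[2, 0, 1]])

# Binary presence/absence features -> BernoulliNB
X_binary = (X_counts > 0).astype(int)
bernoulli_pred = BernoulliNB().fit(X_binary, y).predict([[1, 0, 1]])

print(gaussian_pred, multinomial_pred, bernoulli_pred)
```

Each query point resembles the class 0 samples, so all three models predict class 0 here. On real data, the choice of variant should follow how the features were measured.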
Example: Naïve Bayes for Text Classification (Spam Detection)
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Sample data
texts = ["Win money now", "Hello friend", "Buy cheap meds", "See you tomorrow", "Free entry now"]
labels = [1, 0, 1, 0, 1]  # 1 = spam, 0 = not spam

# Split the raw texts first so the vocabulary is learned from training data only
texts_train, texts_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42
)

# Convert text to word-count features
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(texts_train)
X_test = vectorizer.transform(texts_test)

# Create and train the model
model = MultinomialNB()
model.fit(X_train, y_train)

# Predict on the held-out data and report accuracy
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
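Output:
Accuracy: 0.0
With only five messages, test_size=0.2 holds out a single message, so the accuracy can only be 0.0 or 1.0. The score here reflects the tiny dataset, not the quality of the algorithm.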
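Since one held-out message says little about the model, it can be more informative to train on all five messages and inspect predictions for new text. The following is a sketch using scikit-learn's `make_pipeline`; the two new messages are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["Win money now", "Hello friend", "Buy cheap meds", "See you tomorrow", "Free entry now"]
labels = [1, 0, 1, 0, 1]  # 1 = spam, 0 = not spam

# The pipeline vectorizes the text and then fits the classifier in one step
spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())
spam_filter.fit(texts, labels)

# Hypothetical unseen messages
new_messages = ["Free money now", "See you tomorrow friend"]
predictions = spam_filter.predict(new_messages)
probabilities = spam_filter.predict_proba(new_messages)  # columns: [not spam, spam]

for message, label, proba in zip(new_messages, predictions, probabilities):
    print(f"{message!r}: label={label}, P(spam)={proba[1]:.2f}")
```

The first message reuses words seen only in spam and is labelled 1, while the second reuses words seen only in non-spam messages and is labelled 0. Wrapping both steps in a pipeline also ensures new text is transformed with the same vocabulary used in training.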
When to Use Naïve Bayes
- Spam filtering
- Sentiment analysis
- Document categorization
- Medical diagnosis based on symptoms
- Recommendation systems (e.g., movie genres)
Advantages
- Simple and fast
- Works well with high-dimensional data
- Often works well with relatively little training data
- Handles both binary and multiclass classification
Limitations
- The assumption of feature independence is rarely true in real-world data
- Poor performance if features are highly correlated
- Cannot capture complex relationships between features
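The effect of correlated features can be illustrated with an extreme case: an exact copy of a feature. In the sketch below (toy data, assumed for illustration), the duplicate column adds no new information, yet the model counts its evidence twice and becomes more confident.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# One binary feature that is more common in class 1 than in class 0
x = np.array([1, 1, 1, 0, 0, 0, 0, 1]).reshape(-1, 1)
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])

# Model trained on the single feature
single = BernoulliNB().fit(x, y)
p_single = single.predict_proba([[1]])[0, 1]

# Same data with the feature duplicated (two perfectly correlated columns)
x_dup = np.hstack([x, x])
duplicated = BernoulliNB().fit(x_dup, y)
p_dup = duplicated.predict_proba([[1, 1]])[0, 1]

print(f"P(class 1) with one feature:       {p_single:.3f}")
print(f"P(class 1) with duplicated feature: {p_dup:.3f}")
```

With the default smoothing, the estimated probability of class 1 rises from about 0.667 to 0.800 even though the underlying evidence is unchanged. Real correlated features produce a milder version of the same overconfidence.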
Summary
Naïve Bayes is a foundational machine learning algorithm that combines simplicity with effectiveness. While its independence assumption may not hold in all scenarios, it often performs surprisingly well, especially in text classification.