Naïve Bayes Classifier
Naïve Bayes is a family of simple yet powerful probabilistic classifiers based on applying Bayes’ Theorem with a strong (naïve) assumption of independence between features. It’s especially effective for text classification tasks like spam detection or sentiment analysis.
What You'll Learn
- What Naïve Bayes is and how it works
- Types of Naïve Bayes classifiers
- Real-world applications
- Example using Python (with sklearn)
What is Naïve Bayes?
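Naïve Bayes classifiers use the principles of Bayes' Theorem:
$$P(C \mid X) = \frac{P(X \mid C)\, P(C)}{P(X)}$$
Here C is the class, X is the set of input features, P(C) is the prior probability of the class, P(X | C) is the likelihood of the features given the class, and P(X) is the overall probability of the features.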
In simple terms, it calculates the probability of a class given the input features. The “naïve” assumption is that all features are independent of each other, which simplifies computation.
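To make the calculation concrete, here is a minimal sketch in plain Python. Under the independence assumption, the likelihood of all features given a class is the product of the per-feature likelihoods. The function name naive_bayes_posterior and all probability values are hypothetical, chosen only for illustration.

```python
from math import prod


def naive_bayes_posterior(priors, likelihoods, features):
    """Return P(class | features) for each class, assuming independent features.

    priors:      {class: P(class)}
    likelihoods: {class: {feature: P(feature | class)}}
    features:    iterable of observed feature names
    """
    # Unnormalized score: P(class) times the product of P(feature | class)
    scores = {
        c: priors[c] * prod(likelihoods[c][f] for f in features)
        for c in priors
    }
    # Dividing by the sum of the scores plays the role of P(features)
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


# Made-up probabilities, for illustration only
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.5, "now": 0.4},
    "ham": {"free": 0.1, "now": 0.2},
}

posterior = naive_bayes_posterior(priors, likelihoods, ["free", "now"])
print(posterior)
```

With these numbers the unnormalized scores are 0.4 × 0.5 × 0.4 = 0.08 for spam and 0.6 × 0.1 × 0.2 = 0.012 for ham, so the message is assigned to spam.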
Types of Naïve Bayes
- Gaussian Naïve Bayes – For continuous features, assumed to follow a normal distribution within each class
- Multinomial Naïve Bayes – For discrete counts like word occurrences
- Bernoulli Naïve Bayes – For binary features (yes/no, 0/1)
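The three variants map to three classes in sklearn.naive_bayes. The sketch below fits each one on a tiny, made-up dataset that matches its expected input type; the arrays and query points are assumptions for illustration, not real data.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Shared labels for the four toy samples
y = np.array([0, 0, 1, 1])

# Continuous measurements -> GaussianNB
X_cont = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 4.0], [4.1, 4.2]])
gnb = GaussianNB().fit(X_cont, y)

# Non-negative counts (e.g. word occurrences) -> MultinomialNB
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])
mnb = MultinomialNB().fit(X_counts, y)

# Binary present/absent indicators -> BernoulliNB
X_bin = (X_counts > 0).astype(int)
bnb = BernoulliNB().fit(X_bin, y)

gauss_pred = gnb.predict([[1.1, 2.0]])
multi_pred = mnb.predict([[2, 0, 1]])
bern_pred = bnb.predict([[0, 1, 1]])

print("GaussianNB:", gauss_pred)
print("MultinomialNB:", multi_pred)
print("BernoulliNB:", bern_pred)
```

Which variant fits best depends on how the features are represented: the same text data can feed MultinomialNB as counts or BernoulliNB as presence flags.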
Example: Naïve Bayes for Text Classification (Spam Detection)
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score
# Sample data
texts = ["Win money now", "Hello friend", "Buy cheap meds", "See you tomorrow", "Free entry now"]
labels = [1, 0, 1, 0, 1] # 1 = spam, 0 = not spam
# Convert text to features
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
y = labels
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and train model
model = MultinomialNB()
model.fit(X_train, y_train)
# Predict
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
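Output:
Accuracy: 0.0
With only five messages, test_size=0.2 holds out a single message, so accuracy here can only be 0.0 or 1.0. The score says little about the model; a meaningful evaluation needs a much larger dataset.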
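The example above fits the vectorizer on all five texts before splitting, so the vocabulary also includes words from the held-out message. One common alternative, sketched here, is to wrap the vectorizer and the classifier in a pipeline so that both are fit only on whatever data is passed to fit. The sketch reuses the same toy texts and labels; the two messages in new_messages are hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Same toy data as above: 1 = spam, 0 = not spam
texts = ["Win money now", "Hello friend", "Buy cheap meds", "See you tomorrow", "Free entry now"]
labels = [1, 0, 1, 0, 1]

# The pipeline learns the vocabulary and the class statistics in one fit call
spam_clf = make_pipeline(CountVectorizer(), MultinomialNB())
spam_clf.fit(texts, labels)

# Hypothetical unseen messages, passed in as raw strings
new_messages = ["Free money now", "See you tomorrow friend"]
predictions = spam_clf.predict(new_messages)
probabilities = spam_clf.predict_proba(new_messages)

for message, label, proba in zip(new_messages, predictions, probabilities):
    print(f"{message!r} -> class {label}, P(spam) = {proba[1]:.3f}")
```

Because the pipeline accepts raw text, the same object can be passed to cross-validation utilities without refitting the vectorizer by hand.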
When to Use Naïve Bayes
- Spam filtering
- Sentiment analysis
- Document categorization
- Medical diagnosis based on symptoms
- Recommendation systems (e.g., movie genres)
Advantages
- Simple and fast
- Works well with high-dimensional data
- Needs relatively little training data to estimate its parameters
- Handles both binary and multiclass classification
Limitations
- The assumption of feature independence is rarely true in real-world data
- Poor performance if features are highly correlated
- Cannot capture complex relationships between features
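The effect of correlated features can be seen in a small experiment. This sketch, on synthetic data generated with an arbitrary seed, trains GaussianNB once on a single feature and once on three identical copies of it. The copies add no information, but the model treats them as independent evidence, so its predicted probability moves further from 0.5. The class means, sample sizes, and query value are assumptions for illustration.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Synthetic one-dimensional data: two overlapping classes
rng = np.random.default_rng(0)
x0 = rng.normal(loc=0.0, scale=1.0, size=200)  # class 0
x1 = rng.normal(loc=2.0, scale=1.0, size=200)  # class 1
X_single = np.concatenate([x0, x1]).reshape(-1, 1)
y = np.array([0] * 200 + [1] * 200)

# Perfectly correlated features: the same column repeated three times
X_dup = np.hstack([X_single, X_single, X_single])

single = GaussianNB().fit(X_single, y)
dup = GaussianNB().fit(X_dup, y)

# A point between the class centers, slightly closer to class 1
query = 1.5
p_single = single.predict_proba([[query]])[0, 1]
p_dup = dup.predict_proba([[query, query, query]])[0, 1]

print(f"P(class 1) with one feature:      {p_single:.3f}")
print(f"P(class 1) with three duplicates: {p_dup:.3f}")
```

The predicted class is the same in both cases, which is one reason Naïve Bayes can still classify well with correlated features, but the probability from the duplicated model is overconfident.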
Summary
Naïve Bayes is a foundational machine learning algorithm that combines simplicity with effectiveness. While its independence assumption may not hold in all scenarios, it often performs surprisingly well, especially in text classification.