Logistic Regression for Classification

Logistic Regression is a supervised machine learning algorithm used for classification problems. It estimates the probability that a given input point belongs to a certain class and is widely used for binary classification tasks.


What is Logistic Regression?

Unlike linear regression, which predicts continuous values, logistic regression predicts probabilities. It passes a linear combination of the input features through the logistic (sigmoid) function, which maps any real number to a value between 0 and 1. If the resulting probability is above a threshold (usually 0.5), the input is assigned to the positive class; otherwise, it is assigned to the negative class.
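
As a minimal sketch of the thresholding step, the snippet below turns probabilities into class labels. The probability values and the classify helper are hypothetical and chosen only for illustration; they do not come from a trained model.

```python
import numpy as np

# Hypothetical positive-class probabilities (illustrative values only)
probabilities = np.array([0.12, 0.49, 0.51, 0.93])

def classify(probs, threshold=0.5):
    """Return 1 where the probability is above the threshold, else 0."""
    return (probs > threshold).astype(int)

labels_default = classify(probabilities)       # usual threshold of 0.5
labels_strict = classify(probabilities, 0.9)   # stricter threshold

print(labels_default)  # [0 0 1 1]
print(labels_strict)   # [0 0 0 1]
```

Raising the threshold makes the classifier more conservative about predicting the positive class, which typically trades recall for precision.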


Sigmoid Function

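The sigmoid function is defined as σ(z) = 1 / (1 + e^(-z)), where z is the linear combination of the input features and the model's weights. Its output always lies between 0 and 1, so it can be interpreted as a probability.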
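
To make the shape of the function concrete, here is a short NumPy sketch of the standard logistic function 1 / (1 + exp(-z)). The function name sigmoid and the sample inputs are illustrative choices.

```python
import numpy as np

def sigmoid(z):
    """Standard logistic function: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Large negative inputs approach 0, large positive inputs approach 1,
# and an input of exactly 0 maps to 0.5.
z = np.array([-5.0, 0.0, 5.0])
values = sigmoid(z)
print(values)
```

For very large negative inputs this naive form can trigger overflow warnings in np.exp; scipy.special.expit is a numerically stable alternative if SciPy is available.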


Applications of Logistic Regression

| Application | Description |
| --- | --- |
| Email Spam Detection | Classify emails as spam or not spam |
| Medical Diagnosis | Predict if a patient has a disease based on symptoms |
| Credit Risk | Determine whether a customer will default on a loan |
| Customer Churn | Predict whether a customer will cancel a service |
| Marketing | Predict whether a customer will buy a product |

Python Implementation: Binary Classification

We’ll classify whether a student passes (1) or fails (0) based on hours studied.

# Import libraries
from sklearn.linear_model import LogisticRegression
import numpy as np
import matplotlib.pyplot as plt

# Sample data
X = np.array([[1], [2], [3], [4], [5], [6]])  # Hours studied
y = np.array([0, 0, 0, 1, 1, 1])              # 0 = Fail, 1 = Pass

# Create and train the model
model = LogisticRegression()
model.fit(X, y)

# Predictions and probability
X_test = np.array([[3.5]])
predicted_class = model.predict(X_test)
probability = model.predict_proba(X_test)

print("Predicted Class:", predicted_class[0])
print("Probability of Passing:", probability[0][1])

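Output:

Predicted Class: 1
Probability of Passing: 0.5000015650516633

Because 3.5 hours lies midway between the failing and passing examples, the probability is almost exactly 0.5, so the predicted class here is decided by a very small margin.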
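
To see where this probability comes from, the sketch below refits the same toy model and recomputes the probability by hand from the learned weight and intercept. It assumes scikit-learn's default binary setup, where predict_proba applies the sigmoid to the linear score; the variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same toy data as in the example above
X = np.array([[1], [2], [3], [4], [5], [6]])  # Hours studied
y = np.array([0, 0, 0, 1, 1, 1])              # 0 = Fail, 1 = Pass
model = LogisticRegression().fit(X, y)

w = model.coef_[0, 0]     # learned weight for "hours studied"
b = model.intercept_[0]   # learned intercept

x_new = 3.5
z = w * x_new + b                        # linear score
manual_prob = 1.0 / (1.0 + np.exp(-z))   # sigmoid of the score
sklearn_prob = model.predict_proba(np.array([[x_new]]))[0, 1]

# The decision boundary is where the score is 0, i.e. x = -b / w
boundary = -b / w

print("Manual probability:", manual_prob)
print("predict_proba:     ", sklearn_prob)
print("Decision boundary (hours):", boundary)
```

The two probabilities should agree, and the boundary should land close to 3.5 hours because the training data is symmetric around that point.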

Visualizing the Sigmoid Curve

# Plot sigmoid curve for visualization
X_range = np.linspace(0, 7, 100).reshape(-1, 1)
y_prob = model.predict_proba(X_range)[:, 1]

plt.plot(X_range, y_prob, color='green')
plt.title("Logistic Regression - Probability Curve")
plt.xlabel("Hours Studied")
plt.ylabel("Probability of Passing")
plt.grid(True)
plt.show()

Evaluation Metrics for Classification

  • Accuracy: (TP + TN) / Total
  • Precision: TP / (TP + FP)
  • Recall: TP / (TP + FN)
  • F1 Score: 2 * (Precision * Recall) / (Precision + Recall)
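  • Confusion Matrix: counts of correct and incorrect predictions per class (rows are true classes, columns are predicted classes)

Here TP, TN, FP, and FN are the counts of true positives, true negatives, false positives, and false negatives.

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

# Evaluate on the training data for illustration only;
# in practice, measure these metrics on a held-out test set
y_pred = model.predict(X)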
print("Accuracy:", accuracy_score(y, y_pred))
print("Precision:", precision_score(y, y_pred))
print("Recall:", recall_score(y, y_pred))
print("F1 Score:", f1_score(y, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y, y_pred))

Output:

Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0
Confusion Matrix:
 [[3 0]
 [0 3]]
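
The formulas listed in this section can also be computed by hand, which helps when checking what the library functions report. The sketch below uses hypothetical predictions containing two mistakes (not the output of the model above) so that the metrics are not all 1.0.

```python
import numpy as np

y_true = np.array([0, 0, 0, 1, 1, 1])
y_pred = np.array([0, 1, 0, 1, 1, 0])  # hypothetical predictions with two errors

# Count each outcome type, treating class 1 as the positive class
tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # true positives
tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # true negatives
fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # false positives
fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # false negatives

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * (precision * recall) / (precision + recall)

# Same layout as scikit-learn: rows are true classes, columns are predictions
confusion = np.array([[tn, fp],
                      [fn, tp]])

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print("Confusion Matrix:\n", confusion)
```

This sketch does not guard against division by zero, which occurs for precision when no positives are predicted and for recall when no positives exist in the data.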

Tips for Beginners

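  • Logistic regression works best when the classes can be separated reasonably well by a linear decision boundary.
  • Scale input features when using regularization, which scikit-learn's LogisticRegression applies by default, because the penalty depends on the magnitude of the coefficients.
  • Visualize the data first to check whether a roughly linear decision boundary is plausible.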
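
One way to apply the scaling tip is to chain a scaler and the classifier in a single pipeline, so the same scaling is learned on the training data and reused at prediction time. The sketch below assumes a hypothetical two-feature dataset (hours studied and an unrelated large-valued feature); the numbers are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: two features on very different scales
X = np.array([[1, 20000], [2, 35000], [3, 30000],
              [4, 52000], [5, 61000], [6, 75000]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# StandardScaler rescales each feature to zero mean and unit variance
# before the logistic regression is fitted
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X, y)

probs = clf.predict_proba(X)[:, 1]  # probability of class 1 per sample
preds = clf.predict(X)

print("Probabilities:", probs)
print("Predictions:", preds)
```

Keeping the scaler inside the pipeline also avoids leaking test-set statistics into training when the pipeline is used with cross-validation.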

Tips for Professionals

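  • Use regularization (L1 or L2) to reduce overfitting. scikit-learn applies L2 by default; penalty='l1' requires a solver that supports it, such as solver='liblinear' or solver='saga'. The C parameter sets the inverse regularization strength, so smaller values regularize more.
  • For multiclass classification, solver='lbfgs' (the default) fits a multinomial (softmax) model. Setting multi_class='multinomial' is only needed in older scikit-learn releases; the parameter was deprecated in version 1.5.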
  • Logistic regression can be used as a baseline model for classification tasks due to its interpretability and speed.
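
The two regularization and multiclass tips can be sketched on the Iris dataset that ships with scikit-learn. The choices of C=0.1, max_iter=1000, and the "class 2 vs. rest" binary target are assumptions made for illustration, and parameter names have shifted across scikit-learn releases, so check the documentation for your installed version.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features, 3 classes

# Multiclass: the default lbfgs solver fits a multinomial model,
# so no multi_class argument is passed
multi = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
multi.fit(X, y)
proba = multi.predict_proba(X)  # one column of probabilities per class

# L1 regularization on a binary target (class 2 vs. the rest);
# liblinear is one of the solvers that supports the L1 penalty
y_binary = (y == 2).astype(int)
sparse = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
)
sparse.fit(X, y_binary)
l1_coefs = sparse[-1].coef_  # L1 tends to push some coefficients to exactly zero

print("Multiclass probability shape:", proba.shape)
print("L1 coefficients:", l1_coefs)
```

Comparing l1_coefs for several values of C is a quick way to see how stronger regularization removes features from the model.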

Summary

  • Despite its name, logistic regression is used for classification: it models class probabilities rather than predicting a continuous target.
  • It uses the sigmoid function to estimate class probabilities.
  • It is useful for many real-world binary classification problems and extends to multiclass tasks.