Time Series Analysis on Air Passenger Data
Time series analysis is crucial for understanding trends and seasonality and for making predictions from historical data. In this post, we will analyze the famous AirPassengers dataset: we decompose the series, check it for stationarity, build an ARIMA model for forecasting, and detect anomalies.
Goal
Our goal is to analyze the AirPassengers dataset to:
- Identify trends and seasonal patterns.
- Check stationarity of the dataset.
- Build an ARIMA model for forecasting.
- Detect anomalies in the data.
Required Packages
Before we start, install the necessary Python packages:
pip install pandas matplotlib statsmodels numpy
Step 1: Load and Inspect the Data
We first load the dataset and inspect its structure.
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
import numpy as np
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.arima.model import ARIMA
# Load dataset
df = pd.read_csv("AirPassengers.csv", parse_dates=["Month"], index_col="Month")
# Display first few rows
print(df.head())
df.info()  # info() prints its summary itself and returns None, so no print() is needed
Step 2: Data Visualization
Plot the data to understand trends and seasonality.
plt.figure(figsize=(12,6))
plt.plot(df.index, df["#Passengers"], label="Monthly Passengers")
plt.xlabel("Year")
plt.ylabel("Passengers (in thousands)")
plt.title("Air Passenger Traffic Over Time")
plt.legend()
plt.show() 
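To make the trend and the seasonal shape easier to read off than from the raw line plot, one option is a 12-month rolling mean plus an average deviation per calendar month. The sketch below is only an illustration: it uses a made-up synthetic series instead of AirPassengers.csv, and the names `series`, `rolling_mean`, and `monthly_profile` are assumptions, not part of the original code.

```python
import numpy as np
import pandas as pd

# Synthetic monthly data (made up for illustration): linear trend plus a yearly cycle
index = pd.date_range("2000-01-01", periods=72, freq="MS")
t = np.arange(72)
series = pd.Series(100 + 2 * t + 10 * np.sin(2 * np.pi * t / 12), index=index)

# A trailing 12-month mean averages out one full seasonal cycle, leaving the trend
rolling_mean = series.rolling(window=12).mean()

# Average deviation from the rolling mean per calendar month approximates the seasonal shape
monthly_profile = (series - rolling_mean).groupby(series.index.month).mean()

print(rolling_mean.dropna().head())
print(monthly_profile)
```

With the real data, the same two lines could be applied to df["#Passengers"], and rolling_mean could be drawn on top of the original plot with plt.plot.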
Step 3: Handle Missing Values
print(df.isnull().sum()) # Check for missing values
df = df.ffill()  # Forward fill if necessary (fillna(method="ffill") is deprecated in recent pandas)
Step 4: Decomposing the Time Series
We decompose the time series into trend, seasonal, and residual components.
# Decomposing the time series
decomposed = seasonal_decompose(df["#Passengers"], model="multiplicative", period=12)
# Plot the decomposition
decomposed.plot()
plt.show() 
Step 5: Test for Stationarity (ADF Test)
We use the Augmented Dickey-Fuller test to check stationarity.
result = adfuller(df["#Passengers"])
print(f"ADF Statistic: {result[0]}")
print(f"P-value: {result[1]}")
If the p-value is greater than 0.05, we cannot reject the null hypothesis of a unit root, so we treat the series as non-stationary.
To make the series stationary, we apply differencing:
df["Passengers_diff"] = df["#Passengers"].diff()  # First value is NaN; it is dropped before the test below
# Re-run ADF test
result = adfuller(df["Passengers_diff"].dropna())
print(f"New ADF Statistic: {result[0]}")
print(f"New P-value: {result[1]}")
Step 6: Train-Test Split
train_size = int(len(df) * 0.8)
train, test = df.iloc[:train_size], df.iloc[train_size:]
Step 7: Train an ARIMA Model
# Train ARIMA Model
model = ARIMA(train["#Passengers"], order=(1,1,1))
model_fit = model.fit()
# Forecast
forecast = model_fit.forecast(steps=len(test))
# Plot results
plt.plot(test.index, test["#Passengers"], label="Actual")
plt.plot(test.index, forecast, label="Forecast", linestyle="dashed")
plt.legend()
plt.show() 
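The plot gives a visual impression of the fit, but a few error metrics make it easier to compare models. Below is a minimal sketch of MAE, RMSE, and MAPE; the function name `forecast_errors` is hypothetical and the numbers are toy values, not results from the AirPassengers model.

```python
import numpy as np


def forecast_errors(actual, predicted):
    """Return MAE, RMSE and MAPE (in percent) for two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = actual - predicted
    return {
        "MAE": float(np.mean(np.abs(errors))),
        "RMSE": float(np.sqrt(np.mean(errors ** 2))),
        # MAPE assumes that actual contains no zeros
        "MAPE": float(np.mean(np.abs(errors / actual)) * 100),
    }


# Toy numbers for illustration only
metrics = forecast_errors([100, 200, 400], [110, 190, 380])
print(metrics)
```

With the variables from this step, the call would be forecast_errors(test["#Passengers"], forecast).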
Step 8: Detecting Anomalies using Z-Score
df["Z-Score"] = (df["#Passengers"] - df["#Passengers"].mean()) / df["#Passengers"].std()
df["Anomaly"] = df["Z-Score"].abs() > 3 # Mark anomalies if |Z-score| > 3
# Plot anomalies
plt.plot(df.index, df["#Passengers"], label="#Passengers")
plt.scatter(df.index[df["Anomaly"]], df["#Passengers"][df["Anomaly"]], color="red", label="Anomalies")
plt.legend()
plt.show() 
Conclusion
In this analysis, we:
- Visualized the data to identify trends and seasonality.
- Checked for stationarity and made adjustments.
- Built an ARIMA model for forecasting future values.
- Detected anomalies using Z-Score.
This approach can be applied to any time series dataset to gain insights and make informed decisions.
Full Code
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
import numpy as np
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.arima.model import ARIMA
# Load dataset
df = pd.read_csv("AirPassengers.csv", parse_dates=["Month"], index_col="Month")
# Handle missing values
df = df.ffill()
# Visualization
plt.figure(figsize=(12,6))
plt.plot(df.index, df["#Passengers"], label="Monthly Passengers")
plt.xlabel("Year")
plt.ylabel("Passengers (in thousands)")
plt.title("Air Passenger Traffic Over Time")
plt.legend()
plt.show()
# Decomposing the time series
decomposed = seasonal_decompose(df["#Passengers"], model="multiplicative", period=12)
decomposed.plot()
plt.show()
# ADF Test
result = adfuller(df["#Passengers"])
print(f"ADF Statistic: {result[0]}")
print(f"P-value: {result[1]}")
# Differencing
df["Passengers_diff"] = df["#Passengers"].diff()
result = adfuller(df["Passengers_diff"].dropna())
print(f"New ADF Statistic: {result[0]}")
print(f"New P-value: {result[1]}")
# Train-Test Split
train_size = int(len(df) * 0.8)
train, test = df.iloc[:train_size], df.iloc[train_size:]
# Train ARIMA Model
model = ARIMA(train["#Passengers"], order=(1,1,1))
model_fit = model.fit()
forecast = model_fit.forecast(steps=len(test))
plt.plot(test.index, test["#Passengers"], label="Actual")
plt.plot(test.index, forecast, label="Forecast", linestyle="dashed")
plt.legend()
plt.show()
# Detect Anomalies
df["Z-Score"] = (df["#Passengers"] - df["#Passengers"].mean()) / df["#Passengers"].std()
df["Anomaly"] = df["Z-Score"].abs() > 3
plt.plot(df.index, df["#Passengers"], label="#Passengers")
plt.scatter(df.index[df["Anomaly"]], df["#Passengers"][df["Anomaly"]], color="red", label="Anomalies")
plt.legend()
plt.show()