Data Manipulation with NumPy and Pandas
Manipulating data efficiently is essential in data analysis. NumPy and Pandas provide powerful tools for handling and transforming data. This tutorial covers:
- Using NumPy for numerical operations
- Manipulating DataFrames in Pandas
- Filtering, sorting, and grouping data
1. Why Use NumPy and Pandas?
- NumPy: Optimized for numerical computations with fast operations on large arrays.
- Pandas: Built on NumPy, provides high-level data manipulation tools for structured data.
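A minimal sketch of both points, using made-up values: one arithmetic expression applies to a whole NumPy array at once, and a Pandas Series exposes its values as a NumPy array. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Vectorized arithmetic: one expression updates every element, no Python loop
prices = np.array([100.0, 250.0, 80.0])  # illustrative values
discounted = prices * 0.9

# A Series can hand its values back as a NumPy array
s = pd.Series(prices, name="price")
values = s.to_numpy()

print(discounted)
print(type(values))  # <class 'numpy.ndarray'>
```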
import numpy as np
import pandas as pd
2. NumPy Basics for Data Manipulation
NumPy's core data structure is the ndarray, an n-dimensional array whose elements share a single data type.
Creating and Manipulating NumPy Arrays
arr = np.array([10, 20, 30, 40, 50])
print(arr * 2) # Multiply each element by 2
Generating Random Data
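For repeatable results, random numbers can be drawn from a seeded generator. The sketch below uses np.random.default_rng; the seed value 42 is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)        # seeded generator; 42 is arbitrary
uniform = rng.random(5)                # 5 floats in [0, 1)
integers = rng.integers(1, 7, size=5)  # 5 integers from 1 to 6 (upper bound excluded)

# Re-creating the generator with the same seed reproduces the same numbers
repeat = np.random.default_rng(42).random(5)
print(np.array_equal(uniform, repeat))  # True
```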
rand_arr = np.random.rand(5) # Array of 5 random floats drawn uniformly from [0, 1)
Reshaping Arrays
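Reshaping changes an array's dimensions without changing its data. A short sketch with reshape and ravel, using an illustrative array:

```python
import numpy as np

flat = np.arange(1, 7)       # [1 2 3 4 5 6]
grid = flat.reshape(2, 3)    # 2 rows, 3 columns
auto = flat.reshape(3, -1)   # -1 lets NumPy infer the column count (2)
back = grid.ravel()          # flatten back to one dimension

print(grid.shape, auto.shape, back.shape)  # (2, 3) (3, 2) (6,)
```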
arr2D = np.array([[1, 2, 3], [4, 5, 6]])
print(arr2D.T) # Transpose: rows become columns, so shape (2, 3) becomes (3, 2)
Mathematical Operations
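On arrays with more than one dimension, most aggregate functions accept an axis argument. The sketch below, on an illustrative 2x3 array, shows that axis=0 aggregates down the columns and axis=1 across the rows.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

col_sums = np.sum(m, axis=0)    # one sum per column
row_means = np.mean(m, axis=1)  # one mean per row

print(col_sums)   # [5 7 9]
print(row_means)  # [2. 5.]
```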
arr = np.array([1, 2, 3, 4, 5])
print(np.mean(arr)) # Calculate mean
print(np.sum(arr)) # Calculate sum
print(np.sqrt(arr)) # Element-wise square root
3. Data Manipulation with Pandas
Pandas simplifies data manipulation with two core structures: the one-dimensional Series and the two-dimensional DataFrame.
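The relationship between the two structures can be sketched with illustrative values: a Series holds one labeled column, and to_frame() turns it into a one-column DataFrame.

```python
import pandas as pd

ages = pd.Series([25, 30, 22], index=["Amit", "Pooja", "Rahul"], name="Age")

print(ages["Pooja"])  # look up a value by its label -> 30
print(ages.mean())    # aggregate over all values

# Converting the Series to a one-column DataFrame
frame = ages.to_frame()
print(frame.shape)    # (3, 1)
```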
Creating a DataFrame
data = {"Name": ["Amit", "Pooja", "Rahul", "Neha"],
        "Age": [25, 30, 22, 35],
        "Salary": [50000, 60000, 45000, 70000]}
df = pd.DataFrame(data)
print(df)
4. Selecting and Filtering Data
Selecting Columns
print(df["Name"]) # Select single column
print(df[["Name", "Salary"]]) # Select multiple columns
Selecting Rows
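Pandas offers two row accessors: iloc selects by integer position, while loc selects by index label or boolean mask. The difference is easiest to see with a non-default index, so the sketch below uses names as the index purely for illustration.

```python
import pandas as pd

people = pd.DataFrame(
    {"Age": [25, 30, 22], "Salary": [50000, 60000, 45000]},
    index=["Amit", "Pooja", "Rahul"],  # illustrative label index
)

by_position = people.iloc[1]              # second row, whatever its label
by_label = people.loc["Pooja"]            # row whose index label is "Pooja"
label_slice = people.loc["Amit":"Pooja"]  # label slices include both endpoints
pos_slice = people.iloc[0:1]              # positional slices exclude the end

print(by_position.equals(by_label))      # True
print(len(label_slice), len(pos_slice))  # 2 1
```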
print(df.iloc[1]) # Select second row
print(df.loc[df["Age"] > 25]) # Filter rows where Age > 25
Conditional Filtering
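Several conditions can be combined with & (and), | (or), and ~ (not). Each condition needs its own parentheses because these operators bind more tightly than comparisons. A sketch on a small illustrative DataFrame:

```python
import pandas as pd

staff = pd.DataFrame({
    "Name": ["Amit", "Pooja", "Rahul", "Neha"],
    "Age": [25, 30, 22, 35],
    "Salary": [50000, 60000, 45000, 70000],
})

# Both conditions must hold
senior_high = staff[(staff["Age"] > 25) & (staff["Salary"] > 50000)]

# Match any value from a list
selected = staff[staff["Name"].isin(["Amit", "Neha"])]

# The same filter as senior_high, written as a query string
via_query = staff.query("Age > 25 and Salary > 50000")

print(senior_high["Name"].tolist())  # ['Pooja', 'Neha']
```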
high_salary = df[df["Salary"] > 50000]
print(high_salary)
5. Adding, Updating, and Deleting Data
Adding a New Column
df["Bonus"] = df["Salary"] * 0.10 # 10% bonus
Updating Values
df.loc[df["Name"] == "Rahul", "Salary"] = 50000
Deleting a Column
df.drop(columns=["Bonus"], inplace=True)
Deleting a Row
df.drop(index=2, inplace=True) # Remove the row with index label 2 (Rahul)
6. Sorting and Rearranging Data
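Sorting can use more than one key, with a separate direction for each. A sketch on illustrative data, where the Dept column is a hypothetical addition that does not appear in the tutorial's DataFrame:

```python
import pandas as pd

staff = pd.DataFrame({
    "Name": ["Amit", "Pooja", "Rahul", "Neha"],
    "Dept": ["IT", "HR", "IT", "HR"],  # hypothetical column for illustration
    "Salary": [50000, 60000, 45000, 70000],
})

# Department A-Z, then highest salary first within each department
ordered = staff.sort_values(by=["Dept", "Salary"], ascending=[True, False])

# reset_index(drop=True) renumbers the rows 0..n-1 after sorting
ordered = ordered.reset_index(drop=True)
print(ordered["Name"].tolist())  # ['Neha', 'Pooja', 'Amit', 'Rahul']
```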
Sorting by a Column
df_sorted = df.sort_values(by="Salary", ascending=False)
Reordering Columns
df = df[["Name", "Salary", "Age"]]
7. Grouping and Aggregating Data
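Grouping splits rows by the values in one or more columns and then applies an aggregate, such as a mean or a sum, to each group. This helps in summarizing large datasets.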
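As a sketch of the idea, the example below groups illustrative rows by a hypothetical Dept column and computes several aggregates per group in one call.

```python
import pandas as pd

staff = pd.DataFrame({
    "Dept": ["IT", "HR", "IT", "HR"],  # hypothetical grouping column
    "Salary": [50000, 60000, 45000, 70000],
})

# Named aggregation: output_column=(input_column, function)
summary = staff.groupby("Dept").agg(
    headcount=("Salary", "size"),
    avg_salary=("Salary", "mean"),
    max_salary=("Salary", "max"),
)
print(summary)
```

Each keyword becomes a column in the result, which keeps the output names explicit.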
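df_grouped = df.groupby("Age").mean(numeric_only=True) # Mean of the numeric columns per age; numeric_only=True skips the text column "Name"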
df.groupby("Age")["Salary"].sum() # Total salary per age
8. Merging and Joining DataFrames
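By default, pd.merge keeps only the keys present in both DataFrames (an inner join); the how parameter selects other join types. A sketch with illustrative frames whose IDs only partly overlap:

```python
import pandas as pd

names = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Amit", "Pooja", "Rahul"]})
pay = pd.DataFrame({"ID": [2, 3, 4], "Salary": [60000, 45000, 70000]})

inner = pd.merge(names, pay, on="ID")               # IDs 2 and 3 only
left = pd.merge(names, pay, on="ID", how="left")    # every row of names; Salary is NaN for ID 1
outer = pd.merge(names, pay, on="ID", how="outer")  # IDs 1 through 4

print(len(inner), len(left), len(outer))  # 2 3 4
```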
Merging DataFrames
df1 = pd.DataFrame({"ID": [1, 2], "Name": ["Amit", "Pooja"]})
df2 = pd.DataFrame({"ID": [1, 2], "Salary": [50000, 60000]})
df_merged = pd.merge(df1, df2, on="ID")
Concatenating DataFrames
df_concat = pd.concat([df1, df2], axis=0) # Stack rows; columns missing from one frame are filled with NaN
9. Handling Missing Data
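Before filling or dropping, it helps to count the missing values per column and to pick a strategy that suits each column. A sketch on illustrative data; filling with the column mean is one option among several, chosen here only as an example.

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "Name": ["Amit", "Pooja", None],
    "Salary": [50000.0, np.nan, 45000.0],
})

print(raw.isna().sum())  # missing values per column

# Fill numeric gaps with the column mean, then drop rows that still lack a name
cleaned = raw.assign(Salary=raw["Salary"].fillna(raw["Salary"].mean()))
cleaned = cleaned.dropna(subset=["Name"])

print(cleaned)
```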
df.fillna(0, inplace=True) # Fill missing values with 0
df.dropna(inplace=True) # Alternatively, remove rows with missing values (has no effect once fillna has filled them)
10. Saving Processed Data
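A quick way to confirm that a save worked is to read the file back. The sketch below writes an illustrative DataFrame into a temporary directory so that it leaves no files behind; the file name is arbitrary.

```python
import os
import tempfile

import pandas as pd

frame = pd.DataFrame({"Name": ["Amit", "Pooja"], "Salary": [50000, 60000]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "processed_data.csv")
    frame.to_csv(path, index=False)  # index=False omits the row labels
    restored = pd.read_csv(path)

# Compare the round-tripped data with the original
print(restored.to_dict("list") == frame.to_dict("list"))
```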
df.to_csv("processed_data.csv", index=False) # Save as CSV
df.to_excel("processed_data.xlsx", index=False) # Save as Excel (requires an Excel writer package such as openpyxl)
Conclusion
In this tutorial, we explored NumPy and Pandas for data manipulation. You learned how to filter, sort, merge, and clean data effectively. In the next tutorial, we will focus on Exploratory Data Analysis (EDA) Techniques.
