Where to Find Free Datasets for Your Next Machine Learning & Data Science Project

Finding the right dataset is a crucial first step in any machine learning or data science project. Whether you are working on deep learning, natural language processing, or data visualization, access to varied, well-documented data makes it easier to train, test, and present your work. Here is a list of some of the best platforms where you can find free datasets for your next project.
1. Kaggle (https://www.kaggle.com/datasets)
Kaggle hosts a vast collection of datasets across multiple domains, including healthcare, finance, and natural language processing. It also provides an interactive environment to work with datasets directly in notebooks.
2. Google Dataset Search (https://datasetsearch.research.google.com/)
This search engine allows you to find publicly available datasets from different sources, including government databases, research institutions, and open data repositories.
3. UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/index.php)
A well-known source for classic datasets commonly used in academic research. It includes datasets for classification, regression, and clustering tasks.
4. Data.gov (https://www.data.gov/)
The U.S. government’s open data portal offers datasets related to health, climate, finance, education, and more.
5. World Bank Open Data (https://data.worldbank.org/)
Provides global economic, financial, and demographic datasets useful for research and analysis.
6. FiveThirtyEight (https://data.fivethirtyeight.com/)
Offers datasets used in FiveThirtyEight’s journalism, covering topics like politics, sports, and culture.
7. AWS Open Data Registry (https://registry.opendata.aws/)
A collection of large-scale datasets hosted on AWS, covering satellite imagery, genomics, and machine learning benchmarks.
8. Google Cloud Public Datasets (https://cloud.google.com/public-datasets)
A collection of public datasets available for big data analysis using Google Cloud’s computing resources.
9. Quandl, now Nasdaq Data Link (https://www.quandl.com/)
Provides economic, financial, and stock market datasets, some free and some premium. Quandl was acquired by Nasdaq and now operates under the name Nasdaq Data Link.
10. European Data Portal (https://data.europa.eu/en)
A platform for open government data from European Union member states.
11. DataHub.io (https://datahub.io/)
Contains open datasets in various fields such as finance, health, and climate.
12. UN Data (https://data.un.org/)
Provides datasets from the United Nations on global issues like demographics, health, and economics.
13. NASA Earthdata (https://earthdata.nasa.gov/)
A great resource for geospatial and environmental datasets, useful for climate research and earth sciences.
14. Google Open Images Dataset (https://storage.googleapis.com/openimages/web/index.html)
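A large dataset of images annotated with image-level labels and object bounding boxes, suited to computer vision tasks such as classification and object detection.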
15. DataSF (https://data.sfgov.org/)
Provides open data from the city of San Francisco, covering transportation, business, crime, and more.
16. GitHub - Awesome Public Datasets (https://github.com/awesomedata/awesome-public-datasets)
A curated list of open datasets across multiple domains, including sports, medicine, and finance.
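Many of the platforms above let you download data as CSV files, and a quick look at the columns and missing values helps you judge whether a dataset fits your project. The snippet below is a minimal sketch of that first check using only Python's standard library. The function name `summarize_csv` and the `SAMPLE` text are illustrative assumptions, not part of any platform's API; the sample stands in for a file you have downloaded yourself.

```python
import csv
import io
from collections import Counter


def summarize_csv(text, max_rows=None):
    """Return column names, row count, and per-column missing-value counts.

    `text` is the full content of a CSV file with a header row.
    `max_rows` optionally limits how many data rows are scanned.
    """
    reader = csv.DictReader(io.StringIO(text))
    columns = list(reader.fieldnames or [])
    missing = Counter()
    n_rows = 0
    for row in reader:
        if max_rows is not None and n_rows >= max_rows:
            break
        n_rows += 1
        for col in columns:
            value = row.get(col)
            # Treat absent fields and blank strings as missing values.
            if value is None or value.strip() == "":
                missing[col] += 1
    return {"columns": columns, "rows": n_rows, "missing": dict(missing)}


# Hypothetical sample standing in for a CSV downloaded from one of the portals.
SAMPLE = """sepal_length,sepal_width,species
5.1,3.5,setosa
4.9,,setosa
6.3,3.3,
"""

if __name__ == "__main__":
    print(summarize_csv(SAMPLE))
```

To run the same check on a real download, read the file into a string first, for example with `open(path, newline="", encoding="utf-8").read()`, and pass the result to `summarize_csv`. For very large files, `max_rows` keeps the scan short.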
Summary
These platforms are a solid starting point for sourcing datasets. Whether you are a beginner or an expert, practicing on real-world data is one of the most direct ways to build your machine learning and data science skills.