Dataset Library

Access real-world datasets for your data science projects and practice

Awesome-ChatGPT-Prompts

2025

The Awesome-ChatGPT-Prompts dataset is a community-curated collection of high-quality prompts for ChatGPT and other large language models. It includes diverse prompt templates—from technical tasks like acting as a Linux terminal or Python interpreter to creative roles like storyteller or teacher—helping users explore, reuse, and improve prompt engineering practices. The dataset is open-source under CC0, making it freely available for research, development, and practical applications.

Amazon Product Reviews Dataset

Real

2018

Amazon Review Data (2018) is a large-scale dataset containing over 233 million customer reviews of Amazon products, written between 1996 and 2018. It includes detailed review information such as ratings, review text, helpfulness votes, and timestamps, along with rich product metadata like brand, price, category, and images. This dataset supports various tasks in natural language processing, sentiment analysis, and recommendation systems.

Ozone Level Detection Dataset

Real

2008

This collection includes two ground-level ozone data sets: the eight-hour peak set (eighthr.data) and the one-hour peak set (onehr.data). The data were collected from 1998 to 2004 in the Houston, Galveston, and Brazoria area.
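As a rough sketch of how one of these files might be read: the snippet below assumes the files are comma-separated with no header row, that the first field is a date, that the last field is the class label, and that missing readings are marked with `?`. None of that is stated in the description, so check the dataset's own documentation before relying on it. The function name `parse_ozone_rows` and the sample rows are made up for illustration.

```python
import csv
import io


def parse_ozone_rows(text):
    """Parse comma-separated rows into (date, features, label) tuples.

    Assumed layout (not confirmed by the catalog description):
    date, feature_1, ..., feature_n, label; '?' marks a missing value.
    """
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        date, *values, label = [field.strip() for field in record]
        # Missing readings become None; everything else is parsed as a float.
        features = [None if v == "?" else float(v) for v in values]
        rows.append((date, features, int(float(label))))
    return rows


# Made-up sample rows, only to show the shape of the output.
sample = "1/1/1998,0.8,?,2.3,0\n1/2/1998,2.8,3.2,?,1\n"
parsed = parse_ozone_rows(sample)
```

For a real file, the same function could be applied to the contents of `eighthr.data` or `onehr.data` read with `open(...).read()`, provided the assumed layout holds.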

Bank Transaction Fraud Detection

Multivariate

2024

At LOL Bank Pvt. Ltd., ensuring the safety and integrity of financial transactions is a top priority. With the growing number of online transactions and digital banking activities, fraudulent transactions have become a significant threat to both the bank and its customers. Fraudulent activities, such as unauthorized account access, identity theft, and suspicious transaction patterns, result in financial losses and damage customer trust.

YouTube Trending Video Dataset (updated daily)

Multivariate

2021

YouTube maintains a list of the top trending videos on the platform. According to Variety magazine, “To determine the year’s top-trending videos, YouTube uses a combination of factors including measuring users interactions (number of views, shares, comments and likes). Note that they’re not the most-viewed videos overall for the calendar year.”

Covid-19 Case Surveillance Public Use Dataset

Multivariate, Data-Generator

2020

The COVID-19 case surveillance system database includes individual-level data reported to U.S. states and autonomous reporting entities, including New York City and the District of Columbia (D.C.), as well as U.S. territories and affiliates. On April 5, 2020,

US Election 2020

Multivariate

2020

The US Election 2020 dataset contains 864 instances and 52 attributes, focusing on the presidential race at the county level. It includes real-valued multivariate data for classification and regression tasks, with no missing values. The dataset provides insights into voting patterns and election trends across the U.S.

Forest Fires Dataset

Multivariate

2008

Forest fires are a major environmental issue, causing economic and ecological damage while endangering human lives. Fast detection is a key element in controlling such a phenomenon. One way to achieve this is to use automatic tools based on local sensors, such as those provided by meteorological stations.

Mobile Robots Dataset

Domain-Theory

1995

The Mobile Robots dataset, published in 1995, contains sensor data from a mobile robot for classification tasks. It includes categorical, integer, and real attributes with no missing values. The dataset is used for learning concepts from robotic sensor data and was contributed by researchers from the University of Dortmund, Germany.

Safety Helmet Detection

Image

2020

Improve work safety by detecting the presence of people and safety helmets. To import a dataset, install MakeML. You can train an Object Detection neural network in a few clicks using this dataset.

All Space Missions from 1957

Multivariate

2020

This dataset contains information about space missions from the first launches in 1957 onward. It has 9 columns (6 string, 2 integer, 1 decimal). The data was scraped from https://nextspaceflight.com/launches/past/?page=1 and covers all space missions since the beginning of the Space Race (1957).
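The stated column breakdown (6 string, 2 integer, 1 decimal) is easy to sanity-check after download. The sketch below infers a kind for each field in the first data row of a CSV and tallies the result. The column names and the sample row are hypothetical, since the description does not list them, and inferring types from a single row is a simplification.

```python
import csv
import io
from collections import Counter


def infer_kind(value):
    """Classify a raw CSV field as 'integer', 'decimal', or 'string'."""
    try:
        int(value)
        return "integer"
    except ValueError:
        pass
    try:
        float(value)
        return "decimal"
    except ValueError:
        return "string"


def column_kinds(text):
    """Return a Counter of inferred column kinds, based on the first data row."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    first_row = next(reader)
    if len(header) != len(first_row):
        raise ValueError("header and first row have different lengths")
    return Counter(infer_kind(field) for field in first_row)


# Hypothetical header and row; the real column names are not given here.
sample = (
    "id,seq,company,location,date,detail,status,cost,outcome\n"
    '1,1,ExampleCo,"Pad A, Somewhere",2020-01-01,Rocket | Payload,Active,50.0,Success\n'
)
kinds = column_kinds(sample)
```

If the tally for the real file differs from 6/2/1, the likely causes are missing values in the first row or numeric columns stored as text, so it is worth inspecting more than one row before drawing conclusions.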

OSIC Pulmonary Fibrosis Progression Dataset

Image

2019

The Open Source Imaging Consortium (OSIC) is proud to partner with Kaggle to host the first-ever computational challenge for interstitial lung diseases: the OSIC Pulmonary Fibrosis Progression Challenge. A $55,000 prize will be offered to the Kaggle investigator(s) who devise the highest-performing algorithm.
