Datasets for Speech Recognition Analysis

In this post we share the top datasets for speech recognition. Speech emotion analysis is an important task that enables several application use cases. Because smartphones are now so widespread, it has become viable to analyze speech commands captured by microphones for emotion understanding using on-device machine learning models. The datasets covered are:
- Google Audio Dataset
- Urbansound Dataset
- Spoken Digit Dataset
- Bird Audio Detection
- TensorFlow Speech Recognition
- Emotion Based Speech Recognition in the Wild
1. Google Audio Dataset
Google Audio Dataset is a large-scale dataset of manually annotated audio events. It contains 2.1 million annotated videos and 5.8 thousand hours of audio, covering 527 classes of annotated sounds.
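As a quick sanity check on these figures, dividing the total audio duration by the number of videos gives the average amount of audio per annotated video. The sketch below uses only the rounded numbers quoted in this post, so the result is approximate.

```python
# Rounded figures quoted in this post.
NUM_VIDEOS = 2_100_000
TOTAL_HOURS = 5_800


def average_clip_seconds(total_hours: float, num_clips: int) -> float:
    """Return the mean audio duration per clip in seconds."""
    return total_hours * 3600 / num_clips


avg_seconds = average_clip_seconds(TOTAL_HOURS, NUM_VIDEOS)
print(f"Average audio per video: {avg_seconds:.1f} s")
```

This works out to roughly 10 seconds of audio per video.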
2. Urbansound Dataset
The Urbansound Dataset contains 1302 labeled sound recordings. Each recording is labeled with the start and end times of sound events from 10 classes: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, and street_music. Each recording may contain multiple sound events, but for each file only events from a single class are labeled.
The dataset is available in both CSV and JSON files. You can download it and grow your machine learning skills.
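Annotations of this kind (a start time, an end time, and a class per event) are easy to summarize in a few lines. The following is a minimal sketch that assumes each CSV row holds start seconds, end seconds, and a class label in that order. The column order and the sample rows are hypothetical and only illustrate the idea; check the downloaded files for the actual layout.

```python
import csv
import io
from collections import defaultdict

# Hypothetical annotation rows: start_seconds, end_seconds, class_label.
# The real files may use a different column order or extra columns.
SAMPLE_CSV = """0.5,3.0,dog_bark
4.0,4.5,dog_bark
6.0,10.0,dog_bark
"""


def total_duration_by_class(csv_text: str) -> dict[str, float]:
    """Sum the labeled event durations (end - start) for each class."""
    totals: dict[str, float] = defaultdict(float)
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        start, end, label = float(row[0]), float(row[1]), row[2].strip()
        if end < start:
            raise ValueError(f"event ends before it starts: {row}")
        totals[label] += end - start
    return dict(totals)


durations = total_duration_by_class(SAMPLE_CSV)
print(durations)
```

The sample uses a single class because, as noted, only events from one class are labeled per file.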
3. Spoken Digit Dataset
The Free Spoken Digit Dataset (FSDD) is a simple audio/speech dataset consisting of recordings of spoken digits in wav files at 8 kHz. The recordings are trimmed so that they have near minimal silence at the beginnings and ends. FSDD is an open dataset available to everyone. It contains 2000 recordings of English pronunciations from 4 speakers.
No. of rows: 2000
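To illustrate what trimming silence from the beginning and end of a recording means, here is a minimal sketch based on a simple amplitude threshold. The threshold value, the sample data, and the approach itself are assumptions made for illustration; this is not necessarily how the dataset's recordings were prepared.

```python
def trim_silence(samples: list[int], threshold: int = 500) -> list[int]:
    """Drop leading and trailing samples whose absolute amplitude is below threshold."""
    start = 0
    end = len(samples)
    # Advance past quiet samples at the beginning.
    while start < end and abs(samples[start]) < threshold:
        start += 1
    # Step back over quiet samples at the end.
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]


SAMPLE_RATE_HZ = 8000  # the recordings are stored at 8 kHz

# Hypothetical 16-bit samples: quiet edges around a louder middle section.
signal = [0, 12, -30, 2000, -1500, 40, 1800, -5, 3]
trimmed = trim_silence(signal)
print(trimmed)
print(f"Duration after trimming: {len(trimmed) / SAMPLE_RATE_HZ * 1000:.2f} ms")
```

Quiet samples inside the spoken segment (such as the 40 above) are kept, since only the edges are trimmed.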
4. Bird Audio Detection
Detecting bird sounds in audio is an important task for automatic wildlife monitoring, as well as in citizen science and audio library management. You can download the dataset in two parts:
- freefield1010: data labels and audio files (5.8 GB zip, or via BitTorrent)
- Warblr: data labels and audio files (4.3 GB zip, or via BitTorrent)
5. TensorFlow Speech Recognition
Can you build an algorithm that understands simple speech commands? If yes, then this dataset is for you. Note: There are only 12 possible labels for the test set: yes, no, up, down, left, right, on, off, stop, go, silence, unknown. The unknown label should be used for a command that is neither one of the first 10 labels nor silence.
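The labeling rule for the test set can be sketched as a small mapping function. This is an illustrative sketch only: the function name and the example words outside the ten commands are hypothetical.

```python
# The ten command words listed for the test set.
COMMANDS = ("yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go")
SILENCE = "silence"
UNKNOWN = "unknown"


def to_test_label(raw_label: str) -> str:
    """Map a raw label to one of the 12 labels allowed in the test set."""
    label = raw_label.strip().lower()
    if label in COMMANDS:
        return label
    if label == SILENCE:
        return SILENCE
    # Anything that is neither a command word nor silence is "unknown".
    return UNKNOWN


# "bed" and "seven" are hypothetical examples of words outside the ten commands.
for word in ["yes", "Stop", "silence", "bed", "seven"]:
    print(word, "->", to_test_label(word))
```

Collapsing every other word into a single unknown class keeps the output space fixed at 12 labels, no matter how many distinct words appear in the audio.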
6. Emotion Based Speech Recognition in the Wild
The EmoSpeech Dataset is India's first keyword-emotion dataset. Built for surveillance, it offers 3 datasets in one: 8K environment samples, 8K keyword samples, and 8K emotion samples.
Thanks for Reading