nEMO: Dataset of Emotional Speech in Polish

nEMO is a dataset of simulated (acted) emotional speech in Polish. The corpus contains over three hours of recordings performed by nine actors portraying six emotional states: anger, fear, happiness, sadness, surprise, and a neutral state.

The text material used in the recordings was carefully selected to represent the phonetics of the Polish language. The corpus is freely available under the Creative Commons (CC BY-NC-SA 4.0) license.

Supported Tasks

Audio Classification:

The dataset is primarily designed for speech emotion recognition. Each recording is labeled with one of the six emotional states (anger, fear, happiness, sadness, surprise, or neutral). Each sample also includes the speaker ID and gender, so the dataset can support other audio classification tasks, such as speaker or gender recognition.
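Because the corpus has only nine speakers, emotion classifiers are often evaluated on speakers they have not seen during training. The following is a minimal sketch of label encoding and a speaker-independent split. The field names `speaker_id`, `gender`, and `emotion`, the speaker identifiers, and the toy records are assumptions made for illustration, not the dataset's confirmed schema.

```python
# The six emotional states listed in the dataset description.
EMOTIONS = ["anger", "fear", "happiness", "sadness", "surprise", "neutral"]
LABEL_TO_ID = {name: idx for idx, name in enumerate(EMOTIONS)}


def split_by_speaker(records, test_speakers):
    """Split records so that no speaker appears in both partitions."""
    held_out = set(test_speakers)
    train = [r for r in records if r["speaker_id"] not in held_out]
    test = [r for r in records if r["speaker_id"] in held_out]
    return train, test


# Toy stand-in for the real samples: nine hypothetical speakers,
# one record per emotion each (field names are assumed).
records = [
    {"speaker_id": f"spk{s}", "gender": "female" if s % 2 else "male", "emotion": e}
    for s in range(9)
    for e in EMOTIONS
]

train, test = split_by_speaker(records, test_speakers=["spk7", "spk8"])

# Integer targets for a classifier.
train_labels = [LABEL_TO_ID[r["emotion"]] for r in train]
```

Holding out whole speakers, rather than random samples, keeps a model from scoring well simply by recognizing a voice it has already heard.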

Automatic Speech Recognition (ASR):

The dataset contains orthographic and normalized transcriptions for each audio recording, making it a valuable resource for automatic speech recognition (ASR) tasks. The sentences were carefully selected to cover a wide range of phonemes in Polish.
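ASR output is commonly scored against reference transcriptions using word error rate (WER). The sketch below is one simple way to compute it with a word-level edit distance. It assumes both strings are already normalized in the same way, and the Polish sentences in the usage note are made-up examples, not taken from the corpus.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # reference word missing from hypothesis
                    curr[j - 1] + 1,     # extra word in hypothesis
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("ala ma kota", "ala ma psa")` counts one substitution over three reference words. Scoring against the normalized transcriptions rather than the orthographic ones avoids penalizing differences in casing or punctuation, provided the system output is normalized the same way.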

Text-to-Speech (TTS):

The dataset pairs emotionally expressive speech recordings with their transcriptions, making it useful for developing TTS systems that generate expressive speech.

Language:

nEMO contains both audio and transcriptions in Polish.

Dataset Access:

Available at: [amu-cai/nEMO]
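A minimal loading sketch, assuming the identifier above refers to a repository on the Hugging Face Hub and that the third-party `datasets` library is installed. The helper name `load_nemo` is hypothetical, and the available splits and column names are not assumed here.

```python
REPO_ID = "amu-cai/nEMO"


def load_nemo(repo_id: str = REPO_ID):
    """Download the corpus and return whatever `datasets.load_dataset` yields."""
    # Imported lazily so this module can be imported without the dependency.
    from datasets import load_dataset  # third-party: pip install datasets

    return load_dataset(repo_id)
```

Calling `load_nemo()` downloads the data on first use. Printing the returned object shows the actual splits and column names, which is safer than hard-coding them.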

Corpus Author:

Iwona Christop

License:

Creative Commons (CC BY-NC-SA 4.0)