DS-11 Language & NLP

African Speech Recognition & Transcription Dataset

3,000+ hours of transcribed audio across 12 African languages — recorded in realistic acoustic environments by demographically diverse speakers — providing the broadest African speech corpus available for ASR model training, voice assistant localisation, and accessibility tool development.

This is a synthetic dataset generated from high-quality expert-labelled seed data. All records are algorithmically derived — statistical distributions, inter-field correlations, and annotation characteristics faithfully replicate real-world patterns from the source data, while ensuring no real individual, organisation, or transaction can be identified or reconstructed.

The African Speech Recognition & Transcription Dataset spans 3,000+ hours of audio recorded across 12 African languages: Hausa, Yoruba, Igbo, Swahili, Zulu, Xhosa, Twi, Amharic, Wolof, Luganda, Kinyarwanda, and Mozambican Portuguese. Recordings were collected in controlled studio sessions, community centres, outdoor markets, and simulated call-centre environments — deliberately capturing the full acoustic range encountered in real-world deployment: background noise, varying microphone quality, speaker proximity variation, and multi-speaker overlap.

Each audio clip is paired with a human-verified orthographic transcript, a phonetic transliteration where applicable, speaker demographic metadata (age group, gender, dialect region), and acoustic environment tags. A per-clip word error rate from a baseline wav2vec 2.0 model is included (the baseline_asr_wer field), enabling curriculum learning approaches that sequence training from low-error to difficult utterances. Clip durations range from 2 to 30 seconds; the median is 8 seconds.
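As a rough illustration of the curriculum idea, the sketch below orders clips by the baseline model's per-clip score (the baseline_asr_wer field listed in the schema) and cuts them into training stages. The function name curriculum_stages and the stage count are assumptions made for this example, not part of the dataset tooling.

```python
def curriculum_stages(records, n_stages=3):
    """Order clips from low to high baseline WER and cut them into stages.

    Hypothetical helper: the dataset only supplies the per-clip score,
    how it is used for curriculum learning is up to the practitioner.
    """
    # baseline_asr_wer is nullable, so clips without a score are placed last.
    scored = [r for r in records if r.get("baseline_asr_wer") is not None]
    unscored = [r for r in records if r.get("baseline_asr_wer") is None]
    scored.sort(key=lambda r: r["baseline_asr_wer"])
    ordered = scored + unscored

    # Ceiling division so that every clip lands in exactly one stage.
    size = max(1, -(-len(ordered) // n_stages))
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


# Scores taken from the sample records published with the dataset.
clips = [
    {"clip_id": "CLK-AMH-ETH-0063490", "baseline_asr_wer": 0.31},
    {"clip_id": "CLK-SWA-KEN-0087234", "baseline_asr_wer": 0.08},
    {"clip_id": "CLK-ZUL-ZAF-0019871", "baseline_asr_wer": 0.22},
    {"clip_id": "CLK-HAS-NGA-0041823", "baseline_asr_wer": 0.14},
]
stages = curriculum_stages(clips, n_stages=2)
easy_ids = [r["clip_id"] for r in stages[0]]
```

With two stages, the first stage holds the Swahili and Hausa clips (the two lowest error rates) and the second holds the Zulu and Amharic clips. A training loop could then start on the first stage and add later stages as validation loss plateaus.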

The dataset is partitioned into train / validation / test splits stratified by language, speaker identity (no speaker appears in both train and test), and acoustic environment. A separate out-of-domain evaluation set comprising radio broadcast excerpts and phone-call audio is provided for robustness testing. All audio is stored as 16 kHz mono WAV; transcripts are distributed as JSON sidecar files and as HuggingFace Datasets-compatible JSONL.
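The speaker-disjoint guarantee can be checked after download with a few lines of standard-library Python. This is a minimal sketch: the helper names are hypothetical, and load_jsonl assumes one JSON record per line, as the JSONL distribution implies. The actual file names in the delivery are not stated here, so none is assumed.

```python
import json
from collections import defaultdict


def load_jsonl(path):
    """Read one JSON record per non-empty line (path supplied by the caller)."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def leaked_speakers(records, split_a="TRAIN", split_b="TEST"):
    """Return the speaker IDs that occur in both of the named splits."""
    speakers = defaultdict(set)
    for rec in records:
        speakers[rec["split"]].add(rec["speaker_id"])
    return speakers[split_a] & speakers[split_b]


# Small in-memory example using IDs from the published sample records.
demo = [
    {"speaker_id": "SPK-NGA-1847", "split": "TRAIN"},
    {"speaker_id": "SPK-KEN-0432", "split": "TRAIN"},
    {"speaker_id": "SPK-ZAF-2201", "split": "TEST"},
]
overlap = leaked_speakers(demo)  # empty set: no speaker is shared
```

An empty result is what the split description promises for train versus test. The same function can be pointed at other pairs, such as TRAIN and OOD_EVAL, to see whether the separation extends to them.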

Key Use Cases

ASR model training and fine-tuning for African languages
Voice assistant localisation (Siri, Google Assistant, custom)
IVR system speech recognition for banking and telco
Accessibility tools: dictation and screen readers in local languages
Speaker diarisation and gender/age classification research
Language identification from audio
Cross-lingual transfer learning baseline benchmarking
Low-resource speech synthesis (TTS) data bootstrapping

Languages Covered

🇳🇬 Hausa
🇳🇬 Yoruba
🇳🇬 Igbo
🇰🇪 Swahili
🇿🇦 Zulu / Xhosa
🇬🇭 Twi (Akan)
🇪🇹 Amharic
🇸🇳 Wolof
🇺🇬 Luganda
🇷🇼 Kinyarwanda
🇲🇿 Mozambican Portuguese
📦 WAV 16 kHz + JSONL / HuggingFace

Dataset Highlights

Audio Hours: 3,000+ (transcribed speech)
Languages: 12 (across 12 African countries)
Unique Speakers: 8,500+ (demographically diverse)
Acoustic Environments: 4 (studio, community, outdoor, call-centre)

Dataset Schema

Each record represents one audio clip and its associated transcript and metadata. Audio files are referenced by filename; transcripts and annotations are stored inline.

Field Name | Type | Description | Nullable | Example
clip_id | STRING | Unique clip identifier | No | CLK-HAS-NGA-0041823
language | ENUM | Spoken language: HAUSA, YORUBA, IGBO, SWAHILI, ZULU, XHOSA, TWI, AMHARIC, WOLOF, LUGANDA, KINYARWANDA, PT_MOZ | No | HAUSA
country_code | STRING | ISO 3166-1 alpha-2 country of recording | No | NG
audio_filename | STRING | WAV file path relative to dataset root | No | audio/ng/hausa/CLK-HAS-NGA-0041823.wav
duration_seconds | FLOAT | Clip duration in seconds | No | 7.4
transcript | STRING | Human-verified orthographic transcription | No | Yaya za mu iya taimaka muku yau?
phonetic_transcript | STRING | Phonetic transliteration in IPA (null if not available) | Yes | null
speaker_id | STRING | Anonymised speaker identifier (consistent within split) | No | SPK-NGA-1847
speaker_gender | ENUM | Speaker gender: MALE, FEMALE | Yes | FEMALE
speaker_age_group | ENUM | Age group: YOUTH (15–24), ADULT (25–54), SENIOR (55+) | Yes | ADULT
dialect_region | STRING | Speaker dialect or regional variety label | Yes | Northern Nigeria
acoustic_environment | ENUM | Recording environment: STUDIO, COMMUNITY, OUTDOOR, CALL_CENTRE | No | COMMUNITY
snr_db | FLOAT | Estimated signal-to-noise ratio in decibels | Yes | 18.3
baseline_asr_wer | FLOAT | Word error rate from baseline wav2vec 2.0 model (0–1) | Yes | 0.14
split | ENUM | Dataset partition: TRAIN, VAL, TEST, OOD_EVAL | No | TRAIN
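The schema above translates fairly directly into a record check. The sketch below is one possible way to validate a record against the documented enum values and nullability. The function validate_record and its message wording are assumptions for illustration; only the field names and allowed values come from the schema.

```python
# Allowed values copied from the schema table.
ENUMS = {
    "language": {"HAUSA", "YORUBA", "IGBO", "SWAHILI", "ZULU", "XHOSA", "TWI",
                 "AMHARIC", "WOLOF", "LUGANDA", "KINYARWANDA", "PT_MOZ"},
    "acoustic_environment": {"STUDIO", "COMMUNITY", "OUTDOOR", "CALL_CENTRE"},
    "split": {"TRAIN", "VAL", "TEST", "OOD_EVAL"},
    "speaker_gender": {"MALE", "FEMALE"},
    "speaker_age_group": {"YOUTH", "ADULT", "SENIOR"},
}

# Fields marked "No" in the Nullable column.
REQUIRED = ("clip_id", "language", "country_code", "audio_filename",
            "duration_seconds", "transcript", "speaker_id",
            "acoustic_environment", "split")


def validate_record(rec):
    """Return a list of human-readable problems (empty if the record passes)."""
    problems = []
    for field in REQUIRED:
        if rec.get(field) is None:
            problems.append(f"{field}: required but missing")
    for field, allowed in ENUMS.items():
        value = rec.get(field)
        # Nullable enum fields may be None; only non-null values are checked.
        if value is not None and value not in allowed:
            problems.append(f"{field}: unexpected value {value!r}")
    wer = rec.get("baseline_asr_wer")
    if wer is not None and not 0.0 <= wer <= 1.0:
        problems.append("baseline_asr_wer: outside the documented 0-1 range")
    return problems


sample = {
    "clip_id": "CLK-HAS-NGA-0041823", "language": "HAUSA", "country_code": "NG",
    "audio_filename": "audio/ng/hausa/CLK-HAS-NGA-0041823.wav",
    "duration_seconds": 7.4, "transcript": "Yaya za mu iya taimaka muku yau?",
    "phonetic_transcript": None, "speaker_id": "SPK-NGA-1847",
    "speaker_gender": "FEMALE", "speaker_age_group": "ADULT",
    "dialect_region": "Northern Nigeria", "acoustic_environment": "COMMUNITY",
    "snr_db": 18.3, "baseline_asr_wer": 0.14, "split": "TRAIN",
}
issues = validate_record(sample)  # the published Hausa example passes
```

Returning a list of problems rather than raising on the first one makes it easier to audit a whole JSONL file in a single pass and report every issue per clip.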

Sample Records

Four representative clip records spanning languages, acoustic environments, and speaker demographics.

speech_sample.json
[
  {
    "clip_id": "CLK-HAS-NGA-0041823",
    "language": "HAUSA",
    "country_code": "NG",
    "audio_filename": "audio/ng/hausa/CLK-HAS-NGA-0041823.wav",
    "duration_seconds": 7.4,
    "transcript": "Yaya za mu iya taimaka muku yau?",
    "phonetic_transcript": null,
    "speaker_id": "SPK-NGA-1847",
    "speaker_gender": "FEMALE",
    "speaker_age_group": "ADULT",
    "dialect_region": "Northern Nigeria",
    "acoustic_environment": "COMMUNITY",
    "snr_db": 18.3,
    "baseline_asr_wer": 0.14,
    "split": "TRAIN"
  },
  {
    "clip_id": "CLK-SWA-KEN-0087234",
    "language": "SWAHILI",
    "country_code": "KE",
    "audio_filename": "audio/ke/swahili/CLK-SWA-KEN-0087234.wav",
    "duration_seconds": 11.2,
    "transcript": "Ninataka kuhamisha pesa kwenye akaunti yangu nyingine.",
    "phonetic_transcript": null,
    "speaker_id": "SPK-KEN-0432",
    "speaker_gender": "MALE",
    "speaker_age_group": "ADULT",
    "dialect_region": "Nairobi",
    "acoustic_environment": "CALL_CENTRE",
    "snr_db": 24.7,
    "baseline_asr_wer": 0.08,
    "split": "TRAIN"
  },
  {
    "clip_id": "CLK-ZUL-ZAF-0019871",
    "language": "ZULU",
    "country_code": "ZA",
    "audio_filename": "audio/za/zulu/CLK-ZUL-ZAF-0019871.wav",
    "duration_seconds": 5.9,
    "transcript": "Ngicela usizo nge-akhawunti yami yebhange.",
    "phonetic_transcript": "ŋ͡ǀiːɕɛla uːsiːzo ŋɡɛ|akaːwuːnti jaːmi jɛβaːŋɡɛ",
    "speaker_id": "SPK-ZAF-2201",
    "speaker_gender": "FEMALE",
    "speaker_age_group": "YOUTH",
    "dialect_region": "KwaZulu-Natal",
    "acoustic_environment": "STUDIO",
    "snr_db": 38.1,
    "baseline_asr_wer": 0.22,
    "split": "TEST"
  },
  {
    "clip_id": "CLK-AMH-ETH-0063490",
    "language": "AMHARIC",
    "country_code": "ET",
    "audio_filename": "audio/et/amharic/CLK-AMH-ETH-0063490.wav",
    "duration_seconds": 9.1,
    "transcript": "የእርስዎን ሂሳብ ቁጥር ይንገሩን።",
    "phonetic_transcript": null,
    "speaker_id": "SPK-ETH-0774",
    "speaker_gender": "MALE",
    "speaker_age_group": "SENIOR",
    "dialect_region": "Addis Ababa",
    "acoustic_environment": "OUTDOOR",
    "snr_db": 11.6,
    "baseline_asr_wer": 0.31,
    "split": "TRAIN"
  }
]

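To show how records like the samples can be summarised, the sketch below groups clips by a chosen field and reports clip count, total duration, and mean baseline word error rate per group. The summarise helper is hypothetical; the four inline rows reuse only the values shown in the sample records and are not representative of the full corpus.

```python
from collections import defaultdict


def summarise(records, key="split"):
    """Group records by `key` and report count, total seconds, and mean WER."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)

    summary = {}
    for name, clips in groups.items():
        # baseline_asr_wer is nullable, so skip clips without a score.
        wers = [c["baseline_asr_wer"] for c in clips
                if c.get("baseline_asr_wer") is not None]
        summary[name] = {
            "clips": len(clips),
            "seconds": sum(c["duration_seconds"] for c in clips),
            "mean_wer": sum(wers) / len(wers) if wers else None,
        }
    return summary


# Values copied from the four sample records.
sample_rows = [
    {"split": "TRAIN", "acoustic_environment": "COMMUNITY",
     "duration_seconds": 7.4, "baseline_asr_wer": 0.14},
    {"split": "TRAIN", "acoustic_environment": "CALL_CENTRE",
     "duration_seconds": 11.2, "baseline_asr_wer": 0.08},
    {"split": "TEST", "acoustic_environment": "STUDIO",
     "duration_seconds": 5.9, "baseline_asr_wer": 0.22},
    {"split": "TRAIN", "acoustic_environment": "OUTDOOR",
     "duration_seconds": 9.1, "baseline_asr_wer": 0.31},
]
by_split = summarise(sample_rows)
by_env = summarise(sample_rows, key="acoustic_environment")
```

Grouping by acoustic_environment or language instead of split gives a quick view of where the baseline model struggles, which can guide how much noisy-condition data to include in fine-tuning.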
Request Dataset Access

All datasets are available under a commercial licence agreement. Our team typically responds within 2 business days.

NDA may be required
