DS-11 Language & NLP

African Speech Recognition & Transcription Dataset

3,000+ hours of transcribed audio across 12 African languages — recorded in realistic acoustic environments by demographically diverse speakers — providing the broadest African speech corpus available for ASR model training, voice assistant localisation, and accessibility tool development.

This is a synthetic dataset generated from high-quality expert-labelled seed data. All records are algorithmically derived — statistical distributions, inter-field correlations, and annotation characteristics faithfully replicate real-world patterns from the source data, while ensuring no real individual, organisation, or transaction can be identified or reconstructed.

The African Speech Recognition & Transcription Dataset spans 3,000+ hours of audio recorded across 12 African languages: Hausa, Yoruba, Igbo, Swahili, Zulu, Xhosa, Twi, Amharic, Wolof, Luganda, Kinyarwanda, and Mozambican Portuguese. Recordings were collected in controlled studio sessions, community centres, outdoor markets, and simulated call-centre environments — deliberately capturing the full acoustic range encountered in real-world deployment: background noise, varying microphone quality, speaker proximity variation, and multi-speaker overlap.

Each audio clip is paired with a human-verified orthographic transcript, a phonetic transcription in IPA where available, speaker demographic metadata (age group, gender, dialect region), and acoustic environment tags. A per-clip word error rate from a baseline wav2vec 2.0 model (the baseline_asr_wer field) is included as a difficulty signal, enabling curriculum learning approaches that sequence training from low-error to high-error utterances. Clip durations range from 2 to 30 seconds; the median is 8 seconds.
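As a rough illustration of the curriculum ordering described above, the sketch below sorts clip records from easiest to hardest using the baseline_asr_wer field from the schema. The function name curriculum_order and the decision to place clips with a null score last are assumptions made for this example, not part of any dataset tooling.

```python
from typing import Iterable, Optional


def curriculum_order(records: Iterable[dict]) -> list[dict]:
    """Return records sorted from lowest to highest baseline WER.

    Assumption: clips whose baseline_asr_wer is null (None) are placed
    last, because their difficulty is unknown.
    """
    def sort_key(rec: dict) -> tuple[bool, float]:
        wer: Optional[float] = rec.get("baseline_asr_wer")
        # False sorts before True, so scored clips come first.
        return (wer is None, wer if wer is not None else 0.0)

    return sorted(records, key=sort_key)


# Trimmed records; the clip without a score is hypothetical.
clips = [
    {"clip_id": "CLK-AMH-ETH-0063490", "baseline_asr_wer": 0.31},
    {"clip_id": "CLK-SWA-KEN-0087234", "baseline_asr_wer": 0.08},
    {"clip_id": "hypothetical-no-score", "baseline_asr_wer": None},
    {"clip_id": "CLK-HAS-NGA-0041823", "baseline_asr_wer": 0.14},
]
ordered_ids = [c["clip_id"] for c in curriculum_order(clips)]
print(ordered_ids)
```

In practice the sorted list would more likely be cut into difficulty buckets and fed to the trainer stage by stage; a strict global sort is only the simplest variant.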

The dataset is partitioned into train / validation / test splits stratified by language and acoustic environment, with speaker identities kept disjoint so that no speaker appears in both train and test. A separate out-of-domain evaluation set comprising radio broadcast excerpts and phone-call audio is provided for robustness testing. All audio is stored as 16 kHz mono WAV; transcripts are distributed as JSON sidecar files and as HuggingFace Datasets-compatible JSONL.
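The guarantee that no speaker appears in both train and test can be spot-checked after download. The following is a minimal sketch, assuming the records have already been loaded as dictionaries with the split and speaker_id fields from the schema, and assuming speaker_id values are comparable across splits. The helper names are hypothetical.

```python
from collections import defaultdict


def speakers_by_split(records):
    """Map each split name to the set of speaker_id values found in it."""
    out = defaultdict(set)
    for rec in records:
        out[rec["split"]].add(rec["speaker_id"])
    return dict(out)


def leaked_speakers(records, split_a="TRAIN", split_b="TEST"):
    """Return speaker IDs present in both splits (an empty set is expected)."""
    by_split = speakers_by_split(records)
    return by_split.get(split_a, set()) & by_split.get(split_b, set())


# Trimmed records using speaker IDs from the sample file.
records = [
    {"speaker_id": "SPK-NGA-1847", "split": "TRAIN"},
    {"speaker_id": "SPK-KEN-0432", "split": "TRAIN"},
    {"speaker_id": "SPK-ZAF-2201", "split": "TEST"},
]
print(leaked_speakers(records))
```

The same function can be called with other split pairs, for example TRAIN and VAL, if overlap there also matters for a given experiment.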

Key Use Cases

ASR model training and fine-tuning for African languages

Voice assistant localisation (Siri, Google Assistant, custom)

IVR system speech recognition for banking and telco

Accessibility tools: dictation and screen readers in local languages

Speaker diarisation and gender/age classification research

Language identification from audio

Cross-lingual transfer learning baseline benchmarking

Low-resource speech synthesis (TTS) data bootstrapping

Languages Covered

🇳🇬 Hausa

🇳🇬 Yoruba

🇳🇬 Igbo

🇰🇪 Swahili

🇿🇦 Zulu / Xhosa

🇬🇭 Twi (Akan)

🇪🇹 Amharic

🇸🇳 Wolof

🇺🇬 Luganda

🇷🇼 Kinyarwanda

🇲🇿 Mozambican Portuguese

📦 WAV 16 kHz + JSONL / HuggingFace

Dataset Highlights

Audio Hours

3,000+

transcribed speech

Languages

12

across 12 African countries

Unique Speakers

8,500+

demographically diverse

Acoustic Envs

4

studio, community, outdoor, call-centre

Dataset Schema

Each record represents one audio clip and its associated transcript and metadata. Audio files are referenced by filename; transcripts and annotations are stored inline.

Field Name	Type	Description	Nullable	Example
clip_id	STRING	Unique clip identifier	No	CLK-HAS-NGA-0041823
language	ENUM	Spoken language: HAUSA, YORUBA, IGBO, SWAHILI, ZULU, XHOSA, TWI, AMHARIC, WOLOF, LUGANDA, KINYARWANDA, PT_MOZ	No	HAUSA
country_code	STRING	ISO 3166-1 alpha-2 country of recording	No	NG
audio_filename	STRING	WAV file path relative to dataset root	No	audio/ng/hausa/CLK-HAS-NGA-0041823.wav
duration_seconds	FLOAT	Clip duration in seconds	No	7.4
transcript	STRING	Human-verified orthographic transcription	No	Yaya za mu iya taimaka muku yau?
phonetic_transcript	STRING	Phonetic transliteration in IPA (null if not available)	Yes	null
speaker_id	STRING	Anonymised speaker identifier (consistent within split)	No	SPK-NGA-1847
speaker_gender	ENUM	Speaker gender: MALE, FEMALE	Yes	FEMALE
speaker_age_group	ENUM	Age group: YOUTH (15–24), ADULT (25–54), SENIOR (55+)	Yes	ADULT
dialect_region	STRING	Speaker dialect or regional variety label	Yes	Northern Nigeria
acoustic_environment	ENUM	Recording environment: STUDIO, COMMUNITY, OUTDOOR, CALL_CENTRE	No	COMMUNITY
snr_db	FLOAT	Estimated signal-to-noise ratio in decibels	Yes	18.3
baseline_asr_wer	FLOAT	Word error rate from baseline wav2vec 2.0 model (0–1)	Yes	0.14
split	ENUM	Dataset partition: TRAIN, VAL, TEST, OOD_EVAL	No	TRAIN
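The schema above can be turned into a lightweight record check. This is a minimal sketch rather than an official validator: the enumerations and the required-field list are copied from the table, while the function name validate_record and the wording of its messages are assumptions.

```python
# Enumerations copied from the schema table.
LANGUAGES = {
    "HAUSA", "YORUBA", "IGBO", "SWAHILI", "ZULU", "XHOSA",
    "TWI", "AMHARIC", "WOLOF", "LUGANDA", "KINYARWANDA", "PT_MOZ",
}
ENUM_FIELDS = {
    "language": LANGUAGES,
    "acoustic_environment": {"STUDIO", "COMMUNITY", "OUTDOOR", "CALL_CENTRE"},
    "split": {"TRAIN", "VAL", "TEST", "OOD_EVAL"},
    "speaker_gender": {"MALE", "FEMALE"},
    "speaker_age_group": {"YOUTH", "ADULT", "SENIOR"},
}

# Fields marked "No" in the Nullable column.
REQUIRED = (
    "clip_id", "language", "country_code", "audio_filename",
    "duration_seconds", "transcript", "speaker_id",
    "acoustic_environment", "split",
)


def validate_record(rec: dict) -> list[str]:
    """Return a list of problems found; an empty list means the record passes."""
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED
        if rec.get(name) is None
    ]
    for name, allowed in ENUM_FIELDS.items():
        value = rec.get(name)
        # Null is tolerated here; required fields are reported above.
        if value is not None and value not in allowed:
            problems.append(f"{name}: unexpected value {value!r}")
    wer = rec.get("baseline_asr_wer")
    if wer is not None and not 0.0 <= wer <= 1.0:
        problems.append(f"baseline_asr_wer out of range: {wer}")
    return problems


example = {
    "clip_id": "CLK-HAS-NGA-0041823", "language": "HAUSA",
    "country_code": "NG",
    "audio_filename": "audio/ng/hausa/CLK-HAS-NGA-0041823.wav",
    "duration_seconds": 7.4,
    "transcript": "Yaya za mu iya taimaka muku yau?",
    "phonetic_transcript": None, "speaker_id": "SPK-NGA-1847",
    "speaker_gender": "FEMALE", "speaker_age_group": "ADULT",
    "dialect_region": "Northern Nigeria",
    "acoustic_environment": "COMMUNITY", "snr_db": 18.3,
    "baseline_asr_wer": 0.14, "split": "TRAIN",
}
print(validate_record(example))
```

Returning a list of messages instead of raising on the first failure makes it easier to audit a whole JSONL file in one pass.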

Sample Records

Four representative clip records spanning languages, acoustic environments, and speaker demographics.

speech_sample.json

[
  {
    "clip_id": "CLK-HAS-NGA-0041823",
    "language": "HAUSA",
    "country_code": "NG",
    "audio_filename": "audio/ng/hausa/CLK-HAS-NGA-0041823.wav",
    "duration_seconds": 7.4,
    "transcript": "Yaya za mu iya taimaka muku yau?",
    "phonetic_transcript": null,
    "speaker_id": "SPK-NGA-1847",
    "speaker_gender": "FEMALE",
    "speaker_age_group": "ADULT",
    "dialect_region": "Northern Nigeria",
    "acoustic_environment": "COMMUNITY",
    "snr_db": 18.3,
    "baseline_asr_wer": 0.14,
    "split": "TRAIN"
  },
  {
    "clip_id": "CLK-SWA-KEN-0087234",
    "language": "SWAHILI",
    "country_code": "KE",
    "audio_filename": "audio/ke/swahili/CLK-SWA-KEN-0087234.wav",
    "duration_seconds": 11.2,
    "transcript": "Ninataka kuhamisha pesa kwenye akaunti yangu nyingine.",
    "phonetic_transcript": null,
    "speaker_id": "SPK-KEN-0432",
    "speaker_gender": "MALE",
    "speaker_age_group": "ADULT",
    "dialect_region": "Nairobi",
    "acoustic_environment": "CALL_CENTRE",
    "snr_db": 24.7,
    "baseline_asr_wer": 0.08,
    "split": "TRAIN"
  },
  {
    "clip_id": "CLK-ZUL-ZAF-0019871",
    "language": "ZULU",
    "country_code": "ZA",
    "audio_filename": "audio/za/zulu/CLK-ZUL-ZAF-0019871.wav",
    "duration_seconds": 5.9,
    "transcript": "Ngicela usizo nge-akhawunti yami yebhange.",
    "phonetic_transcript": "ŋ͡ǀiːɕɛla uːsiːzo ŋɡɛ|akaːwuːnti jaːmi jɛβaːŋɡɛ",
    "speaker_id": "SPK-ZAF-2201",
    "speaker_gender": "FEMALE",
    "speaker_age_group": "YOUTH",
    "dialect_region": "KwaZulu-Natal",
    "acoustic_environment": "STUDIO",
    "snr_db": 38.1,
    "baseline_asr_wer": 0.22,
    "split": "TEST"
  },
  {
    "clip_id": "CLK-AMH-ETH-0063490",
    "language": "AMHARIC",
    "country_code": "ET",
    "audio_filename": "audio/et/amharic/CLK-AMH-ETH-0063490.wav",
    "duration_seconds": 9.1,
    "transcript": "የእርስዎን ሂሳብ ቁጥር ይንገሩን።",
    "phonetic_transcript": null,
    "speaker_id": "SPK-ETH-0774",
    "speaker_gender": "MALE",
    "speaker_age_group": "SENIOR",
    "dialect_region": "Addis Ababa",
    "acoustic_environment": "OUTDOOR",
    "snr_db": 11.6,
    "baseline_asr_wer": 0.31,
    "split": "TRAIN"
  }
]
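To show how records like these might be consumed, the sketch below parses a trimmed copy of the four sample records, picks out the noisier clips by snr_db, and totals the clip durations. The 20 dB threshold and the helper name low_snr_clips are assumptions chosen for illustration, not recommended values.

```python
import json

# Trimmed copy of the four sample records (only the fields used here).
SAMPLE_JSON = """
[
  {"clip_id": "CLK-HAS-NGA-0041823", "duration_seconds": 7.4, "snr_db": 18.3},
  {"clip_id": "CLK-SWA-KEN-0087234", "duration_seconds": 11.2, "snr_db": 24.7},
  {"clip_id": "CLK-ZUL-ZAF-0019871", "duration_seconds": 5.9, "snr_db": 38.1},
  {"clip_id": "CLK-AMH-ETH-0063490", "duration_seconds": 9.1, "snr_db": 11.6}
]
"""


def low_snr_clips(records, threshold_db=20.0):
    """Return IDs of clips whose estimated SNR is below threshold_db.

    snr_db is nullable in the schema, so clips without a value are skipped.
    """
    return [
        rec["clip_id"]
        for rec in records
        if rec.get("snr_db") is not None and rec["snr_db"] < threshold_db
    ]


records = json.loads(SAMPLE_JSON)
noisy = low_snr_clips(records)
total_seconds = sum(rec["duration_seconds"] for rec in records)
print(noisy)
print(round(total_seconds, 1))
```

A filter of this kind could be used to build a noise-robustness subset, or inverted to select only cleaner clips for an initial training stage.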

Request Dataset Access

All datasets are available under a commercial licence agreement. Our team typically responds within 2 business days.

NDA may be required
