African Customer Service Conversation Dataset
200K+ multi-turn conversation threads from African banks, telcos, and fintech platforms, spanning WhatsApp logs, IVR transcripts, and live-chat sessions. Each thread is labelled with intent, resolution outcome, sentiment trajectory, and agent-action annotations for training production-grade customer service AI.
This is a synthetic dataset generated from high-quality expert-labelled seed data. All records are algorithmically derived — statistical distributions, inter-field correlations, and annotation characteristics faithfully replicate real-world patterns from the source data, while ensuring no real individual, organisation, or transaction can be identified or reconstructed.
The African Customer Service Conversation Dataset contains 200K+ multi-turn dialogue threads collected from customer service operations at major banks, mobile network operators, and fintech platforms across Nigeria, Kenya, South Africa, and Senegal. Data spans three channels — WhatsApp Business API logs, IVR call transcripts (ASR-generated and human-corrected), and web/app live-chat sessions — providing channel-diverse training signal for omnichannel conversational AI deployments.
Each conversation thread is annotated at two levels of granularity. At the turn level, every customer utterance carries an intent label from a 36-class taxonomy (covering account queries, transaction disputes, loan applications, airtime/data purchase, complaint escalation, and more), a sentiment score, and an entity extraction tag set. At the thread level, each conversation carries an overall resolution label (resolved, escalated, abandoned), a first-contact resolution flag, a handle-time bucket, and an industry-vertical tag (BANKING, TELCO, FINTECH).
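To make the two annotation levels concrete, the following is a minimal Python sketch of how a thread and its turns might be modelled. The role values ("customer", "agent"), the numeric form of the sentiment score, the absence of intent labels on agent turns, and the helper customer_intent_counts are assumptions made for illustration; the sample utterances are invented and are not drawn from the dataset.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Turn:
    role: str                   # assumed values: "customer" or "agent"
    text: str
    intent: Optional[str]       # one of the 36 intent classes; assumed None on agent turns
    sentiment: Optional[float]  # assumed numeric score; the real encoding may differ
    entities: List[str] = field(default_factory=list)


@dataclass
class Thread:
    conversation_id: str
    industry: str               # BANKING, TELCO or FINTECH
    resolution: str             # RESOLVED, ESCALATED or ABANDONED
    first_contact_resolved: bool
    handle_time_bucket: str
    turns: List[Turn] = field(default_factory=list)


def customer_intent_counts(thread: Thread) -> Counter:
    """Count turn-level intent labels across customer utterances only."""
    return Counter(
        t.intent
        for t in thread.turns
        if t.role == "customer" and t.intent is not None
    )


# Invented example thread, for illustration only.
thread = Thread(
    conversation_id="CONV-NGA-BK-0094712",
    industry="BANKING",
    resolution="RESOLVED",
    first_contact_resolved=True,
    handle_time_bucket="5_10MIN",
    turns=[
        Turn("customer", "I was debited twice on [ACCOUNT_ID].", "TRANSACTION_DISPUTE", -0.6),
        Turn("agent", "Sorry about that, let me check.", None, None),
        Turn("customer", "Thank you.", "TRANSACTION_DISPUTE", 0.4),
    ],
)
print(customer_intent_counts(thread))
```

Keeping turn-level and thread-level fields in separate types mirrors the two granularities described above and makes it easy to compute per-thread summaries from turn annotations.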
Language composition reflects the multilingual nature of African customer interactions: 48% English, 22% Nigerian Pidgin, 14% Swahili, 9% Zulu/Xhosa, 4% French (Senegal), and 3% Hausa/Yoruba code-switching. All personally identifiable information (names, account numbers, phone numbers, and addresses) has been replaced with typed placeholders (e.g., [CUSTOMER_NAME], [ACCOUNT_ID]) by a rule-based PII redaction pipeline that was validated through legal review.
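Downstream code often needs to find or count the typed placeholders in a turn, for example to mask them during tokenisation. The sketch below assumes that every placeholder is an upper-case, underscore-separated type name in square brackets, as in the two documented examples; the full list of placeholder types is not given here, so the pattern is deliberately generic. The function name placeholder_types and the sample sentence are hypothetical.

```python
import re

# Assumption: placeholders look like [CUSTOMER_NAME] or [ACCOUNT_ID], i.e.
# upper-case words joined by underscores inside square brackets.
PLACEHOLDER_RE = re.compile(r"\[([A-Z]+(?:_[A-Z]+)*)\]")


def placeholder_types(text: str) -> list:
    """Return the placeholder type names found in text, in order of appearance."""
    return PLACEHOLDER_RE.findall(text)


# Invented sentence, for illustration only.
sample = "Hello [CUSTOMER_NAME], your account [ACCOUNT_ID] is active."
print(placeholder_types(sample))
```

Because the pattern matches only upper-case tokens, ordinary bracketed text in lower or mixed case is left alone.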
Key Use Cases
Language Distribution
Dataset Highlights
Geographic Coverage
Dataset Schema
Each record represents one conversation thread. Fields cover channel provenance, language, thread-level labels, and aggregated turn statistics. The individual turns of a thread are stored as a nested JSON array in the turns field.
| Field Name | Type | Description | Nullable | Example |
|---|---|---|---|---|
| conversation_id | STRING | Unique conversation thread identifier | No | CONV-NGA-BK-0094712 |
| country_code | STRING | ISO 3166-1 alpha-2 country code | No | NG |
| industry | ENUM | Industry vertical: BANKING, TELCO, FINTECH | No | BANKING |
| channel | ENUM | Interaction channel: WHATSAPP, IVR, LIVE_CHAT | No | WHATSAPP |
| primary_language | ENUM | Dominant language: ENGLISH, PIDGIN, SWAHILI, ZULU, FRENCH, HAUSA, YORUBA | No | PIDGIN |
| has_code_switching | BOOLEAN | True if the conversation mixes two or more languages | No | true |
| turn_count | INTEGER | Total number of turns in the conversation | No | 8 |
| primary_intent | STRING | Dominant customer intent from 36-class taxonomy | No | TRANSACTION_DISPUTE |
| resolution | ENUM | Thread outcome: RESOLVED, ESCALATED, ABANDONED | No | RESOLVED |
| first_contact_resolved | BOOLEAN | True if resolved without escalation or callback | No | true |
| handle_time_bucket | ENUM | Handle time: UNDER_2MIN, 2_5MIN, 5_10MIN, OVER_10MIN | No | 5_10MIN |
| opening_sentiment | ENUM | Customer sentiment at conversation open: POSITIVE, NEGATIVE, NEUTRAL | No | NEGATIVE |
| closing_sentiment | ENUM | Customer sentiment at conversation close: POSITIVE, NEGATIVE, NEUTRAL | Yes | POSITIVE |
| pii_redacted | BOOLEAN | True if PII placeholders were applied to this thread | No | true |
| split | ENUM | Dataset partition: TRAIN, VAL, TEST | No | TRAIN |
| turns | JSON | Array of turn objects {role, text, intent, sentiment, entities} — one per turn | No | [...] |
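The constraints in the table can be checked mechanically before training. The following is a minimal sketch of such a check, assuming each record has already been parsed into a Python dict; the function name validate_record, the possibility that turns arrives as a JSON string, and the example record (including its channel value and turn contents) are illustrative assumptions rather than part of the dataset's tooling.

```python
import json

# Allowed values copied from the schema table.
ENUMS = {
    "industry": {"BANKING", "TELCO", "FINTECH"},
    "channel": {"WHATSAPP", "IVR", "LIVE_CHAT"},
    "primary_language": {"ENGLISH", "PIDGIN", "SWAHILI", "ZULU",
                         "FRENCH", "HAUSA", "YORUBA"},
    "resolution": {"RESOLVED", "ESCALATED", "ABANDONED"},
    "handle_time_bucket": {"UNDER_2MIN", "2_5MIN", "5_10MIN", "OVER_10MIN"},
    "opening_sentiment": {"POSITIVE", "NEGATIVE", "NEUTRAL"},
    "closing_sentiment": {"POSITIVE", "NEGATIVE", "NEUTRAL"},
    "split": {"TRAIN", "VAL", "TEST"},
}
NULLABLE = {"closing_sentiment"}  # the only field marked nullable
REQUIRED = [
    "conversation_id", "country_code", "industry", "channel",
    "primary_language", "has_code_switching", "turn_count", "primary_intent",
    "resolution", "first_contact_resolved", "handle_time_bucket",
    "opening_sentiment", "closing_sentiment", "pii_redacted", "split", "turns",
]


def validate_record(record: dict) -> list:
    """Return a list of human-readable problems; an empty list means the record passed."""
    problems = []
    for name in REQUIRED:
        if name not in record:
            problems.append(f"missing field: {name}")
        elif record[name] is None and name not in NULLABLE:
            problems.append(f"null in non-nullable field: {name}")
    for name, allowed in ENUMS.items():
        value = record.get(name)
        if value is not None and value not in allowed:
            problems.append(f"unexpected {name}: {value!r}")
    turns = record.get("turns")
    if isinstance(turns, str):  # assumption: turns may be serialised as a JSON string
        turns = json.loads(turns)
    if isinstance(turns, list) and record.get("turn_count") != len(turns):
        problems.append("turn_count does not match len(turns)")
    return problems


# Illustrative record: turn contents are invented, channel is picked from the enum.
record = {
    "conversation_id": "CONV-NGA-BK-0094712", "country_code": "NG",
    "industry": "BANKING", "channel": "WHATSAPP", "primary_language": "PIDGIN",
    "has_code_switching": True, "turn_count": 2,
    "primary_intent": "TRANSACTION_DISPUTE", "resolution": "RESOLVED",
    "first_contact_resolved": True, "handle_time_bucket": "5_10MIN",
    "opening_sentiment": "NEGATIVE", "closing_sentiment": None,
    "pii_redacted": True, "split": "TRAIN",
    "turns": [
        {"role": "customer", "text": "[CUSTOMER_NAME] here.", "intent": "TRANSACTION_DISPUTE",
         "sentiment": "NEGATIVE", "entities": []},
        {"role": "agent", "text": "How can I help?", "intent": None,
         "sentiment": None, "entities": []},
    ],
}
print(validate_record(record))
```

Returning a list of problems rather than raising on the first failure makes it easier to audit a whole file and report every inconsistent record in one pass.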
Sample Records
Four representative conversation threads spanning industries, channels, languages, and resolution outcomes.