Measuring AI Intelligence in African Contexts.

The DataLens Africa LLM Leaderboard is the definitive benchmark for evaluating AI understanding of African contexts. Compare how leading AI systems perform across language, culture, knowledge, and healthcare using the most comprehensive African evaluation framework.

15k+ Questions
16+ Languages
32 Specialties
5 LLM Families

African AI Benchmark Leaderboard

| Rank | Model | AfriMCQA | AfriMMLU | MasakhaNEWS | AfriMedQA | Overall Score |
|---|---|---|---|---|---|---|
| 👑 1 | Gemini 3.5 Flash (Google · May '26) | 90.14% | 80.88% | 72.74% | 84.71% | 82.12% |
| 2 | Claude Opus 4.6 (Anthropic · May '26) | 82.45% | 68.13% | 79.19% | 78.99% | 77.19% |
| 3 | DeepSeek-V4-Pro (DeepSeek AI · May '26) | n/a | 76.75% | 75.80% | 76.24% | 76.26% |
| 4 | GPT-5.4 (OpenAI · May '26) | 78.37% | 72.75% | 77.71% | 74.71% | 75.88% |
| 5 | GPT-5.1 (OpenAI · Feb '26) | 81.22% | 62.13% | 78.61% | 76.72% | 74.67% |
| 6 | Gemini 3.1 Flash Lite (Google · May '26) | 85.58% | 69.38% | 71.44% | 72.06% | 74.61% |
| 7 | Gemini 2.5 Pro (Google · Feb '26) | 85.58% | 79.47% | 51.22% | 75.95% | 73.05% |
| 8 | DeepSeek-R1 (DeepSeek AI · Feb '26) | n/a | 70.63% | 73.87% | 72.99% | 72.50% |
| 9 | Claude Sonnet 4.6 (Anthropic · Feb '26) | 79.33% | 62.38% | 68.75% | 78.04% | 72.12% |
| 10 | GPT-5.2 (OpenAI · Feb '26) | 75.50% | 67.38% | 75.27% | 69.97% | 72.03% |
| 11 | Gemini 2.5 Flash (Google · Feb '26) | 81.05% | 60.50% | 71.20% | 74.42% | 71.79% |
| 12 | Grok 4.1 Fast Reasoning (xAI · Feb '26) | 72.60% | 59.63% | 64.90% | 70.13% | 66.81% |
| 13 | DeepSeek-V3.2 (DeepSeek AI · Feb '26) | n/a | 61.50% | 64.66% | 71.06% | 65.74% |
| 14 | Grok 4 Fast Reasoning (xAI · Feb '26) | 74.76% | 54.13% | 60.26% | 71.32% | 65.12% |
| 15 | Claude Haiku 4.5 (Anthropic · Feb '26) | 63.46% | 54.75% | 67.72% | 62.49% | 62.10% |

* Scores represent average performance across all benchmark sub-tasks in each dataset. Last updated: 23 May 2026.
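
The page does not state how the Overall Score column is derived. As a rough sketch, assuming it is the unweighted mean of whichever benchmark scores a model has (an assumption, not a documented formula), the rows copied below reproduce the published values to within 0.01. The names `ROWS` and `overall_score` are illustrative only.

```python
from statistics import mean

# Per-benchmark scores copied from the leaderboard; None marks a missing score.
# Column order: AfriMCQA, AfriMMLU, MasakhaNEWS, AfriMedQA.
ROWS = {
    "Gemini 3.5 Flash": (90.14, 80.88, 72.74, 84.71),
    "Claude Opus 4.6": (82.45, 68.13, 79.19, 78.99),
    "DeepSeek-V4-Pro": (None, 76.75, 75.80, 76.24),
    "GPT-5.1": (81.22, 62.13, 78.61, 76.72),
}


def overall_score(scores):
    """Assumed formula: unweighted mean of the benchmarks that have a score."""
    available = [s for s in scores if s is not None]
    if not available:
        raise ValueError("at least one benchmark score is required")
    return mean(available)


# Print the models from highest to lowest assumed overall score.
ranked = sorted(ROWS.items(), key=lambda item: overall_score(item[1]), reverse=True)
for model, scores in ranked:
    print(f"{model:<18} {overall_score(scores):.2f}%")
```

Under this assumption, a model with a missing AfriMCQA score is averaged over three benchmarks instead of four, so its overall score is not strictly comparable with the others.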

AfriMCQA: Multilingual Cultural Understanding · Accuracy (%)
AfriMMLU: Knowledge & Reasoning Across Languages · Accuracy (%)
MasakhaNEWS: News Topic Classification · Macro-F1 (%)
AfriMedQA v2: Pan-African Clinical QA · Macro-Accuracy (%)
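
MasakhaNEWS is reported as Macro-F1 rather than plain accuracy. As a minimal sketch of what that metric computes, here is a generic pure-Python version. It is not the leaderboard's evaluation code, and the topic labels in the example are hypothetical. Macro-F1 takes the F1 score of each class separately and averages them with equal weight, so a rare news category counts as much as a common one.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")

    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)  # true positives
        fp = sum(t != label and p == label for t, p in pairs)  # false positives
        fn = sum(t == label and p != label for t, p in pairs)  # false negatives
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical gold and predicted topics for four news articles.
gold = ["sports", "sports", "politics", "health"]
pred = ["sports", "politics", "politics", "health"]
print(f"Macro-F1: {macro_f1(gold, pred):.4f}")
```

In this toy example the per-class F1 values are 2/3 for sports, 2/3 for politics, and 1 for health, so the macro average is 7/9.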

Four Pillars of African AI Evaluation

Each benchmark targets a distinct dimension of African contextual understanding, spanning culture, language, and clinical medicine, and providing a multidimensional view of LLM capability.

01 Afri-MCQA
Multimodal, culturally grounded multiple-choice questions in 16+ African languages, testing deep cultural and linguistic comprehension.
Tags: Culture, Multilingual, Multimodal
Languages: 16+
Task Type: MCQ / VQA
Metric: Accuracy

02 AfriMMLU
Human-translated MMLU spanning 5 subjects across 17 African languages. Tests knowledge across geography, law, economics, global facts, and mathematics.
Tags: Knowledge, Reasoning, Human-Translated
Subjects: 5 Domains
Languages: 17
Metric: Accuracy

03 MasakhaNEWS
News topic classification across 16 African languages and 7 categories. Tests in-language understanding of real-world African news across diverse topics.
Tags: NLP, Classification, News
Categories: 7 Topics
Languages: 16
Metric: Macro-F1

04 AfriMedQA v2
Pan-African medical QA dataset with 15,275 questions from 621 contributors across 16 countries, covering 32+ clinical specialties.
Tags: Medical, Expert-Curated, Healthcare
Questions: 15,275
Specialties: 32+
Metric: Macro-Accuracy

Frontier AI Systems Under Review

Google
Gemini Series
Four Gemini models evaluated; Gemini 3.5 Flash leads the entire leaderboard with the highest overall score.
Models: 3.5 Flash · 3.1 Flash Lite
Best Score: 82.12%

Anthropic
Claude Series
Claude Opus 4.6, Sonnet 4.6, and Haiku 4.5 evaluated; Opus leads on MasakhaNEWS and ranks #2 overall.
Models: Opus / Sonnet 4.6 · Haiku 4.5
Best Score: 77.19%

OpenAI
GPT Series
GPT-5.1, GPT-5.2, and GPT-5.4 evaluated across all four African benchmarks in zero-shot and few-shot settings.
Models: GPT-5.1 · GPT-5.2 · GPT-5.4
Best Score: 75.88%

DeepSeek AI
DeepSeek Series
DeepSeek-V4-Pro, R1, and V3.2 evaluated; V4-Pro leads the family with strong AfriMMLU and MasakhaNEWS scores.
Models: V4-Pro · R1 · V3.2
Best Score: 76.26%

xAI
Grok Series
Grok 4 Fast Reasoning and Grok 4.1 Fast Reasoning evaluated with web access disabled for fairness.
Models: Grok 4 · Grok 4.1
Best Score: 66.81%

Why This Leaderboard Matters

To our community,


We launched this leaderboard because a fundamental piece has been missing from the global AI conversation: a meaningful way to measure how well intelligent systems actually understand African realities.


For too long, Africa has remained at the margins of model evaluation: frequently included in rhetoric, but rarely represented in data, linguistic diversity, or contextual testing. This benchmark is our commitment to changing that narrative.


When AI systems fail to grasp African languages or misinterpret local contexts, the result isn't just a technical glitch; it is digital exclusion. The future of intelligence cannot be built on a partial map of the world.


This leaderboard ensures that Africa is never an afterthought in global development. By building the data foundations and evaluation standards necessary for truly inclusive AI, we aren't just measuring progress; we are shaping it.


Together, we can ensure Africa isn't just a spectator in the AI future, but a primary architect of it.


Best regards,

Olaoye Anthony Somide
Founder & CEO, DataLens Africa

DataLens Studio

Help the next model top this leaderboard.

Every score on this leaderboard is a reflection of the data behind the model. Better African language annotations produce better training data, which trains models that score higher on these benchmarks. DataLens Studio is the platform where that work happens β€” built specifically for African language labeling, RLHF, and cultural context annotation.

1. Annotate African language data
   Label text, audio, and cultural context across 50+ African languages.
2. Build better training datasets
   Human-verified, culturally grounded data that frontier models are missing.
3. Models score higher on African benchmarks
   Your contributions directly move the needle on this leaderboard.
DataLens Studio
African Language Annotation Platform
Text, audio & RLHF annotation tasks
50+ African languages supported
Earn while contributing to African AI
Quality-reviewed by African language experts

Every annotation is a step toward AI that truly understands Africa.