Measuring AI Intelligence in African Contexts.

The Datalens Africa LLM Leaderboard is the definitive benchmark for evaluating AI understanding of African contexts. Compare how leading AI systems perform across language, culture, knowledge, and healthcare using the most comprehensive African evaluation framework.

15k+ Questions
16+ Languages
32 Specialties
5 LLM Families

African AI Benchmark Leaderboard

| Rank | Model | AfriMCQA | AfriMMLU | MasakhaNEWS | AfriMedQA | Overall Score |
| --- | --- | --- | --- | --- | --- | --- |
| 👑 1 | GPT-5.1 (OpenAI) | 81.22% | 62.13% | 78.61% | 76.72% | 74.67% |
| 2 | Gemini 2.5 Pro (Google) | 85.58% | 79.47% | 51.22% | 75.95% | 73.05% |
| 3 | DeepSeek-R1 (DeepSeek AI) | N/A | 70.63% | 73.87% | 72.99% | 72.50% |
| 4 | Claude Sonnet 4.6 (Anthropic) | 79.33% | 62.75% | 68.75% | 78.04% | 72.22% |
| 5 | GPT-5.2 (OpenAI) | 75.50% | 67.38% | 75.27% | 69.97% | 72.03% |
| 6 | Gemini 2.5 Flash (Google) | 81.05% | 60.50% | 71.20% | 74.42% | 71.79% |
| 7 | Grok 4.1 Fast Reasoning (xAI) | 72.60% | 59.63% | 64.90% | 70.13% | 66.81% |
| 8 | DeepSeek-V3.2 (DeepSeek AI) | N/A | 61.50% | 64.66% | 71.06% | 65.74% |
| 9 | Grok 4 Fast Reasoning (xAI) | 74.76% | 54.13% | 60.26% | 71.32% | 65.12% |
| 10 | Claude Haiku 4.5 (Anthropic) | 63.46% | 54.75% | 67.72% | 62.49% | 62.10% |

* Scores represent average performance across all benchmark sub-tasks in each dataset. Last updated: 22 Feb 2026.
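The page does not state how the Overall Score is derived. The published figures are consistent with an unweighted mean of whichever benchmark scores a model has, so the minimal sketch below works under that assumption. The `SCORES` dictionary and the `overall_score` helper are illustrative names, not part of the leaderboard's tooling, and only three rows from the table are copied in.

```python
from statistics import mean

# Per-benchmark scores copied from the leaderboard table.
# None marks a benchmark with no reported score.
SCORES = {
    "GPT-5.1": {
        "AfriMCQA": 81.22, "AfriMMLU": 62.13,
        "MasakhaNEWS": 78.61, "AfriMedQA": 76.72,
    },
    "DeepSeek-R1": {
        "AfriMCQA": None, "AfriMMLU": 70.63,
        "MasakhaNEWS": 73.87, "AfriMedQA": 72.99,
    },
    "GPT-5.2": {
        "AfriMCQA": 75.50, "AfriMMLU": 67.38,
        "MasakhaNEWS": 75.27, "AfriMedQA": 69.97,
    },
}


def overall_score(benchmarks):
    """Unweighted mean of the scores that are present (assumed rule)."""
    available = [s for s in benchmarks.values() if s is not None]
    if not available:
        raise ValueError("no benchmark scores available")
    return mean(available)


for model, benchmarks in SCORES.items():
    print(f"{model}: {overall_score(benchmarks):.2f}%")
```

Under this assumption a model with a missing benchmark, such as DeepSeek-R1, is averaged over three scores rather than four, which is worth keeping in mind when comparing its rank with models scored on all four. For the rows checked here the result matches the published Overall Score to within 0.01.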

AfriMCQA: Multilingual Cultural Understanding · Accuracy (%)
AfriMMLU: Knowledge & Reasoning Across Languages · Accuracy (%)
MasakhaNEWS: News Topic Classification · Macro-F1 (%)
AfriMedQA v2: Pan-African Clinical QA · Macro-Accuracy (%)
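For readers unfamiliar with the Macro-F1 metric listed for MasakhaNEWS, here is a minimal sketch of the standard definition: compute F1 separately for each class, then take the unweighted mean so that rare topics count as much as common ones. The `macro_f1` function and the topic labels are illustrative. How the leaderboard aggregates across languages is not stated on this page, so this shows only the per-dataset calculation.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    labels = set(y_true) | set(y_pred)
    # F1 = 2TP / (2TP + FP + FN); the denominator is positive because
    # every label appears at least once in y_true or y_pred.
    per_class = [
        2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in labels
    ]
    return sum(per_class) / len(per_class)


# Hypothetical labels for four news articles.
truth = ["sports", "sports", "politics", "health"]
preds = ["sports", "politics", "politics", "health"]
print(f"Macro-F1: {macro_f1(truth, preds):.4f}")
```

In this toy case plain accuracy is 0.75, while Macro-F1 averages the per-class values 2/3, 2/3, and 1, so the two metrics can diverge even on small inputs.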

Four Pillars of African AI Evaluation

Each benchmark targets a distinct dimension of African contextual understanding. Together they span culture, language, and clinical medicine, giving a multidimensional view of LLM capability.

01 Afri-MCQA
Multimodal, culturally-grounded multiple-choice questions in 16+ African languages, testing deep cultural and linguistic comprehension.
Culture · Multilingual · Multimodal
Languages: 16+
Task Type: MCQ / VQA
Metric: Accuracy

02 AfriMMLU
Human-translated MMLU spanning 5 subjects across 17 African languages. Tests knowledge across geography, law, economics, global facts, and mathematics.
Knowledge · Reasoning · Human-Translated
Subjects: 5 Domains
Languages: 17
Metric: Accuracy

03 MasakhaNEWS
News topic classification across 16 African languages and 7 categories. Tests in-language understanding of real-world African news across diverse topics.
NLP · Classification · News
Categories: 7 Topics
Languages: 16
Metric: Macro-F1

04 AfriMedQA v2
Pan-African medical QA dataset with 15,275 questions from 621 contributors across 16 countries, covering 32+ clinical specialties.
Medical · Expert-Curated · Healthcare
Questions: 15,275
Specialties: 32+
Metric: Acc + ROUGE

Frontier AI Systems Under Review

OpenAI
GPT Series
GPT-5.1 and GPT-5.2 evaluated across all four African benchmarks in zero-shot and few-shot settings.
Models: GPT-5.1 · GPT-5.2
Best Score: 74.67%

Google
Gemini Series
Gemini 2.5 Pro and Flash tested across all benchmark categories, with Pro leading all models on AfriMMLU.
Models: 2.5 Pro · 2.5 Flash
Best Score: 73.05%

Anthropic
Claude Series
Claude Sonnet 4.6 and Haiku 4.5 evaluated; Claude Sonnet led all models on AfriMedQA.
Models: Sonnet 4.6 · Haiku 4.5
Best Score: 72.22%

DeepSeek AI
DeepSeek Series
DeepSeek-R1 and V3.2 evaluated; R1 excels on AfriMMLU and MasakhaNEWS despite having no AfriMCQA score.
Models: R1 · V3.2
Best Score: 72.50%

xAI
Grok Series
Grok 4 Fast Reasoning and Grok 4.1 Fast Reasoning evaluated with web access disabled for fairness.
Models: Grok 4 · Grok 4.1
Best Score: 66.81%

Why This Leaderboard Matters

To our community,


We launched this leaderboard because a fundamental piece has been missing from the global AI conversation: a meaningful way to measure how well intelligent systems actually understand African realities.


For too long, Africa has remained at the margins of model evaluation: frequently included in rhetoric, but rarely represented in data, linguistic diversity, or contextual testing. This benchmark is our commitment to changing that narrative.


When AI systems fail to grasp African languages or misinterpret local contexts, the result isn't just a technical glitch; it is digital exclusion. The future of intelligence cannot be built on a partial map of the world.


This leaderboard ensures that Africa is never an afterthought in global development. By building the data foundations and evaluation standards necessary for truly inclusive AI, we aren't just measuring progress; we are shaping it.


Together, we can ensure Africa isn't just a spectator in the AI future, but a primary architect of it.


Best regards,

Olaoye Anthony Somide
Founder & CEO, CipherSense AI
