What is the DataLens Africa LLM Leaderboard?

The DataLens Africa LLM Leaderboard is the definitive benchmark ranking frontier large language models on their ability to understand African languages, cultures, and healthcare contexts. It evaluates models across four African-language benchmarks — AfriMCQA, AfriMMLU, MasakhaNEWS, and AfriMedQA — and is updated continuously as new models are evaluated.

Which LLM performs best on African language benchmarks?

As of June 2026, Gemini 3.5 Flash by Google leads the DataLens Africa LLM Leaderboard with an overall score of 82.12%. It holds the highest single-benchmark score ever recorded on the leaderboard — 90.14% on AfriMCQA — and also leads AfriMedQA at 84.71%. Claude Opus 4.6 ranks second at 77.19%, and DeepSeek-V4-Pro ranks third at 76.26%.

What benchmarks does the DataLens Africa LLM Leaderboard use?

The DataLens Africa LLM Leaderboard evaluates models across four benchmarks: AfriMCQA (multiple-choice question answering across African cultural and factual topics), AfriMMLU (massive multitask language understanding adapted for African contexts), MasakhaNEWS (news topic classification across 16 African languages), and AfriMedQA (medical question answering in African language and healthcare settings).

How is the overall score calculated on the DataLens Africa LLM Leaderboard?

The overall score is a weighted average of a model's scores across the available benchmarks: AfriMCQA, AfriMMLU, MasakhaNEWS, and AfriMedQA. The published scores are consistent with equal weights. For example, Gemini 3.5 Flash's 82.12% is the plain mean of 90.14%, 80.88%, 72.74%, and 84.71%. Models that have not been evaluated on a specific benchmark, or whose results did not meet the evaluation validity threshold, are averaged over the remaining benchmarks only. All scores are expressed as percentages.
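A minimal sketch of this calculation, assuming equal weights (an assumption, though it reproduces the published rows) and using `None` for a benchmark without a valid score. The function name `overall_score` is illustrative and not part of any DataLens tooling.

```python
from statistics import fmean
from typing import Optional, Sequence


def overall_score(scores: Sequence[Optional[float]]) -> float:
    """Mean of the available benchmark scores, skipping missing ones (None)."""
    available = [s for s in scores if s is not None]
    if not available:
        raise ValueError("model has no valid benchmark scores")
    return fmean(available)


# Order: AfriMCQA, AfriMMLU, MasakhaNEWS, AfriMedQA (values from the rankings table).
gemini_35_flash = overall_score([90.14, 80.88, 72.74, 84.71])
deepseek_v4_pro = overall_score([None, 76.75, 75.80, 76.24])  # no AfriMCQA score

print(f"{gemini_35_flash:.2f}")  # 82.12
print(f"{deepseek_v4_pro:.2f}")  # 76.26
```

Because the published per-benchmark values are themselves rounded, a recomputed overall score can differ from the listed one by 0.01 for some models.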

Which LLM is best for African healthcare AI applications?

Based on AfriMedQA scores on the DataLens Africa LLM Leaderboard, Gemini 3.5 Flash leads at 84.71%, followed by Claude Opus 4.6 at 78.99% and Claude Sonnet 4.6 at 78.04%. As of mid-2026, the three strongest results on African medical question-answering tasks therefore come from Google's Gemini family and Anthropic's Claude family.
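The AfriMedQA ranking can be reproduced by sorting on that single benchmark column. A short sketch using six AfriMedQA scores copied from the leaderboard; the dictionary and the helper name `top_models` are illustrative.

```python
from typing import Dict, List, Tuple

# AfriMedQA scores (%) for a subset of models, copied from the leaderboard.
AFRIMEDQA: Dict[str, float] = {
    "GPT-5.1": 76.72,
    "Claude Sonnet 4.6": 78.04,
    "Gemini 2.5 Pro": 75.95,
    "Gemini 3.5 Flash": 84.71,
    "DeepSeek-V4-Pro": 76.24,
    "Claude Opus 4.6": 78.99,
}


def top_models(scores: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (model, score) pairs, best first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


for name, score in top_models(AFRIMEDQA):
    print(f"{name}: {score:.2f}%")
```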

How often is the DataLens Africa LLM Leaderboard updated?

The DataLens Africa LLM Leaderboard is updated continuously as new model evaluations are completed. The first cohort of 10 models was published in February 2026. A second cohort of 5 new models was added in May 2026, bringing the total to 15 ranked models. New evaluations are added on a rolling basis as frontier models are submitted or selected for evaluation.

Which African languages are covered by the LLM benchmarks?

The benchmarks used on the DataLens Africa LLM Leaderboard cover a range of African languages. MasakhaNEWS spans 16 African languages, including Hausa, Yoruba, Swahili, Amharic, Igbo, and Wolof. AfriMCQA covers 16+ African languages and AfriMMLU covers 17, testing African factual and cultural knowledge. AfriMedQA focuses on medical terminology and clinical scenarios relevant to African healthcare settings.

Measuring AI Intelligence in African Contexts.

The DataLens Africa LLM Leaderboard is the definitive benchmark for evaluating AI understanding of African contexts. Compare how leading AI systems perform across language, culture, knowledge, and healthcare using the most comprehensive African evaluation framework.

15k+ Questions · 16+ Languages · 32 Specialties · 5 LLM Families

Rankings

African AI Benchmark Leaderboard

| Rank | Model | AfriMCQA | AfriMMLU | MasakhaNEWS | AfriMedQA | Overall Score |
|---|---|---|---|---|---|---|
| 👑 1 | Gemini 3.5 Flash (Google · May '26) | 90.14% | 80.88% | 72.74% | 84.71% | 82.12% |
| 2 | Claude Opus 4.6 (Anthropic · May '26) | 82.45% | 68.13% | 79.19% | 78.99% | 77.19% |
| 3 | DeepSeek-V4-Pro (DeepSeek AI · May '26) | — | 76.75% | 75.80% | 76.24% | 76.26% |
| 4 | GPT-5.4 (OpenAI · May '26) | 78.37% | 72.75% | 77.71% | 74.71% | 75.88% |
| 5 | GPT-5.1 (OpenAI · Feb '26) | 81.22% | 62.13% | 78.61% | 76.72% | 74.67% |
| 6 | Gemini 3.1 Flash Lite (Google · May '26) | 85.58% | 69.38% | 71.44% | 72.06% | 74.61% |
| 7 | Gemini 2.5 Pro (Google · Feb '26) | 85.58% | 79.47% | 51.22% | 75.95% | 73.05% |
| 8 | DeepSeek-R1 (DeepSeek AI · Feb '26) | — | 70.63% | 73.87% | 72.99% | 72.50% |
| 9 | Claude Sonnet 4.6 (Anthropic · Feb '26) | 79.33% | 62.38% | 68.75% | 78.04% | 72.12% |
| 10 | GPT-5.2 (OpenAI · Feb '26) | 75.50% | 67.38% | 75.27% | 69.97% | 72.03% |
| 11 | Gemini 2.5 Flash (Google · Feb '26) | 81.05% | 60.50% | 71.20% | 74.42% | 71.79% |
| 12 | Grok 4.1 Fast Reasoning (xAI · Feb '26) | 72.60% | 59.63% | 64.90% | 70.13% | 66.81% |
| 13 | DeepSeek-V3.2 (DeepSeek AI · Feb '26) | — | 61.50% | 64.66% | 71.06% | 65.74% |
| 14 | Grok 4 Fast Reasoning (xAI · Feb '26) | 74.76% | 54.13% | 60.26% | 71.32% | 65.12% |
| 15 | Claude Haiku 4.5 (Anthropic · Feb '26) | 63.46% | 54.75% | 67.72% | 62.49% | 62.10% |

* Scores represent average performance across all benchmark sub-tasks in each dataset. Last updated: 23 May 2026.

AfriMCQA: Multilingual Cultural Understanding · Accuracy (%)

AfriMMLU: Knowledge & Reasoning Across Languages · Accuracy (%)

MasakhaNEWS: News Topic Classification · Macro-F1 (%)

AfriMedQA v2: Pan-African Clinical QA · Macro-Accuracy (%)

Benchmarks

Four Pillars of African AI Evaluation

Each benchmark targets a distinct dimension of African contextual understanding, spanning culture, language, and clinical medicine, and providing a multidimensional view of LLM capability.

Afri-MCQA

Multimodal, culturally grounded multiple-choice questions in 16+ African languages, testing deep cultural and linguistic comprehension.

Culture · Multilingual · Multimodal

Languages: 16+

Task Type: MCQ / VQA

Metric: Accuracy

AfriMMLU

Human-translated MMLU spanning 5 subjects across 17 African languages. Tests knowledge across geography, law, economics, global facts, and mathematics.

Knowledge · Reasoning · Human-Translated

Subjects: 5 Domains

Languages: 17

Metric: Accuracy

MasakhaNEWS

News topic classification across 16 African languages and 7 categories. Tests in-language understanding of real-world African news across diverse topics.

NLP · Classification · News

Categories: 7 Topics

Languages: 16

Metric: Macro-F1
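Macro-F1 averages the per-class F1 scores with equal weight, so a rare news category counts as much as a common one. A minimal pure-Python sketch follows. The category labels and predictions are made-up toy data, not MasakhaNEWS examples, and the leaderboard's own scoring code is not shown on this page.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    classes = sorted(set(y_true) | set(y_pred), key=str)
    if not classes:
        raise ValueError("no examples to score")
    f1_scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); the guard avoids dividing by zero.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy example with three hypothetical categories.
truth = ["sports", "sports", "health", "politics"]
preds = ["sports", "health", "health", "politics"]
print(round(macro_f1(truth, preds), 4))
```

In this toy case "sports" and "health" each score an F1 of 2/3 and "politics" scores 1, so the macro average is 7/9.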

AfriMedQA v2

Pan-African medical QA dataset with 15,275 questions from 621 contributors across 16 countries, covering 32+ clinical specialties.

Medical · Expert-Curated · Healthcare

Questions: 15,275

Specialties: 32+

Metric: Macro-Accuracy
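This page does not define Macro-Accuracy. One common reading, assumed here, is the unweighted mean of per-group accuracies (for example, one accuracy per clinical specialty), so that small specialties are not swamped by large ones. The sketch below illustrates that reading only. The grouping key, the function name `macro_accuracy`, and the graded answers are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def macro_accuracy(results: Iterable[Tuple[str, bool]]) -> float:
    """Mean of per-group accuracies; each item is (group, answered_correctly)."""
    by_group: Dict[str, List[bool]] = defaultdict(list)
    for group, correct in results:
        by_group[group].append(correct)
    if not by_group:
        raise ValueError("no results to score")
    per_group = [sum(flags) / len(flags) for flags in by_group.values()]
    return sum(per_group) / len(per_group)


# Hypothetical graded answers: (specialty, was the model's answer correct?)
graded = [
    ("cardiology", True), ("cardiology", True),
    ("cardiology", True), ("cardiology", False),
    ("pediatrics", False), ("pediatrics", True),
]
print(macro_accuracy(graded))  # 0.625
```

Plain (micro) accuracy on the same toy data would be 4/6, because the larger cardiology group would dominate. The macro variant gives each specialty equal weight.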

Evaluated Models

Frontier AI Systems Under Review

Google

Gemini Series

Four Gemini models evaluated; Gemini 3.5 Flash leads the entire leaderboard with the highest overall score.

Models: 3.5 Flash · 3.1 Flash Lite · 2.5 Pro · 2.5 Flash

Best Score: 82.12%

Anthropic

Claude Series

Claude Opus 4.6, Sonnet 4.6, and Haiku 4.5 evaluated; Opus leads on MasakhaNEWS and ranks #2 overall.

Models: Opus 4.6 · Sonnet 4.6 · Haiku 4.5

Best Score: 77.19%

OpenAI

GPT Series

GPT-5.1, GPT-5.2, and GPT-5.4 evaluated across all four African benchmarks in zero-shot and few-shot settings.

Models: GPT-5.1 · GPT-5.2 · GPT-5.4

Best Score: 75.88%

DeepSeek AI

DeepSeek Series

DeepSeek-V4-Pro, R1, and V3.2 evaluated; V4-Pro leads the family with strong AfriMMLU and MasakhaNEWS scores.

Models: V4-Pro · R1 · V3.2

Best Score: 76.26%

xAI

Grok Series

Grok 4 Fast Reasoning and Grok 4.1 Fast Reasoning evaluated with web access disabled for fairness.

Models: Grok 4 · Grok 4.1

Best Score: 66.81%

Community Note

Why This Leaderboard Matters

To our community,

We launched this leaderboard because a fundamental piece has been missing from the global AI conversation: a meaningful way to measure how well intelligent systems actually understand African realities.

For too long, Africa has remained at the margins of model evaluation—frequently included in rhetoric, but rarely represented in data, linguistic diversity, or contextual testing. This benchmark is our commitment to changing that narrative.

When AI systems fail to grasp African languages or misinterpret local contexts, the result isn’t just a technical glitch—it is digital exclusion. The future of intelligence cannot be built on a partial map of the world.

This leaderboard ensures that Africa is never an afterthought in global development. By building the data foundations and evaluation standards necessary for truly inclusive AI, we aren’t just measuring progress; we are shaping it.

Together, we can ensure Africa isn't just a spectator in the AI future, but a primary architect of it.

Best regards,

Olaoye Anthony Somide

Founder & CEO, DataLens Africa

DataLens Studio

Help the next model top this leaderboard.

Every score on this leaderboard is a reflection of the data behind the model. Better African language annotations produce better training data, which trains models that score higher on these benchmarks. DataLens Studio is the platform where that work happens — built specifically for African language labeling, RLHF, and cultural context annotation.

Annotate African language data

Label text, audio, and cultural context across 50+ African languages.

Build better training datasets

Human-verified, culturally-grounded data that frontier models are missing.

Models score higher on African benchmarks

Your contributions directly move the needle on this leaderboard.

DataLens Studio

African Language Annotation Platform

Text, audio & RLHF annotation tasks

50+ African languages supported

Earn while contributing to African AI

Quality-reviewed by African language experts

Every annotation is a step toward AI that truly understands Africa.
