African Tropical Disease Diagnostic Image Dataset
120K+ expert-labelled clinical images spanning malaria blood smears, trachoma eye photographs, skin lesion scans, urine microscopy, and chest X-rays, annotated for 8 diagnostic targets (7 tropical and endemic diseases plus healthy controls) to train diagnostic AI models deployable in low-resource African healthcare settings.
This is a synthetic dataset generated from high-quality expert-labelled seed data. All records are algorithmically derived — statistical distributions, inter-field correlations, and annotation characteristics faithfully replicate real-world patterns from the source data, while ensuring no real individual, organisation, or transaction can be identified or reconstructed.
The African Tropical Disease Diagnostic Image Dataset contains 120K+ clinical images derived from seed data collected at partner hospitals and community health clinics across Nigeria, Ghana, Kenya, and Tanzania. The dataset covers 8 disease targets: malaria (Giemsa-stained blood smear microscopy), trachoma (external eye photographs, WHO TF/TI grading), cutaneous leishmaniasis (skin lesion photographs), schistosomiasis (urine microscopy), tuberculosis (chest X-ray), typhoid (clinical photograph panels), sickle-cell crisis (peripheral blood smear), and healthy/negative controls for each modality. Source images were captured using standardised protocols on optical microscopes with smartphone adaptors, digital fundus cameras, and portable X-ray units.
Each image is labelled by at least two clinicians (a specialist physician and a trained clinical officer) using a consensus adjudication protocol. Labels include disease class, severity or parasitaemia grade (modality-dependent), image quality score (1–5), and a deployment-readiness flag indicating whether the image meets quality thresholds for model training. Bounding box annotations are available for 40% of images; the remainder carry image-level class labels only.
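Because only part of the dataset carries bounding boxes, a loader has to cope with both annotation types. The sketch below assumes each annotation sidecar follows the standard COCO object-detection layout (top-level `images`, `annotations`, and `categories` lists, with each `bbox` given as `[x, y, width, height]`). The function name `boxes_per_image` and the sidecar contents are hypothetical illustrations, not documented dataset tooling.

```python
import json
from collections import Counter


def boxes_per_image(coco: dict) -> dict:
    """Count bounding boxes per image in a COCO-style annotation dict.

    Assumes the standard COCO detection layout. Images with no entries in
    `annotations` (image-level labels only) get a count of 0.
    """
    counts = Counter(
        ann["image_id"] for ann in coco.get("annotations", []) if "bbox" in ann
    )
    return {img["id"]: counts.get(img["id"], 0) for img in coco.get("images", [])}


# Hypothetical sidecar contents: image 1 has two boxes, image 2 has none.
sidecar = json.loads("""
{
  "images": [{"id": 1}, {"id": 2}],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 1, "bbox": [120.0, 88.0, 24.0, 24.0]},
    {"id": 11, "image_id": 1, "category_id": 1, "bbox": [310.5, 142.0, 22.0, 23.0]}
  ],
  "categories": [{"id": 1, "name": "P_FALCIPARUM_POSITIVE"}]
}
""")

counts = boxes_per_image(sidecar)
print(counts)  # {1: 2, 2: 0}
```

In practice the dict would come from `json.load` on the sidecar file; comparing these counts with the `bbox_count` metadata field is a cheap consistency check.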
The dataset is split into train / validation / test sets stratified by disease, site, and image quality. A separate low-quality evaluation set is provided to measure model robustness under field conditions. Image modalities are stored in separate sub-directories with modality-specific metadata files. All images are provided as JPEG (photographs, eye images) or 16-bit TIFF (microscopy, X-ray) with JSON annotation sidecars in COCO format.
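The stratification described above can be approximated by grouping records on a composite (disease, site, quality) key and splitting each group proportionally. This is a minimal sketch assuming a 70/15/15 train/validation/test ratio, a fixed random seed, and records as dictionaries keyed by the schema field names; the ratios, the seed, and the function name `stratified_split` are assumptions rather than the dataset's documented procedure, and the sketch does not build the separate low-quality evaluation set.

```python
import random
from collections import Counter, defaultdict


def stratified_split(records, train_frac=0.70, val_frac=0.15, seed=42):
    """Map image_id -> TRAIN/VAL/TEST, splitting each stratum proportionally.

    A stratum is one (disease_target, site_id, image_quality_score) combination.
    Fractions and seed are illustrative assumptions; TEST takes the remainder.
    """
    strata = defaultdict(list)
    for rec in records:
        key = (rec["disease_target"], rec["site_id"], rec["image_quality_score"])
        strata[key].append(rec)

    rng = random.Random(seed)
    assignment = {}
    for key in sorted(strata):  # sorted keys keep the result reproducible
        group = list(strata[key])
        rng.shuffle(group)
        n_train = round(len(group) * train_frac)
        n_val = round(len(group) * val_frac)
        for i, rec in enumerate(group):
            if i < n_train:
                split = "TRAIN"
            elif i < n_train + n_val:
                split = "VAL"
            else:
                split = "TEST"
            assignment[rec["image_id"]] = split
    return assignment


# Two hypothetical strata of 20 images each.
records = [
    {"image_id": f"IMG-{i:04d}", "disease_target": d, "site_id": s, "image_quality_score": q}
    for i, (d, s, q) in enumerate(
        [("MALARIA", "SITE-NGA-014", 4)] * 20 + [("TB", "SITE-KEN-003", 3)] * 20
    )
]
splits = stratified_split(records)
print(sorted(Counter(splits.values()).items()))
# [('TEST', 6), ('TRAIN', 28), ('VAL', 6)]
```

Splitting inside each stratum keeps every (disease, site, quality) combination proportionally represented only when the stratum is large enough; a stratum with a single image lands entirely in TRAIN under this rounding rule.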
Dataset Schema
Each record represents one annotated clinical image. Fields cover image provenance, modality, disease annotation, quality scoring, and dataset split assignment.
| Field Name | Type | Description | Nullable | Example |
|---|---|---|---|---|
| image_id | STRING | Unique image identifier | No | IMG-MAL-NGA-0071432 |
| country_code | STRING | ISO 3166-1 alpha-2 country of collection | No | NG |
| site_id | STRING | Anonymised collection site identifier | No | SITE-NGA-014 |
| collection_date | DATE | Date of image collection (YYYY-MM-DD) | Yes | 2023-04-17 |
| disease_target | ENUM | Disease: MALARIA, TRACHOMA, TB, LEISHMANIA, SCHISTOSOMIASIS, TYPHOID, SICKLE_CELL, HEALTHY | No | MALARIA |
| modality | ENUM | Image modality: BLOOD_SMEAR, EYE_PHOTO, SKIN_PHOTO, CHEST_XRAY, URINE_MICRO | No | BLOOD_SMEAR |
| label | STRING | Primary disease class label (disease-specific taxonomy) | No | P_FALCIPARUM_POSITIVE |
| severity_grade | STRING | Severity or parasitaemia grade (modality-dependent, null if not applicable) | Yes | MODERATE |
| image_quality_score | INTEGER | Annotator-assigned image quality score 1 (unusable) – 5 (excellent) | No | 4 |
| deployment_ready | BOOLEAN | True if image meets quality threshold for model training inclusion | No | true |
| has_bbox | BOOLEAN | True if bounding box annotations are available | No | true |
| bbox_count | INTEGER | Number of bounding boxes (0 if has_bbox is false) | No | 3 |
| annotator_agreement | FLOAT | Inter-annotator agreement score between two clinician reviewers (0–1) | Yes | 0.88 |
| image_filename | STRING | Image file path relative to dataset root | No | images/malaria/blood_smear/IMG-MAL-NGA-0071432.jpg |
| annotation_format | ENUM | Annotation format: COCO, IMAGE_LEVEL | No | COCO |
| split | ENUM | Dataset partition: TRAIN, VAL, TEST, LOW_QUALITY_EVAL | No | TRAIN |
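The constraints in the schema table can be checked when metadata is loaded. A minimal sketch, assuming records are parsed into a Python dataclass whose fields mirror the table; the class name `ImageRecord` and its `validate` method are hypothetical, and only constraints stated in the table are enforced (enum values, ISO alpha-2 shape of `country_code`, the 1–5 quality range, the 0–1 agreement range, and `bbox_count` being 0 when `has_bbox` is false).

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional

DISEASE_TARGETS = {"MALARIA", "TRACHOMA", "TB", "LEISHMANIA", "SCHISTOSOMIASIS",
                   "TYPHOID", "SICKLE_CELL", "HEALTHY"}
MODALITIES = {"BLOOD_SMEAR", "EYE_PHOTO", "SKIN_PHOTO", "CHEST_XRAY", "URINE_MICRO"}
ANNOTATION_FORMATS = {"COCO", "IMAGE_LEVEL"}
SPLITS = {"TRAIN", "VAL", "TEST", "LOW_QUALITY_EVAL"}


@dataclass(frozen=True)
class ImageRecord:
    """One row of the metadata schema; Optional fields are the nullable ones."""
    image_id: str
    country_code: str
    site_id: str
    collection_date: Optional[date]
    disease_target: str
    modality: str
    label: str
    severity_grade: Optional[str]
    image_quality_score: int
    deployment_ready: bool
    has_bbox: bool
    bbox_count: int
    annotator_agreement: Optional[float]
    image_filename: str
    annotation_format: str
    split: str

    def validate(self) -> List[str]:
        """Return violations of the table's constraints (empty list if valid)."""
        errors = []
        # Shape check only: two uppercase letters, not membership in ISO 3166-1.
        if not (len(self.country_code) == 2 and self.country_code.isalpha()
                and self.country_code.isupper()):
            errors.append(f"country_code {self.country_code!r} is not ISO alpha-2")
        if self.disease_target not in DISEASE_TARGETS:
            errors.append(f"unknown disease_target {self.disease_target!r}")
        if self.modality not in MODALITIES:
            errors.append(f"unknown modality {self.modality!r}")
        if self.annotation_format not in ANNOTATION_FORMATS:
            errors.append(f"unknown annotation_format {self.annotation_format!r}")
        if self.split not in SPLITS:
            errors.append(f"unknown split {self.split!r}")
        if not 1 <= self.image_quality_score <= 5:
            errors.append(f"image_quality_score {self.image_quality_score} outside 1-5")
        if self.annotator_agreement is not None and not 0.0 <= self.annotator_agreement <= 1.0:
            errors.append(f"annotator_agreement {self.annotator_agreement} outside 0-1")
        if not self.has_bbox and self.bbox_count != 0:
            errors.append(f"bbox_count is {self.bbox_count} but has_bbox is false")
        return errors


# Example row from the schema table above.
rec = ImageRecord(
    image_id="IMG-MAL-NGA-0071432", country_code="NG", site_id="SITE-NGA-014",
    collection_date=date(2023, 4, 17), disease_target="MALARIA", modality="BLOOD_SMEAR",
    label="P_FALCIPARUM_POSITIVE", severity_grade="MODERATE", image_quality_score=4,
    deployment_ready=True, has_bbox=True, bbox_count=3, annotator_agreement=0.88,
    image_filename="images/malaria/blood_smear/IMG-MAL-NGA-0071432.jpg",
    annotation_format="COCO", split="TRAIN",
)
print(rec.validate())  # []

# A deliberately broken copy: quality out of range and boxes without has_bbox.
bad = replace(rec, image_quality_score=7, has_bbox=False)
print(bad.validate())
# ['image_quality_score 7 outside 1-5', 'bbox_count is 3 but has_bbox is false']
```

Returning a list of messages instead of raising on the first failure makes it easier to audit a whole metadata file in one pass.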
Sample Records
Four representative image metadata records spanning disease targets, modalities, and annotation types.