DS-14 HealthCare

African Tropical Disease Diagnostic Image Dataset

120K+ expert-labelled clinical images spanning malaria blood smears, trachoma eye photographs, skin lesion scans, and chest X-rays — annotated for 8 tropical and endemic diseases to train diagnostic AI models deployable in low-resource African healthcare settings.

Restricted Access — Clinical Data Use Agreement Required
This dataset contains de-identified clinical images. Access requires a signed Clinical Data Use Agreement and confirmation of institutional affiliation. Apply via the request form below.

This is a synthetic dataset generated from high-quality expert-labelled seed data. All records are algorithmically derived — statistical distributions, inter-field correlations, and annotation characteristics faithfully replicate real-world patterns from the source data, while ensuring no real individual, organisation, or transaction can be identified or reconstructed.

The African Tropical Disease Diagnostic Image Dataset contains 120K+ clinical images collected at partner hospitals and community health clinics across Nigeria, Ghana, Kenya, and Tanzania. The dataset covers 8 disease targets: malaria (Giemsa-stained blood smear microscopy), trachoma (external eye photographs, WHO TF/TI grading), cutaneous leishmaniasis (skin lesion photographs), schistosomiasis (urine microscopy), tuberculosis (chest X-ray), typhoid (clinical photograph panels), sickle-cell crisis (peripheral blood smear), and healthy/negative controls for each modality. Images were captured using standardised protocols on optical microscopes with smartphone adaptors, digital fundus cameras, and portable X-ray units.

Each image is labelled by at least two clinicians — a specialist physician and a trained clinical officer — using a consensus adjudication protocol. Labels include: disease class, severity or parasitaemia grade (modality-dependent), image quality score (1–5), and a deployment-readiness flag indicating whether the image meets quality thresholds for model training. Bounding box annotations are available for 40 % of images; the remainder carry image-level class labels only.
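
As a minimal sketch of how these labels might be used, the snippet below keeps only records suitable for object-detection training. It relies on the `deployment_ready`, `has_bbox`, and `image_quality_score` fields listed in the Dataset Schema. The function name and the `min_quality` threshold of 3 are illustrative assumptions, not part of the dataset specification.

```python
from typing import Iterable


def select_detection_training(records: Iterable[dict], min_quality: int = 3) -> list[dict]:
    """Keep records that are flagged deployment-ready, carry bounding boxes,
    and meet a caller-chosen quality score (min_quality is a hypothetical cutoff)."""
    return [
        rec for rec in records
        if rec["deployment_ready"]
        and rec["has_bbox"]
        and rec["image_quality_score"] >= min_quality
    ]


# Toy metadata records (illustrative values only).
records = [
    {"image_id": "A", "deployment_ready": True, "has_bbox": True, "image_quality_score": 4},
    {"image_id": "B", "deployment_ready": True, "has_bbox": False, "image_quality_score": 5},
    {"image_id": "C", "deployment_ready": False, "has_bbox": False, "image_quality_score": 2},
]
selected = select_detection_training(records)
print([rec["image_id"] for rec in selected])
```

Records without bounding boxes (such as "B" here) remain usable for image-level classification, so a classification pipeline would drop the `has_bbox` condition.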

The dataset is split into train / validation / test sets stratified by disease, site, and image quality. A separate low-quality evaluation set is provided to measure model robustness under field conditions. Image modalities are stored in separate sub-directories with modality-specific metadata files. All images are provided as JPEG (photographs, eye images) or 16-bit TIFF (microscopy, X-ray) with JSON annotation sidecars in COCO format.
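
One way to check the split composition before training is to count images per split and disease target, and to pull out the separate low-quality evaluation set. The sketch below assumes metadata records have already been loaded as Python dictionaries with the `split` and `disease_target` fields from the Dataset Schema; the helper names are hypothetical.

```python
from collections import Counter, defaultdict


def per_split_disease_counts(records):
    """Map each split name to a Counter of disease_target values."""
    counts = defaultdict(Counter)
    for rec in records:
        counts[rec["split"]][rec["disease_target"]] += 1
    return counts


def low_quality_eval_set(records):
    """Return only the records reserved for field-condition robustness evaluation."""
    return [rec for rec in records if rec["split"] == "LOW_QUALITY_EVAL"]


# Toy metadata records (illustrative values only).
records = [
    {"image_id": "a", "split": "TRAIN", "disease_target": "MALARIA"},
    {"image_id": "b", "split": "TRAIN", "disease_target": "TB"},
    {"image_id": "c", "split": "VAL", "disease_target": "MALARIA"},
    {"image_id": "d", "split": "LOW_QUALITY_EVAL", "disease_target": "HEALTHY"},
    {"image_id": "e", "split": "TRAIN", "disease_target": "MALARIA"},
]
counts = per_split_disease_counts(records)
robustness = low_quality_eval_set(records)
```

Keeping `LOW_QUALITY_EVAL` out of the ordinary test metrics makes it easier to report robustness under field conditions as a separate number.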

Key Use Cases

Malaria parasite detection and species classification from blood smears
Trachoma grading for mass drug administration targeting
Tuberculosis screening from chest X-rays in low-resource settings
Skin lesion classification for cutaneous disease triage
Point-of-care diagnostic AI for community health workers
Telemedicine image triage and specialist referral decision support
Transfer learning for new tropical disease imaging tasks
Model robustness evaluation under low-quality field image conditions

Dataset Highlights

Total Images: 120K+ expert-labelled clinical images
Disease Targets: 8 (malaria, trachoma, TB, leishmania…)
Image Modalities: 4 (microscopy, eye, skin, chest X-ray)
BBox Annotations: 40 % of images have bounding boxes (COCO)

Compatible Frameworks & Formats

🤖 PyTorch / torchvision
🤖 TensorFlow / Keras
📦 Ultralytics YOLOv8
🖼️ COCO annotation format
📡 JPEG + 16-bit TIFF
☁️ Roboflow / AWS Rekognition compatible
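
Since the bounding boxes ship in COCO format while YOLO-style trainers expect normalised centre coordinates, a conversion step is typically needed. The sketch below uses the standard COCO box layout `[x_min, y_min, width, height]` in pixels and produces the YOLO layout `(x_center, y_center, width, height)` scaled to the image size. The function name is hypothetical, and the image dimensions are assumed to come from the COCO `images` entry.

```python
def coco_to_yolo(bbox, img_width, img_height):
    """Convert a COCO [x_min, y_min, w, h] pixel box to YOLO
    (x_center, y_center, w, h), each normalised to the range 0-1."""
    x_min, y_min, w, h = bbox
    return (
        (x_min + w / 2) / img_width,
        (y_min + h / 2) / img_height,
        w / img_width,
        h / img_height,
    )


# A 200x100 box whose top-left corner is at (100, 50) in a 400x200 image.
yolo_box = coco_to_yolo([100, 50, 200, 100], img_width=400, img_height=200)
print(yolo_box)
```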

Geographic Coverage

Collection sites: Nigeria, Ghana, Kenya, and Tanzania.

Dataset Schema

Each record represents one annotated clinical image. Fields cover image provenance, modality, disease annotation, quality scoring, and dataset split assignment.

| Field Name | Type | Description | Nullable | Example |
| --- | --- | --- | --- | --- |
| image_id | STRING | Unique image identifier | No | IMG-MAL-NGA-0071432 |
| country_code | STRING | ISO 3166-1 alpha-2 country of collection | No | NG |
| site_id | STRING | Anonymised collection site identifier | No | SITE-NGA-014 |
| collection_date | DATE | Date of image collection (YYYY-MM-DD) | Yes | 2023-04-17 |
| disease_target | ENUM | Disease: MALARIA, TRACHOMA, TB, LEISHMANIA, SCHISTOSOMIASIS, TYPHOID, SICKLE_CELL, HEALTHY | No | MALARIA |
| modality | ENUM | Image modality: BLOOD_SMEAR, EYE_PHOTO, SKIN_PHOTO, CHEST_XRAY, URINE_MICRO | No | BLOOD_SMEAR |
| label | STRING | Primary disease class label (disease-specific taxonomy) | No | P_FALCIPARUM_POSITIVE |
| severity_grade | STRING | Severity or parasitaemia grade (modality-dependent, null if not applicable) | Yes | MODERATE |
| image_quality_score | INTEGER | Annotator-assigned image quality score 1 (unusable) – 5 (excellent) | No | 4 |
| deployment_ready | BOOLEAN | True if image meets quality threshold for model training inclusion | No | true |
| has_bbox | BOOLEAN | True if bounding box annotations are available | No | true |
| bbox_count | INTEGER | Number of bounding boxes (0 if has_bbox is False) | No | 3 |
| annotator_agreement | FLOAT | Inter-annotator agreement score between two clinician reviewers (0–1) | Yes | 0.88 |
| image_filename | STRING | Image file path relative to dataset root | No | images/malaria/blood_smear/IMG-MAL-NGA-0071432.jpg |
| annotation_format | ENUM | Annotation format: COCO, IMAGE_LEVEL | No | COCO |
| split | ENUM | Dataset partition: TRAIN, VAL, TEST, LOW_QUALITY_EVAL | No | TRAIN |
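
The constraints in the schema above can be checked programmatically before records enter a training pipeline. The following is a minimal sketch, not an official validator: the enum values and ranges are copied from the schema, while `validate_record` and its error messages are hypothetical.

```python
DISEASES = {"MALARIA", "TRACHOMA", "TB", "LEISHMANIA", "SCHISTOSOMIASIS",
            "TYPHOID", "SICKLE_CELL", "HEALTHY"}
MODALITIES = {"BLOOD_SMEAR", "EYE_PHOTO", "SKIN_PHOTO", "CHEST_XRAY", "URINE_MICRO"}
SPLITS = {"TRAIN", "VAL", "TEST", "LOW_QUALITY_EVAL"}
FORMATS = {"COCO", "IMAGE_LEVEL"}


def validate_record(rec: dict) -> list[str]:
    """Return a list of problems found in one metadata record (empty if none)."""
    problems = []
    # Enum fields must take one of the values listed in the schema.
    for field, allowed in (("disease_target", DISEASES), ("modality", MODALITIES),
                           ("split", SPLITS), ("annotation_format", FORMATS)):
        if rec.get(field) not in allowed:
            problems.append(f"{field}: unexpected value {rec.get(field)!r}")
    # Quality score is an integer from 1 (unusable) to 5 (excellent).
    score = rec.get("image_quality_score")
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 5:
        problems.append("image_quality_score: must be an integer from 1 to 5")
    # bbox_count must be 0 when no bounding boxes are available.
    if not rec.get("has_bbox") and rec.get("bbox_count") != 0:
        problems.append("bbox_count: must be 0 when has_bbox is false")
    # annotator_agreement is nullable, otherwise a score between 0 and 1.
    agreement = rec.get("annotator_agreement")
    if agreement is not None and not 0 <= agreement <= 1:
        problems.append("annotator_agreement: must lie in [0, 1] or be null")
    return problems


good = {
    "disease_target": "MALARIA", "modality": "BLOOD_SMEAR", "split": "TRAIN",
    "annotation_format": "COCO", "image_quality_score": 4,
    "has_bbox": True, "bbox_count": 3, "annotator_agreement": 0.88,
}
# Deliberately broken copy: score out of range, and boxes counted without has_bbox.
bad = dict(good, image_quality_score=7, has_bbox=False)

print(validate_record(good))
print(validate_record(bad))
```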

Sample Records

Four representative image metadata records spanning disease targets, modalities, and annotation types.

tropical_disease_sample.json
[
  {
    "image_id": "IMG-MAL-NGA-0071432",
    "country_code": "NG",
    "site_id": "SITE-NGA-014",
    "collection_date": "2023-04-17",
    "disease_target": "MALARIA",
    "modality": "BLOOD_SMEAR",
    "label": "P_FALCIPARUM_POSITIVE",
    "severity_grade": "MODERATE",
    "image_quality_score": 4,
    "deployment_ready": true,
    "has_bbox": true,
    "bbox_count": 3,
    "annotator_agreement": 0.88,
    "image_filename": "images/malaria/blood_smear/IMG-MAL-NGA-0071432.jpg",
    "annotation_format": "COCO",
    "split": "TRAIN"
  },
  {
    "image_id": "IMG-TRC-GHA-0038901",
    "country_code": "GH",
    "site_id": "SITE-GHA-006",
    "collection_date": "2022-09-05",
    "disease_target": "TRACHOMA",
    "modality": "EYE_PHOTO",
    "label": "TF_POSITIVE",
    "severity_grade": "GRADE_TF1",
    "image_quality_score": 5,
    "deployment_ready": true,
    "has_bbox": false,
    "bbox_count": 0,
    "annotator_agreement": 0.95,
    "image_filename": "images/trachoma/eye_photo/IMG-TRC-GHA-0038901.jpg",
    "annotation_format": "IMAGE_LEVEL",
    "split": "VAL"
  },
  {
    "image_id": "IMG-TBC-KEN-0054217",
    "country_code": "KE",
    "site_id": "SITE-KEN-009",
    "collection_date": "2023-01-22",
    "disease_target": "TB",
    "modality": "CHEST_XRAY",
    "label": "TB_POSITIVE",
    "severity_grade": "MODERATE",
    "image_quality_score": 3,
    "deployment_ready": true,
    "has_bbox": true,
    "bbox_count": 2,
    "annotator_agreement": 0.79,
    "image_filename": "images/tb/chest_xray/IMG-TBC-KEN-0054217.tiff",
    "annotation_format": "COCO",
    "split": "TRAIN"
  },
  {
    "image_id": "IMG-HLT-TZA-0092104",
    "country_code": "TZ",
    "site_id": "SITE-TZA-002",
    "collection_date": "2022-06-30",
    "disease_target": "HEALTHY",
    "modality": "BLOOD_SMEAR",
    "label": "NEGATIVE",
    "severity_grade": null,
    "image_quality_score": 2,
    "deployment_ready": false,
    "has_bbox": false,
    "bbox_count": 0,
    "annotator_agreement": 1,
    "image_filename": "images/healthy/blood_smear/IMG-HLT-TZA-0092104.jpg",
    "annotation_format": "IMAGE_LEVEL",
    "split": "LOW_QUALITY_EVAL"
  }
]
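
Because `image_filename` is relative to the dataset root and the file extension differs between JPEG and TIFF images, a loader will usually resolve paths and branch on the suffix. The sketch below parses two trimmed records from the sample above and groups their resolved paths by extension. The root directory `/data/tropical_disease` and the helper name are assumptions for illustration only.

```python
import json
from pathlib import PurePosixPath

# Two records from the sample, trimmed to the fields used here.
SAMPLE_JSON = """
[
  {"image_id": "IMG-MAL-NGA-0071432",
   "image_filename": "images/malaria/blood_smear/IMG-MAL-NGA-0071432.jpg"},
  {"image_id": "IMG-TBC-KEN-0054217",
   "image_filename": "images/tb/chest_xray/IMG-TBC-KEN-0054217.tiff"}
]
"""


def index_by_extension(records, dataset_root):
    """Resolve each image path against the dataset root and group by file suffix."""
    index = {}
    for rec in records:
        path = PurePosixPath(dataset_root) / rec["image_filename"]
        index.setdefault(path.suffix.lower(), []).append(path)
    return index


records = json.loads(SAMPLE_JSON)
index = index_by_extension(records, "/data/tropical_disease")  # hypothetical root
for suffix, paths in sorted(index.items()):
    print(suffix, [str(p) for p in paths])
```

Grouping by suffix lets a pipeline route `.tiff` files to a 16-bit-aware reader while sending `.jpg` files through a standard 8-bit decoder.
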
Request Dataset Access

All datasets are available under a commercial licence agreement. Our team typically responds within 2 business days.

NDA may be required
