DS-06 Agriculture

Crop Disease & Pest Detection Image Dataset

80K+ field-captured images of crop diseases and pest damage across cassava, maize, and cocoa — each frame expert-annotated with bounding boxes, disease class labels, and severity scores for training computer vision early-warning systems.

This is a synthetic dataset generated from high-quality expert-labelled seed data. All records are algorithmically derived — statistical distributions, inter-field correlations, and annotation characteristics faithfully replicate real-world patterns from the source data, while ensuring no real individual, organisation, or transaction can be identified or reconstructed.

The African Crop Disease & Pest Detection Image Dataset contains 80K+ high-resolution field photographs collected across Nigeria, Ghana, Cameroon, and Côte d'Ivoire, four countries that together account for a significant share of West and Central Africa's cassava, maize, and cocoa production. Images were captured under varying lighting and weather conditions using standardised capture protocols across the devices recorded in capture_device (smartphone, drone, or DSLR), giving a realistic distribution of image quality for field-deployment models.

Each image is annotated by plant pathologists and certified agronomists with: a primary disease or pest class (29 distinct classes across the three crop types); a severity score on a 0–4 scale aligned with CABI severity standards; bounding boxes, plus pixel-level segmentation masks where available; and the crop growth stage at time of capture. The class taxonomy covers fungal, bacterial, and viral diseases as well as major pest species including fall armyworm, cassava mealybug, and cocoa mirids.

The dataset is split into train / validation / test partitions stratified by crop type, country, and severity distribution. A separate synthetic augmentation set, generated via diffusion-model upsampling and flagged with the split value SYNTHETIC, is included for regularisation experiments. Metadata is stored in JSON sidecar files, with annotations available in standard formats (COCO, YOLO, Pascal VOC).
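
One way to sanity-check the stratification described above is to compare the share of each stratum across partitions. The sketch below assumes the sidecar metadata has already been loaded into a list of dicts keyed by the field names in the Dataset Schema; the function name `stratum_shares` and the toy records are hypothetical and not part of the dataset tooling.

```python
from collections import Counter, defaultdict

def stratum_shares(records, keys=("crop_type", "country_code", "severity_score")):
    """Return {split: {stratum: share}} for a list of metadata dicts."""
    counts = defaultdict(Counter)
    for rec in records:
        stratum = tuple(rec[k] for k in keys)
        counts[rec["split"]][stratum] += 1
    shares = {}
    for split, counter in counts.items():
        total = sum(counter.values())
        shares[split] = {s: n / total for s, n in counter.items()}
    return shares

# Toy records (hypothetical values, only the fields needed here).
records = [
    {"split": "TRAIN", "crop_type": "CASSAVA", "country_code": "NG", "severity_score": 2},
    {"split": "TRAIN", "crop_type": "CASSAVA", "country_code": "NG", "severity_score": 2},
    {"split": "TRAIN", "crop_type": "MAIZE", "country_code": "GH", "severity_score": 3},
    {"split": "TRAIN", "crop_type": "MAIZE", "country_code": "GH", "severity_score": 3},
    {"split": "TEST", "crop_type": "CASSAVA", "country_code": "NG", "severity_score": 2},
    {"split": "TEST", "crop_type": "MAIZE", "country_code": "GH", "severity_score": 3},
]
shares = stratum_shares(records)
```

If the partitions are well stratified, the share of a given stratum should be close across TRAIN, VAL, and TEST; large gaps would suggest re-checking how the metadata was loaded or filtered.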

Key Use Cases

Real-time crop disease detection via mobile app (on-device inference)
Farm early-warning system integration with satellite NDVI triggers
Extension-service triage: severity-ranked alert routing to agronomists
Multi-class pest identification for input advisory engines
Transfer learning baseline for new crop or regional adaptation
Synthetic data generation evaluation and benchmark
Agri-insurance loss estimation from image-based severity scores
Research: domain adaptation from lab to in-field image conditions
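
As an illustration of the severity-ranked triage use case, the sketch below orders metadata records so the most severe cases reach an agronomist first, breaking ties by annotation confidence. The function name `rank_alerts`, the default threshold of 2, and the choice to treat a null confidence as 0.0 are all assumptions made for this example.

```python
def rank_alerts(records, min_severity=2):
    """Return records at or above min_severity, most severe first."""
    flagged = [r for r in records if r["severity_score"] >= min_severity]
    return sorted(
        flagged,
        key=lambda r: (
            -r["severity_score"],
            # Null confidence (e.g. synthetic images) is treated as 0.0 here.
            -(r.get("annotation_confidence") or 0.0),
        ),
    )

# Severity and confidence values taken from the sample records.
records = [
    {"image_id": "IMG-NGA-0048213", "severity_score": 2, "annotation_confidence": 0.92},
    {"image_id": "IMG-GHA-0011874", "severity_score": 3, "annotation_confidence": 0.88},
    {"image_id": "IMG-CIV-0027341", "severity_score": 4, "annotation_confidence": 0.79},
    {"image_id": "IMG-SYN-0090012", "severity_score": 1, "annotation_confidence": None},
]
ranked_ids = [r["image_id"] for r in rank_alerts(records)]
```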

Dataset Highlights

Total Images: 80K+ field-captured frames
Disease / Pest Classes: 29, across cassava, maize, and cocoa
Annotation Type: BBox + Mask (COCO / YOLO / VOC compatible)
Severity Levels: 0–4 (CABI-aligned scale)

Compatible Frameworks & Formats

🐍 PyTorch / torchvision
🤖 TensorFlow / Keras
📦 Ultralytics YOLOv8
🖼️ COCO / Pascal VOC / YOLO
☁️ Roboflow / AWS Rekognition
📡 Google Earth Engine (EVI trigger)
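
When moving between the listed annotation formats, bounding boxes usually need converting. A minimal sketch, assuming the standard conventions: COCO boxes are `[x_min, y_min, width, height]` in pixels, and YOLO boxes are `(x_center, y_center, width, height)` normalised by image size. The function names and the example box are hypothetical; the image size matches the schema example (1920 × 1080).

```python
def coco_to_yolo(bbox, image_width_px, image_height_px):
    """Convert a COCO pixel box to a normalised YOLO box."""
    x_min, y_min, w, h = bbox
    return (
        (x_min + w / 2) / image_width_px,
        (y_min + h / 2) / image_height_px,
        w / image_width_px,
        h / image_height_px,
    )

def yolo_to_coco(bbox, image_width_px, image_height_px):
    """Convert a normalised YOLO box back to a COCO pixel box."""
    x_center, y_center, w, h = bbox
    box_w = w * image_width_px
    box_h = h * image_height_px
    return (
        x_center * image_width_px - box_w / 2,
        y_center * image_height_px - box_h / 2,
        box_w,
        box_h,
    )

# Hypothetical box covering the central quarter of a 1920 x 1080 image.
yolo_box = coco_to_yolo((480, 270, 960, 540), 1920, 1080)
coco_box = yolo_to_coco(yolo_box, 1920, 1080)
```

The `image_width_px` and `image_height_px` metadata fields supply the normalisation constants, so the conversion can be done without opening the image file.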

Geographic Coverage

Primary coverage: Nigeria, Ghana, Cameroon, and Côte d'Ivoire.

Dataset Schema

Each record represents one annotated image. Metadata fields cover image provenance, annotation details, crop and disease taxonomy, severity, and dataset split assignment.

| Field Name | Type | Description | Nullable | Example |
| --- | --- | --- | --- | --- |
| image_id | STRING | Unique image identifier | No | IMG-NGA-0048213 |
| country_code | STRING | ISO 3166-1 alpha-2 country of capture | No | NG |
| capture_date | DATE | Date of field capture (YYYY-MM-DD) | Yes | 2023-08-14 |
| crop_type | ENUM | Crop photographed: CASSAVA, MAIZE, COCOA | No | CASSAVA |
| growth_stage | ENUM | Crop growth stage: SEEDLING, VEGETATIVE, FLOWERING, MATURITY | Yes | VEGETATIVE |
| disease_class | STRING | Primary annotated disease or pest class (29 total classes) | No | Cassava Brown Streak Disease |
| disease_category | ENUM | Disease origin: FUNGAL, BACTERIAL, VIRAL, PEST, HEALTHY | No | VIRAL |
| severity_score | INTEGER | CABI-aligned severity, 0 (healthy) to 4 (severe) | No | 2 |
| bbox_count | INTEGER | Number of bounding box annotations in the image | No | 3 |
| has_segmentation | BOOLEAN | True if a pixel-level segmentation mask is available | No | false |
| image_width_px | INTEGER | Image width in pixels | No | 1920 |
| image_height_px | INTEGER | Image height in pixels | No | 1080 |
| capture_device | ENUM | Capture method: SMARTPHONE, DRONE, DSLR | Yes | SMARTPHONE |
| annotator_id | STRING | Anonymised annotator identifier | No | ANN-047 |
| annotation_confidence | FLOAT | Inter-annotator agreement score for this image (0–1) | Yes | 0.92 |
| is_synthetic | BOOLEAN | True if the image was generated by diffusion-model augmentation | No | false |
| split | ENUM | Dataset partition: TRAIN, VAL, TEST, SYNTHETIC | No | TRAIN |
| annotation_format | ENUM | Annotation format available: COCO, YOLO, VOC | No | COCO |
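
The schema's enums, nullability rules, and value ranges can be turned into a lightweight check before training. The sketch below is one possible validator written directly from the table above; the function name `validate_record` and the wording of the error messages are hypothetical, and it covers only a subset of fields.

```python
from datetime import date

ENUMS = {
    "crop_type": {"CASSAVA", "MAIZE", "COCOA"},
    "growth_stage": {"SEEDLING", "VEGETATIVE", "FLOWERING", "MATURITY"},
    "disease_category": {"FUNGAL", "BACTERIAL", "VIRAL", "PEST", "HEALTHY"},
    "capture_device": {"SMARTPHONE", "DRONE", "DSLR"},
    "split": {"TRAIN", "VAL", "TEST", "SYNTHETIC"},
    "annotation_format": {"COCO", "YOLO", "VOC"},
}
# Fields marked "Yes" in the Nullable column.
NULLABLE = {"capture_date", "growth_stage", "capture_device", "annotation_confidence"}

def validate_record(rec):
    """Return a list of problems found; an empty list means the record passed."""
    problems = []
    for field, allowed in ENUMS.items():
        value = rec.get(field)
        if value is None:
            if field not in NULLABLE:
                problems.append(f"{field} must not be null")
        elif value not in allowed:
            problems.append(f"{field} has unexpected value {value!r}")
    severity = rec.get("severity_score")
    if not isinstance(severity, int) or not 0 <= severity <= 4:
        problems.append("severity_score must be an integer in 0..4")
    confidence = rec.get("annotation_confidence")
    if confidence is not None and not 0.0 <= confidence <= 1.0:
        problems.append("annotation_confidence must be in [0, 1]")
    if rec.get("capture_date") is not None:
        try:
            date.fromisoformat(rec["capture_date"])
        except ValueError:
            problems.append("capture_date must be YYYY-MM-DD")
    return problems

good = {
    "crop_type": "CASSAVA", "growth_stage": "VEGETATIVE", "disease_category": "VIRAL",
    "capture_device": "SMARTPHONE", "split": "TRAIN", "annotation_format": "COCO",
    "severity_score": 2, "annotation_confidence": 0.92, "capture_date": "2023-08-14",
}
bad = dict(good, crop_type="RICE", severity_score=7)
good_problems = validate_record(good)
bad_problems = validate_record(bad)
```

Here `good` mirrors the example column of the schema, while `bad` deliberately breaks the crop enum and the severity range so that both checks fire.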

Sample Records

Four representative metadata records spanning crop types, disease categories, severity levels, and capture devices.

crop_disease_sample.json
[
  {
    "image_id": "IMG-NGA-0048213",
    "country_code": "NG",
    "capture_date": "2023-08-14",
    "crop_type": "CASSAVA",
    "growth_stage": "VEGETATIVE",
    "disease_class": "Cassava Brown Streak Disease",
    "disease_category": "VIRAL",
    "severity_score": 2,
    "bbox_count": 3,
    "has_segmentation": false,
    "image_width_px": 1920,
    "image_height_px": 1080,
    "capture_device": "SMARTPHONE",
    "annotator_id": "ANN-047",
    "annotation_confidence": 0.92,
    "is_synthetic": false,
    "split": "TRAIN",
    "annotation_format": "COCO"
  },
  {
    "image_id": "IMG-GHA-0011874",
    "country_code": "GH",
    "capture_date": "2023-06-22",
    "crop_type": "MAIZE",
    "growth_stage": "FLOWERING",
    "disease_class": "Fall Armyworm",
    "disease_category": "PEST",
    "severity_score": 3,
    "bbox_count": 7,
    "has_segmentation": true,
    "image_width_px": 4032,
    "image_height_px": 3024,
    "capture_device": "DSLR",
    "annotator_id": "ANN-012",
    "annotation_confidence": 0.88,
    "is_synthetic": false,
    "split": "TRAIN",
    "annotation_format": "COCO"
  },
  {
    "image_id": "IMG-CIV-0027341",
    "country_code": "CI",
    "capture_date": "2022-11-05",
    "crop_type": "COCOA",
    "growth_stage": "MATURITY",
    "disease_class": "Black Pod Disease",
    "disease_category": "FUNGAL",
    "severity_score": 4,
    "bbox_count": 5,
    "has_segmentation": true,
    "image_width_px": 4000,
    "image_height_px": 3000,
    "capture_device": "SMARTPHONE",
    "annotator_id": "ANN-091",
    "annotation_confidence": 0.79,
    "is_synthetic": false,
    "split": "TEST",
    "annotation_format": "YOLO"
  },
  {
    "image_id": "IMG-SYN-0090012",
    "country_code": "NG",
    "capture_date": null,
    "crop_type": "CASSAVA",
    "growth_stage": "VEGETATIVE",
    "disease_class": "Cassava Mosaic Disease",
    "disease_category": "VIRAL",
    "severity_score": 1,
    "bbox_count": 2,
    "has_segmentation": false,
    "image_width_px": 1024,
    "image_height_px": 1024,
    "capture_device": null,
    "annotator_id": "SYN-GEN",
    "annotation_confidence": null,
    "is_synthetic": true,
    "split": "SYNTHETIC",
    "annotation_format": "YOLO"
  }
]
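
The synthetic record in the sample shows why nullable fields need care: `capture_date`, `capture_device`, and `annotation_confidence` are all null for it. The sketch below parses an abbreviated copy of the four sample records (only the fields used here) and computes simple summaries while skipping nulls; the variable names are illustrative only.

```python
import json
from statistics import mean

# Abbreviated copy of the four sample records; values match the sample file.
sample_json = """
[
  {"image_id": "IMG-NGA-0048213", "is_synthetic": false, "annotation_confidence": 0.92, "has_segmentation": false},
  {"image_id": "IMG-GHA-0011874", "is_synthetic": false, "annotation_confidence": 0.88, "has_segmentation": true},
  {"image_id": "IMG-CIV-0027341", "is_synthetic": false, "annotation_confidence": 0.79, "has_segmentation": true},
  {"image_id": "IMG-SYN-0090012", "is_synthetic": true, "annotation_confidence": null, "has_segmentation": false}
]
"""
records = json.loads(sample_json)  # JSON null becomes Python None

# Keep only images that were not produced by augmentation.
real = [r for r in records if not r["is_synthetic"]]

# Average inter-annotator agreement, ignoring records where it is null.
scores = [r["annotation_confidence"] for r in records
          if r["annotation_confidence"] is not None]
mean_confidence = mean(scores)

# Images usable for segmentation experiments.
with_masks = [r["image_id"] for r in records if r["has_segmentation"]]
```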
Request Dataset Access

All datasets are available under a commercial licence agreement. Our team typically responds within 2 business days.

NDA may be required
