DS-13 HealthCare

African Patient Electronic Health Record Dataset

1M+ de-identified patient episodes from tertiary and secondary hospitals in Nigeria and Ghana — covering demographics, ICD-10 diagnoses, treatment pathways, lab results, and discharge outcomes — purpose-built for clinical decision support, disease surveillance, and health-system AI in African contexts.

Restricted Access — Ethics Review Required
This dataset contains de-identified clinical data. Access requires submission of an institutional ethics approval letter and signing of the DataLens African Data Use Agreement. Apply via the request form below.

This is a synthetic dataset generated from high-quality expert-labelled seed data. All records are algorithmically derived — statistical distributions, inter-field correlations, and annotation characteristics faithfully replicate real-world patterns from the source data, while ensuring no real individual, organisation, or transaction can be identified or reconstructed.

The African Patient Electronic Health Record Dataset aggregates 1M+ de-identified patient episodes from 14 tertiary and secondary hospitals across Nigeria (Lagos, Abuja, Kano, Port Harcourt) and Ghana (Accra, Kumasi, Tamale). Each episode covers a single inpatient admission or outpatient visit and includes structured fields for patient demographics, presenting complaint, ICD-10-CM diagnosis codes (primary + up to 4 secondary), treatment interventions (procedure codes, prescribed medications), selected laboratory results, and discharge outcome.

De-identification used a two-stage pipeline. First, a rule-based redactor replaced direct identifiers (names, medical record numbers) and shifted dates by a random per-patient offset. Second, a statistical disclosure control review suppressed rare combinations of quasi-identifiers. All data processing was conducted under ethics approvals from the respective hospital institutional review boards and the DataLens Africa Research Ethics Committee. The dataset complies with the Nigeria Data Protection Regulation (NDPR) and Ghana's Data Protection Act.
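The suppression of rare quasi-identifier combinations can be illustrated with a simple k-anonymity-style filter. This is a minimal sketch, not the pipeline actually used: the choice of quasi-identifier columns, the threshold `k`, and the function name `suppress_rare_combinations` are all assumptions, and a real disclosure control review may generalise values rather than drop records.

```python
from collections import Counter

# Hypothetical quasi-identifier columns; the source does not say which were used.
QUASI_IDENTIFIERS = ("country_code", "age_group", "gender", "primary_diagnosis")


def suppress_rare_combinations(records, k=5, quasi_identifiers=QUASI_IDENTIFIERS):
    """Keep only records whose quasi-identifier combination occurs at least k times."""

    def key(record):
        return tuple(record[q] for q in quasi_identifiers)

    # Count how often each combination of quasi-identifier values appears.
    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] >= k]
```

With `k=5`, a record whose combination of country, age group, gender, and primary diagnosis appears only once would be dropped, while combinations shared by five or more records are kept.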

The dataset is structured to support a wide range of clinical AI tasks: supervised classification of diagnosis and readmission risk, survival analysis, treatment-effect estimation, and NLP extraction from free-text clinical notes (where available). A separate ICD-10 code co-occurrence graph and a hospital-level metadata file (bed capacity, facility type, urban/rural flag) are provided as companion files to enable multi-level modelling.
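To give a sense of what a code co-occurrence graph captures, the sketch below counts how often pairs of ICD-10 codes appear together in the same episode, using the `primary_diagnosis` and `secondary_diagnoses` fields from the schema. The format of the companion graph file is not described here, so `cooccurrence_counts` is a hypothetical helper showing one way such edge weights could be derived, not how the provided file was built.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(episodes):
    """Count unordered pairs of diagnosis codes that share an episode."""
    counts = Counter()
    for ep in episodes:
        # secondary_diagnoses is nullable, so treat None as an empty list.
        codes = {ep["primary_diagnosis"], *(ep.get("secondary_diagnoses") or [])}
        # Sorting gives each pair a canonical order, e.g. ("E11.9", "I10").
        for a, b in combinations(sorted(codes), 2):
            counts[(a, b)] += 1
    return counts
```

Each key in the returned counter can be read as an undirected edge between two codes, with the count as its weight.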

Key Use Cases

Clinical decision support: diagnosis and treatment recommendation
30-day readmission risk prediction
Disease burden and epidemiological surveillance modelling
Maternal and child health outcome analysis
Lab result anomaly detection and early deterioration alerts
ICD-10 code auto-suggestion from clinical notes (NLP)
Health-system capacity planning and resource allocation models
Bias and equity auditing of clinical AI across demographics
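
As a starting point for the readmission and equity-auditing use cases, the sketch below computes the observed 30-day readmission rate per subgroup. It assumes episodes are loaded as dictionaries keyed by the schema's field names; `readmission_rate_by` is a hypothetical helper, and episodes where `readmitted_30d` is null are excluded rather than counted as negatives.

```python
from collections import defaultdict


def readmission_rate_by(episodes, field):
    """Return {group value: share of labelled episodes readmitted within 30 days}."""
    totals = defaultdict(int)
    readmits = defaultdict(int)
    for ep in episodes:
        label = ep.get("readmitted_30d")
        if label is None:
            # Nullable field: skip episodes with no readmission label.
            continue
        totals[ep[field]] += 1
        readmits[ep[field]] += int(label)
    return {group: readmits[group] / totals[group] for group in totals}
```

Calling `readmission_rate_by(episodes, "age_group")` or `readmission_rate_by(episodes, "facility_type")` gives a quick view of how the label is distributed before any model is trained.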

Dataset Highlights

Patient Episodes: 1M+ (inpatient + outpatient visits)
Hospitals: 14 (tertiary and secondary facilities)
ICD-10 Codes: 2,400+ (unique diagnosis codes observed)
Ethics Compliant: NDPR + DPA (Nigeria & Ghana data protection)

Geographic Coverage

[Map legend: Primary Coverage (Nigeria, Ghana); Other Regions]

Dataset Schema

Each record represents one patient episode (an inpatient admission, outpatient visit, or emergency presentation, as recorded in episode_type). All direct identifiers have been removed; dates are shifted by a per-patient random offset that preserves temporal ordering within a patient's history.
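
Per-patient date shifting can be sketched as follows. This is an illustration under assumptions, not the redactor actually used: the offset range, the `shift_dates` name, and the input shape (pairs of patient identifier and date) are all hypothetical. Because every date for a given patient moves by the same offset, both the ordering and the intervals between that patient's episodes are preserved.

```python
import random
from datetime import date, timedelta


def shift_dates(episodes, max_shift_days=365, seed=None):
    """Shift each (patient_id, date) pair by one random offset per patient."""
    rng = random.Random(seed)
    offsets = {}  # patient_id -> timedelta, drawn once and reused
    shifted = []
    for patient_id, admitted in episodes:
        if patient_id not in offsets:
            days = rng.randint(-max_shift_days, max_shift_days)
            offsets[patient_id] = timedelta(days=days)
        shifted.append((patient_id, admitted + offsets[patient_id]))
    return shifted
```

For example, two admissions ten days apart for the same patient remain ten days apart after shifting, while dates from different patients are no longer comparable.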

| Field Name | Type | Description | Nullable | Example |
| --- | --- | --- | --- | --- |
| episode_id | STRING | Unique episode identifier | No | EP-NGA-LG-00841923 |
| patient_id | STRING | Anonymised persistent patient identifier (links episodes for same patient) | No | PAT-NGA-0049182 |
| country_code | STRING | ISO 3166-1 alpha-2 country code | No | NG |
| facility_id | STRING | Anonymised hospital / facility identifier | No | FAC-NGA-007 |
| facility_type | ENUM | Facility level: TERTIARY, SECONDARY | No | TERTIARY |
| episode_type | ENUM | Visit type: INPATIENT, OUTPATIENT, EMERGENCY | No | INPATIENT |
| admission_year | INTEGER | Shifted year of admission (temporal ordering preserved within patient) | No | 2022 |
| age_group | ENUM | Patient age group: INFANT, CHILD, ADOLESCENT, ADULT, ELDERLY | No | ADULT |
| gender | ENUM | Patient gender: MALE, FEMALE | No | FEMALE |
| primary_diagnosis | STRING | Primary ICD-10-CM diagnosis code | No | A01.0 |
| secondary_diagnoses | JSON | Array of up to 4 secondary ICD-10-CM codes | Yes | ["E11.9", "I10"] |
| length_of_stay_days | INTEGER | Inpatient length of stay in days (null for outpatient) | Yes | 5 |
| lab_results_summary | JSON | Key-value pairs of selected lab test results (test name → value + unit) | Yes | {"HB": "9.2 g/dL", "WBC": "11.4 k/uL"} |
| discharge_outcome | ENUM | Episode outcome: DISCHARGED, REFERRED, DECEASED, ABSCONDED | No | DISCHARGED |
| readmitted_30d | BOOLEAN | True if patient was readmitted within 30 days of discharge | Yes | false |
| has_clinical_notes | BOOLEAN | True if free-text clinical notes are available for this episode | No | true |
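
The schema above can be turned into a lightweight record check and a lab-value parser. The sketch below is an assumption-laden illustration rather than an official validator: `validate_episode` and `parse_lab_value` are hypothetical names, the enum values and nullability are copied from the table, and the lab parser assumes values are strings that are either a number followed by a unit or a qualitative result.

```python
import re

ENUMS = {
    "facility_type": {"TERTIARY", "SECONDARY"},
    "episode_type": {"INPATIENT", "OUTPATIENT", "EMERGENCY"},
    "age_group": {"INFANT", "CHILD", "ADOLESCENT", "ADULT", "ELDERLY"},
    "gender": {"MALE", "FEMALE"},
    "discharge_outcome": {"DISCHARGED", "REFERRED", "DECEASED", "ABSCONDED"},
}

# Fields marked "No" in the Nullable column.
REQUIRED = (
    "episode_id", "patient_id", "country_code", "facility_id", "facility_type",
    "episode_type", "admission_year", "age_group", "gender",
    "primary_diagnosis", "discharge_outcome", "has_clinical_notes",
)

_NUMERIC = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*(\S.*)?$")


def validate_episode(ep):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    for field in REQUIRED:
        if ep.get(field) is None:
            problems.append(f"{field}: missing or null")
    for field, allowed in ENUMS.items():
        value = ep.get(field)
        if value is not None and value not in allowed:
            problems.append(f"{field}: unexpected value {value!r}")
    if len(ep.get("secondary_diagnoses") or []) > 4:
        problems.append("secondary_diagnoses: more than 4 codes")
    if ep.get("episode_type") == "OUTPATIENT" and ep.get("length_of_stay_days") is not None:
        problems.append("length_of_stay_days: expected null for outpatient")
    return problems


def parse_lab_value(raw):
    """Split a lab value string into a number and unit, or keep it as qualitative text."""
    match = _NUMERIC.match(raw)
    if match is None:
        return {"value": None, "unit": None, "text": raw.strip()}
    unit = (match.group(2) or "").strip() or None
    return {"value": float(match.group(1)), "unit": unit, "text": None}
```

Running `validate_episode` over each record before modelling surfaces schema drift early, and `parse_lab_value` turns entries such as `"9.2 g/dL"` into a numeric value and a unit while leaving qualitative results as text.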

Sample Records

Four representative patient episodes illustrating variation across facility types, diagnoses, and outcomes.

ehr_sample_records.json
[
  {
    "episode_id": "EP-NGA-LG-00841923",
    "patient_id": "PAT-NGA-0049182",
    "country_code": "NG",
    "facility_id": "FAC-NGA-007",
    "facility_type": "TERTIARY",
    "episode_type": "INPATIENT",
    "admission_year": 2022,
    "age_group": "ADULT",
    "gender": "FEMALE",
    "primary_diagnosis": "A01.0",
    "secondary_diagnoses": ["E11.9", "I10"],
    "length_of_stay_days": 5,
    "lab_results_summary": { "HB": "9.2 g/dL", "WBC": "11.4 k/uL" },
    "discharge_outcome": "DISCHARGED",
    "readmitted_30d": false,
    "has_clinical_notes": true
  },
  {
    "episode_id": "EP-NGA-KN-00219047",
    "patient_id": "PAT-NGA-0112934",
    "country_code": "NG",
    "facility_id": "FAC-NGA-012",
    "facility_type": "SECONDARY",
    "episode_type": "OUTPATIENT",
    "admission_year": 2023,
    "age_group": "CHILD",
    "gender": "MALE",
    "primary_diagnosis": "B54",
    "secondary_diagnoses": ["D64.9"],
    "length_of_stay_days": null,
    "lab_results_summary": { "RDT_MALARIA": "POSITIVE", "HB": "7.1 g/dL" },
    "discharge_outcome": "DISCHARGED",
    "readmitted_30d": true,
    "has_clinical_notes": false
  },
  {
    "episode_id": "EP-GHA-AC-00503812",
    "patient_id": "PAT-GHA-0071204",
    "country_code": "GH",
    "facility_id": "FAC-GHA-003",
    "facility_type": "TERTIARY",
    "episode_type": "EMERGENCY",
    "admission_year": 2021,
    "age_group": "ELDERLY",
    "gender": "MALE",
    "primary_diagnosis": "I63.9",
    "secondary_diagnoses": ["I10", "E11.9", "N18.3"],
    "length_of_stay_days": 12,
    "lab_results_summary": { "CREATININE": "4.2 mg/dL", "BP_SYSTOLIC": "182 mmHg" },
    "discharge_outcome": "REFERRED",
    "readmitted_30d": null,
    "has_clinical_notes": true
  },
  {
    "episode_id": "EP-GHA-KU-00388201",
    "patient_id": "PAT-GHA-0093441",
    "country_code": "GH",
    "facility_id": "FAC-GHA-005",
    "facility_type": "SECONDARY",
    "episode_type": "INPATIENT",
    "admission_year": 2023,
    "age_group": "INFANT",
    "gender": "FEMALE",
    "primary_diagnosis": "P36.9",
    "secondary_diagnoses": [],
    "length_of_stay_days": 8,
    "lab_results_summary": { "CRP": "48 mg/L", "WBC": "18.2 k/uL" },
    "discharge_outcome": "DISCHARGED",
    "readmitted_30d": false,
    "has_clinical_notes": true
  }
]

Request Dataset Access

All datasets are available under a commercial licence agreement. Our team typically responds within 2 business days.

NDA may be required
