CPRD COVID-19 Symptoms and Risk Factors Synthetic Dataset
Population Size
4,173,000
People
Years
2019 - 2021
Associated BioSamples
None/not available
Geographic coverage
United Kingdom
Lead time
1-2 months
Summary
Documentation
This wholly synthetic dataset is based on real anonymised primary care patient data extracted from the CPRD Aurum database. To preserve patient privacy, researchers will not be able to access the real anonymised patient data extract that was used as the basis for generating the synthetic dataset.
The dataset focuses on patients presenting to primary care with symptoms indicative of COVID-19 (confirmed/suspected COVID-19) and control patients with negative COVID-19 test results. The dataset includes data on sociodemographic and clinical risk factors. The ‘ground truth’ CPRD Aurum data extract used as the basis for generating this synthetic dataset included data up to 13/04/2021 on patients with either suspected or confirmed COVID-19, as ascertained from the primary care record. The ground truth data extract was subject to data pre-processing and, as such, the synthetic dataset based on it does not reflect the structure of the source CPRD Aurum database.
The development of this synthetic dataset was funded by NHS X. It used the synthetic data generation and evaluation framework developed by CPRD under a grant from the Regulators’ Pioneer Fund, launched by the Department for Business, Energy and Industrial Strategy (BEIS) and managed by Innovate UK. The methodology used to generate and evaluate this synthetic dataset is outlined in Wang et al. 2019 (DOI: 10.1109/CBMS.2019.00036).
Observations
Observed Node | Disambiguating Description | Measured Value | Measured Property | Observation Date |
---|---|---|---|---|
Persons | Population size | 4173000 | COUNT | 01 Dec 2021 |
Provenance
Structural Metadata
Details
08/10/2024
01/12/2021
Coverage
03/12/2019
12/04/2021