CPRD COVID-19 Symptoms and Risk Factors Synthetic Dataset

Population Size

4,173,000

People

Years

2019 - 2021

Associated BioSamples

None/not available

Geographic coverage

United Kingdom

Lead time

1-2 months

Summary

This synthetic dataset is based on anonymised real primary care patient data extracted from the CPRD Aurum database. The dataset focuses on patients presenting to primary care with symptoms indicative of COVID-19.

Documentation

This wholly synthetic dataset is based on real anonymised primary care patient data extracted from the CPRD Aurum database. To preserve patient privacy, researchers will not be able to access the real anonymised patient data extract that was used as the basis for generating the synthetic dataset.

The dataset focuses on patients presenting to primary care with symptoms indicative of COVID-19 (confirmed or suspected COVID-19) and control patients with negative COVID-19 test results. It includes data on sociodemographic and clinical risk factors. The ‘ground truth’ CPRD Aurum data extract used as the basis for generating this synthetic dataset included data up to 13/04/2021 on patients with either suspected or confirmed COVID-19, as ascertained from the primary care record. The ground truth data extract was pre-processed, so the synthetic dataset based on it does not reflect the structure of the source CPRD Aurum database.

The development of this synthetic dataset was funded by NHS X and used the synthetic data generation and evaluation framework developed by CPRD under a grant from the Regulators’ Pioneer Fund, launched by the Department for Business, Energy and Industrial Strategy (BEIS) and managed by Innovate UK. The methodology used to generate and evaluate this synthetic dataset is outlined in Wang et al. 2019 (DOI: 10.1109/CBMS.2019.00036).

Dataset type

Health and disease

Dataset sub-type

Not applicable

Dataset population size

4173000

Keywords

Observations

Observed Node

Persons

Disambiguating Description

Population size

Measured Value

4173000

Measured Property

COUNT

Observation Date

01 Dec 2021

Provenance

Purpose of dataset collection

Study

Collection source setting

Other

Patient pathway description

Primary care

Image contrast

Not stated

Biological sample availability

None/not available

Structural Metadata

Details

Publishing frequency

Other

Version

2.0.0

Modified

08/10/2024

Distribution release date

01/12/2021

Citation Requirements

CPRD

Coverage

Start date

03/12/2019

End date

12/04/2021

Time lag

Not applicable

Geographic coverage

United Kingdom

Maximum age range

150

Follow-up

Unknown

Accessibility

Language

en

Controlled vocabulary

SNOMED CT

Format

Tab-delimited text

Data Access Request

Dataset pipeline status

Not available

Time to dataset access

1-2 months

Access request cost

Access method category

Varies based on project

Access service description

Access to CPRD data, including UK Primary Care Data, and linked data such as Hospital Episode Statistics, is subject to protocol approval via CPRD’s Research Data Governance (RDG) Process. Independent scientific and patient advice is provided by Expert Review Committees (ERCs) and the Central Advisory Committee (CAC): https://www.cprd.com/research-applications

Jurisdiction

GB-GBN

Data use limitation

General research use, No linkage, Research-specific restrictions, Research use only

Data use requirements

Geographical restrictions, Institution-specific restrictions, Project-specific restrictions, Time limit on use, User-specific restriction

Data Controller

Clinical Practice Research Datalink (CPRD)

Data Processor

CPRD

