HDR Gateway logo

CPRD Cardiovascular Disease Synthetic Dataset

Population Size
499,344 people
Years
2020
Associated BioSamples
None/not available
Geographic coverage
United Kingdom
Lead time
1-2 months

Summary

This synthetic dataset is based on anonymised real primary care patient data extracted from the CPRD Aurum database. It focuses on cardiovascular disease risk factors and was developed as a proof of concept.

Documentation

This wholly synthetic dataset is based on real anonymised primary care patient data extracted from the CPRD Aurum database and focuses on cardiovascular disease risk factors. To preserve patient privacy, researchers cannot access the real anonymised patient data extract that was used as the basis for generating the synthetic dataset. That ground truth extract was pre-processed, so the synthetic dataset derived from it does not reflect the structure of the source CPRD Aurum database. The synthetic dataset was developed as part of a project funded by the Regulators’ Pioneer Fund, launched by the Department for Business, Energy and Industrial Strategy (BEIS) and managed by Innovate UK. The methodology used to generate and evaluate the dataset is outlined in Wang et al. 2019.

Dataset type
Health and disease
Dataset sub-type
Not applicable
Dataset population size
499344

Keywords

CVD, Synthetic, Cardiovascular Disease

Observations

Observed Node
Persons
Disambiguating Description
Patients in the dataset
Measured Value
499344
Measured Property
COUNT
Observation Date
28 Jun 2020

Provenance

Purpose of dataset collection
Study
Collection source setting
Other
Patient pathway description
Primary care
Image contrast
Not stated
Biological sample availability
None/not available

Structural Metadata

Details

Publishing frequency
Other
Version
2.0.0
Modified
08/10/2024
Distribution release date
28/06/2020

Citation Requirements
CPRD

Coverage

Start date
25/03/2020

Time lag
Not applicable
Geographic coverage
United Kingdom
Maximum age range
150
Follow-up
Unknown

Accessibility

Language
en
Controlled vocabulary
SNOMED CT
Format
Tab delimited text

Data Access Request

Dataset pipeline status
Not available
Time to dataset access
1-2 months
Access request cost
Access method category
Varies based on project
Access service description
Access to CPRD data, including UK Primary Care Data, and linked data such as Hospital Episode Statistics, is subject to protocol approval via CPRD’s Research Data Governance (RDG) Process. Independent scientific and patient advice is provided by Expert Review Committees (ERCs) and the Central Advisory Committee (CAC): https://www.cprd.com/research-applications
Jurisdiction
GB-GBN
Data use limitation
General research use, No linkage, Research-specific restrictions, Research use only
Data use requirements
Geographical restrictions, Institution-specific restrictions, Project-specific restrictions, Time limit on use, User-specific restriction
Data Controller
Clinical Practice Research Datalink (CPRD)
Data Processor
CPRD

Dataset Types: Health and disease


Collection Sources: Other