
Comprehensive Patient Records for Cancer Outcomes

Population Size

40,000

People

Years

2008 - 2018

Associated BioSamples

Availability to be confirmed

Geographic coverage

United Kingdom

England


Lead time

Not applicable

Summary

The Comprehensive Patient Records (CPR) research dataset covers the medical history of cancer patients before their cancer, their diagnosis and treatment, and their long-term outcomes, together with the medical history of matched non-cancer patients who form a comparator cohort.

Documentation

The data is derived from linked primary, secondary and tertiary care electronic health records and participant survey responses. Data is de-identified at source (Leeds Teaching Hospitals NHS Trust (LTHT) and ResearchOne) and linked using matching pseudonymous digests. On linkage, University of Leeds IT re-pseudonymises these digests to produce irreversibly pseudonymous data, which is then processed into a research dataset. The data covers the medical history of cancer patients before their cancer, during their cancer diagnosis and treatment, and through their long-term outcomes, together with the medical history of matched non-cancer patients who form a comparator cohort.
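The source does not say how the digests are computed or re-pseudonymised. As a minimal sketch only, assuming keyed hashing (HMAC-SHA256) at source and a second, separately held key at linkage, the flow could look like the following. The keys, the identifier normalisation and the functions `source_digest`, `repseudonymise` and `link` are all hypothetical; only the output field names `Digest2` and `TPP_Linked` come from this dataset's description.

```python
import hashlib
import hmac

# Hypothetical keys: the real keys and algorithm used by LTHT, ResearchOne
# and University of Leeds IT are not described in the source.
SOURCE_KEY = b"shared-source-secret"   # assumed shared by both data sources
LINKAGE_KEY = b"linkage-only-secret"   # assumed held only at the linkage step


def source_digest(identifier: str) -> str:
    """Pseudonymise an identifier at source with a keyed hash (HMAC-SHA256)."""
    normalised = identifier.replace(" ", "").upper()  # assumed normalisation
    return hmac.new(SOURCE_KEY, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


def repseudonymise(digest: str) -> str:
    """Re-key a source digest at linkage to give the research-facing digest."""
    return hmac.new(LINKAGE_KEY, digest.encode("ascii"), hashlib.sha256).hexdigest()


def link(hospital_rows: list, primary_care_rows: list) -> list:
    """Match hospital and primary-care rows on equal source digests.

    Only the re-keyed digest is emitted, plus a 0/1 flag recording whether
    a primary-care match was found.
    """
    primary_digests = {row["digest"] for row in primary_care_rows}
    return [
        {
            "Digest2": repseudonymise(row["digest"]),
            "TPP_Linked": 1 if row["digest"] in primary_digests else 0,
        }
        for row in hospital_rows
    ]


# Made-up identifiers for illustration only.
hospital = [{"digest": source_digest("943 476 5919")},
            {"digest": source_digest("123 456 7890")}]
primary = [{"digest": source_digest("9434765919")}]
linked = link(hospital, primary)
```

Because the research-facing `Digest2` is computed with a key that never leaves the linkage step, a digest in the research data cannot be matched back to a source digest without that key. In this sketch the first hospital row matches the primary-care row (same identifier after removing spaces) and gets `TPP_Linked` = 1; the second gets 0.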

The data relates to 431,352 patients in the UK with whom LTHT has a ‘legitimate patient relationship’ and whom LTHT determined either to have had a cancer diagnosis between 2004 and 2018 or to be a matched non-cancer patient. Where available, data from ResearchOne provides primary care information for these patients. Where a patient was invited to participate in a patient-reported outcome measures (PROMs) survey, this status is recorded. Where the patient returned a consented PROMs survey, the PROMs data will also become available once it has completed the extract, transform and load (ETL) process.

The dataset is currently 5.7 GB, and further ResearchOne and PROMs data is anticipated. It is arranged as a relational database whose tables link at the patient level through a pseudonymous digest. Each table is a comma-separated values (CSV) file and relates to one event type, such as prescription cost, address history or diagnosis. All patients have an entry (row) in the demographics table; the number of entries a patient has in each other table depends on how many events of that type were recorded for them.
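To illustrate the one-row-per-patient demographics table and the many-rows-per-patient event tables, here is a minimal sketch that counts events per patient by joining on `Digest2`. The CSV contents, the `DiagnosisCode` column and the function `events_per_patient` are hypothetical; the real per-table fields are defined in the dataset's Table 1.

```python
import csv
import io
from collections import Counter

# Hypothetical extracts; only Digest2 and TPP_Linked are documented fields.
DEMOGRAPHICS_CSV = """Digest2,TPP_Linked
a1,1
b2,0
c3,1
"""

DIAGNOSIS_CSV = """Digest2,TPP_Linked,DiagnosisCode
a1,1,C50.9
a1,1,C78.0
c3,1,I10
"""


def events_per_patient(demographics_csv: str, event_csv: str) -> dict:
    """Count rows per patient in an event table, keyed on Digest2.

    Every patient appears in the demographics table, so patients with no
    events of this type are reported with a count of 0. Event rows whose
    digest is absent from demographics are ignored.
    """
    patients = [row["Digest2"] for row in csv.DictReader(io.StringIO(demographics_csv))]
    counts = Counter(row["Digest2"] for row in csv.DictReader(io.StringIO(event_csv)))
    return {digest: counts.get(digest, 0) for digest in patients}


print(events_per_patient(DEMOGRAPHICS_CSV, DIAGNOSIS_CSV))
# prints {'a1': 2, 'b2': 0, 'c3': 1}
```

Driving the join from the demographics table, rather than from the event table, is what keeps patients with no recorded events of a given type in the result.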

The dataset is split into two files with similar table structures: the main dataset contains all patients, and the PROMs dataset contains only those in the PROMs cohort (for whom additional PROMs data will be added). Each table has a re-pseudonymised digest field, “Digest2”, and an indicator of whether ResearchOne data is available for the patient, “TPP_Linked” (0 or 1). Additional fields per table are defined in Table 1. No fields contain sensitive information.
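A minimal validation sketch for the two files follows. It checks that every table row carries `Digest2` and a `TPP_Linked` value of 0 or 1, and that every PROMs patient also appears in the main dataset. The subset check assumes both files share the same `Digest2` values, which the source does not state. The helper names and sample data are hypothetical.

```python
import csv
import io

REQUIRED_FIELDS = ("Digest2", "TPP_Linked")


def read_rows(text: str) -> list:
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def check_table(rows: list) -> list:
    """Return human-readable problems found in one table's rows."""
    problems = []
    for i, row in enumerate(rows, start=1):
        # row.get returns None for a column missing from the header or a short row.
        missing = [field for field in REQUIRED_FIELDS if not row.get(field)]
        if missing:
            problems.append(f"row {i}: missing {', '.join(missing)}")
        elif row["TPP_Linked"] not in ("0", "1"):
            problems.append(f"row {i}: TPP_Linked must be 0 or 1, got {row['TPP_Linked']!r}")
    return problems


def proms_outside_main(main_rows: list, proms_rows: list) -> list:
    """Digests present in the PROMs file but absent from the main file."""
    main_digests = {row["Digest2"] for row in main_rows}
    return sorted({row["Digest2"] for row in proms_rows} - main_digests)


# Hypothetical demographics extracts from each file.
MAIN = "Digest2,TPP_Linked\na1,1\nb2,0\nc3,1\n"
PROMS = "Digest2,TPP_Linked\na1,1\nz9,2\n"

main_rows, proms_rows = read_rows(MAIN), read_rows(PROMS)
print(check_table(main_rows))                     # prints []
print(check_table(proms_rows))                    # prints ["row 2: TPP_Linked must be 0 or 1, got '2'"]
print(proms_outside_main(main_rows, proms_rows))  # prints ['z9']
```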

Contains patients in the UK with whom LTHT has a ‘legitimate patient relationship’ and whom LTHT determined either to have had a cancer diagnosis between 2004 and 2018 or to be a matched non-cancer patient.

Dataset type
Health and disease
Dataset sub-type
Not applicable
Dataset population size
40,000

Keywords

CANCER, population, Leeds, secondary care, data, primary care, research

Observations

Observed node
Persons
Disambiguating description
CPR population
Measured value
40,000
Measured property
count
Observation date
20 Oct 2017

Provenance

Purpose of dataset collection
Study
Collection source setting
Primary care - Clinic
Patient pathway description
Primary and Secondary care
Image contrast
Not stated
Biological sample availability
Availability to be confirmed

Structural Metadata

Details

Publishing frequency
Static
Version
1.0.0
Modified

08/10/2024

Distribution release date

12/01/2018

Citation Requirements

Leeds Teaching Hospitals NHS Trust (LTHT) is the data controller. Professor Geoff Hall and Professor Adam Glaser are the Principal Investigators.

Coverage

Start date

01/01/2008

End date

12/01/2018

Time lag
Variable
Geographic coverage
United Kingdom, England, Yorkshire and The Humber, Leeds
Maximum age range
100
Follow-up
Unknown

Accessibility

Language
en
Controlled vocabulary
ICD10
Format
text/csv, text/xml

Data Access Request

Dataset pipeline status
Not available
Access rights
In Progress
Time to dataset access
Not applicable
Access request cost
This has not yet been defined and will initially be determined on a case-by-case basis.
Access method category
Direct access
Access service description
Access will be to securely linked primary- and secondary-care data: non-identifiable patient information from GP surgeries, community care units and hospital records repositories, e.g. Leeds Teaching Hospitals NHS Trust (LTHT), provided through the data request and access process.
Data use limitation
General research use, Genetic studies only, Research-specific restrictions, Research use only, No linkage
Data use requirements
Collaboration required, Ethics approval required
Data Controller
Leeds Teaching Hospitals NHS Trust
Data Processor
Leeds Institute for Data Analytics



Collection Sources: No collection sources listed