HDR UK Gateway

Genomics England - Cancer

Population Size

72,874

People

Years

2014 - 2019

Associated BioSamples

DNA

Tissue

Geographic coverage

UK

Lead time

2-6 months

Summary

Cancer data are presented either at the patient level (the cancer diagnosis, or 'disease type') or at the tumour level (tumour-specific sample details) for participants in the Cancer arm of the 100,000 Genomes Project.

Documentation

Cancer data are presented either at the patient level (the cancer diagnosis, or 'disease type') or at the tumour level (tumour-specific sample details) for participants in the Cancer arm of the 100,000 Genomes Project.

Data relating to cancer participants:

cancer_participant_disease: For each cancer participant in the 100,000 Genomes Project, this table includes data about their cancer disease type and subtype.

cancer_participant_tumour: For each cancer participant's tumour in the 100,000 Genomes Project, this table contains data that characterise the tumour, e.g. staging and grading; morphology and location; recurrence at time of enrolment; and the basis of diagnosis.

cancer_participant_tumour_metastatic_site: For each cancer participant in the 100,000 Genomes Project, this table contains the site of their metastatic disease in the body (if applicable) at diagnosis.

cancer_care_plan: For a proportion of cancer participants in the 100,000 Genomes Project, this table contains information from their NHS cancer care plan on their treatment and care intent, in particular outcomes of MDT meetings and coded connected data (e.g. diagnoses from scans).

cancer_surgery: For a proportion of cancer participants in the 100,000 Genomes Project, this table contains details of the surgical procedures performed, as well as the specific location of the intervention.

cancer_risk_factor_general: For a proportion of cancer participants in the 100,000 Genomes Project, this table contains data on general cancer risk factors, namely smoking status, height, weight and alcohol consumption. This table was compiled with input from GeCIP members.

cancer_risk_factor_cancer_specific: For a proportion of cancer participants in the 100,000 Genomes Project, this table contains data on specific risk factors related to particular cancer types. This table was compiled with input from GeCIP members.

cancer_invest_imaging: For a proportion of cancer participants in the 100,000 Genomes Project, this table contains coded data on imaging investigations characterising the scan, its modality, anatomical site and outcome, as well as the outcome of the imaging report in free-text form.

Data derived from or relating to tumour samples:

cancer_invest_sample_pathology: For a proportion of cancer participants in the 100,000 Genomes Project, this table contains full pathology reports and other related data on and from their tumour samples around diagnosis and characterisation of the cancer. Please note that much of this information is also found in the clinic_sample and cancer_participant_tumour tables.

cancer_specific_pathology: For a proportion of tumours from cancer participants in the 100,000 Genomes Project, this table contains pathology data specific to that participant's cancer type. This may provide additional data to the cancer_invest_sample_pathology and cancer_participant_tumour tables.

cancer_systemic_anti_cancer_therapy: For a proportion of tumours from cancer participants in the 100,000 Genome

Dataset type

Health and disease

Dataset population size

72874

Keywords

Observations

Observed Node | Disambiguating Description | Measured Value | Measured Property | Observation Date
Persons | Rare Disease Participants | 72874 | Count | 30 Mar 2023
Persons | Cancer Participants | 15624 | Count | 30 Mar 2023
Findings | Cancer Germline - Number of genomes | 32753 | Count | 30 Mar 2023
Findings | Cancer Tumour - Number of genomes | 17003 | Count | 30 Mar 2023
Findings | Rare Disease - Number of genomes | 73517 | Count | 30 Mar 2023

Provenance

Purpose of dataset collection

Care, Study, Other

Source of data extraction

EPR, Electronic survey, LIMS, Other

Collection source setting

Clinic, Secondary care - Outpatients, Secondary care - In-patients

Patient pathway description

Linked datasets cover secondary care.

Image contrast

Not stated

Biological sample availability

DNA, Tissue

Structural Metadata

Details

Publishing frequency

Quarterly

Version

19.0.3

Modified

04/02/2026

Distribution release date

30/03/2023

Citation Requirements

The 100,000 Genomes Project Protocol v3, Genomics England. doi:10.6084/m9.figshare.4530893.v3. 2017. Publications that use the Genomics England Database should include an author as: Genomics England Research Consortium. Please see publication policy.

Coverage

Start date

01/01/2014

End date

01/01/2019

Time lag

2-6 months

Geographic coverage

UK

Maximum age range

150

Follow-up

Other

Accessibility

Language

en

Alignment with standardised data models

OTHER

Controlled vocabulary

OPCS4, READ, SNOMED CT, NHS NATIONAL CODES, ODS, ICD10, HPO, OTHER

Format

Multiple Formats Available

Data Access Request

Dataset pipeline status

Not available

Time to dataset access

2-6 months

Access request cost

Fees depend on the type of access required. Raw data are not eligible for export. Summary-level data may be exported, provided that the export is approved through the Genomics England Airlock Process.

Access service description

More information about the Genomics England Research Environment can be found here: https://www.genomicsengland.co.uk/research

Genomics England 100k participants have consented to longitudinal lifetime follow-up and to being recontacted safely through our clinical network. BRST (Bioinformatics Research Services) is a team of bioinformaticians who know the dataset inside out and take on consultancy projects on a case-by-case basis. Our network of clinical and medical experts can also be made available on a case-by-case basis. Researchers have the opportunity to work with and access the GeCIP network, a community of world-leading experts in specific cancers and rare diseases.

Data use limitation

General research use

Data use requirements

Ethics approval required, Project-specific restrictions, Publication moratorium

Data Controller

GENOMICS ENGLAND

Data Processor

GENOMICS ENGLAND
