HDR UK Gateway

Generations Study: female UK cohort (questionnaire, imaging, samples, linkage)

Population Size

113,000

People

Years

2003

Associated BioSamples

Plasma

DNA

Geographic coverage

United Kingdom

Isle of Man

Lead time

2-6 months

Summary

The Generations Study is a UK prospective cohort of ~113,000 women aged 16+, recruited 2004–2011 and followed for over 20 years. Focused on breast cancer causes, prognosis, and outcomes, the study has collected five rounds of questionnaires, NHS and cancer registry linkages, a biobank (plasma, DNA, urine; ~92% with blood samples), tumour tissue, mammograms, accelerometry, and pathology data. More than 190 peer-reviewed publications have used the data to date.

Documentation

The Breast Cancer Now Generations Study (BGS) is a large prospective cohort study of women's health in the UK. Between 2004 and 2011, approximately 113,000 women aged 16 and over were recruited from across the UK, with the aim of following participants for 40 years. The primary scientific focus is understanding the causes and outcomes of breast cancer, with a broader remit covering other cancers and conditions affecting women's health.

Participants were recruited through supporters of Breakthrough Breast Cancer (now Breast Cancer Now), responses to publicity, and invitations to friends and family members of existing participants. Approximately 30% of participants are first-degree relatives of other cohort members; the cohort includes over 15,000 family units. Geographically, participants are predominantly from England (89%), with Scotland (7%), Wales (4%), and Northern Ireland (1%) also represented. Women of higher socioeconomic status and of White ethnicity are overrepresented in the cohort relative to the UK population.

All participants completed a 44-page baseline questionnaire at recruitment covering demographics, reproductive and menstrual history, hormone use, lifestyle, medical history, anthropometrics, and family cancer history. Follow-up questionnaires were administered approximately 2, 6, 9, 13, and 17 years after recruitment: the first between 2007 and 2012, the second between 2010 and 2015, the third between 2014 and 2019, the fourth between 2017 and 2023, and the fifth in 2025. Response rates were 99% and 97% for the first and second follow-ups (paper only), 96% and 83% for the third and fourth (paper and online), and 59.4% for the fifth (online only). More than 500,000 questionnaires have been completed in total.

Blood samples were provided by approximately 92% of participants at baseline (27 ml) and by a subset of 8,938 participants approximately 6 years after enrolment (18 ml). Each blood sample was processed into plasma and buffy coat aliquots; over 3.2 million 0.5 ml barcoded straws are held in a biorepository in liquid nitrogen (LN2) tanks (-180°C). Urine samples were collected from a subset of 847 pre- and post-menopausal women not using hormonal contraception.

Whole-section H&E slides from diagnostic paraffin-embedded blocks, tumour tissue microarrays (TMAs), and loose tissue cores are available for participants who subsequently developed breast or ovarian cancer. Diagnostic biopsy H&E slides from participants with benign breast disease are also being collected. Screening mammograms have been collected for a nested case-control sub-study, with ongoing expansion to serial mammograms from approximately 50,000 women of screening age in the cohort. More than 12,000 participants wore wrist-worn triaxial accelerometers, recording at 100 Hz, continuously for 8 days.

Genotyping array data are available for nested case-control studies of breast and ovarian cancer, with polygenic risk scores derived for cases and controls. Whole exome sequencing of DNA samples from all eligible participants is underway. Hormone and biomarker assay data (including oestradiol, testosterone, progesterone, prolactin, SHBG, IGF-1, leptin, and AMH) are available for subsets of participants.

Health outcomes are self-reported by participants; cancers are confirmed through linkage to national cancer registries, and deaths through national death registries, in England and Scotland. Other health outcomes are available from NHS electronic medical records in England (hospital in-patient and out-patient). NHS flagging through the National Health Service Central Registers has tracked vital status from 2003, fully for England and partly for Scotland.

Individually identifiable health record data have been supplied to The Institute of Cancer Research (ICR) by NHS England and National Records of Scotland under ethics committee and Health Research Authority approvals.

Key scientific contributions include identification of over 300 common genetic variants associated with breast cancer risk, characterisation of hormonal and reproductive risk factors, and evidence linking physical inactivity and adolescent smoking to increased risk. More than 190 peer-reviewed publications have used Generations Study data; they are catalogued in a PubMed collection (https://pubmed.ncbi.nlm.nih.gov/collections/64898594/).

The study is jointly governed by Breast Cancer Now and The Institute of Cancer Research as legal custodians, with the ICR acting as data controller. Research applications are reviewed by the study Principal Investigators and an Access Committee on scientific merit, feasibility, consistency with participant consent, and governance requirements. Survey data, health outcomes, genomic and biomarker data, imaging-derived data, registry-linked data, and biological samples are available to researchers for not-for-profit purposes following a Data Access Agreement.
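
As a rough guide to the size of the accelerometry data described above, the following Python sketch works out the nominal number of readings per participant from the stated protocol (triaxial, 100 Hz, 8 days of continuous wear). It assumes uninterrupted recording for the full 8 days; actual recordings may be shorter or contain gaps, and the variable and function names are illustrative only, not part of any study tooling.

```python
# Nominal accelerometer data volume per participant.
# Protocol values are taken from the study description; everything else
# (names, the assumption of gap-free recording) is illustrative.

DAYS_WORN = 8            # continuous wear period stated in the documentation
SAMPLE_RATE_HZ = 100     # sampling frequency stated in the documentation
AXES = 3                 # triaxial device
SECONDS_PER_DAY = 24 * 60 * 60


def nominal_samples(days: int, rate_hz: int) -> int:
    """Number of samples per axis, assuming no gaps in recording."""
    return days * SECONDS_PER_DAY * rate_hz


per_axis = nominal_samples(DAYS_WORN, SAMPLE_RATE_HZ)
total_values = per_axis * AXES

print(f"{per_axis:,} samples per axis")
print(f"{total_values:,} acceleration values per participant")
```

With more than 12,000 participants, this per-participant figure is an upper bound that a data request can be scaled against when planning storage and transfer.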

Dataset type

Health and disease

Dataset population size

113000

Keywords

Dataset and BioSample Aliases

Observations

| Observed Node | Disambiguating Description | Measured Value | Measured Property | Observation Date |
|---|---|---|---|---|
| Persons | Total participants recruited | 113000 | Count | 31 Dec 2011 |
| Persons | Participants with blood samples in biobank (~92% of total) | 103000 | Count | 31 Dec 2011 |
| Events | Total incident cancer cases recorded since recruitment | 22839 | Count | 01 Feb 2026 |
| Events | Incident Breast cancer cases recorded since recruitment | 6568 | Count | 01 Feb 2026 |
| Events | Incident Uterus cancer cases recorded since recruitment | 1574 | Count | 01 Feb 2026 |
| Events | Incident Colon cancer cases recorded since recruitment | 957 | Count | 01 Feb 2026 |
| Events | Incident Bronchus and Lung cancer cases recorded since recruitment | 640 | Count | 01 Feb 2026 |
| Events | Incident Ovary cancer cases recorded since recruitment | 604 | Count | 01 Feb 2026 |
| Events | Incident Melanoma cancer cases recorded since recruitment | 497 | Count | 01 Feb 2026 |
| Events | Incident Lymphoma cancer cases recorded since recruitment | 336 | Count | 01 Feb 2026 |
| Events | Incident Rectum cancer cases recorded since recruitment | 291 | Count | 01 Feb 2026 |
| Events | Incident Pancreas cancer cases recorded since recruitment | 269 | Count | 01 Feb 2026 |
| Events | Incident Kidney cancer cases recorded since recruitment | 216 | Count | 01 Feb 2026 |
| Events | Incident Thyroid cancer cases recorded since recruitment | 198 | Count | 01 Feb 2026 |
| Events | Incident Brain cancer cases recorded since recruitment | 171 | Count | 01 Feb 2026 |
| Events | Incident Oesophagus cancer cases recorded since recruitment | 124 | Count | 01 Feb 2026 |
| Findings | Peer-reviewed publications using Generations Study data | 190 | Count | 01 Mar 2026 |
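
The incident cancer counts above can be put in proportion with a short calculation. The Python sketch below uses only the counts given in the Observations (observation date 01 Feb 2026) to express each itemised site as a share of all incident cancers and to show how many cases fall under sites not itemised in this record. The names `SITE_COUNTS` and `site_shares` are illustrative assumptions, not part of any Gateway or study tooling.

```python
# Share of each itemised cancer site among all incident cancers.
# Counts are copied from the Observations entries; names are illustrative.

TOTAL_INCIDENT_CANCERS = 22839

SITE_COUNTS = {
    "Breast": 6568,
    "Uterus": 1574,
    "Colon": 957,
    "Bronchus and Lung": 640,
    "Ovary": 604,
    "Melanoma": 497,
    "Lymphoma": 336,
    "Rectum": 291,
    "Pancreas": 269,
    "Kidney": 216,
    "Thyroid": 198,
    "Brain": 171,
    "Oesophagus": 124,
}


def site_shares(counts: dict[str, int], total: int) -> dict[str, float]:
    """Percentage of the total accounted for by each site."""
    return {site: 100.0 * n / total for site, n in counts.items()}


shares = site_shares(SITE_COUNTS, TOTAL_INCIDENT_CANCERS)
listed = sum(SITE_COUNTS.values())           # cases at itemised sites
unlisted = TOTAL_INCIDENT_CANCERS - listed   # cases at sites not itemised here

for site, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{site:<18} {SITE_COUNTS[site]:>6} {pct:5.1f}%")
print(f"{'Not itemised':<18} {unlisted:>6} {100.0 * unlisted / TOTAL_INCIDENT_CANCERS:5.1f}%")
```

The itemised sites do not sum to the total, so any analysis plan built from this summary should request the full site breakdown rather than infer it from these rows.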

Provenance

Purpose of dataset collection

Research cohort

Source of data extraction

Electronic survey, EPR, LIMS, Machine generated, Paper-based, Other

Collection source setting

Cohort, study, trial, Home, Community, Patient report outcome, Wearables

Patient pathway description

Prospective volunteer cohort of women recruited 2004–2011 across all four UK nations. Linked to national cancer registries (currently England and Scotland; planned for Wales and Northern Ireland), NHS electronic medical records, and national death registries. Covers the full patient pathway from primary prevention and cancer risk assessment through screening, diagnosis, treatment, and long-term outcomes.

Image contrast

Not stated

Biological sample availability

Plasma, DNA, Urine, Tissue

Details

Publishing frequency

Irregular

Version

1.0.2

Modified

22/06/2026

Citation Requirements

We thank Breast Cancer Now and The Institute of Cancer Research for support and funding of the Generations Study, and the study participants, study staff, and the doctors, nurses, and other health-care providers and health information sources who have contributed to the study. The ICR acknowledges NHS funding to the Royal Marsden/ICR NIHR Biomedical Research Centre.

Coverage

Start date

31/05/2003

Time lag

More than 6 months

Geographic coverage

United Kingdom, Isle of Man, Channel Islands

Minimum age range

16

Maximum age range

102

Follow-up

> 10 Years

Omics

Assay
Genotyping by array
Platform
Other

Accessibility

Language

en

Alignment with standardised data models

DICOM, LOCAL, NHS DATA DICTIONARY, NHS SCOTLAND DATA DICTIONARY, NHS WALES DATA DICTIONARY

Controlled vocabulary

ICD10, ICD9, ICDO3, LOCAL

Format

application/json, text/csv, image/dicom, OTHER

Data Access Request

Dataset pipeline status

Available

Time to dataset access

2-6 months

Access request cost

Cost recovery varies according to project scope and requested services.

Access method category

Varies based on project

Access service description

Data are currently available to approved researchers via secure data transfer following execution of a Data Sharing Agreement. Approved extracts are delivered as CSV and JSON formats. To initiate a request, contact Generations.Scientific@icr.ac.uk with a description of the proposed research and the data or samples required. Requests are reviewed by the study Principal Investigators and Access Committee on scientific merit, feasibility, consistency with participant consent, and data governance requirements.

Jurisdiction

GB-ENG

Data use limitation

Research use only

Data use requirements

Project-specific restrictions, Return to database or resource, User-specific restriction, Time limit on use

Data Controller

The Institute of Cancer Research

Dataset Sub-types: Cancer


Publications about this dataset

The Breakthrough Generations Study: design of a long-term UK cohort study to investigate breast canc...Swerdlow AJ, Jones ME, Schoemaker MJ, Hemming J, T...

British journal of cancer

Published - 2011
