HDR Gateway logo
UK Biobank

Population Size

500,000

People

Years

2006

Associated BioSamples

Serum

Plasma

Geographic coverage

United Kingdom

Lead time

Not applicable

Summary

UK Biobank is a large-scale biomedical database and research resource that provides researchers access to detailed longitudinal phenotype, medical and genetic data from 500,000 volunteer participants.

Documentation

UK Biobank is a large-scale biomedical database and research resource, containing in-depth genetic and health information from half a million UK participants. The database, which is regularly augmented with additional data, is globally accessible to approved researchers and scientists undertaking vital research into the most common and life-threatening diseases. UK Biobank’s research resource is a major contributor to the advancement of modern medicine and treatment and has enabled several scientific discoveries that improve human health.

Since 2006, UK Biobank has collected an unprecedented amount of biological and medical data on half a million people, aged between 40 and 69 years and living in the UK, as part of a large-scale prospective study. With their consent, they regularly provide blood, urine and saliva samples, as well as detailed information about their lifestyle, which is then linked to their health-related records to provide a deeper understanding of how individuals experience diseases. Genotyping, whole exome sequencing and whole genome sequencing are available for the whole cohort. Blood and urine biomarkers, telomere data, metabolomic and proteomic data and infectious disease markers have been assayed from the samples provided.

Since 2014 we have been undertaking the largest imaging study to date. We aim to undertake brain, cardiac and neck to knee MRI, whole body DXA and carotid ultrasound of 100,000 participants. We additionally have retinal images for 100,000 participants from baseline assessment, and accelerometer data for 100,000 participants collected 2013-2014.

Questionnaires that aim to capture data that is not readily captured by health data linkages are regularly sent to our participants.

The data – the largest and richest dataset of its kind – is de-identified and made widely accessible by UK Biobank to registered researchers around the world who use it to make new scientific discoveries about common and life-threatening diseases – such as cancer, heart disease and stroke – in order to improve public health.

Dataset type
Health and disease
Dataset sub-type
Not applicable
Dataset population size
500000

Keywords

UK BIOBANK, Genomics, Exome sequencing, WGS, Omics, Pain, Research, Cognitive Measures, Physical Measures, Magnetic resonance imaging, DXA, ECG, Accelerometer, Mental Health, Environment, Primary Care, COVID-19, Hospital episode statistics, Cancer Registry, Deaths, Sociodemographics, Digestive Health, Occupational Health, Biomarkers, Lifestyle, Health Data, Cardiac MRI, Brain MRI, Abdominal MRI, Carotid Ultrasound, Diet, Pain Hub

Observations

Observed Node
Disambiguating Description
Measured Value
Measured Property
Observation Date

Persons

Each participant has a large number (<5000) of data points associated with them. Recruitment started in 2006, but data collection is ongoing, and health data predates recruitment date. Summary statistics of all data can be found on our data showcase.

500000

Count

13 Mar 2006

Provenance

Purpose of dataset collection
Study
Collection source setting
Primary care - Clinic, Secondary care - Accident and Emergency, Secondary care - In-patients, Community, Clinic, Prescribing - Community pharmacy
Patient pathway description
UK Biobank is a volunteer-based cohort. As such, there is a healthy volunteer effect: participants tend to be of higher socioeconomic status, to have remained in education longer, to be slimmer, to be less likely to smoke (although those who do smoke tend to be heavier smokers) and to consume less alcohol than the general population. A comparison between UK Biobank participants and the general UK population has been published (https://doi.org/10.1093/aje/kwx246).

Whilst selection biases are seen in UK Biobank, there is still substantial heterogeneity within the cohort. Whilst incidence and prevalence calculations are not generalisable to the UK population, exposure-outcome comparisons should be, owing to the heterogeneity in the cohort. However, it is important that researchers consider the potential biases of a data set that might limit the generalisability of their results (as is the case for all observational data).
Image contrast
Not stated
Biological sample availability
Serum, Plasma, Whole blood, Saliva, Urine

Structural Metadata

Details

Publishing frequency
Continuous
Version
2.0.0
Modified

08/10/2024

Citation Requirements
UK Biobank

Coverage

Start date

13/03/2006

Time lag
Variable
Geographic coverage
United Kingdom
Minimum age range
40
Maximum age range
69
Follow-up
Continuous

Accessibility

Language
en
Controlled vocabulary
LOCAL, OPCS4, READ, SNOMED CT, DM+D, ICD10, ICD9
Format
Text: csv, dta, SAS, R; Image: DICOM, NIFTI, PNG; Other: VCF, CRAM, PLINK, BGEN, BED, CWA

Data Access Request

Dataset pipeline status
Not available
Time to dataset access
Not applicable
Access method category
Varies based on project
Access service description

Applications to access data are made through our bespoke access management system (https://bbams.ndph.ox.ac.uk/ams/).

Data access is either via data download (phenotype and genotype data) or via our Research Analysis Platform (RAP), which covers phenotype, imaging, genotype, WES, WGS and omics data. The RAP is enabled by DNAnexus and hosted by Amazon Web Services (https://www.ukbiobank.ac.uk/enable-your-research/research-analysis-platform).

Access costs depend on what data access is required.

Jurisdiction
GB-ENG
Data use limitation
General research use
Data use requirements
Institution-specific restrictions, Project-specific restrictions, Publication required, Return to database or resource, User-specific restriction, Time limit on use
Data Controller
UK Biobank
Data Processor
UK Biobank
