HDR UK Gateway

University College London Hospitals NHS OMOP dataset

Population Size

1,200,000

People

Years

2019

Associated BioSamples

None/not available

Geographic coverage

United Kingdom

Lead time

Data only

Summary

Standardised, anonymised healthcare data in the OMOP Common Data Model (CDM). The dataset covers 2019-04-01 to 2025-05-01 and is intended to support observational research and analytics.

Documentation

UCLH has an OMOP extraction system (omop_es) that connects our Electronic Health Record (EHR) to an architecture that delivers high-quality, standardised extracts conforming to the OMOP CDM. Our EHR contains records for 6 million patients, 13 million diagnoses and 50 million medication events. These derive from the UCLH patient population, which includes national referrals for tertiary and quaternary services (cancer, neurology, etc.) and general medical admissions to an inner-city teaching hospital that treats more than 1 million outpatients per year and has more than 100,000 inpatient admissions.

UCLH has invested effort and expertise in aligning international terminology systems (e.g. SNOMED CT, LOINC, UCUM) with NHS data standards, both during the EHR system build and after implementation. Our standardisation work has covered clinical domains, namely Diagnosis and past medical history, Surgical and Ambulatory procedures, Diagnostic Imaging, Cardiac Echo, Lab Medicine including Biochemistry, Haematology, Microbiology, Immunology, Virology, Allergens, Medications (including route of administration); and demographic information such as Religion and Ethnicity. For some domains (e.g. diagnosis and surgical procedures) we have achieved 100% standardisation; for others the work is ongoing.

Our data pipeline, the OMOP-Extraction System (OMOP-ES), is a modular, reusable architecture written in over 20,000 lines of R. Extractions proceed through four stages:

  1. Standardisation - translates source data to OMOP concepts at full fidelity
  2. Projection - applies rules to redact, filter, transform & link
  3. Post-processing - allows linking of de-identified non-OMOP data
  4. Output - multiple formats & destinations incl. CSV, Parquet or SQLite for direct use or import in a TRE
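
The four stages above can be sketched as a chain of small functions. This is a minimal illustration in Python, not the OMOP-ES code (which is written in R). Every function name, the `SOURCE_TO_CONCEPT` lookup, the source codes and the concept ids are hypothetical; the only convention borrowed from OMOP is that unmapped source codes receive concept id 0.

```python
import csv
import io

# Hypothetical lookup from source codes to OMOP concept ids (made-up values).
SOURCE_TO_CONCEPT = {"SRC_A": 1001, "SRC_B": 1002}


def standardise(rows):
    """Stage 1: add a concept id to every row; unmapped codes get 0."""
    return [
        {**row, "concept_id": SOURCE_TO_CONCEPT.get(row["source_code"], 0)}
        for row in rows
    ]


def project(rows, drop_columns, keep_concepts):
    """Stage 2: redact columns and keep only rows the project may see."""
    return [
        {k: v for k, v in row.items() if k not in drop_columns}
        for row in rows
        if row["concept_id"] in keep_concepts
    ]


def post_process(rows, linked):
    """Stage 3: attach de-identified non-OMOP data keyed on person_id."""
    return [{**row, **linked.get(row["person_id"], {})} for row in rows]


def output_csv(rows):
    """Stage 4: serialise the rows to CSV text."""
    if not rows:
        return ""
    # Ordered union of keys, so rows with and without linked data both fit.
    fieldnames = list(dict.fromkeys(k for row in rows for k in row))
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def run_pipeline(source_rows, drop_columns, keep_concepts, linked):
    """Run the four stages in order and return CSV text."""
    rows = standardise(source_rows)
    rows = project(rows, drop_columns, keep_concepts)
    rows = post_process(rows, linked)
    return output_csv(rows)
```

Keeping standardisation separate from projection means the full-fidelity mapping is computed once, while each project applies its own redaction and filtering rules on top.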

The system:

  ● is configurable to a variety of OMOP projects via a settings file
  ● is reproducible and automated
  ● queries the Epic EHR and other sources
  ● automates filtering of sensitive data, with safe defaults and the ability for Information Governance teams to inspect settings before and after running
  ● tests and reports the quality of standardisation
  ● is being extended both by the 'core' team and by other trusts in an inner source fashion
  ● has a small mock database for system development and testing
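
One way to picture a settings file with safe defaults that a governance team can inspect is sketched below in Python. The format of the real OMOP-ES settings file is not described here, so the JSON layout, the `ProjectSettings` fields and both function names are assumptions made purely for illustration.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class ProjectSettings:
    """Hypothetical per-project settings; sensitive options default to off."""

    name: str
    tables: tuple = ("person",)        # tables to extract
    include_free_text: bool = False    # safe default: no free text released
    include_exact_dates: bool = False  # safe default: exact dates withheld


def load_settings(text):
    """Parse JSON settings, rejecting unknown keys so typos cannot pass silently."""
    raw = json.loads(text)
    known = {f.name for f in fields(ProjectSettings)}
    unknown = set(raw) - known
    if unknown:
        raise ValueError(f"Unknown settings: {sorted(unknown)}")
    if "tables" in raw:
        raw["tables"] = tuple(raw["tables"])
    return ProjectSettings(**raw)


def review_report(settings):
    """Render the effective settings, defaults included, for inspection."""
    return json.dumps(asdict(settings), indent=2, sort_keys=True)


settings = load_settings('{"name": "demo", "tables": ["person", "measurement"]}')
print(review_report(settings))
```

Because the report prints the effective values rather than only the keys the project supplied, a reviewer sees every default that will apply to the run.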

Dataset type

Health and disease, Treatments/Interventions, Measurements/Tests, Imaging types, Omics, Socioeconomic

Dataset sub-type

Not applicable

Dataset population size

1200000

Associated media

Keywords

Observations

Observed Node: Persons
Disambiguating Description:
Measured Value: 1200000
Measured Property: count
Observation Date: 30 Apr 2025

Provenance

Purpose of dataset collection

Care, Administrative

Source of data extraction

EPR

Collection source setting

Secondary care - Accident and Emergency, Secondary care - Outpatients, Secondary care - In-patients, Secondary care - Ambulance, Secondary care - ICU

Image contrast

Not stated

Biological sample availability

None/not available

Details

Publishing frequency

Irregular

Version

1.0.0

Modified

27/05/2025

Coverage

Start date

01/04/2019

Time lag

Variable

Geographic coverage

United Kingdom

Maximum age range

110

Accessibility

Language

en

Alignment with standardised data models

OMOP

Controlled vocabulary

OPCS4, SNOMED CT, DM+D, LOINC, ICD10, RXNORM, RXNORM EXTENSION

Format

parquet, csv
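
For the CSV format, a minimal Python sketch of loading one table into SQLite for ad hoc querying is shown here. It assumes one CSV file per table with a header row; the actual file layout of an extract is not specified in this entry. The helper name `load_csv_table` is hypothetical, and the sample rows are made up, using column names from the OMOP CDM `person` table.

```python
import csv
import io
import sqlite3


def load_csv_table(conn, table, csv_file):
    """Create `table` from a CSV file object and return its row count.

    Columns are created without declared types, so values are stored as
    text; cast them in queries where numeric comparison is needed.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    cols = ", ".join(f'"{c}"' for c in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)
    conn.commit()
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


# Made-up sample standing in for a real person.csv file.
sample = io.StringIO(
    "person_id,gender_concept_id,year_of_birth\n"
    "1,8532,1980\n"
    "2,8507,1975\n"
)
conn = sqlite3.connect(":memory:")
n_rows = load_csv_table(conn, "person", sample)
n_people = conn.execute(
    "SELECT COUNT(DISTINCT person_id) FROM person"
).fetchone()[0]
print(n_rows, n_people)
```

With a real extract, the `io.StringIO` sample would be replaced by a file opened with `open(path, newline="")`.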

Data Access Request

Dataset pipeline status

Available

Jurisdiction

UK

Data use limitation

General research use

Data use requirements

Ethics approval required

Data Controller

University College London Hospital (UCLH)

Data Processor

University College London Hospital (UCLH)
