Dataset - Health Data Research Gateway

University College London Hospitals NHS OMOP dataset

Population Size

1,200,000

People

Years

2019

Associated BioSamples

None/not available

Geographic coverage

United Kingdom

Lead time

Data only

Dataset

Collection(s)

Data Custodian

Summary

Documentation

UCLH has an OMOP extraction system (omop_es) that connects our Electronic Health Record (EHR) to an architecture delivering high-quality, standardised extracts that conform to the OMOP Common Data Model (CDM). Our EHR contains records for 6 million patients, 13 million diagnoses and 50 million medication events. These derive from the UCLH patient population, which includes national referrals for tertiary and quaternary services (such as cancer and neurology) and general medical admissions to an inner-city teaching hospital that treats more than 1 million outpatients per year and has more than 100,000 inpatient admissions.

UCLH has invested effort and expertise in aligning international terminology systems (e.g. SNOMED CT, LOINC and UCUM) with NHS data standards, both during the EHR system build and after implementation. Our standardisation work has covered the following clinical domains: diagnosis and past medical history; surgical and ambulatory procedures; diagnostic imaging; cardiac echo; laboratory medicine (including biochemistry, haematology, microbiology, immunology, virology and allergens); medications (including route of administration); and demographic information such as religion and ethnicity. For some domains (e.g. diagnosis and surgical procedures) we have achieved 100% standardisation; for others the work is ongoing.

Our data pipeline, the OMOP-Extraction System (OMOP-ES), is a modular, reusable architecture written in over 20,000 lines of R. Extractions proceed through four stages:

Standardisation - translates source data to OMOP concepts at full fidelity
Projection - applies rules to redact, filter, transform and link
Post-processing - allows linking of de-identified non-OMOP data
Output - writes to multiple formats and destinations, including CSV, Parquet or SQLite, for direct use or import into a Trusted Research Environment (TRE)
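
The four stages above can be illustrated with a minimal sketch. It is written in Python purely for illustration (OMOP-ES itself is written in R), and every function name, column name and concept ID below is a hypothetical placeholder, not part of the actual system. The only OMOP convention relied on is that concept ID 0 denotes an unmapped source value.

```python
import csv
import io

# Hypothetical source-code -> concept ID lookup. The IDs are placeholders,
# not real OMOP vocabulary entries.
SOURCE_TO_CONCEPT = {"DX-ASTHMA": 1001, "DX-DIABETES": 1002}


def standardise(rows):
    """Stage 1: map each source code to a concept ID, keeping every row (0 = unmapped)."""
    return [
        {**row, "condition_concept_id": SOURCE_TO_CONCEPT.get(row["source_code"], 0)}
        for row in rows
    ]


def project(rows, drop_columns, keep_concepts):
    """Stage 2: redact the named columns and keep only rows for the requested concepts."""
    return [
        {k: v for k, v in row.items() if k not in drop_columns}
        for row in rows
        if row["condition_concept_id"] in keep_concepts
    ]


def post_process(rows, linked):
    """Stage 3: attach de-identified non-OMOP data, keyed on person_id."""
    return [{**row, **linked.get(row["person_id"], {})} for row in rows]


def output_csv(rows):
    """Stage 4: serialise to CSV text (Parquet or SQLite would be alternatives)."""
    columns = sorted({key for row in rows for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def run_pipeline(rows, drop_columns, keep_concepts, linked):
    """Chain the four stages in order."""
    staged = standardise(rows)
    staged = project(staged, drop_columns, keep_concepts)
    staged = post_process(staged, linked)
    return output_csv(staged)


# Mock input only; no real patient data.
SOURCE_ROWS = [
    {"person_id": 1, "nhs_number": "0000000001", "source_code": "DX-ASTHMA"},
    {"person_id": 2, "nhs_number": "0000000002", "source_code": "DX-DIABETES"},
]
LINKED = {1: {"imaging_study_count": 3}}

csv_text = run_pipeline(SOURCE_ROWS, {"nhs_number", "source_code"}, {1001}, LINKED)
print(csv_text)
```

Keeping each stage as a separate function mirrors the modular design described above: standardisation retains every row, and any narrowing of the data happens only in the projection step.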

The system:
● is configurable to a variety of OMOP projects via a settings file
● is reproducible and automated
● queries EPIC EHR and other sources
● automates filtering of sensitive data, with safe defaults and the ability for Information Governance teams to inspect settings before and after running
● tests and reports the quality of standardisation
● is being extended both by the 'core' team and by other trusts in an inner source fashion
● has a small mock database for system development and testing
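
As a rough illustration of a settings file with safe defaults, the sketch below loads project settings and renders the effective values for review before a run. It is in Python for illustration only. The JSON format, the field names and the `ExtractSettings` class are all assumptions made for this example; the real OMOP-ES settings format is not described here.

```python
import json
from dataclasses import asdict, dataclass, fields

# Output formats named in the pipeline description.
ALLOWED_FORMATS = {"csv", "parquet", "sqlite"}


@dataclass(frozen=True)
class ExtractSettings:
    """Hypothetical project settings; every default is the most restrictive option."""
    project_name: str
    output_format: str = "parquet"
    include_free_text: bool = False   # sensitive data is excluded unless requested
    keep_exact_dates: bool = False    # exact dates are withheld unless requested


def load_settings(text):
    """Parse settings from JSON text, rejecting unknown keys and unsupported formats."""
    raw = json.loads(text)
    known = {f.name for f in fields(ExtractSettings)}
    unknown = set(raw) - known
    if unknown:
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    settings = ExtractSettings(**raw)
    if settings.output_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported output format: {settings.output_format}")
    return settings


def describe(settings):
    """Render the effective settings, including defaults, for inspection."""
    return json.dumps(asdict(settings), indent=2, sort_keys=True)


settings = load_settings('{"project_name": "demo", "output_format": "csv"}')
print(describe(settings))
```

Rejecting unknown keys means a misspelt option fails loudly instead of silently falling back to a default, and printing the fully resolved settings gives a reviewer one place to check what a run will include.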

Dataset type

Health and disease, Treatments/Interventions, Measurements/Tests, Imaging types, Omics, Socioeconomic

Dataset population size

Associated media

https://safehr-data.org/

Keywords

Observations

Observed Node	Disambiguating Description	Measured Value	Measured Property	Observation Date
Persons		1200000	count	30 Apr 2025
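
A person count like the one above could be reproduced from an SQLite extract by counting rows in the OMOP `person` table. The sketch below uses an in-memory mock table with three invented rows, so it says nothing about the real dataset; only the table name `person` and the columns `person_id` and `year_of_birth` come from the OMOP CDM.

```python
import sqlite3

# In-memory mock database standing in for an SQLite extract.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (person_id INTEGER PRIMARY KEY, year_of_birth INTEGER)"
)
conn.executemany(
    "INSERT INTO person (person_id, year_of_birth) VALUES (?, ?)",
    [(1, 1950), (2, 1984), (3, 2001)],
)


def count_persons(connection):
    """Return the number of rows in the OMOP person table."""
    (total,) = connection.execute("SELECT COUNT(*) FROM person").fetchone()
    return total


person_count = count_persons(conn)
print(person_count)
```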

Provenance

Purpose of dataset collection

Source of data extraction

Collection source setting

Image contrast

Biological sample availability

Details

Publishing frequency

Version

Modified

27/05/2025

Coverage

Start date

01/04/2019

Time lag

Geographic coverage

Maximum age range

Accessibility

Language

Alignment with standardised data models

Controlled vocabulary

Format

Data Access Request

Dataset pipeline status

Access rights

https://safehr-data.org/

Jurisdiction

Data use limitation

Data use requirements

Data Controller

Data Processor

Collection Sources: Secondary care - Accident and Emergency, Secondary care - Outpatients, Secondary care - In-patients, Secondary care - Ambulance, Secondary care - ICU