CCU037: Improving methods to minimise bias in ethnicity data for more representative and generalisable models, using CVD in COVID-19 as an example
Safe People
Organisation name
University of Oxford
Organisation sector
Academic Institute
Applicant name(s)
Sara Khalid
Sub-licence arrangements (if any)?
No
Safe Projects
Project ID
CCU037
Lay summary
The COVID-19 pandemic has made inequality in health worse. People from minority ethnic backgrounds are more likely to become very sick or die from COVID-19. One area where inequality can arise is the technology used to predict a person’s future health risks. Routinely collected health information is put into a computer model, which then gives a health risk score for a patient. Doctors can use this score to decide on patient care. If there is bias in the data or in the model, the doctor may make wrong decisions and patients may get the wrong care or no care. This could result in some groups of patients being incorrectly prioritised over others for booster vaccines, hospital beds, or life-saving treatments. This might affect patient and public trust, and could also be costly for the NHS. We aim to improve existing technology for predicting personalised future risk of health conditions, particularly those affecting overlooked groups of patients. We aim to do so by: a) improving the way recorded ethnicity is used in research, and b) improving the modelling process to build risk prediction models that are designed specifically for ethnic groups and are therefore more reliable.
Public benefit statement
We know that there are ethnicity biases for cardiovascular disease in COVID-19 patients. We are developing a calculator to predict cardiovascular disease in COVID-19 patients. We will use this as a first example and will then be able to apply the same approach to other health and disease areas. The calculator can be used by the public to guide lifestyle choices and by doctors to provide better care. It can also be used by researchers nationwide doing health research involving ethnicity. This work will be based on health information that represents almost everyone currently living in England and Wales, without the information being traced back to individuals. By extending the work to Northern Ireland and Scotland in future, we hope that it will help to make health equal and fair for everyone in the UK. For more detailed information about project outputs, visit the BHF Data Science Centre website: https://bhfdatasciencecentre.org/projects/ccu037/
Technical summary
This project accessed the following datasets within the Trusted Research Environment(s) for CVD-COVID-UK / COVID-IMPACT:
- ENGLAND:
  - Civil Registration - Deaths
  - COVID-19 SARI-Watch (formerly CHESS)
  - Covid-19 Second Generation Surveillance System
  - Covid-19 UK Non-hospital Antibody Testing Results
  - Covid-19 UK Non-hospital Antigen Testing Results
  - COVID-19 Vaccination Adverse Reaction
  - COVID-19 Vaccination Status
  - Emergency Care Data Set (ECDS)
  - GPES Data for Pandemic Planning and Research (COVID-19)
  - Hospital Episode Statistics Accident and Emergency
  - Hospital Episode Statistics Admitted Patient Care
  - Hospital Episode Statistics Critical Care
  - Hospital Episode Statistics Outpatients
  - ICNARC: Intensive Care National Audit and Research Centre
  - Medicines dispensed in Primary Care (NHSBSA data)
  - NICOR – MINAP: Myocardial Ischaemia National Audit Project
  - Secondary Care Prescribed Medicines (EPMA)
  - Secondary Uses Services Payment By Results
  - Uncurated Low Latency Hospital Data (Admitted Patient Care, Outpatients, Critical Care)
- WALES:
  - Annual District Death Daily (ADDD)
  - Annual District Death Extract (ADDE)
  - Care Home Dataset (CARE)
  - Covid Vaccination Dataset (CVVD)
  - COVID-19 Consolidated Deaths (CDDS)
  - COVID-19 Shielded People List (CVSP)
  - COVID-19 Test Results (PATD)
  - Critical Care Dataset (CCDS)
  - Emergency Department Dataset (EDDS)
  - Emergency Department Dataset Daily (EDDD)
  - Intensive Care National Audit and Research Centre (ICCD) - Legacy - COVID only
  - Intensive Care National Audit and Research Centre (ICNC)
  - ONS 2011 Census Wales (CENW)
  - Outpatient Database for Wales (OPDW)
  - Outpatient Referral (OPRD)
  - Patient Episode Dataset for Wales (PEDW)
  - Welsh Ambulance Services NHS Trust (WASD)
  - Welsh Demographic Service Dataset (WDSD)
  - Welsh Dispensing Dataset (WDDS) - Legacy
  - Welsh Longitudinal General Practice Dataset (WLGP) - Welsh Primary Care
  - Welsh Longitudinal GP Dataset - Welsh Primary Care (Daily COVID codes only) (GPCD)
Latest approval date
18/11/2021
Safe Data
Dataset(s) name
Data sensitivity level
De-Personalised
Safe Setting
Access type
TRE