Synthetic Dataset of Hospital Admissions for Patients with Type 1 and 2 Diabetes
Population size: 159,800 people
Years: 2004 - 2022
Associated BioSamples: None/not available
Geographic coverage: https://www.geonames.org/2634343/west-midlands.html
Lead time: 1-2 months
Summary
Documentation
Type 1 Diabetes is an autoimmune disease that impairs insulin production. Type 2 Diabetes is caused by insulin resistance. Both are chronic conditions associated with serious complications such as heart disease, kidney failure, vision loss, and neuropathy. In the UK, 10% of the NHS budget is spent on managing diabetes. Demand for care is rising, and the number of acute hospital admissions is increasing.
This highly granular synthetic dataset represents approximately 159,800 diabetes patients acutely admitted between 2004 and 2022. The data include demography, socioeconomic status, co-morbidities, time-stamped serial acuity, physiology and treatments, investigations (structured and unstructured data), hospital care processes, and outcomes.
The dataset was created using the Synthetic Data Vault (SDV) package, specifically employing the GAN synthesizer. The real data was read and pre-processed, ensuring datetime columns were correctly parsed and identifiers were handled as strings. Metadata was defined to capture the schema, specifying field types and primary keys. This metadata guided the synthesizer in understanding the structure of the data. The GAN synthesizer was then fitted to the real data, learning the distributions and dependencies within. After fitting, the synthesizer generated synthetic data that mirrors the statistical properties and relationships of the original dataset.
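The workflow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the pipeline actually used for this dataset: it assumes that the "GAN synthesizer" is SDV's `CTGANSynthesizer` and that the SDV 1.x `SingleTableMetadata` API is used. The column names (`spell_id`, `admission_datetime`, and so on), the CSV path, and the `epochs` value are hypothetical.

```python
from typing import Dict, List


def build_metadata(id_cols: List[str], datetime_cols: List[str],
                   other_cols: Dict[str, str], primary_key: str) -> dict:
    """Build an SDV single-table metadata dictionary.

    All column names passed in are hypothetical examples, not the real schema.
    """
    columns = {name: {"sdtype": "id"} for name in id_cols}
    columns.update({name: {"sdtype": "datetime"} for name in datetime_cols})
    columns.update({name: {"sdtype": sdtype} for name, sdtype in other_cols.items()})
    return {
        "METADATA_SPEC_VERSION": "SINGLE_TABLE_V1",
        "primary_key": primary_key,
        "columns": columns,
    }


def synthesise(csv_path: str, metadata_dict: dict, num_rows: int, epochs: int = 300):
    """Fit a GAN-based synthesizer to the real data and sample synthetic rows."""
    # Third-party imports are local so the schema helper above has no dependencies.
    import pandas as pd
    from sdv.metadata import SingleTableMetadata
    from sdv.single_table import CTGANSynthesizer  # assumed "GAN synthesizer"

    cols = metadata_dict["columns"]
    id_cols = [c for c, spec in cols.items() if spec["sdtype"] == "id"]
    dt_cols = [c for c, spec in cols.items() if spec["sdtype"] == "datetime"]

    # Pre-processing: identifiers are read as strings, datetime columns are parsed.
    real = pd.read_csv(csv_path, dtype={c: str for c in id_cols}, parse_dates=dt_cols)

    # The metadata describes field types and the primary key to the synthesizer.
    metadata = SingleTableMetadata.load_from_dict(metadata_dict)
    metadata.validate()

    synthesizer = CTGANSynthesizer(metadata, epochs=epochs)
    synthesizer.fit(real)  # learn distributions and dependencies from the real data
    return synthesizer.sample(num_rows=num_rows)


# Hypothetical schema for an admissions table.
schema = build_metadata(
    id_cols=["spell_id"],
    datetime_cols=["admission_datetime", "discharge_datetime"],
    other_cols={"age": "numerical", "diabetes_type": "categorical"},
    primary_key="spell_id",
)
```

Calling `synthesise("admissions.csv", schema, num_rows=1000)` would additionally require `pandas` and `sdv` to be installed and a CSV file containing the columns named in the schema.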
Geography: This synthetic dataset is based on patient data from the West Midlands. The West Midlands (WM) has a population of 6 million and includes a diverse ethnic and socio-economic mix. University Hospitals Birmingham NHS Foundation Trust (UHB) is one of the largest NHS Trusts in England, providing direct acute services and specialist care across four hospital sites, with 2.2 million patient episodes per year, 2,750 beds and an ITU capacity of more than 120 beds.
Data set availability: Data access is available via the PIONEER Hub for projects which will benefit the public or patients. Data access can be provided to NHS, academic, commercial, policy and third sector organisations. Applications from SMEs are welcome. There is a single data access process, with public oversight provided by our public review committee, the Data Trust Committee. Contact pioneer@uhb.nhs.uk or visit www.pioneerdatahub.co.uk for more details.
Available supplementary data: Matched controls; ambulance and community data. Unstructured data (images). We can provide the dataset in OMOP and other common data models and can build different synthetic data to meet bespoke requirements.
Available supplementary support: Analytics, model build, validation and refinement; AI support. Data partner support for ETL (extract, transform and load) processes. Bespoke and “off the shelf” Trusted Research Environment (TRE) build and run. Consultancy with clinical, patient and end-user and purchaser access/support. Support for regulatory requirements. Cohort discovery. Data-driven trials and “fast screen” services to assess population size.
Observations
Observed Node | Disambiguating Description | Measured Value | Measured Property | Observation Date |
---|---|---|---|---|
Persons | 159,800 spells for patients with Diabetes between 01/2004 and 06/2022 | 159800 | Count | 06 Nov 2024 |
Provenance
Structural Metadata
Details
02/12/2024
03/12/2024
Coverage: 01/01/2004 to 01/06/2022
Accessibility
Data Access Request
Trusted Research Environments (TREs) are built using Microsoft Azure services and hosted in the UK. They provide research teams with a safe, secure and agile environment in which users can quickly analyse, interpret and form an enriched view of primary care information through a range of integrated datasets.
Health data collated from multiple sources is ingested into a secure data lake, from which subsets of data can be made available to research teams on approval of a data request. Once a request is approved, a customer-specific TRE is made available with a standard set of analytical tools from Microsoft, including Azure Databricks, Azure Machine Learning, Azure SQL and Azure Synapse (for large-scale data warehouses). Specific tools can be provided at an additional cost over the standard platform data access charge, and the PIONEER team will work with you to determine your exact needs.
Access to the TRE is managed using virtual desktop technology to provide a safe and secure end-user experience. This design allows PIONEER to create TREs rapidly in response to customer requirements.