SeRPUK / SAIL Databank

Description

SAIL (saildatabank.com) operates on SeRPUK (serp.ac.uk) as one of many tenancies. SeRP provides technology stacks matched to each tenancy's requirements, supporting research projects that run under that tenancy's own ownership and governance. SeRP currently has 26 UK tenancies. These range from smaller research groups, specialist environments (e.g. UKCRIS), bespoke software supporting a research programme's processes and governance model (e.g. DPUK), sharing platforms and well-established programmes (e.g. ALSPAC), through NHS collaboration spaces (e.g. NDR), government sharing (e.g. Welsh Government) and non-health tenancies (e.g. ADR, Family Justice), to the long-established SAIL Databank, which runs a large research programme supporting hundreds of users and projects.

Active Users: 530 | Active Projects: 275


SAFE People - Login & Access

• All users must provide a CV: evidence of a research career via CV, membership of a recognised institution, and completion of data safe research training or equivalent.

✓ Login: Self-registration, login to Virtual Desktop Infrastructure (VDI) via SSO and 2FA

✓ Minimum Requirement: No minimum requirement

✓ International Access: International access allowed


SAFE Settings - Compute & Services

✓ Hybrid Cloud Environment

✓ Windows 7 (extended support), Windows 10, Server 2019, Debian and Ubuntu.

✓ Compute, storage and database options:

  • ✓ Windows 10 VDI: 4C/16GB, 8C/32GB, 16C/128GB, 128C/1.5TB (limited); SSD/flash array

  • ✓ Linux VDI

  • ✓ Linux clusters: bespoke or K8s, SLURM, Spark, Jupyter

  • ✓ GPUs: available in Jupyter and SLURM

  • ✓ HPC: through SLURM

  • ✓ Virtualisation: through VMware or OpenStack

  • ✓ Storage: IBM all-flash SANs and Dell traditional SANs through to Red Hat Ceph; 1.6PB, 3-tiered storage (block, iSCSI, object)

  • ✓ Databases provided: IBM DB2 cluster, MS SQL Server cluster, Postgres, Elasticsearch, Spark SQL, Redis, MySQL

✓ Ability to modify OS on request depending on governance.

✓ Managed Data analytics capabilities: K8s, Airflow, Spark, SLURM, Jupyter

✗ No federated queries (WIP – PoC developed as part of previous projects)

✗ No federated analytics (WIP – PoC developed as part of previous projects)


SAFE Settings - Security Certifications and Measures

✓ Security Certifications: ISO 27001, NHS Toolkit, DEA Accreditations

✓ Security Measures: Everything needed to gain ISO 27001 certification, 2FA, multi-vendor firewalls, DES, account lock, SIEM (logging), audit, review process, monitoring, etc.

✓ No direct VM access; access is through VDI only.

✓ VM access controls: no USB, no copy/paste; internet access limited to whitelisted sites or internal mirrors


SAFE Settings - Software access

✓ Default software: Office, R, Python, Conda, SPSS, STATA, SAS, Jupyter Notebook, Eclipse, VS Code, DB tooling

✓ Code/library import: Whitelisted package libraries for R, Python, Conda, SAS, STATA, SPSS. Extra packages can be added by the technical team on request. AV/malware, SIEM and network monitoring.

✓ Collaboration Software: Git, Wiki, Confluence, Shared File Store, Shared DB


SAFE Data - Data Access Mechanisms

✓ Data Provisioning: Scoped and minimized data provided as tables in an RDBMS, with access to a filestore, git, etc.

✓ Reduce re-identification risk by: pseudonymization, dataset minimization, encryption (including encryption of linkage keys) and small-number output suppression

✓ Receive Data: Ability to receive data on-demand

  • ✓ Linked Data: External data linked via data linkage mechanism

  • ✓ Sensitive Data: Some datasets with sensitive data are held centrally in an un-linkable manner

  • ✓ Open Data: Large collection of open data available

✓ Record Linkage: Deterministic, probabilistic linkage via 3rd party (can be customised depending on requirements)
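
As an illustration of how pseudonymised linkage keys and deterministic linkage can fit together, here is a minimal Python sketch. It does not describe SAIL's actual pipeline or that of its third-party linkage provider. The function names, the `link_key` field, the normalisation rule and the use of HMAC-SHA256 are all assumptions made for this example.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Return a keyed pseudonym for an identifier (hypothetical helper).

    The identifier is normalised (whitespace removed, upper-cased) so that
    formatting differences do not break exact matching. HMAC-SHA256 is an
    assumption for this sketch, not a statement of what SAIL uses.
    """
    normalised = "".join(identifier.split()).upper()
    return hmac.new(secret_key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


def link_deterministic(left, right, key_field="link_key"):
    """Join two lists of dict records on an exact match of the pseudonymised key."""
    index = {}
    for row in right:
        index.setdefault(row[key_field], []).append(row)

    linked = []
    for row in left:
        for match in index.get(row[key_field], []):
            # Fields from `left` win if both sides share a field name.
            linked.append({**match, **row})
    return linked


if __name__ == "__main__":
    key = b"demo-key-held-by-the-linkage-party"  # hypothetical secret
    gp = [{"link_key": pseudonymise("123 456 7890", key), "asthma": True}]
    hospital = [
        {"link_key": pseudonymise("1234567890", key), "admissions": 2},
        {"link_key": pseudonymise("9999999999", key), "admissions": 1},
    ]
    print(link_deterministic(gp, hospital))
```

Because the raw identifier never appears in either dataset, analysts only see the pseudonym. Probabilistic linkage, which scores partial agreement across several fields, is not covered by this sketch.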


SAFE Outputs - Data Output/export

✓ Individual-level data can be exported. Everything must go through human review and conform to the governance model in place. No disclosure is permitted from SAIL. Code and methods can be exported.

✓ Export plans: Same as above. So long as the export is approved or approvable against a governance model, aggregate and individual-level data (EHR, images, genomics, etc.) can be taken out.

✓ Data transmission to other SAFE Settings: We currently provide data feeds to Biobank for Welsh participants. We also provide onward sharing of the JoinZoe Covid mobile app data to Scotland and NHS Digital.

✓ Statistical disclosure control process in place
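
One common element of statistical disclosure control is small-number suppression, which can be sketched in Python as below. The threshold of 5, the marker format and the function name are assumptions for illustration. The listing does not state SAIL's actual rules, and outputs still go through human review as described above.

```python
def suppress_small_counts(table, threshold=5):
    """Mask non-zero counts below `threshold` in a {category: count} table.

    Hypothetical helper: the threshold and the "<N" marker are assumptions,
    not SAIL policy. Zero counts are left as-is here; some governance models
    also require zeros to be masked.
    """
    marker = f"<{threshold}"
    return {
        category: (marker if 0 < count < threshold else count)
        for category, count in table.items()
    }


if __name__ == "__main__":
    counts = {"Area A": 3, "Area B": 12, "Area C": 0}
    print(suppress_small_counts(counts))
```

This sketch applies primary suppression only. A real process would also need to check that masked cells cannot be recovered from row or column totals.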

Datasets & BioSamples (64)

Wales Asthma Observatory | Dataset population size: Unknown | Health and disease
Welsh Dispensing Dataset (WDDS) - Legacy | Dataset population size: Unknown | Health and disease
Suspected Cancer Pathway Monthly (SCPM) | Dataset population size: 3,000,000 | Health and disease
SAIL Dementia e-Cohort (SDEC) | Dataset population size: Unknown | Health and disease
Welsh Results Reports Service (WRRS) | Dataset population size: Unknown | Health and disease
Pathology Data from WRRS (PATH) - Legacy | Dataset population size: Unknown | Health and disease

Data Uses (48)

Publications (27)

The international Perinatal Outcomes in the Pandemic (iPOP) study: protocol.
Stock SJ, Zoega H, Brockway M, Mulholland RH, Miller JE, Been JV, Wood R, Abok II, Alshaikh B, Ayede AI, Bacchini F, Bhutta ZA, Brew BK, Brook J, Calvert C, Campbell-Yeo M, Chan D, Chirombo J, Connor KL, Daly M, Einarsdóttir K, Fantasia I, Franklin M, Fraser A, Håberg SE, Hui L, Huicho L, Magnus MC, Morris AD, Nagy-Bonnard L, Nassar N, Nyadanu SD, Iyabode Olabisi D, Palmer KR, Pedersen LH, Pereira G, Racine-Poon A, Ranger M, Rihs T, Saner C, Sheikh A, Swift EM, Tooke L, Urquia ML, Whitehead C, Yilgwan C, Rodriguez N, Burgner D, Azad MB, iPOP Study Team.
2021
Illness characteristics of COVID-19 in children infected with the SARS-CoV-2 Delta variant
Molteni E, Sudre CH, Canas LS, Bhopal SS, Hughes RC, Chen L, Deng J, Murray B, Kerfoot E, Antonelli M, Graham M, Kläser K, May A, Hu C, Pujol JC, Wolf J, Hammers A, Spector TD, Ourselin S, Modat M, Steves CJ, Absoud M, Duncan EL.
2021