SeRPUK / SAIL Databank
Description
SAIL (saildatabank.com) operates on SeRPUK (serp.ac.uk) as one of many tenancies. SeRP provides technology stacks matched to each tenancy's requirements, supporting research projects run under that tenancy's own ownership and governance arrangements. SeRP currently has 26 UK tenancies, ranging from smaller research groups, specialist environments (e.g. UKCRIS), bespoke software supporting research programmes' processes and governance models (e.g. DPUK), sharing platforms, well-established programmes (e.g. ALSPAC), NHS collaboration spaces (e.g. NDR), government sharing (e.g. Welsh Government) and non-health tenancies (e.g. ADR, Family Justice) to the long-established SAIL Databank, which runs a large research programme supporting hundreds of users and projects.
Active Users: 530 | Active Projects: 275
SAFE People - Login & Access
• Requirements: All users must provide evidence of a research career via a CV, belong to a recognised institution, and have completed data safe research training or equivalent.
✓ Login: Self-registration, login to Virtual Desktop Infrastructure (VDI) via SSO and 2FA
✓ Minimum Requirement: No minimum requirement
✓ International Access: International access allowed
SAFE Settings - Compute & Services
✓ Hybrid Cloud Environment
✓ Windows 7 (extended support), Windows 10, Server 2019, Debian and Ubuntu.
✓ Windows 10 VDI options: 4C/16GB, 8C/32GB, 16C/128GB, 128C/1.5TB (limited); SSD/flash array
✓ Linux VDI
✓ Linux clusters: bespoke or K8s, SLURM, Spark, Jupyter
✓ GPUs are available in Jupyter and SLURM
✓ HPC through SLURM
✓ Virtualisation through VMware or OpenStack
✓ Storage through IBM all-flash SANs and Dell traditional SANs to Red Hat Ceph; 1.6PB, 3-tiered storage (block, iSCSI, object)
✓ Databases provided: IBM DB2 cluster, MS SQL Server cluster, Postgres, Elasticsearch, Spark SQL, Redis, MySQL
✓ Ability to modify OS on request depending on governance.
✓ Managed Data analytics capabilities: K8s, Airflow, Spark, SLURM, Jupyter
✗ No federated queries (WIP – PoC developed as part of previous projects)
✗ No federated analytics (WIP – PoC developed as part of previous projects)
SAFE Settings - Security Certifications and Measures
✓ Security Certifications: ISO 27001, NHS Toolkit, DEA Accreditations
✓ Security Measures: All controls needed to gain ISO certification, 2FA, multi-vendor firewalls, DES, account lock, SIEM (logging), audit, review process, monitoring, etc.
✓ No direct VM access; access is through VDI only.
✓ VM access controls: no USB, no copy/paste; internet access is whitelisted or served from internal mirrors.
SAFE Settings - Software access
✓ Default software: Office, R, Python, Conda, SPSS, STATA, SAS, Jupyter Notebook, Eclipse, VS Code, DB tooling
✓ Code/library import: Whitelisted package libraries for R, Python, Conda, SAS, STATA, SPSS. Extra packages can be added by the technical team on request. AV/malware scanning, SIEM and network monitoring are in place.
✓ Collaboration Software: Git, Wiki, Confluence, Shared File Store, Shared DB
SAFE Data - Data Access Mechanisms
✓ Data Provisioning: Scoped and minimized data provided as tables in an RDBMS, with access to a filestore, Git, etc.
✓ Reduce re-identification risk by: pseudonymization, minimizing datasets, encryption (including encryption of linkage keys), and small-number output suppression
✓ Receive Data: Ability to receive data on-demand
✓ Linked Data: External data linked via data linkage mechanism
✓ Sensitive Data: Some datasets with sensitive data are held centrally in an un-linkable manner
✓ Open Data: Large collection of open data available
✓ Record Linkage: Deterministic, probabilistic linkage via 3rd party (can be customised depending on requirements)
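As a rough illustration of how pseudonymization and deterministic linkage can fit together, the sketch below replaces an identifier with a keyed hash and joins two record sets on that token. It is not SAIL's actual linkage process, which is handled by a third party. The key, the `person_id` field name, and the functions `pseudonymise` and `deterministic_link` are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret; in practice this would be held by the linkage
# party only and never exposed to researchers.
LINKAGE_KEY = b"example-secret-key"


def pseudonymise(identifier: str, key: bytes = LINKAGE_KEY) -> str:
    """Return an HMAC-SHA256 token for a normalised identifier."""
    normalised = identifier.strip().upper().replace(" ", "")
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


def deterministic_link(left, right, id_field="person_id"):
    """Join two lists of dict records on the pseudonymised identifier.

    Only exact matches (after normalisation) are linked, and the raw
    identifier is dropped from every output row.
    """
    index = {}
    for row in right:
        token = pseudonymise(row[id_field])
        index[token] = {k: v for k, v in row.items() if k != id_field}

    linked = []
    for row in left:
        token = pseudonymise(row[id_field])
        if token in index:
            merged = {k: v for k, v in row.items() if k != id_field}
            merged.update(index[token])
            merged["link_key"] = token
            linked.append(merged)
    return linked
```

Probabilistic linkage, also listed above, would instead score partial agreement across several fields; that is outside the scope of this sketch.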
SAFE Outputs - Data Output/export
✓ Individual-level data can be exported. Everything must go through human review and conform to the governance model in place. No disclosure is permitted from SAIL. Code and methods can be exported.
✓ Export plans: As above; provided the export is approved or approvable under a governance model, aggregate or individual-level data (EHR, images, genomics, etc.) can be taken out.
✓ Data transmission to other SAFE Settings: We currently provide data feeds to Biobank for Welsh participants, and we also provide onward sharing of the JoinZoe Covid mobile app data to Scotland and NHS Digital.
✓ Statistical disclosure control process in place
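One common element of statistical disclosure control is small-number suppression, where counts below a threshold are masked before an output is released. The sketch below is a minimal illustration only: the threshold of 5 and the function name `suppressed_counts` are assumptions, not SAIL's documented rules, and it does not replace the human review described above.

```python
from collections import Counter

# Hypothetical threshold; the real value is set by the governance model.
THRESHOLD = 5


def suppressed_counts(values, threshold=THRESHOLD):
    """Count category values, masking any count below the threshold."""
    counts = Counter(values)
    return {
        category: (n if n >= threshold else f"<{threshold}")
        for category, n in counts.items()
    }
```

For example, `suppressed_counts(["A"] * 7 + ["B"] * 3)` keeps the count for `A` and masks the count for `B`. A full process would also need to consider secondary disclosure, where a masked cell can be recovered from row or column totals.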