OpenSAFELY
Description
OpenSAFELY is a secure, transparent, open-source software platform for analysis of electronic health records data. All platform activity is publicly logged. All code for data management and analysis is shared, under open licenses and by default, for scientific review and efficient re-use. OpenSAFELY is a set of best practices encoded as software. It can be deployed to create a Trusted Research Environment (TRE) alongside appropriate database, compute, governance, and administrative elements.
Active Users: 100 | Active Projects: 75
SAFE People - Login & Access
User accounts are provisioned following an exhaustive, online, per-project approval process. ONS Safe Researcher Training is requested of all users.
✓ Login:
✗ Minimum requirement: No.
✗ International access: Yes
SAFE Settings - Security Certifications and Measures
✓ Security certifications: Held in either a Tier 3 data centre accredited to NHS Digital standards for centrally hosted clinical systems (ISO 27001 standard and IG Toolkit version 2), or a virtual private cloud environment certified to Cyber Essentials Plus, ISO 9001, ISO 27001 (global certification), ISO 27017, and ISO 27018.
✓ Security measures: A. The secure environment for levels 3 and 4 is accessed via the EHR vendor's existing protocols. There are processes for locking down individual accounts or all accounts. All job execution can be terminated with a single command in an emergency. B. All source data for implementations in OpenSAFELY-TPP and OpenSAFELY-EMIS is stored by TPP or EMIS and backed up according to their standard operating procedures. C. Users are prevented from directly interacting with patient-level data; all data management and analysis code is logged; released outputs are reviewed twice.
✓ Secured operating system: Hybrid: Data may be held in those environments on premise (as with OpenSAFELY-TPP) or in the cloud (as with OpenSAFELY-EMIS). OpenSAFELY infrastructure operates on public cloud.
✓ VM direct access: SSH access is limited by design to enforce our security principles. RDP/VNC access is provided to authorised users.
✓ VM access controls: Identical measures to those employed to access the environment overall.
SAFE Settings - Software access
✓ Default software: R, Python, and Stata: https://docs.opensafely.org/actions-scripts/#execution-environments
✓ Code/library import: Users may request the addition of R, Python, or Stata packages to the appropriate docker images, which may be fulfilled by OpenSAFELY staff following review.
✓ Collaboration software: Users develop code on GitHub from which it is submitted for execution in an OpenSAFELY environment. Code collaboration is managed via the OpenSAFELY GitHub organisation. Collaboration on outputs is managed via group-level permissions on the "Job Server", where reviewed outputs are published.
✗ No software installation by users is permitted.
SAFE Data - Data Access Mechanisms
✓ Data Provisioning: The datasets are not provisioned into an interactive remote desktop environment, or similar. Users define their own data management pipeline for curating raw EHR and similar data into a one-row-per-patient dataset using OpenSAFELY data curation tools.
✓ Receive Data: New datasets are routinely received and linked following whatever secure mechanisms are supported by other settings, for example MESH or SFTP.
✓ Linked Data: On request, and following a feasibility study, we work with the third party to understand the data schema, and with the data processor to link and ingest the data regularly into their underlying database. We provide access to this data by extending our fully-documented data API.
✓ Record Linkage: Usually deterministic exact matching via cryptographic hash of NHS number between primary care EHRs and supplementary data sets; or similar approaches for organisations or geographical areas.
✓ Reduce re-identification risk by: Data is pseudonymised at source by the data processor; analysts develop curation and analysis code using randomly generated dummy data before submitting it to be executed against real data inside a secure environment; all code is logged in full and shared in public at the point of results dissemination or sooner; statistical disclosure controls are applied.
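The Record Linkage entry above describes deterministic exact matching on a cryptographic hash of the NHS number. A minimal sketch of that idea follows. It is not OpenSAFELY's implementation: the keyed HMAC-SHA256 construction, the function names `pseudonymise` and `link_exact`, the `patient_hash` field, and the sample identifiers are all assumptions made for illustration, and the identifiers are made-up strings that are not checksum-validated.

```python
import hashlib
import hmac


def pseudonymise(nhs_number: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 digest of a normalised NHS number (hypothetical scheme)."""
    # Strip spaces and other separators so formatting differences do not break matching.
    normalised = "".join(ch for ch in nhs_number if ch.isdigit())
    return hmac.new(secret_key, normalised.encode("ascii"), hashlib.sha256).hexdigest()


def link_exact(primary, supplementary):
    """Join two lists of dict records where 'patient_hash' values are identical."""
    index = {}
    for row in supplementary:
        index.setdefault(row["patient_hash"], []).append(row)
    linked = []
    for row in primary:
        for match in index.get(row["patient_hash"], []):
            # Fields from the primary record win if both records share a key.
            linked.append({**match, **row})
    return linked


# Illustrative key and records only; a real key would be held by the data processor.
KEY = b"demo-key-not-for-real-use"
primary = [
    {"patient_hash": pseudonymise("943 476 5919", KEY), "age_band": "40-49"},
    {"patient_hash": pseudonymise("9876543210", KEY), "age_band": "60-69"},
]
supplementary = [
    {"patient_hash": pseudonymise("9434765919", KEY), "test_result": "positive"},
]
linked = link_exact(primary, supplementary)
```

Because both sides hash the same normalised identifier with the same key, records can be joined without either dataset carrying the NHS number itself. Only exact matches link; a mistyped identifier on one side produces no match.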
SAFE Outputs - Data Output/export
✓ Tabular data, plain text, HTML, PDF, vector and raster images.
✗ Export plans: No
✓ Data transmit to other SAFE Settings: Raw data cannot be transferred to other Safe Settings; however, aggregate data can be transferred on request.
✓ All outputs are independently checked by two members of staff, using the framework developed and used by ONS and others. We are developing automated output checking tools for some disclosure control tasks.
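As an illustration of the kind of disclosure control task that automated output checking might cover, the sketch below suppresses small counts and rounds the rest before release. It is a hedged example, not the platform's tooling or policy: the function name `apply_disclosure_control`, the suppression threshold of 5, and the rounding base of 5 are assumptions chosen for the example.

```python
def apply_disclosure_control(counts, threshold=5, base=5):
    """Suppress small non-zero counts and round the rest to the nearest multiple of base.

    The threshold and base are illustrative defaults, not OpenSAFELY's rules.
    """
    safe = {}
    for label, n in counts.items():
        if 0 < n < threshold:
            # Small non-zero cells are withheld entirely.
            safe[label] = None
        else:
            # Integer arithmetic avoids floating-point rounding surprises.
            safe[label] = base * ((n + base // 2) // base)
    return safe


raw_counts = {"a": 3, "b": 12, "c": 23, "d": 0}
released = apply_disclosure_control(raw_counts)
```

A check like this complements rather than replaces human review: it catches mechanical issues such as small cell counts, while reviewers still judge context-dependent disclosure risk.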