Using Cohort Discovery

Documentation

This is a quick guide to get you started using Cohort Discovery on the Gateway. The full guide can be downloaded here.

Log In

To access Cohort Discovery, go to the Health Data Research UK website (the Gateway):

Gateway Homepage

From here, either hover over and click on the ‘Cohort Discovery’ button in the middle:

Gateway homepage with Cohort Discovery hover over

Or you can click on the ‘search’ button at the top to see a menu where you can click on Cohort Discovery (circled in red):

Gateway homepage with search drop down

Either route will take you to the Gateway landing page for Cohort Discovery:

Gateway landing page for Cohort Discovery

From here you can click on the button to access Cohort Discovery. First-time users will be asked to register for a Gateway account and then to request additional permissions to access Cohort Discovery. To do this, follow the on-screen instructions on the Gateway landing page for Cohort Discovery.

Landing Page

When you log in to Cohort Discovery, this is the landing page you will see. You will not see all of the symbols shown in the left-hand menu here, but you will not need them unless you are an administrator.

Cohort Discovery landing page showing collections

The ‘Collections’ tab gives you an overview of all data collections that are available to you, and the variables available for querying.

Click on the ‘History’ tab to see all of your previous queries, including the query name, when it started, when it executed, its status, its owner and whether it has been saved. In this example, ‘Query A’ started running at 10:37am and finished at 10:41am:

Cohort Discovery landing page with history

The ellipsis symbol under Actions lets you edit, rerun or remove a query.

To create a new query, click the blue ‘Create New Query’ button on the top right-hand side. As you build and run new queries, they will appear in the ‘History’ tab as a sortable list.

Create New Query

Click the blue ‘Create New Query’ button on the top right-hand side of the page. On the new query page, you can start building your query:

Cohort Discovery new query

Here you can:

  • Choose the data collections you want to run the query against (a ‘collection’ is the equivalent of a ‘dataset’).
  • Rename the query to something that reflects your search, like ‘Asthma’ or ‘NSAIDS’. This name will be displayed in the history on the landing page.
  • Add a filter to your query. This brings up the query parameters to help you find the information you’re interested in.

Add Filter

The blue ‘Add filter’ button brings up the query builder:

Cohort Discovery new query add filter

The query builder lets you search for condition, procedure, medication, measurement and observation details, as well as the gender, age or race details recorded on the health record. These are all terms from the OMOP common data model, which has been mapped to the vocabularies of the source data (e.g., ICD-10, SNOMED and READ codes). The first set of numbers (highlighted in red) indicates the number of terms that exist in each section, while the second set of numbers (highlighted in grey) indicates the number of data collections that contain a term (or set of terms).

Search terms can be found by using the dropdown menus or by entering them into the search bar. In this example, the term ‘asthma’ is entered in the search bar, and the system has returned all condition and observation records that have the term asthma in the OMOP concept description.

Cohort Discovery new query add filter with asthma in search bar

Note that the search bar also accepts clinical codes as a search term, whether that is the code from the source data vocabulary (e.g., the ICD-10 code ‘J45’ for asthma) or the mapped OMOP concept code (e.g., the OMOP code ‘317009’ for asthma). Entering either ‘J45’ or ‘317009’ will return ‘asthma’, which you can then add to your query.

Select a search term to add it to your query:

Cohort Discovery add filter select asthma to add it to a new query group

Clicking ‘Add to new group’ will add this selection to Group 1 in your query (circled in green):

Cohort Discovery add filter select asthma added to group 1

A group can contain any number of parameters, although they must all be combined using the same logical operator (i.e., ‘AND’ or ‘OR’) within the group. There can be up to three groups per query (see ‘Question Groups’ below).

By clicking on the search term ‘Asthma’ in Group 1 in the query above (circled in green), you can see all source vocabulary terms that have been mapped to the OMOP concept being queried:

Cohort Discovery modal with asthma OMOP mappings

In this example, the OMOP code for Asthma is ‘317009’, which is mapped to the ICD-10 code ‘J45’ in at least one of the source data collections, to the SNOMED code ‘155574008’ in another, and so on.
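The relationship between a search term, an OMOP concept and its source codes can be pictured with a small lookup. This is only an illustrative sketch: the `CONCEPTS` table and the `search_concepts` function are hypothetical names, not part of Cohort Discovery, and the only data used is the asthma example from this guide.

```python
# Hypothetical in-memory concept table (an assumption for illustration).
# The codes are the ones quoted in the asthma example in this guide.
CONCEPTS = [
    {
        "omop_code": "317009",
        "name": "Asthma",
        "source_codes": {"ICD-10": "J45", "SNOMED": "155574008"},
    },
]


def search_concepts(term: str) -> list[dict]:
    """Return concepts whose name contains the term, or whose OMOP or
    source vocabulary code equals the term (case-insensitive)."""
    needle = term.strip().lower()
    if not needle:
        return []
    matches = []
    for concept in CONCEPTS:
        # Collect every code that identifies this concept.
        codes = {concept["omop_code"].lower()}
        codes.update(code.lower() for code in concept["source_codes"].values())
        if needle in concept["name"].lower() or needle in codes:
            matches.append(concept)
    return matches


# A text term, a source code and an OMOP code all resolve to one concept.
for query in ("asthma", "J45", "317009"):
    print(query, "->", [c["name"] for c in search_concepts(query)])
```

Each of the three searches prints the single concept name ‘Asthma’, mirroring how the search bar accepts either a description or a clinical code.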

Click away from the term mapping, then click the ‘play’ button to run the query. The query will be run against all selected OMOP data collections, each of which is hosted by the Data Custodian of that collection in their secure network areas.

The results of your query will be displayed after the time it takes for the federated query to run (a minute or two, depending on the number of collections you are running your query against, and the complexity of your query).

Query Results

After your query has run, you will see a count of patients for each data collection that has data relevant to the search term. Simple queries typically take about 60 seconds to run. You can create and run new queries immediately after launching a previous one; you do not need to wait for the previous query to resolve (queries are queued and processed in the order they are created).

Count Details

The count details results table shows the collection name, external URL, total, status (of the currently running query against each data collection) and count relative to all results.

Cohort Discovery query results with count details

In this example, the query has returned total counts for four synthetic data collections that have different levels of low number suppression and rounding applied. The real count is 284 (‘Synthetic Data – No Obfuscation’), while the result returned for ‘Synthetic Data – Min 150 and rounding’ has been rounded up to 290, because rounding has been applied to that data collection.

Note that for this guide and any related demonstration videos published on the HDR UK website, we use synthetic datasets that contain artificial information. We do not demonstrate the live system, which is only accessible to validated bona fide researchers.

Disclosure Control

Data custodians who make their datasets discoverable on Cohort Discovery implement several obfuscation processes to ensure that individuals cannot be identified:

  • Low number suppression – counts of less than 10 are returned as 0.
  • Rounding – all counts are rounded up to the nearest 10.

These are both configurable by the data custodian. In addition to the rounding of results, it is not possible to query the ID of an individual person, and all OMOP data has been pseudonymised. This means that all identifiable information, like names and addresses, has been removed, and all potentially identifiable information, such as date of birth, has been pseudonymised to a level that protects privacy (e.g., month and day of birth are removed, while year of birth is retained to calculate age at a given event).
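The effect of the two obfuscation steps on a single count can be sketched as follows. This is a minimal illustration, not the custodians' actual implementation: the function name `obfuscate_count` and its parameters are hypothetical, and applying suppression before rounding is an assumption (the guide does not state the order).

```python
def obfuscate_count(true_count: int, min_count: int = 10, round_to: int = 10) -> int:
    """Apply low number suppression, then round up to a multiple of round_to.

    min_count and round_to stand in for the custodian-configurable
    settings; the defaults follow the values quoted in this guide.
    """
    # Low number suppression: counts below the threshold are returned as 0.
    if true_count < min_count:
        return 0
    # A round_to of 1 (or less) means no rounding is applied.
    if round_to <= 1:
        return true_count
    # Round up using integer ceiling division, then scale back.
    return -(-true_count // round_to) * round_to


print(obfuscate_count(284))                           # 290
print(obfuscate_count(7))                             # 0
print(obfuscate_count(284, min_count=150))            # 290
print(obfuscate_count(120, min_count=150))            # 0
print(obfuscate_count(284, min_count=0, round_to=1))  # 284
```

The first and last calls reproduce the example in ‘Count Details’: a real count of 284 is shown as 290 where rounding applies, and as 284 where no obfuscation is configured.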

Age in Years and Sex

Where a data collection contains age and sex information in the underlying OMOP data, the age and sex distribution will also be shown:

Cohort Discovery query results with age and sex distributions

Query Complexity

Question Groups

You can define up to three question groups, and each group can contain as many questions as you need. By default, all query terms inside one group are combined using the ‘AND’ operator, and multiple groups are combined using the ‘OR’ operator.

You can change the AND/OR logical operator within a group by clicking the operator name ‘AND’ or ‘OR’. Toggling this also toggles the logical operator that combines the groups, so your query will always match one of the logical situations in this image:

Cohort Discovery question group logical operator situations

Hypothetical illustration of inter- and intra-group logical operators. The first row shows the default situations with Group operators, and the second row shows the situation after the intra-group operator has been toggled to ‘OR’. Your query may look different to these, but will generally conform to this logical operator structure.
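The two logical situations can also be expressed with sets. The sketch below is illustrative only: Cohort Discovery returns counts rather than patient IDs, so the function `combine_groups`, the term names and the patient IDs are all made up for the example.

```python
from functools import reduce


def combine_groups(groups, intra_operator="AND"):
    """Combine up to three groups of term results (sets of patient IDs).

    The operator between groups is always the opposite of the operator
    within groups, as described in 'Question Groups'.
    """
    if len(groups) > 3:
        raise ValueError("a query can have at most three groups")
    if intra_operator == "AND":
        within, between = set.intersection, set.union
    elif intra_operator == "OR":
        within, between = set.union, set.intersection
    else:
        raise ValueError("intra_operator must be 'AND' or 'OR'")
    # Resolve each non-empty group first, then combine the group results.
    per_group = [reduce(within, terms) for terms in groups if terms]
    return reduce(between, per_group) if per_group else set()


# Made-up patient IDs matched by four hypothetical terms.
term_a, term_b = {1, 2, 3, 4}, {3, 4, 5}
term_c, term_d = {4, 6}, {1, 4, 6, 7}
groups = [[term_a, term_b], [term_c, term_d]]

# Default: (A AND B) OR (C AND D)
print(sorted(combine_groups(groups, "AND")))  # [3, 4, 6]
# Toggled: (A OR B) AND (C OR D)
print(sorted(combine_groups(groups, "OR")))   # [1, 4]
```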

Adding terms and groups

After locating a suitable search term, click the term to open the term-type-specific options. Clicking ‘Add to new group’ creates a new question group in your query, and adds the term to that group. For any subsequent term, you will have the option to add it to a completely new question group or to an existing one.

CD adding first term asthma to new query group

Image: add the first term to a new group in the query.

Cohort Discovery adding second term to a new group or an existing group

Image: add another term either to a new question group, or to an existing one from the ellipsis icon.

Inclusions and Exclusions

Within a query group, you can apply ‘Inclusion’ and ‘Exclusion’ logical modifiers to your search terms. For example, you may want a search that excludes all individuals who have used a certain medication, or who have had a certain diagnosis, in which case you would ‘exclude’ hits for that term from your group.

By default, all questions within a group are added as inclusions, and you can change that by clicking the inclusion-exclusion toggle icon as follows (circled in green):

Cohort Discovery query inclusions and exclusions

Here, the query will return all patients who have had asthma, excluding those who have also had acute viral pharyngitis.

You cannot include or exclude a whole question group, only the individual questions within a group.
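In set terms, an exclusion removes the patients matched by that question from the group's result. The sketch below assumes a group combined with ‘AND’ and uses made-up patient IDs; `evaluate_group` is a hypothetical helper, not a Cohort Discovery function, and how exclusions behave under ‘OR’ is not covered here.

```python
def evaluate_group(terms):
    """Evaluate one 'AND' group of (patient_ids, is_exclusion) pairs.

    Patients must match every included question and must not match
    any excluded question.
    """
    included = [ids for ids, is_exclusion in terms if not is_exclusion]
    excluded = [ids for ids, is_exclusion in terms if is_exclusion]
    if not included:
        return set()
    # Intersect all inclusions, then remove anyone matched by an exclusion.
    result = set.intersection(*included)
    return result.difference(*excluded)


# Made-up patient IDs for the two questions in the example above.
asthma = {1, 2, 3, 4, 5}
acute_viral_pharyngitis = {2, 5, 9}

group = [(asthma, False), (acute_viral_pharyngitis, True)]
print(sorted(evaluate_group(group)))  # [1, 3, 4]
```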

Secondary Modifiers

Some question types accept age and time modifiers to control the time scale for possible answers. You can add either an age or a time modifier to a question, but not both. Age and time modifiers must be applied to the event, not the person; they will not work when applied to the person.

Cohort Discovery secondary modifiers table

The modifier can be added to the query either during the selection of the query term (in the Query Builder) or once the term has been added to the Group by clicking the ellipsis button for the Group and selecting ‘Edit’, like so:

Cohort Discovery secondary modifiers for two queries in a group

Here the age modifiers are set for each question within the Group.
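One way to picture an age modifier applied to an event is as a filter on the person's age when the event was recorded. The sketch below is an assumption about the general idea, not a description of how Cohort Discovery evaluates modifiers; the `Event` class, the helper functions and the sample data are hypothetical. Because only the year of birth is retained, the age is approximate.

```python
from dataclasses import dataclass


@dataclass
class Event:
    person_id: int
    concept: str
    year: int  # year in which the event was recorded


def age_at_event(event: Event, year_of_birth: int) -> int:
    """Approximate age in years, using only the retained year of birth."""
    return event.year - year_of_birth


def filter_by_age(events, birth_years, min_age, max_age):
    """Keep events that occurred while the person was within the age range."""
    return [
        e for e in events
        if min_age <= age_at_event(e, birth_years[e.person_id]) <= max_age
    ]


# Made-up data: two people and three asthma events.
birth_years = {1: 1980, 2: 2010}
events = [
    Event(1, "Asthma", 2015),  # age 35
    Event(2, "Asthma", 2015),  # age 5
    Event(1, "Asthma", 1990),  # age 10
]

adults = filter_by_age(events, birth_years, min_age=18, max_age=40)
print(sorted({e.person_id for e in adults}))  # [1]
```

The filter acts on each event rather than on the person, which is why person 1 is matched by the 2015 event but not by the 1990 one.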

Genomic Queries

Cohort Discovery is actively working with the community to develop genomic querying capability. When ready, authorised users will be able to query genomics data using the ‘Select genes’ box below (highlighted in the green circle): 

Cohort Discovery genomics queries coming soon

Users not interested in genomics data can ignore this box. Users who are interested in this can stay tuned for further developments.

 

