Using Cohort Discovery


1. What is Cohort Discovery?

Cohort Discovery is a service on the Health Data Research (HDR) Gateway that lets approved users rapidly determine the size of the population cohort that matches their research question within different datasets from across the UK, without having to directly contact each individual organisation that holds the data. It combines a natural-language query assistant with a step-by-step visual query builder, making it easier to scope a cohort before submitting a formal data access request.

Users can specify defined characteristics relevant to their proposed analysis (e.g. the number of female asthmatics under the age of 35) through the Cohort Discovery user interface. These search terms are then sent as a real-time query to multiple pseudonymised datasets across multiple Data Custodians, with results returned in the form of a numerical count of individuals that meet those specific criteria. Researchers can then understand whether a dataset contains a cohort (or group) of interest and if so, contact the Data Custodian to find out more or submit a Data Access Request.

Key benefits

Search across multiple UK datasets in a single automated query, without manually contacting each custodian
Use natural language or structured rules to define a cohort
Review anonymised cohort counts before requesting access
Manage and reuse previous queries through Query History
Make subsequent Data Access Requests more specific, accelerating the research process
Save significant time and effort for both the researcher and the Data Custodian

2. Register for the Cohort Discovery service

Before requesting access to Cohort Discovery, please ensure you have a valid Health Data Research Gateway account. Select ‘Sign In’ on the Health Data Research Gateway to create an account or sign in.

Access request

Once you have a valid Gateway account, go to https://healthdatagateway.org/en/about/cohort-discovery, click ‘Access Cohort Discovery’ at the top right of the page, and complete the access request form. The Cohort Discovery Support team will provision your account and confirm when access is ready.

IMPORTANT: If your access has recently been approved, you might need to sign out and sign back in to your Gateway account to refresh your permissions before continuing.

3. Access Cohort Discovery

Sign in to the Health Data Research Gateway.
On the Gateway home page, select the Cohort Discovery tile.
On the About Cohort Discovery page, click the blue ‘Access Cohort Discovery’ button to open the service. In the pop-up that follows, click the green ‘Access Cohort Service (Beta)’ button to access the new site.
You will enter the Cohort Discovery workspace, where you can start a new query or review previous results.

4. Build your first query

The New Query workspace is the starting point for all Cohort Discovery searches. You can use natural-language input to get started quickly, or build a structured query using the ‘Insert’ tools in the left panel.

4.1 Open the New Query workspace

Select the New Query tab at the top of the Cohort Discovery screen.
Enter a name for your query in the Query Name field.
In the natural-language search box, type a brief description of the cohort you are looking for, for example: adults with diabetes and metformin.
Cohort Discovery will interpret your input and generate a starting set of structured rules in the query canvas below.

4.2 Filter by collections

By default, your query will run across all available collections you have access to. To target specific datasets, use the Filter Collections panel on the right side of the New Query screen.

Select the Filter Collections button at the top right of the query workspace.
Check the collections you want to include.
You can choose to include or exclude Synthetic data collections using the ‘Synthetic Data Collections’ toggle. Synthetic data collections may be useful for testing and feature exploration.

What are collections?

A collection is a dataset that has been onboarded to Cohort Discovery by a Data Custodian. Each collection is linked to a registered Gateway dataset and is made searchable once it has passed through the activation workflow.
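
The filtering behaviour described in this section can be pictured with a short Python sketch. It is only an illustration of the logic (run across everything by default, narrow to checked collections, optionally drop synthetic data), not the service's implementation. The Collection class, the select_collections function, and the collection names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Collection:
    """Hypothetical stand-in for a collection shown in the Filter Collections panel."""
    name: str
    synthetic: bool  # True for a synthetic (test) data collection


def select_collections(
    available: Iterable[Collection],
    checked: Optional[Set[str]] = None,
    include_synthetic: bool = True,
) -> List[Collection]:
    """Return the collections a query would run against.

    checked=None models the default: every accessible collection is included.
    include_synthetic models the 'Synthetic Data Collections' toggle.
    """
    selected = []
    for collection in available:
        if checked is not None and collection.name not in checked:
            continue  # not ticked in the panel
        if collection.synthetic and not include_synthetic:
            continue  # toggle switched off
        selected.append(collection)
    return selected


# Made-up collection names, for illustration only.
available = [
    Collection("Custodian A cohort", synthetic=False),
    Collection("Custodian B cohort", synthetic=False),
    Collection("Synthetic demo", synthetic=True),
]

real_only = select_collections(available, include_synthetic=False)
```

Here real_only holds the two non-synthetic collections, which mirrors switching the toggle off while leaving every collection checked.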

5. Refine your query

After Cohort Discovery generates an initial set of rules from your natural-language input, you can review and refine each component using the query builder controls and the right-hand properties panel.

5.1 Query building components

Term	Definition
Rule	A single condition in the query, such as a diagnosis code or medication. Search by term name or OMOP concept ID.
Operator	Connects rules or groups using AND or OR logic.
Age rule	Restricts a query or group to a specific age range or life stage.
Group	A container that combines multiple rules into a single logical unit.
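
To make the relationship between these components concrete, the following Python sketch models rules, age rules, and groups as a small tree and evaluates it against one record. It is a mental model only, assuming simplified records with an age and a set of concept labels. The class names, the matches function, and the plain-text concept labels are hypothetical and do not reflect how Cohort Discovery stores or executes queries.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Rule:
    concept: str          # term name or OMOP concept ID (labels here are illustrative)
    include: bool = True  # models the Include / Exclude toggle


@dataclass
class AgeRule:
    min_age: Optional[int] = None  # inclusive lower bound, in whole years
    max_age: Optional[int] = None  # inclusive upper bound, in whole years


@dataclass
class Group:
    operator: str  # "AND" or "OR"
    children: List[Union[Rule, AgeRule, "Group"]] = field(default_factory=list)


def matches(node, person: dict) -> bool:
    """Return True if a record (dict with 'age' and 'concepts') satisfies the node."""
    if isinstance(node, Rule):
        present = node.concept in person["concepts"]
        return present if node.include else not present
    if isinstance(node, AgeRule):
        age = person["age"]
        if node.min_age is not None and age < node.min_age:
            return False
        if node.max_age is not None and age > node.max_age:
            return False
        return True
    if isinstance(node, Group):
        if node.operator not in ("AND", "OR"):
            raise ValueError(f"unknown operator: {node.operator}")
        results = [matches(child, person) for child in node.children]
        return all(results) if node.operator == "AND" else any(results)
    raise TypeError(f"unsupported node type: {type(node).__name__}")


# "Female asthmatics under the age of 35": under 35 means at most 34 in whole years.
query = Group("AND", [Rule("female"), Rule("asthma"), AgeRule(max_age=34)])

people = [
    {"age": 29, "concepts": {"female", "asthma"}},
    {"age": 40, "concepts": {"female", "asthma"}},
    {"age": 22, "concepts": {"female"}},
]
cohort_size = sum(matches(query, p) for p in people)  # only the first record matches
```

Nesting a Group inside another Group gives combinations such as "asthma AND (drug A OR drug B)", which is the role the Group component plays in the query builder.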

5.2 Refine a rule block

Click a rule card in the query canvas to select it.
The right-hand panel will show Rule Parameters, including the Include / Exclude toggle, Age settings, and Timeframe.
Make your adjustments and the query canvas will update automatically.
Repeat for each rule until the query reflects your intended cohort.

5.3 Natural-language interpretation notices

When Cohort Discovery interprets a natural-language phrase, it may display an interpretation notice above the query canvas, for example: “Adults” interpreted as current age >= 18.

You can adjust the relevant rule parameter if the automatic interpretation does not match your intended definition.
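
As a rough picture of what an interpretation notice represents, the Python sketch below maps a recognised word to an age bound, produces a notice in the style quoted above, and then overrides the bound. It does not describe how the natural-language assistant actually works. AgeBound, LIFE_STAGES, and interpret_life_stage are hypothetical names, and only the "adults" mapping (current age >= 18) comes from the example notice.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass
class AgeBound:
    min_age: Optional[int] = None  # inclusive lower bound on current age


# Only this entry is taken from the notice quoted above; the table is hypothetical.
LIFE_STAGES = {"adults": AgeBound(min_age=18)}


def interpret_life_stage(word: str) -> Tuple[Optional[AgeBound], Optional[str]]:
    """Return (age bound, notice text) for a recognised word, else (None, None)."""
    default = LIFE_STAGES.get(word.lower())
    if default is None:
        return None, None
    bound = replace(default)  # copy, so user edits do not change the defaults
    notice = f'"{word.capitalize()}" interpreted as current age >= {bound.min_age}'
    return bound, notice


bound, notice = interpret_life_stage("adults")

# Adjusting the rule parameter when the study defines adulthood differently.
bound.min_age = 16
```

The copy made with replace keeps the default interpretation intact, so a change made to one query's rule does not affect how the same word is interpreted next time.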

5.4 Run the query

The Query preview shows the logical construction of your query. When you are satisfied with the query structure, select Run Query in the query canvas area.
Cohort Discovery will return an anonymised cohort count for each collection in your filter.
Results are rounded counts and are subject to low count suppression rules. They are not exact patient numbers but rather the approximate number of records that match your query.

Important — result interpretation

Cohort Discovery returns rounded cohort counts, not exact figures. Small counts may be suppressed to protect patient privacy. Use these results to understand dataset coverage, refine your research scope, and identify which Data Custodians to contact to explore feasibility further or start a Data Access Request.
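
The effect of rounding and low count suppression can be illustrated with a minimal Python sketch. The threshold of 10 and the rounding to the nearest 10 are assumptions chosen for the example. This guide does not state the actual values, and each Data Custodian may apply different rules. The function name disclose_count is hypothetical.

```python
from typing import Optional


def disclose_count(true_count: int, threshold: int = 10, rounding_base: int = 10) -> Optional[int]:
    """Return a rounded count, or None when the count is suppressed.

    threshold and rounding_base are illustrative assumptions, not the
    parameters used by Cohort Discovery or any Data Custodian.
    """
    if true_count < 0:
        raise ValueError("count cannot be negative")
    if true_count < threshold:
        return None  # low count suppression: too small to report safely
    # Round half up to the nearest multiple of rounding_base.
    return ((true_count + rounding_base // 2) // rounding_base) * rounding_base


small = disclose_count(7)      # None: suppressed under the assumed threshold
large = disclose_count(1234)   # 1230: rounded to the assumed base of 10
```

Under these assumptions, a returned figure of 1230 could correspond to any true count from 1225 to 1234, which is why results should be read as approximate indicators of feasibility rather than exact patient numbers.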

6. Review query history and results

The Query History tab gives you a complete list of all queries you have run, along with their statuses, result counts, and timestamps. Use this to revisit previous work, compare results across runs, and manage or share historical queries.

6.1 Access Query History

Select the Query History tab at the top of the Cohort Discovery workspace.
Your previous queries are listed in reverse chronological order.
Click a past query to rerun it, edit it, download it in JSON format, or delete it.

6.2 Query statuses

Status	Meaning
Successful	The query ran and returned results.
Pending	The query has been submitted and is being processed.
Failed	The query encountered an error. Review your query structure and try again.

7. Use in-app guidance

Cohort Discovery includes a built-in guidance panel that you can open at any time while working in the query builder. It provides definitions of core terms and short explanations of the main workflow steps, so you can get help without leaving the screen.

7.1 Open the guidance panel

Look for the help icon (?) or ‘Help’ tab at the top right of the query workspace to access Guidance and Help videos.
Select it to open the Terms Glossary and Context panel.
The panel covers: Query Building, Component Terms, Component Refinement, Understanding the Hierarchy, and Running Queries.
Select Close when you are done to return to your query.

Support button

A yellow ‘Need support?’ button is available at the bottom right of the query workspace at all times. Use it to contact the support team if you encounter an issue that in-app guidance does not resolve.

8. Get help and support

If you need help with Cohort Discovery, the following support options are available.

In-app support

Look for the help icon (?) or ‘Help’ tab at the top right of the query workspace to access Guidance and Help videos.
Use the guidance text available on the right-hand panel when interacting with the site.
Use the yellow ‘Need support?’ button at the bottom right of the query workspace to contact the team directly.

HDR UK Service Desk

For access requests, technical issues, or governance queries, email gateway@hdruk.ac.uk or raise a ticket via the HDR UK Service Desk. Select the Cohort Discovery category when submitting your request.

FAQs

Who can access Cohort Discovery?

Access is available to approved researchers from academia, public sector, industry, and international organisations. Your level of access may vary based on your organisation type and the permissions set by individual data custodians.

What data can I search using Cohort Discovery?

You can explore patient counts across pseudonymised datasets provided by participating data custodians. Each dataset reflects what has been made available for discovery based on the data partner's governance, format, and permissions.

Can I use Cohort Discovery to run live analyses or download data?

No. Cohort Discovery is for feasibility assessment only. It returns aggregate counts—not individual-level or downloadable data. If you wish to request access to the actual data, you must submit a formal request via the Gateway.

I’m based outside the UK—can I use Cohort Discovery?

Yes. Some data custodians allow access to international users, while others restrict access to UK-based researchers. The system automatically filters what you can access based on your user group.

Still can’t find what you’re looking for?

The quickest way to get your issue solved is through the links above, but if you aren’t able to find a solution then contact us here:

Contact support