HDR Gateway logo

White Swan UK Oncology Online Patient & Public Conversations Dataset

Population Size

118,984

People

Years

2023

Associated BioSamples

None/not available

Geographic coverage

United Kingdom

Lead time

1-2 months

Summary

The dataset contains anonymised patient and public conversations that have taken place online about more than 50 cancer types, including both the most commonly experienced cancers and rarer types.

Documentation

The dataset is curated around specific cancer types and cancer patient forums. It does not include every social media post about cancer in the online sources, because many such posts are irrelevant to the patient experience.

Dataset type
Health and disease, Treatments/Interventions, Measurements/Tests, Imaging types, Socioeconomic, Lifestyle
Dataset sub-type
Cancer
Dataset population size
118984

Keywords

oncology, cancer, carcinoma, melanoma, leukemia, sarcoma, adenocarcinoma, lymphoma, myeloma

Observations

Observed Node
Persons
Disambiguating Description
The number of persons in this dataset is the number of unique display names in the data. This count is calculated separately for each source (Reddit, reviews, other forums), and the per-source counts are then summed. On review sites and other forums, users may choose to identify themselves as anonymous; in that case, all anonymous users on a domain are counted as one person for that domain (for example, on 'https://healthunlocked.com/lungcancer').
Measured Value
118984
Measured Property
Unique online names indicating number of persons
Observation Date
16 Apr 2025
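The counting rule for Persons can be sketched as follows. This is a minimal illustration, not White Swan's implementation: the `Post` record, its field names, and the rule that an author is treated as anonymous when their display name reads "anonymous" are all assumptions made for the example. The domain `example-forum.org` in the usage lines is hypothetical.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Post(NamedTuple):
    """One conversation record (hypothetical schema, not the published one)."""
    source: str        # e.g. "reddit", "reviews", "other forums"
    domain: str        # e.g. "healthunlocked.com"
    display_name: str  # name the author chose to show


# Assumption: anonymous authors are recognised by this placeholder name.
ANONYMOUS_NAMES = {"anonymous"}


def count_persons(posts: Iterable[Post]) -> int:
    """Count unique display names per source, then sum over sources.

    Anonymous authors cannot be told apart, so all anonymous posters on
    one domain are counted as a single person for that domain.
    """
    persons_by_source: dict[str, set] = defaultdict(set)
    for post in posts:
        if post.display_name.strip().lower() in ANONYMOUS_NAMES:
            # One "person" per domain for all anonymous posters there.
            key = ("anonymous", post.domain)
        else:
            key = ("named", post.display_name)
        persons_by_source[post.source].add(key)
    # Per-source unique counts are totaled together.
    return sum(len(keys) for keys in persons_by_source.values())


posts = [
    Post("reddit", "reddit.com", "user_a"),
    Post("reddit", "reddit.com", "user_a"),
    Post("reddit", "reddit.com", "user_b"),
    Post("other forums", "healthunlocked.com", "Anonymous"),
    Post("other forums", "healthunlocked.com", "anonymous"),
    Post("other forums", "example-forum.org", "Anonymous"),
    Post("other forums", "healthunlocked.com", "user_a"),
]
print(count_persons(posts))  # → 5
```

Here Reddit contributes 2 persons and "other forums" contributes 3 (one anonymous person per domain plus `user_a`). Because counts are summed per source, `user_a` is counted once in each source in which they appear.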

Provenance

Purpose of dataset collection
Research cohort
Source of data extraction
Free text NLP
Collection source setting
Other
Image contrast
Not stated
Biological sample availability
None/not available

Structural Metadata

Details

Publishing frequency
Irregular
Version
1.0.0
Modified

07/04/2025

Citation Requirements
White Swan is a registered charity in England and Wales (charity number 1176486) that improves health and wellbeing through AI technology and analytics.

Coverage

Start date

01/03/2023

Time lag
1-2 months
Geographic coverage
United Kingdom
Maximum age range
112
Follow-up
Other

Accessibility

Language
en
Alignment with standardised data models
OTHER, LOCAL
Controlled vocabulary
LOCAL, OTHER, HPO
Format
csv, xlsx, web page explorer

Data Access Request

Dataset pipeline status
Available
Access rights
In Progress
Time to dataset access
1-2 months
Access request cost
On Request
Access method category
Varies based on project
Access service description
On Request
Jurisdiction
UK
Data use limitation
Project-specific restrictions
Data use requirements
Project-specific restrictions
Data Controller
White Swan
Data Processor
White Swan