THE IMPORTANCE OF HEALTH DATA IN A DIGITAL WORLD
Throughout history, we’ve collected a huge amount of health information. In ancient and medieval times, health data was recorded on papyrus for teaching purposes in various formats. Fast forward to the 19th century, and we had more complex systems that analysed family history, habits, previous illnesses, and physical examinations.¹ Today, data collection is fundamental in healthcare. It underpins evidence-based decisions, enhances patient care and can be transformative in personalised treatment.
THE EVOLVING METHODS OF HEALTH DATA COLLECTION
The Greeks wrote narratives documenting illnesses and how they were cured.³ Hippocrates, the ancient Greek physician to whom the Hippocratic Oath is attributed, was one of the first recorded individuals to turn away from divine explanations for health and use observation (a form of data collection) to investigate treatments.⁴
THE MODERN HEALTH DATA REVOLUTION
Modern healthcare data can inform many different aspects of a patient’s care. By 2011, over 50% of doctors were using electronic health records.⁷ These give healthcare providers a comprehensive view of a patient’s health. Healthcare professionals also frequently use digital tools, such as wearable devices and mobile health applications, in targeted interventions before the onset of health conditions or diseases, or to help monitor or treat them. For example, information about blood sugar levels, diet or activity levels could help clinicians diagnose, monitor or treat type-2 diabetes. These tools also enable citizens to manage and take control of their own health.
Health data today comes in several broad types:
Clinical Data: Information created by healthcare professionals, such as diagnoses, treatment plans, medications, test results and imaging (e.g. X-rays).
Patient-Generated or Self-Reported Data: Information created by the individual, such as symptoms, activity, heart rate or sleep metrics from wearable devices, blood pressure or temperature monitoring, or apps tracking symptoms.
Administrative and Financial Data: Information used for insurance claims, billing, patient demographics, appointment dates, or hospital resource management.
Public Health and Surveillance Data: Information that is aggregated and used to monitor diseases, such as disease registries or data from population surveys.
Genomics and Omics Data: Information gathered from blood or tissue samples, including genetic information, risk assessments, and screening data.
Environmental and Social Data: Broader information that can affect health, such as housing data, pollution levels, income levels and employment rates.
STAGE’S HEALTH DATA ETHOS
Data is a useful tool for scientific research, but all projects, including STAGE, have to balance the good of medical discovery with the importance of preserving individual privacy. STAGE handles data from across Europe and takes the necessary steps to anonymise that data, allowing it to be used in our studies whilst protecting individual privacy. One of the ways we do this is by using federated analysis principles, where data is kept secure at its source and is analysed as a single block, rather than as individual bits of information. This allows the project to make the most of the statistical power (the likelihood that a study detects an effect or difference that really exists) of Europe-wide data, whilst also comparing differences between data cohorts and preserving the security of individuals’ data.
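As a rough illustration of the federated principle described above, the Python sketch below has each cohort compute only summary statistics locally, which are then pooled centrally. It is a minimal sketch under stated assumptions, not STAGE’s actual pipeline: the names SiteSummary, summarise_locally and pool, and the blood pressure values, are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class SiteSummary:
    """Aggregates a cohort shares; no individual-level rows are included."""
    n: int            # number of participants
    total: float      # sum of the measurement
    total_sq: float   # sum of the squared measurement


def summarise_locally(values):
    """Runs inside each cohort's own environment (hypothetical helper)."""
    return SiteSummary(
        n=len(values),
        total=sum(values),
        total_sq=sum(v * v for v in values),
    )


def pool(summaries):
    """Combine site-level aggregates into a pooled mean and standard deviation."""
    n = sum(s.n for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)  # sample variance
    return n, mean, sqrt(max(variance, 0.0))


# Hypothetical systolic blood pressure readings held by two separate cohorts
cohort_a = [120.0, 130.0, 125.0]
cohort_b = [110.0, 140.0]

# Only the three aggregate numbers per cohort cross the site boundary
shared = [summarise_locally(cohort_a), summarise_locally(cohort_b)]
n, mean, sd = pool(shared)
print(f"participants={n}, mean={mean:.1f}, sd={sd:.2f}")
```

The pooled mean and standard deviation match what an analysis of the combined rows would give, yet only three numbers per cohort are shared. Real federated platforms typically add further safeguards, such as refusing to return results for very small groups.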
The project follows the four ‘FAIR’ (Findable, Accessible, Interoperable and Reusable) data principles to maintain responsible, transparent, and ethical data usage. This helps others discover, use, and build upon research, which increases transparency, collaboration, and scientific progress.
THE DATA BEING USED IN STAGE
Re-using existing data
In STAGE, we are using data from twenty cohort studies, two biobanks and one administrative health database, with over 3.8 million study participants across the data sources.
Table 1. List of datasets in STAGE
The types of data included in these datasets fall into the following categories:
- social, lifestyle and environmental exposures
- diagnostics, e.g. International Classification of Diseases (ICD) codes
- clinical, e.g. body mass index, blood pressure
- self-reports
- medications, e.g. Anatomical Therapeutic Chemical (ATC) classification
- genetic
- and proteome, metabolome and methylome (omics).
Collecting new data
In two of the STAGE cohorts, KORA and NFBC1966, we are collecting new data as part of clinical studies running from January 2026 until June 2028. We expect around 11,000 participants.
The two studies will monitor and quantify the impacts of preventative measures on biological ageing, multi-morbidity and wellbeing. The technology (two different apps) used in the studies will measure and monitor health goals, collecting data on mobility, cognitive functions, general wellbeing and dietary factors. This is in addition to traditional clinical data collection and physical assessments.
We’re particularly grateful to all participants of the studies who are contributing to the STAGE research.
Data Infrastructure
The project is using its new metadata (data about the data) to further develop the MOLGENIS European Health Data and Sample Catalogue, established in previous projects involving the team from UMCG and other STAGE partners.
One key data-related output from STAGE will be the Healthy Ageing Intelligence Portal. This platform aims to streamline internal and external reuse of STAGE data and will provide open access to a range of information, including cohort metadata, geospatial data, apps, methodologies, models, and other forms of data. The portal will facilitate the discovery of other data-driven outputs of the project, such as the Age-Friendly Neighbourhood Atlas (see below), microsimulation models and agent-based model case studies, all designed to help inform healthy ageing policies across Europe. This portal will be fully aligned with FAIR data principles.
KEY OUTPUTS FROM STAGE
STAGE will use its data to better understand ageing and the risk of developing multi-morbidity. In addition to the Healthy Ageing Intelligence Portal, data will be used to create several other STAGE outputs:
Age-Friendly Neighbourhood Atlas: A Europe-wide, interactive, digital atlas that will provide a neighbourhood-level ‘healthy ageing index’ of all regions of Europe. This will enable users to visualise, examine and compare the healthy and active ageing characteristics in different areas.
Trustworthy and Robust Artificial Intelligence (AI) Tools: AI models will account for a wide range of factors, including medical history, lifestyle habits and environmental exposures.
This rich combination of data allows us to paint a fuller picture of an individual’s health and help predict the risk of multi-morbidity across the life-course.
Modern research is built on data. STAGE will continue to use the information we access in a FAIR, inclusive manner, balancing both scientific development and the protection of data privacy and reusability. Ultimately, this will ensure that the project can deliver useful, safe tools that help us age more healthily and without multiple health conditions – whilst having the information needed to make informed policy or healthcare decisions across Europe.
REFERENCES
¹ Gillum RF. From papyrus to the electronic tablet: a brief history of the clinical medical record with lessons for the digital age. Am J Med. 2013 Oct;126(10):853-7. doi: 10.1016/j.amjmed.2013.03.024. PMID: 24054954.
² https://www.britishmuseum.org/collection/object/W_K-249
³ https://www.nlm.nih.gov/hmd/topics/greek-medicine/index.html
⁴ https://www.nlm.nih.gov/hmd/topics/greek-medicine/index.html
⁵ https://casebooks.lib.cam.ac.uk/astrological-medicine/history-of-medical-records
⁶ World Economic Forum – How to harness health data to improve patient outcomes
⁷ Hahn KA, Ohman-Strickland PA, Cohen DJ, et al. Electronic Medical Records Are Not Associated With Improved Documentation in Community Primary Care Practices. American Journal of Medical Quality. 2011;26(4):272-277. doi: 10.1177/1062860610392365
FURTHER READING
Video: Developing a Metadata Catalogue, Interview with Morris Swertz, PI in STAGE
Video: The LongITools Project Metadata Catalogue – explains its benefits, including its further development in the STAGE Project
Videos on how to use the European Health Research and Sample Data Catalogue are also available. Please visit the Molgenis Playlist on YouTube.
