The “Unravelling Data for Rapid Evidence-Based Response to COVID-19” (unCoVer) project, funded by the European Union's Horizon 2020 (H2020) Framework Programme for Research and Innovation and coordinated by the Prins Leopold Instituut voor Tropische Geneeskunde in Antwerp, collects electronic health records of more than 22,000 hospitalized COVID-19 patients, as well as surveillance and screening data on more than 1,900,000 positive cases, with continuous updates.
These data come mainly from front-line hospitals, but also from national health agencies and investigator-led observational studies. The information is heterogeneous: beyond the diversity of sources, patient records contain a wide variety of variables, reflecting the range of possible individual responses to the virus. The result is a large body of data whose analysis could reveal useful patterns for tackling COVID-19.
This information is valuable and available but underused, mainly because of two barriers: the technical difficulty of jointly analyzing non-harmonized data from different countries, and the handling of sensitive information, such as medical data, that is subject to international ethical and data protection guidelines.
The unCoVer network proposes a solution to both problems: a federated data infrastructure that makes the data interoperable in a secure environment while complying with ethical and data protection guidelines. This work is carried out mainly in the work package led by the MIDAS (Data Mining and Simulation) and GBT (Biomedical Engineering and Telemedicine Centre) groups of the Universidad Politécnica de Madrid (UPM), under the leadership of Prof. Ernestina Menasalvas and Prof. Enrique J. Gómez Aguilera.
The proposed solution is based on federated data analytics, a machine learning paradigm that addresses data governance and privacy by training algorithms collaboratively without exchanging the data itself. Developed as an approach to cross-institutional collaboration, it moves the process to the data rather than the data to the process: institutions exchange aggregated statistics instead of patient-level data.
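The principle of exchanging aggregated statistics instead of patient-level data can be illustrated with a pooled mean and standard deviation. DataSHIELD itself is used from R, so this Python sketch only shows the idea, not its API: each site returns a count, a sum and a sum of squares, and the coordinator combines them. The names, the toy ages and the `MIN_COUNT` disclosure threshold are all hypothetical.

```python
import math
from dataclasses import dataclass

# Hypothetical disclosure rule: a site refuses to release aggregates computed
# from fewer records than this. Real thresholds are set by each deployment.
MIN_COUNT = 5


@dataclass
class SiteSummary:
    """Aggregates a site returns in place of its patient-level records."""
    n: int
    total: float
    total_sq: float


def local_summary(values: list[float]) -> SiteSummary:
    """Runs inside one institution; only counts and sums leave the site."""
    if len(values) < MIN_COUNT:
        raise ValueError("too few records to release an aggregate")
    return SiteSummary(len(values), sum(values), sum(v * v for v in values))


def pooled_mean_sd(summaries: list[SiteSummary]) -> tuple[float, float]:
    """Runs at the coordinator; combines site aggregates into pooled statistics."""
    n = sum(s.n for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    # Sample variance (n - 1 denominator) recovered from the pooled sums.
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, math.sqrt(variance)


# Toy ages for three hypothetical hospitals; each list stays at its own site.
site_a = [67.0, 71.0, 58.0, 80.0, 62.0]
site_b = [45.0, 53.0, 76.0, 69.0, 88.0, 59.0]
site_c = [72.0, 64.0, 81.0, 55.0, 70.0]

summaries = [local_summary(site) for site in (site_a, site_b, site_c)]
mean, sd = pooled_mean_sd(summaries)
print(f"pooled mean age {mean:.2f}, SD {sd:.2f}")
```

The result matches what a single analyst would compute on the combined records, yet no individual age ever leaves its site. The sum-of-squares formula is chosen here for brevity; with large values it can lose precision, and a more careful version would combine per-site means and variances instead.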
The first step is to harmonize the heterogeneous data from 23 institutions, including hospitals, research centers, non-profit organizations and local health authorities. This also requires an infrastructure that supports data federation, for which OPAL has been used. Once the data is available and harmonized, it is accessed through DataSHIELD, which runs the analysis locally at each site and then integrates the results, never the data. The data remains secure and under the complete control of its custodians.
The infrastructure generated is shown in the figure.
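Harmonization, the first step of this pipeline, essentially means mapping each institution's local variable names, category codes and units onto one common schema. The following Python sketch shows that kind of per-site mapping; the site names, variables, codes and target schema are all hypothetical and do not reproduce the actual unCoVer data model.

```python
from typing import Any, Callable

Record = dict[str, Any]
Converter = Callable[[Any], Any]


def recode(table: dict[str, str]) -> Converter:
    """Map local category codes to harmonized labels; unknown codes raise KeyError."""
    return lambda code: table[str(code)]


def fahrenheit_to_celsius(value: Any) -> float:
    return round((float(value) - 32.0) * 5.0 / 9.0, 1)


# Hypothetical per-site mappings: local variable name -> (common name, converter).
# Note that "M" means "female" (mujer) at one site and "male" at the other.
SITE_MAPPINGS: dict[str, dict[str, tuple[str, Converter]]] = {
    "site_es": {
        "edad": ("age_years", int),
        "sexo": ("sex", recode({"H": "male", "M": "female"})),
        "temp_c": ("temperature_c", float),
    },
    "site_us": {
        "age": ("age_years", int),
        "gender": ("sex", recode({"M": "male", "F": "female"})),
        "temp_f": ("temperature_c", fahrenheit_to_celsius),
    },
}


def harmonize(site: str, record: Record) -> Record:
    """Rename and convert one local record into the common schema."""
    harmonized: Record = {}
    for local_name, (common_name, convert) in SITE_MAPPINGS[site].items():
        value = record.get(local_name)
        if value is not None:  # missing variables stay missing
            harmonized[common_name] = convert(value)
    return harmonized


print(harmonize("site_es", {"edad": "67", "sexo": "M", "temp_c": "38.2"}))
# -> {'age_years': 67, 'sex': 'female', 'temperature_c': 38.2}
print(harmonize("site_us", {"age": 71, "gender": "M", "temp_f": 100.4}))
# -> {'age_years': 71, 'sex': 'male', 'temperature_c': 38.0}
```

Letting unknown codes raise an error, rather than silently passing them through, is a deliberate choice in this sketch: an unmapped code usually signals a gap in the mapping that should be fixed before any federated analysis runs.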
In addition, the UPM has developed a dashboard that provides access to the analyses. Among other options, it displays General Data, General Quality Analysis, Population Analysis and Survival Analysis. It is a “toolbox” for working with the data in real time, helping to establish firm evidence about the disease on which to base health strategies against the virus. Both the dashboard and the other tools used in the proposed architecture are based on open-source technologies freely available online.
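Survival analysis can also follow the federated pattern. A Kaplan-Meier curve depends only on how many patients had the event and how many were censored at each time point, so sites can share those counts and the coordinator can rebuild the curve exactly as if the records had been pooled. The Python sketch below is illustrative only: it is not the dashboard's code or a DataSHIELD function, the follow-up times are toy values in hypothetical days, and a real deployment would likely bin or suppress small per-day counts to keep them non-disclosive.

```python
from collections import Counter

# Hypothetical site-level summary: (number of patients, deaths per follow-up
# day, censorings per follow-up day such as discharge or loss to follow-up).
SiteTable = tuple[int, Counter, Counter]


def local_table(times: list[int], died: list[bool]) -> SiteTable:
    """Runs at each site: reduce patient-level follow-up to counts per day."""
    events = Counter(t for t, d in zip(times, died) if d)
    censored = Counter(t for t, d in zip(times, died) if not d)
    return len(times), events, censored


def pooled_kaplan_meier(tables: list[SiteTable]) -> list[tuple[int, float]]:
    """Runs at the coordinator: Kaplan-Meier curve from summed site counts."""
    n_total = sum(n for n, _, _ in tables)
    events: Counter = Counter()
    censored: Counter = Counter()
    for _, e, c in tables:
        events.update(e)
        censored.update(c)
    survival, removed, curve = 1.0, 0, []
    for t in sorted(set(events) | set(censored)):
        at_risk = n_total - removed  # patients censored at t still count as at risk at t
        if events[t]:
            survival *= 1.0 - events[t] / at_risk
            curve.append((t, survival))
        removed += events[t] + censored[t]
    return curve


# Toy follow-up (days) and outcomes for two hypothetical hospitals.
hospital_1 = local_table([3, 5, 5, 8, 10], [True, True, False, True, False])
hospital_2 = local_table([2, 5, 7, 10, 12], [True, False, True, True, False])
for day, s in pooled_kaplan_meier([hospital_1, hospital_2]):
    print(f"day {day}: S = {s:.2f}")
```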
The technical tasks carried out by the UPM team in unCoVer are complemented by others aimed at getting the most out of the data, combining clinical and statistical expertise to identify the questions most relevant to the COVID-19 response. Efforts are also made to foster collaboration with the scientific community and society at large through training and dissemination activities.
This project, which ends in November 2022, represents a valuable methodological contribution to the use of medical data against current and future pandemics.
More information on the project website: https://uncover-eu.net/