Artificial Intelligence and the fight against COVID-19

January 20, 2022

When COVID-19 hit Europe in March 2020, hospitals were plunged into a health crisis that was still poorly understood. Doctors did not know how best to treat these patients. The research community scrambled to develop methodologies and software that many believed would allow hospitals to diagnose or triage patients more quickly, providing much-needed support to the professionals fighting the disease on the front lines.

Hundreds of artificial intelligence tools have been created, and many papers have been published over the past two years describing new machine learning models for dealing with COVID-19. These prediction models fall into three categories: models for the general population that predict the risk of contracting COVID-19 or being hospitalized for it; models that support the diagnosis of COVID-19 in patients with suspected infection; and models that support the prognosis of patients with COVID-19. All reported moderate to excellent predictive performance in the investigational phase, but it remains unclear whether any offered real clinical utility.

Two recent articles examined many of these published models. In one of them, published in the BMJ, the authors assessed the usefulness of prediction models for the diagnosis and prognosis of coronavirus disease 2019 (COVID-19). They examined more than 200 algorithms and found that none were suitable for clinical use; only two were flagged as promising enough for further testing.

A second study, conducted by researchers at the University of Cambridge and published in Nature Machine Intelligence, focused on deep learning models that diagnose COVID-19 and predict patient risk from medical images such as chest X-rays and chest computed tomography (CT) scans. They identified 2,212 studies, of which 415 passed initial screening; after quality screening, 62 studies were included in their review, which concluded that none were fit for clinical use.

Both teams found that researchers repeated the same basic errors in the way they trained or tested their tools. Many of the problems they discovered trace back to the poor quality of the data used to develop the tools. Information about COVID-19 patients, including medical scans, was collected and shared in the middle of a global pandemic, often by doctors struggling to treat those very patients. Researchers wanted to help quickly, and these were the only public datasets available. But this meant that many tools were built on mislabeled data or data from unknown sources. Incorrect assumptions about the data often meant that the trained models did not perform as claimed; their performance estimates are likely optimistic and not representative of the target population.

Some tools ended up being tested on the same data they were trained on, which makes them appear more accurate than they are. Many unknowingly used a dataset containing chest scans of children who did not have COVID-19 as examples of what non-COVID cases looked like; as a result, the models learned to identify children, not COVID-19. Many medical scans were labeled according to whether the radiologists who created them said they showed COVID-19, but that embeds any bias of that particular doctor into the ground truth of the dataset. It would be much better to label a scan with the result of a PCR test rather than with one doctor's opinion.
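The first of these pitfalls, evaluating a model on its own training data, can be made concrete with a minimal sketch. This toy example (not taken from the reviewed studies; the dataset and the nearest-neighbour "model" are illustrative) shows how a model that simply memorizes its training set looks perfect when tested on that same data, while its accuracy on held-out data reveals the true picture.

```python
# Toy illustration of train/test leakage: a memorizing model looks
# perfect on its own training data and mediocre on held-out data.
# The dataset and model here are hypothetical, purely for illustration.
import random

random.seed(0)

# Hypothetical dataset: each "scan" is one noisy feature plus a 0/1 label,
# with heavily overlapping classes (so no model can be near-perfect).
data = [(random.gauss(mu=label, sigma=2.0), label)
        for label in (0, 1) for _ in range(200)]
random.shuffle(data)

train, test = data[:300], data[300:]

def predict(x, memory):
    # 1-nearest-neighbour: just recall the closest memorized example.
    return min(memory, key=lambda pair: abs(pair[0] - x))[1]

def accuracy(dataset, memory):
    return sum(predict(x, memory) == y for x, y in dataset) / len(dataset)

leaky_acc = accuracy(train, train)   # evaluated on its own training data
honest_acc = accuracy(test, train)   # evaluated on held-out data

print(f"leaky: {leaky_acc:.2f}, honest: {honest_acc:.2f}")
```

On the training data every point's nearest neighbour is itself, so the "leaky" accuracy is a perfect 1.00, while the honest held-out accuracy is far lower; the gap is exactly the kind of optimism the two reviews found in published COVID-19 models.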

Getting up-to-date, high-quality data would also be easier if formats were standardized. The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), adopted and distributed by the Observational Health Data Sciences and Informatics (OHDSI) research network, is a unified database model for integrating diverse sources of real-world data (RWD), including electronic health records (EHRs), to the same standard. OMOP CDM, now in version 6.0, covers billions of standardized clinical observations from more than 20 countries, including Spain. Properly analyzed, CDM-based real-world data has enormous potential to generate real-world evidence that is relevant, appropriate, and, most importantly, practical enough to be incorporated into clinical practice.
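To give a flavour of what "integrating to the same standard" means, the sketch below maps a hypothetical site-specific EHR record into two core OMOP CDM tables, person and condition_occurrence. The table and column names follow the CDM; the raw record, the lookup dictionaries, and the concept IDs are illustrative stand-ins (in practice the mapping is driven by the OHDSI standardized vocabularies, not hard-coded dicts).

```python
# Illustrative mapping of a raw EHR record into OMOP CDM-shaped rows.
# Concept IDs and the source record are hypothetical examples.
from dataclasses import dataclass
from datetime import date

@dataclass
class Person:
    person_id: int
    gender_concept_id: int        # standard vocabulary concept for gender
    year_of_birth: int

@dataclass
class ConditionOccurrence:
    condition_occurrence_id: int
    person_id: int
    condition_concept_id: int     # standard concept for the diagnosis
    condition_start_date: date
    condition_source_value: str   # original local code, kept for traceability

# A raw, site-specific record as it might appear in a local EHR export.
raw = {"patient": "P-0042", "sex": "F", "born": 1957,
       "dx_code": "U07.1", "dx_date": "2020-03-21"}

# Hypothetical local-code -> standard-concept lookups (normally resolved
# against the OHDSI vocabulary tables).
CONCEPT_MAP = {"U07.1": 37311061}   # illustrative COVID-19 concept ID
GENDER_MAP = {"F": 8532, "M": 8507}

person = Person(person_id=42,
                gender_concept_id=GENDER_MAP[raw["sex"]],
                year_of_birth=raw["born"])

condition = ConditionOccurrence(
    condition_occurrence_id=1,
    person_id=person.person_id,
    condition_concept_id=CONCEPT_MAP[raw["dx_code"]],
    condition_start_date=date.fromisoformat(raw["dx_date"]),
    condition_source_value=raw["dx_code"],
)

print(person)
print(condition)
```

The key design point is that every site ends up with the same tables, the same column semantics, and the same concept IDs, so one analysis script can run unchanged against any conformant database.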

This approach allows contributing sites to run analysis code in a distributed, or federated, fashion: each site runs the analysis internally and returns only a dataset of aggregate results, without sharing patient-level data.
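The federated pattern can be sketched in a few lines. In this toy example (site data and variable names are hypothetical) each site computes only aggregates from its own patient records, and a coordinator pools those aggregates; the patient-level lists never leave the sites.

```python
# Minimal sketch of federated analysis: sites return aggregates only,
# the coordinator combines them. All data here is hypothetical.

def local_summary(patient_ages):
    """Runs at each site: returns only aggregates, never the raw records."""
    return {"n": len(patient_ages), "sum_age": sum(patient_ages)}

def combine(summaries):
    """Runs at the coordinator: pools the aggregates from all sites."""
    n = sum(s["n"] for s in summaries)
    total = sum(s["sum_age"] for s in summaries)
    return {"n": n, "mean_age": total / n}

# Three hospitals, each keeping its patient-level data in-house.
site_a = [67, 54, 71, 80]
site_b = [45, 62, 59]
site_c = [73, 68, 77]

pooled = combine([local_summary(s) for s in (site_a, site_b, site_c)])
print(pooled)   # {'n': 10, 'mean_age': 65.6}
```

Real OHDSI network studies ship far richer analysis packages to each site, but the privacy contract is the same: code travels to the data, and only results travel back.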

A first study aimed to describe the characteristics of hospitalized COVID-19 patients; in particular, it set out to summarize their demographics, medical conditions, and medication use. Its second objective was to compare the characteristics of people hospitalized for COVID-19 with those of patients hospitalized for influenza in previous seasons. A total of 34,128 people hospitalized with COVID-19 in the US, Spain, and South Korea were included in the study.

A second study investigated the use of adjuvant and repurposed drugs in patients admitted to hospital with COVID-19 across three continents (Asia, Europe, and North America). De-identified data from 303,264 patients in 11 databases showed that more than 3,400 different drugs were used to treat COVID-19 patients. Among the most popular in the early stages of the pandemic was hydroxychloroquine, which was heavily promoted without the support of reliable evidence and whose emergency use authorization was later revoked following randomized controlled trials (RCTs) and related studies, including another OHDSI study that showed the danger of combining hydroxychloroquine with another early COVID-19 treatment, azithromycin.

The success of the OHDSI and OMOP CDM approach led to the creation of the European Health Data and Evidence Network (EHDEN), a federated data network of which IOMED is a part. The goal is to further standardize research methodologies on real-world data at scale, harmonizing 100 million health records by 2024 and thus becoming the trusted observational research ecosystem in Europe. Data models that streamline electronic health record (EHR) analysis in near real time, along with open-source tools for analyzing real-world data (RWD), are among the great benefits that OMOP brings. In 2020, these tools were rapidly harnessed in a huge effort to provide fast and reliable evidence against COVID-19, and we are confident they will become a key element of the European clinical data infrastructure.

Although the advances are promising, the associated challenges should not be underestimated. AI algorithms must be robust enough to avoid biased learning, which can easily happen when training datasets are too small, too skewed, or poorly annotated. This requires interdisciplinary international agreements on data sharing, standardization, preservation, anonymization, validation, and ongoing monitoring.

Implementing these tools in the clinic also requires a digitally skilled workforce and widespread access to the latest technologies. At the same time, clinicians and patients need to be involved in the design and development process, because the tools will ultimately succeed only if the people using them feel comfortable doing so. These are just a few of the obstacles facing AI technologies, but they reveal one key common characteristic: the need for a global effort.

Wynants L, Van Calster B, Collins GS, Riley RD, Heinze G, Schuit E, et al. Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal. BMJ 2020;369:m1328. doi:10.1136/bmj.m1328

Roberts, M., Driggs, D., Thorpe, M. et al. Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans. Nat Mach Intell 3, 199–217 (2021).

Burn, E., You, S.C., Sena, A.G. et al. Deep phenotyping of 34,128 adult patients hospitalised with COVID-19 in an international network study. Nat Commun 11, 5009 (2020).

Prats-Uribe A, Sena AG, Lai LYH, Ahmed W, Alghoul H, Alser O, et al. Use of repurposed and adjuvant drugs in hospital patients with covid-19: multinational network cohort study. BMJ 2021;373:n1038. doi:10.1136/bmj.n1038

Alberto Labarga

Senior Data Engineer