Elsevier scientist builds model to predict COVID-19 severity in veterans

For US government challenge, teams used AI to build models that predict the severity of COVID-19

By Alison Bert, DMA - October 6, 2020
Dr. Matthew Clark, Senior Director of Scientific Services for R&D Solutions at Elsevier, was awarded bronze in the VHA Innovation Ecosystem and precisionFDA COVID-19 Risk Factor Modeling Challenge.

With the pandemic raging on this summer, the US government put out a call for expertise. The Veterans Health Administration (VHA) Innovation Ecosystem and the Food and Drug Administration (FDA) called on the scientific and analytics community to develop computational models to predict COVID-19-related health outcomes in veterans.

Using synthetic data based on veterans’ health records, entrants would use machine learning and AI to predict factors such as length of hospitalization and risk of complications and mortality. The top-performing models would be used to investigate additional risk and protective factors, including therapeutics for preexisting conditions and potential drug interactions.

The VHA Innovation Ecosystem and precisionFDA COVID-19 Risk Factor Modeling Challenge drew submissions from teams at the world’s top universities and AI institutes.

Dr. Matthew Clark (upper left) meets with his international team on Zoom.

Then there was a scientist at Elsevier who entered on his own.

Dr. Matthew Clark, Senior Director of Scientific Services for R&D Solutions, realized that this challenge was similar to the work his team does for pharmaceutical companies.

“We help them solve problems by analyzing data and answering questions with data, as well as creating predictive models,” said Matthew, who has expertise in computer science and predictive pharmacology.

In addition, his team has been publishing about the biological pathways of COVID-19, and Matthew was eager to contribute to knowledge that could advance the treatment of the disease.

So after talking with his team, Matthew set out to tackle this challenge outside of work.

Analyzing millions of data points

Chart of timeline on <a href="https://precision.fda.gov/challenges/11/view" target="_blank">Challenge website</a>

The challenge provided synthetic health record data of 117,959 US veterans born between 1935 and 1999. The data included factors like temperature, blood pressure, weight, drugs prescribed, procedures performed and conditions that were diagnosed.

“Each patient may have visited the doctor several times over the past few years, so it included 6 million physician visits, providing 40 million data points,” Matthew said.

Preparing this data for analysis would be a massive undertaking. As Matthew explained, “70 to 80 percent of the work is organizing the data so you can use a modeling technique.”

The process involved extracting the data from the medical records and structuring it so that patients could be compared with one another. Matthew wrote custom programs to reformat the data and load it into PostgreSQL, an open source database system. It took him two weeks, working nights and weekends.
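To give a sense of what this kind of restructuring can look like, here is a minimal Python sketch. It is not Matthew's code, and the article does not describe his programs in detail. The record layout, the measurement names and the "keep the most recent value" rule are all assumptions made for illustration. The idea is to turn many rows per patient (one per measurement per visit) into a single row per patient with the same columns, so that patients can be compared side by side.

```python
from collections import defaultdict

# Hypothetical long-format records: (patient_id, visit_date, measurement, value).
# The field names and values are invented for this example.
records = [
    ("p1", "2019-03-01", "body_temperature_c", 37.0),
    ("p1", "2020-04-10", "body_temperature_c", 38.6),
    ("p1", "2020-04-10", "systolic_bp_mmhg", 151.0),
    ("p2", "2020-02-02", "systolic_bp_mmhg", 118.0),
]


def latest_value_per_patient(rows):
    """Return {patient_id: {measurement: most recent value}}."""
    latest = defaultdict(dict)  # patient -> measurement -> (date, value)
    for patient_id, visit_date, measurement, value in rows:
        seen = latest[patient_id].get(measurement)
        # ISO-formatted dates sort correctly when compared as strings.
        if seen is None or visit_date > seen[0]:
            latest[patient_id][measurement] = (visit_date, value)
    return {
        pid: {name: value for name, (_, value) in feats.items()}
        for pid, feats in latest.items()
    }


def to_feature_rows(rows, columns):
    """Build one fixed-width row per patient; missing measurements become None."""
    table = latest_value_per_patient(rows)
    return {
        pid: [feats.get(col) for col in columns]
        for pid, feats in sorted(table.items())
    }


# Assumed column order for the per-patient table.
COLUMNS = ["body_temperature_c", "systolic_bp_mmhg"]
feature_rows = to_feature_rows(records, COLUMNS)

for pid, row in feature_rows.items():
    print(pid, row)
```

In a real pipeline, rows like these would then be loaded into a database table. Choosing how to summarize repeated visits (latest value, average, trend) is a modeling decision the article does not specify.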

Matthew said the nature of his work at Elsevier, which involved much travelling before the pandemic, prepared him for this part of the challenge: “You become efficient at doing small amounts of work in small time bites and coming back to it.”

His ultimate goal was to determine which factors, or combinations of factors, were related to patient survival and which would likely lead to ventilation or long hospital stays. To build the model, he developed machine learning algorithms to analyze the data.

But beyond analyzing independent factors, his model considered factors in combination: “Having two things happening at the same time can be more important than each one happening individually,” he explained. “In other words, the presence of two factors, like high blood pressure and high heart rate, may be much more important than either of them individually. The impact of having them together is greater than their impacts individually.”
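One common way to express this idea in a model is an interaction term. The Python sketch below is a toy illustration, not the model Matthew built: the logistic form and every coefficient are assumptions chosen to make the effect visible. It adds a term that only switches on when both factors are present, so the combined risk is larger than the individual effects would suggest.

```python
import math

# Hypothetical coefficients, chosen only for illustration (not from the study).
INTERCEPT = -4.0
B_FEVER = 0.5          # effect of fever alone
B_HIGH_BP = 0.4        # effect of high systolic blood pressure alone
B_FEVER_AND_BP = 1.5   # extra effect that applies only when both are present


def risk(fever: int, high_bp: int) -> float:
    """Predicted probability from a logistic model with one interaction term.

    fever and high_bp are 0/1 indicators.
    """
    log_odds = (
        INTERCEPT
        + B_FEVER * fever
        + B_HIGH_BP * high_bp
        + B_FEVER_AND_BP * fever * high_bp  # the interaction term
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


for fever, high_bp in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(f"fever={fever} high_bp={high_bp} risk={risk(fever, high_bp):.3f}")
```

With these made-up numbers, the predicted risk for a patient with both factors is several times higher than for a patient with either factor alone, which is the pattern the quote describes.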

What did his model reveal?

In building his model, Matthew discovered that key combinations were body temperature combined with systolic blood pressure, body weight or heart rate. These have a “positive impact,” raising risk of complications and death. Meanwhile, “negative impact” factors predictive of better outcomes were (younger) age combined with low heart rate and not having elevated platelets. The latter is consistent with reports of COVID patients having issues with blood clots; elevated platelets make patients more prone to clotting.

The chart illustrates the relative magnitude of the most impactful factors affecting COVID survival found in the study. Combined fever and high systolic blood pressure was the largest risk factor. Combined fever and high body weight was the second most impactful for survival. Fever with high heart rate, and QALY, a measure of overall health, were also impactful. (Source: Matthew Clark)

Winners were those whose algorithms most closely predicted the actual health outcomes of the veterans.

“If you’re at the Olympics, it’s good to be on the podium.”

Results on <a href="https://precision.fda.gov/challenges/11/view/results" target="_blank">Challenge website</a>

When the results were posted this month, Matthew discovered he had been awarded a bronze medal.

“I was excited,” he said. “Even if you don’t win first place, if you’re at the Olympics of data science, it’s good to be on the podium.”

His colleagues at Elsevier were also delighted. “This is really impressive work, especially when you look at the other groups at the top, which include some of the top AI groups in the world,” said Tim Hoctor, VP of Life Science Solutions Services. “It shows clearly the applied value of both our data and our capabilities.”

Recognition aside, Matthew said the impact of this kind of work is what matters most. For example, he won the medal for predicting the chance of mortality from COVID-19. Knowledge like that helps medical professionals determine which factors on the patient’s EHR are associated with higher risk. As Matthew explained:

“If you’re (caring for patients) in the hospital, you can say, this person has factors signaling a much higher chance of dying from COVID; we should watch them closely.”


Written by

Alison Bert, DMA

As Executive Editor of Strategic Communications at Elsevier, Dr. Alison Bert works with contributors around the world to publish daily stories for the global science and health communities. Previously, she was Editor-in-Chief of Elsevier Connect, which won the 2016 North American Excellence Award for Science & Education.

Alison joined Elsevier in 2007 from the world of journalism, where she was a business reporter and blogger for The Journal News, a Gannett daily newspaper in New York. In the previous century, she was a classical guitarist on the music faculty of Syracuse University. She received a doctorate in music from the University of Arizona, was a Fulbright scholar in Spain, and studied in a master class with Andrés Segovia.
