With the pandemic raging on this summer, the US government put out a call for expertise. The Veterans Health Administration (VHA) Innovation Ecosystem and the Food and Drug Administration (FDA) called on the scientific and analytics community to develop computational models to predict COVID-19 related health outcomes in veterans.
Using synthetic data based on veterans’ health records, entrants would use machine learning and AI to predict factors such as length of hospitalization and risk of complications and mortality. The top-performing models would be used to investigate additional risk and protective factors, including therapeutics for preexisting conditions and potential drug interactions.
The VHA Innovation Ecosystem and precisionFDA COVID-19 Risk Factor Modeling Challenge drew submissions from teams at the world’s top universities and AI institutes.
Then there was a scientist at Elsevier who entered on his own.
“We help them solve problems by analyzing data and answering questions with data, as well as creating predictive models,” said Matthew, who has expertise in computer science and predictive pharmacology.
In addition, his team has been publishing about the biological pathways of COVID-19, and Matthew was eager to contribute to knowledge that could advance the treatment of the disease.
So after talking with his team, Matthew set out to tackle this challenge outside of work.
Analyzing millions of data points
The challenge provided synthetic health record data of 117,959 US veterans born between 1935 and 1999. The data included factors like temperature, blood pressure, weight, drugs prescribed, procedures performed and conditions that were diagnosed.
“Each patient may have visited the doctor several times over the past few years, so it included 6 million physician visits, providing 40 million data points,” Matthew said.
Preparing this data for analysis would be a massive undertaking. As Matthew explained, “70 to 80 percent of the work is organizing the data so you can use a modeling technique.”
The process involved extracting the data from the medical records and structuring it so that patients could be compared with each other. Matthew wrote custom programs to reformat the data and load it into PostgreSQL, an open source database system. It took him two weeks, working nights and weekends.
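To make the idea of "structuring the data so patients can be compared" concrete, here is a minimal sketch in Python. It is not Matthew's code, and the article does not describe his schema. The record layout, field names and the choice to keep only the most recent reading per measurement are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical long-format records, one row per measurement taken at a visit:
# (patient_id, visit_date in ISO format, measurement name, value).
records = [
    ("p1", "2020-01-05", "temperature", 37.1),
    ("p1", "2020-03-10", "temperature", 38.4),
    ("p1", "2020-03-10", "heart_rate", 95.0),
    ("p2", "2020-02-01", "temperature", 36.8),
]

def latest_per_patient(rows):
    """Collapse visit-level rows into one feature dict per patient.

    Assumption: the most recent value of each measurement is kept.
    ISO dates compare correctly as plain strings.
    """
    latest = defaultdict(dict)
    for patient_id, date, name, value in rows:
        previous = latest[patient_id].get(name)
        if previous is None or date > previous[0]:
            latest[patient_id][name] = (date, value)
    # Drop the dates, leaving {patient_id: {measurement: value}}.
    return {
        patient_id: {name: value for name, (_, value) in features.items()}
        for patient_id, features in latest.items()
    }

patient_table = latest_per_patient(records)
```

Once every patient is reduced to the same kind of row, the rows can be loaded into a database table or passed to a modeling library. A measurement a patient never had is simply absent here and would need to be handled before modeling.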
Matthew said the nature of his work at Elsevier, which involved much travelling before the pandemic, prepared him for this part of the challenge: “You become efficient at doing small amounts of work in small time bites and coming back to it.”
His ultimate goal was to determine which factors, or combinations of factors, were related to patient survival and which would likely lead to ventilation or long hospital stays. To build the model, he developed machine learning algorithms to analyze the data.
But beyond analyzing independent factors, his model considered factors in combination: “Having two things happening at the same time can be more important than each one happening individually,” he explained. “In other words, the presence of two factors, like high blood pressure and high heart rate, may be much more important than either of them individually. The impact of having them together is greater than their impacts individually.”
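One common way to let a model weigh factors in combination is to add an "interaction" feature for each pair of factors. The sketch below illustrates that general technique only. The article does not say how Matthew's model represented combinations, and the flag names here are hypothetical.

```python
from itertools import combinations

def add_pairwise_interactions(features):
    """Return a copy of `features` with one extra entry per pair of factors.

    Each interaction is the product of the two values, so for 0/1 flags it
    equals 1 only when both factors are present at the same time.
    """
    expanded = dict(features)
    for a, b in combinations(sorted(features), 2):
        expanded[f"{a}*{b}"] = features[a] * features[b]
    return expanded

# Hypothetical patient: high blood pressure and high heart rate, no fever.
patient = {"high_bp": 1, "high_hr": 1, "fever": 0}
expanded = add_pairwise_interactions(patient)
```

For this patient, the combined "high_bp*high_hr" feature is 1 while both fever combinations are 0. A model given these extra columns can assign the pair its own weight, separate from the weights of the two individual factors.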
What did his model reveal?
In building his model, Matthew discovered that the key combinations were body temperature combined with systolic blood pressure, body weight or heart rate. These have a “positive impact,” raising the risk of complications and death. Meanwhile, “negative impact” factors predictive of better outcomes were (younger) age combined with low heart rate and not having elevated platelets. The latter is consistent with reports of COVID patients having issues with blood clots; elevated platelets make patients more prone to clotting.
Winners were those whose algorithms most closely predicted the actual health outcomes of the veterans.
“If you’re at the Olympics, it’s good to be on the podium.”
When the results were posted this month, Matthew learned he had been awarded a bronze medal.
“I was excited,” he said. “Even if you don’t win first place, if you’re at the Olympics of data science, it’s good to be on the podium.”
His colleagues at Elsevier were also delighted. “This is really impressive work, especially when you look at the other groups at the top, which include some of the top AI groups in the world,” said Tim Hoctor, VP of Life Science Solutions Services. “It shows clearly the applied value of both our data and our capabilities.”
Recognition aside, Matthew said the impact of this kind of work is what matters most. For example, he won the medal for predicting the chance of mortality from COVID-19. Knowledge like that helps medical professionals determine which factors in a patient’s electronic health record (EHR) are associated with higher risk. As Matthew explained:
If you're (caring for patients) in the hospital, you can say, this person has factors signaling a much higher chance of dying from COVID; we should watch them closely.