Worldwide, there are more than 9 million doctors and 18 million nurses, all taking notes on their patients. Those notes increasingly take the form of electronic health records (EHRs). Collectively, EHRs provide information on how healthcare professionals make treatment decisions and arrive at diagnoses.
However, getting intelligence out of this textual information on a large scale is still a challenge.
One way to tackle it is by developing a deep learning framework — a type of artificial intelligence that can tag, or annotate, the text with more structured information that is easier to analyze. Jingqing Zhang (张敬卿), a PhD candidate at Imperial College London, has been working on this approach for his PhD, in collaboration with LexisNexis, under the LexisNexis Risk Solutions HPCC Systems academic program. LexisNexis is part of Elsevier’s parent company, RELX.
Jingqing hopes to develop a deep learning framework that will “make the clinical decision process smarter, more precise and more personalized.”
A chance connection
Jingqing started working with natural language processing (NLP) during his undergraduate degree in Computer Science and Technology at Tsinghua University in China. In 2016, he moved to London and joined Imperial’s Department of Computing to study for a Master of Research (MRes) and PhD.
The collaboration with LexisNexis was established by Jingqing’s supervisor, Prof Yi-Ke Guo, Co-Director of the Data Science Institute and Professor in the Faculty of Engineering at Imperial. “My initial research interest was in natural language processing, and my supervisor proposed the collaboration through his connections at LexisNexis,” Jingqing said. “I thought it was a great idea, so I wanted to go ahead.”
The connection between Jingqing and LexisNexis may have happened by chance, but it ended up being a productive collaboration thanks to their shared interests. Jingqing said:
LexisNexis are interested in how we can apply their data processing system, which is called HPCC Systems, in our data flow. We are using their system to process the textual data we have more efficiently, and then we apply our deep learning algorithms to extract intelligence from it.
Clinical decisions as a case study
Deep learning models and NLP can be applied across industries and domains to understand decision-making processes. Jingqing works on the technology more broadly, aiming to make it suitable for any NLP task.
At the beginning of his academic program, he explored different ideas and tried to validate the basic deep learning models using sequential data, including textual and traffic data. In the last year, he has been working on large-volume data available on the internet. For his PhD, he needed a case study; he chose the clinical domain, as he explained:
We're interested in scenarios that require a certain explanation, like clinical decisions. For example, we need to understand why physicians prescribe a particular kind of medication at a particular dose instead of a different medication or dose. We believe that using the information in electronic health records can help us find connections and extract intelligence.
The data in EHRs is a kind of prior knowledge. There are two types of prior knowledge: unstructured and structured. Unstructured prior knowledge is in the form of text and can be found in a large text corpus, such as EHRs, Wikipedia, news articles and books. Structured knowledge can be defined by an ontology — a sort of taxonomy that outlines the categories and relationships between them for a certain topic.
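As a toy illustration of the structured kind of prior knowledge (the category names and relations below are loosely modeled on the Human Phenotype Ontology, but they are illustrative, not actual HPO entries), an ontology can be sketched as parent–child relations between categories:

```python
# Toy sketch of structured prior knowledge: an ontology as a set of
# parent -> children relations. The category names are illustrative,
# loosely modeled on the Human Phenotype Ontology.
ontology = {
    "Phenotypic abnormality": [
        "Abnormality of the blood",
        "Abnormality of the skeletal system",
    ],
    "Abnormality of the blood": ["Anemia"],
    "Abnormality of the skeletal system": ["Bone fracture"],
}

def ancestors(term, onto):
    """Walk up the taxonomy, collecting every broader category of a term."""
    parents = [p for p, children in onto.items() if term in children]
    result = []
    for p in parents:
        result.append(p)
        result.extend(ancestors(p, onto))
    return result

print(ancestors("Bone fracture", ontology))
```

Walking from a specific finding up to its broader categories like this is what lets an annotation in free text be connected to the rest of the taxonomy.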
Jingqing combines background knowledge, such as EHRs, with deep learning models to tackle natural language processing tasks.
Our question is: how can we efficiently and effectively combine that background knowledge into the deep learning model and extract intelligence from such large amounts of unstructured data?
This is where annotation comes in. Jingqing can connect an ontology to the unstructured data by annotating it with the relevant categories. For EHRs, this means annotating the data with categories that are listed in the Human Phenotype Ontology (HPO). In practice, this means a tag is placed in the EHR text if an “abnormality” is noted, which could be anything from a blood test result to a bone fracture.
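This is not Jingqing's framework, but a minimal sketch of what such annotation looks like in practice: a simple dictionary matcher that tags spans of clinical text with ontology categories. The phrase-to-category mapping here is hypothetical; the real HPO labels and the framework's matching method differ.

```python
import re

# Hypothetical phrase -> category mapping (illustrative, not real HPO labels).
PHENOTYPE_TERMS = {
    "bone fracture": "Abnormality of the skeletal system",
    "anemia": "Abnormality of the blood",
    "elevated glucose": "Abnormal blood glucose concentration",
}

def annotate(text):
    """Return (span, matched phrase, category) triples for each term found."""
    annotations = []
    for phrase, category in PHENOTYPE_TERMS.items():
        for m in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
            annotations.append(((m.start(), m.end()), m.group(0), category))
    return sorted(annotations)  # order matches by position in the text

note = "Patient presents with a bone fracture; labs show anemia."
for span, phrase, category in annotate(note):
    print(span, phrase, "->", category)
```

A dictionary matcher like this only catches exact surface forms; the appeal of a deep learning approach is handling the paraphrases and abbreviations that pervade real clinical notes.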
Deep learning and precision medicine
Jingqing introduced this work in a seminar for Elsevier: Integrating Prior Knowledge with Learning in Natural Language Processing. So far, he and his colleagues have tested the framework on a subset of the phenotypic abnormalities in the HPO; they plan to extend the annotation to all 13,000 abnormalities listed.
The results of the work have been positive so far: experiments have shown the method is effective, efficient and scalable, and Jingqing and his colleagues believe it could provide an improved indication for disease diagnosis. However, contrary to some claims, Jingqing doesn’t see AI taking on the decision-making role:
The algorithm is always an assistant to the clinician, so the clinician will make their own decisions based on their expert knowledge. A deep learning model would provide them with helpful insights — for example, instead of reading 200 pages of information in different documents, the algorithm could filter out the important information they need to make their decision.
Jingqing also believes that in the future, deep learning models will be able to make the clinical decision process smarter, more precise and more personalized, thereby strengthening the foundation for precision medicine.
There is a lot of data available, but there's no technology right now that can take advantage of it all and deliver benefits for humans, probably due to technology and regulation bottlenecks. But I think this is going to be the direction for more precise, intelligent medicine. If we can understand the data for clinical decisions or pharma research, I hope we can have a huge impact on the future.
The research is ongoing, and Jingqing is set to finish his PhD at the end of 2021:
After I graduate, I want to join industry, perhaps working on natural language processing and clinical data to push this research to the cutting edge.