Harnessing ontologies for pharma: Dr Jane Lomax on the synergy of AI and scientific expertise
20 February 2024
By Ann-Marie Roche
Dr Jane Lomax, Head of Ontologies for SciBite at Elsevier, talks about how her team of semantic AI experts are helping R&D professionals find the information they need fast.
SciBite’s Head of Ontologies talks about how her team is encoding scientific knowledge into software for the benefit of pharma and the public.
As Head of Ontologies for SciBite, Dr Jane Lomax has an ambitious goal: to encode scientific knowledge into software so researchers can find and connect the information they need when they need it.
“I believe ontologies have a fundamental role in leveraging the power of large language models — for the benefit of pharma and the public at large,” she says.
Based in Cambridge, UK, SciBite joined Elsevier in 2020. Their experts combine semantic AI with text analytics and data enrichment tools to help R&D professionals make faster, more effective decisions.
“We get people and machines to use the same language to talk about scientific things,” Jane explains. To do this, they take unstructured content and turn it into ordered machine-readable data for scientific discovery in the life sciences. “And this involves working with our expert scientific curators to encode their expertise into our software.”
With a PhD in Population Genetics and Parasitology and over 20 years of experience with FAIR data and ontologies — “basically, ontologies are a codification of scientific facts as we understand them” — Jane is a champion of those creating ontologies. “These are the people doing the foundational work, and they’re doing it on a shoestring,” she says.
She also believes ontologies offer a methodology to leverage the power of new AI technologies such as large language models (LLMs): “While LLMs can bring in their natural language and summarizing skills, ontologies can provide the backbone of scientific knowledge that the LLM can use, as well as making the output explainable and reproducible.”
It’s time for a chat!
‘Head of Ontologies’ — Is that a new cutting-edge title like ‘Prompt Engineer’?
Ontologies have been around for much longer than LLMs, but they fit into our AI age. I lead a team of experts in building and using ontologies, which are representations — or models — that provide a picture of the world so we can talk about things in this world and how they relate to each other.
Webinar
Watch The perils, pitfalls and promise of generative AI for R&D with SciBite’s Jane Lomax and other experts.
And what are the big problems ontologies are currently solving?
On a fundamental level, ontologies provide an agreed-upon and structured understanding of scientific language. At SciBite, we use these ontologies to help scientists extract knowledge from scientific literature. After all, there’s just so much text in modern science. Not only are the numbers of published papers increasing, but so are the ways we can generate data. There are even whole new types of science arising. You can't possibly process it all yourself. So what we do is provide the means for scientists to be able to condense the process and ask specific questions related to their specialty.
But ontologies also help with other barriers, such as dealing with ambiguities in text. There’s the famous hedgehog example. A fruit fly gene was named after Sonic the Hedgehog — because the fruit fly community is hilarious when it names genes. But it’s called ‘hedgehog’ for short. And obviously, there's also a creature called the hedgehog. And then there’s the actual Sonic the Hedgehog. So when looking across a wide array of documents, how do you know which one is being talked about? In such a case, you have to put rules into the software that disambiguate so you know what type of hedgehog you’re dealing with.
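To make the hedgehog example concrete, here is a minimal sketch of what a context-based disambiguation rule could look like. It is an illustration only, not SciBite's method: the cue words, sense names and the function disambiguate_hedgehog are all assumptions invented for this example.

```python
import re

# Hypothetical context cues for each sense of "hedgehog" (illustrative only).
SENSE_CUES = {
    "gene": {"gene", "drosophila", "signaling", "pathway", "mutant", "expression"},
    "animal": {"species", "spines", "nocturnal", "habitat", "mammal", "hibernation"},
    "video_game_character": {"sega", "video", "game", "console"},
}


def disambiguate_hedgehog(sentence):
    """Return the sense of 'hedgehog' in a sentence, 'ambiguous', or None."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    if "hedgehog" not in tokens:
        return None  # the word is not mentioned at all
    # Score each sense by how many of its cue words appear in the sentence.
    scores = {sense: len(tokens & cues) for sense, cues in SENSE_CUES.items()}
    ranked = sorted(scores.values(), reverse=True)
    # With no cue, or a tie between senses, flag the mention instead of guessing.
    if ranked[0] == 0 or ranked[0] == ranked[1]:
        return "ambiguous"
    return max(scores, key=scores.get)
```

A sentence about segment polarity in Drosophila would resolve to the gene, while a bare "I saw a hedgehog" is reported as ambiguous so that a curator or a further rule can decide.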
Disambiguating Sonic the Hedgehog sounds like fun. But it also sounds like painstaking work. How are these ontologies created and decided upon?
They arise from specialized, usually academic, communities that are basically doing it with very little support and without license restrictions. It’s actually where I started my career.
Ontologies are only valuable if they’re available to everyone; because they’re (based on) a standard, all databases can talk in the same language and be interoperable. In other words, ontologies have to be open to be useful. And SciBite has built our whole business on top of these ontologies. Without them, SciBite wouldn't exist. We’ve taken these public ontologies and added our special sauce to make them more accessible and easier to use. And AI plays into that.
How exactly do ontologies link with AI?
This is a whole new application for ontologies. While AI technologies are super powerful, the output must still be verified as truth. Ontologies represent the truth as agreed upon by humans: that something is this type of thing, and it relates to these other types of things. So if you can feed that into your AI, you get the best of both worlds.
The AI still does the hard bits — while having our underlying truths built into it. And this need for verification has only become more relevant with these emerging generative AIs. Ontologies can provide the control at the most important step of the literature review, and a consistency in information retrieval, ensuring we return the same documents each time via an explainable process.
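As a rough illustration of the consistent, explainable retrieval described above, the sketch below expands a search term through a tiny ontology and returns each matching document together with the terms that caused the match. The ontology fragment, its EX: identifiers and the functions descendants and search are assumptions made up for this example and do not represent any SciBite or Elsevier API.

```python
import re

# Toy ontology fragment; identifiers are invented for illustration.
ONTOLOGY = {
    "EX:001": {"label": "neoplasm", "synonyms": ["tumor", "tumour"], "is_a": []},
    "EX:002": {"label": "carcinoma", "synonyms": [], "is_a": ["EX:001"]},
    "EX:003": {"label": "adenocarcinoma", "synonyms": [], "is_a": ["EX:002"]},
}


def descendants(term_id):
    """Return term_id plus every term that reaches it through is_a links."""
    found = {term_id}
    changed = True
    while changed:
        changed = False
        for tid, term in ONTOLOGY.items():
            if tid not in found and any(parent in found for parent in term["is_a"]):
                found.add(tid)
                changed = True
    return found


def search(term_id, documents):
    """Return (doc_id, matched_names) pairs in a fixed, repeatable order."""
    names = set()
    for tid in descendants(term_id):
        names.add(ONTOLOGY[tid]["label"])
        names.update(ONTOLOGY[tid]["synonyms"])
    hits = []
    for doc_id in sorted(documents):  # sorted ids keep the output order stable
        words = set(re.findall(r"[a-z]+", documents[doc_id].lower()))
        matched = sorted(names & words)  # whole-word matches only
        if matched:
            hits.append((doc_id, matched))  # matched names explain the hit
    return hits
```

Because the expansion follows fixed ontology links rather than a model's judgment, the same query over the same documents returns the same list, and every hit carries the names that justified it. A real system would also need to handle plurals, multi-word terms and ambiguity, which this sketch leaves out.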
What initially spurred your passion for ontologies?
I got lucky. I started in parasite population genetics. It was great, but wet lab work is also prolonged and unpredictable. I was beginning to suspect such a life wasn’t for me. And actually, what I enjoyed most during my PhD was the analysis — the bioinformatics part. So after my PhD, I started to look around in this area. And it was the time when the first bio-ontologies were just being created, with Gene Ontology being the first one. And that’s the one I ended up working for.
No one knew what an ontology was in the life sciences — they often still don’t. But it’s much more widely known because there are a lot now. But at the time, no one knew what it was. So it was a shot in the dark, and it started as something tiny but then basically revolutionized the world of science. That was a great thing to be involved with.
Ontologies are all about classifying things and organizing the world; that’s always been an instinct of mine. So it just meshed nicely. It also involved computer scientists, biologists, philosophers — all of us coming together and trying to figure this out. It was a really exciting field — and still is 20 years later.
And now, with SciBite, you are applying ontologies to bring even more order to Elsevier’s mass of scientific data.
Like Gene Ontology, SciBite also started out tiny. And we grew it into a global business with nearly 100 people that Elsevier then acquired so we can grow further and serve our customers better. We’ve got a fantastic team that works really well together. And we’re able to help our customers, mostly pharma, with a solution thanks to our skills and software. So that’s super satisfying: We’ve gotten good at extracting understanding from data.
And now, as part of Elsevier, you can extract understanding at scale.
That’s precisely what we are in the process of doing: applying that across the whole of Elsevier’s suites.
And now after this perfect marriage between SciBite and Elsevier, you are bringing in a third party: large language models. Doesn’t that make things complicated?
It potentially makes everything much easier. These new technologies allow you to ask scientific questions in natural language. In turn, the LLMs will translate that into something structured and be able to request that across all these different data sources. And then they come back with something scientists can understand, complete with references. So it’s no longer a black box but a kind of explainable AI solution. You can go back to the research papers and check.
So this is very exciting: explainable AI with SciBite tools across Elsevier’s mass data. But yes, we’re still figuring it out.
And how are you coping with the speed of developments and the surrounding hype?
It’s all moving super-fast and everyone is trying to find their own way. A job like prompt engineer didn’t exist a year ago. At a conference recently, someone said LLMs had the shortest hype cycle ever, especially in the life sciences, because everyone said, “Oh my god, these things are amazing.” And then, almost immediately, they changed their tune to: “We can’t use this.”
But I do believe ontologies will play a key part in harnessing the power of LLMs. Meanwhile, the whole community is still just feeling its way. But there will be big changes in the next couple of years. And we’re going to move fast and figure out how we fit in. It's really an exciting time to be part of this community.
Are there other aspects to LLMs that excite you?
I think the democratization of these technologies is going to be key. Before, there was this barrier: You had to be able to write in Python to access this very rich set of tools. That’s all changed now. People across different disciplines will now also be able to access these really powerful technologies, which is a huge democratization. And the impact on education is only just being felt. My son is doing his GCSEs (high school diploma exams) now and creating sample exam questions using ChatGPT. He’s going to grow up with this all being normal and just part of the tools he’s able to use. So I think it’s a game changer.
And how do you see SciBite evolving with these rapidly evolving times?
I think we're going to continue to be pioneers and innovators in this field. I think what we do well is being able to prototype and iterate on new tech very fast. So I see us as a sort of innovation skunkworks for Elsevier. We can further supercharge some of Elsevier’s products. And more of our stuff will be used at scale, taking away more of the tedious work. We’ve also got some new products coming out that deal with a big hurdle in the ontology world: mapping between ontologies. So again, I am really excited.
Is there anything that would make your job easier in terms of accelerating R&D for the life sciences? Is there something people can do?
Support, fund and recognize all of these ontologies that underpin all of this cool stuff we’re doing. And if you don’t have money, provide feedback: Use your expertise with those sources. They all have public trackers where you can make suggestions in terms of what needs to be fixed or added. It just makes the products better.
Become part of this virtuous circle!