How machine learning can speed up “annoyingly hard” medical research

Researcher Alexander Mathiasen hopes to solve medical problems with machine learning while working across disciplines

At TEDxOdense 2018, machine learning researcher Alexander Mathiasen takes his audience "into the mind of idiot computers — and idiot people."

When he had to pick a major, Alexander Mathiasen faced a tough choice. “I always loved to learn, and I had a hard time pinning my choice down to one subject. What I really wanted to do was advance medical research and help people fight diseases.”

Today, Alexander is working on a dissertation in machine learning as a PhD student at Aarhus University in Denmark. Although he’s not a clinician, he hopes his work will someday advance medical research.

“Even as an outsider, it is clear that medical research is super-complicated and annoyingly hard,” Alexander said. “Brilliant, hard-working researchers dedicate years of their careers to put potential medicines through animal trials, slowly building their way to human trials. Unfortunately, sometimes all this hard work breaks down at the last mile of their marathon.”

Many things can go wrong, he explained: for example, it can be difficult to track the right variables and keep the circumstances stable each time. Even when everything goes according to plan, results can differ when a trial is reproduced.

Alexander hopes his specialization can help solve this issue: “Computer tools could be of great help to medical research in terms of accelerating the speed at which research is carried out.”

In abstract terms, Alexander and his colleagues try to make computers learn by example, similar to the way humans learn – though computers lack innate human intelligence:

“Computers are insanely stupid. If you want them to do something, you need to give them extremely specific instructions. Imagine learning to kick a football. You could probably learn this by seeing a few examples of how your friends do it. If you want a computer to do this, you would need to tell it exactly how to move every single muscle fiber. Instead of doing this, we just give the computer lots of examples, and tell it to figure it out itself.”

They achieve this by optimizing algorithms. One popular approach uses artificial neural networks, which are loosely inspired by the human brain. Such a network might learn to recognize whether someone has pneumonia from X-ray images, or whether someone suffers from diabetes from scans of the blood vessels in the eye.

Watch Alexander’s TEDx talk

Machine learning is a fast-moving field. To stay up to date, Alexander uses different tools, online and offline. “I’m in a lot of online discussion groups, for example, on Facebook and Reddit,” he said. “I also attend conferences to see which papers are popular and talk to colleagues in the field.”

He also relies on automatic prediction services that recommend what he should read next, mentioning that services such as Arxiv Sanity Preserver and Elsevier’s own personalized recommendation engines can save researchers time. Additionally, Alexander started an old-fashioned reading group, whiteboard included. At each meeting, one group member presents a paper, and the group discusses it in depth afterwards.

The holy grail of biology

Although Alexander hasn’t deep-dived into the medical world yet, there is one topic that he is particularly interested in: proteins – or as he calls them, “nanomachines.”

“Proteins are amazingly small biological machines responsible for all kinds of tasks in our bodies,” he explained. “Some of them can even walk. Unfortunately, these machines sometimes malfunction. This is the cause of many of the terrible diseases we humans suffer. Our DNA holds the blueprint for all proteins. You can compare this with building with Lego bricks: the DNA decides which bricks to use in which order to build a specific protein.”

We already know how to sequence DNA and obtain this blueprint – Alexander is currently waiting for his and his mother’s DNA analysis, an original Christmas gift. However, a lot is still unknown about the 3D structure of these chains of Lego bricks.

Biologists have been constructing 3D images of proteins for about six decades now, and the process is gruelingly slow. It can take years of work to scan one protein and get to know its structure. Alexander wonders if this can be done more efficiently. “The question is: can you just tell a computer: I already know what the chain of Lego bricks looks like; just tell me how to fold it all together.” This is called protein structure prediction. “It is one of the most important problems in bioinformatics,” he added.

Lost in translation

Alexander has high hopes for machine learning as an interdisciplinary tool to advance science in general. Nevertheless, he sees one big challenge: people in different research fields often speak another language altogether. They use jargon and theories that are only known within their fields. He noticed this first-hand while taking a course in quantum mechanics. “After the course, I struggled to read books at an undergraduate level,” he said. “The notation is different, the ways of thinking are different. … It’s very hard to communicate between fields.”

One funny, albeit astounding, anecdote illustrates this perfectly:

“A physics friend of mine told me about this article, in which medical researchers wanted to determine the area under metabolic curves. The researchers managed to solve the problem by dividing the area under the curve into small shapes called trapezoids. Unfortunately for the researchers, this technique, called the trapezoidal rule, had been known for several hundred years. It is sometimes included in first-year calculus courses, and has even been reported to be used in Babylon 50 BC.”
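For readers who have not met it, the trapezoidal rule from the anecdote fits in a few lines of code. The sketch below is purely illustrative: the function name `trapezoid_area` and the sample readings are invented here and are not taken from the article Alexander mentions. It assumes the curve is known only at a handful of measured points.

```python
def trapezoid_area(xs, ys):
    """Approximate the area under a curve sampled at points (xs[i], ys[i]).

    Each pair of neighbouring samples forms a trapezoid whose area is
    its width times the average of its two heights.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two samples, with matching lengths")
    area = 0.0
    for i in range(1, len(xs)):
        width = xs[i] - xs[i - 1]
        mean_height = (ys[i] + ys[i - 1]) / 2
        area += width * mean_height
    return area


# Hypothetical readings, made up for illustration: time in minutes
# and a measured concentration at each time point.
times = [0, 30, 60, 90, 120]
levels = [5.0, 8.0, 7.0, 6.0, 5.0]
print(trapezoid_area(times, levels))  # 780.0
```

The more samples are taken, the narrower the trapezoids become and the closer the sum gets to the true area under the curve.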

Even though there are more and more platforms that enable researchers to communicate, share papers and work together, Alexander doesn’t believe there is currently one available that solves this problem.

Read Our vision for the information system supporting research

Relics of the past

Ideally one would communicate science as simply as possible. “But when people have different prerequisites, this becomes surprisingly hard,” Alexander said.

In the machine learning world, a few new initiatives aim to share knowledge in simple language, with little jargon and with clear visualizations. Distill, a new machine learning journal, does exactly that, much like the cross-discipline Elsevier initiative Atlas. Since he is part of a generation that practically grew up online, it is easy to see why Alexander is inspired by this way of sharing information: “Fifty years ago, you physically sent a paper to a conference. PDFs today still emulate that practice, but with so many visualization techniques around, I don’t think there is a good reason to keep using this format. To me it seems to be a legacy thing.”

Still, getting rid of jargon altogether seems like an impossible feat, however simply new findings are communicated. Since the world of research revolves around the publication of papers, researchers have only a limited number of pages to convince reviewers that their idea is sound and makes a valid contribution. It is therefore efficient to use advanced language that builds upon previous knowledge. “Since the incentive of researchers is to get articles published, I see no way around this,” Alexander added.

Getting researchers from different fields to communicate at the same level seems nearly impossible. Alexander is not fully convinced a collaboration platform for science, however useful, would be able to fix this: “If someone figures this out, give me a call!” he said. “Some people dedicate their entire career to multidisciplinary research and spend five years learning both languages.”

For now, Alexander continues to work on his PhD. One of the problems he is working on may be useful in solving the protein structure prediction problem, he said, adding that he doesn’t have to think twice about what motivates him: “Advancing medical research might allow us to enjoy a few more years with our loved ones. That’s what makes it so interesting to me.”

Additionally, Alexander is looking forward to finally receiving his and his mother’s Christmas gift. “I know I have a gene that makes me a little bit more likely to develop late-onset Alzheimer’s,” he said. “It would be interesting to look at this specific gene and compare the different types of mutations amongst a group of, say, a million people.”

To be able to do this, he would like to work somewhere with access to this kind of data:

“For some problems, having a big data set is like having a cheat code.”

He has no regrets about the unconventional road he took into this world. “For me, this seemed to be the most efficient way: getting into bioinformatics and computer science to improve the tools that people use in biology,” he said. “Essentially, I want to solve biology problems with computer science.”

Meanwhile, Alexander quickly became a fan of the unique culture in computer science. “I absolutely love the open science culture,” he concluded. “It’s about solving the problem. We’re not competing against each other. When it comes to publishing new findings, we’re all humans working together to solve specific problems – like curing diseases.”



Written by

Elisa Nelissen


A keen interest in knowledge drove Elisa Nelissen to study the carriers of information in a Book and Digital Media Studies degree at Leiden University in the Netherlands. That program brought her straight to Elsevier, where she spent a few years on the Global Communications team, making sure the world knew about Elsevier and its journals. Today, Elisa works as a freelance writer.

