Cold mice are still affecting cancer research, data shows

Elsevier data scientists use data mining, NLP and neural networks to discover that researchers are not keeping mice at the best temperature for accurate results

Helena Deus, PhD, a Technology Research Director at Elsevier, presents her research team’s poster at the Harvard Postdoc Research Symposium this spring. (Photo by Alison Bert)

Editor’s note: The study featured here was conducted by four members of Elsevier Labs and published in the proceedings of the 2017 IEEE International Conference on Big Data. Recently, Dr. Helena Deus, Technology Research Director for Elsevier Labs, presented it at a poster session at Harvard. You can find the paper and poster at the end of her story.

All new drugs must first be tested in animals, typically mice, before they can enter clinical studies. But the stress levels of lab mice affect how their tumors respond to the chemotherapy drugs tested on them. In fact, independent studies in 2013 and 2015 showed that mice housed in animal facilities at 22°C (72°F) became resistant to cancer drugs, while mice kept at 30°C (86°F) did not.

So have scientists changed the housing temperature for mice following that discovery?

This is an important question, so my Elsevier data science colleagues and I set out to find an answer.


Text mining the literature and filtering the data

We started with all the research published by Elsevier after 2000, using the ScienceDirect keyword-based search to collect papers with the keywords “mouse,” “tumor” and “temperature,” for a total of 133,000 results. However, when we analyzed the results, the majority did not mention mouse housing information, so we had to find a way to eliminate false positives.

Finding experimental parameters in biomedical literature is not as easy as one might think. Parameters such as the temperature of a room are often considered irrelevant to the experiment and are therefore not explicitly mentioned anywhere in the paper. Instead, the authors cite animal housing guidelines, such as those made available by their own institutions or by the NIH. In addition, popular text mining techniques for biomedical literature retrieve relevant studies by searching the full text of articles for keywords such as a gene name or a disease. As a result, most popular scientific search engines cannot find measurements like temperature, even in the relevant “Materials and Methods” section of a paper.

To answer our temperature question, we used Databricks to split the papers into sections and collect sentences only from the “Methods” sections. We created a units and measures (U&M) annotator, which helped us extract from those sections any sentences containing temperature measurements. Data scientists from Elsevier Labs used a custom, open source tool called AnnotationQuery to obtain U&M as well as GENIA annotations. Finally, we applied a housing condition filter, which selected sentences containing phrases such as “mice were kept.”
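The filtering steps above can be illustrated with a minimal sketch. This is not the U&M annotator or AnnotationQuery code, which the article does not show. The regular expressions, the function name and the extra housing verbs (“housed,” “maintained”) are assumptions made purely for illustration.

```python
import re

# Assumed pattern for a Celsius measurement: a number, an optional range
# ("20-22" or "20 to 22") and an optional tolerance ("22 ± 2"), then "°C".
TEMP_PATTERN = re.compile(
    r"(?P<low>\d+(?:\.\d+)?)"
    r"(?:\s*(?:-|–|to)\s*(?P<high>\d+(?:\.\d+)?))?"
    r"(?:\s*±\s*\d+(?:\.\d+)?)?"
    r"\s*°\s*C\b"
)

# Assumed housing filter. Only "mice were kept" comes from the article;
# the other subjects and verbs are hypothetical additions.
HOUSING_PATTERN = re.compile(
    r"\b(?:mice|animals)\s+were\s+(?:kept|housed|maintained)\b",
    re.IGNORECASE,
)


def extract_housing_temperatures(sentences):
    """Return (low, high, sentence) for each temperature found in a
    sentence that also matches the housing filter."""
    results = []
    for sentence in sentences:
        if not HOUSING_PATTERN.search(sentence):
            continue  # not a housing sentence, skip it
        for match in TEMP_PATTERN.finditer(sentence):
            low = float(match.group("low"))
            # A single value is stored as a range with equal bounds.
            high = float(match.group("high")) if match.group("high") else low
            results.append((low, high, sentence))
    return results
```

A sentence such as “Cells were incubated at 37°C” contains a temperature but fails the housing filter, so it is dropped. A rule-based filter like this will still let some unrelated sentences through.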

However, even after filtering out false positives, we found that one in four of the results was still not a housing condition sentence. To get around that problem, we created a small training set of 480 sentences and used it to train a neural network.
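To give a sense of how a labeled set of sentences can train a classifier, here is a deliberately simplified sketch. It is a stand-in, not our model: it uses bag-of-words logistic regression (in effect a single-layer network) instead of the word embeddings and neural network described in the paper, and every function name and parameter value is an assumption.

```python
import math
import re


def tokenize(sentence):
    """Lowercase the sentence and keep the set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def predict_proba(sentence, weights, bias):
    """Probability that a sentence describes housing conditions."""
    score = bias + sum(weights.get(tok, 0.0) for tok in tokenize(sentence))
    return 1.0 / (1.0 + math.exp(-score))


def train(examples, epochs=200, lr=0.5):
    """Fit logistic regression with plain stochastic gradient descent.

    examples: list of (sentence, label) pairs, label 1 = housing sentence.
    Returns (weights, bias); weights maps each token to a float.
    """
    weights = {}
    bias = 0.0
    for _ in range(epochs):
        for sentence, label in examples:
            # Gradient of the log-likelihood for binary features.
            error = label - predict_proba(sentence, weights, bias)
            for tok in tokenize(sentence):
                weights[tok] = weights.get(tok, 0.0) + lr * error
            bias += lr * error
    return weights, bias
```

In this toy version the model learns which words separate housing sentences from other sentences that merely mention a temperature. Word embeddings go further by letting related words share information, which matters when the training set is small.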

After that neural network exercise, we got 97 percent accuracy in detecting sentences that were indeed about housing conditions. We could finally extract and plot the temperatures used by scientists in their mice experiments over time.
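Once each housing sentence is reduced to a publication year and a temperature, summarizing the trend is simple. The sketch below assumes the data arrives as (year, temperature) pairs and uses the median as the yearly summary. Both choices, and the function name, are illustrative assumptions, not a description of how our plots were produced.

```python
from collections import defaultdict
from statistics import median


def median_temperature_by_year(records):
    """Group (year, temperature_celsius) pairs by year and return a dict
    mapping each year, in ascending order, to its median temperature."""
    by_year = defaultdict(list)
    for year, temperature in records:
        by_year[year].append(temperature)
    return {year: median(temps) for year, temps in sorted(by_year.items())}
```

The resulting dictionary can be passed to any plotting library to draw housing temperature against publication year.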

Our conclusions

The big conclusion of this study was that scientists have not changed their behavior, even after a study in 2015 confirmed that mice show resistance to therapy at lower temperatures. The vast majority of scientists are still housing their mice at temperatures between 20 and 22°C, which is too cold for the mice, who respond better to therapy at the thermoneutral temperature of 30°C.

We think there are two reasons the behavior persists. First, the temperature range we observed is within the 20-26°C range indicated in the National Institutes of Health (NIH) Guide for the Care and Use of Laboratory Animals, last updated in 2011.

Another reason could be that mice are housed in facilities for multiyear experiments, so changing the temperature of the housing environment would probably invalidate results obtained previously from ongoing experiments.

Read the research paper

This research was published in the proceedings of the 2017 IEEE International Conference on Big Data: Combining pattern matching with word embeddings for the extraction of experimental variables from scientific literature

The authors work for Elsevier Labs, an advanced technology group within Elsevier: Dr. Helena Deus (Cambridge, Massachusetts), Corey Harper (New York) and Darin McBeath (Cincinnati, Ohio) are Technology Research Directors at Elsevier Labs; Dr. Ron Daniel (San Francisco Bay Area) is the Director of Elsevier Labs.

View the poster

This is the poster Dr. Helena Deus presented at Harvard.

Helena Deus, PhD, presented this poster at the Harvard Postdoc Symposium.


Written by

Helena Deus, PhD

Dr. Helena Deus is a Technology Research Director for Elsevier Labs. She received her PhD in Bioinformatics from Universidade Nova de Lisboa, where she focused on linked data and semantic web applications for healthcare and life sciences with an emphasis on cancer research. Helena is a data scientist passionate about applying data integration and machine learning techniques to the generation of insights affecting clinical practice and biomedical discovery.

Before she joined Elsevier in 2017, Helena's roles included directing a knowledge engineering and data science team at Foundation Medicine and leading projects and strategy for Health Care and Life Sciences at the Digital Enterprise Research Institute, National University of Ireland at Galway (DERI/NUIG). Helena has published over 30 peer reviewed papers. She was one of the winners of the Big Data Track in the 2013 Semantic Web Challenge and of the Linked Data Cup with her work on linking data from The Cancer Genome Atlas.

