Editor’s note: The study featured here was conducted by four members of Elsevier Labs and published in the proceedings of the 2017 IEEE International Conference on Big Data. Recently, Dr. Helena Deus, Technology Research Director for Elsevier Labs, presented it at a poster session at Harvard. You can find the paper and poster at the end of her story.
All new drugs must first be tested in animals – typically mice – before they can be included in clinical studies. But the stress levels of lab mice affect how their tumors respond to the chemotherapy drugs tested on them. In fact, independent studies in 2013 and 2015 showed that mice housed in 22°C (72°F) bioteriums became resistant to cancer drugs while mice kept at 30°C (86°F) did not.
So have scientists changed the housing temperature for mice following that discovery?
This is an important question, so my Elsevier data science colleagues and I set out to find an answer.
Text mining the literature and filtering the data
We started with all the research published by Elsevier after 2000, using the ScienceDirect keyword-based search to collect papers with the keywords “mouse,” “tumor” and “temperature” for a total of 133,000 results. However, when we analyzed the results, the majority did not mention mice housing information, so we had to find a way to eliminate false positives.
Finding experimental parameters in biomedical literature is not as easy as one might think. For example, parameters such as the temperature of a room are often considered irrelevant to the experiment and therefore are not explicitly mentioned anywhere in the paper. Instead, the author would cite animal housing guidelines such as those made available by their own institutions or by the NIH. Similarly, popular text mining techniques for biomedical literature target the retrieval of relevant studies based on searching the full text of the articles for keywords such as a gene name or a disease, so discovering measurements like temperature, even in the relevant “Materials and Methods” section of a paper, is not possible with most popular scientific search engines.
To answer our temperature question, we used Databricks to split the papers into sections and collect sentences only from the “Methods” sections. We created a units and measures (U&M) annotator, which helped us extract from those sections any relevant sentences containing temperature measurements. Data scientists from Elsevier Labs used a custom, open source tool called AnnotationQuery to obtain the U&M annotations as well as GENIA annotations. Finally, we applied a housing condition filter, which selected sentences containing phrases such as “mice were kept.”
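To make the pattern-matching step concrete, here is a minimal sketch in Python. It is not the U&M annotator or AnnotationQuery; the regular expressions, the function name and the sample sentences are assumptions made for illustration. It shows the general idea of keeping only sentences that look like housing statements and pulling a Celsius value or range out of them.

```python
import re

# Hypothetical pattern: a number, optionally followed by a range
# ("20-22", "20 to 22"), then a Celsius unit.
TEMP_RE = re.compile(
    r"(?<![\d.])(?P<low>\d+(?:\.\d+)?)"
    r"(?:\s*(?:-|–|to)\s*(?P<high>\d+(?:\.\d+)?))?"
    r"\s*°\s*C\b"
)

# Hypothetical housing filter, modeled on the "mice were kept" example.
HOUSING_RE = re.compile(
    r"\b(?:mice|animals)\s+were\s+(?:kept|housed|maintained)\b",
    re.IGNORECASE,
)


def extract_housing_temperatures(sentences):
    """Return (low, high, sentence) for each temperature in a housing sentence."""
    results = []
    for sentence in sentences:
        if not HOUSING_RE.search(sentence):
            continue  # not a housing statement, e.g. an incubation step
        for match in TEMP_RE.finditer(sentence):
            low = float(match.group("low"))
            # A single value is treated as a range of width zero.
            high = float(match.group("high")) if match.group("high") else low
            results.append((low, high, sentence))
    return results


# Made-up example sentences, not taken from the corpus.
sample = [
    "Mice were kept at 22 °C with a 12 h light/dark cycle.",
    "Cells were incubated at 37°C for 2 h.",
    "Animals were housed at 20-22°C.",
]
for low, high, _ in extract_housing_temperatures(sample):
    print(low, high)
```

A filter this simple lets through sentences that match the wording but are not about housing, which is why a second, learned filter is useful.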
However, even after filtering out false positives, we found that one in four of the results was still not a housing condition sentence. To get around that problem, we created a small training set of 480 sentences and used it to train a neural network.
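As a rough illustration of how a small labeled set can train a sentence classifier, the sketch below uses a single-layer perceptron over bag-of-words features. This is a deliberately simplified stand-in and not the network used in the study (the paper combines pattern matching with word embeddings); the four training sentences and all function names are made up for the example.

```python
import re
from collections import defaultdict


def tokens(sentence):
    """Lowercase alphabetic tokens; digits and symbols are dropped."""
    return re.findall(r"[a-z]+", sentence.lower())


def train_perceptron(examples, epochs=20):
    """examples: list of (sentence, label), label 1 = housing, 0 = other."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for sentence, label in examples:
            score = bias + sum(weights[t] for t in tokens(sentence))
            predicted = 1 if score > 0 else 0
            error = label - predicted  # +1, 0 or -1
            if error:
                # Classic perceptron update: nudge weights toward the label.
                bias += error
                for t in tokens(sentence):
                    weights[t] += error
    return weights, bias


def predict(model, sentence):
    weights, bias = model
    score = bias + sum(weights.get(t, 0.0) for t in tokens(sentence))
    return 1 if score > 0 else 0


# Hypothetical toy training set (the real one had 480 sentences).
training = [
    ("Mice were housed at 22 °C in ventilated cages.", 1),
    ("Cells were incubated at 37 °C for two hours.", 0),
    ("Animals were kept at 21 °C with free access to food.", 1),
    ("Samples were stored at 4 °C until analysis.", 0),
]
model = train_perceptron(training)
print(predict(model, "Mice were kept at 30 °C."))
```

With only a handful of examples the model simply memorizes which words co-occur with housing statements; a larger labeled set and richer features are what make the approach generalize.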
After that neural network exercise, we got 97 percent accuracy in detecting sentences that were indeed about housing conditions. We could finally extract and plot the temperatures used by scientists in their mice experiments over time.
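Once each housing sentence is tied to a publication year and a temperature, summarizing the trend is a small aggregation step. The sketch below assumes a hypothetical record layout of (year, low, high) and uses the median of range midpoints per year; both the layout and the choice of statistic are assumptions for illustration, not the method from the paper.

```python
from collections import defaultdict
from statistics import median


def median_temperature_by_year(records):
    """records: iterable of (year, low_celsius, high_celsius)."""
    by_year = defaultdict(list)
    for year, low, high in records:
        # Represent a range such as 20-22 °C by its midpoint.
        by_year[year].append((low + high) / 2)
    return {year: median(temps) for year, temps in sorted(by_year.items())}


# Made-up records, only to show the shape of the input.
records = [(2014, 20, 22), (2014, 22, 22), (2016, 30, 30)]
print(median_temperature_by_year(records))
```

The resulting year-to-temperature mapping can be handed to any plotting library to draw the trend over time.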
The big conclusion of this study was that scientists have not changed their behavior, even after a study in 2015 confirmed that mice show resistance to therapy at lower temperatures. The vast majority of scientists are still housing their mice at temperatures between 20 and 22°C, which is too cold for the mice, who respond better to therapy at thermoneutral temperatures of around 30°C.
We think there are two reasons the behavior persists. First, the temperature range we observed falls within the 20-26°C range indicated in the National Institutes of Health (NIH) Guide for the Care and Use of Laboratory Animals, last updated in 2011.
Another reason could be that mice are housed in facilities for multiyear experiments, so changing the temperature of the housing environment would probably invalidate results obtained previously from ongoing experiments.
Read the research paper
This research was published in the proceedings of the 2017 IEEE International Conference on Big Data: Combining pattern matching with word embeddings for the extraction of experimental variables from scientific literature
The authors work for Elsevier Labs, an advanced technology group within Elsevier: Dr. Helena Deus (Cambridge, Massachusetts), Corey Harper (New York) and Darin McBeath (Cincinnati, Ohio) are Technology Research Directors at Elsevier Labs; Dr. Ron Daniel (San Francisco Bay Area) is the Director of Elsevier Labs.