Identifying bottlenecks in chemistry research via the #ChemSearch Challenge
Reaxys helps research chemists extract data from the vast and increasing volume of chemistry literature
By Dileep Sharma Posted on 5 January 2016
Research chemists face an unenviable task: finding precise data points in the vast and increasing volume of chemistry literature published every month. Whether they are trying to find promising compounds with particular biological activities or materials that will withstand extreme conditions, they need exact information and they need it quickly. Reading through full-text original sources isn’t an option: the research environment and pace of discovery have changed too much. In the highly competitive world of R&D, the ability to quickly retrieve relevant answers from all those articles, patents and other sources is essential.
Scientists turned to online research tools for finding literature some time ago, but their expectations have changed. Once-prized comprehensive lists of literature that might be related to their question are starting to be seen as a burden. Scientists have expressed a clear desire for a research solution that enables them to find precise information — chemical properties, experimental procedures, synthesis options — rather than being directed to source texts. In short, they expect research solutions to help them master the data.
The role of the ChemSearch Challenge
Recognizing the shift in expectations from research solutions, we launched the ChemSearch Challenge in September. This series of online quizzes for chemists was designed to help identify and understand the key bottlenecks in searching. Entrants used their preferred chemistry database or search engine to find the correct answers to research questions, and their response times and accuracy were recorded. The Challenge wrapped up in December, and we have already gained insights into the bottlenecks in research.
To proceed with experiments, chemists need precise experimental data about chemical properties, bioactivities and reactions. It’s a rare chemist that does not perform frequent searches for such information. However, finding these answers in full-text original sources can be a slow process. Time spent searching for information is time away from the lab — and it quickly mounts up. In a survey that accompanied the ChemSearch Challenge, over half of the 279 respondents said they spend more than five hours a week on data and literature retrieval, and over a quarter spend more than 10 hours a week on this task (Figure 1).
Researchers perform four basic types of chemistry search:
- Literature searches, including searches for chemical compound data and synthesis plans within full-text articles
- Data retrieval, meaning direct retrieval of extracted synthesis plans, reactions and properties
- Patent searches to see whether similar products and compounds are being or have been developed
- Material sourcing, including knowledge of the specific materials and compounds involved in any chemistry search
In the survey, 33 percent of the respondents’ queries focused on literature, while 43 percent focused on data. However, in terms of time spent, the opposite is true: respondents reported that 51 percent of their time was spent on literature searches and 32 percent on data retrieval (Figure 2).
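To make the mismatch concrete, the short sketch below divides each search type's share of time by its share of queries. This ratio is only a rough indicator introduced here for illustration; it is not a measure reported in the survey, and the names `shares` and `time_to_query_ratio` are hypothetical.

```python
# Survey shares quoted in the article, in percent.
shares = {
    "literature": {"queries": 33, "time": 51},
    "data": {"queries": 43, "time": 32},
}


def time_to_query_ratio(entry):
    """Share of time divided by share of queries (illustrative ratio only)."""
    return entry["time"] / entry["queries"]


# A ratio above 1 means the search type takes more than its share of time.
ratios = {name: round(time_to_query_ratio(entry), 2) for name, entry in shares.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}")
```

On these figures, literature searches take about one and a half times their share of queries in time, while data retrieval takes about three quarters of its share.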
The proportion of time spent on literature searches is a symptom of the information overload chemists face. The growing volume of data is not news to any researcher: new research is published continually. However, the rate of increase has accelerated in recent years. In its report “The Digital Universe of Opportunities,” analyst firm IDC predicts there will be 44 zettabytes, or 44 trillion gigabytes, of data by 2020, compared to 4.4 zettabytes in 2013. To put this in context, one zettabyte of data would fill 7.8 billion top-of-the-range iPhones, or one for every human currently living, with over half a billion to spare.
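The scale of these figures can be checked with a few lines of arithmetic. The sketch below uses decimal (SI) units and assumes a top-of-the-range iPhone holds 128 GB; that capacity is not stated in the IDC comparison and is an assumption made here, as are the function and constant names.

```python
# Decimal (SI) units: 1 zettabyte = 10**21 bytes, 1 gigabyte = 10**9 bytes.
BYTES_PER_ZB = 10**21
BYTES_PER_GB = 10**9

# Assumption (not stated in the article): a top-of-the-range iPhone stores 128 GB.
ASSUMED_IPHONE_GB = 128


def zettabytes_to_gigabytes(zettabytes):
    """Convert a whole number of zettabytes to gigabytes."""
    return zettabytes * BYTES_PER_ZB // BYTES_PER_GB


def iphones_per_zettabyte(phone_gb=ASSUMED_IPHONE_GB):
    """Number of phones of the given capacity needed to hold one zettabyte."""
    return BYTES_PER_ZB // (phone_gb * BYTES_PER_GB)


gb_2020 = zettabytes_to_gigabytes(44)   # predicted data volume for 2020
growth_factor = 44 / 4.4                # 2020 prediction relative to 2013
phones = iphones_per_zettabyte()

print(f"44 ZB = {gb_2020:,} GB")
print(f"Growth from 2013 to 2020: about {growth_factor:.0f}x")
print(f"Phones per zettabyte: {phones:,}")
```

Under that assumption, one zettabyte corresponds to roughly 7.8 billion phones, and 44 zettabytes is 44 trillion gigabytes, a tenfold increase over the 2013 figure.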
Why is the rate of data production increasing so rapidly? A major factor is that the costs of data creation have dropped dramatically as a result of technological and scientific advancement. For example, in life sciences, the cost of sequencing the full genome of an individual or patient has dropped from more than $1.5 billion to under $1,000 in less than 20 years, opening up huge new sources of data to researchers.
What does this mean for chemists? Searching for a single piece of information using traditional methods means reading multiple full-text articles and patents. This is not a viable model given the rate of data growth. It’s also important to consider that the search strategy and algorithms — which are dependent on the research solution — determine the results of a literature search. If the search construction is not optimal or the tools are not well designed, the retrieved literature may even lack the desired answer. This is critical for researchers who are under pressure to get results. Chemists want to focus on connecting ideas, not finding ideas, especially since research approaches themselves have changed.
In life sciences, for instance, research is no longer focused on a reductionist approach to drug design, that is, isolating the interaction between a therapeutic compound and a single biological target. According to Lars Rebien Sorensen, CEO of Novo Nordisk, curing complex diseases requires a complex approach:
"Most of the easy wins have already been made. Now we are into more indirect ways of treating diseases. These are much more complicated. This is not to belittle the advances so far, but things are getting difficult."
Searching for answers
In the ChemSearch Challenge survey, the second-most important attribute for a chemistry research solution was that it should retrieve answers in the form of extracted experimental facts rather than just citations or original full-text sources. This result suggests that newer researchers, who belong to the Google generation, realize the limitations of basic text-retrieval search tools. While fear of missing a key data point can be a strong incentive for researchers, they recognize that the choice of research solution should rest not solely on the comprehensiveness of the database but also on having the right connections between data, in order to build the shortest path from question to deeper insight.
It will be fascinating to see what further insights the ChemSearch Challenge reveals for understanding the bottlenecks in research. As the amount of available data grows, the needs of research industries change and technology evolves, we may see the end of the perceived need for access to every single piece of information. Tools that automate the basic level of reading in order to extract the salient facts and bring researchers directly to relevant and accessible data will be indispensable. Researchers can then use those facts to make the connections that will drive discovery forwards. In essence, we have to trust computers to do at least some of the work for us if we want research times to shorten, productivity to increase and discovery to continue.
Elsevier Connect Contributor
Dileep Sharma has over 12 years of experience in marketing and customer insights with global and multinational organizations in diverse industries: pharmaceutical, chemical, agriculture and retail. In his current role as Senior Solutions Marketing Manager for R&D Solutions at Elsevier, his focus is on developing and implementing short- and long-term marketing strategies that support the brand and marketing initiatives for Reaxys®, Elsevier’s Life Science Solutions flagship chemistry product. Dileep uses his knowledge and understanding of industry trends and market research to help Reaxys support Elsevier R&D Solutions customers' productivity and research goals.