Accelerating drug discovery by structuring data and turning it into actionable insights

Prioritizing hits from large compound lists for follow-up is a challenging task for medicinal chemists. At this step of drug discovery, multiple parameters, such as synthetic accessibility, target specificity, physicochemical properties, and potential toxicities, must be considered simultaneously alongside the desired biological activity. Meanwhile, biological data are accumulating at an increasing rate in the pharmaceutical industry and in the published literature (including journals and patents).

However, data does not equal actionable information, and guidelines for appropriate data capture, harmonization, integration, mining, and visualization need to be established to fully harness the potential of these data. Here, we describe ongoing efforts at Merck & Co. to structure data in the area of discovery chemistry. We are integrating complementary data from internal and external sources (Reaxys) into a single database, and we demonstrate how this well-curated database facilitates compound set design, tool compound selection, target deconvolution in phenotypic screening, and predictive model building (e.g., target prediction).

Early in the discovery process, chemists select a subset of compounds for further research, often from many viable candidates. These decisions determine the success of a discovery campaign and, ultimately, which drugs are developed and marketed to the public. We present our findings in the context of complex problem solving and decision theory, and discuss their implications for drug discovery.