Dataset selection part 2


Application note

QSAR Modeling of ErbB1 Inhibitors Using Genetic Algorithm-Based Regression

At the time of the query, a total of 3,776 compounds were recorded in Reaxys Medicinal Chemistry as inhibitors of the ErbB1 (EGFR-1) kinase. A total of 4,255 biological activities from 271 citations were registered. When the query was limited to human targets, Reaxys Medicinal Chemistry retrieved 880 compounds with 1,227 associated bioactivities (Figure 3).

Figure 3. (A) Heatmap of inhibitors of ErbB1 kinase. The dataset can be filtered by target species, as highlighted. (B) The most potent ErbB1 inhibitors, with pX above 7.0 (affinity < 100 nM), were selected and the heatmap was sorted by activity against ErbB1; the highlighted compound has pX = 9.5 (IC50 = 0.33 nM).

Filtering the dataset further by parameter, to keep only records with IC50 values, retrieved 746 compounds with 982 associated bioactivities. Finally, the most potent ErbB1 inhibitors were selected (those with pX > 7, i.e., IC50 < 100 nM). Using the Substances tab, detailed information can be obtained for each compound (Figure 4).

Figure 4. Selection of ErbB1 kinase inhibitors as shown in the Substances tab in Reaxys Medicinal Chemistry.
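
The potency cutoff used in the selection above (pX > 7, equivalent to IC50 < 100 nM) can be illustrated with a short Python sketch. It assumes that pX is the negative base-10 logarithm of the IC50 expressed in mol/L, which is consistent with the values quoted in the text. The compound names and IC50 values are hypothetical placeholders, not records from Reaxys Medicinal Chemistry.

```python
import math


def px_from_ic50_nm(ic50_nm: float) -> float:
    """Return pX = -log10(IC50 in mol/L) for an IC50 given in nM (assumed definition)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 mol/L, so -log10(ic50_nm * 1e-9) = 9 - log10(ic50_nm)
    return 9.0 - math.log10(ic50_nm)


def ic50_nm_from_px(px: float) -> float:
    """Inverse conversion: IC50 in nM for a given pX."""
    return 10 ** (9.0 - px)


# Hypothetical records for illustration only
records = [
    {"compound": "cpd-A", "ic50_nm": 0.33},
    {"compound": "cpd-B", "ic50_nm": 45.0},
    {"compound": "cpd-C", "ic50_nm": 100.0},
    {"compound": "cpd-D", "ic50_nm": 2500.0},
]

PX_CUTOFF = 7.0  # strict cutoff: pX > 7, i.e. IC50 < 100 nM

potent = [r for r in records if px_from_ic50_nm(r["ic50_nm"]) > PX_CUTOFF]

for r in potent:
    print(r["compound"], round(px_from_ic50_nm(r["ic50_nm"]), 2))
```

Because the cutoff is strict, a compound with an IC50 of exactly 100 nM (pX = 7) is excluded in this sketch.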

Computation

Prior to building models, 56 two-dimensional (2D) molecular descriptors were computed for a set of 66 ErbB1 compounds. They comprised 32 P_VSA-based descriptors, 12 BCUT descriptors and 12 GCUT descriptors (3). These descriptors encode three main physicochemical properties: hydrophobicity (SlogP_VSA, BCUT_SlogP, GCUT_SlogP), polarizability (SMR_VSA, BCUT_SMR, GCUT_SMR) and electrostatic interactions (PEOE_VSA, BCUT_PEOE, GCUT_PEOE). The regression models presented here were built with QuaSAR-Evolution, a genetic algorithm-based tool implemented in the MOE cheminformatics suite. QuaSAR-Evolution applies a genetic algorithm to the problem of descriptor selection in QSAR: descriptors selected at random are combined to generate a population of regression models. The default settings were used for descriptor selection, and the initial length (the number of descriptors) was set to 4.

3. Szántai-Kis, C., Kövesdi, I., Eros, D., Bánhegyi, P., Ullrich, A. (2006) Prediction oriented QSAR modelling of EGFR inhibition. Current Medicinal Chemistry 13: 277–287.
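
The general idea of genetic-algorithm descriptor selection described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the QuaSAR-Evolution implementation in MOE: the data are synthetic, the model length is fixed at 4 descriptors, and the fitness measure (R² of an ordinary least-squares fit), population size, tournament selection, crossover scheme and mutation rate are all choices made for this example.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(X, y, subset):
    """R^2 of a least-squares fit of y on the chosen descriptor columns plus an intercept."""
    rows = [[1.0] + [row[j] for j in subset] for row in X]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    coef = solve(xtx, xty)  # normal equations
    pred = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def evolve(X, y, n_select=4, pop_size=30, generations=25, mutation_rate=0.3, seed=0):
    """Search for a good subset of n_select descriptors with a simple genetic algorithm."""
    rng = random.Random(seed)
    n_desc = len(X[0])
    cache = {}

    def fitness(ind):
        if ind not in cache:
            cache[ind] = r_squared(X, y, ind)
        return cache[ind]

    # Initial population: descriptor subsets selected at random
    pop = [tuple(sorted(rng.sample(range(n_desc), n_select))) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        nxt = ranked[:2]  # elitism: carry the two best models over unchanged
        while len(nxt) < pop_size:
            # Tournament selection of two parents
            p1 = max(rng.sample(ranked, 3), key=fitness)
            p2 = max(rng.sample(ranked, 3), key=fitness)
            # Crossover: draw the child's descriptors from the parents' combined pool
            pool = sorted(set(p1) | set(p2))
            child = set(rng.sample(pool, n_select))
            # Mutation: swap one descriptor for a randomly chosen one
            if rng.random() < mutation_rate:
                child.remove(rng.choice(sorted(child)))
                child.add(rng.choice([j for j in range(n_desc) if j not in child]))
            nxt.append(tuple(sorted(child)))
        pop = nxt
    return max(pop, key=fitness)


# Synthetic data: 40 hypothetical compounds, 12 hypothetical descriptors;
# the activity depends on descriptors 1, 4 and 7 plus noise.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(12)] for _ in range(40)]
y = [7.5 + 1.2 * r[1] - 0.8 * r[4] + 0.5 * r[7] + data_rng.gauss(0, 0.2) for r in X]

best = evolve(X, y)
print("selected descriptors:", best, "R^2:", round(r_squared(X, y, best), 3))
```

R² on the training data is used here only because every candidate model has the same number of descriptors; a production workflow would more likely score models with a cross-validated or complexity-penalized criterion.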