欢快的咖啡 · 使用jQuery将局部变量转换为全局变量_使 ...· 2 月前 · |
很拉风的小狗 · 超七和玉手镯能同时佩戴吗?与其它珠宝一起戴有 ...· 4 月前 · |
飞奔的可乐 · 探索高性能的Vicuna模型:ChatGPT ...· 7 月前 · |
满身肌肉的熊猫 · fluri | Dart package· 8 月前 · |
傻傻的大熊猫 · 一蔚来ES8撞柱起火 传捷达牵手零跑 - 封面新闻· 1 年前 · |
Acc: accuracy; AD: applicability domain; BM: baseline model; CDK: chemistry development kit; CHO: Chinese hamster ovary; CoMFA: comparative molecular field analysis; CoMSIA: comparative molecular similarity index analysis; CPG-NN: counter-propagation neural networks; CT: classification tree; ECFP: extended connectivity fingerprints; FCFP: functional class fingerprints; G/PLS: genetic partial least squares; GA-kNN: k-nearest neighbor with genetic algorithm variable selection; GFA: genetic function approximation; GFA-MLR: genetic function approximation with multiple linear regression; GP: genetic programming; GP: Gaussian process; GP-FVS: Gaussian process with forward variable selection; GP-Nest: Gaussian process with nested sampling; GP-Opt: Gaussian process with conjugate gradient optimization; HEK: human embyonic kidney; HPLS: hierarchical PLS; HTS: high-throughput screening; KNB: kernelized naïve Bayes; kNN: k-nearest neighbors; LDA: linear discriminant analysis; LLR: local lazy regression; MNA: multilevel neighborhoods of atoms; MOE: molecular operating environment; NB: naïve Bayes; NN: neural networks; PLS: partial least squares; PLSD: partial least squares discriminant; parzen-window based model; rho: Spearman's rank correlation coefficient; RMSE: root-mean-square error; ROC: receiver operating characteristic; RP: recursive partitioning; RR: ridge regression; Se: sensitivity; SOM: self-organizing map; Sp: specificity; SR: stepwise regression; SVM: support vector machines; SVR: support vector regression; Y-rand: Y-randomization.
Although these models appear to be well-fitted, a critical analysis reveals that the vast majority of the published QSAR models do not comply with the standard validation procedures and the different statistical criteria described in the best practices of QSAR modeling [ 80 , 81 ]. Most of those models are indeed not compliant with the OECD guidance on QSAR model development and validation [ 82 ]. More specifically, the primary drawbacks of the majority of published QSAR studies are: (i) most models do not have proof of passing the Y-randomization test [ 21 , 23 , 26 , 28 , 29 , 31 – 35 , 38 , 40 , 41 , 45 – 49 , 51 – 56 , 58 , 59 , 63 – 65 , 68 – 70 , 75 , 79 ]; (ii) no proof of applicability domain (AD) estimation is provided [ 21 , 23 , 26 – 29 , 31 – 36 , 40 , 45 , 48 – 53 , 56 , 58 , 63 – 65 , 68 – 71 , 75 , 79 ]; and (iii) model predictivity is not acceptable [ 39 , 61 , 66 ]. As a consequence, despite the large number of QSAR models for hERG blockage available in the literature, only very few models can actually be employed to predict hERG blockage [ 60 , 61 , 74 , 78 ]. Most of the models and associated datasets used to build them are not available online for the scientific community. These major drawbacks compromise the practical use of prior models for reliable assessment of drug-induced QT syndrome.
Given the risks associated with hERG inhibition and the lack of reliable models freely available for the research community, we aimed to build predictive and well-characterized QSAR models for hERG blockage using the largest publicly available dataset for hERG blockage. In this study, we developed several consensus QSAR models combining different descriptor types and machine learning techniques (Combi-QSAR), all validated using a modeling workflow fully compliant with OECD guidelines. Moreover, we have applied these models to the World Drug Index (WDI) database for assessing whether some putative hERG blockers and non-blockers among marketed drugs and drug candidates could be identified.
We retrieved 11,958 chemical records containing affinity and inhibition data for the hERG channel from ChEMBL [ 83 ] v13 database (March, 2013).
Only the records related to the potency and the affinity values reported in activity as IC 50 , K i , and EC 50 , were retained. Subsequently, all concentrations were converted to −log(M) values. Compounds with multiple hERG measurements were identified during analyses of duplicates (see Data curation section). Because this dataset was composed from measurements done by multiple laboratories and different types of assays, the binary hERG blockade potential for duplicated records was analyzed to verify the dataset consistency as well as inter- and intra-laboratory assay variability. Different threshold levels have been proposed in the literature; for this reason we have used three binary classification thresholds (1 μM, 10 μM, and 20 μM) to discriminate between hERG blockers and non-blockers. Importantly, we have found an overall concordance between duplicates, considering multiple assays, as high as 93.61%, 90.73%, and 90.19% for the three aforementioned thresholds respectively. Given the high concordance between multiple assays for the same compound, we decided to merge the data. Original references were verified to guarantee that biological activities were correct in ChEMBL database and adjusted if needed. We have noted that compounds from five publications [ 73 , 84 – 87 ] had their potencies wrongly transcribed from original sources to the ChEMBL database. Moreover, compounds with undetermined activities (e.g., >20 μM; <1 μM; etc.) were kept only if they fit the class discrimination threshold. Finally, the datasets were divided into modeling sets (80%) and test sets (20%) using the modified Kennard-Stone algorithm ( http://labmol.farmacia.ufg.br/qsar ).
Additional chemical data for 561 compounds were retrieved from the hERG study published by Li et al. [ 48 ]. After curating the data, 553 compounds were retained. Subsequently, the overlap between this collection and the hERG dataset generated from the ChEMBL database was determined. There were 174 compounds that were present in both datasets, and only nine divergent binary hERG annotations were identified (94.8% of agreement), demonstrating the strong consistency for this dataset. The remaining 379 compounds that were absent from the modeling set were thus utilized for external validation of the QSAR models (see Results).
As an additional external validation set, the performance of the models developed in this study were compared to the ones from Li et al. [ 48 ]. The authors used 66 compounds with reported hERG activity from the WOMBAT-PK database [ 88 ]. These data originated from different sources with experimental binding activities evaluated in mammalian and non-mammalian cell lines, and were expressed in IC 50 , Ki, or percentage of current inhibition [ 89 – 93 ].
The WDI dataset (version 2010, http://thomsonreuters.com/world-drug-index/ ) involved almost 53,965 chemical compounds and pharmacologically active compounds, including all marketed drugs and compounds that entered clinical trials.
All aforementioned chemical datasets were carefully curated and standardized according to the protocol proposed by Fourches et al. [ 94 ]. Structural normalization of specific chemotypes, such as aromatic and nitro groups, was performed using ChemAxon Standardizer (v. 6.1, ChemAxon, Budapest, Hungary, http://www.chemaxon.com ). Inorganic salts, organometallic compounds, polymers, and mixtures were removed. Duplicates, i.e., identical compounds reported several times in the dataset, were identified using ISIDA/Duplicates software [ 95 ] and analyzed. If the experimental hERG data varied from different sources for a given compound, it was removed.
The Sequential Agglomerative Hierarchical Non-overlapping (SAHN) method implemented in the ISIDA/Cluster software ( http://fourches.web.unc.edu ) was applied to check the dataset structure diversity [ 95 ]. In this method, sub-structural molecular fragments (SMF) [ 96 ] are used as input for Euclidean distance calculation. Each compound is initially treated as one cluster. The algorithm proceeds by merging the n compounds sequentially into clusters using their pairwise Euclidean distances. New clusters are formed by the merger of existent clusters with the most similar clusters at each stage, whereas the distance matrix is updated with the distances between the newly formed cluster and all the other ones, according to the type of linkage specified by the user (complete linkage was used in this study). The process continues until one cluster remains. The software generates a dendrogram of the parent-child relationships between the clusters and a heat map of the proximity matrix colored according to the pairwise chemical similarity between compounds.
Four different types of molecular fingerprints reflecting the absence (0) or presence (1) of substructural fragment for each compound [ 97 ] were utilized in this study.
The MACCS structural keys were calculated using the RDKit ( http://www.rdkit.org ) in the KNIME platform [ 98 ]. The MACCS structural keys [ 99 ] are a collection of 166 predefined substructures associated with a SMARTS pattern and belonging to the dictionary-based fingerprint class. They were first planned for substructure searches and typically show a low performance level for virtual screening; thus, they are often used as a baseline fingerprint for benchmarking studies.
FeatMorgan fingerprints are circular fingerprints based on the Morgan algorithm and feature invariants (FCFP-like) [ 100 , 101 ]. They combine the RDKit Morgan fingerprint algorithm with pharmacophoric features calculated using “better” feature definitions. A pharmacophore is the ensemble of steric and electronic features essential for interaction with the biological target and responsible for biological activity [ 102 ], FCFPs are circular topological fingerprints where each pharmacophore represents a bit at the start. A number of iterations are performed to combine the initial pharmacophore identifiers with identifiers of neighboring pharmacophores until a specified diameter is reached and counted. The FCFP rule is derived from pharmacophore feature definitions (e.g., donor, acceptor, aromatic, halogen, basic, acidic, etc.) of the atoms in a molecule ( http://www.rdkit.org/docs/index.html ).
The pharmacophore fingerprints were calculated using the JChem suite from ChemAxon (v. 6.1.3), ChemAxon, Budapest, Hungary, ( http://www.chemaxon.com ) in the KNIME platform. The 2D pharmacophore fingerprints account for the pharmacophore properties of each atom, and the collection of all atom-atom pharmacophore feature pairs, along with their topological distances. More details are available at www.chemaxon.com/jchem/doc/user/PFp2D.html .
The PubChem fingerprints were calculated using the Chemistry Development Kit (CDK) [ 103 ] in the KNIME platform. PubChem fingerprints consist of an 881-dimensional vector of bits that accounts for the absence (0) or presence (1) of a substructure (fragment) for each compound. The 2D chemical representation of compounds is based on specific elements, types of ring systems, atom pairing, or atomic environment (nearest neighbors), etc. A detailed description of this fingerprint system is available at ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_fingerprints.txt .
The QSAR modeling workflow was carefully conducted in three major steps [ 81 , 104 ]: (i) data curation/preparation/analysis (selection of compounds and descriptors), (ii) model building, and (iii) model validation/selection. First, each dataset was randomly divided into a modeling set (80%) and a test set (20%) using the modified Kennard-Stone algorithm implemented in qsaR package v. 0.7 made for R, available at our lab group webpage ( http://labmol.farmacia.ufg.br/qsar ). Five-fold cross-validation procedure was implemented for model generation. The modeling set with known experimental activity was randomly divided into five subsets; subsequently, one subset (20% of the compounds) was selected as a test set, while the other four subsets (80% of the compounds) were merged as a training set. This procedure was repeated with the other subsets, allowing each of the five subsets to be used once as a test set. The 5-fold external cross-validation procedure was repeated three times and the predictions were averaged. Although models were generated only using the training set, model selection depended on the performance of both the training and test sets, because training set accuracy alone is insufficient to establish robust and externally predictive models [ 80 ]. After model selection, the external test set was screened in order to evaluate the actual predictivity of the model. In addition, 10 rounds of Y-randomization were performed for each dataset to assure that the accuracy of the models was not obtained due to chance correlations. The applicability domain (AD) for each descriptor type was estimated based on the Euclidean distances among the training sets of each model generated in the 5-fold cross-validation procedure. The distance of a test compound to its nearest neighbor in the training set was compared to the predefined applicability domain threshold level. If the distance was greater than this threshold level, the prediction was considered to be less trustworthy [ 105 ]. Four different machine learning methods, including the support vector machine (SVM) method with a radial basis kernel function (SVMradii) [ 106 ], the random forest (RF) method [ 107 ], the tree bagging method, and the gradient boosting method (GBM) were used for model building. The models were built using the qsaR package and its integration workflow plan for KNIME 2.9. All these procedures were united in publicly available KSAR workflow ( http://labmol.farmacia.ufg.br/ksar ) used in this study. KSAR workflow is tightly integrated with R and KNIME and includes many modules, such as the module for curating the data (e.g., removal of duplicates), the rational module (Kennard-Stone and modified Kennard-Stone algorithm), and the random dataset splits module, multiple machine learning methods, performance metrics to evaluate 5-fold cross-validation and external evaluation, the applicability domain (AD), the Y-randomization test, and many other utilities.
The SVM method is a general data modeling methodology first developed by Vapnik [ 106 ]. Briefly, a hyperplane in a high-dimensional feature space is built based on molecular descriptors using kernel functions; subsequently, a linear or non-linear model is constructed in this feature space to segregate compounds with different activities. In this study, a radial basis kernel function (SVMradii) was chosen to seek the optimal pair of the penalty parameter C and the kernel parameter γ.
The complete description of the original RF algorithm can be found elsewhere [ 107 ]. The RF method is an ensemble learning method in which single decision trees are built, and the final prediction is defined by all tree outputs. In each tree, 1/3 of the training set is randomly extracted (i.e., bootstrap sample) and used as an out-of-bag (OOB) set, while the remaining 2/3 of the training set is used for model building. The best split generated by the CART algorithm [ 108 ], among the m randomly selected descriptors from the entire pool in each node, is chosen. Then, each tree is grown to the largest possible extent without pruning. The OOB set is used as a test set for the current tree. The predicted classification values are defined by majority voting for one of the classes.
The tree bagging method averages the decision tree over many samples extracted from the modeling set by the bootstrap replicate. The same compound may appear multiple times in the bootstrap replicate, or it may not appear at all. Thus, on each of n rounds of bagging, a bootstrap replicate is created from the original training set. A base classifier is then trained on this replicate, and the process continues. After n rounds, a final combined classifier is formed by the majority vote of all of the base classifiers [ 109 ].
The GBM generates models by computing a sequence of trees, in which each successive tree is built from the prediction residuals of the preceding tree. A simple (best) partitioning of the data is determined at each step in the boosting tree algorithm, and the deviations of the observed values from the respective residuals for each partition are computed. Given the preceding sequence of trees, the next 3-node tree will then be fitted to the residuals in order to find another partition that will further reduce the residual (error) variance for the data [ 110 ].
The following metrics ( Equations 1 – 8 ) were used to assess different aspects of model performance:
where TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives, and FN is the number of false negatives.
The largest publicly available dataset for hERG liability retrieved from the ChEMBL database ( https://www.ebi.ac.uk/chembl/ ) contained 11,958 associated bioactivity records for the hERG K + channel. Once curated, this dataset only contained 4,980 compounds for modeling. Threshold values for the blocker/non-blocker classification vary in the literature from 1 μM to 40 μM [ 56 , 66 , 69 , 111 ]. For this reason, binary classification models were built for three different thresholds: 1 μM, 10 μM, and 20 μM. Therefore, three datasets were derived from the original dataset and divided accordingly: 4,938 compounds met the threshold level of 1 μM, 4,833 compounds met the threshold level of 10 μM, and 4,544 compounds met the threshold level of 20 μM ( Table 2 ). Models were generated separately for each of the three different threshold levels to define the most suitable cutoff for discriminating between hERG blockers and non-blockers.
1 μM Threshold | 10 μM Threshold | 20 μM Threshold | ||||
---|---|---|---|---|---|---|
|
||||||
Blocker | Non-blocker | Blocker | Non-blocker | Blocker | Non-blocker | |
Modeling set | 795 | 3,156 | 2,277 | 1,590 | 2,666 | 970 |
Evaluation Set | 199 | 788 | 611 | 355 | 713 | 195 |
The examination of the 4,980 compounds suggested a high level of structural dissimilarity (dendrogram and heat map are shown on line at http://labmol.farmacia.ufg.br/predherg ).
The 5-fold cross-validation procedure was used to estimate the robustness of the models developed. The test set was applied to validate and to estimate the predictive power of the models. In this work, we have chosen the models generated for the threshold level of 10 μM, which showed the best performance and were validated internally and externally. The statistical results of generated QSAR models for the modeling set of the 10 μM threshold level are summarized in Table 3 . The detailed results for this threshold level, as well as the full results for the threshold levels of 1 μM and 20 μM, are available in Tables S1–S9 (Supplementary Material) . The combination of different descriptors and machine learning methods led to robust and predictive QSAR models, with balanced accuracy (BAC) values ranging between 0.74–0.87 and a coverage of 0.77–0.93 ( Table 3 ).
Model name | Modeling Set (threshold level 10 μM; n=3,867) | ||||||
---|---|---|---|---|---|---|---|
Accuracy | BAC | MCC | Sensitivity | Specificity | AUC | Coverage | |
|
|||||||
MACCS-SVM | 0.84 | 0.75 | 0.52 | 0.86 | 0.64 | 0.75 | 0.70 |
featMorgan-SVM | 0.86 | 0.78 | 0.57 | 0.86 | 0.69 | 0.78 | 0.74 |
Pharm. FP-SVM | 0.84 | 0.74 | 0.50 | 0.85 | 0.63 | 0.74 | 0.78 |
PubChem-SVM | 0.85 | 0.77 | 0.54 | 0.85 | 0.68 | 0.77 | 0.70 |
MACCS-RF | 0.84 | 0.75 | 0.52 | 0.85 | 0.66 | 0.75 | 0.70 |
featMorgan-RF | 0.85 | 0.77 | 0.55 | 0.86 | 0.68 | 0.77 | 0.74 |
Pharm. FP-RF | 0.86 | 0.77 | 0.55 | 0.84 | 0.71 | 0.77 | 0.78 |
PubChem-RF | 0.84 | 0.77 | 0.55 | 0.85 | 0.69 | 0.77 | 0.70 |
MACCS-TreeBag | 0.84 | 0.76 | 0.53 | 0.85 | 0.67 | 0.76 | 0.70 |
featMorgan-TreeBag | 0.85 | 0.76 | 0.52 | 0.84 | 0.67 | 0.76 | 0.74 |
Pharm. FP-TreeBag | 0.85 | 0.74 | 0.50 | 0.83 | 0.66 | 0.74 | 0.78 |
PubChem-TreeBag | 0.84 | 0.76 | 0.53 | 0.84 | 0.69 | 0.76 | 0.70 |
MACCS-GBM | 0.83 | 0.74 | 0.49 | 0.85 | 0.62 | 0.74 | 0.70 |
featMorgan-GBM | 0.85 | 0.74 | 0.50 | 0.84 | 0.64 | 0.74 | 0.74 |
Pharm. FP-GBM | 0.86 | 0.75 | 0.51 | 0.84 | 0.65 | 0.75 | 0.78 |
PubChem-GBM | 0.85 | 0.76 | 0.52 | 0.85 | 0.66 | 0.76 | 0.70 |
Consensus | 0.89 | 0.83 | 0.65 | 0.85 | 0.81 | 0.83 | 0.74 |
Consensus Rigor | 0.91 | 0.87 | 0.71 | 0.89 | 0.84 | 0.87 | 0.34 |
SVM: Support Vector Machine; RF: Random Forest; TreeBag: Tree Bagging method; GBM: Gradient boosting method; MACCS: MACCS keys; PubChem: PubChem Fingerprints; Pharm. FP: ChemAxon Pharmacophore Fingerprint; BAC: Balanced Accuracy; MCC: Matthews correlation coefficient; AUC: area under the receiver operating characteristic curve; Consensus and Consensus Rigor models were built by averaging the predicted values from the individual model for each machine learning technique that yielded the best performance with higher coverage (featMorgan-GBM, PubChem-TreeBag, Pharm. FP-RF, MACCS-SVM).
The best individual model was generated using the combination of featMorgan fingerprints with SVM (BAC = 78%; sensitivity = 86%; specificity = 69%; see Table 3 for more details).
To assure that the accuracy of the models was not due to chance correlation, 10 rounds of Y-randomization were performed for each dataset. The results are shown in Tables S3, S5 and S9 (Supplementary Material) .
Several QSAR models were generated using multiple machine learning algorithms and descriptors. Consensus QSAR modeling, i.e. , parallel development of multiple QSAR models using all pairwise combinations of different types of chemical descriptors and various machine learning techniques over single QSAR modes, has been shown to be advantageous [ 112 , 113 ]. Nevertheless, no need exists of the overabundance of models in the consensus ensemble [ 94 ]. Therefore, a verification procedure was conducted to indicate whether a consensus model, based on models from Table 3 , would offer additional advantages compared to the individual models. The consensus model was built by averaging the predicted values from the individual model for each machine learning technique that yielded the best performance with higher coverage. Consensus model considered only compounds that were predicted identically (with AD taken into account) by all the models. For example, if models 1, 2, and 3 predicted a compound to be a blocker, and this compound is inside the AD for these models, but model 4 predicted this compound to be a non-blocker, and the compound is outside the AD for this model, in the consensus model this compound was still classified as a blocker. However, if the compound is predicted to be a blocker, and it is inside the AD for models 1, 2, and 3, but the compound is predicted to be a non-blocker by model 4 and is inside the AD, the final prediction of the consensus is specified as inconclusive. In another situation, if all four models, independently of the outcome, yielded predictions outside the AD, the result for this compound was classified as unreliable.
Thus, the consensus model was built by combining the SVM model with MACCS fingerprints (MACCS-SVM), the Tree Bagging method with PubChem Fingerprints (PubChem-TreeBag), the RF model with the ChemAxon Pharmacophore Fingerprint (Pharm. FP-RF), and the gradient boosting model (GBM) with the featMorgan fingerprint (featMorgan-GBM). The generated consensus model demonstrated BAC of 83%, sensitivity of 85%, specificity of 81%, and coverage of 74% (see Table 3 for more details). Therefore, the consensus model discriminates well between hERG blockers and nonblockers—better than any of the individual models.
More rigorous consensus model was also developed (consensus rigor), by combining the same models as in the consensus model with more restrictive conditions. The consensus rigor model only considered the outcome to be reliable when a compound was inside the AD for the four models, and all of the predictions were equal. Any non-concordant prediction was specified as inconclusive. If the compound was outside the AD for any model, then the outcome was specified as unreliable. Expectedly, the increase prediction performance of consensus rigor model (BAC = 87%, sensitivity = 89%, specificity = 84%; see Table 3 ) was achieved at the expense of coverage (34%). Although consensus rigor model is very accurate predictor, its applicability is limited only for certain chemical classes.
In summary, the consensus model demonstrated better results for 5-fold external CV, with 5% accuracy and 20% sensitivity increase when compared with the best individual model (featMorgan-SVM). The statistical results for the external test set at the 10 μM threshold level are summarized in Table 4 . The complete results are shown in Table S6 (Supplementary Material) . Consensus model demonstrated the best performance among all other individual models (BAC of 91%, sensitivity of 89%, specificity of 93%, and coverage of 78%).
Model name | Test Set (threshold 10 μM, n=966) | ||||||
---|---|---|---|---|---|---|---|
Accuracy | BAC | MCC | Sensitivity | Specificity | AUC | Coverage | |
|
|||||||
MACCS-SVM | 0.83 | 0.79 | 0.60 | 0.91 | 0.66 | 0.79 | 0.85 |
Pharm. FP-RF | 0.84 | 0.81 | 0.64 | 0.91 | 0.70 | 0.81 | 0.88 |
PubChem-TreeBag | 0.83 | 0.80 | 0.61 | 0.88 | 0.73 | 0.80 | 0.71 |
featMorgan-GBM | 0.83 | 0.80 | 0.61 | 0.90 | 0.69 | 0.80 | 0.80 |
Consensus | 0.90 | 0.91 | 0.76 | 0.89 | 0.93 | 0.91 | 0.78 |
SVM: Support Vector Machine; RF: Random Forest; TreeBag: Tree Bagging method; GBM: Gradient boosting method; MACCS: MACCS keys; PubChem: PubChem Fingerprints; Pharm. FP: ChemAxon Pharmacophore Fingerprint; BAC: Balanced Accuracy; MCC: Matthews correlation coefficient; AUC: the area under receiver operating characteristic curve; Consensus was built by averaging the predicted value from each individual model (featMorgan-GBM, PubChem-TreeBag, Pharm. FP-RF, MACCS-SVM).
Li et al. [ 48 ] compiled a dataset of 561 compounds with chemical data for hERG activity. After exclusion of duplicates with our modeling set, 377 unique compounds were retained at the threshold level of 10 μM and used as an additional external validation set. Our consensus model reached BAC of 95%, sensitivity of 91%, specificity of 99%, and 84% of coverage for this additional external validation set.
Moreover, Li et al. [ 48 ] used an additional evaluation set comprising 66 compounds from the WOMBAT-PK database with reported hERG activity to validate the performance of their models. Therefore, this external set of 66 compounds from the WOMBAT-PK database was used to examine validation of the consensus model by comparing with the results from Li et al. [ 48 ] under the same conditions. The comparison of the statistical values is shown in Table 5 . The consensus model outperformed the Li et al. [ 48 ] models in specificity, sensitivity, and in other performance metrics, which reflected a higher BAC (~ 0.27) for the models developed in this research. Moreover, the consensus model for the 10 μM threshold level presented a BAC of 98% (which represents 1 compound misclassified out of 59), sensitivity of 98%, specificity 99%, and coverage of 89% ( Table 5 ).
Accuracy | BAC | MCC | Sensitivity | Specificity | AUC | Coverage | |
---|---|---|---|---|---|---|---|
Model name | WOMBAT-PK dataset (threshold 1μM, n=66) | ||||||
|
|||||||
SVM linear a | - | 0.72 | - | 0.68 | 0.74 | - | - |
SVM non linear a | - | 0.72 | - | 0.83 | 0.47 | - | - |
Consensus | 0.96 | 0.95 | 0.89 | 0.92 | 0.98 | 0.95 | 0.85 |
|
|||||||
Model name | WOMBAT-PK dataset (threshold 10 μM, n=66) | ||||||
|
|||||||
SVM linear a | - | 0.57 | - | 0.56 | 0.59 | - | - |
SVM non linear a | - | 0.71 | - | 0.77 | 0.59 | - | - |
Consensus | 0.98 | 0.98 | 0.96 | 0.98 | 0.99 | 0.99 | 0.89 |
|
|||||||
Model name | WOMBAT-PK dataset (threshold 20 μM, n=66) | ||||||
|
|||||||
SVM linear a | - | 0.60 | - | 0.60 | 0.60 | - | - |
SVM non linear a | - | 0.74 | - | 0.91 | 0.35 | - | - |
Consensus | 0.96 | 0.97 | 0.91 | 0.95 | 0.99 | 0.97 | 0.91 |
Czodrowski [ 71 ] performed an analysis of the hERG dataset retrieved from ChEMBL in which four classification models were built with different divisions of the dataset using the RDKit descriptors and the RF model. To the best of our knowledge, this is the only study found to use the hERG dataset content in the ChEMBL database. Initially we wanted to compare the models obtained by Czodrowski with the ones developed in our study. But then we have found that Czodrowski [ 71 ] did not calculate the AD for the models that allowed prediction of 100% of compounds but compromises their practical use. Consensus models developed in this study has AD estimation that reduced the coverage. We wanted to use the same set of compounds for fair comparison, but unfortunately predicted values were not reported in the study [ 71 ] that does not allow us to make direct comparison.
QSAR models were developed as virtual screening tools for revealing putative hERG blockers among marketed drugs and those in development using the WDI database for a case study. A total of 179 compounds were present in both the hERG and the WDI datasets: 103 blockers and 76 nonblockers. After the data curation 44,486 remaining unique compounds were predicted by consensus model developed in this research for revealing putative hERG blockers and non-blockers. 4,945 compounds were predicted to be blockers and 20,871 – to be non-blockers (the remaining compounds 18,670 were outside of the AD). All the compounds and corresponding predictions are available in the supplementary material and on-line ( http://labmol.farmacia.ufg.br/predherg/vs-wdi.pdf ).
Model interpretation revealed several SAR rules, which can guide structural optimization of hERG blockers into non-blockers. Figures 1 – 2 show some revealed SAR rules, involving changes in the amine nitrogen environment, adding oxygen atom, removing carbon atoms, aromatic substitutions, transformations involving some descriptors such as the difference between the topological polar surface area (ΔTPSA) [ 114 ] of the two molecules in the transformation and the Labute’s approximate surface area descriptor (Labute ASA) [ 115 ].
The general transformations in Fig. 1 show some changes in the environment of the amine nitrogen can reduce hERG inhibition. In Fig. 1A , we can see that removing carbons and/or changing the electronic environment around the basic nitrogen can result in a reduction in hERG inhibition. In this example, the modification of the pyrrolidine moiety by removing carbon atoms or changing it to another functionalized ring (in this example a morpholine ring), yielded in the reduction of the hERG binding. Furthermore, the next two transformations ( Fig. 1B and 1C ) show the same SAR rules that remove carbon atoms, reduce lipophilicity and/or change the electronic and steric environment around the basic nitrogen can transform a potent hERG blocker to less potent blocker or even to a non-blocker compound. Some of those observations were also found previously [ 48 , 116 ].
We can also observe that transformations that add a hydroxyl group reduce hERG inhibition ( Fig. 2A ). As already mentioned, removing carbon atoms, as well as reducing the lipophilicity, can result in a reduction in hERG binding ( Fig. 2B ).
We also noticed some SAR rules revealing specific structural changes through descriptors, like the topological polar surface area (TPSA) of the compounds [ 114 ]. If the difference between the descriptor TPSA (ΔTPSA) of two compounds involved in the transformation is equal or greater to 60, this can result in reduction in hERG inhibition ( Fig. 2C ). We have found in our modeling set that 50 compounds follow this SAR rule and only 3 compounds do not follow this rule. Another descriptor observed and related with changes in hERG binding potency is the Labute’s approximate surface area (Labute ASA) [ 115 ]. If the calculated Labute ASA descriptor is between 309 and 337, then the compound is frequently a hERG blocker ( Fig. 2D ). 130 compounds in our modeling set followed this SAR rule and 3 compounds do not follow this rule.
Importantly, our QSAR models were also capable of recognizing modifications that do not follow the general SAR rules, as shown in Fig. 3 . As we can see, some bioisosteric replacements have resulted in dramatic changes in activity. For example, the replacement of a furane ring by a tetrazole ring, which is a bioisosteric replacement and therefore should preserve the activity, resulted in a substantial alteration in hERG binding, changing the compound from a blocker to non-blocker ( Fig. 3A , left). The same is observed with the substitution of benzene to pyridine ring ( Fig. 3A , right). The bioisosteric replacement of aromatic rings in our modeling set had 169 examples that follow the SAR rule, as the bioisosteric replacement did not altered the activity. However, there were 21 examples in which this modification had altered dramatically the activity, changing from a blocker to non-blocker compound, and our model could capture such modifications. These cases represent the activity cliffs, i.e., structurally similar compounds with large differences in potency [ 117 ]. The modification of a chlorine to hydroxyl group in a aromatic ring also reduced dramatically the binding to hERG ( Fig. 3B , left). Although these groups are classic bioisosteres, this transformation involved the introduction of a hydroxyl group in an aromatic ring that alters the electronic environment in the aromatic group. The same is observed with the substitution of a chlorine by a nitrile group ( Fig. 3B , right), leading to a notable change in hERG binding, changing from a blocker compound to a non-blocker compound. The following examples in Fig. 3C and 3D are also activity cliffs.
In general, our SAR rules showed that to decrease toxicity of hERG blockers one should consider decreasing their lipophilicity, removing carbons and/or changing the electronic environment around the basic nitrogen, and increasing the topological polar surface area. It is important to note that our observations also indicated that hERG inhibition has complex structure-activity relationship as subtle changes in the structure often result in small changes in activity. Moreover, we observed a considerable number of activity cliffs in hERG dataset, most of them span a potency difference of at least 2 orders of magnitude. The final model showed the ability to predict such transformations and the threshold effect for compounds near to the border regions.
Poor pharmacokinetics and toxicity are important causes of costly late-stage failures in drug development. Our laboratory has been working to overcome or reduce these failures, developing in silico tools to early predict and optimize some properties, such as metabolism [ 118 – 125 ], mutagenicity (Ames test), Caco-2 permeability, blood-brain barrier penetration (BBBP), and water solubility [ 126 ], skin sensitization, skin permeability, among others [ 127 – 131 ].
We have developed statistically significant and externally predictive QSAR models of hERG blockage. The best model was obtained for the 10 μM threshold using the largest publicly available dataset of structurally diverse compounds including variety of drug classes. Consensus modeling by merging models developed with different sets of descriptors increased the balanced accuracy, sensitivity, and specificity of the models up to 81–85% with the coverage of ~75%. The models developed in this study can be used by the research community and regulatory scientists for the rapid evaluation of cardiac toxicity liability via hERG inhibition in chemical inventories. For instance, we applied our models for the virtual screening of the WDI dataset and identified 4,945 potential hERG blockers that may be candidates for targeted testing to determine hERG liability. As a result of our study, all curated datasets and developed models that can be used for the rapid identification of hERG blockers and non-blockers in the context of virtual screening for drug development, have been made publicly available at the LabMol ( http://labmol.farmacia.ufg.br/predherg ) and Chembench web-portals.
The authors would like to thank Brazilian Funding Agencies CAPES, CNPq and FAPEG for financial support and fellowships. We are also grateful to ChemAxon for providing us with academic licenses for their software. DF, EM, and AT thank NIH (grants GM66940 and GM096967) and EPA (grant RD 83499901) for financial support.
CONFLICT OF INTERESTS
The authors confirm that this article content has no conflicts of interest.
Supplementary tables and curated datasets are available as supplementary material ( http://labmol.farmacia.ufg.br/predherg ).