添加链接
link管理
链接快照平台
  • 输入网页链接,自动生成快照
  • 标签化管理网页链接
Discover a faster, simpler path to publishing in a high-quality journal. PLOS ONE promises fair, rigorous peer review, broad scope, and wide readership – a perfect fit for your research every time.

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here .

Affiliation Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Santosh, Tangail, Bangladesh

Affiliations Department of Pharmacology, Medical Sciences Division, University of Oxford, Oxford, United Kingdom, Bioinformatics Division, National Institute of Biotechnology, Ashulia, Ganakbari, Savar, Dhaka, Bangladesh

Affiliation Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Santosh, Tangail, Bangladesh

Affiliation Department of Molecular Biology, Universiti Putra Malaysia, Serdang, Selangor D.E., Malaysia

Affiliation Biology Department, Claflin University, Orangeburg, SC, United States of America

Abstract

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic emerged in 2019 and still requiring treatments with fast clinical translatability. Frequent occurrence of mutations in spike glycoprotein of SARS-CoV-2 led the consideration of an alternative therapeutic target to combat the ongoing pandemic. The main protease (M pro ) is such an attractive drug target due to its importance in maturating several polyproteins during the replication process. In the present study, we used a classification structure–activity relationship (CSAR) model to find substructures that leads to to anti-M pro activities among 758 non-redundant compounds. A set of 12 fingerprints were used to describe M pro inhibitors, and the random forest approach was used to build prediction models from 100 distinct data splits. The data set’s modelability (MODI index) was found to be robust, with a value of 0.79 above the 0.65 threshold. The accuracy (89%), sensitivity (89%), specificity (73%), and Matthews correlation coefficient (79%) used to calculate the prediction performance, was also found to be statistically robust. An extensive analysis of the top significant descriptors unveiled the significance of methyl side chains, aromatic ring and halogen groups for M pro inhibition. Finally, the predictive model is made publicly accessible as a web-app named M pro pred in order to allow users to predict the bioactivity of compounds against SARS-CoV-2 M pro . Later, CMNPD, a marine compound database was screened by our app to predict bioactivity of all the compounds and results revealed significant correlation with their binding affinity to M pro . Molecular dynamics (MD) simulation and molecular mechanics/Poisson Boltzmann surface area (MM/PBSA) analysis showed improved properties of the complexes. Thus, the knowledge and web-app shown herein can be used to develop more effective and specific inhibitors against the SARS-CoV-2 M pro . The web-app can be accessed from https://share.streamlit.io/nadimfrds/mpropred/Mpropred_app.py .

thumbnail
Table 1. Mann–Whitney U test results of various properties of the compounds.

https://doi.org/10.1371/journal.pone.0287179.t001

Furthermore, the AD of the built model was determined using the MACCS fingerprints as the starting input for PCA analysis, as shown in Fig 2(F) . After balancing, the data set of 478 compounds was randomly divided into internal and external (80% and 20% respectively) subsets. It’s important to note that the internal set (80%) is used as the training set for building predictive models to predict on the external set. The external set’s chemical space distribution was revealed to be well inside the internal set’s boundaries. As a result, the AD for the CSAR model described herein appears to be well defined.

QSAR modeling

To develop an interpretable QSAR model, we used fingerprints computed through the PaDEL-Descriptor software. S2 S13 Files contains the computed 12 molecular fingerprints of our dataset. The data set’s modelability score or MODI index sorts active compounds from inactive compounds to determine the likelihood of obtaining the CSAR model. It was found that all the fingerprint descriptors have a MODI value greater than 0.65, proving that the data set is reliable for developing a classification model. Table 2 lists all of these fingerprints, as well as their descriptions and MODI indices.

Download:
Table 2. List of molecular fingerprints employed in the current study for representing chemical structures of the M pro inhibitor dataset along with their MODI indices.

https://doi.org/10.1371/journal.pone.0287179.t002

To distinguish between active and inactive M pro inhibitors, we created the CSAR model using the RF algorithm in this work. Table 3 displays the results of 100 independent runs with all the distinct categories of fingerprints, including internal validation test, 10-fold CV, and external validation test. Best averaged values for the MACCS fingerprints were Ac 84.69% and MCC 0.691, as determined by a 10-fold CV analysis. The external validation for the MACCS, Klekota–Roth, and 2D atom pairs fingerprint descriptors, as shown in Table 3 , was also better than the rest of the descriptors. Taking into account the results from 10-fold CV as well as the external validation tests, it is found that the MACCS fingerprint descriptors outperform the other fingerprint classes. Fig 3 contains the plot of experimental vs predicted pIC50 values for model that was constructed with MACCS fingerprint descriptors.

Download:
Fig 3. Plot showing experimental versus predicted pIC50 values for model constructed with MACCS fingerprint descriptors.

https://doi.org/10.1371/journal.pone.0287179.g003

thumbnail
Table 3. Performance summary of CSAR models for predicting M pro inhibitors of SARS-CoV-2.

https://doi.org/10.1371/journal.pone.0287179.t003

Interpretation of feature importance

The top-ranked MACCS fingerprints as obtained from the RF model are mentioned in Table 4 , comprised of fingerprints pertaining to different classes such as aromatic compounds, nitrogen-containing compounds, oxygen-containing compounds, halogens etc.

Download:
Table 4. List of the top-ranking MACCS fingerprints and their corresponding description.

https://doi.org/10.1371/journal.pone.0287179.t004

Model deployment as the M pro pred web-app and assessment

In order to allow biologists or chemists without a computer science background to apply the prediction model in their research, we deployed it as a public web-app known as the M pro pred and is available at https://share.streamlit.io/nadimfrds/mpropred/Mpropred_app.py . Briefly, a guide on using the M pro pred web-app ( Fig 4 ) is given below:

Step 1 . A text file (.txt) should be created containing the SMILES ID of the desired compounds space separated by a given name/ID ( Fig 4A ). SMILES IDs for any desired small compounds can be acquired from various databases e.g. Drugbank [ 54 ], PubChem [ 55 ] or ChemSpider [ 56 ] whereas custom compounds can be drawn on JSME structure editor [ 57 ] or ChemDraw [ 58 ] so as to create the SMILES notation of unknown compounds.

Step 2 . The above-mentioned URL should be typed into any web browser to view homepage of the web-app ( Fig 4B ).

Step 3 . The created text file should be uploaded to the web-app by clicking on the “Browse files” button ( Fig 4C ).

Step 4 . The process of prediction can be started upon clicking on the “Predict!” button ( Fig 4C ).

Step 5 . The results are showed in a box found below the “Prediction results” heading ( Fig 4D ). Typically, only a few seconds is required for the web-app to process the task. The users can also download the predicted results in the form of a CSV file by clicking the “Download Predictions” button.

Download:
Fig 4.

https://doi.org/10.1371/journal.pone.0287179.g004

Binding affinity of CMNPD compounds with M pro

Out of the various possible binding positions of each compound predicted by Autodock Vina, the best one was picked considering the lowest binding energy. The molecular docking score of top 200 CMNPD compounds with M pro ranged from -4.3 Kcal/mol to -10 Kcal/mol shown in S14 File while the result of top 5 compounds is presented in Table 5 . The amino acid interactions of M pro with the top 5 compounds was also identified. The lowest binding energy was found for the compound CMNPD285. The CMNPD16005 is stabilized by a highest number of seven hydrogen bonds and four hydrophobic bonds while binding with the M pro . The second highest number of hydrogen bonds (6) were formed in the CMNPD12721 complex which was also stabilized by seven hydrophobic bonds. All the 5 compounds formed stable interaction with the active site residues and the catalytic dyad comprised of His41 and Cys145 residues of M pro . The detailed interaction profile of the top 5 compounds including the N3 ligand with M pro is explored in Fig 5 .

Download:
Fig 5.

Two-dimensional (2D) representation of molecular docking analysis between the SARS-CoV-2 M pro and (A) N3, (B) CMNPD285, (C) CMNPD20581, (D) CMNPD12721, (E) CMNPD16005, (F) CMNPD6083.

https://doi.org/10.1371/journal.pone.0287179.g005

thumbnail
Table 5. Predicted pIC50 and binding affinity score of top 5 compounds from CMNPD database against M pro .

https://doi.org/10.1371/journal.pone.0287179.t005

Molecular dynamics (MD) simulation results

The RMSD of backbone atoms of the protein-ligand complexes were analyzed to view their stability. It can be observed from Fig 6(A) that CMNPD16005 complex shows the lowest RMSD than all other complexes. Surprisingly, the RMSD of the 6LU7-N3 complex is a bit higher than the CMNPD16005, which denotes the significant stability of CMNPD6083. The RMSD of CMNPD285 complex reaches to ∼0.4 nm from 60 to 85 ns, but the value increases after 85 ns and reaches to 0.3 nm. While viewing into the RMSD of CMNPD12721 complex, a steady increase of RMSD is observed after 60 ns. The value is decreased eventually indicating that CMNPD12721 might change the conformation of protein. Unlike the control and CMNPD16005 complex, RMSD of the CMNPD6083 complex is the mostly stable. Particularly, the CMNPD20581 complex shows the highest RMSD and higher degree of fluctuations throughout the period.

Download:
Fig 6.

The Root-mean-square deviation (A), Root-mean-square fluctuation (B), Radius of gyration (C), Solvent accessible surface area (D) and hydrogen bond (E) analysis of protein-ligand complexes from the molecular simulation of 100 ns at 300 K.

https://doi.org/10.1371/journal.pone.0287179.g006

As RMSF aids in understanding the region of the receptor that is fluctuated throughout simulation, the flexibility of every residue is determined to gain a better understanding of how ligand binding impacts receptor flexibility. It is understood from Fig 6(B) that the binding of CMNPD12721 makes the M pro most flexible in almost all areas in comparison to all other complexes. Overall, the residues such as Glu47, Met49, Leu50, Tyr154, Ala194, Thr196, Arg222, Asn277 and Phe305 are found flexible in case of both control and the ligand-bonded complexes.

The Rg represents the compactness of protein-ligand complexes. The lesser the fluctuation across the simulation period, the more compact the system is. The Rg of the 6LU7-N3 and CMNPD285 complexes were found to be nearly stable in case of fluctuation consistency throughout the simulation ( Fig 6C ). Besides, the Rg of CMNPD20581 was increased from 40 to 100 ns. The higher change of Rg might be due to protein folding, or distinct structural changes. The remaining complexes showed decreased Rg values indicating greater rigidness throughout the simulation period.

A higher SASA value implies that the protein volume is expanding, and a lower degree of fluctuation is mostly expected over time. SASA can be altered by the binding of any molecule, and this can have a significant impact on the receptor structure. The SASA values of all the complexes including the control were found lowest during the simulation period suggesting that the presence of these molecules potentially could limit protein expansion ( Fig 6D ).

Since intermolecular hydrogen bonds are known to contribute to conformational stability, the number of total hydrogen bonds in the protein-ligand complexes were determined. Most hydrogen bonds is observed for 6LU7-N3 complex, while the lowest number is observed in CMNPD20581 complex over the simulation period ( Fig 6E ). The remaining complexes possessed a significant number of hydrogen bonds (ranging from 3 to 8) compared to the CMNPD20581 complex.

Post simulation binding free energy results

Using the MM/PBSA method, the binding free energies of the last 20 ns with a 100 ps interval was estimated from MD trajectories. The overall binding energies of all the complexes were negative, showing greater binding ( Table 6 ). The CMNPD16005 complex showed the lowest binding free energy (-296.193 +/- 25.797 KJ/mol), indicating the most stable conformation of the compound. The other complexes similarly had a low binding energy, suggesting that they could be utilized as potential compounds. A comparative analysis of the binding free energies of the complexes were illustrated in Fig 7(A) . The results for the amino acid residue contribution in the binding of the compounds are shown in Fig 7(B) . The binding of the compounds to M pro involved the notable contribution of leu27, Met49, Cys145, Leu167, Pro168, and Thr190 amino acid residues.

Download:
Fig 7.

Graphical illustration of the binding free energy (A) and per residue contribution plot of protein-ligand complexes (B).

https://doi.org/10.1371/journal.pone.0287179.g007

thumbnail
Table 6. Binding free energy calculations (MM/PBSA) for six protein-ligand complexes.

https://doi.org/10.1371/journal.pone.0287179.t006

Discussion

The COVID-19 pandemic has caused severe damage on the health and daily lives of billions of people around the world over the last two years. We’ve seen a race against time to vaccinate as many people as possible in recent months; however, discrepancies in vaccine distribution between nations, as well as new developing variants, pose an additional public health risk, making it difficult to achieve full immunization [ 59 ]. Several vaccine formulations are now available, assisting in the development of immunity [ 60 63 ]. Nonetheless, there is an increasing interest in developing new anti-covid medications. The M pro , which is responsible for the cleavage of polypeptides during viral genome transcription, is a fascinating drug target in this scenario.

In the current study, we aimed to develop a classification model that is able to determine active from inactive compounds, and build a web-app for differentiating compounds for M pro with selectivity. We followed the Organisation for Economic Co-operation and Development (OECD) recommendations to develop robust QSAR models for this purpose [ 64 ]. These guidelines comprise of the following major points: (i) the data set should have a defined endpoint, (ii) it should use an explicit learning algorithm, (iii) there should be a defined applicability domain of the built model, (iv) appropriate measurement of robustness and predictivity and (v) interpretation of the important features of the QSAR model. We initially extracted a dataset of 758 compounds from literature review and thresholds of <0.5 and >10 μM for identifying active compounds from the inactives in order to build a classification model. Upon excluding the intermediate sets of compounds, we obtained a curated set of 478 compounds for detailed analysis. It is feasible to determine if a compound will exhibit the biological or pharmacological property needed for an orally active medicine in humans utilizing the Lipinski’s rule-of-five (Ro5) approach. These characteristics are based on the fact that almost all drugs are relatively large lipophilic compounds with MW, ALogP, the number of hydrogen hydrogen bond donors, and the number of hydrogen bond acceptors. We found that most of the compounds meet the Ro5 criteria ( Fig 2B–2E ) and the findings of statistical analysis from Mann–Whitney U test showed a significant difference between the active and inactive compounds ( Table 1 ). Also, the chemical space distribution shows that the external set lies well within the areas of the internal set indicating that the AD is well defined for the developed CSAR model found through PCA analysis results ( Fig 2F ).

Furthermore, we used interpretable molecular fingerprints to develop interpretable QSAR models and evaluated the model performances for all the used 12 fingerprints, following the aforementioned guidelines. Also, it is necessary to identify and address the activity cliffs in the data set using the data set’s modelability score or MODI index before the predictive model can be developed. The data set was found to have a MODI value more than 0.65 for all the 12 fingerprint descriptors, indicating that it is reliable for developing a classification model ( Table 2 ). Then we developed a QSAR model utilizing the random forest (RF) algorithm in order for differentiation of the active and inactive inhibitors for M pro . The best averaged values determined by a 10-fold CV analysis was found for the MACCS fingerprint descriptors (Ac of 89%, Sn of 89%, Sc of 73%, and MCC of 79%) ( Table 3 ). Similarly, Klekota–Roth and 2D atom pairs descriptors performed well, with the second and third highest best values for Ac and MCC, respectively, with Klekota–Roth fingerprints providing Ac and MCC values of 83.64% and 0.664, respectively, and 2D atom pairs fingerprints providing Ac and MCC values of 85.74% and 0.713, respectively ( Table 3 ). We found that the MACCS fingerprints were the best choice for model interpretation based on the Ac values, MCC values, overall external and CV.

Later, an investigation of the important features on selected descriptors was conducted to obtain a better view of the mechanistic details driving M pro . The top-ranked MACCS descriptors include descriptors of various classes such as aromatic compounds, nitrogen-containing compounds, oxygen-containing compounds and halogens as obtained from the RF model ( Table 4 ). M pro has been shown to be inhibited by a range of N-substituted isatin derivatives, with the highest activity being associated with derivatives having carboxamide groups at C-5 of the isatin core (IC50  =  0.045–17.8 μM) [ 65 ]. Several oxygen atoms containing small compounds were also found to inhibit M pro and blocks viral transcription [ 66 , 67 ]. Kowit et al. identified halogenated baicalein as a potent inhibitor of the M pro and they confirmed its inhibitory activity in an in vitro assay [ 68 ]. It was also found that the addition of halogen groups improves binding strength by an order of magnitude [ 69 ]. Hossum et al. generated a pharmacophore model and found three acceptor features and one aromatic ring feature as common in all the active hits including the co-crystallized ligand [ 70 ]. Thus, the top-ranked MACCS descriptors are in significant correlation with the properties of laboratory validated potent M pro inhibitors.

In a normal predictive model life cycle, after models are validated and outcomes are shown in the publications, the model’s utility is essentially over. In this way, the model has accomplished its goal to make predictions and offer useful insights into the underlying key characteristics. We believed that deployment of the predictive model as a public web-app that allows scientists and researchers, particularly in the fields of computational chemistry and biology, to use the predictive insights from the model would significantly improve its value, while also benefiting scientific communities, would greatly extend the model’s life cycle. We made the web-app available at “Streamlit share” platform ( Fig 4 ). In order to test the web-app to determine the correlation between predicted pIC50 and the binding affinity, we applied an integrated molecular modeling approach. All the available 31,492 compounds were submitted to the web-app to predict their pIC50 and it was found that top five compounds with highest binding affinity to M pro had pIC50 values ranging from 6.37 to 7 ( Table 5 ). They formed sufficient hydrogen bond and hydrophobic interactions and all of them formed stable interactions with the catalytic dyad consisting of His41 and Cys145 ( Fig 5 ).

Also, MD simulation results re-confirmed the stability of these five compounds with M pro . The RMSD plot indicates that all the five compounds are stable, with no unexpected rises in RMSD values across the simulated time ( Fig 6A ). The complexes had fewer fluctuations in the allowed range, according to the RMSF study ( Fig 6B ). The radius of gyration (Rg) of the protein-ligand complexes tended to be similar, indicating that every complex had a similar compactness behavior ( Fig 6C ). The SASA values showed that the volume of the complexes did not substantially increase ( Fig 6D ). Throughout the simulation, a significant number of hydrogen bonds were observed in all of the complexes, further elucidating their conformational stability ( Fig 6E ). Furthermore, the binding free energies for all of the complexes were estimated using the MM/PBSA method, and the results suggest that the complexes have a favorable binding energy with M pro ( Table 6 and Fig 7A ). It can be determined from the per-residue interaction energy profile that the leu27, Met49, Cys145, Leu167, Pro168, and Thr190 residues of M pro played an important role in protein-ligand stability and contributed significantly to the binding of the compounds ( Fig 7B ). As a result, these compounds may have the potential to interfere with and block the activity of SARS-CoV-2 M pro .

Thus, the web-app presented in the current study can be utilized for further research on various compounds to get a view into their anti-M pro activity. Also, upon evaluating the toxicity of the five marine derived compounds by various toxicity assays, their inhibition efficacy can be tested through in vitro laboratory validations.

Supporting information

S1 File. The SMILES ID and additional details of the 758 compounds.

https://doi.org/10.1371/journal.pone.0287179.s001

(XLSX)

S2 File. Computed AtomPairs2DCount fingerprint of our dataset.

https://doi.org/10.1371/journal.pone.0287179.s002

(CSV)

S6 File. Computed CDK Graph only fingerprint of our dataset.

https://doi.org/10.1371/journal.pone.0287179.s006

(CSV)

S8 File. Computed KlekotaRothCount fingerprint of our dataset.

https://doi.org/10.1371/journal.pone.0287179.s008

(CSV)

S13 File. Computed SubstructureCount fingerprint of our dataset.

https://doi.org/10.1371/journal.pone.0287179.s013

(CSV)

S14 File. The docking score of all the CMNPD compounds with M pro .

https://doi.org/10.1371/journal.pone.0287179.s014

(XLSX)

References

  1. 1. Lam ME. United by the global COVID-19 pandemic: divided by our values and viral identities. Humanit Soc Sci Commun 2021 81. 2021;8: 1–6.