Bin Yu, Wenying Qiu, Cheng Chen, Anjun Ma, Jing Jiang, Hongyan Zhou, Qin Ma, SubMito-XGBoost: predicting protein submitochondrial localization by fusing multiple feature information and eXtreme gradient boosting, Bioinformatics , Volume 36, Issue 4, February 2020, Pages 1074–1081, https://doi.org/10.1093/bioinformatics/btz734
Mitochondria are essential organelles in most eukaryotes. They not only play an important role in energy metabolism but also take part in many critical cytopathological processes. Abnormal mitochondria can trigger a series of human diseases, such as Parkinson's disease, multifactor disorder and Type-II diabetes. Protein submitochondrial localization enables the understanding of protein function in studying disease pathogenesis and drug design.
We propose a new method, SubMito-XGBoost, for protein submitochondrial localization prediction. Three steps are included: (i) the g-gap dipeptide composition (g-gap DC), pseudo-amino acid composition (PseAAC), auto-correlation function (ACF) and Bi-gram position-specific scoring matrix (Bi-gram PSSM) are employed to extract protein sequence features, (ii) the Synthetic Minority Oversampling Technique (SMOTE) is used to balance samples, and the ReliefF algorithm is applied for feature selection and (iii) the obtained feature vectors are fed into XGBoost to predict protein submitochondrial locations. SubMito-XGBoost obtained satisfactory prediction results under leave-one-out cross-validation (LOOCV) compared with existing methods. The prediction accuracies of the SubMito-XGBoost method on the two training datasets M317 and M983 were 97.7% and 98.9%, which are 2.8–12.5% and 3.8–9.9% higher than other methods, respectively. The prediction accuracy on the independent test set M495 was 94.8%, which is significantly better than existing studies. The proposed method also achieves satisfactory predictive performance on plant and non-plant protein submitochondrial datasets. SubMito-XGBoost can also play an important role in new drug design for the treatment of related diseases.
The source codes and data are publicly available at https://github.com/QUST-AIBBDRC/SubMito-XGBoost/ .
Supplementary data are available at Bioinformatics online.
Mitochondria are vital organelles in eukaryotic cells and are involved in critical physiological processes such as cell differentiation, cell information transmission, apoptosis and growth. Mislocalization of submitochondrial proteins can disrupt protein interactions and cause serious diseases such as Parkinson's disease ( Burbulla et al. , 2017 ), multifactor disorder ( Shi et al. , 2011 ) and Type-II diabetes ( Gerbitz et al. , 1996 ). Therefore, the study of protein submitochondrial localization is of great significance in providing a theoretical basis for exploring the pathogenesis and diagnosis of human diseases and for developing new drugs at the molecular level. Recent research at the subcellular level has achieved significant results in predicting the location of subcellular structures ( Chou and Shen, 2006 ; Yu et al. , 2018 ). Traditional biological methods such as cell separation, electron microscopy and fluorescence microscopy ( Li et al. , 2015 ; Mei, 2012 ) are not feasible for analyzing large-scale biological data. Computational models, in contrast, can not only analyze large amounts of biological data but can also make preliminary predictions for protein sequences with unknown locations, making them an excellent auxiliary to traditional biological experiments.
Recently, researchers have developed several protein submitochondrial localization prediction models using feature extraction, feature selection and machine learning algorithms, such as SubMito ( Du and Li, 2006 ), GPLoc ( Nanni and Lumini, 2008 ), Predict_SubMito ( Zeng et al. , 2009 ), SubIdent ( Shi et al. , 2011 ), MitoLoc ( Zakeri et al. , 2011 ), SubMito-PSPCP ( Du and Yu, 2013 ) and TetraMito ( Lin et al. , 2013 ). The first submitochondrial protein localization study was by Du and Li (2006) , whose SubMito first fuses amino acid composition and dipeptide composition information and then combines nine physicochemical properties of the protein sequence in a support vector machine (SVM) classifier. Lin et al. (2013) developed TetraMito, which used the binomial distribution to select tetrapeptide components and achieved a prediction accuracy (ACC) of 94% on dataset M317, though without considering physicochemical properties. Nanni and Lumini (2008) proposed the GPLoc algorithm, which fused PseAAC with physicochemical properties from the Amino Acid index (AAIndex) database to extract features of the protein sequences in dataset M317. Shi et al. (2011) constructed SubIdent to locate submitochondrial proteins on dataset M317 using the discrete wavelet transform combined with physicochemical properties such as hydrophilicity, hydrophobicity and polarity for feature extraction. Du and Yu (2013) constructed SubMito-PSPCP to predict protein submitochondrial locations. Qiu et al. (2018) proposed the PseAAC-PsePSSM-WD method for the same task. Mei (2012) proposed a multi-kernel transfer learning model for submitochondrial protein localization, which uses transfer learning to measure the individual contributions of GO molecular function, cellular component and biological process terms.
Since each feature extraction method captures only part of the information in a protein sequence, research has been done to integrate multiple methods for protein subcellular information extraction. Jiao and Du (2017) proposed a method integrating functional domain enrichment scores, position-specific physicochemical properties (PSPCP) and PseAAC, obtaining a prediction ACC of 94%. Li et al. (2015) fused position-specific scoring matrices and gene ontology information to extract feature sequences and achieved ideal prediction ACC.
Feature extraction of protein sequences based on multiple methods often yields vectors of relatively high dimension containing much redundant information, which reduces the predictive performance of the model. Dimension reduction methods filter the feature vectors in high-dimensional data to eliminate unnecessary features. Commonly used methods are principal component analysis (PCA) ( Ahmad et al. , 2016 ), information gain ( Wen et al. , 2016 ), maximum relevance and minimum redundancy (mRMR) ( Khan et al. , 2017 ), maximum relevance maximum distance (MRMD) ( Zou et al. , 2016 ), singular value decomposition (SVD) ( Silvério-Machado et al. , 2015 ) and local linear discriminant analysis ( Yu et al. , 2018 ). Ahmad et al. (2016) fused dipeptide composition (DPC), split amino acid composition (SAAC) and composition-and-translation methods for protein sequence feature extraction, then used PCA to select optimal feature vectors, obtaining satisfactory prediction ACC. It is therefore essential to perform feature selection on the sequence information after feature fusion.
Machine learning methodologies are commonly used in protein localization prediction, such as the hidden Markov model (HMM) ( Fariselli et al. , 2005 ), support vector machine (SVM) ( Li et al. , 2015 ), K-nearest neighbor (KNN) ( Chou and Shen, 2006 ), increment of diversity (ID) ( Chen, 2012 ) and ensemble learning classifiers ( Wang et al. , 2018 ). Ensemble learning has become a hotspot in bioinformatics ( Wang et al. , 2018 ) for predicting submitochondrial protein locations. Zakeri et al. (2011) proposed MitoLoc, applying ordered weighted averaging (OWA) to ten commonly used feature extraction methods to construct a better SVM classifier. However, such a method may have lower ACC for minority classes because of data imbalance. The classification of imbalanced datasets therefore needs further study, and there is still much room for improvement in ensemble-learning-based prediction models.
Here we propose SubMito-XGBoost for protein submitochondrial localization prediction using a three-step pipeline: (i) g-gap dipeptide composition (g-gap DC), PseAAC, the auto-correlation function (ACF) and the Bi-gram position-specific scoring matrix (Bi-gram PSSM) are integrated to extract features of submitochondrial protein sequences; (ii) the feature-extracted dataset is processed using the synthetic minority oversampling technique (SMOTE) ( Chawla et al. , 2002 ) to ensure a balance among submitochondrial protein classes, and the ReliefF algorithm ( Kira and Rendell, 1992 ) is used to select the optimal feature subsets; (iii) the optimal feature vectors are input into the XGBoost classifier to predict the locations of the submitochondrial proteins. To evaluate the performance of the SubMito-XGBoost model, leave-one-out cross-validation (LOOCV) was applied to datasets M317 and M983. The prediction ACC on the two datasets was 97.65% and 98.94%, respectively, and the optimal parameter combination was determined. An additional dataset, M495, was used as independent test data for model prediction. The overall prediction ACC of SubMito-XGBoost was 94.83%, a significant improvement in protein submitochondrial localization prediction over existing tools.
Three submitochondrial protein datasets, M317, M495 and M983, were selected as objective and representative benchmarks to construct the computational model for precise localization analysis. The datasets M317 and M983, constructed by Du et al. ( Du and Li, 2006 ; Du and Yu, 2013 ) with sequence similarities not exceeding 40%, were extracted from the UniProt database (Release 2016_04, https://www.uniprot.org/ ) ( UniProt, 2015 ) and used as training sets for parameter selection. The dataset M495 ( Lin et al. , 2013 ), with similarity not exceeding 25%, was used for independent testing; its proteins have been experimentally annotated with the subcellular location ‘Mitochondrion’. The M495 protein data were extracted from the UniProt database ( http://lin.uestc.edu.cn/-server/subMito/data ). Proteins in each of the three datasets were grouped into three sub-regions: inner membrane, matrix and outer membrane. The numbers of sub-region proteins in the three submitochondrial datasets are shown in Table 1 .
Table 1. Number of proteins in the three submitochondrial datasets

| Datasets | Inner membrane | Matrix | Outer membrane | Total |
|---|---|---|---|---|
| M317 | 131 | 145 | 41 | 317 |
| M495 | 254 | 132 | 109 | 495 |
| M983 | 661 | 177 | 145 | 983 |
The g-gap dipeptide composition (g-gap DC) introduces important intrinsic correlation information of protein sequences in a multi-dimensional space and has been successfully applied in protein structure prediction ( Ding and Li, 2015 ); the selection of the gap parameter g plays an important role in this feature method. The PseAAC ( Chou, 2005 ) combines the sequence information of amino acid sequences with the physicochemical properties of amino acid residues. In this paper, we used the network server constructed by Shen and Chou (2008) , selecting Type 1 in the pseudo-amino acid model and two amino acid attributes, hydrophobicity and hydrophilicity, with the weight parameter set to 0.05. The parameter λ cannot exceed the length of the shortest sequence in the submitochondrial protein dataset; the shortest sequence lengths of datasets M317, M983 and M495 are 54, 102 and 49, respectively, so λ ranges from 0 to 48. For each input protein sequence, the server returns a (20 + λ)-dimensional feature vector. The auto-correlation function (ACF) was proposed by Bu et al. (2010) to extract features from protein sequences via amino acid indices. The ACF algorithm converts the protein sequence into a numerical sequence based on the indices of the 20 amino acids and then defines the autocorrelation function of that numerical sequence. We used the ACF feature extraction with a lag parameter that is a positive integer smaller than the sequence length, transforming each protein sequence into an autocorrelation feature vector of fixed length (100 dimensions in our setting; Table 5 ). Finally, a feature extraction method based on the Bi-gram position-specific scoring matrix (Bi-gram PSSM) ( Khan et al. , 2017 ) was adopted, which overcomes the sparsity problem that arises when computing feature vectors directly from the original protein sequence.
Moreover, it converts protein sequences of different lengths into fixed-length feature vectors while retaining a large amount of evolutionary information. We used the PSI-BLAST program to compare the submitochondrial protein sequences, imported in FASTA format, against the non-redundant protein database nr, with the number of iterations set to 3, the e-value set to 0.001 and all other parameters at their default settings, to obtain the PSSM of each protein sequence. The Bi-gram PSSM method then generates a 400-dimensional feature vector for each protein sequence. The specific theoretical process of the four feature extraction methods is found in Supplementary Si1–Si4.
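As a concrete illustration of the first feature, the g-gap DC of a sequence can be computed in a few lines. The following is a minimal sketch (our own implementation, not the paper's code; the function name and the normalization by the number of counted pairs are our choices):

```python
def g_gap_dc(seq, g=4):
    """400-dim g-gap DC: frequency of residue pairs separated by exactly g residues."""
    aa = "ACDEFGHIKLMNPQRSTVWY"
    idx = {a: i for i, a in enumerate(aa)}
    vec = [0.0] * 400
    n_pairs = len(seq) - g - 1          # number of (i, i+g+1) pairs in the sequence
    for i in range(n_pairs):
        vec[idx[seq[i]] * 20 + idx[seq[i + g + 1]]] += 1
    # normalize to frequencies so sequences of different lengths are comparable
    return [v / n_pairs for v in vec] if n_pairs > 0 else vec
```

With g = 0 this reduces to the ordinary dipeptide composition, which is why the gap parameter is the main tuning knob of this feature.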
SMOTE, proposed by Chawla et al. (2002) , is a commonly used oversampling method. It synthesizes artificial samples from the minority-class samples, controlling their generation and distribution to balance the dataset classes and thus effectively alleviating classification over-fitting. The specific theoretical process of the SMOTE method is located in Supplementary Si5.
The submitochondrial datasets M317, M983 and M495 used in this paper all suffer from sample imbalance. The matrix proteins in dataset M317 number about three times the outer membrane proteins, the inner membrane proteins in dataset M495 are more than twice as numerous as the outer membrane sequences, and the inner membrane proteins in dataset M983 number about five times the outer membrane proteins. Training a model on such unbalanced data often produces predictions biased toward the majority class. To deal with this imbalance, this paper applies the SMOTE algorithm to the minority outer membrane protein samples, creating new synthetic samples along the line segments connecting each minority sample to its nearest neighbors, so that the three classes of protein sequences become approximately equal in size and the dataset reaches equilibrium.
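The oversampling step above can be sketched in NumPy. This is a minimal SMOTE illustration (our own code, not the paper's; in practice a library such as imbalanced-learn would typically be used):

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic samples on the segments between each chosen
    minority sample and one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        dist = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(dist)[1:k + 1]   # skip the sample itself
        j = rng.choice(neighbours)
        lam = rng.random()                       # random position on the segment
        synth.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synth)
```

Because each synthetic point is a convex combination of two real minority samples, the new samples stay inside the minority class's local neighbourhood rather than being duplicated or drawn at random.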
Extreme gradient boosting (XGBoost) ( Chen and Guestrin, 2016 ) is an ensemble learning algorithm based on gradient boosting that researchers have applied to bioactive molecule prediction ( Babajide and Saeed, 2016 ), miRNA-disease association prediction ( Chen et al. , 2018 ) and post-translational modification site prediction ( Zhao et al. , 2018 ). XGBoost is an optimization model that combines a linear model with a boosting tree model. It uses not only the first derivative but also the second derivative of the loss function, expanding the objective to second order. This allows the algorithm to converge to the global optimum faster and improves the efficiency of finding the model's optimal solution.
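Concretely, at boosting round t XGBoost minimizes a second-order approximation of the regularized objective (a restatement of Chen and Guestrin, 2016):

```latex
\mathcal{L}^{(t)} \simeq \sum_{i=1}^{n}\left[ g_i\, f_t(\mathbf{x}_i) + \tfrac{1}{2}\, h_i\, f_t^{2}(\mathbf{x}_i) \right] + \Omega(f_t),
\qquad
g_i = \partial_{\hat{y}_i^{(t-1)}}\, l\!\left(y_i, \hat{y}_i^{(t-1)}\right),
\qquad
h_i = \partial^{2}_{\hat{y}_i^{(t-1)}}\, l\!\left(y_i, \hat{y}_i^{(t-1)}\right)
```

where $f_t$ is the tree added at round $t$, $g_i$ and $h_i$ are the first and second derivatives of the loss $l$ with respect to the previous prediction, and $\Omega(f_t)$ penalizes tree complexity. It is the use of $h_i$ alongside $g_i$ that gives the faster convergence described above.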
When applying the XGBoost ensemble learning algorithm to protein submitochondrial prediction, an essential step is to optimize the algorithm's general parameters, booster parameters and learning parameters.
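The tuning loop can be sketched as an exhaustive grid search. The grid values and the `evaluate` callback below are illustrative placeholders (in practice `evaluate` would return the LOOCV ACC of XGBoost trained with the given parameters):

```python
from itertools import product

def grid_search(evaluate, grid):
    """Try every parameter combination and keep the one with the highest ACC."""
    best_acc, best_params = -1.0, None
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        acc = evaluate(params)           # e.g. LOOCV accuracy under these settings
        if acc > best_acc:
            best_acc, best_params = acc, params
    return best_acc, best_params

# Illustrative grid over a few of the XGBoost knobs discussed above
grid = {"max_depth": [4, 6, 8], "eta": [0.05, 0.1, 0.3], "gamma": [0, 1]}
```

Grid search is exhaustive, so its cost is the product of the grid sizes times one LOOCV run per combination; this is affordable here because XGBoost itself trains quickly (cf. the running times in Table 6).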
At present, test methods for evaluating model performance based on statistical theory include the independent test, the k-fold cross-validation test and LOOCV. Among them, LOOCV reflects the ACC and generalization ability of an algorithm objectively and rigorously, so it is widely used in protein submitochondrial localization prediction ( Du and Yu, 2013 ; Lin et al. , 2013 ; Zakeri et al. , 2011 ).
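LOOCV itself is simple to state in code; `fit_predict` below is a hypothetical callback that trains on the n − 1 retained samples and predicts the single held-out one (a sketch, not the paper's implementation):

```python
import numpy as np

def loocv_accuracy(X, y, fit_predict):
    """Leave-one-out CV: each sample is held out once and predicted by a model
    trained on all remaining samples."""
    n = len(y)
    hits = 0
    for i in range(n):
        mask = np.arange(n) != i
        hits += fit_predict(X[mask], y[mask], X[i]) == y[i]
    return hits / n
```

With n samples this trains n models, which is why LOOCV is rigorous but costly for slow learners (cf. the LibD3C running times in Table 6).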
The F1 score is a statistic used to evaluate the generalization performance of the model; it is the harmonic mean of precision and recall and evaluates performance well on unbalanced datasets. The higher the F1 value, the better the classifier. The MCC measures the specificity and sensitivity of the model at the same time; when MCC = 1, the prediction is entirely correct. In this paper, we evaluate model performance using the sensitivity, specificity, overall ACC and Matthew's correlation coefficient of Supplementary Si6, which are not only easy for researchers to interpret but also play an essential role in the development of sequence analysis research.
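For a single class treated as positive, these measures follow directly from the confusion counts. A minimal sketch using the standard definitions (see also Supplementary Si6):

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Sensitivity (Sn), specificity (Sp), overall ACC, MCC and F1
    computed from a 2x2 confusion table."""
    sn = tp / (tp + fn)                       # recall of the positive class
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + fp + tn + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    precision = tp / (tp + fp)
    f1 = 2 * precision * sn / (precision + sn)  # harmonic mean of precision and recall
    return sn, sp, acc, mcc, f1
```

For the three-class submitochondrial problem, each location is scored one-versus-rest with these formulas.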
For convenience, the protein submitochondrial localization prediction method proposed in this paper is called SubMito-XGBoost; its workflow is shown in Figure 1 . The detailed steps are as follows:
Step 1 : Data preparation and feature extraction. The protein sequences of the submitochondrial datasets M317, M983 and M495 were obtained from the UniProt database. Then, g-gap DC, PseAAC, ACF and Bi-gram PSSM were used to extract the initial feature set. We acquired 965-dimensional feature vectors (Table 5 ) by fusing the four feature extraction methods.
Step 2 : Sampling and dimensional reduction. The SMOTE algorithm was employed to balance the dataset. Thus, the number of protein sequences of the inner membrane, the outer membrane and the matrix are balanced. To eliminate redundant information, we selected the optimal feature subset as the input of the next step via the ReliefF algorithm (Supplementary Si7).
Step 3 : Model construction and model evaluation. The balanced samples were integrated into XGBoost classifier, and the SubMito-XGBoost model was built up. Then we used the LOOCV on M317 and M983. To further evaluate the prediction performance, M495 was employed as an independent test set to verify the validity of the model.
All the following results were obtained on an Intel(R) Core(TM) i5-3210M CPU @ 2.50 GHz with 12.0 GB of memory, implemented in MATLAB 2014a and RStudio.
The submitochondrial protein sequences were processed by integrating the g-gap DC, PseAAC, ACF and Bi-gram PSSM methods. In g-gap DC, the choice of the parameter g, which represents the gap between any two correlated amino acid residues, has a significant impact on model prediction performance. To find the best-performing settings, g in g-gap DC was varied from 0 to 9, λ for PseAAC was chosen from 0 to 45, and the lag parameter of ACF was chosen from 1 to 45. The extracted feature vectors were then classified by the XGBoost classifier, the prediction results were evaluated by LOOCV, and the optimal values were determined by the overall ACC on datasets M317 and M983. The prediction ACC of the three feature extraction methods on datasets M317 and M983 is shown in Tables 2–4 .
Table 2. Prediction ACC (%) of g-gap DC on datasets M317 and M983 under different g values

| Data | g = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| M317 | 73.19 | 71.92 | 71.29 | 72.24 | 75.39 | 71.29 | 69.09 | 69.72 | 70.98 | 70.66 |
| M983 | 68.06 | 67.55 | 67.96 | 68.67 | 68.97 | 68.97 | 67.24 | 67.34 | 66.84 | 67.75 |
Table 3. Prediction ACC (%) of PseAAC on datasets M317 and M983 under different λ values

| Data | λ = 0 | 5 | 10 | 15 | 20 | 25 | 30 | 35 | 40 | 45 |
|---|---|---|---|---|---|---|---|---|---|---|
| M317 | 76.97 | 80.13 | 82.02 | 80.76 | 82.02 | 81.70 | 82.02 | 84.54 | 83.28 | 82.97 |
| M983 | 71.41 | 73.75 | 73.65 | 73.04 | 73.04 | 72.23 | 72.84 | 72.23 | 74.36 | 75.18 |
Table 4. Prediction ACC (%) of ACF on datasets M317 and M983 under different lag values

| Data | lag = 1 | 5 | 10 | 15 | 20 | 25 | 30 | 35 | 40 | 45 |
|---|---|---|---|---|---|---|---|---|---|---|
| M317 | 75.39 | 79.50 | 80.13 | 79.50 | 77.29 | 78.86 | 78.86 | 79.18 | 79.81 | 79.18 |
| M983 | 68.57 | 71.01 | 71.41 | 70.30 | 70.50 | 70.50 | 71.11 | 71.01 | 69.18 | 71.52 |
It can be seen from Table 2 that when using g-gap DC to extract submitochondrial protein features, the highest ACC on dataset M317, 75.39%, is obtained at g = 4, and the highest ACC on dataset M983, 68.97%, is obtained at g = 4 and 5. To compare the performance of the model under different g values more intuitively, Supplementary Figure S1 shows the trend of the ACC obtained on datasets M317 and M983. Therefore, 4-gap DC is chosen as the primary setting of the g-gap DC feature extraction method. To unify the model parameters across datasets M317 and M983, the optimal λ value for PseAAC was 45 ( Supplementary Figure S2 ), and the corresponding dimension is 20 + 45 = 65. For ACF, we selected a lag of 10; the trend of the ACC obtained on datasets M317 and M983 is shown in Supplementary Figure S3 , and each protein generates a 100-dimensional feature vector. The specific discussion results are given in Supplementary Sr1 and Sr2.
We also used Bi-gram PSSM to extract the evolutionary information of the submitochondrial protein sequences. The sensitivity, specificity, MCC and ACC on the submitochondrial datasets M317 and M983 are shown in Supplementary Table S1 . The overall prediction ACC reaches 86.12% and 83.42% on M317 and M983, respectively, indicating that Bi-gram PSSM can effectively extract protein sequence information.
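The Bi-gram PSSM transform itself is compact: for an L × 20 PSSM P, the feature B[m, n] = Σ_i P[i, m]·P[i+1, n] accumulates substitution scores of consecutive positions. A minimal sketch of this standard transform (our own code, not the paper's):

```python
import numpy as np

def bigram_pssm(P):
    """400-dim bi-gram PSSM feature: B[m, n] = sum_i P[i, m] * P[i+1, n]
    for an L x 20 position-specific scoring matrix P."""
    P = np.asarray(P, dtype=float)
    # (20, L-1) @ (L-1, 20) -> (20, 20), flattened row-major to 400 dims
    return (P[:-1].T @ P[1:]).ravel()
```

Because the sum runs over positions, the output length is always 400 regardless of L, which is exactly the fixed-length property exploited above.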
To extract the feature information of protein sequences more comprehensively and exploit the complementarity between different kinds of feature information, this paper adopts feature fusion. On the M317 and M983 datasets, the four feature extraction methods g-gap DC, PseAAC, ACF and Bi-gram PSSM were combined, and XGBoost was used for classification. For comparison, Table 5 lists the predicted results of the four single feature extraction methods alongside the results obtained after fusion. It can be seen from Table 5 that, for dataset M317, the fusion of the four feature extraction methods gives an ACC of 87.38%, higher than g-gap DC, PseAAC, ACF or Bi-gram PSSM alone, demonstrating that their fusion can obtain more useful biological information from the protein sequence. Supplementary Sr3 lists the overall prediction ACC of the four single feature extraction methods for dataset M983. Fusing the four extraction methods thus obtains more useful biological information and is effective for improving the performance of the protein submitochondrial localization model.
Table 5. Prediction results of datasets M317 and M983 by four feature extraction methods

| Data | Methods | Inner membrane | Matrix | Outer membrane | ACC | Feature numbers |
|---|---|---|---|---|---|---|
| M317 | g-gap DC | 80.15 | 89.66 | 9.76 | 75.39 | 400 |
| | PseAAC | 84.73 | 95.17 | 34.15 | 82.97 | 65 |
| | ACF | 82.44 | 92.41 | 29.27 | 80.13 | 100 |
| | Bi-gram PSSM | 86.26 | 94.48 | 56.10 | 86.12 | 400 |
| | ALL | 90.84 | 93.79 | 53.66 | 87.38 | 965 |
| M983 | g-gap DC | 96.07 | 14.69 | 11.72 | 68.97 | 400 |
| | PseAAC | 94.86 | 32.77 | 37.24 | 75.18 | 65 |
| | ACF | 90.32 | 37.85 | 26.21 | 71.41 | 100 |
| | Bi-gram PSSM | 93.80 | 65.54 | 57.93 | 83.42 | 400 |
| | ALL | 93.95 | 55.37 | 44.14 | 79.65 | 965 |
Note : ALL stands for the fusion of the four feature extraction methods g-gap DC, PseAAC, ACF and Bi-gram PSSM.
The submitochondrial protein datasets M317 and M983 used in this paper are highly unbalanced; the ratios of outer membrane to inner membrane proteins in datasets M983 and M317 are about 1:5 and 1:4, respectively. To balance the samples, this paper first applied SMOTE re-sampling to the vectors obtained after feature extraction, then input them into XGBoost for classification and tested the results with LOOCV. For datasets M317 and M983, the model prediction results after training with the SMOTE algorithm are shown in Supplementary Table S2 . Figure 2 displays the predicted ACC of the three classes of proteins in datasets M317 and M983 before and after using the SMOTE algorithm. The prediction ACC improved significantly in both datasets and all subgroups; the outer membrane proteins showed the largest change, and the changes for the matrix proteins are also significant. Therefore, the SMOTE algorithm was used in the following analyses to balance the samples.
Information redundancy and unrelated features reduce the prediction performance of the model. Six feature selection methods, i.e. kernel principal component analysis (KPCA) ( Xu et al. , 2010 ), multi-dimensional scaling (MDS) ( Gorman and Sawatari, 1985 ), mutual information (MI) ( Li et al. , 2017 ), maximum relevance and minimum redundancy (mRMR) ( Khan et al. , 2017 ), maximum relevance maximum distance (MRMD) ( Zou et al. , 2016 ) and the ReliefF algorithm ( Kira and Rendell, 1992 ), were compared in terms of prediction ACC at different feature dimensions for protein submitochondrial localization ( Fig. 3 ). The ACC of the datasets M317 and M983 under the different dimensionality reduction algorithms and dimensions is shown in Supplementary Tables S3 and S4 .
For the M317 dataset, the highest ACC of 97.65% was obtained by using the ReliefF algorithm to reduce the fused features to 350 dimensions, 10.27% higher than the prediction results of the four feature extraction methods without dimensionality reduction. For dataset M983, the highest ACC of 98.94% was likewise obtained with the ReliefF algorithm at 350 dimensions, 19.29% higher than without dimensionality reduction. Comparing the ACC of the different dimensionality reduction algorithms across dimensions shows that the ReliefF algorithm yields the optimal feature dimension for localization prediction, possibly because ReliefF weights the features, effectively eliminating redundant variables, while also considering the correlation between samples and categories and thereby selecting the optimal feature subset. Considering this, this paper uses the ReliefF algorithm for dimensionality reduction, with an optimal dimension of 350.
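The intuition behind ReliefF's weighting can be seen in a minimal Relief-style sketch (our own code using one nearest hit and one nearest miss per instance; the full ReliefF used in this paper averages over k neighbours per class):

```python
import numpy as np

def relief_weights(X, y, seed=0):
    """Relief-style feature weights: features that differ between nearest
    misses and agree between nearest hits receive larger weights."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    w = np.zeros(d)
    for i in rng.permutation(n):
        dist = np.abs(X - X[i]).sum(axis=1)   # L1 distance to every sample
        dist[i] = np.inf                      # exclude the sample itself
        same = (y == y[i])
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n
```

Ranking features by these weights and keeping the top 350 reproduces the selection step described above.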
The XGBoost algorithm improves on the gradient boosting tree ( Sheridan et al. , 2016 ) and can further improve the classification ACC of the model. In this paper, we tuned the hyperparameter combination through grid search, a good choice for parameter optimization of the XGBoost algorithm on such global optimization problems. We selected variables from the influential parameters nrounds, max_depth, min_child_weight, colsample_bytree, gamma, eta and alpha, traversed all parameter combinations, and determined the prediction results by LOOCV. The variable selection takes the ACC as the model fitness function, maximizes the ACC value and obtains the optimal parameter combination for each dataset. The optimal parameter combinations and the results for datasets M317 and M983 are shown in Supplementary Tables S5 and S6 . With the optimal parameters, XGBoost achieves ACC of 97.65% and 98.94% on datasets M317 and M983, respectively, together with high F1 values of 96.62% and 97.27%. This indicates that the XGBoost algorithm is well suited to localization prediction and can significantly improve the prediction of the various protein submitochondrial locations.
On the same dataset, the efficiency and ACC of a prediction method depend closely on the choice of algorithm, and different classifiers yield quite different prediction results. This paper compares XGBoost with other algorithms, including decision tree (DT) ( Hostettler et al. , 2018 ), random forest (RF) ( Taherzadeh et al. , 2018 ), support vector machine (SVM) ( Li et al. , 2015 ), naïve Bayes (NB) ( He et al. , 2017 ), LibD3C ( Lin et al. , 2014 ) and K-nearest neighbor (KNN) ( Shen et al. , 2006 ). KNN uses the Euclidean distance with 40 nearest neighbors; SVM uses a radial basis kernel function; and LibD3C, NB, RF and DT use default parameters. The prediction results on datasets M317 and M983 under the different classifiers are shown in Table 6 .
Table 6. Prediction results of the protein submitochondrial datasets M317 and M983 under different algorithms (LOOCV)

| Dataset | Classifier | Sn (%) | Sp (%) | MCC | ACC (%) | Running time (min) |
|---|---|---|---|---|---|---|
| M317 | KNN | 79.20 | 91.27 | 0.6844 | 79.53 | 10'16 |
| M317 | NB | 70.75 | 83.64 | 0.5637 | 70.87 | 18'53 |
| M317 | SVM | 83.89 | 91.27 | 0.7517 | 83.66 | 140'12 |
| M317 | DT | 91.49 | 95.55 | 0.8705 | 91.54 | 28'37 |
| M317 | LibD3C | 93.45 | 93.28 | 0.9020 | 93.50 | 1003'17 |
| M317 | RF | 94.65 | 97.21 | 0.9205 | 94.69 | 185'03 |
| M317 | XGBoost | 96.54 | 98.27 | 0.9492 | 97.65 | 19'42 |
| M983 | KNN | 77.97 | 80.11 | 0.6789 | 78.49 | 30'56 |
| M983 | NB | 73.31 | 77.53 | 0.6605 | 72.11 | 58'11 |
| M983 | SVM | 82.76 | 89.89 | 0.7989 | 87.26 | 210'29 |
| M983 | DT | 93.00 | 96.38 | 0.8986 | 94.30 | 91'07 |
| M983 | LibD3C | 94.12 | 95.24 | 0.9433 | 96.81 | 2811'48 |
| M983 | RF | 95.15 | 96.28 | 0.9506 | 96.68 | 385'52 |
| M983 | XGBoost | 98.54 | 97.27 | 0.9492 | 98.94 | 78'08 |
It can be seen from Table 6 that when XGBoost predicts dataset M317, the ACC is 97.65%, which is 2.96–26.78% higher than the KNN, NB, SVM, DT, LibD3C and RF classifiers. In terms of running time, XGBoost is significantly faster than the DT, RF and LibD3C classifiers. On dataset M983, its ACC of 98.94% is 2.13–26.83% higher than the other classifiers. The specific prediction results for datasets M317 and M983 are shown in Supplementary Tables S7 and S8 . To compare the predictive performance of the seven classifiers more intuitively, the variation of overall prediction ACC on datasets M317 and M983 under the different classification algorithms is shown in Supplementary Figure S4 .
In summary, for protein submitochondrial localization prediction, the XGBoost classifier achieved the highest overall ACC on both datasets, significantly higher than the other six classifiers. This is because XGBoost introduces regularization terms, keeping the model simple and giving it good generalization performance.
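A comparison of this kind can be sketched with scikit-learn under LOOCV. The data here are a small synthetic stand-in, gradient boosting stands in for XGBoost, LibD3C is omitted (it is a Java ensemble tool without a scikit-learn equivalent), and a small neighbor count replaces the paper's 40 to suit the tiny example:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
y = (X[:, 0] > 0).astype(int)

# settings follow the text: Euclidean KNN, RBF-kernel SVM, defaults elsewhere
models = {
    "KNN": KNeighborsClassifier(n_neighbors=5, metric="euclidean"),
    "NB": GaussianNB(),
    "SVM": SVC(kernel="rbf"),
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(random_state=0),
    "Boosting": GradientBoostingClassifier(random_state=0),
}
# mean LOOCV accuracy per classifier, as reported in Table 6
acc = {name: cross_val_score(m, X, y, cv=LeaveOneOut()).mean()
       for name, m in models.items()}
```

Running every classifier on identical folds is what makes the ACC and running-time columns of Table 6 directly comparable.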
In this paper, the datasets M317 and M983 were processed with the Bi-gram PSSM, PseAAC, g-gap DC and ACF methods to generate a 965-dimensional feature vector; SMOTE was then applied to balance the sample classes, and ReliefF was used to select the optimal feature vector. Finally, the features were fed into XGBoost for protein submitochondrial localization, and the prediction result was tested by the LOOCV method. To evaluate the generalization performance of the model, the independent dataset M495 was used to assess the predictive ability of the SubMito-XGBoost model with the optimal parameter combination and the optimal dimension of 350. The specific protein submitochondrial prediction results are shown in Table 7 .
Table 7. Prediction results of the SubMito-XGBoost model on datasets M317, M983 and M495

| Dataset | Structure class | Sn (%) | Sp (%) | MCC | ACC (%) |
|---|---|---|---|---|---|
| M317 | Inner membrane | 95.36 | 99.14 | 0.9524 | 97.65 |
| M317 | Matrix | 97.93 | 97.73 | 0.9539 | |
| M317 | Outer membrane | 96.34 | 97.94 | 0.9414 | |
| M983 | Inner membrane | 99.06 | 96.16 | 0.9559 | 98.94 |
| M983 | Matrix | 97.57 | 99.52 | 0.9595 | |
| M983 | Outer membrane | 98.99 | 99.50 | 0.9604 | |
| M495 | Inner membrane | 96.67 | 91.22 | 0.9026 | 94.83 |
| M495 | Matrix | 93.80 | 98.63 | 0.9318 | |
| M495 | Outer membrane | 94.54 | 97.02 | 0.9191 | |
According to the results in Table 7 , the ACCs of datasets M317, M983 and M495 are 97.65%, 98.94% and 94.83%, respectively. The experimental results show that the SubMito-XGBoost model can accurately predict the locations of the inner membrane, matrix and outer membrane proteins in the two training datasets. On the independent test set M495, the sensitivities for the inner membrane, matrix and outer membrane proteins were 96.67%, 93.80% and 94.54%, respectively. These results indicate that the SubMito-XGBoost method proposed in this paper performs excellently and can effectively predict protein submitochondrial locations.
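The per-class Sn, Sp and MCC reported in Table 7 are one-vs-rest statistics over the confusion counts; a small sketch with hypothetical labels shows how they are computed:

```python
import numpy as np

def per_class_metrics(y_true, y_pred, cls):
    """One-vs-rest Sn, Sp and MCC for a single submitochondrial class."""
    t = np.asarray(y_true) == cls
    p = np.asarray(y_pred) == cls
    tp = np.sum(t & p)
    tn = np.sum(~t & ~p)
    fp = np.sum(~t & p)
    fn = np.sum(t & ~p)
    sn = tp / (tp + fn)                 # sensitivity (recall for this class)
    sp = tn / (tn + fp)                 # specificity
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, mcc

# hypothetical predictions for the three classes used in the paper
y_true = ["inner", "inner", "matrix", "matrix", "outer", "outer"]
y_pred = ["inner", "matrix", "matrix", "matrix", "outer", "outer"]
acc = np.mean(np.array(y_true) == np.array(y_pred))   # overall ACC
sn, sp, mcc = per_class_metrics(y_true, y_pred, "inner")
```

Overall ACC is a single number per dataset, which is why the ACC column of Table 7 has one value per dataset while Sn, Sp and MCC are reported per structure class.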
In order to objectively evaluate the predictions of the SubMito-XGBoost method, Figure 4 compares the results of this method with other prediction methods on datasets M317, M983 and M495. The specific prediction results of different methods can be seen in Supplementary Table S9 .
It can be seen from Figure 4 that for dataset M317, the ACC of SubMito-XGBoost is 97.7%, higher than that of the other prediction methods: it is 12.5% higher than the SubMito proposed by Jiao and Du ( Jiao and Du, 2017 ) and 8.7% higher than the GP-Loc of Nanni and Lumini ( Nanni and Lumini, 2008 ). The ACC of SubMito is only 51.2%, far below that of SubMito-XGBoost. These results show that SubMito-XGBoost can reduce the prediction error caused by sample imbalance; its ACC is significantly higher than that of other existing models, and better prediction results are obtained on dataset M317. On dataset M983, the ACC is 98.9%, which is 9.9% higher than the SubMito-PSPCP proposed by Du and Yu ( Du and Yu, 2013 ). After the model was processed by the SMOTE method, the sensitivity increased by 17.9% and 21.6%. Compared with the method proposed by Ahmad et al. ( Ahmad et al. , 2016 ), the prediction of inner membrane proteins improved by 12.0%.
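The SMOTE balancing credited with the sensitivity gains above can be illustrated with a minimal NumPy sketch (hypothetical minority-class data; the imbalanced-learn library provides a production implementation):

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=0):
    """Minimal SMOTE sketch: synthesize n_new minority-class samples by
    interpolating between a random minority sample and one of its k
    nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        d[i] = np.inf                      # exclude the sample itself
        nbr = np.argsort(d)[:k]            # k nearest minority neighbours
        j = rng.choice(nbr)
        lam = rng.random()                 # interpolation factor in [0, 1)
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(out)

# hypothetical minority class of 20 samples, topped up by 30 synthetic ones
minority = np.random.default_rng(1).normal(size=(20, 4))
synthetic = smote(minority, n_new=30)
```

Because the synthetic points lie on segments between real minority samples, the classifier sees a balanced training set without exact duplicates, which is what lifts the minority-class sensitivity.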
For the independent test set M495, SubMito-XGBoost yielded an ACC of 94.8%, higher than the TetraMito constructed by Lin et al. (2013) . In summary, the SubMito-XGBoost method achieves good prediction results on all three submitochondrial datasets, which indicates that the prediction method constructed in this paper can become a dominant protein submitochondrial localization prediction tool.
Because mitochondria, chloroplasts and bacteria share many similarities in DNA molecular structure and protein synthesis properties, chloroplast proteins are likely to use targeting mechanisms with features similar to those of mitochondrial proteins. Therefore, when predicting protein submitochondrial localization, it is important to study chloroplast-bearing organisms and organisms without chloroplasts separately. To test the generalization performance of the SubMito-XGBoost model, the original protein submitochondrial dataset was divided by species attribute into a plant and a non-plant protein submitochondrial dataset for localization prediction. There are 172 plant protein submitochondrial samples and 1623 non-plant samples; the specific data structure is shown in Supplementary Table S10 . On the plant dataset, the accuracy of the SubMito-XGBoost method was 96.09%, and the accuracies for the inner membrane, matrix and outer membrane proteins were 98.89%, 94.29% and 94.35%, respectively. On the non-plant dataset, the accuracy of SubMito-XGBoost was 90.94%, with 97.21%, 83.57% and 87.88% for the inner membrane, matrix and outer membrane proteins, respectively.
This paper compared the XGBoost algorithm with other machine learning algorithms, namely decision tree, random forest, support vector machine, naïve Bayes, LibD3C and K-nearest neighbor. The overall and per-class prediction accuracies of each classifier were obtained; all predicted results for plant and non-plant protein submitochondria under the different algorithms are shown in Supplementary Tables S11 and S12 . On both the plant and non-plant datasets, the SubMito-XGBoost method showed leading performance: the overall prediction accuracy for plant protein submitochondria was 1.06–12.24% higher than that of the KNN, NB, SVM, DT, LibD3C and RF algorithms, and for non-plant protein submitochondria it was 1.31–18.92% higher. In summary, the method of this paper can effectively predict both plant and non-plant protein submitochondrial locations.
In the big data era, the sequence data accumulated in protein databases have grown exponentially, and accurately predicting protein submitochondrial locations with machine learning methods has become a challenging task in bioinformatics and proteomics research. This paper proposes a new protein submitochondrial localization prediction method, SubMito-XGBoost, which fuses the composition information, physicochemical properties and evolutionary information of protein sequences. The SMOTE algorithm helps avoid model overfitting and improves generalization ability, while the ReliefF algorithm accounts for the correlation between classes and features and weights the protein sequence features before they enter the model, making SubMito-XGBoost effective on multi-class high-dimensional data. When locating multiple classes of protein submitochondria, XGBoost improves ACC by combining many weak classifiers into a strong classifier, and its multi-threading support improves operational efficiency. The experimental results show that SubMito-XGBoost not only improves the prediction ACC of protein submitochondrial localization but also accurately predicts the locations of the inner membrane, matrix and outer membrane proteins. We believe SubMito-XGBoost can be applied to protein submitochondrial localization and other substructure localization tasks. Although SubMito-XGBoost improves the ACC of protein submitochondrial prediction to a certain extent, there remains much room for improvement in both prediction ACC and algorithm efficiency. In addition, the proposed method is only applicable to the prediction of protein submitochondrial localization and cannot accurately identify non-mitochondrial proteins.
In the future, we will try more feature extraction and feature selection methods to improve the performance of SubMito-XGBoost, build large-scale benchmark datasets, and apply deep learning methods to predict protein submitochondrial localization.
We thank anonymous reviewers for valuable suggestions and comments. We also thank Ms Marlena Merling for her help in polishing the language of this manuscript.
This work was supported by the National Nature Science Foundation of China (No. 61863010), the Key Research and Development Program of Shandong Province of China (2019GGX101001), the Natural Science Foundation of Shandong Province of China (Nos. ZR2017MA014 and ZR2018MC007), the Project of Shandong Province Higher Educational Science and Technology Program (No. J17KA159) and the Scientific Research Fund of Hunan Provincial Key Laboratory of Mathematical Modelling and Analysis in Engineering (No. 2018MMAEZD10). This work used the Extreme Science and Engineering Discovery Environment, which is supported by the National Science Foundation (No. ACI-1548562).
Conflict of Interest : none declared.