4.4. Results and discussions
4.4.1. Imbalanced classification performance comparison
First of all, we obtain the classification test results of 10-fold cross-validation when using the original imbalanced samples to train a classification tree. The accuracy, sensitivity and specificity are 0.9554, 0.0111 and 0.9954, respectively. These results show that applying an ordinary machine learning method to the imbalanced cancer data leads to a very low sensitivity.
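The failure mode above can be reproduced with a small, self-contained sketch: on a roughly 95:5 class split, even a degenerate classifier that always predicts the majority class scores high accuracy while its sensitivity collapses to zero, mirroring the pattern reported for the unbalanced tree. The data here is synthetic and the constant predictor is illustrative, not the paper's model.

```python
import random

def imbalance_metrics(y_true, y_pred, positive=1):
    # Confusion counts for a binary problem; the minority class is "positive".
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # minority-class recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return accuracy, sensitivity, specificity

# ~5% positives, and a degenerate "always majority" predictor.
random.seed(0)
y_true = [1 if random.random() < 0.05 else 0 for _ in range(1000)]
y_pred = [0] * 1000

acc, sens, spec = imbalance_metrics(y_true, y_pred)
# accuracy is high (~0.95), specificity is perfect, sensitivity is zero
```

This is why accuracy alone is uninformative here, and why sensitivity and specificity are reported separately throughout this section.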
Table 7 summarizes the classification results of the proposed tree-based imbalanced ensemble classification method, the classification tree trained on undersampled samples and the classification tree trained on oversampled samples. From Table 7, it can be seen that all three methods achieve better overall prediction performance than the method without any strategy for handling data imbalance. Comparing the three methods in Table 7, the tree-based imbalanced ensemble classification method achieves the best performance in terms of all four indicators. The proposed method also shows the best generalization ability because it has the smallest standard deviation on all indicators. The experimental results indicate that the proposed method can effectively address advanced cancer survivability prediction.
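For reference, the two resampling baselines compared in Table 7 can be sketched as follows: random undersampling drops majority-class samples down to the minority size, while random oversampling duplicates minority samples up to the majority size. The function names and toy data are illustrative, not the paper's implementation.

```python
import random

def undersample(majority, minority, rng):
    # Keep a random majority subset the same size as the minority class.
    return rng.sample(majority, len(minority)) + minority

def oversample(majority, minority, rng):
    # Duplicate random minority samples until both classes have equal size.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra

rng = random.Random(42)
majority = [("neg", i) for i in range(950)]
minority = [("pos", i) for i in range(50)]

balanced_under = undersample(majority, minority, rng)  # 100 samples, 50/50
balanced_over = oversample(majority, minority, rng)    # 1900 samples, 950/950
```

Undersampling discards potentially useful majority information, while oversampling risks overfitting to duplicated minority samples; the ensemble method in Table 7 avoids both extremes, which is consistent with its lower standard deviations.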
4.4.2. Effectiveness of the base learner pool generation strategies
The proposed ensemble regression method employs semi-random feature selection and MPEI-based base learner pre-selection, together with bootstrap sampling, to generate the base learner pool. To examine the effectiveness of this approach, taking fold 1 as an example, we compare the performance of the proposed method with different base learner
Table 7. Classification results of three imbalanced classification methods.

Table 8. Results of fold 1 under different base learner generation methods.

Fig. 5. The results of fold 1 under different thresholds of MPEI.
generation strategies. The results in Table 8 show that using the three strategies simultaneously achieves the best prediction results in terms of all three indicators, which indicates that the proposed semi-random feature selection and base learner pre-selection can effectively improve the prediction performance by refining the construction of the base learner pool.
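The three pool-construction strategies evaluated in Table 8 can be sketched together as a single loop: each candidate base learner is trained on a bootstrap sample of the rows, sees only a semi-randomly chosen feature subset, and is admitted to the pool only if a quality index passes a threshold. The quality index below is a stand-in for the paper's MPEI, and the constant-prediction base learner is a toy placeholder.

```python
import random

def build_pool(X, y, n_learners, n_features, threshold, fit, score, rng):
    n, d = len(X), len(X[0])
    pool = []
    for _ in range(n_learners):
        # (1) Bootstrap sampling of the training rows.
        idx = [rng.randrange(n) for _ in range(n)]
        # (2) Semi-random feature subset for this base learner.
        feats = rng.sample(range(d), n_features)
        Xb = [[X[i][j] for j in feats] for i in idx]
        yb = [y[i] for i in idx]
        model = fit(Xb, yb)
        # (3) MPEI-style pre-selection: keep only learners above the threshold.
        if score(model, Xb, yb) >= threshold:
            pool.append((model, feats))
    return pool

rng = random.Random(1)
X = [[rng.random() for _ in range(5)] for _ in range(40)]
y = [sum(row) for row in X]

def fit(Xb, yb):
    return sum(yb) / len(yb)  # toy base learner: constant predictor

def score(model, Xb, yb):
    # Stand-in quality index: 1 / (1 + mean absolute error), higher is better.
    err = sum(abs(t - model) for t in yb) / len(yb)
    return 1.0 / (1.0 + err)

pool = build_pool(X, y, n_learners=20, n_features=3,
                  threshold=0.5, fit=fit, score=score, rng=rng)
```

Bootstrap sampling and feature subsetting promote diversity among base learners, while the pre-selection step screens out weak candidates before ensembling, which matches the ablation pattern reported in Table 8.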
The influence of the MPEI threshold on the prediction performance is also studied. The results of the proposed method under different MPEI thresholds in fold 1 are shown in Fig. 5. The figure indicates that neither a very high nor a very low MPEI threshold ensures a good prediction result: pre-selection with a high threshold cannot exclude bad base learners, while pre-selection with a low threshold may exclude some good base learners and reduce the diversity of the pool. An appropriate MPEI threshold clearly decreases RMSE and MAE and increases the R2 of the predictions.
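The threshold trade-off can be made concrete by treating the pre-selection score as an error index where lower is better: a loose threshold admits every candidate including weak ones, while a strict one shrinks the pool to almost nothing and destroys diversity. The error values below are made up for illustration and are not the paper's MPEI measurements.

```python
def preselect(errors, threshold):
    """Keep base learners whose error index is at or below the threshold."""
    return [e for e in errors if e <= threshold]

# Illustrative error indices for eight candidate base learners.
errors = [0.10, 0.12, 0.15, 0.18, 0.22, 0.35, 0.60, 0.80]

loose = preselect(errors, 0.90)    # keeps all 8, including weak learners
strict = preselect(errors, 0.11)   # keeps only 1: almost no diversity left
balanced = preselect(errors, 0.25) # keeps 5 reasonably strong learners
```

A middle-ground threshold filters out the clearly weak candidates while retaining enough learners for the ensemble to benefit from diversity, consistent with the U-shaped behaviour seen in Fig. 5.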
4.4.3. Comparison of SRRT-SEM with other regression methods
The experimental results of SRRT-SEM, GEFS, the random subspace method, gradient boosting regression tree, random forest, AdaBoost regression tree and regression tree are summarized in Table 9, which reports the 10-fold cross-validation results of the seven methods on three performance indicators together with their average values.
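The three indicators used to compare the regression methods in Table 9 can be computed from scratch as follows; the sample values below are fabricated purely to illustrate the formulas.

```python
import math

def rmse(y_true, y_pred):
    # Root mean squared error: penalizes large deviations more heavily.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    # Mean absolute error: average magnitude of the residuals.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    # Coefficient of determination: 1 - (residual SS / total SS).
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Illustrative targets and predictions (not data from the paper).
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 3.0, 6.5]
```

Lower RMSE and MAE and higher R2 indicate better regression performance, which is the direction used when ranking the seven methods in Table 9.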
Table 9. Performance comparison of regression methods based on the two-stage model.
The ANOVA results for accuracy, sensitivity and specificity of comparative methods.