Figure 11: Across-project prediction: performance comparison in terms of Popt

Figure 12: Across-project prediction: performance comparison in terms of ACC

Figure 13: Scott-Knott test under across-project prediction in terms of Popt

Figure 14: Scott-Knott test under across-project prediction in terms of ACC

model (i.e. the EALR model) and the best two simple unsupervised models obtained from across-project prediction. The first and the second columns are respectively the training project and the testing project. The row “AVG” reports for each prediction model the average Popt and ACC.

Table 6: Across-project prediction: best supervised vs. unsupervised models (RQ3)

Train  Test   EALR            LT              AGE
              Popt    ACC     Popt    ACC     Popt    ACC
BUG    COL    0.701   0.497   0.858   0.641   0.868   0.702
       JDT    0.592   0.281   0.815   0.582   0.769   0.490
       PLA    0.695   0.429   0.770   0.494   0.740   0.427
       MOZ    0.563   0.219   0.660   0.367   0.622   0.240
       POS    0.603   0.349   0.810   0.533   0.762   0.432
COL    BUG    0.633   0.336   0.726   0.435   0.758   0.432
       JDT    0.418   0.113   0.815   0.582   0.769   0.490
       PLA    0.506   0.239   0.770   0.494   0.740   0.427
       MOZ    0.476   0.148   0.660   0.367   0.622   0.240
       POS    0.426   0.167   0.810   0.533   0.762   0.432
JDT    BUG    0.674   0.376   0.726   0.435   0.758   0.432
       COL    0.564   0.378   0.858   0.641   0.868   0.702
       PLA    0.555   0.277   0.770   0.494   0.740   0.427
       MOZ    0.502   0.168   0.660   0.367   0.622   0.240
       POS    0.514   0.299   0.810   0.533   0.762   0.432
PLA    BUG    0.685   0.367   0.726   0.435   0.758   0.432
       COL    0.564   0.363   0.858   0.641   0.868   0.702
       JDT    0.458   0.152   0.815   0.582   0.769   0.490
       MOZ    0.498   0.168   0.660   0.367   0.622   0.240
       POS    0.478   0.230   0.810   0.533   0.762   0.432
MOZ    BUG    0.619   0.35    0.726   0.435   0.758   0.432
       COL    0.482   0.247   0.858   0.641   0.868   0.702
       JDT    0.367   0.057   0.815   0.582   0.769   0.490
       PLA    0.414   0.173   0.770   0.494   0.740   0.427
       POS    0.382   0.135   0.810   0.533   0.762   0.432
POS    BUG    0.629   0.351   0.726   0.435   0.758   0.432
       COL    0.617   0.439   0.858   0.641   0.868   0.702
       JDT    0.459   0.159   0.815   0.582   0.769   0.490
       PLA    0.544   0.261   0.770   0.494   0.740   0.427
       MOZ    0.497   0.159   0.660   0.367   0.622   0.240
AVG           0.537   0.263   0.773   0.509   0.753   0.454

From Table 6, when comparing the simple unsupervised models against the EALR model, we have the following observations.
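The “AVG” row of Table 6 can be turned into relative improvements over EALR with a short calculation. The sketch below is only an illustration, not the authors' analysis script: the dictionary layout and the function names `relative_improvement` and `improvements_over_ealr` are assumptions made for this example, and the numbers are copied from the “AVG” row of Table 6.

```python
# "AVG" row of Table 6 (across-project prediction), copied from the table.
# The dictionary layout is an assumption made for this illustration.
AVG = {
    "EALR": {"Popt": 0.537, "ACC": 0.263},
    "LT":   {"Popt": 0.773, "ACC": 0.509},
    "AGE":  {"Popt": 0.753, "ACC": 0.454},
}


def relative_improvement(baseline: float, value: float) -> float:
    """Return (value - baseline) / baseline as a fraction."""
    return (value - baseline) / baseline


def improvements_over_ealr(avg=AVG):
    """Relative improvement of each unsupervised model over EALR, per metric."""
    base = avg["EALR"]
    return {
        model: {metric: relative_improvement(base[metric], scores[metric])
                for metric in base}
        for model, scores in avg.items()
        if model != "EALR"
    }


if __name__ == "__main__":
    # Print one line per (model, metric) pair as a signed percentage.
    for model, gains in improvements_over_ealr().items():
        for metric, gain in gains.items():
            print(f"{model} {metric}: {gain:+.1%}")
```

With the averages above, both LT and AGE come out above 40% for Popt and above 70% for ACC relative to EALR.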
First, the LT unsupervised model has a larger performance value than the EALR model in all entries for both Popt and ACC. Second, the AGE model has a larger performance value in all entries for Popt. For ACC, the AGE model also has a larger performance value in all entries but one (i.e., the underlined 0.427 for training on BUG and testing on PLA, where EALR reaches 0.429). On average (see the “AVG” row), the two best simple unsupervised models exhibit more than 40% improvement in terms of Popt and more than 70% improvement in terms of ACC.

Overall, these observations suggest that simple unsupervised models could be better than the state-of-the-art supervised models in effort-aware JIT defect prediction under across-project prediction.

Table 7: Overall performance: simple unsupervised models vs. the best supervised model

          CV            TW-CV         AP
          Popt   ACC    Popt   ACC    Popt   ACC
NS        ×      ×      ×      ×      ×      ×
ND        ≈      ×      ×      ×      ≈      ×
NF        √      √      √      ≈      √      √
Entropy   √      √      ≈      ≈      √      √
LT        √      √      √      √      √      √
FIX       ×      ×      ×      ×      ×      ×
NDEV      √      ≈      ≈      ≈      √      ≈
AGE       √      √      √      √      √      √
NUC       √      ≈      √      ×      √      ≈
EXP       ≈      ≈      ×      ×      ≈      ≈
REXP      ≈      ≈      ×      ≈      ≈      ≈
SEXP      ≈      ≈      ≈      ≈      √      ≈

√: (1) significantly better and (2) the magnitude is not trivial
×: (1) significantly worse and (2) the magnitude is not trivial
≈: (1) the difference is not significant or (2) the magnitude is trivial

Table 7 summarizes the main results from Section 5.1, Section 5.2, and Section 5.3. As can be seen, the best two simple unsupervised models are LT and AGE, as they perform significantly better than the best supervised model under all three prediction settings in terms of both Popt and ACC. In addition, the NF, Entropy, NDEV, and SEXP models perform similarly to or better than the best supervised model. Thus, we have strong evidence that many simple unsupervised models perform well compared