344 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 41, NO. 4, APRIL 2015
process metrics in effort-aware post-release fault-proneness prediction (RQ4).

5.1 RQ1: Are Slice-Based Cohesion Metrics Redundant with Respect to the Most Commonly Used Code and Process Metrics?

We use the results of principal component analysis (PCA) to answer RQ1. Table 8 summarizes the rotated components from PCA for each data set (detailed results are shown in Tables 13, 14, 15, 16, and 17 in Appendix B, available in the online supplemental material). In Table 8, the first and second columns give the system name and the PC name, respectively. The third to fifth columns report the eigenvalue, the percentage of variance explained by each rotated component, and the cumulative percentage of variance explained. The sixth column lists the metrics clustered into each PC, and the last column marks the PCs that consist only of slice-based cohesion metrics. Slice-based cohesion metrics are shown in bold face.

From Table 8, we can see that the metrics (the most commonly used code and process metrics plus the slice-based cohesion metrics) are clustered into ten to thirteen distinct orthogonal components, which together explain about 91 to 95 percent of the variance in the data.
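The Table 8 columns described above (eigenvalue, percentage of variance per component, cumulative percentage, and the metrics clustered into each PC) can be illustrated with a minimal sketch. This section does not state the exact procedure, so the sketch makes three assumptions: PCA is run on the correlation matrix (that is, on standardized metrics), the retained loadings receive a varimax rotation, and each metric is assigned to the component on which its absolute loading is largest. The helper names (`pca_on_correlation`, `varimax`, `group_by_component`, `classify`) are hypothetical, and the data are synthetic rather than taken from the paper. The `classify` step mirrors the last column of Table 8 by labelling each component as slice-only, other-only, or mixed.

```python
import numpy as np

def pca_on_correlation(X):
    """PCA on the correlation matrix of X (rows = functions, columns = metrics).

    Returns eigenvalues (descending), % of variance, cumulative %, and loadings.
    """
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)        # eigh returns ascending order for symmetric R
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    pct = 100.0 * eigvals / R.shape[0]          # total variance = number of standardized metrics
    cum_pct = np.cumsum(pct)
    # Loadings = correlation of each metric with each component
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return eigvals, pct, cum_pct, loadings

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Kaiser's varimax rotation (gamma = 1) of a metrics x components loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        crit_old = crit
        rotated = loadings @ rotation
        target = rotated ** 3 - (gamma / p) * rotated @ np.diag(np.sum(rotated ** 2, axis=0))
        u, s, vh = np.linalg.svd(loadings.T @ target)
        rotation = u @ vh                        # nearest orthogonal rotation
        crit = np.sum(s)
        if crit_old != 0.0 and crit / crit_old < 1.0 + tol:
            break
    return loadings @ rotation

def group_by_component(rotated, names):
    """Assign each metric to the component with its largest |loading| (assumed rule)."""
    groups = {}
    for name, comp in zip(names, np.argmax(np.abs(rotated), axis=1)):
        groups.setdefault(int(comp), []).append(name)
    return groups

def classify(groups, slice_metrics):
    """Label each component as slice-only, other-only, or mixed."""
    labels = {}
    for comp, members in groups.items():
        n_slice = sum(m in slice_metrics for m in members)
        labels[comp] = ("slice-only" if n_slice == len(members)
                        else "other-only" if n_slice == 0 else "mixed")
    return labels

# Synthetic data (NOT from the paper): a "size" factor, a "cohesion" factor, one independent metric
rng = np.random.default_rng(1)
n = 1000
size, cohesion = rng.normal(size=n), rng.normal(size=n)
names = ["SLOC", "N1", "N2", "Cyclomatic", "Coverage", "Tightness", "Overlap", "NHD"]
X = np.column_stack(
    [size + 0.3 * rng.normal(size=n) for _ in range(4)]
    + [cohesion + 0.5 * rng.normal(size=n) for _ in range(3)]
    + [rng.normal(size=n)]
)
eigvals, pct, cum_pct, loadings = pca_on_correlation(X)
k = 3  # number of retained components, fixed by hand for this toy example
groups = group_by_component(varimax(loadings[:, :k]), names)
labels = classify(groups, {"Coverage", "Tightness", "Overlap", "NHD"})
print(np.round(cum_pct[:k], 1), groups, labels)
```

Varimax is an orthogonal rotation. It redistributes variance among the retained components but leaves their total unchanged. The per-component percentages in this sketch are therefore those of the unrotated components, whereas Table 8 reports them for the rotated ones.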
Furthermore, we make the following observations:

• The most commonly used code and process metrics are distributed over seven to ten components, which explain about 72 percent of the variance in the data. Among the code metrics, SLOC, most cyclomatic complexity metrics, and Halstead's software science metrics fall in the same component (PC1). This indicates that the cyclomatic complexity metrics and Halstead's software science metrics essentially measure the size dimension. Knots, MaxEssentialKnots, and MinEssentialKnots fall in a component (PC3) different from the one containing most of the other code metrics (PC1), and FANIN and NPATH likewise fall outside PC1. Notably, the code churn metrics define PCs of their own, separate from those of the code metrics, indicating that process metrics and code metrics capture different information.

• Slice-based cohesion metrics are distributed over three distinct orthogonal components, which explain about 28 percent of the variance in the data. Most slice-based cohesion metrics fall in the same component (PC2). Interestingly, NHD always forms a PC of its own, regardless of the data set, indicating that NHD differs from the other slice-based cohesion metrics. The same holds, to a limited extent, for MaxCoverage.

The core observation from Table 8 is that no PC formed by the most commonly used code and process metrics overlaps with any PC formed by the slice-based cohesion metrics. In other words, slice-based cohesion metrics define PCs of their own, distinct from those of the most commonly used code and process metrics.
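Therefore, our PCA results from five different data sets consistently

TABLE 7
Descriptive Statistics for Each Data Set

Columns give the 25th, 50th, and 75th percentiles of each metric for each system.

| Metric | Bash 3.0 25% | Bash 3.0 50% | Bash 3.0 75% | Gcc-core 3.4.0 25% | Gcc-core 3.4.0 50% | Gcc-core 3.4.0 75% | Gimp 2.0.0 25% | Gimp 2.0.0 50% | Gimp 2.0.0 75% | Subversion 1.2.0 25% | Subversion 1.2.0 50% | Subversion 1.2.0 75% | Vim 6.2 25% | Vim 6.2 50% | Vim 6.2 75% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SLOC | 8 | 14 | 26 | 10 | 18 | 39 | 9 | 17 | 31 | 10 | 16 | 31 | 9.25 | 17 | 38 |
| FANIN | 3 | 4 | 7 | 5 | 9 | 15 | 3 | 5 | 9 | 5 | 7 | 11.5 | 4 | 7 | 12 |
| FANOUT | 2 | 3 | 6 | 2 | 4 | 9 | 3 | 5 | 10 | 3 | 5 | 9 | 2 | 4 | 9 |
| NPATH | 1 | 3 | 7 | 2 | 4 | 18 | 1 | 2 | 5 | 1 | 3 | 8 | 2 | 4 | 16 |
| Cyclomatic | 1 | 3 | 6 | 2 | 4 | 9 | 1 | 2 | 4 | 1 | 4 | 9 | 2 | 3 | 8 |
| CyclomaticModified | 1 | 3 | 6 | 2 | 4 | 8 | 1 | 2 | 4 | 1 | 4 | 9 | 2 | 3 | 8 |
| CyclomaticStrict | 1 | 3 | 8 | 2 | 5 | 12 | 1 | 2 | 5 | 1 | 4 | 10 | 2 | 4 | 10 |
| Essential | 1 | 1 | 3 | 1 | 1 | 4 | 1 | 1 | 1 | 1 | 3 | 7 | 1 | 1 | 4 |
| Knots | 0 | 0 | 3 | 0 | 1 | 5 | 0 | 0 | 1 | 0 | 2 | 8 | 0 | 1 | 4 |
| Nesting | 0 | 1 | 2 | 1 | 2 | 3 | 0 | 1 | 2 | 0 | 2 | 3 | 1 | 1 | 3 |
| MaxEssentialKnots | 0 | 0 | 1.5 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 2 | 7 | 0 | 0 | 3 |
| MinEssentialKnots | 0 | 0 | 1 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 2 | 7 | 0 | 0 | 3 |
| n1 | 8 | 12 | 17 | 9 | 14 | 20 | 8 | 10 | 14 | 10 | 13 | 17 | 9 | 14 | 21 |
| n2 | 7 | 12 | 21 | 11 | 18 | 32 | 11 | 18 | 30 | 12 | 19 | 32 | 10 | 17 | 32 |
| N1 | 18 | 37 | 75 | 25 | 51.5 | 114 | 19 | 39 | 81 | 25 | 47 | 93.5 | 21 | 46 | 114 |
| N2 | 13 | 25 | 53 | 19 | 40 | 88 | 17 | 33 | 70 | 19 | 36 | 71.5 | 16 | 36 | 85 |
| Added | 0 | 0 | 0 | 0 | 0 | 0.067 | 0 | 0.176 | 0.882 | 0 | 0 | 0.039 | 0 | 0 | 0 |
| Deleted | 0 | 0 | 0 | 0.667 | 0.867 | 1 | 0 | 0 | 0.819 | 0 | 0 | 0.833 | 0 | 0 | 0 |
| Modified | 0 | 0 | 0 | 0 | 0.037 | 0.111 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Coverage | 0.500 | 0.773 | 1 | 0.583 | 0.840 | 1 | 0.644 | 0.838 | 1 | 0.667 | 0.812 | 1 | 0.496 | 0.737 | 1 |
| MaxCoverage | 0.667 | 0.928 | 1 | 0.778 | 0.961 | 1 | 0.850 | 0.965 | 1 | 0.875 | 0.977 | 1 | 0.727 | 0.936 | 1 |
| MinCoverage | 0.286 | 0.679 | 1 | 0.350 | 0.765 | 1 | 0.380 | 0.721 | 1 | 0.378 | 0.667 | 1 | 0.222 | 0.605 | 1 |
| Overlap | 0.494 | 0.963 | 1 | 0.623 | 0.971 | 1 | 0.672 | 0.946 | 1 | 0.639 | 0.873 | 1 | 0.454 | 0.905 | 1 |
| Tightness | 0.182 | 0.653 | 1 | 0.281 | 0.750 | 1 | 0.350 | 0.710 | 1 | 0.353 | 0.652 | 1 | 0.143 | 0.571 | 1 |
| WFC | 0.481 | 0.847 | 1 | 0.590 | 0.895 | 1 | 0.692 | 0.917 | 1 | 0.729 | 0.889 | 1 | 0.454 | 0.813 | 1 |
| SBFC | 0.356 | 0.903 | 1 | 0.469 | 0.892 | 1 | 0.520 | 0.817 | 1 | 0.539 | 0.739 | 1 | 0.290 | 0.754 | 1 |
| NHD | 0.602 | 0.758 | 1 | 0.644 | 0.798 | 1 | 0.644 | 0.778 | 1 | 0.640 | 0.756 | 1 | 0.619 | 0.757 | 1 |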
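Several quartiles in Table 7 are fractional, for example 9.25 for SLOC in Vim 6.2 and 11.5 for FANIN in Subversion 1.2.0. This is consistent with interpolating between order statistics, but the section does not say which percentile definition was used. The minimal sketch below therefore assumes linear interpolation, which is NumPy's default for `numpy.percentile`. The helper name `quartiles` and the SLOC values are hypothetical and illustrative only.

```python
import numpy as np

def quartiles(values):
    """25th, 50th, and 75th percentiles using linear interpolation (NumPy's default).

    For sorted values x[0..n-1], percentile q sits at position h = (q / 100) * (n - 1);
    the result interpolates linearly between x[floor(h)] and x[ceil(h)].
    """
    return tuple(float(v) for v in np.percentile(values, [25, 50, 75]))

# Hypothetical SLOC values for eight functions (illustrative only, not from Table 7)
sloc = [4, 9, 10, 12, 17, 22, 38, 40]
print(quartiles(sloc))  # (9.75, 14.5, 26.0)
```

With eight values, the 25th percentile sits at position 1.75, that is, three quarters of the way from 9 to 10. This is how fractional entries arise even when every underlying metric value is an integer.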