An Empirical Study on Dependence Clusters for Effort-Aware Fault-Proneness Prediction

Table 1: The subject systems

| System | Subject version | Subject release date | Total SLoC | # functions | # faulty functions | % faulty functions | Previous version | Previous release date | Fixing version | Fixing release date |
|---|---|---|---|---|---|---|---|---|---|---|
| Bash | 3.2 | 2006-10-11 | 49 608 | 1 947 | 68 | 3.49% | 3.1 | 2005-12-08 | 3.2.57 | 2014-11-07 |
| Gcc-core | 4.0.0 | 2005-04-21 | 422 182 | 13 612 | 430 | 3.16% | 3.4.0 | 2004-04-20 | 4.0.4 | 2007-01-31 |
| Gimp | 2.8.0 | 2012-05-12 | 557 436 | 19 978 | 818 | 4.10% | 2.7.0 | 2009-08-15 | 2.8.16 | 2015-11-21 |
| Glibc | 2.1.1 | 1999-05-24 | 172 559 | 5 923 | 417 | 7.04% | 2.0.1 | 1997-02-04 | 2.1.3 | 2000-02-25 |
| Gstreamer | 1.0.0 | 2012-09-24 | 75 985 | 3 946 | 146 | 3.70% | 0.11.90 | 2011-08-02 | 1.0.10 | 2013-08-30 |

date of the fixing release. The subject projects are moderate to large-scale software systems (from 49 to 557 KSLOC). Only a small fraction of their functions are faulty (approximately 3% to 7% of all functions). Furthermore, on average, the fixing release comes out approximately 3 years after the subject version is released. We believe 3 years is sufficiently long for the majority of faulty functions to be identified and fixed.

4.2 Data Collection Procedure

We collected data from the five projects described above. For each subject system, we obtained the fault data and identified dependence clusters for further analysis using the following steps.

In the first step, we labeled each function as faulty or not-faulty. As mentioned before, none of the bug-fixing releases added new features to the corresponding system. For each subject system, we compared the subject version with the latest bug-fixing release (identified by the last two columns of Table 1) and determined which functions were changed. If a function was changed, it was marked as faulty. Otherwise, it was marked as not-faulty. This method has been used to determine faulty functions before [42].

In the second step, we collected the dependence clusters for each system using the Understand tool (footnote 1) and the R package igraph (footnote 2). For each subject system, we first generated an Understand database. Then, we extracted the call and data dependencies for all functions from the generated database. In this way we obtained the SDG of the subject system. After that, we used the function cluster in the igraph package to identify all dependence clusters.
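As a rough illustration of the cluster-identification step, the sketch below finds groups of mutually dependent functions in a small directed dependence graph. It assumes that a dependence cluster corresponds to a strongly connected component containing more than one function. The text does not spell out the exact igraph call, so this pure-Python version of Tarjan's algorithm is a stand-in, not the authors' procedure. The name `dependence_clusters` and the toy edge list are hypothetical.

```python
from collections import defaultdict


def dependence_clusters(edges):
    """Return strongly connected components with more than one function.

    `edges` is an iterable of (source, target) dependence pairs.
    Assumption: a dependence cluster is an SCC of size > 1.
    """
    graph = defaultdict(list)
    nodes = set()
    for src, dst in edges:
        graph[src].append(dst)
        nodes.update((src, dst))

    index = {}        # discovery order of each visited node
    lowlink = {}      # smallest discovery index reachable from the node
    stack = []        # Tarjan's stack of nodes not yet assigned to an SCC
    on_stack = set()
    clusters = []
    counter = 0

    for root in sorted(nodes):
        if root in index:
            continue
        index[root] = lowlink[root] = counter
        counter += 1
        stack.append(root)
        on_stack.add(root)
        # Explicit work stack avoids recursion limits on large graphs.
        work = [(root, iter(graph[root]))]
        while work:
            node, successors = work[-1]
            descended = False
            for nxt in successors:
                if nxt not in index:
                    index[nxt] = lowlink[nxt] = counter
                    counter += 1
                    stack.append(nxt)
                    on_stack.add(nxt)
                    work.append((nxt, iter(graph[nxt])))
                    descended = True
                    break
                if nxt in on_stack:
                    lowlink[node] = min(lowlink[node], index[nxt])
            if descended:
                continue
            work.pop()
            if work:
                parent = work[-1][0]
                lowlink[parent] = min(lowlink[parent], lowlink[node])
            if lowlink[node] == index[node]:
                # `node` is the root of an SCC: pop its members.
                component = set()
                while True:
                    member = stack.pop()
                    on_stack.discard(member)
                    component.add(member)
                    if member == node:
                        break
                if len(component) > 1:
                    clusters.append(component)
    return clusters


# Toy dependence graph (hypothetical function names).
edges = [("a", "b"), ("b", "c"), ("c", "a"),
         ("c", "d"), ("d", "e"), ("e", "d"), ("e", "f")]
clusters = dependence_clusters(edges)
inside = set().union(*clusters)                 # functions inside some cluster
outside = {n for e in edges for n in e} - inside
```

In this toy graph, `a`, `b`, `c` form one cluster and `d`, `e` another, while `f` lies outside every cluster.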
Each system's functions are divided into two groups: functions inside and functions outside dependence clusters.

Table 2: The dependence clusters in subject systems

| System | # functions | # clusters | % functions inside clusters | Size of largest cluster |
|---|---|---|---|---|
| BASH | 1 947 | 41 | 46.2 | 483 |
| GCC | 13 612 | 139 | 34.9 | 4083 |
| GIMP | 19 978 | 363 | 14.2 | 158 |
| GLIB | 5 923 | 105 | 11.6 | 277 |
| GSTR | 3 946 | 59 | 15.2 | 170 |

Table 2 describes the clusters in the subject projects. The third to fifth columns show, respectively, the number of clusters, the percentage of functions inside clusters, and the size of the largest cluster in each subject project. Table 2 shows that these projects contain many dependence clusters (from 41 to 363). Furthermore, from 11.6% to 46.2% of all functions are found inside dependence clusters. Additionally, the size of the largest cluster in these projects varies from 158 to 4083. Of the five projects, GCC has the largest dependence cluster (4083 functions). This, to a certain extent, indicates that GCC is more complex than the other systems.

Footnote 1: https://scitools.com
Footnote 2: http://igraph.org/r/

5. METHODOLOGY AND RESULTS

In this section, we describe the research method and report the experimental results in detail with respect to each of the research questions.

5.1 RQ1. Are larger dependence clusters more fault-prone?

In the following, we describe the research method used and report the experimental results to address RQ1.

5.1.1 Research method

[Figure 2: Overview of the analysis method for RQ1. Elements shown: clusters dc1, dc2, dc3, ...; size metric; fault density; Spearman rank correlation.]

Figure 2 provides an overview of the analysis method used to address RQ1. To answer RQ1, we use Spearman's rank correlation to investigate the relationship between the size of dependence clusters and the fault density of dependence clusters. Here, fault density refers to the percentage of faulty functions in the dependence clusters.
There are two basic metrics to measure the size of a graph: Size and Ties. Size is the number of functions within a dependence cluster, while Ties is the number of edges between functions in a dependence cluster. In this study, we first use igraph to compute these two metrics for all dependence clusters in each subject system. We choose Spearman's rank correlation rather than Pearson's linear correlation since the former is a non-parametric method and makes no normality assumptions about the variables [30]. According to Ott and Longnecker [30], for correlation coefficient rho, the correlation is considered either weak (|rho| ≤ 0.5), moderate (0.5 < |rho| < 0.8), or strong (0.8 ≤ |rho| ≤ 1.0).

Table 3: Spearman correlation for dependence cluster size and fault density (RQ1)

| System | # clusters | Size: rho | Size: p | Ties: rho | Ties: p |
|---|---|---|---|---|---|
| BASH | 41 | 0.230 | 0.148 | 0.315 | 0.045 |
| GCC | 139 | 0.299 | < 0.001 | 0.233 | 0.006 |
| GIMP | 363 | 0.150 | 0.004 | 0.195 | < 0.001 |
| GLIB | 105 | 0.092 | 0.350 | 0.113 | 0.249 |
| GSTR | 59 | 0.345 | 0.007 | 0.295 | 0.023 |
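The analysis described above can be sketched in a few lines. The example below computes Size, Ties, and fault density for one cluster, then computes Spearman's rho as the Pearson correlation of average ranks, which is the standard definition when ties are present. It is a minimal illustration: the study used igraph in R, the p-values reported in Table 3 are not computed here, and all function names and toy data are hypothetical.

```python
def cluster_metrics(cluster, edges, faulty):
    """Return (Size, Ties, fault density in %) for one dependence cluster."""
    size = len(cluster)
    # Ties: edges whose two endpoints both lie inside the cluster.
    ties = sum(1 for src, dst in edges if src in cluster and dst in cluster)
    density = 100.0 * len(cluster & faulty) / size
    return size, ties, density


def average_ranks(values):
    """Rank values starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation applied to average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Toy data (hypothetical): one cluster, its edges, and the faulty functions.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
size, ties, density = cluster_metrics({"a", "b", "c"}, edges, {"a", "d"})

# Toy per-cluster values (hypothetical): cluster sizes and fault densities.
sizes = [2, 3, 5, 8]
densities = [0.0, 33.3, 20.0, 50.0]
rho = spearman_rho(sizes, densities)
```

For the toy values, `rho` is 0.8, which the thresholds above would classify as strong. All coefficients in Table 3 fall below 0.5 and therefore count as weak under the same scale.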
