account the human and organization of_中国高校课件下载中心

点击下载：An Empirical Study on Dependence Clusters for Effort-Aware Fault-Proneness Prediction

正在加载图片...

account the human and organization of information 17,19. System dependence graph at function-level 27,29,32].Their work showed that the logical dependencies explained most of the variance in fault proneness while work SubGin Functions inside dependence clusters flow dependencies had more impact than code dependencies In our study,we only investigate syntactic dependencies,but do so at a finer granularity. Oyetoyan et al.[31]studied the impact of cyclic depen- dencies on fault-proneness prediction.They found that most defects and defective components were concentrated in cyclic dependent components.The cyclic dependent components are those in call cycles in call dependence graph at the class level.These structures can also be viewed as dependence clusters.Our study is different from their study mainly with respect to the following aspects:1)our dependence clustering is at a finer granularity (i.e.,function level)while their study is at the file/class level;2)we take into account more types of dependencies,including both call and data dependencies: 3)we study the fault-proneness prediction model in effort- aware evaluations with respect to ranking and classification scenarios;4)we propose a segmented fault-proneness pre- 02 diction model and compare our models with the traditional SubGom Functions outside dependence cluster fault-proneness prediction models. 二二二二二 d:data dependency:c:control/call dependency: 3.RESEARCH QUESTIONS dc :the i-th dependence cluster In this section we discuss our research questions and use Figure 1:An SDG with dependence clusters the example SDG shown in Figure 1 to illustrate the ques- tions.In Figure 1,the nodes (e.g.,fI and f2)are functions stand the effects of dependence clusters on software quality. and the directed edges are dependencies between functions, Little is currently known on this subject.Our study attempts depicting data dependencies (labeled"d")and function call to fill this gap. dependencies (labeled"c"),respectively.In this dependence graph,there are 15 functions and 3 dependence clusters(i.e., dci,dc2,and des).In Figure 1,dci,de2,and des are separate 4. EXPERIMENTAL SETUP clusters since they are maximal strongly connected subgraph- This section first introduces the systems studied before s.The functions are divided into two groups:functions describing the procedure used to collect the experimental inside dependence clusters and functions outside dependence data. clusters.Functions inside dependence clusters and functions outside dependence clusters form the subgraphs SubGin and 4.1 Studied Projects SubGout,respectively. Table 1 summarizes the subjects used in the study.The First,because a code change to one element of a depen- first column is the system name.We use five well-known dence cluster likely ripples to the others elements of the open-source projects as subject systems:Bash(BASH),gcc- cluster,our first research question (RQ1)investigates the core (GCC).GIMP (GIMP).glibc (GLIB).and GStreamer relationship between the size of dependence clusters and (GSTR).Bash is a command language interpreter,gcc-core fault-proneness: is the GNU compiler collection.GIMP is the GNU Image RQ1.Are larger dependence clusters more fault-prone? Manipulation Program,glibc is the GNU Project's imple- Second,functions in Figure 1 are classified into two groups: mentation of the C standard library and GStreamer is a functions inside and outside dependence clusters.Our second multimedia framework.We chose these five projects as sub- research question (RQ2)focuses on the quality of functions jects for two reasons.First,they are well-known open source in these two groups. projects with a publicly available bug-fix history.In particu- RQ2.Are functions inside dependence clusters more fault- lar,the bug-fixing releases do not add any new features to the prone than functions outside dependence clusters? corresponding systems,thus allowing us to collect accurate Third,functions inside dependence clusters form a sub- fault data at the function level.For instance,gcc distribu- dependence graph (e.g.,SubGin of Figure 1).Different tion website states "Note that starting with version 3.3.4,we functions play different roles in this sub-graph.Thus,we set provide bug releases for older release branches for those users up RQ3 for functions inside dependence clusters as follows: who require a very high degree of stability".Second,they are RQ3.Are functions playing more important roles inside non-trivial software systems belonging to several different dependence clusters more fault-prone? domains.In Table 1.the second to the seventh columns are Finally,we aim to examine the usefulness of dependence respectively the version number,the release date,the total clusters for fault-proneness prediction.Therefore,our last source lines of code in the subject release,the number of research question (RQ4)is set up as follows: functions,the number of faulty functions,and the percentage RQ4.Are dependence clusters useful in fault-proneness of faulty functions.The eighth and the ninth columns are the prediction? version number and the release date of the previous version These research questions are important to both software used for computing the process metrics in Section 5.4.The researchers and practitioners,as they help us better under- last two columns are the version number and the release 298account the human and organization of information [17, 19, 27, 29, 32]. Their work showed that the logical dependencies explained most of the variance in fault proneness while work flow dependencies had more impact than code dependencies. In our study, we only investigate syntactic dependencies, but do so at a finer granularity. Oyetoyan et al. [31] studied the impact of cyclic dependencies on fault-proneness prediction. They found that most defects and defective components were concentrated in cyclic dependent components. The cyclic dependent components are those in call cycles in call dependence graph at the class level. These structures can also be viewed as dependence clusters. Our study is different from their study mainly with respect to the following aspects: 1) our dependence clustering is at a finer granularity (i.e., function level) while their study is at the file/class level; 2) we take into account more types of dependencies, including both call and data dependencies; 3) we study the fault-proneness prediction model in effortaware evaluations with respect to ranking and classification scenarios; 4) we propose a segmented fault-proneness prediction model and compare our models with the traditional fault-proneness prediction models. 3. RESEARCH QUESTIONS In this section we discuss our research questions and use the example SDG shown in Figure 1 to illustrate the questions. In Figure 1, the nodes (e.g., f1 and f2 ) are functions and the directed edges are dependencies between functions, depicting data dependencies (labeled “d”) and function call dependencies (labeled “c”), respectively. In this dependence graph, there are 15 functions and 3 dependence clusters (i.e., dc1, dc2, and dc3). In Figure 1, dc1, dc2, and dc3 are separate clusters since they are maximal strongly connected subgraphs. The functions are divided into two groups: functions inside dependence clusters and functions outside dependence clusters. Functions inside dependence clusters and functions outside dependence clusters form the subgraphs SubGin and SubGout, respectively. First, because a code change to one element of a dependence cluster likely ripples to the others elements of the cluster, our first research question (RQ1) investigates the relationship between the size of dependence clusters and fault-proneness: RQ1. Are larger dependence clusters more fault-prone? Second, functions in Figure 1 are classified into two groups: functions inside and outside dependence clusters. Our second research question (RQ2) focuses on the quality of functions in these two groups. RQ2. Are functions inside dependence clusters more faultprone than functions outside dependence clusters? Third, functions inside dependence clusters form a subdependence graph (e.g., SubGin of Figure 1). Different functions play different roles in this sub-graph. Thus, we set up RQ3 for functions inside dependence clusters as follows: RQ3. Are functions playing more important roles inside dependence clusters more fault-prone? Finally, we aim to examine the usefulness of dependence clusters for fault-proneness prediction. Therefore, our last research question (RQ4) is set up as follows: RQ4. Are dependence clusters useful in fault-proneness prediction? These research questions are important to both software researchers and practitioners, as they help us better underFigure 1: An SDG with dependence clusters stand the effects of dependence clusters on software quality. Little is currently known on this subject. Our study attempts to fill this gap. 4. EXPERIMENTAL SETUP This section first introduces the systems studied before describing the procedure used to collect the experimental data. 4.1 Studied Projects Table 1 summarizes the subjects used in the study. The first column is the system name. We use five well-known open-source projects as subject systems: Bash (BASH), gcccore (GCC), GIMP (GIMP), glibc (GLIB), and GStreamer (GSTR). Bash is a command language interpreter, gcc-core is the GNU compiler collection, GIMP is the GNU Image Manipulation Program, glibc is the GNU Project’s implementation of the C standard library and GStreamer is a multimedia framework. We chose these five projects as subjects for two reasons. First, they are well-known open source projects with a publicly available bug-fix history. In particular, the bug-fixing releases do not add any new features to the corresponding systems, thus allowing us to collect accurate fault data at the function level. For instance, gcc distribution website states “Note that starting with version 3.3.4, we provide bug releases for older release branches for those users who require a very high degree of stability”. Second, they are non-trivial software systems belonging to several different domains. In Table 1, the second to the seventh columns are respectively the version number, the release date, the total source lines of code in the subject release, the number of functions, the number of faulty functions, and the percentage of faulty functions. The eighth and the ninth columns are the version number and the release date of the previous version used for computing the process metrics in Section 5.4. The last two columns are the version number and the release 298

<<向上翻页向下翻页>>

点击下载：An Empirical Study on Dependence Clusters for Effort-Aware Fault-Proneness Prediction