to calculate all the maximal strongly connected components of dependence clusters and then used this method to identify found in the System Dependence Graph (SDG).In function- linchpins.which effectively hold a dependence cluster togeth- level dependence clusters,functions are regarded as the basic er.Their results showed that only a few vertices and edges units and each cluster consists of at least two functions.Ac- act as linchpins.After that,Binkley et al.10 introduced a cording to Binkley et al.[7],the function-level dependence simple transformation-based analysis algorithm to identify clusters can offer an effective proxy for the more expensive the impact of global variables on the presence of dependence statement-level dependence clusters. clusters.Their results showed that over half of the studied Based on this observation.we investigate dependence clus- programs include a global variable that was responsible for ters at the function-level.Our main contributions are the the formation of a dependence cluster. following: Beszedes et al.5,6 conducted an empirical study into the 1)We investigate whether the qualities of dependence properties of SEA-based dependence clusters.Such cluster clusters are influenced by their size.Our results show that are defined at the function-level and are based on the static larger dependence clusters tend to be more fault-prone. execute after (SEA)relation.Their empirical results showed 2)We examine whether functions inside dependence clus- that SEA-based dependence clusters occur frequently in pro- ters are more fault-prone than functions outside dependence grams regardless of their domain and size.However,the clusters.The results show that the proportion of faulty func- SEA-based relation only considers call structure information. tions inside dependence clusters is significantly greater than In other words,data dependencies are not considered in their that of functions outside all dependence clusters. study.In contrast,we take the data dependency between 3)We examine whether functions playing more important functions into account in our study. roles inside dependence clusters are more fault-prone.Our Binkley et al.7 compared the following two types of empirical results show that importance metrics are positively dependence clusters:slice-based dependence clusters at the correlated with fault-proneness. statement-level and SEA-based dependence clusters at the 4)Finally,we propose a segmented prediction model for function-level.They found that the less expensive SEA-based fault-proneness prediction.More specifically,we build two dependence clusters could be used as an effective proxy for different fault-proneness prediction models respectively for the more expensive slice-based dependence clusters.Unlike functions inside and functions outside dependence clusters. the above studies,we investigate dependence clusters from The empirical results show that our segmented prediction the perspective of software quality.More specifically,we model can significantly improve the prediction performance investigate whether dependence clusters have practical value in effort-aware evaluations in effort-aware fault-proneness prediction The rest of this paper is organized as follows.In Section 2. we summarize related work.We present our research ques- 2.2 Dependence Analysis in Fault-Proneness tions in Section 3.In Section 4,we describe the experimental Prediction setup,including the subject systems and the data collection method used.In Section 5,we describe the research method Zimmermann and Nagappan [41]calculated network met- and report the detailed experimental results with respect rics based on a dependence graph and used them to predict faults.More specifically,they first generate a SDG at the to each of the research questions.Section 6 discusses our function level.Two kinds of dependencies between functions findings.In Section 7,we examine threats to validity.Final- are then taken into account:call dependencies and data ly,Section 8 concludes the paper and outlines directions for dependencies.They then lift this graph up to binary level future work. since the defects were at the binary level.They considered 2.RELATED WORK the presence of dependencies without considering the multi- plicity of dependencies.After that,they compute network This section summarizes related work on dependence clus- measures on the dependence graph and then evaluated their ters and dependence analysis in fault-proneness prediction. fault-proneness prediction performance on Windows Server 2003.Their results show that the recall of the model built 2.1 Dependence Clusters from network measures was 10%higher than the model built Binkley and Harman [8]originally introduced the concept from complexity measures. of dependence clusters based on program slicing at the s- Ma et al.[23 conducted an empirical study to examine tatement level.They proposed a "same slice size"approach the effectiveness of network measures in the context of effort- to identifying dependence clusters using the SDG.Later, aware fault-proneness prediction,taking into account the Harman et al.18 extended this initial study to include a effort required to inspect predicted faulty module.They larger set of programs.Their empirical results showed that investigated dependence graphs at the file-level and did not the "same slice size"approach was extremely accurate.In consider the multiplicity of dependencies between files.They addition,they found that large dependence clusters were found that most network measures were of practical value surprisingly commonplace and consumed from more than in the context of effort-aware evaluations.Unlike these two 10%of the program to,in some cases,80%of the whole studies,our SDGs are finer grained (i.e.,function-level vs program.Islam et al.20,21 introduced the concept of binary/file-level).In addition,we take into account the coherent dependence clusters.In a coherent dependence clus- multiplicity of dependencies between functions ter,all elements depend upon the same set of elements and Cataldo et al.13 compared the relative impact of the also affect a common set of elements.They used coherent syntactic,logical,and work dependencies on fault-proneness dependence clusters to identify logical functionality within prediction.Syntactic dependencies are code dependencies programs. (e.g.,control and data dependencies).Logical dependencies Binkley and Harman [10,9 introduced a method to mea- focus on deducing dependencies between source code files sure the effect of an SDG vertex or an edge on the formation that are changed together [16].Finally,work dependencies 297to calculate all the maximal strongly connected components found in the System Dependence Graph (SDG). In functionlevel dependence clusters, functions are regarded as the basic units and each cluster consists of at least two functions. According to Binkley et al. [7], the function-level dependence clusters can offer an effective proxy for the more expensive statement-level dependence clusters. Based on this observation, we investigate dependence clusters at the function-level. Our main contributions are the following: 1) We investigate whether the qualities of dependence clusters are influenced by their size. Our results show that larger dependence clusters tend to be more fault-prone. 2) We examine whether functions inside dependence clusters are more fault-prone than functions outside dependence clusters. The results show that the proportion of faulty functions inside dependence clusters is significantly greater than that of functions outside all dependence clusters. 3) We examine whether functions playing more important roles inside dependence clusters are more fault-prone. Our empirical results show that importance metrics are positively correlated with fault-proneness. 4) Finally, we propose a segmented prediction model for fault-proneness prediction. More specifically, we build two different fault-proneness prediction models respectively for functions inside and functions outside dependence clusters. The empirical results show that our segmented prediction model can significantly improve the prediction performance in effort-aware evaluations. The rest of this paper is organized as follows. In Section 2, we summarize related work. We present our research questions in Section 3. In Section 4, we describe the experimental setup, including the subject systems and the data collection method used. In Section 5, we describe the research method and report the detailed experimental results with respect to each of the research questions. Section 6 discusses our findings. In Section 7, we examine threats to validity. Finally, Section 8 concludes the paper and outlines directions for future work. 2. RELATED WORK This section summarizes related work on dependence clusters and dependence analysis in fault-proneness prediction. 2.1 Dependence Clusters Binkley and Harman [8] originally introduced the concept of dependence clusters based on program slicing at the statement level. They proposed a “same slice size” approach to identifying dependence clusters using the SDG. Later, Harman et al. [18] extended this initial study to include a larger set of programs. Their empirical results showed that the “same slice size” approach was extremely accurate. In addition, they found that large dependence clusters were surprisingly commonplace and consumed from more than 10% of the program to, in some cases, 80% of the whole program. Islam et al. [20, 21] introduced the concept of coherent dependence clusters. In a coherent dependence cluster, all elements depend upon the same set of elements and also affect a common set of elements. They used coherent dependence clusters to identify logical functionality within programs. Binkley and Harman [10, 9] introduced a method to measure the effect of an SDG vertex or an edge on the formation of dependence clusters and then used this method to identify linchpins, which effectively hold a dependence cluster together. Their results showed that only a few vertices and edges act as linchpins. After that, Binkley et al. [10] introduced a simple transformation-based analysis algorithm to identify the impact of global variables on the presence of dependence clusters. Their results showed that over half of the studied programs include a global variable that was responsible for the formation of a dependence cluster. Besz´edes et al. [5, 6] conducted an empirical study into the properties of SEA-based dependence clusters. Such cluster are defined at the function-level and are based on the static execute after (SEA) relation. Their empirical results showed that SEA-based dependence clusters occur frequently in programs regardless of their domain and size. However, the SEA-based relation only considers call structure information. In other words, data dependencies are not considered in their study. In contrast, we take the data dependency between functions into account in our study. Binkley et al. [7] compared the following two types of dependence clusters: slice-based dependence clusters at the statement-level and SEA-based dependence clusters at the function-level. They found that the less expensive SEA-based dependence clusters could be used as an effective proxy for the more expensive slice-based dependence clusters. Unlike the above studies, we investigate dependence clusters from the perspective of software quality. More specifically, we investigate whether dependence clusters have practical value in effort-aware fault-proneness prediction. 2.2 Dependence Analysis in Fault-Proneness Prediction Zimmermann and Nagappan [41] calculated network metrics based on a dependence graph and used them to predict faults. More specifically, they first generate a SDG at the function level. Two kinds of dependencies between functions are then taken into account: call dependencies and data dependencies. They then lift this graph up to binary level since the defects were at the binary level. They considered the presence of dependencies without considering the multiplicity of dependencies. After that, they compute network measures on the dependence graph and then evaluated their fault-proneness prediction performance on Windows Server 2003. Their results show that the recall of the model built from network measures was 10% higher than the model built from complexity measures. Ma et al. [23] conducted an empirical study to examine the effectiveness of network measures in the context of effortaware fault-proneness prediction, taking into account the effort required to inspect predicted faulty module. They investigated dependence graphs at the file-level and did not consider the multiplicity of dependencies between files. They found that most network measures were of practical value in the context of effort-aware evaluations. Unlike these two studies, our SDGs are finer grained (i.e., function-level vs binary/file-level). In addition, we take into account the multiplicity of dependencies between functions. Cataldo et al. [13] compared the relative impact of the syntactic, logical, and work dependencies on fault-proneness prediction. Syntactic dependencies are code dependencies (e.g., control and data dependencies). Logical dependencies focus on deducing dependencies between source code files that are changed together [16]. Finally, work dependencies 297