that slice-based cohesion metrics are more closely related to fault-proneness than the most commonly used code and process metrics. From this expectation, we set up the following null hypothesis H30 and alternative hypothesis H3A for RQ3:

H30. Slice-based cohesion metrics are not more effective in effort-aware post-release fault-proneness prediction than the most commonly used code and process metrics.

H3A. Slice-based cohesion metrics are more effective in effort-aware post-release fault-proneness prediction than the most commonly used code and process metrics.
The fourth research question (RQ4) of this study investigates whether the model built with slice-based cohesion metrics and the most commonly used code and process metrics together has a better ability to predict post-release fault-proneness than the model built with the most commonly used code and process metrics alone. This issue is indeed raised by the null hypothesis H10. If the null hypothesis H10 is rejected, it means that slice-based cohesion metrics capture different underlying dimensions of software quality that are not captured by the most commonly used code and process metrics. In this case, it is natural to conjecture that combining slice-based cohesion metrics with the most commonly used code and process metrics should give a more complete indication of software quality. Consequently, the combination of slice-based cohesion metrics with the most commonly used code and process metrics will form a better indicator of post-release fault-proneness than the most commonly used code and process metrics alone. From this reasoning, we set up the following null hypothesis H40 and alternative hypothesis H4A for RQ4:

H40. The combination of slice-based cohesion metrics with the most commonly used code and process metrics is not more effective in effort-aware post-release fault-proneness prediction than the most commonly used code and process metrics alone.

H4A. The combination of slice-based cohesion metrics with the most commonly used code and process metrics is more effective in effort-aware post-release fault-proneness prediction than the most commonly used code and process metrics alone.

3.2 Variable Description

The dependent variable in this study is a binary variable Y that can take on only one of two values, 0 and 1. Here, Y = 1 represents that the corresponding function has at least one post-release fault, and Y = 0 represents that it has no post-release fault. In this paper, we use a modeling technique called logistic regression (described in Section 3.3) to predict the probability of Y = 1. The probability of Y = 1 indeed indicates post-release fault-proneness, i.e. the extent to which a function is post-release faulty. As stated by Nagappan et al. [66], for the users, only post-release failures matter. It is hence essential in practice to predict the post-release fault-proneness of the functions in a system, as this enables developers to take focused preventive actions to improve quality in a cost-effective way. Indeed, much effort has been devoted to post-release fault-proneness prediction [27], [34], [36], [42], [54], [60], [64], [65], [66].

The independent variables in this study consist of two categories of metrics: (i) the 19 most commonly used code and process metrics, and (ii) eight slice-based cohesion metrics. All these metrics are collected at the function level. The objective of this study is to empirically investigate the actual usefulness of slice-based cohesion metrics in the context of effort-aware post-release fault-proneness prediction, especially when compared with the most commonly used code and process metrics. With these independent variables, we are able to test the four null hypotheses described in Section 3.1.
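To make this variable setup concrete, the following is a minimal sketch, not taken from the study, of how the binary dependent variable and the two categories of function-level metrics could be organized before modeling. The function names, the metric columns (SLOC, Changes, Coverage), and all values are hypothetical placeholders rather than the study's actual 19 code and process metrics and eight slice-based cohesion metrics.

```python
# Hypothetical sketch of the variable setup: each row is a function, the
# independent variables are its metric values, and the dependent variable
# Y is 1 if the function has at least one post-release fault, else 0.
import pandas as pd

functions = pd.DataFrame({
    "function":            ["parse_header", "update_cache", "flush_log"],
    "SLOC":                [120, 45, 300],      # example code metric
    "Changes":             [7, 1, 12],          # example process metric
    "Coverage":            [0.42, 0.88, 0.31],  # example slice-based cohesion metric
    "post_release_faults": [2, 0, 1],           # fault count after release
})

# Dependent variable: Y = 1 iff the function has at least one post-release fault.
functions["Y"] = (functions["post_release_faults"] > 0).astype(int)

code_process_metrics = ["SLOC", "Changes"]
cohesion_metrics = ["Coverage"]

# Three metric sets, matching the models compared in the study:
X_code_process = functions[code_process_metrics]                  # baseline model
X_cohesion = functions[cohesion_metrics]                          # cohesion-only model
X_combined = functions[code_process_metrics + cohesion_metrics]   # combined model

print(functions[["function", "Y"]])
```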
3.3 Modeling Technique

Logistic regression is a standard statistical modeling technique in which the dependent variable can take two different values [28]. It is suitable for building fault-proneness prediction models because the functions under consideration are divided into two categories: faulty and not-faulty. Let Pr(Y = 1 | X1, X2, ..., Xn) represent the probability that the dependent variable Y = 1 given the independent variables X1, X2, ..., and Xn (i.e. the metrics in this study). Then, a multivariate logistic regression model assumes that Pr(Y = 1 | X1, X2, ..., Xn) is related to X1, X2, ..., Xn by the following equation:

$$\Pr(Y = 1 \mid X_1, X_2, \ldots, X_n) = \frac{e^{\alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n}}{1 + e^{\alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n}},$$

where α and the βi are the regression coefficients, which can be estimated through the maximization of a log-likelihood.

The odds ratio is the most commonly used measure to quantify the magnitude of the correlation between the independent and dependent variables in a logistic regression model. For a given independent variable Xi, the odds that Y = 1 at Xi = x denote the ratio of the probability that Y = 1 to the probability that Y = 0 at Xi = x, i.e.

$$\mathrm{Odds}(Y = 1 \mid X_i = x) = \frac{\Pr(Y = 1 \mid \ldots, X_i = x, \ldots)}{1 - \Pr(Y = 1 \mid \ldots, X_i = x, \ldots)}.$$

In this study, similar to [33], we use ΔOR, the odds ratio associated with a one-standard-deviation increase, to provide an intuitive insight into the impact of the independent variable Xi:

$$\Delta\mathrm{OR}(X_i) = \frac{\mathrm{Odds}(Y = 1 \mid X_i = x + \sigma_i)}{\mathrm{Odds}(Y = 1 \mid X_i = x)} = e^{\beta_i \sigma_i},$$

where βi and σi are, respectively, the regression coefficient and the standard deviation of the variable Xi. ΔOR(Xi) can be used to compare the relative magnitudes of the effects of different independent variables, as the same unit is used [42]. ΔOR(Xi) > 1 indicates that the independent variable is positively associated with the dependent variable, ΔOR(Xi) = 1 indicates that there is no such correlation, and ΔOR(Xi) < 1 indicates a negative correlation. The univariate logistic regression model is a special case of the multivariate logistic regression model in which there is only one independent variable.
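As a concrete illustration of this estimation procedure, here is a minimal sketch, assuming synthetic data and the Python statsmodels library, of fitting the multivariate logistic regression model by maximum likelihood and computing ΔOR(Xi) = e^(βiσi) for each metric. The metric names and the data-generating model are assumptions made for illustration only; this is not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): fit a multivariate logistic
# regression by maximum likelihood and compute DeltaOR(Xi), the odds ratio
# associated with a one-standard-deviation increase in metric Xi.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500

# Hypothetical function-level metrics (synthetic values): one code metric
# (SLOC) and one slice-based cohesion metric (Coverage).
data = pd.DataFrame({
    "SLOC": rng.normal(100, 30, n),
    "Coverage": rng.uniform(0.0, 1.0, n),
})

# Synthetic dependent variable: Y = 1 if the function has at least one
# post-release fault (generated here from an assumed underlying model).
true_logit = -2.0 + 0.02 * data["SLOC"] - 1.5 * data["Coverage"]
data["Y"] = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))

# Multivariate logistic regression: Pr(Y = 1) = e^(a + b1*X1 + ... + bn*Xn)
# / (1 + e^(...)), with coefficients estimated by maximizing the log-likelihood.
X = sm.add_constant(data[["SLOC", "Coverage"]])
fit = sm.Logit(data["Y"], X).fit(disp=False)

# DeltaOR(Xi) = exp(beta_i * sigma_i): > 1 means a positive association with
# fault-proneness, = 1 no association, < 1 a negative association.
for name in ["SLOC", "Coverage"]:
    delta_or = np.exp(fit.params[name] * data[name].std())
    print(f"DeltaOR({name}) = {delta_or:.3f}")
```

statsmodels is used here because its Logit model performs unregularized maximum-likelihood estimation, matching the description above; any equivalent logistic regression routine could be substituted.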