(1) We investigate the predictive effectiveness of unsupervised models in effort-aware JIT defect prediction, which is an important topic in the area of defect prediction.

(2) We perform an in-depth evaluation of the simple unsupervised techniques (i.e., the unsupervised models in Section 3.2) under three prediction settings (i.e., cross-validation, time-wise cross-validation, and across-project prediction).

(3) We compare the simple unsupervised models with the state-of-the-art supervised models in the recent literature [8, 13]. The experimental results show that many simple unsupervised models perform significantly better than the state-of-the-art supervised models.

The rest of this paper is organized as follows. Section 2 introduces the background on defect prediction. Section 3 describes the employed experimental methodology. Section 4 provides the experimental setup of our study, including the subject projects and the data sets. Section 5 reports the experimental results in detail. Section 6 examines the threats to the validity of our study. Section 7 concludes the paper and outlines directions for future work.

2. BACKGROUND

In this section, we first introduce the background on just-in-time defect prediction and then on effort-aware defect prediction. Finally, we describe the existing work on the application of unsupervised models in traditional defect prediction.

2.1 Just-in-time Defect Prediction

The origin of JIT (just-in-time) defect prediction can be traced back to Mockus and Weiss [28], who used a number of change metrics to predict the probability that a change is defect-inducing. For practitioners, JIT defect prediction is of more practical value than traditional defect prediction at the module (e.g., file or class) level. As stated by Kamei et al. [13], the reason is not only that the predictions can considerably narrow down the code to be inspected for finding latent defects, but also that the predictions can be made at check-in time, when the change details are still fresh in the minds of the developers. In particular, in traditional defect prediction, after a module is predicted as defect-prone, it may be difficult to find the specific developer who is most familiar with the code to inspect the module for latent defects. In JIT defect prediction, however, it is easy to find such a developer to inspect a predicted defect-prone change, as each change is associated with a particular developer.

In the last decade, Mockus and Weiss's study led to an increasing interest in JIT defect prediction. Śliwerski et al. [37] studied defect-inducing changes in two open-source software systems and found that changes committed on Fridays had a higher probability of being defect-inducing. Eyolfson et al. [6] studied the influence of commit time and developer experience on the existence of defects in a software change. Their results showed that changes committed between midnight and 4 AM were more likely to be defect-inducing than changes committed between 7 AM and noon. Yin et al. [40] studied the bug-fixing changes in several open-source software systems. They found that around 14.2% to 24.8% of the bug-fixing changes for post-release bugs were themselves defect-inducing, and that concurrency defects were the most difficult to fix. Kim et al. [15] used numerous features extracted from various sources, such as change metadata, source code, and change log messages, to build prediction models for defect-inducing changes. Their results showed that defect-inducing changes can be predicted at 60% recall and 61% precision on average.

2.2 Effort-aware Defect Prediction

Although the above-mentioned results were encouraging, they did not take into account the effort required for quality assurance when applying JIT defect prediction models in practice. Arisholm et al. [1] pointed out that, when locating defects, it is important to take into account the cost-effectiveness of using defect prediction models to focus on verification and validation activities. This viewpoint has recently been adopted by many module-level defect prediction studies [12, 25, 34, 33, 32, 39, 42]. Inspired by this viewpoint, Kamei et al. applied effort-aware evaluation to JIT defect prediction [13]. In their study, Kamei et al. used the total number of lines modified by a change as a measure of the effort required to inspect that change. In particular, they leveraged linear regression to build an effort-aware JIT defect prediction model (called the EALR model). In their model, the dependent variable was Y(x)/Effort(x), where Effort(x) was the effort required to inspect the change x, and Y(x) was 1 if x was defect-inducing and 0 otherwise. The results showed that, on average, the EALR model could detect 35% of all defect-inducing changes when using only 20% of the effort required to inspect all changes. As such, Kamei et al. believed that effort-aware JIT defect prediction was able to focus on the riskiest changes and hence could reduce the cost of developing high-quality software [13].
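To make the EALR setup concrete, here is a minimal sketch of how such a model could be trained and scored. It is not Kamei et al.'s implementation: the function names, the metric matrix X, and the evaluation helper are our own illustrative assumptions; only the dependent variable Y(x)/Effort(x), the lines-modified effort measure, and the 20%-of-effort evaluation point come from the study.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def train_ealr(X, defective, effort):
    """Fit an EALR-style model: ordinary linear regression whose
    dependent variable is Y(x)/Effort(x), i.e. the defect label per
    unit of inspection effort (effort is assumed to be positive)."""
    model = LinearRegression()
    model.fit(X, defective / effort)
    return model

def recall_at_effort(model, X, defective, effort, budget=0.20):
    """Inspect changes in descending predicted-risk order until
    `budget` (e.g. 20%) of the total effort is spent; return the
    fraction of all defect-inducing changes caught by that point."""
    order = np.argsort(-model.predict(X))            # riskiest first
    spent = np.cumsum(effort[order]) / effort.sum()  # cumulative effort share
    caught = defective[order][spent <= budget].sum()
    return caught / max(defective.sum(), 1)
```

With data comparable to Kamei et al.'s, recall_at_effort(model, X, defective, effort, budget=0.20) would be expected to return roughly 0.35 for the EALR model, matching their reported average.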
2.3 Unsupervised Models in Traditional Defect Prediction

In the last few decades, supervised models have been the dominant defect prediction paradigm in traditional defect prediction at the module (e.g., file or class) level [1, 11, 13, 15, 25, 26, 28, 33, 32, 42, 43, 44]. In order to build a supervised model, we need to collect defect data, such as the number of defects or the label information (buggy or not buggy), for each module. For practitioners, it may be expensive to apply such supervised models in practice, as collecting the defect data is generally time-consuming and costly. Furthermore, supervised defect prediction models cannot be built at all if the defect data are unavailable. This is especially true when a new type of project is developed or when historical defect data have not been collected.

Compared with supervised models, unsupervised models do not need the defect data. Due to this advantage, recent years have seen increasing effort devoted to applying unsupervised modeling techniques to build defect prediction models [3, 26, 41, 42]. In practice, however, it is more important to know the effort-aware prediction performance of unsupervised defect prediction models. To this end, many researchers have investigated whether unsupervised models remain effective when taking into account the effort needed to inspect the modules that are predicted as "defect-prone". Koru et al. [16, 17, 18, 19] suggested that smaller modules should be inspected first, as more defects would be detected per unit of code inspection effort. The reason is that the relationship between module size and the number of defects was found to be logarithmic rather than linear [16, 18], indicating that defect-proneness increases at a slower rate as module size increases. Menzies et al. [26] used the name ManualUp for this "smaller modules inspected first" strategy of Koru et al.
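As an illustration of the "smaller modules inspected first" strategy, below is a minimal sketch under our own assumptions: modules are records with a hypothetical 'sloc' size field, and inspection effort is proxied by module size. Nothing here beyond the ranking idea itself comes from the cited studies.

```python
def manual_up(modules, budget=0.20):
    """Inspect smaller modules first (the ManualUp strategy).

    Because the number of defects grows roughly logarithmically with
    module size (defects ~ a + b*ln(sloc)), defect density is higher
    in smaller modules, so inspecting them first finds more defects
    per line inspected. Returns the size-ascending prefix of modules
    that fits within `budget` (default 20%) of the total effort.
    """
    ranked = sorted(modules, key=lambda m: m["sloc"])
    total_effort = sum(m["sloc"] for m in modules)
    selected, spent = [], 0
    for m in ranked:
        if spent + m["sloc"] > budget * total_effort:
            break
        selected.append(m)
        spent += m["sloc"]
    return selected


# Hypothetical usage: with a 20% budget over 4065 total lines,
# only the two smallest modules (45 + 120 = 165 lines) fit.
modules = [{"name": "a.c", "sloc": 120}, {"name": "b.c", "sloc": 900},
           {"name": "c.c", "sloc": 45}, {"name": "d.c", "sloc": 3000}]
print([m["name"] for m in manual_up(modules)])  # -> ['c.c', 'a.c']
```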