Effort-Aware Just-in-Time Defect Prediction: Simple Unsupervised Models Could Be Better Than Supervised Models

Yibiao Yang¹, Yuming Zhou¹*, Jinping Liu¹, Yangyang Zhao¹, Hongmin Lu¹, Lei Xu¹, Baowen Xu¹, and Hareton Leung²
¹ Department of Computer Science and Technology, Nanjing University, China
² Department of Computing, Hong Kong Polytechnic University, Hong Kong, China
* Corresponding author: zhouyuming@nju.edu.cn

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
FSE'16, November 13-18, 2016, Seattle, WA, USA
© 2016 ACM. ISBN 978-1-4503-4218-6/16/11...$15.00
DOI: http://dx.doi.org/10.1145/2950290.2950353

ABSTRACT

Unsupervised models do not require defect data to build the prediction models and hence incur a low building cost and gain a wide application range. Consequently, it would be more desirable for practitioners to apply unsupervised models in effort-aware just-in-time (JIT) defect prediction if they can predict defect-inducing changes well. However, little is currently known about their prediction effectiveness in this context. We aim to investigate the predictive power of simple unsupervised models in effort-aware JIT defect prediction, especially compared with the state-of-the-art supervised models in the recent literature. We first use the most commonly used change metrics to build simple unsupervised models. Then, we compare these unsupervised models with the state-of-the-art supervised models under cross-validation, time-wise cross-validation, and across-project prediction settings to determine whether they are of practical value. The experimental results, from open-source software systems, show that many simple unsupervised models perform better than the state-of-the-art supervised models in effort-aware JIT defect prediction.

CCS Concepts

• Software and its engineering → Risk management; Software development process management;

Keywords

Defect, prediction, changes, just-in-time, effort-aware

1. INTRODUCTION

Recent years have seen an increasing interest in just-in-time (JIT) defect prediction, as it enables developers to identify defect-inducing changes at check-in time [7, 13]. A defect-inducing change is a software change (i.e. a single commit or several consecutive commits in a given period of time) that introduces one or several defects into the source code of a software system [37]. Compared with traditional defect prediction at module (e.g. package, file, or class) level, JIT defect prediction works at a finer granularity. As stated by Kamei et al. [13], it allows developers to inspect an order of magnitude fewer SLOC (source lines of code) to find latent defects. This could provide large savings in effort over traditional coarser-granularity defect prediction. In particular, JIT defect prediction can be performed at check-in time [13]. This allows developers to inspect the code changes for latent defects while the change details are still fresh in their minds. As a result, it is possible to find the latent defects faster. Furthermore, compared with conventional non-effort-aware defect prediction, effort-aware JIT defect prediction takes into account the effort required to inspect the modified code for a change [13]. Consequently, effort-aware JIT defect prediction would be more practical for practitioners, as it enables them to find more latent defects per unit code inspection effort. Currently, there is a significant strand of interest in developing effective effort-aware JIT defect prediction models [7, 13].

Kamei et al. [13] used a supervised method (i.e. linear regression) to build an effort-aware JIT defect prediction model. To the best of our knowledge, this was the first work to introduce the effort-aware concept into JIT defect prediction. Their results showed that the proposed supervised model was effective in effort-aware performance evaluation compared with the random model. This work is significant, as it could help find more defect-inducing changes per unit code inspection effort.
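To make "more defect-inducing changes per unit code inspection effort" concrete, the following minimal sketch counts how many defect-inducing changes are reached before an inspection budget runs out under a given inspection order. It is an illustration only, not the evaluation procedure of this paper or of Kamei et al. The `Change` class, the use of churn (lines added plus deleted) as the effort measure, the 20% budget, and the toy data are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Change:
    """A code change; every value below is hypothetical toy data."""
    change_id: str
    churn: int        # lines added + deleted, assumed here to be the inspection effort
    defective: bool   # whether the change induced a defect


def defects_found(changes: List[Change],
                  key: Callable[[Change], float],
                  budget_fraction: float = 0.2) -> int:
    """Count defect-inducing changes inspected within an effort budget.

    Changes are inspected in ascending order of `key`; inspection stops
    as soon as the next change would exceed `budget_fraction` of the
    total churn.
    """
    budget = budget_fraction * sum(c.churn for c in changes)
    spent = 0
    found = 0
    for c in sorted(changes, key=key):
        if spent + c.churn > budget:
            break
        spent += c.churn
        found += int(c.defective)
    return found


changes = [
    Change("c1", 400, True),
    Change("c2", 10, True),
    Change("c3", 250, False),
    Change("c4", 20, True),
    Change("c5", 300, True),
    Change("c6", 20, False),
]

# Smallest changes first: three changes fit in the budget, two are defective.
print(defects_found(changes, key=lambda c: c.churn))   # 2
# Largest changes first: the first change alone exceeds the budget.
print(defects_found(changes, key=lambda c: -c.churn))  # 0
```

With the same budget, the two orderings reach different numbers of defect-inducing changes. An effort-aware evaluation rewards exactly this difference, which a plain classification accuracy measure would not capture.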
In practice, however, it is often time-consuming and expensive to collect the defect data (used as the dependent variable) needed to build supervised models. Furthermore, for many new projects, defect data are unavailable, in which case supervised models are not applicable. Unlike supervised models, unsupervised models do not need defect data to build the defect prediction models. Therefore, for practitioners, it would be more desirable to apply unsupervised models if they can predict defects well.

According to recent studies [16, 17, 18, 19, 26, 42], simple unsupervised models, such as the ManualUp model, in which modules are prioritized in ascending order of code size, are effective in the context of effort-aware defect prediction at coarser granularity. Up to now, however, little is known about the practical value of simple unsupervised models in the context of effort-aware JIT defect prediction.

The main contributions of this paper are as follows: