in P and P'. However, consider a line of code s in P and P', for which the same profiler reported different coverage results, i.e., CP(s) ≠ CP'(s), where CP(s) refers to the profiled runtime execution count of statement s in program P.

The execution count CP(s) is usually a nonnegative number, except for a special value −1 indicating unknown coverage information. This can happen when the coverage profiler fails to obtain the coverage information of statement s due to the information loss caused by the abstraction gap between the source code and the intermediate code produced during compilation. Given CP(s) ≠ CP'(s), one of the following two cases applies:

1) (Strong Inconsistency) CP(s) ≥ 0 ∧ CP'(s) ≥ 0, meaning that the coverage profiler reports inconsistent coverage information. There is definitely a bug in the profiler, because P' should follow exactly the same execution path as P, assuming that the coverage statistics of program P are correct.

2) (Weak Inconsistency) CP(s) = −1 ∨ CP'(s) = −1, indicating inaccurate statistics, because a non-instrumented line is actually executed in its equivalent. This is also certainly a bug, because non-optimized coverage statistics should faithfully reflect the program's execution path.

The self-validator Cod fully exploits the inconsistencies between path-equivalent programs with zero false positives.
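The two definitions above can be sketched as a small classification routine. The function name and the encoding of coverage maps below are illustrative assumptions, not part of the paper; only the case analysis follows the definitions (counts are nonnegative, −1 means "no information").

```python
# Illustrative sketch (names are hypothetical): classify the discrepancy
# between the counts reported for the same line s in path-equivalent
# programs P and P'. A count is a nonnegative integer, or -1 when the
# profiler provided no coverage information for that line.

def classify(c_p, c_p_prime):
    """Return None (consistent), "strong", or "weak" per the definitions."""
    if c_p == c_p_prime:
        return None                 # identical statistics: no bug signal
    if c_p >= 0 and c_p_prime >= 0:
        return "strong"             # both lines counted, but counts differ
    return "weak"                   # exactly one side reported -1

# Examples: equal counts are consistent; differing counts are a strong
# inconsistency; a counted line vs. an uninstrumented line is weak.
print(classify(3, 3))    # None
print(classify(3, 2))    # strong
print(classify(3, -1))   # weak
```

Either outcome is a definite profiler bug for path-equivalent programs, which is why Cod can report both kinds with no false positives.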
Cod addresses the limitation of C2V discussed in Section II-C and handles weak inconsistencies, whereas C2V has to ignore all weak inconsistencies between independent profiler implementations to avoid being flooded by false positives. It is worth noting that this technique of obtaining path-equivalent programs, first known as EMI, was originally proposed to validate the correctness of compiler optimizations [2]. We found that this idea is also powerful in the validation of coverage profilers. Nevertheless, Cod differs from EMI in that we propose specialized mutation strategies to acquire the program variants, and adopt a results-verification criterion tailored to testing coverage profilers. We defer the comparison details to Section V.

Results We implemented Cod as a prototype and evaluated it on two popular coverage profilers, namely gcov and llvm-cov, integrated with the compilers gcc and llvm, respectively. As of the submission deadline, a total of 196 potential bugs (123 for gcov, 73 for llvm-cov) were uncovered within 4 months, among which 23 bugs have already been confirmed by the developers. Promisingly, all the detected bugs are new bugs according to the developers' feedback.

Outline The rest of the paper is organized as follows. We introduce the necessary background and a brief motivation in Section II. Section III elaborates on the detailed approach, followed by the evaluation in Section IV. We discuss related work in Section V and conclude the paper in Section VI.

II. BACKGROUND AND MOTIVATION

A. Coverage Profilers

Code coverage profiling data (each line of code's execution count in a program execution) is the foundation of a broad spectrum of software engineering practices. Code coverage is the most widely adopted criterion for measuring testing thoroughness, and is also widely used in the automation of software engineering tasks.
For example, test input generation techniques leverage code coverage to guide search iterations [3]; fault localization techniques use code coverage to isolate potentially faulty branches [4]–[6].

To obtain code coverage statistics, a code coverage profiler maintains an execution counter for each source code line and updates the counters along with the program execution. Specifically, given a program P, a coverage profiler runs P and outputs for each line of code s ∈ P a number CP(s) = n, indicating that s was executed n times. A special value n = −1 indicates that the profiler provides no coverage information for this line.

Whether a profiler should report coverage statistics for a line of code s (i.e., whether CP(s) = −1) is not well defined. Code transformations (e.g., expansion of macros, or compilation from source code to intermediate code) and optimizations may lead to CP(s) = −1 for a line, and different profilers generally disagree on which lines should have CP(s) ≥ 0. We will later see that this is a major limitation of existing techniques for validating coverage profilers.

B. Validating Coverage Profilers

Validating the correctness of a coverage profiler is challenging because it is labor-intensive to obtain the ground truth of coverage statistics. Though we have a large number of test inputs (any program used to test a compiler also works for testing a profiler), the lack of a test oracle is the problem.

The only known technique to uncover coverage profiler bugs is C2V, which is based on differential testing [11]. Given a program P under profiling, C2V profiles it using two independently implemented profilers to obtain, for each statement s, the coverage statistics CP(s) and C'P(s). When CP(s) ≠ C'P(s), an inconsistency is found. When CP(s) ≥ 0 ∧ C'P(s) ≥ 0 (a strong inconsistency), C2V reports it as a bug candidate and uses clustering to filter out potential false positives and duplicates.
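A C2V-style comparison can be sketched as follows. The function, the dictionary encoding of per-line counts, and the sample data are all illustrative assumptions (C2V itself operates on real gcov/llvm-cov reports); the sketch only shows the filtering step: strong inconsistencies become bug candidates, while weak ones are dropped because the two profilers legitimately differ on which lines to instrument.

```python
# Hypothetical sketch of differential comparison between two profilers'
# per-line coverage maps. A missing line or an explicit -1 means the
# profiler gave no information for that line. Only strong inconsistencies
# (both counts known, but different) are reported as bug candidates.

def diff_profilers(cov_a, cov_b):
    suspicious = []
    for line in sorted(set(cov_a) | set(cov_b)):
        a = cov_a.get(line, -1)
        b = cov_b.get(line, -1)
        if a != b and a >= 0 and b >= 0:
            suspicious.append(line)     # strong inconsistency: candidate
        # a != b with a < 0 or b < 0 is a weak inconsistency: ignored,
        # since it usually reflects differing instrumentation decisions.
    return suspicious

# Made-up outputs for the same run of the same program:
profiler_a = {1: 1, 2: 5, 3: -1, 4: 0}
profiler_b = {1: 1, 2: 4, 3: 2,  4: 0}
print(diff_profilers(profiler_a, profiler_b))   # [2] — line 3 is only weak
```

Discarding the weak inconsistencies here is exactly the lost signal that Cod recovers: between path-equivalent variants of one program under one profiler, a weak inconsistency is a genuine bug rather than noise.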
C. Limitations of Differential Testing

Though effective in uncovering coverage profiler bugs, differential testing has the following major limitations.

First, differential testing cannot be applied when there is only a single coverage profiler. This is the case for many mainstream programming languages (e.g., Python and Perl).

Second, differential testing requires heavy human effort in analyzing the reports, because it is hard to determine which profiler is faulty when two profilers disagree on the statistics of a line of code.

Third, differential testing misses many potential bugs on weak inconsistencies. Two profilers can have inconsistent (but both