programs using the whole text. However, we use Csmith [19] as the random program generator, and two Csmith-generated programs are not meaningfully comparable as they diverge in many ways [20].
In addition, calculating similarities between programs using the whole text is expensive. To tackle this challenge, only the lines of code triggering inconsistencies are used for computing similarities between programs.

Challenge 2: Reducing Test Programs. Reducing test programs for code coverage bugs is much more complex than reducing test programs for compiler bugs, as the latter only requires testing the behavior of the compiled executables or the exit code of the compilers [18]. In contrast, reducing test programs for code coverage bugs involves processing textual code coverage reports and identifying inconsistencies. After each iteration of reduction, we need to specify the inconsistency of interest that we would like to preserve. In this study, we design a set of inconsistency types over coverage reports to serve as the inconsistencies of interest.

Challenge 3: Inspecting Coverage Bugs. Given the reduced test program, we need to inspect which code coverage tools have bugs before reporting them. In practice, this is usually done manually [21]. In other words, developers manually inspect the coverage reports to determine which coverage tools are buggy. To relieve the burden of manual intervention, we summarize a number of rules that code coverage reports must follow. With those rules, we develop a tool to examine part of the inconsistent coverage reports and determine automatically which tools have bugs.

We implemented a differential testing prototype called C2V ("Code Coverage Validation") for code coverage tools. In order to evaluate the effectiveness of our approach, we applied C2V to gcov and llvm-cov, two widely used C code coverage tools in the production compilers GCC [22] and Clang [17], respectively. Our evaluation confirms that C2V is very effective in finding code coverage bugs: 46 bugs were found (42 confirmed/fixed) for gcov, while 37 bugs were found (28 confirmed/fixed) for llvm-cov.

Contributions.
We made the following contributions:
• We introduce an effective testing approach to validating code coverage tools, and have implemented it as a practical tool, C2V, for testing C coverage tools. C2V mainly consists of a random program generator, a comparer to identify inconsistencies between coverage reports, a filter to remove test programs triggering the same coverage bugs, a test program reducer, and an inspector to automatically determine which coverage tools have bugs for bug reporting.
• We applied C2V to uncover 46 and 37 bugs in gcov and llvm-cov, respectively, both of which are widely used and extensively tested C code coverage tools. Specifically, for gcov, 42 bugs have already been confirmed/fixed; for llvm-cov, 28 bugs have been confirmed/fixed.
• Our evaluation indicates that code coverage tools are not as reliable as might have been envisaged. This opens up a new research direction, improving the reliability of code coverage tools, which calls for more attention. Besides, there is a need to examine the influence of those bugs on other techniques that depend on code coverage.

Organization. The rest of this paper is structured as follows. Section II introduces the background on code coverage. Section III describes our approach for code coverage validation. Section IV reports the experimental results in detail. Section V surveys related work. Section VI concludes the paper and outlines directions for future work.

II. CODE COVERAGE

In this section, we introduce the preliminary knowledge, the importance, and the bug categories of code coverage.

A. Preliminaries

Code coverage is a quality assurance metric used to describe the degree to which the code of a given program is executed when a particular test suite executes [1]. It is suggested that a program with high test coverage has a lower chance of containing undetected software bugs than a program with low test coverage.
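The metric itself can be illustrated with a small sketch (the per-line execution counts below are made up for illustration; they are not the output of gcov, llvm-cov, or any real tool):

```python
# Hypothetical per-line execution counts for a 6-line program,
# in the style a coverage tool reports them: line number -> times executed.
line_counts = {1: 1, 2: 1, 3: 0, 4: 5, 5: 0, 6: 1}

def line_coverage(counts):
    """Fraction of instrumented lines executed at least once."""
    executed = sum(1 for c in counts.values() if c > 0)
    return executed / len(counts)

print(f"line coverage: {line_coverage(line_counts):.0%}")  # prints "line coverage: 67%"
```

A higher fraction means more of the program was exercised by the test suite; lines 3 and 5 here are the unexecuted code a developer would target with new tests.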
The code coverage analysis process is generally divided into three tasks: code instrumentation, data gathering, and coverage analysis. Specifically, code instrumentation inserts additional code instructions or probes to monitor whether a specific program chunk is executed at runtime. The instrumentation can be done at the source level in a separate pre-processing phase, or at runtime by instrumenting byte code. Data gathering aims to collect the coverage data produced at test runtime. Finally, coverage analysis aims to analyze the collected results and to provide test strategy recommendations in order to reduce, to feed, or to modify the relevant test suite.

Currently, many code coverage tools are available [16], [23]–[31], which support different languages (e.g., C/C++ and Java), instrumentation levels (e.g., source code and byte code level), or coverage criteria. Coverage criteria are the rules or requirements that a test suite needs to satisfy [32]. In the literature, many coverage criteria have been proposed, including statement coverage, branch coverage, path coverage, condition coverage, decision coverage, and data-flow coverage [33]. These criteria can be used to guide the generation of a test suite or to evaluate the effectiveness of a test suite [33].

B. Importance of Code Coverage

Code coverage is widely used in, but not limited to, the following software techniques:
• Coverage-based regression testing. In the context of regression testing, test case prioritization and test suite augmentation are the two widely used techniques [2]–[4], [10], [15], [34]–[36]. The former aims to improve the ability of test cases to find bugs by scheduling test cases in a specific order [10], [15], [37]. One common practice is to achieve high code coverage as fast as possible [38]. The latter is to generate new test cases to strengthen the ability of a test suite to find bugs [6], [14], [36], [39].
In practice, new test cases are often generated to cover the source code affected by code changes.
• Coverage-based compiler testing. Recent years have seen an increasing interest in compiler testing, which aims to validate