to integrate the necessary instrumentation code into the executable. While executing P_exe with input i, we obtain the output O of the program. Meanwhile, the code coverage report C can be readily extracted by the coverage profiler T. Each code coverage report records which lines of the test program P were executed, and which were not, under the input i. Statements marked as unexecuted are randomly pruned to generate P's equivalent variants, as discussed shortly. Cod supports both gcov and llvm-cov. Taking gcov as an example, Cod extracts coverage information by compiling the program P with the flags “-O0 --coverage” under gcc. These flags tell the compiler to insert additional instrumentation code into the object files so that extra profiling information is generated at runtime. Cod then runs the executable under input i to produce the coverage report for the program P.

Generating Variants via Transformation. Based on the coverage report C for the original program P, its variants are generated (Line 5). Cod produces the variants by stochastically removing unexecuted program statements from the original program P. Specifically, for each of these removable lines of code, we make a random choice of whether to remove it.
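As an illustration of this per-line random choice, the following Python sketch prunes unexecuted lines from a program given its per-line execution counts. It is not Cod's implementation: the function name gen_variant_candidate, the keep_prob parameter, and the dictionary representation of the coverage report are assumptions made for this example, and the sketch neither parses real gcov output nor checks that the result still compiles.

```python
import random


def gen_variant_candidate(source_lines, exec_counts, rng, keep_prob=0.5):
    """Randomly prune lines that the coverage report marks as unexecuted.

    source_lines: list of source lines (index 0 is line 1).
    exec_counts:  dict mapping 1-based line number -> execution count.
                  A count of 0 means "unexecuted"; a missing entry or -1
                  means "unknown", and such lines are never removed.
    rng:          a random.Random instance, so runs can be reproduced.
    keep_prob:    probability of keeping each removable line (hypothetical knob).

    Returns a list of (original_line_number, text) pairs for the kept lines.
    """
    kept = []
    for line_no, text in enumerate(source_lines, start=1):
        removable = exec_counts.get(line_no, -1) == 0
        # Independent random choice for every removable line.
        if removable and rng.random() >= keep_prob:
            continue  # prune this unexecuted line
        kept.append((line_no, text))
    return kept


if __name__ == "__main__":
    program = [
        "int main() {",
        "  int x = 0;",
        "  if (x) {",
        "    x = 1;",
        "  }",
        "  return x;",
        "}",
    ]
    # Hypothetical coverage data: only line 4 is reported as unexecuted.
    counts = {1: 1, 2: 1, 3: 1, 4: 0, 5: -1, 6: 1, 7: 1}
    for line_no, text in gen_variant_candidate(program, counts, random.Random(0)):
        print(line_no, text)
```

Keeping the original line numbers alongside the surviving text makes it straightforward to identify the lines shared by the program and a candidate variant later on.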
As such, we obtain a number of variants P' that should be equivalent to the original program. The function genVariant (Lines 13-20) describes Cod's process for generating equivalent mutants via transformation. Note that stochastically removing unexecuted program statements leads to many uncompilable mutants; only the compilable ones are returned by genVariant (Line 17).

Comparing the Outputs and Coverage Reports. Having the outputs and the coverage reports for the original program P and its variants P', we detect bugs in the code coverage tool T by checking for inconsistencies. More specifically, we first compare the outputs of P and P'. If they are not identical, a potential bug in the code coverage tool is reported. Otherwise, the code coverage reports are further compared to look for inconsistencies. Note that only the coverage of the lines common to P and P' (i.e., those lines of code left in the variant program) is considered for comparison. If the code coverage reports are not consistent over the common lines, a potential bug is reported as well (Lines 9–12).

C. Illustrative Examples

In the following, we use three of our reported bugs to illustrate how Cod works. All three bugs were newly discovered by Cod and confirmed by the GCC developers.

Bug Example Exposed by Different Outputs. Figure 3 shows a real bug in gcov [17], a C code coverage tool integrated in GCC [18], exposed via different outputs of two “equivalent” programs. Figure 3 (a) and (b) are the code coverage reports produced by gcov for the original program P and its equivalent program P' (obtained by removing an unexecuted Line 8), respectively. Note that all the test programs are reformatted for presentation. As can be seen, a code coverage report is an annotated version of the source code augmented with the execution frequency of each line. The first and second columns list the execution frequency and the line number.
The frequency number “-1” in the first column indicates that the coverage information is unknown. In this example, we first use gcc to compile the program P and then execute it to produce the output and coverage report (shown in Figure 3 (a)). Note that the output in this case is 0. According to the original code coverage report of P, Cod decides to remove the 6th statement from the original program, resulting in the equivalent program P' shown in Figure 3 (b). Next, we compile and execute P' to get the new output and coverage report. Here, the output turns out to be 1. Since the outputs of these two programs are not equal, P and P' are not in fact equivalent, meaning that we actually deleted some executed code: the code coverage tool wrongly marked some executed statements as not executed. A potential bug is thus identified. We reported this bug to Bugzilla. The gcov developers quickly confirmed and fixed it.

Bug Example Exposed by Strongly Inconsistent Coverage. Figure 4 illustrates another real bug uncovered by strongly inconsistent code coverage reports between the program and its “equivalent” variant. Figure 4 (a) shows the coverage report for P. We can read from it that Line 10 is not executed at all (i.e., the execution count is 0). Cod prunes Line 10 to generate the equivalent program P'. After compiling and executing P', another coverage report, shown as Figure 4 (b), is produced. As can be seen, there is a strong inconsistency in terms of the execution frequency of Line 6, indicating a potential bug. This bug has been submitted to and confirmed by the gcov developers.

Bug Example Exposed by Weakly Inconsistent Coverage. Figure 5 presents another confirmed real bug found via weakly inconsistent code coverage reports between the program and its equivalent variant. In Figure 5 (a), Line 6 in P is not executed (i.e., the execution count is 0). Cod removes Line 6 to generate the equivalent program P'.
Upon compiling and executing P', another coverage report, shown as Figure 5 (b), is generated. A weak inconsistency with respect to the execution frequency of Line 5 appears, indicating a potential bug.

IV. EVALUATION

This section presents our evaluation of Cod. We evaluated Cod using the most popular practical code coverage profilers, gcov and llvm-cov, and a set of test programs for testing compilers, and compared the results with the existing differential technique C2V [11].

A. Evaluation Setup

Profilers for Validation. We evaluated Cod using the latest versions of gcov and llvm-cov, the two most popular code coverage profilers for C programs, as our experimental subjects. Both profilers are:

1) popular in the software engineering community;