1: 1:int main() 11 -1lint main() 11 -1lint main() =1 2:1 21×11{ 2111{ -1: 3: switch (8) 3111 switch (8) 31 11 switch (8) -1: 4 411 4111 =1g case 8: 5111 case 8: 51y11 case 8: 1: 6 break; 61 1 break; 61 1 break; -1: 7 default: 7|11 default: 711 default: -1:8: abort () 81×01 abort () 81x0 i//abort () -1:9: br色ak: 91.1 break: 91x01 b色a: -1:10: 10111 10111 1:11: return 0; 1111 return 0; 11111 return 0; =1:12: 121/111 121111 (a)Cp (gcov) (b)Cp (llvm-cov) ©CPAias}U(ss}Ivm-cov) Fig.I.The bug case of LLVM #41821.Ilvm-cov incorrectly reported that the break in Line 9 is executed.This bug cannot be detected by differential testing [11]because gcov does not provide coverage statistics for Line 9.Visual conventions of coverage statistics:For a line of code s,gcov and llvm-cov output coverage statistics(s)in the first and second column,respectively.A-1 denotes that the profiler does not provide coverage information of s.A check mark or cross mark followed by a number n denotes that cp(s)=n. correct)interpretations over the coverage statistics.Section IV Program P Program Program P reveal that 92.9%of inconsistencies found by differential Pruner testing are weak,i.e.,Cp(s)Cp(s)with Cp(s)=-1V Cp(s)=-1.Our motivating example in Figures 1 (a)-(b) showed that for 10/12 lines,exactly one of gcov or llvm-cov Coverage Coverage profiler profiler provides no coverage information.Unfortunately.C2V has to ignore all of such weak consistencies (i.e.,not report any of them as a bug)because the vast majority of them are attributed to the feature of independently implemented profilers. Coveraoe Coverage Output O report C Finally,differential testing reports false positive even when report C Output O' two profilers report strongly inconsistent coverage statistics, because two profilers may disagree over the definition of the Consistent? execution count of a line,e.g..whether the initialization code of a for loop counts for one time of execution. D.Motivation Bug report The key observation leading to automatic self-validation of No a single profiler is that changing unexecuted statements in a Equal? program should not affect the coverage statistics.Take the program in Figure I (a)(b)as an example.Suppose that we comment out the function call in Line 8 of P and obtain Fig.2.The framework of Cod =P{ss}u[ss}as shown in Figure 1 (c)We assert that unchanged statements should have identical coverage statistics, the profiler because we have full control over P(thus easily i.e., to guarantee it is deterministic)and there is little chance that Vs ePnp'.Cp(s)=Cp(s), the compiler/hardware is defective. In the motivating example,Cp(sg)Cp(sg)revealed a reasonably assuming that: previously unknown profiler bug in which llvm-cov incorrectly 1)P is deterministic,contains no undefined behavior,and reported an unexecuted break as being executed once.This does not depend on external environment; bug case is missed by differential testing (particularly,C2V) 2)the coverage statistics is correct;and because all inconsistencies between Cp (gcov)and Cp(llvm- 3)the executions of P and p are consistent with their cov)are weak (Lines 1-5,7-10,and 12).Generally,weak semantics. inconsistencies between different compiler implementations Since we only remove "unexecuted"statements reported indicate different compilation strategies (thus do not indicate by a coverage profiler,P and p'should be semantically a profiler bug)and should not be reported by c2V. equivalent.Furthermore,a profiler(particularly under minimal III.APPROACH optimization)should be self-consistent in terms of which statement should have a coverage statistics.Therefore,if there A.Metamorphic Testing is an inconsistency Cp(s)Cp(s),no matter whether A test oracle is a mechanism for determining whether it is a strong or weak inconsistency,either of the above a test has passed or failed.Under certain circumstances, assumptions is violated.It turns out that we should blame however,the oracle is not available or too expensive to achieve. 811: 1:int main() -1: 2:{ -1: 3: switch (8) -1: 4: { -1: 5: case 8: 1: 6: break; -1: 7: default: -1: 8: abort (); -1: 9: break; -1: 10: } 1: 11: return 0; -1: 12:} 1| -1|int main() 2| 1|{ 3| 1| switch (8) 4| 1| { 5| 1| case 8: 6| 1| break; 7| 1| default: 8| ×0| abort (); 9| 1| break; 10| 1| } 11| 1| return 0; 12| 1|} 1| -1|int main() 2| 1|{ 3| 1| switch (8) 4| 1| { 5| 1| case 8: 6| 1| break; 7| 1| default: 8| ×0| ; // abort (); 9| ×0| break; 10| 1| } 11| 1| return 0; 12| 1|} (a) CP (gcov) (b) CP (llvm-cov) (c) CP\{s8}∪{s 8} (llvm-cov) Fig. 1. The bug case of LLVM #41821. llvm-cov incorrectly reported that the break in Line 9 is executed. This bug cannot be detected by differential testing [11] because gcov does not provide coverage statistics for Line 9. Visual conventions of coverage statistics: For a line of code s, gcov and llvm-cov output coverage statistics cP (s) in the first and second column, respectively. A -1 denotes that the profiler does not provide coverage information of s. A check mark or cross mark followed by a number n denotes that cP (s) = n. correct) interpretations over the coverage statistics. Section IV reveal that 92.9% of inconsistencies found by differential testing are weak, i.e., CP (s) = C P (s) with CP (s) = −1 ∨ C P (s) = −1. Our motivating example in Figures 1 (a)–(b) showed that for 10/12 lines, exactly one of gcov or llvm-cov provides no coverage information. Unfortunately, C2V has to ignore all of such weak consistencies (i.e., not report any of them as a bug) because the vast majority of them are attributed to the feature of independently implemented profilers. Finally, differential testing reports false positive even when two profilers report strongly inconsistent coverage statistics, because two profilers may disagree over the definition of the execution count of a line, e.g., whether the initialization code of a for loop counts for one time of execution. D. Motivation The key observation leading to automatic self-validation of a single profiler is that changing unexecuted statements in a program should not affect the coverage statistics. Take the program in Figure 1 (a)–(b) as an example. Suppose that we comment out the function call in Line 8 of P and obtain P = P\{s8}∪{s 8} as shown in Figure 1 (c) We assert that unchanged statements should have identical coverage statistics, i.e., ∀s ∈P∩P . CP (s) = CP (s), reasonably assuming that: 1) P is deterministic, contains no undefined behavior, and does not depend on external environment; 2) the coverage statistics is correct; and 3) the executions of P and P are consistent with their semantics. Since we only remove “unexecuted” statements reported by a coverage profiler, P and P should be semantically equivalent. Furthermore, a profiler (particularly under minimal optimization) should be self-consistent in terms of which statement should have a coverage statistics. Therefore, if there is an inconsistency CP (s) = CP (s), no matter whether it is a strong or weak inconsistency, either of the above assumptions is violated. It turns out that we should blame Bug report No Coverage report C Coverage report C’ Consistent? Equal? No Program P Program P’ Program Pruner Coverage profiler Coverage profiler Output O Output O’ Fig. 2. The framework of Cod the profiler because we have full control over P (thus easily to guarantee it is deterministic) and there is little chance that the compiler/hardware is defective. In the motivating example, CP (s9) = CP (s9) revealed a previously unknown profiler bug in which llvm-cov incorrectly reported an unexecuted break as being executed once. This bug case is missed by differential testing (particularly, C2V) because all inconsistencies between CP (gcov) and CP (llvmcov) are weak (Lines 1–5, 7–10, and 12). Generally, weak inconsistencies between different compiler implementations indicate different compilation strategies (thus do not indicate a profiler bug) and should not be reported by C2V. III. APPROACH A. Metamorphic Testing A test oracle is a mechanism for determining whether a test has passed or failed. Under certain circumstances, however, the oracle is not available or too expensive to achieve. 81