 1:  1: int f(int i) {
-1:  2:   int res;
 1:  3:   switch (i) {
×0:  4:   case 5:
×0:  5:     res = i - i;
×0:  6:     break;
 1:  7:   default:
 1:  8:     res = i * 2;
 1:  9:     break;
-1: 10:   }
 1: 11:   return res;
-1: 12: }
 1: 13: int main(void) {
 1: 14:   f(2);
 1: 15:   return 0;
-1: 16: }

(a) P (gcov)

 1:  1: int f(int i) {
-1:  2:   int res;
-1:  3:   switch (i) {
-1:  4:   case 5:
-1:  5:     // res = i - i;
-1:  6:     // break;
-1:  7:   default:
 1:  8:     res = i * 2;
 1:  9:     break;
-1: 10:   }
 1: 11:   return res;
-1: 12: }
 1: 13: int main(void) {
 1: 14:   f(2);
 1: 15:   return 0;
-1: 16: }

(b) P′ = P \ {s5, s6} ∪ {s′5, s′6} (gcov)

Fig. 6. In the case of gcov #90438, gcov refuses to report coverage information for the case statement after removing Lines #5–6, but reports the execution of its default branch, which may mislead a developer or an automated tool. Note that although Line #4 is not covered, it is not removed, since removing it would result in a compilation error.

                              TABLE III
SUMMARY OF THE TEST PROGRAMS WITH INCONSISTENT COVERAGE REPORTS BY COD
              ON THE CONSISTENT TEST PROGRAMS BY C2V.

                                    # inconsistent in terms of Cod
  # weakly consistent under C2V        gcov              llvm-cov
                                   Strong   Weak      Strong   Weak
              3745                   10      23         19       9

C. Discussions

Statistics of Inconsistencies. Table III summarizes the test programs in which inconsistencies are identified by Cod but cannot be identified by C2V. All these inconsistencies are true positives: each one is either a bug or may mislead a developer or an automated tool. Using these test programs to test gcov with Cod, we identified 23 weak and 10 strong inconsistencies. For llvm-cov, 28 weak and 19 strong inconsistencies are identified. This indicates that Cod is able to identify many inconsistencies that C2V cannot, given the same test programs. We thus believe that Cod is more powerful and useful than C2V.

Weak Inconsistencies Between Independently Implemented Coverage Profilers. As aforementioned, independently implemented code coverage profilers may interpret the same code differently. This is the major source of the weak inconsistencies that C2V cannot recognize as a bug. To further understand weak inconsistencies among profilers, we collect the common instrumentation sites between gcov 9.0.0 and llvm-cov 9.0 for the test programs in GCC testsuites 7.4.0. A code line s is a common instrumentation site (s ∈ C) if C_P^G(s) ≠ −1 ∧ C_P^L(s) ≠ −1, where C_P^G(s) and C_P^L(s) denote the runtime execution count of code line s in program P as profiled by gcov and llvm-cov, respectively, and −1 marks a line that is not instrumented. When C_P^G(s) ≠ C_P^L(s) ∧ (C_P^G(s) = −1 ∨ C_P^L(s) = −1), i.e., exactly one of the two profilers instruments the line, s is a non-common instrumentation site (s ∈ C̄).
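To make these definitions concrete, the following is a minimal sketch (not part of the paper's artifact) that classifies the lines of one program into C and C̄ from the per-line counts reported by the two profilers, and computes the per-program proportion of non-common sites p = |C̄| / (|C| + |C̄|) discussed below. The array names and counts are hypothetical; in practice the counts would be parsed from the profilers' reports rather than hard-coded.

#include <stdio.h>

/* Sketch only: classify the lines of one program P into common (C) and
 * non-common (C-bar) instrumentation sites from the per-line execution
 * counts reported by gcov and llvm-cov.  A count of -1 means the profiler
 * does not treat the line as an instrumentation site.  The counts below
 * are hypothetical. */
int main(void) {
    int gcov_count[] = { 1, -1, 1, 0,  0,  0, 1, 1, 1, -1 };
    int llvm_count[] = { 1, -1, 1, 0, -1, -1, 1, 1, 1, -1 };
    int n = (int)(sizeof gcov_count / sizeof gcov_count[0]);

    int common = 0, non_common = 0;
    for (int s = 0; s < n; s++) {
        int g = gcov_count[s], l = llvm_count[s];
        if (g != -1 && l != -1)
            common++;          /* s in C: instrumented by both profilers */
        else if (g != l)
            non_common++;      /* s in C-bar: instrumented by exactly one profiler */
        /* lines where both counts are -1 are instrumented by neither profiler */
    }

    /* per-program proportion of non-common sites, p = |C-bar| / (|C| + |C-bar|) */
    double p = (double)non_common / (double)(common + non_common);
    printf("|C| = %d, |C-bar| = %d, p = %.2f\n", common, non_common, p);
    return 0;
}

On the hypothetical counts above, the sketch reports |C| = 6, |C̄| = 2, and p = 0.25 for that program.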
                              TABLE IV
SUMMARY OF THE COMMON AND NON-COMMON INSTRUMENTATION SITES BETWEEN GCOV
       AND LLVM-COV FOR THE TEST PROGRAMS IN GCC TESTSUITES 7.4.0.
     C / C̄: NUMBER OF COMMON / NON-COMMON INSTRUMENTATION SITES.

               C                      C̄
         Total     Avg.         Total     Avg.
   #     83026     16.49        98523     19.56
   %     45.73%     -           54.27%     -

Only 5036 test programs in GCC testsuites 7.4.0 can be successfully compiled and further processed by both gcov and llvm-cov. Table IV summarizes the total number and total percentage of common and non-common instrumentation sites over these programs. The second and fourth columns show the total number and percentage of C and C̄, respectively; the third and last columns show the average number of C and C̄ sites per test program. From Table IV, we find that only about 46% of the instrumented code lines are in C, and that each test program has on average about 16 code lines in C.

Table V summarizes the statistics of the proportion of non-common sites for the 5036 test programs in GCC testsuites 7.4.0. For each test program we calculate the proportion p = |C̄| / (|C| + |C̄|), and then count how many test programs fall into each of the intervals listed in the second row of Table V. From Table V, we find that in most test programs about 40%∼70% of the code lines are in C̄. This indicates that most code lines of each program are instrumented by only one of the two coverage profilers. Besides, we also found that only 1.14% of the test programs have exactly the same instrumentation sites under the two profilers.

Overall, our core observation is that different coverage profilers indeed have quite different interpretations of the same piece of code.

Reliability of Code Coverage Profilers Under Compiler Optimizations. Finally, even though coverage profilers provide faithful statistics only under the zero optimization level, we