TABLE I
STATISTICS OF BUG-TRIGGERING TEST PROGRAMS.

                                  Inconsistent Reports
Profilers    Different Outputs    Strong    Weak
gcov         1                    69        54
llvm-cov     0                    62        11

in the test-suite shipped with the latest gcc release (7.4.0) and 5,000 random programs generated by csmith [19]. All evaluated programs contain neither external environmental dependencies nor undefined behavior. We ran Cod over all the test programs and collected all reported inconsistencies for manual inspection. We also compared these results with the state-of-the-art differential testing technique C2V [11].

Testing Environment. We evaluated the latest versions of gcov (up to gcc 9.0.1-20190414) and llvm-cov (up to llvm 9.0.0-svn358899) during our experiments. All experiments were conducted on a virtual machine with a hexa-core Intel(R) Core(TM) CPU@3.20GHz and 10GiB of RAM, running Ubuntu Linux 18.04.

B. Experimental Results

Inconsistent Reports. For each test case in our testbed, Cod generated only one variant for validation. This variant is obtained by removing from the original test case all the statements that the coverage profiler reports as unexecuted. Generating more variants for each test program may trigger more inconsistencies and would probably detect more bugs in those coverage profilers. Table I shows the statistics of bug-triggering test programs for the two code coverage profilers under test, i.e., gcov and llvm-cov. Column 2 gives the number of (test program, variant) pairs that produce different execution outputs, and Column 3 gives the number of pairs that yield inconsistent coverage reports.

The single case in which the variant outputs a different value (Figure 3) is due to incorrect coverage statistics causing Cod to create a functionally different "equivalent" mutated variant. The other inconsistencies are also due to profiler bugs, which are discussed as follows.
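The variant-generation step described above (dropping whatever the profiler reports as unexecuted) can be illustrated with a minimal sketch. It is not Cod's implementation: it assumes gcov's textual `.gcov` line layout (`count:lineno:source`, with `#####` or `=====` marking unexecuted lines and line number 0 carrying metadata), and it prunes whole lines rather than statements, which is only safe for simple, brace-delimited code. The function name `prune_unexecuted` and the sample report are hypothetical.

```python
# Line-level approximation of "remove all unexecuted statements".
# Assumption: input follows gcov's text report layout "count:lineno:source".

SAMPLE_REPORT = """\
        -:    0:Source:demo.c
        -:    1:int main(void) {
        1:    2:  int x = 0;
        1:    3:  if (x) {
    #####:    4:    x = 1;
        -:    5:  }
        1:    6:  return x;
        -:    7:}
"""

UNEXECUTED_MARKERS = ("#####", "=====")


def prune_unexecuted(gcov_text: str) -> str:
    """Return the source text with lines marked as unexecuted removed."""
    kept = []
    for row in gcov_text.splitlines():
        parts = row.split(":", 2)  # keep any ':' inside the source intact
        if len(parts) < 3:
            continue  # not a report row
        count, lineno, source = parts[0].strip(), parts[1].strip(), parts[2]
        if lineno == "0":
            continue  # metadata rows such as "Source:demo.c"
        if count in UNEXECUTED_MARKERS:
            continue  # profiler claims this line never ran: drop it
        kept.append(source)
    return "\n".join(kept)


if __name__ == "__main__":
    print(prune_unexecuted(SAMPLE_REPORT))
```

If the profiler's report is correct, the pruned program should behave like the original on the same input; a differing output or a differing coverage report for the variant is the kind of inconsistency counted in Table I.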
Bugs Found. We manually inspected all cases and found that all reported (strong and weak) inconsistencies revealed defects in the profiler. So far, we have reported a total of 26 bugs to the developers of gcov and llvm-cov. The manual classification and reporting of profiler bugs is still ongoing, and we believe that more bugs will be reported in the future. 23/26 bugs have been confirmed³ by the developers, as listed in Table II. Of the remaining three, one is still pending confirmation, one was marked as a duplicate, and only one was rejected by the developer (gcov #90438). This rejected case is controversial because gcc performs optimization even at the zero optimization level (as shown in Figure 6), which may mislead a developer or an automated tool that relies on the branch information in the coverage statistics.

³ Consistent with the studies of C. Sun et al. [20] and V. Le et al. [21], because the bug management process of LLVM is not as organized as that of GCC, if an llvm-cov bug report has been CCed by Clang developers and there is no objection in the comments, we label the bug as confirmed. In addition, as stated by developers, if nobody closes a reported bug as "invalid", then the bug is considered real in LLVM Bugzilla.

TABLE II
LIST OF CONFIRMED OR FIXED BUGS. PN DENOTES A NORMAL PRIORITY. DIFFTEST DENOTES WHETHER THE BUG CAN BE FOUND BY DIFFERENTIAL TESTING.

ID   Profiler   Bugzilla ID   Priority   Status   Type          DiffTest
1    gcov       88913         P3         Fixed    Wrong Freq.
2    gcov       88914         P3         Fixed    Wrong Freq.
3    gcov       88924         P5         New      Wrong Freq.
4    gcov       88930         P3         Fixed    Wrong Freq.
5    gcov       89465         P3         Fixed    Missing       ×
6    gcov       89467         P3         Fixed    Wrong Freq.
7    gcov       89468         P5         New      Wrong Freq.   ×
8    gcov       89469         P5         New      Wrong Freq.
9    gcov       89470         P5         New      Wrong Freq.
10   gcov       89673         P5         New      Spurious      ×
11   gcov       89674         P5         New      Spurious      ×
12   gcov       89675         P3         Fixed    Missing       ×
13   gcov       90023         P5         New      Spurious      ×
14   gcov       90054         P3         Fixed    Missing
15   gcov       90057         P3         Fixed    Wrong Freq.
16   gcov       90066         P5         New      Wrong Freq.   ×
17   gcov       90091         P3         New      Wrong Freq.
18   gcov       90104         P3         New      Wrong Freq.   ×
19   gcov       90425         P5         New      Wrong Freq.   ×
20   gcov       90439         P3         New      Missing       ×
21   llvm-cov   41051         PN         New      Wrong Freq.
22   llvm-cov   41821         PN         New      Spurious      ×
23   llvm-cov   41849         PN         New      Missing       ×

Following the notions from C2V, code coverage bugs inside coverage profilers can be categorized as Spurious Marking, Missing Marking, and Wrong Frequency. As shown in Column 6 of Table II, Cod is able to detect all three types of bugs in coverage profilers: 14 bugs belong to Wrong Frequency, 5 belong to Missing Marking, and the remaining 4 are Spurious Marking. Most of the bugs are thus Wrong Frequency bugs, i.e., the execution frequencies are wrongly reported.

Among all these bugs, nearly half (12/26) cannot be manifested by differential testing. Considering that differential testing leverages the coverage statistics of an independent profiler implementation (which produces correct coverage information in all these cases, so that differential testing is essentially comparing against a golden version) while Cod is merely self-validation, we expect Cod to be effective and useful in finding code coverage profiler bugs.
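The per-type counts quoted above can be re-derived from Table II with a short tally. This is only a bookkeeping sketch: the ID lists are transcribed from the Type and DiffTest columns as they appear in the table (a × is taken to mean the bug cannot be found by differential testing), and the variable names are hypothetical.

```python
from collections import Counter

# Table II row IDs grouped by the "Type" column.
BUG_TYPE_BY_ID = {}
for bug_type, ids in {
    "Wrong Freq.": [1, 2, 3, 4, 6, 7, 8, 9, 15, 16, 17, 18, 19, 21],
    "Missing": [5, 12, 14, 20, 23],
    "Spurious": [10, 11, 13, 22],
}.items():
    for bug_id in ids:
        BUG_TYPE_BY_ID[bug_id] = bug_type

# Row IDs carrying a "×" in the DiffTest column.
MISSED_BY_DIFFTEST = {5, 7, 10, 11, 12, 13, 16, 18, 19, 20, 22, 23}

# Number of listed bugs per type.
type_counts = Counter(BUG_TYPE_BY_ID.values())

# Number of "×" bugs per type.
missed_counts = Counter(BUG_TYPE_BY_ID[i] for i in MISSED_BY_DIFFTEST)

if __name__ == "__main__":
    for bug_type, total in type_counts.items():
        print(f"{bug_type}: {total} listed, {missed_counts[bug_type]} marked ×")
    print(f"total listed: {len(BUG_TYPE_BY_ID)}, marked ×: {len(MISSED_BY_DIFFTEST)}")
```

The tally covers the 23 bugs listed in Table II; the three reports that were pending, duplicate, or rejected are not in the table and are therefore not counted here.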