
Hunting for Bugs in Code Coverage Tools via Randomized Differential Testing


TABLE I
STATISTICS OF INCONSISTENCY-TRIGGERING TEST PROGRAMS
(The columns from "#Num / %P" through C111 count inconsistency-triggering test programs; values in parentheses are the counts after filtering.)

Source              #Test Programs  #Time out  #Num (After filtering) / %P  C001         C010           C100        C011         C101     C110          C111
Csmith-generated    1,000,000       182,927    261,065 (758) / 31.95%       3,331 (119)  238,554 (251)  1,625 (36)  4,547 (115)  87 (33)  12,470 (151)  451 (53)
GCC’s test-suite    2,756           15         262 / 9.56%                  81           124            15          30           8        1             3
Clang’s test-suite  106             15         20 / 21.98%                  9            5              4           1            0        1             0

TABLE II
STATISTICS OF LINES OF CODE FOR THE ORIGINAL AND REDUCED VERSION OF CSMITH-GENERATED TEST PROGRAMS

          min  mean   median  max
Original  41   210    167     874
Reduced   2    10     8       27
Relative  –    4.76%  4.79%   –

TABLE III
INFORMATION OF ALL REPORTED BUGS

           gcov  llvm-cov  Total
Confirmed  42    28        70
Pending    1     5         6
Duplicate  0     4         4
Rejected   3     0         3
Reported   46    37        83

TABLE IV
BUG TYPES OF CONFIRMED BUGS

                  gcov  llvm-cov  Total
Spurious Marking  18    16        34
Missing Marking   13    5         18
Wrong Frequency   11    7         18
Total             42    28        70

to the cases where some code is marked as unexecuted by gcov but as executed by llvm-cov. The C101 category has the fewest inconsistency reports, indicating that inconsistencies of Type A and Type C rarely occur simultaneously. In addition, we can see that our method is very effective at filtering out potential “duplicate” test programs: for the Csmith-generated programs, it reduces 261,065 inconsistency-triggering programs to 758.
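The percentages in Tables I and II can be reproduced from the raw counts. The sketch below is only a plausibility check, not part of the original study. It assumes that %P is the number of inconsistency-triggering programs divided by the number of programs that did not time out; this definition is inferred from the numbers in Table I rather than stated on this page. The function names `inconsistency_rate` and `relative_size` are hypothetical.

```python
# Counts copied from Table I: (#test programs, #time out, #inconsistency-triggering).
TABLE_I = {
    "Csmith-generated": (1_000_000, 182_927, 261_065),
    "GCC's test-suite": (2_756, 15, 262),
    "Clang's test-suite": (106, 15, 20),
}

# Mean and median lines of code copied from Table II: (original, reduced).
TABLE_II = {
    "mean": (210, 10),
    "median": (167, 8),
}


def inconsistency_rate(total, timed_out, inconsistent):
    """Assumed definition of %P: inconsistent programs among those that did not time out."""
    return 100.0 * inconsistent / (total - timed_out)


def relative_size(reduced, original):
    """Size of the reduced program as a percentage of the original program."""
    return 100.0 * reduced / original


for source, (total, timed_out, inconsistent) in TABLE_I.items():
    print(f"{source}: %P = {inconsistency_rate(total, timed_out, inconsistent):.2f}%")

for statistic, (original, reduced) in TABLE_II.items():
    print(f"{statistic}: reduced/original = {relative_size(reduced, original):.2f}%")
```

Under this assumed definition the computed values agree with the %P column of Table I (31.95%, 9.56%, 21.98%) and with the "Relative" row of Table II (4.76%, 4.79%).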
For the Csmith-generated inconsistency-triggering programs, our filtering strategy left 758 test programs for reduction and inspection. Table II lists their code size before and after reduction. The mean number of lines of code drops from 210 to 10, which makes the resulting code coverage bug reports much easier to write and to act on. For the inconsistency-triggering test programs from GCC’s and Clang’s test-suites, we intended to inspect all of them, for two reasons. First, programs in these test-suites usually have unique program structures (unlike the randomly generated programs produced by Csmith). Second, the total number is not large (i.e. 262 from GCC’s test-suite and 20 from Clang’s test-suite). In summary, we analyzed about 1000 inconsistency-triggering test programs.

Bug Count. We filed a total of 83 bug reports for gcov and llvm-cov during our testing period. They can be found under “yangyibiao@nju.edu.cn” in GCC’s and LLVM’s Bugzilla databases. Table III shows the details of all the bugs we have reported so far. As shown in Column 4, as of April 16, 2018, we have reported 83 bugs, of which 70 are confirmed by developers. Of the 70 confirmed bugs, 11 are resolved as “fixed”, 17 as “won’t fix”, and 2 as “works for me” in the latest version. It is worth noting that “won’t fix” indicates a confirmed bug that developers will not fix. A total of 6 bugs are still pending developers’ responses and are not listed in Table V. Consistent with the studies of C. Sun et al. [21] and V. Le et al. [52], because the bug management process of LLVM is not as organized as that of GCC, we label an llvm-cov bug as confirmed if its report has been CCed by Clang developers and there is no objection in the comments. In addition, as stated by developers, if nobody closes a reported bug as “invalid” in LLVM Bugzilla, then the bug is real.
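The tallies in Tables III and IV can be cross-checked against each other. The following is a minimal sketch that re-adds the published counts; the dictionary layout and the helper `per_tool_totals` are illustrative assumptions and do not come from the paper.

```python
# Each row maps to (gcov, llvm-cov) counts, copied from Tables III and IV.
REPORT_STATUS = {
    "Confirmed": (42, 28),
    "Pending": (1, 5),
    "Duplicate": (0, 4),
    "Rejected": (3, 0),
}

CONFIRMED_BY_TYPE = {
    "Spurious Marking": (18, 16),
    "Missing Marking": (13, 5),
    "Wrong Frequency": (11, 7),
}


def per_tool_totals(table):
    """Sum the gcov column and the llvm-cov column over all rows of a table."""
    gcov_total = sum(row[0] for row in table.values())
    llvm_cov_total = sum(row[1] for row in table.values())
    return (gcov_total, llvm_cov_total)


reported = per_tool_totals(REPORT_STATUS)
confirmed = per_tool_totals(CONFIRMED_BY_TYPE)

# The "Reported" row of Table III: 46 for gcov, 37 for llvm-cov, 83 overall.
assert reported == (46, 37) and sum(reported) == 83
# The bug-type breakdown in Table IV must add up to the confirmed bugs in Table III.
assert confirmed == REPORT_STATUS["Confirmed"]

spurious_share = 100.0 * sum(CONFIRMED_BY_TYPE["Spurious Marking"]) / sum(confirmed)
print(f"Spurious marking share of confirmed bugs: {spurious_share:.1f}%")
```

Both assertions hold for the published numbers, so the per-tool columns and the totals in the two tables are mutually consistent.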
Another 4 reported bugs were marked as duplicates by developers, since similar test programs trigger the same coverage bug inside gcov or llvm-cov. The remaining 3 reports were rejected by GCC developers. Two of them were attributed to GCC’s default optimization strategy, and the other involved invalid code. Table V lists all the confirmed bugs in detail, including the identity, priority, current report status, bug type, the origin of the bug-triggering test program (i.e. Csmith-generated, GCC’s or Clang’s test-suite), and affected versions. Note that ‘New’ indicates that a bug is confirmed in GCC’s Bugzilla, and ‘Assigned’ means that the confirmed bug is in the process of being fixed.

Bug type. We categorize coverage bugs into three classes, as mentioned in Section II-C: Spurious Marking, Missing Marking, and Wrong Frequency. Table IV shows the breakdown of the bug types of all the confirmed bugs. The largest class is spurious marking (34 of the 70 confirmed bugs), i.e. unexecuted code is wrongly marked as executed.

Bug importance. As shown in Column 4 of Table V, all our confirmed bugs have the default priority P3, except 14 that developers reset to P5. Besides, 13 of our reported coverage bugs have been fixed by developers. Note that 3 bugs are confirmed as ‘Works’, which means that they are already fixed in the developers’ version. Thus, we consider these three bugs as fixed by developers. Together, this shows that our reported bugs are important and worth the effort.

Program source. As shown in Column 7 of Table V, test programs from all three main sources (i.e. Csmith-generated, GCC’s test-suite, and Clang’s test-suite) can trigger coverage bugs. Two programs from Clang’s test-suite trigger coverage bugs in gcov, and a number of programs from GCC’s test-suite help find coverage bugs in llvm-cov as well. It is worth noting that test programs from different sources may induce the same coverage bug. This did happen in our experiment, and in such cases we reported the bug only once.

Affected versions. We only tested gcov and llvm-cov inside the latest development trunks of GCC and Clang, respectively. When we find a test case that reveals a bug in gcov or llvm-cov, we also check the corresponding compiler’s stable releases against the same test case. Column 8 of Table V shows the
