A. Randomized Differential Testing

Differential testing was originally introduced by McKeeman [53], who attempted to detect bugs by checking for inconsistent behaviors across different comparable software or different software versions. Randomized differential testing is a widely used black-box differential testing technique in which the inputs are randomly generated [19], [54]. Yang et al. [19] developed Csmith, a randomized test case generation tool that supports a large subset of C features while avoiding undefined and unspecified behaviors, to find C compiler bugs. Lidbury et al. [40] developed CLsmith, a tool built on top of Csmith, to validate OpenCL compilers based on differential testing and testing via equivalence modulo inputs (EMI). They presented several strategies for random generation of OpenCL kernels, together with an injection mechanism that allows EMI testing to be applied to kernels containing little or no dynamically dead code. Their study revealed a significant number of OpenCL compiler bugs in commercial implementations. Sun et al. [21] applied randomized differential testing to find and analyze compiler warning defects across GCC and LLVM; in less than six months, they found 52 confirmed/fixed bugs. Different from prior studies, we apply randomized differential testing to find code coverage bugs, which we believe is an important topic.

B. Coverage-based Differential Testing

A number of recent studies leverage coverage to improve the effectiveness of differential testing. Chen et al. [55] proposed a coverage-directed fuzzing approach to detecting inconsistencies between different implementations of the Java Virtual Machine (JVM). They mutated seeding classfiles, executed the mutants on a reference JVM implementation, and used coverage uniqueness as the criterion for accepting representative mutants. The accepted mutants were then used as inputs to differentially test different JVM implementations. Pei et al.
[56] proposed DeepXplore, a whitebox coverage-directed differential testing technique for detecting inconsistencies between multiple DNNs. They first introduced neuron coverage as a systematic metric for measuring how much of the internal logic of a DNN had been tested, and then used this information to guide the testing process. As can be seen, the prerequisite of the above techniques is obtaining correct coverage. Our work provides a general and practical approach to finding coverage bugs, thus helping improve the quality of code coverage tools.

C. Testing via Equivalence Modulo Inputs

Testing via equivalence modulo inputs is a testing technique proposed in recent years. In nature, EMI testing is a kind of metamorphic testing, which modifies a program to generate variants with the same outputs as the original program [57], [58]. Le et al. [7] proposed to generate equivalent versions of a program by profiling the program's execution and pruning unexecuted code. Once a program and its equivalent variant are constructed, both are used as inputs to the compiler under test, checking for inconsistencies in their results. So far, this method has been used to detect 147 confirmed bugs in two open-source C compilers, GCC and LLVM. Based on this idea, Athena [59] and Hermes [60] were developed subsequently. Athena [59] generates EMI variants by randomly inserting code into, and removing statements from, dead code regions. Hermes [60] complements these mutation strategies by operating on live code regions, which overcomes the limitations of mutating only dead code regions. Le et al. [52] first used Csmith to generate single-file test programs and transformed each single-file test program into multiple compilation units. They then stochastically assigned each unit an optimization level to thoroughly exercise link-time optimizers, discovering and reporting 37 LTO bugs in GCC and LLVM over 11 months. These techniques heavily depend on code coverage information.
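The EMI idea described above (profile a program's execution on an input, prune code that the input never reaches, and require identical behavior on that input) can be sketched in miniature. This is a hedged illustration only: the toy program and the helper names are assumptions for exposition, and real EMI tools operate on C program ASTs rather than Python functions.

```python
# Sketch of EMI-style mutation: profile which branches a program exercises on a
# given input, prune the unexecuted ones, and check that the variant's output
# on that input is unchanged. Names below are illustrative assumptions.

def original_program(x):
    """Toy program that also records which branch labels it executes."""
    trace = []
    if x > 0:
        trace.append("pos")
        result = x * 2
    else:
        trace.append("neg")   # dead for the profiling input x = 5
        result = -x
    return result, trace

def make_emi_variant(executed):
    """Build a variant keeping only the branches seen during profiling."""
    def variant(x):
        if "pos" in executed and x > 0:
            return x * 2
        if "neg" in executed and x <= 0:
            return -x
        raise RuntimeError("input leaves the profiled paths")
    return variant

# Profile on x = 5, prune the unexecuted 'neg' branch, compare outputs:
out, executed = original_program(5)
variant = make_emi_variant(set(executed))
assert variant(5) == out   # equivalent modulo the input x = 5
```

In real EMI testing, the original program and each pruned variant are both fed to the compiler under test, and any difference in the compiled programs' outputs on the profiling input indicates a compiler bug.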
VI. CONCLUSION AND FUTURE WORK

We proposed a randomized differential testing approach to hunting code coverage bugs and implemented a tool named C2V to test two C code coverage tools, gcov and llvm-cov. Our evaluation, in which 42 bugs were confirmed for gcov and 28 for llvm-cov within a few months, provides strong evidence that code coverage tools are not as reliable as they might be envisaged. Overall, our approach has the following main advantages: (1) it reduces the difficult code coverage validation problem to a simple comparison problem; (2) the comparison between code coverage reports checks not only whether a program chunk gets executed, but also its exact execution frequency, and any discrepancy in these dimensions signals a potential bug, which helps find subtle but deep semantic bugs in code coverage tools; and (3) our approach is simple, straightforward, and general, and can easily be applied to validate different code coverage tools, under various programming languages and coverage criteria. In the future, more effort should be devoted to this area, and there is a need to examine the influence of these bugs on other techniques that depend on code coverage.

ACKNOWLEDGMENT

We thank Yanyan Jiang, Zhaogui Xu, and the anonymous reviewers for their constructive comments. We also thank the GCC and LLVM developers, especially Martin Liška, for analyzing and fixing our reported bugs. This work is supported by the National Natural Science Foundation of China (61702256, 61772259, 61432001, 61832009, 61772263, 61802168, 61872177), the Natural Science Foundation of Jiangsu Province (BK20170652), the China Postdoctoral Science Foundation (2018T110481), the Fundamental Research Funds for the Central Universities (020214380032, 02021430047), and the National Key R&D Program of China (2018YFB1003901). Zhendong Su was supported by United States NSF Grants 1528133 and 1618158, and by Google and Mozilla Faculty Research awards.
Yuming Zhou (zhouyuming@nju.edu.cn) and Baowen Xu (bwxu@nju.edu.cn) are the corresponding authors.
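The report-comparison step at the heart of the approach summarized in the conclusion can be sketched as follows. This is a hedged illustration only: the dictionary-based report format and the function name are assumptions, and in practice the per-line execution counts would be parsed from actual gcov and llvm-cov output.

```python
# Sketch of differential comparison of coverage reports: each report is reduced
# to a mapping from source line to execution count, and any disagreement (a
# line covered in one report but not the other, or covered with different
# frequencies) is flagged as a potential coverage-tool bug.

def diff_reports(report_a, report_b):
    """Return {line: (count_a, count_b)} for lines where two reports disagree."""
    discrepancies = {}
    for line in sorted(set(report_a) | set(report_b)):
        a = report_a.get(line, 0)
        b = report_b.get(line, 0)
        if a != b:
            discrepancies[line] = (a, b)
    return discrepancies

# Example: two hypothetical tools agree on line 3 but disagree on lines 4 and 5.
gcov_like = {3: 1, 4: 2, 5: 0}
llvmcov_like = {3: 1, 4: 1, 5: 1}
assert diff_reports(gcov_like, llvmcov_like) == {4: (2, 1), 5: (0, 1)}
```

Because the comparison covers both the covered/uncovered dimension and the exact frequencies, even a one-off execution count surfaces as a discrepancy to triage.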