2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE)
978-1-7281-2508-4/19/$31.00 ©2019 IEEE, DOI 10.1109/ASE.2019.00018

Automatic Self-Validation for Code Coverage Profilers

Yibiao Yang∗†, Yanyan Jiang∗, Zhiqiang Zuo∗, Yang Wang∗, Hao Sun∗, Hongmin Lu∗, Yuming Zhou∗, and Baowen Xu∗
∗State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
†School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan, China
{yangyibiao, jyy, zqzuo}@nju.edu.cn, dz1933028@smail.nju.edu.cn, shqking@gmail.com, {hmlu, zhouyuming, bwxu}@nju.edu.cn

Abstract—Code coverage, as the primitive dynamic program behavior information, is widely adopted to facilitate a rich spectrum of software engineering tasks, such as testing, fuzzing, debugging, fault detection, reverse engineering, and program understanding. Given these widespread applications, it is crucial to ensure the reliability of code coverage profilers. Unfortunately, due to the lack of research attention and the test oracle problem, coverage profilers are far from sufficiently tested. Bugs are still regularly seen in widely deployed profilers such as gcov and llvm-cov, which ship with gcc and llvm, respectively. This paper proposes Cod, an automated self-validator for effectively uncovering bugs in coverage profilers. Starting from a test program (either from a compiler's test suite or generated randomly), Cod detects profiler bugs with zero false positives using a metamorphic relation that bridges the coverage statistics of that program and a mutated variant. We evaluated Cod on two of the most well-known code coverage profilers, namely gcov and llvm-cov. Within a four-month testing period, a total of 196 potential bugs (123 for gcov, 73 for llvm-cov) were found, of which 23 have been confirmed by the developers.

Index Terms—Code coverage, Metamorphic testing, Coverage profilers, Bug detection.

I. INTRODUCTION

Profiling code coverage data [1] (e.g., executed branches, paths, functions, etc.) of instrumented subject programs is the cornerstone of a rich spectrum of software engineering practices, such as testing [2], fuzzing [3], debugging [4]–[6], specification mining [7], [8], fault detection [9], reverse engineering, and program understanding [10]. Incorrect coverage information can severely mislead developers in these practices.

Unfortunately, coverage profilers themselves (e.g., gcov and llvm-cov) are prone to errors. Even a simple randomized differential testing technique exposed more than 70 bugs in coverage profilers [11]. The reasons are two-fold. First, neither application developers nor academic researchers have paid sufficient attention to testing code coverage profilers. Second, automatically testing coverage profilers remains challenging due to the lack of test oracles. In code coverage testing, the oracle must capture rich execution information, e.g., the execution frequency of each statement in the program under a particular test case. Unlike functional oracles, which can usually be derived from a given specification, complete code coverage oracles are extremely difficult to obtain. Even if programming experts could specify such an oracle precisely, doing so would require enormous manual effort, making it impractical.

A simple differential testing approach, C2V, tried to uncover coverage bugs by comparing the coverage profiling results of the same input program across two different profiler implementations (e.g., gcov and llvm-cov) [11]. For instance, if gcov and llvm-cov report different coverage information for the same statement of the profiled program, a bug is reported.
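To make the C2V idea concrete, here is a minimal Python sketch of a differential check over per-line execution counts. It is not C2V's actual implementation. The `LineCoverage` alias and the `differential_check` function are hypothetical, and the sketch assumes that each profiler's report has already been parsed into a dictionary mapping a source line number to its execution count, with non-instrumented lines omitted. Only lines reported by both profilers are compared.

```python
from typing import Dict, List

# Hypothetical representation of one profiler's report: source line number
# -> execution count. Lines the profiler treats as non-executable are absent.
LineCoverage = Dict[int, int]


def differential_check(cov_a: LineCoverage, cov_b: LineCoverage) -> List[int]:
    """Return the lines whose execution counts differ between two profilers.

    Assumption: only lines reported by *both* profilers are compared;
    a line present in just one report is skipped.
    """
    suspicious: List[int] = []
    # dict key views support set intersection; sort for a stable output order
    for line in sorted(cov_a.keys() & cov_b.keys()):
        if cov_a[line] != cov_b[line]:
            suspicious.append(line)  # potential profiler bug on this line
    return suspicious


# Hypothetical, hand-written counts for the same program and input.
gcov_counts = {3: 1, 4: 10, 5: 10, 7: 1}
llvm_cov_counts = {3: 1, 4: 11, 5: 10, 6: 0, 7: 1}
print(differential_check(gcov_counts, llvm_cov_counts))  # prints [4]
```

With these made-up counts, line 4 is flagged because the two reports disagree (10 vs. 11), while line 6, which only one profiler reports, is skipped under the stated assumption. A flagged line is only a candidate: the disagreement may stem from the two tools' differing coverage semantics rather than from a genuine bug.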
Due to inconsistent coverage semantics across profiler implementations, independently implemented coverage profilers commonly disagree on line-based statistics (e.g., the case in Figure 1). This contradicts the fundamental assumption of differential testing that distinct coverage profilers should output identical coverage statistics for the same input program.

Approach. To address the flaws of the existing approach, this paper presents Cod, a fully automated self-validator of coverage profilers based on the metamorphic testing formulation [12]. Instead of comparing the outputs of two independent profilers, Cod takes a single profiler and a program P (either from a compiler's test suite or generated randomly) as input, and uncovers bugs by identifying inconsistencies between the coverage results of P and its equivalent mutated variants, whose coverage statistics are expected to be identical. The equivalent program variants are generated under the assumption that modifying unexecuted code blocks should not affect the coverage statistics of executed blocks under the same profiler, which should generally hold in a non-optimized setting¹. This idea originates from EMI [2], a metamorphic testing approach targeted at compiler optimization bugs.

Specifically, assuming that the compiler is correct², given a deterministic program P under profiling (either from a compiler's test suite or generated randomly) with its input fixed, Cod obtains a reference program P′ by removing the unexecuted statements in P. P′ should strictly follow the same execution path as long as the coverage profiling data of P is correct. Therefore, Cod asserts that the coverage statistics should be exactly the same over all unchanged statements

¹According to the developers [13], coverage statistics are only stable under the zero optimization level.
²We assume this because mis-compilations are rare.