Precise and Efficient Points-to Analysis via New Context-Sensitivity and Heap abstraction by Tian Tan A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN THE SCHOOL OF COMPUTER SCIENCE AND ENGINEERING THE UNIVERSITY OF NEW SOUTH WALES SYDNEY·AUSTRALIA Wednesday 19th April,2017 All rights reserved.This work may not be reproduced in whole or in part,by photocopy or other means,without the permission of the author. ©Tian Tan2017
Precise and Efficient Points-to Analysis via New Context-Sensitivity and Heap Abstraction by Tian Tan A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN THE SCHOOL OF Computer Science and Engineering Wednesday 19th April, 2017 All rights reserved. This work may not be reproduced in whole or in part, by photocopy or other means, without the permission of the author. c Tian Tan 2017
Abstract Points-to analysis addresses a fundamental problem in program analysis:determin- ing statically which objects a variable or reference can point to.As a fundamental technique,many real-world clients such as bug detection,security analysis,pro- gram understanding,compiler optimization and program verification,depend on the results of points-to analysis. A long-standing problem in points-to analysis is the balance between precision and efficiency.This thesis aims to improve both ends of the balance respectively. For precision,object-sensitivity is usually considered as the most precise context-sensitivity for points-to analysis for object-oriented languages,such as Java.However,it suffers from the scalability problem when increasing the context length and thus it is hard to further improve its precision.We present BEAN,a new object-sensitivity approach for points-to analysis.By identifying and eliminating the redundant context elements which contribute nothing to the precision,BEAN is able to improve the precision of any k- object-sensitive analysis by still using a k-limiting context abstraction. For efficiency,targeting the type-dependent clients such as call graph con- struction,devirtualization and may-fail casting,we present MAHJONG,a new heap abstraction approach for points-to analysis.By merging equiv- alent automata representing type-consistent objects that are created by the i
Abstract Points-to analysis addresses a fundamental problem in program analysis: determining statically which objects a variable or reference can point to. As a fundamental technique, many real-world clients such as bug detection, security analysis, program understanding, compiler optimization and program verification, depend on the results of points-to analysis. A long-standing problem in points-to analysis is the balance between precision and efficiency. This thesis aims to improve both ends of the balance respectively. • For precision, object-sensitivity is usually considered as the most precise context-sensitivity for points-to analysis for object-oriented languages, such as Java. However, it suffers from the scalability problem when increasing the context length and thus it is hard to further improve its precision. We present Bean, a new object-sensitivity approach for points-to analysis. By identifying and eliminating the redundant context elements which contribute nothing to the precision, Bean is able to improve the precision of any kobject-sensitive analysis by still using a k-limiting context abstraction. • For efficiency, targeting the type-dependent clients such as call graph construction, devirtualization and may-fail casting, we present Mahjong, a new heap abstraction approach for points-to analysis. By merging equivalent automata representing type-consistent objects that are created by the i
allocation-site abstraction,MAHJONG enables an allocation-site-based points- to analysis to run significantly faster while achieving nearly the same precision for type-dependent clients. We extensively evaluate BEAN and MAHJONG against the state-of-the-art points- to analysis for Java with large real-world Java applications and library.The results demonstrate that both BEAN and MAHJONG have met their goals of design.BEAN has succeeded in making points-to analysis more precise at only small increases in analysis cost.MAHJONG enables points-to analysis to run significantly faster while achieving nearly the same precision for type-dependent clients.We have released BEAN and MAHJONG as open-source tools. iⅱ
allocation-site abstraction, Mahjong enables an allocation-site-based pointsto analysis to run significantly faster while achieving nearly the same precision for type-dependent clients. We extensively evaluate Bean and Mahjong against the state-of-the-art pointsto analysis for Java with large real-world Java applications and library. The results demonstrate that both Bean and Mahjong have met their goals of design. Bean has succeeded in making points-to analysis more precise at only small increases in analysis cost. Mahjong enables points-to analysis to run significantly faster while achieving nearly the same precision for type-dependent clients. We have released Bean and Mahjong as open-source tools. ii
Publications Tian Tan,Yue Li,and Jingling Xue.Efficient and Precise Points-to Analy- sis:Modeling the Heap by Merging Equivalent Automata.38th ACM SIG- PLAN Conference on Programming Language Design and Implementation (PLDI'17). Yifei Zhang,Tian Tan,Yue Li and Jingling Xue.Ripple:Reflection Anal- ysis for Android Apps in Incomplete Information Environments.7th ACM Conference on Data and Applications Security and Privacy (CODASPY17). Tian Tan,Yue Li and Jingling Xue.Making k-Object-Sensitive Pointer Analysis More Precise with Still k-Limiting.23nd International Static Anal- ysis Symposium (SAS'16). Yue Li,Tian Tan,Yifei Zhang and Jingling Xue.Program Tailoring:Slic- ing by Sequential Criteria.30th European Conference on Object-Oriented Programming (ECOOP'16).Distinguished Paper Award. Yue Li,Tian Tan and Jingling Xue.Effective Soundness-Guided Reflection Analysis.22nd International Static Analysis Symposium (SAS'15). Yue Li,Tian Tan,Yulei Sui and Jingling Xue.Self-Inferencing Reflection Resolution for Java.28th European Conference on Object-Oriented Program- ming (ECOOP'14). iii
Publications • Tian Tan, Yue Li, and Jingling Xue. Efficient and Precise Points-to Analysis: Modeling the Heap by Merging Equivalent Automata. 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’17). • Yifei Zhang, Tian Tan, Yue Li and Jingling Xue. Ripple: Reflection Analysis for Android Apps in Incomplete Information Environments. 7th ACM Conference on Data and Applications Security and Privacy (CODASPY’17). • Tian Tan, Yue Li and Jingling Xue. Making k-Object-Sensitive Pointer Analysis More Precise with Still k-Limiting. 23nd International Static Analysis Symposium (SAS’16). • Yue Li, Tian Tan, Yifei Zhang and Jingling Xue. Program Tailoring: Slicing by Sequential Criteria. 30th European Conference on Object-Oriented Programming (ECOOP’16). Distinguished Paper Award. • Yue Li, Tian Tan and Jingling Xue. Effective Soundness-Guided Reflection Analysis. 22nd International Static Analysis Symposium (SAS’15). • Yue Li, Tian Tan, Yulei Sui and Jingling Xue. Self-Inferencing Reflection Resolution for Java. 28th European Conference on Object-Oriented Programming (ECOOP’14). iii
Acknowledgements First,I would like to thank my supervisor,Professor Jingling Xue,for supporting me with freedom and guidance throughout my Ph.D.studies.His serious attitude, diligence and enthusiasm to the research influence me and will continue to benefit me in my future career. My special thanks go to Yue Li,a great research partner and the best friend of mine.I worked with Yue during the whole period of my Ph.D.studies.He is like my elder brother and gives me great help in both research and life.I am very very very lucky to have him as my collaborator and friend. I also thank my labmates and colleagues from the CORG at UNSW,Peng Di, Yulei Sui,Sen Ye,Yu Su,Ding Ye,Hao Zhou,Xiaokang Fan,Hua Yan,Feng Zhang, Yifei Zhang,Jieyuan Zhang,Jie Liu,Diyu Wu and Jingbo Lu.It was a pleasant experience to get along with them during these unforgettable days. I thank the Doop team for making Doop (such a good points-to analysis framework for Java)public available.Doop is well-designed so that I can easily integrate both my thesis work BEAN and MAHJONG into it to perform evaluation. Many thanks go to the two international examiners of my thesis,Prof.Yannis Smaragdakis at University of Athens and Prof.Ondrej Lhotak at University of Waterloo,for their valuable time and comments. Last,but not least,I would like to express my gratefulness to my family.I iv
Acknowledgements First, I would like to thank my supervisor, Professor Jingling Xue, for supporting me with freedom and guidance throughout my Ph.D. studies. His serious attitude, diligence and enthusiasm to the research influence me and will continue to benefit me in my future career. My special thanks go to Yue Li, a great research partner and the best friend of mine. I worked with Yue during the whole period of my Ph.D. studies. He is like my elder brother and gives me great help in both research and life. I am very very very lucky to have him as my collaborator and friend. I also thank my labmates and colleagues from the CORG at UNSW, Peng Di, Yulei Sui, Sen Ye, Yu Su, Ding Ye, Hao Zhou, Xiaokang Fan, Hua Yan, Feng Zhang, Yifei Zhang, Jieyuan Zhang, Jie Liu, Diyu Wu and Jingbo Lu. It was a pleasant experience to get along with them during these unforgettable days. I thank the Doop team for making Doop (such a good points-to analysis framework for Java) public available. Doop is well-designed so that I can easily integrate both my thesis work Bean and Mahjong into it to perform evaluation. Many thanks go to the two international examiners of my thesis, Prof. Yannis Smaragdakis at University of Athens and Prof. Ondˇrej Lhot´ak at University of Waterloo, for their valuable time and comments. Last, but not least, I would like to express my gratefulness to my family. I iv
am deeply indebted to my parents for their fantastic love,endless support and everything else they have given me all through these years.Without them,I would not have been where I am now.I dedicate my dissertation and love to them
am deeply indebted to my parents for their fantastic love, endless support and everything else they have given me all through these years. Without them, I would not have been where I am now. I dedicate my dissertation and love to them. v
Contents Abstract i Acknowledgements iv 1 Introduction 1 1.1 Challenges.. 2 1.2 Contributions 1.3 Organization 5 2 Background:Context-Sensitive Points-to Analysis 7 2.1 Preliminary 7 2.2 Formalism .. 10 2.2.1 Notations.............. 10 2.2.2 Context-Sensitive Points-to Analysis 12 2.2.3 Context-Sensitivity ............... 13 3 BEAN:Precise Points-to Analysis via Object Allocation Graph 17 3.1 Overview.......... 17 3.2 Motivation.......····· 20 3.2.1 Redundant Elements in Method Contexts·..·.····. 20 3.2.2 Redundant Elements in Heap Contexts..··....···. 22 vi
Contents Abstract i Acknowledgements iv 1 Introduction 1 1.1 Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.3 Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2 Background: Context-Sensitive Points-to Analysis 7 2.1 Preliminary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.2 Formalism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2.2.1 Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2.2.2 Context-Sensitive Points-to Analysis . . . . . . . . . . . . . 12 2.2.3 Context-Sensitivity . . . . . . . . . . . . . . . . . . . . . . . 13 3 Bean: Precise Points-to Analysis via Object Allocation Graph 17 3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 3.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 3.2.1 Redundant Elements in Method Contexts . . . . . . . . . . 20 3.2.2 Redundant Elements in Heap Contexts . . . . . . . . . . . . 22 vi
3.2.3 Discussion.... 24 3.3 BEAN 25 3.3.1 Object Allocation Graph 26 3.3.2 Context Selection. 27 3.3.3 Discussion.... 28 3.4 Formalism........·· 29 3.4.1 Object Allocation Graph.... 29 3.4.2 Context Selection 31 3.4.3 BEAN-directed Object-Sensitive Points-to Analysis 36 3.4.4 Properties·... 36 3.5 Implementation.. 38 3.6 Evaluation..········ 38 3.6.1 Experimental Setting.....····..·········· 39 3.6.2 RQ1:Precision and Performance Measurements ...... 40 3.6.3 RQ2:A Real-World Case Study 43 3.7 Related Work······· 46 4 MAHJONG:Efficient Points-to Analysis by Merging Equivalent Au- tomata 49 4.1 Overview.........::···..···· 49 4.2 Motivation...............··… 52 4.2.1 Allocation-Type Abstraction:A Naive Solution....... 53 4.2.2 MAHJONG:Our Solution.... 54 4.3MAHJ0NG...... 60 4.3.1 Overview of MAHJONG... 60 4.3.2 The NFA Builder.... 61 4.3.3 The DFA Converter..... 61 vii
3.2.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 3.3 Bean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 3.3.1 Object Allocation Graph . . . . . . . . . . . . . . . . . . . . 26 3.3.2 Context Selection . . . . . . . . . . . . . . . . . . . . . . . . 27 3.3.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 3.4 Formalism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 3.4.1 Object Allocation Graph . . . . . . . . . . . . . . . . . . . . 29 3.4.2 Context Selection . . . . . . . . . . . . . . . . . . . . . . . . 31 3.4.3 Bean-directed Object-Sensitive Points-to Analysis . . . . . 36 3.4.4 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 3.5 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 3.6 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 3.6.1 Experimental Setting . . . . . . . . . . . . . . . . . . . . . . 39 3.6.2 RQ1: Precision and Performance Measurements . . . . . . . 40 3.6.3 RQ2: A Real-World Case Study . . . . . . . . . . . . . . . . 43 3.7 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 4 Mahjong: Efficient Points-to Analysis by Merging Equivalent Automata 49 4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 4.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 4.2.1 Allocation-Type Abstraction: A Naive Solution . . . . . . . 53 4.2.2 Mahjong: Our Solution . . . . . . . . . . . . . . . . . . . . 54 4.3 Mahjong . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 4.3.1 Overview of Mahjong . . . . . . . . . . . . . . . . . . . . . 60 4.3.2 The NFA Builder . . . . . . . . . . . . . . . . . . . . . . . . 61 4.3.3 The DFA Converter . . . . . . . . . . . . . . . . . . . . . . . 61 vii
4.3.4 The Automata Equivalence Checker........ 62 4.3.5 The Heap Modeler 62 4.3.6 MAHJONG-based Points-.to Analysis..······· 62 4.4 Algorithms..... 66 4.4.1 MAHJONG.... 66 4.4.2 The NFA Builder..............·.... 68 4.4.3 The DFA Converter.....··· 69 4.4.4 The Automata Equivalence Checker 70 4.4.5 The Heap Modeler ........ 72 4.5 Implementation......... 72 4.6 Evaluation... 73 4.6.1 RQ1:MAHJONG's Effectiveness as a Pre-Analysis...... 75 4.6.2RQ2:MAHJONG-based Points-.to Analysis..·...···· 78 4.7 Related Work.......... 85 5 Conclusions and Future Work 88 5.1 Conclusions 88 5.2 Future Work.... 89 Bibliography 90 viii
4.3.4 The Automata Equivalence Checker . . . . . . . . . . . . . . 62 4.3.5 The Heap Modeler . . . . . . . . . . . . . . . . . . . . . . . 62 4.3.6 Mahjong-based Points-to Analysis . . . . . . . . . . . . . . 62 4.4 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 4.4.1 Mahjong . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 4.4.2 The NFA Builder . . . . . . . . . . . . . . . . . . . . . . . . 68 4.4.3 The DFA Converter . . . . . . . . . . . . . . . . . . . . . . . 69 4.4.4 The Automata Equivalence Checker . . . . . . . . . . . . . . 70 4.4.5 The Heap Modeler . . . . . . . . . . . . . . . . . . . . . . . 72 4.5 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 4.6 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 4.6.1 RQ1: Mahjong’s Effectiveness as a Pre-Analysis . . . . . . 75 4.6.2 RQ2: Mahjong-based Points-to Analysis . . . . . . . . . . 78 4.7 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 5 Conclusions and Future Work 88 5.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 5.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 Bibliography 90 viii
List of Figures 2 1 Notations...················· 11 2.2 Context-sensitive points-to analysis...···..····.····· 12 3.1 Method contexts for 2-obj and BEAN.................. 21 3.2 Heap contexts for2-obj and BEAN..·......·.....·... 23 3.3 Overview of BEAN...······.················· 25 3.4 The OAGs for the two motivating programs in Figures 3.1 and 3.2. 26 3.5 Notations for object allocation graph.................. 29 3.6 Rules for building the OAG,G=(N,E),for a program based on a 30 3.7 Rules for basic relations in an OAG,G=(N,E).·,···.···· 31 3.8 Rules for context selection in an OAG,G=(N,E).········ 32 3.9 An OAG...······ 33 3.10 Three Cases marked for [Hcrx-DIv]and [Hcrx-Cyc]in Figure 3.8... 35 3.11 A real--world application for using java.util.HaseSet...,···. 44 3.l2 Part of OAG related to HS,/1 and HS/2.....··.·.·..···. 45 4.1 An example program illustrating object merging...·..····.· 53 4.2 Field points-to graph rooted at of and o2............... 55 4.3 Illustrating Condition 2 in Definition 1................. 57 ix
List of Figures 2.1 Notations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.2 Context-sensitive points-to analysis. . . . . . . . . . . . . . . . . . . 12 3.1 Method contexts for 2-obj and Bean. . . . . . . . . . . . . . . . . . 21 3.2 Heap contexts for 2-obj and Bean. . . . . . . . . . . . . . . . . . . 23 3.3 Overview of Bean. . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 3.4 The OAGs for the two motivating programs in Figures 3.1 and 3.2. 26 3.5 Notations for object allocation graph. . . . . . . . . . . . . . . . . . 29 3.6 Rules for building the OAG, G = (N, E), for a program based on a pre-analysis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 3.7 Rules for basic relations in an OAG, G = (N, E). . . . . . . . . . . 31 3.8 Rules for context selection in an OAG, G = (N, E). . . . . . . . . . 32 3.9 An OAG. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 3.10 Three Cases marked for [Hctx-Div] and [Hctx-Cyc] in Figure 3.8. . . 35 3.11 A real-world application for using java.util.HaseSet. . . . . . . . 44 3.12 Part of OAG related to HS/1 and HS/2. . . . . . . . . . . . . . . . . 45 4.1 An example program illustrating object merging. . . . . . . . . . . 53 4.2 Field points-to graph rooted at o T 1 and o T 2 . . . . . . . . . . . . . . . 55 4.3 Illustrating Condition 2 in Definition 1. . . . . . . . . . . . . . . . . 57 ix