A Principled Approach to Selective Context Sensitivity for Pointer Analysis YUE LI,Nanjing University,China and Aarhus University,Denmark TIAN TAN",Nanjing University,China and Aarhus University,Denmark ANDERS MOLLER,Aarhus University,Denmark YANNIS SMARAGDAKIS,University of Athens,Greece Context sensitivity is an essential technique for ensuring high precision in static analyses.It has been observed that applying context sensitivity partially,only on a select subset of the methods,can improve the balance between analysis precision and speed.However,existing techniques are based on heuristics that do not provide much insight into what characterizes this method subset.In this work,we present a more principled approach for identifying precision-critical methods,based on general patterns of value flows that explain where most of the imprecision arises in context-insensitive pointer analysis.Using this theoretical foundation, we present an efficient algorithm,ZIPPER,to recognize these flow patterns in a given program and employ context sensitivity accordingly.We also present a variant,ZIPER,that additionally takes into account which methods are disproportionally costly to analyze with context sensitivity. Our experimental results on standard benchmark and real-world Java programs show that ZIPPER preserves effectively all of the precision(98.8%)of a highly-precise conventional context-sensitive pointer analysis (2-object-sensitive with a context-sensitive heap,2obj for short),with a substantial speedup(on average, 3.4x and up to 9.4x),and that ZIPPERe preserves 94.7%of the precision of 2obj,with an order-of-magnitude speedup (on average,25.5x and up to 88x).In addition,for 10 programs that cannot be analyzed by 2obj within a 3 hours time limit,on average ZIPPER can guide 2obj to finish analyzing them in less than 11 minutes with high precision compared to context-insensitive and introspective context-sensitive analyses. CCS Concepts:.Theory of computation-Program analysis. Additional Key Words and Phrases:static analysis,points-to analysis,Java ACM Reference Format: Yue Li,Tian Tan,Anders Moller,and Yannis Smaragdakis.2020.A Principled Approach to Selective Context Sensitivity for Pointer Analysis.ACM Trans.Program.Lang.Syst.1,1,Article 1 (January 2020),40 pages. https:/doi.org/10.1145/3381915 1 INTRODUCTION Pointer analysis is a fundamental family of static analyses that compute abstractions of the possible values of pointer variables in a program.Such information is essential for reasoning about aliasing and inter-procedural control flow in object-oriented programs,and it is used in a wide range of software engineering tools,e.g.,for bug detection [Chandra et al.2009;Naik et al.2006,2009]. security analysis [Arzt et al.2014;Grech and Smaragdakis 2017;Livshits and Lam 2005],program "Corresponding author Authors'email addresses:yueli@nju.edu.cn,tiantan@nju.edu.cn,amoeller@cs.au.dk,smaragd@di.uoagr. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page.Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted.To copy otherwise,or republish,to post on servers or to redistribute to lists,requires prior specific permission and/or a fee.Request permissions from permissions@acm.org. 2020 Association for Computing Machinery. 0164-0925/2020/1-ART1$15.00 https:/doi.org/10.1145/3381915 ACM Trans.Program.Lang.Syst.,Vol.1,No.1,Article 1.Publication date:January 2020
1 A Principled Approach to Selective Context Sensitivity for Pointer Analysis YUE LI, Nanjing University, China and Aarhus University, Denmark TIAN TAN∗ , Nanjing University, China and Aarhus University, Denmark ANDERS MØLLER, Aarhus University, Denmark YANNIS SMARAGDAKIS, University of Athens, Greece Context sensitivity is an essential technique for ensuring high precision in static analyses. It has been observed that applying context sensitivity partially, only on a select subset of the methods, can improve the balance between analysis precision and speed. However, existing techniques are based on heuristics that do not provide much insight into what characterizes this method subset. In this work, we present a more principled approach for identifying precision-critical methods, based on general patterns of value flows that explain where most of the imprecision arises in context-insensitive pointer analysis. Using this theoretical foundation, we present an efficient algorithm, Zipper, to recognize these flow patterns in a given program and employ context sensitivity accordingly. We also present a variant, Zipper𝑒 , that additionally takes into account which methods are disproportionally costly to analyze with context sensitivity. Our experimental results on standard benchmark and real-world Java programs show that Zipper preserves effectively all of the precision (98.8%) of a highly-precise conventional context-sensitive pointer analysis (2-object-sensitive with a context-sensitive heap, 2obj for short), with a substantial speedup (on average, 3.4× and up to 9.4×), and that Zipper𝑒 preserves 94.7% of the precision of 2obj, with an order-of-magnitude speedup (on average, 25.5× and up to 88×). In addition, for 10 programs that cannot be analyzed by 2obj within a 3 hours time limit, on average Zipper𝑒 can guide 2obj to finish analyzing them in less than 11 minutes with high precision compared to context-insensitive and introspective context-sensitive analyses. CCS Concepts: • Theory of computation → Program analysis. Additional Key Words and Phrases: static analysis, points-to analysis, Java ACM Reference Format: Yue Li, Tian Tan, Anders Møller, and Yannis Smaragdakis. 2020. A Principled Approach to Selective Context Sensitivity for Pointer Analysis. ACM Trans. Program. Lang. Syst. 1, 1, Article 1 (January 2020), 40 pages. https://doi.org/10.1145/3381915 1 INTRODUCTION Pointer analysis is a fundamental family of static analyses that compute abstractions of the possible values of pointer variables in a program. Such information is essential for reasoning about aliasing and inter-procedural control flow in object-oriented programs, and it is used in a wide range of software engineering tools, e.g., for bug detection [Chandra et al. 2009; Naik et al. 2006, 2009], security analysis [Arzt et al. 2014; Grech and Smaragdakis 2017; Livshits and Lam 2005], program ∗Corresponding author Authors’ email addresses: yueli@nju.edu.cn, tiantan@nju.edu.cn, amoeller@cs.au.dk, smaragd@di.uoa.gr. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. © 2020 Association for Computing Machinery. 0164-0925/2020/1-ART1 $15.00 https://doi.org/10.1145/3381915 ACM Trans. Program. Lang. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2020
12 Yue Li,Tian Tan,Anders Moller,and Yannis Smaragdakis verification [Fink et al.2008;Pradel et al.2012],and program debugging and understanding [Li et al.2016;Sridharan et al.2007]. For decades,numerous analysis techniques have been developed to make pointer analysis more precise and more efficient,especially for object-oriented languages [Hind 2001;Smaragdakis and Balatsouras 2015;Sridharan et al.2013].One of the most successful ideas for producing high precision is context sensitivity [Milanova et al.2002,2005;Sharir and Pnueli 1981;Shivers 1991; Smaragdakis et al.2011],which allows each program method to be analyzed under different contexts, to separate the static abstractions of different dynamic instantiations of the method's variables and thereby reduce spurious object flows.However,despite great precision benefits,context sensitivity comes with heavy efficiency costs [Kastrinis and Smaragdakis 2013;Lhotak and Hendren 2006; Oh et al.2014;Tan et al.2016,2017;Xu and Rountev 2008].One reason is that,with conventional context-sensitivity techniques,every method in a program is treated the same,meaning that many methods that do not benefit from context sensitivity are analyzed for multiple contexts redundantly. As a consequence,too much space and time is consumed [Smaragdakis et al.2014]. This naturally raises the question of whether it is possible to apply context sensitivity selectively, only for the methods where it is beneficial to the overall analysis precision.It is far from trivial to determine when a context-sensitive analysis will yield precision benefits(or conversely,to determine when omitting context sensitivity for a method would introduce imprecision).This challenge of effectively identifying precision-critical methods has been the focus of past work [Hassanshahi et al. 2017;Jeon et al.2019;Smaragdakis et al.2014;Wei and Ryder 2015].Those techniques are based on heuristics that seem to correlate with imprecision,but they do not provide a comprehensive understanding of how and where the imprecision is introduced in a context-insensitive pointer analysis.For example,introspective analysis [Smaragdakis et al.2014]requires tuning multiple parameters involving sizes of various kinds of points-to sets,and data-driven analysis [Jeong et al. 2017]is parameterized by a collection of syntactic features and relies on machine learning for selecting good heuristics. In this article,we provide a new more principled approach,named ZIPPER,to efficiently identify precision-critical methods,based on insights about how imprecision is introduced.The key observa- tion is that most cases in which imprecision arises in a context-insensitive pointer analysis fit into three general patterns of value flows,which we call direct,wrapped,and unwrapped flows.Moreover, we show that these three kinds of value flows can be recognized efficiently.Based on information obtained from a fast,context-insentive pointer analysis,ZIPPER constructs a precision flow graph (PFG)that concisely models the relevant value flow.The identification of precision-critical methods can then be formulated as a graph reachability problem on the PFG and solved in negligible time, compared to the pointer analysis itself. Additionally,we provide a variant of ZIPPER,named ZIPPERe,which also identifies efficiency- critical methods,i.e.,methods that are costly to analyze with context sensitivity.With ZIPPER, only the methods that are precision-critical but not efficiency-critical will be analyzed context- sensitively.By applying context sensitivity only to the methods selected by ZIPPER or ZIPpERe,a pointer analysis runs significantly faster than conventional techniques that apply context sensitivity indiscriminately to all methods,while retaining most of the attainable precision. In summary,we make the following key contributions. We describe three general patterns of value flow that help in explaining how and where most of the imprecision is introduced in a context-insensitive pointer analysis(Section 3). We present the ZIPPER approach to effectively recognize the three value-flow patterns and thereby identify the precision-critical methods that benefit from context sensitivity(Section 4). ACM Trans.Program.Lang.Syst.,Vol.1,No.1,Article 1.Publication date:January 2020
1:2 Yue Li, Tian Tan, Anders Møller, and Yannis Smaragdakis verification [Fink et al. 2008; Pradel et al. 2012], and program debugging and understanding [Li et al. 2016; Sridharan et al. 2007]. For decades, numerous analysis techniques have been developed to make pointer analysis more precise and more efficient, especially for object-oriented languages [Hind 2001; Smaragdakis and Balatsouras 2015; Sridharan et al. 2013]. One of the most successful ideas for producing high precision is context sensitivity [Milanova et al. 2002, 2005; Sharir and Pnueli 1981; Shivers 1991; Smaragdakis et al. 2011], which allows each program method to be analyzed under different contexts, to separate the static abstractions of different dynamic instantiations of the method’s variables and thereby reduce spurious object flows. However, despite great precision benefits, context sensitivity comes with heavy efficiency costs [Kastrinis and Smaragdakis 2013; Lhoták and Hendren 2006; Oh et al. 2014; Tan et al. 2016, 2017; Xu and Rountev 2008]. One reason is that, with conventional context-sensitivity techniques, every method in a program is treated the same, meaning that many methods that do not benefit from context sensitivity are analyzed for multiple contexts redundantly. As a consequence, too much space and time is consumed [Smaragdakis et al. 2014]. This naturally raises the question of whether it is possible to apply context sensitivity selectively, only for the methods where it is beneficial to the overall analysis precision. It is far from trivial to determine when a context-sensitive analysis will yield precision benefits (or conversely, to determine when omitting context sensitivity for a method would introduce imprecision). This challenge of effectively identifying precision-critical methods has been the focus of past work [Hassanshahi et al. 2017; Jeon et al. 2019; Smaragdakis et al. 2014; Wei and Ryder 2015]. Those techniques are based on heuristics that seem to correlate with imprecision, but they do not provide a comprehensive understanding of how and where the imprecision is introduced in a context-insensitive pointer analysis. For example, introspective analysis [Smaragdakis et al. 2014] requires tuning multiple parameters involving sizes of various kinds of points-to sets, and data-driven analysis [Jeong et al. 2017] is parameterized by a collection of syntactic features and relies on machine learning for selecting good heuristics. In this article, we provide a new more principled approach, named Zipper, to efficiently identify precision-critical methods, based on insights about how imprecision is introduced. The key observation is that most cases in which imprecision arises in a context-insensitive pointer analysis fit into three general patterns of value flows, which we call direct, wrapped, and unwrapped flows. Moreover, we show that these three kinds of value flows can be recognized efficiently. Based on information obtained from a fast, context-insentive pointer analysis, Zipper constructs a precision flow graph (PFG) that concisely models the relevant value flow. The identification of precision-critical methods can then be formulated as a graph reachability problem on the PFG and solved in negligible time, compared to the pointer analysis itself. Additionally, we provide a variant of Zipper, named Zipper𝑒 , which also identifies efficiencycritical methods, i.e., methods that are costly to analyze with context sensitivity. With Zipper𝑒 , only the methods that are precision-critical but not efficiency-critical will be analyzed contextsensitively. By applying context sensitivity only to the methods selected by Zipper or Zipper𝑒 , a pointer analysis runs significantly faster than conventional techniques that apply context sensitivity indiscriminately to all methods, while retaining most of the attainable precision. In summary, we make the following key contributions. • We describe three general patterns of value flow that help in explaining how and where most of the imprecision is introduced in a context-insensitive pointer analysis (Section 3). • We present the Zipper approach to effectively recognize the three value-flow patterns and thereby identify the precision-critical methods that benefit from context sensitivity (Section 4). ACM Trans. Program. Lang. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2020
A Principled Approach to Selective Context Sensitivity for Pointer Analysis 13 ZIPPER can guide context-sensitive pointer analysis to run faster while keeping most of its precision. We propose the ZIPPER express(ZIPPER)approach,which is developed on the basis of ZIPPER but additionally considers analysis cost when selecting the methods to be analyzed with context sensitivity(Section 5).Generally,ZIPPERe can guide context-sensitive pointer analysis to run extremely fast while being reasonably precise. In contrast to other techniques that apply context sensitivity selectively,the ZIPPER and ZIPPERe approaches are based on a tangible understanding of imprecision and not on heuristics that require non-transparent machine learning or other tuning of multiple and complex analysis parameters. We provide an extensive experimental evaluation to evaluate the effectiveness of ZIPPER and ZIPPER(Section 6). On average,ZIPPER reports that only 38%of the methods are precision-critical,which preserves 98.8%of the precision(measured as average across a range of popular analysis clients)for a 2-object-sensitive pointer analysis with a context-sensitive heap(2obj),for a speedup of 3.4x and up to 9.2x.These results demonstrate that the three general patterns of value flows indeed capture the vast majority of methods that benefit from context sensitivity. On average,ZIPPERe reports that only 14%of the methods are precision-critical but not efficiency-critical,which preserves 94.7%of the precision for 2obj for a speedup of 25.5x and up to 88x.In addition,for 10 programs that 2obj is unable to analyze within 3 hours,on average,ZIPPER can guide 2obj to finish analyzing them under 11 minutes with still good precision.Moreover,for some programs,ZIPPERe-guided pointer analysis can even run faster while being more precise than context-insensitive analysis.These results establish ZIPPERe as a new sweet spot in state-of-the-art pointer analyses,for making highly practical trade-offs between efficiency and precision. The present article extends and supersedes the paper by Li et al.[2018a]presented at the ACM Conference on Object-Oriented Programming,Languages,Systems,and Applications 2018(OOPSLA 2018).In comparison,this article contains the following major extensions(and reflects an updated, more complete understanding throughout the rest of the text): We add a brief literature review of the history and current trends in context-sensitive pointer analysis for Java(Section 2). We present the new pointer analysis variant,ZIPPERe(ZIPPER express),which is extremely fast with also good precision(Section 5). We investigate whether the precision-loss patterns of ZIPPER,as its theoretical foundation, are general enough to effectively identify the precision-critical methods and thus preserve the precision for other mainstream context-sensitivity variants,including call-site sensitivity, type sensitivity and hybrid context sensitivity(Section 6.5). We conduct new experiments and extensively evaluate ZIPPERe in terms of efficiency and precision in practice(Section 6.6). 2 CONTEXT SENSITIVITY:A BRIEF REVIEW In this section,we describe how context-sensitivity has evolved for Java pointer analysis from purely improving precision to making good precision and efficiency trade-offs.In addition to giving a brief review to discuss this important and intricate research topic,we also provide some necessary background for our work.We focus on whole-program analysis,which is the main application setting of this research topic [Smaragdakis and Balatsouras 2015;Sridharan et al.2013]. ACM Trans.Program.Lang.Syst.,Vol.1,No.1,Article 1.Publication date:January 2020
A Principled Approach to Selective Context Sensitivity for Pointer Analysis 1:3 Zipper can guide context-sensitive pointer analysis to run faster while keeping most of its precision. • We propose the Zipper express (Zipper𝑒 ) approach, which is developed on the basis of Zipper but additionally considers analysis cost when selecting the methods to be analyzed with context sensitivity (Section 5). Generally, Zipper𝑒 can guide context-sensitive pointer analysis to run extremely fast while being reasonably precise. In contrast to other techniques that apply context sensitivity selectively, the Zipper and Zipper𝑒 approaches are based on a tangible understanding of imprecision and not on heuristics that require non-transparent machine learning or other tuning of multiple and complex analysis parameters. • We provide an extensive experimental evaluation to evaluate the effectiveness of Zipper and Zipper𝑒 (Section 6). – On average, Zipper reports that only 38% of the methods are precision-critical, which preserves 98.8% of the precision (measured as average across a range of popular analysis clients) for a 2-object-sensitive pointer analysis with a context-sensitive heap (2obj), for a speedup of 3.4× and up to 9.2×. These results demonstrate that the three general patterns of value flows indeed capture the vast majority of methods that benefit from context sensitivity. – On average, Zipper𝑒 reports that only 14% of the methods are precision-critical but not efficiency-critical, which preserves 94.7% of the precision for 2obj for a speedup of 25.5× and up to 88×. In addition, for 10 programs that 2obj is unable to analyze within 3 hours, on average, Zipper𝑒 can guide 2obj to finish analyzing them under 11 minutes with still good precision. Moreover, for some programs, Zipper𝑒 -guided pointer analysis can even run faster while being more precise than context-insensitive analysis. These results establish Zipper𝑒 as a new sweet spot in state-of-the-art pointer analyses, for making highly practical trade-offs between efficiency and precision. The present article extends and supersedes the paper by Li et al. [2018a] presented at the ACM Conference on Object-Oriented Programming, Languages, Systems, and Applications 2018 (OOPSLA 2018). In comparison, this article contains the following major extensions (and reflects an updated, more complete understanding throughout the rest of the text): • We add a brief literature review of the history and current trends in context-sensitive pointer analysis for Java (Section 2). • We present the new pointer analysis variant, Zipper𝑒 (Zipper express), which is extremely fast with also good precision (Section 5). • We investigate whether the precision-loss patterns of Zipper, as its theoretical foundation, are general enough to effectively identify the precision-critical methods and thus preserve the precision for other mainstream context-sensitivity variants, including call-site sensitivity, type sensitivity and hybrid context sensitivity (Section 6.5). • We conduct new experiments and extensively evaluate Zipper𝑒 in terms of efficiency and precision in practice (Section 6.6). 2 CONTEXT SENSITIVITY: A BRIEF REVIEW In this section, we describe how context-sensitivity has evolved for Java pointer analysis from purely improving precision to making good precision and efficiency trade-offs. In addition to giving a brief review to discuss this important and intricate research topic, we also provide some necessary background for our work. We focus on whole-program analysis, which is the main application setting of this research topic [Smaragdakis and Balatsouras 2015; Sridharan et al. 2013]. ACM Trans. Program. Lang. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2020
1:4 Yue Li,Tian Tan,Anders Moller,and Yannis Smaragdakis There are two major factors that control the precision of pointer analysis:how to abstract the heap,and how to abstract the call stack.Compared with the few approaches on heap abstraction for Java pointer analysis [Kanvar and Khedker 2016;Lhotak and Hendren 2003;Tan et al.2017], there is ample literature on stack modeling.Among the schemes for modeling the stack,in the past years,context sensitivity has been considered as the most effective approach to improve the analysis precision for Java programs [Lhotak and Hendren 2006]and has thus attracted the most attention from researchers [Bravenboer and Smaragdakis 2009;Hassanshahi et al.2017;Jeon et al. 2019,2018;Jeong et al.2017;Kastrinis and Smaragdakis 2013;Lhotak and Hendren 2006;Li et al. 2018b;Milanova et al.2002,2005;Smaragdakis and Balatsouras 2015;Smaragdakis et al.2011,2014; Sridharan et al.2013;Tan et al.2016,2017;Thakur and Nandivada 2019;Thiessen and Lhotak 2017]. We summarize the research work on context-sensitive pointer analysis into three categories according to(1)the kinds of context elements,(2)the composition of context elements,and (3)the selection of which parts of a given program to analyze with context sensitivity.Interestingly,the techniques in different categories are orthogonal and can thus be combined for producing more sophisticated pointer analyses. Kinds of Context Elements.Many elements in a program,e.g.,call sites,allocation sites,and types, can be considered as context elements in context sensitivity.The earliest type of context elements for Java pointer analysis inherits from the foundational approaches used both for C and for functional languages [Sharir and Pnueli 1981;Shivers 1991]where context elements are call sites.We refer to this style of context sensitivity as call-site sensitivity or call-string sensitivity [Smaragdakis and Balatsouras 2015].Unlike C programs,which often predominantly manipulate values in the stack. Java programs have their inter-procedural data flow primarily through heap objects.Accordingly. Milanova et al.[2005]proposed object sensitivity that uses allocation sites of receiver objects(the program points where these objects are created)as context elements.Generally,object sensitivity is more precise and efficient than call-site sensitivity,and is considered the most effective context- sensitivity variant for producing good precision for Java.This conclusion has been extensively validated in various pointer analysis frameworks [Kastrinis and Smaragdakis 2013;Lhotak and Hendren 2006;Naik et al.2006;Sridharan et al.2013;Tan et al.2016,2017]. Despite being precise,object sensitivity is difficult to scale for large and complex Java programs, which motivated the concept of type sensitivity [Smaragdakis et al.2011].In type sensitivity,the context elements are the types of the classes containing allocation sites(as the latter would appear in object sensitivity).This more coarse-grained choice of context elements enables type sensitivity to run much faster by sacrificing some precision compared to object sensitivity.Today,call-site, object,and type sensitivity are still considered the three mainstream context-sensitivity variants for Java pointer analysis [Jeong et al.2017;Smaragdakis and Balatsouras 2015;Tan et al.2017]. Composition of Context Elements.With context sensitivity,each method can be analyzed in multiple contexts,where a context consists of a list of context elements that model the run-time call stack.For example,in call-string sensitivity [Sharir and Pnueli 1981;Shivers 1991],a context for a method m is composed by the context elements that are listed as [c1,c2,...],where c is m's call site,cz is the call site of the method that contains c,etc.To ensure termination of the analysis,a k-limiting approach is typically followed:only the most recent k context elements are selected,and, for efficiency,k is usually set to a very small value in practice.The call-string sensitivity approach has been adopted in virtually all context-sensitive pointer analyses [Smaragdakis and Balatsouras 2015;Sridharan et al.2013].However,Tan et al.[2016]found that this conventional approach leads to many context elements that are not useful for improving precision while occupying the limited slots of context elements determined by k.To address this problem,Tan et al.[2016]presented an approach to identify such redundant context elements.By skipping those elements,a resulting ACM Trans.Program.Lang.Syst.,Vol.1,No.1,Article 1.Publication date:January 2020
1:4 Yue Li, Tian Tan, Anders Møller, and Yannis Smaragdakis There are two major factors that control the precision of pointer analysis: how to abstract the heap, and how to abstract the call stack. Compared with the few approaches on heap abstraction for Java pointer analysis [Kanvar and Khedker 2016; Lhoták and Hendren 2003; Tan et al. 2017], there is ample literature on stack modeling. Among the schemes for modeling the stack, in the past years, context sensitivity has been considered as the most effective approach to improve the analysis precision for Java programs [Lhoták and Hendren 2006] and has thus attracted the most attention from researchers [Bravenboer and Smaragdakis 2009; Hassanshahi et al. 2017; Jeon et al. 2019, 2018; Jeong et al. 2017; Kastrinis and Smaragdakis 2013; Lhoták and Hendren 2006; Li et al. 2018b; Milanova et al. 2002, 2005; Smaragdakis and Balatsouras 2015; Smaragdakis et al. 2011, 2014; Sridharan et al. 2013; Tan et al. 2016, 2017; Thakur and Nandivada 2019; Thiessen and Lhoták 2017]. We summarize the research work on context-sensitive pointer analysis into three categories according to (1) the kinds of context elements, (2) the composition of context elements, and (3) the selection of which parts of a given program to analyze with context sensitivity. Interestingly, the techniques in different categories are orthogonal and can thus be combined for producing more sophisticated pointer analyses. Kinds of Context Elements. Many elements in a program, e.g., call sites, allocation sites, and types, can be considered as context elements in context sensitivity. The earliest type of context elements for Java pointer analysis inherits from the foundational approaches used both for C and for functional languages [Sharir and Pnueli 1981; Shivers 1991] where context elements are call sites. We refer to this style of context sensitivity as call-site sensitivity or call-string sensitivity [Smaragdakis and Balatsouras 2015]. Unlike C programs, which often predominantly manipulate values in the stack, Java programs have their inter-procedural data flow primarily through heap objects. Accordingly, Milanova et al. [2005] proposed object sensitivity that uses allocation sites of receiver objects (the program points where these objects are created) as context elements. Generally, object sensitivity is more precise and efficient than call-site sensitivity, and is considered the most effective contextsensitivity variant for producing good precision for Java. This conclusion has been extensively validated in various pointer analysis frameworks [Kastrinis and Smaragdakis 2013; Lhoták and Hendren 2006; Naik et al. 2006; Sridharan et al. 2013; Tan et al. 2016, 2017]. Despite being precise, object sensitivity is difficult to scale for large and complex Java programs, which motivated the concept of type sensitivity [Smaragdakis et al. 2011]. In type sensitivity, the context elements are the types of the classes containing allocation sites (as the latter would appear in object sensitivity). This more coarse-grained choice of context elements enables type sensitivity to run much faster by sacrificing some precision compared to object sensitivity. Today, call-site, object, and type sensitivity are still considered the three mainstream context-sensitivity variants for Java pointer analysis [Jeong et al. 2017; Smaragdakis and Balatsouras 2015; Tan et al. 2017]. Composition of Context Elements. With context sensitivity, each method can be analyzed in multiple contexts, where a context consists of a list of context elements that model the run-time call stack. For example, in call-string sensitivity [Sharir and Pnueli 1981; Shivers 1991], a context for a method 𝑚 is composed by the context elements that are listed as [𝑐1, 𝑐2, . . . ], where 𝑐1 is 𝑚’s call site, 𝑐2 is the call site of the method that contains 𝑐1, etc. To ensure termination of the analysis, a 𝑘-limiting approach is typically followed: only the most recent 𝑘 context elements are selected, and, for efficiency, 𝑘 is usually set to a very small value in practice. The call-string sensitivity approach has been adopted in virtually all context-sensitive pointer analyses [Smaragdakis and Balatsouras 2015; Sridharan et al. 2013]. However, Tan et al. [2016] found that this conventional approach leads to many context elements that are not useful for improving precision while occupying the limited slots of context elements determined by 𝑘. To address this problem, Tan et al. [2016] presented an approach to identify such redundant context elements. By skipping those elements, a resulting ACM Trans. Program. Lang. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2020
A Principled Approach to Selective Context Sensitivity for Pointer Analysis 1:5 1 object m(object o){ 2 return o: 3} 4 x1 new A() 5x2=m(x1): 6 y1 new B(); 7y2=m(y1): Fig.1.Example of precision loss in context-insensitive analysis context-sensitive pointer analysis is guaranteed to be at least as precise as(practically,more precise than)the conventional context sensitivity for the same choice of k.Similarly,Jeon et al.[2018] developed a machine-learning approach to select context elements,leading to a context-sensitive pointer analysis that is both reasonably efficient and precise. Selective Use of Context Sensitivity.The conventional approach to context sensitive analysis is to uniformly apply context sensitivity to every method in the given program.However,in recent years,researchers have become aware that for some methods,context sensitivity does not help improve the analysis precision but only introduces extra analysis cost.As a result,many selective context-sensitive pointer analyses have been proposed and contribute to a promising research direction for making practical trade-offs between precision and efficiency [Hassanshahi et al.2017;Jeon et al.2019;Jeong et al.2017;Li et al.2018b;Smaragdakis et al.2014].In these analyses,context sensitivity is applied only for the methods where it is deemed beneficial to the overall analysis precision.However,finding such precision-critical methods is challenging.The existing approaches to selective context-sensitive pointer analyses rely on heuristics [Hassanshahi et al.2017;Smaragdakis et al.2014]or machine learning [Jeon et al.2019;Jeong et al.2017].The heuristic approaches require manual tuning of multiple complicated parameters.The machine learning approaches are able to reveal some program features that may correlate with the analysis effectiveness;however,the weaknesses of machine learning approaches are also well known:they requires training and manual oversight during the tuning phase,they can behave unpredictably for new inputs,and they offer few insights on why they work. In this article,we present a more principled approach that explains when using context sensitivity for a method is beneficial for precision,or conversely,when omitting context sensitivity introduces imprecision. 3 CAUSES OF IMPRECISION IN CONTEXT-INSENSITIVE POINTER ANALYSIS To address the challenge of how to predict which methods are precision-critical in a given program, in this section,we introduce a general model to show that most of the precision loss in context- insensitive pointer analysis for Java can be expressed in terms of three patterns of value flows,or as combinations of these.Precision-loss patterns are independent of the chosen variant of context sensitivity,such as call-site sensitivity [Sharir and Pnueli 1981;Shivers 1991],object sensitivity [Mi- lanova et al.2005],and type sensitivity [Smaragdakis et al.2011].We first introduce the three precision loss patterns and then describe three corresponding concrete examples(Sections 3.1-3.3). This characterization of precision loss provides the conceptual foundation for ZIPPER and ZIPPER to identify the methods that will be analyzed with context sensitivity,as explained in Sections 4 and 5,respectively. A context-insensitive analysis does not distinguish between different calls to a method but merges the incoming abstract values(or points-to sets,in the case of pointer analysis)[Sharir and Pnueli 1981].Figure 1 shows a simple example.If method m is analyzed context-insensitively,then ACM Trans.Program.Lang.Syst.,Vol.1,No.1,Article 1.Publication date:January 2020
A Principled Approach to Selective Context Sensitivity for Pointer Analysis 1:5 1 2 3 4 5 6 7 Object m(Object o){ return o; } x1 = new A(); x2 = m(x1); y1 = new B(); y2 = m(y1); Fig. 1. Example of precision loss in context-insensitive analysis. context-sensitive pointer analysis is guaranteed to be at least as precise as (practically, more precise than) the conventional context sensitivity for the same choice of 𝑘. Similarly, Jeon et al. [2018] developed a machine-learning approach to select context elements, leading to a context-sensitive pointer analysis that is both reasonably efficient and precise. Selective Use of Context Sensitivity. The conventional approach to context sensitive analysis is to uniformly apply context sensitivity to every method in the given program. However, in recent years, researchers have become aware that for some methods, context sensitivity does not help improve the analysis precision but only introduces extra analysis cost. As a result, many selective context-sensitive pointer analyses have been proposed and contribute to a promising research direction for making practical trade-offs between precision and efficiency [Hassanshahi et al. 2017; Jeon et al. 2019; Jeong et al. 2017; Li et al. 2018b; Smaragdakis et al. 2014]. In these analyses, context sensitivity is applied only for the methods where it is deemed beneficial to the overall analysis precision. However, finding such precision-critical methods is challenging. The existing approaches to selective context-sensitive pointer analyses rely on heuristics [Hassanshahi et al. 2017; Smaragdakis et al. 2014] or machine learning [Jeon et al. 2019; Jeong et al. 2017]. The heuristic approaches require manual tuning of multiple complicated parameters. The machine learning approaches are able to reveal some program features that may correlate with the analysis effectiveness; however, the weaknesses of machine learning approaches are also well known: they requires training and manual oversight during the tuning phase, they can behave unpredictably for new inputs, and they offer few insights on why they work. In this article, we present a more principled approach that explains when using context sensitivity for a method is beneficial for precision, or conversely, when omitting context sensitivity introduces imprecision. 3 CAUSES OF IMPRECISION IN CONTEXT-INSENSITIVE POINTER ANALYSIS To address the challenge of how to predict which methods are precision-critical in a given program, in this section, we introduce a general model to show that most of the precision loss in contextinsensitive pointer analysis for Java can be expressed in terms of three patterns of value flows, or as combinations of these. Precision-loss patterns are independent of the chosen variant of context sensitivity, such as call-site sensitivity [Sharir and Pnueli 1981; Shivers 1991], object sensitivity [Milanova et al. 2005], and type sensitivity [Smaragdakis et al. 2011]. We first introduce the three precision loss patterns and then describe three corresponding concrete examples (Sections 3.1–3.3). This characterization of precision loss provides the conceptual foundation for Zipper and Zipper𝑒 to identify the methods that will be analyzed with context sensitivity, as explained in Sections 4 and 5, respectively. A context-insensitive analysis does not distinguish between different calls to a method but merges the incoming abstract values (or points-to sets, in the case of pointer analysis) [Sharir and Pnueli 1981]. Figure 1 shows a simple example. If method m is analyzed context-insensitively, then ACM Trans. Program. Lang. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2020
1:6 Yue Li,Tian Tan,Anders Moller,and Yannis Smaragdakis Direct Wrapped Unwrapped Flow Flow Flow IN Method Objects Value flow via assignments S field load/store operations, OUT Method or method calls/returns Fig.2.Three general patterns of value flow that cause precision loss in context-insensitive analysis. the two objects are mixed together,so the analysis conservatively concludes that both x2 and y2 may point to both the A object and the B object. In contrast,a context-sensitive analysis would analyze m twice,corresponding to the two different call sites,and thereby conclude that x2 can only point to an A object and y2 can only point to a B object.The price of that extra precision is that the method needs to be analyzed multiple times, so context sensitivity should ideally only be applied when the precision gain outweighs the extra analysis time. To characterize the relevant value flows,we first introduce some terminology. Definition 3.1(IN and Our methods).Given a class C and a method M that is declared in C or inherited from C's super-classes,if M contains one or more parameters then M is an IN method of C,and if M's return type is non-void then M is an Our method of C.(In the example in Figure 1,m is both an IN and an Our method of the surrounding class.) Definition 3.2(Object wrapping and unwrapping).If an object O is stored in a field of an object W (or in an array entry of W,in case W is an array),then O is wrapped into W.Conversely,if an object O is loaded from a field of an object W(or from an array entry of W in case W is an array),then O is unwrapped from W.(The simple example in Figure 1 contains no wrapping or unwrapping.) With these definitions in place,we can describe the three precision-loss patterns as different kinds of value flows,depicted in Figure 2. Definition 3.3(Direct flow).If,in some execution of the program,an object O is passed as a parameter to an IN method M of class C,and then flows(via a series of assignments,field load/store operations,method calls,or returns)to the return value of an Our method,M2,of the same class C,then we say the program has direct flow from M to M2.(The example in Figure 1 is a simple instance of this pattern.) Definition 3.4(Wrapped flow).If,in some execution of the program,an object O is passed as a parameter to an IN method Mi of class C and then flows to a store operation that wraps O into an object W,where W subsequently flows to the result of an Our method,M2,of the same class C,then we say the program has wrapped flow from Mi to M2.More generally,the wrapped flow pattern also covers value flow through multiple object wrapping steps,for example when W is itself wrapped into another object W,which flows to the return value of M2. Definition 3.5(Unwrapped flow).If,in some execution of the program,an object O is passed as a parameter to an IN method Mi of class C and then flows to a load operation that unwraps an object U from O,where U subsequently flows to the return value of an OuT method,M2,of the same class C,then we say the program has unwrapped flow from M to M2.As in the previous definition, unwrapped flow also covers value flow through multiple object unwrapping steps. ACM Trans.Program.Lang.Syst.,Vol.1,No.1,Article 1.Publication date:January 2020
1:6 Yue Li, Tian Tan, Anders Møller, and Yannis Smaragdakis Direct Flow Wrapped Flow Unwrapped Flow IN Method OUT Method Value flow via assignments, field load/store operations, or method calls/returns Objects Fig. 2. Three general patterns of value flow that cause precision loss in context-insensitive analysis. the two objects are mixed together, so the analysis conservatively concludes that both x2 and y2 may point to both the A object and the B object. In contrast, a context-sensitive analysis would analyze m twice, corresponding to the two different call sites, and thereby conclude that x2 can only point to an A object and y2 can only point to a B object. The price of that extra precision is that the method needs to be analyzed multiple times, so context sensitivity should ideally only be applied when the precision gain outweighs the extra analysis time. To characterize the relevant value flows, we first introduce some terminology. Definition 3.1 (In and Out methods). Given a class 𝐶 and a method 𝑀 that is declared in 𝐶 or inherited from 𝐶’s super-classes, if 𝑀 contains one or more parameters then 𝑀 is an In method of 𝐶, and if 𝑀’s return type is non-void then 𝑀 is an Out method of 𝐶. (In the example in Figure 1, m is both an In and an Out method of the surrounding class.) Definition 3.2 (Object wrapping and unwrapping). If an object 𝑂 is stored in a field of an object 𝑊 (or in an array entry of𝑊 , in case𝑊 is an array), then 𝑂 is wrapped into 𝑊 . Conversely, if an object 𝑂 is loaded from a field of an object 𝑊 (or from an array entry of 𝑊 in case 𝑊 is an array), then 𝑂 is unwrapped from 𝑊 . (The simple example in Figure 1 contains no wrapping or unwrapping.) With these definitions in place, we can describe the three precision-loss patterns as different kinds of value flows, depicted in Figure 2. Definition 3.3 (Direct flow). If, in some execution of the program, an object 𝑂 is passed as a parameter to an In method 𝑀1 of class 𝐶, and then flows (via a series of assignments, field load/store operations, method calls, or returns) to the return value of an Out method, 𝑀2, of the same class 𝐶, then we say the program has direct flow from 𝑀1 to 𝑀2. (The example in Figure 1 is a simple instance of this pattern.) Definition 3.4 (Wrapped flow). If, in some execution of the program, an object 𝑂 is passed as a parameter to an In method 𝑀1 of class 𝐶 and then flows to a store operation that wraps 𝑂 into an object 𝑊 , where 𝑊 subsequently flows to the result of an Out method, 𝑀2, of the same class 𝐶, then we say the program has wrapped flow from 𝑀1 to 𝑀2. More generally, the wrapped flow pattern also covers value flow through multiple object wrapping steps, for example when 𝑊 is itself wrapped into another object 𝑊 ′ , which flows to the return value of 𝑀2. Definition 3.5 (Unwrapped flow). If, in some execution of the program, an object 𝑂 is passed as a parameter to an In method 𝑀1 of class 𝐶 and then flows to a load operation that unwraps an object 𝑈 from 𝑂, where 𝑈 subsequently flows to the return value of an Out method, 𝑀2, of the same class 𝐶, then we say the program has unwrapped flow from 𝑀1 to 𝑀2. As in the previous definition, unwrapped flow also covers value flow through multiple object unwrapping steps. ACM Trans. Program. Lang. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2020
A Principled Approach to Selective Context Sensitivity for Pointer Analysis 1:7 Direct Flow 1 class Person 15} O @ String name;String id; 16 /Usage Code (19)name1 name2 (24) void setName(String nm){ 17 void main(){ this.name nm: 18 Person p1 new Person(); (3) updateID(): 19 String name1 new String("A"): this:name (4) 6 20 p1.setName(name1): 7 void updateID(){ 21 String id1 p1.getID(): newName (8) 8 String newName this.name: 22 9 this.id newName: 23 Person p2 new Person(); this.id (9) 101 24 String name2 new String("B"): 11 String getID(){ 25 p2.setName(name2): id (12) 12 String id this.id: 26 String id2 p2.getID(): 13 return id: 27 (21)1d1 1d2(26) 14 ①② ①② Fig.3.Example of direct flow.(The line number for each variable/field reference on the right-hand side is shown in parentheses.) 3.1 Pattern 1:Direct Flow The setter and getter example shown in Figure 3 demonstrates how direct flow is an indication of precision loss for a context-insensitive analysis.The Person class provides methods setName and getID to modify a person's name and retrieve his or her ID.Whenever a person's name is modified, the ID is updated accordingly (line 5). After executing this code,id1 in line 21(resp.id2 in line 26)points to object 1in line 19 (resp.2 in line 24)only.However,if the three methods of Person are analyzed using a context-insensitive pointer analysis,then idl and id2 will both imprecisely point to objects 1and2.Let us examine how this imprecision is connected to the direct flow pattern. The right-hand side of Figure 3 illustrates how two objects1and 2,respectively pointed to by name1 and name2,first flow from their creation sites in lines 19 and 24 to the parameter nm of the IN method setName in line 3,and then to id in line 12 through a series of store and load operations (line 4-line 8-line 9-line 12),and finally out of the Our method getID to id1 and id2 in lines 21 and 26.Hence,by Definition 3.3,the red arrows in Figure 3 form a direct flow. Notice that with a context-insensitive analysis,objects 1and 2are merged in the same points-to set and further propagated according to this direct flow.In the analysis,the merged objects will flow out of the Our method,causing id1 and id2 to point to spurious objects.Such imprecision will only get worse when some operations are further applied on id1 and id2(not shown in this example),possibly polluting other parts of the program. One way to avoid the imprecision is to apply context sensitivity to the methods that participate in the direct flow.We consider these to be precision-critical methods,since analyzing just one of them context-insensitively will likely introduce imprecision.With a context-sensitive analysis(for most variants of context sensitivity),in Figure 3,all variables and field references along the direct flow will be analyzed separately.For example,object sensitivity will use the two allocation sites at lines 18 and 23 as contexts.Accordingly,the merged paths along this direct flow are separated by the two contexts,like unzipping a zipper-hence the name of our technique.A similar strategy of separating merged paths also applies to wrapped and unwrapped flows,as shown next. 3.2 Pattern 2:Wrapped Flow The collection and iterator example shown in Figure 4 demonstrates how the wrapped flow pattern yields precision loss for a context-insensitive analysis.To keep the example simple,the collection ACM Trans.Program.Lang.Syst.,Vol.1,No.1,Article 1.Publication date:January 2020
A Principled Approach to Selective Context Sensitivity for Pointer Analysis 1:7 Direct Flow name1 1 2 1 2 class Person { 1 String name; String id; void setName(String nm) { this.name = nm; updateID(); } void updateID() { String newName = this.name; this.id = newName; } String getID() { String id = this.id; return id; } 1 2 3 4 5 6 7 8 9 10 11 12 13 14 } // Usage Code void main() { Person p1 = new Person(); String name1 = new String("A"); p1.setName(name1); String id1 = p1.getID(); Person p2 = new Person(); String name2 = new String("B"); p2.setName(name2); String id2 = p2.getID(); } 15 16 17 18 19 20 21 22 23 24 25 26 27 name2 nm this.name newName this.id id id1 id2 2 (24) (3) (4) (8) (9) (12) (21) (26) (19) Fig. 3. Example of direct flow. (The line number for each variable/field reference on the right-hand side is shown in parentheses.) 3.1 Pattern 1: Direct Flow The setter and getter example shown in Figure 3 demonstrates how direct flow is an indication of precision loss for a context-insensitive analysis. The Person class provides methods setName and getID to modify a person’s name and retrieve his or her ID. Whenever a person’s name is modified, the ID is updated accordingly (line 5). After executing this code, id1 in line 21 (resp. id2 in line 26) points to object 1 in line 19 (resp. 2 in line 24) only. However, if the three methods of Person are analyzed using a context-insensitive pointer analysis, then id1 and id2 will both imprecisely point to objects 1 and 2 . Let us examine how this imprecision is connected to the direct flow pattern. The right-hand side of Figure 3 illustrates how two objects 1 and 2 , respectively pointed to by name1 and name2, first flow from their creation sites in lines 19 and 24 to the parameter nm of the In method setName in line 3, and then to id in line 12 through a series of store and load operations (line 4 → line 8 → line 9 → line 12), and finally out of the Out method getID to id1 and id2 in lines 21 and 26. Hence, by Definition 3.3, the red arrows in Figure 3 form a direct flow. Notice that with a context-insensitive analysis, objects 1 and 2 are merged in the same points-to set and further propagated according to this direct flow. In the analysis, the merged objects will flow out of the Out method, causing id1 and id2 to point to spurious objects. Such imprecision will only get worse when some operations are further applied on id1 and id2 (not shown in this example), possibly polluting other parts of the program. One way to avoid the imprecision is to apply context sensitivity to the methods that participate in the direct flow. We consider these to be precision-critical methods, since analyzing just one of them context-insensitively will likely introduce imprecision. With a context-sensitive analysis (for most variants of context sensitivity), in Figure 3, all variables and field references along the direct flow will be analyzed separately. For example, object sensitivity will use the two allocation sites at lines 18 and 23 as contexts. Accordingly, the merged paths along this direct flow are separated by the two contexts, like unzipping a zipper—hence the name of our technique. A similar strategy of separating merged paths also applies to wrapped and unwrapped flows, as shown next. 3.2 Pattern 2: Wrapped Flow The collection and iterator example shown in Figure 4 demonstrates how the wrapped flow pattern yields precision loss for a context-insensitive analysis. To keep the example simple, the collection ACM Trans. Program. Lang. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2020
1:8 Yue Li,Tian Tan,Anders Moller,and Yannis Smaragdakis Wrapped Flow class Collection{ 18 Object next(){ ( ② 2 Object elem; 19 return this.next: (25)s1 s2(31) 3 void add(Object el){ 20 this.elem =el: 21 (3) 22 //Usage Code 6 Iterator iterator(){ 23 void main(){ this.elem (4) Object e this.elem: 24 Collection c1 new Collection(): Iterator itr 25 String s1 new String("A"): ⊙ new Iterator(e); 26 c1.add(s1): 9 return itr; 27 Iterator i1 c1.iterator(): (8)itr obi (14) 10 } 28 Object o1 i1.next() 11} 29 this next(③X②) (15) 12 class Iterator 30 Collection c2 new Collection(): 13 Object next: 1 String s2 new String("B"): 17 (27)(33) 14 Iterator(Object obj){ 32 c2.add(s2): ①回0回 15 this.next obj: 33 Iterator 12 c2.iterator(); 16 34 Object o2 12.next(); 01 (28)(34) 17 35 ①@①② Fig.4.Example of wrapped flow. only stores one element,however the code pattern is directly analogous to realistic code,for arbitrarily-sized collections.Class Collection provides an add method to add an element to the collection and an iterator method to return an iterator that has a pointer,next,pointing to the collection element(as set in line 15).The element is passed as an argument to the newly created iterator(line 8),which establishes a connection between the collection and its iterator.Two objects D(line 25)and (line 31)are stored in two different collections,c1(line 26)and c2 (line 32).The two objects are then accessed by the iterators of the collections(lines 28 and 34). After executing the code,o1 in line 28(resp.o2 in line 34)points to object 1(resp.)only.How- ever,if any of the four methods of Collection and Iterator are analyzed context-insensitively,o1 and o2 will both imprecisely point to both objects1and 2.Let us examine how this imprecision is connected to the wrapped flow pattern. As shown on the right-hand side of Figure 4,similarly to the direct flow example in Figure 3, objects 1 and2 flow into the IN method add of class Collection,and then further to lines 7, 8,and 14.Unlike a direct flow,the objects 1 and2 do not directly flow out of the Our method iterator of class Collection;instead,a wrapper Iterator object,(created on line 8)in which object 1 or 2 is stored,flows out of this Our method. Object wrapping(Definition 3.2)occurs in line 15:objects 1and (pointed to by obj)are stored into the next field of the object pointed to by this,and this points to the receiver object of the constructor call in line 8,which is also pointed to by itr in line 8.As a wrapper object(that stores object1or )flows out of an OUT method of the same class,by Definition 3.4,the solid blue arrows in Figure 4 form a wrapped flow. With a context-insensitive analysis,objects 1 and 2are merged in the same points-to set and further propagated according to this wrapped flow.However,unlike a direct flow,imprecision is not introduced until the access operation (e.g.,the next calls in lines 28 and 34)is applied on the flowing-out wrapper object,causing variables o1 and o2 to point to spurious objects.The wrapper objects carry the flowing-in objects,which originate from outside the class,so context sensitivity can separate the merged objects all along their flow through the Collection class. The example also helps illustrate some subtleties of the flow definitions.Note that the precision loss patterns are expressed relative to a class:for each of the three patterns,the IN method and ACM Trans.Program.Lang.Syst.,Vol.1,No.1,Article 1.Publication date:January 2020
1:8 Yue Li, Tian Tan, Anders Møller, and Yannis Smaragdakis s1 s2 o1 o2 itr this.elem el Wrapped Flow 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 Object next() { return this.next; } } //Usage Code void main() { Collection c1 = new Collection(); String s1 = new String("A"); c1.add(s1); Iterator i1 = c1.iterator(); Object o1 = i1.next(); Collection c2 = new Collection(); String s2 = new String("B"); c2.add(s2); Iterator i2 = c2.iterator(); Object o2 = i2.next(); } class Collection { Object elem; void add(Object el) { this.elem = el; } Iterator iterator(){ Object e = this.elem; Iterator itr = new Iterator(e); return itr; } } class Iterator { Object next; Iterator(Object obj) { this.next = obj; } e obj this.next 1 2 i1 i2 1 2 1 2 1 2 1 2 1 2 (25) (31) (3) (4) (7) (14) (15) (33) (34) (8) (27) (28) Fig. 4. Example of wrapped flow. only stores one element, however the code pattern is directly analogous to realistic code, for arbitrarily-sized collections. Class Collection provides an add method to add an element to the collection and an iterator method to return an iterator that has a pointer, next, pointing to the collection element (as set in line 15). The element is passed as an argument to the newly created iterator (line 8), which establishes a connection between the collection and its iterator. Two objects 1 (line 25) and 2 (line 31) are stored in two different collections, c1 (line 26) and c2 (line 32). The two objects are then accessed by the iterators of the collections (lines 28 and 34). After executing the code, o1 in line 28 (resp. o2 in line 34) points to object 1 (resp. 2 ) only. However, if any of the four methods of Collection and Iterator are analyzed context-insensitively, o1 and o2 will both imprecisely point to both objects 1 and 2 . Let us examine how this imprecision is connected to the wrapped flow pattern. As shown on the right-hand side of Figure 4, similarly to the direct flow example in Figure 3, objects 1 and 2 flow into the In method add of class Collection, and then further to lines 7, 8, and 14. Unlike a direct flow, the objects 1 and 2 do not directly flow out of the Out method iterator of class Collection; instead, a wrapper Iterator object, , (created on line 8) in which object 1 or 2 is stored, flows out of this Out method. Object wrapping (Definition 3.2) occurs in line 15: objects 1 and 2 (pointed to by obj) are stored into the next field of the object pointed to by this, and this points to the receiver object of the constructor call in line 8, which is also pointed to by itr in line 8. As a wrapper object (that stores object 1 or 2 ) flows out of an Out method of the same class, by Definition 3.4, the solid blue arrows in Figure 4 form a wrapped flow. With a context-insensitive analysis, objects 1 and 2 are merged in the same points-to set and further propagated according to this wrapped flow. However, unlike a direct flow, imprecision is not introduced until the access operation (e.g., the next calls in lines 28 and 34) is applied on the flowing-out wrapper object, causing variables o1 and o2 to point to spurious objects. The wrapper objects carry the flowing-in objects, which originate from outside the class, so context sensitivity can separate the merged objects all along their flow through the Collection class. The example also helps illustrate some subtleties of the flow definitions. Note that the precision loss patterns are expressed relative to a class: for each of the three patterns, the In method and ACM Trans. Program. Lang. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2020
A Principled Approach to Selective Context Sensitivity for Pointer Analysis 19 Unwrapped Flow 1 class SyncBox 19 Object getItem()( Object box: 28 Object it this.item: (26) s1 s2 (31) 3 SyncBox(Boxbox){ 21 return it; this.box box: 22 (27) b1 b2 32) 23 box (3) 6 Object getItem(){ 24 //Usage Code synchronized(this){ 25 void main(){ this.box (4) 8 Box b this.box: 26 String s1 new String("A"): 9 object o b.getItem(): 27 Box b1 new Box(s1); 回 (8) 10 return o: 28 SyncBox sb1 new SyncBox(b1): 11 29 Object o1 sb1.getItem(): (20) Eh因1tem①② 12 30 13} 31 String s2 new String("B"): (20) 14 class Box{ 32 Box b2 new Box(s2): ǎ 15 Object item: 33 SyncBox sb2 new SyncBox(b2): (9) 16 Box(Object item){ 34 Object o2 sb2.getItem() 1) this.item item; 353 (29)01 02(34) 18 ①0 ①@ Fig.5.Example of unwrapped flow. the Our method must be in the same class,although the value flow may involve other classes,as described in Definitions 3.3-3.5.Intuitively,if the precision-loss flows introduced in each class (through method calls on the objects of the class)could be identified and then avoided by use of context sensitivity,the imprecision of the whole program could be accordingly controlled via such a divide-and-conquer scheme.In addition,this design choice enables an efficient and elegant algorithm for identifying occurrences of the patterns in a given program,by considering each class one by one,as explained in Section 4. Therefore,the dashed arrows(bottom right of Figure 4)formed by calling the next method in lines 28 and 34,do not belong to the wrapped flow,because the calls happen after the wrapper objects flow out from the Our method of class Collection.Thus,as explained in Section 3.1, only methods add and iterator (in Collection)and the constructor Iterator(in Iterator)are included in the wrapped flow and thus considered precision-critical.However,if we consider IN and Our methods from the point of view of class Iterator,then method next is also precision-critical, since it is involved in a direct flow together with the Iterator constructor,much like the setter and getter methods in Section 3.1. 3.3 Pattern 3:Unwrapped Flow We use a synchronized box example(based on classes SynchronizedSet and Set in the JDK but heavily simplified)to illustrate an unwrapped flow,as shown in Figure 5.Class SyncBox encapsulates class Box by providing synchronization in the encapsulating method getItem(lines 6-12).Two objects 1 and 2are stored into two Box objects(represented by and pointed to by b1 and b2 in lines 27 and 32),which are further stored into two SyncBox objects(lines 28 and 33). After executing the code,o1 in line 29 (resp.o2 in line 34)points to object1(resp.2)only. However,if any of the four methods of classes SyncBox and Box are analyzed context-insensitively, o1 and o2 will both imprecisely point to both objects1and 2.Let us examine how this imprecision is connected to the unwrapped flow pattern. As shown on the right-hand side of Figure 5,similar to the direct flow in Figure 3,two Box objects 1and 2(pointed to by b1 and b2,respectively)flow into the body of class SyncBox through its constructor,which acts as an IN method,and then further to b in line 8.Unlike in a direct flow, ACM Trans.Program.Lang.Syst.,Vol.1,No.1,Article 1.Publication date:January 2020
A Principled Approach to Selective Context Sensitivity for Pointer Analysis 1:9 o1 o2 (34) b1 s1 s2 b this.box box Unwrapped Flow 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 this.item class SyncBox { Object box; SyncBox(Box box) { this.box = box; } Object getItem() { synchronized(this) { Box b = this.box; Object o = b.getItem(); return o; } } } class Box { Object item; Box(Object item) { this.item = item; } //Usage Code void main() { String s1 = new String("A"); Box b1 = new Box(s1); SyncBox sb1 = new SyncBox(b1); Object o1 = sb1.getItem(); String s2 = new String("B"); Box b2 = new Box(s2); SyncBox sb2 = new SyncBox(b2); Object o2 = sb2.getItem(); } Object getItem() { Object it = this.item; return it; } } b2 it o (26) (31) (27) (32) (3) (4) (8) (20) (20) (9) (29) 1 2 1 2 1 2 1 2 1 2 Fig. 5. Example of unwrapped flow. the Out method must be in the same class, although the value flow may involve other classes, as described in Definitions 3.3—3.5. Intuitively, if the precision-loss flows introduced in each class (through method calls on the objects of the class) could be identified and then avoided by use of context sensitivity, the imprecision of the whole program could be accordingly controlled via such a divide-and-conquer scheme. In addition, this design choice enables an efficient and elegant algorithm for identifying occurrences of the patterns in a given program, by considering each class one by one, as explained in Section 4. Therefore, the dashed arrows (bottom right of Figure 4) formed by calling the next method in lines 28 and 34, do not belong to the wrapped flow, because the calls happen after the wrapper objects flow out from the Out method of class Collection. Thus, as explained in Section 3.1, only methods add and iterator (in Collection) and the constructor Iterator (in Iterator) are included in the wrapped flow and thus considered precision-critical. However, if we consider In and Out methods from the point of view of class Iterator, then method next is also precision-critical, since it is involved in a direct flow together with the Iterator constructor, much like the setter and getter methods in Section 3.1. 3.3 Pattern 3: Unwrapped Flow We use a synchronized box example (based on classes SynchronizedSet and Set in the JDK but heavily simplified) to illustrate an unwrapped flow, as shown in Figure 5. Class SyncBox encapsulates class Box by providing synchronization in the encapsulating method getItem (lines 6–12). Two objects 1 and 2 are stored into two Box objects (represented by and pointed to by b1 and b2 in lines 27 and 32), which are further stored into two SyncBox objects (lines 28 and 33). After executing the code, o1 in line 29 (resp. o2 in line 34) points to object 1 (resp. 2 ) only. However, if any of the four methods of classes SyncBox and Box are analyzed context-insensitively, o1 and o2 will both imprecisely point to both objects 1 and 2 . Let us examine how this imprecision is connected to the unwrapped flow pattern. As shown on the right-hand side of Figure 5, similar to the direct flow in Figure 3, two Box objects 1 and 2 (pointed to by b1 and b2, respectively) flow into the body of class SyncBox through its constructor, which acts as an In method, and then further to b in line 8. Unlike in a direct flow, ACM Trans. Program. Lang. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2020
1:10 Yue Li,Tian Tan,Anders Moller,and Yannis Smaragdakis the flowing-in objects and 2]do not flow out of the Our method getItem of class SyncBox; instead,the two unwrapped objects 1 and 2(respectively stored in1 and 2)are the ones that flow out of this Our method. Object unwrapping (Definition 3.2)occurs in line 20,as a result of the call in line 9:the Box objects (1and 2pointed to by b)are the receiver objects of this virtual call,and this in line 20 will also point to them during pointer analysis.The load operation in line 20 lets the unwrapped objects(1 and 2)flow to it (line 20),and finally to o1 and o2 (lines 29 and 34)through consecutive method return values(line 21-line 9 and then line 10-lines 29 and 34).As the unwrapped objects(retrieved from the flowing-in objects)flow out of an Our method of the same class,by Definition 3.5,the green arrows(in Figure 5)form an unwrapped flow. We can observe that objects and 2](and hence the unwrapped objects 1and they contain) are merged in the same points-to set and further propagated according to this unwrapped flow. Although the flowing-in objects do not flow out of an Our method of the same class to introduce imprecision,the unwrapped objects do,causing the receiving variables,in this case o1 and o2(lines 29 and 34),to point to spurious objects. Note that the program points where the unwrapped objects are stored in the flowing-in objects (lines 26-27 and 31-32)do not belong in the unwrapped flow,as the objects have not yet entered the IN method of class SyncBox.Thus,only constructor SyncBox,method getItem(in SyncBox), and method getItem(in Box)belong in the unwrapped flow and are considered precision-critical. However,as in the explanation of the wrapped flow example in Section 3.2,if we consider IN and Our methods from the point of view of class Box,its constructor,Box,will still be analyzed context-sensitively as it is part of a direct flow(together with the getItem method in Box). 3.4 Combination of Flows Some imprecision cannot be described by one pattern alone but only by combinations.Consider the example of an object W that flows into an IN method,where an object O is unwrapped from W.Then O is wrapped into another wrapper object,W',which flows out from an Our method of the same class.Imprecision may arise in this case,and although none of the three basic flow patterns in isolation match this flow,it is captured by a combination of unwrapped and wrapped flows.ZIPPER identifies not only occurrences of the three patterns but also such combinations.Our experiments(Section 6)show that the patterns and their combinations account for essentially all the imprecision that may appear in context-insensitive analysis. 4 ZIPPER This section introduces ZIPPER:our approach for identifying precision-critical methods based on the precision loss patterns of Section 3.Even if the patterns successfully characterize the main causes of precision loss in context-insensitive analysis,two challenges remain.First,the precision loss patterns are defined in dynamic execution terms,while ZIPPER has to capture the potential for these patterns using static information.Second,useful static information has to be computable from a mere context-insensitive analysis,in order to guide a context-sensitive one.That is,the potential for precision loss has to be detected from an analysis that already exhibits this loss.The ZIPPER approach is defined with these goals in mind,and manages to make context-sensitive pointer analysis run faster while preserving most of its precision. We present the overview of ZIPPER in Section 4.1 and the concepts of object flow graphs and precision flow graphs in Sections 4.2 and 4.3,respectively. ACM Trans.Program.Lang.Syst.,Vol.1,No.1,Article 1.Publication date:January 2020
1:10 Yue Li, Tian Tan, Anders Møller, and Yannis Smaragdakis the flowing-in objects 1 and 2 do not flow out of the Out method getItem of class SyncBox; instead, the two unwrapped objects 1 and 2 (respectively stored in 1 and 2 ) are the ones that flow out of this Out method. Object unwrapping (Definition 3.2) occurs in line 20, as a result of the call in line 9: the Box objects ( 1 and 2 pointed to by b) are the receiver objects of this virtual call, and this in line 20 will also point to them during pointer analysis. The load operation in line 20 lets the unwrapped objects ( 1 and 2 ) flow to it (line 20), and finally to o1 and o2 (lines 29 and 34) through consecutive method return values (line 21 → line 9 and then line 10 → lines 29 and 34). As the unwrapped objects (retrieved from the flowing-in objects) flow out of an Out method of the same class, by Definition 3.5, the green arrows (in Figure 5) form an unwrapped flow. We can observe that objects 1 and 2 (and hence the unwrapped objects 1 and 2 they contain) are merged in the same points-to set and further propagated according to this unwrapped flow. Although the flowing-in objects do not flow out of an Out method of the same class to introduce imprecision, the unwrapped objects do, causing the receiving variables, in this case o1 and o2 (lines 29 and 34), to point to spurious objects. Note that the program points where the unwrapped objects are stored in the flowing-in objects (lines 26–27 and 31–32) do not belong in the unwrapped flow, as the objects have not yet entered the In method of class SyncBox. Thus, only constructor SyncBox, method getItem (in SyncBox), and method getItem (in Box) belong in the unwrapped flow and are considered precision-critical. However, as in the explanation of the wrapped flow example in Section 3.2, if we consider In and Out methods from the point of view of class Box, its constructor, Box, will still be analyzed context-sensitively as it is part of a direct flow (together with the getItem method in Box). 3.4 Combination of Flows Some imprecision cannot be described by one pattern alone but only by combinations. Consider the example of an object 𝑊 that flows into an In method, where an object 𝑂 is unwrapped from 𝑊 . Then 𝑂 is wrapped into another wrapper object, 𝑊 ′ , which flows out from an Out method of the same class. Imprecision may arise in this case, and although none of the three basic flow patterns in isolation match this flow, it is captured by a combination of unwrapped and wrapped flows. Zipper identifies not only occurrences of the three patterns but also such combinations. Our experiments (Section 6) show that the patterns and their combinations account for essentially all the imprecision that may appear in context-insensitive analysis. 4 ZIPPER This section introduces Zipper: our approach for identifying precision-critical methods based on the precision loss patterns of Section 3. Even if the patterns successfully characterize the main causes of precision loss in context-insensitive analysis, two challenges remain. First, the precision loss patterns are defined in dynamic execution terms, while Zipper has to capture the potential for these patterns using static information. Second, useful static information has to be computable from a mere context-insensitive analysis, in order to guide a context-sensitive one. That is, the potential for precision loss has to be detected from an analysis that already exhibits this loss. The Zipper approach is defined with these goals in mind, and manages to make context-sensitive pointer analysis run faster while preserving most of its precision. We present the overview of Zipper in Section 4.1 and the concepts of object flow graphs and precision flow graphs in Sections 4.2 and 4.3, respectively. ACM Trans. Program. Lang. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2020