cm acm sable Precision-Guided Context Sensitivity for Pointer Analysis YUE LI,Aarhus University,Denmark TIAN TAN,Aarhus University,Denmark ANDERS MOLLER,Aarhus University,Denmark YANNIS SMARAGDAKIS,University of Athens,Greece Context sensitivity is an essential technique for ensuring high precision in Java pointer analyses.It has been observed that applying context sensitivity partially,only on a select subset of the methods,can improve the balance between analysis precision and speed.However,existing techniques are based on heuristics that do not provide much insight into what characterizes this method subset.In this work,we present a more principled approach for identifying precision-critical methods,based on general patterns of value flows that explain where most of the imprecision arises in context-insensitive pointer analysis.Accordingly,we provide an efficient algorithm to recognize these flow patterns in a given program and exploit them to yield good tradeoffs between analysis precision and speed. Our experimental results on standard benchmark and real-world programs show that a pointer analysis that applies context sensitivity partially,only on the identified precision-critical methods,preserves effectively all (98.8%)of the precision of a highly-precise conventional context-sensitive pointer analysis(2-object-sensitive with a context-sensitive heap),with a substantial speedup (on average 3.4X,and up to 9.2X). CCS Concepts:.Theory of computation-Program analysis; Additional Key Words and Phrases:static analysis,points-to analysis,Java ACM Reference Format: Yue Li,Tian Tan,Anders Moller,and Yannis Smaragdakis.2018.Precision-Guided Context Sensitivity for Pointer Analysis.Proc.ACM Program.Lang.2,OOPSLA,Article 141(November 2018),29 pages.https://doi. org/10.1145/3276511 1 INTRODUCTION Pointer analysis is a fundamental family of static analyses that estimate the possible values of 14 pointer variables in a program.Such information is essential for reasoning about aliasing and inter-procedural control flow in object-oriented programs,and it is used in a wide range of software engineering tools,e.g.,for bug detection [Chandra et al.2009;Naik et al.2006,2009],security analysis [Arzt et al.2014;Grech and Smaragdakis 2017;Livshits and Lam 2005].program verifica- tion [Fink et al.2008;Pradel et al.2012],and program debugging and understanding [Li et al.2016; Sridharan et al.2007]. For decades,numerous analysis techniques have been developed to make pointer analysis more precise and more efficient,especially for object-oriented languages [Hind 2001;Smaragdakis and Balatsouras 2015;Sridharan et al.2013].One of the most successful ideas for producing high precision is context sensitivity [Milanova et al.2002,2005;Sharir and Pnueli 1981;Shivers 1991; Smaragdakis et al.2011],which allows each program method to be analyzed under different contexts, to separate the static abstractions of different dynamic instantiations of the method's variables and Authors'email addresses:yueli@cs.au.dk,tiantan@cs.au.dk,amoeller@cs.au dk,smaragd@diuoagr. This work is licensed under a Creative Commons Attribution 4.0 International License. 2018 Copyright held by the owner/author(s). 2475-1421/2018/11-ART141 https:/doi.org10.1145/3276511 Proc.ACM Program.Lang.,Vol.2,No.OOPSLA,Article 141.Publication date:November 2018
141 Precision-Guided Context Sensitivity for Pointer Analysis YUE LI, Aarhus University, Denmark TIAN TAN, Aarhus University, Denmark ANDERS MéLLER, Aarhus University, Denmark YANNIS SMARAGDAKIS, University of Athens, Greece Context sensitivity is an essential technique for ensuring high precision in Java pointer analyses. It has been observed that applying context sensitivity partially, only on a select subset of the methods, can improve the balance between analysis precision and speed. However, existing techniques are based on heuristics that do not provide much insight into what characterizes this method subset. In this work, we present a more principled approach for identifying precision-critical methods, based on general patterns of value flows that explain where most of the imprecision arises in context-insensitive pointer analysis. Accordingly, we provide an efficient algorithm to recognize these flow patterns in a given program and exploit them to yield good tradeoffs between analysis precision and speed. Our experimental results on standard benchmark and real-world programs show that a pointer analysis that applies context sensitivity partially, only on the identified precision-critical methods, preserves effectively all (98.8%) of the precision of a highly-precise conventional context-sensitive pointer analysis (2-object-sensitive with a context-sensitive heap), with a substantial speedup (on average 3.4X, and up to 9.2X). CCS Concepts: • Theory of computation → Program analysis; Additional Key Words and Phrases: static analysis, points-to analysis, Java ACM Reference Format: Yue Li, Tian Tan, Anders Mùller, and Yannis Smaragdakis. 2018. Precision-Guided Context Sensitivity for Pointer Analysis. Proc. ACM Program. Lang. 2, OOPSLA, Article 141 (November 2018), 29 pages. https://doi. org/10.1145/3276511 1 INTRODUCTION Pointer analysis is a fundamental family of static analyses that estimate the possible values of pointer variables in a program. Such information is essential for reasoning about aliasing and inter-procedural control flow in object-oriented programs, and it is used in a wide range of software engineering tools, e.g., for bug detection [Chandra et al. 2009; Naik et al. 2006, 2009], security analysis [Arzt et al. 2014; Grech and Smaragdakis 2017; Livshits and Lam 2005], program verification [Fink et al. 2008; Pradel et al. 2012], and program debugging and understanding [Li et al. 2016; Sridharan et al. 2007]. For decades, numerous analysis techniques have been developed to make pointer analysis more precise and more efficient, especially for object-oriented languages [Hind 2001; Smaragdakis and Balatsouras 2015; Sridharan et al. 2013]. One of the most successful ideas for producing high precision is context sensitivity [Milanova et al. 2002, 2005; Sharir and Pnueli 1981; Shivers 1991; Smaragdakis et al. 2011], which allows each program method to be analyzed under different contexts, to separate the static abstractions of different dynamic instantiations of the method’s variables and Authors’ email addresses: yueli@cs.au.dk, tiantan@cs.au.dk, amoeller@cs.au.dk, smaragd@di.uoa.gr. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). © 2018 Copyright held by the owner/author(s). 2475-1421/2018/11-ART141 https://doi.org/10.1145/3276511 Proc. ACM Program. Lang., Vol. 2, No. OOPSLA, Article 141. Publication date: November 2018. This work is licensed under a Creative Commons Attribution 4.0 International License
141:2 Yue Li,Tian Tan,Anders Moller,and Yannis Smaragdakis thereby reduce spurious object flows.However,despite great precision benefits,context sensitivity comes with heavy efficiency costs [Kastrinis and Smaragdakis 2013;Lhotak and Hendren 2006; Oh et al.2014;Tan et al.2016,2017;Xu and Rountev 2008].One reason is that,with conventional context-sensitivity techniques,every method in a program is treated the same,meaning that many methods that do not benefit from context sensitivity are analyzed for multiple contexts redundantly. As a consequence,too much space and time is consumed [Smaragdakis et al.2014]. This naturally raises the question of whether it is possible to apply context sensitivity selectively, only for the methods where it is beneficial to the overall analysis precision.It is far from trivial to determine when a context-sensitive analysis will yield precision benefits(or conversely,to determine when omitting context sensitivity for a method would introduce imprecision).This challenge of effectively identifying the precision-critical methods has been the focus of past work [Hassanshahi et al.2017;Jeong et al.2017;Smaragdakis et al.2014;Wei and Ryder 2015].Those techniques are based on heuristics that seem to correlate with imprecision,but they do not provide a comprehensive understanding of how and where the imprecision is introduced in a context-insensitive pointer analysis.For example,introspective analysis [Smaragdakis et al.2014]requires tuning multiple parameters involving sizes of various kinds of points-to sets,and data-driven analysis [Jeong et al. 2017]is parameterized by a collection of syntactic features and relies on machine learning for selecting good heuristics. In this paper,we provide a more principled approach,named ZIPPER,to efficiently identify precision-critical methods,based on insights about how imprecision is introduced.The key observa- tion is that most cases in which imprecision arises in a context-insensitive pointer analysis fit into three general patterns of value flows,which we call direct,wrapped,and unwrapped flows.Moreover, we show that these three kinds of value flows can be recognized efficiently.Based on information obtained from a fast,context-insentive pointer analysis,ZIPPER constructs a precision flow graph (PFG)that concisely models the relevant value flow.The identification of precision-critical methods can then be formulated as a graph reachability problem on the PFG and solved in negligible time compared to the pointer analysis itself.By applying context sensitivity to the precision-critical methods identified by ZIPPER,a pointer analysis runs significantly faster than conventional tech- niques that apply context sensitivity indiscriminately to all methods,while retaining most of the precision. In summary,we make the following contributions. We describe three fundamental patterns of value flows that help in explaining how and where most of the imprecision is introduced in a context-insensitive pointer analysis(Section 2). We present the ZIPPER approach to effectively recognize the three value-flow patterns and thereby identify the precision-critical methods that benefit from context sensitivity(Section 3). ZIPPER can guide context-sensitive pointer analysis to run faster while keeping most of its precision.In contrast to other techniques that apply context sensitivity selectively,the ZIPER approach is based on a tangible understanding of imprecision and not on heuristics that require non-transparent machine learning or other tuning of analysis parameters. We provide an extensive experimental evaluation of our implementation of ZIPPER to evaluate its effectiveness(Section 4).On average,ZIPPER reports that only 38%of the methods are precision-critical,which preserves 98.8%of the precision(measured as average across a range of popular analysis clients)for a 2-object-sensitive pointer analysis with a context-sensitive heap,for a speedup of 3.4X and up to 9.2X.These results demonstrate that the three patterns of value flows indeed capture the vast majority of methods that benefit from context sensitivity. Proc.ACM Program.Lang.,Vol.2,No.OOPSLA,Article 141.Publication date:November 2018
141:2 Yue Li, Tian Tan, Anders Mùller, and Yannis Smaragdakis thereby reduce spurious object flows. However, despite great precision benefits, context sensitivity comes with heavy efficiency costs [Kastrinis and Smaragdakis 2013; Lhoták and Hendren 2006; Oh et al. 2014; Tan et al. 2016, 2017; Xu and Rountev 2008]. One reason is that, with conventional context-sensitivity techniques, every method in a program is treated the same, meaning that many methods that do not benefit from context sensitivity are analyzed for multiple contexts redundantly. As a consequence, too much space and time is consumed [Smaragdakis et al. 2014]. This naturally raises the question of whether it is possible to apply context sensitivity selectively, only for the methods where it is beneficial to the overall analysis precision. It is far from trivial to determine when a context-sensitive analysis will yield precision benefits (or conversely, to determine when omitting context sensitivity for a method would introduce imprecision). This challenge of effectively identifying the precision-critical methods has been the focus of past work [Hassanshahi et al. 2017; Jeong et al. 2017; Smaragdakis et al. 2014; Wei and Ryder 2015]. Those techniques are based on heuristics that seem to correlate with imprecision, but they do not provide a comprehensive understanding of how and where the imprecision is introduced in a context-insensitive pointer analysis. For example, introspective analysis [Smaragdakis et al. 2014] requires tuning multiple parameters involving sizes of various kinds of points-to sets, and data-driven analysis [Jeong et al. 2017] is parameterized by a collection of syntactic features and relies on machine learning for selecting good heuristics. In this paper, we provide a more principled approach, named Zipper, to efficiently identify precision-critical methods, based on insights about how imprecision is introduced. The key observation is that most cases in which imprecision arises in a context-insensitive pointer analysis fit into three general patterns of value flows, which we call direct, wrapped, and unwrapped flows. Moreover, we show that these three kinds of value flows can be recognized efficiently. Based on information obtained from a fast, context-insentive pointer analysis, Zipper constructs a precision flow graph (PFG) that concisely models the relevant value flow. The identification of precision-critical methods can then be formulated as a graph reachability problem on the PFG and solved in negligible time, compared to the pointer analysis itself. By applying context sensitivity to the precision-critical methods identified by Zipper, a pointer analysis runs significantly faster than conventional techniques that apply context sensitivity indiscriminately to all methods, while retaining most of the precision. In summary, we make the following contributions. • We describe three fundamental patterns of value flows that help in explaining how and where most of the imprecision is introduced in a context-insensitive pointer analysis (Section 2). • We present the Zipper approach to effectively recognize the three value-flow patterns and thereby identify the precision-critical methods that benefit from context sensitivity (Section 3). Zipper can guide context-sensitive pointer analysis to run faster while keeping most of its precision. In contrast to other techniques that apply context sensitivity selectively, the Zipper approach is based on a tangible understanding of imprecision and not on heuristics that require non-transparent machine learning or other tuning of analysis parameters. • We provide an extensive experimental evaluation of our implementation of Zipper to evaluate its effectiveness (Section 4). On average, Zipper reports that only 38% of the methods are precision-critical, which preserves 98.8% of the precision (measured as average across a range of popular analysis clients) for a 2-object-sensitive pointer analysis with a context-sensitive heap, for a speedup of 3.4X and up to 9.2X. These results demonstrate that the three patterns of value flows indeed capture the vast majority of methods that benefit from context sensitivity. Proc. ACM Program. Lang., Vol. 2, No. OOPSLA, Article 141. Publication date: November 2018
Precision-Guided Context Sensitivity for Pointer Analysis 141:3 2 CAUSES OF IMPRECISION IN CONTEXT-INSENSITIVE POINTER ANALYSIS Our approach is based on the key insight that most of the precision loss in context-insensitive pointer analysis for Java can be expressed in terms of three basic patterns of value flows,or as combinations of these.We assume the reader is familiar with state-of-the-art context-sensitive pointer analysis techniques,e.g.,as covered in several surveys [Ryder 2003;Smaragdakis and Balatsouras 2015;Sridharan et al.2013],however,the precision loss patterns are independent of the chosen variant of context sensitivity,such as call-site sensitivity [Sharir and Pnueli 1981;Shivers 1991],object sensitivity [Milanova et al.2005],and type sensitivity [Smaragdakis et al.2011].In this section,we introduce the three precision loss patterns and then describe three corresponding concrete examples(Sections 2.1-2.3).This characterization of precision loss provides the conceptual foundation for ZIPPER to identify precision-critical methods as explained in Section 3. A context-insensitive analysis does not distinguish between different calls to a method but merges the incoming abstract values(or points-to sets,in the case of pointer object m(object o){ analysis)[Sharir and Pnueli 1981].Figure 1 shows a simple example. 2 return o; If method m is analyzed context-insensitively,then the two objects 3} are mixed together,so the analysis conservatively concludes that 4 x1 new A(): both x2 and y2 may point to both the A object and the B object. 5x2=m(x1): In contrast,a context-sensitive analysis would analyze m twice, 6 y1 new B(); 7y2=m(y1): corresponding to the two different call sites,and thereby conclude Fig.1.Example of precision loss that x2 can only point to an A object and y2 can only point to a B in context-insensitive analysis. object.The price of that extra precision is that the method needs to be analyzed multiple times,so context sensitivity should ideally only be applied when the precision gain outweighs the extra analysis time. To characterize the relevant value flows,we first introduce some terminology. Definition 2.1(IN and Our methods).Given a class C and a method M that is declared in C or inherited from C's super-classes,if M contains one or more parameters then M is an IN method of C,and if M's return type is non-void then M is an Our method of C.(In the example in Figure 1,m is both an IN and an Our method of the surrounding class.) Definition 2.2(Object wrapping and unwrapping).If an object O is stored in a field of an object W (or in an array entry of W,in case W is an array),then O is wrapped into W.Conversely,if an object O is loaded from a field of an object W(or from an array entry of W in case W is an array),then O is unwrapped from W.(The simple example in Figure 1 contains no wrapping or unwrapping.) With these definitions in place,we can describe the three precision-loss patterns as different kinds of value flows,depicted in Figure 2. Definition 2.3(Direct flow).If,in some execution of the program,an object O is passed as a parameter to an IN method M of class C,and then flows(via a series of assignments,field Direct Wrapped Unwrapped Flow Flow Flow IN Method Objects Value flow via assignments, field load/store operations, OUT Method or method calls/returns Fig.2.Three basic patterns of value flow that cause precision loss in context-insensitive analysis. Proc.ACM Program.Lang.,Vol.2,No.OOPSLA,Article 141.Publication date:November 2018
Precision-Guided Context Sensitivity for Pointer Analysis 141:3 2 CAUSES OF IMPRECISION IN CONTEXT-INSENSITIVE POINTER ANALYSIS Our approach is based on the key insight that most of the precision loss in context-insensitive pointer analysis for Java can be expressed in terms of three basic patterns of value flows, or as combinations of these. We assume the reader is familiar with state-of-the-art context-sensitive pointer analysis techniques, e.g., as covered in several surveys [Ryder 2003; Smaragdakis and Balatsouras 2015; Sridharan et al. 2013], however, the precision loss patterns are independent of the chosen variant of context sensitivity, such as call-site sensitivity [Sharir and Pnueli 1981; Shivers 1991], object sensitivity [Milanova et al. 2005], and type sensitivity [Smaragdakis et al. 2011]. In this section, we introduce the three precision loss patterns and then describe three corresponding concrete examples (Sections 2.1ś2.3). This characterization of precision loss provides the conceptual foundation for Zipper to identify precision-critical methods as explained in Section 3. A context-insensitive analysis does not distinguish between different calls to a method but merges 1 2 3 4 5 6 7 Object m(Object o){ return o; } x1 = new A(); x2 = m(x1); y1 = new B(); y2 = m(y1); Fig. 1. Example of precision loss in context-insensitive analysis. the incoming abstract values (or points-to sets, in the case of pointer analysis) [Sharir and Pnueli 1981]. Figure 1 shows a simple example. If method m is analyzed context-insensitively, then the two objects are mixed together, so the analysis conservatively concludes that both x2 and y2 may point to both the A object and the B object. In contrast, a context-sensitive analysis would analyze m twice, corresponding to the two different call sites, and thereby conclude that x2 can only point to an A object and y2 can only point to a B object. The price of that extra precision is that the method needs to be analyzed multiple times, so context sensitivity should ideally only be applied when the precision gain outweighs the extra analysis time. To characterize the relevant value flows, we first introduce some terminology. Definition 2.1 (In and Out methods). Given a class C and a method M that is declared in C or inherited from C’s super-classes, if M contains one or more parameters then M is an In method of C, and if M’s return type is non-void then M is an Out method of C. (In the example in Figure 1, m is both an In and an Out method of the surrounding class.) Definition 2.2 (Object wrapping and unwrapping). If an object O is stored in a field of an object W (or in an array entry ofW , in caseW is an array), then O is wrapped into W . Conversely, if an object O is loaded from a field of an object W (or from an array entry of W in case W is an array), then O is unwrapped from W . (The simple example in Figure 1 contains no wrapping or unwrapping.) With these definitions in place, we can describe the three precision-loss patterns as different kinds of value flows, depicted in Figure 2. Definition 2.3 (Direct flow). If, in some execution of the program, an object O is passed as a parameter to an In method M1 of class C, and then flows (via a series of assignments, field Direct Flow Wrapped Flow Unwrapped Flow IN Method OUT Method Value flow via assignments, field load/store operations, or method calls/returns Objects Fig. 2. Three basic patterns of value flow that cause precision loss in context-insensitive analysis. Proc. ACM Program. Lang., Vol. 2, No. OOPSLA, Article 141. Publication date: November 2018
141:4 Yue Li,Tian Tan,Anders Moller,and Yannis Smaragdakis Direct Flow 1 class Person 15} O ② String name;String id; 16 /Usage Code (19)name1 name2 (24) void setName(String nm){ 17 void main(){ 4 this.name nm: 18 Person p1 new Person(); (3) updateID(): 19 String name1 new String("A"): this.name (4) 6 20 p1.setName(name1): 7 void updateID(){ 21 String id1 p1.getID(): newName (8) 8 String newName this.name: 22 9 this.id newName: 23 Person p2 new Person(); this.id (9) 101 24 String name2 new String("B"): 11 String getID(){ 25 p2.setName(name2): id (12) 12 String id this.id: 26 String id2 p2.getID(): 13 return id: 27 (21)1d1 1d2(26) 14 ①② ①② Fig.3.Example of direct flow.(The line number for each variable/field reference on the right-hand side is shown in parentheses.) load/store operations,method calls,or returns)to the return value of an Our method,M2,of the same class C,then we say the program has direct flow from Mi to M2.(The example in Figure 1 is a simple instance of this pattern.) Definition 2.4(Wrapped flow).If,in some execution of the program,an object O is passed as a parameter to an IN method M of class C and then flows to a store operation that wraps O into an object W,where W subsequently flows to the result of an Our method,M2,of the same class C,then we say the program has wrapped flow from M to M2.More generally,the wrapped flow pattern also covers value flow through multiple object wrapping steps,for example when W is itself wrapped into another object W',which flows to the return value of M2. Definition 2.5(Unwrapped flow).If,in some execution of the program,an object O is passed as a parameter to an IN method M of class C and then flows to a load operation that unwraps an object U from O,where U subsequently flows to the return value of an Our method,M2,of the same class C,then we say the program has unwrapped flow from M to M2.As in the previous definition, unwrapped flow also covers value flow through multiple object unwrapping steps. 2.1 Pattern 1:Direct Flow The setter and getter example shown in Figure 3 demonstrates how direct flow is an indication of precision loss for a context-insensitive analysis.The Person class provides methods setName and getID to modify a person's name and retrieve his or her ID.Whenever a person's name is modified, the ID is updated accordingly (line 5). After executing this code,id1 in line 21(resp.id2 in line 26)points to object 1in line 19(resp.2 in line 24)only.However,if the three methods of Person are analyzed using a context-insensitive pointer analysis,then id1 and id2 will both imprecisely point to objects 1 and 2.Let us examine how this imprecision is connected to the direct flow pattern. The right-hand side of Figure 3 illustrates how two objects1and 2,respectively pointed to by namel and name2,first flow from their creation sites in lines 19 and 24 to the parameter nm of the IN method setName in line 3,and then to id in line 12 through a series of store and load operations (line 4-line 8-line 9line 12),and finally out of the Our method getID to id1 and id2 in lines 21 and 26.Hence,by Definition 2.3,the red arrows in Figure 3 form a direct flow. Notice that with a context-insensitive analysis,objects 1and 2are merged in the same points-to set and further propagated according to this direct flow.In the analysis,the merged objects will Proc.ACM Program.Lang.,Vol.2,No.OOPSLA,Article 141.Publication date:November 2018
141:4 Yue Li, Tian Tan, Anders Mùller, and Yannis Smaragdakis Direct Flow name1 1 2 1 2 class Person { 1 String name; String id; void setName(String nm) { this.name = nm; updateID(); } void updateID() { String newName = this.name; this.id = newName; } String getID() { String id = this.id; return id; } 1 2 3 4 5 6 7 8 9 10 11 12 13 14 } // Usage Code void main() { Person p1 = new Person(); String name1 = new String("A"); p1.setName(name1); String id1 = p1.getID(); Person p2 = new Person(); String name2 = new String("B"); p2.setName(name2); String id2 = p2.getID(); } 15 16 17 18 19 20 21 22 23 24 25 26 27 name2 nm this.name newName this.id id id1 id2 2 (24) (3) (4) (8) (9) (12) (21) (26) (19) Fig. 3. Example of direct flow. (The line number for each variable/field reference on the right-hand side is shown in parentheses.) load/store operations, method calls, or returns) to the return value of an Out method, M2, of the same class C, then we say the program has direct flow from M1 to M2. (The example in Figure 1 is a simple instance of this pattern.) Definition 2.4 (Wrapped flow). If, in some execution of the program, an object O is passed as a parameter to an In method M1 of class C and then flows to a store operation that wraps O into an object W , where W subsequently flows to the result of an Out method, M2, of the same class C, then we say the program has wrapped flow from M1 to M2. More generally, the wrapped flow pattern also covers value flow through multiple object wrapping steps, for example when W is itself wrapped into another object W ′ , which flows to the return value of M2. Definition 2.5 (Unwrapped flow). If, in some execution of the program, an object O is passed as a parameter to an In method M1 of class C and then flows to a load operation that unwraps an object U from O, where U subsequently flows to the return value of an Out method, M2, of the same class C, then we say the program has unwrapped flow from M1 to M2. As in the previous definition, unwrapped flow also covers value flow through multiple object unwrapping steps. 2.1 Pattern 1: Direct Flow The setter and getter example shown in Figure 3 demonstrates how direct flow is an indication of precision loss for a context-insensitive analysis. The Person class provides methods setName and getID to modify a person’s name and retrieve his or her ID. Whenever a person’s name is modified, the ID is updated accordingly (line 5). After executing this code, id1 in line 21 (resp. id2 in line 26) points to object 1 in line 19 (resp. 2 in line 24) only. However, if the three methods of Person are analyzed using a context-insensitive pointer analysis, then id1 and id2 will both imprecisely point to objects 1 and 2 . Let us examine how this imprecision is connected to the direct flow pattern. The right-hand side of Figure 3 illustrates how two objects 1 and 2 , respectively pointed to by name1 and name2, first flow from their creation sites in lines 19 and 24 to the parameter nm of the In method setName in line 3, and then to id in line 12 through a series of store and load operations (line 4 → line 8 → line 9 → line 12), and finally out of the Out method getID to id1 and id2 in lines 21 and 26. Hence, by Definition 2.3, the red arrows in Figure 3 form a direct flow. Notice that with a context-insensitive analysis, objects 1 and 2 are merged in the same points-to set and further propagated according to this direct flow. In the analysis, the merged objects will Proc. ACM Program. Lang., Vol. 2, No. OOPSLA, Article 141. Publication date: November 2018
Precision-Guided Context Sensitivity for Pointer Analysis 141:5 Wrapped Flow 1 class Collection{ 18 Object next(){ ① (② 2 Object elem; 19 return this.next: (25)s1 s2(31) 3 void add(Object el){ 20 this.elem el: 21} (3) 22 //Usage Code Iterator iterator(){ 23 void main(){ this.elem (4) 7 Object e this.elem: 24 Collection c1 new Collection(): 8 Iterator itr 25 String s1 new String("A"): (7) new Iterator(e); 26 c1.add(s1): 9 return itr; 27 Iterator i1 c1.iterator(): (8)itr obi (14) 10 } 28 Object o1 i1.next(); 29 this next(③X②) 11} (15) 12 class Iterator 30 Collection c2 new Collection(): 13 Object next: 1 String s2 new String("B"): 17 (27)(33) 14 Iterator(Object obj) 32 c2.add(s2): ①回0回 1 this.next obj: Iterator 12 c2.iterator(); 16 34 Object o2 12.next(); 01 (28)(34) 17 35 ①@①② Fig.4.Example of wrapped flow. flow out of the Our method,causing id1 and id2 to point to spurious objects.Such imprecision will only get worse when some operations are further applied on idl and id2(not shown in this example),possibly polluting other parts of the program. One way to avoid the imprecision is to apply context sensitivity to the methods that participate in the direct flow.We consider these to be precision-critical methods,since analyzing just one of them context-insensitively will likely introduce imprecision.With a context-sensitive analysis(for most variants of context sensitivity),in Figure 3,all variables and field references along the direct flow will be analyzed separately.For example,object sensitivity will use the two allocation sites at lines 18 and 23 as contexts.Accordingly,the merged paths along this direct flow are separated by the two contexts,like unzipping a zipper-hence the name of our technique.A similar strategy of separating merged paths also applies to wrapped and unwrapped flows,as shown next. 2.2 Pattern 2:Wrapped Flow The collection and iterator example shown in Figure 4 demonstrates how the wrapped flow pattern yields precision loss for a context-insensitive analysis.To keep the example simple,the collection only stores one element,however the code pattern is directly analogous to realistic code,for arbitrarily-sized collections.Class Collection provides an add method to add an element to the collection and an iterator method to return an iterator that has a pointer,next,pointing to the collection element(as set in line 15).The element is passed as an argument to the newly created iterator(line 8),which establishes a connection between the collection and its iterator.Two objects 1(line 25)and (line 31)are stored in two different collections,c1(line 26)and c2 (line 32).The two objects are then accessed by the iterators of the collections(lines 28 and 34). After executing the code,o1 in line 28 (resp.o2 in line 34)points to object1(resp.only.How- ever,if any of the four methods of Collection and Iterator are analyzed context-insensitively,o1 and o2 will both imprecisely point to both objects 1and Let us examine how this imprecision is connected to the wrapped flow pattern. As shown on the right-hand side of Figure 4,similarly to the direct flow example in Figure 3, objects 1 and 2 flow into the IN method add of class Collection,and then further to lines 7, 8,and 14.Unlike a direct flow,the objects 1 and 2 do not directly flow out of the Our method Proc.ACM Program.Lang.,Vol.2,No.OOPSLA,Article 141.Publication date:November 2018
Precision-Guided Context Sensitivity for Pointer Analysis 141:5 s1 s2 o1 o2 itr this.elem el Wrapped Flow 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 Object next() { return this.next; } } //Usage Code void main() { Collection c1 = new Collection(); String s1 = new String("A"); c1.add(s1); Iterator i1 = c1.iterator(); Object o1 = i1.next(); Collection c2 = new Collection(); String s2 = new String("B"); c2.add(s2); Iterator i2 = c2.iterator(); Object o2 = i2.next(); } class Collection { Object elem; void add(Object el) { this.elem = el; } Iterator iterator(){ Object e = this.elem; Iterator itr = new Iterator(e); return itr; } } class Iterator { Object next; Iterator(Object obj) { this.next = obj; } e obj this.next 1 2 i1 i2 1 2 1 2 1 2 1 2 1 2 (25) (31) (3) (4) (7) (14) (15) (33) (34) (8) (27) (28) Fig. 4. Example of wrapped flow. flow out of the Out method, causing id1 and id2 to point to spurious objects. Such imprecision will only get worse when some operations are further applied on id1 and id2 (not shown in this example), possibly polluting other parts of the program. One way to avoid the imprecision is to apply context sensitivity to the methods that participate in the direct flow. We consider these to be precision-critical methods, since analyzing just one of them context-insensitively will likely introduce imprecision. With a context-sensitive analysis (for most variants of context sensitivity), in Figure 3, all variables and field references along the direct flow will be analyzed separately. For example, object sensitivity will use the two allocation sites at lines 18 and 23 as contexts. Accordingly, the merged paths along this direct flow are separated by the two contexts, like unzipping a zipperÐhence the name of our technique. A similar strategy of separating merged paths also applies to wrapped and unwrapped flows, as shown next. 2.2 Pattern 2: Wrapped Flow The collection and iterator example shown in Figure 4 demonstrates how the wrapped flow pattern yields precision loss for a context-insensitive analysis. To keep the example simple, the collection only stores one element, however the code pattern is directly analogous to realistic code, for arbitrarily-sized collections. Class Collection provides an add method to add an element to the collection and an iterator method to return an iterator that has a pointer, next, pointing to the collection element (as set in line 15). The element is passed as an argument to the newly created iterator (line 8), which establishes a connection between the collection and its iterator. Two objects 1 (line 25) and 2 (line 31) are stored in two different collections, c1 (line 26) and c2 (line 32). The two objects are then accessed by the iterators of the collections (lines 28 and 34). After executing the code, o1 in line 28 (resp. o2 in line 34) points to object 1 (resp. 2 ) only. However, if any of the four methods of Collection and Iterator are analyzed context-insensitively, o1 and o2 will both imprecisely point to both objects 1 and 2 . Let us examine how this imprecision is connected to the wrapped flow pattern. As shown on the right-hand side of Figure 4, similarly to the direct flow example in Figure 3, objects 1 and 2 flow into the In method add of class Collection, and then further to lines 7, 8, and 14. Unlike a direct flow, the objects 1 and 2 do not directly flow out of the Out method Proc. ACM Program. Lang., Vol. 2, No. OOPSLA, Article 141. Publication date: November 2018
141:6 Yue Li,Tian Tan,Anders Moller,and Yannis Smaragdakis iterator of class Collection;instead,a wrapper Iterator object,(created on line 8)in which object 1or2 is stored,flows out of this Our method. Object wrapping (Definition 2.2)occurs in line 15:objects 1 and (pointed to by obj)are stored into the next field of the object pointed to by this,and this points to the receiver object of the constructor call in line 8,which is also pointed to by itr in line 8.As a wrapper object(that stores object 1or2)flows out of an Our method of the same class,by Definition 2.4,the solid blue arrows in Figure 4 form a wrapped flow. With a context-insensitive analysis,objects 1and are merged in the same points-to set and further propagated according to this wrapped flow.However,unlike a direct flow,imprecision is not introduced until the access operation(e.g.,the next calls in lines 28 and 34)is applied on the flowing-out wrapper object,causing variables o1 and o2 to point to spurious objects.The wrapper objects carry the flowing-in objects,which originate from outside the class,so context sensitivity can separate the merged objects all along their flow through the Collection class. The example also helps illustrate some subtleties of the flow definitions.Note that the precision loss patterns are expressed relative to a class:for each of the three patterns,the IN method and the Our method must be in the same class,although the value flow may involve other classes,as described in Definitions 2.3-2.5.Intuitively,if the precision loss flows introduced in each class (through method calls on the objects of the class)could be identified and then avoided by use of context sensitivity,the imprecision of the whole program could be accordingly controlled via such a divide-and-conquer scheme.In addition,this design choice enables an efficient and elegant algorithm for identifying occurrences of the patterns in a given program,by considering each class one by one,as explained in Section 3. Therefore,the dashed arrows(bottom right of Figure 4)formed by calling the next method in lines 28 and 34,do not belong to the wrapped flow,because the calls happen after the wrapper objects flow out from the Our method of class Collection.Thus,as explained in Section 2.1, only methods add and iterator (in Collection)and the constructor Iterator (in Iterator)are included in the wrapped flow and thus considered precision-critical.However,if we consider IN and Our methods from the point of view of class Iterator,then method next is also precision-critical, since it is involved in a direct flow together with the Iterator constructor,much like the setter and getter methods in Section 2.1. 2.3 Pattern 3:Unwrapped Flow We use a synchronized box example(based on classes SynchronizedSet and Set in the JDK but heavily simplified)to illustrate an unwrapped flow,as shown in Figure 5.Class SyncBox encapsulates class Box by providing synchronization in the encapsulating method getItem(lines 6-12).Two objects 1 and 2 are stored into two Box objects(represented by and pointed to by b1 and b2 in lines 27 and 32),which are further stored into two SyncBox objects(lines 28 and 33). After executing the code,o1 in line 29(resp.o2 in line 34)points to object 1(resp.only. However,if any of the four methods of classes SyncBox and Box are analyzed context-insensitively, o1 and o2 will both imprecisely point to both objects 1and Let us examine how this imprecision is connected to the unwrapped flow pattern. As shown on the right-hand side of Figure 5,similar to the direct flow in Figure 3,two Box objects 1and 2(pointed to by b1 and b2,respectively)flow into the body of class SyncBox through its constructor,which acts as an IN method,and then further to b in line 8.Unlike in a direct flow, the flowing-in objects 1 and 2 do not flow out of the Our method getItem of class SyncBox; instead,the two unwrapped objects 1and (respectively stored inand 2)are the ones that flow out of this Our method. Proc.ACM Program.Lang.,Vol.2,No.OOPSLA,Article 141.Publication date:November 2018
141:6 Yue Li, Tian Tan, Anders Mùller, and Yannis Smaragdakis iterator of class Collection; instead, a wrapper Iterator object, , (created on line 8) in which object 1 or 2 is stored, flows out of this Out method. Object wrapping (Definition 2.2) occurs in line 15: objects 1 and 2 (pointed to by obj) are stored into the next field of the object pointed to by this, and this points to the receiver object of the constructor call in line 8, which is also pointed to by itr in line 8. As a wrapper object (that stores object 1 or 2 ) flows out of an Out method of the same class, by Definition 2.4, the solid blue arrows in Figure 4 form a wrapped flow. With a context-insensitive analysis, objects 1 and 2 are merged in the same points-to set and further propagated according to this wrapped flow. However, unlike a direct flow, imprecision is not introduced until the access operation (e.g., the next calls in lines 28 and 34) is applied on the flowing-out wrapper object, causing variables o1 and o2 to point to spurious objects. The wrapper objects carry the flowing-in objects, which originate from outside the class, so context sensitivity can separate the merged objects all along their flow through the Collection class. The example also helps illustrate some subtleties of the flow definitions. Note that the precision loss patterns are expressed relative to a class: for each of the three patterns, the In method and the Out method must be in the same class, although the value flow may involve other classes, as described in Definitions 2.3Ð2.5. Intuitively, if the precision loss flows introduced in each class (through method calls on the objects of the class) could be identified and then avoided by use of context sensitivity, the imprecision of the whole program could be accordingly controlled via such a divide-and-conquer scheme. In addition, this design choice enables an efficient and elegant algorithm for identifying occurrences of the patterns in a given program, by considering each class one by one, as explained in Section 3. Therefore, the dashed arrows (bottom right of Figure 4) formed by calling the next method in lines 28 and 34, do not belong to the wrapped flow, because the calls happen after the wrapper objects flow out from the Out method of class Collection. Thus, as explained in Section 2.1, only methods add and iterator (in Collection) and the constructor Iterator (in Iterator) are included in the wrapped flow and thus considered precision-critical. However, if we consider In and Out methods from the point of view of class Iterator, then method next is also precision-critical, since it is involved in a direct flow together with the Iterator constructor, much like the setter and getter methods in Section 2.1. 2.3 Pattern 3: Unwrapped Flow We use a synchronized box example (based on classes SynchronizedSet and Set in the JDK but heavily simplified) to illustrate an unwrapped flow, as shown in Figure 5. Class SyncBox encapsulates class Box by providing synchronization in the encapsulating method getItem (lines 6ś12). Two objects 1 and 2 are stored into two Box objects (represented by and pointed to by b1 and b2 in lines 27 and 32), which are further stored into two SyncBox objects (lines 28 and 33). After executing the code, o1 in line 29 (resp. o2 in line 34) points to object 1 (resp. 2 ) only. However, if any of the four methods of classes SyncBox and Box are analyzed context-insensitively, o1 and o2 will both imprecisely point to both objects 1 and 2 . Let us examine how this imprecision is connected to the unwrapped flow pattern. As shown on the right-hand side of Figure 5, similar to the direct flow in Figure 3, two Box objects 1 and 2 (pointed to by b1 and b2, respectively) flow into the body of class SyncBox through its constructor, which acts as an In method, and then further to b in line 8. Unlike in a direct flow, the flowing-in objects 1 and 2 do not flow out of the Out method getItem of class SyncBox; instead, the two unwrapped objects 1 and 2 (respectively stored in 1 and 2 ) are the ones that flow out of this Out method. Proc. ACM Program. Lang., Vol. 2, No. OOPSLA, Article 141. Publication date: November 2018
Precision-Guided Context Sensitivity for Pointer Analysis 141:7 Unwrapped Flow 1 class SyncBox 19 Object getItem()( Object box: 28 Object it this.item: (26) s1 s2 (31) 3 SyncBox(Boxbox){ 21 return it; this.box box: 22 (27) b1 b2 32) 23 box (3) 6 Object getItem(){ 24 //Usage Code synchronized(this){ 25 void main(){ this.box (4) 8 Box b this.box: 26 String s1 new String("A"): 9 object o b.getItem(): 27 Box b1 new Box(s1); 回 (8) 10 return o: 28 SyncBox sb1 new SyncBox(b1): 11 29 Object o1 sb1.getItem(): (20) Eh因1tem①② 12 30 13} 31 String s2 new String("B"): 14 class Box{ 32 Box b2 new Box(s2) ǎ (20) 15 Object item: 33 SyncBox sb2 new SyncBox(b2); (9) 16 Box(Object item){ 34 Object o2 sb2.getItem() 1) this.item item; 351 (29)01 02(34) 18 ①② ①@ Fig.5.Example of unwrapped flow. Object unwrapping(Definition 2.2)occurs in line 20,as a result of the call in line 9:the Box objects and 2pointed to by b)are the receiver objects of this virtual call,and this in line 20 will also point to them during pointer analysis.The load operation in line 20 lets the unwrapped objects(and 2)flow to it (line 20),and finally to o1 and o2 (lines 29 and 34)through consecutive method return values(line 21line 9 and then line 10-lines 29 and 34).As the unwrapped objects(retrieved from the flowing-in objects)flow out of an Our method of the same class,by Definition 2.5,the green arrows(in Figure 5)form an unwrapped flow. We can observe that objects and 2](and hence the unwrapped objects1and 2 they contain) are merged in the same points-to set and further propagated according to this unwrapped flow. Although the flowing-in objects do not flow out of an Our method of the same class to introduce imprecision,the unwrapped objects do,causing the receiving variables,in this case o1 and o2(lines 29 and 34),to point to spurious objects. Note that the program points where the unwrapped objects are stored in the flowing-in objects (lines 26-27 and 31-32)do not belong in the unwrapped flow,as the objects have not yet entered the IN method of class SyncBox.Thus,only constructor SyncBox,method getItem(in SyncBox), and method getItem(in Box)belong in the unwrapped flow and are considered precision-critical. However,as in the explanation of the wrapped flow example in Section 2.2,if we consider IN and Our methods from the point of view of class Box,its constructor,Box,will still be analyzed context-sensitively as it is part of a direct flow (together with the getItem method in Box). Finally,some imprecision cannot be described by one pattern alone but only by combinations. Consider the example of an object W that flows into an IN method,where an object O is unwrapped from W.Then O is wrapped into another wrapper object,W',which flows out from an Our method of the same class.Imprecision may arise in this case,and although none of the three basic flow patterns in isolation match this flow,it is captured by a combination of unwrapped and wrapped flows.ZIPPER identifies not only occurrences of the three patterns but also such combinations.Our experiments(Section 4)show that the patterns and their combinations account for essentially all the imprecision that may appear in context-insensitive analysis. Proc.ACM Program.Lang.,Vol.2,No.OOPSLA,Article 141.Publication date:November 2018
Precision-Guided Context Sensitivity for Pointer Analysis 141:7 o1 o2 (34) b1 s1 s2 b this.box box Unwrapped Flow 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 this.item class SyncBox { Object box; SyncBox(Box box) { this.box = box; } Object getItem() { synchronized(this) { Box b = this.box; Object o = b.getItem(); return o; } } } class Box { Object item; Box(Object item) { this.item = item; } //Usage Code void main() { String s1 = new String("A"); Box b1 = new Box(s1); SyncBox sb1 = new SyncBox(b1); Object o1 = sb1.getItem(); String s2 = new String("B"); Box b2 = new Box(s2); SyncBox sb2 = new SyncBox(b2); Object o2 = sb2.getItem(); } Object getItem() { Object it = this.item; return it; } } b2 it o (26) (31) (27) (32) (3) (4) (8) (20) (20) (9) (29) 1 2 1 2 1 2 1 2 1 2 Fig. 5. Example of unwrapped flow. Object unwrapping (Definition 2.2) occurs in line 20, as a result of the call in line 9: the Box objects ( 1 and 2 pointed to by b) are the receiver objects of this virtual call, and this in line 20 will also point to them during pointer analysis. The load operation in line 20 lets the unwrapped objects ( 1 and 2 ) flow to it (line 20), and finally to o1 and o2 (lines 29 and 34) through consecutive method return values (line 21 → line 9 and then line 10 → lines 29 and 34). As the unwrapped objects (retrieved from the flowing-in objects) flow out of an Out method of the same class, by Definition 2.5, the green arrows (in Figure 5) form an unwrapped flow. We can observe that objects 1 and 2 (and hence the unwrapped objects 1 and 2 they contain) are merged in the same points-to set and further propagated according to this unwrapped flow. Although the flowing-in objects do not flow out of an Out method of the same class to introduce imprecision, the unwrapped objects do, causing the receiving variables, in this case o1 and o2 (lines 29 and 34), to point to spurious objects. Note that the program points where the unwrapped objects are stored in the flowing-in objects (lines 26ś27 and 31ś32) do not belong in the unwrapped flow, as the objects have not yet entered the In method of class SyncBox. Thus, only constructor SyncBox, method getItem (in SyncBox), and method getItem (in Box) belong in the unwrapped flow and are considered precision-critical. However, as in the explanation of the wrapped flow example in Section 2.2, if we consider In and Out methods from the point of view of class Box, its constructor, Box, will still be analyzed context-sensitively as it is part of a direct flow (together with the getItem method in Box). Finally, some imprecision cannot be described by one pattern alone but only by combinations. Consider the example of an objectW that flows into an In method, where an object O is unwrapped fromW . Then O is wrapped into another wrapper object,W ′ , which flows out from an Out method of the same class. Imprecision may arise in this case, and although none of the three basic flow patterns in isolation match this flow, it is captured by a combination of unwrapped and wrapped flows. Zipper identifies not only occurrences of the three patterns but also such combinations. Our experiments (Section 4) show that the patterns and their combinations account for essentially all the imprecision that may appear in context-insensitive analysis. Proc. ACM Program. Lang., Vol. 2, No. OOPSLA, Article 141. Publication date: November 2018
141:8 Yue Li,Tian Tan,Anders Moller,and Yannis Smaragdakis ZIPPER Object Flow Graph Context-Insensitive OFG Precision Flow Graph Graph (OFG) hich meth Context-Sensitive Reachability Pointer Analysis (PFG) Construction Construction inter Analysis on PFG Fig.6.Overview of ZIPPER. 3 ZIPPER This section introduces ZIPPER:our approach to identifying precision-critical methods based on the precision loss patterns of Section 2.Even if the patterns successfully characterize the main causes of precision loss in context-insensitive analysis,two challenges remain.First,the precision loss patterns are defined in dynamic execution terms,while ZIPPER has to capture the potential for these patterns using static information.Second,useful static information has to be computable from a mere context-insensitive analysis,in order to guide a context-sensitive one.That is,the potential for precision loss has to be detected from an analysis that already exhibits this loss.The ZIPPER approach is defined with these goals in mind,and manages to make context-sensitive pointer analysis run faster while preserving most of its precision. We present the overview of ZIPPER in Section 3.1 and the concepts of object flow graphs and precision flow graphs in Sections 3.2 and 3.3,respectively. 3.1 Overview of ZIPPER The goal of ZIPPER is to efficiently recognize the precision-critical methods in a given program.The central part of ZIPPER is the notion of precision flow graphs(PFGs)that allow us to express all three precision loss patterns in a uniform way,in the sense that each kind of flow can be represented by a path in a PFG.Intuitively,a PFG is much like the right-hand side graphs of Figures 3-5,but replacing the field expressions by the abstract objects and their fields.Via the PFGs,we can convert the problem of identifying precision-critical methods to an abstract graph computation.All methods that are involved in one of the three kinds of flows can be efficiently extracted by solving a simple graph reachability problem on the PFGs. Constructing the PFGs requires information about how objects flow in the program.We leverage the concept of object flow graphs(OFGs)[Tonella and Potrich 2005]as explained in Section 3.2. The OFG for a program allows tracing the flow of objects through local assignments,calls and returns,and field load and store operations in the program.Therefore,it can naturally express the direct flow pattern,in a static analysis that approximates the dynamic flows of objects.However, the original OFG formulation does not represent wrapped and unwrapped flows,thus we cannot directly use it to identify precision-critical methods.For this reason,we build the PFGs on top of the OFG to uniformly express all three precision loss patterns. Figure 6 shows the overall structure of ZIPPER,which itself contains three components:the object flow graph construction,the precision flow graph construction,and the graph reachability computation.First,a fast but imprecise context-insensitive pointer analysis is performed as a pre-analysis for ZIPPER.To simplify the discussion,we assume that the pre-analysis abstracts objects by their allocation-sites [Chase et al.1990],but our technique also works for other object abstractions [Kanvar and Khedker 2016].This pre-analysis provides the information for the OFG construction,in the form of a relation pt(v)that captures the points-to set for each variable v.Based on the OFG,a PFG is constructed for each class.Afterwards,ZIPPER computes graph reachability on each PFG to determine which methods are precision-critical.Finally,a selective context-sensitive Proc.ACM Program.Lang.,Vol.2,No.OOPSLA,Article 141.Publication date:November 2018
141:8 Yue Li, Tian Tan, Anders Mùller, and Yannis Smaragdakis Object Flow Graph (OFG) Construction Precision Flow Graph (PFG) Construction Graph Reachability on PFG OFG PFG ZIPPER Context-Insensitive Pointer Analysis Context-Sensitive Pointer Analysis which methods need contexts Fig. 6. Overview of Zipper. 3 ZIPPER This section introduces Zipper: our approach to identifying precision-critical methods based on the precision loss patterns of Section 2. Even if the patterns successfully characterize the main causes of precision loss in context-insensitive analysis, two challenges remain. First, the precision loss patterns are defined in dynamic execution terms, while Zipper has to capture the potential for these patterns using static information. Second, useful static information has to be computable from a mere context-insensitive analysis, in order to guide a context-sensitive one. That is, the potential for precision loss has to be detected from an analysis that already exhibits this loss. The Zipper approach is defined with these goals in mind, and manages to make context-sensitive pointer analysis run faster while preserving most of its precision. We present the overview of Zipper in Section 3.1 and the concepts of object flow graphs and precision flow graphs in Sections 3.2 and 3.3, respectively. 3.1 Overview of Zipper The goal of Zipper is to efficiently recognize the precision-critical methods in a given program. The central part of Zipper is the notion of precision flow graphs (PFGs) that allow us to express all three precision loss patterns in a uniform way, in the sense that each kind of flow can be represented by a path in a PFG. Intuitively, a PFG is much like the right-hand side graphs of Figures 3ś5, but replacing the field expressions by the abstract objects and their fields. Via the PFGs, we can convert the problem of identifying precision-critical methods to an abstract graph computation. All methods that are involved in one of the three kinds of flows can be efficiently extracted by solving a simple graph reachability problem on the PFGs. Constructing the PFGs requires information about how objects flow in the program. We leverage the concept of object flow graphs (OFGs) [Tonella and Potrich 2005] as explained in Section 3.2. The OFG for a program allows tracing the flow of objects through local assignments, calls and returns, and field load and store operations in the program. Therefore, it can naturally express the direct flow pattern, in a static analysis that approximates the dynamic flows of objects. However, the original OFG formulation does not represent wrapped and unwrapped flows, thus we cannot directly use it to identify precision-critical methods. For this reason, we build the PFGs on top of the OFG to uniformly express all three precision loss patterns. Figure 6 shows the overall structure of Zipper, which itself contains three components: the object flow graph construction, the precision flow graph construction, and the graph reachability computation. First, a fast but imprecise context-insensitive pointer analysis is performed as a pre-analysis for Zipper. To simplify the discussion, we assume that the pre-analysis abstracts objects by their allocation-sites [Chase et al. 1990], but our technique also works for other object abstractions [Kanvar and Khedker 2016]. This pre-analysis provides the information for the OFG construction, in the form of a relation pt(v) that captures the points-to set for each variable v. Based on the OFG, a PFG is constructed for each class. Afterwards, Zipper computes graph reachability on each PFG to determine which methods are precision-critical. Finally, a selective context-sensitive Proc. ACM Program. Lang., Vol. 2, No. OOPSLA, Article 141. Publication date: November 2018
Precision-Guided Context Sensitivity for Pointer Analysis 141:9 Variable node o.f Object field node Program Object Flow Graph Local assignment b=a; a a b; ① a ① (a(p) Assignments in b c.m(a) c.f a; ② ② method calls/returns m(p){return r:} (C→mthi3 d c; ③ (r b) c.f d; ④ ④ Field load b=a.f; where o e pt(a) @.f -b e=d.f;⑤ Field store a.f =b; b 0.f assuming oe pt(c)and o E pt(d) where o e pt(a) Fig.7.Object flow graph construction,with an example. pointer analysis is performed,guided by ZIPPER's results,so that the pointer analysis applies context sensitivity to only the precision-critical methods reported by ZIPPER. 3.2 Object Flow Graphs The object flow graph(OFG)of a program,as in its original form by Tonella and Potrich [2005],is a directed graph that expresses how objects flow in the program.The nodes in the OFG represent program pointers,which can point to objects,and the edges represent basic object flow among the pointers.More precisely,the OFG contains a node for each variable in the program and for each field of each abstract object.Objects are abstracted in the same way as in the pre-analysis, as described in Section 3.1:we here assume allocation-site abstraction is being used,which is the most common choice,but the technique also works for other choices.An edge ab in the OFG means that the objects pointed by pointer a may flow to(and also be pointed to by)pointer b. Another way to view the OFG is that it is the subset constraint graph in an Andersen-style points-to analysis [Andersen 1994;Sridharan et al.2013]. Tonella and Potrich [2005]propose to build the OFG with more precision by cloning the variables of a method for each of its receiver objects(conceptually like object sensitivity [Milanova et al. 2002,2005]),so that the flow involved in different receiver objects of the same method can be distinguished.However,this is unnecessary for ZIPPER,since it builds the OFG based on the results of a context-insensitive analysis,and all flow queries are done at the class level instead of the object level,as explained in Section 2.Therefore,we perform no such cloning. Due to the close connection between OFGs and Andersen-style analysis,constructing the OFG is trivial,based on the points-to relation pt(v)provided by the context-insensitive pre-analysis. Figure 7 illustrates this construction.The left-hand side of Figure 7 lists(from left to right)the four basic object flows,the related Java statements that induce the flows,and the corresponding graph edges in the OFG. Consider the code fragment and its corresponding OFG on the right-hand side of Figure 7.There are five statements labeled 1-5,and each statement causes an edge(with the same label)to be added to the OFG.With the OFG,the object flow information can be directly obtained simply by checking graph reachability without the need to explicitly track alias information among variables or field accesses.For example,variable e is reachable from b in the OFG,which means that the objects pointed to by b may flow to (and also be pointed to by)e. As a result,direct flows can be expressed naturally by the paths in the OFG,however,that is not the case for wrapped and unwrapped flows.In the next section,we describe how to augment the OFG to express all three kinds of flows. Proc.ACM Program.Lang.,Vol.2,No.OOPSLA,Article 141.Publication date:November 2018
Precision-Guided Context Sensitivity for Pointer Analysis 141:9 Local assignment Assignments in method calls/returns Field load Field store b = a; b = c.m(a); m(p){ return r;} … b = a.f; a.f = b; p a b r a b b b v Variable node o.f Object field node Program Object Flow Graph a = b; c.f = a; d = c; c.f = d; e = d.f; a 1 2 3 4 5 b d e c 1 2 3 4 5 c mthis where o ∈ pt(a) o.f o.f o.f where o ∈ pt(a) assuming o ∈ pt(c) and o ∈ pt(d) Fig. 7. Object flow graph construction, with an example. pointer analysis is performed, guided by Zipper’s results, so that the pointer analysis applies context sensitivity to only the precision-critical methods reported by Zipper. 3.2 Object Flow Graphs The object flow graph (OFG) of a program, as in its original form by Tonella and Potrich [2005], is a directed graph that expresses how objects flow in the program. The nodes in the OFG represent program pointers, which can point to objects, and the edges represent basic object flow among the pointers. More precisely, the OFG contains a node for each variable in the program and for each field of each abstract object. Objects are abstracted in the same way as in the pre-analysis, as described in Section 3.1: we here assume allocation-site abstraction is being used, which is the most common choice, but the technique also works for other choices. An edge a→b in the OFG means that the objects pointed by pointer a may flow to (and also be pointed to by) pointer b. Another way to view the OFG is that it is the subset constraint graph in an Andersen-style points-to analysis [Andersen 1994; Sridharan et al. 2013]. Tonella and Potrich [2005] propose to build the OFG with more precision by cloning the variables of a method for each of its receiver objects (conceptually like object sensitivity [Milanova et al. 2002, 2005]), so that the flow involved in different receiver objects of the same method can be distinguished. However, this is unnecessary for Zipper, since it builds the OFG based on the results of a context-insensitive analysis, and all flow queries are done at the class level instead of the object level, as explained in Section 2. Therefore, we perform no such cloning. Due to the close connection between OFGs and Andersen-style analysis, constructing the OFG is trivial, based on the points-to relation pt(v) provided by the context-insensitive pre-analysis. Figure 7 illustrates this construction. The left-hand side of Figure 7 lists (from left to right) the four basic object flows, the related Java statements that induce the flows, and the corresponding graph edges in the OFG. Consider the code fragment and its corresponding OFG on the right-hand side of Figure 7. There are five statements labeled 1 ś 5 , and each statement causes an edge (with the same label) to be added to the OFG. With the OFG, the object flow information can be directly obtained simply by checking graph reachability without the need to explicitly track alias information among variables or field accesses. For example, variable e is reachable from b in the OFG, which means that the objects pointed to by b may flow to (and also be pointed to by) e. As a result, direct flows can be expressed naturally by the paths in the OFG, however, that is not the case for wrapped and unwrapped flows. In the next section, we describe how to augment the OFG to express all three kinds of flows. Proc. ACM Program. Lang., Vol. 2, No. OOPSLA, Article 141. Publication date: November 2018
141:10 Yue Li,Tian Tan,Anders Moller,and Yannis Smaragdakis Algorithm 1:PFGBUILDER OFG (Object Flow Graph) Input c (Input class) (Set of statements in the program) Output:PFGe (Precision Flow Graph for class c) 1PFGc←-{},VisitedNodes←{h,WUEdges←-{} 2 foreach m∈INe do foreach parameter p of m do LDFs(Np)where Np is the OFG node for p 5 return PFGc 6 Function DFs(N) 1 if NE VisitedNodes then 8 return 9 add N to VisitedNodes 10 if N is a variable node Na then 11 foreach b a.fes do /Handling unwrapped flow 12 Ladd Na一to WUEdges 13 foreach b.f aes do /Handling wrapped flow 14 foreach o∈pt(b)do 15 Ladd N No]to WUEdges 16 foreach N→N'∈OFGU WUEdges do 17 addN→W'to PFGc DEs(N) 3.3 Precision Flow Graphs and Graph Reachability We first explain how to construct precision flow graphs(PFGs)and then how to identify precision- critical methods by performing graph reachability on each PFG. Precision Flow Graph Construction.As explained in Section 3.2,one OFG is built for the entire program.Since the PFGs serve to express the three kinds of precision loss patterns,which are all defined relative to a class,as explained in Section 2,we construct one PFG for each class in the program.As the OFG can already describe direct flow(Section 3.2),the task of building the PFG is to add edges that can express the other two kinds of flows:wrapped and unwrapped flows. Algorithm 1(PFGBUILDER)shows how to build PFGe for a given class c.For simplicity,we represent the PFG and the OFG by their sets of graph edges,and the graph nodes are implicitly those that appear in the edge sets. Three sets are initialized to empty sets in line 1:the PFG edges,the set of visited nodes,and WUEdges,which denotes a set of extra edges for wrapped and unwrapped flows.As all three kinds of flows begin from the parameters of an IN method(see Section 2),the algorithm starts by iterating through those methods(lines 2-3,where INe denotes the set of IN methods of the input class c). Function DFs(line 6)traverses the input OFG and adds the edges for wrapped and unwrapped flows.As a result,the returned PFGe(line 5)includes all the nodes that can be reached from each parameter of IN methods of c,through direct,wrapped,and unwrapped flows,or combinations of these.Specifically,unwrapped and wrapped flows are handled in lines 11-12 and lines 13-15, respectively,by adding the corresponding edges to WUEdges.Finally,the generated PFG includes Proc.ACM Program.Lang.,Vol.2,No.OOPSLA,Article 141.Publication date:November 2018
141:10 Yue Li, Tian Tan, Anders Mùller, and Yannis Smaragdakis Algorithm 1: PfgBuilder Input : OFG (Object Flow Graph) c (Input class) S (Set of statements in the program) Output : PFGc (Precision Flow Graph for class c) 1 PFGc ← { }, VisitedNodes ← { }, WUEdges ← { } 2 foreach m ∈ Inc do 3 foreach parameter p of m do 4 Dfs(Np ) where Np is the OFG node for p 5 return PFGc 6 Function Dfs(N) 7 if N ∈ VisitedNodes then 8 return 9 add N to VisitedNodes 10 if N is a variable node Na then 11 foreach b = a.f ∈ S do // Handling unwrapped flow 12 add Na → Nb to WUEdges 13 foreach b.f = a ∈ S do // Handling wrapped flow 14 foreach o ∈ pt(b) do 15 add Na → N[o] to WUEdges 16 foreach N → N ′ ∈ OFG ∪ WUEdges do 17 add N → N ′ to PFGc 18 Dfs(N ′ ) 3.3 Precision Flow Graphs and Graph Reachability We first explain how to construct precision flow graphs (PFGs) and then how to identify precisioncritical methods by performing graph reachability on each PFG. Precision Flow Graph Construction. As explained in Section 3.2, one OFG is built for the entire program. Since the PFGs serve to express the three kinds of precision loss patterns, which are all defined relative to a class, as explained in Section 2, we construct one PFG for each class in the program. As the OFG can already describe direct flow (Section 3.2), the task of building the PFG is to add edges that can express the other two kinds of flows: wrapped and unwrapped flows. Algorithm 1 (PfgBuilder) shows how to build PFGc for a given classc. For simplicity, we represent the PFG and the OFG by their sets of graph edges, and the graph nodes are implicitly those that appear in the edge sets. Three sets are initialized to empty sets in line 1: the PFG edges, the set of visited nodes, and WUEdges, which denotes a set of extra edges for wrapped and unwrapped flows. As all three kinds of flows begin from the parameters of an In method (see Section 2), the algorithm starts by iterating through those methods (lines 2ś3, where Inc denotes the set of In methods of the input class c). Function Dfs (line 6) traverses the input OFG and adds the edges for wrapped and unwrapped flows. As a result, the returned PFGc (line 5) includes all the nodes that can be reached from each parameter of In methods of c, through direct, wrapped, and unwrapped flows, or combinations of these. Specifically, unwrapped and wrapped flows are handled in lines 11ś12 and lines 13ś15, respectively, by adding the corresponding edges to WUEdges. Finally, the generated PFG includes Proc. ACM Program. Lang., Vol. 2, No. OOPSLA, Article 141. Publication date: November 2018