Making Pointer Analysis More Precise by Unleashing the Power of Selective Context Sensitivity TIAN TAN,Nanjing University,China YUE LI',Nanjing University,China XIAOXING MA,Nanjing University,China CHANG XU,Nanjing University,China YANNIS SMARAGDAKIS,University of Athens,Greece Traditional context-sensitive pointer analysis is hard to scale for large and complex Java programs.To address this issue,a series of selective context-sensitivity approaches have been proposed and exhibit promising results. In this work,we move one step further towards producing highly-precise pointer analyses for hard-to-analyze Java programs by presenting the Unity-Relay framework,which takes selective context sensitivity to the next 147 level.Briefly,Unity-Relay is a one-two punch:given a set of different selective context-sensitivity approaches, say S=S1,...,Sn,Unity-Relay first provides a mechanism(called Unity)to combine and maximize the precision of all components of S.When Unity fails to scale,Unity-Relay offers a scheme(called Relay)to pass and accumulate the precision from one approach Si in S to the next,Si+1,leading to an analysis that is more precise than all approaches in S. As a proof-of-concept,we instantiate Unity-Relay into a tool called BATON and extensively evaluate it on a set of hard-to-analyze Java programs,using general precision metrics and popular clients.Compared with the state of the art,BATON achieves the best precision for all metrics and clients for all evaluated programs. The difference in precision is often dramatic-up to 71%of alias pairs reported by previously-best algorithms are found to be spurious and eliminated. CCS Concepts:.Theory of computation-Program analysis. Additional Key Words and Phrases:Pointer Analysis,Alias Analysis,Context Sensitivity,Java ACM Reference Format: Tian Tan,Yue Li,Xiaoxing Ma,Chang Xu,and Yannis Smaragdakis.2021.Making Pointer Analysis More Precise by Unleashing the Power of Selective Context Sensitivity.Proc.ACM Program.Lang.5,OOPSLA, Article 147(October 2021),27 pages.https://doi.org/10.1145/3485524 1 INTRODUCTION Pointer analysis is important for an array of real-world applications such as bug detection [Chandra et al.2009;Naik et al.2006],security analysis [Arzt et al.2014;Livshits and Lam 2005],program verification [Fink et al.2008;Pradel et al.2012]and program understanding [Li et al.2016;Sridharan "Corresponding author Authors'addresses:Tian Tan,State Key Laboratory for Novel Software Technology,Nanjing University,China,tiantan@ nju.edu.cn;Yue Li,State Key Laboratory for Novel Software Technology,Nanjing University,China,yueli@nju.edu.cn; Xiaoxing Ma,State Key Laboratory for Novel Software Technology,Nanjing University,China,xxm@nju.edu.cn;Chang Xu, State Key Laboratory for Novel Software Technology,Nanjing University,China,changxu@nju.edu.cn;Yannis Smaragdakis, Department of Informatics and Telecommunications,University of Athens,Greece,yannis@smaragd.org. ©0 Y This work is licensed under a Creative Commons Attribution 4.0 International License. 2021 Copyright held by the owner/author(s). 2475-1421/2021/10-ART147 https:/doi.org/10.1145/3485524 Proc.ACM Program.Lang.,Vol.5,No.OOPSLA,Article 147.Publication date:October 2021
147 Making Pointer Analysis More Precise by Unleashing the Power of Selective Context Sensitivity TIAN TAN, Nanjing University, China YUE LI∗ , Nanjing University, China XIAOXING MA, Nanjing University, China CHANG XU, Nanjing University, China YANNIS SMARAGDAKIS, University of Athens, Greece Traditional context-sensitive pointer analysis is hard to scale for large and complex Java programs. To address this issue, a series of selective context-sensitivity approaches have been proposed and exhibit promising results. In this work, we move one step further towards producing highly-precise pointer analyses for hard-to-analyze Java programs by presenting the Unity-Relay framework, which takes selective context sensitivity to the next level. Briefly, Unity-Relay is a one-two punch: given a set of different selective context-sensitivity approaches, say 𝑆 = 𝑆1, . . . , 𝑆𝑛, Unity-Relay first provides a mechanism (called Unity) to combine and maximize the precision of all components of 𝑆. When Unity fails to scale, Unity-Relay offers a scheme (called Relay) to pass and accumulate the precision from one approach 𝑆𝑖 in 𝑆 to the next, 𝑆𝑖+1, leading to an analysis that is more precise than all approaches in 𝑆. As a proof-of-concept, we instantiate Unity-Relay into a tool called Baton and extensively evaluate it on a set of hard-to-analyze Java programs, using general precision metrics and popular clients. Compared with the state of the art, Baton achieves the best precision for all metrics and clients for all evaluated programs. The difference in precision is often dramaticÐup to 71% of alias pairs reported by previously-best algorithms are found to be spurious and eliminated. CCS Concepts: • Theory of computation → Program analysis. Additional Key Words and Phrases: Pointer Analysis, Alias Analysis, Context Sensitivity, Java ACM Reference Format: Tian Tan, Yue Li, Xiaoxing Ma, Chang Xu, and Yannis Smaragdakis. 2021. Making Pointer Analysis More Precise by Unleashing the Power of Selective Context Sensitivity. Proc. ACM Program. Lang. 5, OOPSLA, Article 147 (October 2021), 27 pages. https://doi.org/10.1145/3485524 1 INTRODUCTION Pointer analysis is important for an array of real-world applications such as bug detection [Chandra et al. 2009; Naik et al. 2006], security analysis [Arzt et al. 2014; Livshits and Lam 2005], program verification [Fink et al. 2008; Pradel et al. 2012] and program understanding [Li et al. 2016; Sridharan ∗Corresponding author Authors’ addresses: Tian Tan, State Key Laboratory for Novel Software Technology, Nanjing University, China, tiantan@ nju.edu.cn; Yue Li, State Key Laboratory for Novel Software Technology, Nanjing University, China, yueli@nju.edu.cn; Xiaoxing Ma, State Key Laboratory for Novel Software Technology, Nanjing University, China, xxm@nju.edu.cn; Chang Xu, State Key Laboratory for Novel Software Technology, Nanjing University, China, changxu@nju.edu.cn; Yannis Smaragdakis, Department of Informatics and Telecommunications, University of Athens, Greece, yannis@smaragd.org. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). © 2021 Copyright held by the owner/author(s). 2475-1421/2021/10-ART147 https://doi.org/10.1145/3485524 Proc. ACM Program. Lang., Vol. 5, No. OOPSLA, Article 147. Publication date: October 2021. This work is licensed under a Creative Commons Attribution 4.0 International License
147:2 Tian Tan,Yue Li,Xiaoxing Ma,Chang Xu,and Yannis Smaragdakis et al.2007].A more precise pointer analysis is always favored by its clients as it implies,e.g.,fewer false alarms of bugs and vulnerabilities,more accurate code navigation,or potential for optimization. Context sensitivity has been demonstrated to be a major scheme for improving precision of Java pointer analysis [Fegade and Wimmer 2020;Feng et al.2015;Kanvar and Khedker 2016;Lhotak and Hendren 2008;Lhotak and Hendren 2006;Milanova et al.2002,2005;Tan et al.2017;Thakur and Nandivada 2019,2020;Thiessen and Lhotak 2017;Whaley and Lam 2004;Xu and Rountev 2008].With years of development,traditional context sensitivity performs well for small/medium programs,e.g.,2obj(object-sensitive pointer analysis with context length being two)can analyze most DaCapo benchmarks rapidly and precisely [Kastrinis and Smaragdakis 2013;Smaragdakis et al.2011];however,it is hard for it to scale to large and complex Java programs with good precision [Smaragdakis et al.2014].(An analysis not scaling means that it blows up in complexity and does not terminate even under very high time limits.) To address this problem,a series of research work [Hassanshahi et al.2017;Jeon et al.2019; Jeong et al.2017;Li et al.2018a,b,2020;Lu and Xue 2019;Minseok Jeon and Oh 2020;Oh et al.2014, 2015;Smaragdakis et al.2014;Wei and Ryder 2015]propose to apply context sensitivity selectively in the sense that contexts no longer apply to all methods,while context elements(e.g.,objects, types,call-sites)and context lengths may vary for different selected methods.Other,non-selected, methods are treated context-insensitively,in order to enable good scalability. Conversely,to gain better precision,selective context-sensitivity approaches usually have to sacrifice efficiency by allowing more methods to be analyzed context-sensitively,sometimes with longer context lengths or more precise but heavier context elements.In other words,these analyses need to make careful precision and efficiency trade-offs,where one more step towards precision may put the analysis at the risk of unscalability [Li et al.2020].As a result,seeking further precision improvement becomes hard,as limited efficiency margins are left to play with.Under such constraints,if the approach for more precise pointer analysis is not designed well,it may introduce significant overhead with minor or no precision improvement(e.g.,by selecting for sensitive treatment many precision-useless methods).More commonly,the analysis will choose to scale,but will sacrifice precision(e.g.,by selecting very few methods to treat context sensitively). Problem.For many real-world applications where points-to or alias information is required,such as certain bug detectors,security analyzers,etc.,good precision is much favored,even if the price is to run the analysis for a long time (e.g.,several hours)[Christakis and Bird 2016].Thus,the challenge we seek to address is to achieve,given a long but reasonable time allowance,precise pointer analysis results for large and complex Java programs,for which traditional context-sensitive analyses fail to scale,and selective context-sensitive analyses scale but with limited precision Method.We introduce a general(meta-)framework called Unity-Relay to produce highly precise pointer analyses for hard-to-analyze Java programs,by systematically exploiting and utilizing any selective context sensitivity technique.Unity-Relay is a one-two punch,with precision as its first priority: Unity-Relay takes as inputs a program P and a set of different selective context-sensitivity approaches,S; it first provides a method(called Unity)to unify all approaches in S(playing the role of a meta-heuristic),maximizing precision by applying the most stringent context-sensitivity selection made by any approach in S; if Unity fails to scale,the Relay method is employed.It divides the scalability burden of Unity into different passes(each pass is a run of context-sensitive pointer analysis guided by an approach in S),with the precision achieved by one pass being transmitted to and Proc.ACM Program.Lang.,Vol.5.No.OOPSLA,Article 147.Publication date:October 2021
147:2 Tian Tan, Yue Li, Xiaoxing Ma, Chang Xu, and Yannis Smaragdakis et al. 2007]. A more precise pointer analysis is always favored by its clients as it implies, e.g., fewer false alarms of bugs and vulnerabilities, more accurate code navigation, or potential for optimization. Context sensitivity has been demonstrated to be a major scheme for improving precision of Java pointer analysis [Fegade and Wimmer 2020; Feng et al. 2015; Kanvar and Khedker 2016; Lhoták and Hendren 2008; Lhoták and Hendren 2006; Milanova et al. 2002, 2005; Tan et al. 2017; Thakur and Nandivada 2019, 2020; Thiessen and Lhoták 2017; Whaley and Lam 2004; Xu and Rountev 2008]. With years of development, traditional context sensitivity performs well for small/medium programs, e.g., 2obj (object-sensitive pointer analysis with context length being two) can analyze most DaCapo benchmarks rapidly and precisely [Kastrinis and Smaragdakis 2013; Smaragdakis et al. 2011]; however, it is hard for it to scale to large and complex Java programs with good precision [Smaragdakis et al. 2014]. (An analysis not scaling means that it blows up in complexity and does not terminate even under very high time limits.) To address this problem, a series of research work [Hassanshahi et al. 2017; Jeon et al. 2019; Jeong et al. 2017; Li et al. 2018a,b, 2020; Lu and Xue 2019; Minseok Jeon and Oh 2020; Oh et al. 2014, 2015; Smaragdakis et al. 2014; Wei and Ryder 2015] propose to apply context sensitivity selectively in the sense that contexts no longer apply to all methods, while context elements (e.g., objects, types, call-sites) and context lengths may vary for different selected methods. Other, non-selected, methods are treated context-insensitively, in order to enable good scalability. Conversely, to gain better precision, selective context-sensitivity approaches usually have to sacrifice efficiency by allowing more methods to be analyzed context-sensitively, sometimes with longer context lengths or more precise but heavier context elements. In other words, these analyses need to make careful precision and efficiency trade-offs, where one more step towards precision may put the analysis at the risk of unscalability [Li et al. 2020]. As a result, seeking further precision improvement becomes hard, as limited efficiency margins are left to play with. Under such constraints, if the approach for more precise pointer analysis is not designed well, it may introduce significant overhead with minor or no precision improvement (e.g., by selecting for sensitive treatment many precision-useless methods). More commonly, the analysis will choose to scale, but will sacrifice precision (e.g., by selecting very few methods to treat context sensitively). Problem. For many real-world applications where points-to or alias information is required, such as certain bug detectors, security analyzers, etc., good precision is much favored, even if the price is to run the analysis for a long time (e.g., several hours) [Christakis and Bird 2016]. Thus, the challenge we seek to address is to achieve, given a long but reasonable time allowance, precise pointer analysis results for large and complex Java programs, for which traditional context-sensitive analyses fail to scale, and selective context-sensitive analyses scale but with limited precision. Method. We introduce a general (meta-)framework called Unity-Relay to produce highly precise pointer analyses for hard-to-analyze Java programs, by systematically exploiting and utilizing any selective context sensitivity technique. Unity-Relay is a one-two punch, with precision as its first priority: • Unity-Relay takes as inputs a program P and a set of different selective context-sensitivity approaches, 𝑆; • it first provides a method (called Unity) to unify all approaches in 𝑆 (playing the role of a meta-heuristic), maximizing precision by applying the most stringent context-sensitivity selection made by any approach in 𝑆; • if Unity fails to scale, the Relay method is employed. It divides the scalability burden of Unity into different passes (each pass is a run of context-sensitive pointer analysis guided by an approach in 𝑆), with the precision achieved by one pass being transmitted to and Proc. ACM Program. Lang., Vol. 5, No. OOPSLA, Article 147. Publication date: October 2021
Making Pointer Analysis More Precise by Unleashing the Power of Selective Context Sensitivity 147:3 accumulated in the next pass:in each pass,the points-to results retrieved from the previous pass are used to on-the-fly filter intermediate points-to results,keeping inferred points-to sets small and,thus,enhancing scalability. Given any selective context-sensitive pointer analysis A(guided by an approach in S),if A is sound and scalable for P,no matter how precise A is,Unity-Relay can be expected to produce a sound and scalable analysis with a precision that is guaranteed to be at least as good as A's,in the worst case.(Experimental results show that the precision is always better.)This property also implies that even when new(possibly special-purpose)selective context-sensitivity insights are developed in the future,Unity-Relay will still be able to leverage them to produce a more precise analysis in practice. Results.As a proof-of-concept,we instantiate the Unity-Relay framework in a tool called BATON. We extensively evaluate BAToN on a set of hard-to-analyze Java programs(including the toughest programs evaluated in the past literature of selective context-sensitive pointer analysis for Java), using three general precision metrics and four popular clients(also the most complete set of precision metrics/clients used in recent literature).Compared to the state of the art,experimental results show that BAToN is able to improve precision significantly,and achieves the best precision for all metrics and clients for all evaluated programs.To the best of our knowledge,this is the first time that this level of analysis precision has been attained for these hard-to-analyze programs. Moreover,because of Unity-Relay's nature as a general meta-framework,we expect it to see more future instantiations and to unleash the power of more selective context-sensitivity approaches,to produce more precise pointer analyses. In summary,this work makes the following contributions: We present Unity-Relay,a simple and practical framework to produce precise pointer analyses for hard-to-analyze Java programs,by unleashing the power of selective context sensitivity(Section 3). We formulate the Unity-Relay framework,prove its soundness and precision guarantees, and discuss its scalability (Section 4). We present BAroN,a tool instantiated from the Unity-Relay framework,which will be released as an open-source tool(Section 5). We conduct extensive experiments by comparing BAroN with the state of the art,to demon- strate its effectiveness in the real world(the experimental results can be obtained using the accompanying artifact [Tan et al.2021])(Section 6).BATON substantially improves on the precision of previous algorithms in the literature-e.g.,eliminating over 33%of alias pairs (soundly determined to be spurious)on average,and up to 71%,compared to the best past contender evaluated. 2 BACKGROUND We assume that the reader is familiar with the high-level concepts of pointer analysis,as introduced in several surveys [Smaragdakis and Balatsouras 2015;Sridharan et al.2013].This section describes some necessary background knowledge about context sensitivity for whole-program pointer analysis for Java Traditional Context Sensitivity.A method is analyzed context-sensitively means that the static abstraction of different dynamic instantiations of variables and heap objects within the method will be treated separately during analysis under different abstract entities called contexts.Accordingly, spurious object flows will be reduced,thus making the analysis more precise.Given a method m,each of its contexts typically consists of a list of consecutive context elements in the form of Proc.ACM Program.Lang.,Vol.5,No.OOPSLA,Article 147.Publication date:October 2021
Making Pointer Analysis More Precise by Unleashing the Power of Selective Context Sensitivity 147:3 accumulated in the next pass: in each pass, the points-to results retrieved from the previous pass are used to on-the-fly filter intermediate points-to results, keeping inferred points-to sets small and, thus, enhancing scalability. Given any selective context-sensitive pointer analysis 𝐴 (guided by an approach in 𝑆), if 𝐴 is sound and scalable for P, no matter how precise 𝐴 is, Unity-Relay can be expected to produce a sound and scalable analysis with a precision that is guaranteed to be at least as good as 𝐴’s, in the worst case. (Experimental results show that the precision is always better.) This property also implies that even when new (possibly special-purpose) selective context-sensitivity insights are developed in the future, Unity-Relay will still be able to leverage them to produce a more precise analysis in practice. Results. As a proof-of-concept, we instantiate the Unity-Relay framework in a tool called Baton. We extensively evaluate Baton on a set of hard-to-analyze Java programs (including the toughest programs evaluated in the past literature of selective context-sensitive pointer analysis for Java), using three general precision metrics and four popular clients (also the most complete set of precision metrics/clients used in recent literature). Compared to the state of the art, experimental results show that Baton is able to improve precision significantly, and achieves the best precision for all metrics and clients for all evaluated programs. To the best of our knowledge, this is the first time that this level of analysis precision has been attained for these hard-to-analyze programs. Moreover, because of Unity-Relay’s nature as a general meta-framework, we expect it to see more future instantiations and to unleash the power of more selective context-sensitivity approaches, to produce more precise pointer analyses. In summary, this work makes the following contributions: • We present Unity-Relay, a simple and practical framework to produce precise pointer analyses for hard-to-analyze Java programs, by unleashing the power of selective context sensitivity (Section 3). • We formulate the Unity-Relay framework, prove its soundness and precision guarantees, and discuss its scalability (Section 4). • We present Baton, a tool instantiated from the Unity-Relay framework, which will be released as an open-source tool (Section 5). • We conduct extensive experiments by comparing Baton with the state of the art, to demonstrate its effectiveness in the real world (the experimental results can be obtained using the accompanying artifact [Tan et al. 2021]) (Section 6). Baton substantially improves on the precision of previous algorithms in the literatureÐe.g., eliminating over 33% of alias pairs (soundly determined to be spurious) on average, and up to 71%, compared to the best past contender evaluated. 2 BACKGROUND We assume that the reader is familiar with the high-level concepts of pointer analysis, as introduced in several surveys [Smaragdakis and Balatsouras 2015; Sridharan et al. 2013]. This section describes some necessary background knowledge about context sensitivity for whole-program pointer analysis for Java. Traditional Context Sensitivity. A method is analyzed context-sensitively means that the static abstraction of different dynamic instantiations of variables and heap objects within the method will be treated separately during analysis under different abstract entities called contexts. Accordingly, spurious object flows will be reduced, thus making the analysis more precise. Given a method m, each of its contexts typically consists of a list of consecutive context elements in the form of Proc. ACM Program. Lang., Vol. 5, No. OOPSLA, Article 147. Publication date: October 2021
147:4 Tian Tan,Yue Li,Xiaoxing Ma,Chang Xu,and Yannis Smaragdakis [c1,c2,...to model different run-time conditions.For example,for call-site sensitivity,c1 is the call site of m,and c2 is the call site of the method containing cl,etc.However,for efficiency, only the most recent I context elements are kept(I is also called the context length)and I is usually limited to a small number for whole-program analysis in practice [Kastrinis and Smaragdakis 2013;Smaragdakis et al.2011].There are three mainstream context-sensitivity variants for Java pointer analysis:call-site sensitivity,object sensitivity,and type sensitivity.Henceforth,we use 3obj to denote object sensitivity with context length being 3.Longer context indicates more precision:more spurious data flows can be possibly eliminated.Thus,3obj is more precise than 2obj.For different kinds of context elements with the same length,past work has demonstrated that object sensitivity typically outperforms call-site sensitivity,in terms of both precision and efficiency [Milanova et al.2005;Smaragdakis et al.2011],and is more precise but also more expensive than type sensitivity [Smaragdakis et al.2011;Tan et al.2017].Traditionally,context-sensitive pointer analysis uniformly applies context sensitivity to every method in a program to maintain high precision.However,this approach often does not work for large and complicated programs as such treatment is too costly to scale [Li et al.2018b;Smaragdakis et al.2014] Selective Context Sensitivity.Selective context sensitivity proposes to apply context sensitivity selectively,only for precision-useful/non-scalability-threat methods.The analysis precision of such methods will help improve the overall program analysis precision,while not incurring so much cost as to make the analysis hard to scale.The remaining methods are analyzed context-insensitively so that the space and time cost originally incurred for these methods(in traditional context sensitivity)is saved,leaving room to produce precise and scalable analyses even for hard-to-analyze programs.However,it is challenging to accurately determine which methods are precision-useful but not scalability-threat in general.To deal with this problem,in the past years,various selective context-sensitivity approaches have been presented,based on different insights and policies.For example,the target methods could be selected according to expert experience [WALA 2018]. program patterns [Li et al.2018a,2020],abstracted memory capacity [Li et al.2018b],parameterized heuristics [Hassanshahi et al.2017;Smaragdakis et al.2014],or machine learning approaches [Jeon et al.2018;Jeong et al.2017].In addition,in selective context sensitivity,some approaches apply the same context sensitivity to the selected methods [Hassanshahi et al.2017:Li et al.2018a.2020: Smaragdakis et al.2014],while in other approaches context elements and lengths may vary for different subsets of the selected methods [Jeon et al.2019;Jeong et al.2017;Li et al.2018b].We refer the readers to Section 7 for more details. 3 THE UNITY-RELAY FRAMEWORK,INFORMALLY We first give an overview of the Unity-Relay framework(Section 3.1),and then explain the key ideas of Unity (Section 3.2)and Relay(Section 3.3),respectively. 3.1 Overview Figure 1 shows a pictorial overview of the Unity-Relay framework:given a program P and a set of selective context-sensitive(C.S.)approach/strategy components,S,the framework will yield highly precise pointer analysis results for P. Unity-Relay is a one-two punch.It first calls its Unity component.Taking as input the results obtained by running each approach in S for P,the context-sensitivity selector of Unity(C.S.Selector in Figure 1)unifies them and makes new configuration about which portion of P should be analyzed by what context-sensitivity variants (ie.,what are the context elements and lengths)to maximize the precision,according to the unity principle,as introduced in Section 3.2.Intuitively,the Unity Proc.ACM Program.Lang.,Vol.5.No.OOPSLA,Article 147.Publication date:October 2021
147:4 Tian Tan, Yue Li, Xiaoxing Ma, Chang Xu, and Yannis Smaragdakis [c1,c2,...] to model different run-time conditions. For example, for call-site sensitivity, c1 is the call site of m, and c2 is the call site of the method containing c1, etc. However, for efficiency, only the most recent 𝑙 context elements are kept (𝑙 is also called the context length) and 𝑙 is usually limited to a small number for whole-program analysis in practice [Kastrinis and Smaragdakis 2013; Smaragdakis et al. 2011]. There are three mainstream context-sensitivity variants for Java pointer analysis: call-site sensitivity, object sensitivity, and type sensitivity. Henceforth, we use 3obj to denote object sensitivity with context length being 3. Longer context indicates more precision: more spurious data flows can be possibly eliminated. Thus, 3obj is more precise than 2obj. For different kinds of context elements with the same length, past work has demonstrated that object sensitivity typically outperforms call-site sensitivity, in terms of both precision and efficiency [Milanova et al. 2005; Smaragdakis et al. 2011], and is more precise but also more expensive than type sensitivity [Smaragdakis et al. 2011; Tan et al. 2017]. Traditionally, context-sensitive pointer analysis uniformly applies context sensitivity to every method in a program to maintain high precision. However, this approach often does not work for large and complicated programs as such treatment is too costly to scale [Li et al. 2018b; Smaragdakis et al. 2014]. Selective Context Sensitivity. Selective context sensitivity proposes to apply context sensitivity selectively, only for precision-useful/non-scalability-threat methods. The analysis precision of such methods will help improve the overall program analysis precision, while not incurring so much cost as to make the analysis hard to scale. The remaining methods are analyzed context-insensitively so that the space and time cost originally incurred for these methods (in traditional context sensitivity) is saved, leaving room to produce precise and scalable analyses even for hard-to-analyze programs. However, it is challenging to accurately determine which methods are precision-useful but not scalability-threat in general. To deal with this problem, in the past years, various selective context-sensitivity approaches have been presented, based on different insights and policies. For example, the target methods could be selected according to expert experience [WALA 2018], program patterns [Li et al. 2018a, 2020], abstracted memory capacity [Li et al. 2018b], parameterized heuristics [Hassanshahi et al. 2017; Smaragdakis et al. 2014], or machine learning approaches [Jeon et al. 2018; Jeong et al. 2017]. In addition, in selective context sensitivity, some approaches apply the same context sensitivity to the selected methods [Hassanshahi et al. 2017; Li et al. 2018a, 2020; Smaragdakis et al. 2014], while in other approaches context elements and lengths may vary for different subsets of the selected methods [Jeon et al. 2019; Jeong et al. 2017; Li et al. 2018b]. We refer the readers to Section 7 for more details. 3 THE UNITY-RELAY FRAMEWORK, INFORMALLY We first give an overview of the Unity-Relay framework (Section 3.1), and then explain the key ideas of Unity (Section 3.2) and Relay (Section 3.3), respectively. 3.1 Overview Figure 1 shows a pictorial overview of the Unity-Relay framework: given a program P and a set of selective context-sensitive (C.S.) approach/strategy components, 𝑆, the framework will yield highly precise pointer analysis results for P. Unity-Relay is a one-two punch. It first calls its Unity component. Taking as input the results obtained by running each approach in 𝑆 for P, the context-sensitivity selector of Unity (C.S. Selector in Figure 1) unifies them and makes new configuration about which portion of P should be analyzed by what context-sensitivity variants (i.e., what are the context elements and lengths) to maximize the precision, according to the unity principle, as introduced in Section 3.2. Intuitively, the Unity Proc. ACM Program. Lang., Vol. 5, No. OOPSLA, Article 147. Publication date: October 2021
Making Pointer Analysis More Precise by Unleashing the Power of Selective Context Sensitivity 147:5 Points-To info in previous Unity Relay pass PTpre Program P C.S.Selector of C.S.Selector of Unity (Figure 2) Trigger Relay (Figure 3) when Selective C.S. Selected Selected C.S. C.S. Approaches Unity Selective C.S. Selective C.S fails Pointer Analysis Pointer Analysis filtered by PTpre Precise Pointer Analysis Results for P Fig.1.Overview of the Unity-Relay Framework. principle maintains for each program element the most precise context that any component context- sensitivity approach would use.Then the new selected context-sensitivity information(Selected C.S. in Figure 1)will guide the selective context-sensitive pointer analysis to analyze P,and its results will be the output of Unity-Relay if the analysis is able to finish running within the time limit; otherwise(i.e,if Unity is too expensive to scale),Unity-Relay will trigger its Relay component to analyze P. Generally,given n approaches in S,Relay will run the selective context-sensitive pointer analysis n times(ie.,we have n passes in Relay).In each pass,as shown in Figure 1,the selective context- sensitive pointer analysis is guided by the context-sensitivity information(Selected C.S.)output from the C.S.selector of Relay according to the relay principle as illustrated in Section 3.3.Between successive passes,the precision of the earlier pass will be transmitted to and accumulated in the next pass.This is achieved by soundly filtering the pointer analysis results in the current pass by using the points-to information(points-to sets of variables)from the previous pass(PTpre in Figure 1).As a result,in Relay,the scalability burden is shared in each analysis run(i.e.,each pass): a component strategy is not burdened by others,but instead each strategy that completes helps both the precision and the scalability of the rest.Although overall precision is not guaranteed to reach that of Unity,precision improvements are reaped in a stable and accumulative way. 3.2 Unity Recall that every selective context-sensitive analysis represents a precision and efficiency trade-off, where precision bumps up against the limits of analysis scalability.In other words,for past selective context-sensitive analyses,the degree of precision(typically:the proportion of the program to be analyzed context-sensitively)is chosen to be approximately at a point where further improving it would render the analysis non-scalable.According to previous work,treating context-sensitively even a small set of methods may significantly hinder scalability,for the "wrong"choice of meth- ods [Li et al.2020].Now it seems that we are trapped:if a small step towards precision may hit the scalability wall(in the precision and efficiency balance made by each selective context-sensitivity approach),how can we reap a further noticeable precision improvement in a general,policy-agnostic meta-framework?To escape from the trap,in Unity,we propose to take a big step forward towards higher precision,by allowing many more methods to be analyzed under context sensitivity,with Proc.ACM Program.Lang.,Vol.5,No.OOPSLA,Article 147.Publication date:October 2021
Making Pointer Analysis More Precise by Unleashing the Power of Selective Context Sensitivity 147:5 C.S. Selector of Unity (Figure 2) Selective C.S. Pointer Analysis Selected C.S. C.S. Selector of Relay (Figure 3) Selective C.S. Pointer Analysis filtered by pai Selected C.S. Points-To info in previous Unity Relay Trigger when Unity fails Precise Pointer Analysis Results for P Program P Selective C.S. Approaches pass PTpre PTpre Fig. 1. Overview of the Unity-Relay Framework. principle maintains for each program element the most precise context that any component contextsensitivity approach would use. Then the new selected context-sensitivity information (Selected C.S. in Figure 1) will guide the selective context-sensitive pointer analysis to analyze P, and its results will be the output of Unity-Relay if the analysis is able to finish running within the time limit; otherwise (i.e., if Unity is too expensive to scale), Unity-Relay will trigger its Relay component to analyze P. Generally, given 𝑛 approaches in 𝑆, Relay will run the selective context-sensitive pointer analysis 𝑛 times (i.e., we have 𝑛 passes in Relay). In each pass, as shown in Figure 1, the selective contextsensitive pointer analysis is guided by the context-sensitivity information (Selected C.S.) output from the C.S. selector of Relay according to the relay principle as illustrated in Section 3.3. Between successive passes, the precision of the earlier pass will be transmitted to and accumulated in the next pass. This is achieved by soundly filtering the pointer analysis results in the current pass by using the points-to information (points-to sets of variables) from the previous pass (PT𝑝𝑟𝑒 in Figure 1). As a result, in Relay, the scalability burden is shared in each analysis run (i.e., each pass): a component strategy is not burdened by others, but instead each strategy that completes helps both the precision and the scalability of the rest. Although overall precision is not guaranteed to reach that of Unity, precision improvements are reaped in a stable and accumulative way. 3.2 Unity Recall that every selective context-sensitive analysis represents a precision and efficiency trade-off, where precision bumps up against the limits of analysis scalability. In other words, for past selective context-sensitive analyses, the degree of precision (typically: the proportion of the program to be analyzed context-sensitively) is chosen to be approximately at a point where further improving it would render the analysis non-scalable. According to previous work, treating context-sensitively even a small set of methods may significantly hinder scalability, for the łwrongž choice of methods [Li et al. 2020]. Now it seems that we are trapped: if a small step towards precision may hit the scalability wall (in the precision and efficiency balance made by each selective context-sensitivity approach), how can we reap a further noticeable precision improvement in a general, policy-agnostic meta-framework? To escape from the trap, in Unity, we propose to take a big step forward towards higher precision, by allowing many more methods to be analyzed under context sensitivity, with Proc. ACM Program. Lang., Vol. 5, No. OOPSLA, Article 147. Publication date: October 2021
147:6 Tian Tan,Yue Li,Xiaoxing Ma,Chang Xu,and Yannis Smaragdakis 3obj 2obj 1type 2obj 2type 2type 9 CI 9 CI S1 S2 S3 Unity Fig.2.An example for illustrating the idea of Unity.S1,S2 and S3 are three input selective context-sensitivity approaches.The four squares show the selected context-sensitivity information for the same program P (i.e.,which methods of P should be analyzed by what context-sensitivity variants)yielded by the three input selective approaches(S1,S2 and S3)and Unity respectively.For example,the square of S1 means:the S1 approach determines that the methods enclosed in the green circle will be analyzed by 2obj while the remaining methods of P will be analyzed context-insensitively(i.e.,CI). heavier context elements and longer context lengths,if possible.This counterintuitive proposal is based on the following insight. Insight.Analyzing more methods context-sensitively may not incur an efficiency decline(some- times it may even accelerate an analysis as more false data flows are pruned away),as long as the right methods are chosen.Avoiding scalability-threat methods is essential [Li et al.2020; Smaragdakis et al.2014],but the net number of methods analyzed context-sensitively is not. As described in Section 2,each of the existing selective context-sensitivity approaches(henceforth, just "selective approaches")has its own insight (e.g.,based on expert experience,parameterized heuristics,principled program patterns,etc.)to identify which methods are more likely useful for improving precision while not incurring much efficiency cost,when analyzed context-sensitively. Limited by the effectiveness of each insight,every approach may miss a collection of methods which are truly precision-useful and efficiency-friendly;however,this problem can be alleviated when combining more selective approaches to identify more target methods.As a result,we propose to simultaneously and context-sensitively analyze all the methods identified by a set of selective approaches,S.Although now possibly many more methods need to be analyzed context-sensitively, we still have a chance to obtain an analysis that is scalable even if we analyze them together,as each of these extra methods is determined as not a scalability threat by at least one approach in S.Hence,although it is hard to give one principle to accurately decide which methods,say M,are precision-useful but not scalability-threatening,our proposal naturally takes advantages of the insights of different selective approaches for identifying M from different perspectives. We illustrate the key idea of Unity in Figure 2.Assume we have three selective approaches S1,S2,and S3.Figure 2 depicts the selected context-sensitivity information determined by each approach for the same program P.For instance,after analyzing P,S1 reports that the set of methods in the green circle(in S1 of Figure 2)should be analyzed by 2obj and the remaining methods(the gray part in S1)need to be analyzed context-insensitively(CI).Similarly,we have the results for S2 and S3 in Figure 2,and in the case of S3,three context-sensitivity variants,3obj,2obj and 1type are selected for different sets of methods. What does Unity produce given the above three approaches and their selected context-sensitivity information?Briefly,Unity finds out the most precise configuration by reassigning context-sensitivity variants to the methods which are reported to be analyzed context-sensitively by at least two input approaches.For example,among the overlapped methods reported by S1 and S3,from the point Proc.ACM Program.Lang.,Vol.5.No.OOPSLA,Article 147.Publication date:October 2021
147:6 Tian Tan, Yue Li, Xiaoxing Ma, Chang Xu, and Yannis Smaragdakis 2obj 2type 1type 2obj 3obj CI 2obj CI 1type 2obj 3obj CI 2type CI S1 S2 S3 Unity Fig. 2. An example for illustrating the idea of Unity. S1, S2 and S3 are three input selective context-sensitivity approaches. The four squares show the selected context-sensitivity information for the same program P (i.e., which methods of P should be analyzed by what context-sensitivity variants) yielded by the three input selective approaches (S1, S2 and S3) and Unity respectively. For example, the square of S1 means: the S1 approach determines that the methods enclosed in the green circle will be analyzed by 2obj while the remaining methods of P will be analyzed context-insensitively (i.e., CI). heavier context elements and longer context lengths, if possible. This counterintuitive proposal is based on the following insight. Insight. Analyzing more methods context-sensitively may not incur an efficiency decline (sometimes it may even accelerate an analysis as more false data flows are pruned away), as long as the right methods are chosen. Avoiding scalability-threat methods is essential [Li et al. 2020; Smaragdakis et al. 2014], but the net number of methods analyzed context-sensitively is not. As described in Section 2, each of the existing selective context-sensitivity approaches (henceforth, just łselective approachesž) has its own insight (e.g., based on expert experience, parameterized heuristics, principled program patterns, etc.) to identify which methods are more likely useful for improving precision while not incurring much efficiency cost, when analyzed context-sensitively. Limited by the effectiveness of each insight, every approach may miss a collection of methods which are truly precision-useful and efficiency-friendly; however, this problem can be alleviated when combining more selective approaches to identify more target methods. As a result, we propose to simultaneously and context-sensitively analyze all the methods identified by a set of selective approaches, 𝑆. Although now possibly many more methods need to be analyzed context-sensitively, we still have a chance to obtain an analysis that is scalable even if we analyze them together, as each of these extra methods is determined as not a scalability threat by at least one approach in 𝑆. Hence, although it is hard to give one principle to accurately decide which methods, say M, are precision-useful but not scalability-threatening, our proposal naturally takes advantages of the insights of different selective approaches for identifying M from different perspectives. We illustrate the key idea of Unity in Figure 2. Assume we have three selective approaches S1, S2, and S3. Figure 2 depicts the selected context-sensitivity information determined by each approach for the same program P. For instance, after analyzing P, S1 reports that the set of methods in the green circle (in S1 of Figure 2) should be analyzed by 2obj and the remaining methods (the gray part in S1) need to be analyzed context-insensitively (CI). Similarly, we have the results for S2 and S3 in Figure 2, and in the case of S3, three context-sensitivity variants, 3obj, 2obj and 1type are selected for different sets of methods. What does Unity produce given the above three approaches and their selected context-sensitivity information? Briefly, Unity finds out the most precise configuration by reassigning context-sensitivity variants to the methods which are reported to be analyzed context-sensitively by at least two input approaches. For example, among the overlapped methods reported by S1 and S3, from the point Proc. ACM Program. Lang., Vol. 5, No. OOPSLA, Article 147. Publication date: October 2021
Making Pointer Analysis More Precise by Unleashing the Power of Selective Context Sensitivity 147:7 of view of S1,some methods that should have been analyzed under 2obj,are now assigned to 3obj(color turns from green to red)as 3obj is more precise than 2obj;for the rest of overlapped methods,they remain green(i.e.,still be analyzed by 2obj)as 2obj is more precise than 1type (reported by S3).After applying similar treatment to other overlapped methods(with the disjoint ones unchanged),we get the output of Unity as in Figure 2.Note that the methods reported by all three input approaches(the middle part of the Unity figure)remain green as 2obj(reported by S1) is more precise than 2type(by S2)and 1type(by S3).Section 4 will detail the rules on how Unity deals with more input approaches with different context-sensitivity variants. From the point of view of each selective approach,Unity modifies it by(1)adding many more methods(identified by other approaches)to be analyzed context-sensitively,and(2)upgrading the context-sensitivity variant to the most precise one for the methods which are also selected by other approaches.We have explained the insight for(1)at the beginning of Section 3.2,and the insight for(2)is similar:if a method m is reported to be analyzed under a certain context-sensitivity variant,cs,by an input approach s,this implies that according to the insight of s,m is considered as not a scalability-threat,regardless of how costly cs is.Then,if the input approaches are reliable, we still have a chance to produce a scalable analysis,even if selecting the most precise cs.Later, experimental results further demonstrate the validity of the Unity insight:even if many more methods are analyzed context-sensitively and are under the most precise configuration (w.r.t.the input approaches).Unity is still able to scale for 12 out of 13 hard-to-analyze programs(with default settings). However,there is no guarantee regarding the scalability of Unity.What can we do if Unity fails to scale?As a second try,a straightforward approach is to not make the analysis too costly. For example,if some overlapped methods M are reported to be analyzed under 3obj and 1type by different input approaches,what about considering the less precise(also less costly)one,i.e.,1type? Nevertheless,this may introduce significant precision loss,and if M is critical form improving precision,such treatment may make the analysis even less precise than the original one(s)guided by certain input approach(es)individually.To reap as much precision as possible while achieving good scalability,we employ the Relay technique. The core insight of Relay is to divide the scalability burden of Unity into different passes(each pass is a run of selective context-sensitive pointer analysis)and the precision achieved by the former pass can be transmitted to and accumulated in the next pass.Based on this insight,we design two options of Relay,called Relay-o1 and Relay-02 that the former is more precise but less scalable than the latter.Figure 3 depicts the above idea with the same input selective approaches as in Figure 2.We introduce Relay-o1 first. 3.3 Relay The first pass of Relay-o1 in Figure 3 differs from Unity in that it only analyzes the methods(say M1)selected by S1(enclosed in the circle with solid line)context-sensitively,with the remaining ones being analyzed under CI.Similarly,only the methods,say M2(M3)selected by S2(S3)are analyzed context-sensitively in pass 2(pass 3).Thus,for each context-sensitive method of M1(M2 or M3),as shown in the figure,Relay selects the same context-sensitivity variant for it as in Unity, namely,still the most precise configuration suggested by any of the combined approaches.For example, for M1 in pass 1,its context-sensitivity variants distribution is the same as the one of Unity in Figure 2(i.e.,the same subset of methods are assigned to 3obj while the remaining methods are assigned to 2obj).For each pass of Relay,only a subset of methods of Unity are analyzed context-sensitively.Although this does not ensure better scalability than Unity,no more methods than the base approach are analyzed context-sensitively-only the same methods may be analyzed Proc.ACM Program.Lang.,Vol.5,No.OOPSLA,Article 147.Publication date:October 2021
Making Pointer Analysis More Precise by Unleashing the Power of Selective Context Sensitivity 147:7 of view of S1, some methods that should have been analyzed under 2obj, are now assigned to 3obj (color turns from green to red) as 3obj is more precise than 2obj; for the rest of overlapped methods, they remain green (i.e., still be analyzed by 2obj) as 2obj is more precise than 1type (reported by S3). After applying similar treatment to other overlapped methods (with the disjoint ones unchanged), we get the output of Unity as in Figure 2. Note that the methods reported by all three input approaches (the middle part of the Unity figure) remain green as 2obj (reported by S1) is more precise than 2type (by S2) and 1type (by S3). Section 4 will detail the rules on how Unity deals with more input approaches with different context-sensitivity variants. From the point of view of each selective approach, Unity modifies it by (1) adding many more methods (identified by other approaches) to be analyzed context-sensitively, and (2) upgrading the context-sensitivity variant to the most precise one for the methods which are also selected by other approaches. We have explained the insight for (1) at the beginning of Section 3.2, and the insight for (2) is similar: if a method m is reported to be analyzed under a certain context-sensitivity variant, cs, by an input approach 𝑠, this implies that according to the insight of 𝑠, m is considered as not a scalability-threat, regardless of how costly cs is. Then, if the input approaches are reliable, we still have a chance to produce a scalable analysis, even if selecting the most precise cs. Later, experimental results further demonstrate the validity of the Unity insight: even if many more methods are analyzed context-sensitively and are under the most precise configuration (w.r.t. the input approaches), Unity is still able to scale for 12 out of 13 hard-to-analyze programs (with default settings). However, there is no guarantee regarding the scalability of Unity. What can we do if Unity fails to scale? As a second try, a straightforward approach is to not make the analysis too costly. For example, if some overlapped methods M are reported to be analyzed under 3obj and 1type by different input approaches, what about considering the less precise (also less costly) one, i.e., 1type? Nevertheless, this may introduce significant precision loss, and if M is critical form improving precision, such treatment may make the analysis even less precise than the original one(s) guided by certain input approach(es) individually. To reap as much precision as possible while achieving good scalability, we employ the Relay technique. The core insight of Relay is to divide the scalability burden of Unity into different passes (each pass is a run of selective context-sensitive pointer analysis) and the precision achieved by the former pass can be transmitted to and accumulated in the next pass. Based on this insight, we design two options of Relay, called Relay-o1 and Relay-o2 that the former is more precise but less scalable than the latter. Figure 3 depicts the above idea with the same input selective approaches as in Figure 2. We introduce Relay-o1 first. 3.3 Relay The first pass of Relay-o1 in Figure 3 differs from Unity in that it only analyzes the methods (say M1) selected by S1 (enclosed in the circle with solid line) context-sensitively, with the remaining ones being analyzed under CI. Similarly, only the methods, say M2 (M3) selected by S2 (S3) are analyzed context-sensitively in pass 2 (pass 3). Thus, for each context-sensitive method of M1 (M2 or M3), as shown in the figure, Relay selects the same context-sensitivity variant for it as in Unity, namely, still the most precise configuration suggested by any of the combined approaches. For example, for M1 in pass 1, its context-sensitivity variants distribution is the same as the one of Unity in Figure 2 (i.e., the same subset of methods are assigned to 3obj while the remaining methods are assigned to 2obj). For each pass of Relay, only a subset of methods of Unity are analyzed context-sensitively. Although this does not ensure better scalability than Unity, no more methods than the base approach are analyzed context-sensitivelyÐonly the same methods may be analyzed Proc. ACM Program. Lang., Vol. 5, No. OOPSLA, Article 147. Publication date: October 2021
147:8 Tian Tan,Yue Li,Xiaoxing Ma,Chang Xu,and Yannis Smaragdakis 4 ,3ob1 300 2obj: 2obl:1type Relay-01 2type Precision of 2obf Precision of Pass 1(P1) 2type Pass 2(P2) 2 CI ci CI Pass 1 Pass 2 Pass 3 3obj 2obj 1type Relay-02 Precision of Precision of Pass 1(P1) 2type Pass 2(P2) CI CI Pass 1 Pass 2 Pass 3 Fig.3.Examples for illustrating the idea of Relay with the same input approaches setting as in Figure 2.Relay has two options,Relay-o1 and Relay-02 and the former is more precise but less scalable than the latter.In each pass of both options,only the methods enclosed in the solid circles,say M.are analyzed context-sensitively (the dotted lines in Relay-o1 are to facilitate understanding).Both options adopt the same precision filtering mechanism,and their difference only lies in what context-sensitivity variants are assigned to M. with a more precise context.Therefore,the probability to select scalability-threat methods is low, making Relay more likely to scale compared to Unity. After running the first pass,its precision,which is reflected in the points-to set for every variable, will be transmitted to the next pass(pass 2).Then,in pass 2,we can use the precision of pass 1, i.e.,the points-to information,to help soundly filtering the points-to sets for all variables when performing the analysis in pass 2.For example,assume a variable v points to three objects under CI with its points-to set,say pt(v),being {o1,02,03).After pass 1,pt(v)becomes {o1,03)where o2 is identified as a spurious object by analyzing M1 context-sensitively.However,in pass 2,purely analyzing M2 under the contexts shown in Figure 3 cannot recognize that v should not point to o2. However,this imprecision does not matter,as the pt(v)transmitted from pass 1 can be used to filter pt(v)during the analysis in pass 2,i.e.,preventing o2 from being propagated to pt(v).Such filtering is safe as the pointer analysis in pass 1 is sound,so no true objects will be filtered away in Relay(this will be formally proven in Section 4). Finally,although uncommon,what if Relay-o1 is not scalable?After all,given a program,we hope to confidently rely on Unity-Relay for yielding precise pointer analysis results all the time. Under this context,we design Relay-02 as the last line of defense of Unity-Relay.Relay-o2 employs the same principle as Relay-o1 with the exception that context-sensitivity variants (selected by input approach Si)are not reassigned any more in each pass,say Passi(see Relay-o2 in Figure 3).In other words,Pass;and the analysis guided by Si,say Ai,adopt the same selective context sensitivity.The only difference between running the selective approaches independently is that precision accumulates,using the same filtering mechanism as in Relay-01.Thus,in Passi,the points-to set for each variable is never larger than the one in Ai,resulting in less data propagation and faster analysis convergence.The only extra cost introduced in Relay-02 is the process of filtering,which is small compared to the overhead of a context-sensitive pointer analysis,and easily offset by the aforementioned efficiency benefit.Thus,we can expect Relay-02 to scale as long Proc.ACM Program.Lang.,Vol.5.No.OOPSLA,Article 147.Publication date:October 2021
147:8 Tian Tan, Yue Li, Xiaoxing Ma, Chang Xu, and Yannis Smaragdakis 2obj CI 3obj 1type 2obj 3obj CI 2obj 2type CI 2type 2obj Pass 1 Pass 2 Pass 3 Precision of Pass 1 (P1) Precision of Pass 2 (P2) P1 P2 2obj CI 1type 2obj 3obj CI CI Pass 1 Pass 2 Pass 3 Precision of Pass 1 (P1) Precision of Pass 2 (P2) P1 P2 Relay-o2 Relay-o1 2type Fig. 3. Examples for illustrating the idea of Relay with the same input approaches setting as in Figure 2. Relay has two options, Relay-o1 and Relay-o2 and the former is more precise but less scalable than the latter. In each pass of both options, only the methods enclosed in the solid circles, say M, are analyzed context-sensitively (the dotted lines in Relay-o1 are to facilitate understanding). Both options adopt the same precision filtering mechanism, and their difference only lies in what context-sensitivity variants are assigned to M. with a more precise context. Therefore, the probability to select scalability-threat methods is low, making Relay more likely to scale compared to Unity. After running the first pass, its precision, which is reflected in the points-to set for every variable, will be transmitted to the next pass (pass 2). Then, in pass 2, we can use the precision of pass 1, i.e., the points-to information, to help soundly filtering the points-to sets for all variables when performing the analysis in pass 2. For example, assume a variable v points to three objects under CI with its points-to set, say pt(v), being {o1,o2,o3}. After pass 1, pt(v) becomes {o1,o3} where o2 is identified as a spurious object by analyzing M1 context-sensitively. However, in pass 2, purely analyzing M2 under the contexts shown in Figure 3 cannot recognize that v should not point to o2. However, this imprecision does not matter, as the pt(v) transmitted from pass 1 can be used to filter pt(v) during the analysis in pass 2, i.e., preventing o2 from being propagated to pt(v). Such filtering is safe as the pointer analysis in pass 1 is sound, so no true objects will be filtered away in Relay (this will be formally proven in Section 4). Finally, although uncommon, what if Relay-o1 is not scalable? After all, given a program, we hope to confidently rely on Unity-Relay for yielding precise pointer analysis results all the time. Under this context, we design Relay-o2 as the last line of defense of Unity-Relay. Relay-o2 employs the same principle as Relay-o1 with the exception that context-sensitivity variants (selected by input approach S𝑖 ) are not reassigned any more in each pass, say Pass𝑖 (see Relay-o2 in Figure 3). In other words, Pass𝑖 and the analysis guided by S𝑖 , say A𝑖 , adopt the same selective context sensitivity. The only difference between running the selective approaches independently is that precision accumulates, using the same filtering mechanism as in Relay-o1. Thus, in Pass𝑖 , the points-to set for each variable is never larger than the one in A𝑖 , resulting in less data propagation and faster analysis convergence. The only extra cost introduced in Relay-o2 is the process of filtering, which is small compared to the overhead of a context-sensitive pointer analysis, and easily offset by the aforementioned efficiency benefit. Thus, we can expect Relay-o2 to scale as long Proc. ACM Program. Lang., Vol. 5, No. OOPSLA, Article 147. Publication date: October 2021
Making Pointer Analysis More Precise by Unleashing the Power of Selective Context Sensitivity 147:9 Variable CI S1 S2 S1&S2 Relay 1 x.f a; a o1 01 07 01 01 2 b=y.f; X 02 02 02 02 02 3 y 02,03 03 02,03 03 03 4 p.f a; p 04 04 04 04 04 5 b=q.f; q 04,05 04,05 o5 05 05 6 01 01 01 01 Fig.4.Example for illustrating precision intervention in Relay.Specifically,b is supposed to be a null pointer at run time,and only Relay can analyze it precisely in this case. as A;scales.We will further discuss the scalability of Relay-02 in Section 4.Note that Relay-02 is fundamentally different from merely intersecting the points-to results of all pointer analyses guided by the input approaches.In Relay-02,there is early precision intervention of each analysis pass to the next:the pointer analysis in each pass is filtered by the points-to results produced by the previous passes on the fly,likely preventing the computation of more spurious value flows during the analysis.As a result,Relay-02 is more precise than simply intersecting the final points-to sets after individually running each guided pointer analysis without such precision intervention. We next use an example,in Figure 4,to illustrate the benefit of precision intervention.The left side of Figure 4 is an example code snippet,where the value of a is stored into x.f and p.f(through lines 1 and 4),and b loads values from y.f and q.f (through lines 2 and 5).The table on the right side contains the points-to sets of the variables in the code snippet under different pointer analyses. Column CI gives the points-to sets produced by a context-insensitive pointer analysis,and we consider two selective context-sensitive pointer analyses,S1 and S2,which identify o2 and o4 as spurious objects for variables y and q,respectively.If we simply intersect the points-to results of S1 and S2,then we can eliminate spurious objects for y and q,as shown in column S1&S2.However,b still points to the spurious object o1 under S1&S2,as o1 can flow to b through either line 2 or line 5: both S1 and S2 consider o1 to be valid for b,for different reasons each.Among the analyses in the table,only Relay can block the two flows(through lines 2 and 5)simultaneously.In Relay,with S1 as pass 1 and S2 as pass 2,S2 identifies o4 as spurious for q so that the flow through line 5 can be blocked (as p and q are not aliased).Further benefiting from the precision intervention,the flow through line 2 is also blocked as pass 2 also incorporates the result from pass 1,i.e.,o2 is identified as spurious for y by S1(so that x and y are not aliased).As a result,o1 is identified as spurious for b in pass 2.(The points-to set of b is empty,based on the fragment and example values shown,but other program statements could give it values,orthogonally to the example-e.g.,by setting y.f or q.f.) In the overall Relay mechanism,Relay-o1 and Relay-02 apply independently to analysis passes and are mixed as necessary for scalability.The two Relay variants can be seen as the best and worst precision options,and there is design space between them for making precision and efficiency trade-offs.In our design,precision is the first priority,so for every Passi,we first attempt Relay-o1. If Relay-o1 is not scalable,we select Relay-02(which scales if the base approach does),and repeat the same scheme(i.e.,first trying Relay-01,then Relay-02 if it fails)to Passi+1,etc.until all n passes complete(assuming there are n input selective approaches). The running order for different passes of Relay does not affect the soundness and precision guarantees of Relay(including both Relay-o1 and Relay-02)and also the scalability potential of Relay-02(as proved and discussed in Section 4.4).In other words,in practice,regardless of the pass ordering,users can rely on Relay to produce an analysis that is more precise than the one(say Proc.ACM Program.Lang.,Vol.5,No.OOPSLA,Article 147.Publication date:October 2021
Making Pointer Analysis More Precise by Unleashing the Power of Selective Context Sensitivity 147:9 1 x.f = a; 2 b = y.f; 3 ... 4 p.f = a; 5 b = q.f; Variable CI S1 S2 S1&S2 Relay a o1 o1 o1 o1 o1 x o2 o2 o2 o2 o2 y o2,o3 o3 o2,o3 o3 o3 p o4 o4 o4 o4 o4 q o4,o5 o4,o5 o5 o5 o5 b o1 o1 o1 o1 Fig. 4. Example for illustrating precision intervention in Relay. Specifically, b is supposed to be a null pointer at run time, and only Relay can analyze it precisely in this case. as A𝑖 scales. We will further discuss the scalability of Relay-o2 in Section 4. Note that Relay-o2 is fundamentally different from merely intersecting the points-to results of all pointer analyses guided by the input approaches. In Relay-o2, there is early precision intervention of each analysis pass to the next: the pointer analysis in each pass is filtered by the points-to results produced by the previous passes on the fly, likely preventing the computation of more spurious value flows during the analysis. As a result, Relay-o2 is more precise than simply intersecting the final points-to sets after individually running each guided pointer analysis without such precision intervention. We next use an example, in Figure 4, to illustrate the benefit of precision intervention. The left side of Figure 4 is an example code snippet, where the value of a is stored into x.f and p.f (through lines 1 and 4), and b loads values from y.f and q.f (through lines 2 and 5). The table on the right side contains the points-to sets of the variables in the code snippet under different pointer analyses. Column CI gives the points-to sets produced by a context-insensitive pointer analysis, and we consider two selective context-sensitive pointer analyses, S1 and S2, which identify o2 and o4 as spurious objects for variables y and q, respectively. If we simply intersect the points-to results of S1 and S2, then we can eliminate spurious objects for y and q, as shown in column S1&S2. However, b still points to the spurious object o1 under S1&S2, as o1 can flow to b through either line 2 or line 5: both S1 and S2 consider o1 to be valid for b, for different reasons each. Among the analyses in the table, only Relay can block the two flows (through lines 2 and 5) simultaneously. In Relay, with S1 as pass 1 and S2 as pass 2, S2 identifies o4 as spurious for q so that the flow through line 5 can be blocked (as p and q are not aliased). Further benefiting from the precision intervention, the flow through line 2 is also blocked as pass 2 also incorporates the result from pass 1, i.e., o2 is identified as spurious for y by S1 (so that x and y are not aliased). As a result, o1 is identified as spurious for b in pass 2. (The points-to set of b is empty, based on the fragment and example values shown, but other program statements could give it values, orthogonally to the exampleÐe.g., by setting y.f or q.f.) In the overall Relay mechanism, Relay-o1 and Relay-o2 apply independently to analysis passes and are mixed as necessary for scalability. The two Relay variants can be seen as the best and worst precision options, and there is design space between them for making precision and efficiency trade-offs. In our design, precision is the first priority, so for every Pass𝑖 , we first attempt Relay-o1. If Relay-o1 is not scalable, we select Relay-o2 (which scales if the base approach does), and repeat the same scheme (i.e., first trying Relay-o1, then Relay-o2 if it fails) to Pass𝑖+1, etc. until all 𝑛 passes complete (assuming there are 𝑛 input selective approaches). The running order for different passes of Relay does not affect the soundness and precision guarantees of Relay (including both Relay-o1 and Relay-o2) and also the scalability potential of Relay-o2 (as proved and discussed in Section 4.4). In other words, in practice, regardless of the pass ordering, users can rely on Relay to produce an analysis that is more precise than the one (say Proc. ACM Program. Lang., Vol. 5, No. OOPSLA, Article 147. Publication date: October 2021
147:10 Tian Tan,Yue Li,Xiaoxing Ma,Chang Xu,and Yannis Smaragdakis A)guided by any input selective approach,and can expect Relay-o2 to scale as long as A scales. However,the pass ordering may affect the final precision of Relay(although experimental results show that such effect is very minor).For example,assume two variables v1 and v2 point to {ol,02} and (o3,04)respectively under CI.Pass 1 identifies o2 as a spurious object for v1,and pass 2 can identify o4 as a false object if and only if it is aware that v1 does not point to o2 with the help of the result from pass 1.But when we swap the order of pass 1 and pass 2,v2 may still point to o4 (actually a spurious object)at the conclusion of all passes,resulting in different precision.We will discuss the pass ordering issue using the experimental results in Section 6.2. 4 FORMALISM AND PROPERTIES We formalize the Unity-Relay framework,prove its key properties,and discuss its important features.Specifically,we detail how to unify various context-sensitivity approaches with different context elements and lengths,and how to guide selective context-sensitive pointer analysis in each pass to filter spurious objects using the results from the previous pass,while achieving soundness and precision guarantees. 4.1 Domain and Notations Our notation is shown in Figure 5.For formalization purposes,two domains of context elements, K and B are introduced.The former denotes the kinds of context elements,i.e.,obj for object sensitivity,type for type-sensitivity and call for call-site sensitivity(identified by their locations in the program)while the latter denotes the concrete context elements forming each context c E C. For instance,in object sensitivity,c consists of a list of heap objects (allocation sites)o;EB.Each context-sensitivity variant cs E CS can be expressed as a pair of context length l and a kind of context element k,denoted by lk,e.g.,2obj or 1type,hence we have CS =NxK,where N denotes the natural numbers. pt and fpt denote the analysis results.pt(c,x)represents the points-to set of variable x under context c.fpt(c,oi,f)represents the points-to set of instance field f of an abstract object o;under context c.The objects in the points-to sets of pt and fot are also qualified by contexts,i.e.,heap contexts. gencs denotes the function that generates contexts according to the length and kind of context elements of cs,which will be further explained in Section 4.2.contextsOfmaps a method to a set of contexts under which the method is analyzed.selectCS is the core of selective context sensitivity, which selects a proper context-sensitivity variant cs for each method.We use S to denote a set of selective approaches,indexed by subscript:Si.If a pointer analysis PA is guided by a selective approach Si,we refer to it as PA-Si. 4.2 Selective Context-Sensitive Pointer Analysis We present a formulation of general selective context-sensitive pointer analysis in Figure 6,which covers five basic statements:object allocation ([NEw)),local assignment ([AssIGN]),field load ([LoAD)) and store ([STORE]),and method call ([CALL]).Similar to [Sridharan et al.2013;Tan et al.2016],we elide the rules for static members and arrays,as the former is straightforward and the latter can be modeled as load and store to an artificial field of each array.In Figure 6,the premises within boxes are only related to Unity-Relay,and thus can be ignored for now. For each rule in Figure 6,m denotes the method containing the corresponding statement(in blue color).Here we only explain [NEw]and [CALL]as the other three rules are standard. In context-sensitive pointer analysis,an abstract object is typically represented by a heap context and the label of the allocation site,as in our [NEw]rule.For simplicity,we directly use the context of the method containing the allocation site(i.e.,c)as the heap context for the created abstract Proc.ACM Program.Lang.,Vol.5.No.OOPSLA,Article 147.Publication date:October 2021
147:10 Tian Tan, Yue Li, Xiaoxing Ma, Chang Xu, and Yannis Smaragdakis A) guided by any input selective approach, and can expect Relay-o2 to scale as long as A scales. However, the pass ordering may affect the final precision of Relay (although experimental results show that such effect is very minor). For example, assume two variables v1 and v2 point to {o1,o2} and {o3,o4} respectively under CI. Pass 1 identifies o2 as a spurious object for v1, and pass 2 can identify o4 as a false object if and only if it is aware that v1 does not point to o2 with the help of the result from pass 1. But when we swap the order of pass 1 and pass 2, v2 may still point to o4 (actually a spurious object) at the conclusion of all passes, resulting in different precision. We will discuss the pass ordering issue using the experimental results in Section 6.2. 4 FORMALISM AND PROPERTIES We formalize the Unity-Relay framework, prove its key properties, and discuss its important features. Specifically, we detail how to unify various context-sensitivity approaches with different context elements and lengths, and how to guide selective context-sensitive pointer analysis in each pass to filter spurious objects using the results from the previous pass, while achieving soundness and precision guarantees. 4.1 Domain and Notations Our notation is shown in Figure 5. For formalization purposes, two domains of context elements, K and E are introduced. The former denotes the kinds of context elements, i.e., obj for object sensitivity, type for type-sensitivity and call for call-site sensitivity (identified by their locations in the program) while the latter denotes the concrete context elements forming each context 𝑐 ∈ C. For instance, in object sensitivity, 𝑐 consists of a list of heap objects (allocation sites) 𝑜𝑖 ∈ E . Each context-sensitivity variant 𝑐𝑠 ∈ CS can be expressed as a pair of context length 𝑙 and a kind of context element 𝑘, denoted by 𝑙𝑘, e.g., 2obj or 1type, hence we have CS = N × K, where N denotes the natural numbers. pt and fpt denote the analysis results. pt(𝑐, 𝑥) represents the points-to set of variable 𝑥 under context 𝑐. fpt(𝑐, 𝑜𝑖 , 𝑓 ) represents the points-to set of instance field 𝑓 of an abstract object 𝑜𝑖 under context 𝑐. The objects in the points-to sets of pt and fpt are also qualified by contexts, i.e., heap contexts. gen𝑐𝑠 denotes the function that generates contexts according to the length and kind of context elements of 𝑐𝑠, which will be further explained in Section 4.2. contextsOf maps a method to a set of contexts under which the method is analyzed. selectCS is the core of selective context sensitivity, which selects a proper context-sensitivity variant 𝑐𝑠 for each method. We use 𝑆 to denote a set of selective approaches, indexed by subscript: 𝑆𝑖 . If a pointer analysis PA is guided by a selective approach 𝑆𝑖 , we refer to it as PA-𝑆𝑖 . 4.2 Selective Context-Sensitive Pointer Analysis We present a formulation of general selective context-sensitive pointer analysis in Figure 6, which covers five basic statements: object allocation ([New]), local assignment ([Assign]), field load ([Load]) and store ([Store]), and method call ([Call]). Similar to [Sridharan et al. 2013; Tan et al. 2016], we elide the rules for static members and arrays, as the former is straightforward and the latter can be modeled as load and store to an artificial field of each array. In Figure 6, the premises within boxes are only related to Unity-Relay, and thus can be ignored for now. For each rule in Figure 6, 𝑚 denotes the method containing the corresponding statement (in blue color). Here we only explain [New] and [Call] as the other three rules are standard. In context-sensitive pointer analysis, an abstract object is typically represented by a heap context and the label of the allocation site, as in our [New] rule. For simplicity, we directly use the context of the method containing the allocation site (i.e., 𝑐) as the heap context for the created abstract Proc. ACM Program. Lang., Vol. 5, No. OOPSLA, Article 147. Publication date: October 2021