Scalability-First Pointer Analysis with Self-Tuning Context-Sensitivity Yue Li Tian Tan Anders Moller Yannis Smaragdakis Aarhus University Aarhus University Aarhus University University of Athens yueli@cs.au.dk tiantan@cs.au.dk amoeller@cs.au.dk smaragd@di.uoa.gr ABSTRACT timeout(>10800) ▣2obj▣2type▣Cl Context-sensitivity is important in pointer analysis to ensure high precision,but existing techniques suffer from unpredictable scala- bility.Many variants of context-sensitivity exist,and it is difficult to choose one that leads to reasonable analysis time and obtains 5000 high precision,without running the analysis multiple times. 4000 We present the SCALER framework that addresses this problem SCALER efficiently estimates the amount of points-to information 3000 that would be needed to analyze each method with different variants of context-sensitivity.It then selects an appropriate variant for 2000 each method so that the total amount of points-to information is bounded,while utilizing the available space to maximize precision. Our experimental results demonstrate that SCALER achieves pre- 份导时8济付 dictable scalability for all the evaluated programs(e.g.,speedups can reach 10x for 2-object-sensitivity),while providing a precision dbugs pmd soot that matches or even exceeds that of the best alternative techniques. Figure 1:Comparison of running time(seconds)of 2-object sensitivity,2-type sensitivity,and context-insensitive analy- CCS CONCEPTS ses.The y-axis is truncated to 7000 seconds for readability Theory of computation-Program analysis; and all truncated cases reach the time budget,10800 seconds. KEYWORDS challenging to produce precise analysis results without sacrificing scalability [12.30.35].Thus,for decades,researchers have contin- static analysis,points-to analysis,Java ued to develop sophisticated pointer analysis techniques [2,14- ACM Reference Format: 16,18,22,24,25,32,33,37,38]. Yue Li,Tian Tan,Anders Moller,and Yannis Smaragdakis.2018.Scalability- One of the key mechanisms for achieving high analysis precision First Pointer Analysis with Self-Tuning Context-Sensitivity.In Proceedings is context sensitivity,which allows each program method to be of the 26th ACM Joint European Software Engineering Conference and Sympo- analyzed differently according to the context it is used in [17]. sium on the Foundations of Software Engineering (ESEC/FSE'18),November Context sensitivity has different variants,depending on the kind of 4-9.2018,Lake Buena Vista,FL,USA.ACM,New York,NY,USA,12 pages. context information used.For Java programs,object-sensitivity [25] https:/doi.org/10.1145/3236024.3236041 and type-sensitivity [32]have proven to be quite effective.The 1 INTRODUCTION former is strictly more precise but less efficient than the latter [15, 37].However,with any available variant,although the analysis can Pointer analysis is a family of static analysis techniques that provide gain in precision,scalability is known to be unpredictable [33,38]. a foundation for many other analyses and software engineering in the sense that programs very similar in size and other complexity tasks,such as program slicing [36,39].reflection analysis [19,31], metrics may have completely different scalability profiles. bug detection[13,26],security analysis [1,23],program verifica- Figure 1 shows time spent analyzing 10 real-world Java pro- tion [8,27],and program debugging and comprehension [5,21]. grams'under 2-object-sensitivity (2obj)[25],which is among The goal of pointer analysis is to statically compute a set of objects the most precise variants of context sensitivity,2-type-sensitivity (abstracted as their allocation sites)that a program variable may (2type)[32],and context-insensitivity(CI).We observe that point to during run time.Although stating this goal is simple,it is 2obj is not scalable for 6 out of 10 programs within 3 hours Permission to make digital or hard copies of all or part of this work for personal or while it can finish running for 3 programs within 5 minutes; classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation program size is far from a reliable predictor-for example,jython on the first page.Copyrights for components of this work owned by others than the (12 718 methods)is smaller than briss(26 582 methods),how author(s)must be honored.Abstracting with credit is permitted To copy otherwise.or republish,to post on servers or to redistribute to lists,requires prior specific permission ever,2type is not scalable for the former but scalable for the and/or a fee.Request permissions from permissions@acm.org. latter; ESEC/FSE '18.November 4-9,2018,Lake Buena Vista,FL.USA 2018 Copyright held by the owner/author(s).Publication rights licensed to ACM ACM1SBN978-1-4503-5573-5/18/11..$15.00 IThese are all popular open-source applications,including the heaviest(jython and ttps:/doi.org/10.1145/3236024.3236041 eclipse)of the DaCapo benchmarks [3]. 129
Scalability-First Pointer Analysis with Self-Tuning Context-Sensitivity Yue Li Aarhus University yueli@cs.au.dk Tian Tan Aarhus University tiantan@cs.au.dk Anders Mùller Aarhus University amoeller@cs.au.dk Yannis Smaragdakis University of Athens smaragd@di.uoa.gr ABSTRACT Context-sensitivity is important in pointer analysis to ensure high precision, but existing techniques suffer from unpredictable scalability. Many variants of context-sensitivity exist, and it is difficult to choose one that leads to reasonable analysis time and obtains high precision, without running the analysis multiple times. We present the Scaler framework that addresses this problem. Scaler efficiently estimates the amount of points-to information that would be needed to analyze each method with different variants of context-sensitivity. It then selects an appropriate variant for each method so that the total amount of points-to information is bounded, while utilizing the available space to maximize precision. Our experimental results demonstrate that Scaler achieves predictable scalability for all the evaluated programs (e.g., speedups can reach 10x for 2-object-sensitivity), while providing a precision that matches or even exceeds that of the best alternative techniques. CCS CONCEPTS · Theory of computation → Program analysis; KEYWORDS static analysis, points-to analysis, Java ACM Reference Format: Yue Li, Tian Tan, Anders Mùller, and Yannis Smaragdakis. 2018. ScalabilityFirst Pointer Analysis with Self-Tuning Context-Sensitivity. In Proceedings of the 26th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE ’18), November 4ś9, 2018, Lake Buena Vista, FL, USA. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3236024.3236041 1 INTRODUCTION Pointer analysis is a family of static analysis techniques that provide a foundation for many other analyses and software engineering tasks, such as program slicing [36, 39], reflection analysis [19, 31], bug detection [13, 26], security analysis [1, 23], program verification [8, 27], and program debugging and comprehension [5, 21]. The goal of pointer analysis is to statically compute a set of objects (abstracted as their allocation sites) that a program variable may point to during run time. Although stating this goal is simple, it is Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. ESEC/FSE ’18, November 4ś9, 2018, Lake Buena Vista, FL, USA © 2018 Copyright held by the owner/author(s). Publication rights licensed to ACM. ACM ISBN 978-1-4503-5573-5/18/11. . . $15.00 https://doi.org/10.1145/3236024.3236041 285 2458 53 93 5374 95 960 289 2950 45 54 1203 228 48 135 49 117 112 22 22 67 994 0 1000 2000 3000 4000 5000 6000 2obj 2type CI timeout (>10800) Figure 1: Comparison of running time (seconds) of 2-object sensitivity, 2-type sensitivity, and context-insensitive analyses. The y-axis is truncated to 7000 seconds for readability, and all truncated cases reach the time budget, 10800 seconds. challenging to produce precise analysis results without sacrificing scalability [12, 30, 35]. Thus, for decades, researchers have continued to develop sophisticated pointer analysis techniques [2, 14ś 16, 18, 22, 24, 25, 32, 33, 37, 38]. One of the key mechanisms for achieving high analysis precision is context sensitivity, which allows each program method to be analyzed differently according to the context it is used in [17]. Context sensitivity has different variants, depending on the kind of context information used. For Java programs, object-sensitivity [25] and type-sensitivity [32] have proven to be quite effective. The former is strictly more precise but less efficient than the latter [15, 37]. However, with any available variant, although the analysis can gain in precision, scalability is known to be unpredictable [33, 38], in the sense that programs very similar in size and other complexity metrics may have completely different scalability profiles. Figure 1 shows time spent analyzing 10 real-world Java programs1 under 2-object-sensitivity (2obj) [25], which is among the most precise variants of context sensitivity, 2-type-sensitivity (2type) [32], and context-insensitivity (CI). We observe that • 2obj is not scalable for 6 out of 10 programs within 3 hours, while it can finish running for 3 programs within 5 minutes; • program size is far from a reliable predictorÐfor example, jython (12 718 methods) is smaller than briss (26 582 methods), however, 2type is not scalable for the former but scalable for the latter; 1These are all popular open-source applications, including the heaviest (jython and eclipse) of the DaCapo benchmarks [3]. 129
ESEC/FSE'18.November 4-9,2018,Lake Buena Vista,FL,USA Yue Li,Tian Tan,Anders Moller,and Yannis Smaragdakis context insensitivity exhibits very good and stable scalability We propose the SCALER pointer analysis framework that,given for all programs (but it is much less precise). a total scalability threshold,automatically selects an appropriate Scalability of context-sensitivity is not only unpredictable,but also variant of context sensitivity for each method in the program tends to be bimodal [32]:when analyzing a given program with being analyzed(Section 3).The approach relies on the concept of scalability-critical methods that helps explain why context- existing context-sensitivity techniques,usually the analysis either sensitive pointer analysis is sometimes unscalable. terminates relatively quickly(we say that the analysis is scalable in this case),or it blows up and does not terminate even with very We present a novel technique to efficiently estimate the amount high time limits of points-to information that would be needed to analyze each Consider a scenario where the task is to analyze a set of pro- method with different variants of context sensitivity,using ob- grams within a given time budget,for example as part of a large- ject allocation graphs(Section 4). scale security analysis.Should one pick a precise context-sensitive We describe our open-source implementation(Section 5). pointer analysis and take the risk that it may not scale for sev- We conduct extensive experiments by comparing SCALER with eral programs,or pick a scalable,context-insensitive,analysis that state-of-the-art pointer analyses (Section 6).The evaluation sacrifices precision for all programs?One answer is introspective demonstrates that SCALER achieves predictable scalability for all analysis [33],which tunes context sensitivity according to the re- the evaluated programs in one shot,while providing a precision sults of a first-pass,context-insensitive analysis.However,that that matches or even exceeds that of the best alternatives.As approach is,as the authors put it,an "if all else fails"analysis that an example,the jython benchmark from DaCapo is known to should only be used if traditional context-sensitive algorithms fail. cause problems for context-sensitive pointer analysis [15,38]: Thus,computing resources are wasted,because one has to wait 1type is the most precise conventional pointer analysis that is until the context-sensitive analysis reaches a timeout,before the scalable for jython,taking around 33 minutes on an ordinary introspective analysis is run as a fallback. PC,while 2obj and 2type do not complete within 3 hours.In In this paper,we present a pointer analysis framework named comparison,SCALER achieves significantly better precision than SCALER that has the following desirable properties. both 1type and the state-of-the-art introspective analysis [33] (i)Users only have to apply SCALER once for each program,without and takes under 8 minutes. a need to experiment with different variants of context sensitiv- 2 BACKGROUND ity. (ii)SCALER prioritizes scalability.Given a reasonable time budget, Pointer analysis typically consists of computing the points-to sets SCALER can be expected to finish analyzing any given program of variables in the program text.2 Points-to sets form a relation P within the budget.More specifically,if a context-insensitive between variables and abstract objects,i.e.,a subset of pointer analysis is scalable for P,users can confidently expect Var x Obi that SCALER will also scale for P. with Var being the set of program variables and Obj the set of (iii)SCALER is able to achieve precision comparable to,or better abstract objects.Abstract objects are typically represented as al- than that of the most precise context-sensitivity variant that is location sites,i.e.,instructions that allocate objects (e.g.,new in scalable for P.That is,the user does not need to manually pick a Java)[6].An allocation site stands for all the run-time objects it can context-sensitivity variant a priori,but can expect to effectively possibly allocate.This representation by nature loses significant match the best option that one could have picked,on a single precision.A program variable corresponds to many run-time incar- analysis run.Experimentally,this precision is much greater than nations during program execution-not just for executions under prior introspective analyses [33]. different inputs but also for different instances of local variables The key insight of SCALER is that the size of context-sensitive during distinct activations of the same method. points-to information (or,equivalently,the total memory consumed) To combat the loss of precision,context-sensitive pointer anal- is the critical factor that determines whether analysis of a given pro- ysis enhances the computed relations to maintain a more precise gram using a particular context-sensitivity variant is scalable or not, abstraction of variables and objects [30,35].Variables get quali- and that it is possible to estimate the size of context-sensitive points- fied with contexts,to distinguish their different incarnations.The to information produced by different kinds of context sensitivity analysis effectively computes a subset of without conducting the context-sensitive analysis.This estimate Ctx x Var x Obj can be computed using a cheap,context-insensitive pre-analysis, where Ctx is a set of contexts for variables.3 This precision is valu- by leveraging the notion of object allocation graphs [37]. able for intermediate analysis computations,even though the final SCALER is parameterized by a number called the total scalability analysis results get collapsed in the easily-understandable Var x Obj threshold (TST),which can be selected based on the available mem- relation:distinguishing the behavior of a much-used variable or ory.For a given TST,SCALER automatically adapts different kinds of context sensitivity (e.g.,object-sensitivity,type-sensitivity)and con- 2A second formuation is that of alias analysis which computes the pairs of variables text insensitivity,at the level of individual methods in the program, or expressions that can be aliased.For most published algorithms,computing points-to sets and computing alias sets are equivalent problems:one can be mapped to the other to stay within the desired bound and thereby retain scalability. without affecting the algorithm's fundamental precision or scalability. while utilizing the available space to maximize precision. For simplicity of exposition we ignore context sensitivity for abstract objects(aka heap cloning)which also qualifies Obj with a set of heap contexts,HCtr,in much the In summary,this paper makes the following contributions: same way as qualifying variables. 130
ESEC/FSE ’18, November 4–9, 2018, Lake Buena Vista, FL, USA Yue Li, Tian Tan, Anders Møller, and Yannis Smaragdakis • context insensitivity exhibits very good and stable scalability for all programs (but it is much less precise). Scalability of context-sensitivity is not only unpredictable, but also tends to be bimodal [32]: when analyzing a given program with existing context-sensitivity techniques, usually the analysis either terminates relatively quickly (we say that the analysis is scalable in this case), or it blows up and does not terminate even with very high time limits. Consider a scenario where the task is to analyze a set of programs within a given time budget, for example as part of a largescale security analysis. Should one pick a precise context-sensitive pointer analysis and take the risk that it may not scale for several programs, or pick a scalable, context-insensitive, analysis that sacrifices precision for all programs? One answer is introspective analysis [33], which tunes context sensitivity according to the results of a first-pass, context-insensitive analysis. However, that approach is, as the authors put it, an łif all else failsž analysis that should only be used if traditional context-sensitive algorithms fail. Thus, computing resources are wasted, because one has to wait until the context-sensitive analysis reaches a timeout, before the introspective analysis is run as a fallback. In this paper, we present a pointer analysis framework named Scaler that has the following desirable properties. (i) Users only have to apply Scaler once for each program, without a need to experiment with different variants of context sensitivity. (ii) Scaler prioritizes scalability. Given a reasonable time budget, Scaler can be expected to finish analyzing any given program P within the budget. More specifically, if a context-insensitive pointer analysis is scalable for P, users can confidently expect that Scaler will also scale for P. (iii) Scaler is able to achieve precision comparable to, or better than that of the most precise context-sensitivity variant that is scalable for P. That is, the user does not need to manually pick a context-sensitivity variant a priori, but can expect to effectively match the best option that one could have picked, on a single analysis run. Experimentally, this precision is much greater than prior introspective analyses [33]. The key insight of Scaler is that the size of context-sensitive points-to information (or, equivalently, the total memory consumed) is the critical factor that determines whether analysis of a given program using a particular context-sensitivity variant is scalable or not, and that it is possible to estimate the size of context-sensitive pointsto information produced by different kinds of context sensitivity without conducting the context-sensitive analysis. This estimate can be computed using a cheap, context-insensitive pre-analysis, by leveraging the notion of object allocation graphs [37]. Scaler is parameterized by a number called the total scalability threshold (TST), which can be selected based on the available memory. For a given TST, Scaler automatically adapts different kinds of context sensitivity (e.g., object-sensitivity, type-sensitivity) and context insensitivity, at the level of individual methods in the program, to stay within the desired bound and thereby retain scalability, while utilizing the available space to maximize precision. In summary, this paper makes the following contributions: • We propose the Scaler pointer analysis framework that, given a total scalability threshold, automatically selects an appropriate variant of context sensitivity for each method in the program being analyzed (Section 3). The approach relies on the concept of scalability-critical methods that helps explain why contextsensitive pointer analysis is sometimes unscalable. • We present a novel technique to efficiently estimate the amount of points-to information that would be needed to analyze each method with different variants of context sensitivity, using object allocation graphs (Section 4). • We describe our open-source implementation (Section 5). • We conduct extensive experiments by comparing Scaler with state-of-the-art pointer analyses (Section 6). The evaluation demonstrates that Scaler achieves predictable scalability for all the evaluated programs in one shot, while providing a precision that matches or even exceeds that of the best alternatives. As an example, the jython benchmark from DaCapo is known to cause problems for context-sensitive pointer analysis [15, 38]: 1type is the most precise conventional pointer analysis that is scalable for jython, taking around 33 minutes on an ordinary PC, while 2obj and 2type do not complete within 3 hours. In comparison, Scaler achieves significantly better precision than both 1type and the state-of-the-art introspective analysis [33] and takes under 8 minutes. 2 BACKGROUND Pointer analysis typically consists of computing the points-to sets of variables in the program text.2 Points-to sets form a relation between variables and abstract objects, i.e., a subset of Var × Obj with Var being the set of program variables and Obj the set of abstract objects. Abstract objects are typically represented as allocation sites, i.e., instructions that allocate objects (e.g., new in Java) [6]. An allocation site stands for all the run-time objects it can possibly allocate. This representation by nature loses significant precision. A program variable corresponds to many run-time incarnations during program executionÐnot just for executions under different inputs but also for different instances of local variables during distinct activations of the same method. To combat the loss of precision, context-sensitive pointer analysis enhances the computed relations to maintain a more precise abstraction of variables and objects [30, 35]. Variables get qualified with contexts, to distinguish their different incarnations. The analysis effectively computes a subset of Ctx × Var × Obj where Ctx is a set of contexts for variables.3 This precision is valuable for intermediate analysis computations, even though the final analysis results get collapsed in the easily-understandable Var ×Obj relation: distinguishing the behavior of a much-used variable or 2A second formulation is that of alias analysis, which computes the pairs of variables or expressions that can be aliased. For most published algorithms, computing points-to sets and computing alias sets are equivalent problems: one can be mapped to the other without affecting the algorithm’s fundamental precision or scalability. 3 For simplicity of exposition, we ignore context sensitivity for abstract objects (a.k.a., heap cloning) which also qualifies Obj with a set of heap contexts, HCtx, in much the same way as qualifying variables. 130
Scalability-First Pointer Analysis with Self-Tuning Context-Sensitivity ESEC/FSE'18.November 4-9,2018,Lake Buena Vista.FL,USA object (e.g.,in a library method)according to the context in which 3.1.1 Scalability-Critical Methods.The scalability of a context- it is used helps the precision at all use sites of the common code. sensitive pointer analysis is closely linked to the amount of points-to Two observations on context sensitivity will be important in information it produces,i.e.,the size ofrelation pt'Ctxx VarxObj. our subsequent discussion.First,contexts qualify variables but are The challenge that ScALER addresses is to closely upper bound the generally chosen per method.The main use of context sensitivity size of pt'without performing a context-sensitive analysis,given is to distinguish different activations of the same method(e.g. only the result,pt Var x Obj,of a context-insensitive analysis. "stack frames"for the same procedure,in traditional stack-based Prior approaches have made similar attempts to obtain scalability, languages),which create fresh instances of all local variables at run but without a concrete model to predict scalability effectively,as time.Therefore.all local variables of the same method have the explained in Section 7. same set of contexts By then distinguishing which parts of the program can contribute Second,the worst-case complexity of a context-sensitive pointer disproportionately to the total size of pt',SCALER can adjust its analysis is much higher than that of a context-insensitive one:the context sensitivity at different methods,to yield higher precision number of computed points-to facts,and hence the complexity of only when this does not endanger scalability. the analysis,increases in the worst case multiplicatively by the To achieve this,we introduce the concept of scalability-critical size of the set Ctx.However,in the common case,the precision of methods.These are methods that are likely to yield very large context sensitivity often compensates,reducing the analysis com- amounts of context-sensitive points-to information.Whether a plexity:different contexts divide the points-to sets of variables into method is scalability-critical depends on the context sensitivity in non-overlapping subsets.In the ideal case,if a context-insensitive question.For example,a method may be scalability-critical under analysis computes a points-to relation pt Var x Obj,a context- 2obj(a 2-object-sensitive analysis)but not under 2type(a 2-type- sensitive analysis would compute a relation pt'Ctxx Varx Obj sensitive analysis). that is not greater in cardinality than Var x Obj.The extra precision Definition 3.1 (Scalability-Critical Methods).A method m is of the Ctx information would be enough to split the original points- scalability-critical (or,for brevity,just critical)under context sensi- to sets into disjoint subsets.This observation also hints at why tivity c,if the value of the product #ctx#ptsm exceeds a given context sensitivity has unpredictable scalability:When it works scalability threshold STp,where well to maintain precision,its cost is modest to none.When it fails to do so,however,the cost is orders-of-magnitude higher. .#ctx is an estimate of the number of contexts for m if using The actual definitions of the set Ctx can vary widely.Three context-sensitivity variant c,computed using a fast context- main categories are call-site sensitivity,object sensitivity,and type insensitive analysis, sensitivity: #ptsm is the sum of the sizes of the points-to sets for all local variables in m,computed using the same context-insensitive Call-site sensitivity has Ctx be a sequence of call sites,i.e., analysis,and instructions that call the method in which the qualified variable STp is a value that can be given by users(Section 3.1.2)or has been declared. computed automatically(Section 3.2)for program p. .Object sensitivity has Ctx be a sequence of abstract objects:they are the receiver object of the method containing the qualified That is,we upper-bound the potential contribution of a method variable,the receiver object of the method that allocated the to the context-sensitive points-to set(pt'Ctxx Varx Obj)by com previous receiver object,etc. puting the number of potential combinations of possible abstract Type sensitivity keeps the same information as object sensitiv- objects(i.e.,a subset of Obj)and contexts (i.e.,a subset of Ctx)that ity,but objects used as contexts are collapsed not per allocation may pertain to the method.The individual numbers are guaranteed instruction but per class that contains it. to be upper bounds,since they are computed by a less precise(and, We refer the reader to surveys that discuss the options in full de- thus,conservatively over-approximate)context-insensitive analy- tail [30,351. sis.Accordingly,their product is guaranteed to be an upper bound of the number of potential combinations. 3 THE SCALER FRAMEWORK The intuition behind Definition 3.1 is that a method m may cause scalability problems for different reasons:(1)m is being analyzed in In this section,we present an overview of the SCALER approach. too many contexts,(2)too much points-to information is computed We first describe the idea of scalability thresholds,which is the within m,or(3)although the individual numbers of contexts and key to predictable scalability (Section 3.1).Next,we explain how points-to facts for m are not very large,their product is.For this SCALER automatically chooses a suitable scalability threshold for reason,the product of the estimates #ctx and #ptsm gives an each program,based on a single parameter called the total scalability indication of potential scalability problems. threshold that depends only on the available space for storing points- It makes sense to perform this reasoning at the method level, to information(Section 3.2).These ideas are then collected in the since(as discussed in Section 2)decisions on context sensitivity are overall SCALER framework(Section 3.3). made per-method,i.e.,for all local variables of a method together. The number #ptsm can be obtained directly from the context- 3.1 Scalability Thresholds insensitive points-to relation:#ptsm=vemIpt(v,).Comput- We begin by introducing the concept of scalability-critical meth- ing #ctx is less straightforward.This is one of the technical ods(Section 3.1.1),and we then leverage this concept to address novelties of the approach,and we postpone its discussion until unscalability (Section 3.1.2). Section 4. 131
Scalability-First Pointer Analysis with Self-Tuning Context-Sensitivity ESEC/FSE ’18, November 4–9, 2018, Lake Buena Vista, FL, USA object (e.g., in a library method) according to the context in which it is used helps the precision at all use sites of the common code. Two observations on context sensitivity will be important in our subsequent discussion. First, contexts qualify variables but are generally chosen per method. The main use of context sensitivity is to distinguish different activations of the same method (e.g., łstack framesž for the same procedure, in traditional stack-based languages), which create fresh instances of all local variables at run time. Therefore, all local variables of the same method have the same set of contexts. Second, the worst-case complexity of a context-sensitive pointer analysis is much higher than that of a context-insensitive one: the number of computed points-to facts, and hence the complexity of the analysis, increases in the worst case multiplicatively by the size of the set Ctx. However, in the common case, the precision of context sensitivity often compensates, reducing the analysis complexity: different contexts divide the points-to sets of variables into non-overlapping subsets. In the ideal case, if a context-insensitive analysis computes a points-to relation pt ⊆ Var × Obj, a contextsensitive analysis would compute a relation pt′ ⊆ Ctx × Var × Obj that is not greater in cardinality than Var ×Obj. The extra precision of the Ctx information would be enough to split the original pointsto sets into disjoint subsets. This observation also hints at why context sensitivity has unpredictable scalability: When it works well to maintain precision, its cost is modest to none. When it fails to do so, however, the cost is orders-of-magnitude higher. The actual definitions of the set Ctx can vary widely. Three main categories are call-site sensitivity, object sensitivity, and type sensitivity: • Call-site sensitivity has Ctx be a sequence of call sites, i.e., instructions that call the method in which the qualified variable has been declared. • Object sensitivity has Ctx be a sequence of abstract objects: they are the receiver object of the method containing the qualified variable, the receiver object of the method that allocated the previous receiver object, etc. • Type sensitivity keeps the same information as object sensitivity, but objects used as contexts are collapsed not per allocation instruction but per class that contains it. We refer the reader to surveys that discuss the options in full detail [30, 35]. 3 THE SCALER FRAMEWORK In this section, we present an overview of the Scaler approach. We first describe the idea of scalability thresholds, which is the key to predictable scalability (Section 3.1). Next, we explain how Scaler automatically chooses a suitable scalability threshold for each program, based on a single parameter called the total scalability threshold that depends only on the available space for storing pointsto information (Section 3.2). These ideas are then collected in the overall Scaler framework (Section 3.3). 3.1 Scalability Thresholds We begin by introducing the concept of scalability-critical methods (Section 3.1.1), and we then leverage this concept to address unscalability (Section 3.1.2). 3.1.1 Scalability-Critical Methods. The scalability of a contextsensitive pointer analysis is closely linked to the amount of points-to information it produces, i.e., the size of relation pt′ ⊆ Ctx×Var×Obj. The challenge that Scaler addresses is to closely upper bound the size of pt′ without performing a context-sensitive analysis, given only the result, pt ⊆ Var × Obj, of a context-insensitive analysis. Prior approaches have made similar attempts to obtain scalability, but without a concrete model to predict scalability effectively, as explained in Section 7. By then distinguishing which parts of the program can contribute disproportionately to the total size of pt′ , Scaler can adjust its context sensitivity at different methods, to yield higher precision only when this does not endanger scalability. To achieve this, we introduce the concept of scalability-critical methods. These are methods that are likely to yield very large amounts of context-sensitive points-to information. Whether a method is scalability-critical depends on the context sensitivity in question. For example, a method may be scalability-critical under 2obj (a 2-object-sensitive analysis) but not under 2type (a 2-typesensitive analysis). Definition 3.1 (Scalability-Critical Methods). A method m is scalability-critical (or, for brevity, just critical) under context sensitivity c, if the value of the product #ctxc m·#ptsm exceeds a given scalability threshold STp , where • #ctxc m is an estimate of the number of contexts for m if using context-sensitivity variant c, computed using a fast contextinsensitive analysis, • #ptsm is the sum of the sizes of the points-to sets for all local variables in m, computed using the same context-insensitive analysis, and • STp is a value that can be given by users (Section 3.1.2) or computed automatically (Section 3.2) for program p. That is, we upper-bound the potential contribution of a method to the context-sensitive points-to set (pt′ ⊆ Ctx×Var×Obj) by computing the number of potential combinations of possible abstract objects (i.e., a subset of Obj) and contexts (i.e., a subset of Ctx) that may pertain to the method. The individual numbers are guaranteed to be upper bounds, since they are computed by a less precise (and, thus, conservatively over-approximate) context-insensitive analysis. Accordingly, their product is guaranteed to be an upper bound of the number of potential combinations. The intuition behind Definition 3.1 is that a methodm may cause scalability problems for different reasons: (1) m is being analyzed in too many contexts, (2) too much points-to information is computed within m, or (3) although the individual numbers of contexts and points-to facts for m are not very large, their product is. For this reason, the product of the estimates #ctxc m and #ptsm gives an indication of potential scalability problems. It makes sense to perform this reasoning at the method level, since (as discussed in Section 2) decisions on context sensitivity are made per-method, i.e., for all local variables of a method together. The number #ptsm can be obtained directly from the contextinsensitive points-to relation: #ptsm = Í v ∈m |pt(v, _)|. Computing #ctxc m is less straightforward. This is one of the technical novelties of the approach, and we postpone its discussion until Section 4. 131
ESEC/FSE'18,November 4-9,2018,Lake Buena Vista,FL,USA Yue Li,Tian Tan,Anders Moller,and Yannis Smaragdakis #ctx#ptsm c=2obj ∑m#ctx号,#ptsn 1000000 -c=2type ctx1,p1→obj1 c=1type ct2,p2→obj4 cbx3,p3-→obj9 Program 2 100000 #ct乐·#pts STp ctxk,Pm→ob TST is Memor 10000 Related Figure 3:Context-sensitive points-to information and TST. 0 points-to relation exhausts the available RAM.The scalability of method methodaooo methodaooo 卡1type2hype中 2obj meth pointer analysis is therefore related to the features of the analysis environment,most importantly the memory size. Figure 2:Choosing a context-sensitivity strategy c for the We now show how SCALER determines a scalability threshold for methods of program p based on a scalability threshold,STp. a given program,based on the concept of a total scalability threshold that represents the analysis capacity and can be selected based on, 3.1.2 Choosing Context Sensitivity Strategies.Given a scalability for example,the available memory,independently of the program threshold STp,ScALER classifies each method m as scalability-critical being analyzed. or not,relative to a context sensitivity variant.It then ensures scal- ability for critical methods by selecting a cheaper (less precise) Definition 3.2 (Total Scalability Threshold).A total scalability context sensitivity variant for them.As a result,different methods threshold(TST)is a number that satisfies the inequality E(STp)< will be analyzed with different context sensitivity variants. TST where E(STp)is the estimated cost for the given value of STp. Assume there is a set of context sensitivity variants computed as the sum of the sizes of the points-to relations(cf.Defi- nition 3.1)for all the methods in the program: C={c1,c2,,cn} where ci is typically more precise(but less efficient)than ci-1.For 8(STp)=∑m#ctxm leeC(,sTp).ptSm example,this set could be C={1type,2type,2obj). Figure 2 illustrates(imaginary)distributions of #ctx#ptsm Figure 3 illustrates the concept.The complexity of a pointer values (log-scale y-axis)for a 10 000-method program,with each analysis(running on a given program)is closely related to the size method m(x-axis)ranked by decreasing value of #ctxm#ptsm of the points-to information,which is upper-bounded by E(STp). under different context sensitivity variants c e C.Assume a suitable The TST of the environment can be seen as analogous to the volume STp value has been chosen(we explain in Section 3.2 how this can of a container.The volume of the points-to information computed be done).According to Definition 3.1,the first 4000 methods are by the analysis has to fit in the TST volume for the analysis to scalability-critical under 2obj,and the first 2 000 are scalability- scale,i.e.,E(STp)<TST.This effectively normalizes the analysis critical also under 2type.All 10 000 methods are non-scalability- capacity for all programs,regardless of the number of methods they critical under 1 type. contain,or the variants of context sensitivity employed.We discuss For each method m,SCALER selects the most precise context- in Section 6 the actual TST values used in our experiments. sensitivity variant ci E C for which m is not scalability-critical, Importantly the TSTinequality,depends and context-insensitivity(Cl)if none exists.That is,method m is on the choice of context sensitivity,which can vary based on the analyzed with context-sensitivity SelectCtx(m,STp): approach described in Section 3.1.The self tuning approach of Cn if m is non-critical under cn for STp ScALER consists of computing.given the TST,an appropriate value of STp for a program p that will be used to assign appropriate SelectCtx(m,STp)= Ck if 3ck:m is non-critical under ck A m is critical under ck+for STp context-sensitivity variants to p's methods. CI otherwise To maximize precision while ensuring scalability,SCALER needs to pick the largest STp that satisfies the TST inequality.In this way. For example,in Figure 2,methods 1 to 1 999 will be analyzed as many methods as possible are analyzed with the most precise using 1type,methods 2000 to 3 999 using 2type,and methods 4000 context sensitivity that the total analysis capacity can afford.That to 10000 using 2obj. is,SCALER needs to pick max{STp E(STp)s TST}. The remaining issue,which we address in the following section, Figure 4 illustrates the mechanism using the distribution of Fig- is how to choose an appropriate scalability threshold for a given ure 2.This time,STp is not given by the user in advance.Instead, program to be analyzed. its value is computed to satisfy the inequality shown in Figure 4. For a candidate STp value,each method is classified as detailed in 3.2 Total Scalability Thresholds Section 3.1.2:it is assigned the most precise context sensitivity that As discussed earlier,the overall cost of a context-sensitive pointer keeps it from being scalability-critical. analysis is closely linked to the size of the points-to relation being In the case of Figure 4,the value of E(STp)for this program computed.An analysis that fails to terminate within realistic time corresponds to the sum of the areas of A1,A2,and A3.If A1 +A2 limits often does so because the space needed to represent the +A3 under the candidate STp is below TST,then STp is viable,but 132
ESEC/FSE ’18, November 4–9, 2018, Lake Buena Vista, FL, USA Yue Li, Tian Tan, Anders Møller, and Yannis Smaragdakis 2type 2obj 0 10 000 ST method 1 method 2000 method 4000 method 10 000 1type 2type 1type 1 000 000 2obj 100 000 p #ctx c m·#pts m c = c = c = Figure 2: Choosing a context-sensitivity strategy c for the methods of program p based on a scalability threshold, STp . 3.1.2 Choosing Context Sensitivity Strategies. Given a scalability threshold STp , Scaler classifies each methodm as scalability-critical or not, relative to a context sensitivity variant. It then ensures scalability for critical methods by selecting a cheaper (less precise) context sensitivity variant for them. As a result, different methods will be analyzed with different context sensitivity variants. Assume there is a set of context sensitivity variants C = { c1, c2, ..., cn } where ci is typically more precise (but less efficient) than ci−1. For example, this set could be C = {1type, 2type, 2obj}. Figure 2 illustrates (imaginary) distributions of #ctxc m·#ptsm values (log-scale y-axis) for a 10 000-method program, with each method m (x-axis) ranked by decreasing value of #ctxc m·#ptsm, under different context sensitivity variantsc ∈ C. Assume a suitable STp value has been chosen (we explain in Section 3.2 how this can be done). According to Definition 3.1, the first 4 000 methods are scalability-critical under 2obj, and the first 2 000 are scalabilitycritical also under 2type. All 10 000 methods are non-scalabilitycritical under 1type. For each method m, Scaler selects the most precise contextsensitivity variant ci ∈ C for which m is not scalability-critical, and context-insensitivity (CI) if none exists. That is, method m is analyzed with context-sensitivity SelectCtx(m, STp ): SelectCtx(m, STp ) = cn if m is non-critical under cn for STp ck if ∃ck : m is non-critical under ck ∧ m is critical under ck+1 for STp CI otherwise For example, in Figure 2, methods 1 to 1 999 will be analyzed using 1type, methods 2 000 to 3 999 using 2type, and methods 4 000 to 10 000 using 2obj. The remaining issue, which we address in the following section, is how to choose an appropriate scalability threshold for a given program to be analyzed. 3.2 Total Scalability Thresholds As discussed earlier, the overall cost of a context-sensitive pointer analysis is closely linked to the size of the points-to relation being computed. An analysis that fails to terminate within realistic time limits often does so because the space needed to represent the TST is Memory- TST Related ctx1, p1 obj1 ctx2, p2 obj4 ctx3, p3 obj9 ctx , p obj k m n … … … … … … … … Program 1 Program 2 #ctx c Σm m·#pts m #ctx c Σm m·#pts m #ctx c Σm m·#pts m Figure 3: Context-sensitive points-to information and TST. points-to relation exhausts the available RAM. The scalability of pointer analysis is therefore related to the features of the analysis environment, most importantly the memory size. We now show how Scaler determines a scalability threshold for a given program, based on the concept of a total scalability threshold that represents the analysis capacity and can be selected based on, for example, the available memory, independently of the program being analyzed. Definition 3.2 (Total Scalability Threshold). A total scalability threshold (TST) is a number that satisfies the inequality E(STp ) ≤ TST where E(STp ) is the estimated cost for the given value of STp , computed as the sum of the sizes of the points-to relations (cf. Definition 3.1) for all the methods in the program: E(STp ) = Í m #ctx SelectCtx(m,STp ) m · #ptsm Figure 3 illustrates the concept. The complexity of a pointer analysis (running on a given program) is closely related to the size of the points-to information, which is upper-bounded by E(STp ). The TST of the environment can be seen as analogous to the volume of a container. The volume of the points-to information computed by the analysis has to fit in the TST volume for the analysis to scale, i.e., E(STp ) ≤ TST. This effectively normalizes the analysis capacity for all programs, regardless of the number of methods they contain, or the variants of context sensitivity employed. We discuss in Section 6 the actual TST values used in our experiments. Importantly, in the TST inequality, #ctx SelectCtx(m,STp ) m depends on the choice of context sensitivity, which can vary based on the approach described in Section 3.1. The self tuning approach of Scaler consists of computing, given the TST, an appropriate value of STp for a program p that will be used to assign appropriate context-sensitivity variants to p’s methods. To maximize precision while ensuring scalability, Scaler needs to pick the largest STp that satisfies the TST inequality. In this way, as many methods as possible are analyzed with the most precise context sensitivity that the total analysis capacity can afford. That is, Scaler needs to pick max{STp | E(STp ) ≤ TST}. Figure 4 illustrates the mechanism using the distribution of Figure 2. This time, STp is not given by the user in advance. Instead, its value is computed to satisfy the inequality shown in Figure 4. For a candidate STp value, each method is classified as detailed in Section 3.1.2: it is assigned the most precise context sensitivity that keeps it from being scalability-critical. In the case of Figure 4, the value of E(STp ) for this program corresponds to the sum of the areas of A1, A2, and A3. If A1 + A2 + A3 under the candidate STp is below TST, then STp is viable, but 132
Scalability-First Pointer Analysis with Self-Tuning Context-Sensitivity ESEC/FSE'18.November 4-9,2018,Lake Buena Vista.FL,USA #ctxC#ptsm 1000000 -c=2obj SCALER performs this computation by leveraging the object al- c=2type location graph (OAG)structure proposed by Tan et al.[37].With c=1type the OAG,the context computation problem can be formulated as a graph traversal problem.For any program,based on an OAG 100000 derived from pre-analysis(context-insensitive pointer analysis), E(STp)=A1 +A2+A3s TST SCALER computes the number of contexts(#ctx)of every method for each kind of context sensitivity c used(2obj,2type and 1type STp STp is automatically computed in our setup)by enumerating all contexts of the method. 10000 based on the above inequality The OAG of a program is a directed graph.A node of the OAG represents an abstract object,which is identified by its allocation A1 A2 A3 site in the program.An edge of the OAG,say o-02,represents 0 method,methodaooo methodaooo methodsoooo an object-allocation relation between o1 and 02,i.e.,o1 is a receiver object of the method that contains the allocation site of o2.SCALER Figure 4:Selecting the scalability threshold(STp)based on leverages the pre-analysis to build the OAG for the given program the total scalability threshold (TST). The OAG provides a graphical perspective of object-(and type-) sensitivity,i.e.,a k-depth context in object-sensitivity corresponds to higher STp values may also be viable.The maximum such STp a k-node path in the OAG [37].Thus,to compute k-object-sensitive is computed using binary search in a range between 0 and the contexts of a method m,SCALER simply enumerates k-node paths maximum value of #ctxm selectCx(m.ST)sm for all m. in the OAG,leading to the receiver objects of m. Figure 6 illustrates the mechanism with a simple example.The allocation sites are labeled B1.B2,and C1,respectively.Suppose we 3.3 Overall Approach compute 2obj contexts for method m()(line 12)and its receiver Figure 5 shows the overall structure of the ScALER framework. object is C1(allocated at line 16).Further,B1 and B2 are two allocator SCALER is a stand-alone system that automatically selects a per- objects of C1 according to k-object-sensitivity [25,32].The possible method context-sensitivity variant that can subsequently be used (2obj)contexts of m()are [B1,C1]and [B2,C1].The corresponding to configure an existing pointer analysis tool like DooP,WALA,or OAG is given in Figure 6,which shows two 2-node paths that CHORD. are exactly the 2obj contexts for m().Since a context-insensitive Given a program p,a fast (but imprecise)context-insensitive analysis over-approximates the fully precise OAG,some edges may pointer analysis is run first to produce some basic points-to infor- be spurious.However,this is fine,since we only need an upper mation that is used for estimating #ptsm(Section 3.1.1)and #cx bound of the number of possible contexts,to establish scalability. (Section 4).The core of SCALER consists of one component that Type sensitivity is an isomorphic approximation of object sensi- decides the scalability threshold based on a given total scalability tivity [30],thus we can easily derive the contexts for type sensitivity threshold(Section 3.2)and another component that chooses context with the OAG.Following the definition [32],SCALER computes the sensitivity strategies for all methods in the program (Section 3.1.2). contexts for type sensitivity by merely replacing objects in contexts The framework relies on the collection C of context-sensitivity (as computed from the OAG)by the types that contain the alloca- variants.All that is needed for each variant is a mechanism for tion sites of the objects.For example,to compute the contexts for estimating the number of contexts,as presented in Section 4 for method m()(Figure 6)under 2type,SCALER first obtains the 2obj object-sensitivity and type-sensitivity. contexts,i.e.,[B1,C1]and [B2,C1],then replaces B1 and B2 by type A,and C1 by type B.As a result,there is only one context of m() 4 ESTIMATING THE NUMBER OF CONTEXTS under 2type,i.e.,[A,B]. In Section 3.1 we postponed the discussion of how to compute an 5 IMPLEMENTATION upper bound on the number of contexts(#ctx),for a given variant We have implemented SCALER as a stand-alone open-source tool in of context sensitivity c,using only context-insensitive analysis results.This computation is one of the key elements of SCALER and Java,available at http://www.brics.dk/scaler/.SCALER is designed to distinguishes it from prior approaches that have also tried to adapt work with various pointer analysis frameworks such as Doop [4], context sensitivity on a per-method basis [14,33,41]. WALA [7],CHORD [26],and SooT [40].To demonstrate its effective- The computation of the possible contexts for the context-sensitive ness,we have integrated ScALER with Doop [4],a state-of-the-art pointer analysis framework for Java analysis of a method,from only context-insensitive analysis results. is relatively easy for simple variants of context sensitivity,such as In practice,we found that the #ctxfn#ptsm values of some Java call-site sensitivity [28]:the possible contexts are call sites,readily collection methods under package java.util.are very large,be- identifiable in the program code.This computation is nontrivial cause(1)the collection methods are frequently called,thus their for object-and type-sensitivity[24,32],however:the contexts of a #ctx values are large,and (2)many objects are passed to the method are abstract objects,determined by the analysis mechanism. AWe do not consider call-site sensitive analyses(1call,2call)or 1-object sensitivity We are not aware of any prior work that performs a similar com- (1obj)nour evaluation.as these analysis variants are typically both less precise and putation of possible contexts for object-sensitive or type-sensitive less scalable than at least one of the analyses in our setup.The SCALER approach can be analyses,without running the analyses themselves. adapted to any collection of context sensitivity variants,as long as a relative ordering in both increasing precision and cost is possible. 133
Scalability-First Pointer Analysis with Self-Tuning Context-Sensitivity ESEC/FSE ’18, November 4–9, 2018, Lake Buena Vista, FL, USA method A1 A2 A3 A1 + A2 + A3 ≤ TST is automatically computed based on the above inequality 1 method 2000 method 4000 method 10 000 0 10 000 1 000 000 100 000 STp #ctx c m·#pts m STp 2type 1type c 2obj = c = c = E(STp) = Figure 4: Selecting the scalability threshold (STp ) based on the total scalability threshold (TST). higher STp values may also be viable. The maximum such STp is computed using binary search in a range between 0 and the maximum value of #ctx SelectCtx(m,STp ) m · #ptsm for all m. 3.3 Overall Approach Figure 5 shows the overall structure of the Scaler framework. Scaler is a stand-alone system that automatically selects a permethod context-sensitivity variant that can subsequently be used to configure an existing pointer analysis tool like Doop, Wala, or Chord. Given a program p, a fast (but imprecise) context-insensitive pointer analysis is run first to produce some basic points-to information that is used for estimating #ptsm (Section 3.1.1) and #ctxc m (Section 4). The core of Scaler consists of one component that decides the scalability threshold based on a given total scalability threshold (Section 3.2) and another component that chooses context sensitivity strategies for all methods in the program (Section 3.1.2). The framework relies on the collection C of context-sensitivity variants. All that is needed for each variant is a mechanism for estimating the number of contexts, as presented in Section 4 for object-sensitivity and type-sensitivity. 4 ESTIMATING THE NUMBER OF CONTEXTS In Section 3.1 we postponed the discussion of how to compute an upper bound on the number of contexts (#ctxc m), for a given variant of context sensitivity c, using only context-insensitive analysis results. This computation is one of the key elements of Scaler and distinguishes it from prior approaches that have also tried to adapt context sensitivity on a per-method basis [14, 33, 41]. The computation of the possible contexts for the context-sensitive analysis of a method, from only context-insensitive analysis results, is relatively easy for simple variants of context sensitivity, such as call-site sensitivity [28]: the possible contexts are call sites, readily identifiable in the program code. This computation is nontrivial for object- and type-sensitivity[24, 32], however: the contexts of a method are abstract objects, determined by the analysis mechanism. We are not aware of any prior work that performs a similar computation of possible contexts for object-sensitive or type-sensitive analyses, without running the analyses themselves. Scaler performs this computation by leveraging the object allocation graph (OAG) structure proposed by Tan et al. [37]. With the OAG, the context computation problem can be formulated as a graph traversal problem. For any program, based on an OAG derived from pre-analysis (context-insensitive pointer analysis), Scaler computes the number of contexts (#ctxc m) of every method for each kind of context sensitivity c used (2obj, 2type and 1type in our setup) by enumerating all contexts of the method.4 The OAG of a program is a directed graph. A node of the OAG represents an abstract object, which is identified by its allocation site in the program. An edge of the OAG, say o1 → o2, represents an object-allocation relation between o1 and o2, i.e., o1 is a receiver object of the method that contains the allocation site of o2. Scaler leverages the pre-analysis to build the OAG for the given program. The OAG provides a graphical perspective of object- (and type-) sensitivity, i.e., a k-depth context in object-sensitivity corresponds to a k-node path in the OAG [37]. Thus, to compute k-object-sensitive contexts of a method m, Scaler simply enumerates k-node paths in the OAG, leading to the receiver objects of m. Figure 6 illustrates the mechanism with a simple example. The allocation sites are labeled B1, B2, and C1, respectively. Suppose we compute 2obj contexts for method m() (line 12) and its receiver object is C1 (allocated at line 16). Further, B1 and B2 are two allocator objects of C1 according to k-object-sensitivity [25, 32]. The possible (2obj) contexts of m() are [B1,C1] and [B2,C1]. The corresponding OAG is given in Figure 6, which shows two 2-node paths that are exactly the 2obj contexts for m(). Since a context-insensitive analysis over-approximates the fully precise OAG, some edges may be spurious. However, this is fine, since we only need an upper bound of the number of possible contexts, to establish scalability. Type sensitivity is an isomorphic approximation of object sensitivity [30], thus we can easily derive the contexts for type sensitivity with the OAG. Following the definition [32], Scaler computes the contexts for type sensitivity by merely replacing objects in contexts (as computed from the OAG) by the types that contain the allocation sites of the objects. For example, to compute the contexts for method m() (Figure 6) under 2type, Scaler first obtains the 2obj contexts, i.e., [B1,C1] and [B2,C1], then replaces B1 and B2 by type A, and C1 by type B. As a result, there is only one context of m() under 2type, i.e., [A,B]. 5 IMPLEMENTATION We have implemented Scaler as a stand-alone open-source tool in Java, available at http://www.brics.dk/scaler/. Scaler is designed to work with various pointer analysis frameworks such as Doop [4], Wala [7], Chord [26], and Soot [40]. To demonstrate its effectiveness, we have integrated Scaler with Doop [4], a state-of-the-art pointer analysis framework for Java. In practice, we found that the #ctxc m·#ptsm values of some Java collection methods under package java.util.* are very large, because (1) the collection methods are frequently called, thus their #ctxc m values are large, and (2) many objects are passed to the 4We do not consider call-site sensitive analyses (1call, 2call) or 1-object sensitivity (1obj) in our evaluation, as these analysis variants are typically both less precise and less scalable than at least one of the analyses in our setup. The Scaler approach can be adapted to any collection of context sensitivity variants, as long as a relative ordering in both increasing precision and cost is possible. 133
ESEC/FSE'18.November 4-9,2018,Lake Buena Vista,FL,USA Yue Li,Tian Tan,Anders Moller,and Yannis Smaragdakis TST:( SCALER Selected Program P Optional context TST-Based STp. ST-Based sensitivity Scalable Precise Points-to Pre-Analysis ST Selection Context Sensitivity for every Pointer Analysis information method of p Figure 5:Overview of the SCALER framework. 1 class A{ 14c1a55B{ costs thousands or hundreds of seconds;2 programs (luindex and void foo(){ 15 void bar(){ B b1 new B()j//B1 1 cc new c();//C1 lusearch)represent simple programs for which 2obj costs only b1.bar(); 17 c.m(); dozens of seconds.These programs are all analyzed with a large 18 Java library:Open JDK 1.6.0_24. 6 void goo(){ 19J 7 B b2 new B();//B2 SCALER uses a default TST value of 30M(million)which exhibits 8 b2.bar(); uniform scalability on all machines in our experimental setup.To 9 18 answer RQ1-RQ3(Section 6.1-6.3),we run the experiments on our 11c1a55C{ default machine with a Xeon E5-2697A 2.6GHz CPU and 48GB of 12 void m(){...} RAM,with the default TST(30M).To answer RQ4,different TST 13 values are considered,and the experiments are also run on machines with smaller and larger memory sizes(Section 6.4). Figure 6:An example illustrating #ctx computation. collection methods,thus their #ptsm values are large.This would 6.1 Scalability and Precision of SCALER-Guided affect STp-based context sensitivity,which may make SCALER pick Pointer Analysis less precise context sensitivity for these methods.However,collec- In this section,we examine the scalability and precision of SCALER- tion methods are important to the precision of pointer analysis,thus guided pointer analysis(SCALER for short)by comparing it with should be analyzed as precisely as possible by ScALER.To address conventional context-sensitive pointer analyses for Java:object- this problem,SCALER treats methods under package java.util. sensitivity [25]and type-sensitivity [32],which are the mainstream specially,explicitly assigning them to be analyzed by the most variants adopted in recent analysis clients.For example,object- precise context sensitivity(i.e.,2obj in our settings). sensitivity is widely adopted in recent Android static analysis frame works [1,9]and type-sensitivity is used in recent reflection analy- 6 EVALUATION sis [20]and security analysis [10]tools. In the evaluation,we investigate the following research questions: Table I shows the results for all analyses.Each program has five rows of data,respectively representing context-insensitive RQ1.Does SCALER consistently achieve scalability,while matching (CI),conventional context-sensitive,SCALER,and two introspective or exceeding the precision of the most precise conventional (IntroA and IntroB)pointer analyses.(The last two analyses will be pointer analyses [25,32]that use the same variant of context- discussed in Section 6.2.)For conventional context-sensitive pointer sensitivity for the entire program? analyses,we consider 2obj,2type,and 1type.Call-site-sensitivity RQ2.How does SCALER fare against introspective analysis [33]that and lobj are not considered as the former is not effective for Java also applies context-sensitivity selectively for the different programs [15,17,37]and the latter is usually both less efficient methods of the program? and less precise than 2type [15,32].As it is virtually impossible to RQ3.What is the overhead of running SCALER?What are the com- predict the scalability of a precise context-sensitive pointer analysis puted values of STp and what are the resulting distributions in advance,to get the most precise conventional analysis results we of SelectCtx in practice? run 2obj,2type,and 1type in the given order(from the most to RQ4.How does SCALER perform with different TST values and the least precise one)until one of them terminates within 3 hours. different memory sizes? So in Table 1,the second row for each program shows the results Experimental Setup.All pointer analyses were performed using of the most precise conventional pointer analysis that is scalable. Doop [29](with the version published in the artifact of [33],which 6.1.1 Scalability of SCALER-Guided Pointer Analysis.In Table 1,the contains the exact setup for different analyses).The time budget third column for each program demonstrates ScALER's extremely for all analyses is 3 hours(10 800 seconds).To demonstrate the good scalability.2obj is not scalable for the first six programs,and generality of SCALER's scalability and precision,we consider 10 even the fast 2type also fails to scale for jython and soot.How- real-world Java programs that cover different levels of program ever,SCALER scales for all of them,with its one-shot principle.In complexity.Five of these programs come from the DaCapo 2006 addition,for the first seven complex and large programs,SCALER benchmark suite and include the toughest-to-analyze programs runs (sometimes significantly)faster than the most precise con- (jython,eclipse).The rest are well-known open-source applica- ventional pointer analysis that is scalable,even if we add SCALER's tions.Concretely,as shown in Table 1,6 programs(jython,soot. pre-analysis time(CI). pmd,briss,jedit and eclipse)represent complex applications for For example,according to past literature and extensive experi- which 2obj is not scalable within 3 hours;2 programs (findbugs ence [15,32,37,38],jython is considered the most troublesome and chart)represent medium-complexity programs for which 2obj program in terms of scalability,among the DaCapo benchmarks [3]. 134
ESEC/FSE ’18, November 4–9, 2018, Lake Buena Vista, FL, USA Yue Li, Tian Tan, Anders Møller, and Yannis Smaragdakis TST-Based ST Selection ST-Based Context Sensitivity STp Selected context sensitivity for every method of P Scalable & Precise Pre-Analysis Pointer Analysis Points-to information SCALER TST Optional Program P Figure 5: Overview of the Scaler framework. 1 class A { 2 void foo() { 3 B b1 = new B();//B1 4 b1.bar(); 5 } 6 void goo() { 7 B b2 = new B();//B2 8 b2.bar(); 9 } 10 } 11 class C { 12 void m() {...} 13 } 14 class B { 15 void bar() { 16 C c = new C();//C1 17 c.m(); 18 } 19 } B1 B2 C1 Figure 6: An example illustrating #ctxc m computation. collection methods, thus their #ptsm values are large. This would affect STp -based context sensitivity, which may make Scaler pick less precise context sensitivity for these methods. However, collection methods are important to the precision of pointer analysis, thus should be analyzed as precisely as possible by Scaler. To address this problem, Scaler treats methods under package java.util.* specially, explicitly assigning them to be analyzed by the most precise context sensitivity (i.e., 2obj in our settings). 6 EVALUATION In the evaluation, we investigate the following research questions: RQ1. Does Scaler consistently achieve scalability, while matching or exceeding the precision of the most precise conventional pointer analyses [25, 32] that use the same variant of contextsensitivity for the entire program? RQ2. How does Scaler fare against introspective analysis [33] that also applies context-sensitivity selectively for the different methods of the program? RQ3. What is the overhead of running Scaler? What are the computed values of STp and what are the resulting distributions of SelectCtx in practice? RQ4. How does Scaler perform with different TST values and different memory sizes? Experimental Setup. All pointer analyses were performed using Doop [29] (with the version published in the artifact of [33], which contains the exact setup for different analyses). The time budget for all analyses is 3 hours (10 800 seconds). To demonstrate the generality of Scaler’s scalability and precision, we consider 10 real-world Java programs that cover different levels of program complexity. Five of these programs come from the DaCapo 2006 benchmark suite and include the toughest-to-analyze programs (jython, eclipse). The rest are well-known open-source applications. Concretely, as shown in Table 1, 6 programs (jython, soot, pmd, briss, jedit and eclipse) represent complex applications for which 2obj is not scalable within 3 hours; 2 programs (findbugs and chart) represent medium-complexity programs for which 2obj costs thousands or hundreds of seconds; 2 programs (luindex and lusearch) represent simple programs for which 2obj costs only dozens of seconds. These programs are all analyzed with a large Java library: Open JDK 1.6.0_24. Scaler uses a default TST value of 30M (million) which exhibits uniform scalability on all machines in our experimental setup. To answer RQ1śRQ3 (Section 6.1ś6.3), we run the experiments on our default machine with a Xeon E5-2697A 2.6GHz CPU and 48GB of RAM, with the default TST (30M). To answer RQ4, different TST values are considered, and the experiments are also run on machines with smaller and larger memory sizes (Section 6.4). 6.1 Scalability and Precision of Scaler-Guided Pointer Analysis In this section, we examine the scalability and precision of Scalerguided pointer analysis (Scaler for short) by comparing it with conventional context-sensitive pointer analyses for Java: objectsensitivity [25] and type-sensitivity [32], which are the mainstream variants adopted in recent analysis clients. For example, objectsensitivity is widely adopted in recent Android static analysis frameworks [1, 9] and type-sensitivity is used in recent reflection analysis [20] and security analysis [10] tools. Table 1 shows the results for all analyses. Each program has five rows of data, respectively representing context-insensitive (CI), conventional context-sensitive, Scaler, and two introspective (IntroA and IntroB) pointer analyses. (The last two analyses will be discussed in Section 6.2.) For conventional context-sensitive pointer analyses, we consider 2obj, 2type, and 1type. Call-site-sensitivity and 1obj are not considered as the former is not effective for Java programs [15, 17, 37] and the latter is usually both less efficient and less precise than 2type [15, 32]. As it is virtually impossible to predict the scalability of a precise context-sensitive pointer analysis in advance, to get the most precise conventional analysis results we run 2obj, 2type, and 1type in the given order (from the most to the least precise one) until one of them terminates within 3 hours. So in Table 1, the second row for each program shows the results of the most precise conventional pointer analysis that is scalable. 6.1.1 Scalability of Scaler-Guided Pointer Analysis. In Table 1, the third column for each program demonstrates Scaler’s extremely good scalability. 2obj is not scalable for the first six programs, and even the fast 2type also fails to scale for jython and soot. However, Scaler scales for all of them, with its one-shot principle. In addition, for the first seven complex and large programs, Scaler runs (sometimes significantly) faster than the most precise conventional pointer analysis that is scalable, even if we add Scaler’s pre-analysis time (CI). For example, according to past literature and extensive experience [15, 32, 37, 38], jython is considered the most troublesome program in terms of scalability, among the DaCapo benchmarks [3]. 134
Scalability-First Pointer Analysis with Self-Tuning Context-Sensitivity ESEC/FSE'18.November 4-9,2018,Lake Buena Vista.FL,USA Table 1:Efficiency and precision metrics for all programs and pointer analyses.In all cases,lower is better. Time Precision metrics Time Precision metrics Progran Analysis (seconds) #may-fail #poly #reach #call Program Analysis (seconds) #may-fail #poly #reach #call 3h=10800s casts calls methods edges 3h=10800s casts calls methods edges CI 112 2234 2778 12718 114856 CI 135 4190 9197 20862161222 2obj+2type→1typd 3h+>3h+1997 2117 2577 12430111834 2obj2type ≥3h+960 3401 8542 20314145210 jython SCALER 452 1852 250012167 107410 eclipse SCALER 652 3211 848620374145953 IntroA 381 2202 2632 12663 114095 IntroA 1369 4175 888920853 160139 IntroB 23h IntroB 528 3640 853920491 149980 CI 994 16570 16532 32459 415476 CI 49 2508 292513036 77370 2obj+2type1type >3h+3h+3812 16212 15511 32308 386840 2458 1409 2182 12657 65836 soot SCALER 1236 10549 14822 31982 374877 findbugs SCALER 272 1452 2195 12676 66177 IntroA 1295 16503 15947 32390 413083 IntroA 188 2271 2422 12960 73681 IntroB 4850 15474 14895 32222 319431 IntroB 397 2024 2372 12882 70725 CI 67 2948 4183 15254 104457 CI 48 1810 1852 12064 63453 2obj+2type >3h+1203 2317 3577 14863 92885 2obj 285 883 1378 11330 52374 pmd SCALER 705 2176 3536 14895 92775 chart SCALER 254 976 1402 11530 53198 IntroA 356 2820 3823 15117 101762 IntroA 128 1580 1613 11952 61323 IntroB 2986 2524 3694 15006 96565 IntroB 189 1236 1497 11518 55594 CI 228 4904 6297 26582176785 CI 22 734 940 6670 33130 2obj+2type 33h+5374 3589 5208 25478 150735 2obj 53 297 675 6256 29021 briss SCALER 1194 3428 5323 25652 152761 luindex SCALER 5 297 675 6256 29021 IntroA 497 4889 6076 26507 175565 IntroA 45 617 802 6600 32370 IntroB ≥3弘 IntroB 8 450 714 6316 29835 CI 117 3398 4769 21232120309 CI 22 844 1133 7352 36343 2obj+2type 3h+2950 2507 3971 20620 98819 2obj 93 299 850 6904 31811 jedit SCALER 1769 2397 4012 20726 99536 lusearch SCALER 93 299 850 6904 31811 IntroA 300 3110 442921075116745 IntroA 94 681 981 7277 35531 IntroB 6942 2609 408820730105116 IntroB 462 891 6970 32656 To get the most precise results for a conventional pointer analysis, as shown in Section 6.4,the precision of SCALER can improve by 23 597 seconds (3h 3h 1997s)are spent before one discovers simply increasing the TST value. that 1type is scalable for jython on our machine;however,SCALER only costs 452 seconds for jython,and with better precision(in Answer to RQ1:SCALER-guided pointer analysis exhibits extremely terms of all precision metrics)as described in the next section. good and uniform scalability while matching or even exceeding the precision of the most precise conventional pointer analysis that is scalable 6.1.2 Precision of SCALER-Guided Pointer Analysis.To measure precision,we use four independently useful client analyses that are often(although rarely all together)used as precision metrics in 6.2 Comparison with Introspective Analyses existing literature [14,15,32,33,38].Together they paint a fairly ac- The closest relative of our work in past literature is introspective curate picture of analysis precision,as it impacts clients.The clients analysis [33]:a technique that also attempts to tune context sensi- are:a cast-resolution analysis(metric:the number of cast operations tivity per-method based on a pre-analysis.Introspective analysis that may fail-#may-fail casts),a devirtualization analysis(metric uses heuristics(such as "total points-to information")that do not, the number of virtual call sites that cannot be disambiguated into however,have the upper-bound guarantee,scalability emphasis,or monomorphic calls-#poly calls),a method reachability analysis context-number-estimation ability of SCALER (metric:the number of reachable methods-#reach methods),and a There are two published heuristics leading to different variants of call-graph construction analysis (metric:the number of call graph introspective analyses,IntroA and IntroB.Generally,IntroA is faster edges-#call edges).In Table 1,columns(4-7)for each program but less precise than IntroB.Like SCALER,introspective analysis also list precision results.In all cases,lower is better.We find that. relies on a context-insensitive analysis(CI)as its pre-analysis.Un- overwhelmingly,ScALER achieves comparable,and typically bet- like SCALER,which decides on each method what context-sensitivity ter,precision than the most precise conventional pointer analysis it needs(e.g.,2obj.2type,1type or CI),introspective analysis de- that is comparably scalable(e.g..1type for jython,2type for pmd cides on each method whether it needs contexts.Despite the dif- and 2obj for luindex).In the case of findbugs,the precision of ference,the computation time of producing the context selection SCALER is marginally lower than that of 2obj,but the running time information is very similar (a few seconds for each program on av- is an order of magnitude lower.The chart benchmark is the only erage).As a result,the overhead of both analyses'decision making one for which SCALER is slightly less precise than 2obj without also can be considered similar. being much faster,yet ScALER still attains most of the precision In Table 1,the last three rows for each program show the compar- gains of 2obj relative to a context-insensitive analysis.Moreover, ison results.In most programs(except soot and eclipse),IntroA 135
Scalability-First Pointer Analysis with Self-Tuning Context-Sensitivity ESEC/FSE ’18, November 4–9, 2018, Lake Buena Vista, FL, USA Table 1: Efficiency and precision metrics for all programs and pointer analyses. In all cases, lower is better. Program Analysis Time (seconds) 3h=10 800s Precision metrics Program Analysis Time (seconds) 3h=10 800s Precision metrics #may-fail casts #poly calls #reach methods #call edges #may-fail casts #poly calls #reach methods #call edges CI 112 2 234 2 778 12 718 114 856 CI 135 4 190 9 197 20 862 161 222 2obj 2type 1type >3h+>3h+1 997 2 117 2 577 12 430 111 834 2obj 2type >3h+960 3 401 8 542 20 314 145 210 jython Scaler 452 1 852 2 500 12 167 107 410 eclipse Scaler 652 3 211 8 486 20 374 145 953 IntroA 381 2 202 2 632 12 663 114 095 IntroA 1 369 4 175 8 889 20 853 160 139 IntroB >3h ś ś ś ś IntroB 528 3 640 8 539 20 491 149 980 CI 994 16 570 16 532 32 459 415 476 CI 49 2 508 2 925 13 036 77 370 2obj 2type 1type >3h+>3h+3 812 16 212 15 511 32 308 386 840 2obj 2 458 1 409 2 182 12 657 65 836 soot Scaler 1 236 10 549 14 822 31 982 374 877 findbugs Scaler 272 1 452 2 195 12 676 66 177 IntroA 1 295 16 503 15 947 32 390 413 083 IntroA 188 2 271 2 422 12 960 73 681 IntroB 4 850 15 474 14 895 32 222 319 431 IntroB 397 2 024 2 372 12 882 70 725 CI 67 2 948 4 183 15 254 104 457 CI 48 1 810 1 852 12 064 63 453 2obj 2type >3h+1 203 2 317 3 577 14 863 92 885 2obj 285 883 1 378 11 330 52 374 pmd Scaler 705 2 176 3 536 14 895 92 775 chart Scaler 254 976 1 402 11 530 53 198 IntroA 356 2 820 3 823 15 117 101 762 IntroA 128 1 580 1 613 11 952 61 323 IntroB 2 986 2 524 3 694 15 006 96 565 IntroB 189 1 236 1 497 11 518 55 594 CI 228 4 904 6 297 26 582 176 785 CI 22 734 940 6 670 33 130 2obj 2type >3h+5 374 3 589 5 208 25 478 150 735 2obj 53 297 675 6 256 29 021 briss Scaler 1 194 3 428 5 323 25 652 152 761 luindex Scaler 53 297 675 6 256 29 021 IntroA 497 4 889 6 076 26 507 175 565 IntroA 45 617 802 6 600 32 370 IntroB >3h ś ś ś ś IntroB 48 450 714 6 316 29 835 CI 117 3 398 4 769 21 232 120 309 CI 22 844 1 133 7 352 36 343 2obj 2type >3h+2 950 2 507 3 971 20 620 98 819 2obj 93 299 850 6 904 31 811 jedit Scaler 1 769 2 397 4 012 20 726 99 536 lusearch Scaler 93 299 850 6 904 31 811 IntroA 300 3 110 4 429 21 075 116 745 IntroA 94 681 981 7 277 35 531 IntroB 6 942 2 609 4 088 20 730 105 116 IntroB 96 462 891 6 970 32 656 To get the most precise results for a conventional pointer analysis, 23 597 seconds (3h + 3h + 1 997s) are spent before one discovers that 1type is scalable for jython on our machine; however, Scaler only costs 452 seconds for jython, and with better precision (in terms of all precision metrics) as described in the next section. 6.1.2 Precision of Scaler-Guided Pointer Analysis. To measure precision, we use four independently useful client analyses that are often (although rarely all together) used as precision metrics in existing literature [14, 15, 32, 33, 38]. Together they paint a fairly accurate picture of analysis precision, as it impacts clients. The clients are: a cast-resolution analysis (metric: the number of cast operations that may failÐ#may-fail casts), a devirtualization analysis (metric: the number of virtual call sites that cannot be disambiguated into monomorphic callsÐ#poly calls), a method reachability analysis (metric: the number of reachable methodsÐ#reach methods), and a call-graph construction analysis (metric: the number of call graph edgesÐ#call edges). In Table 1, columns (4ś7) for each program list precision results. In all cases, lower is better. We find that, overwhelmingly, Scaler achieves comparable, and typically better, precision than the most precise conventional pointer analysis that is comparably scalable (e.g., 1type for jython, 2type for pmd and 2obj for luindex). In the case of findbugs, the precision of Scaler is marginally lower than that of 2obj, but the running time is an order of magnitude lower. The chart benchmark is the only one for which Scaler is slightly less precise than 2obj without also being much faster, yet Scaler still attains most of the precision gains of 2obj relative to a context-insensitive analysis. Moreover, as shown in Section 6.4, the precision of Scaler can improve by simply increasing the TST value. Answer to RQ1: Scaler-guided pointer analysis exhibits extremely good and uniform scalability while matching or even exceeding the precision of the most precise conventional pointer analysis that is scalable. 6.2 Comparison with Introspective Analyses The closest relative of our work in past literature is introspective analysis [33]: a technique that also attempts to tune context sensitivity per-method based on a pre-analysis. Introspective analysis uses heuristics (such as łtotal points-to informationž) that do not, however, have the upper-bound guarantee, scalability emphasis, or context-number-estimation ability of Scaler. There are two published heuristics leading to different variants of introspective analyses, IntroA and IntroB. Generally, IntroA is faster but less precise than IntroB. Like Scaler, introspective analysis also relies on a context-insensitive analysis (CI) as its pre-analysis. Unlike Scaler, which decides on each method what context-sensitivity it needs (e.g., 2obj, 2type, 1type or CI), introspective analysis decides on each method whether it needs contexts. Despite the difference, the computation time of producing the context selection information is very similar (a few seconds for each program on average). As a result, the overhead of both analyses’ decision making can be considered similar. In Table 1, the last three rows for each program show the comparison results. In most programs (except soot and eclipse), IntroA 135
ESEC/FSE'18.November 4-9,2018,Lake Buena Vista,FL,USA Yue Li,Tian Tan,Anders Moller,and Yannis Smaragdakis Table 2:Performance of SCALER. Program jython soot pmd briss jedit eclipse findbugs chart luindex lusearch avg. SCALER time (seconds)35.0 1.8 0.8 2.6 1.4 1.0 0.5 0.4 0.2 0.34.4 runs faster than ScALER thus also exhibiting good scalability;how- 2obj 2type 1type DC ever,SCALER's precision is significantly better than IntroA's(SCALER sT。 16607035423273517099830778699108627350807339354949 wins in precision in all the precision metrics of all the programs). 100% 90% IntroB exhibits better precision than IntroA in all cases(when it is scalable)but it is still less precise than SCALER in all the cases 0% 70% except #call-edges for soot and #reach-methods for chart.Regard- 60% ing efficiency,SCALER runs faster than IntroB for most programs 50% (except eclipse,chart,and luindex);in addition,IntroB is not 40% scalable for two programs(jython and briss). 30% 0% Answer to RQ2:Comparing with state-of-the-art adaptive anal- 0% yses,introspective analyses IntroA and IntroB,both SCALER and 0% IntroA exhibit extremely good scalability while ScALER's precision is ython ▣▣ chart luindex lusearch significantly better than IntroA's;and ScALER's scalability is better than IntroB's while being more precise in most cases. Figure 7:The STp value (on top of each bar)computed by SCALER for each program and the distribution of different 6.3 SCALER's Effectiveness as a Pre-Analysis kinds of context-sensitivities selected according to each STp. This section answers RO3 by examining the overhead of SCALER's and briss),medium STp values for medium-complexity programs adaptivity as well as the STp values selected by ScALER for each (findbugs and chart),and large STp values for simple programs program,and the corresponding distribution of different kinds of (luindex and lusearch). context sensitivity based on the selected STp. SCALER automatically selects 2obj,the most precise context- 6.3.1 Overhead of SCALER.The overall overhead of the SCALER sensitivity in our experiments,for most methods(80.6%per pro- framework(Figure 5)consists of two components:(1)a context- gram on average).This is the reason why ScALER-guided pointer insensitive pointer analysis CL,which provides points-to informa- analysis achieves very good precision as shown in Table 1.This tion to SCALER,and(2)SCALER logic itself,which performs TST- also demonstrates that our insight of scalability-critical methods based STp selection and STp-based context-sensitivity.SCALER's (Section 3.1.1)holds in practice:in most cases,only a small set of pre-analysis time(CI)is given in Table 1,and Table 2 shows the scalability-critical methods make the whole analysis unscalable. overhead of SCALER itself.The average overall overhead of SCALER We next discuss two outlier cases,at both ends of the spectrum, for each program is 183.8 seconds,of which CI pre-analysis costs since they are informative of SCALER's limit behavior account for the vast majority (179.4 seconds),while SCALER logic soot.SCALER selects 0 as STp value for soot,so most of its costs only 4.4 seconds.Considering the significant scalability im- methods are analyzed with context-insensitivity(CI).The reason is provements achieved by SCALER,its overhead is negligible. that soot is very large.The pre-analysis of SCALER reports that the SCALER spends 35 seconds for jython,which is markedly longer total size of points-to information of soot(i.e.,the sum of #ptsm than for other programs(Table 2).The reason is that jython is of all methods)is 110 901 529,which already exceeds our default especially complicated in the sense that many of its methods have TST(30M).As a result,SCALER automatically selects 2obj for only enormous numbers of contexts,thus ScALER spends much time on 1792 methods in package java.util.*(as explained in Section 5) #ctxf computation.For instance,SCALER's #ctxfn computation and CI for all other methods according to the definition of SelectCtx shows that 130 methods of jython have more than 1000 000 con- in Section 3.1.2,to ensure the scalability of pointer analysis. texts under 2obj.(For comparison,the maximal #ctxf value of a single method under 2obj in soot,the largest program in our luindex and lusearch.SCALER selects two very large STp val- evaluation,is only 88 473.)In this way,SCALER also reveals the ues for luindex and lusearch(the two simplest programs in our reason why many context-sensitive pointer analyses fail to ana- evaluation).Accordingly,it assigns 2obj for all methods in the lyze jython scalably as reported in previous work [15,32,37,38]. two programs which are all classified as non-scalability-critical SCALER avoids the problem due to its STp-based context-sensitivity methods under 2obj.There is only one exception:the method design (Section 3.1.2). ()>.In Java's type hierarchy. object is the supertype of all classes;thus its method will 6.3.2 ST Values and Context-Sensitivity Distributions.Figure 7 gives be called whenever a constructor is called,yielding too many con- the STp values selected by ScALER for each program according to texts(#ctx).and a this variable that abstractly points to all the the default TST(30M),and the distribution of different kinds of objects created in the program,which makes the method's #ptsm context sensitivity over the methods of the programs based on value very large. the selected STp.Generally,given the same TST,SCALER automat- As a result,the #ctxt#ptsm value for this method ex- ically selects small STp values for complex programs (e.g.,soot ceeds the selected STp value,so SCALER selects less precise context 136
ESEC/FSE ’18, November 4–9, 2018, Lake Buena Vista, FL, USA Yue Li, Tian Tan, Anders Møller, and Yannis Smaragdakis Table 2: Performance of Scaler. Program jython soot pmd briss jedit eclipse findbugs chart luindex lusearch avg. Scaler time (seconds) 35.0 1.8 0.8 2.6 1.4 1.0 0.5 0.4 0.2 0.3 4.4 runs faster than Scaler thus also exhibiting good scalability; however, Scaler’s precision is significantly better than IntroA’s (Scaler wins in precision in all the precision metrics of all the programs). IntroB exhibits better precision than IntroA in all cases (when it is scalable) but it is still less precise than Scaler in all the cases except #call-edges for soot and #reach-methods for chart. Regarding efficiency, Scaler runs faster than IntroB for most programs (except eclipse, chart, and luindex); in addition, IntroB is not scalable for two programs (jython and briss). Answer to RQ2: Comparing with state-of-the-art adaptive analyses, introspective analyses IntroA and IntroB, both Scaler and IntroA exhibit extremely good scalability while Scaler’s precision is significantly better than IntroA’s; and Scaler’s scalability is better than IntroB’s while being more precise in most cases. 6.3 Scaler’s Effectiveness as a Pre-Analysis This section answers RQ3 by examining the overhead of Scaler’s adaptivity as well as the STp values selected by Scaler for each program, and the corresponding distribution of different kinds of context sensitivity based on the selected STp . 6.3.1 Overhead of Scaler. The overall overhead of the Scaler framework (Figure 5) consists of two components: (1) a contextinsensitive pointer analysis CI, which provides points-to information to Scaler, and (2) Scaler logic itself, which performs TSTbased STp selection and STp -based context-sensitivity. Scaler’s pre-analysis time (CI) is given in Table 1, and Table 2 shows the overhead of Scaler itself. The average overall overhead of Scaler for each program is 183.8 seconds, of which CI pre-analysis costs account for the vast majority (179.4 seconds), while Scaler logic costs only 4.4 seconds. Considering the significant scalability improvements achieved by Scaler, its overhead is negligible. Scaler spends 35 seconds for jython, which is markedly longer than for other programs (Table 2). The reason is that jython is especially complicated in the sense that many of its methods have enormous numbers of contexts, thus Scaler spends much time on #ctxc m computation. For instance, Scaler’s #ctxc m computation shows that 130 methods of jython have more than 1 000 000 contexts under 2obj. (For comparison, the maximal #ctxc m value of a single method under 2obj in soot, the largest program in our evaluation, is only 88 473.) In this way, Scaler also reveals the reason why many context-sensitive pointer analyses fail to analyze jython scalably as reported in previous work [15, 32, 37, 38]. Scaler avoids the problem due to its STp -based context-sensitivity design (Section 3.1.2). 6.3.2 ST Values and Context-Sensitivity Distributions. Figure 7 gives the STp values selected by Scaler for each program according to the default TST (30M), and the distribution of different kinds of context sensitivity over the methods of the programs based on the selected STp . Generally, given the same TST, Scaler automatically selects small STp values for complex programs (e.g., soot 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% jython soot pmd briss jedit eclipse findbugs chart luindex lusearch 2obj 2type 1type CI STp 16607 0 35423 2735 17099 8307 78699 108627 35080733 9354949 Figure 7: The STp value (on top of each bar) computed by Scaler for each program and the distribution of different kinds of context-sensitivities selected according to each STp . and briss), medium STp values for medium-complexity programs (findbugs and chart), and large STp values for simple programs (luindex and lusearch). Scaler automatically selects 2obj, the most precise contextsensitivity in our experiments, for most methods (80.6% per program on average). This is the reason why Scaler-guided pointer analysis achieves very good precision as shown in Table 1. This also demonstrates that our insight of scalability-critical methods (Section 3.1.1) holds in practice: in most cases, only a small set of scalability-critical methods make the whole analysis unscalable. We next discuss two outlier cases, at both ends of the spectrum, since they are informative of Scaler’s limit behavior. soot. Scaler selects 0 as STp value for soot, so most of its methods are analyzed with context-insensitivity (CI). The reason is that soot is very large. The pre-analysis of Scaler reports that the total size of points-to information of soot (i.e., the sum of #ptsm of all methods) is 110 901 529, which already exceeds our default TST (30M). As a result, Scaler automatically selects 2obj for only 1 792 methods in package java.util.* (as explained in Section 5) and CI for all other methods according to the definition of SelectCtx in Section 3.1.2, to ensure the scalability of pointer analysis. luindex and lusearch. Scaler selects two very large STp values for luindex and lusearch (the two simplest programs in our evaluation). Accordingly, it assigns 2obj for all methods in the two programs which are all classified as non-scalability-critical methods under 2obj. There is only one exception: the method ()>. In Java’s type hierarchy, Object is the supertype of all classes; thus its method will be called whenever a constructor is called, yielding too many contexts (#ctxc m), and a this variable that abstractly points to all the objects created in the program, which makes the method’s #ptsm value very large. As a result, the #ctxc m·#ptsm value for this method exceeds the selected STp value, so Scaler selects less precise context 136
Scalability-First Pointer Analysis with Self-Tuning Context-Sensitivity ESEC/FSE'18.November 4-9,2018,Lake Buena Vista.FL,USA Precision Time Precision Time 3400 Timeout 12000 2200 2188 12000 3300 2176 3300 10000 10000 3211 2150 3200 3170 8000 8000 3100 3071 6000 2100 2080 2080 6000 3000 2979 4000 2D50 4000 2050 2900 2000 2000 2800 2000 0 20M 30M 60M 80M 150M 20M 30M 60M 80M 150M ☐#may-fall casts-量-12GB★48GB ◆-368G8 ☐#may-fall casts量-12GB◆48GB ◆-368GB (a)eclipse (b)pmd Figure 8:SCALER-guided pointer analysis time (efficiency)and the corresponding number of may-fail casts(precision)with different TST values(20M,30M,60M,80M and 150M)on the machines under different memory sizes(12GB,48GB,and 368GB). sensitivity for it.Since java.lang.Object's constructor is scalability vary as a function of TST values,for three different RAM an empty method,analyzing it with or without 2obj does not affect configurations.As the figure demonstrates,there is nothing special the precision of pointer analysis.Thus SCALER-guided pointer anal- about ScALER's default 30M TST:the system adapts appropriately ysis achieves exactly the same precision as the conventional 2obj for both larger and smaller values.Users can simply increase or analysis for luindex and lusearch(see Table 1).This again demon- decrease the TST value to achieve better precision(up to the level strates the self-tuning ability of ScALER which can automatically the analysis setup can support)or better efficiency,respectively. obtain maximal precision for simple programs. This simple design with TST being the only tuning knob is able to drive complicated pointer analyses to adapt to different programs Answer to RQ3:SCALER automatically selects appropriate STp to achieve their preferred scalability and precision. values and context-sensitivity to ensure the scalability of pointer analysis for programs of varying complexity.In addition,the cost of SCALER-Guided Pointer Analysis with Different Memory Sizes.As such good adaptivity is very low. shown in Figure 8,with the largest TST value of 150M,SCALER- guided pointer analysis is not scalable for pmd with 12GB or 48GB 6.4 SCALER-Guided Pointer Analysis with memory,but it is scalable if using 368GB memory.With the same Different TSTs and Memory Sizes TST,SCALER-guided pointer analysis running on a machine with a larger memory is more likely to be scalable.For example,with a TST As real-world analysis settings may differ widely,we conduct ex- of 150M,SCALER-guided pointer analysis is not scalable with 12GB periments by running SCALER with different TST and memory sizes, to further evaluate the adaptiveness and usability of SCALER in prac- memory for any of the 6 programs,it is scalable for 4 programs tice.We run SCALER for the top six costliest-to-analyze programs if using 48GB memory,and it is scalable for all 6 programs with 368GB memory.This result indirectly demonstrates that scalability in Table 1(jython,soot,pmd,briss,jedit,and eclipse),with (and,thus,the choice of TST)is tied to memory size. TST values(20M,30M.60M,80M,and 150M)on three machines with different memory sizes:12GB,48GB,and 368GB,representing Answer to RQ4:Better precision or better efficiency of SCALER- typical memory sizes of a personal laptop,a commodity server,and guided pointer analysis can be achieved by simply increasing or a large server,respectively. decreasing the TST value.The scalability is related to memory sizes. Conventional pointer analyses that are not scalable for these six with the same TST,SCALER-guided pointer analysis is more likely to programs in Table 1(with memory size 48GB)are also all unscal- scale under larger memory:thus a small(large)TST is recommended able on another machine with a much larger memory size,368GB. for a small(large)memory size for good scalability(precision). However,under SCALER's default TST of 30M(as used in previous experiments),SCALER-guided pointer analysis still scales for all six 7 RELATED WORK programs even on a machine with only 12GB memory.This result further demonstrates the effectiveness of SCALER in making pointer Context sensitivity,despite bringing great precision benefits,intro- analysis scalable in practice. duces many uncertainties to scalability,which may render a pointer Since the six programs under different TST and memory settings analysis useless in practice.We focus on related work that leverages exhibit similar trends,for space reasons we only show two rep- pre-analysis to achieve good efficiency and precision trade-offs for resentative programs,eclipse and pmd,in Figure 8.We illustrate context-sensitive pointer analysis. precision changes via the #may-fail casts metric,which is arguably Introspective analysis [33]attempts to achieve precision and the most common precision metric in Java pointer-analysis litera- scalability trade-offs by refining a context-sensitive analysis while ture[14,15,17,32-34,37,38]. avoiding its worst-case cost.Similar to ScALER,it first performs a pre-analysis(context-insensitive pointer analysis)to extract neces- SCALER-Guided Pointer Analysis under Different TSTs.Figure 8 sary information to guide later pointer analysis.Unlike SCALER,it encodes a lot of information:it shows how both precision and relies on a set of six manually-selected metrics (e.g.,the maximum 137
Scalability-First Pointer Analysis with Self-Tuning Context-Sensitivity ESEC/FSE ’18, November 4–9, 2018, Lake Buena Vista, FL, USA 3 300 3 211 3 170 3 071 2 979 0 2 000 4 000 6 000 8 000 10 000 12 000 2 800 2 900 3 000 3 100 3 200 3 300 3 400 20M 30M 60M 80M 150M #may-fail casts 12GB 48GB 368GB Time Timeout Precision Timeout (a) eclipse 2 188 2 176 2 080 2 080 2 050 0 2 000 4 000 6 000 8 000 10 000 12 000 2 000 2 050 2 100 2 150 2 200 20M 30M 60M 80M 150M #may-fail casts 12GB 48GB 368GB Time Timeout Precision Timeout (b) pmd Figure 8: Scaler-guided pointer analysis time (efficiency) and the corresponding number of may-fail casts (precision) with different TST values (20M, 30M, 60M, 80M and 150M) on the machines under different memory sizes (12GB, 48GB, and 368GB). sensitivity for it. Since java.lang.Object’s constructor is an empty method, analyzing it with or without 2obj does not affect the precision of pointer analysis. Thus Scaler-guided pointer analysis achieves exactly the same precision as the conventional 2obj analysis for luindex and lusearch (see Table 1). This again demonstrates the self-tuning ability of Scaler which can automatically obtain maximal precision for simple programs. Answer to RQ3: Scaler automatically selects appropriate STp values and context-sensitivity to ensure the scalability of pointer analysis for programs of varying complexity. In addition, the cost of such good adaptivity is very low. 6.4 Scaler-Guided Pointer Analysis with Different TSTs and Memory Sizes As real-world analysis settings may differ widely, we conduct experiments by running Scaler with different TST and memory sizes, to further evaluate the adaptiveness and usability of Scaler in practice. We run Scaler for the top six costliest-to-analyze programs in Table 1 (jython, soot, pmd, briss, jedit, and eclipse), with TST values (20M, 30M, 60M, 80M, and 150M) on three machines with different memory sizes: 12GB, 48GB, and 368GB, representing typical memory sizes of a personal laptop, a commodity server, and a large server, respectively. Conventional pointer analyses that are not scalable for these six programs in Table 1 (with memory size 48GB) are also all unscalable on another machine with a much larger memory size, 368GB. However, under Scaler’s default TST of 30M (as used in previous experiments), Scaler-guided pointer analysis still scales for all six programs even on a machine with only 12GB memory. This result further demonstrates the effectiveness of Scaler in making pointer analysis scalable in practice. Since the six programs under different TST and memory settings exhibit similar trends, for space reasons we only show two representative programs, eclipse and pmd, in Figure 8. We illustrate precision changes via the #may-fail casts metric, which is arguably the most common precision metric in Java pointer-analysis literature [14, 15, 17, 32ś34, 37, 38]. Scaler-Guided Pointer Analysis under Different TSTs. Figure 8 encodes a lot of information: it shows how both precision and scalability vary as a function of TST values, for three different RAM configurations. As the figure demonstrates, there is nothing special about Scaler’s default 30M TST: the system adapts appropriately for both larger and smaller values. Users can simply increase or decrease the TST value to achieve better precision (up to the level the analysis setup can support) or better efficiency, respectively. This simple design with TST being the only tuning knob is able to drive complicated pointer analyses to adapt to different programs to achieve their preferred scalability and precision. Scaler-Guided Pointer Analysis with Different Memory Sizes. As shown in Figure 8, with the largest TST value of 150M, Scalerguided pointer analysis is not scalable for pmd with 12GB or 48GB memory, but it is scalable if using 368GB memory. With the same TST, Scaler-guided pointer analysis running on a machine with a larger memory is more likely to be scalable. For example, with a TST of 150M, Scaler-guided pointer analysis is not scalable with 12GB memory for any of the 6 programs, it is scalable for 4 programs if using 48GB memory, and it is scalable for all 6 programs with 368GB memory. This result indirectly demonstrates that scalability (and, thus, the choice of TST) is tied to memory size. Answer to RQ4: Better precision or better efficiency of Scalerguided pointer analysis can be achieved by simply increasing or decreasing the TST value. The scalability is related to memory sizes: with the same TST, Scaler-guided pointer analysis is more likely to scale under larger memory; thus a small (large) TST is recommended for a small (large) memory size for good scalability (precision). 7 RELATED WORK Context sensitivity, despite bringing great precision benefits, introduces many uncertainties to scalability, which may render a pointer analysis useless in practice. We focus on related work that leverages pre-analysis to achieve good efficiency and precision trade-offs for context-sensitive pointer analysis. Introspective analysis [33] attempts to achieve precision and scalability trade-offs by refining a context-sensitive analysis while avoiding its worst-case cost. Similar to Scaler, it first performs a pre-analysis (context-insensitive pointer analysis) to extract necessary information to guide later pointer analysis. Unlike Scaler, it relies on a set of six manually-selected metrics (e.g., the maximum 137
ESEC/FSE'18.November 4-9,2018,Lake Buena Vista,FL,USA Yue Li,Tian Tan,Anders Moller,and Yannis Smaragdakis field points-to set over all fields)to define two heuristics,result- SCALER's lightweight pre-analysis,the learning phase is heavy and ing in two introspective analyses,IntroA and IntroB,which are costs 54 hours in the Jeong et al.experimental setting. compared with SCALER in Section 6.2.Benefiting from the new in- Generally,a machine learning approach is sensitive to the train- sight of SCALER(TST-based self-tuning context sensitivity).SCALER ing on input programs,and its learned results are usually difficult outperforms IntroB on both precision and scalability and achieves to explain,for example to discern why the learning algorithm se- the same level of scalability as IntroA,while being significantly lects a given context for a method.Instead,SCALER is a principled more precise.Moreover,the six different metrics in introspective rigorously-modeled,approach derived from simple insights;thus analysis need appropriate values to be set in advance to produce its guided results are tractable and interpretable,leading to more effective analysis results;SCALER's insights enable its methodology stable and uniform effectiveness. to be quite simple:it only needs one value(TST)for users to achieve Unlike conventional context-sensitive pointer analysis,which better precision or better efficiency as desired,resulting in better uses consecutive context elements for each context,the BEAN ap- usability in practice.Finally,introspective analysis is not one-shot: proach by Tan et al.[37]identifies and skips the redundant context using it will always incur a cost in precision,even if the program elements that are useless for improving the precision of context- could be analyzed more precisely.Therefore,its user will deploy it sensitive analysis.As a result,the saved space allows more precision- only after first attempting a precise analysis and failing. useful context elements to be involved to distinguish more contexts. Hassanshahi et al.[11]leverage similar metrics as introspective making the analysis more precise with a small efficiency overhead. analysis to guide selective object-sensitive pointer analysis for large Precision is the focus of BEAN while SCALER's is scalability.In addi codebases.However,their pre-analysis involves several phases(that tion,as explained in Section 4,rather than identifying redundant need different metrics and heuristics):a context-insensitive analysis precision-useless context elements,SCALER leverages the OAG from is first performed to extract the program kernel where a context- Tan et al [37]to compute the context numbers in advance. insensitive or fixed object-sensitive analysis is still not sufficiently MAHJONG [38],a recent heap abstraction for pointer analysis precise;then a fixed(heavy)object-sensitive pointer analysis is of Java,is also based on a cheap pre-analysis,like SCALER.It en- applied to the extracted(smaller)kernel to determine appropriate ables an allocation-site-based pointer analysis to run significantly context depth for each selected object.After these pre-analyses. faster while achieving nearly the same precision for type-dependent the selected object-sensitive information is used to guide the main clients,such as call graph construction.Differently,SCALER works pointer analysis which is demonstrated to work well for the Open- for general pointer analysis,including alias analysis(i.e.,not just JDK library.However,unlike introspective analysis [33]and SCALER, type-dependent clients)which cannot be handled by MAHJONG the overhead of the pre-analysis is uncertain,as it heavily relies effectively.In addition,SCALER is able to scale for trouble programs on the complexity of the extracted kernel,which further depends such as jython where even the very fast MAHJONG analysis fails on various metric values selected by users.Thus,it is unclear if the technique can exhibit general effectiveness for arbitrary Java 8 programs in practice CONCLUSIONS Both of the above approaches [11,33]involve metrics and heuris- Good scalability is hard to achieve for precise context-sensitive tics that are defined manually.An alternative is to use machine pointer analysis.To tackle this problem,we have introduced the learning techniques,as in the two approaches we describe next. SCALER framework,which automatically chooses a suitable context- Wei and Ryder [41]present an adaptive context-sensitive analy- sensitivity variant for each method in the given program,based on sis for JavaScript.They first use a machine learning algorithm to a fast,context-insensitive pre-analysis.The key insight is that it is obtain the relationship between some user-defined method char- possible to efficiently identify scalability-critical methods and that acteristics(extracted from a pre-analysis)and context-sensitivity scalability can be predicted using the ideas of scalability thresholds choice(1-call-site-,1-object-,or 1-parameter-sensitivity),and ex- and total scalability thresholds.The focus of SCALER is scalability, press the results as a decision tree.Based on domain knowledge but at the same time it aims to maximize precision,relative to a the decision tree is further manually adjusted to produce heuristics given total scalability threshold that can be selected based on the that are easy to interpret while the classifications can still maintain available memory. good accuracy.Finally,based on the heuristics,methods are ana The experimental evaluation of SCALER demonstrates that it is lyzed with different context sensitivity,resulting in better precision able to achieve extremely good scalability while producing highly achieved than single context-sensitive analysis. precise points-to results,in one shot,regardless of the programs Jeong et al.[14]propose a data-driven approach to guiding con- being analyzed.This may directly benefit many other program text sensitivity for Java.Unlike SCALER,where various kinds of analyses and software engineering tools that require scalable and context sensitivity with different lengths are applied to different precise pointer analysis.Moreover,we expect the ideas behind methods,only a single kind of context sensitivity is applied to the SCALER may help other kinds of static analyses to become more program,and each method is finally assigned an appropriate context scalable with good precision for real-world programs. length,including zero (i.e.,context insensitivity).As deep contexts are finally properly applied to only a subset of the methods,more ACKNOWLEDGMENTS efficient context-sensitive analysis can be achieved with still good precision.To select a context length for each method,25 metrics This work was supported by the European Research Council(ERC) (atomic features)are selected and a machine learning approach is under the FP7 and Horizon 2020 research and innovation programs used to learn heuristics based on these metrics.However,unlike (grant agreements 307334 and 647544) 138
ESEC/FSE ’18, November 4–9, 2018, Lake Buena Vista, FL, USA Yue Li, Tian Tan, Anders Møller, and Yannis Smaragdakis field points-to set over all fields) to define two heuristics, resulting in two introspective analyses, IntroA and IntroB, which are compared with Scaler in Section 6.2. Benefiting from the new insight of Scaler (TST-based self-tuning context sensitivity), Scaler outperforms IntroB on both precision and scalability and achieves the same level of scalability as IntroA, while being significantly more precise. Moreover, the six different metrics in introspective analysis need appropriate values to be set in advance to produce effective analysis results; Scaler’s insights enable its methodology to be quite simple: it only needs one value (TST) for users to achieve better precision or better efficiency as desired, resulting in better usability in practice. Finally, introspective analysis is not one-shot: using it will always incur a cost in precision, even if the program could be analyzed more precisely. Therefore, its user will deploy it only after first attempting a precise analysis and failing. Hassanshahi et al. [11] leverage similar metrics as introspective analysis to guide selective object-sensitive pointer analysis for large codebases. However, their pre-analysis involves several phases (that need different metrics and heuristics): a context-insensitive analysis is first performed to extract the program kernel where a contextinsensitive or fixed object-sensitive analysis is still not sufficiently precise; then a fixed (heavy) object-sensitive pointer analysis is applied to the extracted (smaller) kernel to determine appropriate context depth for each selected object. After these pre-analyses, the selected object-sensitive information is used to guide the main pointer analysis which is demonstrated to work well for the OpenJDK library. However, unlike introspective analysis [33] and Scaler, the overhead of the pre-analysis is uncertain, as it heavily relies on the complexity of the extracted kernel, which further depends on various metric values selected by users. Thus, it is unclear if the technique can exhibit general effectiveness for arbitrary Java programs in practice. Both of the above approaches [11, 33] involve metrics and heuristics that are defined manually. An alternative is to use machine learning techniques, as in the two approaches we describe next. Wei and Ryder [41] present an adaptive context-sensitive analysis for JavaScript. They first use a machine learning algorithm to obtain the relationship between some user-defined method characteristics (extracted from a pre-analysis) and context-sensitivity choice (1-call-site-, 1-object-, or 1-parameter-sensitivity), and express the results as a decision tree. Based on domain knowledge, the decision tree is further manually adjusted to produce heuristics that are easy to interpret while the classifications can still maintain good accuracy. Finally, based on the heuristics, methods are analyzed with different context sensitivity, resulting in better precision achieved than single context-sensitive analysis. Jeong et al. [14] propose a data-driven approach to guiding context sensitivity for Java. Unlike Scaler, where various kinds of context sensitivity with different lengths are applied to different methods, only a single kind of context sensitivity is applied to the program, and each method is finally assigned an appropriate context length, including zero (i.e., context insensitivity). As deep contexts are finally properly applied to only a subset of the methods, more efficient context-sensitive analysis can be achieved with still good precision. To select a context length for each method, 25 metrics (atomic features) are selected and a machine learning approach is used to learn heuristics based on these metrics. However, unlike Scaler’s lightweight pre-analysis, the learning phase is heavy and costs 54 hours in the Jeong et al. experimental setting. Generally, a machine learning approach is sensitive to the training on input programs, and its learned results are usually difficult to explain, for example to discern why the learning algorithm selects a given context for a method. Instead, Scaler is a principled, rigorously-modeled, approach derived from simple insights; thus its guided results are tractable and interpretable, leading to more stable and uniform effectiveness. Unlike conventional context-sensitive pointer analysis, which uses consecutive context elements for each context, the Bean approach by Tan et al. [37] identifies and skips the redundant context elements that are useless for improving the precision of contextsensitive analysis. As a result, the saved space allows more precisionuseful context elements to be involved to distinguish more contexts, making the analysis more precise with a small efficiency overhead. Precision is the focus of Bean while Scaler’s is scalability. In addition, as explained in Section 4, rather than identifying redundant precision-useless context elements, Scaler leverages the OAG from Tan et al. [37] to compute the context numbers in advance. Mahjong [38], a recent heap abstraction for pointer analysis of Java, is also based on a cheap pre-analysis, like Scaler. It enables an allocation-site-based pointer analysis to run significantly faster while achieving nearly the same precision for type-dependent clients, such as call graph construction. Differently, Scaler works for general pointer analysis, including alias analysis (i.e., not just type-dependent clients) which cannot be handled by Mahjong effectively. In addition, Scaler is able to scale for trouble programs such as jython where even the very fast Mahjong analysis fails. 8 CONCLUSIONS Good scalability is hard to achieve for precise context-sensitive pointer analysis. To tackle this problem, we have introduced the Scaler framework, which automatically chooses a suitable contextsensitivity variant for each method in the given program, based on a fast, context-insensitive pre-analysis. The key insight is that it is possible to efficiently identify scalability-critical methods and that scalability can be predicted using the ideas of scalability thresholds and total scalability thresholds. The focus of Scaler is scalability, but at the same time it aims to maximize precision, relative to a given total scalability threshold that can be selected based on the available memory. The experimental evaluation of Scaler demonstrates that it is able to achieve extremely good scalability while producing highly precise points-to results, in one shot, regardless of the programs being analyzed. This may directly benefit many other program analyses and software engineering tools that require scalable and precise pointer analysis. Moreover, we expect the ideas behind Scaler may help other kinds of static analyses to become more scalable with good precision for real-world programs. ACKNOWLEDGMENTS This work was supported by the European Research Council (ERC) under the FP7 and Horizon 2020 research and innovation programs (grant agreements 307334 and 647544). 138