Scalability-First Pointer Analysis with Self-Tuning Context-Sensitivity ESEC/FSE'18.November 4-9,2018,Lake Buena Vista.FL,USA object (e.g.,in a library method)according to the context in which 3.1.1 Scalability-Critical Methods.The scalability of a context- it is used helps the precision at all use sites of the common code. sensitive pointer analysis is closely linked to the amount of points-to Two observations on context sensitivity will be important in information it produces,i.e.,the size ofrelation pt'Ctxx VarxObj. our subsequent discussion.First,contexts qualify variables but are The challenge that ScALER addresses is to closely upper bound the generally chosen per method.The main use of context sensitivity size of pt'without performing a context-sensitive analysis,given is to distinguish different activations of the same method(e.g. only the result,pt Var x Obj,of a context-insensitive analysis. "stack frames"for the same procedure,in traditional stack-based Prior approaches have made similar attempts to obtain scalability, languages),which create fresh instances of all local variables at run but without a concrete model to predict scalability effectively,as time.Therefore.all local variables of the same method have the explained in Section 7. same set of contexts By then distinguishing which parts of the program can contribute Second,the worst-case complexity of a context-sensitive pointer disproportionately to the total size of pt',SCALER can adjust its analysis is much higher than that of a context-insensitive one:the context sensitivity at different methods,to yield higher precision number of computed points-to facts,and hence the complexity of only when this does not endanger scalability. the analysis,increases in the worst case multiplicatively by the To achieve this,we introduce the concept of scalability-critical size of the set Ctx.However,in the common case,the precision of methods.These are methods that are likely to yield very large context sensitivity often compensates,reducing the analysis com- amounts of context-sensitive points-to information.Whether a plexity:different contexts divide the points-to sets of variables into method is scalability-critical depends on the context sensitivity in non-overlapping subsets.In the ideal case,if a context-insensitive question.For example,a method may be scalability-critical under analysis computes a points-to relation pt Var x Obj,a context- 2obj(a 2-object-sensitive analysis)but not under 2type(a 2-type- sensitive analysis would compute a relation pt'Ctxx Varx Obj sensitive analysis). that is not greater in cardinality than Var x Obj.The extra precision Definition 3.1 (Scalability-Critical Methods).A method m is of the Ctx information would be enough to split the original points- scalability-critical (or,for brevity,just critical)under context sensi- to sets into disjoint subsets.This observation also hints at why tivity c,if the value of the product #ctx#ptsm exceeds a given context sensitivity has unpredictable scalability:When it works scalability threshold STp,where well to maintain precision,its cost is modest to none.When it fails to do so,however,the cost is orders-of-magnitude higher. .#ctx is an estimate of the number of contexts for m if using The actual definitions of the set Ctx can vary widely.Three context-sensitivity variant c,computed using a fast context- main categories are call-site sensitivity,object sensitivity,and type insensitive analysis, sensitivity: #ptsm is the sum of the sizes of the points-to sets for all local variables in m,computed using the same context-insensitive Call-site sensitivity has Ctx be a sequence of call sites,i.e., analysis,and instructions that call the method in which the qualified variable STp is a value that can be given by users(Section 3.1.2)or has been declared. computed automatically(Section 3.2)for program p. .Object sensitivity has Ctx be a sequence of abstract objects:they are the receiver object of the method containing the qualified That is,we upper-bound the potential contribution of a method variable,the receiver object of the method that allocated the to the context-sensitive points-to set(pt'Ctxx Varx Obj)by com previous receiver object,etc. puting the number of potential combinations of possible abstract Type sensitivity keeps the same information as object sensitiv- objects(i.e.,a subset of Obj)and contexts (i.e.,a subset of Ctx)that ity,but objects used as contexts are collapsed not per allocation may pertain to the method.The individual numbers are guaranteed instruction but per class that contains it. to be upper bounds,since they are computed by a less precise(and, We refer the reader to surveys that discuss the options in full de- thus,conservatively over-approximate)context-insensitive analy- tail [30,351. sis.Accordingly,their product is guaranteed to be an upper bound of the number of potential combinations. 3 THE SCALER FRAMEWORK The intuition behind Definition 3.1 is that a method m may cause scalability problems for different reasons:(1)m is being analyzed in In this section,we present an overview of the SCALER approach. too many contexts,(2)too much points-to information is computed We first describe the idea of scalability thresholds,which is the within m,or(3)although the individual numbers of contexts and key to predictable scalability (Section 3.1).Next,we explain how points-to facts for m are not very large,their product is.For this SCALER automatically chooses a suitable scalability threshold for reason,the product of the estimates #ctx and #ptsm gives an each program,based on a single parameter called the total scalability indication of potential scalability problems. threshold that depends only on the available space for storing points- It makes sense to perform this reasoning at the method level, to information(Section 3.2).These ideas are then collected in the since(as discussed in Section 2)decisions on context sensitivity are overall SCALER framework(Section 3.3). made per-method,i.e.,for all local variables of a method together. The number #ptsm can be obtained directly from the context- 3.1 Scalability Thresholds insensitive points-to relation:#ptsm=vemIpt(v,).Comput- We begin by introducing the concept of scalability-critical meth- ing #ctx is less straightforward.This is one of the technical ods(Section 3.1.1),and we then leverage this concept to address novelties of the approach,and we postpone its discussion until unscalability (Section 3.1.2). Section 4. 131Scalability-First Pointer Analysis with Self-Tuning Context-Sensitivity ESEC/FSE ’18, November 4–9, 2018, Lake Buena Vista, FL, USA object (e.g., in a library method) according to the context in which it is used helps the precision at all use sites of the common code. Two observations on context sensitivity will be important in our subsequent discussion. First, contexts qualify variables but are generally chosen per method. The main use of context sensitivity is to distinguish different activations of the same method (e.g., łstack framesž for the same procedure, in traditional stack-based languages), which create fresh instances of all local variables at run time. Therefore, all local variables of the same method have the same set of contexts. Second, the worst-case complexity of a context-sensitive pointer analysis is much higher than that of a context-insensitive one: the number of computed points-to facts, and hence the complexity of the analysis, increases in the worst case multiplicatively by the size of the set Ctx. However, in the common case, the precision of context sensitivity often compensates, reducing the analysis complexity: different contexts divide the points-to sets of variables into non-overlapping subsets. In the ideal case, if a context-insensitive analysis computes a points-to relation pt ⊆ Var × Obj, a contextsensitive analysis would compute a relation pt′ ⊆ Ctx × Var × Obj that is not greater in cardinality than Var ×Obj. The extra precision of the Ctx information would be enough to split the original pointsto sets into disjoint subsets. This observation also hints at why context sensitivity has unpredictable scalability: When it works well to maintain precision, its cost is modest to none. When it fails to do so, however, the cost is orders-of-magnitude higher. The actual definitions of the set Ctx can vary widely. Three main categories are call-site sensitivity, object sensitivity, and type sensitivity: • Call-site sensitivity has Ctx be a sequence of call sites, i.e., instructions that call the method in which the qualified variable has been declared. • Object sensitivity has Ctx be a sequence of abstract objects: they are the receiver object of the method containing the qualified variable, the receiver object of the method that allocated the previous receiver object, etc. • Type sensitivity keeps the same information as object sensitivity, but objects used as contexts are collapsed not per allocation instruction but per class that contains it. We refer the reader to surveys that discuss the options in full detail [30, 35]. 3 THE SCALER FRAMEWORK In this section, we present an overview of the Scaler approach. We first describe the idea of scalability thresholds, which is the key to predictable scalability (Section 3.1). Next, we explain how Scaler automatically chooses a suitable scalability threshold for each program, based on a single parameter called the total scalability threshold that depends only on the available space for storing pointsto information (Section 3.2). These ideas are then collected in the overall Scaler framework (Section 3.3). 3.1 Scalability Thresholds We begin by introducing the concept of scalability-critical methods (Section 3.1.1), and we then leverage this concept to address unscalability (Section 3.1.2). 3.1.1 Scalability-Critical Methods. The scalability of a contextsensitive pointer analysis is closely linked to the amount of points-to information it produces, i.e., the size of relation pt′ ⊆ Ctx×Var×Obj. The challenge that Scaler addresses is to closely upper bound the size of pt′ without performing a context-sensitive analysis, given only the result, pt ⊆ Var × Obj, of a context-insensitive analysis. Prior approaches have made similar attempts to obtain scalability, but without a concrete model to predict scalability effectively, as explained in Section 7. By then distinguishing which parts of the program can contribute disproportionately to the total size of pt′ , Scaler can adjust its context sensitivity at different methods, to yield higher precision only when this does not endanger scalability. To achieve this, we introduce the concept of scalability-critical methods. These are methods that are likely to yield very large amounts of context-sensitive points-to information. Whether a method is scalability-critical depends on the context sensitivity in question. For example, a method may be scalability-critical under 2obj (a 2-object-sensitive analysis) but not under 2type (a 2-typesensitive analysis). Definition 3.1 (Scalability-Critical Methods). A method m is scalability-critical (or, for brevity, just critical) under context sensitivity c, if the value of the product #ctxc m·#ptsm exceeds a given scalability threshold STp , where • #ctxc m is an estimate of the number of contexts for m if using context-sensitivity variant c, computed using a fast contextinsensitive analysis, • #ptsm is the sum of the sizes of the points-to sets for all local variables in m, computed using the same context-insensitive analysis, and • STp is a value that can be given by users (Section 3.1.2) or computed automatically (Section 3.2) for program p. That is, we upper-bound the potential contribution of a method to the context-sensitive points-to set (pt′ ⊆ Ctx×Var×Obj) by computing the number of potential combinations of possible abstract objects (i.e., a subset of Obj) and contexts (i.e., a subset of Ctx) that may pertain to the method. The individual numbers are guaranteed to be upper bounds, since they are computed by a less precise (and, thus, conservatively over-approximate) context-insensitive analysis. Accordingly, their product is guaranteed to be an upper bound of the number of potential combinations. The intuition behind Definition 3.1 is that a methodm may cause scalability problems for different reasons: (1) m is being analyzed in too many contexts, (2) too much points-to information is computed within m, or (3) although the individual numbers of contexts and points-to facts for m are not very large, their product is. For this reason, the product of the estimates #ctxc m and #ptsm gives an indication of potential scalability problems. It makes sense to perform this reasoning at the method level, since (as discussed in Section 2) decisions on context sensitivity are made per-method, i.e., for all local variables of a method together. The number #ptsm can be obtained directly from the contextinsensitive points-to relation: #ptsm = Í v ∈m |pt(v, _)|. Computing #ctxc m is less straightforward. This is one of the technical novelties of the approach, and we postpone its discussion until Section 4. 131