Scalability-First Pointer Analysis with Self-Tuning Context-Sensitivity ESEC/FSE'18.November 4-9,2018,Lake Buena Vista.FL,USA #ctxC#ptsm 1000000 -c=2obj SCALER performs this computation by leveraging the object al- c=2type location graph (OAG)structure proposed by Tan et al.[37].With c=1type the OAG,the context computation problem can be formulated as a graph traversal problem.For any program,based on an OAG 100000 derived from pre-analysis(context-insensitive pointer analysis), E(STp)=A1 +A2+A3s TST SCALER computes the number of contexts(#ctx)of every method for each kind of context sensitivity c used(2obj,2type and 1type STp STp is automatically computed in our setup)by enumerating all contexts of the method. 10000 based on the above inequality The OAG of a program is a directed graph.A node of the OAG represents an abstract object,which is identified by its allocation A1 A2 A3 site in the program.An edge of the OAG,say o-02,represents 0 method,methodaooo methodaooo methodsoooo an object-allocation relation between o1 and 02,i.e.,o1 is a receiver object of the method that contains the allocation site of o2.SCALER Figure 4:Selecting the scalability threshold(STp)based on leverages the pre-analysis to build the OAG for the given program the total scalability threshold (TST). The OAG provides a graphical perspective of object-(and type-) sensitivity,i.e.,a k-depth context in object-sensitivity corresponds to higher STp values may also be viable.The maximum such STp a k-node path in the OAG [37].Thus,to compute k-object-sensitive is computed using binary search in a range between 0 and the contexts of a method m,SCALER simply enumerates k-node paths maximum value of #ctxm selectCx(m.ST)sm for all m. in the OAG,leading to the receiver objects of m. Figure 6 illustrates the mechanism with a simple example.The allocation sites are labeled B1.B2,and C1,respectively.Suppose we 3.3 Overall Approach compute 2obj contexts for method m()(line 12)and its receiver Figure 5 shows the overall structure of the ScALER framework. object is C1(allocated at line 16).Further,B1 and B2 are two allocator SCALER is a stand-alone system that automatically selects a per- objects of C1 according to k-object-sensitivity [25,32].The possible method context-sensitivity variant that can subsequently be used (2obj)contexts of m()are [B1,C1]and [B2,C1].The corresponding to configure an existing pointer analysis tool like DooP,WALA,or OAG is given in Figure 6,which shows two 2-node paths that CHORD. are exactly the 2obj contexts for m().Since a context-insensitive Given a program p,a fast (but imprecise)context-insensitive analysis over-approximates the fully precise OAG,some edges may pointer analysis is run first to produce some basic points-to infor- be spurious.However,this is fine,since we only need an upper mation that is used for estimating #ptsm(Section 3.1.1)and #cx bound of the number of possible contexts,to establish scalability. (Section 4).The core of SCALER consists of one component that Type sensitivity is an isomorphic approximation of object sensi- decides the scalability threshold based on a given total scalability tivity [30],thus we can easily derive the contexts for type sensitivity threshold(Section 3.2)and another component that chooses context with the OAG.Following the definition [32],SCALER computes the sensitivity strategies for all methods in the program (Section 3.1.2). contexts for type sensitivity by merely replacing objects in contexts The framework relies on the collection C of context-sensitivity (as computed from the OAG)by the types that contain the alloca- variants.All that is needed for each variant is a mechanism for tion sites of the objects.For example,to compute the contexts for estimating the number of contexts,as presented in Section 4 for method m()(Figure 6)under 2type,SCALER first obtains the 2obj object-sensitivity and type-sensitivity. contexts,i.e.,[B1,C1]and [B2,C1],then replaces B1 and B2 by type A,and C1 by type B.As a result,there is only one context of m() 4 ESTIMATING THE NUMBER OF CONTEXTS under 2type,i.e.,[A,B]. In Section 3.1 we postponed the discussion of how to compute an 5 IMPLEMENTATION upper bound on the number of contexts(#ctx),for a given variant We have implemented SCALER as a stand-alone open-source tool in of context sensitivity c,using only context-insensitive analysis results.This computation is one of the key elements of SCALER and Java,available at http://www.brics.dk/scaler/.SCALER is designed to distinguishes it from prior approaches that have also tried to adapt work with various pointer analysis frameworks such as Doop [4], context sensitivity on a per-method basis [14,33,41]. WALA [7],CHORD [26],and SooT [40].To demonstrate its effective- The computation of the possible contexts for the context-sensitive ness,we have integrated ScALER with Doop [4],a state-of-the-art pointer analysis framework for Java analysis of a method,from only context-insensitive analysis results. is relatively easy for simple variants of context sensitivity,such as In practice,we found that the #ctxfn#ptsm values of some Java call-site sensitivity [28]:the possible contexts are call sites,readily collection methods under package java.util.are very large,be- identifiable in the program code.This computation is nontrivial cause(1)the collection methods are frequently called,thus their for object-and type-sensitivity[24,32],however:the contexts of a #ctx values are large,and (2)many objects are passed to the method are abstract objects,determined by the analysis mechanism. AWe do not consider call-site sensitive analyses(1call,2call)or 1-object sensitivity We are not aware of any prior work that performs a similar com- (1obj)nour evaluation.as these analysis variants are typically both less precise and putation of possible contexts for object-sensitive or type-sensitive less scalable than at least one of the analyses in our setup.The SCALER approach can be analyses,without running the analyses themselves. adapted to any collection of context sensitivity variants,as long as a relative ordering in both increasing precision and cost is possible. 133Scalability-First Pointer Analysis with Self-Tuning Context-Sensitivity ESEC/FSE ’18, November 4–9, 2018, Lake Buena Vista, FL, USA method A1 A2 A3 A1 + A2 + A3 ≤ TST is automatically computed based on the above inequality 1 method 2000 method 4000 method 10 000 0 10 000 1 000 000 100 000 STp #ctx c m·#pts m STp 2type 1type c 2obj = c = c = E(STp) = Figure 4: Selecting the scalability threshold (STp ) based on the total scalability threshold (TST). higher STp values may also be viable. The maximum such STp is computed using binary search in a range between 0 and the maximum value of #ctx SelectCtx(m,STp ) m · #ptsm for all m. 3.3 Overall Approach Figure 5 shows the overall structure of the Scaler framework. Scaler is a stand-alone system that automatically selects a permethod context-sensitivity variant that can subsequently be used to configure an existing pointer analysis tool like Doop, Wala, or Chord. Given a program p, a fast (but imprecise) context-insensitive pointer analysis is run first to produce some basic points-to information that is used for estimating #ptsm (Section 3.1.1) and #ctxc m (Section 4). The core of Scaler consists of one component that decides the scalability threshold based on a given total scalability threshold (Section 3.2) and another component that chooses context sensitivity strategies for all methods in the program (Section 3.1.2). The framework relies on the collection C of context-sensitivity variants. All that is needed for each variant is a mechanism for estimating the number of contexts, as presented in Section 4 for object-sensitivity and type-sensitivity. 4 ESTIMATING THE NUMBER OF CONTEXTS In Section 3.1 we postponed the discussion of how to compute an upper bound on the number of contexts (#ctxc m), for a given variant of context sensitivity c, using only context-insensitive analysis results. This computation is one of the key elements of Scaler and distinguishes it from prior approaches that have also tried to adapt context sensitivity on a per-method basis [14, 33, 41]. The computation of the possible contexts for the context-sensitive analysis of a method, from only context-insensitive analysis results, is relatively easy for simple variants of context sensitivity, such as call-site sensitivity [28]: the possible contexts are call sites, readily identifiable in the program code. This computation is nontrivial for object- and type-sensitivity[24, 32], however: the contexts of a method are abstract objects, determined by the analysis mechanism. We are not aware of any prior work that performs a similar computation of possible contexts for object-sensitive or type-sensitive analyses, without running the analyses themselves. Scaler performs this computation by leveraging the object allocation graph (OAG) structure proposed by Tan et al. [37]. With the OAG, the context computation problem can be formulated as a graph traversal problem. For any program, based on an OAG derived from pre-analysis (context-insensitive pointer analysis), Scaler computes the number of contexts (#ctxc m) of every method for each kind of context sensitivity c used (2obj, 2type and 1type in our setup) by enumerating all contexts of the method.4 The OAG of a program is a directed graph. A node of the OAG represents an abstract object, which is identified by its allocation site in the program. An edge of the OAG, say o1 → o2, represents an object-allocation relation between o1 and o2, i.e., o1 is a receiver object of the method that contains the allocation site of o2. Scaler leverages the pre-analysis to build the OAG for the given program. The OAG provides a graphical perspective of object- (and type-) sensitivity, i.e., a k-depth context in object-sensitivity corresponds to a k-node path in the OAG [37]. Thus, to compute k-object-sensitive contexts of a method m, Scaler simply enumerates k-node paths in the OAG, leading to the receiver objects of m. Figure 6 illustrates the mechanism with a simple example. The allocation sites are labeled B1, B2, and C1, respectively. Suppose we compute 2obj contexts for method m() (line 12) and its receiver object is C1 (allocated at line 16). Further, B1 and B2 are two allocator objects of C1 according to k-object-sensitivity [25, 32]. The possible (2obj) contexts of m() are [B1,C1] and [B2,C1]. The corresponding OAG is given in Figure 6, which shows two 2-node paths that are exactly the 2obj contexts for m(). Since a context-insensitive analysis over-approximates the fully precise OAG, some edges may be spurious. However, this is fine, since we only need an upper bound of the number of possible contexts, to establish scalability. Type sensitivity is an isomorphic approximation of object sensitivity [30], thus we can easily derive the contexts for type sensitivity with the OAG. Following the definition [32], Scaler computes the contexts for type sensitivity by merely replacing objects in contexts (as computed from the OAG) by the types that contain the allocation sites of the objects. For example, to compute the contexts for method m() (Figure 6) under 2type, Scaler first obtains the 2obj contexts, i.e., [B1,C1] and [B2,C1], then replaces B1 and B2 by type A, and C1 by type B. As a result, there is only one context of m() under 2type, i.e., [A,B]. 5 IMPLEMENTATION We have implemented Scaler as a stand-alone open-source tool in Java, available at http://www.brics.dk/scaler/. Scaler is designed to work with various pointer analysis frameworks such as Doop [4], Wala [7], Chord [26], and Soot [40]. To demonstrate its effectiveness, we have integrated Scaler with Doop [4], a state-of-the-art pointer analysis framework for Java. In practice, we found that the #ctxc m·#ptsm values of some Java collection methods under package java.util.* are very large, because (1) the collection methods are frequently called, thus their #ctxc m values are large, and (2) many objects are passed to the 4We do not consider call-site sensitive analyses (1call, 2call) or 1-object sensitivity (1obj) in our evaluation, as these analysis variants are typically both less precise and less scalable than at least one of the analyses in our setup. The Scaler approach can be adapted to any collection of context sensitivity variants, as long as a relative ordering in both increasing precision and cost is possible. 133