ESEC/FSE '18, November 4–9, 2018, Lake Buena Vista, FL, USA    Yue Li, Tian Tan, Anders Møller, and Yannis Smaragdakis

Table 2: Performance of Scaler.

Program                 jython  soot  pmd  briss  jedit  eclipse  findbugs  chart  luindex  lusearch  avg.
Scaler time (seconds)     35.0   1.8  0.8    2.6    1.4      1.0       0.5    0.4      0.2       0.3   4.4

runs faster than Scaler, thus also exhibiting good scalability; however, Scaler's precision is significantly better than IntroA's (Scaler wins in precision in all the precision metrics for all the programs).

IntroB exhibits better precision than IntroA in all cases (when it is scalable), but it is still less precise than Scaler in all cases except #call-edges for soot and #reach-methods for chart. Regarding efficiency, Scaler runs faster than IntroB for most programs (except eclipse, chart, and luindex); in addition, IntroB is not scalable for two programs (jython and briss).

Answer to RQ2: Compared with the state-of-the-art adaptive analyses, the introspective analyses IntroA and IntroB, both Scaler and IntroA exhibit extremely good scalability, while Scaler's precision is significantly better than IntroA's; and Scaler's scalability is better than IntroB's while being more precise in most cases.
6.3 Scaler's Effectiveness as a Pre-Analysis

This section answers RQ3 by examining the overhead of Scaler's adaptivity, the STp values selected by Scaler for each program, and the corresponding distribution of the different kinds of context sensitivity based on the selected STp.

6.3.1 Overhead of Scaler. The overall overhead of the Scaler framework (Figure 5) consists of two components: (1) a context-insensitive pointer analysis CI, which provides points-to information to Scaler, and (2) the Scaler logic itself, which performs TST-based STp selection and STp-based context-sensitivity assignment. Scaler's pre-analysis time (CI) is given in Table 1, and Table 2 shows the overhead of Scaler itself. The average overall overhead of Scaler per program is 183.8 seconds, of which the CI pre-analysis accounts for the vast majority (179.4 seconds), while the Scaler logic costs only 4.4 seconds. Considering the significant scalability improvements achieved by Scaler, its overhead is negligible.

Scaler spends 35 seconds on jython, markedly longer than on the other programs (Table 2). The reason is that jython is especially complicated in the sense that many of its methods have enormous numbers of contexts, so Scaler spends much time on #ctx_m^c computation. For instance, Scaler's #ctx_m^c computation shows that 130 methods of jython have more than 1 000 000 contexts under 2obj. (For comparison, the maximal #ctx_m^c value of a single method under 2obj in soot, the largest program in our evaluation, is only 88 473.) In this way, Scaler also reveals why many context-sensitive pointer analyses fail to analyze jython scalably, as reported in previous work [15, 32, 37, 38]. Scaler avoids the problem due to its STp-based context-sensitivity design (Section 3.1.2).

6.3.2 STp Values and Context-Sensitivity Distributions.
Figure 7 gives the STp values selected by Scaler for each program according to the default TST (30M), and the distribution of the different kinds of context sensitivity over the methods of each program based on the selected STp.

[Figure 7: The STp value (on top of each bar) computed by Scaler for each program and the distribution of the different kinds of context-sensitivity (2obj, 2type, 1type, CI) selected according to each STp. The selected STp values are: jython 16 607; soot 0; pmd 35 423; briss 2 735; jedit 17 099; eclipse 8 307; findbugs 78 699; chart 108 627; luindex 35 080 733; lusearch 9 354 949.]

Generally, given the same TST, Scaler automatically selects small STp values for complex programs (e.g., soot and briss), medium STp values for medium-complexity programs (findbugs and chart), and large STp values for simple programs (luindex and lusearch).

Scaler automatically selects 2obj, the most precise context-sensitivity in our experiments, for most methods (80.6% per program on average). This is why Scaler-guided pointer analysis achieves very good precision, as shown in Table 1. It also demonstrates that our insight about scalability-critical methods (Section 3.1.1) holds in practice: in most cases, only a small set of scalability-critical methods makes the whole analysis unscalable.

We next discuss two outlier cases, at opposite ends of the spectrum, since they are informative of Scaler's limit behavior.

soot. Scaler selects 0 as the STp value for soot, so most of its methods are analyzed context-insensitively (CI). The reason is that soot is very large: the pre-analysis of Scaler reports that the total size of the points-to information of soot (i.e., the sum of #pts_m of all methods) is 110 901 529, which already exceeds our default TST (30M).
As a result, Scaler automatically selects 2obj for only 1 792 methods in package java.util.* (as explained in Section 5) and CI for all other methods, according to the definition of SelectCtx in Section 3.1.2, to ensure the scalability of the pointer analysis.

luindex and lusearch. Scaler selects two very large STp values for luindex and lusearch (the two simplest programs in our evaluation). Accordingly, it assigns 2obj to all methods in the two programs, which are all classified as non-scalability-critical under 2obj. There is only one exception: the method <java.lang.Object: void <init>()>. In Java's type hierarchy, Object is the supertype of all classes; thus its <init> method is called whenever a constructor is called, yielding too many contexts (#ctx_m^c) and a this variable that abstractly points to all the objects created in the program, which makes the method's #pts_m value very large. As a result, the #ctx_m^c · #pts_m value for this <init> method exceeds the selected STp value, so Scaler selects a less precise context
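The selection scheme discussed throughout this section can be sketched in a few lines. The sketch below is a minimal illustration, not the paper's implementation: the method representation, the cost model (CI contributes a single context), and the candidate enumeration are our assumptions; only the overall idea follows the text — per method, SelectCtx picks the most precise context-sensitivity c with #ctx_m^c · #pts_m ≤ STp (falling back to CI), and STp itself is the largest value whose total estimated cost fits within the TST budget.

```python
# Hypothetical sketch of Scaler's SelectCtx and TST-based STp selection
# (Section 3.1.2). Names and the cost model are illustrative assumptions.

# Context-sensitivity variants, ordered from most to least precise.
VARIANTS = ["2obj", "2type", "1type"]

def select_ctx(method, st):
    """Most precise variant c with #ctx_m^c * #pts_m <= st, else CI."""
    for c in VARIANTS:
        if method["ctx"][c] * method["pts"] <= st:
            return c
    return "CI"

def total_cost(methods, st):
    """Estimated cost: sum of #ctx_m^c * #pts_m for each method's chosen
    variant; a CI method contributes one context, i.e. just #pts_m."""
    cost = 0
    for m in methods:
        c = select_ctx(m, st)
        n_ctx = 1 if c == "CI" else m["ctx"][c]
        cost += n_ctx * m["pts"]
    return cost

def choose_stp(methods, tst):
    """Largest candidate STp whose total estimated cost stays within TST.
    Since select_ctx only changes at the per-method product values, it
    suffices to try those products (plus 0) as candidates."""
    candidates = sorted({m["ctx"][c] * m["pts"]
                         for m in methods for c in VARIANTS} | {0})
    best = 0
    for st in candidates:
        if total_cost(methods, st) <= tst:
            best = st  # total_cost is monotone in st, so keep the last fit
    return best

# Toy example: two cheap methods and one very expensive one.
methods = [
    {"pts": 10,   "ctx": {"2obj": 50,     "2type": 20,    "1type": 5}},
    {"pts": 100,  "ctx": {"2obj": 1000,   "2type": 200,   "1type": 50}},
    {"pts": 5000, "ctx": {"2obj": 100000, "2type": 40000, "1type": 9000}},
]
stp = choose_stp(methods, tst=30_000_000)
print(stp, [select_ctx(m, stp) for m in methods])
```

In this toy run the expensive third method exceeds the budget under every context-sensitive variant, so it falls back to CI while the cheap methods keep 2obj — mirroring how Scaler demotes only the scalability-critical methods such as Object's <init>.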