A Principled Approach to Selective Context Sensitivity for Pointer Analysis 1:17 from a context-insensitive pre-analysis as described in Section 4.If #ptsm for a method m is large, it indicates that analyzing m context-sensitively would likely produce many context-sensitive points-to facts [Li et al.2018b]resulting in a high analysis time. However,we find that simply using #ptsm to decide whether a given method m is efficiency- critical does not work well:the analysis becomes both inefficient and imprecise for some programs as the wrong methods are selected for context sensitivity.At the same time,we also do not want to complicate ZIpPERe by incorporating multiple configurable parameters as in existing work [Hassanshahi et al.2017;Smaragdakis et al.2014;Wei and Ryder 2015]. The precision-loss model introduced in Section 3 enables us to address this problem.Recall that the precision-critical methods are identified based on the granularity of each class(from its IN methods to its Our methods).This means that,for each class,if one method that is involved in the precision-loss flows(or their combinations)is not analyzed with context sensitivity,we may still lose the precision even if all the remaining involved methods are analyzed context-sensitively. Based on this key observation,in ZIPPER we identify efficiency-critical methods using #ptsm but at the granularity of classes rather than methods.This means that we use the sum of #ptsm of all the methods m that are involved in the precision-loss flows(or their combinations)when analyzing class c,and we denote this parameter as #ptse(note that m may belong to a class other than c,as explained in Section 3).For each class c,all its involved methods can be treated as one: either all or none of its methods are analyzed with context sensitivity. How to Identify Efficiency-Critical Methods Using #ptse?When #ptse for a class c is large,i.e., exceeds a certain threshold,ZIPpERe excludes all its related methods that are involved in the precision-loss flows from context sensitivity,thereby preventing them from hurting the analysis speed.Note that,as explained above,#ptse is not the points-to size for c;instead,it represents the cumulative points-to size of all precision-critical methods involved in the precision-loss flows starting from and ending in class c. The last problem is how to select a shared threshold for different programs.A fixed value will not work well as a constant threshold may be too large for small or simple programs,or too small for large or complex programs.To resolve this problem,instead of a fixed value,we consider the threshold as a percentage value,PV.That is,for each class c in program p,if(#ptsc/#ptsp)>PV (where #ptsp is the sum of the sizes of the points-to sets for all variables in p),we say that #ptse is too large relative to the points-to size of the whole program,and thus consider the related methods in c to be efficiency-critical methods.With this heuristic,we only need one threshold,i.e.,PV,for all programs.By default PV is set to 5%. 5.2 The ZIPPER Algorithm Given the background and insights presented in Section 5.1,in this section we explain how ZIPPERe works.Algorithm 4 describes how ZIPpERe selects a set of methods,i.e.,Mcs (the output of the algorithm),that will be analyzed context-sensitively in the main analysis.Briefly,they are the remaining methods after excluding the efficiency-critical methods from the precision-critical methods identified by the approach as used in ZIPPER.Thus Algorithm 4 extends Algorithm 3. As inputs of the algorithm,OFG is the object flow graph,which is built by the pre-analysis as described in Section 4.2,and PV is the shared threshold for all programs as described in Section 5.1. In line 1,TH is per program and denotes the threshold for the given program p,representing the efficiency-critical bound for the sizes of the points-to set of each class c in p.As illustrated in Section 5.1,when #ptse is larger than TH,ZIPPERe will consider all precision-critical methods related to class c to be efficiency-critical.As shown in line 6,#ptse is the size of the points-to sets ACM Trans.Program.Lang.Syst.,Vol.1,No.1,Article 1.Publication date:January 2020.A Principled Approach to Selective Context Sensitivity for Pointer Analysis 1:17 from a context-insensitive pre-analysis as described in Section 4. If #pts𝑚 for a method 𝑚 is large, it indicates that analyzing 𝑚 context-sensitively would likely produce many context-sensitive points-to facts [Li et al. 2018b] resulting in a high analysis time. However, we find that simply using #pts𝑚 to decide whether a given method 𝑚 is efficiencycritical does not work well: the analysis becomes both inefficient and imprecise for some programs as the wrong methods are selected for context sensitivity. At the same time, we also do not want to complicate Zipper𝑒 by incorporating multiple configurable parameters as in existing work [Hassanshahi et al. 2017; Smaragdakis et al. 2014; Wei and Ryder 2015]. The precision-loss model introduced in Section 3 enables us to address this problem. Recall that the precision-critical methods are identified based on the granularity of each class (from its In methods to its Out methods). This means that, for each class, if one method that is involved in the precision-loss flows (or their combinations) is not analyzed with context sensitivity, we may still lose the precision even if all the remaining involved methods are analyzed context-sensitively. Based on this key observation, in Zipper𝑒 we identify efficiency-critical methods using #pts𝑚 but at the granularity of classes rather than methods. This means that we use the sum of #pts𝑚 of all the methods 𝑚 that are involved in the precision-loss flows (or their combinations) when analyzing class 𝑐, and we denote this parameter as #pts𝑐 (note that 𝑚 may belong to a class other than 𝑐, as explained in Section 3). For each class 𝑐, all its involved methods can be treated as one: either all or none of its methods are analyzed with context sensitivity. How to Identify Efficiency-Critical Methods Using #pts𝑐 ? When #pts𝑐 for a class 𝑐 is large, i.e., exceeds a certain threshold, Zipper𝑒 excludes all its related methods that are involved in the precision-loss flows from context sensitivity, thereby preventing them from hurting the analysis speed. Note that, as explained above, #pts𝑐 is not the points-to size for 𝑐; instead, it represents the cumulative points-to size of all precision-critical methods involved in the precision-loss flows starting from and ending in class 𝑐. The last problem is how to select a shared threshold for different programs. A fixed value will not work well as a constant threshold may be too large for small or simple programs, or too small for large or complex programs. To resolve this problem, instead of a fixed value, we consider the threshold as a percentage value, PV. That is, for each class 𝑐 in program 𝑝, if (#pts𝑐 / #pts𝑝 ) > PV (where #pts𝑝 is the sum of the sizes of the points-to sets for all variables in 𝑝), we say that #pts𝑐 is too large relative to the points-to size of the whole program, and thus consider the related methods in 𝑐 to be efficiency-critical methods. With this heuristic, we only need one threshold, i.e., PV, for all programs. By default PV is set to 5%. 5.2 The Zipper𝑒 Algorithm Given the background and insights presented in Section 5.1, in this section we explain how Zipper𝑒 works. Algorithm 4 describes how Zipper𝑒 selects a set of methods, i.e., Mcs (the output of the algorithm), that will be analyzed context-sensitively in the main analysis. Briefly, they are the remaining methods after excluding the efficiency-critical methods from the precision-critical methods identified by the approach as used in Zipper. Thus Algorithm 4 extends Algorithm 3. As inputs of the algorithm, OFG is the object flow graph, which is built by the pre-analysis as described in Section 4.2, and PV is the shared threshold for all programs as described in Section 5.1. In line 1, TH is per program and denotes the threshold for the given program 𝑝, representing the efficiency-critical bound for the sizes of the points-to set of each class 𝑐 in 𝑝. As illustrated in Section 5.1, when #pts𝑐 is larger than TH, Zipper𝑒 will consider all precision-critical methods related to class 𝑐 to be efficiency-critical. As shown in line 6, #pts𝑐 is the size of the points-to sets ACM Trans. Program. Lang. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2020