1:16 Yue Li,Tian Tan,Anders Moller,and Yannis Smaragdakis The precise statements of Algorithms 1 and 2 capture the design choices of ZIPPER.Inferences on flow patterns are made on a per-class basis,and context sensitivity is applied on a per-method basis.It is easy to imagine applying context sensitivity at a finer granularity.That is,we could apply context sensitivity to only the variables and object fields that are involved in the flows in the precision-loss patterns(i.e.,the nodes stored in FlowNodes in Algorithm 2)instead of the entire containing methods.In this way,although within the same precision-critical methods,other variables and object fields that are irrelevant to precision-loss patterns can be analyzed context- insensitively,which may lead to better efficiency.For simplicity,in this article we only consider context sensitivity at the granularity of methods,and leave the potential of more refined options for future work. 5 ZIPPER EXPRESS In Section 3,we have explained where precision is lost in a context-insensitive pointer analysis in terms of the three patterns of precision-loss flows.Based on this principle,ZIPpER(Section 4)has been designed to apply context sensitivity only to the identified precision-critical methods,while analyzing the remaining methods context-insensitively.ZIPPER fulfills its goal(as demonstrated in Section 6.1):it is able to preserve virtually all of the precision of highly-precise conventional context-sensitive pointer analyses while making them run faster.One can use ZIPPER as a full replacement for a conventional context-sensitive pointer analysis such as 2obj. However,ZIPPER's precision-preserving principle is very strict in the sense that it will not sacrifice any precision for improving efficiency.On one hand,this design choice is suitable for validating whether our precision-loss model(Section 3)can help identify thoroughly the precision- critical methods that capture the vast majority of precision of full context sensitivity;on the other hand,it may make ZIPPER unscalable for some complex programs. To address this problem,in this section,we present a new pointer analysis,ZIPPER express (ZIPPERe).ZIPPERe makes a new precision and efficiency trade-off,running significantly faster while being only slightly less precise than ZIPPER.Consequently,ZIPPERe is a new sweet spot in state-of-the-art pointer analysis for Java,with great efficiency and good precision. 5.1 Insights of ZIPPER Among the precision-critical methods identified by ZIPPER,some may significantly degrade analysis efficiency if treated context sensitively.Therefore,to make the analysis run fast,we need to find those efficiency-critical methods.The idea behind ZIPPERe is to apply context sensitivity only to the precision-critical methods that are not efficiency-critical,i.e.,those that do not significantly hurt analysis efficiency. How to Identify Efficiency-Critical Methods?In ZIPPERe,we consider the size of the points-to set for each method m,denoted as #ptsm,as an important factor for determining whether a method may incur serious efficiency problems when being analyzed context sensitively.Here the size of the points-to set for a method m means the sum of the sizes of the points-to sets for all the variables in m.As the analysis is built on top of the three-address code of a program,an extra variable is introduced when a parameter or field in the program is referenced in this code format.Therefore, the variables in a method include all original local variables,as well as those extra variables. Leveraging #ptsm to estimate the efficiency cost for pointer analysis has been adopted in existing literature [Smaragdakis et al.2014],and there are two reasons to consider this metric.First,the efficiency of a context-sensitive pointer analysis is directly correlated with the number of its generated points-to facts:the larger the number,the more memory and analysis iterations are needed,and,accordingly,more analysis time is spent.Second,#ptsm can easily be approximated ACM Trans.Program.Lang.Syst.,Vol.1,No.1,Article 1.Publication date:January 2020.1:16 Yue Li, Tian Tan, Anders Møller, and Yannis Smaragdakis The precise statements of Algorithms 1 and 2 capture the design choices of Zipper. Inferences on flow patterns are made on a per-class basis, and context sensitivity is applied on a per-method basis. It is easy to imagine applying context sensitivity at a finer granularity. That is, we could apply context sensitivity to only the variables and object fields that are involved in the flows in the precision-loss patterns (i.e., the nodes stored in FlowNodes in Algorithm 2) instead of the entire containing methods. In this way, although within the same precision-critical methods, other variables and object fields that are irrelevant to precision-loss patterns can be analyzed contextinsensitively, which may lead to better efficiency. For simplicity, in this article we only consider context sensitivity at the granularity of methods, and leave the potential of more refined options for future work. 5 ZIPPER EXPRESS In Section 3, we have explained where precision is lost in a context-insensitive pointer analysis in terms of the three patterns of precision-loss flows. Based on this principle, Zipper (Section 4) has been designed to apply context sensitivity only to the identified precision-critical methods, while analyzing the remaining methods context-insensitively. Zipper fulfills its goal (as demonstrated in Section 6.1): it is able to preserve virtually all of the precision of highly-precise conventional context-sensitive pointer analyses while making them run faster. One can use Zipper as a full replacement for a conventional context-sensitive pointer analysis such as 2obj. However, Zipper’s precision-preserving principle is very strict in the sense that it will not sacrifice any precision for improving efficiency. On one hand, this design choice is suitable for validating whether our precision-loss model (Section 3) can help identify thoroughly the precisioncritical methods that capture the vast majority of precision of full context sensitivity; on the other hand, it may make Zipper unscalable for some complex programs. To address this problem, in this section, we present a new pointer analysis, Zipper express (Zipper𝑒 ). Zipper𝑒 makes a new precision and efficiency trade-off, running significantly faster while being only slightly less precise than Zipper. Consequently, Zipper𝑒 is a new sweet spot in state-of-the-art pointer analysis for Java, with great efficiency and good precision. 5.1 Insights of Zipper𝑒 Among the precision-critical methods identified by Zipper, some may significantly degrade analysis efficiency if treated context sensitively. Therefore, to make the analysis run fast, we need to find those efficiency-critical methods. The idea behind Zipper𝑒 is to apply context sensitivity only to the precision-critical methods that are not efficiency-critical, i.e., those that do not significantly hurt analysis efficiency. How to Identify Efficiency-Critical Methods? In Zipper𝑒 , we consider the size of the points-to set for each method 𝑚, denoted as #pts𝑚, as an important factor for determining whether a method may incur serious efficiency problems when being analyzed context sensitively. Here the size of the points-to set for a method 𝑚 means the sum of the sizes of the points-to sets for all the variables in 𝑚. As the analysis is built on top of the three-address code of a program, an extra variable is introduced when a parameter or field in the program is referenced in this code format. Therefore, the variables in a method include all original local variables, as well as those extra variables. Leveraging #pts𝑚 to estimate the efficiency cost for pointer analysis has been adopted in existing literature [Smaragdakis et al. 2014], and there are two reasons to consider this metric. First, the efficiency of a context-sensitive pointer analysis is directly correlated with the number of its generated points-to facts: the larger the number, the more memory and analysis iterations are needed, and, accordingly, more analysis time is spent. Second, #pts𝑚 can easily be approximated ACM Trans. Program. Lang. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2020