Making Pointer Analysis More Precise by Unleashing the Power of Selective Context Sensitivity 147:9 Variable CI S1 S2 S1&S2 Relay 1 x.f a; a o1 01 07 01 01 2 b=y.f; X 02 02 02 02 02 3 y 02,03 03 02,03 03 03 4 p.f a; p 04 04 04 04 04 5 b=q.f; q 04,05 04,05 o5 05 05 6 01 01 01 01 Fig.4.Example for illustrating precision intervention in Relay.Specifically,b is supposed to be a null pointer at run time,and only Relay can analyze it precisely in this case. as A;scales.We will further discuss the scalability of Relay-02 in Section 4.Note that Relay-02 is fundamentally different from merely intersecting the points-to results of all pointer analyses guided by the input approaches.In Relay-02,there is early precision intervention of each analysis pass to the next:the pointer analysis in each pass is filtered by the points-to results produced by the previous passes on the fly,likely preventing the computation of more spurious value flows during the analysis.As a result,Relay-02 is more precise than simply intersecting the final points-to sets after individually running each guided pointer analysis without such precision intervention. We next use an example,in Figure 4,to illustrate the benefit of precision intervention.The left side of Figure 4 is an example code snippet,where the value of a is stored into x.f and p.f(through lines 1 and 4),and b loads values from y.f and q.f (through lines 2 and 5).The table on the right side contains the points-to sets of the variables in the code snippet under different pointer analyses. Column CI gives the points-to sets produced by a context-insensitive pointer analysis,and we consider two selective context-sensitive pointer analyses,S1 and S2,which identify o2 and o4 as spurious objects for variables y and q,respectively.If we simply intersect the points-to results of S1 and S2,then we can eliminate spurious objects for y and q,as shown in column S1&S2.However,b still points to the spurious object o1 under S1&S2,as o1 can flow to b through either line 2 or line 5: both S1 and S2 consider o1 to be valid for b,for different reasons each.Among the analyses in the table,only Relay can block the two flows(through lines 2 and 5)simultaneously.In Relay,with S1 as pass 1 and S2 as pass 2,S2 identifies o4 as spurious for q so that the flow through line 5 can be blocked (as p and q are not aliased).Further benefiting from the precision intervention,the flow through line 2 is also blocked as pass 2 also incorporates the result from pass 1,i.e.,o2 is identified as spurious for y by S1(so that x and y are not aliased).As a result,o1 is identified as spurious for b in pass 2.(The points-to set of b is empty,based on the fragment and example values shown,but other program statements could give it values,orthogonally to the example-e.g.,by setting y.f or q.f.) In the overall Relay mechanism,Relay-o1 and Relay-02 apply independently to analysis passes and are mixed as necessary for scalability.The two Relay variants can be seen as the best and worst precision options,and there is design space between them for making precision and efficiency trade-offs.In our design,precision is the first priority,so for every Passi,we first attempt Relay-o1. If Relay-o1 is not scalable,we select Relay-02(which scales if the base approach does),and repeat the same scheme(i.e.,first trying Relay-01,then Relay-02 if it fails)to Passi+1,etc.until all n passes complete(assuming there are n input selective approaches). The running order for different passes of Relay does not affect the soundness and precision guarantees of Relay(including both Relay-o1 and Relay-02)and also the scalability potential of Relay-02(as proved and discussed in Section 4.4).In other words,in practice,regardless of the pass ordering,users can rely on Relay to produce an analysis that is more precise than the one(say Proc.ACM Program.Lang.,Vol.5,No.OOPSLA,Article 147.Publication date:October 2021.Making Pointer Analysis More Precise by Unleashing the Power of Selective Context Sensitivity 147:9 1 x.f = a; 2 b = y.f; 3 ... 4 p.f = a; 5 b = q.f; Variable CI S1 S2 S1&S2 Relay a o1 o1 o1 o1 o1 x o2 o2 o2 o2 o2 y o2,o3 o3 o2,o3 o3 o3 p o4 o4 o4 o4 o4 q o4,o5 o4,o5 o5 o5 o5 b o1 o1 o1 o1 Fig. 4. Example for illustrating precision intervention in Relay. Specifically, b is supposed to be a null pointer at run time, and only Relay can analyze it precisely in this case. as A𝑖 scales. We will further discuss the scalability of Relay-o2 in Section 4. Note that Relay-o2 is fundamentally different from merely intersecting the points-to results of all pointer analyses guided by the input approaches. In Relay-o2, there is early precision intervention of each analysis pass to the next: the pointer analysis in each pass is filtered by the points-to results produced by the previous passes on the fly, likely preventing the computation of more spurious value flows during the analysis. As a result, Relay-o2 is more precise than simply intersecting the final points-to sets after individually running each guided pointer analysis without such precision intervention. We next use an example, in Figure 4, to illustrate the benefit of precision intervention. The left side of Figure 4 is an example code snippet, where the value of a is stored into x.f and p.f (through lines 1 and 4), and b loads values from y.f and q.f (through lines 2 and 5). The table on the right side contains the points-to sets of the variables in the code snippet under different pointer analyses. Column CI gives the points-to sets produced by a context-insensitive pointer analysis, and we consider two selective context-sensitive pointer analyses, S1 and S2, which identify o2 and o4 as spurious objects for variables y and q, respectively. If we simply intersect the points-to results of S1 and S2, then we can eliminate spurious objects for y and q, as shown in column S1&S2. However, b still points to the spurious object o1 under S1&S2, as o1 can flow to b through either line 2 or line 5: both S1 and S2 consider o1 to be valid for b, for different reasons each. Among the analyses in the table, only Relay can block the two flows (through lines 2 and 5) simultaneously. In Relay, with S1 as pass 1 and S2 as pass 2, S2 identifies o4 as spurious for q so that the flow through line 5 can be blocked (as p and q are not aliased). Further benefiting from the precision intervention, the flow through line 2 is also blocked as pass 2 also incorporates the result from pass 1, i.e., o2 is identified as spurious for y by S1 (so that x and y are not aliased). As a result, o1 is identified as spurious for b in pass 2. (The points-to set of b is empty, based on the fragment and example values shown, but other program statements could give it values, orthogonally to the exampleÐe.g., by setting y.f or q.f.) In the overall Relay mechanism, Relay-o1 and Relay-o2 apply independently to analysis passes and are mixed as necessary for scalability. The two Relay variants can be seen as the best and worst precision options, and there is design space between them for making precision and efficiency trade-offs. In our design, precision is the first priority, so for every Pass𝑖 , we first attempt Relay-o1. If Relay-o1 is not scalable, we select Relay-o2 (which scales if the base approach does), and repeat the same scheme (i.e., first trying Relay-o1, then Relay-o2 if it fails) to Pass𝑖+1, etc. until all 𝑛 passes complete (assuming there are 𝑛 input selective approaches). The running order for different passes of Relay does not affect the soundness and precision guarantees of Relay (including both Relay-o1 and Relay-o2) and also the scalability potential of Relay-o2 (as proved and discussed in Section 4.4). In other words, in practice, regardless of the pass ordering, users can rely on Relay to produce an analysis that is more precise than the one (say Proc. ACM Program. Lang., Vol. 5, No. OOPSLA, Article 147. Publication date: October 2021