modifications.MAHJONG is simple conceptually and drops 1Ax=newA();/∥o 10 class A easily on any allocation-site-based points-to analysis. 2Ay=newA();/∥o 11Af: Compared to the allocation-site abstraction,MAHJONG 3 A z new A();//o 12 void foo(){... 133 allows a points-to analysis to run significantly faster while 4 x.f new B();//of 14 class B extends A achieving nearly the same precision for type-dependent 5 y.f new c();/o 15 void foo(){... clients.Thus,MAHJONG makes it possible to accelerate a 6z.f=newc();/o唱 16} given points-to analysis or replace it with a more precise 7 Aa z.f; 17 class C extends A but usually more costly points-to analysis that is either in- 8 a.foo(); 18 void foo(){... efficient or unscalable if the allocation-site abstraction is 9C c=(C)a; 19} used.MAHJONG is expected to provide significant benefits to many program analyses,such as bug detection,security Figure 1.An example program illustrating object merging. analysis,program verification and program understanding, where call graphs are required [3,5,7,16,26,31,32,43, 54,55]. able.In this paper,we aim to improve this by looking for We demonstrate the effectiveness of MAHJONG by dis- a lightweight alternative that satisfies the needs of type- cussing some insights on why it is a better alternative of dependent clients,but not necessarily others such as may- the allocation-site abstraction for type-dependent clients and alias.To this end,we would like to avoid distinguishing two conducting an evaluation extensively on 12 large Java pro- objects if merging them loses no or little precision. grams with five widely used context-sensitive points-to anal- In Section 2.1,we see that blindly merging objects of yses and three significant type-dependent clients,call graph the same type is ineffective.In Section 2.2,we describe construction,devirtualization and may-fail casting [20,22, our solution that merges objects representing equivalent au- 39,40,42].Take,3obj,a 3-object-sensitive points-to anal- tomata only.For object-oriented programs,merging objects ysis [29],the most precise one used in our evaluation,as amounts to merging their corresponding allocation sites. an example.For the four programs that can be analyzed scalably under 3obj,our MAHJONG-based 3obj runs 131X 2.1 Allocation-Type Abstraction:A Naive Solution faster,on average,while achieving nearly the same precision In this so-called allocation-type abstraction,all objects with for all the three clients.For the remaining eight,where 3obj the same type are merged,with one object per type.As is unscalable in 5 hours each,our MAHJONG-based 3obj can previously noted,this naive solution often gains efficiency analyze five of them in an average of 33.42 minutes. but may incur a significant loss of precision [19,27,38.51]. In summary,our paper makes the following contributions: We present MAHJONG,a new heap abstraction that can Example 2.1.Consider Figure 1,where of represents the significantly scale an allocation-site-based points-to anal- abstract object of type t created at the allocation site at line ysis for object-oriented programs while achieving nearly i.We will use this notation in the rest of the paper. the same precision for type-dependent clients. For the three type-dependent clients,call graph construc- tion,devirtualization and may-fail casting,only lines 8-9 We formulate the problem of checking the type-consistency are relevant.According to an allocation-site-based Ander of two objects as one of testing the equivalence of two sen's points-to analysis [4].z,y and z point to of,o2 and automata,which is solvable in almost linear time. og,respectively.As r.f,y.f and z.f are not aliases,a points We implement MAHJONG as a stand-alone open-source to of.Thus,a.foo()at line 8 is a mono-call and can thus be tool.MAHJONG is simple (with only 1500 LOC of Java devirtualized,and in addition.the cast(C)at line 9 is safe. in total)and drops easily on any allocation-site-based However,if of,o2 and og are merged,then r.f,y.f and points-to analysis. 2.f will be aliases,causing a to also point too.As a result, We conduct extensive experiments to evaluate the effec- a.foo()becomes a poly-call and thus non-devirtualizable.In tiveness of MAHJONG in practice. addition,the cast (C)is no longer considered safe Consider pmd,a program analyzed by (1)3obj-a 3- 2.Motivation object-sensitive points-to analysis [29]using the allocation- For points-to analysis,type-dependent clients,such as call site abstraction,(2)T-3obj-3obj using the allocation-type graph construction,devirtualization and may-fail casting, abstraction,and (3)M-3obj-3obj using the MAHJONG share similar needs:their precision depends on the types heap abstraction introduced in this paper.For 3obj,pmd of pointed-to objects rather than the pointed-to objects is analyzed in 14469.3 seconds,allowing 44004 call graph themselves.For such clients,the conventional allocation- edges to be discovered.T-3obj is the fastest (50.3 seconds), site abstraction is often too fine-grained,contributing little but is the most imprecise(50666 call graph edges).In con- to improving their precision but rendering the underlying trast,M-3obj is as precise as 3obj(44016 call graph edges) points-to analysis unduly inefficient or eventually unscal- but is also nearly as fast as T-3obj(127.7 seconds). 279modifications. MAHJONG is simple conceptually and drops easily on any allocation-site-based points-to analysis. Compared to the allocation-site abstraction, MAHJONG allows a points-to analysis to run significantly faster while achieving nearly the same precision for type-dependent clients. Thus, MAHJONG makes it possible to accelerate a given points-to analysis or replace it with a more precise but usually more costly points-to analysis that is either inefficient or unscalable if the allocation-site abstraction is used. MAHJONG is expected to provide significant benefits to many program analyses, such as bug detection, security analysis, program verification and program understanding, where call graphs are required [3, 5, 7, 16, 26, 31, 32, 43, 54, 55]. We demonstrate the effectiveness of MAHJONG by discussing some insights on why it is a better alternative of the allocation-site abstraction for type-dependent clients and conducting an evaluation extensively on 12 large Java programs with five widely used context-sensitive points-to analyses and three significant type-dependent clients, call graph construction, devirtualization and may-fail casting [20, 22, 39, 40, 42]. Take, 3obj, a 3-object-sensitive points-to analysis [29], the most precise one used in our evaluation, as an example. For the four programs that can be analyzed scalably under 3obj, our MAHJONG-based 3obj runs 131X faster, on average, while achieving nearly the same precision for all the three clients. For the remaining eight, where 3obj is unscalable in 5 hours each, our MAHJONG-based 3obj can analyze five of them in an average of 33.42 minutes. In summary, our paper makes the following contributions: • We present MAHJONG, a new heap abstraction that can significantly scale an allocation-site-based points-to analysis for object-oriented programs while achieving nearly the same precision for type-dependent clients. • We formulate the problem of checking the type-consistency of two objects as one of testing the equivalence of two automata, which is solvable in almost linear time. • We implement MAHJONG as a stand-alone open-source tool. MAHJONG is simple (with only 1500 LOC of Java in total) and drops easily on any allocation-site-based points-to analysis. • We conduct extensive experiments to evaluate the effectiveness of MAHJONG in practice. 2. Motivation For points-to analysis, type-dependent clients, such as call graph construction, devirtualization and may-fail casting, share similar needs: their precision depends on the types of pointed-to objects rather than the pointed-to objects themselves. For such clients, the conventional allocationsite abstraction is often too fine-grained, contributing little to improving their precision but rendering the underlying points-to analysis unduly inefficient or eventually unscal- 1 A x = new A(); // o A 1 2 A y = new A(); // o A 2 3 A z = new A(); // o A 3 4 x.f = new B(); // o B 4 5 y.f = new C(); // o C 5 6 z.f = new C(); // o C 6 7 A a = z.f; 8 a.foo(); 9 C c = (C) a; 10 class A { 11 A f; 12 void foo() {...} 13 } 14 class B extends A { 15 void foo() {...} 16 } 17 class C extends A { 18 void foo() {...} 19 } Figure 1. An example program illustrating object merging. able. In this paper, we aim to improve this by looking for a lightweight alternative that satisfies the needs of typedependent clients, but not necessarily others such as mayalias. To this end, we would like to avoid distinguishing two objects if merging them loses no or little precision. In Section 2.1, we see that blindly merging objects of the same type is ineffective. In Section 2.2, we describe our solution that merges objects representing equivalent automata only. For object-oriented programs, merging objects amounts to merging their corresponding allocation sites. 2.1 Allocation-Type Abstraction: A Naive Solution In this so-called allocation-type abstraction, all objects with the same type are merged, with one object per type. As previously noted, this naive solution often gains efficiency but may incur a significant loss of precision [19, 27, 38, 51]. Example 2.1. Consider Figure 1, where o t i represents the abstract object of type t created at the allocation site at line i. We will use this notation in the rest of the paper. For the three type-dependent clients, call graph construction, devirtualization and may-fail casting, only lines 8 – 9 are relevant. According to an allocation-site-based Andersen’s points-to analysis [4], x, y and z point to o A 1 , o A 2 and o A 3 , respectively. As x.f, y.f and z.f are not aliases, a points to o C 6 . Thus, a.foo() at line 8 is a mono-call and can thus be devirtualized, and in addition, the cast (C) at line 9 is safe. However, if o A 1 , o A 2 and o A 3 are merged, then x.f, y.f and z.f will be aliases, causing a to also point to o B 4 . As a result, a.foo() becomes a poly-call and thus non-devirtualizable. In addition, the cast (C) is no longer considered safe. Consider pmd, a program analyzed by (1) 3obj—a 3- object-sensitive points-to analysis [29] using the allocationsite abstraction, (2) T-3obj—3obj using the allocation-type abstraction, and (3) M-3obj—3obj using the MAHJONG heap abstraction introduced in this paper. For 3obj, pmd is analyzed in 14469.3 seconds, allowing 44004 call graph edges to be discovered. T-3obj is the fastest (50.3 seconds), but is the most imprecise (50666 call graph edges). In contrast, M-3obj is as precise as 3obj (44016 call graph edges) but is also nearly as fast as T-3obj (127.7 seconds). 279