Two objects with the same type are type-consistent if 0 traversing from the two objects along the same sequence of field names always lead to objects of one single type. Definition 2.1(Type-Consistent Objects).Two objects,o and oj,with the same type are said to be type-consistent, denoted oi=oj,if for every sequence of field names. f =f1.f2.....fn,the following two conditions hold: Figure 2.Field points-to graph rooted at of and o. 1.{to|o∈pts(o.f)}={t[o|o∈pts(oj-f)},and 2.2 MAHJONG:Our Solution 2.{tlo]loE pts(oi.f))=1. To address the needs of type-dependent clients,MAHJONG In Figure 2,of and of are type-consistent.For the objects is designed to maximally preserve the precision of the reached from of and of,along f.f.h,g and g.k,their sets allocation-site abstraction while reaping the efficiency of the of types are (U),[){X}and (Y),respectively. allocation-type abstraction as much as possible.For a given We illustrate the intuition behind the notion of type- program,we first build a heap abstraction by performing a consistency with an example discussed below. pre-analysis,i.e.,a fast but imprecise allocation-site-based Example 2.3.Let us return to Figure 1,for which the Andersen's points-to analysis [4]and then use it to guide a subsequent points-to analysis.Based on the pre-analysis, allocation-type abstraction will merge of,o2 and o(Sec- tion 2.1).By Definition 2.1,o and o are type-consistent we define type-consistent objects that can be merged(Sec- tion 2.2.1)and formulate the problem of checking the type- (as os.f points to of and og.f points to o)but of is not type-consistent with any (as of.f points to of).After o2 consistency of two objects as one of testing the equivalence of two automata in almost linear time(Section 2.2.2). and o are merged,y.f and z.f are regarded as aliases. Thus,a will point to not only o as before but also of spu- 2.2.1 Defining Type-Consistent Objects riously.However,as os and o have the same type C,the precision of call graph construction and devirtualization at After the pre-analysis,the field points-to graph(FPG)is line 8 and may-fail casting at line 9 will not be affected. available,representing the points-to information for the ob- ject fields.To facilitate a subsequent reduction of the prob- Let us examine Definition 2.1.Condition 1 is self- lem of checking type-consistency as one of testing the equiv- explanatory in order to maximally preserve precision for alence of automata,we introduce the field points-to graph type-dependent clients.What is the rationale behind Con- rooted at an object o as go =(H,F,a,o,T,T).H is the set dition 2?The pre-analysis is fast but imprecise.Enforcing of objects reachable from o.F is the set of field names tra Condition 2 maximally avoids precision loss,as shown below. versed along the way.The points-to relations for the object Pre-Analysis A More Precise Points-to Analysis fields are defined by a field points-to map a HxF P(H).T is the set of types of the objects in H.The object- to-type map t:HT reveals the type of an object. Figure 2 gives the field points-to graphs rooted at of and o,by using the same notation for objects in Figure 1. (a)Allocation-Site (b)Allocation-Site (c)MAHONG without Abstraction Abstraction Condition 2 Example 2.2.Consider of first in Figure 2.gor =(H,F, a,o,T,).H={,,哈,o;F={f,g,h,k Figure 3.Illustrating Condition 2 in Definition 2.1. ao,fl={oMao,h={og,ao,g={哈1, and alo哈,={ogT={T,U,X,Y};and tlo1?]=T, Example 2.4.Suppose of.f and of.f point to bothox rou]=U.tlo ]=X,and t[os ]=Y.Similarly,gor can and o during the pre-analysis(Figure 3(a))but of and o3, be constructed. respectively,in a more precise allocation-site-based points- Unlike the allocation-type abstraction,where all the ob- to analysis,A(Figure 3(b)).If Condition 2 is ignored,o and of will become type-consistent according to the pre- jects with the same type are merged blindly,we will merge so-called type-consistent objects,thereby avoiding the im- analysis and thus merged into,say,of(represented by of precision introduced by the allocation-type abstraction. or of).Running A with this new abstraction will result in Let f =f1.f2.....fn,where n 0,be a sequence of precision loss,as o.f and of.f now point to objects of field names.For the field points-to graph go rooted at an types X and Y(Figure 3(c)). 口 object o,we write pts(o.f)to represent the set of objects In Definition 2.1,the type-consistency relation is an that can be reached from o along any path of points-to edges equivalence relation.It is straightforward to verify that =is labeled by fi.f2.....fn in o in that order.In Figure 2. reflexive,symmetric and transitive. pts(of.f)=tog}and pts(of.f.h)=to,o. Let H be the set of all abstract objects in the program. 2803 U 5 X 9 Y 11 g Y k 1 T o 7 Y h h f o o o o o 2 T 4 U 6 X 8 Y g f h k o o o o 1 T o 2 T G Go Figure 2. Field points-to graph rooted at o T 1 and o T 2 . 2.2 MAHJONG: Our Solution To address the needs of type-dependent clients, MAHJONG is designed to maximally preserve the precision of the allocation-site abstraction while reaping the efficiency of the allocation-type abstraction as much as possible. For a given program, we first build a heap abstraction by performing a pre-analysis, i.e., a fast but imprecise allocation-site-based Andersen’s points-to analysis [4] and then use it to guide a subsequent points-to analysis. Based on the pre-analysis, we define type-consistent objects that can be merged (Section 2.2.1) and formulate the problem of checking the typeconsistency of two objects as one of testing the equivalence of two automata in almost linear time (Section 2.2.2). 2.2.1 Defining Type-Consistent Objects After the pre-analysis, the field points-to graph (FPG) is available, representing the points-to information for the object fields. To facilitate a subsequent reduction of the problem of checking type-consistency as one of testing the equivalence of automata, we introduce the field points-to graph rooted at an object o as Go = (H, F, α, o, T , τ). H is the set of objects reachable from o. F is the set of field names traversed along the way. The points-to relations for the object fields are defined by a field points-to map α : H × F 7→ P(H). T is the set of types of the objects in H. The objectto-type map τ : H 7→ T reveals the type of an object. Figure 2 gives the field points-to graphs rooted at o T 1 and o T 2 , by using the same notation for objects in Figure 1. Example 2.2. Consider o T 2 first in Figure 2. Go T 2 = (H, F, α, oT 2 , T , τ). H = {o T 2 , oU 4 , oX 6 , oY 8 }; F = {f, g, h, k}; α[o T 2 , f] = {o U 4 }, α[o U 4 , h] = {o Y 8 }, α[o T 2 , g] = {o X 6 }, and α[o X 6 , k] = {o Y 8 }; T = {T, U, X, Y }; and τ[o T 2 ] = T, τ[o U 4 ] = U, τ[o X 6 ] = X, and τ[o Y 8 ] = Y . Similarly, Go T 1 can be constructed. Unlike the allocation-type abstraction, where all the objects with the same type are merged blindly, we will merge so-called type-consistent objects, thereby avoiding the imprecision introduced by the allocation-type abstraction. Let ¯f = f1.f2. · · · .fn, where n > 0, be a sequence of field names. For the field points-to graph Go rooted at an object o, we write pts(o. ¯f) to represent the set of objects that can be reached from o along any path of points-to edges labeled by f1, f2, . . . , fn in Go in that order. In Figure 2, pts(o T 1 .f) = {o U 3 } and pts(o T 1 .f.h) = {o Y 7 , oY 9 }. Two objects with the same type are type-consistent if traversing from the two objects along the same sequence of field names always lead to objects of one single type. Definition 2.1 (Type-Consistent Objects). Two objects, oi and oj , with the same type are said to be type-consistent, denoted oi ≡ oj , if for every sequence of field names, ¯f = f1.f2. · · · .fn, the following two conditions hold: 1. {τ[o] | o ∈ pts(oi . ¯f)} = {τ[o] | o ∈ pts(oj . ¯f)}, and 2. {τ[o] | o ∈ pts(oi . ¯f)} = 1. In Figure 2, o T 1 and o T 2 are type-consistent. For the objects reached from o T 1 and o T 2 , along f, f.h, g and g.k, their sets of types are {U}, {Y }, {X} and {Y }, respectively. We illustrate the intuition behind the notion of typeconsistency with an example discussed below. Example 2.3. Let us return to Figure 1, for which the allocation-type abstraction will merge o A 1 , o A 2 and o A 3 (Section 2.1). By Definition 2.1, o A 2 and o A 3 are type-consistent (as o A 2 .f points to o C 5 and o A 3 .f points to o C 6 ) but o A 1 is not type-consistent with any (as o A 1 .f points to o B 4 ). After o A 2 and o A 3 are merged, y.f and z.f are regarded as aliases. Thus, a will point to not only o C 6 as before but also o C 5 spuriously. However, as o C 5 and o C 6 have the same type C, the precision of call graph construction and devirtualization at line 8 and may-fail casting at line 9 will not be affected. Let us examine Definition 2.1. Condition 1 is selfexplanatory in order to maximally preserve precision for type-dependent clients. What is the rationale behind Condition 2? The pre-analysis is fast but imprecise. Enforcing Condition 2 maximally avoids precision loss, as shown below. f O T i f f Pre-Analysis A More Precise Points-to Analysis (a) Allocation-Site Abstraction (b) Allocation-Site Abstraction (c) MAHJONG without Condition 2 O X 1 O Y 2 O T i O X 1 f O T j O Y 2 O T j f f O X 1 O Y 2 O T k f f O X 1 O Y 2 Figure 3. Illustrating Condition 2 in Definition 2.1. Example 2.4. Suppose o T i .f and o T j .f point to both o X 1 and o Y 2 during the pre-analysis (Figure 3(a)) but o X 1 and o Y 2 , respectively, in a more precise allocation-site-based pointsto analysis, A (Figure 3(b)). If Condition 2 is ignored, o T i and o T j will become type-consistent according to the preanalysis and thus merged into, say, o T k (represented by o T i or o T j ). Running A with this new abstraction will result in precision loss, as o T i .f and o T j .f now point to objects of types X and Y (Figure 3(c)). In Definition 2.1, the type-consistency relation ≡ is an equivalence relation. It is straightforward to verify that ≡ is reflexive, symmetric and transitive. Let H be the set of all abstract objects in the program. 280