Making k-Object-Sensitive Pointer Analysis More Precise with Still k-Limiting Tian Tan1,Yue Lil,and Jingling Xuel.2 1 School of Computer Science and Engineering,UNSW Australia 2 Advanced Innovation Center for Imaging Technology,CNU,China Abstract.Object-sensitivity is regarded as arguably the best context abstraction for pointer analysis in object-oriented languages.However,a k-object-sensitive pointer analysis,which uses a sequence of k allocation sites (as k context elements)to represent a calling context of a method call,may end up using some context elements redundantly without in- ducing a finer partition of the space of(concrete)calling contexts for the method call.In this paper,we introduce BEAN,a general approach for improving the precision of any k-object-sensitive analysis,denoted k-obj. by still using a k-limiting context abstraction.The novelty is to identify allocation sites that are redundant context elements in k-obj from an Object Allocation Graph (OAG).which is built based on a pre-analysis (e.g.,a context-insensitive Andersen's analysis)performed initially on a program and then avoid them in the subsequent k-object-sensitive anal- ysis for the program.BEAN is generally more precise than k-obj,with a precision that is guaranteed to be as good as k-obj in the worst case.We have implemented BEAN as an open-source tool and applied it to refine two state-of-the-art whole-program pointer analyses in DooP.For two representative clients (may-alias and may-fail-cast)evaluated on a set of nine large Java programs from the DaCapo benchmark suite,BEAN has succeeded in making both analyses more precise for all these benchmarks under each client at only small increases in analysis cost. 1 Introduction Pointer analysis,as an enabling technology,plays a key role in a wide range of client applications,including bug detection [3,25,35,34],security analysis [1,13], compiler optimisation [6,33],and program understanding [12].Two major di- mensions of pointer analysis precision are flow-sensitivity and context-sensitivity. For C/C++programs,flow-sensitivity is needed by many clients [11,16,37,32]. For object-oriented programs,e.g.,Java programs,however,context-sensitivity is known to deliver trackable and useful precision [17,19-21,28-30],in general. There are two general approaches to achieving context-sensitivity for object- oriented programs,call-site-sensitivity (k-CFA)[27]and object-sensitivity [23. 24,29](among others).A k-CFA analysis represents a calling context of a method call by using a sequence of k call sites(i.e.,k labels with each denoting a call site).In contrast,a k-object-sensitive analysis uses k object allocation sites (i.e., k labels with each denoting a new statement)as context elements
Making k-Object-Sensitive Pointer Analysis More Precise with Still k-Limiting Tian Tan1 , Yue Li1 , and Jingling Xue1,2 1 School of Computer Science and Engineering, UNSW Australia 2 Advanced Innovation Center for Imaging Technology, CNU, China Abstract. Object-sensitivity is regarded as arguably the best context abstraction for pointer analysis in object-oriented languages. However, a k-object-sensitive pointer analysis, which uses a sequence of k allocation sites (as k context elements) to represent a calling context of a method call, may end up using some context elements redundantly without inducing a finer partition of the space of (concrete) calling contexts for the method call. In this paper, we introduce Bean, a general approach for improving the precision of any k-object-sensitive analysis, denoted k-obj, by still using a k-limiting context abstraction. The novelty is to identify allocation sites that are redundant context elements in k-obj from an Object Allocation Graph (OAG), which is built based on a pre-analysis (e.g., a context-insensitive Andersen’s analysis) performed initially on a program and then avoid them in the subsequent k-object-sensitive analysis for the program. Bean is generally more precise than k-obj, with a precision that is guaranteed to be as good as k-obj in the worst case. We have implemented Bean as an open-source tool and applied it to refine two state-of-the-art whole-program pointer analyses in Doop. For two representative clients (may-alias and may-fail-cast) evaluated on a set of nine large Java programs from the DaCapo benchmark suite, Bean has succeeded in making both analyses more precise for all these benchmarks under each client at only small increases in analysis cost. 1 Introduction Pointer analysis, as an enabling technology, plays a key role in a wide range of client applications, including bug detection [3, 25, 35, 34], security analysis [1, 13], compiler optimisation [6, 33], and program understanding [12]. Two major dimensions of pointer analysis precision are flow-sensitivity and context-sensitivity. For C/C++ programs, flow-sensitivity is needed by many clients [11, 16, 37, 32]. For object-oriented programs, e.g., Java programs, however, context-sensitivity is known to deliver trackable and useful precision [17, 19–21, 28–30], in general. There are two general approaches to achieving context-sensitivity for objectoriented programs, call-site-sensitivity (k-CFA) [27] and object-sensitivity [23, 24, 29] (among others). A k-CFA analysis represents a calling context of a method call by using a sequence of k call sites (i.e., k labels with each denoting a call site). In contrast, a k-object-sensitive analysis uses k object allocation sites (i.e., k labels with each denoting a new statement) as context elements
Among all the context abstractions (including k-CFA)proposed,object- sensitivity is regarded as arguably the best for pointer analysis in object-oriented languages [14,17,29].This can be seen from its widespread adoption in a num- ber of pointer analysis frameworks for Java,such as Doop [7,4],CHORD [5] and WALA [36].In addition,object-sensitivity has also been embraced by many other program analysis tasks,including typestate verification [9,38],data race detection [25],information flow analysis [1,10,22],and program slicing [21]. Despite its success,a k-object-sensitive pointer analysis,which uses a se- quence of k allocation sites (as k context elements)to represent a calling context of a method call,may end up using some context elements redundantly in the sense that these redundant context elements fail to induce a finer partition of the space of(concrete)calling contexts for the method call.As a result,many opportunities for making further precision improvements are missed. In this paper,we introduce BEAN,a general approach for improving the precision of a k-object-sensitive pointer analysis,denoted k-obj,for Java.by avoiding redundant context elements in k-obj while still maintaining a k-limiting context abstraction.The novelty lies in identifying redundant context elements by solving a graph problem on an OAG (Object Allocation Graph),which is built based on a pre-analysis (e.g.,a context-insensitive Andersen's analysis) performed initially on a program,and then avoid them in the subsequent k- object-sensitive analysis.By construction,BEAN is generally more precise than k-obj,with a precision that is guaranteed to be as good as k-obj in the worst case. We have implemented BEAN and applied it to refine two state-of-the-art (whole-program)pointer analyses,2obj+h and S-2obj+h [14],provided in Doop [7],resulting in two BEAN-directed pointer analyses,B-2obj+h and B-S-2obj+h, respectively.We have considered may-alias and may-fail-cast,two representative clients used elsewhere [8,29,30]for measuring the precision of a pointer analysis on a set of nine large Java programs from the DaCapo benchmark suite.Our re- sults show that B-2obj+h (B-S-2obj+h)is more precise than 2obj+h(S-2obj+h) for every evaluated benchmark under each client,at some small increases in analysis cost. This paper presents and validates a new idea on improving the precision of object-sensitive pointer analysis by exploiting an object allocation graph.Con- sidering the broad applications of object-sensitivity in analysing Java programs, we expect more clients to benefit from the BEAN approach,in practice.Specifi- cally,this paper makes the following contributions: -We introduce a new approach,BEAN,for improving the precision of any k- object-sensitive pointer analysis.k-obj.for Java.by avoiding its redundant context elements while maintaining still a k-limiting context abstraction. We introduce a new kind of graph,called an OAG(object allocation graph), constructed from a pre-analysis for the program,as a general mechanism to identify redundant context elements used in k-obj. We have implemented BEAN as a soon-to-be released open-source tool,which is expected to work well with various object-sensitive analyses for Java
Among all the context abstractions (including k-CFA) proposed, objectsensitivity is regarded as arguably the best for pointer analysis in object-oriented languages [14, 17, 29]. This can be seen from its widespread adoption in a number of pointer analysis frameworks for Java, such as Doop [7, 4], Chord [5] and Wala [36]. In addition, object-sensitivity has also been embraced by many other program analysis tasks, including typestate verification [9, 38], data race detection [25], information flow analysis [1, 10, 22], and program slicing [21]. Despite its success, a k-object-sensitive pointer analysis, which uses a sequence of k allocation sites (as k context elements) to represent a calling context of a method call, may end up using some context elements redundantly in the sense that these redundant context elements fail to induce a finer partition of the space of (concrete) calling contexts for the method call. As a result, many opportunities for making further precision improvements are missed. In this paper, we introduce Bean, a general approach for improving the precision of a k-object-sensitive pointer analysis, denoted k-obj, for Java, by avoiding redundant context elements in k-obj while still maintaining a k-limiting context abstraction. The novelty lies in identifying redundant context elements by solving a graph problem on an OAG (Object Allocation Graph), which is built based on a pre-analysis (e.g., a context-insensitive Andersen’s analysis) performed initially on a program, and then avoid them in the subsequent kobject-sensitive analysis. By construction, Bean is generally more precise than k-obj, with a precision that is guaranteed to be as good as k-obj in the worst case. We have implemented Bean and applied it to refine two state-of-the-art (whole-program) pointer analyses, 2obj+h and S-2obj+h [14], provided in Doop [7], resulting in two Bean-directed pointer analyses, B-2obj+h and B-S-2obj+h, respectively. We have considered may-alias and may-fail-cast, two representative clients used elsewhere [8, 29, 30] for measuring the precision of a pointer analysis on a set of nine large Java programs from the DaCapo benchmark suite. Our results show that B-2obj+h (B-S-2obj+h) is more precise than 2obj+h (S-2obj+h) for every evaluated benchmark under each client, at some small increases in analysis cost. This paper presents and validates a new idea on improving the precision of object-sensitive pointer analysis by exploiting an object allocation graph. Considering the broad applications of object-sensitivity in analysing Java programs, we expect more clients to benefit from the Bean approach, in practice. Specifi- cally, this paper makes the following contributions: – We introduce a new approach, Bean, for improving the precision of any kobject-sensitive pointer analysis, k-obj, for Java, by avoiding its redundant context elements while maintaining still a k-limiting context abstraction. – We introduce a new kind of graph, called an OAG (object allocation graph), constructed from a pre-analysis for the program, as a general mechanism to identify redundant context elements used in k-obj. – We have implemented Bean as a soon-to-be released open-source tool, which is expected to work well with various object-sensitive analyses for Java
We have applied BEAN to refine two state-of-the-art object-sensitive pointer analyses for Java.BEAN improves their precision for two representative clients on a set of nine Java programs in DaCapo at small time increases. 2 Motivation When analysing Java programs,there are two types of context,a method contert for local variables and a heap contert for object fields.In k-obj,a k-object- sensitive analysis [24,29],a method context is a sequence of k allocation sites and a heap context is typically a sequence of k-1 allocation sites.Given an allocation site at label e.e is also referred to as an abstract object for the site. Currently,k-obj,where k=2,represents a 2-object-sensitive analysis with a 1-context-sensitive heap (with respect to allocation sites),denoted 2obj+h[14], which usually achieves the best tradeoff between precision and scalability and has thus been widely adopted in pointer analysis for Java [8,21,29].In 2obj+h, a heap context for an abstract object e is a receiver object of the method that made the allocation of e(known as an allocator object).A method context for a method call is a receiver object of the method plus its allocator object. Below we examine the presence of redundant context elements in 2obj+h, with two examples,one for method contexts and one for heap contexts.This serves to motivate the BEAN approach proposed for avoiding such redundancy. 2.1 Redundant Elements in Method Contexts We use an example in Fig.1 to illustrate how 2obj+h analyses it imprecisely due to its use of a redundant context element in method contexts and how BEAN avoids the imprecision by avoiding this redundancy.We consider a may-alias client that queries for the alias relation between variables v1 and v2. In Fig.1(a),we identify the six allocation sites by their labels given in their end-of-line comments (in green),i.e.,A/1,A/2,0/1,0/2,B/1,and C/1. In Fig.1(b),we give the context-sensitive call graph computed by 2obj+h, where each method is analysed separately for each different calling context,de- noted by [...(in red).C.identify()has two concrete calling contexts but analysed only once under [B/1,C/1].We can see that B/1 is redundant (rela- tive to C/1)since adding B/1 to C/1 fails to separate the two concrete calling contexts.As a result,variables vi and v2 are made to point to both 0/1 and 0/2 at the same time,causing may-alias to report a spurious alias.During any program execution,v1 and v2 can only point to 0/1 and 0/2,respectively. In Fig.1(c),we give the context-sensitive call graph computed by BEAN, where C.identify()is now analysed separately under two different contexts, [A/1,C/1]and [A/2,C/1].Due to the improved precision,v1 (v2)now points to 0/1(0/2)only,causing may-alias to conclude that both are no longer aliases
– We have applied Bean to refine two state-of-the-art object-sensitive pointer analyses for Java. Bean improves their precision for two representative clients on a set of nine Java programs in DaCapo at small time increases. 2 Motivation When analysing Java programs, there are two types of context, a method context for local variables and a heap context for object fields. In k-obj, a k-objectsensitive analysis [24, 29], a method context is a sequence of k allocation sites and a heap context is typically a sequence of k − 1 allocation sites. Given an allocation site at label `, ` is also referred to as an abstract object for the site. Currently, k-obj, where k = 2, represents a 2-object-sensitive analysis with a 1-context-sensitive heap (with respect to allocation sites), denoted 2obj+h [14], which usually achieves the best tradeoff between precision and scalability and has thus been widely adopted in pointer analysis for Java [8, 21, 29]. In 2obj+h, a heap context for an abstract object ` is a receiver object of the method that made the allocation of ` (known as an allocator object). A method context for a method call is a receiver object of the method plus its allocator object. Below we examine the presence of redundant context elements in 2obj+h, with two examples, one for method contexts and one for heap contexts. This serves to motivate the Bean approach proposed for avoiding such redundancy. 2.1 Redundant Elements in Method Contexts We use an example in Fig. 1 to illustrate how 2obj+h analyses it imprecisely due to its use of a redundant context element in method contexts and how Bean avoids the imprecision by avoiding this redundancy. We consider a may-alias client that queries for the alias relation between variables v1 and v2. In Fig. 1(a), we identify the six allocation sites by their labels given in their end-of-line comments (in green), i.e., A/1, A/2, O/1, O/2, B/1, and C/1. In Fig. 1(b), we give the context-sensitive call graph computed by 2obj+h, where each method is analysed separately for each different calling context, denoted by [...] (in red). C.identify() has two concrete calling contexts but analysed only once under [B/1,C/1]. We can see that B/1 is redundant (relative to C/1) since adding B/1 to [C/1] fails to separate the two concrete calling contexts. As a result, variables v1 and v2 are made to point to both O/1 and O/2 at the same time, causing may-alias to report a spurious alias. During any program execution, v1 and v2 can only point to O/1 and O/2, respectively. In Fig. 1(c), we give the context-sensitive call graph computed by Bean, where C.identify() is now analysed separately under two different contexts, [A/1, C/1] and [A/2, C/1]. Due to the improved precision, v1 (v2) now points to O/1 (O/2) only, causing may-alias to conclude that both are no longer aliases
1 void main(Object[]args){ 23 A al nev A()/A/1 A/1 A/1,B1] Object v1 a1.foo(nev Object());/0/1 A.foo0 B.bar( 4 B/1,C11 A a2 ney A(:/A/2 main0 A/2] 1A2,B/1 C.identity( Object v2 a2.foo(nev Object());/0/2 7) Afoo0 B.bar() 8 class A Object foo(Object v) Bb B() /B/1 (b)Context-sensitive call graph by 2obj+h return b.bar(v); 131 14 class B AM] A/1,B/1 A/1,C/1 Object bar(Object v){ cc nev C();/C/1 A.foo() B.bar0 Cidentity0 17 return c.identity(v); 1 A/2 A2,B1] A2,C/1 191 20 class C{ A.foo() B.bar0 C.identity 21 Object identity(Object v)return v;} 22 (a)Program (c)Context-sensitive call graph by BEAN Fig.1:Method contexts for 2objth and BEAN. 2.2 Redundant Elements in Heap Contexts We now use an example in Fig.2 to illustrate how 2obj+h analyses it imprecisely due to its use of a redundant element in heap contexts and how BEAN avoids the imprecision by avoiding this redundancy.Our may-alias client now issues an alias query for variables emp1 and emp2.In Fig.2(a),we identify again its six allocation sites by their labels given at their end-of-line comments(in green). Fig.2(b)shows the context-sensitive field points-to graph computed by 2obj+h, where each node represents an abstract heap object created under the corre- sponding context,denoted [...],(in red),and each edge represents a field points- to relation with the corresponding field name being labeled on the edge.An array object is analysed with its elements collapsed to one pseudo-field,denoted arr Hence,x[i]=y (y x[i])is handled as x.arr y (y x.arr). In this example,two companies,Co/1 and Co/2,maintain their employee information by using two different ArrayLists,with each implemented internally by a distinct array of type Object [at line 22.However,2obj+h has modelled the two array objects imprecisely by using one abstract object Obj[]/1 under [AL/1].Note that AL/1 is redundant since adding it to [makes no difference to the handling of Obj[]/1.As a result,emp1 and emp2 will both point to Emp/1 and Emp/2,causing may-alias to regard both as aliases conservatively. Fig.2(c)shows the context-sensitive field points-to graph computed by BEAN. This time,the Object [arrays used by two companies Co/1 and Co/2 are dis- tinguished under two distinct heap contexts Co/1 and Co/2.As a result,our may-alias client will no longer report emp1 and emp2 to be aliases. 2.3 Discussion As illustrated above,k-obj selects blindly a sequence of k-most-recent allocation sites as a context.To analyse large-scale software scalably,k is small,which is 2
A/1 main() A.foo() A.foo() B.bar() B.bar() C.identity() A/2 A/1, B/1 A/2, B/1 B/1, C/1 A/1 main() A.foo() A.foo() B.bar() B.bar() A/2 A/1, B/1 A/2, B/1 C.identity() A/1, C/1 C.identity() A/2, C/1 (a) (b) (c) Context-sensitive call graph by 2obj+h Program Context-sensitive call graph by BEAN 1 void main(Object[] args) { 2 A a1 = new A(); // A/1 3 Object v1 = a1.foo(new Object()); // O/1 4 5 A a2 = new A(); // A/2 6 Object v2 = a2.foo(new Object()); // O/2 7 } 8 class A { 9 Object foo(Object v) { 10 Bb= new B(); // B/1 11 return b.bar(v); 12 } 13 } 14 class B { 15 Object bar(Object v) { 16 Cc= new C(); // C/1 17 return c.identity(v); 18 } 19 } 20 class C { 21 Object identity(Object v) { return v; } 22 } Fig. 1: Method contexts for 2obj+h and Bean. 2.2 Redundant Elements in Heap Contexts We now use an example in Fig. 2 to illustrate how 2obj+h analyses it imprecisely due to its use of a redundant element in heap contexts and how Bean avoids the imprecision by avoiding this redundancy. Our may-alias client now issues an alias query for variables emp1 and emp2. In Fig. 2(a), we identify again its six allocation sites by their labels given at their end-of-line comments (in green). Fig. 2(b) shows the context-sensitive field points-to graph computed by 2obj+h, where each node represents an abstract heap object created under the corresponding context, denoted [...], (in red), and each edge represents a field pointsto relation with the corresponding field name being labeled on the edge. An array object is analysed with its elements collapsed to one pseudo-field, denoted arr. Hence, x[i] = y (y = x[i]) is handled as x.arr = y (y = x.arr). In this example, two companies, Co/1 and Co/2, maintain their employee information by using two different ArrayLists, with each implemented internally by a distinct array of type Object[] at line 22. However, 2obj+h has modelled the two array objects imprecisely by using one abstract object Obj[]/1 under [AL/1]. Note that AL/1 is redundant since adding it to [ ] makes no difference to the handling of Obj[]/1. As a result, emp1 and emp2 will both point to Emp/1 and Emp/2, causing may-alias to regard both as aliases conservatively. Fig. 2(c) shows the context-sensitive field points-to graph computed by Bean. This time, the Object[] arrays used by two companies Co/1 and Co/2 are distinguished under two distinct heap contexts [Co/1] and [Co/2]. As a result, our may-alias client will no longer report emp1 and emp2 to be aliases. 2.3 Discussion As illustrated above, k-obj selects blindly a sequence of k-most-recent allocation sites as a context. To analyse large-scale software scalably, k is small, which is 2
void main(String args) any():/Co/1 ICo/1] p1.addEaployee(new Enplo e(O):∥Ep/ (Emp/1 56- obj[/1 pany(): Co/a 1Co/2 Eap/2 Emp/2 9 即1 oyee ep2 comp2.getEaployee(0) 10 class Employee (... 11C1a55C0mpan节 Context-sensitive fields points-to private rrayList emps: (b) graph by 2obj+h new ArrayList():/AL/1 15 emps.add(emp); Co/ JCon1 e) mps-get(i): 18】 Co/1 emps AL/1 elems (Emp/1 19 class ArrayList [ele ICo/21 ICo/2] List(elems nev Obiect[10]:}/Obj/1 Co/2emp elems (AL/1 (Ob0/1 arr Emp/2 23 void add(Object e){elems[size++]-e;} Object get(int i){return elens[i]; (c) Context-sensitive fields points-to (a)Program graph by BEAN Fig.2:Heap contexts for 2o6j+h and BEAN. for a method context and 1 for a heap context in 2obj+h.Therefore,redundant context elements,such as B/1 in [B/1,C/1]in Fig.1(b)and AL/1 in [AL/1]in Fig.2(b),should be avoided since they waste precious space in a context yet contribute nothing in separating the concrete calling contexts for a call site. This paper aims to address this problem in k-obj by excluding redundant elements from its contexts so that their limited context positions can be more profitably exploited to achieve better precision,as shown in Figs.1(c)and 2(c). 3 Methodology We introduce a new approach,BEAN, as illustrated in Fig.3,to improving Pownts-To the precision of a k-object-sensitive Pre-analysis OAG Set Construction pointer analysis,k-obj.The basic idea k-object-sensitive Selected Context is to refine k-obj by avoiding its re- pointer analysis (k-obj) Contexts Selection dundant context elements while main- taining still a k-limiting context ab- straction.An element e in a context Fig.3:Overview of BEAN c for a call or allocation site is redun- dant if c with e removed does not change the context represented by c.For example,B/1 in B/1,C/1]in Fig.1(b)and AL/1 in [AL/1]in Fig.2(b)are re- dundant. BEAN proceeds in two stages.In Stage 1,we aim to identify redundant con- text elements used in k-obj.To do so,we first perform usually a fast but im- precise pre-analysis,e.g.,a context-insensitive Andersen's pointer analysis on a program to obtain its points-to information.Based on the points-to information
Obj[]/1 AL/1 Co/1 Co/1 AL/1 emps elems elems Emp/1 1 void main(String[] args) { 2 Company comp1 = new Company(); // Co/1 3 comp1.addEmployee(new Employee()); // Emp/1 4 Employee emp1 = comp1.getEmployee(0); 5 6 Company comp2 = new Company(); // Co/2 7 comp2.addEmployee(new Employee()); // Emp/2 8 Employee emp2 = comp2.getEmployee(0); 9 } 10 class Employee {...} 11 class Company { 12 private ArrayList emps; 13 Company() { emps = new ArrayList(); } // AL/1 14 void addEmployee(Employee emp) { emps.add(emp); } 15 Employee getEmployee(int i) { 16 return (Employee) emps.get(i); 17 } 18 } 19 class ArrayList { 20 private Object[] elems; 21 private int size = 0; 22 ArrayList() { elems = new Object[10]; } // Obj[]/1 23 void add(Object e) { elems[size++] = e; } 24 Object get(int i) { return elems[i]; } 25 } Co/2 Co/2 AL/1 emps Emp/2 arr arr Co/1 Co/1 AL/1 emps Co/2 Co/2 AL/1 emps elems Obj[]/1 elems Obj[]/1 Co/1 Co/2 arr arr Emp/1 Emp/2 (b) Context-sensitive fields points-to graph by 2obj+h (c) Context-sensitive fields points-to (a) Program graph by BEAN Fig. 2: Heap contexts for 2obj+h and Bean. for a method context and 1 for a heap context in 2obj+h. Therefore, redundant context elements, such as B/1 in [B/1,C/1] in Fig. 1(b) and AL/1 in [AL/1] in Fig. 2(b), should be avoided since they waste precious space in a context yet contribute nothing in separating the concrete calling contexts for a call site. This paper aims to address this problem in k-obj by excluding redundant elements from its contexts so that their limited context positions can be more profitably exploited to achieve better precision, as shown in Figs. 1(c) and 2(c). 3 Methodology Points-To OAG Construction Pre-analysis Context Selection Selected Set Contexts k-object-sensitive pointer analysis (k-obj) Fig. 3: Overview of Bean. We introduce a new approach, Bean, as illustrated in Fig. 3, to improving the precision of a k-object-sensitive pointer analysis, k-obj. The basic idea is to refine k-obj by avoiding its redundant context elements while maintaining still a k-limiting context abstraction. An element e in a context c for a call or allocation site is redundant if c with e removed does not change the context represented by c. For example, B/1 in [B/1,C/1] in Fig. 1(b) and AL/1 in [AL/1] in Fig. 2(b) are redundant. Bean proceeds in two stages. In Stage 1, we aim to identify redundant context elements used in k-obj. To do so, we first perform usually a fast but imprecise pre-analysis, e.g., a context-insensitive Andersen’s pointer analysis on a program to obtain its points-to information. Based on the points-to information
discovered,we construct an object allocation graph (OAG)to capture the ob- ject allocation relations in k-obj.Subsequently,we traverse the OAG to select method and heap contexts by avoiding redundant context elements that would otherwise be used by k-obj.In Stage 2,we refine k-obj by avoiding its redun- dant context elements.Essentially,we perform a k-object-sensitive analysis in the normal way,by using the contexts selected in the first stage,instead. 3.1 Object Allocation Graph The OAG of a program is a directed graph,G=(N,E).A node eE N represents a label of an (object)allocation site in the program.An edge 12E E represents an object allocation relation.As G is context-insensitive.a label eE G is also interchangeably referred to (in the literature)as the (unique)abstract heap object that models all the concrete objects created at the allocation site e. Given this,e1>2 signifies that e is the the receiver object of the method that made the allocation of 2.Therefore,e is called an allocator object of e2 [29]. (a)Fig.1 (b)Fig.2 (c)Example 2 Fig.4:The OAGs for the two motivating programs in Figs.1 and 2. Figure 4 gives the OAGs for the two programs in Figs.1 and 2,which are de- liberately designed to be isomorphic.In Fig.4(a),A/1 and A/2 are two allocators of B/1.In Fig.4(b),AL/1 is an allocator of Obj[]/1.Some objects,e.g.,those created in main()or static initialisers,have no allocators.For convenience,we assume the existence of a dummy node,Oroot,so that every object has at least one allocator.The isomorphic OAG in Fig.4(c)will be referred to in Example 2. The concept of allocator object captures the essence of object sensitivity.By definition [24,29],a context for an allocation site i.e.,an abstract object consists of its allocator object (e),the allocator object of e,and so on.The OAG provides a new perspective for object sensitivity,since a context for an object e is simply a path from Oroot to e.As a result,the problem of selecting contexts for e can be recast as one of solving a problem of distinguishing different paths from Oroot to l.Traditionally,a k-object-sensitive analysis selects blindly a suffix of a path from Oroot to e with length k
discovered, we construct an object allocation graph (OAG) to capture the object allocation relations in k-obj. Subsequently, we traverse the OAG to select method and heap contexts by avoiding redundant context elements that would otherwise be used by k-obj. In Stage 2, we refine k-obj by avoiding its redundant context elements. Essentially, we perform a k-object-sensitive analysis in the normal way, by using the contexts selected in the first stage, instead. 3.1 Object Allocation Graph The OAG of a program is a directed graph, G = (N, E). A node ` ∈ N represents a label of an (object) allocation site in the program. An edge `1 → `2 ∈ E represents an object allocation relation. As G is context-insensitive, a label ` ∈ G is also interchangeably referred to (in the literature) as the (unique) abstract heap object that models all the concrete objects created at the allocation site `. Given this, `1 → `2 signifies that `1 is the the receiver object of the method that made the allocation of `2. Therefore, `1 is called an allocator object of `2 [29]. O A/1 root A/2 B/1 C/1 O/1 O/2 O Co/1 root Co/2 AL/1 Obj[]/1 Emp/1 Emp/2 O O1 root O2 O3 O4 O5 O6 (a) Fig. 1 (b) Fig. 2 (c) Example 2 Fig. 4: The OAGs for the two motivating programs in Figs. 1 and 2. Figure 4 gives the OAGs for the two programs in Figs. 1 and 2, which are deliberately designed to be isomorphic. In Fig. 4(a), A/1 and A/2 are two allocators of B/1. In Fig. 4(b), AL/1 is an allocator of Obj[]/1. Some objects, e.g., those created in main() or static initialisers, have no allocators. For convenience, we assume the existence of a dummy node, Oroot, so that every object has at least one allocator. The isomorphic OAG in Fig. 4(c) will be referred to in Example 2. The concept of allocator object captures the essence of object sensitivity. By definition [24, 29], a context for an allocation site `, i.e., an abstract object ` consists of its allocator object (` 0 ), the allocator object of ` 0 , and so on. The OAG provides a new perspective for object sensitivity, since a context for an object ` is simply a path from Oroot to `. As a result, the problem of selecting contexts for ` can be recast as one of solving a problem of distinguishing different paths from Oroot to `. Traditionally, a k-object-sensitive analysis selects blindly a suffix of a path from Oroot to ` with length k
3.2 Context Selection Given an object e in G,BEAN selects its contexts in G as sequences of its direct or indirect allocators that are useful to distinguish different paths from Oroot to e while avoiding redundant ones that would otherwise be used in k-obj.The key insight is that in many cases,it is unnecessary to use all nodes of a path to distinguish the path from the other paths leading to the same node.In contrast, k-obj is not equipped with G and thus has to select blindly a suffix of each such path as a context,resulting in many redundant context elements being used. Method Contexts Figure 1 compares the method contexts used by 2obj+h and BEAN for the first example given.As shown in Figure 1(b),2obj+h analyses C.identity()under one context B/1,C/1],where B/1 is redundant,without being able to separate its two concrete calling contexts.In contrast,BEAN avoids using B/1 by examining the OAG of this example in Fig.4(a).There are two paths from0 root to C/1:0root→A/1→B/1→C/1and0root→A/2→B/1→ C/1.Note that 2obj+h has selected a suffix of the two paths,B/1-C/1,which happens to represent the same context [B/1,C/1]for C.identity().BEAN dis- tinguishes these two paths by ignoring the redundant node B/1,thereby settling with the method contexts shown in Figure 1(c).As a result,C.identity()is now analysed under two different contexts [A/1,C/1]and [A/2,C/1]more precisely. Heap Contexts Figure 2 compares the heap contexts used by 2objth and BEAN for the second example given.As shown in Figure 2(b),2obj+h fails to separate the two array objects created at the allocation site Obj[]/1 for two companies Co/1 and Co/2 by using one context [AL/1],where AL/1 is redundant. In contrast,BEAN avoids using AL/1 by examining the OAG of this example in Fig.4(b).There are two paths from Oroot to Obj[]/1:Oroot Co/1 AL/1→0bj[☐/1and0root→Co/2→AL/1→0bj[0/1.Note that2ob+hhas selected a suffix of the two paths,AL/1-Obj[]/1,which happens to represent the same heap context [AL/1]for Obj[]/1.BEAN distinguishes these two paths by ignoring the redundant node AL/1,thereby settling with the heap contexts shown in Figure 2(c).As a result,the two array objects created at Obj[]/1 are distinguished under two different contexts [Co/1]and [Co/2]more precisely. 3.3 Discussion BEAN,as shown in Fig.3,is designed to be a general-purpose technique for refining k-obj with three design goals,under the condition that its pre-analysis is sound.First,as the pre-analysis is usually less precise than k-obj,the OAG constructed for the program may contain some object allocation relations that are not visible in k-obj.Therefore,BEAN is not expected to be optimal in the sense that it can avoid all redundant context elements in k-obj.Second,if the pre-analysis is more precise than k-obj(e.g.,in some parts of the program),then the OAG may miss some object allocation relations that are visible in k-obj.This allows BEAN to avoid using context elements that are redundant with respect to the pre-analysis but not k-obj,making the resulting analysis even more precise
3.2 Context Selection Given an object ` in G, Bean selects its contexts in G as sequences of its direct or indirect allocators that are useful to distinguish different paths from Oroot to ` while avoiding redundant ones that would otherwise be used in k-obj. The key insight is that in many cases, it is unnecessary to use all nodes of a path to distinguish the path from the other paths leading to the same node. In contrast, k-obj is not equipped with G and thus has to select blindly a suffix of each such path as a context, resulting in many redundant context elements being used. Method Contexts Figure 1 compares the method contexts used by 2obj+h and Bean for the first example given. As shown in Figure 1(b), 2obj+h analyses C.identity() under one context [B/1,C/1], where B/1 is redundant, without being able to separate its two concrete calling contexts. In contrast, Bean avoids using B/1 by examining the OAG of this example in Fig. 4(a). There are two paths from Oroot to C/1: Oroot → A/1 → B/1 → C/1 and Oroot → A/2 → B/1 → C/1. Note that 2obj+h has selected a suffix of the two paths, B/1 → C/1, which happens to represent the same context [B/1,C/1] for C.identity(). Bean distinguishes these two paths by ignoring the redundant node B/1, thereby settling with the method contexts shown in Figure 1(c). As a result, C.identity() is now analysed under two different contexts [A/1,C/1] and [A/2,C/1] more precisely. Heap Contexts Figure 2 compares the heap contexts used by 2obj+h and Bean for the second example given. As shown in Figure 2(b), 2obj+h fails to separate the two array objects created at the allocation site Obj[]/1 for two companies Co/1 and Co/2 by using one context [AL/1], where AL/1 is redundant. In contrast, Bean avoids using AL/1 by examining the OAG of this example in Fig. 4(b). There are two paths from Oroot to Obj[]/1: Oroot → Co/1 → AL/1 → Obj[]/1 and Oroot → Co/2 → AL/1 → Obj[]/1. Note that 2obj+h has selected a suffix of the two paths, AL/1 → Obj[]/1, which happens to represent the same heap context [AL/1] for Obj[]/1. Bean distinguishes these two paths by ignoring the redundant node AL/1, thereby settling with the heap contexts shown in Figure 2(c). As a result, the two array objects created at Obj[]/1 are distinguished under two different contexts [Co/1] and [Co/2] more precisely. 3.3 Discussion Bean, as shown in Fig. 3, is designed to be a general-purpose technique for refining k-obj with three design goals, under the condition that its pre-analysis is sound. First, as the pre-analysis is usually less precise than k-obj, the OAG constructed for the program may contain some object allocation relations that are not visible in k-obj. Therefore, Bean is not expected to be optimal in the sense that it can avoid all redundant context elements in k-obj. Second, if the pre-analysis is more precise than k-obj (e.g., in some parts of the program), then the OAG may miss some object allocation relations that are visible in k-obj. This allows Bean to avoid using context elements that are redundant with respect to the pre-analysis but not k-obj, making the resulting analysis even more precise
Finally,BEAN is expected to be more precise than k-obj in general,with a precision that is guaranteed to be as good as k-obj in the worst case. 4 Formalism Without loss of generality,we formalise BEAN as a k-object-sensitive pointer analysis with a(k-1)-context-sensitive heap (with respect to allocation sites), as in [14].Thus,the depth of its method (heap)contexts is k(k-1). 4.1 Notations We will use the notations in Fig.5.The variable r,yeV top section lists the domains used.As we heap objecto,o;∈H method m∈M focus on object-sensitive analysis,a con- field fEF text is a sequence of objects.The mid- context c∈C=H°UHUH2. dle section gives the five key relations pt:C×V+P(C×H) used.pt and fpt store the analysis results: fpt:C×HxF→P(C×H) pt(c,r)represents the points-to set of vari- mtdCtrSelector:Cx HC able x under context c and fpt(c,oi,f) heapC'trSelector:CxHC contertsOf:M→P(C) represents the points-to set of the field f of an abstract object oi under con- OAG G=(N,E) text c.mtdCtrSelector and heapCtrSelector odoi,oj∈NsH choose the method contexts for method edge o→o∈EsN×N calls and heap contexts for allocation sites, Fig.5:Notations. respectively.contertsOf maps a method to its contexts.The last section defines the OAG used for a program:oioj means that oi is an allocator object of oj,i.e.,the receiver object of the method that made the allocation of oj. 4.2 Object Allocation Graph Figure 6 gives the rules for building the OAG,G=(N,E),for a program, based on the points-to sets computed by a pre-analysis,which may or may not be context-sensitive.As G is (currently)context-insensitive,the context information that appears in a points-to set (if any)is simply ignored.[OAG-NODE] and [OAG-DUMMYNODE]build N.[OAG-EDGE]and [OAG-DUMMYEDGE]build E. By [OAG-NoDEl,we add to N all the pointed-to target objects found during the pre-analysis.By [OAG-DUMMYNoDEl,we add a dummy node oroot to N. By [OAG-EDGEl,we add to E an edge oioj if oi is an allocator object of oj. Here,mthis,where m is the name of a method,represents the this variable of method m,which points to the receiver object of method m.By [OAG-DUvEDGEl, we add an edge from oroot to every object oi without any incoming edge yet,to indicate that oroot is now a pseudo allocator object of o.Note that an object allocated in main()or a static initialiser does not have an allocator object.Due to oroot,every object has at least one allocator object
Finally, Bean is expected to be more precise than k-obj in general, with a precision that is guaranteed to be as good as k-obj in the worst case. 4 Formalism Without loss of generality, we formalise Bean as a k-object-sensitive pointer analysis with a (k − 1)-context-sensitive heap (with respect to allocation sites), as in [14]. Thus, the depth of its method (heap) contexts is k (k − 1). 4.1 Notations variable x, y ∈ V heap object oi , oj ∈ H method m ∈ M field f ∈ F context c ∈ C = H0 ∪ H1 ∪ H2 ... pt : C × V → P(C × H) fpt : C × H × F → P(C × H) mtdCtxSelector : C × H → C heapCtxSelector : C × H → C contextsOf : M → P(C) OAG G = (N, E) node oi , oj ∈ N ⊆ H edge oi → oj ∈ E ⊆ N × N Fig. 5: Notations. We will use the notations in Fig. 5. The top section lists the domains used. As we focus on object-sensitive analysis, a context is a sequence of objects. The middle section gives the five key relations used. pt and fpt store the analysis results: pt(c, x) represents the points-to set of variable x under context c and fpt(c, oi , f) represents the points-to set of the field f of an abstract object oi under context c. mtdCtxSelector and heapCtxSelector choose the method contexts for method calls and heap contexts for allocation sites, respectively. contextsOf maps a method to its contexts. The last section defines the OAG used for a program: oi → oj means that oi is an allocator object of oj , i.e., the receiver object of the method that made the allocation of oj . 4.2 Object Allocation Graph Figure 6 gives the rules for building the OAG, G = (N, E), for a program, based on the points-to sets computed by a pre-analysis, which may or may not be context-sensitive. As G is (currently) context-insensitive, the context information that appears in a points-to set (if any) is simply ignored. [OAG-Node] and [OAG-DummyNode] build N. [OAG-Edge] and [OAG-DummyEdge] build E. By [OAG-Node], we add to N all the pointed-to target objects found during the pre-analysis. By [OAG-DummyNode], we add a dummy node oroot to N. By [OAG-Edge], we add to E an edge oi → oj if oi is an allocator object of oj . Here, mthis, where m is the name of a method, represents the this variable of method m, which points to the receiver object of method m. By [OAG-DummyEdge], we add an edge from oroot to every object oi without any incoming edge yet, to indicate that oroot is now a pseudo allocator object of oi . Note that an object allocated in main() or a static initialiser does not have an allocator object. Due to oroot, every object has at least one allocator object
(-,o)∈pt(--) [OAG-NODE] o:∈N Oroot∈N [OAG-DUMMYNODE] (,oi)∈pt(-,nthis)m∈M oj is allocated in m [OAG-EDGE] 0i→0∈E oiE N o:oroot oi does not have any incoming edge [OAG-DUMMYEDGE 0root→oi∈E Fig.6:Rules for building the OAG,G=(N,E),for a program based on a pre-analysis. Ezample 1.Figure 4 gives the OAGs for the two programs in Figs.1 and 2.For reasons of symmetry,let us apply the rules in Fig.6 to build the OAG in Fig.4(a) only.Suppose we perform a context-insensitive Andersen's pointer analysis as the pre-analysis on the program in Fig.1.The points-to sets are:pt(v1)=pt(v2)= {0/1,0/2},pt(a1)={A/1},pt(a2)={A/2},pt(b)={B/1},and pt(c)= {C/1}.By [OAG-NODE]and [OAG-DUMMIYNODE],N [oroot,A/1,A/2,B/1,C/1,0/1, 0/2}.By [OAG-Enc,we add A/1→B/1,A/2→B/1andB/1→C/1,since B/1 is allocated in foo()with the receiver objects being A/1 and A/2 and C/1 is allocated in bar()on the receiver object B/1.By [OAG-DUMMYEDGE],we add 0root→A/1,Oroot→A/2,0root→0/1 and Oroot+0/2. ▣ Due to recursion,an OAG may have cycles including self-loops.This means that an abstract heap object may be a direct or indirect allocator object of another heap object,and conversely (with both being possibly the same). 4.3 Context Selection Figure 7 establishes some basic relations in an OAG,G=(N,E),with possibly cycles.By [REACH-REFLEXIVE]and [REACH-TRANSITIvE],we speak of graph reachability in the standard manner.In [CoNFLUENCE),Yoi identifies a conventional confluence point.In [DIVERGENCE],0iot states that oi is a divergence point,with at least two outgoing paths reaching ot,implying that either o:is a confluence point or at least one confluence point exists earlier on the two paths. oi∈N 0:→0∈E0j0k [REACH-REFLEXIVE] REACH-TRANSITIVE] 0540 0i+0k 0→0i∈E0k→o:∈E05≠0k [【CONFLUENCE Yoi 0,→0;∈EO:→ok∈E0≠oeO)00kO:DIVERGENCE国 0i<0t Fig.7:Rules for basic relations in an OAG,G=(N,E)
h , oii ∈ pt( , ) oi ∈ N [OAG-Node] oroot ∈ N [OAG-DummyNode] h , oii ∈ pt( , mthis) m ∈ M oj is allocated in m oi → oj ∈ E [OAG-Edge] oi ∈ N oi 6= oroot oi does not have any incoming edge oroot → oi ∈ E [OAG-DummyEdge] Fig. 6: Rules for building the OAG, G = (N, E), for a program based on a pre-analysis. Example 1. Figure 4 gives the OAGs for the two programs in Figs. 1 and 2. For reasons of symmetry, let us apply the rules in Fig. 6 to build the OAG in Fig. 4(a) only. Suppose we perform a context-insensitive Andersen’s pointer analysis as the pre-analysis on the program in Fig. 1. The points-to sets are: pt(v1) = pt(v2) = {O/1, O/2}, pt(a1) = {A/1}, pt(a2) = {A/2}, pt(b) = {B/1}, and pt(c) = {C/1}. By [OAG-Node] and [OAG-DummyNode], N = {oroot, A/1, A/2, B/1, C/1, O/1, O/2}. By [OAG-Edge], we add A/1 → B/1, A/2 → B/1 and B/1 → C/1, since B/1 is allocated in foo() with the receiver objects being A/1 and A/2 and C/1 is allocated in bar() on the receiver object B/1. By [OAG-DummyEdge], we add oroot → A/1, oroot → A/2, oroot → O/1 and oroot → O/2. ut Due to recursion, an OAG may have cycles including self-loops. This means that an abstract heap object may be a direct or indirect allocator object of another heap object, and conversely (with both being possibly the same). 4.3 Context Selection Figure 7 establishes some basic relations in an OAG, G = (N, E), with possibly cycles. By [Reach-Reflexive] and [Reach-Transitive], we speak of graph reachability in the standard manner. In [Confluence], goi identifies a conventional confluence point. In [Divergence], oi ≺ ot states that oi is a divergence point, with at least two outgoing paths reaching ot, implying that either ot is a confluence point or at least one confluence point exists earlier on the two paths. oi ∈ N oi oi [Reach-Reflexive] oi → oj ∈ E oj ok oi ok [Reach-Transitive] oj → oi ∈ E ok → oi ∈ E oj 6= ok goi [Confluence] oi → oj ∈ E oi → ok ∈ E oj 6= ok oj ot ok ot oi ≺ot [Divergence] Fig. 7: Rules for basic relations in an OAG, G = (N, E)
ot∈N0root→o:∈Eoi∽or (oroot0t,[]if o:=ot then heapCtrSelector([],0:)=[] [HCTX-INIT] c:(rep,c)05→o,∈Eo:o40≠o4 HCTx-DIv] rep'=true,c'=c ifep∧0j<o4① oi:(rep',c) rep'=0j3o,c'=c+oj if rep A Yoi ② rep'rep,c'=c otherwise if oi o then heapCtxSelector(c++oj,oi)=c' 0:(rep,c〉0→o:∈Eo:wo40j=oa -[HcTx-Cyc] (rep,c) ∫ep'=true,c=ct+oj,if repA Yo:③ rep'=true,d=c. otherwise if oi ot then heapCtzSelector(c++oj,0i)=c heapCtzSelector(_,0i)=c c'=c++oi [McTx] mtdCtxSelector(c,oi)=c Fig.8:Rules for context selection in an OAG,G=(N,E).is a concatenation operator. Figure 8 gives the rules for computing two context selectors,heapCtrSelector and mtdCtrSelector,used in refining an object-sensitive pointer analysis in Fig.11 In heapCtrSelector(c,oi)=c,c denotes an (abstract calling)context of the method that made the allocation of object o;and c'is the heap context selected for o;when oi is allocated in the method with context c.In mtdCirSelector(c,oi)- c',c denotes a heap context of object oi,and c'is the method context selected for the method whose receiver object is o;under its heap context c. For k-obj [24,29],both context selectors are simple.In the case of full- object-sensitivity,we have heapCirSelector([o1,...,on-1],on)=[o1,...,on-1]and mtdCtrSelector([o1,...,on-1],on)=[01,...,on]for every path from oroot to a node on in the OAG,Oroot→o1→.→on-1→on.For ak-object-.sensitive analy- sis with a (k-1)-context-sensitive heap,heapCtrSelector([on-k,...on-1],on)= [on-k+1;...,on-1]and mtdCtrSelector([on-k+1,...,on-1],on)=[on-k+1;...,on]. Essentially,a suffix of length of k is selected from oroot01 ...-on-1-on,resulting in potentially many redundant Oroot context elements to be used blindly. Let us first use an OAG in Fig.9 to explain how we avoid redundant context elements selected by k-obj.The set of con- texts for a given node,denoted ot,can be seen as the set of paths reaching ot from oroot.Instead of using all the nodes on a path to distinguish it from the other four,we use only the five representative nodes,labeled by 1-5,and identify the five paths uniquely as①→③,①→④,②→③,②→④,and 5.The other six nodes are redundant with respect to ot.The rules in Fig.8 are used to identify such representative nodes Fig.9:An OAG
ot ∈ N oroot → oi ∈ E oi ot o t i : horoot ≺ot, [ ]i if oi = ot then heapCtxSelector([ ], oi) = [ ] [Hctx-Init] o t j :hrep, ci oj → oi ∈ E oi ot oj 6= ot o t i :hrep0 , c0 i rep0 = true, c0 = c if ¬rep ∧ oj ≺ot 1 rep0 = oj ≺ot, c0 = c ++ oj if rep ∧ goi 2 rep0 = rep, c0 = c otherwise if oi = ot then heapCtxSelector(c ++ oj , oi) = c 0 [Hctx-Div] o t j :hrep, ci oj → oi ∈ E oi ot oj = ot o t i :hrep0 , c0 i rep0 = true, c0 = c ++ oj , if rep ∧ goi 3 rep0 = true, c0 = c, otherwise if oi = ot then heapCtxSelector(c ++ oj , oi) = c 0 [Hctx-Cyc] heapCtxSelector( , oi) = c c0 = c ++ oi mtdCtxSelector(c, oi) = c 0 [Mctx] Fig. 8: Rules for context selection in an OAG, G=(N, E). ++ is a concatenation operator. Figure 8 gives the rules for computing two context selectors, heapCtxSelector and mtdCtxSelector, used in refining an object-sensitive pointer analysis in Fig. 11. In heapCtxSelector(c, oi) = c 0 , c denotes an (abstract calling) context of the method that made the allocation of object oi and c 0 is the heap context selected for oi when oi is allocated in the method with context c. In mtdCtxSelector(c, oi) = c 0 , c denotes a heap context of object oi , and c 0 is the method context selected for the method whose receiver object is oi under its heap context c. For k-obj [24, 29], both context selectors are simple. In the case of fullobject-sensitivity, we have heapCtxSelector([o1, ..., on−1], on) = [o1, ..., on−1] and mtdCtxSelector([o1, ..., on−1], on) = [o1, ..., on] for every path from oroot to a node on in the OAG, oroot → o1 → ... → on−1 → on. For a k-object-sensitive analysis with a (k − 1)-context-sensitive heap, heapCtxSelector([on−k, ..., on−1], on) = [on−k+1, ..., on−1] and mtdCtxSelector([on−k+1, ..., on−1], on) = [on−k+1, ..., on]. 1 2 3 4 5 Ot Oroot Fig. 9: An OAG. Essentially, a suffix of length of k is selected from oroot → o1 → ... → on−1 → on, resulting in potentially many redundant context elements to be used blindly. Let us first use an OAG in Fig. 9 to explain how we avoid redundant context elements selected by k-obj. The set of contexts for a given node, denoted ot, can be seen as the set of paths reaching ot from oroot. Instead of using all the nodes on a path to distinguish it from the other four, we use only the five representative nodes, labeled by 1 – 5, and identify the five paths uniquely as 1 → 3 , 1 → 4 , 2 → 3 , 2 → 4 , and 5 . The other six nodes are redundant with respect to ot. The rules in Fig. 8 are used to identify such representative nodes