Self-Inferencing Refection Resolution for Java Yue Li,Tian Tan,Yulei Sui,and Jingling Xue School of Computer Science and Engineering.UNSW Australia {yueli,tiantan,ysui,jingling}Ocse.unsw.edu.au Abstract.Reflection has always been an obstacle both for sound and for effective under-approximate pointer analysis for Java applications.In pointer analysis tools,reflection is either ignored or handled partially, resulting in missed,important behaviors.In this paper,we present our findings on reflection usage in Java benchmarks and applications.Guided by these findings,we introduce a static reflection analysis,called ELF, by exploiting a self-inferencing property inherent in many reflective calls. Given a reflective call,the basic idea behind ELF is to automatically in- fer its targets (methods or fields)based on the dynamic types of the arguments of its target calls and the downcasts (if any)on their re- turned values,if its targets cannot be already obtained from the class, Method or Field objects on which the reflective call is made.We evaluate ELF against Doop's state-of-the-art reflection analysis performed in the same context-sensitive Andersen's pointer analysis using all 11 DaCapo benchmarks and two applications.ELF can make a disciplined tradeoff among soundness,precision and scalability while also discovering usually more reflective targets.ELF is useful for any pointer analysis,particularly under-approximate techniques deployed for such clients as bug detection, program understanding and speculative compiler optimization. 1 Introduction Pointer analysis is an important enabling technology since it can improve the precision and performance of many program analyses.However,reflection poses a major obstacle to pointer analysis.Despite the large literature on whole-program [1,6,7,11,15,21]and demand-driven [10,13,14,17]pointer analysis for Java, almost all the analyses reported are unsound in the presence of refection since it is either ignored or handled partially.As a result,under-approximate or unsound techniques represent an attractive alternative in cases where sound analysis is not required 18(e.g.,for supporting bug detection,program understanding and speculative compiler optimization).Even so,ignoring reflection often leads to missed,important behaviors [18.This explains why modern pointer analysis tools for Java [4,19-21]provide some forms of reflection handling. As reflection is increasingly used in Java programs,the cost of imprecise re- flection handling has increased dramatically.To improve the effectiveness of a pointer analysis tool for Java,automatic techniques for handling reflection by balancing soundness,precision and scalability are needed.Despite its impor- tance,this problem has received little attention.Some solutions include(1)dy-
Self-Inferencing Reflection Resolution for Java Yue Li, Tian Tan, Yulei Sui, and Jingling Xue School of Computer Science and Engineering, UNSW Australia {yueli,tiantan,ysui,jingling}@cse.unsw.edu.au Abstract. C onsistent * Complete * Well D ocumen et d t ysaE * o Reuse * * Eva ul det a * EC O O P * Artifact * AE C Reflection has always been an obstacle both for sound and for effective under-approximate pointer analysis for Java applications. In pointer analysis tools, reflection is either ignored or handled partially, resulting in missed, important behaviors. In this paper, we present our findings on reflection usage in Java benchmarks and applications. Guided by these findings, we introduce a static reflection analysis, called Elf, by exploiting a self-inferencing property inherent in many reflective calls. Given a reflective call, the basic idea behind Elf is to automatically infer its targets (methods or fields) based on the dynamic types of the arguments of its target calls and the downcasts (if any) on their returned values, if its targets cannot be already obtained from the Class, Method or Field objects on which the reflective call is made. We evaluate Elf against Doop’s state-of-the-art reflection analysis performed in the same context-sensitive Andersen’s pointer analysis using all 11 DaCapo benchmarks and two applications. Elf can make a disciplined tradeoff among soundness, precision and scalability while also discovering usually more reflective targets. Elf is useful for any pointer analysis, particularly under-approximate techniques deployed for such clients as bug detection, program understanding and speculative compiler optimization. 1 Introduction Pointer analysis is an important enabling technology since it can improve the precision and performance of many program analyses. However, reflection poses a major obstacle to pointer analysis. Despite the large literature on whole-program [1, 6, 7, 11, 15, 21] and demand-driven [10, 13, 14, 17] pointer analysis for Java, almost all the analyses reported are unsound in the presence of reflection since it is either ignored or handled partially. As a result, under-approximate or unsound techniques represent an attractive alternative in cases where sound analysis is not required [18] (e.g., for supporting bug detection, program understanding and speculative compiler optimization). Even so, ignoring reflection often leads to missed, important behaviors [18]. This explains why modern pointer analysis tools for Java [4, 19–21] provide some forms of reflection handling. As reflection is increasingly used in Java programs, the cost of imprecise re- flection handling has increased dramatically. To improve the effectiveness of a pointer analysis tool for Java, automatic techniques for handling reflection by balancing soundness, precision and scalability are needed. Despite its importance, this problem has received little attention. Some solutions include (1) dy-
1 Aa new A(); 2 String cName,mName,fName =... Class clz Class.forName(cName); 4 Object obi clz.newInstance(); 5 Bb=(B)obj; 6 Method mtd clz.getDeclaredMethod(mName,{A.class}); 1 Obiect 1 mtd.invoke(b,{a}); 8 Field fld clz.getField(fName); 9 Xr =(X)fld.get(a); 10 fld.set(NULL,a); Fig.1.An example of reflection usage in Java. namic analysis [2 for recording reflective(call)targets discovered during input- dependent program runs and passing these annotations to a subsequent pointer analysis,(2)online analysis 5 for discovering reflective targets at run time and performing a pointer analysis to support JIT optimizations,and(3)static anal- ysis [4,8,20 for resolving reflective targets together with a pointer analysis. In this paper,we present a new static reflection analysis,called ELF,which is integrated into DooP,a state-of-the-art Datalog-based pointer analysis tool [4 for analyzing Java programs.ELF draws its inspirations from the two earlier re- flection analyses [4,8 and benefits greatly from the open-source reflection anal- ysis implemented in Doop [4].Livshits et al.8 suggested resolving reflective calls by tracking the flow of class/method/field names in the program.In the code from Figure 1.this involves tracking the flow of cName into clz in line 3,mName into mtd in line 6,and fName into fld in line 8,if cName,mName and fName are string constants.If cName is,say,read from a configuration file,they suggested narrowing the types of reflectively-created objects,e.g.,obj in line 4, optimistically by using the downcast (B)available in line 5.Later,Doop [4] handles reflection analogously,but context-sensitively,to obtain the full benefit from the mutual increase in precision of both component analyses. However,ELF goes beyond [4,8]by taking advantage of a self-inferencing property inherent in reflective code to strike a disciplined tradeoff among sound- ness,precision and scalability.Our key observation(made from a reflection-usage study described in Section 2)is that many reflective calls are self-inferenceable. Consider r =(X)fld.get(a)in Figure 1.Its target fields accessed can often be approximated based on the dynamic types (i.e.,A)of argument a and the downcast that post-dominates its return values,if fld represents a statically unknown field named fName.In this case.the refective call is resolved to all possible field reads r =a.f.Here,f is a field of type T(where T is X or a supertype or subtype of X),declared in a class C(where C is A or a supertype of A).To the best of our knowledge,ELF is the first static reflection analysis that exploits such self-inferencing property to resolve reflective calls. Due to the intricacies and complexities of the Java reflection APl,we will postpone a detailed comparison between ELF and the two state-of-the-art reflec- tion analyses [4,8 later in Section 3 after we have introduced ELF in full. In summary,this paper makes the following main contributions:
1 A a = new A(); 2 String cName, mName, fName = ...; 3 Class clz = Class.forName(cName); 4 Object obj = clz.newInstance(); 5 B b = (B)obj; 6 Method mtd = clz.getDeclaredMethod(mName,{A.class}); 7 Object l = mtd.invoke(b, {a}); 8 Field fld = clz.getField(fName); 9 X r = (X)fld.get(a); 10 fld.set(NULL, a); Fig. 1. An example of reflection usage in Java. namic analysis [2] for recording reflective (call) targets discovered during inputdependent program runs and passing these annotations to a subsequent pointer analysis, (2) online analysis [5] for discovering reflective targets at run time and performing a pointer analysis to support JIT optimizations, and (3) static analysis [4, 8, 20] for resolving reflective targets together with a pointer analysis. In this paper, we present a new static reflection analysis, called Elf, which is integrated into Doop, a state-of-the-art Datalog-based pointer analysis tool [4] for analyzing Java programs. Elf draws its inspirations from the two earlier re- flection analyses [4, 8] and benefits greatly from the open-source reflection analysis implemented in Doop [4]. Livshits et al. [8] suggested resolving reflective calls by tracking the flow of class/method/field names in the program. In the code from Figure 1, this involves tracking the flow of cName into clz in line 3, mName into mtd in line 6, and fName into fld in line 8, if cName, mName and fName are string constants. If cName is, say, read from a configuration file, they suggested narrowing the types of reflectively-created objects, e.g., obj in line 4, optimistically by using the downcast (B) available in line 5. Later, Doop [4] handles reflection analogously, but context-sensitively, to obtain the full benefit from the mutual increase in precision of both component analyses. However, Elf goes beyond [4, 8] by taking advantage of a self-inferencing property inherent in reflective code to strike a disciplined tradeoff among soundness, precision and scalability. Our key observation (made from a reflection-usage study described in Section 2) is that many reflective calls are self-inferenceable. Consider r = (X)fld.get(a) in Figure 1. Its target fields accessed can often be approximated based on the dynamic types (i.e., A) of argument a and the downcast that post-dominates its return values, if fld represents a statically unknown field named fName. In this case, the reflective call is resolved to all possible field reads r = a.f. Here, f is a field of type T (where T is X or a supertype or subtype of X), declared in a class C (where C is A or a supertype of A). To the best of our knowledge, Elf is the first static reflection analysis that exploits such self-inferencing property to resolve reflective calls. Due to the intricacies and complexities of the Java reflection API, we will postpone a detailed comparison between Elf and the two state-of-the-art reflection analyses [4, 8] later in Section 3 after we have introduced Elf in full. In summary, this paper makes the following main contributions:
We report findings on a reflection-usage study using 14 representative Java benchmarks and applications (Section 2).We expect these findings to be useful in guiding the design and implementation of reflection analysis. We introduce a static reflection analysis,ELF,to improve the effectiveness of pointer analysis tools for Java (Section 3).ELF adopts a new self-inferencing mechanism for reflection resolution and handles a significant part of the Java reflection aPI that was previously ignored or handled partially. We formulate ELF in Datalog consisting of 207 rules,covering the majority of reflection methods frequently used in Java programs(Section 4). We have evaluated ELF against a state-of-the-art reflection analysis in DooP (version r160113)under the same context-sensitive Andersen's pointer anal- ysis framework,using all 11 DaCapo benchmarks and two Java applications, Eclipse4 and Javac (Section 5).Our results show that ELF can make a dis- ciplined tradeoff among soundness,precision and scalability while resolving usually more reflective call targets than DooP. 2 Understanding Reflection Usage Section 2.1 provides a brief introduction to the Java reflection API.Section 2.2 reports our findings on reflection usage in Java benchmarks and applications. 2.1 Background The Java reflection API provides metaobjects to allow programs to examine themselves and make changes to their structure and behavior at run time.In Figure 1,the metaobjects clz,mtd and fld are instances of the metaobject classes Class,Method and Field,respectively.Constructor can be seen as Method except that the method name""is implicit.Class provides ac- cessor methods such as getDeclaredMethod()in line 6 and getField in line 8 to allow the other metaobjects (e.g.,of Method and Field)related to a Class object to be introspected.With dynamic invocation,a Method object can be commanded to invoke the method that it represents (line 7)and a Field object can be commanded to access the field that it represents (lines 9 and 10). As far as pointer analysis is concerned,we can divide the pointer-affecting methods in the Java reflection API into three categories:(1)entry methods, e.g.,forName()in line 3,for creating Class objects,(2)member-introspecting methods,e.g.,getDeclaredMethod()in line 6 and getField()in line 8,for retrieving Method (Constructor)and Field objects from a Class object,and(3) side-effect methods,e.g.,newInstance(),invoke(),get (and set()in lines 4,7,9 and 10,that affect the pointer information in the program reflectively. Class provides a number of accessor methods for introspecting methods, constructors and fields in a target class.Unlike [4,8,ELF is the first to handle all such accessor methods in reflection analysis.Let us recall the four on return- ing Method objects.getDeclaredMethod(String,Class []returns a Method object that represents a declared method of the target Class object with the name (formal parameter types)specified by the first (second)parameter (line
– We report findings on a reflection-usage study using 14 representative Java benchmarks and applications (Section 2). We expect these findings to be useful in guiding the design and implementation of reflection analysis. – We introduce a static reflection analysis, Elf, to improve the effectiveness of pointer analysis tools for Java (Section 3). Elf adopts a new self-inferencing mechanism for reflection resolution and handles a significant part of the Java reflection API that was previously ignored or handled partially. – We formulate Elf in Datalog consisting of 207 rules, covering the majority of reflection methods frequently used in Java programs (Section 4). – We have evaluated Elf against a state-of-the-art reflection analysis in Doop (version r160113) under the same context-sensitive Andersen’s pointer analysis framework, using all 11 DaCapo benchmarks and two Java applications, Eclipse4 and Javac (Section 5). Our results show that Elf can make a disciplined tradeoff among soundness, precision and scalability while resolving usually more reflective call targets than Doop. 2 Understanding Reflection Usage Section 2.1 provides a brief introduction to the Java reflection API. Section 2.2 reports our findings on reflection usage in Java benchmarks and applications. 2.1 Background The Java reflection API provides metaobjects to allow programs to examine themselves and make changes to their structure and behavior at run time. In Figure 1, the metaobjects clz, mtd and fld are instances of the metaobject classes Class, Method and Field, respectively. Constructor can be seen as Method except that the method name “” is implicit. Class provides accessor methods such as getDeclaredMethod() in line 6 and getField in line 8 to allow the other metaobjects (e.g., of Method and Field) related to a Class object to be introspected. With dynamic invocation, a Method object can be commanded to invoke the method that it represents (line 7) and a Field object can be commanded to access the field that it represents (lines 9 and 10). As far as pointer analysis is concerned, we can divide the pointer-affecting methods in the Java reflection API into three categories: (1) entry methods, e.g., forName() in line 3, for creating Class objects, (2) member-introspecting methods, e.g., getDeclaredMethod() in line 6 and getField() in line 8, for retrieving Method (Constructor) and Field objects from a Class object, and (3) side-effect methods, e.g., newInstance(), invoke(), get() and set() in lines 4, 7, 9 and 10, that affect the pointer information in the program reflectively. Class provides a number of accessor methods for introspecting methods, constructors and fields in a target class. Unlike [4, 8], Elf is the first to handle all such accessor methods in reflection analysis. Let us recall the four on returning Method objects. getDeclaredMethod(String, Class[]) returns a Method object that represents a declared method of the target Class object with the name (formal parameter types) specified by the first (second) parameter (line
6 in Figure 1).getMethod(String,Class[])is similar except that the re- turned Method object is public(either declared or inherited).If the target Class does not have a matching method,then its superclasses are searched first recur- sively (bottom-up)before its interfaces(implemented).getDeclaredMethods() returns an array of Method objects representing all the methods declared in the target Class object.getMethods()is similar except that all the public methods (either declared or inherited)in the target Class object are returned.Given a Method object mtd,its target method can be called as shown in line 7 in Figure 1. 2.2 Empirical Study The Java reflection API is rich and complex in details.We conduct an empirical study to understand reflection usage in practice in order to guide the design and implementation of a sophisticated reflection analysis. We select 14 representative Java programs,including nine DaCapo bench- marks (2006-10-MR2),three latest versions of popular desktop applications, javac-1.7.0,jEdit-5.1.0 and Eclipse-4.2.2 (denoted Eclipse4),and two latest versions of popular server applications,Jetty-9.0.5 and Tomcat-7.0.42. Note that DaCapo consists of 11 benchmarks,including an older version of Eclipse(version 3.1.2).We exclude bloat since its application code is reflection- free.We consider lucene instead of luindex and lusearch separately since these two benchmarks are derived from lucene with the same reflection usage. We consider a total of 191 methods in the Java reflection API (version 1.5). including the ones in java.lang.reflect and java.lang.Class,loadClass() in java.lang.ClassLoader,and getclass()in java.lang.Object.We have also considered A.class,which represents the Class object of a class A. We use SooT[19]to pinpoint the calls to reflection methods in the bytecode of a program.To understand reflection usage,we consider only the reflective calls found in the application classes and their dependent libraries but exclude the standard Java libraries.To increase the code coverage for the five applications considered,we include the jar files whose names contain the names of these applications (e.g.,*jetty*.jar for Jetty)and make them available under the process-dir option supported by SOOT.For Eclipse4,we use org.eclipse.core runtime.adaptor.EclipseStarter to enable SooT to locate all the other jar files used.We manually inspect the reflection usage in a program in a demand- driven manner,starting from its side-effect methods,assisted by Open Call Hier- archy in Eclipse,by following their backward slices.For a total of 609 side-effect callsites examined,510 callsites for calling entry methods and 304 callsites for calling member-introspecting methods are tracked and analyzed. Below we describe our five findings on reflection usage in our empirical study. Side-Effect Methods Table 1 lists a total of nine side-effect methods that can possibly modify or use (as their side effects)the pointer information in a program. Figure 2 depicts their percentage frequency distribution in the 14 programs studied.We can see that invoke()and Class:newInstance()are the two most frequently used (32.7%and 35.3%,respectively,on average),which are
6 in Figure 1). getMethod(String, Class[]) is similar except that the returned Method object is public (either declared or inherited). If the target Class does not have a matching method, then its superclasses are searched first recursively (bottom-up) before its interfaces (implemented). getDeclaredMethods() returns an array of Method objects representing all the methods declared in the target Class object. getMethods() is similar except that all the public methods (either declared or inherited) in the target Class object are returned. Given a Method object mtd, its target method can be called as shown in line 7 in Figure 1. 2.2 Empirical Study The Java reflection API is rich and complex in details. We conduct an empirical study to understand reflection usage in practice in order to guide the design and implementation of a sophisticated reflection analysis. We select 14 representative Java programs, including nine DaCapo benchmarks (2006-10-MR2), three latest versions of popular desktop applications, javac-1.7.0, jEdit-5.1.0 and Eclipse-4.2.2 (denoted Eclipse4), and two latest versions of popular server applications, Jetty-9.0.5 and Tomcat-7.0.42. Note that DaCapo consists of 11 benchmarks, including an older version of Eclipse (version 3.1.2). We exclude bloat since its application code is reflectionfree. We consider lucene instead of luindex and lusearch separately since these two benchmarks are derived from lucene with the same reflection usage. We consider a total of 191 methods in the Java reflection API (version 1.5), including the ones in java.lang.reflect and java.lang.Class, loadClass() in java.lang.ClassLoader, and getClass() in java.lang.Object. We have also considered A.class, which represents the Class object of a class A. We use Soot [19] to pinpoint the calls to reflection methods in the bytecode of a program. To understand reflection usage, we consider only the reflective calls found in the application classes and their dependent libraries but exclude the standard Java libraries. To increase the code coverage for the five applications considered, we include the jar files whose names contain the names of these applications (e.g., *jetty*.jar for Jetty) and make them available under the process-dir option supported by Soot. For Eclipse4, we use org.eclipse.core. runtime.adaptor.EclipseStarter to enable Soot to locate all the other jar files used. We manually inspect the reflection usage in a program in a demanddriven manner, starting from its side-effect methods, assisted by Open Call Hierarchy in Eclipse, by following their backward slices. For a total of 609 side-effect callsites examined, 510 callsites for calling entry methods and 304 callsites for calling member-introspecting methods are tracked and analyzed. Below we describe our five findings on reflection usage in our empirical study. Side-Effect Methods Table 1 lists a total of nine side-effect methods that can possibly modify or use (as their side effects) the pointer information in a program. Figure 2 depicts their percentage frequency distribution in the 14 programs studied. We can see that invoke() and Class::newInstance() are the two most frequently used (32.7% and 35.3%, respectively, on average), which are
Table 1.Nine side-effect methods and their side effects.assuming that the target class of clz and ctor is A and the target method (field)of mtd (fld)is m (f). Simplified Method Calling Scenario Side Effect Class::newInstance o clz.newInstance() o new A() Constructor:newInstance o=ctor.newInstance(farg1,.}) o=newA(arg1,…) Method:invoke a mtd.invoke(o,{arg1,...]) a=0.m(arg1,…) Field::get a=fd.get(o) a=o.f Field::set fld.set(o,a) o.f=a Array:newInstance o Array.newInstance(clz.size) o new Asize Array::get a Array.get(o,i) a=o Array::set Array.set(o,i,a)】 ofil =a Proxy:newProxyInstance o Prory.newProxyInstance(... o=new Proxy*(...) oxy..ge ne Other Fig.2.Side-effect methods Fig.3.Entry methods handled by prior pointer analysis tools 4,20,21].However,Array-related side- effect methods,which are also used in many programs,are previously ignored but handled by ELF.Note that newProxyInstance()is used in jEdit only. Entry Reflection Methods Figure 3 shows the percentage frequency distri- bution of different types of entry methods used.The six as shown are the only ones found in the first 12 programs.In the last two (Jetty and Tomcat),"Oth- ers"stands for defineClass()in ClassLoader and getParameterTypes()in Method only."Unknown"is included since we failed to find the entry meth- ods for some side-effect calls such as invoke()even by using Eclipse's Open Call Hierarchy tool.Finally,getComponentType()is usually used in the form of getclass().getComponentType()for creating a Class object argument for Array.newInstance().On average,Class.forName()and.class are the top two most frequently used entry methods (48.1%and 18.0%,respectively). String Constants and String Manipulation As shown in Figure 4,string constants are commonly used when calling the two entry methods (34.7%on average)and the four member-introspecting methods(63.1%on average).In the presence of string manipulations,many class/method/field names are unknown exactly.This is mainly because their static resolution requires precisely handling of many different operations e.g.,subString()and append().Thus,ELF does
Table 1. Nine side-effect methods and their side effects, assuming that the target class of clz and ctor is A and the target method (field) of mtd (fld) is m (f ). Simplified Method Calling Scenario Side Effect Class::newInstance o = clz.newInstance() o = new A() Constructor::newInstance o = ctor.newInstance({arg1, ...}) o = new A(arg1, ...) Method::invoke a = mtd.invoke(o, {arg1, ...}) a = o.m(arg1, ...) Field::get a = fld.get(o) a = o.f Field::set fld.set(o, a) o.f = a Array::newInstance o = Array.newInstance(clz, size) o = new A[size] Array::get a = Array.get(o, i) a = o[i] Array::set Array.set(o, i, a) o[i] = a Proxy::newProxyInstance o = Proxy.newProxyInstance(...) o = new Proxy$*(...) antlr chart eclipse fop hsqldb jython lucene pmd xalan eclipse4 javac jedit jetty tomcat average Array::set Array::get Field::set Field::get Method::invoke Array::newInstance Proxy::newProxyInstance Constructor::newInstance Class::newInstance Fig. 2. Side-effect methods. antlr chart eclipse fop hsqldb jython lucene pmd xalan eclipse4 javac jedit jetty tomcat average Unknown Others Proxy::getProxyClass Class::getComponentType ClassLoader::loadClass .class Object::getClass Class::forName Fig. 3. Entry methods. handled by prior pointer analysis tools [4, 20, 21]. However, Array-related sideeffect methods, which are also used in many programs, are previously ignored but handled by Elf. Note that newProxyInstance() is used in jEdit only. Entry Reflection Methods Figure 3 shows the percentage frequency distribution of different types of entry methods used. The six as shown are the only ones found in the first 12 programs. In the last two (Jetty and Tomcat), “Others” stands for defineClass() in ClassLoader and getParameterTypes() in Method only. “Unknown” is included since we failed to find the entry methods for some side-effect calls such as invoke() even by using Eclipse’s Open Call Hierarchy tool. Finally, getComponentType() is usually used in the form of getClass().getComponentType() for creating a Class object argument for Array.newInstance(). On average, Class.forName() and .class are the top two most frequently used entry methods (48.1% and 18.0%, respectively). String Constants and String Manipulation As shown in Figure 4, string constants are commonly used when calling the two entry methods (34.7% on average) and the four member-introspecting methods (63.1% on average). In the presence of string manipulations, many class/method/field names are unknown exactly. This is mainly because their static resolution requires precisely handling of many different operations e.g., subString() and append(). Thus, Elf does
□Unknown☐String Manipulation Unresolved☐String Manipulatio Resolved Const-String 言复是8量复复屋爱是盖道宜警 (a)Calls to entry methods (b)Calls to member-introspecting methods Fig.4.Classification of the String arguments of two entry methods,forName() and loadclass(),and four member-introspecting methods,getMethod() getDeclaredMethod(),getField()and getDeclaredField(). not handle string manipulations presently.As suggested in Section 5.3.2,how- ever,incomplete information about class/method/field names can be exploited in our self-inferencing framework,just like the cast and type information. We also found that many string arguments are Unknown (55.3%for calling entry methods and 25.1%for calling member-introspecting methods,on average). These are the strings that may be read from,say,configuration files or command lines.Finally,string constants are found to be more frequently used for calling the four member-introspecting methods than the two entry methods:146 calls to getDeclaredMethod()and getMethod(),27 calls to getDeclaredField() and getField()in contrast with 98 calls to forName()and loadclass().This suggests that the analyses [4,20 that ignore string constants flowing into some of these member-introspecting methods may be imprecise(Table 2). Self-Inferenceable Reflective Calls In real applications,many reflective calls are self-inferenceable,as illustrated in Figures 8-10.Therefore,we should try to find their targets by aggressively tracking the flow of constant class/method/field names in the program.However,there are also many input-dependent strings. For many input-dependent reflective calls,such as factoryField.get(null)in Figure 8,field.set(null,value)in Figure 9 and method.invoke(target, parameters)in Figure 10,we can approximate their targets reasonably accu- rately based on the dynamic types of the arguments of their target calls and the downcasts(if any)on their returned values.ELF will exploit such self-inferencing property inherent in reflective code during its reflection analysis. Retrieving an Array of Method/Field/Constructor Objects Class con- tains a number of accessor methods for returning an array of such metaobjects for the target Class object.In the two Eclipse programs,there are four invoke callsites called on an array of Method objects returned from getMethods and 15 fld.get()and fld.set()callsites called on an array of Field objects returned by getDeclaredFields().Ignoring such methods as in prior work [4,8,21]may lead to many missed methods in the call graph of a program
antlr chart eclipse fop hsqldb jython lucene pmd xalan eclipse4 javac jedit jetty tomcat average antlr chart eclipse fop hsqldb jython lucene pmd xalan eclipse4 javac jedit jetty tomcat average Unknown String Manipulation Unresolved String Manipulation Resolved Const-String (a) Calls to entry methods (b) Calls to member-introspecting methods Fig. 4. Classification of the String arguments of two entry methods, forName() and loadClass(), and four member-introspecting methods, getMethod(), getDeclaredMethod(), getField() and getDeclaredField(). not handle string manipulations presently. As suggested in Section 5.3.2, however, incomplete information about class/method/field names can be exploited in our self-inferencing framework, just like the cast and type information. We also found that many string arguments are Unknown (55.3% for calling entry methods and 25.1% for calling member-introspecting methods, on average). These are the strings that may be read from, say, configuration files or command lines. Finally, string constants are found to be more frequently used for calling the four member-introspecting methods than the two entry methods: 146 calls to getDeclaredMethod() and getMethod(), 27 calls to getDeclaredField() and getField() in contrast with 98 calls to forName() and loadClass(). This suggests that the analyses [4, 20] that ignore string constants flowing into some of these member-introspecting methods may be imprecise (Table 2). Self-Inferenceable Reflective Calls In real applications, many reflective calls are self-inferenceable, as illustrated in Figures 8 – 10. Therefore, we should try to find their targets by aggressively tracking the flow of constant class/method/field names in the program. However, there are also many input-dependent strings. For many input-dependent reflective calls, such as factoryField.get(null) in Figure 8, field.set(null, value) in Figure 9 and method.invoke(target, parameters) in Figure 10, we can approximate their targets reasonably accurately based on the dynamic types of the arguments of their target calls and the downcasts (if any) on their returned values. Elf will exploit such self-inferencing property inherent in reflective code during its reflection analysis. Retrieving an Array of Method/Field/Constructor Objects Class contains a number of accessor methods for returning an array of such metaobjects for the target Class object. In the two Eclipse programs, there are four invoke callsites called on an array of Method objects returned from getMethods and 15 fld.get() and fld.set() callsites called on an array of Field objects returned by getDeclaredFields(). Ignoring such methods as in prior work [4, 8, 21] may lead to many missed methods in the call graph of a program
r =(A)(o,{..))7 r=(A)get(o)4 ).set(o,a)s s Method M=C.getMethod(mName,{...)Field=CgetField(IName)3 Class.forName(cName)1: 年。tgw。g。w。,,。e。ee。gg。g IName Propagation =(A)C ype ->Target Method/Field Descriptor Inference Fig.5.Self-inferencing reflection analysis in ELF. 3 Methodology We start with a set of assumptions made.We then describe our self-inferencing approach adopted by ELF.Finally,we compare ELF with the two prior reflection analyses [4,8 by summarizing their similarities and differences. 3.1 Assumptions We adopt all the assumptions from [8]:(1)Closed World:only the classes reach- able from the class path at analysis time can be used by the program at run time,(2)Well-behaved Class Loaders:the name of the class returned by a call to forName(cName)equals cName,and(3)Correct Casts:the downcasts operating on the result of a call to newInstance()are correct.Due to (1),we will not con- sider the side-effect method Proxy:newProxyInstance in Table 1 and the entry method loadClass in Figure 3 as both may use custom class loaders.Finally, we broaden Correct Casts by also including fld.get()and mtd.invoke(). 3.2 Self-Inferencing Reflection Resolution Figure 5 depicts a typical reflection scenario and illustrates how ELF works. In this scenario,a Class object C is first created for the target class named cName.Then a Method (Field)object M(F)representing the target method (field)named mName(fName)in the target class of C is created.Finally,at some reflective callsites,e.g.,invoke(),get()and set(),the target method (field) is invoked (accessed)on the target object o,with the arguments,{...or a.In the case of newInstance(),the default constructor "init()"called is implicit. ELF works as part of a pointer analysis,with each being both the producer and consumer of the other.It exploits a self-inferencing property inherent in reflective code,by employing the following two component analyses(Figure 5): Target Propagation (Marked by Solid Arrows)ELF resolves the targets (methods or fields)of reflective calls,such as invoke(),get()and set(),by propagating the names of the target classes and methods/fields (e.g.,those
Target'Class'Type'Propagation Target'Method/Field'Name'Propagation Target'Class'Type'Inference Target'Method/Field'Descriptor'Inference r = (A) M .invoke(o, {...}) r = (A) F .get(o) F .set(o, a) Method M = C .getMethod(mName, {...}) Class C = Class.forName(cName) Field F = C .getField(fName) r = (A) C .newInstance() 1 2 3 4 5 6 7 Fig. 5. Self-inferencing reflection analysis in Elf. 3 Methodology We start with a set of assumptions made. We then describe our self-inferencing approach adopted by Elf. Finally, we compare Elf with the two prior reflection analyses [4, 8] by summarizing their similarities and differences. 3.1 Assumptions We adopt all the assumptions from [8]: (1) Closed World: only the classes reachable from the class path at analysis time can be used by the program at run time, (2) Well-behaved Class Loaders: the name of the class returned by a call to forName(cName) equals cName, and (3) Correct Casts: the downcasts operating on the result of a call to newInstance() are correct. Due to (1), we will not consider the side-effect method Proxy::newProxyInstance in Table 1 and the entry method loadClass in Figure 3 as both may use custom class loaders. Finally, we broaden Correct Casts by also including fld.get() and mtd.invoke(). 3.2 Self-Inferencing Reflection Resolution Figure 5 depicts a typical reflection scenario and illustrates how Elf works. In this scenario, a Class object C is first created for the target class named cName. Then a Method (Field) object M (F) representing the target method (field) named mName (fName) in the target class of C is created. Finally, at some reflective callsites, e.g., invoke(), get() and set(), the target method (field) is invoked (accessed) on the target object o, with the arguments, {...} or a. In the case of newInstance(), the default constructor “init()” called is implicit. Elf works as part of a pointer analysis, with each being both the producer and consumer of the other. It exploits a self-inferencing property inherent in reflective code, by employing the following two component analyses (Figure 5): Target Propagation (Marked by Solid Arrows) Elf resolves the targets (methods or fields) of reflective calls, such as invoke(), get() and set(), by propagating the names of the target classes and methods/fields (e.g., those
pointed by cName,mName and fName if statically known)along the solid lines into the points symbolized by circles.Note that the second argument of getMethod()is an array of type Class[].It may not be beneficial to analyze it to disambiguate overloaded methods.because (1)its size may be statically unknown,(2)its components are collapsed by the pointer analysis, and(3)its components may be Class objects with unknown class names. Target Inference (Marked by Dashed Arrows)By using Target Propaga- tion alone,a target method/field name (blue circle)or its target class type (red circle)at a reflective callsite may be missing.i.e..unknown,due to the presence of input-dependent strings(Figure 4).If the target class type(red circle)is missing,ELF will infer it from the dynamic type of the target ob- ject o (obtained by pointer analysis)at invoke(),get()or set()(when o !=null)or the downcast (if any),such as (A),that post-dominantly op- erates on the result of a call to newInstance().If the target method/field name (blue circle)is missing,ELF will infer it from (1)the dynamic types of the arguments of the target call,e.g.,{...of invoke()and a of set(), and/or(2)the downcast on the result of the call,such as (A)at invoke() and get().Just like getMethod,the second argument of invoke()is also an array,which is also similarly hard to analyze statically.To improve precision, we disambiguate overloaded target methods with a simple intraprocedural analysis only when the array argument can be analyzed exactly element-wise. To balance soundness,precision and scalability in a disciplined manner,ELF adopts the following inference principle:a target method or field is resolved at a reflective callsite if both its target class type (red circle)and its target method/- field name (blue circle)can be resolved (i.e.,statically discovered)during either Target Propagation or Target Inference.As a result,the number of spurious tar- gets introduced when analyzing a reflective call,invoke(),get()or set(),is minimized due to the existence of two simultaneous constraints (the red and blue circles).How to relax ELF in the presence of just one such a constraint will be investigated in future work.Note that the cast operations on newInstance() will still have to be handled heuristically as only one of the two constraints exists. As ELF is unsound,so is the underlying pointer analysis.Therefore,a reflective callsite is said to be resolved if at least one of its targets is resolved. Let us illustrate Target Inference by considering r=(A)F.get(o)in Fig- ure 5.If a target field name is known but its target class type (i.e.,red circle) is missing,we infer it by looking at the types of all pointed-to objects o'by o. If B is the type of o',then a potential target class of o is B or any of its super- types.If the target class type of F is B but a potential target field name (i.e., blue circle)is missing,we can deduce it from the downcast (A)to resolve the call to r =o.f,where f is a member field in B whose type is A or a supertype or subtype of A.A supertype is possible because a field of this supertype may initially point to an object of type,say,A and then downcast to A. In Figure 5,if getMethods()(getFields())is called at Label 6(Label 3) instead,then an array of Method(Field)objects will be returned so that Target
pointed by cName, mName and fName if statically known) along the solid lines into the points symbolized by circles. Note that the second argument of getMethod() is an array of type Class[]. It may not be beneficial to analyze it to disambiguate overloaded methods, because (1) its size may be statically unknown, (2) its components are collapsed by the pointer analysis, and (3) its components may be Class objects with unknown class names. Target Inference (Marked by Dashed Arrows) By using Target Propagation alone, a target method/field name (blue circle) or its target class type (red circle) at a reflective callsite may be missing, i.e., unknown, due to the presence of input-dependent strings (Figure 4). If the target class type (red circle) is missing, Elf will infer it from the dynamic type of the target object o (obtained by pointer analysis) at invoke(), get() or set() (when o != null) or the downcast (if any), such as (A), that post-dominantly operates on the result of a call to newInstance(). If the target method/field name (blue circle) is missing, Elf will infer it from (1) the dynamic types of the arguments of the target call, e.g., {...} of invoke() and a of set(), and/or (2) the downcast on the result of the call, such as (A) at invoke() and get(). Just like getMethod, the second argument of invoke() is also an array, which is also similarly hard to analyze statically. To improve precision, we disambiguate overloaded target methods with a simple intraprocedural analysis only when the array argument can be analyzed exactly element-wise. To balance soundness, precision and scalability in a disciplined manner, Elf adopts the following inference principle: a target method or field is resolved at a reflective callsite if both its target class type (red circle) and its target method/- field name (blue circle) can be resolved (i.e., statically discovered) during either Target Propagation or Target Inference. As a result, the number of spurious targets introduced when analyzing a reflective call, invoke(), get() or set(), is minimized due to the existence of two simultaneous constraints (the red and blue circles). How to relax Elf in the presence of just one such a constraint will be investigated in future work. Note that the cast operations on newInstance() will still have to be handled heuristically as only one of the two constraints exists. As Elf is unsound, so is the underlying pointer analysis. Therefore, a reflective callsite is said to be resolved if at least one of its targets is resolved. Let us illustrate Target Inference by considering r = (A) F.get(o) in Figure 5. If a target field name is known but its target class type (i.e., red circle) is missing, we infer it by looking at the types of all pointed-to objects o 0 by o. If B is the type of o 0 , then a potential target class of o is B or any of its supertypes. If the target class type of F is B but a potential target field name (i.e., blue circle) is missing, we can deduce it from the downcast (A) to resolve the call to r = o.f, where f is a member field in B whose type is A or a supertype or subtype of A. A supertype is possible because a field of this supertype may initially point to an object of type, say, A and then downcast to A. In Figure 5, if getMethods() (getFields()) is called at Label 6 (Label 3) instead, then an array of Method (Field) objects will be returned so that Target
Table 2.Comparing ELF with the two closely-related reflection analyses [4,8]. Member-Introspecting ⑧ DooP 4 ELF Side-Effect Methods Methods ◆一◆一 getMethod invoke getDeclaredMethod getMethods n/a n a v n avv getDeclaredMethods n/a n/a v n/a vv getField get getDeclaredField set getFields n/a n/av n/a vv getDeclaredFields n/a n/a v n/a vv Vn/a√/n/aVn/aWn/a Propagation from them is implicitly performed.All the other methods available in Class for introspecting methods/fields/constructors are handled similarly. 3.3 ELF vs.Livshits et al.'s Analysis and Doop Table 2 compares ELF with Livshits et al.'s and Doop's analyses [4,8 in terms of how four representative side-effect reflective calls are resolved. Target Propagation ELF resolves a target method/field at a reflective callsite by requiring both its target class type (red circle)and its target name(blue circle)to be known.However,this is not the case in the other two analyses. In the case of Livshits et al.'s analysis,the target class type is always ignored. Therefore,the target methods/fields with a given name in all the classes in the program are conservatively included.Doop suffers the opposite problem by ignoring the target method/field names.As a result,all methods/fields in the target class are included.Finally,of the three analyses,ELF is the only one that can handle all the member-introspecting methods listed. Target Inference Of the three analyses,ELF is the only one to adopt a self inferencing principle to find the target classes and methods/fields at a reflec- tive callsite.Livshits et al.'s analysis narrows the type of reflectively-created objects at newInstance()in Figure 5,but Doop does not do this.However, Doop is more sophisticated than Livshits et al.'s analysis in distinguishing virtual,static and special calls and considering the modifiers of fields for loads and stores.These are all handled by the ELF reflection analysis. 4 Reflection Resolution We specify the reflection resolution in ELF as a set of Datalog rules,i.e.,mono- tonic logical inferences(with no negation in a recursion cycle),following the style of [6].The main advantage is that the specification is close to the actual imple- mentation.Datalog has been the basis of several pointer analysis tools [4,6,8, 21].Our rules are declarative:the order of evaluation of rules or examination of their clauses do not affect the final results.Given a program to be analyzed,these rules are repeatedly applied to infer more facts until a fixed point is reached
Table 2. Comparing Elf with the two closely-related reflection analyses [4, 8]. Side-Effect Methods Member-Introspecting [8] Doop [4] Elf Methods invoke getMethod √ √ √ √ √ getDeclaredMethod √ √ √ √ √ √ getMethods n/a n/a √ n/a √ √ getDeclaredMethods n/a n/a √ n/a √ √ getField √ √ √ √ √ get getDeclaredField √ √ √ √ √ √ set getFields n/a n/a √ n/a √ √ getDeclaredFields n/a n/a √ n/a √ √ newInstance √ n/a √ √ n/a √ n/a √ n/a Propagation from them is implicitly performed. All the other methods available in Class for introspecting methods/fields/constructors are handled similarly. 3.3 Elf vs. Livshits et al.’s Analysis and Doop Table 2 compares Elf with Livshits et al.’s and Doop’s analyses [4, 8] in terms of how four representative side-effect reflective calls are resolved. Target Propagation Elf resolves a target method/field at a reflective callsite by requiring both its target class type (red circle) and its target name (blue circle) to be known. However, this is not the case in the other two analyses. In the case of Livshits et al.’s analysis, the target class type is always ignored. Therefore, the target methods/fields with a given name in all the classes in the program are conservatively included. Doop suffers the opposite problem by ignoring the target method/field names. As a result, all methods/fields in the target class are included. Finally, of the three analyses, Elf is the only one that can handle all the member-introspecting methods listed. Target Inference Of the three analyses, Elf is the only one to adopt a selfinferencing principle to find the target classes and methods/fields at a reflective callsite. Livshits et al.’s analysis narrows the type of reflectively-created objects at newInstance() in Figure 5, but Doop does not do this. However, Doop is more sophisticated than Livshits et al.’s analysis in distinguishing virtual, static and special calls and considering the modifiers of fields for loads and stores. These are all handled by the Elf reflection analysis. 4 Reflection Resolution We specify the reflection resolution in Elf as a set of Datalog rules, i.e., monotonic logical inferences (with no negation in a recursion cycle), following the style of [6]. The main advantage is that the specification is close to the actual implementation. Datalog has been the basis of several pointer analysis tools [4, 6, 8, 21]. Our rules are declarative: the order of evaluation of rules or examination of their clauses do not affect the final results. Given a program to be analyzed, these rules are repeatedly applied to infer more facts until a fixed point is reached
ELF works as part of a flow-insensitive Andersen's pointer analysis context- sensitively.However,all the Datalog rules are given here context-insensitively. There are 207 Datalog rules.One set of rules handles all the 98 possible scenarios (i.e.,combinations)involving the methods listed in Table 2(illustrated in Figure 5),where 98 =4 (four member-introspecting methods)x 3 (three side-effect methods,invoke(),get()and set())x 4(four possible arrows in Figure 5)x 2(two types of side-effect methods each,instance or static)+2 (newInstance()with a statically known or unknown type).This set of rules is further divided into those for performing target propagation(involving 4x 3x 1x 2+1 =25 scenarios)and those for performing target inference.The remaining set of rules handles Constructor and arrays and performs bookkeeping duties. Section 4.1 gives a set of domains and input/output relations used.Section 4.2 describes the seven target propagation scenarios corresponding to Labels 1-7 in Figure 5.Section 4.3 describes four representative target inference scenarios.All the other rules(available as an open-source tool)can be understood analogously. Section 4.4 discusses briefly some properties about our analysis. T:set of class types V:set of program variables M:set of methods F:set of fields H:set of heap abstractions I:set of invocation sites N:set of natural numbers S:set of strings SCALL(invo:I,mtd:M) VCALL(invo:I.base:V,mtd:M) ACTUALARC(invo:I,i:N,arg:V) ACTUALRETURN(invo:I,var:V) HEAPTYPE(heap:H,type:T) ASSICNABLE(to Type:T,from Type:T) THISVAR(mtd:M,this:V) LOOKUPMTD(type:T,mName:H,dp:S,mtd:M) MTDSTRING(mtd:M,str:S) STRING ToCLASS(strConst:H,type:T) MTDDECL(type:T,mName:H,dp:S,mtd:M) FLDDECL(type:T,fName:H,fType:T,Ad:F) PUBLICMTD(type:T,mName:H,mtd:M) PUBLICFLD(type:T.fName:H,Ad:F) NEWINSTANCEHEAP(type:T.heap:H) TYPE-CLASSHEAP(type:T,clzHeap:H) MTD-MTDHEAP(mtd:M,mtdHeap:H) FLD-FLDHEAP(Ad:F,fdHeap:H) VARPOINTSTo(var:V,heap:H) CALLGRAPH(invo:I,mtd:M) FLDPOINTSTo(base:H,Ad:F,heap:H) REPCALLGRAPH(invo:I.mtd:M) Fig.6.Domains and input/output relations 4.1 Domains and Input/Output Relations Figure 6 shows the eight domains used,18 input relations and four output rela- tions.Given a method mtd called at an invocation site I.as a static call (SCALL) or a virtual call(VCALL),its i-th argument arg is identified by AcTuALARG and its returned value is assigned to var as identified by AcruALRETURN. HEAPTYPE describes the types of heap objects.AssIGNABLE is the usual sub- typing relation.THIsVAR correlates this to each method where it is declared. MrDSTRING specifies the signatures (in the form of strings)for all the meth- ods,including also their containing class types and return types.STRINGTo- CLAss records the class type information for all compile-time string names. LookUPMTD matches a method mtd named mName with descriptor dp to its definition in a class,type.For simplicity,mName is modeled as a heap object in
Elf works as part of a flow-insensitive Andersen’s pointer analysis contextsensitively. However, all the Datalog rules are given here context-insensitively. There are 207 Datalog rules. One set of rules handles all the 98 possible scenarios (i.e., combinations) involving the methods listed in Table 2 (illustrated in Figure 5), where 98 = 4 (four member-introspecting methods) × 3 (three side-effect methods, invoke(), get() and set()) × 4 (four possible arrows in Figure 5) × 2 (two types of side-effect methods each, instance or static) + 2 (newInstance() with a statically known or unknown type). This set of rules is further divided into those for performing target propagation (involving 4×3×1× 2 + 1 = 25 scenarios) and those for performing target inference. The remaining set of rules handles Constructor and arrays and performs bookkeeping duties. Section 4.1 gives a set of domains and input/output relations used. Section 4.2 describes the seven target propagation scenarios corresponding to Labels 1 – 7 in Figure 5. Section 4.3 describes four representative target inference scenarios. All the other rules (available as an open-source tool) can be understood analogously. Section 4.4 discusses briefly some properties about our analysis. T: set of class types V : set of program variables M: set of methods F: set of fields H : set of heap abstractions I : set of invocation sites N : set of natural numbers S: set of strings Scall(invo:I, mtd:M) Vcall(invo:I, base:V, mtd:M) ActualArg(invo:I, i:N, arg:V ) ActualReturn(invo:I, var:V ) HeapType(heap:H, type:T) Assignable(toType:T, fromType:T) ThisVar(mtd:M, this:V ) LookUpMtd(type:T, mName:H, dp:S, mtd:M) MtdString(mtd:M, str:S) StringToClass(strConst:H, type:T) MtdDecl(type:T, mName:H, dp:S, mtd:M) FldDecl(type:T, fName:H, fType:T, fld:F) PublicMtd(type:T, mName:H, mtd:M) PublicFld(type:T, fName:H, fld:F) NewInstanceHeap(type:T, heap:H ) Type-ClassHeap(type:T, clzHeap:H ) Mtd-MtdHeap(mtd:M, mtdHeap:H ) Fld-FldHeap(fld:F, fldHeap:H ) VarPointsTo(var:V, heap:H ) CallGraph(invo:I, mtd:M) FldPointsTo(base:H, fld:F, heap:H ) RefCallGraph(invo:I, mtd:M) Fig. 6. Domains and input/output relations. 4.1 Domains and Input/Output Relations Figure 6 shows the eight domains used, 18 input relations and four output relations. Given a method mtd called at an invocation site I, as a static call (Scall) or a virtual call (Vcall), its i-th argument arg is identified by ActualArg and its returned value is assigned to var as identified by ActualReturn. HeapType describes the types of heap objects. Assignable is the usual subtyping relation. ThisVar correlates this to each method where it is declared. MtdString specifies the signatures (in the form of strings) for all the methods, including also their containing class types and return types. StringToClass records the class type information for all compile-time string names. LookUpMtd matches a method mtd named mName with descriptor dp to its definition in a class, type. For simplicity, mName is modeled as a heap object in