Effective Soundness-Guided Reflection Analysis Yue Li,Tian Tan,and Jingling Xue Programming Languages and Compilers Group School of Computer Science and Engineering,UNSW Australia Abstract.We introduce SOLAR.the first reflection analysis that allows its soundness to be reasoned about when some assumptions are met and produces significantly improved under-approximations otherwise.In both settings,SoLAR has three novel aspects:(1)lazy heap modeling for reflective allocation sites,(2)collective inference for improving the inferences on related reflective calls,and (3)automatic identification of "problematic"reflective calls that may threaten its soundness,precision and scalability,thereby enabling their improvement via lightweight anno- tations.We evaluate SOLAR against two state-of-the-art solutions,DooP and ELF,with the three treated as under-approximate reflection analyses, using 11 large Java benchmarks and applications.SoLAR is significantly more sound while achieving nearly the same precision and running only several-fold more slowly,subject to only 7 annotations in 3 programs. 1 Introduction Reflection is increasingly used in a range of software and framework architec- tures,allowing a software system to choose and change implementations of ser- vices at run-time,but posing significant challenges to static program analysis. In the case of Java programs,refection has always been an obstacle for pointer analysis [1-10],a fundamental static analysis on which virtually all others [11- 16]are built.All pointer analysis tools for Java [2,17-19]either ignore reflection or handle it partially since their underlying best-effort reflection analyses [5,17, 18,20-22]provide only under-approximated handling of reflection heuristically. However,such unsoundness can render much of the codebase invisible for analysis.There is a recent community initiative [23]calling for the development of soundy analysis to handle "hard"language features(such as reflection).A soundy analysis is one that is as sound as possible without excessively compro- mising precision and/or scalability.Thus,improving or even achieving soundness in reflection analysis will provide significant benefits to many clients,such as pro- gram verifiers,optimizing compilers,bug detectors and security analyzers. In this paper,we make the following contributions: We introduce SoLAR,the first reflection analysis that allows its soundness to be reasoned about when some reasonable assumptions are met and yields significantly improved under-approximations otherwise(Section 2).We have developed SoLAR by adopting three novel aspects in its design:(N1)lazy heap modeling for refective allocation sites.(N2)collective inference for
Effective Soundness-Guided Reflection Analysis Yue Li, Tian Tan, and Jingling Xue Programming Languages and Compilers Group School of Computer Science and Engineering, UNSW Australia Abstract. We introduce Solar, the first reflection analysis that allows its soundness to be reasoned about when some assumptions are met and produces significantly improved under-approximations otherwise. In both settings, Solar has three novel aspects: (1) lazy heap modeling for reflective allocation sites, (2) collective inference for improving the inferences on related reflective calls, and (3) automatic identification of “problematic” reflective calls that may threaten its soundness, precision and scalability, thereby enabling their improvement via lightweight annotations. We evaluate Solar against two state-of-the-art solutions, Doop and Elf, with the three treated as under-approximate reflection analyses, using 11 large Java benchmarks and applications. Solar is significantly more sound while achieving nearly the same precision and running only several-fold more slowly, subject to only 7 annotations in 3 programs. 1 Introduction Reflection is increasingly used in a range of software and framework architectures, allowing a software system to choose and change implementations of services at run-time, but posing significant challenges to static program analysis. In the case of Java programs, reflection has always been an obstacle for pointer analysis [1–10], a fundamental static analysis on which virtually all others [11– 16] are built. All pointer analysis tools for Java [2, 17–19] either ignore reflection or handle it partially since their underlying best-effort reflection analyses [5, 17, 18, 20–22] provide only under-approximated handling of reflection heuristically. However, such unsoundness can render much of the codebase invisible for analysis. There is a recent community initiative [23] calling for the development of soundy analysis to handle “hard” language features (such as reflection). A soundy analysis is one that is as sound as possible without excessively compromising precision and/or scalability. Thus, improving or even achieving soundness in reflection analysis will provide significant benefits to many clients, such as program verifiers, optimizing compilers, bug detectors and security analyzers. In this paper, we make the following contributions: – We introduce Solar, the first reflection analysis that allows its soundness to be reasoned about when some reasonable assumptions are met and yields significantly improved under-approximations otherwise (Section 2). We have developed Solar by adopting three novel aspects in its design: (N1) lazy heap modeling for reflective allocation sites, (N2) collective inference for
related reflective calls,and (N3)automatic identification of "problematic" reflective calls that may threaten its soundness,precision and scalability. -We formalize SoLAR as part of a pointer analysis for Java(including a small core of its reflection APl)and reason about its soundness under a set of assumptions(Section 3).We have produced an open source implementation on top of Doop [18,which is a modern pointer analysis tool for Java. We evaluate SOLAR against two state-of-the-art reflection analyses,Doop [5] and ELF [21],with 11 large Java benchmarks/applications (Section 4),where all the three are treated as under-approximate analyses(due to,e.g.,native code).By instrumenting these programs under their associated inputs(when available).SOLAR is the only one to achieve total recall(for all reflective tar- gets accessed),with 371%(148%)more target methods resolved than Doop (ELF)in total,which translates into 49700(40570)more true caller-callee re- lations statically calculated w.r.t.these inputs alone.SoLAR has done so by maintaining nearly the same precision as and running only several-fold more slowly than ELF and DooP,subject to only 7 annotations in 3 programs. 2 Methodology Fig.1 illustrates an example of reflection usage abstracted in real code.In line 2, a Class metaobject c1 is created by calling Class.forName(cName)to represent the class named cName,where cName,i.e.,cName1 in line 10 is an input string to be read from a command line or a configuration file.In line 3.an object o is refectively created as an instance of c1 by calling c1.newInstance()and then assigned to v with the declared type as Java.lang.Object in line 10. Subsequently,o is used in two common scenarios.In the if branch,o is downcast to a specific type,A,and then used appropriately.The else branch is more interesting.In line 14,a Method metaobject m is created by calling getMethod() indirectly in line 7,with its class name,method name and formal parameters specified by cName2,mName2 and "..."(elided)in line 7,respectively.In line 15, this method is called reflectively on the receiver object o(pointed to by v)with the actual argument being passed in an array,new object[]{x,y}. 1 Object createObj(String cName){ 9 void foo(xx.Yy....) Class c1=Class.forName(cName) 10 Object v=createObj(cName1)://cName1 is an input string retum c1.newinstance0: if(.)( 12 Aa=(A)v: 13 }else{ 5 Method getMtd(String cName.String mName){14 Method m=getMtd(cName2,mName2): 6 Class c2=Class.forName(cName): 5 m.invoke(v,new Object](x,y)): retum c2.getMethod(mName....) 16 8} 17} Fig.1.An example of reflection usage abstracted from JDK 1.6.0_45. A refection analysis infers,i.e..resolves statically the reflective targets ac- cessed at reflective call sites.As usual,soundnesss demands over-approximation. Reflection introduces many challenges for static analysis.First,a modern reflec- tion API is large and complex.Second,reflection is typically used as a means
related reflective calls, and (N3) automatic identification of “problematic” reflective calls that may threaten its soundness, precision and scalability. – We formalize Solar as part of a pointer analysis for Java (including a small core of its reflection API) and reason about its soundness under a set of assumptions (Section 3). We have produced an open source implementation on top of Doop [18], which is a modern pointer analysis tool for Java. – We evaluate Solar against two state-of-the-art reflection analyses, Doop [5] and Elf [21], with 11 large Java benchmarks/applications (Section 4), where all the three are treated as under-approximate analyses (due to, e.g., native code). By instrumenting these programs under their associated inputs (when available), Solar is the only one to achieve total recall (for all reflective targets accessed), with 371% (148%) more target methods resolved than Doop (Elf) in total, which translates into 49700 (40570) more true caller-callee relations statically calculated w.r.t. these inputs alone. Solar has done so by maintaining nearly the same precision as and running only several-fold more slowly than Elf and Doop, subject to only 7 annotations in 3 programs. 2 Methodology Fig. 1 illustrates an example of reflection usage abstracted in real code. In line 2, a Class metaobject c1 is created by calling Class.forName(cName) to represent the class named cName, where cName, i.e., cName1 in line 10 is an input string to be read from a command line or a configuration file. In line 3, an object o is reflectively created as an instance of c1 by calling c1.newInstance() and then assigned to v with the declared type as Java.lang.Object in line 10. Subsequently, o is used in two common scenarios. In the if branch, o is downcast to a specific type, A, and then used appropriately. The else branch is more interesting. In line 14, a Method metaobject m is created by calling getMethod() indirectly in line 7, with its class name, method name and formal parameters specified by cName2, mName2 and “. . . ” (elided) in line 7, respectively. In line 15, this method is called reflectively on the receiver object o (pointed to by v) with the actual argument being passed in an array, new Object[] {x, y}. 1 Object createObj(String cName) { 2 Class c1 = Class.forName(cName); 3 return c1.newInstance(); 4 } 5 Method getMtd(String cName, String mName) { 6 Class c2 = Class.forName(cName); 7 return c2.getMethod(mName, ... ); 8 } 9 void foo(X x, Y y, … ) { 10 Object v = createObj(cName1); //cName1 is an input string 11 if ( … ) { 12 A a = (A) v; 13 } else { 14 Method m = getMtd(cName2, mName2); 15 m.invoke(v, new Object[] {x, y}); 16 } 17 } … … Fig. 1. An example of reflection usage abstracted from JDK 1.6.0 45. A reflection analysis infers, i.e., resolves statically the reflective targets accessed at reflective call sites. As usual, soundnesss demands over-approximation. Reflection introduces many challenges for static analysis. First, a modern reflection API is large and complex. Second, reflection is typically used as a means
of supporting dynamic adaptation of object-oriented software.As such.metaob- jects are often created reflectively as shown in Fig.1 from input strings.Thus, reflective object creation via newInstance()is hard to model statically.Finally. picking judicious approximations to balance soundness,precision and scalability is non-trivial.A simple-minded sound modeling of a reflective call (e.g.,by as- suming arbitrary behaviour)would destroy precision.Imprecision,in turn,often destroys scalability because too many spurious results would be computed. SOLAR automates reflection analysis for Java by working with a pointer anal- ysis.We first define some assumptions (Section 2.1).We then look at the three limitations of the prior work (Section 2.2).Finally,we introduce SOLAR to ad- dress these limitations by adopting three novel aspects in its design(Section 2.3). 2.1 Assumptions The first three are made previously on reflection analysis for Java [20,21].The last one is introduced to allow reflective allocation sites to be modeled lazily. Assumption 1 (Closed-World).Only the classes reachable from the class path at analysis time can be used during program erecution. This assumption is reasonable since we cannot expect static analysis to han- dle all classes that a program may conceivably download from the net and load at runtime.In addition,Java native methods are excluded as well. Assumption 2 (Well-Behaved Class Loaders).The name of the class returned by a call to Class.forName(cName)equals cName. Assumption 3(Correct Casts).Type cast operations applied to the results of reflective calls are correct,without throwing a ClassCastException. Assumption 4 (Object Reachability).Every object o created reflectively in a call to newInstance()fows into (i.e.,is used in)either (1)a type cast operation ...(T)v or (2)a call to invoke(v,...),get(v)or set(v,...),where v points to o,along every execution path in the program. As discussed in Section 4.2,Assumption 4 is found to hold for most reflective allocation sites in real code(as illustrated in Fig.1).Here,(1)and(2)represent two kinds of usage points at which the class types of object o will be inferred lazily.This makes it possible to handle reflective allocation sites more accurately than before and to reason about the soundness of SoLAR for the first time. 2.2 Past Work:Best-Effort Reflection Resolution All the existing solutions [5,17,18,20-22]adopt a best-effort approach to reflec- tion analysis,and consequently,suffer from the following three limitations: L1.Eager Heap Modeling An abstract object o created at a call to,e.g., c.newInstance()is modeled eagerly if its type c can be inferred from a string constant or intraprocedural post-dominant cast,and ignored otherwise.Specifi- cally,.if c represents a known class name,e.g,“A”,then o's type is“A”.Other- wise,an intraprocedurally post-dominating cast operation (T)operating on the result of the newInstance()call will allow c to be over-approximated as T or any of its subtypes.This eager approach often fails in real code shown in Fig.1, where cNamel is an input string and the cast is not post-dominating.Thus,its
of supporting dynamic adaptation of object-oriented software. As such, metaobjects are often created reflectively as shown in Fig. 1 from input strings. Thus, reflective object creation via newInstance() is hard to model statically. Finally, picking judicious approximations to balance soundness, precision and scalability is non-trivial. A simple-minded sound modeling of a reflective call (e.g., by assuming arbitrary behaviour) would destroy precision. Imprecision, in turn, often destroys scalability because too many spurious results would be computed. Solar automates reflection analysis for Java by working with a pointer analysis. We first define some assumptions (Section 2.1). We then look at the three limitations of the prior work (Section 2.2). Finally, we introduce Solar to address these limitations by adopting three novel aspects in its design (Section 2.3). 2.1 Assumptions The first three are made previously on reflection analysis for Java [20, 21]. The last one is introduced to allow reflective allocation sites to be modeled lazily. Assumption 1 (Closed-World). Only the classes reachable from the class path at analysis time can be used during program execution. This assumption is reasonable since we cannot expect static analysis to handle all classes that a program may conceivably download from the net and load at runtime. In addition, Java native methods are excluded as well. Assumption 2 (Well-Behaved Class Loaders). The name of the class returned by a call to Class.forName(cName) equals cName. Assumption 3 (Correct Casts). Type cast operations applied to the results of reflective calls are correct, without throwing a ClassCastException. Assumption 4 (Object Reachability). Every object o created reflectively in a call to newInstance() flows into (i.e., is used in) either (1) a type cast operation ...= (T) v or (2) a call to invoke(v,...), get(v) or set(v,...), where v points to o, along every execution path in the program. As discussed in Section 4.2, Assumption 4 is found to hold for most reflective allocation sites in real code (as illustrated in Fig. 1). Here, (1) and (2) represent two kinds of usage points at which the class types of object o will be inferred lazily. This makes it possible to handle reflective allocation sites more accurately than before and to reason about the soundness of Solar for the first time. 2.2 Past Work: Best-Effort Reflection Resolution All the existing solutions [5, 17, 18, 20–22] adopt a best-effort approach to reflection analysis, and consequently, suffer from the following three limitations: L1. Eager Heap Modeling An abstract object o created at a call to, e.g., c.newInstance() is modeled eagerly if its type c can be inferred from a string constant or intraprocedural post-dominant cast, and ignored otherwise. Specifi- cally, if c represents a known class name, e.g., “A”, then o’s type is “A”. Otherwise, an intraprocedurally post-dominating cast operation (T) operating on the result of the newInstance() call will allow c to be over-approximated as T or any of its subtypes. This eager approach often fails in real code shown in Fig. 1, where cName1 is an input string and the cast is not post-dominating. Thus, its
newInstance()call is ignored.Recently.in DoOP (r5459247-beta)18,the ob- jects created in line 10(or line 3)are assumed to be of type A by taking advantage of the non-post-dominating cast (A)in line 12 to analyze more code.However, the objects with other types created along both the if and else branches are ignored.In prior work,such under-approximate handling of newInstance()is a significant source of unsoundness,as a large part of the program called on the thus ignored objects has been rendered invisible for analysis. L2.Isolated Inferences Many reflective calls(e.g.,those in Fig.1)are related but analysed mostly in isolation,resulting in under-approximated behaviours.In [21],we presented a self-inferencing reflection analysis,called ELF,that can infer more targets at a reflective call site than before [5,17,18,20,22],by exploiting more information available(e.g.,from its arguments and return type).However, due to eager heap modeling,ELF will still ignore the invoke()call in line 15 as v points to objects of unknown types as discussed above. L3.Design-Time Soundness,Precision and Scalability When analysing a program heuristically,a best-effort approach does not know which reflective calls may potentially affect its soundness,precision and scalability.As a result, a developer is out of luck with a program if such best-effort analysis is either unscalable or scalable but with undesired soundness or precision or both. 2.3 SOLAR:Soundness-Guided Reflection Resolution Fig.2 illustrates the SoLAR design,with its three novel aspects marked by N1 -N3,where Ni is introduced to overcome the afore-mentioned limitation Li. Jave Scalable Satisfied Programs N1 Lazy Heap Modeling Prover N2 Collective inference Urscalable PROBE Lightweight Arnotations impreicse Unsound Reflective Cals Fig.2.SOLAR:a soundness-guided analysis with three novel aspects,N1-N3. N1.Lazy Heap Modeling (LHM)SOLAR handles reflective object creation lazily by delaying the creation of objects at their usage points where their types may be inferred,achieving significantly improved soundness and precision. Let us describe the basic idea behind using the example in Fig.1.As cName at c1 Class.forName(cName)in line 2 is unknown,SOLAR will create a Class metaobject ciu that represents this unknown class and assign it to c1.As ci points to c1"at the allocation site v=c1.newInstance()in line 3,SOLAR will create an abstract object o of an unknown type for the site to mark it as being unresolved yet.Subsequently,o will flow into two usage points:Case(I)a type cast operation in line 12 and Case (II)a reflective method call site in line 15. In Case (I),where o is downcast to A,its type u is inferred to be A or any of its subtypes.Let t1,...,tn be all the inferred types.Then o is split into n
newInstance() call is ignored. Recently, in Doop (r5459247-beta) [18], the objects created in line 10 (or line 3) are assumed to be of type A by taking advantage of the non-post-dominating cast (A) in line 12 to analyze more code. However, the objects with other types created along both the if and else branches are ignored. In prior work, such under-approximate handling of newInstance() is a significant source of unsoundness, as a large part of the program called on the thus ignored objects has been rendered invisible for analysis. L2. Isolated Inferences Many reflective calls (e.g., those in Fig. 1) are related but analysed mostly in isolation, resulting in under-approximated behaviours. In [21], we presented a self-inferencing reflection analysis, called Elf, that can infer more targets at a reflective call site than before [5, 17, 18, 20, 22], by exploiting more information available (e.g., from its arguments and return type). However, due to eager heap modeling, Elf will still ignore the invoke() call in line 15 as v points to objects of unknown types as discussed above. L3. Design-Time Soundness, Precision and Scalability When analysing a program heuristically, a best-effort approach does not know which reflective calls may potentially affect its soundness, precision and scalability. As a result, a developer is out of luck with a program if such best-effort analysis is either unscalable or scalable but with undesired soundness or precision or both. 2.3 Solar: Soundness-Guided Reflection Resolution Fig. 2 illustrates the Solar design, with its three novel aspects marked by N1 – N3, where Ni is introduced to overcome the afore-mentioned limitation Li. Scalable Unscalable Satisfied Soundness Proven Java Programs Violated Impreicse Unsound N1 Lazy Heap Modelling Soundness Criteria Imprecision Criteria ( by Customized Thresholds ) PROBE Reflective Calls Lightweight Annotations Automatic Identification of Problematic Reflective Calls N2 Collective Inference N3 Fig. 2. Solar: a soundness-guided analysis with three novel aspects, N1 – N3. N1. Lazy Heap Modeling (LHM) Solar handles reflective object creation lazily by delaying the creation of objects at their usage points where their types may be inferred, achieving significantly improved soundness and precision. Let us describe the basic idea behind using the example in Fig. 1. As cName at c1 = Class.forName(cName) in line 2 is unknown, Solar will create a Class metaobject c1u that represents this unknown class and assign it to c1. As c1 points to c1u at the allocation site v = c1.newInstance() in line 3, Solar will create an abstract object o u 3 of an unknown type for the site to mark it as being unresolved yet. Subsequently, o u 3 will flow into two usage points: Case (I) a type cast operation in line 12 and Case (II) a reflective method call site in line 15. In Case (I), where o u 3 is downcast to A, its type u is inferred to be A or any of its subtypes. Let t1, . . . , tn be all the inferred types. Then o u 3 is split into n
distinct objects o3,...,o"to be assigned to a in line 12.In Case (II),SOLAR will infer u by performing a collective inference as described below,based on the information available in line 15.Let t,...,tm be all the inferred types.Then og is split into m distinct objectsoto be assigned to v in line 15. According to Assumption 4 that states a key observation validated later,a reflectively created object like o is typically used in either Case(I)or Case(II) along every program path.The only but rare exception is that o is created but never used later.Then the corresponding constructor must be annotated to be analyzed statically unless ignoring it will not affect the points-to information. N2.Collective Inference SoLAR builds on the prior work [5,17,18,20,22,21] by relying on collective inference emphasized for the first time in reflection anal- ysis.Let us return to the invoke()call,which cannot be analyzed previously. As v points to o,SoLAR can infer u based on the information available at the call site.This happens when Case (1)cName2 is known or Case(2)cName2 is unknown but mName2 is known.In SOLAR,inference is performed "collectively" whereby inferences on related reflective calls(lines 3,6 and 15 for Case(1)and lines 3.7 and 15 for Case(2))can mutually reinforce each other.We will examine the second case,i.e.,the more complex of the two,in Section 3.4.3.This paper is the first to do so by exploiting the connection between newInstance()(via LHM)and reflective calls for manipulating methods and fields. N3.Automatic Identification of "Problematic"Reflective Calls Due to this capability,SoLAR is the first that can reason about its soundness.When such reasoning is not possible due to,e.g.,native code,SoLAR reduces to an effective under-approximate analysis due to its soundness-guided design,allowing a disciplined tradeoff to be made among soundness,precision and scalability. If SOLAR is scalable for a program,SoLAR can automatically identify "prob- lematic"reflective calls (as opposed to reporting input strings as in [20])that may threaten its soundness and precision to enable both to be improved with lightweight annotations.If SoLAR is unscalable for a program,a simplified ver- sion of SOLAR,denoted PROBE in Fig.2,is called for next.With some "prob- lematic"reflective calls to be annotated,SoLAR will re-analyze the program, scalably after one or more iterations of this "probing"process.We envisage providing a range of PROBE variants with different tradeoffs among soundness, precision and scalability,so that the scalability of PROBE is always guaranteed. Consider Fig.1 again.If both cName2 and mName2 are unknown (given that the type of o is unknown),then SoLAR will flag the invoke()call in line 15 as being potentially unsoundly resolved,detected automatically by verifying Condition(3)in Section 3.5.In addition,SoLAR will also automatically highlight reflective calls that may be potentially imprecisely resolved.Their lightweight annotations will allow SoLAR to yield improved soundness and precision. Discussion Under Assumptions 1-4,we can establish the soundness of So- LAR by verifying a soundness criterion (given in Section 3.5).Otherwise,our soundness-guided approach has made SoLAR demonstrably more effective than existing under-approximate reflection analyses [5,17,20,21]as validated later
distinct objects o t1 3 ,. . . , o tn 3 to be assigned to a in line 12. In Case (II), Solar will infer u by performing a collective inference as described below, based on the information available in line 15. Let t 1 1 , . . . , t1 m be all the inferred types. Then o u 3 is split into m distinct objects o t 1 1 3 ,. . . , o t 1 m 3 to be assigned to v in line 15. According to Assumption 4 that states a key observation validated later, a reflectively created object like o u 3 is typically used in either Case (I) or Case (II) along every program path. The only but rare exception is that o u 3 is created but never used later. Then the corresponding constructor must be annotated to be analyzed statically unless ignoring it will not affect the points-to information. N2. Collective Inference Solar builds on the prior work [5, 17, 18, 20, 22, 21] by relying on collective inference emphasized for the first time in reflection analysis. Let us return to the invoke() call, which cannot be analyzed previously. As v points to o u 3 , Solar can infer u based on the information available at the call site. This happens when Case (1) cName2 is known or Case (2) cName2 is unknown but mName2 is known. In Solar, inference is performed “collectively”, whereby inferences on related reflective calls (lines 3, 6 and 15 for Case (1) and lines 3, 7 and 15 for Case (2)) can mutually reinforce each other. We will examine the second case, i.e., the more complex of the two, in Section 3.4.3. This paper is the first to do so by exploiting the connection between newInstance() (via LHM) and reflective calls for manipulating methods and fields. N3. Automatic Identification of “Problematic” Reflective Calls Due to this capability, Solar is the first that can reason about its soundness. When such reasoning is not possible due to, e.g., native code, Solar reduces to an effective under-approximate analysis due to its soundness-guided design, allowing a disciplined tradeoff to be made among soundness, precision and scalability. If Solar is scalable for a program, Solar can automatically identify “problematic” reflective calls (as opposed to reporting input strings as in [20]) that may threaten its soundness and precision to enable both to be improved with lightweight annotations. If Solar is unscalable for a program, a simplified version of Solar, denoted Probe in Fig. 2, is called for next. With some “problematic” reflective calls to be annotated, Solar will re-analyze the program, scalably after one or more iterations of this “probing” process. We envisage providing a range of Probe variants with different tradeoffs among soundness, precision and scalability, so that the scalability of Probe is always guaranteed. Consider Fig. 1 again. If both cName2 and mName2 are unknown (given that the type of o u 3 is unknown), then Solar will flag the invoke() call in line 15 as being potentially unsoundly resolved, detected automatically by verifying Condition (3) in Section 3.5. In addition, Solar will also automatically highlight reflective calls that may be potentially imprecisely resolved. Their lightweight annotations will allow Solar to yield improved soundness and precision. Discussion Under Assumptions 1 – 4, we can establish the soundness of Solar by verifying a soundness criterion (given in Section 3.5). Otherwise, our soundness-guided approach has made Solar demonstrably more effective than existing under-approximate reflection analyses [5, 17, 20, 21] as validated later
3 Formalism We formalise SOLAR,illustrated in Fig.2,for REFJAVA,which is Java restricted to a small core of its reflection API.SOLAR is flow-insensitive but context- sensitive.However.our formalisation is context-insensitive. 3.1 The REFJAVA Language REFJAVA consists of all Java programs(under Assumptions 1-4)except that the Java reflection API is restricted to the four methods in Fig.1:Class.forName(), newInstance(),getMethod()and invoke().Our formalism is designed to allow its straightforward generalization to the entire API.For example,reflective field accesses via getField(),get()and set()can be handled similarly.As is stan- dard,a Java program is represented only by five kinds of statements in the SSA form,as shown in Fig.5.For simplicity,we assume that all the methods of a class accessed reflectively are its instance members,i.e.,v null in invoke(v,...) in Fig.1.We will discuss how to handle static members in Section 3.9. 3.2 Road Map Inference As depicted in Fig.3,SOLAR's in- ference system,which consists of five components,works together with a pointer analysis..The arrow←→be tween a component and the pointer analysis indicates that each is both a producer and consumer of the other. Fig.3.SOLAR's inference system. Let us see how SOLAR resolves the invoke()call in Fig.1.If cName2 and mName2 are string constants,Propagation will create a Method metaobject(pointed to by m)carrying its known class and method information and pass it to Target Search (1).If cName2 or mName2 is not a constant,a Method metaobject marked as such is created and passed to Inference(2),which will infer the missing information and pass a freshly gener- ated Method metaobject enriched with the missing information to Target Search (3).Then Target Search maps a Method metaobject to its reflective target mtd in its declaring class().Finally,Transformation turns the reflective call in line 15 into a regular call v.mtd(...)and pass it to the pointer analysis (5).Lazy Heap Modeling handles newInstance()in Fig.1 to resolve the dynamic type of v based on the information discovered by Propagation (@)or Inference (b). 3.3 Notations We will use the notations in Fig.4.A method signature consists of the method name and descriptor (i.e.,return type and parameter types)and,a method is specified by its method signature and the class where it is declared or inherited. CO and MO represent the set of Class and Method metaobjects,respectively. In particular,ct denotes a Class metaobject of a known class t and c"a Class metaobject of an unknown class u.As illustrated earlier with Fig.1,we write of
3 Formalism We formalise Solar, illustrated in Fig. 2, for RefJava, which is Java restricted to a small core of its reflection API. Solar is flow-insensitive but contextsensitive. However, our formalisation is context-insensitive. 3.1 The RefJava Language RefJava consists of all Java programs (under Assumptions 1 – 4) except that the Java reflection API is restricted to the four methods in Fig. 1: Class.forName(), newInstance(), getMethod() and invoke(). Our formalism is designed to allow its straightforward generalization to the entire API. For example, reflective field accesses via getField(), get() and set() can be handled similarly. As is standard, a Java program is represented only by five kinds of statements in the SSA form, as shown in Fig. 5. For simplicity, we assume that all the methods of a class accessed reflectively are its instance members, i.e., v ‰ null in invoke(v,...) in Fig. 1. We will discuss how to handle static members in Section 3.9. 3.2 Road Map Inference Transformation Pointer Analysis Propagation Lazy Heap Modeling Target Search 1 4 3 2 a 5 b Fig. 3. Solar’s inference system. As depicted in Fig. 3, Solar’s inference system, which consists of five components, works together with a pointer analysis. The arrow ÐÑ between a component and the pointer analysis indicates that each is both a producer and consumer of the other. Let us see how Solar resolves the invoke() call in Fig. 1. If cName2 and mName2 are string constants, Propagation will create a Method metaobject (pointed to by m) carrying its known class and method information and pass it to Target Search ( 1 ). If cName2 or mName2 is not a constant, a Method metaobject marked as such is created and passed to Inference ( 2 ), which will infer the missing information and pass a freshly generated Method metaobject enriched with the missing information to Target Search ( 3 ). Then Target Search maps a Method metaobject to its reflective target mtd in its declaring class ( 4 ). Finally, Transformation turns the reflective call in line 15 into a regular call v.mtd(...) and pass it to the pointer analysis ( 5 ). Lazy Heap Modeling handles newInstance() in Fig. 1 to resolve the dynamic type of v based on the information discovered by Propagation ( a ) orInference ( b ). 3.3 Notations We will use the notations in Fig. 4. A method signature consists of the method name and descriptor (i.e., return type and parameter types) and, a method is specified by its method signature and the class where it is declared or inherited. CO and MO represent the set of Class and Method metaobjects, respectively. In particular, c t denotes a Class metaobject of a known class t and c u a Class metaobject of an unknown class u. As illustrated earlier with Fig. 1, we write o t i
to represent an abstract object created at an allocation site i if it is an instance of a known class t and o!of (an unknown class type)otherwise.For a Method metaobject,we write m if it is a member in a known class t and my otherwise, with its signature being s.In particular,we write m as a shorthand for m when s is unknown (with the return type s.t,being ignored),i.e.,when s.nm s.p =u. class type teT Class metaobject ct,c"E CO method name nm∈N Method metaobject*mg,mi,g,:∈M0=个×Sm parameter types pE P TO UTI UT2... method signature*seSm=个×N×企 method m∈M=TXT×N×P return type* 8.tr∈T local variable c.m∈y method name* s.nm∈A abstract heap object of,o5,,oi,o,…∈parameter types*s-p∈p Fig.4.Notations (=Xfu),where u is an unknown class type or an unknown method signature).A superscript *marks a domain that contains u. 3.4 The SOLAR's Inference System We present the inference rules used by all the components in Fig.3,starting with the pointer analysis and moving to the five components of SoLAR. 3.4.1 Pointer Analysis Fig.5 gives a standard formulation of a flow-insensitive Andersen's pointer analysis for REFJAVA.pt(r)represents the points-to set of a pointer z.An array object is analyzed with its elements collapsed to a single field,denoted arr.For example,x[i]y can be seen as x.arr y.In [A-NEw), of uniquely identifies the abstract object created as an instance of t at this allocation site,labeled by i.In [A-LD]and [A-sr],the field accesses are handled. 9N时0阿o时2国ks x=y {oi}∈pt(x) x=y.m(argi:....argn)of E pt(y)m'=dispatch(of,m) o)Ept(m))pt(me)9t)V1≤k≤:p(args)EpmA-Cu叫 Fig.5.Rules for Pointer Analysis. In [A-CALL](for non-reflective calls),the function dispatch(o;,m)is used to resolve the virtual dispatch of method m on the receiver object of to be m'(when m is invokable on of).Following [24],we assume that m'has a formal parameter mthis for the receiver object and mpl,...,mpn for the remaining parameters, and a pseudo-variable met is used to hold the return value of m'. 3.4.2 Propagation Fig.6 gives the rules for forName()and getMethod() calls.Depending on whether their arguments are string constants or not,differ- ent kinds of Class and Method metaobjects are created.SC is a set of string constants and toClass returns a Class metaobject c,where t is the class speci- fied by the string value returned by val(o;)(with val:H-java.lang.String). By design,ct and m will flow to Target Search but all the others,i.e.,c",m" and m will fow to Inference,where the missing information is inferred.During Propagation,only the name of a method signature s (i.e.,s.nm)can be discovered but its other parts are unknown:s.t=s.p u
to represent an abstract object created at an allocation site i if it is an instance of a known class t and o u i of (an unknown class type) otherwise. For a Method metaobject, we write m t s if it is a member in a known class t and m u s otherwise, with its signature being s. In particular, we write mu as a shorthand for ms when s is unknown (with the return type s.tr being ignored), i.e., when s.nm “ s.p “ u. class type t P T Class metaobject c t , c u P CO method name nm P N Method metaobject* m t s , m t u, m u s , m u u P MO = Tp ˆ Sm parameter types p P P “ T 0 Y T 1 Y T 2 . . . method signature* s P Sm = Tp ˆ Np ˆ Pp method m P M “ T ˆ T ˆ N ˆ P return type* s.tr P Tp local variable c, m P V method name* s.nm P Np abstract heap object o t 1 , ot 2 , . . . , ou 1 , ou 2 , ¨ ¨ ¨ P H parameter types* s.p P Pp Fig. 4. Notations (Xp “ X Y tuu, where u is an unknown class type or an unknown method signature). A superscript ‘*’ marks a domain that contains u. 3.4 The Solar’s Inference System We present the inference rules used by all the components in Fig. 3, starting with the pointer analysis and moving to the five components of Solar. 3.4.1 Pointer Analysis Fig. 5 gives a standard formulation of a flow-insensitive Andersen’s pointer analysis for RefJava. pt(x) represents the points-to set of a pointer x. An array object is analyzed with its elements collapsed to a single field, denoted arr. For example, x[i] = y can be seen as x.arr = y. In [A-New], o t i uniquely identifies the abstract object created as an instance of t at this allocation site, labeled by i. In [A-Ld] and [A-St], the field accesses are handled. i : x “ new tpq to t i u P ptpxq [A-New] x “ y ptpyq Ď ptpxq [A-Cpy] x “ y.f o t i P pt(y) ptpo t i .fq Ď pt(x) [A-Ld] x.f “ y o t i P pt(x) pt(y) Ď ptpo t i .fq [A-St] x “ y.mparg1 , ..., argnq oi P pt(y) m1 “ dispatchpoi , mq toi u Ď ptpm1 this) ptpm1 ret) Ď pt(x) @ 1 ď k ď n : ptpargkq Ď ptpm1 pk) [A-Call] Fig. 5. Rules for Pointer Analysis. In [A-Call] (for non-reflective calls), the function dispatchpoi , mq is used to resolve the virtual dispatch of method m on the receiver object oi to be m1 (when m is invokable on oi ). Following [24], we assume that m1 has a formal parameter m1 this for the receiver object and m1 p1 , . . . , m1 pn for the remaining parameters, and a pseudo-variable m1 ret is used to hold the return value of m1 . 3.4.2 Propagation Fig. 6 gives the rules for forName() and getMethod() calls. Depending on whether their arguments are string constants or not, different kinds of Class and Method metaobjects are created. SC is a set of string constants and toClass returns a Class metaobject c t , where t is the class speci- fied by the string value returned by val(oi) (with val : H Ñ java.lang.String). By design, c t and m t s will flow to Target Search but all the others, i.e., c u , m u and mu will flow to Inference, where the missing information is inferred. During Propagation, only the name of a method signature s (i.e., s.nm) can be discovered but its other parts are unknown: s.tr “ s.p “ u
Classc-Class.forName(cName)Ept(eName)P.FoRNAME g{g码sc c=toClass(val(o)) Method m=c.getMethod(mName,.)ept(nName)c∈pt(eP.GETMTD] 「{}ifc-=c'AOtrin6 E3C pt(m)2 (ifc--cf A orine Sc 8.tr u {og}ifc=c“Aorins ESC s.nm =val(osrin) 8.p=u a}ifc=c“Aoin Fig.6.Rules for Propagation. 3.4.3 Inference Fig.7 gives three rules to infer the reflective target methods for x =(A)m.invoke(y,args),where A indicates a post-dominating cast on its result.If A =Object,then no such cast exists.In [I-INvTPl,we use the types of the objects pointed to by y to infer the class types of the target methods called. Note that m'represents a freshly generated Method metaobject.In [I-INvsic),we use the information available at a call site(excluding y)to infer the descriptor in the signature of a target method.In [-INvS2T],we use the signature of a method to infer the class types of the method. x =(A)m.invoke(y,args)muE pt(m) oke(y.args},m-Pl-INvT到ptm2{m2】s.pe Pip(args),a.tr《:A,s.nm=u}-N pt(m)2{moe pt(y)) x(A)m.invoke(y.args)pt(m)oe pt(y)Ptp(args)-IvS2T) pt(m)2{mltE M(s.tr.s.nm:s.p)} Fig.7.Rules for Inference. As is standard,t<:t'holds when t is t'or a subtype of t'.In [I-INvSic)and [-INvs2T],:is used to take advantage of the post-dominating cast (A)during inference when A is not Object.By definition,u:Object holds.If t'is not Object,then t:t'holds if and only if t <t'or t'<:t holds.The information on args is also exploited,where args is an array of type object []only when it can be analyzed exactly element-wise by an intraprocedural analysis.In this case,suppose that args is an array of n elements.Let Ai be the set of types of the objects pointed to by its i-th element,args [i].Let P={t'tAi,t<:t'). Then Ptp(args)=Po×·×Pn-i.Otherwise,Ptp(args)=,implying that args is ignored as it cannot be exploited effectively during inference. To maintain precision in [I-INvs2T],we use a method signature to infer its classes when both its name and descriptor are known.In this rule,the function M(st,,s.nm,s.p)returns the set of class types where the method with the speci- fied signature s is declared if s.nmu and s.pu,and otherwise.The return type of the matching method is ignored if s.tr =u. Let us illustrate some of our rules by considering our example in Fig.1. Erample 1.Note that cName1 is an input string.Suppose that cName2 is also an input string but mName2 is a string constant.By applying [P-FORNAME),[P-GErMTD] and [L-UKWTP](in Fig.9)to the calls to forName()in lines 2 and 6,getMethod() and newInstance(),respectively,we obtain c1“∈pt(c1),c2“∈pt(c2),mg∈ pt(m)and o"e pt(v),where s is a signature with a known method name in mName2.Given args new object [{x,y),Ptp(args)is built as described
Class c “ Class.forNamepcNameq o String i P pt(cName) ptpcq Ě " tc t u if o String i P SC tc uu otherwise c t “ toClasspvalpo String i qq [P-ForName] Method m “ c 1 .getMethodpmName, ...q o String i P pt(mName) c P pt(c1 ) ptpmq Ě $ ’’’& ’’’% tm t s u if c “ c t ^ o String i P SC tm t uu if c “ c t ^ o String i R SC tm u s u if c “ c u ^ o String i P SC tm u uu if c “ c u ^ o String i R SC s.tr “ u s.nm “ valpo String i q s.p “ u [P-GetMtd] Fig. 6. Rules for Propagation. 3.4.3 Inference Fig. 7 gives three rules to infer the reflective target methods for x = (A) m.invoke(y,args), where A indicates a post-dominating cast on its result. If A = Object, then no such cast exists. In [I-InvTp], we use the types of the objects pointed to by y to infer the class types of the target methods called. Note that m t represents a freshly generated Method metaobject. In [I-InvSig], we use the information available at a call site (excluding y) to infer the descriptor in the signature of a target method. In [I-InvS2T], we use the signature of a method to infer the class types of the method. m.invoke(y, args) m u P pt(m) pt(m) Ě t m t | o t i P pt(y)u [I-InvTp] x “ pAq m.invoke(y, args) mu P pt(m) pt(m) Ě t ms | s.p P P tppargsq, s.tr !: A, s.nm “ uu [I-InvSig] x “ pAq m.invoke(y, args) m u s P pt(m) o u i P pt(y) s.tr !: A s.nm ‰ u s.p P P tppargsq pt(m) Ě t m t s | t P Mps.tr, s.nm, s.pqu [I-InvS2T] Fig. 7. Rules for Inference. As is standard, t ă: t 1 holds when t is t 1 or a subtype of t 1 . In [I-InvSig] and [I-InvS2T], !: is used to take advantage of the post-dominating cast (A) during inference when A is not Object. By definition, u !: Object holds. If t 1 is not Object, then t !: t 1 holds if and only if t ă: t 1 or t 1 ă: t holds. The information on args is also exploited, where args is an array of type Object[], only when it can be analyzed exactly element-wise by an intraprocedural analysis. In this case, suppose that args is an array of n elements. Let Ai be the set of types of the objects pointed to by its i-th element, args[i]. Let Pi “ tt 1 | t P Ai , t ă: t 1 u. Then P tppargsq “ P0 ˆ ¨ ¨ ¨ ˆ Pn´1. Otherwise, P tppargsq “ ∅, implying that args is ignored as it cannot be exploited effectively during inference. To maintain precision in [I-InvS2T], we use a method signature to infer its classes when both its name and descriptor are known. In this rule, the function Mpstr , s.nm, s.pq returns the set of class types where the method with the speci- fied signature s is declared if s.nm ‰ u and s.p ‰ u, and ∅ otherwise. The return type of the matching method is ignored if s.tr “ u. Let us illustrate some of our rules by considering our example in Fig. 1. Example 1. Note that cName1 is an input string. Suppose that cName2 is also an input string but mName2 is a string constant. By applying [P-ForName], [P-GetMtd] and [L-UkwTp] (in Fig. 9) to the calls to forName() in lines 2 and 6, getMethod() and newInstance(), respectively, we obtain c1u P ptpc1q, c2u P ptpc2q, m u s P ptpmq and o u i P ptpvq, where s is a signature with a known method name in mName2. Given args = new Object[] {x,y}, P tppargsq is built as described
earlier.SOLAR can infer the classes t where this method is declared by [I-INvS2T). Finally,SOLAR will add all inferred Method objects m to pt(m)at the call site. 3.4.4 Target Search For a Method object m in a known class t(with s being possibly u),we define MTD:MO-P(M)to find all target methods matched: MTD(m)=mtdLookUp(t',s.tr,s.nm;s.p) (1) t<:t' where mtdLookUp is the standard lookup function for finding the methods ac- cording to a declaring class t'and a signature s except that (1)the return type s.tr is also considered and(2)any u that appears in s is treated as a wild card. 3.4.5 Transformation Fig.8 gives the rules used for transforming a reflective call into a regular statement,which will be analyzed by the pointer analysis. x=m.invoke(y,args)m E pt(m)m'E MTD(m')oE pt(args) of E pt(of.arr)t"is declaring type of mpk k[1.n]t'<:t" [T-INV] of C pt(argk)x =y.m'(arg1:...,argn) Fig.8.Rules for Transformation. Let us examine [T-INv]in more detail.The second argument args points to a 1- D array of type Object []with its elements collapsed to a single field arr during the pointer analysis,unless args can be analyzed exactly intraprocedurally in our current implementation.Let argi,...,argn be the n freshly created arguments to be passed to each potential target method m'found by Target Search.Let mpi,...,mo be the n parameters (excluding this)of m',such that the declaring type of mk is t".We include o to pt(argk)only when holds in order to filter out the objects that cannot be assigned to mk Finally,the regular call obtained can be analyzed by [A-CALL]in Fig.5. 3.4.6 Lazy Heap Modeling Fig.9 gives the rules for resolving newInstance() lazily.In [L-KWTel,for each Class object ct pointed to by c',an object,of,is created as an instance of this known type at allocation site i straightaway.In IL-UKWTP),as illustrated with Fig.1,o is created to enable LHM if c'points to a cu instead.Then its lazy object creation happens at a type cast by applying [L-CAsr)(with o"blocked from flowing from x to a)and an invoke()call site by applying [L-INv).Note that A is assumed not to be Object in [L-CAsr]. i:v=.newInstance(cept()LKwTr i:v=d.newinstance(cept(c)UKWT {oi}∈pt(v) {or}∈pt(v) Aa=(A)x opt()te:AL-CAST]=m.incoke()oiept(y)∈ptm)t《:tL-liW {oi}∈pt(a) (o =pt(y) Fig.9.Rules for Lazy Heap Modeling. 3.5 Soundness Criterion REFJAVA consists of the four methods from the Java reflection API as shown in Fig.1.SoLAR is sound if their calls are resolved soundly under Assumptions 1 -4.By construction,calls to Class.forName()and getMethod()are always
earlier. Solar can infer the classes t where this method is declared by [I-InvS2T]. Finally, Solar will add all inferred Method objects m t s to ptpmq at the call site. 3.4.4 Target Search For a Method object m t s in a known class t (with s being possibly u), we define MT D : MO Ñ PpMq to find all target methods matched: MT Dpm t sq “ ď tă:t 1 mtdLookU ppt 1 , s.tr, s.nm, s.pq (1) where mtdLookU p is the standard lookup function for finding the methods according to a declaring class t 1 and a signature s except that (1) the return type s.tr is also considered and (2) any u that appears in s is treated as a wild card. 3.4.5 Transformation Fig. 8 gives the rules used for transforming a reflective call into a regular statement, which will be analyzed by the pointer analysis. x “ m.invoke(y, args) m t P pt(m) m1 P MTDpm t q oi P pt(args) o t 1 j P ptpoi .arrq t 2 is declaring type of m1 pk k P r1, ns t 1 ă: t 2 to t 1 j u Ď ptpargkq x “ y.m1 parg1 , ..., argnq [T-Inv] Fig. 8. Rules for Transformation. Let us examine [T-Inv] in more detail. The second argument args points to a 1- D array of type Object[], with its elements collapsed to a single field arr during the pointer analysis, unless args can be analyzed exactly intraprocedurally in our current implementation. Let arg1,. . . , argn be the n freshly created arguments to be passed to each potential target method m1 found by Target Search. Let m1 p1 , . . . , m1 pn be the n parameters (excluding this) of m1 , such that the declaring type of m1 pk is t 2 . We include o t 1 j to ptpargk q only when t 1 ă: t 2 holds in order to filter out the objects that cannot be assigned to m1 pk. Finally, the regular call obtained can be analyzed by [A-Call] in Fig. 5. 3.4.6 Lazy Heap Modeling Fig. 9 gives the rules for resolving newInstance() lazily. In [L-KwTp], for each Class object c t pointed to by c 1 , an object, o t i , is created as an instance of this known type at allocation site i straightaway. In [L-UkwTp], as illustrated with Fig. 1, o u i is created to enable LHM if c 1 points to a c u instead. Then its lazy object creation happens at a type cast by applying [L-Cast] (with o u i blocked from flowing from x to a) and an invoke() call site by applying [L-Inv]. Note that A is assumed not to be Object in [L-Cast]. i : v “ c 1 .newInstancepq c t P pt(c1 ) to t i u Ď pt(v) [L-KwTp] i : v “ c 1 .newInstancepq c u P pt(c1 ) to u i u Ď pt(v) [L-UkwTp] A a “ pAq x o u i P pt(x) t ă: A to t i u Ď ptpaq [L-Cast] x “ m.invokepy, ...q o u i P pt(y) m t P pt(m) t 1 !: t to t 1 i u Ď pt(y) [L-Inv] Fig. 9. Rules for Lazy Heap Modeling. 3.5 Soundness Criterion RefJava consists of the four methods from the Java reflection API as shown in Fig. 1. Solar is sound if their calls are resolved soundly under Assumptions 1 – 4. By construction, calls to Class.forName() and getMethod() are always
soundly resolved(with the metaobjects created being modelled appropriately). Due to Assumption 4,there is no need to consider newInstance()calls since they are soundly resolved if all invoke()calls are.For convenience,we define: AllKwn(v)=o∈pt(o) (2) which means that the dynamic type of every object pointed to by v is known. Consider Fig.7.For the Method metaobjects m with known classes t,these targets can be soundly resolved by Target Search,except that the signatures s can be further refined by applying [-Ixvsic).For the Method objects m"with unknown class types u,the targets accessed are inferred by [I-INVTrl and [-INvS2T). Let us consider a call to (A)m.invoke(y,args).SOLAR attempts to infer the missing classes of its Method metaobjects in two ways,by applying [I-INVTr]and UI-INVS2T].Such a call is soundly resolved if the following condition holds: SC(m.invoke(y,args)=AllKwn(y)vHm#∈pt(m):s.nm≠u Ptp(args)≠②(3) If the first disjunct holds,applying [-INVTP]to invoke()can over-approximate its target methods from the types of all objects pointed to by y.Thus,every Method metaobject m4 E pt(m)is refined into a new one m for every of e pt(y). If the second disjunct holds,then [-Ixvs2T]comes into play.Its targets are over-approximated based on the known method names s.nm and the types of the objects pointed to by args.Thus,every Method metaobject mE pt(m)is refined into a new one m,where s.tr<:A and s.pe Ptp(args).Note that s.t,is leveraged only when it is not u.The post-dominating cast (A)is considered not to exist if A Object.In this case,u Object holds (only for u). Theorem 1.SOLAR is sound for REFJAVA if SC(c)holds at every reflective call c of the form "(A)m.invoke(y,args)"under Assumptions 1-4. 3.6 Identifying Unsoundly Resolved Reflective Calls SOLAR flags a call c to invoke()as resolved unsoundly if SC(c)is false.This can be conservative as some points-to information at c can be over-approximate. However,our evaluation shows that SoLAR can analyze 7 out of the 10 large programs considered scalably with full automation,implying that its inference system is powerful and precise.In addition,all 13 unsound calls reported by SOLAR in the remaining three programs are truly unsound,as discussed in Sec- tion 4.4,validating SOLAR's effectiveness in identifying unsoundness. 3.7 Identifying Imprecisely Resolved Reflective Calls Presently,SoLAR performs this task depicted in Fig.2,by simply ranking the reflective call sites according to the number of reflective targets inferred.This simple metric often gives a good indication about the sources of imprecision. 3.8 PROBE For evaluation purposes,we instantiate PROBE,as shown in Fig.2,from SOLAR as follows.We refrain from performing SOLAR's LHM(by retaining [L-UKWTe)but
soundly resolved (with the metaobjects created being modelled appropriately). Due to Assumption 4, there is no need to consider newInstance() calls since they are soundly resolved if all invoke() calls are. For convenience, we define: AllKwnpvq “ E o u i P ptpvq (2) which means that the dynamic type of every object pointed to by v is known. Consider Fig. 7. For the Method metaobjects m t s with known classes t, these targets can be soundly resolved by Target Search, except that the signatures s can be further refined by applying [I-InvSig]. For the Method objects m u s with unknown class types u, the targets accessed are inferred by [I-InvTp] and [I-InvS2T]. Let us consider a call to (A) m.invoke(y, args). Solar attempts to infer the missing classes of its Method metaobjects in two ways, by applying [I-InvTp] and [I-InvS2T]. Such a call is soundly resolved if the following condition holds: SCpm.invoke(y,args)q “ AllKwnpyq_ @ m u s P ptpmq : s.nm ‰ u^P tppargsq ‰ ∅ (3) If the first disjunct holds, applying [I-InvTp] to invoke() can over-approximate its target methods from the types of all objects pointed to by y. Thus, every Method metaobject m u P ptpmq is refined into a new one m t for every o t i P ptpyq. If the second disjunct holds, then [I-InvS2T] comes into play. Its targets are over-approximated based on the known method names s.nm and the types of the objects pointed to by args. Thus, every Method metaobject m u s P ptpmq is refined into a new one m t s , where s.tr !: A and s.p P P tppargsq ‰ ∅. Note that s.tr is leveraged only when it is not u. The post-dominating cast (A) is considered not to exist if A = Object. In this case, u !: Object holds (only for u). Theorem 1. Solar is sound for RefJava if SCpcq holds at every reflective call c of the form “(A) m.invoke(y, args)” under Assumptions 1 – 4. 3.6 Identifying Unsoundly Resolved Reflective Calls Solar flags a call c to invoke() as resolved unsoundly if SCpcq is false. This can be conservative as some points-to information at c can be over-approximate. However, our evaluation shows that Solar can analyze 7 out of the 10 large programs considered scalably with full automation, implying that its inference system is powerful and precise. In addition, all 13 unsound calls reported by Solar in the remaining three programs are truly unsound, as discussed in Section 4.4, validating Solar’s effectiveness in identifying unsoundness. 3.7 Identifying Imprecisely Resolved Reflective Calls Presently, Solar performs this task depicted in Fig. 2, by simply ranking the reflective call sites according to the number of reflective targets inferred. This simple metric often gives a good indication about the sources of imprecision. 3.8 Probe For evaluation purposes, we instantiate Probe, as shown in Fig. 2, from Solar as follows. We refrain from performing Solar’s LHM (by retaining [L-UkwTp] but