Design of a Certifying Compiler Supporting Proof of Program Safety Yiyun Chen Lin Ge Baojian Hua Zhaopeng Li Cheng Liu Department of Computer Science and Technology University of Science and Technology of China Hefei,Anhui 230026,China yiyun @ustc.edu.cn gelin,huabj,zpli,liuc5 @mail.ustc.edu.cn Abstract safety policies is one of the hot topics in this research field. In the past decade,great progress has been made in the Safety is an important property of high-assurance soft- area of program verification.Many projects adopt type- ware,and one of the hot research topics on it is the veri- based or logic-based approaches to reason about the prop- fication method for software to meet its safety policies.In erties of low-level or high-level programs.The TAL (Typed our previous work,we designed a pointer logic system and Assembly Language)[15]project and the theory of type re- proposed a framework for developing and verifying safety- finements [13]are two typical projects using type-based ap- critical programs.And in this paper,we present the design proaches,while Proof-Carrying Code(PCC)[16],Founda- and implementation of a certifying compiler based on that tional PCC(FPCC)[1].Certified Assembly Programming framework.Here we will mainly explain verification con- (CAP)[19]and Stack-based CAP (SCAP)[6]are typical dition generation,generation of code and assertions,and projects on logic-based techniques.Type-based and logic- proof generation for basic blocks.Our certifying compiler based techniques are complementary to each other,and has the following novelties:1)it supports a programming some projects have tried to combine them.Hamid et al. language equipped with both a type system and a logic sys- adopted a syntactic approach to FPCC[7],and gave a trans- tem:2)and it can produce safety proofs for programs with lation from a typed assembly language into FPCC in the pointers. Flint project.The ATS(Applied Type System)[18]project proposed by Hongwei Xi et al.extends a type system with a notion of program states,so that invariants on states can 1.Introduction be captured in stateful programming.By encoding Hoare logic in its type system,ATS can support Hoare-logic-like Nowadays,high-assurance software is more and more reasoning via the type system. valued,and among its properties,safety and security are Based on the above work,we have proposed a new most important.Safety is the ability to guarantee that soft- framework for developing and verifying safety-critical soft- ware execution does no harm to the system.And security ware: is the ability to preserve the confidentiality,integrity,avail- 1.Safety policies of a program are specified formally by ability and authentication of data.There is a close relation- programmers.These specifications as well as the cor- ship between safety and security.For example,hackers usu- responding program are submitted to a compiler. ally use low-grade safety errors,such as buffer overflows, out-of-bound array accesses and dangling pointer derefer- 2.The compiler produces verification conditions (VCs) for the submitted program.These VCs are required to ences,to destroy a system or achieve unauthorized control of a system.Apparently,it is helpful to ensure software se- verify whether the program satisfies its specifications. curity by strengthening its safety,and it makes us focus on If all of them are proved,the program is believed to software safety in this paper. satisfy the specifications.Most VCs can be proved To achieve software safety,all program errors should automatically by the embedded theorem prover in the be discovered before the execution of the program or be compiler,and the rest are proved interactively by pro- gently captured during the execution.The research goal grammers using proof-assistant tools.All the proofs of software safety is to build a wholesome scientific and should be produced and preserved. technological infrastructure for the management of software 3.The compiler compiles the source code and specifica- safety.And the verification method for software to meet its tions into assembly level counterparts while translating
Design of a Certifying Compiler Supporting Proof of Program Safety Yiyun Chen Lin Ge Baojian Hua Zhaopeng Li Cheng Liu Department of Computer Science and Technology University of Science and Technology of China Hefei, Anhui 230026, China yiyun@ustc.edu.cn {gelin, huabj, zpli, liuc5}@mail.ustc.edu.cn Abstract Safety is an important property of high-assurance software, and one of the hot research topics on it is the veri- fication method for software to meet its safety policies. In our previous work, we designed a pointer logic system and proposed a framework for developing and verifying safetycritical programs. And in this paper, we present the design and implementation of a certifying compiler based on that framework. Here we will mainly explain verification condition generation, generation of code and assertions, and proof generation for basic blocks. Our certifying compiler has the following novelties: 1) it supports a programming language equipped with both a type system and a logic system; 2) and it can produce safety proofs for programs with pointers. 1. Introduction Nowadays, high-assurance software is more and more valued, and among its properties, safety and security are most important. Safety is the ability to guarantee that software execution does no harm to the system. And security is the ability to preserve the confidentiality, integrity, availability and authentication of data. There is a close relationship between safety and security. For example, hackers usually use low-grade safety errors, such as buffer overflows, out-of-bound array accesses and dangling pointer dereferences, to destroy a system or achieve unauthorized control of a system. Apparently, it is helpful to ensure software security by strengthening its safety, and it makes us focus on software safety in this paper. To achieve software safety, all program errors should be discovered before the execution of the program or be gently captured during the execution. The research goal of software safety is to build a wholesome scientific and technological infrastructure for the management of software safety. And the verification method for software to meet its safety policies is one of the hot topics in this research field. In the past decade, great progress has been made in the area of program verification. Many projects adopt typebased or logic-based approaches to reason about the properties of low-level or high-level programs. The TAL (Typed Assembly Language) [15] project and the theory of type re- finements [13] are two typical projects using type-based approaches, while Proof-Carrying Code (PCC) [16], Foundational PCC (FPCC) [1], Certified Assembly Programming (CAP) [19] and Stack-based CAP (SCAP) [6] are typical projects on logic-based techniques. Type-based and logicbased techniques are complementary to each other, and some projects have tried to combine them. Hamid et al. adopted a syntactic approach to FPCC [7], and gave a translation from a typed assembly language into FPCC in the Flint project. The ATS (Applied Type System) [18] project proposed by Hongwei Xi et al. extends a type system with a notion of program states, so that invariants on states can be captured in stateful programming. By encoding Hoare logic in its type system, ATS can support Hoare-logic-like reasoning via the type system. Based on the above work, we have proposed a new framework for developing and verifying safety-critical software: 1. Safety policies of a program are specified formally by programmers. These specifications as well as the corresponding program are submitted to a compiler. 2. The compiler produces verification conditions (VCs) for the submitted program. These VCs are required to verify whether the program satisfies its specifications. If all of them are proved, the program is believed to satisfy the specifications. Most VCs can be proved automatically by the embedded theorem prover in the compiler, and the rest are proved interactively by programmers using proof-assistant tools. All the proofs should be produced and preserved. 3. The compiler compiles the source code and specifications into assembly level counterparts while translating
the proofs obtained from step 2 into proofs at assem- pointer arithmetic are forbidden.To guarantee that type- bly level.The latter proofs are carried in the assembly checked programs are well-typed,union types and coercion code and they ensure that the code meets the corre- of types are also forbidden.malloc and free are used as pre- sponding specifications.These specifications must be defined functions which meet the primary safety policies. equivalent to the counterparts at source level.Such a For example,each invocation of malloc will allocate an area compiler is called a certifying compiler. of memory successfully,and the region of the memory will 4.At assembly level,a proof checker checks the proofs never overlap those already in use. Pointers are classified into effective pointers (those carried in the assembly code automatically to ensure point to objects),null pointers and dangling pointers,and that the code satisfies its specifications. the latter two are also called ineffective pointers.Ineffec- The advantage of this framework is that it provides tive pointer dereferences,memory leaks or using an ineffec- source-level approach for reasoning about program proper- tive pointer as the actual parameter of function free are all ties rather than the assembly-level one.Compared with the considered as violations of the primary safety policies. approaches for assembly code certification,this approach A type system is a tractable syntactic method for prov- can improve the efficiency of safety-critical software devel- ing the absence of certain program behaviors by classifying opment.Since the proofs are checked at assembly level, phrases according to the kinds of values they compute.It the compiler,including a VC generator and a prover,can be is mainly used to eliminate context-sensitive errors in pro- removed from the TCB(Trusted Computing Base).There- grams.A traditional type system is not sufficient when the fore,the TCB of the system will be reduced remarkably. legality of a phrase depends not only on the context but In this framework,we choose a C-like programming lan- also on some expression values in the phrase.Dependent guage PointerC [9]as our source language and implement types [18]is one possible solution to this problem.Another a certifying compiler prototype for it.Our main contri- solution is not to treat the constraints on values as parts of bution is the design of a certifying compiler for a source the type system,and the constraints don't appear in the typ- language with pointers and explicit memory management ing rules.The former leads to a complicated type system (malloc/free).And a prototype of the certifying compiler and the latter leads to an unsound one.We try to find a trade- has been implemented in our work.A distinct feature of our off between these two solutions in the design of PointerC. certifying compiler is that it supports languages equipped To achieve simplicity in the type system and guarantee the with both a type system and a logic system. safety of the language at the same time,side conditions are In this paper,we introduce the design and implemen- introduced into the typing rules to express the constraints on tation of our certifying compiler.The rest of the paper is values.For example,the following two typing rules show organized as follows.We introduce the source language- that the constraints on subscript expressions and pointers PointerC briefly in section 2.In section 3,we discuss how are put in the side conditions. the compiler generates VCs at source level from the asser- THe:int THa:array(10,bool) tions given by programmers and the side conditions in the 0≤e∧e≤9 TFa[e]bool typing rules.Section 4 describes the scheme for generat- ing corresponding assertions at some program points dur- TFp ptr(struct(...,x:int,...)) p∈effective.-ptrs ing intermediate code generation.This ensures that each THp->x:int basic block has proper pre-and postconditions.Section 5 These side conditions must be checked before the ex- explains how the compiler generates a proof for each ba- ecution of the program to ensure that the execution does sic block.Each proof ensures that the corresponding basic not violate the primary safety policies.And to check these block satisfies its pre-and postconditions.Section 6 com- side conditions statically,we have designed a pointer logic pares our work with related work and section 7 concludes. for PointerC.The pointer logic is an extension of Hoare logic and essentially is a pointer analysis tool.It collects 2.Source language-PointerC pointer information in a forward manner.This information includes whether a pointer is null,dangling or effective,and PointerC is a C-like programming language that we have the equality relation between effective pointers.The in- designed as the source language in our work.In order to formation can be used to prove that the program satisfies check more pointer programs statically,some pointer oper- the side conditions in the typing rules and then to support ations are restricted in PointerC [9].These restrictions are value-sensitive static checking.For example,according to based on the premise of not affecting the language function- the pointer logic,the following two Hoare triples ality of PointerC.Variables with pointer type can only be used in assignment,equality comparison,dereference or as [p,q)A [p->next)}p=malloc(...) the parameters of functions;the address-of operator(&)and [fp)A {q)A{q->next}A {p->next)D}
the proofs obtained from step 2 into proofs at assembly level. The latter proofs are carried in the assembly code and they ensure that the code meets the corresponding specifications. These specifications must be equivalent to the counterparts at source level. Such a compiler is called a certifying compiler. 4. At assembly level, a proof checker checks the proofs carried in the assembly code automatically to ensure that the code satisfies its specifications. The advantage of this framework is that it provides source-level approach for reasoning about program properties rather than the assembly-level one. Compared with the approaches for assembly code certification, this approach can improve the efficiency of safety-critical software development. Since the proofs are checked at assembly level, the compiler, including a VC generator and a prover, can be removed from the TCB (Trusted Computing Base). Therefore, the TCB of the system will be reduced remarkably. In this framework, we choose a C-like programming language PointerC [9] as our source language and implement a certifying compiler prototype for it. Our main contribution is the design of a certifying compiler for a source language with pointers and explicit memory management (malloc/free). And a prototype of the certifying compiler has been implemented in our work. A distinct feature of our certifying compiler is that it supports languages equipped with both a type system and a logic system. In this paper, we introduce the design and implementation of our certifying compiler. The rest of the paper is organized as follows. We introduce the source language— PointerC briefly in section 2. In section 3, we discuss how the compiler generates VCs at source level from the assertions given by programmers and the side conditions in the typing rules. Section 4 describes the scheme for generating corresponding assertions at some program points during intermediate code generation. This ensures that each basic block has proper pre- and postconditions. Section 5 explains how the compiler generates a proof for each basic block. Each proof ensures that the corresponding basic block satisfies its pre- and postconditions. Section 6 compares our work with related work and section 7 concludes. 2. Source language—PointerC PointerC is a C-like programming language that we have designed as the source language in our work. In order to check more pointer programs statically, some pointer operations are restricted in PointerC [9]. These restrictions are based on the premise of not affecting the language functionality of PointerC. Variables with pointer type can only be used in assignment, equality comparison, dereference or as the parameters of functions; the address-of operator (&) and pointer arithmetic are forbidden. To guarantee that typechecked programs are well-typed, union types and coercion of types are also forbidden. malloc and free are used as predefined functions which meet the primary safety policies. For example, each invocation of malloc will allocate an area of memory successfully, and the region of the memory will never overlap those already in use. Pointers are classified into effective pointers (those point to objects), null pointers and dangling pointers, and the latter two are also called ineffective pointers. Ineffective pointer dereferences, memory leaks or using an ineffective pointer as the actual parameter of function free are all considered as violations of the primary safety policies. A type system is a tractable syntactic method for proving the absence of certain program behaviors by classifying phrases according to the kinds of values they compute. It is mainly used to eliminate context-sensitive errors in programs. A traditional type system is not sufficient when the legality of a phrase depends not only on the context but also on some expression values in the phrase. Dependent types [18] is one possible solution to this problem. Another solution is not to treat the constraints on values as parts of the type system, and the constraints don’t appear in the typing rules. The former leads to a complicated type system and the latter leads to an unsound one. We try to find a tradeoff between these two solutions in the design of PointerC. To achieve simplicity in the type system and guarantee the safety of the language at the same time, side conditions are introduced into the typing rules to express the constraints on values. For example, the following two typing rules show that the constraints on subscript expressions and pointers are put in the side conditions. Γe : int Γa : array(10, bool) Γa[e] : bool 0 ≤ e ∧ e ≤ 9 Γp : ptr(struct(..., x:int,... )) Γp->x : int p ∈ effective ptrs These side conditions must be checked before the execution of the program to ensure that the execution does not violate the primary safety policies. And to check these side conditions statically, we have designed a pointer logic for PointerC. The pointer logic is an extension of Hoare logic and essentially is a pointer analysis tool. It collects pointer information in a forward manner. This information includes whether a pointer is null, dangling or effective, and the equality relation between effective pointers. The information can be used to prove that the program satisfies the side conditions in the typing rules and then to support value-sensitive static checking. For example, according to the pointer logic, the following two Hoare triples {{{p, q}∧{p->next}}} p=malloc(···) {{{p}∧{q}∧{q->next}∧{p->next}D}}
and 2.A verification condition generator(VCGen)is embed- ded into the front end of the compiler.It can convert {p,q}{p->nextN}free(p)[ip,q)D} the task of proving a program satisfying its specifica- tions into the task of proving a set of VCs.From the hold,where the set [p}represents an assertion,and its in- annotations mentioned in 1 and the side conditions in formal meaning is"p is an effective pointer and is not equal the typing rules,the VCGen generates a set of VCs us- to any other pointers".Similarly,the set {p,g}means"Ef- ing the pointer logic rules.And these VCs should be fective pointers p and q are equal(but not aliases),and they proved in order to guarantee program safety. are not equal to any other pointers";{p,g)p denotes pand q are dangling pointers;and [p->next)N means that p->next 3.A simple theorem prover,which produces correspond- is a null pointer.According to the pointer logic,there are ing proofs for pointer-related VCs,is embedded into no postconditions Q and Q2 which make the following two the compiler as well.Integer-related VCs are proved Hoare triples hold.That is,the preconditions of malloc and interactively in Coq [4]by the programmers.The VCs free here can not satisfy the side conditions in the corre- and their proofs show that the source program satisfies sponding typing rules. its specifications. {{p)A {p->next}p=malloc(...)Q} At first,we discuss how to generate VCs for functions with pointer-type data.Figure 1 shows the structure of a (The object previously pointed by p is lost.) function chosen from PointerC syntax,some irrelevant de- tails are omitted. fp}A {p->next)}p=free(p)Q2} (The object previously pointed by p->next is lost.) FunDcl → id(arg){Body) In history,pointer analyses were mostly conservative es- Body → VarDecList StmtList timates of the pointer status at runtime.And they mainly StmtList → Stmt StmtListe answered the question:what was the possible set of objects Stmt → Ival Exp that a pointer may point to at runtime?Such pointer analy- if(Bexp){StmtList else {StmtList ses can be used in many fields of static analysis and program while (Bexp)[StmtList} optimization,such as liveness analysis needed by register Ival alloc(Type) allocation and constant propagation etc.To meet the safety free(Exp) requirements of software,we have restricted some unde- Ival id(Exp) cidable pointer operations in PointerC,and thus obtained return res an accurate pointer analysis instead of an approximate one. This is the primary difference of our pointer analysis from Figure 1.Structure of function others.Although accurate pointer analysis does not exist for all programs,our practical experience on the applica- tions of some data structures,such as singly linked list,bi- The pointer logic is fit for collecting pointer information nary tree and circular doubly linked list etc.,shows that the in a forward manner,so that the VC generation is based pointer logic is feasible.Besides side conditions in the typ- on the strongest postcondition calculus.Figure 2 shows the ing rules,the pointer logic can also be used to reason about main rules for the strongest postcondition calculus (function other program properties to accommodate the requests of sp)in the pointer logic and the framework of VC generation user-defined safety policies. (for data with pointer types).These rules are recursively de- fined according to the syntactic structures in a function.In Figure 2,II,N and D denote the equivalent division of all 3.Verification condition generation effective pointers,the set of null pointers and the set of dan- gling pointers respectively [2]:is the short notation for In order to check program safety statically,side condi- IIANA D;the first parameter of function sp is a syntactic tions in the typing rules must be provable at the correspond- structure,and the second one is the precondition of the syn- ing program points.This is achieved through the following tactic structure.The VC generation described here has the steps: following important characteristics: 1.Programmers annotate each function with a pair of pre- 1.Since the precondition and the postcondition of and postconditions and each while loop with a loop in- a function are given,the VC sp(StmtList,)is variant.These annotations belong to the specifications generated at the exit of the function(note that StmtList of the source program. forms the statement list of the function)
and {{{p, q}∧{p->next}N }} free(p) {{{p, q}D}} hold, where the set {p} represents an assertion, and its informal meaning is “p is an effective pointer and is not equal to any other pointers”. Similarly, the set {p, q} means “Effective pointers p and q are equal (but not aliases), and they are not equal to any other pointers”; {p, q}D denotes p and q are dangling pointers; and {p->next}N means that p->next is a null pointer. According to the pointer logic, there are no postconditions Q1 and Q2 which make the following two Hoare triples hold. That is, the preconditions of malloc and free here can not satisfy the side conditions in the corresponding typing rules. {{{p}∧{p->next}N }} p=malloc(···) {{Q1}} (The object previously pointed by p is lost.) {{{p}∧{p->next}}} p=free(p) {{Q2}} (The object previously pointed by p->next is lost.) In history, pointer analyses were mostly conservative estimates of the pointer status at runtime. And they mainly answered the question: what was the possible set of objects that a pointer may point to at runtime? Such pointer analyses can be used in many fields of static analysis and program optimization, such as liveness analysis needed by register allocation and constant propagation etc. To meet the safety requirements of software, we have restricted some undecidable pointer operations in PointerC, and thus obtained an accurate pointer analysis instead of an approximate one. This is the primary difference of our pointer analysis from others. Although accurate pointer analysis does not exist for all programs, our practical experience on the applications of some data structures, such as singly linked list, binary tree and circular doubly linked list etc., shows that the pointer logic is feasible. Besides side conditions in the typing rules, the pointer logic can also be used to reason about other program properties to accommodate the requests of user-defined safety policies. 3. Verification condition generation In order to check program safety statically, side conditions in the typing rules must be provable at the corresponding program points. This is achieved through the following steps: 1. Programmers annotate each function with a pair of preand postconditions and each while loop with a loop invariant. These annotations belong to the specifications of the source program. 2. A verification condition generator (VCGen) is embedded into the front end of the compiler. It can convert the task of proving a program satisfying its specifications into the task of proving a set of VCs. From the annotations mentioned in 1 and the side conditions in the typing rules, the VCGen generates a set of VCs using the pointer logic rules. And these VCs should be proved in order to guarantee program safety. 3. A simple theorem prover, which produces corresponding proofs for pointer-related VCs, is embedded into the compiler as well. Integer-related VCs are proved interactively in Coq [4] by the programmers. The VCs and their proofs show that the source program satisfies its specifications. At first, we discuss how to generate VCs for functions with pointer-type data. Figure 1 shows the structure of a function chosen from PointerC syntax, some irrelevant details are omitted. FunDcl → id(arg){Body} Body → VarDecList StmtList StmtList → Stmt StmtList | Stmt → lval = Exp | if (Bexp) {StmtList} else {StmtList} | while (Bexp) {StmtList} | lval = alloc(Type) | free(Exp) | lval = id(Exp) | return res Figure 1. Structure of function The pointer logic is fit for collecting pointer information in a forward manner, so that the VC generation is based on the strongest postcondition calculus. Figure 2 shows the main rules for the strongest postcondition calculus (function sp) in the pointer logic and the framework of VC generation (for data with pointer types). These rules are recursively de- fined according to the syntactic structures in a function. In Figure 2, Π, N and D denote the equivalent division of all effective pointers, the set of null pointers and the set of dangling pointers respectively [2]; Ψ is the short notation for Π ∧N ∧D; the first parameter of function sp is a syntactic structure, and the second one is the precondition of the syntactic structure. The VC generation described here has the following important characteristics: 1. Since the precondition Ψ and the postcondition Ψ of a function are given, the VC sp(StmtList, Ψ) ⊃ Ψ is generated at the exit of the function (note that StmtList forms the statement list of the function)
1.Function definition [亚]id(arg){Body}[亚'],that is[亚]StmtList[亚'),where the StmtList is the StmtList in the Body production. 2.Statement List ·sp(Stmt StmtList,业)=sp(StmtList,sp(Stmt,业) ·sp(e,亚)=亚.If e forms the StmtList of a function,.then VC=亚p亚'(see 1 for亚will be generated.. 3.Statement ·assignment:sp(lval=Exp,亚)=亚',亚'can be calculated using亚according to the assignment rules in the pointer logic. ·condition:sp(if(Bexp){StmtList}else{StmtList2},平)=sp(StmtList1,BexpAΨ)Vsp(StmtList2,一BexpA平) loop:sp(while(Bexp){StmtList,)=-BexpA I,where Iis the loop invariant for pointer-type data. VC1=I and VC2 sp(StmtList,Bexp A DI should be generated at the entry point and the exit of StmtList respectively. 。allocation:sp(Ival=alloc(Type),亚)=亚',亚can be calculated using亚according to the allocation rules in the pointer logic. 。deallocation:sp(free(Exp),亚)=亚',亚'can be calculated using亚according to the deallocation rule in the pointer logic. .function call:If the pre-and postconditions of function id are farg)[lvalA and A Qrespectively,then sp(Ival id(Exp),)=(A Q)[arg-Exp][res-Ivall,where Q and Qarg are assertions that have nothing to do with pointers,arg is the formal parameter,and res is the return value (see Figure 1). VC=(farg}A [Ival)A Qn)[arg-Exp]should be generated at the entry point of this statement. ·return:sp(return res,.亚)=亚 Figure 2.The Strongest postcondition calculus and the VC generation of pointer-type data 2.One difficulty of strongest postcondition calculus is the assignment axiom.Instead,different assignment rules need to find a fixpoint for a recursive equation in a loop are used in different cases in our pointer logic.Since statement.The solutions to such equations are usually the pointer analysis is precise,it is easy to determine undecidable,and it is also the primary reason why the which rule to use in a certain case and it is clear how correctness of a program can not be proved automat- to compute 'using in the sp rule for assignment. ically.The loop invariant provided by programmers Please refer to [2]for some details skipped in Figure 2. is used to avoid the difficulty.However,in order to prove the validity of the loop invariant,two VCs must 5.At the entry point of a function,the pointer logic be generated at the entry point and the exit of the loop should check the initial values of the pointer-type for- respectively. mal parameters and local variables.And at the exit of the function,their effectiveness should also be checked 3.The pre-and postconditions of each function have also to avoid memory leaks.But for the space limit,the VC deeply simplified the computation of sp for function generation in Figure 2 does not reflect this. call statement.Briefly speaking,the before the call statement should imply the precondition of the callee, For integer-type data,we adopt a complemented ap- and the callee's postcondition should be used as the proach of Hoare logic-weakest precondition calculus strongest postcondition after the call statement.Cer- (wp)[5]to generate assertions or VCs at each program tainly,we also need to consider the substitution of ac- point.And we only introduce its distinctions from the ap- tual parameters for formal parameters as well as the proach for pointer-type data as follows. substitution of the variable Ival for the variable res(see function call in Figure 2). 1.Since the assertions in a function are calculated back- ward using the postcondition Q of the function,a VC 4.One remarkable distinction of our pointer logic from Pwp(StmtList,Q)is generated at the entry point of Hoare logic is that the pointer logic has no uniform the function,where P is the precondition of the func-
1. Function definition {{Ψ}} id(arg){Body} {{Ψ }}, that is {{Ψ}} StmtList {{Ψ }}, where the StmtList is the StmtList in the Body production. 2. Statement List • sp(Stmt StmtList, Ψ) = sp(StmtList, sp(Stmt, Ψ)) • sp(, Ψ) = Ψ. If forms the StmtList of a function, then VC = Ψ ⊃ Ψ (see 1 for Ψ ) will be generated. 3. Statement • assignment: sp(lval = Exp, Ψ) = Ψ , Ψ can be calculated using Ψ according to the assignment rules in the pointer logic. • condition: sp(if (Bexp) {StmtList1} else {StmtList2}, Ψ) = sp(StmtList1, Bexp∧Ψ)∨sp(StmtList2, ¬Bexp∧Ψ) • loop: sp(while (Bexp) {StmtList}, Ψ) = ¬Bexp ∧ I, where I is the loop invariant for pointer-type data. VC1 = Ψ ⊃ I and VC2 = sp(StmtList, Bexp ∧ I) ⊃ I should be generated at the entry point and the exit of StmtList respectively. • allocation: sp(lval = alloc(Type), Ψ) = Ψ , Ψ can be calculated using Ψ according to the allocation rules in the pointer logic. • deallocation: sp(free(Exp), Ψ) = Ψ , Ψ can be calculated using Ψ according to the deallocation rule in the pointer logic. • function call: If the pre- and postconditions of function id are {arg}∧{lval}N ∧Qarg and Ψ ∧Q respectively, then sp(lval = id(Exp), Ψ) = (Ψ ∧ Q)[arg ← Exp][res ← lval], where Q and Qarg are assertions that have nothing to do with pointers, arg is the formal parameter, and res is the return value (see Figure 1). VC = Ψ ⊃ ({arg}∧{lval}N ∧ Qarg)[arg ← Exp] should be generated at the entry point of this statement. • return: sp(return res, Ψ) = Ψ Figure 2. The Strongest postcondition calculus and the VC generation of pointer-type data 2. One difficulty of strongest postcondition calculus is the need to find a fixpoint for a recursive equation in a loop statement. The solutions to such equations are usually undecidable, and it is also the primary reason why the correctness of a program can not be proved automatically. The loop invariant provided by programmers is used to avoid the difficulty. However, in order to prove the validity of the loop invariant, two VCs must be generated at the entry point and the exit of the loop respectively. 3. The pre- and postconditions of each function have also deeply simplified the computation of sp for function call statement. Briefly speaking, the Ψ before the call statement should imply the precondition of the callee, and the callee’s postcondition should be used as the strongest postcondition after the call statement. Certainly, we also need to consider the substitution of actual parameters for formal parameters as well as the substitution of the variable lval for the variable res (see function call in Figure 2). 4. One remarkable distinction of our pointer logic from Hoare logic is that the pointer logic has no uniform assignment axiom. Instead, different assignment rules are used in different cases in our pointer logic. Since the pointer analysis is precise, it is easy to determine which rule to use in a certain case and it is clear how to compute Ψ using Ψ in the sp rule for assignment. Please refer to [2] for some details skipped in Figure 2. 5. At the entry point of a function, the pointer logic should check the initial values of the pointer-type formal parameters and local variables. And at the exit of the function, their effectiveness should also be checked to avoid memory leaks. But for the space limit, the VC generation in Figure 2 does not reflect this. For integer-type data, we adopt a complemented approach of Hoare logic—weakest precondition calculus (wp) [5] to generate assertions or VCs at each program point. And we only introduce its distinctions from the approach for pointer-type data as follows. 1. Since the assertions in a function are calculated backward using the postcondition Q of the function, a VC P ⊃ wp(StmtList, Q) is generated at the entry point of the function, where P is the precondition of the func-
tion and StmtList is the statement sequence of the func- 4.Code and assertion generation tion. 2.The side condition in a typing rule should be combined In our framework,specifications and the proof of code satisfying the specifications are carried in the assembly with the assertion which is at the entry point of the code.The assembly code is divided into basic blocks.Basic corresponding statement.For pointer-type data,such a problem does not exist.Because a pointer-related side block,which is a concept in code optimization and gener- ation,is a sequence of instruction;and in our design,the condition is consistent with the premise of the corre- instruction sequence ends with a control transfer instruction sponding inference rule in the pointer logic,and should have been checked before the rule is chosen. such as jmp or call.Each basic block B has its precon dition P,postcondition Q,and the proof or proof hint of 3.When we compute pointer-related assertions,we need [P}B {Q}.The proof can be checked by a proof checker. According to the principle that the postcondition of a basic to decide which assignment rule to use according to the block should imply the precondition of the succeeding ba- assertion before the assignment.However,there is no need to do that about integer-related assertions using sic block in the control flow.Q can be omitted since we can just take the precondition of the succeeding basic block as wp. In fact,the computations of assertions and VCs for In order to make sure that each basic block has perti- integer-type data and pointer-type data interact with each nent pre-and postconditions,assertions should be gener- other. ated at proper program points during code generation.In this section,we introduce a scheme for generating interme- 1.Both of them should be performed together with type diate code and the corresponding assertions.For the lack checking,since they both need side conditions in the of space,we only consider pointer-related assertions.Us- typing rules.But the backward calculus for integer- ing the calculi in section 3,we can get a proper assertion related assertions will reduce the efficiency of the per- at each program point.Figure 3 shows the scheme for the formance,because the calculus direction does not con- main syntactic structures.The first parameter of the recur- sist with the direction of type checking. sive function Code is a syntactic structure.and the second parameter is the precondition of the structure.The function 2.The pre-and postconditions of functions,loop invari- Code generates intermediate code and assertions at some ants and the Bexps of while and if statements may program points.The generated assertion is at the right side contain both integers and pointers.Such an assertion of the code.The assertion which follows"--"should be should be divided into two parts,each of which con- inserted before the line of the code,and which follows"++" cerns only one kind of data and participates one calcu- should be inserted after the line of the code;and the as- lus. sertion led by "**"is the VC to be proved.The function Code calls function code which uses syntactic structure as 3.The variable p->data makes sense only when p is an the only parameter.The function code only generates in- effective pointer,but in Hoare logic,there is no such a termediate code for the syntactic structure.The function consideration.Therefore.rules in Hoare logic can not Code does not generate assertions between the intermediate be used in the case where pointers are concerned.The code which will obviously be translated into the same ba- pointer logic collects pointer equality information,and sic block,such as the code for assignment statements and this information can help us check whether p->data Boolean expressions (Boolean expression calculation does makes sense.Moreover,when using assignment ax- not use short-circuit here).The description in Figure 3 actu- iom,we can perform alias substitution [2]according ally covers some assertion and VC generations in Figure 2, to the equality information of pointers. hence all of these can be done via a one-pass inspection of source programs during compilation. Using the pointer logic,it is easy to prove the correct- When considering the generation of integer-related as- ness of the VC generation,i.e.,if all of the VCs are proved. sertions and VCs,it is difficult to do all the work about the Hoare triple [P}id(arg){Body}[Q}holds.The VC pointers and integers in one pass,because they are based generator is removed from the TCB,since there is a proof on the calculi in different directions.The compiler can checker at assembly level in our framework.Specifications, do type checking,pointer-related assertion generation,VC VCs and the proofs of VCs at source level will be trans- generation,integer side-condition annotation and interme- lated into equivalent assembly-level counterparts.And the diate code generation in the first pass and generate integer- translated counterparts will be used in the verification at as- related assertions and VCs using the annotation of integer sembly level. side conditions in the second pass
tion and StmtList is the statement sequence of the function. 2. The side condition in a typing rule should be combined with the assertion which is at the entry point of the corresponding statement. For pointer-type data, such a problem does not exist. Because a pointer-related side condition is consistent with the premise of the corresponding inference rule in the pointer logic, and should have been checked before the rule is chosen. 3. When we compute pointer-related assertions, we need to decide which assignment rule to use according to the assertion before the assignment. However, there is no need to do that about integer-related assertions using wp. In fact, the computations of assertions and VCs for integer-type data and pointer-type data interact with each other. 1. Both of them should be performed together with type checking, since they both need side conditions in the typing rules. But the backward calculus for integerrelated assertions will reduce the efficiency of the performance, because the calculus direction does not consist with the direction of type checking. 2. The pre- and postconditions of functions, loop invariants and the Bexps of while and if statements may contain both integers and pointers. Such an assertion should be divided into two parts, each of which concerns only one kind of data and participates one calculus. 3. The variable p->data makes sense only when p is an effective pointer, but in Hoare logic, there is no such a consideration. Therefore, rules in Hoare logic can not be used in the case where pointers are concerned. The pointer logic collects pointer equality information, and this information can help us check whether p->data makes sense. Moreover, when using assignment axiom, we can perform alias substitution [2] according to the equality information of pointers. Using the pointer logic, it is easy to prove the correctness of the VC generation, i.e., if all of the VCs are proved, the Hoare triple {{P}} id(arg){Body} {{Q}} holds. The VC generator is removed from the TCB, since there is a proof checker at assembly level in our framework. Specifications, VCs and the proofs of VCs at source level will be translated into equivalent assembly-level counterparts. And the translated counterparts will be used in the verification at assembly level. 4. Code and assertion generation In our framework, specifications and the proof of code satisfying the specifications are carried in the assembly code. The assembly code is divided into basic blocks. Basic block, which is a concept in code optimization and generation, is a sequence of instruction; and in our design, the instruction sequence ends with a control transfer instruction such as jmp or call. Each basic block B has its precondition P, postcondition Q, and the proof or proof hint of {{P}} B {{Q}}. The proof can be checked by a proof checker. According to the principle that the postcondition of a basic block should imply the precondition of the succeeding basic block in the control flow, Q can be omitted since we can just take the precondition of the succeeding basic block as Q. In order to make sure that each basic block has pertinent pre- and postconditions, assertions should be generated at proper program points during code generation. In this section, we introduce a scheme for generating intermediate code and the corresponding assertions. For the lack of space, we only consider pointer-related assertions. Using the calculi in section 3, we can get a proper assertion at each program point. Figure 3 shows the scheme for the main syntactic structures. The first parameter of the recursive function Code is a syntactic structure, and the second parameter is the precondition of the structure. The function Code generates intermediate code and assertions at some program points. The generated assertion is at the right side of the code. The assertion which follows “--” should be inserted before the line of the code, and which follows “++” should be inserted after the line of the code; and the assertion led by “∗∗” is the VC to be proved. The function Code calls function code which uses syntactic structure as the only parameter. The function code only generates intermediate code for the syntactic structure. The function Code does not generate assertions between the intermediate code which will obviously be translated into the same basic block, such as the code for assignment statements and Boolean expressions (Boolean expression calculation does not use short-circuit here). The description in Figure 3 actually covers some assertion and VC generations in Figure 2, hence all of these can be done via a one-pass inspection of source programs during compilation. When considering the generation of integer-related assertions and VCs, it is difficult to do all the work about pointers and integers in one pass, because they are based on the calculi in different directions. The compiler can do type checking, pointer-related assertion generation, VC generation, integer side-condition annotation and intermediate code generation in the first pass and generate integerrelated assertions and VCs using the annotation of integer side conditions in the second pass
1.Statement list ·non-empty:Code(Stmt StmtList,亚)= Code(Stmt,亚) --亚 Code(StmtList,sp(Stmt,)) ·empy:Code(e,亚)=e *If eforms the StmtList of a function Body,VC=亚一亚' will be generated.is the postcondition of the function. 2.Statement ·assignment:Code(lval=Exp,亚)= code(lval =Exp) ++sp(val=Exp,Ψ) .condition:Code(if Bexp {StmtList}else {StmtList},)= code(Bexp) ++亚 JUMP on false to L1 Code(StmtList1,BexpA) GOTO L2 Li:Code(StmtList2,一BexpAΨ) L2: ·loop:Code(while(Bexp){StmtList},亚)= L1 code(Bexp) --I,+I.*VC=I(I is the loop invariant.) JUMP on false to L2 Code(StmtList,BexpA D GOTO L1 *VC=sp(StmtList,BexpA I)I L2: ---BexpAI ●function call:Code(lval=id(Exp),亚)= temp :code(Exp) -({arg}A{Ival}N A Qn)儿arg←-Exp *VC=亚({arg}A{IvalNA Qarg)儿arg←-Exp param temp +({arg}∧{Ival)NA Qarg)[arg←temp call id ++(亚'AQ)[arg←-Exp][res-res-iemp Ival :res-temp +(亚'AQ)[arg←-Exp[res temp←-lval Figure 3.Intermediate code generation and generation of pointer-related assertions When generating assembly code,the pre-and postcondi- there are some relatively steady assertions such as "the tions of basic blocks should be adjusted as follows: return address saved in the current stack frame will not be overwritten during the execution of the basic 1.At source or intermediate level,variables are repre- block".All of these assertions depend on the target sented by names in assertions;but at assembly level, machine.Since they are almost the same for each ba- they are represented by memory addresses or registers. sic block,there is no difficulty in generating them. Also,an assertion at assembly level is parameterized by a machine state.So,assertions also need to be trans- lated during code generation.It is lucky that this kind 5.Proof generation for basic blocks of translation is straightforward. In the generated assembly code,each basic block B has a 2.Registers are used to store temporary values in the as- proof or proof hint for [P}B[Q].The proof or proof hint sembly code,so the contents of some registers may is generated by the compiler.Besides the proofs or hints equal the values of some variables at the exit of one for basic blocks,the assembly code also carries assembly- basic block.Usually,code generation algorithm can level VCs and their proofs.These VCs and proofs can be collect such information.And this information makes achieved by translating the source-level VCs in Figure 2 and it easy to adjust s at the entry point and the exit of their proofs respectively.And they are usually used when one basic block. a basic block's postcondition should be proved to imply its The assertion generation in Figure 3 also faces this successor's precondition. problem,because temporary variables may be intro- At assembly level,we adopt an approach that applies the duced into the intermediate code for expressions. method of Hoare logic directly to the operational semantics of Intel x86 ISA.This approach has the following charac- 3.At the entry point and the exit of each basic block. ters.First,an assertion is parameterized by a machine state;
1. Statement list • non-empty: Code(Stmt StmtList,Ψ) = Code(Stmt, Ψ) -- Ψ Code(StmtList, sp(Stmt, Ψ)) • empty: Code(,Ψ) = ∗∗ If forms the StmtList of a function Body, VC = Ψ ⊃ Ψ will be generated. Ψ is the postcondition of the function. 2. Statement • assignment: Code(lval = Exp,Ψ) = code(lval = Exp) ++ sp(lval = Exp, Ψ) • condition: Code(if Bexp {StmtList1} else {StmtList2},Ψ) = code(Bexp) ++ Ψ JUMP on false to L1 Code(StmtList1, Bexp ∧ Ψ) GOTO L2 L1: Code(StmtList2, ¬Bexp ∧ Ψ) L2: • loop: Code(while (Bexp) {StmtList},Ψ) = L1 : code(Bexp) -- I, ++ I, ∗∗ VC = Ψ ⊃ I (I is the loop invariant.) JUMP on false to L2 Code(StmtList, Bexp ∧ I) GOTO L1 ∗∗ VC = sp(StmtList,Bexp ∧ I) ⊃ I L2 : -- ¬Bexp ∧ I • function call: Code(lval = id(Exp),Ψ) = temp := code(Exp) -- ({arg}∧{lval}N ∧ Qarg)[arg ← Exp] ∗∗ VC = Ψ ⊃ ({arg}∧{lval}N ∧ Qarg)[arg ← Exp] param temp ++ ({arg}∧{lval}N ∧ Qarg)[arg ← temp] call id ++ (Ψ ∧ Q)[arg ← Exp][res ← res temp] lval := res temp ++ (Ψ ∧ Q)[arg ← Exp][res temp ← lval] Figure 3. Intermediate code generation and generation of pointer-related assertions When generating assembly code, the pre- and postconditions of basic blocks should be adjusted as follows: 1. At source or intermediate level, variables are represented by names in assertions; but at assembly level, they are represented by memory addresses or registers. Also, an assertion at assembly level is parameterized by a machine state. So, assertions also need to be translated during code generation. It is lucky that this kind of translation is straightforward. 2. Registers are used to store temporary values in the assembly code, so the contents of some registers may equal the values of some variables at the exit of one basic block. Usually, code generation algorithm can collect such information. And this information makes it easy to adjust Ψs at the entry point and the exit of one basic block. The assertion generation in Figure 3 also faces this problem, because temporary variables may be introduced into the intermediate code for expressions. 3. At the entry point and the exit of each basic block, there are some relatively steady assertions such as “the return address saved in the current stack frame will not be overwritten during the execution of the basic block”. All of these assertions depend on the target machine. Since they are almost the same for each basic block, there is no difficulty in generating them. 5. Proof generation for basic blocks In the generated assembly code, each basic block B has a proof or proof hint for {{P}} B {{Q}}. The proof or proof hint is generated by the compiler. Besides the proofs or hints for basic blocks, the assembly code also carries assemblylevel VCs and their proofs. These VCs and proofs can be achieved by translating the source-level VCs in Figure 2 and their proofs respectively. And they are usually used when a basic block’s postcondition should be proved to imply its successor’s precondition. At assembly level, we adopt an approach that applies the method of Hoare logic directly to the operational semantics of Intel x86 ISA. This approach has the following characters. First, an assertion is parameterized by a machine state;
second,the primary inference rules are based on the oper- of the proof. ational semantics of the instructions,that is,the machine The precondition p of the basic block is state before the execution of an instruction is used to calcu- late the state after it [12].While in Hoare logic,the calcu- AS.S(S(ebp)-16)=S(S(ebp)+16)x S(S(ebp)-20)A lation of the precondition from the postcondition is directly S(S(ebp)-20)<S(S(ebp)+12)A based on syntactic substitution.Generally,machine state S(S(ebp)-20)<S(S(ebp)+12). denotes a map from registers,stack locations and heap loca- tions to values.The change of machine state is based on the (Note that the corresponding assertion at source level is operational semantics of the instructions.The formal de- {x=m*yAy<=n∧y<n}) scription of our proof-carrying assembly code has referred The postcondition g(g is also the precondition of the to CAP and SCAP [19,6],which are common frameworks succeeding basic block 1loop)is supporting Hoare style reasoning for assembly code certifi- cation. AS.S(S(ebp)-16)=S(S(ebp)+16)x S(S(ebp)-20)A The instruction execution may change pointer informa- S(S(ebp)-20)<S(S(ebp)+12). tion in II,N and D,and the change can be deduced ac- cording to the pointer logic at assembly level [11].Besides (Similarly,the corresponding assertion at source level is function call,the instructions which may change the pointer {x ==m y ay <n},i.e.,the loop invariant.) information include mov1,pushl etc.(we call them assign- In these assertions,S denotes the current machine state. ment instructions in general).The rules for these instruc- If the postcondition of an instructiont is Q,its precondition tions in the pointer logic at assembly level are almost the P can be achieved through the formula:(note that P and Q same as the corresponding assignment rules at source level, are parameterized by a machine state S) except that variables are replaced by memory addresses, registers,and register indirect addresses etc.Since there P=gp(L,Q)=AS.Q(upd(S,)) is no complex access path in assembly programs,aliasing calculation at assembly level is much simpler than that at where the function upd(S,)represents the operational se- source level. mantics of the instruction t,and the return value of upd(S, is a machine state after the execution of the instruction t in To generate the proof of a basic block satisfying its pre- state S. and postconditions,it is important to generate assertions Figure 5 shows the operational semantics and precondi- between the instructions in the basic block.These asser- tion generation formulas of some instructions.Sab is tions can be generated from the pre-and postconditions of a machine state,and it means: the basic block.According to P,Q and the correspond- ing machine state,the verification framework at assembly b ifc=a, level checks whether P}O}holds for each instruction Sa→(c)= S(c)ifc≠a. t in the basic block.Pointer-related assertions are generated and inserted into the basic block in a forward manner,with For example,suppose the postcondition for instruction integer-related assertions in a backward manner.Pointer- “mov1r1,r2”is related assertions at assembly level are rather difficult to understand,so we take a basic block with integer-related λS.S(r2)=3, assertions as an example to explain how to generate and in- according to gp((movl r1,r2),Q)in Figure 5,the pre- sert assertions between instructions. condition is The source program in Figure 4 is a function to mul- tiply integer m by n.The assertions in notation {are λS.(S'.S'(r2)=3)Sr2→S(r1)): provided by programmers.The precondition of the func- tion is represented as the assertion in the first line;in line that is, 12 is the postcondition.The loop invariant for while is in λS.S[r2→S(r1)](r2)=3. line 6.The symbol"result"represents the return value According to the definition of Slab],the precondition of the function.The assembly code in the same figure cor- can be reduced to responds to the while loop (line 6-9).where the addresses of n,m,x and y in the stack are represented by 12(%ebp), λS.S(r1)=3. 16(%ebp),-16(%ebp)and -20(%ebp)respectively,and ebp is the base pointer of current stack frame.Then,we take the basic block Itrue(the basic block which are num- Using gp(,Q),we can get the precondition of each bered in Figure 4)as an example to explain the generation instruction backward from the postcondition of the basic
second, the primary inference rules are based on the operational semantics of the instructions, that is, the machine state before the execution of an instruction is used to calculate the state after it [12]. While in Hoare logic, the calculation of the precondition from the postcondition is directly based on syntactic substitution. Generally, machine state denotes a map from registers, stack locations and heap locations to values. The change of machine state is based on the operational semantics of the instructions. The formal description of our proof-carrying assembly code has referred to CAP and SCAP [19, 6], which are common frameworks supporting Hoare style reasoning for assembly code certifi- cation. The instruction execution may change pointer information in Π, N and D, and the change can be deduced according to the pointer logic at assembly level [11]. Besides function call, the instructions which may change the pointer information include movl, pushl etc. (we call them assignment instructions in general). The rules for these instructions in the pointer logic at assembly level are almost the same as the corresponding assignment rules at source level, except that variables are replaced by memory addresses, registers, and register indirect addresses etc. Since there is no complex access path in assembly programs, aliasing calculation at assembly level is much simpler than that at source level. To generate the proof of a basic block satisfying its preand postconditions, it is important to generate assertions between the instructions in the basic block. These assertions can be generated from the pre- and postconditions of the basic block. According to P, Q and the corresponding machine state, the verification framework at assembly level checks whether {{P}} ι {{Q}} holds for each instruction ι in the basic block. Pointer-related assertions are generated and inserted into the basic block in a forward manner, with integer-related assertions in a backward manner. Pointerrelated assertions at assembly level are rather difficult to understand, so we take a basic block with integer-related assertions as an example to explain how to generate and insert assertions between instructions. The source program in Figure 4 is a function to multiply integer m by n. The assertions in notation {{ }} are provided by programmers. The precondition of the function is represented as the assertion in the first line; in line 12 is the postcondition. The loop invariant for while is in line 6. The symbol “result” represents the return value of the function. The assembly code in the same figure corresponds to the while loop (line 6-9), where the addresses of n, m, x and y in the stack are represented by 12(%ebp), 16(%ebp), -16(%ebp) and -20(%ebp) respectively, and ebp is the base pointer of current stack frame. Then, we take the basic block ltrue (the basic block which are numbered in Figure 4) as an example to explain the generation of the proof. The precondition p of the basic block is λS.S(S(ebp) − 16) = S(S(ebp) + 16) × S(S(ebp) − 20)∧ S(S(ebp) − 20) ≤ S(S(ebp) + 12)∧ S(S(ebp) − 20) < S(S(ebp) + 12). (Note that the corresponding assertion at source level is {{x == m * y ∧ y <= n ∧ y < n}}.) The postcondition q (q is also the precondition of the succeeding basic block lloop) is λS.S(S(ebp) − 16) = S(S(ebp) + 16) × S(S(ebp) − 20)∧ S(S(ebp) − 20) ≤ S(S(ebp) + 12). (Similarly, the corresponding assertion at source level is {{x == m * y ∧ y <= n}}, i.e., the loop invariant.) In these assertions, S denotes the current machine state. If the postcondition of an instruction ι is Q, its precondition P can be achieved through the formula: (note that P and Q are parameterized by a machine state S) P = gp(ι, Q) = λS.Q (upd(S, ι)), where the function upd(S, ι) represents the operational semantics of the instruction ι, and the return value of upd(S, ι) is a machine state after the execution of the instruction ι in state S. Figure 5 shows the operational semantics and precondition generation formulas of some instructions. S[a → b] is a machine state, and it means: S[a → b](c) = b if c = a, S(c) if c = a. For example, suppose the postcondition for instruction “movl r1, r2” is λS.S(r2)=3, according to gp((movl r1, r2), Q) in Figure 5, the precondition is λS.((λS .S (r2) = 3) S[r2 → S(r1)]), that is, λS.S[r2 → S(r1)](r2)=3. According to the definition of S[a → b], the precondition can be reduced to λS.S(r1)=3. Using gp(ι, Q), we can get the precondition of each instruction backward from the postcondition of the basic
1100p: {n>=0} cmpl %esi,%eax 2 ltrue int mult (m,n)f jg lfalse: 3 int x,y; 4 movl -16(%ebp),%eax x=0; lend 5 y=0; imp while (y n){x ==m *y A y <n} ltrue: 7 movl 16(%ebp),%eax XX m; 2 addl 8 y=y+1; -16(%ebp),%eax 9 3 movl %eax,-16(%ebp) 4 movl 10 $1,%eax return x; 11 5 addl -20(%ebp),%eax 12 6 movl fresult =m n} %eax,-20(%ebp) 7 jmp 1100p lend: Figure 4.The source code and corresponding assembly code of mult(m,n) Instruction: Operational semantics:upd(S,Precondition:gp(,Q) movlr1.r2 Sr2-→S(r1)I AS.Q(Sr2→S(r1) movlr1,z(r2) Sz+S(r2)→S(r1, λS.Q(Sz+S(r2)→S(r1) if(z+S(r2)∈Dom(S) movlz,r Sr→z, λS.Q(Sr→z) jmp 1 S addlz(r1),r2 Sr2→S(z+S(r1)+S(r2J λS.Q(Sr2→S(z+S(r1)+S(r2J if(z+S(r1)∈Dom(S) movl z(r1),r2 S[r2→S(z+S(r1)小, λS.Q(Sr2→S(z+S(r1)]) if(z+S(r1)∈Dom(S) Figure 5.Operational semantics and assertion generation of some instructions block.For example,from postcondition q.we can get the And the VC at the entry point of the basic block is: precondition of"jmp lloop"(the instruction labeled with 7): AS.S(S(ebp)-16)=S(S(ebp)+16)x S(S(ebp)-20)A S(S(ebp)-20)<S(S(ebp)+12)A S(S(ebp)-20)<S(S(ebp)+12) AS.S(S(ebp)-16)=S(S(ebp)+16)x S(S(ebp)-20)A (AS.S(S(ebp)-16)+S(S(ebp)+16) S(S(ebp)-20)<S(S(ebp)+12). =S(S(ebp)+16)×(S(S(ebp)-20)+1)A S(S(ebp)-20)+1<S(S(ebp)+12)) That is,the precondition of the basic block should imply the The preconditions of the rest instructions can be generated precondition of the first instruction in the basic block.And in a similar way.The precondition of the first instruction is: this VC is easy to prove obviously. When we get all the assertions between the instructions, there is no difficulty to generate the proof of PQ}for AS.S(S(ebp)-16)+S(S(ebp)+16) each instruction t in a basic block.And it is also easy to =S(S(ebp)+16)×(S(S(ebp)-20)+1)A generate a proof for the basic block if we have the proofs for all the instructions in the basic block and the proof for S(S(ebp)-20)+1≤S(S(ebp)+12). the VC of the basic block
1 {{n >= 0}} 2 int mult (m, n){ 3 int x, y; 4 x = 0; 5 y = 0; 6 while (y < n) { {{x == m * y ∧ y <= n}} 7 x = x + m; 8 y = y + 1; 9 } 10 return x; 11 } 12 {{result == m * n}} lloop: ··· cmpl %esi, %eax jg ltrue lfalse: movl -16(%ebp), %eax jmp lend ltrue: 1 movl 16(%ebp), %eax 2 addl -16(%ebp), %eax 3 movl %eax, -16(%ebp) 4 movl $1, %eax 5 addl -20(%ebp), %eax 6 movl %eax, -20(%ebp) 7 jmp lloop lend: ··· Figure 4. The source code and corresponding assembly code of mult (m, n) Instruction: ι Operational semantics: upd(S, ι) Precondition: gp(ι, Q) movl r1, r2 S[r2 → S(r1)] λS.Q (S[r2 → S(r1)]) movl r1, z(r2) S[z + S(r2) → S(r1)], λS.Q (S[z + S(r2) → S(r1)]) if (z + S(r2)) ∈ Dom(S) movl z, r S[r → z], λS.Q (S[r → z]) jmp l S Q addl z(r1), r2 S[r2 → S(z + S(r1)) + S(r2)], λS.Q (S[r2 → S(z + S(r1)) + S(r2)]) if (z + S(r1)) ∈ Dom(S) movl z(r1), r2 S[r2 → S(z + S(r1))], λS.Q (S[r2 → S(z + S(r1))]) if (z + S(r1)) ∈ Dom(S) Figure 5. Operational semantics and assertion generation of some instructions block. For example, from postcondition q, we can get the precondition of “jmp lloop” (the instruction labeled with 7): λS.S(S(ebp) − 16) = S(S(ebp) + 16) × S(S(ebp) − 20)∧ S(S(ebp) − 20) ≤ S(S(ebp) + 12). The preconditions of the rest instructions can be generated in a similar way. The precondition of the first instruction is: λS.S(S(ebp) − 16) + S(S(ebp) + 16) = S(S(ebp) + 16) × (S(S(ebp) − 20) + 1)∧ S(S(ebp) − 20) + 1 ≤ S(S(ebp) + 12). And the VC at the entry point of the basic block is: λS.S(S(ebp) − 16) = S(S(ebp) + 16) × S(S(ebp) − 20)∧ S(S(ebp) − 20) ≤ S(S(ebp) + 12)∧ S(S(ebp) − 20) < S(S(ebp) + 12) ⊃ (λS.S(S(ebp) − 16) + S(S(ebp) + 16) = S(S(ebp) + 16) × (S(S(ebp) − 20) + 1)∧ S(S(ebp) − 20) + 1 ≤ S(S(ebp) + 12)). That is, the precondition of the basic block should imply the precondition of the first instruction in the basic block. And this VC is easy to prove obviously. When we get all the assertions between the instructions, there is no difficulty to generate the proof of {{P}} ι {{Q}} for each instruction ι in a basic block. And it is also easy to generate a proof for the basic block if we have the proofs for all the instructions in the basic block and the proof for the VC of the basic block
6.Related work Colby et al.implemented a certifying compiler called Spe- cial J [3]for a large subset of Java.It compiled Java byte Verifying compiler [8]is a research topic proposed in re- code into target code for Intel IA32 architecture.The ma- cent years.It uses mathematical and logical reasoning to jor advance of Special J over Touchstone was the scope of check the correctness of the program that it compiles be- the source language compiled,which in turn necessitated fore the program executes.The criterion of correctness is the handling of non-trivial run-time mechanisms such as specified by types,assertions,and other redundant annota- object representation,dynamic method dispatch and excep- tions associated with the program.Now it can be classified tion handling.Moreover,Special J was freely able to apply into two research directions on this topic.One is to prove many standard local and global optimizations. the correctness of the compiler formally,that is,the com- Our design has the following significant differences from piler carries a formal proof which confirms the correctness Touchstone and Special J: of the generated objective code.Such a compiler is called 1.PointerC has more pointer types and operations,and a certified compiler.The other is to prove the correctness also provides dynamic storage allocation and deallo- of the compiled code formally,i.e.,the code carries a for- cation.These features make it suitable for writing mal proof to guarantee its correctness,safety or some other system-level programs.Both Touchstone and Special J properties and the proof can be checked by a code consumer have no dynamic storage deallocation. independently.This kind of compiler is called a certifying compiler.Obviously,this paper presents the design of parts 2.We use some new techniques to handle the features of of a certifying compiler. the language equipped with both a type system and a In the research area of certified compiler,Moore was logic system.For example,our VC generator can per- one of the first to mechanically verify semantic preserva- form both forward and backward VC generations,and tion for a compiler [14],although for a custom language the theorem prover embedded in the compiler and the and a custom processor that were not commonly used.Af- Coq proof assistant can be used to prove VCs for dif- ter that,more compilers were verified,including a compiler ferent types of data respectively. for a subset of Common Lisp,a byte-code compiler for a 3.To certify the proof-carrying assembly code,we adopt subset of Java,and a compiler for a tiny subset of C.The an approach that applies the method of Hoare logic di- most recent typical work is the certification of a lightly- rectly to the operational semantics of Intel x86 ISA. optimizing back end that generates PowerPC assembly code We use machine state as the parameter of assertions,so from a simple imperative intermediate language called Cmi- as to describe program properties which may change nor [10]by Leroy.A front end translating a subset of C to with the machine state.These properties are usu- Cminor is under development and certification.One of the ally difficult to be expressed in common type systems. novel features of his work is to emphasize the certification Touchstone and Special J mainly focus on type proper- of a complete compilation chain instead of parts of a com- ties of programs,and rarely concern complicated pro- piler.Another novelty is that most of the compiler is written gram properties directly in the Coq specification language,in a purely func- tional style. 4.Due to the simplicity of the source language,loop in- Generally,it is much easier to prove the correctness of variants which only concern types can be generated au- a calculation result than to prove the correctness of the cal- tomatically in Touchstone and Special J.In our certify- culation itself,so that certifying compiler has more possi- ing compiler,loop invariants may contain more infor- bility to come into use than certified compiler.Necula first mation than types,and it should be provided by pro- proposed the concept of Proof-Carrying Code,and imple- grammers explicitly. mented a certifying compiler called Touchstone [17].This compiler was composed of a traditional optimizing com- 7.Conclusion piler for a type-safe subset of C and a certifier that automati- cally produced a proof of type safety and memory safety for This paper presents the design and implementation of a each assembly program.A proof checker could be used to certifying compiler for PointerC(a programming language check the generated proofs automatically.Since the source equipped with both a type system and a logic system).Al- programs compiled by Touchstone were written in a very though the implemented certifying compiler prototype only small safe subset of C,their type safety and memory safety enforces the primary safety policies such as type safety and were easy to be checked.The main limitations of the source memory safety,the whole framework can fit in the situation language in Touchstone were in the aspects of pointer types where user-defined safety policies are allowed.This paper and memory deallocation,which made the language only only concentrates on the main work of the certifying com- used in writing programs with simple data structures.Later, piler and skips the translation of assertions,VCs and proofs
6. Related work Verifying compiler [8] is a research topic proposed in recent years. It uses mathematical and logical reasoning to check the correctness of the program that it compiles before the program executes. The criterion of correctness is specified by types, assertions, and other redundant annotations associated with the program. Now it can be classified into two research directions on this topic. One is to prove the correctness of the compiler formally, that is, the compiler carries a formal proof which confirms the correctness of the generated objective code. Such a compiler is called a certified compiler. The other is to prove the correctness of the compiled code formally, i.e., the code carries a formal proof to guarantee its correctness, safety or some other properties and the proof can be checked by a code consumer independently. This kind of compiler is called a certifying compiler. Obviously, this paper presents the design of parts of a certifying compiler. In the research area of certified compiler, Moore was one of the first to mechanically verify semantic preservation for a compiler [14], although for a custom language and a custom processor that were not commonly used. After that, more compilers were verified, including a compiler for a subset of Common Lisp, a byte-code compiler for a subset of Java, and a compiler for a tiny subset of C. The most recent typical work is the certification of a lightlyoptimizing back end that generates PowerPC assembly code from a simple imperative intermediate language called Cminor [10] by Leroy. A front end translating a subset of C to Cminor is under development and certification. One of the novel features of his work is to emphasize the certification of a complete compilation chain instead of parts of a compiler. Another novelty is that most of the compiler is written directly in the Coq specification language, in a purely functional style. Generally, it is much easier to prove the correctness of a calculation result than to prove the correctness of the calculation itself, so that certifying compiler has more possibility to come into use than certified compiler. Necula first proposed the concept of Proof-Carrying Code, and implemented a certifying compiler called Touchstone [17]. This compiler was composed of a traditional optimizing compiler for a type-safe subset of C and a certifier that automatically produced a proof of type safety and memory safety for each assembly program. A proof checker could be used to check the generated proofs automatically. Since the source programs compiled by Touchstone were written in a very small safe subset of C, their type safety and memory safety were easy to be checked. The main limitations of the source language in Touchstone were in the aspects of pointer types and memory deallocation, which made the language only used in writing programs with simple data structures. Later, Colby et al. implemented a certifying compiler called Special J [3] for a large subset of Java. It compiled Java byte code into target code for Intel IA32 architecture. The major advance of Special J over Touchstone was the scope of the source language compiled, which in turn necessitated the handling of non-trivial run-time mechanisms such as object representation, dynamic method dispatch and exception handling. Moreover, Special J was freely able to apply many standard local and global optimizations. Our design has the following significant differences from Touchstone and Special J: 1. PointerC has more pointer types and operations, and also provides dynamic storage allocation and deallocation. These features make it suitable for writing system-level programs. Both Touchstone and Special J have no dynamic storage deallocation. 2. We use some new techniques to handle the features of the language equipped with both a type system and a logic system. For example, our VC generator can perform both forward and backward VC generations, and the theorem prover embedded in the compiler and the Coq proof assistant can be used to prove VCs for different types of data respectively. 3. To certify the proof-carrying assembly code, we adopt an approach that applies the method of Hoare logic directly to the operational semantics of Intel x86 ISA. We use machine state as the parameter of assertions, so as to describe program properties which may change with the machine state. These properties are usually difficult to be expressed in common type systems. Touchstone and Special J mainly focus on type properties of programs, and rarely concern complicated program properties. 4. Due to the simplicity of the source language, loop invariants which only concern types can be generated automatically in Touchstone and Special J. In our certifying compiler, loop invariants may contain more information than types, and it should be provided by programmers explicitly. 7. Conclusion This paper presents the design and implementation of a certifying compiler for PointerC (a programming language equipped with both a type system and a logic system). Although the implemented certifying compiler prototype only enforces the primary safety policies such as type safety and memory safety, the whole framework can fit in the situation where user-defined safety policies are allowed. This paper only concentrates on the main work of the certifying compiler and skips the translation of assertions, VCs and proofs
of VCs.The design of the theorem prover,which is em- [10]X.Leroy.Formal certification of a compiler back-end or: bedded in the compiler to prove pointer-related VCs,is also programming a compiler with a proof assistant.In POPL skipped in this paper. '06:Conference record of the 33rdACM SIGPLAN-SIGACT We are improving PointerC and its certifying com- symposium on Principles of programming languages,pages piler:relaxing the restrictions on pointer arithmetic op- 42-54,New York,NY,USA,2006.ACM Press. erations and allowing calloc which is often used in pro- [11]Z.Li.Cog implementation of the soundness proof of fcap grams.We are also studying an embedded theorem prover (description).http://ssg.ustcsz.edu.cn/Iss/software/index. html. for integer-related VCs,and exploring the possibilities to [12]Z.Li.A framework of function-based certified assembly replace the current proof-assistant tool.The influences of programming.http://ssg.ustcsz.edu.cn/lss/doc/index.html. proof-carrying compilation on code optimizations are also [13]Y.Mandelbaum,D.Walker,and R.Harper.An effective the- under consideration. ory of type refinements.In ICFP'03:Proceedings of the eighth ACM SIGPLAN international conference on Func- 8.Acknowledgements tional programming,pages 213-225,New York,NY,USA, 2003.ACM Press. [14]J.S.Moore.Piton:a mechanically verified assembly- We thank Professor Zhong Shao in Yale University and language.Kluwer Academic Publishers,Norwell,MA anonymous referees for suggestions and comments on an 1996. earlier version of this paper.This research is based on work [15]G.Morrisett,D.Walker,K.Crary,and N.Glew.From supported in part by grants from Intel China Research Cen- system f to typed assembly language.In POPL '98:Pro- ter and National Natural Science Foundation of China under ceedings of the 25th ACM SIGPLAN-SIGACT symposium on Grant No.60673126.Any opinions,findings and conclu- Principles of programming languages,pages 85-97,New sions contained in this document are those of the authors York.NY,USA.1998.ACM Press. [16]G.C.Necula.Proof-carrying code.In POPL '97:Pro- and do not reflect the views of these agencies. ceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages,pages 106-119,New References York,NY,USA.1997.ACM Press. [17]G.C.Necula and P.Lee.The design and implementation [1]A.W.Appel.Foundational proof-carrying code.In L/CS of a certifying compiler.In PLDI '98:Proceedings of the '0l:Proceedings of the 16th Annual IEEE Symposium on ACM SIGPLAN 1998 conference on Programming language Logic in Computer Science,pages 247-258.Washington, design and implementation,pages 333-344,New York,NY. DC,USA,2001.IEEE Computer Society. USA.1998.ACM Press. [2]Y.Chen,B.Hua,L.Ge,and W.Zhifang. [18]H.Xi.Applied type system (extended abstract).In post- pointer logic for safety verification of pointer programs. workshop Proceedings of TYPES 2003.pages 394-408. http://ssg.ustcsz.edu.cn/lss/papers/index.html. Springer-Verlag LNCS 3085,2004. [3]C.Colby,P.Lee,G.C.Necula,F.Blau,M.Plesko,and [19]D.Yu,N.A.Hamid,and Z.Shao.Building certified libraries K.Cline.A certifying compiler for java.In PLD/'00: for pcc:dynamic storage allocation.Sci.Comput.Program.. Proceedings of the ACM SIGPLAN 2000 conference on Pro- 50(1-3):101-127.2004. gramming language design and implementation,pages 95- 107.New York.NY.USA.2000.ACM Press. [4]Cog Development Team.The Cog proof assistant reference manual.Cog release v8.0,Oct.2005. [5]E.W.Dijkstra.A discipline of programming.Prentice-Hall, Englewood Cliffs,New Jersey,1976. [6]X.Feng.Z.Shao,A.Vaynberg.S.Xiang.and Z.Ni.Mod- ular verification of assembly code with stack-based con- trol abstractions.In PLDI '06:Proceedings of the 2006 ACM SIGPLAN conference on Programming language de- sign and implementation,pages 401-414,New York,NY, USA.2006.ACM Press. [7]N.A.Hamid,Z.Shao,V.Trifonov,S.Monnier,and Z.Ni. A syntactic approach to foundational proof-carrying code. In LICS'02:Proceedings of the Seventeenth Annual IEEE Symposium on Logic In Computer Science,pages 89-100, Copenhagen,Denmark,July 2002.IEEE. [8]T.Hoare.The verifying compiler:A grand challenge for computing research.J.ACM,50(1):63-69.2003. [9]B.Hua and L.Ge.The definition of pointerc programming language.http://ssg.ustcsz.edu.cn/lss/doc/index.html
of VCs. The design of the theorem prover, which is embedded in the compiler to prove pointer-related VCs, is also skipped in this paper. We are improving PointerC and its certifying compiler: relaxing the restrictions on pointer arithmetic operations and allowing calloc which is often used in programs. We are also studying an embedded theorem prover for integer-related VCs, and exploring the possibilities to replace the current proof-assistant tool. The influences of proof-carrying compilation on code optimizations are also under consideration. 8. Acknowledgements We thank Professor Zhong Shao in Yale University and anonymous referees for suggestions and comments on an earlier version of this paper. This research is based on work supported in part by grants from Intel China Research Center and National Natural Science Foundation of China under Grant No.60673126. Any opinions, findings and conclusions contained in this document are those of the authors and do not reflect the views of these agencies. References [1] A. W. Appel. Foundational proof-carrying code. In LICS ’01: Proceedings of the 16th Annual IEEE Symposium on Logic in Computer Science, pages 247–258, Washington, DC, USA, 2001. IEEE Computer Society. [2] Y. Chen, B. Hua, L. Ge, and W. Zhifang. A pointer logic for safety verification of pointer programs. http://ssg.ustcsz.edu.cn/lss/papers/index.html. [3] C. Colby, P. Lee, G. C. Necula, F. Blau, M. Plesko, and K. Cline. A certifying compiler for java. In PLDI ’00: Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation, pages 95– 107, New York, NY, USA, 2000. ACM Press. [4] Coq Development Team. The Coq proof assistant reference manual. Coq release v8.0, Oct. 2005. [5] E. W. Dijkstra. A discipline of programming. Prentice-Hall, Englewood Cliffs, New Jersey, 1976. [6] X. Feng, Z. Shao, A. Vaynberg, S. Xiang, and Z. Ni. Modular verification of assembly code with stack-based control abstractions. In PLDI ’06: Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation, pages 401–414, New York, NY, USA, 2006. ACM Press. [7] N. A. Hamid, Z. Shao, V. Trifonov, S. Monnier, and Z. Ni. A syntactic approach to foundational proof-carrying code. In LICS’02: Proceedings of the Seventeenth Annual IEEE Symposium on Logic In Computer Science, pages 89–100, Copenhagen, Denmark, July 2002. IEEE. [8] T. Hoare. The verifying compiler: A grand challenge for computing research. J. ACM, 50(1):63–69, 2003. [9] B. Hua and L. Ge. The definition of pointerc programming language. http://ssg.ustcsz.edu.cn/lss/doc/index.html. [10] X. Leroy. Formal certification of a compiler back-end or: programming a compiler with a proof assistant. In POPL ’06: Conference record of the 33rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 42–54, New York, NY, USA, 2006. ACM Press. [11] Z. Li. Coq implementation of the soundness proof of fcap (description). http://ssg.ustcsz.edu.cn/lss/software/index. html. [12] Z. Li. A framework of function-based certified assembly programming. http://ssg.ustcsz.edu.cn/lss/doc/index.html. [13] Y. Mandelbaum, D. Walker, and R. Harper. An effective theory of type refinements. In ICFP’03: Proceedings of the eighth ACM SIGPLAN international conference on Functional programming, pages 213–225, New York, NY, USA, 2003. ACM Press. [14] J. S. Moore. Piton: a mechanically verified assemblylanguage. Kluwer Academic Publishers, Norwell, MA, 1996. [15] G. Morrisett, D. Walker, K. Crary, and N. Glew. From system f to typed assembly language. In POPL ’98: Proceedings of the 25th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 85–97, New York, NY, USA, 1998. ACM Press. [16] G. C. Necula. Proof-carrying code. In POPL ’97: Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 106–119, New York, NY, USA, 1997. ACM Press. [17] G. C. Necula and P. Lee. The design and implementation of a certifying compiler. In PLDI ’98: Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation, pages 333–344, New York, NY, USA, 1998. ACM Press. [18] H. Xi. Applied type system (extended abstract). In postworkshop Proceedings of TYPES 2003, pages 394–408. Springer-Verlag LNCS 3085, 2004. [19] D. Yu, N. A. Hamid, and Z. Shao. Building certified libraries for pcc: dynamic storage allocation. Sci. Comput. Program., 50(1-3):101–127, 2004