A pointer logic and certifying compiler CHEN Yiyun,GE Lin**,HUA Baojian,LI Zhaopeng, LIU Cheng,and WANG Zhifang Department of Computer Science and Technology University of Science and Technology of China, Hefei 230027.China Abstract.Proof-Carrying Code brings two big challenges to the re- search field of programming languages.One is to seek more expressive logics or type systems to specify or reason about the properties of low- level or high-level programs.The other is to study the technology of certifying compilation in which the compiler generates proofs for pro- grams with annotations.This paper presents our progress in the above two aspects.A pointer logic was designed for PointerC (a C-like pro- gramming language)in our research.As an extension of Hoare logic, our pointer logic expresses the change of pointer information for each statement in its inference rules to support program verification.Mean- while,based on the ideas from CAP(Certified Assembly Programming) and SCAP (Stack-based Certified Assembly Programming),a reasoning framework was built to verify the properties of object code in a Hoare style.And a certifying compiler prototype for PointerC was implemented based on this framework. The main contribution of this paper is the design of the pointer logic and the implementation of the certifying compiler prototype.In our certifying compiler,the source language contains rich pointer types and operations and also supports dynamic storage allocation and deallocation. Categories and Subject Descriptors.F.3.1 [Logics and Meanings of Programs]:Specifying and Verifying and Reasoning about Programs- Specification Techniques;D.2.4 [Software Engineering]:Software/Program Verification-safety proofs,formal methods; Keywords.software safety,Hoare logic,pointer logic,Proof-Carrying Code,certifying compiler 1 Introduction Proof-Carrying Code (PCC)[1]brings two big challenges to the research field of programming languages.One is to explore more expressive logics or type systems,so that the properties of low-level or high-level programs will be easily Received February 12,2007;accepted... Email:gelin@mail.ustc.edu.cn
A pointer logic and certifying compiler ? CHEN Yiyun, GE Lin??, HUA Baojian, LI Zhaopeng, LIU Cheng, and WANG Zhifang Department of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China Abstract. Proof-Carrying Code brings two big challenges to the research field of programming languages. One is to seek more expressive logics or type systems to specify or reason about the properties of lowlevel or high-level programs. The other is to study the technology of certifying compilation in which the compiler generates proofs for programs with annotations. This paper presents our progress in the above two aspects. A pointer logic was designed for PointerC (a C-like programming language) in our research. As an extension of Hoare logic, our pointer logic expresses the change of pointer information for each statement in its inference rules to support program verification. Meanwhile, based on the ideas from CAP (Certified Assembly Programming) and SCAP (Stack-based Certified Assembly Programming), a reasoning framework was built to verify the properties of object code in a Hoare style. And a certifying compiler prototype for PointerC was implemented based on this framework. The main contribution of this paper is the design of the pointer logic and the implementation of the certifying compiler prototype. In our certifying compiler, the source language contains rich pointer types and operations and also supports dynamic storage allocation and deallocation. Categories and Subject Descriptors. F.3.1 [Logics and Meanings of Programs]: Specifying and Verifying and Reasoning about ProgramsSpecification Techniques; D.2.4 [Software Engineering]: Software/Program Verification—safety proofs, formal methods; Keywords. software safety, Hoare logic, pointer logic, Proof-Carrying Code, certifying compiler 1 Introduction Proof-Carrying Code (PCC) [1] brings two big challenges to the research field of programming languages. One is to explore more expressive logics or type systems, so that the properties of low-level or high-level programs will be easily ? Received February 12, 2007; accepted ... . ?? Email: gelin@mail.ustc.edu.cn
2 A pointer logic and certifying compiler specified or reasoned about.The other is the research on certifying compilation. which explores how the compiler generates proofs for the compiled programs For the first challenge,the TAL (Typed Assembly Language)project [2] and the theory of type refinements 3 are two typical projects in type-based approaches,while PCC [1]and FPCC(Foundational Proof-Carrying Code)[4] are typical projects on logic-based techniques.Type-based and logic-based tech- niques are complementary to each other,and in recent years,some researchers have tried to combine those techniques.Shao et al.adopted a syntactic approach to FPCC,and proposed a simple and flexible framework to support Hoare-style reasoning for assembly code certification in their projects CAP(Certified Assem- bly Programming)[5]and SCAP (Stack-based CAP)[6]etc.The ATS (Applied Type System)project 7 proposed by Xi et al.extends the type system with a notion of program states,so that invariants on states could be captured in state- ful programming.By encoding Hoare logic in its type system,ATS can support Hoare-logic-like reasoning via the type system. For the second challenge,Necula implemented a certifying compiler [8]called Touchstone.It contains a traditional compiler for a small but type-safe subset of C and a certifier that automatically produces a proof of type safety for each assembly program produced by the compiler.Later,Colby et al.implemented Special J 9,a certifying compiler for a large subset of Java.It compiles Java byte code into target code for Intel's IA32 architecture. We have adopted PCC technology and certifying compilation to a program- ming language with dynamic storage allocation and deallocation-PointerCl(a safe C-like language defined in our work).And we have also implemented a certifying compiler prototype for PointerC.Pointer operations in PointerC are restricted:pointer variables can only be used in assignment,equality compari- son,dereference or as the parameters of functions:the address-of operator (& and pointer arithmetic are forbidden.We adopt an approach combining both type-based and logic-based techniques to reason about program properties.To make the type system simple and guarantee the safety of the language at the same time,side conditions are introduced into the typing rules to express the constraints on values.In order to check these side conditions statically,we have designed a pointer logic for PointerC.The pointer logic is an extension of Hoare logic and essentially is a pointer analysis tool.It collects pointer information in a forward manner.Such information can be used to prove that the program sat- isfies the side conditions in the typing rules and then to support value-sensitive static checking. The pointer logic is the main contribution of this paper.As extensions of Hoare logic,separation logic 10 and our pointer logic are both to reason about properties of pointer programs for shared mutable data structures.But there is a great difference between them.Our pointer logic concerns pointer aliasing.In the pointer logic,different access paths are assumed to be bound to different storage locations,unless it can be proved that they are bound to the same location 1 Hua B J and Ge L.The definition of PointerC programming language. http://ssg.ustcsz.edu.cn/Iss/doc/index.html.(in Chinese)
2 A pointer logic and certifying compiler specified or reasoned about. The other is the research on certifying compilation, which explores how the compiler generates proofs for the compiled programs. For the first challenge, the TAL (Typed Assembly Language) project [2] and the theory of type refinements [3] are two typical projects in type-based approaches, while PCC [1] and FPCC (Foundational Proof-Carrying Code) [4] are typical projects on logic-based techniques. Type-based and logic-based techniques are complementary to each other, and in recent years, some researchers have tried to combine those techniques. Shao et al. adopted a syntactic approach to FPCC, and proposed a simple and flexible framework to support Hoare-style reasoning for assembly code certification in their projects CAP (Certified Assembly Programming) [5] and SCAP (Stack-based CAP) [6] etc. The ATS (Applied Type System) project [7] proposed by Xi et al. extends the type system with a notion of program states, so that invariants on states could be captured in stateful programming. By encoding Hoare logic in its type system, ATS can support Hoare-logic-like reasoning via the type system. For the second challenge, Necula implemented a certifying compiler [8] called Touchstone. It contains a traditional compiler for a small but type-safe subset of C and a certifier that automatically produces a proof of type safety for each assembly program produced by the compiler. Later, Colby et al. implemented Special J [9], a certifying compiler for a large subset of Java. It compiles Java byte code into target code for Intel’s IA32 architecture. We have adopted PCC technology and certifying compilation to a programming language with dynamic storage allocation and deallocation—PointerC1 (a safe C-like language defined in our work). And we have also implemented a certifying compiler prototype for PointerC. Pointer operations in PointerC are restricted: pointer variables can only be used in assignment, equality comparison, dereference or as the parameters of functions; the address-of operator (&) and pointer arithmetic are forbidden. We adopt an approach combining both type-based and logic-based techniques to reason about program properties. To make the type system simple and guarantee the safety of the language at the same time, side conditions are introduced into the typing rules to express the constraints on values. In order to check these side conditions statically, we have designed a pointer logic for PointerC. The pointer logic is an extension of Hoare logic and essentially is a pointer analysis tool. It collects pointer information in a forward manner. Such information can be used to prove that the program satisfies the side conditions in the typing rules and then to support value-sensitive static checking. The pointer logic is the main contribution of this paper. As extensions of Hoare logic, separation logic [10] and our pointer logic are both to reason about properties of pointer programs for shared mutable data structures. But there is a great difference between them. Our pointer logic concerns pointer aliasing. In the pointer logic, different access paths are assumed to be bound to different storage locations, unless it can be proved that they are bound to the same location 1 Hua B J and Ge L. The definition of PointerC programming language. http://ssg.ustcsz.edu.cn/lss/doc/index.html. (in Chinese)
Front.Comput.Sci.China (2007),Research Article 3 (those bound to the same location are aliases).Therefore,equality information of effective pointers is needed to deduce the access paths that are bound to the same location.And this is also the reason why the pointer logic need not introduce new connectives.Separation logic can only handle programming languages in which all access paths are simple,so it only concerns where a pointer points to.In separation logic,pointers are assumed to potentially point to the same location, unless they are explicitly expressed to point to different locations.Therefore,it needs to introduce new logical connectives such as separating conjunction "*"etc. While designing inference rules for the pointer logic,we extends Hoare logic in the viewpoint of pointer analysis.The inference rules are designed to be suitable for the collection of pointer information in a forward way.Such information can be used to check the legality and other properties of a program.Separation logic extends Hoare logic in the viewpoint of program verification,and its inference rules are designed to fit backward reasoning. In addition,in source programs,it is common to use expressions with the form idi->id2->id3->...->idn,such as s->next->next,but this kind of ex- pressions could not be used directly in the assertions of separation logic because they involve several separated portions of addressable storage.This makes it difficult for separation logic assertions to be expressed directly using program expressions.In assertions of separation logic,accessibility is explicitly repre- sented by ""This makes the inference rules simple and clear,but when several access paths are bound to the same location,which is common in source programs,the explicit representation will increase the steps of deduction.For example,considering the program fragment: p malloc(...);*p a; {*p=a} 9=P;*g=b; {*p=b} where expressions in notation {are assertions.When using separation logic, the deduction from assertion *p ==a to assertion *p==b has more steps than that using the pointer logic Another contribution of this paper is the implementation of a certifying com- piler prototype.Our certifying compiler supports a source language (PointerC) equipped with both a type system and a logic system.And compared with the source languages compiled in Touchstone and Special J,PointerC has more pointer types and operations,and provides dynamic storage allocation and deal- location as well.These features make PointerC suitable for writing system-level programs. In this paper,we present the pointer logic designed for PointerC and the certifying compiler prototype that has been implemented.The rest of the paper is organized as follows.In section 2,we explain the design of the pointer logic; section 3 describes the assembly-level pointer logic system added in an assembly code certification framework;section 4 discusses the design of the certifying compiler prototype;section 5 gives an example;section 7 compares our work with related work and section 8 concludes
Front. Comput. Sci. China (2007), Research Article 3 (those bound to the same location are aliases). Therefore, equality information of effective pointers is needed to deduce the access paths that are bound to the same location. And this is also the reason why the pointer logic need not introduce new connectives. Separation logic can only handle programming languages in which all access paths are simple, so it only concerns where a pointer points to. In separation logic, pointers are assumed to potentially point to the same location, unless they are explicitly expressed to point to different locations. Therefore, it needs to introduce new logical connectives such as separating conjunction “*”etc. While designing inference rules for the pointer logic, we extends Hoare logic in the viewpoint of pointer analysis. The inference rules are designed to be suitable for the collection of pointer information in a forward way. Such information can be used to check the legality and other properties of a program. Separation logic extends Hoare logic in the viewpoint of program verification, and its inference rules are designed to fit backward reasoning. In addition, in source programs, it is common to use expressions with the form id1->id2->id3->...->idn, such as s->next->next, but this kind of expressions could not be used directly in the assertions of separation logic because they involve several separated portions of addressable storage. This makes it difficult for separation logic assertions to be expressed directly using program expressions. In assertions of separation logic, accessibility is explicitly represented by “7→” . This makes the inference rules simple and clear, but when several access paths are bound to the same location, which is common in source programs, the explicit representation will increase the steps of deduction. For example, considering the program fragment: p = malloc(...); *p = a; {{{*p == a}} q = p; *q = b; {{{*p == b}} where expressions in notation {{ }} are assertions. When using separation logic, the deduction from assertion *p == a to assertion *p == b has more steps than that using the pointer logic. Another contribution of this paper is the implementation of a certifying compiler prototype. Our certifying compiler supports a source language (PointerC) equipped with both a type system and a logic system. And compared with the source languages compiled in Touchstone and Special J, PointerC has more pointer types and operations, and provides dynamic storage allocation and deallocation as well. These features make PointerC suitable for writing system-level programs. In this paper, we present the pointer logic designed for PointerC and the certifying compiler prototype that has been implemented. The rest of the paper is organized as follows. In section 2, we explain the design of the pointer logic; section 3 describes the assembly-level pointer logic system added in an assembly code certification framework; section 4 discusses the design of the certifying compiler prototype; section 5 gives an example; section 7 compares our work with related work and section 8 concludes
4 A pointer logic and certifying compiler 2 The pointer logic A type system is a tractable syntactic method for proving the absence of certain program behaviors by classifying phrases according to the kinds of values they compute.It is mainly used to eliminate context-sensitive errors in programs.A traditional type system is not enough when the legality of a phrase depends not only on the context but also on some expression values in the phrase.One solution is to use dependent types 7.Another solution is not to deem the constraints on values as parts of the type system,and the constraints don't appear in the typing rules.The former makes the type system complicated and the latter makes it an unsound type system. We try to investigate a trade-off between these two solutions in the design of PointerC.To make the type system simple and guarantee the safety of the language at the same time,side conditions are introduced into the typing rules to express the constraints on values.For example,in the following two typing rules, side conditions contain the constraints on subscript expressions and pointers TFe:int TFa:array(10,bool) 0≤e∧ex:int p∈effective-ptrs These side conditions must be checked.And to check these side conditions statically,we have designed a pointer logic for PointerC.The design of the pointer logic is inspired by the following facts and speculations: 1.At a program point,if we know whether a pointer is an effective pointer (a pointer which points to an object),a null pointer or a dangling pointer and the equality relation between all the effective pointers,we can deduce whether a pointer operation is safe or not.And we can also capture the change of the pointer information after this operation. 2.A logic system may not able to prove the safety of all programs.But it is still feasible if it can prove the safety of most programs. 3.The pointer information collected by Hoare-style reasoning can be used to reason about program properties.It amounts to combining program analysis with program verification and this combination has a brilliant future. To present the pointer logic,we make clear some pointer-related terms and notations first,and then explain some elementary operations used in the pointer logic.After that,we introduce the inference rules in our pointer logic. 2.1 Conventions,terminologies and notations Dynamic variables can only be accessed through declared pointer variables.For example,s->data,s->next->pre,*s and *s [5]etc.Such expressions are called symbolic access paths of variables,access paths for short.Note that s->next is
4 A pointer logic and certifying compiler 2 The pointer logic A type system is a tractable syntactic method for proving the absence of certain program behaviors by classifying phrases according to the kinds of values they compute. It is mainly used to eliminate context-sensitive errors in programs. A traditional type system is not enough when the legality of a phrase depends not only on the context but also on some expression values in the phrase. One solution is to use dependent types [7]. Another solution is not to deem the constraints on values as parts of the type system, and the constraints don’t appear in the typing rules. The former makes the type system complicated and the latter makes it an unsound type system. We try to investigate a trade-off between these two solutions in the design of PointerC. To make the type system simple and guarantee the safety of the language at the same time, side conditions are introduced into the typing rules to express the constraints on values. For example, in the following two typing rules, side conditions contain the constraints on subscript expressions and pointers. Γ `e : int Γ `a : array(10, bool) Γ `a[e] : bool 0 ≤ e ∧ e ≤ 9 Γ `p : ptr(struct(. . . , x:int, . . . )) Γ `p->x : int p ∈ effective ptrs These side conditions must be checked. And to check these side conditions statically, we have designed a pointer logic for PointerC. The design of the pointer logic is inspired by the following facts and speculations: 1. At a program point, if we know whether a pointer is an effective pointer (a pointer which points to an object), a null pointer or a dangling pointer and the equality relation between all the effective pointers, we can deduce whether a pointer operation is safe or not. And we can also capture the change of the pointer information after this operation. 2. A logic system may not able to prove the safety of all programs. But it is still feasible if it can prove the safety of most programs. 3. The pointer information collected by Hoare-style reasoning can be used to reason about program properties. It amounts to combining program analysis with program verification and this combination has a brilliant future. To present the pointer logic, we make clear some pointer-related terms and notations first, and then explain some elementary operations used in the pointer logic. After that, we introduce the inference rules in our pointer logic. 2.1 Conventions, terminologies and notations Dynamic variables can only be accessed through declared pointer variables. For example, s->data, s->next->pre, *s and *s[5] etc. Such expressions are called symbolic access paths of variables, access paths for short. Note that s->next is
Front.Comput.Sci.China(2007),Research Article 5 not an access path if s is a null or a dangling pointer.For convenience,duplicated parts in an access path are abbreviated in the pointer logic.For example,access path s->next->next->...->next ("->next"is repeated i times)is abbreviated to s(->next)1,and if i=0,s(->next)i just represents s.Pointer variables (in- cluding dynamic pointer variables)are called pointers for short,and they are classified into effective pointers,null pointers and dangling pointers,and the latter two are also called ineffective pointers.In the rest of this paper,two dif- ferent concepts-equality and aliasing of pointers-are often mentioned.When two pointers are aliases,it means that their l-values are equal;and when they are equal,it means that their r-values are equal.Because of the limitations on pointer operations and the call-by-value calling convention in PointerC,a de- clared variable can not be aliased by other variables(in this paper,aliasing of array elements is not considered).Only if two pointers are equal,the access paths, which are formed by appending a common suffix to the two pointers,are aliases. For example,in Fig.1,u and v are equal pointers,but not aliases.u->next and v->next are aliases.Obviously,if we have the information of pointer equality, we can deduce whether two access paths are aliases.In the rest of the paper,we use metavariables p,q and r to represent access paths. 十 data data nex next Fig.l.Pointers inⅡ,V and D In the pointer logic,we use“W"to denote null pointer set,.“D”for dan- gling pointer set and "II"for effective pointer set at any program point.II is divided into equivalent sets according to the equality information of effective pointers.Pointers in one equivalent set are equal but not aliases.At any point of a program,all pointers (or their aliases)can be found in 1I,N or D.If a pointer has several aliases at this point,only one of them is in I7,N or D. For example,in Fig.1,II ={fu,v),{s,u->next)},N={t,s->next) D={w}.(In this figure,the cross in w represents meaningless location,and “八”represents null.)Note that the number of access paths inⅡis just the
Front. Comput. Sci. China (2007), Research Article 5 not an access path if s is a null or a dangling pointer. For convenience, duplicated parts in an access path are abbreviated in the pointer logic. For example, access path s->next->next->. . . ->next (“->next” is repeated i times) is abbreviated to s(->next)i , and if i=0, s(->next)i just represents s. Pointer variables (including dynamic pointer variables) are called pointers for short, and they are classified into effective pointers, null pointers and dangling pointers, and the latter two are also called ineffective pointers. In the rest of this paper, two different concepts—equality and aliasing of pointers—are often mentioned. When two pointers are aliases, it means that their l-values are equal; and when they are equal, it means that their r-values are equal. Because of the limitations on pointer operations and the call-by-value calling convention in PointerC, a declared variable can not be aliased by other variables (in this paper, aliasing of array elements is not considered). Only if two pointers are equal, the access paths, which are formed by appending a common suffix to the two pointers, are aliases. For example, in Fig. 1, u and v are equal pointers, but not aliases. u->next and v->next are aliases. Obviously, if we have the information of pointer equality, we can deduce whether two access paths are aliases. In the rest of the paper, we use metavariables p, q and r to represent access paths. Fig. 1. Pointers in Π, N and D In the pointer logic, we use “N ” to denote null pointer set, “D” for dangling pointer set and “Π” for effective pointer set at any program point. Π is divided into equivalent sets according to the equality information of effective pointers. Pointers in one equivalent set are equal but not aliases. At any point of a program, all pointers (or their aliases) can be found in Π, N or D. If a pointer has several aliases at this point, only one of them is in Π, N or D. For example, in Fig. 1, Π = {{u, v}, {s, u->next}}, N = {t, s->next}, D = {w}. (In this figure, the cross in w represents meaningless location, and “/\” represents null.) Note that the number of access paths in Π is just the
6 A pointer logic and certifying compiler number of arrows in the figure,and the arrows which point to the same object correspond to the pointers in the same equivalent set.Although I,N and D are sets,they are essentially logical expressions connected by conjunction "A", and can appear in the precondition or postcondition of a Hoare triple."is a short notation for II ANA D.For example,in Fig.1,the equivalent set u,v is actually a logical expression,and it means that u and v are equal but unequal to any of other pointers;all the pointer sets can be expressed as {u,v}A{s,u->next}Aft,s->next}NA{w)D,where the sets without subscript represent equivalent pointer sets,and the sets with subscript N and D repre- sent null pointer set and dangling pointer set respectively.The null pointer set [t,s->next}w can also be written as {t}Afs->next)N,and the same for the dangling pointer set D.If,at a program point,IT is an empty set,the assertion at this point contains no equivalent set;if {...IN or {..does not appear in the assertion,it means that null pointer set or dangling pointer set is empty. 2.2 Elementary operations Access paths are a special kind of strings that satisfies certain syntactical re- quirements,and in this paper,all strings represent the strings or substrings that form access paths.If an access path p is a prefix of q(but they are not the same), the value of the predicate prefix(p,q)is true,otherwise is false.The symbol ""represents the concatenation operation of two strings or a string set and a single string.If the operands are a string set S and a single string s,it makes every element in the set S concatenated by the string s: Ss≌S'where(sl‖s)∈Sifs∈S. If the values of two access paths ss2 and s1 are equal(note that neither of si and s2 are empty strings),then the access path s2 is called a cycle in sills2lls3 (s3 is also a non-empty string).The symbol "=is used to represent that two strings are syntactically identical. Next,we define some functions operated on access paths.The functions have or II as their parameters.But such parameters are all omitted for simplicity. 1)Function closure(p)calculates alias set for the access path p,and the aliases in this set must have the simplest form.This set is called the closure of p,and it contains and only contains all acyclic aliases of p,including p itself. closure(p)≌ if length(p)=1 then {p} else let (sill...llsn-1llsn)=p in compression(expansion(closure(sill...llsn-1))Isn), where function length(p)calculates the length of the access path p.The length represents the number of meaningful units in p,rather than the number of char- acters in p.For example,the length of the access path t->next->data is 3. Function expansion(S)expands the alias set S with the access paths whose values are equal to the elements in S.and its formal definition is:
6 A pointer logic and certifying compiler number of arrows in the figure, and the arrows which point to the same object correspond to the pointers in the same equivalent set. Although Π, N and D are sets, they are essentially logical expressions connected by conjunction “∧”, and can appear in the precondition or postcondition of a Hoare triple. “Ψ” is a short notation for Π ∧ N ∧ D. For example, in Fig. 1, the equivalent set {u, v} is actually a logical expression, and it means that u and v are equal but unequal to any of other pointers; all the pointer sets can be expressed as {u, v}∧{s,u->next}∧{t, s->next}N ∧{w}D, where the sets without subscript represent equivalent pointer sets, and the sets with subscript N and D represent null pointer set and dangling pointer set respectively. The null pointer set {t, s->next}N can also be written as {t}N ∧{s->next}N , and the same for the dangling pointer set D. If, at a program point, Π is an empty set, the assertion at this point contains no equivalent set; if {. . . }N or {. . . }D does not appear in the assertion, it means that null pointer set or dangling pointer set is empty. 2.2 Elementary operations Access paths are a special kind of strings that satisfies certain syntactical requirements, and in this paper, all strings represent the strings or substrings that form access paths. If an access path p is a prefix of q (but they are not the same), the value of the predicate prefix(p, q) is true, otherwise is false. The symbol “||” represents the concatenation operation of two strings or a string set and a single string. If the operands are a string set S and a single string s, it makes every element in the set S concatenated by the string s: S||s , S 0 where (s 0 ||s) ∈ S0 iff s 0 ∈ S. If the values of two access paths s1||s2 and s1 are equal (note that neither of s1 and s2 are empty strings), then the access path s2 is called a cycle in s1||s2||s3 (s3 is also a non-empty string). The symbol “≡” is used to represent that two strings are syntactically identical. Next, we define some functions operated on access paths. The functions have Ψ or Π as their parameters. But such parameters are all omitted for simplicity. 1) Function closure(p) calculates alias set for the access path p, and the aliases in this set must have the simplest form. This set is called the closure of p, and it contains and only contains all acyclic aliases of p, including p itself. closure(p) , if length(p) = 1 then {p} else let (s1|| . . . ||sn−1||sn) ≡ p in compression(expansion(closure(s1|| . . . ||sn−1))||sn), where function length(p) calculates the length of the access path p. The length represents the number of meaningful units in p, rather than the number of characters in p. For example, the length of the access path t->next->data is 3. Function expansion(S) expands the alias set S with the access paths whose values are equal to the elements in S, and its formal definition is:
Front.Comput.Sci.China(2007),Research Article 7 expansion(S) if 3s':(IIUNUD)).(Sns'!=0) then let [p1,...Pn}=S-S where S'∈(ⅡU{W}U{D})ASnS'I=0 in SUclosure(p)U...Uclosure(pn) else 0. Function compression(S)deletes all the cyclic access paths from the alias set S,and its formal definition is: compression(S)≌S-S' where (S'CS)A ((s1lls2lls3)ES' f(s1!=e)∧(s2!=e)∧(s多!=e)Λ(sls3)∈S)A(sls2=s1) For brevity,the above function closure is a definition,not a computing al- gorithm.For example,it does not consider the termination of the calculation concerning cyclic data structures,such as doubly-linked cyclic list.But,in the implementation of closure,it is easy to solve the problem of termination.And given the closure function,it is also simple to delete cycles from the access paths. So for convenience,it is assumed that all access paths in programs have the sim- plest form. 2)Function alias(p,q)gets an alias of p from the closure of access path p.The alias it gets must not use any alias of effective pointer q as a prefix.If there is no such an alias,it returns p. alias(p,q) let S=[p'closure(p)I Vq'closure(q).-prefix(q',p)} in if S==0 then p else p'where p'S 3)Function equals(p)gets the equivalent set of p.If one of the aliases of p appears in an equivalent set,the function returns the set,or else it returns empty set. equals(p)≌ if3S:Ⅱ.(Snclosure(p)!=0) then S where S∈ⅡA Sn closure(p)!=0else0 Next,we introduce the operations and predicates directly used in inference rules. These operations show how to get the in a postcondition based on the in the corresponding precondition. 4)Suppose S is an equivalent set in I7,and p is an effective pointer.For each pointer q in S which uses one of p's aliases as its prefix,S/p finds an alias of q using alias(q,p),substitutes the alias for q in S,and then deletes p's aliases and any other pointers which use p's aliases as their prefixes from S. S/p≌ let S'={q:S Vp'closure(p).-prefix(p',q) [3q:S.3p closure(p).(prefix(p',q)=alias(q,p))} in {q:S'(qE closure(p))AVp':closure(p).-prefix(p',q)}
Front. Comput. Sci. China (2007), Research Article 7 expansion(S) , if ∃S0 : (Π ∪ {N } ∪ {D}).(S ∩ S0!=∅) then let {p1 , . . . , pn} = S 0 − S where S 0 ∈ (Π ∪ {N } ∪ {D}) ∧ S ∩ S0!=∅ in S ∪ closure(p1 ) ∪ . . . ∪ closure(pn) else ∅. Function compression(S) deletes all the cyclic access paths from the alias set S, and its formal definition is: compression(S) , S − S0 where (S 0 ⊂ S) ∧ ((s1||s2||s3) ∈ S0 iff (s1!=²) ∧ (s2!=²) ∧ (s3!=²) ∧ ((s1||s3) ∈ S) ∧ (s1||s2 = s1)) For brevity, the above function closure is a definition, not a computing algorithm. For example, it does not consider the termination of the calculation concerning cyclic data structures, such as doubly-linked cyclic list. But, in the implementation of closure, it is easy to solve the problem of termination. And given the closure function, it is also simple to delete cycles from the access paths. So for convenience, it is assumed that all access paths in programs have the simplest form. 2) Function alias(p, q) gets an alias of p from the closure of access path p. The alias it gets must not use any alias of effective pointer q as a prefix. If there is no such an alias, it returns p. alias(p, q) , let S = {p 0 : closure(p) | ∀q 0 : closure(q).¬prefix(q 0 , p 0 )} in if S == ∅ then p else p 0 where p 0 ∈ S 3) Function equals(p) gets the equivalent set of p. If one of the aliases of p appears in an equivalent set, the function returns the set, or else it returns empty set. equals(p) , if ∃S : Π.(S ∩ closure(p)!=∅) then S where S ∈ Π ∧ S ∩ closure(p)!=∅ else ∅ Next, we introduce the operations and predicates directly used in inference rules. These operations show how to get the Ψ in a postcondition based on the Ψ in the corresponding precondition. 4) Suppose S is an equivalent set in Π, and p is an effective pointer. For each pointer q in S which uses one of p’s aliases as its prefix, S/p finds an alias of q using alias(q,p), substitutes the alias for q in S, and then deletes p’s aliases and any other pointers which use p’s aliases as their prefixes from S. S/p , let S 0 = {q : S | ∀p 0 : closure(p).¬prefix(p 0 , q)}∪ {q 0 | ∃q : S.∃p 0 : closure(p).(prefix(p 0 , q) ∧ q 0 ≡ alias(q, p))} in {q : S 0 | ¬(q ∈ closure(p)) ∧ ∀p 0 : closure(p).¬prefix(p 0 , q)}
8 A pointer logic and certifying compiler We use II/p to denote performing S/p for each S in II. When an effective pointer q is assigned a value which is not equal to q,I/q should be performed.For example,in Fig.2 where II ={fu,v->pre},[v,u->next}}, II/v={fu,u->next->pre},fu->next}}.(fu->next}means that u->next is an effective pointer and is not equal to any of other pointers. data data pre pre next ext Fig.2.Pointers in a doubly-linked cyclic list 5)M\p or D\p replaces the pointers in N or D which use p's aliases as their prefixes with their other aliases.A/p or D/p deletes the aliases of p from A or p Np q:N|Vp':closure(p).-prefix(p',q)}u q'q:N.p'closure(p).(prefix(p',q)'=alias(q,p))} NIp≌{q:WI(q∈closure(p)} N/{p1…pn}≌(NIp1).…/Pn) The formal definitions of D\p and D/p are similar. 6)We use the set operator U to represent the union of two pointer sets.Usually. this operator is used to add a pointer to a pointer set.And we also use set operators U and -and their combinations to represent the addition,deletion and substitution of equivalent sets. 7)Function no_leak(p)tests the result of S/p for S containing p(just testing,not really deleting).This testing is to avoid memory leaks caused by an assignment to p. no_leak(p)if equals(p)/p!=0 then true else false 2.3 Axioms and inference rules In this section,we define axioms and inference rules in our pointer logic.In the following rules,p and q are access paths;p.eq is p's equivalent set,and p.eq equals(p)
8 A pointer logic and certifying compiler We use Π/p to denote performing S/p for each S in Π. When an effective pointer q is assigned a value which is not equal to q, Π/q should be performed. For example, in Fig. 2 where Π = {{u, v->pre}, {v, u->next}}, Π/v = {{u, u->next->pre}, {u->next}}. ({u->next} means that u->next is an effective pointer and is not equal to any of other pointers.) Fig. 2. Pointers in a doubly-linked cyclic list 5) N \p or D\p replaces the pointers in N or D which use p’s aliases as their prefixes with their other aliases. N /p or D/p deletes the aliases of p from N or D. N \p , {q : N | ∀p 0 : closure(p).¬prefix(p 0 , q)}∪ {q 0 | ∃q : N .∃p 0 : closure(p).(prefix(p 0 , q) ∧ q 0 ≡ alias(q, p))} N /p , {q : N | ¬(q ∈ closure(p))} N /{p1 . . . pn} , ((N /p1 ). . . /pn) The formal definitions of D\p and D/p are similar. 6) We use the set operator ∪ to represent the union of two pointer sets. Usually, this operator is used to add a pointer to a pointer set. And we also use set operators ∪ and − and their combinations to represent the addition, deletion and substitution of equivalent sets. 7) Function no leak(p) tests the result of S/p for S containing p (just testing, not really deleting). This testing is to avoid memory leaks caused by an assignment to p. no leak(p) , if equals(p)/p!=∅ then true else false 2.3 Axioms and inference rules In this section, we define axioms and inference rules in our pointer logic. In the following rules, p and q are access paths; p.eq is p’s equivalent set, and p.eq = equals(p).
Front.Comput.Sci.China (2007),Research Article 9 In each of the following rules,the premises of the rule should be computed based on the y before the statement. 1)Pointer assignment:p=q. Here different inference rules are used in different cases. a)Both p and q are effective pointers or null pointers,and they are equal(in- cluding the case where p is a null pointer and g is constant NULL). (p.eq!=0A p==q)V(p==NULL A q==NULL) (RULE 1) [IIANAD}P=q(IIANAD) b)Both p and q are effective pointers,and they are not equal. (p.eq!=0∧q.eq!=0∧p!=q)∧no_leak(p) [IIANAD}p=q {((II-{q.eq})/pu{q.eq/pu(p)})ANpAD\p} (RULE2)】 c)p is a null pointer and q is an effective pointer. p==NULL∧q.eq!=0 [IAN A D}p=q{((II-{q.eq})ufq.equ(p)))AN/p D] (RULE3) d)p is a dangling pointer and q is an effective pointer. The rule for this case is similar to(RULE 3). e)p is an effective pointer and q equals NULL (including the case where q is constant NULL). p.eq!=0A q==NULL A no_leak(p) (RULE 4) (IIANAD}p=q {I/PA(NPU(p))AD\p} f)p is a dangling pointer and q equals NULL (including the case where q is constant NULL). p:DA q==NULL (RULE 5) IIANAD)p=q (II(NUp))AD/p} where p:D represents that an alias of p is in D 2)The assignment axiom for non-pointer-type data is as follows.It is an exten- sion of Hoare logic assignment axiom,and aliasing has been considered in this axiom.The axiom does not interfere with {(亚AQ)y1-.[yn←-xx←}x=e{亚AQ}(AXIOM6-1) where y1,...,yn represent all the members of closure(x)
Front. Comput. Sci. China (2007), Research Article 9 In each of the following rules, the premises of the rule should be computed based on the Ψ before the statement. 1) Pointer assignment: p=q. Here different inference rules are used in different cases. a) Both p and q are effective pointers or null pointers, and they are equal (including the case where p is a null pointer and q is constant NULL). (p.eq!=∅ ∧ p==q) ∨ (p==NULL ∧ q==NULL) {Π ∧ N ∧ D} p=q {Π ∧ N ∧ D} (rule 1) b) Both p and q are effective pointers, and they are not equal. (p.eq!=∅ ∧ q.eq!=∅ ∧ p!=q) ∧ no leak(p) {Π ∧ N ∧ D} p=q {((Π − {q.eq})/p ∪ {q.eq/p ∪ {p}}) ∧ N \p ∧ D\p} (rule 2) c) p is a null pointer and q is an effective pointer. p==NULL ∧ q.eq!=∅ {Π ∧ N ∧ D} p=q {((Π − {q.eq}) ∪ {q.eq ∪ {p}}) ∧ N /p ∧ D} (rule 3) d) p is a dangling pointer and q is an effective pointer. The rule for this case is similar to (RULE 3). e) p is an effective pointer and q equals NULL (including the case where q is constant NULL). p.eq!=∅ ∧ q==NULL ∧ no leak(p) {Π ∧ N ∧ D} p=q {Π/p ∧ (N \p ∪ {p}) ∧ D\p} (rule 4) f) p is a dangling pointer and q equals NULL (including the case where q is constant NULL). p : D ∧ q==NULL {Π ∧ N ∧ D} p=q {Π ∧ (N ∪ {p}) ∧ D/p} (rule 5) where p : D represents that an alias of p is in D 2) The assignment axiom for non-pointer-type data is as follows. It is an extension of Hoare logic assignment axiom, and aliasing has been considered in this axiom. The axiom does not interfere with Ψ. {(Ψ ∧ Q)[y1 ← x] . . . [yn ← x][x ← e]} x=e {Ψ ∧ Q} (axiom 6-1) where y1, . . . , yn represent all the members of closure(x)
10 A pointer logic and certifying compiler For assignment of pointer-type data.if besides y.there is another O which contains pointers(including pointers as prefixes),then another assignment axiom may also be useful: {ΨAQ}p=q{'AQr-p} (AXIOM 6-2) where r is a member of closure(p)(not p itself),and can be derived from using RULE 1 to 5.In another word,when assigning a pointer to p,the alias substitution of p should also occur in assertions besides v. Assertion Q in the above two axioms contains no pointer sets.And if there are any access paths in Q,they must be valid access paths according to current I7.All Qs in the following rules have this restriction. 3)Composition,if-then-else,while and Consequence rules are the same as the counterparts in Hoare logic.But note that y should not be strengthened or weakened in Consequence rule.For example,[p,q,r}does not imply [p,q),and [p,q}A fr}should not be weakened to fp,q). 4)V-modification axioms A-(p:A (p!=NULL)(IIU{p}))ANAD (AXIOM 7) 亚A(p:亚)A(p=NULL)Ⅱ∧(WU{p})AD (AXIOM 8) where p:亚indicates that an alias of p is in亚,and“”denotes implication. Some well-formed data structures,such as List and Binary Tree,contain no dangling pointer.So a pointer is either an effective pointer or a null pointer in such data structures.In a program,the effectiveness of such a pointer is usually determined through null-pointer test.Therefore,such a pointer can be added into II or N according to the different test results.These two axioms just modify II or A according to null-pointer test. 5)Allocation statement:p=malloc(t),where t is the type of the allocated storage. If t is a structure type,r1,...,rn represent pointer-type fields in t. a)p is a null pointer. p==NULL [IANAD}p=malloc(t){(IU{p)))AN/pA(DUp->r1,...,p->rn})} (RULE 9) b)p is a dangling pointer.The inference rule for this case is similar to(RULE 9). c)p is an effective pointer.In this case,the allocation statement can be regarded as a statement sequence p=NULL;p=malloc(t);.(RULE 4)and(RULE 9)can be used to handle this case. 6)Deallocation statement:free(p)
10 A pointer logic and certifying compiler For assignment of pointer-type data, if besides Ψ, there is another Q which contains pointers (including pointers as prefixes), then another assignment axiom may also be useful: {Ψ ∧ Q} p=q {Ψ 0 ∧ Q[r ← p]} (axiom 6-2) where r is a member of closure(p) (not p itself), and Ψ 0 can be derived from Ψ using RULE 1 to 5. In another word, when assigning a pointer to p, the alias substitution of p should also occur in assertions besides Ψ. Assertion Q in the above two axioms contains no pointer sets. And if there are any access paths in Q, they must be valid access paths according to current Π. All Qs in the following rules have this restriction. 3) Composition, if-then-else, while and Consequence rules are the same as the counterparts in Hoare logic. But note that Ψ should not be strengthened or weakened in Consequence rule. For example, {p, q,r} does not imply {p, q}, and {p, q} ∧ {r} should not be weakened to {p, q}. 4) Ψ-modification axioms. Ψ ∧ ¬(p : Ψ) ∧ (p!=NULL) ⊃ (Π ∪ {{p}}) ∧ N ∧ D (axiom 7) Ψ ∧ ¬(p : Ψ) ∧ (p==NULL) ⊃ Π ∧ (N ∪ {p}) ∧ D (axiom 8) where p : Ψ indicates that an alias of p is in Ψ, and “⊃” denotes implication. Some well-formed data structures , such as List and Binary Tree, contain no dangling pointer. So a pointer is either an effective pointer or a null pointer in such data structures. In a program, the effectiveness of such a pointer is usually determined through null-pointer test. Therefore, such a pointer can be added into Π or N according to the different test results. These two axioms just modify Π or N according to null-pointer test. 5) Allocation statement: p=malloc(t), where t is the type of the allocated storage. If t is a structure type, r1, . . . , rn represent pointer-type fields in t. a) p is a null pointer. p==NULL {Π ∧ N ∧ D} p=malloc(t) {(Π ∪ {{p}}) ∧ N /p ∧ (D ∪ {p->r1, . . . , p->rn})} (rule 9) b) p is a dangling pointer. The inference rule for this case is similar to (RULE 9). c) p is an effective pointer. In this case, the allocation statement can be regarded as a statement sequence p=NULL; p=malloc(t);. (RULE 4) and (RULE 9) can be used to handle this case. 6) Deallocation statement: free(p)