Li ZP,Zhang Y,Chen YY.A shape graph logic and a shape system.JOURNAL OF COMPUTER SCIENCE AND TECHN0L0GY28(6):1?-?Nov.2013.D0I10.1007/s11390-013-1398-1 A Shape Graph Logic and A Shape System Zhao-Peng Li(李兆),Tember,CCF,Yu Zhang*(张昱),and Yi-Yun Chen(陈意云),ember,CCF School of Computer Science and Technology,University of Science and Technology of China,Hefei 230027,China Software Security Laboratory,Suzhou Institute for Advanced Study,University of Science and Technology of China Suzhou 215123,China E-mail:fzpli,yzhang,yiyun@ustc.edu.cn Received November 21,2012;revised September 20,2013. Abstract Analysis and verification of pointer programs are still difficult problems so far.This paper uses a shape graph logic and a shape system to solve these problems in two stages.First,shape graphs at every program point are constructed using an analysis tool.Then,they are used to support the verification of other properties (e.g.,orderedness).Our prototype supports automatic verification of programs manipulating complex data structures such as splay trees,treaps,AVL trees and AA trees,etc.The proposed shape graph logic,as an extension to Hoare logic,uses shape graphs directly as assertions.It can be used in the analysis and verification of programs manipulating mutable data structures.The benefit using shape graphs as assertions is that it is convenient for acquiring the relations between pointers in the verification stage.The proposed shape system requires programmers to provide lightweight shape declarations in recursive structure type declarations.It can help rule out programs that construct shapes deviating from what programmers expect (reflected in shape declarations) in the analysis stage.As a benefit,programmers need not provide specifications(e.g.,pre-/post-conditions,loop invariants) about pointers.Moreover,we present a method doing verification in the second stage using traditional Hoare logic rules directly by eliminating aliasing with the aid of shape graphs.Thus,verification conditions could be discharged by general theorem provers. Keywords shape graph logic,program verification,shape analysis,automated theorem proving,loop invariant inference Introduction ting large VCs.Although these tools have been develo- ped in labs,there is no product that can be applied to Formal verification is a major method to improve real-world software yet.The root of this problem lies the dependability of computer software.There are two in the difficulty in automated theorem proving,since main approaches to doing formal verification on soft- many problems are influenced by the power of under- ware.The first one is model checking,which consists of lying automated theorem proving techniques.These a systematically exhaustive exploration of the mathe- problems include aliasing analysis,loop invariant infe- matical model.This approach has been progressively rence,the expressivity of assertion languages,the de- used in the industry.The second one is logical infe- sign of domain-specific logics,etc. rence.It consists of using a formal version of mathe- While trying to make breakthroughs in automated matical reasoning about software systems,usually us- theorem proving techniques,we should consider lower- ing theorem proving software such as Isabelle/HOL[] ing requirements on the capability of theorem provers. or Coq.Most research(such as Ynot 21,Spec#[3]and For instance,we can design new mechanisms for pro- ESC/Javal4l)using the second approach revolve around gramming languages to raise the threshold of being the design of logic systems to reason about programs a legal program and reject certain programs in which and generate verification conditions (VC for short), there exist errors logically.Moreover,program analysis and then use certain theorem prover to prove them. can be used in collecting program information to sup- Researches such as Smallfoot5)and jStarl6l use sym- port program verification.All these approaches could bolic execution and entailment proof to avoid genera- somewhat alleviate the burdens of automated theorem Regular Paper This research was supported by the National Natural Science Foundation of China under Grant Nos.61003043,61170018,the National High Technology Research and Development 863 Program of China under Grant No.2012AA010901-2,and the Postdoctoral Science Foundation of China under Grant No.2012M521250. *Corresponding Author The Cog development team.The Coq proof assistant reference manual (Version 8.2),http://coq.inria fr,Sept.2013. C)2013 Springer Science+Business Media,LLC Science Press,China
Li ZP, Zhang Y, Chen YY. A shape graph logic and a shape system. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 28(6): 1??–??? Nov. 2013. DOI 10.1007/s11390-013-1398-1 A Shape Graph Logic and A Shape System Zhao-Peng Li (李兆鹏), Member, CCF, Yu Zhang∗ (张 昱), and Yi-Yun Chen (陈意云), Member, CCF School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China Software Security Laboratory, Suzhou Institute for Advanced Study, University of Science and Technology of China Suzhou 215123, China E-mail: {zpli, yzhang, yiyun}@ustc.edu.cn Received November 21, 2012; revised September 20, 2013. Abstract Analysis and verification of pointer programs are still difficult problems so far. This paper uses a shape graph logic and a shape system to solve these problems in two stages. First, shape graphs at every program point are constructed using an analysis tool. Then, they are used to support the verification of other properties (e.g., orderedness). Our prototype supports automatic verification of programs manipulating complex data structures such as splay trees, treaps, AVL trees and AA trees, etc. The proposed shape graph logic, as an extension to Hoare logic, uses shape graphs directly as assertions. It can be used in the analysis and verification of programs manipulating mutable data structures. The benefit using shape graphs as assertions is that it is convenient for acquiring the relations between pointers in the verification stage. The proposed shape system requires programmers to provide lightweight shape declarations in recursive structure type declarations. It can help rule out programs that construct shapes deviating from what programmers expect (reflected in shape declarations) in the analysis stage. As a benefit, programmers need not provide specifications (e.g., pre-/post-conditions, loop invariants) about pointers. Moreover, we present a method doing verification in the second stage using traditional Hoare logic rules directly by eliminating aliasing with the aid of shape graphs. Thus, verification conditions could be discharged by general theorem provers. Keywords shape graph logic, program verification, shape analysis, automated theorem proving, loop invariant inference 1 Introduction Formal verification is a major method to improve the dependability of computer software. There are two main approaches to doing formal verification on software. The first one is model checking, which consists of a systematically exhaustive exploration of the mathematical model. This approach has been progressively used in the industry. The second one is logical inference. It consists of using a formal version of mathematical reasoning about software systems, usually using theorem proving software such as Isabelle/HOL[1] or Coq①. Most research (such as Ynot[2], Spec#[3] and ESC/Java[4]) using the second approach revolve around the design of logic systems to reason about programs and generate verification conditions (VC for short), and then use certain theorem prover to prove them. Researches such as Smallfoot[5] and jStar[6] use symbolic execution and entailment proof to avoid generating large VCs. Although these tools have been developed in labs, there is no product that can be applied to real-world software yet. The root of this problem lies in the difficulty in automated theorem proving, since many problems are influenced by the power of underlying automated theorem proving techniques. These problems include aliasing analysis, loop invariant inference, the expressivity of assertion languages, the design of domain-specific logics, etc. While trying to make breakthroughs in automated theorem proving techniques, we should consider lowering requirements on the capability of theorem provers. For instance, we can design new mechanisms for programming languages to raise the threshold of being a legal program and reject certain programs in which there exist errors logically. Moreover, program analysis can be used in collecting program information to support program verification. All these approaches could somewhat alleviate the burdens of automated theorem Regular Paper This research was supported by the National Natural Science Foundation of China under Grant Nos. 61003043, 61170018, the National High Technology Research and Development 863 Program of China under Grant No. 2012AA010901-2, and the Postdoctoral Science Foundation of China under Grant No. 2012M521250. ∗Corresponding Author ①The Coq development team. The Coq proof assistant reference manual (Version 8.2), http://coq.inria.fr, Sept. 2013. ©2013 Springer Science + Business Media, LLC & Science Press, China
2 J.Comput.Sci.Technol.,Nov.2013,Vol.28,No.6 proving.This paper introduces our research result on declared,according to the rules of shape checking.In the automatic verification of pointer programs in two addition,the shape system can help rule out programs stages:shape analysis and program verification. that construct (or operate on)shapes deviating from First,we propose a method using shape graphs as what programmers expect in shape declarations in the assertions of pointer equality and validity.Based on analysis phase.In this way,we report errors earlier this approach,we design a shape graph logic (SGL for and decrease the cases needed to be considered in both short)as an extension of Hoare logic.We mainly add analysis and verification. some inference rules for statements dealing with point- With a shape system,programmers need not to ers (pointer statements for short).The resulting logic provide shape graphs in the pre-/post-conditions of can be used in the analysis and verification of pointer functions and loop invariants for program verification. programs manipulating mutable data structures. Hence,it brings no trouble to programmers using shape Under restrictions on pointer arithmetic and uses of graphs as the graphical expression of assertions the address-of operator (&)our shape graphs can pre- Third,we propose a method doing verification in the cisely describe the equalities between pointers (includ- second stage using traditional Hoare logic rules directly ing statically declared pointer variables and pointer- by eliminating aliasing with the aid of shape graphs. typed fields of dynamically allocated structure varia- One important limitation using Hoare logic is that dif- bles)and validity of pointers (valid pointers point to ferent names (including access paths)must represent program objects,i.e.,structure variable allocated on different program objects (i.e.,no aliasing is allowed). the heap)at program points.Given shape graphs, Hoare logic is sound when we apply rules to a program we can acquire whether two access paths are aliases where there is no aliasing in both statements and its or not at corresponding program point.Shape graphs specification(assertions).Shape graphs provide simple are bridges between shape analysis and program veri- ways of judging and eliminating aliasing.We eliminate fication.In the shape analysis stage,we use inference aliasing problems in access paths via alias substitution rules to construct shape graphs at each program point before applying corresponding Hoare logic rules. In the following stage,these shape graphs provide in- Using this method,VCs could be discharged by gene- formation of pointers and support the verification of ral theorem provers without the need of designing spe- other properties (e.g.,orderedness of binary trees).As cialized provers (e.g.,separation logic's special prover it were,certification about pointer-related properties is Smallfoot同). peeled off from program verification and done in the Finally,we implement a prototype for automatic shape analysis stage.We use SGL in both stages in verification of pointer programs.Our prototype sup- favor of soundness proof of the logic. ports programs manipulating complex data structures Using shape graphs as assertions makes work in both such as sorted circular doubly-linked lists,binary search stages easy.In the analysis stage,it is simple to calcu- trees,splay trees,treaps,AVL trees and AA trees. late shape graphs of the post-condition for statement- The rest of the paper is organized as follows.Sec- based on shape graphs of the pre-condition.In the veri- fication stage,shape graphs provide needed informa- tion 2 introduces shape graphs as assertions.Section 3 presents our SGL.Section 4 presents the shape system. tion about pointers.For example,we can easily know Section 5 introduces our program verification prototype whether more than one pointers point to a particular for PointerC.Section 6 compares our work with related node in the graph (to avoid memory leaks).In addi- work,and Section 7 concludes the paper. tion,we can judge whether two access paths are aliases or not using shape graphs (when eliminating aliasing of 2 Shape Graph as Assertion access paths) Second,we propose a shape system.Roughly speak- Shape graphs are directed graphs.They can describe ing,it is similar to type systems.We design and im- point-to relations of statically declared pointer variables plement a shape system for the PointerC programming (pointer for short)and pointer fields of dynamically languagel7.It contains shapes and their definitions, allocated structure variables (field pointer for short) rules for shape inference and shape checking.The shape Shape graphs precisely express the equalities between system requires programmers to provide shape declara- pointers.They can be used to judge whether two access tions in recursive structure type declarations.Compil- paths are aliases or not. ers or other tools can use shape inference rules to infer In this section,shape graphs are defined and the se- shapes constructed by programs via dynamically allo- mantic is given.We present our approach using shape cated structures,and then do shape checking to judge graphs as assertions on pointer relations.In Fig.1,some whether shapes confirm to what the programmer has quick examples of shape graph assertions and their co-
2 J. Comput. Sci. & Technol., Nov. 2013, Vol.28, No.6 proving. This paper introduces our research result on the automatic verification of pointer programs in two stages: shape analysis and program verification. First, we propose a method using shape graphs as assertions of pointer equality and validity. Based on this approach, we design a shape graph logic (SGL for short) as an extension of Hoare logic. We mainly add some inference rules for statements dealing with pointers (pointer statements for short). The resulting logic can be used in the analysis and verification of pointer programs manipulating mutable data structures. Under restrictions on pointer arithmetic and uses of the address-of operator (&), our shape graphs can precisely describe the equalities between pointers (including statically declared pointer variables and pointertyped fields of dynamically allocated structure variables) and validity of pointers (valid pointers point to program objects, i.e., structure variable allocated on the heap) at program points. Given shape graphs, we can acquire whether two access paths are aliases or not at corresponding program point. Shape graphs are bridges between shape analysis and program veri- fication. In the shape analysis stage, we use inference rules to construct shape graphs at each program point. In the following stage, these shape graphs provide information of pointers and support the verification of other properties (e.g., orderedness of binary trees). As it were, certification about pointer-related properties is peeled off from program verification and done in the shape analysis stage. We use SGL in both stages in favor of soundness proof of the logic. Using shape graphs as assertions makes work in both stages easy. In the analysis stage, it is simple to calculate shape graphs of the post-condition for statementbased on shape graphs of the pre-condition. In the veri- fication stage, shape graphs provide needed information about pointers. For example, we can easily know whether more than one pointers point to a particular node in the graph (to avoid memory leaks). In addition, we can judge whether two access paths are aliases or not using shape graphs (when eliminating aliasing of access paths). Second, we propose a shape system. Roughly speaking, it is similar to type systems. We design and implement a shape system for the PointerC programming language[7]. It contains shapes and their definitions, rules for shape inference and shape checking. The shape system requires programmers to provide shape declarations in recursive structure type declarations. Compilers or other tools can use shape inference rules to infer shapes constructed by programs via dynamically allocated structures, and then do shape checking to judge whether shapes confirm to what the programmer has declared, according to the rules of shape checking. In addition, the shape system can help rule out programs that construct (or operate on) shapes deviating from what programmers expect in shape declarations in the analysis phase. In this way, we report errors earlier and decrease the cases needed to be considered in both analysis and verification. With a shape system, programmers need not to provide shape graphs in the pre-/post-conditions of functions and loop invariants for program verification. Hence, it brings no trouble to programmers using shape graphs as the graphical expression of assertions. Third, we propose a method doing verification in the second stage using traditional Hoare logic rules directly by eliminating aliasing with the aid of shape graphs. One important limitation using Hoare logic is that different names (including access paths) must represent different program objects (i.e., no aliasing is allowed). Hoare logic is sound when we apply rules to a program where there is no aliasing in both statements and its specification (assertions). Shape graphs provide simple ways of judging and eliminating aliasing. We eliminate aliasing problems in access paths via alias substitution before applying corresponding Hoare logic rules. Using this method, VCs could be discharged by general theorem provers without the need of designing specialized provers (e.g., separation logic’s special prover Smallfoot[5]). Finally, we implement a prototype for automatic verification of pointer programs. Our prototype supports programs manipulating complex data structures such as sorted circular doubly-linked lists, binary search trees, splay trees, treaps, AVL trees and AA trees. The rest of the paper is organized as follows. Section 2 introduces shape graphs as assertions. Section 3 presents our SGL. Section 4 presents the shape system. Section 5 introduces our program verification prototype for PointerC. Section 6 compares our work with related work, and Section 7 concludes the paper. 2 Shape Graph as Assertion Shape graphs are directed graphs. They can describe point-to relations of statically declared pointer variables (pointer for short) and pointer fields of dynamically allocated structure variables (field pointer for short). Shape graphs precisely express the equalities between pointers. They can be used to judge whether two access paths are aliases or not. In this section, shape graphs are defined and the semantic is given. We present our approach using shape graphs as assertions on pointer relations. In Fig.1, some quick examples of shape graph assertions and their co-
Zhao-Peng Li et al.:A Shape Graph Logic and A Shape System unterparts in the separation logic are shown.Asser- without expression e and assertion a is called an un- tions on the first line are describing a singly-linked list constrained condensation node.Such node represents of length n.Assertions on the second line are used arbitrarily positive number (including zero)of structure to describes doubly-linked lists.Assertion Fig.1(e)is nodes. semantically the same as the assertion Fig.1(f).The 6)Predicate node:it is a rectangle with a symbol difference is that assertion Fig.1(f)uses quantifiers to P inside.There is also information below the node. describe that the fields l and r are dangling pointers. including the name of the predicate,an expression e, and an assertion a.e and a have exactly the same con- Shape Graph Assertions Separation Logic Assertions straints as these of a condensation node. ○ listseg (p,nil,n) Directed Edges and Their Labels.Each directed edge n,n>0 has an identifier as its label(labels are above the edges (a) (b) in our figures).Directed edges and related nodes satisfy the following syntactic constraints. ○9,p dlist(q) 1)Declaration node:it has only one out-edge and dlist no in-edge (c) (d) 2)Structure and condensation node:they have in- /out-edges and the labels of their outgoing edges are 3l1,l2.tmp→11*tmp十4→l2 D different from each other. (e) (0 3)Null node,dangling node and predicate node: (Suppose the Type of mmp Only (Suppose Fields/,r Will Be Translated they have no outgoing edges and there is no restriction Contains Two Fields /r) Into Index 0 and 4 Respectively) on their incoming edges. Fig.1.Quick example:shape graph and separation logic asser- Definition 1(Shape Graph). tions. 1)Shape graphs,whose nodes and edges satisfy afore- mentioned syntactic constraints,are connected graphs if 2.1 Syntax of Shape Graph directions of edges are ignored.The labels of edges from declaration nodes in a shape graph should be different Vertices.There are six kinds of vertex(see Fig.2) from each other. They are used to denote stack slots or heap blocks ma- 2)Suppose G1,G2 are shape graphs.If the label sets nipulated by programs.A vertex is also called a node. of edges from declaration nodes in G1,G2 are disjoint, The following are their names and syntax. GAG2 is also a shape graph (A is logical conjunction). 3)Suppose G1,G2 are shape graphs.If the label sets D of edges from declaration nodes in G1,G2 are equal to e,a name,e,a each other,GIV G2 is also a shape graph (V is logical Declaration Structure Null Dangling Condensation Predicate disjunction). Node Node Node Node Node Node In this paper,we use symbol G to denote shape Fig.2.Six kinds of nodes in shape graphs. graphs without connective V.Shape graph G with- out logical conjunction connectives is called shape sub- 1)Declaration node:it is a circle and represents a graph.Obviously,shape sub-graphs in one shape graph stack slot (a declared pointer variable). are not connected to each other.If GIA G2 is a shape 2)Structure node:it is a rectangle and represents a graph,Gi and G2 assert on two different (separated) heap block(a structure variable). parts of one program state;while G1,G2 in G1VG2 de- 3)Null node:it is a dashed rectangle with a sym- note two possible program states at one program point. bol N inside.Pointers pointing to null nodes are null Shape graphs in this paper are subsets of shape pointers. graphs defined in Definition 1 due to the constraints 4)Dangling node:it is a dashed rectangle with a of the type system.For instance,the type system will symbol D inside.Pointers pointing to dangling nodes guarantee that two structure nodes of different types are dangling pointers. will not be adjacent to each other in shape graphs. 5)Condensation node:it is a gray rectangle.There Fig.3 presents two shape graphs for singly-linked is information below the node,including an expression lists.The loop invariant shape graph of the follow- e and an assertion a constraining e.Expression e is a ing program fragment is shown as Fig.3(a)(suppose the linear integer expression using integer constants and de- singly-linked list pointed by the variable hd has at least clared variables.Assertion a is logical conjunctions of one node).The condensation nodes pointed by hd and relations between such expressions.The gray rectangle ptr denote two list segments whose lengths are m and
Zhao-Peng Li et al.: A Shape Graph Logic and A Shape System 3 unterparts in the separation logic are shown. Assertions on the first line are describing a singly-linked list of length n. Assertions on the second line are used to describes doubly-linked lists. Assertion Fig.1(e) is semantically the same as the assertion Fig.1(f). The difference is that assertion Fig.1(f) uses quantifiers to describe that the fields l and r are dangling pointers. Fig.1. Quick example: shape graph and separation logic assertions. 2.1 Syntax of Shape Graph Vertices. There are six kinds of vertex (see Fig.2). They are used to denote stack slots or heap blocks manipulated by programs. A vertex is also called a node. The following are their names and syntax. Fig.2. Six kinds of nodes in shape graphs. 1) Declaration node: it is a circle and represents a stack slot (a declared pointer variable). 2) Structure node: it is a rectangle and represents a heap block (a structure variable). 3) Null node: it is a dashed rectangle with a symbol N inside. Pointers pointing to null nodes are null pointers. 4) Dangling node: it is a dashed rectangle with a symbol D inside. Pointers pointing to dangling nodes are dangling pointers. 5) Condensation node: it is a gray rectangle. There is information below the node, including an expression e and an assertion a constraining e. Expression e is a linear integer expression using integer constants and declared variables. Assertion a is logical conjunctions of relations between such expressions. The gray rectangle without expression e and assertion a is called an unconstrained condensation node. Such node represents arbitrarily positive number (including zero) of structure nodes. 6) Predicate node: it is a rectangle with a symbol P inside. There is also information below the node, including the name of the predicate, an expression e, and an assertion a. e and a have exactly the same constraints as these of a condensation node. Directed Edges and Their Labels. Each directed edge has an identifier as its label (labels are above the edges in our figures). Directed edges and related nodes satisfy the following syntactic constraints. 1) Declaration node: it has only one out-edge and no in-edge. 2) Structure and condensation node: they have in- /out-edges and the labels of their outgoing edges are different from each other. 3) Null node, dangling node and predicate node: they have no outgoing edges and there is no restriction on their incoming edges. Definition 1 (Shape Graph). 1) Shape graphs, whose nodes and edges satisfy aforementioned syntactic constraints, are connected graphs if directions of edges are ignored. The labels of edges from declaration nodes in a shape graph should be different from each other. 2) Suppose G1, G2 are shape graphs. If the label sets of edges from declaration nodes in G1, G2 are disjoint, G1∧G2 is also a shape graph (∧ is logical conjunction). 3) Suppose G1, G2 are shape graphs. If the label sets of edges from declaration nodes in G1, G2 are equal to each other, G1 ∨ G2 is also a shape graph (∨ is logical disjunction). In this paper, we use symbol G to denote shape graphs without connective ∨. Shape graph G without logical conjunction connectives is called shape subgraph. Obviously, shape sub-graphs in one shape graph are not connected to each other. If G1 ∧ G2 is a shape graph, G1 and G2 assert on two different (separated) parts of one program state; while G1, G2 in G1∨G2 denote two possible program states at one program point. Shape graphs in this paper are subsets of shape graphs defined in Definition 1 due to the constraints of the type system. For instance, the type system will guarantee that two structure nodes of different types will not be adjacent to each other in shape graphs. Fig.3 presents two shape graphs for singly-linked lists. The loop invariant shape graph of the following program fragment is shown as Fig.3(a) (suppose the singly-linked list pointed by the variable hd has at least one node). The condensation nodes pointed by hd and ptr denote two list segments whose lengths are m and
4 J.Comput.Sci.Technol.,Nov.2013,Vol.28,No.6 n respectively (m,n>0).It is easier to understand nodes are null pointers.Fig.3(b)is the shape graph at after we introduce the semantics of shape graphs. the program point before the first statement of the loop ptri=hd;ptr=hd->nxt; body. while(ptr!=NULL){ ptri-ptr;ptr=ptr->nxt; 2.2 Shape Graph for Data Structure Fig.4 shows shape graphs for(circular)singly-linked lists,(circular)doubly-linked lists and binary trees. The symbols,dlist(s,e,a),etc.,on the left-hand side of (a) the definitions are shorthand of shape graphs for easy citing below in this paper.Labels of edges in given shape graphs are only placeholders.Proper names should be used to instantiate them with respect to cor- responding programs. (b) As can be seen,if we remove e and a in the Fig.3.Two instances of shape graphs. right-hand side shape graphs of the two definitions of dlist(s,e,a)and connect them with V,we can ob- A condensation node represents several adjacent tain the definition of dlist(s).Due to limited space, nodes and the expression e below it represents the num- we do not include the definitions of shape graphs for ber of these adjacent nodes.Pointers pointing to null list(s),c_list(s),and c_dlist(s)in Fig.4. Singly-Linked List list(s,e.a) list,e,a e.a (a) Circular Singly-Linked List c list(s,e,a) Gp○ c list,e,a e,a (b) Doubly-Linked List dlisis,e.a) ○p△○'w dlist,e.a a=>(e=0) (c) a=>(e>=1) dlist,e,a e-1.a (d) Doubly-Linked List dlist(s) ○p dlist GWvGW W (e) Circular Doubly-Linked List c dlist(s,e,a) ○p○'w a=>(e=0) c dlist,e,a (0 G口G a=>(e>=1) c dlist,e,a 1 e-1.a (g) Binary Trec tree(s) GpGwvG1 tree tree (h) Fig.4.Part of the definitions of shape predicates
4 J. Comput. Sci. & Technol., Nov. 2013, Vol.28, No.6 n respectively (m, n > 0). It is easier to understand after we introduce the semantics of shape graphs. ptr1=hd; ptr=hd->nxt; while(ptr!=NULL){ ptr1=ptr; ptr=ptr->nxt;} Fig.3. Two instances of shape graphs. A condensation node represents several adjacent nodes and the expression e below it represents the number of these adjacent nodes. Pointers pointing to null nodes are null pointers. Fig.3(b) is the shape graph at the program point before the first statement of the loop body. 2.2 Shape Graph for Data Structure Fig.4 shows shape graphs for (circular) singly-linked lists, (circular) doubly-linked lists and binary trees. The symbols, dlist(s, e, a), etc., on the left-hand side of the definitions are shorthand of shape graphs for easy citing below in this paper. Labels of edges in given shape graphs are only placeholders. Proper names should be used to instantiate them with respect to corresponding programs. As can be seen, if we remove e and a in the right-hand side shape graphs of the two definitions of dlist(s, e, a) and connect them with ∨, we can obtain the definition of dlist(s). Due to limited space, we do not include the definitions of shape graphs for list(s), c list(s), and c dlist(s) in Fig.4. Fig.4. Part of the definitions of shape predicates
Zhao-Peng Li et al.:A Shape Graph Logic and A Shape System Definition 2 (Minimal Shape Graph for Data and the contert). Structure).Shape sub-graphs on the left-/right-hand 2)Edges between nodes in a window are all in the sides of definitions shown in Fig.4 (including definitions window.Edges across the frame,which are used to con- not shoun here due to limited space)are all called mini- nect nodes inside and outside the window,belong to the mal shape graphs for data structure. window.There is a copy of these edges across the frame in its contert of this window. 2.3 Transformation Rules of Shape Graph We use W and X to denote a window and a con- Before giving the rules,we define the concept of text.A window W and a matching context XI can a window,which describes the concerned part of one form a shape graph X[W].Edges across the frame of shape sub-graph in terms of local reasoning. W and edges in X[as a copy are checked whether they Definition 3 (Window).A window is nodes and are coincident.If coincident,the context is a matching edges in a shape sub-graph encircled by a frame in dot one for the window. and dash lines.The other part of the shape graph is We use the following notations in rules using win- called the contert of this window. dows.If there exists at least one edge (e.g.,p in Fig.5) The followings should be satisfied. pointing to certain node in the window,all the other 1)Nodes of shape graphs are in a window or in the possible edges (may be zero)pointing to this node are contert of the window (nodes cannot span the window represented using one edge labeled pk. ΓP .aie,ea」 (a) a≥→(e==0) (b) ------7P a→(e==0) (c) P a→(e>=l) P P (e) P a→(e==0) list,e,a (D P a→(e=1) list,e,a list,e-1,a (g) Fig.5.Part of the equivalent transformation rules for(circular)singly-linked list
Zhao-Peng Li et al.: A Shape Graph Logic and A Shape System 5 Definition 2 (Minimal Shape Graph for Data Structure). Shape sub-graphs on the left-/right-hand sides of definitions shown in Fig.4 (including definitions not shown here due to limited space) are all called minimal shape graphs for data structure. 2.3 Transformation Rules of Shape Graph Before giving the rules, we define the concept of a window, which describes the concerned part of one shape sub-graph in terms of local reasoning. Definition 3 (Window). A window is nodes and edges in a shape sub-graph encircled by a frame in dot and dash lines. The other part of the shape graph is called the context of this window. The followings should be satisfied. 1) Nodes of shape graphs are in a window or in the context of the window (nodes cannot span the window and the context). 2) Edges between nodes in a window are all in the window. Edges across the frame, which are used to connect nodes inside and outside the window, belong to the window. There is a copy of these edges across the frame in its context of this window. We use W and X[] to denote a window and a context. A window W and a matching context X[] can form a shape graph X[W]. Edges across the frame of W and edges in X[] as a copy are checked whether they are coincident. If coincident, the context is a matching one for the window. We use the following notations in rules using windows. If there exists at least one edge (e.g., p1 in Fig.5) pointing to certain node in the window, all the other possible edges (may be zero) pointing to this node are represented using one edge labeled pk. Fig.5. Part of the equivalent transformation rules for (circular) singly-linked list
6 J.Comput.Sci.Technol.,Nov.2013,Vol.28,No.6 Next we introduce the equivalent transformation below it and two adjacent condensation nodes (with rules of shape graphs.They are used to represent trans- e1,a and e2,a2 below respectively).Similarly,it is formations between shape graphs preserving semantic an equivalent transformation.from two adjacent un- equivalence.We give the proof after introducing se- constrained condensation nodes to one unconstrained mantics of shape graphs. condensation node. Another derived rule is for an equivalent transfor- 2.3.1 Equivalent Transformation Rules for (Circular) mation between one structure node and one condensa- Singly-Linked List tion node with 1.true below it. Basic Rules. Another kind of derived rule is related to an al- Rules based on predicate definitions of(circular) ternative inductive definition of list(s,e,a).We can singly-linked list.For example,rule in Fig.5(a)is a rule use the disjunction of right-hand sides without edges based on definition of list(s,e,a)in Fig.4. labeled with p&in rules in Figs.5(f)and 5(g)as the Rules adding or removing a condensation node right-hand side of the definition.Similar rules can be whose expression e equals zero.Rules in Figs.5(b),5(c given for list(s). are this kind of rules.For rule in Fig.5(b),a similar rule 2.3.2 Equivalent Transformation Rules for (Circular) is not included in Fig.5,in which the condensation node Doubly-Linked List interchanges its position with the structure node in the right-hand side window.Moreover,if we change the Basic Rules structure node into predicate node,null node or dan- Rules based on predicate definitions of (circular) gling node (remove its out-edge),the result is also a doubly-linked lists. basic rule of this kind. Rules adding or removing a condensation node Rules folding and unfolding a condensation node whose expression e equals zero(e.g.,rule in Fig.6(a)). whose expression e is greater than zero. Rule in Rules folding and unfolding a condensation node Fig.5(d)is one of this kind of rules.Similarly,we can whose expression e is greater than zero (e.g.,rule in give a rule by interchanging the positions of the struc- Fig.6(b)).Due to the symmetric status of labels l and ture node and the condensation node in the right-hand r,rules in Figs.6(a)and 6(b)can also be applied to side window. cases where the condensation node is pointed by edge Rules folding and unfolding an unconstrained con- r from the structure node. densation node.Rule in Fig.5(e)is one of this kind of Rules folding and unfolding a condensation node, rules.Rules similar to rule in Fig.5(e)can be given if which is a marginal node (head node or tail node)in the nt edge sourced from the condensation node points a doubly-linked list.They are slightly different from to other kinds of nodes. rules in Figs.6(a)and 6(b). Rules folding and unfolding predicate nodes.Rules Rules folding and unfolding an unconstrained con- in Figs.5(f)and 5(g)are two of this kind of rules. densation node (e.g.,rule in Fig.6(c)).Similar rules can Derived Rules. be given if edge r sourced from the condensation node One derived rule states an equivalent transforma- points to other kinds of nodes.or the condensation node tion between a condensation node with e1+e2,a1A a2 itself is a marginal node a→(e==0) a→(e>=l) (b (c) Fig.6.Part of the equivalent transformation rules for (circular)doubly-linked list
6 J. Comput. Sci. & Technol., Nov. 2013, Vol.28, No.6 Next we introduce the equivalent transformation rules of shape graphs. They are used to represent transformations between shape graphs preserving semantic equivalence. We give the proof after introducing semantics of shape graphs. 2.3.1 Equivalent Transformation Rules for (Circular) Singly-Linked List Basic Rules. • Rules based on predicate definitions of (circular) singly-linked list. For example, rule in Fig.5(a) is a rule based on definition of list(s, e, a) in Fig.4. • Rules adding or removing a condensation node whose expression e equals zero. Rules in Figs. 5(b), 5(c) are this kind of rules. For rule in Fig.5(b), a similar rule is not included in Fig.5, in which the condensation node interchanges its position with the structure node in the right-hand side window. Moreover, if we change the structure node into predicate node, null node or dangling node (remove its out-edge), the result is also a basic rule of this kind. • Rules folding and unfolding a condensation node whose expression e is greater than zero. Rule in Fig.5(d) is one of this kind of rules. Similarly, we can give a rule by interchanging the positions of the structure node and the condensation node in the right-hand side window. • Rules folding and unfolding an unconstrained condensation node. Rule in Fig.5(e) is one of this kind of rules. Rules similar to rule in Fig.5(e) can be given if the nt edge sourced from the condensation node points to other kinds of nodes. • Rules folding and unfolding predicate nodes. Rules in Figs. 5(f) and 5(g) are two of this kind of rules. Derived Rules. • One derived rule states an equivalent transformation between a condensation node with e1 + e2, a1 ∧ a2 below it and two adjacent condensation nodes (with e1, a1 and e2, a2 below respectively). Similarly, it is an equivalent transformation, from two adjacent unconstrained condensation nodes to one unconstrained condensation node. • Another derived rule is for an equivalent transformation between one structure node and one condensation node with 1, true below it. • Another kind of derived rule is related to an alternative inductive definition of list(s, e, a). We can use the disjunction of right-hand sides without edges labeled with pk in rules in Figs. 5(f) and 5(g) as the right-hand side of the definition. Similar rules can be given for list(s). 2.3.2 Equivalent Transformation Rules for (Circular) Doubly-Linked List Basic Rules. • Rules based on predicate definitions of (circular) doubly-linked lists. • Rules adding or removing a condensation node whose expression e equals zero (e.g., rule in Fig.6(a)). • Rules folding and unfolding a condensation node whose expression e is greater than zero (e.g., rule in Fig.6(b)). Due to the symmetric status of labels l and r, rules in Figs. 6(a) and 6(b) can also be applied to cases where the condensation node is pointed by edge r from the structure node. • Rules folding and unfolding a condensation node, which is a marginal node (head node or tail node) in a doubly-linked list. They are slightly different from rules in Figs. 6(a) and 6(b). • Rules folding and unfolding an unconstrained condensation node (e.g., rule in Fig.6(c)). Similar rules can be given if edge r sourced from the condensation node points to other kinds of nodes, or the condensation node itself is a marginal node. Fig.6. Part of the equivalent transformation rules for (circular) doubly-linked list
Zhao-Peng Li et al.:A Shape Graph Logic and A Shape System Derived Rules. 1)From equivalent transformation rules in the form Rules can be given in a way similar to derived rules of W+WI VW2,we can obtain two implication rules of(circular)singly-linked lists(except rules related to W1→V and W2→W. alternative inductive definitions). 2)From equivalent transformation rules in the form No other edges (except the two edges labeled with ofWi台W2 with side condition(e1==e2Aa1)→ l and r)point to the condensation node is an impor- a2)A((e2 ==e1Aa2)=a1),we can obtain two implica- tant principle in given definitions and rules for (circu- tion rules W→W2andW2→Wi with side condition lar)doubly-linked lists.The reason is that if we allow (e1==e2∧a1)→a2and(e2==e1Aa2)→a1 respec- other edges to point to condensation nodes,we do not tively. know which nodes they should point to in the unfolded 3)There are implication transformation rules for shape graph. shape graphs changing condensation nodes with e,a to unconstrained condensation nodes. 2.3.3 Equivalent Transformation Rules for Binary For equivalent transformation rules in the form Tree of WiW2,if there are condensation nodes with Besides the rules based on the predicate tree(s) e1,a and e2,a2 in Wi and W2 respectively and e there are rules for condensation nodes similar to those is less than e2 under constraint ai A a2,we can ob- of singly-linked lists.The difference is that there is one tain an implication rule W W3.Wi(W3)differs additional edge from each structure node or condensa- from Wi(W2)in that the former has only replaced the tion node to a tree predicate node. condensation node with an unconstrained condensation node and other nodes and edges are the same with the 2.3.4 Equivalent Transformation Rules for latter. Substitutions in e and a of Condensation Node For equivalent transformation rules in the form of Rules in Figs.7(a)and 7(b)are rules for condensa- Wi+W2,if there is a condensation node with e,a in tion nodes of(circular)singly-linked lists and(circular) W2 and no condensation node in Wi,we can obtain an doubly-linked lists respectively.Similar rules can be implication rule W=W3.W3 differs from W2 in the given for condensation nodes of binary trees or for con- above mentioned way. densation nodes as the marginal nodes in doubly-linked lists. 2.4 Semantics of Shape Graph For a programming language with dynamic mem- nt ory allocation such as C,nodes denote stack slots or (e1==e2人a1)→a2) A(e2==eAa2)→a) heap blocks(except for the null and dangling node), elal e2,a2 and edges represent values of corresponding pointers.A (a) shape graph without condensation nodes and predicate nodes is a graphical representation of machine state. (e1==e2a1)→a2) A general shape graph is a graphical representation of A(e2==e1Aa2)→a1) machine state set. e1,a1 e2,a2 First,we define the semantics of nodes and edges. (b) In a machine with a stack and a heap,denotations (or meanings)of nodes and edges in shape graphs are given Fig.7.Part of the equivalent transformation rules for substitu- as follows. tions in condensation node. 1)A declaration node represents a declared pointer We can do transformations on shape graphs using whose name is the label of the out-edge of this declara- these equivalent transformation rules.For example,for tion node.Different declaration nodes denote distinct a proper context XI and rule Wi+W2,we can apply stack slots. this rule to shape graph X[Wi]to get X[W2]and vice 2)A structure node represents a structure variable versa.That is,XWi XW2.Similarly,we can created by calling malloc.The number of out-edges is get X[W]XWi]V X[W2]from corresponding rule the same with the number of the field pointers in the W÷WVW2. corresponding structure variable.Labels of these edges Next,implication transformation rules are intro- are corresponding names of field pointers.A structure duced.They are transformation rules for shape graphs node denotes a heap block whose cell number is its out- preserving implication in semantics. Corresponding degree and abstract addresses of cells are corresponding proof will be given in Subsection 2.4. labels
Zhao-Peng Li et al.: A Shape Graph Logic and A Shape System 7 Derived Rules. Rules can be given in a way similar to derived rules of (circular) singly-linked lists (except rules related to alternative inductive definitions). No other edges (except the two edges labeled with l and r) point to the condensation node is an important principle in given definitions and rules for (circular) doubly-linked lists. The reason is that if we allow other edges to point to condensation nodes, we do not know which nodes they should point to in the unfolded shape graph. 2.3.3 Equivalent Transformation Rules for Binary Tree Besides the rules based on the predicate tree(s), there are rules for condensation nodes similar to those of singly-linked lists. The difference is that there is one additional edge from each structure node or condensation node to a tree predicate node. 2.3.4 Equivalent Transformation Rules for Substitutions in e and a of Condensation Node Rules in Figs. 7(a) and 7(b) are rules for condensation nodes of (circular) singly-linked lists and (circular) doubly-linked lists respectively. Similar rules can be given for condensation nodes of binary trees or for condensation nodes as the marginal nodes in doubly-linked lists. Fig.7. Part of the equivalent transformation rules for substitutions in condensation node. We can do transformations on shape graphs using these equivalent transformation rules. For example, for a proper context X[] and rule W1 ⇔ W2, we can apply this rule to shape graph X[W1] to get X[W2] and vice versa. That is, X[W1] ⇔ X[W2]. Similarly, we can get X[W] ⇔ X[W1] ∨ X[W2] from corresponding rule W ⇔ W1 ∨ W2. Next, implication transformation rules are introduced. They are transformation rules for shape graphs preserving implication in semantics. Corresponding proof will be given in Subsection 2.4. 1) From equivalent transformation rules in the form of W ⇔ W1 ∨ W2, we can obtain two implication rules W1 ⇒ W and W2 ⇒ W. 2) From equivalent transformation rules in the form of W1 ⇔ W2 with side condition ((e1 == e2 ∧ a1) ⇒ a2)∧((e2 == e1∧a2) ⇒ a1), we can obtain two implication rules W1 ⇒ W2 and W2 ⇒ W1 with side condition (e1 == e2 ∧a1) ⇒ a2 and (e2 == e1 ∧a2) ⇒ a1 respectively. 3) There are implication transformation rules for shape graphs changing condensation nodes with e, a to unconstrained condensation nodes. • For equivalent transformation rules in the form of W1 ⇔ W2, if there are condensation nodes with e1, a1 and e2, a2 in W1 and W2 respectively and e1 is less than e2 under constraint a1 ∧ a2, we can obtain an implication rule W0 1 ⇒ W0 2 . W0 1 (W0 2 ) differs from W1(W2) in that the former has only replaced the condensation node with an unconstrained condensation node and other nodes and edges are the same with the latter. • For equivalent transformation rules in the form of W1 ⇔ W2, if there is a condensation node with e, a in W2 and no condensation node in W1, we can obtain an implication rule W1 ⇒ W0 2 . W0 2 differs from W2 in the above mentioned way. 2.4 Semantics of Shape Graph For a programming language with dynamic memory allocation such as C, nodes denote stack slots or heap blocks (except for the null and dangling node), and edges represent values of corresponding pointers. A shape graph without condensation nodes and predicate nodes is a graphical representation of machine state. A general shape graph is a graphical representation of machine state set. First, we define the semantics of nodes and edges. In a machine with a stack and a heap, denotations (or meanings) of nodes and edges in shape graphs are given as follows. 1) A declaration node represents a declared pointer whose name is the label of the out-edge of this declaration node. Different declaration nodes denote distinct stack slots. 2) A structure node represents a structure variable created by calling malloc. The number of out-edges is the same with the number of the field pointers in the corresponding structure variable. Labels of these edges are corresponding names of field pointers. A structure node denotes a heap block whose cell number is its outdegree and abstract addresses of cells are corresponding labels
J.Comput.Sci.Technol.,Nov.2013,Vol.28,No.6 Cells of other types have little relationship with our sf:Abs Value×FieldVar→Abs Value U{W,D}.The discussion on shapes,so they are not considered here domain of sd is the set of declared pointers.sd maps Unconstrained condensation nodes are regarded as con- declared pointers to their abstract values.sf maps the densation nodes with expression n and assertion n>0 abstract address of a heap block and a field pointer in the following part of this subsection. name to the abstract value of the field pointer (A.D 3)A condensation node with e,a below represents n are two special abstract values for null and dangling structure variables and denotes n separated heap blocks pointers).Next s or (sd,sf)is used to represent the (assume the value of e is n). machine state. 4)Null nodes and dangling nodes do not represent Under such a model,a shape graph without predi- any program object.So they do not denote anything cate nodes and condensation nodes is a graphic repre- of the machine. sentation of the machine state based on the semantics of 5)A predicate node represents several structure the nodes and edges.A generic shape graph is a graphic variables allocated dynamically.It denotes a number representation of a certain set of machine state. of separated heap blocks.Based on the rules of predi- Definition 4.The set of machine states SG rep- cate unfolding in Subsection 2.3,the relations among resented by shape graph G can be defined using the fol- these blocks are determined according to semantics of lowing rules: edges. 1)If there is no predicate node or condensation node Edges do not represent any program object.So they in G,then there is only one state in SG.G represents do not denote any locations in the machine memory. this state directly where all labels of out-edges of decla- They are used to represent the values of corresponding ration nodes and the values of corresponding out-edges pointers. form function sd and structure nodes plus the labels of 1)An edge pointing to a structure node represents their out-edges form function sf. the address of the heap block denoted by the node. 2)If there is a condensation node with e and a below 2)Edges pointing to null nodes are used to ex- it and e has k values (implied by a),under these k cases press that the values of corresponding pointers are the condensation node can be unfolded completely into null.Edges pointing to dangling nodes are used to ex- press that corresponding pointers are dangling point- shape graphs G1,...,Gk.Then,SG]=S[G1]U...U SIGkl.If a implies that e could equal 0,1,...,and the ers.Hence,it does not affect the meaning of the shape condensation node can be unfolded completely into the graph,whether multiple edges point to one null(dan shape graph Gn with n structure nodes (n20),then gling)node,or they point to different null (dangling) SIG]=SIGo]U SIG1]U...USIGn]U.... nodes. 3)For an edge pointing to a condensation node with 3)If there is a predicate node in G and its corre- e and a,it represents the address of the heap block sponding definition body is G'or G1 V G2,then SG] denoted by the first structure node (ordered by the di- =SIG']or SIG]SIGi]USIG2l. rection of edges)of the unfolded condensation node if Similarly,we can define the semantics of GVG2. e >0.Otherwise,if e =0,it should be determined The syntax for the access path in shape graphs is the after removing this node using corresponding transfor- same as that in PointerC in favor of discussing the rela- mation rules. tions between them.Clearly,only access paths from a 4)The denotation of edges pointing predicate nodes declaration node to the other node are considered.We can be determined after unfolding corresponding predi will introduce in the following two cases. cate nodes using transformation rules. Full representation of access paths.In this case, Next we present the semantics of shape graphs fo- the access path does not cross any condensation node, cusing on pointers by omitting variables of other types and the labels of the edges on the path should be listed and their corresponding memory cells.For non-null in turn to represent such an access path.For example, and non-dangling pointers,their values indicate the ab- if the labels are p,left,right in turn,the access path is stract addresses of the heap blocks pointed by them written as p->left->right. The addresses of heap blocks dynamically allocated Condensed representation of access paths.In this are different abstract values.In our abstract machine case,the access path crosses one or more condensation stack slots and heap cells are accessed via names of nodes.For instance,if the path includes an out-edge declared pointers and names of field pointers respec- labeled left of a condensation node which stands for n tively.Thus,the abstract state of the machine (the number of nodes,then (->left)"should appear in the machine state for short)can be represented by two access path,such as hd(->nrt)m and hd(->nrt)m->nrt functions:sd Dec Var Abs Value U{N,D}and in Fig.3(b)
8 J. Comput. Sci. & Technol., Nov. 2013, Vol.28, No.6 Cells of other types have little relationship with our discussion on shapes, so they are not considered here. Unconstrained condensation nodes are regarded as condensation nodes with expression n and assertion n > 0 in the following part of this subsection. 3) A condensation node with e, a below represents n structure variables and denotes n separated heap blocks (assume the value of e is n). 4) Null nodes and dangling nodes do not represent any program object. So they do not denote anything of the machine. 5) A predicate node represents several structure variables allocated dynamically. It denotes a number of separated heap blocks. Based on the rules of predicate unfolding in Subsection 2.3, the relations among these blocks are determined according to semantics of edges. Edges do not represent any program object. So they do not denote any locations in the machine memory. They are used to represent the values of corresponding pointers. 1) An edge pointing to a structure node represents the address of the heap block denoted by the node. 2) Edges pointing to null nodes are used to express that the values of corresponding pointers are null. Edges pointing to dangling nodes are used to express that corresponding pointers are dangling pointers. Hence, it does not affect the meaning of the shape graph, whether multiple edges point to one null (dangling) node, or they point to different null (dangling) nodes. 3) For an edge pointing to a condensation node with e and a, it represents the address of the heap block denoted by the first structure node (ordered by the direction of edges) of the unfolded condensation node if e > 0. Otherwise, if e = 0, it should be determined after removing this node using corresponding transformation rules. 4) The denotation of edges pointing predicate nodes can be determined after unfolding corresponding predicate nodes using transformation rules. Next we present the semantics of shape graphs focusing on pointers by omitting variables of other types and their corresponding memory cells. For non-null and non-dangling pointers, their values indicate the abstract addresses of the heap blocks pointed by them. The addresses of heap blocks dynamically allocated are different abstract values. In our abstract machine, stack slots and heap cells are accessed via names of declared pointers and names of field pointers respectively. Thus, the abstract state of the machine (the machine state for short) can be represented by two functions: sd : DecVar → AbsValue ∪ {N , D} and sf : AbsValue × FieldVar → AbsValue ∪ {N , D}. The domain of sd is the set of declared pointers. sd maps declared pointers to their abstract values. sf maps the abstract address of a heap block and a field pointer name to the abstract value of the field pointer (N , D are two special abstract values for null and dangling pointers). Next s or hsd, sf i is used to represent the machine state. Under such a model, a shape graph without predicate nodes and condensation nodes is a graphic representation of the machine state based on the semantics of the nodes and edges. A generic shape graph is a graphic representation of a certain set of machine state. Definition 4. The set of machine states SJGK represented by shape graph G can be defined using the following rules: 1) If there is no predicate node or condensation node in G, then there is only one state in SJGK. G represents this state directly where all labels of out-edges of declaration nodes and the values of corresponding out-edges form function sd and structure nodes plus the labels of their out-edges form function sf . 2) If there is a condensation node with e and a below it and e has k values (implied by a), under these k cases the condensation node can be unfolded completely into shape graphs G1, . . . , Gk. Then, SJGK = SJG1K ∪ · · · ∪ SJGkK. If a implies that e could equal 0, 1, . . ., and the condensation node can be unfolded completely into the shape graph Gn with n structure nodes (n > 0), then SJGK = SJG0K ∪ SJG1K ∪ · · · ∪ SJGnK ∪ · · ·. 3) If there is a predicate node in G and its corresponding definition body is G0 or G1 ∨ G2, then SJGK = SJG0 K or SJGK = SJG1K ∪ SJG2K. Similarly, we can define the semantics of G1 ∨ G2. The syntax for the access path in shape graphs is the same as that in PointerC in favor of discussing the relations between them. Clearly, only access paths from a declaration node to the other node are considered. We will introduce in the following two cases. • Full representation of access paths. In this case, the access path does not cross any condensation node, and the labels of the edges on the path should be listed in turn to represent such an access path. For example, if the labels are p, left, right in turn, the access path is written as p->left->right. • Condensed representation of access paths. In this case, the access path crosses one or more condensation nodes. For instance, if the path includes an out-edge labeled left of a condensation node which stands for n number of nodes, then (->left) n should appear in the access path, such as hd(->nxt) m and hd(->nxt) m->nxt in Fig.3(b)
Zhao-Peng Li et al.:A Shape Graph Logic and A Shape System A path in shape graphs represents a pointer related only if s卡AG.With respect to G台G1VG2,fom to the last edge,which is identical to the access path for any state s.SIG],sF AIG]if and only if sF A[G]or this pointer in programs.They are collectively called s F AG2.For rules of implication.this theorem still access paths later. holds if the equality between state sets is changed into c Shape graphs are graphic representation of equalities and if and only if is changed to the logical implication. and validity of pointers at given program points. For each transform rule,Theorem 2 can be proved Definition 5.The set of assertions AlG]repre- based on the construction rules of state sets in Defini- sented by shape graph G includes the following asser- tion 4.For instance,from the rule in Fig.5(e)we can tions about pointers:1)Pointers pointing to structure get GGVG2.Hence using rule 3 in Definition 4, nodes are valid;pointers pointing to the null/dangling SIG]is S[G1]U SIG2],where the states in S[Gi]and nodes are not valid.2)Pointers pointing to the same SIG2l satisfy A[Gi]and A[G2]respectively. structure or predicate node are equal;pointers pointing From Theorem 2,if shape graphs are regarded as to the same structure node of an unfolded condensation assertion sets,the transform rules of shape graphs are node are equal.3)The value of pointers pointing to the equivalence or implication rules of the assertion calcu- null node is NULL;pointers pointing to dangling nodes lus.These rules are valid with respect to our machine are dangling pointers.4)Pointers pointing to predicate model. nodes make corresponding predicates hold. Shape graphs,as assertions about pointers in data For instance,ptr ==hd(->nxt)m->nrt appears in structures,are part of program assertions.We use the assertion set represented by the shape graph of G1VG2V...VGk in disjunction normal form (DNF)to Fig.3(b).One can infer from the set of assertions that express different cases of data structure at a program whether two access paths of pointer types are equal or point.Symbols such as台and→can be regarded as not and whether two access paths are aliases or not. logical equivalence and implication connectives respec- Theorem 1.For any shape graph G,any state in tively. SIG]satisfies all assertions in AlGl.That is,Vs SIG.s卡AIG.And for any s'年SIG,fdom(sa)= 2.5 Relations Between Shape Graph and dom(sa)(s ESG)and the size of AbsValue and Field- Symbolic Assertion Var in sf Abs Value x FieldVar-AbsValue UIN,D As assertion,a shape graph can be calculated with is the same as those in sf,then s'AlG. symbolic assertions.When using rules extended from Proof.AccessPath is used to represent the set of ac- Hoare logic,Boolean expressions,such as u==NULL, cess paths of shape graph G.State is used to stand for u!=NULL,u==v and u!=v(where u and v are access S[G].Then GetAbsValue:AccessPath x State-Abs- paths of pointer types),will be used in conjunctions Value UfN,D),which maps access paths to their ab with shape graphs.So special rules are needed to deal stract values,can be defined as follows with this kind of conjunctions.Such symbolic assertion GetAbs Valuelu](sd,sf)= will be removed if it does not conflict with the shape graph and otherwise the whole assertion implies false. sd(u),if u is a declared variable; Fig.8(a)is one of this kind of rules. sf(GetAbs Valuelvl(sd,sf),nert), p if u is in the form of v->next. =NULL→false Aa P 。ea _e'aa' Obviously,the abstract value of an access path u (a) (b) (u E AccessPath)pointing to a node is the address of the heap block represented by the node or in the set Fig.8.Part of the rules for conjunctions of shape graphs and {N,D.If access paths u and v(u,v E AccessPath) symbolic assertions point to the same node in a shape graph,they have the If an assertion a'appears in the symbolic assertion same abstract value.Hence,if there is neither conden- and it affects e,a'should be added to corresponding sation node nor predicate node in G,Theorem 1 holds node constraint.One of this kind of rules is shown in Theorem 1 can be generalized to generic G which can Fig.8(b).If aA a'is false,the whole assertion is also be proved by induction using similar steps as shown in false. Definition 4. Note that names u,uk under the arrows represent Theorem 2.Using any transform rule of shape access paths of corresponding edges (names above the graphs,,G÷G'orG÷G1VG2 can be inferred,. edges are labels)in Fig.8(a).In the following the pa- S[G]=S[G]or SIG]=S[G1]U S[G2]holds.With re- per confirms to these conventions.Different require- spect to GG,for any state s:SIG,sAIG if and ments should be satisfied when applying rules with ac-
Zhao-Peng Li et al.: A Shape Graph Logic and A Shape System 9 A path in shape graphs represents a pointer related to the last edge, which is identical to the access path for this pointer in programs. They are collectively called access paths later. Shape graphs are graphic representation of equalities and validity of pointers at given program points. Definition 5. The set of assertions AJGK represented by shape graph G includes the following assertions about pointers: 1) Pointers pointing to structure nodes are valid; pointers pointing to the null/dangling nodes are not valid. 2) Pointers pointing to the same structure or predicate node are equal; pointers pointing to the same structure node of an unfolded condensation node are equal. 3) The value of pointers pointing to the null node is NULL; pointers pointing to dangling nodes are dangling pointers. 4) Pointers pointing to predicate nodes make corresponding predicates hold. For instance, ptr == hd(->nxt) m->nxt appears in the assertion set represented by the shape graph of Fig.3(b). One can infer from the set of assertions that whether two access paths of pointer types are equal or not and whether two access paths are aliases or not. Theorem 1. For any shape graph G, any state in SJGK satisfies all assertions in AJGK. That is, ∀s : SJGK. s ² AJGK. And for any s 0 ∈ S / JGK, if dom(s 0 d ) = dom(sd) (s ∈SJGK) and the size of AbsValue and FieldVar in s 0 f : AbsValue × FieldVar → AbsValue ∪ {N , D} is the same as those in sf , then s 0 2AJGK. Proof. AccessPath is used to represent the set of access paths of shape graph G. State is used to stand for SJGK. Then GetAbsValue:AccessPath × State → AbsValue ∪{N , D}, which maps access paths to their abstract values, can be defined as follows. GetAbsValueJuKhsd, sf i = sd(u), if u is a declared variable; sf (GetAbsValueJvKhsd, sf i, next), if u is in the form of v-> next. Obviously, the abstract value of an access path u (u ∈ AccessPath) pointing to a node is the address of the heap block represented by the node or in the set {N , D}. If access paths u and v (u, v ∈ AccessPath) point to the same node in a shape graph, they have the same abstract value. Hence, if there is neither condensation node nor predicate node in G, Theorem 1 holds. Theorem 1 can be generalized to generic G which can be proved by induction using similar steps as shown in Definition 4. Theorem 2. Using any transform rule of shape graphs, G ⇔ G0 or G ⇔ G1 ∨ G2 can be inferred, SJGK=SJG0 K or SJGK=SJG1K ∪ SJG2K holds. With respect to G ⇔ G0 , for any state s: SJGK, s²AJGK if and only if s ² AJG0 K. With respect to G ⇔ G1 ∨ G2, for any state s: SJGK, s ² AJGK if and only if s ² AJG1K or s ² AJG2K. For rules of implication, this theorem still holds if the equality between state sets is changed into ⊆ and if and only if is changed to the logical implication. For each transform rule, Theorem 2 can be proved based on the construction rules of state sets in Definition 4. For instance, from the rule in Fig.5(e) we can get G ⇔ G1 ∨ G2. Hence using rule 3 in Definition 4, SJGK is SJG1K ∪ SJG2K, where the states in SJG1K and SJG2K satisfy AJG1K and AJG2K respectively. From Theorem 2, if shape graphs are regarded as assertion sets, the transform rules of shape graphs are equivalence or implication rules of the assertion calculus. These rules are valid with respect to our machine model. Shape graphs, as assertions about pointers in data structures, are part of program assertions. We use G1 ∨G2 ∨· · ·∨Gk in disjunction normal form (DNF) to express different cases of data structure at a program point. Symbols such as ⇔ and ⇒ can be regarded as logical equivalence and implication connectives respectively. 2.5 Relations Between Shape Graph and Symbolic Assertion As assertion, a shape graph can be calculated with symbolic assertions. When using rules extended from Hoare logic, Boolean expressions, such as u==NULL, u!=NULL, u==v and u!=v (where u and v are access paths of pointer types), will be used in conjunctions with shape graphs. So special rules are needed to deal with this kind of conjunctions. Such symbolic assertion will be removed if it does not conflict with the shape graph and otherwise the whole assertion implies false. Fig.8(a) is one of this kind of rules. Fig.8. Part of the rules for conjunctions of shape graphs and symbolic assertions. If an assertion a 0 appears in the symbolic assertion and it affects e, a 0 should be added to corresponding node constraint. One of this kind of rules is shown in Fig.8(b). If a ∧ a 0 is false, the whole assertion is also false. Note that names u, uk under the arrows represent access paths of corresponding edges (names above the edges are labels) in Fig.8(a). In the following the paper confirms to these conventions. Different requirements should be satisfied when applying rules with ac-
10 J.Comput.Sci.Technol.,Nov.2013,Vol.28,No.6 cess paths below edges other than with labels above from a function inserting a node into a circular doubly- edges.We name the window W in this rule.If we want linked list.The type of the nodes is typedef struct to apply this rule to XW],there must exist an access nodef Node*;Node*r}Node.In Fig.9,p points to path u in X W]to the node of W. a node of the circular doubly-linked list,and s points These rules can be easily proven to be valid with to the node which should be inserted into the left side of respect to our machine model. the node pointed by q.Shape graphs and corresponding access path sets are given as assertions at some program 2.6 Relations Between Shape Graphs and points.In the assertions,n (n>0)is used to denote Access Path Sets the length of the list.And i(0r->I in which p and s->r->l are access paths in g is not included). a C-like program language)as assertions.Pointer logic Relations between shape graphs(nodes and directed can be regarded as a symbolic version of shape graph edges)with access path sets include the following as- logic.So in this subsection,we compare shape graphs pects: with access path sets in the pointer logic. 1)Every structure node nd corresponds to exact one We show the differences and relations using a quick access path set st.Directed edges pointing to the struc- example(see Fig.9).This example is a code fragment ture node nd have one-to-one correspondences with Shape Graphs as Assertions Access Path Sets as Assertions {p,p(->r)n,p->r->I}A k:1i-1.{p(->r) p(->r)k+1>A {g,p(->r),p(->r)+1>}A D k:i+1n-1.{p(->r) i-1.i>0 p(->r)k+1->号A{s}A n-i-1,i>0Ai>n s->r,s->DAi>OAn>i s->l=q->l;q->1->r=s /wNext shape graphs and access path sets of case where i>1 are given (case for i=1 is omitted)*/ {,9(->r)n-,p>r->l}A k:1i-2.{p->r), p(->r)k+1->A {s->,p->r)-1,g->A i-2,i>l n-i-1,i>OAi>n p(->r)',8}A {s->T}DA {9,q>>} k:i+1.n-1.{q(->r)k D q(->r)k-+1->}Ai>1An>i 8->r=q;q4r>1=8 fp,9(->r)m-i,p->r->1A k:1.i-1.{p(>r), p(>r)k+1>}A {s,pP(->T),g>}A {8->r,9,q->r->}A i-1.i>1 n-i-1,i>lAn>i h:i+1n-1.g(->r)k-, q(->r)-+1>l}Ai>1An>i s=NULL;q=NULL: /Next shape graphs and access path sets of all cases where are shown*/ {p,p->r)n+1,p->->A k:1.n,{p(->r) p->r)k+1->号A{g,8}NAn≥0 n.n>=0 Fig.9.Code fragment with assertions for function inserting a node into a circular doubly-linked list
10 J. Comput. Sci. & Technol., Nov. 2013, Vol.28, No.6 cess paths below edges other than with labels above edges. We name the window W in this rule. If we want to apply this rule to X[W], there must exist an access path u in X[W] to the node of W. These rules can be easily proven to be valid with respect to our machine model. 2.6 Relations Between Shape Graphs and Access Path Sets Prior to the design of shape graph logic, we have designed a pointer logic[8] using access path set (e.g., {p, s-> r-> l} in which p and s->r->l are access paths in a C-like program language) as assertions. Pointer logic can be regarded as a symbolic version of shape graph logic. So in this subsection, we compare shape graphs with access path sets in the pointer logic. We show the differences and relations using a quick example (see Fig.9). This example is a code fragment from a function inserting a node into a circular doublylinked list. The type of the nodes is typedef struct node{Node* l; Node* r; } Node. In Fig.9, p points to a node of the circular doubly-linked list, and s points to the node which should be inserted into the left side of the node pointed by q. Shape graphs and corresponding access path sets are given as assertions at some program points. In the assertions, n (n > 0) is used to denote the length of the list. And i (0 < i < n) is the number of nodes from the node pointed by p to the node pointed by q following the r link (the node pointed by q is not included). Relations between shape graphs (nodes and directed edges) with access path sets include the following aspects: 1) Every structure node nd corresponds to exact one access path set st. Directed edges pointing to the structure node nd have one-to-one correspondences with Fig.9. Code fragment with assertions for function inserting a node into a circular doubly-linked list