Modular Verification of Linearizability with Non-Fixed Linearization Points

Hongjin Liang    Xinyu Feng
University of Science and Technology of China
lhj1018@mail.ustc.edu.cn    xyfeng@ustc.edu.cn

Abstract

Locating linearization points (LPs) is an intuitive approach for proving linearizability, but it is difficult to apply the idea in Hoare-style logic for formal program verification, especially for verifying algorithms whose LPs cannot be statically located in the code. In this paper, we propose a program logic with a lightweight instrumentation mechanism which can verify algorithms with non-fixed LPs, including the most challenging ones that use the helping mechanism to achieve lock-freedom (as in HSY elimination-based stack), or have LPs depending on unpredictable future executions (as in the lazy set algorithm), or involve both features. We also develop a thread-local simulation as the meta-theory of our logic, and show it implies contextual refinement, which is equivalent to linearizability. Using our logic we have successfully verified various classic algorithms, some of which are used in the java.util.concurrent package.

Categories and Subject Descriptors D.2.4 [Software Engineering]: Software/Program Verification – Correctness proofs, Formal methods; F.3.1 [Logics and Meanings of Programs]: Specifying and Verifying and Reasoning about Programs

General Terms Theory, Verification

Keywords Concurrency; Rely-Guarantee Reasoning; Linearizability; Refinement; Simulation

1. Introduction

Linearizability is a standard correctness criterion for concurrent object implementations [16]. It requires the fine-grained implementation of an object operation to have the same effect as an instantaneous atomic operation. To prove linearizability, the most intuitive approach is to find a linearization point (LP) in the code of the implementation, and show that it is the single point where the effect of the operation takes place.
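The "single point where the effect takes place" can be made concrete with a toy model of our own (not from the paper): two threads increment a shared counter, each operation is split into primitive steps with a generator, and we enumerate all interleavings. With a cas at the LP, every interleaving behaves like some atomic order of the two increments; with a plain read-then-write, it does not. The names `interleave`, `inc_cas`, and `inc_plain` are ours, introduced only for this illustration.

```python
import itertools

def interleave(make_thread, schedule):
    """Run two copies of an operation under a given interleaving; return the final counter."""
    state = {"C": 0}
    threads = [make_thread(state), make_thread(state)]
    done = [False, False]

    def step(tid):
        if not done[tid]:
            try:
                next(threads[tid])
            except StopIteration:
                done[tid] = True

    for tid in schedule:
        step(tid)
    while not all(done):              # drain: let both operations run to completion
        for tid in (0, 1):
            step(tid)
    return state["C"]

def inc_cas(s):
    while True:
        t = s["C"]; yield             # read the shared counter
        if s["C"] == t:               # cas(&C, t, t+1): one atomic step,
            s["C"] = t + 1; yield     # the LP when it succeeds
            return
        yield                         # cas failed; retry

def inc_plain(s):
    t = s["C"]; yield                 # read and write as separate steps:
    s["C"] = t + 1; yield             # no single point where the effect occurs

scheds = list(itertools.product((0, 1), repeat=4))
print({interleave(inc_cas, s) for s in scheds})    # {2}: every interleaving looks atomic
print({interleave(inc_plain, s) for s in scheds})  # {1, 2}: a lost update is possible
```

The cas version is linearizable because each operation's effect is confined to its single successful cas; the plain version admits a final state (1) that no sequential order of two increments can produce.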
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
PLDI'13, June 16–19, 2013, Seattle, WA, USA.
Copyright © 2013 ACM 978-1-4503-2014-6/13/06...$15.00

However, it is difficult to apply this idea when the LPs are not fixed in the code of object methods. For a large class of lock-free algorithms with helping mechanism (e.g., HSY elimination-based stack [14]), the LP of one method might be in the code of some other method. In these algorithms, each thread maintains a descriptor recording all the information required to fulfill its intended operation. When a thread A detects conflicts with another thread B, A may access B's descriptor and help B finish its intended operation first before finishing its own. In this case, B's operation takes effect at a step from A. Thus its LP should not be in its own code, but in the code of thread A.

Besides, in optimistic algorithms and lazy algorithms (e.g., Heller et al.'s lazy set [13]), the LPs might depend on unpredictable future interleavings. In those algorithms, a thread may access the shared states as if no interference would occur, and validate the accesses later. If the validation succeeds, it finishes the operation; otherwise it rolls back and retries. Its LP is usually at a prior state access, but only if the later validation succeeds.

Reasoning about algorithms with non-fixed LPs has been a long-standing problem. Most existing work either supports only simple objects with static LPs in the implementation code (e.g., [2, 5, 19, 30]), or lacks formal soundness arguments (e.g., [32]).
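The optimistic read-validate-retry pattern described above can be sketched as follows. This is a minimal model of our own (not the paper's code), assuming a slot holds a (data, version) pair and writers bump the version atomically; the `interfere` hook only simulates a concurrent write at the worst moment.

```python
def write(m, i, d):
    """A writer updates the data and bumps the version; atomic in this model."""
    data, ver = m[i]
    m[i] = (d, ver + 1)

def read_validated(m, i, interfere=None):
    while True:
        d, v = m[i]               # optimistic read (the potential LP)
        if interfere:             # simulate a concurrent write right after the read
            interfere()
            interfere = None
        if m[i][1] == v:          # validation: version unchanged, so the
            return d              # read above was taken at a consistent state
        # validation failed: roll back and retry

m = {0: ("a", 0)}
print(read_validated(m, 0))                             # 'a' (no interference)
print(read_validated(m, 0, lambda: write(m, 0, "b")))   # 'b' (first attempt invalidated, retried)
```

Whether the first read "counts" as the operation's effect is only known once the validation runs; this is exactly why the LP of such code depends on the future interleaving.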
In this paper, we propose a program logic for verification of linearizability with non-fixed LPs. For a concrete implementation of an object method, we treat the corresponding abstract atomic operation and the abstract state as auxiliary states, and insert auxiliary commands at the LP to execute the abstract operation simultaneously with the concrete step. We verify the instrumented implementation in an existing concurrent program logic (we will use LRG [8] in this paper), but extend it with new logic rules for the auxiliary commands. We also give a new relational interpretation to the logic assertions, and show that at the LP, the step of the original concrete implementation has the same effect as the abstract operation. We handle non-fixed LPs in the following way:

• To support the helping mechanism, we collect a pending thread pool as auxiliary state, which is a set of threads and their abstract operations that might be helped. We allow the thread that is currently being verified to use auxiliary commands to help execute the abstract operations in the pending thread pool.

• For future-dependent LPs, we introduce a try-commit mechanism to reason with uncertainty. The try clause guesses whether the corresponding abstract operation should be executed and keeps all possibilities, while commit chooses a specific possible case when we know which guess is correct later.

Although our program logic looks intuitive, it is challenging to prove that the logic is sound w.r.t. linearizability. Recent work has shown the equivalence between linearizability and contextual refinement [5, 9, 10]. The latter is often verified by proving simulations between the concrete implementation and the atomic operation [5]. The simulation establishes some correspondence between the executions of the two sides, showing there exists one step in the concrete execution that fulfills the abstract operation.
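As a rough executable reading of the two mechanisms above (our own modelling, not the paper's formal semantics, which comes later), the auxiliary commands can be seen as operations on a pending thread pool `U` (mapping thread ids to abstract operations) and a set of speculative abstract configurations. All function names here are ours.

```python
def lin(U, theta, t):
    """lin(t): execute thread t's pending abstract operation from the pool U;
    this is how a helper thread linearizes someone else's operation."""
    op = U.pop(t)
    return op(theta)                  # new abstract state

def trylinself(specs, op):
    """Speculate at a potential LP: keep both the guess that the abstract
    operation has just executed and the guess that it has not."""
    return specs | {("end", op(th)) for (status, th) in specs if status == "todo"}

def commit(specs, p):
    """Commit to the branches consistent with what we learned later."""
    chosen = {s for s in specs if p(s)}
    assert chosen, "commit must select a feasible speculation"
    return chosen

# Helping: cid executes him's pending abstract POP at cid's own LP.
U = {"him": lambda stk: stk[1:]}      # him's pending abstract operation
theta = ("a", "b")                    # abstract stack
theta = lin(U, theta, "him")
print(theta, U)                       # ('b',) {}

# Speculation: a potential LP, then a validation resolves the uncertainty.
specs = {("todo", ("x",))}
specs = trylinself(specs, lambda th: th + ("y",))   # both guesses kept
specs = commit(specs, lambda s: s[0] == "end")      # validation succeeded
print(specs)                          # {('end', ('x', 'y'))}
```

The point of the sketch is only the bookkeeping discipline: helping removes an entry from the shared pool so no one linearizes the same operation twice, and try keeps every guess alive until commit discards the infeasible ones.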
Given the equivalence between linearizability and refinement, we would expect the simulations to justify the soundness of the LP method and to serve as the meta-theory of our logic. However, existing thread-local simulations do not support non-fixed LPs (except the recent work [31], which we will discuss in Sec. 7). We will explain the challenges in detail in Sec. 2.

Our work is inspired by the earlier work on linearizability verification, in particular the use of auxiliary code and states by Vafeiadis [32] and our previous work on thread-local simulation RGSim [19], but makes the following new contributions:

• We propose the first program logic that has a formal soundness proof for linearizability with non-fixed LPs. Our logic is built upon the unary program logic LRG [8], but we give a relational interpretation of assertions and rely/guarantee conditions. We also introduce new logic rules for auxiliary commands used specifically for linearizability proofs.

• We give a light instrumentation mechanism to relate concrete implementations with abstract operations. The systematic use of auxiliary states and commands makes it possible to execute the abstract operations synchronously with the concrete code. The try-commit clauses allow us to reason about future-dependent uncertainty without resorting to prophecy variables [1, 32], whose existing semantics (e.g., [1]) is unsuitable for Hoare-style verification.

• We design a novel thread-local simulation as the meta-theory for our logic. It generalizes RGSim [19] and other compositional reasoning of refinement (e.g., [5, 30]) with the support for non-fixed LPs.

• Instead of ensuring linearizability directly, the program logic and the simulation both establish contextual refinement, which we prove is equivalent to linearizability. A program logic for contextual refinement is interesting in its own right, since contextual refinement is also a widely accepted (and probably more natural) correctness criterion for library code.

• We successfully apply our logic to verify 12 well-known algorithms.
Some of them are used in the java.util.concurrent package, such as MS non-blocking queue [23] and Harris-Michael lock-free list [11, 22].

In the rest of this paper, we first analyze the challenges in the logic design and explain our approach informally in Sec. 2. Then we give the basic technical setting in Sec. 3, including a formal operational definition of linearizability. We present our program logic in Sec. 4, and the new simulation relation as the meta-theory in Sec. 5. In Sec. 6 we summarize all the algorithms we have verified and sketch the proofs of three representative algorithms. We discuss related work and conclude in Sec. 7.

2. Challenges and Our Approach

Below we start from a simple program logic for linearizability with fixed LPs, and extend it to support algorithms with non-fixed LPs. We also discuss the problems with the underlying meta-theory, which establishes the soundness of the logic w.r.t. linearizability.

2.1 Basic Logic for Fixed LPs

We first show a simple and intuitive logic which follows the LP approach. As a working example, Fig. 1(a) shows the implementation of push in Treiber stack [29] (let's first ignore the blue code at line 7'). The stack object is implemented as a linked list pointed to by S, and push(v) repeatedly tries to update S to point to the new node using compare-and-swap (cas) until it succeeds.

To verify linearizability, we first locate the LP in the code. The LP of push(v) is at the cas statement when it succeeds (line 7). That is, the successful cas can correspond to the abstract atomic PUSH(v) operation: Stk := v::Stk; and all the other concrete steps cannot.
Here we simply represent the abstract stack Stk as a sequence of values with "::" for concatenation. Then push(v) can be linearized at the successful cas since it is the single point where the operation takes effect.

1  push(int v) {
2    local x, t, b;
3    x := new node(v);
4    do {
5      t := S;
6      x.next := t;
7      ⟨ b := cas(&S, t, x);
7'     if (b) linself; ⟩
8    } while(!b);
9  }
        (a) Treiber Stack

1  push(int v) {
2    local p, him, q;
3    p := new thrdDescriptor(cid, PUSH, v);
4    while(true) {
5      if (tryPush(v)) return;
6      loc[cid] := p;
7      him := rand(); q := loc[him];
8      if (q != null && q.id = him && q.op = POP)
9        if (cas(&loc[cid], p, null)) {
10         ⟨ b := cas(&loc[him], q, p);
10'        if (b) { lin(cid); lin(him); } ⟩
11         if (b) return; }
12     ...
13 } }
        (b) HSY Elimination-Based Stack

1  readPair(int i, j) {
2    local a, b, v, w;
3    while(true) {
4      ⟨ a := m[i].d; v := m[i].v; ⟩
5      ⟨ b := m[j].d; w := m[j].v;
5'       trylinself; ⟩
6      if(v = m[i].v) {
6'       commit(cid ⇝ (end, (a, b)));
7        return (a, b); }
8  }}
9  write(int i, d) {
10 }
        (c) Pair Snapshot

Figure 1. LPs and Instrumented Auxiliary Commands

We can encode the above reasoning in an existing (unary) concurrent program logic, such as Rely-Guarantee reasoning [17] and CSL [24]. Inspired by Vafeiadis [32], we embed the abstract operation γ and the abstract state θ as auxiliary states on the concrete side, so the program state now becomes (σ, (γ, θ)), where σ is the original concrete state. Then we instrument the concrete implementation with an auxiliary command linself (shorthand for "linearize self") at the LP to update the auxiliary state. Intuitively, linself will execute the abstract operation γ over the abstract state θ, as described in the following operational semantics rule:

        (γ, θ) ⇝ (end, θ′)
    ─────────────────────────────────────────────
    (linself, (σ, (γ, θ))) −→ (skip, (σ, (end, θ′)))

Here ⇝ encodes the transition of γ at the abstract level, and end is a termination marker. We insert linself into the same atomic block with the concrete statement at the LP, such as line 7' in Fig. 1(a), so that the concrete and abstract sides are executed simultaneously. Here the atomic block ⟨C⟩ means C is executed atomically.
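The instrumented push can be mirrored in an executable sketch of our own (a model, not the paper's code): the program state pairs the concrete linked list σ with the auxiliary (γ, θ), and the cas and linself fire inside one atomic region, so the abstract Stk := v::Stk happens exactly at the LP. The class and field names are ours.

```python
class Node:
    def __init__(self, v, nxt=None):
        self.v, self.next = v, nxt

class State:
    def __init__(self):
        self.S = None        # sigma: concrete stack, a linked list headed by S
        self.Stk = []        # theta: abstract stack, a sequence of values
        self.gamma = {}      # tid -> status of its abstract operation

def push(st, v, tid):
    st.gamma[tid] = ("PUSH", v)        # method invoked: record the abstract op
    x = Node(v)
    while True:
        t = st.S
        x.next = t
        # --- atomic block: line 7 (cas) together with line 7' (linself) ---
        if st.S is t:                  # cas(&S, t, x) succeeds: the LP
            st.S = x                   # concrete effect
            st.Stk = [v] + st.Stk      # linself: Stk := v::Stk
            st.gamma[tid] = "end"      # abstract operation terminated
            return
        # --- end atomic block; cas failed, retry ---

def concrete_values(st):
    out, n = [], st.S
    while n:
        out.append(n.v)
        n = n.next
    return out

st = State()
push(st, 1, "t1")
push(st, 2, "t2")
print(concrete_values(st), st.Stk)     # [2, 1] [2, 1]
```

After every step the concrete list and the abstract sequence agree, which is precisely the invariant the relational assertions of the logic are meant to capture: the concrete step at the LP and the abstract operation have the same effect.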
Then we reason about the instrumented code using a traditional concurrent logic extended with a new inference rule for linself.

The idea is intuitive, but it cannot handle more advanced algorithms with non-fixed LPs, including the algorithms with the helping mechanism and those whose locations of LPs depend on the future interleavings. Below we analyze the two challenges in detail and explain our solutions using two representative algorithms, the HSY stack and the pair snapshot.

2.2 Support Helping Mechanism with Pending Thread Pool

HSY elimination-based stack [14] is a typical example using the helping mechanism. Figure 1(b) shows part of its push method implementation. The basic idea behind the algorithm is to let a push and a pop cancel out each other.

At the beginning of the method in Fig. 1(b), the thread allocates its thread descriptor (line 3), which contains the thread id, the name
of the operation to be performed,and the argument.The current thread cid first tries to perform Treiber stack's push (line 5).It returns if succeeds.Otherwise,it writes its descriptor in the global loc array (line 6)to allow other threads to eliminate its push.The elimination array loc1..n has one slot for each thread,which LP LPs oft and t' potential LP commit holds the pointer to a thread descriptor.The thread randomly reads a slot him in loc (line 7).If the descriptor q says him is doing (a)Simple Simulation (b)Pending Thread Pool (c)Speculation pop.cid tries to eliminate itself with him by two cas instructions. Figure 2.Simulation Diagrams The first clears cid's entry in loc so that no other thread could eliminate with cid (line 9).The second attempts to mark the entry of him in loc as "eliminated with cid"(line 10).If successful,it reads j's data,but only if the validation at line 6 succeeds.That is should be the LPs of both the push of cid and the pop of him,with whether we should linearize the operation at line 5 depends on the the push happening immediately before the pop. future unpredictable behavior of line 6. The helping mechanism allows the current thread to linearize As discussed a lot in previous work (e.g.,[1,32)),the future the operations of other threads,which cannot be expressed in the dependent LPs cannot be handled by introducing history variables, basic logic.It also breaks modularity and makes thread-local ver- which are auxiliary variables storing values or events in the past ex- ification difficult.For the thread cid,its concrete step could cor- ecutions.We have to refer to events coming from the unpredictable respond to the steps of both cid and him at the abstract level.For future.Thus people propose prophecy variables [1.32]as the dual him,a step from its environment could fulfill its abstract operation. 
of history variables to store future behaviors.But as far as we know We must ensure in the thread-local verification that the two threads there is no semantics of prophecy variables suitable for Hoare-style cid and him always take consistent views on whether and how the local and compositional reasoning. abstract operation of him is done.For example,if we let a concrete Instead of resorting to prophecy variables,we follow the specu- step in cid fulfill the abstract pop of him,we must know him is lation idea [31].For the concrete step at a potential LP(e.g..line 5 indeed doing pop and its pop has not been done before.Otherwise. of readPair),we execute the abstract operation speculatively and we will not be able to compose cid and him in parallel. keep both the result and the original abstract configuration.Later We extend the basic logic to express the helping mechanism. based on the result of the validation (e.g.,line 6 in readPair),we First we introduce a new auxiliary command lin(t)to linearize a keep the appropriate branch and discard the other. 
specific thread t.For instance,in Fig.1(b)we insert line 10'at For the logic,we introduce two new auxiliary commands: the LP to execute both the push of cid and the pop of him at the trylinself is to do speculation,and commit(p)will commit to the abstract level.We also extend the auxiliary state to record both appropriate branch satisfying the assertion p.In Fig.1(c),we insert abstract operations of cid and him.More generally,we embed lines 5,and 6,,where cid(end,(a,b))means that the cur- a pending thread pool U,which maps threads to their abstract rent thread cid should have done its abstract operation and would operations.It specifies a set of threads whose operations might return (a.b).We also extend the auxiliary state to record the mul- be helped by others.Then under the new state (o,(U,0)),the tiple possibilities of abstract operations and abstract states after semantics of lin(t)just executes the thread t's abstract operation speculation. in U,similarly to the semantics of linself discussed before. Furthermore,we can combine the speculation idea with the The shared pending thread pool U allows us to recover the pending thread pool.We allow the abstract operations in the pend- thread modularity when verifying the helping mechanism.A con- ing thread pool as well as the current thread to speculate.Then crete step of cid could fulfill the operation of him in U as well as we could handle some trickier algorithms such as RDCSS [12].in its own abstract operation;and conversely,the thread him running which the location of LP for thread t may be in the code of some in parallel could check U to know if its operation has been finished other thread and also depend on the future behaviors of that thread. by others (such as cid)or not.We gain consistent abstract infor- Please see Sec.6 for one such example. 
mation of other threads in the thread-local verification.Note that the need of 0 itself does not break modularity because the required 2.4 Simulation as Meta-Theory information of other threads'abstract operations can be inferred from the concrete state.In the HSY stack example,we know him The LP proof method can be understood as building simulations be- is doing pop by looking at its thread descriptor in the elimination tween the concrete implementations and the abstract atomic opera- array.In this case U can be viewed as an abstract representation of tions,such as the simple weak simulation in Fig.2(a).The lower- the elimination array. level and higher-level arrows are the steps of the implementation and of the abstract operation respectively,and the dashed lines de- note the simulation relation.We use dark nodes and white nodes 2.3 Try-Commit Commands for Future-Dependent LPs at the abstract level to distinguish whether the operation has been Another challenge is to reason about optimistic algorithms whose finished or not.The only step at the concrete side corresponding LPs depend on the future interleavings. to the single abstract step should be the LP of the implementation We give a toy example,pair snapshot [27],in Fig.1(c).The (labeled"LP"in the diagram).Since our program logic is based on object is an array m,each slot of which contains two fields:d for the LP method,we can expect simulations to justify its soundness. the data and v for the version number.The write(i,d)method In particular,we want a thread-local simulation which can handle (lines 9)updates the data stored at address i and increments the both the helping mechanism and future-dependent LPs and can en- version number instantaneously.The readPair(i,j)method in- sure linearizability. 
tends to perform an atomic read of two slots i and j in the presence To support helping in the simulation,we should allow the LP of concurrent writes.It reads the data at slots i and j separately at step at the concrete level to correspond to an abstract step made by lines 4 and 5,and validate the first read at line 6.If i's version num a thread other than the one being verified.This requires informa- ber has not been increased,the thread knows that when it read j's tion from other threads at the abstract side,thus makes it difficult data at line 5,i's data had not been updated.This means the two to build a thread-local simulation.To address the problem,we intro reads were at a consistent state,thus the thread can return.We can duce the pending thread pool at the abstract level of the simulation see that the LP of readPair should be at line 5 when the thread just as in the development of our logic in Sec.2.2.The new simula- 461
of the operation to be performed, and the argument. The current thread cid first tries to perform Treiber stack’s push (line 5). It returns if succeeds. Otherwise, it writes its descriptor in the global loc array (line 6) to allow other threads to eliminate its push. The elimination array loc[1..n] has one slot for each thread, which holds the pointer to a thread descriptor. The thread randomly reads a slot him in loc (line 7). If the descriptor q says him is doing pop, cid tries to eliminate itself with him by two cas instructions. The first clears cid’s entry in loc so that no other thread could eliminate with cid (line 9). The second attempts to mark the entry of him in loc as “eliminated with cid” (line 10). If successful, it should be the LPs of both the push of cid and the pop of him, with the push happening immediately before the pop. The helping mechanism allows the current thread to linearize the operations of other threads, which cannot be expressed in the basic logic. It also breaks modularity and makes thread-local verification difficult. For the thread cid, its concrete step could correspond to the steps of both cid and him at the abstract level. For him, a step from its environment could fulfill its abstract operation. We must ensure in the thread-local verification that the two threads cid and him always take consistent views on whether and how the abstract operation of him is done. For example, if we let a concrete step in cid fulfill the abstract pop of him, we must know him is indeed doing pop and its pop has not been done before. Otherwise, we will not be able to compose cid and him in parallel. We extend the basic logic to express the helping mechanism. First we introduce a new auxiliary command lin(t) to linearize a specific thread t. For instance, in Fig. 1(b) we insert line 10’ at the LP to execute both the push of cid and the pop of him at the abstract level. 
We also extend the auxiliary state to record both abstract operations of cid and him. More generally, we embed a pending thread pool U, which maps threads to their abstract operations. It specifies a set of threads whose operations might be helped by others. Then under the new state (σ, (U, θ)), the semantics of lin(t) just executes the thread t's abstract operation in U, similarly to the semantics of linself discussed before.

The shared pending thread pool U allows us to recover the thread modularity when verifying the helping mechanism. A concrete step of cid could fulfill the operation of him in U as well as its own abstract operation; and conversely, the thread him running in parallel could check U to know if its operation has been finished by others (such as cid) or not. We gain consistent abstract information of other threads in the thread-local verification. Note that the use of U itself does not break modularity, because the required information about other threads' abstract operations can be inferred from the concrete state. In the HSY stack example, we know him is doing pop by looking at its thread descriptor in the elimination array. In this case U can be viewed as an abstract representation of the elimination array.

2.3 Try-Commit Commands for Future-Dependent LPs

Another challenge is to reason about optimistic algorithms whose LPs depend on the future interleavings. We give a toy example, pair snapshot [27], in Fig. 1(c). The object is an array m, each slot of which contains two fields: d for the data and v for the version number. The write(i,d) method (line 9) updates the data stored at address i and increments the version number instantaneously. The readPair(i,j) method intends to perform an atomic read of two slots i and j in the presence of concurrent writes. It reads the data at slots i and j separately at lines 4 and 5, and validates the first read at line 6.
(a) Simple Simulation (b) Pending Thread Pool (c) Speculation
Figure 2. Simulation Diagrams

If i's version number has not been increased, the thread knows that when it read j's data at line 5, i's data had not been updated. This means the two reads were at a consistent state, thus the thread can return. We can see that the LP of readPair should be at line 5 when the thread reads j's data, but only if the validation at line 6 succeeds. That is, whether we should linearize the operation at line 5 depends on the future unpredictable behavior of line 6.

As discussed a lot in previous work (e.g., [1, 32]), future-dependent LPs cannot be handled by introducing history variables, which are auxiliary variables storing values or events in the past executions. We have to refer to events coming from the unpredictable future. Thus people propose prophecy variables [1, 32] as the dual of history variables to store future behaviors. But as far as we know, there is no semantics of prophecy variables suitable for Hoare-style local and compositional reasoning.

Instead of resorting to prophecy variables, we follow the speculation idea [31]. For the concrete step at a potential LP (e.g., line 5 of readPair), we execute the abstract operation speculatively and keep both the result and the original abstract configuration. Later, based on the result of the validation (e.g., line 6 in readPair), we keep the appropriate branch and discard the other.

For the logic, we introduce two new auxiliary commands: trylinself is to do speculation, and commit(p) will commit to the appropriate branch satisfying the assertion p. In Fig. 1(c), we insert lines 5' and 6', where cid ⇝ (end, (a, b)) means that the current thread cid should have done its abstract operation and would return (a, b). We also extend the auxiliary state to record the multiple possibilities of abstract operations and abstract states after speculation. Furthermore, we can combine the speculation idea with the pending thread pool.
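To make the try-commit idea concrete, here is a small Python sketch (our own illustrative model, with invented names) of how trylinself and commit(p) act on the speculation set for readPair: at line 5 we keep both the "already linearized" and "not yet linearized" branches, and the validation at line 6 commits to one of them:

```python
# delta is a set of speculations; each speculation records whether the
# current thread's abstract operation is still pending ('todo',) or done
# ('end', result).  Illustrative model only.

def abs_read_pair(m, i, j):
    # the abstract atomic operation: read both slots instantaneously
    return (m[i], m[j])

def trylinself(delta, m, i, j):
    """Speculate at a potential LP: keep the original configuration and
    add a branch where the abstract operation has been executed."""
    new = set(delta)
    for spec in delta:
        if spec == ('todo',):
            new.add(('end', abs_read_pair(m, i, j)))
    return new

def commit(delta, pred):
    """Keep only the speculations satisfying the assertion pred."""
    kept = {spec for spec in delta if pred(spec)}
    assert kept, "commit(p): no speculation satisfies p"
    return kept

m = {1: 'a', 2: 'b'}                       # data fields only, versions elided
delta = {('todo',)}
delta = trylinself(delta, m, 1, 2)          # line 5': both branches kept
# line 6': validation succeeded, so the LP was indeed at line 5
delta = commit(delta, lambda s: s[0] == 'end')
```

Had the validation failed, the commit predicate would instead keep the ('todo',) branch, and the thread would retry.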
We allow the abstract operations in the pending thread pool as well as the current thread to speculate. Then we could handle some trickier algorithms such as RDCSS [12], in which the location of the LP for thread t may be in the code of some other thread and also depend on the future behaviors of that thread. Please see Sec. 6 for one such example.

2.4 Simulation as Meta-Theory

The LP proof method can be understood as building simulations between the concrete implementations and the abstract atomic operations, such as the simple weak simulation in Fig. 2(a). The lower-level and higher-level arrows are the steps of the implementation and of the abstract operation respectively, and the dashed lines denote the simulation relation. We use dark nodes and white nodes at the abstract level to distinguish whether the operation has been finished or not. The only step at the concrete side corresponding to the single abstract step should be the LP of the implementation (labeled "LP" in the diagram).

Since our program logic is based on the LP method, we can expect simulations to justify its soundness. In particular, we want a thread-local simulation which can handle both the helping mechanism and future-dependent LPs and can ensure linearizability.

To support helping in the simulation, we should allow the LP step at the concrete level to correspond to an abstract step made by a thread other than the one being verified. This requires information from other threads at the abstract side, and thus makes it difficult to build a thread-local simulation. To address the problem, we introduce the pending thread pool at the abstract level of the simulation, just as in the development of our logic in Sec. 2.2. The new simulation is shown in Fig. 2(b). We can see that a concrete step of thread t could help linearize the operation of t' in the pending thread pool as well as its own operation. Thus the new simulation intuitively supports the helping mechanism.

As forward simulations, neither of the simulations in Fig. 2(a) and (b) supports future-dependent LPs. For each step along the concrete execution in those simulations, we need to decide immediately whether the step is at the LP, and cannot postpone the decision to the future. As discussed a lot in previous work (e.g., [1, 3, 6, 21]), we have to introduce backward simulations or hybrid simulations to support future-dependent LPs. Here we exploit the speculation idea and develop a forward-backward simulation [21]. As shown in Fig. 2(c), we keep both speculations after the potential LP, where the higher black nodes result from executing the abstract operation and the lower white nodes record the original abstract configuration. Then at the validation step we commit to the correct branch.

Finally, to ensure linearizability, the thread-local simulation has to be compositional. As a counterexample, we can construct a simple simulation (like the one in Fig. 2(a)) between the following implementation C and the abstract atomic increment operation γ, but C is not linearizable w.r.t. γ.

C : local t; t := x; x := t + 1;    γ : x++

The reason is that the simple simulation is not compositional w.r.t. parallel compositions. To address this problem, we proposed a compositional simulation RGSim [19] in previous work.

(MName) f ∈ String
(Expr) E ::= x | n | E + E | ...
(BExp) B ::= true | false | E = E | !B | ...
(Instr) c ::= x := E | x := [E] | [E] := E | print(E) | x := cons(E,...,E) | dispose(E) | ...
(Stmt) C ::= skip | c | x := f(E) | return E | noret | ⟨C⟩ | C; C | if (B) C else C | while (B){C}
(Prog) W ::= skip | let Π in C1 ‖ ... ‖ Cn
(ODecl) Π ::= {f1 ⇝ (x1, C1), ..., fn ⇝ (xn, Cn)}

Figure 3. Syntax of the Programming Language
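The non-linearizability of the counterexample C above can be seen operationally: two interleaved runs of the non-atomic increment can lose an update, whereas the atomic γ cannot. A quick Python illustration (ours, not from the paper):

```python
# Interleave two copies of "t := x; x := t + 1" so that both threads read
# x before either writes.  The lost update shows C is not linearizable
# w.r.t. the atomic increment gamma: x++ (which would always yield x = 2).

x = 0
t1 = x          # thread 1 executes t := x
t2 = x          # thread 2 executes t := x (before thread 1's write)
x = t1 + 1      # thread 1 executes x := t + 1
x = t2 + 1      # thread 2 overwrites: thread 1's update is lost
print(x)        # 1, not 2
```

Under the atomic specification no interleaving can produce this final state, which is exactly why the simple simulation fails to be preserved under parallel composition.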
The idea is to parameterize the simple simulation with the interference with the environment, in the form of rely/guarantee conditions (R and G) [17]. RGSim says, the concrete executions are simulated by the abstract executions under interference from the environment R, and all the related state transitions of the thread being verified should satisfy G. For parallel composition, we check that the guarantee G of each thread is permitted in the rely R of the other. Then the simulation becomes compositional and can ensure linearizability.

We combine the above ideas and develop a new compositional simulation with the support of non-fixed LPs as the meta-theory of our logic. We will discuss our simulation formally in Sec. 5.

3. Basic Technical Settings and Linearizability

In this section, we formalize linearizability of an object implementation w.r.t. its specification, and show that linearizability is equivalent to contextual refinement.

3.1 Language and Semantics

As shown in Fig. 3, a program W contains several client threads in parallel, each of which could call the methods declared in the object Π. A method is defined as a pair (x, C), where x is the formal argument and C is the method body. For simplicity, we assume there is only one object in W and each method takes one argument only, but it is easy to extend our work with multiple objects and arguments.

(ThrdID) t ∈ Nat
(Mem) σ ∈ (PVar ∪ Nat) ⇀ Int
(CallStk) κ ::= (σl, x, C) | ◦
(ThrdPool) K ::= {t1 ⇝ κ1, ..., tn ⇝ κn}
(PState) S ::= (σc, σo, K)
(LState) s ::= (σc, σo, κ)
(Evt) e ::= (t, f, n) | (t, ok, n) | (t, obj, abort) | (t, out, n) | (t, clt, abort)
(ETrace) H ::= ϵ | e::H

Figure 4. States and Event Traces

We use a runtime command noret to abort methods that terminate but do not execute return E. It is automatically appended to the method code and is not supposed to be used by programmers. Other commands are mostly standard. Clients can use print(E) to produce observable external events.
We do not allow the object's methods to produce external events. To simplify the semantics, we also assume there are no nested method calls.

Figure 4 gives the model of program states. Here we partition a global state S into the client memory σc, the object σo and a thread pool K. A client can only access the client memory σc, unless it calls object methods. The thread pool maps each thread id t to its local call stack frame. A call stack κ could be either empty (◦) when the thread is not executing a method, or a triple (σl, x, C), where σl maps the method's formal argument and local variables (if any) to their values, x is the caller's variable to receive the return value, and C is the caller's remaining code to be executed after the method returns. To give a thread-local semantics, we also define the thread-local view s of the state.

Figure 5 gives selected rules of the operational semantics. We show three kinds of transitions: −→ for the top-level program transitions, −→t,Π for the transitions of thread t with the methods' declaration Π, and −t for the steps inside method calls of thread t. To describe the operational semantics for threads, we use an execution context E:

(ExecContext) E ::= [ ] | E; C

The hole [ ] shows the place where the execution of code occurs. E[C] represents the code resulting from placing C into the hole.

We label transitions with events e defined in Fig. 4. An event could be a method invocation (t, f, n) or return (t, ok, n), a fault (t, obj, abort) produced by the object method code, an output (t, out, n) generated by print(E), or a fault (t, clt, abort) from the client code. The first two events are called object events, and the last two are observable external events. The third one (t, obj, abort) belongs to both classes. An event trace H is then defined as a finite sequence of events.
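The classification of events into object events and observable external events can be sketched directly. The helper names below are our own; the paper only gives the classification itself:

```python
# Events as tuples, following Fig. 4: (t, f, n) invocation, (t, 'ok', n)
# return, (t, 'obj', 'abort') object fault, (t, 'out', n) output, and
# (t, 'clt', 'abort') client fault.

def is_object_event(e):
    t, kind, n = e
    # invocations carry a method name, returns carry 'ok';
    # the object fault (t, 'obj', 'abort') belongs to both classes
    return kind not in ('out', 'clt')

def is_observable_event(e):
    t, kind, n = e
    return kind in ('out', 'clt') or (kind == 'obj' and n == 'abort')

trace = [(1, 'push', 7), (1, 'ok', 0), (2, 'out', 42), (2, 'clt', 'abort')]
history = [e for e in trace if is_object_event(e)]         # cf. histories H
observable = [e for e in trace if is_observable_event(e)]  # cf. observable traces
```

Projecting a trace through these two filters corresponds to the two trace sets (histories vs. observable traces) used in Sec. 3.2 and Sec. 3.3.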
3.2 Object Specification and Linearizability

Next we formalize the object specification Γ, which maps method names to their abstract operations γ, as shown in Fig. 6. γ transforms an argument value and an initial abstract object to a return value with a resulting abstract object in a single step. It specifies the intended sequential behaviors of the method. The abstract object representation θ is defined as a mapping from program variables to abstract values. We leave the abstract values unspecified here, which can be instantiated by programmers.

Then we give an abstract version of programs W, where clients interact with the abstract object specification Γ instead of its implementation Π. The semantics is almost the same as the concrete language shown in Sec. 3.1, except that the abstract atomic operation γ is executed when the method is called, which now operates over the abstract object θ instead of over the concrete one σo. The
(Ci, (σc, σo, K(i))) e −→i,Π (C i, (σ c, σ o, κ )) (let Π in C1 ...Ci ...Cn, (σc, σo, K)) e −→ (let Π in C1 ...C i ...Cn, (σ c, σ o, K{i κ })) (a) Program Transitions Π(f)=(y, C) Eσc = n x ∈ dom(σc) κ = ({y n}, x, E[skip ]) (E[ x := f(E) ], (σc, σo, ◦)) (t,f,n) −−−−→t,Π (C; noret, (σc, σo, κ)) f ∈ dom(Π) or Eσc undefined or x ∈ dom(σc) (E[ x := f(E) ], (σc, σo, ◦)) (t,clt,abort) −−−−−−−→t,Π abort κ = (σl, x, C) Eσoσl = n σ c = σc{x n} (E[ return E ], (σc, σo, κ)) (t,ok,n) −−−−−→t,Π (C, (σ c, σo, ◦)) Eσc = n (E[ print(E) ], (σc, σo, ◦)) (t,out,n) −−−−−→t,Π (E[skip ], (σc, σo, ◦)) (noret, s) (t,obj,abort) −−−−−−−→t,Π abort (C, σo σl) −t (C , σ o σ l) dom(σl) = dom(σ l) (C, (σc, σo, (σl, x, Cc))) −→t,Π (C , (σc, σ o, (σ l, x, Cc))) (b) Thread Transitions Ej σ undefined (1 ≤ j ≤ i) or x ∈ dom(σ) (E[ x := cons(E1,...,Ei) ], σ) −t abort (C, σ) −∗ t (skip, σ ) (E[ C ], σ) −t (E[skip ], σ ) (C, σ) −∗ t abort (E[ C ], σ) −t abort (c) Thread Transitions Inside Method Calls Figure 5. Selected Rules of Concrete Operational Semantics (AbsObj) θ ∈ PVar AbsVal (MSpec) γ ∈ Int → AbsObj Int × AbsObj (OSpec) Γ ::= {f1 γ1,...,fn γn} (AbsProg) W ::= skip | with Γ do C ...C Figure 6. Object Specification and Abstract Program abstract operation generates a pair of invocation and return events atomically. Due to space limit, we give the semantics in TR [18]. Linearizability. Linearizability [16] is defined using the notion of histories, which are special event traces H consisting of only object events (i.e., invocations, returns and object faults). Below we use H(i) for the i-th event of H, and |H| for the length of H. H|t represents the sub-history consisting of all the events whose thread id is t. The predicates is inv(e) and is res(e) mean that the event e is a method invocation and a response (i.e., a return or an object fault) respectively. We say a response e2 matches an invocation e1 iff they have the same thread id. 
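These history notations translate directly into code. A small Python sketch (helper names ours) of the sub-history H|t, is_inv, is_res, and response matching, modeling an event as (t, kind, n) as in Fig. 4:

```python
# A history is a list of object events (t, kind, n); kind is a method
# name for invocations, 'ok' for returns, or 'obj' for object faults.

def sub_history(H, t):
    """H|t: the sub-history of all events of thread t."""
    return [e for e in H if e[0] == t]

def is_inv(e):
    return e[1] not in ('ok', 'obj')

def is_res(e):
    return e[1] == 'ok' or (e[1] == 'obj' and e[2] == 'abort')

def matches(e2, e1):
    """A response e2 matches an invocation e1 iff same thread id."""
    return is_res(e2) and is_inv(e1) and e2[0] == e1[0]

H = [(1, 'push', 7), (2, 'pop', 0), (1, 'ok', 0), (2, 'ok', 7)]
assert sub_history(H, 1) == [(1, 'push', 7), (1, 'ok', 0)]
assert matches((1, 'ok', 0), (1, 'push', 7))
```

Note that in this example history both H|1 and H|2 are sequential, so H is well-formed in the sense defined next.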
A history H is sequential iff the first event of H is an invocation, and each invocation, except possibly the last, is immediately followed by a matching response. Then H is well-formed iff, for all t, H|t is sequential. H is complete iff it is well-formed and every invocation has a matching response. An invocation is pending if no matching response follows it. We handle pending invocations in an incomplete history H following the standard linearizability definition [16]: we append zero or more response events to H, and drop the remaining pending invocations. Then we get a set of complete histories, which is denoted by completions(H). We give formal definitions of the above concepts in TR [18].

Definition 1 (Linearizable Histories). H ≼lin H' iff

1. ∀t. H|t = H'|t;
2. there exists a bijection π : {1, ..., |H|} → {1, ..., |H'|} such that ∀i. H(i) = H'(π(i)) and ∀i, j. i < j ∧ is_res(H(i)) ∧ is_inv(H(j)) =⇒ π(i) < π(j).

That is, H is linearizable w.r.t. H' if the latter is a permutation of the former, preserving the order of events in the same threads and the order of the non-overlapping method calls.

We use H⟦W, (σc, σo)⟧ to represent the set of histories produced by the executions of W with the initial client memory σc, the object σo, and empty call stacks for all threads, and use H⟦W, (σc, θ)⟧ to generate histories from the abstract program W with the initial client memory σc and the abstract object θ. A legal sequential history H is a history generated by any client using the specification Γ and an initial abstract object θ:

Γ(θ, H) def= ∃n, C1, ..., Cn, σc. H ∈ H⟦(with Γ do C1 ‖ ... ‖ Cn), (σc, θ)⟧

Then an object is linearizable iff all its completed concurrent histories are linearizable w.r.t. some legal sequential histories.

Definition 2 (Linearizability of Objects). The object's implementation Π is linearizable w.r.t. its specification Γ under a refinement mapping ϕ, denoted by Π ≤ϕ Γ, iff

∀n, C1, ..., Cn, σc, σo, θ, H.
H ∈ H⟦(let Π in C1 ‖ ... ‖ Cn), (σc, σo)⟧ ∧ (ϕ(σo) = θ) =⇒ ∃Hc, H'. Hc ∈ completions(H) ∧ Γ(θ, H') ∧ Hc ≼lin H'

Here the mapping ϕ relates concrete objects to abstract ones:

(RefMap) ϕ ∈ Mem ⇀ AbsObj

The side condition ϕ(σo) = θ in the above definition requires the initial concrete object σo to be a well-formed data structure representing a valid object θ.

3.3 Contextual Refinement and Linearizability

Next we define contextual refinement between the concrete object and its specification, and prove its equivalence to linearizability. This equivalence will serve as the basis of our logic soundness w.r.t. linearizability.

Informally, the contextual refinement says, for any set of client threads, the program W has no more observable behaviors than the corresponding abstract program W. Below we use O⟦W, (σc, σo)⟧ to represent the set of observable event traces generated during the executions of W with the initial state (σc, σo) (and empty stacks). It is defined similarly as H⟦W, (σc, σo)⟧, but now the traces consist of observable events only (output events, client faults or object faults). The observable event traces O⟦W, (σc, θ)⟧ generated by the abstract program are defined similarly.
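Definition 1 is directly executable for small histories: condition 1 forces the bijection π to map the k-th event of each thread in H to the k-th event of that thread in H', so π can be constructed deterministically and the real-time-order condition checked afterwards. A brute-force Python sketch (ours), under the simplifying assumptions that H is already complete and that object faults are omitted:

```python
# Check H <=_lin H' per Definition 1, for complete histories.
# Events are (t, kind, n); kind 'ok' marks responses, anything else
# an invocation (object faults omitted for brevity).

def is_res(e): return e[1] == 'ok'
def is_inv(e): return not is_res(e)

def linearizable_hist(H, Hp):
    threads = {e[0] for e in H} | {e[0] for e in Hp}
    # condition 1: identical per-thread sub-histories
    for t in threads:
        if [e for e in H if e[0] == t] != [e for e in Hp if e[0] == t]:
            return False
    # condition 1 forces pi: the k-th event of thread t in H maps to
    # the k-th event of thread t in Hp
    pos, count = {}, {}
    for idx, e in enumerate(Hp):
        k = count.get(e[0], 0)
        pos[(e[0], k)] = idx
        count[e[0]] = k + 1
    pi, seen = [], {}
    for e in H:
        k = seen.get(e[0], 0)
        pi.append(pos[(e[0], k)])
        seen[e[0]] = k + 1
    # condition 2: a response before an invocation must stay ordered
    n = len(H)
    return all(pi[i] < pi[j]
               for i in range(n) for j in range(i + 1, n)
               if is_res(H[i]) and is_inv(H[j]))

# Overlapping push/pop may legally linearize with the pop first:
H  = [(1, 'push', 7), (2, 'pop', 0), (2, 'ok', -1), (1, 'ok', 0)]
Hp = [(2, 'pop', 0), (2, 'ok', -1), (1, 'push', 7), (1, 'ok', 0)]
```

With these two histories, linearizable_hist(H, Hp) holds because the calls overlap; if the push had returned before the pop was invoked, reordering them would violate condition 2.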
(InsStmt) C ::= skip | c | return E | noret | linself | lin(E) | trylinself | trylin(E) | commit(p) | ⟨C⟩ | C; C | if (B) C else C | while (B){C}
(RelState) Σ ::= (σ, Δ)
(SpecSet) Δ ::= {(U1, θ1), ..., (Un, θn)}
(PendThrds) U ::= {t1 ↝ Υ1, ..., tn ↝ Υn}
(AbsOp) Υ ::= (γ, n) | (end, n)
(RelAss) p, q, I ::= true | false | E1 = E2 | emp | E1 ↦ E2 | x ⇒ E | E1 ↪ (γ, E2) | E1 ↪ (end, E2) | p ∗ q | p ⊕ q | p ∨ q | ...
(RelAct) R, G ::= p ⋉ q | [p] | R ∗ R | R ⊕ R | ...

Figure 7. Instrumented Code and Relational State Model

• def= {(∅, ∅)}, where • ∈ SpecSet
f⊥g def= dom(f) ∩ dom(g) = ∅
Δ1⊥Δ2 def= ∀(U1, θ1) ∈ Δ1, (U2, θ2) ∈ Δ2. U1⊥U2 ∧ θ1⊥θ2
Δ1 ∗ Δ2 def= {(U1 ⊎ U2, θ1 ⊎ θ2) | (U1, θ1) ∈ Δ1 ∧ (U2, θ2) ∈ Δ2}
Σ1 ∗ Σ2 def= (σ1 ⊎ σ2, Δ1 ∗ Δ2), where Σ1 = (σ1, Δ1), Σ2 = (σ2, Δ2), σ1⊥σ2 and Δ1⊥Δ2
Σ1 ⊕ Σ2 def= (σ, Δ1 ∪ Δ2) if Σ1 = (σ, Δ1) and Σ2 = (σ, Δ2); undefined otherwise
{{E}}σ def= ⟦E⟧σ if dom(σ) = fv(E); undefined otherwise

(σ, Δ) |= E1 = E2 iff {{E1 = E2}}σ = true ∧ Δ = •
(σ, Δ) |= emp iff σ = ∅ ∧ Δ = •
(σ, Δ) |= E1 ↦ E2 iff ∃l, n, σ′. {{(E1, E2)}}σ′ = (l, n) ∧ σ = σ′ ⊎ {l ↝ n} ∧ Δ = •
(σ, Δ) |= x ⇒ E iff ∃n, θ. {{E}}σ = n ∧ θ = {x ↝ n} ∧ Δ = {(∅, θ)}
(σ, Δ) |= E1 ↪ (γ, E2) iff ∃σ1, σ2, t, n. σ = σ1 ⊎ σ2 ∧ {{E1}}σ1 = t ∧ {{E2}}σ2 = n ∧ Δ = {({t ↝ (γ, n)}, ∅)}
(σ, Δ) |= E1 ↪ (end, E2) iff ∃σ1, σ2, t, n. σ = σ1 ⊎ σ2 ∧ {{E1}}σ1 = t ∧ {{E2}}σ2 = n ∧ Δ = {({t ↝ (end, n)}, ∅)}
Σ |= p ∗ q iff ∃Σ1, Σ2. Σ = Σ1 ∗ Σ2 ∧ Σ1 |= p ∧ Σ2 |= q
Σ |= p ⊕ q iff ∃Σ1, Σ2. Σ = Σ1 ⊕ Σ2 ∧ Σ1 |= p ∧ Σ2 |= q
SpecExact(p) iff ∀Δ, Δ′. ((_, Δ) |= p) ∧ ((_, Δ′) |= p) =⇒ (Δ = Δ′)

Figure 8. Semantics of State Assertions

Definition 3 (Contextual Refinement). Π ⊑φ Γ iff
∀n, C1, ..., Cn, σc, σo, θ. φ(σo) = θ =⇒ O⟦(let Π in C1 ∥ ... ∥ Cn), (σc, σo)⟧ ⊆ O⟦(with Γ do C1 ∥ ... ∥ Cn), (σc, θ)⟧.

Following Filipović et al. [9], we can prove that linearizability is equivalent to contextual refinement. We give the proofs in TR [18].

Theorem 4 (Equivalence). Π ≼φ Γ ⇐⇒ Π ⊑φ Γ.
4. A Relational Rely-Guarantee Style Logic

To prove object linearizability, we first instrument the object implementation by introducing auxiliary states and auxiliary commands, which relate the concrete code with the abstract object and operations. Our program logic extends LRG [8] with a relational interpretation of assertions and new rules for auxiliary commands.

(Σ, Σ′) |= p ⋉ q iff Σ |= p ∧ Σ′ |= q
(Σ, Σ′) |= [p] iff Σ |= p ∧ Σ = Σ′
(Σ, Σ′) |= R1 ∗ R2 iff ∃Σ1, Σ2, Σ′1, Σ′2. (Σ = Σ1 ∗ Σ2) ∧ (Σ′ = Σ′1 ∗ Σ′2) ∧ (Σ1, Σ′1) |= R1 ∧ (Σ2, Σ′2) |= R2
(Σ, Σ′) |= R1 ⊕ R2 iff ∃Σ1, Σ2, Σ′1, Σ′2. (Σ = Σ1 ⊕ Σ2) ∧ (Σ′ = Σ′1 ⊕ Σ′2) ∧ (Σ1, Σ′1) |= R1 ∧ (Σ2, Σ′2) |= R2

Id def= [true]        True def= true ⋉ true

(∅, n) —γ→ (∅, n′)

γ(n)(θ) = (n′, θ′)   (Δ, n) —γ→ (Δ′, n′)
────────────────────────────────────────
({(U, θ)} ⊎ Δ, n) —γ→ ({(U, θ′)} ⊎ Δ′, n′)

[E1, p] γ [E2, q] iff ∀σ, Δ, n. (σ, Δ) |= (E1 = n) ∗ p =⇒ ∃Δ′, n′. (Δ, n) —γ→ (Δ′, n′) ∧ ((σ, Δ′) |= (E2 = n′) ∗ q)

I ▷ R iff ([I] ⇒ R) ∧ (R ⇒ I ⋉ I) ∧ Precise(I)

Figure 9. Semantics of Actions

Although our logic is based on LRG [8], this approach is mostly independent of the base logic. Similar extensions can also be made over other logics, such as RGSep [32]. Our logic is proposed to verify object methods only. Verified object methods are guaranteed to be a contextual refinement of their abstract atomic operations, which ensures linearizability of the object. We discuss verification of whole programs consisting of both client code and object code at the end of Sec. 4.3.

4.1 Instrumented Code and States

In Fig. 7 we show the syntax of the instrumented code and its state model. As explained in Sec. 2, program states Σ for the object method executions now consist of two parts, the physical object states σ and the auxiliary data Δ. Δ is a nonempty set of (U, θ) pairs, each pair representing a speculation of the situation at the abstract level. Here θ is the current abstract object, and U is a pending thread pool recording the remaining operation to be fulfilled by each thread.
It maps a thread id to its remaining abstract operation, which is either (γ, n) (the operation γ needs to be executed with argument n) or (end, n) (the operation has been finished with the return value n). We assume Δ is always domain-exact, defined as follows:

DomExact(Δ) def= ∀U, θ, U′, θ′. (U, θ) ∈ Δ ∧ (U′, θ′) ∈ Δ =⇒ dom(U) = dom(U′) ∧ dom(θ) = dom(θ′).

It says, all the speculations in Δ should describe the same set of threads and the same domain of abstract objects. Any Δ containing a single speculation is domain-exact. Also, domain-exactness is preserved under the step of any command in our instrumented language, thus it is reasonable to assume it always holds.

Below we informally explain the effects over Δ of the newly introduced commands. We leave their formal semantics to Sec. 4.4. The auxiliary command linself executes the unfinished abstract operation of the current thread in every U in Δ, and changes the abstract object θ correspondingly. lin(E) executes the abstract operation of the thread with id E. linself or lin(E) is executed when we know for sure that a step is the linearization point. The trylinself command introduces uncertainty. Since we do not know if the abstract operation of the current thread is fulfilled or not at the current point, we consider both possibilities. For each (U, θ) pair in Δ that contains the unfinished abstract operation of the current thread, we add in Δ a new speculation (U′, θ′) where the abstract operation is done and θ′ is the resulting abstract object. Since the original (U, θ) is also kept, we have both speculations in Δ. Similarly, the
[E1, p] γ [E2, q]
─────────────────────────────────────────────
⊢t {t ↪ (γ, E1) ∗ p} linself {t ↪ (end, E2) ∗ q}   (LINSELF)

⊢t {t ↪ (end, E)} linself {t ↪ (end, E)}   (LINSELF-END)

[E1, p] γ [E2, q]
─────────────────────────────────────────────
⊢t {E ↪ (γ, E1) ∗ p} trylin(E) {(E ↪ (γ, E1) ∗ p) ⊕ (E ↪ (end, E2) ∗ q)}   (TRY)

⊢t {E ↪ (end, E′)} trylin(E) {E ↪ (end, E′)}   (TRY-END)

SpecExact(p)   p ⇒ p′
────────────────────────────
⊢t {p ⊕ true} commit(p) {p′}   (COMMIT)

⊢t {t ↪ (end, E)} E[return E] {t ↪ (end, E)}   (RET)

⊢t {p} C {q}
──────────────────
⊢t {p ∗ r} C {q ∗ r}   (FRAME)

⊢t {p} C {q}   ⊢t {p′} C {q′}
─────────────────────────────
⊢t {p ⊕ p′} C {q ⊕ q′}   (SPEC-CONJ)

⊢t {p} C {q}   (p ⋉ q) ⇒ G ∗ True   I ▷ G   p ∨ q ⇒ I ∗ true
─────────────────────────────────────────────────────────────
[I], G, I ⊢t {p} ⟨C⟩ {q}   (ATOM)

[I], G, I ⊢t {p} ⟨C⟩ {q}   Sta({p, q}, R ∗ Id)   I ▷ R
──────────────────────────────────────────────────────
R, G, I ⊢t {p} ⟨C⟩ {q}   (ATOM-R)

Figure 10. Selected Inference Rules

trylin(E) command introduces speculations about the thread E. When we have enough knowledge p about the situation of the abstract objects and operations, the commit(p) step keeps only the subset of speculations consistent with p and drops the rest. Here p is a logical assertion about the state Σ, which is explained below.

4.2 Assertions

Syntax of assertions is shown in Fig. 7. Following rely-guarantee style reasoning, assertions are either single state assertions p and q, or binary rely/guarantee conditions R and G. Note here states refer to the relational states Σ.

We use standard separation logic assertions such as true, E1 = E2, emp and E1 ↦ E2 to specify the memory σ. As shown in Fig. 8, their semantics is almost standard, but for E1 = E2 to hold over σ we require the domain of σ to contain only the free variables in E1 and E2. Here we use {{E}}σ to evaluate E with the extra requirement that σ contains the exact resource to do the evaluation.

New assertions are introduced to specify Δ. x ⇒ E specifies the abstract object θ in Δ, with no speculations of U (abstract operations), while E1 ↪ (γ, E2) (and E1 ↪ (end, E2)) specifies the singleton speculation of U. Semantics of separating conjunction p ∗ q is similar as in separation logic, except that it is now lifted to assertions over the relational states Σ.
Note that the underlying "disjoint union" over Δ for separating conjunction should not be confused with the normal disjoint union operator over sets. The former (denoted as Δ1 ∗ Δ2 in Fig. 8) describes the split of pending thread pools and/or abstract objects. For example, the left side Δ in the following equation specifies two speculations of threads t1 and t2 (we assume the abstract object part is empty and omitted here), and it can be split into two sets Δ1 and Δ2 on the right side, each of which describes the speculations of a single thread.

{ {t1 ↝ Υ1, t2 ↝ Υ2}, {t1 ↝ Υ1, t2 ↝ Υ′2} } = { {t1 ↝ Υ1} } ∗ { {t2 ↝ Υ2}, {t2 ↝ Υ′2} }

The most interesting new assertion is p ⊕ q, where p and q specify two different speculations. It is this assertion that reflects uncertainty about the abstract level. However, the readers should not confuse ⊕ with disjunction. It is more like conjunction, since it says Δ contains both speculations satisfying p and those satisfying q. As an example, the above equation could be formulated at the assertion level using ∗ and ⊕:

(t1 ↪ Υ1 ∗ t2 ↪ Υ2) ⊕ (t1 ↪ Υ1 ∗ t2 ↪ Υ′2) ⇔ t1 ↪ Υ1 ∗ (t2 ↪ Υ2 ⊕ t2 ↪ Υ′2)

Rely and guarantee assertions specify transitions over Σ. Here we follow the syntax of LRG [8], with a new assertion R1 ⊕ R2 specifying speculative behaviors of the environment. The semantics is given in Fig. 9. We will show the use of the assertions in the examples of Sec. 6.

4.3 Inference Rules

The rules of our logic are shown in Fig. 10. Rules on the top half are for sequential Hoare-style reasoning. They are proposed to verify code C in the atomic block ⟨C⟩. The judgment is parameterized with the id t of the current thread.

For the linself command, if the abstract operation γ of the current thread has not been done, this command will finish it. Here [E1, p] γ [E2, q] in the LINSELF rule describes the behavior of γ, which transforms abstract objects satisfying p to new ones satisfying q. E1 and E2 are the argument and return value respectively. The definition is given in Fig. 9.
The LINSELF-END rule says linself has no effects if we know the abstract operation has been finished. The LIN rule and LIN-END rule are similar and omitted here.

The TRY rule says that if the thread E has not finished the abstract operation γ, it can do speculation using trylin(E). The resulting state contains both cases: one says γ does not progress at this point, and the other says it does. If the current thread has already finished the abstract operation, trylin(E) would have no effects, as shown in the TRY-END rule. We omit the TRYSELF rule and TRYSELF-END rule for the current thread, which are similar.

The above two pairs of rules require us to know for sure whether the abstract operation has been finished or not. If we want to support uncertainty in the pre-condition, we could first consider different cases and then apply the SPEC-CONJ rule, which is like the conjunction rule in traditional Hoare logic.

The COMMIT rule allows us to commit to a specific speculation and drop the rest. commit(p) keeps only the speculations satisfying p. We require p to describe an exact set of speculations, as defined by SpecExact(p) in Fig. 8. For example, the following p1 is speculation-exact, while p2 is not:

p1 def= t ↪ (γ, n) ⊕ t ↪ (end, n′)
p2 def= t ↪ (γ, n) ∨ t ↪ (end, n′)

In all of our examples in Sec. 6, the assertion p in commit(p) describes a singleton speculation, so SpecExact(p) trivially holds.

Before the current thread returns, it must know its abstract operation has been done, as required in the RET rule. We also have a standard FRAME rule as in separation logic for local reasoning.
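The lifted ∗ and the ⊕ assertion can be made concrete with a small executable model. The Python sketch below is our own illustration, not the paper's (all names are invented, and assertions are replaced by explicit speculation sets): it encodes a pending thread pool U as a dictionary, Δ as a list of (U, θ) pairs, and replays the two-thread split of Sec. 4.2 together with the domain-exactness check of Sec. 4.1.

```python
# Executable sketch (ours, illustrative) of the speculation-set model.
# An abstract operation is ("name", arg) if still pending, or ("end", ret)
# once finished. A speculation set Delta is a list of (U, theta) pairs, where
# U maps thread ids to abstract operations and theta is the abstract object.

def dom_exact(delta):
    """DomExact: all speculations describe the same threads and object domain."""
    return (len({frozenset(u) for u, _ in delta}) <= 1
            and len({frozenset(th) for _, th in delta}) <= 1)

def star(delta1, delta2):
    """The lifted * : pairwise disjoint merge of speculations (Delta1 * Delta2)."""
    out = []
    for u1, th1 in delta1:
        for u2, th2 in delta2:
            # the merged thread pools and abstract objects must be disjoint
            assert not (u1.keys() & u2.keys()) and not (th1.keys() & th2.keys())
            out.append(({**u1, **u2}, {**th1, **th2}))
    return out

def oplus(delta1, delta2):
    """p ⊕ q keeps BOTH sets of speculations (union) -- not a disjunction."""
    return delta1 + [s for s in delta2 if s not in delta1]

# The two-speculation example for threads t1 and t2 (abstract object omitted):
g1, g2, g2_done = ("push", 7), ("pop", 0), ("end", 42)
whole = [({"t1": g1, "t2": g2}, {}), ({"t1": g1, "t2": g2_done}, {})]
split = star([({"t1": g1}, {})],
             oplus([({"t2": g2}, {})], [({"t2": g2_done}, {})]))
assert split == whole and dom_exact(whole)
```

As the final assertion shows, the two-speculation Δ factors into t1's singleton speculation ∗-joined with the ⊕ of t2's two speculations, mirroring the assertion-level equivalence in Sec. 4.2.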
Rules in the bottom half show how to do rely-guarantee style concurrency reasoning; they are very similar to those in LRG [8]. As in LRG, we use a precise invariant I to specify the boundary of the well-formed shared resource. The ATOM rule says we could reason sequentially about code in the atomic block. Then we can lift it to the concurrent setting, as long as its effects over the shared resource satisfy the guarantee G, which is fenced by the invariant I. In this step we assume the environment does not update the shared resource, thus using Id as the rely condition (see Fig. 9). To allow general environment behaviors, we should apply the ATOM-R rule later, which requires that R be fenced by I and that the pre- and postconditions be stable with respect to R. Here Sta({p, q}, R) requires that p and q be stable with respect to R, a standard requirement in rely-guarantee reasoning. More rules are shown in TR [18].

Linking with client program verification. Our relational logic is introduced for object verification, but it can also be used to verify client code, since it is just an extension over the general-purpose concurrent logic LRG (which includes the rule for parallel composition). Moreover, as we will see in Sec. 5, our logic ensures contextual refinement. Therefore, to verify a program W, we could replace the object implementation with the abstract operations and verify the corresponding abstract program instead. Since the abstract program abstracts away concrete object representation and method implementation details, this approach provides us with "separation and information hiding" [26] over the object, but still keeps enough information (i.e., the abstract operations) about the method calls in concurrent client verification.

4.4 Semantics and Partial Correctness

We first show some key operational semantics rules for the instrumented code in Fig. 11. A single step execution of the instrumented code by thread t is represented as (C, Σ) −→t (C′, Σ′).
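The effects of the auxiliary commands over Δ, formalized by Δ →t Δ′ and the commit filter in Fig. 11, can be prototyped executably. The following Python sketch is our own simplified illustration (the SpecExact and locality side conditions of commit are elided, and gamma_impl with its counter object is invented for the example):

```python
# Our own executable sketch of the auxiliary transitions over Delta (Fig. 11).
# gamma_impl maps an operation name to a function from (arg, theta) to
# (ret, theta'); all concrete names here are illustrative, not the paper's.

def step_thread(spec, t, gamma_impl):
    """(U, theta) ~>t (U', theta'): run t's abstract op if unfinished."""
    u, th = spec
    tag, n = u[t]
    if tag == "end":                      # already finished: identity step
        return spec
    ret, th2 = gamma_impl[tag](n, th)
    return ({**u, t: ("end", ret)}, th2)

def linself(delta, t, gamma_impl):
    """Execute t's unfinished abstract op in EVERY speculation."""
    return [step_thread(s, t, gamma_impl) for s in delta]

def trylin(delta, t, gamma_impl):
    """Keep both possibilities: op not yet done, and op done at this step."""
    done = [step_thread(s, t, gamma_impl) for s in delta]
    return delta + [s for s in done if s not in delta]

def commit(delta, keep):
    """Keep only the speculations consistent with predicate `keep`."""
    out = [s for s in delta if keep(s)]
    assert out, "commit must not empty the speculation set"
    return out

# A counter object: "inc" returns the old value and bumps the abstract state.
gamma = {"inc": lambda n, th: (th["c"], {"c": th["c"] + n})}
d0 = [({"t1": ("inc", 1)}, {"c": 0})]
d1 = trylin(d0, "t1", gamma)              # two speculations now
assert len(d1) == 2
d2 = commit(d1, lambda s: s[0]["t1"][0] == "end")
assert d2 == [({"t1": ("end", 0)}, {"c": 1})]
```

The last two lines replay the intended usage pattern: trylin records both "linearized here" and "not yet" speculations, and a later commit discards the one that turned out to be wrong.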
(C, σ) ⟶t (C′, σ′)   C ≠ E[return _]
────────────────────────────────────
(C, (σ, Δ)) −→t (C′, (σ′, Δ))

∀U. (U, _) ∈ Δ =⇒ U(t) = (end, ⟦E⟧σ)
────────────────────────────────────────
(E[return E], (σ, Δ)) −→t (skip, (σ, Δ))

Δ →t Δ′
──────────────────────────────────────────
(E[linself], (σ, Δ)) −→t (E[skip], (σ, Δ′))

⟦E⟧σ = t′   Δ →t′ Δ′
─────────────────────────────────────────────────
(E[trylin(E)], (σ, Δ)) −→t (E[skip], (σ, Δ ∪ Δ′))

SpecExact(p)   (σ, Δ)|p = (_, Δ′)
─────────────────────────────────────────────
(E[commit(p)], (σ, Δ)) −→t (E[skip], (σ, Δ′))

(C, Σ) −→t (C′, Σ′)
────────────────────
(C, Σ) −R→t (C′, Σ′)

(Σ, Σ′) |= R
──────────────────
(C, Σ) −R→t (C, Σ′)

Auxiliary Definitions:

U(t) = (γ, n)   γ(n)(θ) = (n′, θ′)
──────────────────────────────────
(U, θ) ↠t (U{t ↝ (end, n′)}, θ′)

U(t) = (end, n)
────────────────
(U, θ) ↠t (U, θ)

∅ →t ∅

(U, θ) ↠t (U′, θ′)   Δ →t Δ′
──────────────────────────────
{(U, θ)} ⊎ Δ →t {(U′, θ′)} ∪ Δ′

(σ, Δ)|p = (σ′, Δ′) iff ∃σ″, Δ″, Δp. (σ = σ′ ⊎ σ″) ∧ (Δ = Δ′ ⊎ Δ″) ∧ ((σ′, Δp) |= p) ∧ (Δ′|dom(Δp) = Δp) ∧ (Δ″|dom(Δp) ∩ Δp = ∅)

Δ|D def= {(U, θ) | dom({(U, θ)}) = D ∧ ∃U′, θ′. (U ⊎ U′, θ ⊎ θ′) ∈ Δ}

dom(Δ) def= (dom(U), dom(θ)), where (U, θ) ∈ Δ

Figure 11. Operational Semantics in the Relational State Model

When we reach the return E command (the second rule), we require that there be no uncertainty about thread t at the abstract level in Δ. That is, in every speculation in Δ, we always know t's operation has been finished with the same return value ⟦E⟧σ. Meanings of the auxiliary commands have been explained before. Here we use the auxiliary definition Δ →t Δ′ to formally define their transitions over Δ. The semantics of commit(p) requires p to be speculation-exact (see Fig. 8). Also it uses (σ, Δ)|p = (σ′, Δ′) to filter out the wrong speculations. To ensure locality, this filter allows Δ to contain some extra resource, such as the threads and their abstract operations other than those described in p. For example, the following Δ describes two threads t1 and t2, but we could mention only t1 in commit(p).

Δ : { {t1 ↝ (γ1, n1), t2 ↝ (γ2, n2)}, {t1 ↝ (end, n′1), t2 ↝ (end, n′2)} }

If p is t1 ↪ (γ1, n1), then commit(p) will keep only the left speculation and discard the other. p can also be t1 ↪ (γ1, n1) ⊕ t1 ↪ (end, n′1); then commit(p) will keep both speculations.

Given the thread-local semantics, we could next define the transition (C, Σ) −R→t (C′, Σ′), which describes the behavior of thread t with interference R from the environment.

Semantics preservation by the instrumentation. It is easy to see that the newly introduced auxiliary commands do not change the physical state σ, nor do they affect the program control flow. Thus the instrumentation does not change program behaviors, unless the auxiliary commands are inserted into the wrong places and they get stuck, but this can be prevented by our program logic.

Soundness w.r.t. partial correctness. Following LRG [8], we could give semantics of the logic judgment as R, G, I |=t {p}C{q}, which encodes partial correctness of C w.r.t. the pre- and postconditions. We could prove the logic ensures partial correctness by showing that R, G, I ⊢t {p}C{q} implies R, G, I |=t {p}C{q}. The details are shown in TR [18]. In the next section, we give a stronger soundness of the logic, i.e., soundness w.r.t. linearizability.

5. Soundness via Simulation

Our logic intuitively relates the concrete object code with its abstract-level specification. In this section we formalize the intuition and prove that the logic indeed ensures object linearizability. The proof is constructed in the following steps. We propose a new rely-guarantee-based forward-backward simulation between the concrete code and the abstract operation. We prove that the simulation is compositional and implies contextual refinement between the two sides, and that our logic indeed establishes such a simulation. Thus the logic establishes contextual refinement. Finally we get linearizability following Theorem 4.

Below we first define a rely-guarantee-based forward-backward simulation. It extends RGSim [19] with support for the helping mechanism and speculations.

Definition 5 (Simulation for Method). (x, C) ≼t;R;G;p γ iff
∀n, σ, Δ. (σ, Δ) |= (t ↪ (γ, n) ∗ (x = n) ∗ p) =⇒ (C; noret, σ) ≼t;R;G;p Δ.
Whenever (C, σ) ≼t;R;G;p Δ, we have the following:
1. if C ≠ E[return _], then
(a) for any C′ and σ′, if (C, σ) ⟶t (C′, σ′), then there exists Δ′ such that Δ ⇒ Δ′, ((σ, Δ), (σ′, Δ′)) |= (G ∗ True), and (C′, σ′) ≼t;R;G;p Δ′;
(b)(C,a)tabort; Objects Helping Fut.LP Java Pkg HS Book 2.for any o'and△',if(o,△),(o',△)=(R*ld). Treiber stack [29] then(C,o)hGp△; HSY stack[14】 MS two-lock queue [23] 3.if C =E return E,then there exists n'such thatE=n' MS lock-free queue [23] 1 and(o,△)卡(t一(end,n)*(x=-)*p) DGLM queue [6] Lock-coupling list As in RGSim.(C):G:p Y says,the implementation C Optimistic list [15] 1 is simulated by the abstract operation y under the interference Heller et al.lazy list [13] with the environment,which is specified by R and G.The new Harris-Michael lock-free list simulation holds if the executions of the concrete code C are related Pair snapshot [27] to the speculative executions of some A.The A could specify CCAS [31] abstract operations of other threads that might be helped,as well as RDCSS [12] the current thread t.Initially,the abstract operation of t is y,with the same argument n as the concrete side (i.e..=n).The abstract Table 1.Verified Algorithms Using Our Logic operations of other threads can be known from the precondition p. For each step of the concrete code C,we require it to be safe, and correspond to some steps of A,as shown in the first condition Lemma 7(Logic Ensures Simulation for Method). in Definition 5.We define the transition AA'as follows. For any t,C,.R.G and p,if there exist I and C such that △三△'ifU".0'(U'.0)∈△' R,G,IF:{t一(,x)*p}C{t一(end,)*(e=)*p}, →3U,0.(U,)∈△A(U,)-+*(U',0), where (U,0)-(U,0)3t.(U,0)--(U,0') and Er(C)=(C;noret),then (C):GpY. and (U,0)--+t(,0')has been defined in Fig.11. Here we use Er(C)to erase the instrumented commands in C.The It says,.any(U',g')pair in△'should be“reachable”from△. 
(b) ¬((C, σ) −→t abort);
2. for any σ′ and Δ′, if ((σ, Δ), (σ′, Δ′)) |= (R ∗ Id), then (C, σ′) ≼t;R;G;p Δ′;
3. if C = E[ return E ], then there exists n such that ⟦E⟧σ = n and (σ, Δ) |= (t ↪ (end, n) ∗ (x = _) ∗ p).

As in RGSim, (x, C) ≼t;R;G;p γ says the implementation C is simulated by the abstract operation γ under the interference with the environment, which is specified by R and G. The new simulation holds if the executions of the concrete code C are related to the speculative executions of some Δ. The Δ could specify abstract operations of other threads that might be helped, as well as of the current thread t. Initially, the abstract operation of t is γ, with the same argument n as on the concrete side (i.e., x = n). The abstract operations of other threads can be known from the precondition p.

For each step of the concrete code C, we require it to be safe and to correspond to some steps of Δ, as shown in the first condition in Definition 5. We define the transition Δ ⇒ Δ′ as follows:

    Δ ⇒ Δ′  iff  ∀U′, θ′. (U′, θ′) ∈ Δ′ =⇒ ∃U, θ. (U, θ) ∈ Δ ∧ (U, θ) ↪∗ (U′, θ′)

where (U, θ) ↪ (U′, θ′) def= ∃t′. (U, θ) ↪t′ (U′, θ′), and (U, θ) ↪t′ (U′, θ′) has been defined in Fig. 11. It says that any (U′, θ′) pair in Δ′ should be "reachable" from Δ. Specifically, we could execute the abstract operation of some thread t′ (which could be the current thread t or some other thread), or drop some (U, θ) pair in Δ. The former is like a step of trylin(t′) or lin(t′), depending on whether or not we keep the original abstract operation of t′. The latter can be viewed as a commit step, in which we discard the wrong speculations.

We also require the related steps at the two levels to satisfy the guarantee G ∗ True: G for the shared part and True (arbitrary transitions) for the local part. Symmetrically, the second condition in Definition 5 says that the simulation should be preserved under the environment interference R ∗ Id: R for the shared part and Id (identity transitions) for the local part.
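The speculative transitions on Δ can be pictured with a small executable model. Here Δ is a set of (U, θ) pairs, where θ is the abstract state and U maps thread ids to either a pending abstract operation or a finished one with its return value. The object (an atomic counter with a single INC operation returning the old value) and this encoding are our own illustrative assumptions, not the paper's formal semantics of Fig. 11:

```python
def step(U, theta, t):
    """Run thread t's pending abstract operation (here INC, returning the old value)."""
    U = dict(U)
    tag, op = U[t]
    assert tag == 'todo' and op == 'INC'
    U[t] = ('end', theta)              # record the abstract return value
    return frozenset(U.items()), theta + 1

def trylin(Delta, t):
    """Speculate on t's LP: keep both the un-linearized and the linearized branch."""
    out = set(Delta)
    for (U, theta) in Delta:
        if dict(U).get(t, ('end', None))[0] == 'todo':
            out.add(step(U, theta, t))
    return frozenset(out)

def lin(Delta, t):
    """Definitely linearize t: every speculation takes the step."""
    return frozenset(step(U, theta, t) for (U, theta) in Delta)

def commit(Delta, pred):
    """Keep only the speculations consistent with what we now know."""
    kept = frozenset(d for d in Delta if pred(*d))
    assert kept, "commit needs at least one surviving speculation"
    return kept
```

A trylin step grows Δ (both outcomes are kept), while commit shrinks it, which is exactly the "drop some (U, θ) pair" case of Δ ⇒ Δ′ above.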
Finally, when the method returns (the last condition in Definition 5), we require that the current thread t has finished its abstract operation, and that the return values match at the two levels.

Like RGSim, our new simulation is compositional, and thus can ensure contextual refinement between the implementation and the abstract operation, as shown in the following lemma.

Lemma 6 (Simulation Implies Contextual Refinement). For any Π, Γ and ϕ, if there exist R, G, p and I such that the following hold for all t,

1. for any f such that Π(f) = (x, C), we have Π(f) ≼t;Rt;Gt;pt Γ(f), and x ∉ dom(I);
2. Rt = ⋁t′≠t Gt′, I ▷ {Rt, Gt}, pt ⇒ I, and Sta(pt, Rt);
3. ⌊ϕ⌋ ⇒ ⋀t pt;

then Π ⊑ϕ Γ.

Here x ∉ dom(I) means the formal argument x is always in the local state, and ⌊ϕ⌋ lifts ϕ to a state assertion:

    ⌊ϕ⌋ def= {(σ, {(∅, θ)}) | ϕ(σ) = θ}

Lemma 6 allows us to prove contextual refinement Π ⊑ϕ Γ by showing the simulation Π(f) ≼t;Rt;Gt;pt Γ(f) for each method f, where R, G and p are defined over the shared states fenced by the invariant I, and the interference constraint Rt = ⋁t′≠t Gt′ holds following Rely-Guarantee reasoning [17]. Its proof is similar to the compositionality proofs of RGSim [19], but now we need to be careful with the helping between threads and the speculations. We give the proofs in TR [18].

    Objects                        | Helping | Fut. LP | Java Pkg | HS Book
    Treiber stack [29]             |         |         |          |    √
    HSY stack [14]                 |    √    |         |          |    √
    MS two-lock queue [23]         |         |         |          |    √
    MS lock-free queue [23]        |         |    √    |    √     |    √
    DGLM queue [6]                 |         |    √    |          |
    Lock-coupling list             |         |         |          |    √
    Optimistic list [15]           |         |         |          |    √
    Heller et al. lazy list [13]   |         |    √    |    √     |    √
    Harris-Michael lock-free list  |    √    |    √    |    √     |    √
    Pair snapshot [27]             |         |    √    |          |
    CCAS [31]                      |    √    |    √    |          |
    RDCSS [12]                     |    √    |    √    |          |

Table 1. Verified Algorithms Using Our Logic

Lemma 7 (Logic Ensures Simulation for Method). For any t, x, C, γ, R, G and p, if there exist I and C′ (the instrumented code) such that

    R, G, I ⊢t {t ↪ (γ, x) ∗ p} C′ {t ↪ (end, _) ∗ (x = _) ∗ p}

and Er(C′) = (C; noret), then (x, C) ≼t;R;G;p γ. Here we use Er(C′) to erase the instrumented commands in C′.
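The erasure Er can be pictured on straight-line command lists with a toy encoding of our own (the real Er also handles auxiliary commands under conditionals and appends noret, which we omit here):

```python
# Auxiliary commands introduced by the instrumentation; Er drops exactly these.
AUX = {'linself', 'trylinself', 'trylin', 'lin', 'commit'}

def erase(cmds):
    """A toy Er: drop auxiliary commands (by leading keyword), keep the rest in order."""
    kept = []
    for c in cmds:
        head = c.replace('(', ' ').split()[0]
        if head not in AUX:
            kept.append(c)
    return kept
```

For example, erasing `['s := h.next', 'trylinself', 'return EMPTY']` yields the original, uninstrumented `['s := h.next', 'return EMPTY']`.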
The lemma shows that verifying the instrumented code in our logic establishes a simulation between the original code and the abstract operation. It is proved by first showing that our logic ensures the standard rely-guarantee-style partial correctness (see Sec. 4.4). Then we build the simulation by projecting the instrumented semantics (Fig. 11) to the concrete semantics of C (Fig. 5) and the speculative steps ⇒ of Δ.

Finally, from Lemmas 6 and 7, we get the soundness theorem of our logic, which says our logic can verify linearizability.

Theorem 8 (Logic Soundness). For any Π, Γ and ϕ, if there exist R, G, p and I such that the following hold for all t,

1. for any f, if Π(f) = (x, C), there exists C′ such that

       Rt, Gt, I ⊢t {t ↪ (Γ(f), x) ∗ pt} C′ {t ↪ (end, _) ∗ (x = _) ∗ pt},

   Er(C′) = (C; noret), and x ∉ dom(I);
2. Rt = ⋁t′≠t Gt′, pt ⇒ I, and Sta(pt, Rt);
3. ⌊ϕ⌋ ⇒ ⋀t pt;

then Π ⊑ϕ Γ, and thus Π is linearizable with respect to Γ.

6. Examples

Our logic gives us an effective approach to verify linearizability. As shown in Table 1, we have verified 12 algorithms, including two stacks, three queues, four lists and three algorithms on atomic memory reads or writes. Table 1 summarizes their features, including the helping mechanism (Helping) and future-dependent LPs (Fut. LP). Some of them are used in the java.util.concurrent package (Java Pkg). The last column (HS Book) shows whether the object occurs in Herlihy and Shavit's classic textbook on concurrent algorithms [15]. We have covered almost all the fine-grained stacks, queues and lists in the book. We can see that our logic supports various objects, ranging from simple ones with static LPs to sophisticated ones with non-fixed LPs. Although many of the examples can be verified using other approaches, we provide the first program logic which is proved sound and is useful enough to verify all of these algorithms. Their complete proofs are given in TR [18].

In general we verify linearizability in the following steps.
First we instrument the code with the auxiliary commands such as linself, trylin(E) and commit(p) at proper program points. The instrumentation should not be difficult based on the intuition of the algorithm. Then, we specify the assertions (as in Theorem 8)
readPair(int i, j) { local a, b, v, w;
   {I ∗ (cid ↪ (γ, (i, j)))}
 1 while (true) {
   {I ∗ ((cid ↪ (γ, (i, j))) ⊕ true)}
 2   <a := m[i].d; v := m[i].v;>
   {∃v′. (I ∧ readCell(i, a, v; v′)) ∗ ((cid ↪ (γ, (i, j))) ⊕ true)}
 3   <b := m[j].d; w := m[j].v; trylinself;>
   {∃v′. (I ∧ readCell(i, a, v; v′) ∧ readCell(j, b, w; _)) ∗ afterTry}
 4   if (v = m[i].v) {
   {I ∗ ((cid ↪ (end, (a, b))) ⊕ true)}
 5     commit(cid ↪ (end, (a, b)));
   {I ∗ (cid ↪ (end, (a, b)))}
 6     return (a, b);
   {I ∗ (cid ↪ (end, (a, b)))}
 7 }}}

Auxiliary definitions:
    readCell(i, d, v; v′) def= (cell(i, d, v) ∨ (cell(i, _, v′) ∧ v′ ≠ v)) ∗ true
    absRes def= (cid ↪ (end, (a, b)) ∧ v′ = v) ∨ (cid ↪ (end, (_, b)) ∧ v′ ≠ v)
    afterTry def= (cid ↪ (γ, (i, j))) ⊕ absRes ⊕ true

Figure 12. Proof Outline of readPair in Pair Snapshot

and reason about the instrumented code by applying our inference rules, just like the usual partial correctness verification in LRG. In our experience, handling the auxiliary commands usually does not introduce much difficulty over the plain verification with LRG. Below we sketch the proofs of three representative examples: the pair snapshot, the MS lock-free queue and the CCAS algorithm.

6.1 Pair Snapshot

As discussed in Sec. 2.3, the pair snapshot algorithm has a future-dependent LP. In Fig. 12, we show the proof of readPair for the current thread cid. We will use γ for its abstract operation, which atomically reads the cells i and j at the abstract level.

First, we insert trylinself and commit, as highlighted in Fig. 12. The commit command says that when the validation at line 4 succeeds, we must have cid ↪ (end, (a, b)) as a possible speculation. This actually requires a correct instrumentation of trylinself. In Fig. 12, we insert it at line 3. It cannot be moved to other program points, since line 3 is the only place where we could get the abstract return value (a, b) when executing γ. Besides, we cannot replace it by a linself, because if line 4 fails later, we have to restart to do the original abstract operation.

After the instrumentation, we can define the precise invariant I, the rely R and the guarantee G.
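To make the algorithm concrete, here is a runnable sketch of the pair snapshot object. The class name and the lock-based modelling of the atomic <...> blocks are our own assumptions; each cell m[i] holds a (data, version) pair as in the paper:

```python
import threading

class PairSnapshot:
    """Each cell m[i] holds a (data, version) pair; every write bumps the version."""
    def __init__(self, size):
        self.m = [(0, 0)] * size
        self._lock = threading.Lock()  # models the atomicity of each <...> block

    def write(self, i, d):
        with self._lock:
            _, v = self.m[i]
            self.m[i] = (d, v + 1)     # the action Write: version strictly increases

    def read_pair(self, i, j):
        while True:
            with self._lock:
                a, v = self.m[i]       # line 2: read cell i with its version
            with self._lock:
                b, _w = self.m[j]      # line 3: read cell j (where trylinself goes)
            with self._lock:
                if self.m[i][1] == v:  # line 4: validation; on success, the read
                    return (a, b)      # of cell j was the LP
```

If the validation fails, the loop retries, matching the need to restart the speculated abstract operation rather than commit it.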
The invariant I simply maps every memory cell (d, v) at the concrete level to a cell with data d at the abstract level, as shown below:

    I def= ⊛i∈[1..size]. (∃d, v. cell(i, d, v))    where cell(i, d, v) def= (m[i] ↦ (d, v)) ∗ (m[i] ⇒ d)

Every thread guarantees that when writing a cell, it increases the version number. Here we use [G]I as a shorthand for ((G ∨ Id) ∗ Id) ∧ (I ⋉ I).

    G def= [Write]I      Write def= ∃i, v. cell(i, _, v) ⋉ cell(i, _, v + 1)

The rely R is the same as the guarantee G. Then we specify the pre- and post-conditions, and reason about the instrumented code using our inference rules. The proof follows the intuition of the algorithm. Note that we relax cid ↪ (γ, (i, j)) in the precondition of the method to (cid ↪ (γ, (i, j))) ⊕ true to ensure the loop invariant. The latter says that cid may have just started (or restarted) its operation and not finished yet.

The readPair method in the pair snapshot algorithm is "read-only" in the sense that the abstract operation does not update the abstract object. This perhaps means that it does not matter to linearize the method multiple times. In Sec. 6.3 we will verify an algorithm with future-dependent LPs, CCAS, which is not "read-only". We can still "linearize" a method with side effects multiple times.

 1 enq(v) {
 2   local x, t, s, b;
 3   x := cons(v, null);
 4   while (true) {
 5     t := Tail; s := t.next;
 6     if (t = Tail) {
 7       if (s = null) {
 8         b := cas(&(t.next), s, x);
 9         if (b) {
10           cas(&Tail, t, x);
11           return; }
12       } else cas(&Tail, t, s);
13     }
14   }
15 }

16 deq() {
17   local h, t, s, v, b;
18   while (true) {
19     h := Head; t := Tail;
20     s := h.next;
21     if (h = Head)
22       if (h = t) {
23         if (s = null)
24           return EMPTY;
25         cas(&Tail, t, s);
26       } else {
27         v := s.val;
28         b := cas(&Head, h, s);
29         if (b) return v; }
30 } }

Figure 13. MS Lock-Free Queue Code

6.2 MS Lock-Free Queue

The widely-used MS lock-free queue [23] also has future-dependent LPs. We show its code in Fig. 13. The queue is implemented as a linked list with Head and Tail pointers.
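The code of Fig. 13 can be transcribed into a runnable sketch. The Ref wrapper with a lock-based cas and the use of None for EMPTY are our own modelling choices, not part of the original algorithm:

```python
import threading

EMPTY = None  # stand-in for the EMPTY value returned by deq

class Ref:
    """A shared cell with a lock-based compare-and-swap, modelling cas in Fig. 13."""
    def __init__(self, value):
        self.value = value
        self._lock = threading.Lock()

    def cas(self, old, new):
        with self._lock:
            if self.value is old:
                self.value = new
                return True
            return False

class Node:
    def __init__(self, val):
        self.val = val
        self.next = Ref(None)

class MSQueue:
    def __init__(self):
        sentinel = Node(None)
        self.Head = Ref(sentinel)
        self.Tail = Ref(sentinel)

    def enq(self, v):
        x = Node(v)
        while True:
            t = self.Tail.value
            s = t.next.value
            if t is self.Tail.value:
                if s is None:
                    if t.next.cas(s, x):       # line 8: LP of enq on success
                        self.Tail.cas(t, x)    # line 10: swing Tail (may be helped)
                        return
                else:
                    self.Tail.cas(t, s)        # line 12: help a half-finished enq

    def deq(self):
        while True:
            h = self.Head.value
            t = self.Tail.value
            s = h.next.value                   # line 20: LP if we return EMPTY below
            if h is self.Head.value:           # line 21: validation of Head
                if h is t:
                    if s is None:
                        return EMPTY           # line 24
                    self.Tail.cas(t, s)        # line 25: help advance lagging Tail
                else:
                    v = s.val
                    if self.Head.cas(h, s):    # line 28: LP for nonempty queues
                        return v
```

The comments mark where the LPs and the helping steps discussed below fall in the transcription.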
Head always points to the first node (a sentinel) in the list, and Tail points to either the last or the second-to-last node. The enq method appends a new node at the tail of the list and advances Tail, and deq replaces the sentinel node by its next node and returns the value in the new sentinel. If the list contains only the sentinel node, meaning the queue is empty, then deq returns EMPTY.

The algorithm employs the helping mechanism for the enq method to swing the Tail pointer when it lags behind the end of the list. A thread should first try to help the half-finished enq by advancing Tail (lines 12 and 25 in Fig. 13) before doing its own operation. But this helping mechanism does not affect the LP of enq, which is statically located at line 8 when the cas succeeds, since the new node already becomes visible in the queue after being appended to the list, and updating Tail does not affect the abstract queue. We simply instrument line 8 as follows to verify enq:

    b := cas(&(t.next), s, x); if (b) linself;

On the other hand, the original queue algorithm [23] checks Head or Tail (line 6 or 21 in Fig. 13) to make sure that its value has not been changed since its local copy was read (at line 5 or 19), and if the check fails, the operation restarts. This check can improve the efficiency of the algorithm, but it makes the LP of the deq method for the empty queue case depend on future executions. That LP should be at line 20, if the method returns EMPTY at the end of the same iteration. The intuition is, when we read null from h.next at line 20 (indicating the abstract queue must be empty there), we do not know how the iteration will terminate at that time. If the later check over Head at line 21 fails, the operation restarts and line 20 may not be the LP. We can use our try-commit instrumentation to handle this future-dependent LP. We insert trylinself at line 20. Before the method returns EMPTY, we commit to the finished abstract operation, i.e., we insert commit(cid ↪ (end, EMPTY)) just before line 24.
Also, when we know we have to do another iteration, we can commit to the original DEQ operation, i.e., we insert commit(cid ↪ DEQ) at the end of the loop body.

For the case of nonempty queues, the LP of the deq method is statically at line 28 when the cas succeeds. Thus we can instrument linself there, as shown below:

    b := cas(&Head, h, s); if (b) linself;

After the instrumentation, we can define I, R and G and verify the code using our logic rules. The invariant I relates all the nodes