
A Rely-Guarantee-Based Simulation for Verifying Concurrent Program Transformations

Hongjin Liang    Xinyu Feng    Ming Fu
School of Computer Science and Technology, University of Science and Technology of China
Hefei, Anhui 230026, China
lhj1018@mail.ustc.edu.cn    xyfeng@ustc.edu.cn    fuming@ustc.edu.cn

Abstract

Verifying program transformations usually requires proving that the resulting program (the target) refines or is equivalent to the original one (the source). However, the refinement relation between individual sequential threads cannot be preserved in general in the presence of parallel compositions, due to instruction reordering and the different granularities of atomic operations at the source and the target. On the other hand, the refinement relation defined based on fully abstract semantics of concurrent programs assumes arbitrary parallel environments, which is too strong and cannot be satisfied by many well-known transformations.

In this paper, we propose a Rely-Guarantee-based Simulation (RGSim) to verify concurrent program transformations. The relation is parametrized with constraints on the environments that the source and the target programs may compose with. It considers the interference between threads and their environments, thus is less permissive than relations over sequential programs. It is compositional w.r.t. parallel compositions as long as the constraints are satisfied. Also, RGSim does not require semantics preservation under all environments, and can incorporate the assumptions about environments made by specific program transformations in the form of rely/guarantee conditions. We use RGSim to reason about optimizations and prove atomicity of concurrent objects. We also propose a general garbage collector verification framework based on RGSim, and verify the Boehm et al. concurrent mark-sweep GC.
Categories and Subject Descriptors D.2.4 [Software Engineering]: Software/Program Verification – Correctness proofs, Formal methods; F.3.1 [Logics and Meanings of Programs]: Specifying and Verifying and Reasoning about Programs

General Terms Theory, Verification

Keywords Concurrency, Program Transformation, Rely-Guarantee Reasoning, Simulation

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
POPL'12, January 25–27, 2012, Philadelphia, PA, USA.
Copyright © 2012 ACM 978-1-4503-1083-3/12/01…$10.00

1. Introduction

Many verification problems can be reduced to verifying program transformations, i.e., proving the target program of the transformation has no more observable behaviors than the source. Below we give some typical examples in concurrent settings:

• Correctness of compilation and optimizations of concurrent programs. In this most natural program transformation verification problem, every compilation phase does a program transformation T, which needs to preserve the semantics of the inputs.

• Atomicity of concurrent objects. A concurrent object or library provides a set of methods that allow clients to manipulate the shared data structure with abstract atomic behaviors [15]. Their correctness can be reduced to the correctness of the transformation from abstract atomic operations to concrete and executable programs in a concurrent context.

• Verifying implementations of software transactional memory (STM). Many languages supporting STM provide a high-level atomic block atomic{C}, so that programmers can assume the atomicity of the execution of C. Atomic blocks are implemented using some STM protocol (e.g., TL2 [11]) that allows very fine-grained interleavings. Verifying that the fine-grained program respects the semantics of atomic blocks gives us the correctness of the STM implementation.

• Correctness of concurrent garbage collectors (GCs). High-level garbage-collected languages (e.g., Java) allow programmers to work at an abstract level without knowledge of the underlying GC algorithm. However, the concrete and executable low-level program involves interactions between the mutators and the collector. If we view the GC implementation as a transformation from high-level mutators to low-level ones with a concrete GC thread, the GC safety can be reduced naturally to the semantics preservation of the transformation.

To verify the correctness of a program transformation T, we follow Leroy's approach [19] and define a refinement relation ⊑ between the target and the source programs, which says the target has no more observable behaviors than the source. Then we can formalize the correctness of the transformation as follows:

    Correct(T) ≜ ∀ℂ, C. C = T(ℂ) ⟹ C ⊑ ℂ.    (1.1)

That is, for any source program ℂ acceptable by T, T(ℂ) is a refinement of ℂ. When the source and the target are shared-state concurrent programs, the refinement needs to satisfy the following requirements to support effective proof of Correct(T):

• Since the target T(ℂ) may be in a different language from the source, the refinement should be general and independent of the language details.

• To verify fine-grained implementations of abstract operations, the refinement should support different views of program states and different granularities of state accesses at the source and the target levels.

• When T is syntax-directed (and it is usually the case for parallel compositions, i.e., T(ℂ ‖ ℂ′) = T(ℂ) ‖ T(ℂ′)), a compositional refinement is of particular importance for modular verification of T.

However, existing refinement (or equivalence) relations cannot satisfy all these requirements at the same time. Contextual equivalence, the canonical notion for comparing program behaviors, fails to handle different languages since the contexts of the source and the target will be different. Simulations and logical relations have been used to verify compilation [4, 16, 19, 21], but they are usually designed for sequential programs (except [21, 25], which we will discuss in Section 8). Since the refinement or equivalence relation between sequential threads cannot be preserved in general with parallel compositions, we cannot simply adapt existing work on sequential programs to verify transformations of concurrent programs. Refinement relations based on fully abstract semantics of concurrent programs are compositional, but they assume arbitrary program contexts, which is too strong for many practical transformations. We will explain the challenges in detail in Section 2.

In this paper, we propose a Rely-Guarantee-based Simulation (RGSim) for compositional verification of concurrent transformations. By addressing the above problems, we make the following contributions:

• RGSim parametrizes the simulation between concurrent programs with rely/guarantee conditions [17], which specify the interactions between the programs and their environments. This makes the corresponding refinement relation compositional w.r.t. parallel compositions, allowing us to decompose refinement proofs for multi-threaded programs into proofs for individual threads. On the other hand, the rely/guarantee conditions can incorporate the assumptions about environments made by specific program transformations, so RGSim can be applied to verify many practical transformations.
• Based on the simulation technique, RGSim focuses on comparing externally observable behaviors (e.g., I/O events) only, which gives us considerable leeway in the implementations of related programs. The relation is mostly independent of the language details. It can be used to relate programs in different languages with different views of program states and different granularities of atomic state accesses.

• RGSim makes relational reasoning about optimizations possible in parallel contexts. We present a set of relational reasoning rules to characterize and justify common optimizations in a concurrent setting, including hoisting loop invariants, strength reduction and induction variable elimination, dead code elimination, redundancy introduction, etc.

• RGSim gives us a refinement-based proof method to verify fine-grained implementations of abstract algorithms and concurrent objects. We successfully apply RGSim to verify concurrent counters, the concurrent GCD algorithm, Treiber's non-blocking stack and the lock-coupling list.

• We reduce the problem of verifying concurrent garbage collectors to verifying transformations, and present a general GC verification framework, which combines unary Rely-Guarantee-based verification [17] with relational proofs based on RGSim.

• We verify the Boehm et al. concurrent garbage collection algorithm [7] using our framework. As far as we know, this is the first formal correctness proof of this algorithm.

In the rest of this paper, we first analyze the challenges for compositional verification of concurrent program transformations, and explain our approach informally in Section 2. Then we give the basic technical settings in Section 3 and present the formal definition of RGSim in Section 4.
We show the use of RGSim to reason about optimizations in Section 5, verify atomicity of concurrent objects in Section 6, and prove the correctness of concurrent GCs in Section 7. Finally we discuss related work and conclude in Section 8.

    local r1;             local r2;
    x := 1;               y := 1;
    r1 := y;          ‖   r2 := x;
    if (r1 = 0) then      if (r2 = 0) then
      critical region       critical region

    (a) Dekker's Mutual Exclusion Algorithm

    x := x+1;  ‖  x := x+1;
            vs.
    local r1;             local r2;
    r1 := x;          ‖   r2 := x;
    x := r1 + 1;          x := r2 + 1;

    (b) Different Granularities of Atomic Operations

    Figure 1. Equivalence Lost after Parallel Composition

2. Challenges and Our Approach

The major challenge we face is to have a compositional refinement relation between concurrent programs, i.e., we should be able to conclude T(ℂ1) ‖ T(ℂ2) ⊑ ℂ1 ‖ ℂ2 if we have T(ℂ1) ⊑ ℂ1 and T(ℂ2) ⊑ ℂ2.

2.1 Sequential Refinement Loses Parallel Compositionality

Observable behaviors of sequential imperative programs usually refer to their control effects (e.g., termination and exceptions) and final program states. However, refinement relations defined correspondingly cannot be preserved after parallel compositions. It has been a well-known fact in the compiler community that sound optimizations for sequential programs may change the behaviors of multi-threaded programs [5]. Dekker's algorithm, shown in Figure 1(a), has been widely used to demonstrate the problem. Reordering the first two statements of the thread on the left preserves its sequential behaviors, but the whole program can no longer ensure exclusive access to the critical region.

In addition to instruction reordering, the different granularities of atomic operations between the source and the target programs can also break the compositionality of program equivalence in a concurrent setting.
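The granularity problem can be reproduced by brute force: enumerate every interleaving of the two threads in Figure 1(b) and collect the possible final values of x. The sketch below is ours, not the paper's formal semantics; instructions are modeled as functions on a small store.

```python
# Enumerate all interleavings of two threads' instruction lists and
# collect the possible final values of x. Each instruction is a function
# on a store; this encoding is illustrative, not the paper's semantics.
from itertools import permutations

def run(schedule, threads):
    """Execute one step of the indicated thread at each schedule position."""
    store = {"x": 0, "r1": None, "r2": None}
    progs = [list(t) for t in threads]  # copy, so each run starts fresh
    for tid in schedule:
        progs[tid].pop(0)(store)
    return store["x"]

def interleavings(lens):
    """All schedules: distinct permutations of the multiset of thread ids."""
    ids = [tid for tid, n in enumerate(lens) for _ in range(n)]
    return set(permutations(ids))

# Source: each thread performs x := x+1 as one atomic step.
atomic = [
    [lambda s: s.__setitem__("x", s["x"] + 1)],
    [lambda s: s.__setitem__("x", s["x"] + 1)],
]

# Target: each increment is split into a read and a write (Figure 1(b)).
fine = [
    [lambda s: s.__setitem__("r1", s["x"]),
     lambda s: s.__setitem__("x", s["r1"] + 1)],
    [lambda s: s.__setitem__("r2", s["x"]),
     lambda s: s.__setitem__("x", s["r2"] + 1)],
]

results_atomic = {run(sched, atomic) for sched in interleavings([1, 1])}
results_fine = {run(sched, fine) for sched in interleavings([2, 2])}

assert results_atomic == {2}     # the source always ends with x = 2
assert results_fine == {1, 2}    # the target can lose an update
```

Each thread in isolation increments x exactly once at both levels; only the parallel composition distinguishes them, which is exactly the lost compositionality.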
In Figure 1(b), the target program at the bottom behaves differently from the source at the top (assuming each statement is executed atomically), although the individual threads at the target and the source have the same behaviors.

2.2 Assuming Arbitrary Environments is Too Strong

The problem with the refinement for sequential programs is that it does not consider the effects of threads' intermediate state accesses on their parallel environments. People have given fully abstract semantics to concurrent programs (e.g., [1, 8]). The semantics of a program is modeled as a set of execution traces. Each trace is an interleaving of state transitions made by the program itself and arbitrary transitions made by the environment. Then the refinement between programs can be defined as the subset relation between the corresponding trace sets. Since it considers all possible environments, the refinement relation has very nice compositionality, but unfortunately is too strong to formulate the correctness of many well-known transformations, including the four classes of transformations mentioned before:


• Many concurrent languages (e.g., C++ [6]) do not give semantics to programs with data races (like the examples shown in Figure 1). Therefore the compilers only need to guarantee the semantics preservation of data-race-free programs.

• When we prove that a fine-grained implementation of a concurrent object is a refinement of an abstract atomic operation, we can assume that all accesses to the object in the context of the target program use the same set of primitives.

• Usually the implementation of STM (e.g., TL2 [11]) ensures the atomicity of a transaction atomic{C} only when there are no data races. Therefore, the correctness of the transformation from high-level atomic blocks to fine-grained concurrent code assumes data-race-freedom in the source.

• Many garbage-collected languages are type-safe and prohibit operations such as pointer arithmetic. Therefore the garbage collector could make corresponding assumptions about the mutators that run in parallel.

In all these cases, the transformations of individual threads are allowed to make various assumptions about the environments. They do not have to ensure semantics preservation within all contexts.

2.3 Languages at Source and Target May Be Different

The use of different languages at the source and the target levels makes the formulation of the transformation correctness more difficult. If the source and the target languages have different views of program states and different atomic primitives, we cannot directly compare the state transitions made by the source and the target programs. This is another reason that makes the aforementioned subset relation between sets of program traces in fully abstract semantics infeasible. For the same reason, many existing techniques for proving refinement or equivalence of programs in the same language cannot be applied either.
2.4 Different Observers Make Different Observations

Concurrency introduces tensions between two kinds of observers: human beings (as external observers) and the parallel program contexts. External observers do not care about the implementation details of the source and the target programs. For them, intermediate state accesses (such as memory reads and writes) are silent steps (unobservable), and only external events (such as I/O operations) are observable. On the other hand, state accesses have effects on the parallel program contexts, and are not silent to them.

If the refinement relation relates externally observable event traces only, it cannot have parallel compositionality, as we explained in Section 2.1. On the other hand, relating all state accesses of programs is too strong. Any reordering of state accesses or change of atomicity would fail the refinement.

2.5 Our Approach

In this paper we propose a Rely-Guarantee-based Simulation (RGSim) ≼ between the target and the source programs. It establishes a weak simulation, ensuring that for every externally observable event made by the target program there is a corresponding one in the source. We choose to view intermediate state accesses as silent steps, thus we can relate programs with different implementation details. This also makes our simulation independent of language details.

To support parallel compositionality, our relation takes into account explicitly the expected interference between threads and their parallel environments. Inspired by the Rely-Guarantee (R-G) verification method [17], we specify the interference using rely/guarantee conditions. In Rely-Guarantee reasoning, the rely condition R of a thread specifies the permitted state transitions that its environment may have, and its guarantee G specifies the possible transitions made by the thread itself.
To ensure parallel threads can collaborate, we need to check the interference constraint, i.e., the guarantee of each thread is permitted in the rely of every other. Then we can verify their parallel composition by separately verifying each thread, showing its behaviors under the rely condition indeed satisfy its guarantee. After parallel composition, the threads should be executed under their common environment (i.e., the intersection of their relies) and guarantee all the possible transitions made by them (i.e., the union of their guarantees).

Parametrized with rely/guarantee conditions for the two levels, our relation (C, R, G) ≼ (C, R, G) talks about not only the target C and the source C, but also the interference R and G between C and its target-level environment, and R and G between C and its environment at the source level. Informally, (C, R, G) ≼ (C, R, G) says the executions of C under the environment R do not exhibit more observable behaviors than the executions of C under the environment R, and the state transitions of C and C satisfy G and G respectively. RGSim is now compositional, as long as the threads are composed with well-behaved environments only. The parallel compositionality lemma is in the following form. If we know (C1, R1, G1) ≼ (C1, R1, G1) and (C2, R2, G2) ≼ (C2, R2, G2), and also the interference constraints are satisfied, i.e., G2 ⊆ R1, G1 ⊆ R2, G2 ⊆ R1 and G1 ⊆ R2, we could get

(C1∥C2, R1 ∩ R2, G1 ∪ G2) ≼ (C1∥C2, R1 ∩ R2, G1 ∪ G2).

The compositionality of RGSim gives us a proof theory for concurrent program transformations.

Also different from fully abstract semantics for threads, which assumes arbitrary behaviors of environments, RGSim allows us to instantiate the interference R, G, R and G differently for different assumptions about environments, therefore it can be used to verify the aforementioned four classes of transformations.
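The interference constraints and the composed rely/guarantee can be made concrete in a toy model. The following Python sketch (ours, not from the paper) represents relies and guarantees as finite sets of state pairs, where a state is just an integer; the names `stay` and `inc` are illustrative choices:

```python
# A toy, finite-state sketch of the interference constraints behind the
# parallel compositionality lemma: relies/guarantees are sets of state pairs.

def compatible(g1, r2, g2, r1):
    """Each thread's guarantee must be permitted by the other's rely."""
    return g1 <= r2 and g2 <= r1

def compose(r1, g1, r2, g2):
    """After composition: common environment, union of guarantees."""
    assert compatible(g1, r2, g2, r1), "interference constraints violated"
    return r1 & r2, g1 | g2

# States are integers; a transition is a pair (s, s').
stay = {(0, 0), (1, 1), (2, 2)}   # identity (stuttering) steps
inc  = {(0, 1), (1, 2)}           # increment steps
g1 = stay | inc                   # thread 1 may increment
g2 = stay                         # thread 2 only stutters
r1 = stay                         # thread 1 tolerates stuttering only
r2 = stay | inc                   # thread 2 tolerates increments too

r, g = compose(r1, g1, r2, g2)
print(r == stay)        # common rely is R1 ∩ R2
print(g == g1 | g2)     # combined guarantee is G1 ∪ G2
```

In the paper the relations range over full program states; the subset checks here mirror G1 ⊆ R2 and G2 ⊆ R1 directly.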
For instance, if we want to prove that a transformation preserves the behaviors of data-race-free programs, we can specify the data-race-freedom in R and G. Then we are no longer concerned with the examples in Figure 1, both of which have data races.

3. Basic Technical Settings

In this section, we present the source and the target programming languages. Then we define a basic refinement ⊑, which naturally says the target has no more externally observable event traces than the source. We use ⊑ as an intuitive formulation of the correctness of transformations.

3.1 The Languages

Following standard simulation techniques, we model the semantics of target and source programs as labeled transition systems. Before showing the languages, we first define events and labels in Figure 2(a). We leave the set of events unspecified here. It can be instantiated by program verifiers, depending on their interest (e.g., input/output events). A label that will be associated with a state transition is either an event or τ, which means the corresponding transition does not generate any event (i.e., a silent step).

The target language, which we also call the low-level language, is shown in Figure 2(b). We abstract away the forms of states, expressions and primitive instructions in the language. An arithmetic expression E is modeled as a function from states to integers lifted with an undefined value ⊥. Boolean expressions are modeled similarly. An instruction is a partial function from states to sets of label and state pairs, describing the state transitions and the events it generates. We use P(·) to denote the power set. Unsafe executions
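As a rough executable reading of this language model (our own encoding, with states simplified to pairs (x, l)), an instruction can be modeled as a partial function returning a set of (label, state) outcomes, with `None` standing for "undefined on this state", i.e., blocking:

```python
# A minimal executable reading of the language model: a state is a pair
# (x, l); an instruction maps a state to a set of (label, state) outcomes;
# an undefined result (None) models blocking.

TAU = "tau"  # the silent label

def inc_x(state):
    """Silently increment x: one deterministic (tau, state') outcome."""
    x, l = state
    return {(TAU, (x + 1, l))}

def lock(state):
    """Defined only when l = 0; otherwise the step blocks."""
    x, l = state
    return {(TAU, (x, 1))} if l == 0 else None

def step(instr, state):
    """One interleaving-semantics step: blocked instructions self-loop."""
    out = instr(state)
    return {(TAU, state)} if out is None else out

print(step(inc_x, (0, 0)))   # {('tau', (1, 0))}
print(step(lock, (0, 1)))    # blocked: the configuration reduces to itself
```

The self-loop in `step` mirrors the rule (further down, in Figure 3) that lets a blocked configuration reduce to itself.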


(Events) e ::= ...
(Labels) o ::= e | τ

(a) Events and Transition Labels

(LState) σ ::= ...
(LExpr) E ∈ LState → Int⊥
(LBExp) B ∈ LState → {true, false}⊥
(LInstr) c ∈ LState ⇀ P((Labels × LState) ∪ {abort})
(LStmt) C ::= skip | c | C1; C2 | if (B) C1 else C2 | while (B) C | C1 ∥ C2
(LStep) −→L ∈ P((LStmt/{skip} × LState) × Labels × ((LStmt × LState) ∪ {abort}))

(b) The Low-Level Language

(HState) Σ ::= ...
(HExpr) E ∈ HState → Int⊥
(HBExp) B ∈ HState → {true, false}⊥
(HInstr) c ∈ HState ⇀ P((Labels × HState) ∪ {abort})
(HStmt) C ::= skip | c | C1;; C2 | if B then C1 else C2 | while B do C | C1 ∥ C2
(HStep) −→H ∈ P((HStmt/{skip} × HState) × Labels × ((HStmt × HState) ∪ {abort}))

(c) The High-Level Language

Figure 2. Generic Languages at Target and Source Levels

lead to abort. Note that the semantics of an instruction could be non-deterministic. Moreover, it might be undefined on some states, making it possible to model blocking operations such as acquiring a lock.

Statements are either primitive instructions or compositions of them. skip is a special statement used as a flag to show the end of executions. A single-step execution of statements is modeled as a labeled transition −→L, which is a triple of an initial program configuration (a pair of statement and state), a label and a resulting configuration. It is undefined when the initial statement is skip. The step aborts if an unsafe instruction is executed.

The high-level language (source language) is defined similarly in Figure 2(c), but it is important to note that its states and primitive instructions may be different from those in the low-level language. The compound statements are almost the same as their low-level counterparts. C1;; C2 and C1∥C2 are sequential and parallel compositions of C1 and C2 respectively. Note that we choose to use the same set of compound statements in the two languages for simplicity only.
This is not required by our simulation relation, although the analogous program constructs of the two languages (e.g., parallel compositions C1 ∥ C2 and C1∥C2) make it convenient for us to discuss the compositionality later.

Figure 3 shows part of the definition of −→H, which gives the high-level operational semantics of statements. We often omit the subscript H (or L) in −→H (or −→L) and the label on top of the arrow when it is τ. The semantics is mostly standard. We only show the rules for primitive instructions and parallel compositions here. Note that when a primitive instruction c is blocked at state Σ (i.e., Σ ∉ dom(c)), we let the program configuration reduce to itself. For example, the instruction lock(l) would be blocked when l is not 0, making it repeat until l becomes 0; whereas unlock(l) simply sets l to 0 at any time and would never be blocked. Primitive instructions in the high-level and low-level languages are atomic in the interleaving semantics. Below we use −→∗ for zero or multiple-step transitions with no events generated, and −e→∗ for multiple-step transitions with only one event e generated.

3.2 The Event Trace Refinement

Now we can formally define the refinement relation ⊑ that relates the set of externally observable event traces generated by the target and the source programs. A trace is a sequence of events e, and may end with a termination marker done or a fault marker abort.

(EvtTrace) E ::= ε | done | abort | e::E

Definition 1 (Event Trace Set). ETrSetn(C, σ) represents a set of external event traces produced by C in n steps from the state σ:

• ETrSet0(C, σ) ≜ {ε};
• ETrSetn+1(C, σ) ≜ {E | (C, σ) −→ (C′, σ′) ∧ E ∈ ETrSetn(C′, σ′)
    ∨ (C, σ) −e→ (C′, σ′) ∧ E′ ∈ ETrSetn(C′, σ′) ∧ E = e::E′
    ∨ (C, σ) −→ abort ∧ E = abort
    ∨ C = skip ∧ E = done}.

We define ETrSet(C, σ) as ⋃n ETrSetn(C, σ).

We overload the notation and use ETrSet(C, Σ) for the high-level language.
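Definition 1 can be prototyped by bounded exploration. The sketch below uses our own toy command encoding (nested tuples with `inc`, `print`, `seq`; not the paper's language) and collects the external event traces producible in at most n steps:

```python
# A bounded-depth sketch of Definition 1 for a toy command language:
# commands are nested tuples, and ETrSet_n collects traces of <= n steps.

def steps(cmd, s):
    """Enumerate (label, cmd', s') successors; label None is a tau step."""
    kind = cmd[0]
    if kind == "print":                 # generates an external event
        return [(("ev", s["x"]), ("skip",), s)]
    if kind == "inc":                   # silent state update
        s2 = dict(s); s2["x"] += 1
        return [(None, ("skip",), s2)]
    if kind == "seq":                   # sequential composition
        _, c1, c2 = cmd
        if c1 == ("skip",):
            return [(None, c2, s)]
        return [(lab, ("seq", c3, c2), s2) for lab, c3, s2 in steps(c1, s)]
    return []                           # skip: no further steps

def etrset(cmd, s, n):
    """ETrSet_n(C, sigma): external event traces in at most n steps."""
    traces = {()}                       # the empty trace is always included
    if cmd == ("skip",):
        traces.add(("done",))           # terminated configurations emit done
    if n == 0:
        return traces
    for lab, c2, s2 in steps(cmd, s):
        for t in etrset(c2, s2, n - 1):
            traces.add(t if lab is None else (lab,) + t)
    return traces

prog = ("seq", ("inc",), ("print",))
print((("ev", 1), "done") in etrset(prog, {"x": 0}, 5))   # True
```

`etrset` follows the four disjuncts of Definition 1: silent steps keep the sub-trace, event steps prefix it, and a terminated command contributes `done` (abort is omitted since the toy language has no unsafe instruction).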
Then we define an event trace refinement as the subset relation between event trace sets, which is similar to Leroy's refinement property [19].

Definition 2 (Event Trace Refinement). We say (C, σ) is an e-trace refinement of (C, Σ), i.e., (C, σ) ⊑ (C, Σ), if and only if ETrSet(C, σ) ⊆ ETrSet(C, Σ).

The refinement is defined for program configurations instead of for code only because the initial states may affect the behaviors of programs. In this case, the transformation T should translate states as well as code. We overload the notation and use T(Σ) to represent the state transformation, and use C ⊑T C for

∀σ, Σ. σ = T(Σ) =⇒ (C, σ) ⊑ (C, Σ),

then Correct(T) defined in formula (1.1) can be reformulated as

Correct(T) ≜ ∀C, C. C = T(C) =⇒ C ⊑T C.    (3.1)

4. The RGSim Relation

The e-trace refinement is defined directly over the externally observable behaviors of programs. It is intuitive, and also abstract in that it is independent of language details. However, as we explained before, it is not compositional w.r.t. parallel compositions. In this section we propose RGSim, which can be viewed as a compositional proof technique that allows us to derive the simple e-trace refinement and then verify the corresponding transformation T.

4.1 The Definition

Our co-inductively defined RGSim relation is in the form of (C, σ, R, G) ≼α;γ (C, Σ, R, G), which is a simulation between program configurations (C, σ) and (C, Σ). It is parametrized with the rely and guarantee conditions at the low level and the high level, which are binary relations over states:

R, G ∈ P(LState × LState),    R, G ∈ P(HState × HState).

The simulation also takes two additional parameters: the step invariant α and the postcondition γ, which are both relations between the low-level and the high-level states.

α, γ, ζ ∈ P(LState × HState).
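On finite transition systems the subset check of Definition 2 is directly decidable up to a depth bound. A minimal sketch, with our own explicit-LTS encoding (configurations as dictionary keys, `"done"` marking termination):

```python
# A finite sketch of Definition 2: programs as explicit transition systems,
# refinement checked as (depth-bounded) trace-set inclusion.

def traces(lts, conf, depth):
    """External event traces of length <= depth from configuration conf."""
    out = {()}
    if lts.get(conf) == "done":         # terminated: may emit the done marker
        out.add(("done",))
        return out
    if depth == 0:
        return out
    for lab, conf2 in lts.get(conf, []):
        for t in traces(lts, conf2, depth - 1):
            out.add(t if lab is None else (lab,) + t)
    return out

def refines(target, source, c_t, c_s, depth=8):
    """(C, sigma) is an e-trace refinement of (C, Sigma), up to depth."""
    return traces(target, c_t, depth) <= traces(source, c_s, depth)

# Source: two silent steps, then event "a". Target: one silent step, then "a".
source = {0: [(None, 1)], 1: [(None, 2)], 2: [("a", 3)], 3: "done"}
target = {0: [(None, 1)], 1: [("a", 2)], 2: "done"}
print(refines(target, source, 0, 0))   # True: same events, fewer silent steps
```

Since silent steps contribute nothing to traces, the target's shorter internal computation does not matter, which is exactly the intuition behind the e-trace refinement.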


(τ, Σ′) ∈ c Σ  =⇒  (c, Σ) −→ (skip, Σ′)
(e, Σ′) ∈ c Σ  =⇒  (c, Σ) −e→ (skip, Σ′)
abort ∈ c Σ  =⇒  (c, Σ) −→ abort
Σ ∉ dom(c)  =⇒  (c, Σ) −→ (c, Σ)
(skip∥skip, Σ) −→ (skip, Σ)
(C1, Σ) −→ (C1′, Σ′)  =⇒  (C1∥C2, Σ) −→ (C1′∥C2, Σ′)
(C2, Σ) −→ (C2′, Σ′)  =⇒  (C1∥C2, Σ) −→ (C1∥C2′, Σ′)
(C1, Σ) −e→ (C1′, Σ′)  =⇒  (C1∥C2, Σ) −e→ (C1′∥C2, Σ′)
(C2, Σ) −e→ (C2′, Σ′)  =⇒  (C1∥C2, Σ) −e→ (C1∥C2′, Σ′)
(C1, Σ) −→ abort or (C2, Σ) −→ abort  =⇒  (C1∥C2, Σ) −→ abort

Figure 3. Operational Semantics of the High-Level Language (each rule written as premise =⇒ conclusion)

[Figure 4. Related Transitions: (a) α-Related Transitions; (b) The Side Condition of TRANS. Diagrams omitted.]

Before we formally define RGSim in Definition 4, we first introduce the α-related transitions as follows.

Definition 3 (α-Related Transitions).

⟨R, R⟩α ≜ {((σ, σ′), (Σ, Σ′)) | (σ, σ′) ∈ R ∧ (Σ, Σ′) ∈ R ∧ (σ, Σ) ∈ α ∧ (σ′, Σ′) ∈ α}.

⟨R, R⟩α represents a set of the α-related transitions in R and R, putting together the corresponding transitions in R and R that can be related by α, as illustrated in Figure 4(a). ⟨G, G⟩α is defined in the same way.

Definition 4 (RGSim). Whenever (C, σ, R, G) ≼α;γ (C, Σ, R, G), then (σ, Σ) ∈ α and the following are true:

1. if (C, σ) −→ (C′, σ′), then there exist C′ and Σ′ such that (C, Σ) −→∗ (C′, Σ′), ((σ, σ′), (Σ, Σ′)) ∈ ⟨G, G∗⟩α and (C′, σ′, R, G) ≼α;γ (C′, Σ′, R, G);
2. if (C, σ) −e→ (C′, σ′), then there exist C′ and Σ′ such that (C, Σ) −e→∗ (C′, Σ′), ((σ, σ′), (Σ, Σ′)) ∈ ⟨G, G∗⟩α and (C′, σ′, R, G) ≼α;γ (C′, Σ′, R, G);
3. if C = skip, then there exists Σ′ such that (C, Σ) −→∗ (skip, Σ′), ((σ, σ), (Σ, Σ′)) ∈ ⟨G, G∗⟩α, (σ, Σ′) ∈ γ and γ ⊆ α;
4. if (C, σ) −→ abort, then (C, Σ) −→∗ abort;
5. if ((σ, σ′), (Σ, Σ′)) ∈ ⟨R, R∗⟩α, then (C, σ′, R, G) ≼α;γ (C, Σ′, R, G).

Then, (C, R, G) ≼α;ζγ (C, R, G) iff for all σ and Σ, if (σ, Σ) ∈ ζ, then (C, σ, R, G) ≼α;γ (C, Σ, R, G). Here the precondition ζ is used to relate the initial states σ and Σ.
Informally, (C, σ, R, G) ≼α;γ (C, Σ, R, G) says the low-level configuration (C, σ) is simulated by the high-level configuration (C, Σ) with behaviors G and G respectively, no matter how their environments R and R interfere with them. It requires the following hold for every execution of C:

• Starting from α-related states, each step of C corresponds to zero or multiple steps of C, and the resulting states are α-related too. If an external event is produced in the step of C, the same event should be produced by C. We show the simulation diagram with events generated by the program steps in Figure 5(a), where solid lines denote hypotheses and dashed lines denote conclusions, following Leroy's notations [19].

[Figure 5. Simulation Diagrams of RGSim: (a) Program Steps; (b) Environment Steps. Diagrams omitted.]

• The α relation reflects the abstractions from the low-level machine model to the high-level one, and is preserved by the related transitions at the two levels (so it is an invariant). For instance, when verifying a fine-grained implementation of sets, the α relation may relate a concrete representation in memory (e.g., a linked-list) at the low level to the corresponding abstract mathematical set at the high level.

• The corresponding transitions of C and C need to be in ⟨G, G∗⟩α. That is, for each step of C, its state transition should satisfy the guarantee G, and the corresponding transition made by the multiple steps of C should be in the transitive closure of G. The guarantees are abstractions of the programs' behaviors. As we will show later in the PAR rule in Figure 7, they will serve as the rely conditions of the sibling threads at the time of parallel compositions. Note that we do not need each step of C to be in G, although we could do so. This is because we only care about the coarse-grained behaviors (with mumbling) of the source that are used to simulate the target.
We will explain more by the example (4.1) in Section 4.2.

• If C terminates, then C terminates as well, and the final states should be related by the postcondition γ. We require γ ⊆ α, i.e., the final state relation is not weaker than the step invariant.

• C is not safe only if C is not safe either. This means the transformation should not make a safe high-level program unsafe at the low level.

• Whatever the low-level environment R and the high-level one R do, as long as the state transitions are α-related, they should not affect the simulation between C and C, as shown in Figure 5(b). Here a step in R may correspond to zero or multiple steps of R. Note that different from the program steps, for the environment steps we do not require each step of R to correspond to zero or multiple steps of R. On the other hand, only requiring that R be simulated by R is not sufficient for parallel compositionality, which we will explain later in Section 4.2.
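Ignoring the rely/guarantee and invariant components, the step-matching core of Definition 4 is an ordinary weak simulation, which is checkable on finite-state abstractions. The sketch below (our own encoding; the relation `rel` plays the role of the pairs in ≼) checks that every target step is matched by zero or more source steps with the same event:

```python
# A finite-state sketch of the core simulation obligation in Definition 4,
# without relies/guarantees: every target step must be matched by tau* (and
# one identical event, if any) on the source side, ending in related states.

def tau_closure(lts, conf):
    """Configurations reachable by silent (None-labeled) steps."""
    seen, stack = {conf}, [conf]
    while stack:
        c = stack.pop()
        for lab, c2 in lts.get(c, []):
            if lab is None and c2 not in seen:
                seen.add(c2)
                stack.append(c2)
    return seen

def matches(src, c_s, lab):
    """Source configurations reachable via tau* (plus one `lab` step)."""
    if lab is None:
        return tau_closure(src, c_s)
    out = set()
    for c in tau_closure(src, c_s):
        for l2, c2 in src.get(c, []):
            if l2 == lab:
                out |= tau_closure(src, c2)
    return out

def is_weak_sim(tgt, src, rel):
    """Does every pair in rel satisfy the step-matching condition?"""
    return all(any((c2, s2) in rel for s2 in matches(src, c_s, lab))
               for (c_t, c_s) in rel
               for lab, c2 in tgt.get(c_t, []))

# Target: x:=x+2 in one silent step. Source: two silent x:=x+1 steps.
tgt = {"t0": [(None, "t1")], "t1": []}
src = {"s0": [(None, "s1")], "s1": [(None, "s2")], "s2": []}
rel = {("t0", "s0"), ("t1", "s2")}
print(is_weak_sim(tgt, src, rel))   # True
```

The one-target-step-to-many-source-steps asymmetry here is exactly conditions 1 and 2 of the definition; RGSim additionally constrains those transitions by ⟨G, G∗⟩α and quantifies over environment steps.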


InitRelT(ζ) ≜ ∀σ, Σ. σ = T(Σ) =⇒ (σ, Σ) ∈ ζ
B ⇔⇔ B ≜ {(σ, Σ) | B σ = B Σ}        B ∧∧ B ≜ {(σ, Σ) | B σ ∧ B Σ}
Intuit(α) ≜ ∀σ, Σ, σ′, Σ′. (σ, Σ) ∈ α ∧ σ ⊆ σ′ ∧ Σ ⊆ Σ′ =⇒ (σ′, Σ′) ∈ α
η # α ≜ (η ∩ α) ⊆ (η ⊎ α)
β ◦ α ≜ {(σ, Σ) | ∃θ. (σ, θ) ∈ α ∧ (θ, Σ) ∈ β}
α ⊎ β ≜ {(σ1 ⊎ σ2, Σ1 ⊎ Σ2) | (σ1, Σ1) ∈ α ∧ (σ2, Σ2) ∈ β}
Id ≜ {(σ, σ) | σ ∈ LState}        True ≜ {(σ, σ′) | σ, σ′ ∈ LState}
RM isMidOf (α, β; R, R) ≜ ∀σ, σ′, Σ, Σ′. ((σ, σ′), (Σ, Σ′)) ∈ ⟨R, R⟩β◦α =⇒
    ∀θ. (σ, θ) ∈ α ∧ (θ, Σ) ∈ β =⇒
        ∃θ′. ((σ, σ′), (θ, θ′)) ∈ ⟨R, RM⟩α ∧ ((θ, θ′), (Σ, Σ′)) ∈ ⟨RM, R⟩β

Figure 6. Auxiliary Definitions for RGSim

Then based on the simulation, we hide the states by the precondition ζ and define the RGSim relation between programs only. By the definition we know ζ ⊆ α if (C, R, G) ≼α;ζγ (C, R, G), i.e., the precondition needs to be no weaker than the step invariant.

RGSim is sound w.r.t. the e-trace refinement (Definition 2). That is, (C, σ, R, G) ≼α;γ (C, Σ, R, G) ensures that (C, σ) does not have more observable behaviors than (C, Σ).

Theorem 5 (Soundness). If there exist R, G, R, G, α and γ such that (C, σ, R, G) ≼α;γ (C, Σ, R, G), then (C, σ) ⊑ (C, Σ).

The soundness theorem can be proved by first strengthening the relies to the identity transitions and weakening the guarantees to the universal relations. Then we prove that the resulting simulation under identity environments implies the e-trace refinement.

For program transformations, since the initial state for the target program is transformed from the initial state for the source, we use InitRelT(ζ) (defined in Figure 6) to say the transformation T over states ensures the binary precondition ζ.

Corollary 6. If there exist R, G, R, G, α, ζ and γ such that InitRelT(ζ) and (C, R, G) ≼α;ζγ (C, R, G), then C ⊑T C.

4.2 Compositionality Rules

RGSim is compositional w.r.t. various program constructs, including parallel compositions. We present the compositionality rules in Figure 7, which gives us a relational proof method for concurrent program transformations.
As in the R-G logic [17], we require that the pre- and post-conditions be stable under the interference from the environments. Here we introduce the concept of stability of a relation ζ w.r.t. a set of transition pairs Λ ∈ P((LState × LState) × (HState × HState)).

Definition 7 (Stability). Sta(ζ, Λ) holds iff for all σ, σ′, Σ and Σ′, if (σ, Σ) ∈ ζ and ((σ, σ′), (Σ, Σ′)) ∈ Λ, then (σ′, Σ′) ∈ ζ.

Usually we need Sta(ζ, ⟨R, ℛ∗⟩α), which says whenever ζ holds initially and R and ℛ∗ perform related actions, the resulting states still satisfy ζ. By unfolding ⟨R, ℛ∗⟩α, we could see that α itself is stable w.r.t. any α-related transitions, i.e., Sta(α, ⟨R, ℛ∗⟩α).

Another simple example is given below, where both environments could increment x and the unary stable assertion {x ≥ 0} is lifted to the relation ζ:

    ζ ≝ {(σ, Σ) | σ(x) = Σ(x) ≥ 0}        α ≝ {(σ, Σ) | σ(x) = Σ(x)}
    R ≝ {(σ, σ′) | σ′ = σ{x ↦ σ(x)+1}}    ℛ ≝ {(Σ, Σ′) | Σ′ = Σ{x ↦ Σ(x)+1}}

We can prove Sta(ζ, ⟨R, ℛ∗⟩α). Stability of the pre- and post-conditions under the environments' interference is assumed as an implicit side condition at every proof rule in Figure 7, e.g., we assume Sta(ζ, ⟨R, ℛ∗⟩α) in the SKIP rule. We also require implicitly that the relies and guarantees are closed over identity transitions, since stuttering steps will not affect observable event traces.

In Figure 7, the rules SKIP, SEQ, IF and WHILE reveal a high degree of similarity to the corresponding inference rules in Hoare logic. In the SEQ rule, γ serves as the postcondition of C1 and ℂ1 and the precondition of C2 and ℂ2 at the same time. The IF rule requires the boolean conditions of both sides to be evaluated to the same value under the precondition ζ. We give the definitions of the sets B⇔⇔𝔹 and B∧∧𝔹 in Figure 6. The rule also requires the precondition ζ to imply the step invariant α. In the WHILE rule, the γ relation is viewed as a loop invariant preserved at the loop entry point, and needs to ensure B⇔⇔𝔹.
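The stability condition Sta(ζ, ⟨R, ℛ∗⟩α) assumed by these rules can be checked mechanically on a finite model. The following sketch (our own toy encoding, not part of the paper's formal development) represents states as dictionaries and checks the x-incrementing example above by enumeration:

```python
# Finite-model sanity check of Sta(zeta, <R, RR*>_alpha) (Definition 7)
# for the x-incrementing example. The encoding is ours, for illustration.

def zeta(s, S):   return s["x"] == S["x"] >= 0
def alpha(s, S):  return s["x"] == S["x"]
def R(s, s2):     return s2 == {"x": s["x"] + 1}   # low-level environment step
def RR(S, S2):    return S2 == {"x": S["x"] + 1}   # high-level environment step

def RR_star(S, S2, bound=3):
    """Reflexive-transitive closure of RR, explored up to `bound` steps."""
    cur = S
    for _ in range(bound + 1):
        if cur == S2:
            return True
        cur = {"x": cur["x"] + 1}
    return False

def stable(states):
    """Sta(zeta, <R, RR*>_alpha): zeta survives alpha-related env transitions."""
    for s in states:
        for S in states:
            if not (zeta(s, S) and alpha(s, S)):
                continue
            for s2 in states:
                for S2 in states:
                    # a transition pair related before and after (the alpha part)
                    if R(s, s2) and RR_star(S, S2) and alpha(s2, S2):
                        if not zeta(s2, S2):
                            return False
    return True

states = [{"x": n} for n in range(5)]
assert stable(states)
```

The check passes because both environments move x in lockstep, so equality and non-negativity of x are preserved.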
Parallel compositionality. The PAR rule shows parallel compositionality of RGSim. The interference constraints say that two threads can be composed in parallel if one thread's guarantee implies the rely of the other. After parallel composition, they are expected to run in the common environment and their guaranteed behaviors contain each single thread's behaviors.

Note that, although RGSim does not require every step of the high-level program to be in its guarantee (see the first two conditions in Definition 4), this relaxation does not affect the parallel compositionality. This is because the target could have less behaviors than the source. To let C1∥C2 simulate ℂ1∥ℂ2, we only need a subset of the interleavings of C1 and C2 to simulate those of ℂ1 and ℂ2. Thus the high-level relies and guarantees need to ensure the existence of those interleavings only. Below we give a simple example to explain this subtle issue. We can prove

    (x:=x+2, R, G) ≼α;ζ⋉γ (x:=x+1;x:=x+1, ℛ, 𝒢),    (4.1)

where the relies and the guarantees say x can be increased by 2, and α, ζ and γ relate x of the two sides:

    R = G ≝ {(σ, σ′) | σ′ = σ ∨ σ′ = σ{x ↦ σ(x)+2}};
    ℛ = 𝒢 ≝ {(Σ, Σ′) | Σ′ = Σ ∨ Σ′ = Σ{x ↦ Σ(x)+2}};
    α = ζ = γ ≝ {(σ, Σ) | σ(x) = Σ(x)}.

Note that the high-level program is actually finer-grained than its guarantee, but to prove (4.1) we only need the execution in which it goes two steps to the end without interference from its environment. Also we can prove (print(x), R, G) ≼α;ζ⋉γ (print(x), ℛ, 𝒢). Then by the PAR rule, we get

    (x:=x+2∥print(x), R, G) ≼α;ζ⋉γ ((x:=x+1;x:=x+1)∥print(x), ℛ, 𝒢),

which does not violate the natural meaning of refinements. That is, all the possible external events produced by the low-level side can also be produced by the high-level side, although the latter could have more external behaviors due to its finer granularity.

Another subtlety in the RGSim definition is with the fifth condition over the environments, which is crucial for parallel compositionality.
One may think a more natural alternative to this condition is to require that R be simulated by ℛ:

    If (σ, σ′) ∈ R, then there exists Σ′ such that (Σ, Σ′) ∈ ℛ∗ and (C, σ′, R, G) ≼′α;γ (ℂ, Σ′, ℛ, 𝒢).    (4.2)

We refer to this modified simulation definition as ≼′. Unfortunately, ≼′ does not have parallel compositionality. As a counter-example, if the invariant α says the left-side x is not greater than the right-side x and the precondition ζ requires x of the two sides are equal, i.e.,

    α ≝ {(σ, Σ) | σ(x) ≤ Σ(x)}        ζ ≝ {(σ, Σ) | σ(x) = Σ(x)},


(SKIP)
    ζ ⊆ α
    ─────────────────────────────────────
    (skip, R, Id) ≼α;ζ⋉ζ (skip, ℛ, Id)

(SEQ)
    (C1, R, G) ≼α;ζ⋉γ (ℂ1, ℛ, 𝒢)    (C2, R, G) ≼α;γ⋉η (ℂ2, ℛ, 𝒢)
    ─────────────────────────────────────────────────────────────
    (C1; C2, R, G) ≼α;ζ⋉η (ℂ1; ℂ2, ℛ, 𝒢)

(IF)
    (C1, R, G) ≼α;ζ1⋉γ (ℂ1, ℛ, 𝒢)    (C2, R, G) ≼α;ζ2⋉γ (ℂ2, ℛ, 𝒢)
    ζ ⊆ (B⇔⇔𝔹)    ζ1 = (ζ ∩ (B∧∧𝔹))    ζ2 = (ζ ∩ (¬B∧∧¬𝔹))    ζ ⊆ α
    ────────────────────────────────────────────────────────────────
    (if (B) C1 else C2, R, G) ≼α;ζ⋉γ (if 𝔹 then ℂ1 else ℂ2, ℛ, 𝒢)

(WHILE)
    (C, R, G) ≼α;γ1⋉γ (ℂ, ℛ, 𝒢)    γ ⊆ (B⇔⇔𝔹)    γ1 = (γ ∩ (B∧∧𝔹))    γ2 = (γ ∩ (¬B∧∧¬𝔹))
    ─────────────────────────────────────────────────────────────────────────────────────
    (while (B) C, R, G) ≼α;γ⋉γ2 (while 𝔹 do ℂ, ℛ, 𝒢)

(PAR)
    (C1, R1, G1) ≼α;ζ⋉γ1 (ℂ1, ℛ1, 𝒢1)    (C2, R2, G2) ≼α;ζ⋉γ2 (ℂ2, ℛ2, 𝒢2)
    G1 ⊆ R2    G2 ⊆ R1    𝒢1 ⊆ ℛ2    𝒢2 ⊆ ℛ1
    ───────────────────────────────────────────────────────────────────────
    (C1∥C2, R1 ∩ R2, G1 ∪ G2) ≼α;ζ⋉(γ1∩γ2) (ℂ1∥ℂ2, ℛ1 ∩ ℛ2, 𝒢1 ∪ 𝒢2)

(STREN-α)
    (C, R, G) ≼α;ζ⋉γ (ℂ, ℛ, 𝒢)    (ζ ∪ γ) ⊆ α′ ⊆ α    Sta(α′, ⟨G, 𝒢∗⟩α)
    ────────────────────────────────────────────────────────────────────
    (C, R, G) ≼α′;ζ⋉γ (ℂ, ℛ, 𝒢)

(WEAKEN-α)
    (C, R, G) ≼α;ζ⋉γ (ℂ, ℛ, 𝒢)    α ⊆ α′    Sta(α, ⟨R, ℛ∗⟩α′)
    ──────────────────────────────────────────────────────────
    (C, R, G) ≼α′;ζ⋉γ (ℂ, ℛ, 𝒢)

(CONSEQ)
    (C, R, G) ≼α;ζ⋉γ (ℂ, ℛ, 𝒢)    ζ′ ⊆ ζ    γ ⊆ γ′ ⊆ α    R′ ⊆ R    ℛ′ ⊆ ℛ    G ⊆ G′    𝒢 ⊆ 𝒢′
    ──────────────────────────────────────────────────────────────────────────────────────────
    (C, R′, G′) ≼α;ζ′⋉γ′ (ℂ, ℛ′, 𝒢′)

(FRAME)
    (C, R, G) ≼α;ζ⋉γ (ℂ, ℛ, 𝒢)    η ⊆ β    Intuit({α, ζ, γ, β, η, R, ℛ, R1, ℛ1})
    η # {ζ, γ, α}    Sta(η, {⟨G, 𝒢∗⟩α, ⟨R1, ℛ1∗⟩β})
    ───────────────────────────────────────────────────────────────────────────
    (C, R ⊎ R1, G ⊎ G1) ≼α⊎β;(ζ⊎η)⋉(γ⊎η) (ℂ, ℛ ⊎ ℛ1, 𝒢 ⊎ 𝒢1)

(TRANS)
    (C, R, G) ≼α;ζ⋉γ (M, R_M, G_M)    (M, R_M, G_M) ≼β;δ⋉η (ℂ, ℛ, 𝒢)    R_M isMidOf (α, β; R, ℛ∗)
    ───────────────────────────────────────────────────────────────────────────────────────────
    (C, R, G) ≼β∘α;(δ∘ζ)⋉(η∘γ) (ℂ, ℛ, 𝒢)

Figure 7. Compositionality Rules for RGSim

we could prove the following:

    (x:=x+1, Id, True) ≼′α;ζ⋉α (x:=x+2, Id, True);
    (print(x), True, Id) ≼′α;ζ⋉α (print(x), True, Id).

Here we use Id and True (defined in Figure 6) for the sets of identity transitions and arbitrary transitions respectively, and overload the notations at the low level to the high level. However, the following refinement does not hold after parallel composition:

    (x:=x+1∥print(x), Id, True) ≼′α;ζ⋉α (x:=x+2∥print(x), Id, True).

This is because the rely R (or ℛ) is an abstraction of all the permitted behaviors in the environment of a thread. But a concrete sibling thread that runs in parallel may produce less transitions than R (or ℛ). To obtain parallel compositionality, we need to ensure that the simulation holds for all concrete sibling threads.
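Both phenomena can be observed concretely by enumerating the values that print(x) may output against an incrementing sibling thread. The sketch below (our own encoding, not from the paper) checks the refinement direction used in (4.1) and the failure in the counter-example:

```python
# (i) for (4.1)+PAR: x:=x+2 || print(x) prints only values that
#     x:=x+1;x:=x+1 || print(x) can also print;
# (ii) for the counter-example: x:=x+1 || print(x) can print 1, which
#     x:=x+2 || print(x) never prints. Tiny enumeration, our own encoding.

def printable(inc_steps):
    """Values print(x) may output when interleaved with a sibling thread
    performing the given sequence of atomic increments of x, from x = 0."""
    vals, x = {0}, 0          # print may run before any increment...
    for step in inc_steps:
        x += step
        vals.add(x)           # ...or right after each one
    return vals

# (i) target traces are a subset of source traces
assert printable([2]) <= printable([1, 1])
# (ii) the left side has an extra observable behavior
assert 1 in printable([1]) and 1 not in printable([2])
```

Under the overlapping guarantees of (4.1) the extra value 1 of the finer-grained source is harmless (refinement only needs trace inclusion in one direction), while in the counter-example the extra value 1 sits on the target side and breaks the refinement.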
With our definition ≼, the refinement (print(x), True, Id) ≼α;ζ⋉α (print(x), True, Id) is not provable because, after the environments' α-related transitions, the target may print a value smaller than the one printed by the source.

Other rules. We also develop some other useful rules about RGSim. For example, the STREN-α rule allows us to replace the invariant α by a stronger invariant α′. We need to check that α′ is indeed an invariant preserved by the related program steps, i.e., Sta(α′, ⟨G, 𝒢∗⟩α) holds. Symmetrically, the WEAKEN-α rule requires α to be preserved by environment steps related by the weaker invariant α′. As usual, the pre/post conditions, the relies and the guarantees can be strengthened or weakened by the CONSEQ rule.

The FRAME rule allows us to use local specifications. When verifying the simulation between C and ℂ, we need to only talk about the locally-used resource in α, ζ and γ, and the local relies and guarantees R, G, ℛ and 𝒢. Then the proof can be reused in contexts where some extra resource η is used, and the accesses of it respect the invariant β and R1, G1, ℛ1 and 𝒢1. We give the auxiliary definitions in Figure 6. The disjoint union ⊎ between states is lifted to state pairs. An intuitionistic state relation is monotone w.r.t. the extension of states. The disjointness η # α says that any state pair satisfying both η and α can be split into two disjoint state pairs satisfying η and α respectively. For example, let η ≝ {(σ, Σ) | σ(y) = Σ(y)} and α ≝ {(σ, Σ) | σ(x) = Σ(x)}; then both η and α are intuitionistic and η # α holds. We also require η to be stable under interference from the programs (i.e., the programs do not change the extra resource) and the extra environments. We use η # {ζ, γ, α} as a shorthand for (η # ζ) ∧ (η # γ) ∧ (η # α). Similar representations are used in this rule.

Finally, the transitivity rule TRANS allows us to verify a transformation by using an intermediate level as a bridge.
The intermediate environment R_M should be chosen with caution so that the (β ∘ α)-related transitions can be decomposed into β-related and α-related transitions, as illustrated in Figure 4(b). Here ∘ defines the composition of two relations and isMidOf defines the side condition over the environments, as shown in Figure 6. We use θ for a middle-level state.

Soundness of all the rules in Figure 7 is proved in the technical report [20], showing that for each rule the premises imply the conclusion. The proofs are also mechanized in the Coq proof assistant [10].

Instantiations of relies and guarantees. We can derive the sequential refinement and the fully-abstract-semantics-based refinement by instantiating the rely conditions in RGSim. For example, the refinement (4.3) over closed programs assumes identity environments, making the interference constraints in the PAR rule unsatisfiable. This confirms the observation in Section 2.1 that the sequential refinement loses parallel compositionality.

    (C, Id, True) ≼α;ζ⋉γ (ℂ, Id, True)    (4.3)
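The relation composition β ∘ α from Figure 6, on which TRANS relies, is easy to state over finite relations. A minimal sketch, with example relations of our own invention:

```python
# Figure 6's relation composition beta ∘ alpha, used by the TRANS rule to
# relate the low level to the high level through a middle level.
# Relations are finite sets of state pairs; the examples are ours.

def compose(beta, alpha):
    """beta ∘ alpha = {(s, S) | exists t. (s, t) in alpha and (t, S) in beta}"""
    return {(s, S) for (s, t) in alpha for (t2, S) in beta if t == t2}

alpha = {(n, n) for n in range(4)}         # low-to-middle: identity
beta  = {(n, n // 2) for n in range(4)}    # middle-to-high: count in pairs

assert compose(beta, alpha) == {(0, 0), (1, 0), (2, 1), (3, 1)}
assert compose(alpha, alpha) == alpha      # identity is a unit for ∘
```

The isMidOf side condition then asks that every composed environment transition factor through some middle state θ in the same way.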

The refinement (4.4) assumes arbitrary environments, which makes the interference constraints in the PAR rule trivially true. But this assumption is too strong: usually (4.4) cannot be satisfied in practice.

    (C, True, True) ≼α;ζ⋉γ (ℂ, True, True)    (4.4)

4.3 A Simple Example

Below we give a simple example to illustrate the use of RGSim and its parallel compositionality in verifying concurrent program transformations. The high-level program ℂ1∥ℂ2 is transformed to C1∥C2, using a lock l to synchronize the accesses of the shared variable x. We aim to prove C1∥C2 ⊑_T ℂ1∥ℂ2. That is, although x:=x+2 is implemented by two steps of incrementing x in C2, the parallel observer C1 will not print unexpected values. Here we view output events as externally observable behaviors.

    print(x);      ∥    x := x + 2;
                  ⇓
    lock(l);            lock(l);
    print(x);      ∥    x := x+1;
    unlock(l);          x := x+1;
                        ⟨unlock(l); X := x;⟩

To facilitate the proof, we introduce an auxiliary shared variable X at the low level to record the value of x at the time when releasing the lock. It specifies the value of x outside every critical section, thus should match the value of the high-level x after every corresponding action. Here ⟨C⟩ means C is executed atomically.

By the soundness and compositionality of RGSim, we only need to prove simulations over individual threads, providing appropriate relies and guarantees. We first define the invariant α, which only cares about the value of x when the lock is free:

    α ≝ {(σ, Σ) | σ(X) = Σ(x) ∧ (σ(l) = 0 ⟹ σ(x) = σ(X))}.

We let the pre- and post-conditions be α as well.

The high-level threads can be executed in arbitrary environments with arbitrary guarantees: ℛ = 𝒢 ≝ True. The transformation uses the lock to protect every access of x, thus the low-level relies and guarantees are not arbitrary:

    R ≝ {(σ, σ′) | σ(l) = cid ⟹ σ(x) = σ′(x) ∧ σ(X) = σ′(X) ∧ σ(l) = σ′(l)};
    G ≝ {(σ, σ′) | σ′ = σ
            ∨ σ(l) = 0 ∧ σ′ = σ{l ↦ cid}
            ∨ σ(l) = cid ∧ σ′ = σ{x ↦ _}
            ∨ σ(l) = cid ∧ σ′ = σ{l ↦ 0, X ↦ _}}.
Every low-level thread guarantees that it updates x only when the lock is acquired. Its environment cannot update x or l if the current thread holds the lock. Here cid is the identifier of the current thread. When acquired, the lock holds the id of the owner thread.

Following the definition, we can prove (C1, R, G) ≼α;α⋉α (ℂ1, ℛ, 𝒢) and (C2, R, G) ≼α;α⋉α (ℂ2, ℛ, 𝒢). By applying the PAR rule and from the soundness of RGSim (Corollary 6), we know C1∥C2 ⊑_T ℂ1∥ℂ2 holds for any T that respects α.

Perhaps interestingly, if we omit the lock and unlock operations in C1, then C1∥C2 would have more externally observable behaviors than ℂ1∥ℂ2. This does not indicate the unsoundness of our PAR rule (which is sound!). The reason is that x might have different values on the two levels after the environments' α-related transitions, so that we cannot have (print(x), R, G) ≼α;α⋉α (print(x), ℛ, 𝒢) with the current definitions of α, R and G, even though the code of the two sides are syntactically identical.

More discussions. RGSim ensures that the target program preserves safety properties (including the partial correctness) of the source, but allows a terminating source program to be transformed to a target having infinite silent steps. In the above example, this allows the low-level programs to be blocked forever (e.g., at the time when the lock is held but never released by some other thread). Proving the preservation of the termination behavior would require liveness proofs in a concurrent setting (e.g., proving the absence of deadlock), which we leave as future work.

In the next three sections, we show more serious examples to demonstrate the applicability of RGSim.

5. Relational Reasoning about Optimizations

As a general correctness notion of concurrent program transformations, RGSim establishes a relational approach to justify compiler optimizations on concurrent programs. Below we adapt Benton's work [3] on sequential optimizations to the concurrent setting.
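The PAR rule's interference constraint, G of each thread implying the rely of the other, can be checked exhaustively for this example on a tiny finite state space. The encoding below (states as (l, x, X) triples, small value domains) is our own sketch, not the paper's formalization:

```python
from itertools import product

# Check the PAR rule's interference constraints G(1) ⊆ R(2) and G(2) ⊆ R(1)
# for the locking example of Section 4.3, by brute force on (l, x, X) states.
# The encoding and value ranges are ours, for illustration only.

VALS = range(2)
THREADS = (1, 2)

def states():
    return product((0,) + THREADS, VALS, VALS)        # (l, x, X)

def guarantee(cid, s, s2):
    (l, x, X), (l2, x2, X2) = s, s2
    return (s2 == s                                   # stutter
            or (l == 0   and s2 == (cid, x, X))       # acquire the lock
            or (l == cid and (l2, X2) == (l, X))      # update x while holding it
            or (l == cid and l2 == 0 and x2 == x))    # release, record x in X

def rely(cid, s, s2):
    (l, x, X), (l2, x2, X2) = s, s2
    # while cid holds the lock, the environment must not touch x, X or l
    return l != cid or (l2, x2, X2) == (l, x, X)

assert all(rely(2, s, s2)
           for s in states() for s2 in states() if guarantee(1, s, s2))
assert all(rely(1, s, s2)
           for s in states() for s2 in states() if guarantee(2, s, s2))
```

Every non-stuttering guarantee step requires the acting thread to hold the lock (or the lock to be free), so no guarantee step of one thread can fire while the other thread holds it, which is exactly what the other thread's rely forbids.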
5.1 Optimization Rules

Usually optimizations depend on particular contexts, e.g., the assignment x := E can be eliminated only in the context that the value of x is never used after the assignment. In a shared-state concurrent setting, we should also consider the parallel context for an optimization. RGSim enables us to specify various sophisticated requirements for the parallel contexts by rely/guarantee conditions. Based on RGSim, we provide a set of inference rules to characterize and justify common optimizations (e.g., dead code elimination) with information of both the sequential and the parallel contexts. Due to the space limit, we only present some interesting rules here and leave other rules in the technical report [20]. Note in this section the target and the source programs are in the same language.

Sequential skip Law

    (C1, R1, G1) ≼α;ζ⋉γ (C2, R2, G2)
    ─────────────────────────────────────────
    (skip; C1, R1, G1) ≼α;ζ⋉γ (C2, R2, G2)

    (C1, R1, G1) ≼α;ζ⋉γ (C2, R2, G2)
    ─────────────────────────────────────────
    (C1, R1, G1) ≼α;ζ⋉γ (skip; C2, R2, G2)

Plus the variants with skip after the code C1 or C2. That is, skips could be arbitrarily introduced and eliminated.

Common Branch

    ∀σ1, σ2. (σ1, σ2) ∈ ζ ⟹ ⟦B⟧σ2 ≠ ⊥
    (C, R, G) ≼α;ζ1⋉γ (C1, R′, G′)    ζ1 = (ζ ∩ (true∧∧B))
    (C, R, G) ≼α;ζ2⋉γ (C2, R′, G′)    ζ2 = (ζ ∩ (true∧∧¬B))
    ─────────────────────────────────────────────────────────
    (C, R, G) ≼α;ζ⋉γ (if (B) C1 else C2, R′, G′)

This rule says that, when the if-condition can be evaluated and both branches can be optimized to the same code C, we can transform the whole if-statement to C without introducing new behaviors.

Dead While

    ζ = (ζ ∩ (true∧∧¬B))    ζ ⊆ α    Sta(ζ, ⟨R1, R2∗⟩α)
    ─────────────────────────────────────────────────────
    (skip, R1, Id) ≼α;ζ⋉ζ (while (B){C}, R2, Id)

We can eliminate the loop, if the loop condition is false (no matter how the environments update the states) at the loop entry point.

Dead Code Elimination

    (skip, Id, Id) ≼α;ζ⋉γ (C, Id, G)    Sta({ζ, γ}, ⟨R1, R2∗⟩α)
    ─────────────────────────────────────────────────────────────
    (skip, R1, Id) ≼α;ζ⋉γ (C, R2, G)

Intuitively (skip, Id, Id) ≼α;ζ⋉γ (C, Id, G) says that the code C can be eliminated in a sequential context where the initial and the final states satisfy ζ and γ respectively.
If both ζ and γ are stable w.r.t. the interference from the environments R1 and R2, then the code C can be eliminated in such a parallel context as well.
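The parallel lifting can be illustrated on a finite model. In the sketch below (states as (x, t, i) triples; the whole encoding is ours), the eliminated code is t := x + 1, γ ignores the dead variable t, and the environments R1 = R2 may only change i, so ζ and γ are stable and the elimination stays sound under interference:

```python
# Dead Code Elimination on a finite model: C is "t := x + 1"; gamma ignores
# t (t is dead), and the environments may only update i. From any
# zeta-related start, skip on one side and C on the other, with environment
# steps in between, end in gamma-related states. Toy encoding, ours.

def zeta(s1, s2):  return s1[0] == s2[0]       # same x
def gamma(s1, s2): return s1[0] == s2[0]       # same x; t unconstrained

def C(s):                                      # the eliminated code: t := x+1
    x, t, i = s
    return (x, x + 1, i)

def env(s, di):                                # R1 = R2: may update only i
    return (s[0], s[1], s[2] + di)

assert all(gamma(env(s1, d1), C(env(s2, d2)))  # skip vs C, under interference
           for x in range(3)
           for t1 in range(3) for t2 in range(3)
           for d1 in range(2) for d2 in range(2)
           for s1 in [(x, t1, 0)] for s2 in [(x, t2, 0)]
           if zeta(s1, s2))
```

If instead the environments could write x, ζ and γ would not be stable and the same check would fail, matching the side condition Sta({ζ, γ}, ⟨R1, R2∗⟩α) of the rule.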

Redundancy Introduction ADD(e): RMV(e) 0 atom 0 atom (c,Id,g)(skip,Id,Id)Sta(,},(R1,R)) S SU[e}; S:=S-{e}; (c,R1,G)a:x7(skip,R2,Id) As we lifted sequential dead code elimination,we can also lift se- (a)An Abstract Set quential redundant code introduction to the concurrent setting,so long as the pre-and post-conditions are stable w.rt.the environ- add(e): rmv(e): ments.Note that here c is a single instruction,because we should local x,y,z,ui local x,y,z,v; 0 0 consider the interference from the environments at every interme- 1 1 lock(x); lock(x); diate state when introducing a sequence of redundant instructions. 3 5.2 An Example of Invariant Hoisting while (u 9 tion variable elimination)can be found in the technical report [20] } if (v=e){ Target Code (C1) Source Code(C) 10 if (u !e){ 10 11 lock(y); local t; local t; 11 y :=new(); 12 t:=x+1; while(i n){ 12 y.1ock:=0; hi1e(1 y.data :=e; 14 unlock(x); 1:=1+t: 1:=1+t; 14 y.next :=2; 2 6 free(y); y else When we do not care about the final value of t,it's not diffi- 16 16 unlock(x); unlock(x); cult to prove that the optimized code Ci preserves the sequential y behaviors of the source C [3].But in a concurrent setting,safely (b)The Lock-Coupling List-Based Set hoisting the invariant code t:=x+1 also requires that the environ- ment should not update x nor t. Figure 8.The Set Object R(a')a(x)=0'(x)Ao(t)=a'(t)}. The guarantee of the program can be specified as arbitrary transi- of implementations of the object:we can define abstract atomic tions.Since we only care about the values of i,n and x,the invari- operations in a high-level language as specifications,and prove ant relation o can be defined as: the concrete fine-grained implementations refine the correspond- a{(a1,)|a1(1)=a(1)A1()=()Ao1(x)=a(x}. 
ing atomic operations when executed in appropriate environments For instance,in Figure 8(a)we define two atomic set operations We do not need special pre-and post-conditions,thus the correct- ADD(e)and RMV(e).Figure 8(b)gives a concrete implementation ness of the optimization is formalized as follows: of the set object using a lock-coupling list.Partial correctness and (C1,R,True):x(C,R,True). atomicity of the algorithm has been verified before [28.29].Here (5.1) we show that its atomicity can also be verified using our RGSim We could prove (5.1)directly by the RGSim definition and the by proving the low-level methods refine the corresponding abstract operations.We will discuss the key difference between the previous operational semantics of the code.But below we give a more con- proofs and ours in Section 8. venient proof using the optimization rules and the compositional- ity rules instead.We first prove the following by the Dead-Code- We first take the generic languages in Figure 2,and instantiate Elimination and Redundancy-Introduction rules: the high-level program states below. (t:=x+1,R,True):x(skip,R,True); (HMem)Ms,M E (LocU PVar)-HVal (skip,R,True):7(t:=x+1,R,True), (HThrds) Π ∈ThrdID→HMem where y and n specify the states at the specific program points: (HState) ∈HThrds x HMem Yan{(a1,o)|a1(t)=1(x)+1} The state consists of shared memory Ms(where the object resides) n yn{(a1,a)la(t)=a(x)+1}. and a thread pool II,which is a mapping from thread identifiers (t ThrdlD)to their memory M.The low-level state is defined After adding skips to Ci and C to make them the same "shape" similarly.We use m.m and to represent the low-level shared we can prove the simulation by the compositionality rules SEQ and memory,thread-local memory and the thread pool respectively. WHILE.Finally,we remove all the skips and conclude (5.1),i.e. 
the correctness of the optimization in appropriate contexts.Since To allow ownership transfer between the shared memory and the relies only prohibit updates of x and t,we can execute Ci and thread-local memory,we use atomfC (or (C).A at the low level) C concurrently with other threads which update i and n or read x, to convert the shared memory to local and then execute C (or still ensuring semantics preservation. C)atomically.Following RGSep [29],an abstract transition A E P(HMem×HMem)(orA∈P(LMem×LMem)is used to specify 6.Proving Atomicity of Concurrent Objects the effects of the atomic operation over the shared memory,which allows us to split the resulting state back into shared and local when A concurrent object provides a set of methods,which can be called we exit the atomic block.The atomic blocks are instantiations of the in parallel by clients as the only way to access the object.RGSim generic primitive operations c(or c)in Figure 2.Their semantics is gives us a refinement-based proof method to verify the atomicity shown in the technical report [20].We omit the annotations A and 463

Redundancy Introduction (c, Id, G) α;ζγ (skip, Id, Id) Sta({ζ,γ}, R1, R∗ 2α) (c, R1, G) α;ζγ (skip, R2, Id) As we lifted sequential dead code elimination, we can also lift se￾quential redundant code introduction to the concurrent setting, so long as the pre- and post-conditions are stable w.r.t. the environ￾ments. Note that here c is a single instruction, because we should consider the interference from the environments at every interme￾diate state when introducing a sequence of redundant instructions. 5.2 An Example of Invariant Hoisting With these rules, we can prove the correctness of many traditional compiler optimizations performed on concurrent programs in ap￾propriate contexts. Here we only give a small example of hoisting loop invariants. More examples (e.g., strength reduction and induc￾tion variable elimination) can be found in the technical report [20]. Target Code (C1) local t; t := x + 1; while(i 1 lock(x); 2 3 4 while (u 9 } 10 if (u != e) { 11 y := new(); 12 y.lock := 0; 13 y.data := e; 14 y.next := z; 15 } 16 unlock(x); local x,y,z,v; 0 1 lock(x); 2 3 4 while (v 9 } 10 if (v = e) { 11 lock(y); 12 13 14 unlock(x); 15 free(y); } else { 16 unlock(x); } (b) The Lock-Coupling List-Based Set Figure 8. The Set Object of implementations of the object: we can define abstract atomic operations in a high-level language as specifications, and prove the concrete fine-grained implementations refine the correspond￾ing atomic operations when executed in appropriate environments. For instance, in Figure 8(a) we define two atomic set operations, ADD(e) and RMV(e). Figure 8(b) gives a concrete implementation of the set object using a lock-coupling list. Partial correctness and atomicity of the algorithm has been verified before [28, 29]. Here we show that its atomicity can also be verified using our RGSim by proving the low-level methods refine the corresponding abstract operations. We will discuss the key difference between the previous proofs and ours in Section 8. 
We first take the generic languages in Figure 2, and instantiate the high-level program states below.

    (HMem)    Ms, Ml ∈ (Loc ∪ PVar) ⇀ HVal
    (HThrds)  Π ∈ ThrdID → HMem
    (HState)  Σ ∈ HThrds × HMem

The state consists of shared memory Ms (where the object resides) and a thread pool Π, which is a mapping from thread identifiers (t ∈ ThrdID) to their local memory Ml. The low-level state σ is defined similarly. We use ms, ml and π to represent the low-level shared memory, thread-local memory and the thread pool respectively. To allow ownership transfer between the shared memory and thread-local memory, we use atom{C}A (or ⟨C⟩A at the low level) to convert the shared memory to local and then execute C (or C) atomically. Following RGSep [29], an abstract transition A ∈ P(HMem × HMem) (or A ∈ P(LMem × LMem)) is used to specify the effects of the atomic operation over the shared memory, which allows us to split the resulting state back into shared and local when we exit the atomic block. The atomic blocks are instantiations of the generic primitive operations c (or c) in Figure 2. Their semantics is shown in the technical report [20]. We omit the annotations A and

ms |= list(x, A) ≜ (ms = ∅ ∧ x = null ∧ A = ε)
                 ∨ (∃ms′, v, y, A′. ms = ms′ ⊎ {x ↦ (_, v, y)} ∧ A = v::A′ ∧ ms′ |= list(y, A′))

shared_map(ms, Ms) ≜ ∃ms′, A, x. ms = ms′ ⊎ {Head ↦ x} ∧ (ms′ |= list(x, MIN_VAL::A::MAX_VAL))
                     ∧ sorted(A) ∧ (elems(A) = Ms(S))

local_map(ml, Ml) ≜ ml(e) = Ml(e) ∧ ∃ml′. ml = ml′ ⊎ {x ↦ _, y ↦ _, z ↦ _, u ↦ _, v ↦ _}

α ≜ {((π, ms), (Π, Ms)) | shared_map(ms, Ms) ∧ ∀t ∈ dom(Π). local_map(π(t), Π(t))}

Low-level guarantees and relies:

Glock(t)   ≜ {((π, ms), (π, ms′)) | ∃x, v, y. ms(x) = (0, v, y) ∧ ms′ = ms{x ↦ (t, v, y)}}
Gunlock(t) ≜ {((π, ms), (π, ms′)) | ∃x, v, y. ms(x) = (t, v, y) ∧ ms′ = ms{x ↦ (0, v, y)}}
Gadd(t)    ≜ {((π ⊎ {t ↦ ml}, ms), (π ⊎ {t ↦ ml′}, ms′)) | ∃x, y, z, u, v, w.
                ms(x) = (t, u, z) ∧ ms(z) = (_, w, _)
                ∧ ms′ = ms{x ↦ (t, u, y)}{y ↦ (0, v, z)} ∧ (ml′ ⊎ {y ↦ (0, v, z)} = ml) ∧ u < v < w}
Grmv(t)    ≜ {((π ⊎ {t ↦ ml}, ms), (π ⊎ {t ↦ ml′}, ms′)) | ∃x, y, z, u, v.
                ms(x) = (t, u, y) ∧ ms(y) = (t, v, z)
                ∧ ms′ ⊎ {y ↦ (t, v, z)} = ms{x ↦ (t, u, z)} ∧ ml′ = ml ⊎ {y ↦ (t, v, z)} ∧ v < MAX_VAL}
Glocal(t)  ≜ {((π ⊎ {t ↦ ml}, ms), (π ⊎ {t ↦ ml′}, ms)) | π ∈ (ThrdID → LMem) ∧ ml, ml′, ms ∈ LMem}
G(t) ≜ Glock(t) ∪ Gunlock(t) ∪ Gadd(t) ∪ Grmv(t) ∪ Glocal(t)        R(t) ≜ ⋃t′≠t G(t′)

High-level (source-level) guarantees and relies:

Gadd(t)   ≜ {((Π ⊎ {t ↦ Ml}, Ms), (Π ⊎ {t ↦ Ml′}, Ms′)) | ∃e. Ms′ = Ms{S ↦ Ms(S) ∪ {e}}}
Grmv(t)   ≜ {((Π ⊎ {t ↦ Ml}, Ms), (Π ⊎ {t ↦ Ml′}, Ms′)) | ∃e. Ms′ = Ms{S ↦ Ms(S) − {e}}}
Glocal(t) ≜ {((Π ⊎ {t ↦ Ml}, Ms), (Π ⊎ {t ↦ Ml′}, Ms)) | Π ∈ (ThrdID → HMem) ∧ Ml, Ml′, Ms ∈ HMem}
G(t) ≜ Gadd(t) ∪ Grmv(t) ∪ Glocal(t)        R(t) ≜ ⋃t′≠t G(t′)

Figure 9. Useful Definitions for the Lock-Coupling List

A in Figure 8, which are the same as the corresponding guarantees in Figure 9, as we will explain below.

In Figure 8, the abstract set is implemented by an ordered singly-linked list pointed to by a shared variable Head, with two sentinel nodes at the two ends of the list containing the values MIN_VAL and MAX_VAL respectively. Each list node is associated with a lock. Traversing the list uses "hand-over-hand" locking: the lock on one node is not released until its successor is locked.
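To make the representation predicate concrete, here is a hedged sketch using our own finite-map encoding (not the paper's formalism): memory is a dict from locations to (lock, value, next) triples, and we check ms |= list(x, A) and the shared_map relation on a small example.

```python
MIN_VAL, MAX_VAL = float("-inf"), float("inf")

def holds_list(ms, x, A):
    """ms |= list(x, A): the cells of ms, each used exactly once, spell
    out the value sequence A along the next-pointers starting at x."""
    if x is None:                       # null pointer: empty memory, empty sequence
        return ms == {} and A == []
    if x not in ms or not A:
        return False
    _, v, nxt = ms[x]
    rest = {l: c for l, c in ms.items() if l != x}   # disjoint split of ms
    return v == A[0] and holds_list(rest, nxt, A[1:])

def read_seq(ms, x):
    """Read off the value sequence reachable from x."""
    seq = []
    while x is not None:
        _, v, x = ms[x]
        seq.append(v)
    return seq

def shared_map(ms, S):
    """The concrete shared memory represents the abstract set S: a sorted
    list between MIN_VAL/MAX_VAL sentinels, reachable from Head."""
    m = dict(ms)
    x = m.pop("Head")
    seq = read_seq(m, x)
    body = seq[1:-1]                    # drop the two sentinels
    return (seq[0] == MIN_VAL and seq[-1] == MAX_VAL
            and body == sorted(body) and set(body) == S)

ms = {"Head": "a",
      "a": (0, MIN_VAL, "b"),
      "b": (0, 3, "c"),
      "c": (0, 8, "d"),
      "d": (0, MAX_VAL, None)}
cells = {l: c for l, c in ms.items() if l != "Head"}
assert holds_list(cells, "a", [MIN_VAL, 3, 8, MAX_VAL])
assert shared_map(ms, {3, 8}) and not shared_map(ms, {3})
```

The 0 in the first component of each triple plays the role of an unheld lock, matching the use of 0 in Glock and Gunlock above.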
add(e) inserts a new node with value e in the appropriate position while holding the lock of its predecessor. rmv(e) redirects the predecessor's pointer while both the node to be removed and its predecessor are locked.

We define the α relation, the guarantees and the relies in Figure 9. The predicate ms |= list(x, A) represents a singly-linked list in the shared memory ms at the location x, whose values form the sequence A. Then the mapping shared_map between the low-level and the high-level shared memory is defined by considering only the value sequence of the list: the concrete list should be sorted and its elements constitute the abstract set. For a thread t's local memory at the two levels, we require that the values of e are the same and enough local space is provided for add(e) and rmv(e), as defined in the mapping local_map. Then α relates the shared memory by shared_map and the local memory of each thread t by local_map.

The atomic actions of the algorithm are specified by Glock, Gunlock, Gadd, Grmv and Glocal respectively, which are all parameterized with a thread identifier t. For example, Grmv(t) says that when holding the locks of the node y and its predecessor x, we can transfer the node y from the shared memory to the thread's local memory. This corresponds to the action performed by the code of line 13 in rmv(e). Every thread t is executed in the environment that any other thread t′ can only perform those five actions, as defined in R(t). Similarly, the high-level G(t) and R(t) are defined according to the abstract ADD(e) and RMV(e). The relies and guarantees are almost the same as those in the proofs in RGSep [28]. We can prove that for any thread t, the following hold:

    (t.add(e), R(t), G(t)) ⪯α;α≻α (t.ADD(e), R(t), G(t));
    (t.rmv(e), R(t), G(t)) ⪯α;α≻α (t.RMV(e), R(t), G(t)).

We give the detailed proofs in the technical report [20], which are done operationally based on the definition of RGSim.
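The protocol that Glock, Gunlock, Gadd and Grmv describe can be sketched as runnable code. This is our own rendering with our own names, not the paper's Figure 8 listing, and Python's garbage collector stands in for the explicit free in rmv(e):

```python
import threading

MIN_VAL, MAX_VAL = float("-inf"), float("inf")

class Node:
    def __init__(self, val, nxt=None):
        self.val, self.next = val, nxt
        self.lock = threading.Lock()    # per-node lock (0 means unheld in the paper)

class LockCouplingSet:
    def __init__(self):
        # Two sentinel nodes holding MIN_VAL and MAX_VAL.
        self.head = Node(MIN_VAL, Node(MAX_VAL))

    def _locate(self, e):
        # Hand-over-hand locking: a node's lock is released only after
        # its successor's lock has been acquired.
        pred = self.head
        pred.lock.acquire()
        curr = pred.next
        curr.lock.acquire()
        while curr.val < e:
            pred.lock.release()
            pred, curr = curr, curr.next
            curr.lock.acquire()
        return pred, curr               # both still locked; curr.val >= e

    def add(self, e):
        pred, curr = self._locate(e)
        try:
            if curr.val == e:
                return False            # already present
            pred.next = Node(e, curr)   # insert while holding pred's lock
            return True
        finally:
            curr.lock.release()
            pred.lock.release()

    def rmv(self, e):
        pred, curr = self._locate(e)
        try:
            if curr.val != e:
                return False
            pred.next = curr.next       # unlink while both locks are held
            return True
        finally:
            curr.lock.release()
            pred.lock.release()

s = LockCouplingSet()
assert s.add(3) and s.add(1) and not s.add(3)
assert s.rmv(3) and not s.rmv(2)
```

Each acquire, release, insertion and unlinking step in this sketch is an instance of one of the guarantee actions Glock(t), Gunlock(t), Gadd(t) and Grmv(t) of Figure 9.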
By the compositionality and the soundness of RGSim, we know that the fine-grained operations (under the parallel environment R) are simulated by the corresponding atomic operations (under the high-level environment R), while R and R say all accesses to the set must be done through the add and remove operations. This gives us the atomicity of the concurrent implementation of the set object.

More examples. In the technical report [20], we also show the use of RGSim to prove the atomicity of other fine-grained algorithms, including the non-blocking concurrent counter [27], Treiber's stack algorithm [26], and a concurrent GCD algorithm (calculating greatest common divisors).

7. Verifying Concurrent Garbage Collectors

In this section, we explain in detail how to reduce the problem of verifying concurrent garbage collectors to transformation verification, and use RGSim to develop a general GC verification framework. We apply the framework to prove the correctness of the Boehm et al. concurrent GC algorithm [7].

7.1 Correctness of Concurrent GCs

A concurrent GC is executed by a dedicated thread and performs the collection work in parallel with user threads (mutators), which access the shared heap via read, write and allocation operations. To ensure that the GC and the mutators share a coherent view of the heap, the heap operations from mutators may be instrumented with extra operations, which provide an interaction mechanism to allow arbitrary mutators to cooperate with the GC. These instrumented heap operations are called barriers (e.g., read barriers, write barriers and allocation barriers).

The GC thread and the barriers constitute a concurrent garbage collecting system, which provides a higher-level user-friendly programming model for garbage-collected languages (e.g., Java). In this high-level model, programmers feel they access the heap using regular memory operations, and are freed from manually disposing objects that are no longer in use.
They do not need to consider the implementation details of the GC and the existence of barriers.
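As a concrete illustration of what a barrier is, here is a hedged sketch of a write barrier in the style of mark-sweep collectors that track dirty objects; the dirty set and all names below are our own invention, not Boehm et al.'s actual code:

```python
import threading

dirty = set()                   # objects the concurrent collector must re-scan
_dirty_lock = threading.Lock()

class Obj:
    """A toy heap object with a single pointer field."""
    def __init__(self):
        self.f = None

def write_barrier(obj, field, value):
    """Instrumented mutator write: record obj as dirty so a concurrent
    collector can re-scan it, then perform the raw heap write."""
    with _dirty_lock:
        dirty.add(id(obj))
    setattr(obj, field, value)

a, b = Obj(), Obj()
write_barrier(a, "f", b)        # replaces the plain write a.f = b
assert a.f is b and id(a) in dirty
```

The compiler of a garbage-collected language would emit such instrumentation automatically, which is exactly why the GC-plus-barriers system can be viewed as a program transformation and verified with RGSim.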
