Redundancy Introduction

    (c, Id, G) ⪯α;ζ⋉γ (skip, Id, Id)    Sta({ζ, γ}, ⟨R1, R2*⟩α)
    ------------------------------------------------------------
    (c, R1, G) ⪯α;ζ⋉γ (skip, R2, Id)

As we lifted sequential dead code elimination, we can also lift sequential redundant code introduction to the concurrent setting, so long as the pre- and post-conditions are stable w.r.t. the environments. Note that here c is a single instruction, because we should consider the interference from the environments at every intermediate state when introducing a sequence of redundant instructions.

5.2 An Example of Invariant Hoisting

With these rules, we can prove the correctness of many traditional compiler optimizations performed on concurrent programs in appropriate contexts. Here we only give a small example of hoisting loop invariants. More examples (e.g., strength reduction and induction variable elimination) can be found in the technical report [20].

    Target Code (C1)              Source Code (C)
      local t;                      local t;
      t := x + 1;          ⇐        while (i < n) {
      while (i < n) {                 t := x + 1;
        i := i + t;                   i := i + t;
      }                             }

When we do not care about the final value of t, it's not difficult to prove that the optimized code C1 preserves the sequential behaviors of the source C [3]. But in a concurrent setting, safely hoisting the invariant code t := x + 1 also requires that the environment should not update x nor t.
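To see why this requirement matters, consider a hypothetical interleaving (our own illustration, not from the paper) in which an environment thread performs a single x := x + 1 between the two loop iterations. Starting from x = 0, i = 0 and n = 2:

    C1:  t := x+1;        (t = 1)      C:  t := x+1;        (t = 1)
         i := i+t;        (i = 1)          i := i+t;        (i = 1)
         [env: x := x+1]  (x = 1)          [env: x := x+1]  (x = 1)
         i := i+t;        (i = 2)          t := x+1;        (t = 2)
                                           i := i+t;        (i = 3)

C1 terminates with i = 2 but C with i = 3, so the two sides disagree on the observable variable i and no simulation can hold. The rely condition defined next forbids exactly such environments.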
The rely condition of both programs can thus be defined as:

    R ≜ {(σ, σ′) | σ(x) = σ′(x) ∧ σ(t) = σ′(t)} .

The guarantee of the program can be specified as arbitrary transitions. Since we only care about the values of i, n and x, the invariant relation α can be defined as:

    α ≜ {(σ1, σ) | σ1(i) = σ(i) ∧ σ1(n) = σ(n) ∧ σ1(x) = σ(x)} .

We do not need special pre- and post-conditions, thus the correctness of the optimization is formalized as follows:

    (C1, R, True) ⪯α;α⋉α (C, R, True) .                          (5.1)

We could prove (5.1) directly by the RGSim definition and the operational semantics of the code. But below we give a more convenient proof using the optimization rules and the compositionality rules instead. We first prove the following by the Dead-Code-Elimination and Redundancy-Introduction rules:

    (t := x+1, R, True) ⪯α;α⋉γ (skip, R, True) ;
    (skip, R, True) ⪯α;γ⋉η (t := x+1, R, True) ,

where γ and η specify the states at the specific program points:

    γ ≜ α ∩ {(σ1, σ) | σ1(t) = σ1(x) + 1} ;
    η ≜ γ ∩ {(σ1, σ) | σ(t) = σ(x) + 1} .

After adding skips to C1 and C to make them the same "shape" (as sketched below), we can prove the simulation by the compositionality rules SEQ and WHILE. Finally, we remove all the skips and conclude (5.1), i.e., the correctness of the optimization in appropriate contexts. Since the relies only prohibit updates of x and t, we can execute C1 and C concurrently with other threads which update i and n or read x, still ensuring semantics preservation.
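The paper does not display the padded programs; the alignment below is one plausible choice. The head statements are related by the Redundancy-Introduction instance above, the first statements of the two loop bodies by the Dead-Code-Elimination instance, and i := i + t is related to itself; SEQ and WHILE then compose these pieces into a simulation between the whole programs.

    C1 (padded)                   C (padded)
      local t;                      local t;
      t := x + 1;                   skip;
      while (i < n) {               while (i < n) {
        skip;                         t := x + 1;
        i := i + t;                   i := i + t;
      }                             }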
6. Proving Atomicity of Concurrent Objects

A concurrent object provides a set of methods, which can be called in parallel by clients as the only way to access the object. RGSim gives us a refinement-based proof method to verify the atomicity of implementations of the object: we can define abstract atomic operations in a high-level language as specifications, and prove that the concrete fine-grained implementations refine the corresponding atomic operations when executed in appropriate environments. For instance, in Figure 8(a) we define two atomic set operations, ADD(e) and RMV(e). Figure 8(b) gives a concrete implementation of the set object using a lock-coupling list. Partial correctness and atomicity of the algorithm have been verified before [28, 29]. Here we show that its atomicity can also be verified using our RGSim, by proving the low-level methods refine the corresponding abstract operations. We will discuss the key difference between the previous proofs and ours in Section 8.

    ADD(e):                          RMV(e):
     0 atom{ S := S ∪ {e}; }          0 atom{ S := S − {e}; }

    (a) An Abstract Set

    add(e):                          rmv(e):
        local x,y,z,u;                   local x,y,z,v;
     0  <x := Head;>                  0  <x := Head;>
     1  lock(x);                      1  lock(x);
     2  <z := x.next;>                2  <y := x.next;>
     3  <u := z.data;>                3  <v := y.data;>
     4  while (u < e) {               4  while (v < e) {
     5    lock(z);                    5    lock(y);
     6    unlock(x);                  6    unlock(x);
     7    x := z;                     7    x := y;
     8    <z := x.next;>              8    <y := x.next;>
     9    <u := z.data;>              9    <v := y.data;>
        }                                }
    10  if (u != e) {                10  if (v = e) {
    11    y := new();                11    lock(y);
    12    y.lock := 0;               12    <z := y.next;>
    13    y.data := e;               13    <x.next := z;>
    14    y.next := z;               14    unlock(x);
    15    <x.next := y;>             15    free(y);
        }                                } else {
    16  unlock(x);                   16    unlock(x);
                                         }

    (b) The Lock-Coupling List-Based Set

    Figure 8. The Set Object
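Read as a refinement, atomicity means no client can tell the fine-grained methods apart from their atomic specifications. Schematically (our illustration, not a statement taken from the paper), for a client such as

    add(3) ∥ rmv(3)        versus        ADD(3) ∥ RMV(3)

every observable behavior of the lock-coupling program on the left should also be a behavior of the atomic program on the right, provided each thread runs under the environments assumed in the proof.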
We first take the generic languages in Figure 2, and instantiate the high-level program states below.

    (HMem)    Ms, Ml ∈ (Loc ∪ PVar) ⇀ HVal
    (HThrds)  Π ∈ ThrdID → HMem
    (HState)  Σ ∈ HThrds × HMem

The state consists of shared memory Ms (where the object resides) and a thread pool Π, which is a mapping from thread identifiers (t ∈ ThrdID) to their thread-local memory Ml. The low-level state σ is defined similarly. We use ms, ml and π to represent the low-level shared memory, thread-local memory and thread pool respectively.
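For a concrete feel (a hypothetical snapshot, not taken from the paper), a high-level shared memory for the object might store the abstract set in a single variable, Ms = {S ↝ {3, 5}}, while a corresponding low-level shared memory lays the same set out as a sorted linked list with the node fields of Figure 8(b), here assuming sentinel head and tail nodes with values MIN and MAX:

    ms = { Head ↝ ℓ0,
           ℓ0.lock ↝ 0, ℓ0.data ↝ MIN, ℓ0.next ↝ ℓ1,
           ℓ1.lock ↝ 0, ℓ1.data ↝ 3,   ℓ1.next ↝ ℓ2,
           ℓ2.lock ↝ 0, ℓ2.data ↝ 5,   ℓ2.next ↝ ℓ3,
           ℓ3.lock ↝ 0, ℓ3.data ↝ MAX, ℓ3.next ↝ null }

The refinement proof relates such pairs of shared memories.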
To allow ownership transfer between the shared memory and thread-local memory, we use atom{C}A (or ⟨C⟩A at the low level) to convert the shared memory to local and then execute the block body atomically. Following RGSep [29], an abstract transition A ∈ P(HMem × HMem) (or A ∈ P(LMem × LMem) at the low level) is used to specify the effects of the atomic operation over the shared memory, which allows us to split the resulting state back into shared and local when we exit the atomic block. The atomic blocks are instantiations of the generic primitive operations of the two languages in Figure 2. Their semantics is shown in the technical report [20]. We omit the annotations A and
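To illustrate how such an annotation reads, here is one plausible instance (our sketch, not the paper's definition) for the abstract ADD(e) of Figure 8(a):

    ADD(e):  atom{ S := S ∪ {e}; }A    where    A ≜ {(Ms, Ms′) | Ms′ = Ms{S ↝ Ms(S) ∪ {e}}}

The annotation declares that the net effect of the block on the shared memory is exactly to add e to the abstract set; after the block, the resulting state can be split back into shared and local parts.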