Li L,Zhang Y,Chen YY et al.Certifying concurrent programs using transactional memory.JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 24(1):110-121 Jan.2009 Certifying Concurrent Programs Using Transactional Memory Long Li(李隆),Yu Zhang(张昱),Yi-Yun Chen(陈意云),and Yong Li(李勇) Department of Computer Science and Technology,University of Science and Technology of China,Hefei 230027,China Software Security Laboratory,Suzhou Institute for Advanced Study,University of Science and Technology of China Suzhou 215123,China E-mail:fliwis,liyong}@mail.ustc.edu.cn:fyuzhang,yiyun@ustc.edu.cn Received December 19,2007;revised June 23,2008 Abstract Transactional memory (TM)is a new promising concurrency-control mechanism that can avoid many of the pitfalls of the traditional lock-based techniques.TM systems handle data races between threads automatically so that programmers do not have to reason about the interaction of threads manually.TM provides a programming model that may make the development of multi-threaded programs easier.Much work has been done to explore the various implementation strategies of TM systems and to achieve better performance,but little has been done on how to formally reason about programs using TM and how to make sure that such reasoning is sound.In this paper,we focus on the semantics of transactional memory and present a proof-carrying code (PCC)system for reasoning about programs using TM.We formalize our reasoning with respect to the TM semantics,prove its soundness,and use examples to demonstrate its effectiveness. Keywords program verification,transactional memory,proof-carrying code,concurrent program safety 1 Introduction performance issues of these implementations have also been discussed.And there has also been some work The multicore architectures.which provide infras- that explores the formalization of the semantics of tructures of better performance,have become popu- TM[6.11-151.But little has been done on how to for- lar.To benefit from this growth in performance,pro- mally reason about programs using TM and how to grammers are required and forced to develop multi- make sure that such reasoning is sound.Easy reason- threaded programs.However,it is of great challenges ing,one of the primary reasons to introduce transac- for programmers to deal with subtle concurrent access tional memory,is rarely referred to. to shared memory by multiple threads.Traditionally. programmers use locks to achieve mutual exclusion on Proof-carrying code(PCC)16]is a Hoare-style rea- concurrent access to shared memory.But lock-based soning framework and has been well studied to rea- programming is difficult to reason about and may lead son about various properties of programs in recent to problems such as deadlock.Transactional memory years[17-201.And there are also extensions for reason- (TM)is proposed as an alternate concurrency-control ing about multi-threaded programs using locks(21.221 mechanism to avoid many of the pitfalls of the lock- However,the appearance of programs using TM claims based one.TM systems provide transactions as a pro for to be a brand-new system. gramming structure for programmers and handle data In this paper,we present a system to certify con- races between threads automatically so that program- current programs using transactional memory.We de- mers do not have to reason about the interaction of fine an abstract machine with built-in TM at assembly- threads manually.TM provides a programming model level and present a program logic based on this ma- that may make the development of multi-threaded pro- chine to support the verification of different properties. grams easier. We prove the soundness of our program logic with re- Much work has been done to explore the var- spect to the semantics of transactional memory and use ious implementation strategies of TM systems in examples to demonstrate its effectiveness.Our paper hardwarel1-4,in softwarel5-8],and in hybridl9,10.The makes the following contributions. Regular Paper Supported by the National Natural Science Foundation of China under Grant Nos.60673126 and 90718026,and Intel China Research Center
Li L, Zhang Y, Chen YY et al. Certifying concurrent programs using transactional memory. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 24(1): 110–121 Jan. 2009 Certifying Concurrent Programs Using Transactional Memory Long Li (李 隆), Yu Zhang (张 昱), Yi-Yun Chen (陈意云), and Yong Li (李 勇) Department of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China Software Security Laboratory, Suzhou Institute for Advanced Study, University of Science and Technology of China Suzhou 215123, China E-mail: {liwis, liyong}@mail.ustc.edu.cn; {yuzhang, yiyun}@ustc.edu.cn Received December 19, 2007; revised June 23, 2008. Abstract Transactional memory (TM) is a new promising concurrency-control mechanism that can avoid many of the pitfalls of the traditional lock-based techniques. TM systems handle data races between threads automatically so that programmers do not have to reason about the interaction of threads manually. TM provides a programming model that may make the development of multi-threaded programs easier. Much work has been done to explore the various implementation strategies of TM systems and to achieve better performance, but little has been done on how to formally reason about programs using TM and how to make sure that such reasoning is sound. In this paper, we focus on the semantics of transactional memory and present a proof-carrying code (PCC) system for reasoning about programs using TM . We formalize our reasoning with respect to the TM semantics, prove its soundness, and use examples to demonstrate its effectiveness. Keywords program verification, transactional memory, proof-carrying code, concurrent program safety 1 Introduction The multicore architectures, which provide infrastructures of better performance, have become popular. To benefit from this growth in performance, programmers are required and forced to develop multithreaded programs. However, it is of great challenges for programmers to deal with subtle concurrent access to shared memory by multiple threads. Traditionally, programmers use locks to achieve mutual exclusion on concurrent access to shared memory. But lock-based programming is difficult to reason about and may lead to problems such as deadlock. Transactional memory (TM) is proposed as an alternate concurrency-control mechanism to avoid many of the pitfalls of the lockbased one. TM systems provide transactions as a programming structure for programmers and handle data races between threads automatically so that programmers do not have to reason about the interaction of threads manually. TM provides a programming model that may make the development of multi-threaded programs easier. Much work has been done to explore the various implementation strategies of TM systems in hardware[1−4], in software[5−8], and in hybrid[9,10]. The performance issues of these implementations have also been discussed. And there has also been some work that explores the formalization of the semantics of TM[6,11−15]. But little has been done on how to formally reason about programs using TM and how to make sure that such reasoning is sound. Easy reasoning, one of the primary reasons to introduce transactional memory, is rarely referred to. Proof-carrying code (PCC)[16] is a Hoare-style reasoning framework and has been well studied to reason about various properties of programs in recent years[17−20]. And there are also extensions for reasoning about multi-threaded programs using locks[21,22] . However, the appearance of programs using TM claims for to be a brand-new system. In this paper, we present a system to certify concurrent programs using transactional memory. We de- fine an abstract machine with built-in TM at assemblylevel and present a program logic based on this machine to support the verification of different properties. We prove the soundness of our program logic with respect to the semantics of transactional memory and use examples to demonstrate its effectiveness. Our paper makes the following contributions. Regular Paper Supported by the National Natural Science Foundation of China under Grant Nos. 60673126 and 90718026, and Intel China Research Center
Long Li et al.:Certifying Concurrent Programs Using TM 111 We present a system for certifying properties of Then we present examples that are written and proven programs using TM.Even though the programming in our system in Section 4.In Section 5.we discuss the using transactions is supposed to reason about easily related work and then give the conclusion in Section 6. there are subtle properties that are expected by multi- threaded programs.To ensure such properties are cor- Transactional Memory rectly enforced in a concurrent program,a substantial way is to perform a formal verification.Our system is In this section,we discuss the semantics of transac- essentially a PCC system that performs such verifica- tional memory and its implementation strategies.Then tion. we study the violations in implementations and deter- We present a program logic for reasoning about mine the compositions of the abstract machine in our properties of programs using TM in our system.And system. we prove it sound with respect to the TM semantics. TM systems free programmers from manually reason- 2.1 Semantics and Implementations ing about the interaction of threads,but threads do cooperate on shared memory.To verify the properties Transactional memory systems provide transactions of a program using TM,we not only verify the proper- as a programming structure for programmers.A trans- ties of each thread,but also emphasize the cooperation action is a sequence of instructions that executes atom- ically and in isolation.The atomicity refers to the all- that is allowed on shared memory.In our system,we specify a global invariant on shared memory to justify or-nothing effect on the memory.If a transaction com the interleaving of transactions.This global invariant mits,then all of its modifications to the memory appear is required to hold all through the program.We es- to take effect all together and happen instantaneously. tablish the consistency for transactions in our system And if a transaction aborts (a rollback occurs subse- by requiring a transaction to transmit shared memory quently),then none of its modifications takes effect,as from one consistent state that respects the global in- if the transaction were never executed. A transaction also runs in isolation.It executes as variant to another.We formalize such reasoning in a PCC system and prove it sound following the syntactic if it is the only active operation in the system,and no approach to proving type soundness.And we also use other operations make progress while it is running.This means that the effect of a transaction is not visible to examples to demonstrate its effectiveness.Our reason other threads until the transaction commits. ing reveals the easy reasoning aspect of programs using TM:programmers can reason about their threads in The atomicity and isolation properties together give isolation and need only to make sure that their threads the illusion that there is no interaction between trans- do not violate the global invariant when transactions actions,and a transaction executes as a single atomic step with respect to the other threads in the system. update shared memory. Thus,programmers can reason about the behavior of a Our system addresses the safety issues at assembly- thread without considering the interaction of the other level directly as PCC systems do.So we do not need threads,as if it is executed under a sequential circum- to trust the complicated compilation and optimization, stance. and can have a smaller trusted computing base to build The key mechanisms for a TM system to provide the executable PCC packages for programs using TM.Fur- atomicity and isolation properties are data versioning thermore,our abstraction of TM systems is still general and conflict detection,whose implementations distin- and similar to high-level ones.The reasoning we de- guish alternative TM proposals. scribe at assembly-level can be easily lifted up to higher Data versioning manages multiple versions of data levels. when transactions are being executed.A new version, All our work is mechanized in Coq proof assistant3 produced by one of the not-yet-commit transactions. to build machine checkable proofs.And the efforts will become visible to all other threads after the trans- needed to build proofs for PCC packages are signifi- action commits.The old version,produced by a pre- cantly mitigated due to the easy reasoning property viously committed transaction,will be preserved when enforced by TM semantics. the transaction aborts.With eager versioning,a write The rest of this paper is organized as follows.In access within a transaction immediately writes to mem- Section 2,we discuss the semantics and implementa- ory the new data and creates an undo log for the old tions of transactional memory and also the violations one.Also the write access prevents other threads from in practice.In Section 3,we describe the abstract ma- accessing the same address until the transaction com- chine we model and the program logic we use to reason. pletes by using locks or equivalent mechanisms.When
Long Li et al.: Certifying Concurrent Programs Using TM 111 • We present a system for certifying properties of programs using TM. Even though the programming using transactions is supposed to reason about easily, there are subtle properties that are expected by multithreaded programs. To ensure such properties are correctly enforced in a concurrent program, a substantial way is to perform a formal verification. Our system is essentially a PCC system that performs such verification. • We present a program logic for reasoning about properties of programs using TM in our system. And we prove it sound with respect to the TM semantics. TM systems free programmers from manually reasoning about the interaction of threads, but threads do cooperate on shared memory. To verify the properties of a program using TM, we not only verify the properties of each thread, but also emphasize the cooperation that is allowed on shared memory. In our system, we specify a global invariant on shared memory to justify the interleaving of transactions. This global invariant is required to hold all through the program. We establish the consistency for transactions in our system by requiring a transaction to transmit shared memory from one consistent state that respects the global invariant to another. We formalize such reasoning in a PCC system and prove it sound following the syntactic approach to proving type soundness. And we also use examples to demonstrate its effectiveness. Our reasoning reveals the easy reasoning aspect of programs using TM: programmers can reason about their threads in isolation and need only to make sure that their threads do not violate the global invariant when transactions update shared memory. • Our system addresses the safety issues at assemblylevel directly as PCC systems do. So we do not need to trust the complicated compilation and optimization, and can have a smaller trusted computing base to build executable PCC packages for programs using TM. Furthermore, our abstraction of TM systems is still general and similar to high-level ones. The reasoning we describe at assembly-level can be easily lifted up to higher levels. All our work is mechanized in Coq proof assistant[23] to build machine checkable proofs. And the efforts needed to build proofs for PCC packages are signifi- cantly mitigated due to the easy reasoning property enforced by TM semantics. The rest of this paper is organized as follows. In Section 2, we discuss the semantics and implementations of transactional memory and also the violations in practice. In Section 3, we describe the abstract machine we model and the program logic we use to reason. Then we present examples that are written and proven in our system in Section 4. In Section 5, we discuss the related work and then give the conclusion in Section 6. 2 Transactional Memory In this section, we discuss the semantics of transactional memory and its implementation strategies. Then we study the violations in implementations and determine the compositions of the abstract machine in our system. 2.1 Semantics and Implementations Transactional memory systems provide transactions as a programming structure for programmers. A transaction is a sequence of instructions that executes atomically and in isolation. The atomicity refers to the allor-nothing effect on the memory. If a transaction commits, then all of its modifications to the memory appear to take effect all together and happen instantaneously. And if a transaction aborts (a rollback occurs subsequently), then none of its modifications takes effect, as if the transaction were never executed. A transaction also runs in isolation. It executes as if it is the only active operation in the system, and no other operations make progress while it is running. This means that the effect of a transaction is not visible to other threads until the transaction commits. The atomicity and isolation properties together give the illusion that there is no interaction between transactions, and a transaction executes as a single atomic step with respect to the other threads in the system. Thus, programmers can reason about the behavior of a thread without considering the interaction of the other threads, as if it is executed under a sequential circumstance. The key mechanisms for a TM system to provide the atomicity and isolation properties are data versioning and conflict detection, whose implementations distinguish alternative TM proposals. Data versioning manages multiple versions of data when transactions are being executed. A new version, produced by one of the not-yet-commit transactions, will become visible to all other threads after the transaction commits. The old version, produced by a previously committed transaction, will be preserved when the transaction aborts. With eager versioning, a write access within a transaction immediately writes to memory the new data and creates an undo log for the old one. Also the write access prevents other threads from accessing the same address until the transaction completes by using locks or equivalent mechanisms. When
112 J.Comput.Sci.Technol.,Jan.2009,Vol.24.No.1 the transaction aborts,the undo log is used to restore To focus on the reasoning issue,we model the ab- the old version of data stract machine with strict semantics of atomicity and Lazy versioning buffers all new data versions until isolation and require that all access to shared data must the transaction completes.The new versions will be appear within transactions.The TM system in our sys- copied to the actual memory when the transaction com- tem will be a lazy versioning.lazy conflict detection mits.If the transaction aborts,the buffer is discarded system.Being at the assembly level,the granularity of and no further compensation is needed. conflict detection is naturally word-level.And we do Conflict detection signals the multiple access to the not support nested transaction for now since it is much same data with at least one transaction attempting more a performance consideration than the semantics to write a new version.Detection relies on tracking of transaction basis.We leave this to our future work. the read set (addresses read from)and write set (ad- dresses written to)for each transaction.Under eager 3 The System conflict detection,the system checks for conflicts when- ever transactions read or write data.While under lazy In this section,we first present an abstract machine conflict detection,all checks are delayed until the end with transaction structures and its operational seman- of each transaction (i.e.,when it tries to commit). tics.Then we present a program logic of this machine for reasoning. 2.2 Violations 3.1 Abstract Machine TM systems are supposed to enforce the properties of atomicity(all-or-nothing)and isolation (intermedi- Fig.1 defines the abstract machine and the syntax ate state not visible to other threads).But practical of the assembly language.A program P consists of a implementations may violate these properties more or code heap C,a global shared data heap H(referred to less and may lead to unexpected behaviors of programs as a "global heap"in the remainder of this paper)and using TM as follows numbers of threads TS working on the global heap con- Atomicity Violation.In practice,the rollback op- currently.The code heap C maps labels to instructions. eration of eager versioning and the commit operation of The data heap H maps heap addresses to natural values. lazy versioning are time-consuming.If the TM system And the threads TS are modeled as a partial mapping allows other threads to access the data which rollback from thread id t(natural numbers)to its owner thread or commit may modify during their execution,threads T.Each thread T consists of a local view of the global may get invalid data (read data ahead of rollback or heap H.(referred to as a "local heap"in the remainder commit operations)or lose update (write data ahead). of this paper),a write set Hw for recording the write The solution is to claim exclusiveness for the data that attempts of its transaction,a register file R,a program rollback or commit is about to access before their exe- cution. (Program) :=(C,H,T3) Isolation Violation.Threads may see the inter- (Threads) TS :={t∽T}* (Thread) T mediate state of a transaction when implementations = (Hs:HR,pc,L:B) (BackFile) of eager versioning do not claim exclusiveness for the :=(R,pc) addresses it writes to.Blundell et al.4 discussed this (CodeHeap) 0 ={f…}" (Heap) H.Hs = issue and called it weak atomicity. w) (WriteSet) Hw :={亿w}* Privatization Issue.The atomicity and isolation (LogFile) 8= (L,w)::L properties of TM systems do not specify whether the (RegFile) = {r4w}" access to shared data is allowed to bypass the TM sys- (Reg) 三 {rkk∈o31 tem's mechanisms (outside transactions).If "yes",it (Labels)t,f,l,pc:= n (nat nums) introduces the privatization problem25)we will not ad- (WordVal) ::=n (nat nums) dress for now.If "no",then TM systems should han- (Instr) :addurd,rs,rt l addiurd,rs,w dle the access to shared data outside transactions too, suburd,rs,rt|slturd,rs,rt because they may lead to unexpected behaviors of pro- I andird:rs:1 lw rd;w(rs) grams.Shpeisman et al.26]discussed this problem and I sw rs,w(rd)|begrs,rt,f solved it by adding barriers with respect to TM system I bners,rt,f ljfljrrs for every access to memory outside transactions.These starttrans commit barriers can be treated as featherweight transactions in (InstrSeq) our opinion. Fig.1.Machine syntax
112 J. Comput. Sci. & Technol., Jan. 2009, Vol.24, No.1 the transaction aborts, the undo log is used to restore the old version of data. Lazy versioning buffers all new data versions until the transaction completes. The new versions will be copied to the actual memory when the transaction commits. If the transaction aborts, the buffer is discarded and no further compensation is needed. Conflict detection signals the multiple access to the same data with at least one transaction attempting to write a new version. Detection relies on tracking the read set (addresses read from) and write set (addresses written to) for each transaction. Under eager conflict detection, the system checks for conflicts whenever transactions read or write data. While under lazy conflict detection, all checks are delayed until the end of each transaction (i.e., when it tries to commit). 2.2 Violations TM systems are supposed to enforce the properties of atomicity (all-or-nothing) and isolation (intermediate state not visible to other threads). But practical implementations may violate these properties more or less and may lead to unexpected behaviors of programs using TM as follows. • Atomicity Violation. In practice, the rollback operation of eager versioning and the commit operation of lazy versioning are time-consuming. If the TM system allows other threads to access the data which rollback or commit may modify during their execution, threads may get invalid data (read data ahead of rollback or commit operations) or lose update (write data ahead). The solution is to claim exclusiveness for the data that rollback or commit is about to access before their execution. • Isolation Violation. Threads may see the intermediate state of a transaction when implementations of eager versioning do not claim exclusiveness for the addresses it writes to. Blundell et al.[24] discussed this issue and called it weak atomicity. • Privatization Issue. The atomicity and isolation properties of TM systems do not specify whether the access to shared data is allowed to bypass the TM system’s mechanisms (outside transactions). If “yes”, it introduces the privatization problem[25] we will not address for now. If “no”, then TM systems should handle the access to shared data outside transactions too, because they may lead to unexpected behaviors of programs. Shpeisman et al.[26] discussed this problem and solved it by adding barriers with respect to TM system for every access to memory outside transactions. These barriers can be treated as featherweight transactions in our opinion. To focus on the reasoning issue, we model the abstract machine with strict semantics of atomicity and isolation and require that all access to shared data must appear within transactions. The TM system in our system will be a lazy versioning, lazy conflict detection system. Being at the assembly level, the granularity of conflict detection is naturally word-level. And we do not support nested transaction for now since it is much more a performance consideration than the semantics of transaction basis. We leave this to our future work. 3 The System In this section, we first present an abstract machine with transaction structures and its operational semantics. Then we present a program logic of this machine for reasoning. 3.1 Abstract Machine Fig.1 defines the abstract machine and the syntax of the assembly language. A program P consists of a code heap C, a global shared data heap H (referred to as a “global heap” in the remainder of this paper) and numbers of threads TS working on the global heap concurrently. The code heap C maps labels to instructions. The data heap H maps heap addresses to natural values. And the threads TS are modeled as a partial mapping from thread id t (natural numbers) to its owner thread T. Each thread T consists of a local view of the global heap Hs (referred to as a “local heap” in the remainder of this paper), a write set Hw for recording the write attempts of its transaction, a register file R, a program (Program) P ::= (C, H, TS) (Threads) TS ::= {t à T} ∗ (Thread) T ::= (Hs, Hw, R, pc, L, B) (BackFile) B ::= (R, pc) (CodeHeap) C ::= {f à ι} ∗ (Heap) H, Hs ::= {l à w} ∗ (WriteSet) Hw ::= {l à w} ∗ (LogFile) L ::= ∅ | (l, w) :: L (RegFile) R ::= {r à w} ∗ (Reg) r ::= {rk} k∈{0,...,31} (Labels) t, f, l, pc ::= n (nat nums) (WordVal) w ::= n (nat nums) (Instr) ι ::= addu rd, rs, rt | addiu rd, rs, w | subu rd, rs, rt | sltu rd, rs, rt | andi rd, rs, 1 | lw rd, w(rs) | sw rs, w(rd) | beq rs, rt, f | bne rs, rt, f | j f | jr rs | starttrans | commit (InstrSeq) I ::= ι | ι;I Fig.1. Machine syntax
Long Li et al.:Certifying Concurrent Programs Using TM 113 (C,H,t1一T1),,(tn∽Tn)一(C,H,[t一T,,(tk-1一Tk-1),(tkTg),(tk+1Tk+1),(tnTn0 if (C.H,T)(CT)for anyk where (C,H,(H Hw,R,pc,L,B))(C,H,T) if C[pc] then (H T)= addurd,rs,rt (阻,(Hs,Hm,R{ra一R(rs)+R(r)},pc+l,L,B) addiurd,rs,w (H,(田s,Hr,R{ra∽R(rs)+w,pc+1,L,B) suburd,rs,rt (H,(Hs,He,R{rd∽R(rs)-R(r)},pc+1,L,B)】 slturd,rs,rt (H,(Hs,Hw,Rfrd~i,pc+1,L,B))if R(rs)<R(rt),i=1,else i=0 andird,rs,1 (H (H Rrdi},pc+1,L,B))if R(rs)is odd,i=1,else i=0 lwrd,w(rs) (H,(Hs,Hm,R{raHs(Oh,pc+1,(L:(亿,H(),B) ifl走dom(He) (H,(HH,R{rd()},pc+1,L,B)) ifl∈dom(He) where l=R(rs)+Al∈dom(Hs) swrs,w(rd) (H,((HsR(rs)),(HR(rs)),R,pc+1,L,B))if I=R(rd)+wAlE dom(Hs) beqrs,rt:f (H,(Hs:HR,f,L,B)) if R(rs)=R(r) (H,(H.,H,R,pc+1,L,B)) otherwise bners,rt;f (H,(Hs,H,R,f,L,B)) ifR(rs)≠R(re) (H,(Hs H R,pc+1,L,B)) otherwise if (H,(Hs,H,R,f,L,B)) jrrs (H,(s,H如,R,R(r),L,B) starttrans (H (H,,R,pc+1,0,(R,pc))) commit (但yHu),(0,0,R,pc+1,a,B)】 if(L,w)∈L,H()=w (H,(0,B.R,B.pc,0,B)) otherwise Fig.2.Operational semantics counter pc,a log file L for recording the read attempts point.This thread model is enforced by allowing any of its transaction,and a back-up file B for restoring the thread (id tk)to execute at any program point in the state of a transaction when it rolls back.Here we ignore definition of the step function.Thus,we define an in- the fact that a thread may have a private data heap of terleaving model for multi-threaded machines where the its own,because the private data heap does not consist instructions of individual thread execute with sequen- of shared data and it can be treated as an extension of tial consistency. the register file. The register file R maps registers to natural values. The operational semantics for most instructions are quite straightforward.We focus on the transaction re- The log file L records the heap address and the value it lated instructions here.The starttrans instruction stores for every read when executing a transaction.And signals the start of a transaction.It initializes the write the back-up file B records the state of the register file R set H and the log file L to empty and saves current and the program counter pc when a transaction starts. The set of instructions we present here are the com- register file R and program counter pc to the back-up file B to recover the state of the transaction when it rolls monly used subset in RISC machines with additional instructions marking the start (starttrans)and the back.It also copies the global heap H to its local copy H.to act as its view to the global heap when execut- end (commit)of a transaction. ing inside a transaction.The commit instruction checks The step function (of a program P is defined in whether the log file L is consistent with current global Fig.2.The auxiliary function (C,H,T)(C,H,T) heap H.If "yes",the transaction is allowed to commit is used to define the effects of the execution of a thread its effects H to the global heap H.If"no",register file Tk.Here we follow the preemptive thread model where and program counter are restored from the back-up file execution of threads can be preempted at any program B to restart the transaction.For both cases,the local
Long Li et al.: Certifying Concurrent Programs Using TM 113 (C, H, [(t1 à T1), . . . , (tn à Tn)]) 7−→ (C, H0 , [(t1 à T1), . . . , (tk−1 à Tk−1), (tk à T 0 k ), (tk+1 à Tk+1), . . . , (tn à Tn)]) if (C, H, Tk) k 7−→(C, H0 , T 0 k ) for any k where (C, H, (Hs, Hw, R, pc, L, B)) k 7−→(C, H0 , T 0 ) if C[pc] = then (H0 , T 0 ) = addu rd, rs, rt (H, (Hs, Hw, R{rd à R(rs) + R(rt)}, pc+1, L, B)) addiu rd, rs, w (H, (Hs, Hw, R{rd à R(rs) + w}, pc+1, L, B)) subu rd, rs, rt (H, (Hs, Hw, R{rd à R(rs) − R(rt)}, pc+1, L, B)) sltu rd, rs, rt (H, (Hs, Hw, R{rd à i}, pc+1, L, B)) if R(rs) < R(rt), i = 1, else i = 0 andi rd, rs, 1 (H, (Hs, Hw, R{rd à i}, pc+1, L, B)) if R(rs) is odd, i = 1, else i = 0 lw rd, w(rs) (H, (Hs, Hw, R{rd à Hs(l)}, pc+1, (L :: (l, Hs(l))), B)) if l /∈ dom(Hw) (H, (Hs, Hw, R{rd à Hs(l)}, pc+1, L, B)) if l ∈ dom(Hw) where l = R(rs) + w ∧ l ∈ dom(Hs) sw rs, w(rd) (H, ((Hs{l à R(rs)}), (Hw ] {l à R(rs)}), R, pc+1, L, B)) if l = R(rd) + w ∧ l ∈ dom(Hs) beq rs, rt, f (H, (Hs, Hw, R, f, L, B)) if R(rs) = R(rt) (H, (Hs, Hw, R, pc+1, L, B)) otherwise bne rs, rt, f (H, (Hs, Hw, R, f, L, B)) if R(rs) 6= R(rt) (H, (Hs, Hw, R, pc+1, L, B)) otherwise j f (H, (Hs, Hw, R, f, L, B)) jr rs (H, (Hs, Hw, R, R(rs), L, B)) starttrans (H, (H, ∅, R, pc+1, ∅, (R, pc))) commit ((H ] Hw), (∅, ∅, R, pc+1, ∅, B)) if ∀ (l, w) ∈ L, H(l) = w (H, (∅, ∅, B.R, B.pc, ∅, B)) otherwise Fig.2. Operational semantics. counter pc, a log file L for recording the read attempts of its transaction, and a back-up file B for restoring the state of a transaction when it rolls back. Here we ignore the fact that a thread may have a private data heap of its own, because the private data heap does not consist of shared data and it can be treated as an extension of the register file. The register file R maps registers to natural values. The log file L records the heap address and the value it stores for every read when executing a transaction. And the back-up file B records the state of the register file R and the program counter pc when a transaction starts. The set of instructions we present here are the commonly used subset in RISC machines with additional instructions marking the start (starttrans) and the end (commit) of a transaction. The step function (7−→) of a program P is defined in Fig.2. The auxiliary function (C, H,Tk) k 7−→(C, H0 ,T 0 k ) is used to define the effects of the execution of a thread Tk. Here we follow the preemptive thread model where execution of threads can be preempted at any program point. This thread model is enforced by allowing any thread (id tk) to execute at any program point in the definition of the step function. Thus, we define an interleaving model for multi-threaded machines where the instructions of individual thread execute with sequential consistency. The operational semantics for most instructions are quite straightforward. We focus on the transaction related instructions here. The starttrans instruction signals the start of a transaction. It initializes the write set Hw and the log file L to empty and saves current register file R and program counter pc to the back-up file B to recover the state of the transaction when it rolls back. It also copies the global heap H to its local copy Hs to act as its view to the global heap when executing inside a transaction. The commit instruction checks whether the log file L is consistent with current global heap H. If “yes”, the transaction is allowed to commit its effects Hw to the global heap H. If “no”, register file and program counter are restored from the back-up file B to restart the transaction. For both cases, the local
114 J.Comput.Sci.Technol.,Jan.2009,Vol.24.No.1 heap Hs,write set H and log file L are set empty when the transaction completes.Since the intermedi- The lw and sw instructions operate on local heap H ate state of a transaction is not visible to the other only.Thus all the access attempts to the global heap H threads.the violation of the invariant will not be ob- are required to operate on the local copy Hs first and served by other threads,the global heap will always be take effect on the global heap H until the correspond- in a consistent state that respects the invariant.To en- ing transaction commits.So the non-transactional ac force such reasoning,we track the state of thread-local cess to shared memory H is denied.The lw instruction data structures during the execution of a thread to en- reads data and appends the log file L when the ad- sure that the desired properties are correctly enforced. dress it reads from is not in the domain of the write And we also make sure that the update operation on set H.The sw instruction updates the local heap H. the global heap does not violate the invariant on the directly and performs a unique merge on the write set global heap. H recording its write attempt to the global heap H. Fig.3 defines the specification constructs in our sys- The function defines a unique merge on heaps as tem for such reasoning.A program specification con- follows: sists of a global invariant Inv and code heap specifica- tions for threads.The global invariant Inv speci- V1.(Ha出H6)()= H6(),1∈H6; fies the restrictions on the global heap H when multi- Ha (1),I ple threads are working on it concurrently.The code heap specification maps labels to preconditions.The The abstract machine records the values but not the global invariant Inv and the precondition p for an in- versions of the global heap in logs as what Herlihy et struction together form the expectations at that pro- al.7 have done and identifies the consistency of the log gram point. file L by checking whether all logged tuples (l,w)reflect the mapping relation in H(付(l,w)∈L,H()=w) The implementation strategies of data versioning and (State) S::=(H,Hs,Hw,R,pc,L,B) conflict detection may vary in different TM systems. (Pstate)Sp ::=(Hs,H,R,pc,B) However,verification should check the consistency of (Spred)p ∈Pstate-Prop both branches no matter what mechanisms cause the (Invariant)Inv E Heap-Prop control fow to branch.So we implement a relatively (ProgSpec)Φ=(nU,1,.,乎n]) simple mechanism in the abstract machine to remove the nondeterminacy of the system. (CdHpSpec)Ψ:={fp}" (Coerc)S]::(Hs;H:R,pc,B) As shown in Fig.2.all instructions operate on local data structures of threads except commit.So the effects Fig.3.Specification syntax (on the global heap)of a transaction are not visible to other threads until the transaction commits.The isola- The validity of a precondition depends on the lo- tion property is properly established.Also the update cal data structures only.Note that the log file L is operation on the global heap and the rollback operation excluded,because programmers will not observe the on local structures are defined to finish in a single step, transitions of the log file and the well-formedness of we achieve atomicity by definitions. a commit instruction relies on the well-formedness of The abstract machine we present here may have both branches,the state of the log file is not vital. poor performance due to redundant rollbacks:success- ful commit of other threads will always cause a trans- 3.3 Inference Rules action to roll back,if the transaction has a read from The inference rules for a program are shown in Fig.4, the address that is just updated,no matter it happens and the rules for instructions are shown in Fig.5.We before(conflict occurs)or after that commit (should be use the following judgement forms to define the infer- fine).But it defines a semantics of TM systems strictly ence rules: and works fine with reasoning as we show later. Φ,p,,pn上P (well-formed program), 3.2 Program Specifications 亚,Inw上{p}(C,H,T) (well-formed thread). 亚,Inw上C:亚' (well-formed code heap), In our system,transactions are identified as atomic 亚,Imw上{p}f:I (well-formed instr.sequence). operations that modify the global heap with respect to certain invariant.This invariant may be violated during A program is well-formed if the global heap respects the execution of a transaction,but will be reestablished the invariant on it and every thread of the program is
114 J. Comput. Sci. & Technol., Jan. 2009, Vol.24, No.1 heap Hs, write set Hw and log file L are set empty. The lw and sw instructions operate on local heap Hs only. Thus all the access attempts to the global heap H are required to operate on the local copy Hs first and take effect on the global heap H until the corresponding transaction commits. So the non-transactional access to shared memory H is denied. The lw instruction reads data and appends the log file L when the address it reads from is not in the domain of the write set Hw. The sw instruction updates the local heap Hs directly and performs a unique merge on the write set Hw recording its write attempt to the global heap H. The function ] defines a unique merge on heaps as follows: ∀ l. (Ha ] Hb) (l) = ½ Hb (l), l ∈ Hb; Ha (l), l 6∈ Hb. The abstract machine records the values but not the versions of the global heap in logs as what Herlihy et al.[7] have done and identifies the consistency of the log file L by checking whether all logged tuples (l, w) reflect the mapping relation in H (∀ (l, w) ∈ L, H(l) = w). The implementation strategies of data versioning and conflict detection may vary in different TM systems. However, verification should check the consistency of both branches no matter what mechanisms cause the control flow to branch. So we implement a relatively simple mechanism in the abstract machine to remove the nondeterminacy of the system. As shown in Fig.2, all instructions operate on local data structures of threads except commit. So the effects (on the global heap) of a transaction are not visible to other threads until the transaction commits. The isolation property is properly established. Also the update operation on the global heap and the rollback operation on local structures are defined to finish in a single step, we achieve atomicity by definitions. The abstract machine we present here may have poor performance due to redundant rollbacks: successful commit of other threads will always cause a transaction to roll back, if the transaction has a read from the address that is just updated, no matter it happens before (conflict occurs) or after that commit (should be fine). But it defines a semantics of TM systems strictly and works fine with reasoning as we show later. 3.2 Program Specifications In our system, transactions are identified as atomic operations that modify the global heap with respect to certain invariant. This invariant may be violated during the execution of a transaction, but will be reestablished when the transaction completes. Since the intermediate state of a transaction is not visible to the other threads, the violation of the invariant will not be observed by other threads, the global heap will always be in a consistent state that respects the invariant. To enforce such reasoning, we track the state of thread-local data structures during the execution of a thread to ensure that the desired properties are correctly enforced. And we also make sure that the update operation on the global heap does not violate the invariant on the global heap. Fig.3 defines the specification constructs in our system for such reasoning. A program specification Φ consists of a global invariant Inv and code heap specifications Ψ for threads. The global invariant Inv speci- fies the restrictions on the global heap H when multiple threads are working on it concurrently. The code heap specification Ψ maps labels to preconditions. The global invariant Inv and the precondition p for an instruction together form the expectations at that program point. (State) S ::= (H, Hs, Hw, R, pc, L, B) (Pstate) Sp ::= (Hs, Hw, R, pc, B) (Spred) p ∈ Pstate → Prop (Invariant) Inv ∈ Heap → Prop (ProgSpec) Φ ::= (Inv, [Ψ1, . . . , Ψn]) (CdHpSpec) Ψ ::= {f à p} ∗ (Coerc) [[S]] ::= (Hs, Hw, R, pc, B) Fig.3. Specification syntax. The validity of a precondition depends on the local data structures only. Note that the log file L is excluded, because programmers will not observe the transitions of the log file and the well-formedness of a commit instruction relies on the well-formedness of both branches, the state of the log file is not vital. 3.3 Inference Rules The inference rules for a program are shown in Fig.4, and the rules for instructions are shown in Fig.5. We use the following judgement forms to define the inference rules: Φ, [p1, . . . , pn] ` P (well-formed program), Ψ,Inv ` {p} (C, H, T) (well-formed thread), Ψ,Inv ` C : Ψ 0 (well-formed code heap), Ψ,Inv ` {p} f : I (well-formed instr. sequence). A program is well-formed if the global heap respects the invariant on it and every thread of the program is
Long Li et al.:Certifying Concurrent Programs Using TM 115 well-formed.Thus the verification of a multi-threaded A thread is well-formed if the precondition of the program can be decomposed into the verification of its thread is satisfied and the invariant on the global heap component threads.The reasoning is thread-modular holds.Also the code heap and the instruction sequence in our system. the thread is going to execute are required to be well- formed. 电,p1,.,pn]-P(Well-formed program) And a code heap is well-formed only if every instruc- tion sequence of it is well-formed. Φ=(nu,[1,,n]) Most inference rules for instruction sequence are similar and grouped in the SEQ rule.It requires that Inv H k.乎k,Inu上{pg}(C,H,TS(tk) (PROG) the precondition together with the global invariant en- Φ,p1,pnF(C,H,TS) sures the safe execution of the instruction:and that the 亚,Inur{p}(C,H,T)(Wel-formed thread) resulting state respects the global invariant and satis- Inv H p (HsHw,R,f,B) fies the postcondition which also serves as the precon- dition of the remaining instruction sequence.Note that 亚,Inw上C:亚亚,lmu上{p}f:C[] 业,Inw上{p}(C,H,(H,Hw,R,f,L,B) (THRD) the STARTTRANS rule is also grouped in SEQ rule.Even though the instruction copies the global heap to its local 亚,InvHC:亚(We--formed code heap) heap,it does not need special treatment for its behav- Hf∈dom():重,Inek{w'(f》f:Cf] ior. (CDHP) 乎,nu上C:亚 The BNE and BEQ rules are similar and we present one here.Also they do not make much difference from the Fig.4.Inference rules. SEQ rule except that they check both branches where the control fow will switch depending on the equiva- lence of the values that corresponding registers store. Invp}f:I(Well-formed instruction sequence) The J and JR rules require that it is safe to make a 亚(B.pc)=p"亚,Inu上{p}f+1:I jump to the target address and the global invariant is VS.ImuS.HApS→V(L,w)∈S.L,S.H(0=w preserved after jumping. Inv (Step (S)).HA p'[Step (S)] The COMMIT rule is similar to the BEQ rule except S.InuS.HAp[→(L,w)∈s.L,s.H(U)=w that it branches due to the consistence of the log file. -Inv (Step (S)).H p"[Step(S)] It checks that both successful commit case and roll- (COMMIT) back case are well-formed.And the requirement that Ψ,nuF{p}f:commit:I the resulting state respects the global invariant after 亚(f')=p successful commit actually claims for the checking of S.ImuS.HAp→ the update made by the successfully committed trans- Inv (Step (S)).HA p'[Step (S)] () action on the global heap H.Furthermore,it is easy Ψ,Inwr{p}f:jf' to show that rollback is always allowed.A rollback is (R(r)》=p caused by an update on the global heap.Since we will HS.Invs.HApS→ restore local data structures of transactions,the differ- Inv (Step (S)).HA p'[Step (S)] ence of start and rollback state of a transaction is the (JR) 乎,lnw卜{p}f:jrrs global heap only.But all updates on the global heap (f')=p”重,nwk{p}f+1:I are required to respect the global invariant,and the precondition relies on local data structures only.So it VS.Inv S.HA p [S]-S.R(ra)=S.R(rt) is always safe to roll back a transaction. -Inv (Step (S)).H p"[Step (S)] As shown in the inference rules,the invariant on the HS.Imvs.HAp一s.R(rs)≠s.R(r) Inv (Step (S)).H p'[Step (S)] global heap is required to preserve for every instruc- (BEQ) tion rule,so the invariant on the global heap is guar- 亚,Inu下{p}f:beg rs,rt,f':I anteed all through the program.And the transitions of Efaddu,addiu,subu,sltu,andi,Iw,sw,starttrans} a thread are tracked in the preconditions of its instruc- 业,nw上{p}f+1:I tions. VS.nuS.HApS一 Inv(Step(S)).Hp'[Step (S)] 3.4 Soundness (SEQ) Ψ,Inw上{p}f:;I The soundness of these inference rules with re- Fig.5.Inference rules cont'. spect to the operational semantics of the abstract ma-
Long Li et al.: Certifying Concurrent Programs Using TM 115 well-formed. Thus the verification of a multi-threaded program can be decomposed into the verification of its component threads. The reasoning is thread-modular in our system. Φ, [p1, . . . , pn] ` P (Well-formed program) Φ = (Inv, [Ψ1, . . . , Ψn]) Inv H ∀ k. Ψk, Inv ` {pk} (C, H, TS(tk)) Φ, [p1, . . . , pn] ` (C, H, TS) (prog) Ψ, Inv ` {p} (C, H, T) (Well-formed thread) Inv H p (Hs, Hw, R, f, B) Ψ, Inv ` C : Ψ Ψ, Inv ` {p} f : C[f] Ψ, Inv ` {p} (C, H, (Hs, Hw, R, f, L, B)) (thrd) Ψ, Inv ` C : Ψ0 (Well-formed code heap) ∀ f ∈ dom(Ψ0 ) : Ψ, Inv ` {Ψ0 (f)} f : C[f] Ψ, Inv ` C : Ψ0 (cdhp) Fig.4. Inference rules. Ψ, Inv ` {p} f : I (Well-formed instruction sequence) Ψ(B.pc) = p 00 Ψ, Inv ` {p 0} f +1 : I ∀ S. Inv S.H ∧ p [[S]] → ∀ (l, w) ∈ S.L, S.H(l) = w → Inv (Step (S)).H ∧ p 0 [[Step (S)]] ∀ S. Inv S.H ∧ p [[S]] → ¬∀ (l, w) ∈ S.L, S.H(l) = w → Inv (Step (S)).H ∧ p 00 [[Step (S)]] Ψ, Inv ` {p} f : commit;I (commit) Ψ(f 0 ) = p 0 ∀ S. Inv S.H ∧ p [[S]] → Inv (Step (S)).H ∧ p 0 [[Step (S)]] Ψ, Inv ` {p} f : j f 0 (j) Ψ(R(rs)) = p 0 ∀ S. Inv S.H ∧ p [[S]] → Inv (Step (S)).H ∧ p 0 [[Step (S)]] Ψ, Inv ` {p} f : jr rs (jr) Ψ(f 0 ) = p 00 Ψ, Inv ` {p 0} f +1 : I ∀ S. Inv S.H ∧ p [[S]] → S.R(rs) = S.R(rt) → Inv (Step (S)).H ∧ p 00 [[Step (S)]] ∀ S. Inv S.H ∧ p [[S]] → S.R(rs) 6= S.R(rt) → Inv (Step (S)).H ∧ p 0 [[Step (S)]] Ψ, Inv ` {p} f : beq rs, rt, f0 ;I (beq) ι ∈ {addu, addiu, subu, sltu, andi, lw, sw, starttrans} Ψ, Inv ` {p 0} f +1 : I ∀ S. Inv S.H ∧ p [[S]] → Inv (Step (S)).H ∧ p 0 [[Step (S)]] Ψ, Inv ` {p} f : ι;I (seq) Fig.5. Inference rules cont’. A thread is well-formed if the precondition of the thread is satisfied and the invariant on the global heap holds. Also the code heap and the instruction sequence the thread is going to execute are required to be wellformed. And a code heap is well-formed only if every instruction sequence of it is well-formed. Most inference rules for instruction sequence are similar and grouped in the SEQ rule. It requires that the precondition together with the global invariant ensures the safe execution of the instruction; and that the resulting state respects the global invariant and satis- fies the postcondition which also serves as the precondition of the remaining instruction sequence. Note that the STARTTRANS rule is also grouped in SEQ rule. Even though the instruction copies the global heap to its local heap, it does not need special treatment for its behavior. The BNE and BEQ rules are similar and we present one here. Also they do not make much difference from the SEQ rule except that they check both branches where the control flow will switch depending on the equivalence of the values that corresponding registers store. The J and JR rules require that it is safe to make a jump to the target address and the global invariant is preserved after jumping. The COMMIT rule is similar to the BEQ rule except that it branches due to the consistence of the log file. It checks that both successful commit case and rollback case are well-formed. And the requirement that the resulting state respects the global invariant after successful commit actually claims for the checking of the update made by the successfully committed transaction on the global heap H. Furthermore, it is easy to show that rollback is always allowed. A rollback is caused by an update on the global heap. Since we will restore local data structures of transactions, the difference of start and rollback state of a transaction is the global heap only. But all updates on the global heap are required to respect the global invariant, and the precondition relies on local data structures only. So it is always safe to roll back a transaction. As shown in the inference rules, the invariant on the global heap is required to preserve for every instruction rule, so the invariant on the global heap is guaranteed all through the program. And the transitions of a thread are tracked in the preconditions of its instructions. 3.4 Soundness The soundness of these inference rules with respect to the operational semantics of the abstract ma-
116 J.Comput.Sci.Technol.,Jan.2009,Vol.24.No.1 chine is established following the syntactic approach which multiple threads cooperate on are checked.The to proving type soundness.From the "progress"and expressiveness and the application to high-level pro- "preservation"lemmas,we can guarantee that given a grams of such invariance proof method have been well- well-formed program under compatible preconditions, known.It is the atomicity and isolation properties of the current instruction sequence will be able to exe- transactional memory that make the requirement of in- cute without getting "stuck".Furthermore,any safety variant on shared memory reasonable. property derivable from the global invariant will hold In this section,we use examples to demonstrate the throughout the execution.The soundness of our system effectiveness of our reasoning system.And we present is formally stated as Theorem 1. examples in high-level programs and their assembly Lemma1(Progress).Φ=(Inw,[i,,乎n]).f counterparts,showing that reasoning about programs there exist p1,,pn,such that重,p,,pnJ卜P,then using TM in assembly level is essentially the same as in there exists a program P such that PP. high-level under such reasoning. Lemma2(Preservation).Φ=(Inv,[i,…, 亚nl).f重,p1,,Pnl上P and P-一P,then there 4.1 Dining Philosopher Algorithm erist p,,p such that重,p,.,pn]上P. Theorem 1 (Soundness).=(Inv,[1,..., A simplified example of dining philosophers in trans- 乎n]).If there erist p1,,pn,such that重,p1,· actional program style is shown in Fig.7,where two Pn]P,then for any n≥0,there erist a programP philosophers (Threadi and Thread2)share two forks andp,…,p such that P-→nP'and重,pi,,pn]H (represented by memory locations forki and fork2). A philosopher picks up a fork by writing the thread id (1 or 2)into the memory location representing the fork, 3.5 Separation Logic and puts down a fork by writing 0. We describe heaps using separation logicl271.Ex- Variables: Invariant: cept for implication,separation logic is similar to lin- nat fork1,fork2: fork1 fork2 ear logic,where register and heap bindings are treated as intuitionistic and linear resources,respectively,ex- Thread1: Thread2: tended with an axiom restricting each heap address to while (TRUE){ while (TRUE){ a single binding.We write A and B for heap predicates. starttrans; starttrans; We write HA if the heap predicate A holds on heap if (fork1){ if (fork2){ H.The syntax for the fragment of separation logic we forki =1; fork2 =2; use is given in Fig.6. fork2 =1; fork =2; commit; commit; A,B ::nm|emp A*B AA B AV B } |3x:P.B|廿x∈S.A else else commit; commit Fig.6.Separation logic. continue; continue; Now we describe these predicates informally.nm holds if the heap consists entirely of the binding of n to /eat /eat m.emp holds only on the empty heap.A*B holds if starttrans; starttrans; the heap can be split into two disjoint parts such that A fork1 =0; fork1=0; holds on one and B on the other.AA B holds if both A fork2 =0; fork2 =0; and B hold on the entire heap.AV B holds if either A commit; commit; or B holds on the heap.3x:P.B holds if there exists /think //think an z of type P such that Bz holds on the heap.P will be omitted when it is clear from the context.Vx E S.A Fig.7.Simplified example of dining philosopher algorithm. holds on a heap that satisfies Ax for all x in the finite set S. The invariant on the shared memory can be ex- pressed as(fork1=fork2),indicating that both forks 4 Examples are free(when they are all equal to 0)or a philosopher is holding them both.In the more general case of three or In our system,not only the invariant on each thread more philosophers,the invariant on the shared memory of a program but also the invariant on the global heap can be expressed as (V i.forki=0Vforki=forki+1=
116 J. Comput. Sci. & Technol., Jan. 2009, Vol.24, No.1 chine is established following the syntactic approach to proving type soundness. From the “progress” and “preservation” lemmas, we can guarantee that given a well-formed program under compatible preconditions, the current instruction sequence will be able to execute without getting “stuck”. Furthermore, any safety property derivable from the global invariant will hold throughout the execution. The soundness of our system is formally stated as Theorem 1. Lemma 1 (Progress). Φ = (Inv, [Ψ1, . . . , Ψn]). If there exist p1, . . . , pn, such that Φ, [p1, . . . , pn] ` P, then there exists a program P 0 such that P 7−→ P 0 . Lemma 2 (Preservation). Φ = (Inv, [Ψ1, . . ., Ψn]). If Φ, [p1, . . . , pn] ` P and P 7−→ P 0 , then there exist p 0 1 , . . . , p0 n such that Φ, [p 0 1 , . . . , p0 n ] ` P 0 . Theorem 1 (Soundness). Φ = (Inv, [Ψ1, . . ., Ψn]). If there exist p1, . . . , pn, such that Φ, [p1, . . ., pn] ` P, then for any n > 0, there exist a program P 0 and p 0 1 , . . . , p0 n such that P 7−→n P 0 and Φ, [p 0 1 , . . . , p0 n ] ` P 0 . 3.5 Separation Logic We describe heaps using separation logic[27]. Except for implication, separation logic is similar to linear logic, where register and heap bindings are treated as intuitionistic and linear resources, respectively, extended with an axiom restricting each heap address to a single binding. We write A and B for heap predicates. We write H ° A if the heap predicate A holds on heap H. The syntax for the fragment of separation logic we use is given in Fig.6. A, B ::= n 7→ m | emp | A ∗ B | A ∧ B | A ∨ B | ∃ x : P.B | ∀ x ∈ S.A Fig.6. Separation logic. Now we describe these predicates informally. n 7→ m holds if the heap consists entirely of the binding of n to m. emp holds only on the empty heap. A ∗ B holds if the heap can be split into two disjoint parts such that A holds on one and B on the other. A∧B holds if both A and B hold on the entire heap. A ∨ B holds if either A or B holds on the heap. ∃ x : P.B holds if there exists an x of type P such that Bx holds on the heap. P will be omitted when it is clear from the context. ∀ x ∈ S.A holds on a heap that satisfies Ax for all x in the finite set S. 4 Examples In our system, not only the invariant on each thread of a program but also the invariant on the global heap which multiple threads cooperate on are checked. The expressiveness and the application to high-level programs of such invariance proof method have been wellknown. It is the atomicity and isolation properties of transactional memory that make the requirement of invariant on shared memory reasonable. In this section, we use examples to demonstrate the effectiveness of our reasoning system. And we present examples in high-level programs and their assembly counterparts, showing that reasoning about programs using TM in assembly level is essentially the same as in high-level under such reasoning. 4.1 Dining Philosopher Algorithm A simplified example of dining philosophers in transactional program style is shown in Fig.7, where two philosophers (Thread1 and Thread2) share two forks (represented by memory locations fork1 and fork2). A philosopher picks up a fork by writing the thread id (1 or 2) into the memory location representing the fork, and puts down a fork by writing 0. Variables: Invariant: nat fork1, fork2; fork1 = fork2 Thread1 : Thread2 : while (TRUE) { while (TRUE) { starttrans; starttrans; if (! fork1){ if (! fork2){ fork1 = 1; fork2 = 2; fork2 = 1; fork1 = 2; commit; commit; } } else { else { commit; commit; continue; continue; } } // eat // eat starttrans; starttrans; fork1 = 0; fork1 = 0; fork2 = 0; fork2 = 0; commit; commit; // think // think } } Fig.7. Simplified example of dining philosopher algorithm. The invariant on the shared memory can be expressed as (fork1 = fork2), indicating that both forks are free (when they are all equal to 0) or a philosopher is holding them both. In the more general case of three or more philosophers, the invariant on the shared memory can be expressed as (∀ i.forki = 0∨forki = forki+1 =
Long Li et al.:Certifying Concurrent Programs Using TM 117 iv forki forki-1=i-1),indicating that the i-th next to him always.Note that we cannot ensure fairness philosopher is not holding forks or he is holding both in our system.Verification of such liveness properties forks he need or his fork is being held by others,where is part of our future work. forks are free or picked in pairs and a philosopher is try- The corresponding specification and program for our ing to pick his own fork and the fork of the philosopher system are shown in Fig.8.Only the code for Thread is given because that of Thread2 is similar.In this ex- InuH是u.HIfork1→v*fork2→v ample,Inv defines the invariant on the global heap for this program (fork fork2). L1:-{pc=L1} We explain the code block labeled L1,which cor- starttrans responds to the operations of picking up both forks -{日v.H,I fork1一v*fork2一v AHe IF emp A B.pc L1} for philosopher 1.The precondition of this block in- dicates that all jumpings to this block are allowed be- lw ti forki zero cause (pc =L1)is always valid when it is about to ex- -{日v.H,I fork1一v*fork2一U AH IF emp A B.pc L1 A R(t1)=H (fork1)} ecute the instruction labeled L1.After the block starts bne t1 zero L3 a transaction,the write attempts H of this transac- -{Hs I fork1一0*fork2→0 tion are set empty and the global heap H is copied to AHe I-emp A B.pc L1} local H.,so the local heap also respects the global in- addiu t2 zero 1 variant here (Inv H.).Also the target of rollback is -{HsI-fork1一0*fork2→0 set where the transaction starts (B.pc =L1).Then it AHe I-emp A B.pc L1 A R(t2)=1} tries to pick up both forks (updates H.)after checking sw t2 fork1 zero that the forks are free at its view.Note that the invari- -{H I-fork11*fork20 ant is broken locally(H,fork1一1*fork2一0) AHe IF fork11A B.pc L1 A R(t2)=1} when the first fork is picked(sw t2 fork zero),and sw t2 fork2 zero is reestablished when the other fork is also picked before -{H。IFfork1-1*for2一1 the transaction commits.Then it commits its picking AH IF fork1一1*fork2→1∧B.pc=L1} up(Hw l fork1→1*fork2→l)to the global heap commit H if no forks have been picked and are being held by -{true} the other (Thread2)during its try. j L2 Note that the reasoning of Thread needs no infor- mation of Thread2.Due to the guarantee that all modi- L2:-{pc=L2 fications to the global heap will respect the global in- starttrans variant,we are able to perform such reasoning.And -{臼v.Hs l fork1→v*fork2一U it is obvious that the composition of such reasoning is AHLe I-emp A B.pc L2} easy,since we need only to identify the compatibility addiu t3 zero 0 of the invariants of different threads to increase concur- -{月v.Hs l fork1→v*fork2一U rency. AHe I-emp A B.pc L2 A R(t3)=0} sw t3 fork zero When reasoning about the lock-based program for -{扫v.Hs l fork1→0*fork2一u this examplel211,programmers are required to en- AHLe I-fork10A B.pc L2A R(t3)=0} sure the non-interference between every two threads. sw t3 fork2 zero And when composing threads,the non-interference be- -{Hs IF fork1→0*for2→0 tween the newly-added one and all the existent is re- AHw I-fork1一0*fork2→0∧B.pc=L2]} quired.The concurrency-control mechanism,transac- commit tional memory,significantly eases the reasoning. -{true} Interestingly,we can perform a similar reasoning for j L1 the corresponding high-level programs in Fig.7.We know that (fork fork2)is valid when the trans- L3:-{pc=L3AH4 Ifork1→1*fork2→1 action starts.And the forks are free to pick if(fork= AHw I-emp A B.pc =L1} 0=fork2).Then the philosopher will commit his pick- commit ing attempts successfully if shared memory has not been -{true updated by the other thread.Finally,shared memory j L1 remains in a state that respects the invariant (forki= Fig.8.Dining Philosophers in our system. 1=fork2)
Long Li et al.: Certifying Concurrent Programs Using TM 117 i ∨ forki = forki−1 = i − 1), indicating that the i-th philosopher is not holding forks or he is holding both forks he need or his fork is being held by others, where forks are free or picked in pairs and a philosopher is trying to pick his own fork and the fork of the philosopher Inv H , ∃ v. H ° fork1 7→ v ∗ fork2 7→ v L1: −{pc = L1} starttrans −{∃ v.Hs ° fork1 7→ v ∗ fork2 7→ v ∧Hw ° emp ∧ B.pc = L1} lw t1 fork1 zero −{∃ v.Hs ° fork1 7→ v ∗ fork2 7→ v ∧Hw ° emp ∧ B.pc = L1 ∧ R(t1) = Hs(fork1)} bne t1 zero L3 −{Hs ° fork1 7→ 0 ∗ fork2 7→ 0 ∧Hw ° emp ∧ B.pc = L1} addiu t2 zero 1 −{Hs ° fork1 7→ 0 ∗ fork2 7→ 0 ∧Hw ° emp ∧ B.pc = L1 ∧ R(t2) = 1} sw t2 fork1 zero −{Hs ° fork1 7→ 1 ∗ fork2 7→ 0 ∧Hw ° fork1 7→ 1 ∧ B.pc = L1 ∧ R(t2) = 1} sw t2 fork2 zero −{Hs ° fork1 7→ 1 ∗ fork2 7→ 1 ∧Hw ° fork1 7→ 1 ∗ fork2 7→ 1 ∧ B.pc = L1} commit −{true} j L2 L2: −{pc = L2} starttrans −{∃ v.Hs ° fork1 7→ v ∗ fork2 7→ v ∧Hw ° emp ∧ B.pc = L2} addiu t3 zero 0 −{∃ v.Hs ° fork1 7→ v ∗ fork2 7→ v ∧Hw ° emp ∧ B.pc = L2 ∧ R(t3) = 0} sw t3 fork1 zero −{∃ v.Hs ° fork1 7→ 0 ∗ fork2 7→ v ∧Hw ° fork1 7→ 0 ∧ B.pc = L2 ∧ R(t3) = 0} sw t3 fork2 zero −{Hs ° fork1 7→ 0 ∗ fork2 7→ 0 ∧Hw ° fork1 7→ 0 ∗ fork2 7→ 0 ∧ B.pc = L2} commit −{true} j L1 L3: −{pc = L3 ∧ Hs ° fork1 7→ 1 ∗ fork2 7→ 1 ∧Hw ° emp ∧ B.pc = L1} commit −{true} j L1 Fig.8. Dining Philosophers in our system. next to him always. Note that we cannot ensure fairness in our system. Verification of such liveness properties is part of our future work. The corresponding specification and program for our system are shown in Fig.8. Only the code for Thread1 is given because that of Thread2 is similar. In this example, Inv defines the invariant on the global heap for this program (fork1 = fork2). We explain the code block labeled L1, which corresponds to the operations of picking up both forks for philosopher 1. The precondition of this block indicates that all jumpings to this block are allowed because (pc = L1) is always valid when it is about to execute the instruction labeled L1. After the block starts a transaction, the write attempts Hw of this transaction are set empty and the global heap H is copied to local Hs, so the local heap also respects the global invariant here (Inv Hs). Also the target of rollback is set where the transaction starts (B.pc = L1). Then it tries to pick up both forks (updates Hs) after checking that the forks are free at its view. Note that the invariant is broken locally (Hs ° fork1 7→ 1 ∗ fork2 7→ 0) when the first fork is picked (sw t2 fork1 zero), and is reestablished when the other fork is also picked before the transaction commits. Then it commits its picking up (Hw ° fork1 7→ 1 ∗ fork2 7→ 1) to the global heap H if no forks have been picked and are being held by the other (Thread2) during its try. Note that the reasoning of Thread1 needs no information of Thread2. Due to the guarantee that all modi- fications to the global heap will respect the global invariant, we are able to perform such reasoning. And it is obvious that the composition of such reasoning is easy, since we need only to identify the compatibility of the invariants of different threads to increase concurrency. When reasoning about the lock-based program for this example[21], programmers are required to ensure the non-interference between every two threads. And when composing threads, the non-interference between the newly-added one and all the existent is required. The concurrency-control mechanism, transactional memory, significantly eases the reasoning. Interestingly, we can perform a similar reasoning for the corresponding high-level programs in Fig.7. We know that (fork1 = fork2) is valid when the transaction starts. And the forks are free to pick if (fork1 = 0 = fork2). Then the philosopher will commit his picking attempts successfully if shared memory has not been updated by the other thread. Finally, shared memory remains in a state that respects the invariant (fork1 = 1 = fork2)
118 J.Comput.Sci.Technol.,Jan.2009,Vol.24,No.1 Usually,high-level languages use a programming structure atomicf...}to identify a transaction.Such a InuH≌3t1,2,tg,t4.HIFul1一U1*Empty一2* transaction structure denies the flexibility of construct- In一v3*0ut一U4 ing a transaction,where branches may lead to different A v1 TRUE-2 FALSE components of a same start of a transaction,as shown AU2=TRUE→U1=FALSE in Fig.7.For the same philosopher thread,an auxil- A v3 BFSIZE A v4 BFSIZE iary variable is needed to record the picking attempts L.start: -[pc L.start inside the transaction and used to decide whether to starttrans try for forks again before eating.Fig.9 presents the lw rfull Full zero code fragment using an auxiliary variable for the atomic addiu rtrue zero TRUE bne rfull rtrue Laddin block.We can see that the assembly-level abstraction j L_commit does provide certain convenience. Laddin: -{B.pc =L_start A Hu l emp A了,2,3,v4.Hs IF Fu11一1* while (TRUE){ Empty一2*In→3*0ut一v4 picked FALSE; A U3 BFSIZE A v4 BFSIZE atomicf A R(rtrue)=TRUE Av1 TRUE} if (fork1){ lw rin In zero fork fork2 =1; addiu rin rin 1 picked TRUE: addiu rtemp zero BFSIZE } beq rin rtemp L.subR jLfcheck if (picked)continue; Lfcheck: /eat sw rin In zero lw rout Out zero beq rin rout Lfset jL-setemp Fig.9.Auxiliary variable in atomic block. L_setemp: -{B.pc L.start A31,2,3,U4.Hs I-Fu11一1* 4.2 Producer-Consumer Algorithm Empty一2*In一3*0ut一v4 A v3 BFSIZE A v4 BFSIZE Fig.10 presents the code for the producer-consumer ∧(U1≠TRUE A H In一ts) algorithm where producers keep putting goods into the V(v1=TRUE A Hw I-Ful1→U1*In-v3)} addiu remp zero FALSE Variables: Invariant: sw remp Empty zero bool Full,Empty; !(Ful1 A Empty) j L_commit int In,Out; 0≤In,Out<BFSIZE L_commit: -[B.pc=L.start A 3 v1:U3;04.Hs IF Fullv1* Producer: Consumer: Empty一FALSE*In一3*Out一v4 while (TRUE){ while (TRUE){ A U3 BFSIZE A v4 BFSIZE starttrans; starttrans; ∧(U1≠TRUE if (Full) if (Empty) AH Empty一FALSE*In一g) V(v1 TRUE In++; 0ut++; ∧(He I Ful1一U1*Empty→FALSE if (In ==BFSIZE) if (Out -BFSIZE) In→3 V H I-emp)} In BFSIZE; Out -BFSIZE; commit if (In =Out) if (Out =In) jL-start Full TRUE; Empty TRUE; L_subR: Empty FALSE; Full FALSE; subu rin rin rtemp } } jL.fcheck commit; commit; L fset: } sw rtrue Full zero j L.setemp Fig.10.Producer-consumer. Fig.11.Producer-consumer in our system
118 J. Comput. Sci. & Technol., Jan. 2009, Vol.24, No.1 Usually, high-level languages use a programming structure atomic{. . . } to identify a transaction. Such a transaction structure denies the flexibility of constructing a transaction, where branches may lead to different components of a same start of a transaction, as shown in Fig.7. For the same philosopher thread, an auxiliary variable is needed to record the picking attempts inside the transaction and used to decide whether to try for forks again before eating. Fig.9 presents the code fragment using an auxiliary variable for the atomic block. We can see that the assembly-level abstraction does provide certain convenience. while (TRUE) { picked = FALSE; atomic{ if (! fork1){ fork1 = fork2 = 1; picked = TRUE; } } if (! picked) continue; // eat . . . Fig.9. Auxiliary variable in atomic block. 4.2 Producer-Consumer Algorithm Fig.10 presents the code for the producer-consumer algorithm where producers keep putting goods into the Variables: Invariant: bool Full, Empty; ! (Full ∧ Empty) int In, Out; 0 6 In, Out < BFSIZE Producer : Consumer : while (TRUE) { while (TRUE) { starttrans; starttrans; if (! Full) if (! Empty) { { In+ +; Out+ +; if (In == BFSIZE) if (Out == BFSIZE) In -= BFSIZE; Out -= BFSIZE; if (In == Out) if (Out == In) Full = TRUE; Empty = TRUE; Empty = FALSE; Full = FALSE; } } commit; commit; } } Fig.10. Producer-consumer. Inv H , ∃ v1, v2, v3, v4. H ° Full 7→ v1 ∗ Empty 7→ v2∗ In 7→ v3 ∗ Out 7→ v4 ∧ v1 = TRUE → v2 = FALSE ∧ v2 = TRUE → v1 = FALSE ∧ v3 < BFSIZE ∧ v4 < BFSIZE L start: −{pc = L start} starttrans lw rfull Full zero addiu rtrue zero TRUE bne rfull rtrue L addin j L commit L addin: −{B.pc = L start ∧ Hw ° emp ∧ ∃ v1, v2, v3, v4. Hs ° Full 7→ v1∗ Empty 7→ v2 ∗ In 7→ v3 ∗ Out 7→ v4 ∧ v3 < BFSIZE ∧ v4 < BFSIZE ∧ R(rtrue) = TRUE ∧ v1 6= TRUE} lw rin In zero addiu rin rin 1 addiu rtemp zero BFSIZE beq rin rtemp L subR j L fcheck L fcheck: sw rin In zero lw rout Out zero beq rin rout L fset j L setemp L setemp: −{B.pc = L start ∧ ∃ v1, v2, v3, v4. Hs ° Full 7→ v1∗ Empty 7→ v2 ∗ In 7→ v3 ∗ Out 7→ v4 ∧ v3 < BFSIZE ∧ v4 < BFSIZE ∧ ((v1 6= TRUE ∧ Hw ° In 7→ v3) ∨ (v1 = TRUE ∧ Hw ° Full 7→ v1 ∗ In 7→ v3))} addiu remp zero FALSE sw remp Empty zero j L commit L commit: −{B.pc = L start ∧ ∃ v1, v3, v4. Hs ° Full 7→ v1∗ Empty 7→ FALSE ∗ In 7→ v3 ∗ Out 7→ v4 ∧ v3 < BFSIZE ∧ v4 < BFSIZE ∧ ((v1 6= TRUE ∧ Hw ° Empty 7→ FALSE ∗ In 7→ v3) ∨ (v1 = TRUE ∧ (Hw ° Full 7→ v1 ∗ Empty 7→ FALSE∗ In 7→ v3 ∨ Hw ° emp)))} commit j L start L subR: subu rin rin rtemp j L fcheck L fset: sw rtrue Full zero j L setemp Fig.11. Producer-consumer in our system
Long Li et al.:Certifying Concurrent Programs Using TM 119 storage while consumers keep consuming goods.The of interaction.And we formalize our reasoning into a storage is modeled as a finite First In First Out queue in PCC[16]system. the program.Threads(producers and consumers)oper- Recent years,Shao et al.have developed CCAP211 ate on the queue concurrently while keeping the queue CMAP[22]and SAGL[36]as extensions to the PCC consistent:the queue will not overflow and will not framework to verify properties of concurrent programs be full and empty simultaneously.This invariant is ex- using locks.And we present an extension to enable pressed as!(Full A Empty)A(0≤In,Out<BFSIZE), verification of concurrent programs using TM. where Full and Empty indicate the queue is full or Also,much work has started to explore the seman- empty respectively,In points to the location where pro- tics of transactional memory.Moore and Grossman[13] ducers are about to put goods and Out points to the have presented type systems to study the behaviors of location where consumers are about to consume goods. spawning threads in transactional programs and showed The assembly code for producers and the corre- the relation of strong and weak atomicity in TM sys- sponding invariant for the program are presented in tem's implementations.They presented the semantics Fig.11.The specifications are also shown of where a of transactions in a A-calculus.In a later version of their producer thread is about to start a transaction (labeled work4l,they expanded to study the semantics of the L-start,to increase goods (labeled L_addin),to sig- rollback operation for eager versioning in details.They nal the non-emptiness of the storage(labeled L-setemp) used different type systems to study different issues. and to commit the transaction(labeled L_commit).And Abadi et al.15]developed semantics and type sys- the code and specifications for consumers are similar tems for the Automatic Mutual Exclusion (AME)cal- and not shown here. culus,which encourage programmers to place as much Producers and consumers would start transactions of the program text inside transactions as possible. and collect effects they attempt to make based on their They also presented the semantics and correctness of local views to the global heap.Then they would commit the rollback operation in their systems. their modifications to the global heap with respect to Vitek et al.[1]used a variant of Featherweight the global invariant when their attempts are validated. Javal37 to define a framework to explore various im- Our work is fully mechanized in the Coq proof as- plementation strategies of TM systems and to ensure sistant,including the system,its soundness proof and the correctness of an implementation by establishing a the examples that are demonstrated above.Interested serializability result.And they assumed all codes exe- readers can check out our Coq implementation[281 for cute inside transactions. more details However,most work on TM semantics has just ex- plored definition of abstract machines and execution of Related Work transactional programs,and does not propose rules for certifying general properties of concurrent programs us- Many methods have been proposed for reasoning ing TM. about properties of both sequential and concurrent programs(29-331.But most efforts on concurrent pro 6 Conclusion grams focus on the lock-based programs and do not aim at programs using TM.And the lock-based reason- In this paper we present a system for verifying ing requires programmers to reason about the interfer- concurrent programs using transactional memory.We ence between threads.But as we present in this paper, modeled an assembly-level machine with built-in trans- TM-based reasoning does not.There is an easier way actional memory.We incorporated the well-known enforced by the TM semantics to perform TM-based invariant-based proof method into our system by requir- reasoning. ing the preservation of invariant not only on properties Harris et al.4 require an explicit definition of in- of each thread of a program but also on the shared mem- variants from programmers to track the dependencies ory these threads cooperate on.We proved our reason- between transactions.But they do not specify the prop- ing sound with respect to the TM semantics.And we erties of a program.Concurrent separation logicl35] used a few examples to demonstrate the effectiveness of separates shared memory into pieces,each of which is our techniques.For now,our system can reason about guarded by a lock and respects certain invariant,to safety properties of programs using TM only.Enriching perform the verification of lock-based concurrent pro- the expressiveness of the program logic of our system grams.We are enlightened by these work and specify to enable the verification of liveness properties such as invariants on shared memory to check the correctness fairness will be part of our future work.And we also
Long Li et al.: Certifying Concurrent Programs Using TM 119 storage while consumers keep consuming goods. The storage is modeled as a finite First In First Out queue in the program. Threads (producers and consumers) operate on the queue concurrently while keeping the queue consistent: the queue will not overflow and will not be full and empty simultaneously. This invariant is expressed as ! (Full ∧ Empty) ∧ (0 6 In, Out < BFSIZE), where Full and Empty indicate the queue is full or empty respectively, In points to the location where producers are about to put goods and Out points to the location where consumers are about to consume goods. The assembly code for producers and the corresponding invariant for the program are presented in Fig.11. The specifications are also shown of where a producer thread is about to start a transaction (labeled L start, to increase goods (labeled L addin), to signal the non-emptiness of the storage (labeled L setemp) and to commit the transaction (labeled L commit). And the code and specifications for consumers are similar and not shown here. Producers and consumers would start transactions and collect effects they attempt to make based on their local views to the global heap. Then they would commit their modifications to the global heap with respect to the global invariant when their attempts are validated. Our work is fully mechanized in the Coq proof assistant, including the system, its soundness proof and the examples that are demonstrated above. Interested readers can check out our Coq implementation[28] for more details. 5 Related Work Many methods have been proposed for reasoning about properties of both sequential and concurrent programs[29−33]. But most efforts on concurrent programs focus on the lock-based programs and do not aim at programs using TM. And the lock-based reasoning requires programmers to reason about the interference between threads. But as we present in this paper, TM-based reasoning does not. There is an easier way enforced by the TM semantics to perform TM-based reasoning. Harris et al.[34] require an explicit definition of invariants from programmers to track the dependencies between transactions. But they do not specify the properties of a program. Concurrent separation logic[35] separates shared memory into pieces, each of which is guarded by a lock and respects certain invariant, to perform the verification of lock-based concurrent programs. We are enlightened by these work and specify invariants on shared memory to check the correctness of interaction. And we formalize our reasoning into a PCC[16] system. Recent years, Shao et al. have developed CCAP[21] , CMAP[22] and SAGL[36] as extensions to the PCC framework to verify properties of concurrent programs using locks. And we present an extension to enable verification of concurrent programs using TM. Also, much work has started to explore the semantics of transactional memory. Moore and Grossman[13] have presented type systems to study the behaviors of spawning threads in transactional programs and showed the relation of strong and weak atomicity in TM system’s implementations. They presented the semantics of transactions in a λ-calculus. In a later version of their work[14], they expanded to study the semantics of the rollback operation for eager versioning in details. They used different type systems to study different issues. Abadi et al.[15] developed semantics and type systems for the Automatic Mutual Exclusion (AME) calculus, which encourage programmers to place as much of the program text inside transactions as possible. They also presented the semantics and correctness of the rollback operation in their systems. Vitek et al.[11] used a variant of Featherweight Java[37] to define a framework to explore various implementation strategies of TM systems and to ensure the correctness of an implementation by establishing a serializability result. And they assumed all codes execute inside transactions. However, most work on TM semantics has just explored definition of abstract machines and execution of transactional programs, and does not propose rules for certifying general properties of concurrent programs using TM. 6 Conclusion In this paper we present a system for verifying concurrent programs using transactional memory. We modeled an assembly-level machine with built-in transactional memory. We incorporated the well-known invariant-based proof method into our system by requiring the preservation of invariant not only on properties of each thread of a program but also on the shared memory these threads cooperate on. We proved our reasoning sound with respect to the TM semantics. And we used a few examples to demonstrate the effectiveness of our techniques. For now, our system can reason about safety properties of programs using TM only. Enriching the expressiveness of the program logic of our system to enable the verification of liveness properties such as fairness will be part of our future work. And we also