that globally ordered timestamps are given out by an external service, as suggested in [15] or [17]. Since the timestamp service implementation is orthogonal, we simply assume that the timestamp service is not a potential bottleneck when using approaches like Spanner [17] or epoch-based SI [62]. After the TM received the CID, it prepares the other nodes involved in the transaction through prepare messages to the resource managers (RM) as part of 2PC [round-trip message 3].
Each RM (1) checks to see if the records in its partition have been modified since they have been read by the transaction and (2) locks each tuple to prevent updates by other transactions after the validation [34]. This normally requires checking if any of the records of the write-set has a higher CID than the RID. If the TM was able to prepare all involved RMs, the transaction can be committed by sending a commit message to all RMs [round-trip message 4], which installs the new version (value and commit-timestamp) and releases the locks. Moreover, in order to make the new value readable by other transactions, the TM needs to wait until the second phase of 2PC completes [message 4], and then inform the timestamp service that a new version was installed [one-way message 5]. For the remainder, we assume that the timestamp service implements a logic similar to [15] or Oracle RAC [49] to ensure the SI properties. That is, if a client requests an RID, the timestamp service returns the largest committed timestamp. Finally, the TM notifies the client about the outcome of the transaction [one-way message 6].

Overall, the protocol requires 9 one-way message delays if done in the previously outlined sequential order. However, some messages can be sent in parallel: the commit-timestamp [message 2] can be requested in parallel to preparing the resource managers [message 3] since the commit-timestamp is not required until the 2nd phase of 2PC [message 4]. This simplification is possible since we assume blind writes are not allowed; therefore a transaction must read all data items (and their corresponding RIDs) in its working set before attempting to commit. Similarly, the client can be informed [message 6] in parallel with the 2nd phase of 2PC [message 4].
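The RM-side validation just described can be sketched as follows. This is a minimal illustration under simplifying assumptions (a dict-based single-version store, a lock set instead of per-tuple latches); the class and method names are ours, not the paper's implementation.

```python
# Minimal sketch of the RM prepare/commit logic described above.
# Illustrative only: dict-based storage and a lock set stand in for
# the real versioned store and per-tuple locks.

class ResourceManager:
    def __init__(self):
        self.records = {}   # key -> (value, commit timestamp CID)
        self.locks = set()  # keys locked by a pending 2PC

    def prepare(self, write_set, rid):
        """Phase 1: validate the write-set against the read RID and lock it."""
        for key in write_set:
            _, cid = self.records.get(key, (None, 0))
            if cid > rid:          # (1) modified after the transaction read it
                return False
            if key in self.locks:  # (2) another transaction is committing it
                return False
        self.locks.update(write_set)
        return True

    def commit(self, write_set, commit_cid):
        """Phase 2: install new versions and release the locks."""
        for key, value in write_set.items():
            self.records[key] = (value, commit_cid)
            self.locks.discard(key)
```

A prepare succeeds only if no record in the write-set carries a CID newer than the transaction's read timestamp and none is locked by a concurrent committer; the second phase then installs the new values with the commit timestamp.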
This reduces the number of message delays to 4 until the client can be informed about the outcome (one-way message 1, round-trip 3, one-way message 6), and to at least 6 until the transaction becomes visible (one-way message 1, round-trips 3 and 4, one-way message 5). Compared to a centralized DBMS, the 6 message delays required for 2PC substantially increase the execution time of a transaction.

Unlike the described 2PC protocol, a traditional 2PC protocol [42] does not require a timestamp service. However, a traditional 2PC protocol, which consists of a prepare and a commit/abort phase, still requires 6 message delays in total (including client notification). Thus, the following discussion is not specific to SI and can be generalized towards other, more traditional 2PC protocols as well.

4.1.2 Increased Contention Likelihood

The increased transaction latencies due to message delays increase the chance of contention and aborts. As outlined in Section 2, the average latency for small one-way messages over Ethernet is roughly 35µs, whereas the actual work of a transaction ranges from 10-60µs today if no disk or network is involved² [31, 24]. That is, for short-running transactions, the dominant factor for latency is the network, and 2PC just amplifies the bottleneck.

² For instance, [28] reported 64µs for a single-partition transaction on an ancient 2008 Xeon processor.

[Figure 5: Distributed 2PC Commit Protocols for SI. (a) Traditional SI; (b) RSI Protocol.]
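A back-of-the-envelope calculation with the numbers above (35µs one-way Ethernet latency, 6 message delays, 10-60µs of actual transaction work) makes the amplification concrete:

```python
# Back-of-the-envelope check of the claim above: with ~35us one-way
# Ethernet latency and 6 message delays until the transaction becomes
# visible, wire time dwarfs the 10-60us of actual work.

one_way_us = 35        # one-way message latency over Ethernet
message_delays = 6     # delays until the transaction becomes visible
network_us = one_way_us * message_delays  # time spent on the network

for work_us in (10, 60):
    total_us = network_us + work_us
    print(f"work={work_us}us total={total_us}us "
          f"network share={network_us / total_us:.0%}")
```

Even for the slowest transactions in that range, roughly three quarters of the end-to-end latency is spent on the network.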
In order to model the contention rate effect, we assume an M/M/1 queue X to estimate the number of waiting, i.e., conflicting, transactions for a given record r with some arrival rate λ. With this model, a 6× increase in transaction processing time (i.e., also referred to as service time t) yields a service capacity decrease to µ = 1/(6t) and thus an increased conflict likelihood of P(X ≥ 1) = 1 − P(X = 0) = 1 − (1 − λ/µ) = 6λt. However, a transaction rarely consists of a single record. With n records, the likelihood of a conflict increases to 1 − ∏_n P(X = 0) = 1 − (1 − 6λt)^n, if we employ the simplifying assumption that the access rate to all records is similar and independent. So the intuition that the likelihood of conflicts with 2PC increases is true. However, we did not consider the read-phase, and it is easy to show that the relative difference is less severe as more records are read (it adds a fixed cost to both).

In addition, a redesign of the commit protocol to use RDMA verbs can significantly decrease the conflict likelihood since the latency is much lower for small messages (see also Figure 2(b)). Furthermore, recent work has shown that most conflicts can be avoided using commutative updates [10]. In fact, using newer consistency protocols, it is even possible to take advantage of non-blocking commutative updates while preserving limit constraints (e.g., a product's stock cannot fall below 0), high availability, and using no centralized coordination [33]. As a result, we believe that the increased conflict likelihood should no longer be an argument that distributed transactions cannot scale.

4.1.3 CPU Overhead

The increased likelihood of conflicts is, however, not the main reason why distributed transactions in general, and 2PC specifically, are doomed to be non-scalable. With an increasing number of server nodes, the number of network messages also increases.
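The M/M/1 conflict model above can be evaluated numerically. The sketch below plugs hypothetical numbers (arrival rate, service time, record count are our choices, not from the paper) into 1 − (1 − 6λt)^n to compare the distributed and centralized cases:

```python
# Numerical sketch of the M/M/1 contention model above: with per-record
# arrival rate lam, service time t, and a 6x slowdown from 2PC, the
# conflict likelihood over n records is 1 - (1 - 6*lam*t)**n.
# The example numbers below are hypothetical.

def conflict_likelihood(lam, t, n, slowdown=6):
    """P(at least one conflict), assuming independent record accesses."""
    utilization = slowdown * lam * t  # lam / mu, with mu = 1 / (slowdown * t)
    return 1 - (1 - utilization) ** n

p_2pc  = conflict_likelihood(lam=100, t=50e-6, n=10)              # with 2PC
p_cent = conflict_likelihood(lam=100, t=50e-6, n=10, slowdown=1)  # centralized
print(f"2PC: {p_2pc:.3f}  centralized: {p_cent:.3f}")
```

With these numbers the 6× slowdown raises the per-transaction conflict probability several-fold, matching the intuition in the text; the effect shrinks once the fixed read-phase cost is added to both cases.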
In a centralized system, the DBMS only has to handle 2 messages per transaction (i.e., the request and response to the client). If we assume that the clients can be scaled independently from the server (and are not further considered), the server has to handle one receive message (m_r) and one send message (m_s) in the centralized case. Without RDMA, the receiver and the sender both have to spend CPU cycles for every message. In our distributed scenario of Figure 5(a) with one TM server and n involved RMs (n = 2 in Figure 5(a)), every transaction requires m_r = 2 + 4·n and m_s = 3 + 4·n messages. Assuming that sends and receives are similarly expensive, we
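The message-count formulas above are easy to tabulate; the following sketch (function name ours) shows how quickly the per-transaction message load grows with the number of involved RMs:

```python
# Message counts from the analysis above: with one TM and n involved
# RMs (Figure 5(a)), the server side handles m_r = 2 + 4n receives and
# m_s = 3 + 4n sends per transaction, versus one of each in the
# centralized case.

def messages_per_txn(n_rms):
    m_r = 2 + 4 * n_rms  # receives handled by the servers
    m_s = 3 + 4 * n_rms  # sends handled by the servers
    return m_r, m_s

for n in (2, 4, 8):
    m_r, m_s = messages_per_txn(n)
    print(f"n={n}: receives={m_r}, sends={m_s}, total={m_r + m_s}")
```

For n = 2 the servers already process 21 messages per transaction instead of 2, which is why the CPU cost of message handling, rather than conflicts, dominates the scalability argument.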