the primary since it cannot assume that there is a one-to-one match between the number of source replicas and destination replicas. For the data extraction requests, the primary notifies secondaries when a range is partially or fully extracted so the replica can remove tuples from its local copy. Using fixed-size chunks enables the replicas to deterministically remove the same tuples per chunk as their primary without needing to send a list of tuple IDs. For data loading, the primary forwards the pull response to its secondary replicas for them to load newly migrated tuples. Before the primary sends an acknowledgement to Squall that it received the new data, it must receive an acknowledgement from all of its replicas. This guarantees strong consistency between the replicas and that for each tuple there is only one primary copy at any time.

We now describe how Squall handles either the failure of a single node or an entire cluster during a reconfiguration.

6.1 Failure Handling

The DBMS sends heartbeats between nodes and uses watchdog processes to determine when a node has failed. There are three cases that Squall handles: (1) the reconfiguration leader failing, (2) a node with a source partition failing, and (3) a node with a destination partition failing. A single node failure can involve all three scenarios (e.g., the leader fails and has both in- and out-going data). Since replicas independently track the progress of the reconfiguration, they are able to replace a failed primary replica during the reconfiguration if needed. If any node fails during a reconfiguration, it is not allowed to rejoin the cluster until the reconfiguration has completed. Afterwards, it recovers the updated state from its primary node.

During the data migration phase, the leader's state is synchronously replicated to its secondary replicas. If the leader fails, a replica is able to resume managing the reconfiguration. After fail-over, the new leader replica sends a notification to all partitions to announce the new leader. Along with this notification, the last message sent by the old leader is rebroadcast in case it failed during transmission.

When a node fails while migrating data, the secondary replica replacing the partition's primary replica reconciles any potentially lost reconfiguration messages. Since migration requests are synchronously executed at the secondary replicas, Squall is only concerned with requests sent to the failed primary that have not yet been processed. When a secondary replica is promoted to a primary, it broadcasts a notification to all partitions so that other partitions resend any pending requests that were addressed to the recently failed site.
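To make the resend step concrete, the sketch below shows one way a partition could track outstanding pull requests and replay the ones addressed to a failed site once a promotion notification arrives. This is a minimal illustration under assumed names, not Squall's actual implementation: `PullRequestTracker`, `PullRequest`, and `MessageChannel` are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Each partition keeps the pull requests it has issued but not yet seen
// answered; when a promoted secondary announces its takeover, the partition
// resends every request that targeted the failed site.
final class PullRequestTracker {
    // requestId -> request still awaiting a response
    private final Map<Long, PullRequest> pending = new ConcurrentHashMap<>();

    void onRequestSent(PullRequest req) {
        pending.put(req.requestId(), req);
    }

    void onResponseReceived(long requestId) {
        pending.remove(requestId);
    }

    // Invoked when a promotion notification arrives. The partition ID stays
    // the same across fail-over, so resending to the same partition delivers
    // the request to the newly promoted primary; a duplicate delivery of a
    // chunked pull is harmless because chunks are extracted deterministically.
    void onPrimaryPromoted(int failedSiteId, MessageChannel channel) {
        for (PullRequest req : pending.values()) {
            if (req.targetSiteId() == failedSiteId) {
                channel.send(req.targetPartitionId(), req);
            }
        }
    }
}

record PullRequest(long requestId, int targetSiteId, int targetPartitionId) {}

interface MessageChannel {
    void send(int partitionId, PullRequest req);
}
```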
6.2 Crash Recovery

During a reconfiguration, the DBMS suspends all of its checkpoint operations. This ensures that the partitions' checkpoints stored on disk are consistent (i.e., a tuple does not exist in two partitions at the same time). The DBMS continues to write transaction entries to its command log during data migration.

If the entire system crashes after a reconfiguration completes but before a new snapshot is taken, then the DBMS recovers the database from the last checkpoint and performs the migration process again. The DBMS scans the command log to find the starting point after the last checkpoint entry and looks for the first reconfiguration transaction that started after the checkpoint. If one is found, then the DBMS extracts the partition plan from that entry and uses it as the current plan. The execution engine for each partition then reads its last snapshot. For each tuple in a snapshot, Squall determines which partition should store that tuple, since it may not be the same partition that is reading the snapshot.

Once the snapshot has been loaded into memory from the file on disk, the DBMS replays the command log to restore the database to the state that it was in before the crash. The DBMS's coordinator ensures that these transactions are executed in the exact order that they were originally executed the first time. Hence, the state of the database after this recovery process is guaranteed to be correct, even if the number of partitions changes due to the reconfiguration. This is because (1) transactions are logged and replayed in serial order, so the re-execution occurs in exactly the same order as the initial execution, and (2) replay begins from a transactionally consistent snapshot that does not contain any uncommitted data, so no rollback is necessary at recovery time [21, 27].
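A minimal sketch of this recovery path follows, under assumed types: `LogEntry`, `PartitionPlan`, `Snapshot`, and the rest are illustrative stand-ins, not H-Store's actual API. It locates the last checkpoint marker, adopts the partition plan from the first reconfiguration transaction after it, and routes each snapshot tuple through that plan rather than back to the partition whose snapshot contained it.

```java
import java.util.List;

final class ReconfigRecovery {
    // Find the last checkpoint marker in the command log; replay starts just
    // after it. The first reconfiguration transaction after that point
    // carries the partition plan that becomes the current plan.
    static PartitionPlan planForRecovery(List<LogEntry> log, PartitionPlan oldPlan) {
        int start = 0;
        for (int i = 0; i < log.size(); i++) {
            if (log.get(i).isCheckpointMarker()) {
                start = i + 1;
            }
        }
        for (int i = start; i < log.size(); i++) {
            if (log.get(i).isReconfigurationTxn()) {
                return log.get(i).partitionPlan();
            }
        }
        return oldPlan; // no reconfiguration after the checkpoint: keep the old plan
    }

    // While loading a snapshot, each tuple may belong to a different partition
    // than the one that wrote the snapshot, so place it via the current plan.
    static void loadSnapshot(Snapshot snap, PartitionPlan plan, Cluster cluster) {
        for (Tuple t : snap.tuples()) {
            int target = plan.partitionFor(t.partitioningKey());
            cluster.partition(target).insert(t);
        }
    }
}

// Illustrative interfaces standing in for the engine's real structures.
interface LogEntry {
    boolean isCheckpointMarker();
    boolean isReconfigurationTxn();
    PartitionPlan partitionPlan();
}
interface PartitionPlan { int partitionFor(Object partitioningKey); }
interface Tuple { Object partitioningKey(); }
interface Snapshot { Iterable<Tuple> tuples(); }
interface Partition { void insert(Tuple t); }
interface Cluster { Partition partition(int id); }
```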
7. EXPERIMENTAL EVALUATION

We now present our evaluation of Squall. For this analysis, we integrated Squall with H-Store [1] using an external controller that initiates the reconfiguration procedure at a fixed time in the benchmark trial. We used the April 2014 release of H-Store with command logging enabled. We set Squall's chunk size limit to 8 MB and the minimum time between asynchronous pulls to 200 ms. We also limit the number of reconfiguration sub-plans to between 5 and 20, with a 100 ms delay between them. The experiments in Section 7.6 provide our justification for this configuration.

The experiments were conducted on a cluster where each node has an Intel Xeon E5620 CPU and runs 64-bit CentOS Linux with OpenJDK 1.7. The nodes are in a single rack connected by a 1 Gb switch with an average RTT of 0.35 ms.

As part of this evaluation, we also implemented three different reconfiguration approaches in H-Store:

Stop-and-Copy: A distributed transaction locks the entire cluster and then performs the data migration. All partitions block until this process completes.

Pure Reactive: The system only pulls single tuples from the source partition to the destination when they are needed. This is the same as Squall but without the asynchronous migration or any of the optimizations described in Section 5.

Zephyr+: This technique combines the pure-reactive migration with chunked asynchronous pulls (cf. Section 4.5) and pull prefetching (cf. Section 5.3) to simulate pulling data in pages instead of individual keys.

The purely reactive reconfiguration approach is semantically equivalent to the Zephyr [19] migration system based on the following observations. First, Zephyr relies on transaction execution completing at the source to begin a push phase in addition to the reactive pull-based migration. In general, transaction execution at the source does not complete during a reconfiguration, so we stick to a purely pull-based approach. Second, Zephyr can only migrate a disk page of tuples at a time (typically 4 KB), whereas Squall can migrate both single tuples and ranges. Third, there is no need to copy a "wireframe" between partitions to initialize a destination, as all required information (e.g., schema or authentication) is already present. In our experiments, the pure reactive technique was not guaranteed to finish the reconfiguration (because some tuples are never accessed), and pulling single keys at a time created significant coordination overhead. Thus, we also implemented the Zephyr+ approach that uses chunked asynchronous pulls with prefetching.

Our experiments measure how well these approaches are able to reconfigure a database in H-Store under a variety of situations, including re-balancing overloaded partitions, cluster contraction, and shuffling data between partitions. For experiments where the Pure Reactive and Zephyr+ results are identical, we only show the latter.
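To illustrate the difference between the last two approaches, the sketch below contrasts a single-key reactive pull (Pure Reactive) with a chunked pull that prefetches the surrounding range (Zephyr+). Every type and method name here is an assumption for illustration, not Squall's actual interface.

```java
final class ReactiveMigrator {
    private static final int CHUNK_BYTES = 8 * 1024 * 1024; // 8 MB, matching the limit above

    // Illustrative stand-ins for the source partition's extraction API.
    interface SourcePartition {
        byte[] extractKey(long key);                       // single-tuple pull
        KeyRange chunkContaining(long key, int maxBytes);  // fixed-size range around a key
        byte[] extractRange(KeyRange range);               // chunked pull
    }

    record KeyRange(long low, long high) {}

    interface RangeSet {          // tracks which keys have already arrived
        boolean contains(long key);
        void add(KeyRange range);
        void add(long key);
    }

    private final SourcePartition source;
    private final RangeSet local;
    private final boolean chunked; // false = Pure Reactive, true = Zephyr+

    ReactiveMigrator(SourcePartition source, RangeSet local, boolean chunked) {
        this.source = source;
        this.local = local;
        this.chunked = chunked;
    }

    // Invoked when a transaction needs a key the destination does not yet own.
    byte[] fetch(long key) {
        if (local.contains(key)) {
            return null; // already migrated, nothing to pull
        }
        if (chunked) {
            // Zephyr+: pull the whole chunk so neighboring keys are prefetched,
            // emulating page-granularity migration with far fewer round trips.
            KeyRange chunk = source.chunkContaining(key, CHUNK_BYTES);
            byte[] data = source.extractRange(chunk);
            local.add(chunk);
            return data;
        } else {
            // Pure Reactive: pull exactly this key; every miss pays a round trip,
            // and a key that is never accessed is never migrated.
            byte[] data = source.extractKey(key);
            local.add(key);
            return data;
        }
    }
}
```

The single-key branch also makes the completion problem visible: a tuple that no transaction touches is never pulled, which is why Pure Reactive cannot guarantee that a reconfiguration finishes and why the chunked asynchronous pulls are needed.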