plan: { "warehouse (W_ID)": {
    "Partition 1": [0-3)
    "Partition 2": [3-5)
    "Partition 3": [5-9)
    "Partition 4": [9- ) }}
(a) Old Plan

plan: { "warehouse (W_ID)": {
    "Partition 1": [0-2)
    "Partition 2": [3-5)
    "Partition 3": [2-3),[5-6)
    "Partition 4": [6- ) }}
(b) New Plan

Figure 5: An example of an updated partition plan for the TPC-C database shown in Fig. 2.

Determining when a reconfiguration should occur and how the partition plan should evolve are addressed in the E-Store project [38]. We assume that a separate system controller monitors the DBMS and then initiates the reconfiguration process by providing the system with the new partition plan when appropriate [18, 38].

Squall processes a live reconfiguration in three stages: (1) initializing the partitions' tracking data structures, (2) migrating data between partitions, and (3) identifying when the reconfiguration process has terminated. In this section, we provide an overview of these three steps. We discuss Squall's data migration protocol in further detail in Section 4. We then present optimizations of this process in Section 5, such as eager data requests and a divide-and-conquer technique for reconfiguration plans.

3.1 Initialization

The initialization phase synchronously engages all of the partitions in the cluster to start a new reconfiguration. As part of this step, each partition prepares for the reconfiguration and identifies the tuples that it will exchange with other partitions.

A reconfiguration starts when the DBMS is notified by an external controller that the system needs to re-balance. This notification identifies (1) the new partition plan for the database and (2) the designated leader node for the operation. The leader can be any node in the cluster that contains a partition affected by the reconfiguration. If the reconfiguration calls for a new node to be added to the cluster, then that node must be on-line before the reconfiguration can begin.

To initiate the reconfiguration, the leader invokes a special transaction that locks every partition in the cluster and checks whether it is allowed to start the reconfiguration process. This locking is the same as when a normal transaction needs to access every partition in the system. The request is allowed to proceed if (1) the system has terminated all previous reconfigurations and (2) the DBMS is not writing out a recovery snapshot of the database to disk. If either of these conditions is not satisfied, then the transaction aborts and is re-queued after the blocking operation finishes.
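To make this admission check concrete, the following sketch expresses the two conditions as straight-line code. The names here (ReconfigInitializer, ClusterState, and its methods) are hypothetical illustrations for exposition, not Squall's actual API:

class ReconfigInitializer {
    interface ClusterState {
        boolean hasActiveReconfiguration();  // is a prior reconfiguration still running?
        boolean isSnapshotInProgress();      // is a recovery snapshot being written to disk?
    }

    // Invoked by the special transaction while it holds the lock on every
    // partition. If this returns false, the transaction aborts and is
    // re-queued to retry after the blocking operation finishes.
    static boolean canStart(ClusterState state) {
        return !state.hasActiveReconfiguration()   // condition (1)
            && !state.isSnapshotInProgress();      // condition (2)
    }
}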
This check ensures that all partitions have a consistent view of data ownership and prevents deadlocks caused by concurrent reconfigurations. Once all of the partitions agree to start the reconfiguration, they enter a "reconfiguration mode" in which each partition examines the new plan to identify which tuples are leaving the partition (outgoing) and which tuples will be moving into the partition (incoming). These incoming and outgoing tuples are broken into ranges based on their partitioning attributes. As we discuss in Section 5, this step is necessary because Squall may need to split tuple ranges into smaller chunks or split the reconfiguration into smaller sub-reconfigurations for performance reasons. After a partition completes this local data analysis, it notifies the leader and waits to learn whether the reconfiguration will proceed. If all of the partitions agree to proceed, then the leader sends an acknowledgement to all of the partitions to begin migrating data.

Squall uses this global lock only during the initialization phase to synchronize all partitions at the start of the reconfiguration; no data is migrated while the lock is held. Since the reconfiguration transaction only modifies the meta-data related to the reconfiguration, it is extremely short and has a negligible impact on performance. Across all of the trials in our experimental evaluation, the average length of this initialization phase was ∼130 ms.

3.2 Data Migration

The easiest way to safely transfer data in a distributed DBMS is to stop executing transactions and then move the data to its new location. This approach, known as stop-and-copy, ensures that transactions execute either entirely before or entirely after the transfer and therefore see a consistent view of the database. But shutting down the system is unacceptable for applications that cannot tolerate downtime, so stop-and-copy is not an option. It is non-trivial, however, to transfer data while the system is still executing transactions. For example, if half of the data that a transaction needs has already been migrated to a different partition, it is not obvious whether it is better to propagate the transaction's changes to that partition or to restart the transaction at the new location. The challenge is how to coordinate this data movement between partitions without losing or duplicating any data, and with minimal impact on the DBMS's performance.

To overcome these problems, Squall tracks the location of migrating tuples at each partition during the reconfiguration process. This allows each node's transaction manager to determine whether it has all of the tuples that a particular transaction needs. If the system is uncertain of the current location of the required tuples, then the transaction is scheduled at the partitions where the data is supposed to reside according to the new plan. If the transaction then attempts to access tuples that have not yet been moved, Squall reactively pulls them to the new location [19, 35]. Although this on-demand pull method introduces latency for transactions, it has four benefits: (1) it always advances the progress of the data migration, (2) active data is migrated earlier in the reconfiguration, (3) it requires no external coordination, and (4) it does not incur any downtime to synchronize ownership metadata. In addition to the on-demand data pulls, Squall also asynchronously migrates additional data so that the reconfiguration completes in a timely manner.
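The per-partition tracking and reactive pulls described above might be structured as in the following sketch. This is a simplified illustration under assumed names (MigrationTracker, requestRange, and so on), not Squall's actual implementation; it indexes the non-overlapping incoming key ranges in an ordered map so that a single floor lookup decides whether a key is already local:

import java.util.Map;
import java.util.TreeMap;

class MigrationTracker {
    enum Status { NOT_STARTED, IN_FLIGHT, RECEIVED }

    // An incoming half-open key range [lo, hi) this partition is waiting on.
    static final class IncomingRange {
        final long lo, hi;
        Status status = Status.NOT_STARTED;
        IncomingRange(long lo, long hi) { this.lo = lo; this.hi = hi; }
    }

    // The old owner of a range, from which data can be pulled on demand.
    interface SourcePartition { void requestRange(long lo, long hi); }

    // Incoming ranges indexed by their low key; because ranges do not
    // overlap, a floor lookup finds the only range that could contain a key.
    private final TreeMap<Long, IncomingRange> incoming = new TreeMap<>();

    void expectRange(long lo, long hi) { incoming.put(lo, new IncomingRange(lo, hi)); }

    // True if the tuple for this key is guaranteed to be on this partition:
    // either it is outside every incoming range, or its range has arrived.
    boolean isLocal(long key) {
        Map.Entry<Long, IncomingRange> e = incoming.floorEntry(key);
        return e == null || key >= e.getValue().hi
            || e.getValue().status == Status.RECEIVED;
    }

    // Called when a transaction touches a key that has not arrived yet:
    // issue a reactive, on-demand pull from the range's old owner. The
    // transaction blocks until the pulled data is installed locally, which
    // is the source of the added latency noted above.
    void pullIfMissing(long key, SourcePartition oldOwner) {
        Map.Entry<Long, IncomingRange> e = incoming.floorEntry(key);
        if (e == null || key >= e.getValue().hi) return;   // key is not migrating
        IncomingRange r = e.getValue();
        if (r.status == Status.NOT_STARTED) {
            r.status = Status.IN_FLIGHT;
            oldOwner.requestRange(r.lo, r.hi);
        }
    }
}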
Both the reactive and asynchronous migration requests are executed by a partition in the same manner as regular transactions. Each partition is responsible for tracking the progress of migrating data between itself and the other partitions. In other words, each partition only tracks the status of the tuples that it is migrating. This allows it to determine whether a particular tuple is currently stored in the local partition or whether it must be retrieved from another partition.

3.3 Termination

Since there is no centralized controller in Squall that monitors the progress of data migration, it is up to each partition to independently determine when it has sent and/or received all of the data that it needs. Once a partition recognizes that it has received all of the tuples required under the new partition plan, it notifies the current leader that data migration is finished at that partition. When the leader has received acknowledgements from all of the partitions in the cluster, it notifies every partition that the reconfiguration process is complete. Each partition then removes its tracking data structures and exits the reconfiguration mode.

4. MANAGING DATA MIGRATION

The migration of data between partitions in a transactionally safe manner is the most important feature of Squall. As such, we now discuss this facet of the system in greater detail. We first describe how Squall divides each partition's migrating tuples into ranges and tracks the progress of the reconfiguration.
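As a concrete example of this first step, the sketch below diffs an old and a new partition plan to derive the ranges that must move; the code and its names are our own illustration, not Squall's implementation. Applied to the plans in Fig. 5, it reports that partition 1 must send warehouses [2,3) to partition 3 and that partition 3 must send [6,9) to partition 4:

import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// A half-open key range [lo, hi) owned by a partition.
record Owned(long lo, long hi, int partition) {}

class PlanDiff {
    // Returns every elementary key range whose owner differs between the
    // two plans. (A real implementation would coalesce adjacent ranges
    // with the same source and destination.)
    static List<String> migrations(List<Owned> oldPlan, List<Owned> newPlan) {
        // Collect every range boundary so that each elementary interval
        // has exactly one owner in each plan.
        TreeSet<Long> cuts = new TreeSet<>();
        for (Owned o : oldPlan) { cuts.add(o.lo()); cuts.add(o.hi()); }
        for (Owned o : newPlan) { cuts.add(o.lo()); cuts.add(o.hi()); }

        List<String> moves = new ArrayList<>();
        Long prev = null;
        for (long cut : cuts) {
            if (prev != null) {
                int from = ownerOf(oldPlan, prev);
                int to = ownerOf(newPlan, prev);
                if (from != to)
                    moves.add("[" + prev + "," + cut + "): partition " + from + " -> " + to);
            }
            prev = cut;
        }
        return moves;
    }

    static int ownerOf(List<Owned> plan, long key) {
        for (Owned o : plan)
            if (key >= o.lo() && key < o.hi()) return o.partition();
        return -1;   // unowned; should not happen for a well-formed plan
    }

    public static void main(String[] args) {
        // The old and new plans from Fig. 5; Long.MAX_VALUE stands in for
        // the open upper bounds written as "[9- )" and "[6- )".
        List<Owned> oldPlan = List.of(
            new Owned(0, 3, 1), new Owned(3, 5, 2),
            new Owned(5, 9, 3), new Owned(9, Long.MAX_VALUE, 4));
        List<Owned> newPlan = List.of(
            new Owned(0, 2, 1), new Owned(3, 5, 2),
            new Owned(2, 3, 3), new Owned(5, 6, 3),
            new Owned(6, Long.MAX_VALUE, 4));
        // Prints "[2,3): partition 1 -> 3" and "[6,9): partition 3 -> 4".
        migrations(oldPlan, newPlan).forEach(System.out::println);
    }
}

In practice, the moved ranges produced by such a diff would be further split into smaller chunks for performance reasons, as discussed in Section 5.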