Squall:Fine-Grained Live Reconfiguration for Partitioned Main Memory Databases Aaron J.Elmore',Vaibhav Arora2,Rebecca Taft Andrew Pavlo,Divyakant Agrawal2.s,Amr El Abbadi wrcSgne8a8mag1Baansne aeimorecschcsb.. ABSTRACT vith the abilit udden shifts This whe e is de (OLTP) high performance and de spre ons.H o h In these o reco overall DBMS.M e ale databases in thi n,we introduce the Squall techniqu a node l ts fin ng of ohpgadinghan vare licated data having to shut down 1.INTRODUCTION disk-ha ar.they are pre The nced for scalable main-memory DBMSs ismo vated by de This me rent data ns.Such systems eschew leg s19,351 tional DBMSs 37]and alleviate the ntion on shared data the DBMS ep train o more,modern web/mobile applications require always-available data ches 137 f all ssor physical loggng full cit the f h or specific pe issions@acm.平 then instead on to a new nod a rights licensed to ACM ut)approach dynamic allows the system to continue to process it mirate Squall: Fine-Grained Live Reconfiguration for Partitioned Main Memory Databases Aaron J. Elmore1 , Vaibhav Arora2 , Rebecca Taft3 Andrew Pavlo4 , Divyakant Agrawal2,5 , Amr El Abbadi2 1University of Chicago, 2University of California, Santa Barbara 3MIT CSAIL, 4Carnegie Mellon University, 5Qatar Computing Research Institute aelmore@cs.uchicago.edu, {vaibhavarora,agrawal,amr}@cs.ucsb.edu, rytaft@mit.edu, pavlo@cs.cmu.edu ABSTRACT For data-intensive applications with many concurrent users, modern distributed main memory database management systems (DBMS) provide the necessary scale-out support beyond what is possible with single-node systems. These DBMSs are optimized for the short-lived transactions that are common in on-line transaction processing (OLTP) workloads. One way that they achieve this is to partition the database into disjoint subsets and use a single-threaded transaction manager per partition that executes transactions one-ata-time in serial order. This minimizes the overhead of concurrency control mechanisms, but requires careful partitioning to limit distributed transactions that span multiple partitions. Previous methods used off-line analysis to determine how to partition data, but the dynamic nature of these applications means that they are prone to hotspots. In these situations, the DBMS needs to reconfigure how data is partitioned in real-time to maintain performance objectives. Bringing the system off-line to reorganize the database is unacceptable for on-line applications. To overcome this problem, we introduce the Squall technique for supporting live reconfiguration in partitioned, main memory DBMSs. Squall supports fine-grained repartitioning of databases in the presence of distributed transactions, high throughput client workloads, and replicated data. An evaluation of our approach on a distributed DBMS shows that Squall can reconfigure a database with no downtime and minimal overhead on transaction latency. 1. INTRODUCTION The need for scalable main-memory DBMSs is motivated by decreasing memory costs and an increasing demand for high-throughput transaction processing systems. Such systems eschew legacy diskoriented concurrency control and recovery mechanisms of traditional DBMSs [37] and alleviate the contention on shared datastructures [17,26,31,40]. But some OLTP databases are larger than the amount of DRAM that is available on a single node. Furthermore, modern web/mobile applications require always-available data Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. SIGMOD’15, May 31–June 4, 2015, Melbourne, Victoria, Australia. Copyright is held by the owner/author(s). Publication rights licensed to ACM. ACM 978-1-4503-2758-9/15/05 ...$15.00. http://dx.doi.org/10.1145/2723372.2723726. with the ability to support sudden shifts in access patterns. The inability to react to changes in usage or mitigate potential downtime can cause significant financial losses for service providers [7]. This argues for the use of a distributed DBMS architecture where the database is deployed in memory-only storage on a cluster of shared-nothing nodes. These scalable DBMSs, colloquially known as NewSQL [10], achieve high performance and scalability without sacrificing the benefits of strong transactional guarantees by spreading the databases across nodes into disjoint partitions. Recent examples of these systems include H-Store [24] (and its commercial version VoltDB [6]), MemSQL [2], and SQLFire [5]. Even if a database resides entirely in memory across multiple partitions, the DBMS is not immune to problems resulting from changes in workload demands or access patterns. For example, sudden increases in the popularity of a particular item in the database can negatively impact the performance of the overall DBMS. Modern distributed systems can, in theory, add and remove resources dynamically, but in practice it is difficult to scale databases in this manner [23]. Increasing system capacity involves either scaling up a node by upgrading hardware or scaling out by adding additional nodes to the system in order to distribute load. Either scenario can involve migrating data between nodes and bringing nodes off-line during maintenance windows [18]. Previous work has shown how to migrate a database from one node to another incrementally to avoid having to shut down the system [8, 15, 19, 35]. These approaches, however, are not suitable for partitioned main-memory DBMSs. In particular, they are predicated upon disk-based concurrency control and recovery mechanisms. This means that they rely on concurrent data access to migrate data using heavy-weight two-phase locking and snapshot isolation methods to ensure correctness [19, 35]. More proactive migration techniques rely on physical logging from standard replication and recovery mechanisms [8, 15]. But this dependency on the DBMS’s replication infrastructure places additional strain on partitions that may be already overloaded. Above all, these approaches are inappropriate for partitioned main-memory DBMSs that execute transactions serially [37], since the system does not support concurrent access or physical logging. Even if one adapted these approaches to move partitions between nodes, they are still not able to split one partition into multiple partitions. For example, if there is a particular entity in a partition that is extremely popular (e.g., the Wu Tang Clan’s Twitter account), then instead of migrating the entire partition to a new node it is better to move those individual tuples to their own partition [38]. A better (but more difficult) approach is to dynamically reconfigure the physical layout of the database while the system is live. This allows the system to continue to process transactions as it migrates