DrTM: Fast In-memory Transaction Processing using RDMA and HTM
XINDA WEI, JIAXIN SHI, YANZHE CHEN, RONG CHEN, HAIBO CHEN
Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University, China
SOSP '15: The 25th ACM Symposium on Operating Systems Principles
2 Transaction: Key Pillar for Many Systems
Demand Speedy Distributed Transaction Over Large Data Volumes
□ Alibaba.com: $9.3 billion/day
□ 12306: 9.56 million tickets/day
□ PayPal: 11.6 million payments/day
3 High COST for Distributed TX
Many scalable systems have low performance
□ Usually 10s~100s of thousands of TX/second
□ High COST [1] (config. that outperforms single thread)
□ e.g., H-Store, Calvin (SIGMOD'12)
Emerging speedy TX systems do not scale out
□ Achieve over 100s of thousands of TX/second
□ e.g., Silo (SOSP'13), DBX (EuroSys'14)
Dilemma: single-node perf. vs. scale-out
[1] "Scalability! But at what COST?" HotOS 2015
4 Why (Distributed) TXs are Slow?
"Only 4% of wall-clock time spent on useful data processing, while the rest is occupied with buffer pools, locking, latching, recovery." [1] -- Michael Stonebraker
[Pie chart: Useful Work 4%, Buffer Pool 24%, Recovery 24%, Latching 24%, Locking 24%]
[1] "The Traditional RDBMS Wisdom is All Wrong"
5 Opportunities: (not so) New HW Features
HTM: Hardware Transactional Memory
□ Allows a group of load & store instructions to execute in an atomic, consistent and isolated (ACI) way
RDMA: Remote Direct Memory Access
□ Provides cross-machine accesses with high speed, low latency and low CPU overhead
Rethink the design of low-COST scalable in-memory transaction systems
9 Opportunities with HTM & RDMA
HTM: Hardware Transactional Memory
□ Strong Atomicity: non-transactional code will unconditionally abort a transaction when their accesses conflict
RDMA: Remote Direct Memory Access
□ Strong Consistency: one-sided RDMA operations are cache-coherent with local accesses
HTM Strong Atomicity + RDMA Strong Consistency: RDMA ops will abort conflicting HTM TX
Basis for Distributed TM
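The interaction stated on this slide (a cache-coherent one-sided RDMA write aborts a conflicting HTM transaction) can be illustrated with a small software model. This is only a toy sketch, not real HTM or RDMA: the names `ToyMemory`, `ToyHtmTx`, `rdma_write` and `run_with_remote_write` are hypothetical, conflict detection is done with Python sets rather than cache lines, and the abort is reported at the transaction's next operation instead of immediately.

```python
class TxAborted(Exception):
    """Raised when the toy hardware transaction has been aborted."""


class ToyMemory:
    """Toy model of one machine's memory (hypothetical, for illustration)."""

    def __init__(self):
        self.cells = {}
        self.active = []  # toy transactions currently running

    def rdma_write(self, addr, value):
        # A one-sided RDMA write is non-transactional. Because it is
        # assumed coherent with local accesses, it aborts every running
        # transaction that has read or written the same address.
        for tx in self.active:
            if addr in tx.read_set or addr in tx.write_buf:
                tx.aborted = True
        self.cells[addr] = value


class ToyHtmTx:
    """Toy HTM transaction: tracks a read set and buffers its writes."""

    def __init__(self, mem):
        self.mem = mem
        self.read_set = set()
        self.write_buf = {}
        self.aborted = False
        mem.active.append(self)

    def _end(self):
        if self in self.mem.active:
            self.mem.active.remove(self)

    def _check(self):
        # The model reports an abort at the next transactional operation.
        if self.aborted:
            self._end()
            raise TxAborted()

    def read(self, addr):
        self._check()
        if addr in self.write_buf:
            return self.write_buf[addr]
        self.read_set.add(addr)
        return self.mem.cells.get(addr, 0)

    def write(self, addr, value):
        self._check()
        self.write_buf[addr] = value  # invisible to others until commit

    def commit(self):
        self._check()
        self._end()
        self.mem.cells.update(self.write_buf)  # publish all writes at once


def run_with_remote_write(conflict_addr):
    """Run one toy transaction while a remote write hits conflict_addr."""
    mem = ToyMemory()
    mem.cells["x"] = 1
    tx = ToyHtmTx(mem)
    try:
        v = tx.read("x")
        mem.rdma_write(conflict_addr, 99)  # arrives mid-transaction
        tx.write("y", v + 1)
        tx.commit()
        return "committed", mem.cells
    except TxAborted:
        return "aborted", mem.cells
```

Calling `run_with_remote_write("x")` aborts the transaction and leaves none of its buffered writes in memory, while `run_with_remote_write("z")` touches an unrelated address and lets it commit.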
10 Overall Idea
Use HTM's ACI properties for local TX execution
Use one-sided RDMA to glue multiple HTM TXs
[Annotations on the pie chart from slide 4: In-Memory Store; In-Memory Logging with NVM; Use HTM's ACI features; One-sided RDMA Ops]
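One possible reading of "glue" is sketched below. The slide does not spell out a protocol, so every step here is an assumption: the remote record is locked with a simulated RDMA compare-and-swap, the local update runs inside a stand-in for an HTM region (a plain `threading.Lock`, which is not HTM), and the result is written back with a simulated RDMA write. `Machine`, `rdma_cas`, `rdma_read`, `rdma_write`, `htm_region` and `transfer` are hypothetical names, not the API of any RDMA library.

```python
import threading


class Machine:
    """Toy machine: records plus one lock word per record (hypothetical)."""

    def __init__(self, records):
        self.records = dict(records)
        self.locks = {key: 0 for key in records}  # 0 = free, else owner id
        self._nic = threading.Lock()        # models atomicity of a NIC CAS
        self.htm_region = threading.Lock()  # stand-in for an HTM region

    def rdma_cas(self, key, expected, new):
        # Simulated one-sided compare-and-swap on the record's lock word.
        # Returns the old value, so the caller can tell whether it won.
        with self._nic:
            old = self.locks[key]
            if old == expected:
                self.locks[key] = new
            return old

    def rdma_read(self, key):
        return self.records[key]

    def rdma_write(self, key, value):
        self.records[key] = value


def transfer(local, remote, src, dst, amount, owner_id=1):
    """Move `amount` from local[src] to remote[dst]; True on success."""
    # Step 1 (assumed): lock the remote record before the local region.
    if remote.rdma_cas(dst, 0, owner_id) != 0:
        return False  # someone else holds the remote record
    try:
        remote_balance = remote.rdma_read(dst)
        # Step 2 (assumed): do the local work inside the "HTM" region.
        with local.htm_region:
            if local.records[src] < amount:
                return False
            local.records[src] -= amount
            new_remote = remote_balance + amount
        # Step 3 (assumed): publish the remote update.
        remote.rdma_write(dst, new_remote)
        return True
    finally:
        # Always release the remote lock, on success or failure.
        remote.rdma_cas(dst, owner_id, 0)
```

For example, with `a = Machine({"alice": 100})` and `b = Machine({"bob": 10})`, the call `transfer(a, b, "alice", "bob", 30)` succeeds, whereas it returns `False` and changes nothing if another owner already holds the lock word for `"bob"`.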