Improving MapReduce Performance Using Smart Speculative Execution Strategy
Qi Chen, Cheng Liu, and Zhen Xiao
Oct 2013
To appear in IEEE Transactions on Computers
Outline
1. Introduction
2. Background
3. Previous work
4. Pitfalls
5. Our Design
6. Evaluation
7. Conclusion
1. Introduction
Introduction
The new era of Big Data is coming!
Google: 20 PB per day (2008)
Yahoo!: 30 TB per day (2009)
Facebook: 60 TB per day (2010)
Amazon, eBay: petabytes per day
What does big data mean? Important user information with significant business value.
MapReduce
What is MapReduce? The most popular parallel computing model, proposed by Google.
Applications: database operation (Select, Join, Group); search engine (Page rank, Inverted index, Log analysis); machine learning (clustering, machine translation, recommendation); cryptanalysis; scientific computation; …
Straggler
What are stragglers in MapReduce? Nodes on which tasks take an unusually long time to finish.
Stragglers:
✓ Delay the job execution time
✓ Degrade the cluster throughput
How to solve it? Speculative execution: a slow task is backed up on an alternative machine in the hope that the backup copy finishes faster.
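To make the idea of speculative execution concrete, here is a minimal Python sketch. It is neither the strategy proposed in this paper nor Hadoop's built-in policy. The `Task` record, the `estimated_remaining` and `pick_backups` helpers, the `slow_factor` threshold, and the rule of flagging tasks whose estimated remaining time exceeds a multiple of the median are all hypothetical choices made for illustration. Remaining time is estimated by assuming each task keeps its average progress rate so far.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    node: str        # worker node currently running the task
    progress: float  # fraction of work completed, in [0, 1]
    elapsed: float   # seconds since the task started


def estimated_remaining(task: Task) -> float:
    """Remaining time, assuming the task keeps its average progress rate so far."""
    if task.progress <= 0 or task.elapsed <= 0:
        return float("inf")  # no progress information yet
    rate = task.progress / task.elapsed
    return (1.0 - task.progress) / rate


def pick_backups(tasks, free_nodes, slow_factor=1.5):
    """Return (task_id, node) pairs for speculative backup copies.

    Hypothetical policy: a task counts as a straggler if its estimated
    remaining time exceeds slow_factor times the median estimate over all
    running tasks. Each backup goes to a free node other than the node the
    straggler is already running on.
    """
    if not tasks:
        return []
    estimates = {t.task_id: estimated_remaining(t) for t in tasks}
    threshold = slow_factor * statistics.median(estimates.values())
    stragglers = [t for t in tasks if estimates[t.task_id] > threshold]
    # Hypothetical priority: back up the tasks with the longest estimates first.
    stragglers.sort(key=lambda t: estimates[t.task_id], reverse=True)

    available = list(free_nodes)
    backups = []
    for task in stragglers:
        target = next((n for n in available if n != task.node), None)
        if target is None:
            continue  # no alternative machine left for this task
        available.remove(target)
        backups.append((task.task_id, target))
    return backups
```

For example, with three tasks at 90%, 80%, and 20% progress after 90 s, 80 s, and 100 s, only the third task is flagged, and its backup is placed on a free node other than the one it already occupies. A real scheduler would also have to decide how many backups the cluster can afford and whether the chosen node is actually faster; this sketch ignores both questions.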
2. Background
Architecture
[Figure: MapReduce architecture. The master assigns map and reduce tasks to workers. Map stage: the input files are divided into Split 1, Split 2, …, Split M, each processed by a map task whose output is divided into Part 1 and Part 2. Reduce stage: each of the two reduce tasks collects its part from every map task and writes one output file (Output1, Output2).]
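The data flow in the figure can be simulated in a few lines of Python. This is a sequential, single-process sketch under stated assumptions: `run_job`, `run_map_task`, `run_reduce_task`, and the CRC32-based `partition` function are hypothetical names, and hashing the key modulo the number of reduce tasks is one common way to decide which part a map output record belongs to, not necessarily what the paper's system uses.

```python
import zlib
from collections import defaultdict


def partition(key: str, num_reduces: int) -> int:
    """Assumed partitioner: stable hash of the key modulo the number of reduce tasks."""
    return zlib.crc32(key.encode("utf-8")) % num_reduces


def run_map_task(split, map_fn, num_reduces):
    """Run one map task; its output is divided into one part per reduce task."""
    parts = [[] for _ in range(num_reduces)]
    for key, value in split:
        for k2, v2 in map_fn(key, value):
            parts[partition(k2, num_reduces)].append((k2, v2))
    return parts


def run_reduce_task(r, map_outputs, reduce_fn):
    """Reduce task r fetches part r from every map task, groups by key, and reduces."""
    groups = defaultdict(list)
    for parts in map_outputs:
        for k2, v2 in parts[r]:
            groups[k2].append(v2)
    output = []
    for k2 in sorted(groups):
        output.extend(reduce_fn(k2, groups[k2]))
    return output


def run_job(splits, map_fn, reduce_fn, num_reduces=2):
    """Sequentially simulate the tasks the master would schedule in parallel.

    Returns one output list per reduce task (Output1, Output2, ...)."""
    map_outputs = [run_map_task(s, map_fn, num_reduces) for s in splits]
    return [run_reduce_task(r, map_outputs, reduce_fn) for r in range(num_reduces)]
```

With the splits `[(0, "a b a")]` and `[(0, "b c")]` and a word-count map/reduce pair, the two output lists together contain `("a", 2)`, `("b", 2)`, and `("c", 1)`; which output file each word lands in depends on the hash. CRC32 is used instead of the built-in `hash()` because Python randomizes string hashing per process, while in a real cluster every map task must agree on where a key goes.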
Programming model
❑ Input: (key, value) pairs
❑ Output: (key*, value*) pairs
Map phase (stages: Map, Combine): List(K1, V1) → List(K2, V2) → List(K2, List(V2))
Reduce phase (stages: Copy, Sort, Reduce): List(K2, List(V2)) → Ordered(K2, List(V2)) → List(K3, V3)
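The type signatures above can be traced with the classic word-count example. In this hypothetical Python sketch each stage is a plain function named after the slide's stages. It follows the slide's notation, so the combine stage groups one map task's output by key, and the sort stage merges groups for the same key arriving from different map tasks before ordering them.

```python
from collections import defaultdict
from itertools import chain


# ---- Map phase (runs once per input split) ----

def map_stage(records):
    """List(K1, V1) -> List(K2, V2).
    Word count: K1 = line offset, V1 = line text, K2 = word, V2 = 1."""
    return [(word, 1) for _offset, line in records for word in line.split()]


def combine_stage(pairs):
    """List(K2, V2) -> List(K2, List(V2)): group one map task's output by key."""
    groups = defaultdict(list)
    for k2, v2 in pairs:
        groups[k2].append(v2)
    return list(groups.items())


# ---- Reduce phase ----

def copy_stage(map_outputs):
    """Fetch the List(K2, List(V2)) produced by every map task into one list."""
    return list(chain.from_iterable(map_outputs))


def sort_stage(groups):
    """List(K2, List(V2)) -> Ordered(K2, List(V2)).
    The same key can arrive from several map tasks, so merge before sorting."""
    merged = defaultdict(list)
    for k2, values in groups:
        merged[k2].extend(values)
    return sorted(merged.items())


def reduce_stage(ordered):
    """Ordered(K2, List(V2)) -> List(K3, V3): total count per word."""
    return [(word, sum(counts)) for word, counts in ordered]
```

For two splits, `[(0, "the cat"), (8, "the dog")]` and `[(0, "the end")]`, running `reduce_stage(sort_stage(copy_stage([combine_stage(map_stage(s)) for s in splits])))` yields `[("cat", 1), ("dog", 1), ("end", 1), ("the", 3)]`.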
Causes of Stragglers
Internal factors
✓ Resource capacity of worker nodes is heterogeneous
✓ Resource competition due to other MapReduce tasks running on the same worker node
External factors
✓ Resource competition due to co-hosted applications
✓ Input data skew
✓ Remote input or output source is too slow
✓ Hardware faults