However, there are two fundamental similarities. (1) Both systems use redundant execution to recover from data loss caused by failures. (2) Both use locality-aware scheduling to reduce the amount of data sent across congested network links.

TACC [7] is a system designed to simplify construction of highly-available networked services. Like MapReduce, it relies on re-execution as a mechanism for implementing fault-tolerance.

8 Conclusions

The MapReduce programming model has been successfully used at Google for many different purposes. We attribute this success to several reasons. First, the model is easy to use, even for programmers without experience with parallel and distributed systems, since it hides the details of parallelization, fault-tolerance, locality optimization, and load balancing. Second, a large variety of problems are easily expressible as MapReduce computations. For example, MapReduce is used for the generation of data for Google’s production web search service, for sorting, for data mining, for machine learning, and many other systems. Third, we have developed an implementation of MapReduce that scales to large clusters comprising thousands of machines. The implementation makes efficient use of these machine resources and therefore is suitable for use on many of the large computational problems encountered at Google.

We have learned several things from this work. First, restricting the programming model makes it easy to parallelize and distribute computations and to make such computations fault-tolerant. Second, network bandwidth is a scarce resource. A number of optimizations in our system are therefore targeted at reducing the amount of data sent across the network: the locality optimization allows us to read data from local disks, and writing a single copy of the intermediate data to local disk saves network bandwidth.
Third, redundant execution can be used to reduce the impact of slow machines, and to handle machine failures and data loss.

Acknowledgements

Josh Levenberg has been instrumental in revising and extending the user-level MapReduce API with a number of new features based on his experience with using MapReduce and other people’s suggestions for enhancements. MapReduce reads its input from and writes its output to the Google File System [8]. We would like to thank Mohit Aron, Howard Gobioff, Markus Gutschke, David Kramer, Shun-Tak Leung, and Josh Redstone for their work in developing GFS. We would also like to thank Percy Liang and Olcan Sercinoglu for their work in developing the cluster management system used by MapReduce. Mike Burrows, Wilson Hsieh, Josh Levenberg, Sharon Perl, Rob Pike, and Debby Wallach provided helpful comments on earlier drafts of this paper. The anonymous OSDI reviewers, and our shepherd, Eric Brewer, provided many useful suggestions of areas where the paper could be improved. Finally, we thank all the users of MapReduce within Google's engineering organization for providing helpful feedback, suggestions, and bug reports.
