ing system that produces the data structures used for the Google web search service. The indexing system takes as input a large set of documents that have been retrieved by our crawling system, stored as a set of GFS files. The raw contents for these documents are more than 20 terabytes of data. The indexing process runs as a sequence of five to ten MapReduce operations; a sketch of this chained-pass structure appears after the list below. Using MapReduce (instead of the ad-hoc distributed passes in the prior version of the indexing system) has provided several benefits:

• The indexing code is simpler, smaller, and easier to understand, because the code that deals with fault tolerance, distribution and parallelization is hidden within the MapReduce library. For example, the size of one phase of the computation dropped from approximately 3800 lines of C++ code to approximately 700 lines when expressed using MapReduce.

• The performance of the MapReduce library is good enough that we can keep conceptually unrelated computations separate, instead of mixing them together to avoid extra passes over the data. This makes it easy to change the indexing process. For example, one change that took a few months to make in our old indexing system took only a few days to implement in the new system.

• The indexing process has become much easier to operate, because most of the problems caused by machine failures, slow machines, and networking hiccups are dealt with automatically by the MapReduce library without operator intervention. Furthermore, it is easy to improve the performance of the indexing process by adding new machines to the indexing cluster.
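To make the chained-pass structure concrete, here is a minimal hypothetical driver sketch in which each MapReduce operation consumes the GFS files written by the previous one. The PassSpec type, the RunPass entry point, the file patterns, and the phase names are all illustrative assumptions, not the actual MapReduce library API or the real indexing phases.

#include <cstdio>
#include <string>
#include <vector>

struct PassSpec {
  std::string input_pattern;   // GFS file pattern this pass reads
  std::string output_base;     // GFS prefix for this pass's output
  std::string mapper_class;    // user-supplied Map function
  std::string reducer_class;   // user-supplied Reduce function
  int num_reduce_tasks;        // R: number of output partitions
};

// Stub standing in for the MapReduce library call: the real library
// schedules map and reduce tasks across the cluster and handles
// machine failures and stragglers transparently.
bool RunPass(const PassSpec& spec) {
  std::printf("pass: %s -> %s (%s / %s, R=%d)\n",
              spec.input_pattern.c_str(), spec.output_base.c_str(),
              spec.mapper_class.c_str(), spec.reducer_class.c_str(),
              spec.num_reduce_tasks);
  return true;
}

int main() {
  // The real indexing process is a sequence of five to ten passes;
  // three hypothetical ones are shown here, each consuming the
  // previous pass's GFS output files.
  std::vector<PassSpec> pipeline = {
    {"/gfs/crawl/docs-*",   "/gfs/index/parsed", "ParseDocs",    "DedupDocs",  500},
    {"/gfs/index/parsed-*", "/gfs/index/links",  "ExtractLinks", "MergeLinks", 500},
    {"/gfs/index/links-*",  "/gfs/index/final",  "BuildIndex",   "WriteIndex", 500},
  };
  for (const PassSpec& pass : pipeline)
    if (!RunPass(pass)) return 1;  // a failed pass aborts the pipeline
  return 0;
}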
7 Related Work

Many systems have provided restricted programming models and used the restrictions to parallelize the computation automatically. For example, an associative function can be computed over all prefixes of an N element array in log N time on N processors using parallel prefix computations [6, 9, 13]. MapReduce can be considered a simplification and distillation of some of these models based on our experience with large real-world computations. More significantly, we provide a fault-tolerant implementation that scales to thousands of processors. In contrast, most of the parallel processing systems have only been implemented on smaller scales and leave the details of handling machine failures to the programmer.

Bulk Synchronous Programming [17] and some MPI primitives [11] provide higher-level abstractions that make it easier for programmers to write parallel programs. A key difference between these systems and MapReduce is that MapReduce exploits a restricted programming model to parallelize the user program automatically and to provide transparent fault-tolerance.

Our locality optimization draws its inspiration from techniques such as active disks [12, 15], where computation is pushed into processing elements that are close to local disks, to reduce the amount of data sent across I/O subsystems or the network. We run on commodity processors to which a small number of disks are directly connected instead of running directly on disk controller processors, but the general approach is similar.

Our backup task mechanism is similar to the eager scheduling mechanism employed in the Charlotte System [3]. One of the shortcomings of simple eager scheduling is that if a given task causes repeated failures, the entire computation fails to complete. We fix some instances of this problem with our mechanism for skipping bad records.

The MapReduce implementation relies on an in-house cluster management system that is responsible for distributing and running user tasks on a large collection of shared machines. Though not the focus of this paper, the cluster management system is similar in spirit to other systems such as Condor [16].

The sorting facility that is a part of the MapReduce library is similar in operation to NOW-Sort [1]. Source machines (map workers) partition the data to be sorted and send it to one of R reduce workers. Each reduce worker sorts its data locally (in memory if possible). Of course NOW-Sort does not have the user-definable Map and Reduce functions that make our library widely applicable.

River [2] provides a programming model where processes communicate with each other by sending data over distributed queues. Like MapReduce, the River system tries to provide good average case performance even in the presence of non-uniformities introduced by heterogeneous hardware or system perturbations. River achieves this by careful scheduling of disk and network transfers to achieve balanced completion times. MapReduce has a different approach. By restricting the programming model, the MapReduce framework is able to partition the problem into a large number of fine-grained tasks. These tasks are dynamically scheduled on available workers so that faster workers process more tasks. The restricted programming model also allows us to schedule redundant executions of tasks near the end of the job which greatly reduces completion time in the presence of non-uniformities (such as slow or stuck workers).
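The dynamic scheduling just described can be illustrated with a small self-contained sketch (not the MapReduce implementation itself): workers repeatedly claim the next task from a shared queue, so a worker that finishes quickly simply claims more of the fine-grained tasks. Worker speeds are simulated here with sleeps; all names and constants are illustrative.

#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
  const int kNumTasks = 200;           // many fine-grained tasks
  std::atomic<int> next_task{0};       // shared queue: next unclaimed task
  std::vector<int> done_count(4, 0);   // tasks completed per worker

  auto worker = [&](int id, int task_ms) {
    for (;;) {
      int t = next_task.fetch_add(1);  // claim the next task
      if (t >= kNumTasks) return;      // queue drained
      // Simulate non-uniform hardware: each worker has its own speed.
      std::this_thread::sleep_for(std::chrono::milliseconds(task_ms));
      ++done_count[id];
    }
  };

  std::vector<std::thread> pool;
  int speeds_ms[4] = {1, 2, 4, 8};     // heterogeneous workers
  for (int i = 0; i < 4; ++i) pool.emplace_back(worker, i, speeds_ms[i]);
  for (auto& th : pool) th.join();

  // Faster workers end up having processed proportionally more tasks.
  for (int i = 0; i < 4; ++i)
    std::printf("worker %d completed %d tasks\n", i, done_count[i]);
}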
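Returning to the parallel prefix computation cited at the start of this section: the log N bound follows from a schedule in which round k combines every element with the element 2^k positions to its left; all N combines within a round are independent, so on N processors each round takes constant time. The sketch below is a generic textbook formulation (written sequentially for clarity, using addition as the associative operator), not code from [6, 9, 13].

#include <cstdio>
#include <vector>

// Inclusive prefix sums via the doubling schedule: ceil(log2 N) rounds;
// within a round every position can be updated in parallel, giving
// log N time on N processors for any associative operator.
std::vector<int> PrefixSums(std::vector<int> a) {
  const size_t n = a.size();
  for (size_t stride = 1; stride < n; stride *= 2) {
    std::vector<int> next = a;            // double-buffer so every read in
    for (size_t i = stride; i < n; ++i)   // this round sees the previous
      next[i] = a[i] + a[i - stride];     // round only; independent across i
    a = std::move(next);
  }
  return a;
}

int main() {
  std::vector<int> v = {3, 1, 4, 1, 5, 9, 2, 6};
  for (int x : PrefixSums(v)) std::printf("%d ", x);  // 3 4 8 9 14 23 25 31
  std::printf("\n");
}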
BAD-FS [5] has a very different programming model from MapReduce, and unlike MapReduce, is targeted to