Furthermore, R is often constrained by users because the output of each reduce task ends up in a separate output file. In practice, we tend to choose M so that each individual task is roughly 16 MB to 64 MB of input data (so that the locality optimization described above is most effective), and we make R a small multiple of the number of worker machines we expect to use.
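As a rough illustration of this sizing rule (the input size, split size, and worker count below are hypothetical, not taken from the paper):

```python
# Sketch: choose M so each map task covers ~16-64 MB of input,
# and R as a small multiple of the expected worker count.
# All concrete numbers here are hypothetical illustrations.

def choose_m_and_r(input_bytes, num_workers, split_bytes=64 * 2**20, r_multiple=2):
    # One map task per ~split_bytes of input (ceiling division).
    m = -(-input_bytes // split_bytes)
    # A small multiple of the worker count for reduce tasks.
    r = r_multiple * num_workers
    return m, r

# E.g. a 10 TB input on 2,000 workers:
m, r = choose_m_and_r(10 * 2**40, 2000)
```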
We often perform MapReduce computations with M = 200,000 and R = 5,000, using 2,000 worker machines.

3.6 Backup Tasks

One of the common causes that lengthens the total time taken for a MapReduce operation is a "straggler": a machine that takes an unusually long time to complete one of the last few map or reduce tasks in the computation. Stragglers can arise for a whole host of reasons. For example, a machine with a bad disk may experience frequent correctable errors that slow its read performance from 30 MB/s to 1 MB/s. The cluster scheduling system may have scheduled other tasks on the machine, causing it to execute the MapReduce code more slowly due to competition for CPU, memory, local disk, or network bandwidth. A recent problem we experienced was a bug in machine initialization code that caused processor caches to be disabled: computations on affected machines slowed down by over a factor of one hundred.

We have a general mechanism to alleviate the problem of stragglers. When a MapReduce operation is close to completion, the master schedules backup executions of the remaining in-progress tasks. The task is marked as completed whenever either the primary or the backup execution completes. We have tuned this mechanism so that it typically increases the computational resources used by the operation by no more than a few percent. We have found that this significantly reduces the time to complete large MapReduce operations. As an example, the sort program described in Section 5.3 takes 44% longer to complete when the backup task mechanism is disabled.

4 Refinements

Although the basic functionality provided by simply writing Map and Reduce functions is sufficient for most needs, we have found a few extensions useful. These are described in this section.

4.1 Partitioning Function

The users of MapReduce specify the number of reduce tasks/output files that they desire (R).
Data gets partitioned across these tasks using a partitioning function on the intermediate key. A default partitioning function is provided that uses hashing (e.g. "hash(key) mod R"). This tends to result in fairly well-balanced partitions. In some cases, however, it is useful to partition data by some other function of the key. For example, sometimes the output keys are URLs, and we want all entries for a single host to end up in the same output file. To support situations like this, the user of the MapReduce library can provide a special partitioning function. For example, using "hash(Hostname(urlkey)) mod R" as the partitioning function causes all URLs from the same host to end up in the same output file.

4.2 Ordering Guarantees

We guarantee that within a given partition, the intermediate key/value pairs are processed in increasing key order. This ordering guarantee makes it easy to generate a sorted output file per partition, which is useful when the output file format needs to support efficient random access lookups by key, or users of the output find it convenient to have the data sorted.

4.3 Combiner Function

In some cases, there is significant repetition in the intermediate keys produced by each map task, and the user-specified Reduce function is commutative and associative. A good example of this is the word counting example in Section 2.1. Since word frequencies tend to follow a Zipf distribution, each map task will produce hundreds or thousands of records of the form <the, 1>. All of these counts will be sent over the network to a single reduce task and then added together by the Reduce function to produce one number. We allow the user to specify an optional Combiner function that does partial merging of this data before it is sent over the network.

The Combiner function is executed on each machine that performs a map task. Typically the same code is used to implement both the combiner and the reduce functions.
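To make the shared-code point concrete, here is a minimal in-memory sketch (not the MapReduce library's actual API; all function names are hypothetical) in which one summing function serves as both the combiner and the reduce function for word counting:

```python
from collections import defaultdict

def sum_values(key, values):
    # Usable as both combiner and reduce function:
    # addition is commutative and associative.
    return sum(values)

def map_word_count(document):
    # Emits <word, 1> pairs, e.g. many <the, 1> records.
    return [(word, 1) for word in document.split()]

def combine(pairs, combiner):
    # Partial merging on the map machine, before network transfer.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return [(key, combiner(key, values)) for key, values in grouped.items()]

pairs = map_word_count("the cat sat on the mat the end")
combined = combine(pairs, sum_values)
# <the, 1> appears three times in `pairs` but only once, as <the, 3>,
# in `combined` -- far fewer records cross the network.
```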
The only difference between a reduce function and a combiner function is how the MapReduce library handles the output of the function. The output of a reduce function is written to the final output file. The output of a combiner function is written to an intermediate file that will be sent to a reduce task.

Partial combining significantly speeds up certain classes of MapReduce operations. Appendix A contains an example that uses a combiner.

4.4 Input and Output Types

The MapReduce library provides support for reading input data in several different formats. For example, "text"

To appear in OSDI 2004
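The two partitioning functions quoted in Section 4.1 can be sketched as follows (`Hostname` here is a hypothetical helper standing in for whatever URL parsing the real code performs):

```python
from urllib.parse import urlparse

def default_partition(key, R):
    # Default partitioner: "hash(key) mod R" tends to give
    # fairly well-balanced partitions.
    return hash(key) % R

def Hostname(urlkey):
    # Hypothetical helper: extract the host part of a URL key.
    return urlparse(urlkey).netloc

def host_partition(urlkey, R):
    # "hash(Hostname(urlkey)) mod R": all URLs from the same host
    # land in the same partition, hence the same output file.
    return hash(Hostname(urlkey)) % R

a = host_partition("http://example.com/a", 5000)
b = host_partition("http://example.com/b", 5000)
# a == b: same host, so both URLs go to the same output file.
```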