[Figure 3: Data transfer rates over time for different executions of the sort program. Panels: (a) Normal execution, (b) No backup tasks, (c) 200 tasks killed. Each panel plots Input, Shuffle, and Output rates (MB/s) against time (seconds).]

…original text line as the intermediate key/value pair. We used a built-in Identity function as the Reduce operator. This function passes the intermediate key/value pair unchanged as the output key/value pair. The final sorted output is written to a set of 2-way replicated GFS files (i.e., 2 terabytes are written as the output of the program).

As before, the input data is split into 64MB pieces (M = 15000). We partition the sorted output into 4000 files (R = 4000). The partitioning function uses the initial bytes of the key to segregate it into one of R pieces.

Our partitioning function for this benchmark has built-in knowledge of the distribution of keys. In a general sorting program, we would add a pre-pass MapReduce operation that would collect a sample of the keys and use the distribution of the sampled keys to compute split-points for the final sorting pass.
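The paper does not give the code for this benchmark, but a minimal standalone C++ sketch may help make the two pieces described above concrete. Everything here is an illustrative assumption rather than the actual MapReduce library interface: the names IdentityReduce and PrefixPartition, the emit callback, the use of sorted split points with std::upper_bound, and the toy split points in main are all invented for this sketch.

#include <algorithm>
#include <cstdio>
#include <string>
#include <vector>

// Identity Reduce: pass every intermediate key/value pair through
// unchanged.  The emit callback stands in for whatever output
// interface the surrounding framework provides.
template <typename Emit>
void IdentityReduce(const std::string& key,
                    const std::vector<std::string>& values,
                    Emit emit) {
  for (const std::string& value : values) emit(key, value);
}

// Partition on the initial bytes of the key: split_points holds R-1
// sorted key prefixes chosen from the (known or sampled) key
// distribution, so that roughly equal amounts of data land in each of
// the R output pieces.
int PrefixPartition(const std::string& key,
                    const std::vector<std::string>& split_points) {
  // Index of the first split point greater than the key, i.e. the
  // piece whose key range contains this key.
  auto it = std::upper_bound(split_points.begin(), split_points.end(), key);
  return static_cast<int>(it - split_points.begin());  // in [0, R-1]
}

int main() {
  // R = 4 for illustration; for a general input the split points would
  // come from the pre-pass sampling job described above rather than
  // from built-in knowledge of the key distribution.
  const std::vector<std::string> split_points = {"b", "d", "f"};
  std::printf("piece %d\n", PrefixPartition("cat", split_points));  // piece 1

  // The identity reduce simply forwards each value under its key.
  IdentityReduce("cat", {"the quick brown fox"},
                 [](const std::string& k, const std::string& v) {
                   std::printf("%s\t%s\n", k.c_str(), v.c_str());
                 });
  return 0;
}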
Figure 3 (a) shows the progress of a normal execution of the sort program. The top-left graph shows the rate at which input is read. The rate peaks at about 13 GB/s and dies off fairly quickly since all map tasks finish before 200 seconds have elapsed. Note that the input rate is less than for grep. This is because the sort map tasks spend about half their time and I/O bandwidth writing intermediate output to their local disks. The corresponding intermediate output for grep had negligible size.

The middle-left graph shows the rate at which data is sent over the network from the map tasks to the reduce tasks. This shuffling starts as soon as the first map task completes. The first hump in the graph is for the first batch of approximately 1700 reduce tasks (the entire MapReduce was assigned about 1700 machines, and each machine executes at most one reduce task at a time). Roughly 300 seconds into the computation, some of the reduce tasks in this first batch finish and we start shuffling data for the remaining reduce tasks. All of the shuffling is done about 600 seconds into the computation.

The bottom-left graph shows the rate at which sorted data is written to the final output files by the reduce tasks. There is a delay between the end of the first shuffling period and the start of the writing period because the machines are busy sorting the intermediate data. The writes continue at a rate of about 2-4 GB/s for a while. All of the writes finish about 850 seconds into the computation. Including startup overhead, the entire computation takes 891 seconds. This is similar to the current best reported result of 1057 seconds for the TeraSort benchmark [18].
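As a rough consistency check using only the numbers stated in this section: 15000 map tasks of 64MB each is roughly 0.96 TB of input, and reading it in under 200 seconds implies an average input rate of at least about 4.8 GB/s, which is consistent with a peak of about 13 GB/s that falls off as the map tasks complete.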
A few things to note: the input rate is higher than the shuffle rate and the output rate because of our locality optimization – most data is read from a local disk and bypasses our relatively bandwidth-constrained network. The shuffle rate is higher than the output rate because the output phase writes two copies of the sorted data (we make two replicas of the output for reliability and availability reasons). We write two replicas because that is the mechanism for reliability and availability provided by our underlying file system. Network bandwidth requirements for writing data would be reduced if the underlying file system used erasure coding [14] rather than replication.
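For a rough sense of the potential saving (the coding parameters below are illustrative and not taken from [14] or from the underlying file system): with 2-way replication, the 1 TB of sorted output turns into 2 TB of writes, whereas a (6,3) Reed-Solomon code, which stores 9 blocks for every 6 blocks of data, would turn it into about 1.5 TB, cutting the volume written (and the associated network traffic) by roughly 25%.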