1.2 The MapReduce Framework

A Hadoop system runs on top of a distributed file system, called the Hadoop Distributed File System (HDFS). HDFS usually runs on networked commodity PCs, where data are replicated and locally stored on the hard disks of each machine. To store and process huge volumes of data, HDFS typically uses a block size of 64MB. Therefore, moving computation close to the data is a design goal in the MapReduce framework.

In the MapReduce framework, any application is specified by jobs.
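The notion of a job can be illustrated with word count, the canonical MapReduce example. The Python below is a schematic in-memory sketch of the programming model only; the function names and the tiny "framework" driver are illustrative and are not Hadoop's actual API.

```python
from itertools import groupby
from operator import itemgetter

# Toy illustration of the MapReduce programming model (not Hadoop's real API):
# a user-defined map function turns each input record into intermediate
# key/value pairs, the framework sorts them by key, and a user-defined
# reduce function combines all values that share a key.

def map_fn(record):
    """Emit (word, 1) for every word in one input record (a line of text)."""
    for word in record.split():
        yield (word, 1)

def reduce_fn(key, values):
    """Combine all counts for a single word."""
    return (key, sum(values))

def run_job(records):
    # Map phase: apply map_fn to every record.
    intermediate = [pair for r in records for pair in map_fn(r)]
    # Shuffle/sort phase: group intermediate pairs by key.
    intermediate.sort(key=itemgetter(0))
    # Reduce phase: one reduce call per distinct key.
    return [reduce_fn(k, (v for _, v in group))
            for k, group in groupby(intermediate, key=itemgetter(0))]

print(run_job(["to be or not to be"]))
# → [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```

In Hadoop the three phases run distributed across servers rather than in one process, which is what makes the placement of map tasks (discussed next) matter.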
A MapReduce job splits the input data into independent blocks, which are processed by the map tasks in parallel. Each map task processes a single block¹ consisting of some number of records. Each record in turn consists of a key/value pair. A map task applies the user-defined map function to each input key/value pair and produces intermediate key/value pairs. The framework then sorts the intermediate data and forwards them to the reduce tasks over the interconnection network. After receiving all intermediate key/value pairs with the same key, a reduce task executes the user-defined reduce function and produces the output data. Finally, these output data are written back to HDFS.

In such a framework, there is a single server, called the master, that keeps track of all jobs in the whole distributed system. The master runs a special process, called the jobtracker, that is responsible for task assignment and scheduling for the whole system. Each of the remaining servers, called the slaves, runs a process called the tasktracker. The tasktracker schedules the tasks assigned to its server in a way similar to a normal operating system.

The map task assignment is a vital part that affects the completion time of the whole job. First, each reduce task cannot begin until it receives the required intermediate data from all finished map tasks. Second, the assignment determines the location of intermediate data and hence the pattern of the communication traffic. Therefore, algorithms should be in place to optimize the task assignment.

1.3 Related Work

Since Kuhn [15] proposed the first method for the classic assignment problem in 1955, variations of the assignment problem have been studied extensively in many areas [5]. In the classic assignment problem, there are equal numbers of jobs and persons. An assignment is a one-to-one mapping from jobs to persons. Each job incurs a cost when it is assigned to a person.
Therefore, an optimal assignment minimizes the total cost over all persons.

In the area of parallel and distributed computing, when jobs are processed in parallel over several machines, one is interested in minimizing the maximum processing time over all machines. This problem is sometimes called the minimum makespan scheduling problem and is NP-complete in general [11]. Under the identical-machine model, there are some well-known approximation algorithms. For example, Graham [12] proposed a (2 − 1/n)-approximation algorithm in 1966, where n is the total number of machines, and another 4/3-approximation algorithm in 1969 [13]. However, under the unrelated-machine model, this problem is known to be APX-hard, both in terms of its offline [17] and online [1, 2] approximability.

As some researchers [3, 4] have pointed out, the scheduling mechanisms and policies that assign tasks to servers within the MapReduce framework can have a profound effect on efficiency. An early version of Hadoop uses a simple heuristic algorithm that greedily exploits data locality. Zaharia, Konwinski and Joseph [19] proposed some heuristic refinements based on experimental results.

¹ Strictly speaking, a map task in Hadoop sometimes processes data that comes from two successive file blocks. This occurs because file blocks do not respect logical record boundaries, so the last logical record processed by a map task might lie partly in the current data block and partly in the succeeding block, requiring the map task to access the succeeding block in order to fetch the tail end of its last logical record.

1.4 Our Contributions

We investigate task assignment in Hadoop. In Section 2, we propose an idealized Hadoop model to evaluate the cost of task assignments. Based on this model, we show in Section 3 that there is no feasible algorithm to find the optimal assignment unless P = NP.
In Section 4, we show that task assignments computed by a simple greedy round-robin algorithm might deviate from the optimum by a multiplicative factor. In Section 5, we present an algorithm that employs maximum-flow and increasing-threshold techniques to compute task assignments that are optimal to within an additive constant.

2. PROBLEM FORMALIZATION

Definition 1. A Map-Reduce schema (MR-schema) is a pair (T, S), where T is a set of tasks and S is a set of servers. Let m = |T| and n = |S|. A task assignment is a function A: T → S that assigns each task t to a server A(t).² Let 𝒜 = {A | A: T → S} be the set of all possible task assignments. An MR-system is a triple (T, S, w), where (T, S) is an MR-schema and w: T × 𝒜 → Q⁺ is a cost function.

Intuitively, w(t, A) is the time to perform task t on server A(t) in the context of the complete assignment A. The motivation for this level of generality is that the time to execute a task t in Hadoop depends not only on the task and the server speed, but also on possible network congestion, which in turn is influenced by the other tasks running on the cluster.

Definition 2. The load of server s under assignment A is defined as L^A_s = Σ_{t: A(t)=s} w(t, A). The maximum load under assignment A is defined as L^A = max_s L^A_s. The total load under assignment A is defined as H^A = Σ_s L^A_s.

An MR-system models a cloud computer where all servers work in parallel. Tasks assigned to the same server are processed sequentially, whereas tasks assigned to different servers run in parallel. Thus, the total completion time of the cloud under task assignment A is given by the maximum load L^A.

² In an MR-schema, it is common that |T| ≥ |S|. Therefore in this paper, unlike the classic assignment problem, where an assignment refers to a one-to-one mapping or a permutation [5, 15], we instead use the notion of a many-to-one mapping.
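Definition 2 can be made concrete with a short sketch. For simplicity, the cost function below depends only on the task itself, ignoring the assignment-dependence of w (congestion effects) that the full model allows; the task costs and server names are made up for illustration.

```python
# Sketch of Definition 2: server loads, maximum load, and total load under a
# task assignment A. Here w(t, A) ignores the rest of the assignment and uses
# fixed, hypothetical per-task costs; the general model allows w to depend on
# the complete assignment A.

def loads(tasks, servers, assign, w):
    """Return {s: L^A_s}, where L^A_s = sum of w(t, A) over tasks t with A(t) = s."""
    L = {s: 0 for s in servers}
    for t in tasks:
        L[assign[t]] += w(t, assign)
    return L

def max_load(tasks, servers, assign, w):
    """L^A = max_s L^A_s: the completion time of the whole cloud."""
    return max(loads(tasks, servers, assign, w).values())

def total_load(tasks, servers, assign, w):
    """H^A = sum_s L^A_s."""
    return sum(loads(tasks, servers, assign, w).values())

# Example: m = 4 tasks, n = 2 servers; A sends tasks 0, 1 to s1 and 2, 3 to s2.
tasks = [0, 1, 2, 3]
servers = ["s1", "s2"]
A = {0: "s1", 1: "s1", 2: "s2", 3: "s2"}
cost = {0: 3, 1: 2, 2: 4, 3: 1}            # hypothetical per-task costs
w = lambda t, assign: cost[t]

print(loads(tasks, servers, A, w))          # {'s1': 5, 's2': 5}
print(max_load(tasks, servers, A, w))       # 5  (= L^A, the completion time)
print(total_load(tasks, servers, A, w))     # 10 (= H^A)
```

Note that H^A is the same for every assignment when w is assignment-independent, whereas L^A, the quantity the later sections minimize, depends on how evenly A spreads the tasks.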