PROOF.Assume ti∈Gc,s:∈GR and A(t:)=s To check that this truth assignment is a satisfying as- t∈Gr,sf∈Gc and A()=s.Then each of si and s signment,note that for the three literal tasks lu,lu2,lu3, is assigned one remote task.Let B(ti)=s and B()=si at most two of them are assigned to the clause server Cu and then L≤LA=3andL号≤Ld=3.This process de- There must be one literal task,say l,that is locally as- creases ne and nr each by one,and the resulting assignment signed to a ring server.In this case,o()=true and thus is also feasible.Repeat the same process until the smaller the clause o(Cu)=true.This fact holds for all clauses and one of ne and nr becomes 0.Then let B(t)=A(t)for all thus indicates that ()=(ACu)=true.This completes the remaining tasks t E T.It is obvious that B is feasible. the proof of Lemma 3.10. and one of ne and nf equals Ine-ndl and the other one equals0.☐ Finally we prove the main theorem PROOF OF THEOREM 3.1.Lemmas 3.3,3.4 and 3.10 es- LEMMA 3.8.Let A be a feasible assignment in G'such tablish that 3SAT <p(2,3)-UHTA via f.Therefore,(2,3)- that ne=0.Then na =0. UHTA is NP-hard.It is easy to see that(2,3)-UHTA ENP PROOF.For the sake of contradiction,assume tiE GR, because in time O(mn)a nondeterministic Turing machine siE Gc and A(ti)=si.For each server si E Gc,there is could guess the assignment and accept iff the maximum load one auxiliary task au u E [1,a]such that p(au,sj)holds. under the assignment does not exceed 3.Therefore,(2,3)- Since wloc 1 and wrem =3,if A is feasible then A(au) UHTA is NP-complete..▣ A(a)for uv.Since there are a auxiliary tasks and o servers in Gc,one server is assigned exactly one auxiliary 4.A ROUND ROBIN ALGORITHM task.Since A(ti)=si,La 21+3>3,contradicting the In this section.we analyze a simple round robin algo- fact that A is feasible.Therefore,there is no tiE GR and rithm for the UHTA problem.Algorithm 1 is inspired by s∈Gc such that A(t)=st.Thus,n,4=0.☐ the Hadoop scheduler algorithm.It scans over each server in a round robin fashion.When assigning a new task to a LEMMA 3.9.Let A be a feasible assignment in G'such server,Algorithm 1 tries heuristically to exploit data local- that na =0.Then ne =0. ity.Since we have not specified the order of assigned tasks, PROOF.For the sake of contradiction,assume tiE Gc, Algorithm 1 may produce many possible outputs (assign- si E GR and A(ti)=si.Let ko,k1,k2,ks denote the number ments). of ring servers filled to load 0,1,2,3,respectively.From the total number of servers in GR,we have Algorithm 1 The round robin algorithm exploring locality. ko+k1+k2+k3=23w (1) 1:input:a set of unassigned tasks T,a list of servers {s1,s2,..,sn,a placement relation p Similarly.from the total number of tasks in GR.we have 2:define i1 as an index variable 3:define A as an assignment 0·k0+1·k1+2·k2+1·k3=33w (2) 4:A(t)=L(task t is unassigned)for all t 5:while exists unassigned task do Subtracting (1)from (2)gives k2 Bw+ko.Assigning 6: if exists unassigned task t such that p(t,si)holds then both neighboring ring tasks to the same ring server fills it update A by assigning A(t)=si to load 2.Since there are only 23w ring servers,we have else k2<Bw.Hence,ko =0 and k2 =Bw.This implies that all 9: pick any unassigned task t',update A by assigning A(t')=si ring tasks are assigned to ring servers in alternating positions 10 end if in each ring. 11: i(i mod n)+1 There are Bw remaining ring servers and Bw variable tasks. 12:end while Therefore,a variable task is remotely assigned to one of the 13:output:assignment A remaining ring servers by A. Now consider the server si that has been remotely as- Algorithm 1 is analogous to the Hadoop scheduler algo- signed ti E Gc.If it is assigned two ring tasks,its load is rithm up to core version 0.19.There are three differences, La =2+3>3.If it is assigned one variable task,its load is though.First,the Hadoop algorithm assumes three kinds of L=3+3>3.A is not feasible in either case.Therefore, placement:data-local,rack-local and rack-remote,whereas there is no ti E Gc and si E GR such that A(ti)=si.Thus, Algorithm 1 assumes only two:local and remote.Second ne=0.☐ the Hadoop scheduler works incrementally rather than as- signing all tasks initially.Last,the Hadoop algorithm is Now we prove the following Lemma. deterministic,whereas Algorithm 1 is nondeterministic. LEMMA 3.10.If G=f(o)is admissible,then o is satis- THEOREM 4.1.If wrem woc,increasing the number of fiable. data block replicas may increase the marimum load of the PROOF.Given feasible task assignment A in G=f(), assignment computed by Algorithm 1. we construct the satisfying truth assignment o for o.From PROOF.The number of edges in the placement graph is Lemmas 3.6,3.7,3.8 and 3.9,we construct a feasible assign- equal to the number of data block replicas,and thus adding ment B in G,such that ne=n=0,and in each variable a new edge in the placement graph is equivalent to adding gadget i,either servers in Ti or servers in Fi are saturated by a new replica in the system.Consider the simple placement variable tasks.If ring servers in Fi are saturated by variable graph G where m =n,and there is an edge between task tasks,let o(ri)=true.If ring servers in Ti are saturated by ti and si for all 1 i<n.Running Algorithm 1 gives variable tasks,let o(xi)=false. an assignment A in which task ti is assigned to si for allProof. Assume ti ∈ GC , si ∈ GR and A(ti) = si; t 0 i ∈ GR, s 0 i ∈ GC and A(t 0 i) = s 0 i . Then each of si and s 0 i is assigned one remote task. Let B(ti) = s 0 i and B(t 0 i) = si, and then L B si ≤ L A si = 3 and L B s 0 i ≤ L A s 0 i = 3. This process decreases nc and nr each by one, and the resulting assignment is also feasible. Repeat the same process until the smaller one of nc and nr becomes 0. Then let B(t) = A(t) for all the remaining tasks t ∈ T 0 . It is obvious that B is feasible, and one of n B c and n B r equals |n A c − n A r | and the other one equals 0. Lemma 3.8. Let A be a feasible assignment in G 0 such that n A c = 0. Then n A r = 0. Proof. For the sake of contradiction, assume ti ∈ GR, si ∈ GC and A(ti) = si. For each server sj ∈ GC , there is one auxiliary task au : u ∈ [1, α] such that ρ(au, sj ) holds. Since wloc = 1 and wrem = 3, if A is feasible then A(au) 6= A(av) for u 6= v. Since there are α auxiliary tasks and α servers in GC , one server is assigned exactly one auxiliary task. Since A(ti) = si, L A si ≥ 1 + 3 > 3, contradicting the fact that A is feasible. Therefore, there is no ti ∈ GR and si ∈ GC such that A(ti) = si. Thus, n A r = 0. Lemma 3.9. Let A be a feasible assignment in G 0 such that n A r = 0. Then n A c = 0. Proof. For the sake of contradiction, assume ti ∈ GC , si ∈ GR and A(ti) = si. Let k0, k1, k2, k3 denote the number of ring servers filled to load 0, 1, 2, 3, respectively. From the total number of servers in GR, we have k0 + k1 + k2 + k3 = 2βω (1) Similarly, from the total number of tasks in GR, we have 0 · k0 + 1 · k1 + 2 · k2 + 1 · k3 = 3βω (2) Subtracting (1) from (2) gives k2 = βω + k0. Assigning both neighboring ring tasks to the same ring server fills it to load 2. Since there are only 2βω ring servers, we have k2 ≤ βω. Hence, k0 = 0 and k2 = βω. This implies that all ring tasks are assigned to ring servers in alternating positions in each ring. There are βω remaining ring servers and βω variable tasks. Therefore, a variable task is remotely assigned to one of the remaining ring servers by A. Now consider the server si that has been remotely assigned ti ∈ GC . If it is assigned two ring tasks, its load is L A si = 2+ 3 > 3. If it is assigned one variable task, its load is L A si = 3 + 3 > 3. A is not feasible in either case. Therefore, there is no ti ∈ GC and si ∈ GR such that A(ti) = si. Thus, n A c = 0. Now we prove the following Lemma. Lemma 3.10. If G = f(φ) is admissible, then φ is satis- fiable. Proof. Given feasible task assignment A in G = f(φ), we construct the satisfying truth assignment σ for φ. From Lemmas 3.6, 3.7, 3.8 and 3.9, we construct a feasible assignment B in G 0 , such that n B c = n B r = 0, and in each variable gadget i, either servers in Ti or servers in Fi are saturated by variable tasks. If ring servers in Fi are saturated by variable tasks, let σ(xi) = true. If ring servers in Ti are saturated by variable tasks, let σ(xi) = false. To check that this truth assignment is a satisfying assignment, note that for the three literal tasks lu1, lu2, lu3, at most two of them are assigned to the clause server Cu. There must be one literal task, say luv, that is locally assigned to a ring server. In this case, σ(luv) = true and thus the clause σ(Cu) = true. This fact holds for all clauses and thus indicates that σ(φ) = σ( V Cu) = true. This completes the proof of Lemma 3.10. Finally we prove the main theorem. Proof of Theorem 3.1. Lemmas 3.3, 3.4 and 3.10 establish that 3SAT ≤p (2,3)-UHTA via f. Therefore, (2,3)- UHTA is N P-hard. It is easy to see that (2,3)-UHTA ∈ N P because in time O(mn) a nondeterministic Turing machine could guess the assignment and accept iff the maximum load under the assignment does not exceed 3. Therefore, (2, 3)- UHTA is N P-complete. 4. A ROUND ROBIN ALGORITHM In this section, we analyze a simple round robin algorithm for the UHTA problem. Algorithm 1 is inspired by the Hadoop scheduler algorithm. It scans over each server in a round robin fashion. When assigning a new task to a server, Algorithm 1 tries heuristically to exploit data locality. Since we have not specified the order of assigned tasks, Algorithm 1 may produce many possible outputs (assignments). Algorithm 1 The round robin algorithm exploring locality. 1: input: a set of unassigned tasks T, a list of servers {s1, s2, · · · , sn}, a placement relation ρ 2: define i ← 1 as an index variable 3: define A as an assignment 4: A(t) = ⊥ (task t is unassigned) for all t 5: while exists unassigned task do 6: if exists unassigned task t such that ρ(t, si) holds then 7: update A by assigning A(t) = si 8: else 9: pick any unassigned task t 0 , update A by assigning A(t 0 ) = si 10: end if 11: i ← (i mod n) + 1 12: end while 13: output: assignment A Algorithm 1 is analogous to the Hadoop scheduler algorithm up to core version 0.19. There are three differences, though. First, the Hadoop algorithm assumes three kinds of placement: data-local, rack-local and rack-remote, whereas Algorithm 1 assumes only two: local and remote. Second, the Hadoop scheduler works incrementally rather than assigning all tasks initially. Last, the Hadoop algorithm is deterministic, whereas Algorithm 1 is nondeterministic. Theorem 4.1. If wrem > wloc, increasing the number of data block replicas may increase the maximum load of the assignment computed by Algorithm 1. Proof. The number of edges in the placement graph is equal to the number of data block replicas, and thus adding a new edge in the placement graph is equivalent to adding a new replica in the system. Consider the simple placement graph G where m = n, and there is an edge between task ti and si for all 1 ≤ i ≤ n. Running Algorithm 1 gives an assignment A in which task ti is assigned to si for all