5.2.1 Properties of optimal assignments LEMMA5.10.LB≤VB In order to prove the approximation bound,we first es- PROOF.By definition,L=w(t,B)and VB= tablish some properties of optimal assignments. ∑tBe=sv(t,B).By Lemma5.&,un≤uem,and thus Definition 14.Given an HMR-system,let be the set w(t,B≤u6,B).It follows that Vs∈SL≤yB.There- of all optimal assignments,i.e.,those that minimize the fore LB s vB,because LB max,Lf and VB =VB= maximum load.Let rmin minfr4 A O}and let max,Vp. O1={0∈O|r0=rmin}. For the remainder of this section.let si be a server such LEMMA 5.6.Let OE01.If eo=ko,then r=0 and that =ko.Such a server exists by Lemma 5.9.Let LO=KO S2=S-Is1 be the set of remaining servers.For a partial PRooF.Let=ko for some server s.Assume to the assignment B2a,define Ne to be the average virtual load contrary that r≥1.Then L≥Ko+w2n.Let t be under B of the servers in S2.Formally, a remote task assigned to s by O.By definition 3,p(t,s') NA ∑ses2V2 = 2uoe+r3wem-V盟 holds for at least one server s's. = S2 n-1 Case 1:s'has at least one remote task t'.Then move t' to s and t to s'.This results in another assignment B.B is To obtain the approximation bound,we compare N with still optimal because≤Lg,Lg≤g,and L,=ig, the similar quantity Mo for the optimal assignment.For for any other server s"ES-{s,s'}. convenience,we let 6=wfem/(n-1). Case 2:s'has only local tasks.By the definition of ko s'has at most ko local tasks assigned by O.Then move t to LEMMA 5.11.Let B=ai=B'.Then s'.This results in another assignment B.B is still optimal V≤NB≤M°-d. because LLLKo+whoeKo wrem S LO PROOF.Proof is by a counting argument.By Lemma 5.9, and LB,=LO,for any other server s" ES-is,s. we havek=kO,sog,≥g=ko.Hence,V≥Ko In either case,we have shown the new assignment is in By Lemma5.2,we have o≥o.Letd=ee-°.d≥0 O.However,since t becomes local in the new assignment, because9≥e≥O.Because 3+rB+uP=m=o+rO fewer remote tasks are assigned than in O.This contradicts we have r8+u3+d=ro.Also,u3 2 1 since t is unassigned that O01.Thus,s is assigned no remote tasks,so Lo= kOoe=K0.☐ in B.Then by Lemma 5.8, Definition15.LetO∈O.DefineM°=He=e (n-1)Na ewloe +rwcem -va =(+d)wloc +(ro-u8 -d)wcem -Va LEMMA5.7.LO≥MO ≤t°ac+(r0-u2)u8nm-Ko PROOF.Let s be a server of maximal local load in O,so k=ko.Let S2=S-[s1).By Lemma 5.6,L=K ≤(m-l)M°-2m The total load on 52is∑.esLg=H°-Ko so the Hence,NB≤Mo-i. Now,since B is part of a trace,we have Vsv for all average load on S2isM°.Hence,Lo≥max.ES2Lg≥ avg,esa L Mo .□ s'5.In particular,V Na,since Na is the average virtual load of all servers in S2.We conclude that Vs 5.2.2 Analyzing the algorithm NB≤M0-6.☐ Assume throughout this section that OO and a PROOF OF THEOREM 5.4.Lemma 5.5 shows that the ao一ai一..→au=B is a trace generated by iteration time complexity of Algorithm 2 is O(m'n).Now we finish T=ko of the algorithm.Virtual loads are all based on a,so the proof for the approximation bound. we generally omit explicit mention of a in the superscripts Let s be a server of maximum virtual load in B,so VB of v and V. VB.Let i be the smallest integer such that i.=Bls,that is,no more tasks are assigned to s in the subtrace beginning LEMMA5.8.Moe≤uBn≤wRem≤u2n with ai. PROOF.wioe wm follows from the definition of a Case 1:i=0:Then ego <kao=ko by Lemma 5.9,and Hadoop cost function.Because Bo, Lemma 5.2.co 2to so r5m=ro rao=0,soVB=Vao≤Ko.Hence,.VB≤Ko≤LO Hence,wrem(rB)≤wrem(ra+u)≤wrem(rO)by mono- Case 2:i>0:Then B=a;=B'for some task t. tonicity of wrem().It follows by definition of the wfem no- By lemma5.ll,V2≤M°-d,so using Lemma5.8, tation that w品n≤wem≤um.☐ Vg'≤Ve+wem≤M°-6+wem LEMMA 5.9.=k Then by Lemma 5.7, PROoF.k<ko because no server is assigned more than r local tasks by max-cover at iteration=.For sake of VB-V-Vg'≤Mo+品m-i≤LO+u%m-i. contradiction,assume ka<k.Then u>0,because Both cases imply that VB Lo+wem -6.By otherwise a B and LB =ka.wloe <Ko<Lo,violating Lemma 5.10,we have LB <VE.Because the algorithm the optimality of O.Let t be an unassigned task in a.By chooses an assignment with least maximum load as the out- definition,p(t,s)holds for some server s.Assign t to s in put A,we have L4 <LB.Hence, a to obtain a new partial assignment B.We have k< ka+1≤kosT.By Lemma5.2,e≥ee,contradicting 小≤+品-i=°+(-n) the fact that 25=g+1.We conclude thatk=.5.2.1 Properties of optimal assignments In order to prove the approximation bound, we first establish some properties of optimal assignments. Definition 14. Given an HMR-system, let O be the set of all optimal assignments, i.e., those that minimize the maximum load. Let rmin = min{r A | A ∈ O} and let O1 = {O ∈ O | r O = rmin}. Lemma 5.6. Let O ∈ O1. If ` O s = k O, then r O s = 0 and L O s = KO. Proof. Let ` O s = k O for some server s. Assume to the contrary that r O s ≥ 1. Then L O s ≥ KO + w O rem. Let t be a remote task assigned to s by O. By definition 3, ρ(t, s0 ) holds for at least one server s 0 6= s. Case 1: s 0 has at least one remote task t 0 . Then move t 0 to s and t to s 0 . This results in another assignment B. B is still optimal because L B s ≤ L O s , L B s0 ≤ L O s0 , and L B s00 = L O s00 for any other server s 00 ∈ S − {s, s0 }. Case 2: s 0 has only local tasks. By the definition of k O, s 0 has at most k O local tasks assigned by O. Then move t to s 0 . This results in another assignment B. B is still optimal because L B s < LO s , L B s0 = KO + wloc ≤ KO + wrem ≤ L O s , and L B s00 = L O s00 for any other server s 00 ∈ S − {s, s0 }. In either case, we have shown the new assignment is in O. However, since t becomes local in the new assignment, fewer remote tasks are assigned than in O. This contradicts that O ∈ O1. Thus, s is assigned no remote tasks, so L O s = k Owloc = KO. Definition 15. Let O ∈ O1. Define MO = HO−KO n−1 . Lemma 5.7. L O ≥ MO. Proof. Let s1 be a server of maximal local load in O, so k O s1 = k O. Let S2 = S − {s1}. By Lemma 5.6, L O s1 = KO. The total load on S2 is P s∈S2 L O s = HO − KO, so the average load on S2 is MO. Hence, L O ≥ maxs∈S2 L O s ≥ avgs∈S2 L O s = MO. 5.2.2 Analyzing the algorithm Assume throughout this section that O ∈ O1 and α = α0 → α1 → . . . → αu = B is a trace generated by iteration τ = k O of the algorithm. Virtual loads are all based on α, so we generally omit explicit mention of α in the superscripts of v and V . Lemma 5.8. wloc ≤ w B rem ≤ w α rem ≤ w O rem. Proof. wloc ≤ w B rem follows from the definition of a Hadoop cost function. Because B ⊇ α, r B ≤ r α + u α . By Lemma 5.2, ` α ≥ ` O, so r α + u α = m − ` α ≤ m − ` O = r O. Hence, wrem(r B) ≤ wrem(r α + u α ) ≤ wrem(r O) by monotonicity of wrem(·). It follows by definition of the w β rem notation that w B rem ≤ w α rem ≤ w O rem. Lemma 5.9. k α = k O. Proof. k α ≤ k O because no server is assigned more than τ local tasks by max-cover at iteration τ = k O. For sake of contradiction, assume k α < kO. Then u α > 0, because otherwise α = B and L B = k α · wloc < KO ≤ L O, violating the optimality of O. Let t be an unassigned task in α. By definition, ρ(t, s) holds for some server s. Assign t to s in α to obtain a new partial assignment β. We have k β ≤ k α + 1 ≤ k O = τ . By Lemma 5.2, ` α ≥ ` β , contradicting the fact that ` β = ` α + 1. We conclude that k α = k O. Lemma 5.10. L B ≤ V B. Proof. By definition, L B s = P t:B(t)=s w(t, B) and V B s = P t:B(t)=s v(t, B). By Lemma 5.8, w B rem ≤ w α rem, and thus w(t, B) ≤ v(t, B). It follows that ∀s ∈ S, L B s ≤ V B s . Therefore L B ≤ V B, because L B = maxs L B s and V B = V B = maxs V B s . For the remainder of this section, let s1 be a server such that ` α s1 = k O. Such a server exists by Lemma 5.9. Let S2 = S − {s1} be the set of remaining servers. For a partial assignment β ⊇ α, define N β to be the average virtual load under β of the servers in S2. Formally, N β = P s∈S2 V β s |S2| = ` βwloc + r βw α rem − V β s1 n − 1 To obtain the approximation bound, we compare N β with the similar quantity MO for the optimal assignment. For convenience, we let δ = w O rem/(n − 1). Lemma 5.11. Let β = αi t:s −→ αi+1 = β 0 . Then V β s ≤ N β ≤ MO − δ. Proof. Proof is by a counting argument. By Lemma 5.9, we have k α = k O, so ` β s1 ≥ ` α s1 = k O. Hence, V β s1 ≥ KO. By Lemma 5.2, we have ` α ≥ ` O. Let d = ` β − ` O. d ≥ 0 because ` β ≥ ` α ≥ ` O. Because ` β +r β +u β = m = ` O +r O, we have r β +u β +d = r O. Also, u β ≥ 1 since t is unassigned in β. Then by Lemma 5.8, (n − 1)N β = ` βwloc + r βw α rem − V β s1 = (` O + d)wloc + (r O − u β − d)w α rem − V β s1 ≤ ` Owloc + (r O − u β )w O rem − K O ≤ (n − 1)MO − w O rem. Hence, N β ≤ MO − δ. Now, since β is part of a trace, we have V β s ≤ V β s0 for all s 0 ∈ S. In particular, V β s ≤ N β , since N β is the average virtual load of all servers in S2. We conclude that V β s ≤ N β ≤ MO − δ. Proof of Theorem 5.4. Lemma 5.5 shows that the time complexity of Algorithm 2 is O(m2n). Now we finish the proof for the approximation bound. Let s be a server of maximum virtual load in B, so V B s = V B. Let i be the smallest integer such that αi|s = B|s, that is, no more tasks are assigned to s in the subtrace beginning with αi. Case 1: i = 0: Then ` α0 s ≤ k α0 = k O by Lemma 5.9, and r α0 = 0, so V B = V α0 s ≤ KO. Hence, V B ≤ KO ≤ L O. Case 2: i > 0: Then β = αi−1 t:s −→ αi = β 0 for some task t. By lemma 5.11, V β s ≤ MO − δ, so using Lemma 5.8, V β 0 s ≤ V β s + w α rem ≤ MO − δ + w O rem. Then by Lemma 5.7, V B = V B s = V β 0 s ≤ MO + w O rem − δ ≤ L O + w O rem − δ. Both cases imply that V B ≤ L O + w O rem − δ. By Lemma 5.10, we have L B ≤ V B. Because the algorithm chooses an assignment with least maximum load as the output A, we have L A ≤ L B. Hence, L A ≤ L O + w O rem − δ = L O + „ 1 − 1 n − 1 « · w O rem