正在加载图片...
Lemma 2.The maximum number of master vertices for each machine is bounded as follows: (Pr[MazLoad ka]o(1) fa>1, Pr[MarLoad>ka]=1-o(1)if0<a<1 Here MarLoad=max HvEV Master(v)=m)l,and ka=+ 2n In p 2alnp 4.2 Partitioning Power-law Graphs Now we change the sequence of fixed degrees into a sequence of random samples generated from the power-law distribution.As a result,upper-bounds can be provided for the above three methods, which are Random.Grid and DBH. Theorem 3.Let the minimal degree be dmin and each de {di be sampled from a power-law degree distribution with parameter o E(2,3).The expected replication factor of Random on p machines can be approximately bounded by: E0--)】s--] where=dmin×&号 This theorem says that when the degree sequence is under power-law distribution,the upper bound of the expected replication factor increases as a decreases.This implies that Random yields a worse partitioning when the power-law graph is more skewed. Like Corollary 1,we replace p with p to get the similar result for Grid. Corollary 2.By using Grid method,the expected replication factor on p machines can be approxi- mately bounded by: 9三--)】s--捫 whee2=dmin×&二2 Note that万l-(-)叫sp--分)l-SoCoray'2 ehGrd can rduce the replication factor but it is not motivated by the skewness of the degree distribution. Theorem 4.Assume each edge is hashed by our DBH method and hi di-1 for any vi.The expected replication factor of DBH on p machines can be approximately bounded by: EH.D --)】s--] where-dmin×&二-dminx+ Note that --)]<--)] Therefore.our DBH method can expectedly reduce the replication factor.The teincreases as o decreases,which means our DBH reduces more replication factor when the power-law graph is more skewed.Note that Grid and our DBH method actually use two different ways to reduce the replication factor.Grid reduces more replication factor when p grows.These two approaches can be combined to obtain further improvement,which is not the focus of this paper. Finally,we show that our DBH methd also guarantees good edge-balance (workload balance)under power-law distributions. 6Lemma 2. The maximum number of master vertices for each machine is bounded as follows:  Pr[M axLoad > ka] = o(1) if a > 1, Pr[M axLoad > ka] = 1 − o(1) if 0 < a < 1. Here M axLoad = max m |{v ∈ V | M aster(v) = m}|, and ka = n p + r 2n ln p p  1 − ln ln p 2a ln p  . 4.2 Partitioning Power-law Graphs Now we change the sequence of fixed degrees into a sequence of random samples generated from the power-law distribution. As a result, upper-bounds can be provided for the above three methods, which are Random, Grid and DBH. Theorem 3. Let the minimal degree be dmin and each d ∈ {di} n i=1 be sampled from a power-law degree distribution with parameter α ∈ (2, 3). The expected replication factor of Random on p machines can be approximately bounded by: ED " p n Xn i=1  1 −  1 − 1 p di # ≤ p  1 −  1 − 1 p Ωˆ  , where Ω =ˆ dmin × α−1 α−2 . This theorem says that when the degree sequence is under power-law distribution, the upper bound of the expected replication factor increases as α decreases. This implies that Random yields a worse partitioning when the power-law graph is more skewed. Like Corollary 1, we replace p with √p to get the similar result for Grid. Corollary 2. By using Grid method, the expected replication factor on p machines can be approxi￾mately bounded by: ED "√p n Xn i=1  1 −  1 − 1 √p di # ≤ √ p  1 −  1 − 1 √p Ωˆ  , where Ω =ˆ dmin × α−1 α−2 . Note that √p  1 −  1 − √ 1 p Ωˆ  ≤ p  1 −  1 − 1 p Ωˆ  . So Corollary 2 tells us that Grid can reduce the replication factor but it is not motivated by the skewness of the degree distribution. Theorem 4. Assume each edge is hashed by our DBH method and hi ≤ di − 1 for any vi . The expected replication factor of DBH on p machines can be approximately bounded by: EH,D " p n Xn i=1  1 −  1 − 1 p hi+1# ≤ p  1 −  1 − 1 p Ωˆ0  , where Ωˆ0 = dmin × α−1 α−2 − dmin × α−1 2α−3 + 1 2 . Note that p  1 −  1 − 1 p Ωˆ0  < p  1 −  1 − 1 p Ωˆ  . Therefore, our DBH method can expectedly reduce the replication factor. The term α−1 2α−3 increases as α decreases, which means our DBH reduces more replication factor when the power-law graph is more skewed. Note that Grid and our DBH method actually use two different ways to reduce the replication factor. Grid reduces more replication factor when p grows. These two approaches can be combined to obtain further improvement, which is not the focus of this paper. Finally, we show that our DBH methd also guarantees good edge-balance (workload balance) under power-law distributions. 6
<<向上翻页向下翻页>>
©2008-现在 cucdc.com 高等教育资讯网 版权所有