4 Theoretical Analysis In this sectio_中国高校课件下载中心

点击下载：《人工智能、机器学习与大数据》课程教学资源（参考文献）Distributed Power-law Graph Computing - Theoretical and Empirical Analysis

正在加载图片...

4 Theoretical Analysis In this section,we present theoretical analysis for our DBH method.For comparison,the ran- dom vertex-cut method (called Random)of PowerGraph [7]and the grid-based constrained solu- tion(called Grid)of GraphBuilder [9]are adopted as baselines.Our analysis is based on random- ization.Moreover,we assume that the graph is undirected and there are no duplicated edges in the graph.We mainly study the performance in terms of replication factor,edge-imbalance and vertex- imbalance defined in Section 2.Due to space limitation,we put the proofs of all theoretical results into the supplementary material. 4.1 Partitioning Degree-fixed Graphs Firstly,we assume that the degree sequence [di is fixed.Then we can get the following expected replication factor produced by different methods. Random assigns each edge evenly to the p machines via a randomized hash function.The result can be directly got from PowerGraph [7]. Lemma 1.Assume that we have a sequence of n vertices [vi and the corresponding degree sequence D={diA simple randomized vertex-cut on p machines has the expected replication factor: 空ap---)] By using the Grid hash function,each vertex has p rather than p candidate machines compared to Random.Thus we simply replace p with p to get the following corollary. Corollary 1.By using Grid for hashing,the expected replication factor on p machines is: 24l0-三--》] Using DBH method in Section 3.2,we obtain the following result by fixing the sequence [h, where h;is defined as the number of v:'s adjacent edges which are hashed by the neighbors of v according to the edge-hash function defined in(1). Theorem 1.Assume that we have a sequence ofn vertices and the corresponding degree sequence D =(di.For each vi.di-hi adjacent edges of it are hashed by vi itself.Define H=hi.Our DBH method on p machines has the expected replication factor: 2---門s--] where hi≤d-1for any. This theorem says that our DBH method has smaller expected replication factor than Random of PowerGraph [7]. Next we turn to the analysis of the balance constraints.We still fix the degree sequence and have the following result for our DBH method. Theorem 2.Our DBH method on p machines with the sequences {vi)1,{di and (hi defined in Theorem I has the edge-imbalance: maxl{e∈E|M(e)=mH +ma ∑(d-h) 1=1 P jElp]vEP; EV/p 2EV/p Although the master vertices are evenly assigned to each machine,we want to show how the ran- domized assignment is close to the perfect balance.This problem is well studied in the model of uniformly throwing n balls into p bins when n>p(Inp)3 [18].4 Theoretical Analysis In this section, we present theoretical analysis for our DBH method. For comparison, the random vertex-cut method (called Random) of PowerGraph [7] and the grid-based constrained solution (called Grid) of GraphBuilder [9] are adopted as baselines. Our analysis is based on randomization. Moreover, we assume that the graph is undirected and there are no duplicated edges in the graph. We mainly study the performance in terms of replication factor, edge-imbalance and verteximbalance defined in Section 2. Due to space limitation, we put the proofs of all theoretical results into the supplementary material. 4.1 Partitioning Degree-fixed Graphs Firstly, we assume that the degree sequence {di} n i=1 is fixed. Then we can get the following expected replication factor produced by different methods. Random assigns each edge evenly to the p machines via a randomized hash function. The result can be directly got from PowerGraph [7]. Lemma 1. Assume that we have a sequence of n vertices {vi} n i=1 and the corresponding degree sequence D = {di} n i=1. A simple randomized vertex-cut on p machines has the expected replication factor: E " 1 n Xn i=1 |A(vi)| D # = p n Xn i=1 1 − 1 − 1 p di . By using the Grid hash function, each vertex has √p rather than p candidate machines compared to Random. Thus we simply replace p with √p to get the following corollary. Corollary 1. By using Grid for hashing, the expected replication factor on p machines is: E " 1 n Xn i=1 |A(vi)| D # = √p n Xn i=1 1 − 1 − 1 √p di . Using DBH method in Section 3.2, we obtain the following result by fixing the sequence {hi} n i=1, where hi is defined as the number of vi’s adjacent edges which are hashed by the neighbors of vi according to the edge-hash function defined in (1). Theorem 1. Assume that we have a sequence of n vertices {vi} n i=1 and the corresponding degree sequence D = {di} n i=1. For each vi , di − hi adjacent edges of it are hashed by vi itself. Define H = {hi} n i=1. Our DBH method on p machines has the expected replication factor: E " 1 n Xn i=1 |A(vi)| H, D# = p n Xn i=1 1 − 1 − 1 p hi+1 ≤ p n Xn i=1 1 − 1 − 1 p di , where hi ≤ di − 1 for any vi . This theorem says that our DBH method has smaller expected replication factor than Random of PowerGraph [7]. Next we turn to the analysis of the balance constraints. We still fix the degree sequence and have the following result for our DBH method. Theorem 2. Our DBH method on p machines with the sequences {vi} n i=1, {di} n i=1 and {hi} n i=1 defined in Theorem 1 has the edge-imbalance: max m |{e ∈ E | M(e) = m}| |E|/p = Pn i=1 hi p + max j∈[p] P vi∈Pj (di − hi) 2|E|/p . Although the master vertices are evenly assigned to each machine, we want to show how the randomized assignment is close to the perfect balance. This problem is well studied in the model of uniformly throwing n balls into p bins when n p(ln p) 3 [18]. 5

<<向上翻页向下翻页>>

点击下载：《人工智能、机器学习与大数据》课程教学资源（参考文献）Distributed Power-law Graph Computing - Theoretical and Empirical Analysis