Existing GP methods can be divided in_中国高校课件下载中心

点击下载：《人工智能、机器学习与大数据》课程教学资源（参考文献）Distributed Power-law Graph Computing - Theoretical and Empirical Analysis

正在加载图片...

Existing GP methods can be divided into two main categories:edge-cut and vertex-cut methods. Edge-cut tries to evenly assign the vertices to machines by cutting the edges.In contrast,vertex-cut tries to evenly assign the edges to machines by cutting the vertices.Figure 1 illustrates the edge- cut and vertex-cut partitioning results of an example graph.In Figure 1 (a),the edges(A,C)and (A,E)are cut,and the two machines store the vertex sets {A,B,D}and [C,E,respectively.In Figure 1(b),the vertex A is cut,and the two machines store the edge sets {(A,B),(A,D),(B,D)} and {(A,C),(A,E),(C,E),respectively.In edge-cut,both machines of a cut edge should maintain a ghost (local replica)of the vertex and the edge data.In vertex-cut,all the machines associated with a cut vertex should maintain a mirror (local replica)of the vertex.The ghosts and mirrors are shown in shaded vertices in Figure 1.In edge-cut,the workload of a machine is determined by the number of vertices located in that machine,and the communication cost of the whole graph is determined by the number of edges spanning different machines.In vertex-cut,the workload of a machine is determined by the number of edges located in that machine,and the communication cost of the whole graph is determined by the number of mirrors of the vertices. (D (a)Edge-Cut (b)Vertex-Cut Figure 1:Two strategies for graph partitioning.Shaded vertices are ghosts and mirrors,respectively Most traditional DGC frameworks,such as GraphLab [14]and Pregel [16],use edge-cut method- s [10,19,20,21]for GP.Very recently,the authors of PowerGraph [7]find that the vertex-cut methods can achieve better performance than edge-cut methods,especially for power-law graph- s.Hence,vertex-cut has attracted more and more attention from DGC research community.For example,PowerGraph [7]adopts a random vertex-cut method and two greedy variants for GP. GraphBuilder [9]provides some heuristics,such as the grid-based constrained solution,to improve the random vertex-cut method. Large natural graphs usually follow skewed degree distributions like power-law distributions,which makes GP challenging.Different vertex-cut methods can result in different performance for power- law graphs.For example,Figure 2(a)shows a toy power-law graph with only one vertex having much higher degree than the others.Figure 2(b)shows a partitioning strategy by cutting the vertices {E,F,A,C,D),and Figure 2(c)shows a partitioning strategy by cutting the vertices [A,E).We can find that the partitioning strategy in Figure 2(c)is better than that in Figure 2(b)because the number of mirrors in Figure 2(c)is smaller which means less communication cost.The intuition underlying this example is that cutting higher-degree vertices can result in fewer mirror vertices. Hence,the power-law degree distribution can be used to facilitate GP.Unfortunately,existing vertex- cut methods,including those in PowerGraph and GraphBuilder,make rarely use of the power-law degree distribution for GP.Hence,they cannot achieve satisfactory performance in natural power- law graphs.PowerLyra [4]tries to combine both edge-cut and vertex-cut together by using the power-law degree distribution.However,it is lack of theoretical guarantee. (b)Bad partitioning D (a)Sample (c)Good partitioning Figure 2:Partition a sample graph with vertex-cut.Existing GP methods can be divided into two main categories: edge-cut and vertex-cut methods. Edge-cut tries to evenly assign the vertices to machines by cutting the edges. In contrast, vertex-cut tries to evenly assign the edges to machines by cutting the vertices. Figure 1 illustrates the edgecut and vertex-cut partitioning results of an example graph. In Figure 1 (a), the edges (A,C) and (A,E) are cut, and the two machines store the vertex sets {A,B,D} and {C,E}, respectively. In Figure 1 (b), the vertex A is cut, and the two machines store the edge sets {(A,B),(A,D),(B,D)} and {(A,C),(A,E),(C,E)}, respectively. In edge-cut, both machines of a cut edge should maintain a ghost (local replica) of the vertex and the edge data. In vertex-cut, all the machines associated with a cut vertex should maintain a mirror (local replica) of the vertex. The ghosts and mirrors are shown in shaded vertices in Figure 1. In edge-cut, the workload of a machine is determined by the number of vertices located in that machine, and the communication cost of the whole graph is determined by the number of edges spanning different machines. In vertex-cut, the workload of a machine is determined by the number of edges located in that machine, and the communication cost of the whole graph is determined by the number of mirrors of the vertices. (a) Edge-Cut (b) Vertex-Cut Figure 1: Two strategies for graph partitioning. Shaded vertices are ghosts and mirrors, respectively. Most traditional DGC frameworks, such as GraphLab [14] and Pregel [16], use edge-cut methods [10, 19, 20, 21] for GP. Very recently, the authors of PowerGraph [7] find that the vertex-cut methods can achieve better performance than edge-cut methods, especially for power-law graphs. Hence, vertex-cut has attracted more and more attention from DGC research community. For example, PowerGraph [7] adopts a random vertex-cut method and two greedy variants for GP. GraphBuilder [9] provides some heuristics, such as the grid-based constrained solution, to improve the random vertex-cut method. Large natural graphs usually follow skewed degree distributions like power-law distributions, which makes GP challenging. Different vertex-cut methods can result in different performance for powerlaw graphs. For example, Figure 2 (a) shows a toy power-law graph with only one vertex having much higher degree than the others. Figure 2 (b) shows a partitioning strategy by cutting the vertices {E, F, A, C, D}, and Figure 2 (c) shows a partitioning strategy by cutting the vertices {A, E}. We can find that the partitioning strategy in Figure 2 (c) is better than that in Figure 2 (b) because the number of mirrors in Figure 2 (c) is smaller which means less communication cost. The intuition underlying this example is that cutting higher-degree vertices can result in fewer mirror vertices. Hence, the power-law degree distribution can be used to facilitate GP. Unfortunately, existing vertexcut methods, including those in PowerGraph and GraphBuilder, make rarely use of the power-law degree distribution for GP. Hence, they cannot achieve satisfactory performance in natural powerlaw graphs. PowerLyra [4] tries to combine both edge-cut and vertex-cut together by using the power-law degree distribution. However, it is lack of theoretical guarantee. (a) Sample (b) Bad partitioning (c) Good partitioning Figure 2: Partition a sample graph with vertex-cut. 2

<<向上翻页向下翻页>>

点击下载：《人工智能、机器学习与大数据》课程教学资源（参考文献）Distributed Power-law Graph Computing - Theoretical and Empirical Analysis