3 Degree-Based Hashing for GP In this_中国高校课件下载中心

点击下载：《人工智能、机器学习与大数据》课程教学资源（参考文献）Distributed Power-law Graph Computing - Theoretical and Empirical Analysis

正在加载图片...

3 Degree-Based Hashing for GP In this section,we propose a novel vertex-cut method,called degree-based hashing (DBH),to ef- fectively exploit the power-law distribution for GP. 3.1 Hashing Model We refer to a certain machine by its index idz,and the idrth machine is denoted as Pid.We first de- fine two kinds of hash functions:vertex-hash function id verter_hash(v)which hashes vertex v to the machine Pidz,and edge-hash function idr edgehash(e)or idx edge_hash(vi,vj) which hashes edge e=(vi,vi)to the machine Pidz. Our hashing model includes two main components: Master-vertex assignment:The master replica of vi is uniquely assigned to one of the p machines with equal probability for each machine by some randomized hash function verter hash(vi). Edge assignment:Each edge e =(vi,vj)is assigned to one of the p machines by some hash function edge_hash(vi,vj). It is easy to find that the above hashing model is a vertex-cut GP method.The master-vertex as- signment can be easily implemented,which can also be expected to achieve a low vertex-imbalance score.On the contrary,the edge assignment is much more complicated.Different edge-hash func- tions can achieve different replication factors and different edge-imbalance scores.Please note that replication factor reflects communication cost,and edge-imbalance reflects workload-imbalance. Hence,the key of our hashing model lies in the edge-hash function edge_hash(vi,vj). 3.2 Degree-Based Hashing From the example in Figure 2,we observe that in power-law graphs the replication factor,which is defined as the total number of replicas divided by the total number of vertices,will be smaller if we cut vertices with relatively higher degrees.Based on this intuition,we define the edge_hash(vi,vj) as follows: edgehash(vi,vj)= fvertex-hash(vi)if di<dj, (1) verterhash(v;)otherwise. It means that we use the vertex-hash function to define the edge-hash function.Furthermore,the edge-hash function value of an edge is determined by the degrees of the two associated vertices. More specifically,the edge-hash function value of an edge is defined by the vertex-hash function value of the associated vertex with a smaller degree.Hence,our method is called degree-based hashing(DBH).DBH can effectively capture the intuition that cutting vertices with higher degrees will get better performance. Our DBH method for vertex-cut GP is briefly summarized in Algorithm 1,where [n)={1,...,n}. Algorithm 1 Degree-based hashing(DBH)for vertex-cut GP Input:The set of edges E;the set of vertices V;the number of machines p. Output:The assignment M(e)E[p for each edge e. 1:Initialization:count the degree di for each i [n]in parallel 2:for all e =(vi,vj)EE do 3: Hash each edge in parallel: 4: if di<di then 5: M(e)←-verter_hash(i) 6: else M(e)verterhash(vj) 8: end if 9:end for3 Degree-Based Hashing for GP In this section, we propose a novel vertex-cut method, called degree-based hashing (DBH), to effectively exploit the power-law distribution for GP. 3.1 Hashing Model We refer to a certain machine by its index idx, and the idxth machine is denoted as Pidx. We first de- fine two kinds of hash functions: vertex-hash function idx = vertex hash(v) which hashes vertex v to the machine Pidx, and edge-hash function idx = edge hash(e) or idx = edge hash(vi , vj ) which hashes edge e = (vi , vj ) to the machine Pidx. Our hashing model includes two main components: • Master-vertex assignment: The master replica of vi is uniquely assigned to one of the p machines with equal probability for each machine by some randomized hash function vertex hash(vi). • Edge assignment: Each edge e = (vi , vj ) is assigned to one of the p machines by some hash function edge hash(vi , vj ). It is easy to find that the above hashing model is a vertex-cut GP method. The master-vertex assignment can be easily implemented, which can also be expected to achieve a low vertex-imbalance score. On the contrary, the edge assignment is much more complicated. Different edge-hash functions can achieve different replication factors and different edge-imbalance scores. Please note that replication factor reflects communication cost, and edge-imbalance reflects workload-imbalance. Hence, the key of our hashing model lies in the edge-hash function edge hash(vi , vj ). 3.2 Degree-Based Hashing From the example in Figure 2, we observe that in power-law graphs the replication factor, which is defined as the total number of replicas divided by the total number of vertices, will be smaller if we cut vertices with relatively higher degrees. Based on this intuition, we define the edge hash(vi , vj ) as follows: edge hash(vi , vj ) = vertex hash(vi) if di < dj , vertex hash(vj ) otherwise. (1) It means that we use the vertex-hash function to define the edge-hash function. Furthermore, the edge-hash function value of an edge is determined by the degrees of the two associated vertices. More specifically, the edge-hash function value of an edge is defined by the vertex-hash function value of the associated vertex with a smaller degree. Hence, our method is called degree-based hashing (DBH). DBH can effectively capture the intuition that cutting vertices with higher degrees will get better performance. Our DBH method for vertex-cut GP is briefly summarized in Algorithm 1, where [n] = {1, . . . , n}. Algorithm 1 Degree-based hashing (DBH) for vertex-cut GP Input: The set of edges E; the set of vertices V ; the number of machines p. Output: The assignment M(e) ∈ [p] for each edge e. 1: Initialization: count the degree di for each i ∈ [n] in parallel 2: for all e = (vi , vj ) ∈ E do 3: Hash each edge in parallel: 4: if di < dj then 5: M(e) ← vertex hash(vi) 6: else 7: M(e) ← vertex hash(vj ) 8: end if 9: end for 4

<<向上翻页向下翻页>>

点击下载：《人工智能、机器学习与大数据》课程教学资源（参考文献）Distributed Power-law Graph Computing - Theoretical and Empirical Analysis