Distributed Power-law Graph Computing: Theoretical and Empirical Analysis

Cong Xie
Dept. of Comp. Sci. and Eng.
Shanghai Jiao Tong University
800 Dongchuan Road
Shanghai 200240, China
xcgoner1108@gmail.com

Ling Yan
Dept. of Comp. Sci. and Eng.
Shanghai Jiao Tong University
800 Dongchuan Road
Shanghai 200240, China
yling0718@sjtu.edu.cn

Wu-Jun Li
National Key Lab. for Novel Software Tech.
Dept. of Comp. Sci. and Tech.
Nanjing University
Nanjing 210023, China
liwujun@nju.edu.cn

Zhihua Zhang
Dept. of Comp. Sci. and Eng.
Shanghai Jiao Tong University
800 Dongchuan Road
Shanghai 200240, China
zhang-zh@cs.sjtu.edu.cn

Abstract

With the emergence of big graphs in a variety of real applications such as social networks, machine learning based on distributed graph-computing (DGC) frameworks has attracted much attention from the big data machine learning community. In DGC frameworks, the graph partitioning (GP) strategy plays a key role in the performance, including the workload balance and communication cost. Typically, the degree distributions of natural graphs from real applications follow skewed power laws, which makes GP a challenging task. Recently, many methods have been proposed to solve the GP problem. However, the existing GP methods cannot achieve satisfactory performance for applications with power-law graphs. In this paper, we propose a novel vertex-cut method, called degree-based hashing (DBH), for GP. DBH makes effective use of the skewed degree distributions for GP. We theoretically prove that DBH can achieve lower communication cost than existing methods and can simultaneously guarantee good workload balance. Furthermore, empirical results on several large power-law graphs also show that DBH can outperform the state of the art.

1 Introduction

Recent years have witnessed the emergence of big graphs in a large variety of real applications, such as the web and social network services. Furthermore, many machine learning and data mining algorithms can also be modeled with graphs [14]. Hence, machine learning based on distributed graph-computing (DGC) frameworks has attracted much attention from the big data machine learning community [14, 16, 15, 7, 12, 8]. To perform distributed (parallel) graph computing on clusters with several machines (servers), one has to partition the whole graph across the machines in a cluster. Graph partitioning (GP) can dramatically affect the performance of DGC frameworks in terms of workload balance and communication cost. Hence, the GP strategy typically plays a key role in DGC frameworks. The ideal GP method should minimize the cross-machine communication cost, and simultaneously keep the workload in every machine approximately balanced.
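To make the vertex-cut setting concrete, the sketch below shows one plausible degree-based hashing assignment of edges to machines: each edge is hashed by its lower-degree endpoint, so the many edges incident to a high-degree hub are spread across machines by their low-degree neighbors rather than piling onto the hub's machine. This is a minimal illustration of the idea only, not the paper's exact algorithm or analysis; the function name `dbh_partition` and the two-pass degree computation are our own assumptions for the example.

```python
from collections import defaultdict

def dbh_partition(edges, num_machines):
    """Assign each edge to a machine by hashing its lower-degree endpoint.

    A sketch of a degree-based hashing (DBH) style vertex-cut: edges,
    not vertices, are the unit of partitioning, and the hash key is the
    endpoint with the smaller degree.
    """
    # First pass: count vertex degrees over the edge list.
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Second pass: hash each edge by its lower-degree endpoint.
    assignment = {}
    for u, v in edges:
        key = u if degree[u] <= degree[v] else v
        assignment[(u, v)] = hash(key) % num_machines
    return assignment

# Tiny star-like graph: vertex 0 is the hub with degree 3.
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
parts = dbh_partition(edges, 2)
```

In this toy example every edge touching the hub (vertex 0) is hashed by its spoke endpoint, so the hub's edges can land on different machines, which is exactly the load-spreading effect a vertex-cut seeks for power-law graphs.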