Method | Computation complexity | Memory complexity
-------|------------------------|------------------
Non-sampling [Kipf and Welling, 2017] | O(s^{L-1} · (sBr + Br^2)) | O(Lr^2 + s^L · Br)
NS [Hamilton et al., 2017] | O(s_n^{L-1} · (s_n Br + Br^2)) | O(Lr^2 + s_n^L · Br)
VRGCN [Chen et al., 2018a] | O(s_n^{L-1} · ((s_n + s) · Br + Br^2)) | O(Lr^2 + s_n^{L-1} · (s + s_n) Br)
MVS-GNN [Cong et al., 2020] | O(s_n^{L-1} · ((s_n + s) · Br + Br^2)) | O(Lr^2 + s_n^{L-1} · (s + s_n) Br)
LADIES [Zou et al., 2019] | O(L · (s_l/N)^2 · ||A||_0 + L s_l · r^2) | O(Lr^2 + L s_l · r)
GraphSAINT [Zeng et al., 2020] | O(L · (s_g/N)^2 · ||A||_0 + L s_g · r^2) | O(Lr^2 + L s_g · r)
BNS (ours) | O(s̃_n^{L-1} · (s_n Br + (δ/(1 − δ) + 1) · Br^2)) | O(Lr^2 + s̃_n^{L-1} · s_n Br)

Table 1: Computation and memory complexity. s denotes the average number of neighbors per node in A. s_n denotes the average number of neighbors sampled for each node. s̃_n = s_n × (1 − δ), where δ denotes the ratio of blocked nodes in BNS. s_l denotes the average number of nodes per layer in layer-wise sampling. s_g denotes the average number of nodes per subgraph in subgraph sampling. B = |V_in| denotes the mini-batch size of the output layer. L is the number of layers in GNN models. r is the hidden dimension of networks.

lack of systematic experimental analysis on large-scale graphs. For example, even with low computation complexity, VRGCN, MVS-GNN and LADIES have a high time cost of preparing data (including sampling) before each round of mini-batch training. In addition, VRGCN brings a huge burden to the memory for storing all nodes' historical representations at each layer. MVS-GNN has the same complexity as the non-sampling method at the outer iteration, which might make the training infeasible on large-scale graphs because of running out of graphics memory.
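The complexity formulas in Table 1 can be compared numerically. The sketch below evaluates the per-batch computation-cost terms for NS and BNS; the concrete values of s_n, δ, L, B and r are illustrative assumptions, not numbers taken from the paper.

```python
def ns_cost(s_n, L, B, r):
    # NS computation cost from Table 1: O(s_n^(L-1) * (s_n*B*r + B*r^2))
    return s_n ** (L - 1) * (s_n * B * r + B * r ** 2)

def bns_cost(s_n, delta, L, B, r):
    # BNS computation cost from Table 1:
    # O(s~_n^(L-1) * (s_n*B*r + (delta/(1-delta) + 1) * B*r^2)),
    # where s~_n = s_n * (1 - delta) is the non-blocked fan-out.
    s_tilde = s_n * (1 - delta)
    return s_tilde ** (L - 1) * (s_n * B * r + (delta / (1 - delta) + 1) * B * r ** 2)

if __name__ == "__main__":
    # Assumed illustrative values: fan-out 10, half the neighbors blocked,
    # a 5-layer model, batch size 1000, hidden dimension 128.
    s_n, delta, L, B, r = 10, 0.5, 5, 1000, 128
    print(f"NS  cost: {ns_cost(s_n, L, B, r):.3e}")
    print(f"BNS cost: {bns_cost(s_n, delta, L, B, r):.3e}")
```

Because the dominant factor is the (L − 1)-th power of the fan-out, shrinking s_n to s̃_n = s_n(1 − δ) reduces the cost geometrically in depth, which is the source of BNS's advantage in the table.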
GraphSAINT faces the problem of sparse connection in subgraphs. Moreover, GraphSAINT adopts the non-sampling strategy at the evaluation and testing stage, which is also inefficient on large-scale graphs.

Similar to existing node-wise sampling methods, BNS reduces the growth rate from s to a small s̃_n, where s̃_n denotes the number of non-blocked neighbors per node. We will show that with a small s̃_n, BNS can achieve accuracy comparable to NS with a large s_n, while BNS has lower computation and memory complexity. Moreover, BNS has a low time cost of preparing data before each round of mini-batch training.

4 Experiments

In this section, we compare BNS with other baselines on five node-classification datasets. BNS is implemented on the PyTorch platform [Paszke et al., 2019] with the PyTorch Geometric library [Fey and Lenssen, 2019]. All experiments are run on an NVIDIA TitanXP GPU server with 12 GB of graphics memory.

4.1 Datasets

Ogbn-products, ogbn-papers100M and ogbn-proteins [2] are publicly available [Hu et al., 2020]. Ogbn-products is a large-scale dataset with millions of nodes. Ogbn-papers100M is a large-scale dataset with hundreds of millions of nodes. Ogbn-proteins is a small-scale dataset with hundreds of thousands of nodes. Amazon and Yelp, the datasets used in GraphSAINT, are also used for evaluation. Due to space limitation, the information and results on Amazon and Yelp are moved to the Appendix. The statistics of all datasets can be found in the Appendix.

[2] https://ogb.stanford.edu/docs/nodeprop/

4.2 Baselines and Settings

We compare BNS with VRGCN [Chen et al., 2018a], LADIES [Zou et al., 2019] and GraphSAINT [Zeng et al., 2020], which are the state-of-the-art methods with node-wise sampling, layer-wise sampling and subgraph sampling, respectively. Additionally, we compare BNS with the classical node-wise sampling method NS [Hamilton et al., 2017].
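The node-wise-sampling-with-blocking idea discussed above can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: the toy graph, the uniform sampler, and the choice to simply stop expanding blocked nodes at the next layer are all assumptions made for clarity.

```python
import random

def sample_with_blocking(adj, frontier, s_n, delta, rng=None):
    """One layer of node-wise sampling with blocking: draw up to s_n
    neighbors per frontier node, then block a ratio delta of them.
    Blocked nodes are not expanded further, so the effective fan-out
    shrinks from s_n to roughly s_n * (1 - delta)."""
    rng = rng or random.Random(0)
    next_frontier, blocked = [], []
    for v in frontier:
        neighbors = adj.get(v, [])
        sampled = rng.sample(neighbors, min(s_n, len(neighbors)))
        n_block = int(len(sampled) * delta)
        rng.shuffle(sampled)
        blocked.extend(sampled[:n_block])        # expansion stops here
        next_frontier.extend(sampled[n_block:])  # expanded at next layer
    return next_frontier, blocked

# Toy graph as adjacency lists (hypothetical data).
adj = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2], 4: [0]}
frontier, blocked = sample_with_blocking(adj, [0], s_n=4, delta=0.5)
```

With s_n = 4 and δ = 0.5, each frontier node contributes at most two non-blocked neighbors to the next layer, matching the s̃_n = s_n(1 − δ) reduction in Table 1.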
We do not compare BNS with MVS-GNN since MVS-GNN adopts the non-sampling strategy for training at the outer iteration, which leads to the problem of running out of graphics memory. Besides, comparisons with model simplification methods are moved to the Appendix due to space limitation. Since the original implementations of the above baselines cannot directly scale to the benchmark datasets in this paper, we re-implement them according to the corresponding authors' codes. For a fair comparison, the implementations of all methods, including BNS, differ only in the sampling process. For all methods, the GNN model is instantiated with GraphSAGE [Hamilton et al., 2017], since it achieves good performance on the benchmark datasets. Note that the sampling strategies and settings during inference are the same as those in the training stage for all methods except GraphSAINT.

The hyper-parameters r, L, T (maximum epoch), λ and p (probability of dropout) are independent of sampling strategies, and hence they are set to be the same for different sampling strategies on one specific dataset. Empirically, r is set to 128 on all datasets; L is set to 5 on both ogbn-proteins and ogbn-products, and to 3 on ogbn-papers100M; T is set to 100 on both ogbn-products and ogbn-papers100M, and to 1,000 on ogbn-proteins. The values of λ and p are obtained by tuning with NS on the benchmark datasets: on ogbn-products, λ = 5 × 10^-6 and p = 0.1; on ogbn-papers100M, λ = 5 × 10^-7 and p = 0.1; on ogbn-proteins, λ = 0 and p = 0. In BNS, we set ρ to 0.5 for convenience and do not tune it. Adam [Kingma and Ba, 2015] is used to optimize the model and the learning rate η is set to 0.01. For all settings, experiments are run 10 times with a different initialization each time, and the mean results of the 10 runs are reported.
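The per-dataset settings above can be collected into a small configuration table. The dict below is only a restatement of the reported hyper-parameters for reference; it is not part of any released code, and the key names are chosen here for illustration.

```python
# Hyper-parameters reported per dataset (shared across sampling strategies):
# hidden dimension r, number of layers L, maximum epoch T,
# weight decay lambda, and dropout probability p.
CONFIGS = {
    "ogbn-products":   {"r": 128, "L": 5, "T": 100,  "lambda": 5e-6, "p": 0.1},
    "ogbn-papers100M": {"r": 128, "L": 3, "T": 100,  "lambda": 5e-7, "p": 0.1},
    "ogbn-proteins":   {"r": 128, "L": 5, "T": 1000, "lambda": 0.0,  "p": 0.0},
}

# Settings shared by all datasets: Adam with learning rate 0.01,
# and BNS's parameter rho fixed to 0.5 without tuning.
SHARED = {"optimizer": "Adam", "lr": 0.01, "rho": 0.5}
```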
4.3 Evaluation Criteria

The ultimate goal of sampling strategies for GNNs is to obtain high accuracy with a low time cost, not just to reduce time and memory cost to extreme cases at the expense of sac-