正在加载图片...
第12期 曹倩等:基于超图的非规则应用局部性优化 ·1477· 图9所示为各个重排算法的时间开销图.可以 Proceedings of the 16th Principles and Practices of Parallel Pro- 看出:原理简单的算法,如基于遍历思想的H_NRC_ gramming (PPoPP).San Antonio,2011:13 B]Zhou M,Sahni O,Shephard M S,et al.Adjacencyased data re- D算法,时间开销很小:而原理相对复杂的算法,如 ordering algorithm for acceleration of finite element computations 使用了多级图划分思想的H_P℉B_D算法,时间开 Sci Program,2010,18(2):107 销比较大,HPFB_D算法的时间开销约为H_NRC_ 4]Wang M.Parallel Compilation and Optimization for Multi-core Pro- D算法的3倍.对于开销较大的H_PFB_D算法,在 cessors [Dissertation].Changsha:National University of Defense 一定范围内,当迭代次数增加时,可以有效地分摊算 Technology,2010 法本身的开销,因此程序性能也会相应地提高 (王淼.面向多核处理器的并行编译及优化关键技术研究[学 位论文].长沙:国防科学技术大学,2010) 2.50 5] Camata JJ,Rossa A L,Valli A M P,et al.Reordering and in- 200 complete preconditioning in serial and parallel adaptive mesh re- finement and coarsening flow solutions.Int J Numer Methods Flu- ids,2012,69(4):802 盘100 [6]Liu X H,Yang Y,Zhang J Y,et al.Basic-block reordering algo- 50 ◆一 rithm based on structural analysis.J Softcare,2008,19(7): 1603 H_NRC_D H_BS_D H_PFB_D H NRC_I H_BS_I 重排算法 (刘先华,杨阳,张吉豫,等.一种基于子结构分析的基本块 重排算法.软件学报.2008,19(7):1603) 图9各种算法的时间开销 ] Hsu C H,Chen S C.Efficient selection strategies towards proces- Fig.9 Cost of each algorithm sorreordering techniques for improving data locality in heterogene ous clusters.J Supercomput,2012,60(3):284 4结论 [8]Rjeili AA,Karypis G.Multilevel algorithms for partitioning pow- er-aw graphs /Proceedings of the 20th International Parallel and (1)采用超图的思想实现数据重排和迭代重 Distributed Processing Symposium (IPDPS).Rhodes Island, 排,可以改善非规则应用中cache的局部性,提高非 2006:566 规则应用的执行速度. ]Lu H,Tan Y P,Xue X Y,et al.Real-ime,adaptive,and locali- (2)最优的基于超图的数据重排算法与最优的 ty-based graph partitioning method for video scene clustering. IEEE Trans Circuits Syst Video Technol,2011,21(11)1747 基于超图的迭代重排算法相结合,其效果不一定是 [10]Fan N,Pardalos P M.Linear and quadratic programming approa- 最优的,即数据重排算法与迭代重排算法不能简单 ches for the general graph partitioning problem.J Global Optim, 地叠加 2010,48(1):57 (3)对于开销较大的重排算法,在一定范围内, [11]Huang R J.Counting series of undirected hypergraphs.J Unin 增加迭代次数,可以有效地分摊算法本身的开销,因 Sci Technol Beijing,1999,21(5):507 (黄汝激.无向超图的计数级数.北京科技大学学报,1999, 此程序性能也会相应地提高. 21(5):507) (4)本文提出的基于超图的重排算法仍具有一 [12]Han H,Tseng C W.Exploiting locality for irregular scientific 定的局限性,如这些算法并不适用于循环迭代间存 codes.IEEE Trans Parallel Distrib Syst,2006,17(7):606 在数据依赖的情况,这也将是我们下一步研究的 [13]Strout MM,Osheim N,Rostron D,et al.Evaluation of hierar- 重点 chical mesh reorderings/Proceedings of the 9th International Conference on Computational Science (ICCS).Baton Rouge, 2009:540 参考文献 [14]Ucara B.Catalyurekb 0 V,Aykanat C.A matrix partitioning in- [1]Kim S,Han H,Choe K M.Region-based parallelization of irregu- terface to PaToH in MATLAB.J Parallel Comput,2010,36 lar reductions on explicitly managed memory hierarchies.Super- (5):254 comput,2011,56(1):25 [15]Norman M,Padoan P,Bordner J,et al.ZEUS2D Codes Online Bauer M,Clark J,Schkufza E,et al.Programming the memory [EB/0L](2007-0807)[2011-12-01].htp:/lca.ucsd. hierarchy revisited:Supporting irregular parallelism in sequoia// edu/portal/codes/zeus2d第 12 期 曹 倩等: 基于超图的非规则应用局部性优化 图 9 所示为各个重排算法的时间开销图. 可以 看出: 原理简单的算法,如基于遍历思想的 H_NRC_ D 算法,时间开销很小; 而原理相对复杂的算法,如 使用了多级图划分思想的 H_PFB_D 算法,时间开 销比较大,H_PFB_D 算法的时间开销约为 H_NRC_ D 算法的 3 倍. 对于开销较大的 H_PFB_D 算法,在 一定范围内,当迭代次数增加时,可以有效地分摊算 法本身的开销,因此程序性能也会相应地提高. 图 9 各种算法的时间开销 Fig. 9 Cost of each algorithm 4 结论 ( 1) 采用超图的思想实现数据重排和迭代重 排,可以改善非规则应用中 cache 的局部性,提高非 规则应用的执行速度. ( 2) 最优的基于超图的数据重排算法与最优的 基于超图的迭代重排算法相结合,其效果不一定是 最优的,即数据重排算法与迭代重排算法不能简单 地叠加. ( 3) 对于开销较大的重排算法,在一定范围内, 增加迭代次数,可以有效地分摊算法本身的开销,因 此程序性能也会相应地提高. ( 4) 本文提出的基于超图的重排算法仍具有一 定的局限性,如这些算法并不适用于循环迭代间存 在数据依赖的情况,这也将是我们下一步研究的 重点. 参 考 文 献 [1] Kim S,Han H,Choe K M. Region-based parallelization of irregu￾lar reductions on explicitly managed memory hierarchies. J Super￾comput,2011,56( 1) : 25 [2] Bauer M,Clark J,Schkufza E,et al. Programming the memory hierarchy revisited: Supporting irregular parallelism in sequoia / / Proceedings of the 16th Principles and Practices of Parallel Pro￾gramming ( PPoPP) . San Antonio,2011: 13 [3] Zhou M,Sahni O,Shephard M S,et al. Adjacency-based data re￾ordering algorithm for acceleration of finite element computations. Sci Program,2010,18( 2) : 107 [4] Wang M. Parallel Compilation and Optimization for Multi-core Pro￾cessors [Dissertation]. Changsha: National University of Defense Technology,2010 ( 王淼. 面向多核处理器的并行编译及优化关键技术研究[学 位论文]. 长沙: 国防科学技术大学,2010) [5] Camata J J,Rossa A L,Valli A M P,et al. Reordering and in￾complete preconditioning in serial and parallel adaptive mesh re￾finement and coarsening flow solutions. Int J Numer Methods Flu￾ids,2012,69( 4) : 802 [6] Liu X H,Yang Y,Zhang J Y,et al. Basic-block reordering algo￾rithm based on structural analysis. J Software,2008,19 ( 7 ) : 1603 ( 刘先华,杨阳,张吉豫,等. 一种基于子结构分析的基本块 重排算法. 软件学报. 2008,19( 7) : 1603) [7] Hsu C H,Chen S C. Efficient selection strategies towards proces￾sor reordering techniques for improving data locality in heterogene￾ous clusters. J Supercomput,2012,60( 3) : 284 [8] Rjeili A A,Karypis G. Multilevel algorithms for partitioning pow￾er-law graphs / / Proceedings of the 20th International Parallel and Distributed Processing Symposium ( IPDPS ) . Rhodes Island, 2006: 566 [9] Lu H,Tan Y P,Xue X Y,et al. Real-time,adaptive,and locali￾ty-based graph partitioning method for video scene clustering. IEEE Trans Circuits Syst Video Technol,2011,21( 11) : 1747 [10] Fan N,Pardalos P M. Linear and quadratic programming approa￾ches for the general graph partitioning problem. J Global Optim, 2010,48( 1) : 57 [11] Huang R J. Counting series of undirected hypergraphs. J Univ Sci Technol Beijing,1999,21( 5) : 507 ( 黄汝激. 无向超图的计数级数. 北京科技大学学报,1999, 21( 5) : 507) [12] Han H,Tseng C W. Exploiting locality for irregular scientific codes. IEEE Trans Parallel Distrib Syst,2006,17( 7) : 606 [13] Strout M M,Osheim N,Rostron D,et al. Evaluation of hierar￾chical mesh reorderings / / Proceedings of the 9th International Conference on Computational Science ( ICCS ) . Baton Rouge, 2009: 540 [14] Uara B,atalyürekb  V,Aykanat C. A matrix partitioning in￾terface to PaToH in MATLAB. J Parallel Comput,2010,36 ( 5) : 254 [15] Norman M,Padoan P,Bordner J,et al. ZEUS-2D Codes Online [EB /OL]( 2007--08--07) [2011--12--01]. http: / /lca. ucsd. edu /portal /codes/zeus2d ·1477·
<<向上翻页
©2008-现在 cucdc.com 高等教育资讯网 版权所有