DAXPY (Y = a × X + Y)
Assuming vectors X, Y are length 64

Scalar vs. vector

Vector code:
        LD    F0, a          ; load scalar a
        LV    V1, Rx         ; load vector X
        MULTS V2, F0, V1     ; vector-scalar multiply
        LV    V3, Ry         ; load vector Y
        ADDV  V4, V2, V3     ; add
        SV    Ry, V4         ; store the result

Scalar code:
        LD    F0, a
        ADDI  R4, Rx, #512   ; last address to load
loop:   LD    F2, 0(Rx)      ; load X(i)
        MULTD F2, F0, F2     ; a*X(i)
        LD    F4, 0(Ry)      ; load Y(i)
        ADDD  F4, F2, F4     ; a*X(i) + Y(i)
        SD    F4, 0(Ry)      ; store into Y(i)
        ADDI  Rx, Rx, #8     ; increment index to X
        ADDI  Ry, Ry, #8     ; increment index to Y
        SUB   R20, R4, Rx    ; compute bound
        BNZ   R20, loop      ; check if done

578 (2 + 9×64) vs. 321 (1 + 5×64) operations (1.8×)
578 (2 + 9×64) vs. 6 instructions (96×)
64-operation vectors + no loop overhead
Also 64× fewer pipeline hazards

1/27/2021, University of Science and Technology of China
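The counts on the slide can be checked with a short sketch. This is only an illustration in Python, not part of the original slide: the function names `daxpy` and `instruction_counts` are hypothetical. It assumes the scalar version executes 2 setup instructions plus 9 loop instructions per element, and that the vector version issues 1 scalar load plus 5 vector instructions, each operating on all n elements.

```python
def daxpy(a, x, y):
    """Return a*x + y element by element (a stand-in for the scalar loop)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return [a * xi + yi for xi, yi in zip(x, y)]


def instruction_counts(n=64):
    """Counts assumed from the slide, for vectors of length n."""
    scalar_instructions = 2 + 9 * n   # LD + ADDI, then 9 instructions per iteration
    vector_instructions = 6           # LD, LV, MULTS, LV, ADDV, SV
    vector_operations = 1 + 5 * n     # 1 scalar load + 5 vector instructions of n elements
    return scalar_instructions, vector_instructions, vector_operations


if __name__ == "__main__":
    scalar, vec_instr, vec_ops = instruction_counts(64)
    print(scalar, vec_instr, vec_ops)                          # 578 6 321
    print(round(scalar / vec_ops, 1), round(scalar / vec_instr))  # 1.8 96
```

The two ratios match the 1.8× (operations) and 96× (instructions) figures quoted above.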