Memory Coalescing

Source: Shanghai Jiao Tong University, "Multicore Architecture and Parallel Computing" course teaching resources (PPT lecture slides), Lecture 8: CUDA, cont’d


// TILE_WIDTH is a compile-time constant defined elsewhere in the lecture.
// The kernel assumes Width is a multiple of TILE_WIDTH (there are no bounds checks).
__global__ void MatrixMulKernel(float* Md, float* Nd, float* Pd, int Width)
{
    __shared__ float Mds[TILE_WIDTH][TILE_WIDTH];
    __shared__ float Nds[TILE_WIDTH][TILE_WIDTH];

    int bx = blockIdx.x;   int by = blockIdx.y;
    int tx = threadIdx.x;  int ty = threadIdx.y;

    // Identify the row and column of the Pd element to work on
    int Row = by * TILE_WIDTH + ty;
    int Col = bx * TILE_WIDTH + tx;

    float Pvalue = 0;
    // Loop over the Md and Nd tiles required to compute the Pd element
    for (int m = 0; m < Width / TILE_WIDTH; ++m) {
        // Collaborative loading of Md and Nd tiles into shared memory.
        // In both loads tx is the fastest-varying term, so threads with
        // adjacent tx read adjacent global addresses (coalesced access).
        Mds[ty][tx] = Md[Row * Width + (m * TILE_WIDTH + tx)];
        Nds[ty][tx] = Nd[Col + (m * TILE_WIDTH + ty) * Width];
        __syncthreads();

        for (int k = 0; k < TILE_WIDTH; ++k)
            Pvalue += Mds[ty][k] * Nds[k][tx];
        __syncthreads();
    }
    Pd[Row * Width + Col] = Pvalue;
}
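The two tile loads in the kernel index global memory so that `tx` appears with a stride of one element. The following host-side C++ sketch is not from the slide; it only reproduces the kernel's index arithmetic on the CPU to make that pattern checkable. The helper names `mdIndex`, `ndIndex`, and `rowOfThreadsIsContiguous`, and the value `TILE_WIDTH = 16`, are assumptions chosen for illustration.

```cpp
#include <cassert>

// Hypothetical tile size; the slide does not state the value it uses.
constexpr int TILE_WIDTH = 16;

// Element of Md read by thread (tx, ty) in block row `by` at tile step m,
// copied from the kernel: Md[Row*Width + (m*TILE_WIDTH + tx)].
int mdIndex(int by, int ty, int tx, int m, int Width) {
    int Row = by * TILE_WIDTH + ty;
    return Row * Width + (m * TILE_WIDTH + tx);
}

// Element of Nd read by thread (tx, ty) in block column `bx` at tile step m,
// copied from the kernel: Nd[Col + (m*TILE_WIDTH + ty)*Width].
int ndIndex(int bx, int ty, int tx, int m, int Width) {
    int Col = bx * TILE_WIDTH + tx;
    return Col + (m * TILE_WIDTH + ty) * Width;
}

// True when, for a fixed ty, every pair of tx-adjacent threads reads
// adjacent elements in both the Md load and the Nd load.
bool rowOfThreadsIsContiguous(int bx, int by, int ty, int m, int Width) {
    for (int tx = 0; tx + 1 < TILE_WIDTH; ++tx) {
        if (mdIndex(by, ty, tx + 1, m, Width) - mdIndex(by, ty, tx, m, Width) != 1)
            return false;
        if (ndIndex(bx, ty, tx + 1, m, Width) - ndIndex(bx, ty, tx, m, Width) != 1)
            return false;
    }
    return true;
}
```

Calling `rowOfThreadsIsContiguous(bx, by, ty, m, Width)` for any block and tile step should return true, whereas stepping `ty` by one moves the `Md` address by `Width` elements. This is why the kernel puts `tx`, not `ty`, in the unit-stride position of both loads.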
