正在加载图片...
Parallel Prefix Sum(Scan)with CUDA and returns the array [,m,(⊕a,(a⊕4⊕..⊕an2l Example:If is addition,then the exclusive scan operation on the array [31704163, returns [0341111141622 An exclusive scan can be generated from an inclusive scan by shifting the resulting array right by one element and inserting the identity.Likewise,an inclusive scan can be generated from an exclusive scan by shifting the resulting array left,and inserting at the end the sum of the last element of the scan and the last element of the input array [1].For the remainder of this paper we will focus on the implementation of exclusive scan and refer to it simply as scan unless otherwise specified. Sequential Scan Implementing a sequential version of scan(that could be run in a single thread on a CPU,for example)is trivial.We simply loop over all the elements in the input array and add the value of the previous element of the input array to the sum computed for the previous element of the output array,and write the sum to the current element of the output array. void scan(float*output,float*input,int length) output[0]=0;//since this is a prescan,not a scan for(int j=1;j<length;++j) output[j]input[j-1]output[j-1]; This code performs exactly nadds for an array of length m,this is the minimum number of adds required to produce the scanned array.When we develop our parallel version of scan, we would like it to be work-efficient.This means do no more addition operations (or work) than the sequential version.In other words the two implementations should have the same work complexcity,O(n). A Naive Parallel Scan for d:=I to logan do d:=0 to logn-1 forall kin parallel do k=0n-1 if kz 2d then=x+ 2^d Algorithm 1:A sum scan algorithm that is not work-efficient. April 2007Parallel Prefix Sum (Scan) with CUDA April 2007 4 and returns the array [I, a0, (a0 ⊕ a1), …, (a0 ⊕ a1 ⊕ … ⊕ an-2)]. Example: If ⊕ is addition, then the exclusive scan operation on the array [3 1 7 0 4 1 6 3], returns [0 3 4 11 11 14 16 22]. An exclusive scan can be generated from an inclusive scan by shifting the resulting array right by one element and inserting the identity. Likewise, an inclusive scan can be generated from an exclusive scan by shifting the resulting array left, and inserting at the end the sum of the last element of the scan and the last element of the input array [1]. For the remainder of this paper we will focus on the implementation of exclusive scan and refer to it simply as scan unless otherwise specified. Sequential Scan Implementing a sequential version of scan (that could be run in a single thread on a CPU, for example) is trivial. We simply loop over all the elements in the input array and add the value of the previous element of the input array to the sum computed for the previous element of the output array, and write the sum to the current element of the output array. void scan( float* output, float* input, int length) { output[0] = 0; // since this is a prescan, not a scan for(int j = 1; j < length; ++j) { output[j] = input[j-1] + output[j-1]; } } This code performs exactly n adds for an array of length n; this is the minimum number of adds required to produce the scanned array. When we develop our parallel version of scan, we would like it to be work-efficient. This means do no more addition operations (or work) than the sequential version. In other words the two implementations should have the same work complexity, O(n). A Naïve Parallel Scan Algorithm 1: A sum scan algorithm that is not work-efficient. for d := 1 to log2n do forall k in parallel do if k ≥ 2d then x[k] := x[k − 2d-1] + x[k] k=0,...,n-1 d:=0 to logn-1 2^d
<<向上翻页向下翻页>>
©2008-现在 cucdc.com 高等教育资讯网 版权所有