Introduction

These abstractions provide fine-grained data parallelism and thread parallelism, nested within coarse-grained data parallelism and task parallelism. They guide the programmer to partition the problem into coarse sub-problems that can be solved independently in parallel by blocks of threads, and each sub-problem into finer pieces that can be solved cooperatively in parallel by all threads within the block.

This decomposition preserves language expressivity by allowing threads to cooperate when solving each sub-problem, and at the same time enables automatic scalability. Indeed, each block of threads can be scheduled on any of the available multiprocessors within a GPU, in any order, concurrently or sequentially, so that a compiled CUDA program can execute on any number of multiprocessors as illustrated by Figure 5, and only the runtime system needs to know the physical multiprocessor count.

This scalable programming model allows the GPU architecture to span a wide market range by simply scaling the number of multiprocessors and memory partitions: from the high-performance enthusiast GeForce GPUs and professional Quadro and Tesla computing products to a variety of inexpensive, mainstream GeForce GPUs (see CUDA-Enabled GPUs for a list of all CUDA-enabled GPUs).
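The following kernel is a minimal sketch (not taken from the guide itself) of this decomposition: the input array is partitioned into chunks, each chunk is summed independently by one thread block (coarse-grained parallelism), and the threads within a block cooperate through shared memory and __syncthreads() to reduce their chunk (fine-grained parallelism). Because synchronization happens only inside a block, the blocks can run on any multiprocessor, in any order. The names blockSum and BLOCK_SIZE are illustrative choices, not identifiers from the guide.

    // Sketch: per-block partial sums. Each block solves one sub-problem
    // independently; threads within a block cooperate via shared memory.
    #include <cstdio>
    #include <cuda_runtime.h>

    #define BLOCK_SIZE 256   // threads per block (power of two for the tree reduction)

    __global__ void blockSum(const float* in, float* blockSums, int n)
    {
        __shared__ float partial[BLOCK_SIZE];      // per-block scratch space
        int tid = threadIdx.x;
        int i   = blockIdx.x * blockDim.x + tid;

        partial[tid] = (i < n) ? in[i] : 0.0f;     // each thread loads one element
        __syncthreads();                           // cooperation: wait for all loads

        // Tree reduction within the block. Only intra-block synchronization
        // is used, so blocks need no ordering guarantees among themselves.
        for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
            if (tid < stride)
                partial[tid] += partial[tid + stride];
            __syncthreads();
        }
        if (tid == 0)
            blockSums[blockIdx.x] = partial[0];    // one result per sub-problem
    }

    int main()
    {
        const int n = 1 << 20;
        const int numBlocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;

        float *in, *blockSums;
        cudaMallocManaged(&in, n * sizeof(float));
        cudaMallocManaged(&blockSums, numBlocks * sizeof(float));
        for (int i = 0; i < n; ++i) in[i] = 1.0f;

        blockSum<<<numBlocks, BLOCK_SIZE>>>(in, blockSums, n);
        cudaDeviceSynchronize();

        float total = 0.0f;                        // combine the per-block results
        for (int b = 0; b < numBlocks; ++b) total += blockSums[b];
        printf("sum = %f (expected %d)\n", total, n);

        cudaFree(in);
        cudaFree(blockSums);
        return 0;
    }

Note that the kernel launches with however many blocks the problem size demands; how many of them run concurrently is decided by the runtime according to the number of multiprocessors actually present, which is exactly the scalability property described above.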