正在加载图片...
CONTENTS Managing Devices 60 Using the Runtime API to Query GPU Information 61 Determining the Best GPU 63 Using nvidia-smi to Query GPU Information 63 Setting Devices at Runtime 64 Summary 65 CHAPTER 3:CUDA EXECUTION MODEL 67 Introducing the CUDA Execution Model 67 GPU Architecture Overview 68 The Fermi Architecture 7 The Kepler Architecture 73 Profile-Driven Optimization 78 Understanding the Nature of Warp Execution 80 Warps and Thread Blocks 80 Warp Divergence 82 Resource Partitioning 87 Latency Hiding 90 Occupancy 93 Synchronization 97 Scalability 98 Exposing Parallelism 98 Checking Active Warps with nvprof 100 Checking Memory Operations with nvprof 100 Exposing More Parallelism 101 Avoiding Branch Divergence 104 The Parallel Reduction Problem 104 Divergence in Parallel Reduction 106 Improving Divergence in Parallel Reduction 110 Reducing with Interleaved Pairs 112 Unrolling Loops 114 Reducing with Unrolling 115 Reducing with Unrolled Warps 117 Reducing with Complete Unrolling 119 Reducing with Template Functions 120 Dynamic Parallelism 122 Nested Execution 123 Nested Hello World on the GPU 124 Nested Reduction 128 Summary 132 X www.it-ebooks.infox CONTENTS ftoc.indd 08/07/2014 Page x Managing Devices 60 Using the Runtime API to Query GPU Information 61 Determining the Best GPU 63 Using nvidia-smi to Query GPU Information 63 Setting Devices at Runtime 64 Summary 65 CHAPTER 3: CUDA EXECUTION MODEL 67 Introducing the CUDA Execution Model 67 GPU Architecture Overview 68 The Fermi Architecture 71 The Kepler Architecture 73 Profi le-Driven Optimization 78 Understanding the Nature of Warp Execution 80 Warps and Thread Blocks 80 Warp Divergence 82 Resource Partitioning 87 Latency Hiding 90 Occupancy 93 Synchronization 97 Scalability 98 Exposing Parallelism 98 Checking Active Warps with nvprof 100 Checking Memory Operations with nvprof 100 Exposing More Parallelism 101 Avoiding Branch Divergence 104 The Parallel Reduction Problem 104 Divergence in Parallel Reduction 106 Improving Divergence in Parallel Reduction 110 Reducing with Interleaved Pairs 112 Unrolling Loops 114 Reducing with Unrolling 115 Reducing with Unrolled Warps 117 Reducing with Complete Unrolling 119 Reducing with Template Functions 120 Dynamic Parallelism 122 Nested Execution 123 Nested Hello World on the GPU 124 Nested Reduction 128 Summary 132 www.it-ebooks.info
<<向上翻页向下翻页>>
©2008-现在 cucdc.com 高等教育资讯网 版权所有