Join the discussion @ p2p.wrox.com

Professional CUDA C Programming

Foreword by Dr. Barbara Chapman, Center for Advanced Computing and Data Systems, University of Houston

John Cheng, Max Grossman, Ty McKercher

NVIDIA

Wrox, A Wiley Brand
Professional CUDA® C Programming

Published by
John Wiley & Sons, Inc.
10475 Crosspoint Boulevard
Indianapolis, IN 46256
www.wiley.com

Copyright © 2014 by John Wiley & Sons, Inc., Indianapolis, Indiana

Published simultaneously in Canada

ISBN: 978-1-118-73932-7
ISBN: 978-1-118-73927-3 (ebk)
ISBN: 978-1-118-73931-0 (ebk)

Manufactured in the United States of America

10 9 8 7 6 5 4 3 2 1

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation warranties of fitness for a particular purpose. No warranty may be created or extended by sales or promotional materials. The advice and strategies contained herein may not be suitable for every situation. This work is sold with the understanding that the publisher is not engaged in rendering legal, accounting, or other professional services. If professional assistance is required, the services of a competent professional person should be sought. Neither the publisher nor the author shall be liable for damages arising herefrom. The fact that an organization or Web site is referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or Web site may provide or recommendations it may make. Further, readers should be aware that Internet Web sites listed in this work may have changed or disappeared between when this work was written and when it is read.

For general information on our other products and services please contact our Customer Care Department within the United States at (877) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.

Library of Congress Control Number: 2014937184

Trademarks: Wiley, Wrox, the Wrox logo, Programmer to Programmer, and related trade dress are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates, in the United States and other countries, and may not be used without written permission. CUDA is a registered trademark of NVIDIA Corporation. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc., is not associated with any product or vendor mentioned in this book.
CREDITS

ACQUISITIONS EDITOR
Mary James

PROJECT EDITOR
Martin V. Minner

TECHNICAL EDITORS
Wei Zhang
Chao Zhao

PRODUCTION MANAGER
Kathleen Wisor

COPY EDITOR
Katherine Burt

MANAGER OF CONTENT DEVELOPMENT AND ASSEMBLY
Mary Beth Wakefield

DIRECTOR OF COMMUNITY MARKETING
David Mayhew

MARKETING MANAGER
Carrie Sherrill

BUSINESS MANAGER
Amy Knies

VICE PRESIDENT AND EXECUTIVE GROUP PUBLISHER
Richard Swadley

ASSOCIATE PUBLISHER
Jim Minatel

PROJECT COORDINATOR, COVER
Patrick Redmond

PROOFREADER
Nancy Carrasco

INDEXER
Johnna VanHoose Dinse

COVER DESIGNER
Wiley

COVER IMAGE
© iStock.com/fatido
ABOUT THE AUTHORS

JOHN (RUNWEI) CHENG is a research scientist with extensive industry experience in high-performance computing on heterogeneous computing platforms. Before joining the oil and gas industry, John worked in the finance industry for more than ten years as an expert in computational intelligence, providing advanced solutions based on genetic algorithms hybridized with data mining and statistical learning to solve real world business challenges. As an internationally recognized researcher in the field of genetic algorithms and their application to industrial engineering, John has co-authored three books. John's first book, Genetic Algorithms and Engineering Design, published by John Wiley and Sons in 1997, is still used as a textbook in universities worldwide. John has a wide range of experience in both academic research and industry development, and is gifted in making complex subjects accessible to readers with a concise, illustrative, and edifying approach. John earned his doctoral degree in computational intelligence from the Tokyo Institute of Technology.

MAX GROSSMAN has been working as a developer with various GPU programming models for nearly a decade. His experience is focused on developing novel GPU programming models and implementing scientific algorithms on GPU hardware. Max has applied GPUs to a wide range of domains, including geoscience, plasma physics, medical imaging, and machine learning, and enjoys understanding the computational patterns of new domains and finding new and unusual ways to apply GPUs to them. Lessons learned from these domains help to guide Max's work in programming models and frameworks. Max earned his degree in computer science from Rice University with a focus on parallel computing.

TY MCKERCHER is a Principal Solution Architect with NVIDIA, leading a team that specializes in visual computing systems architecture across multiple industries. He often serves as a liaison between customer and product engineering teams during emerging technology evaluations. He has been engaged in CUDA-based projects since he participated in the first CUDA kitchen training session held at NVIDIA headquarters in 2006. Since then, Ty has helped architect GPU-based supercomputing environments at some of the largest and most demanding production datacenters in the world. Ty earned his mathematics degree with emphasis in geophysics and computer science from the Colorado School of Mines.
ABOUT THE TECHNICAL EDITORS

WEI ZHANG is a scientific programmer and has been working in the high-performance computing area for 15 years. He has developed or co-developed many scientific software packages for molecular simulation, computer-aided drug design, EM structure reconstruction, and seismic depth imaging. He is now focusing his effort on improving the performance of seismic data processing using new technologies such as CUDA.

CHAO ZHAO joined Chevron in 2008 and currently serves as Geophysical Application Software Development Specialist. In this role, Chao is responsible for designing and developing software products for geoscientists. Prior to joining Chevron, Chao was a software developer for Knowledge Systems Inc. and Seismic Micro Technology Inc. With more than 13 years of software development experience in the exploration and production industry, Chao has gained rich knowledge in the fields of geology and geophysics. Having a broad education in science, Chao likes to see CUDA programming used widely in scientific research and enjoys contributing to it as much as he can. He holds a Bachelor of Science degree in chemistry from Peking University and a Master of Science in computer science from the University of Rhode Island.
ACKNOWLEDGMENTS

IT WOULD BE HARD TO IMAGINE this project making it to the finish line without the suggestions, constructive criticisms, help, and resources of our colleagues and friends.

We would like to express our thanks to NVIDIA for granting access to many GTC conference presentations and CUDA technical documents that add both great value and authority to this book. In particular, we owe much gratitude to Dr. Paulius Micikevicius and Dr. Peng Wang, Developer Technology Engineers at NVIDIA, for their kind advice and help during the writing of this book. Special thanks to Mark Ebersole, NVIDIA Chief CUDA Educator, for his guidance and feedback during the review process.

We would like to thank Mr. Will Ramey, Sr. Product Manager at NVIDIA, and Mr. Nadeem Mohammad, Product Marketing at NVIDIA, for their support and encouragement during the entire project.

We would like to thank Mr. Paul Holzhauer, Director of Oil & Gas at NVIDIA, for his support during the initial phase of this project.

Especially, we owe an enormous debt of gratitude to many presenters and speakers in past GTC conferences for their inspiring and creative work on GPU computing technologies. We have recorded all your credits in our suggested reading lists.

After years of work using GPUs in real production projects, John is very grateful to the people who helped him become a GPU computing enthusiast. Especially, John would like to thank Dr. Nanxun Dai and Dr. Bao Zhao for their encouragement, support, and guidance on seismic imaging projects at BGP. John also would like to thank his colleagues Dr. Zhengzhen Zhou, Dr. Wei Zhang, Mrs. Grace Zhang, and Mr. Kai Yang. They are truly brilliant and very pleasant to work with. John loves the team and feels very privileged to be one of them. John would like to extend a special thanks to Dr. Mitsuo Gen, an internationally well-known professor and the supervisor of John's doctoral program, for giving John the opportunity to teach at universities in Japan and co-author academic books, and especially for fully supporting John during the years when John was running a startup based on evolutionary computation technologies in Tokyo. John is very happy working on this project with Ty and Max as a team and learned a lot from them during the process of book writing.

John owes a debt of gratitude to his wife, Joly, and his son, Rick, for their love, support, and considerable patience during evenings and weekends over the past year while Dad was yet again "doing his own book work."

For over 25 years, Ty has been helping software developers solve HPC grand challenges. Ty is delighted to work at NVIDIA to help clients extend their current knowledge to unlock the potential from massively parallel GPUs. There are so many NVIDIANs to thank, but Ty would like to specifically recognize Dr. Paulius Micikevicius for his gifted insights and strong desire to always improve while doing the heavy lifting for numerous projects. When John asked Ty to help share
CUDA knowledge in a book project, he welcomed the challenge. Dave Jones, a senior director at NVIDIA, approved Ty's participation in this project; sadly, last year Dave lost his courageous battle against cancer. Our hearts go out to Dave and his family; his memory serves to inspire, to press on, and to pursue your passions. The encouragements from Shanker Trivedi and Marc Hamilton have been especially helpful. Yearning to maintain his life/work balance, Ty recruited Max to join this project. It was truly a pleasure to learn from John and Max as they developed the book content that Ty helped review. Finally, Ty's wife, Judy, and his four children deserve recognition for their unconditional support and love; it is a blessing to receive encouragement and motivation while pursuing those things that bring joy to your life.

Max has been fortunate to collaborate with and be guided by a number of brilliant and talented engineers, researchers, and mentors. First, thanks have to go to Professor Vivek Sarkar and the whole Habanero Research Group at Rice University. There, Max got his first taste of HPC and CUDA. The mentorship of Vivek and others in the group was invaluable in enabling him to explore the exciting world of research. Max would also like to thank Mauricio Araya-Polo and Gladys Gonzalez at Repsol. The experience gained under their mentorship was incredibly valuable in writing a book that would be truly useful to real-world work in science and engineering. Finally, Max would like to thank John and Ty for inviting him along on this writing adventure in CUDA and for the lessons this experience has provided in CUDA, writing, and life.

It would not be possible to make a quality professional book without input from technical editors, development editors, and reviewers. We would like to extend our sincere appreciation to Mary E. James, our acquisitions editor; Martin V. Minner, our project editor; Katherine Burt, our copy editor; and Wei Zhang and Chao Zhao, our technical editors. You are an insightful and professional editorial team and this book would not be what it is without you. It was a great pleasure to work with you on this project.
CONTENTS

FOREWORD xvii
PREFACE xix
INTRODUCTION xxi

CHAPTER 1: HETEROGENEOUS PARALLEL COMPUTING WITH CUDA 1
Parallel Computing 2
Sequential and Parallel Programming 3
Parallelism 4
Computer Architecture 6
Heterogeneous Computing 8
Heterogeneous Architecture 9
Paradigm of Heterogeneous Computing 12
CUDA: A Platform for Heterogeneous Computing 14
Hello World from GPU 17
Is CUDA C Programming Difficult? 20
Summary 21

CHAPTER 2: CUDA PROGRAMMING MODEL 23
Introducing the CUDA Programming Model 23
CUDA Programming Structure 25
Managing Memory 26
Organizing Threads 30
Launching a CUDA Kernel 36
Writing Your Kernel 37
Verifying Your Kernel 39
Handling Errors 40
Compiling and Executing 40
Timing Your Kernel 43
Timing with CPU Timer 44
Timing with nvprof 47
Organizing Parallel Threads 49
Indexing Matrices with Blocks and Threads 49
Summing Matrices with a 2D Grid and 2D Blocks 53
Summing Matrices with a 1D Grid and 1D Blocks 57
Summing Matrices with a 2D Grid and 1D Blocks 58
Managing Devices 60
Using the Runtime API to Query GPU Information 61
Determining the Best GPU 63
Using nvidia-smi to Query GPU Information 63
Setting Devices at Runtime 64
Summary 65

CHAPTER 3: CUDA EXECUTION MODEL 67
Introducing the CUDA Execution Model 67
GPU Architecture Overview 68
The Fermi Architecture 71
The Kepler Architecture 73
Profile-Driven Optimization 78
Understanding the Nature of Warp Execution 80
Warps and Thread Blocks 80
Warp Divergence 82
Resource Partitioning 87
Latency Hiding 90
Occupancy 93
Synchronization 97
Scalability 98
Exposing Parallelism 98
Checking Active Warps with nvprof 100
Checking Memory Operations with nvprof 100
Exposing More Parallelism 101
Avoiding Branch Divergence 104
The Parallel Reduction Problem 104
Divergence in Parallel Reduction 106
Improving Divergence in Parallel Reduction 110
Reducing with Interleaved Pairs 112
Unrolling Loops 114
Reducing with Unrolling 115
Reducing with Unrolled Warps 117
Reducing with Complete Unrolling 119
Reducing with Template Functions 120
Dynamic Parallelism 122
Nested Execution 123
Nested Hello World on the GPU 124
Nested Reduction 128
Summary 132
CHAPTER 4: GLOBAL MEMORY 135
Introducing the CUDA Memory Model 136
Benefits of a Memory Hierarchy 136
CUDA Memory Model 137
Memory Management 145
Memory Allocation and Deallocation 146
Memory Transfer 146
Pinned Memory 148
Zero-Copy Memory 150
Unified Virtual Addressing 156
Unified Memory 157
Memory Access Patterns 158
Aligned and Coalesced Access 158
Global Memory Reads 160
Global Memory Writes 169
Array of Structures versus Structure of Arrays 171
Performance Tuning 176
What Bandwidth Can a Kernel Achieve? 179
Memory Bandwidth 179
Matrix Transpose Problem 180
Matrix Addition with Unified Memory 195
Summary 199

CHAPTER 5: SHARED MEMORY AND CONSTANT MEMORY 203
Introducing CUDA Shared Memory 204
Shared Memory 204
Shared Memory Allocation 206
Shared Memory Banks and Access Mode 206
Configuring the Amount of Shared Memory 212
Synchronization 214
Checking the Data Layout of Shared Memory 216
Square Shared Memory 217
Rectangular Shared Memory 225
Reducing Global Memory Access 232
Parallel Reduction with Shared Memory 232
Parallel Reduction with Unrolling 236
Parallel Reduction with Dynamic Shared Memory 238
Effective Bandwidth 239