
Reference text for the course "Parallel and Distributed Programming" (CUDA): Programming Massively Parallel Processors (A Hands-on Approach)

Document format: PDF · 277 pages · 4.74 MB

David B. Kirk and Wen-mei W. Hwu
Programming Massively Parallel Processors: A Hands-on Approach
NVIDIA · Morgan Kaufmann


In Praise of Programming Massively Parallel Processors: A Hands-on Approach

Parallel programming is about performance, for otherwise you'd write a sequential program. For those interested in learning or teaching the topic, a problem is where to find truly parallel hardware that can be dedicated to the task, for it is difficult to see interesting speedups if it's shared or only modestly parallel. One answer is graphical processing units (GPUs), which can have hundreds of cores and are found in millions of desktop and laptop computers. For those interested in the GPU path to parallel enlightenment, this new book from David Kirk and Wen-mei Hwu is a godsend, as it introduces CUDA, a C-like data parallel language, and Tesla, the architecture of the current generation of NVIDIA GPUs. In addition to explaining the language and the architecture, they define the nature of data parallel problems that run well on heterogeneous CPU-GPU hardware. More concretely, two detailed case studies demonstrate speedups over CPU-only C programs of 10X to 15X for naïve CUDA code and 45X to 105X for expertly tuned versions. They conclude with a glimpse of the future by describing the next generation of data parallel languages and architectures: OpenCL and the NVIDIA Fermi GPU. This book is a valuable addition to the recently reinvigorated parallel computing literature.

David Patterson
Director, The Parallel Computing Research Laboratory
Pardee Professor of Computer Science, U.C. Berkeley
Co-author of Computer Architecture: A Quantitative Approach

Written by two teaching pioneers, this book is the definitive practical reference on programming massively parallel processors: a true technological gold mine. The hands-on learning included is cutting-edge, yet very readable. This is a most rewarding read for students, engineers, and scientists interested in supercharging computational resources to solve today's and tomorrow's hardest problems.

Nicolas Pinto
MIT, NVIDIA Fellow 2009

I have always admired Wen-mei Hwu's and David Kirk's ability to turn complex problems into easy-to-comprehend concepts. They have done it again in this book.


This joint venture of a passionate teacher and a GPU evangelizer tackles the trade-off between the simple explanation of the concepts and the in-depth analysis of the programming techniques. This is a great book to learn both massively parallel programming and CUDA.

Mateo Valero
Director, Barcelona Supercomputing Center

The use of GPUs is having a big impact in scientific computing. David Kirk and Wen-mei Hwu's new book is an important contribution towards educating our students on the ideas and techniques of programming for massively parallel processors.

Mike Giles
Professor of Scientific Computing, University of Oxford

This book is the most comprehensive and authoritative introduction to GPU computing yet. David Kirk and Wen-mei Hwu are the pioneers in this increasingly important field, and their insights are invaluable and fascinating. This book will be the standard reference for years to come.

Hanspeter Pfister
Harvard University

This is a vital and much-needed text. GPU programming is growing by leaps and bounds. This new book will be very welcome and highly useful across interdisciplinary fields.

Shannon Steinfadt
Kent State University
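The endorsements above describe CUDA as a "C-like data parallel language" in which a kernel function runs across thousands of GPU threads while the CPU manages memory and launches work. As a concrete illustration of that style (a minimal sketch written for this page, not an excerpt from the book; the name vecAdd and the sizes chosen are arbitrary), the following complete vector-addition program follows the explicit allocate/copy/launch/copy pattern:

#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each CUDA thread computes one element of c = a + b.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];                  // guard the tail block
}

int main() {
    const int n = 1 << 20;                  // 1M elements
    const size_t bytes = n * sizeof(float);

    // Host buffers
    float *h_a = new float[n], *h_b = new float[n], *h_c = new float[n];
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Device buffers and host-to-device transfers
    float *d_a, *d_b, *d_c;
    cudaMalloc((void**)&d_a, bytes);
    cudaMalloc((void**)&d_b, bytes);
    cudaMalloc((void**)&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);

    // Copy the result back (cudaMemcpy synchronizes with the kernel)
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]);          // expect 3.000000

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    delete[] h_a; delete[] h_b; delete[] h_c;
    return 0;
}

This allocate/copy/launch/copy structure is the one the book develops in Chapter 3 (Device Memories and Data Transfer; Kernel Functions and Threading), and the case studies in Chapters 8 and 9 refine it for real workloads.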


Programming Massively Parallel Processors: A Hands-on Approach
David B. Kirk and Wen-mei W. Hwu

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

Morgan Kaufmann Publishers is an imprint of Elsevier


Morgan Kaufmann Publishers is an imprint of Elsevier.
30 Corporate Drive, Suite 400, Burlington, MA 01803, USA

This book is printed on acid-free paper.

© 2010 David B. Kirk/NVIDIA Corporation and Wen-mei Hwu. Published by Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher's permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

NVIDIA, the NVIDIA logo, CUDA, GeForce, Quadro, and Tesla are trademarks or registered trademarks of NVIDIA Corporation in the U.S. and other countries. OpenCL is a trademark of Apple Inc.

Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data
Application Submitted

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

ISBN: 978-0-12-381472-2

For information on all Morgan Kaufmann publications, visit our Web site at www.mkp.com or www.elsevierdirect.com

Printed in United States of America
10 11 12 13 14    5 4 3 2 1


Contents

Preface ... xi
Acknowledgments ... xvii
Dedication ... xix

CHAPTER 1 INTRODUCTION ... 1
1.1 GPUs as Parallel Computers ... 2
1.2 Architecture of a Modern GPU ... 8
1.3 Why More Speed or Parallelism? ... 10
1.4 Parallel Programming Languages and Models ... 13
1.5 Overarching Goals ... 15
1.6 Organization of the Book ... 16

CHAPTER 2 HISTORY OF GPU COMPUTING ... 21
2.1 Evolution of Graphics Pipelines ... 21
2.1.1 The Era of Fixed-Function Graphics Pipelines ... 22
2.1.2 Evolution of Programmable Real-Time Graphics ... 26
2.1.3 Unified Graphics and Computing Processors ... 29
2.1.4 GPGPU: An Intermediate Step ... 31
2.2 GPU Computing ... 32
2.2.1 Scalable GPUs ... 33
2.2.2 Recent Developments ... 34
2.3 Future Trends ... 34

CHAPTER 3 INTRODUCTION TO CUDA ... 39
3.1 Data Parallelism ... 39
3.2 CUDA Program Structure ... 41
3.3 A Matrix–Matrix Multiplication Example ... 42
3.4 Device Memories and Data Transfer ... 46
3.5 Kernel Functions and Threading ... 51
3.6 Summary ... 56
3.6.1 Function declarations ... 56
3.6.2 Kernel launch ... 56
3.6.3 Predefined variables ... 56
3.6.4 Runtime API ... 57

CHAPTER 4 CUDA THREADS ... 59
4.1 CUDA Thread Organization ... 59
4.2 Using blockIdx and threadIdx ... 64
4.3 Synchronization and Transparent Scalability ... 68


4.4 Thread Assignment ... 70
4.5 Thread Scheduling and Latency Tolerance ... 71
4.6 Summary ... 74
4.7 Exercises ... 74

CHAPTER 5 CUDA MEMORIES ... 77
5.1 Importance of Memory Access Efficiency ... 78
5.2 CUDA Device Memory Types ... 79
5.3 A Strategy for Reducing Global Memory Traffic ... 83
5.4 Memory as a Limiting Factor to Parallelism ... 90
5.5 Summary ... 92
5.6 Exercises ... 93

CHAPTER 6 PERFORMANCE CONSIDERATIONS ... 95
6.1 More on Thread Execution ... 96
6.2 Global Memory Bandwidth ... 103
6.3 Dynamic Partitioning of SM Resources ... 111
6.4 Data Prefetching ... 113
6.5 Instruction Mix ... 115
6.6 Thread Granularity ... 116
6.7 Measured Performance and Summary ... 118
6.8 Exercises ... 120

CHAPTER 7 FLOATING POINT CONSIDERATIONS ... 125
7.1 Floating-Point Format ... 126
7.1.1 Normalized Representation of M ... 126
7.1.2 Excess Encoding of E ... 127
7.2 Representable Numbers ... 129
7.3 Special Bit Patterns and Precision ... 134
7.4 Arithmetic Accuracy and Rounding ... 135
7.5 Algorithm Considerations ... 136
7.6 Summary ... 138
7.7 Exercises ... 138

CHAPTER 8 APPLICATION CASE STUDY: ADVANCED MRI RECONSTRUCTION ... 141
8.1 Application Background ... 142
8.2 Iterative Reconstruction ... 144
8.3 Computing FHd ... 148
Step 1. Determine the Kernel Parallelism Structure ... 149
Step 2. Getting Around the Memory Bandwidth Limitation ... 156


Step 3. Using Hardware Trigonometry Functions ... 163
Step 4. Experimental Performance Tuning ... 166
8.4 Final Evaluation ... 167
8.5 Exercises ... 170

CHAPTER 9 APPLICATION CASE STUDY: MOLECULAR VISUALIZATION AND ANALYSIS ... 173
9.1 Application Background ... 174
9.2 A Simple Kernel Implementation ... 176
9.3 Instruction Execution Efficiency ... 180
9.4 Memory Coalescing ... 182
9.5 Additional Performance Comparisons ... 185
9.6 Using Multiple GPUs ... 187
9.7 Exercises ... 188

CHAPTER 10 PARALLEL PROGRAMMING AND COMPUTATIONAL THINKING ... 191
10.1 Goals of Parallel Programming ... 192
10.2 Problem Decomposition ... 193
10.3 Algorithm Selection ... 196
10.4 Computational Thinking ... 202
10.5 Exercises ... 204

CHAPTER 11 A BRIEF INTRODUCTION TO OPENCL ... 205
11.1 Background ... 205
11.2 Data Parallelism Model ... 207
11.3 Device Architecture ... 209
11.4 Kernel Functions ... 211
11.5 Device Management and Kernel Launch ... 212
11.6 Electrostatic Potential Map in OpenCL ... 214
11.7 Summary ... 219
11.8 Exercises ... 220

CHAPTER 12 CONCLUSION AND FUTURE OUTLOOK ... 221
12.1 Goals Revisited ... 221
12.2 Memory Architecture Evolution ... 223
12.2.1 Large Virtual and Physical Address Spaces ... 223
12.2.2 Unified Device Memory Space ... 224
12.2.3 Configurable Caching and Scratch Pad ... 225
12.2.4 Enhanced Atomic Operations ... 226
12.2.5 Enhanced Global Memory Access ... 226


12.3 Kernel Execution Control Evolution ... 227
12.3.1 Function Calls within Kernel Functions ... 227
12.3.2 Exception Handling in Kernel Functions ... 227
12.3.3 Simultaneous Execution of Multiple Kernels ... 228
12.3.4 Interruptible Kernels ... 228
12.4 Core Performance ... 229
12.4.1 Double-Precision Speed ... 229
12.4.2 Better Control Flow Efficiency ... 229
12.5 Programming Environment ... 230
12.6 A Bright Outlook ... 230

APPENDIX A MATRIX MULTIPLICATION HOST-ONLY VERSION SOURCE CODE ... 233
A.1 matrixmul.cu ... 233
A.2 matrixmul_gold.cpp ... 237
A.3 matrixmul.h ... 238
A.4 assist.h ... 239
A.5 Expected Output ... 243

APPENDIX B GPU COMPUTE CAPABILITIES ... 245
B.1 GPU Compute Capability Tables ... 245
B.2 Memory Coalescing Variations ... 246

Index ... 251


Preface

WHY WE WROTE THIS BOOK

Mass-market computing systems that combine multicore CPUs and many-core GPUs have brought terascale computing to the laptop and petascale computing to clusters. Armed with such computing power, we are at the dawn of pervasive use of computational experiments for science, engineering, health, and business disciplines. Many will be able to achieve breakthroughs in their disciplines using computational experiments of an unprecedented level of scale, controllability, and observability. This book provides a critical ingredient for the vision: teaching parallel programming to millions of graduate and undergraduate students so that computational thinking and parallel programming skills will be as pervasive as calculus.

We started with a course now known as ECE498AL. During the Christmas holiday of 2006, we were frantically working on the lecture slides and lab assignments. David was working the system, trying to pull the early GeForce 8800 GTX GPU cards from customer shipments to Illinois, which would not succeed until a few weeks after the semester began. It also became clear that CUDA would not become public until a few weeks after the start of the semester. We had to work out the legal agreements so that we could offer the course to students under NDA for the first few weeks. We also needed to get the word out so that students would sign up, since the course was not announced until after the preenrollment period.

We gave our first lecture on January 16, 2007. Everything fell into place. David commuted weekly to Urbana for the class. We had 52 students, a couple more than our capacity. We had draft slides for most of the first 10 lectures. Wen-mei's graduate student, John Stratton, graciously volunteered as the teaching assistant and set up the lab. All students signed NDAs so that we could proceed with the first several lectures until CUDA became public. We recorded the lectures but did not release them on the Web until February. We had graduate students from physics, astronomy, chemistry, electrical engineering, and mechanical engineering, as well as computer science and computer engineering. The enthusiasm in the room made it all worthwhile.

Since then, we have taught the course three times in one-semester format and twice in one-week intensive format. The ECE498AL course has become a permanent course known as ECE408 at the University of Illinois, Urbana-Champaign. We started to write up some early chapters of this book when we offered ECE498AL the second time. We tested these
