An Introduction to PARALLEL PROGRAMMING
Peter Pacheco
Morgan Kaufmann
Contents

Preface .... xv
Acknowledgments .... xviii
About the Author .... xix

CHAPTER 1  Why Parallel Computing? .... 1
  1.1  Why We Need Ever-Increasing Performance .... 2
  1.2  Why We’re Building Parallel Systems .... 3
  1.3  Why We Need to Write Parallel Programs .... 3
  1.4  How Do We Write Parallel Programs? .... 6
  1.5  What We’ll Be Doing .... 8
  1.6  Concurrent, Parallel, Distributed .... 9
  1.7  The Rest of the Book .... 10
  1.8  A Word of Warning .... 10
  1.9  Typographical Conventions .... 11
  1.10 Summary .... 12
  1.11 Exercises .... 12

CHAPTER 2  Parallel Hardware and Parallel Software .... 15
  2.1  Some Background .... 15
    2.1.1  The von Neumann architecture .... 15
    2.1.2  Processes, multitasking, and threads .... 17
  2.2  Modifications to the von Neumann Model .... 18
    2.2.1  The basics of caching .... 19
    2.2.2  Cache mappings .... 20
    2.2.3  Caches and programs: an example .... 22
    2.2.4  Virtual memory .... 23
    2.2.5  Instruction-level parallelism .... 25
    2.2.6  Hardware multithreading .... 28
  2.3  Parallel Hardware .... 29
    2.3.1  SIMD systems .... 29
    2.3.2  MIMD systems .... 32
    2.3.3  Interconnection networks .... 35
    2.3.4  Cache coherence .... 43
    2.3.5  Shared-memory versus distributed-memory .... 46
  2.4  Parallel Software .... 47
    2.4.1  Caveats .... 47
    2.4.2  Coordinating the processes/threads .... 48
    2.4.3  Shared-memory .... 49
    2.4.4  Distributed-memory .... 53
    2.4.5  Programming hybrid systems .... 56
  2.5  Input and Output .... 56
  2.6  Performance .... 58
    2.6.1  Speedup and efficiency .... 58
    2.6.2  Amdahl’s law .... 61
    2.6.3  Scalability .... 62
    2.6.4  Taking timings .... 63
  2.7  Parallel Program Design .... 65
    2.7.1  An example .... 66
  2.8  Writing and Running Parallel Programs .... 70
  2.9  Assumptions .... 70
  2.10 Summary .... 71
    2.10.1  Serial systems .... 71
    2.10.2  Parallel hardware .... 73
    2.10.3  Parallel software .... 74
    2.10.4  Input and output .... 75
    2.10.5  Performance .... 75
    2.10.6  Parallel program design .... 76
    2.10.7  Assumptions .... 76
  2.11 Exercises .... 77

CHAPTER 3  Distributed-Memory Programming with MPI .... 83
  3.1  Getting Started .... 84
    3.1.1  Compilation and execution .... 84
    3.1.2  MPI programs .... 86
    3.1.3  MPI_Init and MPI_Finalize .... 86
    3.1.4  Communicators, MPI_Comm_size and MPI_Comm_rank .... 87
    3.1.5  SPMD programs .... 88
    3.1.6  Communication .... 88
    3.1.7  MPI_Send .... 88
    3.1.8  MPI_Recv .... 90
    3.1.9  Message matching .... 91
    3.1.10  The status_p argument .... 92
    3.1.11  Semantics of MPI_Send and MPI_Recv .... 93
    3.1.12  Some potential pitfalls .... 94
  3.2  The Trapezoidal Rule in MPI .... 94
    3.2.1  The trapezoidal rule .... 94
    3.2.2  Parallelizing the trapezoidal rule .... 96
  3.3  Dealing with I/O .... 97
    3.3.1  Output .... 97
    3.3.2  Input .... 100
  3.4  Collective Communication .... 101
    3.4.1  Tree-structured communication .... 102
    3.4.2  MPI_Reduce .... 103
    3.4.3  Collective vs. point-to-point communications .... 105
    3.4.4  MPI_Allreduce .... 106
    3.4.5  Broadcast .... 106
    3.4.6  Data distributions .... 109
    3.4.7  Scatter .... 110
    3.4.8  Gather .... 112
    3.4.9  Allgather .... 113
  3.5  MPI Derived Datatypes .... 116
  3.6  Performance Evaluation of MPI Programs .... 119
    3.6.1  Taking timings .... 119
    3.6.2  Results .... 122
    3.6.3  Speedup and efficiency .... 125
    3.6.4  Scalability .... 126
  3.7  A Parallel Sorting Algorithm .... 127
    3.7.1  Some simple serial sorting algorithms .... 127
    3.7.2  Parallel odd-even transposition sort .... 129
    3.7.3  Safety in MPI programs .... 132
    3.7.4  Final details of parallel odd-even sort .... 134
  3.8  Summary .... 136
  3.9  Exercises .... 140
  3.10 Programming Assignments .... 147

CHAPTER 4  Shared-Memory Programming with Pthreads .... 151
  4.1  Processes, Threads, and Pthreads .... 151
  4.2  Hello, World .... 153
    4.2.1  Execution .... 153
    4.2.2  Preliminaries .... 155
    4.2.3  Starting the threads .... 156
    4.2.4  Running the threads .... 157
    4.2.5  Stopping the threads .... 158
    4.2.6  Error checking .... 158
    4.2.7  Other approaches to thread startup .... 159
  4.3  Matrix-Vector Multiplication .... 159
  4.4  Critical Sections .... 162
  4.5  Busy-Waiting .... 165
  4.6  Mutexes .... 168
  4.7  Producer-Consumer Synchronization and Semaphores .... 171
  4.8  Barriers and Condition Variables .... 176
    4.8.1  Busy-waiting and a mutex .... 177
    4.8.2  Semaphores .... 177
    4.8.3  Condition variables .... 179
    4.8.4  Pthreads barriers .... 181
  4.9  Read-Write Locks .... 181
    4.9.1  Linked list functions .... 181
    4.9.2  A multi-threaded linked list .... 183
    4.9.3  Pthreads read-write locks .... 187
    4.9.4  Performance of the various implementations .... 188
    4.9.5  Implementing read-write locks .... 190
  4.10 Caches, Cache Coherence, and False Sharing .... 190
  4.11 Thread-Safety .... 195
    4.11.1  Incorrect programs can produce correct output .... 198
  4.12 Summary .... 198
  4.13 Exercises .... 200
  4.14 Programming Assignments .... 206

CHAPTER 5  Shared-Memory Programming with OpenMP .... 209
  5.1  Getting Started .... 210
    5.1.1  Compiling and running OpenMP programs .... 211
    5.1.2  The program .... 212
    5.1.3  Error checking .... 215
  5.2  The Trapezoidal Rule .... 216
    5.2.1  A first OpenMP version .... 216
  5.3  Scope of Variables .... 220
  5.4  The Reduction Clause .... 221
  5.5  The parallel for Directive .... 224
    5.5.1  Caveats .... 225
    5.5.2  Data dependences .... 227
    5.5.3  Finding loop-carried dependences .... 228
    5.5.4  Estimating π .... 229
    5.5.5  More on scope .... 231
  5.6  More About Loops in OpenMP: Sorting .... 232
    5.6.1  Bubble sort .... 232
    5.6.2  Odd-even transposition sort .... 233
  5.7  Scheduling Loops .... 236
    5.7.1  The schedule clause .... 237
    5.7.2  The static schedule type .... 238
    5.7.3  The dynamic and guided schedule types .... 239
    5.7.4  The runtime schedule type .... 239
    5.7.5  Which schedule? .... 241
  5.8  Producers and Consumers .... 241
    5.8.1  Queues .... 241
    5.8.2  Message-passing .... 242
    5.8.3  Sending messages .... 243
    5.8.4  Receiving messages .... 243
    5.8.5  Termination detection .... 244
    5.8.6  Startup .... 244
    5.8.7  The atomic directive .... 245
    5.8.8  Critical sections and locks .... 246
    5.8.9  Using locks in the message-passing program .... 248
    5.8.10  critical directives, atomic directives, or locks? .... 249
    5.8.11  Some caveats .... 249
  5.9  Caches, Cache Coherence, and False Sharing .... 251
  5.10 Thread-Safety .... 256
    5.10.1  Incorrect programs can produce correct output .... 258
  5.11 Summary .... 259
  5.12 Exercises .... 263
  5.13 Programming Assignments .... 267

CHAPTER 6  Parallel Program Development .... 271
  6.1  Two n-Body Solvers .... 271
    6.1.1  The problem .... 271
    6.1.2  Two serial programs .... 273
    6.1.3  Parallelizing the n-body solvers .... 277
    6.1.4  A word about I/O .... 280
    6.1.5  Parallelizing the basic solver using OpenMP .... 281
    6.1.6  Parallelizing the reduced solver using OpenMP .... 284
    6.1.7  Evaluating the OpenMP codes .... 288
    6.1.8  Parallelizing the solvers using pthreads .... 289
    6.1.9  Parallelizing the basic solver using MPI .... 290
    6.1.10  Parallelizing the reduced solver using MPI .... 292
    6.1.11  Performance of the MPI solvers .... 297
  6.2  Tree Search .... 299
    6.2.1  Recursive depth-first search .... 302
    6.2.2  Nonrecursive depth-first search .... 303
    6.2.3  Data structures for the serial implementations .... 305
    6.2.4  Performance of the serial implementations .... 306
    6.2.5  Parallelizing tree search .... 306
    6.2.6  A static parallelization of tree search using pthreads .... 309
    6.2.7  A dynamic parallelization of tree search using pthreads .... 310
    6.2.8  Evaluating the pthreads tree-search programs .... 315
    6.2.9  Parallelizing the tree-search programs using OpenMP .... 316
    6.2.10  Performance of the OpenMP implementations .... 318
    6.2.11  Implementation of tree search using MPI and static partitioning .... 319
    6.2.12  Implementation of tree search using MPI and dynamic partitioning .... 327
  6.3  A Word of Caution .... 335
  6.4  Which API? .... 335
  6.5  Summary .... 336
    6.5.1  Pthreads and OpenMP .... 337
    6.5.2  MPI .... 338
  6.6  Exercises .... 341
  6.7  Programming Assignments .... 350

CHAPTER 7  Where to Go from Here .... 353

References .... 357
Index .... 361
Preface

Parallel hardware has been ubiquitous for some time now. It’s difficult to find a laptop, desktop, or server that doesn’t use a multicore processor. Beowulf clusters are nearly as common today as high-powered workstations were during the 1990s, and cloud computing could make distributed-memory systems as accessible as desktops. In spite of this, most computer science majors graduate with little or no experience in parallel programming. Many colleges and universities offer upper-division elective courses in parallel computing, but since most computer science majors have to take numerous required courses, many graduate without ever writing a multithreaded or multiprocess program.

It seems clear that this state of affairs needs to change. Although many programs can obtain satisfactory performance on a single core, computer scientists should be made aware of the potentially vast performance improvements that can be obtained with parallelism, and they should be able to exploit this potential when the need arises.

An Introduction to Parallel Programming was written to partially address this problem. It provides an introduction to writing parallel programs using MPI, Pthreads, and OpenMP—three of the most widely used application programming interfaces (APIs) for parallel programming. The intended audience is students and professionals who need to write parallel programs. The prerequisites are minimal: a college-level course in mathematics and the ability to write serial programs in C. They are minimal because we believe that students should be able to start programming parallel systems as early as possible.

At the University of San Francisco, computer science students can fulfill a requirement for the major by taking the course, on which this text is based, immediately after taking the “Introduction to Computer Science I” course that most majors take in the first semester of their freshman year. We’ve been offering this course in parallel computing for six years now, and it has been our experience that there really is no reason for students to defer writing parallel programs until their junior or senior year. To the contrary, the course is popular, and students have found that using concurrency in other courses is much easier after having taken the Introduction course.

If second-semester freshmen can learn to write parallel programs by taking a class, then motivated computing professionals should be able to learn to write parallel programs through self-study. We hope this book will prove to be a useful resource for them.

About This Book

As we noted earlier, the main purpose of the book is to teach parallel programming in MPI, Pthreads, and OpenMP to an audience with a limited background in computer science and no previous experience with parallelism. We also wanted to make it as flexible as possible so that readers who have no interest in learning one or two of the APIs can still read the remaining material with little effort. Thus, the chapters on the three APIs are largely independent of each other: they can be read in any order, and one or two of these chapters can be bypassed. This independence has a cost: It was necessary to repeat some of the material in these chapters. Of course, repeated material can be simply scanned or skipped.
Readers with no prior experience with parallel computing should read Chapter 1 first. It attempts to provide a relatively nontechnical explanation of why parallel systems have come to dominate the computer landscape. The chapter also provides a short introduction to parallel systems and parallel programming.

Chapter 2 provides some technical background in computer hardware and software. Much of the material on hardware can be scanned before proceeding to the API chapters. Chapters 3, 4, and 5 are the introductions to programming with MPI, Pthreads, and OpenMP, respectively.

In Chapter 6 we develop two longer programs: a parallel n-body solver and a parallel tree search. Both programs are developed using all three APIs. Chapter 7 provides a brief list of pointers to additional information on various aspects of parallel computing.

We use the C programming language for developing our programs because all three APIs have C-language interfaces, and, since C is such a small language, it is a relatively easy language to learn—especially for C++ and Java programmers, since they are already familiar with C’s control structures.

Classroom Use

This text grew out of a lower-division undergraduate course at the University of San Francisco. The course fulfills a requirement for the computer science major, and it also fulfills a prerequisite for the undergraduate operating systems course. The only prerequisites for the course are either a grade of “B” or better in a one-semester introduction to computer science or a “C” or better in a two-semester introduction to computer science. The course begins with a four-week introduction to C programming. Since most students have already written Java programs, the bulk of what is covered is devoted to the use of pointers in C.1 The remainder of the course provides introductions to programming in MPI, Pthreads, and OpenMP.

1. Interestingly, a number of students have said that they found the use of C pointers more difficult than MPI programming.

We cover most of the material in Chapters 1, 3, 4, and 5, and parts of the material in Chapters 2 and 6. The background in Chapter 2 is introduced as the need arises. For example, before discussing cache coherence issues in OpenMP (Chapter 5), we cover the material on caches in Chapter 2.
The coursework consists of weekly homework assignments, five programming assignments, a couple of midterms, and a final exam. The homework usually involves writing a very short program or making a small modification to an existing program. Their purpose is to ensure that students stay current with the course work and to give them hands-on experience with the ideas introduced in class. It seems likely that the existence of the assignments has been one of the principal reasons for the course’s success. Most of the exercises in the text are suitable for these brief assignments.

The programming assignments are larger than the programs written for homework, but we typically give students a good deal of guidance: We’ll frequently include pseudocode in the assignment and discuss some of the more difficult aspects in class. This extra guidance is often crucial: It’s not difficult to give programming assignments that will take far too long for the students to complete. The results of the midterms and finals, and the enthusiastic reports of the professor who teaches operating systems, suggest that the course is actually very successful in teaching students how to write parallel programs.

For more advanced courses in parallel computing, the text and its online support materials can serve as a supplement so that much of the information on the syntax and semantics of the three APIs can be assigned as outside reading. The text can also be used as a supplement for project-based courses and courses outside of computer science that make use of parallel computation.

Support Materials

The book’s website is located at http://www.mkp.com/pacheco. It will include errata and links to sites with related materials. Faculty will be able to download complete lecture notes, figures from the text, and solutions to the exercises and the programming assignments. All users will be able to download the longer programs discussed in the text.

We would greatly appreciate readers letting us know of any errors they find. Please send an email to peter@usfca.edu if you do find a mistake.