Part III: Parallel Programming Models

The Implicit Model
1. Basic Concept: With this approach, programmers write code in a familiar sequential programming language, and a compiler is responsible for converting it automatically into parallel code (e.g., KAP from Kuck and Associates, FORGE from Advanced Parallel Research).
2. Features
- Simpler semantics: no deadlock; always determinate.
- Better portability, because the source is an ordinary sequential program.
- A single thread of control makes testing, debugging, and correctness verification easier.
Disadvantages:
- It is extremely difficult to develop an auto-parallelizing compiler.
- Automatically parallelized code is usually inefficient.
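As an illustration, here is a minimal sketch (not taken from the slides) of the kind of sequential C loop such a compiler can parallelize automatically: the iterations are independent, so a tool like KAP or FORGE may distribute them across processors without any change to the source.

    #include <stdio.h>

    #define N 1000000

    static double a[N], b[N];

    int main(void)
    {
        long i;

        for (i = 0; i < N; i++)
            b[i] = (double)i;

        /* No loop-carried dependence: a[i] depends only on b[i], so an
           auto-parallelizing compiler may run iterations in parallel;
           the programmer writes purely sequential code. */
        for (i = 0; i < N; i++)
            a[i] = 2.0 * b[i] + 1.0;

        printf("a[N-1] = %f\n", a[N - 1]);
        return 0;
    }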
The Data-Parallel Model
1. Basic Concept: The data-parallel model is the native model for SIMD machines. Data-parallel programming emphasizes local computations and data-routing operations. It can be implemented either on SIMD or on SPMD machines. Fortran 90 and HPF are examples.
2. Features
- Single thread: as far as control flow is concerned, a data-parallel program is just like a sequential program.
- Parallel synchronous operation on large data structures (e.g., arrays).
- Loosely synchronous: there is a synchronization after every statement.
- Single address space: all variables reside in a single address space.
- Explicit data allocation: by allocating data themselves, users may reduce communication overhead.
- Implicit communication: users do not have to specify communication operations.
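To make the "single thread, parallel operation on large data structures" point concrete: in Fortran 90 the whole-array statement A = B + C is one data-parallel operation. A minimal sketch of what that single statement denotes, written in this part's C notation (illustrative only, not compiler output):

    #include <stdio.h>

    #define N 8

    int main(void)
    {
        double A[N], B[N], C[N];
        long i;

        for (i = 0; i < N; i++) { B[i] = (double)i; C[i] = 1.0; }

        /* The data-parallel statement A = B + C: all N element-wise
           additions may execute in parallel ...                     */
        for (i = 0; i < N; i++)
            A[i] = B[i] + C[i];
        /* ... and the model loosely synchronizes here, after the
           statement, before the next one begins.                    */

        printf("A[N-1] = %f\n", A[N - 1]);
        return 0;
    }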
The Shared-Variable Model
1. Basic Concept: Shared-variable programming is the native model for PVP, SMP, and DSM machines. There is an ANSI X3H5 standard. The portability of such programs is problematic.
2. Features
- Multiple threads: a shared-variable program uses either SPMD (Single-Program-Multiple-Data) or MPMD (Multiple-Program-Multiple-Data).
- Asynchronous: each process executes at its own pace.
- Explicit synchronization: special synchronization operations (barrier, lock, critical region, event) are used.
- Single address space: all variables reside in a single address space.
- Implicit data and computation distribution: because data reside in shared memory, there is no need to distribute data and computation explicitly.
- Implicit communication: communication is done implicitly through reading and writing shared variables.
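A minimal sketch of these features using POSIX threads (introduced formally later in this part); the names work, P, and N are illustrative. Each thread writes its own slice of an array in the single shared address space; communication is implicit (the main thread simply reads the array afterward), while synchronization (here, the joins) is explicit.

    #include <pthread.h>
    #include <stdio.h>

    #define P 4          /* number of threads   */
    #define N 16         /* array size, P | N   */

    static double a[N];  /* lives in the single shared address space */

    static void *work(void *arg)
    {
        long me = (long)arg, i;
        for (i = me * (N / P); i < (me + 1) * (N / P); i++)
            a[i] = 2.0 * i;            /* each thread fills its own slice */
        return NULL;
    }

    int main(void)
    {
        pthread_t t[P];
        long p;

        for (p = 0; p < P; p++)
            pthread_create(&t[p], NULL, work, (void *)p);
        for (p = 0; p < P; p++)
            pthread_join(t[p], NULL);  /* explicit synchronization */

        printf("a[N-1] = %f\n", a[N - 1]);  /* implicit communication */
        return 0;
    }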
The Message-Passing Model
1. Basic Concept: Message-passing programming is the native model for MPP and COW machines. The portability of programs is enhanced greatly by the PVM and MPI libraries.
2. Features
- Multiple threads: a message-passing program uses either SPMD (Single-Program-Multiple-Data) or MPMD (Multiple-Program-Multiple-Data).
- Asynchronous: operations at different nodes proceed asynchronously.
- Explicit synchronization: special synchronization operations (barrier, lock, critical region, event) are used.
- Multiple address spaces: the processes of a parallel program reside in different address spaces.
- Explicit data mapping and workload allocation.
- Explicit communication: the processes interact by executing message-passing operations.
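A minimal MPI sketch of these features (illustrative; MPI itself is introduced later in this part): two processes in separate address spaces interact only through explicit message-passing operations.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, n = 0;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            n = 42;
            /* explicit communication: nothing is shared implicitly */
            MPI_Send(&n, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&n, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            printf("process 1 received %d\n", n);
        }

        MPI_Finalize();
        return 0;
    }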
Comparison of Parallel Programming Models
The feature lists of the four models can be summarized as follows:

    Issue               Implicit    Data-Parallel         Shared-Variable    Message-Passing
    Threads of control  single      single                multiple           multiple
    Address space       single      single                single             multiple
    Data allocation     implicit    explicit              implicit           explicit
    Communication       implicit    implicit              implicit           explicit
    Synchronization     implicit    loosely synchronous   explicit           explicit
Part III: Sample Program

π Computation
+ Integration formula for π:

    \pi = \int_0^1 \frac{4}{1+x^2}\,dx \approx \sum_{i=0}^{N-1} \frac{4}{1+\left(\frac{i+0.5}{N}\right)^2} \times \frac{1}{N}

+ A sequential C code to compute π:

    #include <stdio.h>

    #define N 1000000

    int main(void)
    {
        double local, pi = 0.0, w;
        long i;

        w = 1.0/N;
        for (i = 0; i < N; i++) {
            local = (i + 0.5)*w;
            pi = pi + 4.0/(1.0 + local*local);
        }
        printf("pi is %f\n", pi*w);
        return 0;
    }
Part III: Shared-Memory Programming Standards

ANSI X3H5
+ Parallel constructs: parallelism in an X3H5 program is specified with parallel constructs. Inside a parallel construct there can be parallel blocks, parallel loops, or single-process sections, as in the following example:

    program main      ! The program begins in sequential mode
    A                 ! A is executed by only the base thread
    parallel          ! Switch to parallel mode
    B                 ! B is replicated by every team member
    psections         ! Start a parallel block
    section
    C                 ! One team member executes C
    section
    D                 ! Another team member executes D
    end psections     ! Wait until both C and D are completed
    psingle           ! Temporarily switch to sequential mode
    E                 ! E is executed by one team member
    end psingle       ! Switch back to parallel mode
    pdo i=1,6         ! Start a pdo construct
    F(i)              ! The team members share the 6 iterations of F
    end pdo no wait   ! No implicit barrier
    G                 ! More replicated code
    end parallel      ! Switch back to sequential mode
    H                 ! H is executed by only the initial process
    ...               ! There could be more parallel constructs
    end

+ Implicit barrier (fence operation): at parallel, end parallel, end psections, end pdo, and end psingle, an implicit barrier forces all memory accesses up to that point to become consistent.
+ Thread interaction and synchronization, including four types of synchronization variables: latch, lock, event, and ordinal.
POSIX Threads (Pthreads)
The Pthreads standard was established by an IEEE standards committee; it is similar to Solaris Threads.
+ Thread management primitives:

    Function Prototype                                           Meaning
    int pthread_create(pthread_t *thread_id,
        pthread_attr_t *attr,
        void *(*my_routine)(void *), void *arg)                  Create a thread
    void pthread_exit(void *status)                              Exit the calling thread
    int pthread_join(pthread_t thread, void **status)            Join a thread
    pthread_t pthread_self(void)                                 Return the calling thread's ID

+ Thread synchronization primitives:

    Function                         Meaning
    pthread_mutex_init(...)          Create a new mutex variable
    pthread_mutex_destroy(...)       Destroy a mutex variable
    pthread_mutex_lock(...)          Lock (acquire) a mutex variable
    pthread_mutex_trylock(...)       Try to acquire a mutex variable
    pthread_mutex_unlock(...)        Unlock (release) a mutex variable
    pthread_cond_init(...)           Create a new condition variable
    pthread_cond_destroy(...)        Destroy a condition variable
    pthread_cond_wait(...)           Wait (block) on a condition variable
    pthread_cond_timedwait(...)      Wait on a condition variable up to a time limit
    pthread_cond_signal(...)         Post an event, unblocking one waiting thread
    pthread_cond_broadcast(...)      Post an event, unblocking all waiting threads
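As a usage sketch of the condition-variable primitives above (illustrative names, not from the slides): one thread blocks in pthread_cond_wait until another thread sets a shared flag and signals the condition.

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t ready = PTHREAD_COND_INITIALIZER;
    static int data_ready = 0;

    static void *producer(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&m);
        data_ready = 1;                /* post the event ...       */
        pthread_cond_signal(&ready);   /* ... and wake one waiter  */
        pthread_mutex_unlock(&m);
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, producer, NULL);

        pthread_mutex_lock(&m);
        while (!data_ready)                 /* guard against spurious wakeups */
            pthread_cond_wait(&ready, &m);  /* atomically release m and block */
        pthread_mutex_unlock(&m);

        printf("event received\n");
        pthread_join(t, NULL);
        return 0;
    }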
Shared-Variable Parallel Code to Compute π
The following code is in a C-like notation:

    #define N 1000000

    main()
    {
        double local, pi = 0.0, w;
        long i;

        w = 1.0/N;
        #pragma parallel
        #pragma shared(pi, w)
        #pragma local(i, local)
        #pragma pfor iterate(i=0; N; 1)
        for (i = 0; i < N; i++) {
            local = (i + 0.5)*w;
            local = 4.0/(1.0 + local*local);
            #pragma critical
            pi = pi + local;
        }
        printf("pi is %f\n", pi*w);
    }  /* main() */
Part III: Message Passing Programming

MPI: Message Passing Interface
+ Message-passing library approach to parallel programming: a collection of processes executes a program written in a standard sequential language, augmented with calls to a library of functions for sending and receiving messages.
+ Computation: in the MPI programming model, a computation consists of one or more heavyweight processes that communicate by calling library routines. The number of processes in an MPI computation is normally fixed.
+ Communication mechanisms:
  - Point-to-point communication operations.
  - Collective communication operations (broadcast, summation, etc.).
+ Communicators: allow MPI programmers to define modules in which subprograms encapsulate communication operations.
+ Basic MPI: although MPI is a complex system with more than 200 functions, a wide range of problems can be solved using just six of them, as the sketch below illustrates.
+ Both a C language binding and a Fortran language binding are defined for MPI.
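As an illustration of the "six functions" remark, here is a sketch (not from the slides) of the π program from earlier in this part, rewritten using only MPI_Init, MPI_Comm_size, MPI_Comm_rank, MPI_Send, MPI_Recv, and MPI_Finalize; process 0 collects the partial sums with point-to-point receives.

    #include <mpi.h>
    #include <stdio.h>

    #define N 1000000

    int main(int argc, char *argv[])
    {
        int p, me, src;
        double local, sum = 0.0, part, w = 1.0 / N;
        long i;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &p);
        MPI_Comm_rank(MPI_COMM_WORLD, &me);

        /* each process sums a strided subset of the N intervals */
        for (i = me; i < N; i += p) {
            local = (i + 0.5) * w;
            sum += 4.0 / (1.0 + local * local);
        }

        if (me != 0) {
            MPI_Send(&sum, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
        } else {
            for (src = 1; src < p; src++) {
                MPI_Recv(&part, 1, MPI_DOUBLE, src, 0, MPI_COMM_WORLD, &status);
                sum += part;
            }
            printf("pi is %f\n", sum * w);
        }

        MPI_Finalize();
        return 0;
    }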