OpenMP Application Programming Interface Examples
Version 4.0.2 – March 2015

Source codes for OpenMP 4.0.2 Examples can be downloaded from GitHub.

Copyright © 1997-2015 OpenMP Architecture Review Board. Permission to copy without fee all or part of this material is granted, provided the OpenMP Architecture Review Board copyright notice and the title of this document appear. Notice is given that copying is by permission of OpenMP Architecture Review Board.
Contents

1 A Simple Parallel Loop
2 The OpenMP Memory Model
3 Conditional Compilation
4 Internal Control Variables (ICVs)
5 The parallel Construct
6 Controlling the Number of Threads on Multiple Nesting Levels
7 Interaction Between the num_threads Clause and omp_set_dynamic
8 The proc_bind Clause
   8.1 Spread Affinity Policy
   8.2 Close Affinity Policy
   8.3 Master Affinity Policy
9 Fortran Restrictions on the do Construct
10 Fortran Private Loop Iteration Variables
11 The nowait Clause
12 The collapse Clause
13 The parallel sections Construct
14 The firstprivate Clause and the sections Construct
15 The single Construct
16 The task and taskwait Constructs
17 Task Dependences
   17.1 Flow Dependence
   17.2 Anti-dependence
   17.3 Output Dependence
   17.4 Concurrent Execution with Dependences
   17.5 Matrix multiplication
18 The taskgroup Construct
19 The taskyield Construct
20 The workshare Construct
21 The master Construct
22 The critical Construct
23 Worksharing Constructs Inside a critical Construct
24 Binding of barrier Regions
25 The atomic Construct
26 Restrictions on the atomic Construct
27 The flush Construct without a List
28 Placement of flush, barrier, taskwait and taskyield Directives
29 The ordered Clause and the ordered Construct
30 Cancellation Constructs
31 The threadprivate Directive
32 Parallel Random Access Iterator Loop
33 Fortran Restrictions on shared and private Clauses with Common Blocks
34 The default(none) Clause
35 Race Conditions Caused by Implied Copies of Shared Variables in Fortran
36 The private Clause
37 Fortran Restrictions on Storage Association with the private Clause
38 C/C++ Arrays in a firstprivate Clause
39 The lastprivate Clause
40 The reduction Clause
41 The copyin Clause
42 The copyprivate Clause
43 Nested Loop Constructs
44 Restrictions on Nesting of Regions
45 The omp_set_dynamic and omp_set_num_threads Routines
46 The omp_get_num_threads Routine
47 The omp_init_lock Routine
48 Ownership of Locks
49 Simple Lock Routines
50 Nestable Lock Routines
51 SIMD Constructs
52 target Construct
   52.1 target Construct on parallel Construct
   52.2 target Construct with map Clause
   52.3 map Clause with to/from map-types
   52.4 map Clause with Array Sections
   52.5 target Construct with if Clause
53 target data Construct
   53.1 Simple target data Construct
   53.2 target data Region Enclosing Multiple target Regions
   53.3 target data Construct with Orphaned Call
   53.4 target data Construct with if Clause
54 target update Construct
   54.1 Simple target data and target update Constructs
   54.2 target update Construct with if Clause
55 declare target Construct
   55.1 declare target and end declare target for a Function
   55.2 declare target Construct for Class Type
   55.3 declare target and end declare target for Variables
   55.4 declare target and end declare target with declare simd
56 teams Constructs
   56.1 target and teams Constructs with omp_get_num_teams and omp_get_team_num Routines
   56.2 target, teams, and distribute Constructs
   56.3 target teams, and Distribute Parallel Loop Constructs
   56.4 target teams and Distribute Parallel Loop Constructs with Scheduling Clauses
   56.5 target teams and distribute simd Constructs
   56.6 target teams and Distribute Parallel Loop SIMD Constructs
57 Asynchronous Execution of a target Region Using Tasks
58 Array Sections in Device Constructs
59 Device Routines
   59.1 omp_is_initial_device Routine
   59.2 omp_get_num_devices Routine
   59.3 omp_set_default_device and omp_get_default_device Routines
60 Fortran ASSOCIATE Construct
A Document Revision History
   A.1 Changes from 4.0.1 to 4.0.2
   A.2 Changes from 4.0 to 4.0.1
   A.3 Changes from 3.1 to 4.0
Introduction

This collection of programming examples supplements the OpenMP API for Shared Memory Parallelization specifications, and is not part of the formal specifications. It assumes familiarity with the OpenMP specifications, and shares the typographical conventions used in that document.

Note – This first release of the OpenMP Examples reflects the OpenMP Version 4.0 specifications. Additional examples are being developed and will be published in future releases of this document.

The OpenMP API specification provides a model for parallel programming that is portable across shared memory architectures from different vendors. Compilers from numerous vendors support the OpenMP API.

The directives, library routines, and environment variables demonstrated in this document allow users to create and manage parallel programs while permitting portability. The directives extend the C, C++ and Fortran base languages with single program multiple data (SPMD) constructs, tasking constructs, device constructs, worksharing constructs, and synchronization constructs, and they provide support for sharing and privatizing data. The functionality to control the runtime environment is provided by library routines and environment variables. Compilers that support the OpenMP API often provide a command line option that enables interpretation of all OpenMP directives.

The latest source codes for OpenMP Examples can be downloaded from the sources directory at https://github.com/OpenMP/Examples. The codes for this OpenMP 4.0.2 Examples document have the tag v4.0.2.

Complete information about the OpenMP API and a list of the compilers that support the OpenMP API can be found at the OpenMP.org web site:
http://www.openmp.org
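The command line option that enables OpenMP differs between compilers; GCC and Clang, for example, use -fopenmp. The following C sketch shows one way a program can tell whether that option was in effect, relying only on the standard _OPENMP macro, which an OpenMP-enabled compiler defines. The function names openmp_enabled and threads_available are hypothetical and chosen for illustration.

```c
#ifdef _OPENMP
#include <omp.h>   /* only needed when OpenMP is enabled */
#endif

/* Returns 1 if this translation unit was compiled with OpenMP enabled,
 * that is, if the compiler defined the standard _OPENMP macro. */
int openmp_enabled(void)
{
#ifdef _OPENMP
    return 1;
#else
    return 0;
#endif
}

/* Returns the size of the team that a default parallel region gets,
 * or 1 when OpenMP is not enabled and the code runs serially. */
int threads_available(void)
{
    int n = 1;                 /* shared by default in the region below */
#ifdef _OPENMP
#pragma omp parallel
    {
#pragma omp single
        n = omp_get_num_threads();   /* one thread records the team size */
    }
#endif
    return n;
}
```

Compiling the same file once with gcc -fopenmp and once without it should make openmp_enabled return 1 and 0 respectively. With OpenMP enabled, the value returned by threads_available reflects the default team size, which typically follows the OMP_NUM_THREADS environment variable when it is set.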
Examples

The following are examples of the OpenMP API directives, constructs, and routines.

C / C++
A statement following a directive is compound only when necessary, and a non-compound statement is indented with respect to a directive preceding it.
C / C++
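As an illustration of this convention, the following sketch uses two hypothetical functions, scale and sum_then_scale. In the first, the parallel for directive applies to a single for statement, so no braces are used and the statement is indented under the directive. In the second, two worksharing loops must share one parallel region, so the structured block has to be a compound statement.

```c
/* Non-compound case: the directive applies to exactly one statement. */
void scale(int n, double *v, double f)
{
    int i;

#pragma omp parallel for
    for (i = 0; i < n; i++)    /* i is private by default */
        v[i] *= f;
}

/* Compound case: braces are required because the parallel region
 * contains two loop constructs. Returns the sum of the original values. */
double sum_then_scale(int n, double *v, double f)
{
    double s = 0.0;
    int i;

#pragma omp parallel
    {
        #pragma omp for reduction(+:s)
            for (i = 0; i < n; i++)
                s += v[i];
        /* implicit barrier: s is complete before any element is scaled */
        #pragma omp for
            for (i = 0; i < n; i++)
                v[i] *= f;
    }
    return s;
}
```

When compiled without OpenMP support, the pragmas are ignored and both functions run serially with the same results.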
CHAPTER 1

A Simple Parallel Loop

The following example demonstrates how to parallelize a simple loop using the parallel loop construct. The loop iteration variable is private by default, so it is not necessary to specify it explicitly in a private clause.

C / C++
Example ploop.1c

void simple(int n, float *a, float *b)
{
    int i;

#pragma omp parallel for
    for (i=1; i<n; i++) /* i is private by default */
        b[i] = (a[i] + a[i-1]) / 2.0;
}

C / C++
Fortran
Example ploop.1f

      SUBROUTINE SIMPLE(N, A, B)

      INTEGER I, N
      REAL B(N), A(N)

!$OMP PARALLEL DO  !I is private by default
      DO I=2,N
        B(I) = (A(I) + A(I-1)) / 2.0
      ENDDO
!$OMP END PARALLEL DO

      END SUBROUTINE SIMPLE

Fortran
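To try Example ploop.1c, the function can be called from a small harness. In this sketch, the check_simple helper and its fixed 64-element buffers are assumptions for illustration; it compares the parallel result with a serial recomputation. Because each iteration writes a distinct b[i] and only reads a, the iterations are independent, so the two results should agree exactly.

```c
/* Same loop as Example ploop.1c. */
void simple(int n, float *a, float *b)
{
    int i;

#pragma omp parallel for
    for (i=1; i<n; i++) /* i is private by default */
        b[i] = (a[i] + a[i-1]) / 2.0;
}

/* Hypothetical check: runs simple() and recomputes the same values
 * serially. Returns 1 if they match, 0 on mismatch or unsupported n. */
int check_simple(int n)
{
    float a[64], b[64], ref[64];
    int i;

    if (n < 1 || n > 64)
        return 0;
    for (i = 0; i < n; i++) {
        a[i] = (float)i * 0.5f;   /* exactly representable inputs */
        b[i] = ref[i] = 0.0f;     /* element 0 is never written by simple() */
    }
    simple(n, a, b);
    for (i = 1; i < n; i++)
        ref[i] = (a[i] + a[i-1]) / 2.0;
    for (i = 0; i < n; i++)
        if (b[i] != ref[i])
            return 0;
    return 1;
}
```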
CHAPTER 2

The OpenMP Memory Model

In the following example, at Print 1, the value of x could be either 2 or 5, depending on the timing of the threads and the implementation of the assignment to x. There are two reasons that the value at Print 1 might not be 5. First, Print 1 might be executed before the assignment to x is executed. Second, even if Print 1 is executed after the assignment, the value 5 is not guaranteed to be seen by thread 1 because a flush may not have been executed by thread 0 since the assignment.

The barrier after Print 1 contains implicit flushes on all threads, as well as a thread synchronization, so the programmer is guaranteed that the value 5 will be printed by both Print 2 and Print 3.

C / C++
Example mem_model.1c

#include <stdio.h>
#include <omp.h>

int main(){
    int x;

    x = 2;
#pragma omp parallel num_threads(2) shared(x)
    {

        if (omp_get_thread_num() == 0) {
            x = 5;
        } else {
            /* Print 1: the following read of x has a race */
            printf("1: Thread# %d: x = %d\n", omp_get_thread_num(),x );
        }

#pragma omp barrier

        if (omp_get_thread_num() == 0) {