Code Optimization
Outline
• Optimization Blockers
  – Memory aliasing
  – Side effects in function calls
• Understanding Modern Processors
  – Super-scalar
  – Out-of-order execution
• More Code Optimization techniques
• Performance Tuning
• Suggested reading
  – 5.1, 5.7 ~ 5.16
Review
• 5.1 Capabilities and Limitations of Optimizing Compilers
• 5.3 Program Example
• 5.4 Eliminating Loop Inefficiencies
• 5.5 Reducing Procedure Calls
• 5.6 Eliminating Unneeded Memory References
Example P387

    void combine1(vec_ptr v, data_t *dest)
    {
        int i;

        *dest = IDENT;
        for (i = 0; i < vec_length(v); i++) {
            data_t val;
            get_vec_element(v, i, &val);
            *dest = *dest OPER val;
        }
    }
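The combine functions rely on a vector abstraction that the slides do not define. The sketch below is one plausible set of definitions, assuming integer elements and summation (`data_t` as `int`, `IDENT` as 0, `OPER` as `+`). The struct layout and the helper `new_vec` are assumptions made for illustration, not definitions taken from the slides.

```c
#include <stdlib.h>

/* Assumed configuration: integer elements combined by addition. */
typedef int data_t;
#define IDENT 0
#define OPER +

/* Assumed representation: a length plus a heap-allocated array. */
typedef struct {
    int len;
    data_t *data;
} vec_rec, *vec_ptr;

/* Hypothetical helper: create a zero-filled vector, or NULL on failure. */
vec_ptr new_vec(int len)
{
    vec_ptr v = malloc(sizeof(vec_rec));
    if (v == NULL)
        return NULL;
    v->len = len;
    v->data = NULL;
    if (len > 0) {
        v->data = calloc((size_t)len, sizeof(data_t));
        if (v->data == NULL) {
            free(v);
            return NULL;
        }
    }
    return v;
}

/* Return the number of elements in the vector. */
int vec_length(vec_ptr v)
{
    return v->len;
}

/* Bounds-checked read: store element `index` in *dest.
   Returns 1 on success, 0 if the index is out of range. */
int get_vec_element(vec_ptr v, int index, data_t *dest)
{
    if (index < 0 || index >= v->len)
        return 0;
    *dest = v->data[index];
    return 1;
}

/* combine1: calls vec_length and get_vec_element on every iteration. */
void combine1(vec_ptr v, data_t *dest)
{
    int i;

    *dest = IDENT;
    for (i = 0; i < vec_length(v); i++) {
        data_t val;
        get_vec_element(v, i, &val);
        *dest = *dest OPER val;
    }
}
```

With these definitions, every loop iteration pays for two function calls and a bounds check, which is the overhead the later versions remove.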
Example P388

    void combine2(vec_ptr v, int *dest)
    {
        int i;
        int length = vec_length(v);

        *dest = IDENT;
        for (i = 0; i < length; i++) {
            int val;
            get_vec_element(v, i, &val);
            *dest = *dest OPER val;
        }
    }
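In combine2 the programmer moves the call to vec_length out of the loop by hand. A compiler is often unwilling to do this itself, because it generally cannot prove that a function call has no side effects. The sketch below shows why calling a function fewer times is not always equivalent; all names in it (`counter`, `next_count`, `call_four_times`, `call_once_times_four`) are hypothetical.

```c
/* Global state modified by every call: a side effect. */
int counter = 0;

/* Returns the current count, then increments it. */
int next_count(void)
{
    return counter++;
}

/* Four separate calls: the results are 0, 1, 2, 3 in some order,
   so the sum is 6 when counter starts at 0. */
int call_four_times(void)
{
    return next_count() + next_count() + next_count() + next_count();
}

/* One call multiplied by four: returns 0 when counter starts at 0.
   Looks like a cheaper version of the function above, but it is not
   equivalent because next_count has a side effect. */
int call_once_times_four(void)
{
    return 4 * next_count();
}
```

Because the two functions return different values and leave `counter` in different states, a compiler must keep all four calls unless it can see that the callee is free of side effects.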
Example P392

    void combine3(vec_ptr v, int *dest)
    {
        int i;
        int length = vec_length(v);
        int *data = get_vec_start(v);

        *dest = IDENT;
        for (i = 0; i < length; i++) {
            *dest = *dest OPER data[i];
        }
    }
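combine3 depends on get_vec_start, which the slides use but do not define. A minimal sketch follows, assuming the vector is a struct holding a length and a pointer to an `int` array. The struct layout, and the choice of `+` with identity 0, are assumptions for illustration.

```c
/* Assumed configuration: integer elements combined by addition. */
#define IDENT 0
#define OPER +

/* Assumed representation: a length plus a pointer to the elements. */
typedef struct {
    int len;
    int *data;
} vec_rec, *vec_ptr;

/* Return the number of elements in the vector. */
int vec_length(vec_ptr v)
{
    return v->len;
}

/* Expose the underlying array so the caller can index it directly.
   This skips the per-element bounds check and function call. */
int *get_vec_start(vec_ptr v)
{
    return v->data;
}

/* combine3: direct data access, but still reads and writes *dest
   in memory on every iteration. */
void combine3(vec_ptr v, int *dest)
{
    int i;
    int length = vec_length(v);
    int *data = get_vec_start(v);

    *dest = IDENT;
    for (i = 0; i < length; i++) {
        *dest = *dest OPER data[i];
    }
}
```

The trade-off is that the caller now depends on the vector's internal layout and gives up bounds checking in exchange for a loop with no function calls.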
Example P394

    void combine4(vec_ptr v, int *dest)
    {
        int i;
        int length = vec_length(v);
        int *data = get_vec_start(v);
        int x = IDENT;

        for (i = 0; i < length; i++)
            x = x OPER data[i];
        *dest = x;
    }
Machine Independent Opt. Results
• Optimizations
  – Reduce function calls and memory references within loop
Machine Independent Opt. Results

    Method           Function   Page   Integer          Floating Point
                                       +       *        +       *
    Abstract -g      Combine1   P385   42.06   41.86    41.44   160.00
    Abstract -O2     Combine1   P385   31.25   33.25    31.25   143.00
    Move vec_length  Combine2   P388   22.61   21.25    21.15   135.00
    data access      Combine3   P392    6.00    9.00     8.00   117.00
    Accum. in temp   Combine4   P394    2.00    4.00     3.00     5.00

• Performance Anomaly
  – Computing the FP product of all elements is exceptionally slow
  – Very large speedup when accumulating in a temporary
  – Memory uses a 64-bit format; registers use 80 bits
  – Benchmark data caused overflow of 64 bits, but not 80
Optimization Blockers P394

    void combine4(vec_ptr v, int *dest)
    {
        int i;
        int length = vec_length(v);
        int *data = get_vec_start(v);
        int sum = 0;

        for (i = 0; i < length; i++)
            sum += data[i];
        *dest = sum;
    }
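Memory aliasing is the reason a compiler cannot safely turn the combine3 loop into the combine4 form on its own: if dest points into the vector's data, the two versions compute different results. The sketch below uses simplified, hypothetical variants (`sum_direct` and `sum_temp`) that take a plain array instead of a vec_ptr.

```c
/* combine3-style: accumulates through *dest on every iteration. */
void sum_direct(int *data, int length, int *dest)
{
    int i;

    *dest = 0;
    for (i = 0; i < length; i++)
        *dest = *dest + data[i];
}

/* combine4-style: accumulates in a local, writes *dest once at the end. */
void sum_temp(int *data, int length, int *dest)
{
    int i;
    int sum = 0;

    for (i = 0; i < length; i++)
        sum += data[i];
    *dest = sum;
}

/* Hypothetical driver: run both versions with dest aliased to the
   last element of a three-element array {2, 3, 4}.
   Stores the final last element of each run in the output arguments. */
void alias_demo(int *direct_result, int *temp_result)
{
    int a[3] = {2, 3, 4};
    int b[3] = {2, 3, 4};

    /* a: {2,3,0} -> {2,3,2} -> {2,3,5} -> {2,3,10} */
    sum_direct(a, 3, &a[2]);
    /* b: sum = 2 + 3 + 4 = 9, then b becomes {2,3,9} */
    sum_temp(b, 3, &b[2]);

    *direct_result = a[2];
    *temp_result = b[2];
}
```

With the aliased destination, the direct version ends with 10 and the temporary version with 9. Since the compiler must assume such aliasing is possible, it keeps the memory read and write inside the loop unless the programmer introduces the temporary explicitly.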