UNID 05-200 Chapter 5 Optimizing Program Performance Guobao Jiang(蒋国宝) 11210240049@fudan.edu.cn loveaborn@foxmail.com
Chapter 5 Optimizing Program Performance Guobao Jiang (蒋国宝) 11210240049@fudan.edu.cn loveaborn@foxmail.com
UN/ Problem 5. 1(P381) The following problem illustrates the way memory aliasing(存储器别名使用) can cause unexpected program behavior. Consider the following procedure to swap two values void swap (int *xp, int *yp) Xp 六 ★ Ⅺp+"yp; 六 X+V* yp=xp -ypi 大 x+y-y=X Ⅺ=x-yp y-×=y f this procedure is called with xp equal to yp what effect will it have 2021/10/29
Problem 5.1 (P381) • The following problem illustrates the way memory aliasing (存储器别名使用) can cause unexpected program behavior. Consider the following procedure to swap two values: void swap(int *xp, int *yp) { *xp = *xp + *yp; /* x+y */ *yp = *xp - *yp; /* x+y-y = x */ *xp = *xp - *yp; /* x+y-x = y */ } If this procedure is called with xp equal to yp, what effect will it have ? 2021/10/29 2
UN/ Problem 5.2(P384) Later in this chapter we will take a single function and generate many different variants that preserve the function s behavior but with different performance characteristics. For three of these variants we found that the run times(in clock cycles)can be approximated by y the following functions: Version 1 60+ 35n · Version2136+4n · Version3157+1.25n For what values of n would each version be the fastest of the three? Remember that n wil always be an integer 2021/10/29
Problem 5.2 (P384) • Later in this chapter we will take a single function and generate many different variants that preserve the function’s behavior, but with different performance characteristics. For three of these variants, we found that the run times (in clock cycles) can be approximated by the following functions: • Version 1 60 + 35n • Version 2 136 + 4n • Version 3 157 + 1.25n For what values of n would each version be the fastest of the three ? Remember that n will always be an integer. 2021/10/29 3
UN/ H Problem 5.3(P391 Consider the following functions int min(int x, int y) returnx<y? x: y] int max(int x, int return x<y? y: x void incr(int*xp, int vxp+=v;] nt square(in×) return×*×;} The following three code fragments call these functions 2021/10/29
Problem 5.3 (P391) • Consider the following functions: int min(int x, int y) {return x < y ? x:y;} int max(int x, int y){return x < y ? y:x;} void incr(int *xp, int v){ *xp += v;} int square(int x){return x*x;} • The following three code fragments call these functions: 2021/10/29 4
UN/ ty Problem 5.3+(P391) A. for(i= min(x, y); k=min(x, y): incr (&i, 1) t+= square(i) C. int low min(x,y) int high max(x, y) for (i= low; i< high; incr(&i, 1)) t+= square) 2021/10/29
Problem 5.3+ (P391) A. for(i = min(x, y); i=min(x, y);incr(&i,-1)) t += square(i); C. int low = min(x, y); int high = max(x, y); for (i = low; i < high; incr(&i, 1)) t+= square(i); 2021/10/29 5
UN ty Problem 5.3+(P391) A. for(i= min(x, y) imax(, y); incr (&i, 1) t += square (i C. int low min(x, y): int high max(x,y) for( i= low: i< high; incr &i, 1)) t+= square (: Assume x=10 and y=100. Fill in the table ode min Incr square A 1 91 90 90 B. 91 90 90 90 90 2021/10/29
1 1 90 90 91 1 90 90 1 91 90 90 Problem 5.3+ (P391) A. for(i = min(x, y); i<max(x, y); incr(&i,1)) t += square(i); C. int low = min(x, y); int high = max(x, y); for (i = low; i < high; incr(&i, 1)) t+= square(i); Assume x=10 and y=100. Fill in the table: 2021/10/29 6 Code min max incr square A. B. C
UN/ Problem 5. 4(P415 At times, Gcc does its own…i· Question L6 Write C code for a addl (%eax), %edx add4(%ea×),%edx procedure combine5px8 addl 8 ( %eax), %edx that shows how pointers loop variables, and addl 12(%eax), %edx termination conditions are ad16(%ea×);%edx being computed by this add2o(‰ea×),%edx code. show the general addl 24(%eax), % edx form with arbitrary data addl 28 (%eax), % edx addl $32,%eax and combining operation in i the style of figure 5. 19 addl 8,ecx (P392). Describe how计 cmpl %esi, %ecx differs form our j.L6 handwritten pointer code (Figure 5. 22) 2021/10/29
Problem 5.4 (P415) • At times, GCC does its own … .L6 addl (%eax), %edx addl 4(%eax),%edx addl 8(%eax),%edx addl 12(%eax),%edx addl 16(%eax),%edx addl 20(%eax),%edx addl 24(%eax),%edx addl 28(%eax),%edx addl $32,%eax addl $ 8,%ecx cmpl %esi, %ecx jl .L6 2021/10/29 7 • Question: Write C code for a procedure combine5px8 that shows how pointers, loop variables, and termination conditions are being computed by this code. Show the general form with arbitrary data and combining operation in the style of Figure 5.19 (P392). Describe how it differs form our handwritten pointer code (Figure 5.22)
UN/ Problem 5.5( P421 The following shows the code generated from a variant of combine(P416)that uses eight-way loop unrolling and four-way parallelism L152 addl (%eax), %ecx add 4(%eax), %esi add 8(%eax), %edi Questions: addl 12(%eax), %ebx addl 16(%eax),%ecx:A. What program variable has addl 20(%eax), %esi being spilled onto the stack ? addl 24(%eax), /edi B. At what location on the addl28(%ea×),%ebx addI 32 , %eax stack? addI 8, %edx C. Why is this a good choice cmp|-8(‰ebp),%edx of which value to spill? jI. L152 2021/10/29
Problem 5.5 (P421) • The following shows the code generated from a variant of combine6(P416) that uses eight-way loop unrolling and four-way parallelism. .L152 addl (%eax), %ecx addl 4(%eax), %esi addl 8(%eax), %edi addl 12(%eax), %ebx addl 16(%eax), %ecx addl 20(%eax), %esi addl 24(%eax), %edi addl 28(%eax), %ebx addl $32,%eax addl $ 8,%edx cmpl -8(%ebp), %edx jl .L152 2021/10/29 8 • Questions: A. What program variable has being spilled onto the stack? B. At what location on the stack? C. Why is this a good choice of which value to spill ?
UN/ Problem 5.6(P422) Consider the following function for computing the product of an array of n integers we have unrolled the loop by a factor of 3 int aprod(int al], int n) Int 1, X, y, Z: int r= 1, fo(=0:i<n-2;i+3) 三a[i:y=a[+1]:z=a[i+2] r=r*x*y'z;/*Product computation?/ for ci<n: i++) r x=a[i] return ri 2021/10/29
Problem 5.6 (P422) • Consider the following function for computing the product of an array of n integers. We have unrolled the loop by a factor of 3. int aprod(int a[], int n) { int i, x, y, z; int r = 1; for (i = 0; i < n-2; i += 3 ){ x = a[i]; y=a[i+1]; z=a[i+2]; r = r*x*y*z; /*Product computation*/ } for (; i < n; i++) r *= a[i]; return r; } 2021/10/29 9
UN/ H Problem 5.6+(P422) For the line labeled Product computation, we can use parentheses to create five different associations of the computation as follows: (r*×)*y)*z;A1*/ r=(*(X*y)*z;A2*/ r=r*(X*y)*2)/A3* r=r*(x*(y*z):/A4*/ ×)*(y*z):/"A5*/ Recall from Figure 5. 12 that the integer multiplication operation on this machine has a latency of 4 cycles and an issue time of 1 cycle 2021/10/29
Problem 5.6+ (P422) • For the line labeled Product computation, we can use parentheses to create five different associations of the computation, as follows: r = ((r * x) * y) * z; /* A1 */ r = (r * (x * y)) * z; /* A2 */ r = r * ((x * y) * z); /* A3 */ r = r * (x * (y * z)); /* A4 */ r = (r * x) * (y * z); /* A5 */ • Recall from Figure 5.12 that the integer multiplication operation on this machine has a latency of 4 cycles and an issue time of 1 cycle. 2021/10/29 10