Lecture Note 2:Solution of nonlinear equations Xiaoqun Zhang Shanghai Jiao Tong University Last updated:September 21,2012
Lecture Note 2: Solution of nonlinear equations Xiaoqun Zhang Shanghai Jiao Tong University Last updated: September 21, 2012
Lecture note 2 Numerical Analysis 1.1 Solution of nonlinear equations in one variable Chapter 2 in the textbook. ·Give a function f(z),iffp)=O,then we say that p is a"zero”ora"root” of f(x). ·Solve f(x)=0in[a,. -f(r)=22-4x+4=0,we can find the exact solution by hand. -f(r)=3x-et=0,need an algorithm to approximate the solution. An example of Bisection Method. Intermediate Value Theorem (special form):If f(x)is continuous on [a,b]and f(a)f(b)<0,then there exist a p[a,b]such that f(p)=0. f(b) y=f(x) f(a) Figure 1.1:Intermediate Value theorem Example of the Bisection method:Find a zero of f(z)=3x-e on the interval [1,2]. Step1f(1)=3-e≈0.282 f(2)=6-e2≈-1.389 There is a root in [a,b]. Step 2 Divide [1,2]into two equal sections(by the middle point of it):[1,1.5] and[1.5,2]. Where is the zero? Step3f(1.5)=3.1.5-e1.5≈0.018. f(1.5)f(2)<0,so the root is in [1.5,2]. Step 4 Divide [1.5,2]into equal sections(by the middle point of it):[1.5,1.75] and[1.75,2]. f(1.75)≈-0.505 f(1.5)f(1.75)<0 So the root is in [1.5,1.75] 2
Lecture note 2 Numerical Analysis 1.1 Solution of nonlinear equations in one variable • Chapter 2 in the textbook. • Give a function f(x), if f(p) = 0, then we say that p is a ”zero” or a ”root” of f(x). • Solve f(x) = 0 in [a, b]. – f(x) = x 2 − 4x + 4 = 0, we can find the exact solution by hand. – f(x) = 3x − e x = 0, need an algorithm to approximate the solution. An example of Bisection Method. • Intermediate Value Theorem (special form): If f(x) is continuous on [a, b] and f(a)f(b) < 0, then there exist a p ∈ [a, b] such that f(p) = 0. a p b y = f(x) x y f(b) f(a) Figure 1.1: Intermediate Value theorem • Example of the Bisection method: Find a zero of f(x) = 3x − e x on the interval [1, 2]. Step 1 f(1) = 3 − e ≈ 0.282 f(2) = 6 − e 2 ≈ −1.389 There is a root in [a, b]. Step 2 Divide [1, 2] into two equal sections (by the middle point of it): [1, 1.5] and [1.5, 2]. Where is the zero? Step 3 f(1.5) = 3 · 1.5 − e 1.5 ≈ 0.018. f(1.5)f(2) < 0 , so the root is in [1.5, 2]. Step 4 Divide [1.5, 2] into equal sections (by the middle point of it): [1.5, 1.75] and [1.75, 2]. f(1.75) ≈ −0.505 f(1.5)f(1.75) < 0 So the root is in [1.5, 1.75] 2
Lecture note 2 Numerical Analysis Step 5 Divide [1.5,1.75]into two equal section (by the middle point of it): [1.5,1.625]and[1.625,1.75l. f(1.625)≈-0.203 f(1.5)f(1.625)0,then f(pn)f(n)<0.Therefore p is in [pn:bn]. Set [an+1,6n+1]=[Pn,on]. -Set pn+1=ata±,and go to step(). 2 The algorithm will produce a sequence of intervals [an,bn]satisfying 1.p∈[an,bnl. 2.The interval length (on+1-an+1)=(on an) 3.pPn=an地m 2 One issue remains:How to judge if pn is close enough to p?Stopping criteria! 3
Lecture note 2 Numerical Analysis Step 5 Divide [1.5, 1.75] into two equal section (by the middle point of it): [1.5, 1.625] and [1.625, 1.75]. f(1.625) ≈ −0.203 f(1.5)f(1.625) 0, then f(pn)f(bn) < 0. Therefore p is in [pn, bn]. Set [an+1, bn+1] = [pn, bn]. – Set pn+1 = an+1+bn+1 2 , and go to step (⋆). • The algorithm will produce a sequence of intervals [an, bn] satisfying 1. p ∈ [an, bn]. 2. The interval length (bn+1 − an+1) = 1 2 (bn − an). 3. pn = an+bn 2 . • One issue remains: How to judge if pn is close enough to p? Stopping criteria! 3
Lecture note 2 Numerical Analysis Stopping criteria (I)Select a maximum iteration number N.We stop the iteration after N itera- tions. (II)Select a small tolerance e>0.We stop the iteration whenever one of the following is met. 1.The error is small enough (a)Absolute error,lp-pnl<c. (b)Relative error etc. IPl But we don't know p before we find it! 2.The residual is small enough (a)absolute lf(pn)<e. (b)relative<etc. But If(pn)is small the relative error Ipn-pl is small. Example::f(x)=(x-1)10,p=1,andp:=1+1/n.fp2)≤9.8×10-4 but lp2 -pl =0.5. 3.The difference between successive iterations is small enough (a)absolute Ipn -Pn-1l<c. (b)relativeap<,and pn0. But Ipn-Pn-1 converges to zero Pn converges.Counter example: pn=∑i=i(/).lpm-pn-i→0 but p diverges.. Which one is best?It depends! For safety (to avoid infinite loops),we use:when (I)OR (II)is met,we stop. Of course,the algorithm might fail when (I)is met. Convergence and Error estimate Theorem 1 (Theorem 2.1 in the textbook)Suppose -fECla,]. -f(a)f(b)<0. Then the sequence pn generated by the bisection method satisfies pn-p< where p is a zero of f in (a,b). Proof. bn-an=5bn-1-an-1) 11 =互2bm-2-am-2)=2z6m-2-am-2) (1.1) 1 1 -2nb1-a1)=2mb-a 4
Lecture note 2 Numerical Analysis Stopping criteria (I) Select a maximum iteration number N. We stop the iteration after N iterations. (II) Select a small tolerance ǫ > 0. We stop the iteration whenever one of the following is met. 1. The error is small enough (a) Absolute error, |p − pn| < ǫ. (b) Relative error |p−pn| |p| < ǫ, |p−pn| |p−p1| < ǫ, etc. But we don’t know p before we find it! 2. The residual is small enough (a) absolute |f(pn)| < ǫ. (b) relative |f(pn)| |f(p0)| < ǫ etc. But |f(pn)| is small 6=⇒ the relative error |pn − p| is small. Example: f(x) = (x − 1)10 , p = 1, and pi = 1 + 1/n. f(p2) ≤ 9.8 × 10−4 but |p2 − p| = 0.5. 3. The difference between successive iterations is small enough (a) absolute |pn − pn−1| < ǫ. (b) relative |pn−pn−1| |pn| < ǫ, and pn 6= 0. But |pn − pn−1| converges to zero 6=⇒ pn converges. Counter example: pn = Pn i=1(1/i). |pn − pn−1| → 0 but pn diverges. • Which one is best? It depends! • For safety (to avoid infinite loops), we use: when (I) OR (II) is met, we stop. Of course, the algorithm might fail when (I) is met. Convergence and Error estimate • Theorem 1 (Theorem 2.1 in the textbook) Suppose – f ∈ C[a, b]. – f(a)f(b) < 0. Then the sequence pn generated by the bisection method satisfies |pn−p| ≤ b−a 2n , where p is a zero of f in (a, b). Proof. bn − an = 1 2 (bn−1 − an−1) = 1 2 · 1 2 (bn−2 − an−2) = 1 2 2 (bn−2 − an−2) . . . = 1 2 n−1 (b1 − a1) = 1 2 n−1 (b − a) (1.1) 4
Lecture note 2 Numerical Analysis p∈[an,onl and pn=n,so Ipn-pl≤z(bn-an)=是 口 Corollary 1(Corollary 1.2) n limoe pn=p. For bisection method,we can stop it until the absolute lp-pil<e. Example:Determine the number of iterations necessary to solve 3x-e"=0 on [1,2]with accuracy 10-3.We have 6-a 1 pm-川≤2n=2a≤10-3 Solve 1 ≤10-3 Take logarithm on both hand sides, -n≤-31og210,s0n≥31og210≈9.96. We need at least 10 iterations! Summary of the bisection method ·PrOs 1.Simple and easy to implement 2.One function evaluation per iteration 3.No knowledge of the derivative is needed 4.Easy to converge (only need continuity) ·Cons 1.Have to find an interval [a,b]such that f(a)f(b)<0 2.The error depends on b-a and converges slowly 3.It is hard to generalize it to solutions of multi-variable nonlinear equa- tions 1.2 Fixed point iteration 1.2.1 Fixed point Definition:p is a fixed point of g of g(p)=p. Finding a root of f(z)is equivalent to finding a fixed point,e.g., g(x)=x-f(x),g(x)=x+3f(x),g(x)=x-h(x)f(r),h(x)≠0,… 5
Lecture note 2 Numerical Analysis p ∈ [an, bn] and pn = an+bn 2 , so |pn − p| ≤ 1 2 (bn − an) = b−a 2n . • Corollary 1 (Corollary 1.2) lim n→+∞ pn = p. • For bisection method, we can stop it until the absolute |p − pi | < ǫ. • Example: Determine the number of iterations necessary to solve 3x − e x = 0 on [1, 2] with accuracy 10−3 . We have |pn − p| ≤ b − a 2 n = 1 2 n ≤ 10−3 . Solve 1 2 n ≤ 10−3 Take logarithm on both hand sides, −n ≤ −3 log2 10, so n ≥ 3 log2 10 ≈ 9.96. We need at least 10 iterations! Summary of the bisection method • Pros 1. Simple and easy to implement 2. One function evaluation per iteration 3. No knowledge of the derivative is needed 4. Easy to converge (only need continuity) • Cons 1. Have to find an interval [a, b] such that f(a)f(b) < 0 2. The error depends on b − a and converges slowly 3. It is hard to generalize it to solutions of multi-variable nonlinear equations 1.2 Fixed point iteration 1.2.1 Fixed point • Definition: p is a fixed point of g of g(p) = p. • Finding a root of f(x) is equivalent to finding a fixed point, e.g., g(x) := x − f(x), g(x) = x + 3f(x), g(x) = x − h(x)f(x), h(x) 6= 0, , ... 5
Lecture note 2 Numerical Analysis y=g(x) y=T Figure 1.2:Fixed point The advantage of using fixed point. 1.Fixed point form is easier to analyze; 2.certain fix-point iterations lead to powerful root-finding techniques.(We have many freedoms to choose h():e.g.,h()) Example 1:Find the fixed point of g(x)=z2-2. g()=x÷x2-2=x分x2-x-2=0÷x=-1,x=2. Not all function exists a fixed point. Example 2:Find the fixed point of g(x)=2+z2. g(x)=x÷x2+2=x÷x2-x+2=0.No solution! No fixed points. From the examples.we see that the fixed points may not necessarily exist and be unique. Questions:When there exists a fixed point?When the fixed point is unique? Existence and uniqueness are two important aspects in numerical analysis. Theorem 2 (Theorem 2.1)(Eristence)If 1.g∈C[a,b 2.g(x)∈[a, Then g has a fired point in (a,b]. 6
Lecture note 2 Numerical Analysis p y = g(x) x y −2 −1 1 2 y = x b b Figure 1.2: Fixed point • The advantage of using fixed point. 1. Fixed point form is easier to analyze; 2. certain fix-point iterations lead to powerful root-finding techniques. (We have many freedoms to choose h(x): e.g., h(x) = 1 f ′(x) ) • Example 1: Find the fixed point of g(x) = x 2 − 2. g(x) = x ⇔ x 2 − 2 = x ⇔ x 2 − x − 2 = 0 ⇔ x = −1, x = 2. • Not all function exists a fixed point. Example 2: Find the fixed point of g(x) = 2 + x 2 . g(x) = x ⇔ x 2 + 2 = x ⇔ x 2 − x + 2 = 0. No solution! No fixed points. • From the examples, we see that the fixed points may not necessarily exist and be unique. • Questions: When there exists a fixed point? When the fixed point is unique? • Existence and uniqueness are two important aspects in numerical analysis. Theorem 2 (Theorem 2.1) (Existence) If 1. g ∈ C[a, b] 2. g(x) ∈ [a, b] Then g has a fixed point in [a, b]. 6
Lecture note 2 Numerical Analysis .Intuition:The graph of g(x)must cross the graph of y=x. Idea:construct a function and then apply intermediate value theorem. Proof.If g(a)=a and/or g(b)=b,then g has fixed points at a and/or b.Otherwise, then g(a)>a and g(b)0,f(b)=g(b)-b<0. Applying IVT(intermediate value theorem)concludes immediately the proof. 口 Remarks: All the conditions in the above theorems are sufficient but not necessary. ●Example:g(z)=x2-2on【-2,0] It is decreasing on [-2,0]. g(-2)=2≤g(x)≤g(0)=-2. Then g(x)∈[-2,2]年[-2,0] Theorem 2.1 cannot be applied.However,g has a fixed point at-1. Recall the mean value theorem. Theorem 3 (Mean Value Theorem)If f E Cla,b]and f is differentiable on (a,b),then a number g in (a,b)exists with f()=f-f@ b-a Theorem 4(Theorem 2.2)(Uniqueness)In addition to the assumptions 1,2 in Theorem 2.1,if 3.g()erists on(a,b). 4.There erists a k∈(0,l)with |g'(xl≤k for all x∈(a,b). Then the fired point in a,b is unique. Intuition:The derivative is the velocity.The mean velocity from one fixed point to another is 1.There must be a point whose velocity 181. Recall Mean Value Theorem. 7
Lecture note 2 Numerical Analysis • Intuition: The graph of g(x) must cross the graph of y = x. • Idea: construct a function and then apply intermediate value theorem. Proof. If g(a) = a and/or g(b) = b, then g has fixed points at a and/or b. Otherwise, then g(a) > a and g(b) 0, f(b) = g(b) − b < 0. Applying IVT (intermediate value theorem) concludes immediately the proof. Remarks: • All the conditions in the above theorems are sufficient but not necessary. • Example: g(x) = x 2 − 2 on [−2, 0] It is decreasing on [−2, 0]. g(−2) = 2 ≤ g(x) ≤ g(0) = −2. Then g(x) ∈ [−2, 2] 6∈ [−2, 0] Theorem 2.1 cannot be applied. However, g has a fixed point at −1. Recall the mean value theorem. Theorem 3 (Mean Value Theorem) If f ∈ C[a, b] and f is differentiable on (a, b), then a number ξ in (a, b) exists with f ′ (ξ) = f(b) − f(a) b − a Theorem 4 (Theorem 2.2) (Uniqueness) In addition to the assumptions 1,2 in Theorem 2.1, if 3. g ′ (x) exists on (a, b). 4. There exists a k ∈ (0, 1) with |g ′ (x)| ≤ k for all x ∈ (a, b). Then the fixed point in [a, b] is unique. • Intuition: The derivative is the velocity. The mean velocity from one fixed point to another is 1. There must be a point whose velocity is 1. • Recall Mean Value Theorem. 7
Lecture note 2 Numerical Analysis 、2 Figure 1.3:Exitence of fixed point Use contradiction to prove it.Typical in proofs of uniqueness. Proof.Suppose that we have two different fixed points p and q,pq, g(p)=p,g(q)=q. Applying the mean value theorem,there exists a E(a,b)such that 1= 9p)-9@=g(5). P-4 Therefore, 1=lg()≤k<1. It is a contradiction 1 1. 口 Remark: All the conditions in the above theorems are sufficient but not necessary. ·Example:g(z)=(x2-1)/3on[-1,1. The minimum occurs at x=0(g(0)=-)and the maximum at-1 and 1 (g(1)=0 and g(-1)=0).So g(x)E[-1,1].On the other hand, g()= s景 x∈(-1,1). So g satisfies all the hypotheses of Theorem 2.2 and has a unique fixed point. In this example,we can get the fixed point by solving p=o=号mf-p-1=0 We get 1 p=3-v,p=3+V. 8
Lecture note 2 Numerical Analysis p x y −2 −1 1 2 1 2 −1 −2 b a b Figure 1.3: Exitence of fixed point • Use contradiction to prove it. Typical in proofs of uniqueness. Proof. Suppose that we have two different fixed points p and q, p 6= q, g(p) = p, g(q) = q. Applying the mean value theorem, there exists a ξ ∈ (a, b) such that 1 = g(p) − g(q) p − q = g ′ (ξ). Therefore, 1 = |g ′ (ξ)| ≤ k < 1. It is a contradiction 1 < 1. Remark: All the conditions in the above theorems are sufficient but not necessary. • Example: g(x) = (x 2 − 1)/3 on [−1, 1]. The minimum occurs at x = 0 (g(0) = −1 3 ) and the maximum at −1 and 1 (g(1) = 0 and g(−1) = 0). So g(x) ∈ [−1, 1]. On the other hand, g ′ (x) = 2 3 x ≤ 2 3 , ∀x ∈ (−1, 1). So g satisfies all the hypotheses of Theorem 2.2 and has a unique fixed point. In this example, we can get the fixed point by solving p = g(p) = p 2 − 1 3 , then p 2 − 3p − 1 = 0. We get p = 1 2 (3 − √ 13), p = 1 2 (3 + √ 13). 8
Lecture note 2 Numerical Analysis Note that the latter one is in [3,4].However,g(4)=5 and g(4)=>1,so the hypotheses in Theorem 2.2 are not satisfied but the fixed point in [3,4]is unique. .Example:Let g(z)=3-. g(z)=-3-xn3<0z∈[0,1] So g(z)is decreasing on 0,1].Therefore g0)=3≤96四≤1=90, x∈0,1 Ths,g(z)∈[0,l]for all∈[0,1].Hence g has a fixed point on[0,l]. However, g(0)=-n3≈-1.0986 lg'()1 on(0,1).Theorem 2.2 cannot be applied.However,g is decreasing, it is clear that the fixed point is unique. 1.2.2 Fixed point Algorithm Questions: Suppose that g has a fixed point at p.How to find a fixed point of g()? The fixed point gives naturally a iteration. Given an initial guess of po,we define Pn+1=g(pn).fixed point iteration. ·Two questions. 1.When pn a convergent sequence? 2.If it converges,what is the limit? -Denote p be the limit,i.e.,limn=p.Then by the continuity of 9 p=lim Pn lim g(Pn-1)=g(lim Pn-1)=g(p). n¥0g n-oo We can see that it has to be a fixed point! Theorem 5 (Theorem 2.3 Fixed-point Theorem)Suppose that 1.g∈C[a, 2.g(x)∈[a,b]for Vx∈[a,. 3.g'(x)erists on (a,b). 4.There erists a k∈(0,1)with |g'(rl≤k for all x∈(a,b). 9
Lecture note 2 Numerical Analysis Note that the latter one is in [3, 4]. However, g(4) = 5 and g ′ (4) = 8 3 > 1, so the hypotheses in Theorem 2.2 are not satisfied but the fixed point in [3, 4] is unique. • Example: Let g(x) = 3−x . g ′ (x) = −3 −x ln 3 < 0 ∀x ∈ [0, 1] So g(x) is decreasing on [0, 1]. Therefore g(0) = 1 3 ≤ g(x) ≤ 1 = g(0), ∀x ∈ [0, 1] Thus, g(x) ∈ [0, 1] for all x ∈ [0, 1]. Hence g has a fixed point on [0, 1]. However, g ′ (0) = − ln 3 ≈ −1.0986 |g ′ (x)| 6≤ 1 on (0, 1). Theorem 2.2 cannot be applied. However, g is decreasing, it is clear that the fixed point is unique. 1.2.2 Fixed point Algorithm Questions: Suppose that g has a fixed point at p. How to find a fixed point of g(x)? • The fixed point gives naturally a iteration. • Given an initial guess of p0, we define pn+1 = g(pn). ←− fixed point iteration. • Two questions. 1. When pn a convergent sequence? 2. If it converges, what is the limit? – Denote p be the limit, i.e., limn→∞ = p. Then by the continuity of g p = limn→∞ pn = limn→∞ g(pn−1) = g( limn→∞ pn−1) = g(p). We can see that it has to be a fixed point! Theorem 5 (Theorem 2.3 Fixed-point Theorem) Suppose that 1. g ∈ C[a, b] 2. g(x) ∈ [a, b] for ∀x ∈ [a, b]. 3. g ′ (x) exists on (a, b). 4. There exists a k ∈ (0, 1) with |g ′ (x)| ≤ k for all x ∈ (a, b). 9
Lecture note 2 Numerical Analysis Then,for any number po E la,b],the sequence defined by Pn+1=g(Pn) converges to the unique fired point p in a,b. Remarks:The hypotheses here is the same as those in the Uniqueness theorem. Proof.From Theorem 2.2 (the Uniqueness theorem)we know that there is a unique fixed point p in [a,b].Using mean value theorem, g-小=g2=g(), ξis between p and pn-l, Pn-1-P which is lg(pn-1)-g(pl≤lg(ξllpn-1-pl lpn-pl≤klpn-1-pl It holds for any n.Then lpn-pl≤klpn-1-pl ≤k·klpm-2-pl=k2pn-2-p刊 ≤klpo-pl. Since kn≥1, IPm Pnl=Pm Pm-1 Pm-1-...Pn+1-Pnl pm Pm-1l+lpm-1-Pm-21+...+pn+1 -Pnl km-ilpi -pol +km-2p1-pol+...+k"lpi -pol =k"p1-pol(1+k+k2+..+km--1) 10
Lecture note 2 Numerical Analysis Then, for any number p0 ∈ [a, b], the sequence defined by pn+1 = g(pn) converges to the unique fixed point p in [a, b]. Remarks: The hypotheses here is the same as those in the Uniqueness theorem. Proof. From Theorem 2.2 (the Uniqueness theorem) we know that there is a unique fixed point p in [a, b]. Using mean value theorem, g(pn−1) − g(p) pn−1 − p = g ′ (ξ), ξ is between p and pn−1, which is |g(pn−1) − g(p)| ≤ |g ′ (ξ)||pn−1 − p| |pn − p| ≤ k|pn−1 − p| It holds for any n. Then |pn − p| ≤ k|pn−1 − p| ≤ k · k|pn−2 − p| = k 2 |pn−2 − p| . . . ≤ k n |p0 − p|. Since k n ≥ 1, |pm − pn| = |pm − pm−1 + pm−1 − . . . + pn+1 − pn| ≤ |pm − pm−1| + |pm−1 − pm−2| + . . . + |pn+1 − pn| ≤ k m−1 |p1 − p0| + k m−2 |p1 − p0| + . . . + k n |p1 − p0| = k n |p1 − p0|(1 + k + k 2 + . . . + k m−n−1 ) 10