APPENDIX IV

OPTIMAL CONTROL THEORY

This appendix provides a concise review of optimal control theory. Many economic problems require the use of optimal control theory. For example, optimization over time, such as the maximization of utility over an individual's lifetime or of a country's profit and social welfare over time, and optimization over space, such as the problems analyzed in this book, fit into its framework. Although these problems may be solved by conventional techniques such as Lagrange's method and nonlinear programming if we formulate them in discrete form by dividing time (or distance) into a finite number of intervals, continuous time (or space) models are usually more convenient and yield results which are more transparent. Optimization over continuous time, however, introduces some technical difficulties. In the continuous time model, the number of choice variables is no longer finite: since decisions may be taken at each instant of time, there is a continuously infinite number of choice variables. The rigorous treatment of optimization in an infinite-dimensional space requires the use of very advanced mathematics. Fortunately, once proven, the major results are quite simple and analogous to those of optimization in a finite-dimensional space.

There are three approaches in optimal control theory: calculus of variations, the maximum principle, and dynamic programming. Calculus of variations is the oldest of the three and treats only interior solutions. In applications, as it turned out, choice variables are often bounded and may jump from one bound to the other in the interval considered. The maximum principle was developed to include such cases. Roughly speaking, calculus of variations and the maximum principle are derived by using appropriate forms of differentiation in an infinite-dimensional space. Dynamic programming, however, exploits the recursive nature of the problem. Many problems, including those treated by calculus of variations and the maximum principle, have the property that the optimal policy from any arbitrary time on depends only on the state of the system at that time and does not depend on the paths that the choice variables have taken up to that time. In such cases the maximum value of the objective function beyond time t can be considered a function of the state of the system at time t. This function is called the value function. The value function yields the value which the best possible performance from t to the end of the interval achieves. The dynamic programming approach solves the optimization problem by first obtaining the value function. Although the maximum principle and dynamic programming yield the same results where they can both be applied, dynamic programming is less general than the approach based on the maximum principle, since it requires differentiability of the value function.

We first try to facilitate an intuitive understanding of control theory in section 1. In order to do so, a very simple control problem is formulated and the necessary conditions for the optimum are derived heuristically. Following the dynamic programming approach, Pontryagin's maximum principle is derived from the partial differential equation of dynamic programming.
As mentioned above, this approach is not the most general one, but it facilitates economic interpretation of the necessary conditions. In section 2 the results of section 1 are applied to an example taken from Chapter VII. Section 3 considers a more general form of the control problem (due to Bolza and Hestenes), and Hestenes' theorem, giving the necessary conditions for the optimum, is stated without proof. This theorem is general enough to include most problems that appear in this book. Finally, in section 4, Hestenes' theorem is used to solve the control problems in Chapter I.

1. A Simple Control Problem

Consider a dynamic process which starts at initial time t_0 and ends at terminal time t_1. Both t_0 and t_1 are taken as given in this section. For simplicity, the state of the system is described by only one variable, x(t), called the state variable. In most economic problems the state variable is a stock, such as the amount of capital equipment or inventories available at time t. In Chapters IV and V of our book the volume of traffic at a radius is a state variable.

The state of the system is influenced by the choice of control variables, u_1(t), u_2(t), ..., u_r(t), which are summarized as the control vector,

u(t) = (u_1(t), u_2(t), \ldots, u_r(t)). \qquad (1.1)

The control vector must lie inside a given subset U of a Euclidean r-dimensional space:

u(t) \in U, \quad t_0 \le t \le t_1, \qquad (1.2)

where U is assumed to be closed and unchanging. Note that control variables are chosen at each point of time. The rate of investment in capital equipment is one of the control variables in most models of capital accumulation; the rate of inventory investment is a control variable in inventory adjustment models; and the population per unit distance is a control variable for the models in this book. An entire path of the control vector, u(t), t_0 ≤ t ≤ t_1, is a vector-valued function from the interval [t_0, t_1] into the r-dimensional space and is simply called a control. A control is admissible if it satisfies the constraint (1.2) and some other regularity conditions which will be specified in section 3.

The state variable moves according to the differential equation

\frac{dx}{dt} = \dot{x}(t) = f_1(x(t), u(t), t), \qquad (1.3)

where f_1 is assumed to be continuously differentiable. Notice that the function f_1 is not the same as f_0. In this section the initial state, x(t_0), is given,

x(t_0) = x_0, \qquad (1.4)

where x_0 is some constant, but the terminal state, x(t_1), is unrestricted. For example, the capital stock at initial time is fixed; the rate of change of the capital stock equals the rate of investment minus depreciation; and the capital stock at terminal time is not restricted.
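To see how a control path pins down the state trajectory through (1.3), the following minimal sketch (in Python) integrates the state equation forward by Euler steps. The functional form f_1(x, u, t) = u - δx (investment net of depreciation) and the constant control path are hypothetical choices made only for illustration; they are not taken from the text.

import numpy as np

# Hypothetical example: state = capital stock, control = investment rate.
# f1(x, u, t) = u - delta*x is an assumed form of the state equation (1.3).
delta = 0.05                      # assumed depreciation rate
t0, t1, x0 = 0.0, 10.0, 1.0       # given initial time, terminal time, initial state
n = 1000
ts = np.linspace(t0, t1, n + 1)
dt = (t1 - t0) / n

def f1(x, u, t):
    return u - delta * x          # right-hand side of (1.3), hypothetical form

def u_path(t):
    return 0.2                    # an arbitrary admissible control path u(t)

x = np.empty(n + 1)
x[0] = x0
for i in range(n):
    # forward Euler step: x(t + dt) is approximately x(t) + f1(x(t), u(t), t)*dt
    x[i + 1] = x[i] + f1(x[i], u_path(ts[i]), ts[i]) * dt

print(f"terminal state x(t1) is approximately {x[-1]:.4f}")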
The problem to be solved is that of maximizing the objective functional

J = \int_{t_0}^{t_1} f_0(x(t), u(t), t)\, dt + S_0(x(t_1), t_1) \qquad (1.5)

with respect to the control vector, u(t), t_0 ≤ t ≤ t_1, subject to the constraints (1.2), (1.3), and (1.4), where f_0 and S_0, the functions which make up the objective functional, are continuously differentiable. A functional is defined as a function of a function or functions, that is, a mapping from a space of functions to a space of numbers. In the investment decision problem for a firm, for example, f_0(x(t), u(t), t)dt is the amount of profit earned in the time interval [t, t + dt] and S_0(x(t_1), t_1) is the scrap value of the amount of capital x(t_1) at terminal time t_1.
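As noted in the introduction, the discrete-time analogue of (1.2)-(1.5) can be handled by ordinary nonlinear programming. The following minimal sketch divides [t_0, t_1] into a finite number of intervals and chooses the control value on each interval numerically. All functional forms (f_0(x, u, t) = sqrt(x) - u as the profit flow, f_1(x, u, t) = u - δx, S_0(x, t_1) = 0.5x as the scrap value, and U = [0, 2]) are hypothetical and chosen only for illustration.

import numpy as np
from scipy.optimize import minimize

# Discrete-time sketch of problem (1.2)-(1.5) solved by nonlinear programming.
# Hypothetical forms: f0 = sqrt(x) - u, f1 = u - delta*x, S0 = 0.5*x, U = [0, 2].
t0, t1, n = 0.0, 10.0, 50
dt = (t1 - t0) / n
x0, delta = 1.0, 0.05

def trajectory(u):
    """Integrate the state equation (1.3) forward for a discretized control path."""
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        x[i + 1] = x[i] + (u[i] - delta * x[i]) * dt
    return x

def neg_J(u):
    """Minus the discretized objective functional (1.5)."""
    x = trajectory(u)
    running = np.sum((np.sqrt(x[:-1]) - u) * dt)   # integral of f0
    scrap = 0.5 * x[-1]                            # S0(x(t1), t1)
    return -(running + scrap)

res = minimize(neg_J, x0=np.full(n, 0.5), bounds=[(0.0, 2.0)] * n)
print("maximized J is approximately", -res.fun)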
[Figure 1a: a trajectory of the state variable. Figure 1b: the objective functional.]

The problem is illustrated in Figure 1. In Fig. 1a, a possible trajectory of the state variable with the initial value x_0 is depicted. If the trajectory of the control vector is specified for the entire time horizon [t_0, t_1], the trajectory of the state variable is completely characterized. The value of the state variable at time t and the choice of the control vector then jointly determine f_0(x(t), u(t), t). In Fig. 1b we graph the part of the value of the objective functional which has been realized at any time t for the particular trajectory of the control vector. f_0, therefore, appears as the slope in Fig. 1b, while the value of the objective functional is the sum of the integral, from t_0 to t_1, of f_0 and of S_0, the scrap value at terminal time. Our problem is to obtain the trajectory of the control vector that maximizes the objective functional.

The major difficulty of this problem lies in the fact that an entire time path of the control vector must be chosen. This amounts to a continuously infinite number of control variables. In other words, what must be found is not just optimal numbers but optimal functions. The basic idea of control theory is to transform the problem of choosing the entire optimal path of control variables into the problem of finding the optimal values of control variables at each instant of time. In this way the problem of choosing an infinite number of variables is decomposed into an infinite number of more elementary problems, each of which involves determining a finite number of variables.

The objective functional can be broken into three pieces for any time t (a past, a present and a future):

J = \int_{t_0}^{t} f_0(x(t'), u(t'), t')\, dt' + \int_{t}^{t+\Delta t} f_0(x(t'), u(t'), t')\, dt' + \int_{t+\Delta t}^{t_1} f_0(x(t'), u(t'), t')\, dt' + S_0(x(t_1), t_1).

The decisions taken at any time have two effects. They directly affect the present term, \int_{t}^{t+\Delta t} f_0(x(t'), u(t'), t')\, dt', by changing f_0. They also change \dot{x}, and hence the future path of x(t), through \dot{x} = f_1(x(t), u(t), t). The new path of x(t) changes the future part of the functional. For example, if a firm increases investment at time t, the rate at which profits are earned at that time falls because the firm must pay for the investment. The investment, however, increases the amount of capital available in the future and therefore the profits earned in the future. The firm must make investment decisions weighing these two effects. In general, the choice of the control variables at any instant of time must take into account both the instantaneous effect on the current earnings, f_0 Δt, and the indirect effect on the future earnings, \int_{t+\Delta t}^{t_1} f_0\, dt' + S_0, through a change in the state variable.
The transformation of the problem is accomplished if a simple way to represent these two effects is found.

This leads us to the concept of the value function, which might be used by a planner who wanted to recalculate the optimal policy at time t after the dynamic process began. Consider the problem of maximizing

\int_{t}^{t_1} f_0(x(t'), u(t'), t')\, dt' + S_0(x(t_1), t_1) \qquad (1.6)

when the state variable at time t is x; x(t) = x. The maximized value is then a function of x and t:

J^*(x, t), \qquad (1.7)

which is called the value function. The optimal value of the objective functional for the original problem (1.2)-(1.5) is

J^*(x^*(t_0), t_0) = J^*(x_0, t_0). \qquad (1.8)

The usefulness of the value function must be obvious by now: it facilitates the characterization of the indirect effect through a change in the state variable by summarizing the maximum possible value of the objective functional from time t on as a function of the state variable at time t (and t).

The next step in the derivation of the necessary conditions for the optimum involves the celebrated Principle of Optimality due to Bellman. The principle exploits the fact that the value of the state variable at time t captures all the information necessary for decision making from time t on: the paths of the control vector and the state variable up to time t do not make any difference as long as the state variable at time t is the same. This implies that if a planner recalculates the optimal policy at time t, given the optimal value of the state variable at that time, the new optimal policy coincides with the original optimal policy. Thus if u^*(t), t_0 ≤ t ≤ t_1, is the optimal control for the original problem and x^*(t), t_0 ≤ t ≤ t_1, the corresponding trajectory of the state variable, the value function satisfies

J^*(x^*(t), t) = \int_{t}^{t_1} f_0(x^*(t'), u^*(t'), t')\, dt' + S_0(x^*(t_1), t_1). \qquad (1.9)

Applying the principle of optimality again, we can rewrite (1.9) as

J^*(x^*(t), t) = \int_{t}^{t+\Delta t} f_0(x^*(t'), u^*(t'), t')\, dt' + \int_{t+\Delta t}^{t_1} f_0(x^*(t'), u^*(t'), t')\, dt' + S_0(x^*(t_1), t_1)
             = \int_{t}^{t+\Delta t} f_0(x^*(t'), u^*(t'), t')\, dt' + J^*(x^*(t+\Delta t), t+\Delta t) \qquad (1.10)

for any t and t + Δt such that t_0 ≤ t ≤ t + Δt ≤ t_1. This construction allows us to concentrate on the decisions in the short interval from t to t + Δt by summarizing the outcome in the remaining period in the value function, J^*(x^*(t+Δt), t+Δt).
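The value function can also be computed numerically, which may help fix ideas. The following minimal sketch discretizes the state and time and applies the principle of optimality by backward recursion: the value at (x, t) is the best one-step reward plus the value at the state reached at t + Δt. The functional forms (f_0 = sqrt(x) - u, f_1 = u - δx, S_0 = 0.5x, U = [0, 2]) are the same hypothetical choices used in the earlier sketch, not forms taken from the text.

import numpy as np

# Backward recursion for the value function J*(x, t) on a grid, using the
# principle of optimality.  All functional forms are hypothetical illustrations.
t0, t1, nt = 0.0, 10.0, 200
dt = (t1 - t0) / nt
delta = 0.05
xs = np.linspace(0.01, 5.0, 300)        # grid for the state variable
us = np.linspace(0.0, 2.0, 41)          # grid for the control, U = [0, 2]

J = 0.5 * xs                            # terminal condition J*(x, t1) = S0(x, t1)
for k in range(nt):                     # step backwards from t1 towards t0
    # one-step reward plus value of the state reached at t + dt, for each (x, u) pair
    x_next = xs[:, None] + (us[None, :] - delta * xs[:, None]) * dt
    reward = (np.sqrt(xs)[:, None] - us[None, :]) * dt
    J_next = np.interp(x_next, xs, J)   # evaluate J*(., t + dt) between grid points
    J = np.max(reward + J_next, axis=1) # principle of optimality: maximize over u in U

print("J*(x0 = 1, t0) is approximately", np.interp(1.0, xs, J))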
By the definition of the value function, any admissible control cannot do better than the value function if the initial state is the same. Consider the following special type of control, u(t'), t ≤ t' ≤ t_1: the control is arbitrary between time t and time t + Δt and optimal in the remaining period given the state reached at time t + Δt. Then the corresponding value of the objective functional satisfies

J^*(x^*(t), t) \ge \int_{t}^{t+\Delta t} f_0(x(t'), u(t'), t')\, dt' + J^*(x(t+\Delta t), t+\Delta t), \qquad (1.11)

where x(t'), t ≤ t' ≤ t_1, is the state variable corresponding to the control u(t') with the initial state x(t) = x^*(t).

Combining (1.10) and (1.11) yields

J^*(x^*(t), t) = \int_{t}^{t+\Delta t} f_0(x^*(t'), u^*(t'), t')\, dt' + J^*(x^*(t+\Delta t), t+\Delta t)
             \ge \int_{t}^{t+\Delta t} f_0(x(t'), u(t'), t')\, dt' + J^*(x(t+\Delta t), t+\Delta t)
\qquad \text{for any } u(t') \in U,\ t \le t' \le t + \Delta t. \qquad (1.12)

This shows that the optimal control in the interval [t, t + Δt] maximizes the sum of the objective functional in the interval and the maximum possible value of the functional in the rest of the period, [t + Δt, t_1]. If both sides of the inequality are differentiable, Taylor's expansion around t yields¹
-\frac{\partial J^*(x^*(t), t)}{\partial t}\,\Delta t
 = f_0(x^*(t), u^*(t), t)\,\Delta t + \frac{\partial J^*(x^*(t), t)}{\partial x} f_1(x^*(t), u^*(t), t)\,\Delta t + \cdots
 \ge f_0(x^*(t), u(t), t)\,\Delta t + \frac{\partial J^*(x^*(t), t)}{\partial x} f_1(x^*(t), u(t), t)\,\Delta t + \cdots
\qquad \text{for any } u(t) \in U, \qquad (1.13)

where \cdots represents higher order terms which become negligible as Δt tends to zero, since they approach zero faster than Δt. Note that we used x(t) = x^*(t), \dot{x}(t) = f_1(x(t), u(t), t) and \dot{x}^*(t) = f_1(x^*(t), u^*(t), t).

¹ The details of Taylor's expansion here are as follows. Taylor's theorem states that if F(t) is differentiable at t = a, then F(t) = F(a) + (t - a)F'(a) + o(t - a), where \lim_{t \to a} o(t - a)/(t - a) = 0. Noting that F_0(t + \Delta t) \equiv \int_{t}^{t+\Delta t} f_0(t')\, dt' satisfies F_0'(t) = f_0(t), we obtain

\int_{t}^{t+\Delta t} f_0(x^*(t'), u^*(t'), t')\, dt' + J^*(x^*(t+\Delta t), t+\Delta t)
 = f_0(x^*(t), u^*(t), t)\,\Delta t + J^*(x^*(t), t) + \Big[\frac{\partial J^*(x^*(t), t)}{\partial x}\,\dot{x}^*(t) + \frac{\partial J^*(x^*(t), t)}{\partial t}\Big]\Delta t + o(\Delta t)

and

\int_{t}^{t+\Delta t} f_0(x(t'), u(t'), t')\, dt' + J^*(x(t+\Delta t), t+\Delta t)
 = f_0(x(t), u(t), t)\,\Delta t + J^*(x(t), t) + \Big[\frac{\partial J^*(x(t), t)}{\partial x}\,\dot{x}(t) + \frac{\partial J^*(x(t), t)}{\partial t}\Big]\Delta t + o(\Delta t)
 = f_0(x^*(t), u(t), t)\,\Delta t + J^*(x^*(t), t) + \Big[\frac{\partial J^*(x^*(t), t)}{\partial x}\,\dot{x}(t) + \frac{\partial J^*(x^*(t), t)}{\partial t}\Big]\Delta t + o(\Delta t),

where we used x(t) = x^*(t). Substituting these two equations into (1.12) yields (1.13).

Inequality (1.13) has a natural economic interpretation. For example, if a firm is contemplating the optimal capital accumulation policy, f_0(x^*(t), u(t), t)Δt is approximately the amount of profit earned in the period [t, t + Δt]; ∂J^*(x^*(t), t)/∂x is the marginal value of capital, or the contribution of an additional unit of capital at time t; and f_1(x^*(t), u(t), t)Δt = \dot{x}(t)Δt is approximately the amount of capital accumulated in the period [t, t + Δt]. Thus (∂J^*/∂x) f_1 Δt represents the value of the capital accumulated during the period. (1.13), therefore, shows that the optimal control vector maximizes the sum of the current profits and the value of the increased capital.

Dividing (1.13) by Δt and taking limits as Δt approaches zero, we obtain

-\partial J^*(x^*(t), t)/\partial t
 = f_0(x^*(t), u^*(t), t) + (\partial J^*(x^*(t), t)/\partial x)\, f_1(x^*(t), u^*(t), t)
 \ge f_0(x^*(t), u(t), t) + (\partial J^*(x^*(t), t)/\partial x)\, f_1(x^*(t), u(t), t)
\qquad \text{for any } u(t) \in U. \qquad (1.14)

Thus the optimal control vector u^*(t) maximizes

f_0(x^*(t), u, t) + (\partial J^*(x^*(t), t)/\partial x)\, f_1(x^*(t), u, t) \qquad (1.15)

at each instant of time, and we have finally transformed the problem of finding the optimal path into that of finding optimal numbers at each point in time. From the above discussion, it must be clear that (1.15) summarizes both the instantaneous effect and the indirect effect through a change in the state variable.

(1.14) can be rewritten as
-\partial J^*/\partial t = \max_{u \in U}\,\big[\, f_0(x^*(t), u, t) + (\partial J^*/\partial x)\, f_1(x^*(t), u, t)\,\big]. \qquad (1.14')

This equation holds for any x, not just x^*(t), and can be considered a partial differential equation of J^*(x, t). It is called the partial differential equation of dynamic programming, or Bellman's equation.

In the dynamic programming approach, the right side of (1.14') is maximized with respect to u, yielding the partial differential equation. The partial differential equation is then solved with the boundary conditions. At the initial time t_0, x(t_0) = x_0, while at the terminal time t_1 the value function satisfies

J^*(x(t_1), t_1) = S_0(x(t_1), t_1) \qquad (1.16)

for any x(t_1). This equation is the terminal boundary condition associated with Bellman's equation. Since (1.16) holds for any x(t_1), we have

\partial J^*(x(t_1), t_1)/\partial x = \partial S_0(x(t_1), t_1)/\partial x, \qquad (1.17)

which is called the transversality condition at time t_1.

One of the disadvantages of the dynamic programming approach is that the partial differential equation is usually hard to solve. Pontryagin's maximum principle, which can be immediately derived from the partial differential equation of dynamic programming, is often more useful for economic applications. Furthermore, the method of dynamic programming employs the Taylor expansion in (1.13), which requires that the value function be differentiable. There are many problems for which the value function is not differentiable everywhere. The maximum principle, however, can be proven using a different and more general method. In this section we derive the maximum principle from Bellman's equation, and in section 3 we state a more general version of the maximum principle without proof.

To derive Pontryagin's maximum principle, we define the adjoint, or costate, or auxiliary, variable,

p(t) = \partial J^*(x^*(t), t)/\partial x, \qquad (1.18)

and rewrite (1.15) as the Hamiltonian,

H[x(t), u(t), t, p(t)] = f_0(x(t), u(t), t) + p(t)\, f_1(x(t), u(t), t). \qquad (1.19)

(1.14') now reads: if u^*(t) is the optimal control and x^*(t) the associated path of the state variable, then there exists a p(t) such that for any t

H[x^*(t), u^*(t), t, p(t)] = \max_{u \in U} H[x^*(t), u, t, p(t)]. \qquad (1.20)

Since p(t) equals ∂J^*/∂x, the adjoint variable p(t) is the marginal value of the state variable (if, for example, x(t) is capital, p(t) is the marginal value of capital) and has the interpretation of the shadow price of x(t).
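A worked illustration may help. For a hypothetical scalar problem (not taken from the text) with f_0(x, u, t) = -\tfrac{1}{2}(x^2 + u^2), f_1(x, u, t) = u and U equal to the whole real line, the Hamiltonian (1.19) and the maximization condition (1.20) give the optimal control in closed form:

H[x, u, t, p] = -\tfrac{1}{2}\,(x^{2} + u^{2}) + p\,u,
\qquad
\frac{\partial H}{\partial u} = -u + p = 0
\;\Longrightarrow\;
u^{*}(t) = p(t),
\qquad
H[x^{*}, u^{*}, t, p] = -\tfrac{1}{2}\,x^{*2} + \tfrac{1}{2}\,p^{2}.

Here the interior maximization replaces the path problem by the pointwise rule u^* = p, with p(t) still to be determined from the adjoint equation derived below.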
(1.14') also contains information about the adjoint variable. We can rewrite (1.14') as the Hamilton-Jacobi equation:

-\partial J^*/\partial t = H(x^*, u^*, t, \partial J^*/\partial x). \qquad (1.21)

If the value function is twice differentiable, the derivative of (1.21) with respect to x can be taken:

-\partial^{2} J^*/\partial x \partial t = \partial H/\partial x + (\partial H/\partial p)\, \partial^{2} J^*/\partial x^{2}. \qquad (1.22)

Differentiating (1.18) with respect to t, however, yields

\dot{p} = (\partial^{2} J^*/\partial x^{2})\, \dot{x}^* + \partial^{2} J^*/\partial t \partial x. \qquad (1.23)

If we further assume twice continuous differentiability, the second-order mixed partial derivatives are equal whatever the order of differentiation: \partial^{2} J^*/\partial x \partial t = \partial^{2} J^*/\partial t \partial x. Since from (1.19) and (1.3) we have

\dot{x}^* = (\partial/\partial p)\, H(x^*, u^*, t, p), \qquad (1.24)

we can substitute (1.22) and (1.24) into (1.23) to get

-\dot{p} = (\partial/\partial x)\, H(x^*, u^*, t, p). \qquad (1.25)

Equation (1.25) is often called the adjoint equation, and the pair, (1.24) and (1.25), is called the canonical equations of the maximum principle.

The transversality condition (1.17) gives the value of the adjoint variable at time t_1:

p(t_1) = \partial S_0(x^*(t_1), t_1)/\partial x. \qquad (1.26)

Finally, the time derivative of the Hamiltonian along the optimal path is

\frac{dH}{dt} = \frac{\partial H}{\partial x}\,\dot{x}^* + \frac{\partial H}{\partial p}\,\dot{p} + \frac{\partial H}{\partial u}\,\dot{u}^* + \frac{\partial H}{\partial t}.

From (1.24) and (1.25), the sum of the first two terms on the RHS is zero. The third term vanishes because either ∂H/∂u = 0 for an interior solution or \dot{u} = 0 for a boundary solution. Thus we have

\frac{dH}{dt} = \frac{\partial H}{\partial t} \qquad (1.27)

except when the control vector has a jump.

The maximum principle approach solves the ordinary differential equations (1.24) and (1.25) with the boundary conditions x(t_0) = x_0 and (1.26). Since boundary conditions are given at two points, i.e., at initial time t_0 and terminal time t_1, this problem is called a two-point boundary value problem. The pair of ordinary differential equations is usually easier to solve than the partial differential equation of dynamic programming.
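As a numerical companion to the canonical equations, the following minimal sketch continues the hypothetical linear-quadratic example above, taking S_0 ≡ 0 so that the transversality condition (1.26) gives p(t_1) = 0. With u^* = p, the canonical equations reduce to \dot{x} = p and \dot{p} = x, and the resulting two-point boundary value problem is handed to SciPy's boundary value solver. The problem data and the use of scipy.integrate.solve_bvp are assumptions made for illustration, not part of the text.

import numpy as np
from scipy.integrate import solve_bvp

# Two-point boundary value problem formed by the canonical equations (1.24)-(1.25).
# Hypothetical problem: f0 = -(x^2 + u^2)/2, f1 = u, U = R, S0 = 0,
# so that u* = p, with x(t0) = x0 and p(t1) = 0.
t0, t1, x0 = 0.0, 5.0, 1.0

def canonical(t, y):
    x, p = y
    # (1.24): xdot = dH/dp = p;   (1.25): pdot = -dH/dx = x
    return np.vstack([p, x])

def boundary(ya, yb):
    return np.array([ya[0] - x0,        # x(t0) = x0
                     yb[1]])            # p(t1) = dS0/dx = 0 (transversality, (1.26))

ts = np.linspace(t0, t1, 50)
guess = np.zeros((2, ts.size))
sol = solve_bvp(canonical, boundary, ts, guess)
print("x(t1) is approximately", sol.y[0, -1])
print("u*(t0) = p(t0) is approximately", sol.y[1, 0])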
2. An Example: Optimal Growth of Cities

Consider the problem which was formulated in section 3 of Chapter VII: maximize

\int_{0}^{\infty} \big[\, U(c(t), P(t)) - u^* \,\big]\, dt \qquad (2.1)

subject to the differential equation,

\dot{k}(t) = f(k(t), P(t)) - \lambda k(t) - c(t), \qquad (2.2)

and the initial condition,

k(0) = k_0, \qquad (2.3)

where the control variables are the per capita consumption of resources, c(t), and the population of a city, P(t); the state variable is the capital stock, k(t); λ is the growth rate of the whole population; and u^* is the utility level at the optimal steady state.

The fact that the terminal time is infinite causes some complications. We first solve the finite-horizon problem of maximizing

\int_{0}^{t_1} \big[\, U(c(t), P(t)) - u^* \,\big]\, dt + S_0(k(t_1), t_1) \qquad (2.4)

subject to the same constraints.

The Hamiltonian for this problem is

H(k(t), c(t), P(t), t, q(t)) = U(c(t), P(t)) - u^* + q(t)\big[\, f(k(t), P(t)) - \lambda k(t) - c(t) \,\big], \qquad (2.5)

where q(t) is the adjoint variable associated with the differential equation (2.2). The discussion in the previous section shows that q(t) can be interpreted as the marginal value of capital. According to (1.20), the Hamiltonian must be maximized with respect to the control variables, c(t) and P(t). Assuming an interior solution, we obtain

U_c(c(t), P(t)) = q(t), \qquad (2.6)

U_P(c(t), P(t)) = -q(t)\, f_P(k(t), P(t)), \qquad (2.7)
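To connect the example with the machinery of section 1, the following minimal sketch carries out the Hamiltonian maximization over (c(t), P(t)) at a single instant, for an arbitrary capital stock k and adjoint value q, and checks the result against the interior first-order conditions (2.6)-(2.7). The functional forms U(c, P) = log c - γP and f(k, P) = A k^α P^(1-α), and all parameter values, are hypothetical choices made for illustration; they are not the forms used in Chapter VII.

import numpy as np
from scipy.optimize import minimize

# Hamiltonian maximization step for the city-growth example, i.e. the analogue of
# (1.20) at one instant t, given the current capital stock k and adjoint value q.
# The functional forms are hypothetical illustrations only:
#   U(c, P) = log(c) - gamma*P          (crowding lowers utility),
#   f(k, P) = A * k**alpha * P**(1 - alpha).
A, alpha, gamma = 1.0, 0.3, 0.5
lam, ustar = 0.02, 0.0                  # growth rate lambda and u* (constants in H)
k, q = 2.0, 0.8                         # an arbitrary state and adjoint value

def U(c, P):
    return np.log(c) - gamma * P

def f(k_, P):
    return A * k_**alpha * P**(1.0 - alpha)

def neg_H(z):
    c, P = z
    return -(U(c, P) - ustar + q * (f(k, P) - lam * k - c))

res = minimize(neg_H, x0=np.array([1.0, 1.0]), bounds=[(1e-6, None), (1e-6, None)])
c_num, P_num = res.x

# Interior first-order conditions U_c = q and U_P = -q * f_P solved in closed form:
c_foc = 1.0 / q
P_foc = k * (q * A * (1.0 - alpha) / gamma) ** (1.0 / alpha)
print("numerical maximizer (c, P):", c_num, P_num)
print("closed-form FOC solution (c, P):", c_foc, P_foc)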