variable. The transformation of the problem is accomplished if a simple way to represent these two effects is found. This leads us to the concept of the value function, which might be used by a planner who wanted to recalculate the optimal policy at time $t$ after the dynamic process began. Consider the problem of maximizing

$$\int_t^{t_1} f_0(x(t'),u(t'),t')\,dt' + S_0(x(t_1),t_1) \qquad (1.6)$$

when the state variable at time $t$ is $x$; $x(t)=x$. The maximized value is then a function of $x$ and $t$:

$$J^*(x,t), \qquad (1.7)$$

which is called the value function. The optimal value of the objective functional for the original problem (1.2)-(1.5) is

$$J^*(x^*(t_0),t_0) = J^*(x_0,t_0). \qquad (1.8)$$

The usefulness of the value function must be obvious by now: it facilitates the characterization of the indirect effect through a change in the state variable by summarizing the maximum possible value of the objective functional from time $t$ on as a function of the state variable at time $t$ (and $t$).
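To make the definition concrete, here is a minimal sketch that approximates $J^*(x,t)$ by brute force for a discretized version of (1.6): time on $[t,t_1]$ is cut into a few steps, the control is restricted to a small grid, every piecewise-constant control sequence is enumerated, and the largest resulting payoff is returned. The particular payoff $f_0$, salvage term $S_0$, dynamics $f$, grids, and initial data below are hypothetical stand-ins chosen only for illustration; none of them come from the original problem (1.2)-(1.5).

```python
import itertools

# Hypothetical stand-ins for the ingredients of problem (1.6); these
# functional forms are illustrative assumptions, not taken from the text.
f0 = lambda x, u, t: -(x**2 + u**2)   # running payoff f0(x, u, t)
S0 = lambda x, t1: -x**2              # salvage value S0(x(t1), t1)
f  = lambda x, u, t: u                # state equation x' = f(x, u, t)

def value_function(x, t, t1=1.0, steps=6, controls=(-1.0, 0.0, 1.0)):
    """Brute-force approximation of J*(x, t): discretize [t, t1] into `steps`
    intervals, restrict u to a finite grid, enumerate every piecewise-constant
    control sequence, and return the largest discretized payoff in (1.6)."""
    dt = (t1 - t) / steps
    best = float("-inf")
    for seq in itertools.product(controls, repeat=steps):
        state, payoff = x, 0.0
        for k, u in enumerate(seq):
            payoff += f0(state, u, t + k * dt) * dt   # accumulate the integral term
            state += f(state, u, t + k * dt) * dt     # Euler step of the dynamics
        best = max(best, payoff + S0(state, t1))      # add the salvage term
    return best

# J*(x, t) from an arbitrary interior state and time ...
print(value_function(x=0.5, t=0.4))
# ... and J*(x0, t0), the value of the whole problem as in (1.8),
# for a hypothetical initial condition x(t0) = 1.0 at t0 = 0.
print(value_function(x=1.0, t=0.0))
```

The enumeration grows exponentially in the number of time steps; the Principle of Optimality introduced next is precisely what replaces this brute-force search with a recursion in the value function.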
The next step in the derivation of the necessary conditions for the optimum involves the celebrated Principle of Optimality due to Bellman. The principle exploits the fact that the value of the state variable at time $t$ captures all the necessary information for the decision making from time $t$ on: the paths of the control vector and the state variable up to time $t$ do not make any difference as long as the state variable at time $t$ is the same. This implies that if a planner recalculates the optimal policy at time $t$ given the optimal value of the state variable at that time, the new optimal policy coincides with the original optimal policy. Thus if $u^*(t)$, $t_0 \le t \le t_1$, is the optimal control for the original problem and $x^*(t)$, $t_0 \le t \le t_1$, the corresponding trajectory of the state variable, the value function satisfies

$$J^* = \int_{t_0}^{t_1} f_0(x^*(t'),u^*(t'),t')\,dt' + S_0(x^*(t_1),t_1). \qquad (1.9)$$

Applying the principle of optimality again, we can rewrite (1.9) as

$$\begin{aligned}
J^*(x^*(t),t) &= \int_t^{t+\Delta t} f_0(x^*(t'),u^*(t'),t')\,dt' + \int_{t+\Delta t}^{t_1} f_0(x^*(t'),u^*(t'),t')\,dt' + S_0(x^*(t_1),t_1)\\
&= \int_t^{t+\Delta t} f_0(x^*(t'),u^*(t'),t')\,dt' + J^*\bigl(x^*(t+\Delta t),\,t+\Delta t\bigr)
\end{aligned} \qquad (1.10)$$

for any $t$ and $t+\Delta t$ such that $t_0 \le t \le t+\Delta t \le t_1$. This construction allows us to