
School of Automation, XJTU © CAI YUANLI

Stochastic Optimal Control

Contents
1 Principle of Optimality
2 Fundamentals of Deterministic Optimal Control
2.1 Continuous-Time Systems
2.2 Discrete-Time Systems
2.3 Numerical Solution of Optimal Control Problems
3 Stochastic Dynamic Programming

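3.1 Discrete-Time Systems
3.2 Continuous-Time Systems
4 Continuous-Time Linear-Quadratic-Gaussian Problem
4.1 Complete State Information
4.2 Incomplete State Information and the Separation Principle
5 Discrete-Time Linear-Quadratic-Gaussian Problem
5.1 Complete State Information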

5.2 Incomplete State Information
6 Suboptimal Control Methods for Nonlinear Stochastic Systems
6.1 Perturbation Method
6.2 Forced Separation Method

1 Principle of Optimality

An optimal policy has the property that no matter what the previous decisions (i.e., controls) have been, the remaining decisions must constitute an optimal policy with regard to the state resulting from those previous decisions.

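In other words, whatever the past control policy was, the control policy for the remaining stages must be optimal with respect to the current state (Bellman, 1957).

[Example] Let C be any point on the optimal path from A to B. Then the optimal path from C to B is exactly the C-to-B portion of the optimal path from A to B.

[Figure: an optimal path from A to B passing through the intermediate point C]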
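The path example can be checked numerically. The following is a minimal sketch, assuming a small hypothetical weighted graph (the node names D and E and all edge weights are illustrative and not taken from the lecture) and using Dijkstra's algorithm as the shortest-path solver. It compares the tail of the optimal A-to-B path, starting at C, with the optimal path computed directly from C to B.

```python
import heapq

def shortest_path(graph, start, goal):
    """Dijkstra's algorithm; returns (cost, path) for non-negative edge weights."""
    queue = [(0.0, start, [start])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in settled:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return float("inf"), []

# Hypothetical network: illustrative weights only.
graph = {
    "A": {"C": 2.0, "D": 5.0},
    "C": {"E": 2.0, "B": 6.0},
    "D": {"B": 2.0},
    "E": {"B": 1.0},
}

cost_ab, path_ab = shortest_path(graph, "A", "B")
cost_cb, path_cb = shortest_path(graph, "C", "B")

# Portion of the optimal A-to-B path that starts at C.
tail = path_ab[path_ab.index("C"):]
print(path_ab, tail, path_cb)
```

In this graph the optimal A-to-B path passes through C, and its tail coincides with the path found by solving the C-to-B problem on its own, as the principle of optimality requires.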

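2 Fundamentals of Deterministic Optimal Control

2.1 Continuous-Time Systems

$$\dot{x}(t) = f(x(t), u(t), t), \quad x(t_0) = x_0 \tag{2.1}$$

$$J(x_0, t_0) = \varphi[T, x(T)] + \int_{t_0}^{T} L(x, u, t)\, dt \tag{2.2}$$

Problem: find $u^*(t) \in \mathfrak{A}$, where $\mathfrak{A}$ is the set of admissible controls, such that $J^*(x_0, t_0) \le J(x_0, t_0)$. Here $J^*$ is the cost attained by $u^*$ and $J$ is the cost of any other admissible control.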
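To make the performance index (2.2) concrete, the sketch below evaluates $J$ for a given control law by forward-Euler integration of (2.1) with a left Riemann sum for the running cost. The function name `evaluate_cost`, the feedback-style signature `u(x, t)`, and the scalar example ($f = -x + u$, $L = x^2 + u^2$, $\varphi = 0$, $x_0 = 1$, $T = 1$) are all assumptions made for illustration; they are not part of the lecture.

```python
import math

def evaluate_cost(f, L, phi, u, x0, t0, T, steps=1000):
    """Approximate J(x0, t0) in (2.2) with forward Euler and a left Riemann sum."""
    dt = (T - t0) / steps
    x, t, J = x0, t0, 0.0
    for _ in range(steps):
        control = u(x, t)
        J += L(x, control, t) * dt   # accumulate running cost
        x += f(x, control, t) * dt   # Euler step of the dynamics (2.1)
        t += dt
    return J + phi(T, x)             # add terminal cost

# Assumed scalar example (illustrative only).
f = lambda x, u, t: -x + u
L = lambda x, u, t: x * x + u * u
phi = lambda T, x: 0.0

J_zero = evaluate_cost(f, L, phi, lambda x, t: 0.0, x0=1.0, t0=0.0, T=1.0)
J_feedback = evaluate_cost(f, L, phi, lambda x, t: -0.4 * x, x0=1.0, t0=0.0, T=1.0)
print(J_zero, J_feedback)
```

With $u = 0$ the state is $x(t) = e^{-t}$, so the exact cost is $(1 - e^{-2})/2 \approx 0.432$, which gives a simple check on the integrator. Comparing the costs of different candidate controls in this way is what the optimization problem formalizes.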

In general,

$$J^*[x(t), t] = \min_{\substack{u(\tau) \\ t \le \tau \le T}} \left\{ \varphi[T, x(T)] + \int_{t}^{T} L[x(\tau), u(\tau), \tau]\, d\tau \right\} \tag{2.3}$$

By the principle of optimality,

$$J^*[x(t), t] = \min_{\substack{u(\tau) \\ t \le \tau \le t+\Delta t}} \left\{ \int_{t}^{t+\Delta t} L[x(\tau), u(\tau), \tau]\, d\tau + J^*[x(t+\Delta t), t+\Delta t] \right\}$$

Expanding the second term on the right-hand side in a Taylor series:

$$J^*[x(t+\Delta t), t+\Delta t] = J^*[x(t), t] + J_t^*[x(t), t]\,\Delta t + J_x^*[x(t), t]^T f[x(t), u(t), t]\,\Delta t + o(\Delta t)$$

Substitute this expansion and approximate the integral by $L[x(t), u(t), t]\,\Delta t + o(\Delta t)$. The terms $J^*[x(t), t]$ and $J_t^*[x(t), t]\,\Delta t$ do not depend on $u$, so they can be taken outside the minimization. Cancelling $J^*[x(t), t]$ on both sides, dividing by $\Delta t$, and letting $\Delta t \to 0$ gives:

$$-J_t^*[x(t), t] = \min_{u(t)} \left\{ L[x(t), u(t), t] + J_x^*[x(t), t]^T f[x(t), u(t), t] \right\} \tag{2.4}$$

Define the Hamiltonian

$$H[x(t), u(t), J_x^*, t] = L[x(t), u(t), t] + J_x^*[x(t), t]^T f[x(t), u(t), t] \tag{2.5}$$

Then

$$-J_t^*[x(t), t] = \min_{u(t)} H[x(t), u(t), J_x^*, t] \tag{2.6}$$
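A candidate value function can be tested against (2.6) numerically. The sketch below assumes a hypothetical scalar example that is not in the lecture: $\dot{x} = u$, $L = x^2 + u^2$, $\varphi = 0$, horizon $T = 1$. For this example the candidate $J^*(x, t) = \tanh(T - t)\,x^2$ can be verified by hand, and the code checks with central finite differences that the residual $-J_t^* - \min_u H$ is numerically zero.

```python
import math

T = 1.0  # assumed horizon for the illustrative example

def J_star(x, t):
    """Candidate value function for x' = u, L = x^2 + u^2, phi = 0."""
    return math.tanh(T - t) * x * x

def hjb_residual(x, t, h=1e-5):
    """Return -J_t - min_u H, which should vanish if J_star solves (2.6)."""
    J_t = (J_star(x, t + h) - J_star(x, t - h)) / (2.0 * h)
    J_x = (J_star(x + h, t) - J_star(x - h, t)) / (2.0 * h)
    # H = x^2 + u^2 + J_x * u is convex in u; its minimizer is u = -J_x / 2.
    u_opt = -J_x / 2.0
    H_min = x * x + u_opt * u_opt + J_x * u_opt
    return -J_t - H_min

for x, t in [(0.7, 0.3), (-1.5, 0.9), (2.0, 0.0)]:
    print(x, t, hjb_residual(x, t))
```

The residual is zero only up to finite-difference and rounding error, so the check is a sanity test of a guessed solution rather than a way of solving the HJB equation.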

Equation (2.6) is the Hamilton-Jacobi-Bellman (HJB) equation, with boundary condition

$$J^*[x(T), T] = \varphi[T, x(T)] \tag{2.7}$$

[Pontryagin's minimum principle]

$$H[x(t), u(t), \lambda(t), t] = L[x(t), u(t), t] + \lambda^T(t) f[x(t), u(t), t] \tag{2.8}$$

$$u^*(t) = \arg\min_{u(t)} H[x(t), u(t), \lambda(t), t] \tag{2.9}$$
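The pointwise minimization in (2.9) can be illustrated on a scalar case. The sketch below assumes a hypothetical example that is not from the lecture: $f = u$ and $L = x^2 + u^2$, so that $H = x^2 + u^2 + \lambda u$ with closed-form minimizer $u^* = -\lambda/2$. A ternary search stands in for the $\arg\min$; it is valid here only because $H$ is convex in $u$, and the search bounds are arbitrary illustrative choices.

```python
def hamiltonian(x, u, lam, t):
    """H = L + lam * f for the assumed example: L = x^2 + u^2, f = u."""
    return x * x + u * u + lam * u

def argmin_hamiltonian(x, lam, t, lo=-10.0, hi=10.0, iters=200):
    """Ternary search for the minimizing u on [lo, hi] (H must be unimodal in u)."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if hamiltonian(x, m1, lam, t) < hamiltonian(x, m2, lam, t):
            hi = m2   # minimizer lies in [lo, m2]
        else:
            lo = m1   # minimizer lies in [m1, hi]
    return 0.5 * (lo + hi)

x, lam, t = 1.0, 1.2, 0.0
u_numeric = argmin_hamiltonian(x, lam, t)
u_closed_form = -lam / 2.0
print(u_numeric, u_closed_form)
```

Comparing (2.8) with (2.5), the costate $\lambda(t)$ plays the role of $J_x^*[x(t), t]$ along the optimal trajectory when $J^*$ is smooth, so the same minimization appears in both the HJB equation and the minimum principle.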