Control Theory: From Classical to Quantum
Optimal, Stochastic, and Robust Control

Notes for Quantum Control Summer School, Caltech, August 2005

M.R. James*
Department of Engineering
Australian National University
Matthew.James@anu.edu.au

* This work was supported by the Australian Research Council.

Contents

1 Introduction  3

2 Deterministic Dynamic Programming and Viscosity Solutions  5
  2.1 Introduction  5
    2.1.1 Preamble  5
    2.1.2 Optimal Control  5
    2.1.3 Distance Function  9
    2.1.4 Viscosity Solutions  10
  2.2 Value Functions are Viscosity Solutions  12
    2.2.1 The Distance Function is a Viscosity Solution  12
    2.2.2 The Optimal Control Value Function is a Viscosity Solution  14
  2.3 Comparison and Uniqueness  17
    2.3.1 Dirichlet Problem  17
    2.3.2 Cauchy Problem  20

3 Stochastic Control  22
  3.1 Some Probability Theory  22
    3.1.1 Basic Definitions  22
    3.1.2 Conditional Expectations  23
    3.1.3 Stochastic Processes  25
    3.1.4 Martingales  27
    3.1.5 Semimartingales  28
    3.1.6 Markov Processes  29
    3.1.7 Observation Processes  31
    3.1.8 Linear Representation of a Markov Chain  32
  3.2 Controlled State Space Models  32
    3.2.1 Feedback Control Laws or Policies  34
    3.2.2 Partial and Full State Information  34
  3.3 Filtering  34
    3.3.1 Introduction  34
    3.3.2 The Kalman Filter  38
    3.3.3 The Kalman Filter for Controlled Linear Systems  39
    3.3.4 The HMM Filter (Markov Chain)  39
    3.3.5 Filter for Controlled HMM  42
  3.4 Dynamic Programming - Case I: Complete State Information  42
    3.4.1 Optimal Control Problem  43
  3.5 Dynamic Programming - Case II: Partial State Information  46
    3.5.1 Optimal Control of HMM's  47
    3.5.2 Optimal Control of Linear Systems (LQG)  48
  3.6 Two Continuous Time Problems  50
    3.6.1 System and Kalman Filter  50
    3.6.2 LQG Control  51
    3.6.3 LEQG Control  51

4 Robust Control  53
  4.1 Introduction and Background  53
  4.2 The Standard Problem of H∞ Control  54
    4.2.1 The Plant (Physical System Being Controlled)  54
    4.2.2 The Class of Controllers  55
    4.2.3 Control Objectives  55
  4.3 The Solution for Linear Systems  56
    4.3.1 Problem Formulation  56
    4.3.2 Background on Riccati Equations  57
    4.3.3 Standard Assumptions  57
    4.3.4 Problem Solution  58
  4.4 Risk-Sensitive Stochastic Control and Robustness  59

5 Optimal Feedback Control of Quantum Systems  61
  5.1 Preliminaries  61
  5.2 The Feedback Control Problem  62
  5.3 Conditional Dynamics  63
    5.3.1 Controlled State Transfer  63
    5.3.2 Feedback Control  66
  5.4 Optimal Control  69
  5.5 Appendix: Formulas for the Two-State System with Feedback Example  73
6 Optimal Risk-Sensitive Feedback Control of Quantum Systems  74
  6.1 System Model  74
  6.2 Risk-Neutral Optimal Control  76
  6.3 Risk-Sensitive Optimal Control  77
  6.4 Control of a Two Level Atom  80
    6.4.1 Setup  80
    6.4.2 Information State  80
    6.4.3 Dynamic Programming  81
    6.4.4 Risk-Neutral Control  82
  6.5 Control of a Trapped Atom  83
    6.5.1 Setup  83
    6.5.2 Information State  84
    6.5.3 Optimal LEQG Control  85
    6.5.4 Robustness  85

1 Introduction

The purpose of these notes is to provide an overview of some aspects of optimal and robust control theory considered relevant to quantum control. The notes begin with classical deterministic optimal control, move through classical stochastic and robust control, and conclude with quantum feedback control. Optimal control theory is a systematic approach to controller design whereby the desired performance objectives are encoded in a cost function, which is subsequently optimized to determine the desired controller. Robust control theory aims to enhance the robustness (the ability to withstand, to some extent, uncertainty, errors, etc.) of controller designs by explicitly including uncertainty models in the design process. Some of the material is in continuous time, while other material is written in discrete time. There are two underlying and universal themes in the notes: dynamic programming and filtering.

Dynamic programming is one of the two fundamental tools of optimal control, the other being Pontryagin's principle [24]. Dynamic programming is a means by which candidate optimal controls can be verified to be optimal. The procedure is to find a suitable solution to a dynamic programming equation (DPE), which encodes the optimal performance, and to use it to compare the performance of a candidate optimal control. Candidate controls may be determined from Pontryagin's principle, or directly from the solution to the DPE.

In general it is difficult to solve DPEs. Explicit solutions exist in cases like the linear quadratic regulator, but in general approximations must usually be sought. In addition, there are some technical complications regarding the DPE. In continuous time, the DPE is a nonlinear PDE, commonly called the Hamilton-Jacobi-Bellman (HJB) equation. The complications concern differentiability, or lack thereof, and occur even in "simple" classical deterministic problems (see Section 2). This is one reason it can be helpful to work in discrete time, where such regularity issues are much simpler (another reason for working in discrete time is to facilitate digital implementation).
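To make the discrete-time case concrete, the following is a minimal illustrative sketch (not part of the original notes) of finite-horizon dynamic programming by backward recursion for a scalar linear system with quadratic cost on a discretized state and control grid. The model, grids, and horizon are made-up example values.

```python
import numpy as np

# Hypothetical discrete-time problem: x_{k+1} = a*x_k + b*u_k,
# stage cost x^2 + u^2, terminal cost x^2, horizon N.
a, b, N = 1.0, 0.5, 20
xs = np.linspace(-2.0, 2.0, 81)      # state grid
us = np.linspace(-1.0, 1.0, 41)      # control grid

V = xs**2                            # terminal cost V_N(x) = x^2
policy = np.zeros((N, xs.size))

for k in reversed(range(N)):         # backward recursion (dynamic programming)
    Q = np.empty((xs.size, us.size))
    for j, u in enumerate(us):
        xnext = np.clip(a * xs + b * u, xs[0], xs[-1])
        # stage cost plus interpolated cost-to-go at the successor state
        Q[:, j] = xs**2 + u**2 + np.interp(xnext, xs, V)
    V = Q.min(axis=1)                # V_k(x) = min_u { L(x,u) + V_{k+1}(f(x,u)) }
    policy[k] = us[Q.argmin(axis=1)] # minimizing control, stored as a feedback law u_k(x)

print("V_0(1.0) approximately", np.interp(1.0, xs, V))
```

The backward recursion is the discrete-time analogue of the dynamic programming equations discussed below, and no differentiability of the value function is required.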
Filtering concerns the processing of measurement information. In optimal control, filters are used to represent information about the system and control problem of interest. In general, this information is incomplete, i.e. the state is typically not fully accessible, and may be corrupted by noise. To solve optimal control problems in these situations, the cost function is expressed in terms of the state of a suitably chosen filter, which is often called an information state. Dynamic programming can then be applied using the information state dynamics. The nature of the measurements and the purpose for which the data is to be used determine the architecture of the filter. In stochastic situations, this is closely linked to the probabilistic concept of conditional expectation. The famous Kalman filter dynamically computes conditional expectations (of states given measurements in linear gaussian models), which are also optimal estimates in the mean square error sense. The quantum Belavkin filter, or stochastic master equation, also computes a quantum version of conditional expectation. In linear gaussian cases, the information states are gaussian, a fact which considerably simplifies matters due to the finite number of parameters.

Filters such as these, based on computing conditional expectations of states or system variables, do not include any information about the cost or performance objective. While this is not an issue for many problems such as LQG, where the task of estimation can be completely decoupled from that of control [17], there are important problems where the filter dynamics must be modified to take into account the control objective. These problems include LEQG [48, 49] or risk-sensitive control [8, 37], and H∞ robust control [19, 54].

Figure 1 shows a physical system being controlled in a feedback loop. The so-called separation structure of the controller is shown. The control values are computed in the box marked "control", using a function of the information state determined using dynamic programming. The information state, as has been mentioned, is the state of the filter whose dynamics are built into the box marked "filter". This structure embodies the two themes of these notes.

[Figure 1: Feedback controller showing the separation structure. Blocks: physical system, filter, and control, the latter two forming the feedback controller; signals: input u, output y.]

These notes were assembled from various lecture notes and research papers, and so we apologize for the inevitable inconsistencies that resulted.
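As an illustration of the separation structure shown in Figure 1, here is a minimal, hypothetical sketch (not from the original notes) of a discrete-time feedback loop: a filter propagates an information state from the measurements, and the controller is a function of that state only. The specific filter update, control law, gains, and noise levels are placeholders, not any particular filter from the text.

```python
import numpy as np

def filter_update(xhat, u, y):
    """Placeholder information-state update (e.g. one step of a Kalman-type filter)."""
    xpred = 0.9 * xhat + u                 # hypothetical model prediction
    return xpred + 0.5 * (y - xpred)       # correction with a fixed, made-up gain

def control_map(xhat):
    """Placeholder control law: a function of the information state only."""
    return -0.8 * xhat

def plant_step(x, u, rng):
    """Hypothetical physical system with process and measurement noise."""
    xnext = 0.9 * x + u + 0.1 * rng.standard_normal()
    y = xnext + 0.1 * rng.standard_normal()
    return xnext, y

rng = np.random.default_rng(0)
x, xhat, u = 1.0, 0.0, 0.0
for _ in range(50):
    x, y = plant_step(x, u, rng)       # physical system produces measurement y
    xhat = filter_update(xhat, u, y)   # "filter" block: update the information state
    u = control_map(xhat)              # "control" block: function of the information state
print("final information state (estimate):", xhat)
```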
2 Deterministic Dynamic Programming and Viscosity Solutions

References for this section include [24], [25], [3], [15].

2.1 Introduction

2.1.1 Preamble

Hamilton-Jacobi (HJ) equations are nonlinear first-order partial differential equations of the form

    F(x, V(x), \nabla V(x)) = 0    (1)

(one can also consider second-order equations but we do not do so here). Here V(x) (x ∈ Ω ⊂ R^n) is the unknown function to be solved for, and ∇V(x) = (∂V(x)/∂x_1, ..., ∂V(x)/∂x_n) denotes the gradient. F(x, v, λ) is a nonlinear function.

HJ equations have a long history, dating back at least to the calculus of variations of the 19th century, and HJ equations find wide application in science, engineering, etc. Perhaps surprisingly, it was only relatively recently that a satisfactory general notion of solutions for (1) became available, with the introduction of the concept of viscosity solution (Crandall-Lions, c. 1980). The difficulty, of course, is that solutions are not in general globally smooth (e.g. C^1). Solutions are often smooth in certain regions, in which the famous method of characteristics may be used to construct solutions. There are a number of other notions of solution available, such as encountered in non-smooth analysis (e.g. proximal solution), though we will not discuss them here.

In engineering, our principal interest in HJ equations lies in their connection with optimal control (and games) via the dynamic programming methodology. The value function is a solution to an HJ equation, and solutions of HJ equations can be used to test a controller for optimality, or perhaps to construct a feedback controller. In these notes we discuss dynamic programming and viscosity solutions in the context of two examples, and make some mention of the general theory.

2.1.2 Optimal Control

As a first and perhaps familiar example (e.g. LQR), let's consider a finite time horizon optimal control problem defined on a time interval [t_0, t_1]:

    J^*(t_0, x_0) = \inf_{u(\cdot)} J(t_0, x_0, u(\cdot))    (2)

Here, x_0 is the initial state at time t_0, and u(·) is the control; J(t_0, x_0, u(·)) represents the associated cost. To be specific, and to prepare us for dynamic programming, suppose one wants to minimize the cost functional

    J(t, x; u(\cdot)) = \int_t^{t_1} L(x(s), u(s)) \, ds + \psi(x(t_1)),    (3)
where x(·) is the solution of the initial value problem

    \dot{x}(s) = f(x(s), u(s)), \quad t \le s \le t_1, \qquad x(t) = x.    (4)

Here, t ∈ [t_0, t_1] is a "variable" initial time, u(·) is a control defined on [t, t_1] taking values in, say, U ⊂ R^m (U closed), and x(·) is the state trajectory in R^n. We denote by U_{t,t_1} a space of admissible controls, containing at least the piecewise continuous controls.

The value function is defined by

    V(t, x) = \inf_{u(\cdot) \in \mathcal{U}_{t,t_1}} J(t, x; u(\cdot))    (5)

for (t, x) ∈ [t_0, t_1] × R^n. The dynamic programming principle states that for every r ∈ [t, t_1],

    V(t, x) = \inf_{u(\cdot) \in \mathcal{U}_{t,r}} \left\{ \int_t^r L(x(s), u(s)) \, ds + V(r, x(r)) \right\}    (6)

(we will prove this later on). From this, one can derive formally the equation

    \frac{\partial}{\partial t} V(t, x) + H(x, \nabla_x V(t, x)) = 0 \quad \text{in } (t_0, t_1) \times R^n,    (7)

with terminal data

    V(t_1, x) = \psi(x) \quad \text{in } R^n.    (8)

Here, the Hamiltonian is given by

    H(x, \lambda) = \inf_{v \in U} \{ \lambda \cdot f(x, v) + L(x, v) \}    (9)

The nonlinear first-order PDE (7) is the dynamic programming PDE or Hamilton-Jacobi-Bellman (HJB) equation. The pair (7), (8) specify what is called a Cauchy problem, and can be viewed as a special case of (1) together with suitable boundary conditions, using Ω = (t_0, t_1) × R^n. Notice that the Hamiltonian (9) is concave in the variable λ (since it is the infimum of linear functions).

Let us see how (7) is obtained. Set r = t + h, h > 0, and rearrange (6) to yield

    \inf_{u(\cdot)} \left\{ \frac{1}{h} \big( V(t + h, x(t + h)) - V(t, x) \big) + \frac{1}{h} \int_t^{t+h} L(x(s), u(s)) \, ds \right\} = 0.

If V and u(·) are sufficiently smooth, then

    \frac{1}{h} \big( V(t + h, x(t + h)) - V(t, x) \big) \to \frac{\partial}{\partial t} V(t, x) + \nabla_x V(t, x) \cdot f(x, u(t)) \quad \text{as } h \to 0

and

    \frac{1}{h} \int_t^{t+h} L(x(s), u(s)) \, ds \to L(x, u(t)) \quad \text{as } h \to 0.
Combining these displays, one is formally led to (7). A proof of (7) when V is sufficiently smooth requires a careful derivation of two inequalities which combine to give (7). Below we will prove that V is a viscosity solution of (7); in fact, the unique one satisfying the terminal condition (8).

Verification. Let \tilde{V}(t, x) be a C^1 solution of (7), (8). Let u(·) ∈ U_{t_0,t_1} be any control. Then using (7),

    \frac{d}{dt} \tilde{V}(t, x(t)) = \frac{\partial}{\partial t} \tilde{V}(t, x(t)) + \nabla \tilde{V}(t, x(t)) \, \dot{x}(t)
                                    = \frac{\partial}{\partial t} \tilde{V}(t, x(t)) + \nabla \tilde{V}(t, x(t)) \, f(x(t), u(t))
                                    \ge -L(x(t), u(t)).

Integrating, we get

    \tilde{V}(t_1, x(t_1)) - \tilde{V}(t_0, x_0) \ge - \int_{t_0}^{t_1} L(x(t), u(t)) \, dt,

or

    \tilde{V}(t_0, x_0) \le \int_{t_0}^{t_1} L(x(t), u(t)) \, dt + \tilde{V}(t_1, x(t_1))
                          = \int_{t_0}^{t_1} L(x(t), u(t)) \, dt + \psi(x(t_1)),

using (8). This shows that \tilde{V}(t_0, x_0) \le V(t_0, x_0) (V is the value function defined by (5)).

Now this same calculation for the control u(·) = u^*(·) ∈ U_{t_0,t_1} satisfying

    u^*(t) \in \operatorname{argmin}_{v \in U} \left\{ \nabla_x \tilde{V}(t, x^*(t)) \cdot f(x^*(t), v) + L(x^*(t), v) \right\},    (10)

for t ∈ [t_0, t_1], where x^*(·) is the corresponding state trajectory, gives

    \tilde{V}(t_0, x_0) = \int_{t_0}^{t_1} L(x^*(t), u^*(t)) \, dt + \psi(x^*(t_1)),

showing that in fact u^* is optimal and \tilde{V}(t_0, x_0) = V(t_0, x_0). Indeed we have \tilde{V} = V in [t_0, t_1] × R^n by this argument, and so we have shown that any smooth solution to (7), (8) must equal the value function; this is a uniqueness result. Unfortunately, in general there may be no such smooth solutions.

Optimal feedback. The above calculations suggest how one might obtain an optimal feedback controller. To simplify a bit, suppose that

    U = R^m, \quad f(x, u) = f(x) + g(x)u, \quad L(x, u) = \ell(x) + \tfrac{1}{2} |u|^2.

Then evaluating the infimum in (9) gives

    u^* = -g(x)' \lambda' \quad \text{and} \quad H(x, \lambda) = \lambda f(x) - \tfrac{1}{2} \lambda g(x) g(x)' \lambda' + \ell(x).
Hence the HJB equation can be written as

    \frac{\partial}{\partial t} V + \nabla V f - \tfrac{1}{2} \nabla V g g' \nabla V' + \ell = 0    (11)

with optimal feedback controller

    u^*(t, x) = -g(x)' \nabla V(t, x)'.    (12)

This means that the optimal control u^*(·) ∈ U is given by

    u^*(t) = u^*(t, x^*(t)), \quad t_0 \le t \le t_1.

Of course, this makes sense only when V is sufficiently smooth. The equation (11) is sometimes referred to as a nonlinear Riccati equation.

LQR. Take U = R^m, f(x, u) = Ax + Bu, L(x, u) = ½|x|^2 + ½|u|^2, ψ(x) = ½ x'Ψx. As a trial solution of (11) we use \tilde{V}(t, x) = ½ x'P(t)x, where P(t) ≥ 0 (symmetric) is to be determined. Now ∂\tilde{V}(t, x)/∂t = ½ x'\dot{P}(t)x, and ∇\tilde{V}(t, x) = x'P(t). Plugging these into (11) gives

    \tfrac{1}{2} x' \dot{P}(t) x + x' P(t) A x - \tfrac{1}{2} x' P(t) B B' P(t) x + \tfrac{1}{2} x' x = 0.

Since this holds for all x ∈ R^n we must have

    \dot{P}(t) + A' P(t) + P(t) A - P(t) B B' P(t) + I = 0.    (13)

At time t = t_1 we have \tilde{V}(t_1, x) = ½ x'Ψx, and so

    P(t_1) = \Psi.    (14)

Therefore if there exists a C^1 solution P(t) to the Riccati differential equation (13) on [t_0, t_1] with terminal condition (14), we obtain a smooth solution ½ x'P(t)x to (7), (8), and as argued above the value function for the LQR problem is given by

    V(t, x) = \tfrac{1}{2} x' P(t) x.    (15)

The optimal feedback controller is given by

    u^*(t, x) = -B' P(t) x.    (16)

This gives the optimal control u^*(·) ∈ U:

    u^*(t) = -B' P(t) x^*(t), \quad t_0 \le t \le t_1.    (17)
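To illustrate the LQR result numerically, here is a minimal sketch (not part of the original notes) that integrates the Riccati equation (13) backward in time from the terminal condition (14) and forms the feedback gain in (16). The matrices A, B, Ψ, the horizon, and the Euler step size are arbitrary example choices.

```python
import numpy as np

# Hypothetical example data for the LQR problem above
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
Psi = np.eye(2)                      # terminal weight, P(t1) = Psi, cf. (14)
t0, t1, dt = 0.0, 5.0, 1e-3

def riccati_rhs(P):
    # dP/dt = -(A'P + P A - P B B' P + I), from equation (13)
    return -(A.T @ P + P @ A - P @ B @ B.T @ P + np.eye(2))

# integrate backward from t1 to t0 (simple Euler steps, for illustration only)
P = Psi.copy()
for _ in np.arange(t1, t0, -dt):
    P = P - dt * riccati_rhs(P)      # stepping backward in time

K0 = B.T @ P                         # gain at t0: u*(t0, x) = -B'P(t0) x, cf. (16)
print("P(t0) approximately\n", P)
print("gain B'P(t0) =", K0)
```

In this time-invariant example, P(t_0) approaches the stabilizing solution of the algebraic Riccati equation as the horizon grows; a library routine such as scipy.linalg.solve_continuous_are could be used as a cross-check.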
2.1.3 Distance Function

As another example, we consider the distance function d(x, ∂Ω) to the boundary ∂Ω of an open, bounded set Ω ⊂ R^n. In some ways the HJ equation for this function is simpler than that of the optimal control problem described above, and we can more easily explain viscosity solutions and issues of uniqueness, etc., in this context.

The distance function is defined by

    d(x, \partial\Omega) = \inf_{y \in \partial\Omega} |x - y|.    (18)

Note that the infimum here is always attained, not necessarily uniquely, since ∂Ω is compact and y ↦ |x − y| is continuous; denote by π(x) ⊂ ∂Ω the set of minimizing y. We write

    V(x) = d(x, \partial\Omega)    (19)

for simplicity, and consider V(x) as a function on the closed set Ω̄. It can be verified that V(x) is a non-negative Lipschitz continuous function. In fact, we shall see that V is the unique continuous viscosity solution of

    |\nabla V| - 1 = 0 \quad \text{in } \Omega    (20)

satisfying the boundary condition

    V = 0 \quad \text{on } \partial\Omega.    (21)

Equations (20) and (21) constitute a Dirichlet problem.

Example 2.1 Ω = (−1, 1) ⊂ R^1. Here, ∂Ω = {−1, 1} and Ω̄ = [−1, 1]. Then

    V(x) = \begin{cases} 1 + x & \text{if } -1 \le x \le 0, \\ 1 - x & \text{if } 0 \le x \le 1, \end{cases}

which is Lipschitz continuous, and differentiable except at x = 0. At each point x ≠ 0, V solves the HJ equation (20), and V satisfies the boundary condition (21) (V(−1) = V(1) = 0); see Figure 2. Note that π(x) = {−1} for −1 ≤ x < 0, π(x) = {1} for 0 < x ≤ 1, and π(0) = {−1, 1}.

For any x ∈ Ω and r > 0 we have

    V(x) = \inf_{|x - z| < r} \{ |x - z| + V(z) \}.    (22)

We will use this later to show that V is a viscosity solution of (20), but for now we discuss and derive (22).
[Figure 2: Distance function V and another Lipschitz solution V_1, plotted on the interval [−1, 1].]

Fix x ∈ Ω and r > 0, and let |x − z| < r. Choose y^*(z) ∈ π(z), so that V(z) = |z − y^*(z)|. Then

    V(x) \le |x - y^*(z)| \le |x - z| + |z - y^*(z)| = |x - z| + V(z).

Since this holds for all |x − z| < r, we have

    V(x) \le \inf_{|x - z| < r} \{ |x - z| + V(z) \}.

To see that equality holds, simply take z = x. This establishes (22). Note that there are many minimizers z^* for the RHS of (22), viz. segments of the lines joining x to points in π(x).

2.1.4 Viscosity Solutions

We turn now to the concept of viscosity solution for the HJ equation (1). The terminology comes from the vanishing viscosity method, which finds a solution V of (1) as a limit V^ε → V of solutions to

    -\frac{\varepsilon}{2} \Delta V^{\varepsilon}(x) + F(x, V^{\varepsilon}(x), \nabla V^{\varepsilon}(x)) = 0.    (23)

The Laplacian term \frac{\varepsilon}{2} \Delta V^{\varepsilon} = \frac{\varepsilon}{2} \sum_{i=1}^n \frac{\partial^2}{\partial x_i^2} V^{\varepsilon} can be used to model fluid viscosity. The definition below is quite independent of this limiting construction, and is closely related to dynamic programming; however, the definition applies also to equations that do not necessarily correspond to optimal control.
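To illustrate the vanishing viscosity idea on the distance-function example above, here is a minimal numerical sketch (not from the original notes). It relaxes the regularized equation (23) with F(x, v, λ) = |λ| − 1 on Ω = (−1, 1) and V^ε = 0 on ∂Ω by pseudo-time marching, and compares the result with the distance function 1 − |x|. The grid size, ε, time step, and iteration count are arbitrary illustrative choices.

```python
import numpy as np

# Solve -(eps/2) V'' + |V'| - 1 = 0 on (-1, 1), V(-1) = V(1) = 0,
# by marching the parabolic relaxation V_tau = (eps/2) V'' - (|V'| - 1)
# to an approximate steady state.
eps = 0.1
n = 201
x = np.linspace(-1.0, 1.0, n)
dx = x[1] - x[0]
dt = 0.2 * dx * dx / eps            # conservative explicit time step
V = np.zeros(n)                     # initial guess, satisfies the boundary condition

for _ in range(100_000):
    Vx = (V[2:] - V[:-2]) / (2 * dx)                  # central first derivative
    Vxx = (V[2:] - 2 * V[1:-1] + V[:-2]) / dx**2      # second derivative
    V[1:-1] += dt * (0.5 * eps * Vxx - (np.abs(Vx) - 1.0))
    # boundary values V[0] = V[-1] = 0 are never updated

dist = 1.0 - np.abs(x)              # exact distance function for this interval
print("max |V_eps - d(x, boundary)| =", np.max(np.abs(V - dist)))
```

Decreasing eps should bring V^ε closer to the distance function (at the cost of a stiffer relaxation); this is the sense in which the viscosity solution is the vanishing viscosity limit.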