MIT: Multidisciplinary System Design Optimization (MSDO), Lecture 14: Lagrange Multipliers


Slide 1: Multidisciplinary System Design Optimization (MSDO)
Post-Optimality Analysis
Lecture 14, 17 March 2004
Olivier de Weck, Karen Willcox
© Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics

Slide 2: Today's Topics
• Optimality Conditions & Termination
– Gradient-based techniques
– Heuristic techniques
• Lagrange Multipliers
• Objective Function Behavior
• Scaling

Slide 3: Standard Problem Definition

  min J(x)
  s.t. g_j(x) ≤ 0,   j = 1,..,m1
       h_k(x) = 0,   k = 1,..,m2
       x_i^l ≤ x_i ≤ x_i^u,   i = 1,..,n

For now, we consider a single objective function, J(x). There are n design variables, and a total of m constraints (m = m1 + m2). The bounds are known as side constraints.

Slide 4: Kuhn-Tucker Conditions

If x* is optimum, these conditions are satisfied:
1. x* is feasible
2. λ_j g_j(x*) = 0, j = 1,..,m1, and λ_j ≥ 0
3. ∇J(x*) + Σ_{j=1}^{m1} λ_j ∇g_j(x*) + Σ_{k=1}^{m2} λ_{m1+k} ∇h_k(x*) = 0,
   with λ_j ≥ 0 and λ_{m1+k} unrestricted in sign

The Kuhn-Tucker conditions are necessary and sufficient if the design space is convex.
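The three conditions above can be checked numerically at a candidate point. Below is a minimal sketch for a hand-solvable toy problem (min x1² + x2² subject to x1 ≥ 1, i.e. g(x) = 1 − x1 ≤ 0, whose optimum is x* = (1, 0) with λ* = 2); the function names and tolerances are illustrative, not from the lecture:

```python
# Toy check of the Kuhn-Tucker conditions for
#   min J(x) = x1^2 + x2^2   s.t.  g(x) = 1 - x1 <= 0
# The optimum is x* = (1, 0) with multiplier lambda = 2.
# (All names and tolerances here are illustrative.)

def grad_J(x):
    return [2 * x[0], 2 * x[1]]

def g(x):
    return 1.0 - x[0]

def grad_g(x):
    return [-1.0, 0.0]

def kt_satisfied(x, lam, tol=1e-8):
    # 1. feasibility
    if g(x) > tol:
        return False
    # 2. complementary slackness: lam * g(x*) = 0, with lam >= 0
    if lam < 0 or abs(lam * g(x)) > tol:
        return False
    # 3. stationarity of the Lagrangian: grad J + lam * grad g = 0
    gJ, gg = grad_J(x), grad_g(x)
    return all(abs(gJ[i] + lam * gg[i]) <= tol for i in range(len(x)))

print(kt_satisfied([1.0, 0.0], 2.0))   # True: all three conditions hold at x*
print(kt_satisfied([2.0, 0.0], 0.0))   # False: feasible, but the gradient is nonzero
```

The same pattern extends to several constraints by summing λ_j ∇g_j over the active set.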


Slide 5: Kuhn-Tucker Conditions: Interpretation

Condition 1: the optimal design satisfies the constraints.
Condition 2: if a constraint is not precisely satisfied, then the corresponding Lagrange multiplier is zero.
– the jth Lagrange multiplier represents the sensitivity of the objective function to the jth constraint
– it can be thought of as representing the "tightness" of the constraint
– if λ_j is large, then constraint j is important for this solution
Condition 3: the gradient of the Lagrangian vanishes at the optimum.

Slide 6: Optimization for Engineering Problems
• Most engineering problems have a complicated design space, usually with several local optima
• Gradient-based methods can have trouble converging to the correct solution
• Heuristic techniques offer absolutely no guarantee of optimality, either global or local
• Your post-optimality analysis should address the questions:
– How confident are you that you have found the global optimum?
– Do you actually care?

Slide 7: Optimization for Engineering Problems
• Usually we cannot guarantee that the absolute optimum is found
– local optima
– numerical ill-conditioning
⇒ gradient-based techniques should be started from several initial solutions
⇒ the best solution from a heuristic technique should be checked with the KT conditions, or used as an initial condition for a gradient-based algorithm
• We can determine mathematically whether we have a relative minimum, but the Kuhn-Tucker conditions are only sufficient if the problem is convex
• It is very important to interrogate the "optimum" solution

Slide 8: Termination Criteria: Gradient-Based
A gradient-based algorithm is terminated when an acceptable solution is found, OR the algorithm terminates unsuccessfully.
Need to decide:
• when an acceptable solution is found
• when to stop the algorithm with no acceptable solution
– when progress is unreasonably slow
– when a specified amount of resources has been used (time, number of iterations, etc.)
– when an acceptable solution does not exist
– when the iterative process is cycling
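The "unreasonably slow progress" and "resource budget" stopping rules above can be sketched in a few lines. The descent method below is just a crude stand-in (a fixed-step 1-D search), and the thresholds are illustrative:

```python
# Sketch of two of the stopping rules above: stop when progress per
# iteration falls below a tolerance, or when the iteration budget is
# spent. The toy objective, step size, and tolerances are illustrative.

def minimize_1d(J, x0, step=0.1, tol=1e-6, max_iter=1000):
    """Crude fixed-step descent on a 1-D objective, with termination checks."""
    x, Jx = x0, J(x0)
    for it in range(max_iter):
        # try a step in each direction, keep the better one
        candidates = [x - step, x + step]
        x_new = min(candidates, key=J)
        J_new = J(x_new)
        if Jx - J_new <= tol:          # progress unreasonably slow -> stop
            return x, Jx, "slow_progress", it
        x, Jx = x_new, J_new
    return x, Jx, "budget_exhausted", max_iter

x, Jx, reason, its = minimize_1d(lambda x: (x - 2.0) ** 2, x0=0.0)
print(reason)                  # 'slow_progress' once steps stop helping
print(abs(x - 2.0) <= 0.1)     # True: within one step of the minimizer
```

Note that this stop is a statement about the algorithm, not about optimality: the returned point still needs to be interrogated, as the following slides emphasize.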


Slide 9: Termination Criteria: Gradient-Based
• Is x^k an acceptable solution?
– does x^k almost satisfy the conditions for optimality?
– has the sequence {x^k} converged?
• Often the first question is difficult to test
• The tests are often approximate
• Often we rely on the answer to the second question
• But convergence to a non-optimal solution, or extended lack of progress, can look the same as convergence to the correct solution!
• No one set of termination criteria is suitable for all optimization problems and all methods

Slide 10: Termination Criteria: Gradient-Based
Has the sequence {x^k} converged?

  Ideally:      |J^k − J*| ≤ ε    or    ||x^k − x*|| ≤ ε
  In practice:  |J^k − J^{k+1}| ≤ ε    or    ||x^k − x^{k+1}|| ≤ ε

Also should check constraint satisfaction: ||g^k|| ≤ ε

Slide 11: Termination Criteria: Heuristics
GA Termination:
[Figure: typical GA results, best fitness vs. generation, with the (unknown) global optimum marked; one run converged too fast (mutation rate too small?)]
– GEN = max(GEN): maximum number of generations reached
– Stagnation in fitness: no progress made on the objective
– Dominant schema have emerged

Slide 12: Termination Criteria: Heuristics
[Figure-only slide; content not recovered in the extraction]
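The practical tests above can be collected into one helper: stop when the change in objective or iterate between successive iterations is small and the constraints are (nearly) satisfied. A sketch with illustrative tolerances:

```python
# The practical convergence tests above, combined: |J^k - J^(k+1)| or
# ||x^k - x^(k+1)|| small, AND ||g^k|| small. Tolerances illustrative.
import math

def converged(J_prev, J_curr, x_prev, x_curr, g_curr, eps=1e-6):
    dJ = abs(J_curr - J_prev)                                          # |J^k - J^(k+1)|
    dx = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_curr, x_prev)))  # ||x^k - x^(k+1)||
    gnorm = math.sqrt(sum(gi ** 2 for gi in g_curr))                   # ||g^k||
    return (dJ <= eps or dx <= eps) and gnorm <= eps

print(converged(1.0000001, 1.0000002, [0.0, 1.0], [0.0, 1.0], [0.0]))  # True
print(converged(1.0, 2.0, [0.0, 0.0], [1.0, 1.0], [0.0]))              # False
```

Requiring the constraint check alongside the progress check guards against declaring "convergence" at an infeasible stall point.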


Slide 13: Termination Criteria: Heuristics
• Simulated Annealing - cooling schedule: T(k) = f(k, T0)
[Figure: temperature decaying from T0 toward 0; high T searches broadly, low T searches locally]
Search stops when T(k) ≤ ε, where ε > 0, but small
• Tabu search termination
– usually after a predefined number of iterations
– the best solution found is reported
– no guarantee of optimality
– experimentation gives confidence

Slide 14: Termination in iSIGHT
• Be careful with iSIGHT: the last solution is not necessarily the best one
• Need to look back over all the solutions in the monitor to find the "optimum"
• Feasibility parameter:
  3 = infeasible
  7 = feasible
  8 = feasible and equal to the best design found so far
  9 = feasible and the best design found so far
• The "optimum" will be the last solution with feasibility = 9

Slide 15: Post-Optimality Analysis
• Already talked about sensitivity analysis (Lecture 9)
– How does the optimal solution change as a parameter is varied?
– How does the optimal solution change as a design variable value is varied?
– How does the optimal solution change as constraints are varied?
• Also would like to understand the key drivers in the optimal design

Slide 16: Lagrange Multipliers
• The values of the Lagrange multipliers at the optimal solution, λ_j*, give information on the constraints.
• If λ_j* is zero, then constraint j is inactive.
• If λ_j* is positive, then constraint j is active.
• The value of the jth Lagrange multiplier tells you by how much the objective function will change if the constraint is varied by a small amount:

  ∂J(x*)/∂g(x*) = −λ^T

– λ is a vector containing the m Lagrange multipliers
– g is a vector containing all m constraints (inequality + equality)
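The relation ∂J/∂g = −λ^T can be checked numerically on a problem solved by hand. For min x² subject to g(x) = 1 − x ≤ 0, the optimum is x* = 1 with λ* = 2, and relaxing the constraint to g(x) ≤ δ moves the minimizer to x = 1 − δ, so the optimal objective changes by about −λδ. The problem and step size below are illustrative:

```python
# Numeric illustration of dJ*/dg = -lambda for the toy problem
#   min x^2  s.t.  g(x) = 1 - x <= 0,
# whose optimum is x* = 1, J* = 1, lambda* = 2. Relaxing the
# constraint to g(x) <= delta moves the optimum to x = 1 - delta.
# (The problem and the finite-difference step are illustrative.)

def J_opt(delta):
    x = 1.0 - delta        # minimizer of the relaxed problem (solved by hand)
    return x * x

lam = 2.0
delta = 1e-5
finite_diff = (J_opt(delta) - J_opt(0.0)) / delta
print(finite_diff)                      # approximately -2 = -lambda
print(abs(finite_diff + lam) < 1e-3)    # True
```

This is the "tightness" interpretation from slide 5 made concrete: a large multiplier means the objective pays a large price for tightening that constraint.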


Slide 17: Objective Function Behavior
Consider the quadratic function:

  Φ(x) = c^T x + (1/2) x^T H x

The behavior of Φ(x) in the neighborhood of a local minimum is determined by the eigenvalues of H:

  Φ(x̂ + αp) = c^T (x̂ + αp) + (1/2)(x̂ + αp)^T H (x̂ + αp)
            = c^T x̂ + (1/2) x̂^T H x̂ + α c^T p + (1/2) α² p^T H p + (1/2) α x̂^T H p + (1/2) α p^T H x̂
  Φ(x̂ + αp) = Φ(x̂) + α p^T (H x̂ + c) + (1/2) α² p^T H p

(using the symmetry of H, x̂^T H p = p^T H x̂)

Slide 18: Objective Function Behavior
Consider the neighborhood of the optimal solution: x̂ = x*

  ∇Φ(x*) = H x* + c = 0,  or  H x* = −c

  Φ(x* + αp) = Φ(x*) + (1/2) α² p^T H p

The behavior of Φ in the neighborhood of x* is determined by H.

Slide 19: Objective Function Behavior
Let v_j, λ_j be the jth eigenvector and eigenvalue of H:

  H v_j = λ_j v_j,  and  v_i^T v_j = δ_ij since H is symmetric.

Consider the case when p = v_j:

  Φ(x* + α v_j) = Φ(x*) + (1/2) α² v_j^T H v_j = Φ(x*) + (1/2) α² λ_j

As we move away from x* along the direction v_j, the change in the objective depends on the sign of λ_j.

Slide 20: Objective Function Behavior

  Φ(x* + α v_j) = Φ(x*) + (1/2) α² λ_j

• If λ_j > 0, Φ increases
• If λ_j < 0, Φ decreases
• If all λ_j > 0, x* is a minimum of Φ
• The contours of Φ are ellipsoids
– principal axes in directions of eigenvectors
– lengths of principal axes inversely proportional to square roots of eigenvalues
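The result Φ(x* + αv_j) = Φ(x*) + ½α²λ_j can be verified on a small quadratic with a diagonal H, whose eigenpairs can be written down by hand. The numbers below are illustrative:

```python
# Verify Phi(x* + a v_j) = Phi(x*) + 0.5 a^2 lambda_j on a small
# quadratic with diagonal H, so its eigenvectors are the coordinate
# axes and the eigen-decomposition is known by hand.
#   Phi(x) = c^T x + 0.5 x^T H x,  H = diag(2, 8),  c = (-2, -8)
#   => x* solves H x = -c, i.e. x* = (1, 1), Phi(x*) = -5.

H = [[2.0, 0.0], [0.0, 8.0]]
c = [-2.0, -8.0]

def phi(x):
    quad = 0.5 * sum(x[i] * H[i][j] * x[j] for i in range(2) for j in range(2))
    return c[0] * x[0] + c[1] * x[1] + quad

x_star = [1.0, 1.0]           # solves H x = -c
alpha = 0.3
v1, lam1 = [1.0, 0.0], 2.0    # first eigenpair of the diagonal H
v2, lam2 = [0.0, 1.0], 8.0    # second eigenpair

moved1 = phi([x_star[0] + alpha * v1[0], x_star[1] + alpha * v1[1]])
moved2 = phi([x_star[0] + alpha * v2[0], x_star[1] + alpha * v2[1]])
print(moved1 - phi(x_star))   # 0.5 * alpha^2 * lam1 = 0.09
print(moved2 - phi(x_star))   # 0.5 * alpha^2 * lam2 = 0.36
```

The same step α changes Φ four times as much along v2 as along v1, exactly the ratio λ2/λ1, which is the elongated-contour picture on the next slide.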


Slide 21: Objective Function Behavior
[Figure: elliptical contours of constant Φ in the (x1, x2) plane; the principal axes lie along the directions of the first and second eigenvectors, v1 and v2]
• If λ2 = λ1, the contours are circular
• As λ2/λ1 gets very small, the ellipsoids get more and more stretched
• If any eigenvalue is very close to zero, Φ will change very little when moving along that eigenvector

Slide 22: Hessian Condition Number
• The condition number of the Hessian is given by

  κ(H) = λ1 / λn

• When κ(H) = 1, the objective function contours are circular
• As κ(H) increases, the contours become elongated
• If κ(H) >> 1, the change in the objective function due to a small change in x will vary radically depending on the direction of perturbation
• κ(H) can be computed via a Cholesky factorization (H = LDL^T)

Slide 23: Scaling
• In theory, we should be able to choose any scaling of the design variables, constraints, and objective functions without affecting the solution
• In practice, the scaling can have a large effect on the solution
⇒ numerical accuracy, numerical conditioning
• From Papalambros, p. 352: "scaling is the single most important, but simplest, reason that can make the difference between success and failure of a design optimization algorithm"

Slide 24: Scaling
All four problems below are equivalent:

Original:
  min x1 + 3x2 + x3
  s.t. 5x1 + 2x3 ≥ 1
       6x2 − 3x3 ≥ 2
       x_i ≥ 0

Scale objective (×10, +28):
  min 10x1 + 30x2 + 10x3 + 28
  s.t. 5x1 + 2x3 ≥ 1
       6x2 − 3x3 ≥ 2
       x_i ≥ 0

Scale constraint (first constraint ×10):
  min x1 + 3x2 + x3
  s.t. 50x1 + 20x3 ≥ 10
       6x2 − 3x3 ≥ 2
       x_i ≥ 0

Scale design variable (x3 = 10 x̄3):
  min x1 + 3x2 + 10x̄3
  s.t. 5x1 + 20x̄3 ≥ 1
       6x2 − 30x̄3 ≥ 2
       x_i ≥ 0
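A quick numeric illustration of κ(H) = λ1/λn and why it matters: for a unit step, the quadratic objective change along the stiffest eigendirection exceeds the change along the softest by exactly the condition number. The Hessian here is an assumed diagonal example:

```python
# Condition number of a diagonal Hessian, and its effect: the quadratic
# change 0.5 * a^2 * lambda_j for a unit step a = 1 differs between the
# stiffest and softest eigendirections by the factor kappa(H).
# H = diag(100, 1) is an illustrative example.

eigvals = [100.0, 1.0]                 # eigenvalues of H = diag(100, 1)
kappa = max(eigvals) / min(eigvals)    # kappa(H) = lambda_1 / lambda_n
print(kappa)                           # 100.0

changes = [0.5 * lam for lam in eigvals]
print(changes[0] / changes[1])         # 100.0, the same ratio as kappa(H)
```

This is the precise sense in which "the change in the objective varies radically with the direction of perturbation" when κ(H) >> 1.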

Mlesa Design Variable Scaling 50 Mles Design Variable Scaling 505 In aircraft design, we are combining variables of very Consider the transformation x= Ly, where L is an arbitrary different magnitudes nonsingular transformation matrix eg. aircraft range~10° If at any iteration in the algorithm, x= ly (using exact arithmetic), then the algorithm is said to be scale invariant wing span-10'm this property will not hold skin thickness 10-3m The conditioning of the Hessian matrix at x gives us Need to non-dimensionalize and scale variables to information about the scaling of the design variables be of similar magnitude in the region of interest When H(x) is ill-conditioned, J(x) varies much more rapidly along some directions than along others Want each variable to be of similar weight during the optimization The ill-conditioning of the Hessian is a form of bad scaling since similar changes in xl do not cause similar changes in J e Massachusetts Institute of Technology- Prof de Weck and Prof Willcox e Massachusetts Insttute of Technology. 
Prof de Weck and Prof Willcox Engineering Systems Division and Dept of Aeronautics and Astronautics Engineering Systems Division and Dept of Aeronautics and Astronautics MIlesd Design Variable Scaling Objective Function Scaling We saw that if H(x*)is ill-conditioned(x(H)>>1), then the In theory, we can multiply J(x)by any constant or add change in the objective function due to a small change in x a constant term and not affect the solution will vary radically depending on the direction of perturbation In practice, it is generally desirable to have J-o(1)in J(x)may vary so slowly along an eigenvector associated with the region of interest a near-zero eigenvalue that changes that should be Algorithms can have difficulties if J(x) is very small significant are lost in rounding error everywhere, since convergence is usually tested using some small quantity Ve would like to scale our design variables so that K(H-1 Inclusion of a constant term can also cause difficulties In practice this may be unachievable(often we dont know H) since the error associated with the sum may reflect the size of the constant rather than the size of (x) Often, a diagonal scaling is used where we consider only the e.g. min x,2+x2 VS. min x,2+x2+1000 diagonal elements of H(x )and try to make them close to e Massachusetts Institute of Technology- Prof de Weck and Prof Willcox e Massachusetts Insttute of Technology. Prof de Weck and Prof Willcox Engineening Systems Division and Dept of Aeronautics and Astronautics Engineering Systems DiMsion and Dept of Aeronautics and Astronautics

Slide 25: Design Variable Scaling
• In aircraft design, we are combining variables of very different magnitudes, e.g.:
  aircraft range ~ 10^6 m
  wing span ~ 10^1 m
  skin thickness ~ 10^-3 m
• Need to non-dimensionalize and scale variables to be of similar magnitude in the region of interest
• Want each variable to carry similar weight during the optimization

Slide 26: Design Variable Scaling
• Consider the transformation x = Ly, where L is an arbitrary nonsingular transformation matrix
• If at every iteration of the algorithm x^k = Ly^k (using exact arithmetic), then the algorithm is said to be scale invariant
• In practice, this property will not hold
• The conditioning of the Hessian matrix at x* gives us information about the scaling of the design variables
• When H(x) is ill-conditioned, J(x) varies much more rapidly along some directions than along others
• The ill-conditioning of the Hessian is a form of bad scaling, since similar changes in ||x|| do not cause similar changes in J

Slide 27: Design Variable Scaling
• We saw that if H(x*) is ill-conditioned (κ(H) >> 1), then the change in the objective function due to a small change in x will vary radically depending on the direction of perturbation
• J(x) may vary so slowly along an eigenvector associated with a near-zero eigenvalue that changes that should be significant are lost in rounding error
• We would like to scale our design variables so that κ(H) ~ 1
• In practice this may be unachievable (often we don't know H)
• Often, a diagonal scaling is used, where we consider only the diagonal elements of H(x0) and try to make them close to unity

Slide 28: Objective Function Scaling
• In theory, we can multiply J(x) by any constant, or add a constant term, and not affect the solution
• In practice, it is generally desirable to have J ~ O(1) in the region of interest
• Algorithms can have difficulties if J(x) is very small everywhere, since convergence is usually tested using some small quantity
• Inclusion of a constant term can also cause difficulties, since the error associated with the sum may reflect the size of the constant rather than the size of J(x)
  e.g. min x1² + x2²  vs.  min x1² + x2² + 1000
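The diagonal scaling described above can be sketched directly: rescale each variable by 1/√(H_ii) so the scaled Hessian has a unit diagonal. The Hessian values below are illustrative:

```python
# Sketch of diagonal scaling: with x_i = s_i * y_i and s_i = 1/sqrt(H_ii),
# the Hessian in the y variables has diagonal s_i * H_ii * s_i = 1.
# The (badly scaled) diagonal entries are illustrative.
import math

H_diag = [1.0e6, 4.0, 1.0e-6]                  # diag of H(x0), badly scaled
scale = [1.0 / math.sqrt(h) for h in H_diag]   # s_i = 1 / sqrt(H_ii)

H_scaled_diag = [s * h * s for s, h in zip(scale, H_diag)]
print(H_scaled_diag)      # approximately [1.0, 1.0, 1.0]
```

Off-diagonal coupling means this does not guarantee κ(H) ~ 1, but it is cheap and often removes the worst of the imbalance.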


© Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox, Engineering Systems Division and Dept. of Aeronautics and Astronautics

Constraint Scaling

A well scaled set of constraints has two properties:
– each constraint is well conditioned with respect to perturbations in the design variables
– the constraints are balanced with respect to each other, i.e. all constraints have an equal weighting in the optimization

The scaling of constraints can have a major effect on the path chosen by the optimizer. For example, many algorithms maintain a set of active constraints and, from one iteration to the next, interchange one active and one inactive constraint. Constraint scaling affects the selection of which constraint to add or delete.

Scaling

Two reasons to scale:
1. At the beginning of optimization, using x0, to improve algorithm performance (e.g. to decrease the number of iterations). ⇒ dairy farm demo
2. At the end of optimization, using x*, to make sure that the “optimal” solution is indeed the best we can achieve. ⇒ BWB example

Scaling Example

• Consider optimization of the BWB (blended-wing-body aircraft)
• Rather than optimizing just a single aircraft, we want to design a family of aircraft
• This family has commonality – the planes share common parts, planforms and systems
• Commonality can help to reduce costs, e.g. manufacturing costs, design costs, spare parts, crew training
• But it will require a trade with performance
• It is easier to achieve commonality with the BWB than with conventional tube-and-wing aircraft
[Figure: BWB planform, labeling the centerbody, inner wing, outer wing and winglet]

Scaling Example (continued)

• Consider a two-aircraft family with common wings: the BWB 3-250 (272 passengers, 8550 nm) and the BWB 3-450 (475 passengers, 8550 nm)
• We set up an MDO framework for each aircraft
• We link the variables that are common between the two aircraft
[Figure: the two planforms, with the common and different regions of the wings indicated]
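The constraint-balancing idea described above can be sketched with a small, hypothetical scipy example. The limit values and the one-variable stress model below are invented for illustration (they are not from the lecture): an unscaled stress constraint of order 1e8 and a gauge constraint of order 1e-3 would enter the optimizer with wildly unequal weightings, so each is divided by its limit value to make both of order one.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative limits (made-up values): a stress limit of order 1e8 Pa
# and a minimum-gauge limit of order 1e-3 m.
SIGMA_MAX = 2.5e8   # allowable stress [Pa]
T_MIN = 2.0e-3      # minimum skin thickness [m]

def mass(x):
    # toy mass objective, proportional to thickness
    return 1.0e3 * x[0]

def stress(x):
    # toy stress model: stress falls as thickness grows
    return 5.0e5 / x[0]

# Unscaled form: constraint values differ by ~11 orders of magnitude,
# so the optimizer weights them very unequally.
raw_cons = [
    {"type": "ineq", "fun": lambda x: SIGMA_MAX - stress(x)},  # O(1e8)
    {"type": "ineq", "fun": lambda x: x[0] - T_MIN},           # O(1e-3)
]

# Scaled form: divide each constraint by its limit so both are O(1).
scaled_cons = [
    {"type": "ineq", "fun": lambda x: 1.0 - stress(x) / SIGMA_MAX},
    {"type": "ineq", "fun": lambda x: x[0] / T_MIN - 1.0},
]

x0 = np.array([0.05])
res = minimize(mass, x0, method="SLSQP", constraints=scaled_cons)
print(res.x, res.fun)
```

On this one-variable toy problem both constraints happen to become active at the same thickness, t = 2 mm, so either form converges; on realistic problems, the balanced form keeps the active-set decisions from being dominated by whichever constraint has the largest raw magnitude.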


Scaling Example (continued)

• In order to test the framework, we first try to optimize a family with no commonality
• We should get the point-design solutions for each plane
• However, the algorithm (SQP) does not converge to the correct solution for the smaller plane!
[Figure: MTOW, normalized by the point-design MTOW, over the last 100 iterations for Plane 1 and Plane 2]

• When we look at the Hessian matrix at the solution, we see that the diagonal entries corresponding to the design variables of Plane 2 are badly scaled
• We rescale these variables, and now the algorithm converges
[Figure: MTOW results for the last 100 iterations; the scaled Plane 2 run converges to the point-design MTOW, while the unscaled run differs from it by ~4000 kg]

Lecture Summary

• Optimality Conditions
• Objective Function Behavior
• Scaling
• In practice, optimization can be very difficult to implement: algorithms can behave badly, and it can be difficult (impossible) to verify that a solution is truly optimal
• Numerical accuracy is a real issue and can drastically affect results, especially if the problem is not well scaled
• It is very important to interrogate the “optimum” solution carefully
• The mathematical tools you learn are very useful in practice, but they must be applied carefully!

References

• Gill, P.E., Murray, W. and Wright, M.H., Practical Optimization, Academic Press, London, 1981.
• Vanderplaats, G.N., Numerical Optimization Techniques for Engineering Design, Vanderplaats R&D, 1999.
• Willcox, K. and Wakayama, S., “Simultaneous Optimization of a Multiple-Aircraft Family,” AIAA Paper 2002-1423, 2002.
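As a footnote to the Hessian-diagonal fix in the scaling example above, the rescaling can be sketched on a hypothetical badly scaled quadratic (toy numbers, not the BWB model): the variables are transformed as x = S·y with S_ii = 1/sqrt(H_ii), so the Hessian in the new coordinates has unit diagonal, which on badly conditioned problems typically makes quasi-Newton methods converge faster and more reliably.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical stand-in for a badly scaled problem: a quadratic whose
# Hessian diagonal spans ten orders of magnitude.
# Analytic minimum: x* = H^-1 b = [1e-4, 0.1].
H = np.diag([1.0e8, 1.0e-2])
b = np.array([1.0e4, 1.0e-3])

def f(x):
    return 0.5 * x @ H @ x - b @ x

def grad(x):
    return H @ x - b

x0 = np.array([1.0, 1.0])
unscaled = minimize(f, x0, jac=grad, method="BFGS")

# Rescale the variables using the Hessian diagonal: x = S y with
# S_ii = 1/sqrt(H_ii), so the Hessian in y-coordinates has unit diagonal.
S = np.diag(1.0 / np.sqrt(np.diag(H)))
f_s = lambda y: f(S @ y)
g_s = lambda y: S @ grad(S @ y)   # chain rule: grad_y = S^T grad_x

scaled = minimize(f_s, np.linalg.solve(S, x0), jac=g_s, method="BFGS")
x_opt = S @ scaled.x              # map the solution back to the original variables

print("iterations (unscaled vs scaled):", unscaled.nit, scaled.nit)
print("x* =", x_opt)
```

In practice the Hessian is rarely available exactly; the same diagnostic can be applied to a quasi-Newton approximation of it at the solution, which is what the slide's "look at the Hessian matrix at the solution" step amounts to.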
