When I introduce undergraduate students to matrix multiplication, I tell them that matrices are like scalars, except that they do not commute.

The numerical (or first order perturbation theory) interpretation applies, but it may seem less familiar at first. Numerically take X=randn(n) and E=randn(n) for $\epsilon = .001$ say, and then compute

$$\frac{(X + \epsilon E)^3 - X^3}{\epsilon} \approx X^2 E + X E X + E X^2. \tag{2}$$

This is the matrix version of (1). Holding $X$ fixed and allowing $E$ to vary, the right-hand side is a linear function of $E$. There is no simpler form possible.

Symbolically (or numerically) one can take $dX = E_{kl}$, which is the matrix that has a one in element $(k, l)$ and 0 elsewhere. Then we can write down the matrix of partial derivatives:

$$\frac{\partial X^3}{\partial x_{kl}} = X^2 (E_{kl}) + X (E_{kl}) X + (E_{kl}) X^2.$$

As we let $k, l$ vary over all possible indices, we obtain all the information we need to compute the derivative in any general direction $E$.

In general, the directional derivative of $Y_{ij}(X)$ in the direction $dX$ is given by $(dY)_{ij}$. For a particular matrix $X$, $dY(X)$ is a matrix of directional derivatives corresponding to a first order perturbation in the direction $E = dX$.
It is a matrix of linear functions corresponding to the linearization of $Y(X)$ about $X$.

Structured Perturbations

We sometimes restrict our $E$ to be a structured perturbation. For example, if $X$ is triangular, symmetric, antisymmetric, or even sparse, then often we wish to restrict $E$ so that the pattern is maintained in the perturbed matrix as well. An important case occurs when $X$ is orthogonal. We will see in an example below that we will want to restrict $E$ so that $X^T E$ is antisymmetric when $X$ is orthogonal.

Example 2: $y = x^T x$

Here $y$ is a scalar and dot products commute, so that $dy = 2x^T dx$. When $y = 1$, $x$ is on the unit sphere. To stay on the sphere, we need $dy = 0$ so that $x^T dx = 0$, i.e., the tangent to the sphere is perpendicular to $x$.

Note the two uses of $dy$. In the first case it is the change to the squared length of $x$. In the second case, by setting $dy = 0$, we find perturbations to $x$ which to first order do not change the length at all. Indeed, if one computes $(x + dx)^T (x + dx)$ for a small finite $dx$, one sees that if $x^T dx = 0$ then the squared length changes only by $dx^T dx$, which is second order. Geometrically, one can draw a tangent to a circle. The distance to the circle is second order in the distance along the tangent.

Example 3: $y = x^T A x$

Again $y$ is scalar. We have $dy = dx^T A x + x^T A\, dx$. If $A$ is symmetric then $dy = 2x^T A\, dx$.

Example 4: $Y = X^{-1}$

We have that $XY = I$, so that $X(dY) + (dX)Y = 0$, so that

$$dY = -X^{-1}\, dX\, X^{-1}.$$

We recommend that the reader compute $\epsilon^{-1}\left((X + \epsilon E)^{-1} - X^{-1}\right)$ numerically and verify that it is approximately equal to $-X^{-1} E X^{-1}$. In other words,

$$(X + \epsilon E)^{-1} = X^{-1} - \epsilon X^{-1} E X^{-1} + O(\epsilon^2).$$

Example 5: $I = Q^T Q$

If $Q$ is orthogonal we have that $Q^T dQ + dQ^T Q = 0$, so that $Q^T dQ$ is antisymmetric. In general, $d(Q^T Q) = Q^T dQ + dQ^T Q$, but with no orthogonality condition on $Q$, there is no antisymmetry condition on $Q^T dQ$.

If $y$ is a scalar function of $x_1, x_2, \ldots, x_n$ then we have the "chain rule"

$$dy = \frac{\partial y}{\partial x_1}\, dx_1 + \frac{\partial y}{\partial x_2}\, dx_2 + \cdots + \frac{\partial y}{\partial x_n}\, dx_n.$$
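
The numerical checks recommended for equation (2) and for Example 4 can be sketched in a few lines. The following is only an illustrative sketch in Python with NumPy, not the notes' own code: the function names, the fixed seed, the size n = 4, and the shift of X by a multiple of the identity (added here so that X is safely invertible) are all assumptions made for this example. It compares each difference quotient with the corresponding first order term and reports a relative error, which should be on the order of the step size.

```python
import numpy as np

def cube_derivative(X, E):
    """First order term of (X + eps*E)^3 - X^3: the right-hand side of (2)."""
    return X @ X @ E + X @ E @ X + E @ X @ X

def inverse_derivative(X, E):
    """First order term of (X + eps*E)^{-1} - X^{-1}, as in Example 4."""
    Xinv = np.linalg.inv(X)
    return -Xinv @ E @ Xinv

def difference_quotient(f, X, E, eps):
    """Finite difference (f(X + eps*E) - f(X)) / eps."""
    return (f(X + eps * E) - f(X)) / eps

def relative_error(approx, exact):
    """Frobenius-norm error of approx, relative to the size of exact."""
    return np.linalg.norm(approx - exact) / np.linalg.norm(exact)

rng = np.random.default_rng(0)  # fixed seed: an assumption, for repeatability
n, eps = 4, 1e-3

# Assumption: shift X by 2n*I so the inverse is well conditioned.
# The text simply takes X = randn(n).
X = rng.standard_normal((n, n)) + 2 * n * np.eye(n)
E = rng.standard_normal((n, n))

cube_fd = difference_quotient(lambda M: np.linalg.matrix_power(M, 3), X, E, eps)
inv_fd = difference_quotient(np.linalg.inv, X, E, eps)

cube_err = relative_error(cube_fd, cube_derivative(X, E))
inv_err = relative_error(inv_fd, inverse_derivative(X, E))

print("relative error, cube:   ", cube_err)
print("relative error, inverse:", inv_err)
```

Shrinking eps should shrink both errors roughly in proportion, which is what the $O(\epsilon^2)$ remainder predicts, until rounding error takes over for very small steps.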