18.338J/16.394J: The Mathematics of Infinite Random Matrices

The Stieltjes transform based approach

Raj Rao
Handout #4, Thursday, September 16, 2004.

1 The eigenvalue distribution function

For an N × N matrix A_N, the eigenvalue distribution function^1 (e.d.f.) F^{A_N}(x) is defined as

    F^{A_N}(x) = (Number of eigenvalues of A_N ≤ x) / N.   (1)

As defined, the e.d.f. is right continuous and possibly atomic, i.e. with step discontinuities at discrete points. In practical terms, the derivative of (1), referred to as the (eigenvalue) level density, is simply the appropriately normalized histogram of the eigenvalues of A_N. The MATLAB code histn we distributed earlier approximates this density.

A surprising result in infinite RMT is that for some matrix ensembles the expectation E[F^{A_N}(x)] has a well defined, i.e. not zero and not infinite, limit. We drop the notational dependence on N in (1) by defining the limiting e.d.f. as

    F^A(x) = lim_{N→∞} E[F^{A_N}(x)].   (2)

This limiting e.d.f.^2 is also sometimes referred to in the literature as the integrated density of states [2, 3]. Its derivative is referred to as the level density in the physics literature [4]. The region of support associated with this limiting density is simply the region where dF^A(x) ≠ 0. When discussing the limiting e.d.f. we shall often distinguish between its atomic and non-atomic components.

^1 This is also referred to in the literature as the empirical distribution function [1].
^2 Unless we state otherwise, any reference to an e.d.f. or the level density in this paper will refer to the corresponding limiting e.d.f. or density, respectively.
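As a concrete illustration of this normalized-histogram construction, here is a minimal MATLAB sketch. It is written in the spirit of histn but is not the distributed code; the Hermitian Wigner-type test matrix and all parameter values are illustrative choices only.

    % Approximate the level density dF^A/dx by the normalized histogram of
    % pooled eigenvalues (a sketch in the spirit of histn, not the
    % distributed code; the Wigner-type matrix is an illustrative choice).
    N = 200; trials = 50; e = [];
    for t = 1:trials
        X = randn(N) + 1i*randn(N);
        A = (X + X')/(2*sqrt(N));     % Hermitian; level density -> semi-circle on [-2,2]
        e = [e; real(eig(A))];        % pool the eigenvalues across realizations
    end
    [counts, x] = hist(e, 50);        % bin the pooled eigenvalues
    dx = x(2) - x(1);
    bar(x, counts/(length(e)*dx));    % normalize so the histogram integrates to 1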
2 The Stieltjes transform representation

One step removed from the e.d.f. is the Stieltjes transform, which has proved to be an efficient tool for determining this limiting density. For all non-real z the Stieltjes (or Cauchy) transform of the probability measure F^A(x) is given by

    m_A(z) = ∫ dF^A(x)/(x − z),   Im z ≠ 0.   (3)

The integral above is over the whole^3 or some subset of the real axis since, for the matrices of interest such as Hermitian or real symmetric matrices, the eigenvalues are real. When we refer to the "Stieltjes transform of A" in this paper, we are referring to m_A(z) defined as in (3), expressed in terms of the limiting density dF^A(x) of the random matrix ensemble A.

The Stieltjes transform in (3) can also be interpreted as an expectation with respect to the measure F^A(x), namely

    m_A(z) = E_X[ 1/(X − z) ].   (4)

Since there is a one-to-one correspondence between the probability measure F^A(x) and its Stieltjes transform, convergence of the Stieltjes transform can be used to show convergence of the probability measure F^A(x). Once this convergence has been established, the Stieltjes transform can be used to recover the density using the so-called Stieltjes-Perron inversion formula [6]

    dF^A(x)/dx = (1/π) lim_{ξ→0⁺} Im m_A(x + iξ).   (5)

When studying the limiting distribution of large random matrices, the Stieltjes transform has proved to be particularly relevant because of its correspondence with the matrix resolvent. The trace of the matrix resolvent M_A(z) = (A_N − zI)^{−1} can be written as

    tr[M_A(z)] = ∑_{i=1}^{N} 1/(λ_i − z)   (6)

where λ_i, for i = 1, 2, ..., N, are the eigenvalues of A_N. For a fixed A_N, tr[M_A(z)] is a non-random quantity. However, when A_N is a large random matrix,

    m_A(z) = lim_{N→∞} (1/N) tr[M_A(z)].   (7)

The Stieltjes transform and its resolvent form in (7) are intimately linked to the classical moment problem [6]. This connection can be observed by noting that the integral in (3) can be expressed as an analytic "multipole" series expansion about z = ∞, namely

    m_A(z) = ∫ dF^A(x)/(x − z) = −∑_{k=0}^{∞} (1/z^{k+1}) ∫ x^k dF^A(x) = −1/z − ∑_{k=1}^{∞} M_k^A / z^{k+1},   (8)

where M_k^A = ∫ x^k dF^A(x) is the kth moment of the probability measure dF^A(x). The analyticity of the Stieltjes transform about z = ∞ expressed in (8) is a consequence of our implicit assumption that the region of support for the limiting density dF^A(x) is bounded, i.e. lim_{x→∞} dF^A(x) = 0.

Incidentally, the η-transform introduced by Tulino and Verdú in [7] can be expressed in terms of m(z) and permits a series expansion about z = 0.

Given the relationship in (7), it is worth noting that the matrix moment M_k^A is simply

    M_k^A = lim_{N→∞} (1/N) tr[A_N^k] = ∫ x^k dF^A(x).   (9)

^3 While the Stieltjes integral is over the positive real axis, the Cauchy integral is more general [5] and can include complex contours as well. This distinction is irrelevant for several practical classes of matrices, such as the sample covariance matrices, where all of the eigenvalues are non-negative. Nonetheless, throughout this paper, (3) will be referred to as the Stieltjes transform with the implicit assumption that the integral is over the entire real axis.
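The chain (6)-(7)-(5) is easy to exercise numerically. The sketch below, assuming a Wigner-type test matrix purely for illustration, approximates m_A(z) by the normalized resolvent trace and recovers the level density with a small fixed ξ standing in for the limit in (5).

    % Approximate m_A(z) by the normalized resolvent trace (7) and recover
    % the level density via the inversion formula (5) with a small fixed xi.
    N = 1000;
    X = randn(N) + 1i*randn(N);
    A = (X + X')/(2*sqrt(N));              % illustrative Hermitian test matrix
    lam = eig(A);                          % with the eigenvalues, (6) gives tr[M_A(z)]
    xi = 1e-2;                             % stands in for the limit xi -> 0+ in (5)
    x = linspace(-2.5, 2.5, 101);
    dens = zeros(size(x));
    for k = 1:length(x)
        m = mean(1./(lam - (x(k) + 1i*xi)));   % (1/N) tr[M_A(z)], cf. (6)-(7)
        dens(k) = imag(m)/pi;                  % Stieltjes-Perron inversion (5)
    end
    plot(x, dens);                         % compare with sqrt(max(4-x.^2,0))/(2*pi)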
Equation (8), written as a multipole series, suggests that one way of computing the density would be to determine the moments of the random matrix as in (9) and then invert the Stieltjes transform using (5). For the famous semi-circular law, Wigner actually used a moment based approach in [8] to determine the density of the standard Wigner matrix, though he did not explicitly invert the Stieltjes transform as we suggested above. As the reader may imagine, such a moment based approach is not particularly useful for more general classes of random matrices. We discuss a more relevant and practically useful Stieltjes transform based approach next.

3 Stieltjes transform based approach

Instead of obtaining the Stieltjes transform directly, the so-called Stieltjes transform approach relies on finding a canonical equation that the Stieltjes transform satisfies. The Marčenko-Pastur theorem [9] was the first, and remains the most famous, example of such an approach. We include a statement of the theorem in the form found in the literature. We encourage you to write this theorem, as an exercise, in a simpler manner.

Theorem 1 (The Marčenko-Pastur Theorem). Consider an N × N matrix B_N. Assume that:

1. X_n is an n × N matrix such that the matrix elements X_{ij}^n are independent identically distributed (i.i.d.) complex random variables with mean zero and variance 1, i.e. X_{ij}^n ∈ C, E[X_{ij}^n] = 0 and E[|X_{ij}^n|^2] = 1.

2. n = n(N) with n/N → c > 0 as N → ∞.

3. T_n = diag(τ_1^n, τ_2^n, ..., τ_n^n) where τ_i^n ∈ R, and the e.d.f. of {τ_1^n, ..., τ_n^n} converges almost surely in distribution to a probability distribution function H(τ) as N → ∞.

4. B_N = A_N + (1/N) X_n^* T_n X_n, where A_N is a Hermitian N × N matrix for which F^{A_N} converges vaguely to A almost surely, A being a possibly defective (i.e. its total mass may be less than one) nonrandom distribution function.

5. X_n, T_n, and A_N are independent.

Then, as N → ∞, F^{B_N} converges vaguely, almost surely, to a nonrandom d.f. F^B whose Stieltjes transform m(z), z ∈ C⁺, satisfies the canonical equation

    m(z) = m_A( z − c ∫ τ dH(τ)/(1 + τ m(z)) ).   (10)

We now illustrate the use of this theorem with a representative example. This example will help us highlight issues that will be of pedagogical interest throughout this semester.

Suppose A_N = 0, i.e. B_N = (1/N) X_n^* T_n X_n. The Stieltjes transform of A_N, by the definition in (3), is then simply

    m_A(z) = 1/(0 − z) = −1/z.   (11)

Hence, using the Marčenko-Pastur theorem as expressed in (10), the Stieltjes transform m(z) of B_N is given by

    m(z) = 1 / ( −z + c ∫ τ dH(τ)/(1 + τ m(z)) ).   (12)

Rearranging the terms in this equation and using m instead of m(z) for notational convenience, we get

    z = −1/m + c ∫ τ dH(τ)/(1 + τ m).   (13)

Equation (13) expresses the dependence between the Stieltjes transform variable m and the probability space variable z. Such a dependence, expressed explicitly in terms of dH(τ), will be referred to throughout this paper as a canonical equation. Equation (13) can also be interpreted as the expression for the functional inverse of m(z).

To determine the density of B_N by using the inversion formula in (5) we need to first solve (13) for m(z). In order to obtain an equation in m and z we need to first know dH(τ) in (13). In theory, dH(τ) could be any distribution that satisfies the conditions of the Marčenko-Pastur theorem. However, as we shall shortly recognize, for an arbitrary distribution it might not be possible to obtain an analytical, or even an easy numerical, solution for the density. On the other hand, for some specific distributions dH(τ) it will indeed be possible to obtain the density analytically. We consider one such distribution below.

Suppose T_n = I, i.e. the diagonal elements of T_n are non-random with d.f. dH(τ) = δ(τ − 1). Equation (13) then becomes

    z = −1/m + c/(1 + m).   (14)
Rearranging the terms in the above equation we get

    zm(1 + m) = −(1 + m) + cm   (15)

which, with a bit of algebra, can be written as

    zm^2 + (1 − c + z)m + 1 = 0.   (16)

Equation (16) is a polynomial equation in m whose coefficients are polynomials in z. We will refer to such polynomials, often derived from canonical equations, as Stieltjes (transform) polynomials for the remainder of this paper. As discussed, to obtain the density using (5) we need to first solve (16) for m in terms of z. Since (16) is a second degree polynomial in m, it is indeed possible to solve analytically for its roots and obtain the density.

Figure 1: Level density for B_N = (1/N) X_n^* X_n with c = 2.

This level density, sometimes referred to in the literature as the Marčenko-Pastur distribution, is given by

    dF^B(x)/dx = max(0, 1 − c) δ(x) + ( √((x − b−)(b+ − x)) / (2πx) ) I_{[b−, b+]}(x)   (17)

where b± = (1 ± √c)^2 and I_{[b−, b+]} is the indicator function that is equal to 1 for b− < x < b+ and 0 elsewhere.

Figure 1 compares the histogram of the eigenvalues of 1000 realizations of the matrix B_N = (1/N) X_n^* X_n with N = 100 and n = 200, with the solid line indicating the theoretical density given by (17) for c = n/N = 2.
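A plot like Figure 1 can be regenerated in a few lines of MATLAB. The following sketch uses the N, n, and number of realizations quoted above; it is an illustration, not the code used to produce the figure.

    % Reproduce a Figure 1 style plot: eigenvalues of B_N = (1/N) X_n^* X_n
    % against the Marcenko-Pastur density (17), with c = n/N = 2.
    N = 100; n = 200; c = n/N; trials = 1000; e = [];
    for t = 1:trials
        X = (randn(n,N) + 1i*randn(n,N))/sqrt(2);   % i.i.d. entries, mean 0, variance 1
        e = [e; real(eig((X'*X)/N))];               % X' is the conjugate transpose X^*
    end
    [counts, x] = hist(e, 60);
    bar(x, counts/(length(e)*(x(2) - x(1))));       % normalized histogram
    bm = (1 - sqrt(c))^2; bp = (1 + sqrt(c))^2;     % support edges b-, b+
    xs = linspace(bm, bp, 200);
    hold on; plot(xs, sqrt((xs - bm).*(bp - xs))./(2*pi*xs), 'r'); hold off;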
We now consider a modification to the Marčenko-Pastur theorem that is motivated by the sample covariance matrices that appear often in array processing applications.

3.1 The Sample Covariance Matrix

In the previous section we used the Marčenko-Pastur theorem to examine the density of the class of random matrices B_N = (1/N) X_n^* T_n X_n. Suppose we define the N × n matrix Y_N = X_n^* T_n^{1/2}; then B_N may be written as

    B_N = (1/N) Y_N Y_N^*.   (18)

Recall that T_n was assumed to be diagonal and non-negative definite, so T_n^{1/2} can be constructed uniquely up to the sign. If Y_N were to be interpreted as a matrix of observations, then B_N written as (18) is reminiscent of the sample covariance matrices that appear in many engineering and statistical applications. However, among other things, it is subtly different because of the normalization of Y_N Y_N^* by the number of rows N of Y_N rather than by the number of columns n. Hence, we need to come up with a definition of a sample covariance matrix that mirrors the manner in which it is used in practical applications.

With engineering, particularly signal processing, applications in mind, we introduce the n × N matrix O_n = T_n^{1/2} X_n and define

    C_n = (1/N) O_n O_n^*   (19)

to be the sample covariance matrix (SCM). By comparing (18) and (19) while recalling the definitions of Y_N and O_n, it is clear that the eigenvalues of B_N are related to the eigenvalues of C_n; indeed, O_n^* = Y_N, so C_n = (1/N) Y_N^* Y_N shares its nonzero eigenvalues with B_N. For B_N of the form in (18), the Marčenko-Pastur theorem can be used to obtain the canonical equation for m_B(z) given by (13). Recall that by m_B(z) we mean the Stieltjes transform associated with the limiting e.d.f. F^B(x) of B_N as N → ∞. There is, however, an exact relationship between the non-limiting e.d.f.'s F^{B_N}(x) and F^{C_n}(x), and hence between the corresponding Stieltjes transforms m_{B_N}(z) and m_{C_n}(z). We exploit this relationship below to derive the canonical equation for C_n from the canonical equation for B_N given in (13).

Figure 2: The matrices B_N and C_n when n > N.

Figure 2 schematically depicts C_n and B_N when n > N, i.e. when c > 1. In this case C_n, as denoted in the figure, will have n − N zero eigenvalues. The other N eigenvalues of C_n will, however, be identically equal to the N eigenvalues of B_N. Hence, the e.d.f. of C_n can be exactly expressed in terms of the e.d.f. of B_N as

    (n/N) F^{C_n}(x) = ((n − N)/N) I_{[0,∞)}(x) + F^{B_N}(x)   (20)

    c F^{C_n}(x) = (c − 1) I_{[0,∞)}(x) + F^{B_N}(x).   (21)

Recalling the definition of the Stieltjes transform in (3), this implies that the Stieltjes transform m_{C_n}(z) of C_n is related to the Stieltjes transform m_{B_N}(z) of B_N by the expression

    c m_{C_n}(z) = −(c − 1)/z + m_{B_N}(z).   (22)

Similarly, Figure 3 schematically depicts C_n and B_N when n < N, i.e. c < 1. In this case B_N, as denoted in the figure, will have N − n zero eigenvalues. The other n eigenvalues of B_N will, however, be identically equal to the n eigenvalues of C_n. Hence, as before, the e.d.f. of B_N can be exactly expressed in terms of the e.d.f. of C_n as

    (N/n) F^{B_N}(x) = ((N − n)/n) I_{[0,∞)}(x) + F^{C_n}(x)   (23)

    (1/c) F^{B_N}(x) = (1/c − 1) I_{[0,∞)}(x) + F^{C_n}(x).   (24)

Once again, recalling the definition of the Stieltjes transform in (3), this implies that the Stieltjes transform m_{C_n}(z) of C_n is related to the Stieltjes transform m_{B_N}(z) of B_N by the expression

    (1/c) m_{B_N}(z) = −(1/c − 1)(1/z) + m_{C_n}(z).   (25)
Figure 3: The matrices B_N and C_n when n < N.

Multiplying both sides of (25) by c, we get

    m_{B_N}(z) = −(1 − c)/z + c m_{C_n}(z)   (26)

which upon rearranging the terms is precisely (22). Equation (22) is an exact expression relating the Stieltjes transforms of C_n and B_N for all n and N. As N → ∞, the Marčenko-Pastur theorem states that m_{B_N}(z) → m_B(z), which implies that the limiting Stieltjes transform m_C(z) of C_n can be written in terms of m_B(z) using (22) as

    m_C(z) = −(c − 1)/(cz) + (1/c) m_B(z).   (27)

For our purpose of getting the canonical equation for m_C(z) from the canonical equation for m_B(z), it is more useful to rearrange the terms in (27) so that m_B(z) is written in terms of m_C(z). This relationship is simply

    m_B(z) = −(1 − c)/z + c m_C(z).   (28)

Hence, to obtain the canonical equation for m_C(z) we simply have to substitute the expression for m_B(z) in (28) into (13). With some fairly straightforward algebra, which we omit here, it can be verified that m_C(z) is the solution of the canonical equation

    m_C(z) = ∫ dH(τ) / ( τ(1 − c − c z m_C(z)) − z ).   (29)

Incidentally, (29) was first derived by Silverstein in [10]. He noted that the eigenvalues of (1/N) T_n^{1/2} X_n X_n^* T_n^{1/2} are the same as those of (1/N) X_n X_n^* T_n, so that (29) is the canonical equation for this class of matrices as well. Additionally, in their proof of the Marčenko-Pastur theorem in [11], Bai and Silverstein dropped the restriction that T_n be diagonal, so that T_n could be any matrix whose e.d.f. F^{T_n} → H. As the reader may appreciate, this broadens the class of matrices to which the theorem may be applied.

3.2 The (generalized) Wishart matrix

Revisiting the previous example, when T_n = I, C_n = (1/N) X_n X_n^*. This matrix C_n is the generalized version of the famous Wishart matrix ensemble first studied by Wishart in 1928 [12]. In the physics literature, the Wishart matrix is also referred to as the Laguerre Ensemble [13]. Strictly speaking, C_n is referred to as a Wishart matrix only when the elements of X_n are i.i.d. Gaussian random variables. The canonical equation for C_n in (29) becomes

    m = 1 / ( (1 − c − c z m) − z )   (30)

which upon rearranging yields the Stieltjes polynomial

    c z m^2 − (1 − c − z)m + 1 = 0.   (31)
Figure 4: The density of the (generalized) Wishart matrix {W_n(c)} for different values of c (c = 1, 0.5, 0.1, 0.01).

Once again, (31) is a second degree polynomial in m whose coefficients are polynomials in z. Hence, as before, (31) can be solved analytically to yield a solution for m in terms of z. The inversion formula in (5) can then be used to obtain the limiting density for C_n. This is simply

    dF^C(x)/dx = max(0, 1 − 1/c) δ(x) + ( √((x − b−)(b+ − x)) / (2πcx) ) I_{[b−, b+]}(x)   (32)

where, as before, b± = (1 ± √c)^2. As (32) suggests, the limiting density for C_n depends only on the parameter c. Hence, for the remainder of this paper, we will use the notation {W(c)} to denote the (generalized) Wishart matrix ensemble^4 defined as W(c) = (1/N) X_n X_n^* with c = n/N > 0.

Figure 4 plots the density in (32) for different values of c. From (32) the reader may notice that as c → 0, i.e. for fixed n as N → ∞, both the largest and the smallest eigenvalues, b+ and b− respectively, approach 1. This validates our intuition about the behavior of W(c) for fixed n as N → ∞. Additionally, from Figure 4, the reader may notice that the density becomes increasingly symmetrical about x = 1 as the value of c decreases. For c = 0.01, the density is almost perfectly symmetrical about x = 1. The reader may note that with an appropriate scaling and translation, the density of {W(c)} as c → 0 could be made to resemble the semi-circular distribution. In fact, in [14] Jonsson used this observation and the correspondence between the distribution in (32) and the generalized β-distribution to infer the moments of W_n(c) from the even moments of the Wigner matrix, which are incidentally the Catalan numbers, denoted C_k for integer k. More recently, Dumitriu recognized [15] that these moments could be written in terms of the (k, r) Narayana numbers [16], defined as

    N_{k,r} = \frac{1}{r+1} \binom{k}{r} \binom{k-1}{r},   (33)

so that the individual moments may be obtained from the moment generating function

    M_k^W(c) = ∑_{r=0}^{k−1} c^r N_{k,r} = ∑_{r=0}^{k−1} \frac{c^r}{r+1} \binom{k}{r} \binom{k-1}{r},   (34)

for which it may be noted that M_k^W(1) = C_k = M_{2k}^S are also the even moments of the standard Wigner matrix.

^4 For notational convenience, when discussing infinite (generalized) Wishart matrices we will simply refer to {W(c)} as the Wishart matrix even when the elements of X_n are i.i.d. but not Gaussian. We will occasionally add a subscript, such as W_1(c), to differentiate between different realizations of the ensemble {W(c)}. When discussing finite Wishart matrices, we will implicitly assume that the elements of X_n are i.i.d. Gaussian. Its use in either manner will be obvious from the context.
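The moment formula (34) can be sanity-checked against the empirical moments (1/n) tr[W^k] of a single realization, in the spirit of (9). In the sketch below the ensemble parameters n and c are arbitrary illustrative choices.

    % Check the Narayana moment formula (34) against the empirical moments
    % (1/n) tr[W^k] of one realization of W(c) = (1/N) X_n X_n^*.
    n = 400; c = 0.5; N = round(n/c);                % illustrative parameters
    X = (randn(n,N) + 1i*randn(n,N))/sqrt(2);
    W = (X*X')/N;
    for k = 1:4
        Mk = 0;
        for r = 0:k-1
            Mk = Mk + c^r * nchoosek(k,r) * nchoosek(k-1,r)/(r+1);   % c^r N_{k,r}, cf. (33)
        end
        fprintf('k = %d: formula %.4f, empirical %.4f\n', k, Mk, real(trace(W^k))/n);
    end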
Figure 5: The density of C_n with c = n/N = 0.1 and dH(τ) = 0.6 δ(τ − 1) + 0.4 δ(τ − 3).

The Wishart matrix {W(c)} has been studied exhaustively by statisticians and engineers. For infinite Wishart matrices, (32) captures the limiting density and the extreme eigenvalues, i.e. the region of support, and the limiting moments are given by (34). As for the asymptotic Wigner matrix, the limiting density of the asymptotic Wishart matrix does not depend on whether the elements of X_n are real or complex. Similarly, as for the Gaussian ensembles, the analytical behavior of finite Wishart matrices does depend on whether the elements are real or complex. Nonetheless, the limiting moment and eigenvalue behavior can be inferred from the behavior of the finite (real or complex) Wishart matrix counterpart. The reader is directed towards some of the representative literature, and the references therein, on the distributions of the smallest [17], largest [18, 19, 20, 21], sorted [19, 22, 23], and unsorted [19, 22, 23] eigenvalues, and of the condition number [22, 23], of the Wishart matrix, all of which invoke this link between the finite and infinite Wishart matrix ensembles. We will now discuss sample covariance matrices for which the behavior of the limiting density can be best, if not solely, analyzed using the Marčenko-Pastur theorem.

3.3 Additional examples of sample covariance matrices

Suppose dH(τ) = p δ(τ − λ1) + (1 − p) δ(τ − λ2), i.e. T_n has an atomic mass of weight p at λ1 and another atomic mass of weight 1 − p at λ2. The canonical equation in (29) becomes

    m = p / ( λ1(1 − c − cmz) − z ) + (1 − p) / ( λ2(1 − c − cmz) − z )   (35)

which upon rearranging yields the Stieltjes polynomial

    λ1λ2 c^2 z^2 m^3 + (λ1 c z^2 + λ2 c z^2 − 2λ1λ2 c z + 2λ1λ2 c^2 z) m^2
      + (λ1λ2 − 2λ1λ2 c + λ1λ2 c^2 − λ1 z − λ2 z + 2λ1 c z + λ2 c z + pλ2 c z − pλ1 c z + z^2) m
      + (z − λ1 + pλ1 − pλ2 + λ1 c − pλ1 c + pλ2 c) = 0.   (36)

It can be readily verified that if p = 1 and λ1 = 1, then C_n = (1/N) T_n^{1/2} X_n X_n^* T_n^{1/2} is simply the Wishart matrix we discussed above. Though it might not seem so from a cursory look, for p = 1 and λ1 = 1, (36) can be shown, after some elementary factorization, to simplify to (31). Since (36) is a third degree polynomial in m, it can conceivably be solved analytically using Cardano's formula. For general values of c, p, λ1, and λ2, however, this is cumbersome, and the density cannot be written down analytically as a function of z and c. However, for specific values of c, p, λ1, and λ2 we can numerically solve
the resulting Stieltjes polynomial in (36). For example, when p = 0.6, λ1 = 1 and λ2 = 3, (36) simplifies to

    3c^2 z^2 m^3 + (4cz^2 − 6cz + 6c^2 z) m^2 + (3 − 4z − 6c + (31/5)cz + 3c^2 + z^2) m + (11/5)c + z − 11/5 = 0.   (37)

For c = 0.1, (37) becomes

    (3/100) z^2 m^3 + ((2/5) z^2 − (27/50) z) m^2 + (243/100 − (169/50) z + z^2) m + z − 99/50 = 0.   (38)

To determine the density from (38) we need to determine the roots of the polynomial in m and use the inversion formula in (5). Since we do not know the region of support for this density, we conjecture such a region and solve the polynomial above for every value of z in it. Using numerical tools such as the roots command in MATLAB, this is not very difficult.

Figure 5 shows the excellent agreement between the theoretical density (solid line) obtained by numerically solving (38) and the histogram of the eigenvalues of 1000 realizations of the matrix C_n with n = 100 and N = n/c = 1000. Figure 6 shows the behavior of the density for a range of values of c. This figure captures our intuition that as c → 0, the eigenvalues of the sample covariance matrix C_n become increasingly localized about λ1 = 1 and λ2 = 3. By contrast, capturing this very same analytic behavior using finite RMT is not as straightforward.

Figure 6: The density of C_n with dH(τ) = 0.6 δ(τ − 1) + 0.4 δ(τ − 3) for different values of c (c = 0.2, 0.05, 0.01).
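A sketch of this recipe in MATLAB follows, written with the coefficients of (37) so that c can be varied as in Figure 6. The support bracket xs is a guessed region of the kind described above, and the root with positive imaginary part (when one exists) supplies the density through (5).

    % Numerical recipe for the density: for each x in a conjectured support
    % region, solve the Stieltjes polynomial (37) for m with roots() and use
    % the inversion formula (5); c = 0.1 reduces (37) to (38).
    c = 0.1;
    xs = linspace(0.01, 6, 600);                     % guessed support bracket
    dens = zeros(size(xs));
    for k = 1:length(xs)
        z = xs(k);
        p = [3*c^2*z^2, ...                          % m^3 coefficient
             4*c*z^2 - 6*c*z + 6*c^2*z, ...          % m^2 coefficient
             3 - 4*z - 6*c + (31/5)*c*z + 3*c^2 + z^2, ...   % m coefficient
             (11/5)*c + z - 11/5];                   % constant term
        m = roots(p);
        dens(k) = max([imag(m)/pi; 0]);              % zero where all roots are real
    end
    plot(xs, dens);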
4 Exericses 1. Verified that Silverstein’s sample covariance matrix theorem can be inferred directly from the MarˇcenkoPastur theorem. Note: You will have to do some substitution tricks to get the parameter “c” to refer to the same quantity. 2. Derive the moments of the Wishart matrix from the observation that as c → 0, the density becomes “approximately” semi-circular. Hint: you will have to make an approximation for the region of support while remembering that for any a < 1, a2 < a. There will also be a shifting and rescaling in this problem to get the terms to match up correctly. Recall that the moments of the Wishart matrix are expressed in terms of the Narayana polynomials in (34). 3. Come up with numerical code to compute the theoretical density when dH(τ) has three atomic masses (e.g. dH(τ) = 0.4 δ(τ − 1) + 0.4 δ(τ − 3) + 0.2 δ(τ − 7)). Plot the limiting density as a function of x for a range of values of c. 4. Do the same when there are four atomic masses in dH(τ) (e.g. dH(τ) = 0.3 δ(τ − 1) + 0.25 δ(τ − 3) + 0.25 δ(τ − 7) + 0.25 δ(τ − 10)) . Verify that the solution obtained matches up with the simulations. Hints: Do all the roots match up? 5. What happens if there are atomic masses of negative weight in dH(τ) (e.g. dH(τ) = 0.5 δ(τ + 1) + 0.5 δ(τ − 1)). Does the limiting theoretical density line up with the experimental results? Check the assumptions of the Marˇcenko-Pastur theorem! Is this “allowed”? References [1] Sang II Choi and Jack W. Silverstein. Analysis of the limiting spectral distribution of large dimensional random matrices. volume 54(2), pages 295–309. 1995. [2] Boris A. Khoruzenkho, A.M. Khorunzhy, and L.A. Pastur. Asymptotic properties of large random matrices with independent entries. Journal of Mathematical Physics, 37:1–28, 1996. [3] L. Pastur. A simple approach to global regime of random matrix theory. In Mathematical results in statistical mechanics, Marseilles, 1998, pages 429–454. World Sci. Publishing, River Edge, NJ, 1999. [4] Madan Lal Mehta. Random Matrices. Academic Press, Boston, second edition, 1991. [5] M. Abramowitz and I.A. Stegun, editors. Handbook of Mathematical Functions. Dover Publications, New York, 1970. [6] N. I. Akhiezer. The classical moment problem and some related questions in analysis. Hafner Publishing Co., New York, New York, 1965. Translated by N. Kemmer. [7] A. M. Tulino and S. Verd´u. Random matrices and wireless communications. Foundations and Trends in Communications and Information Theory, 1(1), June 2004. [8] Eugene P. Wigner. Characteristic vectors of bordered matrices with infinite dimensions. Annals of Math., 62:548–564, 1955. [9] V.A. Marˇcenko and L.A. Pastur. Distribution of eigenvalues for some sets of random matrices. Math USSR Sbornik, 1:457–483, 1967. [10] Jack W. Silverstein. Strong convergence of the empirical distribution of eigenvalues of large dimensional random matrices. Journal of Multivariate Analysis, 55(2):331–339, 1995. [11] Z. D. Bai and J. W. Silverstein. On the empirical distribution of eigenvalues of a class of large dimensional random matices. Journal of Multivariate Analysis, 54(2):175–192, 1995