Capacity of Multi-antenna Gaussian Channels*

EMRE TELATAR
Lucent Technologies Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ 07974, USA
telatar@lucent.com

* Invited paper.

Abstract. We investigate the use of multiple transmitting and/or receiving antennas for single-user communications over the additive Gaussian channel with and without fading. We derive formulas for the capacities and error exponents of such channels, and describe computational procedures to evaluate such formulas. We show that the potential gains of such multi-antenna systems over single-antenna systems are rather large under independence assumptions for the fades and noises at different receiving antennas.

1 INTRODUCTION

We will consider a single-user Gaussian channel with multiple transmitting and/or receiving antennas. We will denote the number of transmitting antennas by $t$ and the number of receiving antennas by $r$. We will exclusively deal with a linear model in which the received vector $y \in \mathbb{C}^r$ depends on the transmitted vector $x \in \mathbb{C}^t$ via

$$y = Hx + n \qquad (1)$$

where $H$ is an $r \times t$ complex matrix and $n$ is zero-mean complex Gaussian noise with independent, equal-variance real and imaginary parts. We assume $E[nn^\dagger] = I_r$, that is, the noises corrupting the different receivers are independent. The transmitter is constrained in its total power to $P$,

$$E[x^\dagger x] \le P.$$

Equivalently, since $x^\dagger x = \operatorname{tr}(xx^\dagger)$, and expectation and trace commute,

$$E[\operatorname{tr}(xx^\dagger)] = \operatorname{tr}\big(E[xx^\dagger]\big) \le P. \qquad (2)$$

This second form of the power constraint will prove more useful in the upcoming discussion.

We will consider several scenarios for the matrix $H$:

1. $H$ is deterministic.
2. $H$ is a random matrix (for which we shall use the notation $\mathbf{H}$), chosen according to a probability distribution, and each use of the channel corresponds to an independent realization of $\mathbf{H}$.
3. $\mathbf{H}$ is a random matrix, but is fixed once it is chosen.

The main focus of this paper is on the last two of these cases. The first case is included so as to expose the techniques used in the later cases in a more familiar context. In the cases when $\mathbf{H}$ is random, we will assume that its entries form an i.i.d. Gaussian collection with zero mean, independent real and imaginary parts, each with variance $1/2$. Equivalently, each entry of $\mathbf{H}$ has uniform phase and Rayleigh magnitude. This choice models a Rayleigh fading environment with enough separation within the receiving antennas and the transmitting antennas such that the fades for each transmitting-receiving antenna pair are independent. In all cases, we will assume that the realization of $\mathbf{H}$ is known to the receiver, or, equivalently, the channel output consists of the pair $(y, H)$, and the distribution of $\mathbf{H}$ is known at the transmitter.

2 PRELIMINARIES

A complex random vector $x \in \mathbb{C}^n$ is said to be Gaussian if the real random vector $\hat{x} \in \mathbb{R}^{2n}$ consisting of its real and imaginary parts, $\hat{x} = \begin{bmatrix}\operatorname{Re}(x)\\ \operatorname{Im}(x)\end{bmatrix}$, is Gaussian. Thus, to specify the distribution of a complex Gaussian random vector $x$, it is necessary to specify the expectation and covariance of $\hat{x}$, namely,

$$E[\hat{x}] \in \mathbb{R}^{2n} \quad\text{and}\quad E\big[(\hat{x} - E[\hat{x}])(\hat{x} - E[\hat{x}])^\top\big] \in \mathbb{R}^{2n \times 2n}.$$

We will say that a complex Gaussian random vector $x$ is circularly symmetric if the covariance of the corresponding $\hat{x}$ has the structure

$$E\big[(\hat{x} - E[\hat{x}])(\hat{x} - E[\hat{x}])^\top\big] = \frac{1}{2}\begin{bmatrix}\operatorname{Re}(Q) & -\operatorname{Im}(Q)\\ \operatorname{Im}(Q) & \operatorname{Re}(Q)\end{bmatrix} \qquad (3)$$
for some Hermitian non-negative definite $Q \in \mathbb{C}^{n \times n}$. Note that the real part of an Hermitian matrix is symmetric and the imaginary part of an Hermitian matrix is antisymmetric, and thus the matrix appearing in (3) is real and symmetric. In this case $E[(x - E[x])(x - E[x])^\dagger] = Q$, and thus, a circularly symmetric complex Gaussian random vector $x$ is specified by prescribing $E[x]$ and $E[(x - E[x])(x - E[x])^\dagger]$.

For any $z \in \mathbb{C}^n$ and $A \in \mathbb{C}^{m \times n}$ define

$$\hat{z} = \begin{bmatrix}\operatorname{Re}(z)\\ \operatorname{Im}(z)\end{bmatrix} \in \mathbb{R}^{2n} \quad\text{and}\quad \hat{A} = \begin{bmatrix}\operatorname{Re}(A) & -\operatorname{Im}(A)\\ \operatorname{Im}(A) & \operatorname{Re}(A)\end{bmatrix} \in \mathbb{R}^{2m \times 2n}.$$

Lemma 1. The mappings $z \mapsto \hat{z}$ and $A \mapsto \hat{A}$ have the following properties:

(4a) $C = AB \implies \hat{C} = \hat{A}\hat{B}$
(4b) $C = A + B \implies \hat{C} = \hat{A} + \hat{B}$
(4c) $C = A^\dagger \implies \hat{C} = \hat{A}^\top$
(4d) $C = A^{-1} \implies \hat{C} = \hat{A}^{-1}$
(4e) $\det(\hat{A}) = |\det(A)|^2 = \det(A)\det(A)^*$
(4f) $z = x + y \implies \hat{z} = \hat{x} + \hat{y}$
(4g) $y = Ax \implies \hat{y} = \hat{A}\hat{x}$
(4h) $\operatorname{Re}(x^\dagger y) = \hat{x}^\top\hat{y}$

Proof: The properties (4a), (4b) and (4c) are immediate. (4d) follows from (4a) and the fact that $\hat{I}_n = I_{2n}$. (4e) follows from

$$\det(\hat{A}) = \det\left(\begin{bmatrix}I & iI\\ 0 & I\end{bmatrix}\hat{A}\begin{bmatrix}I & -iI\\ 0 & I\end{bmatrix}\right) = \det\begin{bmatrix}A & 0\\ \operatorname{Im}(A) & \bar{A}\end{bmatrix} = \det(A)\det(A)^*.$$

(4f), (4g) and (4h) are immediate. □

Corollary 1. $U \in \mathbb{C}^{n \times n}$ is unitary if and only if $\hat{U} \in \mathbb{R}^{2n \times 2n}$ is orthonormal.

Corollary 2. If $Q \in \mathbb{C}^{n \times n}$ is non-negative definite then so is $\hat{Q} \in \mathbb{R}^{2n \times 2n}$.

The probability density (with respect to the standard Lebesgue measure on $\mathbb{C}^n$) of a circularly symmetric complex Gaussian with mean $\mu$ and covariance $Q$ is given by

$$\gamma_{\mu,Q}(x) = \det(\pi\hat{Q})^{-1/2}\exp\big(-(\hat{x}-\hat{\mu})^\top\hat{Q}^{-1}(\hat{x}-\hat{\mu})\big) = \det(\pi Q)^{-1}\exp\big(-(x-\mu)^\dagger Q^{-1}(x-\mu)\big)$$

where the second equality follows from (4d)-(4h). The differential entropy of a complex Gaussian $x$ with covariance $Q$ is given by

$$\mathcal{H}(\gamma_Q) = E_{\gamma_Q}[-\log\gamma_Q(x)] = \log\det(\pi Q) + (\log e)\,E[x^\dagger Q^{-1}x] = \log\det(\pi Q) + (\log e)\operatorname{tr}\big(E[xx^\dagger]Q^{-1}\big) = \log\det(\pi Q) + (\log e)\operatorname{tr}(I) = \log\det(\pi e Q).$$

For us, the importance of the circularly symmetric complex Gaussians is due to the following lemma: circularly symmetric complex Gaussians are entropy maximizers.

Lemma 2. Suppose the complex random vector $x \in \mathbb{C}^n$ is zero-mean and satisfies $E[xx^\dagger] = Q$, i.e., $E[x_i x_j^*] = Q_{ij}$, $1 \le i, j \le n$. Then the entropy of $x$ satisfies $\mathcal{H}(x) \le \log\det(\pi e Q)$ with equality if and only if $x$ is a circularly symmetric complex Gaussian with $E[xx^\dagger] = Q$.

Proof: Let $p$ be any density function satisfying $E_p[x_i x_j^*] = Q_{ij}$, $1 \le i, j \le n$. Let $\gamma_Q(x) = \det(\pi Q)^{-1}\exp(-x^\dagger Q^{-1}x)$. Observe that $E_{\gamma_Q}[x_i x_j^*] = Q_{ij}$, and that $\log\gamma_Q(x)$ is a linear combination of the terms $x_i x_j^*$. Thus $E_{\gamma_Q}[\log\gamma_Q(x)] = E_p[\log\gamma_Q(x)]$. Then,

$$\mathcal{H}(p) - \mathcal{H}(\gamma_Q) = -E_p[\log p(x)] + E_{\gamma_Q}[\log\gamma_Q(x)] = -E_p[\log p(x)] + E_p[\log\gamma_Q(x)] = E_p\Big[\log\frac{\gamma_Q(x)}{p(x)}\Big] \le 0,$$

with equality only if $p = \gamma_Q$. Thus $\mathcal{H}(p) \le \mathcal{H}(\gamma_Q)$. □

Lemma 3. If $x \in \mathbb{C}^n$ is a circularly symmetric complex Gaussian then so is $y = Ax$ for any $A \in \mathbb{C}^{m \times n}$.

Proof: We may assume $x$ is zero-mean. Let $Q = E[xx^\dagger]$. Then $y$ is zero-mean, $\hat{y} = \hat{A}\hat{x}$, and

$$E[\hat{y}\hat{y}^\top] = \hat{A}\,E[\hat{x}\hat{x}^\top]\,\hat{A}^\top = \tfrac{1}{2}\hat{A}\hat{Q}\hat{A}^\top = \tfrac{1}{2}\hat{K}$$

where $K = AQA^\dagger$. □

Lemma 4. If $x$ and $y$ are independent circularly symmetric complex Gaussians, then $z = x + y$ is a circularly symmetric complex Gaussian.

Proof: Let $A = E[xx^\dagger]$ and $B = E[yy^\dagger]$. Then $E[\hat{z}\hat{z}^\top] = \tfrac{1}{2}\hat{C}$ with $C = A + B$. □
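The real-vector representation above is easy to exercise numerically. The sketch below is not from the paper; it is a minimal illustration, assuming NumPy, of properties (4e) and (4g) of Lemma 1 and of one standard way to draw circularly symmetric complex Gaussian samples with a prescribed covariance $Q$ (the function names are ours).

```python
import numpy as np

def widehat(A):
    """Real representation A_hat = [[Re A, -Im A], [Im A, Re A]] of Lemma 1."""
    return np.block([[A.real, -A.imag], [A.imag, A.real]])

def hat(z):
    """Real 2n-vector [Re z; Im z] of a complex n-vector z."""
    return np.concatenate([z.real, z.imag])

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# (4g): y = Ax  implies  y_hat = A_hat x_hat
assert np.allclose(hat(A @ x), widehat(A) @ hat(x))
# (4e): det(A_hat) = |det(A)|^2
assert np.allclose(np.linalg.det(widehat(A)), abs(np.linalg.det(A)) ** 2)

# A circularly symmetric complex Gaussian with covariance Q = B B^dagger can be
# drawn as x = B w, where w has i.i.d. CN(0,1) entries (Lemma 3 applied to Q = I).
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Q = B @ B.conj().T
w = (rng.standard_normal((n, 100000)) + 1j * rng.standard_normal((n, 100000))) / np.sqrt(2)
samples = B @ w
print(np.max(np.abs(samples @ samples.conj().T / samples.shape[1] - Q)))  # close to 0
```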
3 THE GAUSSIAN CHANNEL WITH FIXED TRANSFER FUNCTION

We will start by reminding ourselves of the case of deterministic $H$. The results of this section can be inferred from [1, Ch. 8].

3.1 CAPACITY

We will first derive an expression for the capacity $C(H, P)$ of this channel. To that end, we will maximize the average mutual information $I(x; y)$ between the input and the output of the channel over the choice of the distribution of $x$.

By the singular value decomposition theorem, any matrix $H \in \mathbb{C}^{r \times t}$ can be written as

$$H = UDV^\dagger$$

where $U \in \mathbb{C}^{r \times r}$ and $V \in \mathbb{C}^{t \times t}$ are unitary, and $D \in \mathbb{R}^{r \times t}$ is non-negative and diagonal. In fact, the diagonal entries of $D$ are the non-negative square roots of the eigenvalues of $HH^\dagger$, the columns of $U$ are the eigenvectors of $HH^\dagger$ and the columns of $V$ are the eigenvectors of $H^\dagger H$. Thus, we can write (1) as

$$y = UDV^\dagger x + n.$$

Let $\tilde{y} = U^\dagger y$, $\tilde{x} = V^\dagger x$, $\tilde{n} = U^\dagger n$. Note that $U$ and $V$ are invertible, $\tilde{n}$ has the same distribution as $n$ and $E[\tilde{x}^\dagger\tilde{x}] = E[x^\dagger x]$. Thus, the original channel is equivalent to the channel

$$\tilde{y} = D\tilde{x} + \tilde{n} \qquad (5)$$

where $\tilde{n}$ is zero-mean, Gaussian, with independent, identically distributed real and imaginary parts and $E[\tilde{n}\tilde{n}^\dagger] = I_r$. Since $H$ is of rank at most $\min\{r,t\}$, at most $\min\{r,t\}$ of its singular values are non-zero. Denoting these by $\lambda_i^{1/2}$, $i = 1, \dots, \min\{r,t\}$, we can write (5) component-wise, to get

$$\tilde{y}_i = \lambda_i^{1/2}\tilde{x}_i + \tilde{n}_i, \qquad 1 \le i \le \min\{r,t\},$$

and the rest of the components of $\tilde{y}$ (if any) are equal to the corresponding components of $\tilde{n}$. We thus see that $\tilde{y}_i$ for $i > \min\{t,r\}$ is independent of the transmitted signal and that $\tilde{x}_i$ for $i > \min\{t,r\}$ don't play any role. To maximize the mutual information, we need to choose $\{\tilde{x}_i : 1 \le i \le \min\{r,t\}\}$ to be independent, with each $\tilde{x}_i$ having independent Gaussian, zero-mean real and imaginary parts. The variances need to be chosen via "water-filling" as

$$E[\operatorname{Re}(\tilde{x}_i)^2] = E[\operatorname{Im}(\tilde{x}_i)^2] = \tfrac{1}{2}(\mu - \lambda_i^{-1})^+,$$

where $\mu$ is chosen to meet the power constraint. Here, $a^+$ denotes $\max\{0, a\}$. The power $P$ and the maximal mutual information can thus be parametrized as

$$P(\mu) = \sum_i (\mu - \lambda_i^{-1})^+, \qquad C(\mu) = \sum_i \big(\log(\mu\lambda_i)\big)^+.$$

Remark 1 (Reciprocity). Since the non-zero eigenvalues of $H^\dagger H$ are the same as those of $HH^\dagger$, we see that the capacities of channels corresponding to $H$ and $H^\dagger$ are the same.

Example 1. Take $H_{ij} = 1$ for all $i, j$. We can write $H$ as

$$H = \sqrt{rt}\,\begin{bmatrix}1/\sqrt{r}\\ \vdots\\ 1/\sqrt{r}\end{bmatrix}\begin{bmatrix}1/\sqrt{t} & \cdots & 1/\sqrt{t}\end{bmatrix}$$

and we thus see that in the singular value decomposition of $H$ the diagonal matrix $D$ will have only one non-zero entry, $\sqrt{rt}$. (We also see that the first column of $U$ is $r^{-1/2}[1, \dots, 1]^\top$ and the first column of $V$ is $t^{-1/2}[1, \dots, 1]^\top$.) Thus,

$$C = \log(1 + rtP).$$

The $x = V\tilde{x}$ that achieves this capacity satisfies $E[x_i x_j^*] = P/t$ for all $i, j$, i.e., the transmitters are all sending the same signal. Note that, even though each transmitter is sending a power of $P/t$, since their signals add coherently at the receiver, the power received at each receiver is $Pt$. Since each receiver sees the same signal and the noises at the receivers are uncorrelated, the overall signal-to-noise ratio is $Prt$.

Example 2. Take $r = t = n$ and $H = I_n$. Then

$$C = n\log(1 + P/n).$$

For $x$ that achieves this capacity $E[x_i x_j^*] = \delta_{ij}P/n$, i.e., the components of $x$ are i.i.d. However, it is incorrect to infer from this conclusion that to achieve capacity one has to do independent coding for each transmitter. It is true that the capacity of this channel can be achieved by splitting the incoming data stream into $t$ streams, coding and modulating these streams separately, and then sending the $t$ modulated signals over the different transmitters. But, suppose $Nt$ bits are going to be transmitted, and we will either separate them into $t$ groups of $N$ bits each and use each group to select one of $2^N$ signals for each transmitter, or, we will use all $Nt$ bits to select one of $2^{Nt}$ signal vectors. The second of these alternatives will yield a probability of error much smaller than the first, at the expense of much greater complexity. Indeed, the log of the error probability in the two cases will differ by a factor of $t$. (See the error exponents of parallel channels in [1, pp. 149-150].)
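The water-filling recipe of Section 3.1 translates directly into a few lines of code. The following sketch is illustrative only (it assumes NumPy, uses natural logarithms so that capacities are in nats, and the function name is ours): it computes $C(H, P)$ from the singular values of a given $H$ by bisecting on the water level $\mu$.

```python
import numpy as np

def waterfill_capacity(H, P, tol=1e-12):
    """Capacity (nats) of y = Hx + n with E[nn^dagger] = I and tr(E[xx^dagger]) <= P, fixed H."""
    lam = np.linalg.svd(H, compute_uv=False) ** 2   # non-zero eigenvalues of H^dagger H
    lam = lam[lam > 1e-15]
    lo, hi = 0.0, P + 1.0 / lam.min()               # bracket for the water level mu
    while hi - lo > tol:
        mu = 0.5 * (lo + hi)
        if np.sum(np.maximum(mu - 1.0 / lam, 0.0)) > P:
            hi = mu                                  # too much power used: lower the level
        else:
            lo = mu
    mu = 0.5 * (lo + hi)
    return float(np.sum(np.maximum(np.log(mu * lam), 0.0)))

# Example 1: H with all entries equal to one gives C = log(1 + r t P).
r, t, P = 3, 4, 2.0
print(waterfill_capacity(np.ones((r, t)), P), np.log(1 + r * t * P))
# Example 2: H = I_n gives C = n log(1 + P/n).
print(waterfill_capacity(np.eye(4), P), 4 * np.log(1 + P / 4))
```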
3.2 ALTERNATIVE DERIVATION OF THE CAPACITY

The mutual information $I(x; y)$ can be written as

$$I(x; y) = \mathcal{H}(y) - \mathcal{H}(y|x) = \mathcal{H}(y) - \mathcal{H}(n),$$

and thus maximizing $I(x; y)$ is equivalent to maximizing $\mathcal{H}(y)$. Note that if $x$ satisfies $E[x^\dagger x] \le P$, so does $x - E[x]$, so we can restrict our attention to zero-mean $x$. Furthermore, if $x$ is zero-mean with covariance $E[xx^\dagger] = Q$, then
$y$ is zero-mean with covariance $E[yy^\dagger] = HQH^\dagger + I_r$, and by Lemma 2 among such $y$ the entropy is largest when $y$ is circularly symmetric complex Gaussian, which is the case when $x$ is circularly symmetric complex Gaussian (Lemmas 3 and 4). So, we can further restrict our attention to circularly symmetric complex Gaussian $x$. In this case the mutual information is given by

$$I(x; y) = \log\det(I_r + HQH^\dagger) = \log\det(I_t + QH^\dagger H)$$

where the second equality follows from the determinant identity $\det(I + AB) = \det(I + BA)$, and it only remains to choose $Q$ to maximize this quantity subject to the constraints $\operatorname{tr}(Q) \le P$ and that $Q$ is non-negative definite. The quantity $\log\det(I + HQH^\dagger)$ will occur in this note frequently enough that we will let

$$\Psi(Q, H) = \log\det(I + HQH^\dagger)$$

denote it. Since $H^\dagger H$ is Hermitian it can be diagonalized, $H^\dagger H = U^\dagger\Lambda U$, with unitary $U$ and non-negative diagonal $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_t)$. Applying the determinant identity again we see that

$$\det(I_r + HQH^\dagger) = \det(I_t + \Lambda^{1/2}UQU^\dagger\Lambda^{1/2}).$$

Observe that $\tilde{Q} = UQU^\dagger$ is non-negative definite when and only when $Q$ is, and that $\operatorname{tr}(\tilde{Q}) = \operatorname{tr}(Q)$; thus the maximization over $Q$ can be carried out equally well over $\tilde{Q}$. Note also that for any non-negative definite matrix $A$, $\det(A) \le \prod_i A_{ii}$; thus

$$\det(I_t + \Lambda^{1/2}\tilde{Q}\Lambda^{1/2}) \le \prod_i (1 + \tilde{Q}_{ii}\lambda_i)$$

with equality when $\tilde{Q}$ is diagonal. Thus we see that the maximizing $\tilde{Q}$ is diagonal, and the optimal diagonal entries can be found via "water-filling" to be

$$\tilde{Q}_{ii} = (\mu - \lambda_i^{-1})^+, \qquad i = 1, \dots, t,$$

where $\mu$ is chosen to satisfy $\sum_i \tilde{Q}_{ii} = P$. The corresponding maximum mutual information is given by

$$\sum_i \big(\log(\mu\lambda_i)\big)^+$$

as before.

3.3 ERROR EXPONENTS

Knowing the capacity of a channel is not always sufficient. One may be interested in knowing how hard it is to get close to this capacity. Error exponents provide a partial answer to this question by giving an upper bound to the probability of error achievable by block codes of a given length $n$ and rate $R$. The upper bound is known as the random coding bound and is given by

$$P(\text{error}) \le \exp\big(-nE_r(R)\big),$$

where the random coding exponent $E_r(R)$ is given by

$$E_r(R) = \max_{0 \le \rho \le 1} E_0(\rho) - \rho R,$$

where, in turn, $E_0(\rho)$ is given by the supremum over all input distributions $q_x$ satisfying the energy constraint of

$$E_0(\rho, q_x) = -\log\int\Big[\int q_x(x)\,p(y|x)^{1/(1+\rho)}\,dx\Big]^{1+\rho}dy.$$

In our case $p(y|x) = \det(\pi I_r)^{-1}\exp\big(-(y - Hx)^\dagger(y - Hx)\big)$. If we choose $q_x$ as the Gaussian distribution $\gamma_Q$ we get (after some algebra)

$$E_0(\rho, Q) = \rho\log\det\big(I_r + (1+\rho)^{-1}HQH^\dagger\big) = \rho\,\Psi\big((1+\rho)^{-1}Q,\, H\big).$$

The maximization of $E_0$ over $Q$ is thus the same problem as maximizing the mutual information, and we get $E_0(\rho) = \rho\,C\big(H, P/(1+\rho)\big)$. To choose $q_x$ as Gaussian is not optimal, and a distribution concentrated on a "thin spherical shell" will give better results as in [1, §7.3]; nonetheless, the above expression is a convenient lower bound to $E_0$ and thus yields an upper bound to the probability of error.
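Since $E_0(\rho) = \rho\,C(H, P/(1+\rho))$ under the Gaussian-input restriction, the random coding exponent reduces to a one-dimensional maximization over $\rho$. The sketch below is illustrative only and assumes NumPy; it treats the channel $H = I_n$ of Example 2, for which the water-filling allocation is uniform and $C(I_n, P') = n\log(1 + P'/n)$, so no numerical water-filling is needed.

```python
import numpy as np

def random_coding_exponent_identity(n, P, R, num=2001):
    """Gaussian-input lower bound on E_r(R) (nats) for the channel H = I_n:
    E_r(R) = max over 0 <= rho <= 1 of [rho * n * log(1 + P/(n(1+rho))) - rho * R]."""
    rhos = np.linspace(0.0, 1.0, num)
    e0 = rhos * n * np.log(1.0 + P / (n * (1.0 + rhos)))
    return float(np.max(e0 - rhos * R))

n, P = 4, 2.0
C = n * np.log(1.0 + P / n)
for R in (0.25 * C, 0.5 * C, 0.9 * C):
    print(round(R, 3), round(random_coding_exponent_identity(n, P, R), 4))
# The exponent decreases towards zero as the rate R approaches the capacity C.
```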
4 THE GAUSSIAN CHANNEL WITH RAYLEIGH FADING

Suppose now that the matrix $H$ is not fixed, but is a random matrix $\mathbf{H}$ independent of both $x$ and $n$. The realization $H$ of $\mathbf{H}$ is assumed to be known at the receiver, but not at the transmitter. The channel is thus one with input $x$ and output $(y, H) = (\mathbf{H}x + n, \mathbf{H})$. We will assume that the entries of $\mathbf{H}$ are independent and each entry is zero-mean, Gaussian, with independent real and imaginary parts, each with variance $1/2$. Equivalently, each entry of $\mathbf{H}$ has uniformly distributed phase and Rayleigh distributed magnitude, with expected magnitude square equal to unity. This is intended to model a Rayleigh fading channel with enough physical separation within the transmitting and the receiving antennas to achieve independence in the entries of $\mathbf{H}$. We will first show that such an $\mathbf{H}$ is invariant under unitary transformations.

Lemma 5. Suppose $\mathbf{H} \in \mathbb{C}^{r \times t}$ is a complex Gaussian matrix with independent identically distributed entries, each entry with independent real and imaginary parts with zero mean and equal variance. Then for any unitary $U \in \mathbb{C}^{r \times r}$ and $V \in \mathbb{C}^{t \times t}$, the distribution of $U\mathbf{H}V^\dagger$ is the same as the distribution of $\mathbf{H}$.

Proof: It suffices to show that $\mathbf{G} = U\mathbf{H}$ has the same distribution as $\mathbf{H}$. The lemma then follows from an application of this to $\mathbf{G}^\dagger$. Since the columns of $\mathbf{H}$ are independent, the columns of $\mathbf{G}$ are independent also. It remains to check
that each column of $\mathbf{G}$ has the same distribution as that of $\mathbf{H}$. Since the columns of $\mathbf{H}$ are circularly symmetric complex Gaussian vectors, so are those of $\mathbf{G}$. If $g_j$ and $h_j$ are the $j$-th column of $\mathbf{G}$ and $\mathbf{H}$ respectively, then

$$E[g_j g_j^\dagger] = U\,E[h_j h_j^\dagger]\,U^\dagger = E[h_j h_j^\dagger],$$

where the last equality holds because $E[h_j h_j^\dagger]$ is a multiple of the identity matrix. □

In this section we will assume that the channel is memoryless: for each use of the channel an independent realization of $\mathbf{H}$ is drawn. In this case we are on familiar ground and the capacity can be computed as the maximum mutual information. However, the results that follow are valid verbatim for channels for which $\mathbf{H}$ is generated by an ergodic process: as long as the receiver observes the $\mathbf{H}$ process, only the first order statistics are needed to determine channel capacity.

4.1 CAPACITY

Since the receiver knows the realization of $\mathbf{H}$, the channel output is the pair $(y, H) = (\mathbf{H}x + n, \mathbf{H})$. The mutual information between input and output is then

$$I\big(x; (y, \mathbf{H})\big) = I(x; \mathbf{H}) + I(x; y|\mathbf{H}) = I(x; y|\mathbf{H}) = E_{\mathbf{H}}\big[I(x; y|\mathbf{H} = H)\big].$$

We know from the previous section that if $x$ is constrained to have covariance $Q$, the choice of $x$ that maximizes $I(x; y|\mathbf{H} = H)$ is the circularly symmetric complex Gaussian of covariance $Q$, and $\Psi(Q, H) = \log\det(I_r + HQH^\dagger)$ is the corresponding maximal mutual information. We thus need to maximize

$$\Psi(Q) = E\big[\Psi(Q, \mathbf{H})\big] = E\big[\log\det(I_r + \mathbf{H}Q\mathbf{H}^\dagger)\big]$$

over the choice of non-negative definite $Q$ subject to $\operatorname{tr}(Q) \le P$.

Since $Q$ is non-negative definite, we can write it as $Q = UDU^\dagger$ where $U$ is unitary and $D$ is non-negative and diagonal. With this substitution

$$\Psi(Q) = E\big[\log\det\big(I_r + (\mathbf{H}U)D(\mathbf{H}U)^\dagger\big)\big].$$

By Lemma 5 the distribution of $\mathbf{H}U$ is the same as that of $\mathbf{H}$, and thus $\Psi(Q) = \Psi(D)$. We can thus restrict our attention to non-negative diagonal $Q$. Given any such $Q$ and any permutation matrix $\Pi$, consider $Q^\Pi = \Pi Q\Pi^\dagger$. Since $\mathbf{H}\Pi$ has the same distribution as $\mathbf{H}$, $\Psi(Q^\Pi) = \Psi(Q)$. Note that for any $H$, the mapping $Q \mapsto I_r + HQH^\dagger$ is linear and preserves positive definiteness. Since $\log\det$ is concave on the set of positive definite matrices, $Q \mapsto \Psi(Q, H) = \log\det(I_r + HQH^\dagger)$ is concave. It then follows that $Q \mapsto \Psi(Q)$ is concave. Thus

$$\tilde{Q} = \frac{1}{t!}\sum_\Pi Q^\Pi$$

satisfies $\Psi(\tilde{Q}) \ge \Psi(Q)$ and $\operatorname{tr}(\tilde{Q}) = \operatorname{tr}(Q)$. Note that $\tilde{Q}$ is a multiple of the identity matrix and we conclude that the optimal $Q$ must be of the form $\alpha I$. It is clear that the maximum is achieved when $\alpha$ is the largest possible, namely $P/t$. To summarize, we have shown the following:

Theorem 1. The capacity of the channel is achieved when $x$ is a circularly symmetric complex Gaussian with zero-mean and covariance $(P/t)I_t$. The capacity is given by

$$E\big[\log\det\big(I_r + (P/t)\mathbf{H}\mathbf{H}^\dagger\big)\big].$$

Note that for fixed $r$, by the law of large numbers $\tfrac{1}{t}\mathbf{H}\mathbf{H}^\dagger \to I_r$ almost surely as $t$ gets large. Thus, the capacity in the limit of large $t$ equals

$$r\log(1 + P). \qquad (6)$$
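Theorem 1 reduces the capacity to a single expectation over $\mathbf{H}$, which can be estimated by straightforward Monte Carlo simulation before turning to the exact evaluation of the next subsection. A minimal sketch (ours, not the paper's; it assumes NumPy and reports nats):

```python
import numpy as np

def ergodic_capacity_mc(r, t, P, trials=20000, seed=0):
    """Monte Carlo estimate of E[log det(I_r + (P/t) H H^dagger)], H with i.i.d. CN(0,1) entries."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        H = (rng.standard_normal((r, t)) + 1j * rng.standard_normal((r, t))) / np.sqrt(2)
        total += np.linalg.slogdet(np.eye(r) + (P / t) * H @ H.conj().T)[1]
    return total / trials

r, P = 2, 10.0
for t in (1, 2, 4, 16, 64):
    print(t, round(ergodic_capacity_mc(r, t, P), 3))
# As t grows with r fixed, the estimates approach r * log(1 + P), in agreement with (6).
```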
4.2 EVALUATION OF THE CAPACITY

Although the expectation $E[\log\det(I_r + (P/t)\mathbf{H}\mathbf{H}^\dagger)]$ is easy to evaluate for either $r = 1$ or $t = 1$, its evaluation gets rather involved for $r$ and $t$ larger than 1. We will now show how to do this evaluation. Note that $\det(I_r + (P/t)\mathbf{H}\mathbf{H}^\dagger) = \det(I_t + (P/t)\mathbf{H}^\dagger\mathbf{H})$ and define

$$\mathbf{W} = \begin{cases}\mathbf{H}\mathbf{H}^\dagger & r < t\\ \mathbf{H}^\dagger\mathbf{H} & r \ge t,\end{cases}$$

$n = \max\{r,t\}$ and $m = \min\{r,t\}$. Then $\mathbf{W}$ is an $m \times m$ random non-negative definite matrix and thus has real, non-negative eigenvalues. We can write the capacity in terms of the eigenvalues $\lambda_1, \dots, \lambda_m$ of $\mathbf{W}$:

$$E\Big[\sum_{i=1}^m \log\big(1 + (P/t)\lambda_i\big)\Big]. \qquad (7)$$

The distribution law of $\mathbf{W}$ is called the Wishart distribution with parameters $m$, $n$ and the joint density of the ordered eigenvalues is known to be (see e.g. [2] or [3, p. 37])

$$K_{m,n}^{-1}\prod_i e^{-\lambda_i}\lambda_i^{n-m}\prod_{i<j}(\lambda_i - \lambda_j)^2, \qquad \lambda_1 \ge \cdots \ge \lambda_m \ge 0,$$

where $K_{m,n}$ is a normalizing factor. The unordered eigenvalues then have the density

$$p_\lambda(\lambda_1, \dots, \lambda_m) = (m!\,K_{m,n})^{-1}\prod_i e^{-\lambda_i}\lambda_i^{n-m}\prod_{i<j}(\lambda_i - \lambda_j)^2.$$

The expectation we wish to compute,

$$E\Big[\sum_{i=1}^m \log\big(1 + (P/t)\lambda_i\big)\Big] = \sum_{i=1}^m E\big[\log\big(1 + (P/t)\lambda_i\big)\big] = m\,E\big[\log\big(1 + (P/t)\lambda_1\big)\big],$$
depends only on the distribution of one of the unordered eigenvalues. To compute the density of $\lambda_1$ we only need to integrate out the $\lambda_2, \dots, \lambda_m$:

$$p_{\lambda_1}(\lambda_1) = \int\cdots\int p_\lambda(\lambda_1, \dots, \lambda_m)\,d\lambda_2\cdots d\lambda_m.$$

To that end, note that $\prod_{i<j}(\lambda_i - \lambda_j)$ is the determinant of a Vandermonde matrix

$$D(\lambda_1, \dots, \lambda_m) = \begin{bmatrix}1 & \cdots & 1\\ \lambda_1 & \cdots & \lambda_m\\ \vdots & & \vdots\\ \lambda_1^{m-1} & \cdots & \lambda_m^{m-1}\end{bmatrix}$$

and we can write $p_\lambda$ as

$$p_\lambda(\lambda_1, \dots, \lambda_m) = (m!\,K_{m,n})^{-1}\det\big(D(\lambda_1, \dots, \lambda_m)\big)^2\prod_i \lambda_i^{n-m}e^{-\lambda_i}.$$

With row operations we can transform $D(\lambda_1, \dots, \lambda_m)$ into

$$\tilde{D}(\lambda_1, \dots, \lambda_m) = \begin{bmatrix}\varphi_1(\lambda_1) & \cdots & \varphi_1(\lambda_m)\\ \vdots & & \vdots\\ \varphi_m(\lambda_1) & \cdots & \varphi_m(\lambda_m)\end{bmatrix}$$

where $\varphi_1, \dots, \varphi_m$ is the result of applying the Gram-Schmidt orthogonalization procedure to the sequence $1, \lambda, \lambda^2, \dots, \lambda^{m-1}$ in the space of real valued functions with inner product

$$\langle f, g\rangle = \int_0^\infty f(\lambda)g(\lambda)\lambda^{n-m}e^{-\lambda}\,d\lambda.$$

Thus $\int_0^\infty \varphi_i(\lambda)\varphi_j(\lambda)\lambda^{n-m}e^{-\lambda}\,d\lambda = \delta_{ij}$. The determinant of $D$ then equals (modulo multiplicative constants picked up from the row operations) the determinant of $\tilde{D}$, which in turn, by the definition of the determinant, equals

$$\det\big(\tilde{D}(\lambda_1, \dots, \lambda_m)\big) = \sum_\alpha (-1)^{\operatorname{per}(\alpha)}\prod_{i=1}^m \varphi_{\alpha_i}(\lambda_i)$$

where the summation is over all permutations $\alpha$ of $\{1, \dots, m\}$, and $\operatorname{per}(\alpha)$ is 0 or 1 depending on the permutation $\alpha$ being even or odd. Thus

$$p_\lambda(\lambda_1, \dots, \lambda_m) = C_{m,n}\sum_{\alpha,\beta}(-1)^{\operatorname{per}(\alpha)+\operatorname{per}(\beta)}\prod_i \varphi_{\alpha_i}(\lambda_i)\varphi_{\beta_i}(\lambda_i)\lambda_i^{n-m}e^{-\lambda_i}.$$

Integrating over $\lambda_2, \dots, \lambda_m$ we get

$$p_{\lambda_1}(\lambda_1) = C_{m,n}\sum_{\substack{\alpha,\beta:\\ \alpha_i = \beta_i,\ i\ge 2}}(-1)^{\operatorname{per}(\alpha)+\operatorname{per}(\beta)}\varphi_{\alpha_1}(\lambda_1)\varphi_{\beta_1}(\lambda_1)\lambda_1^{n-m}e^{-\lambda_1} = C_{m,n}(m-1)!\sum_{i=1}^m \varphi_i(\lambda_1)^2\lambda_1^{n-m}e^{-\lambda_1} = \frac{1}{m}\sum_{i=1}^m \varphi_i(\lambda_1)^2\lambda_1^{n-m}e^{-\lambda_1},$$

where the second equality follows from the fact that if $\alpha_i = \beta_i$ for $i \ge 2$ then $\alpha_1 = \beta_1$ also (since both $\alpha$ and $\beta$ are permutations of $\{1, \dots, m\}$) and thus $\alpha = \beta$, and the last equality follows from the fact that $\varphi_i(\lambda_1)^2\lambda_1^{n-m}e^{-\lambda_1}$ integrates to unity and thus $C_{m,n}$ must equal $1/m!$.

Observe now that the Gram-Schmidt orthonormalization yields

$$\varphi_{k+1}(\lambda) = \Big[\frac{k!}{(k+n-m)!}\Big]^{1/2}L_k^{n-m}(\lambda), \qquad k = 0, \dots, m-1,$$

where $L_k^{n-m}(x) = \frac{1}{k!}e^x x^{m-n}\frac{d^k}{dx^k}\big(e^{-x}x^{n-m+k}\big)$ is the associated Laguerre polynomial of order $k$. (See [4, §8.90, 8.97].) To summarize:

Theorem 2. The capacity of the channel with $t$ transmitters and $r$ receivers under power constraint $P$ equals

$$\int_0^\infty \log\big(1 + (P/t)\lambda\big)\sum_{k=0}^{m-1}\frac{k!}{(k+n-m)!}\big[L_k^{n-m}(\lambda)\big]^2\lambda^{n-m}e^{-\lambda}\,d\lambda \qquad (8)$$

where $m = \min\{r,t\}$, $n = \max\{r,t\}$, and $L_k^{n-m}$ are the associated Laguerre polynomials.

Figure 1 shows the value of the integral in (8) for $1 \le r, t \le 20$ and $P = 20$ dB.

[Figure 1: Capacity (in nats) vs. r and t for P = 20 dB.]

Example 3. Consider $t = 1$. In this case $m = 1$ and $n = r$. Noting that $L_0^{r-1}(\lambda) = 1$, an application of (8) yields the capacity as

$$\frac{1}{(r-1)!}\int_0^\infty \log(1 + P\lambda)\lambda^{r-1}e^{-\lambda}\,d\lambda. \qquad (9)$$

Note that as $r$ gets large, so does the capacity. For large $r$, the capacity is asymptotic to $\log(1 + Pr)$, in the sense that the difference goes to zero.
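Formula (8) is easy to evaluate with standard quadrature and Laguerre-polynomial routines. The sketch below is illustrative only (it assumes SciPy and reports nats); for $r = t = 1$ it also checks the result against the closed form $e^{1/P}E_1(1/P)$ of the integral in (9).

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import eval_genlaguerre, exp1, gammaln

def capacity_laguerre(r, t, P):
    """Capacity (nats) from (8): integral of log(1 + (P/t)x) against m times the
    density of one unordered Wishart eigenvalue, written with Laguerre polynomials."""
    m, n = min(r, t), max(r, t)
    def weight(x):
        s = 0.0
        for k in range(m):
            coef = np.exp(gammaln(k + 1) - gammaln(k + n - m + 1))   # k!/(k+n-m)!
            s += coef * eval_genlaguerre(k, n - m, x) ** 2
        return s * x ** (n - m) * np.exp(-x)
    val, _ = quad(lambda x: np.log(1 + (P / t) * x) * weight(x), 0, np.inf, limit=200)
    return val

P = 10.0
print(capacity_laguerre(1, 1, P), np.exp(1 / P) * exp1(1 / P))   # r = t = 1 sanity check
for r, t in ((2, 2), (4, 2), (2, 4), (4, 4)):
    print(r, t, round(capacity_laguerre(r, t, P), 3))
```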
Example 4. Consider $r = 1$. As in the previous example, applying (8) yields the capacity as

$$\frac{1}{(t-1)!}\int_0^\infty \log\big(1 + P\lambda/t\big)\lambda^{t-1}e^{-\lambda}\,d\lambda. \qquad (10)$$

As noted in (6), the capacity approaches $\log(1 + P)$ as $t$ gets large.

Example 5. Consider $r = t$. In this case $n = m = r$, and an application of (8) yields the capacity as

$$\int_0^\infty \log\big(1 + P\lambda/r\big)\sum_{k=0}^{r-1}L_k(\lambda)^2 e^{-\lambda}\,d\lambda, \qquad (11)$$

where $L_k = L_k^0$ is the Laguerre polynomial of order $k$. Figure 2 shows this capacity for various values of $r$ and $P$.

[Figure 2: Capacity vs. r for r = t and various values of P: the value of the capacity (in nats) as found from (11) vs. r, for 0 dB ≤ P ≤ 35 dB in 5 dB increments.]

It is clear from the figure that the capacity is very well approximated by a linear function of $r$. Indeed, first rewrite (7) as

$$E\Big[\sum_{i=1}^m \log\big(1 + (P/t)\lambda_i\big)\Big] = m\,E\Big[\int_0^\infty \log\big(1 + (P/t)\,m\nu\big)\,dF_{\mathbf{W}/m}(\nu)\Big]$$

where $F_A(x)$ is the empirical distribution of the eigenvalues of an $m \times m$ Hermitian matrix $A$:

$$F_A(x) = \frac{\text{number of eigenvalues of } A \text{ less than } x}{m}.$$

A very general result from the theory of random matrices (see, e.g., [5]) says that for $\mathbf{W}$ defined as above, as $n = \max\{r,t\}$ and $m = \min\{r,t\}$ are increased with $n/m$ approaching a limit $\tau \ge 1$, the empirical distribution of the eigenvalues of $\mathbf{W}/m$ converges to the distribution with density

$$\frac{dF}{d\nu} = \begin{cases}\dfrac{1}{2\pi\nu}\sqrt{(\nu_+ - \nu)(\nu - \nu_-)} & \nu_- \le \nu \le \nu_+\\ 0 & \text{otherwise,}\end{cases} \qquad (12)$$

with $\nu_\pm = (\sqrt{\tau} \pm 1)^2$. Thus, in the limit of large $r$ and $t$ (note that $m/t$ converges, to $1$ if $r \ge t$ and to $1/\tau$ otherwise),

$$\frac{1}{m}E\Big[\sum_{i=1}^m \log\big(1 + (P/t)\lambda_i\big)\Big] \to \int_{\nu_-}^{\nu_+}\log\big(1 + (Pm/t)\,\nu\big)\,\frac{1}{2\pi\nu}\sqrt{(\nu_+ - \nu)(\nu - \nu_-)}\,d\nu. \qquad (13)$$

For the case under consideration, $m = n = r = t$, for which $\nu_- = 0$, $\nu_+ = 4$, and

$$\lim_{r\to\infty}\frac{C}{r} = \int_0^4 \log(1 + P\nu)\,\frac{1}{\pi}\sqrt{\frac{1}{\nu} - \frac{1}{4}}\,d\nu = \log P - 1 + \frac{\sqrt{4P+1} - 1}{2P} + 2\tanh^{-1}\frac{1}{\sqrt{4P+1}}.$$

The closed form of the integral was pointed out to me recently by [6].

Remark 2. The result from the theory of random matrices used in Example 5 applies to random matrices that are not necessarily Gaussian. For equation (13) to hold it is sufficient for $\mathbf{H}$ to have i.i.d. entries of unit variance.

Remark 3. The reciprocity property that we observed for deterministic $H$ does not hold for random $\mathbf{H}$: compare Examples 3 and 4, where the corresponding $\mathbf{H}$'s are transposes of each other. In Example 3, capacity increases without bound as $r$ gets large, whereas in Example 4 the capacity is bounded from above. Nonetheless, interchanging $r$ and $t$ does not change the matrix $\mathbf{W}$, and the capacity depends only on $P/t$ and the eigenvalues of $\mathbf{W}$. Thus, if $C(r, t, P)$ denotes the capacity of a channel with $r$ receivers, $t$ transmitters and total transmitter power $P$, then

$$C(a, b, Pb) = C(b, a, Pa).$$

Remark 4. In the computation preceding Theorem 2 we obtained the density of one of the unordered eigenvalues of the complex Wishart matrix $\mathbf{W}$. Using the identity (19) in the appendix we can find the joint density of any number $k$ of unordered eigenvalues of $\mathbf{W}$:

$$p_{\lambda_1,\dots,\lambda_k}(\lambda_1, \dots, \lambda_k) = \frac{(m-k)!}{m!}\det\big(D_k(\lambda_1,\dots,\lambda_k)^\top D_k(\lambda_1,\dots,\lambda_k)\big)\prod_{i=1}^k \lambda_i^{n-m}e^{-\lambda_i},$$

where

$$D_k(\lambda_1,\dots,\lambda_k) = \begin{bmatrix}\varphi_1(\lambda_1) & \cdots & \varphi_1(\lambda_k)\\ \vdots & & \vdots\\ \varphi_m(\lambda_1) & \cdots & \varphi_m(\lambda_k)\end{bmatrix}.$$
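The limiting expression of Example 5 can be checked numerically against the Laguerre formula (8). The following sketch is illustrative only (SciPy, nats): it evaluates both the integral over the limiting eigenvalue density and the closed form, which `capacity_laguerre(r, r, P)/r` from the previous sketch approaches as $r$ grows.

```python
import numpy as np
from scipy.integrate import quad

def asymptotic_capacity_per_antenna(P):
    """Large-r limit of C/r for r = t (Example 5): integral form and closed form."""
    integral, _ = quad(lambda v: np.log(1 + P * v) / np.pi * np.sqrt(1.0 / v - 0.25),
                       0.0, 4.0, limit=200)
    a = np.sqrt(4 * P + 1)
    closed = np.log(P) - 1 + (a - 1) / (2 * P) + 2 * np.arctanh(1 / a)
    return integral, closed

for P in (1.0, 10.0, 100.0):
    print(P, asymptotic_capacity_per_antenna(P))
```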
4.3 ERROR EXPONENTS

As we did in the case of deterministic $H$, we can compute the error exponent in the case of the fading channel. To that end, note first that

$$E_0(\rho, q_x) = -\log\iint\Big[\int q_x(x)\,p(y, H|x)^{1/(1+\rho)}\,dx\Big]^{1+\rho}dy\,dH.$$

Since $\mathbf{H}$ is independent of $x$, $p(y, H|x) = p_{\mathbf{H}}(H)\,p(y|x, H)$ and thus

$$E_0(\rho, q_x) = -\log\int p_{\mathbf{H}}(H)\int\Big[\int q_x(x)\,p(y|x, H)^{1/(1+\rho)}\,dx\Big]^{1+\rho}dy\,dH.$$

Note that $p(y|x, H) = \det(\pi I_r)^{-1}\exp\big(-(y - Hx)^\dagger(y - Hx)\big)$, and for $q_x = \gamma_Q$, the Gaussian distribution with covariance $Q$, we can use the results for the deterministic $H$ case to conclude

$$E_0(\rho, \gamma_Q) = -\log E\big[\det\big(I_r + (1+\rho)^{-1}\mathbf{H}Q\mathbf{H}^\dagger\big)^{-\rho}\big].$$

Noting that $A \mapsto \det(A)^{-\rho}$ is a convex function, the argument we used previously to show that $Q = (P/t)I_t$ maximizes the mutual information applies to maximizing $E_0$ as well, and we obtain

$$E_0(\rho) = -\log E\Big[\det\Big(I_r + \frac{P}{t(1+\rho)}\mathbf{H}\mathbf{H}^\dagger\Big)^{-\rho}\Big]. \qquad (14)$$

To efficiently compute $E_0$, one would represent the Wishart eigenvalue density as a Vandermonde determinant (just as in the previous section), and orthonormalize the monomials $1, \lambda, \lambda^2, \dots, \lambda^{m-1}$ with respect to the inner product

$$\langle f, g\rangle = \int_0^\infty f(\lambda)g(\lambda)\Big(1 + \frac{P}{t(1+\rho)}\lambda\Big)^{-\rho}\lambda^{n-m}e^{-\lambda}\,d\lambda.$$

The multiplicative factor picked up in the orthonormalization is the value of the expectation in (14). As before, the restriction of $q_x$ to Gaussian distributions is suboptimal, but this choice leads to simpler expressions.

5 NON-ERGODIC CHANNELS

We had remarked at the beginning of the previous section that the maximum mutual information has the meaning of capacity when the channel is memoryless, i.e., when each use of the channel employs an independent realization of $\mathbf{H}$. This is not the only case when the maximum mutual information is the capacity of the channel. In particular, if the process that generates $\mathbf{H}$ is ergodic, then too, we can achieve rates arbitrarily close to the maximum mutual information. In contrast, for the case in which $\mathbf{H}$ is chosen randomly at the beginning of all time and is held fixed for all the uses of the channel, the maximum mutual information is in general not equal to the channel capacity. In this section we will focus on such a case when the entries of $\mathbf{H}$ are i.i.d., zero-mean circularly symmetric complex Gaussians with $E[|h_{ij}|^2] = 1$, the same distribution we have analyzed in the previous section.

5.1 CAPACITY

In the case described above, the Shannon capacity of the channel is zero: however small the rate we attempt to communicate at, there is a non-zero probability that the realized $\mathbf{H}$ is incapable of supporting it no matter how long we take our code length. On the other hand, one can talk about a tradeoff between outage probability and supportable rate. Namely, given a rate $R$ and power $P$, one can find $P_{\text{out}}(R, P)$ such that for any rate less than $R$ and any $\delta$ there exists a code satisfying the power constraint $P$ for which the error probability is less than $\delta$ for all but a set of $H$ whose total probability is less than $P_{\text{out}}(R, P)$:

$$P_{\text{out}}(R, P) = \inf_{Q:\ Q \ge 0,\ \operatorname{tr}(Q) \le P} P\big(\Psi(Q, \mathbf{H}) < R\big) \qquad (15)$$

where $\Psi(Q, H) = \log\det(I_r + HQH^\dagger)$. This approach is taken in [7] in a similar problem. In this section, as in the previous section, we will take the distribution of $\mathbf{H}$ to be such that the entries of $\mathbf{H}$ are independent zero-mean Gaussians, each with independent real and imaginary parts with variance $1/2$.

Example 6. Consider $t = 1$. In this case, it is clear that $Q = P$ is optimal. The outage probability is then

$$P\big(\log\det(I_r + \mathbf{H}P\mathbf{H}^\dagger) < R\big) = P\big(\log(1 + P\,\mathbf{H}^\dagger\mathbf{H}) < R\big).$$

Since $\mathbf{H}^\dagger\mathbf{H}$ is a $\chi^2$ random variable with $2r$ degrees of freedom and mean $r$, we can compute the outage probability as

$$P_{\text{out}}(R, P) = 1 - \frac{\Gamma\big(r, (e^R - 1)/P\big)}{\Gamma(r)} \qquad (16)$$

where $\Gamma(a, x) = \int_x^\infty u^{a-1}e^{-u}\,du$ is the incomplete gamma function. Let $\psi(P, \epsilon)$ be the value of $R$ that satisfies

$$P\big(\Psi(P, \mathbf{H}) \le R\big) = \epsilon. \qquad (17)$$

Figure 3 shows $\psi(P, \epsilon)$ as a function of $r$ for various values of $\epsilon$ and $P$.

[Figure 3: The ε-capacity for t = 1, as defined by (17): ψ(P, ε) vs. r; panels (a) P = 0 dB, (b) P = 15 dB, (c) P = 30 dB; within each panel the curves correspond, in descending order, to ε = 10⁻¹, 10⁻², 10⁻³, 10⁻⁴. ψ(P, ε) is the highest rate for which the outage probability is less than ε.]

Note that by Lemma 5 the distribution of $\mathbf{H}U$ is the same as that of $\mathbf{H}$ for unitary $U$. Thus, we can conclude that $\Psi(UQU^\dagger, \mathbf{H})$ has the same distribution as $\Psi(Q, \mathbf{H})$. By choosing $U$ to diagonalize $Q$ we can restrict our attention to diagonal $Q$. The symmetry in the problem suggests the following conjecture.

Conjecture.
The optimal $Q$ is of the form

$$\frac{P}{k}\operatorname{diag}(\underbrace{1, \dots, 1}_{k\ \text{ones}}, \underbrace{0, \dots, 0}_{t-k\ \text{zeros}})$$

for some $k \in \{1, \dots, t\}$. The value of $k$ depends on the rate: the higher the rate (i.e., the higher the outage probability), the smaller the $k$.
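For $t = 1$ the outage probability (16) is an incomplete gamma expression, so the ε-capacity $\psi(P, \epsilon)$ of (17) can be obtained by a simple bisection. A minimal sketch (ours; it assumes SciPy and works in nats):

```python
import numpy as np
from scipy.special import gammaincc   # regularized upper incomplete gamma Gamma(a,x)/Gamma(a)

def outage_prob(r, P, R):
    """P_out(R, P) of (16) for t = 1 and r receiving antennas."""
    return 1.0 - gammaincc(r, (np.exp(R) - 1.0) / P)

def epsilon_capacity(r, P, eps, tol=1e-10):
    """psi(P, eps) of (17): the rate at which the outage probability equals eps."""
    lo, hi = 0.0, 50.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if outage_prob(r, P, mid) < eps:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

P = 10.0
for r in (1, 2, 4, 8):
    print(r, [round(epsilon_capacity(r, P, e), 3) for e in (1e-1, 1e-2, 1e-3)])
# psi grows with r and, for fixed r, shrinks as eps decreases, as in Figure 3.
```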
As one shares the power equally between more transmitters, the expectation of $\Psi$ increases, but the tails of its distribution decay faster. To minimize the probability of outage, one has to maximize the probability mass of $\Psi$ that lies to the right of the rate of interest. If one is interested in achieving rates higher than the expectation of $\Psi$, then it makes sense to use a small number of transmitters to take advantage of the slow decay of the tails of the distribution of $\Psi$. Of course, the corresponding outage probability will still be large (larger than $\tfrac{1}{2}$, say).

Example 7. Consider $r = 1$. With the conjecture above, it suffices to compute $P\big(\Psi((P/t)I_t, \mathbf{H}) < R\big)$ for all values of $t$; if the actual number of transmitters is, say, $T$, then the outage probability will be the minimum of the probabilities for $t = 1, \dots, T$. As in Example 6 we see that $\mathbf{H}\mathbf{H}^\dagger$ is a $\chi^2$ statistic with $2t$ degrees of freedom and mean $t$, thus

$$P\big(\Psi((P/t)I_t, \mathbf{H}) \le R\big) = 1 - \frac{\Gamma\big(t, t(e^R - 1)/P\big)}{\Gamma(t)}.$$

Figure 4 shows this distribution for various values of $t$ and $P$.

[Figure 4: Distribution of Ψ((P/t)I, H) for r = 1: P(Ψ((P/t)I, H) ≤ R) vs. R; panels (a) P = 0 dB, (b) P = 15 dB, (c) P = 30 dB; within each panel the curves correspond, in order of increasing sharpness, to t = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 and 100.]

It is clear from the figure that large $t$ performs better at low $R$ and small $t$ performs better at high $R$, in keeping with the conjecture. As in Example 6, let $\psi(P, \epsilon)$ be the value of $R$ satisfying

$$P\big(\Psi((P/t)I_t, \mathbf{H}) \le R\big) = \epsilon. \qquad (18)$$

Figure 5 shows $\psi(P, \epsilon)$ vs. $t$ for various values of $P$ and $\epsilon$.

[Figure 5: The ε-capacity for r = 1, as defined by (18): ψ(P, ε) vs. t; panels (a) P = 0 dB, (b) P = 15 dB, (c) P = 30 dB; within each panel the curves correspond, in descending order, to ε = 10⁻¹, 10⁻², 10⁻³, 10⁻⁴.]

For the small $\epsilon$ values considered in the figure, using all available transmitters is always better than using a subset.
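The trade-off of Example 7 is easy to reproduce: for $r = 1$ the outage probability for each $t$ is again an incomplete gamma expression, and for a target rate one can pick the number of transmitters that minimizes it. A sketch under the same assumptions as above (SciPy, nats):

```python
import numpy as np
from scipy.special import gammaincc

def outage_r1(t, P, R):
    """P(Psi((P/t) I_t, H) <= R) for r = 1; H H^dagger is Gamma(t, 1) distributed."""
    return 1.0 - gammaincc(t, t * (np.exp(R) - 1.0) / P)

P = 31.62   # about 15 dB
for R in (0.5, 1.5, 3.0, 4.0):
    probs = {t: outage_r1(t, P, R) for t in (1, 2, 4, 10, 100)}
    best = min(probs, key=probs.get)
    print(R, "best t =", best, ["%.1e" % probs[t] for t in (1, 2, 4, 10, 100)])
# Large t wins at the lower rates and small t at the higher rates, as in Figure 4
# and in keeping with the conjecture.
```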
6 MULTIACCESS CHANNELS

Consider now a number of transmitters, say $M$, each with $t$ transmitting antennas, and each subject to a power constraint $P$. There is a single receiver with $r$ antennas. The received signal $y$ is given by

$$y = \big[H_1 \ \cdots\ H_M\big]\begin{bmatrix}x_1\\ \vdots\\ x_M\end{bmatrix} + n$$

where $x_m$ is the signal transmitted by the $m$-th transmitter, $n$ is Gaussian noise as in (1), and $H_m$, $m = 1, \dots, M$, are $r \times t$ complex matrices. We assume that the receiver knows all the $H_m$'s, and that these have independent circularly symmetric complex Gaussian entries of zero mean and unit variance.

The multiuser capacity for this communication scenario can be evaluated easily by exploiting the nature of the solution to the single user scenario discussed above. Namely, since the capacity achieving distribution for the single user scenario yields an i.i.d. solution for each antenna, that the users in the multiuser scenario cannot cooperate becomes immaterial. A rate vector $(R_1, \dots, R_M)$ will be achievable if

$$\sum_{i=1}^m R_{[i]} < C(r, mt, mP), \qquad \text{for all } m = 1, \dots, M,$$

where $(R_{[1]}, \dots, R_{[M]})$ is the ordering of the rate vector from the largest to the smallest, and $C(a, b, P)$ is the single user $a$-receiver $b$-transmitter capacity under power constraint $P$.

7 CONCLUSION

The use of multiple antennas will greatly increase the achievable rates on fading channels if the channel parameters can be estimated at the receiver and if the path gains between different antenna pairs behave independently. The second of these requirements can be met with relative ease and is somewhat technical in nature. The first requirement is a rather tall order, and can be justified in certain communication scenarios and not in others. Since the original writing of this note in late 1994 and early 1995, there has been some work in which the assumption of the availability of channel state information is replaced with the assumption of a slowly varying channel; see, e.g., [8].

APPENDIX

Theorem. Given $m$ functions $\varphi_1, \dots, \varphi_m$ orthonormal with respect to $F$, i.e.,

$$\int \varphi_i(\lambda)\varphi_j(\lambda)\,dF(\lambda) = \delta_{ij},$$

let

$$D_k(\lambda_1, \dots, \lambda_k) = \begin{bmatrix}\varphi_1(\lambda_1) & \cdots & \varphi_1(\lambda_k)\\ \vdots & & \vdots\\ \varphi_m(\lambda_1) & \cdots & \varphi_m(\lambda_k)\end{bmatrix}$$

and $A_k(\lambda_1, \dots, \lambda_k) = D_k(\lambda_1, \dots, \lambda_k)^\top D_k(\lambda_1, \dots, \lambda_k)$. Then

$$\int \det\big(A_k(\lambda_1, \dots, \lambda_k)\big)\,dF(\lambda_k) = (m - k + 1)\det\big(A_{k-1}(\lambda_1, \dots, \lambda_{k-1})\big). \qquad (19)$$

Proof: Let $\Phi(\lambda) = [\varphi_1(\lambda), \dots, \varphi_m(\lambda)]^\top$. Then the $(i,j)$-th element of $A_k(\lambda_1, \dots, \lambda_k)$ is $\Phi(\lambda_i)^\top\Phi(\lambda_j)$. Note that $\int \Phi(\lambda)^\top\Phi(\lambda)\,dF(\lambda) = m$ and $\int \Phi(\lambda)\Phi(\lambda)^\top\,dF(\lambda) = I_m$. By the definition of the determinant,

$$\det\big(A_k(\lambda_1, \dots, \lambda_k)\big) = \sum_\sigma (-1)^{\operatorname{per}(\sigma)}\prod_{i=1}^k \Phi(\lambda_i)^\top\Phi\big(\lambda_{\sigma(i)}\big)$$

where the summation is over all permutations $\sigma$ of $\{1, \dots, k\}$.