Poor, H.V., Looney, C.G., Marks Il,RJ, Verdu, S, Thomas, J.A., Cover, TM "Information Theory The electrical Engineering Handbook Ed. Richard C. Dorf Boca Raton: CRc Press llc. 2000
Poor, H.V., Looney, C.G., Marks II, R.J., Verdú, S., Thomas, J.A., Cover, T.M. “Information Theory” The Electrical Engineering Handbook Ed. Richard C. Dorf Boca Raton: CRC Press LLC, 2000
73 Information Theory 73. 1 Signal Detection General Considerations. Detection of Known Signals. Detection of Parametrized Signals. Detection of Random Signals. Deciding Among Multiple Signals. Detection of Signals in More General Noise Processes.Robust and Nonparametric Detection. Distributed and equential Detection. Detection with Continuous-Time Measurements 3.2 Noise Statistics of noise. Noise power Effect of linear transformations on Autocorrelation and Power Spectral Density.White, Gaussian and pink noise models Thermal noise as gaussian white noise Some Examples. Measuring Thermal Noise. Effective Noise and Antenna Noise. Noise Factor and Noise Ratio. Equivalent Input Noise.Other Electrical Noise. Measurement and Quantization Noise. Coping with Noise 73.3 Stochastic Processes Introduction to Random variables Stochastic Processes Classifications of Stochastic Processes. Stationarity of Processes H. Vincent poor Gaussian and Markov Processes. Examples of Stochastic Processes Linear Filtering of Weakly Stationary Processes. Cross-Correlation of Processes. Coherence. Ergodicity Carl G. Looney 3.4 The Sampling Theore niversity of Nevada The Cardinal Series. Proof of the Sampling Theorem. The Time- Bandwidth Product Sources of Error Generalizations of the J. Marks II Sampling Theorem niversity of washington 73.5 Channel Capacity Sergio Verdu Information Rates Communication Channels Reliable Information Transmission: Shannon,s Theorem Bandwidth and Capacity. Channel Coding Theorems Joy A. Thomas 73.6 Data Compression ntropy. The Huffman Algorithm. Entropy Rate. Arithmetic Thomas m. Cover Quantization and Vector Quantization. Kolmogorov Complexity Stanford University Data Compression in Practice 73.1 Signal Detection H. Vincent poe The field of signal detection and estimation is concerned with the processing of information-bearing signals for the purpose of extracting the information they contain. The applications of this methodology are quite broad, ranging from areas of electrical engineering such as automatic control, digital communications, image processing, and remote sensing, into other engineering disciplines and the physical, biological, and social c 2000 by CRC Press LLC
© 2000 by CRC Press LLC 73 Information Theory 73.1 Signal Detection General Considerations • Detection of Known Signals • Detection of Parametrized Signals • Detection of Random Signals • Deciding Among Multiple Signals • Detection of Signals in More General Noise Processes • Robust and Nonparametric Detection • Distributed and Sequential Detection • Detection with Continuous-Time Measurements 73.2 Noise Statistics of Noise • Noise Power • Effect of Linear Transformations on Autocorrelation and Power Spectral Density • White, Gaussian, and Pink Noise Models • Thermal Noise as Gaussian White Noise • Some Examples • Measuring Thermal Noise • Effective Noise and Antenna Noise • Noise Factor and Noise Ratio • Equivalent Input Noise • Other Electrical Noise • Measurement and Quantization Noise • Coping with Noise 73.3 Stochastic Processes Introduction to Random Variables • Stochastic Processes • Classifications of Stochastic Processes • Stationarity of Processes • Gaussian and Markov Processes • Examples of Stochastic Processes • Linear Filtering of Weakly Stationary Processes • Cross-Correlation of Processes • Coherence • Ergodicity 73.4 The Sampling Theorem The Cardinal Series • Proof of the Sampling Theorem • The TimeBandwidth Product • Sources of Error • Generalizations of the Sampling Theorem 73.5 Channel Capacity Information Rates • Communication Channels • Reliable Information Transmission: Shannon’s Theorem • Bandwidth and Capacity • Channel Coding Theorems 73.6 Data Compression Entropy • The Huffman Algorithm • Entropy Rate • Arithmetic Coding • Lempel–Ziv Coding • Rate Distortion Theory • Quantization and Vector Quantization • Kolmogorov Complexity • Data Compression in Practice 73.1 Signal Detection H. Vincent Poor The field of signal detection and estimation is concerned with the processing of information-bearing signals for the purpose of extracting the information they contain. The applications of this methodology are quite broad, ranging from areas of electrical engineering such as automatic control, digital communications, image processing, and remote sensing, into other engineering disciplines and the physical, biological, and social sciences. H. Vincent Poor Princeton University Carl G. Looney University of Nevada R. J. Marks II University of Washington Sergio Verdú Princeton University Joy A. Thomas IBM Thomas M. Cover Stanford University
There are two basic types of problems of interest in this context. Signal detection problems are concerned primarily with situations in which the information to be extracted from a signal is discrete in nature. That is, signal detection procedures are techniques for deciding among a discrete(usually finite) number of possible alternatives. An example of such a problem is the demodulation of a digital communication signal, in which the task of interest is to decide which of several possible transmitted symbols has elicited a given received signal. Estimation problems, on the other hand, deal with the determination of some numerical quantity taking values in a continuum. An example of an estimation problem is that of determining the phase or frequency of the carrier underlying a communication signal. Although signal detection and estimation is an area of considerable current research activity, the fundamental principles are quite well developed. These principles, which are based on the theory of statistical inference. plain and motivate most of the basic signal detection and estimation procedures used in practice. In this ction,we will give a brief overview of the basic principles underlying the field of signal detection. Estimation is treated elsewhere this volume, notably in Section 16. 2. A more complete introduction to these subjects found in Poor[1994] General Considerations The basic principles of signal detection can be conveniently discussed in the context of decision-making between two possible statistical models for a set of real-valued measurements, Y,, Y2,..., Ym. In particular, on observing Y Y,..., Y,, we wish to decide whether these measurements are most consistent with the m Yk=Nk,k=1,2,,,n (73.1) Yk=N+ Sk, k=l, 2,...,n (73.2) where N, N2,..., N is a random sequence representi d where S, S,,...,S, is a sequence In deciding between Eqs. (73. 1)and(73. 2), there are two types of errors possible: a false alarm, in which (73. 2)is falsely chosen, and a miss, in which(73. 1) is falsely chosen. The probabilities of these two types of errors can be used as performance indices in the optimization of rules for deciding between(73. 1)and(73. 2) Obviously, it is desirable to minimize both of these probabilities to the extent possible. However, the minimi- zation of the false-alarm probability and the minimization of the miss probability are opposing criteria. So, it is necessary to effect a trade-off between them in order to design a signal detection procedure. There are several ways of trading off the probabilities of miss and false alarm: the Bayesian detector minimizes an average of the two probabilities taken with respect to prior probabilities of the two conditions(73. 1)and(73. 2), the minimax detector minimizes the maximum of the two error probabilities, and the Neyman-Pearson detector mizes the miss probability under an upper-bound constraint on the false-alarm probabili If the statistics of noise and signal are known, the Bayesian, minimax, and Neyman-Pearson detectors are all of the same form. Namely, they reduce the measurements to a single number by computing the likelihood ratio I(Y,12…,Y)P、Yn (73.3) Px(Y1,Y2,……,Yn) where PsN and px denote the probability density functions of the measurements under signal-plus-noise(73. 2) and noise-only(73. 1)conditions, respectively. The likelihood ratio is then compared to a decision threshold, with the signal-present model(73. 2) being chosen if the threshold is exceeded, and the signal-absent model (73. 1)being chosen otherwise. Choice of the decision threshold determines a trade-off of the two error probabilities, and the optimum procedures for the three criteria mentioned above differ only in this choice e 2000 by CRC Press LLC
© 2000 by CRC Press LLC There are two basic types of problems of interest in this context. Signal detection problems are concerned primarily with situations in which the information to be extracted from a signal is discrete in nature. That is, signal detection procedures are techniques for deciding among a discrete (usually finite) number of possible alternatives. An example of such a problem is the demodulation of a digital communication signal, in which the task of interest is to decide which of several possible transmitted symbols has elicited a given received signal. Estimation problems, on the other hand, deal with the determination of some numerical quantity taking values in a continuum. An example of an estimation problem is that of determining the phase or frequency of the carrier underlying a communication signal. Although signal detection and estimation is an area of considerable current research activity, the fundamental principles are quite well developed. These principles, which are based on the theory of statistical inference, explain and motivate most of the basic signal detection and estimation procedures used in practice. In this section, we will give a brief overview of the basic principles underlying the field of signal detection. Estimation is treated elsewhere this volume, notably in Section 16.2. A more complete introduction to these subjects is found in Poor [1994]. General Considerations The basic principles of signal detection can be conveniently discussed in the context of decision-making between two possible statistical models for a set of real-valued measurements, Y1, Y2, . . ., Yn. In particular, on observing Y1, Y2, . . ., Yn, we wish to decide whether these measurements are most consistent with the model Yk = Nk , k = 1, 2, . . . , n (73.1) or with the model Yk = Nk + Sk , k = 1, 2, . . . , n (73.2) where N1, N2 , . . ., Nn is a random sequence representing noise, and where S1, S2, . . ., Sn is a sequence representing a (possibly random) signal. In deciding between Eqs. (73.1) and (73.2), there are two types of errors possible: a false alarm, in which (73.2) is falsely chosen, and a miss, in which (73.1) is falsely chosen. The probabilities of these two types of errors can be used as performance indices in the optimization of rules for deciding between (73.1) and (73.2). Obviously, it is desirable to minimize both of these probabilities to the extent possible. However, the minimization of the false-alarm probability and the minimization of the miss probability are opposing criteria. So, it is necessary to effect a trade-off between them in order to design a signal detection procedure. There are several ways of trading off the probabilities of miss and false alarm: the Bayesian detector minimizes an average of the two probabilities taken with respect to prior probabilities of the two conditions (73.1) and (73.2), the minimax detector minimizes the maximum of the two error probabilities, and the Neyman-Pearson detector minimizes the miss probability under an upper-bound constraint on the false-alarm probability. If the statistics of noise and signal are known, the Bayesian, minimax, and Neyman-Pearson detectors are all of the same form. Namely, they reduce the measurements to a single number by computing the likelihood ratio (73.3) where pS+N and pN denote the probability density functions of the measurements under signal-plus-noise (73.2) and noise-only (73.1) conditions, respectively. The likelihood ratio is then compared to a decision threshold, with the signal-present model (73.2) being chosen if the threshold is exceeded, and the signal-absent model (73.1) being chosen otherwise. Choice of the decision threshold determines a trade-off of the two error probabilities, and the optimum procedures for the three criteria mentioned above differ only in this choice. L Y Y Y p Y Y Y p Y Y Y n S N n N n ( , , , ) ( , , , ) ( , , , ) 1 2 1 2 1 2 ... ... ... D +
There are several basic signal detection structures that can be derived from Eqs.(73. 1)to(73.3)under the assumption that the noise sequence consists of a set of independent and identically distributed (i.i.d. )Gaussian random variables with zero means Such a sequence is known as discrete-time white Gaussian noise. Thus, until further notice, we will make this assumption about the noise. It should be noted that this assumption is physically justifiable in many applications Detection of Known Signals If the signal sequence S,,S2,..,Sn is known to be given by a specific sequence, say SI, s2,..,s,(a situation known as coherent detection), then the likelihood ratio(73.3)is given in the white Gaussian noise case by exp (73.4 where o is the variance of the noise samples. The only part of (73. 4 )that depends on the measurements is the erm EkalskYk and the likelihood ratio is a monotonically increasing function of this quantity. Thus, optimum detection of a coherent signal can be accomplished via a correlation detector, which operates by comparing the quantity kYk (73.5) to a threshold, announcing signal presence when the threshold is exceeded. Note that this detector works on the principle that the signal will correlate well with itself, yielding a value of (73.5)when present, whereas the random noise will tend to average out in the sum(73.5), yiel relatively small value when the signal is absent. This detector is illustrated in Fig. 73.1 ∑() Comparison k=1 FIGURE 73. 1 Correlation detector for a coherent signal in additive white Gaussian noise. Detection of Parametrized Signals The correlation detector cannot usually be used directly unless the signal is known exactly. If, alternatively, the ignal is known up to a short vector e of random parameters(such as frequencies or phases)that are independent of the noise, then an optimum test can be implemented by threshold comparison of the quantity 5(6)y_1 ∑ (6)|/}p()de where we have written Sk=S(0)to indicate the functional dependence of the signal on the parameters, and where A and p(0)denote the range and probability density function, respectively, of the parameters The most important example of such a parametrized signal is that in which the signal is a modulated sinusoid with random phase; i.e e 2000 by CRC Press LLC
© 2000 by CRC Press LLC There are several basic signal detection structures that can be derived from Eqs. (73.1) to (73.3) under the assumption that the noise sequence consists of a set of independent and identically distributed (i.i.d.) Gaussian random variables with zero means. Such a sequence is known as discrete-time white Gaussian noise. Thus, until further notice, we will make this assumption about the noise. It should be noted that this assumption is physically justifiable in many applications. Detection of Known Signals If the signal sequence S1, S2, . . ., Sn is known to be given by a specific sequence, say s1, s2 , . . ., sn (a situation known as coherent detection), then the likelihood ratio (73.3) is given in the white Gaussian noise case by (73.4) where s2 is the variance of the noise samples. The only part of (73.4) that depends on the measurements is the term S n k =1skYk and the likelihood ratio is a monotonically increasing function of this quantity. Thus, optimum detection of a coherent signal can be accomplished via a correlation detector, which operates by comparing the quantity (73.5) to a threshold, announcing signal presence when the threshold is exceeded. Note that this detector works on the principle that the signal will correlate well with itself, yielding a large value of (73.5) when present, whereas the random noise will tend to average out in the sum (73.5), yielding a relatively small value when the signal is absent. This detector is illustrated in Fig. 73.1. Detection of Parametrized Signals The correlation detector cannot usually be used directly unless the signal is known exactly. If, alternatively, the signal is known up to a short vector u of random parameters (such as frequencies or phases) that are independent of the noise, then an optimum test can be implemented by threshold comparison of the quantity (73.6) where we have written Sk = sk (u) to indicate the functional dependence of the signal on the parameters, and where L and p(u) denote the range and probability density function, respectively, of the parameters. The most important example of such a parametrized signal is that in which the signal is a modulated sinusoid with random phase; i.e., FIGURE 73.1 Correlation detector for a coherent signal in additive white Gaussian noise. exp s Y s / k k k k n k n - Ê Ë Á ˆ ¯ ˜ Ï Ì Ô Ó Ô ¸ ˝ Ô ˛ = = Ô Â Â 1 2 2 1 1 2 s s Yk k k n = Â 1 exp s Y ( ) s ( ) / p( )d k k k k n k n q - [ ] q q q Ê Ë Á ˆ ¯ ˜ Ï Ì Ô Ó Ô ¸ ˝ Ô ˛ Â= Â= Ô Ú 1 2 2 1 1 2 s L
Sk= ak cos(o k +0), k=l, 2,...,n where a is a known amplitude modulation sequence, o is a known(discrete-time) carrier frequency, and the random phase 0 is uniformly distributed in the interval [, In this case, the likelihood ratio is a monotonically increasing function of the quantity ∑吗 cos(o k)Yk+∑ ap sin(o.k)x (738) Thus, optimum detection can be implemented via comparison of(73. 8)with a threshold, a structure known as an envelope detector. Note that this detector correlates the measurements with two orthogonal components of the signal, ak cos(o k)and a, sin(o k). These two correlations, known as the in-phase and quadrature components of the measurements, respectively, capture all of the energy in the signal, regardless of the value of 0. Since 0 is unknown, however, these two correlations cannot be combined coherently, and thus they are combined noncoherently via(73. 8)before the result is compared with a threshold. This detector is illustrated in Fig. 73.2. Parametrized signals also arise in situations in which it is not appropriate to model the unknown parameters as random variables with a known distribution. In such cases, it is not possible to compute the likelihood ratio (73.6)so an alternative to the likelihood ratio detector must then be used. (An exception is that in which the likelihood ratio detector is invariant to the unknown parameters-a case known as uniformly most powerful detection. Several alternatives to the likelihood ratio detector exist for these cases. One useful such procedure is to test for the signals presence by threshold comparison of the generalized likelihood ratio, given by maxL(Y1,Y2,……,Y) (73.9 where Le denotes the likelihood ratio for Eqs. (73. 1)and(73. 2)for the known-signal problem with the parameter vector fixed at 0. In the case of white gaussian noise we have L(Y,Y2,……,Yn) (73.10) It should be noted that this formulation is also valid if the statistics of the noise have unknown parameters, e.g.,the noise variance in the white Gaussian case in which the generalized likelihood ratio detector is useful is that of detect signal that is known except for its time of arrival. That is, we interested in ized where (al is a known finite-duration signal sequence and where 0 ranges over the integers. Assuming white Gaussian noise and an observation interval much longer than the duration of tab, the generalized likelihood ratio detector in this case announces the presence of the signal if the quantity (73.12) exceeds a fixed threshold. This type of detector is known as a matched filter, since it can be implemented by filtering the measurements with a digital filter whose pulse response is a time-reversed version of the known e 2000 by CRC Press LLC
© 2000 by CRC Press LLC Sk = ak cos(vc k + u), k = 1, 2, . . ., n (73.7) where a1, a2, . . ., an is a known amplitude modulation sequence, vc is a known (discrete-time) carrier frequency, and the random phase u is uniformly distributed in the interval [–p,p]. In this case, the likelihood ratio is a monotonically increasing function of the quantity (73.8) Thus, optimum detection can be implemented via comparison of (73.8) with a threshold, a structure known as an envelope detector. Note that this detector correlates the measurements with two orthogonal components of the signal, ak cos(vck) and ak sin(vck). These two correlations, known as the in-phase and quadrature components of the measurements, respectively, capture all of the energy in the signal, regardless of the value of u. Since u is unknown, however, these two correlations cannot be combined coherently, and thus they are combined noncoherently via (73.8) before the result is compared with a threshold. This detector is illustrated in Fig. 73.2. Parametrized signals also arise in situations in which it is not appropriate to model the unknown parameters as random variables with a known distribution. In such cases, it is not possible to compute the likelihood ratio (73.6) so an alternative to the likelihood ratio detector must then be used. (An exception is that in which the likelihood ratio detector is invariant to the unknown parameters—a case known as uniformly most powerful detection.) Several alternatives to the likelihood ratio detector exist for these cases. One useful such procedure is to test for the signal’s presence by threshold comparison of the generalized likelihood ratio, given by (73.9) where Lu denotes the likelihood ratio for Eqs. (73.1) and (73.2) for the known-signal problem with the parameter vector fixed at u. In the case of white Gaussian noise, we have (73.10) It should be noted that this formulation is also valid if the statistics of the noise have unknown parameters, e.g., the noise variance in the white Gaussian case. One common application in which the generalized likelihood ratio detector is useful is that of detecting a signal that is known except for its time of arrival. That is, we are often interested in signals parametrized as sk(u) = ak–u (73.11) where {ak} is a known finite-duration signal sequence and where u ranges over the integers. Assuming white Gaussian noise and an observation interval much longer than the duration of {ak}, the generalized likelihood ratio detector in this case announces the presence of the signal if the quantity (73.12) exceeds a fixed threshold. This type of detector is known as a matched filter, since it can be implemented by filtering the measurements with a digital filter whose pulse response is a time-reversed version of the known ak ck Yk a k Y k n k c k k n cos(w w ) sin( ) = = Â Â È Î Í Í ˘ ˚ ˙ ˙ + È Î Í Í ˘ ˚ ˙ 1 ˙ 2 1 2 max ( , , , ) u u ŒL L Y1 2 Y Yn ... L Y Y Y s Y s n k k k k n k n u( , , ) exp ( ) u (u) / 1 2 2 1 1 1 2 2 . . . , = - [ ] Ê Ë Á ˆ ¯ ˜ Ï Ì Ô Ó Ô ¸ ˝ Ô ˛ = = Ô Â Â s max q q a Y k k k  -
Correlator () cos(o. k Threshold sin(o k) ∑() Channel Correlator FIGURE 73.2 Envelope detector for a noncoherent signal in additive white Gaussian noise. signal ag(hence it is"matched"to the signal), and announcing the signals presence if the filter output exceeds the decision threshold at any time Detection of random signals In some applications, particularly in remote sensing applications such as sonar and radio astronomy, it is appropriate to consider the signal sequence S,, S2,.,Sn itself to be a random sequence, statistically independent of the noise. In such cases, the likelihood ratio formula of (73.6) is still valid with the parameter vector 0 simply taken to be the signal itself. However, for long measurement records (i.e, large n),(73.6)is not a very practical formula except in some specific cases, the most important of which is the case in which the signal is Gaussian. In particular, if the signal is Gaussian with zero-mean and autocorrelation sequence n eE[,I, then the likelihood ratio is a monotonically increasing function of the quantity ∑∑ (73.13) with ak the element in the kth row and hh column of the positive-definite matrix QAl-(I+R/0) (73.14) where I denotes the n X n identity matrix, and R is the covariance matrix of the signal, i.e., it is the n X n matrix with elements ru Note that(73. 13)is a quadratic function of the measurements; thus, a detector based on the comparison of this quantity to a threshold is known as a quadratic detector. The simplest form of this detector results from the situation in which the signal samples are, like the noise samples, i i d. In this case, the quadratic function (73. 13)reduces to a positive constant multiple of the quantity Yk2 e 2000 by CRC Press LLC
© 2000 by CRC Press LLC signal {ak} (hence it is “matched’’ to the signal), and announcing the signal’s presence if the filter output exceeds the decision threshold at any time. Detection of Random Signals In some applications, particularly in remote sensing applications such as sonar and radio astronomy, it is appropriate to consider the signal sequence S1 , S2 , . . ., Sn itself to be a random sequence, statistically independent of the noise. In such cases, the likelihood ratio formula of (73.6) is still valid with the parameter vector u simply taken to be the signal itself. However, for long measurement records (i.e., large n), (73.6) is not a very practical formula except in some specific cases, the most important of which is the case in which the signal is Gaussian. In particular, if the signal is Gaussian with zero-mean and autocorrelation sequence rk,l = D E{SkSl}, then the likelihood ratio is a monotonically increasing function of the quantity (73.13) with qk,l the element in the kth row and lth column of the positive-definite matrix (73.14) where I denotes the n 3 n identity matrix, and R is the covariance matrix of the signal, i.e., it is the n 3 n matrix with elements rk,l . Note that (73.13) is a quadratic function of the measurements; thus, a detector based on the comparison of this quantity to a threshold is known as a quadratic detector. The simplest form of this detector results from the situation in which the signal samples are, like the noise samples, i.i.d. In this case, the quadratic function (73.13) reduces to a positive constant multiple of the quantity (73.15) FIGURE 73.2 Envelope detector for a noncoherent signal in additive white Gaussian noise. qk l YkYl l n k n , = = Â Â 1 1 Q D I - + I R - ( /s ) 2 1 Yk k n 2 =1 Â
a detector based on(73. 15)simply measures the energy in the measurements and then announces the presence of the signal if this energy is large enough. This type of detector is known as a radiometer. Thus, radiometry is optimum in the case in which both signal and noise are i.i. d Gaussian sequences with zero means. Since in this case the presence of the signal is manifested only by an increase in energy level, it is intuitively obvious that radiometry is the only way of detecting the signals presence. More generally, when the ignal is correlated, the quadratic function(73.13)exploits both the increased energy level and the correlation structure introduced by the presence of the signal. For example, if the signal is a narrowband Gaussian then the quadratic function(73. 13)acts as a narrowband radiometer with bandpass characteristic that imately matches that of the signal. In general, the quadratic detector will make use of whatever properties the signal exhibits If the signal is random but not Gaussian, then its optimum detection [described by(73.6)] typica more complicated nonlinear processing than the quadratic processing of (73. 13)in order to exploit butional differences between signal and noise. This type of processing is often not practical for implementation, and thus approximations to the optimum detector are typically used. An interesting family of such detectors ses cubic or quartic functions of the measurements, which exploit the higher-order spectral properties of the ignal [Mendel, 1991. As with deterministic signals, random signals can be parametrized. In this case, however, is the distribution of the signal that is parametrized. For example, the power spectrum of the signal of interest may be known only up to a set of unknown parameters. Generalized likelihood ratio detectors(73.9)are often used to detect such signals Deciding Among Multiple Signals The preceding results have been developed under the model (73. 1)-(73. 2)that there is a single signal that is either present or absent. In digital communications applications, it is more common to have the situation in which we wish to decide between the presence of two(or more) possible signals in a given set of measurement The foregoing results can be adapted straightforwardly to such problems. This can be seen most easily in the case of deciding among known signals. In particular, consider the problem of deciding between two alternatives (73.16) Y=N+s,k=1,2,,n (73.17) where,s,……, s(o)and s出,s2,…, s(are two known signals. Such problems arise in data transmission problems, in which the two signals sfo) and s()correspond to the waveforms received after transmission of a logical"zero"and"one, "respectively. In such problems, we are generally interested in minimizing the average robability of error, which is the average of the two error probabilities weighted by the prior probabilities of straightforward extension of the correlation detector based on (73.5). In particular, under the assumptIons t, ccurrence of the two signals. This is a Bayesian performance criterion, and the optimum decision rule is a the two signals are equally likely to occur prior to measurement, and that the noise is white and Gaussian, the optimum decision between(73. 16)and (73. 17) is to choose the model(73.16)if 2 a s(l, is larger than 2,=l s Y, and to choose the model(73.17)otherwise probability is to choose the signal si,, ,...,s, where j is a solution of the maximization problem ∑Y= max (73.18) 0≤m≤M=1 e 2000 by CRC Press LLC
© 2000 by CRC Press LLC A detector based on (73.15) simply measures the energy in the measurements and then announces the presence of the signal if this energy is large enough. This type of detector is known as a radiometer. Thus, radiometry is optimum in the case in which both signal and noise are i.i.d. Gaussian sequences with zero means. Since in this case the presence of the signal is manifested only by an increase in energy level, it is intuitively obvious that radiometry is the only way of detecting the signal’s presence. More generally, when the signal is correlated, the quadratic function (73.13) exploits both the increased energy level and the correlation structure introduced by the presence of the signal. For example, if the signal is a narrowband Gaussian process, then the quadratic function (73.13) acts as a narrowband radiometer with bandpass characteristic that approximately matches that of the signal. In general, the quadratic detector will make use of whatever spectral properties the signal exhibits. If the signal is random but not Gaussian, then its optimum detection [described by (73.6)] typically requires more complicated nonlinear processing than the quadratic processing of (73.13) in order to exploit the distributional differences between signal and noise. This type of processing is often not practical for implementation, and thus approximations to the optimum detector are typically used. An interesting family of such detectors uses cubic or quartic functions of the measurements, which exploit the higher-order spectral properties of the signal [Mendel, 1991]. As with deterministic signals, random signals can be parametrized. In this case, however, it is the distribution of the signal that is parametrized. For example, the power spectrum of the signal of interest may be known only up to a set of unknown parameters. Generalized likelihood ratio detectors (73.9) are often used to detect such signals. Deciding Among Multiple Signals The preceding results have been developed under the model (73.1)–(73.2) that there is a single signal that is either present or absent. In digital communications applications, it is more common to have the situation in which we wish to decide between the presence of two (or more) possible signals in a given set of measurements. The foregoing results can be adapted straightforwardly to such problems. This can be seen most easily in the case of deciding among known signals. In particular, consider the problem of deciding between two alternatives: (73.16) and (73.17) where s 1 (0), s 2 (0), . . ., s n (0) and s 1 (1), s 2 (1), . . ., s n (1) are two known signals. Such problems arise in data transmission problems, in which the two signals s(0) and s(1) correspond to the waveforms received after transmission of a logical “zero’’ and “one,’’ respectively. In such problems, we are generally interested in minimizing the average probability of error, which is the average of the two error probabilities weighted by the prior probabilities of occurrence of the two signals. This is a Bayesian performance criterion, and the optimum decision rule is a straightforward extension of the correlation detector based on (73.5). In particular, under the assumptions that the two signals are equally likely to occur prior to measurement, and that the noise is white and Gaussian, the optimum decision between (73.16) and (73.17) is to choose the model (73.16) if (k n =1 s k (0)Yk is larger than (k n =1 s k (1)Yk , and to choose the model (73.17) otherwise. More generally, many problems in digital communications involve deciding among M equally likely signals with M > 2. In this case, again assuming white Gaussian noise, the decision rule that minimizes the error probability is to choose the signal s 1 (j) , s 2 (j) , . . ., s n (j) , where j is a solution of the maximization problem (73.18) Y Ns k n k kk =+ = ( ), , ,..., 0 1 2 Y Ns k n k kk =+ = ( ), , ,..., 1 1 2 sY s Y k j k k n m M k m k k n ( ) ( ) max = ££ - = Â Â = 1 0 1 1
There are two basic types of digital communications applications in which the problem(73. 18)arises. One is in M-ary data transmission, in which a symbol alphabet with M elements is used to transmit data, and a decision among these M symbols must be made in each symbol interval [Proakis, 1983]. The other type of application in which(73. 18)arises is that in which data symbols are correlated in some way because of intersymbol interference, coding, or multiuser transmission. In such cases, each of the M possible signals represents a frame of data symbols, and a joint decision must be made about the entire frame since individual symbol decisions cannot be decoupled. Within this latter framework, the problem(73. 18)is known as sequence detection. The basic distinction between M-ary transmission and sequence detection is one of degree. In typical M-ary transmission, the number of elements in the signaling alphabet is typically a small power of 2(say 8 or 32), whereas the number of symbols in a frame of data could be on the order of thousands. Thus, solution of (73. 18)by exhaustive search is prohibitive for sequence detection, and less complex algorithms must be used. Typical digital communications applications in which sequence detection is necessary admit dynamic program ming solutions to(73. 18)(see, e.g., Verdu[1993]) Detection of Signals in More General Noise Processes In the foregoing paragraphs, we have described three basic detection procedures: correlation detection of signals that are completely known, envelope detection of signals that are known except for a random phase, and quadratic detection for Gaussian random signals. These detectors were all derived under an assumption of white Gaussian noise. This assumption provides an accurate model for the dominant noise arising in many communication channels. For example, the thermal noise generated in signal processing elect quately described as being white and Gaussian. However, there are also many channels in which the statistical behavior of the noise is not well described in this way, particularly when the dominant noise is produce the physical channel rather than in the receiver electronics One type of noise that often ise that is Gaussian but not white. In this case, the detection (73. 1)-(73.2)can be converted to an equivalent problem with white noise by applying a linear filtering known as prewhitening to the measurements. In particular, on denoting the noise covariance matrix can write ∑=CCr (73.19) where C is an n X n invertible, lower-triangular matrix and where the superscript T denotes matrix transpo- sition. The representation(73. 19)is known as the Cholesky decomposition. On multiplying the measurement vector YA(Y, Y2,..., Y)satisfying(73. 1)-(73. 2)with noise covariance 2, by C-i, we produce an equivalent (in terms of information content)measurement vector that satistifies the model(73. 1)-(73. 2)with white Gaussian noise and with the signal conformally transformed. This model can then be treated using the methods evisu In other channels, the noise can be modeled as being i.i. d. but with an amplitude distribution that is not Gaussian. This type of model arises, for example, in channels dominated by impulsive phenomena, such as urban radio channels. In the non-Gaussian case the procedures discussed previously lose their optimality as defined in terms of the error probabilities. These procedures can still be used, and they will work well under many conditions; however, there will be a resulting performance penalty with respect to op timum pi dures based on the likelihood ratio. Generally speaking, likelihood-ratio-based procedures for non-Gaussia hannels involve more complex nonlinear processing of the measurements than is required in the detectors, although the retention of the i.i. d. assumption greatly simplifies this problem. A treatment of for such channels can be found in Kassam [1988] When the noise is both non-Gaussian and dependent, the methodology is less well developed, although some techniques are available in these cases. An overview can be found in Poor and Thomas [1993] Robust and Nonparametric Detection All of the procedures outlined in the preceding subsection are based on the assumption of a known(possibly up to a set of unknown parameters)statistical model for signals and noise. In many practical situations it e 2000 by CRC Press LLC
© 2000 by CRC Press LLC There are two basic types of digital communications applications in which the problem (73.18) arises. One is in M-ary data transmission, in which a symbol alphabet with M elements is used to transmit data, and a decision among these M symbols must be made in each symbol interval [Proakis, 1983]. The other type of application in which (73.18) arises is that in which data symbols are correlated in some way because of intersymbol interference, coding, or multiuser transmission. In such cases, each of the M possible signals represents a frame of data symbols, and a joint decision must be made about the entire frame since individual symbol decisions cannot be decoupled. Within this latter framework, the problem (73.18) is known as sequence detection. The basic distinction between M-ary transmission and sequence detection is one of degree. In typical M-ary transmission, the number of elements in the signaling alphabet is typically a small power of 2 (say 8 or 32), whereas the number of symbols in a frame of data could be on the order of thousands. Thus, solution of (73.18) by exhaustive search is prohibitive for sequence detection, and less complex algorithms must be used. Typical digital communications applications in which sequence detection is necessary admit dynamic programming solutions to (73.18) (see, e.g., Verdú [1993]). Detection of Signals in More General Noise Processes In the foregoing paragraphs, we have described three basic detection procedures: correlation detection of signals that are completely known, envelope detection of signals that are known except for a random phase, and quadratic detection for Gaussian random signals. These detectors were all derived under an assumption of white Gaussian noise. This assumption provides an accurate model for the dominant noise arising in many communication channels. For example, the thermal noise generated in signal processing electronics is adequately described as being white and Gaussian. However, there are also many channels in which the statistical behavior of the noise is not well described in this way, particularly when the dominant noise is produced in the physical channel rather than in the receiver electronics. One type of noise that often arises is noise that is Gaussian but not white. In this case, the detection problem (73.1)–(73.2) can be converted to an equivalent problem with white noise by applying a linear filtering process known as prewhitening to the measurements. In particular, on denoting the noise covariance matrix by (, we can write ( = CCT (73.19) where C is an n 3 n invertible, lower-triangular matrix and where the superscript T denotes matrix transposition. The representation (73.19) is known as the Cholesky decomposition. On multiplying the measurement vector Y = D (Y1 , Y2 , . . ., Yn )T satisfying (73.1)–(73.2) with noise covariance (, by C–1, we produce an equivalent (in terms of information content) measurement vector that satistifies the model (73.1)–(73.2) with white Gaussian noise and with the signal conformally transformed. This model can then be treated using the methods described previously. In other channels, the noise can be modeled as being i.i.d. but with an amplitude distribution that is not Gaussian. This type of model arises, for example, in channels dominated by impulsive phenomena, such as urban radio channels. In the non-Gaussian case the procedures discussed previously lose their optimality as defined in terms of the error probabilities. These procedures can still be used, and they will work well under many conditions; however, there will be a resulting performance penalty with respect to optimum procedures based on the likelihood ratio. Generally speaking, likelihood-ratio-based procedures for non-Gaussian noise channels involve more complex nonlinear processing of the measurements than is required in the standard detectors, although the retention of the i.i.d. assumption greatly simplifies this problem. A treatment of methods for such channels can be found in Kassam [1988]. When the noise is both non-Gaussian and dependent, the methodology is less well developed, although some techniques are available in these cases. An overview can be found in Poor and Thomas [1993]. Robust and Nonparametric Detection All of the procedures outlined in the preceding subsection are based on the assumption of a known (possibly up to a set of unknown parameters) statistical model for signals and noise. In many practical situations it is
not possible to specify accurate statistical models for signals or noise, and so it is of interest to design detection procedures that do not rely heavily on such models. Of course, the parametrized models described in the foregoing paragraphs allow for uncertainty in the statistics of the observations. Such models are known parametric models, because the set of possible distributions can be parametrized by a finite set of real parameters e While parametric models can be used to describe many types of modeling uncertainty, composite models in which the set of possible distributions is much broader than a parametric model would allow are sometimed more realistic in practice. Such models are termed nonparametric models. For example, one might be able to ssume only some very coarse model for the noise, such as that it is symmetrically distributed. a wide variety of useful and powerful detectors have been developed for signal-detection problems that cannot be parame trized. These are basically of two types: robust and nonparametric. Robust detectors are those designed to perform well despite small, but potentially damaging, nonparametric deviations from a nominal parametric model, whereas nonparametric detectors are designed to achieve constant false-alarm probability over very wide classes of noise statistics Robustness problems are usually treated analytically via minimax formulations that seek best worst-case performance as the design objective. This formulation has proven to be very useful in the design and charac erization of robust detectors for a wide variety of detection problems Solutions typically call for the intro- duction of light limiting to prevent extremes of gain dictated by an(unrealistic)nominal model. For example, the correlation detector of Fig. 73. 1 can be made robust against deviations from the Gaussian noise model by introducing a soft-limiter between the multiplier and the accumulator. Nonparametric detection is usually based on relatively coarse information about the observations, such a he algebraic signs or the ranks of the observations. One such test is the sign test, which bases its decisions on the number of positive observations obtained. This test is nonparametric for the model in which the noise samples are i.i. d. with zero median and is reasonably powerful against alternatives such as the presence of a positive constant signal in such noise. More powerful tests for such problems can be achieved at the expense of complexity by incorporating rank information into the test statistic. Distributed and Sequential Detection The detection procedures discussed in the preceding paragraphs are based on the assumption that all measure- ments can and should be used in the detection of the signal, and moreover that no constraints exist on how measurements can be combined. There are a number of applications, however, in which constraints apply to the information pattern of the measurements One type of constrained information pattern that is of interest in a number of applications is a network consisting of a number of distributed or local decision makers, each of which processes a subset of the measurements, and a fusion center, which combines the outputs of the distributed decision makers to produce a global detection decision the communication between the distributed decision makers and the fusion center is constrained so that each local decision maker must reduce its subset of measurements to ng local decision to be transmitted to the fusion center. Such structures arise in applications such as the testing of large-scale integrate circuits, in which data collection is decentralized, or in detection problems involving very large data sets, in which is desirable to distribute the computational work of the detection algorithm. Such problems lie in the field of distributed detection. Except in some trivial special cases, the constraints imposed by distributing the detection algorithm introduce a further level of difficulty into the design of optimum detection systems. Nevertheless, considerable progress has been made on this problem, a survey of which can be found in Tsitsiklis [ 1993] Another type of nonstandard information pattern that arises is that in which the number of measurements potentially infinite, but in which there is a cost associated with taking each measurement. This type of model arises in applications such as the synchronization of wideband communication signals. In such situations, the error probabilities alone do not completely characterize the performance of a detection system, consideration must also be given to the cost of sampling. The field of sequential detection deals with the optimization of detection systems within such constraints. In sequential detectors, the number of measurements taken becomes a random variable depending on the measurements themselves. a typical performance criterion for optimizing such a system is to seek a detector that minimizes the expected number of measurements for given levels of miss and false-alarm probabilities e 2000 by CRC Press LLC
© 2000 by CRC Press LLC not possible to specify accurate statistical models for signals or noise, and so it is of interest to design detection procedures that do not rely heavily on such models. Of course, the parametrized models described in the foregoing paragraphs allow for uncertainty in the statistics of the observations. Such models are known as parametric models, because the set of possible distributions can be parametrized by a finite set of real parameters. While parametric models can be used to describe many types of modeling uncertainty, composite models in which the set of possible distributions is much broader than a parametric model would allow are sometimed more realistic in practice. Such models are termed nonparametric models. For example, one might be able to assume only some very coarse model for the noise, such as that it is symmetrically distributed. A wide variety of useful and powerful detectors have been developed for signal-detection problems that cannot be parametrized. These are basically of two types: robust and nonparametric. Robust detectors are those designed to perform well despite small, but potentially damaging, nonparametric deviations from a nominal parametric model, whereas nonparametric detectors are designed to achieve constant false-alarm probability over very wide classes of noise statistics. Robustness problems are usually treated analytically via minimax formulations that seek best worst-case performance as the design objective. This formulation has proven to be very useful in the design and characterization of robust detectors for a wide variety of detection problems. Solutions typically call for the introduction of light limiting to prevent extremes of gain dictated by an (unrealistic) nominal model. For example, the correlation detector of Fig. 73.1 can be made robust against deviations from the Gaussian noise model by introducing a soft-limiter between the multiplier and the accumulator. Nonparametric detection is usually based on relatively coarse information about the observations, such as the algebraic signs or the ranks of the observations. One such test is the sign test, which bases its decisions on the number of positive observations obtained. This test is nonparametric for the model in which the noise samples are i.i.d. with zero median and is reasonably powerful against alternatives such as the presence of a positive constant signal in such noise. More powerful tests for such problems can be achieved at the expense of complexity by incorporating rank information into the test statistic. Distributed and Sequential Detection The detection procedures discussed in the preceding paragraphs are based on the assumption that all measurements can and should be used in the detection of the signal, and moreover that no constraints exist on how measurements can be combined. There are a number of applications, however, in which constraints apply to the information pattern of the measurements. One type of constrained information pattern that is of interest in a number of applications is a network consisting of a number of distributed or local decision makers, each of which processes a subset of the measurements, and a fusion center, which combines the outputs of the distributed decision makers to produce a global detection decision. The communication between the distributed decision makers and the fusion center is constrained, so that each local decision maker must reduce its subset of measurements to a summarizing local decision to be transmitted to the fusion center. Such structures arise in applications such as the testing of large-scale integrated circuits, in which data collection is decentralized, or in detection problems involving very large data sets, in which it is desirable to distribute the computational work of the detection algorithm. Such problems lie in the field of distributed detection. Except in some trivial special cases, the constraints imposed by distributing the detection algorithm introduce a further level of difficulty into the design of optimum detection systems. Nevertheless, considerable progress has been made on this problem, a survey of which can be found in Tsitsiklis [1993]. Another type of nonstandard information pattern that arises is that in which the number of measurements is potentially infinite, but in which there is a cost associated with taking each measurement. This type of model arises in applications such as the synchronization of wideband communication signals. In such situations, the error probabilities alone do not completely characterize the performance of a detection system, since consideration must also be given to the cost of sampling. The field of sequential detection deals with the optimization of detection systems within such constraints. In sequential detectors, the number of measurements taken becomes a random variable depending on the measurements themselves. A typical performance criterion for optimizing such a system is to seek a detector that minimizes the expected number of measurements for given levels of miss and false-alarm probabilities
The most commonly used sequential detection procedure is the sequential probability ratio test, which operates by recursive comparison of the likelihood ratio (73.3)to two thresholds In this detector, if the likelihood ratio for a given number of samples exceeds the larger of the two thresholds, then the signals presence is announced and the test terminates. Alternatively, if the likelihood ratio falls below the smaller of the two thresholds, the e. &nal's absence is announced and the test terminates. However, if neither of the two thresholds is crossed, then another measurement is taken and the test is repeated Detection with Continuous Time measurements Note that all of the preceding formulations have involved the assumption of discrete-time (i.e, sampled-data) measurements. From a practical point of view, this is the most natural framework within which to consider these problems, since implementations most often involve digital hardware. However, the procedures discussed this section all have continuous-time counterparts, which are of both theoretical and practical interest. Mathematically, continuous-time detection problems are more difficult than discrete-time ones, because they involve probabilistic analysis on function spaces. The theory of such problems is quite elegant, and the interested reader is referred to Poor[ 1994] or Grenander [1981] for more detailed exposition. Continuous-time models are of primary interest in the front-end stages of radio frequency or optical communication receivers. At radio frequencies, continuous-time versions of the models described in the preceding paragraphs can be used. For example, one may consider the detection of signals in continuous-time Gaussian white noise. At optical wavelengths, one may consider either continuous models(such as Gaussian processes)or point-process models(such as Poisson counting processes), depending on the type of detection used(see, e.g., Snyder and Miller[1991 ) In the most fundamental analyses of optical detection problems, it is sometimes desirable to consider the quantum mechanical nature of the measurements [ Helstrom, 1976] Defining Terms Bayesian detector: A detector that minimizes the average of the false-alarm and miss probabilities, weighted with respect to prior probabilities of signal-absent and signal-present conditions Correlation detector: The optimum structure for detecting coherent signals in the presence of additive white Discrete-time white Gaussian noise: Noise samples modeled as independent and identically distributed Envelope detector: The optimum structure for detecting a modulated sinusoid with random phase in the presence of additive white Gaussian noise. False-alarm probability: The probability of falsely announcing the presence of a signal Likelihood ratio: The optimum processor for reducing a set of signal-detection measurements to a single Miss probability: The probability of falsely announcing the absence of a signal. Neyman-Pearson detector: A detector that minimizes the miss probability within an upper-bound constraint Quadratic detector: A detector that makes use of the second-order statistical structure(e.g, the spectral characteristics)of the measurements. The optimum structure for detecting a zero-mean Gaussian signal in the presence of additive Gaussian noise is of this form. Related Topics 16.2 Parameter Estimation. 70.3 Spread Spectrum Communications U Grenander, Abstract Inference, New York: Wiley, 1981 C W. Helstrom, Quantum Detection and Estimation Theory, New York: Academic Press,1976 S.A. Kassam, Signal Detection in Non-Gaussian Noise, New York: Springer-Verlag, 1988 e 2000 by CRC Press LLC
© 2000 by CRC Press LLC The most commonly used sequential detection procedure is the sequential probability ratio test, which operates by recursive comparison of the likelihood ratio (73.3) to two thresholds. In this detector, if the likelihood ratio for a given number of samples exceeds the larger of the two thresholds, then the signal’s presence is announced and the test terminates. Alternatively, if the likelihood ratio falls below the smaller of the two thresholds, the signal’s absence is announced and the test terminates. However, if neither of the two thresholds is crossed, then another measurement is taken and the test is repeated. Detection with Continuous-Time Measurements Note that all of the preceding formulations have involved the assumption of discrete-time (i.e., sampled-data) measurements. From a practical point of view, this is the most natural framework within which to consider these problems, since implementations most often involve digital hardware. However, the procedures discussed in this section all have continuous-time counterparts, which are of both theoretical and practical interest. Mathematically, continuous-time detection problems are more difficult than discrete-time ones, because they involve probabilistic analysis on function spaces. The theory of such problems is quite elegant, and the interested reader is referred to Poor [1994] or Grenander [1981] for more detailed exposition. Continuous-time models are of primary interest in the front-end stages of radio frequency or optical communication receivers. At radio frequencies, continuous-time versions of the models described in the preceding paragraphs can be used. For example, one may consider the detection of signals in continuous-time Gaussian white noise. At optical wavelengths, one may consider either continuous models (such as Gaussian processes) or point-process models (such as Poisson counting processes), depending on the type of detection used (see, e.g., Snyder and Miller [1991]). In the most fundamental analyses of optical detection problems, it is sometimes desirable to consider the quantum mechanical nature of the measurements [Helstrom, 1976]. Defining Terms Bayesian detector: A detector that minimizes the average of the false-alarm and miss probabilities, weighted with respect to prior probabilities of signal-absent and signal-present conditions. Correlation detector: The optimum structure for detecting coherent signals in the presence of additive white Gaussian noise. Discrete-time white Gaussian noise: Noise samples modeled as independent and identically distributed Gaussian random variables. Envelope detector: The optimum structure for detecting a modulated sinusoid with random phase in the presence of additive white Gaussian noise. False-alarm probability: The probability of falsely announcing the presence of a signal. Likelihood ratio: The optimum processor for reducing a set of signal-detection measurements to a single number for subsequent threshold comparison. Miss probability: The probability of falsely announcing the absence of a signal. Neyman-Pearson detector: A detector that minimizes the miss probability within an upper-bound constraint on the false-alarm probability. Quadratic detector: A detector that makes use of the second-order statistical structure (e.g., the spectral characteristics) of the measurements. The optimum structure for detecting a zero-mean Gaussian signal in the presence of additive Gaussian noise is of this form. Related Topics 16.2 Parameter Estimation • 70.3 Spread Spectrum Communications References U. Grenander, Abstract Inference, New York: Wiley, 1981. C.W. Helstrom, Quantum Detection and Estimation Theory, New York: Academic Press, 1976. S.A. Kassam, Signal Detection in Non-Gaussian Noise, New York: Springer-Verlag, 1988