Schroeter, J., Mehta, S.K., and Carter, G.C., “Acoustic Signal Processing,” in The Electrical Engineering Handbook, Richard C. Dorf, Ed., Boca Raton: CRC Press LLC, 2000.
19 Acoustic Signal Processing

Juergen Schroeter, Acoustics Research Dept., AT&T Bell Laboratories
Sanjay K. Mehta, NUWC Detachment
G. Clifford Carter, NUWC Detachment

19.1 Digital Signal Processing in Audio and Electroacoustics
Steerable Microphone Arrays • Digital Hearing Aids • Spatial Processing • Audio Coding • Echo Cancellation • Active Noise and Sound Control

19.2 Underwater Acoustical Signal Processing
What Is Underwater Acoustical Signal Processing? • Technical Overview • Underwater Propagation • Processing Functions • Advanced Signal Processing • Application

19.1 Digital Signal Processing in Audio and Electroacoustics

Juergen Schroeter

In this section we will focus on advances in algorithms and technologies in digital signal processing (DSP) that have already had, or most likely will soon have, a major impact on audio and electroacoustics (A&E). Because A&E embraces a wide range of topics, it is impossible for us to go into any depth in any one of them here. Instead, this section will try to give a compressed overview of the topics the author judges to be most important. In the following, we will look into steerable microphone arrays, digital hearing aids, spatial processing, audio coding, echo cancellation, and active noise and sound control. We will not cover basic techniques in digital recording [Pohlmann, 1989] and computer music [Moore, 1990].

Steerable Microphone Arrays

Steerable microphone arrays have controllable directional characteristics. One important application is in teleconferencing, where sound pickup can be highly degraded by reverberation and room noise. One solution to this problem is to utilize highly directional microphones. Instead of pointing such a microphone manually to a desired talker, steerable microphone arrays, combined with a suitable speech detection algorithm, can be used for reliable automatic tracking of speakers as they move around in a noisy room or auditorium.

Figure 19.1 depicts the simplest kind of steerable array, using N microphones that are uniformly spaced with distance d along the linear x-axis. It can be shown that the response of this system to a plane wave impinging at an angle θ is

$$H(j\omega) \;=\; \sum_{n=0}^{N-1} a_n\, e^{-j(\omega/c)\,nd\cos\theta} \qquad (19.1)$$

Here, j = √–1, ω is the radian frequency, and c is the speed of sound. Equation (19.1) is a spatial filter with coefficients a_n and the delay operator z⁻¹ = exp(–j(ω/c)d cos θ). Therefore, we can apply finite impulse response (FIR) filter theory. For example, we could taper the weights a_n to suppress sidelobes of the array. We also have to guard against spatial aliasing, that is, grating lobes that make the directional characteristic of the array ambiguous.
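Equation (19.1) lends itself to direct numerical evaluation. The following sketch (Python/NumPy; the function name and the example values N = 5, d = 10 cm, f = 1 kHz are our own illustrative choices, not taken from the chapter) computes the magnitude of H(jω) versus arrival angle, i.e., the beampattern of a uniformly weighted array:

```python
import numpy as np

def array_response(a, d, theta, f, c=343.0):
    """Response of a uniform linear array to a plane wave, per Eq. (19.1).

    a     : complex microphone weights a_n
    d     : microphone spacing in meters
    theta : arrival angle in radians (0 = along the array axis)
    f     : frequency in Hz; c : speed of sound in m/s
    """
    omega = 2.0 * np.pi * f
    n = np.arange(len(a))
    # Microphone n sees the wavefront delayed by (n d / c) cos(theta).
    return np.sum(a * np.exp(-1j * (omega / c) * n * d * np.cos(theta)))

# Beampattern of a uniformly weighted (rectangular taper) 5-element array.
N, d, f = 5, 0.10, 1000.0
a = np.ones(N) / N
for deg in (0, 30, 60, 90, 120, 150, 180):
    H = array_response(a, d, np.radians(deg), f)
    print(f"theta = {deg:3d} deg: |H| = {abs(H):.3f}")
```

Tapering the weights a_n (e.g., with a Hamming window) lowers the sidelobes at the expense of a wider main beam, exactly as in FIR filter design.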
The array is steered to an angle θ₀ by introducing appropriate delays into the N microphone lines. In Eq. (19.1), we can incorporate these delays by letting

$$a_n = e^{-j\omega\tau_0 \, + \, j(\omega/c)\,nd\cos\theta_0} \qquad (19.2)$$

Here, τ₀ is an overall delay equal to or larger than (Nd/c) cos θ₀ that ensures causality, while the second term in Eq. (19.2) cancels the corresponding term in Eq. (19.1) at θ = θ₀. Due to the axial symmetry of the one-dimensional (linear, 1-D) array, the directivity of the array is a figure of revolution around the x-axis. Therefore, in case we want the array to point to a single direction in space, we need a 2-D array.

Since most of the energy of typical room noise and the highest level of reverberation in a room is at low frequencies, one would like to use arrays that have their highest directivity (i.e., narrowest beamwidth) at low frequencies. Unfortunately, this need collides with the physics of arrays: the smaller the array relative to the wavelength, the wider the beam. (Again, the corresponding notion in filter theory is that systems with shorter impulse responses have wider bandwidth.) One solution to this problem is to superimpose different-size arrays and filter each output by the appropriate bandpass filter, similar to a crossover network used in two- or three-way loudspeaker designs. Such a superposition of three five-element arrays is shown in Fig. 19.2. Note that we only need nine microphones in this example, instead of 5 × 3 = 15.

Another interesting application is the use of an array to mitigate discrete noise sources in a room. For this, we need to attach an FIR filter to each of the microphone signal outputs. For any given frequency, one can show that N microphones can produce N – 1 nulls in the directional characteristic of the array. Similarly, attaching an M-point FIR filter to each of the microphones, we can get these zeros at M – 1 frequencies. The weights for these filters have to be adapted, usually under the constraint that the transfer function (frequency characteristic) of the array for the desired source is optimally flat. In practical tests, systems of this kind work nicely in (almost) anechoic environments. Their performance degrades, however, with increasing reverberation.

More information on microphone arrays can be found in Flanagan et al. [1991]; in particular, they describe how to make arrays adapt to changing talker positions in a room by constantly scanning the room with a moving search beam and by switching the main beam accordingly. Current research issues are, among others, 3-D arrays and how to take advantage of low-order wall reflections.

FIGURE 19.1 A linear array of N microphones (here, N = 5; τ = d/c cos θ).

FIGURE 19.2 Three superimposed linear arrays depicted by large, midsize, and small circles. The largest array covers the low frequencies, the midsize array covers the midrange frequencies, and the smallest covers the high frequencies.
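Continuing the sketch above, the steering weights of Eq. (19.2) can be applied directly; by construction the phase terms cancel at θ = θ₀, so the steered array sums coherently in that direction (again Python/NumPy, reusing array_response from the previous sketch, with illustrative values of our own choosing):

```python
import numpy as np

def steered_weights(N, d, theta0, f, c=343.0):
    """Steering weights a_n of Eq. (19.2).

    The overall delay tau0 >= (N d / c) cos(theta0) keeps the system
    causal; the +j(omega/c) n d cos(theta0) term cancels the propagation
    phase of Eq. (19.1) at theta = theta0.
    """
    omega = 2.0 * np.pi * f
    tau0 = N * d / c                     # large enough for any theta0
    n = np.arange(N)
    return np.exp(-1j * omega * tau0 + 1j * (omega / c) * n * d * np.cos(theta0))

# Steer a 5-element array to 60 degrees and inspect the pattern.
N, d, f = 5, 0.10, 1000.0
a = steered_weights(N, d, np.radians(60.0), f) / N
for deg in (0, 30, 60, 90, 120):
    H = array_response(a, d, np.radians(deg), f)   # from the sketch above
    print(f"theta = {deg:3d} deg: |H| = {abs(H):.3f}")
```

At θ = 60° all N phasors align and |H| = 1; away from the steering angle the contributions partially cancel.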
Digital Hearing Aids

Commonly used hearing aids attempt to compensate for sensorineural (cochlear) hearing loss by delivering an amplified acoustic signal to the external ear canal. As will be pointed out below, the most important problem is how to find the best aid for a given patient.

Historically, technology has been the limiting factor in hearing aids. Early on, carbon hearing aids provided a limited gain and a narrow, peaky frequency response. Nowadays, hearing aids have a broader bandwidth and a flatter frequency response. Consequently, more people can benefit from the improved technology. With the advent of digital technology, the promise is that even more people would be able to do so. Unfortunately, as will be pointed out below, we have not fulfilled this promise yet.

We distinguish between analog, digitally controlled analog, and digital hearing aids. Analog hearing aids contain only a (low-power) pre-amp, filter(s), (optional) automatic gain control (AGC) or compressor, power amp, and output limiter. Digitally controlled aids have certain additional components: one kind adds a digital controller to monitor and adjust the analog components of the aid. Another kind contains switched-capacitor circuits that represent sampled signals in analog form, in effect allowing simple discrete-time processing (e.g., filtering). Aids with switched-capacitor circuits have a lower power consumption compared to digital aids. Digital aids—none are yet commercially available—contain A/D and D/A converters and at least one programmable digital signal processing (DSP) chip, allowing for the use of sophisticated DSP algorithms, (small) microphone arrays, speech enhancement in noise, etc. Experts disagree, however, as to the usefulness of these techniques. To date, the most successful approach seems to be to ensure that all parts of the signal get amplified so that they are clearly audible but not too loud, and to “let the brain sort out signal and noise.”

Hearing aids pose a tremendous challenge for the DSP engineer, as well as for the audiologist and acoustician. Due to the continuing progress in chip technology, the physical size of a digital aid should no longer be a serious problem in the near future; however, power consumption will still be a problem for quite some time. Besides the obvious necessity of avoiding howling (acoustic feedback), for example, by employing sophisticated models of the electroacoustic transducers, acoustic leaks, and ear canal to control the aid accordingly, there is a much more fundamental problem: since DSP allows complex schemes of splitting, filtering, compressing, and (re-)combining the signal, hearing aid performance is no longer limited by bottlenecks in technology. It is still limited, however, by the lack of basic knowledge about how to map an arbitrary input signal (i.e., speech from a desired speaker) onto the reduced capabilities of the auditory system of the targeted wearer of the aid. Hence, the selection and fitting of an appropriate aid becomes the most important issue. This serious problem is illustrated in Fig. 19.3.

It is important to note that for speech presented at a constant level, a linear (no compression) hearing aid can be tuned to do as well as a hearing aid with compression. However, if parameters like signal and background noise levels change dynamically, compression aids, in particular those with two bands or more, should have an advantage. While a patient usually has no problem telling whether setting A or B is “clearer,” adjusting more than just 2–3 (usually interdependent) parameters is very time consuming. For a multiparameter aid, an efficient fitting procedure that maximizes a certain objective is needed. Possible objectives are, for example, intelligibility maximization or loudness restoration.
FIGURE 19.3 Peak third-octave band levels of normal to loud speech (hatched) and typical levels/dominant frequencies of speech sounds (identifiers). Both can be compared to the third-octave threshold of normal-hearing people (solid line), thresholds for a mildly hearing-impaired person (A), for a severely hearing-impaired person (B), and for a profoundly hearing-impaired person (C). For example, for person (A), sibilants and some weak consonants in a normal conversation cannot be perceived. (Source: H. Levitt, “Speech discrimination ability in the hearing impaired: spectrum considerations,” in The Vanderbilt Hearing-Aid Report: State of the Art-Research Needs, G.A. Studebaker and F.H. Bess (Eds.), Monographs in Contemporary Audiology, Upper Darby, Pa., 1982, p. 34. With permission.)

The latter objective is assumed in the following. It is known that an impaired ear has a reduced dynamic range. Therefore, the procedure for fitting a patient with a hearing aid could estimate the so-called loudness-growth function (LGF) that relates the sound pressure level of a specific (band-limited) sound to its loudness. An efficient way of measuring the LGF is described by Allen et al. [1990]. Once the LGF of an impaired ear is known, a multiband hearing aid can implement the necessary compression for each band [Villchur, 1973]. Note, however, that this assumes that interactions between the bands can be neglected (problem of summation of partial loudnesses). This might not be valid for aids with a large number of bands. Other open questions include the choice of widths and filter shapes of the bands, and optimization of dynamic aspects of the compression (e.g., time constants). For aids with just two bands, the crossover frequency is a crucial parameter that is difficult to optimize.
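As an illustration of the multiband idea, the sketch below implements a toy two-band aid with static compression. The crossover frequency, compression ratios, threshold, and the 94-dB full-scale calibration are all invented for the example; a real fitting would derive per-band gains from the measured LGF and would include dynamic behavior (attack/release time constants), which this sketch omits.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def two_band_compressor(x, fs, xover=1500.0, ratios=(2.0, 3.0), thresh_db=65.0):
    """Toy two-band compression hearing aid (static gains only).

    x : input signal (float array), fs : sampling rate in Hz.
    All parameter values are illustrative, not clinical prescriptions.
    """
    sos_lo = butter(4, xover, btype="low", fs=fs, output="sos")
    sos_hi = butter(4, xover, btype="high", fs=fs, output="sos")
    y = np.zeros_like(x)
    for sos, ratio in zip((sos_lo, sos_hi), ratios):
        band = sosfilt(sos, x)
        # Assumed calibration: digital full-scale RMS corresponds to 94 dB SPL.
        level_db = 20 * np.log10(np.sqrt(np.mean(band**2)) + 1e-12) + 94.0
        # Static compression: above threshold T the output level grows only
        # 1/ratio dB per input dB, i.e., gain = (T - L)(1 - 1/ratio) <= 0.
        gain_db = min(0.0, (thresh_db - level_db) * (1.0 - 1.0 / ratio))
        y += band * 10.0 ** (gain_db / 20.0)
    return y
```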
Spatial Processing

In spatial processing, audio signals are modified to give them new spatial attributes, such as, for example, the perception of having been recorded in a specific concert hall. The auditory system—using only the two ears as inputs—is capable of perceiving the direction and distance of a sound source with a high degree of accuracy, by exploiting binaural and monaural spectral cues. Wave propagation in the ear canal is essentially one-dimensional. Hence, the 3-D spatial information is coded by sound diffraction into spectral information before the sound enters the ear canal. The sound diffraction is caused by the head/torso (on the order of 20-dB and 600-μs interaural level difference and delay, respectively) and at the two pinnae (auriculae); see, for example, Shaw [1980]. Binaural techniques like the one discussed below can be used for evaluating room and concert-hall acoustics (optionally in reduced-scale model rooms using a miniature dummy head), for noise assessment (e.g., in cars), and for “Kunstkopfstereophonie” (dummy-head stereophony). In addition, there are techniques for loudspeaker reproduction (like “Q-Sound”) that try to extend the range in horizontal angle of traditional stereo speakers by using interaural cross cancellation. Largely an open question is how to reproduce spatial information for large audiences, for example, in movie theaters.

Figure 19.4 illustrates the technique for filtering a single-channel source using measured head-related transfer functions, in effect creating a virtual sound source in a given direction of the listener’s auditory space (assuming plane waves, i.e., infinite source distance). On the left in this figure, the measurement of head-related transfer functions is shown. Focusing on the left ear for a moment (subscript l), we need to estimate the so-called free-field transfer function (subscript ff) for given angles of incidence in the horizontal plane (azimuth φ) and vertical plane (elevation δ):

$$H_{ff,l}(j\omega,\varphi,\delta) \;=\; P_{probe,l}(j\omega,\varphi,\delta)\,/\,P_{ref}(j\omega) \qquad (19.3)$$

where P_probe,l is the Fourier transform of the sound pressure measured in the subject’s left ear, and P_ref is the Fourier transform of the pressure measured at a suitable reference point in the free field without the subject being present (e.g., at the midpoint between the two ears). (Note that P_ref is independent of the direction of sound incidence since we assume an anechoic environment.)

FIGURE 19.4 Measuring and using transfer functions of the external ear for binaural mixing (FIR = finite impulse response). (Source: E.M. Wenzel, Localization in virtual acoustic displays, Presence, vol. 1, p. 91, 1992. With permission.)
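A minimal numerical sketch of both halves of Fig. 19.4 follows (Python/NumPy; the function names are ours, and in practice the pressure signals and impulse responses would come from probe-microphone measurements as described above). The first function estimates H_ff per Eq. (19.3); the second renders a “dry” mono source binaurally by convolution with a left/right pair of head-related impulse responses.

```python
import numpy as np

def free_field_tf(p_probe, p_ref, eps=1e-12):
    """Estimate H_ff(jw) = P_probe(jw) / P_ref(jw), per Eq. (19.3).

    p_probe : pressure recorded in the ear canal for one (azimuth,
              elevation); p_ref : pressure at the reference point.
    """
    return np.fft.rfft(p_probe) / (np.fft.rfft(p_ref) + eps)

def binaural_mix(mono, hrir_left, hrir_right):
    """Place a mono source at the direction encoded by the HRIR pair."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])   # shape (2, len(mono) + len(hrir) - 1)
```

Moving sources would additionally require interpolating between measured HRIR pairs, as noted in the text below.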
The middle of Fig. 19.4 depicts the convolution of any “dry” (e.g., mono, low-reverberation) source with the stored H_ff,l(jω, φ, δ) and the corresponding H_ff,r(jω, φ, δ). On the right side in the figure, the resulting binaural signals are reproduced via equalized headphones. The equalization ensures that a sound source with a flat spectrum (e.g., white noise) does not suffer any perceivable coloration for any direction (φ, δ).

Implemented in a real-time “binaural mixing console,” the above scheme can be used to create “virtual” sound sources. When combined with an appropriate scheme for interpolating head-related transfer functions, moving sound sources can be mimicked realistically. Furthermore, it is possible to superimpose early reflections of a hypothetical recording room, each filtered by the appropriate head-related transfer function. Such inclusion of a room in the simulation makes the spatial reproduction more robust against individual differences between “recording” and “listening” ears, in particular if the listener’s head movements are fed back to the binaural mixing console. (Head movements are useful for disambiguating spatial cues.) Finally, such a system can be used to create “virtual acoustic displays,” for example, for pilots and astronauts [Wenzel, 1992]. Other research issues are, for example, the required accuracy of the head-related transfer functions, intersubject variability, and psychoacoustic aspects of room simulations.

Audio Coding

Audio coding is concerned with compressing (reducing the bit rate of) audio signals. The uncompressed digital audio of compact disks (CDs) is recorded at a rate of 705.6 kbit/s for each of the two channels of a stereo signal (i.e., 16 bit/sample, 44.1-kHz sampling rate; 1411.2 kbit/s total). This is too high a bit rate for digital audio broadcasting (DAB) or for transmission via end-to-end digital telephone connections (integrated services digital network, ISDN). Current audio coding algorithms provide at least “better than FM” quality at a combined rate of 128 kbit/s for the two stereo channels (2 ISDN B channels!), “transparent coding” at rates of 96 to 128 kbit/s per mono channel, and “studio quality” at rates between 128 and 196 kbit/s per mono channel. (While a large number of people will be able to detect distortions in the first class of coders, even so-called “golden ears” should not be able to detect any differences between original and coded versions of known “critical” test signals; the highest quality category adds a safety margin for editing, filtering, and/or recoding.)

To compress audio signals by a factor as large as eleven while maintaining a quality exceeding that of a local FM radio station requires sophisticated algorithms for reducing the irrelevance and redundancy in a given signal. A large portion (but usually less than 50%) of the bit-rate reduction in an audio coder is due to the first of the two mechanisms. Eliminating irrelevant portions of an input signal is done with the help of psychoacoustic models. It is obvious that a coder can eliminate portions of the input signal that—when played back—will be below the threshold of hearing. More complicated is the case when we have multiple signal components that tend to cover each other, that is, when weaker components cannot be heard due to the presence of stronger components. This effect is called masking. To let a coder take advantage of masking effects, we need to use good masking models.
Masking can be modeled in the time domain, where we distinguish so-called simultaneous masking (masker and maskee occur at the same time), forward masking (masker occurs before maskee), and backward masking (masker occurs after maskee). Simultaneous masking usually is modeled in the frequency domain. This latter case is illustrated in Fig. 19.5.

Audio coders that employ common frequency-domain models of masking start out by splitting and subsampling the input signal into different frequency bands (using filterbanks such as subband filterbanks or time-frequency transforms). Then, the masking threshold (i.e., predicted masked threshold) is determined, followed by quantization of the spectral information and (optional) noiseless compression using variable-length coding. The encoding process is completed by multiplexing the spectral information with side information, adding error protection, etc.

The first stage, the filter bank, has the following requirements. First, decomposing and then simply reconstructing the signal should not lead to distortions (“perfect reconstruction filterbank”). This results in the advantage that all distortions are due to the quantization of the spectral data. Since each quantizer works on band-limited data, the distortion (also band-limited due to refiltering) is controllable by using the masking models described above. Second, the bandwidths of the filters should be narrow to provide sufficient coding gain. On the other hand, the length of the impulse responses of the filters should be short enough (time resolution of the coder!) to avoid so-called pre-echoes, that is, backward spreading of distortion components
that result from sudden onsets (e.g., castanets). These two contradictory requirements obviously have to be worked out by a compromise. “Critical band” filters have the shortest impulse responses needed for coding of transient signals. On the other hand, the optimum frequency resolution (i.e., the one resulting in the highest coding gain) for a typical signal can be achieved by using, for example, a 2048-point modified discrete cosine transform (MDCT).

In the second stage, the (time-varying) masking threshold as determined by the psychoacoustic model usually controls an iterative analysis-by-synthesis quantization and coding loop. It can incorporate rules for masking of tones by noise and of noise by tones, though little is known in the psychoacoustic literature for more general signals. Quantizer step sizes can be set and bits can be allocated according to the known spectral estimate, by block companding with transmission of the scale factors as side information, or iteratively in a variable-length coding loop (Huffman coding). In the latter case, one can low-pass filter the signal if the total required bit rate is too high.

The decoder has to invert the processing steps of the encoder, that is, do the error correction, perform Huffman decoding, and reconstruct the filter signals or the inverse-transformed time-domain signal. Since the decoder is significantly less complex than the encoder, it is usually implemented on a single DSP chip, while the encoder uses several DSP chips.

Current research topics encompass tonality measures and time-frequency representations of signals. More information can be found in Johnston and Brandenburg [1991].

FIGURE 19.5 Masked threshold in the frequency domain for a hypothetical input signal. In the vicinity of high-level spectral components, signal components below the current masked threshold cannot be heard.
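The way a masked threshold steers quantization can be made concrete with a toy transform-coder fragment. In the sketch below (our own simplification, not an actual coder), each band’s uniform quantizer step is chosen so that the quantization noise power, approximately step²/12 for a uniform quantizer, sits just at the band’s masked threshold; coefficients that fall below the threshold quantize to zero and cost essentially no bits.

```python
import numpy as np

def quantize_to_threshold(coeffs, band_edges, masked_power):
    """Toy per-band quantizer driven by a masked threshold.

    coeffs       : spectral coefficients (e.g., from an MDCT)
    band_edges   : list of (lo, hi) index pairs, one per band
    masked_power : allowed (masked) quantization-noise power per band
    """
    out = np.zeros_like(coeffs)
    for (lo, hi), p_mask in zip(band_edges, masked_power):
        step = np.sqrt(12.0 * p_mask)     # so that step^2 / 12 = p_mask
        out[lo:hi] = np.round(coeffs[lo:hi] / step) * step
    return out
```

A real coder would wrap this in the iterative rate loop described above, re-adjusting step sizes until the bit budget is met.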
Echo Cancellation

Echo cancellers were first deployed in the U.S. telephone network in 1979. Today, they are virtually ubiquitous in long-distance telephone circuits, where they cancel so-called line echoes (i.e., electrical echoes) resulting from nonperfect hybrids (the devices that couple local two-wire to long-distance four-wire circuits). In satellite circuits, echoes bouncing back from the far end of a telephone connection with a round-trip delay of about 600 ms are very annoying and disruptive. Acoustic echo cancellation—where the echo path is characterized by the transfer function H(z) between a loudspeaker and a microphone in a room (e.g., in a speakerphone)—is crucial for teleconferencing, where two or more parties are connected via full-duplex links. Here, echo cancellation can also alleviate acoustic feedback (“howling”).

The principle of acoustic echo cancellation is depicted in Fig. 19.6(a). The echo path H(z) is cancelled by modeling H(z) with an adaptive filter and subtracting the filter’s output ŷ(t) from the microphone signal y(t). The adaptability of the filter is necessary since H(z) changes appreciably with movement of people or objects in the room and because periodic measurements of the room would be impractical. Acoustic echo cancellation is more challenging than cancelling line echoes for several reasons. First, room impulse responses h(t) are longer than 200 ms, compared to less than 20 ms for line echo cancellers. Second, the echo path of a room h(t) is likely to change constantly (note that even small changes in temperature can cause significant changes of h). Third, teleconferencing eventually will demand larger audio bandwidths (e.g., 7 kHz) compared to standard telephone connections (about 3.2 kHz). Finally, we note that echo cancellation in a stereo setup (two microphones and two loudspeakers at each end) is an even harder problem on which very little work has been done so far.

It is obvious that the initially unknown echo path H(z) has to be “learned” by the canceller. It is also clear that for adaptation to work there needs to be a nonzero input signal x(t) that excites all the eigenmodes of the system (resonances, or “peaks,” of the system magnitude response |H(jω)|). Another important problem is how to handle double-talk (speakers at both ends are talking simultaneously). In such a case, the canceller could easily get confused by the speech from the near end, which acts as an uncorrelated noise in the adaptation. Finally, the convergence rate, that is, how fast the canceller adapts to a change in the echo path, is an important measure to compare different algorithms.

Adaptive filter theory suggests several algorithms for use in echo cancellation. The most popular one is the so-called least-mean-square (LMS) algorithm that models the echo path by an FIR filter with an impulse response ĥ(t). Using vector notation h for the true echo-path impulse response, ĥ for its estimate, and x for the excitation time signal, an estimate of the echo is obtained by ŷ(t) = ĥ′x, where the prime denotes vector transpose. A reasonable objective for a canceller is to minimize the instantaneous squared error e²(t), where e(t) = y(t) – ŷ(t). The time derivative of ĥ can be set to

$$\frac{d\hat{\mathbf{h}}}{dt} \;=\; -\mu\,\nabla_{\hat{\mathbf{h}}}\, e^2(t) \;=\; -2\mu\, e(t)\, \nabla_{\hat{\mathbf{h}}}\, e(t) \;=\; 2\mu\, e(t)\, \mathbf{x} \qquad (19.4)$$

resulting in the simple update equation ĥ_{k+1} = ĥ_k + α e_k x_k, where α (or μ) controls the rate of change. In practice, whenever the far-end signal x(t) is low in power, it is a good idea to freeze the canceller by setting α = 0. Sophisticated logic is needed to detect double talk; when it occurs, one also sets α = 0. It can be shown that the spread of the eigenvalues of the autocorrelation matrix of x(t) determines the convergence rate, where the slowest-converging eigenmode corresponds to the smallest eigenvalue. Since the eigenvalues themselves scale with the power of the predominant spectral components in x(t), setting α = 2μ/(x′x) will make the convergence rate independent of the far-end power. This is the normalized LMS method. Even then, however, all eigenmodes will converge at the same rate only if x(t) is white noise. Therefore, pre-whitening the far-end signal will help in speeding up convergence.

The LMS method is an iterative approach to echo cancellation. An example of a noniterative, block-oriented approach is the least-squares (LS) algorithm. Solving a system of equations to get ĥ, however, is computationally more costly. This cost can be reduced considerably by running the LS method on a sample-by-sample basis and by taking advantage of the fact that the new signal vectors are the old vectors with the oldest sample dropped and one new sample added. This is the recursive least-squares (RLS) algorithm. It also has the advantage of normalizing x by multiplying it with the inverse of its autocorrelation matrix. This, in effect, equalizes the adaptation rate of all eigenmodes.

FIGURE 19.6 (a) Principle of using an echo canceller in teleconferencing. (b) Realization of the echo canceller in subbands. (After M. M. Sondhi and W. Kellermann, “Adaptive echo cancellation for speech signals,” in Advances in Speech Signal Processing, S. Furui and M. M. Sondhi, Eds., New York: Marcel Dekker, 1991. By courtesy of Marcel Dekker, Inc.)
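A normalized-LMS canceller following Eq. (19.4) fits in a few lines. In the sketch below (Python/NumPy; the filter length, step size, and the crude far-end power gate are illustrative, and a real canceller would use a proper double-talk detector as discussed above), the step size absorbs the factor of 2 from α = 2μ/(x′x):

```python
import numpy as np

def nlms_echo_canceller(x, y, L=512, mu=0.5, eps=1e-8, gate=1e-6):
    """Acoustic echo canceller using normalized LMS.

    x : far-end (loudspeaker) signal, y : microphone signal.
    Adaptation is frozen (alpha = 0) when the far-end power is below
    'gate' -- a crude stand-in for far-end activity/double-talk logic.
    """
    h_hat = np.zeros(L)
    e = np.zeros(len(y))
    for k in range(L, len(y)):
        xk = x[k - L:k][::-1]            # most recent L far-end samples
        e[k] = y[k] - h_hat @ xk         # residual after echo estimate
        power = xk @ xk
        if power > gate:                 # freeze canceller when x is weak
            h_hat += (mu / (power + eps)) * e[k] * xk   # normalized update
    return e, h_hat

# Synthetic check: white far-end signal through a known random echo path.
rng = np.random.default_rng(0)
x = rng.standard_normal(20000)
h_true = rng.standard_normal(512) * np.exp(-np.arange(512) / 80.0)
y = np.convolve(x, h_true)[: len(x)]
e, h_hat = nlms_echo_canceller(x, y)
print("echo reduction (dB):",
      10 * np.log10(np.mean(y[-4000:]**2) / np.mean(e[-4000:]**2)))
```

Because the synthetic far-end signal is white, all eigenmodes converge at the same rate; with speech as x, pre-whitening would speed up convergence, as noted above.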
Another interesting approach is outlined in Fig. 19.6(b). As in subband coding (discussed earlier), splitting the signals x and y into subbands with analysis filterbanks A, doing the cancellation in bands, and resynthesizing the outgoing (“error”) signal e through a synthesis filterbank S also reduces the eigenvalue spread of each bandpass signal compared to the eigenvalue spread of the fullband signal. This is true for the eigenvalues that correspond to the “center” (i.e., unattenuated) portions of each band. It turns out, however, that the slowly converging “transition-band” eigenmodes get attenuated significantly by the synthesis filter S. The main advantage of the subband approach is the reduction in computational complexity due to the down-sampling of the filterbank signals. The drawback of the subband approach, however, is the introduction of the combined delay of A and S. Eliminating the analysis filterbank on y(t) and moving the synthesis filterbank into the adaptation branch Ŷ will remove this delay, with the result that the canceller will not be able to model the earliest portions of the echo-path impulse response h(t). To alleviate this problem, we could add in parallel a fullband echo canceller with a short filter. Further information and an extensive bibliography can be found in Haensler [1992].

Active Noise and Sound Control

Active noise control (ANC) is a way to reduce the sound pressure level of a given noise source through electroacoustic means. ANC and echo cancellation are somewhat related. While even acoustic echo cancellation is actually done on electrical signals, ANC could be labeled “wave cancellation,” since it involves using one or more secondary acoustic or vibrational sources. Another important difference is the fact that in ANC one usually would like to cancel a given noise in a whole region in space, while echo cancellation commonly involves only one microphone picking up the echo signal at a single point in space. Finally, the transfer function of the transducer used to generate a cancellation (“secondary source”) signal needs to be considered in ANC.

Active sound control (ASC) can be viewed as an offspring of ANC. In ASC, instead of trying to cancel a given sound field, one tries to control specific spatial and temporal characteristics of the sound field. One application is in adaptive sound reproduction systems. Here, ASC aims at solving the large-audience spatial reproduction problem mentioned in the spatial processing section of this chapter.

Two important principles of ANC are depicted in Fig. 19.7. In the upper half [Fig. 19.7(a) and (b)], a feedback loop is formed between the controller G(s) and the transfer function C(s) of the secondary source and the acoustic path to the error microphone. Control theory suggests that E/Y = 1/[1 + C(s)G(s)], where E(s) and Y(s) are Laplace transforms of e(t) and y(t), respectively. Obviously, if we could make C a real constant and G → ∞, we would get a “zone of quiet” around the error microphone. Unfortunately, in practice, C(s) will introduce at least a delay, thus causing stability problems for too large a magnitude |G| at high enough frequencies. The system can be kept stable, for example, by including a low-pass filter in G and by positioning the secondary source in close vicinity to the error microphone.
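The closed-loop relation E/Y = 1/[1 + C(s)G(s)] and the stability problem can be illustrated numerically. In the sketch below, C is modeled as a pure 0.1-ms delay and G as a one-pole low-pass with a DC gain of 10; these values are invented for the example. At low frequencies the loop attenuates strongly; around 2 kHz the accumulated phase turns the feedback constructive and the loop actually amplifies, which is why |G| must roll off at high frequencies.

```python
import numpy as np

# Feedback ANC loop of Fig. 19.7(a)/(b): E/Y = 1 / (1 + C(jw) G(jw)).
f = np.array([50.0, 200.0, 1000.0, 2000.0, 5000.0])        # Hz
w = 2.0 * np.pi * f
C = np.exp(-1j * w * 1e-4)                 # secondary path: 0.1-ms delay
G = 10.0 / (1.0 + 1j * w / (2.0 * np.pi * 200.0))   # controller: low-pass
atten_db = 20.0 * np.log10(np.abs(1.0 + C * G))     # >0 quieter, <0 louder
for fi, a in zip(f, atten_db):
    print(f"{fi:6.0f} Hz: {a:+6.1f} dB")
```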
FIGURE 19.7 Two principles of active noise control. Feedback control system (a) and (b); feedforward control system (c) and (d). Physical block diagrams (a) and (c), and equivalent electrical forms (b) and (d). (After P. A. Nelson and S. J. Elliott, Active Control of Sound, London: Academic Press, 1992. With permission.)
A highly successful application of feedback control in ANC is in active hearing protective devices (HPDs), high-quality headsets, and “motional-feedback” loudspeakers. Passive HPDs offer little or no noise attenuation at low frequencies due to inherent physical limitations. Since the volume enclosed by earmuffs is rather small, HPDs can benefit from the increase in low-frequency attenuation brought about by feedback-control ANC. Finally, note that the same circuit can be used for high-quality reproduction of a communications signal s(t) fed into a headset by subtracting s(t) electrically from e(t). The resulting transfer function is E/S = C(s)G(s)/[1 + C(s)G(s)], assuming Y(s) = 0. Thus, a high loop gain |G(s)| will ensure both a high noise attenuation at low frequencies and a faithful bass reproduction of the communications signal.

The principle of the feedforward control method in ANC is outlined in the lower half of Fig. 19.7, (c) and (d). The obvious difference from the feedback control method is that a separate reference signal x(t) is used. Here, cancellation is achieved for the filter transfer function W = H(s)/C(s), which is most often implemented by an adaptive filter. The fact that x(t) reaches the ANC system earlier than e(t) allows for a causal filter, needed in broadband systems. However, a potential problem with this method is the possibility of feedback of the secondary source signal y(t) into the path of the reference signal x(t). This is obviously the case when x(t) is picked up by a microphone in a duct just upstream of the secondary source C. An elegant solution for ANC in a duct without explicit feedback cancellation is to use a recursive filter W.

Single-error-signal/single-secondary-source systems cannot achieve global cancellation or sound control in a room. An intuitive argument for this fact is that one needs at least as many secondary sources and error microphones as there are orthogonal wave modes in the room. Since the number of wave modes in a room below a given frequency is approximately proportional to the third power of this frequency (to leading order, N(f) ≈ (4π/3)V(f/c)³ for a room of volume V, where c is the speed of sound), it is clear that ANC (and ASC) is practical only at low frequencies. In practice, using small (point-source) transducers, it turns out that one should use more error microphones than secondary sources. Examples of such multidimensional ANC systems are employed for cancelling the lowest few harmonics of the engine noise in an airplane cabin and in a passenger car. In both of these cases, the adaptive filter matrix is controlled by a multiple-error version of the LMS algorithm. Further information can be found in Nelson and Elliott [1992].
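The adaptive filter W is commonly implemented with a filtered-x LMS update, of which the multiple-error LMS algorithm mentioned above is the multichannel generalization. The single-channel sketch below uses made-up toy paths (a delayed primary path h, a pure-delay secondary path c) purely for illustration; it is not the specific system described in the text.

import numpy as np

rng = np.random.default_rng(1)
n = 40000
x = rng.standard_normal(n)                          # reference signal x(t)

h = np.zeros(58)                                    # primary path H: 10-sample delay
h[10:] = rng.standard_normal(48) * np.exp(-np.arange(48) / 12.0)  # plus a decaying tail
c = np.zeros(8); c[4] = 1.0                         # secondary path C: 4-sample delay
d = np.convolve(x, h)[:n]                           # noise arriving at the error mic

L = 64
w = np.zeros(L)                                     # adaptive filter W (FIR)
xbuf = np.zeros(L)                                  # recent reference samples
fxbuf = np.zeros(L)                                 # recent filtered-reference samples
cbuf = np.zeros(len(c))                             # x history for filtering through C
ybuf = np.zeros(len(c))                             # secondary-source samples in flight
mu = 1e-3
e2_first = e2_last = 0.0

for i in range(n):
    xbuf = np.roll(xbuf, 1); xbuf[0] = x[i]
    ybuf = np.roll(ybuf, 1); ybuf[0] = w @ xbuf     # drive the secondary source
    e = d[i] - c @ ybuf                             # error mic: noise minus anti-noise
    # Filtered-x step: pass x through (a model of) C before the LMS update;
    # plain LMS would mis-align the gradient because of the delay in C.
    cbuf = np.roll(cbuf, 1); cbuf[0] = x[i]
    fxbuf = np.roll(fxbuf, 1); fxbuf[0] = c @ cbuf
    w += mu * e * fxbuf
    if i < 4000: e2_first += e * e
    if i >= n - 4000: e2_last += e * e

print("residual noise down %.1f dB after adaptation"
      % (10 * np.log10(e2_first / e2_last)))

Note that the primary path here delays the noise by more samples than the secondary path does, so a causal W = H/C exists; this is the “x(t) reaches the ANC system earlier than e(t)” condition stated above.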
Summary and Acknowledgment

In this section, we have touched upon several topics in audio and electroacoustics. The reader may be reminded that the author’s choice of these topics was biased by his background in communication acoustics (and by his lack of knowledge in music). Furthermore, ongoing efforts in integrating different communication modalities into systems for teleconferencing [see, e.g., Flanagan et al., 1990] had a profound effect in focusing this contribution. Experts in topics covered in this contribution, like Jont Allen, David Berkley, Gary Elko, Joe Hall, Jim Johnston, Mead Killion, Harry Levitt, Dennis Morgan, and, last but not least, Mohan Sondhi, are gratefully acknowledged for their patience and help.

Defining Terms

Audio: Science of processing signals that are within the frequency range of hearing, that is, roughly between 20 Hz and 20 kHz. Also the name for this kind of signal.

Critical bands: Broadly used to refer to psychoacoustic phenomena of limited frequency resolution in the cochlea. More specifically, the concept of critical bands evolved in experiments on the audibility of a tone in noise of varying bandwidth, centered around the frequency of the tone. Increasing the noise bandwidth beyond a certain critical value has little effect on the audibility of the tone.

Electroacoustics: Science of interfacing between acoustical waves and corresponding electrical signals. This includes the engineering of transducers (e.g., loudspeakers and microphones), but also parts of the psychology of hearing, following the notion that it is not necessary to present to the ear signal components that cannot be perceived.

Intelligibility maximization and loudness restoration: Two different objectives in fitting hearing aids. Maximizing intelligibility involves conducting laborious intelligibility tests. Loudness restoration involves measuring the mapping between a given sound level and its perceived loudness. Here, we assume that recreating the loudness a normal-hearing person would perceive is close to maximizing the intelligibility of speech.