Delp, E.J., Allebach, J., Bouman, C.A., Rajala, S.A., Bose, N.K., Sibul, L.H., Wolf, W., and Zhang, Y.-Q., “Multidimensional Signal Processing,” in The Electrical Engineering Handbook, ed. Richard C. Dorf, Boca Raton: CRC Press LLC, 2000.
17 Multidimensional Signal Processing

Edward J. Delp, Purdue University
Jan Allebach, Purdue University
Charles A. Bouman, Purdue University
Sarah A. Rajala, North Carolina State University
N. K. Bose, Pennsylvania State University
L. H. Sibul, Pennsylvania State University
Wayne Wolf, Princeton University
Ya-Qin Zhang, Microsoft Research, China

17.1 Digital Image Processing
Image Capture • Point Operations • Image Enhancement • Digital Image Compression • Reconstruction • Edge Detection • Analysis and Computer Vision

17.2 Video Signal Processing
Sampling • Quantization • Vector Quantization • Video Compression • Information-Preserving Coders • Predictive Coding • Motion-Compensated Predictive Coding • Transform Coding • Subband Coding • HDTV • Motion Estimation Techniques • Token Matching Methods • Image Quality and Visual Perception • Visual Perception

17.3 Sensor Array Processing
Spatial Arrays, Beamformers, and FIR Filters • Discrete Arrays for Beamforming • Discrete Arrays and Polynomials • Velocity Filtering

17.4 Video Processing Architectures
Computational Techniques • Heterogeneous Multiprocessors • Video Signal Processors • Instruction Set Extensions

17.5 MPEG-4 Based Multimedia Information System
MPEG-4 Multimedia System

17.1 Digital Image Processing

Edward J. Delp, Jan Allebach, and Charles A. Bouman

What is a digital image? What is digital image processing? Why does the use of computers to process pictures seem to be everywhere? The space program, robots, and even people with personal computers are using digital image processing techniques. In this section we shall describe what a digital image is, how one obtains digital images, what the problems with digital images are (they are not trouble-free), and finally how these images are used by computers. A discussion of processing the images is presented later in the section. At the end of this section is a bibliography of selected references on digital image processing.

The use of computers to process pictures is about 30 years old. While some work was done more than 50 years ago, the year 1960 is usually the accepted date when serious work was started in such areas as optical character recognition, image coding, and the space program. NASA's Ranger moon mission was one of the first programs to return digital images from space. The Jet Propulsion Laboratory (JPL) established one of the early general-purpose image processing facilities using second-generation computer technology.

The early attempts at digital image processing were hampered because of the relatively slow computers used, i.e., the IBM 7094, the fact that computer time itself was expensive, and that image digitizers had to be built by the research centers. It was not until the late 1960s that image processing hardware was generally available (although expensive). Today it is possible to put together a small laboratory system for less than $60,000; a system based on a popular home computer can be assembled for about $5,000. As the cost of computer hardware
decreases, more uses of digital image processing will appear in all facets of life. Some people have predicted that by the turn of the century at least 50% of the images we handle in our private and professional lives will have been processed on a computer.

Image Capture

A digital image is nothing more than a matrix of numbers. The question is how does this matrix represent a real image that one sees on a computer screen?

Like all imaging processes, whether they are analog or digital, one first starts with a sensor (or transducer) that converts the original imaging energy into an electrical signal. These sensors, for instance, could be the photomultiplier tubes used in an x-ray system that convert the x-ray energy into a known electrical voltage. The transducer system used in ultrasound imaging is an example where sound pressure is converted to electrical energy; a simple TV camera is perhaps the most ubiquitous example. An important fact to note is that the process of conversion from one energy form to an electrical signal is not necessarily a linear process. In other words, a proportional change in the input energy to the sensor will not always cause the same proportional change in the output electrical signal. In many cases calibration data are obtained in the laboratory so that the relationship between the input energy and output electrical signal is known. These data are necessary because some transducer performance characteristics change with age and other usage factors.

The sensor is not the only thing needed to form an image in an imaging system. The sensor must have some spatial extent before an image is formed. By spatial extent we mean that the sensor must not be a simple point source examining only one location of energy output. To explain this further, let us examine two types of imaging sensors used in imaging: a CCD video camera and the ultrasound transducer used in many medical imaging applications.

The CCD camera consists of an array of light sensors known as charge-coupled devices. The image is formed by examining the output of each sensor in a preset order for a finite time. The electronics of the system then forms an electrical signal which produces an image that is shown on a cathode-ray tube (CRT) display. The image is formed because there is an array of sensors, each one examining only one spatial location of the region to be sensed. The process of sampling the output of the sensor array in a particular order is known as scanning. Scanning is the typical method used to convert a two-dimensional energy signal or image to a one-dimensional electrical signal that can be handled by the computer. (An image can be thought of as an energy field with spatial extent.)

Another form of scanning is used in ultrasonic imaging. In this application there is only one sensor instead of an array of sensors. The ultrasound transducer is moved or steered (either mechanically or electrically) to various spatial locations on the patient's chest or stomach. As the sensor is moved to each location, the output electrical signal of the sensor is sampled and the electronics of the system then form a television-like signal which is displayed. Nearly all the transducers used in imaging form an image by either using an array of sensors or a single sensor that is moved to each spatial location. One immediately observes that both of the approaches discussed above are equivalent in that the energy is sensed at various spatial locations of the object to be imaged. This energy is then converted to an electrical signal by the transducer.
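As noted above, laboratory calibration data relate the input energy to the transducer's output voltage. A minimal sketch of applying such a calibration table, assuming NumPy; the (energy, voltage) pairs used here are hypothetical:

```python
import numpy as np

# Hypothetical calibration pairs measured in the laboratory:
# input energy (arbitrary units) vs. the voltage the transducer produced.
cal_energy  = np.array([0.0, 10.0, 20.0, 50.0, 110.0])
cal_voltage = np.array([0.02, 0.9, 1.6, 3.1, 4.8])   # note: not a linear response

def voltage_to_energy(v):
    """Map raw sensor voltages back to input energy by linear
    interpolation in the calibration table."""
    return np.interp(v, cal_voltage, cal_energy)

raw = np.array([0.5, 1.6, 2.0, 4.0])        # voltages read from the sensor
print(voltage_to_energy(raw))               # estimated input energies
```

Linear interpolation is only one choice; a fitted polynomial or spline would serve equally well if the sensor response is smooth.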
This energy is then converted to an electrical signal by the transducer. The image formation processes just described are classical analog image formation, with the distance between the sensor locations limiting the spatial resolution in the system. In the array sensors, resolution is determined by how close the sensors are located in the array. In the single-sensor approach, the spatial resolution is limited by how far the sensor is moved. In an actual system spatial resolution is also determined by the performance characteristics of the sensor. Here we are assuming for our purposes perfect sensors. In digital image formation one is concerned about two processes: spatial sampling and quantization. Sampling is quite similar to scanning in analog image formation. The second process is known as quantization or analog-to-digital conversion, whereby at each spatial location a number is assigned to the amount of energy the transducer observes at that location. This number is usually proportional to the electrical signal at the output of the transducer. The overall process of sampling and quantization is known as digitization. Sometimes the digitization process is just referred to as analog-to-digital conversion, or A/D conversion; however, the reader should remember that digitization also includes spatial sampling. The digital image formulation process is summarized in Fig. 17.1. The spatial sampling process can be considered as overlaying a grid on the object, with the sensor examining the energy output from each grid box
and converting it to an electrical signal. The quantization process then assigns a number to the electrical signal; the result, which is a matrix of numbers, is the digital representation of the image. Each spatial location in the image (or grid) to which a number is assigned is known as a picture element or pixel (or pel). The size of the sampling grid is usually given by the number of pixels on each side of the grid, e.g., 256 × 256, 512 × 512, 488 × 380.

FIGURE 17.1 Digital image formation: sampling and quantization.

The quantization process is necessary because all information to be processed using computers must be represented by numbers. The quantization process can be thought of as one where the input energy to the transducer is represented by a finite number of energy values. If the energy at a particular pixel location does not take on one of the finite energy values, it is assigned to the closest value. For instance, suppose that we assume a priori that only energy values of 10, 20, 50, and 110 will be represented (the units are of no concern in this example). Suppose at one pixel an energy of 23.5 was observed by the transducer. The A/D converter would then assign this pixel the energy value of 20 (the closest one). Notice that the quantization process makes mistakes; this error in assignment is known as quantization error or quantization noise.

In our example, each pixel is represented by one of four possible values. For ease of representation of the data, it would be simpler to assign to each pixel the index value 0, 1, 2, 3, instead of 10, 20, 50, 110. In fact, this is typically done by the quantization process. One needs a simple table to know that a pixel assigned the value 2 corresponds to an energy of 50. Also, the number of possible energy levels is typically some integer power of two to also aid in representation. This power is known as the number of bits needed to represent the energy of each pixel. In our example each pixel is represented by two bits.

One question that immediately arises is how accurate the digital representation of the image is when one compares the digital image with a corresponding analog image. It should first be pointed out that after the digital image is obtained one requires special hardware to convert the matrix of pixels back to an image that can be viewed on a CRT display. The process of converting the digital image back to an image that can be viewed is known as digital-to-analog conversion, or D/A conversion.
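A minimal sketch of this quantization step, assuming NumPy and reusing the four hypothetical energy levels from the example above:

```python
import numpy as np

# The four representable energy levels from the example (2 bits per pixel).
levels = np.array([10.0, 20.0, 50.0, 110.0])

def quantize(energy):
    """Assign each observed energy the index of the closest level."""
    energy = np.asarray(energy, dtype=float)
    # Distance from every pixel to every level; pick the nearest one.
    idx = np.abs(energy[..., np.newaxis] - levels).argmin(axis=-1)
    return idx                        # index values 0..3 (the stored pixel data)

observed = np.array([[23.5, 9.0], [70.0, 110.0]])   # hypothetical 2 x 2 image
idx = quantize(observed)
print(idx)                       # [[1 0] [2 3]]  -> 23.5 is assigned index 1
print(levels[idx])               # [[ 20.  10.] [ 50. 110.]]  reconstructed energies
print(observed - levels[idx])    # the quantization error
```

In a real digitizer this decision rule is carried out in hardware by the A/D converter; the sketch only mirrors the nearest-level assignment described in the text.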
The quality of representation of the image is determined by how close spatially the pixels are located and how many levels or numbers are used in the quantization, i.e., how coarse or fine the quantization is. The sampling accuracy is usually measured in how many pixels there are in a given area and is cited in pixels/unit length, i.e., pixels/cm. This is known as the spatial sampling rate. One would desire to use the lowest rate possible to minimize the number of pixels needed to represent the object. If the sampling rate is too low, then obviously some details of the object to be imaged will not be represented very well. In fact, there is a mathematical theorem which determines the lowest sampling rate possible to preserve details in the object. This rate is known as the Nyquist sampling rate (named after the late Bell Laboratories engineer Harry Nyquist). The theorem states that the sampling rate must be twice the highest possible detail one expects to image in the object. If the object has details closer than, say, 1 mm, one must take at least 2 pixels/mm. (The Nyquist theorem actually says more than this, but a discussion of the entire theorem is beyond the scope of this section.) If we sample at a lower rate than the theoretical lowest limit, the resulting digital representation of the object will be distorted. This type of distortion or sampling error is known as aliasing error. Aliasing errors usually manifest themselves in the image as moiré patterns (Fig. 17.2). The important point to remember is that there is a lower limit to the spatial sampling rate such that object detail can be maintained.

FIGURE 17.2 This image shows the effects of aliasing due to sampling the image at too low a rate. The image should be straight lines converging at a point. Because of undersampling, it appears as if there are patterns in the lines at various angles. These are known as moiré patterns.

The sampling rate can also be stated as the total number of pixels needed to represent the digital image, i.e., the matrix size (or grid size). One often sees these sampling rates cited as 256 × 256, 512 × 512, and so on. If the same object is imaged with a larger matrix size, the sampling rate has obviously increased. Typically, images are sampled on 256 × 256, 512 × 512, or 1024 × 1024 grids, depending on the application and type of modality. One immediately observes an important issue in digital representation of images: that of the large number of pixels needed to represent the image. A 256 × 256 image has 65,536 pixels and a 512 × 512 image has 262,144 pixels! We shall return to this point later when we discuss processing or storage of these images.
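The Nyquist rule and the pixel counts quoted above amount to simple arithmetic; a quick check, assuming Python and the 1-mm detail spacing used in the example:

```python
# Nyquist: sample at least twice per finest detail.
finest_detail_mm = 1.0                    # smallest detail we expect to image
min_rate = 2.0 / finest_detail_mm         # required rate in pixels per mm
print(f"minimum sampling rate: {min_rate:.0f} pixels/mm")

# Pixel counts for the common grid sizes.
for n in (256, 512, 1024):
    print(f"{n} x {n} grid -> {n * n:,} pixels")
# 256 x 256 grid -> 65,536 pixels
# 512 x 512 grid -> 262,144 pixels
```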
The quality of the representation of the digital image is also determined by the number of levels or shades of gray that are used in the quantization. If one has more levels, then fewer mistakes will be made in assigning values at the output of the transducer. Figure 17.3 demonstrates how the number of gray levels affects the digital representation of an artery. When a small number of levels are used, the quantization is coarse and the quantization error is large. The quantization error usually manifests itself in the digital image by the appearance of false contouring in the picture. One usually needs at least 6 bits or 64 gray levels to represent an image adequately. Higher-quality imaging systems use 8 bits (256 levels) or even as many as 10 bits (1024 levels) per pixel. In most applications, the human observer cannot distinguish quantization error when there are more than 256 levels. (Many times the number of gray levels is cited in bytes. One byte is 8 bits, i.e., high-quality monochrome digital imaging systems use one byte per pixel.)

FIGURE 17.3 This image demonstrates the effects of quantization error. The upper left image is a coronary artery image with 8 bits (256 levels or shades of gray) per pixel. The upper right image has 4 bits/pixel (16 levels). The lower left image has 3 bits/pixel (8 levels). The lower right image has 2 bits/pixel (4 levels). Note the false contouring in the images as the number of possible levels in the pixel representation is reduced. This false contouring is the quantization error, and as the number of levels increases the quantization error decreases because fewer mistakes are being made in the representation.

One of the problems briefly mentioned previously is the large number of pixels needed to represent an image, which translates into a large amount of digital data needed for the representation. A 512 × 512 image with 8 bits/pixel (1 byte/pixel) of gray level representation requires 2,097,152 bits of computer data to describe it. A typical computer file that contains 1000 words usually requires only about 56,000 bits to describe it. The 512 × 512 image is 37 times larger! (A picture is truly worth more than 1000 words.) This data requirement is one of the major problems with digital imaging, given that the storage of digital images in a computer file system is expensive. Perhaps another example will demonstrate this problem. Many computers and word processing systems have the capability of transmitting information over telephone lines to other systems at data rates of 2400 bits per second. At this speed it would require nearly 15 minutes to transmit a 512 × 512 image! Moving objects are imaged digitally by taking digital snapshots of them, i.e., digital video. True digital imaging would acquire about 30 images/s to capture all the important motion in a scene. At 30 images/s, with each image sampled at 512 × 512 and with 8 bits/pixel, the system must handle 62,914,560 bits/s. Only very expensive acquisition systems are capable of handling these large data rates.

The greatest advantage of digital images is that they can be processed on a computer. Any type of operation that one can do on a computer can be done to a digital image. Recall that a digital image is just a (huge) matrix of numbers. Digital image processing is the process of using a computer to extract useful information from this matrix. Processing that cannot be done optically or with analog systems (such as early video systems) can be easily done on computers. The disadvantage is that a large amount of data needs to be processed and on some small computer systems this can take a long time (hours). We shall examine image processing in more detail in the next subsection and discuss some of the computer hardware issues in a later chapter.
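The data-volume figures quoted above follow directly from the image dimensions; a quick check in Python:

```python
# Storage for one 512 x 512, 8-bit image.
bits_per_image = 512 * 512 * 8
print(f"{bits_per_image:,} bits")                 # 2,097,152 bits

# Time to send it over a 2400 bit/s telephone-line link.
seconds = bits_per_image / 2400
print(f"{seconds / 60:.1f} minutes")              # about 14.6 minutes

# Data rate for digital video at 30 images/s.
bits_per_second = 30 * bits_per_image
print(f"{bits_per_second:,} bits/s")              # 62,914,560 bits/s
```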
Point Operations

Perhaps the simplest image processing operation is that of modifying the values of individual pixels in an image. These operations are commonly known as point operations. A point operation might be used to highlight certain regions in an image. Suppose one wished to know where all the pixels in a certain gray level region were spatially located in the image. One would modify all those pixel values to 0 (black) or 255 (white) such that the observer could see where they were located.

Another example of a point operation is contrast enhancement or contrast stretching. The pixel values in a particular image may occupy only a small region of the gray level distribution. For instance, the pixels in an image may only take on values between 0 and 63, when they could nominally take on values between 0 and 255. This is sometimes caused by the way the image was digitized and/or by the type of transducer used. When this image is examined on a CRT display the contrast looks washed out. A simple point operation that multiplies each pixel value in the image by four will increase the apparent contrast in the image; the new image now has gray values between 0 and 252. This operation is shown in Fig. 17.4. Possibly the most widely used point operation in medical imaging is pseudo-coloring. In this point operation all the pixels in the image with a particular gray value are assigned a color. Various schemes have been proposed for appropriate pseudo-color tables that assign the gray values to colors. It should be mentioned that point operations are often cascaded, i.e., an image undergoes contrast enhancement and then pseudo-coloring.

FIGURE 17.4 Contrast stretching. The image on the left has gray values between 0 and 63, causing the contrast to look washed out. The image on the right has been contrast enhanced by multiplying the gray levels by four.

The operations described above can be thought of as operations (or algorithms) that modify the range of the gray levels of the pixels. An important feature that describes a great deal about an image is the histogram of the pixel values. A histogram is a table that lists how many pixels in an image take on a particular gray value. These data are often plotted as a function of the gray value. Point operations are also known as histogram modification or histogram stretching. The contrast enhancement operation shown in Fig. 17.4 modifies the histogram of the resultant image by stretching the gray values from a range of 0–63 to a range of 0–252. Some point operations are such that the resulting histogram of the processed image has a particular shape. A popular form of histogram modification is known as histogram equalization, whereby the pixels are modified such that the histogram of the processed image is almost flat, i.e., all the pixel values occur equally often.

It is impossible to list all possible types of point operations; however, the important thing to remember is that these operations process one pixel at a time, modifying the pixel based only on its gray level value and not on where it is located spatially (i.e., its location in the pixel matrix). These operations are performed to enhance the image, make it easier to see certain structures or regions in the image, or to force a particular shape to the histogram of the image. They are also used as initial operations in a more complicated image processing algorithm.
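A minimal sketch of the two point operations just described, contrast stretching and histogram equalization, assuming NumPy and an 8-bit grayscale image stored as a 2-D array (the test image below is hypothetical):

```python
import numpy as np

def contrast_stretch(img, gain=4):
    """Multiply each pixel by a constant, clipping to the 8-bit range.
    With gain=4, gray values 0..63 are mapped to 0..252."""
    return np.clip(img.astype(np.int32) * gain, 0, 255).astype(np.uint8)

def histogram_equalize(img):
    """Remap gray levels so the histogram of the result is roughly flat."""
    hist = np.bincount(img.ravel(), minlength=256)     # histogram table
    cdf = hist.cumsum() / img.size                     # cumulative distribution
    lut = np.round(255 * cdf).astype(np.uint8)         # 256-entry lookup table
    return lut[img]                                    # apply as a point operation

# Hypothetical low-contrast image with values only in 0..63.
rng = np.random.default_rng(0)
low_contrast = rng.integers(0, 64, size=(512, 512), dtype=np.uint8)
stretched = contrast_stretch(low_contrast)
equalized = histogram_equalize(low_contrast)
print(stretched.max(), equalized.min(), equalized.max())
```

Both routines touch each pixel independently of its neighbors, which is exactly what distinguishes a point operation from the window operations of the next subsection.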
Image Enhancement

Image enhancement is the use of image processing algorithms to remove certain types of distortion in an image. The image is enhanced by removing noise, making the edge structures in the image stand out, or any other operation that makes the image look better.¹ Point operations discussed above are generally considered to be enhancement operations.

¹Image enhancement is often confused with image restoration. Image enhancement is the ad hoc application of various processing algorithms to enhance the appearance of the image. Image restoration is the application of algorithms that use knowledge of the degradation process to enhance or restore the image, i.e., deconvolution algorithms used to remove the effect of the aperture point spread function in blurred images. A discussion of image restoration is beyond the scope of this section.

Enhancement also includes operations that use groups of pixels and the spatial location of the pixels in the image. The most widely used algorithms for enhancement are based on pixel functions that are known as window operations. A window operation performed on an image is nothing more than the process of examining the pixels in a certain region of the image, called the window region, and computing some type of mathematical function derived from the pixels in the window. In most cases the windows are square or rectangular, although other shapes have been used. After the operation is performed, the result of the computation is placed in the center pixel of the window. Consider the case where a 3 × 3 pixel window has been extracted from the image. The values of the pixels in the window, labeled a1, a2, ..., a9, are used to compute a new pixel value which replaces the value of a5, and the window is moved to a new center location until all the pixels in the original image have been processed. As an example of a window operation, suppose we computed the average value of the pixels in the window. This operation is known as smoothing and will tend to reduce noise in the image, but unfortunately it will also tend to blur edge structures in the image.

Another window operation often used is the computation of a linear weighted sum of the pixel values. Let a′5 be the new pixel value that will replace a5 in the original image. We then form

    a′5 = Σ_{i=1}^{9} αi ai        (17.1)

where the αi's are any real numbers. For the simple smoothing operation described above we set αi = 1/9 for all i. By changing the values of the αi weights, one can perform different types of enhancement operations on an image. Any window operation that can be described by Eq. (17.1) is known as a linear window operation or convolution operator. If some of the αi coefficients take on negative values, one can enhance the appearance of edge structures in the image.

It is also possible to compute a nonlinear function of the pixels in the window. One of the more powerful nonlinear window operations is median filtering. In this operation all the pixels in the window are sorted by magnitude and the middle, or median, pixel is obtained. The median pixel is then used to replace a5. The median filter is used to remove noise from an image and at the same time preserve the edge structure in the image. More recently there has been a great deal of interest in morphological operators. These are also nonlinear window operations that can be used to extract or enhance shape information in an image.

In the preceding discussion, all of the window operations were described on 3 × 3 windows. The current research in window operations is directed at using larger window sizes, i.e., 9 × 9, 13 × 13, or 21 × 21. The philosophy in this work is that small window sizes only use local information and what one really needs to use is information that is more global in nature.
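A minimal sketch of the window operations above, the linear operation of Eq. (17.1) and the 3 × 3 median filter, assuming NumPy (edge pixels are simply left unchanged here):

```python
import numpy as np

def linear_window_op(img, weights):
    """Apply Eq. (17.1): replace each interior pixel by a weighted sum of
    its 3 x 3 neighborhood. weights is a 3 x 3 array of the alpha_i."""
    out = img.astype(float)
    for r in range(1, img.shape[0] - 1):
        for c in range(1, img.shape[1] - 1):
            window = img[r - 1:r + 2, c - 1:c + 2].astype(float)
            out[r, c] = np.sum(weights * window)
    return out

def median_filter(img):
    """Replace each interior pixel by the median of its 3 x 3 window."""
    out = img.copy()
    for r in range(1, img.shape[0] - 1):
        for c in range(1, img.shape[1] - 1):
            out[r, c] = np.median(img[r - 1:r + 2, c - 1:c + 2])
    return out

smoothing_weights = np.full((3, 3), 1.0 / 9.0)   # alpha_i = 1/9: simple smoothing

# Example on a small hypothetical test image.
rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(8, 8)).astype(np.uint8)
smoothed = linear_window_op(img, smoothing_weights)
denoised = median_filter(img)
```

In practice one would call a library routine (e.g., scipy.ndimage.convolve or scipy.ndimage.median_filter) rather than write explicit loops; the loops are kept here only to mirror the window-by-window description in the text.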
Each such 1 Image enhancement is often confused with image restoration. Image enhancement is the ad hoc application of various processing algorithms to enhance the appearance of the image. Image restoration is the application of algorithms that use knowledge of the degradation process to enhance or restore the image, i.e., deconvolution algorithms used to remove the effect of the aperture point spread function in blurred images. A discussion of image restoration is beyond the scope of this section. ¢ = = a a  i i i 5 1 9 a
number is the sampled value of the image at a pixel (picture element) location. These numbers are represented with finite precision using a fixed number of bits. Until recently, the dominant image size was 512 × 512 pixels with 8 bits or 1 byte per pixel. The total storage size for such an image is 512² ≈ 0.25 × 10⁶ bytes, or 0.25 Mbytes. When digital image processing first emerged in the 1960s, this was considered to be a formidable amount of data, and so interest in developing ways to reduce this storage requirement arose immediately. Since that time, image compression has continued to be an active area of research. The recent emergence of standards for image coding algorithms and the commercial availability of very large scale integration (VLSI) chips that implement image coding algorithms is indicative of the present maturity of the field, although research activity continues apace.

With declining memory costs and increasing transmission bandwidths, 0.25 Mbytes is no longer considered to be the large amount of data that it once was. This might suggest that the need for image compression is not as great as previously. Unfortunately (or fortunately, depending on one's point of view), this is not the case because our appetite for image data has also grown enormously over the years. The old 512 × 512 pixels × 1 byte per pixel "standard" was a consequence of the spatial and gray scale resolution of sensors and displays that were commonly available until recently. At this time, displays with more than 10³ × 10³ pixels and 24 bits/pixel to allow full color representation (8 bits each for red, green, and blue) are becoming commonplace. Thus, our 0.25-Mbyte standard image size has grown to 3 Mbytes. This is just the tip of the iceberg, however. For example, in desktop printing applications, a 4-color (cyan, magenta, yellow, and black) image of an 8.5 × 11 in. page sampled at 600 dots per in. requires 134 Mbytes. In remote sensing applications, a typical hyperspectral image contains terrain irradiance measurements in each of 200 10-nm-wide spectral bands at 2.5-m intervals on the ground. Each measurement is recorded with 12-bit precision. Such data are acquired from aircraft or satellite and are used in agriculture, forestry, and other fields concerned with management of natural resources. Storage of these data from just a 10 × 10 km² area requires 4800 Mbytes.
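The storage figures quoted above follow from the stated resolutions and precisions; a quick check, assuming Python, that Mbytes here means 10⁶ bytes, and 1 byte per colorant in the printing example:

```python
MB = 10**6   # the text counts megabytes in powers of ten

# 512 x 512, 8-bit monochrome image.
print(512 * 512 / MB, "Mbytes")                        # ~0.26, i.e., about 0.25 Mbytes

# ~1000 x 1000 display, 24 bits (3 bytes) per pixel, full color.
print(1000 * 1000 * 3 / MB, "Mbytes")                  # 3.0 Mbytes

# 8.5 x 11 in. page, 600 dots/in., 4 colorants, 1 byte each.
print(int(8.5 * 600) * (11 * 600) * 4 / MB, "Mbytes")  # about 134.6 Mbytes

# Hyperspectral: 10 x 10 km at 2.5-m spacing, 200 bands, 12 bits per sample.
pixels_per_side = 10_000 / 2.5
bits = pixels_per_side**2 * 200 * 12
print(bits / 8 / MB, "Mbytes")                         # 4800.0 Mbytes
```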
Figure 17.5 shows the essential components of an image compression system. At the system input, the image is encoded into its compressed form by the image coder. The compressed image may then be subjected to further digital processing, such as error control coding, encryption, or multiplexing with other data sources, before being used to modulate the analog signal that is actually transmitted through the channel or stored in a storage medium. At the system output, the image is processed step by step to undo each of the operations that was performed on it at the system input. At the final step, the image is decoded into its original uncompressed form by the image decoder. Because of the role of the image encoder and decoder in an image compression system, image coding is often used as a synonym for image compression. If the reconstructed image is identical to the original image, the compression is said to be lossless. Otherwise, it is lossy.

FIGURE 17.5 Overview of an image compression system.

Image compression algorithms depend for their success on two separate factors: redundancy and irrelevancy. Redundancy refers to the fact that each pixel in an image does not take on all possible values with equal probability, and the value that it does take on is not independent of that of the other pixels in the image. If this were not true, the image would appear as a white noise pattern such as that seen when a television receiver is tuned to an unused channel. From an information-theoretic point of view, such an image contains the maximum amount of information. From the point of view of a human or machine interpreter, however, it contains no information at all.
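Both aspects of redundancy, the nonuniform distribution of pixel values and the statistical dependence between neighboring pixels, can be illustrated with a small numerical experiment. The sketch below compares the first-order (histogram) entropy of the raw pixel values with that of the differences between horizontally adjacent pixels, for a white-noise image and for a slowly varying synthetic image that merely stands in for the smooth regions of a typical natural scene. For the noise image the differences are no cheaper to describe than the pixels themselves; for the redundant image they are far cheaper, which is exactly what predictive coders such as the DPCM scheme discussed later in this subsection exploit.

```python
import numpy as np

def first_order_entropy(values):
    """Entropy in bits/sample of the empirical histogram of integer-valued data."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(0)

# "Unused channel" picture: every gray level equally likely, pixels independent.
noise = rng.integers(0, 256, size=(512, 512)).astype(np.int16)

# A slowly varying synthetic image used only as a stand-in for a smooth scene.
x = np.linspace(0, 4 * np.pi, 512)
smooth = np.round(127.5 + 127.5 * np.outer(np.sin(x), np.cos(x))).astype(np.int16)

for name, img in (("white noise", noise), ("smooth image", smooth)):
    diff = np.diff(img, axis=1)  # horizontal prediction errors (previous-pixel predictor)
    print(f"{name}: {first_order_entropy(img):.2f} bits/pixel raw, "
          f"{first_order_entropy(diff):.2f} bits/pixel for pixel differences")
```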
Irrelevancy refers to the fact that not all the information in the image is required for its intended application. First, under typical viewing conditions, it is possible to remove some of the information in an image without producing a change that is perceptible to a human observer. This is because of the limited ability of the human viewer to detect small changes in luminance over a large area or larger changes in luminance over a very small area, especially in the presence of detail that may mask these changes. Second, even though some degradation in image quality may be observed as a result of image compression, the degradation may not be objectionable for a particular application, such as teleconferencing. Third, the degradation introduced by image compression may not interfere with the ability of a human or machine to extract the information from the image that is important for a particular application. Lossless compression algorithms can only exploit redundancy, whereas lossy methods may exploit both redundancy and irrelevancy.

A myriad of approaches have been proposed for image compression. To bring some semblance of order to the field, it is helpful to identify those key elements that provide a reasonably accurate description of most encoding algorithms. These are shown in Fig. 17.6.

FIGURE 17.6 Key elements of an image encoder.

The first step is feature extraction. Here the image is partitioned into N × N blocks of pixels. Within each block, a feature vector is computed which is used to represent all the pixels within that block. If the feature vector provides a complete description of the block, i.e., the block of pixel values can be determined exactly from the feature vector, then the feature is suitable for use in a lossless compression algorithm. Otherwise, the algorithm will be lossy. For the simplest feature vector, we let the block size N = 1 and take the pixel values to be the features. Another important example for N = 1 is to let the feature be the error in the prediction of the pixel value based on the values of neighboring pixels which have already been encoded and, hence, whose values would be known at the decoder. This feature forms the basis for predictive encoding, of which differential pulse-code modulation (DPCM) is a special case. For larger size blocks, the most important example is to compute a two-dimensional (2-D) Fourier-like transform of the block of pixels and to use the N² transform coefficients as the feature vector. The widely used Joint Photographic Experts Group (JPEG) standard image coder is based on the discrete cosine transform (DCT) with a block size of N = 8. In all of the foregoing examples, the block of pixel values can be reconstructed exactly from the feature vector. In the last example, the inverse DCT is used. Hence, all these features may form the basis for a lossless compression algorithm. A feature vector that does not provide a complete description of the pixel block is a vector consisting of the mean and variance of the pixels within the block and an N × N binary mask indicating whether or not each pixel exceeds the mean. From this vector, we can only reconstruct an approximation to the original pixel block which has the same mean and variance as the original. This feature is the basis for the lossy block truncation coding algorithm.
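The difference between a complete and an incomplete feature vector can be made concrete with a short sketch. The Python fragment below (an illustration only, not the JPEG or block truncation coding algorithms themselves) computes the 8 × 8 DCT of a block, which is exactly invertible, and then the mean/standard-deviation/mask feature, from which only a two-level approximation can be rebuilt; the output levels used are the usual block-truncation-coding choices, which preserve the block’s mean and variance.

```python
import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(1)
block = rng.integers(0, 256, size=(8, 8)).astype(float)  # one 8 x 8 block of pixel values

# Complete feature: the 64 DCT coefficients of the block. The inverse DCT
# recovers the pixels exactly; in JPEG the loss comes only from the later
# quantization of these coefficients.
coeffs = dctn(block, norm="ortho")
assert np.allclose(idctn(coeffs, norm="ortho"), block)

# Incomplete feature: block mean, standard deviation, and the binary mask of
# pixels exceeding the mean (the block truncation coding feature).
mean, std = block.mean(), block.std()
mask = block > mean
q, m = mask.sum(), block.size           # pixels above the mean, pixels in the block

# Two-level approximation whose output levels are chosen so that the block's
# mean and variance are preserved (assumes the block is not constant).
low = mean - std * np.sqrt(q / (m - q))
high = mean + std * np.sqrt((m - q) / q)
approx = np.where(mask, high, low)
assert np.isclose(approx.mean(), mean) and np.isclose(approx.std(), std)
```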
Ideally, the feature vector should be chosen to provide as nonredundant as possible a representation of the image and to separate those aspects of the image that are relevant to the viewer from those that are irrelevant.

The second step in image encoding is vector quantization. This is essentially a clustering step in which we partition the feature space into cells, each of which will be represented by a single prototype feature vector. Since all feature vectors belonging to a given cell are mapped to the same prototype, the quantization process is irreversible and, hence, cannot be used as part of a lossless compression algorithm. Figure 17.7 shows an example for a two-dimensional feature space. Each dot corresponds to one feature vector from the image. The X’s signify the prototypes, each of which represents all the feature vectors contained within its quantization cell, whose boundary is indicated by the dashed lines. Despite the simplicity with which vector quantization may be described, the implementation of a vector quantizer is a computationally complex task unless some structure is imposed on it. The clustering is based on minimizing the distortion between the original and quantized feature vectors, averaged over the entire image. The distortion measure can be chosen to account for the relative sensitivity of the human viewer to different kinds of degradation. In one dimension, the vector quantizer reduces to the Lloyd-Max scalar quantizer.
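A minimal way to see how the prototypes and quantization cells of Fig. 17.7 can be designed is the generalized Lloyd (k-means style) iteration sketched below, using a squared-error distortion averaged over all feature vectors. The function and variable names are illustrative, and a practical codebook design would impose the kind of structure mentioned above to keep the computation tractable.

```python
import numpy as np

def design_vq(features, n_prototypes=4, n_iters=20, seed=0):
    """Generalized Lloyd iteration for a vector quantizer (squared-error distortion).

    features: (M, d) array of feature vectors extracted from the image.
    Returns the prototypes and the index of the cell assigned to each feature vector.
    """
    rng = np.random.default_rng(seed)
    # Initialize the prototypes with randomly chosen feature vectors.
    prototypes = features[rng.choice(len(features), n_prototypes, replace=False)].astype(float)
    for _ in range(n_iters):
        # Quantization step: map each feature vector to its nearest prototype;
        # the set of vectors mapped to a prototype is its quantization cell.
        dist2 = ((features[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
        cells = dist2.argmin(axis=1)
        # Update step: move each prototype to the centroid of its cell, which
        # minimizes the average squared-error distortion for this assignment.
        for k in range(n_prototypes):
            if np.any(cells == k):
                prototypes[k] = features[cells == k].mean(axis=0)
    return prototypes, cells

# Example with two-dimensional feature vectors, as in Fig. 17.7.
rng = np.random.default_rng(1)
feats = rng.normal(size=(500, 2))
prototypes, cells = design_vq(feats)
```

With one-dimensional features, the same two conditions enforced by this iteration, nearest-prototype cells and centroid prototypes, are the optimality conditions satisfied by the Lloyd-Max scalar quantizer mentioned above.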