《医学影像信息学概论》课程参考资源：Geoff Dougherty：Pattern Recognition and Classification An Introduction.pdf_大学文库

Geoff Dougherty Applied Physics and Medical Imaging California State University, Channel Islands Camarillo, CA, USA Please note that additional material for this book can be downloaded from http://extras.springer.com ISBN 978-1-4614-5322-2 ISBN 978-1-4614-5323-9 (eBook) DOI 10.1007/978-1-4614-5323-9 Springer New York Heidelberg Dordrecht London Library of Congress Control Number: 2012949108 # Springer Science+Business Media New York 2013 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein. Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com)

Preface The use of patten recognition and classification is fundamental to many of the automated electronic systems in use today. Its applications range from military efense to medical diagnosis, from biometrics to machine learning, from bioinfor- matics to home entertainment, and more. However, despite the existence of a number of notable books in the field, the subject remains very challenging, espe We have found that the current textbooks are not completely satisfactory for our students, who are primarily computer science students but also include students from mathematics and physics backgrounds and those from industry. Their mathe matical and computer backgrounds are considerably varied, but they all want to understand and absorb the core concepts with a minimal time investment to the point where they can use and adapt them to problems in their own fields. Texts with extensive mathematical or statistical prerequisites were daunting and unappealing to them. Our students complained of"not seeing the wood for the trees, which is rather ironic for textbooks in pattern recognition. It is crucial for newcomers to the field to be introduced to the key concepts at a basic level in an ordered, logical fashion, so that they appreciate the"big picture"; they can then handle progres sively more detail, building on prior knowledge, without being overwhelmed. Too often our students have dipped into various textbooks to sample different approaches but have ended up confused by the different terminologies in use We have noticed that the majority of our students are very comfortable with and respond well to visual learning, building on their often limited entry knowledge, but focusing on key concepts illustrated by practical examples and exercises. We believe that a more visual presentation and the inclusion of worked examples promote a greater understanding and insight and appeal to a wider audience This book began as notes and lecture slides for a senior undergraduate course in Pattern Recognition at California State University Channel Islands(CSUCI). Over time it grew and approached its current form, which has been class tested over several years at CSUCI. It is suitable for a wide range of students at the advanced undergraduate or graduate level. It assumes only a modest

Preface The use of pattern recognition and classification is fundamental to many of the automated electronic systems in use today. Its applications range from military defense to medical diagnosis, from biometrics to machine learning, from bioinformatics to home entertainment, and more. However, despite the existence of a number of notable books in the field, the subject remains very challenging, especially for the beginner. We have found that the current textbooks are not completely satisfactory for our students, who are primarily computer science students but also include students from mathematics and physics backgrounds and those from industry. Their mathematical and computer backgrounds are considerably varied, but they all want to understand and absorb the core concepts with a minimal time investment to the point where they can use and adapt them to problems in their own fields. Texts with extensive mathematical or statistical prerequisites were daunting and unappealing to them. Our students complained of “not seeing the wood for the trees,” which is rather ironic for textbooks in pattern recognition. It is crucial for newcomers to the field to be introduced to the key concepts at a basic level in an ordered, logical fashion, so that they appreciate the “big picture”; they can then handle progressively more detail, building on prior knowledge, without being overwhelmed. Too often our students have dipped into various textbooks to sample different approaches but have ended up confused by the different terminologies in use. We have noticed that the majority of our students are very comfortable with and respond well to visual learning, building on their often limited entry knowledge, but focusing on key concepts illustrated by practical examples and exercises. We believe that a more visual presentation and the inclusion of worked examples promote a greater understanding and insight and appeal to a wider audience. This book began as notes and lecture slides for a senior undergraduate course and a graduate course in Pattern Recognition at California State University Channel Islands (CSUCI). Over time it grew and approached its current form, which has been class tested over several years at CSUCI. It is suitable for a wide range of students at the advanced undergraduate or graduate level. It assumes only a modest v

Preface background in statistics and mathematics, with the necessary additional material integrated into the text so that the book is essentially self-contained. The book is suitable both for individual study and for classroom use for students physics, computer science, computer engineering, electronic engineering medical engineering, and applied mathematics taking senior undergraduate and graduate courses in pattern recognition and machine learning. It presents a compre hensive introduction to the core concepts that must be understood in order to make independent contributions to the field. It is designed to be accessible to newcomers rom varied backgrounds, but it will also be useful to researchers and professionals n image and signal processing and analysis, and in computer vision. The goal is to present the fundamental concepts of supervised and unsupervised classification in an informal, rather than axiomatic, treatment so that the reader can quickly acquire the necessary background for applying the concepts to real problems. A final chapter indicates some useful and accessible projects which may be undertaken WeuseiMagej(http://rsbweb.nihgov/ij/)andtherelateddistributionFiji(http:// fiji. sc/wiki/index. php/ Fiji) in the early stages of image exploration and analysis, because of its intuitive interface and ease of use. We then tend to move on to MATLAB for its extensive capabilities in manipulating matrices and its image processing and statistics toolboxes. We recommend using an attractive GUI called Diplmage(fromhttp://www.diplib.org/download)toavoidmuchofthecommand line typing when manipulating images. There are also classification toolboxes availableforMatlab,suchasClassificationToolbox(http://www.wiley.com/ Wiley CDA/Section/id-105036. html) which requires a password obtainable from theassociatedcomputermanual)andPrtoOls(hTtp://www.prtools.org/download html). We use the Classification Toolbox in Chap. 8 and recommend it highly for its intuitive GUl. Some of our students have explored Weka, a collection of machine learning algorithms for solving data mining problems implemented in Java and open sourced(http://www.cs.waikatoac.nz/ml/weka/index_downloading.html There are a number of additional resources which can be downloaded from the companionWebsiteforthisbookathttp://extras.springercom/,includingseveral Useful Excel files and data files. Lecturers who adopt the book can also obtain access to the end-of-chapter exercises In spite of our best efforts at proofreading, it is still possible that some typos ma have survived. Please notify me if you find any I have very much enjoyed writing this book; I hope you enjoy reading it Camarillo. ca

background in statistics and mathematics, with the necessary additional material integrated into the text so that the book is essentially self-contained. The book is suitable both for individual study and for classroom use for students in physics, computer science, computer engineering, electronic engineering, biomedical engineering, and applied mathematics taking senior undergraduate and graduate courses in pattern recognition and machine learning. It presents a comprehensive introduction to the core concepts that must be understood in order to make independent contributions to the field. It is designed to be accessible to newcomers from varied backgrounds, but it will also be useful to researchers and professionals in image and signal processing and analysis, and in computer vision. The goal is to present the fundamental concepts of supervised and unsupervised classification in an informal, rather than axiomatic, treatment so that the reader can quickly acquire the necessary background for applying the concepts to real problems. A final chapter indicates some useful and accessible projects which may be undertaken. We use ImageJ (http://rsbweb.nih.gov/ij/) and the related distribution, Fiji (http:// fiji.sc/wiki/index.php/Fiji) in the early stages of image exploration and analysis, because of its intuitive interface and ease of use. We then tend to move on to MATLAB for its extensive capabilities in manipulating matrices and its image processing and statistics toolboxes. We recommend using an attractive GUI called DipImage (from http://www.diplib.org/download) to avoid much of the command line typing when manipulating images. There are also classification toolboxes available for MATLAB, such as Classification Toolbox (http://www.wiley.com/ WileyCDA/Section/id-105036.html) which requires a password obtainable from the associated computer manual) and PRTools (http://www.prtools.org/download. html). We use the Classification Toolbox in Chap. 8 and recommend it highly for its intuitive GUI. Some of our students have explored Weka, a collection of machine learning algorithms for solving data mining problems implemented in Java and open sourced (http://www.cs.waikato.ac.nz/ml/weka/index_downloading.html). There are a number of additional resources, which can be downloaded from the companion Web site for this book at http://extras.springer.com/, including several useful Excel files and data files. Lecturers who adopt the book can also obtain access to the end-of-chapter exercises. In spite of our best efforts at proofreading, it is still possible that some typos may have survived. Please notify me if you find any. I have very much enjoyed writing this book; I hope you enjoy reading it! Camarillo, CA Geoff Dougherty vi Preface

Contents 1 Introduction .......................................... 1 1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Classification . ..................................... 3 1.3 Organization of the Book . . ........................... 6 1.4 Exercises ......................................... 6 References ............................................ 7 2 Classification .......................................... 9 2.1 The Classification Process . ............................ 9 2.2 Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.3 Training and Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.4 Supervised Learning and Algorithm Selection . . . . . . . . . . . . . . 17 2.5 Approaches to Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 2.6 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.6.1 Classification by Shape . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.6.2 Classification by Size . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 2.6.3 More Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 2.6.4 Classification of Letters . . . . . . . . . . . . . . . . . . . . . . . . . 25 2.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 3 Nonmetric Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.2 Decision Tree Classifier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.2.1 Information, Entropy, and Impurity . . . . . . . . . . . . . . . . 29 3.2.2 Information Gain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 3.2.3 Decision Tree Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 3.2.4 Strengths and Weaknesses . . . . . . . . . . . . . . . . . . . . . . . 38 3.3 Rule-Based Classifier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 3.4 Other Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 3.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 ix

4 Statistical Pattern Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 4.1 Measured Data and Measurement Errors . . . . . . . . . . . . . . . . . . 43 4.2 Probability Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 4.2.1 Simple Probability Theory . . . . . . . . . . . . . . . . . . . . . . . 43 4.2.2 Conditional Probability and Bayes’ Rule . . . . . . . . . . . . . 46 4.2.3 Naı¨ve Bayes Classifier . . . . . . . . . . . . . . . . . . . . . . . . . . 53 4.3 Continuous Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . 54 4.3.1 The Multivariate Gaussian . . . . . . . . . . . . . . . . . . . . . . . 57 4.3.2 The Covariance Matrix . . . . . . . . . . . . . . . . . . . . . . . . . 59 4.3.3 The Mahalanobis Distance . . . . . . . . . . . . . . . . . . . . . . . 69 4.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 5 Supervised Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 5.1 Parametric and Non-parametric Learning . . . . . . . . . . . . . . . . . . 75 5.2 Parametric Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 5.2.1 Bayesian Decision Theory . . . . . . . . . . . . . . . . . . . . . . . 75 5.2.2 Discriminant Functions and Decision Boundaries . . . . . . 87 5.2.3 MAP (Maximum A Posteriori) Estimator . . . . . . . . . . . . 94 5.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 6 Nonparametric Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 6.1 Histogram Estimator and Parzen Windows . . . . . . . . . . . . . . . . . 99 6.2 k-Nearest Neighbor (k-NN) Classification . . . . . . . . . . . . . . . . . 100 6.3 Artificial Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 6.4 Kernel Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117 6.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 7 Feature Extraction and Selection . . . . . . . . . . . . . . . . . . . . . . . . . . 123 7.1 Reducing Dimensionality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 7.1.1 Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124 7.2 Feature Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124 7.2.1 Inter/Intraclass Distance . . . . . . . . . . . . . . . . . . . . . . . . . 124 7.2.2 Subset Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126 7.3 Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 7.3.1 Principal Component Analysis . . . . . . . . . . . . . . . . . . . . 127 7.3.2 Linear Discriminant Analysis . . . . . . . . . . . . . . . . . . . . . 135 7.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141 x Contents

8 Unsupervised Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143 8.1 Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143 8.2 k-Means Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145 8.2.1 Fuzzy c-Means Clustering . . . . . . . . . . . . . . . . . . . . . 148 8.3 (Agglomerative) Hierarchical Clustering . . . . . . . . . . . . . . . . . 150 8.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155 9 Estimating and Comparing Classifiers . . . . . . . . . . . . . . . . . . . . . . 157 9.1 Comparing Classifiers and the No Free Lunch Theorem . . . . . . 157 9.1.1 Bias and Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . 159 9.2 Cross-Validation and Resampling Methods . . . . . . . . . . . . . . . 160 9.2.1 The Holdout Method . . . . . . . . . . . . . . . . . . . . . . . . . 161 9.2.2 k-Fold Cross-Validation . . . . . . . . . . . . . . . . . . . . . . . 162 9.2.3 Bootstrap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163 9.3 Measuring Classifier Performance . . . . . . . . . . . . . . . . . . . . . . 164 9.4 Comparing Classifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169 9.4.1 ROC Curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169 9.4.2 McNemar’s Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169 9.4.3 Other Statistical Tests . . . . . . . . . . . . . . . . . . . . . . . . 169 9.4.4 The Classification Toolbox . . . . . . . . . . . . . . . . . . . . . 171 9.5 Combining Classifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176 10 Projects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 10.1 Retinal Tortuosity as an Indicator of Disease . . . . . . . . . . . . . . 177 10.2 Segmentation by Texture . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181 10.3 Biometric Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183 10.3.1 Fingerprint Recognition . . . . . . . . . . . . . . . . . . . . . . . 184 10.3.2 Face Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187 Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189 Contents xi

hapter 1 Introduction 1.1 Overview Humans are good at recognizing objects(or patterns, to use the generic term).We are so good that we take this ability for granted, and find it difficult to analyze the steps in the process. It is generally easy to distinguish the sound of a human voic from that of a violin; a handwritten numeral“3,”" from an“8”; and the aroma of a rose, from that of an onion. Every day, we recognize faces around us, but we do it unconsciously and because we cannot explain our expertise, we find it difficult to write a computer program to do the same. Each person's face is a pattern composed of a particular combination of structures (eyes, nose, mouth, .) located in certain positions on the face. By analyzing sample images of faces, a program should be able to capture the pattern specific to a face and identify(or recognize) it as a face(as a member of a category or class we already know ) this would be pattern recognition. There may be several categories(or classes)and we have to sort(or classify)a particular face into a certain category(or class); hence the term classification. Note that in pattern recognition, the term pattern is interpreted widely and does not necessarily imply a repetition; it is used to include all objects that we might want to classify, e. g, apples (or oranges), speech waveforms, and fingerprint a class is a collection of objects that are similar, but not necessarily identical, and which is distinguishable from other classes. Figure 1. 1 illustrates the difference between classification where the classes are known beforehand and classification where classes are created after inspecting the objects. Interest in pattern recognition and classification has grown due to emerging applications, which are not only challenging but also computatio eman These applications include · Data mining ifting through a large volume of data to extract a small amount of relevant and useful information, e.g, fraud detection, financial forecasting, and redit scoring) G. Dougherty, Pattern DOI 10.1007/978-1-4614-5323-9_1, o Springer Science+Business Media New York 2013

Chapter 1 Introduction 1.1 Overview Humans are good at recognizing objects (or patterns, to use the generic term). We are so good that we take this ability for granted, and find it difficult to analyze the steps in the process. It is generally easy to distinguish the sound of a human voice, from that of a violin; a handwritten numeral “3,” from an “8”; and the aroma of a rose, from that of an onion. Every day, we recognize faces around us, but we do it unconsciously and because we cannot explain our expertise, we find it difficult to write a computer program to do the same. Each person’s face is a pattern composed of a particular combination of structures (eyes, nose, mouth, ...) located in certain positions on the face. By analyzing sample images of faces, a program should be able to capture the pattern specific to a face and identify (or recognize) it as a face (as a member of a category or class we already know); this would be pattern recognition. There may be several categories (or classes) and we have to sort (or classify) a particular face into a certain category (or class); hence the term classification. Note that in pattern recognition, the term pattern is interpreted widely and does not necessarily imply a repetition; it is used to include all objects that we might want to classify, e.g., apples (or oranges), speech waveforms, and fingerprints. A class is a collection of objects that are similar, but not necessarily identical, and which is distinguishable from other classes. Figure 1.1 illustrates the difference between classification where the classes are known beforehand and classification where classes are created after inspecting the objects. Interest in pattern recognition and classification has grown due to emerging applications, which are not only challenging but also computationally demanding. These applications include: • Data mining (sifting through a large volume of data to extract a small amount of relevant and useful information, e.g., fraud detection, financial forecasting, and credit scoring) G. Dougherty, Pattern Recognition and Classification: An Introduction, DOI 10.1007/978-1-4614-5323-9_1, # Springer Science+Business Media New York 2013 1

A异Acat a B 6 AA1171111 B Category "B B凸B8 Fig. 1.1 Classification when the classes are(a) known and (b) unknown beforehand Biometrics(personal identification based on physical attributes of the face, iris, fingerprints, etc.) Machine vision(e. g, automated visual inspection in an assembly line) Character recognition [e. g, automatic mail sorting by zip code, automated check scanners at ATMs(automated teller machines) Document recognition(e.g, recognize whether an e-mail is spam or not, based on the message header and content) Computer-aided diagnosis [e. g, helping doctors make diagnostic decisions based on interpreting medical data such as mammographic images, ultrasound images, electrocardiograms(ECGs), and electroencephalograms(EEGs) Medical imaging [e.g, classifying cells as malignant or benign based on the results of magnetic resonance imaging(MRI)scans, or classify different emo- tional and cognitive states from the images of brain activity in functional MRI Speech recognition(e. g, helping handicapped patients to control machines) Bioinformatics(e. g, DNA sequence analysis to detect genes related to particul Remote sensing(e.g, land use and crop yield) Astronomy (classifying galaxies based on their shapes; or automated searches such as the Search for Extra-Terrestrial Intelligence (SETD) which analyzes radio telescope data in an attempt to locate signals that might be artificial i The methods used have been developed in various fields, often independently In statistics, going from particular observations to general descriptions is called nference, learning [i.e, using example(training) data] is called estimation, and classification is known as discriminant analysis(McLachlan 1992). In engineer g, classification is called pattern recognition and the approach is nonparametric and much more empirical (Duda et al. 2001). Other approaches have their origins n machine learning(Alpaydin 2010), artificial intelligence(Russell and Norvig 2002), artificial neural networks( Bishop 2006), and data mining(Han and Kamber 2006). We will incorporate techniques from these different emphases to give a more unified treatment( Fig. 1.2)

• Biometrics (personal identification based on physical attributes of the face, iris, fingerprints, etc.) • Machine vision (e.g., automated visual inspection in an assembly line) • Character recognition [e.g., automatic mail sorting by zip code, automated check scanners at ATMs (automated teller machines)] • Document recognition (e.g., recognize whether an e-mail is spam or not, based on the message header and content) • Computer-aided diagnosis [e.g., helping doctors make diagnostic decisions based on interpreting medical data such as mammographic images, ultrasound images, electrocardiograms (ECGs), and electroencephalograms (EEGs)] • Medical imaging [e.g., classifying cells as malignant or benign based on the results of magnetic resonance imaging (MRI) scans, or classify different emotional and cognitive states from the images of brain activity in functional MRI] • Speech recognition (e.g., helping handicapped patients to control machines) • Bioinformatics (e.g., DNA sequence analysis to detect genes related to particular diseases) • Remote sensing (e.g., land use and crop yield) • Astronomy (classifying galaxies based on their shapes; or automated searches such as the Search for Extra-Terrestrial Intelligence (SETI) which analyzes radio telescope data in an attempt to locate signals that might be artificial in origin) The methods used have been developed in various fields, often independently. In statistics, going from particular observations to general descriptions is called inference, learning [i.e., using example (training) data] is called estimation, and classification is known as discriminant analysis (McLachlan 1992). In engineering, classification is called pattern recognition and the approach is nonparametric and much more empirical (Duda et al. 2001). Other approaches have their origins in machine learning (Alpaydin 2010), artificial intelligence (Russell and Norvig 2002), artificial neural networks (Bishop 2006), and data mining (Han and Kamber 2006). We will incorporate techniques from these different emphases to give a more unified treatment (Fig. 1.2). Fig. 1.1 Classification when the classes are (a) known and (b) unknown beforehand 2 1 Introduction