
Chapman & Hall/CRC Computer Science & Data Analysis Series
Exploratory Multivariate Analysis by Example Using R
SECOND EDITION
François Husson
Sébastien Lê
Jérôme Pagès
CRC Press, Taylor & Francis Group
A CHAPMAN & HALL BOOK

Chapman & Hall/CRC Computer Science and Data Analysis Series

Exploratory Multivariate Analysis by Example Using R

François Husson
Sébastien Lê
Jérôme Pagès

Boca Raton London New York
CRC Press is an imprint of the Taylor & Francis Group, an informa business
A CHAPMAN & HALL BOOK

Chapman & Hall/CRC Computer Science and Data Analysis Series

The interface between the computer and statistical sciences is increasing, as each discipline seeks to harness the power and resources of the other. This series aims to foster the integration between the computer sciences and statistical, numerical, and probabilistic methods by publishing a broad range of reference works, textbooks, and handbooks.

SERIES EDITORS
David Blei, Princeton University
David Madigan, Rutgers University
Marina Meila, University of Washington
Fionn Murtagh, Royal Holloway, University of London

Proposals for the series should be sent directly to one of the series editors above, or submitted to:

Chapman & Hall/CRC
Taylor and Francis Group
3 Park Square, Milton Park
Abingdon, OX14 4RN, UK

Published Titles

Semisupervised Learning for Computational Linguistics
  Steven Abney
Visualization and Verbalization of Data
  Jörg Blasius and Michael Greenacre
Design and Modeling for Computer Experiments
  Kai-Tai Fang, Runze Li, and Agus Sudjianto
Microarray Image Analysis: An Algorithmic Approach
  Karl Fraser, Zidong Wang, and Xiaohui Liu
R Programming for Bioinformatics
  Robert Gentleman
Exploratory Multivariate Analysis by Example Using R
  François Husson, Sébastien Lê, and Jérôme Pagès
Bayesian Artificial Intelligence, Second Edition
  Kevin B. Korb and Ann E. Nicholson

Published Titles cont.

Computational Statistics Handbook with MATLAB®, Third Edition
  Wendy L. Martinez and Angel R. Martinez
Exploratory Data Analysis with MATLAB®, Third Edition
  Wendy L. Martinez, Angel R. Martinez, and Jeffrey L. Solka
Statistics in MATLAB®: A Primer
  Wendy L. Martinez and MoonJung Cho
Clustering for Data Mining: A Data Recovery Approach, Second Edition
  Boris Mirkin
Introduction to Machine Learning and Bioinformatics
  Sushmita Mitra, Sujay Datta, Theodore Perkins, and George Michailidis
Introduction to Data Technologies
  Paul Murrell
R Graphics
  Paul Murrell
Data Science Foundations: Geometry and Topology of Complex Hierarchic Systems and Big Data Analytics
  Fionn Murtagh
Correspondence Analysis and Data Coding with Java and R
  Fionn Murtagh
Pattern Recognition Algorithms for Data Mining
  Sankar K. Pal and Pabitra Mitra
Statistical Computing with R
  Maria L. Rizzo
Statistical Learning and Data Science
  Mireille Gettler Summa, Léon Bottou, Bernard Goldfarb, Fionn Murtagh, Catherine Pardoux, and Myriam Touati
Music Data Analysis: Foundations and Applications
  Claus Weihs, Dietmar Jannach, Igor Vatolkin, and Günter Rudolph
Foundations of Statistical Algorithms: With References to R Packages
  Claus Weihs, Olaf Mersmann, and Uwe Ligges

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2017 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed on acid-free paper
Version Date: 20170331

International Standard Book Number-13: 978-1-1381-9634-6 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com
and the CRC Press Web site at http://www.crcpress.com

Contents

Preface

1 Principal Component Analysis (PCA)
  1.1 Data — Notation — Examples
  1.2 Objectives
    1.2.1 Studying Individuals
    1.2.2 Studying Variables
    1.2.3 Relationships between the Two Studies
  1.3 Studying Individuals
    1.3.1 The Cloud of Individuals
    1.3.2 Fitting the Cloud of Individuals
      1.3.2.1 Best Plane Representation of N_I
      1.3.2.2 Sequence of Axes for Representing N_I
      1.3.2.3 How Are the Components Obtained?
      1.3.2.4 Example
    1.3.3 Representation of the Variables as an Aid for Interpreting the Cloud of Individuals
  1.4 Studying Variables
    1.4.1 The Cloud of Variables
    1.4.2 Fitting the Cloud of Variables
  1.5 Relationships between the Two Representations N_I and N_K
  1.6 Interpreting the Data
    1.6.1 Numerical Indicators
      1.6.1.1 Percentage of Inertia Associated with a Component
      1.6.1.2 Quality of Representation of an Individual or Variable
      1.6.1.3 Detecting Outliers
      1.6.1.4 Contribution of an Individual or Variable to the Construction of a Component
    1.6.2 Supplementary Elements
      1.6.2.1 Representing Supplementary Quantitative Variables
      1.6.2.2 Representing Supplementary Categorical Variables
      1.6.2.3 Representing Supplementary Individuals

    1.6.3 Automatic Description of the Components
  1.7 Implementation with FactoMineR
  1.8 Additional Results
    1.8.1 Testing the Significance of the Components
    1.8.2 Variables: Loadings versus Correlations
    1.8.3 Simultaneous Representation: Biplots
    1.8.4 Missing Values
    1.8.5 Large Datasets
    1.8.6 Varimax Rotation
  1.9 Example: The Decathlon Dataset
    1.9.1 Data Description — Issues
    1.9.2 Analysis Parameters
      1.9.2.1 Choice of Active Elements
      1.9.2.2 Should the Variables Be Standardised?
    1.9.3 Implementation of the Analysis
      1.9.3.1 Choosing the Number of Dimensions to Examine
      1.9.3.2 Studying the Cloud of Individuals
      1.9.3.3 Studying the Cloud of Variables
      1.9.3.4 Joint Analysis of the Cloud of Individuals and the Cloud of Variables
      1.9.3.5 Comments on the Data
  1.10 Example: The Temperature Dataset
    1.10.1 Data Description — Issues
    1.10.2 Analysis Parameters
      1.10.2.1 Choice of Active Elements
      1.10.2.2 Should the Variables Be Standardised?
    1.10.3 Implementation of the Analysis
  1.11 Example of Genomic Data: The Chicken Dataset
    1.11.1 Data Description — Issues
    1.11.2 Analysis Parameters
    1.11.3 Implementation of the Analysis

2 Correspondence Analysis (CA)
  2.1 Data — Notation — Examples
  2.2 Objectives and the Independence Model
    2.2.1 Objectives
    2.2.2 Independence Model and χ² Test
    2.2.3 The Independence Model and CA
  2.3 Fitting the Clouds
    2.3.1 Clouds of Row Profiles
    2.3.2 Clouds of Column Profiles
    2.3.3 Fitting Clouds N_I and N_J
    2.3.4 Example: Women’s Attitudes to Women’s Work in France in 1970

      2.3.4.1 Column Representation (Mother’s Activity)
      2.3.4.2 Row Representation (Partner’s Work)
    2.3.5 Superimposed Representation of Both Rows and Columns
  2.4 Interpreting the Data
    2.4.1 Inertias Associated with the Dimensions (Eigenvalues)
    2.4.2 Contribution of Points to a Dimension’s Inertia
    2.4.3 Representation Quality of Points on a Dimension or Plane
    2.4.4 Distance and Inertia in the Initial Space
  2.5 Supplementary Elements (= Illustrative)
  2.6 Implementation with FactoMineR
  2.7 CA and Textual Data Processing
  2.8 Example: The Olympic Games Dataset
    2.8.1 Data Description — Issues
    2.8.2 Implementation of the Analysis
      2.8.2.1 Choosing the Number of Dimensions to Examine
      2.8.2.2 Studying the Superimposed Representation
      2.8.2.3 Interpreting the Results
      2.8.2.4 Comments on the Data
  2.9 Example: The White Wines Dataset
    2.9.1 Data Description — Issues
    2.9.2 Margins
    2.9.3 Inertia
    2.9.4 Representation on the First Plane
  2.10 Example: The Causes of Mortality Dataset
    2.10.1 Data Description — Issues
    2.10.2 Margins
    2.10.3 Inertia
    2.10.4 First Dimension
    2.10.5 Plane 2-3
    2.10.6 Projecting the Supplementary Elements
    2.10.7 Conclusion

3 Multiple Correspondence Analysis (MCA)
  3.1 Data — Notation — Examples
  3.2 Objectives
    3.2.1 Studying Individuals
    3.2.2 Studying the Variables and Categories
  3.3 Defining Distances between Individuals and Distances between Categories
    3.3.1 Distances between the Individuals
    3.3.2 Distances between the Categories
  3.4 CA on the Indicator Matrix

    3.4.1 Relationship between MCA and CA
    3.4.2 The Cloud of Individuals
    3.4.3 The Cloud of Variables
    3.4.4 The Cloud of Categories
    3.4.5 Transition Relations
  3.5 Interpreting the Data
    3.5.1 Numerical Indicators
      3.5.1.1 Percentage of Inertia Associated with a Component
      3.5.1.2 Contribution and Representation Quality of an Individual or Category
    3.5.2 Supplementary Elements
    3.5.3 Automatic Description of the Components
  3.6 Implementation with FactoMineR
  3.7 Addendum
    3.7.1 Analysing a Survey
      3.7.1.1 Designing a Questionnaire: Choice of Format
      3.7.1.2 Accounting for Rare Categories
    3.7.2 Description of a Categorical Variable or a Subpopulation
      3.7.2.1 Description of a Categorical Variable by a Categorical Variable
      3.7.2.2 Description of a Subpopulation (or a Category) by a Quantitative Variable
      3.7.2.3 Description of a Subpopulation (or a Category) by the Categories of a Categorical Variable
    3.7.3 The Burt Table
    3.7.4 Missing Values
  3.8 Example: The Survey on the Perception of Genetically Modified Organisms
    3.8.1 Data Description — Issues
    3.8.2 Analysis Parameters and Implementation with FactoMineR
    3.8.3 Analysing the First Plane
    3.8.4 Projection of Supplementary Variables
    3.8.5 Conclusion
  3.9 Example: The Sorting Task Dataset
    3.9.1 Data Description — Issues
    3.9.2 Analysis Parameters
    3.9.3 Representation of Individuals on the First Plane
    3.9.4 Representation of Categories
    3.9.5 Representation of the Variables

4 Clustering
  4.1 Data — Issues
  4.2 Formalising the Notion of Similarity
    4.2.1 Similarity between Individuals
      4.2.1.1 Distances and Euclidean Distances
      4.2.1.2 Example of Non-Euclidean Distance
      4.2.1.3 Other Euclidean Distances
      4.2.1.4 Similarities and Dissimilarities
    4.2.2 Similarity between Groups of Individuals
  4.3 Constructing an Indexed Hierarchy
    4.3.1 Classic Agglomerative Algorithm
    4.3.2 Hierarchy and Partitions
  4.4 Ward’s Method
    4.4.1 Partition Quality
    4.4.2 Agglomeration According to Inertia
    4.4.3 Two Properties of the Agglomeration Criterion
    4.4.4 Analysing Hierarchies, Choosing Partitions
  4.5 Direct Search for Partitions: K-Means Algorithm
    4.5.1 Data — Issues
    4.5.2 Principle
    4.5.3 Methodology
  4.6 Partitioning and Hierarchical Clustering
    4.6.1 Consolidating Partitions
    4.6.2 Mixed Algorithm
  4.7 Clustering and Principal Component Methods
    4.7.1 Principal Component Methods Prior to AHC
    4.7.2 Simultaneous Analysis of a Principal Component Map and Hierarchy
  4.8 Clustering and Missing Data
  4.9 Example: The Temperature Dataset
    4.9.1 Data Description — Issues
    4.9.2 Analysis Parameters
    4.9.3 Implementation of the Analysis
  4.10 Example: The Tea Dataset
    4.10.1 Data Description — Issues
    4.10.2 Constructing the AHC
    4.10.3 Defining the Clusters
  4.11 Dividing Quantitative Variables into Classes

5 Visualisation
  5.1 Data — Issues
  5.2 Viewing PCA Data
    5.2.1 Selecting a Subset of Objects — Cloud of Individuals
    5.2.2 Selecting a Subset of Objects — Cloud of Variables
    5.2.3 Adding Supplementary Information