
Data Mining and Model Choice in Supervised Learning


Gilbert Saporta
Chaire de Statistique Appliquée & CEDRIC, CNAM
292 rue Saint Martin, F-75003 Paris
gilbert.saporta@cnam.fr
http://cedric.cnam.fr/~saporta

Beijing, 2008

Outline
1. What is data mining?
2. Association rule discovery
3. Statistical models
4. Predictive modelling
5. A scoring case study
6. Discussion

1. What is data mining?
◼ Data mining is a new field at the frontiers of statistics and information technologies (database management, artificial intelligence, machine learning, etc.) which aims at discovering structures and patterns in large data sets.

1.1 Definitions
◼ U.M. Fayyad, G. Piatetsky-Shapiro: “Data Mining is the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.”
◼ D.J. Hand: “I shall define Data Mining as the discovery of interesting, unexpected, or valuable structures in large data sets.”

◼ The metaphor of Data Mining means that there are treasures (or nuggets) hidden under mountains of data, which may be discovered by specific tools.
◼ Data Mining is concerned with data which were collected for another purpose: it is a secondary analysis of data bases that are collected Not Primarily For Analysis, but for the management of individual cases (Kardaun, T. Alanko, 1998).
◼ Data Mining is not concerned with efficient methods for collecting data such as surveys and experimental designs (Hand, 2000).

What is new? Is it a revolution?
◼ The idea of discovering facts from data is as old as Statistics, which “is the science of learning from data” (J. Kettenring, former ASA president).
◼ In the 1960s: Exploratory Data Analysis (Tukey, Benzécri, ...). « Data analysis is a tool for extracting the diamond of truth from the mud of data. » (J.P. Benzécri, 1973)

1.2 Data Mining started from:
◼ An evolution of DBMS towards Decision Support Systems using a Data Warehouse.
◼ Storage of huge data sets: credit card transactions, phone calls, supermarket bills. Gigabytes and terabytes of data are collected automatically.
◼ Marketing operations: CRM (customer relationship management).
◼ Research in Artificial Intelligence, machine learning, and KDD (Knowledge Discovery in Data Bases).

1.3 Goals and tools
◼ Data Mining is a « secondary analysis » of data collected for another purpose (e.g. management).
◼ Data Mining aims at finding structures of two kinds: models and patterns.

Patterns
◼ A pattern is a characteristic structure exhibited by a small number of points: for example, a small subgroup of customers with a high commercial value or, conversely, a high risk.
◼ Tools: cluster analysis; visualisation by dimension reduction (PCA, CA, etc.); association rules.
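Association rules, one of the pattern-finding tools listed above, are usually judged by two quantities: the support of an itemset (the share of transactions that contain it) and the confidence of a rule A -> C (the share of transactions containing A that also contain C). The sketch below is only an illustration of those two standard definitions; the transactions and the function names `support` and `confidence` are invented for this example and do not come from the slides.

```python
# Toy market-basket data: every transaction here is invented for illustration.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain the whole itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Among transactions containing the antecedent, the fraction that
    also contain the consequent."""
    joint = count(set(antecedent) | set(consequent), transactions)
    return joint / count(antecedent, transactions)


rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
print(f"support(bread, butter) = {rule_support:.2f}")
print(f"confidence(bread -> butter) = {rule_confidence:.2f}")
```

On real data the interesting rules are typically those that pass minimum thresholds on both measures; the thresholds themselves are a modelling choice and are not specified here.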

Models
◼ Building models is a major activity for statisticians, econometricians, and other scientists. A model is a global summary of relationships between variables, which both helps to understand phenomena and allows predictions.
◼ DM is not concerned with the estimation and testing of prespecified models, but with discovering models through an algorithmic search process exploring linear and non-linear models, explicit or not: neural networks, decision trees, Support Vector Machines, logistic regression, graphical models, etc.
◼ In DM, models do not come from a theory, but from data exploration.
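One simple way to picture "discovering models through an algorithmic search process" is to fit several candidate models on the same training data and keep the one that predicts best on held-out data. The sketch below is a minimal illustration under stated assumptions: the data are synthetic, the two candidates (a constant predictor and a least-squares line) are stand-ins for the richer families named above, and the 70/30 holdout split is just one possible validation scheme, not necessarily the one used in the talk.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Synthetic data (invented): y depends linearly on x, plus Gaussian noise.
xs = [i / 10 for i in range(100)]
data = [(x, 2.0 * x + 1.0 + random.gauss(0, 1)) for x in xs]
random.shuffle(data)
train, test = data[:70], data[70:]  # holdout split: 70 train, 30 test


def fit_constant(rows):
    """Candidate 1: always predict the training mean of y."""
    mean_y = sum(y for _, y in rows) / len(rows)
    return lambda x: mean_y


def fit_line(rows):
    """Candidate 2: ordinary least-squares line y = a + b * x."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return lambda x: a + b * x


def mse(model, rows):
    """Mean squared prediction error of `model` on `rows`."""
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)


# Search over the candidates: fit on train, score on the held-out test set.
candidates = {"constant": fit_constant, "line": fit_line}
scores = {name: mse(fit(train), test) for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print(scores, "->", best)
```

Scoring on data not used for fitting is what keeps the search from simply rewarding the most flexible candidate.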

Process or tools?
◼ DM often appears as a collection of tools, usually presented in one package, in such a way that several techniques may be compared on the same data set.
◼ But DM is a process, not only tools:
Data → (preprocessing) → Information → (analysis) → Knowledge
