Copyright © 2018 Pearson Education, Inc.

data cleaning: handle missing data, reduce noise, fix errors
data transformation: normalize the data, aggregate data, construct new attributes
data reduction: reduce number of attributes and records; balance skewed data

5. How does CRISP-DM differ from SEMMA?

The main difference between CRISP-DM and SEMMA is that CRISP-DM takes a more comprehensive approach to data mining projects, including understanding of the business and the relevant data, whereas SEMMA implicitly assumes that the data mining project's goals and objectives, along with the appropriate data sources, have already been identified and understood.

Section 4.5 Review Questions

1. Identify at least three of the main data mining methods.

Classification learns patterns from past data (a set of information, such as traits, variables, and features, on characteristics of the previously labeled items, objects, or events) in order to place new instances (with unknown labels) into their respective groups or classes. The objective of classification is to analyze the historical data stored in a database and automatically generate a model that can predict future behavior.

Cluster analysis is an exploratory data analysis tool for solving classification problems. The objective is to sort cases (e.g., people, things, events) into groups, or clusters, so that the degree of association is strong among members of the same cluster and weak among members of different clusters.

Association rule mining is a popular data mining method that is commonly used as an example to explain what data mining is and what it can do to a technologically less savvy audience. Association rule mining aims to find interesting relationships (affinities) between variables (items) in large databases.

2. Give examples of situations in which classification would be an appropriate data mining technique. Give examples of situations in which regression would be an appropriate data mining technique.

Students' answers will differ, but should be based on the following issues. Classification is for prediction that can be based on historical data and relationships, such as predicting the weather, product demand, or a student's success in a university. If what is being predicted is a class label (e.g., "sunny," "rainy," or "cloudy"), the prediction problem is called classification, whereas if it is a numeric value (e.g., a temperature such as 68°F), the prediction problem is called regression.
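
The distinction between a class label and a numeric value can be illustrated with a minimal Python sketch. It is not taken from the textbook: the weather records, the two input features (humidity and pressure), and the use of a simple k-nearest-neighbors rule are all assumptions chosen for illustration. The same historical data feeds both functions; only the type of the predicted output differs.

```python
from collections import Counter

# Hypothetical historical records, invented for illustration only:
# ((humidity %, pressure hPa), class label, temperature in °F)
history = [
    ((30.0, 1020.0), "sunny", 75.0),
    ((35.0, 1018.0), "sunny", 72.0),
    ((80.0, 1002.0), "rainy", 60.0),
    ((85.0, 1000.0), "rainy", 58.0),
    ((60.0, 1010.0), "cloudy", 66.0),
    ((65.0, 1009.0), "cloudy", 64.0),
]


def nearest(features, k=3):
    """Return the k historical records closest to `features`
    (squared Euclidean distance on the raw feature values)."""
    def dist(record):
        return sum((a - b) ** 2 for a, b in zip(record[0], features))
    return sorted(history, key=dist)[:k]


def classify(features, k=3):
    """Classification: predict a class label by majority vote of the neighbors."""
    labels = [label for _, label, _ in nearest(features, k)]
    return Counter(labels).most_common(1)[0][0]


def regress(features, k=3):
    """Regression: predict a numeric value as the mean of the neighbors' temperatures."""
    temps = [temp for _, _, temp in nearest(features, k)]
    return sum(temps) / len(temps)


new_day = (82.0, 1001.0)   # an unlabeled instance
print(classify(new_day))   # a label such as "sunny", "rainy", or "cloudy"
print(regress(new_day))    # a number of degrees Fahrenheit
```

In practice the features would normally be normalized first (as in the data transformation step listed earlier), since raw distances let the feature with the largest scale dominate.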