8 Copyright © 2018Pearson Educat_中国高校课件下载中心

点击下载：《商务智能：数据分析的管理视角 Business Intelligence, Analytics, and Data Science：A Managerial Perspective》教学资源（教师手册，原书第4版）04 Predictive Analytics I：Data Mining Process, Methods, and Algorithms

正在加载图片...

3. List and briefly define at least two classification techniques Decision tree analysis. Decision tree analysis(a machine-learning technique) is arguably the most popular classification technique in the data mining arena. Statistical analysis. Statistical classification techniques include logistic regression and discriminant analysis, both of which make the assumptions that the relationships between the input and output variables are linear in nature, the data is normally distributed, and the variables are not correlated and are independent of each other Case-based reasoning. This approach uses historical cases to recognize commonalities in order to assign a new case into the most probable category Bayesian classifiers. This approach uses probability theory to build classification models based on the past occurrences that are capable of placing a new instance into a most probable class(or category) Genetic algorithms. The use of the analogy of natural evolution to build directed search-based mechanisms to classify data samples Rough sets. This method takes into account the partial membership of class labels to predefined categories in build ing models(collection of rules) for classification problems What are some of the criteria for comparing and selecting the best classification technique? The amount and availability of historical data The types of data, categorical, interval, ration, etc What is being predicted--class or numeric value The purpose or objective 5. Briefly describe the general algorithm used in decision trees a general algorithm for build ing a decision tree is as follows 1. Create a root node and assign all of the training data to it 2. Select the best splitting attribute 3. Add a branch to the root node for each value of the split. Split the data into mutually exclusive(non-overlapping) subsets along the lines of the specif ic split and mode to the branches Copyright C2018 Pearson Education, Inc.8 Copyright © 2018Pearson Education, Inc. 3. List and briefly define at least two classification techniques. • Decision tree analysis. Decision tree analysis (a machine-learning technique) is arguably the most popular classification technique in the data mining arena. • Statistical analysis. Statistical classification techniques include logistic regression and discriminant analysis, both of which make the assumptions that the relationships between the input and output variables are linear in nature, the data is normally distributed, and the variables are not correlated and are independent of each other. • Case-based reasoning. This approach uses historical cases to recognize commonalities in order to assign a new case into the most probable category. • Bayesian classifiers. This approach uses probability theory to build classification models based on the past occurrences that are capable of placing a new instance into a most probable class (or category). • Genetic algorithms. The use of the analogy of natural evolution to build directed search-based mechanisms to classify data samples. • Rough sets. This method takes into account the partial membership of class labels to predefined categories in building models (collection of rules) for classification problems. 4. What are some of the criteria for comparing and selecting the best classification technique? • The amount and availability of historical data • The types of data, categorical, interval, ration, etc. • What is being predicted—class or numeric value • The purpose or objective 5. Briefly describe the general algorithm used in decision trees. A general algorithm for building a decision tree is as follows: 1. Create a root node and assign all of the training data to it. 2. Select the best splitting attribute. 3. Add a branch to the root node for each value of the split. Split the data into mutually exclusive (non-overlapping) subsets along the lines of the specific split and mode to the branches

<<向上翻页向下翻页>>