Copyright © 2018 Pearson Education, Inc.
3. List and briefly define at least two classification techniques.

• Decision tree analysis. Decision tree analysis (a machine-learning technique) is arguably the most popular classification technique in the data mining arena.
• Statistical analysis. Statistical classification techniques include logistic regression and discriminant analysis, both of which assume that the relationships between the input and output variables are linear in nature, that the data is normally distributed, and that the variables are not correlated and are independent of each other.
• Case-based reasoning. This approach uses historical cases to recognize commonalities in order to assign a new case to the most probable category.
• Bayesian classifiers. This approach uses probability theory to build classification models, based on past occurrences, that are capable of placing a new instance into the most probable class (or category).
• Genetic algorithms. This approach uses the analogy of natural evolution to build directed-search-based mechanisms to classify data samples.
• Rough sets. This method takes into account the partial membership of class labels in predefined categories when building models (collections of rules) for classification problems.

4. What are some of the criteria for comparing and selecting the best classification technique?

• The amount and availability of historical data
• The types of data (categorical, interval, ratio, etc.)
• What is being predicted: a class or a numeric value
• The purpose or objective

5. Briefly describe the general algorithm used in decision trees.

A general algorithm for building a decision tree is as follows:

   1. Create a root node and assign all of the training data to it.
   2. Select the best splitting attribute.
   3. Add a branch to the root node for each value of the split. Split the data into mutually exclusive (non-overlapping) subsets along the lines of the specific split and move the subsets to the branches.
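
The three steps listed for the decision tree algorithm can be illustrated with a minimal Python sketch. Several things here are assumptions and not part of the text: "best" splitting attribute is taken to mean highest information gain (entropy reduction), the steps are repeated recursively on each branch until a node is pure or no attributes remain, and the names `build_tree`, `classify`, and the small `TRAINING` data set are hypothetical and exist only for illustration.

```python
from collections import Counter
from math import log2


def entropy(rows, target):
    """Shannon entropy of the class labels found in rows."""
    counts = Counter(row[target] for row in rows)
    total = len(rows)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def partition(rows, attribute):
    """Split rows into mutually exclusive subsets, one per attribute value."""
    subsets = {}
    for row in rows:
        subsets.setdefault(row[attribute], []).append(row)
    return subsets


def best_attribute(rows, attributes, target):
    """Assumed criterion: pick the attribute with the highest information gain."""
    def gain(attribute):
        subsets = partition(rows, attribute).values()
        remainder = sum(len(s) / len(rows) * entropy(s, target) for s in subsets)
        return entropy(rows, target) - remainder
    return max(attributes, key=gain)


def build_tree(rows, attributes, target):
    # Step 1: this node holds all of the rows passed to it.
    labels = Counter(row[target] for row in rows)
    majority = labels.most_common(1)[0][0]
    # Assumed stopping rule: the node is pure or no attributes are left.
    if len(labels) == 1 or not attributes:
        return {"leaf": majority}
    # Step 2: select the best splitting attribute.
    attribute = best_attribute(rows, attributes, target)
    remaining = [a for a in attributes if a != attribute]
    # Step 3: one branch per value; each subset moves to its own branch.
    branches = {
        value: build_tree(subset, remaining, target)
        for value, subset in partition(rows, attribute).items()
    }
    return {"split_on": attribute, "branches": branches, "default": majority}


def classify(tree, row):
    """Follow branches until a leaf; fall back to the majority class if a value is unseen."""
    while "leaf" not in tree:
        value = row.get(tree["split_on"])
        if value not in tree["branches"]:
            return tree["default"]
        tree = tree["branches"][value]
    return tree["leaf"]


# Hypothetical training data: predict credit risk from two categorical inputs.
TRAINING = [
    {"income": "high", "owns_home": "yes", "risk": "low"},
    {"income": "high", "owns_home": "no", "risk": "low"},
    {"income": "low", "owns_home": "yes", "risk": "high"},
    {"income": "low", "owns_home": "no", "risk": "high"},
    {"income": "medium", "owns_home": "yes", "risk": "low"},
    {"income": "medium", "owns_home": "no", "risk": "high"},
]

tree = build_tree(TRAINING, ["income", "owns_home"], "risk")
print(tree["split_on"])
print(classify(tree, {"income": "medium", "owns_home": "no"}))
```

In this toy data set, `income` separates the classes far better than `owns_home`, so it is chosen at the root; only the `medium` branch needs a second split. Other splitting measures (for example, the Gini index) could be substituted in `best_attribute` without changing the overall structure.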