An Introduction to WEKA
Contributed by Yizhou Sun, 2008
Content
● What is WEKA?
● The Explorer:
    ● Preprocess data
    ● Classification
    ● Clustering
    ● Association Rules
    ● Attribute Selection
    ● Data Visualization
● References and Resources
1/29/2021
What is WEKA?
● Waikato Environment for Knowledge Analysis
● It’s a data mining/machine learning tool developed by the Department of Computer Science, University of Waikato, New Zealand.
● Weka is also a bird found only on the islands of New Zealand.
Download and Install WEKA
● Website: http://www.cs.waikato.ac.nz/~ml/weka/index.html
● Supports multiple platforms (written in Java): Windows, Mac OS X, and Linux
Main Features
● 49 data preprocessing tools
● 76 classification/regression algorithms
● 8 clustering algorithms
● 3 algorithms for finding association rules
● 15 attribute/subset evaluators + 10 search algorithms for feature selection
Main GUI
Three graphical user interfaces:
● “The Explorer” (exploratory data analysis)
● “The Experimenter” (experimental environment)
● “The KnowledgeFlow” (new process model inspired interface)
[Figure: WEKA launcher window, Version 3.4.12]
Explorer: pre-processing the data
● Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
● Data can also be read from a URL or from an SQL database (using JDBC)
● Pre-processing tools in WEKA are called “filters”
● WEKA contains filters for: discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
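To make the idea of a “filter” concrete, here is a minimal sketch in plain Python of two of the filter types listed above: normalization and discretization. It does not use WEKA and is not WEKA's implementation. The function names `normalize` and `discretize`, the min-max scaling, and the equal-width binning are assumptions chosen for illustration. The sample values are the cholesterol column from the heart-disease example, with `None` standing for a missing value.

```python
from typing import List, Optional

Column = List[Optional[float]]


def normalize(values: Column) -> Column:
    """Min-max scale a numeric column to [0, 1]; missing values stay None.

    Assumes at least one value is present.
    """
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    span = hi - lo
    return [
        None if v is None else (0.0 if span == 0 else (v - lo) / span)
        for v in values
    ]


def discretize(values: Column, bins: int = 3) -> List[Optional[int]]:
    """Equal-width binning: map each number to a bin index 0..bins-1.

    Assumes at least one value is present.
    """
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    width = (hi - lo) / bins
    result: List[Optional[int]] = []
    for v in values:
        if v is None:
            result.append(None)          # keep missing values missing
        elif width == 0:
            result.append(0)             # constant column: a single bin
        else:
            # The maximum lands exactly on the upper edge, so clamp it
            # into the last bin.
            result.append(min(int((v - lo) / width), bins - 1))
    return result


cholesterol: Column = [233.0, 286.0, 229.0, None]
scaled = normalize(cholesterol)
binned = discretize(cholesterol, bins=3)
print(scaled)
print(binned)  # [0, 2, 0, None]
```

Both functions take a column and return a new column of the same length, which is the general shape of a pre-processing step: data in, transformed data out, with missing values left for a later step to handle.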
WEKA only deals with “flat” files

    @relation heart-disease-simplified
    @attribute age numeric
    @attribute sex { female, male}
    @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
    @attribute cholesterol numeric
    @attribute exercise_induced_angina { no, yes}
    @attribute class { present, not_present}
    @data
    63,male,typ_angina,233,no,not_present
    67,male,asympt,286,yes,present
    67,male,asympt,229,yes,present
    38,female,non_anginal,?,no,not_present
    ...

Flat file in ARFF format
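The layout of the ARFF file above (a header of `@relation` and `@attribute` lines, then `@data` followed by comma-separated rows) can be read with a few lines of code. The following is a minimal sketch in plain Python, not WEKA's own loader. The function name `parse_arff` is hypothetical, and the sketch assumes only the features visible in the example: `numeric` and nominal attributes, unquoted values, and `?` for a missing value. Quoted values, string and date attributes, and sparse rows are not handled.

```python
from typing import List, Optional, Tuple, Union

# An attribute is (name, "numeric") or (name, [list of nominal labels]).
Attribute = Tuple[str, Union[str, List[str]]]


def parse_arff(text: str) -> Tuple[Optional[str], List[Attribute], List[List[str]]]:
    """Split ARFF text into relation name, attribute list, and raw data rows."""
    relation: Optional[str] = None
    attributes: List[Attribute] = []
    rows: List[List[str]] = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):   # skip blanks and % comments
            continue
        if in_data:
            rows.append([cell.strip() for cell in line.split(",")])
            continue
        keyword, _, rest = line.partition(" ")
        keyword = keyword.lower()
        if keyword == "@relation":
            relation = rest.strip()
        elif keyword == "@attribute":
            name, _, spec = rest.strip().partition(" ")
            spec = spec.strip()
            if spec.startswith("{"):
                # Nominal attribute: the labels are listed inside braces.
                labels = [s.strip() for s in spec.strip("{}").split(",")]
                attributes.append((name, labels))
            else:
                attributes.append((name, spec.lower()))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


sample = """\
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""

relation, attributes, rows = parse_arff(sample)
print(relation)
print(attributes[1])
print(rows[3])
```

Every row has exactly one value per declared attribute, which is what “flat” means here: a single table with no nesting or links to other tables.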
WEKA only deals with “flat” files: attribute types

    @attribute age numeric              <- numeric attribute
    @attribute sex { female, male}      <- nominal attribute
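The distinction between numeric and nominal attributes determines how each cell of a data row should be interpreted. The sketch below, in plain Python and independent of WEKA, converts one raw row according to a schema: numeric cells become floats, nominal cells are checked against the declared labels, and `?` becomes `None`. The function name `convert_row` and the schema representation (the string `"numeric"` or a list of labels) are assumptions made for this illustration.

```python
from typing import List, Optional, Union

# One schema entry per attribute: "numeric" or the list of allowed labels.
Schema = List[Union[str, List[str]]]
Value = Optional[Union[float, str]]


def convert_row(row: List[str], schema: Schema) -> List[Value]:
    """Interpret raw text cells according to each attribute's declared type."""
    if len(row) != len(schema):
        raise ValueError("row length does not match the number of attributes")
    converted: List[Value] = []
    for cell, kind in zip(row, schema):
        if cell == "?":
            converted.append(None)              # missing value
        elif kind == "numeric":
            converted.append(float(cell))       # numeric attribute
        else:
            if cell not in kind:                # nominal attribute
                raise ValueError(f"{cell!r} is not one of {kind}")
            converted.append(cell)
    return converted


schema: Schema = [
    "numeric",                                                   # age
    ["female", "male"],                                          # sex
    ["typ_angina", "asympt", "non_anginal", "atyp_angina"],      # chest_pain_type
    "numeric",                                                   # cholesterol
    ["no", "yes"],                                               # exercise_induced_angina
    ["present", "not_present"],                                  # class
]

first = convert_row(["63", "male", "typ_angina", "233", "no", "not_present"], schema)
last = convert_row(["38", "female", "non_anginal", "?", "no", "not_present"], schema)
print(first)
print(last)
```

A nominal value outside the declared set, such as `unknown` for `sex`, raises a `ValueError` in this sketch, since a nominal attribute can only take one of the labels listed in its declaration.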