Lecture 2 Raw Data Analysis and Pre-processing
Dr. 李晓瑜 (Xiaoyu Li)
Email: xiaoyuuestc@uestc.edu.cn
http://blog.sciencenet.cn/u/uestc2014xiaoyu
2019-Spring
SunData Group, http://www.sundatagroup.org/
School of Information and Software Engineering, UESTC
Copyright © 2019 by Xiaoyu Li.
Content (6H)
2.1 Overview of data types
2.2 Review of data pre-processing tools and platforms
2.3 Cleaning, storage and management of raw data
2.4 Collections of data analysis and data mining
Target
● Understand the workflow for cleaning and pre-processing raw data.
● Know some useful data processing tools and platforms.
Data Science Process
[Figure: data science process diagram. Stages shown: Raw Data Collected; Data Is Processed; Clean Dataset; Exploratory Data Analysis; Models and Algorithms; Communicate, Visualize, Report; Data Product; Make Decisions.]
2.1 Overview of data types
● What is data?
● What are data types?
  Date: 1980-1-1
  Time: 20:08:12
  Age: 65 years
  Colors: Red, Black, Blue, Green, White, …
  Name: Xiaoyu Li, …
  Symbols: %, &, #, *, …
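As a rough illustration of the examples above, the same values can be written in Python and asked for their types. This is only a sketch: the dictionary keys and the helper `type_names` are made up for this example, and choosing `date`, `time`, `int`, `str` and `list` is one reasonable mapping, not the only one.

```python
from datetime import date, time

# Values copied from the slide; the keys are illustrative names only.
examples = {
    "date": date(1980, 1, 1),
    "time": time(20, 8, 12),
    "age_years": 65,
    "colors": ["Red", "Black", "Blue", "Green", "White"],
    "name": "Xiaoyu Li",
    "symbols": ["%", "&", "#", "*"],
}


def type_names(values):
    """Return the Python type name chosen for each example value."""
    return {key: type(value).__name__ for key, value in values.items()}


for key, name in type_names(examples).items():
    print(f"{key}: {name}")
```

Note that "65 years" carries a unit that a plain `int` does not record; here the unit is kept in the key name.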
(1) Basic Terms
● Data: a set of values of qualitative or quantitative variables; restated, pieces of data are individual pieces of information.
● Dataset: a collection of data that lists values for each of the variables.
● Data object: a location in memory having a value and possibly referenced by an identifier.
● Points, vectors, patterns, samples, observations, …
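The terms above can be sketched in a few lines of Python. The rows, the variable names and the helper `column` are hypothetical and exist only for illustration: `color` stands in for a qualitative variable, `age` for a quantitative one, and `alias` shows a second identifier referring to the same data object in memory.

```python
# A tiny dataset: each row is one observation, each key is a variable.
# The rows are invented for illustration.
dataset = [
    {"name": "A", "age": 65, "color": "Red"},
    {"name": "B", "age": 30, "color": "Blue"},
]


def column(rows, variable):
    """List the values of one variable across all observations."""
    return [row[variable] for row in rows]


first = dataset[0]   # identifier referring to the first data object
alias = first        # a second identifier for the same object
alias["age"] = 66    # the change is visible through every identifier

print(column(dataset, "age"))
print(alias is first)
```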
(2) General definition of data types
In computer science and computer programming, a data type (or simply type) is a classification identifying one of various types of data, such as real, integer or Boolean, that determines:
● the possible values for that type;
● the operations that can be done on values of that type;
● the meaning of the data;
● the way values of that type can be stored.
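A minimal sketch of the four points in this definition, using Python's standard `struct` module. The function names are made up for the example, and the fixed-width little-endian formats (`<i`, `<d`, `<f`) are an assumption chosen to make storage sizes visible; Python's own `int` is not stored this way.

```python
import struct


def operations_depend_on_type():
    """The same operator does different things for different types."""
    return 2 + 3, "2" + "3"


def mixing_types_fails():
    """Adding an int to a str is not a defined operation."""
    try:
        2 + "3"
    except TypeError:
        return True
    return False


def storage_sizes():
    """Bytes used by a 32-bit integer and a 64-bit float."""
    return struct.calcsize("<i"), struct.calcsize("<d")


def same_bytes_two_meanings():
    """Store 1.0 as a 32-bit float, then read the same bytes as an int."""
    raw = struct.pack("<f", 1.0)
    return struct.unpack("<i", raw)[0]


print(operations_depend_on_type())
print(mixing_types_fails())
print(storage_sizes())
print(same_bytes_two_meanings())
```

The last function is the point about meaning: the four bytes are identical, and only the type says whether they are read as a float or as an integer.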
(3) Common data types
● Integers
● Booleans
● Characters
● Floating-point numbers
● Alphanumeric strings

Statistics                          Programming
real-valued (interval scale)        floating-point
real-valued (ratio scale)           floating-point
count data (usually non-negative)   integer
binary data                         Boolean
categorical data                    enumerated type
random vector                       list or array
random matrix                       two-dimensional array
random tree                         tree
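One possible way to express the statistics-to-programming correspondence in Python is sketched below. The field names, the values and the `Color` enumeration are invented for illustration; nested lists and dictionaries are used as stand-ins for a two-dimensional array and a tree, where a real project might prefer a dedicated library.

```python
from enum import Enum


class Color(Enum):
    """Enumerated type for a categorical variable (illustrative)."""
    RED = "Red"
    BLACK = "Black"
    BLUE = "Blue"


# One hypothetical observation, annotated with its statistical data type.
sample = {
    "temperature_c": 21.5,               # real-valued, interval scale -> float
    "weight_kg": 70.2,                   # real-valued, ratio scale -> float
    "visits": 3,                         # count data -> int
    "is_member": True,                   # binary data -> bool
    "favorite": Color.BLUE,              # categorical data -> enumerated type
    "vector": [0.1, 0.2, 0.3],           # random vector -> list
    "matrix": [[1.0, 0.0], [0.0, 1.0]],  # random matrix -> nested list
    "tree": {"root": {"left": {}, "right": {}}},  # random tree -> nested dict
}

for field, value in sample.items():
    print(f"{field}: {type(value).__name__}")
```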
(4) Classes of data types
● Primitive data types
● Composite types
● Other types
● Abstract data types
● Utility types
1) Primitive data types
● Character (character, char)
● Integer (integer, int, short, long, byte), with a variety of precisions
● Floating-point number (float, double, real, double precision)
● Fixed-point number (fixed), with a variety of precisions and a programmer-selected scale
● Boolean: the logical values true and false
● Reference (also called a pointer or handle): a small value referring to another object's address in memory, possibly a much larger one
More sophisticated types which can be built-in include:
● Tuple in ML, Python
● List in Lisp
● Complex number in Fortran, C (C99), Lisp, Python, Perl 6, D
● Rational number in Lisp, Perl 6
● Associative array in various guises, in Lisp, Perl, Python, Lua, D
● First-class function, closure, continuation in languages that support functional programming, such as Lisp, ML, Perl 6, D and C# 3.0
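Several of the more sophisticated types listed above can be tried directly in Python, as sketched below. The variable names and the `make_counter` helper are illustrative only. One caveat: Python's rational number, `fractions.Fraction`, comes from the standard library rather than being a built-in type, so it is shown here as an approximation of what Lisp and Perl 6 offer natively.

```python
from fractions import Fraction

point = (3, 4)                           # tuple
items = [1, 2, 3]                        # list
z = complex(1, 2) * complex(0, 1)        # complex number: (1+2j) * 1j
ratio = Fraction(1, 3) + Fraction(1, 6)  # exact rational arithmetic
ages = {"Xiaoyu Li": 65}                 # associative array (dict)


def make_counter():
    """Return a closure that remembers its own running count."""
    count = 0

    def increment():
        nonlocal count
        count += 1
        return count

    return increment


counter = make_counter()  # functions are first-class values
print(z, ratio, ages["Xiaoyu Li"], counter(), counter())
```

Each call to `make_counter` creates an independent `count`, which is what distinguishes a closure from a function that relies on a global variable.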