Chapter 6. Data Structures: Classifying the Various Types of Data Sets

Basic Terminology
◆ Data set
  ⚫ Measurements of items
    ◼ e.g., Yearly sales volume for your 23 salespeople
    ◼ e.g., Cost and number produced, daily, for the past month
◆ Elementary units
  ⚫ The items being measured
    ◼ e.g., Salespeople, Days, Companies, Catalogs, …
◆ Variable
  ⚫ The type of measurement being done
    ◼ e.g., Sales volume, Cost, Productivity, Number of defects, …

How Many Variables?
◆ Univariate data set: One variable measured for each elementary unit
  ⚫ e.g., Sales for the top 30 computer companies
  ⚫ Can do: typical-value summary, diversity, special features
◆ Bivariate data set: Two variables measured for each elementary unit
  ⚫ e.g., Sales and Employees for the top 30 computer firms
  ⚫ Can also do: relationship, prediction
◆ Multivariate data set: Three or more variables measured for each elementary unit
  ⚫ e.g., Sales, Employees, Inventories, Profits, …
  ⚫ Can also do: predict one variable from all the other variables
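
The classification above depends only on how many variables are recorded for each elementary unit. A minimal Python sketch of that rule follows; the function name `classify_by_variable_count` is hypothetical and introduced only for illustration.

```python
def classify_by_variable_count(variables):
    """Name the data-set type from the variables measured per elementary unit."""
    n = len(variables)
    if n < 1:
        raise ValueError("a data set needs at least one variable")
    if n == 1:
        return "univariate"
    if n == 2:
        return "bivariate"
    return "multivariate"  # three or more variables


# The three examples from the slide.
print(classify_by_variable_count(["Sales"]))
print(classify_by_variable_count(["Sales", "Employees"]))
print(classify_by_variable_count(["Sales", "Employees", "Inventories", "Profits"]))
```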

Quantitative or Qualitative (Categories)?
◆ Quantitative Variable: measured on a numerical scale
  ⚫ e.g., Sales, # Employees
  ⚫ Can add, rank, count
◆ Qualitative Variable: categorical (ordinal or nominal)
  ⚫ Ordinal Variable: Categories with a meaningful ordering
    ◼ e.g., Bond rating (AA, A, B, …), Diamonds (VSI, SI, …)
    ◼ Can rank, count
  ⚫ Nominal Variable: Categories without a meaningful ordering
    ◼ e.g., State, Type of business, Field of study
    ◼ Can count
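
The "can add / can rank / can count" distinction can be made concrete with a short Python sketch. All values below are invented for illustration, and the rating order in `RATING_ORDER` is an assumption (best first) based on the bond-rating example above.

```python
from collections import Counter
from statistics import mean

# Hypothetical values, invented for illustration only.
sales = [120, 95, 143]                  # quantitative: can add, rank, count
bond_ratings = ["A", "B", "AA", "A"]    # ordinal: can rank, count
states = ["Ohio", "Texas", "Ohio"]      # nominal: can count only

# Quantitative: arithmetic such as totals and averages is meaningful.
print(sum(sales), mean(sales))

# Ordinal: ranking works, but only with an explicit category order
# (assumed here: best rating first).
RATING_ORDER = ["AA", "A", "B"]
ranked = sorted(bond_ratings, key=RATING_ORDER.index)
print(ranked)
print(Counter(bond_ratings))

# Nominal: counting category frequencies is the only meaningful summary.
state_counts = Counter(states)
print(state_counts.most_common(1))
```

Sorting the ratings alphabetically would give the wrong rank order, which is why the ordering has to be supplied explicitly.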

Time-Series or Cross-Sectional?
◆ Time-Series Data: Data values recorded in a meaningful sequence, such as a stock market index
  ⚫ Elementary units might be days, quarters, or years
  ⚫ e.g., Daily Dow-Jones stock market average close for the past 90 days
  ⚫ e.g., Your firm’s quarterly sales over the past 5 years
◆ Cross-Sectional Data: No meaningful sequence
  ⚫ e.g., Sales of 30 companies
  ⚫ e.g., Productivity of each sales division
  ⚫ Easier to analyze than time-series data
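
One way to see why sequence matters: an order-free summary such as the mean is unchanged when the observations are reordered, while period-to-period changes are not. The sketch below uses invented quarterly figures, and `successive_changes` is a hypothetical helper name.

```python
from statistics import mean


def successive_changes(values):
    """Change from each observation to the next; meaningful only for ordered data."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Hypothetical quarterly sales in time order (invented for illustration).
quarterly_sales = [10, 12, 15, 11]
reordered = list(reversed(quarterly_sales))

# An order-free summary is unaffected by reordering...
print(mean(quarterly_sales) == mean(reordered))

# ...but the period-to-period changes depend on the sequence.
print(successive_changes(quarterly_sales))
print(successive_changes(reordered))
```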

Example-1

Firm     Sales     Industry Group        S&P Rating
IBM      66,346    Office Equipment      A
Exxon    59,023    Fuel                  A
GE       40,482    Conglomerates         A+
AT&T     34,357    Telecommunications    A

S&P: Standard & Poor’s

Example-1 (continued): Cross-Sectional Multivariate Data (3 variables)

Firm     Sales     Industry Group        S&P Rating
IBM      66,346    Office Equipment      A
Exxon    59,023    Fuel                  A
GE       40,482    Conglomerates         A+
AT&T     34,357    Telecommunications    A

⚫ Firm: elementary units
⚫ Sales: quantitative variable
⚫ Industry Group: nominal qualitative variable
⚫ S&P Rating: ordinal qualitative variable
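
As a rough illustration, the Example-1 table can be held as one record per elementary unit, with each variable summarized in the way its type allows. The dictionary keys and the `RATING_ORDER` list are assumptions made for this sketch; the order covers only the two rating levels that appear in the table.

```python
from collections import Counter
from statistics import mean

# Rows are elementary units (firms); keys are the three variables.
firms = [
    {"firm": "IBM", "sales": 66346, "industry": "Office Equipment", "rating": "A"},
    {"firm": "Exxon", "sales": 59023, "industry": "Fuel", "rating": "A"},
    {"firm": "GE", "sales": 40482, "industry": "Conglomerates", "rating": "A+"},
    {"firm": "AT&T", "sales": 34357, "industry": "Telecommunications", "rating": "A"},
]

# Quantitative variable: averaging is meaningful.
mean_sales = mean(row["sales"] for row in firms)

# Nominal variable: only counting is meaningful.
industry_counts = Counter(row["industry"] for row in firms)

# Ordinal variable: counting and ranking are meaningful
# (assumed order, best first, for the ratings shown in the table).
RATING_ORDER = ["A+", "A"]
rating_counts = Counter(row["rating"] for row in firms)
firms_by_rating = [
    row["firm"]
    for row in sorted(firms, key=lambda row: RATING_ORDER.index(row["rating"]))
]

print(mean_sales)
print(industry_counts)
print(rating_counts)
print(firms_by_rating)
```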

Example-2

Year    Small Business Administration Budget ($ Millions)
1991    464
1992    1,891
1993    1,177
1994    2,058
1995    798
1996    749

Example-2 (continued): Time-Series Data

Year    Small Business Administration Budget ($ Millions)
1991    464
1992    1,891
1993    1,177
1994    2,058
1995    798
1996    749

⚫ Year: defines the elementary units
⚫ Budget: quantitative data
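
Because the years form a meaningful sequence, year-over-year change is a natural quantity to compute for this example. A minimal sketch using the budget figures from the table; the variable names are chosen here for illustration.

```python
# Small Business Administration budget in $ millions, keyed by year (from the table).
sba_budget = {1991: 464, 1992: 1891, 1993: 1177, 1994: 2058, 1995: 798, 1996: 749}

# Process the years in time order, since the sequence carries the meaning.
years = sorted(sba_budget)
changes = {
    year: sba_budget[year] - sba_budget[previous]
    for previous, year in zip(years, years[1:])
}

for year, change in changes.items():
    print(f"{year}: {change:+,}")

# Year with the largest budget in the series.
peak_year = max(sba_budget, key=sba_budget.get)
print(peak_year)
```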