Chapter 7: Descriptive Statistics
I. Central Tendency
1. What is the most typical value?
◆ The Average: A typical value for quantitative data
◆ The Weighted Average: Adjusting for importance
◆ The Median: A typical value for quantitative and ordinal data
◆ The Mode: A typical value even for nominal data
2. What percentile is it?
◆ Extremes, Quartiles, and Box Plots
◆ The cumulative distribution function displays the percentiles
Average or Mean
◆ Add the data, divide by n or N (the number of elementary units)
⚫ Sample average: $\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$
⚫ Population average: $\mu = \frac{X_1 + X_2 + \cdots + X_N}{N}$
◆ Divides the total equally. The only such summary
◆ A representative, central number (if the data set is approximately normally distributed)
◆ Summation notation
⚫ $\Sigma$ is the capital Greek sigma
⚫ Sample average: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
⚫ Population average: $\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$
Example: Number of Defects
◆ Defects measured for each of 10 production lots
4, 1, 3, 7, 3, 0, 7, 14, 5, 9
◆ Average is 5.1 defects per lot
[Figure: histogram of defects per lot. Horizontal axis: Defects per lot (0 to 20). Vertical axis: Frequency (lots).]
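As a quick check of the arithmetic, here is a minimal Python sketch that applies the add-and-divide rule to the ten values listed for this example. The names `defects` and `average` are illustrative and do not come from the slides. Note that the values as transcribed sum to 53, which gives 5.3 rather than the quoted 5.1, so either one of the listed values or the quoted average may contain a transcription slip.

```python
# Defect counts for the 10 production lots, exactly as listed in the example.
defects = [4, 1, 3, 7, 3, 0, 7, 14, 5, 9]


def average(values):
    """Add the data, then divide by the number of elementary units."""
    return sum(values) / len(values)


total = sum(defects)             # sum of the listed values
mean_defects = average(defects)  # total divided by n = 10

print(f"total = {total}, n = {len(defects)}, average = {mean_defects}")
```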
Median
◆ Also summarizes the data
◆ The middle one: note that it is a measure of position!
⚫ Put the data in order (sort first)
⚫ Pick the middle one (or average the middle two if n is even)
⚫ Median(9, 4, 5) = Median(4, 5, 9) = 5
⚫ Median(9, 4, 5, 7) = Median(4, 5, 7, 9) = $\frac{5+7}{2}$ = 6
◆ Rank of the median is (1+n)/2
⚫ If n = 3, rank is (1+3)/2 = 2
⚫ If n = 4, rank is (1+4)/2 = 2.5 (so average the 2nd and 3rd)
⚫ If n = 262, rank is (1+262)/2 = 131.5 (so average the 131st and 132nd)
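The rank rule can be turned into code directly. The following is a minimal sketch, assuming plain Python lists as input; the function names `median_rank` and `median_by_rank` are made up for illustration. A whole-number rank picks a single value, while a rank ending in .5 averages the two neighbouring values.

```python
def median_rank(n):
    """Rank (1-based position) of the median among n sorted values."""
    return (1 + n) / 2


def median_by_rank(values):
    """Sort the data, then read off the value at rank (1 + n) / 2."""
    if not values:
        raise ValueError("median of empty data is undefined")
    ordered = sorted(values)          # put the data in order first
    rank = median_rank(len(ordered))
    lower = int(rank)                 # whole part of the rank
    if rank == lower:                 # n is odd: a single middle value
        return ordered[lower - 1]     # ranks are 1-based, indexes 0-based
    # n is even: average the two values on either side of the rank
    return (ordered[lower - 1] + ordered[lower]) / 2


print(median_by_rank([9, 4, 5]))      # middle of 4, 5, 9
print(median_by_rank([9, 4, 5, 7]))   # average of 5 and 7
print(median_rank(262))               # between the 131st and 132nd values
```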
Median (continued)
◆ A representative, central number
⚫ If the data set has a center
◆ Less sensitive to outliers than the average
◆ For skewed data, represents the "typical case" (the representative case, i.e., the majority) better than the average does
⚫ e.g., incomes
◼ Average income for a country equally divides the total, which may include some very high incomes
◼ Median income chooses the middle person (half earn less, half earn more), giving less influence to high incomes (if any)
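To see the difference in sensitivity, the sketch below adds a single very high income to a small data set and compares how far each summary moves. The income figures are hypothetical, invented purely for illustration, and the variable names are assumptions as well.

```python
from statistics import mean, median

# Hypothetical incomes in $thousands (made-up numbers, not from the slides).
incomes = [22, 25, 27, 30, 36]
with_outlier = incomes + [900]   # add one very high income

mean_before, median_before = mean(incomes), median(incomes)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print(f"average: {mean_before} -> {mean_after:.2f}")
print(f"median:  {median_before} -> {median_after}")
```

With these made-up numbers the average jumps from 28 to about 173.33, while the median only moves from 27 to 28.5.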
Example: Spending
◆ Customers plan to spend ($thousands)
3.8, 1.4, 0.3, 0.6, 2.8, 5.5, 0.9, 1.1
◆ Ranks, ordered from smallest to largest
Value: 0.3  0.6  0.9  1.1  1.4  2.8  3.8  5.5
Rank:  1    2    3    4    5    6    7    8
◆ Rank of median = (1+8)/2 = 4.5
◆ Median is (1.1+1.4)/2 = 1.25
⚫ Smaller than the average, 2.05
◼ Due to slight skewness?
Stem-and-leaf display (stem = $thousands, leaf = tenths), with the median and the average marked in the original figure:
0 | 3 6 9
1 | 1 4
2 | 8
3 | 8
4 |
5 | 5
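The figures in this example can be reproduced with Python's standard `statistics` module. This is a small sketch rather than part of the original slides; the variable names are illustrative, and results are rounded to two decimals to avoid floating-point noise.

```python
from statistics import mean, median

# Planned spending in $thousands, as listed in the example.
spending = [3.8, 1.4, 0.3, 0.6, 2.8, 5.5, 0.9, 1.1]

ordered = sorted(spending)                   # smallest to largest
ranked = list(enumerate(ordered, start=1))   # (rank, value) pairs
rank_of_median = (1 + len(ordered)) / 2      # falls between ranks 4 and 5

avg = mean(spending)
med = median(spending)                       # averages the 4th and 5th values

for rank, value in ranked:
    print(rank, value)
print("rank of median:", rank_of_median)
print("median:", round(med, 2), "average:", round(avg, 2))
```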
Example: The Crash of October 19, 1987
The market lost about 20% of its value in one day
◆ Dow-Jones Industrials, stock-price changes as each stock began trading that fateful morning
◆ Fairly normal (approximately normally distributed)
◆ Mean and median are similar
⚫ Average = -8.2%
⚫ Median = -8.6%
[Figure: histogram of stock-price changes. Horizontal axis: Percent change at opening (-20% to 0%). Vertical axis: Frequency.]
Example: Incomes (many small values, some moderate values, a few large and very large values)
◆ Personal income of 100 people
◆ Average is higher than median due to skewness
⚫ Average = $38,710
⚫ Median = $27,216
[Figure: histogram of personal income. Horizontal axis: Income ($0 to $200,000). Vertical axis: Frequency (0 to 50).]
Mode
◆ Also summarizes the data
◆ Most common data value
⚫ Middle of the tallest histogram bar
◆ Problems:
⚫ Depends on how you draw the histogram (bin width)
⚫ Might be more than one mode (two tallest bars)
◆ Good if most data values are "correct"
◆ Good for nominal data (e.g., elections)
[Figure: histogram sketches with the mode labeled.]
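For nominal data such as votes, the mode is simply the category that occurs most often. The sketch below uses Python's standard library on hypothetical ballots (the data and variable names are invented for illustration), and also shows the case where two values tie for most common.

```python
from collections import Counter
from statistics import multimode

# Hypothetical ballots for candidates A, B, and C (nominal data).
votes = ["A", "B", "A", "C", "B", "A"]

counts = Counter(votes)                           # tally each category
winner, winner_count = counts.most_common(1)[0]   # the modal category

# When two values tie, there is more than one mode.
# multimode (Python 3.8+) returns all of them in order of first appearance.
tied = multimode([1, 2, 2, 3, 3])

print(winner, winner_count)
print(tied)
```

Averages and medians are not defined for category labels like these, which is why the mode is the summary of choice for nominal data.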