Chapter 11: Nonparametric Statistics
Chapter Outline
• Testing with Rank Sum
• Z Test for Differences in Two Proportions (Independent Samples)
• χ² Test for Differences in Two Proportions (Independent Samples)
• χ² Test for Differences in c Proportions (Independent Samples)
• χ² Test of Independence
Common Nonparametric Methods
• Statistical procedures for hypothesis testing that do not require a normal distribution
  ⚫ Because they are based on counts or ranks
  ⚫ A random sample is still required
• The nonparametric approach based on counts
  ⚫ Count the number of times some event occurs
  ⚫ Use the binomial distribution to decide whether this count is reasonable or not under the null hypothesis
• The nonparametric approach based on ranks
  ⚫ Replace each data value with its rank (1, 2, 3, …)
  ⚫ Use formulas and tables created for testing ranks
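As a small illustration of the rank-based approach, the sketch below replaces each value with its rank. The data are hypothetical, and giving tied values the average of their ranks is an assumption (a common convention that the slides do not spell out).

```python
def average_ranks(values):
    """Return the rank of each value (1 = smallest).

    Assumption: tied values share the average of the ranks they occupy.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


data = [12.5, 3.1, 7.8, 3.1, 20.0]  # hypothetical values
print(average_ranks(data))  # [4.0, 1.5, 3.0, 1.5, 5.0]
```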
Parametric Methods and Efficiency
• Parametric methods
  ⚫ Statistical procedures that require a completely specified model
  ⚫ e.g., t tests, regression tests, F tests
• Efficiency
  ⚫ A measure of the effectiveness of a statistical test
  ⚫ Tells how well the test makes use of the information in the data
  ⚫ A more efficient test can achieve the same results with a smaller sample size
Advantages and Disadvantages
• Advantages of nonparametric testing
  ⚫ No need to assume normality
  ⚫ Avoids the problems of transformation (e.g., interpretation)
  ⚫ Can be used with ordinal data, because ranks can be found
  ⚫ Can be much more efficient than parametric methods when distributions are not normal
• Disadvantage of nonparametric testing
  ⚫ Less statistically efficient than parametric methods when distributions are normal
  ⚫ Often, this loss of efficiency is slight
Testing the Median
• Without assuming a normal distribution
• Note: the number of sample data values below a continuous population's median follows a binomial distribution where p = 0.5 and n is the sample size
• The Sign Test
  1. Find the modified sample size m, the number of data values different from the reference value θ0
  2. Find the limits in the table for this modified sample size
  3. Count how many data values fall below the reference value
  4. Significant if the count (step 3) is outside the limits (step 2)
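The sign test steps can be sketched in code. This version swaps the table lookup of step 2 for an exact two-sided binomial p-value, which is an assumption about how such a table is built rather than something stated on the slide. The function name `sign_test` and the sample data are hypothetical.

```python
from math import comb


def sign_test(values, theta0):
    """Two-sided sign test of H0: the population median equals theta0.

    Returns (m, below, p_value).
    """
    # Step 1: modified sample size m counts only values different from theta0.
    kept = [v for v in values if v != theta0]
    m = len(kept)
    # Step 3: count how many values fall below the reference value.
    below = sum(1 for v in kept if v < theta0)
    # Steps 2 and 4, done with the binomial(m, 0.5) distribution instead of a
    # table: double the probability of a count at least as extreme as observed.
    tail = min(below, m - below)
    one_sided = sum(comb(m, k) for k in range(tail + 1)) / 2 ** m
    return m, below, min(1.0, 2 * one_sided)


sample = [5, 7, 8, 9, 10, 11, 12, 13, 14, 6]  # hypothetical data
m, below, p_value = sign_test(sample, theta0=6)
print(m, below, p_value)  # 9 1 0.0390625
```

With a 5% level, a p-value below 0.05 plays the role of "count outside the limits" in step 4.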
Example: Family Income
• Comparing local to national family income
  ⚫ Survey median is $70,547, based on n = 25 families
  ⚫ National median is $27,735; this is the reference value θ0
• Performing the sign test
  ⚫ Modified sample size is m = 25, since all sampled families have incomes different from the reference value
  ⚫ Limits from the table are 8 and 17 (for testing at the 5% level with m = 25)
  ⚫ There are 6 families with income below the reference value
  ⚫ Since 6 falls outside the limits (from 8 to 17), median family income in the community is significantly higher than the national median
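The limits quoted in this example can be checked numerically. The sketch below assumes the table limits come from the two-sided exact binomial test with p = 0.5, which the slides do not state. The helper `sign_test_limits` is hypothetical, and only the counts from the example are used because the raw incomes are not given.

```python
from math import comb


def sign_test_limits(m, alpha=0.05):
    """Return (lower, upper); a count below lower or above upper is significant.

    Assumes limits based on the two-sided exact binomial(m, 0.5) test.
    """
    cumulative = 0
    for k in range(m + 1):
        cumulative += comb(m, k)
        # 2 * P(count <= k) is the two-sided p-value for a lower-tail count k.
        if 2 * cumulative / 2 ** m > alpha:
            # k is the smallest count that is NOT significant.
            return k, m - k
    raise ValueError("alpha must be below 2")


lower, upper = sign_test_limits(25)
count_below = 6  # families with income below the national median
significant = count_below < lower or count_below > upper
print(lower, upper, significant)  # 8 17 True
```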
Testing Paired Data: Sign Test for the Differences
• Two columns of data
• Reduce to a single column representing the differences (changes) between the two columns
  ⚫ A similar approach to the two-sample paired t test, Chapter 10
• Perform the sign test on these differences
  1. Find the modified sample size m, the number of data values that change between columns 1 and 2
  2. Find the limits in the table for this modified sample size
  3. Count how many data values went down
  4. Significant if the count (step 3) is outside the limits (step 2)
Sign Test for the Differences
• Hypotheses
  ⚫ H0: Probability of X < Y = Probability of X > Y
     That is, the probability of going up equals the probability of going down
  ⚫ H1: Probability of X < Y ≠ Probability of X > Y
     That is, the probabilities of going up and going down are unequal
• Assumption
  ⚫ The data set is a random sample from the population of interest, and each elementary unit in the sample has both values X and Y measured for it
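A minimal sketch of the sign test for differences follows, with hypothetical before/after data. As an assumption, it replaces the table limits with an exact two-sided binomial p-value. The function name `paired_sign_test` is illustrative only.

```python
from math import comb


def paired_sign_test(x, y):
    """Sign test on the differences y - x; returns (m, down, p_value)."""
    if len(x) != len(y):
        raise ValueError("each unit needs both an X and a Y value")
    diffs = [b - a for a, b in zip(x, y)]
    # Step 1: m counts only the units whose value changed between columns.
    changed = [d for d in diffs if d != 0]
    m = len(changed)
    # Step 3: count how many values went down.
    down = sum(1 for d in changed if d < 0)
    # Under H0, going up and going down are equally likely, so the count of
    # decreases is binomial(m, 0.5); double the smaller tail probability.
    tail = min(down, m - down)
    p_value = min(1.0, 2 * sum(comb(m, k) for k in range(tail + 1)) / 2 ** m)
    return m, down, p_value


before = [10, 12, 9, 14, 11, 13, 8, 15]   # hypothetical column X
after = [12, 15, 9, 16, 10, 17, 11, 18]   # hypothetical column Y
print(paired_sign_test(before, after))  # (7, 1, 0.125)
```

Here one unit is unchanged, so m = 7, and a single decrease out of seven changes is not significant at the 5% level.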
Testing Two Unpaired Samples
• The nonparametric test is based on the ranks of ALL of the data
  ⚫ Put both samples together to define overall ranks
• Three ways to obtain the same answer
  ⚫ The Wilcoxon rank-sum test
  ⚫ The Mann-Whitney U test
  ⚫ Test the average ranks against each other
• Test statistic

$$
\text{Test statistic} = \frac{\bar{R}_2 - \bar{R}_1}{(n_1 + n_2)\sqrt{\dfrac{n_1 + n_2 + 1}{12\, n_1 n_2}}}
$$

  where $\bar{R}_1$ and $\bar{R}_2$ are the average overall ranks in samples 1 and 2, and $n_1$ and $n_2$ are the two sample sizes
• If the test statistic is larger than 1.960 in magnitude, the two samples are significantly different at the 5% level
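The average-rank version of the test can be sketched as follows. The two samples are hypothetical. Averaging the ranks of tied values is an assumption, and the standard error below applies no tie correction, so the result is only approximate when many values tie.

```python
from math import sqrt


def rank_sum_statistic(sample1, sample2):
    """Compare the average overall ranks of two unpaired samples.

    Returns (mean rank 2 - mean rank 1) divided by its standard error,
    (n1 + n2) * sqrt((n1 + n2 + 1) / (12 * n1 * n2)).
    """
    n1, n2 = len(sample1), len(sample2)
    combined = list(sample1) + list(sample2)
    # Rank all of the data together; tied values share their average rank.
    order = sorted(range(len(combined)), key=lambda i: combined[i])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and combined[order[j + 1]] == combined[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    mean_rank1 = sum(ranks[:n1]) / n1
    mean_rank2 = sum(ranks[n1:]) / n2
    standard_error = (n1 + n2) * sqrt((n1 + n2 + 1) / (12 * n1 * n2))
    return (mean_rank2 - mean_rank1) / standard_error


group1 = [1, 2, 3, 4, 5]    # hypothetical sample 1
group2 = [6, 7, 8, 9, 10]   # hypothetical sample 2
statistic = rank_sum_statistic(group1, group2)
print(round(statistic, 3), abs(statistic) > 1.960)  # 2.611 True
```

In this made-up case every value in the second sample outranks the first, so the average ranks are 3 and 8 and the statistic exceeds 1.960 in magnitude.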