Chapter 4 STATISTICAL INFERENCE: ESTIMATION AND HYPOTHESIS TESTING

Statistical inference draws conclusions about a population [i.e., a probability density function (PDF)] from a random sample that has supposedly been drawn from that population.
4.1 THE MEANING OF STATISTICAL INFERENCE

Statistical inference: the study of the relationship between a population and a sample drawn from that population. The process of generalizing from the sample value (X̄) to the population value E(X) is the essence of statistical inference.
4.2 ESTIMATION AND HYPOTHESIS TESTING: TWIN BRANCHES OF STATISTICAL INFERENCE

1. Estimation
Estimation: the first step in statistical inference.
X̄: an estimator, or statistic, of the population parameter E(X).
Estimate: the particular numerical value taken by the estimator in a given sample.
Sampling variation (sampling error): the variation in the estimate from sample to sample.

2. Hypothesis testing
In hypothesis testing we may have a prior judgment or expectation about what value a particular parameter may assume.
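Sampling variation can be illustrated with a short simulation. This is only a sketch: the population N(10, 2²) and the sample size of 25 are assumed for illustration and do not come from the text.

```python
import random
import statistics

random.seed(42)

# Illustrative population: normal with mean 10 and s.d. 2 (assumed values).
MU, SIGMA, N = 10.0, 2.0, 25

# Draw several independent samples and compute the estimator X-bar for each.
sample_means = [
    statistics.mean(random.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(5)
]

# Each sample yields a different estimate: that spread is sampling variation.
for i, xbar in enumerate(sample_means, 1):
    print(f"sample {i}: X-bar = {xbar:.3f}")
```

Every sample gives a slightly different value of X̄, even though all samples come from the same population; that sample-to-sample spread is exactly the sampling error defined above.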
4.3 ESTIMATION OF PARAMETERS

The usual procedure of estimation: assume that we have a random sample of size n from a known probability distribution and use the sample to estimate the unknown parameters; that is, use the sample mean as an estimate of the population mean (or expected value) and the sample variance as an estimate of the population variance.

1. Point estimate
A point estimator, or statistic, is an r.v.; its value will vary from sample to sample. How, then, can we rely on just one estimate X̄ of the true population mean?
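As a minimal sketch of the procedure just described, the sample mean and sample variance serve as point estimates of the population mean and variance (the data values below are made up for illustration):

```python
import statistics

# Hypothetical random sample of size n = 10 (illustrative values only).
sample = [12.1, 9.8, 13.4, 10.6, 11.9, 12.7, 9.5, 11.2, 13.0, 10.8]

n = len(sample)
x_bar = sum(sample) / n            # point estimate of E(X)
s2 = statistics.variance(sample)   # sample variance, with divisor n - 1

print(f"n = {n}, X-bar = {x_bar:.3f}, S^2 = {s2:.3f}")
```

Note that `statistics.variance` uses the divisor n − 1, the usual unbiased sample variance.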
2. Interval estimate
Although X̄ is the single "best" guess of the true population mean, an interval, say from 8 to 14, most likely includes the true μx. This is interval estimation.

Sampling or probability distribution:
X̄ ~ N(μx, σ²/n)
Z = (X̄ − μx)/(σ/√n) ~ N(0, 1)
When σ is replaced by its sample estimator S:
t = (X̄ − μx)/(S/√n) ~ t(n − 1)
P(−t_{n−1} ≤ t ≤ t_{n−1}) = 1 − α
P(X̄ − t_{n−1}·S/√n ≤ μx ≤ X̄ + t_{n−1}·S/√n) = 1 − α

Critical t values: ±t_{n−1}
Confidence interval: X̄ − t_{n−1}·S/√n to X̄ + t_{n−1}·S/√n (lower limit to upper limit)
Confidence coefficient: 1 − α
Level of significance (the prob. of committing a type I error): α
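The interval X̄ ± t_{n−1}·S/√n can be computed directly. A sketch, assuming SciPy is available and using a made-up sample of size 10:

```python
import math
import statistics
from scipy.stats import t

# Hypothetical sample; alpha = 0.05 gives a 95% confidence interval.
sample = [12.1, 9.8, 13.4, 10.6, 11.9, 12.7, 9.5, 11.2, 13.0, 10.8]
alpha = 0.05

n = len(sample)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)             # S, the sample s.d. (divisor n - 1)
t_crit = t.ppf(1 - alpha / 2, df=n - 1)  # critical t value with n - 1 d.f.

half_width = t_crit * s / math.sqrt(n)
lower, upper = x_bar - half_width, x_bar + half_width
print(f"95% CI for mu_x: ({lower:.3f}, {upper:.3f})")
```

`t.ppf` is the inverse CDF of the t distribution, so `t.ppf(0.975, df=9)` returns the tabulated two-sided 5% critical value (about 2.262).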
Note: It is the interval that is random, not the parameter μx.
The confidence interval is a random interval, because it is based on X̄ and S/√n, which will vary from sample to sample.
The population mean, although unknown, is some fixed number; it is not random.
You should not say: the probability is 0.95 (1 − α) that μx lies in this interval.
You should say: the probability is 0.95 that the random interval contains the true μx.
Interval estimation, in contrast to point estimation, provides a range of values that will include the true value with a certain degree of confidence or probability (such as 0.95):
P(L ≤ μx ≤ U) = 1 − α, 0 < α < 1
That is, the prob. is (1 − α) that the random interval from L to U contains the true μx. If we construct a confidence interval with a confidence coefficient of 0.95, then in repeated such constructions 95 out of 100 intervals can be expected to include the true μx.
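The repeated-sampling interpretation, that roughly 95 out of 100 such intervals cover the fixed μx, can be checked by simulation. A sketch with an assumed N(10, 2²) population and SciPy assumed available:

```python
import math
import random
import statistics
from scipy.stats import t

random.seed(0)
MU, SIGMA, N, ALPHA, REPS = 10.0, 2.0, 20, 0.05, 1000
t_crit = t.ppf(1 - ALPHA / 2, df=N - 1)

covered = 0
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = statistics.mean(sample)
    hw = t_crit * statistics.stdev(sample) / math.sqrt(N)
    # Does this random interval contain the fixed, non-random mu?
    if x_bar - hw <= MU <= x_bar + hw:
        covered += 1

print(f"coverage: {covered / REPS:.3f}")  # should be close to 0.95
```

The interval endpoints change with every sample while μ stays put, which is precisely why the probability statement attaches to the interval and not to the parameter.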
4.4 PROPERTIES OF POINT ESTIMATORS

The sample mean is the most frequently used measure of the population mean because it satisfies several properties that statisticians deem desirable.

1. Linearity
An estimator is said to be a linear estimator if it is a linear function of the sample observations:
X̄ = (1/n)·ΣXi = (1/n)(X1 + X2 + … + Xn)

2. Unbiasedness
An estimator X̄ is an unbiased estimator of μx if E(X̄) = μx. If we draw repeated samples of size n from the normal population and compute X̄ for each sample, then on average X̄ will coincide with μx. Unbiasedness is a repeated-sampling property.

3. Efficiency
If we consider only unbiased estimators of a parameter, the one with the smallest variance is called the best, or efficient, estimator.
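Unbiasedness as a repeated-sampling property can be illustrated numerically: average X̄ over many samples and compare with μx. The population parameters below are assumed for illustration.

```python
import random
import statistics

random.seed(1)
MU, SIGMA, N, REPS = 10.0, 2.0, 25, 5000

# Compute X-bar in each of many repeated samples of size N.
means = [
    statistics.mean(random.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(REPS)
]

# On average, the X-bar values coincide with mu (unbiasedness).
avg_of_means = statistics.mean(means)
print(f"average of {REPS} sample means: {avg_of_means:.3f} (mu = {MU})")
```

Any single X̄ misses μ, but the average of the estimates across repeated samples settles on μ, which is what E(X̄) = μx asserts.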
4. Best Linear Unbiased Estimator (BLUE)
If an estimator is linear, is unbiased, and has minimum variance in the class of all linear unbiased estimators of a parameter, it is called a best linear unbiased estimator.

5. Consistency
Compare the usual estimator X̄ = ΣXi/n with the alternative X* = ΣXi/(n + 1). While E(X̄) = μx, E(X*) = [n/(n + 1)]·μx, so X* is biased; but as n increases, n/(n + 1) approaches 1. An estimator (e.g., X*) is said to be a consistent estimator if it approaches the true value of the parameter as the sample size gets larger and larger.
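The biased-but-consistent estimator X* = ΣXi/(n + 1) can be compared with X̄ at increasing sample sizes. A sketch with an assumed N(10, 2²) population:

```python
import random

random.seed(2)
MU, SIGMA = 10.0, 2.0

for n in (10, 100, 10_000):
    xs = [random.gauss(MU, SIGMA) for _ in range(n)]
    x_bar = sum(xs) / n          # unbiased estimator
    x_star = sum(xs) / (n + 1)   # biased, but consistent: n/(n+1) -> 1
    print(f"n={n:>6}: X-bar={x_bar:.4f}  X*={x_star:.4f}")
```

At n = 10 the two estimators differ noticeably; by n = 10,000 the gap (which equals X̄/(n + 1)) is negligible, showing X* converging to the true mean.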
4.5 STATISTICAL INFERENCE: HYPOTHESIS TESTING

Hypothesis testing: Instead of establishing a confidence interval, in hypothesis testing we hypothesize that the true μx takes a particular numerical value, e.g., μx = 13. Our task is to "test" this hypothesis.
Null hypothesis (H0): the hypothesis we maintain and put to the test, e.g., H0: μx = 13.
Alternative hypothesis (H1): the hypothesis against which the null hypothesis is tested.
H1: μx > 13, one-sided alternative hypothesis
H1: μx < 13, one-sided alternative hypothesis
H1: μx ≠ 13, two-sided alternative hypothesis

1. The Confidence Interval Approach to Hypothesis Testing
In hypothesis testing, the 95% confidence interval is called the acceptance region, and the area outside the acceptance region is called the critical region, or region of rejection, of the null hypothesis. The boundaries of the acceptance region are called critical values. The null hypothesis is rejected if the value of the parameter under the null hypothesis either exceeds the upper critical value or is less than the lower critical value of the acceptance region.
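The confidence-interval approach can be sketched in code: construct the 95% interval and reject H0: μx = 13 if 13 falls outside the acceptance region. The sample data are made up for illustration, and SciPy is assumed to be available.

```python
import math
import statistics
from scipy.stats import t

# Hypothetical sample; the null value mu_x = 13 follows the text's example.
sample = [12.1, 9.8, 13.4, 10.6, 11.9, 12.7, 9.5, 11.2, 13.0, 10.8]
mu_null = 13.0
alpha = 0.05

n = len(sample)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)
t_crit = t.ppf(1 - alpha / 2, df=n - 1)
hw = t_crit * s / math.sqrt(n)
lower, upper = x_bar - hw, x_bar + hw   # boundaries of the acceptance region

# Reject H0 if the hypothesized value lies in the critical region.
reject = not (lower <= mu_null <= upper)
print(f"acceptance region: ({lower:.3f}, {upper:.3f}); reject H0: {reject}")
```

For this particular sample the hypothesized value 13 lies above the upper critical value, so H0 is rejected at the 5% level of significance.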
2. Type I and Type II Errors: A Digression

Type I error: the error of rejecting a hypothesis when it is true.
Type II error: the error of accepting a false hypothesis.
Type I error = α = prob.(rejecting H0 | H0 is true)
Type II error = β = prob.(accepting H0 | H0 is false)
The classical approach to the type I/type II trade-off: assume a type I error is more serious than a type II error, keep the probability of committing a type I error at a fairly low level, and then minimize the type II error as much as possible. That is, simply specify the value of α without worrying too much about β.
The decision to accept or reject a null hypothesis depends critically on both the d.f. and the probability of committing a type I error.
A 95% confidence coefficient / a 5% level of significance / a 95% level or degree of confidence: we are prepared to accept at most a 5 percent probability of committing a type I error.
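The claim that α is the probability of rejecting a true H0 can be checked by simulation: generate data for which H0 really holds and count how often the test wrongly rejects. The population and test settings below are assumed for illustration, and SciPy is assumed available.

```python
import math
import random
import statistics
from scipy.stats import t

random.seed(3)
MU0, SIGMA, N, ALPHA, REPS = 13.0, 2.0, 20, 0.05, 2000
t_crit = t.ppf(1 - ALPHA / 2, df=N - 1)

rejections = 0
for _ in range(REPS):
    # H0 is true here: the data really come from a population with mean MU0.
    xs = [random.gauss(MU0, SIGMA) for _ in range(N)]
    t_stat = (statistics.mean(xs) - MU0) / (statistics.stdev(xs) / math.sqrt(N))
    if abs(t_stat) > t_crit:      # test statistic falls in the critical region
        rejections += 1

# The empirical rejection rate estimates the type I error probability.
print(f"empirical type I error rate: {rejections / REPS:.3f} (alpha = {ALPHA})")
```

The observed rejection rate hovers around 0.05, matching the chosen level of significance: with α fixed at 5%, about 1 in 20 true null hypotheses will be rejected by chance.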