Lecture 5: T-Test

This lecture covers
• Normal distribution
• One-sample t-test
• Paired-sample t-test
• Two-sample t-test

Review of the Normal Distribution
• A distribution is a collection of scores (values) on a variable, arranged in order from lowest to highest value on the horizontal (X) axis and in terms of frequency on the vertical (Y) axis. A normal distribution, sometimes referred to as a bell curve, has a distribution that forms the shape of a bell. All you need to know to plot a normal distribution is the mean and standard deviation of the data.
• A normal distribution with a mean of µ and a standard deviation of σ is denoted N(µ, σ). If a set of scores has a distribution of N(15, 2), then we would say it is a normal distribution with a mean of 15 and a standard deviation of 2. Normal distributions do not all look alike; their shape depends on the values of the mean and standard deviation. For a given mean, a normal distribution may be tall and thin (if σ is small) or short and flat (if σ is large). See the figures in the next slide.
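Since the mean and standard deviation fully determine a normal distribution, its density can be computed directly. A minimal sketch in Python (the evaluation points are chosen for illustration):

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# For N(15, 2), the density peaks at the mean x = 15 ...
peak = normal_pdf(15, 15, 2)
# ... and is symmetric: equal distances from the mean give equal density.
left, right = normal_pdf(13, 15, 2), normal_pdf(17, 15, 2)
```

Note that a smaller σ raises the peak (tall and thin) and a larger σ lowers it (short and flat), since σ appears in the denominator of the leading factor.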
Major characteristics
• First, it is symmetrical, meaning that the upper and lower halves of the distribution of scores are mirror images of each other.
• Second, it is unimodal; the mean, median, and mode are all in the same place, in the center of the distribution (at the top of the bell curve); the distribution is thus highest in the middle and curves downward toward the tails.
• Third, it is asymptotic, meaning that the upper and lower tails of the distribution never actually touch the baseline, known as the X-axis.

• While normal distributions are not all the same, they share an important characteristic: a given standard deviation from the mean always "cuts off" the same proportion or percentage of scores in all normal distributions.
• Specifically, one standard deviation above and below the mean includes about 68% of the scores; two (actually, about 1.96) standard deviations above and below the mean include about 95%; and three include about 99.7% of the scores.

These percentages are worth committing to memory: 68-95-99.7.
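The 68-95-99.7 figures can be checked numerically: for a normal distribution, the proportion within k standard deviations of the mean is erf(k/√2). A short sketch using only the standard library:

```python
import math

def within_k_sds(k):
    """Proportion of any normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(within_k_sds(k), 4))  # 0.6827, 0.9545, 0.9973
```

This also confirms the "actually, about 1.96" remark: within_k_sds(1.96) is almost exactly 0.95, while within_k_sds(2) is slightly more.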
There are three kinds of distributions that we need to distinguish
• Population Distribution: the distribution of scores in a population, for example, the distribution of height scores for everyone in a country.
• Distribution of a Sample: the distribution of scores in a sample, for example, the height scores of the students in this class.
• Sampling Distribution: the distribution of some statistic (e.g., the mean) in all possible samples.

The following figure presents a schematic depiction of these three distributions. It is obviously impossible to actually draw all possible samples.

• The sampling distribution of the mean for a random sample has extremely important properties. As the sample size n increases, the sampling distribution of the mean more and more closely resembles a normal distribution.
• Statisticians refer to this tendency as the central limit theorem, one of the most important ideas in statistics.
• In fact, the sampling distribution of the mean approximates a normal distribution fairly closely for sample sizes of 30 or more. This is true regardless of the shape of the variable's distribution in the population. Thus even if a variable is not normally distributed in the population, the mean of all possible sample means of this variable is the same as the population mean, μ.
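The central limit theorem is easy to see by simulation. The sketch below (population, sample size, and number of samples are all illustrative choices, not from the slides) draws many samples of size n = 30 from a heavily skewed population and looks at the resulting sample means:

```python
import random
import statistics

random.seed(0)

# Population: exponential with mean 1 (heavily skewed, clearly not normal).
# Draw many samples of size n = 30 and record each sample's mean; this
# approximates the sampling distribution of the mean.
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(30))
    for _ in range(5000)
]

# Central limit theorem in action: the sample means center on the
# population mean mu = 1, with spread close to sigma/sqrt(n) = 1/sqrt(30) ≈ 0.183,
# and a histogram of them would look roughly bell-shaped.
print(round(statistics.mean(sample_means), 2))
print(round(statistics.stdev(sample_means), 2))
```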
Inference for the Mean of a Population
• Confidence intervals and tests of significance for the mean μ of a normal population are based on the sample mean x̄. The sampling distribution of x̄ has μ as its mean. That is, x̄ is an unbiased estimator of the unknown μ. The spread of x̄ depends on the sample size and also on the population standard deviation σ.

Assumptions for inference about a mean
• Our data are a simple random sample (SRS) of size n from the population. This assumption is very important.
• Observations from the population have a normal distribution with mean μ and standard deviation σ. In practice, it is enough that the distribution be symmetric and single-peaked unless the sample is very small. Both μ and σ are unknown parameters.

In this setting, the sample mean x̄ has the normal distribution with mean μ and standard deviation σ/√n. Because we do not know σ, we estimate it by the sample standard deviation s. We then estimate the standard deviation of x̄ by s/√n. This quantity is called the standard error of the sample mean x̄.

When we know the value of σ, we base confidence intervals and tests for μ on the one-sample z statistic

    z = (x̄ − μ) / (σ/√n)

This z statistic has the standard normal distribution N(0, 1). When we do not know σ, we substitute the standard error s/√n of x̄ for its standard deviation σ/√n. The statistic that results does not have a normal distribution. It has a distribution that is new to us, called a t distribution.
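The standard error and the one-sample z statistic translate directly into code. A minimal sketch (the five-value data set in the usage note is invented for illustration):

```python
import math
import statistics

def standard_error(data):
    """Estimated standard deviation of the sample mean: s / sqrt(n)."""
    return statistics.stdev(data) / math.sqrt(len(data))

def one_sample_z(data, mu, sigma):
    """One-sample z statistic; valid only when the population sigma is known."""
    n = len(data)
    return (statistics.mean(data) - mu) / (sigma / math.sqrt(n))
```

For example, one_sample_z([12, 15, 14, 16, 13], 15, 2) is about −1.12: the sample mean 14 sits roughly one standard deviation of x̄ below the hypothesized mean.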
The one-sample t statistic and the t distributions
• Draw an SRS of size n from a population that has the normal distribution with mean μ and standard deviation σ. The one-sample t statistic

    t = (x̄ − μ) / (s/√n)

has the t distribution with n − 1 degrees of freedom.
• The t statistic has the same interpretation as any standardized statistic: it says how far x̄ is from its mean μ in standard deviation units.
• The degrees of freedom for the one-sample t statistic come from the sample standard deviation s in the denominator of t. We know that s has n − 1 degrees of freedom. We will write the t distribution with k degrees of freedom as t(k) for short.

Figure 1: Density curves for the t distributions with 2 and 9 degrees of freedom and the standard normal distribution.

The figure illustrates these facts about the t distributions:
• The density curves of the t distributions are similar in shape to the standard normal curve. They are symmetric about zero, single-peaked, and bell-shaped.
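The one-sample t statistic and its degrees of freedom can be packaged together; a sketch (the usage data are invented for illustration):

```python
import math
import statistics

def one_sample_t(data, mu0):
    """One-sample t statistic; compare against the t(n - 1) distribution."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation, n - 1 in the denominator
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1  # statistic and degrees of freedom
```

For example, one_sample_t([4.2, 5.1, 4.8, 5.6, 4.3, 5.0], 4.5) gives t ≈ 1.56 with 5 degrees of freedom.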
• The spread of the t distributions is a bit greater than that of the standard normal distribution. The t distributions have more probability in the tails and less in the center than does the standard normal. This is true because substituting the estimate s for the fixed parameter σ introduces more variation into the statistic.
• As the degrees of freedom k increase, the t(k) density curve approaches the N(0, 1) curve ever more closely. This happens because s estimates σ more accurately as the sample size increases. So using s in place of σ causes little extra variation when the sample is large.

Table C gives critical values for t distributions. Each row in the table contains critical values for one of the t distributions; the degrees of freedom appear at the left of the row. By looking down any column, you can check that the t critical values approach the normal values as the degrees of freedom increase.

Table C: t distribution critical values.
The t confidence intervals and tests
• To analyze samples from normal populations with unknown σ, just replace the standard deviation σ/√n of x̄ by its standard error s/√n in the z procedures. The z procedures then become one-sample t procedures. Use P-values or critical values from the t distribution with n − 1 degrees of freedom in place of the normal values.

The one-sample t procedures
• Draw an SRS of size n from a population having unknown mean μ. A level C confidence interval for μ is

    x̄ ± t* · s/√n

where t* is the upper (1 − C)/2 critical value for the t(n − 1) distribution.
• To test the hypothesis H₀: μ = μ₀ based on an SRS of size n, compute the one-sample t statistic

    t = (x̄ − μ₀) / (s/√n)

• In terms of a variable T having the t(n − 1) distribution, the P-value for a test of H₀ against
    H₁: μ > μ₀ is P(T ≥ t)
    H₁: μ < μ₀ is P(T ≤ t)
    H₁: μ ≠ μ₀ is 2P(T ≥ |t|)
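The level C confidence interval above can be sketched as a small helper. The critical value t* is taken as an input, looked up in Table C (or obtained from software); the data in the usage note are invented for illustration:

```python
import math
import statistics

def t_confidence_interval(data, t_star):
    """Level-C confidence interval xbar ± t* · s/sqrt(n).
    t_star is the upper (1 - C)/2 critical value of t(n - 1), e.g. from Table C."""
    n = len(data)
    xbar = statistics.mean(data)
    margin = t_star * statistics.stdev(data) / math.sqrt(n)
    return xbar - margin, xbar + margin
```

For example, with the made-up data [4.8, 5.2, 5.0, 4.6, 5.4] and t* = 2.776 (95% confidence, 4 degrees of freedom), the interval is roughly (4.61, 5.39).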
Example 1
• An SRS of 5 objects: 55.95 68.24 52.73 21.50 23.78
To calculate a 95% confidence interval, first calculate x̄ = 44.44 and s = 20.741. The degrees of freedom are n − 1 = 4. From Table C we find that for 95% confidence, t* = 2.776.
• The confidence interval is

    x̄ ± t* · s/√n = 44.44 ± 2.776 × 20.741/√5 = 44.44 ± 25.75 = 18.69 to 70.19

The large margin of error is due to the small sample size and the rather large variation among the observations, reflected in the large value of s.

• The one-sample t confidence interval has the form

    estimate ± t* × SE(estimate)

where "SE" stands for "standard error."

Example 2: Sweetness loss
Sweetness losses from 10 tasters:
    2.0  0.4  0.7  2.0  -0.4  2.2  -1.3  1.2  1.1  2.3
Are these data good evidence that the drink lost sweetness?
• Step 1: Hypothesis. The null hypothesis is "no loss", and the alternative hypothesis says "there is a loss":

    H₀: μ = 0
    H₁: μ > 0
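Example 1's arithmetic can be checked in a few lines of Python using only the standard library:

```python
import math
import statistics

data = [55.95, 68.24, 52.73, 21.50, 23.78]  # the SRS of 5 objects from Example 1
n = len(data)
xbar = statistics.mean(data)                 # 44.44
s = statistics.stdev(data)                   # 20.741
t_star = 2.776                               # Table C: 95% confidence, t(4)
margin = t_star * s / math.sqrt(n)           # 25.75
print(round(xbar - margin, 2), round(xbar + margin, 2))  # 18.69 70.19
```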
• Step 2: Test statistic. The basic statistics are x̄ = 1.02 and s = 1.196. The one-sample t test statistic is

    t = (x̄ − μ₀) / (s/√n) = (1.02 − 0) / (1.196/√10) = 2.70

• Step 3: P-value. The P-value for t = 2.70 is the area to the right of 2.70 under the t distribution curve with degrees of freedom n − 1 = 9. Figure 2 shows this area.
• From Table C, the P-value lies between 0.01 and 0.02. Using SPSS, we get the exact result P = 0.0122. There is quite strong evidence for a loss of sweetness.
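Steps 2 and 3 can be reproduced with the standard library (the exact P-value needs t distribution software; the slides use SPSS, and in recent SciPy versions scipy.stats.ttest_1samp with alternative='greater' would give the same 0.0122):

```python
import math
import statistics

losses = [2.0, 0.4, 0.7, 2.0, -0.4, 2.2, -1.3, 1.2, 1.1, 2.3]
n = len(losses)
xbar = statistics.mean(losses)          # 1.02
s = statistics.stdev(losses)            # 1.196
t = (xbar - 0) / (s / math.sqrt(n))     # 2.70, with n - 1 = 9 degrees of freedom
```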
Matched pairs t procedures
• The taste test in Example 2 is in fact a matched pairs study in which the same 10 tasters rated before-and-after sweetness.
• To compare the responses to the two treatments in a matched pairs design, apply the one-sample t procedures to the observed differences.
• The parameter μ in a matched pairs t procedure is the mean difference in the responses to the two treatments within matched pairs of subjects in the entire population.

Example 3: Listening to Mozart improves students' performance on tests.

    Subject  Group A  Group B  Difference    Subject  Group A  Group B  Difference
    1        30.60    37.97     -7.37        12       58.93    83.50    -24.57
    2        48.43    51.57     -3.14        13       54.47    38.30     16.17
    3        60.77    56.67      4.10        14       43.53    51.37     -7.84
    4        36.07    40.47     -4.40        15       37.93    29.33      8.60
    5        68.47    49.00     19.47        16       43.50    54.27    -10.77
    6        32.43    43.23    -10.80        17       87.70    62.73     24.97
    7        43.70    44.57     -0.87        18       53.53    58.00     -4.47
    8        37.10    28.40      8.70        19       64.30    52.40     11.90
    9        31.17    28.23      2.94        20       47.37    53.63     -6.26
    10       51.23    68.47    -17.24        21       53.67    47.00      6.67
    11       65.40    51.10     14.30

To analyze these data, subtract the Group B score from the Group A score for each subject.
• Step 1: Hypothesis. To assess whether listening to Mozart significantly improved performance, we test

    H₀: μ = 0
    H₁: μ > 0
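The matched pairs reduction, turning two columns of scores into one sample of differences and then applying the one-sample t procedure, can be sketched as follows (the slides do not report the resulting t value, so none is claimed here):

```python
import math
import statistics

# Scores for each of the 21 subjects under both treatments (Example 3).
group_a = [30.60, 48.43, 60.77, 36.07, 68.47, 32.43, 43.70, 37.10, 31.17, 51.23,
           65.40, 58.93, 54.47, 43.53, 37.93, 43.50, 87.70, 53.53, 64.30, 47.37, 53.67]
group_b = [37.97, 51.57, 56.67, 40.47, 49.00, 43.23, 44.57, 28.40, 28.23, 68.47,
           51.10, 83.50, 38.30, 51.37, 29.33, 54.27, 62.73, 58.00, 52.40, 53.63, 47.00]

# Matched pairs: reduce to one sample of differences A - B, then apply
# the one-sample t procedure to the differences (compare against t(20)).
diffs = [a - b for a, b in zip(group_a, group_b)]
n = len(diffs)
t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
```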