Contents

11 Association Between Variables
   11.1 Introduction
        11.1.1 Measure of Association
        11.1.2 Chapter Summary
   11.2 Chi Square Based Measures
        11.2.1 Phi
        11.2.2 Contingency coefficient
        11.2.3 Cramer’s V
        11.2.4 Summary of Chi Square Based Measures
   11.3 Reduction in Error Measures
Chapter 11

Association Between Variables

11.1 Introduction

In previous chapters, much of the discussion concerned a single variable: describing a distribution, calculating summary statistics, obtaining interval estimates for parameters and testing hypotheses concerning these parameters. Statistics that describe or make inferences about a single distribution are referred to as univariate statistics. While univariate statistics form the basis for many other types of statistics, none of the issues concerning relationships among variables can be answered by examining only a single variable. In order to examine relationships among variables, it is necessary to move to at least the level of bivariate statistics, examining two variables. Frequently the researcher wishes to move beyond this to multivariate statistics, where the relationships among several variables are simultaneously examined.

Cross classification tables, used to determine independence and dependence for events and for variables, are one type of bivariate statistics. A test for a difference between two proportions can also be considered a type of bivariate statistics. The only other example of bivariate methods used so far in this textbook is the test for the difference between two means, using either the normal or the t distribution. The latter is the only bivariate method which has been used to examine variables that have interval or ratio level scales.

An example of a relationship that a researcher might investigate is the
relationship between political party supported and opinion concerning socioeconomic issues. In Chapters 9 and 10, the relationship between political party supported and opinion concerning various explanations for unemployment, among a sample of Edmonton adults, was examined. This type of relationship was examined using a cross classification table and the chi square statistic. Differences of proportions, or differences of mean opinion, could have been used as a method of examining this relationship as well. In this chapter, various summary measures are used to describe these relationships. The chi square statistic from the cross classification table is modified to obtain a measure of association. Correlation coefficients and regression models are also used to examine the relationship among variables which have ordinal, interval or ratio level scales.

Bivariate and multivariate statistics are useful not only for statistical reasons; they also form a large part of social science research. The social sciences are concerned with explaining social phenomena, and this necessarily involves searching for, and testing for, relationships among variables. Social phenomena do not just happen, but have causes. In looking for causal factors, attempting to determine which variables cause or influence other variables, the researcher examines the nature of relationships among variables. Variables that appear to have little relationship with the variable that the researcher is attempting to explain may be ignored. Variables which appear to be related to the variable being explained must be closely examined. The researcher is concerned with whether a relationship among variables exists or not. If the relationship appears to exist, then the researcher wishes to know more concerning the nature of this relationship. The size and strength of the relationship are of concern, and there are various tests concerning these.

In this chapter, there is no examination of multivariate relationships, where several variables are involved. This chapter looks only at bivariate relationships, testing for the existence of such relationships, and attempting to describe the strength and nature of such relationships. The two variable methods of this chapter can be extended to the examination of multivariate relationships. But the latter methods are beyond the scope of an introductory textbook, and are left to more advanced courses in statistics.

11.1.1 Measure of Association

Measures of association provide a means of summarizing the size of the association between two variables. Most measures of association are scaled
so that they reach a maximum numerical value of 1 when the two variables have a perfect relationship with each other. They are also scaled so that they have a value of 0 when there is no relationship between two variables. While there are exceptions to these rules, most measures of association are of this sort. Some measures of association are constructed to have a range of only 0 to 1; other measures have a range from -1 to +1. The latter provide a means of determining whether the two variables have a positive or negative association with each other.

Tests of significance are also provided for many of the measures of association. These tests begin by hypothesizing that there is no relationship between the two variables, and that the measure of association equals 0. The researcher calculates the observed value of the measure of association, and if the measure is different enough from 0, the test shows that there is a significant relationship between the two variables.

11.1.2 Chapter Summary

This chapter begins with measures of association based on the chi square statistic. It will be seen in Section 11.2 that the $\chi^2$ statistic is a function not only of the size of the relationship between the two variables, but also of the sample size and the number of rows and columns in the table. This statistic can be adjusted in various ways, in order to produce a measure of association. Following this, in Section 11.3, a different approach to obtaining a measure of association is outlined. This is to consider how much the error of prediction for a variable can be reduced when the researcher has knowledge of a second variable. Section ?? examines various correlation coefficients, measures which summarize the relationship between two variables that have an ordinal or higher level of measurement. Finally, Section ?? presents the regression model for interval or ratio variables.
The regression model allows the researcher to estimate the size of the relationship between two variables, where one variable is considered the independent variable, and the other variable depends on the first variable.

11.2 Chi Square Based Measures

One way to determine whether there is a statistical relationship between two variables is to use the chi square test for independence of Chapter 10. A cross classification table is used to obtain the expected number of cases under the assumption of no relationship between the two variables. Then
the value of the chi square statistic provides a test of whether or not there is a statistical relationship between the variables in the cross classification table.

While the chi square test is a very useful means of testing for a relationship, it suffers from several weaknesses. One difficulty with the test is that it does not indicate the nature of the relationship. From the chi square statistic itself, it is not possible to determine the extent to which one variable changes as values of the other variable change. About the only way to do this is to closely examine the table in order to determine the pattern of the relationship between the two variables.

A second problem with the chi square test for independence is that the size of the chi square statistic may not provide a reliable guide to the strength of the statistical relationship between the two variables. When two different cross classification tables have the same sample size, the two variables in the table with the larger chi square value are more strongly related than are the two variables in the table with the smaller chi square value. But when the sample sizes for two tables differ, the size of the chi square statistic is a misleading indicator of the extent of the relationship between two variables. This will be seen in Example 11.2.1.

A further difficulty is that the value of the chi square statistic may change depending on the number of cells in the table. For example, a table with 2 columns and 3 rows may give a different chi square value than does a cross classification table with 4 columns and 5 rows, even when the relationship between the two variables and the sample sizes are the same. The number of rows and columns in a table are referred to as the dimensions of the table. Tables of different dimension give different degrees of freedom, partly correcting for this problem.
But it may still be misleading to compare the chi square statistic for two tables of quite different dimensions.

In order to solve some of these problems, the chi square statistic can be adjusted to take account of differences in sample size and dimension of the table. Some of the measures which can be calculated are phi, the contingency coefficient, and Cramer’s V. Before examining these measures, the following example shows how sample size affects the value of the chi square statistic.

Example 11.2.1 Effect of Sample Size on the Chi Square Statistic

The hypothetical examples of Section 6.2 of Chapter 6 will be used to illustrate the effect of sample size on the value of the chi square statistic. The data from Tables 6.9 and 6.10 will first be used to illustrate how a larger
chi square value can be used to indicate a stronger relationship between two variables when two tables have the same sample size. Then the misleading nature of the chi square statistic when sample size differs will be shown.

Opinion      Male      Female    Total
Agree        65        25        90
             (60.0)    (30.0)
Disagree     35        25        60
             (40.0)    (20.0)
Total        100       50        150

$\chi^2 = 0.417 + 0.833 + 0.625 + 1.250 = 3.125$
df = 1
$0.075 < \alpha < 0.10$

Table 11.1: Weak Relationship between Sex and Opinion

Table 11.1 gives the chi square test for independence for the weak relationship between sex and opinion, originally given in Table 6.9. The first entry in each cell of the table is the count, or observed number of cases. The number in brackets in each cell of the table is the expected number of cases under the assumption of no relationship between sex and opinion. It can be seen that the value of the chi square statistic for the relationship shown in Table 11.1 is 3.125. With one degree of freedom, this value is statistically significant at the 0.10 level of significance, but not at the 0.075 level. This indicates a rather weak relationship, providing some evidence for a relationship between sex and opinion. But the null hypothesis of no relationship between the two variables can be rejected at only the 0.10 level of significance.

Table 11.2 gives much stronger evidence for a relationship between sex and opinion. In this table, the distribution of opinions for females is the same as in the earlier table, but more males are in agreement, and fewer in disagreement, than in the earlier table. As a result, the chi square statistic for Table 11.2 has a larger value, indicating a more significant relationship
than in Table 11.1. For Table 11.2, the chi square value is 9.375, and with one degree of freedom, this statistic provides evidence of a relationship at the 0.005 level of significance.

Opinion      Male      Female    Total
Agree        75        25        100
             (66.7)    (33.3)
Disagree     25        25        50
             (33.3)    (16.7)
Total        100       50        150

$\chi^2 = 1.042 + 2.083 + 2.083 + 4.167 = 9.375$
df = 1
$0.001 < \alpha < 0.005$

Table 11.2: Strong Relationship between Sex and Opinion

When comparing these two tables, the size of the chi square value provides a reliable guide to the strength of the relationship between sex and opinion in the two tables. The larger chi square value of Table 11.2 means a stronger relationship between sex and opinion than does the smaller chi square value of Table 11.1. In these two tables, the sample size is the same, with n = 150 cases in each table.

Now examine Table 11.3, which is based on the weak relationship of Table 11.1, but with the sample size increased from n = 150 to n = 600. In order to preserve the nature of the relationship, each of the observed numbers of cases in the cells of Table 11.1 is multiplied by 4. The new table again shows that females are equally split between agree and disagree, but males are split 260/140 = 65/35 between agree and disagree. The pattern of the relationship between sex and opinion is unchanged from Table 11.1. But now the chi square statistic is dramatically increased. In Table 11.3, the chi square statistic is 12.5, as opposed to only 3.125 in Table 11.1. The 12.5 of the new table is even larger than the chi square value of 9.375 of Table 11.2. The larger sample size in the new table has increased the value of the chi
square statistic so that even the relatively weak relationship between sex and opinion becomes very significant statistically. Given the assumption of no relationship between sex and opinion, the probability of obtaining a chi square value at least as large as that of Table 11.3 is less than 0.0005.

Opinion      Male      Female    Total
Agree        260       100       360
             (240.0)   (120.0)
Disagree     140       100       240
             (160.0)   (80.0)
Total        400       200       600

$\chi^2 = 1.667 + 3.333 + 2.500 + 5.000 = 12.500$
df = 1
$\alpha < 0.0005$

Table 11.3: Weak Relationship - Larger Sample Size

This example shows how the value of the chi square statistic is sensitive to the sample size. As can be seen by comparing Tables 11.1 and 11.3, multiplying all the observed numbers of cases by 4 also increases the chi square statistic by a factor of 4. The degrees of freedom stay unchanged, so that the larger chi square value appears to imply a much stronger statistical relationship between sex and opinion.

Considerable caution should be exercised when comparing the chi square statistic, and its significance, for two tables having different sample sizes. If the sample size for the two tables is the same, and the dimensions of the tables are also identical, the table with the larger chi square value generally provides stronger evidence for a relationship between the two variables. But when the sample sizes, or the dimensions of the tables, differ, the chi square statistic and its significance may not provide an accurate idea of the extent of the relationship between the two variables.

One way to solve some of the problems associated with the chi square
statistic is to adjust the chi square statistic for either the sample size or the dimension of the table, or for both of these. Phi, the contingency coefficient and Cramer’s V are measures of association that carry out this adjustment, using the chi square statistic. These are defined in the following sections, with examples of each being provided.

11.2.1 Phi

The measure of association, phi, is a measure which adjusts the chi square statistic by the sample size. The symbol for phi is the Greek letter phi, written $\phi$, and usually pronounced ‘fye’ when used in statistics. Phi is most easily defined as

$$\phi = \sqrt{\frac{\chi^2}{n}}.$$

Sometimes phi squared is used as a measure of association, and phi squared is defined as

$$\phi^2 = \frac{\chi^2}{n}.$$

In order to calculate these measures, the chi square statistic for the table is first determined, and from this it is relatively easy to determine phi or phi squared. Since phi is usually less than one, and since the square of a number less than one is an even smaller number, $\phi^2$ can be extremely small. This is one of the reasons that phi is more commonly used than is phi squared.

Example 11.2.2 $\phi$ and $\phi^2$ for Tables of Section 11.2

Table 11.4 gives the three two by two tables shown in the last section, without the row and column totals. The chi square statistic and sample size for each of the tables are given below the frequencies for each cell in the table. From these, $\phi^2$ and $\phi$ are then determined. For the first table, with the strong relationship, having females equally divided on some issue, but with males split 75 agreeing and 25 disagreeing, $\chi^2 = 9.375$. The sample size for this table is n = 150, so that

$$\phi^2 = \frac{\chi^2}{n} = \frac{9.375}{150} = 0.0625$$

and

$$\phi = \sqrt{\frac{\chi^2}{n}} = \sqrt{\frac{9.375}{150}} = \sqrt{0.0625} = 0.25$$
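
The calculation of the chi square statistic and phi can also be carried out with a short program. The following is only a minimal sketch in Python, not part of the original text; the function names chi_square and phi, and the dictionary tables, are illustrative assumptions. It computes the expected counts from the row and column totals, as in the test for independence, and then applies the definition of phi to the observed counts of Tables 11.1, 11.2 and 11.3.

```python
import math


def chi_square(observed):
    """Chi square statistic for a cross classification table.

    `observed` is a list of rows, each row a list of observed counts.
    (Hypothetical helper name, introduced for illustration only.)
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count under the assumption of no relationship.
            expected = row_totals[i] * col_totals[j] / n
            total += (count - expected) ** 2 / expected
    return total


def phi(observed):
    """Phi: the square root of chi square divided by the sample size."""
    n = sum(sum(row) for row in observed)
    return math.sqrt(chi_square(observed) / n)


# Rows are Agree and Disagree; columns are Male and Female.
tables = {
    "Table 11.1 (weak, n = 150)": [[65, 25], [35, 25]],
    "Table 11.2 (strong, n = 150)": [[75, 25], [25, 25]],
    "Table 11.3 (weak, n = 600)": [[260, 100], [140, 100]],
}

for label, table in tables.items():
    x2 = chi_square(table)
    p = phi(table)
    print(f"{label}: chi square = {x2:.3f}, "
          f"phi squared = {p ** 2:.5f}, phi = {p:.3f}")
```

Running the sketch on the two weak tables gives an indication of how dividing by n removes the effect of sample size: the chi square values differ by a factor of 4, while phi is the same for both.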
                     Nature of Relation and Sample Size
             Strong, n = 150     Weak, n = 150      Weak, n = 600
Opinion      Male    Female      Male    Female     Male    Female
Agree         75       25         65       25        260      100
Disagree      25       25         35       25        140      100
$\chi^2$         9.375               3.125              12.500
$n$               150                 150                 600
$\phi^2$         0.0625              0.02083            0.02083
$\phi$            0.25                0.144              0.144

Table 11.4: $\phi^2$ and $\phi$ for 2 × 2 Tables

The values of $\phi^2$ and $\phi$ for the other tables are obtained in a similar manner. Note how small $\phi^2$ is in each of the tables. Since a very small value might seem to indicate no relationship between the two variables, sex and opinion, it might be preferable to use $\phi$ rather than $\phi^2$. Note that $\phi$ is 0.25 for the strong relationship, and only 0.144 for the weak relationship. By comparing the two values of $\phi$, you can obtain some idea of the association between sex and opinion. This indicates that the relationship of Table 11.2, for which $\phi = 0.25$, is a stronger relationship than is the relationship of Table 11.1, where $\phi$ is only 0.144.

Also note in the two right panels of Table 11.4 that $\phi$ for the weak relationship is the same, regardless of the sample size. As shown earlier, in Tables 11.1 and 11.3, the value of $\chi^2$ is quite different for these two types of data, but $\phi$ is the same. That is, the nature of the relationship is the same in the two right panels of Table 11.4, but the sample size is four times greater on the right than in the middle panel. This dramatically increases the size of the chi square statistic, but leaves the values of $\phi^2$ and $\phi$ unchanged.

Example 11.2.3 Relationship Between Age and Opinion Concerning Male and Female Job Roles

The Regina Labour Force Survey asked respondents the question