Are Emily and Greg More Employable than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination
Marianne Bertrand; Sendhil Mullainathan
The American Economic Review, Vol. 94, No. 4. (Sep., 2004), pp. 991-1013.
Stable URL: http://links.jstor.org/sici?sici=0002-8282%28200409%2994%3A4%3C991%3AAEAGME%3E2.0.CO%3B2-H
The American Economic Review is currently published by American Economic Association.
Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination

By MARIANNE BERTRAND AND SENDHIL MULLAINATHAN*

We study race in the labor market by sending fictitious resumes to help-wanted ads in Boston and Chicago newspapers. To manipulate perceived race, resumes are randomly assigned African-American- or White-sounding names. White names receive 50 percent more callbacks for interviews. Callbacks are also more responsive to resume quality for White names than for African-American ones. The racial gap is uniform across occupation, industry, and employer size. We also find little evidence that employers are inferring social class from the names. Differential treatment by race still appears to be prominent in the U.S. labor market. (JEL J71, J64)

Every measure of economic success reveals significant racial inequality in the U.S. labor market. Compared to Whites, African-Americans are twice as likely to be unemployed and earn nearly 25 percent less when they are employed (Council of Economic Advisers, 1998). This inequality has sparked a debate as to whether employers treat members of different races differentially. When faced with observably similar African-American and White applicants, do they favor the White one? Some argue yes, citing either employer prejudice or employer perception that race signals lower productivity. Others argue that differential treatment by race is a relic of the past, eliminated by some combination of employer enlightenment, affirmative action programs and the profit-maximization motive. In fact, many in this latter camp even feel that stringent enforcement of affirmative action programs has produced an environment of reverse discrimination. They would argue that faced with identical candidates, employers might favor the African-American one.1

Data limitations make it difficult to empirically test these views. Since researchers possess far less data than employers do, White and African-American workers that appear similar to researchers may look very different to employers. So any racial difference in labor market outcomes could just as easily be attributed to differences that are observable to employers but unobservable to researchers.

* Bertrand: Graduate School of Business, University of Chicago, 1101 E. 58th Street, RO 229D, Chicago, IL 60637, NBER, and CEPR (e-mail: marianne.bertrand@gsb.uchicago.edu); Mullainathan: Department of Economics, Massachusetts Institute of Technology, 50 Memorial Drive, E52-380a, Cambridge, MA 02142, and NBER (e-mail: mullain@mit.edu). David Abrams, Victoria Bede, Simone Berkowitz, Hong Chung, Almudena Fernandez, Mary Anne Guediguian, Christine Jaw, Richa Maheswari, Beverley Martis, Alison Tisza, Grant Whitehorn, and Christine Yee provided excellent research assistance. We are also grateful to numerous colleagues and seminar participants for very helpful comments.

1 This camp often explains the poor performance of African-Americans in terms of supply factors. If African-Americans lack many basic skills entering the labor market, then they will perform worse, even with parity or favoritism in hiring.

2 See Roger Jowell and Patricia Prescott-Clarke (1970), Jim Hubbuck and Simon Carter (1980), Colin Brown and Pat Gay (1985), and Peter A. Riach and Judith Rich (1991). One caveat is that some of these studies fail to fully match skills between minority and nonminority resumes. For example, some impose differential education background by racial origin. Doris Weichselbaumer (2003, 2004) studies the impact of sex-stereotypes and sexual orientation. Richard E. Nisbett and Dov Cohen (1996) perform a related field experiment to study how employers' response to a criminal past varies between the North and the South in the United States.

To circumvent this difficulty, we conduct a field experiment that builds on the correspondence testing methodology that has been primarily used in the past to study minority outcomes in the United Kingdom.2 We send resumes in response to help-wanted ads in Chicago and Boston newspapers and measure callback for interview for each sent resume. We
experimentally manipulate perception of race via the name of the fictitious job applicant. We randomly assign very White-sounding names (such as Emily Walsh or Greg Baker) to half the resumes and very African-American-sounding names (such as Lakisha Washington or Jamal Jones) to the other half. Because we are also interested in how credentials affect the racial gap in callback, we experimentally vary the quality of the resumes used in response to a given ad. Higher-quality applicants have on average a little more labor market experience and fewer holes in their employment history; they are also more likely to have an e-mail address, have completed some certification degree, possess foreign language skills, or have been awarded some honor.3 In practice, we typically send four resumes in response to each ad: two higher-quality and two lower-quality ones. We randomly assign to one of the higher- and one of the lower-quality resumes an African-American-sounding name. In total, we respond to over 1,300 employment ads in the sales, administrative support, clerical, and customer services job categories and send nearly 5,000 resumes. The ads we respond to cover a large spectrum of job quality, from cashier work at retail establishments and clerical work in a mail room, to office and sales management positions.

We find large racial differences in callback rates.4 Applicants with White names need to send about 10 resumes to get one callback whereas applicants with African-American names need to send about 15 resumes. This 50-percent gap in callback is statistically significant. A White name yields as many more callbacks as an additional eight years of experience on a resume. Since applicants' names are randomly assigned, this gap can only be attributed to the name manipulation.

Race also affects the reward to having a better resume.
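The arithmetic behind the callback figures, and the per-ad assignment just described, can be sketched in a few lines. This is an illustrative calculation, not the authors' code: the rates are the rounded figures quoted above ("about 10" and "about 15" resumes per callback) treated as exact, and every function and variable name is hypothetical.

```python
import random

# Rounded figures quoted in the text, assumed exact for illustration only.
RESUMES_PER_CALLBACK_WHITE = 10
RESUMES_PER_CALLBACK_AFRICAN_AMERICAN = 15


def callback_rate(resumes_per_callback):
    """Turn 'resumes sent per callback' into a callback probability."""
    return 1.0 / resumes_per_callback


def relative_gap(rate_a, rate_b):
    """Proportional excess of rate_a over rate_b (0.5 means 50 percent more)."""
    return rate_a / rate_b - 1.0


def assign_names_for_ad(rng):
    """Hypothetical sketch of the per-ad design: two higher-quality and two
    lower-quality resumes, with one resume in each quality tier drawn at
    random to carry an African-American-sounding name."""
    assignment = []
    for quality in ("high", "low"):
        name_types = ["African-American", "White"]
        rng.shuffle(name_types)  # random order decides which resume gets which name type
        for slot, name_type in enumerate(name_types):
            assignment.append((quality, slot, name_type))
    return assignment


white = callback_rate(RESUMES_PER_CALLBACK_WHITE)
african_american = callback_rate(RESUMES_PER_CALLBACK_AFRICAN_AMERICAN)
gap = relative_gap(white, african_american)

print(f"White: {white:.3f}, African-American: {african_american:.3f}, gap: {gap:.0%}")
print(assign_names_for_ad(random.Random(0)))
```

Under these rounded inputs the two callback rates are roughly 10 percent and 6.7 percent, which is where the 50-percent relative gap comes from; the gap in percentage points is much smaller than the relative gap.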
Whites with higher-quality resumes receive nearly 30-percent more callbacks than Whites with lower-quality resumes. On the other hand, having a higher-quality resume has a smaller effect for African-Americans. In other words, the gap between Whites and African-Americans widens with resume quality. While one may have expected improved credentials to alleviate employers' fear that African-American applicants are deficient in some unobservable skills, this is not the case in our data.5

The experiment also reveals several other aspects of the differential treatment by race. First, since we randomly assign applicants' postal addresses to the resumes, we can study the effect of neighborhood of residence on the likelihood of callback. We find that living in a wealthier (or more educated or Whiter) neighborhood increases callback rates. But, interestingly, African-Americans are not helped more than Whites by living in a "better" neighborhood. Second, the racial gap we measure in different industries does not appear correlated to Census-based measures of the racial gap in wages. The same is true for the racial gap we measure in different occupations. In fact, we find that the racial gaps in callback are statistically indistinguishable across all the occupation and industry categories covered in the experiment. Federal contractors, who are thought to be more severely constrained by affirmative action laws, do not treat the African-American resumes more preferentially; neither do larger employers or employers who explicitly state that they are "Equal Opportunity Employers." In Chicago, we find a slightly smaller racial gap when employers are located in more African-American neighborhoods.

3 In creating the higher-quality resumes, we deliberately make small changes in credentials so as to minimize the risk of overqualification.

4 For ease of exposition, we refer to the effects uncovered in this experiment as racial differences. Technically, however, these effects are about the racial soundingness of names. We briefly discuss below the potential confounds between name and race. A more extensive discussion is offered in Section IV, subsection B.

5 These results contrast with the view, mostly based on nonexperimental evidence, that African-Americans receive higher returns to skills. For example, estimating earnings regressions on several decades of Census data, James J. Heckman et al. (2001) show that African-Americans experience higher returns to a high school degree than Whites do.

The rest of the paper is organized as follows. Section I compares this experiment to earlier work on racial discrimination, and most notably to the labor market audit studies. We describe the experimental design in Section II and present the results in Section III, subsection A. In Section IV, we discuss possible interpretations of our results, focusing especially on two issues. First, we examine whether the
race-specific names we have chosen might also proxy for social class above and beyond the race of the applicant. Using birth certificate data on mother's education for the different first names used in our sample, we find little relationship between social background and the name-specific callback rates.6 Second, we discuss how our results map back to the different models of discrimination proposed in the economics literature. In doing so, we focus on two important results: the lower returns to credentials for African-Americans and the relative homogeneity of the racial gap across occupations and industries. We conclude that existing models do a poor job of explaining the full set of findings. Section V concludes.

I. Previous Research

With conventional labor force and household surveys, it is difficult to study whether differential treatment occurs in the labor market.7 Armed only with survey data, researchers usually measure differential treatment by comparing the labor market performance of Whites and African-Americans (or men and women) for which they observe similar sets of skills. But such comparisons can be quite misleading. Standard labor force surveys do not contain all the characteristics that employers observe when hiring, promoting, or setting wages. So one can never be sure that the minority and nonminority workers being compared are truly similar from the employers' perspective. As a consequence, any measured differences in outcomes could be attributed to these unobserved (to the researcher) factors.

This difficulty with conventional data has led some authors to instead rely on pseudo-experiments.8 Claudia Goldin and Cecilia Rouse (2000), for example, examine the effect of blind auditioning on the hiring process of orchestras. By observing the treatment of female candidates before and after the introduction of blind auditions, they try to measure the amount of sex discrimination. When such pseudo-experiments can be found, the resulting study can be very informative; but finding such experiments has proven to be extremely challenging.

A different set of studies, known as audit studies, attempts to place comparable minority and White actors into actual social and economic settings and measure how each group fares in these settings.9 Labor market audit studies send comparable minority (African-American or Hispanic) and White auditors in for interviews and measure whether one is more likely to get the job than the other.10 While the results vary somewhat across studies, minority auditors tend to perform worse on average: they are less likely to get called back for a second interview and, conditional on getting called back, less likely to get hired.

These audit studies provide some of the cleanest nonlaboratory evidence of differential treatment by race. But they also have weaknesses, most of which have been highlighted in Heckman and Siegelman (1992) and Heckman (1998). First, these studies require that both members of the auditor pair are identical in all dimensions that might affect productivity in employers' eyes, except for race. To accomplish this, researchers typically match auditors on several characteristics (height, weight, age, dialect, dressing style, hairdo) and train them for several days to coordinate interviewing styles. Yet, critics note that this is unlikely to erase the numerous differences that exist between the auditors in a pair.

6 We also argue that a social class interpretation would find it hard to explain some of our findings, such as why living in a better neighborhood does not increase callback rates more for African-American names than for White names.

7 See Joseph G. Altonji and Rebecca M. Blank (1999) for a detailed review of the existing literature on racial discrimination in the labor market.

8 William A. Darity, Jr. and Patrick L. Mason (1998) describe an interesting nonexperimental study. Prior to the Civil Rights Act of 1964, employment ads would explicitly state racial biases, providing a direct measure of differential treatment. Of course, as Arrow (1998) mentions, discrimination was at that time "a fact too evident for detection."

9 Michael Fix and Marjery A. Turner (1998) provide a survey of many such audit studies.

10 Earlier hiring audit studies include Jerry M. Newman (1978) and Shelby J. McIntyre et al. (1980). Three more recent studies are Harry Cross et al. (1990), Franklin James and Steve W. DelCastillo (1991), and Turner et al. (1991). Heckman and Peter Siegelman (1992), Heckman (1998), and Altonji and Blank (1999) summarize these studies. See also David Neumark (1996) for a labor market audit study on gender discrimination.

Another weakness of the audit studies is that they are not double-blind. Auditors know the purpose of the study. As Turner et al. (1991)
note: "The first day of training also included an introduction to employment discrimination, equal employment opportunity, and a review of project design and methodology." This may generate conscious or subconscious motives among auditors to generate data consistent or inconsistent with their beliefs about race issues in America. As psychologists know very well, these demand effects can be quite strong. It is very difficult to ensure that auditors will not want to do "a good job." Since they know the goal of the experiment, they can alter their behavior in front of employers to express (indirectly) their own views. Even a small belief by auditors that employers treat minorities differently can result in measured differences in treatment. This effect is further magnified by the fact that auditors are not in fact seeking jobs and are therefore more free to let their beliefs affect the interview process.

Finally, audit studies are extremely expensive, making it difficult to generate large enough samples to understand nuances and possible mitigating factors. Also, these budgetary constraints worsen the problem of mismatched auditor pairs. Cost considerations force the use of a limited number of pairs of auditors, meaning that any one mismatched pair can easily drive the results. In fact, these studies generally tend to find significant differences in outcomes across pairs.

Our study circumvents these problems. First, because we only rely on resumes and not people, we can be sure to generate comparability across race. In fact, since race is randomly assigned to each resume, the same resume will sometimes be associated with an African-American name and sometimes with a White name. This guarantees that any differences we find are caused solely by the race manipulation. Second, the use of paper resumes insulates us from demand effects.
While the research assistants know the purpose of the study, our protocol allows little room for conscious or subconscious deviations from the set procedures. Moreover, we can objectively measure whether the randomization occurred as expected. This kind of objective measurement is impossible in the case of the previous audit studies. Finally, because of relatively low marginal cost, we can send out a large number of resumes. Besides giving us more precise estimates, this larger sample size also allows us to examine the nature of the differential treatment from many more angles.

II. Experimental Design

A. Creating a Bank of Resumes

The first step of the experimental design is to generate templates for the resumes to be sent. The challenge is to produce a set of realistic and representative resumes without using resumes that belong to actual job seekers. To achieve this goal, we start with resumes of actual job searchers but alter them sufficiently to create distinct resumes. The alterations maintain the structure and realism of the initial resumes without compromising their owners.

We begin with resumes posted on two job search Web sites as the basis for our artificial resumes.11 While the resumes posted on these Web sites may not be completely representative of the average job seeker, they provide a practical approximation.12 We restrict ourselves to people seeking employment in our experimental cities (Boston and Chicago). We also restrict ourselves to four occupational categories: sales, administrative support, clerical services, and customer services. Finally, we further restrict ourselves to resumes posted more than six months prior to the start of the experiment. We purge the selected resumes of the person's name and contact information.

During this process, we classify the resumes within each detailed occupational category into two groups: high and low quality.
In judging resume quality, we use criteria such as labor market experience, career profile, existence of gaps in employment, and skills listed. Such a classification is admittedly subjective but it is made independently of any race assignment on the resumes (which occurs later in the experimental design).

11 The sites are www.careerbuilder.com and www.americasjobbank.com.
12 In practice, we found large variation in skill levels among people posting their resumes on these sites.

To further reinforce the quality gap between the two sets of resumes, we add to each high-quality resume a subset of the following features: summer or while-at-school employment experience, volunteering experience, extra computer skills, certification degrees, foreign language skills, honors, or some military
experience. This resume quality manipulation needs to be somewhat subtle to avoid making a higher-quality job applicant overqualified for a given job. We try to avoid this problem by making sure that the features listed above are not all added at once to a given resume. This leaves us with a high-quality and a low-quality pool of resumes.13

To minimize similarity to actual job seekers, we use resumes from Boston job seekers to form templates for the resumes to be sent out in Chicago and use resumes from Chicago job seekers to form templates for the resumes to be sent out in Boston. To implement this migration, we alter the names of the schools and previous employers on the resumes. More specifically, for each Boston resume, we use the Chicago resumes to replace a Boston school with a Chicago school.14 We also use the Chicago resumes to replace a Boston employer with a Chicago employer in the same industry. We use a similar procedure to migrate Chicago resumes to Boston.15 This produces distinct but realistic looking resumes, similar in their education and career profiles to this subpopulation of job searchers.16

13 In Section III, subsection B, and Table 3, we provide a detailed summary of resume characteristics by quality level.
14 We try as much as possible to match high schools and colleges on quality and demographic characteristics.

B. Identities of Fictitious Applicants

The next step is to generate identities for the fictitious job applicants: names, telephone numbers, postal addresses, and (possibly) e-mail addresses. The choice of names is crucial to our experiment.17 To decide on which names are uniquely African-American and which are uniquely White, we use name frequency data calculated from birth certificates of all babies born in Massachusetts between 1974 and 1979. We tabulate these data by race to determine
which names are distinctively White and which are distinctively African-American. Distinctive names are those that have the highest ratio of frequency in one racial group to frequency in the other racial group.

15 Note that for applicants with schooling or work experience outside of the Boston or Chicago areas, we leave the school or employer name unchanged.
16 We also generate a set of different fonts, layouts, and cover letters to further differentiate the resumes. These are applied at the time the resumes are sent out.
17 We chose name over other potential manipulations of race, such as affiliation with a minority group, because we felt such affiliations may especially convey more than race.

As a check of distinctiveness, we conducted a survey in various public areas in Chicago. Each respondent was asked to assess features of a person with a particular name, one of which is race. For each name, 30 respondents were asked to identify the name as either "White," "African-American," "Other," or "Cannot Tell." In general, the names led respondents to readily attribute the expected race for the person but there were a few exceptions and these names were disregarded.18

The final list of first names used for this study is shown in Appendix Table A1. The table reports the relative likelihood of the names for the Whites and African-Americans in the Massachusetts birth certificates data as well as the recognition rate in the field survey.19 As Appendix Table A1 indicates, the African-American first names used in the experiment are quite common in the population. This suggests that by using these names as an indicator of race, we are actually covering a rather large segment of the African-American population.20

Applicants in each race/sex/city/resume quality cell are allocated the same phone number. This guarantees that we can precisely track employer callbacks in each of these cells. The phone lines we use are virtual ones with only a voice mailbox attached to them.
A similar outgoing message is recorded on each of the voice mailboxes but each message is recorded by someone of the appropriate race and gender.

18 For example, Maurice and Jerome are distinctively African-American names in a frequency sense yet are not perceived as such by many people.
19 So many of the names show a likelihood ratio of ∞ because there is censoring of the data at five births. If there are fewer than five babies in any race/name cell, it is censored (and we do not know whether a cell has zero or was censored). This is primarily a problem for the computation of how many African-American babies have "White" names.
20 We also tried to use more White-sounding last names for White applicants and more African-American-sounding last names for African-American applicants. The last names used for White applicants are: Baker, Kelly, McCarthy, Murphy, Murray, O'Brien, Ryan, Sullivan, and Walsh. The last names used for African-American applicants are: Jackson, Jones, Robinson, Washington, and Williams.
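The distinctiveness ranking described in this subsection, including the censoring mentioned in footnote 19, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the name labels, birth counts, and group totals below are hypothetical placeholders, not values from the Massachusetts birth certificate data, and the function name `likelihood_ratio` is introduced here for exposition.

```python
import math


def likelihood_ratio(count_a, total_a, count_b, total_b):
    """Ratio of a name's frequency in group A to its frequency in group B.

    A count of None stands for a censored cell (fewer than five births).
    When the group-B cell is censored, the ratio is reported as infinity,
    which is how an entry of "infinity" can arise in a table of ratios.
    """
    if count_a is None:
        return None  # too rare in group A to be ranked at all
    freq_a = count_a / total_a
    if count_b is None:
        return math.inf
    return freq_a / (count_b / total_b)


# Hypothetical counts and group sizes, for illustration only.
TOTAL_WHITE, TOTAL_AFRICAN_AMERICAN = 400_000, 40_000
births = {
    "NameX": {"white": 12, "african_american": 480},
    "NameY": {"white": None, "african_american": 350},  # White cell censored
    "NameZ": {"white": 900, "african_american": 60},
}

# Frequency among African-American births relative to White births.
ratios = {
    name: likelihood_ratio(
        c["african_american"], TOTAL_AFRICAN_AMERICAN, c["white"], TOTAL_WHITE
    )
    for name, c in births.items()
}

for name, ratio in ratios.items():
    print(name, ratio)
```

Names with the largest ratios would be the candidates for the distinctively African-American list; ranking by the inverse ratio would give the distinctively White candidates. A censored cell cannot be told apart from a true zero, which is why the ratio is left at infinity instead of being estimated.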
Since we allocate the same phone number for applicants with different names, we cannot use a person's name in the outgoing message.

While we do not expect positive feedback from an employer to take place via postal mail, resumes still need postal addresses. We therefore construct fictitious addresses based on real streets in Boston and Chicago using the White Pages. We select up to three addresses in each 5-digit zip code in Boston and Chicago. Within cities, we randomly assign addresses across all resumes. We also create eight e-mail addresses, four for Chicago and four for Boston.21 These e-mail addresses are neutral with respect to both race and sex. Not all applicants are given an e-mail address. The e-mail addresses are used almost exclusively for the higher-quality resumes. This procedure leaves us with a bank of names, phone numbers, addresses, and e-mail addresses that we can assign to the template resumes when responding to the employment ads.

C. Responding to Ads

The experiment was carried out between July 2001 and January 2002 in Boston and between July 2001 and May 2002 in Chicago.22 Over that period, we surveyed all employment ads in the Sunday editions of The Boston Globe and The Chicago Tribune in the sales, administrative support, and clerical and customer services sections. We eliminate any ad where applicants were asked to call or appear in person. In fact, most of the ads we surveyed in these job categories ask for applicants to fax in or (more rarely) mail in their resume. We log the name (when available) and contact information for each employer, along with any information on the position advertised and specific requirements (such as education, experience, or computer skills). We also record whether or not the ad explicitly states that the employer is an equal opportunity employer.
21 The e-mail addresses are registered on Yahoo.com, Angelfire.com, or Hotmail.com.
22 This period spans tighter and slacker labor markets. In our data, this is apparent as callback rates (and number of new ads) dropped after September 11, 2001. Interestingly, however, the racial gap we measure is the same across these two periods.

For each ad, we use the bank of resumes to sample four resumes (two high-quality and two low-quality) that fit the job description and requirements as closely as possible.23 In some cases, we slightly alter the resumes to improve the quality of the match, such as by adding the knowledge of a specific software program.

One of the high- and one of the low-quality resumes selected are then drawn at random to receive African-American names, the other high- and low-quality resumes receive White names.24 We use male and female names for sales jobs, whereas we use nearly exclusively female names for administrative and clerical jobs to increase callback rates.25 Based on sex, race, city, and resume quality, we assign a resume the appropriate phone number. We also select at random a postal address. Finally, e-mail addresses are added to most of the high-quality resumes.26 The final resumes are formatted, with fonts, layout, and cover letter style chosen at random. The resumes are then faxed (or in a few cases mailed) to the employer. All in all, we respond to more than 1,300 employment ads over the entire sample period and send close to 5,000 resumes.

D. Measuring Responses

We measure whether a given resume elicits a callback or e-mail back for an interview. For each phone or e-mail response, we use the content of the message left by the employer (name of the applicant, company name, telephone number for contact) to match the response to the corresponding resume-ad pair.27 Any attempt by employers to contact applicants via postal mail cannot be measured in our experiment since the addresses are fictitious.
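The name-assignment step of subsection C (one resume in each quality pair drawn at random to receive an African-American name, the other a White name, with no name repeated within an ad) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function `assign_names`, the resume identifiers, and the seed are hypothetical, the name pools contain only the four first names from the paper's title, and the sex, phone number, and address dimensions are left out.

```python
import random

# Tiny example pools using the first names in the paper's title.
# The pools actually used in the experiment are larger (Appendix Table A1).
WHITE_NAMES = ["Emily", "Greg"]
AFRICAN_AMERICAN_NAMES = ["Lakisha", "Jamal"]


def assign_names(high_pair, low_pair, rng):
    """Randomly assign a race (via a name) within each quality pair.

    high_pair and low_pair each hold two resume identifiers. Within a
    pair, one resume is drawn at random to receive an African-American
    name and the other receives a White name. Names are drawn without
    replacement, so no name appears twice for the same ad.
    """
    white_pool = list(WHITE_NAMES)
    african_american_pool = list(AFRICAN_AMERICAN_NAMES)
    rng.shuffle(white_pool)
    rng.shuffle(african_american_pool)

    assignments = []
    for quality, pair in (("high", high_pair), ("low", low_pair)):
        # Sampling both items returns the pair in random order.
        first, second = rng.sample(list(pair), 2)
        assignments.append(
            (first, quality, "African-American", african_american_pool.pop())
        )
        assignments.append((second, quality, "White", white_pool.pop()))
    return assignments


rng = random.Random(2004)  # fixed seed so the sketch is reproducible
result = assign_names(("H1", "H2"), ("L1", "L2"), rng)
for row in result:
    print(row)
```

Because the draw is made separately within the high-quality and the low-quality pair, race is balanced across quality levels for every ad by construction.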
23 In some instances, our resume bank does not have four resumes that are appropriate matches for a given ad. In such instances, we send only two resumes.
24 Though the same names are repeatedly used in our experiment, we guarantee that no given ad receives multiple resumes with the same name.
25 Male names were used for a few administrative jobs in the first month of the experiment.
26 In the first month of the experiment, a few high-quality resumes were sent without e-mail addresses and a few low-quality resumes were given e-mail addresses. See Table 3 for details.
27 Very few employers used e-mail to contact an applicant back.

Several human resource managers confirmed to us that
employers rarely, if ever, contact applicants via postal mail to set up interviews.

TABLE 1-MEAN CALLBACK RATES BY RACIAL SOUNDINGNESS OF NAMES

Columns: (1) Percent callback for White names; (2) Percent callback for African-American names; (3) Ratio; (4) Percent difference (p-value).
Sample rows: All sent resumes; Chicago; Boston; Females; Females in administrative jobs; Females in sales jobs; Males.

Notes: The table reports, for the entire sample and different subsamples of sent resumes, the callback rates for applicants with a White-sounding name (column 1) and an African-American-sounding name (column 2), as well as the ratio (column 3) and difference (column 4) of these callback rates. In brackets in each cell is the number of resumes sent in that cell. Column 4 also reports the p-value for a test of proportion testing the null hypothesis that the callback rates are equal across racial groups.

E. Weaknesses of the Experiment

We have already highlighted the strengths of this experiment relative to previous audit studies. We now discuss its weaknesses. First, our outcome measure is crude, even relative to the previous audit studies. Ultimately, one cares about whether an applicant gets the job and about the wage offered conditional on getting the job. Our procedure, however, simply measures callbacks for interviews. To the extent that the search process has even moderate frictions, one would expect that reduced interview rates would translate into reduced job offers. However, we are not able to translate our results into gaps in hiring rates or gaps in earnings.

Another weakness is that the resumes do not directly report race but instead suggest race through personal names. This leads to various sources of concern. First, while the names are chosen to make race salient, some employers may simply not notice the names or not recognize their racial content.
On a related note, because we are not assigning race but only race-specific names, our results are not representative of the average African-American (who may not have such a racially distinct name).28 We return to this issue in Section IV, subsection B.

Finally, and this is an issue pervasive in both our study and the pair-matching audit studies, newspaper ads represent only one channel for job search. As is well known from previous work, social networks are another common means through which people find jobs and one that clearly cannot be studied here. This omission could qualitatively affect our results if African-Americans use social networks more or if employers who rely more on networks differentiate less by race.29

28 As Appendix Table A1 indicates, the African-American names we use are, however, quite common among African-Americans, making this less of a concern.
29 In fact, there is some evidence that African-Americans may rely less on social networks for their job search (Harry J. Holzer, 1987).

III. Results

A. Is There a Racial Gap in Callback?

Table 1 tabulates average callback rates by racial soundingness of names. Included in brackets under each rate is the number of resumes sent in that cell. Row 1 presents our results for the full data set. Resumes with White
998 THE AMERICAN ECONOMIC REVIEW SEPTEMBER 2004

names have a 9.65 percent chance of receiving a callback. Equivalent resumes with African-American names have a 6.45 percent chance of being called back. This represents a difference in callback rates of 3.20 percentage points, or 50 percent, that can solely be attributed to the name manipulation. Column 4 shows that this difference is statistically significant.30 Put in other words, these results imply that a White applicant should expect on average one callback for every 10 ads she or he applies to; on the other hand, an African-American applicant would need to apply to about 15 different ads to achieve the same result.31

How large are these effects? While the cost of sending additional resumes might not be large per se, this 50-percent gap could be quite substantial when compared to the rate of arrival of new job openings. In our own study, the biggest constraining factor in sending more resumes was the limited number of new job openings each week. Another way to benchmark the measured return to a White name is to compare it to the returns to other resume characteristics. For example, in Table 5, we will show that, at the average number of years of experience in our sample, an extra year of experience increases the likelihood of a callback by a 0.4 percentage point. Based on this point estimate, the return to a White name is equivalent to about eight additional years of experience.

Rows 2 and 3 break down the full sample of sent resumes into the Boston and Chicago markets. About 20 percent more resumes were sent in Chicago than in Boston. The average callback rate (across races) is lower in Chicago than in Boston. This might reflect differences in labor market conditions across the two cities over the experimental period or maybe differences in the ability of the MIT and Chicago teams of research assistants in selecting resumes that were good matches for a given help-wanted ad. The percentage difference in callback rates is, however, strikingly similar across both cities. White applicants are 49 percent more likely than African-American applicants to receive a callback in Chicago and 50 percent more likely in Boston. These racial differences are statistically significant in both cities.

Finally, rows 4 to 7 break down the full sample into female and male applicants. Row 4 displays the average results for all female names while rows 5 and 6 break the female sample into administrative (row 5) and sales jobs (row 6); row 7 displays the average results for all male names. As noted earlier, female names were used in both sales and administrative job openings whereas male names were used close to exclusively for sales openings.32 Looking across occupations, we find a significant racial gap in callbacks for both males (52 percent) and females (49 percent). Comparing males to females in sales occupations, we find a larger racial gap among males (52 percent versus 22 percent). Interestingly, females in sales jobs appear to receive more callbacks than males; however, this (reverse) gender gap is statistically insignificant and economically much smaller than any of the racial gaps discussed above.

Rather than studying the distribution of callbacks at the applicant level, one can also tabulate the distribution of callbacks at the employment-ad level. In Table 2, we compute the fraction of employers that treat White and African-American applicants equally, the fraction of employers that favor White applicants and the fraction of employers that favor African-American applicants. Because we send up to four resumes in response to each sampled ad, the three categories above can each take three different forms. Equal treatment occurs when either no applicant gets called back, one White and one African-American get called back or two Whites and two African-Americans get called back. Whites are favored when either only one White gets called back, two Whites and no African-American get called back or two Whites and one African-American get called back. African-Americans are favored in all other cases.

As Table 2 indicates, equal treatment occurs for about 88 percent of the help-wanted ads. As expected, the major source of equal treatment comes from the high fraction of ads for which

30 These statistical tests assume independence of callbacks. We have, however, verified that the results stay significant when we assume that the callbacks are correlated either at the employer or first-name level.

31 This obviously assumes that African-American applicants cannot assess a priori which firms are more likely to treat them more or less favorably.

32 Only about 6 percent of all male resumes were sent in response to an administrative job opening.
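The headline figures on this page can be rechecked with a few lines of arithmetic. The following is a minimal sketch that uses only the numbers quoted in the text; the variable names are illustrative and do not come from the paper.

```python
# Callback rates quoted in the text, in percent.
white_rate = 9.65
african_american_rate = 6.45

# Absolute gap in percentage points (about 3.20).
gap_pp = white_rate - african_american_rate

# Relative gap: how much more likely a White name is to get a callback.
relative_gap = gap_pp / african_american_rate

# Expected number of ads per callback for each group.
ads_per_callback_white = 100 / white_rate
ads_per_callback_african_american = 100 / african_american_rate

# Return to one extra year of experience quoted from Table 5, in percentage points.
experience_return_pp = 0.4
equivalent_years = gap_pp / experience_return_pp

print(f"gap: {gap_pp:.2f} points, {relative_gap:.1%} relative")
print(f"ads per callback: {ads_per_callback_white:.1f} vs "
      f"{ads_per_callback_african_american:.1f}")
print(f"equivalent years of experience: {equivalent_years:.1f}")
```

The relative gap comes out just under 50 percent, and the ads-per-callback figures are roughly 10 and 15.5, in line with the rounded values reported in the text.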
VOL. 94 NO. 4 BERTRAND AND MULLAINATHAN: RACE IN THE LABOR MARKET 999

TABLE 2--DISTRIBUTION OF CALLBACKS BY EMPLOYMENT AD

Equal Treatment:                   88.13 percent   [1,166]
    No Callback                    83.37           [1,103]
Whites Favored (WF):                8.39 percent   [111]
    1W + 0B                         5.59           [74]
African-Americans Favored (BF):     3.48 percent   [46]
    1B + 0W                         2.49           [33]
H0: WF = BF                        p = 0.0000

Notes: This table documents the distribution of callbacks at the employment-ad level. "No Callback" is the percent of ads for which none of the fictitious applicants received a callback. "1W + 1B" is the percent of ads for which exactly one White and one African-American applicant received a callback. "2W + 2B" is the percent of ads for which exactly two White applicants and two African-American applicants received a callback. "Equal Treatment" is defined as the sum of "No Callback," "1W + 1B," and "2W + 2B." "1W + 0B" is the percent of ads for which exactly one White applicant and no African-American applicant received a callback. "2W + 0B" is the percent of ads for which exactly two White applicants and no African-American applicant received a callback. "2W + 1B" is the percent of ads for which exactly two White applicants and one African-American applicant received a callback. "Whites Favored" is defined as the sum of "1W + 0B," "2W + 0B," and "2W + 1B." "1B + 0W" is the percent of ads for which exactly one African-American applicant and no White applicant received a callback. "2B + 0W" is the percent of ads for which exactly two African-American applicants and no White applicant received a callback. "2B + 1W" is the percent of ads for which exactly two African-American applicants and one White applicant received a callback. "African-Americans Favored" is defined as the sum of "1B + 0W," "2B + 0W," and "2B + 1W." In brackets in each cell is the number of employment ads in that cell. "H0: WF = BF" reports the p-value for a test of symmetry between the proportion of employers that favor White names and the proportion of employers that favor African-American names.
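The ad-level classification described in the notes above reduces to comparing the number of White and African-American callbacks for each ad. The sketch below illustrates that rule and then runs one possible symmetry test on the bracketed counts in Table 2 (111 ads favoring Whites, 46 favoring African-Americans). The text does not say which test statistic the authors used, so the exact two-sided sign test here is an assumption, and the function names are hypothetical.

```python
from math import comb


def classify_ad(white_callbacks: int, african_american_callbacks: int) -> str:
    """Assign an ad to one of the three Table 2 categories."""
    if white_callbacks == african_american_callbacks:
        return "equal_treatment"  # No Callback, 1W + 1B, 2W + 2B
    if white_callbacks > african_american_callbacks:
        return "whites_favored"  # 1W + 0B, 2W + 0B, 2W + 1B
    return "african_americans_favored"  # 1B + 0W, 2B + 0W, 2B + 1W


def sign_test_p_value(n_whites_favored: int, n_african_americans_favored: int) -> float:
    """Exact two-sided sign test of H0: WF = BF (an assumed test, not the paper's).

    Under H0, each ad with unequal treatment is equally likely to favor
    either group, so the count favoring one group is Binomial(n, 0.5).
    """
    n = n_whites_favored + n_african_americans_favored
    k = max(n_whites_favored, n_african_americans_favored)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


# Counts taken from the brackets in Table 2.
p_value = sign_test_p_value(111, 46)
print(classify_ad(2, 1), f"p = {p_value:.4f}")
```

Under this assumed test the p-value is far below 0.0001, so it prints as 0.0000 at four decimal places, in line with the value reported in the table.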
no callbacks are recorded (83 percent of the ads). Whites are favored by nearly 8.4 percent of the employers, with a majority of these employers contacting exactly one White applicant. African-Americans, on the other hand, are favored by only about 3.5 percent of employers. We formally test whether there is symmetry in the favoring of Whites over African-Americans and African-Americans over Whites. We find that the difference between the fraction of employers favoring Whites and the fraction of employers favoring African-Americans is statistically very significant (p = 0.0000).

B. Do African-Americans Receive Different Returns to Resume Quality?

Our results so far demonstrate a substantial gap in callback based on applicants' names. Next, we would like to learn more about the factors that may influence this gap. More specifically, we ask how employers respond to improvements in African-American applicants' credentials. To answer this question, we examine how the racial gap in callback varies by resume quality.

As we explained in Section II, for most of the employment ads we respond to, we send four different resumes: two higher-quality and two lower-quality ones. Table 3 gives a better sense of which factors enter into this subjective classification. Table 3 displays means and standard deviations of the most relevant resume characteristics for the full sample (column 1), as well as broken down by race (columns 2 and 3) and resume quality (columns 4 and 5). Since applicants' names are randomized, there is no difference in resume characteristics by race. Columns 4 and 5 document the objective differences between resumes subjectively classified as high and low quality. Higher-quality applicants have on average close to an extra year of labor market experience, fewer employment holes (where an employment hole is defined as a period of at least six months without a reported job), are more likely to have worked while at school, and to report some military experience. Also, higher-quality applicants are more likely to have an e-mail address, to have received some honors, and to list some computer skills and other special skills (such as a certification degree or foreign language skills) on their resume. Note that the higher- and lower-quality resumes do not differ on average with regard to