Cecilia Hyunjung Mo and Katharine M. Conn

closeness to Hispanics by student population is statistically meaningful (p < 0.001).
Our results also indicate that these effects are long-lasting, given that the estimated effects are the average effects for participants six months to seven years after the completion of TFA service, and the robustness of our effects is not sensitive to the exclusion of more recent cohorts. For example, when we examine the cohort-by-cohort effects of our racial injustice measure, we find that the impact of participation in TFA on the reduction of racial resentment ranges from 6.4 to 15.4 percentage points in magnitude (see Figure E.8(a) in Online Appendix E). The largest effect is for the 2013 cohort; however, we do not see strong evidence of a decay effect. Notably, when we examine the Skin-Tone Implicit Association Test, we see that the reduction in implicit racial prejudice becomes slightly stronger over time (see Figure E.8(b) in Online Appendix E).32 As noted by Paluck (2016) in her meta-analyses of prejudice research, while there are very few studies of real interventions that reduce prejudice, there are even fewer that examine long-term effects, where even just three months is considered long-term. By examining the impact of service on participants at least six months after program participation, we contribute to a relatively scant but important body of causal research on the long-term effects of interventions on prejudice reduction.

ROBUSTNESS CHECKS

To assess the robustness of our findings, we conduct a number of tests. We begin by reexamining the racial prejudice questions on closeness. First, there is no reason to believe that participation in a service program like TFA, which focuses on education, would have any impact on attitudes toward the elderly or Christian communities.
As a placebo test, we included “the elderly” and “Christians” in the battery of questions asking which groups an individual feels “particularly close” to. Reassuringly, TFA participation does not alter feelings of closeness to the elderly (−3.49 percentage points, p = 0.249) or Christians (−0.17 percentage points, p = 0.969; see Figure E.9 in Online Appendix E).

While it is highly unlikely that applicants right below the cutoff and applicants right above the cutoff meaningfully differed at the pretreatment stage on ideological perspectives that correlate with our outcome measures, we conduct four robustness checks to address concerns that our fuzzy RDD results are biased. First, we examined the average causal effects when we conducted an intent-to-treat (ITT) analysis in Table E.8 in Online Appendix E. In other words, when treatment assignment is based upon admission, the “treatment” group also includes those who were assigned to receive the treatment but did not receive it. If participation causes shifts in attitudes and beliefs on the dimensions we are interested in, we would expect the inclusion of nonmatriculants to result in an attenuation of our effects.

…is Hispanic (black) and the reported effects when the minority of the student population is Hispanic (black) because the pooled analyses include observations that were dropped in the subgroup analyses due to missing student population data.
32 Effects are not consistently statistically significant for each cohort. However, we are underpowered to detect effects when we examine each cohort separately; the RDD approach is data-intensive, as it focuses on individuals close to the cutoff. By mapping the effects by each cohort, we see that one recent cohort is not responsible for the pooled effect across multiple cohorts.
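The ITT attenuation logic described above can be illustrated with a small numerical sketch. Under one-sided noncompliance, the ITT effect equals the TOT effect scaled by the compliance (matriculation) rate, so a high matriculation rate implies only modest attenuation. All figures below are hypothetical for illustration; they are not the paper's estimates.

```python
# Hypothetical sketch of ITT attenuation under one-sided noncompliance:
# ITT = compliance_rate * TOT when nonmatriculants' outcomes are
# unaffected by admission itself. Numbers are invented for illustration.

def itt_from_tot(tot_effect_pp: float, compliance_rate: float) -> float:
    """Return the ITT effect implied by a TOT effect (in percentage
    points) and the share of admits who actually matriculate."""
    return compliance_rate * tot_effect_pp

tot = 10.0          # hypothetical TOT effect, percentage points
compliance = 0.90   # hypothetical share of admits who matriculate

itt = itt_from_tot(tot, compliance)
print(f"TOT = {tot:.1f} pp, ITT = {itt:.1f} pp, "
      f"attenuation = {tot - itt:.1f} pp")
```

With compliance near one, the ITT and TOT estimates differ by only a fraction of the effect size, which is consistent with attenuation on the order of a percentage point or two rather than a sign or significance change.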
Indeed, when we look at the ITT effect sizes rather than the treatment-on-the-treated (TOT) or complier average treatment effect sizes, each of the ITT effect sizes is smaller than the corresponding TOT effect size by 0.20 to 1.8 percentage points. However, with the majority of those assigned to the “treatment” group receiving the treatment, statistical significance (or insignificance) for each outcome never changes.

Second, we focused only on those who were admitted into TFA, and compared matriculants to nonmatriculants. If applicant scores are such that one needs particular ideological preconceptions to be admitted, then by focusing on those who were admitted into the program, we restrict our study population to those who have these preconceptions. We conducted regression analyses of each outcome measure on the admitted applicant’s matriculation decision, with controls for the applicant’s selection score, all observable demographic characteristics, and application year. We find that the relationship between matriculation and each outcome measure of interest is identical to what we see in our RDD analyses. Moreover, these relationships are statistically meaningful for 25 of the 26 outcome measures (see Figure E.10 in Online Appendix E).

Third, we leverage the fact that we have access to one component of the overall selection score that is likely correlated with perspective-taking: demonstrating “respect for individuals’ diverse experiences and the ability to work effectively with people from a variety of backgrounds.”33 Reassuringly, we find that there is no meaningful difference in this score between those who were barely admitted and barely rejected (p = 0.804).

Fourth, we take advantage of the data we have on current participants to assess whether the observed effects are detectable at the outset.
While there is no data on participants before they begin the two-year program, we can take advantage of data we have on individuals who have participated in the program for fewer than six months—the 2015 cohort between the months of October and December. As shown in Figure E.11 in Online Appendix E, when we examine the effect size of TFA participation for those who received a smaller “dose” of the program, we see evidence that the effects we observe did not exist pretreatment.34 For those who only began receiving the treatment, effect sizes are never statistically meaningful. One could note that this is an issue of statistical power. However, apart from two questions out of the 26 outcomes we consider, differences between participants and nonparticipants are closer to 0 and/or of an opposite direction than our

33 This description was provided by TFA’s admissions team.
34 When we created an index combining variables, we only consider the index for this analytical exercise.

https://doi.org/10.1017/S0003055418000412
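Fuzzy RDD estimates of the kind reported above are conventionally obtained by local two-stage least squares (2SLS), with crossing the admission cutoff serving as an instrument for actual participation. The simulation below is a minimal sketch of that estimator in plain numpy; the data-generating process, bandwidth, compliance rate, and effect size are invented for illustration and are not the authors' data or specification.

```python
# Minimal simulation-based sketch of a fuzzy RDD estimated by 2SLS.
# Crossing the cutoff on a centered selection score instruments
# participation; all parameters below are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
score = rng.uniform(-1, 1, n)           # centered selection score
above = (score >= 0).astype(float)      # cutoff indicator (instrument)
# Fuzzy takeup: crossing the cutoff sharply raises participation.
participate = (rng.uniform(size=n) < 0.1 + 0.8 * above).astype(float)
true_effect = 2.0
y = 1.5 * score + true_effect * participate + rng.normal(0, 1, n)

# Restrict to a local sample within a bandwidth around the cutoff.
h = 0.5
m = np.abs(score) < h
X_exog = np.column_stack([np.ones(m.sum()), score[m]])

# Stage 1: project participation on the instrument and controls.
Z = np.column_stack([X_exog, above[m]])
d_hat = Z @ np.linalg.lstsq(Z, participate[m], rcond=None)[0]

# Stage 2: regress the outcome on fitted participation and controls;
# the coefficient on fitted participation is the 2SLS (TOT) estimate.
X2 = np.column_stack([X_exog, d_hat])
beta = np.linalg.lstsq(X2, y[m], rcond=None)[0]
print(f"2SLS estimate of the participation effect: {beta[-1]:.2f}")
```

The reduced-form jump in the outcome at the cutoff divided by the jump in participation (here 0.8) recovers the participation effect for compliers, which is why ITT and TOT estimates move together when compliance is high.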