Journal of Economic Literature Vol. XLII (December 2004) pp. 1009-1055

Field Experiments

GLENN W. HARRISON and JOHN A. LIST

Harrison: Department of Economics, College of Business Administration, University of Central Florida; List: Department of Agricultural and Resource Economics and Department of Economics, University of Maryland, and NBER. We are grateful to Stephen Burks, Colin Camerer, Jeffrey Carpenter, Shelby Gerking, R. Mark Isaac, Alan Krueger, John McMillan, Andreas Ortmann, Charles Plott, David Reiley, E. Elisabet Rutström, Nathaniel Wilcox, and the referees for generous comments.

1. Introduction

In some sense every empirical researcher is reporting the results of an experiment. Every researcher who behaves as if an exogenous variable varies independently of an error term effectively views their data as coming from an experiment. In some cases this belief is a matter of a priori judgement; in some cases it is based on auxiliary evidence and inference; and in some cases it is built into the design of the data collection process. But the distinction is not always as bright and clear. Testing that assumption is a recurring difficulty for applied econometricians, and the search always continues for variables that might better qualify as truly exogenous to the process under study. Similarly, the growing popularity of explicit experimental methods arises in large part from the potential for constructing the proper counterfactual.

Field experiments provide a meeting ground between these two broad approaches to empirical economic science. By examining the nature of field experiments, we seek to make it a common ground between researchers.

We approach field experiments from the perspective of the sterility of the laboratory experimental environment. We do not see the notion of a "sterile environment" as a negative, provided one recognizes its role in the research discovery process. In one sense, that sterility allows us to see in crisp relief the effects of exogenous treatments on behavior. However, lab experiments in isolation are necessarily limited in relevance for predicting field behavior, unless one wants to insist a priori that those aspects of economic behavior under study are perfectly general in a sense that we will explain. Rather, we see the beauty of lab experiments within a broader context: when they are combined with field data, they permit sharper and more convincing inference.2

2 When we talk about combining lab and field data, we do not just mean a summation of conclusions. Instead, we have in mind the two complementing each other in some functional way, much as one might conduct several lab experiments in order to tease apart potential confounds. For example, James Cox (2004) demonstrates nicely how "trust" and "reciprocity" are often confounded with "other regarding preferences," and can be better identified separately if one undertakes several types of experiments with the same population. Similarly, Alvin Roth and Michael Malouf (1979) demonstrate how the use of dollar payoffs can confound tests of cooperative game theory with less information of one kind (knowledge of the utility function of the other player), and more information of another kind (the ability to make interpersonal comparisons of monetary gain), than is usually assumed in the leading theoretical prediction.

In search of greater relevance, experimental economists are recruiting subjects in the field rather than in the classroom, using field goods rather than induced valuations, and using field context rather than abstract
terminology in instructions.3 We argue that there is something methodologically fundamental behind this trend. Field experiments differ from laboratory experiments in many ways. Although it is tempting to view field experiments as simply less controlled variants of laboratory experiments, we argue that to do so would be to seriously mischaracterize them. What passes for "control" in laboratory experiments might in fact be precisely the opposite if it is artificial to the subject or context of the task. In the end, we see field experiments as being methodologically complementary to traditional laboratory experiments.

Our primary point is that dissecting the characteristics of field experiments helps define what might be better called an ideal experiment, in the sense that one is able to observe a subject in a controlled setting but where the subject does not perceive any of the controls as being unnatural and there is no deception being practiced. At first blush, the idea that one can observe subjects in a natural setting and yet have controls might seem contradictory, but we will argue that it is not.

Our second point is that many of the characteristics of field experiments can be found in varying, correlated degrees in lab experiments. Thus, many of the characteristics that people identify with field experiments are not only found in field experiments, and should not be used to differentiate them from lab experiments.

Our third point, following from the first two, is that there is much to learn from field experiments when returning to the lab. The unexpected things that occur when one loosens control in the field are often indicators of key features of the economic transaction that have been neglected in the lab. Thus, field experiments can help one design better lab experiments, and have a methodological role quite apart from their complementarity at a substantive level.

In section 2 we offer a typology of field experiments in the literature, identifying the key characteristics defining the species. We suggest some terminology to better identify different types of field experiments, or more accurately to identify different characteristics of field experiments.
We explain this jargon n from experimental economic a bright line to define ne field and other Murphy (1973 set of hat expe egrees in a field sub ents in a e propos d to de ermine the ield context of an h the cl brea experiment:t nature of thes ect poo the nature of the information that the sub group a fr jects bring to the task,the nature of the com modity,the nature of the task or trading rules applied,the nature of the stakes,and the environment in which the subjects operate. ud not seem unnatural to th Having identified what defines a field exper oing what come iment.in section 3 we put experiments in Along s omists can identify see Ed es to in ch a the ing-Re (1) d auctions in tute field ments.I through an of rev and we four mor single-unit auction format the des broad types of field eaknesse
Harrison and List:Field Experiments 1011 literature review is necessarily selective material,language,animal,etc.,and not in although List(2004d)offers a more complete the laboratory,study,or office."This orients bibliography. us to think of the natural environment of the In sections 7 and 8 we review two types of different components of an experiment. It is impe ortant to identify what factors xperiments.One is called a social p riment so that we that it is a delib a what facto cial re di peri ma mot ivate ample when pe of some governmen progr n is implemented periment that diffe They have become popular in ce areas counterpart lab experiments o f Ronald such as employment schemes and the detec ummings,Glenn Harrison,anc Laura tion of discrimination.Their disadvantages Osborne (1995)and Cummings and Laura have been well documented.given thei Tavlor (1999).what explains the differencer political popularity,and there are several Is it the use of data from a particular marke important methodological lessons from those whose participants have selected into the debates for the design of field experiments. market instead of student subjects;the use The other is called a"natural experiment. of subjects with experience in related tasks: The idea is to recognize that som event that the use of private sports-cards as the under naturally occurs in the field ha ens to have odity instead of an environme som e of the characteristics of a field e lying co tal wd. the of streamlined me nt The an be attra ctive sc sof dat ntal thods the less-intrusive e at e cost due to mun some combin and simila cing the certai Finally,in section S we briefly exa related types of experiments of the mind.In ed.)defines the one case these are the thought exper tion ments"of theorists and statisticians.and in rgnltcgetonofohowyome the other they are the ord means "neuro-economics experiments" provided by technology. 
The dane things objective is simply to identify how they differ minded that d to be son 2.Defining Field Experiment ethingo y on to bes Th e are several ways to define One is asc ormal d b up ir onary.An Eve n tha identify w t it is that you want the I to differen o阳ere is the lab The Oxf ord English Dictionary (Secon Edition)defines the word"field"in the fol ory of the in publi policy.If th is son lowing manner: "Used attributively to hat rem denote an investigation,study,etc.,carried optimum,itdoe natic out in the natural environment of a given to the optimum
1012 Journal of Economic Literature,Vol.XLII (December 2004 differences?We believe field experiments Blackburn,Harrison,and have mature d to the point that some frame. Rutstrom 1994).Alternatively,the subject work for addressing such differences in a pool can be designed to represent a target systematic manner is necessary. population of the econom (e.g.,traders at 2.1 Criteria that Define Field Experiments the Chicago Board of Trade in Michael Haigh and John List 2004)or the general Running the risk of oversimplifying what oulation (e.g.,the Danish population in is inherently a multidimensional iss we rison.Morton Igel Lat. and Melonie Williams 2002). riment addition. the nature of the subject po monstandard subject p rience with the commodit ·the nat of the info mation that the the tasl exper apar the e nature o the task In the mg ce of the or trading rules to their trading activities in applied addition to their knowledge of e trading the nature of the stakes,and institution In abstract settings the impo the nature of the environment that the tance of this information is diminished.by subject operates in. design.and that can lead to behavioral We recognize at the outset that these changes.for example.absent such informa characteristics will often be correlated to tion.risk aversion can lead to subjects varying degrees.Nonetheless.they can be requiring a risk premium when bidding for used to propose a taxonomy of field experi objects with uncertain characteristics. 
ments that will,we believe,be valuable as The commodity itself can be an ir een lab and field experi tant part of the field.Re mental results become common ent years seen ents er can be ed as the used by valua g000 ha menters ions ove rtual g00 ie ce sa when n physical s or use s field subje ts,th abstra ly de goods latte as nonstandar in this have been the staple of xperimental eco e.But we a argu that the use of onstar nomics since Edward Chamberlin (1948) and Vernon Smith(1962),but impo ses ar ify the experiment as a field experiment.The artificiality that could be a factor influenc experiments of Cummings,Harrison,and E ing behavior.'Such influences are actually Elizabet Rutstrom(1995),for example,used of great interest,or should be.If the nature individuals recruited from churches in order of the commodity itself affects behavior in to obtain a wider range of demographic that is not accounted for by thethe characteristics than one would obtain in the has at standard college setting.The im of be st a lin domain of a a nonstandard subiect aries fron should be aw t worse ment to nt.in this ply false In on y pro et of soc cha Smith(962sgi3t mportan when dev loping statistica gh Smith (1962)does models to adjust for hypothetical bias p.121)in which monetary payoffs were employed
Harrison and List:Field Experiments 1013 understand the limitations of the generality this is an important component of the inter- of theory only via empirical testing. play between the lab and field.Early illus- Again,however,just having one field char trations of the value of this approach include acteristic,in this case a physical good,does David Grether,R.Mark Isaac,and Charles not constitute a field expe riment in any fun Plott [1981,1989],Grether and Plott [1984] damental se nse.Rutstrom (1998)sold lots and James Hong and Plott [19821. and lots of h truffles in a lab h nature of the stakes also affec ren ake in the laboratory designed to diffe th was svery much a lab expe er n the e field. we an eff tastiness commodi imilarly,Iar vior.If val Bateman et al.(1997)elicite I valuat ons ove when they are in the tens of ars,or in the pizza and dessert vouchers for a local restau ut are made in er ently when rant.While these commodities were no the price is less than one dollar,laboratory or actual pizza or dessert themselves,but field experiments with stakes below one do vouchers entitling the subiect to obtain lar could easily engender imprecise bids.Of them,they are not abstract.There are many course,people buy inexpensive goods in the r o omme field as well,but the valuation process they se might be keved to different stake levels The nature of the task that the subject is Alternatively,field experiments in relatively being asked to undertake is an in portant ortunity to eval epect that sino omtafan s within a n budget role in helping Th The vior vironme s of gel and Dan Levin illustrate th can provide super-expe heuristic a lab sett g might 0 experimenters always wondered inexperiencec terms of their whether the use of classrooms might gen propensity to fall prey to the winners' curse der role-playing behavior,and indeed this is An important question is whether the suc one of the reasons experimental economists cessful heuristics that evolve in certain field 
are generally suspicious of experiments settings“travel”to the other field and lab without salient monetary rewards. Even settings (Harrison and List 2003).Another with salient rewards however environmen- aspect of the task is the specific parameteri tal effects could remain.Rather than view zation that is adopted in the e then as uncontrolled effects,we see them as One can nt worthy of controlled study. m the so as study lab be or in a field 2.2 A Proposed Taxonomy is of of field n-specifi vior can always be ors tha diffe e exper convention again,Lis lab experiment (on There Is some owever,in having ive power of broad terms to differentiate wha we see a ura义wo the key differences.We propose the following terminology: a conventional lab experiment is one
1014 Journal of Economic Literature,Vol.XLII(December 2004 impr nnd that employs a standard subject pool of extent of discrimination in the sports-card marketplace. set of rules; an artefactual field experiment is the 3.Methodological Imp ortance of Field same as a conventional lab experime Experiment but with a nonstandard subject pool:1 a framed field experiment is the same as an artefactual field experiment but with Fieldpecause they mechanicologically ecause they mechanically f field context in either the co pay attention to issues tha great researcher seem task,or information set that the subiects to intuitively address.These issues cannot be comfortably can use. forgotten in the field,but they are of more ·a general importance. The goal of any evaluation method for "treatment effects"is to construct the prop- these task nere the subjects dokow er counterfactual,and economists have that they are in an experiment. rnative methods We recognize that any such taxonom of nstructing the counterfactual:co leaves gaps,and that certain studies may not fall neatly into our classification scheme trolled experir ments,natural (PSM) Moreover,it is often appropriate to con- nstr duct several types of experiments in order to nd t 3 identify the issue of interest.For example approache with treatme e y as the th ment.and le me w tho Harrison and List(2003)conducted artefac tual field experiments and framed field treated and eriments with the same subject pool.pr The treatment cisely to identify how well the he effect for unit i can then be measured as istics might naturally t=y-yo.The major problem,however,is the latte one of a missing counterfactual:t.is unknown It we could observe the outcome for an untreated observation had it been conducted ctual,fr and natur treated,then there is no evaluation problem experiments to investigate the nature and Controlled" experiments.which include laboratory experiments and field experi ments esent the most onvincing method of creating the 
counterfa al,sinc side trol group ents the popu field with We simplify by c the lo of peter Boh ontin of st albeit wit ics.one the Vicl might have some measure of risk ave ne pre run in which the ntrol cking-Reiley (2002
average treatment effect is given by $\tau = y^{*}_{1} - y^{*}_{0}$, where $y^{*}_{1}$ and $y^{*}_{0}$ are the treated and nontreated average outcomes after the treatment. We have much more to say about controlled experiments, in particular field experiments, below.

"Natural experiments" consider the treatment itself as an experiment and find a naturally occurring comparison group [...]: $\tau$ is measured by comparing the difference in outcomes before and after for the treated group with the before and after outcomes for the nontreated group. Estimation of the treatment effect takes the form $Y_{it} = X_{it}\beta + \tau T_{it} + \eta_{it}$, where $i$ indexes the unit of observation, $t$ indexes years, $Y_{it}$ is the outcome in cross-section $i$ at time $t$, $X_{it}$ is a vector of controls, $T_{it}$ is a binary variable, $\eta_{it} = \alpha_{i} + \lambda_{t} + \varepsilon_{it}$, and $\tau$ is the difference-in-differences (DID) average treatment effect. If we assume that data exists for [...], then $\tau = (y^{*} \ldots$ [...]. [...] -specific shocks [...] are correlated with treatment status, and that selection into treatment is independent of temporary individual-specific effect: $E(\eta_{it} \mid X_{it}, D_{it}) = E(\alpha_{i} \mid X_{it}, D_{it}) + \lambda_{t}$. If $\varepsilon_{it}$ and $\tau$ are related, DID is inconsistently estimated as $E(\hat{\tau}) = \tau + E(\varepsilon_{i1} - \varepsilon_{i0} \mid D = 1) - E(\varepsilon_{i1} - \varepsilon_{i0} \mid D = 0)$.

One alternative method of assessing the impact of the treatment is the method of propensity score matching (PSM) developed in Paul Rosenbaum and Donald Rubin (1983). This method has been used extensively in the debate [...] treatment effects [...] Smith and Petra Todd (2000). The goal of PSM is to make nonexperimental data "look like" experimental data. The intuition behind PSM is that if the researcher can select observable factors so that any two individuals with the same value for these factors will display homogeneous responses to the treatment, then the treatment effect can be measured without bias. In effect, one can use statistical methods to identify which two individuals are "more homogeneous lab rats" for the purposes of measuring the treatment effect. More formally, the solution advocated is to find a vector of covariates, $Z$, such that $(y_{1}, y_{0}) \perp T \mid Z$ and $\Pr(T = 1 \mid Z) \in (0,1)$, where $\perp$ denotes independence.

Rosenbaum and Rubin (1983, 1984) showed that matching [...] treated, $p(Z)$.

Another alternative to the DID model is the use of instrumental variables (IV), which approaches the structural econometric method in the sense that it relies on exclusion restrictions (Joshua D. Angrist, Guido W. Imbens, and Donald B. Rubin 1996; and Joshua D. Angrist and Alan B. Krueger 2001). The IV method, which essentially assumes that some components of the nonexperimental data are random, is [...] utilized [...] (Mark Rosenzweig and Kenneth Wolpin 2000). The crux of the IV approach is to find a variable that is excluded from the outcome equation, but which is related to treatment status and has no direct association with the outcome. The weakness of the IV approach is that such variables do not often exist, or that unpalatable assumptions must be maintained in order for them to be used to identify the treatment effect of interest.

A final alternative to the DID model is structural modeling. Such models often entail a heavy mix of identifying restrictions (e.g.,
1016 Journal of Economic Literature,Vol.XLII (December 2004 separability),impose structure on technology could be applied to real people,but to actu- and preferences (e.g.,constant returns to ally do so entails some serious and often scale or unitary income elasticities).and sim unattractive logistical problems. plifying assumptions about equilibrium out- A more substantial response to this criti- comes (e.g.,zero-profit conditions defining cism is to consider what it is about students equilibrium industrial structure).Perhaps the that is viewed.a priori.as being nonrepre best-known class of such structural models is sentative of the target population.There are putable general equilibrium models. at least two issues here.The first is whether which have been extensively a endogenous sample selection or attrition has ate trade policies,for example ncourred due to incom lete control ove relies on c estimation strategi so that the eters that mple is sta simulatio sen e.g. onsistent sitivity analys trea al mode n this s se ether can be inform een the c ative on the behav or the population, experimenta tax and assuming away sample selection issues. 
policies (R.Blun ll and Thomas MaCurdy 4.2 Sample Selection in the Field 1999:and Blundell and M.Costas Dias 2002) Conventional lab experiments typically 4.Artefactual Field Experiments use students who are recruited after being told only general statements about the 4.1 The Nature of the Subject Pool experiment.By and large.recruitment pro A common criticism of the relevance of cedures avoid mentioning the nature of the inferences drawn from laboratory experi- task,or the expected earnings.Most lab ments is that one needs to undertake an xperiments are also one-shot.in the experiment with"real"people,not students that they do ot involve ated obse This criticism is often deflected by experi of a mpl ect to menters with the following imperative:if you eith ed to of the think that the experiment will generate differ If one w with spec ent results with"real"p ople,then go ahead rest i a task ould be o do and run the expe le.A (e.g.,Peter m and Han ind 1993).An ant of this is to challer the if one wanted to recruit subjects for several ics'asse tudents ses super-expenenced or to conduct pre-tests of such trust.or other an things as risk aversion st response,to suggest regarding preferences that could be built the experiment with real people into the design as well often adequate to get rid of unwanted refer One concern with lab experiments con ees at academic journals.In practice.howev ducted with convenience san oles of students er,few experimenters ever examine fielo behavior in a serious and large-sample way It is relatively easy to say that the experiment onard reen (19)for dramati theory to organ For example.the Pl king and David Tarr (197 or exampe John Kagel and Dan Levin( Harrison and H.D.Vinod (1992)
Harrison and list:Field Experiments 1017 is that students might be self-selected in allows one to remove this recruitment bias some way,so that they are a sample that from the resulting inference excludes certain individuals with cha racteris Some field ex eriments face a more seri tics that are nts of sa selection tha unde dep problem on the of the task.Once th ev ent has be it is not a importanc he el inf n flo overemphas lways possible to sim th of the .Th matte degree can ead to endoge pop lation are not represented,at nous subject attrition from the experiment least under the tentative assumptio that it is uch attrition is actually informat ive about only observables that matter.In this case it subject preferences,since the subject's exit would behoove the researcher to augment from the experiment indicates that the sub the initial convenience sample with a quota ject had made a negative evaluation of it sample,in which the missing strata were sur (Tomas Philipson and Larry Hedges 1998) veved Thus one tends not to see many con The classic problem of sample selection victed mass murderers or brain surge ons in refers to possible recruitment biases.such that the c where to natur of the ther n in our sa rm2naeraton ent.Th fo Another proble, of any exper pe the pos is th recruitme typ treatmen dures his issue is popu whi s ar es the role o ruited has c vers and plau recruitment fees in biasing the sample sibly expects s the experiment to ave some subjects that are obtained Ihe context for element rand then the her experiment is particularly relevant here observed sample will tend to look less risk- since it entails the elicitation of values for a averse than the population. 
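The recruitment concern just described, that subjects who expect randomness in earnings will self-select so that the observed sample looks less risk-averse than the population, can be illustrated with a small simulation. This is only a sketch under stated assumptions, not anything taken from the studies cited: the CRRA utility function, the uniform distribution of risk-aversion coefficients, the 50/50 lottery over $5 and $25, and the sure outside option of $12 are all hypothetical choices made for illustration, as are the function names.

```python
import math
import random
import statistics


def crra_utility(x, r):
    """CRRA utility of payoff x for risk-aversion coefficient r (assumed form)."""
    if abs(r - 1.0) < 1e-12:
        return math.log(x)
    return x ** (1.0 - r) / (1.0 - r)


def shows_up(r, lottery=(5.0, 25.0), sure_option=12.0):
    """Hypothetical recruitment rule: a person attends only if the expected
    utility of the equally likely experimental payoffs exceeds the utility
    of a sure outside option."""
    expected = sum(crra_utility(x, r) for x in lottery) / len(lottery)
    return expected > crra_utility(sure_option, r)


def simulate_recruitment(n=10_000, seed=0):
    """Draw a population of risk-aversion coefficients (assumed uniform on
    [0, 2]) and return it together with the self-selected sample."""
    rng = random.Random(seed)
    population = [rng.uniform(0.0, 2.0) for _ in range(n)]
    recruited = [r for r in population if shows_up(r)]
    return population, recruited


if __name__ == "__main__":
    population, recruited = simulate_recruitment()
    print("mean risk aversion, population:", statistics.mean(population))
    print("mean risk aversion, recruited: ", statistics.mean(recruited))
```

Under these assumptions only the less risk-averse individuals accept the gamble implicit in attending, so the recruited sample has a lower mean risk-aversion coefficient than the population it was drawn from. Changing the lottery or the outside option moves the cutoff but not the direction of the bias.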
It is easy to private commoditv She finds that there are imagine how this could then affect behavior some significant biases in the strata of the differentially in some treatments lames population recruited as one varies the Heckman and leffrey Smith (1995)discuss recruitment fee from zero dollars to two dol- this issue in the context of social experi- lars,and then up to ten dollars.An im ortant me nts,but the concern applies equally to finding how r is that field and lab experiments. a s del 4.3 Are Students Different? them. ects and th This wortditdOGo addre ssed in In a group subjects e tre ent ichte stein and Pa ment has 60 cent fem les and the ot (1973) and Penny Burns (1985 sample of subjects in another trea ment has lenn Harrison and James Lesley (1996 only 40 percent females,provided one con (HL)approach this question with a simple trols for the difference in gender when pool statistical framework.Indeed,they do not ing the data and examining the key treatment consider the issue in terms of the relevance effect.This is a situation in which gende might influence the response or the effect of oercholegoaemtche 2 the treatment,but controlling for gender
1018 Journal of Economic Literature,Vol.XLII (December 2004 of experimental methods,but rather in the bject was asked whether he or she terms of t the relevance of convenience sam would be willing to pay $X towards a public ples for the contingent valuation method good,where $X was randomly selected to However,it is easy to see that their methods be $10,$30,$60,or $120.A subject would apply much more generally. respond to this question with a "yes,"a The HL approach may be explained in “no,”ora"not sure.”A simple statistical terms of their attempt to mimic the results model is developed to explain behavior as a of a large-scale national survey conducted for the Exxon Valdez oil-spill litigation.A hanerc observable socioeconomic major national survey was undertaken in this case by Richard Car on et al.(1992)for the oped.HL the ceeded to the eral of the state of Alaska Thi ed ther stat -of-th e sta mor y for preve on this is the HL asked apply to t can obtain is simply mai the sam resu ts us then the statistical mode may be ience sample of stud used to predict the behavior of the target of South Carolina.Using students as a con population if one can obtain information venience sample is largely a matter about the socioeconomic characteristics of methodological bravado.One could readily the target population. 
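The prediction step described here, estimating how responses depend on observable characteristics in the convenience sample and then applying those estimates to the characteristics of the target population, can be sketched as follows. This is a deliberately simplified stand-in rather than the HL statistical model: it replaces their model with "yes" rates computed within discrete strata, and the function names, stratum labels, and numbers are hypothetical.

```python
from collections import defaultdict


def fit_cell_rates(sample):
    """Estimate the share of 'yes' responses within each stratum.

    sample: iterable of (stratum, said_yes) pairs from the convenience sample.
    """
    counts = defaultdict(lambda: [0, 0])  # stratum -> [yes count, total count]
    for stratum, said_yes in sample:
        counts[stratum][0] += int(said_yes)
        counts[stratum][1] += 1
    return {s: yes / total for s, (yes, total) in counts.items()}


def predict_population_rate(cell_rates, population_shares):
    """Weight each stratum's estimated 'yes' rate by that stratum's share of
    the target population (shares are assumed to sum to one)."""
    missing = set(population_shares) - set(cell_rates)
    if missing:
        # The convenience sample cannot speak to strata it never observed.
        raise ValueError(f"no sample observations for strata: {sorted(missing)}")
    return sum(share * cell_rates[s] for s, share in population_shares.items())


if __name__ == "__main__":
    # Hypothetical convenience sample, heavily weighted toward younger respondents.
    sample = (
        [("under_25", True)] * 6 + [("under_25", False)] * 2
        + [("25_plus", True)] + [("25_plus", False)]
    )
    rates = fit_cell_rates(sample)
    print(predict_population_rate(rates, {"under_25": 0.3, "25_plus": 0.7}))
```

The error raised for unobserved strata reflects the maintained assumption in this approach: the sample need not resemble the population, but it must contain enough variation in the relevant characteristics for the conditional responses to be estimated at all.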
obtain convenience samples in other ways The essential idea of the hl method is but using students provides a tough test of hTipnoe oroceeded by developing a sim sentative in the sense of allowing the survey instrument than the one used in the rcher to develon a d"statistical of this is pu model of the behavi or under the of the a not essential to the method.Th at se pi available of the survey wa to a relative the beha of student with stude ta tha rvey th a nts"is the to contro t attributes,is the col variability in their socio-demographic char lection of a range of tandard socioeconom acteristics,not necessarily the unrepresen- ic characteristics of the individual (e.g.,sex tativeness of their behavioral responses age,income,parental income,household conditional on their socio-demographic size,and marital status).Once these data characteristics. are collated,a statistical model is developed To the extent that student samples exhibit in order to explain the key responses in the limited variability in some key characteris- survey.In this case the key re nonse is a tics.such as age.then one might be wary of the veracity of the maintained assumption ce valuation question. In other words volved here.Howey r the sample do e to look like the order for the statistical n odel to be an adequate e one 4 Th xact form of that statistical odel is not impe (9)for et for of this method