Journal of Economic Literature Vol. XLII (December 2004) pp. 1009-1055

Field Experiments

GLENN W. HARRISON and JOHN A. LIST1

1. Introduction

In some sense every empirical researcher is reporting the results of an experiment. Every researcher who behaves as if an exogenous variable varies independently of an error term effectively views their data as coming from an experiment. In some cases this belief is a matter of a priori judgement; in some cases it is based on auxiliary evidence and inference; and in some cases it is built into the design of the data collection process. But the distinction is not always as bright and clear. Testing that assumption is a recurring difficulty for applied econometricians, and the search always continues for variables that might better qualify as truly exogenous to the process under study. Similarly, the growing popularity of explicit experimental methods arises in large part from the potential for constructing the proper counterfactual. Field experiments provide a meeting ground between these two broad approaches to empirical economic science. By examining the nature of field experiments, we seek to make it a common ground between researchers.

We approach field experiments from the perspective of the sterility of the laboratory experimental environment. We do not see the notion of a "sterile environment" as a negative, provided one recognizes its role in the research discovery process. In one sense, that sterility allows us to see in crisp relief the effects of exogenous treatments on behavior. However, lab experiments in isolation are necessarily limited in relevance for predicting field behavior, unless one wants to insist a priori that those aspects of economic behavior under study are perfectly general in a sense that we will explain. Rather, we see the beauty of lab experiments within a broader context: when they are combined with field data, they permit sharper and more convincing inference.2

1 Harrison: Department of Economics, College of Business Administration, University of Central Florida; List: Department of Agricultural and Resource Economics and Department of Economics, University of Maryland, and NBER. We are grateful to Stephen Burks, Colin Camerer, Jeffrey Carpenter, Shelby Gerking, R. Mark Isaac, Alan Krueger, John McMillan, Andreas Ortmann, Charles Plott, David Reiley, E. Elisabet Rutström, Nathaniel Wilcox, and the referees for generous comments.

2 When we talk about combining lab and field data, we do not just mean a summation of conclusions. Instead, we have in mind the two complementing each other in some functional way, much as one might conduct several lab experiments in order to tease apart potential confounds. For example, James Cox (2004) demonstrates nicely how "trust" and "reciprocity" are often confounded with "other-regarding preferences," and can be better identified separately if one undertakes several types of experiments with the same population.
Similarly, Alvin Roth and Michael Malouf (1979) demonstrate how the use of dollar payoffs can confound tests of cooperative game theory with less information of one kind (knowledge of the utility function of the other player), and more information of another kind (the ability to make interpersonal comparisons of monetary gain), than is usually assumed in the leading theoretical prediction.

In search of greater relevance, experimental economists are recruiting subjects in the field rather than in the classroom, using field goods rather than induced valuations, and using field context rather than abstract
terminology in instructions.3 We argue that there is something methodologically fundamental behind this trend. Field experiments differ from laboratory experiments in many ways. Although it is tempting to view field experiments as simply less controlled variants of laboratory experiments, we argue that to do so would be to seriously mischaracterize them. What passes for "control" in laboratory experiments might in fact be precisely the opposite if it is artificial to the subject or context of the task. In the end, we see field experiments as being methodologically complementary to traditional laboratory experiments.4

Our primary point is that dissecting the characteristics of field experiments helps define what might be better called an ideal experiment, in the sense that one is able to observe a subject in a controlled setting but where the subject does not perceive any of the controls as being unnatural and there is no deception being practiced. At first blush, the idea that one can observe subjects in a natural setting and yet have controls might seem contradictory, but we will argue that it is not.5

3 We explain this jargon from experimental economics below.

4 This view is hardly novel: for example, in decision research, Robert Winkler and Allan Murphy (1973) provide an excellent account of the difficulties of reconciling suboptimal probability assessments in artefactual laboratory settings with field counterparts, as well as the limitations of applying inferences from laboratory data to the field.

5 Imagine a classroom setting in which the class breaks up into smaller tutorial groups. In some groups a video covering certain material is presented, in another group a free discussion is allowed, and in another group there is a more traditional lecture. Then the scores of the students in each group are examined after they have taken a common exam. Assuming that all of the other features of the experiment are controlled, such as which student gets assigned to which group, this experiment would not seem unnatural to the subjects. They are all students doing what comes naturally to students, and these three teaching alternatives are each standardly employed. Along similar lines in economics, albeit with simpler technology and less control than one might like, see Edward Duddy (1924). For recent novel examples in the economics literature, see Colin Camerer (1998) and David Lucking-Reiley (1999). Camerer (1998) places bets at a race track to examine if asset markets can be manipulated, while Lucking-Reiley (1999) uses internet-based auctions in a pre-existing market with an unknown number of participating bidders to test the theory of revenue equivalence between four major single-unit auction formats.

Our second point is that many of the characteristics of field experiments can be found in varying, correlated degrees in lab experiments. Thus, many of the characteristics that people identify with field experiments are not only found in field experiments, and should not be used to differentiate them from lab experiments.

Our third point, following from the first two, is that there is much to learn from field experiments when returning to the lab. The unexpected behaviors that occur when one loosens control in the field are often indicators of key features of the economic transaction that have been neglected in the lab.
Thus, field experiments can help one design better lab experiments, and have a methodological role quite apart from their complementarity at a substantive level.

In section 2 we offer a typology of field experiments in the literature, identifying the key characteristics defining the species. We suggest some terminology to better identify different types of field experiments, or more accurately to identify different characteristics of field experiments. We do not propose a bright line to define some experiments as field experiments and others as something else, but a set of criteria that one would expect to see in varying degrees in a field experiment. We propose six factors that can be used to determine the field context of an experiment: the nature of the subject pool, the nature of the information that the subjects bring to the task, the nature of the commodity, the nature of the task or trading rules applied, the nature of the stakes, and the environment in which the subjects operate. Having identified what defines a field experiment, in section 3 we put experiments in general into methodological perspective, as one of the ways that economists can identify treatment effects. This serves to remind us why we want control and internal validity in all such analyses, whether or not they constitute field experiments. In sections 4 through 6 we describe strengths and weaknesses of the broad types of field experiments. Our
literature review is necessarily selective, although List (2004d) offers a more complete bibliography.

In sections 7 and 8 we review two types of experiments that may be contrasted with ideal field experiments. One is called a social experiment, in the sense that it is a deliberate part of social policy by the government. Social experiments involve deliberate, randomized changes in the manner in which some government program is implemented. They have become popular in certain areas, such as employment schemes and the detection of discrimination. Their disadvantages have been well documented, given their political popularity, and there are several important methodological lessons from those debates for the design of field experiments. The other is called a "natural experiment." The idea is to recognize that some event that naturally occurs in the field happens to have some of the characteristics of a field experiment. These can be attractive sources of data on large-scale economic transactions, but usually at some cost due to the lack of control, forcing the researcher to make certain identification assumptions.

Finally, in section 9 we briefly examine related types of experiments of the mind. In one case these are the "thought experiments" of theorists and statisticians, and in the other they are the "neuro-economics experiments" provided by technology. The objective is simply to identify how they differ from other types of experiments we consider, and where they fit in.

2. Defining Field Experiments

There are several ways to define words. One is to ascertain the formal definition by looking it up in the dictionary. Another is to identify what it is that you want the word-label to differentiate.

The Oxford English Dictionary (Second Edition) defines the word "field" in the following manner: "Used attributively to denote an investigation, study, etc., carried out in the natural environment of a given material, language, animal, etc., and not in the laboratory, study, or office." This orients us to think of the natural environment of the different components of an experiment.6

6 If we are to examine the role of "controls" in different experimental settings, it is appropriate that this word also be defined carefully. The OED (2nd ed.) defines the verb "control" in the following manner: "To exercise restraint or direction upon the free action of; to hold sway over, exercise power or authority over; to dominate, command." So the word means something more active and interventionist than is suggested by its colloquial clinical usage.
Control can include such mundane things as ensuring sterile equipment in a chemistry lab, to restrain the free flow of germs and unwanted particles that might contaminate some test. But when controls are applied to human behavior, we are reminded that someone's behavior is being restrained to be something other than it would otherwise be if the person were free to act. Thus we are immediately on alert to be sensitive, when studying responses from a controlled experiment, to the possibility that behavior is unusual in some respect. The reason is that the very control that defines the experiment may be putting the subject on an artificial margin. Even if behavior on that margin is not different than it would be without the control, there is the possibility that constraints on one margin may induce effects on behavior on unconstrained margins. This point is exactly the same as the one made in the "theory of the second best" in public policy. If there is some immutable constraint on one of the margins defining an optimum, it does not automatically follow that removing a constraint on another margin will move the system closer to the optimum.

It is important to identify what factors make up a field experiment so that we can functionally identify what factors drive results in different experiments. To provide a direct example of the type of problem that motivated us, when List (2001) obtains results in a field experiment that differ from the counterpart lab experiments of Ronald Cummings, Glenn Harrison, and Laura Osborne (1995) and Cummings and Laura Taylor (1999), what explains the difference? Is it the use of data from a particular market whose participants have selected into the market instead of student subjects; the use of subjects with experience in related tasks; the use of private sports-cards as the underlying commodity instead of an environmental public good; the use of streamlined instructions, the less-intrusive experimental methods, mundane experimenter effects, or is it some combination of these and similar
differences? We believe field experiments have matured to the point that some framework for addressing such differences in a systematic manner is necessary.

2.1 Criteria that Define Field Experiments

Running the risk of oversimplifying what is inherently a multidimensional issue, we propose six factors that can be used to determine the field context of an experiment:
* the nature of the subject pool,
* the nature of the information that the subjects bring to the task,
* the nature of the commodity,
* the nature of the task or trading rules applied,
* the nature of the stakes, and
* the nature of the environment that the subject operates in.

We recognize at the outset that these characteristics will often be correlated to varying degrees. Nonetheless, they can be used to propose a taxonomy of field experiments that will, we believe, be valuable as comparisons between lab and field experimental results become more common.

Student subjects can be viewed as the standard subject pool used by experimenters, simply because they are a convenience sample for academics. Thus when one goes "outdoors" and uses field subjects, they should be viewed as nonstandard in this sense. But we argue that the use of nonstandard subjects should not automatically qualify the experiment as a field experiment. The experiments of Cummings, Harrison, and E. Elisabet Rutström (1995), for example, used individuals recruited from churches in order to obtain a wider range of demographic characteristics than one would obtain in the standard college setting. The importance of a nonstandard subject pool varies from experiment to experiment: in this case it simply provided a less concentrated set of socio-demographic characteristics with respect to age and education level, which turned out to be important when developing statistical models to adjust for hypothetical bias (McKinley Blackburn, Harrison, and Rutström 1994). Alternatively, the subject pool can be designed to represent a target population of the economy (e.g., traders at the Chicago Board of Trade in Michael Haigh and John List 2004) or the general population (e.g., the Danish population in Harrison, Morton Igel Lau, and Melonie Williams 2002).

In addition, nonstandard subject pools might bring experience with the commodity or the task to the experiment, quite apart from their wider array of demographic characteristics. In the field, subjects bring certain information to their trading activities in addition to their knowledge of the trading institution. In abstract settings the importance of this information is diminished, by design, and that can lead to behavioral changes. For example, absent such information, risk aversion can lead to subjects requiring a risk premium when bidding for objects with uncertain characteristics.

The commodity itself can be an important part of the field. Recent years have seen a growth of experiments concerned with eliciting valuations over actual goods, rather than using induced valuations over virtual goods. The distinction here is between physical goods or actual services and abstractly defined goods. The latter have been the staple of experimental economics since Edward Chamberlin (1948) and Vernon Smith (1962), but impose an artificiality that could be a factor influencing behavior.7 Such influences are actually of great interest, or should be.
If the nature of the commodity itself affects behavior in a way that is not accounted for by the theory being applied, then the theory has at best a limited domain of applicability that we should be aware of, and at worst is simply false.

7 It is worth noting that neither Chamberlin (1948) nor Smith (1962) used real payoffs to motivate subjects in their market experiments, although Smith (1962) does explain how that could be done and reports one experiment (fn. 9, p. 121) in which monetary payoffs were employed.
In either case, one can better understand the limitations of the generality of theory only via empirical testing.8

Again, however, just having one field characteristic, in this case a physical good, does not constitute a field experiment in any fundamental sense. Rutström (1998) sold lots and lots of chocolate truffles in a laboratory study of different auction institutions designed to elicit values truthfully, but hers was very much a lab experiment despite the tastiness of the commodity. Similarly, Ian Bateman et al. (1997) elicited valuations over pizza and dessert vouchers for a local restaurant. While these commodities were not actual pizza or dessert themselves, but vouchers entitling the subject to obtain them, they are not abstract. There are many other examples in the experimental literature of designs involving physical commodities.9

8 To use the example of Chamberlin (1948) again, List (2004e) takes the natural next step by exploring the predictive power of neoclassical theory in decentralized, naturally occurring field markets.

9 We would exclude experiments in which the commodity was a gamble, since very few of those gambles take the form of naturally occurring lotteries.

The nature of the task that the subject is being asked to undertake is an important component of a field experiment, since one would expect that field experience could play a major role in helping individuals develop heuristics for specific tasks. The lab experiments of John Kagel and Dan Levin (1999) illustrate this point, with "super-experienced" subjects behaving differently than inexperienced subjects in terms of their propensity to fall prey to the winner's curse. An important question is whether the successful heuristics that evolve in certain field settings "travel" to other field and lab settings (Harrison and List 2003). Another aspect of the task is the specific parameterization that is adopted in the experiment. One can conduct a lab experiment with parameter values estimated from field data, so as to study lab behavior in a "field-relevant" domain. Since theory is often domain-specific, and behavior can always be, this is an important component of the interplay between the lab and field. Early illustrations of the value of this approach include David Grether, R. Mark Isaac, and Charles Plott (1981, 1989), Grether and Plott (1984), and James Hong and Plott (1982).

The nature of the stakes can also affect field responses. Stakes in the laboratory might be very different than those encountered in the field, and hence have an effect on behavior. If valuations are taken seriously when they are in the tens of dollars, or in the hundreds, but are made indifferently when the price is less than one dollar, laboratory or field experiments with stakes below one dollar could easily engender imprecise bids. Of course, people buy inexpensive goods in the field as well, but the valuation process they use might be keyed to different stake levels. Alternatively, field experiments in relatively poor countries offer the opportunity to evaluate the effects of substantial stakes within a given budget.

The environment of the experiment can also influence behavior. The environment can provide context to suggest strategies and heuristics that a lab setting might not.
Lab experimenters have always wondered whether the use of classrooms might engender role-playing behavior, and indeed this is one of the reasons experimental economists are generally suspicious of experiments without salient monetary rewards. Even with salient rewards, however, environmental effects could remain. Rather than view them as uncontrolled effects, we see them as worthy of controlled study.

2.2 A Proposed Taxonomy

Any taxonomy of field experiments runs the risk of missing important combinations of the factors that differentiate field experiments from conventional lab experiments. There is some value, however, in having broad terms to differentiate what we see as the key differences. We propose the following terminology:
* a conventional lab experiment is one
that employs a standard subject pool of students, an abstract framing, and an imposed10 set of rules;
* an artefactual field experiment is the same as a conventional lab experiment but with a nonstandard subject pool;11
* a framed field experiment is the same as an artefactual field experiment but with field context in either the commodity, task, or information set that the subjects can use;12
* a natural field experiment is the same as a framed field experiment but where the environment is one where the subjects naturally undertake these tasks and where the subjects do not know that they are in an experiment.13

We recognize that any such taxonomy leaves gaps, and that certain studies may not fall neatly into our classification scheme. Moreover, it is often appropriate to conduct several types of experiments in order to identify the issue of interest. For example, Harrison and List (2003) conducted artefactual field experiments and framed field experiments with the same subject pool, precisely to identify how well the heuristics that might apply naturally in the latter setting "travel" to less context-ridden environments found in the former setting. And List (2004b) conducted artefactual, framed, and natural experiments to investigate the nature and extent of discrimination in the sports-card marketplace.

10 The fact that the rules are imposed does not imply that the subjects would reject them, individually or socially, if allowed.

11 To offer an early and a recent example, consider the risk-aversion experiments conducted by Hans Binswanger (1980, 1981) in India, and Harrison, Lau, and Williams (2002), who took the lab experimental design of Maribeth Coller and Melonie Williams (1999) into the field with a representative sample of the Danish population.

12 For example, the experiments of Peter Bohm (1984b) to elicit valuations for public goods that occurred naturally in the environment of subjects, albeit with unconventional valuation methods; or the Vickrey auctions and "cheap talk" scripts that List (2001) conducted with sports-card collectors, using sports cards as the commodity and at a show where they trade such commodities.

13 For example, the manipulation of betting markets by Camerer (1998) or the solicitation of charitable contributions by List and Lucking-Reiley (2002).

3. Methodological Importance of Field Experiments

Field experiments are methodologically important because they mechanically force the rest of us to pay attention to issues that great researchers seem to intuitively address. These issues cannot be comfortably forgotten in the field, but they are of more general importance.

The goal of any evaluation method for "treatment effects" is to construct the proper counterfactual, and economists have spent years examining approaches to this problem. Consider five alternative methods of constructing the counterfactual: controlled experiments, natural experiments, propensity score matching (PSM), instrumental variables (IV) estimation, and structural approaches. Define y_1 as the outcome with treatment, y_0 as the outcome without treatment, and let T=1 when treated and T=0 when not treated.14 The treatment effect for unit i can then be measured as τ_i = y_i1 − y_i0. The major problem, however, is one of a missing counterfactual: τ_i is unknown. If we could observe the outcome for an untreated observation had it been treated, then there would be no evaluation problem.
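To make the potential-outcomes notation above concrete, the following sketch (ours, not from the paper) simulates both y_i1 and y_i0 for a population of units. Only the outcome matching each unit's treatment status is observed, so the unit-level effect τ_i is never computable from data alone; anticipating the randomized assignment discussed next, the difference in group means nonetheless recovers the average effect. All names and parameter values are illustrative assumptions.

```python
# Minimal sketch of the missing-counterfactual problem (simulated, illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
y0 = rng.normal(10.0, 2.0, n)          # potential outcome without treatment
tau_i = rng.normal(1.5, 0.5, n)        # heterogeneous unit-level treatment effects
y1 = y0 + tau_i                        # potential outcome with treatment

T = rng.integers(0, 2, n)              # random assignment: T=1 treated, T=0 not
y_obs = np.where(T == 1, y1, y0)       # only one potential outcome is ever observed

# tau_i is unknown to the analyst, but under randomization the simple
# difference in group means recovers the average treatment effect.
ate_true = tau_i.mean()
ate_hat = y_obs[T == 1].mean() - y_obs[T == 0].mean()
print(f"true ATE = {ate_true:.3f}, estimated ATE = {ate_hat:.3f}")
```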
"Controlled" experiments, which include laboratory experiments and field experi- ments, represent the most convincing method of creating the counterfactual, since they directly construct a control group via randomization.15 In this case, the population 14 We simplify by considering a binary treatment, but the logic generalizes easily to multiple treatment levels and continuous treatments. Obvious examples from outside economics include dosage levels or stress levels. In eco- nomics, one might have some measure of risk aversion or "other regarding preferences" as a continuous treatment. 15 Experiments are often run in which the control is pro- vided by theory, and the objective is to assess how well the- ory matches behavior. This would seem to rule out a role for randomization, until one recognizes that some implicit or explicit error structure is required in order to test theo- ries meaningfully. We return to this issue in section 8. 1014 This content downloaded from 218.106.182.180 on Sat, 11 Jun 2016 06:18:54 UTC All use subject to http://about.jstor.org/terms
average treatment effect is given by τ = y*_1 − y*_0, where y*_1 and y*_0 are the treated and nontreated average outcomes after the treatment. We have much more to say about controlled experiments, in particular field experiments, below.

"Natural experiments" consider the treatment itself as an experiment and find a naturally occurring comparison group to mimic the control group: τ is measured by comparing the difference in outcomes before and after for the treated group with the before and after outcomes for the nontreated group. Estimation of the treatment effect takes the form Y_it = X_it β + τ T_it + η_it, where i indexes the unit of observation, t indexes years, Y_it is the outcome in cross-section i at time t, X_it is a vector of controls, T_it is a binary treatment variable, η_it = α_i + λ_t + ε_it, and τ is the difference-in-differences (DID) average treatment effect. If we assume that data exist for two periods, t0 and t1, then τ = (y*_1,t1 − y*_1,t0) − (y*_0,t1 − y*_0,t0), where, for example, y*_1,t1 is the mean outcome for the treated group in period t1.

A major identifying assumption in DID estimation is that there are no time-varying, unit-specific shocks to the outcome variable that are correlated with treatment status, and that selection into treatment is independent of the temporary individual-specific effect: E(η_it | X_it, T_it) = E(α_i | X_it, T_it) + λ_t. If ε_it and T are related, DID is inconsistently estimated, since E(τ̂) = τ + E(ε_i,t1 − ε_i,t0 | T=1) − E(ε_i,t1 − ε_i,t0 | T=0).

One alternative method of assessing the impact of the treatment is the method of propensity score matching (PSM) developed in P. Rosenbaum and Donald Rubin (1983). This method has been used extensively in the debate over experimental and nonexperimental evaluation of treatment effects initiated by Lalonde (1986): see Rajeev Dehejia and Sadek Wahba (1999, 2002) and Jeffrey Smith and Petra Todd (2000). The goal of PSM is to make non-experimental data "look like" experimental data. The intuition behind PSM is that if the researcher can select observable factors so that any two individuals with the same value for these factors will display homogeneous responses to the treatment, then the treatment effect can be measured without bias. In effect, one can use statistical methods to identify which two individuals are "more homogeneous lab rats" for the purposes of measuring the treatment effect. More formally, the solution advocated is to find a vector of covariates, Z, such that (y_1, y_0) ⊥ T | Z and pr(T=1 | Z) ∈ (0,1), where ⊥ denotes independence.16

Another alternative to the DID model is the use of instrumental variables (IV), which approaches the structural econometric method in the sense that it relies on exclusion restrictions (Joshua D. Angrist, Guido W. Imbens, and Donald B. Rubin 1996; and Joshua D. Angrist and Alan B. Krueger 2001). The IV method, which essentially assumes that some components of the non-experimental data are random, is perhaps the most widely utilized approach to measuring treatment effects (Mark Rosenzweig and Kenneth Wolpin 2000). The crux of the IV approach is to find a variable that is excluded from the outcome equation, but which is related to treatment status and has no direct association with the outcome. The weakness of the IV approach is that such variables do not often exist, or that unpalatable assumptions must be maintained in order for them to be used to identify the treatment effect of interest.
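As a hedged illustration of the two-period DID estimator and its identifying assumption, the following sketch (our own simulated example, not from the paper) computes τ̂ = (y*_1,t1 − y*_1,t0) − (y*_0,t1 − y*_0,t0) on artificial data, and then shows how a period-1 shock that hits only the treated group biases the estimate, as described above. The data-generating process and parameter values are assumptions made for the example.

```python
# Minimal two-period difference-in-differences sketch (simulated, illustrative only).
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
alpha = rng.normal(0.0, 1.0, n)             # unit-specific effects alpha_i
treated = rng.integers(0, 2, n) == 1        # treatment-group indicator
lam = {0: 0.0, 1: 0.8}                      # common time effects lambda_t
tau = 2.0                                   # true average treatment effect

def outcome(period, extra_shock=0.0):
    eps = rng.normal(0.0, 1.0, n)           # idiosyncratic shocks eps_it
    y = alpha + lam[period] + eps
    if period == 1:                          # treatment switches on in period 1
        y += tau * treated + extra_shock * treated
    return y

y_t0, y_t1 = outcome(0), outcome(1)
did = (y_t1[treated].mean() - y_t0[treated].mean()) - \
      (y_t1[~treated].mean() - y_t0[~treated].mean())
print(f"DID estimate = {did:.3f} (true tau = {tau})")

# A time-varying shock that is correlated with treatment status violates the
# identifying assumption, and the DID estimate is biased by the size of the shock.
y_t1_bad = outcome(1, extra_shock=1.0)
did_bad = (y_t1_bad[treated].mean() - y_t0[treated].mean()) - \
          (y_t1_bad[~treated].mean() - y_t0[~treated].mean())
print(f"DID estimate with a treatment-correlated shock = {did_bad:.3f}")
```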
16 If one is interested in estimating the average treatment effect, only the weaker condition E(y_0 | T=1, Z) = E(y_0 | T=0, Z) = E(y_0 | Z) is required. This assumption is called the "conditional independence assumption," and intuitively means that given Z, the nontreated outcomes are what the treated outcomes would have been had they not been treated. Or, likewise, that selection occurs only on observables. Note that the dimensionality of the problem, as measured by Z, may limit the use of matching. A more feasible alternative is to match on a function of Z. Rosenbaum and Rubin (1983, 1984) showed that matching on p(Z) instead of Z is valid. This is usually carried out on the "propensity" to get treated p(Z), or the propensity score, which in turn is often implemented by a simple probit or logit model with T as the dependent variable.
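The footnote's suggestion of matching on an estimated propensity score can be sketched as follows; this is our own illustrative implementation on simulated data, not code from the paper. It fits a simple logit for pr(T=1 | Z) by Newton-Raphson, matches each treated unit to the untreated unit with the closest score, and averages the outcome differences. All names and parameter values are assumptions.

```python
# Minimal propensity-score matching sketch under selection on observables (illustrative).
import numpy as np

rng = np.random.default_rng(2)
n = 4_000
Z = rng.normal(0.0, 1.0, (n, 2))                       # observed covariates
true_p = 1.0 / (1.0 + np.exp(-(0.8 * Z[:, 0] - 0.5 * Z[:, 1])))
T = rng.random(n) < true_p                             # treatment depends only on Z
tau = 1.0
y = Z @ np.array([1.0, 2.0]) + tau * T + rng.normal(0.0, 1.0, n)

# Fit a logit for pr(T=1|Z) by a few Newton-Raphson steps (intercept plus Z).
X = np.column_stack([np.ones(n), Z])
beta = np.zeros(X.shape[1])
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    grad = X.T @ (T - p)
    hess = X.T @ (X * (p * (1 - p))[:, None])
    beta += np.linalg.solve(hess, grad)
pscore = 1.0 / (1.0 + np.exp(-X @ beta))

# One-to-one nearest-neighbour matching on the propensity score, with replacement.
treated_idx = np.where(T)[0]
control_idx = np.where(~T)[0]
gaps = np.abs(pscore[treated_idx][:, None] - pscore[control_idx][None, :])
matches = control_idx[gaps.argmin(axis=1)]
att = (y[treated_idx] - y[matches]).mean()
print(f"matched estimate of the effect on the treated = {att:.3f} (true = {tau})")
```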
A final alternative to the DID model is structural modeling. Such models often entail a heavy mix of identifying restrictions (e.g., separability), structure imposed on technology and preferences (e.g., constant returns to scale or unitary income elasticities), and simplifying assumptions about equilibrium outcomes (e.g., zero-profit conditions defining equilibrium industrial structure). Perhaps the best-known class of such structural models is computable general equilibrium models, which have been extensively applied to evaluate trade policies, for example.17 This approach typically relies on complex estimation strategies, but yields structural parameters that are well-suited for ex ante policy simulation, provided one undertakes systematic sensitivity analysis of those parameters.18 In this sense, structural models have been the cornerstone of non-experimental evaluation of tax and welfare policies (R. Blundell and Thomas MaCurdy 1999; and Blundell and M. Costas Dias 2002).

17 For example, the evaluation of the Uruguay Round of multilateral trade liberalization by Harrison, Thomas Rutherford, and David Tarr (1997).

18 For example, see Harrison and H. D. Vinod (1992).

4. Artefactual Field Experiments

4.1 The Nature of the Subject Pool

A common criticism of the relevance of inferences drawn from laboratory experiments is that one needs to undertake an experiment with "real" people, not students. This criticism is often deflected by experimenters with the following imperative: if you think that the experiment will generate different results with "real" people, then go ahead and run the experiment with real people. A variant of this response is to challenge the critics' assertion that students are not representative. As we will see, this variant is more subtle and constructive than the first response.

The first response, to suggest that the critic run the experiment with real people, is often adequate to get rid of unwanted referees at academic journals. In practice, however, few experimenters ever examine field behavior in a serious and large-sample way. It is relatively easy to say that the experiment could be applied to real people, but to actually do so entails some serious and often unattractive logistical problems.19

A more substantial response to this criticism is to consider what it is about students that is viewed, a priori, as being nonrepresentative of the target population. There are at least two issues here. The first is whether endogenous sample selection or attrition has occurred due to incomplete control over recruitment and retention, so that the observed sample is unreliable in some statistical sense (e.g., generating inconsistent estimates of treatment effects). The second is whether the observed sample can be informative on the behavior of the population, assuming away sample selection issues.

4.2 Sample Selection in the Field

Conventional lab experiments typically use students who are recruited after being told only general statements about the experiment. By and large, recruitment procedures avoid mentioning the nature of the task, or the expected earnings. Most lab experiments are also one-shot, in the sense that they do not involve repeated observations of a sample subject to attrition. Of course, neither of these features is essential. If one wanted to recruit subjects with specific interest in a task, it would be easy to do (e.g., Peter Bohm and Hans Lind 1993).
And if one wanted to recruit subjects for several sessions, to generate "super-experienced" subjects20 or to conduct pre-tests of such things as risk aversion, trust, or "other-regarding preferences,"21 that could be built into the design as well.

19 Or one can use "real" nonhuman species: see John Kagel, Don MacDonald, and Raymond Battalio (1990) and Kagel, Battalio, and Leonard Green (1995) for dramatic demonstrations of the power of economic theory to organize data from the animal kingdom.

20 For example, John Kagel and Dan Levin (1986, 1999, 2002).

21 For example, Cox (2004).
One concern with lab experiments conducted with convenience samples of students is that students might be self-selected in some way, so that they are a sample that excludes certain individuals with characteristics that are important determinants of underlying population behavior. Although this problem is a severe one, its potential importance in practice should not be overemphasized. It is always possible to simply inspect the sample to see if certain strata of the population are not represented, at least under the tentative assumption that it is only observables that matter. In this case it would behoove the researcher to augment the initial convenience sample with a quota sample, in which the missing strata were surveyed. Thus one tends not to see many convicted mass murderers or brain surgeons in student samples, but we certainly know where to go if we feel the need to include them in our sample.

Another consideration, of increasing importance for experimenters, is the possibility of recruitment biases in our procedures. One aspect of this issue is studied by Rutström (1998). She examines the role of recruitment fees in biasing the samples of subjects that are obtained. The context for her experiment is particularly relevant here since it entails the elicitation of values for a private commodity. She finds that there are some significant biases in the strata of the population recruited as one varies the recruitment fee from zero dollars to two dollars, and then up to ten dollars. An important finding, however, is that most of those biases can be corrected simply by incorporating the relevant characteristics in a statistical model of the behavior of subjects and thereby controlling for them. In other words, it does not matter if one group of subjects in one treatment has 60 percent females and the other sample of subjects in another treatment has only 40 percent females, provided one controls for the difference in gender when pooling the data and examining the key treatment effect. This is a situation in which gender might influence the response or the effect of the treatment, but controlling for gender allows one to remove this recruitment bias from the resulting inference.

Some field experiments face a more serious problem of sample selection that depends on the nature of the task. Once the experiment has begun, it is not as easy as it is in the lab to control information flow about the nature of the task. This is obviously a matter of degree, but can lead to endogenous subject attrition from the experiment. Such attrition is actually informative about subject preferences, since the subject's exit from the experiment indicates that the subject had made a negative evaluation of it (Tomas Philipson and Larry Hedges 1998).

The classic problem of sample selection refers to possible recruitment biases, such that the observed sample is generated by a process that depends on the nature of the experiment. This problem can be serious for any experiment, since a hallmark of virtually every experiment is the use of some randomization, typically to treatment.22 If the population from which volunteers are being recruited has diverse risk attitudes and plausibly expects the experiment to have some element of randomization, then the observed sample will tend to look less risk-averse than the population. It is easy to imagine how this could then affect behavior differentially in some treatments. James Heckman and Jeffrey Smith (1995) discuss this issue in the context of social experiments, but the concern applies equally to field and lab experiments.

22 If not to treatment, then randomization often occurs over choices to determine payoff.
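To make the selection mechanism concrete, the following small simulation sketches how volunteers can end up less risk-averse than the population they are drawn from. This is purely our illustration, not an analysis from the studies cited above: the constant relative risk aversion (CRRA) specification, the 50/50 lottery over $5 or $25, the $14 outside option, and the assumed distribution of risk attitudes are all hypothetical choices made only for the example.

    # Illustrative sketch (hypothetical numbers): self-selection on risk
    # attitudes when the experiment is expected to involve randomized payoffs.
    import numpy as np

    rng = np.random.default_rng(0)
    N = 100_000

    # Hypothetical population of potential recruits with heterogeneous
    # constant relative risk aversion (CRRA) coefficients.
    r = rng.normal(loc=0.6, scale=0.3, size=N)

    def crra_utility(x, r):
        """CRRA utility of a sure amount x, handling the log case at r = 1."""
        return np.where(np.isclose(r, 1.0), np.log(x), (x ** (1 - r)) / (1 - r))

    # The advertised session offers a 50/50 lottery over $5 or $25;
    # the outside option is a certain $14 of time spent elsewhere.
    eu_experiment = 0.5 * crra_utility(5.0, r) + 0.5 * crra_utility(25.0, r)
    u_outside = crra_utility(14.0, r)

    # Only those who expect the randomized session to beat the sure
    # outside option volunteer.
    volunteers = eu_experiment > u_outside

    print(f"Population mean CRRA: {r.mean():.3f}")
    print(f"Volunteer mean CRRA:  {r[volunteers].mean():.3f}")
    print(f"Volunteer share:      {volunteers.mean():.1%}")

Under these assumptions the volunteers' mean CRRA coefficient falls well below the population mean, which is the sense in which the observed sample "looks less risk-averse than the population."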
4.3 Are Students Different?

This question has been addressed in several studies, including early artefactual field experiments by Sarah Lichtenstein and Paul Slovic (1973), and Penny Burns (1985). Glenn Harrison and James Lesley (1996) (HL) approach this question with a simple statistical framework. Indeed, they do not consider the issue in terms of the relevance
of experimental methods, but rather in terms of the relevance of convenience samples for the contingent valuation method.23 However, it is easy to see that their methods apply much more generally.

23 The contingent valuation method refers to the use of hypothetical field surveys to value the environment, by posing a scenario that asks the subject to place a value on an environmental change contingent on a market for it existing. See Cummings and Harrison (1994) for a critical review of the role of experimental economics in this field.

The HL approach may be explained in terms of their attempt to mimic the results of a large-scale national survey conducted for the Exxon Valdez oil-spill litigation. A major national survey was undertaken in this case by Richard Carson et al. (1992) for the attorney general of the state of Alaska. This survey used then-state-of-the-art survey methods but, more importantly for present purposes, used a full probability sample of the nation. HL asked if one can obtain essentially the same results using a convenience sample of students from the University of South Carolina. Using students as a convenience sample is largely a matter of methodological bravado. One could readily obtain convenience samples in other ways, but using students provides a tough test of their approach.

They proceeded by developing a simpler survey instrument than the one used in the original study. The purpose of this is purely to facilitate completion of the survey and is not essential to the use of the method. This survey was then administered to a relatively large sample of students. An important part of the survey, as in any field survey that aims to control for subject attributes, is the collection of a range of standard socioeconomic characteristics of the individual (e.g., sex, age, income, parental income, household size, and marital status). Once these data are collated, a statistical model is developed in order to explain the key responses in the survey. In this case the key response is a simple "yes" or "no" to a single dichotomous choice valuation question. In other words, the subject was asked whether he or she would be willing to pay $X towards a public good, where $X was randomly selected to be $10, $30, $60, or $120. A subject would respond to this question with a "yes," a "no," or a "not sure." A simple statistical model is developed to explain behavior as a function of the observable socioeconomic characteristics.24

Assuming that a statistical model has been developed, HL then proceeded to the key stage of their method. This is to assume that the coefficient estimates from the statistical model based on the student sample apply to the population at large. If this is the case, or if this assumption is simply maintained, then the statistical model may be used to predict the behavior of the target population if one can obtain information about the socioeconomic characteristics of the target population.

The essential idea of the HL method is simple and more generally applicable than this example suggests. If students are representative in the sense of allowing the researcher to develop a "good" statistical model of the behavior under study, then one can often use publicly available information on the characteristics of the target population to predict the behavior of that population.
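A stylized sketch of that two-step logic, in code, may help fix ideas. Everything below is hypothetical: the data are simulated, the logit specification and coefficient values are invented, and only the randomized bid amounts ($10, $30, $60, $120) and the use of standard socioeconomic covariates are taken from the description above; this is not HL's actual instrument or estimates.

    # Stylized sketch of the HL logic (simulated data, not their specification):
    # fit a choice model on a student "convenience" sample, then project it
    # onto the characteristics of the target population.
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(1)

    # --- Hypothetical student sample ------------------------------------
    n = 500
    students = pd.DataFrame({
        "bid":    rng.choice([10, 30, 60, 120], size=n),  # randomly assigned $X
        "age":    rng.normal(21, 2, size=n),              # little variation
        "income": rng.normal(15, 5, size=n),              # $000s, hypothetical
    })
    # Simulated yes/no answers to "would you pay $X?" (true process unknown in practice)
    latent = (1.5 - 0.02 * students["bid"] + 0.03 * students["income"]
              - 0.01 * students["age"])
    students["yes"] = (latent + rng.logistic(size=n) > 0).astype(int)

    # --- Step 1: statistical model estimated on the student sample -------
    X = sm.add_constant(students[["bid", "age", "income"]])
    model = sm.Logit(students["yes"], X).fit(disp=0)

    # --- Step 2: maintain that the coefficients apply to the population --
    # and predict using the population's (publicly available) characteristics.
    population = pd.DataFrame({
        "bid":    [60] * 3,
        "age":    [25, 45, 65],   # hypothetical population strata
        "income": [20, 45, 30],
    })
    Xp = sm.add_constant(population, has_constant="add")[["const", "bid", "age", "income"]]
    print(model.predict(Xp))  # predicted Pr("yes" to $60) for each stratum

The second step is the whole point: once the maintained assumption is granted, the fitted model needs only the covariate values of the target population, for example from census data, to generate predicted responses for that population.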
Their fundamental point is that the "problem with students" is the lack of variability in their socio-demographic characteristics, not necessarily the unrepresentativeness of their behavioral responses conditional on their socio-demographic characteristics.

To the extent that student samples exhibit limited variability in some key characteristics, such as age, then one might be wary of the veracity of the maintained assumption involved here. However, the sample does not have to look like the population in order for the statistical model to be an adequate one

24 The exact form of that statistical model is not important for illustrative purposes, although the development of an adequate statistical model is important to the reliability of this method.