Journal of Public Economics 6 (1976) 123-162. © North-Holland Publishing Company

ON THE SPECIFICATION OF MODELS OF OPTIMUM INCOME TAXATION

N.H. STERN*
St. Catherine's College, Oxford, England

with programming by D. Deans

Revised version received September 1975

The main concerns of the paper are the problems of estimating labour supply functions for use in models of optimum income taxation, and the calculation of the effect on the optimum linear tax rate of varying the elasticity of substitution, $\varepsilon$, between leisure and goods from 0 to 1. Backward sloping supply curves are commonly observed and they imply $\varepsilon < 1$. Our calculation of $\varepsilon$ from estimates of supply curves by Ashenfelter and Heckman gives $\varepsilon = 0.4$. Optimum marginal rates decrease with $\varepsilon$ when taxation is purely redistributive but may be nonmonotonic if positive revenue is to be raised. It is proved that optimum (linear or nonlinear) taxation involves a marginal rate of 100 percent when $\varepsilon = 0$.

*This paper has benefitted greatly from discussions with A.B. Atkinson, D.L. Bevan, P.A. Diamond, J.S. Flemming, J.A. Mirrlees and K.W.S. Roberts. The comments of participants at a seminar in Cambridge were also helpful. Responsibility for all errors is mine. The paper was presented to the ISPE Conference on taxation in Paris, January 18-20, 1975. The comments of the discussants at that conference, E. Malinvaud and M. Bruno, were helpful. The support of the SSRC under grant HR 3733 is gratefully acknowledged.

1. Introduction

There are four main ingredients for a model of optimum income taxation: an objective function, a preference relation or supply function for individuals, a skill structure and distribution, and a production relation. They are closely intertwined. An individualistic social welfare function would take into account the preference structure of individuals. The supply of various kinds of skills will depend on individuals' wishes or ability to produce these skills. The production relation must state how skills of different kinds are combined to produce outputs.

The optimum income taxation problem as usually posed is to maximise a social welfare function, which depends on individual utilities, subject to two constraints. The first is that each individual should consume goods and supply factors in amounts which maximise his utility subject to the constraint of the tax function, which describes how much post-tax consumption can be acquired from pre-tax earnings. We are searching for the optimum function. The second is that the total labour supplied can produce the total quantity of goods
demanded. It is the former constraint which characterises the optimum income taxation problem and which makes it a problem of the second best. Without this constraint, that individuals are on their supply curves, we have a first-best problem.

When taxation is discussed it is often in terms of a trade-off between equality and efficiency, or the distribution of the cake and its size. The optimum income taxation problem is one way of formalising this trade-off and it is, perhaps, surprising that it was not until Mirrlees (1971) that a suitable model was developed. We are still at the stage of understanding the structure of these models and the importance of the various components. It should be clear at the outset that the purpose of this paper is not to make recommendations to the Treasury as to appropriate tax rates, but to contribute to the understanding of the discussion of equality versus efficiency through examination of a particular model.

The particular concern of this paper is the supply function, and attention is focussed on the special case of labour supply. We shall examine the problem of estimation, which preference structures obtain support from the empirical literature on labour supply, and then the influence such estimates should have on our view of the appropriate level of income taxation. It will be suggested that most previous calculations of optimum tax rates may have been biased low.

The next section presents the models of Mirrlees (1971) and Atkinson (1972) and contains a brief discussion of their numerical results.[1] The problems of specifying and estimating skill distributions are discussed in section 3, together with calculations of the elasticity of substitution ($\varepsilon$) between leisure and goods, based on empirical estimates of labour supply functions. The calculations of section 3 suggest that elasticities of substitution around 1/2 are of interest, and in section 4 the optimum linear income tax, for values of $\varepsilon$ between 0 and 1, is calculated in a model similar to that of Mirrlees (1971). The extreme case of $\varepsilon = 0$ is examined, in the Mirrlees model, in section 5 and we find that optimum income taxation (linear or nonlinear) involves marginal taxation at 100 percent. It is not surprising, therefore, that the calculations of section 4 show that, for small $\varepsilon$, the optimum linear tax rate increases to 100 percent as $\varepsilon$ decreases to zero. However, where taxation is imposed to raise revenue, as well as to redistribute, the optimum marginal rate may increase as $\varepsilon$ increases over a certain range. In section 6 the numerical discussion is evaluated.

[1] I originally intended to do a survey of theoretical and empirical work in progress but became more involved with my own investigations.

The remainder of this section is devoted to a brief examination of those elements of the model, the objective function and the production relation, which receive no further attention in the later discussion.

Most previous writers have worked with a concave transformation of individual cardinal utilities. The transformation ranges from the linear utilitarian sum to the case where the 'degree of concavity' goes to infinity - the maximin,
or Rawlsian, solution. Some might wish to claim that one is merely specifying the value judgements of the decision-maker by using an arbitrary numbering of individual indifference curves together with a method by which individual utilities are aggregated. The specification of a particular cardinal numbering for individuals and the form of the social preference relation over utilities may well be difficult, if not impossible, to disentangle, but I find it hard to understand a quantitative comparison between different forms of social welfare function for the same indifference structure (for individuals) if some benchmark of cardinality is not involved.

The cardinality problem is much less severe when a one-argument utility function is used - see, for example, Atkinson (1973a). One can then suppose that the government defines its values over the vectors whose components are household incomes. However, when supply functions are central to the model a one-argument utility function seems out of place. It then becomes more difficult to wriggle out of the problem of numbering individual indifference curves. It is possible that part of the attraction of maximin objective functions is that the cardinality problem is less troublesome - maximising the lowest utility level will give the same policy whichever cardinalisation is used when the same monotonic increasing transformation of utilities is applied to all individuals.

The above discussion and most of the literature has supposed that the Bergson-Samuelson social welfare function (nondecreasing in each argument) is the appropriate tool for capturing social values in such analyses. Leaving aside the question of whether it should be used, it is possible that many people have some different underlying notion of welfare or distributional justice when they discuss income taxation. We illustrate the possible phenomenon with a few quotations and arguments which might be thought plausible and yet imply non-Paretian objectives.

We begin with three quotations on inequality each of which clearly involves a non-Paretian position.

Tawney:[2]

When the press assails them with the sparkling epigram that they desire not merely to make the poor richer but to make the rich poorer, instead of replying, as they should, that, being sensible men, they desire both, since the extremes of both of riches and poverty are degrading and anti-social, they are apt to take refuge in gestures of depreciation.

Simons:[3]

The case for drastic progression in taxation must be rested on the case against inequality - on the ethical or aesthetic judgement that the prevailing distribution of wealth and income reveals a degree (and/or kind) of inequality which is distinctly evil or unlovely.

[2] See Atkinson (1973b, p. 19). I am grateful to Kevin Roberts for drawing my attention to this quote.
[3] See Simons (1938, p. 15). Kevin Roberts drew my attention to this quote too.

Fair (1971) quotes Plato as follows:

Plato felt that no one in a society should be more than four times richer than the poorest member of society for 'in a society which is to be immune from the most fatal disorders which might more properly be called distraction than faction, there must be no place for penury in any section of the population, nor yet for opulence, as both breed either consequence.'

Certain arguments on tax proposals and structures might seem plausible to many and also involve non-Paretian judgements. For example, Sadka (1973) has shown that with a finite number of individuals or skill levels the optimum marginal tax rate at the very top is zero. One can express his argument verbally as follows. Suppose that a given tax structure is a candidate for the optimum and it results in the most skilled person earning £Y. Consider the announced marginal tax on the (Y+1)th pound and suppose it is positive. Reduce it to zero. The most skilled person may work more and if he does he is better off. Similarly, others of lower skill may also work more. If they do, then they are better off (exploiting opportunities that were not available to them before) and they pay more tax since they move through tax brackets with nonnegative marginal rates. Thus, our change has produced more tax revenue and has made everyone at least as well off as before.[4] A Paretian should approve. Many, however, might regard a zero marginal rate at the top as offensive. It is conceivable that they may wish to retain this view even after they have understood the above argument.

We should note that one cannot deduce that, where the skill distribution has positive density, for all positive skill levels the optimum marginal tax rate tends to zero. Indeed, Mirrlees (1971) gives examples where it does not. The structure of the model is similar to an optimum growth model where we cannot infer from the result that a finite horizon model should have zero capital stock at the end, the conclusion that the capital stock tends to zero on the infinite horizon path.

Some might propose a 100 percent tax on inheritance on the grounds of equality of opportunity for children. It is non-Paretian (if one rules out envy as the basis of the argument), since the ability to confer the inheritance makes the parent better off (the desire is to give rather than consume) and, presumably, the offspring as well.[5]

[4] One can throw away the extra tax revenue if it is so desired. The argument is clearly rather general. 'Better off' has been used here in the weak sense of 'at least as well off.'
[5] Mirrlees drew my attention to this argument.
[6] This principle is discussed and fitted to U.K. tax schedules in Stern (1973).

Many have found[6] the 'equal absolute sacrifice' proposal an attractive basis for optimum income taxation. This abstracts from incentive problems and states that to raise a given revenue everyone should give up that amount of his income
which makes the sacrifice of utility equal. It turns out that one can choose a utility function which fits the U.K. income tax structure rather well.[7] Although it does not violate the Paretian condition, the proposal cannot be based on any symmetric strictly concave Bergson-Samuelson welfare function since, abstracting from incentive effects, such welfare functions lead to equal post-tax incomes.

The above examples indicate that the standard welfare economics procedure based on the usual welfare functions would not be regarded as the obvious starting point by many who might be prepared to comment on income tax structures.

Most of this paper will use a production structure with one basic input - labour in efficiency units - with a fixed wage. This does not mean that we are assuming constant returns to scale. We can regard the wage as the marginal product at the level of optimum total production and any profits that accrue as lump sum income for the government. Nevertheless, the assumption of one basic input is worrying. It is often asserted that a particular skill is lacking (say, management in the U.K.) and this carries with it a strong notion of complementarity with other factors rather than the complete substitutability assumed in the case of labour in efficiency units. Feldstein[8] has made a start in this direction and incorporates two different kinds of labour into his model.

The frequent assumption of public ownership seems less serious. If, for example, there are profits in the system, one can carry out an analysis of the optimum levels (presumably subject to some constraints). The constraints on income taxation would then take account of the presence of these other taxes. Further work is necessary, however, and Atkinson and Stiglitz (1976) have begun an examination of appropriate combinations of various taxes.

The absence of further discussion of the production assumptions should not be taken as a belief that they do not matter. The specification of the way different skills interact in the production process embodies an aspect of income taxation that many would regard as crucial. It is an important area for further research.

The models discussed here will all be static and will not, therefore, involve capital and the elasticity of its supply in any essential way. These models allow the discussion of the important questions of labour supply and raise sufficient significant and difficult questions to warrant study. Some progress has been made with dynamics but the components of the models have to be kept rather simple.[9]

[7] See Stern (1973).
[8] Feldstein (1973). The different types of labour combine through a Cobb-Douglas production function to produce output.
[9] See, e.g., Feldstein (1973).

2. The model and numerical results of the studies of Mirrlees and Atkinson

This discussion is not intended as a comprehensive survey since Atkinson (1973a) has recently provided a thorough discussion of previous numerical
work. The main purpose of this section is to draw attention to the levels of calculated optimum marginal tax rates, in models similar to those of section 4, which are, on the whole, lower than one might have predicted. Indeed Mirrlees (1971, p. 207) remarked '..., I must confess that I had expected the rigorous analysis of income-taxation in the utilitarian manner to provide arguments for high tax rates. It has not done so.'[10] A partial response to these results has been the use of strongly egalitarian ('highly concave') social welfare functions and the limiting case the 'maxi-min welfare function.'[11] We shall suggest later that there is no need to use these more extreme social welfare functions to obtain tax rates that seem closer to observed rates, and that one has merely to use labour supply functions which seem closer to those which are usually estimated. For the moment, however, we give a brief sketch of these earlier results and, in the process, set out the model of income taxation to be used later.

The original work on the current models of income taxation was that of Mirrlees (1971). In his model individuals supply labour of different qualities and hence face different pre-tax wage rates. They choose how much to supply by maximising $u(c, l)$ subject to $c = g(nlw)$, where $c$ is consumption, $l$ the hours worked, $nw$ the hourly wage of an $n$-man - he produces $n$ efficiency hours per hour worked - $w$ is the wage per efficiency hour and $g(\cdot)$ the tax function giving post-tax income as a function of pre-tax income. The aggregate production constraint is $X = \int c f(n)\,dn = H\left(\int n l f(n)\,dn\right) = H(Z)$, where $X$ (total consumption) is a function $H(Z)$ of effective labour $Z$, and $f(n)$ is the density of the distribution of individuals. The problem is to vary $g(\cdot)$ to maximise $\int G(u) f\,dn$, where $G(\cdot)$ is a concave function and the constraints are that the amounts individuals choose to supply of labour and consume of goods be compatible with the production relation. Note that the formulation involves taxation of $nlw$ and does not require $(nw)$ and $l$ to be separately observable. If one can identify an $n$-man without affecting his behaviour, then the first-best optimum can be achieved by levying an appropriate lump sum tax for each $n$ with a zero marginal rate of taxation.

[10] The utilitarian optimum ignoring incentives involves 100 percent taxation. One is initially surprised therefore when the introduction of incentives drops the rate down to 20 percent.
[11] See Atkinson (1972).
[12] See Lydall (1968) and Mirrlees (1971).
[13] Interpolated from Mirrlees (1971, tables I-IV, p. 202).

Mirrlees provided detailed calculations for the cases where $u(c, l) = \log c + \log(1 - l)$, $n$ distributed lognormally (parameters of the associated normal distribution being $\mu$ and $\sigma$), $H$ linear and $G(u) = u$ or $-e^{-u}$. Using a value of $\sigma = 0.39$, derived from the work of Lydall,[12] on the distribution of earnings, he obtained median marginal tax rates for the case of $G(u) = u$ of 22% and 20%.[13] The higher rate was for the case where 7% of product was required by the government and the lower where 17% could be added - the additions or subtractions corresponding respectively to cases where profits or revenues
elsewhere outweighed or were outweighed by fixed costs or necessary expenditure. With a net government expenditure of 12% of product and $G(u) = -e^{-u}$, the median marginal rate rises to 33%. The highest marginal rates for the three cases respectively are 26%, 21% and 39%. The marginal rates rise at first but begin falling before the median is reached. Mirrlees proves that, for the log-normal distribution and where the elasticity of substitution between consumption and leisure is less than one, the marginal rate tends to zero as $n$ tends to $\infty$. There is a higher limit in the case of the Pareto distribution where, with the same condition on the substitution elasticity, the marginal rate tends to $1/(1+\gamma)$ as $n \to \infty$ when $(nf'/f) \to -(\gamma+2)$. Examination of distributions of earnings (see section 3.2) suggests values of $\gamma$ from 0.5 to 2.5, giving limiting marginal rates from 67% to 29%.

Higher rates can also be produced by widening the distribution of skills: if $\sigma$ in the log-normal case is increased to 1.0 (from 0.39), the median rate is 56% for the case $G(u) = -e^{-u}$ and a government requirement of 7% of product. Presumably with a wider distribution of skills, inequality considerations increase relative to those concerned with incentives. However, Mirrlees (1971, p. 207) suggests that such a $\sigma$ ‘does not seem to be at all realistic...’ since it gives a dispersion of skills too wide to be compatible with observed distributions of earnings.¹⁴

There are two main features of the calculated tax schedules which look different from actual income tax structures.¹⁵ Marginal rates are not monotonically increasing (most of the population is in the region where they are falling) and the highest marginal rates are low. For the ‘realistic’ case of $\sigma = 0.39$, applying to 5 out of 6 of Mirrlees' examples, the highest marginal rate is 39%.
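The arithmetic behind the Pareto limits quoted above can be checked with a short sketch. It simply evaluates $1/(1+\gamma)$ over the range of $\gamma$ mentioned in the text; the function name `pareto_limiting_rate` is introduced here for illustration only and is not part of the original paper.

```python
def pareto_limiting_rate(gamma: float) -> float:
    """Limiting marginal tax rate 1/(1 + gamma) quoted for a Pareto upper tail."""
    if gamma <= -1.0:
        raise ValueError("gamma must exceed -1 for a finite, positive rate")
    return 1.0 / (1.0 + gamma)


# Range of gamma suggested by the earnings distributions discussed in the text.
for gamma in (0.5, 1.0, 1.5, 2.0, 2.5):
    rate = pareto_limiting_rate(gamma)
    print(f"gamma = {gamma:.1f}: limiting marginal rate = {rate:.0%}")
```

The end points $\gamma = 0.5$ and $\gamma = 2.5$ give the 67% and 29% figures stated above.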
Atkinson (1972) and (1973a) discusses the effect of increasing the concavity of $G(\cdot)$ and the limiting case of maximin. It seems clear that he was in part influenced by the low rates in the Mirrlees calculations; see Atkinson (1972, p. 2) and (1973a, pp. 390-391). The maximin criterion in the Mirrlees model yields tax rates around 50% for the median person [see Atkinson (1972, p. 28)].

We have already given the Sadka argument which explains why, for a finite population, we should expect zero marginal tax rates at the top of the distribution. This argument may also have some intuitive force for distributions with an infinite domain, provided the weight in the tail is not too big. We have noted, for example, that the log-normal gives a limiting marginal rate of zero but the Pareto does not. The zero limit of the marginal rate for certain distributions suggests that a declining rate at the upper end may be a feature of many models of optimum income taxation. We shall say no more (except for the special case of section 5) about the shape of the tax function, and concentrate on the labour supply function and its relation to optimum linear taxation.

¹⁴ We discuss in section 3.2 whether the distribution of earnings gives a misleading impression of the distribution of skills.
¹⁵ These are announced rates rather than effective rates.
3. The estimation of supply functions and skill distributions

3.1. Supply functions¹⁶

The work on optimum income taxation has dealt exclusively with situations where individuals have the same preference relation but differ in their earnings capacity. One can also imagine cases where individuals differ in their preference relations but face the same earnings function which is determined, as far as they are concerned, exogenously. In this subsection we shall be discussing such alternative specifications, and the different problems they pose for estimation.

We shall suppose, for the moment (but see section 3.2), that the number of hours of work is the appropriate argument of an individual's utility function and that the pre-tax wage measures the skill or efficiency of a worker per hour of work. For estimation (but not taxation) purposes we suppose that the wage and hours are separately observable.

To make some of our formulae explicit we shall consider utility functions of the constant elasticity of substitution (CES) form, although it is clear that many of the problems we shall discuss do not depend on the particular form of the utility function. We suppose an individual maximises

$$u(c, l) = \left[(1-\alpha)\,c^{-\mu} + \alpha\,\bigl(h(L-l)\bigr)^{-\mu}\right]^{-1/\mu}, \tag{1}$$

subject to the budget constraint

$$c = A + (nw)\,l. \tag{2}$$

We thus have a linear tax schedule. The individual is characterised by the triple $(h, n, L)$ and one could consider a distribution of this triple over the population. We shall be discussing some special cases. We should think of $L$ as the number of hours available to the individual for allocation between work and leisure, given his family commitments, sleeping requirements, physical attributes and so on. The parameter $h$ measures the ability to enjoy leisure and $n$ the ability to produce efficiency hours of work from clock hours. Different specifications of the relations between $h$, $n$ and $L$ may lead to very different interpretations of data on wages and hours.
The first-order condition for maximisation of utility subject to the budget constraint is

$$\frac{A + nwl}{h^{1-\varepsilon}(L-l)} = \left[\frac{(1-\alpha)\,nw}{\alpha}\right]^{\varepsilon}, \tag{3}$$

where $\varepsilon = 1/(1+\mu)$.

¹⁶ The comments of A.B. Atkinson on this subsection were particularly useful.
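As a rough numerical check on the supply function, the sketch below assumes the first-order condition takes the form $(A + nwl)/\bigl(h^{1-\varepsilon}(L-l)\bigr) = [(1-\alpha)nw/\alpha]^{\varepsilon}$, which is what differentiating the CES utility (1) along the budget line (2) gives. Because the condition is linear in $l$, hours can be written in closed form. The function names and the parameter values are hypothetical, chosen only for illustration; a brute-force grid search over the utility function is used as an independent check.

```python
def ces_utility(c, l, alpha, mu, h=1.0, L=1.0):
    """CES utility of eq. (1); mu = 0 is treated as the Cobb-Douglas limit."""
    leisure = h * (L - l)
    if abs(mu) < 1e-12:
        return c ** (1 - alpha) * leisure ** alpha
    return ((1 - alpha) * c ** (-mu) + alpha * leisure ** (-mu)) ** (-1.0 / mu)


def ces_hours(A, nw, alpha, eps, h=1.0, L=1.0):
    """Hours implied by the assumed first-order condition, clipped to [0, L]."""
    # Rearranged condition: A + nw*l = k*(L - l), with k defined below.
    k = ((1 - alpha) * nw / alpha) ** eps * h ** (1 - eps)
    l = (k * L - A) / (nw + k)
    return min(max(l, 0.0), L)


# Hypothetical illustrative values (not taken from the paper).
A, nw, alpha, eps = 0.1, 1.0, 0.5, 0.5
mu = 1.0 / eps - 1.0                      # since eps = 1/(1 + mu)

l_star = ces_hours(A, nw, alpha, eps)

# Independent check: maximise utility directly along the budget line.
grid = [i / 10000 for i in range(1, 10000)]
l_grid = max(grid, key=lambda l: ces_utility(A + nw * l, l, alpha, mu))

print(f"closed form: {l_star:.4f}, grid search: {l_grid:.4f}")
```

With $A = 0$ and $\varepsilon < 1$ the same function gives hours that fall as the wage rises, for example `ces_hours(0.0, 2.0, 0.5, 0.5) < ces_hours(0.0, 1.0, 0.5, 0.5)`, which is the backward-sloping case.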
In the Mirrlees case individuals have identical preferences so that $h$ and $L$ are constant over the population. Putting $h = 1$ and taking logarithms, we have

$$\log\left(\frac{A + nwl}{L - l}\right) = \varepsilon \log(nw) + \varepsilon \log\left(\frac{1-\alpha}{\alpha}\right). \tag{4}$$

We see immediately that where the total quantity of hours available ($L$) is known or specified, we can estimate $\varepsilon$ and $\alpha$ by regressing the logarithm of consumption per hour of leisure on the logarithm of the wage rate ($nw$).

Fig. 1. (Labour supply $l$ against the wage rate $nw$.)

Note that our assumption of identical preferences enables us to identify the supply function by merely plotting the relation between $l$ and the post-tax wage rate per clock hour ($nw$) (see fig. 1). This formulation is, therefore, especially convenient for estimation purposes (see section 3.3). The skill distribution is then given by the distribution of wage rates.

The above procedure is very sensitive to the assumption of identical preferences. We give two examples to illustrate this point. First, suppose $L$ is constant in the population but $h = n$. In other words, individuals have identical available hours but those who produce more efficiency hours of work obtain a similarly increased satisfaction per hour of leisure. And suppose, for the sake of illustration, that $A = 0$. We see from (3) that $l$ is independent of $n$. In other words, everyone works the same number of hours. Thus we might infer, on seeing a distribution of wages and no variation of hours, that the supply curve was inelastic when an increase in $w$ (the wage per efficiency hour) would change hours worked.
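Both points can be illustrated with simulated data. The sketch below assumes hours are generated by the first-order condition in the form $(A + nwl)/\bigl(h^{1-\varepsilon}(L-l)\bigr) = [(1-\alpha)nw/\alpha]^{\varepsilon}$. It first recovers $\varepsilon$ and $\alpha$ from the log-linear regression (4) under identical preferences, and then shows that with $h = n$ and $A = 0$ hours do not vary across skills even though they respond to $w$. All names and parameter values are hypothetical, and the data contain no noise, so this is a consistency check rather than an estimation exercise.

```python
import math


def ces_hours(A, nw, alpha, eps, h=1.0, L=1.0):
    """Hours implied by the assumed first-order condition, clipped to [0, L]."""
    k = ((1 - alpha) * nw / alpha) ** eps * h ** (1 - eps)
    return min(max((k * L - A) / (nw + k), 0.0), L)


def ols(x, y):
    """Slope and intercept of a simple least-squares line."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical illustrative values (not taken from the paper).
A, L, alpha, eps, w = 0.05, 1.0, 0.6, 0.4, 1.0
skills = [0.5 + 0.1 * i for i in range(20)]

# Case 1: identical preferences (h = 1); regression of eq. (4).
x, y = [], []
for n in skills:
    l = ces_hours(A, n * w, alpha, eps)
    x.append(math.log(n * w))
    y.append(math.log((A + n * w * l) / (L - l)))
eps_hat, intercept = ols(x, y)
# The intercept equals eps * log((1 - alpha) / alpha); invert for alpha.
alpha_hat = 1.0 / (1.0 + math.exp(intercept / eps_hat))
print(f"eps_hat = {eps_hat:.3f}, alpha_hat = {alpha_hat:.3f}")

# Case 2: h = n and A = 0; hours are the same for every skill level.
hours = [ces_hours(0.0, n * w, alpha, eps, h=n) for n in skills]
spread = max(hours) - min(hours)

# ...yet hours respond to the wage per efficiency hour w (here n = h = 1).
hours_w_low = ces_hours(0.0, 1.0, alpha, eps, h=1.0)
hours_w_high = ces_hours(0.0, 1.5, alpha, eps, h=1.0)
print(f"spread across skills = {spread:.2e}")
print(f"hours at w = 1.0: {hours_w_low:.3f}, at w = 1.5: {hours_w_high:.3f}")
```

A cross-section of the second population would show wage dispersion with constant hours, although the underlying supply response to $w$ is not zero.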
A second example has been used by Hall (1974). He supposes $h = n = 1$ but $L$ varies in the population. He deals in particular with the case where $\varepsilon = 1$ (equivalent to $\mu = 0$ or $u(c, l) = c^{1-\alpha}(L-l)^{\alpha}$) so that we have $(L - l) = \alpha(A + wL)/w$. He assumes $L = (1-\theta)\bar{L}$, where $\theta$ has the beta density on $[0, 1]$: $f(\theta) = 6\theta(1-\theta)$. He applies the model to the Penn-New Jersey negative income tax (NIT) experiment. Families were offered a choice between $(A_0, w_0)$ (participation) and $(A, w)$ (nonparticipation) with $A_0 > A$, $w_0 < w$. His model predicts both participation rates and changes in hours given participation fairly well. Hall argues that the representative individual is not a sensible concept when we see a dispersion of hours worked for a given $(A, w)$, and that a theory of labour supply should account for this dispersion.

3.2. Some problems of estimating the skill distribution

In the previous subsection we supposed for our discussion of estimation that $nw$ and $l$ were separately observable. We had been interpreting $l$ as clock-hours and regarding $l$ as the relevant argument for the disutility of labour and as the basis of the productivity measure. The problem is more complicated than this, however. Both disutility and productivity of labour may be a function primarily of the effort required rather than the number of hours, although the latter is obviously of importance. In the absence of a direct measurement of effort we should discuss estimation problems when we can observe $nwl$, total pre-tax labour income, and not $nw$ and $l$ separately. Here we interpret $l$ as effort.

There is one special formulation¹⁷ which makes the problem disappear. If individuals maximise $(1-\alpha)\log c + \alpha \log(1-l)$ subject to $c = a(nwl)^{\delta}$, where $a$ and $\delta$ define the tax function, then $l$ is constant and (pre-tax) incomes are distributed as a constant times $n$. We can, therefore, read off the distribution of skills from the distribution of labour income.
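The constancy of $l$ in this special formulation can be seen by substituting the tax function into the objective: $n$, $w$ and $a$ then enter only through an additive constant, and the first-order condition $(1-\alpha)\delta/l = \alpha/(1-l)$ gives $l = (1-\alpha)\delta/\bigl((1-\alpha)\delta + \alpha\bigr)$. The sketch below checks this against a brute-force grid search for several skill levels. The function names and parameter values are hypothetical, chosen only for illustration.

```python
import math


def constant_effort(alpha, delta):
    """Effort solving (1 - alpha) * delta / l = alpha / (1 - l)."""
    return (1 - alpha) * delta / ((1 - alpha) * delta + alpha)


def utility(l, n, alpha, a, delta, w=1.0):
    """(1 - alpha) * log c + alpha * log(1 - l) with c = a * (n * w * l) ** delta."""
    c = a * (n * w * l) ** delta
    return (1 - alpha) * math.log(c) + alpha * math.log(1 - l)


# Hypothetical illustrative values (not taken from the paper).
alpha, a, delta, w = 0.4, 0.9, 0.8, 1.0
grid = [i / 1000 for i in range(1, 1000)]

efforts = []
income_per_skill = []
for n in (0.5, 1.0, 2.0):
    # Brute-force maximisation over effort for this skill level.
    l_grid = max(grid, key=lambda l: utility(l, n, alpha, a, delta, w))
    efforts.append(l_grid)
    income_per_skill.append(n * w * l_grid / n)   # pre-tax income divided by n

l_closed = constant_effort(alpha, delta)
print(f"closed form effort: {l_closed:.4f}")
print(f"grid-search efforts by skill: {efforts}")
```

Since the chosen effort is the same at every skill level, pre-tax income $nwl$ is proportional to $n$, which is the property that lets the skill distribution be read off from labour incomes.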
Since $l$ is not directly observable, the assumption that it is constant is not violated, although we cannot estimate $\alpha$. It is clear, however, that the trick is rather special and will not work for more general utility and tax functions.

In general then, if $l$ is not directly observable, we cannot pass from a distribution of labour income to a distribution of $n$ unless we have full knowledge of the utility function and the tax function, when $l$ can be deduced. We can, however, gain information on the utility function and skill distribution separately if the tax schedule changes. We can illustrate this as follows. Put $\varepsilon = 1$ in (3), and we have

$$(nwl) = (1-\alpha)\,nwL - \alpha A. \tag{5}$$

We can now use (5) to estimate $\alpha$. Let us suppose that the current post-tax wage

¹⁷ The formulation was used by Vickrey (1947) and Bevan (1974).