Bayesian Learning in Social Networks1

Douglas Gale (Corresponding Author)
Department of Economics, New York University
269 Mercer St., 7th Floor, New York, NY, 10003-6687.
E-mail: douglas.gale@nyu.edu
Url: http://www.econ.nyu.edu/user/galed/
Phone: (212) 998-8944
Fax: (212) 995-3932

and

Shachar Kariv
Department of Economics, New York University
269 Mercer St., 7th Floor, New York, NY, 10003-6687.
E-mail: sk510@nyu.edu
Url: http://home.nyu.edu/~sk510

Version: March 13, 2003.

We extend the standard model of social learning in two ways. First, we introduce a social network and assume that agents can only observe the actions of agents to whom they are connected by this network. Secondly, we allow agents to choose a different action at each date. If the network satisfies a connectedness assumption, the initial diversity resulting from diverse private information is eventually replaced by uniformity of actions, though not necessarily of beliefs, in finite time with probability one. We look at particular networks to illustrate the impact of network architecture on speed of convergence and the optimality of absorbing states. Convergence is remarkably rapid, so that asymptotic results are a good approximation even in the medium run.

Journal of Economic Literature Classification Numbers: D82, D83.
Key Words: Networks, Social learning, Herd behavior, Informational cascades.
Running Title: Bayesian Learning in Social Networks.

1 One of us discussed this problem with Bob Rosenthal several years ago, when we were both at Boston University. At that time, we found the problem of learning in networks fascinating but made no progress and were eventually diverted into working on boundedly rational learning, which led to our paper on imitation and experimentation. We thank seminar participants at NYU, DELTA, INSEAD, Cergy, Cornell and Iowa for their comments. The financial support of the National Science Foundation through Grant No. SES-0095109 is gratefully acknowledged.

1. INTRODUCTION

The canonical model of social learning comprises a set of agents I, a finite set of actions A, a set of states of nature Ω, and a common payoff function U(a, ω), where a is the action chosen and ω is the state of nature. Each agent i receives a private signal σi(ω), a function of the state of nature ω, and uses this private information to identify a payoff-maximizing action.

This setup provides an example of a pure information externality. Each agent's payoff depends on his own action and on the state of nature. It does not depend directly on the actions of other agents. However, each agent's action reveals something about his private signal, so an agent can generally improve his decision by observing what others do before choosing his own action. In social settings, where agents can observe one another's actions, it is rational for them to learn from one another.

This kind of social learning was first studied by Banerjee (1992) and Bikhchandani, Hirshleifer and Welch (1992). Their work was extended by Smith and Sørensen (2000). These models of social learning assume a simple sequential structure, in which the order of play is fixed and exogenous. They also assume that the actions of all agents are public information. Thus, at date 1, agent 1 chooses an action a1, based on his private information; at date 2, agent 2 observes the action chosen by agent 1 and chooses an action a2 based on his private information and the information revealed by agent 1's action; at date 3, agent 3 observes the actions chosen by agents 1 and 2 and chooses an action a3; and so on. In what follows we refer to this structure as the sequential social-learning model (SSLM).

One goal of the social learning literature is to explain the striking uniformity of social behavior that occurs in fashion, fads, "mob psychology", and so forth. In the context of the SSLM, this uniformity takes the form of herd behavior.2 Smith and Sørensen (2000) have shown that, in the SSLM, herd behavior arises in finite time with probability one. Once the proportion of agents choosing a particular action is large enough, the public information in favor of this action outweighs the private information of any single agent. So each subsequent agent "ignores" his own signal and "follows the herd". This is an important result and it helps us understand the basis for uniformity of social behavior.3

2 A herd occurs if, after some finite date t, every agent chooses the same action. An informational cascade occurs if, after some finite date t, every agent finds it optimal to choose the same action regardless of the value of his private signal. An informational cascade implies herd behavior, but a herd can arise without a cascade.

3 The most interesting property of the models of Bikhchandani, Hirshleifer and Welch (1992) and Banerjee (1992) is that informational cascades arise very rapidly, before much information has been revealed. For example, in these models, if the first two agents make the same choice, all subsequent agents will ignore their information and imitate the first two. The behavior of a potential infinity of agents is determined by the behavior of the first two. This is both informationally inefficient and Pareto inefficient.

At the same time, the SSLM has several special features that deserve further examination: (i) each agent makes a single, irreversible decision; (ii) the timing of the agent's decision (his position in the decision-making queue) is fixed and exogenous; (iii) agents observe the actions of all their predecessors; and (iv) the number of signals, like the number of agents, is infinite, so once a cascade begins the amount of information lost is large. These features simplify the analysis of the SSLM, but they are quite restrictive.

In this paper, we study the uniformity of behavior in a framework that allows for a richer pattern of social learning. We depart from the SSLM in two ways. First, we drop the assumption that actions are public information and assume that agents can observe the actions of some, but not necessarily all, of their neighbors. Second, we allow agents to make decisions simultaneously, rather than sequentially, and to revise their decisions rather than making a single, irreversible decision. We refer to this structure as the social network model (SNM). For empirical examples that illustrate the important role of networks in social learning, see Bikhchandani, Hirshleifer and Welch (1998).

On the face of it, uniform behavior seems less likely in the SNM, where agents have very different information sets, than in the SSLM. However, uniformity turns out to be a robust feature of connected social networks.4 The following results are established for any connected network:

4 A network is a directed graph in which the nodes correspond to representative agents. Agent i can observe the actions of agent j if i is connected to agent j. A network is connected if, for any two agents i and j, there is a sequence i1, ..., iK such that i1 = i, iK = j and ik is connected to ik+1 for k = 1, ..., K − 1.

Uniformity of behavior: Initially, diversity of private information leads to diversity of actions. Over time, as agents learn by observing the actions of their neighbors, some convergence of beliefs is inevitable. A central question is whether agents can rationally choose different actions forever. Disconnected agents can clearly 'disagree' forever. Also, there may be cases where agents are indifferent between two actions and disagreement of actions is immaterial. However, apart from cases of disconnectedness and indifference, all agents must eventually choose the same action. Thus, learning occurs through diversity but is eventually replaced by uniformity.

Optimality: We are interested in whether the common action chosen asymptotically is optimal, in the sense that the same action would be chosen if all the signals were public information. In special cases, we can show that asymptotically the optimal action is chosen but, in general, there is no reason why this should be the case.

Although the process of learning in networks can be very complicated, the SNM has several features that make the asymptotic analysis tractable. The first is the welfare-improvement principle. Agents have perfect recall, so expected utility is non-decreasing over time.

This implies that equilibrium payoffs form a submartingale. We use the martingale convergence theorem to establish that an agent's (random) payoff converges almost surely to a constant.

Another useful property of the model is the imitation principle. If agent i can observe the actions of agent j, then one strategy available to him is to imitate whatever j does. Since i and j have different information sets, their conditional payoffs under this strategy may be different. However, on average, i must do as well as j. The imitation principle, together with the connectedness of the network, is used to show that, asymptotically, i and j must get the same average (unconditional) payoffs. It turns out that this is only possible if agents choose the same actions. More precisely, agents choose different actions only if they are indifferent.

While the convergence properties of the model are quite general, other properties have only been established for particular networks:

Convergence in finite time: In special cases, we can rule out the possibility of indifference between actions with probability one. In that case, all agents choose the same action in finite time with probability one.

Speed of convergence: In two- and three-person networks, we can show that convergence to a uniform action is extremely rapid, typically occurring within five or six periods with probability close to 1. What happens in those first few periods is important for the determination of the asymptotic state.

Network architecture: Systematic differences can be identified in the behavior of different networks. For example, in three-person complete networks (where each agent observes all the others), learning stops almost immediately and the probability of an incorrect action in the long run is high. In three-person incomplete networks, learning continues for a longer time and the probability of choosing an incorrect action in the long run is lower.

The rest of the paper is organized as follows. In Section 2 we define the model and the equilibrium concept more precisely. In Section 3 we use the case of two-person networks to illustrate the working of the general model and some special features of complete networks. In Section 4 we derive the asymptotic properties of the general model. In Section 5 we study a selection of three-person graphs. Here we see the impact of lack of common knowledge on the dynamics of social learning and the efficiency of aggregation. We also compare the dynamic and asymptotic properties of different networks. The results are discussed in Section 6. Proofs are gathered in Section 7.

2. THE MODEL

The social learning literature ignores the complications of strategic behavior in order to focus on pure Bayesian learning.

Non-strategic behavior is simpler to analyze and it also seems more appropriate for a model of social behavior. However, special assumptions are needed to rationalize non-strategic behavior. In the SSLM, for example, an agent is assumed to make a once-in-a-lifetime decision. Because his payoff is independent of other agents' actions, it is rational for him to behave myopically and ignore the effect of his action on the agents who follow him. In the SNM, an agent's payoff is independent of other agents' actions but, unlike the SSLM, agents make repeated decisions. In order to eliminate strategic behavior, we assume that the economy comprises a large number of individually insignificant agents and that agents only observe the distribution of actions at each date. Since a single agent cannot affect the distribution of actions, he cannot influence the future play of the game. This allows us to ignore "strategic" considerations and focus on the pure Bayesian-learning aspect of the model.

The agents

Formally, we assume there is a finite set of locations indexed by i = 1, ..., n. At each location, there is a non-atomic continuum of identical agents. In the sequel, the continuum of agents at location i is replaced by a single representative agent i who maximizes his short-run payoff in each period.

Uncertainty is represented by a probability measure space (Ω, F, P), where Ω is a compact metric space, F is a σ-field, and P a probability measure. Time is represented by a countable set of dates indexed by t = 1, 2, .... Let A ⊂ R be a finite set of actions and let U : A × Ω → R be the common payoff function, where U(a, ·) is a bounded, measurable function for every action a. Each (representative) agent i receives a private signal σi(ω) at date 1, where σi : Ω → R is a random variable.

The network

A social network is represented by a family of sets {Ni : i = 1, ..., n}, where Ni ⊆ {1, ..., i − 1, i + 1, ..., n}. For each agent i, Ni denotes the set of agents j ≠ i who can be observed by agent i. We can think of Ni as representing i's "neighborhood". The sets {Ni : i = 1, ..., n} define a directed graph with nodes N = {1, ..., n} and edges E = ∪i=1,...,n {(i, j) : j ∈ Ni}. The social network determines the information flow in the economy. Agent i can observe the action of agent j if and only if j ∈ Ni. Agents have perfect recall, so their information set at each date includes the actions they have observed at every previous date.

For any nodes i and j, a path from i to j is a finite sequence i1, ..., iK such that i1 = i, iK = j and ik+1 ∈ Nik for k = 1, ..., K − 1. A node i is connected to j if there is a path from i to j. The network {Ni} is connected if every pair of nodes i and j is connected. Connectedness is essential for uniformity of behavior, but not for other results.
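The connectedness condition lends itself to a mechanical check. The following Python sketch is our own illustration rather than anything from the paper: it encodes a network as the family of neighborhoods Ni (here, a dictionary mapping each node to the set of nodes it observes) and uses breadth-first search to test whether every node is connected to every other node in the sense just defined. The function name and the example networks are hypothetical.

from collections import deque

def is_connected(neighborhoods):
    # neighborhoods[i] is the set N_i of agents whose actions agent i observes.
    # The network is connected if, for every ordered pair (i, j), there is a
    # path i = i_1, ..., i_K = j with i_{k+1} in N_{i_k}.
    nodes = set(neighborhoods)
    for start in nodes:
        seen, queue = {start}, deque([start])
        while queue:                          # breadth-first search from `start`
            current = queue.popleft()
            for nxt in neighborhoods[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        if seen != nodes:                     # some node is unreachable from `start`
            return False
    return True

# A complete three-person network, a directed ring, and a network in which
# agent 2 observes no one and is observed only indirectly (hypothetical examples):
print(is_connected({1: {2, 3}, 2: {1, 3}, 3: {1, 2}}))   # True
print(is_connected({1: {2}, 2: {3}, 3: {1}}))            # True
print(is_connected({1: {2}, 2: set(), 3: {1, 2}}))       # False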

Equilibrium

At the beginning of each date t, agents choose actions simultaneously. Then each agent i observes the actions ajt chosen by the agents j ∈ Ni and updates his beliefs accordingly. Agent i's information set at date t consists of his signal σi(ω) and the history of actions {ajs : j ∈ Ni, s ≤ t − 1}. Agent i chooses the action ait to maximize the expectation of his short-run payoff U(ait, ω) conditional on the information available.

An agent's behavior can be described more formally as follows. Agent i's choice of action at date t is described by a random variable Xit(ω) and his information at date t is described by a σ-field Fit. Since the agent's choice can only depend on the information available to him, Xit must be measurable with respect to Fit. Since Fit represents the agent's information at date t, it must be the σ-field generated by the random variables σi and {Xjs : j ∈ Ni, s ≤ t − 1}. Note that there is no need to condition explicitly on agent i's past actions because they are functions of the past actions of agents j ∈ Ni and the signal σi(ω). Finally, since Xit is optimal, there cannot be any other Fit-measurable choice function that yields a higher expected utility. These are the essential elements of our definition of equilibrium, as stated below.

Definition 1. A weak perfect Bayesian equilibrium consists of a sequence of random variables {Xit} and σ-fields {Fit} such that for each i = 1, ..., n and t = 1, 2, ...,
(i) Xit : Ω → A is Fit-measurable,
(ii) Fit = F(σi, {Xjs : j ∈ Ni, s ≤ t − 1}), and
(iii) E[U(x(ω), ω)] ≤ E[U(Xit(ω), ω)] for any Fit-measurable function x : Ω → A.

Note that our definition of equilibrium does not require optimality "off the equilibrium path". This entails no essential loss of generality as long as it is assumed that the actions of a single agent, who is of measure zero, are not observed by other players. Then a deviation by a single agent has no effect on the subsequent decisions of other agents.

3. LEARNING WITH TWO (REPRESENTATIVE) AGENTS AND TWO ACTIONS

To fix ideas and illustrate the workings of the basic model, we first consider the special case of two representative agents, A and B, and two actions, 0 and 1. There are three graphs, besides the empty graph NA = NB = ∅:
(i) NA = {B}, NB = {A};
(ii) NA = {B}, NB = ∅;
(iii) NA = ∅, NB = {A}.

Cases (ii) and (iii) are uninteresting because there is no possibility of mutual learning. For example, in case (ii), agent B observes a private signal and chooses the optimal action at date 1. Since he observes no further information, he chooses the same action at every subsequent date. Agent A observes a private signal and chooses the optimal action at date 1. At date 2, he observes agent B's action at date 1, updates his beliefs and chooses the new optimal action at date 2. After that, A receives no additional information, so agent A chooses the same action at every subsequent date. Agent A has learned something from agent B, but that is as far as it goes. In case (i), on the other hand, the two agents learn from each other and learning can continue for an unbounded number of periods. We focus on the network defined in (i) in what follows.

For simplicity, we consider a special information and payoff structure. We assume that Ω = ΩA × ΩB, where Ωi is an interval [a, b] and the generic element is ω = (ωA, ωB). The signals are assumed to satisfy

σi(ω) = ωi, ∀ω ∈ Ω, i = A, B,

where the random variables ωA and ωB are independently and continuously distributed, that is, P = PA × PB and Pi has no atoms. There are two actions a = 0, 1 and the payoff function is assumed to satisfy

u(a, ω) = 0 if a = 0, and u(a, ω) = U(ω) if a = 1,

where the function U(ωA, ωB) is assumed to be a continuous and increasing function. To avoid trivialities we assume that neither action is weakly dominated.

These assumptions are sufficient for the optimal strategy to have the form of a cutoff rule. To see this, note that for any history that occurs with positive probability, agent i's beliefs at date t take the form of an event {ωi} × Bjt, where the true value of ωj is known to belong to Bjt. Then the payoff to action 1 is ϕi(ωi, Bjt) = E[U(ωA, ωB) | {ωi} × Bjt]. Clearly, ϕi(ωi, Bjt) is increasing in ωi, because the distribution of ωj is independent of ωi, so there exists a cutoff ω∗i(Bjt) such that

ωi > ω∗i(Bjt) ⟹ ϕi(ωi, Bjt) > 0, and
ωi < ω∗i(Bjt) ⟹ ϕi(ωi, Bjt) < 0.

We assume that when an agent is indifferent between two actions, he chooses action 1. The analysis is essentially the same for any other tie-breaking rule.
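Because ϕi(·, Bjt) is increasing, the cutoff ω∗i(Bjt) can be located numerically by bisection. The Python sketch below is our own illustration, not the authors' code; it assumes, purely for concreteness, that ωj is uniform on the interval Bjt (as in the example of Section 3.1 below) and approximates the conditional expectation by Monte Carlo, so any increasing payoff function U can be plugged in. The function and parameter names are hypothetical.

import numpy as np

def cutoff(U, B_j, a=-1.0, b=1.0, draws=200_000, tol=1e-6, seed=0):
    # phi_i(w_i, B_jt) = E[U(w_i, w_j) | w_j in B_jt], assumed increasing in w_i.
    # Returns an approximation of the cutoff w*_i(B_jt) where phi_i crosses zero.
    rng = np.random.default_rng(seed)
    w_j = rng.uniform(B_j[0], B_j[1], draws)   # opponent's signal, uniform on B_jt by assumption
    phi = lambda w_i: U(w_i, w_j).mean()
    lo, hi = a, b
    if phi(lo) >= 0:
        return lo                              # action 1 optimal for every signal in [a, b]
    if phi(hi) <= 0:
        return hi                              # action 0 optimal for every signal in [a, b]
    while hi - lo > tol:                       # bisection on the increasing function phi
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if phi(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

# With U = wA + wB and the opponent's signal known to lie in [0, 1], the cutoff
# is -E[w_j | w_j in [0, 1]] = -1/2:
print(cutoff(lambda w_i, w_j: w_i + w_j, (0.0, 1.0)))    # approximately -0.5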

The fact that agent i's strategy takes the form of a cutoff rule implies that the set Bit is an interval. This can be proved by induction as follows. At date 1, agent j has a cutoff ω∗j1 and Xj1(ω) = 1 if and only if ωj ≥ ω∗j1. Then at date 2 agent i will know that the true value of ωj belongs to Bj2(ω), where

Bj2(ω) = [ω∗j1, b] if Xj1(ω) = 1, and Bj2(ω) = [a, ω∗j1) if Xj1(ω) = 0.

Now suppose that at some date t, the information set Bjt(ω) ⊆ [a, b] is an interval and agent j's cutoff is ω∗jt(Bit(ω)). Then at date t + 1, agent i knows that ωj belongs to Bjt+1(ω), where

Bjt+1(ω) = Bjt(ω) ∩ [ω∗jt(Bit(ω)), b] if Xjt(ω) = 1, and Bjt+1(ω) = Bjt(ω) ∩ [a, ω∗jt(Bit(ω))) if Xjt(ω) = 0.

Clearly, Bjt+1(ω) is also an interval. Hence, by induction, Bit(ω) is an interval for all t and the common knowledge at date t is Bt(ω) = BAt(ω) × BBt(ω). By construction, ω ∈ Bt+1(ω) ⊆ Bt(ω) for every t. Then Bt(ω) decreases to B(ω) = ∩∞t=1 Bt(ω), and {B(ω) : ω ∈ Ω} defines a partition of Ω. Note that ω ∈ B(ω), so B(ω) ≠ ∅.

In the limit, when all learning has ceased, agent A knows that ωB belongs to a set BB(ω) and agent B knows that ωA belongs to BA(ω). Furthermore, because the actions chosen at each date are common knowledge, the sets BA(ω) and BB(ω) are common knowledge.

An interesting question is whether, given their information in the limit, the two agents must agree which action is best. In the two-person case, we can show directly that both agents must eventually agree, in the sense that they choose different actions only if they are indifferent. The proof is by contradiction. Suppose, contrary to what we want to prove, that for some B and every ω such that B(ω) = B,

E[U(ωA, ωB) | {ωA} × BB] > 0 and E[U(ωA, ωB) | BA × {ωB}] < 0.

Then the same actions must be optimal for every element in the information set (otherwise, more information would be revealed) and this implies

E[U(ω̲A, ωB) | {ω̲A} × BB] ≥ 0 and E[U(ωA, ω̄B) | BA × {ω̄B}] ≤ 0,

where ω̲A = inf BA(ω) and ω̄B = sup BB(ω). Then

U(ω̲A, ω̄B) ≥ 0 and U(ω̲A, ω̄B) ≤ 0,

or U(ω̲A, ω̄B) = 0. If BB is not a singleton, then E[U(ω̲A, ωB) | {ω̲A} × BB] < U(ω̲A, ω̄B) = 0, a contradiction; a symmetric argument applies if BA is not a singleton. Thus, B is a singleton and U(ω) = 0 if ω ∈ B. The set {ω : U(ω) = 0} has probability zero, so the probability of disagreeing forever is 0. In other words, both agents will choose the same action in finite time and, once they have chosen the same action, they have reached an absorbing state and will continue to choose the same action in every subsequent period.

3.1. An example

To illustrate the short-run dynamics of the model, we can further specialize the example by assuming that, for each agent i, the signal σi(ω) = ωi is uniformly distributed on the interval [−1, 1] and the payoff to action 1 is U(ω) = ωA + ωB.

At date 1, each agent chooses 1 if his signal is positive and zero if it is negative. If both choose the same action at date 1, they will continue to choose the same action at each subsequent date. Seeing the other agent choose the same action will only reinforce each agent's belief that he has made the correct choice. No further information is revealed at subsequent dates and so we have reached an absorbing state, in which each agent knows his own signal and that the other's signal has the same sign, but nothing more. So interesting dynamics occur only in the case where agents choose different actions at date 1.

The exact nature of the dynamics depends on the relative strength of the two signals, measured here by their absolute values. Without loss of generality, we assume that A has a negative signal, B a positive signal, and B's signal is relatively the stronger, i.e., |ωA| < ωB.

Case 1: ωA > −1/2 and ωB > 1/2. In the first round at date 1, agent A will choose action 0 and agent B will choose action 1. At the second date, having observed that agent B chose 1, agent A will switch to action 1, while agent B will continue to choose 1. Thereafter, there is an absorbing state in which both agents choose 1 for ever and no further learning occurs.

Case 2: −3/4 < ωA < −1/2 and ωB > 3/4. As before, A chooses 0 and B chooses 1 at date 1. At date 2, A observes that B chose 1 and infers that his signal has expected value 1/2. Since ωA < −1/2, A continues to choose 0, while B, having observed that A chose 0, continues to choose 1. At date 3, A knows that B chose 1 even after learning that A's signal is negative, which reveals that ωB > 1/2, so the expected value of B's signal is 3/4; since ωA > −3/4, it is optimal for him to switch to 1, which then becomes an absorbing state.

Case 3: −(t − 1)/t < ωA < −(t − 2)/t and ωB > (t − 1)/t. By analogous reasoning, A chooses 0 and B chooses 1 until date t, when A switches to 1.
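The case-by-case reasoning above can be mechanized. The following Python sketch is our own illustration of the dynamics under the Section 3.1 assumptions (signals uniform on [−1, 1], U(ω) = ωA + ωB, ties broken in favor of action 1): each agent tracks an interval believed to contain the other's signal, his cutoff is minus the midpoint of that interval, and the intervals are refined from the other agent's observed action exactly as in the recursion for Bjt+1. The function and variable names are ours.

def simulate(w_A, w_B, T=6):
    # Trace the two agents' actions for T dates under the Section 3.1 assumptions.
    w = {"A": w_A, "B": w_B}
    other = {"A": "B", "B": "A"}
    belief = {"A": [-1.0, 1.0], "B": [-1.0, 1.0]}   # belief[i]: interval containing the other's signal
    history = []
    for t in range(1, T + 1):
        cut = {i: -0.5 * sum(belief[i]) for i in "AB"}   # cutoff = -E[w_j | current interval]
        act = {i: int(w[i] >= cut[i]) for i in "AB"}     # indifference resolved in favor of 1
        history.append((t, act["A"], act["B"]))
        for i in "AB":
            j = other[i]
            if act[j] == 1:                              # j's action reveals w_j >= cut[j] ...
                belief[i][0] = max(belief[i][0], cut[j])
            else:                                        # ... or w_j < cut[j]
                belief[i][1] = min(belief[i][1], cut[j])
    return history

# Case 1: w_A in (-1/2, 0) and w_B > 1/2 -- A switches to 1 at date 2.
print(simulate(-0.4, 0.6))
# Case 2: w_A in (-3/4, -1/2) and w_B > 3/4 -- A switches to 1 at date 3.
print(simulate(-0.6, 0.8))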

The other interesting case to consider is when the signals are similar in strength. For example, suppose that −ωA = ωB = x∗, where x∗ is the limit of the sequence {xt}∞t=1 defined by putting x1 = 1/2, x2 = 1/4, and xt = (xt−1 + xt−2)/2 for t = 3, 4, .... Notice that xt < x∗ if t is even and xt > x∗ if t is odd. At date 1, A chooses 0 and B chooses 1. At date 2, A infers that the expected value of B's signal is x1 and, since ωA = −x∗ > −x1, switches to 1. By the symmetric argument, B switches to 0. At date 3, A observes B's switch to 0 and realizes that ωB < x1, so the expected value of B's signal is x2 = 1/4; since ωA < −x2, A switches back to 0, and so on. At each date t + 1, the expected value of B's signal from A's point of view is xt = (xt−1 + xt−2)/2, so it is optimal for A to choose 0 at t + 1 when t is even and 1 when t is odd, and the two agents never choose the same action.

In fact, we can find a signal ω to rationalize any sequence of actions with the properties that, for some T, xAt ≠ xBt for t < T and xAt = xBt = a for t ≥ T. However, the sequences corresponding to T = ∞ occur with probability 0 and the sequences with T < ∞ occur with positive probability.

This example can also be used to illustrate the speed of convergence to uniformity of actions. In the first period, the probability that agents choose the same action is 1/2. In the second period, it is 3/4. In the third period, it is 7/8, and so on. This is a very special example, but simulations of other examples confirm these results.

Finally, we note that in this simple example, where the signals of the two players are symmetrically distributed, the asymptotic outcome must be Pareto-efficient. This follows from the fact that the agent with the stronger signal, as measured by its absolute value, will ultimately determine the action chosen. However, a simple adjustment to this example shows the possibility of an inefficient outcome. Suppose that A has a signal uniformly distributed on [0, 1] and B has a signal uniformly distributed on [−1/2, 1]. Then both A and B will choose action 1 at the first date and there will be no learning. However, if ωA is close to 0 and ωB is close to −1/2, then action 0 is clearly preferred conditional on the information available to the two agents.
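The knife-edge sequence and the convergence-speed figures can both be checked numerically. The Python sketch below is our own, not the authors': it computes the sequence xt, whose limit works out to 1/3, and estimates by Monte Carlo the probability that the two agents have chosen the same action by each date under the same uniform/additive assumptions; the estimates should be close to the values 1/2, 3/4, 7/8, ... quoted above.

import numpy as np

def x_sequence(n):
    # x1 = 1/2, x2 = 1/4, xt = (x_{t-1} + x_{t-2})/2; the limit is 1/3.
    xs = [0.5, 0.25]
    while len(xs) < n:
        xs.append(0.5 * (xs[-1] + xs[-2]))
    return xs

print(x_sequence(10))      # oscillates around and converges towards 1/3

def agreement_probabilities(T=5, trials=50_000, seed=0):
    # Monte Carlo estimate of P(both agents choose the same action at date t)
    # under the Section 3.1 assumptions (signals uniform on [-1, 1], U = wA + wB).
    rng = np.random.default_rng(seed)
    counts = np.zeros(T)
    for _ in range(trials):
        w = rng.uniform(-1.0, 1.0, 2)
        belief = [[-1.0, 1.0], [-1.0, 1.0]]   # belief[i]: interval for the other agent's signal
        for t in range(T):
            cut = [-0.5 * sum(belief[0]), -0.5 * sum(belief[1])]
            act = [int(w[0] >= cut[0]), int(w[1] >= cut[1])]
            counts[t] += act[0] == act[1]
            for i, j in ((0, 1), (1, 0)):
                if act[j] == 1:
                    belief[i][0] = max(belief[i][0], cut[j])
                else:
                    belief[i][1] = min(belief[i][1], cut[j])
    return counts / trials

print(agreement_probabilities())   # roughly [0.5, 0.75, 0.875, 0.9375, 0.96875]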
