Journal of Memory and Language 76 (2014) 29–46
Contents lists available at ScienceDirect
Journal of Memory and Language
journal homepage: www.elsevier.com/locate/jml
Alignment and task success in spoken dialogue
David Reitter a,⇑, Johanna D. Moore b
a College of Information Sciences and Technology, The Pennsylvania State University, United States
b School of Informatics, University of Edinburgh, United Kingdom
Article info
Article history: Received 12 June 2013; revision received 26 May 2014
Keywords: Dialogue; Interactive alignment; Syntactic priming; Structural priming; Entrainment; Task success
Abstract
Task-solving in dialogue depends on the convergence of the situation models held by the dialogue partners. The Interactive Alignment Model (Pickering & Garrod, 2004) suggests that this convergence is the result of an interactive alignment process, which is based on mechanistic repetition at a number of linguistic levels. In this paper, we develop two predictions arising from the theory, along with two methods to quantify the known structural priming effects in the full inventory of syntactic choices found in text and speech corpora. (a) Under a rational perspective, we expect increased repetition in task-oriented dialogue compared to spontaneous conversation. We find within- and between-speaker priming in a corpus of spontaneous conversations, but stronger priming in task-oriented dialogue. (b) The Interactive Alignment Model predicts linguistic adaptation to be correlated with task success. We show this effect in a corpus of task-oriented dialogue, where we find a positive correlation between long-term adaptation and a quantifiable task success measure. We argue that the repetition tendency relevant for the high-level alignment of situation models is based on slow adaptation rather than short-term priming. We demonstrate that lexical and syntactic repetition are reliable and computationally exploitable predictors of task success.
© 2014 Elsevier Inc. All rights reserved.
Introduction
Humans appear to be remarkably efficient communicators in light of the computational complexity of natural language.
Dialogue poses many challenges: interlocutors have different viewpoints, linguistic preferences and knowledge states. What may help is that we are copycats rather than creators; we prefer to adapt our language rather than to go against the grain. The Interactive Alignment Model (IAM, Pickering & Garrod, 2004) posits that such mutual adaptation is easier than careful selection of information and targeting of the message in dialogue. The IAM suggests that basic priming effects at lower processing levels (lexical, syntactic) reinforce alignment at higher ones (e.g., semantic, pragmatic), leading to linguistic adaptation and grounding of situation models during speaker interaction. Priming occurs when memory retrieval is biased by previous context; in this case, priming refers to a tendency to choose linguistic constructions that have been used shortly beforehand. The IAM assumes that this repetition of linguistic choices is not just an artifact of general memory retrieval properties, but instead is a mechanism (alignment) by which interlocutors build a common understanding of the situation, enabling them to successfully communicate without keeping track of one another’s linguistic idiosyncrasies. According to the IAM, repetition is a heuristic that helps establish common ground unless the situation requires more careful monitoring and modeling of one’s interlocutor’s state of knowledge.
http://dx.doi.org/10.1016/j.jml.2014.05.008
0749-596X/© 2014 Elsevier Inc. All rights reserved.
⇑ Corresponding author. Address: IST Building, Penn State, University Park, PA 16802, USA.
E-mail addresses: reitter@psu.edu (D. Reitter), J.Moore@ed.ac.uk (J.D. Moore).
The success of our interactions varies. The success of task-oriented dialogue depends on communication and is quantifiable, allowing us to test the IAM by linking it to
alignment. In this paper, we correlate priming at levels of sentence structure (syntax) and word choice, the problem-solving objective of the dialogue, and success.
Hypotheses
Humans align their linguistic choices at several representational levels. At a low level, phonetic reductions occur in jointly understood words (Bard et al., 2000). An example of adaptation at a higher level of representation involves dialogue partners that develop coherent situation models, as in Garrod and Anderson’s (1987) Maze Game study. The task was designed to elicit a coordinated communication system between participants. They found that speakers tended to make the same semantic and pragmatic choices as in the utterances they had just heard. As the games proceeded, participants developed a common description scheme for positions in the maze. However, the full causal cascade from lower-level priming to high-level alignment has not yet been observed. Specifically, the hypothesized correlation between the two, and ultimately successful communication, has eluded empirical verification. In this paper, we focus on implicit linguistic decisions: the basic mechanics of communication implemented in syntactic structure, as opposed to the high-level strategies speakers use to describe aspects of a task, or the more explicitly controlled lexical choices. Syntactic priming occurs when speakers show a tendency to prefer one phrase structure over an available alternative shortly after having used this structure or having heard an interlocutor use it (Bock, 1986). Verbatim, lexical repetition is known to increase the strength of priming (Gries, 2005; Hartsuiker, Bernolet, Schoonbaert, Speybroeck, & Vanderelst, 2008; Pickering & Branigan, 1998). This lexical boost is a crucial effect for the IAM, as it shows propagation of alignment from lower to higher levels of representation.
Thus far, there is only limited evidence for the occurrence of structural adaptation outside of carefully controlled laboratory settings. As we will see, speakers also adapt in situated, realistic dialogue. For example, consider this excerpt from the Map Task corpus (Anderson et al., 1991; McKelvie, 1998), a dataset that we will use extensively in this study. One speaker (g) is giving directions for another one (f) to follow on a map:
f: from the mill wheel and up to the abandoned cottage to the right like a tick shape it’d be s– [the shape of a tick] from the
g: no
g: [the shape of a] [like an oval shape] from the caravan park you start just above the caravans
Here, g first sets out to repeat the latest syntactic construction (the shape of an oval), but proceeds to use an alternative one (like an oval shape) in its repair, mirroring his interlocutor’s first syntactic choice (like a tick shape). The spontaneous syntactic choice is a direct repetition, but would be ungrammatical if completed (the shape of a oval). Both of g’s expressions reflect structural repetitions rather than plausible alternatives to describe an oval-shaped path. This example of repetition reflects not only syntactic, but also lexical choices. A quantitative model of priming should cover such cases, but also repetitions that occur outside of lexically or semantically similar contexts. In our study, we are concerned with implicit (syntactic) effects. We therefore measure priming of syntactic phrase-structure rules, whereby word-by-word repetition (topicality effects, parroting) is explicitly excluded. We examine the IAM from a functional perspective, and derive two groups of testable hypotheses. The first examines syntactic priming in task-oriented dialogue, while the second adds a functional perspective by showing a correlation between adaptation and task success. Our first hypothesis concerns the mechanisms of priming.
Syntactic priming is claimed to be a mechanistic effect, though this does not necessarily mean that it is automatic and agnostic to contextual influence. According to some cognitive architectures (Anderson & Lebiere, 1998), priming effects are the result of working memory activity. From a functional and rationalist point of view, the enhancement of communication by priming suggested by the IAM could have led to an architectural configuration where the demands of the dialogue situation drive syntactic priming. For instance, syntactic representations may be temporarily associated with semantic ones. Topics determine the semantics held in working memory, and so meaning is typically clustered rather than randomly mixed. In line with this, theories of dialogue have suggested clustering of topics, and coherence of topic structure (Grosz, Joshi, & Weinstein, 1995; Grosz & Sidner, 1986). Given any syntactic-semantic associations, syntactic structure may tend to cluster as well. We hypothesize that there is a tendency for dialogue partners to repeat syntactic structure within brief time windows, and that they do so more in task-oriented dialogue than in spontaneous conversation. Regardless of the underlying mechanisms, the IAM seems incompatible with the inverse hypothesis: less priming in task-oriented dialogue. In the first set of experiments (1–2), we look at short-term priming effects and whether speakers implicitly use increased short-term adaptation in situations where they may benefit from it. The second hypothesis is derived from the IAM’s core idea connecting low-level priming to high-level mutual understanding and task success. Adaptation itself is difficult to manipulate in naturalistic human–human dialogue. However, we expect observable variation in adaptation levels. The IAM predicts that task-oriented dialogues that exhibit more syntactic adaptation between the interaction partners will ultimately yield more task success.
We test this prediction in Experiments 3–4. We conclude with an experiment that uses machine learning techniques to demonstrate that both syntactic and lexical alignment can be exploited to predict task success (Experiment 5). We will refer to several different variants of syntactic adaptation. Adaptation denotes an increased amount of re-use of decisions compared to expected repetition occurring by chance. Short-term priming is short-lived adaptation, which disappears after a few seconds. Long-term adaptation is adaptation that is enhanced by repeated
exposure, persistently increasing the availability of syntactic structures. Alignment is a cascade of adaptation processes between speakers at different linguistic levels postulated by the IAM. Alignment culminates in assimilated situation models and established ad hoc conventions between speakers.
Interactive alignment and structural priming in dialogue
Structural priming is a special case of adaptation, either between or within speakers. Language production and comprehension are biased by recent experience, regardless of whether the structures were observed while comprehending language, or whether they were used in one’s own speech. Alignment at the syntactic level is well-documented and known to occur in a variety of contexts: between questions and answers (Levelt & Kelter, 1982), in comprehension and production. It can be specific to dialogue partners (Brennan & Hanna, 2009) or to the perceived abilities of an interlocutor (Branigan, Pickering, Pearson, McLean, & Brown, 2011). Bock (1986) established the experimental paradigm that uncovered structural priming in speech. Bock and Loebell (1990) demonstrated evidence for priming of syntactic structure independent of semantics and metrical or event structure. Pickering and Branigan (1998) found syntactic priming in written language production using scripted situations and a sentence completion task. Branigan, Pickering, and Cleland (2000) found clear evidence for syntactic alignment in dialogue-like lab interactions. Their experimental design is prototypical of much of the experimental work in structural priming. In their experiments, dialogue partners took turns describing pictures to one another to enable their partner to identify the card containing the described picture from a set of cards laid out in front of them.
One of the speakers was a confederate and produced descriptions based on a script that manipulated syntactic choice, in particular whether a double object or a prepositional object construction was used (e.g., the cowboy giving the clown a balloon vs. the cowboy giving a balloon to the clown). The syntactic structure of the confederate’s description strongly influenced the syntactic structure of the subject’s description in the turn immediately following. Two adaptation effects occur: (a) fast, short-term and short-lived priming, and (b) slow, long-term adaptation that persists and is likely to be a result of implicit learning (see Ferreira & Bock, 2006, and Pickering & Ferreira, 2008, for reviews). Long-term adaptation is a learning effect that can persist over several days (Bock, Dell, Chang, & Onishi, 2007; Kaschak, Kutta, & Schatschneider, 2011). Recent work has proposed models that explain the mechanisms of the effects (Bock & Griffin, 2000; Kaschak, Kutta, & Jones, 2011) within the context of language acquisition (Chang, Dell, & Bock, 2006) and general memory retrieval (Reitter, Keller, & Moore, 2011). The remainder of this article will address short-term syntactic priming first, and then discuss experiments with long-term syntactic and lexical alignment. Most of the results on priming and alignment come from controlled experiments. We caution that designs in which subjects do a task constructed to elicit linguistic target constructions many times may not be a true reflection of linguistic choices made by participants in natural, spontaneous real-life dialogue. For instance, findings regarding verb-argument preferences in experimental conditions do not always correlate well with corpus studies (Roland & Jurafsky, 2002). One reason why some linguistic laboratory experiments fail to faithfully reproduce real-world language use may be the complexity of linguistic choice as evidenced by models derived from corpora.
Gries (2005) argues that experimental designs may effectively control only some confounds, but not the variety of factors that influence linguistic decision-making. Such criticisms are addressed by work on language elicited outside of artificially created situations, often in the context of spoken dialogue (Bock & Kroch, 1989; Dubey, Keller, & Sturt, 2005; Estival, 1985; Gries, 2005; Levelt & Kelter, 1982; Szmrecsanyi, 2005, 2006). These studies corroborate the laboratory experiments and also show that structural priming occurs in spontaneously produced language. However, these studies employ a design pattern that contrasts the use of alternative syntactic choices sharing the same semantics (e.g., She picks up the book vs. She picks the book up). Typically, such use of explicit alternations limits corpus studies as well as lab experiments to a small set of predetermined syntactic rules or constructions, such as particle placement as in the example, active vs. passive voice, or double object (DO) vs. prepositional object (PO) use for arguments to verbs. This design also hinges on a very simple notion of semantics. One could object that active and passive constructions, for instance, are not semantically equivalent and carry different connotations and information statuses (Steedman, 2000). Syntactic alternations mark syntactic choice points, i.e., where a speaker must choose a construction to use. The corpus-based approach we follow refers to syntactic choices, but does not require alternations to define or even measure priming. Pickering and Garrod (2004) argue that if the main reason that priming effects occur is to facilitate alignment, they will be particularly strong during natural interactions. Corpora provide an opportunity to quantify and contrast spontaneous processes and the interaction between linguistic choices and cognitive tasks. The next section will describe this methodology in detail.
Methodology: measuring short-term priming in corpora
What we describe in the following is a method to quantify and contrast priming levels in datasets that contain language spontaneously produced in contexts not designed to elicit syntactic priming or to test the IAM. The Switchboard corpus (Marcus et al., 1994) is a set of spontaneous telephone conversations; the HCRC Map Task corpus (Anderson et al., 1991) contains task-oriented dialogues. Consider the following example. If we wanted to detect priming of passive constructions, we could do so with a range of different verbs and semantics by counting occurrences of passives, and contrasting the counts under two
conditions: a repetition case (where a passive occurred shortly before), and a control case (where the passive has not occurred recently). Priming is measured as the difference between the normalized counts. Under this view, priming is not repetition, but the increase in probability caused by a preceding occurrence. Our technique is similar, but extends this method by looking at all syntactic constructions rather than just passives, and by using regression for greater sensitivity. In this and other corpus studies, adding predictors as controls replaces the strict control of semantics in the laboratory experiment. We see a high degree of variance in speakers’ choices of syntactic forms, which is natural, as the underlying semantics largely dictate how to construct the sentences. However, examining a large number of data points allows us to treat semantic variation as noise.
Corpus processing
To examine “all kinds of syntactic constructions”, we analyze our datasets in terms of their syntactic phrase structure. Both of the corpora have been annotated with phrase structure trees through automatic and manual processes that included extensive verification (Anderson et al., 1991; Marcus et al., 1994). From the trees, we identify the syntactic rules used to construct them. We see the rules as a proxy for memory items that a speaker has to retrieve to produce or comprehend a sentence. For example, the tree for keeping on the edge of the page, (VP (VBG keeping) (PP (IN on) (NP (NP (AT the) (NN edge)) (PP (IN of) (NP (AT the) (NN page)))))), yields the six phrase structure rule instances shown in Table 1.¹ The conversion from syntactic trees to rule instances is unambiguous.
Decay-based model of short-term priming
The amount of rule repetition can now be quantified. Structural priming predicts that a rule (target) occurs more often closely after a potential prime of the same rule (stimulus) than further away. Therefore, we correlate the probability of repetition with the distance between prime and target. For example, if a sentence-level conjunction leads to the rule S →
S CC S, and such a conjunction appears in utterances 3 and 11, we would observe a repetition, noting its distance d = 8 utterances. We sample repetitions and non-repetitions within 1-s or 1-utterance windows at different distances (ln(DIST), up to 25 utterances or 15 s). Thus, a rule occurrence in the dialogue will normally lead to up to 25 or 15 data points for the various distances, with a binary response variable indicating repetition vs. non-repetition. Memory effects generally decay non-linearly, and an analysis of the repetition probabilities over increasing d confirmed this pattern. DIST is therefore log-transformed (ln(DIST)) in our models. Unlike in controlled experimentation, where specific syntactic constructions are elicited, every rule may be biased by a prior prime in this paradigm. Fig. 1 shows a subset of the rules appearing in the text. Repetitions a and b are both at distance 2, because the occurrences (prime and target) are two utterances apart, or 4.6 and 3.2 s, respectively. To facilitate the computation, we also drop all hapax rules (frequency f = 1). We exclude cases where syntactic repetition is a mere consequence of verbatim lexical repetition (c). The reason for this is that speakers may merely repeat such phrases without analyzing them syntactically. Lexical repetition is likely to result in syntactic repetition, which would possibly inflate results. The basic statistical model compares the probability of a rule occurrence in situations when it was and was not primed. The null hypothesis is that this probability should be unaffected by the prime. Our statistical model is a sensitive variant of this idea. We predict the probability of repetition as a function of the time between prime and target. Priming effects decay over time or are subject to interference in working memory, so the model assumes a decline of repetition probability with increasing distance between prime and target.
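The rule extraction under Corpus processing and the windowed sampling just described can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: trees arrive as bracketed strings; preterminal (tag to word) nodes are not counted as rules, which reproduces the six rules of Table 1; a prime counts as verbatim lexical repetition when its yield equals the target's (a simplification of case c in Fig. 1); and window d is assumed to hold primes whose onset lies between d and d + 1 s before the target, which places the 3.2 s repetition b of Fig. 1 in window 3. All names (`parse_tree`, `extract_rules`, `RuleInstance`, `sample_windows`, `TREE`) are hypothetical.

```python
# Hypothetical sketch: names, tree format and window binning are illustrative assumptions.
from collections import Counter
from dataclasses import dataclass
import math


def parse_tree(s):
    """Parse a bracketed tree such as '(NP (AT the) (NN page))' into (label, children) tuples."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        label = tokens[pos + 1]  # tokens[pos] is the opening "("
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word at a leaf
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return node()


def extract_rules(tree):
    """List phrase-structure rules in pre-order, skipping preterminals such as (NN page)."""
    label, children = tree
    subtrees = [c for c in children if isinstance(c, tuple)]
    rules = []
    if subtrees and len(subtrees) == len(children):
        rules.append(f"{label} → {' '.join(c[0] for c in subtrees)}")
    for child in subtrees:
        rules.extend(extract_rules(child))
    return rules


@dataclass(frozen=True)
class RuleInstance:
    onset: float  # seconds from the start of the dialogue
    speaker: str
    rule: str
    words: str  # the rule's yield, used to spot verbatim lexical repetition


def sample_windows(instances, max_dist=15):
    """One binary sample per target and one-second window d = 1..max_dist.

    Assumed binning: window d holds primes with d <= target.onset - prime.onset < d + 1.
    """
    freq = Counter(inst.rule for inst in instances)
    samples = []
    for target in instances:
        if freq[target.rule] < 2:  # hapax rules (f = 1) are dropped
            continue
        for d in range(1, max_dist + 1):
            repeated = any(
                prime.rule == target.rule
                and prime.words != target.words  # exclude verbatim lexical repetition
                and d <= target.onset - prime.onset < d + 1
                for prime in instances
            )
            samples.append((target.onset, target.rule, d, math.log(d), repeated))
    return samples


TREE = ("(VP (VBG keeping) (PP (IN on) (NP (NP (AT the) (NN edge))"
        " (PP (IN of) (NP (AT the) (NN page))))))")
# extract_rules(parse_tree(TREE)) lists the six rule instances of Table 1 in pre-order:
# ['VP → VBG PP', 'PP → IN NP', 'NP → NP PP', 'NP → AT NN', 'PP → IN NP', 'NP → AT NN']
```

Each returned tuple (target onset, rule, d, ln d, repeated) would be one row of the binary-response data; utterance-based distances, as in Experiment 1, would replace onset times with utterance indices.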
The slope of this decline is the basis for comparison of priming strength under different conditions. The logistic regression model is specified in the appendix.
Table 1
Syntactic rules and additional information extracted from the Map Task corpus. The speaker here is the direction follower, as opposed to the direction giver. This is a simplified example compared to the actual annotation.
Onset time (s)   Speaker    Syntactic rule   Yield
185.105          Follower   VP → VBG PP      Keeping on the edge of the page
185.363          Follower   PP → IN NP       On the edge of the page
185.490          Follower   NP → AT NN       The edge
185.490          Follower   NP → NP PP       The edge of the page
185.692          Follower   PP → IN NP       Of the page
185.729          Follower   NP → AT NN       The page
¹ The analysis uses the Brown Corpus part-of-speech tags (Kučera & Francis, 1967). IN: preposition; AT: determiner; VBG: verb, present participle/gerund; CC: sentence-level coordinating conjunction.
The effect of distance on syntactic repetition has been shown in related studies on corpora. Gries (2005) demonstrated a correlation of distance with the repetition
probability of selected syntactic alternations in a corpus of spoken and written English. Gries found no effect of distances greater than one parsing unit (a unit similar to an utterance). Similarly, in our data, we see a strong decay only during the initial 5 s. In our method, unlike that of Gries, we take the distance effect on repetition within the short initial time period as a measure of short-term priming and determine how it interacts with other variables. How repetition probability is modeled depends on assumptions about the underlying cognitive mechanisms. The first of two common views, temporal decay, implies a diminishing of repetition probability or priming effects over time. This assumes a form of decision-making that is influenced by decaying activation. The alternative view assumes interference from other material, resulting in a similar reduction in repetition probability. In this case, the selection of syntactic rules is influenced by interference from more recent syntactic structures even if they are inappropriate in light of contextual or semantic constraints (see Jonides et al., 2008, for a review of the two views of short-term memory). The latter view may also suggest an influence of working memory on syntactic decisions, where working memory provides cues that aid memory retrieval. Short-term priming can be modeled as a combination of rapid temporal decay of syntactic information, and cue-based memory retrieval subject to interfering and facilitating semantic and other information in working memory (Reitter et al., 2011). The interaction of multiple activation mechanisms is a common assumption of ACT-R (Anderson & Lebiere, 1998). An additional difficulty in modeling such phenomena arises from the fact that one mechanism (e.g., temporal) may act as a proxy for the other (e.g., interference-based).
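Under the temporal-decay reading, the method reduces to a logistic regression of the binary repetition response on ln(DIST), in which a negative slope signals priming. The sketch below is a minimal illustration under stated assumptions, not the model specified in the appendix: it fits only an intercept and the ln(DIST) slope by Newton-Raphson, omits all covariates and interaction terms of the full model, and runs on synthetic, deterministic data rather than corpus samples. The function names (`sigmoid`, `fit_logistic`, `make_synthetic`) are hypothetical.

```python
# Hypothetical sketch: a two-parameter logistic decay model on synthetic data only.
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, iters=25):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p  # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w  # information matrix X'WX
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det  # Newton step: (X'WX)^-1 times the gradient
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def make_synthetic(true_b0, true_b1, n_per_window=200, max_dist=15):
    """Illustrative data only: 0/1 responses whose rate in window d follows the given model."""
    xs, ys = [], []
    for d in range(1, max_dist + 1):
        x = math.log(d)  # the predictor is ln(DIST)
        k = round(n_per_window * sigmoid(true_b0 + true_b1 * x))
        xs += [x] * n_per_window
        ys += [1] * k + [0] * (n_per_window - k)
    return xs, ys


# Repetition that decays with distance yields a negative ln(DIST) slope (priming);
# a repetition rate that is flat across distances yields a slope of about zero.
decaying = fit_logistic(*make_synthetic(-1.5, -0.3))
flat = fit_logistic(*make_synthetic(-1.8, 0.0))
```

A steeper (more negative) slope indicates stronger short-term priming, which is why the slope serves as the basis for comparing priming strength across conditions.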
So, although the rational analysis of memory retrieval needed in typical environments or text corpora may suggest temporal decay at the computational level (Marr, 1982), the underlying cognitive processes and neural implementation may be different (Lewandowsky, Duncan, & Brown, 2004). The initial Experiment 1 models distance between stimulus and target (DIST) in terms of utterances, while the following experiments model it in seconds.² Cumulative priming by a stimulus that is repeated several times is not captured by this statistical model.
² We aim to show broad applicability of the method, but see time as the most reliable and neutral basis for decay. Reitter (2008) contains further experiments varying this metric.
Fig. 1. Arrows on the right illustrate two instances of syntactic repetitions (a, b) and a lexical-syntactic one (c) from Map Task. Repetition c is not counted, as it is also a lexical repetition. Arrows on the left show three samples (out of up to 15 per rule instance) connecting a rule instance of PP → IN NP (at bottom) with one-second time windows at varying distances d prior to the rule. The window at distance 3 contains repetition case b, yielding a positive sample (marked “Yes”). In the other two windows, there is no repetition, yielding negative samples.
We distinguish comprehension-production (CP) priming, where the speaker first comprehends the prime (uttered by his/her interlocutor) and then produces the target, and production-production (PP) priming, where both the prime and the target are produced by the same speaker. This distinction is encoded in the factor CP, which is coded as 1 for between-speaker CP priming, and 0 (base case) for within-speaker PP priming. A predictor ln(FREQ) is included to control for the frequency of the repeated syntactic rule in the corpus, as the log-transformed rule frequency normalized by corpus size. Frequency is an important covariate in many
D.Reirter.I.D.Moore/of Memory and Longuage 76 (2014)29-46 suspected to becomes less likely as the dista red in utterance from the first ence incr In(FREQ n summ hboard In(DiST is gives us an indicatio 8= -0.080.p0001),and the effect is reduced by we th on prot d to t fo In(FREO)interacts with In(DiST)(=0.05 rime to 15s or 25 utterances afterward (predictor:In(DisT)Thus,we is,we find less priming for more common rules. take has not been repeated. Discussion Experiment 1:repetition in corpora nce i repea While controlled expe o one another.the str ments have showr ifypriming by estimating the decay effect was developed that is in semantic The priming effect obtained in these corpora confirm common goal. onger time periods Method els indisti guishable from the prior after abou glance g separnatetgdcs Sh rke ously.Thmi that。i to)decav only after 140 words hich would be rd(Marcus et al ).a corpus of spontane mately 45s ata speech rate/min)Howeve mly paire most of the priming effect"dedline topic to discuss.but were otherwise unrestricted.The cor 0 words ent【oca.5s yielding 472,000 phrase structure rules with 4700 distinc ceeudedinh Niss 2004).After extracting all potential cann e he im inganequ number of repetition and non-repetit cas The second da is the M hlp3gecoaaining20400uteranCeg Experiment 2:priming and decay over time in different genres In this section.we develop the first of two hypotheses Results designed to test the IAM or some of its assumptions. Two r ssion models were fitted.one to each datase ingse (Table 2)Th ycontain the n()covariate toestima 0911ae dcha be ed t identify comprehension-production prming between na the A In Map Task,In(DIsT)reliably predicts declining rule repetition (8=-0.073.p<.0001).Repetition of a rule distance (p<0000)
psycholinguistic models and has long been suspected to interact with priming (e.g., Scheepers, 2003). In summary, our model demonstrates a priming effect by observing a decay, that is, a negative parameter for lnðDistÞ. How strong this decay is gives us an indication of how much repetition probability we see shortly after the stimulus (prime) compared to the probability of chance repetition—without ever explicitly calculating such a prior. We define the strength of priming as the decay rate of repetition probability, from shortly after the prime to 15 s or 25 utterances afterward (predictor: lnðDistÞ). Thus, we take several samples at varying distances (d), looking at cases of structural repetition, and cases where structure has not been repeated. Experiment 1: repetition in corpora While controlled experiments have shown syntactic priming, we first aim to demonstrate a sensitive method that can quantify and contrast priming magnitudes in corpora. We will examine two types of text: (a) spontaneous conversation, that is, in a situation where the semantics of the dialogue are not controlled and (b) task-oriented dialogue, where interlocutors collaborate to achieve a common goal. Method We use two datasets in this experiment and build two separate statistical models. Short-term priming effects are measured as described previously. The first dataset is Switchboard (Marcus et al., 1994), a corpus of spontaneous spoken telephone dialogues among randomly paired, North American English speakers who were given a general topic to discuss, but were otherwise unrestricted. The corpus contains 80,000 transcribed utterances were annotated with phrase structure trees (Marcus et al., 1994), yielding 472,000 phrase structure rules with 4700 distinct rules. Words in this portion of the corpus, included in the Penn Treebank, were time-tagged (Carletta, Dingare, Nissim, & Nikitina, 2004). 
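The look-back sampling described above (and illustrated in Fig. 1) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' extraction code: the `RuleInstance` and `Sample` types, the half-open window boundaries, emitting one sample per prime source (CP vs. PP) per window, and balancing by down-sampling are all hypothetical choices, and the exclusion of lexical repetitions (case c in Fig. 1) is omitted.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RuleInstance:
    time: float    # onset in seconds (assumes time-tagged words)
    speaker: str   # hypothetical speaker label, e.g. "A" or "B"
    rule: str      # phrase structure rule, e.g. "PP -> IN NP"


@dataclass(frozen=True)
class Sample:
    rule: str
    dist: int      # window distance d in seconds
    cp: int        # 1 = prime from the other speaker (CP), 0 = same speaker (PP)
    repeated: int  # 1 if the rule occurs in the window (positive sample), else 0


def sample_windows(instances: List[RuleInstance], max_dist: int = 15) -> List[Sample]:
    """Look back from every rule instance at one-second windows d = 1..max_dist.

    The window at distance d is assumed to cover [t - d, t - d + 1), so the
    target itself is never inside its own windows. For each window, one sample
    per prime source (same vs. other speaker) records whether that source
    produced the same rule inside the window.
    """
    samples = []
    for target in instances:
        for d in range(1, max_dist + 1):
            lo, hi = target.time - d, target.time - d + 1
            for cp in (0, 1):
                repeated = any(
                    prime.rule == target.rule
                    and lo <= prime.time < hi
                    and (prime.speaker != target.speaker) == bool(cp)
                    for prime in instances
                )
                samples.append(Sample(target.rule, d, cp, int(repeated)))
    return samples


def balance(samples: List[Sample], seed: int = 0) -> List[Sample]:
    """Down-sample the majority class so that repetition and non-repetition
    cases are equally frequent (one possible form of re-sampling)."""
    pos = [s for s in samples if s.repeated]
    neg = [s for s in samples if not s.repeated]
    n = min(len(pos), len(neg))
    rng = random.Random(seed)
    return rng.sample(pos, n) + rng.sample(neg, n)


# Speaker B repeats speaker A's rule 2.3 s later.
utterance_rules = [
    RuleInstance(10.2, "A", "PP -> IN NP"),
    RuleInstance(12.5, "B", "PP -> IN NP"),
]
samples = sample_windows(utterance_rules)
balanced = balance(samples)
```

In this toy input, the only positive sample is the CP sample at d = 3, because the window [9.5, 10.5) before the second instance contains speaker A's use of the same rule; balancing then keeps that one positive and one randomly chosen negative.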
After extracting all potential repetition cases, the data were balanced by re-sampling, yielding an equal number of repetition and non-repetition cases. The second dataset is the HCRC Map Task corpus (Anderson et al., 1991), which consists of 128 task-oriented dialogues containing 20,400 utterances, using 759 different phrase structure rules. Using exactly the same methodology as for Switchboard, we extracted 157,000 rules. Results Two regression models were fitted, one to each dataset (Table 2). They contain the ln(Dist) covariate to estimate priming levels (negative effects indicate stronger priming), ln(Freq) for the effects of frequency, and a factor CP (to identify comprehension-production priming between speakers). In Map Task, ln(Dist) reliably predicts declining rule repetition (β = −0.073, p < .0001): repetition of a rule becomes less likely as the distance from the first occurrence, measured in utterances, increases. ln(Freq) interacts reliably with ln(Dist) (β = 0.043, p < .0001). In Switchboard, ln(Dist) also predicts declining rule repetition (β = −0.080, p < .0001), and the effect is reduced by increasing frequency. Prime type CP (priming between speakers) does not interact with the decay coefficient for ln(Dist).3 ln(Freq) interacts with ln(Dist) (β = 0.057, p < .0001), which suggests that repetition probability decreases less quickly for rules with high frequencies. That is, we find less priming for more common rules. Discussion A speaker is more likely to use a syntactic rule shortly after using the same rule. The closer prime and target are to one another, the stronger the preference to repeat. Priming occurs both within a speaker (PP) and between speakers (CP), and it decays rapidly. The method to quantify priming by estimating the decay effect was developed initially for the Switchboard corpus; Map Task was not used to design or tune the regression modeling methods.
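Footnote 3 combines the Switchboard coefficients of Table 2 to obtain separate decay slopes for PP and CP priming. The arithmetic is simple: a coefficient in logits becomes an odds ratio by exponentiation, and the CP slope is the main effect of ln(Dist) plus the ln(Dist):CP interaction. A minimal sketch follows; it treats CP as a plain 0/1 code, the same simplification the footnote uses, even though the fitted model centred and residualized CP. Decay coefficients are negative, matching their odds ratios below 1 in Table 2.

```python
import math


def odds_ratio(beta: float) -> float:
    """Convert a logistic-regression coefficient (in logits) into an odds ratio."""
    return math.exp(beta)


# Switchboard estimates from Table 2 (Experiment 1).
beta_dist = -0.080     # ln(Dist), base case PP
beta_dist_cp = -0.017  # ln(Dist):CP interaction (not reliable on its own)

pp_slope = beta_dist                 # within-speaker decay (CP = 0)
cp_slope = beta_dist + beta_dist_cp  # between-speaker decay (CP = 1)

for label, slope in (("PP", pp_slope), ("CP", cp_slope)):
    print(f"{label}: {slope:.3f} logits per unit ln(Dist), odds ratio {odds_ratio(slope):.2f}")
```

It prints slopes of -0.080 (odds ratio 0.92) for PP and -0.097 (odds ratio 0.91) for CP, the values given in footnote 3.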
The priming effect obtained in these corpora confirms experimental results by Bock and Griffin (2000) and Branigan, Pickering, and Cleland (1999). These studies find syntactic priming over short and longer time periods.4 The decay we observe is remarkable: repetition rates reach levels indistinguishable from the prior after about 5–6 s. At first glance, this contrasts with the results of Szmrecsanyi (2006, p. 188), who finds that future marker choices (will vs. going to) decay only after 140 words (which would be approximately 45 s at a speech rate of 180 words/min). However, as Szmrecsanyi points out, due to the logarithmic nature of the forgetting function, most of the priming effect "declines within an interval of 10 words (...), equivalent to ca. 5 s of speech." With our data, a log-linear model (for distance) yielded a better fit than a linear–linear one,5 which is compatible with general models of memory (Anderson, Bothell, Lebiere, & Matessa, 1998). The models produced for Switchboard and Map Task cannot be used to compare the strength of syntactic priming between the two corpora; they only show the decay effects separately for each corpus. In the next experiment, we compare priming between the corpora. Experiment 2: priming and decay over time in different genres In this section, we develop the first of two hypotheses designed to test the IAM or some of its assumptions.
3 The resulting estimate for ln(Dist) in our model (for a syntactic rule of average frequency) would be −0.080 for PP (odds ratio 0.92), but −0.080 − 0.017 = −0.097 (odds ratio 0.91) for CP priming. Because a negative β indicates decay, this indicates both CP and PP priming in Switchboard.
4 The effect of CP on bias may be related to general levels of speaker idiosyncrasies, i.e., increased chance repetition within speakers. Fitting the main effect controls for that.
5 Applying the Akaike Information Criterion, the model in Table 3 would be exceedingly unlikely if it employed linear distance instead of log-linear distance (p < .0000).
The IAM suggests that priming benefits speakers in conversation. At the same time, we observe that independently fitted statistical models appear to paint a different picture of priming in spontaneous conversation, as opposed to priming in task-oriented dialogue. The test of the IAM we put forward presupposes rationality in cognitive processes, that is, that variation in an individual's linguistic processes tends to optimize the communicative or situational outcome. If we accept this as a general principle (Anderson & Milson, 1989; Chater & Oaksford, 1999), then the IAM predicts that if speakers' priming levels vary at all with dialogue purpose, they tend to vary such that task-oriented dialogue shows stronger priming than less goal-driven interaction, i.e., spontaneous conversation or small talk. Let us briefly consider the alternatives. First, if priming is the result of a mechanistic memory effect that is not influenced by dialogue purpose or contextual working memory contents, then we should not observe any difference in priming between the dialogue genres. Second, if we do find different priming levels, and we see more priming in spontaneous conversation, we would interpret this as a violation of the IAM prediction or even of rationality as a whole. The differences in dialogue situation may also have affected priming levels through a mechanism other than the one posited by the IAM. Speakers may have tailored their utterances to match the needs of their audience: in the experimental design that led to the Map Task data, participants were in the same room and half of the pairs could make eye contact. From an audience design perspective, the richer communication channel may have led them to reduce their levels of adaptation in Map Task. This is contrary to what would be expected under the IAM. Next, we describe the Map Task in detail. This corpus will be used throughout the remainder of this paper.
The Map Task Like Switchboard, the Map Task is a corpus of spoken, two-person dialogues in English. Unlike Switchboard, the Map Task dialogues are task-oriented: interlocutors work together to perform a task as quickly and efficiently as possible. In each trial, the two speakers sat opposite one another and each had a map, which the other could not see. One of them, the instruction giver, had a map with a route drawn on it; the other participant, the instruction follower, had no route drawn on her map. The speakers were told that their goal was to reproduce the instruction giver's route on the instruction follower's map. The maps were not identical, and before they began the task the participants were told explicitly that their maps might differ in some respects, and that they could say whatever was necessary to complete the task. It was up to the participants to discover how the two maps differed (see Figs. 4 and 5). All maps consisted of landmarks represented as line drawings labelled with their intended names. All map routes began with a starting point, which was marked on both maps, and an end point, which was marked only on the giver's map. Landmarks along the route alternated between those that appeared on both maps and those that appeared on only one map. For each map, 8 landmarks appeared on both maps, 4 on only the giver's map, and 3 on only the follower's map. In addition, some landmarks (typically one per map pair) had different names on the two maps: these landmarks were identical in form and location but had different labels (e.g., mill wheel vs. old mill). Finally, 2 landmarks appeared twice on the giver's map, once in a position close to the route and once in a position more distant from the route. The follower had only one repeated landmark, which was distant. Each subject participated in four dialogues, twice as instruction giver and twice as instruction follower.
The spoken interactions were recorded, transcribed and syntactically annotated with phrase structure grammar.6 Method We pool the two datasets (Switchboard and Map Task), distinguishing them via a factor SOURCE. The methodology to quantify priming levels is the same as in Experiment 1, except that the DIST covariate is now measured in seconds instead of utterances (the notion of utterance is not the same in each corpus, and average utterance length differs).7
Table 2
Two regression models of short-term rule repetition (Experiment 1). Prime–target distance in utterances. All continuous predictors were centred; CP was coded as 1, PP is the base case. Response variable (repetition probability), effect sizes (β) and standard errors (SE) in logits. Random effects of intercept and slope (distance), grouped by utterance. Maximum accepted correlation between covariates 0.2; CP was residualized.
| Covariate | Map Task β | Map Task OR | Map Task SE | Switchboard β | Switchboard OR | Switchboard SE |
|---|---|---|---|---|---|---|
| Intercept | −1.721 | 0.18 | 0.011*** | −1.079 | 0.34 | 0.025*** |
| ln(Dist) | −0.073 | 0.93 | 0.011*** | −0.080 | 0.92 | 0.012*** |
| ln(Freq) | 0.722 | 2.06 | 0.01*** | 0.884 | 2.42 | 0.006*** |
| CP | −0.684 | 0.50 | 0.013*** | −0.176 | 0.84 | 0.011*** |
| ln(Dist):CP | −0.018 | 0.98 | 0.019 | −0.017 | 0.98 | 0.014 |
| ln(Dist):ln(Freq) | 0.043 | 1.04 | 0.011*** | 0.057 | 1.06 | 0.006*** |
* p < 0.05. *** p < 0.0001 (by |z|).
6 Many other types of annotation are also available. See http://www.hcrc.ed.ac.uk/maptask/ for a description and instructions on how to obtain the corpus.
7 Elsewhere, we have documented that time-based vs. utterance-based analysis does not confound the comparisons between the corpora (Reitter, 2008).
Results Refer to Table 3. The estimate for ln(Dist) describes the slope of repetition probability over time for the baseline condition, that is, in Switchboard. We find a main effect of ln(Dist) (β = −0.165, p
Table 3
| Covariate | β | OR | SE | F | z | p |
|---|---|---|---|---|---|---|
| Intercept | −2.096 | 0.12 | 0.010 | | 200.6 | < 0.0001 |
| ln(Dist) | −0.195 | 0.84 | 0.011 | 82.7 | 17.3 | < 0.0001 |
| CP | −0.263 | 0.83 | 0.014 | 304.5 | 18.8 | < 0.0001 |
| MAPTASK | 0.054 | 1.06 | 0.015 | 2.6 | 3.49 | < 0.001 |
| ln(Freq) | 0.759 | 2.14 | 0.007 | 10388.8 | 102.2 | < 0.0001 |
| ln(Dist):CP | 0.033 | 1.02 | 0.019 | 5.4 | 1.77 | < 0.10 |
| ln(Dist):MAPTASK | −0.058 | 0.94 | 0.017 | 12.6 | 3.35 | < 0.001 |
| CP:MAPTASK | −0.166 | 0.85 | 0.028 | 35.1 | 5.93 | < 0.0001 |
| ln(Dist):ln(Freq) | 0.092 | 1.10 | 0.009 | 113.3 | 10.65 | < 0.0001 |
| ln(Dist):CP:MAPTASK | −0.106 | 0.90 | 0.037 | 8.2 | 2.87 | < 0.005 |
Fig. 2. Relative decay effect sizes in logits for ln(Dist) with different combinations of CP and SOURCE factors and average residual frequency, based on the model shown in Table 3. Longer bars indicate stronger decay and priming. Error bars show standard errors.
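Fig. 2 contrasts the decay of repetition across the four combinations of CP and SOURCE. To see how the two- and three-way interaction terms of Table 3 combine into one slope per cell, here is a minimal sketch. It assumes plain 0/1 treatment coding and ignores the frequency interaction; because the model actually centred its predictors and residualized CP, these numbers are illustrative and are not the quantities plotted in Fig. 2. Coefficient signs follow the odds ratios (an odds ratio below 1 means a negative coefficient).

```python
# Decay-related coefficients (logits) from Table 3 (Experiment 2).
COEF = {
    "ln(Dist)": -0.195,
    "ln(Dist):CP": 0.033,
    "ln(Dist):MAPTASK": -0.058,
    "ln(Dist):CP:MAPTASK": -0.106,
}


def decay_slope(cp: int, maptask: int) -> float:
    """Slope of repetition (logits per unit ln(Dist)) for one CP x SOURCE cell,
    assuming 0/1 treatment coding; more negative means stronger priming."""
    return (COEF["ln(Dist)"]
            + COEF["ln(Dist):CP"] * cp
            + COEF["ln(Dist):MAPTASK"] * maptask
            + COEF["ln(Dist):CP:MAPTASK"] * cp * maptask)


for source, m in (("Switchboard", 0), ("Map Task", 1)):
    for label, c in (("PP", 0), ("CP", 1)):
        print(f"{source:11s} {label}: {decay_slope(c, m):+.3f}")
```

Under these assumptions, both Map Task cells decay more steeply than both Switchboard cells, and CP priming in Map Task shows the steepest decay, in line with the stronger priming in task-oriented dialogue reported for this experiment.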
tactic rules. In both corpora, we also found reliable effects of both production-production (PP) priming (self-priming) and comprehension-production (CP) priming. With the clear PP priming effect in spontaneous conversation, we also add a new finding compared to Dubey et al. (2005), who did not detect reliable evidence of adaptation within speakers in Switchboard for selected syntactic rules in coordinate structures. In the Map Task corpus, which consists of task-oriented dialogues, we find evidence for stronger overall priming than in Switchboard, a corpus of spontaneous conversation. We consider this effect to be supporting evidence for the Interactive Alignment Model (Pickering & Garrod, 2004). According to the IAM, what we observe is the reciprocal boosting of syntactic priming and the alignment of the situation models present in task-oriented dialogue. The interaction partners synchronize their situation models in the task-oriented setting, which co-occurs with cross-speaker priming (CP) on other communicative levels. CP priming appears to be enhanced by the need for a shared situation model. Recurring coordination moves enable speakers to make fine-grained distinctions in the path described, and these may explain the increased local repetition. As a concurrent explanation, semantic and lexical material that occurs in clusters may also have facilitated local syntactic repetition. We concede that dialogues in the two corpora differ greatly with respect to the overall goals of the speakers, their mode of interaction, the durations of their turns, their language registers and their linguistic variability. While the underlying decay-based methodology can be expected to be robust with respect to general differences in language, it is still unclear which differences between the corpora actually caused priming to be stronger in Map Task. The next experiments address this concern.
We will examine only data from the Map Task corpus, which was collected under well-controlled conditions. We also broaden our view to distinguish short-term and long-term adaptation, and to evaluate to what extent task success can be predicted and estimated based on lexical and syntactic adaptation. Large corpora present us with an opportunity to evaluate small effects and multiple interactions. Yet, data points gained from linguistic corpora are never independent (Kilgarriff, 2005). For instance, a single utterance will typically yield multiple syntactic data points, but of course, the choices of syntactic constructions in a sentence depend heavily on one another. In the corpus study presented here, care is taken to group such linguistic interdependencies in the (random effects) models. A further issue arises due to sub-languages resulting from corpus choice, genre, or speaker. The model structure controls for such variation by contrasting primed and non-primed samples within the same corpus, and by using decay as the target metric to measure priming. A final methodological concern is coherence: adjacent utterances do not jump from topic to topic—instead, they form clusters or discourse segments that are topically coherent (Grosz & Sidner, 1986). Clustering may be present as a result of convention or processing constraints, but it may also be introduced by the task as it is in Map Task, where the path is typically drawn step-by-step, with the area around one landmark being discussed at a time. Could clusters be responsible for the short-term priming effect, producing more repetition inside a cluster than outside (and further away)? This potential confound would affect the short-term priming, but not the long-term adaptation measure. Most importantly, topic chains are reflected primarily in lexical choice, and only indirectly (e.g., via topic status) in syntactic configuration. 
Experiment 3: task success and short-term priming Under the IAM, we expect successful dialogues to show more priming than unsuccessful ones. To test the IAM hypothesis, we assume that success at the Map Task is an indicator of aligned situation models. The next experiment is designed to detect co-variance of short-term priming and task success. Method The Map Task consists of re-tracing a defined route according to the interactive description provided by the other interlocutor. Task performance is therefore measured in terms of how far the route that the follower has drawn deviates from the route shown on the giver's map. To compute this for each dialogue, the developers of the Map Task corpus overlaid the giver's map on the follower's map and computed the area covered in between the paths (PATHDEV). Task success is then defined as the inverse of PATHDEV. We correlate short-term priming levels in each dialogue with path deviation. The underlying model is the same as in Experiment 1, except that an interaction of DIST and PATHDEV is included to measure this relationship. Prime-target distance ln(Dist) is measured in time (s). Under the IAM, we expect there to be more priming with greater task success. Stronger priming corresponds to a steeper decay, that is, a more negative coefficient for ln(Dist), and more successful dialogues have a lower PATHDEV. If success goes along with stronger priming, the ln(Dist) slope should become more negative as PATHDEV decreases, so we expect a positive estimate for the ln(Dist):PATHDEV interaction. Results Table 4 shows the full model. As before, repetition probability is reliably (negatively) correlated with ln(Dist), so we see a decay and priming effect (ln(Dist), β = −0.150, p < .0001). Notably, however, path deviation and short-term priming did not correlate. We tested for reliable PATHDEV and ln(Dist) interactions separately for PP and CP situations via contrasts. In neither case did we find a reliable interaction. Discussion We have shown that although there is a clear priming effect in the short term, the size of this priming effect does not correlate with task success.
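The text describes PATHDEV only as the area between the two paths once the maps are overlaid; it does not give the corpus developers' exact procedure. A minimal sketch of one way to compute such an area, assuming each route is a polyline of (x, y) points in millimetres, that both routes start at the same point, and that they do not cross (the `path_deviation` function is hypothetical): walk the giver's route forward and the follower's route backward to close a polygon, then apply the shoelace formula.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in mm on the overlaid maps (assumption)


def shoelace_area(polygon: List[Point]) -> float:
    """Area of a simple (non-self-intersecting) polygon via the shoelace formula."""
    twice_area = 0.0
    for i, (x1, y1) in enumerate(polygon):
        x2, y2 = polygon[(i + 1) % len(polygon)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def path_deviation(giver_route: List[Point], follower_route: List[Point]) -> float:
    """Hypothetical PATHDEV: area enclosed by the giver's route (forward) and the
    follower's route (backward). Only valid if the routes do not cross; crossing
    routes would first have to be split at their intersections."""
    return shoelace_area(giver_route + follower_route[::-1])


giver = [(0.0, 0.0), (100.0, 0.0)]
follower = [(0.0, 0.0), (50.0, 20.0), (100.0, 0.0)]
deviation = path_deviation(giver, follower)  # triangle with base 100 and height 20
```

Here the follower's detour encloses a triangle of 1000 mm², while a follower who reproduces the giver's route exactly gets a deviation of zero; task success, as defined in the text, then varies inversely with this area.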
But does this indicate that there is no strong functional component to priming in the dialogue context? There may still be an influence of cognitive load due to speakers working on the task, or an overall disposition for higher priming in task-oriented dialogue:
Experiment 2 points to stronger priming in such situations. Our results are difficult to reconcile with the model suggested by Pickering and Garrod (2004) if we take short-term priming as the driving force behind the IAM. A hypothetical explanation of our failure to find the priming–task success correlation is that short-term priming decays within a few seconds. It is questionable to what extent such a brief effect helps interlocutors align their situation models. In the Map Task experiments, one of the linguistic devices where lexical alignment is expected to make a difference is reference to landmarks. Do interlocutors need to refer to landmarks every few seconds? Syntactic priming contributes to the alignment of such references through the internal structure of the noun phrases that identify the landmarks. Syntactic devices may also be avoided within the early period of rapid decay of repetition probability that we observe. We hypothesized that the syntactically more complex descriptions of how to circumnavigate the landmarks would be repeated on the order of several times a minute, but not commonly within 5–10 s. An analysis of the dialogues, however, showed that reference is used much more frequently than we expected: the task lends itself to a clustering of references to the same landmark, as speakers describe the route step by step. Thus, our hypothetical explanation cannot be corroborated. An alternative explanation comes from the empirical literature: there are two distinguishable but interacting adaptation effects, a fast, short-term priming effect and a long-term adaptation that persists (Ferreira & Bock, 2006). In the cognitive model we proposed in Reitter et al. (2011), short-term priming is enhanced by semantic material held in short-term memory, whereas memories of syntactic structures are reinforced and become increasingly accessible with each use. This provides an explanation for the observed stronger priming in task-oriented dialogue.
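ACT-R's standard base-level learning equation, B = ln(Σ_j t_j^(−d)), where t_j is the time since the j-th use of a memory item and d is the decay rate, combines both properties described above: activation decays as time passes, yet it grows with every additional use. A minimal sketch follows, using ACT-R's conventional default d = 0.5; it illustrates the general mechanism only and is not the parameterisation of the model in Reitter et al. (2011).

```python
import math
from typing import Sequence


def base_level_activation(use_times: Sequence[float], now: float, decay: float = 0.5) -> float:
    """ACT-R base-level learning: B = ln(sum_j (now - t_j) ** -decay).

    Each past use contributes a term that shrinks as it ages (decay), and
    every additional use adds another term (reinforcement through use).
    """
    ages = [now - t for t in use_times if t < now]
    if not ages:
        raise ValueError("need at least one use before `now`")
    return math.log(sum(age ** -decay for age in ages))


once_recent = base_level_activation([0.0], now=10.0)          # one use, 10 s ago
once_old = base_level_activation([0.0], now=100.0)            # one use, 100 s ago
often_old = base_level_activation(list(range(10)), now=100.0) # ten uses, 91-100 s ago
```

A single use loses activation as it ages (`once_old` < `once_recent`), while ten uses of a structure keep it more accessible at the same delay (`often_old` > `once_old`), the kind of cumulative strengthening that could underlie long-term adaptation.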
In the next experiment, we seek to link task success to long-term adaptation. Experiment 4: task success and long-term adaptation Interactive alignment is a process that happens on the time-scale of minutes: speakers establish a common reference system in the long run. This process may not, as initially thought, be based on short-term priming. Pickering and Garrod (2004) do not detail the longevity of the priming effects supporting alignment. It is unclear whether alignment is due to the automatic, classical priming effect, or whether it is based on a long-term effect that is possibly related to implicit learning (Bock & Griffin, 2000; Chang et al., 2006; Kaschak et al., 2011). The next experiment investigates the latter possibility. Analogous to the previous experiment, we hypothesize that more long-term adaptation relates to more task success. Method For structural priming,8 two repetition effects have been identified. Classical structural priming effects are strong: around 10% for syntactic rules (Reitter et al., 2006). However, they decay quickly (Branigan et al., 1999) and reach a low plateau after a few seconds, which makes the effect seem similar to semantic priming. What complicates matters is that there is also a distinct, long-term syntactic adaptation effect that is also commonly called (repetition) priming. Structural adaptation has been shown to last longer, from minutes (Bock & Griffin, 2000) to several days. Lexical boost interactions, where the lexical repetition of material within the repeated structure strengthens structural priming, have been observed for short-term priming, but not in long-term priming trials where material intervened between prime and target utterances. Thus, short- and long-term structural adaptation effects may well be due to separate cognitive processes, as argued by Ferreira and Bock (2006).
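The split-half measure of long-term adaptation that this Method section goes on to describe can be sketched as follows. Splitting at the temporal midpoint, centring the excluded 10-s portion on that midpoint, and representing rule instances as `(time, rule)` tuples are assumptions made for illustration; the `split_half_samples` function is hypothetical, and the contrast that separates adaptation from random repetition is not modelled here.

```python
from typing import List, Tuple

Instance = Tuple[float, str]  # (time in seconds, phrase structure rule), hypothetical


def split_half_samples(instances: List[Instance], gap: float = 10.0) -> List[Tuple[str, int]]:
    """Split a dialogue at its temporal midpoint, drop a `gap`-second portion
    centred on the midpoint (to rule out short-term priming), and label every
    rule instance in the second half with PRIME = 1 if the same rule occurred
    anywhere in the first half, else PRIME = 0."""
    if not instances:
        return []
    start = min(t for t, _ in instances)
    end = max(t for t, _ in instances)
    mid = (start + end) / 2.0
    first_half_rules = {rule for t, rule in instances if t < mid - gap / 2}
    return [(rule, int(rule in first_half_rules))
            for t, rule in instances if t >= mid + gap / 2]


dialogue = [(0.0, "A"), (5.0, "B"), (50.0, "A"), (60.0, "C"), (100.0, "B")]
labelled = split_half_samples(dialogue)
```

With the midpoint at 50 s, the instance at 50 s falls into the excluded window, and the two second-half instances are labelled `("C", 0)` and `("B", 1)`.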
After the initial few seconds, structural repetition shows little decay but can be demonstrated even minutes or longer after the stimulus. To measure this type of adaptation, our method looks at the repetition of syntactic rules over whole document halves, independently of decay. This method splits each dialogue in half. Analogous to the short-term priming model, we define repetition as the occurrence of a prime within the first document half (PRIME), and sample rule instances from the second document half. To rule out short-term priming effects, a 10-s portion in the middle of each dialogue is excluded. In order to distinguish adaptation from overall, random repetition of syntactic rules, we contrast dialogue halves
Table 4
The full regression model for the Map Task dataset (Experiment 3). CP indicates between-speaker (comprehension-production) priming; PP is within-speaker priming. The scale of PATHDEV is in mm² to indicate the area of path deviation in the Map Task; as centred, it ranges from −64 to +159. All covariates were centred; fixed-effect correlations between all centred variables were lower than 0.2. Model ANOVAs corroborate the significance of the parameter tests (F-values shown).
| Covariate | β | OR | SE | F | z | p |
|---|---|---|---|---|---|---|
| Intercept | −1.747 | 0.174 | 0.014 | | 127 | < 0.0001 |
| ln(Dist) | −0.150 | 0.860 | 0.014 | 86.7 | 10.5 | < 0.0001 |
| CP | −0.364 | 0.695 | 0.020 | 277.6 | 18.2 | < 0.0001 |
| PATHDEV | 0.0002 | 1.000 | 0.0002 | 0.153 | 0.81 | 0.42 |
| ln(Freq) | 0.700 | 2.013 | 0.012 | 3557 | 59.9 | < 0.0001 |
| ln(Dist):CP | −0.093 | 0.911 | 0.024 | 14.5 | 3.91 | < 0.0001 |
| ln(Dist):ln(Freq) | 0.080 | 1.083 | 0.013 | 39.4 | 6.27 | < 0.0001 |
| ln(Dist):PATHDEV/PP | 0.0000 | 1.000 | 0.0003 | 0.03 | 0.07 | 0.95 |
| ln(Dist):PATHDEV/CP | 0.0001 | 1.000 | 0.0004 | | 0.21 | 0.84 |
8 In both production and comprehension, which we do not distinguish further for space reasons.