D. Reitter, J.D. Moore / Journal of Memory and Language 76 (2014) 29–46

stemming from single dialogues with dialogue halves taken from two different dialogues. A factor SAMEDOC distinguishes between the two cases. For SAMEDOC = 0, we combine dialogue halves stemming from different dialogues⁹; for SAMEDOC = 1, the dialogue halves stem from the same dialogue. Thus, our model estimates the influence of preceding context on rule repetition. The goal is now to establish an effect of SAMEDOC on repetition.
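The pairing of dialogue halves can be sketched in code. The following is a minimal illustration, not the procedure used in the experiment. It assumes that each dialogue is already split into two lists of syntactic rule labels, that a data point is one rule occurrence in the second half, and that the control partner for SAMEDOC = 0 is simply the next dialogue in sorted order. The function name `build_data_points`, the dictionary layout, and the toy rule strings are all hypothetical.

```python
from typing import Dict, List, Tuple

# (rules observed in the first half, rules observed in the second half)
Dialogue = Tuple[List[str], List[str]]


def build_data_points(dialogues: Dict[str, Dialogue]) -> List[dict]:
    """Create one row per rule occurrence in a second half, once with the
    first half of the same dialogue (SAMEDOC=1) and once with the first half
    of a different dialogue (SAMEDOC=0, control)."""
    if len(dialogues) < 2:
        raise ValueError("need at least two dialogues to form a control pairing")
    ids = sorted(dialogues)
    rows = []
    for i, target_id in enumerate(ids):
        target_half = dialogues[target_id][1]
        # Assumption: the control partner is the next dialogue, wrapping around.
        control_id = ids[(i + 1) % len(ids)]
        for same_doc, prime_id in ((1, target_id), (0, control_id)):
            prime_rules = set(dialogues[prime_id][0])
            for rule in target_half:
                rows.append({
                    "target": target_id,
                    "rule": rule,
                    "SAMEDOC": same_doc,
                    # PRIME = 1 if the rule already occurred in the paired first half
                    "PRIME": int(rule in prime_rules),
                })
    return rows


# Toy data with invented rule labels, for illustration only.
toy = {
    "d1": (["NP->DT NN", "VP->V NP"], ["NP->DT NN", "VP->V NP"]),
    "d2": (["S->NP VP", "PP->P NP"], ["PP->P NP", "NP->DT NN"]),
}
rows = build_data_points(toy)
same = sum(r["PRIME"] for r in rows if r["SAMEDOC"] == 1)
control = sum(r["PRIME"] for r in rows if r["SAMEDOC"] == 0)
print(f"repetitions within dialogue: {same}, in control pairing: {control}")
```

Rows of this shape could then be passed to a logistic regression with PRIME as the response and SAMEDOC as a predictor.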
Using the same data as in Experiment 3, task success is operationalized as inverse path deviation (PATHDEV), as before, which should, under IAM assumptions, interact with the effect estimated for SAMEDOC. The response variable is PRIME, indicating whether a rule is repeated.

Results

As seen in Table 5, SAMEDOC showed a reliable, positive effect (β = 3.303, p < .0001), which means we see long-term adaptation. This generalizes previous experimental results on long-term priming. The effect interacted reliably with the path deviation scores (SAMEDOC:PATHDEV, β = −0.624, p < .05). Thus, we find a reliable correlation of task success and syntactic priming. Greater path deviations relate to weaker priming.

The normalized rule frequency ln(Freq) did not interact with SAMEDOC (β = 0.044, p = .35). Such an interaction also could not be found in a reduced model with only SAMEDOC and ln(Freq). The interaction was removed from the model.

The effect of long-term adaptation can be visualized in a simple way. In Fig. 3, the proportions of repeated to novel syntactic rules in each dialogue are related to path deviation, contrasting within-dialogue and between-dialogue repetition (control).

Discussion

Speaker pairs' long-term syntactic adaptation is correlated with the synchronization of their routes on the maps. This is exactly what one would expect under the assumption of the IAM. We find no evidence for stronger long-term adaptation of rare rules, which may point to a qualitative difference from short-term priming. Taken without theoretical motivation, the results do not imply causality. However, task success is unlikely to cause increased priming, as participants in Map Task were not told whether they were on the "right track". Mistakes, such as passing a landmark on its East and not on its West side, were made and went unnoticed. The repetition effect that contributes to prediction accuracy is long-term syntactic adaptation as opposed to short-term priming.
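To make the interaction concrete: in a logistic model with a SAMEDOC:PATHDEV term, the log-odds contribution of SAMEDOC is not a single number but depends on path deviation. The sketch below evaluates that conditional effect using the two estimates quoted in the Results. It assumes that the interaction coefficient is negative, consistent with the statement that greater path deviations relate to weaker priming, and that PATHDEV is expressed in whatever centred units those estimates were fitted on, which is not restated here. The function name `samedoc_effect` and the probe values of PATHDEV are hypothetical.

```python
import math

B_SAMEDOC = 3.303           # main effect of SAMEDOC quoted in the Results
B_SAMEDOC_PATHDEV = -0.624  # interaction term; the negative sign is an assumption


def samedoc_effect(pathdev: float) -> float:
    """Change in log-odds of rule repetition when moving from SAMEDOC=0 to
    SAMEDOC=1, evaluated at a given centred path deviation."""
    return B_SAMEDOC + B_SAMEDOC_PATHDEV * pathdev


# Probe values are arbitrary; they only illustrate the direction of the effect.
for pathdev in (-1.0, 0.0, 1.0):
    effect = samedoc_effect(pathdev)
    print(f"PATHDEV={pathdev:+.1f}  log-odds effect={effect:.3f}  "
          f"odds ratio={math.exp(effect):.1f}")
```

Under these assumptions, the within-dialogue repetition advantage shrinks as path deviation grows, which is the pattern the Discussion interprets as a link between adaptation and task success.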
Predicting task success

So far, we have put forward a case for a link between syntactic adaptation and task success. However, the IAM spans more than the syntactic level. Lexical priming is also part of the priming cascade. In the following, we first establish the predictiveness of linguistic similarity for task success with a more complex model that includes lexical features. Second, we demonstrate the computational applicability of our findings. In an application, an automatic estimate of task success could help evaluate conversations among humans. In human–computer dialogues, predicting task success after just the first few turns of the conversation could avoid disappointment with the system by switching dialogue strategies or by passing poorly performing automated calls on to a human operator.

Experiment 5: the success prediction task

In this section, we define a general task that predicts conversational success from textual features. The task we set for ourselves requires that success is estimated from the contents of an entire dialogue. All linguistic and nonlinguistic information available may be used. This task reflects post hoc analysis applications, where dialogues must be evaluated without an independent success measure being available for each dialogue. This covers cases where, for example, it is unclear whether a call center agent or an automated system actually responded to the call satisfactorily. In the next section, we describe a statistical approach that uses repetition effects to implement this task.

Method

We use a standard machine-learning algorithm, a Support Vector Machine (SVM), which acquires a model from data that, in our case, predicts task success from a set of features. It can do so for a range of data points, each

Table 5
The logistic regression model for the Map Task dataset (Experiment 4). The scale of PATHDEV is in mm², indicating the area of path deviation in the Map Task; as centred, it ranges from −64 to +159. Thus, β and the odds ratio (OR) for the critical parameter apply to a difference of a single mm². All covariates were centred; fixed-effect correlations between all centred variables were lower than 0.2. Model ANOVA corroborates the significance of the parameter tests (F-values shown).

Covariate            β        OR       SE       F       z        p(>|z|)
Intercept            2.722    15.2     0.036            75.5     < 0.0001
ln(Freq)             1.499    4.48     0.016    478     13,838   < 0.0001
SAMEDOC              1.064    2.90     0.048    478     22.0     < 0.0001
PATHDEV              0.001    1.00     0.001    2.27    1.03     = 0.3
ln(Freq):SAMEDOC    −0.001    0.9990   0.0002   16.5   −4.37     < 0.0001
SAMEDOC:PATHDEV     −0.002    0.9977   0.001    6.32   −2.51     < 0.05

⁹ This is a control condition; particularly if applied to lexical repetition, topicality can lead to repetition that is higher than would be sampled from large corpora in the same language (see Church, 2000, which inspired the methodology used here).
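The odds ratios in Table 5 follow from the coefficients by exponentiation, OR = exp(coefficient). The short check below recomputes three of them from the printed coefficients and, conversely, recovers the approximate coefficient implied by the two odds ratios below 1; an odds ratio below 1 corresponds to a negative coefficient. The variable names are hypothetical, and the values are copied from the table as printed, so agreement holds only up to rounding.

```python
import math

# Coefficients as printed in Table 5 (rows with an odds ratio above 1).
coefficients = {"Intercept": 2.722, "ln(Freq)": 1.499, "SAMEDOC": 1.064}

# Odds ratios as printed in Table 5 for the two interaction terms.
odds_ratios_below_one = {"ln(Freq):SAMEDOC": 0.9990, "SAMEDOC:PATHDEV": 0.9977}

# Forward direction: odds ratio = exp(coefficient).
recomputed_or = {name: math.exp(b) for name, b in coefficients.items()}

# Reverse direction: coefficient = ln(odds ratio); negative when OR < 1.
implied_b = {name: math.log(ratio) for name, ratio in odds_ratios_below_one.items()}

for name, value in recomputed_or.items():
    print(f"{name}: exp(coefficient) = {value:.2f}")
for name, value in implied_b.items():
    print(f"{name}: ln(OR) = {value:.4f}")
```

Because PATHDEV is measured in mm², the SAMEDOC:PATHDEV odds ratio describes the multiplicative change in the odds of repetition per additional mm² of path deviation, which is why it sits so close to 1.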