A Symmetric Local Search Network for Emotion-Cause Pair Extraction

Zifeng Cheng, Zhiwei Jiang∗, Yafeng Yin, Hua Yu, Qing Gu
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
chengzf@smail.nju.edu.cn, {jzw,yafeng}@nju.edu.cn, huayu.yh@smail.nju.edu.cn, guq@nju.edu.cn

Abstract

Emotion-cause pair extraction (ECPE) is a new task which aims at extracting the potential clause pairs of emotions and their corresponding causes in a document. To tackle this task, a previous study proposed a two-step method, which first extracted emotion clauses and cause clauses individually, then paired the emotion and cause clauses, and filtered out the pairs without causality. Different from this method, which separates the detection and the matching of emotion and cause into two steps, we propose a Symmetric Local Search Network (SLSN) model to perform the detection and matching simultaneously by local search. SLSN consists of two symmetric subnetworks, namely the emotion subnetwork and the cause subnetwork. Each subnetwork is composed of a clause representation learner and a local pair searcher. The local pair searcher is a specially-designed cross-subnetwork component which can extract the local emotion-cause pairs. Experimental results on the ECPE corpus demonstrate the superiority of our SLSN over existing state-of-the-art methods.

1 Introduction

Emotion cause analysis is a research branch of sentiment analysis and has gained increasing popularity in recent years (Lee et al., 2010; Gui et al., 2016; Xia and Ding, 2019; Xia et al., 2019). Its goal is to identify the potential causes that lead to a certain emotion. This is very useful in fields such as electronic commerce, where sellers may be concerned about users' emotions towards their products as well as the causes of those emotions.

Previous studies on emotion cause analysis mainly focus on the task of emotion cause extraction (ECE), which is usually formalized as a clause-level sequence labeling problem (Chen et al., 2010; Gui et al., 2016; Li et al., 2018; Ding et al., 2019; Xia et al., 2019; Yu et al., 2019; Fan et al., 2019). Given an annotated emotion clause, the goal of the ECE task is to identify, for each clause in the document, whether that clause is the corresponding cause. However, in practice, emotion clauses are naturally not annotated, which may limit the application of the ECE task in real-world scenarios. Motivated by this, Xia and Ding (2019) first proposed the emotion-cause pair extraction (ECPE) task, which aims to extract all potential pairs of emotions and their corresponding causes in a document. As shown in Figure 1, the example document has 17 clauses, among which the emotion clauses are c4, c13, and c17 (marked as orange), and their corresponding cause clauses are c3, c12, and c15 (marked as blue). The goal of the ECPE task is to extract all emotion-cause pairs: (c4, c3), (c13, c12), and (c17, c15).

The ECPE task is a new and more challenging task. To tackle it, Xia and Ding (2019) proposed a two-step method, which has been demonstrated to be effective. In the first step, they extracted emotion clauses and cause clauses individually. In the second step, they used the Cartesian product to pair the clauses and then used a logistic regression to filter out the emotion-cause pairs without causality.
In this method, the detection of emotion and cause and the matching of emotion and cause are implemented separately in two steps.

∗ Corresponding Author

This work is licensed under a Creative Commons Attribution 4.0 International Licence. Licence details: http://creativecommons.org/licenses/by/4.0/
(c1) But when Hans heard these, (c2) he seemed very jealous. (c3) When Mr. Song had a son, (c4) Hans was also very happy. (c5) Hans had taught him to speak English since the boy was young. (c6) Hans also speaks Spanish and German, (c7) and he often went downstairs to the community to teach children English. (c8) During Martial Arts Festival, (c9) he also helped with a lot of translation work, (c10) and was rated as advanced worker. (c11) After the meeting, (c12) the city organized all participants to travel, (c13) Hans was very excited. (c14) But before getting on the bus, (c15) the tour guide said he was too old to go. (c16) Everyone can see, (c17) Hans was very lost.

Candidate pairs shown in the figure: (c4, c3) ✓, (c13, c12) ✓, (c17, c15) ✓, (c4, c12) ✗, (c13, c15) ✗.

Figure 1: An example document from the ECPE corpus

However, when humans deal with the ECPE task, they usually consider the detection and matching problems at the same time. This is mainly achieved through a process of local search. For example, as shown in Figure 1, if a clause is detected as an emotion clause (e.g., c4), humans will search for its corresponding cause clause (i.e., c3) within its local context (i.e., c2, c3, c4, c5, c6). The advantage of local search is that wrong pairs (e.g., (c4, c12)) beyond the local context scope can be avoided. Additionally, when searching locally for the cause clause corresponding to the target emotion clause, humans not only judge whether a clause is a cause clause, but also consider whether it matches the target emotion clause. In this way, they can avoid extracting pairs (e.g., (c13, c15)) that are within the local context scope but mismatched. Similarly, when a cause clause is encountered, the corresponding emotion clause can also be searched within its local context scope.

Inspired by this local search process, we propose a Symmetric Local Search Network (SLSN) model. The model consists of two subnetworks with symmetric structures, namely the emotion subnetwork and the cause subnetwork. Each subnetwork consists of two parts: a clause representation learner and a local pair searcher (LPS). Among them, the clause representation learner is designed to learn the emotion or cause representation of a clause. The local pair searcher is designed to perform the local search of emotion-cause pairs. Specifically, the LPS introduces a local context window to limit the scope of context for local search. In the process of local search, the LPS first judges whether the target clause is an emotion (cause), and then judges whether each clause within the local context window is the corresponding cause (emotion). Finally, SLSN outputs the local pair labels (i.e., the labels of the target clause and the clauses within its local context window) for each clause in the document, from which we can get the emotion-cause pairs.

The main contributions of this work can be summarized as follows:

• We propose a symmetric local search network model, which is an end-to-end model and gives a new scheme for solving the ECPE task.

• We design a local pair searcher in SLSN, which allows simultaneously detecting and matching the emotions and causes.

• Experimental results on the ECPE corpus demonstrate the superiority of our SLSN over existing state-of-the-art methods.
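To make the input and output of the task concrete before the formal definition in Section 2.1, the Figure 1 example can be written down as plain data. This is only an illustrative sketch of the data format; the variable names are ours and not from the paper or its corpus:

```python
# The Figure 1 document as a list of clauses (1-based indices in comments),
# and its gold annotation as a set of (emotion_index, cause_index) pairs.
document = [
    "But when Hans heard these,",                                    # c1
    "he seemed very jealous.",                                       # c2
    "When Mr. Song had a son,",                                      # c3  cause of c4
    "Hans was also very happy.",                                     # c4  emotion
    "Hans had taught him to speak English since the boy was young.", # c5
    "Hans also speaks Spanish and German,",                          # c6
    "and he often went downstairs to the community to teach children English.",  # c7
    "During Martial Arts Festival,",                                 # c8
    "he also helped with a lot of translation work,",                # c9
    "and was rated as advanced worker.",                             # c10
    "After the meeting,",                                            # c11
    "the city organized all participants to travel,",                # c12 cause of c13
    "Hans was very excited.",                                        # c13 emotion
    "But before getting on the bus,",                                # c14
    "the tour guide said he was too old to go.",                     # c15 cause of c17
    "Everyone can see,",                                             # c16
    "Hans was very lost.",                                           # c17 emotion
]
gold_pairs = {(4, 3), (13, 12), (17, 15)}  # (emotion, cause), 1-based
```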
2 Symmetric Local Search Network

In this section, we first present the task definition. Then, we introduce the SLSN model, followed by its technical details. Finally, we discuss the connection between the SLSN model and the previous two-step method.

[Figure 2: Overview of the SLSN model. Given clauses $c_1, \ldots, c_n$, SLSN predicts $\hat{y}_i^{elc} = (\hat{y}_i^e, \hat{y}_{i-1}^{lc}, \hat{y}_i^{lc}, \hat{y}_{i+1}^{lc})$ and $\hat{y}_i^{cle} = (\hat{y}_i^c, \hat{y}_{i-1}^{le}, \hat{y}_i^{le}, \hat{y}_{i+1}^{le})$ for each clause, yielding the pair sets $P^{elc} = \{\cdots, (c^e, c^{lc}), \cdots\}$ and $P^{cle} = \{\cdots, (c^{le}, c^c), \cdots\}$; $P^{final}$ is $P^{elc}$, $P^{cle}$, $P^{elc} \cap P^{cle}$, or $P^{elc} \cup P^{cle}$.]

2.1 Task Definition

The task of emotion-cause pair extraction (ECPE) was first studied by Xia and Ding (2019). In the ECPE task, each document $d$ in the dataset $D$ consists of multiple clauses $d = [c_1, c_2, \cdots, c_n]$. A clause with emotional polarity (such as happiness, sadness, fear, anger, disgust, or surprise) is labeled as an emotion clause $c^e$. A clause that causes the emotion is called a cause clause $c^c$. The pair of an emotion clause and its corresponding cause clause is called an emotion-cause pair $(c^e, c^c)$. The goal of the ECPE task is to extract all emotion-cause pairs in $d$:

$P = \{\cdots, (c^e, c^c), \cdots\}$

Note that each document may contain several (at least one) emotion clauses, and each emotion clause may correspond to several (at least one) cause clauses. Besides, the emotion clause and its corresponding cause clause may be the same clause.

2.2 An Overview of SLSN

As shown in Figure 2, SLSN receives a sequence of clauses from a document as input and predicts the local pair labels for these clauses, which can be directly converted into the corresponding emotion-cause (E-C) pairs. For each clause $c_i$, SLSN predicts two types of local pair labels: the E-LC label $\hat{y}_i^{elc}$ and the C-LE label $\hat{y}_i^{cle}$. The E-LC label $\hat{y}_i^{elc}$ contains the emotion label (E-label) $\hat{y}_i^e$ of the $i$-th clause and the local cause labels (LC-label) $(\hat{y}_{i-1}^{lc}, \hat{y}_i^{lc}, \hat{y}_{i+1}^{lc})$ of the clauses near the $i$-th clause. Similarly, the C-LE label $\hat{y}_i^{cle}$ contains the cause label (C-label) $\hat{y}_i^c$ of the $i$-th clause and the local emotion labels (LE-label) $(\hat{y}_{i-1}^{le}, \hat{y}_i^{le}, \hat{y}_{i+1}^{le})$ of the clauses near the $i$-th clause. Whether a clause is near the target clause is defined by the local context window, whose size is denoted as $k$ (the case in Figure 2 is $k = 1$). That is, for a target clause, the scope of its local context includes the previous $k$ clauses, itself, and the following $k$ clauses. Note that both the E-LC label $\hat{y}_i^{elc}$ and the C-LE label $\hat{y}_i^{cle}$ can be converted into their corresponding emotion-cause (E-C) pairs. For example, the corresponding E-C pair of $\hat{y}_i^{elc} = (1, 1, 0, 0)$ is $(c_i, c_{i-1})$, and the corresponding E-C pair of $\hat{y}_i^{cle} = (1, 1, 0, 0)$ is $(c_{i-1}, c_i)$. We denote the E-C pair set corresponding to $\hat{y}_i^{elc}$ as $P^{elc}$, and the E-C pair set corresponding to $\hat{y}_i^{cle}$ as $P^{cle}$. The final E-C pair set of our method is the union of $P^{elc}$ and $P^{cle}$. Of course, $P^{elc}$, $P^{cle}$, or the intersection of $P^{elc}$ and $P^{cle}$ is also an option for the final pair set.
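The label-to-pair conversion described above is purely mechanical. As an illustration, the following sketch (function names are ours; each label is assumed to arrive as a target label plus a window of 0/1 labels) converts predicted local pair labels into E-C pairs and forms the SLSN-U style union:

```python
def elc_to_pairs(i, e_label, lc_labels, k):
    """Convert clause i's E-LC label into E-C pairs (emotion, cause).

    lc_labels holds the 2k+1 local cause labels for clauses i-k .. i+k.
    """
    if e_label != 1:
        return set()
    return {(i, i - k + j) for j, lc in enumerate(lc_labels) if lc == 1}

def cle_to_pairs(i, c_label, le_labels, k):
    """Convert clause i's C-LE label into E-C pairs (emotion, cause)."""
    if c_label != 1:
        return set()
    return {(i - k + j, i) for j, le in enumerate(le_labels) if le == 1}

# With k = 1, the paper's example label (1, 1, 0, 0) for clause i = 4
# splits into E-label 1 and LC-labels [1, 0, 0], giving the pair (c4, c3).
p_elc = elc_to_pairs(4, 1, [1, 0, 0], k=1)   # {(4, 3)}
p_cle = cle_to_pairs(3, 1, [0, 0, 1], k=1)   # {(4, 3)}
p_final = p_elc | p_cle                      # SLSN-U; SLSN-I would intersect
```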
2.3 Components of SLSN

As shown in Figure 3, SLSN contains two subnetworks: the emotion subnetwork, referred to as E-net, which is mainly for the E-LC label prediction, and the cause subnetwork, referred to as C-net, which is mainly for the C-LE label prediction. E-net and C-net have similar structures in terms of word embedding, clause encoder, and hidden state learning. After the hidden state learning layer, E-net and C-net use two types of local pair searchers (LPS) with symmetric structures for the local pair label prediction. The local pair searcher is a specially designed cross-subnetwork module, which uses the hidden states of the clauses in both subnetworks for prediction. In the following, we introduce the components of SLSN in technical detail.

[Figure 3: Framework of the SLSN model. E-net and C-net each stack a clause encoder (word-level Bi-LSTM with attention) and a hidden state learning layer (clause-level Bi-LSTM); the two local pair searchers cross the subnetworks.]

2.3.1 Word Embedding

Before representing the clauses in the document, we first map each word in the clauses into a word embedding, which is a low-dimensional real-valued vector. Formally, given a sequence of clauses $d = [c_1, c_2, \cdots, c_n]$, the clause $c_i = [w_i^1, w_i^2, \ldots, w_i^{l_i}]$ consists of $l_i$ words. We map each clause into its word-level representation $v_i = [v_i^1, v_i^2, \ldots, v_i^{l_i}]$, where $v_i^j$ is the word embedding of word $w_i^j$.

2.3.2 Clause Encoder

After word embedding, we use a Bi-LSTM layer followed by an attention layer as the clause encoder in both E-net and C-net to learn the representation of clauses. Formally, in E-net, given the word-level representation of the $i$-th clause $v_i = [v_i^1, v_i^2, \ldots, v_i^{l_i}]$ as the input, the word-level Bi-LSTM layer first maps it to the hidden states $r_i = [r_i^1, r_i^2, \ldots, r_i^{l_i}]$. Then, the attention layer maps each $r_i$ to the emotion representation of the clause $s_i^e$ by weighting each word in the clause and then aggregating them through the following equations:

$u_i^j = \tanh(W_w r_i^j + b_w)$  (1)

$a_i^j = \frac{\exp((u_i^j)^T u_s)}{\sum_t \exp((u_i^t)^T u_s)}$  (2)

$s_i^e = \sum_j a_i^j r_i^j$  (3)

where $W_w$, $b_w$ and $u_s$ are a weight matrix, bias vector, and context vector respectively, and $a_i^j$ is the attention weight of $r_i^j$. Similarly, in C-net, the cause representation of the $i$-th clause $s_i^c$ is obtained using a clause encoder with a similar structure.

2.3.3 Hidden State Learning

After the clause encoder, we use a hidden state learning layer to learn the contextualized representation of each clause in the document. Formally, in E-net, given a sequence of emotion representations $[s_1^e, s_2^e, \ldots, s_n^e]$ as input, the clause-level Bi-LSTM layer is used to map it to a sequence of emotion hidden states $[h_1^e, h_2^e, \cdots, h_n^e]$. Similarly, in C-net, the sequence of cause hidden states $[h_1^c, h_2^c, \cdots, h_n^c]$ is obtained from a sequence of cause representations.
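As a concrete reference for Eqs. (1)-(3) and the hidden state learning layer, here is a minimal PyTorch sketch of one subnetwork's encoder stack. It ignores padding, masking, and mini-batching, uses the dimensions reported later in Section 3.1 (200-dimensional embeddings, 100 hidden units per direction), and is our reading of the paper rather than released code:

```python
import torch
import torch.nn as nn

class ClauseEncoder(nn.Module):
    """Word-level Bi-LSTM with attention pooling, following Eqs. (1)-(3)."""

    def __init__(self, emb_dim=200, hidden=100):
        super().__init__()
        self.bilstm = nn.LSTM(emb_dim, hidden, bidirectional=True,
                              batch_first=True)
        self.W_w = nn.Linear(2 * hidden, 2 * hidden)      # W_w and b_w
        self.u_s = nn.Linear(2 * hidden, 1, bias=False)   # context vector u_s

    def forward(self, v):          # v: [num_clauses, max_words, emb_dim]
        r, _ = self.bilstm(v)      # r: [num_clauses, max_words, 2*hidden]
        u = torch.tanh(self.W_w(r))                # Eq. (1)
        a = torch.softmax(self.u_s(u), dim=1)      # Eq. (2): weights over words
        return (a * r).sum(dim=1)                  # Eq. (3): clause vectors s_i

class HiddenStateLearner(nn.Module):
    """Clause-level Bi-LSTM over clause representations (Section 2.3.3)."""

    def __init__(self, in_dim=200, hidden=100):
        super().__init__()
        self.bilstm = nn.LSTM(in_dim, hidden, bidirectional=True,
                              batch_first=True)

    def forward(self, s):          # s: [1, num_clauses, in_dim]
        h, _ = self.bilstm(s)      # h: [1, num_clauses, 2*hidden]
        return h
```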
2.3.4 Local Pair Searcher

After obtaining the two types of hidden states, we design two types of local pair searchers (LPS) with symmetric structures in E-net and C-net respectively, to predict the local pair labels of each clause.

In E-net, the LPS predicts the E-LC label for each clause, which contains the E-label and the LC-label. For the E-label prediction, the LPS only uses the emotion hidden state of the clause. Formally, given the emotion hidden state $h_i^e$ of the $i$-th clause, the LPS uses a softmax layer to predict its E-label $\hat{y}_i^e$ through the following equation:

$\hat{y}_i^e = \mathrm{softmax}(W_e h_i^e + b_e)$  (4)

where $W_e$ and $b_e$ are a weight matrix and bias vector respectively.

For the LC-label prediction, there are two cases under consideration. If the predicted E-label of the $i$-th clause is false, the corresponding LC-label is a zero vector, because it is unnecessary to predict the LC-label. Otherwise, the LPS predicts the LC-label for all the clauses within the local context of the $i$-th clause. We denote these clauses as local context clauses. Assuming that the local context window size is $k = 1$ (the case in Figure 3), the local context clauses of the $i$-th clause are $c_{i-1}$, $c_i$, and $c_{i+1}$.

For the LC-label prediction, both the emotion and cause hidden states of the clause are used. Formally, given the emotion hidden state $h_i^e$ of the $i$-th clause and the cause hidden states $[h_{i-1}^c, h_i^c, h_{i+1}^c]$ of the corresponding local context clauses, the LPS first calculates an emotion attention ratio $\lambda_j$ for each local context clause using the following formulas:

$\gamma(h_i^e, h_j^c) = h_i^e \cdot h_j^c$  (5)

$\lambda_j = \frac{\exp(\gamma(h_i^e, h_j^c))}{\sum_{j=i-k}^{i+k} \exp(\gamma(h_i^e, h_j^c))}$  (6)

where $\gamma(h_i^e, h_j^c)$ is an emotion attention function which estimates the relevance between the local cause and the target emotion. We choose simple dot attention based on the experimental results (Luong et al., 2015). The emotion attention ratio $\lambda_j$ is then used to scale the original cause hidden states as follows:

$q_j^{lc} = \lambda_j \cdot h_j^c$  (7)

where $q_j^{lc}$ is the scaled cause hidden state of the $j$-th local context clause. The $\otimes$ used in Figure 3 refers to Eq. (5), Eq. (6), and Eq. (7). We further use a local Bi-LSTM layer to learn the contextualized representation of each local context clause:

$\overrightarrow{o_j} = \overrightarrow{\mathrm{LSTM}}_{lc}(q_j^{lc}), \quad j \in [i-k, i+k]$  (8)

$\overleftarrow{o_j} = \overleftarrow{\mathrm{LSTM}}_{lc}(q_j^{lc}), \quad j \in [i-k, i+k]$  (9)

Finally, the LC-label $\hat{y}_j^{lc}$ of the $j$-th local context clause is predicted through the following equation:

$\hat{y}_j^{lc} = \mathrm{softmax}(W_{lc} o_j + b_{lc})$  (10)

where $o_j$ is the concatenation of $\overrightarrow{o_j}$ and $\overleftarrow{o_j}$, and $W_{lc}$ and $b_{lc}$ are the weight matrix and bias vector respectively.

Similarly, in C-net, an LPS whose structure is symmetric to the one in E-net is used to predict the C-LE label for each clause, which contains the C-label $\hat{y}_i^c$ and the LE-label $\hat{y}_j^{le}$.
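The following PyTorch sketch of the E-net side of the LPS (Eqs. (4)-(10)) is again our reading, not released code: window boundaries are clipped at the document edges here, and the rule that the LC-labels are zeroed when the E-label is negative is left to the caller.

```python
import torch
import torch.nn as nn

class LocalPairSearcher(nn.Module):
    """E-net side of the LPS (Eqs. (4)-(10)); the C-net side is symmetric."""

    def __init__(self, hidden=200, k=2):
        super().__init__()
        self.k = k
        self.e_clf = nn.Linear(hidden, 2)                 # W_e, b_e in Eq. (4)
        self.local_lstm = nn.LSTM(hidden, hidden // 2, bidirectional=True,
                                  batch_first=True)       # Eqs. (8)-(9)
        self.lc_clf = nn.Linear(hidden, 2)                # W_lc, b_lc in Eq. (10)

    def forward(self, h_e, h_c, i):
        # h_e, h_c: [n, hidden] emotion / cause hidden states of the n clauses.
        y_e = torch.softmax(self.e_clf(h_e[i]), dim=-1)   # Eq. (4): E-label
        # Local context window, clipped at document boundaries (our choice).
        lo, hi = max(i - self.k, 0), min(i + self.k + 1, h_c.size(0))
        window = h_c[lo:hi]                               # [w, hidden]
        scores = window @ h_e[i]                          # Eq. (5): dot attention
        lam = torch.softmax(scores, dim=0)                # Eq. (6): ratios
        q = lam.unsqueeze(-1) * window                    # Eq. (7): scaled states
        o, _ = self.local_lstm(q.unsqueeze(0))            # Eqs. (8)-(9)
        y_lc = torch.softmax(self.lc_clf(o.squeeze(0)), dim=-1)  # Eq. (10)
        # The paper zeroes the LC-labels when the E-label is negative;
        # that gating is left to the caller in this sketch.
        return y_e, y_lc
```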
2.4 Model Training

The SLSN model consists of two subnetworks, i.e., E-net and C-net. Given a sequence of clauses as input, E-net is mainly used to predict their E-LC labels, and C-net is mainly used to predict their C-LE labels. Thus, the loss of SLSN is a weighted sum of two components:

$L = \alpha L^{elc} + (1 - \alpha) L^{cle}$  (11)

where $\alpha \in [0, 1]$ is a tradeoff parameter. Both $L^{elc}$ and $L^{cle}$ consist of two parts:

$L^{elc} = \beta L^e + (1 - \beta) L^{lc}$  (12)

$L^{cle} = \beta L^c + (1 - \beta) L^{le}$  (13)

where $\beta \in (0, 1)$ is another tradeoff parameter. $L^e$, $L^{lc}$, $L^c$ and $L^{le}$ are the cross-entropy losses of the prediction of the E-label $\hat{y}_i^e$, LC-label $\hat{y}_j^{lc}$, C-label $\hat{y}_i^c$, and LE-label $\hat{y}_j^{le}$ respectively:

$L^e = -\frac{1}{n} \sum_{i=1}^{n} \eta\, y_i^e \log(\hat{y}_i^e)$  (14)

$L^{lc} = -\frac{1}{p^e (2k+1)} \sum_{i=1}^{n} I(\hat{y}_i^e = 1) \sum_{j=i-k}^{i+k} y_j^{lc} \log(\hat{y}_j^{lc})$  (15)

$L^c = -\frac{1}{n} \sum_{i=1}^{n} \eta\, y_i^c \log(\hat{y}_i^c)$  (16)

$L^{le} = -\frac{1}{p^c (2k+1)} \sum_{i=1}^{n} I(\hat{y}_i^c = 1) \sum_{j=i-k}^{i+k} y_j^{le} \log(\hat{y}_j^{le})$  (17)

where $y_i^e$, $y_j^{lc}$, $y_i^c$ and $y_j^{le}$ denote the ground truth, $I(\cdot)$ is an indicator function, $p^e$ and $p^c$ denote the number of times that $I(\cdot)$ equals 1 in Eq. (15) and Eq. (17) respectively, and $\eta$ is used to deal with the class imbalance problem.
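A minimal sketch of how these losses combine, with the tradeoff values later reported in Section 3.1 as defaults, might look as follows. How exactly $\eta$ enters Eqs. (14) and (16) is our reading (a weight on the positive class), so treat the weighting scheme as an assumption:

```python
import torch
import torch.nn.functional as F

def slsn_loss(loss_e, loss_lc, loss_c, loss_le, alpha=0.6, beta=0.8):
    """Combine the four cross-entropy terms as in Eqs. (11)-(13).
    alpha and beta default to the values reported in Section 3.1."""
    loss_elc = beta * loss_e + (1 - beta) * loss_lc    # Eq. (12)
    loss_cle = beta * loss_c + (1 - beta) * loss_le    # Eq. (13)
    return alpha * loss_elc + (1 - alpha) * loss_cle   # Eq. (11)

def label_loss(logits, targets, eta=5.0):
    """Class-weighted cross entropy for the E-label / C-label terms
    (Eqs. 14 and 16). We read eta as a weight on the positive class to
    counter class imbalance; this reading is an assumption."""
    weight = torch.tensor([1.0, eta])                  # [negative, positive]
    return F.cross_entropy(logits, targets, weight=weight)
```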
2.5 Connection to the Two-Step Method

The two-step method for the ECPE task was proposed by Xia and Ding (2019); it first extracted the emotion clause set and the cause clause set individually, then generated emotion-cause pairs, and finally filtered out irrelevant pairs. In their approach, the Cartesian product was used to pair the two sets. In our model, by contrast, we use the local context window size $k$ to control the scope of local pair search. Considering the extreme case of $k = n$, the LPS treats all clauses in the document as the local context of the target clause. This is actually equivalent to taking a Cartesian product to search for emotion-cause pairs globally. Therefore, in the extreme case $k = n$, our method is approximately an end-to-end version of the two-step method (Xia and Ding, 2019). In practice, $k$ usually takes a much smaller value than $n$, which reduces the complexity of the model while achieving better performance.

3 Experiments

3.1 Dataset and Experimental Settings

We evaluate our proposed model on a Chinese ECPE corpus (Xia and Ding, 2019), which is constructed based on the ECE corpus (Gui et al., 2016). In this paper, we use the same setting adopted by Xia and Ding (2019). The dataset is split into two parts: 90% for training and the remaining 10% for testing. The results reported in the following experiments are an average of 10-fold cross-validation. We use Precision (P), Recall (R), and F1-score to measure the performance.

We use the word embeddings provided by NLPCC, which were pre-trained on a corpus of 1.1 million Chinese Weibo posts with the word2vec toolkit (Mikolov et al., 2013); the dimension of the word embeddings is 200. The number of hidden units of each Bi-LSTM in this paper is set to 100. The network is trained with the Adam optimizer, where the mini-batch size and the learning rate are set to 32 and 0.005. The tradeoff parameters $\alpha$, $\beta$, and the size of the local context window $k$ are set to 0.6, 0.8, and 2 respectively. The parameter $\eta$ is set to 5.

3.2 Comparison with Other Methods

We compare our model with the following three baseline methods proposed by Xia and Ding (2019): Indep, Inter-CE, and Inter-EC. All three methods are based on the two-step framework, and their differences mainly lie in the first step. Among them, Indep extracts emotions and causes individually, Inter-CE uses cause extraction to enhance emotion extraction, while Inter-EC uses emotion extraction to enhance cause extraction. For our model, we consider four variants of SLSN: SLSN-E, SLSN-C, SLSN-I, and SLSN-U, which correspond to the output of $P^{elc}$, the output of $P^{cle}$, the intersection of $P^{elc}$ and $P^{cle}$, and the union of $P^{elc}$ and $P^{cle}$, respectively.

          |  Emotion Extraction   |   Cause Extraction    | Emotion-Cause Pair Extraction
          |   P      R      F1    |   P      R      F1    |   P      R      F1
Indep     | 0.8375 0.8071 0.8210  | 0.6902 0.5673 0.6205  | 0.6832 0.5082 0.5818
Inter-CE  | 0.8494 0.8122 0.8300  | 0.6809 0.5634 0.6151  | 0.6902 0.5135 0.5901
Inter-EC  | 0.8364 0.8107 0.8230  | 0.7041 0.6083 0.6507  | 0.6721 0.5705 0.6128
SLSN-E    | 0.8580 0.7718 0.8118  | 0.7386 0.6246 0.6762  | 0.7090 0.6062 0.6529
SLSN-C    | 0.8563 0.7541 0.8014  | 0.7301 0.6205 0.6704  | 0.7087 0.5956 0.6465
SLSN-I    | 0.8770 0.7193 0.7896  | 0.7696 0.5866 0.6651  | 0.7463 0.5696 0.6454
SLSN-U    | 0.8406 0.7980 0.8181  | 0.6992 0.6588 0.6778  | 0.6836 0.6291 0.6545

Table 1: Comparisons with baselines on the ECPE corpus. Averaged results over 10 runs are reported.

As shown in Table 1, for the target ECPE task, our methods (i.e., SLSN-E, SLSN-C, SLSN-I, and SLSN-U) all achieve better performance than the two-step methods (i.e., Indep, Inter-CE, and Inter-EC) on F1-score, and SLSN-U achieves the best performance (0.6545 on F1-score). Compared with the best baseline method Inter-EC, SLSN-E, SLSN-C, SLSN-I, and SLSN-U achieve improvements of 0.0401, 0.0337, 0.0326, and 0.0417 on F1-score, respectively. More specifically, by observing the results on emotion extraction and cause extraction, we find that compared with the baseline methods, our models perform better on cause extraction (an increase of about 0.02 on F1-score) and worse on emotion extraction (a decrease of about 0.02 on F1-score). This indicates that our model SLSN has no advantage in emotion extraction, and its good performance on the ECPE task may be attributed to the effectiveness of cause extraction or emotion-cause pairing.

3.3 Effect of Components in LPS

In this section, we explore how the components in the LPS affect the performance of SLSN. We mainly study two components of the LPS, i.e., the attention function and the local encoder. We conduct the experiments by ablating them from the model or substituting them with other components. The results are shown in Table 2.

Case    | Attention Function  | Local Encoder     |   P      R      F1
SLSN-U  | Using Attention     | Bi-LSTM           | 0.6836 0.6291 0.6545
(a)     | Not Using Attention | Bi-LSTM           | 0.6322 0.6639 0.6462
(b)     | Using Attention     | Not Using Encoder | 0.5549 0.5819 0.5631
(c)     | Using Attention     | FC                | 0.5531 0.6031 0.5730
(d)     | Using Attention     | CNN               | 0.6900 0.6045 0.6433
(e)     | Using Attention     | Transformer       | 0.6759 0.6198 0.6453

Table 2: Effect of the components in the local pair searcher. P, R, and F1 are for emotion-cause pair extraction. Averaged results over 5 runs are reported.

By comparing SLSN-U and case (a) in Table 2, we find that the variant using the attention function outperforms the one without it by about 0.0083 on F1-score. Similarly, by comparing SLSN-U and case (b), we find that the variant using a local encoder (Bi-LSTM) outperforms the one without a local encoder by about 0.091 on F1-score. This indicates that both the attention function and the local encoder are effective designs. For the local encoder, we also try to substitute the Bi-LSTM with other kinds of transformation layers, such as FC (i.e., a fully-connected layer), CNN, and Transformer (Vaswani et al., 2017). Among these layers, Bi-LSTM achieves the best performance. Besides, CNN and Transformer achieve similar performance, which is only a little worse than Bi-LSTM (about 0.01 on F1-score). FC gets the worst performance, bringing only a small improvement over the case without an encoder (about 0.01 on F1-score).
This indicates that the context-aware transformation layers (i.e., Bi-LSTM, CNN, and Transformer) are more suitable as the local encoder, and Bi-LSTM is the best choice.
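These substitutions suggest the local encoder can be treated as a pluggable module. The hypothetical factory below (names and configuration are ours, not the paper's) sketches the four variants from Table 2; note the modules have different input conventions (nn.Conv1d expects channels first and nn.LSTM returns a tuple), which a wrapper would need to reconcile:

```python
import torch.nn as nn

def make_local_encoder(kind, hidden=200):
    """Return one of the local-encoder variants compared in Table 2."""
    if kind == "bilstm":       # the paper's choice
        return nn.LSTM(hidden, hidden // 2, bidirectional=True,
                       batch_first=True)
    if kind == "fc":           # position-wise fully-connected layer
        return nn.Linear(hidden, hidden)
    if kind == "cnn":          # 1-D convolution over the window dimension
        return nn.Conv1d(hidden, hidden, kernel_size=3, padding=1)
    if kind == "transformer":  # a single self-attention encoder layer
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4,
                                           batch_first=True)
        return nn.TransformerEncoder(layer, num_layers=1)
    raise ValueError(f"unknown local encoder: {kind}")
```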
3.4 Effect of Local Context Window

In this section, we explore how the setting of the local context window affects the performance of SLSN. We study three settings of the local context window: only previous $k$ (i.e., the target clause and its previous $k$ clauses), only following $k$ (i.e., the target clause and its following $k$ clauses), and symmetric (i.e., the target clause, its previous $k$ clauses, and its following $k$ clauses). For each setting, we study its effect on the performance of two basic models (i.e., SLSN-E and SLSN-C) and the best model (i.e., SLSN-U). We conduct the experiments by varying $k$ from 0 to 6 in steps of 1. The results are shown in Figure 4.

[Figure 4: Effect of the local context window. Panels: (a) Only previous, (b) Only following, (c) Symmetric, (d) SLSN-U.]

From Figure 4(a) and Figure 4(b), we find that the performance of SLSN-C is poor under the setting of only previous $k$ (about 0.3 on F1-score), while the performance of SLSN-E is poor under the setting of only following $k$ (about 0.3 on F1-score). This implies that the cause clauses may tend to appear before the corresponding emotion clauses. From Figure 4(c), we find that both SLSN-C and SLSN-E maintain good performance (about 0.63 on F1-score) under the symmetric setting. This indicates that the symmetric setting is more robust than the other two settings. By observing the first three subfigures in Figure 4, we find that SLSN-U maintains stable performance (about 0.65 on F1-score) no matter which setting of the local context window is adopted. This means that SLSN-U is more robust to the setting of the local context window than SLSN-C and SLSN-E. From Figure 4(d), we find that SLSN-U achieves the best performance under the symmetric setting. This indicates that the best choice for the local context window is the symmetric setting. In addition, by observing the window size $k$, we find that all three models achieve relatively stable performance as $k$ varies, except for the case $k = 0$.

3.5 Effect of Tradeoff Parameters

In this section, we explore the effect of the tradeoff parameters $\alpha$ and $\beta$ on the performance of SLSN. For each parameter, we study its effect on the performance of three models (i.e., SLSN-E, SLSN-C, and SLSN-U). We conduct the experiments by varying $\alpha$ from 0 to 1 in steps of 0.1, and varying $\beta$ from 0.1 to 0.9 in steps of 0.1. The results are shown in Figure 5.

[Figure 5: Effects of $\alpha$ and $\beta$. Panels: (a) The effect of $\alpha$, (b) The effect of $\beta$.]

From the left subfigure of Figure 5(a), we find that when $\alpha = 0$ or $\alpha = 1$, the models perform poorly. When $\alpha \in [0.1, 0.9]$, the performance of SLSN-E and SLSN-C is improved. This implies that both kinds of loss (i.e., $L^{elc}$ and $L^{cle}$) are important to SLSN, and E-net and C-net can benefit from each other. In most cases, SLSN-U achieves the best performance, and SLSN-E performs better than SLSN-C. From the right subfigure of Figure 5(a), we find that the three metrics (i.e.,
precision, recall, and F1-score) of SLSN-U exhibit a steady trend when $\alpha \in [0.1, 0.9]$. This indicates that SLSN-U is robust to the setting of the parameter $\alpha$.

From the left subfigure of Figure 5(b), we find that when $\beta \in [0.1, 0.9]$, the performance of all three models exhibits an upward trend as $\beta$ increases. This means that the prediction of the E-label and C-label plays a more important role than the prediction of the LC-label and LE-label in the training process of SLSN. Again, in most cases, SLSN-U achieves the best performance, and SLSN-E performs better than SLSN-C. From the right subfigure of Figure 5(b), we find that the recall and F1-score of SLSN-U exhibit an upward trend and the precision of SLSN-U exhibits a downward trend as $\beta$ increases. This implies that the LPS tends to extract more emotion-cause pairs as $\beta$ increases.

3.6 Case Study

For the case study, we select two examples from the test dataset to demonstrate the effectiveness of our model. The ground truth and the predicted results of Inter-EC and SLSN-U are given in Table 3.

Example 1: Ms. Huang married a guy 20 years younger than her (c1). In order to avoid losing her wealth (c2), they have notarized their property before marriage (c3). However, the man stole Ms. Huang's money twice after marriage (c4), and Ms. Huang submitted a case to the court helplessly (c5).
    Ground-truth: (c5, c4)    Inter-EC: None    SLSN-U: (c5, c4)

Example 2: Chen and his wife have two boys and one daughter (c1). The 18-year-old son works in Taizhong (c2). When he knew his mother had been killed by his father (c3), he was quite emotional (c4). Then he saw his younger brother and sister (c5), and they cried together (c6).
    Ground-truth: (c4, c3)    Inter-EC: (c4, c3), (c4, c5), (c6, c5)    SLSN-U: (c4, c3)

Table 3: Two examples for the case study

For the first example, Inter-EC predicts no pair while SLSN-U predicts the correct emotion-cause pair (c5, c4). The Inter-EC method outputs the right pair only when the emotion clause set includes c5 and the cause clause set includes c4. With our method, in contrast, an emotion-cause pair can be extracted according to either the emotion or the cause clause, which makes it easier to establish a matching relationship.

For the second example, we observe that many wrong answers are predicted by Inter-EC (e.g., (c4, c5) and (c6, c5)). Due to the use of the Cartesian product operation, the connection between emotion clause and cause clause may be ignored, so many irrelevant pairs are introduced. This indicates that the Cartesian product brings a lot of redundancy and the filter operation fails to remove the irrelevant pairs. In our method, we extract emotion-cause pairs directly within the local context window, so our method avoids the redundancy brought by the Cartesian product. This is one reason why our method performs better than Inter-EC.

4 Related Work

Emotion cause analysis has been studied for about a decade (Lee et al., 2010; Gui et al., 2016; Xia and Ding, 2019). Previous studies on emotion cause analysis mainly focused on the emotion cause extraction (ECE) task (Ding et al., 2019; Xia et al., 2019; Fan et al., 2019). Recently, based on the ECE task, a new and more challenging task named emotion-cause pair extraction (ECPE) was proposed (Xia and Ding, 2019).

The ECE task was first proposed by Lee et al. (2010) and was formalized as a word-level sequence labeling problem. However, Chen et al. (2010) suggested that the clause may be a more appropriate unit than the word for detecting causes. Later, based on this idea, Gui et al. (2016) released a Chinese ECE corpus built from public SINA city news.
4 Related Work

Emotion cause analysis has been studied for about a decade (Lee et al., 2010; Gui et al., 2016; Xia and Ding, 2019). Previous studies on emotion cause analysis mainly focused on the emotion cause extraction (ECE) task (Ding et al., 2019; Xia et al., 2019; Fan et al., 2019). Recently, based on the ECE task, a new and more challenging task named emotion-cause pair extraction (ECPE) was proposed (Xia and Ding, 2019).

The ECE task was first proposed by Lee et al. (2010) and was formalized as a word-level sequence labeling problem. However, Chen et al. (2010) suggested that the clause may be a more appropriate unit than the word for detecting causes. Later, based on this idea, Gui et al. (2016) released a Chinese ECE corpus constructed from public SINA city news. In this corpus, the ECE task was defined as a clause-level sequence labeling problem, the objective of which is to predict the cause clauses in a document given the emotion. This corpus has become the benchmark dataset for subsequent studies on the ECE task. While early studies mainly adopted rule-based methods (Chen et al., 2010; Gao et al., 2015; Gui et al., 2014) and machine learning methods (Ghazi et al., 2015) to deal with the ECE task, recent studies have begun to apply deep learning methods to this task (Gui et al., 2017; Li et al., 2018; Chen et al., 2018; Ding et al., 2019; Yu et al., 2019; Li et al., 2019; Fan et al., 2019; Xia et al., 2019).

Although the ECE task is valuable in practice, its application in real-world scenarios is limited because the emotion clauses are naturally not annotated. Considering this situation, Xia and Ding (2019) proposed the ECPE task and released a corresponding ECPE dataset based on the ECE corpus. To tackle the task, they proposed a two-step method, which first extracted emotions and causes individually with a multi-task framework, and then obtained the emotion-cause pairs by pairing and filtering.
5 Conclusion and Future Work

In this paper, we propose a symmetric local search network (SLSN) to perform end-to-end emotion-cause pair extraction. SLSN can directly extract emotion-cause pairs through a process of local search. This is realized by a specially-designed component, i.e., the local pair searcher, which allows emotions and causes to be detected and matched simultaneously. Experimental results on the ECPE corpus demonstrate the effectiveness of our model.

In the future, we will consider further improving the performance of emotion extraction and cause extraction by employing a more powerful pre-trained encoder (e.g., BERT (Devlin et al., 2019)) or by designing auxiliary tasks to utilize extra knowledge. Besides, we will further explore the process of local pair search and seek more advanced implementations of the local pair searcher.
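As a rough sketch of the first direction, the snippet below shows how clause-level representations could be obtained from a pre-trained BERT encoder via the HuggingFace transformers library. The checkpoint, the [CLS] pooling choice, and the example clauses are assumptions of this sketch rather than part of SLSN; for the Chinese ECPE corpus, a Chinese checkpoint such as bert-base-chinese would be the natural choice.

```python
import torch
from transformers import BertModel, BertTokenizer

# Hypothetical clause encoder; checkpoint and pooling are assumptions
# of this sketch, not components of SLSN.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def encode_clauses(clauses):
    """Encode each clause into a fixed-size vector via its [CLS] token."""
    batch = tokenizer(clauses, padding=True, truncation=True,
                      return_tensors="pt")
    with torch.no_grad():
        outputs = model(**batch)
    # The [CLS] hidden state is one common choice of clause representation.
    return outputs.last_hidden_state[:, 0, :]

vectors = encode_clauses(["he knew his mother had been killed by his father",
                          "he was quite emotional"])
print(vectors.shape)  # torch.Size([2, 768])
```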
Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grant Nos. 61906085, 61802169, 61972192, and 41972111; the Jiangsu Natural Science Foundation under Grant No. BK20180325; and the Second Tibetan Plateau Scientific Expedition and Research Program under Grant No. 2019QZKK0204. This work is partially supported by the Collaborative Innovation Center of Novel Software Technology and Industrialization.

References

Ying Chen, Sophia Yat Mei Lee, Shoushan Li, and Chu-Ren Huang. 2010. Emotion cause detection with linguistic constructions. In COLING 2010, 23rd International Conference on Computational Linguistics, pages 179–187.

Ying Chen, Wenjun Hou, Xiyao Cheng, and Shoushan Li. 2018. Joint learning for emotion classification and emotion cause detection. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 646–651.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, pages 4171–4186. Association for Computational Linguistics.

Zixiang Ding, Huihui He, Mengran Zhang, and Rui Xia. 2019. From independent prediction to reordered prediction: Integrating relative position and global label information to emotion cause identification. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, pages 6343–6350.

Chuang Fan, Hongyu Yan, Jiachen Du, Lin Gui, Lidong Bing, Min Yang, Ruifeng Xu, and Ruibin Mao. 2019. A knowledge regularized hierarchical approach for emotion cause analysis. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP, pages 5613–5623.

Kai Gao, Hua Xu, and Jiushuo Wang. 2015. Emotion cause detection for Chinese micro-blogs based on ECOCC model. In Advances in Knowledge Discovery and Data Mining - 19th Pacific-Asia Conference, PAKDD, pages 3–14.

Diman Ghazi, Diana Inkpen, and Stan Szpakowicz. 2015. Detecting emotion stimuli in emotion-bearing sentences. In Computational Linguistics and Intelligent Text Processing, pages 152–165.

Lin Gui, Li Yuan, Ruifeng Xu, Bin Liu, Qin Lu, and Yu Zhou. 2014. Emotion cause detection with linguistic construction in Chinese Weibo text. In Natural Language Processing and Chinese Computing, pages 457–464.

Lin Gui, Dongyin Wu, Ruifeng Xu, Qin Lu, and Yu Zhou. 2016. Event-driven emotion cause extraction with corpus construction. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP, pages 1639–1649.