where β ∈ (0, 1) is another tradeoff parameter. $L^e$, $L^{lc}$, $L^c$ and $L^{le}$ are the cross-entropy losses of the predictions of the E-label $\hat{y}^e_i$, LC-label $\hat{y}^{lc}_i$, C-label $\hat{y}^c_i$, and LE-label $\hat{y}^{le}_i$ respectively:

$$L^e = -\frac{1}{n}\sum_{i=1}^{n} \eta\, y^e_i \log(\hat{y}^e_i) \tag{14}$$

$$L^{lc} = -\frac{1}{p^e(2k+1)}\sum_{i=1}^{n} I(\hat{y}^e_i = 1) \sum_{j=i-k}^{i+k} y^{lc}_j \log(\hat{y}^{lc}_j) \tag{15}$$

$$L^c = -\frac{1}{n}\sum_{i=1}^{n} \eta\, y^c_i \log(\hat{y}^c_i) \tag{16}$$

$$L^{le} = -\frac{1}{p^c(2k+1)}\sum_{i=1}^{n} I(\hat{y}^c_i = 1) \sum_{j=i-k}^{i+k} y^{le}_j \log(\hat{y}^{le}_j) \tag{17}$$

where $y^e_i$, $y^{lc}_j$, $y^c_i$ and $y^{le}_j$ denote the ground truth, $I(\cdot)$ is an indicator function, $p^e$ and $p^c$ denote the number of times that $I(\cdot)$ equals 1 in Eq. (15) and Eq. (17) respectively, and $\eta$ is used to deal with the class imbalance problem.
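For concreteness, the following is a minimal NumPy sketch of Eqs. (14)–(17), assuming binary labels and an indicator realized by thresholding the predicted probability at 0.5. The function name `slsn_losses`, the window clipping at document borders, and the `eps` smoothing term are our own illustrative choices, not the authors' implementation.

```python
import numpy as np

def slsn_losses(y_e, y_lc, y_c, y_le, e_hat, lc_hat, c_hat, le_hat,
                k=2, eta=5.0, eps=1e-12):
    """Sketch of Eqs. (14)-(17). y_* are 0/1 ground-truth arrays of
    length n; *_hat are predicted probabilities of the positive class."""
    n = len(y_e)

    # Eqs. (14) and (16): eta-weighted cross-entropy over all n clauses.
    # Only clauses with y = 1 contribute, and eta up-weights them to
    # counter the class imbalance.
    L_e = -np.sum(eta * y_e * np.log(e_hat + eps)) / n
    L_c = -np.sum(eta * y_c * np.log(c_hat + eps)) / n

    def local_loss(gate_hat, y_local, local_hat):
        # Eqs. (15) and (17): accumulate the window loss only around
        # clauses whose predicted gate label is 1; p counts those clauses.
        total, p = 0.0, 0
        for i in range(n):
            if gate_hat[i] >= 0.5:  # I(y_hat_i = 1), thresholded at 0.5
                p += 1
                lo, hi = max(0, i - k), min(n, i + k + 1)  # clip at borders
                total += np.sum(y_local[lo:hi] * np.log(local_hat[lo:hi] + eps))
        return -total / (p * (2 * k + 1)) if p > 0 else 0.0

    L_lc = local_loss(e_hat, y_lc, lc_hat)  # LC-label loss, gated by E-labels
    L_le = local_loss(c_hat, y_le, le_hat)  # LE-label loss, gated by C-labels
    return L_e, L_lc, L_c, L_le
```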
2.5 Connection to the Two-Step Method

The two-step method for the ECPE task was proposed by Xia and Ding (2019); it first extracts the emotion clause set and the cause clause set individually, then generates candidate emotion-cause pairs, and finally filters out the irrelevant pairs. In their approach, a Cartesian product is used to pair the two sets, whereas our model uses the local context window size k to control the scope of local pair search. Consider the extreme case of k = n: LPS treats all clauses in the document as the local context of the target clause, which is equivalent to taking a Cartesian product to search for emotion-cause pairs globally. Therefore, in the extreme case k = n, our method is approximately an end-to-end version of the two-step method (Xia and Ding, 2019). In practice, k usually takes a much smaller value than n, which reduces the complexity of the model while achieving better performance.

3 Experiments

3.1 Dataset and Experimental Settings

We evaluate our proposed model on a Chinese ECPE corpus (Xia and Ding, 2019) constructed from the ECE corpus (Gui et al., 2016). We adopt the same setting as Xia and Ding (2019): the dataset is split into two parts, 90% for training and the remaining 10% for testing, and the results reported in the following experiments are averages over 10-fold cross-validation. We use Precision (P), Recall (R), and F1-score to measure performance.

We use the word embeddings provided by NLPCC, which were pre-trained on a corpus of 1.1 million Chinese Weibo posts with the word2vec toolkit (Mikolov et al., 2013); the word embedding dimension is 200. The number of hidden units of the Bi-LSTM is set to 100. The network is trained with the Adam optimizer, where the mini-batch size and the learning rate are set to 32 and 0.005. The tradeoff parameters α and β and the local context window size k are set to 0.6, 0.8, and 2 respectively. The parameter η is set to 5.

3.2 Comparison with Other Methods

We compare our model with the following three baseline methods proposed by Xia and Ding (2019): Indep, Inter-CE, and Inter-EC. All three methods are based on the two-step framework, and their differences mainly lie in the first step. Among them, Indep extracts emotions and causes individually, Inter-CE uses cause extraction to enhance emotion extraction, and Inter-EC uses emotion extraction to enhance cause extraction. For our model, we consider four variants of SLSN: SLSN-E, SLSN-C, SLSN-I, and SLSN-U, which correspond to the output of $P^{elc}$, the output of $P^{cle}$, the intersection of $P^{elc}$ and $P^{cle}$, and the union of $P^{elc}$ and $P^{cle}$, respectively.
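To make the pair-construction logic concrete, here is a minimal sketch of local pair search with window size k, whose k = n extreme recovers the Cartesian-product pairing of the two-step method (Section 2.5), together with the set operations behind the SLSN-I and SLSN-U variants. The function name, the set-of-index-tuples representation, and the example pair sets are our own illustrative choices, not the authors' code.

```python
from itertools import product

def candidate_pairs(emotion_idx, cause_idx, k):
    """Emotion-cause candidates under local pair search: each predicted
    emotion clause is paired only with clauses inside its size-k window."""
    return {(e, c) for e in emotion_idx for c in cause_idx if abs(e - c) <= k}

# With k = n the window covers the whole document, so local pair search
# degenerates to the Cartesian product used by the two-step method:
emotions, causes, n = [1, 4], [0, 3], 6
assert candidate_pairs(emotions, causes, k=n) == set(product(emotions, causes))

# SLSN variants: P_elc and P_cle are the pair sets predicted by the two
# symmetric subnetworks; SLSN-I is their intersection (typically higher
# precision), SLSN-U their union (typically higher recall).
P_elc = {(1, 0), (4, 3)}   # pairs from the emotion subnetwork (illustrative)
P_cle = {(4, 3), (4, 5)}   # pairs from the cause subnetwork (illustrative)
slsn_i = P_elc & P_cle     # SLSN-I -> {(4, 3)}
slsn_u = P_elc | P_cle     # SLSN-U -> {(1, 0), (4, 3), (4, 5)}
```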