the emotion hidden state $h^e_i$ of the $i$-th clause, LPS uses a softmax layer to predict its E-label $\hat{y}^e_i$ through the following equation:

$$\hat{y}^e_i = \mathrm{softmax}(W_e h^e_i + b_e) \tag{4}$$

where $W_e$ and $b_e$ are the weight matrix and bias vector, respectively.

For the LC-label prediction, there are two cases under consideration. If the predicted E-label of the $i$-th clause is false, the corresponding LC-label is a zero vector, since it is unnecessary to predict an LC-label in this case. Otherwise, LPS predicts the LC-label for all the clauses within the local context of the $i$-th clause; we denote these clauses as local context clauses. Assuming that the local context window size is $k = 1$ (the case in Figure 3), the local context clauses of the $i$-th clause are $c_{i-1}$, $c_i$, and $c_{i+1}$. For the LC-label prediction, both the emotion and cause hidden states of the clauses are used.
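As a concrete illustration before the formal definition, here is a minimal PyTorch sketch of the E-label prediction in Eq. (4) and of how the local context clauses might be gathered (the function names are ours, and the boundary clipping is our assumption, since the handling of document boundaries is not specified here):

```python
import torch
import torch.nn.functional as F

def predict_e_labels(h_e: torch.Tensor, W_e: torch.Tensor, b_e: torch.Tensor) -> torch.Tensor:
    """Eq. (4): per-clause E-label distribution from the emotion hidden states.

    h_e: (num_clauses, hidden_dim) emotion hidden states.
    W_e: (num_labels, hidden_dim) weight matrix; b_e: (num_labels,) bias vector.
    """
    return F.softmax(h_e @ W_e.T + b_e, dim=-1)

def local_context_indices(i: int, k: int, num_clauses: int) -> list[int]:
    """Indices of the local context clauses of clause i for window size k,
    clipped at the document boundaries (our assumption)."""
    return list(range(max(0, i - k), min(num_clauses, i + k + 1)))
```

For $k = 1$ and a clause in the middle of a document, `local_context_indices` returns the indices of $c_{i-1}$, $c_i$, and $c_{i+1}$, matching the example above.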
Formally, given the emotion hidden state $h^e_i$ of the $i$-th clause and the cause hidden states $[h^c_{i-1}, h^c_i, h^c_{i+1}]$ of the corresponding local context clauses, LPS first calculates an emotion attention ratio $\lambda_j$ for each local context clause using the following formulas:

$$\gamma(h^e_i, h^c_j) = h^e_i \cdot h^c_j \tag{5}$$

$$\lambda_j = \frac{\exp(\gamma(h^e_i, h^c_j))}{\sum_{j'=i-k}^{i+k} \exp(\gamma(h^e_i, h^c_{j'}))} \tag{6}$$

where $\gamma(h^e_i, h^c_j)$ is an emotion attention function that estimates the relevance between the local cause and the target emotion. Based on our experimental results, we choose the simple dot attention of Luong et al. (2015). The emotion attention ratio $\lambda_j$ is then used to scale the original cause hidden states:

$$q^{lc}_j = \lambda_j \cdot h^c_j \tag{7}$$

where $q^{lc}_j$ is the scaled cause hidden state of the $j$-th local context clause. The $\otimes$ used in Figure 3 refers to Eqs. (5), (6), and (7). We further use a local Bi-LSTM layer to learn the contextualized representation of each local context clause:

$$\overrightarrow{o}_j = \overrightarrow{\mathrm{LSTM}}_{lc}(q^{lc}_j), \quad j \in [i-k, i+k] \tag{8}$$

$$\overleftarrow{o}_j = \overleftarrow{\mathrm{LSTM}}_{lc}(q^{lc}_j), \quad j \in [i-k, i+k] \tag{9}$$

Finally, the LC-label $\hat{y}^{lc}_j$ of the $j$-th local context clause is predicted through the following equation:

$$\hat{y}^{lc}_j = \mathrm{softmax}(W_{lc}\, o_j + b_{lc}) \tag{10}$$

where $o_j$ is the concatenation of $\overrightarrow{o}_j$ and $\overleftarrow{o}_j$, and $W_{lc}$ and $b_{lc}$ are the weight matrix and bias vector, respectively.

Similarly, in C-net, an LPS whose structure is symmetric to the one in E-net is used to predict the C-LE label for each clause, which consists of the C-label $\hat{y}^c_i$ and the LE-label $\hat{y}^{le}_j$.

2.4 Model Training

The SLSN model consists of two sub-networks, i.e., E-net and C-net. Given a sequence of clauses as input, E-net is mainly used to predict their E-LC labels, and C-net is mainly used to predict their C-LE labels. Thus, the loss of SLSN is a weighted sum of two components:

$$L = \alpha L^{elc} + (1 - \alpha) L^{cle} \tag{11}$$

where $\alpha \in [0, 1]$ is a trade-off parameter. Both $L^{elc}$ and $L^{cle}$ in turn consist of two parts:

$$L^{elc} = \beta L^{e} + (1 - \beta) L^{lc} \tag{12}$$

$$L^{cle} = \beta L^{c} + (1 - \beta) L^{le} \tag{13}$$
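To make the LC-label branch concrete, the following is a minimal PyTorch sketch that puts Eqs. (5)-(10) together (class and variable names are ours and batching is simplified; this is an illustration under those assumptions, not the authors' released implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalPairSearcher(nn.Module):
    """Sketch of the LC-label branch of LPS, Eqs. (5)-(10)."""

    def __init__(self, hidden_dim: int, num_lc_labels: int):
        super().__init__()
        # Local Bi-LSTM over the 2k+1 scaled cause hidden states, Eqs. (8)-(9).
        self.local_lstm = nn.LSTM(hidden_dim, hidden_dim,
                                  bidirectional=True, batch_first=True)
        # Classifier over the concatenated forward/backward outputs, Eq. (10).
        self.lc_out = nn.Linear(2 * hidden_dim, num_lc_labels)

    def forward(self, h_e_i: torch.Tensor, h_c_local: torch.Tensor) -> torch.Tensor:
        # h_e_i: (hidden_dim,) emotion hidden state of the i-th clause.
        # h_c_local: (2k+1, hidden_dim) cause hidden states of its local context.
        gamma = h_c_local @ h_e_i                   # Eq. (5): dot attention scores
        lam = F.softmax(gamma, dim=0)               # Eq. (6): attention ratios
        q_lc = lam.unsqueeze(-1) * h_c_local        # Eq. (7): scaled cause states
        o, _ = self.local_lstm(q_lc.unsqueeze(0))   # Eqs. (8)-(9), directions concatenated
        return F.softmax(self.lc_out(o.squeeze(0)), dim=-1)  # Eq. (10)
```

For a window size $k = 1$, `h_c_local` has three rows and the module returns a `(3, num_lc_labels)` matrix, one LC-label distribution per local context clause; the zero-vector case for clauses with a false predicted E-label is handled outside this module.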
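Likewise, a minimal sketch of the loss combination in Eqs. (11)-(13), assuming the four component losses (e.g., cross-entropy terms for the E-, LC-, C-, and LE-labels) have already been computed; the default values of 0.5 are placeholders, not the paper's tuned settings:

```python
def slsn_loss(loss_e, loss_lc, loss_c, loss_le, alpha=0.5, beta=0.5):
    """Eqs. (11)-(13): weighted combination of the four component losses."""
    loss_elc = beta * loss_e + (1 - beta) * loss_lc   # Eq. (12): E-net loss
    loss_cle = beta * loss_c + (1 - beta) * loss_le   # Eq. (13): C-net loss
    return alpha * loss_elc + (1 - alpha) * loss_cle  # Eq. (11): overall loss
```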