[Figure 3: Framework of the SLSN model. The two symmetric subnetworks, E-net and C-net, each consist of a clause encoder (word-level Bi-LSTM with attention) and a hidden state learning layer (clause-level Bi-LSTM), connected by the local pair searcher.]

searcher is a specially designed cross-subnetwork module, which uses the hidden states of the clauses in both subnetworks for prediction. In the following, we introduce the components of SLSN in technical detail.

2.3.1 Word Embedding
Before representing the clauses in the document, we first map each word in the clauses into a word embedding, which is a low-dimensional real-valued vector. Formally, given a sequence of clauses $d = [c_1, c_2, \dots, c_n]$, the clause $c_i = [w_i^1, w_i^2, \dots, w_i^{l_i}]$ consists of $l_i$ words. We map each clause into its word-level representation $v_i = [v_i^1, v_i^2, \dots, v_i^{l_i}]$, where $v_i^j$ is the word embedding of word $w_i^j$.

2.3.2 Clause Encoder
After word embedding, we use a Bi-LSTM layer followed by an attention layer as the clause encoder in both E-net and C-net to learn the representation of clauses. Formally, in E-net, given the word-level representation of the $i$-th clause $v_i = [v_i^1, v_i^2, \dots, v_i^{l_i}]$ as the input, the word-level Bi-LSTM layer first maps it to the hidden states $r_i = [r_i^1, r_i^2, \dots, r_i^{l_i}]$. Then, the attention layer maps each $r_i$ to the emotion representation of the clause $s_i^e$ by weighting each word in the clause and aggregating the weighted words through the following equations:

\begin{align}
u_i^j &= \tanh(W_w r_i^j + b_w) \tag{1} \\
a_i^j &= \frac{\exp\left((u_i^j)^\top u_s\right)}{\sum_t \exp\left((u_i^t)^\top u_s\right)} \tag{2} \\
s_i^e &= \sum_j a_i^j r_i^j \tag{3}
\end{align}

where $W_w$, $b_w$ and $u_s$ are the weight matrix, bias vector and context vector, respectively, and $a_i^j$ is the attention weight of $r_i^j$. Similarly, in C-net, the cause representation of the $i$-th clause $s_i^c$ is obtained using a clause encoder with a similar structure.
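Since the paper does not specify an implementation framework, the following is a minimal PyTorch sketch of this clause encoder; the class name, hyperparameters, and batching conventions are illustrative assumptions rather than details from the paper.

```python
import torch
import torch.nn as nn

class ClauseEncoder(nn.Module):
    """Word-level Bi-LSTM followed by attention pooling (Eqs. 1-3).
    A sketch of the paper's clause encoder; names are assumptions."""

    def __init__(self, embedding_dim: int, hidden_dim: int):
        super().__init__()
        # Maps v_i = [v_i^1, ..., v_i^{l_i}] to hidden states r_i of size 2*hidden_dim.
        self.bilstm = nn.LSTM(embedding_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        self.W_w = nn.Linear(2 * hidden_dim, 2 * hidden_dim)  # W_w and b_w in Eq. (1)
        self.u_s = nn.Parameter(torch.randn(2 * hidden_dim))  # context vector u_s in Eq. (2)

    def forward(self, v: torch.Tensor) -> torch.Tensor:
        # v: (batch, l_i, embedding_dim), the word embeddings of one clause.
        r, _ = self.bilstm(v)                 # (batch, l_i, 2*hidden_dim)
        u = torch.tanh(self.W_w(r))           # Eq. (1): u_i^j
        scores = u @ self.u_s                 # (batch, l_i): (u_i^j)^T u_s
        a = torch.softmax(scores, dim=1)      # Eq. (2): attention weights a_i^j
        s = (a.unsqueeze(-1) * r).sum(dim=1)  # Eq. (3): s_i = sum_j a_i^j r_i^j
        return s                              # clause representation s_i^e (or s_i^c)
```

Under this reading, E-net and C-net would each hold their own `ClauseEncoder` instance, so the same architecture with separate parameters yields $s_i^e$ and $s_i^c$.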
2.3.3 Hidden State Learning
After the clause encoder, we use a hidden state learning layer to learn the contextualized representation of each clause in the document. Formally, in E-net, given a sequence of emotion representations $[s_1^e, s_2^e, \dots, s_n^e]$ as input, the clause-level Bi-LSTM layer maps it to a sequence of emotion hidden states $[h_1^e, h_2^e, \dots, h_n^e]$. Similarly, in C-net, the sequence of cause hidden states $[h_1^c, h_2^c, \dots, h_n^c]$ is obtained from the sequence of cause representations.

2.3.4 Local Pair Searcher
After obtaining the two types of hidden states, we design two types of local pair searchers (LPS) with symmetric structures in E-net and C-net, respectively, to predict the local pair labels of each clause. In E-net, the LPS predicts the E-LC label for each clause, which consists of an E-label and an LC-label. For the E-label prediction, the LPS only uses the emotion hidden state of the clause. Formally, given