walk strategy to sample the positive context. Following the idea of node2vec (Grover and Leskovec, 2016), we sample the positive contexts of words by simulating a 2nd-order random walk on the knowledge graph with only Sim_edge. After that, by applying a sliding window of fixed length $s$ over the sampled random walk paths, we can get the positive target-context pairs $\{(w_t, w_c)\}$. From the type Dissim_edge, we sample the negative target-context pairs, in which the target word and the context words are dissimilar in difficulty. Since dissimilarity is generally not transitive, we adopt the immediate-neighbor strategy to sample the negative context. Specifically, on the knowledge graph with only Dissim_edge, we collect the negative context from the immediate neighbors of the target node $w_t$ and obtain the negative context list $C_n(w_t)$.

By replacing the text-based linear context with our graph-based difficulty context, we can train the word embedding using the classic word embedding models, such as C&W, CBOW, and Skip-Gram. Here we use the Skip-Gram model with Negative Sampling (SGNS) proposed by Mikolov et al. (2013). Specifically, given $N$ positive target-context pairs $(w_t, w_c)$ and the negative context list $C_n(w_t)$ of each target word, the objective of KEWE$_k$ is to minimize the loss function $L_k$, which is defined as follows:

\[
L_k = -\frac{1}{N} \sum_{(w_t, w_c)} \Big[ \log \sigma(u_{w_c}^{\top} v_{w_t}) + \mathbb{E}_{w_i \in C_n(w_t)} \log \sigma(-u_{w_i}^{\top} v_{w_t}) \Big] \tag{2}
\]

where $v_w$ and $u_w$ are the “input” and “output” vector representations of $w$, and $\sigma$ is the sigmoid function defined as $\sigma(x) = \frac{1}{1 + e^{-x}}$. This loss function enables the positive context (e.g., $w_c$) to be distinguished from the negative context (e.g., $w_i$).
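To make the sampling step concrete, here is a minimal sketch (not the authors' implementation), assuming the two edge types are stored as plain adjacency lists `sim_adj` and `dissim_adj` (hypothetical names). For brevity it uses an unbiased first-order walk in place of node2vec's biased 2nd-order walk, and the parameter `window` plays the role of the window length $s$.

import random
from typing import Dict, List, Tuple

def sample_positive_pairs(sim_adj: Dict[str, List[str]],
                          walks_per_node: int = 10,
                          walk_length: int = 40,
                          window: int = 5) -> List[Tuple[str, str]]:
    # Simulate random walks on the Sim_edge subgraph and slide a window over
    # each walk to collect positive (target, context) pairs.
    # NOTE: an unbiased 1st-order walk is used here for brevity; the paper
    # follows node2vec's biased 2nd-order walk.
    pairs = []
    for start in sim_adj:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = sim_adj.get(walk[-1], [])
                if not neighbors:
                    break
                walk.append(random.choice(neighbors))
            for i, w_t in enumerate(walk):
                left, right = max(0, i - window), min(len(walk), i + window + 1)
                pairs.extend((w_t, walk[j]) for j in range(left, right) if j != i)
    return pairs

def negative_contexts(dissim_adj: Dict[str, List[str]], w_t: str) -> List[str]:
    # C_n(w_t): only the immediate neighbors of w_t on the Dissim_edge subgraph,
    # since difficulty dissimilarity is not assumed to be transitive.
    return dissim_adj.get(w_t, [])

Each sampled pair $(w_t, w_c)$ is then trained against the negative list returned by negative_contexts for its target word, as in Eq. (2).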
3.2 The Hybrid Word Embedding Model (KEWE$_h$)

The classic text-based word embedding models yield word embeddings that focus on syntactic and semantic contexts while ignoring word difficulty. By contrast, KEWE$_k$ trains word embeddings that focus on word difficulty while leaving out the syntactic and semantic information. Since readability may also relate to both syntax and semantics, we develop a hybrid word embedding model (KEWE$_h$) to incorporate both the domain knowledge and the text corpus. The loss function of the hybrid model, $L_h$, can be expressed as follows:

\[
L_h = \lambda L_k + (1 - \lambda) L_t \tag{3}
\]

where $L_k$ is the loss of predicting the knowledge-graph-based difficulty contexts, $L_t$ is the loss of predicting the text-based syntactic and semantic contexts, and $\lambda \in [0, 1]$ is a weighting factor. Clearly, the case of $\lambda = 1$ reduces the hybrid model to the knowledge-only model.
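The combination in Eq. (3) is simple enough to state directly in code; the sketch below assumes the two component losses have already been computed for a training batch, and the function name and arguments are illustrative only.

def hybrid_loss(loss_k: float, loss_t: float, lam: float) -> float:
    # Eq. (3): convex combination of the knowledge-based loss L_k and the
    # text-based Skip-Gram loss L_t. lam = 1.0 recovers the knowledge-only
    # model KEWE_k; lam = 0.0 recovers plain text-based Skip-Gram.
    assert 0.0 <= lam <= 1.0
    return lam * loss_k + (1.0 - lam) * loss_t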
As there are many text-based word embedding models, the text-based loss $L_t$ can be defined in various ways. To be consistent with KEWE$_k$, we formalize $L_t$ based on the Skip-Gram model. Given a text corpus, the Skip-Gram model aims to find word representations that are good at predicting the context words. Specifically, given a sequence of training words, denoted as $w_1, w_2, \cdots, w_T$, the objective of the Skip-Gram model is to minimize the log loss of predicting the context using the target word embedding, which can be expressed as follows:

\[
L_t = -\frac{1}{T} \sum_{t=1}^{T} \sum_{-s \le j \le s,\ j \ne 0} \log p(w_{t+j} \mid w_t) \tag{4}
\]

where $s$ is the window size of the context sampling. Since the full softmax function used to define $p(w_{t+j} \mid w_t)$ is computationally expensive, we employ the negative sampling strategy (Mikolov et al., 2013) and replace every $\log p(w_c \mid w_t)$ in $L_t$ with the following formula:

\[
\log p(w_c \mid w_t) = \log \sigma(u_{w_c}^{\top} v_{w_t}) + \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)} \log \sigma(-u_{w_i}^{\top} v_{w_t}) \tag{5}
\]

where $v_w$, $u_w$, and $\sigma$ have the same meanings as in Eq. (2), $k$ is the number of negative samples, and $P_n(w)$ is the noise distribution. This strategy enables the actual context $w_c$ to be distinguished from the noise context $w_i$ drawn from $P_n(w)$.
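As a rough illustration of the shared form of Eqs. (2) and (5), the sketch below computes the per-pair negative-sampling loss from the target's “input” vector and the “output” vectors of the positive and negative contexts, together with one possible choice of $P_n(w)$. The 3/4-power unigram smoothing follows the common practice of Mikolov et al. (2013) rather than anything stated here, and all names are illustrative; for KEWE$_k$ the negative rows would come from $C_n(w_t)$ (Eq. (2) averages rather than sums over them), whereas for the text-based loss they are $k$ draws from $P_n(w)$.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_pair_loss(v_t: np.ndarray, u_c: np.ndarray, u_negs: np.ndarray) -> float:
    # Per-pair negative-sampling term shared by Eqs. (2) and (5), with the sign
    # flipped so that it is a loss to minimize:
    #   -log sigma(u_c . v_t) - sum_i log sigma(-u_i . v_t)
    # v_t: input vector of the target word; u_c: output vector of the positive
    # context; u_negs: matrix whose rows are output vectors of negative contexts.
    loss = -np.log(sigmoid(u_c @ v_t))
    loss -= np.sum(np.log(sigmoid(-(u_negs @ v_t))))
    return float(loss)

def noise_distribution(unigram_counts: np.ndarray, power: float = 0.75) -> np.ndarray:
    # P_n(w): smoothed unigram distribution; the 3/4 power is the common choice
    # from Mikolov et al. (2013), assumed here rather than specified in the text.
    probs = unigram_counts.astype(float) ** power
    return probs / probs.sum()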