3.3 Model Training

We adopt stochastic gradient descent to train our models. Specifically, for the hybrid model (KEWEh), we adopt the mini-batch mode used in (Yang et al., 2016). Firstly, we sample a batch of random walk paths of size N1 and take a gradient step to optimize the loss function Lk. Secondly, we sample a batch of text sentences of size N2 and take a gradient step to optimize the loss function Lt.
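The alternating mini-batch procedure can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the gradient functions are placeholders for the real skip-gram-style gradients of Lk and Lt, and the batch sizes, learning rate, and iteration count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy embedding matrix shared by both objectives (vocabulary 100, dimension 16).
W = rng.normal(scale=0.1, size=(100, 16))

def grad_Lk(W, path_batch):
    """Placeholder gradient of the knowledge loss Lk over a batch of walk paths."""
    g = np.zeros_like(W)
    for path in path_batch:
        g[path] += W[path]  # stand-in for the real gradient on the words in the path
    return g

def grad_Lt(W, sent_batch):
    """Placeholder gradient of the text loss Lt over a batch of sentences."""
    g = np.zeros_like(W)
    for sent in sent_batch:
        g[sent] += W[sent]
    return g

N1, N2, lr, iters = 8, 4, 0.01, 50  # the ratio N1/(N1+N2) approximates lambda
for _ in range(iters):
    # Step 1: sample N1 random-walk paths, gradient step on Lk.
    paths = [rng.integers(0, 100, size=10) for _ in range(N1)]
    W -= lr * grad_Lk(W, paths)
    # Step 2: sample N2 text sentences, gradient step on Lt.
    sents = [rng.integers(0, 100, size=10) for _ in range(N2)]
    W -= lr * grad_Lt(W, sents)
```

For the knowledge-only model, the second step (and with it the weighting factor) simply disappears from the loop.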
We repeat the above procedures until the model converges or the predefined number of iterations is reached. The ratio N1/(N1+N2) is used to approximate the weighting factor λ in Eq. 3. For training the knowledge-only model, the process is the same without Lt and λ.

3.4 Readability Assessment

We apply KEWE for document-level readability assessment under the supervised learning framework proposed by Schwarm and Ostendorf (2005). The classifier for readability assessment is built on documents with annotated reading levels. Instead of using hand-crafted features, we use the word embeddings to produce the features of documents.

To extract high-level features from documents using word embeddings, we design a max layer similar to the one used in the convolutional sentence approach of Collobert et al. (2011). The max layer generates a fixed-size feature vector from variable-length sequences. Specifically, given a document represented by a matrix M ∈ R^{m×n}, where the kth column is the word embedding of the kth word in the document, the max layer outputs a vector fmax(M):

[fmax(M)]_i = max_t [M]_{i,t}, 1 ≤ i ≤ m (6)

where t ∈ {1, 2, . . . , n} represents the "time" (position) in the sequence, and m is the dimension of the embedding vectors. Besides the max layer, min and average layers are also used to extract features. By concatenating all three feature vectors, we get the final feature vector f(M) of the document M as follows, which can be fed to the classifier for readability assessment:

f(M) = [fmax(M), fmin(M), favg(M)] (7)

4 Experiments

In this section, we conduct experiments based on four datasets in two languages, to investigate the following two research questions:

RQ1: Whether KEWE is effective for readability assessment, compared with other well-known readability assessment methods and other word embedding models?
RQ2: What are the effects of the quality of the input (i.e., the quality of the knowledge base and text corpus) and the hybrid ratio (i.e., the weighting factor λ) on the prediction performance of KEWE?

4.1 The Datasets and Knowledge Base

The experiments are conducted on four datasets: two English datasets, ENCT and EHCT, and two Chinese datasets, CPT and CPC. ENCT, CPT (Jiang et al., 2015), and EHCT are extracted from textbooks^1, where the documents have already been leveled into grades; CPC is extracted from compositions^2 written by Chinese primary school students, where the documents are leveled by the six grades of their authors. EHCT is collected from the English textbooks of Chinese high schools and colleges, and contains 4 levels corresponding to the 3 grades of high school plus undergraduates. The details of the four datasets are shown in Table 1.

Since the experiments are conducted on two languages, we collect knowledge bases for both English and Chinese, which are used for extracting domain knowledge and constructing the knowledge graphs, as described in Section 3.1. The details are shown in Table 2.

1 http://www.dzkbw.com
2 http://www.eduxiao.com
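The pooling-based feature extraction of Section 3.4 (Eqs. 6 and 7) can be sketched as below. This is a minimal sketch: the embedding matrix here is random rather than a trained KEWE embedding, and the function name is our own.

```python
import numpy as np

def document_features(M):
    """Concatenate max, min, and average pooling over the time axis.

    M has shape (m, n): column t holds the m-dimensional embedding of the
    t-th word, so pooling over axis 1 yields a fixed-size 3m-dimensional
    feature vector regardless of the document length n.
    """
    f_max = M.max(axis=1)   # Eq. 6: elementwise max over positions t
    f_min = M.min(axis=1)
    f_avg = M.mean(axis=1)
    return np.concatenate([f_max, f_min, f_avg])  # Eq. 7

rng = np.random.default_rng(0)
short_doc = rng.normal(size=(50, 12))   # 12-word document, 50-dim embeddings
long_doc = rng.normal(size=(50, 300))   # 300-word document, same embedding size
# Both documents map to the same fixed-size feature vector (length 3 * 50).
assert document_features(short_doc).shape == document_features(long_doc).shape == (150,)
```

The resulting fixed-size vectors can then be fed to any off-the-shelf classifier for the supervised reading-level prediction.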