Learning from Graph Propagation via Ordinal Distillation for One-Shot Automated Essay Scoring

Zhiwei Jiang∗†
jzw@nju.edu.cn
State Key Laboratory for Novel Software Technology, Nanjing University
Nanjing, China

Meng Liu∗
mf1933061@smail.nju.edu.cn
State Key Laboratory for Novel Software Technology, Nanjing University
Nanjing, China

Yafeng Yin
yafeng@nju.edu.cn
State Key Laboratory for Novel Software Technology, Nanjing University
Nanjing, China

Hua Yu
huayu.yh@smail.nju.edu.cn
State Key Laboratory for Novel Software Technology, Nanjing University
Nanjing, China

Zifeng Cheng
chengzf@smail.nju.edu.cn
State Key Laboratory for Novel Software Technology, Nanjing University
Nanjing, China

Qing Gu
guq@nju.edu.cn
State Key Laboratory for Novel Software Technology, Nanjing University
Nanjing, China

ABSTRACT
One-shot automated essay scoring (AES) aims to assign scores to a set of essays written in response to a specific prompt, given only one manually scored essay per distinct score. Compared with the previously studied prompt-specific AES, which usually requires a large number of manually scored essays for model training (e.g., about 600 manually scored essays out of a total of 1000), one-shot AES can greatly reduce the workload of manual scoring. In this paper, we propose a Transductive Graph-based Ordinal Distillation (TGOD) framework to tackle the task of one-shot AES. Specifically, we design a transductive graph-based model as a teacher model to generate pseudo labels for unlabeled essays based on the one-shot labeled essays. Then, we distill the knowledge in the teacher model into a neural student model by learning from the high-confidence pseudo labels.
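The abstract only outlines how the teacher produces pseudo labels. As a hedged illustration of what transductive graph-based pseudo labeling can look like, the sketch below swaps in classic label spreading over a Gaussian-kernel similarity graph, seeded with one scored essay per score class, and then keeps only high-confidence predictions. This is a generic technique chosen for illustration, not TGOD's actual teacher model; the toy essay feature vectors, `sigma`, `alpha`, the 0.6 confidence threshold, and the function names `rbf_affinity`, `label_spreading`, and `select_confident` are all hypothetical.

```python
import math


def rbf_affinity(points, sigma=1.0):
    """Dense Gaussian-kernel affinity matrix with a zero diagonal."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return w


def label_spreading(points, seeds, num_classes, alpha=0.9, sigma=1.0, iters=100):
    """Iterate F <- alpha * S @ F + (1 - alpha) * Y on a similarity graph.

    `seeds` maps a node index to its score class (one scored essay per score).
    Returns one row-normalized score distribution per node.
    """
    n = len(points)
    w = rbf_affinity(points, sigma)
    deg = [sum(row) for row in w]
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}; isolated nodes get zero rows.
    s = [[w[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0
          for j in range(n)] for i in range(n)]
    # One-hot seed matrix Y; unlabeled rows stay all-zero.
    y = [[0.0] * num_classes for _ in range(n)]
    for node, label in seeds.items():
        y[node][label] = 1.0
    f = [row[:] for row in y]
    for _ in range(iters):
        # The comprehension reads the previous f and only then rebinds the name.
        f = [[alpha * sum(s[i][k] * f[k][c] for k in range(n)) + (1 - alpha) * y[i][c]
              for c in range(num_classes)] for i in range(n)]
    probs = []
    for row in f:
        total = sum(row)
        probs.append([v / total for v in row] if total > 0 else [1.0 / num_classes] * num_classes)
    return probs


def select_confident(probs, seeds, threshold=0.6):
    """Pseudo-label unlabeled nodes whose top score probability reaches the threshold."""
    picked = {}
    for i, row in enumerate(probs):
        if i in seeds:
            continue  # seed essays already carry a manual score
        best = max(range(len(row)), key=row.__getitem__)
        if row[best] >= threshold:
            picked[i] = best
    return picked


# Toy 1-D "essay features": three clusters, one seed essay per score class.
points = [[0.0], [0.5], [1.0], [5.0], [4.5], [5.5], [10.0], [9.5], [10.5]]
seeds = {0: 0, 3: 1, 6: 2}
probs = label_spreading(points, seeds, num_classes=3)
print(select_confident(probs, seeds))  # {1: 0, 2: 0, 4: 1, 5: 1, 7: 2, 8: 2}
```

Because the seed term (1 - alpha) * Y is re-added at every iteration, each manually scored essay keeps anchoring its score while label mass diffuses along strong similarity edges. Thresholding the propagated distribution is only one simple reading of "high-confidence pseudo labels"; the paper's actual teacher model and selection rule may differ.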
Unlike general knowledge distillation, we propose an ordinal-aware unimodal distillation that imposes a unimodal distribution constraint on the output of the student model, so as to tolerate minor errors in the pseudo labels. Experimental results on the public ASAP dataset show that TGOD improves the performance of existing neural AES models under the one-shot AES setting and achieves an acceptable average quadratic weighted kappa (QWK) of 0.69.

CCS CONCEPTS
• Computing methodologies → Natural language processing; • Information systems → Clustering and classification.

∗Both authors contributed equally to this research.
†Corresponding author.

This paper is published under the Creative Commons Attribution 4.0 International (CC-BY 4.0) license. Authors reserve their rights to disseminate the work on their personal and corporate Web sites with the appropriate attribution.
WWW ’21, April 19–23, 2021, Ljubljana, Slovenia
© 2021 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC-BY 4.0 License.
ACM ISBN 978-1-4503-8312-7/21/04.
https://doi.org/10.1145/3442381.3450017

KEYWORDS
Essay Scoring, One-Shot, Graph Propagation, Ordinal Distillation

ACM Reference Format:
Zhiwei Jiang, Meng Liu, Yafeng Yin, Hua Yu, Zifeng Cheng, and Qing Gu. 2021. Learning from Graph Propagation via Ordinal Distillation for One-Shot Automated Essay Scoring. In Proceedings of the Web Conference 2021 (WWW ’21), April 19–23, 2021, Ljubljana, Slovenia. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3442381.3450017

1 INTRODUCTION
Automated Essay Scoring (AES) aims to summarize the quality of a student essay with a score or grade based on factors such as grammaticality, organization, and coherence. Automating the scoring of millions of essays is commercially valuable; in fact, AES has been developed and deployed in large-scale standardized tests such as TOEFL, GMAT, and GRE [2].
Beyond essays, AES, as a technique for evaluating text quality, can also be conveniently applied to assess various Web texts (e.g., news, responses, and posts).

Research on automated essay scoring has spanned the last 50 years [25] and continues to draw considerable attention in the natural language processing community [17]. Traditional AES methods mainly rely on various handcrafted features and score essays with regression methods [2, 19, 26, 32, 48]. More recently, with the development of deep learning, many models based on LSTMs and CNNs have been proposed [7, 8, 10, 39, 41]. These models can learn essay features automatically and achieve better performance than traditional methods.

However, training an effective neural AES model often requires a large number of manually scored essays (e.g., about 600 manually scored essays out of a total of 1000 in a test), which is labor-intensive. This limits the application of such models in some real-world scenarios. To address this, some recent work leverages scored essays from other prompts (i.e., the topics on which essays are written) to alleviate the burden of manual scoring for the target prompt. But due to the differences among prompts, such as genre, score range,