IJCSNS International Journal of Computer Science and Network Security, VOL.6 No.5A, May 2006

Paper Classification for Recommendation on Research Support System Papits

Tadachika Ozono and Toramatsu Shintani
Computer Science and Engineering, Graduate School of Engineering, Nagoya Institute of Technology, Gokiso-cho, Showa-ku, Nagoya 466-8555 JAPAN

Summary
We have developed a research support system, called Papits, that shares research information, such as PDF files of research papers, among computers on a network and classifies the information by research field. Users of Papits can share various research information and survey the corpora of their particular fields of research. To realize Papits, we need to design a mechanism for identifying which words are best suited to classifying documents into predefined classes. Further, we must consider cases where documents have to be classified into multivalued fields and where there is insufficient data for classification. In this paper, we present an implementation method of automatic classification for Papits based on a text classification technique.
We also propose a new feature selection method for classifying documents, represented as bags-of-words, into a multivalued category. Our method transforms the multivalued category into a binary category so that the characteristic words needed to classify a category can be identified easily from only a small amount of training data. Our experimental results indicate that our method can effectively classify documents in Papits.

Key words: Knowledge Management, Recommendation, Text Categorization, Feature Selection.

1. Introduction
We have developed a research support system called Papits [2][8]. Papits has several functions for managing research information, i.e., a paper sharing function, a paper classifier, a paper recommender, a paper retriever, and a research diary. The paper sharing function facilitates sharing research information, such as PDF files of research papers, and collecting papers from Web sites. The automatic classification function classifies research information into several research fields, which enables users to search for papers by the categories that interest them. Automatic classification in Papits is structured so that its accuracy gradually improves through feedback from users. In this paper, we mainly discuss paper classification.

In automatic text classification, one of the main problems is how to identify which words are best suited to classifying documents into predefined classes. Feature selection techniques are therefore needed to identify these words, and one such technique uses the information gain (IG) metric [9] assessed over the set of all words encountered in all texts [6][11]. Soucy [11] proposed a feature selection method based on IG and a new algorithm that selects features according to their average co-occurrence. It yielded good results on binary class problems. Automatic classification in Papits, however, must classify documents into a multivalued category, since research is organized into several fields.
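As an illustration of the information gain criterion mentioned above, the following is a minimal sketch, not the implementation used in Papits. It assumes each document is reduced to the set of words it contains and that each document carries a single category label; the function names `entropy`, `information_gain`, and `select_features`, as well as the toy corpus, are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, word):
    """IG of the feature 'word occurs in the document' with respect to labels.

    docs   -- list of sets of words (presence/absence bag-of-words)
    labels -- list of category labels, one per document
    """
    n = len(docs)
    with_word = [lab for d, lab in zip(docs, labels) if word in d]
    without_word = [lab for d, lab in zip(docs, labels) if word not in d]
    h_before = entropy(Counter(labels).values())
    # Weighted entropy of the labels after splitting on the word.
    h_after = sum(
        (len(part) / n) * entropy(Counter(part).values())
        for part in (with_word, without_word)
        if part
    )
    return h_before - h_after


def select_features(docs, labels, k):
    """Return the k words with the highest IG (ties broken alphabetically)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda w: (-information_gain(docs, labels, w), w))
    return ranked[:k]


# Toy binary-category corpus (hypothetical data, for illustration only).
docs = [
    {"agent", "auction"},
    {"agent", "negotiation"},
    {"kernel", "svm"},
    {"svm", "margin"},
]
labels = [True, True, False, False]
top_words = select_features(docs, labels, 2)
```

In this toy corpus, words such as "agent" and "svm" split the two categories perfectly and therefore receive the highest score, while a word that appears in a single document separates the categories only partially and scores lower.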
Since there are many research fields, it is hard to collect enough training data for each of them, and when the number of training documents in a category is small, feature selection becomes sensitive to noise and irrelevant data. This paper proposes a feature selection method for classifying documents, represented as bags-of-words, into a multivalued category. It transforms the multivalued category into a binary category, and features are then selected using IG.

The remainder of this paper is organized as follows. First, we outline our Papits research support system. Second, we describe the classification method and propose the feature selection algorithm for managing research papers. Third, we discuss the experimental results obtained with our algorithm and demonstrate its usefulness. Fourth, we discuss the functions of Papits. Fifth, we compare our work with related work. Finally, we conclude with a brief summary and discuss future research directions.

2. Research Support System Papits
This section presents an outline of Papits, a research support system implemented as a web application using WebObjects, a tool developed by Apple for creating web applications. Users can access Papits via a web browser. Papits has several functions for managing research information, i.e., paper sharing, a paper classifier, a paper recommender, a paper retriever, and a research diary. The knowledge management of Papits supports surveys through these functions. This paper mainly discusses the paper classifier function, which can provide intensive support to surveys on fields of research