Based on document usage mining, Scienstein recommends you the following papers:

Papers similar to the last papers you have read:
The delicate topic of the impact factor
Why the impact factor of journals should not be used for evaluating research
Impact Factor: Good Reasons for Concern
more...
Papers recently published by authors you have read:
Self-citations, co-authorships and keywords - A new approach to scientists' field mobility
Profiling citation impact - A new methodology
more...

Title | Author | Year | Source | Ratings | Abstract | Update
M. Szklo (2008), Epidemiology, vol. 19, no. 3

Figure 2: Similar paper recommendation

II. RELATED WORK

The usefulness of a research paper recommender system depends to a large extent on its ability to automatically determine related work to one or more documents. Various approaches exist to determine the degree of similarity of documents in order to identify related work. Whereas text-mining approaches are used in cases in which references are not stated, citation analysis approaches usually deliver superior results because, for example, synonyms and unclear nomenclature do not lead to misleading results [3, 4, 5]. Many citation analysis approaches exist, and they all have their own strengths and weaknesses for identifying similar documents. Among the most widely used are the easily applicable "cited by" approach, which treats papers that cite the same input document as relevant, and the "reference list" approach, which treats papers that were referenced by the input document as relevant. The best results can usually be obtained by bibliographic coupling and co-citation analysis, which allow calculating the coupling strength [6]. These approaches, which were already invented in the 60s and 70s, are used by scientists and on academic search engine websites like CiteSeer¹ [9].

¹ http://citeseer.ist.psu.edu

[Diagram: Doc A and Doc B (citing) each cite Doc C, Doc D and Doc E.]
Figure 3: Bibliographic coupling

Documents are bibliographically coupled if they cite one or more documents in common. Figure 3 illustrates this approach: Papers A and B are related because they both cite papers C, D and E. In contrast, two documents are "co-cited" when at least one paper cites both.
This approach is illustrated in Figure 4: Papers A and B are related because they are both cited by papers C, D and E. The more co-citations two papers receive, the more related they are [6].

[Diagram: Doc A and Doc B (cited) are each cited by Doc C, Doc D and Doc E.]
Figure 4: Co-citation analysis

Although both approaches are suitable for identifying similar papers, they serve different purposes. Whereas bibliographic coupling is retrospective, co-citation is essentially a forward-looking perspective [9]. However, both approaches often deliver unsatisfying results, since they only make use of the bibliography at the end of the document without analyzing the constellation of citations. Therefore, it is not possible to determine in which part of a related document the content of interest can be found.

III. CITATION PROXIMITY ANALYSIS AND CITATION ORDER ANALYSIS

Instead of just using the bibliography, Citation Proximity Analysis (CPA) uses the information derived from the proximity of the citations to each other in the full text to calculate the Citation Proximity Index (CPI) in three steps.

1. The document is parsed and a series of heuristics are used to process the citations, including their position within the document².
2. The citations are assigned to their corresponding items in the bibliography. The overall margin of error with the system we have developed equals nearly three percent for the first and second step.
3. In the third step the proximity among each citation-pair is examined. The underlying assumption is that the closer the citations are to each other, the more likely it is that they are

² The citations were parsed using a modified version of parsCit (http://wing.comp.nus.edu.sg/parsCit) in combination with exclusively developed software, which is available upon request from the authors.
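The two bibliography-based measures from Section II can be illustrated with a minimal sketch. It is not the authors' implementation: the toy citation graph, the function names `coupling_strength` and `cocitation_counts`, and the use of raw counts as the strength measure are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy citation graph: citing paper -> set of papers it cites.
# Mirrors Figure 3: A and B both cite C, D and E. F is an extra example paper.
references = {
    "A": {"C", "D", "E"},
    "B": {"C", "D", "E"},
    "F": {"C"},
}


def coupling_strength(refs, p, q):
    """Bibliographic coupling: number of references that p and q share."""
    return len(refs.get(p, set()) & refs.get(q, set()))


def cocitation_counts(refs):
    """Co-citation: for each unordered pair of cited papers,
    count how many citing papers reference both."""
    counts = Counter()
    for cited in refs.values():
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts


print(coupling_strength(references, "A", "B"))  # 3 (shared: C, D, E)
print(cocitation_counts(references)[("C", "D")])  # 2 (co-cited by A and B)
```

Both functions look only at reference lists, which is the limitation noted above: they show that two papers are related, but not where in the text the related content appears.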