《电子商务 E-business》阅读文献：A Two-Level Learning Hierarchy of Concept Based Keyword Extraction for Tag Recommendations

团购合买资源类别：文库，文档格式：PDF，文档页数：305，文件大小：4.92MB

Table of Contents Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 Relational Classification for Personalized Tag Recommendation . . . . . . . . . 7 Leandro Balby Marinho, Christine Preisach, and Lars SchmidtThieme Measuring Vertex Centrality in Co-occurrence Graphs for Online Social Tag Recommendation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 Iv´an Cantador, David Vallet, and Joemon M. Jose Social Tag Prediction Base on Supervised Ranking Model . . . . . . . . . . . . . . 35 Hao Cao, Maoqiang Xie, Lian Xue, Chunhua Liu, Fei Teng, and Yalou Huang Tag Recommendations Based on Tracking Social Bookmarking Systems . . 49 Szymon Chojnacki A Fast Effective Multi-Channeled Tag Recommender . . . . . . . . . . . . . . . . . . 59 Jonathan Gemmell, Maryam Ramezani, Thomas Schimoler, Laura Christiansen, and Bamshad Mobasher A modified and fast Perceptron learning rule and its use for Tag Recommendations in Social Bookmarking Systems . . . . . . . . . . . . . . . . . . . . 71 Anestis Gkanogiannis and Theodore Kalamboukis Discriminative Clustering for Content-Based Tag Recommendation in Social Bookmarking Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 Malik Tahir Hassan, Asim Karim, Suresh Manandhar, and James Cussens Time based Tag Recommendation using Direct and Extended Users Sets . 99 Tereza Iofciu and Gianluca Demartini A Weighting Scheme for Tag Recommendation in Social Bookmarking Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 Sanghun Ju and Kyu-Baek Hwang ARKTiS - A Fast Tag Recommender System Based On Heuristics. . . . . . . 119 Thomas Kleinbauer and Sebastian Germesin Tag Recommendation using Probabilistic Topic Models . . . . . . . . . . . . . . . . 131 Ralf Krestel and Peter Fankhauser

Preface Since 1999 the ECML PKDD embraces the tradition of organizing a Discovery Challenge, allowing researchers to develop and test algorithms for novel and eal world datasets. This year's Discovery Challenge presents a dataset from he field of social bookmarking to deal with the recommendation of tags. The results submitted by the challenge's participants are presented at an ECML PKDD workshop on September 7th, 2009, in Bled, Slovenia. The provided dataset has been created using data of the social bookmark and publication sharing system BibSonomy, 2 operated by the organizers of the allenge. The training data was released on March 25th 2009, the test data or July 6th. The participants had time until July &th to submit their results. This gave researchers 14 weeks time to tune their algorithms on a snapshot of a real world folksonomy dataset and 48 hours to compute results on the test data To support the user during the tagging process and to facilitate the tagging, BibSonomy includes a tag recommender. When a user finds an interesting web page(or publication) and posts it to BibSonomy, the system offers up to five recommended tags on the posting page. The goal of the challenge is to learn a model which effectively predicts the keywords a user has in mind when describing a web page (or publication). We divided the problem into three tasks, each of which focusing on a certain aspect. All three tasks get the same dataset fo training. It is a snapshot of Bibsonomy until December 31st 2008. The dataset leaned and consists of two parts, the core part and the complete snapshot The test dataset is different for each task Task 1: Content-Based Tag Recommendations. The test data for this task con- tains posts, whose user, resource or tags are not contained in the post-core at level 2 of the training data. Thus, methods which can't produce tag recommen- dations for new resources or are unable to suggest new tags very probably wor produce good results here Task 2: Graph-Based Recommendations. This task is especially intended fc methods relying on the graph structure of the training data only. The user. resource, and tags of each post in the test data are all contained in the training datas post-core at level 2 Task 3: Online Tag Recommendations. This is a bonus task which will take place after Tasks 1 and 2. The participants shall implement a recommendation servic which can be called via Http by Bibsonomy'S recommender infrastructure when a user posts a bookmark or publication. All participating recommenders are called on each posting process, one of them is chosen to actually deliver the results to the user. We can then measure the performance of the recommenders in an online setting, where timeouts are important and where we can measure which tags the user has clicked on http://www.kde.cs.uni-kassel.de/ws/dc09 http://www.bibsonomy.org

Preface Since 1999 the ECML PKDD embraces the tradition of organizing a Discovery Challenge, allowing researchers to develop and test algorithms for novel and real world datasets. This year’s Discovery Challenge1 presents a dataset from the field of social bookmarking to deal with the recommendation of tags. The results submitted by the challenge’s participants are presented at an ECML PKDD workshop on September 7th, 2009, in Bled, Slovenia. The provided dataset has been created using data of the social bookmark and publication sharing system BibSonomy,2 operated by the organizers of the challenge. The training data was released on March 25th 2009, the test data on July 6th. The participants had time until July 8th to submit their results. This gave researchers 14 weeks time to tune their algorithms on a snapshot of a real world folksonomy dataset and 48 hours to compute results on the test data. To support the user during the tagging process and to facilitate the tagging, BibSonomy includes a tag recommender. When a user finds an interesting web page (or publication) and posts it to BibSonomy, the system offers up to five recommended tags on the posting page. The goal of the challenge is to learn a model which effectively predicts the keywords a user has in mind when describing a web page (or publication). We divided the problem into three tasks, each of which focusing on a certain aspect. All three tasks get the same dataset for training. It is a snapshot of BibSonomy until December 31st 2008. The dataset is cleaned and consists of two parts, the core part and the complete snapshot. The test dataset is different for each task. Task 1: Content-Based Tag Recommendations. The test data for this task contains posts, whose user, resource or tags are not contained in the post-core at level 2 of the training data. Thus, methods which can’t produce tag recommendations for new resources or are unable to suggest new tags very probably won’t produce good results here. Task 2: Graph-Based Recommendations. This task is especially intended for methods relying on the graph structure of the training data only. The user, resource, and tags of each post in the test data are all contained in the training data’s post-core at level 2. Task 3: Online Tag Recommendations. This is a bonus task which will take place after Tasks 1 and 2. The participants shall implement a recommendation service which can be called via HTTP by BibSonomy’s recommender infrastructure when a user posts a bookmark or publication. All participating recommenders are called on each posting process, one of them is chosen to actually deliver the results to the user. We can then measure the performance of the recommenders in an online setting, where timeouts are important and where we can measure which tags the user has clicked on. 1 http://www.kde.cs.uni-kassel.de/ws/dc09/ 2 http://www.bibsonomy.org 5

Results. More than 150 participants registered for the mailing list which enabled them to download the dataset. At the end we received 42 submissions -21 fo each of the Tasks 1 2. Additionally, 24 participants submitted a paper that can be found in the proceedings at hand We used the F1-Measure common in Information retrieval to evaluate the submitted recommendations. Therefore, we first computed for each post in the test data precision and recall by comparing the first five recommended tags against the tags the user has originally assigned to this post. Then we averaged precision and recall over all posts in the test data and used the resulting precision and recall to compute the F1-Measure as flm= =pecision recan The winning team of Task l has an flm of 0.18740, the second and third follow with 0. 18001 and 0.17975. For Task 2. the winner achieved an flm of 0.35594 33185and0. conference and later on the website of the challenge Lipczak et al. from Dalhousie University, Halifax, Canada(cf page 157)are the winners of Task 1. with a method based on the combination of tags from the resource's title, tags assigned to the resource by other users and tags in the user's profile they reached an flm of 0. 18740 in Task I and additionally achieved the third place in Task 2 with an flm of 0. 32461. The system is composed of six recommenders and the basic idea is to augment the tags from the title by related and then rescore and merge the occurrence graphs and from the user's profile extracted from two tag The winners of Task 2, Rendle and Schmidt-Thieme from University of Hildesheim, Germany(cf page 235)achieved an flm of 0. 35594 with a statis- tical method based on factor models. Therefore, they factorize the folksonomy structure to find latent interactions between users, resources and tags. Using a variant of the stochastic gradient descent algorithm the authors optimize an adaptation of the Bayesian Personal Ranking criterion. Finally, they estimate how many tags to recommend to further improve precision The second of Task 1(Mrosek et al., page 189)harvests tags from sources ke Delicious, Google Scholar, and CiteULike. They also employ the full-text of web pages and PDFs. The third (Ju and Hwang, page 109) merges tags which have been earlier assigned to the resource or used by the user as well as resource descriptions by a weighting scheme. Finally, the second of Task 2 (Balby Marinho et al., page 7)uses relational classification methods in a semi-supervised learning scenario to recommend tags We thank all participants of the challenge for their contributions and the organizers of the ECML PKDD 2009 conference for their support. Furthermore, we want to thank our sponsors Nokia and Tagora for supporting the challenge by awarding prizes for the winners of each task. We are looking forward to a very exciting and interesting workshop Folke Eisterlehner. Andreas hotho. Robert jaschke http://www.nokia.com/ http://www.tagora-project.eu

Results. More than 150 participants registered for the mailing list which enabled them to download the dataset. At the end, we received 42 submissions – 21 for each of the Tasks 1 & 2. Additionally, 24 participants submitted a paper that can be found in the proceedings at hand. We used the F1-Measure common in Information Retrieval to evaluate the submitted recommendations. Therefore, we first computed for each post in the test data precision and recall by comparing the first five recommended tags against the tags the user has originally assigned to this post. Then we averaged precision and recall over all posts in the test data and used the resulting precision and recall to compute the F1-Measure as f1m = 2·precision · recall precision + recall . The winning team of Task 1 has an f1m of 0.18740, the second and third follow with 0.18001 and 0.17975. For Task 2, the winner achieved an f1m of 0.35594, followed by 0.33185 and 0.32461. The winner of Task 3 will be announced at the conference and later on the website of the challenge. Lipczak et al. from Dalhousie University, Halifax, Canada (cf. page 157) are the winners of Task 1. With a method based on the combination of tags from the resource’s title, tags assigned to the resource by other users and tags in the user’s profile they reached an f1m of 0.18740 in Task 1 and additionally achieved the third place in Task 2 with an f1m of 0.32461. The system is composed of six recommenders and the basic idea is to augment the tags from the title by related tags extracted from two tag-tag–co-occurrence graphs and from the user’s profile and then rescore and merge them. The winners of Task 2, Rendle and Schmidt-Thieme from University of Hildesheim, Germany (cf. page 235) achieved an f1m of 0.35594 with a statistical method based on factor models. Therefore, they factorize the folksonomy structure to find latent interactions between users, resources and tags. Using a variant of the stochastic gradient descent algorithm the authors optimize an adaptation of the Bayesian Personal Ranking criterion. Finally, they estimate how many tags to recommend to further improve precision. The second of Task 1 (Mrosek et al., page 189) harvests tags from sources like Delicious, Google Scholar, and CiteULike. They also employ the full-text of web pages and PDFs. The third (Ju and Hwang, page 109) merges tags which have been earlier assigned to the resource or used by the user as well as resource descriptions by a weighting scheme. Finally, the second of Task 2 (Balby Marinho et al., page 7) uses relational classification methods in a semi-supervised learning scenario to recommend tags. We thank all participants of the challenge for their contributions and the organizers of the ECML PKDD 2009 conference for their support. Furthermore, we want to thank our sponsors Nokia3 and Tagora4 for supporting the challenge by awarding prizes for the winners of each task. We are looking forward to a very exciting and interesting workshop. Kassel, August 2009 Folke Eisterlehner, Andreas Hotho, Robert J¨aschke 3 http://www.nokia.com/ 4 http://www.tagora-project.eu/ 6

Relational Classification for Personalized Ta Recommendation eandro Balby Marinho, Christine Preisach, and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (IsmLL) Samelsonplatz 1. University of Hildesheim, D-31141 Hildesheim, Germany [marinho, preisach, schmidt-thieme Gismll uni-hildesheimde ht Abstract. Folksonomy data is relational by nature, and therefore methods that directly exploit these relations are prominent for the tag recommendation prob lem. Relational methods have been successfully applied to areas in which en- tities are linked in an explicit manner, like hypertext documents and scientific publications. For approaching the graph-based tag recommendation task of the ECML PKDD Discovery Challenge 2009, we propose to turn the folksonomy raph into a homogeneous post graph and use relational classification techniques for predicting tags. Our approach features adherence to multiple kinds of rela- 1 Introduction One might want tag recommendations for several reasons, as for example: simplifying the tagging process for the user, exposing different facets of a resource and helping the tag vocabulary to converge. Given that users are free to tag, i.e., the same resource can be tagged differently by different people, it is important to personalize the recom- mended tags for an individual user. Tagging data forms a ternary relation between users, resources and tags, differently from typical recommender systems in which the relation is usually binary between users and resources. The best methods presented so far explore this ternary relation to com- pute tag predictions, either by means of tensor factorization [8]or PageRank 3], on the hypergraph induced by the ternary relational data. We, on the other hand, propose to explore the underlying relational graph between posts by means of relational classifica In this paper we describe our approaches for addressing the graph-based tag rec- ommendation task of the ECML PKDD Discovery Challenge 2009. We present two basic algorithms: PWA*(probabilistic weighted average ), an iterative relational clas- sification algorithm enhanced with relaxation labelling, and WA *(weighted average). an ite ods feature: adherence to multiple kinds of relations, training free, fast predictions, and semi-supervised classification Semi-supervised classification is particularly appealing because it allows us to evtl benefit from the information contained in the test dataset Furthermore, we propose to combine these methods through unweighted voting

Relational Classification for Personalized Tag Recommendation Leandro Balby Marinho, Christine Preisach, and Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Samelsonplatz 1, University of Hildesheim, D-31141 Hildesheim, Germany {marinho,preisach,schmidt-thieme}@ismll.uni-hildesheim.de http://www.ismll.uni-hildesheim.com/ Abstract. Folksonomy data is relational by nature, and therefore methods that directly exploit these relations are prominent for the tag recommendation problem. Relational methods have been successfully applied to areas in which entities are linked in an explicit manner, like hypertext documents and scientific publications. For approaching the graph-based tag recommendation task of the ECML PKDD Discovery Challenge 2009, we propose to turn the folksonomy graph into a homogeneous post graph and use relational classification techniques for predicting tags. Our approach features adherence to multiple kinds of relations, semi-supervised learning and fast predictions. 1 Introduction One might want tag recommendations for several reasons, as for example: simplifying the tagging process for the user, exposing different facets of a resource and helping the tag vocabulary to converge. Given that users are free to tag, i.e., the same resource can be tagged differently by different people, it is important to personalize the recommended tags for an individual user. Tagging data forms a ternary relation between users, resources and tags, differently from typical recommender systems in which the relation is usually binary between users and resources. The best methods presented so far explore this ternary relation to compute tag predictions, either by means of tensor factorization [8] or PageRank [3], on the hypergraph induced by the ternary relational data. We, on the other hand, propose to explore the underlying relational graph between posts by means of relational classification. In this paper we describe our approaches for addressing the graph-based tag recommendation task of the ECML PKDD Discovery Challenge 2009. We present two basic algorithms: PWA* (probabilistic weighted average), an iterative relational classification algorithm enhanced with relaxation labelling, and WA* (weighted average), an iterative relational classification method without relaxation labelling. These methods feature: adherence to multiple kinds of relations, training free, fast predictions, and semi-supervised classification. Semi-supervised classification is particularly appealing because it allows us to evtl. benefit from the information contained in the test dataset. Furthermore, we propose to combine these methods through unweighted voting. 7

4 The paper Is organized as follows. Section 2 presents the notation used throughout In Section 3 we show how we turned the folksonomy into a post relation graph. Section 4 introduces the individual classifiers and the ensemble technique we used In Section 5 we elaborate on the evaluation and experiments conducted for tuning the parameters of our models, and report the results obtained on the test dataset released for the challenge. The paper closes with conclusions and directions for future work 2 Notation Foksonomy data usually comprises a set of users U, a set of resources R, a set of tags T, and a set Y of ternary relations between them, i.e,YCUXRXT Let {(u,r)t∈T:(u,r,t) be the set of all unique user/resources combinations in the data, where each pair is called a post. For convenience, let T(r=(u, r)):=teTI(u,r,t)eY be the set of all tags assigned to a given post r E x. We consider train/test splits based on posts, i.e Xtrain, Xtest C X disjoint and covering all of X XtrainUXtest=X For training, the learner has access to the set Xtrain of training posts and their true gs T xmi. The tag recommendation task is then to predict, for a given E Xtest, a set T(a)sTof tags that are most likely to be used by the resp. user for the resp. resource 3 Relation Engineering We propose to represent folksonomy data as a homogeneous, undirected relational graph over the post set, i.e G: =(X, E)in which edges are annotated with a weight w:XxX-R denoting the strength of the relation. Besides making the input data nore compact-we have only a binary relation RcxxX between objects of the same ype-this representation will allow us to trivially cast the problem of personalized tag recommendations as a relational classification problem. Relational classifiers usually consider, additionally to the typical attribute-value data of objects, relational information. A scientific paper, for example, can be connected to another paper that has been written by the same author or because they share common citations. It has been shown in many classification problems that relational classifiers perform better than purely attribute-based classifiers [1, 4, 6] In our case, we assume that posts are related to each other if they share the same user: Ruser: =I(a, rE Xx Xluser(r)=user(r), the same resource: Rres Ia,TE Xx Xres(a)=res(r), or either share the same user or resource Ruser: Ruser U Rres(see Figure 1). For convenience, let user(=)and res(a) denote the user and resource of post r respectively. Thus, each post is connected to each other either in terms of other users that tagged the same resource, or the resources tagged by the same user. Weights are discussed in Section 4

The paper is organized as follows. Section 2 presents the notation used throughout the paper. In Section 3 we show how we turned the folksonomy into a post relational graph. Section 4 introduces the individual classifiers and the ensemble technique we used. In Section 5 we elaborate on the evaluation and experiments conducted for tuning the parameters of our models, and report the results obtained on the test dataset released for the challenge. The paper closes with conclusions and directions for future work. 2 Notation Foksonomy data usually comprises a set of users U, a set of resources R, a set of tags T, and a set Y of ternary relations between them, i.e., Y ⊆ U × R × T. Let X := {(u, r)| ∃t ∈ T : (u, r, t) ∈ Y } be the set of all unique user/resources combinations in the data, where each pair is called a post. For convenience, let T(x = (u, r)) := {t ∈ T | (u, r, t) ∈ Y } be the set of all tags assigned to a given post x ∈ X. We consider train/test splits based on posts, i.e., Xtrain, Xtest ⊂ X disjoint and covering all of X: Xtrain∪˙ Xtest = X For training, the learner has access to the set Xtrain of training posts and their true tags T|Xtrain . The tag recommendation task is then to predict, for a given x ∈ Xtest, a set Tˆ(x) ⊆ T of tags that are most likely to be used by the resp. user for the resp. resource. 3 Relation Engineering We propose to represent folksonomy data as a homogeneous, undirected relational graph over the post set, i.e., G := (X, E) in which edges are annotated with a weight w : X × X → R denoting the strength of the relation. Besides making the input data more compact – we have only a binary relation R ⊆ X×X between objects of the same type – this representation will allow us to trivially cast the problem of personalized tag recommendations as a relational classification problem. Relational classifiers usually consider, additionally to the typical attribute-value data of objects, relational information. A scientific paper, for example, can be connected to another paper that has been written by the same author or because they share common citations. It has been shown in many classification problems that relational classifiers perform better than purely attribute-based classifiers [1, 4, 6]. In our case, we assume that posts are related to each other if they share the same user: Ruser := {(x, x0 ) ∈ X × X | user(x) = user(x 0 )}, the same resource: Rres := {(x, x0 ) ∈ X × X|res(x) = res(x 0 )}, or either share the same user or resource: Rres user := Ruser ∪ Rres (see Figure 1). For convenience, let user(x) and res(x) denote the user and resource of post x respectively. Thus, each post is connected to each other either in terms of other users that tagged the same resource, or the resources tagged by the same user. Weights are discussed in Section 4. 8

given a. But this is not an optimal solution, the more similar posts are to each other th higher the weight of this edge should be Hence, a more suitable relational method for tag recommendation is the WeightedAverage(WA) which sums up all the weights of posts z'E N that share the same tag t E T and normalizes this by the sum over all weights in the neighborhood P(+|x) ∑r∈Nt∈r(x)(x,x Thus, WA does only consider neighbors that belong to the training set. A more sophisticated relational method that takes probabilities into account is the probabilistic weighted average(PWA), it calculates the probability of t given r by build- ing the weighted average of the tag probabilities of neighbor nodes T'E Na ∑x∈N,(x,x')P(tx P(t]r) Where P(tr)=l for I'E Xtrain, i.e., we are only exploiting nodes contained in the training set(see eq. (2). We will see in the next paragraph how one can exploit these probabilities in a more clever way. Both approaches have been introduced in [5] and applied to relational datasets Since we want to recommend more than one tag we need to cast the tag recommen- dation problem as a multilabel classification problem, i. e, assign one or more classes to a test node. We accomplish the multilabel problem by sorting the calculated probabili ties P(tr)for all r E Xtest and recommend the top n tags with highest probabilities The proposed relational methods could either be applied on Rrsr, 1. e, the union of the user and resource relation or on each relation Ruser, Rres individually. If applied on each relation the results could be combined by using ensemble techniques 4.1 Semi-Supervised learning As mentioned before, we would like additionally, to exploit unlabeled information con- tained in the graph and use the test nodes that have not been tagged yet, but are related to other nodes. This can be achieved by applying collective inference methods, being iterative procedures, which classify related nodes simultaneously and exploit relational autocorrelation and unlabeled data. relational autocorrelation is the correlation among a variable of an entity to the same variable(here the class)of a related entity, i.e,con- nected entities are likely to have the same classes assigned. Collective Classification is emi-supervised by nature, since one exploits the unlabeled part of the data. One of this semi-supervised methods is relaxation labeling [1], it can be formally expressed as: M(P(tI)22 We first initialize the unlabeled nodes with the prior probability calculated using the train set, then compute the probability of tag t given r iteratively using a relational clas- sification method M based on the neighborhood Nr in the inner loop. The procedure stops when the algorithm converges (i. e, the difference of the tag probability between

given x. But this is not an optimal solution, the more similar posts are to each other the higher the weight of this edge should be. Hence, a more suitable relational method for tag recommendation is the WeightedAverage (WA) which sums up all the weights of posts x 0 ∈ Nx that share the same tag t ∈ T and normalizes this by the sum over all weights in the neighborhood. P(t|x) = P x0∈Nx|t∈T(x0) w(x, x0 ) P x0∈Nx w(x, x0) (3) Thus, WA does only consider neighbors that belong to the training set. A more sophisticated relational method that takes probabilities into account is the probabilistic weighted average (PWA), it calculates the probability of t given x by building the weighted average of the tag probabilities of neighbor nodes x 0 ∈ Nx: P(t|x) = P x0∈Nx w(x, x0 )P(t|x 0 ) P x0∈Nx w(x, x0) (4) Where P(t|x 0 ) = 1 for x 0 ∈ Xtrain, i.e., we are only exploiting nodes contained in the training set (see eq. (2)). We will see in the next paragraph how one can exploit these probabilities in a more clever way. Both approaches have been introduced in [5] and applied to relational datasets. Since we want to recommend more than one tag we need to cast the tag recommendation problem as a multilabel classification problem, i.e., assign one or more classes to a test node. We accomplish the multilabel problem by sorting the calculated probabilities P(t|x) for all x ∈ Xtest and recommend the top n tags with highest probabilities. The proposed relational methods could either be applied on Rres user, i.e., the union of the user and resource relation or on each relation Ruser, Rres individually. If applied on each relation the results could be combined by using ensemble techniques. 4.1 Semi-Supervised Learning As mentioned before, we would like additionally, to exploit unlabeled information contained in the graph and use the test nodes that have not been tagged yet, but are related to other nodes. This can be achieved by applying collective inference methods, being iterative procedures, which classify related nodes simultaneously and exploit relational autocorrelation and unlabeled data. Relational autocorrelation is the correlation among a variable of an entity to the same variable (here the class) of a related entity, i.e., connected entities are likely to have the same classes assigned. Collective Classification is semi-supervised by nature, since one exploits the unlabeled part of the data. One of this semi-supervised methods is relaxation labeling [1], it can be formally expressed as: P(t|x) (i+1) = M(P(t|x 0 ) (i) x0∈Nx ) (5) We first initialize the unlabeled nodes with the prior probability calculated using the train set, then compute the probability of tag t given x iteratively using a relational classification method M based on the neighborhood Nx in the inner loop. The procedure stops when the algorithm converges (i.e., the difference of the tag probability between 10

点击下载完整版文档（PDF格式）

共305页，可试读40页，点击继续阅读 ↓↓

点击下载（PDF格式）

浏览记录