Training Details: Parameters were estimated by averaging samples from ten randomly-seeded runs, each running over 100 iterations, with an initial burn-in phase of 500 iterations for the TT model and 1500 iterations for the UTT model. We found this choice of burn-in iterations to be appropriate by observing a flattening of the log likelihood. Instead of estimating the hyperparameters α, β and γ, we fix them to 50/T, 0.001 and 1/M respectively in each of the experiments (M represents the total number of tags). The values were chosen according to [7]. We trained the topic models with predefined numbers of topics T=200, T=300 and T=400 to show that the performance is not very sensitive to this parameter as long as the number of topics is reasonably high. In addition, models with T=10, T=50 and T=100 were trained for the perplexity evaluation in Section IV-B2.

B. Results

1) Uncovering the Hidden Semantic Properties: Table II illustrates two different topics (out of 200) from the corpus. Note that the coupling between p(w|t) and p(tag|t) is a property of the models proposed here and originates from the sampling of a topic for a specific tag based on the topic assignments of the resource (see Section III). To show the descriptive power of our learned model, we chose two topics describing different aspects of CiteULike. Topic 18 is about the science of networks, while topic 84 reflects a topic about information retrieval. All 200 extracted topics from the TT and UTT model are available as supplementary data.

Information about p(u|t) in CiteULike provides interesting insights about the main research interests of users. The most likely users given the topics for a UTT model with T=200 are again provided as supplementary data.

Other interesting statistical relationships that can be extracted with the models derived here are, e.g., the statistical relationships between tag labels and topics, p(t|l). p(t|l) gives us a notion of the involvement of a tag in different topics. Table III shows one such example for the tag semantic: semantic is mostly discussed in the context of the traditional Semantic Web, but also in the context of bioinformatics and web services. The third topic discusses the tag in the classical information retrieval domain.

Table III
THREE MOST PROBABLE TOPICS FOR THE TAG SEMANTIC. UTT MODEL, T=200.

Topic                Topic description (word stems)
Topic 96  (p=0.85)   semant ontolog annot knowledg web commun support integr describ xml
Topic 184 (p=0.05)   servic web workflow bioinformat applic resourc integr manag standard interfac
Topic 84  (p=0.03)   search retriev inform relev ar rank feedback effect document
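The text does not spell out how the per-tag topic involvement p(t|l) in Table III is obtained; one natural reading is Bayes' rule applied to the learned tag-topic distribution, p(t|l) ∝ p(l|t) p(t). The following minimal sketch illustrates that reading only; the array names (gamma, topic_weights), the toy dimensions and the Dirichlet-sampled dummy parameters are our own assumptions, not the authors' code.

```python
import numpy as np

def topic_involvement(gamma, topic_weights, tag_index, top_n=3):
    """Rank topics for one tag label via Bayes' rule: p(t | tag) ∝ p(tag | t) * p(t).

    gamma:         (T, M) array, gamma[t, m] ≈ p(tag_m | topic_t), rows sum to 1
    topic_weights: (T,) array of overall topic proportions p(t)
    tag_index:     column index m of the tag of interest
    """
    joint = gamma[:, tag_index] * topic_weights        # unnormalized p(t, tag)
    p_topic_given_tag = joint / joint.sum()            # normalize over topics
    top_topics = np.argsort(p_topic_given_tag)[::-1][:top_n]
    return [(int(t), float(p_topic_given_tag[t])) for t in top_topics]

# Illustrative usage with random dummy parameters (T=200 topics, M=1000 tags).
rng = np.random.default_rng(0)
gamma = rng.dirichlet(np.full(1000, 0.001), size=200)
topic_weights = rng.dirichlet(np.full(200, 50 / 200))
print(topic_involvement(gamma, topic_weights, tag_index=42))
```

With the actual Γ estimated from CiteULike, the three highest-ranked topics for the tag semantic would correspond to the rows of Table III.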
2) Tag Perplexity: In addition to the qualitative evaluation of the TT and UTT model shown above, we measure the tag annotation quality in terms of perplexity. Intuitively speaking, perplexity measures the ability to predict tags on new, unseen documents. Perplexity, a quantitative measure for comparing language models, is widely used to compare the predictive performance of topic models (see e.g. [7]) and is defined over a test set as:

\mathrm{Perp}(l_{\mathrm{test}} \mid D_{\mathrm{train}}) = \exp\left(-\frac{\sum_{d=1}^{D_{\mathrm{test}}} \log p(l_d \mid D_{\mathrm{train}})}{\sum_{d=1}^{D_{\mathrm{test}}} L_d}\right),

where l_test are the tags in the test set, l_d represents the tags of a particular test resource, L_d is the number of held-out tags of that resource, and D_train represents the trained parameters, which differ depending on the model (see Section III). We partition the data set into disjoint training (90%) and test (10%) sets and select for each resource in the test set a subset of 50% of the tags for the evaluation. The remaining 50% of the tags are used by standard LDA to estimate Θ, since LDA is not able to model the dependency between the tokens in a resource and its tags. In contrast, the TT and UTT models estimate the resource-specific Θ directly, online via Gibbs sampling. Afterwards, the most likely tags are computed via Γ. All perplexity values were computed by averaging over ten different samples.

[Figure 2. Tag perplexity on the test set: perplexity (y-axis) plotted against the number of latent variables T (x-axis) for LDA, the TT model and the UTT model.]

Figure 2 plots the perplexity over the held-out tags under the maximum likelihood estimates of each model for different values of T. Note that a lower perplexity indicates a better annotation quality. We see that the two models that include the resource tokens in the computation of the likelihood clearly outperform the standard LDA model. As T increases, the UTT model achieves a better perplexity than the TT model (with a crossover point at T=100). With T=400 the perplexity of the TT model starts to increase slightly, while for the UTT model the perplexity remains constant.
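As a minimal sketch of the evaluation protocol described above, the code below computes tag perplexity from per-resource held-out tags, assuming a trained model exposes per-resource topic proportions Θ and per-topic tag distributions Γ, and that p(tag | resource) is obtained as their product. The function name, the toy dimensions and the random parameters are illustrative assumptions; the 90/10 resource split, the 50% tag hold-out and the averaging over ten samples are omitted here and would wrap this computation.

```python
import numpy as np

def tag_perplexity(theta, gamma, held_out_tags):
    """Perplexity over held-out tags, following the formula above.

    theta:         (D, T) per-resource topic proportions (rows sum to 1)
    gamma:         (T, M) per-topic tag distributions    (rows sum to 1)
    held_out_tags: list of length D; entry d holds the held-out tag indices
                   of test resource d
    """
    p_tag_given_resource = theta @ gamma        # (D, M): p(tag | resource d)
    log_lik, n_tags = 0.0, 0
    for d, tags in enumerate(held_out_tags):
        log_lik += np.sum(np.log(p_tag_given_resource[d, tags]))
        n_tags += len(tags)
    return float(np.exp(-log_lik / n_tags))     # lower is better

# Toy usage: D=5 test resources, T=10 topics, M=50 tags, random dummy parameters.
rng = np.random.default_rng(1)
theta = rng.dirichlet(np.ones(10), size=5)
gamma = rng.dirichlet(np.ones(50), size=10)
held_out = [list(rng.choice(50, size=3, replace=False)) for _ in range(5)]
print(tag_perplexity(theta, gamma, held_out))
```

In this setting, averaging over ten different Gibbs samples would simply mean averaging the perplexities obtained from ten independently estimated (Θ, Γ) pairs, as reported for Figure 2.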