《电子商务 E-business》阅读文献：On deriving tagsonomies Keyword relations coming from the crowd

团购合买资源类别：文库，文档格式：PDF，文档页数：12，文件大小：250.95KB

On Deriving Tagsonomies: Keyword Relations Coming from crowd Michal barla and maria bielikova Institute of Informatics and Software Engineering, Faculty of Informatics and Information Technologies, Slovak University of Technology Ilkovicova 3. 842 16 Bratislava, Slovakia I Name Surname fiit. stuba sk Abstract. Many keyword-based approaches to text classification, in formation retrieval or even user modeling for adaptive web-based system could benefit from knowledge on relations between various key words which gives further possibilities to compare them, evaluate their distance etc. This paper proposes an approach how to determine keyword rela- tions(mainly a parent-child relationship) by leveraging collective wisdom of the masses, present in data of collaborative(social)tagging systems on the Web. The feasibility of our approach is demonstrated on the data coming from the social bookmarking systems delicious and CiteULik 1 Introduction Phenomenon of the Social Web, with its roots in Web 2.0, is gaining a lot of attention all over the world in both research and practice. We are studying the power and wisdom of masses, when millions of people switched from the pas- sive reading of the content to the active participation in its creation. People are blogging, sharing wikis, connecting themselves in various social applications nd above all-tagging almost everything they get into touch: bookmarks, pho- tographs, videos, publications, blogposts, articles etc. People got used to classify items by ng few simple tags to it and to use those tags for a future retrieval of their favorite items. More, they are often expecting to find a new, yet unseen content by using their own tags. a part of the Web 2.0 success lies in an implicit agreement of masses on a shared(but never explicitly defined) vocabulary used to tag items -folksonomies. At the same time, vast amount of content requires efficient navigation sup- port, content reorganization or filtering -personalization and adaptation of the web. Its efficiency is dependent on adaptive systems ability to capture and maintain user model. A lot of research was devoted to finding the most suitable fexible or most generic and all-encompassing user model representation [1, 2 however, so far we are not aware of any explicit agreement on an ideal model The obvious question, when analyzing the success of Web 2.0, is whether an assignments of keywords(tags)to user instead of to pages (i.e, creation of

On Deriving Tagsonomies: Keyword Relations Coming from Crowd Michal Barla and M´aria Bielikov´a Institute of Informatics and Software Engineering, Faculty of Informatics and Information Technologies, Slovak University of Technology Ilkoviˇcova 3, 842 16 Bratislava, Slovakia {Name.Surname}@fiit.stuba.sk Abstract. Many keyword-based approaches to text classification, information retrieval or even user modeling for adaptive web-based system could benefit from knowledge on relations between various keywords, which gives further possibilities to compare them, evaluate their distance etc. This paper proposes an approach how to determine keyword relations (mainly a parent-child relationship) by leveraging collective wisdom of the masses, present in data of collaborative (social) tagging systems on the Web. The feasibility of our approach is demonstrated on the data coming from the social bookmarking systems delicious and CiteULike. 1 Introduction Phenomenon of the Social Web, with its roots in Web 2.0, is gaining a lot of attention all over the world in both research and practice. We are studying the power and wisdom of masses, when millions of people switched from the passive reading of the content to the active participation in its creation. People are blogging, sharing wikis, connecting themselves in various social applications and above all – tagging almost everything they get into touch: bookmarks, photographs, videos, publications, blogposts, articles etc. People got used to classify items by assigning few simple tags to it and to use those tags for a future retrieval of their favorite items. More, they are often expecting to find a new, yet unseen content by using their own tags. A part of the Web 2.0 success lies in an implicit agreement of masses on a shared (but never explicitly defined) vocabulary used to tag items – folksonomies. At the same time, vast amount of content requires efficient navigation support, content reorganization or filtering – personalization and adaptation of the web. Its efficiency is dependent on adaptive system’s ability to capture and maintain user model. A lot of research was devoted to finding the most suitable, flexible or most generic and all-encompassing user model representation [1, 2], however, so far we are not aware of any explicit agreement on an ideal model representation. The obvious question, when analyzing the success of Web 2.0, is whether an assignments of keywords (tags) to user instead of to pages (i.e., creation of

tag-based user model) could lead to simple, viable and efficient approach to user modeling for adaptive web-based system. The challenge is then to combine the user model with the models of communities coming from the emerging social web and create a solid platform for personalization based on both traditional(e. g as presented in 3), and social approaches, such as the one presented in [4 When considering tag-based user models in a(tag-based)Web 2.0 environ ment, we are facing the need to be able to compare various tags. We might need to compare user characteristics or even whole user models, represented by tags o find similarities between users, which could serve for model maintenance as well as for more complex tasks such as online community creation or recom- mending in a recommender system. We might also need to compare the domai items represented by tags (e. g, a web page)in order to evaluate the explicit implicit feedback 5 and update the user model appropriately In this paper, we propose a method for inferring various relationships be- tween tags, which allows for full-blown usage of the tag-based user model for personalization and adaptation on the Web The paper is structured as follows: In section 2 we explain our approach to finding relationships between tags. Section 3 presents data we acquired and results of experiments we performed. In section 4 we summarize the related works, which served as an inspiration for our algorithm for building hierarchies from folksonomies. Finally, we give conclusions 2 Finding Relationships between Tags Our approach to finding relationships between tags combines three distinct ap- 1. Deriving of parent-child relationships between tags from a given folksonomy 2. Determining similarity between tags by applying spreading activation on the top of the folksonomy graph 3. Interconnecting tags by additional semantic relationships as well as enriching the whole tag corpus by adding external keywords: both coming from the Wordnet lexical database 2.1 Building Hierarchies from Folksonomies lksonomy is defined as a hypergraph 6h: where the set of vertices V=AUTUI and AnT=0, AnI=0, TOI=O and the set of ternary edges E={(a,t,)|a∈A,t∈T,t∈}. A social tagging system can be represented by such a hypergraph with following definitions of the sets A, T and i (we will use them for the rest of the paper as well) Actors(users)A=a1,.,ak] Tags(keywords, concepts)T=ti, t) Items(objects, instances)I=i1,.,imI

tag-based user model) could lead to simple, viable and efficient approach to user modeling for adaptive web-based system. The challenge is then to combine the user model with the models of communities coming from the emerging social web and create a solid platform for personalization based on both traditional (e.g., as presented in [3]), and social approaches, such as the one presented in [4]. When considering tag-based user models in a (tag-based) Web 2.0 environment, we are facing the need to be able to compare various tags. We might need to compare user characteristics or even whole user models, represented by tags, to find similarities between users, which could serve for model maintenance as well as for more complex tasks such as online community creation or recommending in a recommender system. We might also need to compare the domain items represented by tags (e.g., a web page) in order to evaluate the explicit or implicit feedback [5] and update the user model appropriately. In this paper, we propose a method for inferring various relationships between tags, which allows for full-blown usage of the tag-based user model for personalization and adaptation on the Web. The paper is structured as follows: In section 2 we explain our approach to finding relationships between tags. Section 3 presents data we acquired and results of experiments we performed. In section 4 we summarize the related works, which served as an inspiration for our algorithm for building hierarchies from folksonomies. Finally, we give conclusions. 2 Finding Relationships between Tags Our approach to finding relationships between tags combines three distinct approaches: 1. Deriving of parent-child relationships between tags from a given folksonomy; 2. Determining similarity between tags by applying spreading activation on the top of the folksonomy graph; 3. Interconnecting tags by additional semantic relationships as well as enriching the whole tag corpus by adding external keywords; both coming from the Wordnet lexical database. 2.1 Building Hierarchies from Folksonomies Folksonomy is defined as a hypergraph [6] H := hV, Ei, where the set of vertices V = A ∪T ∪I and A ∩T = ∅, A ∩I = ∅, T ∩I = ∅ and the set of ternary edges E = {(a, t, i) | a ∈ A, t ∈ T, i ∈ I}. A social tagging system can be represented by such a hypergraph with following definitions of the sets A, T and I (we will use them for the rest of the paper as well): – Actors (users) A = {a1, ..., ak} – Tags (keywords, concepts) T = {t1, ..., tl} – Items (objects, instances) I = {i1, ..., im}

foremen ntioned hypergraph can be reduced into three bipartite graphs a graph holding the associations between actors and tags(AT), actors and items (AD) and tags and items(TD). For example, the AT valued bipartite graph defined as follows:AT:=(4A×T,Eat),Eat={(a,t)|i∈I:(a,t,i)∈E} Each such bipartite graph XY: =(XY, Exy) can be furthermore folded into two one-mode graphs Gx Ex), Gy =(Vy, Ew), where Vr =X V=Y, and Er={(a,b)|a,b∈X,y∈Y:(a,y)∈Exy,(b,y)∈Ery} Ey={(a,b)|a,b∈Y,丑x∈X:(x,a)∈Exy,(,b)∈Exy} We can furthermore define a weight of an edge eab E Ei in such a one-mode graph as u(eab):=|k|k∈K:(a,k)∈E1k,(b,k)∈Eik}. In other words, the weight w(ea. b) shows the number of times the a and b were linked together in an original bipartite graph By folding aforementioned bipartite graphs, we get six different one-mode graphs, each representing different semantic network encoded in the folksonomy. For example, by folding AT graph through tags, we get a social network of actors(users) based on overlapping sets of tags, where the links are bet people who have used the same tags with weights showing the number of tags they have used in common. Similarly, we can get a social network based on overlapping sets of items(two people are linked if they have tagged the same item, with weight showing the number of items they have tagged in common) In our work, we were primarily interested in folding TI graph through items giving us a semantic network of tags The proposed approach to creation of hierarchy from the folded folksonomy hypergraph as defined above is based on a rather simple assumptions of set theory In an ideal situation, the tag ta is a parent of tag tb if the set of entities (persons or items)classified under tb is a subset of the entities under ta In other words, all items classified under narrow tag also appear under the broader tag Moreover, since our goal was to produce a reusable hierarchy of hich could be mapped to users' interests and use this hierarchy as a basis for reasoning on those interests, we were not interested in tags, which were used only by a small amount of users, even if they were using it quite extensively. We wanted only what "crowd agrees upon"and were filtering-out tags not achieving a certain degree of popularity (i. e, it is not used by at least k% of all users), even if this decision reduced drastically the amount of tags in the resulting hierarchy The algorithm 1 shows the basic idea of our approach using a simple code. First, we create an artificial root of the hierarchy and put it in the set of already processed tags(ordered tags in the algorithm). Then, we process all tags from the folksonomy in the following manner 1. If the tag t does not reach the popularity threshold, we omit it immediately and n to the next one 2. Otherwise, we compute its overlap(intersect)with every tag from the ordered tags set, resulting in identifying the tag to with a maximum overlap

The aforementioned hypergraph can be reduced into three bipartite graphs: a graph holding the associations between actors and tags (AT), actors and items (AI) and tags and items (T I). For example, the AT valued bipartite graph is defined as follows: AT := hA × T, Eati, Eat = {(a, t) | ∃i ∈ I : (a, t, i) ∈ E}. Each such bipartite graph XY := hX × Y, Exyi can be furthermore folded into two one-mode graphs GX := hVx, Exi, GY := hVy, Eyi, where Vx = X, Vy = Y , and Ex = {(a, b) | a, b ∈ X, ∃y ∈ Y : (a, y) ∈ Exy, (b, y) ∈ Exy}, Ey = {(a, b) | a, b ∈ Y, ∃x ∈ X : (x, a) ∈ Exy, (x, b) ∈ Exy}. We can furthermore define a weight of an edge eab ∈ Ei in such a one-mode graph as w(eab) := |{k | k ∈ K : (a, k) ∈ Eik, (b, k) ∈ Eik}|. In other words, the weight w(ea,b) shows the number of times the a and b were linked together in an original bipartite graph. By folding aforementioned bipartite graphs, we get six different one-mode graphs, each representing different semantic network encoded in the folksonomy. For example, by folding AT graph through tags, we get a social network of actors (users) based on overlapping sets of tags, where the links are between people who have used the same tags with weights showing the number of tags they have used in common. Similarly, we can get a social network based on overlapping sets of items (two people are linked if they have tagged the same item, with weight showing the number of items they have tagged in common). In our work, we were primarily interested in folding T I graph through items, giving us a semantic network of tags. The proposed approach to creation of hierarchy from the folded folksonomy hypergraph as defined above is based on a rather simple assumptions of set theory: In an ideal situation, the tag ta is a parent of tag tb if the set of entities (persons or items) classified under tb is a subset of the entities under ta. In other words, all items classified under narrow tag also appear under the broader tag. Moreover, since our goal was to produce a reusable hierarchy of tags, which could be mapped to users’ interests and use this hierarchy as a basis for reasoning on those interests, we were not interested in tags, which were used only by a small amount of users, even if they were using it quite extensively. We wanted only what “crowd agrees upon” and were filtering-out tags not achieving a certain degree of popularity (i.e., it is not used by at least k% of all users), even if this decision reduced drastically the amount of tags in the resulting hierarchy. The algorithm 1 shows the basic idea of our approach using a simple pseudocode. First, we create an artificial root of the hierarchy and put it in the set of already processed tags (ordered tags in the algorithm). Then, we process all tags from the folksonomy in the following manner: 1. If the tag t does not reach the popularity threshold, we omit it immediately and pass on to the next one. 2. Otherwise, we compute its overlap (intersect) with every tag from the ordered tags set, resulting in identifying the tag to with a maximum overlap

3. The parent-child or sibling relationship is established if the ratio of maximum overlap to the overall use of the tag t reaches the pre-defined threshold. The roles of the tags in relationship is determined as follows: if the tag to is used significantly more often than t, we declare t to be a child of to and vice versa If the usage of both tags is more or less equal, we declare them as sibling Before the assignment of relationships, we check whether it will not "break the context in the hierarchy, i. e, we compute an average overlap of all ances- tors or children in the branch respectively. If the average overlap falls below the threshold, we create a duplicate of tag to, assign tag t to it appropriately and make a new branch of it. The creation of the duplicate aims at solving the homonymy problem, where the to tag has multiple meanings, depending on a context Spreading Activation Spreading activation is a method for associative retrieval [7 in associative and semantic networks. hence a network data structure consisting of nodes and links modeling relationships between nodes. Searching is done by activating the se- lected node with an activation energy and spreading this energy through the edges to its neighbors so that Energylneighbor l- Energylorigin] The process runs recursively until convergence. At the end, the nodes activation levels represent the re of similarity to the initially chosen node or nodes. Spreading activation search in the folksonomy finds new relationships, which are not "visible when considering only set theory assumptions and the algo- rithm 1. If these relations are added to the already existing parent-child rela- tionships, it allows us to make contextual "jumps"between the tags, which we believe could have an interesting impact for the tag- based user modeling and Since the folksonomy does not provide direct links between tags, the spread- ng activation is performed on either the bipartite graph TI (tags-items)or TA (tags-actors)or on a combination of the two, all depending on the definition what neighbors means for the algorithm (i. e, the spreading activation on the Ta graph is performed if the neighbors of a tag are all actors which used that tag). However, due to the vast amount of connections present in every social tagging system, which leads to enormous time and space complexity of the re- cursive process, we needed to select a sub-graph of the whole folksonomy. For instance, for the TI bipartite graph, we select the sub-graph by modifying the neighbors function of a tag/item so that it will return only popular items/tags where popularity of a tag or item is defined(following the same principles as in algorithm 1)as a ratio of actors which used that particular tag or tagge that particular item. When spreading through TA graph, we define an actor as popular if he or she used at least k% of popular tags and tagged at least l% of popular items

3. The parent-child or sibling relationship is established if the ratio of maximum overlap to the overall use of the tag t reaches the pre-defined threshold. The roles of the tags in relationship is determined as follows: if the tag to is used significantly more often than t, we declare t to be a child of to and vice versa. If the usage of both tags is more or less equal, we declare them as siblings. 4. Before the assignment of relationships, we check whether it will not “break” the context in the hierarchy, i.e., we compute an average overlap of all ancestors or children in the branch respectively. If the average overlap falls below the threshold, we create a duplicate of tag to, assign tag t to it appropriately and make a new branch of it. The creation of the duplicate aims at solving the homonymy problem, where the to tag has multiple meanings, depending on a context. 2.2 Finding Related Tags by Spreading Activation Spreading activation is a method for associative retrieval [7] in associative and semantic networks, hence a network data structure consisting of nodes and links modeling relationships between nodes. Searching is done by activating the selected node with an activation energy and spreading this energy through the edges to its neighbors so that Energy[neighbori ] = Energy[origin] |neighbors| The process runs recursively until convergence. At the end, the nodes’ activation levels represent the measure of similarity to the initially chosen node or nodes. Spreading activation search in the folksonomy finds new relationships, which are not “visible” when considering only set theory assumptions and the algorithm 1. If these relations are added to the already existing parent-child relationships, it allows us to make contextual “jumps” between the tags, which we believe could have an interesting impact for the tag-based user modeling and personalization. Since the folksonomy does not provide direct links between tags, the spreading activation is performed on either the bipartite graph T I (tags-items) or T A (tags-actors) or on a combination of the two, all depending on the definition what neighbors means for the algorithm (i.e., the spreading activation on the T A graph is performed if the neighbors of a tag are all actors which used that tag). However, due to the vast amount of connections present in every social tagging system, which leads to enormous time and space complexity of the recursive process, we needed to select a sub-graph of the whole folksonomy. For instance, for the T I bipartite graph, we select the sub-graph by modifying the neighbors function of a tag/item so that it will return only popular items/tags, where popularity of a tag or item is defined (following the same principles as in algorithm 1) as a ratio of actors which used that particular tag or tagged that particular item. When spreading through T A graph, we define an actor as popular if he or she used at least k% of popular tags and tagged at least l% of popular items

lexical relations between synsets, 1. e, one can browse for hyponyms and hyper nyms of a given word, other similar words or even antonyms The reason why we did not use Wordnet as our source of keywords and relationships at the first place comes from our plans to leverage our hierarchy for user modeling purposes. Tags acquired from social tagging systems are closer to the user than Wordnet, they are more"webbish" Even more, some relationships hich emerge from the social tagging systems could never come out of Wordnet for example, a relationship between words ie, png and bugs, pointing to th well-known problem of Internet Explorer's broken PNG support ). Relationshi cquired from social tagging systems are like ordinary people used them, not like linguistics have decided them to be However, we believe, that Wordnet can still significantly contribute to the quality of the tag-based models built upon the generated tag hierarchy. Informa- tion on semantic relationships between words can be used not only to identify particular subtrees, which should be merged (in case of synonymy) or divided (because of ambiguity), but also to add new words(along with their relation- ships)to the hierarchy, which would raise the probability that we will be able to map user's interest to our keywords 3 Experiment In order to determine the feasibility of the proposed approach to deriving rela- tionships between tags from folksonomies for the purposes of tag-based user mod eling and to determine the optimal setting of the algorithm, we performed several experiments with two different folksonomies. Our main concern was whether the algorithm creates cohesive groups of tags(subtrees)without significant flaws of 3.1 Data Wecollectedapartofdeliciousbookmarkingsite(http://delicious.com dataset by periodically polling their Rss feeds. First, we used a recent activity RSS feed to obtain a list of 128 448 unique user login names, next we used this information in user-scoped RSS feeds to obtain all tags and tagged pages for a given login name the second dataset, we took the anonymized folksonomy of users-tags- publications from the Citeulike(htTp: //citeulike. org), which is a system for tagging and searching for scholarly papers. Summarization of data we w able to acquire so far is listed in the Table 1 first thing we were interested in was whether these two folksonomies are used in a similar manner. The graph on Fig. 1 depicts a distribution of tags pages in delicious in a logarithmic scale. We can see that the distribution of tags hypernym-the generic term used to designate a whole class of specific instances. Y is a hypernym of X if X is a(kind of)y

lexical relations between synsets, i.e., one can browse for hyponyms and hypernyms1 of a given word, other similar words or even antonyms. The reason why we did not use Wordnet as our source of keywords and relationships at the first place comes from our plans to leverage our hierarchy for user modeling purposes. Tags acquired from social tagging systems are closer to the user than Wordnet, they are more “webbish”. Even more, some relationships which emerge from the social tagging systems could never come out of Wordnet (for example, a relationship between words ie, png and bugs, pointing to the well-known problem of Internet Explorer’s broken PNG support). Relationships acquired from social tagging systems are like ordinary people used them, not like linguistics have decided them to be. However, we believe, that Wordnet can still significantly contribute to the quality of the tag-based models built upon the generated tag hierarchy. Information on semantic relationships between words can be used not only to identify particular subtrees, which should be merged (in case of synonymy) or divided (because of ambiguity), but also to add new words (along with their relationships) to the hierarchy, which would raise the probability that we will be able to map user’s interest to our keywords. 3 Experiment In order to determine the feasibility of the proposed approach to deriving relationships between tags from folksonomies for the purposes of tag-based user modeling and to determine the optimal setting of the algorithm, we performed several experiments with two different folksonomies. Our main concern was whether the algorithm creates cohesive groups of tags (subtrees) without significant flaws of the context. 3.1 Data We collected a part of delicious bookmarking site (http://delicious.com) dataset by periodically polling their RSS feeds. First, we used a ’recent activity’ RSS feed to obtain a list of 128 448 unique user login names, next we used this information in user-scoped RSS feeds to obtain all tags and tagged pages for a given login name. As the second dataset, we took the anonymized folksonomy of users-tagspublications from the CiteULike (http://citeulike.org), which is a system for tagging and searching for scholarly papers. Summarization of data we were able to acquire so far is listed in the Table 1. First thing we were interested in was whether these two folksonomies are used in a similar manner. The graph on Fig. 1 depicts a distribution of tags on pages in delicious in a logarithmic scale. We can see that the distribution of tags 1 hypernym – the generic term used to designate a whole class of specific instances. Y is a hypernym of X if X is a (kind of) Y

Table 1. Overview of the acquired dataset #usernames: 128 448 44 215 #unique tags: 220 647 294 80 ems(pages, publications ) 962 3671 437245 fits more or less the power-law, with 421 840 pages tagged by one tag only and one page having 171 distinct tags2 Similar graph for the publications of the CiteULike dataset is shown on Fig. 2 Again, we see the power law distribution, with 625 658 publications tagged by one tag only and one publication tagged by 1 708 distinct tags We know already that power law distributions tend to arise in social systems where many people express their preferences among many options. Therefore, by observing the power law in both datasets, we were assured that datasets are valid difference hough users(which could be an issue especially for our delicious obtain e dataset)to perform a crowd-based analysis. However, we found an interesting popularity of tags. In our delicious dataset, we can find many tags which are shared among 5+% of all users whereas in the CiteULike dataset, a tag marked as"most active"on CiteULike website reaches popularity of 1 to 2 percents only. We continue in delicious feeds harvesting in order to determine, hether the overall popularity of tags marked as popular in our current dataset will decrease 3.2 Results We executed the algorithm 1(see section 2)several times on both folksonomies with different setups and were observing the resulting trees. The setups of the variables for different runs of the algorithm are summarized in the Table 2. The floating average overlap threshold from the table means that the actual threshold is computed"on-the-fly"as a fraction of current parent-child overlap The manual inspection of resulting hierarchies proved the viabilit proach, where meaningful relations between keywords were created uld definitely provide a good basis for a tag-based user modeling. The quality of the result depends highly on the configuration of the algorithm. The rest of this section is devoted to analysis of the impact of algorithms variables to the y We provide summarization of basic attributes of the produced hierarchies the Table 3, with examples on Fig 3 and 4(complete results can be seen ishttp:// the most tagged publication according to the CiteULike linkout database is surpris-

Table 1. Overview of the acquired dataset. delicious citeulike #usernames: 128 448 44 215 #records: 2 957 144 5 228 356 #processed users: 2 234 44 215 #unique tags: 220 647 294 806 #unique items(pages,publications): 962 367 1 437 245 fits more or less the power-law, with 421 840 pages tagged by one tag only and one page having 171 distinct tags2 . Similar graph for the publications of the CiteULike dataset is shown on Fig. 2. Again, we see the power law distribution, with 625 658 publications tagged by one tag only and one publication tagged by 1 708 distinct tags3 . We know already that power law distributions tend to arise in social systems where many people express their preferences among many options. Therefore, by observing the power law in both datasets, we were assured that datasets are valid and contain enough users (which could be an issue especially for our delicious dataset) to perform a crowd-based analysis. However, we found an interesting difference in popularity of tags. In our delicious dataset, we can find many tags which are shared among 5+% of all users whereas in the CiteULike dataset, a tag marked as “most active” on CiteULike website reaches popularity of 1 to 2 percents only. We continue in delicious feeds harvesting in order to determine, whether the overall popularity of tags marked as popular in our current dataset will decrease. 3.2 Results We executed the algorithm 1 (see section 2) several times on both folksonomies with different setups and were observing the resulting trees. The setups of the variables for different runs of the algorithm are summarized in the Table 2. The floating average overlap threshold from the table means that the actual threshold is computed “on-the-fly” as a fraction of current parent-child overlap. The manual inspection of resulting hierarchies proved the viability of our approach, where meaningful relations between keywords were created, which would definitely provide a good basis for a tag-based user modeling. The quality of the result depends highly on the configuration of the algorithm. The rest of this section is devoted to analysis of the impact of algorithm’s variables to the created hierarchy. We provide summarization of basic attributes of the produced hierarchies in the Table 3, with examples on Fig. 3 and 4 (complete results can be seen 2 that page is http://www.scribd.com/ 3 the most tagged publication according to the CiteULike linkout database is surprisingly “about:blank

点击进入文档下载页（PDF格式）

共12页，试读已结束，阅读完整版请下载

点击下载（PDF格式）

浏览记录