Mining Association Rules in Folksonomies Christoph Schmitz, Andreas Hotho, Robert Jaschke,2, Gerd Stumme, 2 1 Knowledge Data Engineering Group, Department of Mathematics and Computer Science, University of Kassel, wilhelmshher Allee 73. D-34121 Kassel Germanyhttp://www.kde.cs.uni-kassel.de 2 Research Center L3S, Expo Plaza 1, D-30539 Hannover, Germany, Abstract. Social bookmark tools are rapidly emerging on the Web. In such syster users are setting up lightweight conceptual structures called folksonomies. These sys- tems provide currently relatively few structure. We discuss in this paper, how assc ciation rule mining can be adopted to analyze and structure folksonomies, and how the results can be used for ontology learning and supporting emergent semantics. We demonstrate our approach on a large scale dataset stemming from an online system 1 Introduction A new family of so-called Web 2.0applications is currently emerging platforms like Wikis, Blogs, and social resource sharing systems. In this paper e the Web. These include user-centric publishing and knowledge manageme we focus on resource sharing systems, which all use the same kind of lightweight knowledge representation, called folksonomy. The word'folksonomy'is a blend of the words 'taxonomy'andfolk', and stands for conceptual structures created by the people Resource sharing systems, such as Flickr or del icio us, have acquired large numbers of users(from discussions on the del icio us mailing list, one can ap- proximate the number of users on del icio us to be more than one hundred thousand) within less than two years. The reason for their immediate success is the fact that no specific skills are needed for participating, and that these tools yield immediate benefit for each individual user (e. g. organizing ones bookmarks in a browser-independent, persistent fashion) without too much overhead. Large numbers of users have created huge amounts of information within a very short period of time. As these systems grow larger, however, the users feel the need for more structure for better organizing their resources. For instance, approaches for tagging tags, or for bundling them, are currently dis- cussed on the corresponding news groups. Currently, however, there is a lack of theoretical foundations adapted to the new opportunities which has to be A first step towards more structure within such systems is to discover knowl- edge that is already implicitly present by the way different users assign tags to resources. This knowledge may be used for recommending both a hierarchy Ihttp://www.flickr.com http://del.icio.us
Mining Association Rules in Folksonomies Christoph Schmitz1 , Andreas Hotho1 , Robert J¨aschke1,2 , Gerd Stumme1,2 1 Knowledge & Data Engineering Group, Department of Mathematics and Computer Science, University of Kassel, Wilhelmshher Allee 73, D–34121 Kassel, Germany, http://www.kde.cs.uni-kassel.de 2 Research Center L3S, Expo Plaza 1, D–30539 Hannover, Germany, http://www.l3s.de Abstract. Social bookmark tools are rapidly emerging on the Web. In such systems users are setting up lightweight conceptual structures called folksonomies. These systems provide currently relatively few structure. We discuss in this paper, how association rule mining can be adopted to analyze and structure folksonomies, and how the results can be used for ontology learning and supporting emergent semantics. We demonstrate our approach on a large scale dataset stemming from an online system. 1 Introduction A new family of so-called “Web 2.0” applications is currently emerging on the Web. These include user-centric publishing and knowledge management platforms like Wikis, Blogs, and social resource sharing systems. In this paper, we focus on resource sharing systems, which all use the same kind of lightweight knowledge representation, called folksonomy. The word ‘folksonomy’ is a blend of the words ‘taxonomy’ and ‘folk’, and stands for conceptual structures created by the people. Resource sharing systems, such as Flickr1 or del.icio.us,2 have acquired large numbers of users (from discussions on the del.icio.us mailing list, one can approximate the number of users on del.icio.us to be more than one hundred thousand) within less than two years. The reason for their immediate success is the fact that no specific skills are needed for participating, and that these tools yield immediate benefit for each individual user (e.g. organizing ones bookmarks in a browser-independent, persistent fashion) without too much overhead. Large numbers of users have created huge amounts of information within a very short period of time. As these systems grow larger, however, the users feel the need for more structure for better organizing their resources. For instance, approaches for tagging tags, or for bundling them, are currently discussed on the corresponding news groups. Currently, however, there is a lack of theoretical foundations adapted to the new opportunities which has to be overcome. A first step towards more structure within such systems is to discover knowledge that is already implicitly present by the way different users assign tags to resources. This knowledge may be used for recommending both a hierarchy 1 http://www.flickr.com/ 2 http://del.icio.us
Date Bsarbeten Ansicht Geh 中·命·自p r's Bibsonomy::r bookmarks BibTex Semigroup or Fig 1. Bibsonomy displays bookmarks and(BIBTEXbased) bibliographic references on the already existing tags, and additional tags, ultimately leading towards emergent semantics(Staab et al.(2002), Steels(1998)) by converging use of the same vocabulary. In this sense, knowledge discovery(KDD)techniques are a promising tool for bottom-up building of conceptual structures In this paper, we will focus on a selected KDD technique, namely association rules. Since folksonomies provide a three-dimensional dataset(users, tags, and resources)instead of a usual two-dimensional one(items and transactions we present first a systematic overview of projecting a folksonomy onto a two- dimensional structure. Then we will show the results of mining rules from two selected projections on the del icio us system This paper is organized as follows. Section 2 reviews recent developments in the area of social bookmark systems, and presents a formal model In Section 3 we briefly recall the notions of association rules, before providing a systematic overview over the projections of a folksonomy onto a two-dimensional dataset in Section 4 In Section 5, we present the results of mining association rules on data of the del icio us system. Section 6 concludes the paper with a discussion of further research topics on knowledge discovery within folksonomies 2 Social Resource Sharing and Folksonomies ocial resource sharing systems are web-based systems that allow users to up- load their resources, and to label them with names. The systems can be distin- guished according to what kind of resources are supported. Flickr, for instance allows the sharing of photos, del.icio us" the sharing of bookmarks, CiteULike and Connotea the sharing of bibliographic references, and 43Things?'even the sharing of goals in private life. Our own upcoming system, called Bib Sonomy, will allow to share simultaneously bookmarks and BIBTEX entries(see Fig. 1) In their core, these systems are all very similar. Once a user is logged in, he can add a resource to the system, and assign arbitrary labels, so-called 3http://www.flickr.com/ OUs tp://www.citeulike.or http://www.connotea.org/ 7http://www.43things.com/ http://www.bibsonomy.org 2
Fig. 1. Bibsonomy displays bookmarks and (BibTEXbased) bibliographic references simultaneously. on the already existing tags, and additional tags, ultimately leading towards emergent semantics (Staab et al. (2002), Steels (1998)) by converging use of the same vocabulary. In this sense, knowledge discovery (KDD) techniques are a promising tool for bottom-up building of conceptual structures. In this paper, we will focus on a selected KDD technique, namely association rules. Since folksonomies provide a three-dimensional dataset (users, tags, and resources) instead of a usual two-dimensional one (items and transactions), we present first a systematic overview of projecting a folksonomy onto a twodimensional structure. Then we will show the results of mining rules from two selected projections on the del.icio.us system. This paper is organized as follows. Section 2 reviews recent developments in the area of social bookmark systems, and presents a formal model. In Section 3, we briefly recall the notions of association rules, before providing a systematic overview over the projections of a folksonomy onto a two-dimensional dataset in Section 4. In Section 5, we present the results of mining association rules on data of the del.icio.us system. Section 6 concludes the paper with a discussion of further research topics on knowledge discovery within folksonomies. 2 Social Resource Sharing and Folksonomies Social resource sharing systems are web-based systems that allow users to upload their resources, and to label them with names. The systems can be distinguished according to what kind of resources are supported. Flickr,3 for instance, allows the sharing of photos, del.icio.us4 the sharing of bookmarks, CiteULike5 and Connotea6 the sharing of bibliographic references, and 43Things7 even the sharing of goals in private life. Our own upcoming system, called BibSonomy, 8 will allow to share simultaneously bookmarks and BibTEX entries (see Fig. 1). In their core, these systems are all very similar. Once a user is logged in, he can add a resource to the system, and assign arbitrary labels, so-called 3 http://www.flickr.com/ 4 http://del.icio.us/ 5 http://www.citeulike.org/ 6 http://www.connotea.org/ 7 http://www.43things.com/ 8 http://www.bibsonomy.org 2
tags, to it. We call the collection of all his assignments his personomy, and the collection of all personomies is called folksonomy. The user can also explore the personomies of other users in all dimensions: for a given user he can see the resources that user has uploaded, together with the tags he has assigned to them(see Fig. 1);when clicking on a resource he sees which other users have uploaded this resource and how they tagged it; and when clicking on a tag he sees who assigned it to which resources The systems allow for additional functionality. For instance, one can copy a resource from another user, and label it with one owns tags. Overall, these systems provide a very intuitive navigation through the data 2. 1 State of the art web collaboration systems. Among the rare exceptions are Hammond et al (2005)and Lund et al.(2005) who provide good overviews of social book marking tools with special emphasis on folksonomies, and Mathes(2004)who discusses strengths and limitations of folksonomies. The main discussion on folksonomies and related topics is currently only going on mailing lists, e. g. Con- note(2005). To the best of our knowledge, the ideas presented in this paper have not been explored before, but there is a lot of recent work dealing with folksonomies Mika(2005) defines a model of semantic-social networks for extracting light eight ontologies from del icio us besides calculating measures like the cluster ing coefficient, (local)betweenness centrality or the network constraint on the extracted one-mode network, Mika uses co-occurence techniques for clustering e conce eton There are several systems working on top of del icio us to explore the un- derlying folksonomy. Collaborative Rank provides ranked search results on top of del icio us bookmarks. The ranking takes into account, how early someone bookmarked an URL and how many people followed him or her. Other sys- tems show popular sites(pop )of statistics about delicio. ulicious0)or focus on graphical representations 2.2 a Formal model for folksonomies A folksonomy basically describes users, resources, tags, and allows users to assign(arbitrary) tags to resources. We present here a formal definition of folksonomies, which is also underlying our BibSonomy system Definition 1. A folksonomy is a tuple F:=(U,T, R,Y, < )where .U, T, and R are finite sets, whose elements are called users, tags and resources, resp. °htt/ ollabrank org/ http://populicio.us http://cloudalicio.us/ http://www.neuroticweb.com/recursos/del.icio.us-graphs/ 3
tags, to it. We call the collection of all his assignments his personomy, and the collection of all personomies is called folksonomy. The user can also explore the personomies of other users in all dimensions: for a given user he can see the resources that user has uploaded, together with the tags he has assigned to them (see Fig. 1); when clicking on a resource he sees which other users have uploaded this resource and how they tagged it; and when clicking on a tag he sees who assigned it to which resources. The systems allow for additional functionality. For instance, one can copy a resource from another user, and label it with one owns tags. Overall, these systems provide a very intuitive navigation through the data. 2.1 State of the Art There are currently virtually no scientific publications about folksonomy-based web collaboration systems. Among the rare exceptions are Hammond et al. (2005) and Lund et al. (2005) who provide good overviews of social bookmarking tools with special emphasis on folksonomies, and Mathes (2004) who discusses strengths and limitations of folksonomies. The main discussion on folksonomies and related topics is currently only going on mailing lists, e.g. Connotea (2005). To the best of our knowledge, the ideas presented in this paper have not been explored before, but there is a lot of recent work dealing with folksonomies. Mika (2005) defines a model of semantic-social networks for extracting lightweight ontologies from del.icio.us. Besides calculating measures like the clustering coefficient, (local) betweenness centrality or the network constraint on the extracted one-mode network, Mika uses co-occurence techniques for clustering the concept network. There are several systems working on top of del.icio.us to explore the underlying folksonomy. CollaborativeRank9 provides ranked search results on top of del.icio.us bookmarks. The ranking takes into account, how early someone bookmarked an URL and how many people followed him or her. Other systems show popular sites (Populicious10) or focus on graphical representations (Cloudalicious11, Grafolicious12) of statistics about del.icio.us. 2.2 A Formal Model for Folksonomies A folksonomy basically describes users, resources, tags, and allows users to assign (arbitrary) tags to resources. We present here a formal definition of folksonomies, which is also underlying our BibSonomy system. Definition 1. A folksonomy is a tuple F := (U, T, R, Y, ≺) where • U, T , and R are finite sets, whose elements are called users, tags and resources, resp., 9 http://collabrank.org/ 10 http://populicio.us/ 11 http://cloudalicio.us/ 12 http://www.neuroticweb.com/recursos/del.icio.us-graphs/ 3
Y is a ternary relation between them, i.e.,YCUxTXR, called assign ments and < is a user-specific subtag/supertag-relation, i. e, <CUX(TxT)\(t, t) t∈T}) The personomy Pu of a given user u E U is the restriction of F to u, P:=(T,Rn,l,xu) with I:={(t,r)∈T×R|(u,t,r)∈Y},Tn:=丌1(Ix) Ru:=丌2(lu), and <u:={(t1,t2)∈T×T(u,t,t2)∈} Users are typically described by their user ID, and tags may be arbitrary strings. What is considered as a resource depends on the type of system. In delicio us. for instance. the resources are URLs. and in Flickr. the resources are pictures. In our BibSonomy system, we have two types of resources, book marks and BIBTEXentries. From an implementation point of view, resources are internally represented by some ID In this paper, we do not make use of the subtag/ supertag relation for sake of simplicity. I.e., <=0, and we will simply note a folksonomy as a quadruple F: =(U,T, R, Y). This structure is known in Formal Concept Analysis(wille (1982), Ganter and Wille(1999)) as a triadic contert(Lehmann and Wille (1995), Stumme(2005)). An equivalent view on folksonomy data is that of a ripartite (undirected) hypergraph G=(V, E), where V= UUTUR is the set of nodes, and E= u, t, rl(u,t, r)EY is the set of hyperedges 2.3 Del.ico. us- A Folksonomy-Based Social Bookmark System In order to evaluate our folksonomy mining approach, we have analyzed the popular social bookmarking sytem del icio us. Delicio us is a server-based sys- tem with a simple-to-use interface that allows users to organize and share book- marks on the internet. It is able to store in addition to the url a description, a note, and tags(i. e, arbitrary labels). We chose delicio. us rather than our own system, Bibsonomy, as the latter is going online only after the time of writing of this article. For our experiments, we collected from the del. ico. us system U= 75, 242 users, T= 533, 191 tags and R=3, 158, 297 resources, related by in total Y= 17, 362, 212 triples 3 Association Rule Mining We assume here that the reader is familiar with the basics of association rule mining introduced by Agrawal et al. (1993). As the work presented in this pa- per is on the conceptual rather than on the computational level, we refrain in particular from describing the vast area of developing efficient algorithms Many of the existing algorithms can be found at the Frequent Itemset Min- ing Implementations Repository. Instead, we just recall the definition of the association rule mining problem, which was initially stated by Agrawal et al. (1993), in order to clarify the notations used in the following. We will not use http://fimi.cs.helsinkifi/ 4
• Y is a ternary relation between them, i. e., Y ⊆ U × T × R, called assignments, and • ≺ is a user-specific subtag/supertag-relation, i. e., ≺⊆ U ×((T ×T )\ {(t, t) | t ∈ T }). The personomy Pu of a given user u ∈ U is the restriction of F to u, i. e., Pu := (Tu, Ru, Iu, ≺u) with Iu := {(t, r) ∈ T × R | (u, t, r) ∈ Y }, Tu := π1(Iu), Ru := π2(Iu), and ≺u:= {(t1, t2) ∈ T × T | (u, t1, t2) ∈≺}. Users are typically described by their user ID, and tags may be arbitrary strings. What is considered as a resource depends on the type of system. In del.icio.us, for instance, the resources are URLs, and in Flickr, the resources are pictures. In our BibSonomy system, we have two types of resources, bookmarks and BibTEXentries. From an implementation point of view, resources are internally represented by some ID. In this paper, we do not make use of the subtag/supertag relation for sake of simplicity. I. e., ≺ = ∅, and we will simply note a folksonomy as a quadruple F := (U, T, R, Y ). This structure is known in Formal Concept Analysis (Wille (1982), Ganter and Wille (1999)) as a triadic context (Lehmann and Wille (1995), Stumme (2005)). An equivalent view on folksonomy data is that of a tripartite (undirected) hypergraph G = (V, E), where V = U∪˙ T∪˙ R is the set of nodes, and E = {{u, t, r} | (u, t, r) ∈ Y } is the set of hyperedges. 2.3 Del.ico.us — A Folksonomy-Based Social Bookmark System In order to evaluate our folksonomy mining approach, we have analyzed the popular social bookmarking sytem del.icio.us. Del.icio.us is a server-based system with a simple-to-use interface that allows users to organize and share bookmarks on the internet. It is able to store in addition to the URL a description, a note, and tags (i. e., arbitrary labels). We chose del.icio.us rather than our own system, Bibsonomy, as the latter is going online only after the time of writing of this article. For our experiments, we collected from the del.ico.us system |U| = 75, 242 users, |T | = 533, 191 tags and |R| = 3, 158, 297 resources, related by in total |Y | = 17, 362, 212 triples. 3 Association Rule Mining We assume here, that the reader is familiar with the basics of association rule mining introduced by Agrawal et al. (1993). As the work presented in this paper is on the conceptual rather than on the computational level, we refrain in particular from describing the vast area of developing efficient algorithms. Many of the existing algorithms can be found at the Frequent Itemset Mining Implementations Repository.13 Instead, we just recall the definition of the association rule mining problem, which was initially stated by Agrawal et al. (1993), in order to clarify the notations used in the following. We will not use 13 http://fimi.cs.helsinki.fi/ 4
the original terminology of Srikant et al, but rather exploit the vocabulary of Formal Concept Analysis(FCA)(Wille(1982)), as it better fits with the formal folksonomy model introduced in Definition 1.14 Definition 2. A formal contert is a dataset K: =(G, M, I) consisting of a set G of objects, a set M of attributes, and a binary relation ICGxM, where (9,m)∈ i is read as“ object g has attribute m In the usual basket analysis scenario, M is the set of items sold by a supermar- ket, G is the set of all transactions, and, for a given transaction g E G, the set g:=mE MI(g, m)E] contains all items bought in that transaction Definition 3. For a set X of attributes, we define A': =I9 EG Vm E X: (9, m)EI. The support of A is calculated by supp(A) Definition 4(Association Rule Mining Problem(Agrawal et al (1993). Let K be a formal context, and minsupp, minconf E [0, 1, called minimum support and minimum confidence thresholds, resp. The association rule mining problem consists now of determining all pairs A - B of subsets of M whose support supp(A -B): supp(A U B) is above the thresh- old minsupp, and whose confidence conf(A-B): =sspp is above the threshold minc As the rules A-B and A-B A carry the same information, and in particular have same support and same confidence, we will consider in this paper the additional constraint prevalent in the data mining community, that premise A and conclusion B are to be disjoint When comparing Definitions 1 and 2, we observe that association rules annot be mined directly on folksonomies, because of their triadic nature. One either has to define some kind of triadic association rules, or to transform the triadic folksonomy into a dyadic formal context. In this paper, we follow the latter approach 4 Projecting the Folksonomy onto two Dimensions As discussed in the previous section, we have to reduce the three-dimensional folksonomy to a two-dimensional formal context before we can apply any as sociation rule mining technique. Several such projections have already been introduced in Lehmann and Wille(1995 ). In Stumme(2005), we provide a more complete approach, which we will slightly adapt to the association rule For a detailed discussion about the role of FCa for association rule mining see In contrast, in FCA, one often requires A to be a subset of B, as this fits better with the notion of closed itemsets which arose of applying FCa to the association mining problem(Pasquier et al.(1999), Zaki and Hsiao(1999), Stumme(1999)) 5
the original terminology of Srikant et al, but rather exploit the vocabulary of Formal Concept Analysis (FCA) (Wille (1982)), as it better fits with the formal folksonomy model introduced in Definition 1.14 Definition 2. A formal context is a dataset K := (G, M, I) consisting of a set G of objects, a set M of attributes, and a binary relation I ⊆ G × M, where (g, m) ∈ I is read as “object g has attribute m”. In the usual basket analysis scenario, M is the set of items sold by a supermarket, G is the set of all transactions, and, for a given transaction g ∈ G, the set g I := {m ∈ M|(g, m) ∈ I} contains all items bought in that transaction. Definition 3. For a set X of attributes, we define A′ := {g ∈ G | ∀m ∈ X: (g, m) ∈ I}. The support of A is calculated by supp(A) := |A ′ | |G| . Definition 4 (Association Rule Mining Problem (Agrawal et al. (1993))). Let K be a formal context, and minsupp, minconf ∈ [0, 1], called minimum support and minimum confidence thresholds, resp. The association rule mining problem consists now of determining all pairs A → B of subsets of M whose support supp(A → B) := supp(A ∪ B) is above the threshold minsupp, and whose confidence conf(A → B) := supp(A∪B) supp(A) is above the threshold minconf. As the rules A → B and A → B \ A carry the same information, and in particular have same support and same confidence, we will consider in this paper the additional constraint prevalent in the data mining community, that premise A and conclusion B are to be disjoint.15 When comparing Definitions 1 and 2, we observe that association rules cannot be mined directly on folksonomies, because of their triadic nature. One either has to define some kind of triadic association rules, or to transform the triadic folksonomy into a dyadic formal context. In this paper, we follow the latter approach. 4 Projecting the Folksonomy onto two Dimensions As discussed in the previous section, we have to reduce the three-dimensional folksonomy to a two-dimensional formal context before we can apply any association rule mining technique. Several such projections have already been introduced in Lehmann and Wille (1995). In Stumme (2005), we provide a more complete approach, which we will slightly adapt to the association rule mining scenario. 14 For a detailed discussion about the role of FCA for association rule mining see (Stumme (2002)). 15 In contrast, in FCA, one often requires A to be a subset of B, as this fits better with the notion of closed itemsets which arose of applying FCA to the association mining problem (Pasquier et al. (1999), Zaki and Hsiao (1999), Stumme (1999)). 5
Fig. 2. All rules with two elements of Ki with 05 support, 50% confidence As we want to analyze all facets of the folksonomy, we want to allow to use any of the three sets U, T, and R as the set of objects- on which the support is computed- at some point in time, depending on the task on hand Therefore. we will not fix the roles of the three sets in advance. Instead, we consider a triadic context as symmetric structure, where all three sets are of equal importance. For easier handling, we will therefore denote the folksonomy F:=(U,T,R, Y alternatively by F: =(X1, X2, X3, Y) in the following We will define the set of objects -i. e, the set on which the support will be counted- by a permutation on the set 1, 2, 3, i. e, by an element o of the full symmetric group S3. The choice of a permutation indicates, together with one of the aggregation modes 9,'W, n' with n E N, and"V, on which formal context K: =(G, M, I the association rules are computed ·K:=(Xa(1)×Xa(3),Xa(),D)with(x(1),xr(3),xa(2)∈ I if and only if( ·Ko:=(Xa(1),Xa(2)×X(3),D)with(x(1),(xa(2),x(3)∈ I if and only if(x1,x2,x3)∈Y Ko,n: =(Xa(), Xo(2), I)with(aa(1), Ta(2)E I if and only if there exist n different z(3)∈Xa(3)with(x1,x2,r3)∈Y K, V : =(Xa(), Xa(2), I)with(aa(1), Ta(2)E I if and only if for all To(3)E Xo(3) holds(=1, r2, r3)E Y. The mode "V is thus equivalent to 'n' if These projections are complemented by the following way to ' cut slices'out of the folksonomy. A slice is obtained by selecting one dimension (out of user/tag/resource), and then fixing in this dimension one particular instance ·Ietx:=x(3)∈X(3Kox:=(Xa(1),Xa(2), D) with(xa(),x(2)∈Iif and only if (a1, r2, I3)EY In the next section, we will discuss for a selected subset of these projections the kind of rules one obtains from mining the formal context that is resulting from the projection 5 Mining Association Rules on the Projected Folksonomy After having performed one of the projections described in the previous sec- tion, one can now apply the standard association rule mining techniques as
Fig. 2. All rules with two elements of K1 with .05 % support, 50 % confidence As we want to analyze all facets of the folksonomy, we want to allow to use any of the three sets U, T , and R as the set of objects – on which the support is computed – at some point in time, depending on the task on hand. Therefore, we will not fix the roles of the three sets in advance. Instead, we consider a triadic context as symmetric structure, where all three sets are of equal importance. For easier handling, we will therefore denote the folksonomy F := (U, T, R, Y ) alternatively by F := (X1, X2, X3, Y ) in the following. We will define the set of objects – i. e., the set on which the support will be counted – by a permutation on the set {1, 2, 3}, i. e., by an element σ of the full symmetric group S3. The choice of a permutation indicates, together with one of the aggregation modes ‘ G ’, ‘ M ’, ‘∃n’ with n ∈ N, and ‘∀’, on which formal context K := (G, M, I) the association rules are computed. • Kσ, G := (Xσ(1) × Xσ(3), Xσ(2), I) with ((xσ(1), xσ(3)), xσ(2)) ∈ I if and only if (x1, x2, x3) ∈ Y . • Kσ, M := (Xσ(1), Xσ(2) × Xσ(3), I) with (xσ(1),(xσ(2), xσ(3))) ∈ I if and only if (x1, x2, x3) ∈ Y . • Kσ,∃n := (Xσ(1), Xσ(2), I) with (xσ(1), xσ(2)) ∈ I if and only if there exist n different xσ(3) ∈ Xσ(3) with (x1, x2, x3) ∈ Y . • Kσ,∀ := (Xσ(1), Xσ(2), I) with (xσ(1), xσ(2)) ∈ I if and only if for all xσ(3) ∈ Xσ(3) holds (x1, x2, x3) ∈ Y . The mode ‘∀’ is thus equivalent to ‘∃n’ if |Xσ(3)| = n. These projections are complemented by the following way to ‘cut slices’ out of the folksonomy. A slice is obtained by selecting one dimension (out of user/tag/resource), and then fixing in this dimension one particular instance. • Let x := xσ(3) ∈ Xσ(3). Kσ,x := (Xσ(1), Xσ(2), I) with (xσ(1), xσ(2)) ∈ I if and only if (x1, x2, x3) ∈ Y . In the next section, we will discuss for a selected subset of these projections the kind of rules one obtains from mining the formal context that is resulting from the projection. 5 Mining Association Rules on the Projected Folksonomy After having performed one of the projections described in the previous section, one can now apply the standard association rule mining techniques as 6
Delicious hacks CSS Fig 3. Rules with two elements of K2 with 0.05 support, and 10% confidence described in Section 3. Due to space restrictions, we have to focus on a subset of projections. In particular, we will address the two projections Ki, witI 1:= id and a2:=(1→1,2→3,3→2). We obtain the two dyadic contexts K1:=(U×R,T,1)with1:={(u,r),t)(u,t,r)∈Y}andK2:=(T×U,R,I2) with I2: =I(t, u),r)l(u,t, rEYI An association rule A- B in KI is read as Users assigning the tags from A to some resources often also assign the tags from B to them. This type of rules may be used in a recommender system. If a user assigns all tags from A then the system suggests him to add also those from B Figure 2 shows all rules with one element in the premise and one element in the conclusion that we derived from K, with a minimum support of 0.05 and a minimum confidence of 50%. In the diagram one can see that our interpretation of rules in KI holds for these examples: users tagging some webpage with debian are likely to tag it with linur also, and pages about bands are probably also concerned with music. These results can be used in a recommender system aiding the user in choosing the tags which are most helpful in retrieving the resource later Another view on these rules is to see them as subsumption relations, so that the rule mining can be used to learn a taxonomic structure. If many resources tagged with rslt are also tagged with rml, this indicates, for example, that rml can be considered a supertopic of slt if one wants to automatically populate the relation Figure 2 also shows two pairs of tags which occur together very frequently without any distinct direction in the rule: open source occurs as a phrase most of the time, while the other pair consists of two tags (ukque and ukg: irc), which seem to be added automatically to any resource that mentioned in a particular chat channel The second example are association rules A B in K? which are read as Users labelling the resources in A with some tags often also assign these tags to the resources in B. In essence both resources have to have something in common. Figure 3 shows parts of the resulting graph for applying association
http://pchere.blogspot.com/2005/02/absolutely-delicious-complete-tool.html http://www.onlamp.com/pub/a/onlamp/2005/01/20/rails.html http://www.onlamp.com/pub/a/onlamp/2005/03/03/rails.html http://rails.homelinux.org/ http://www.onlamp.com/pub/a/onlamp/2005/06/09/rails_ajax.html http://www.rubyonrails.org/ http://www.rubyonrails.com/ http://www.cssbeauty.com/ http://www.cssimport.com/ http://www.webstandardsawards.com/ http://www.thenoodleincident.com/tutorials/box_lesson/boxes.html http://pro.html.it/esempio/nifty/ http://home.tampabay.rr.com/bmerkey/cheatsheet.htm http://www.alistapart.com/ http://www.scifihifi.com/cocoalicious/ http://tuxtina.de/software/ http://www.inknoise.com/experimental/layoutomatic.php http://www.accessify.com/tools-and-wizards/list-o-matic/list-o-matic.asp http://www.evolt.org/article/Ten_CSS_tricks_you_may_not_know/17/60369/ http://www.positioniseverything.net/ http://www.citeulike.org/ http://www.connotea.org/ http://www.fiftyfoureleven.com/resources/programming/xmlhttprequest/examples http://openrico.org/home.page http://www.baekdal.com/articles/usability/usable-XMLHttpRequest/ http://script.aculo.us/ http://www.adaptivepath.com/publications/essays/archives/000385.php http://www.ajaxmatters.com/ http://developer.apple.com/internet/webcontent/xmlhttpreq.html http://www.modernmethod.com/sajax/ http://www.webpasties.com/xmlHttpRequest/ http://www.xml.com/pub/a/2005/02/09/xml-http-request.html http://jibbering.com/2002/4/httprequest.html http://jpspan.sourceforge.net/wiki/doku.php?id=javascript:xmlhttprequest http://goog-ajaxslt.sourceforge.net/ http://www.ripcord.co.nz/behaviour/ http://prototype.conio.net/ http://johnvey.com/features/deliciousdirector/ http://tool-man.org/examples/edit-in-place.html http://tool-man.org/dragsort/ http://www.netlobo.com/div_hiding.html http://tool-man.org/examples/sorting.html http://www.bobbyvandersluis.com/articles/goodpractices.php http://www.formassembly.com/ http://www.axentric.com/posts/default/7 http://brothercake.com/site/resources/scripts/dbx/ http://www.ajaxpatterns.org/index.php?title=Main_Page http://www.onlinetools.org/articles/unobtrusivejavascript/index.html http://www.onlinetools.org/articles/unobtrusivejavascript/ http://leftjustified.net/site-in-an-hour/ http://toolkit.crispen.org/index.php?cat=templates http://www.stunicholls.myby.co.uk/index.html http://supergreg.hopto.org/nutritious/nutritious.php http://delicious.mozdev.org/ http://dietrich.ganx4.com/foxylicious/ http://www.beelerspace.com/index.php?p=890 http://kevan.org/extispicious http://opencontent.org/oishii/ http://fresh.homeunix.net/delicious.html http://pchere.blogspot.com/2005/03/great-flickr-tools-collection.html http://glish.com/css/ http://www.airtightinteractive.com/projects/related_tag_browser/app/ http://www.gamingheadlines.co.uk/wod/formstyle/index.html http://www.kryogenix.org/code/browser/sorttable/ http://www.quirksmode.org/ Ajax Delicious Hacks CSS Javascript Fig. 3. Rules with two elements of K2 with 0.05 % support, and 10 % confidence described in Section 3. Due to space restrictions, we have to focus on a subset of projections. In particular, we will address the two projections Kσi, G with σ1 := id and σ2 := (1 7→ 1, 2 7→ 3, 3 7→ 2). We obtain the two dyadic contexts K1 := (U×R, T, I1) with I1 := {((u, r), t)|(u, t, r) ∈ Y } and K2 := (T ×U, R, I2) with I2 := {(t, u), r)|(u, t, r) ∈ Y }. An association rule A → B in K1 is read as Users assigning the tags from A to some resources often also assign the tags from B to them. This type of rules may be used in a recommender system. If a user assigns all tags from A then the system suggests him to add also those from B. Figure 2 shows all rules with one element in the premise and one element in the conclusion that we derived from K1 with a minimum support of 0.05 % and a minimum confidence of 50 %. In the diagram one can see that our interpretation of rules in K1 holds for these examples: users tagging some webpage with debian are likely to tag it with linux also, and pages about bands are probably also concerned with music. These results can be used in a recommender system, aiding the user in choosing the tags which are most helpful in retrieving the resource later. Another view on these rules is to see them as subsumption relations, so that the rule mining can be used to learn a taxonomic structure. If many resources tagged with xslt are also tagged with xml, this indicates, for example, that xml can be considered a supertopic of xslt if one wants to automatically populate the ≺ relation. Figure 2 also shows two pairs of tags which occur together very frequently without any distinct direction in the rule: open source occurs as a phrase most of the time, while the other pair consists of two tags (ukquake and ukq:irc), which seem to be added automatically to any resource that is mentioned in a particular chat channel. The second example are association rules A → B in K2 which are read as Users labelling the resources in A with some tags often also assign these tags to the resources in B. In essence both resources have to have something in common. Figure 3 shows parts of the resulting graph for applying association 7
rules with 0.05 support, and 10 confidence on K2. Only associations rules with one element in premise and one element in conclusion are considered in the graph In the figure 3 we identified four major areas in the graph which we labeled with the topics delicious hacks, Javascript, Ajar, and CsS. The topics can be derived by applying the FolkRank(hotho et al.(2006))on some of the resources of interest, which also yields relevant users and other resources for the respective area, such that communities of interest can be identified 6 Conclusion n this paper, we have presented a formal model of folksonomies as a set of triples -or, equivalently, a tripartite hypergraph. In order to apply associ- ation rule mining to folksonomies, we have systematically explored possible projections of the folksonomy structure into the standard notion of"shopping baskets"used in rule mining For two selected projec e demonstrated the outcome of rule on a large-scale folksonomy dataset. The rules can be applied for different pur poses, such as recommending tags, users, or resources, populating the supertag relation of the folksonomy, and community detection Future work includes the tighter integration of the various techniques we used here, namely, association rule mining, FolkRank ranking, and graph clus- tering, to further contribute to the abovementioned applications References AGRAWAL, R, IMIELINSKI, T and SWAMI, A(1993): Mining association rules between sets of items in large databases. In: Proc. of SIGMOD 1993, pp. 207-216 ACM Press ConnOtea(2005):conNoteaMailingList.https://lists.sourceforge.net/lists/ istinfo/connotea-discuss GANTER, B and WILLE, R(1999): Formal Concept Analysis Mathematical foun dations. Springe HAMMOND, T, HANNAY, T, LUND, B and SCOTT, J (2005 ) Social Bookmark ing Tools(I): A General Review. D-Lib Magazine, 11 (4) HOTHO, A, JASCHKE, R, SCHMITZ, C and STUMME, G(2006): Information Retrieval in Folksonomies: Search and Ranking. In: submitted for publication at ESWC 2006 LEHMANN, F and WILLE, R(1995): A triadic approach to Formal Concept Anal ysis. In: G. Ellis, R. Levinson, W. Rich and J. F. Sowa(Eds ) Conceptual Structures: Applications, Implementation and Theory, vol 954 of Lecture Notes in Computer Science. Springer. ISBN 3-540-60161-9 HANNAY, T.(2005 ) Social Bookmarking Tools(II): A Case Study - Connotea D-Lib Magazine, 11(4) MATHES, A(2004): Folksonomies-Cooperative Classification and Communication hroughSharedMetadata.http://www.adammathes.com/academic/computer- mediated-communication/folksonomies. html MIKA, P.(2005: Ontologies Are Us: A Unified Model of Social Networks and Se mantics. In: Y. Gil, E. Motta, V. R. Benjamins and M. A. Musen(Eds ) ISWC 05, vol 3729 of LNCS, pp 522-536. Springer-Verlag, Berlin Heidelberg
rules with 0.05 % support, and 10 % confidence on K2. Only associations rules with one element in premise and one element in conclusion are considered in the graph. In the figure 3 we identified four major areas in the graph which we labeled with the topics delicious hacks, Javascript, Ajax, and CSS. The topics can be derived by applying the FolkRank (Hotho et al. (2006)) on some of the resources of interest, which also yields relevant users and other resources for the respective area, such that communities of interest can be identified. 6 Conclusion In this paper, we have presented a formal model of folksonomies as a set of triples – or, equivalently, a tripartite hypergraph. In order to apply association rule mining to folksonomies, we have systematically explored possible projections of the folksonomy structure into the standard notion of “shopping baskets” used in rule mining. For two selected projections, we demonstrated the outcome of rule mining on a large-scale folksonomy dataset. The rules can be applied for different purposes, such as recommending tags, users, or resources, populating the supertag relation of the folksonomy, and community detection. Future work includes the tighter integration of the various techniques we used here, namely, association rule mining, FolkRank ranking, and graph clustering, to further contribute to the abovementioned applications. References AGRAWAL, R., IMIELINSKI, T. and SWAMI, A. (1993): Mining association rules between sets of items in large databases. In: Proc. of SIGMOD 1993, pp. 207–216. ACM Press. CONNOTEA (2005): Connotea Mailing List. https://lists.sourceforge.net/lists/ listinfo/connotea-discuss. GANTER, B. and WILLE, R. (1999): Formal Concept Analysis : Mathematical foundations. Springer. HAMMOND, T., HANNAY, T., LUND, B. and SCOTT, J. (2005): Social Bookmarking Tools (I): A General Review. D-Lib Magazine, 11 (4). HOTHO, A., JASCHKE, R., SCHMITZ, C. and STUMME, G. (2006): Information ¨ Retrieval in Folksonomies: Search and Ranking. In: submitted for publication at ESWC 2006. LEHMANN, F. and WILLE, R. (1995): A triadic approach to Formal Concept Analysis. In: G. Ellis, R. Levinson, W. Rich and J. F. Sowa (Eds.), Conceptual Structures: Applications, Implementation and Theory, vol. 954 of Lecture Notes in Computer Science. Springer. ISBN 3-540-60161-9. HANNAY, T. (2005): Social Bookmarking Tools (II): A Case Study - Connotea. D-Lib Magazine, 11 (4). MATHES, A. (2004): Folksonomies – Cooperative Classification and Communication Through Shared Metadata. http://www.adammathes.com/academic/computermediated-communication/folksonomies.html. MIKA, P. (2005): Ontologies Are Us: A Unified Model of Social Networks and Semantics. In: Y. Gil, E. Motta, V. R. Benjamins and M. A. Musen (Eds.), ISWC 2005, vol. 3729 of LNCS, pp. 522–536. Springer-Verlag, Berlin Heidelberg. 8
PASQUIER, N, BASTIDE, Y, TAOUIL, R and LAKHAL, L(1999): Closed se based discovery of small covers for association rules. In: Actes des 15mes younes Bases de Donnes Avances(BDA 99), pp. 361-38 STAAB, S, SANTINI, S, NACK, F, STEELS, L and MAEDCHE, A(2002): Emer- gent semantics. Intelligent Systems, IEEE, 17(1): 78 STEELS, L(1998): The Origins of Ontologies and Communication Conventions in Multi-Agent Systems. Autonomous Agents and Multi-Agent Systems, 1(2): 169 STUMME, G(1999 ) Conceptual Knowledge Discovery with Frequent Concept La tices. FB4-Preprint 2043, TU Darmstadt STUMME, G.(2002 ): Efficient Data Mining Based on Formal Concept Analysis. In A. Hameurlain, R. Cicchetti and R. Traunmller(Eds ) Proc. DEXA 2002, vol 2453 of LNCS, pp. 534-546. Springer, Heidelberg STUMME, G.(2005): A Finite State Model for On-Line Analytical Processing in Triadic Contexts. In: B. Ganter and R. Godin(Eds ) ICFCA, vol. 3403 of Lecture Notes in Computer Science, pp. 315-328. Springer. ISBN 3-540-24525-1 WILLE, R(1982): Restructuring lattices theory: An approach based on hierarchies of concepts. In: I. Rival(Ed ) Ordered Sets, pp. 445-470. Reidel, Dordrecht- ZAKI, M. J and HSIAO, C - J.(1999 ) ChARM: An efficient algorithm for closed association rule mining. Technical Report 99-10. Tech. rep Computer Science Dept, Rensselaer Polytechnic 9
PASQUIER, N., BASTIDE, Y., TAOUIL, R. and LAKHAL, L. (1999): Closed set based discovery of small covers for association rules. In: Actes des 15mes journes Bases de Donnes Avances (BDA’99), pp. 361–381. STAAB, S., SANTINI, S., NACK, F., STEELS, L. and MAEDCHE, A. (2002): Emergent semantics. Intelligent Systems, IEEE, 17 (1):78. STEELS, L. (1998): The Origins of Ontologies and Communication Conventions in Multi-Agent Systems. Autonomous Agents and Multi-Agent Systems, 1 (2):169. STUMME, G. (1999): Conceptual Knowledge Discovery with Frequent Concept Lattices. FB4-Preprint 2043, TU Darmstadt. STUMME, G. (2002): Efficient Data Mining Based on Formal Concept Analysis. In: A. Hameurlain, R. Cicchetti and R. Traunmller (Eds.), Proc. DEXA 2002, vol. 2453 of LNCS, pp. 534–546. Springer, Heidelberg. STUMME, G. (2005): A Finite State Model for On-Line Analytical Processing in Triadic Contexts. In: B. Ganter and R. Godin (Eds.), ICFCA, vol. 3403 of Lecture Notes in Computer Science, pp. 315–328. Springer. ISBN 3-540-24525-1. WILLE, R. (1982): Restructuring lattices theory : An approach based on hierarchies of concepts. In: I. Rival (Ed.), Ordered Sets, pp. 445–470. Reidel, DordrechtBoston. ZAKI, M. J. and HSIAO, C.-J. (1999): ChARM: An efficient algorithm for closed association rule mining. Technical Report 99–10. Tech. rep., Computer Science Dept., Rensselaer Polytechnic. 9