Word lemma   Domain label   Synsets
faction      Factotum       {faction-2, sect-2} {cabal-1, faction-1, junta-1, junto-1, camarilla-1}
franciscan   Religion       {Gray_Friar-1, Franciscan-1}
church       Religion       {church-1, Christian_church-1, Christianity-2} {church-2, church_building-1} {church_service-1, church-3}
decoration   Factotum       {decoration-3}
honour       Factotum       {award-2, accolade-1, honor-1, honour-2, laurels-1} {honor-3, honour-4}
research     Factotum       {research-1} {inquiry-1, enquiry-2, research-2}
scholar      Pedagogy       {scholar-1, scholarly_person-1, student-2} {learner-1, scholar-2} {scholar-3}
professor    Pedagogy       {professor-1}
mystery      Literature     {mystery-2, mystery_story-1, whodunit-1}
poetry       Literature     {poetry-1, poesy-1, verse-1} {poetry-2}
painter      Art            {painter-1}
verse        Literature     {poetry-1, poesy-1, verse-1} {verse-2, rhyme-2} {verse-3, verse_line-1}
wonder       Factotum       {wonder-2, marvel-1}
criticism    Factotum       {criticism-1, unfavorable_judgment-1}
ideal        Factotum       {ideal-1} {ideal-2}
man          Factotum       {man-1, adult_male-1} {man-3} {man-7} {man-8}
author       Literature     {writer-1, author-1}
fresco       Art            {fresco-1} {fresco-2}
basilica     Religion       {basilica-1}

Figure 3: Synset Document Representation for a fragment of text

ipation of the user.

A new version of SiteIF has been realized where the user model is still implemented as a network structure, with the difference that nodes now represent synsets and arcs the co-occurrence of synsets. The working hypothesis is that the model can help to define semantic chains through which the filtering has a better chance to catch documents semantically closer to the topics already touched by the user.

3.1 Modelling Phase

In the modelling phase SiteIF considers the documents browsed during a user navigation session. The system uses the document representation of the browsed news. Every synset has a score that is inversely proportional to its frequency over the whole news corpus. The score is higher for less frequent synsets, which prevents very common meanings from becoming too prevalent in the user model. Likewise, in the word-based case we considered a word list document representation, where every word has a score inversely proportional to the word frequency in the news corpus.

The system builds or augments the user model as a semantic net whose nodes are synsets and whose arcs are the co-occurrence relation (co-occurring presence in a document) of two synsets. Weights on nodes are incremented by the score of the synsets, while weights on arcs are the mean of the connected nodes' weights. For each browsed news, the weights of the net are periodically reconsidered and possibly lowered, depending on the time passed from the last update. Also, nodes and arcs that are no longer useful may be removed from the net.
In this way it is possible to take changes in the user's interests into account and to prevent uninteresting concepts from remaining in the user model.

Figure 4 sketches the modelling process showing an example of user model augmentation.

3.2 Filtering Phase

During the filtering phase, the system compares any document in the site (i.e. its representation in terms of synsets) with the user model. A matching module receives as input the internal representation of a document and the current user model, and it produces as output a classification of the document (i.e. whether or not it is worth the user's attention). The relevance of any single document is estimated using the Semantic Network Value Technique (for details see Stefani and Strapparava, 1998). The idea behind the SiteIF algorithm consists of checking, for every concept in the representation of the document, whether the context in which it occurs has already been found in previously visited documents (i.e. already stored in the semantic net). This context is represented by a co-occurrence relationship, i.e. by the couples of terms included in the document which have already co-occurred before in other documents. This information is represented by arcs of the semantic net.

Below we present the formula used to calculate the relevance of a document using the Semantic Network Value Technique:

\[
\mathrm{Relevance}(doc) = \sum_{i \in syns(doc)} w(i) \cdot freq_{doc}(i) \; + \sum_{i,j \in syns(doc)} w(i,j) \cdot w(j) \cdot freq_{doc}(j)
\]

where w(i) is the weight of synset-node i in the UM network, and w(i, j) is the weight of the arc between i and j.
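The two phases can be illustrated with a minimal Python sketch. It is not the SiteIF implementation: the class name `UserModel`, its methods, and the example scores are hypothetical. It also rests on several assumptions the text leaves open: synset scores are precomputed, freq_doc(i) is read as the number of occurrences of synset i in the document, the second sum runs over ordered pairs with i different from j, and nodes or arcs missing from the net contribute zero. The periodic lowering of weights and the pruning of unused nodes are omitted.

```python
from collections import Counter
from itertools import combinations


class UserModel:
    """Toy semantic net: synset nodes with weights, co-occurrence arcs."""

    def __init__(self):
        self.node = {}     # synset id -> node weight w(i)
        self.arcs = set()  # frozenset({i, j}) for synsets that co-occurred

    def add_document(self, synsets, score):
        """Modelling phase: augment the net with one browsed document.

        synsets: iterable of synset ids found in the document
        score:   dict synset id -> score (assumed precomputed, higher for
                 synsets that are rarer in the corpus)
        """
        present = set(synsets)
        for s in present:
            # Node weights are incremented by the synset score.
            self.node[s] = self.node.get(s, 0.0) + score[s]
        for a, b in combinations(sorted(present), 2):
            # Record the co-occurrence of the two synsets as an arc.
            self.arcs.add(frozenset((a, b)))

    def arc_weight(self, i, j):
        """w(i, j): mean of the connected node weights, 0 if there is no arc."""
        if frozenset((i, j)) not in self.arcs:
            return 0.0
        return (self.node[i] + self.node[j]) / 2.0

    def relevance(self, synsets):
        """Filtering phase: relevance of a document given as a synset list."""
        freq = Counter(synsets)  # assumption: freq_doc(i) is the raw count
        # First sum: node weights of the synsets in the document.
        total = sum(self.node.get(i, 0.0) * n for i, n in freq.items())
        # Second sum: contexts (arcs) already stored in the net.
        for i in freq:
            for j in freq:
                if i != j:
                    total += (self.arc_weight(i, j)
                              * self.node.get(j, 0.0) * freq[j])
        return total


# Hypothetical scores for two synsets taken from Figure 3.
scores = {"church-1": 2.0, "basilica-1": 3.0}
model = UserModel()
model.add_document(["church-1", "basilica-1"], scores)
print(model.relevance(["church-1", "basilica-1", "basilica-1"]))  # 28.0
```

In this example the first sum gives 2.0 * 1 + 3.0 * 2 = 8.0, and the single stored arc (weight 2.5) contributes 2.5 * 3.0 * 2 + 2.5 * 2.0 * 1 = 20.0, so a document that repeats an already seen context scores well above one that only shares isolated synsets with the model.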