Recommendation of Web Pages Based on Concept Association

Mingyu Lu1,2, Qiang Zhou1, Fan Li1, Yuchang Lu1, Lizhu Zhou1
1 (Department of Computer Science and Technology, Tsinghua University, Beijing, 100084)
2 (Department of Computer Science and Engineering, Yantai University, Shandong, 264005)
E-mail: mylu99@mails.tsinghua.edu.cn

Abstract

Precision and recall are the two main criteria used to evaluate the performance of search engines. For general queries, precision is most important, but for specific users, e.g. scientific researchers and applicants for patent rights, recall is most important and is often ignored. In this paper, an approach to web page recommendation based on concept association is introduced. This approach can produce recommended web page links for users based on the information that associates strongly with users' queries, and therefore improve the recall of a search engine. In the paper, we discuss the meaning, effect and variety of concept association, and describe how to generate associational information related to users' queries and how to realize the association-based recommendation of web pages. We also present relevant experimental results and outline our further work.

1. Introduction

With the explosive growth of the Internet, the World Wide Web (WWW) is swiftly becoming a huge new information source[1]. But in reality, when a web user searches for information with a search engine, the traditional information retrieval techniques usually return only those web pages that include the key words in the query. The result page set contains too much irrelevant information and may miss some relevant pages. In such situations, precision and recall are poor and disappointing[1,2]. How to carry out effective information retrieval has become a very important and knotty problem.

Precision and recall are the two main criteria for evaluating the performance of search engines.
For general queries, precision is most important, but for specific users, e.g. scientific researchers and applicants for patent rights, recall matters most and is often ignored, so research on improving recall is of interest. Web information recommendation is an effective measure to improve the recall rate. Much work has been done in the area. It mainly falls into three categories:
(1) Recommendation based on the similarity between web pages[3].
(2) Recommendation based on the preferences and behaviors of group users[4].
(3) Recommendation based on the preference and behavior of an individual user[5].

We have noticed an interesting phenomenon which can be called concept association. For example, when we talk about "Windows", we naturally think of "Microsoft" or "Operating System", because we take it for granted that "Windows" is tightly associated with "Microsoft" or "Operating System". Therefore, when a user inputs "windows" as a keyword of his query, the search engine can provide information not only about "Windows", but also about "Microsoft" and "Operating System". Another example is that, if users use "9.11 Event" as a key word, the traditional information retrieval technique, which relies on key word matching, will just fetch back web pages with "9.11 Event" in them, while a search engine based on concept association can also get pages including "terrorist attacks", "World Trade Center", "Bin Laden", etc., because they are tightly related to the "9.11 Event". In these cases the recall will be significantly improved. Such examples are too numerous to enumerate one by one.

The research has been supported by the National Grand Fundamental 973 Program of China under Grant No. G1998030414 and the National Natural Science Foundation of China under Grant No. 79990580.
Unlike the three main methods above, the association-based approach we present for web page recommendation parses the user's query, extracts the key words from it, and provides candidate links pointing to web pages strongly associated with the query through concept association. It can help web users get more useful and interesting information from the WWW, and improve the efficiency and quality of information retrieval of a web search engine, especially its recall.

Proceedings of the 4th IEEE Int'l Workshop on Advanced Issues of E-Commerce and Web-Based Information Systems (WECWIS 2002) 1530-1354/02 $17.00 © 2002 IEEE

2. Taxonomy of concept association

In the paper, we define five kinds of concept association:

(1) Superclass association
It means associating a key word with its superclass concepts. For example, if we talk about "Windows", we will think of "Microsoft" or "Operating System".

(2) Subclass association
It means associating a key word with its subclass concepts. This kind of association is just the opposite of the superclass association. For example, "Microsoft" associates with "Windows" or "Office".

(3) Synonym/near synonym association
It means associating a key word with its synonym or near synonym concepts, including synonyms in other languages. For example, "KDD" can be associated with "Data Mining", or "Computer" with "calculator", "counting machine" and "calculating machine".

(4) Historic association
It means associating a key word with some other concepts related to it for historic reasons. For example, we can associate "Beijing" with "Great Wall" and "the Imperial Palace".

(5) Hotspot topic association
It means associating a key word with existing hot news topics that are tied up with it, for example, associating "Clinton" with "Lewinsky", or "9.11 Event" with "Terrorist Attacks", "World Trade Center", "Bin Laden", etc. This is a very useful kind of concept association.

3. Multi-knowledge extendible dictionary: a concept semantic web

The key to achieving the recommendation of web pages based on concept association is generating the association concepts for the key words of users' queries.

In order to build superclass, subclass, synonym/near synonym and historic association concepts for the key words in queries, our method is to extend a general dictionary into a multi-knowledge extendible dictionary (MKED), so that it includes multi-mode and multi-layer concept semantic knowledge.
The model of the multi-knowledge extendible dictionary, written in the manner of a data dictionary definition in software engineering, can be designed as:

MKED = {item + {item paraphrase + {synonym/near synonym association concepts} + superclass association concept + (subclass association concepts) + (historic association concepts) + {hotspot association concepts + URL of index page recommended}}}

Annotation: here '+' denotes juxtaposition, '{}' denotes repetition, and '( )' means option.

The dictionary can be viewed as a concept semantic web knowledge library. It extends a general dictionary as follows:
(1) Each item in the dictionary is a concept, namely a noun, gerund or nominal phrase, including names of celebrities, famous place names and general abbreviations, such as "Operating System", "Clinton", "Great Wall", "CRM", etc.
(2) The attributes of each item include not only a paraphrase, but also synonym/near synonym, superclass, subclass, historic, and hotspot association concepts, along with the URLs of the recommended correlative index web pages.
(3) Paraphrase and synonym/near synonym concepts can be filled into MKED by referring to a general dictionary, while the generation of superclass, subclass, historic and hotspot association concepts should rely on domain expert knowledge.
(4) The multi-knowledge dictionary actually makes up a concept tree owing to the existence of superclass concept pointers and subclass concept pointers, while by adding synonym/near synonym, superclass, subclass, historic and hotspot association concepts, it further makes up a concept semantic web. Every node in the web corresponds to a concept, which can have at most one parent node (superclass concept) and several child nodes (subclass concepts).
(5) The dictionary is composed of several child dictionaries, each of which corresponds to an application field, such as computer, macroeconomics, and Chinese traditional medicine.

With the multi-knowledge dictionary, the recommendation of web pages based on concept association can be realized easily.
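The MKED model and items (1) to (5) above can be sketched as a small data structure. This is only an illustrative sketch, not the authors' implementation: the class names `MkedItem` and `HotspotAssociation`, the helper `expand_keyword`, and the in-memory dictionary keyed by lower-cased item names are all assumptions made here; the sample entries reuse the examples given in Section 2.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class HotspotAssociation:
    """A hotspot concept plus the URLs of its recommended index pages."""
    concept: str
    urls: List[str] = field(default_factory=list)


@dataclass
class MkedItem:
    """One dictionary item (a concept) with its association attributes."""
    item: str
    paraphrase: str = ""
    synonyms: List[str] = field(default_factory=list)
    superclass: Optional[str] = None  # at most one parent node
    subclasses: List[str] = field(default_factory=list)
    historic: List[str] = field(default_factory=list)
    hotspots: List[HotspotAssociation] = field(default_factory=list)

    def association_concepts(self) -> List[str]:
        """Return all associated concepts in a fixed order, without duplicates."""
        candidates = list(self.synonyms)
        if self.superclass is not None:
            candidates.append(self.superclass)
        candidates += self.subclasses + self.historic
        candidates += [h.concept for h in self.hotspots]
        seen, ordered = set(), []
        for concept in candidates:
            if concept not in seen:
                seen.add(concept)
                ordered.append(concept)
        return ordered


# Hypothetical toy dictionary; the entries follow the paper's own examples.
mked: Dict[str, MkedItem] = {
    "windows": MkedItem(item="Windows", superclass="Operating System"),
    "microsoft": MkedItem(item="Microsoft", subclasses=["Windows", "Office"]),
    "beijing": MkedItem(item="Beijing",
                        historic=["Great Wall", "the Imperial Palace"]),
}


def expand_keyword(keyword: str) -> List[str]:
    """Look up a keyword in the toy MKED and return extra search keywords."""
    entry = mked.get(keyword.lower())
    return entry.association_concepts() if entry else []
```

Calling `expand_keyword("Microsoft")` on this toy dictionary yields the subclass concepts stored for that item, which a search front end could then submit as additional queries. Keeping `superclass` as a single optional value mirrors the "at most one parent node" constraint, while the other associations are lists because the model allows them to repeat.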
For a keyword extracted from the user's query, its association concepts found in MKED are used as new keywords to search the WWW. In addition to the normal result pages, the web pages fetched through concept association make up a new index page, containing several candidate links to the material pages, for the user to choose from.

4. The maintenance and scalability of MKED

In order to maintain the dictionary, we developed a dedicated dictionary maintenance tool, which can add, modify, delete and query items along with their attributes. At first, we proposed to add items (along with their paraphrase, synonym/near synonym, superclass, subclass
and historic concept associations) to MKED manually by web site administrators. The hotspot association concepts can be generated manually, or built automatically by a method based on association search, which is described in the next section. We built a computer-oriented multi-knowledge extendible dictionary as an experimental model, which includes 10 first-level concepts: computer software, computer hardware, computer web, computer press, computer market, computer manufacturer, computer education/training, computer issue, computer guideline and computer industry trends.

But in the above manner, the dictionary is difficult to scale, so we had to seek another way to achieve scalability. Fortunately, we found Hownet, an on-line common-sense knowledge base unveiling inter-conceptual relations and inter-attribute relations of concepts as connoted in lexicons of Chinese and their English equivalents. As a knowledge base, the knowledge structured by Hownet is a graph rather than a tree. Because we mainly work with Chinese search engines and operate on concepts, it is more suitable for our task than Wordnet[7], Mindnet[8] or Eurowordnet[9].

Hownet[10], created and maintained by Prof. Zhendong Dong and his son, Mr. Qiang Dong, is devoted to demonstrating the general and specific properties of concepts. It now consists of 110 thousand Chinese words and 57 thousand English words, and can explicate 16 kinds of relations, including hypernym-hyponym, synonym, antonym, converse, part-whole, etc.

In actual use, we expand our MKED by selecting from Hownet the nominal and gerundial words that count as concepts in our context and adding them to MKED. However, many relations between concepts in Hownet are implicit and embodied by special structures and positional information; for example, the hypernym-hyponym relation is embodied by the hierarchical indent structure in the feature files of Hownet.
Meanwhile, Hownet also provides some combined symbols to express combinations of more than two kinds of relations. So we proposed an approach to extracting the various relations in Hownet by creating three tables: the concept table, the feature table and the reference table[11].

First, we extract basic information about the various features, and descriptive information about the relations between features, from the feature files after a standardization process, and add them to the feature table and the reference table.

Second, we extract basic information about each concept from the concept dictionary and add it to the concept table. We also extract relations between concepts, relations between concepts and features, as well as combined relations, and add these three kinds of relations to the reference table.

Last, we sort the three tables, set their fan-in and fan-out pointers, and form an integrated reticular expression of Hownet.

Figure 1 is a sketch map that describes the construction of the integrated reticular expression of Hownet.

5. Automatic creation of hotspot association concepts based on association search

We can add hotspot association concepts to MKED by the manual binding method mentioned above, but that process requires users to master enough hotspot news knowledge, and it calls for more information than superclass, subclass, synonym/near synonym and historic associations do, because the hotspot news associated with a keyword changes frequently. With a proper means of building hotspot association concepts, web administrators' burden can be decreased greatly.

A ubiquitous phenomenon is that hotspot association concepts closely related to a certain key word of a query appear with higher frequency in the search result pages of this key word.
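The observation above can be illustrated with a minimal frequency count. The sketch below is an assumption-laden simplification, not the paper's procedure: it presumes each result page has already been reduced to a list of noun phrases (the hypothetical `pages` argument), it uses raw occurrence counts rather than any weighting scheme, and the function name `rank_candidate_terms` is invented here for illustration.

```python
from collections import Counter
from typing import Iterable, List, Set


def rank_candidate_terms(pages: Iterable[List[str]],
                         exclude: Set[str],
                         n: int) -> List[str]:
    """Rank terms by how often they occur across the result pages.

    `exclude` holds the query keyword and any concepts already known to be
    associated with it, so that only new candidates are returned.
    """
    excluded = {term.lower() for term in exclude}
    counts: Counter = Counter()
    for terms in pages:
        for term in terms:
            key = term.lower()
            if key not in excluded:
                counts[key] += 1
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]


# Toy result pages for the keyword "9.11 Event" (made-up data).
pages = [
    ["9.11 Event", "terrorist attacks", "World Trade Center"],
    ["terrorist attacks", "World Trade Center", "Bin Laden"],
    ["terrorist attacks"],
]
top_terms = rank_candidate_terms(pages, {"9.11 Event"}, 2)
```

On this toy input the two most frequent remaining terms are "terrorist attacks" and "world trade center", which matches the intuition that concepts tied to a hot topic recur across its result pages.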
[Fig. 1. The construction of the reticular representation of Hownet. Recoverable labels: feature files, creation tree, feature table, feature relation table, concept relation table, reticular Hownet.]

Assuming K is a keyword of a query and P = {p1, p2, …, pn} is the set of web pages returned as the general search results of K, we can automatically build hotspot association concepts based on association search according to the following steps:
(1) Extract feature items from P using VSM (Vector Space Model), a feature extraction method based on word frequency[5]. Each feature item must be a noun, gerund, or nominal phrase; together they make up a feature set F;
(2) Sort the feature items in F by their frequencies in P and the update time of the web pages, and select the first n feature items f1, f2, …, fn that differ from K and from its superclass, subclass, synonym/near synonym and historic association concepts. All of the selected feature items are viewed as hotspot association concepts related to K;
(3) Use the first n selected feature items f1, f2, …, fn to make up a new index page Pa whose title is “The relative links associated with keyword &K”;
(4) Use a selected feature item fi as a new keyword to search the web and obtain a result page set Pi;
(5) Select m web pages from Pi and register their URLs U1, U2, …, Um in Pa;
(6) Fill fi and U1, U2, …, Um into the multi-knowledge dictionary MKED as a hotspot association concept of K and its corresponding URLs;
(7) Repeat steps (4)-(6) until all selected feature items have been processed.

The algorithm for automatically building hotspot association concepts based on association retrieval is described below.

Algorithm 1: Automatic generation of the association information
Input: Keyword K, Integer n, m
Output: hotspot topic association concepts of K
{
  Search target web page set P in the web site with K as a query;
  IF P ≠ Ø THEN
  {
    feature set F = feature_abstract(P, K);
    F = feature_select(F, n);
    Make up a new index page Pa whose title is “The relative links associated with keyword &K”, by using the first n feature items f1, f2, …, fn in F;
    WHILE F ≠ Ø DO
    {
      Get a feature item fi from F;
      Search target web page set Pi in the web site with fi as a query;
      Pi = page_select(Pi, fi, m);
      WHILE Pi ≠ Ø DO
      {
        Get the URL U of a web page PG in Pi;
        Add information of hotspot topic association concept of K to MKED with:
        { K.hotspot_topic_association_concept = fi;
          K.URL_of_hotspot_topic = U };
        Delete PG from Pi
      };
      Delete fi from F
    }
  }
  ENDIF
}

Explanation for the Algorithm
(1) feature_abstract(P, K) is a function that returns the feature items extracted from the target page set P, excluding the keyword K as well as its superclass, subclass, synonym/near synonym and historic association concepts. Here, we adopt some mature natural language understanding and text classification techniques, including:
• Separating single words and phrases from the content text of a web page;
• Marking the part of speech of each word or phrase;
• Selecting nouns, gerunds, or nominal phrases as candidate feature items;
• Calculating the weight of each candidate feature item. The weight w of a candidate feature item I is defined using the TF-IDF function of the VSM model [6] as
    w(I) = tf(d, I) × log(N / N_I)
  where tf(d, I) is the frequency of I in document d, N is the number of all documents, and N_I is the number of documents that contain I.
(2) feature_select(F, n) is a function that selects association concepts. It orders the feature items in the feature set F by weight and picks the n items with the highest weights as association concepts. By experiment, the number of association concepts selected should not exceed […].
(3) page_select(Pi, fi, m) is a function that selects web pages: from the page set Pi it selects the m (m ≤ 20) pages with the highest frequency of feature item fi as target pages.

6. Recommendation of Web Pages based on MKED

When a user inputs a query into a search engine, the association-based recommendation method supplies not only the normal search results but also recommended page links that are tightly associated with the query. In this case, processing a query includes the following steps:
1) Extract keywords (nouns, gerunds, or nominal phrases) from the user’s query. To do so, we take advantage of natural language learning techniques to build a simple query analyzer. It separates independent words or phrases from the query, marks them with their parts of speech, and selects nouns, gerunds, or nominal phrases as keywords.
2) Search for target web pages in the search engine with the user’s query. If the search is successful, the search engine returns the normal query result to the user; otherwise it informs the user that nothing was found.
3) Look up MKED with each keyword to get association information. If found, the URLs and titles of the associated web pages are taken from MKED and appended to the normal query result as candidate links.
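As a rough illustration, the offline generation of hotspot association concepts (Algorithm 1) and the MKED lookup of step 3 can be sketched together in Python. This is a minimal sketch under several assumptions that are not part of the paper: the toy corpus `PAGES` and its URLs are invented, a whitespace tokenizer stands in for part-of-speech based feature extraction, the weight of a feature is summed over the target pages, only the keyword's own tokens are excluded from the features, and MKED is reduced to a plain dictionary holding the hotspot table.

```python
import math
from collections import Counter

# Toy corpus standing in for a web site (hypothetical URLs and texts).
PAGES = {
    "http://example.com/a": "pentium 4 chip chip price",
    "http://example.com/b": "pentium 4 chip benchmark",
    "http://example.com/c": "chip price competition",
    "http://example.com/d": "football news",
    "http://example.com/e": "weather news",
    "http://example.com/f": "travel news",
}

def tokenize(text):
    """Whitespace tokenizer; an assumption replacing POS-based extraction."""
    return text.lower().split()

def search(pages, query):
    """Return URLs of pages that contain every token of the query."""
    terms = set(tokenize(query))
    return [url for url, text in pages.items() if terms <= set(tokenize(text))]

def feature_abstract(pages, target_urls, keyword):
    """Weight features of the target pages with w(I) = tf(d, I) * log(N / N_I)."""
    excluded = set(tokenize(keyword))  # simplification: only K itself is excluded
    n_docs = len(pages)
    doc_freq = Counter()
    for text in pages.values():
        doc_freq.update(set(tokenize(text)))
    weights = Counter()
    for url in target_urls:
        for term, tf in Counter(tokenize(pages[url])).items():
            if term not in excluded:
                # Assumption: per-document weights are summed over the target set.
                weights[term] += tf * math.log(n_docs / doc_freq[term])
    return weights

def feature_select(weights, n):
    """Keep the n features with the highest weights (ties broken alphabetically)."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]

def page_select(pages, urls, feature, m):
    """Keep the m pages in which the feature occurs most often."""
    return sorted(urls, key=lambda u: (-tokenize(pages[u]).count(feature), u))[:m]

def build_hotspot_associations(pages, keyword, n, m, mked):
    """Algorithm 1: record (concept, URL) pairs for the keyword in mked."""
    target = search(pages, keyword)
    if not target:
        return
    features = feature_select(feature_abstract(pages, target, keyword), n)
    for feature in features:
        for url in page_select(pages, search(pages, feature), feature, m):
            mked.setdefault(keyword, []).append((feature, url))

def recommend(mked, query_keywords):
    """Step 3: look up each keyword; the result is empty if nothing is found."""
    links = []
    for kw in query_keywords:
        links.extend(mked.get(kw, []))
    return links

mked = {}
build_hotspot_associations(PAGES, "pentium 4", n=2, m=2, mked=mked)
print(recommend(mked, ["pentium 4"]))
```

With this toy corpus, "chip" is selected because it is frequent in the pages retrieved for the keyword, while the logarithmic factor lowers the weight of terms that occur in many documents of the corpus.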
If the search engine fails to find any association information, no recommended web page link is appended to the normal result web page.
4) Create the association connection.
If the user selects a recommended link, the search engine then builds a new link in real time to display the related index page, using the URL of the index page that corresponds to the association concept.

The processing of web page recommendation based on association is described in Fig. 2.

[Flow chart. Legend: User; Web site administrator. Nodes: UI of search engine; Extract keywords; Search the web with user’s query; successful?; Inform user; Display normal result web page; Look up MKED; successful?; Recommend relating web page; Selected?; Create association links; End; Create web page; Extract keywords from title of web page; Build association information automatically?; Build association information manually; Search the web with user’s query; Extract feature items from target pages; Sort feature items; Select top-n feature items as association concepts; Search web using hotspot association concepts; Add hotspot association concepts.]
Fig. 2 The processing flow chart of recommendation of web pages based on concept association.

7. Experiment: A case study

We now present the effect of web page recommendation based on concept association through a query with the single keyword “Pentium 4”, as Fig. 3 shows.

In Fig. 3, the multi-knowledge extendible dictionary MKED consists of the main table along with five association concept tables: superclass, subclass, synonym/near synonym, historic and hotspot. The contents of the hotspot association concept table for “Pentium 4” are built automatically by the hotspot concept generating method based on association retrieval, while the contents of the other tables for “Pentium 4” are added manually using the maintenance tools of MKED.

When we input “Pentium 4” as a query into our experimental search engine, we can get its normal search
result, as well as recommended links about “Pentium 4”, as Fig. 4 shows.

Main table of MKED
  Item         Paraphrase
  …            …
  Pentium 4    a new type of chips by Intel company
  …            …

Superclass association concept table of MKED
  Item         Superclass association concept
  …            …
  Pentium 4    computer
  Pentium 4    computer manufacturer
  Pentium 4    foreign computer manufacturer
  Pentium 4    Intel
  …            …

Subclass association concept table of MKED
  Item         Subclass association concept
  …            …
  Pentium 4    Sale of Pentium 4
  Pentium 4    Service of Pentium 4
  Pentium 4    FAQ of Pentium 4
  …            …

Synonym/near synonym association concept table of MKED
  Item         Synonym/near synonym association concept
  …            …
  Pentium 4    P4
  Pentium 4    Pentium 4 processor
  Pentium 4    Pentium 4 micro-processor

Historic association concept table of MKED
  Item         Historic association concept
  …            …
  Pentium 4    Pentium 3
  Pentium 4    Itanium
  …            …

Hotspot topic association concept table of MKED
  Item         Hotspot association concept
  …            …
  Pentium 4    chip
  Pentium 4    micro-processor
  Pentium 4    CPU price competition
  …            …

Fig. 3 Information records about “Pentium 4” in multi-knowledge dictionary

Following is the links associated with your query “Pentium 4”
  Pentium
  Pentium 4 processor
  Pentium 4 micro-processor
  Sale of Pentium 4
  Service of Pentium 4
  FAQ about Pentium 4
  Pentium3
  Itanium
  Computer
  Computer manufacture
  Foreign computer manufacture
  Intel Corp.
  Chip
  Micro-processor
  CPU price competition

Fig. 4 Recommended links associated with “Pentium 4”

8. Conclusion and further works

Recommendation of web pages based on association is a kind of active web information retrieval service that supplies users with more valuable information. It can be applied in many areas, especially to automate tasks such as the following:
• Recommending information that the user really needs, especially in E-Commerce;
• Generating correlative links closely related to a web page;
• Building new index pages;
• Updating web site information automatically.

We believe the technique is useful and meaningful, and we will work to improve it further through practice and apply it in real work. We have contacted an economic web site, ‘Goldenway’ (http://www.goldenway.com.cn), to recast its search engine using our method.

Our further work will focus on the following aspects:
(1) Enhancing the precision of extracting keywords from users’ queries and feature items from web page texts;
(2) Optimizing the number of selected concepts and the number of web pages associated with a keyword;
(3) Improving the effect of association by analyzing the sequence of a web user’s queries and understanding their intentions;
(4) Using the taxonomic architecture of a web search engine to create the superclass and subclass association concepts of a keyword automatically;
(5) Studying the implementation of XML-based conceptual association.

We have completed the preparation for part of the above work and will report our latest developments in the future.

9. References

[1] Venkat N. Gudivada et al. Information retrieval on the World Wide Web. IEEE Internet Computing, 1997, 1(5):55-68.
[2] Christos Faloutsos and Douglas Oard. A survey of information retrieval and filtering methods. Technical report, http://citeseer.nj.nec.com/faloutsos96survey.html.
[3] Marko Balabanovic. An adaptive web page recommendation service. In Proceedings of the 1st International Conference on Autonomous Agents, Marina del Rey, CA, 1997.
[4] Chumki Basu, Haym Hirsh and William Cohen. Recommendation as classification: using social and content-based information in recommendation. In Proceedings of AAAI-98, 1998.
[5] Eric J. Glover, Steve Lawrence, Michael D. Gordon, William P. Birmingham and C. Lee Giles. Recommending web documents based on user preferences. In Proceedings of the SIGIR 99 Workshop on Recommender Systems, 1999.
[6] T. Mitchell. Machine Learning. McGraw Hill, 1997.
[7] Miller G. WordNet: An On-line Lexical Database. International Journal of Lexicography, 3(4), 1990.
[8] Richardson S. D., Dolan W. B. and Vanderwende L. MindNet: acquiring and structuring semantic information from text. In Proc. of COLING-ACL’98, 1098-1102, 1998.
[9] EuroWordNet: Building a multilingual database with wordnets for several European languages. http://www.hum.uva.nl/~ewn/.
[10] Hownet. http://www.keenage.com/
[11] Zhou Qiang, Feng Songyan. Build a relation network representation for Hownet. Journal of Chinese Information Processing, 2000(5).
Proceedings of the 4th IEEE Int’l Workshop on Advanced Issues of E-Commerce and Web-Based Information Systems (WECWIS 2002) 1530-1354/02 $17.00 © 2002 IEEE