A blog article recommendation generating mechanism using an SBACPSO algorithm

Tien-Chi Huang (a), Shu-Chen Cheng (b), Yueh-Min Huang (a,*)

(a) Department of Engineering Science, National Cheng Kung University, No. 1, Ta-Hsueh Road, Tainan 701, Taiwan, ROC
(b) Department of Computer Science and Information Engineering, Southern Taiwan University of Technology, No. 1, Nantai St., Yung-Kang City, Tainan 710, Taiwan, ROC
(*) Corresponding author. Tel.: +886 6 2757575x63336; fax: +886 6 2766549.
E-mail addresses: kylin@easylearn.org (T.-C. Huang), kittyc@mail.stut.edu.tw (S.-C. Cheng), huang@mail.ncku.edu.tw (Y.-M. Huang).

Expert Systems with Applications 36 (2009) 10388–10396
journal homepage: www.elsevier.com/locate/eswa
0957-4174/$ - see front matter © 2009 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2009.01.039

Keywords: Blog article recommendation; Information retrieval; Forgetting rate; SBACPSO; Factor analysis

Abstract

In recent years blog-assisted learning has been used widely in higher education for improving writing and collaboratively sharing work online. However, methods for gathering useful information to be used as auxiliary-learning materials from the multitude of blog articles in the blogosphere have seldom been investigated. This paper proposes an individualized blog article recommendation mechanism to provide quality blog articles that accord with users' learning topics. First, an IR-based technique was applied to extract and score index terms. The top three index terms were then entered into Google's blog search engine to find the raw recommended blog articles. To avoid the situation where frequent topic-changing leads to a deficiency of article data on a specific learning topic, a forgetting rate was employed to simulate the phenomenon of changing learning topics. Subsequently, an extended Serial Blog Article Composition Particle Swarm Optimization (SBACPSO) algorithm was employed to provide optimal recommended materials to users. We evaluated the system's performance to find the appropriate article population size. Finally, user satisfaction regarding both the system and the recommended content was gauged to find the system's limitations and possible improvements. This study is of importance in that it provides users with dynamic blog article recommendation, improved online information discovery skills and opportunities to socialize with other bloggers.

1. Introduction

Weblogs, also known as blogs, have been around for many years and the number of bloggers who post blog articles is growing rapidly. The term 'blog' is both a noun and a verb, and all blogs are encompassed in the 'blogosphere'. According to Technorati statistics, over 112.8 million blogs have been recorded and over 250 million pieces of tagged social media exist. Over 1.6 million blog entries are updated every day. Additionally, in July 2004, The Register, a British technology news and opinion website, reported that a new blog is created nearly every 5.8 s and more than three blogs are updated per second on average. There has been a gradual increase in awareness that blogs are a connected community and social network. In some cases blogs are being used as online journals and, as such, are being used more and more frequently as viable educational resources at the postsecondary level (Dron, 2003; Lin, Yueh, & Liu, 2006; Smith, 2007).

Over the last few years there has been a dramatic increase in the number of publications on educational blog research. Most educational blog studies regard blogs as a reflective learning tool that assists students in developing insight and critical thinking skills (Brooks, Nichols, & Priebe, 2003; Fernheimer & Nelson, 2005; Ganley, 2004; Hall & Davison, 2007; Instone, 2005; Stiler & Philleo, 2003; Williams & Jacobs, 2004). Previous research has used educational blog entries published by students to generate auxiliary materials using a PSO-based algorithm, Serial Blog Article Composition Particle Swarm Optimization (SBACPSO). The results showed that this approach can produce quality materials efficiently. The materials generated offered high satisfaction, which was evaluated for interaction, assistance, usability, and flexibility aspects (Huang, Huang, & Cheng, 2008).

Many researchers have also studied methods of extracting useful information from blogs using various machine learning techniques. The PLSA-based (probabilistic latent semantic analysis) approach has been used to determine common themes of blogs, and then generate spatiotemporal life cycle patterns of blogs via time and location information (Mei, Liu, Su, & Zhai, 2006). Results indicated that this method effectively uncovered patterns in blog themes related to the time and location at which they were created. A similar PLSA-based approach has also been adopted to separate business blogs by topic, which is useful for keyword detection (Chen, Tsai, & Chan, 2008). These contributions focused on high quality information extraction from business and corporate blog entries and on combining blog search engine technology with keyword extraction related to specific topics. Additionally, early research into trend discovery for blogs can be traced back to Glance, Hurst, and Tomokiyo (2004). They used natural language processing (NLP)
algorithms to analyze trends across nearly 100,000 blog entries. Trend searching has been conducted to draw the normalized trend line over time and estimate the buzz of a word for given time-related topics. Moreover, in order to evaluate methods for ranking term significance in an RSS feed corpus, three statistical feature selection methods were proposed: $\chi^2$, Mutual Information (MI) and Information Gain (I) (Prabowo & Thelwall, 2006). Although they concluded that $\chi^2$ seems to be the best method of the three, a full human classification might be required for further evaluation of these methods.

The use of blogs in higher education can be traced back to 1999 in Australia (Brisbane Graduate School of Business at the Queensland University of Technology). Williams and Jacobs (2004) found that blogging had the potential to transform teaching and learning. In the United States, many universities have employed blogs as teaching and learning devices, as shown in Table 1.

The literature is full of discussions of the uses of blogs in higher education, and most of the research comes from Western countries. In Asia it is an area that is comparatively under-researched and under-discussed. Therefore, we will build on previous studies to further investigate blogs in this paper.

As mentioned earlier, numerous studies have examined the educational and technological aspects of blogs. However, to date, there has been little research regarding the most frequently asked questions about blogs: "How can people find my blog articles after I've posted them in my blog?" and "How can I bring the public to my blog?" Since the volume of existing blog posts is so large, and their format is so diverse, answering these questions is very difficult. Therefore, a mechanism that recommends similar blog articles to readers who read blogs with specific topics would be timely and useful. In this paper we present a systematic process for blog article recommendation.
First, we employ an information retrieval technique to extract the relevant terms associated with individuals. Subsequently, the memory strength of each relevant term is measured by the forgetting rate. Then, Google's blog search engine is employed to carry out blog information discovery. Finally, the extended SBACPSO algorithm is applied to find the best combination of blog articles for individual recommendations.

This paper is organized as follows. Section 2 outlines the blog article recommendation system architecture. Research methodology is presented in Section 3. Section 4 describes the procedures used in the experiment, the results, and a discussion of the results. Finally, in Section 5, conclusions are drawn and discussed.

Table 1
Blog applications in American higher education settings.

University: Description

North Dakota State University: Researchers examined whether a motivated blogger could become a better writer in other writing genres and employed personal weblogs to investigate the relationship between motivations and remediate writing genres (Brooks et al., 2003). They concluded that although blogging is not a replacement for writing instruction, it is a worthy writing activity for college courses.

The University of Virginia: A blogging platform was devised that provided teachers with a collaborative social setting in which they were guided to corresponding digital reading archives and could further share their thoughts concerning media and literacy (Bull, Bull, & Kajder, 2003). The students' blogs were used as a response journal. Blog posts triggered reflections, while comments and feedback were delivered via e-mail.

Middlebury College: A weblog was used as a course management tool that delivered all necessary information to students, including the syllabus, links to supplementary online materials, students' discussions, and all student work (Ganley, 2004).

East Carolina University: Weblogs were incorporated into an educational leadership graduate course in 2003 with the goal of using them as the main tools for improving the quality of student work with respect to professional conference style presentations and academic article writing (Martindale & Wiley, 2004).

The University of Texas at Austin: Researchers explored the use of weblogs in their classrooms and investigated whether weblogs were able to create an agonistic, deliberative, and collaborative community. The students were encouraged to freely express their opinions and ideas (Fernheimer & Nelson, 2005).

2. Individual blog article recommendation architecture

This section outlines the architecture of a mechanism that has the ability to intelligently and automatically recommend blog articles (also called blog entries) according to users' reading and writing behaviors, as depicted in Fig. 1.

Fig. 1. Blog article recommendation system architecture.
In this system, bloggers are regarded as learners or students who receive recommendations. In the first step, all blog articles are collected in the blog knowledge base, which is a database that records the author, time of posting, title, content, and associated category of each blog article. The commentaries made in each blog article are stored in the blog knowledge base as well. Second, using information retrieval (IR) techniques, the extraction agent extracts individual keywords from blog articles, which consist of the articles read and written by each blogger. Hence, these keywords are highly associated with the individual. The third step employs Google's blog search engine to search for blog articles associated with the keywords. The articles are considered raw recommendation materials. Fourth, using the SBACPSO algorithm proposed in our previous study (Huang et al., 2008), the best combination of blog articles is generated. These are the elaborated recommendation materials. Finally, individuals receive the recommendation. This process is asynchronous: when a user logs out of the learning platform, the process begins in the background. The next time he/she accesses the learning platform, the system automatically provides recommendations according to the individual's blog reading and writing habits. It is worth mentioning that, contrary to our previous study, which generated an RSS feed for learner subscription, learners in this system obtain the recommended blog articles without subscribing to any feed. The RSS aggregator collects the recommended blog articles and then automatically pushes them into the user interface as an entry without user manipulation.

With regard to the execution procedure of the extraction agent, we addressed two factors. The first was the type of human behavior, and the other was the forgetting rate of specific keywords. There are three types of human interaction with blog articles: reading, posting (writing), and commenting.
Although the keywords are likely to appear in articles associated with these three behaviors, we argue that the value of a keyword is more significant to an individual when it appears blog articles written by the individual, as opposed to those read by the individual. Hence, the keywords that appear in the blog articles written were given more weight than those in the blog articles read. We also applied the concept of strength of memory versus the decline of memory retention to model the rate of forgetting with respect to keywords. This concept was proposed by Ebbinghaus (1913) to describe the durability of memory traces in the brain, which have an exponential nature of forgetting. The blog articles explored by Google’s search engine needed to be screened for quality, so the SBACPSO algorithm was applied and revised to find out the quality articles for recommendation. 3. Methodology This section describes the mathematical model used in our blog recommendation mechanism. 3.1. Keyword processing The first step is keyword processing, which includes keyword extraction and scoring. Keyword extraction uses an IR-based technique to extract keywords from individual blog entries in the blog knowledge base. The forgetting rate of specific keywords was considered when modeling the significance of keywords. 3.1.1. Extraction and scoring Blog articles contain a title, content, and comments. This is similar to discussion forums, so we expanded the formal definitions of our previous forum study (Huang, Chen, Kuo, & Jeng, 2008) and added behavior parameters for analyzing different types of blog articles. The formal definitions are as follows: Definition 1 (Blog entry set). E = {e1,e2,...,en} is a blog entry set of n blog entries, where 8ei 2 E; i > 0; n P 0. Definition 2 (Index term set). Let t be the index term. Then a blog entry contains n index terms and can be represented as e ¼ ft1;t2; ... ;tng; where8ti 2 e; i > 0. 
Each index term can be regarded as a specific domain of knowledge in this study. Definition 3 (TFIDF). In the IR vector model, assuming that there are t index terms represented a blog entry ej, and then ej can be expressed as a vector ~ej ¼ ðw1; j; wj; ... ; wt; jÞ, where wi,j is the ith term weight of e ~j in t dimension space. wi,j is a significance of ith index term to the blog entry ej, and it is usually measured by the best known term-weighting schemes as shown in Wi;j ¼ fi;j idfi bi ð1Þ where fi,j is the normalized frequency of index term ti in blog entry ej, and is represented as fi;j ¼ freqi;j maxlfreql;j p ð1 þ rÞ. The parameters contained in the normalized frequency are described as follows: freqi,j, the raw frequency of index term ti appearing in blog entry ej, maxl freql,j, the highest frequency of index term ti in blog entry ej, p, the highest weight of a index term’s position in a blog entry. The value is represented as title area (p = 1.5), main posting area (p = 1.3), and comment area (p = 1.0), r, the reply factor indicates the additional significance of an index term in a blog entry. Detailed descriptions of the parameters p and r can be found in the previous study (Huang et al., 2008). The idf factor means that index terms that appear in many blog entries are not very meaningful for distinguishing a relevant entry from a non-relevant one. In addition to the famous tf*idf equation, each index term is given a behavior parameter (bi) according to the type of a blog entry. The value of behavior parameter is set as follows: bi ¼ 1 if ith index term appears in a read posting 1:25 if ith index term appears in a self-written posting 1:25 if ith index term appears in a comment posting 8 >: Behavior parameter b gauges the significance of each index term according to posting type. In read blog entries learners read and learn the index terms, while in written blog entries and comments they apply specific index terms. 
So, the behavior parameter value of index terms from written postings or comments is higher than that of index terms from read postings.

3.1.2. Rating extracted keywords by forgetting rate

An individual may read blog articles about many different topics, which results in frequent topic changes. The forgetting rate simulates this phenomenon of topic changing in recommendations. According to the formula defined by Ebbinghaus, memory retention is represented as $e^{-t/S}$, where $S$ is the relative strength of memory and $t$ is time. Let $\phi_{i,k}(t_0)$ be the initial memory strength of the $k$th learner with regard to the $i$th index term (domain knowledge). Then $\phi_{i,k}(t_0 + \Delta t)$ is defined as the remaining memory strength after time $\Delta t$, which is formulated as Eq. (2). Fig. 2 shows the forgetting curves, which illustrate the variation of remaining memory strength under different parameter sets.

\[ \phi_{i,k}(t_0 + \Delta t) = e^{-\Delta t / S}\, \phi_{i,k}(t_0) \tag{2} \]
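As a rough illustration of Eqs. (1) and (2), the following Python sketch computes a term weight and the remaining memory strength. The function names, the dictionary keys, and the sample inputs are hypothetical and chosen only for this example; the $idf$ value is passed in directly because its exact formula is not given in this section.

```python
import math

# Position weights and behavior parameters as quoted in the text.
# The dictionary keys are illustrative labels, not names from the paper.
POSITION_WEIGHT = {"title": 1.5, "main": 1.3, "comment": 1.0}
BEHAVIOR = {"read": 1.0, "written": 1.25, "comment": 1.25}


def term_weight(freq, max_freq, position, reply_factor, idf, behavior):
    """Eq. (1): w = f * idf * b, with f = freq / max_freq * p * (1 + r)."""
    p = POSITION_WEIGHT[position]
    f = freq / max_freq * p * (1.0 + reply_factor)
    return f * idf * BEHAVIOR[behavior]


def remaining_strength(initial, elapsed, s):
    """Eq. (2): phi(t0 + dt) = exp(-dt / S) * phi(t0)."""
    return math.exp(-elapsed / s) * initial


# A term that appears 3 times in the title of a self-written entry whose
# most frequent term appears 6 times (idf = 2.0 and r = 0 are sample values).
print(term_weight(3, 6, "title", 0.0, 2.0, "written"))

# After 20 days, a larger S leaves more of the initial strength.
print(remaining_strength(1.0, 20, 20), remaining_strength(1.0, 20, 100))
```

Taking `idf` as an argument keeps the sketch independent of any particular idf definition, so it can be combined with whichever variant the knowledge base uses.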
Memory strength decreases exponentially with time, and the stronger the relative memory strength is, the slower the decay. If an individual learner recalls the material immediately after learning or studying, he/she has the highest remaining memory strength.

Although the remaining memory strength regarding a specific domain of knowledge gradually decreases with time, it is strengthened if a learner learns the same or related learning materials again. As mentioned before, learning behaviors include blog entry reading, writing, and commenting. Each form of learning strengthens the memory of a domain of knowledge. However, the intensity of memory strengthening is not equal across the learning behaviors. We defined an intensity factor, $a$, to describe the intensity difference among these three learning behaviors. The values of the intensity factor and the corresponding descriptions are outlined in Table 2. The enhanced effect of memory strength using the intensity factor is proposed on the basis of the behavioral factors suggested by Huang et al. (2008).

With the enhanced effect of memory strength, Eq. (2) can be reformulated to represent how memory strength is maintained:

\[ \phi_{i,k}(t_0 + \Delta t) = e^{-\Delta t / S} \prod_{l=1}^{n} e^{a_l}\, \phi_{i,k}(t_0) \tag{3} \]

where $n$ is the number of blog entries in which the $i$th index term appears. Each index term for a learner is given a score calculated as the product of the results of Eqs. (1) and (3). To avoid an overwhelming information inflow, only the top three index terms are selected as query terms. They are provided to the blog search engine introduced in the next subsection.

3.2. Blog search query

The extracted keywords are regarded as learning elements associated with the learner. Hence, conducting a keyword-based search on the web can retrieve content containing the keywords. Like common search engines, blog search engines are given keywords and then retrieve keyword-related content.
The main difference between the two kinds of search engines is that blog search engines mainly index blogs and ignore the rest of the web. According to the results of Thelwall and Hasler (2007), Google's blog search engine (http://blogsearch.google.com) not only covered a great number of blog articles associated with various topics, but also returned the least spammed blog articles. In addition, other studies have investigated tourism blogs and collected blog documents using Google's blog search engine (Li, Liu, & Yu, 2006; Nalin & Mohan, 2008).

In this study, Google's blog search engine was used to retrieve the related blog content as raw recommended materials. The three highest-scoring index terms for the learner are entered into the search engine. Thus, the search range is small in scope, so the results do not diverge. The retrieved blog articles are stored in our blog knowledge base, which is organized by author, date, title, content, comment count, and trackback count. Furthermore, the number of external sources that link to an article is considered and calculated as one of the categorized factors.

3.3. Model design for blog article composition with SBACPSO algorithm

The raw recommendation materials found by the blog search engine are processed according to several criteria. In our previous study, we demonstrated that a PSO-based algorithm, SBACPSO, can evaluate blog entries according to the quality of their auxiliary learning materials (Huang et al., 2008). In this study, we propose an extended model based on the four indicators of the SBACPSO algorithm. The linked factor, which refers to the number of external sources that link to the blog article, has been included in the extended model. This factor is used to determine whether a blog article has significant reference value according to other online resources.
Additionally, the linking factor, which evaluates the number of links in a blog article that point to other online resources, is also used to estimate the information value of the blog article. The difference between the two factors is this: the linked factor examines the number of incoming links, while the linking factor examines the number of outgoing links. Since the blog entries are collected from the web, it is difficult to analyze their difficulty automatically, so we are not concerned here with the difficulty of blog entries.

Fig. 2. Ebbinghaus' forgetting curve with different parameter sets (t = 1, S = 20, 50, and 100).

Table 2
Learning behavior and corresponding intensity factor value.

| Learning behavior description | Value of intensity factor (a) |
|---|---|
| Reading a blog entry related to a specific domain of knowledge | 0.08 |
| Writing (posting) a blog entry related to a specific domain of knowledge | 0.1 |
| A comment mentioning a specific domain of knowledge | 0.1 |

The formal definitions of the variables used in the objective function are as follows:

$N$: the number of blog articles in a search result,
$ol_i$, $1 \le i \le N$: the number of outgoing links from the $i$th blog article,
$il_i$, $1 \le i \le N$: the number of incoming links to the $i$th blog article,
$r_i$, $1 \le i \le N$: the degree of association between the $i$th blog article and the searched keywords,
$c_i$, $1 \le i \le N$: the number of comments posted in reply to the $i$th blog article,
$t_i$, $1 \le i \le N$: the number of trackbacks to the $i$th blog article,
$x_i$, $1 \le i \le N$: the decision variable, which is set to one if the $i$th blog article is selected in a composition and to zero otherwise,
$l$, $u$: the lower and upper bounds, respectively, of the expected comment level for each set of blog articles searched,
$h$: the lower bound of the expected relevance of the searched topic,
$p$: the lower bound of the expected total number of outgoing links in the search result,
$q$: the lower bound of the expected total number of incoming links in the search result,
$M(x)$: a double sigmoid function mapping an integer $x$ into the real unit interval [0, 1],
$C(x)$: a membership function mapping the number of outgoing links into a degree.

We apply a double sigmoid function $M(x)$ to map the number of incoming links $x$ into the real unit interval [0, 1]. The closed form is

\[ M(x) = \mathrm{sign}(x)\left(1 - \exp\left(-\left(\frac{x}{s}\right)^{2}\right)\right). \]

The parameter $s$ is a control variable that controls the rate at which the curve increases. The value of this parameter can be set according to the recommendation quality. Fig. 3 depicts the results for two different control variables. It shows that the smaller the control variable is, the steeper the slope will be.

The membership function $C(x)$ is used to map the number of outgoing links of a blog article into the real unit interval [0, 1]. The form of this function, as defined by Chen, Huang, and Chu (2005), is as follows:

\[ C(x) = \begin{cases} 0, & x \le x_1 \\ \dfrac{x - x_1}{x_2 - x_1}, & x_1 < x < x_2 \\ 1, & x \ge x_2 \end{cases} \tag{4} \]

where $x_1$ and $x_2$ indicate the lower and upper bounds, respectively, of the number of outgoing links.

In addition, to transform numeric comment data into a symbol, we apply a fuzzy concept to fuzzify the comment data. The comment number of each blog article is fuzzified into one of three levels using the three membership functions shown in Fig. 4, where $z$ denotes a blog article's comment number as a fraction of the maximum comment number in the search result. The three membership functions are listed as Eqs. (5)–(7). The results, L, M, and H, respectively denote "Low comment number", "Moderate comment number", and "High comment number". The given values are assigned to each result (L = 0, M = 0.5, and H = 1). For example, if the comment number of blog article b is 45 and the maximum comment number in the search result is 150, the fraction for blog article b is 0.3 (= 45/150). Hence, blog article b gets the fuzzy value (0.5, 0.5, 0).
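The two link-mapping functions described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names and all sample arguments (including the bounds 2 and 8 and the control values 10 and 30) are hypothetical.

```python
import math


def incoming_link_score(x, s):
    """M(x) = sign(x) * (1 - exp(-(x / s) ** 2)); s controls the slope."""
    sign = (x > 0) - (x < 0)
    return sign * (1.0 - math.exp(-((x / s) ** 2)))


def outgoing_link_score(x, x1, x2):
    """C(x): 0 up to x1, linear between x1 and x2, 1 from x2 upwards."""
    if x <= x1:
        return 0.0
    if x >= x2:
        return 1.0
    return (x - x1) / (x2 - x1)


# A smaller control variable s gives a steeper curve for the same link count.
print(incoming_link_score(20, 10), incoming_link_score(20, 30))

# With hypothetical bounds x1 = 2 and x2 = 8, five outgoing links map to 0.5.
print(outgoing_link_score(5, 2, 8))
```

Both functions return values in [0, 1] for non-negative link counts, which is what allows the link totals to be compared against the bounds $p$ and $q$ on a common scale.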
Afterwards, a maximum function MAX(L, M, H) is used to determine which comment level a blog article is assigned. If two membership values are equal, the higher comment level is assigned. In the example, blog article b is tagged with comment level M.

\[ f_1(z) = \begin{cases} 1, & z \le 0.2 \\ \dfrac{0.4 - z}{0.2}, & 0.2 < z < 0.4 \\ 0, & z \ge 0.4 \end{cases} \tag{5} \]

\[ f_2(z) = \begin{cases} 0, & z \le 0.2 \ \text{or}\ z \ge 0.6 \\ \dfrac{z - 0.2}{0.2}, & 0.2 < z \le 0.4 \\ \dots \end{cases} \tag{6} \]

\[ f_3(z) = \begin{cases} 0, & z \le 0.4 \\ \dfrac{z - 0.4}{0.2}, & 0.4 < z \dots \\ \dots \end{cases} \tag{7} \]

Fig. 3. The incoming link mapping function M(x) using two different control variables.

Fig. 4. The three membership functions of each blog article's comments.

The trackback number depends on how many people manually refer to a blog article, which is regarded as active behavior. Therefore, the larger the number of trackbacks, the more useful the article is. The main goal of the objective function is to maximize the average trackback number, and the objective function is defined as

\[ \text{Maximize } Z(x = x_1, x_2, \ldots, x_N) = \frac{\sum_{i=1}^{N} t_i x_i}{\sum_{i=1}^{N} x_i} \]

which is subject to the following constraints.

Constraint 1. $\sum_{i=1}^{N} r_i x_i \ge h$
Constraint 2. $\sum_{i=1}^{N} C(ol_i)\, x_i \ge p$
Constraint 3. $\sum_{i=1}^{N} M(il_i)\, x_i \ge q$
Constraint 4. $l \le \sum_{i=1}^{N} MAX(L_i, M_i, H_i)\, x_i \le u$

These constraints are used to revise the objective function so that the candidate solutions meet all constraints. This idea originated from a multiple criteria study (Hwang, Yin, & Yeh, 2006). Each solution that violates the constraints is given a penalty, as presented in Table 3.

Table 3
Formula expression of each penalty term.

| Penalty | Penalty term |
|---|---|
| Penalty term 1 | $\alpha = h - \sum_{i=1}^{N} r_i x_i$ |
| Penalty term 2 | $\beta = p - \sum_{i=1}^{N} C(ol_i)\, x_i$ |
| Penalty term 3 | $\gamma = q - \sum_{i=1}^{N} M(il_i)\, x_i$ |
| Penalty term 4 | $\lambda = \max\left(l - \sum_{i=1}^{N} MAX(L_i, M_i, H_i)\, x_i,\ 0\right) + \max\left(0,\ \sum_{i=1}^{N} MAX(L_i, M_i, H_i)\, x_i - u\right)$ |

With the penalty terms, the fitness function is given as

\[ O(x) = \frac{\sum_{i=1}^{N} t_i x_i}{\sum_{i=1}^{N} x_i} - \omega_1 \alpha - \omega_2 \beta - \omega_3 \gamma - \omega_4 \lambda \]

where $\omega_1$–$\omega_4$ indicate the relative weights of the penalty terms. After this, the SBACPSO algorithm, which consists of five steps, is employed to generate the best set of blog articles.

Input: N searched blog articles on a topic of personal interest.
Output: The best set of blog articles for personal recommendation.

3.3.1. Step 1. Initial swarm generation

The first step is to encode the presence of blog articles in a search result as a particle. The particle is represented by an N-dimensional vector, $[x_1, x_2, \ldots, x_N]$. As mentioned before, if the
$i$th blog article is selected by a particle, $x_i$ is set to one; otherwise, it is set to zero.

3.3.2. Step 2. Calculating the objective value using the revised objective function

Initially, each particle randomly enters articles into the vector. Then, the objective value is calculated using the revised objective function above. Particles with larger objective values are at positions nearer the solution. During the initial swarm, every particle keeps the personal best solution ($pbest_i$) and the global best solution ($gbest$) for the next generation of the process. In the next generation, the velocity and position of each particle are updated to approach the global solution.

3.3.3. Step 3. Updating velocities

In the whole swarm, both social and cognition models are applied to update each particle's velocity. The following equation, which consists of three parts, presents the update of the velocity of the $i$th particle at iteration $t$:

\[ v_i(t) = v_i(t-1) + \varphi_1 \mu_1 \left(pbest_i - y_i(t-1)\right) + \varphi_2 \mu_2 \left(gbest - y_i(t-1)\right) \tag{8} \]

The first part is the inertia of the $i$th particle, which comes from the previous generation. The second part uses the cognition model to describe the thinking of the $i$th particle. The third part applies the social model to describe the message sharing and cooperation among particles. $\varphi_1$ and $\varphi_2$ are acceleration constants that stand for the degree to which the particle is directed towards a good position. The values of these two constants affect the degree to which the particle's personal best solution and the global best solution, respectively, influence its movement. $\mu_1$ and $\mu_2$ are random numbers uniformly distributed between 0 and 1. $y_i(t-1)$ denotes the previous position of the $i$th particle.

3.3.4. Step 4: Updating positions

The final step is to update each particle's position to find a better solution. According to the velocity calculated in the previous step, the position update is calculated as follows:

\[ y_i(t+1) = y_i(t) + v_i(t) \tag{9} \]
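The penalized fitness evaluation and the update rules of Eqs. (8) and (9) can be sketched as below. This is an illustrative sketch under several assumptions, not the authors' implementation: the function names are hypothetical, the penalty values are passed in precomputed, an empty selection is treated as the worst possible solution, and one pair of random numbers is drawn per particle update. The text does not state how real-valued positions are mapped back to 0/1 selections, so that step is left out.

```python
import random


def fitness(x, t, penalties, weights):
    """Average trackback count of the selected articles minus weighted penalties.

    x: 0/1 decision vector, t: trackback counts,
    penalties: (alpha, beta, gamma, lam), weights: (w1, w2, w3, w4).
    """
    selected = sum(x)
    if selected == 0:
        # Assumption: an empty selection is never an acceptable solution.
        return float("-inf")
    objective = sum(ti * xi for ti, xi in zip(t, x)) / selected
    return objective - sum(w * pen for w, pen in zip(weights, penalties))


def update_particle(y, v, pbest, gbest, phi1, phi2, rng=random):
    """Apply Eq. (8), then Eq. (9), to one particle; returns (new_y, new_v)."""
    # Assumption: mu1 and mu2 are drawn once per particle, not per dimension.
    mu1, mu2 = rng.random(), rng.random()
    new_v = [
        vi + phi1 * mu1 * (pb - yi) + phi2 * mu2 * (gb - yi)
        for vi, yi, pb, gb in zip(v, y, pbest, gbest)
    ]
    new_y = [yi + vi for yi, vi in zip(y, new_v)]
    return new_y, new_v


# Selecting articles 1 and 3 (trackbacks 4 and 6) with no violated constraint.
print(fitness([1, 0, 1], [4, 10, 6], (0, 0, 0, 0), (0.01, 0.01, 0.01, 0.01)))

# One update step for a two-dimensional particle (sample values only).
print(update_particle([0.0, 1.0], [0.1, -0.1], [1.0, 1.0], [1.0, 0.0], 2.0, 2.0))
```

Passing `rng` explicitly makes the update reproducible in tests; any object with a `random()` method returning a value in [0, 1) can be substituted.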
3.3.5. Step 5: Non-sequential blog article recommendation

When a user-defined generation number is reached, the particle at the gbest position holds the global best solution. The recommended blog articles are contained in the best solution. Subsequently, these non-sequential blog articles are provided to the user.

4. Evaluations

This section demonstrates the system and two experiments. First, the system demonstration shows the user interface of the system, which presents the recommended results and two other operation areas. These operation areas are RSS feed management and personal bookmark management. Additionally, the fields that contain the recommended results are introduced in more detail. Two experiments were conducted to measure the impact of population size (i.e. the number of blog articles) on efficiency and the user satisfaction associated with the system design and recommended content.

4.1. System demonstration

Fig. 5. Annotated user interface for blog article recommendation.

As shown in Fig. 5, the user interface consists of three operation areas: the RSS Subscription area, which is located on the upper-left side, the Personal Bookmark area, which is located on the lower-left side, and the recommended results, which are displayed on the right side of the window. The user can manually add an RSS feed
in to the RSS Subscription area via RSS Aggregation. Also, the specific topic related to the recommended blog articles automatically forms an RSS feed, which is also shown in the RSS Subscription area. The recommended results depend on the user's topic of interest, shown in "Current Topic". After reading the recommended blog articles, the user can mark useful ones with the marking function (the yellow star mark in Fig. 5; see footnote 1) to collect them into a personal bookmark list for easy reading. Thus, the user can efficiently manage bookmarks in the personal bookmark area.

The recommended results are divided into several fields for clear presentation: "Google Rank", "Blog Name", "Article Title", "Description", "Author", "Post Date", and "Marking". The "Google Rank" field indicates the article's ranking in the Google blog search engine. The "Blog Name" and "Article Title" fields are hyperlinks to the source blog and article. The "Description" field provides a brief description of the blog article, which lets the user preview each article and reduces the effort and time needed to browse. In addition, the "Author" and "Post Date" fields give the article's author and date of posting. The "Marking" field, as mentioned above, enables the user to manually add useful articles to the bookmark list. The yellow star indicates that an article has been added to the bookmark list.

4.2. Analytical experiments and efficiency evaluations

Since search results may range from several thousand to several hundred thousand blog articles, calculating the optimal solution can be time-consuming. The goal of the analytical experiments was to find a reasonable population size by analyzing the relationship between population size and time efficiency. In the experiment, the evaluated population size ranged from 1000 to 50,000.
The boundary parameters h, p, and q were calculated as the product of the corresponding average values and the total number of articles in a given search result. Since the boundary parameters l and u are used to evaluate the range of the number of comments associated with a set of blog articles, l and u were set to one standard deviation below and above the mean, respectively. Twenty particles were deployed and the generation termination number was set to 100. The initial weights of the penalty terms were all set to 0.01 (i.e. x1 = x2 = x3 = x4 = 0.01). The experimental testbed was a server computer with a Core 2 Quad 2.4-GHz CPU, 4096 MB of RAM, and a 500 GB hard disk.

Fig. 6 shows the average execution times for sets of blog articles ranging from 1000 to 50,000. At a population size of 20,000, the average execution time was almost four times that for 10,000 blog articles (624 s vs. 168 s). From a time-consumption perspective, this is an obvious turning point. A population size of 10,000 could therefore serve as an upper bound to limit the computation load on the server.

Fig. 6. Average execution time by article population size.

4.3. Analytical experiments in user satisfaction

In addition to system performance, users' satisfaction with the system design and the recommended content was also measured. For this, an 18-item instrument using a five-point Likert-type scale was designed. The participants were 367 students from the Department of Nursing, the Department of Engineering Science, and the Department of Computer Science and Information Engineering at two large universities in Taiwan. To examine the discriminant validity of the instrument, a factor analysis was conducted (Campbell & Fiske, 1959; Doll & Torkzadeh, 1988; Kerlinger, 1978). Items with lower discriminant validity were deleted from the devised scale. The results of the analysis and the discussion are presented after the data analysis process.

4.3.1. Factor analysis

The scale used to measure user satisfaction with the system and the recommended content is presented in Table 4. Eighteen items were included in the scale.

Table 4
Measures of user satisfaction: original 18 items.
1. I found the system dependable
2. I am satisfied with the system
3. The system is not troublesome at all
4. The system is easy to interact with
5. The system is friendly
6. The system is easy to use
7. The system provides the precise content I need
8. The system provides an appropriate quantity of content
9. The recommended content meets my needs
10. The system provides content that seems to be just about exactly what I need
11. I think the output is presented in a useful format
12. I am happy with the layout of the output
13. The content I received requires correction
14. The content is relevant to my learning topics
15. I feel the recommended content is reliable
16. Through the recommended content, I gain more opportunities to interact with other bloggers
17. I would recommend the content offered by the system to my classmates
18. Through recommended content, I found useful information from others' blogs

Table 5
Rotated factor matrix of 18-item scale.
Columns: Item code, Usability, Interaction, Requirement, Format, Accuracy, Social interaction
U1 .823 .313
U2 .713 .342
U3 .466 .411
I1 .849 .393
I2 .313 .796
I3 .472 .570 .552
R1 .342 .743
R2 .856
R3 .313 .632 .359
R4 .401 .342 .401
F1 .396 .705
F2 .393 .820
A1 .342 .870
A2 .919
A3 .421 .332 .542
S1 .857
S2 .407 .793
S3 .915

1 For interpretation of color in Fig. 5, the reader is referred to the web version of this article.
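The cross-loading screen used in this section (an item is dropped when it loads above .30 on three or more factors) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the loadings are copied from Table 5 in the order listed, and the names `TABLE5_LOADINGS` and `cross_loading_items` are hypothetical.

```python
# Loadings above .30 per item, copied from Table 5 in the order listed.
# Only the count per item matters here, not which factor each value sits on.
TABLE5_LOADINGS = {
    "U1": [0.823, 0.313],
    "U2": [0.713, 0.342],
    "U3": [0.466, 0.411],
    "I1": [0.849, 0.393],
    "I2": [0.313, 0.796],
    "I3": [0.472, 0.570, 0.552],
    "R1": [0.342, 0.743],
    "R2": [0.856],
    "R3": [0.313, 0.632, 0.359],
    "R4": [0.401, 0.342, 0.401],
    "F1": [0.396, 0.705],
    "F2": [0.393, 0.820],
    "A1": [0.342, 0.870],
    "A2": [0.919],
    "A3": [0.421, 0.332, 0.542],
    "S1": [0.857],
    "S2": [0.407, 0.793],
    "S3": [0.915],
}


def cross_loading_items(loadings, threshold=0.30, min_factors=3):
    """Return item codes that load above `threshold` on at least `min_factors` factors."""
    flagged = []
    for item, values in loadings.items():
        salient = sum(1 for value in values if value > threshold)
        if salient >= min_factors:
            flagged.append(item)
    return flagged


flagged = cross_loading_items(TABLE5_LOADINGS)
print(flagged)
```

With the values as listed here, the screen flags I3, R3, R4, and A3. U3, which the text also removes, shows only two loadings above .30 in the table as listed, so this count alone does not account for it.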
Using principal components analysis, the response data from the 367 samples were examined. The results showed six factors with eigenvalues greater than one. These six factors were interpreted as usability, interaction, requirement, format, accuracy, and social interaction. Table 5 shows the loadings of the 18 items on each factor. Factor loadings less than .30 are not included in Table 5. To avoid blurring the distinction between factors, items with loadings greater than .30 on three or more factors were deleted from the above measures. Hence, items U3, I3, R3, R4, and A3 were deleted from the scale.

Finally, the measurements consisted of two categories: system and content. The system category contained four items covering the usability and interaction aspects, while the content category contained nine items covering four aspects: requirement, format, accuracy, and social interaction, as shown in Table 6. Each aspect is named according to the implicit meanings and shared concepts of its items (Pett, Lackey, & Sullivan, 2003).

4.3.2. Results and discussion

The results of the analytical experiment in user satisfaction are shown in Table 7. The users also provided a few qualitative comments about the system and the recommended content.

Most users gave positive responses regarding the usability of, and interaction with, the system (U1–I2). Nearly 70% of users responded that the system provided a well-designed interface and easy operation. The results also indicated that, because of its background computing, the system could offer reliable performance on an online platform with hundreds of users. Some users pointed out that the system should include resource management in the "Personal Bookmark" function. Respondents requested more options, such as adding, editing, and deleting topics and drag-and-drop records.

Only 42% of users responded positively to the precision of the recommended content (R1).
The qualitative comments indicated that users found it difficult to find answers via the recommended content when they faced problems. This is because the proposed recommending approach was not designed to help learners solve problems. Additionally, 43% of users disagreed with the R2 item, possibly because it took too much time to look at all of the recommended content. This suggests that the number of article suggestions should be limited to an appropriate or user-defined quantity.

Nearly 80% of users agreed that the format of the recommended content was useful and well presented (F1–F2). Those users felt that the layout of the output made it convenient to find the sources of blog articles, briefly review the content, and determine the timeliness of the content.

Thirty-one percent of users thought some of the recommended content required correction, especially concerning health care and nutrition support topics in the nursing domain (A1). Since some blog articles regarding health care and nutrition support are culture-based, this result may be attributed to differences between cultures and countries in the domain. Apart from the nursing users, most of the users in computer science and engineering science accepted the recommended content. For the second item in the accuracy aspect (A2), 86% of users felt that the recommended content was relevant to the topics that they were studying. These findings are in line with our expectation, so we conclude that the proposed recommending mechanism was successful.

More than 65% of users reported positive perceptions of the social interaction facilitated by the recommended content (S1–S3). These findings suggest that the recommended content not only helped learners increase their available learning resources but also helped them interact socially with other people. Users could be trained to develop their knowledge organization and expansion abilities via the recommended content.
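The percentages discussed above can be cross-checked against Table 7 with a short tabulation. This is only an illustrative sketch: the figures are copied from Table 7, while the names `TABLE7`, `row_sums`, `system_agreement`, and `low_agreement`, and the 50% cut-off, are assumptions introduced here rather than part of the study.

```python
# (SA & A %, Moderate %, D & SD %, mean) per item, copied from Table 7.
TABLE7 = {
    "U1": (77, 12, 11, 3.9),
    "U2": (62, 26, 12, 3.7),
    "I1": (72, 17, 11, 4.0),
    "I2": (72, 21, 7, 3.3),
    "R1": (42, 33, 25, 3.2),
    "R2": (35, 22, 43, 2.9),
    "F1": (82, 10, 8, 4.1),
    "F2": (78, 15, 7, 4.0),
    "A1": (31, 36, 33, 3.0),
    "A2": (86, 10, 4, 4.2),
    "S1": (75, 20, 5, 4.1),
    "S2": (76, 13, 11, 3.9),
    "S3": (75, 5, 20, 3.8),
}

# Each row should account for all respondents.
row_sums = {item: agree + moderate + disagree
            for item, (agree, moderate, disagree, _mean) in TABLE7.items()}

# Average agreement over the four system items (U1, U2, I1, I2).
system_items = ["U1", "U2", "I1", "I2"]
system_agreement = sum(TABLE7[item][0] for item in system_items) / len(system_items)

# Items where fewer than half of the users agreed (assumed 50% cut-off).
low_agreement = [item for item, row in TABLE7.items() if row[0] < 50]

print(row_sums)
print(system_agreement)
print(low_agreement)
```

Note that A1 ("The content I received requires correction") is negatively worded, so low agreement on that item is a favorable result, unlike R1 and R2.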
The results of this experiment gave us a better understanding of the limitations of the system's design and the necessary improvements. Therefore, the results should be taken into account in future studies. In sum, we can conclude that the system and the recommended content were well accepted by most of the users.

Table 6
The user satisfaction scale.
System
  Usability
    1. I found the system dependable
    2. I am satisfied with the system
  Interaction
    1. The system is easy to interact with
    2. The system is friendly
Content
  Requirement
    1. The system provides the precise content I need
    2. The system provides appropriate quantity of content
  Format
    1. I think the output is presented in a useful format
    2. I am happy with the layout of the output
  Accuracy
    1. The content I received requires correction
    2. The content is relevant to my learning topics
  Social interaction
    1. Through the recommended content, I gain more opportunities to interact with other bloggers
    2. I would recommend the content offered by the system to my classmates
    3. Through recommended content, I found useful information from others' blogs

Table 7
The results of user satisfaction.
Item code   SA & A (%)   Moderate (%)   D & SD (%)   Mean
System
U1          77           12             11           3.9
U2          62           26             12           3.7
I1          72           17             11           4.0
I2          72           21             7            3.3
Content
R1          42           33             25           3.2
R2          35           22             43           2.9
F1          82           10             8            4.1
F2          78           15             7            4.0
A1          31           36             33           3.0
A2          86           10             4            4.2
S1          75           20             5            4.1
S2          76           13             11           3.9
S3          75           5              20           3.8
Rated on a scale of 1–5 (1 = strongly disagree, 2 = disagree, 3 = moderate, 4 = agree, 5 = strongly agree).

5. Conclusion and future work

In this study, a PSO-based algorithm, SBACPSO, was employed in a mechanism used to recommend blog articles. The algorithm extends the one used in our previous study. SBACPSO searches automatically and intelligently for blog articles to be used as secondary learning materials. This study focuses on the automated detection and recommendation of "relevant articles for reading by the user". In the proposed method, IR-based techniques are used to extract keywords from the blog entries read and written by the users. The positions of the keywords and the types of articles (read or written) were also considered, and a proper weighting was used to distinguish the importance of different keywords. In addition, the forgetting rate proposed by psychologist Hermann Ebbinghaus was adopted to simulate changes to the learning topic and thus change the recommended content. The keywords most related to the user are then entered into Google's blog search engine. Moreover, the extended SBACPSO algorithm
was employed to collect high-quality articles from blogs for recommendation to users. This differs from our previous study in that users are not required to subscribe to an RSS feed in order to receive the periodically updated content. Also, the search topic appears in a different display that allows users to manage the recommended articles more conveniently.

The evaluations reported in this paper showed that the blog recommendation mechanism can be practically implemented and provides adequate results. First, the interface of the system was introduced, and the recommended blog articles and the user interface functions were described. Our performance analysis showed that the extended SBACPSO model performs best with a data volume of 10,000 blog articles, using 20 particles and 100 generations, which gives us a benchmark for efficiency. We also used the results of the user satisfaction analysis to assess the system and the recommended content. Most users responded positively to the design of the interface and the operation of the system. Moreover, some users provided suggestions that will be considered for implementation in the system in the future. In addition, we measured four aspects of user satisfaction with the recommended content: requirement, format, accuracy, and social interaction. The results showed that the model used to change the recommended content, which is based on the forgetting curve, worked as expected. Users also proposed the following improvements for the recommended content:

(1) Content should be more accurate and accord with personal needs.
(2) The system should include a verification mechanism to check the accuracy of the recommended content.
(3) The amount of recommended content should be changed for ease of use.

These suggestions imply a move towards a problem-oriented blog article recommending mechanism, which would improve its educational utility.
Furthermore, although we have already studied the achievements of artificial intelligence as a recommendation mechanism for blogs, we believe that if a semantic analysis can be conducted, it can not only provide more accurate recommended content to the users, but also provide them with increased educational utility. These areas will require further study.

Acknowledgement

The authors would like to thank the National Science Council of the Republic of China for financially supporting this research under Contract No. NSC 95-2221-E-006-307.

References

Brooks, K., Nichols, C., & Priebe, S. (2003). Remediation, genre, and motivation: Key concepts for teaching with weblogs.
Bull, G., Bull, G., & Kajder, S. (2003). Writing with weblogs – Reinventing student journals. Learning and Leading With Technology: The ISTE Journal of Educational Technology Practice and Policy, 31(1), 32–35.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56(1), 81–105.
Chen, J. N., Huang, Y. M., & Chu, W. C. (2005). Applying dynamic fuzzy petri net to web learning system. Interactive Learning Environments, 13(3), 159–178.
Chen, Y., Tsai, F. S., & Chan, K. L. (2008). Machine learning techniques for business blog search and mining. Expert Systems with Applications, 35(3), 581–590.
Doll, W. J., & Torkzadeh, G. (1988). The measurement of end-user computing satisfaction. MIS Quarterly, 12(2), 259–274.
Dron, J. (2003). The blog and the borg: A collective approach to e-learning. In Proceedings of the world conference on e-learning in Corp., Gov., Healthcare, and Higher Education (pp. 440–443).
Ebbinghaus, H. (1913). Memory: A contribution to experimental psychology. New York: Teachers College, Columbia University.
Fernheimer, J. W., & Nelson, T. J. (2005). Bridging the composition divide: Blog pedagogy and the potential for agonistic classrooms.
Ganley, B. (2004). Blogging as a dynamic, transformative medium in an American liberal arts classroom. In Proceedings of the BlogTalk 2.0 conference, Vienna, Austria.
Glance, N. S., Hurst, M., & Tomokiyo, T. (2004). BlogPulse: Automated trend discovery for weblogs. In Proceedings of the WWW '04 workshop on the weblogging ecosystem: Aggregation, analysis and dynamics.
Hall, H., & Davison, B. (2007). Social software as support in hybrid learning environments: The value of the blog as a tool for reflective learning and peer support. Library and Information Science Research, 29(2), 163–187.
Huang, T. C., Huang, Y. M., & Cheng, S. C. (2008). Automatic and interactive e-learning auxiliary material generation utilizing particle swarm optimization. Expert Systems with Applications, 35(4), 2113–2122.
Huang, Y. M., Chen, J. N., Kuo, Y. H., & Jeng, Y. L. (2008). An intelligent human-expert forum system based on fuzzy information retrieval technique. Expert Systems with Applications, 34(1), 446–458.
Hwang, G. J., Yin, P. Y., & Yeh, S. H. (2006). A tabu search approach to generating test sheets for multiple assessment criteria. IEEE Transactions on Education, 49(1), 88–97.
Instone, L. (2005). Conversations beyond the classroom: Blogging in a professional development course. In Proceedings of the 22nd ASCILITE conference.
Kerlinger, F. N. (1978). Foundations of behavioral research. New York: McGraw-Hill.
Li, X., Liu, B., & Yu, P. S. (2006). Mining community structure of named entities from web pages and blogs. In Proceedings of the AAAI spring 2006 symposia on computational approaches to analysing weblogs.
Lin, W. J., Yueh, H. P., Liu, Y. L., Murakami, M., Kakusho, K., & Minoh, M. (2006). Blog as a tool to develop e-learning experience in an international distance course. In Proceedings of the 6th international conference on advanced learning technologies (ICALT 2006) (pp. 290–292).
Martindale, T., & Wiley, D. A. (2004). An introduction to teaching with weblogs.
Mei, Q., Liu, C., Su, H., & Zhai, C. (2006). A probabilistic approach to spatiotemporal theme pattern mining on weblogs. In Proceedings of the WWW '06 workshop on the weblogging ecosystem: Aggregation, analysis and dynamics.
Nalin, S., & Mohan, P. (2008). Tourism blog visualiser for better tour planning. Journal of Vacation Marketing, 14(2), 157–168.
Pett, M. A., Lackey, N. R., & Sullivan, J. J. (2003). Making sense of factor analysis: The use of factor analysis for instrument development in health care research. Thousand Oaks: Sage.
Prabowo, R., & Thelwall, M. (2006). A comparison of feature selection methods for an evolving RSS feed corpus. Information Processing and Management, 42(6), 1491–1512.
Smith, K. (2007). Weblogs in higher education. Retrieved 18.06.08.
Stiler, G. M., & Philleo, T. (2003). Blogging and blogspots: An alternative format for encouraging reflective practice among pre-service teachers. Education, 123(4), 789–797.
Thelwall, M., & Hasler, L. (2007). Blog search engines. Online Information Review, 31(4), 467–479.
Williams, J. B., & Jacobs, J. (2004). Exploring the use of blogs as learning spaces in the higher education sector. Australasian Journal of Educational Technology, 20(2), 232–247.