"E-business" course reading: Mining ideas from textual information

D. Thorleuchter et al. / Expert Systems with Applications 37 (2010) 7182–7188

non-stop words (see Section 3). Text patterns should not be too small, so that they contain all terms representing a new and useful idea. Nor should they be too large, because then further terms that are unrelated to the new and useful idea occur in them. To find an optimal size, we create text patterns from several patent descriptions using different values for the length l and for the percentages u and v. A human expert checks text patterns of these different lengths for an optimal size. He obtains the best results by setting the text pattern length l to 7 terms, the percentage u to 50%, and v to 100%. With these settings, the approach automatically extracts about 200 new ideas from the 40 randomly selected patents. To cluster these results, means and purposes are assigned to scientific categories in the Science Citation Index; examples are presented below.
Several ideas are identified that use methods from ‘Artificial Intelligence’ (mean) for applications in ‘Health Care Sciences and Services’ (purpose). We also identify new ideas using ‘Imaging Science and Photographic Technology’ (mean) for ‘Medical Informatics’ purposes. Further ideas use techniques from ‘Remote Sensing’ (mean) in the field of ‘Tropical Medicine’ (purpose). Additionally, several ideas use ‘Computer Science, Theory and Methods’ (mean) for applications in ‘Psychiatry’ (purpose). Furthermore, methods from ‘Artificial Intelligence’ (mean) are used for ‘Automation and Control Systems’ purposes.

To evaluate these results, we use the precision and recall measures commonly used in information retrieval, which are based on true positives, false positives and false negatives. For this, we have to define the ground truth for our evaluation. Therefore, a human expert also identifies new and useful ideas from these patents manually, that is, without using our idea mining approach. He uses the idea definition in Section 1.2: he checks each text pattern for terms representing a known mean (purpose) and terms representing an unknown purpose (mean). These results are the ground truth for the evaluation.

For each patent, we compute precision and recall values using the idea mining measure and using Jaccard’s coefficient. Then, we compute the average precision and recall values. As a result, we get a precision of 40% and a recall of 25% using the idea mining approach with the idea mining measure. A precision of 40% means that if the idea mining approach extracts ten text patterns, then four of them represent a new and useful idea. A recall of 25% means that if there are four new and useful ideas in the new text, then the idea mining approach extracts only one of them. In contrast, we get a precision of 30% and a recall of 20% using Jaccard’s coefficient.
This is because in some texts Jaccard’s coefficient extracts text patterns from the new text that are similar to text patterns from the problem description. Such a pattern probably represents a known idea rather than a new one. Besides Jaccard’s coefficient, we also test other well-known heuristic measures, namely overlap-index, cosine-similarity and dice-similarity (Ferber, 2003), as baselines. However, we get nearly the same results for precision (30%) and recall (20%).

8. The idea mining application

The idea mining application targets users without extensive knowledge of the text mining field as well as text mining experts. We give them the possibility to extract problem solution ideas specifically for their own needs using this idea mining approach. They can access the web-based application via the internet. It is available at http://www.text-mining.info and is programmed in Perl and Ruby. A user has to provide two text files: a problem description and a new text that probably contains problem solution ideas. These files can be formatted in various ways, e.g. as plain text, HTML or XML. However, scripting code, (HTML or XML) tags, and images are discarded; that is, the application extracts plain text from the provided files. Then, the user has to select the language of these texts so that a general stop word list for this language is integrated. The application offers general stop word lists in English, German, Dutch, Spanish and French. After the parameters of the application have been set, the automatic extraction of new and useful ideas from the new text starts as described in the idea mining process (see Sections 3 and 4). As a result, new ideas are presented as described in Section 5.

9. Conclusions and future research

This study shows the success of an automatic approach for finding new ideas from textual information. For this, the study transforms creativity approaches from psychology and cognitive science to text mining approaches.
One main finding here is the redefinition of an abstract term (an idea) in a concrete way so that it can be used for computing with text mining methods. In detail, it is shown that a technological idea represents a combination of a purpose and a mean, and that purposes and means are defined by combinations of co-occurring terms. Additionally, it is shown that problems and problem solution ideas can be represented as term vectors in the vector space model. For this, the study contributes a new (idea mining) measure. This measure identifies new ideas by comparing vectors that represent a problem to vectors that represent a problem solution idea. Last, it is shown that approaches from comprehensibility research can be adapted to this approach to present the new ideas to the user in a comprehensible way. As a further main finding, it is demonstrated that this theoretical approach can be realized as a web-based application. The success of the idea mining measure is shown by comparing it to further heuristic measures (overlap-index, cosine-similarity and dice-similarity).

Directions for future research are given by the fact that nowadays a large amount of textual information is available on the internet, and this information probably contains many new technological ideas. Extending this approach to a web idea mining approach that automatically identifies problem solution ideas from the internet is an interesting topic for further research. Additionally, the parameters of the approach can be optimized, and the idea mining measure can probably be extended with further aspects to improve its quality, that is, to obtain better precision and recall values.

A further aspect is to transfer this idea mining approach to colloquial language. For this, it is necessary that the idea definition also covers new product ideas from consumers. Then, new product ideas can be identified to support marketing activities.
Last, the approach can be extended with innovation-related aspects. Then, extracted ideas can be classified as innovative ideas and might be used as a starting point for new product development.

Acknowledgements

We thank Joachim Schulze and Jörg Fenner for constructive technical comments.

References

Coussement, K., & Van den Poel, D. (2008). Integrating the voice of customers through call center emails into a decision support system for churn prediction. Information and Management, 45, 165.
Coussement, K., & Van den Poel, D. (2009). Improving customer attrition prediction by integrating emotions from client/company interaction emails and evaluating multiple classifiers. Expert Systems with Applications, 36, 6127–6134.
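
As an illustration of the evaluation described in Section 7, the following Python sketch computes precision and recall against a manually defined ground truth and shows set-based forms of the baseline measures named there (Jaccard's coefficient, overlap-index, cosine-similarity and dice-similarity). It is a minimal sketch under stated assumptions, not the implementation of the application: text patterns are assumed to be plain sets of terms, the baseline measures are assumed to follow their standard set-based definitions, all function names and example data are hypothetical, and the idea mining measure itself is not reproduced.

```python
from math import sqrt


def jaccard(a: set, b: set) -> float:
    """Jaccard's coefficient: shared terms divided by all distinct terms."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_index(a: set, b: set) -> float:
    """Overlap index: shared terms divided by the size of the smaller set."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def cosine_similarity(a: set, b: set) -> float:
    """Cosine similarity for binary term vectors represented as sets."""
    denom = sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0


def dice_similarity(a: set, b: set) -> float:
    """Dice similarity: twice the shared terms over the summed set sizes."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def precision_recall(extracted: set, ground_truth: set) -> tuple:
    """Precision and recall of extracted items against a ground truth."""
    true_positives = len(extracted & ground_truth)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Hypothetical data: ten extracted text patterns (ids 0..9), four of which
# the expert also marked; the expert marked sixteen patterns in total.
extracted_ids = set(range(10))
expert_ids = {0, 1, 2, 3} | set(range(100, 112))
print(precision_recall(extracted_ids, expert_ids))  # (0.4, 0.25)

# Hypothetical term sets for a problem description and a new text pattern.
problem_terms = {"remote", "sensing", "image"}
pattern_terms = {"image", "sensing", "malaria", "map"}
for measure in (jaccard, overlap_index, cosine_similarity, dice_similarity):
    print(measure.__name__, round(measure(problem_terms, pattern_terms), 3))
```

In this toy example, four of the ten extracted patterns are correct (precision 40%) and four of the sixteen expert-marked patterns are found (recall 25%), matching the interpretation of the reported values. Note that all four baseline measures reward term overlap with the problem description, which is consistent with the observation that they tend to extract patterns representing known rather than new ideas.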
