Text patterns should not be too small, so that they contain all terms representing a new and useful idea. Nor should they be too large, because further terms that are unrelated to the new and useful idea would then occur in them. To find an optimal size, we create text patterns from several patent descriptions using different values for the length l and for the percentages u and v. A human expert checks the resulting text patterns of different lengths for an optimal size. The expert obtains the best results with a text pattern length of l = 7 terms and with u = 50% and v = 100%. With these settings, the approach automatically extracts about 200 new ideas from the 40 randomly selected patents.

To cluster these results, means and purposes are assigned to scientific categories of the Science Citation Index; examples are presented below. Several ideas are identified that use methods from ‘Artificial Intelligence’ (mean) for applications in ‘Health Care Sciences and Services’ (purpose). We also identify new ideas using ‘Imaging Science and Photographic Technology’ (mean) for ‘Medical Informatics’ purposes. Further ideas use techniques from ‘Remote Sensing’ (mean) in the field of ‘Tropical Medicine’ (purpose). Additionally, several ideas use ‘Computer Science, Theory and Methods’ (mean) for applications in ‘Psychiatry’ (purpose). Furthermore, methods from ‘Artificial Intelligence’ (mean) are used for ‘Automation and Control Systems’ purposes.

To evaluate these results, we use the precision and recall measures commonly used in information retrieval, which are based on true positives, false positives and false negatives. This requires a ground truth for the evaluation. Therefore, a human expert also identifies new and useful ideas in these patents manually, that is, without using our idea mining approach. The expert uses the idea definition in Section 1.2.
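The precision and recall measures mentioned above can be illustrated with a minimal sketch for a single patent. It assumes that every text pattern has an identifier and that both the extracted patterns and the expert's ground truth are given as sets of such identifiers. The function name, the identifiers and the counts in the example are hypothetical and are not taken from the evaluation data.

```python
def precision_recall(extracted, ground_truth):
    """Return (precision, recall) for one patent.

    extracted:    ids of text patterns returned by an extraction approach
    ground_truth: ids of text patterns the expert marked as new and useful
    (Both arguments are assumptions made for this illustration.)
    """
    extracted, ground_truth = set(extracted), set(ground_truth)
    true_pos = len(extracted & ground_truth)   # extracted and correct
    false_pos = len(extracted - ground_truth)  # extracted but not an idea
    false_neg = len(ground_truth - extracted)  # ideas that were missed

    # Guard against empty sets to avoid division by zero.
    precision = true_pos / (true_pos + false_pos) if extracted else 0.0
    recall = true_pos / (true_pos + false_neg) if ground_truth else 0.0
    return precision, recall


# Hypothetical example: five extracted patterns, four expert-marked ideas.
extracted = {"p1", "p2", "p3", "p4", "p5"}
ground_truth = {"p2", "p5", "p8", "p9"}
precision, recall = precision_recall(extracted, ground_truth)
print(precision, recall)
```

In this made-up example two of the five extracted patterns are correct and two of the four expert-marked ideas are found.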
Following this idea definition, the expert checks each text pattern for terms representing a known mean (purpose) together with terms representing an unknown purpose (mean). These results are the ground truth for the evaluation. For each patent, we compute precision and recall values using the idea mining measure and using Jaccard’s coefficient. Then, we compute the average precision and recall values. As a result, the idea mining approach with the idea mining measure achieves a precision of 40% and a recall of 25%. A precision of 40% means that if the approach extracts ten text patterns, four of them represent a new and useful idea. A recall of 25% means that if there are four new and useful ideas in the new text, the approach extracts only one of them. In contrast, Jaccard’s coefficient yields a precision of 30% and a recall of 20%. This is because in some texts Jaccard’s coefficient extracts text patterns from the new text that are similar to text patterns from the problem description. Such a pattern probably represents a known idea rather than a new one.

Besides Jaccard’s coefficient, we also test other well-known heuristic measures as baselines: the overlap index, cosine similarity and Dice similarity (Ferber, 2003). However, they yield nearly the same precision (30%) and recall (20%) values.

8. The idea mining application

The idea mining application is aimed at users without extensive knowledge of text mining as well as at text mining experts. It enables them to extract problem solution ideas specifically for their own needs using this idea mining approach. The web-based application can be accessed via the internet at http://www.text-mining.info and is programmed in Perl and Ruby. A user has to provide two text files: a problem description and a new text that probably contains problem solution ideas.
These files can be formatted in various ways, e.g. as plain text, HTML or XML. However, scripting code, (HTML or XML) tags and images are discarded; that is, the application extracts plain text from the provided files. Then, the user has to select the language of these texts so that a general stop word list for this language is integrated. The application offers general stop word lists in English, German, Dutch, Spanish and French. After the parameters of the application have been determined, the automatic extraction of new and useful ideas from the new text starts as described in the idea mining process (see Sections 3 and 4). As a result, new ideas are presented as described in Section 5.

9. Conclusions and future research

This study shows the success of an automatic approach for finding new ideas in textual information. To this end, the study transforms creativity approaches from psychology and cognitive science into text mining approaches. One main finding is that an abstract term (an idea) can be redefined in a concrete way so that it can be processed with text mining methods. In detail, it is shown that a technological idea represents a combination of a purpose and a mean, and that purposes and means are defined by combinations of co-occurring terms. Additionally, it is shown that problems and problem solution ideas can be represented as term vectors in the vector space model. For this, the study contributes a new (idea mining) measure, which identifies new ideas by comparing vectors that represent a problem with vectors that represent a problem solution idea. Last, it is shown that approaches from comprehensibility research can be adapted to present the new ideas to the user in a comprehensible way.

As a further main finding, it is demonstrated that this theoretical approach can be realized as a web-based application.
The success of the idea mining measure is shown by comparing it with further heuristic measures (overlap index, cosine similarity and Dice similarity).

Directions for future research arise from the fact that a large amount of textual information is nowadays available on the internet, and this information probably contains many new technological ideas. Extending this approach to a web idea mining approach that automatically identifies problem solution ideas on the internet is an interesting topic for further research.

Additionally, the parameters of the approach can be optimized, and the idea mining measure can probably be extended with further aspects to improve its quality, that is, to obtain better precision and recall values.

A further aspect is to transfer this idea mining approach to colloquial language. For this, the idea definition must also cover new product ideas from consumers. Then, new product ideas can be identified to support marketing activities.

Last, the approach can be extended with innovation-related aspects. Extracted ideas could then be classified as innovative ideas and might be used as a starting point for new product development.

Acknowledgements

We thank Joachim Schulze and Jörg Fenner for constructive technical comments.

References

Coussement, K., & Van den Poel, D. (2008). Integrating the voice of customers through call center emails into a decision support system for churn prediction. Information and Management, 45, 165.
Coussement, K., & Van den Poel, D. (2009). Improving customer attrition prediction by integrating emotions from client/company interaction emails and evaluating multiple classifiers. Expert Systems with Applications, 36, 6127–6134.

D. Thorleuchter et al. / Expert Systems with Applications 37 (2010) 7182–7188 7187
