
"E-business" course reading: Mining ideas from textual information


D. Thorleuchter et al. / Expert Systems with Applications 37 (2010) 7182–7188
defined as a combination of two things: a means and an associated purpose. An example of an idea is a transistor. A transistor is a semiconductor device that can be used to amplify or switch electronic signals. Here, we have a means (a semiconductor device) and an associated purpose (to amplify or switch electronic signals).

In general, we speak of a new idea if a known means is related to an unknown purpose or if a known purpose is related to an unknown means (Thorleuchter, Van den Poel, & Prinzie, 2010c). In this sense, a nanomagnet is a new idea, because a nanomagnet is a miniaturized magnet that can also be used to amplify or switch electronic signals. Here we have an unknown means (a miniaturized magnet) appearing together with a known purpose. This new idea could be useful to people working in the field of electronic signals, because nanomagnetic technology could possibly replace transistor technology in the future.

Therefore, we define a new and probably useful idea as a text phrase. This text phrase consists of domain-specific terms that occur together in textual information. These terms can be divided into two subsets.
The first subset should represent a known means (or a known purpose) and the second subset should represent an unknown purpose (or an unknown means). Additionally, all terms in the first subset should occur together in a text phrase of the technological problem description.

2. Rationale behind idea mining

Creating ideas is a well-known topic related to creativity in psychology and cognitive science. One of the first descriptions of the creative process was published by Wallas (1926). His stage model explains creative insights and illuminations for finding a problem solution. The model consists of four stages. In stage one, ‘preparation’, the problem is analyzed so that a person recognizes the problem’s dimensions. Stage two, ‘incubation/intimation’, and stage three, ‘illumination’, transfer the problem from the conscious to the unconscious mind. The unconscious mind works on the problem continuously and probably finds a solution through creative insights and illuminations. This solution is transferred to the conscious mind, which means that after some time the person suddenly gets an idea that is new to him and that probably solves the problem. In the last stage, ‘verification’, the idea is tested for novelty and usefulness.

One of the best-known pragmatic approaches to practical creativity is brainstorming, introduced by Osborn (1948). The first step in brainstorming is to define the problem, e.g. by creating descriptions of the problem. Then, people generate new ideas using creativity methods such as idea association. The last step in the brainstorming process is to cluster the generated ideas and to evaluate them for novelty and usefulness.

Besides this, there are several further approaches dealing with the creation of new ideas. We can learn from all these approaches that three steps are necessary for creating ideas.
The first step is to focus on a problem, the second step is to generate new ideas specific to this problem with creative methods, and the third step is to evaluate the generated ideas for novelty and usefulness concerning the problem.

Referring to these approaches, we build an adequate rationale for the idea mining process. Therefore, idea mining also consists of three steps. In the first step, we focus on the problem. Here, the user of our idea mining approach has to provide textual information in which he describes his specific problem (a problem description). In the second step, the user has to provide further textual information in which he supposes the existence of new and useful ideas (a new text) that can probably solve his problem (Ripke & Stöber, 1972). Ideas are contained in text phrases inside this new text as described in Section 1.2. Therefore, with an automatic process, we extract a very large number of overlapping text phrases from the new text. In the remainder of this paper, text phrases will be named text patterns. In the third step, all extracted text patterns are evaluated for novelty and usefulness. This means that they are compared to the problem description by using a specific idea mining measure. With this measure, text patterns can be classified as new and useful ideas. Therefore, idea mining identifies new and useful ideas in three steps:

- Preparation of a problem description
- Extraction of text patterns from a new text
- Evaluation of text patterns for novelty and usefulness concerning the problem description

3. Idea mining process

Fig. 1 shows the processing of the idea mining approach in different steps based on the rationale for the idea mining process (see Section 2).

With tokenization (Coussement & Van den Poel, 2008), texts are separated into terms, where the term unit is the word. The set of different terms in a text is reduced by using stop word filtering methods and stemming (Hotho et al., 2005).
For this, a general list of stop words is used as well as the well-known Porter stemming algorithm (Porter, 1980).

A problem related to the use of stemming is the identification of synonyms and homonyms. Synonyms are different words with identical or at least similar meanings. Homonyms are groups of words with the same spelling but different meanings. With stemming, synonyms and homonyms cannot be identified because stemming does not use knowledge of the context of a term. In this idea mining approach, we do not identify synonyms and homonyms. This is because the approach always considers the context of a term by working on text patterns containing several co-occurring terms, as described below.

Here, we show how to create these text patterns automatically. Around each appearance of each term in the new text, we create a text pattern containing the selected term and all terms that occur in the left and right context of the selected term. To reduce the number of text patterns, we only create text patterns around non

Fig. 1. Processing of our idea mining approach in different steps: After tokenization and term filtering, text patterns are created and term vectors are built representing these text patterns. Term vectors from the new text are compared to term vectors from the problem description using the Euclidean distance measure. Then, term vectors from the new text are compared to their most similar term vectors from the problem description using the idea mining measure. As a result, we get term vectors from the new text that represent new and useful ideas.
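
The preprocessing steps summarized in Fig. 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation. The stop word list, the context width of two terms, the use of raw term counts as vector entries, and all function names are assumptions made for illustration. Porter stemming is omitted because the Python standard library has no stemmer, and the idea mining measure is not shown because it is not defined in this section.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative stop word list; the paper uses a general list.
STOP_WORDS = {"a", "an", "and", "be", "can", "is", "it", "of", "or",
              "that", "the", "to", "used"}


def tokenize(text):
    """Split text into lowercase word terms and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def text_patterns(terms, width=2):
    """Create one overlapping text pattern around each term occurrence.

    Assumption: the context is a fixed window of `width` terms on each side.
    """
    return [terms[max(0, i - width): i + width + 1] for i in range(len(terms))]


def term_vector(pattern, vocabulary):
    """Represent a text pattern as term counts over a shared vocabulary."""
    counts = Counter(pattern)
    return [counts[t] for t in vocabulary]


def euclidean(u, v):
    """Euclidean distance between two equally long term vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def most_similar(new_text, problem_text, width=2):
    """Pair each new-text pattern with its closest problem-description pattern."""
    new_patterns = text_patterns(tokenize(new_text), width)
    problem_patterns = text_patterns(tokenize(problem_text), width)
    if not problem_patterns:
        return []
    vocabulary = sorted({t for p in new_patterns + problem_patterns for t in p})
    problem_vectors = [term_vector(p, vocabulary) for p in problem_patterns]
    pairs = []
    for pattern in new_patterns:
        vec = term_vector(pattern, vocabulary)
        dist, best = min(
            ((euclidean(vec, pv), pp)
             for pv, pp in zip(problem_vectors, problem_patterns)),
            key=lambda pair: pair[0],
        )
        pairs.append((pattern, best, dist))
    return pairs


problem = ("A transistor is a semiconductor device used to amplify "
           "or switch electronic signals")
new = ("A nanomagnet is a miniaturized magnet that can amplify "
       "or switch electronic signals")
for pattern, best, dist in most_similar(new, problem):
    print(pattern, "->", best, round(dist, 2))
```

In this sketch, each pair is the input that an idea mining measure would then evaluate for novelty and usefulness. The example texts are the transistor and nanomagnet sentences from the introduction.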

