the ‘Groebener Modell’, marking text patterns is important for structure-organization, and this leads directly to comprehensibility. On this point, the ‘Groebener Modell’ differs from the ‘Hamburger Verständlichkeitsmodell’, in which structure-organization is less important for comprehensibility. As a result, the presentation of ideas in the idea mining application, which is based on text excerption, is comprehensible according to the ‘Groebener Modell’ but less comprehensible according to the ‘Hamburger Verständlichkeitsmodell’.
6. Results and discussions
In a study for the German Ministry of Defence (MoD), we use this approach to identify new technological ideas for the German defence research program. Specifically, we have to identify new solution ideas for current problems in German defence-based research projects. We extract new ideas from 300 descriptions of research projects granted in 2006 by the National Institute of Standards and Technology (NIST) within the United States Small Business Innovation Research (SBIR) Program. As the problem description, we use textual information from current defence-based research projects of the German MoD (Thorleuchter, Van den Poel, & Prinzie, 2010a). As a result, we extract several new ideas that are useful for German defence research planners and that are now used as starting points for collaboration projects or for new defence-based research projects. Selecting these ideas properly is a strategic issue: together with the weapon selection problem (Dagdeviren, Yavuz, & Kilinc, 2009), it has a significant impact on the efficiency of future defence systems.
The results are published in Fenner and Thorleuchter (2009). Here, we present some successful examples. First, the approach identifies a modified focal plane array technology that can be used to build a detector for the far ultraviolet spectrum, which improves military reconnaissance. This idea is new because, up to now, focal plane array technology has only been used in the infrared, visible and near ultraviolet ranges. Second, the approach identifies personnel ultrasonic locating equipment that was originally developed to help fire fighters orient themselves in dense smoke; it can also improve the location and navigation of soldiers in urban warfare (e.g. inside buildings). Third, the approach shows that avalanche photodiode (APD) technology can improve the internal gain and the dark current of infrared detectors, which also improves military reconnaissance.
This study shows that some of the automatically extracted ideas are useful for technological research planners at the German MoD. Unfortunately, the problem description used (textual information about current defence-based research projects) is classified as German restricted (Verschlusssache – Nur für den Dienstgebrauch, i.e. ‘classified matter, for official use only’), which means it may not be distributed to the scientific community. Therefore, we cannot use the results of this study to evaluate the idea mining approach. Instead, we carry out a separate evaluation (see Section 7) on unclassified patent data, which allows the evaluation to be reproduced.
7. Evaluation
The idea mining measure, the central element of the idea mining approach, consists of four heuristic sub measures that are not theoretically founded. Therefore, it is crucial to provide an extensive evaluation that demonstrates their success. Because we are not aware of other approaches to idea mining, we compare this approach to a baseline.
As the baseline measure, we use Jaccard’s coefficient (Ferber, 2003), a well-known heuristic similarity measure. We evaluate the idea mining approach with our idea mining application (see Section 8), where the web-based application and all texts used for the evaluation are presented. Additionally, solely for the comparison with the baseline, we create an alternative idea mining application that uses Jaccard’s coefficient instead of the idea mining measure.
For the evaluation, we use patent data because patent descriptions normally contain new ideas and represent a considerable part of scientific and technological knowledge (Li, Wang, & Hong, 2009). We use the abstract of a patent as the new text. Because a patent is often based on earlier patents, we aggregate the abstracts of its references into the problem description. We then use the idea mining applications to identify ideas in the patent that are new and useful with respect to its referenced patents. In total, we use the abstracts of 40 randomly selected patents and of their references, together with a general stop word list and the Porter stemmer.
We then determine the parameters of the idea mining measure (g1, g2, g3, g4, ã, and z) as well as the parameters for the length of the text patterns (l, u, and v). For this, we use further patents as new text and their references as problem description. A human expert evaluates the results and compares them with the results of each sub measure m1, m2, m3 and m4 used alone. We find that the first sub measure alone is successful: if it is small, the corresponding text pattern normally does not contain a new and useful idea; if it is large, the probability that the text pattern contains a new idea is also high. In contrast, the other sub measures are not successful on their own; they are successful only if the value of the first sub measure is medium to high.
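The Jaccard baseline can be illustrated with a minimal Python sketch. It rests on assumptions not stated in this section: the coefficient is taken between the term set of a candidate text pattern and that of the problem description, `STOP_WORDS` is a tiny hypothetical list rather than the general stop word list used in the evaluation, and `crude_stem` is a hypothetical suffix stripper standing in for the Porter stemmer (it is not Porter's algorithm). The 0.2 threshold mirrors the 20% alpha-cut reported for the baseline.

```python
import re

# Hypothetical, deliberately tiny stop word list; the evaluation used a general one.
STOP_WORDS = {"a", "an", "and", "are", "by", "for", "in", "is", "of", "on", "the", "to", "with"}
BASELINE_ALPHA_CUT = 0.2  # 20% alpha-cut reported for the Jaccard baseline


def crude_stem(word: str) -> str:
    """Hypothetical suffix stripper: a rough stand-in for, not an implementation of, Porter."""
    for suffix in ("ings", "ing", "ers", "er", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def term_set(text: str) -> set[str]:
    """Lower-case and tokenize the text, drop stop words, and stem the remaining terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {crude_stem(t) for t in tokens if t not in STOP_WORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard's coefficient |A ∩ B| / |A ∪ B|, taken as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    pattern = term_set("detector for the far ultraviolet spectrum")
    problem = term_set("infrared detectors for reconnaissance")
    score = jaccard(pattern, problem)
    print(round(score, 3), score >= BASELINE_ALPHA_CUT)  # → 0.167 False
```

Only "detector" is shared after stemming, so the score is 1/6 and the pattern falls below the baseline alpha-cut. A faithful replication would substitute a full Porter implementation (for example NLTK's `PorterStemmer`) and the general stop word list.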
Therefore, they can only be used in addition to the first sub measure.
The results of the second and third sub measures depend on the parameter z, which defines the frequent terms as the set of the z% most frequent terms after stemming and stop word filtering. Heuristically, we assume that z should lie between 10% and 30% to obtain good sub measures. If z is greater than 30%, several terms that occur only once are likely to be classified as frequent. If z is smaller than 10%, only very high-frequency terms enter the set; in this case, the values of the second and third sub measures are small regardless of whether known terms occur frequently in the problem description or unknown terms occur frequently in the new text. Therefore, we set z to the midpoint of this range (20%).
Additionally, we observe that the second and third sub measures are nearly equally successful and that the fourth sub measure is less successful. Therefore, we heuristically set g1 to 50%, g2 and g3 to 20% each, and g4 to 10%. We also try other values to optimize the combination of the four sub measures, but we do not find a combination that is generally superior to the selected one, because the success of a combination depends on the quality of the user-given textual information.
Then, we determine the alpha-cut value ã of the idea mining measure m. If ã is small, we get many result items, which leads to low precision because many extracted text patterns do not contain a new and useful idea. If ã is large, we get only a very small number of results, and recall is probably low because we miss most of the new and useful ideas in the new text. A human expert checks the results for several patent descriptions to find a suitable value of ã and finds that 60% is a good compromise. Therefore, we set ã to 60%.
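The parameter choices above can be illustrated with a short Python sketch. It assumes, as the weights summing to 100% suggest, that the idea mining measure m is the weighted sum g1·m1 + g2·m2 + g3·m3 + g4·m4 of sub measures scaled to [0, 1]; the sub measures themselves are defined elsewhere in the paper and are simply taken as inputs here. The function names, and rounding the z% cut-off up, are hypothetical choices.

```python
from collections import Counter

# Parameter values reported in the text.
WEIGHTS = (0.5, 0.2, 0.2, 0.1)  # g1, g2, g3, g4
ALPHA_CUT = 0.6                  # alpha-cut ã of the idea mining measure m
Z_PERCENT = 20                   # z: share of distinct terms treated as frequent

SubMeasures = tuple[float, float, float, float]  # (m1, m2, m3, m4), assumed in [0, 1]


def frequent_terms(terms: list[str], z_percent: int = Z_PERCENT) -> set[str]:
    """Return the z% most frequent distinct terms.

    `terms` is assumed to be already stemmed and stop-word filtered.
    The cut-off k is rounded up (a hypothetical choice); ties keep first-seen order.
    """
    counts = Counter(terms)
    k = (z_percent * len(counts) + 99) // 100  # integer ceiling of z% of the vocabulary
    return {term for term, _ in counts.most_common(k)}


def idea_measure(subs: SubMeasures, weights: tuple[float, ...] = WEIGHTS) -> float:
    """Assumed form of m: the weighted sum g1*m1 + g2*m2 + g3*m3 + g4*m4."""
    return sum(g * m for g, m in zip(weights, subs))


def select_patterns(scored: dict[str, SubMeasures], alpha_cut: float = ALPHA_CUT) -> list[str]:
    """Keep the text patterns whose measure m reaches the alpha-cut."""
    return [pattern for pattern, subs in scored.items() if idea_measure(subs) >= alpha_cut]


if __name__ == "__main__":
    scored = {
        "pattern A": (0.9, 0.6, 0.5, 0.4),  # m = 0.71, kept
        "pattern B": (0.2, 0.9, 0.9, 0.9),  # m = 0.55, dropped despite high m2..m4
    }
    print(select_patterns(scored))  # → ['pattern A']
```

Under this weighted-sum assumption, the numbers are consistent with the finding that the second to fourth sub measures cannot succeed alone: with g1 = 50% and ã = 60%, the other three sub measures contribute at most 0.5, so a pattern passes the alpha-cut only if m1 ≥ 0.2.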
Using the same evaluation procedure as described above, we set the alpha-cut value of Jaccard’s coefficient, the baseline measure, to 20%. Next, we determine the length of the text patterns. The length depends on the parameter l and on fg(wi), a term weighting scheme that is based on the difference between stop words and
7186 D. Thorleuchter et al. / Expert Systems with Applications 37 (2010) 7182–7188