IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. X, NO. X, XXX 200X 4

contain so many negative instances that it cannot represent the area of true positive instances at all (cf. Section 3.2.3). The diverse density (DD) value [10] measures how many different positive bags have instances near a point in the feature space and how far the negative instances are from that point. The DD method tries to find the point with the highest DD value and takes it as the target concept. The DD method is also very sensitive to labeling noise, since the DD value at a point is reduced exponentially if an instance from some negative bag is close to that point (cf. Section 3.2.3). This phenomenon has been validated empirically in [11]. Moreover, since the DD landscape contains local maxima, searching for the point with the highest DD value generally requires multiple restarts and hence incurs a high computation cost. Other methods, such as mi-SVM [7], solve the optimization problem heuristically, which may lead to local minima and also incurs a high computation cost.
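The sensitivity of the DD value to a single nearby negative instance can be illustrated with a small sketch. It assumes the commonly used noisy-or formulation of diverse density with a Gaussian-like instance model; this page does not spell out the formula, so the function names `instance_prob` and `diverse_density`, the unit scaling, and the toy bags below are illustrative assumptions rather than the exact definition used in [10].

```python
import math


def instance_prob(instance, point):
    """Gaussian-like probability that `instance` matches the concept at `point`.

    Assumption: unit feature scaling, i.e. exp(-squared Euclidean distance).
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(instance, point))
    return math.exp(-sq_dist)


def diverse_density(point, positive_bags, negative_bags):
    """Noisy-or diverse density of `point` (illustrative formulation)."""
    dd = 1.0
    for bag in positive_bags:
        # A positive bag supports the point if at least one instance is near it.
        none_near = math.prod(1.0 - instance_prob(x, point) for x in bag)
        dd *= 1.0 - none_near
    for bag in negative_bags:
        # A negative bag supports the point only if every instance is far away.
        dd *= math.prod(1.0 - instance_prob(x, point) for x in bag)
    return dd


# Toy data (hypothetical): both positive bags have an instance near (1, 1).
positive_bags = [[(1.0, 1.0), (5.0, 5.0)], [(1.1, 0.9), (-4.0, 2.0)]]
clean_negative_bags = [[(6.0, 6.0), (-5.0, -5.0)]]
# Labeling noise: one extra "negative" bag holds an instance next to (1, 1).
noisy_negative_bags = clean_negative_bags + [[(1.05, 1.0), (7.0, 0.0)]]

point = (1.0, 1.0)
dd_clean = diverse_density(point, positive_bags, clean_negative_bags)
dd_noisy = diverse_density(point, positive_bags, noisy_negative_bags)
print(f"DD without noise: {dd_clean:.4f}")
print(f"DD with one mislabeled bag: {dd_noisy:.4f}")
```

In this toy setting a single mislabeled bag is enough to drive the DD value at the candidate point close to zero, because each negative instance contributes a multiplicative factor that vanishes as the instance approaches the point.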
1.3 Main Contributions

In this paper, we propose a very efficient and robust MIL method, called MILD (Multiple-Instance Learning via Disambiguation), for general MIL problems. The main contributions of this paper can be summarized as follows:

• By investigating the properties of true positive instances in depth, we propose a novel disambiguation method for identifying the true positive instances in the positive bags. This method is not only very efficient but also very robust.
• Two feature representation schemes, one for instance-level classification and the other for bag-level classification, are proposed to convert the MIL problem into a standard single-instance learning (SIL) problem that can be solved directly by well-known SIL algorithms, such as the support vector machine (SVM).
• By combining the two feature representation schemes, a multi-view semi-supervised learning method based on co-training [12] is proposed for MIL. To the best of our knowledge, this is the first inductive semi-supervised learning method for MIL.
• To demonstrate the promising performance of our method, we extensively compare it with many state-of-the-art MIL methods in diverse applications, including drug activity prediction, protein sequence classification and image classification.

One of the most attractive advantages of our method is that, after instance label disambiguation, the MIL problem is converted into a standard single-instance learning problem. As a result,

March 1, 2009 DRAFT