IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. X, NO. X, XXX 200X

From the formulation of MIL, we can easily see that a positive bag may contain some negative instances in addition to one or more positive instances. Hence, the true label of an instance in a positive bag may or may not be the same as the corresponding bag label and, consequently, the instance labels are inherently ambiguous. In the MIL literature [2], true positive instances and false positive instances refer to the positive and negative instances, respectively, in the positive bags.

1.2 Motivation

Since labels are not available for the training instances, some methods [2], [6] simply assume that all instances in a bag have the same label as the bag. However, this assumption can be very unreasonable for positive bags because a positive bag may contain as few as one true positive instance. If the majority of the negative instances in a positive bag are mislabeled this way, the features learned for distinguishing positive instances from negative instances may end up being very misleading. Other methods [7], [8], [9] try to extend standard single-instance learning (SIL) methods to multi-instance data by adding constraints. Unfortunately, the resulting methods typically require solving non-convex optimization problems, which are prone to local minima and have a high computational cost.

Considering the limitations of these previous methods, we advocate here the necessity of instance label disambiguation as a way to eventually improve the prediction accuracy of the bag labels. In the context of MIL, disambiguation essentially refers to identifying the true positive instances in the positive bags.

However, existing disambiguation methods have not achieved promising performance. The APR (axis-parallel rectangle) method [1] tries to find an axis-parallel rectangle (or, more generally, hyper-rectangle) in the feature space to represent the region of the true positive instances. This rectangle should include at least one instance from each positive bag but exclude all instances from the negative bags. Although APR works quite well for the drug activity prediction problem, it is quite possible that for some other applications, such as image or text categorization, no APR satisfying this requirement can be found. Moreover, APR is very sensitive to labeling noise. Suppose only one negative bag is mislabeled as a positive one. In order to include at least one instance from this mislabeled negative bag, the computed APR may

March 1, 2009 DRAFT