IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. X, NO. X, XXX 200X

1 INTRODUCTION

1.1 Multiple-Instance Learning

The task of drug activity prediction [1] is to classify aromatic molecules according to whether or not they are “musky”. Here, the same molecule can manifest several different steric configurations (i.e., molecular shapes), each with very different energy properties. A molecule is said to be musky if it binds itself to a particular receptor in at least one of its configurations, even though it cannot bind itself to the receptor in its other configurations. When a molecule cannot bind itself to the receptor in any of its configurations, it is said to be non-musky.

A new learning paradigm called multiple-instance learning (MIL) was proposed in [1] to model learning problems like drug activity prediction. In an MIL problem [1], an individual example is called an instance, and a bag contains one or more instances. A bag is labeled positive if at least one of its instances is positive; otherwise, the bag is labeled negative. In the example of drug activity prediction, a bag corresponds to a molecule, and an instance corresponds to a configuration. A configuration is said to be positive if it can make the molecule bind to the receptor; otherwise, it is called negative. In MIL, the class labels available in the training set are associated with bags rather than instances. For example, we only know whether or not a molecule is musky, but do not know in which configuration the molecule can bind itself to the receptor.

As more and more applications have been formulated as MIL problems, some of them differ slightly from the original formulation of MIL in [1]. Recent works [2], [3] have pointed out that there are actually two different settings for MIL. One setting is based on the existential assumption, which assumes that a bag can be determined to be positive as long as a single positive instance exists in it. This setting is the same as the original formulation in [1]. The other setting is based on the collective assumption [3], which assumes that a bag’s label is collectively determined by a set of instances, or even by all the instances, in the corresponding bag. One representative application conforming to the collective assumption is the region-based image classification task [4], [5], where the label of an image always refers to some object in that image. Because current automatic image segmentation methods may cut the object responsible for the label of the image into multiple parts, no single part can represent the object satisfactorily. It is the collective property of multiple parts that determines the label of the image.

March 1, 2009 DRAFT
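The two settings described above can be contrasted with a minimal Python sketch. The existential rule follows directly from the definition in the text (a bag is positive if at least one instance is positive). The collective rule is only an illustrative assumption: the text does not specify how instances jointly determine the bag label, so the sketch uses a hypothetical rule that sums per-instance evidence scores and compares the total against a hypothetical threshold. The function names, the scores, and the threshold value are all invented for illustration.

```python
from typing import Sequence


def existential_label(instance_labels: Sequence[bool]) -> bool:
    """Existential assumption: the bag is positive if any instance is positive."""
    if not instance_labels:
        raise ValueError("a bag must contain at least one instance")
    return any(instance_labels)


def collective_label(instance_scores: Sequence[float], threshold: float = 1.0) -> bool:
    """Hypothetical collective rule (an assumption, not taken from the text):
    the bag is positive if the summed evidence of its instances reaches
    the threshold, even when no single instance does so on its own."""
    if not instance_scores:
        raise ValueError("a bag must contain at least one instance")
    return sum(instance_scores) >= threshold


# Drug activity example: a molecule (bag) with three configurations (instances),
# exactly one of which can bind to the receptor.
molecule = [False, True, False]
is_musky = existential_label(molecule)

# Region-based image example: an object cut into three segments (instances),
# each carrying only partial, made-up evidence for the image label.
regions = [0.4, 0.35, 0.3]
image_is_positive = collective_label(regions)
```

In the second example no single region reaches the threshold on its own, yet the regions taken together do, which mirrors the observation that no single part of a segmented object represents it satisfactorily.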