

Vol. 17 No. 2 CAAI Transactions on Intelligent Systems Mar. 2022
DOI: 10.11992/tis.202012019
Online publication URL: https://kns.cnki.net/kcms/detail/23.1538.TP.20210621.1427.002.html
CLC number: TP181 Document code: A Article ID: 1673-4785(2022)02-0248-09
Chinese citation format: 周晶雨, 王士同. 对不平衡目标域的多源在线迁移学习[J]. 智能系统学报, 2022, 17(2): 248–256.
English citation format: ZHOU Jingyu, WANG Shitong. Multi-source online transfer learning for imbalanced target domains[J]. CAAI transactions on intelligent systems, 2022, 17(2): 248–256.

Multi-source online transfer learning for imbalanced target domains
ZHOU Jingyu, WANG Shitong
(School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China)

Abstract: Multi-source online transfer learning has been widely used in applications where the related source domains contain large amounts of labeled data and the target-domain data arrive as a data stream. However, the class distribution of the target domain is sometimes imbalanced. For the imbalanced binary classification problem in which multiple target-domain samples arrive online at a time, this paper proposes a multi-source online transfer learning algorithm that oversamples the target-domain samples. First, the algorithm finds the k nearest neighbors of each sample in the current batch among the samples of previous batches; it then generates a small number of majority-class samples, and finally generates minority-class samples so that the class distribution of the current batch becomes balanced. In each batch, the synthetic and real samples are used together to train the target-domain function, thereby improving the classification performance of the target domain
function. In addition, oversampling methods are designed for both the input space and the feature space of the target domain, and comprehensive experiments on multiple real-world data sets demonstrate the effectiveness of the proposed algorithm.
Keywords: multi-source transfer learning; online learning; target domain; imbalanced data; oversampling; k-nearest neighbor; input space; feature space

Received: 2020-12-16. Published online: 2021-06-21.
Foundation item: National Natural Science Foundation of China (61572236).
Corresponding author: WANG Shitong. E-mail: wxwangst@aliyun.com.
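The abstract describes the batch oversampling step only at a high level. The sketch below is one possible reading, built on assumptions the text does not confirm: labels are +1/-1, neighbours are searched only among earlier-batch samples of the same class, each synthetic point is produced by SMOTE-style linear interpolation x + λ(x_nn - x) with λ drawn uniformly from [0, 1), and the number of extra majority samples (`n_majority_extra`) is a hypothetical parameter. The function names are illustrative, and only the input-space variant is shown; the feature-space variant mentioned in the abstract is not attempted here.

```python
import math
import random


def knn_same_class(x, pool, label, k):
    """Return up to k vectors from `pool` (a list of (vector, label) pairs from
    earlier batches) that carry `label`, nearest to x in Euclidean distance."""
    same = [v for v, y in pool if y == label]
    same.sort(key=lambda v: math.dist(v, x))
    return same[:k]


def synthesize(n, label, batch, pool, k, rng):
    """Create n synthetic vectors of one class: pick a random current-batch seed,
    pick one of its k nearest same-class neighbours from earlier batches, and
    interpolate linearly between the two (SMOTE-style)."""
    seeds = [v for v, y in batch if y == label]
    out = []
    for _ in range(n):
        x = rng.choice(seeds)
        nbrs = knn_same_class(x, pool, label, k)
        if not nbrs:  # assumption: with no earlier neighbour, copy the seed
            out.append(list(x))
            continue
        nn = rng.choice(nbrs)
        lam = rng.random()  # interpolation weight in [0, 1)
        out.append([xi + lam * (ni - xi) for xi, ni in zip(x, nn)])
    return out


def oversample_batch(batch, pool, k=5, n_majority_extra=1, rng=None):
    """Hypothetical batch balancing for labels +1/-1: first add a few majority
    samples, then enough minority samples that both classes are equally sized."""
    rng = rng if rng is not None else random.Random()
    labels = {y for _, y in batch}
    if len(labels) < 2:
        return list(batch)  # assumption: single-class batches are left unchanged
    counts = {c: sum(1 for _, y in batch if y == c) for c in labels}
    majority = max(counts, key=counts.get)
    minority = next(c for c in labels if c != majority)
    n_minority_new = counts[majority] + n_majority_extra - counts[minority]
    new_maj = synthesize(n_majority_extra, majority, batch, pool, k, rng)
    new_min = synthesize(n_minority_new, minority, batch, pool, k, rng)
    return (list(batch)
            + [(v, majority) for v in new_maj]
            + [(v, minority) for v in new_min])


# Usage: 2 positives and 8 negatives in the current batch, 40 earlier samples.
rng = random.Random(0)
pool = [([rng.gauss(0, 1), rng.gauss(0, 1)], 1 if rng.random() < 0.3 else -1)
        for _ in range(40)]
batch = [([rng.gauss(0, 1), rng.gauss(0, 1)], y) for y in [1, 1] + [-1] * 8]
balanced = oversample_batch(batch, pool, k=3, rng=rng)
print(sum(y == 1 for _, y in balanced), sum(y == -1 for _, y in balanced))  # 9 9
```

With 2 positive and 8 negative samples and `n_majority_extra=1`, one negative and seven positive samples are synthesized, so the returned list holds 9 samples of each class. That list (real plus synthetic samples) would then feed whatever online update trains the target-domain function.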
The main goal of transfer learning is to use knowledge from source domains to improve learning performance in the target domain, and it has been studied extensively for many years[1]. Extracting useful information from existing data with similar distributions can alleviate the problem of limited training data or excessively high labeling cost in the target domain. In many practical applications there are several offline source domains whose distributions are similar to that of the target domain, so auxiliary information can easily be collected from them. To cope with the fact that different sources contribute differently to the target domain, many sophisticated boosting-based multi-source transfer learning algorithms[2-3] have been designed. Boosting-based
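The introduction notes that different sources contribute differently to the target domain, which is why multi-source methods weight them. The boosting-based algorithms cited as [2-3] are not reproduced here. As a plainly named stand-in, the sketch below uses a Hedge-style multiplicative weighting, a common device in online ensembles: each member is a hypothetical callable returning a real-valued score, and every member that misclassifies an arriving labelled sample has its weight multiplied by `beta` before the weights are renormalised. In practice one member could be the target-domain classifier trained on the balanced batches; that member, the class name `WeightedSourceEnsemble`, and the value of `beta` are all assumptions, not the paper's method.

```python
class WeightedSourceEnsemble:
    """Weighted vote over classifiers (e.g. source-domain models plus a target
    model). Hedge-style: a member that misclassifies a labelled sample has its
    weight multiplied by beta, then all weights are renormalised to sum to 1."""

    def __init__(self, members, beta=0.5):
        self.members = list(members)  # callables: x -> real-valued score
        self.weights = [1.0 / len(self.members)] * len(self.members)
        self.beta = beta  # discount factor in (0, 1) applied on a mistake

    @staticmethod
    def _sign(score):
        return 1 if score >= 0 else -1

    def predict(self, x):
        score = sum(w * m(x) for w, m in zip(self.weights, self.members))
        return self._sign(score)

    def update(self, x, y):
        for i, m in enumerate(self.members):
            if self._sign(m(x)) != y:
                self.weights[i] *= self.beta
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]


# Usage: on this toy stream one member is always right, the other always wrong.
def good(x):
    return x[0]


def bad(x):
    return -x[0]


ens = WeightedSourceEnsemble([good, bad], beta=0.5)
for x, y in [([1.0], 1), ([-2.0], -1), ([0.5], 1)]:
    ens.update(x, y)
print([round(w, 3) for w in ens.weights])  # [0.889, 0.111]
print(ens.predict([3.0]))                  # 1
```

Because the weights are renormalised after every update they always sum to 1, and a member that keeps erring loses influence geometrically while the reliable member dominates the vote.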

