GAO Yang et al.: Malware Behavior Detection Method for Industrial Control Systems Based on Reinforcement Learning 461

Table 1 Confusion matrix of the classification results

                    Prediction: malicious   Prediction: benign
Truth: malicious    257 (TP)                43 (FN)
Truth: benign       4 (FP)                  296 (TN)

Table 2 The 5 API functions with the highest and the 5 with the lowest deletion rates

API function        Deletions   Retentions   Deletion rate
VirtualAllocEx            174          209   0.454308
IsDBCSLeadByte             89          135   0.397321
GetSystemDirectoryA       101          206   0.328990
CreateThread               38          106   0.263889
GetDC                      82          229   0.263666
GetProcAddress              0         2883   0
CloseHandle                 0         2853   0
LocalFree                   0         1939   0
GetModuleFileNameW          0         1485   0
lstrlenW                    0         1460   0

Note: Deletion rate = deletions / (deletions + retentions), i.e. deletions divided by total occurrences in the malicious test samples; only API functions occurring more than 100 times are ranked.
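Table 1 reports raw counts only. The usual summary metrics follow directly from them; the sketch below computes them, treating malicious as the positive class, as the TP/FN labels in Table 1 indicate. The class name `ConfusionMatrix` and its methods are our own illustration, not code from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix with 'malicious' as the positive class (illustrative)."""
    tp: int  # malicious samples predicted malicious
    fn: int  # malicious samples predicted benign
    fp: int  # benign samples predicted malicious
    tn: int  # benign samples predicted benign

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.fp + self.tn

    def accuracy(self) -> float:
        # Fraction of all samples classified correctly.
        return (self.tp + self.tn) / self.total

    def precision(self) -> float:
        # Fraction of samples flagged as malicious that really are malicious.
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        # Fraction of malicious samples that are detected (true positive rate).
        return self.tp / (self.tp + self.fn)

    def false_positive_rate(self) -> float:
        # Fraction of benign samples wrongly flagged as malicious.
        return self.fp / (self.fp + self.tn)

    def f1(self) -> float:
        # Harmonic mean of precision and recall.
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r)


# Counts copied from Table 1.
table1 = ConfusionMatrix(tp=257, fn=43, fp=4, tn=296)

if __name__ == "__main__":
    for name in ("accuracy", "precision", "recall", "false_positive_rate", "f1"):
        print(f"{name:>20}: {getattr(table1, name)():.4f}")
```

For Table 1 this gives accuracy 553/600 ≈ 0.9217, precision 257/261 ≈ 0.9847, recall 257/300 ≈ 0.8567, false positive rate 4/300 ≈ 0.0133 and F1 514/561 ≈ 0.9162: the classifier rarely flags benign samples, but it misses 43 of the 300 malicious ones.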
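The ranking behind Table 2 (count each API function in the malicious test samples, keep those occurring more than 100 times, and sort by deletions / total occurrences) can be sketched as follows. This is a minimal sketch under an assumed input format: the paper does not describe how the agent's decisions are logged, so the `(api_name, deleted)` pairs, the `Decision` alias and the `deletion_rates` function are hypothetical. Breaking ties by total occurrences is also our assumption, chosen because it reproduces the order of the zero-rate rows in Table 2.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical log format: one (api_name, deleted) pair per API call that
# the RL agent processed in a malicious test sample.
Decision = Tuple[str, bool]


def deletion_rates(decisions: Iterable[Decision],
                   min_count: int = 100) -> List[Tuple[str, int, int, float]]:
    """Return (api, deletions, retentions, rate) rows, highest rate first.

    Only APIs occurring more than `min_count` times are kept, and
    rate = deletions / total occurrences.
    """
    deleted: Counter = Counter()
    retained: Counter = Counter()
    for api, was_deleted in decisions:
        (deleted if was_deleted else retained)[api] += 1

    rows = []
    for api in deleted.keys() | retained.keys():
        d, r = deleted[api], retained[api]  # missing keys count as 0
        if d + r > min_count:
            rows.append((api, d, r, d / (d + r)))
    # Highest rate first; ties by more occurrences first, then by name.
    rows.sort(key=lambda row: (-row[3], -(row[1] + row[2]), row[0]))
    return rows


if __name__ == "__main__":
    # Expand the Table 2 counts back into per-call decisions for a demo.
    counts = {
        "VirtualAllocEx": (174, 209),
        "IsDBCSLeadByte": (89, 135),
        "GetSystemDirectoryA": (101, 206),
        "CreateThread": (38, 106),
        "GetDC": (82, 229),
        "GetProcAddress": (0, 2883),
        "CloseHandle": (0, 2853),
        "LocalFree": (0, 1939),
        "GetModuleFileNameW": (0, 1485),
        "lstrlenW": (0, 1460),
    }
    log = [(api, True) for api, (d, _) in counts.items() for _ in range(d)]
    log += [(api, False) for api, (_, r) in counts.items() for _ in range(r)]

    rows = deletion_rates(log)
    for api, d, r, rate in rows[:5] + rows[-5:]:
        print(f"{api:<20} {d:>5} {r:>5} {rate:.6f}")
```

Fed with the Table 2 counts, the demo reproduces the table's rates and row order, for example 174 / 383 ≈ 0.454308 for VirtualAllocEx.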
Because this paper focuses on the behavioral features of malicious samples, the question of interest when the reinforcement learning model processes a malicious sample is which API functions it deletes and which it retains. We therefore counted, over the malicious samples in the test set, all API functions that occur more than 100 times and ranked them by deletion rate (number of deletions / total number of occurrences). The upper and lower halves of Table 2 list the 5 API functions with the highest and the lowest deletion rates, respectively. The results show that functions such as VirtualAllocEx play a relatively minor role in classification; VirtualAllocEx in particular has the highest deletion rate, with 174 of its 383 occurrences (about 45%) deleted by the model. Thus, although VirtualAllocEx appears in most malicious samples, it contributes little to classification. By contrast, functions such as GetProcAddress and CloseHandle occur very frequently (2883 and 2853 times) and are never deleted, which indicates that they contribute substantially to the features extracted by the trained reinforcement learning model. From the perspective of malware behavior, these API functions correspond to the key behaviors of malicious samples.

4 Conclusion

To detect the behavioral features of malware in industrial control systems effectively, this paper designs an intelligent detection framework built on reinforcement learning, an advanced machine learning paradigm. Exploiting two particular strengths of reinforcement learning, sequential decision-making and the ability to adjust its learning policy based on feedback, the framework filters malware behavior sequences to obtain effective behavior-sequence features and then uses these features to detect and classify malware. The three key modules of the framework, namely the feature extraction network, the policy network and the classification network, are discussed and analyzed in detail. Experiments on a real-world dataset show that the proposed reinforcement learning-based method can, to a certain extent, perform the detection task intelligently: 553 of the 600 samples in Table 1 are classified correctly.
