CAAI Transactions on Intelligent Systems, Vol.11 No.3, Jun. 2016
DOI: 10.11992/tis.201603051
Online publication address: http://www.cnki.net/kcms/detail/23.1538.TP.20160513.0913.006.html

Protein inference method based on probabilistic graphical model

ZHAO Can, DUAN Qiong, HE Zengyou
(School of Software, Dalian University of Technology, Dalian 116620, China)

Abstract: Proteomics is an emerging discipline that studies, on a large scale, the proteins expressed in an organism and how they change. A central goal of proteomics is the fast and accurate identification of the proteins in a cell or tissue. Protein identification generally consists of two steps: peptide identification and protein inference. Peptide identification determines peptide sequences from raw tandem mass spectrometry data, while protein inference determines, from the identified peptides, which proteins are truly present in the sample. The inherent uncertainty of MS data and the complexity of the proteome make protein inference difficult. In this article, we propose a method based on a probabilistic graphical model (PGMPi) that incorporates the influence of tandem mass spectrometry data on the probability that a protein is present. The method casts protein inference as the problem of solving a probabilistic graphical model: it finds the maximum a posteriori probabilities of the proteins in order to infer the set of proteins that is actually present in the sample. PGMPi not only performs protein inference effectively but also introduces only one parameter, which improves the algorithm's stability. The experimental results demonstrate that our method is superior to existing state-of-the-art protein inference algorithms.

Keywords: protein inference; peptide inference; shotgun proteomics; probabilistic graphical model

CLC number: TP393  Document code: A  Article ID: 1673-4785(2016)01-0376-08

Citation: ZHAO Can, DUAN Qiong, HE Zengyou. Protein inference method based on probabilistic graphical model[J]. CAAI Transactions on Intelligent Systems, 2016, 11(2): 376-383.

Received: 2016-03-20. Published online: 2016-05-13.
Foundation item: National Natural Science Foundation of China (61572094).
Corresponding author: HE Zengyou. E-mail: zyhe@dlut.edu.cn.

Proteomics is an emerging discipline that studies all of the proteins expressed in a cell and how they change [1]. The proteome mainly refers to all of the proteins expressed by a genome, or by a cell or tissue. The genome is essentially fixed, whereas the proteome is