
Zhejiang University: Bioinformatics (《生物信息学》, 2nd edition) companion PPT slides, 3 Analysis and alignment of sequences, 3.4 Multiple sequence alignment and domain finding


Bioinformatics (《生物信息学》, 2nd edition; Fan Longjiang (樊龙江), ed., 2021), companion PPT 3-3
3.4 Multiple sequence alignment and domain finding
(1) Multiple sequence alignment and progressive global alignment (ClustalW)
(2) Find and model local multiple alignment
(3) How to evaluate the quality of a PSSM?


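[Diagram, translated labels: aligning multiple sequences; information content/entropy; profile; HMM; regular expression. DNA/RNA: conserved functional sites/elements (databases: Rfam, Dfam, etc.). Protein: functional domains (databases: PROSITE, Pfam, etc.).]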


(1) Multiple sequence alignment and progressive global alignment (ClustalW)
Why produce a multiple sequence alignment?
• Conserved regions/domains are likely to represent regions that are essential for structure and function: the core of proteins.
• A multiple sequence alignment is a starting point for an evolutionary (phylogenetic) analysis.
• Using more than two sequences results in a more convincing alignment by revealing conserved regions in all of the sequences.


Types of multiple sequence alignment
• Global alignment, in which entire sequences are aligned at the same time using an extension of dynamic programming
• Local alignment, in which conserved local regions are either
  • derived by removing stretches of a global alignment, or
  • found by statistical methods


Example of local MSA
1 AGGCTT
2 AAGCTA
3 AGACTT
4 AAACTA
[Figure labels: the block above is the domain that aligns well; the rest of the proteins, on either side of it, do not align well.]
The goal is to find an identifiable common pattern, usually the longest one, with some degree of variability. No gaps are shown in this example, but they can be accommodated.
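One way to make "a common pattern with some degree of variability" concrete is to count the residues in each column of the block. The sketch below is only an illustration under that assumption; the function names `column_counts` and `conserved_columns` are hypothetical and do not come from the slides.

```python
from collections import Counter


def column_counts(alignment):
    """Return one Counter of residues per alignment column."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows must have the same length")
    # zip(*alignment) walks the alignment column by column.
    return [Counter(column) for column in zip(*alignment)]


def conserved_columns(alignment):
    """Return 0-based indices of columns that hold a single residue type."""
    return [i for i, counts in enumerate(column_counts(alignment))
            if len(counts) == 1]


# The four ungapped sequences from the slide.
block = ["AGGCTT", "AAGCTA", "AGACTT", "AAACTA"]

if __name__ == "__main__":
    for position, counts in enumerate(column_counts(block), start=1):
        print(position, dict(counts))
    print("fully conserved (0-based):", conserved_columns(block))
```

Per-column counts like these are the raw material for the profiles and PSSMs listed in the section outline; fully conserved columns and two-way split columns are both visible in this small block.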


Challenges
• Finding an optimal alignment of more than two sequences that includes matches, mismatches, and gaps, and that takes into account the degree of variation in all of the sequences at the same time, poses a very difficult challenge.
• A second computational challenge is identifying a reasonable method of obtaining a cumulative score for the substitutions in a column of an MSA.
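One common answer to the second challenge is the sum-of-pairs (SP) score, which adds up the pairwise scores of every pair of residues in a column. The sketch below assumes a toy scoring scheme (match +1, mismatch -1, residue against gap -2, gap against gap 0); these values and the function names are illustrative assumptions, not taken from the slides or from the MSA program.

```python
from itertools import combinations

# Assumed toy scoring scheme; real analyses use substitution matrices.
MATCH, MISMATCH, GAP = 1, -1, -2


def pair_score(a, b):
    """Score one pair of symbols taken from the same column."""
    if a == "-" and b == "-":
        return 0          # two gaps carry no information
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def column_sp(column):
    """Sum-of-pairs score of a single alignment column."""
    return sum(pair_score(a, b) for a, b in combinations(column, 2))


def alignment_sp(alignment):
    """Sum-of-pairs score of a whole alignment (rows of equal length)."""
    return sum(column_sp(column) for column in zip(*alignment))


if __name__ == "__main__":
    rows = ["AGGCTT", "AAGCTA", "AGACTT", "AAACTA"]
    print([column_sp(col) for col in zip(*rows)], alignment_sp(rows))
```

With four rows there are six pairs per column, so a fully conserved column scores 6 under this scheme, while a column split two against two scores 2 - 4 = -2.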


Multiple sequence alignment is computationally complex
Suppose one tries to align three sequences by extending the method for aligning two sequences to a three-dimensional scoring matrix.
[Figure: scoring matrix extended to three dimensions (legible axis labels: Sequence 1, Sequence 2); the optimal score is sought in 3 dimensions.]
Problems:
• The time and space needed grow as the sequence length raised to the power of the number of sequences.
• This is feasible for three sequences but not for more than three.
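The three-dimensional extension described above can be sketched as follows. This is a minimal illustration, not the method of any particular program: it assumes a toy scoring scheme (match +1, mismatch -1, residue against gap -2, gap against gap 0), scores each column by summing its three pairwise scores, and returns only the optimal score without a traceback. The name `align3_score` is hypothetical.

```python
from itertools import product

# Assumed toy scoring scheme with a linear gap penalty.
MATCH, MISMATCH, GAP = 1, -1, -2


def pair_score(a, b):
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def column_score(x, y, z):
    """Sum of the three pairwise scores in one alignment column."""
    return pair_score(x, y) + pair_score(x, z) + pair_score(y, z)


def align3_score(s1, s2, s3):
    """Optimal global score for three sequences by 3-D dynamic programming."""
    n1, n2, n3 = len(s1), len(s2), len(s3)
    neg_inf = float("-inf")
    # One cell per prefix triple: (n1 + 1) * (n2 + 1) * (n3 + 1) cells.
    D = [[[neg_inf] * (n3 + 1) for _ in range(n2 + 1)] for _ in range(n1 + 1)]
    D[0][0][0] = 0
    # Each move consumes a residue (1) or inserts a gap (0) per sequence;
    # the all-gap move is excluded, leaving 7 predecessors per cell.
    moves = [m for m in product((0, 1), repeat=3) if any(m)]
    for i, j, k in product(range(n1 + 1), range(n2 + 1), range(n3 + 1)):
        if i == j == k == 0:
            continue
        best = neg_inf
        for di, dj, dk in moves:
            pi, pj, pk = i - di, j - dj, k - dk
            if pi < 0 or pj < 0 or pk < 0:
                continue
            x = s1[pi] if di else "-"
            y = s2[pj] if dj else "-"
            z = s3[pk] if dk else "-"
            best = max(best, D[pi][pj][pk] + column_score(x, y, z))
        D[i][j][k] = best
    return D[n1][n2][n3]


if __name__ == "__main__":
    print(align3_score("AGGCTT", "AAGCTA", "AGACTT"))
```

The table has one cell per triple of prefixes and each cell looks at seven predecessors, which is where the growth in time and memory with the number of sequences comes from.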


Alignment of three sequences by dynamic programming
• For three protein sequences each 300 amino acids in length, and excluding gaps, the number of comparisons to be made by dynamic programming is $300^3 = 2.7 \times 10^7$, whereas only $300^2 = 9 \times 10^4$ are required for two sequences of this length. (The number of steps and the memory required for $N$ sequences of $M$ amino acids each: $M^N$.)
• Carrillo and Lipman (1988) found a way (the sum of pairs, SP, method, implemented in the MSA program) to reduce the number of comparisons that must be made without compromising the attempt to find an optimal alignment.
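The growth in the number of comparisons can be checked with a few lines of arithmetic. The helper name `dp_cells` is an assumption for illustration; like the slide, it ignores gaps and the extra row and column of the DP table.

```python
def dp_cells(seq_length, n_sequences):
    """Number of residue comparisons for N sequences of length M: M ** N."""
    return seq_length ** n_sequences


if __name__ == "__main__":
    # Sequence length 300, as in the slide, for 2 to 5 sequences.
    for n in (2, 3, 4, 5):
        print(f"N = {n}: {dp_cells(300, n):.1e} comparisons")
```

Each additional sequence multiplies the work by the sequence length, which is why the full dynamic programming approach stops being practical after a handful of sequences.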


Basic idea of MSA program: sum of pairs (SP) method


Thus, approximate methods are used, including:
(1) progressive global alignment of the sequences, starting with an alignment of the most alike sequences and then building up the alignment by adding more sequences;
(2) iterative methods that make an initial alignment of groups of sequences and then revise the alignment to achieve a more reasonable result;
(3) alignments based on locally conserved patterns found in the same order in the sequences;
(4) use of statistical methods and probabilistic models of the sequences.
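The first step of approach (1), finding the most alike sequences to start from, can be sketched as below. This is only an illustration under stated assumptions: similarity is taken to be a Needleman-Wunsch global alignment score with toy values (match +1, mismatch -1, linear gap -2), and the function names are hypothetical. ClustalW itself goes further, deriving distances from the pairwise alignments and building a guide tree that fixes the order in which sequences are added; none of that is reproduced here.

```python
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global (Needleman-Wunsch) alignment score, keeping one row at a time."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def most_alike_pair(seqs):
    """Return the index pair (i, j), i < j, with the highest pairwise score.

    Ties are broken by taking the first pair in itertools.combinations order.
    """
    pairs = combinations(range(len(seqs)), 2)
    return max(pairs, key=lambda p: nw_score(seqs[p[0]], seqs[p[1]]))


if __name__ == "__main__":
    seqs = ["ACGT", "ACGA", "TTTT"]
    i, j = most_alike_pair(seqs)
    print("start the progressive alignment from:", seqs[i], seqs[j])
```

A progressive aligner would align this pair first and then add the remaining sequences one at a time, typically in order of decreasing similarity.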
