The Chinese University of Hong Kong (香港中文大學)

An Introduction to Bioinformatics and Its Application in Protein-DNA/Protein Interactions Research and Drug Discovery

CMSC5719
Dr. Leung, Kwong Sak
Professor of Computer Science and Engineering
Mar 26, 2012

Outline
  I. Introduction to Bioinformatics
  II. Protein-DNA Interactions
  III. Drug Discovery
  IV. Discussion and Conclusion

I. Introduction to Bioinformatics
  - Bioinformatics
  - Research Areas
  - Biological Basics

Introduction
  - Bioinformatics: more and more crucial in life sciences and biomedical applications for analysis and new discoveries
  - Bioinformatics bridges Biology and Informatics (e.g. Computer Science):
      - Biology: huge noisy data; costly annotations; individual & specific
      - Informatics: curated and well-organized; effective and efficient analysis; generalized knowledge

Bioinformatics Research Areas
  Many (crossing) areas:
  - (Genome-scale) Sequence Analysis
      - Sequence alignments, motif discovery, genome-wide association (to study diseases such as cancers)
  - Computational Evolutionary Biology
      - Phylogenetics, evolution modeling
  - Analysis of Gene Regulation
      - Gene expression analysis, alternative splicing, protein-DNA interactions, gene regulatory networks
  - Structural Biology
      - Drug discovery, protein folding, protein-protein interactions
  - Synthetic Biology
  - High throughput Imaging Analysis
  - …

Our Research Roadmap
  [Figure: roadmap diagram with three layers, Real-life Projects → Related Bioinformatics Problems → Computer Techniques]
  - Real-life projects: Drug Discovery; HIV-1 Project; SNP Analysis (HBV); Alternative Splicing; Protein-DNA Interactions; Gene Networks; Protein-Protein Interactions
  - Related bioinformatics problems (as far as recoverable from the figure): docking; ligand growing; sequence alignment; association study; genomic analysis; motif discovery; network analysis; phylogeny
  - Computer techniques (as far as recoverable from the figure): data mining; statistical modeling; optimization; differential equations; kernel methods; non-linear integral; expectation maximization; feature selection (mutual information); GPU/parallel computing; finite Markov chains; Markov chain Monte Carlo (MCMC); HMMs; suffix trees and suffix array; BWT index; approximate matching; evolutionary computation

Genome-wide Association
  - Data: human DNA sequences (Normal vs. Disease)
  - SNPs (single nucleotide polymorphisms; >5% variations)
  - Targets: SNPs that are associated with genetic diseases; diagnosis and healthcare for high-risk patients
  - Methods: feature selection; mutual information; non-linear integrals; Support Vector Machine (SVM); …

  Reference: KS Leung, KH Lee, (JF Wang), (Eddie YT Ng), Henry LY Chan, Stephen KW Tsui, Tony SK Mok, Chi-Hang Tse, Joseph JY Sung, "Data Mining on DNA Sequences of Hepatitis B Virus". IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2011
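
As a rough illustration of the "feature selection; mutual information" idea listed above, the following minimal sketch ranks SNP sites by their empirical mutual information with a normal/disease label. It is not the method of the cited paper; the function names (`mutual_information`, `rank_sites`) and the toy genotypes are assumptions made up for this example.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, of two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

def rank_sites(genotypes, labels):
    """Return (score, site index) pairs, most informative site first."""
    n_sites = len(genotypes[0])
    scores = [
        (mutual_information([g[i] for g in genotypes], labels), i)
        for i in range(n_sites)
    ]
    return sorted(scores, reverse=True)

# Toy, made-up data: 6 individuals x 3 sites (hypothetical, not real SNPs).
genotypes = ["AAG", "AAG", "ACG", "TCG", "TCG", "TAG"]
labels = ["normal", "normal", "normal", "disease", "disease", "disease"]

ranking = rank_sites(genotypes, labels)
for score, site in ranking:
    print(f"site {site}: MI = {score:.3f} bits")
```

In this toy set, site 0 separates the two groups perfectly and ranks first, while site 2 never varies and carries no information. A real study would also need to account for sample size and multiple testing, which this sketch ignores.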

HBV Project (Example)
  - Data: HBV sequences from two groups
      - Hepatitis B (Hep B) → Normal
      - Hep B → Cancer!
  - SNPs are not known and are to be discovered by alignments
  - Pipeline: Feature Selection → Non-linear Integral (Problem Modeling) → Optimization and Classification → Explicit Diagnosis Rules (if sites XX & YY are A & T, then …)
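
Two steps on this slide can be sketched in a few lines: finding candidate variable sites in already-aligned sequences, and checking the "if" part of an explicit rule such as "sites XX & YY are A & T". This is a simplified illustration under stated assumptions, not the project's actual pipeline: the sequences are invented, the alignment is assumed to be done beforehand, and the helper names and rule are hypothetical. The rule's conclusion is left out because the slide elides it.

```python
def variable_sites(aligned):
    """Return column indices at which the aligned sequences do not all agree."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [i for i in range(length) if len({seq[i] for seq in aligned}) > 1]

def rule_matches(seq, conditions):
    """Check the antecedent of a rule: every listed site holds the listed base."""
    return all(seq[site] == base for site, base in conditions.items())

# Made-up aligned toy sequences (not real HBV data).
aligned = ["ACGTAC", "ACGTAC", "ATGTAC", "ATGTTC"]

sites = variable_sites(aligned)
print(sites)  # [1, 4]

# Hypothetical rule antecedent: site 1 is T and site 4 is T.
rule = {1: "T", 4: "T"}
flags = [rule_matches(seq, rule) for seq in aligned]
print(flags)  # [False, False, False, True]
```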

Biological Basics
  - Cell → Chromosome (telomere, centromere) → Genome → DNA
  - DNA sequence: a string with alphabet Σ = {A, C, G, T}
      5'...AGACTGCGGA...3'
      3'...TCTGACGCCT...5'
  - Base pairs: A-T, C-G
  - Gene (...AGACTGCGGA...) → [Transcription] → RNA → [Translation] → Protein
  - Protein: a string of amino acids, Σ = {A, R, N, D, C, E, ...}, |Σ| = 20
  - Protein functions: regulatory functions; other functions: protein-protein, protein-ligand

  Image sources:
    http://www.jeffdonofrio.net/DNA/DNA%20graphics/chromosome.gif
    http://upload.wikimedia.org/wikipedia/commons/7/7a/Protein_conformation.jpg
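
The "DNA as a string over {A, C, G, T}" view and the A-T / C-G pairing rule can be made concrete with a short sketch. It reproduces the two strands shown on the slide; the function names are illustrative only, and transcription is modeled in the simplest possible way (copying the strand shown, with U in place of T), which glosses over the biology.

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Return the base-paired strand, read in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in strand)

def reverse_complement(strand: str) -> str:
    """Return the paired strand written in its own 5'->3' direction."""
    return complement(strand)[::-1]

def transcribe(coding_strand: str) -> str:
    """Simplified transcription: copy the coding strand with U replacing T."""
    return coding_strand.replace("T", "U")

top = "AGACTGCGGA"               # 5'...AGACTGCGGA...3' from the slide
print(complement(top))           # TCTGACGCCT (the 3'...5' strand on the slide)
print(reverse_complement(top))   # TCCGCAGTCT
print(transcribe(top))           # AGACUGCGGA
```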

Protein-ligand Interactions
  - Drug Discovery
      - Protein structures
      - Computational power
      - Simulation over wet lab
  - Protein: other functions include protein-protein and protein-ligand interactions
  - Detailed in III. Drug Discovery