The use of NMR methods for conformational studies of nucleic acids

Sybren S. Wijmenga*, Bernd N.M. van Buuren
Umeå University, Department of Medical Biochemistry and Biophysics, S 901 87 Umeå, Sweden
Received 10 July 1997

Progress in Nuclear Magnetic Resonance Spectroscopy 32 (1998) 287–387
0022-2860/98/$19.00 © 1998 Published by Elsevier Science B.V. All rights reserved
PII S0079-6565(97)00023-X
* Corresponding author. Tel: +46 9078 6500; fax: +46 9013 6310; e-mail: sybren@indigo.chem.umu.se

Contents
1. Introduction
2. RNA and DNA synthesis and purification
3. Nomenclature
4. Distances
  4.1. Overview of short distances and their general characteristics
  4.2. Overview of structurally important intra-nucleotide distances
  4.3. Overview of structurally important sequential and cross-strand distances
  4.4. Derivation of distances from NOESY spectra and structure characterization using distances
  4.5. Conclusion
5. J-couplings
  5.1. 1JHC- and 1JCC-couplings
  5.2. Overview of J-couplings in the bases
  5.3. Ribose sugar
  5.4. Determination of the β torsion angle
  5.5. Determination of the ε torsion angle
  5.6. Torsion angle γ and H5′ and H5″ stereospecific assignment
  5.7. χ torsion angle and 3JHC sugar to base
  5.8. Measurement of homo- and heteronuclear J-coupling constants
    5.8.1. Determination of J-couplings from the shape of the signal
    5.8.2. Determination of J-couplings from E.COSY patterns
      5.8.2.1. Homonuclear E.COSY
      5.8.2.2. Heteronuclear E.COSY
        5.8.2.2.1. Determination of JHP- and JCP-couplings
        5.8.2.2.2. Determination of JHC-couplings
        5.8.2.2.3. Determination of JHH-couplings via HCC-E.COSY spectra
    5.8.3. Determination of J-couplings from signal intensities
      5.8.3.1. Determination of JHH-couplings from homonuclear (H,H) TOCSY transfer
      5.8.3.2. Determination of J-couplings from heteronuclear experiments
6. Chemical shifts
  6.1. Chemical shifts; qualitative aspects
  6.2. Theory
  6.3. 1H shifts
  6.4. Structurally important 1H shifts
  6.5. 15N and 13C shifts in DNA and RNA
  6.6. 31P shifts
7. Assignment methods
  7.1. Assignment without isotope labeling
  7.2. Assignment with isotope labeling
    7.2.1. NOE-based correlation
    7.2.2. Through-bond correlation
      7.2.2.1. Coherence transfer functions
      7.2.2.2. Through-bond amino/imino to non-exchangeable proton correlation
      7.2.2.3. Through-bond H2-H8 correlation
      7.2.2.4. Through-bond base–sugar correlation
      7.2.2.5. Through-bond sugar correlation
      7.2.2.6. Through-bond sequential backbone assignment
    7.2.3. X-filter techniques
8. Relaxation and dynamics
9. Calculation of structures
10. Prospects for larger systems
11. Conclusions
Acknowledgments
References

Keywords: NMR; Conformational studies; Nucleic acids; RNA; DNA; Labeling; Assignment; Structure

1. Introduction

Nucleic acid molecules play a central role in cell biological processes. DNA's main role is to act as the carrier of genetic information. Furthermore, DNA is transcribed into RNA by a carefully regulated process, and it is duplicated on cell division. RNA's main role is to communicate the genetic information for protein synthesis to the ribosomes. RNA is, however, very versatile. It can also take on the role of DNA as the carrier of genetic information, and it can function as an enzyme. It has even been hypothesized that early in evolution, life was based entirely on RNA (see, for example, Ref. [1]). All these different processes require different structures. The basic structural elements of RNA and DNA are well established, i.e. DNA forms a B-helix, while RNA may be either single-stranded or may form an A-type helix. However, the alternate RNA and DNA structures, associated with many of the different processes mentioned above, are less well known. Only since the early 1990s have technological advances in sample preparation, such as isotope labeling and developments in crystallization, made such structural data available, and allowed the structural basis of the biological functions of DNA and RNA to be addressed.

In the past ten years, we have witnessed an explosion in the number of crystal and solution structures of proteins determined by X-ray crystallography and NMR, respectively. In comparison, the increase in the number of nucleic acid structures determined by either X-ray or NMR has been relatively small. This can be attributed to the difficulties encountered when trying to crystallize nucleic acids for detailed X-ray analysis and to the problem of extensive resonance
overlap in NMR spectra of these compounds. Advances in crystallization techniques have in recent years resulted in the structure determination of the RNA hammerhead enzyme [2,3], one of the two folding domains of the group I intron self-splicing RNA [4,5], and a few RNA–protein complexes [6,7]. In addition, the structures of several DNA duplexes, as well as of a DNA quadruplex, have been determined by means of X-ray crystallography [8]. However, despite the two X-ray structures of the hammerhead, the catalytic mechanism of this ribozyme has not yet been clarified. Crystal packing forces sometimes affect RNA or DNA structures. For example, there is still no crystal structure available of a DNA or RNA hairpin, since these tend to crystallize in biologically less relevant extended duplex structures [9,10]. Solution structures, which can be determined via NMR, are therefore particularly important in DNA and RNA structural biology as a complement to crystallography. In addition, nucleic acids often contain regions of higher conformational flexibility. NMR is particularly suited for identifying such regions. In the field of NMR of nucleic acids, advances were made in the 1980s with the introduction of synthetic methods for preparing well defined DNA sequences. This development also made it possible to produce well defined RNA sequences from DNA templates by enzymatic synthesis via T7-polymerase. These developments led to the determination of several solution DNA and RNA hairpin structures, from which the main folding principles of hairpin loops could be determined [11,12]. In addition, these developments led to the determination of the solution structure of a DNA quadruplex [13,14] and solution structures of triple helix molecules [15], as well as to the determination of a new DNA multi-stranded fold, the C-motif [16]. 
Still, the overlap encountered in NMR spectra limited the size of the molecules that could be studied and the detail by which the structures could be determined. In the early 1990s, methods were developed to produce 13C or 15N enriched RNAs, via enzymatic synthesis, in quantities large enough for NMR studies. This possibility enabled more detailed studies of biologically relevant RNA sequences and folds. Initial NMR studies have been performed and methods have been developed for assignment of resonances of 13C and 15N labeled RNAs. The direct result of this is more reliable resonance assignments. In addition, more extensive constraints lists could be obtained for subsequent structure determination. In the past two years a number of RNA structures with a size up to 30 to 40 nucleotides have been published [17–30], together with RNA–peptide complexes [31–36] and an RNA–protein complex of total molecular weight 22 kDa [37,38]. These studies have also made it clear that the upper size limit for RNAs which can be studied by NMR lies around 30 nucleotides when uniform labeling is employed, a size limit considerably below that for proteins. Only quite recently has it become possible to enrich DNA with 13C and 15N isotopes. Zimmer and Crothers [39] demonstrated that DNA can be enriched via an enzymatic approach, while even more recently 13C and 15N labeled DNA phosphoramidites have become available [40], so that 13C and 15N enriched DNAs can now also be obtained via chemical synthesis. It is to be expected that these possibilities will also have an effect on NMR structural studies of DNA of larger size. Larger DNA systems, such as those forming three- and four-way junctions, have already been studied [41], but these have not yet produced detailed solution structures, again due to the extensive signal overlap (see, for example, Refs. [42–44]). 
It is noteworthy that Altona and co-workers used an extremely interesting approach to achieve the assignments in their studies of four-way junctions [43,44]. They used well-determined hairpins as building blocks for the larger four-way and three-way junctions they studied. This made it possible to obtain resonance assignment in very crowded spectra. The future will reveal whether combining this approach with labeling will allow an extension to larger systems, both for RNAs and DNAs.

Naturally, as isotope enriched nucleic acid molecules are now used in NMR studies, we will pay particular attention in this review to the related NMR methods. Various other reviews [36,45–48] have recently appeared, but they have focused generally on specific aspects of the NMR of isotope enriched RNA. We try here to provide a broad overview, covering as much as possible of the various aspects that come into play when performing NMR structural studies of both DNA and RNA molecules. Furthermore, the field is developing rapidly and new aspects have been published since the appearance of
these reviews. For example, a complete overview of J-couplings in the nucleic acid bases has been published [49] and proton structural chemical shifts have been calculated and compared with experimental data [50]. We will incorporate these aspects into this review, together with a detailed description and critical evaluation of the present state of the art NMR methodology for determining the structure of labeled DNA and RNA molecules. This review is divided into eleven sections. In Section 2, 13C and 15N labeling, as well as other labeling methods, are described, albeit briefly, in view of the quite detailed descriptions that have recently appeared. The IUPAC nomenclature is introduced in Section 3. In Section 4, we present an overview of the distances found in DNA and RNA molecules and discuss their relevance for NMR structural studies. Section 5 gives an overview of all homonuclear and heteronuclear J-couplings and describes their structural dependencies. We also give an overview of the NMR methods that are or can be used to determine these J-couplings. In Section 6, we describe the chemical shifts and discuss their use both for assignment purposes and as structural parameters. Section 7 forms the heart of this review, and describes and discusses in detail the currently available methods for assignment both in unlabeled and 13C and 15N labeled compounds. Section 8 concentrates on a description of relaxation. Isotope enrichment has opened up the way for detailed relaxation studies in the field of proteins. Such relaxation studies are still scarce in the field of nucleic acids. We place relaxation studies on nucleic acids in the context of parallel studies on proteins, and give an overview of the theoretical background. In Section 9 we briefly describe the actual structure determination from NMR data. In Section 10, we discuss the prospects for extension of NMR studies to larger systems and we attempt to draw some conclusions in Section 11.

2. RNA and DNA synthesis and purification

Two strategies are available for preparing large quantities of DNA and RNA of defined sequence and high purity for NMR studies: (1) chemical synthesis by the phosphoramidite method, and (2) enzymatic synthesis of RNAs via T7-polymerase and of DNAs via DNA-polymerase. For RNA, enzymatic synthesis is the usual method of preparation; although chemical synthesis is possible, it is still prohibitively expensive when large quantities are required. Chemical synthesis is the usual approach for the preparation of DNAs of defined sequence. Zimmer and Crothers [39] have shown how large quantities of DNA can be made via enzymatic synthesis, thus demonstrating the feasibility of 13C and 15N labeling of DNA via this method. However, 13C and 15N labeled DNA phosphoramidites have also recently become commercially available, so that labeled DNAs can conveniently be prepared via chemical synthesis [40]. We refer the reader to the original papers or reviews for the detailed protocols and for discussions of the relative merits of the various approaches [36,45,47,48,51–59]. Here we will concentrate on some general and qualitative aspects.

A certain amount of confusing terminology has crept into the literature with regard to labeling. We will use the following terms: uniform labeling, when every atom of a certain type in the molecule is enriched; residue-type-specific labeling, if all residues of a certain type (e.g. all Adenines) in the molecule are enriched; site-specific labeling, if a particular residue or a number of particular residues are enriched, e.g. A10; partial labeling, if the labeling of a certain residue is on, say, C1′ only. In order to indicate that labeling is not 100%, we add the percentage after the word labeling.

For the enzymatic synthesis of RNA, a DNA template is required from which the RNA is transcribed by T7-polymerase using NTPs as building blocks. The 13C and/or 15N and/or 2H labeled NTPs are usually obtained from E. coli cells, which are grown on 13C enriched glucose and/or 15N enriched ammonium chloride. The RNA isolated from the cells is broken down to 13C and/or 15N labeled NMPs, which are subsequently converted into NTPs. This method thus allows uniformly labeled RNAs to be made, or residue-type-specific labeled RNA when the in vitro transcription occurs on a mixture of labeled and unlabeled NTPs. The method can in principle easily be extended to achieve deuteration or partial labeling. For example, Michnicka et al. [60] have suggested partial 13C labeling using acetate as a carbon source; most recently Nikonowicz et al. [57] have demonstrated uniform 2H/15N labeling via the enzymatic approach. It is more complicated to
achieve site-specific labeling via the enzymatic method (see, for example, Ref. [36]). Site-specific labeling, on the other hand, can quite easily be achieved via chemical synthesis. This would be the method of choice for the preparation of labeled DNA oligonucleotides.

3. Nomenclature

For atom numbering and torsion angle definitions in nucleic acids we will follow the IUPAC/IUB guidelines [61]. Accordingly, the chemical structure and atom numbering of the five common bases, the pyrimidines C, T and U, and the purines G and A, are given in Fig. 1(A), and of the β-D-(deoxy)riboses in Fig. 1(B), which also indicates the torsion angles in the sugar–phosphate backbone (α, β, γ, δ, ε and ζ) and the glycosidic torsion angle χ. Their definitions are: O3′–P–O5′–C5′ (α), P–O5′–C5′–C4′ (β), O5′–C5′–C4′–C3′ (γ), C5′–C4′–C3′–O3′ (δ), C4′–C3′–O3′–P (ε), C3′–O3′–P–O5′ (ζ), O4′–C1′–N1–C2 (χ (Py)), and O4′–C1′–N9–C4 (χ (Pu)). Furthermore, it gives a designation of the chain direction and the unit numbering in a polynucleotide chain. Fig. 1(C) shows the two most common conformations of the β-D-(deoxy)ribose sugar ring, the C2′-endo (²E) and the C3′-endo (³E) conformers, also referred to as S-type and N-type conformers, respectively.

To describe the distances we will use the shorthand notation introduced by Wijmenga et al. [62]. In this notation the distance between the protons l and r is given by:

di(l;r) for intranucleotide distances, e.g. di(8;2′)
ds(l;r) for internucleotide distances, e.g. ds(1′;6)

Here, l corresponds to the proton in the 5′-nucleotide unit and r with the proton in the 3′-nucleotide unit. For methyl protons the l or r is indicated by the letter M. To indicate that the distance is between H3′ in the 5′-nucleotide and H5 or the methyl protons in the 3′-nucleotide we use ds(3′;5/M).
Cross-strand distances are defined as:

dci(l;r) for distances within a base pair, e.g. dci(T-NH3;A-NH₂6)
dcs(l;r)p for distances between adjacent base paired nucleotides, e.g. dcs(1′;2)3′

The symbols NH and NH₂ represent imino and amino protons, respectively. The directionality in the sequential cross-strand distances has to be indicated. Consider two adjacent base pairs, and define the 5′- and 3′-nucleotides. It can easily be seen that dcs is either between two 3′-nucleotides or between two 5′-nucleotides. This is indicated by the subscript p. Alternatively, when two protons l and r do not fall in any of the above categories the distance is indicated by:

d(l;r) for long-range internucleotide distances, e.g. d(T2-NH3;A9-NH₂6)

Here, T2-NH3 indicates the imino proton of Thymine number 2 and A9-NH₂6 indicates the amino group of Adenine number 9.

4. Distances

Proton to proton distances are essential parameters for the three-dimensional structure determination of biomolecules by NMR. Since only short distances (< 5–6 Å) can be obtained by NMR, it is difficult to determine global features, such as bending of the helix. On the other hand, local features can be determined quite well and most NMR structural studies have focused on these aspects. Consequently, it is of paramount importance to have a good overview of the short distances in the main structural elements, such as the sugar ring, the bases, the base pairs, etc., and of how these distances determine the structural features of those elements. Another aspect is that several of the short distances either do not depend on conformation or take on well-defined values in the two major helical conformations, A- and B-helices. For this reason, it is particularly useful to have at hand an overview of these distances and their characteristics, so that one can focus on the relevant data for the more interesting structural aspects.
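The statement that only distances below about 5–6 Å are accessible can be made plausible with a small numerical sketch. It assumes the isolated spin pair approximation, in which the NOE intensity scales as r^-6 and spin diffusion is ignored; this is an assumption introduced here for illustration only. The conformation independent H5–H6 distance of 2.45 Å in Cytosine and Uracil serves as the reference. The function names `relative_noe` and `distance_from_noe` and the printed test distances are hypothetical.

```python
# Sketch: relative NOE intensity versus distance under the isolated
# spin pair approximation (assumption: intensity ~ r**-6, no spin diffusion).

R_REF = 2.45  # Å, fixed H5-H6 distance in cytosine/uracil, used as reference


def relative_noe(r, r_ref=R_REF):
    """Return the NOE intensity at distance r relative to the reference pair."""
    if r <= 0 or r_ref <= 0:
        raise ValueError("distances must be positive")
    return (r_ref / r) ** 6


def distance_from_noe(ratio, r_ref=R_REF):
    """Invert the relation: distance from an intensity ratio I / I_ref."""
    if ratio <= 0:
        raise ValueError("intensity ratio must be positive")
    return r_ref * ratio ** (-1.0 / 6.0)


if __name__ == "__main__":
    # Print how quickly the relative intensity drops with distance.
    for r in (2.45, 3.0, 4.0, 5.0, 6.0):
        print(f"r = {r:4.2f} Å  ->  relative NOE = {relative_noe(r):.4f}")
```

Under these assumptions a proton pair at 5 Å gives only about 1.4% of the reference intensity, which illustrates why longer distances are rarely measurable.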
In the next sections we therefore discuss the short distances and how they reflect structural characteristics, by first giving a more general overview and
subsequently going into more detail. Finally, we discuss their derivation from NOESY spectra and their use as constraints in simulated annealing protocols.

Fig. 1. Structure and atom numbering in nucleic acids, according to the IUPAC/IUB guidelines [61], of the five common bases (pyrimidines C, T and U; purines G and A) (A), and of the β-D-(deoxy)riboses (B and C). (B) also shows the torsion angles in the sugar–phosphate backbone (α, β, γ, δ, ε and ζ) and the glycosidic torsion angle χ (the exact definition is given in the text), a designation of the chain 5′ to 3′ direction and the unit numbering in a polynucleotide chain. (C) shows the puckering of the two most common β-D-(deoxy)ribose sugar ring conformations, the C2′-endo (or S-type) and the C3′-endo (or N-type) conformations.

4.1. Overview of short distances and their general characteristics

In Table 1 we have summarized the short distances
(< 5–6 Å) and categorized them into two main groups, intra-nucleotide and inter-nucleotide distances, with further subdivision to reflect more detailed conformational characteristics. The inter-nucleotide distances fall into the two broad groups of sequential and cross-strand distances involving non-exchanging protons and exchanging protons, respectively. The sequential distances involving non-exchanging protons are again subdivided into sugar-to-sugar distances, base-to-base distances and sugar-to-base distances.

Table 1
Overview of short distances per residue

Type                                   %     M     A     B     C    C′     D
Intra-nucleotide
  1. constant                          5     3     3     0     0     0     0
  2. sugar–sugar                      16    10     0     8     2     2     1
  3. sugar–5′/5″                      13     8     0     4     4     0     0
  4. base–sugar                       12     7     0     1     6     4     4
  sum                                 46    28     3    13    12     6     5
Inter-nucleotide
I. non-exchangeable
  1. sequential sugar–base            20    12     0     0    12    12    12
  2. sequential base–base              6     4     0     0     4     4     0
  3. sequential sugar–sugar           20    12     0     0    12     2     2
  4. cross-strand                   (3%)   (2)     0     0   (2)   (2)   (2)
  sum                                 46    28     0     0    28    18    14
II. exchangeable (imino/amino)
  1. within base pair                  2     1     0     0     1     1     0
  2. sequential                        3     2     0     0     2     2     0
  3. cross-strand                      3     2     0     0     2     2     2
  sum                                  8     5     0     0     5     5     2
Total                                100    61     3    13    45    29    20
%                                          100     5    21    74    48    33

M: measurable distances < 5 to 6 Å [62]; A: completely conformation independent distances; B: distances that are conformation independent within approximately ± 0.2 Å; C: ‘structural distances’, i.e. conformation dependent distances with variation > 0.2 Å (see text); C′: NMR accessible ‘structural’ distances; D: NMR accessible ‘structural’ distances that are different in A- and B-helices.

The intra-nucleotide distances: 1. the constant distances di(2′;2″), di(5′;5″), di(6;5) and di(6;M); 2. the sugar-to-sugar distances, di(1′-4′;1′-4′); they all fall into group B, except for di(2″;4′) and di(1′;4′), which fall into groups C and C′, while group D only contains di(2″;4′); 3. the distances di(2′-4′;5′/5″); group B contains the distances di(2′/2″;5′/5″); group C contains di(3′/4′;5′/5″); none of them fall into groups C′ or D; 4. sugar-to-base distances di(6/8;1′-5″); they are subdivided according to: group B, di(6/8;4′); group C, di(6/8;1′-3′,5′/5″); group C′, the same excluding di(6/8;5′/5″); group D, di(6/8;1′-3′). The distances di(5/M;1′-5″) are not taken into account since they are larger than 5 Å [62].

The inter-nucleotide distances (considered are the distances to and from Cytosine in a GCG trinucleotide sequence; see Fig. 2): I. Non-exchangeable protons: 1. sequential sugar-to-base distances, ds(1′-3′;8) and ds(1′-3′;6/5); all of them fall into categories C, C′ and D; 2. base-to-base distances, ds(6/5;8) and ds(8;6/5); all of them fall into categories C, C′ and D; 3. sequential sugar-to-sugar distances, ds(1′-2″,4′;5″), ds(5″;1′-2″,4′), ds(2′;3′), ds(3′;2′), ds(2′;2″) and ds(2′;2″); all of them are conformation dependent (category C), but only ds(2′;3′) and ds(3′;2′) are easily accessible and differ between A-type and B-type helices. II. Inter-nucleotide distances involving exchangeable protons: 1. the distances within a base pair are dc(NH₂;NH); this distance depends on conformation (category C), is NMR accessible (C′), but does not differ between A-type and B-type helices; 2. the sequential distances are ds(NH₂;NH) and ds(NH;NH₂); they both fall into category C and C′, but not into category D; 3. the cross-strand distances are dc(NH₂;NH₂)5′ and dc(NH₂;NH₂)3′; they fall into category C, C′ and D (however, note that the NH₂ resonances of G may be broadened, making them inaccessible for NMR).

Within each category their dependence on conformation is indicated (A to D), with category A referring to conformation independent distances, category B to distances that can vary by less than ± 0.2 Å, and category C to ‘structural’ distances, i.e. distances that convey structural information since they can vary by more than ± 0.2 Å. The ‘structural’ distances, category C, are subdivided into two further categories to indicate their usefulness; category C′ contains those ‘structural’ distances that are reasonably well accessible by NMR, and category D refers to NMR accessible ‘structural’ distances that
also show differences depending on whether they are present in an A- or B-type helix.

As can be seen from Table 1, approximately 60 distances per residue can in principle be measured. The number of distances that are constant within ± 0.2 Å is rather high. They represent about 26% of the total number of measurable distances. Their percentage is even higher for the intra-nucleotide distances, of which they represent about 57% (16 out of 28). The distances that convey relevant structural information (in helices) and are reasonably well accessible by NMR represent less than half (48%) of the total number of distances, while only 20 are different between A- and B-type helices (33%). Note also the small number of structurally very important cross-strand and sequential distances involving exchanging protons which establish base pairing (8%), and the small number of cross-strand distances involving non-exchanging protons (3%). On the other hand, sequential sugar-to-base and sugar-to-sugar distances, which are so important for establishing base stacking and defining the phosphate backbone, are both relatively large in number (20% each). The former are mostly reasonably well accessible by NMR, whereas the latter are extremely difficult to establish. Thus, a rather uneven spread in the short distances is found across the chemical structure. As a consequence, important structural features such as base pairing often hinge on the presence of a particular NOE contact reflecting one short distance.

4.2. Overview of structurally important intra-nucleotide distances

The intra-nucleotide distances in DNA and RNA can conveniently be subdivided according to the categories indicated in Table 1, i.e. (1) conformation independent distances, (2) distances between sugar protons, (3) distances between H2′/2″/3′/4′ and H5′/5″, (4) distances between H1′ through H5′/5″ and base protons.

1. The conformation independent distances are: the geminal proton distances, di(2′;2″) and di(5′;5″), of 1.8 Å, di(5;6) (= 2.45 Å) in Cytosine and Uracil, and di(6;M) in Thymidine.

2. The distances within the sugar ring are all independent of its conformation, except for di(2″;4′) and di(1′;4′). Only di(2″;4′) differs significantly between S-type and N-type conformers, with di(2″;4′) = 4.2 Å for the S-type conformer (pseudorotation angle P = 160°) and di(2″;4′) = 2.8 Å for the N-type conformer (P = 10°). Although it is in principle possible to determine the sugar conformation from the di(2″;4′) distance, the accuracy of the determination is limited. The di(2″;4′) distance is difficult to determine from NOE intensities because of spin diffusion effects, due to the close proximity of the H2′ and H2″ protons. Also note that in RNA the H2″ proton is absent, so that these sugar distances cannot be used at all to determine the puckering. The distance di(1′;4′) is almost identical for N-type and S-type sugars (3.4 Å), but has a lower value for sugar rings with an intermediate pseudorotation angle, di(1′;4′) = 2.6 Å for P = 90°. Here again spin diffusion can adversely affect the accuracy of the distance determination.

3. The distances di(3′;5′/5″) depend only weakly on the sugar ring conformation, but significantly on the γ torsion angle, while the distances di(4′;5′/5″) only depend on the γ torsion angle. The distances di(3′/4′;5′/5″) therefore allow the determination of the γ torsion angle [62,63]. This can be done in conjunction with relevant J-couplings (see Section 5). Given an uncertainty in these distances of ± 0.2 Å, they do not discriminate well between the different ranges of the γ torsion angle, in particular when an equilibrium between γ⁺ and γᵗ rotamers exists. The distances di(2′/2″;5′/5″) depend on both the sugar puckering and the torsion angle γ, but their dependence is weak, and they are of the order of 5 to 6 Å [62].

4. The distance between H1′ and H8/6, di(1′;6/8), depends only on the glycosidic torsion angle χ. It thus provides a means for determining this torsion angle. However, the maximum difference in the values of di(1′;6/8) for χ in the syn domain (χ = 60°) and in the anti domain (χ = 240°) is only about 1.2 Å. Given that in practice the uncertainty in the distance determination from NOE data is of the order of ± 0.2 Å to ± 0.5 Å, it is to be expected that the use of di(1′;6/8) is a rather imprecise means to determine the χ torsion angle. The other sugar proton to base proton distances, di(2′/2″/3′/4′;6/8), depend on both the sugar puckering
and the χ torsion angle. The distance di(4′;6/8) does not convey useful structural information since its dependence on these parameters is weak [62]. The distances di(2′/2″/3′;6/8) are on the other hand quite useful. Each of these distances defines the χ torsion angle quite well, because of their quite strong dependence on the torsion angle χ [62]. Their dependence on the sugar puckering is rather weak, in particular for the distances di(2′/2″;6/8) [62]. Despite this weak dependence on the sugar puckering, a concerted use of di(2′/2″/3′;6/8) makes it possible to determine the percentage N-type or S-type pucker, but to achieve a reasonable level of precision requires that the uncertainty in their values should be less than ± 0.5 Å [62]. We finally note that Lane and co-workers [64] have shown the improved reliability of sugar pucker determination using these distances together with J-couplings.

The H5′/H5″ to base proton distances, di(5′/5″;6/8), depend on three torsion angles, γ, δ and χ. Their dependence on the sugar pucker (δ), and on the glycosidic torsion angle (χ) in the usual anti domain (180–240°) is weak, but they depend quite strongly on the γ torsion angle. In particular, for γ⁺ both di(5′;6/8) and di(5″;6/8) are long (3.7 to 4.5 Å), while for γᵗ the distance di(5″;6/8) becomes short (2.5 to 2.9 Å). As has been shown, with uncertainties in the distance estimates in the order of ± 0.2 Å, they determine quite well the torsion angle γ [62]. The distances di(5′/5″;6/8) can be quite useful in NOESY spectra of DNA, since the related NOE cross peaks do not reside in a crowded spectral region. This does not hold true for RNA where these cross peaks overlap with the other H6/8 to H2′/3′ NOE cross peaks. On the other hand, the distances di(3′/4′;5′/5″) all relate to cross peaks in crowded spectral regions for both DNA and RNA and are thus difficult to establish.

4.3. Overview of structurally important sequential and cross-strand distances

Helical conformations form an important part of nucleic acid structures. We therefore present an overview of the distances in the two most commonly found helix types, A-helices and B-helices. Fig. 2, reproduced from Wijmenga et al. [62], gives the sequential distances, ds(l;r), and cross-strand distances, dci(l;r) and dcs(l;r), found in A-DNA, B-DNA and RNA helices.

The cross-strand distances, dci(l;r) and dcs(l;r), involving exchangeable protons are indicative of base pair formation. The sequential distances involving either exchanging or non-exchanging protons are indicative of base stacking. However, only a limited number depend on the type of helix conformation. In both A- and B-type helices, short base-to-base distances, ds(6/8/5/M;6/8/5/M), are present, depending on the sequence. Similarly, all distances involving exchanging protons are very similar in A- and B-type helices. The differences occur for the cross-strand and sequential distances involving H2 protons, dcs(2;1′/2)3′ and ds(2;1′), the sequential sugar-to-base distances, ds(2′/2″/3′;6/8/5), and for a number of sequential sugar-to-sugar distances, ds(2′/2″;5′/5″), ds(2′;3′), ds(2″;2″) and ds(1′;5″). Short cross-strand, as well as sequential H2 to H1′ distances, are present in A-type helices, but absent in B-type helices. Short sequential H2′ to H6/8 distances and long H2″ to H6/8 distances are seen in A-helices, while in B-helices the reverse is found. The sugar-to-sugar distances show the following pattern: short sequential H2′/H2″ to H5′/5″ distances in A-helices, while in B-helices these distances are long; rather long, but measurable, sequential H2′ to H3′ distances in A-helices, which are over 7 Å and thus not measurable in B-helices; finally, long (> 7 Å) sequential H2″ to H2′ and H1′ to H5″ distances in A-helices, which are relatively short in B-helices.
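The contrast between A- and B-type helices described above can be turned into a simple consistency check. The sketch below compares the sequential H2′ to H6/8 and H2″ to H6/8 distances of one step: a clearly shorter H2′ contact points towards an A-type step, a clearly shorter H2″ contact towards a B-type step. The function name `helix_hint`, the default margin of 0.5 Å (taken from the upper end of the NOE distance uncertainty quoted earlier) and the example distances are illustrative assumptions, not values from Ref. [62].

```python
# Sketch: qualitative A/B helix hint from two sequential distances (in Å).
# Rule from the text: ds(2';6/8) short and ds(2'';6/8) long -> A-type;
# the reverse -> B-type. The margin is a hypothetical uncertainty guard.


def helix_hint(d_h2p_base, d_h2pp_base, margin=0.5):
    """Classify one dinucleotide step as 'A-like', 'B-like' or 'undetermined'.

    d_h2p_base  : sequential H2'  to H6/8 distance
    d_h2pp_base : sequential H2'' to H6/8 distance
    margin      : minimum difference needed before a call is made
    """
    if d_h2p_base <= 0 or d_h2pp_base <= 0:
        raise ValueError("distances must be positive")
    diff = d_h2pp_base - d_h2p_base
    if diff > margin:
        return "A-like"
    if diff < -margin:
        return "B-like"
    return "undetermined"


if __name__ == "__main__":
    # Hypothetical example distances, not taken from the source.
    print(helix_hint(2.2, 3.8))  # H2' contact clearly shorter
    print(helix_hint(3.9, 2.4))  # H2'' contact clearly shorter
    print(helix_hint(3.0, 3.2))  # difference within the margin
```

Such a check only applies to DNA, since RNA lacks the H2″ proton, and a single step should not be over-interpreted given the distance uncertainties discussed in the text.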
While the distances involving H2 and the H2′/2″ to base distances are quite accessible from NMR spectra, the sugar-to-sugar distances are difficult to determine since the sugar proton resonances reside in quite crowded spectral regions.

4.4. Derivation of distances from NOESY spectra and structure characterization using distances

We will discuss here the three aspects of NMR accessible distances that are of particular relevance for structure determination. First, how precisely can distances be derived from NOE data? Secondly, how does this precision affect the precision of the determined structure? Thirdly, how does the spread and
Fig. 2. Overview of short sequential and inter-strand proton–proton distances for all possible combinations of base stacking in A-DNA (A), B-DNA (B) and RNA (C). The meaning of the symbols is: 0–2.5 Å (thick solid line), 2.5–3.0 Å (solid line), 3.0–4.0 Å (dashed line), 4.0–5.0 Å (dotted line).
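The distance classes in the legend of Fig. 2 can be expressed as a simple lookup. The Python sketch below is illustrative only: the function name is hypothetical, and treating each upper bound as inclusive is an assumption, since the legend does not say to which class a boundary value belongs.

```python
def fig2_line_style(distance_angstrom: float) -> str:
    """Map a proton-proton distance (Angstrom) to the Fig. 2 line style."""
    if distance_angstrom < 0:
        raise ValueError("distance must be non-negative")

    # (upper bound in Angstrom, line style), in increasing order.
    # Assumption: upper bounds are inclusive.
    classes = [
        (2.5, "thick solid"),
        (3.0, "solid"),
        (4.0, "dashed"),
        (5.0, "dotted"),
    ]
    for upper_bound, style in classes:
        if distance_angstrom <= upper_bound:
            return style
    # Distances above 5.0 Angstrom are outside the ranges listed in the legend.
    return "not drawn"


for d in (2.2, 2.8, 3.5, 4.6, 6.0):
    print(d, fig2_line_style(d))
```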