Contents lists available at ScienceDirect
Bioorganic & Medicinal Chemistry Letters
journal homepage: www.elsevier.com/locate/bmcl
Cheminformatic comparison of approved drugs from natural product versus synthetic origins

Christopher F. Stratton a, David J. Newman b, Derek S. Tan a,c,⇑

a Tri-Institutional PhD Program in Chemical Biology, Memorial Sloan Kettering Cancer Center, 1275 York Ave, Box 422, New York, NY 10065, USA
b Natural Products Branch, Developmental Therapeutics Program, Division of Cancer Treatment and Diagnosis, National Cancer Institute, Frederick, PO Box B, Frederick, MD 21702, USA
c Chemical Biology Program and Tri-Institutional Research Program, Memorial Sloan Kettering Cancer Center, 1275 York Ave, Box 422, New York, NY 10065, USA

article info
Article history: Received 24 June 2015; Accepted 8 July 2015; Available online 14 July 2015
Keywords: Natural products; Synthetic drugs; Physicochemical properties; Cheminformatics; Principal component analysis

abstract
Despite the recent decline of natural product discovery programs in the pharmaceutical industry, approximately half of all new drug approvals still trace their structural origins to a natural product. Herein, we use principal component analysis to compare the structural and physicochemical features of drugs from natural product-based versus completely synthetic origins that were approved between 1981 and 2010. Drugs based on natural product structures display greater chemical diversity and occupy larger regions of chemical space than drugs from completely synthetic origins. Notably, synthetic drugs based on natural product pharmacophores also exhibit lower hydrophobicity and greater stereochemical content than drugs from completely synthetic origins. These results illustrate that structural features found in natural products can be successfully incorporated into synthetic drugs, thereby increasing the chemical diversity available for small-molecule drug discovery. © 2015 Elsevier Ltd. All rights reserved.
Recent studies indicate that annual drug approvals have remained at a fairly low, constant level since the 1950s despite technological advancements1,2 and increased research and development expenditures. The current cost of developing a drug from concept to market is estimated at nearly $2.6 billion.3 A striking trend over the last 30 years is the marked increase in approvals of biologic therapeutics (e.g., monoclonal antibodies and vaccines) and decline in small-molecule drugs (Fig. 1).4,5 This trend is consistent with the pharmaceutical industry’s shift away from focusing exclusively on classical, small-molecule drug discovery and toward biologics.6,7 Despite this, small-molecule drugs remain an integral component of the drug discovery pipeline. Most notably, small molecules are still more effective at addressing intracellular targets than most biologics.8,9 However, small-molecule drugs address a rather limited range of targets. A 2006 study estimated that approved small-molecule drugs target only 207 proteins encoded by the human genome.10 Moreover, 50% of all drugs target only four protein classes: rhodopsin-like G-protein coupled receptors, nuclear receptors, voltage-gated ion channels, and ligand-gated ion channels.10 A potential factor contributing to the limited range of biological targets engaged by small-molecule drugs is the lack of chemical diversity in most discovery libraries. Small-molecule drug development often begins with screening campaigns using compound collections whose designs are impacted heavily by synthetic accessibility.11 Discovery libraries are also biased by conventions such as Lipinski’s Rule-of-five12 and Veber’s rules13 for oral bioavailability, which have been used to define ‘drug-like’ structures based on prescribed boundaries of certain physicochemical parameters. 
In addition, many combinatorial libraries are developed around the structural features of known compounds or previously successful drug candidates.11 Over time, these factors have led to screening collections replete with molecules sharing a high degree of structural similarity. Broadening the scope of addressable targets and investigating new modes of action are important goals towards increasing the versatility of small-molecule therapeutics.

Natural products are an important source of bioactive molecules for drug development and address a wide range of biological targets.14–17 Over half of all approved small-molecule drugs trace their structural origins to natural products.4,5 In addition, a considerable number of natural products and natural product-derived compounds are currently in clinical trials.18 However, natural product drug discovery is often associated with challenges in purification, characterization, and chemical modification of complex natural product scaffolds.14,19 As such, natural product scaffolds are underrepresented in small-molecule libraries, with a recent study estimating that only 17% of the scaffolds found in natural products (with ≤11 heavy atoms) are present in commercially available screening collections.20

http://dx.doi.org/10.1016/j.bmcl.2015.07.014
0960-894X/© 2015 Elsevier Ltd. All rights reserved.
⇑ Corresponding author.
Bioorganic & Medicinal Chemistry Letters 25 (2015) 4802–4807

To investigate differences in the chemical properties of drugs from natural product versus synthetic origins, we report herein a principal component analysis (PCA) of the structural and physicochemical features found in new chemical entities (NCEs) approved between 1981 and 2010. Overall, we find that drugs based on natural product structures exhibit greater chemical diversity and interrogate larger regions of chemical space compared to drugs from completely synthetic origins (i.e., structures not based on a natural product). Relative to completely synthetic drugs, natural products and their semisynthetic derivatives have larger molecular size, greater three-dimensional complexity, lower hydrophobicity and increased polarity, and fewer aromatic rings. Moreover, drugs that are synthetic but based on natural product structures or pharmacophores are also somewhat larger, more complex, and less hydrophobic than completely synthetic drugs. These results illustrate that structural features found in natural products can be successfully incorporated into synthetic drugs as a means of increasing chemical diversity and, by extension, target diversity.

We parsed NCEs approved between 1981 and 2010 by compound source using categories established by Newman and Cragg:4,5,21
NP = Natural product
ND = Derived from a natural product; usually a semisynthetic modification
S* = Made by total synthesis, but the pharmacophore is from a natural product
S = Synthetic drug; often found by HTS or modification of an existing agent
NM = Natural product mimic

The parsed NCEs indicate that approximately half of all small-molecule drugs approved over the last 30 years trace their structural origins to a natural product.
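The grouping of source codes above, together with the five-year binning used for Fig. 2, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' workflow: the helper names (`origin_group`, `five_year_bin`, `np_based_share`) and the record format of (approval year, source code) pairs are hypothetical. The only rule taken from the text is the split of NP, ND, S* and S*/NM (natural product-based) versus S and S/NM (completely synthetic).

```python
from collections import Counter

# Split used in the text (Newman-Cragg source codes).
NP_BASED = {"NP", "ND", "S*", "S*/NM"}
SYNTHETIC = {"S", "S/NM"}


def origin_group(code: str) -> str:
    """Map a source code to one of the two groups compared in the text."""
    if code in NP_BASED:
        return "NP-based"
    if code in SYNTHETIC:
        return "synthetic"
    raise ValueError(f"unrecognized source code: {code!r}")


def five_year_bin(year: int, start: int = 1981) -> str:
    """Label such as '1981-1985' for the five-year period containing `year`."""
    lo = start + 5 * ((year - start) // 5)
    return f"{lo}-{lo + 4}"


def np_based_share(records):
    """Percentage of NP-based approvals in each five-year bin.

    `records` is an iterable of (approval_year, source_code) pairs
    (a hypothetical input format chosen for this sketch).
    """
    totals, np_counts = Counter(), Counter()
    for year, code in records:
        label = five_year_bin(year)
        totals[label] += 1
        if origin_group(code) == "NP-based":
            np_counts[label] += 1
    # Labels share a fixed width, so string sorting is chronological.
    return {label: 100.0 * np_counts[label] / totals[label] for label in sorted(totals)}
```

For six made-up approvals, `[(1983, "NP"), (1984, "S"), (1985, "ND"), (1990, "S*/NM"), (1992, "S"), (1994, "S")]`, the sketch gives roughly 66.7% NP-based for 1981-1985, 100% for 1986-1990 and 0% for 1991-1995.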
Binning NCEs in five-year periods reveals that this pattern has remained consistent over time, with drugs based on natural product structures (NP, ND, S*, S*/NM) and completely synthetic drugs (S, S/NM) representing fairly equal shares of total small-molecule drug approvals in each time interval (Fig. 2a). Examining relative approval numbers for natural product (NP), natural product-derived (ND), and natural product-inspired synthetic (S*, S*/NM) drugs highlights additional trends (Fig. 2b). Approvals for NP drugs peaked in the late 1980s and declined in the 1990s, correlating with the decommissioning of many natural product discovery programs in the pharmaceutical industry.22 The uptick in NP drugs between 2001 and 2010 results, in part, from the approval of several botanical ‘defined mixtures’, which have become recognized as drugs by the FDA and similar organizations.5 In addition, approvals for ND drugs have remained fairly constant over time, whereas approvals for S* drugs increased in 1981–2000, then leveled off and declined in 2001–2010.

Figure 1. Approved drugs 1981–2010. NCEs from 1981 to 2010 are binned in five-year groups and displayed in three series: total approved drugs (●), small molecules (■), and biologics (▲).

Figure 2. Small-molecule drug approvals between 1981 and 2010 parsed by compound class. NCEs are binned in five-year periods and displayed as percentages of total small-molecule drug approvals in each time interval. (a) NCEs parsed as drugs based on natural product structures (i.e., natural products (NP), natural product-derived (ND), natural product-inspired synthetics (S*, S*/NM)) versus drugs from completely synthetic origins (S, S/NM). (b) Drugs based on natural product structures parsed by individual compound classes.

Parameter selection: To gain a greater understanding of structural differences between approved drugs from natural product versus completely synthetic origins, we carried out a cheminformatic analysis of the NCEs from 1981 to 2010. Compounds were analyzed for our established set of 20 structural and physicochemical parameters (Table 1).23–29 Parameters were selected based on the considerations outlined below.

Lipinski’s Rule-of-five (molecular weight (MW) ≤ 500; hydrogen bond acceptors (HBA) ≤ 10; hydrogen bond donors (HBD) ≤ 5; calculated octanol/water partition coefficient (CLogP) ≤ 5) prescribes limits on parameters correlated with oral bioavailability.12 Subsequent studies correlated increased oral bioavailability with rotatable bond count (RotB ≤ 10) and topological polar surface area (tPSA ≤ 140 Å²).13,30 Such chemical conventions have guided drug discovery in the pharmaceutical industry for many years and have strongly influenced the chemical features found in current small-molecule drugs. Notably, many natural products, including approved drugs that are orally bioavailable, violate these conventions.31 Thus, we included MW, HBA, HBD, ALOGPs,32 RotB, and tPSA in our analysis, as these parameters were predicted to highlight differences between natural products and synthetic drugs.

Several additional descriptors were included to complement these standard parameters, including calculated n-octanol/water distribution coefficient (LogD), calculated aqueous solubility (ALOGpS), Van der Waals surface area (VWSA), relative polar surface area (relPSA), and heteroatom counts (number of nitrogens, N; number of oxygens, O). LogD (pH 7.4) was included as an alternative measure of hydrophobicity, as many of the drugs in this analysis contain ionizable functional groups. ALOGpS32 is a calculated measure of aqueous solubility and was included because solubility is often a challenge for synthetic drug-like compounds. VWSA and relPSA were included as additional measures of molecular surface properties due to the correlation of such features with passive membrane diffusion.33 Finally, heteroatom counts (N, O) were included because natural products typically have fewer nitrogen atoms and more oxygen atoms than synthetic drug-like compounds.34,35

Molecular complexity is an important feature differentiating natural products and synthetic compounds. Whereas synthetic drug-like compounds are commonly regarded as flat, rigid molecules with a high degree of aromatic character, natural products generally contain more complex scaffolds.16 This is particularly important in drug design, as molecular complexity has been correlated with biological activity.36 An important metric for molecular complexity is stereochemical content, as measured by the number of stereocenters in a molecule (nStereo).
Previous cheminformatic analyses have shown that natural products have a greater number of stereocenters than synthetic drug-like compounds,34 and increased stereochemical content has also been associated with improved binding selectivity16 and successful progression through clinical trials.37 As stereocenter count is correlated with molecular weight (vide infra), a normalized descriptor for stereochemical density (nStereo/MW = nStMW) was also included. Another important measure of molecular complexity was defined by Lovering as fraction sp3 (Fsp3), where Fsp3 = (number of sp3 carbons)/(total carbon count).37 Importantly, Fsp3 has been correlated with improved progression from lead discovery through clinical trials to drug approval.37 Subsequent statistical studies used Fsp3 to illustrate that natural products are more complex than the synthetic drug-like compounds found in commercial screening libraries.38 Accordingly, Fsp3 was included to complement the nStereo and nStMW descriptors.

Natural products are also differentiated from drug-like compounds by having larger, more complex ring systems.35,39 Thus, several parameters related to ring count and ring size were included in the analysis: number of rings (Rings), number of aromatic rings (RngAr), number of ring systems (RngSys), atom count of largest ring (RngLg), and rings per ring system (RRSys). RngAr is of particular importance, as previous analyses have shown that synthetic drug-like compounds, on average, have more aromatic character than natural products,34 and aromatic content has been correlated with increased preclinical toxicity and attrition rates in drug candidate progression.40,41

Average values for structural and physicochemical parameters: We then determined average values of each structural and physicochemical descriptor for each drug category (Table 2). Molecular weight varies significantly between classes, and both NP and ND drugs have higher average molecular weights than S* and S drugs.
The large differences in mean molecular weight are due in part to the approval of several large peptide natural products; the difference in median molecular weights is less pronounced, although the same trend holds (Table S1). These results are consistent with previous cheminformatic studies.34,35 The lower average molecular weight for completely synthetic S drugs is also consistent with the use of Lipinski parameters in the development of synthetic drugs. In addition, despite having structures based on natural product pharmacophores, S* drugs have a lower average molecular weight than NP and ND drugs, closer to that of completely synthetic S drugs (Table 2). Relative to S* and S drugs, NP and ND drugs have higher average values for other parameters that correlate with molecular weight, such as heteroatom count (N, O), hydrogen bond donor/acceptor count (HBD, HBA), rotatable bond count (RotB), and stereocenter count (nStereo). To account for the influence of molecular weight on these parameters, we divided the average values of each descriptor by the average molecular weight for each compound class. The normalized values for heteroatom count, hydrogen bond donor/acceptor count, and rotatable bond count display little to no variation across compound classes (Table S2).
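The molecular-weight normalization described above amounts to dividing each class mean by that class's mean MW. The sketch below applies it to three of the Table 2 means; the dictionary and function names are hypothetical. Because it works from published class means, it is a ratio-of-means calculation, and the nStMW column in Table 2 (a per-compound descriptor averaged over each class) may differ slightly from these ratios.

```python
# Class means taken from Table 2.
MEAN_MW = {"NP": 626, "ND": 634, "S*": 386, "S": 343}
MEAN_NSTEREO = {"NP": 8.2, "ND": 6.7, "S*": 1.9, "S": 0.8}
MEAN_HBA = {"NP": 10.3, "ND": 9.2, "S*": 5.2, "S": 4.2}


def normalize_by_mw(means, mean_mw):
    """Divide each class mean by the mean molecular weight of that class."""
    return {cls: means[cls] / mean_mw[cls] for cls in means}


def fold_range(norm, numerators, denominators):
    """Smallest and largest ratio norm[a] / norm[b] over all given class pairs."""
    ratios = [norm[a] / norm[b] for a in numerators for b in denominators]
    return min(ratios), max(ratios)


stereo = normalize_by_mw(MEAN_NSTEREO, MEAN_MW)
hba = normalize_by_mw(MEAN_HBA, MEAN_MW)
lo, hi = fold_range(stereo, ("NP", "ND"), ("S*", "S"))
```

With these inputs the normalized stereocenter counts of NP and ND drugs are about 2.1- to 5.6-fold those of S* and S drugs, in line with the 2- to 6-fold difference reported for nStMW, whereas the normalized HBA values of the four classes stay within about 1.3-fold of one another.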
Notably, the normalized values for stereocenter count (nStMW) for NP and ND drugs were 2- to 6-fold higher than those for S* and S drugs (Table 2). These data are consistent with previous cheminformatic studies indicating that natural products have a greater degree of stereochemical diversity relative to synthetic drug-like compounds.34,35

The values of Fsp3 are higher for NP and ND drugs relative to S* and S drugs. This is particularly important because increased molecular complexity, as measured by Fsp3, has been associated with the ability of molecules to interrogate larger regions of chemical space.37 Interestingly, although S* and S drugs have similar average molecular weights, S* drugs have higher values for both nStMW and Fsp3. Thus, natural product-based S* drugs exhibit greater molecular complexity than completely synthetic S drugs.

Overall, ring count (Rings), ring system count (RngSys), and rings per ring system (RRSys) are similar across compound classes. Mean values for the size of the largest ring (RngLg) suggest that, on average, NP drugs contain larger rings than S drugs (Table 2). However, the median value for largest ring size is equivalent (6 atoms) for all compound classes (Table S1), indicating that outliers may skew the mean value for NP drugs. The average and median numbers of aromatic rings are higher for S and S* drugs relative to NP and ND drugs. These data are consistent with previous analyses indicating that natural products have lower aromatic character than synthetic, drug-like compounds.34

Finally, the partition coefficient ALOGPs and distribution coefficient LogD both predict NP and S drugs to have the lowest and highest hydrophobicity, respectively, with ND and S* drugs having intermediate values. The increased lipophilicity of S drugs may result in part from higher aromatic content. Calculated aqueous solubility (ALOGpS) is similar across drug classes.

Table 1. Structural and physicochemical parameters used to analyze NCEs between 1981 and 2010

| Parameter | Description |
|---|---|
| MW | Molecular weight |
| N | Number of nitrogen atoms |
| O | Number of oxygen atoms |
| HBD | Number of hydrogen bond donor atoms |
| HBA | Number of hydrogen bond acceptor atoms |
| RotB | Number of rotatable bonds |
| tPSA | Topological polar surface area |
| VWSA | Van der Waals surface area |
| relPSA | tPSA/VWSA (relative polar surface area) |
| nStereo | Number of stereocenters |
| nStMW | nStereo/MW (stereochemical density) |
| Fsp3 | sp3 carbon count/total carbon count (fraction sp3) |
| Rings | Number of rings |
| RngAr | Number of aromatic rings |
| RngSys | Number of ring systems |
| RngLg | Number of atoms in the largest ring |
| RRSys | Rings/RngSys (ring complexity) |
| ALOGPs | Calculated n-octanol/water partition coefficient |
| ALOGpS | Calculated aqueous solubility |
| LogD | Calculated n-octanol/water distribution coefficient |

Table 2. Mean values for the structural and physicochemical parameters of approved small-molecule drugs

| Parameter | Natural product drugs (NP) | NP-derived drugs (ND) | NP-inspired synthetic drugs (S*, S*/NM) | Completely synthetic drugs (S, S/NM) |
|---|---|---|---|---|
| MW | 626 | 634 | 386 | 343 |
| N | 4.1 | 4.4 | 3.0 | 2.4 |
| O | 9.3 | 8.3 | 4.1 | 2.6 |
| HBD | 6.4 | 5.0 | 2.5 | 1.3 |
| HBA | 10.3 | 9.2 | 5.2 | 4.2 |
| RotB | 11.0 | 12.6 | 7.7 | 5.2 |
| nStereo | 8.2 | 6.7 | 1.9 | 0.8 |
| nStMW | 0.012 | 0.011 | 0.005 | 0.002 |
| Fsp3 | 0.68 | 0.60 | 0.47 | 0.37 |
| tPSA | 209 | 194 | 98 | 70 |
| VWSA | 933 | 917 | 573 | 487 |
| relPSA | 0.22 | 0.20 | 0.19 | 0.16 |
| Rings | 3.1 | 3.8 | 2.8 | 2.8 |
| RngAr | 0.8 | 1.3 | 1.9 | 1.9 |
| RngSys | 2.0 | 2.2 | 2.0 | 2.2 |
| RngLg | 9.6 | 7.5 | 6.1 | 5.9 |
| RRSys | 1.6 | 2.1 | 1.4 | 1.3 |
| ALOGPs | 1.5 | 2.0 | 1.8 | 2.7 |
| ALOGpS | 3.3 | 3.9 | 3.5 | 3.7 |
| LogD | 2.2 | 1.3 | 0.3 | 1.5 |

See Table S1 for median values and Table S2 for additional molecular weight-normalized values of size-dependent parameters.
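Several Table 1 descriptors are simple ratios of raw counts, and the Rule-of-five and Veber limits quoted earlier are simple thresholds. The sketch below shows both, assuming the raw values (MW, atom and ring counts, surface areas, a calculated logP) have already been computed with a cheminformatics toolkit, which is not shown. The `Descriptors` class and its field names, the use of one `logp` field for both Lipinski's CLogP and the ALOGPs used in this analysis, and returning RRSys = 0 for acyclic molecules are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Raw per-compound values (hypothetical container for this sketch)."""
    mw: float
    hbd: int
    hba: int
    logp: float          # stands in for CLogP / ALOGPs (assumption)
    rotb: int
    tpsa: float          # topological polar surface area, Å^2
    vwsa: float          # van der Waals surface area, Å^2
    n_stereo: int
    n_sp3_carbons: int
    n_carbons: int
    rings: int
    ring_systems: int


def derived(d: Descriptors) -> dict:
    """Ratio descriptors following the definitions in Table 1."""
    return {
        "nStMW": d.n_stereo / d.mw,
        "Fsp3": d.n_sp3_carbons / d.n_carbons,
        "relPSA": d.tpsa / d.vwsa,
        # Acyclic molecules have no ring systems; 0.0 is an arbitrary choice here.
        "RRSys": d.rings / d.ring_systems if d.ring_systems else 0.0,
    }


def rule_violations(d: Descriptors) -> dict:
    """Count violations of the Rule-of-five and Veber limits stated in the text."""
    lipinski = sum([d.mw > 500, d.hba > 10, d.hbd > 5, d.logp > 5])
    veber = sum([d.rotb > 10, d.tpsa > 140])
    return {"lipinski": lipinski, "veber": veber}
```

For a hypothetical compound close to the NP means in Table 2 (MW 626, 6 donors, 10 acceptors, logP 1.5, 11 rotatable bonds, tPSA 209 Å²), the sketch reports two Rule-of-five violations (MW and HBD) and two Veber violations (RotB and tPSA).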
Principal component analysis comparison of compound classes: To visualize the distribution of NCEs in chemical space, we performed principal component analysis (PCA) on the set of structural and physicochemical descriptors described above. PCA is a statistical method for variable reduction that allows multidimensional data to be visualized using two- and three-dimensional plots with minimal loss of information from the original dataset. As several of the descriptors in this analysis are correlated, PCA uses a linear transformation to rotate the matrix of variables onto a set of orthonormal axes that define the dimensions of greatest variance for the dataset.42–44 The newly formed axes are called principal components and represent linear combinations of the original variables (descriptors). Importantly, the matrix rotation preserves Euclidean distances and maximizes the fraction of total variance from the original dataset on each successive principal component. Through this transformation, the first principal component (PC1) retains the greatest fraction of variance from the original dataset, the second principal component (PC2) contains the next largest fraction, and so on. In this way, an n-dimensional dataset can be visualized using an m-dimensional plot of principal components (where m < n) with minimal loss of information. […] >90% of the information in the full dataset is represented in the first six principal components (PC1–PC6; Table S3).

The PCA plot (PC1 vs PC2) from a single analysis encompassing all compounds is presented in Figure 3, with NP, ND, S*, and S drugs shown on separate plots for clarity. To maintain the orientation of these PCA plots with our previous analyses,23–28 PC2 scores for each compound were inverted; this is feasible because the signs and units of each principal component are arbitrary. The PCA plots indicate that NP (Fig. 3a) and ND drugs (Fig. 3b) are fairly evenly distributed across chemical space as defined by PC1 and PC2.
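A minimal NumPy sketch of the procedure described above follows. It assumes the descriptors are autoscaled (z-scored) before PCA, a common choice when variables have different units; the scaling and software used for this analysis are not stated here. Loadings are taken as eigenvectors scaled by the square root of their eigenvalues (one common convention), and the hypothetical `flip_pc2` flag mirrors the sign inversion of PC2, which changes only the plot orientation.

```python
import numpy as np


def pca(X: np.ndarray, n_components: int = 2, flip_pc2: bool = True):
    """PCA of autoscaled descriptors via eigendecomposition of the correlation matrix.

    X has one row per compound and one column per descriptor.
    Returns (scores, loadings, explained_variance_ratio).
    """
    # Autoscale each descriptor so that variables in different units contribute comparably.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.cov(Z, rowvar=False)            # correlation matrix of X
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]         # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    scores = Z @ eigvecs[:, :n_components]
    loadings = eigvecs[:, :n_components] * np.sqrt(eigvals[:n_components])
    if flip_pc2 and n_components >= 2:
        # Component signs are arbitrary; inverting PC2 only mirrors the plot vertically.
        scores[:, 1] *= -1
        loadings[:, 1] *= -1
    explained = eigvals / eigvals.sum()
    return scores, loadings, explained
```

A statement such as ">90% of the information in the first six components" corresponds to checking `np.cumsum(explained)[5] > 0.9` for a 20-descriptor matrix.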
The wide spread of NP and ND drugs on the PCA plots illustrates the high degree of physicochemical and structural diversity found in these molecules. Both S* (Fig. 3c) and S drugs (Fig. 3d) occupy tighter clusters in chemical space relative to NP and ND drugs. These data indicate that the structural and physicochemical features of synthetic drugs are more narrowly focused than those of natural products, and consequently these compounds exhibit less chemical diversity. Component loadings in the PCA can be used to understand the influence of the original 20 parameters on the distribution of molecules in the PCA plots. A loading plot (Fig. 4) illustrates how the original variables are rotated onto the plane defined by PC1 and PC2. The loading plot reveals that molecular weight (MW) and other size-based parameters such as heteroatom counts (N, O), hydrogen bond donor/acceptor count (HBD, HBA), rotatable bond count (RotB), and stereocenter count (nStereo) have a strong negative (leftward) influence along PC1. The high correlation of molecular weight with these parameters is illustrated on the loading plot by the small angles between the vectors representing each descriptor. This indicates that the large spread of NP and ND drugs along PC1, relative to S* and S drugs, is largely due to variance in molecular size (Fig. 3). These data agree with previous analyses showing that natural products have, on average, higher molecular weights relative to synthetic drug-like compounds.34,35 Although the distribution of S* and S drugs is constrained along PC1, the spread of these compounds is more pronounced on PC2 (Fig. 3c and d). Positioning of compounds along PC2 is governed largely by ALOGPs and ALOGpS, which influence compounds in a positive (upward) and negative (downward) direction, respectively (Fig. 4). In addition, RngAr, Rings, and RngSys influence the positioning of compounds positively (upward) along PC2 and negatively (leftward) along PC1.
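The small angles between loading vectors mentioned above can be quantified directly. In the sketch below (NumPy, with a hypothetical helper `loading_angles`), loadings are again eigenvectors of the correlation matrix scaled by the square root of their eigenvalues; when the first two components capture most of the variance of two descriptors, the cosine of the angle between their loading vectors approximates their correlation. The three-descriptor correlation matrix in the usage note is invented for illustration and is not taken from this dataset.

```python
import numpy as np


def loading_angles(corr: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Pairwise angles (degrees) between descriptor loading vectors in the PC1-PC2 plane."""
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:n_components]      # largest components first
    L = eigvecs[:, order] * np.sqrt(eigvals[order])        # one row per descriptor
    unit = L / np.linalg.norm(L, axis=1, keepdims=True)
    cosines = np.clip(unit @ unit.T, -1.0, 1.0)            # guard against rounding past +/-1
    return np.degrees(np.arccos(cosines))
```

For a made-up matrix in which MW and HBA correlate at 0.9 and ALOGPs is nearly independent of both (`[[1.0, 0.9, 0.1], [0.9, 1.0, 0.0], [0.1, 0.0, 1.0]]`), the MW and HBA vectors come out less than 10 degrees apart, while the ALOGPs vector sits more than 80 degrees away from MW.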
Descriptors for molecular complexity Fsp3 and nStMW, as well as relPSA, influence the positioning of compounds negatively (downward) along PC2 and negatively (leftward) along PC1 (Fig. 4). Compared to the natural product-based NP, ND, and S* drugs, a larger portion of completely synthetic S drugs cluster in the upper right region of the PCA plot (Fig. 3). The component loadings indicate that this results from the increased hydrophobic character of S drugs, as measured by ALOGPs and LogD. In contrast, a greater proportion of NP and ND drugs extend into the lower left region of the PCA plot (Fig. 3), resulting from lower hydrophobicity (ALOGPs, LogD) and greater molecular complexity (nStMW and Fsp3). Interestingly, natural product-based S* drugs cluster lower on PC2 than completely synthetic S drugs (Fig. 3c and d), due to the decreased hydrophobicity (ALOGPs) and increased stereochemical diversity (nStMW and Fsp3) of the former.

Time-resolved analysis of structural and physicochemical descriptors and PCA plots: To investigate relative changes in the properties of drugs over the last 30 years, average values of the 20 structural and physicochemical parameters for NP, ND, S*, and S drugs were parsed in five-year periods from 1981 to 2010 (Table S4). Although distinct trends are less clear in these data, molecular weight displays a noticeable increase for all NCEs from 1981 to 2010. A dramatic increase in molecular weight for NP drugs in 2001–2005 is in part due to the approval of several large peptide-based drugs, which skew the mean value. The influence of high molecular weight outliers is less pronounced on median values, though the pattern of increasing molecular weight is still observed (Table S5).
These results are consistent with previous cheminformatic studies indicating that the molecular weight of drugs has increased since the early 1980s.45,46 Parameters that correlate with molecular weight, such as heteroatom counts (N, O), hydrogen bond donor/acceptor count (HBD, HBA), rotatable bond C. F. Stratton et al. / Bioorg. Med. Chem. Lett. 25 (2015) 4802–4807 4805
C.F.et al/Bioorg Med.Chem.Lett.25 (2015)402-4807 a b Natural product Natural product- 4 drugs(NP) derived drugs (ND) 2 -3 .5 PC1 PC1 3 Completely synthetic drugs 2 2 (S,S/NM) 0 31 NP-inspired 91 4 synthetic drugs (S*,S*/NM) 6876543210123 6876543210123 PC1 PC1 ocenter count(nStereo)also increa 04 eearpeniodts,litig 0.2 spac occupied by NP,ND.ands drugs()The data su in emical space has remained similar over the last st that drugs appr rsity of drugs from naturalpr duct versus sy 04 In con ion.02010 04 02 0 02 PC1 Furthermore.PCA using ou ural and interrogate of chemical space that s druss NPand
count (RotB), and stereocenter count (nStereo), also increase over time. However, when normalized for molecular weight, these parameters have remained fairly consistent over the last 30 years. When drugs in the PCA plot are parsed into five-year periods, little change is observed in the relative regions of chemical space occupied by NP, ND, S*, and S drugs (Fig. S1). These data suggest that the relative positions of natural product and synthetic drugs in chemical space have remained similar over the last 30 years. Although a recent analysis of the physicochemical features of drugs suggests that drugs approved since 2002 occupy different regions of chemical space compared to drugs approved before 1983,46 the current analysis considers only the relative diversity of drugs from natural product versus synthetic sources, and does not include structures of drugs approved prior to 1981.

In conclusion, our cheminformatic analysis of NCEs approved between 1981 and 2010 indicates that drugs based on natural product structures exhibit a greater range of structural and physicochemical features than completely synthetic drugs. Furthermore, PCA using our established set of 20 structural and physicochemical parameters23–29 indicates that NP and ND drugs interrogate larger areas of chemical space than S drugs. NP and ND drugs are differentiated from S drugs by having, on average, larger molecular scaffolds with lower hydrophobicity and higher stereochemical content and molecular complexity. These results agree with previous studies indicating that synthetic drug-like compounds display less structural diversity and occupy a narrower region of chemical space than natural products.16,26,34,35,47 Such studies highlight fundamental differences between compounds from natural and synthetic origins. Whereas the structural features of many natural products have been tailored through evolution for binding to biological macromolecules, synthetic drugs derive their structural features from the scaffolds and building blocks used in their preparation. This becomes a limiting factor because many drug-like combinatorial libraries are constructed based on synthetic accessibility or the structures of previously successful drug candidates.11 This strategy has thus restricted the structural diversity of many discovery libraries, which may have contributed to the limited target diversity of current small-molecule synthetic drugs.

Our analysis also illustrates that drugs that are synthetic but based on natural product scaffolds (S*) are less hydrophobic and have greater stereochemical complexity than drugs of completely synthetic origins (S). This is of particular relevance to drug design, as features such as increased molecular complexity and stereochemical content have been correlated with decreased preclinical toxicity40,48 and increased progression through clinical trials.37 Moreover, these data underscore the general concept that the structural features found in natural products can be successfully leveraged to increase the structural diversity of synthetic drugs. Such information can now guide the development of synthetic methods that aim to enhance diversity by exploiting the structural motifs and features of natural products.

Figure 3. PCA plots of drugs approved between 1981 and 2010 parsed by compound class. Data from a single analysis are shown on four separate PCA plots defined by the first two principal components, PC1 versus PC2, for (a) natural product drugs (NP), (b) natural product-derived drugs (ND), (c) natural product-inspired synthetic drugs (S*, S*/NM), and (d) completely synthetic drugs (S, S/NM).

Figure 4. Component loadings for the PCA. Vectors on the loading plot indicate the relative influence of each structural and physicochemical descriptor on the placement of molecules on the plot of PC1 versus PC2.

C. F. Stratton et al. / Bioorg. Med. Chem. Lett. 25 (2015) 4802–4807
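The PCA behind Figures 3 and 4 can be illustrated with a short, dependency-free sketch. It assumes each drug is a row of numeric descriptor values (the paper's 20-parameter set is not reproduced here), z-scores each descriptor so that large-unit descriptors such as molecular weight do not dominate, and extracts PC1 and PC2 from the correlation matrix by power iteration with deflation. Loadings are taken as eigenvector entries scaled by the square root of the eigenvalue, which is one common convention for loading plots like Figure 4; the software and scaling the authors actually used are not stated in this section, and all function names here are hypothetical.

```python
import math
import statistics


def standardize(rows):
    """Z-score each descriptor column (population SD) so that descriptors
    with large units, such as molecular weight, do not dominate the PCA."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]


def correlation_matrix(z):
    """Covariance of z-scored data, which equals the descriptor correlation matrix."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(p)] for i in range(p)]


def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def top_eigenpair(m, iters=500):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix (power iteration)."""
    v = [1.0 + 0.1 * i for i in range(len(m))]  # deterministic, non-degenerate start
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left in the matrix
            return 0.0, v
        v = [x / norm for x in w]
    lam = sum(a * b for a, b in zip(v, mat_vec(m, v)))  # Rayleigh quotient
    if max(v, key=abs) < 0:  # eigenvector sign is arbitrary; make the largest entry positive
        v = [-x for x in v]
    return lam, v


def pca_2d(rows):
    """Return PC1/PC2 scores, loadings, and explained-variance fractions."""
    z = standardize(rows)
    c = correlation_matrix(z)
    p = len(c)
    pcs = []
    for _ in range(2):
        lam, v = top_eigenpair(c)
        pcs.append((lam, v))
        # Deflate: subtract this component so the next pass finds the next one.
        c = [[c[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    scores = [[sum(a * b for a, b in zip(r, v)) for _, v in pcs] for r in z]
    loadings = [[v[i] * math.sqrt(max(lam, 0.0)) for lam, v in pcs] for i in range(p)]
    explained = [lam / p for lam, _ in pcs]  # trace of a correlation matrix is p
    return scores, loadings, explained


# Toy data (not from the paper): column 2 is exactly 2x column 1; column 3 is uncorrelated.
rows = [[1, 2, 2], [2, 4, 1], [3, 6, 2], [4, 8, 1], [5, 10, 2]]
scores, loadings, explained = pca_2d(rows)
print([round(e, 3) for e in explained])  # [0.667, 0.333]
```

In the toy data the two perfectly correlated descriptors load equally on PC1 and the independent one loads only on PC2, which is the kind of pattern a loading plot such as Figure 4 is read for. For real work, a tested library implementation of PCA would be preferable to hand-rolled power iteration.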
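The two time-trend checks described for the 1981–2010 approvals, binning drugs into five-year periods and normalizing count-type descriptors by molecular weight, can be sketched as follows. The record layout (`year`, `mw`, and descriptor keys such as `nStereo`), bins anchored at 1981, and the "per 100 Da" scaling are assumptions for illustration; this section does not say exactly how the normalization was performed, and the toy records are invented.

```python
from collections import defaultdict
from statistics import fmean


def period_start(year, first=1981, width=5):
    """Map an approval year to the first year of its five-year bin (1981-1985, 1986-1990, ...)."""
    return first + ((year - first) // width) * width


def trend_by_period(drugs, key):
    """For each period, return (mean raw descriptor, mean descriptor per 100 Da of MW)."""
    raw, per_mw = defaultdict(list), defaultdict(list)
    for d in drugs:
        p = period_start(d["year"])
        raw[p].append(d[key])
        per_mw[p].append(100.0 * d[key] / d["mw"])  # hypothetical MW normalization
    return {p: (fmean(raw[p]), fmean(per_mw[p])) for p in sorted(raw)}


# Invented records: stereocenter counts rise with MW, but the per-MW value does not.
drugs = [
    {"year": 1983, "mw": 200.0, "nStereo": 1},
    {"year": 1985, "mw": 400.0, "nStereo": 2},
    {"year": 2008, "mw": 600.0, "nStereo": 3},
    {"year": 2010, "mw": 800.0, "nStereo": 4},
]
print(trend_by_period(drugs, "nStereo"))
# {1981: (1.5, 0.5), 2006: (3.5, 0.5)}
```

The raw mean more than doubles between the first and last bins while the MW-normalized value stays flat, which is the distinction the text draws between raw descriptor counts and their size-normalized counterparts.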
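The claim that NP and ND drugs interrogate larger areas of chemical space than S drugs is read from the PCA plots. One simple way to put a number on such a comparison, which is our assumption and not a metric reported in this section, is the area of the convex hull enclosing each class's points in the PC1/PC2 plane. The sketch below uses Andrew's monotone chain for the hull and the shoelace formula for its area; the function names and toy coordinates are hypothetical.

```python
from collections import defaultdict


def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(poly):
    """Shoelace formula; 0.0 for fewer than three vertices."""
    n = len(poly)
    if n < 3:
        return 0.0
    s = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
            for i in range(n))
    return abs(s) / 2.0


def hull_area_by_class(scores, labels):
    """Convex-hull area of (PC1, PC2) scores for each compound-class label."""
    groups = defaultdict(list)
    for (pc1, pc2), label in zip(scores, labels):
        groups[label].append((pc1, pc2))
    return {label: polygon_area(convex_hull(pts)) for label, pts in groups.items()}


# Toy PC1/PC2 scores (not the paper's data).
scores = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (0, 0), (1, 0), (0, 1)]
labels = ["NP"] * 5 + ["S"] * 3
print(hull_area_by_class(scores, labels))  # {'NP': 4.0, 'S': 0.5}
```

A convex hull is sensitive to single outlying compounds, so a more robust proxy might compare the spread of PC scores within each class instead. Either measure is only meaningful when all classes are projected from one common PCA, as the Figure 3 caption notes was done here.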
Acknowledgements

This work is dedicated to the memory of our mentor and colleague, Professor David Y. Gin (1967–2011). Instant JChem was generously provided by ChemAxon. Financial support from the NIH (P41 GM076267 to D.S.T.), Tri-Institutional Stem Cell Initiative, William H. Goodwin and Alice Goodwin and the Commonwealth Foundation for Cancer Research, and the MSKCC Experimental Therapeutics Center is gratefully acknowledged.

Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.bmcl.2015.07.014.

References and notes

1. Munos, B. Nat. Rev. Drug Disc. 2009, 8, 959.
2. Scannell, J. W.; Blanckley, A.; Boldon, H.; Warrington, B. Nat. Rev. Drug Disc. 2012, 11, 191.
3. DiMasi, J. A.; Grabowski, H. G.; Hansen, R. W. Cost to Develop and Win Marketing Approval for a New Drug is $2.6 billion, http://csdd.tufts.edu.
4. Newman, D. J.; Cragg, G. M. J. Nat. Prod. 2007, 70, 461.
5. Newman, D. J.; Cragg, G. M. J. Nat. Prod. 2012, 75, 311.
6. Prueksaritanont, T.; Tang, C. AAPS J. 2012, 14, 410.
7. Projan, S. J.; Gill, D.; Lu, Z.; Herrmann, S. H. Expert Opin. Biol. Ther. 2004, 4, 1345.
8. Mitragotri, S.; Burke, P. A.; Langer, R. Nat. Rev. Drug Disc. 2014, 13, 655.
9. Smith, A. J. J. Biomol. Screen. 2015, 20, 437.
10. Overington, J. P.; Al-Lazikani, B.; Hopkins, A. L. Nat. Rev. Drug Disc. 2006, 5, 993.
11. Eberhardt, L.; Kumar, K.; Waldmann, H. Curr. Drug Targets 2011, 12, 1531.
12. Lipinski, C. A.; Lombardo, F.; Dominy, B. W.; Feeney, P. J. Adv. Drug Deliv. Rev. 1997, 23, 3.
13. Veber, D. F.; Johnson, S. R.; Cheng, H. Y.; Smith, B. R.; Ward, K. W.; Kopple, K. D. J. Med. Chem. 2002, 45, 2615.
14. Harvey, A. L. Drug Discovery Today 2008, 13, 894.
15. Harvey, A. L.; Edrada-Ebel, R.; Quinn, R. J. Nat. Rev. Drug Disc. 2015, 14, 111.
16. Clemons, P. A.; Bodycombe, N. E.; Carrinski, H. A.; Wilson, J. A.; Shamji, A. F.; Wagner, B. K.; Koehler, A. N.; Schreiber, S. L. Proc. Natl. Acad. Sci. U.S.A. 2010, 107, 18787.
17. Dixon, N.; Wong, L. S.; Geerlings, T. H.; Micklefield, J. Nat. Prod. Rep. 2007, 24, 1288.
18. Butler, M. S.; Robertson, A. A.; Cooper, M. A. Nat. Prod. Rep. 2014, 31, 1612.
19. O’Connor, C. J.; Laraia, L.; Spring, D. R. Chem. Soc. Rev. 2011, 40, 4332.
20. Hert, J.; Irwin, J. J.; Laggner, C.; Keiser, M. J.; Shoichet, B. K. Nat. Chem. Biol. 2009, 5, 479.
21. Cragg, G. M.; Newman, D. J.; Snader, K. M. J. Nat. Prod. 1997, 60, 52.
22. Dias, D. A.; Urban, S.; Roessner, U. Metabolites 2012, 2, 303.
23. Wenderski, T. A.; Stratton, C. F.; Bauer, R. A.; Kopp, F.; Tan, D. S. Methods Mol. Biol. 2015, 1263, 225.
24. Bauer, R. A.; Wenderski, T. A.; Tan, D. S. Nat. Chem. Biol. 2013, 9, 21.
25. Kopp, F.; Stratton, C. F.; Akella, L. B.; Tan, D. S. Nat. Chem. Biol. 2012, 8, 358.
26. Bauer, R. A.; Wurst, J. M.; Tan, D. S. Curr. Opin. Chem. Biol. 2010, 14, 308.
27. Bauer, R. A.; DiBlasi, C. M.; Tan, D. S. Org. Lett. 2010, 12, 2084.
28. Moura-Letts, G.; DiBlasi, H. M.; Bauer, R. A.; Tan, D. S. Proc. Natl. Acad. Sci. U.S.A. 2011, 108, 6745.
29. Davis, T. D.; Gerry, C. J.; Tan, D. S. ACS Chem. Biol. 2014, 9, 2535.
30. Lu, J. J.; Crimin, K.; Goodwin, J. T.; Crivori, P.; Orrenius, C.; Xing, L.; Tandler, P. J.; Vidmar, T. J.; Amore, B. M.; Wilson, A. G.; Stouten, P. F.; Burton, P. S. J. Med. Chem. 2004, 47, 6104.
31. Ganesan, A. Curr. Opin. Chem. Biol. 2008, 12, 306.
32. Tetko, I. V.; Tanchuk, V. Y.; Kasheva, T. N.; Villa, A. E. J. Chem. Inf. Comput. Sci. 2001, 41, 246.
33. Palm, K.; Luthman, K.; Ungell, A. L.; Strandlund, G.; Artursson, P. J. Pharm. Sci. 1996, 85, 32.
34. Feher, M.; Schmidt, J. M. J. Chem. Inf. Comput. Sci. 2003, 43, 218.
35. Henkel, T.; Brunne, R. M.; Muller, H.; Reichel, F. Angew. Chem., Int. Ed. 1999, 38, 643.
36. Selzer, P.; Roth, H. J.; Ertl, P.; Schuffenhauer, A. Curr. Opin. Chem. Biol. 2005, 9, 310.
37. Lovering, F.; Bikker, J.; Humblet, C. J. Med. Chem. 2009, 52, 6752.
38. Dandapani, S.; Marcaurelle, L. A. Nat. Chem. Biol. 2010, 6, 861.
39. Ertl, P.; Schuffenhauer, A. Prog. Drug Res. 2008, 66, 217.
40. Ritchie, T. J.; Macdonald, S. J. Drug Discovery Today 2009, 14, 1011.
41. Ritchie, T. J.; Macdonald, S. J.; Young, R. J.; Pickett, S. D. Drug Discovery Today 2011, 16, 164.
42. Jackson, J. E. A User’s Guide to Principal Components; John Wiley & Sons: Hoboken, New Jersey, 2003.
43. Jolliffe, I. T.; Morgan, B. J. Stat. Methods Med. Res. 1992, 1, 69.
44. Jolliffe, I. T. Principal Component Analysis; Springer: New York, New York, 2002.
45. Leeson, P. D.; Davis, A. M. J. Med. Chem. 2004, 47, 6338.
46. Faller, B.; Ottaviani, G.; Ertl, P.; Berellini, G.; Collis, A. Drug Discovery Today 2011, 16, 976.
47. Singh, S. B. a. C. J. C. In Natural Product Chemistry for Drug Discovery; Buss, A. D. A. B. A. S., Ed.; RSC Publishing: Cambridge, 2010. p 28.
48. Luker, T.; Alcaraz, L.; Chohan, K. K.; Blomberg, N.; Brown, D. S.; Butlin, R. J.; Elebring, T.; Griffin, A. M.; Guile, S.; St-Gallay, S.; Swahn, B. M.; Swallow, S.; Waring, M. J.; Wenlock, M. C.; Leeson, P. D. Bioorg. Med. Chem. Lett. 2011, 21, 5673.