This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution and sharing with colleagues. Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party websites are prohibited. In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information regarding Elsevier’s archiving and manuscript policies are encouraged to visit: http://www.elsevier.com/copyright
Journal of Molecular Graphics and Modelling 29 (2010) 326–330
Journal homepage: www.elsevier.com/locate/JMGM

Evaluation of various inverse docking schemes in multiple targets identification

Liu Hui-fang a,1, Shen Qing a,1, Zhang Jian b,∗∗, Fu Wei a,∗

a Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, China
b School of Medicine, Shanghai Jiao Tong University, Shanghai 200025, China

Article info

Article history: Received 27 March 2010; received in revised form 6 September 2010; accepted 23 September 2010; available online 29 September 2010

Keywords: Inverse docking; Target identification; Target database; GOLD; FlexX; TarFisDock; TarSearch-X; TarSearch-M

Abstract

The lack of accurate and efficient methods for target identification has been the bottleneck in drug discovery. In recent years, inverse docking has been applied as an efficient method in target identification, and several specific inverse docking strategies have been employed in academic and industrial research. However, the effectiveness of these docking strategies in multiple targets identification is unclear. In this study, five inverse docking schemes were evaluated to find the most effective approach in multiple targets identification. A target database containing a high-quality dataset of 1714 entries from 1594 known drug targets covering 18 biochemical functions was collected as a testing pool for inverse docking. The inverse docking engines GOLD, FlexX and TarFisDock and two in-house target search schemes, TarSearch-X and TarSearch-M, were evaluated on eight multiple-target systems in the dataset.
The results show that TarSearch-X is the most effective method in multiple targets identification and validation among these five schemes, and the effectiveness of GOLD in multiple targets identification is also acceptable. Moreover, these two inverse docking strategies will also be helpful in predicting the undesirable effects of drugs, such as toxicity.

© 2010 Elsevier Inc. All rights reserved.

∗ Corresponding author. Tel.: +86 21 51980010; fax: +86 21 51980010.
∗∗ Corresponding author.
E-mail addresses: 072103141@fudan.edu.cn (L. Hui-fang), 082103153@fudan.edu.cn (S. Qing), jian.zhang@sjtu.edu.cn (Z. Jian), wfu@fudan.edu.cn, weifuuh@gmail.com (F. Wei).
1 These authors contributed equally to this work.

1. Introduction

Recently, there has been a growing interest in the rational design of multi-target drugs with the goal of enhancing overall efficacy and/or improving safety [1]. The development of multi-target drugs might open new avenues to confront various serious diseases such as neurodegenerative syndromes, cardiovascular diseases and cancers, all of which involve multiple pathogenic factors [2]. With the completion of the Human Genome Project and the progress of functional proteomics, more and more macromolecules have been identified as potential targets to treat human diseases [3]. Nowadays, approximately 35% of known drugs or leads act on multiple targets, and the multiple targets of the same drug are usually involved in entirely different pathological pathways. Inevitably, the presence of multiple targets presents both opportunities and challenges for drug development. Drug efficacy could be significantly improved by interacting with multiple targets [4,5]; however, severe adverse drug effects could also be induced by binding to multiple targets [6]. Therefore, identification and validation of all potential targets of an active compound become necessary before this compound can be advanced in drug discovery.
doi:10.1016/j.jmgm.2010.09.004

In the past decades, various tools and techniques have been used for target identification and validation, such as microarray technology, including nucleic acid microarrays, protein microarrays, and tissue and cell microarrays [7], antisense technology [8], zinc finger protein transcription factor design [9], and haplotype analysis [10]. Chen et al. used fluorophosphate derivatives as activity-based probes to determine whether serine hydrolases are targets of fluorophosphate derivatives [11]. Suzuki and co-workers used antisense S-oligonucleotides or vector-based small interfering RNAs of COX17 to suppress the expression of COX17 in non-small cell lung cancer (NSCLC) and to inhibit the growth of NSCLC cells. Their results indicate that the cytochrome c oxidase (CCO) assembly protein COX17 might be a potential molecular target for the treatment of lung cancers [12]. Recently, Lum et al. assessed the cellular effects of 78 drugs in Saccharomyces cerevisiae using a genome-wide pool of tagged heterozygotes and found that lanosterol synthase in the sterol biosynthetic pathway could be another target for the antianginal drug molsidomine. Moreover, the rRNA processing exosome was identified as a potential target of the cell growth inhibitor 5-fluorouracil [13].

However, the methods described above are experimentally expensive, low in throughput and time consuming, which makes them poorly suited to large-scale target identification and evaluation. To complement these experimental methods, an in silico inverse-docking approach that can select potential targets of an
active compound from a database of protein cavities using automated docking has recently been applied successfully by several groups. For example, Cai et al. found that H. pylori peptide deformylase (HpPDF) is one of the antibiosis targets through inverse docking of an active natural product against an in-house drug target database, followed by crystal structure validation [14]. Using an inverse docking based program named SELNERGY™, tofisopam, an old drug that was used as a racemic mixture to treat anxiety in the 1980s, was recently found to act as a phosphodiesterase 4 inhibitor [15]. Studies by Chen et al. showed that 83% of the experimentally known toxicities and side effects of 8 clinically used agents (aspirin, gentamicin, ibuprofen, indinavir, neomycin, penicillin G, 4H-tamoxifen, and vitamin C) could be predicted by inverse docking [16]. With the development of various docking algorithms, inverse docking approaches may play a more important role in target identification. Moreover, in conjunction with bioassay and structural biology, inverse docking technology could significantly improve the effectiveness of target identification.

Although docking schemes that screen ligands against a single target have been extensively evaluated by several groups [3,17,18], strategies that screen one compound against multiple targets have not been evaluated thoroughly. In our study, eight multiple-target compounds extracted from DrugBank [19] with known target structures were selected to evaluate different inverse docking schemes. A high-quality target database covering 18 biochemical functions and containing 1714 entries for 1594 drug targets was collected as a testing pool. Several scoring functions were integrated into different inverse docking schemes to evaluate their effectiveness individually.
We found that TarSearch-X is the most efficient method and GOLD is another acceptable one in multiple targets identification and validation.

2. Experimental methods

2.1. Target database construction

Protein targets were selected from the scientific literature based on the available information on their biochemical categories. Since the purpose of our study is to assess the effectiveness of different inverse docking methods, only targets with available three-dimensional crystal structures were included in our database. Firstly, the protein targets deposited in our database were downloaded from the RCSB Protein Data Bank (PDB) [20]. Several principles were followed: (i) for large sets of targets such as transferases and hydrolases, proteins with sequence identity >90% were removed in order to improve the diversity of our database; (ii) for proteins with several entries, structures with high resolution, proteins from human species or complexes with ligands were preferred; (iii) for proteins in complexes, binding pocket information of the targets was extracted directly from the positions of the ligands, and the unique protein ID numbers were kept identical to the PDB entry codes; (iv) for proteins with more than one binding pocket, ID codes were appended with numeric postfixes. Secondly, water, ions and other HETATM records that are not related to the protein activity were all removed from the pdb files. Thirdly, hydrogens and Amber7 FF99 charges were added and the structures were saved as mol2 files using Sybyl v6.9 software (Tripos Inc., St. Louis, MO).

The reliability of our study depends largely on the accuracy of the ligand binding sites.
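The preparation steps described above (taking ligand positions from a complex, then removing water, ions and other HETATM records) can be sketched in a few lines of Python. This is only an illustration and not the procedure used by the authors, who worked with Sybyl: the function name `split_pdb`, the set `WATER_NAMES` and the grouping of hetero atoms by residue name are assumptions made for this example. The column positions follow the fixed-width PDB coordinate record layout.

```python
# Residue names commonly used for water in PDB files (assumed list for this sketch).
WATER_NAMES = {"HOH", "WAT", "DOD"}


def split_pdb(pdb_text):
    """Split PDB-format text into protein ATOM lines and hetero-group coordinates.

    Returns (protein_lines, hetero_groups), where hetero_groups maps a residue
    name (e.g. a ligand or ion code) to a list of (x, y, z) tuples. Water is
    dropped. All HETATM records are left out of protein_lines, so the caller
    decides which hetero group, if any, is the bound ligand.
    """
    protein_lines = []
    hetero_groups = {}
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "ATOM":
            protein_lines.append(line)
        elif record == "HETATM":
            res_name = line[17:20].strip()
            if res_name in WATER_NAMES:
                continue  # discard crystallographic water
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            hetero_groups.setdefault(res_name, []).append(xyz)
    return protein_lines, hetero_groups
```

Note that some modified residues (for example selenomethionine) are also stored as HETATM records, so a real pipeline would need a rule for keeping them; the sketch simply reports every non-water hetero group and leaves that decision to the caller.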
Binding pockets of targets were determined according to the following criteria: (i) for complexes, amino acid residues within 7 Å of the bound ligand were used to define the binding pocket; (ii) for structures without a ligand, binding site data were either extracted from the literature or detected by the CASTp program [21], which locates and measures pockets and voids on 3D protein structures based on the pocket algorithm of the alpha shape theory [22]. The active pocket is defined as the region within 10 Å of the hydrogen atom located at the center of the binding site. The specific description of how the binding pocket was determined for each protein was added to the parameter text file of each entry.

2.2. Compilation of test set

Eight diverse drugs/drug candidates from DrugBank [19] were chosen as the test set. They have been demonstrated to bind to multiple targets, and the structures of all these targets have been deposited in our database. The structural diversity of the test set guards against coincidental agreement in target validation. The chemical structures of these drugs/drug candidates were sketched using ISIS/Draw (MDL Information Systems, Inc., San Leandro, CA, USA), as shown in Table 1, and their 3D structures with hydrogens were generated by CORINA [23]. Atom types and bond types of these compounds were inspected and corrected manually, and Gasteiger charges were assigned to them. Furthermore, the structures were optimized by molecular mechanics using the Tripos force field encoded in Sybyl. Finally, the 3D structures of these compounds were saved in separate mol2 files. Because multiple conformations of each drug are generated by each inverse docking tool in the subsequent inverse docking procedure, only one conformation per drug was saved here.

2.3. TarSearch-X and TarSearch-M

Two in-house inverse docking strategies were developed by combining extensive conformational sampling with ranking by an external scoring function.
The Dock program (version 5.1) is employed to generate an ensemble of docked conformations for each pair of ligand and target. This program uses an efficient Divide-and-Conquer method plus the Greedy algorithm (DCG) for conformational sampling. Each DCG run generates a set of final conformations. During docking, all the rotatable single bonds in the ligand, for instance sp3–sp3 and sp3–sp2 bonds, are allowed to rotate, except those whose rotation does not result in different conformations, such as the bond connecting a terminal –CH3 group. Flexibility in cyclic parts of the ligand is neglected. Search steps for translation, rotation, and torsion are set to 0.5 Å, 15°, and 15°, respectively. Since the initial ligand conformation in the binding site of the target may affect conformational sampling, the DCG run was repeated with different initial ligand conformations until all final ligand conformations in the last three runs were close enough (root-mean-square deviation, RMSD ≤ 1 Å) to conformations found before. All conformational sets are merged and duplicate conformations (RMSD ≤ 0.5 Å) are removed. Finally, all conformations are evaluated and ranked by X-Score [24] (TarSearch-X) and M-Score [25] (TarSearch-M), respectively.

2.4. Testing of inverse docking schemes

Five inverse docking schemes were chosen for evaluation in this study, as shown in Fig. 1. They include two commercial standalone schemes (docking software) successfully used in previous studies, i.e. GOLD and FlexX encoded in Sybyl, one Dock4-based scheme implemented in the public server TarFisDock, and the two in-house schemes TarSearch-X and TarSearch-M described above.
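The merging step of Section 2.3, in which conformational sets are pooled and duplicate conformations (RMSD ≤ 0.5 Å) are removed, can be illustrated with a short Python sketch. It is not the Dock or TarSearch implementation. It assumes that every conformation lists the same atoms in the same order, that poses are compared directly in the receptor frame without superposition, and that symmetry-equivalent atoms are ignored; the function names `rmsd` and `remove_duplicates` and the keep-the-first-seen policy are choices made for this example.

```python
import math


def rmsd(conf_a, conf_b):
    """Root-mean-square deviation between two poses given as lists of (x, y, z)."""
    if len(conf_a) != len(conf_b) or not conf_a:
        raise ValueError("conformations must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(conf_a, conf_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(conf_a))


def remove_duplicates(conformations, threshold=0.5):
    """Greedy filter: keep a pose only if it is farther than `threshold`
    (in the same length unit as the coordinates) from every pose kept so far."""
    unique = []
    for conf in conformations:
        if all(rmsd(conf, kept) > threshold for kept in unique):
            unique.append(conf)
    return unique
```

Because the filter is greedy, the surviving representative of a cluster of near-identical poses is whichever one appears first in the merged list; sorting the list by score beforehand would keep the best-scored member instead.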
The evaluation procedure is the same for all five schemes and is divided into four steps: (i) extract the active binding pocket information from the protein pdb or mol2 files according to the pocket parameter file of each entry; (ii) dock the eight drugs into the binding pockets of all targets in the database using each of the five schemes; (iii) estimate the affinity of the eight drugs for the possible binding sites in all targets of the database; (iv) rank the targets of each compound with respect to their protein–ligand affinity scores. Default parameters were applied for each scheme to minimize artificial factors. TarFisDock was run on the public server [26] and
the other four inverse docking schemes were automated using command scripts on our Dell 5400 workstation.

Table 1
The results of inverse docking of eight active compounds with our in-house database using five schemes. (Chemical structure drawings are not reproduced.)

ID  Drug                 Target code  F-Rank  G-Rank  E-Rank  M-Rank  X-Rank
1   Alitretinoin         1UHL            574     174      79     168       1
                         2CBR           1335     269     514     334      14
                         1LBD            291    1119     146     434    1447
2   Bicalutamide         1E3G            949     503    1714     550       8
                         1QKM            929    1254     680    1493     810
3   Dromostanolone       1F5F           1062       1     112     152       2
                         1E3G           1197     317    1714     558       5
4   Fluoxymesterone      1F5F            553       1      22     192       2
                         1E3G            334      31    1714     247       4
5   Hydrocortisone       1F5F            622       1      13     185       4
                         1NHZ           1159     212      55     616      15
6   Levothyroxine        1NAV              2       6    1375     216       2
                         1KGI             14       2     860      81      16
7   Testosterone         1F5F            917       2       3     761       1
                         1E3G           1104      23       7     286       3
8   Tetrahydrobiopterin  1MLW             54      15      53     111       4
                         1TOH           1418     909     270     200      10

F-Rank, G-Rank, E-Rank, M-Rank and X-Rank are the rank of each target among the 1714 entries, based on FlexX, GOLD, TarFisDock, TarSearch-M and TarSearch-X, respectively.

3. Results and discussion

This evaluation examined five inverse docking schemes for recognizing potential targets of an ensemble of drugs/drug candidates. The schemes have either been employed in previous target identification studies by several groups [27–30] or are used in in-house strategies of target exploration for active compounds. All algorithms and scoring functions in these schemes have been widely used in searching for ligand–target binding modes and affinities. Accordingly, a comparative evaluation of these schemes is of great interest to many researchers in the fields of chemical genetics and target validation.

3.1. Targetable proteins in database

Integrated with PDTD [31], which covers more than 830 known or potential drug targets [32], this database currently contains 1714
entries covering 1594 known drug targets, among which more than 820 targets are from human species. It stores each protein in both PDB format and mol2 format with Amber7 FF99 charges. Moreover, detailed binding pocket information is also included in this database. For targets with more than one pocket, separate entries with numeric postfixes were saved, except for those from PDTD, whose names were preserved in their original naming system. For example, monomeric hexokinase I, coded as 1CZA, has three binding sites, named 1CZA_1 for the ADP binding site, 1CZA_2 for the adenine nucleotide binding site and 1CZA_3 for the glucose 6-phosphate binding site.

Fig. 1. The evaluation flowchart for GOLD, FlexX, TarFisDock, TarSearch-X and TarSearch-M. All five schemes go through the same process: preparation of the ligand and the target binding pocket, docking with different algorithms, scoring and ranking. TarFisDock, TarSearch-X and TarSearch-M are based on the same geometric matching algorithm but different scoring functions.

The biochemical classification and the number of target structures in each category are shown in Fig. 2. Hydrolases, transferases, oxidoreductases, transport proteins and signaling proteins account for 22.23%, 25.03%, 10.97%, 6.89% and 5.58%, respectively, and together they cover 70.70% of the targets in the database. Enzymes, the largest category of potential drug targets, are divided into hydrolases, ligases, isomerases, transferases, oxidoreductases and lyases in our database based on the chemical reactions they catalyze, and they cover 67.62% of the total targets. Membrane proteins (2.80%) form another large category in our database, which cannot be further classified. It is widely known that 50% of the known drug targets are G protein-coupled receptors (GPCR).
However, because of the difficulty to obtain GCPR crystals, only a few GPCR structures are deposited in our database. Therefore, receptors, except nuclear receptor that is a separated category in our database, only account for 1.87%. In immune system, the proportion of immunoglobulin, glycoprotein and cytokine is totally 1.75%. Others in the immune system account for 2.33%. There are only a few growth factor proteins (1.11%) and transcription regulators (0.82%). The biochemical function distributions of the targets in our database are consistent with that in the RCSB Protein Data Bank. In addition, our database provides an extended list of proteins for target identification. 3.2. Evaluation of inverse docking schemes with the test set Inverse docking scheme is basically a virtual target screening process, which aims to determine the most favorable binding candidates from deposited target database. As described in the Section 2, our assessment of inverse docking schemes was performed using eight active compounds, and the results are shown in Table 1. In the TarSearch-X scheme, at least one of the known multiple targets for each drug is ranked in top 10 of the scoring list, and all known targets except 1LBD for alitretinoin and 1QKM for bicalutamide are ranked in the top 20 of the scoring list. All the compounds except alitretinoin and bicalutamide display good inverse docking scorings in GOLD scheme as that in TarSearch-X. However, most known targets failed to be recognized when FlexX, TarSearch-M or Dock schemes were used as the inverse docking engine. Therefore, we concluded that, among the five schemes TarSearch-X has the highest success rate for retrieving targets from a huge body of deposit, and GOLD scheme is another effective way to retrieve targets. FlexX, TarSearch-M and Dock failed to recognize real targets in all eight systems. 3.3. 
Implications for the future development of inverse docking schemes

With the progress of genetics and proteomics technologies, more and more proteins that play significant biological and pathological roles are being characterized. Meanwhile, it is essential to identify all possible targets of a biologically active compound.

Fig. 2. Distribution of drug targets based on biochemical functions. The database is primarily divided into the enzyme system, the receptor system, the immune system and other systems. The enzyme system includes hydrolases, ligases, isomerases, transferases, oxidoreductases and lyases; the receptor system includes transport proteins, signaling proteins, membrane proteins and nuclear receptors; the immune system includes immunoglobulins, glycoproteins and cytokines. Growth factor proteins, transcription regulators and blood clotting proteins are also deposited. The biochemical distribution of targets is almost the same as that in the RCSB Protein Data Bank.
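As a quick cross-check of the shares summarized in Fig. 2, the following minimal Python sketch re-adds the category percentages quoted in Section 3.1. The percentages are taken from the text; the variable names are illustrative assumptions, and the combined share of ligases, isomerases and lyases is not stated in the text but is only inferred here by subtraction from the quoted 67.62% enzyme total.

```python
from decimal import Decimal

# Category shares (% of database targets) as quoted in Section 3.1.
shares = {
    "hydrolases": Decimal("22.23"),
    "transferases": Decimal("25.03"),
    "oxidoreductases": Decimal("10.97"),
    "transport proteins": Decimal("6.89"),
    "signaling proteins": Decimal("5.58"),
}

# The five largest categories should add up to the quoted 70.70%.
top_five_total = sum(shares.values(), Decimal("0"))

# Enzymes are quoted at 67.62% overall. Only three of the six enzyme
# classes have individual shares in the text, so the remainder is an
# inference by subtraction, not a number reported by the authors.
enzyme_total = Decimal("67.62")
listed_enzymes = (
    shares["hydrolases"] + shares["transferases"] + shares["oxidoreductases"]
)
other_enzymes = enzyme_total - listed_enzymes  # ligases + isomerases + lyases

print(f"top five categories: {top_five_total}%")
print(f"ligases + isomerases + lyases (inferred): {other_enzymes}%")
```

Decimal is used instead of floats so that the two-decimal percentages stay exact and the totals can be compared directly with the figures in the text.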
Therefore, there is a strong need to develop an effective and reliable strategy to identify multiple targets of an active compound. When used along with experimental techniques for target identification, in silico inverse docking has several advantages. The most apparent one is that no physical isolation of the target is needed, which is especially valuable for targets that are difficult to isolate. Moreover, inverse docking can be used to predict the potential undesirable effects of a compound. The inverse docking schemes currently in use take into account only the structural flexibility of small ligands; conformational change of the binding pockets is overlooked in order to shorten the computational time. Future inverse docking engines should better balance computational time, cost, and the flexibility of both ligands and active pockets.

4. Conclusions

In this study, a target database was established as a testing pool for inverse docking; it contains a high-quality dataset composed of 1714 entries from 1594 known drug targets covering 18 biochemical functions. Five inverse docking strategies, GOLD, FlexX, TarFisDock, TarSearch-X and TarSearch-M, were evaluated by inversely docking eight multi-target drugs into our in-house target database. The results show that TarSearch-X is the most effective method and GOLD is an acceptable one for multiple target identification and evaluation. These two inverse docking strategies can be used to predict undesirable effects of drugs such as toxicity, and they can also provide clues on the mechanisms of drug action. Our in-house target database can be used to predict potential targets of new active compounds. Even though more optimization is still needed, the identified inverse docking strategy can be used for multiple target identification in drug discovery.

Acknowledgements

We express gratitude to Mr.
Chang Shun, Jingze Niu, Sixue Zhang (Fudan University) for literature search and database building. We gratefully acknowledge financial support from the Agilent Foundation (No. 0557), National Science Foundation of China (NSFC) (No. 20702009), grants from the National High Technology Research and Development Program of China (863 Program) (No. 2009AA02Z308) and National Drug Innovative Program (No. 2009ZX09301-011). The project is also sponsored by the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry.