380 C. Xiao et al.
identities to and rareness in the HIV-1 sequences, HIV-1 could not be the source of those insertion sequences in the 2019-nCoV genome. Second, these insertions are present not only in 2019-nCoV viruses but also in three betaCoV sequences from bats: two (ZC45 and ZXC21) from Zhejiang, deposited in GenBank in 2018, and RaTG13 from Yunnan, obtained in 2013 [8]. RaTG13 is much more similar to 2019-nCoV than either ZC45 or ZXC21 (Figure 1A); the similarity of the spike protein between RaTG13 and 2019-nCoV is 97.7%. In the RaTG13 genome, two inserts are identical to those in 2019-nCoV (HKNNKS and RSYLTPGDSSSG), one carries a single T → I substitution (TNGIKR), and the fourth lacks the C-terminal 4 amino acids (QTNS----) (Figure 1B). ZC45 and ZXC21 are more divergent from 2019-nCoV than RaTG13, but both also contain similar insertions at three of the four insertion sites, all except insertion 4 (Figure 1B). Furthermore, many other CoV viruses carry insertions at the insertion 1 position, although with different sequences. These results clearly show that three of the four inserts existed naturally in three bat CoV viruses before 2019-nCoV was identified. This refutes the possibility that 2019-nCoV was generated by obtaining gene fragments from the HIV-1 genome.
Instead, it is much more likely that 2019-nCoV originated from RaTG13-like CoV viruses. Third, insertions 1 and 2 in 2019-nCoV have 6-AA motifs identical to those in V4 and V5 of certain HIV-1 gp120 isolates, which are structurally close to each other but separated by a LE loop (Figure 1C) [9]. However, insertion 3, located between insertions 1 and 2 in 2019-nCoV, has sequences similar (with deletions) to those in the V1 region of HIV-1 gp120. V1 lies far from V4 and V5, on the opposite side of gp120, and should not interact with V4/V5 in gp120 (Figure 1C), yet a V1-like sequence sits between the V4- and V5-like sequences in the modelled 2019-nCoV spike protein structure [10]. Insertion 4 matches a sequence in the HIV-1 Gag protein, which is not associated with viral entry. This insertion is located too far away to be considered part of the same structural unit as the other three insertions in the 2019-nCoV spike protein (Figure 1C). We see no selection benefit or rationale for 2019-nCoV to obtain and mix structurally unrelated parts of HIV-1 to generate a unique structure for enhanced receptor binding, as suggested by the authors [8]. How the three bat CoV viruses obtained those inserts remains unknown. For any virus to obtain additional insert sequences from other organisms, it must interact directly with those organisms, most likely through homologous or non-homologous recombination [11]. For bat CoV viruses to gain gene fragments from HIV-1, both viruses would have to co-infect the same cells. Because the host cells of bat CoV viruses and HIV-1 are different, the chance for the two to exchange genetic material is negligible. In contrast, these motifs are widely present in various mammalian cells, so if recombination did occur, it is more likely that bat CoV viruses gained those motifs from the genomes of the cells they infected. Nevertheless, extensive studies of more CoV viruses in wild and domestic animals are warranted to address this question.
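The per-motif comparison between 2019-nCoV and RaTG13 summarized in Figure 1B can be illustrated with a short sketch. The Python below is a hypothetical helper (`compare_aligned` and `MOTIF_PAIRS` are our illustrative names, not tools used in this study) that compares pre-aligned motifs position by position and reports identity, substitutions and gap columns. The RaTG13 motifs are taken from the text; the 2019-nCoV motif TNGTKR is inferred from the stated T → I substitution, and the full 2019-nCoV insertion 4 sequence QTNSPRRA is an assumption drawn from published spike sequences rather than from this section. Identity here is computed over the whole aligned length with gaps counted as non-matches, which is only one of several conventions in use.

```python
"""Position-by-position comparison of pre-aligned insert motifs (illustrative sketch)."""


def compare_aligned(ref: str, other: str) -> dict:
    """Compare two pre-aligned, equal-length sequences; '-' marks a gap.

    Identity = matches / aligned length, so gap columns count as non-matches.
    Substitutions are reported as <ref residue><1-based position><other residue>.
    """
    if not ref or len(ref) != len(other):
        raise ValueError("sequences must be non-empty and pre-aligned to the same length")
    matches, gaps, substitutions = 0, 0, []
    for pos, (a, b) in enumerate(zip(ref, other), start=1):
        if a == "-" or b == "-":
            gaps += 1                      # gap column: neither a match nor a substitution
        elif a == b:
            matches += 1                   # identical residue
        else:
            substitutions.append(f"{a}{pos}{b}")  # e.g. T4I
    return {"identity": matches / len(ref), "substitutions": substitutions, "gaps": gaps}


# (2019-nCoV motif, RaTG13 motif) pairs.
# RaTG13 motifs are as given in the text. TNGTKR is inferred from the stated
# T -> I substitution; QTNSPRRA is an ASSUMED full 2019-nCoV insertion 4 sequence.
MOTIF_PAIRS = [
    ("HKNNKS", "HKNNKS"),
    ("RSYLTPGDSSSG", "RSYLTPGDSSSG"),
    ("TNGTKR", "TNGIKR"),
    ("QTNSPRRA", "QTNS----"),
]

if __name__ == "__main__":
    for ncov, ratg13 in MOTIF_PAIRS:
        result = compare_aligned(ncov, ratg13)
        print(f"{ncov:<14}{ratg13:<14}identity={result['identity']:.2f} "
              f"subs={result['substitutions']} gaps={result['gaps']}")
```

Run as a script, it reports full identity for the first two pairs, a single T4I substitution for TNGIKR, and four gap columns for QTNS----, mirroring the description of Figure 1B. Reproducing the 97.7% spike similarity would require the full aligned spike sequences, which this sketch does not attempt.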
Identifying the origins of these inserted sequences in the three bat CoV viruses and in the new epidemic 2019-nCoV strain will be important for understanding how CoV viruses jump from animals to humans and adapt to their new hosts. Current data show that RaTG13 is the virus most closely related to 2019-nCoV [7]. However, the genetic difference between them is too large for RaTG13 to be the immediate ancestor of 2019-nCoV. Other viruses more closely related to 2019-nCoV in intermediate animal hosts, like civets for SARS and camels for MERS [3,12], remain to be identified. More studies are necessary to identify the real source of 2019-nCoV, and identifying its origin by screening a large number of wild and domestic animals may take a long time. In any case, reducing or eliminating direct contact with wild animals will be critical to controlling new epidemic infectious diseases in the future. Advanced bioinformatics tools are now widely used to analyse newly obtained sequences easily and rapidly. However, great care is required to perform comprehensive and thorough analyses that fully capture the real biological implications of new genomic information. Biased, partial and incorrect analyses can lead to conclusions that fuel conspiracy theories, harm the process of genuine scientific discovery, and undermine efforts to limit the damage to public health.

Acknowledgments
We greatly appreciate Youyu He (Shanghai Center for Bioinformation Technology) for helping us blast the insertion sequences against the viral sequence database.

Disclosure statement
No potential conflict of interest was reported by the author(s).

ORCID
Xiaojun Li http://orcid.org/0000-0002-5780-0880
Feng Gao http://orcid.org/0000-0001-8903-0203

References
[1] Gao F, Bailes E, Robertson DL, et al. Origin of HIV-1 in the chimpanzee Pan troglodytes troglodytes. Nature. 1999;397(6718):436–441.
[2] Li W, Shi Z, Yu M, et al. Bats are natural reservoirs of SARS-like coronaviruses.
Science. 2005;310(5748):676–679.