RESEARCH REVIEW

behaviour of a compound might depend on knowledge that scientists do not yet possess, such as a many-body interaction giving rise to a new type of superconductivity. If an advanced machine-learning system were able to learn key aspects of quantum mechanics, it is hard to envisage how its connection weights could be turned into a comprehensible theory if the scientist lacked understanding of a fundamental component of it. Finally, there may be scientific laws that are so complex that, were they to be discovered by a machine-learning system, they would be too challenging for even a knowledgeable scientist to understand. A machine-learning system that could discern and use such laws would truly be a computational black box.

As scientists embrace the inclusion of machine learning with statistically driven design in their research programmes, the number of reported applications is growing at an extraordinary rate. This new generation of computational science, supported by a platform of open-source tools and data sharing, has the potential to revolutionize molecular and materials discovery.

Received: 20 October 2017; Accepted: 9 May 2018; Published online: 25 July 2018.

1. Dirac, P. A. M. Quantum mechanics of many-electron systems. Proc. R. Soc. Lond. A 123, 714–733 (1929).
2. Pople, J. A. Quantum chemical models (Nobel lecture). Angew. Chem. Int. Ed. 38, 1894–1902 (1999).
3. Boyd, D. B.
Quantum chemistry program exchange, facilitator of theoretical and computational chemistry in pre-internet history. ACS Symp. Ser. 1122, 221–273 (2013).
4. Arita, M., Bowler, D. R. & Miyazaki, T. Stable and efficient linear scaling first-principles molecular dynamics for 10000+ atoms. J. Chem. Theory Comput. 10, 5419–5425 (2014).
5. Wilkinson, K. A., Hine, N. D. M. & Skylaris, C.-K. Hybrid MPI-OpenMP parallelism in the ONETEP linear-scaling electronic structure code: application to the delamination of cellulose nanofibrils. J. Chem. Theory Comput. 10, 4782–4794 (2014).
6. Havu, V., Blum, V., Havu, P. & Scheffler, M. Efficient O(N) integration for all-electron electronic structure calculation using numeric basis functions. J. Comput. Phys. 228, 8367–8379 (2009).
7. Catlow, C. R. A., Sokol, A. A. & Walsh, A. Computational Approaches to Energy Materials (Wiley-Blackwell, New York, 2013).
8. Hohenberg, P. & Kohn, W. Inhomogeneous electron gas. Phys. Rev. 136, B864–B871 (1964).
9. Kohn, W. & Sham, L. J. Self-consistent equations including exchange and correlation effects. Phys. Rev. 140, A1133–A1138 (1965).
10. Lejaeghere, K. et al. Reproducibility in density functional theory calculations of solids. Science 351, aad3000 (2016).
11. Hachmann, J. et al. The Harvard clean energy project: large-scale computational screening and design of organic photovoltaics on the world community grid. J. Phys. Chem. Lett. 2, 2241–2251 (2011).
12. Jain, A. et al. Commentary: The Materials Project: a materials genome approach to accelerating materials innovation. APL Mater. 1, 011002 (2013).
13. Calderon, C. E. et al. The AFLOW standard for high-throughput materials science calculations. Comput. Mater. Sci. 108, 233–238 (2015).
14. Agrawal, A. & Choudhary, A. Perspective: Materials informatics and big data: realization of the 'fourth paradigm' of science in materials science. APL Mater. 4, 053208 (2016).
15. Schwab, K. The fourth industrial revolution. Foreign Affairs https://www.
foreignaffairs.com/articles/2015-12-12/fourth-industrial-revolution (2015).
16. Fourches, D., Muratov, E. & Tropsha, A. Trust, but verify: on the importance of chemical structure curation in cheminformatics and QSAR modeling research. J. Chem. Inf. Model. 50, 1189–1204 (2010).
17. Kireeva, N. et al. Generative topographic mapping (GTM): universal tool for data visualization, structure-activity modeling and dataset comparison. Mol. Inform. 31, 301–312 (2012).
18. Faber, F. A. et al. Prediction errors of molecular machine learning models lower than hybrid DFT error. J. Chem. Theory Comput. 13, 5255–5264 (2017).
19. Rupp, M., Tkatchenko, A., Müller, K.-R. & von Lilienfeld, O. A. Fast and accurate modeling of molecular atomization energies with machine learning. Phys. Rev. Lett. 108, 058301 (2012).
20. Bonchev, D. & Rouvray, D. H. Chemical Graph Theory: Introduction and Fundamentals (Abacus Press, New York, 1991).
21. Schütt, K. T. et al. How to represent crystal structures for machine learning: towards fast prediction of electronic properties. Phys. Rev. B 89, 205118 (2014). A radial-distribution-function description of periodic solids is adapted for machine-learning models and applied to predict the electronic density of states for a range of materials.
22. Ward, L. et al. Including crystal structure attributes in machine learning models of formation energies via Voronoi tessellations. Phys. Rev. B 96, 024104 (2017).
23. Isayev, O. et al. Universal fragment descriptors for predicting electronic properties of inorganic crystals. Nat. Commun. 8, 15679 (2017).
24. Hand, D. J. & Yu, K. Idiot's Bayes—not so stupid after all? Int. Stat. Rev. 69, 385–398 (2001).
25. Shakhnarovich, G., Darrell, T. & Indyk, P. Nearest-Neighbor Methods in Learning and Vision: Theory and Practice (MIT Press, Boston, 2005).
26. Rokach, L. & Maimon, O. in Data Mining and Knowledge Discovery Handbook (eds Maimon, O. & Rokach, L.) 149–174 (Springer, New York, 2010).
27. Shawe-Taylor, J.
& Cristianini, N. Kernel Methods for Pattern Analysis (Cambridge Univ. Press, Cambridge, 2004).
28. Schmidhuber, J. Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015).
29. Corey, E. J. & Wipke, W. T. Computer-assisted design of complex organic syntheses. Science 166, 178–192 (1969).
30. Segler, M. H. S., Preuss, M. & Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555, 604–610 (2018). A computer-driven retrosynthesis tool was trained on most published reactions in organic chemistry.
31. Szymkuć, S. et al. Computer-assisted synthetic planning: the end of the beginning. Angew. Chem. Int. Ed. 55, 5904–5937 (2016).
32. Klucznik, T. et al. Efficient syntheses of diverse, medicinally relevant targets planned by computer and executed in the laboratory. Chem 4, 522–532 (2018).
33. Segler, M. H. S. & Waller, M. P. Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chem. Eur. J. 23, 5966–5971 (2017).
34. Cole, J. C. et al. Generation of crystal structures using known crystal structures as analogues. Acta Crystallogr. B 72, 530–541 (2016).
35. Gómez-Bombarelli, R. et al. Design of efficient molecular organic light-emitting diodes by a high-throughput virtual screening and experimental approach. Nat. Mater. 15, 1120–1127 (2016). This study uses machine learning to guide all stages of a materials discovery workflow from quantum-chemical calculations to materials synthesis.
36. Jastrzębski, S., Leśniak, D. & Czarnecki, W. M. Learning to SMILE(S). Preprint at https://arxiv.org/abs/1602.06289 (2016).
37. Nam, J. & Kim, J. Linking the neural machine translation and the prediction of organic chemistry reactions. Preprint at https://arxiv.org/abs/1612.09529 (2016).
38. Liu, B. et al. Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS Cent. Sci. 3, 1103–1113 (2017).
39. Altae-Tran, H., Ramsundar, B., Pappu, A. S. & Pande, V.
Low data drug discovery with one-shot learning. ACS Cent. Sci. 3, 283–293 (2017).
40. Wicker, J. G. P. & Cooper, R. I. Will it crystallise? Predicting crystallinity of molecular materials. CrystEngComm 17, 1927–1934 (2015). This paper presents a crystal engineering application of machine learning to assess the probability of a given molecule forming a high-quality crystal.
41. Pillong, M. et al. A publicly available crystallisation data set and its application in machine learning. CrystEngComm 19, 3737–3745 (2017).
42. Raccuglia, P. et al. Machine-learning-assisted materials discovery using failed experiments. Nature 533, 73–76 (2016). The study trains a machine-learning model to predict the success of a chemical reaction, incorporating the results of unsuccessful attempts as well as known (successful) reactions.
43. Dragone, V., Sans, V., Henson, A. B., Granda, J. M. & Cronin, L. An autonomous organic reaction search engine for chemical reactivity. Nat. Commun. 8, 15733 (2017).
44. Billinge, S. J. L. & Levin, I. The problem with determining atomic structure at the nanoscale. Science 316, 561–565 (2007).
45. Kalinin, S. V., Sumpter, B. G. & Archibald, R. K. Big–deep–smart data in imaging for guiding materials design. Nat. Mater. 14, 973–980 (2015).
46. Ziatdinov, M., Maksov, A. & Kalinin, S. V. Learning surface molecular structures via machine vision. npj Comput. Mater. 3, 31 (2017).
47. de Albuquerque, V. H. C., Cortez, P. C., de Alexandria, A. R. & Tavares, J. M. R. S. A new solution for automatic microstructures analysis from images based on a backpropagation artificial neural network. Nondestruct. Test. Eval. 23, 273–283 (2008).
48. Hui, Y. & Liu, Y. Volumetric data exploration with machine learning-aided visualization in neutron science. Preprint at https://arxiv.org/abs/1710.05994 (2017).
49. Carrasquilla, J. & Melko, R. G. Machine learning phases of matter. Nat. Phys. 13, 431–434 (2017).
50. Christensen, R., Hansen, H. A. & Vegge, T.
Identifying systematic DFT errors in catalytic reactions. Catal. Sci. Technol. 5, 4946–4949 (2015).
51. Snyder, J. C., Rupp, M., Hansen, K., Müller, K.-R. & Burke, K. Finding density functionals with machine learning. Phys. Rev. Lett. 108, 253002 (2012).
52. Wellendorff, J. et al. Density functionals for surface science: exchange-correlation model development with Bayesian error estimation. Phys. Rev. B 85, 235149 (2012).
53. Mardirossian, N. & Head-Gordon, M. ωB97M-V: a combinatorially optimized, range-separated hybrid, meta-GGA density functional with VV10 nonlocal correlation. J. Chem. Phys. 144, 214110 (2016).
54. Brockherde, F. et al. Bypassing the Kohn-Sham equations with machine learning. Nat. Commun. 8, 872 (2017). This study transcends the standard approach to DFT by providing a direct mapping from density to energy, paving the way for higher-accuracy approaches.
55. Behler, J. First principles neural network potentials for reactive simulations of large molecular and condensed systems. Angew. Chem. Int. Ed. 56, 12828–12840 (2017).

554 | NATURE | VOL 559 | 26 JULY 2018
© 2018 Springer Nature Limited. All rights reserved.