Drawing from the growing number of structure–property databases (Table 3), accurate universal density functionals can be learned from data50,51. Early examples include the Bayesian error-estimation functional52 and combinatorially optimized DFT functionals53. Going beyond the standard approach to DFT, the need to solve the Kohn–Sham equations can be bypassed by learning density-to-energy and density-to-potential maps directly from training systems54.

Equally challenging is the description of chemical processes across length scales and timescales, such as the corrosion of metals in the presence of oxygen and water. A realistic description of chemical interactions (bond forming and breaking) that includes solvents, interfaces and disorder is still limited by the computational cost of available quantum-mechanical approaches.
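The density-to-energy learning described above can be illustrated with a minimal sketch. Everything here is synthetic: the "densities" are one-dimensional Gaussians on a grid and the "energy functional" is an arbitrary toy expression standing in for an expensive quantum-mechanical calculation. Kernel ridge regression then learns the map from density to energy directly, in the spirit of the approach in ref. 54.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 50)
dx = grid[1] - grid[0]

def toy_density(centre, width):
    # Synthetic one-dimensional "electron density" on a uniform grid
    return np.exp(-((grid - centre) ** 2) / (2.0 * width ** 2))

def toy_energy(rho):
    # Arbitrary stand-in for an expensive quantum-mechanical energy
    return (rho ** 2).sum() * dx - 0.5 * rho.sum() * dx

# Training set: densities sampled across the (centre, width) parameter space
params = rng.uniform([0.2, 0.05], [0.8, 0.20], size=(200, 2))
X = np.array([toy_density(c, w) for c, w in params])
y = np.array([toy_energy(rho) for rho in X])

# Learn the density-to-energy map with kernel ridge regression
model = KernelRidge(kernel="rbf", alpha=1e-6, gamma=1.0)
model.fit(X, y)

# Predict the energy of an unseen density without evaluating the functional
rho_new = toy_density(0.5, 0.10)
err = abs(model.predict(rho_new.reshape(1, -1))[0] - toy_energy(rho_new))
print(f"absolute error on unseen density: {err:.2e}")
```

Once trained, the model replaces the explicit functional evaluation entirely; in a real application the training energies would come from Kohn–Sham calculations rather than a closed-form toy expression.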
The task of developing transferable analytic force fields is a well-defined problem for machine learning55,56. It has been demonstrated that, in simple materials, approximate potential-energy surfaces learned from quantum-mechanical data can save orders of magnitude in processing cost57,58. Although the combination of methods with varying levels of approximation is promising, much work is needed on the quantification and minimization of error propagation across methods. In this context, initiatives for error estimation such as the DAKOTA package (https://dakota.sandia.gov) are critically important.

Targeting discovery of new compounds
We have considered how machine learning can be used to enhance and integrate synthesis, characterization and modelling. However, machine learning can also reveal new ways of discovering compounds. Models that relate system descriptors to desirable properties are already used to reveal previously unknown structure–property relationships59,60. So far, the fields of molecular (primarily pharmaceutical and medicinal) and materials chemistry have experienced different degrees of uptake of machine-learning approaches to the design of new compounds, in part owing to the challenges of representing the crystal structure and morphology of extended solids.

Crystalline solids. The application of machine learning to the discovery of functional materials is an emerging field. An early report in 1998 applied machine learning to the prediction of magnetic and optoelectronic materials61, but the number of studies has risen substantially only since 201062–64. The complexity of games like Go is reminiscent of certain problems in materials science65,66, such as the description of on-lattice interactions that govern chemical disorder, magnetism and ferroelectricity. Even for representations of small unit cells, the number of configurations of a disordered crystal can quickly exceed the limitations of conventional approaches.
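The configurational explosion just mentioned is easy to quantify for the simplest case of binary substitutional disorder: before any symmetry reduction, an N-site cell with half its sites substituted admits C(N, N/2) distinct arrangements. A quick sketch (the site counts are chosen arbitrarily for illustration):

```python
from math import comb

# Distinct arrangements of a 50/50 binary alloy on an N-site cell,
# before any symmetry reduction is applied
for n_sites in (16, 32, 64, 128):
    n_configs = comb(n_sites, n_sites // 2)
    print(f"{n_sites:4d} sites: {n_configs:.3e} configurations")
```

Even a 128-site cell yields on the order of 10^37 configurations, which is why direct enumeration gives way to sampling, cluster expansions, or learned surrogate models.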
An inverse-design procedure illustrated how such a combinatorial space for an alloy could be harnessed to realize specific electronic-structure features67. Similar inverse-design approaches have also been applied in molecular chemistry to tailor ground- and excited-state properties68.

Predicting the likelihood that a composition will adopt a given crystal structure is a good example of a supervised classification problem in machine learning. Two recent studies predicted how likely a given composition is to adopt the so-called Heusler and half-Heusler crystal structures. The first predicts the likelihood that a given composition will adopt the Heusler structure and is trained on experimental data69. This approach was applied to screen hypothetical compositions and successfully identified 12 new gallide compounds, which were subsequently verified experimentally. In the second, a random-forest model was trained on experimental data to learn the probability that a given ABC stoichiometry would adopt the half-Heusler structure70.

As an alternative to learning from experimental data, calculated properties can be used as a training set for machine learning. Assessing the degree of similarity between electronic band structures has been shown to yield improved photocathodes for dye-sensitized solar cells71. A machine-learning model trained to reproduce energies with an accuracy of around 80% has been demonstrated40. Crucially, this model had access to a training set of more than 20,000 crystalline and non-crystalline compounds. The availability of such open-access databases is pivotal for the further development of similar predictive models41. In another study, a model was trained to predict the reaction conditions for the formation of new organically templated inorganic products, with a success rate of 89%42.

A less explored avenue of machine learning is how best to sample the set of possible experimental set-ups.
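The composition-classification formulation used in the Heusler studies can be sketched as follows. The descriptors and labels below are entirely synthetic: a made-up rule plays the role of experimentally observed structure assignments, and the three features stand in for whatever composition descriptors (electronegativity differences, radius ratios, electron counts, and so on) a real study would use. The workflow, however, mirrors the random-forest approach of ref. 70: train on labelled compositions, then rank unseen candidates by predicted probability.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Hypothetical descriptors for ABC compositions (all synthetic here)
X = rng.uniform(0.0, 1.0, size=(500, 3))

# Synthetic ground truth: a made-up rule standing in for experimental
# "adopts the target structure" labels
y = ((X[:, 0] < 0.5) & (X[:, 2] > 0.3)).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X[:400], y[:400])

# Screening: rank held-out candidate compositions by predicted probability
proba = clf.predict_proba(X[400:])[:, 1]
acc = clf.score(X[400:], y[400:])
print(f"held-out accuracy: {acc:.2f}")
```

In a screening campaign the ranked probabilities, not the hard class labels, are what matter: the top of the list defines which hypothetical compositions merit synthesis attempts.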
Active learning predicts the optimal future experiments that are required to better understand a given problem. It was recently applied to help to understand the conditions for the synthesis and crystallization of complex polyoxometalate clusters43. Starting from initial data on failed and successful experiments, the machine-learning approach directed future experiments and was shown to be capable of covering six times as much crystallization space as a human researcher in the same number of experiments.

Computational assistance for the planning and direction of chemical synthesis has come a long way since the early days of hand-coded expert systems. Much of this progress has been achieved in the past five years. The incorporation of artificial-intelligence-based chemical planners, combined with advances in robotic synthesis43, promises a rich new frontier in the production of novel compounds.

Assisting multi-dimensional characterization
The structure of molecules and materials is typically deduced by a combination of experimental methods, such as X-ray and neutron diffraction, magnetic and spin resonance, and vibrational spectroscopy. Each approach has a certain sensitivity and length scale, and the information from each method is complementary. Unfortunately, it is rare that data are fully assimilated into a coherent description of atomic structure. Analyses of individual streams often result in conflicting descriptions of the same compound44. A solution could be to incorporate real-time data into the modelling, with results then returned to the experiment, forming a feedback loop45. Machine learning represents a unifying framework that could enable the synergy of synthesis, imaging, theory and simulations.

The power of machine-learning methods for enhancing the link between modelling and experiment has been demonstrated in the field of surface science.
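The experiment-selection loop behind the active-learning approach described above can be sketched as uncertainty sampling. Everything here is synthetic: a simple threshold rule stands in for running a real crystallization experiment, and a random-forest classifier repeatedly proposes whichever untried condition it is least certain about.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)

def run_experiment(x):
    # Synthetic oracle standing in for a real crystallization experiment
    return int(x[0] + x[1] > 1.0)

# Pool of candidate experimental conditions (two synthetic parameters)
pool = pool = rng.uniform(0.0, 1.0, size=(1000, 2))
tried = list(rng.choice(len(pool), size=10, replace=False))
outcomes = {i: run_experiment(pool[i]) for i in tried}

for _ in range(20):
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(pool[tried], [outcomes[i] for i in tried])
    # Uncertainty sampling: query the condition the model is least sure of
    uncertainty = 1.0 - clf.predict_proba(pool).max(axis=1)
    uncertainty[tried] = -1.0  # never repeat an experiment
    nxt = int(np.argmax(uncertainty))
    tried.append(nxt)
    outcomes[nxt] = run_experiment(pool[nxt])

# Queried conditions cluster near the (unknown) boundary x0 + x1 = 1
queried = pool[tried[10:]]
print(np.abs(queried.sum(axis=1) - 1.0).mean())
```

Rather than sampling conditions uniformly, the loop concentrates its "experiments" where the model's decision boundary is least resolved, which is the mechanism behind the efficiency gains reported for the polyoxometalate study.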
Complex surface reconstructions have been characterized by combining ab initio simulations with multi-stage pattern-recognition systems that use convolutional neural networks46. Machine-learning methods have also recently shown promise in areas such as microstructural characterization47 and the identification of interesting regions in large, complex, volumetric (three-dimensional) neutron-scattering datasets48. Another example of machine learning opening new avenues in an area of complicated characterization is the phase transitions of highly correlated systems; neural networks have been trained to encode topological phases of matter and thus identify transitions between them49.

Enhancing theoretical chemistry
Modelling is now commonly considered to be as important as synthesis and characterization for successful programmes of research. Using atomistic simulations, the properties of a molecule or material can, in principle, be calculated for any chemical composition and atomic structure. In practice, the computations grow rapidly in complexity as the size of the system increases, so considerable effort is devoted to finding short-cuts and approximations that enable the properties of the material to be calculated to an acceptable degree of fidelity, without the need for unreasonable amounts of computer time.

Approaches based on DFT have been successful in predicting the properties of many classes of compounds, offering generally high accuracy at reasonable cost. However, DFT and related electronic-structure techniques are limited by the exchange-correlation functional that describes non-classical interactions between electrons. There are notable limitations of the current approximations for weak chemical interactions (such as in layered materials), for highly correlated (d- and f-electron) systems and for the latest generation of quantum materials (such as iron pnictide superconductors), which often require a more sophisticated many-body Hamiltonian.

26 July 2018 | VOL 559 | NATURE | 551
© 2018 Springer Nature Limited. All rights reserved.