Machine Learning in High Energy Physics Community White Paper

May 17, 2019

Abstract: Machine learning has been applied to several problems in particle physics research, beginning with applications to high-level physics analysis in the 1990s and 2000s, followed by an explosion of applications in particle and event identification and reconstruction in the 2010s. In this document we discuss promising future research and development areas for machine learning in particle physics. We detail a roadmap for their implementation, software and hardware resource requirements, collaborative initiatives with the data science community, academia and industry, and training the particle physics community in data science. The main objective of the document is to connect and motivate these areas of research and development with the physics drivers of the High-Luminosity Large Hadron Collider and future neutrino experiments and identify the resource needs for their implementation. Additionally we identify areas where collaboration with external communities will be of great benefit.

Editors: Sergei Gleyzer30, Paul Seyfert13, Steven Schramm32

Contributors: Kim Albertsson1, Piero Altoe2, Dustin Anderson3, John Anderson4, Michael Andrews5, Juan Pedro Araque Espinosa6, Adam Aurisano7, Laurent Basara8, Adrian Bevan9, Wahid Bhimji10, Daniele Bonacorsi11, Bjorn Burkle12, Paolo Calafiura10, Mario Campanelli9, Louis Capps2, Federico Carminati13, Stefano Carrazza13, Yi-Fan Chen4, Taylor Childers14, Yann Coadou15, Elias Coniavitis16, Kyle Cranmer17, Claire David18, Douglas Davis19, Andrea De Simone20, Javier Duarte21, Martin Erdmann22, Jonas Eschle23, Amir Farbin24, Matthew Feickert25, Nuno Filipe Castro6, Conor Fitzpatrick26, Michele Floris13, Alessandra Forti27, Jordi Garra-Tico28, Jochen Gemmler29, Maria Girone13, Paul Glaysher18, Sergei Gleyzer30, Vladimir Vava Gligorov31, Tobias Golling32, Jonas Graw2, Lindsey Gray21, Dick Greenwood33, Thomas Hacker34, John Harvey13, Benedikt Hegner13, Lukas Heinrich17, Ulrich Heintz12, Ben Hooberman35, Johannes Junggeburth36, Michael Kagan37, Meghan Kane38, Konstantin Kanishchev8, Przemysław Karpiński13, Zahari Kassabov39, Gaurav Kaul40, Dorian Kcira3, Thomas Keck29, Alexei Klimentov41, Jim Kowalkowski21, Luke Kreczko42, Alexander Kurepin43, Rob Kutschke21, Valentin Kuznetsov44, Nicolas Köhler36, Igor Lakomov13, Kevin Lannon45, Mario Lassnig13, Antonio Limosani46, Gilles Louppe17, Aashrita Mangu47, Pere Mato13, Helge Meinhard13, Dario Menasce48, Lorenzo Moneta13, Seth Moortgat49, Meenakshi Narain12, Mark Neubauer35, Harvey Newman3, Sydney Otten50, Hans Pabst40, Michela Paganini51, Manfred Paulini5, Gabriel Perdue21, Uzziel Perez52, Attilio Picazio53, Jim Pivarski54, Harrison Prosper55, Fernanda Psihas56, Alexander Radovic57, Ryan Reece58, Aurelius Rinkevicius44, Eduardo Rodrigues7, Jamal Rorie59, David Rousseau60, Aaron Sauers21, Steven Schramm32, Ariel Schwartzman37, Horst Severini61, Paul Seyfert13, Filip Siroky62, Konstantin Skazytkin43, Mike Sokoloff7, Graeme Stewart63, Bob Stienen64, Ian Stockdale65, Giles Strong6, Wei Sun4, Savannah Thais51, Karen Tomko66, Eli Upfal12, Emanuele Usai12, Andrey Ustyuzhanin67, Martin Vala68, Sofia Vallecorsa69, Justin Vasel56, Mauro Verzetti70, Xavier Vilasís-Cardona71, Jean-Roch Vlimant3, Ilija Vukotic72, Sean-Jiun Wang30, Gordon Watts73, Michael Williams74, Wenjing Wu75, Stefan Wunsch29, Kun Yang4, Omar Zapata76

1 Luleå University of Technology 2 NVidia 3 California Institute of Technology 4 Google 5 Carnegie Mellon University 6 LIP Lisboa 7 University of Cincinnati 8 Università e INFN, Padova 9 University of London 10 Lawrence Berkeley National Laboratory 11 Università e INFN, Bologna 12 Brown University 13 CERN
14 Argonne National Laboratory 15 CPPM Aix Marseille Univ CNRS/IN2P3 16 Universitaet Freiburg 17 New York University 18 Deutsches Elektronen-Synchrotron 19 Duke University 20 SISSA Trieste Italy 21 Fermi National Accelerator Laboratory 22 RWTH Aachen University 23 Universität Zürich 24 University of Texas at Arlington 25 Southern Methodist University 26 Ecole Polytechnique Federale de Lausanne 27 University of Manchester 28 University of Cambridge 29 Karlsruher Institut für Technologie 30 University of Florida 31 LPNHE, Sorbonne Université et Université Paris Diderot, CNRS/IN2P3, Paris 32 Université de Genève 33 Louisiana Tech University 34 Purdue University 35 University of Illinois at Urbana-Champaign 36 Max Planck Institut für Physik 37 SLAC National Accelerator Laboratory 38 SoundCloud 39 University of Milan 40 Intel 41 Brookhaven National Laboratory 42 University of Bristol 43 Russian Academy of Sciences 44 Cornell University 45 University of Notre Dame 46 University of Melbourne 47 University of California Berkeley 48 Università & INFN, Milano Bicocca 49 Vrije Universiteit Brussel 50 University of Amsterdam and Radboud University Nijmegen 51 Yale University 52 University of Alabama 53 University of Massachusetts 54 Princeton University 55 Florida State University 56 Indiana University 57 College of William and Mary 58 University of California, Santa Cruz 59 Rice University 60 Université de Paris Sud 11 61 University of Oklahoma 62 Masaryk University 63 University of Glasgow 64 Radboud Universiteit Nijmegen 65 Altair Engineering 66 Ohio Supercomputer Center 67 Yandex School of Data Analysis 68 Technical University of Kosice 69 Gangneung-Wonju National University 70 University of Rochester 71 University of Barcelona 72 University of Chicago 73 University of Washington 74 Massachusetts Institute of Technology
75 Chinese Academy of Sciences 76 OProject and University of Antioquia
Contents

1 Preface 6
2 Introduction 6
2.1 Motivation 6
2.2 Brief Overview of Machine Learning Algorithms in HEP 6
2.3 Structure of the Document 7
3 Machine Learning Applications and R&D 7
3.1 Simulation 7
3.2 Real Time Analysis and Triggering 8
3.3 Object Reconstruction, Identification, and Calibration 8
3.4 End-To-End Deep Learning 9
3.5 Sustainable Matrix Element Method 9
3.6 Matrix Element Machine Learning Method 10
3.7 Learning the Standard Model 11
3.8 Theory Applications 12
3.9 Uncertainty Assignment 12
3.10 Monitoring of Detectors, Hardware Anomalies and Preemptive Maintenance 13
3.11 Computing Resource Optimization and Control of Networks and Production Workflows 13
4 Collaborating with other communities 13
4.1 Introduction 13
4.2 Academic Outreach and Engagement 14
4.3 Machine Learning Challenges 14
4.4 Collaborative Benchmark Datasets 14
4.5 Industry Engagement 15
4.6 Machine Learning Community-at-large Outreach 15
5 Machine Learning Software and Tools 16
5.1 Software Methodology 16
5.2 I/O and Programming Languages 16
5.3 Software Interfaces to Acceleration Hardware 17
5.4 Parallelization and Interactivity 17
5.5 Internal and External ML tools 17
5.5.1 Machine Learning Data Formats 17
5.5.2 Desirable HEP-ML Software and Data Format Attributes 19
5.5.3 Interfaces and Middleware 19
6 Computing and Hardware Resources 19
6.1 Resource Requirements 20
6.2 Graphical Processing Units 21
6.3 Cloud TPUs 21
6.4 High Performance Computing 21
6.5 Field Programmable Gate Arrays 21
6.6 Opportunistic Resources 21
6.7 Data Storage and Availability 22
6.8 Software Distribution and Deployment 22
6.9 Machine Learning As a Service 22
7 Training the community 22
8 Roadmap 22
8.1 Timeline 22
8.2 Steps to Deployment 23
9 Conclusions 23
10 Acknowledgements 23
A Appendix 24
A.1 Matrix Element Methods 24
1 Preface

To outline the challenges in computing that high-energy physics will face over the next years and strategies to approach them, the HEP Software Foundation has organised a Community White Paper (CWP) [1]. In addition to the main document, several more detailed documents were worked out by different working groups. The present document focusses on the topic of machine learning. The goals are to define the tasks at the energy and intensity frontier that can be addressed during the next decade by research and development of machine learning applications. Machine learning in particle physics is evolving fast, while the contents of this community white paper were mainly compiled during community meetings in spring 2017 that took place at several workshops on machine learning in high-energy physics: S2I2 and [2–5]. The contents of this document thus reflect the state of the art at these events and do not attempt to take later developments into account.

2 Introduction

One of the main objectives of particle physics in the post-Higgs boson discovery era is to exploit the full physics potential of both the Large Hadron Collider (LHC) and its upgrade, the high luminosity LHC (HL-LHC), in addition to present and future neutrino experiments. The HL-LHC will deliver an integrated luminosity that is 20 times larger than the present LHC dataset, bringing quantitatively and qualitatively new challenges due to event size, data volume, and complexity. The physics reach of the experiments will be limited by the physics performance of algorithms and computational resources.

Machine learning (ML) applied to particle physics promises to provide improvements in both of these areas. Incorporating machine learning in particle physics workflows will require significant research and development over the next five years. Areas where significant improvements are needed include:

• Physics performance of reconstruction and analysis algorithms;
• Execution time of computationally expensive parts of event simulation, pattern recognition, and calibration;
• Real-time implementation of machine learning algorithms;
• Reduction of the data footprint with data compression, placement and access.

2.1 Motivation

The experimental high-energy physics (HEP) program revolves around two main objectives that go hand in hand: probing the Standard Model (SM) with increasing precision and searching for new particles associated with physics beyond the SM. Both tasks require the identification of rare signals in immense backgrounds. Substantially increased levels of pile-up collisions from additional protons in the bunch at the HL-LHC will make this a significant challenge.

Machine learning algorithms are already the state-of-the-art in event and particle identification, energy estimation and pile-up suppression applications in HEP. Despite their present advantage, machine-learning algorithms still have significant room for improvement in their exploitation of the full potential of the dataset.

2.2 Brief Overview of Machine Learning Algorithms in HEP

This section provides a brief introduction to the most important machine learning algorithms in HEP, introducing key vocabulary (in italic). Machine learning methods are designed to exploit large datasets in order to reduce complexity and find new features in data. The current most frequently used machine learning algorithms in HEP are Boosted Decision Trees (BDTs) and Neural Networks (NN).
Typically, variables relevant to the physics problem are selected and a machine learning model is trained for classification or regression using signal and background events (or instances). Training the model is the most human- and CPU-time consuming step, while the application, the so-called inference stage, is relatively inexpensive. BDTs and NNs are typically used to classify particles and events.
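As a minimal sketch of this classification workflow, the example below trains a gradient-boosted decision tree on toy signal and background events with scikit-learn; the two input variables and all distributions are invented for illustration and are not taken from any experiment.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Toy events: two discriminating variables per event (say, an invariant mass
# and an isolation-like quantity), drawn from shifted Gaussians.
n = 10_000
signal = rng.normal(loc=[125.0, 0.2], scale=[5.0, 0.1], size=(n, 2))
background = rng.normal(loc=[110.0, 0.5], scale=[20.0, 0.3], size=(n, 2))
X = np.vstack([signal, background])
y = np.concatenate([np.ones(n), np.zeros(n)])  # 1 = signal, 0 = background

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Training is the expensive step; inference afterwards is cheap.
bdt = GradientBoostingClassifier(n_estimators=200, max_depth=3)
bdt.fit(X_train, y_train)

scores = bdt.predict_proba(X_test)[:, 1]  # per-event signal probability
print("ROC AUC:", roc_auc_score(y_test, scores))
```

In a real analysis the inputs would be physics-motivated reconstructed quantities and the labels would come from simulation, but the train-once, apply-cheaply structure is the same.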
BDTs and NNs are also used for regression, where a continuous function is learned, for example to obtain the best estimate of a particle’s energy based on the measurements from multiple detectors.

Neural Networks have been used in HEP for some time; however, improvements in training algorithms and computing power have in the last decade led to the so-called deep learning revolution, which has had a significant impact on HEP. Deep learning is particularly promising when there is a large amount of data and features, as well as symmetries and complex non-linear dependencies between inputs and outputs. There are different types of DNN used in HEP: fully-connected (FCN), convolutional (CNN) and recurrent (RNN). Additionally, neural networks are used in the context of Generative Models, where a Neural Network is trained to reproduce the multidimensional distribution of the training instance set. Variational AutoEncoders (VAE) and more recent Generative Adversarial Networks (GAN) are two examples of such generative models used in HEP.

A large set of machine learning algorithms is devoted to time series analysis and prediction. They are in general not relevant for HEP data analysis, where events are independent of each other. However, there is more and more interest in these algorithms for Data Quality, Computing and Accelerator Infrastructure monitoring, as well as those physics processes and event reconstruction tasks where time is an important dimension.

2.3 Structure of the Document

Applications of machine learning algorithms motivated by HEP drivers are detailed in Section 3, while Section 4 focuses on outreach and collaboration with the machine learning community. Section 5 focuses on the machine learning software in HEP and discusses the interplay between internally and externally developed machine learning tools. Recent progress in machine learning was made possible in part by the emergence of suitable hardware for training complex models, thus in Section 6 the resource requirements of training and applying machine learning algorithms in HEP are discussed. Section 7 discusses strategies for training the HEP community in machine learning. Finally, Section 8 presents the roadmap for the near future.

3 Machine Learning Applications and R&D

This chapter describes the science drivers and high-energy physics challenges where machine learning can play a significant role in advancing the current state of the art. These challenges are selected because of their relevance and potential, and also due to their similarity with challenges faced outside the field. Despite these similarities, major R&D work will go into adapting and evolving such methods to match the particular HEP requirements.

3.1 Simulation

Particle discovery relies on the ability to accurately compare the observed detector response data with expectations based on the hypotheses of the Standard Model or models of new physics. While the processes of subatomic particle interactions with matter are known, it is intractable to compute the detector response analytically. As a result, Monte Carlo simulation tools, such as GEANT [6], have been developed to simulate the propagation of particles in detectors to compare with the data. The dedicated CWP on detector simulation [7] discusses the challenges of simulations in great detail. This section focuses on the machine learning related aspects.

For the HL-LHC, on the order of trillions of simulated collisions are needed in order to achieve the required statistical accuracy of the simulations to perform precision hypothesis testing.
However, such simulations are highly computationally expensive. For example, simulating the detector response of a single LHC proton-proton collision event takes on the order of several minutes. A particularly time-consuming step is the simulation of particles incident on the dense material of a calorimeter. The high interaction probability and resulting high multiplicity in the so-called showers of particles passing through the detector material make the simulation of such processes very expensive. This problem is further compounded when particle showers overlap, as is frequently the case in the core of a jet of particles produced by high energy quarks and gluons.

Fast simulations replace the slowest components of the simulation chain with computationally efficient approximations. Often such approximations have been done by using simplified parametrizations or particle shower look-up tables. These are computationally fast but often suffer from insufficient accuracy for high precision physics measurements and searches.
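To make the parametrization idea concrete, the toy sketch below replaces full shower transport with sampling from a parametrized calorimeter energy response; the stochastic, constant, and noise resolution terms are invented numbers, not those of any real detector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy fast simulation of a calorimeter energy measurement. Instead of
# transporting every shower particle through the material (slow), sample the
# reconstructed energy from a parametrized resolution of the standard form
#   sigma(E)/E = a/sqrt(E) (+) b (+) c/E   (terms added in quadrature),
# with purely illustrative constants a, b, c.
A, B, C = 0.10, 0.01, 0.3  # stochastic, constant, and noise terms

def fast_sim_energy(e_true):
    rel_sigma = np.sqrt((A / np.sqrt(e_true))**2 + B**2 + (C / e_true)**2)
    return rng.normal(loc=e_true, scale=rel_sigma * e_true)

e_true = np.array([10.0, 50.0, 200.0])  # toy incident energies in GeV
print(fast_sim_energy(e_true))          # smeared "reconstructed" energies
```

Each sample costs a single random draw rather than minutes of particle transport, which is precisely why such parametrizations are fast, and also why they struggle to capture correlated, highly granular shower shapes.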
Recent progress in high fidelity fast generative models, such as GANs and VAEs, which learn to sample from high dimensional feature distributions by minimizing an objective that measures the distance between the generated and actual distributions, offers a promising alternative for simulation. A simplified first attempt at using such techniques saw orders of magnitude increase in simulation speed over existing fast simulation techniques [8], but such generative models have not yet reached the required accuracy, partly due to inherent shortcomings of the methods and the instability in training of the GANs. Developing these techniques for realistic detector models and understanding how to reach the required accuracy is still needed. The fast advancement of such techniques in the ML community makes this a highly promising avenue to pursue.

Orthogonal to reducing the demand for computing resources with fast simulations, machine learning can also contribute to other aspects of the simulation. Event generators have a large number of parameters that can be used to tune various aspects of the simulated events. Performing such tuning over a many-dimensional parameter space is highly non-trivial and may require generating many data samples in the process to test parameter space points. Modern machine learning optimization techniques, such as Bayesian Optimization, allow for global optimization of the generator without detailed knowledge of its internal details [9]. Applying such techniques to simulation tuning may further improve the output of the simulations.

3.2 Real Time Analysis and Triggering

The traditional approach to data analysis in particle physics assumes that the interesting events recorded by a detector can be selected in real-time (a process known as triggering) with a reasonable efficiency, and that once selected, these events can be affordably stored and distributed for further selection and analysis at a later point in time. However, the enormous production cross-section and luminosity of the LHC mean that these assumptions break down (they may well also break down in other areas of high-energy physics in due course). In particular there are whole classes of events, for example beauty and charm hadrons or low-mass dark matter signatures, which are so abundant that it is not affordable to store all of the events for later analysis. To exploit the full information the LHC delivers, it will increasingly be necessary to perform more of the data analysis in real-time [10]. This topic is discussed in some detail in the Reconstruction and Software Triggering chapter [11], but it is also an important driver of machine learning applications in HEP.

Machine learning methods offer the possibility to offset some of the cost of applying reconstruction algorithms, and may be the only hope of performing the real-time reconstruction that enables real-time analysis in the first place. For example, the CMS experiment uses boosted decision trees in the Level 1 trigger to approximate muon momenta. One of the challenges is the trade-off between algorithm complexity and performance under strict inference time constraints. In another example, called the HEP.TrkX project, deep neural networks are trained on large resource platforms and subsequently perform fast inference in online systems.
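As a toy illustration of the inference-time trade-off, the sketch below evaluates a small decision tree branchlessly, computing every node comparison unconditionally so that the evaluation latency is fixed; the cuts and leaf scores are invented, and this is not the actual CMS Level-1 implementation.

```python
import numpy as np

# Toy fixed-latency evaluation of one depth-2 decision tree, in the spirit of
# trigger-style inference where every comparison is evaluated unconditionally
# (no data-dependent branching). Thresholds and leaf values are invented.

LEAVES = np.array([0.05, 0.30, 0.55, 0.90])  # hypothetical per-leaf scores

def tree_score(pt, abs_eta):
    c_root = (pt > 20.0).astype(int)       # root node cut
    c_left = (abs_eta < 1.2).astype(int)   # child cut if root test fails
    c_right = (pt > 50.0).astype(int)      # child cut if root test passes
    # Select the relevant child comparison arithmetically, not by branching.
    child = c_root * c_right + (1 - c_root) * c_left
    return LEAVES[2 * c_root + child]

pt = np.array([12.0, 35.0, 80.0])
abs_eta = np.array([0.5, 2.0, 1.0])
print(tree_score(pt, abs_eta))  # fixed number of operations per candidate
```

Deeper trees or larger ensembles improve accuracy but multiply the number of unconditional comparisons, which is exactly the complexity-versus-latency tension described above.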
Real-time analysis poses specific challenges to machine learning algorithm design, in particular how to maintain insensitivity to detector performance which may vary over time. For example, the LHCb experiment uses neural networks for fast fake-track and clone rejection and already employs a fast boosted decision tree for a large part of the event selection in the trigger [12]. It will be important that these approaches maintain performance for higher detector occupancy for the full range of tracks used in physics analyses. Another related application is speeding up the reconstruction of beauty, charm, and other lower mass hadrons, where traditional track combinatorics and vertexing techniques may become too computationally expensive.

In addition, the increasing event complexity, particularly in the HL-LHC era, will mean that machine learning techniques may also become more important to maintaining or improving the efficiency of traditional triggers. Examples of where ML approaches can be useful are the triggering of electroweak events with low-energy objects; improving jet calibration at a very early stage of reconstruction, allowing jet trigger thresholds to be lowered; or supernovae and proton decay triggering at neutrino experiments.

3.3 Object Reconstruction, Identification, and Calibration

The physical processes of interest in high energy physics experiments occur on time scales too short to be observed directly by particle detectors. For instance, a Higgs boson produced at the LHC will decay within approximately 10^-22 seconds and thus decays essentially at the point of production. However, the decay products of the initial particle, which are observed in the detector, can be used to infer its properties. Better knowledge of the properties (e.g. type, energy, direction) of the decay products permits more accurate reconstruction of the initial physical process. Event reconstruction at large is discussed in [11], which also discusses the following applications of machine learning.
Experiments have trained ML algorithms on the features from combined reconstruction algorithms to perform particle identification for decades. In the past decade BDTs have been one of the most popular techniques in this domain. More recently, experiments have focused on extracting better performance with deep neural networks. An active area of research is the application of DNNs to the output of feature extraction in order to perform particle identification and extract particle properties [13]. This is particularly true for calorimeters or time projection chambers (TPCs), where the data can be represented as a 2D or 3D image and the problems can be cast as computer vision tasks, in which neural networks are used to reconstruct images from pixel intensities. These neural networks are adapted for particle physics applications by optimizing network architectures for complex, 3-dimensional detector geometries and training them on suitable signal and background samples derived from data control regions. Applications include identification and measurements of electrons and photons from electromagnetic showers, jet properties including substructure and b-tagging, taus, and missing energy. Promising deep learning architectures for these tasks include convolutional, recurrent and adversarial neural networks. A particularly important application is to Liquid Argon TPCs (LArTPCs), which are the chosen detection technology for the flagship neutrino program.

For tracking detectors, pattern recognition is the most computationally challenging step. In particular, it becomes computationally intractable for the HL-LHC. The hope is that machine learning will provide a solution that scales linearly with LHC collision density. A current effort called HEP.TrkX investigates deep learning algorithms such as long short-term memory (LSTM) networks for track pattern recognition on many-core processors.
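A minimal sketch of the sequence-based idea follows, assuming hits encoded as (r, phi, z) triples, fixed-length track candidates, and PyTorch as the framework; this is an illustration of the LSTM approach, not the actual HEP.TrkX architecture. The network reads the hits collected so far and extrapolates where the next hit should lie.

```python
import torch
import torch.nn as nn

# Toy track-following model: an LSTM consumes a sequence of detector hits,
# each encoded as (r, phi, z), and predicts the coordinates of the next hit.
# Layer sizes and the hit encoding are illustrative assumptions.
class NextHitPredictor(nn.Module):
    def __init__(self, hidden_size=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=3, hidden_size=hidden_size,
                            batch_first=True)
        self.head = nn.Linear(hidden_size, 3)

    def forward(self, hits):          # hits: (batch, n_hits, 3)
        out, _ = self.lstm(hits)
        return self.head(out[:, -1])  # extrapolated (r, phi, z) of next hit

model = NextHitPredictor()
tracks = torch.randn(8, 10, 3)        # 8 toy track seeds, 10 hits each
print(model(tracks).shape)            # torch.Size([8, 3])
```

Trained on simulated tracks, such a predictor can steer which hits to consider next, replacing part of the combinatorial search whose cost explodes with HL-LHC occupancy.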
3.4 End-To-End Deep Learning

The vast majority of analyses at the LHC use high-level features constructed from particle four-momenta, even when the analyses make use of machine learning. A high-profile example of such variables is the seven so-called MELA variables, used in the analysis of the final states H → ZZ → 4ℓ. While a few analyses, first at the Tevatron, and later at the LHC, have used the four-momenta directly, the latter are still high-level relative to the raw data. Approaches based on the four-momenta are closely related to the Matrix Element Method, which is described in the next section.

Given recent spectacular advances in image recognition based on the use of raw information, we are led to consider whether there is something to be gained by moving closer to using raw data in LHC analyses. This so-called end-to-end deep learning approach uses low level data from a detector together with deep learning algorithms [14, 15]. One obvious challenge is that low level data, for example detector hits, tend to be both high-dimensional and sparse. Therefore, there is also interest in exploring automatic ways to compress raw data in a controlled way that does not necessarily rely on domain knowledge.

3.5 Sustainable Matrix Element Method

The Matrix Element (ME) Method [16–19] is a powerful technique which can be utilized for measurements of physical model parameters and direct searches for new phenomena. It has been used extensively by collider experiments at the Tevatron for standard model (SM) measurements and Higgs boson searches [20–25] and at the LHC for measurements in the Higgs and top quark sectors of the SM [26–32]. A few more details on the ME method are given in Appendix A.1.

The ME method has several unique and desirable features, most notably it (1) does not require training data, being an ab initio calculation of event probabilities, (2) incorporates all available kinematic information of a hypothesized process, including all correlations, and (3) has a clear physical meaning in terms of the transition probabilities within the framework of quantum field theory. One drawback of the ME Method is that it has traditionally relied on leading order (LO) matrix elements, although nothing limits the ME method to LO calculations. Techniques that accommodate initial-state QCD radiation within the LO ME framework using transverse boosting and dedicated transfer functions to integrate over the transverse momentum of initial-state partons have been developed [33]. Another challenge is the development of the transfer functions, which rely on tediously hand-crafted fits to fully simulated Monte-Carlo events.
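For orientation, the quantity at the heart of the method can be written schematically as below; the notation here is an illustrative sketch consistent with standard formulations (see Appendix A.1 and references [16–19] for the precise definitions used in this document).

```latex
% Schematic ME-method event probability: y are parton-level momenta,
% x the reconstructed event, and alpha the model hypothesis.
P(x \mid \alpha) \;=\; \frac{1}{\sigma_{\alpha}}
  \int \mathrm{d}\Phi(y)\, \mathrm{d}q_1\, \mathrm{d}q_2\;
  f(q_1)\, f(q_2)\;
  \bigl|\mathcal{M}_{\alpha}(y)\bigr|^{2}\;
  W(x \mid y)
```

Here f(q1) and f(q2) are the parton distribution functions of the initial-state partons, |M_alpha|^2 is the squared matrix element for hypothesis alpha, dPhi(y) is the phase-space measure, W(x|y) is the transfer function modelling the detector response, and sigma_alpha normalizes the probability. The sharply-peaked, high-dimensional integrand discussed next is what makes evaluating this integral expensive.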
The most serious difficulty in the ME method that has limited its applicability to searches for beyond-the-SM physics and precision measurements is that it is very computationally intensive. If this limitation is overcome, it would enable more widespread use of ME methods for analysis of LHC data. This could be particularly important for extending the new physics reach of the HL-LHC, which will be dominated by increases in integrated luminosity rather than center-of-mass collision energy.

The application of the ME method is computationally challenging for two reasons: (1) it involves high-dimensional integration over a large number of events, signal and background hypotheses, and systematic variations, and (2) it involves sharply-peaked integrands (a consequence of imposing energy/momentum conservation in the processes) over a large domain in phase space. Therefore, despite the attractive features of the ME method and the promise of further optimization and parallelization, the computational burden of the ME technique will continue to limit its range of applicability for practical data analysis without new and innovative approaches.

The primary idea put forward in this section is to utilize modern machine learning techniques to dramatically speed up the numerical evaluations in the ME method and therefore broaden the applicability of the ME method to the benefit of HL-LHC physics. Applying neural networks to numerical integration problems is plausible but not new (see [34–36], for example). The technical challenge is to design a network which is sufficiently rich to encode the complexity of the ME calculation for a given process over the phase space relevant to the signal process. Deep Neural Networks (DNNs) are strong candidates for networks with sufficient complexity to achieve good approximations, possibly in conjunction with smart phase-space mapping such as described in [37]. Promising demonstrations of the power of Boosted Decision Trees [38, 39] and Generative Adversarial Networks [40] for improved Monte Carlo integration can be found in [41].

Once a set of DNNs representing definite integrals is generated to good approximation, evaluation of the ME method calculations via the DNNs will be very fast. These DNNs can be thought of as preserving the essence of ME calculations in a way that allows for fast forward execution. They can enable the ME method to be both nimble and sustainable, neither of which is true today. The overall strategy is to do the expensive full ME calculations as infrequently as possible, ideally once for DNN training and once more for a final pass before publication, with the DNNs utilized as a good approximation in between.

A future analysis flow using the ME method with DNNs might look something like the following. One performs a large number of ME calculations using a traditional numerical integration technique like VEGAS [42, 43] or FOAM [44] on a large CPU resource, ideally exploiting acceleration on many-core devices. The DNN training data is generated from the phase space sampling in performing the full integration in this initial pass, and DNNs are trained either in situ or a posteriori. The accuracy of the DNN-based ME calculation can be assessed through this procedure. As the analysis develops and progresses through selection and/or sample changes, systematic treatment, etc., the DNN-based ME calculations are used in place of the time-consuming, full ME calculations to make the analysis nimble and to preserve the ME calculations. Before a result using the ME method is published, a final pass using the full ME calculation would likely be performed, both to maximize the numerical precision or sensitivity of the results and to validate the analysis evolution via the DNN-based approximations.
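A toy sketch of the surrogate idea follows; the peaked integrand, network size, and training set are all invented for illustration. A small neural-network regressor is fit to integrand samples of the kind collected anyway during a full integration pass, and the cheap surrogate is then queried in later analysis iterations.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Stand-in for an expensive, sharply-peaked ME integrand (a Breit-Wigner-like
# peak in one phase-space coordinate); real integrands are far more complex.
def integrand(u):
    return 1.0 / ((u - 0.7)**2 + 0.001)

# "Full pass": sample phase-space points and record integrand values, as a
# VEGAS/FOAM-style integration would do while adapting its grid.
u_train = rng.random((50_000, 1))
f_train = integrand(u_train[:, 0])

# Train the surrogate on log(f) to tame the peak's dynamic range.
surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=200)
surrogate.fit(u_train, np.log(f_train))

# Later iterations query the cheap surrogate instead of the full calculation;
# a final full pass before publication validates the result.
u_test = rng.random((100_000, 1))
estimate = np.exp(surrogate.predict(u_test)).mean()  # MC estimate of integral
exact = integrand(rng.random(1_000_000)).mean()      # brute-force reference
print(f"surrogate: {estimate:.1f}  reference: {exact:.1f}")
```

The same pattern, with far richer networks and phase-space mappings, underlies the DNN-based strategy described above: the expensive evaluations are amortized into a model that supports fast forward execution.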
There are several activities which are proposed to further develop the idea of a Sustainable Matrix Element Method. The first is to establish a cross-experiment group interested in developing the ideas presented in this section, along with a common software project for ME calculations in the spirit of [45]. This area is very well-suited for impactful collaboration with computer scientists and those working in machine learning. Using a few test cases (e.g. tt̄ or tt̄h production), evaluation of DNN choices and configurations, development of methods for DNN training from full ME calculations, and direct comparisons of the integration accuracy between Monte Carlo and DNN-based calculations should be undertaken. More effort should also be placed in developing compelling applications of the ME method for HL-LHC physics. In the longer term, the possibility of Sustainable-Matrix-Element-Method-as-a-Service (SMEMaaS), where shared software and infrastructure could be used through a common API, is proposed.

3.6 Matrix Element Machine Learning Method

The matrix element method is based on the fact that the physics of particle collisions is encoded in the distribution of the particles’ four-momenta together with their flavors. As noted in the previous section, the fundamental task is to approximate the left-hand side of Eq. (5) for all (exclusive) final states of interest. In the matrix element method, one proceeds by approximating the right-hand side of Eq. (5). But, since the goal is to compute Pξ(x|α), and given that billions of fully simulated events will be available, and that the simulations use exactly the same inputs as in the matrix element method, namely, the matrix elements, parton distribution