
Software-defined Software: a Perspective of Machine Learning based Software Production

Software Production is Facing Two Challenges
• Increasing complexity of both hardware and software
  • Many new devices are merging into existing computing systems
  • Highly optimized software requires a deep understanding of both
    • New hardware architectures, and
    • Domain-specific knowledge
• Increasing shortage of software developers
  • Demand and supply are unbalanced, in both quantity and quality
  • A parallel programming expert needs Ph.D.-level training

Moore's Law is reaching its end
[Figure: "Mastering Moore's Law", Intel's progress in packing more transistors on mainstream microprocessor chips, logarithmic scale, starting from the 4004 (2,300 transistors, 1971). Source: Intel, The Wall Street Journal]
We are approaching the final usable size limit of a transistor gate length: 5 nm (2020)
The minimum cost per transistor has been rising since 32 nm (2010)
Road map: 22 nm in 2012, 14 nm in 2014, 10 nm in 2017, 7 nm in 2018

A powerful and general-purpose ecosystem
• A simple, one-size-fits-all computing abstraction
  • Any computing task is expressed as programs and executed through the ISA
  • A task is divided into a sequence of standard execution time units
  • Under the multiprogramming model, the OS assigns time units to each task in turn
  • Data blocks are moved around in the memory hierarchy
  • Programming is very flexible
• Moore's law has hidden the dark side of the conventional ecosystem
  • Efficiency continues to decline in both power and execution
  • Not inclusive of other modes, e.g. SIMD, data flow, and domain-specific
• As Moore's law is ending
  • General-purpose computing needs a lot of external help
  • The one-size-fits-all abstraction is no longer the only choice
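The time-slicing idea on this slide (the OS giving each task a standard time unit in turn) can be illustrated with a small round-robin simulation. This is only a minimal sketch: the function name `round_robin`, the `quantum` parameter, and the task table are hypothetical and chosen for illustration, and a real OS scheduler handles priorities, I/O, and preemption that are ignored here.

```python
from collections import deque


def round_robin(tasks, quantum):
    """Simulate round-robin time slicing.

    tasks: dict mapping a task name to the time units it needs (illustrative).
    quantum: time units granted per turn (assumed to be positive).
    Returns the schedule as a list of (task, units_run) in execution order.
    """
    queue = deque(tasks.items())
    schedule = []
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)      # a task never runs past its need
        schedule.append((name, run))
        if remaining > run:                # unfinished tasks rejoin the queue
            queue.append((name, remaining - run))
    return schedule


schedule = round_robin({"A": 5, "B": 2, "C": 3}, quantum=2)
print(schedule)
```

Every task advances by the same fixed slice regardless of what it computes, which is the uniformity that the one-size-fits-all abstraction relies on.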

Increasing Shortage of Software Developers
[Figure: chart over 2012 to 2020 contrasting 1.4 million computing jobs with 400,000 computer science students, labeled as a $500 billion opportunity]
Sources: Bureau of Labor Statistics, NSF, and Bay Area Council Economic Institute.

Hardware devices are specialized
[Figure: four device classes, CPU (general), GPU, FPGA, and ASIC]
Data processing systems must efficiently utilize external hardware devices for continued high performance

A case for highly efficient specialization
[Figure: cryptocurrency mining, energy efficiency (Mhash/J) and performance throughput (Mhash/s) vs. time for CPU, GPU, FPGA, and ASIC]
• Cryptocurrency mining is a highly compute-intensive task in the Bitcoin system
• Since 2008, mining has evolved across CPU, GPU, FPGA, and ASIC
• The efficiency factors are CPU = 1, multicore = 10, GPU = 100, FPGA = 1,000
• ASIC is the most efficient in throughput and power: 10,000 to 1,000,000

Software-defined Software: software development by machines
• Motivation
  • Recent advances in machine learning for
    • Image/video/voice recognition, NLP, and many others
  • AlphaGo defeated the best human Go player
  • Machine-learning-based programs enhance human knowledge of the Go game at very high speed
• Hypothesis
  • If a machine can replace the human genius of Go players, it can also become a highly skilled programmer

Basic Difference between SDS and HDS
[Figure: block diagrams contrasting traditional programming and machine learning; labels: Data, Program, Computer, Output]
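One common reading of this contrast is that in traditional programming a person writes the program that turns data into output, while in machine learning the program is derived from data paired with desired outputs. The sketch below illustrates that reading under stated assumptions: the temperature-conversion task, the function names, and the use of a least-squares line are all hypothetical choices made for illustration and do not come from the slides.

```python
def fahrenheit_explicit(celsius):
    """Human-defined program: the rule is written by hand."""
    return celsius * 9.0 / 5.0 + 32.0


def fit_line(xs, ys):
    """Learn slope and intercept from (input, output) pairs by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Data and desired outputs. The outputs are generated by the explicit rule
# here only as a stand-in for observed examples.
inputs = [-40.0, 0.0, 37.0, 100.0]
outputs = [fahrenheit_explicit(c) for c in inputs]

slope, intercept = fit_line(inputs, outputs)


def fahrenheit_learned(celsius):
    """Machine-derived program: the rule is the fitted model."""
    return slope * celsius + intercept


print(round(fahrenheit_learned(25.0), 6))
```

The learned function is never given the conversion formula; it recovers an equivalent rule from examples, which is the sense in which the software is defined by learning rather than by explicit design.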

Unique Properties of Software-defined Software
SDS vs. HDS (human-defined software, traditional programs)
• Implicit algorithm design vs. explicit algorithm design
  • SDS does not have predefined data structures/algorithms, e.g. B-tree
  • SDS creates software by learning (DNN), e.g. Learned Index
• On-demand programming vs. pre-programming
  • SDS handles dynamics or runtime optimization
• Be the program vs. write a program
  • SDS can quickly make a copy of a program, dynamically evolve and re-program a program, and optimize a program in a timely manner
* Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, Neoklis Polyzotis, "The Case for Learned Index Structures", SIGMOD '18
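The Learned Index idea mentioned on this slide can be sketched in a few lines: instead of a predefined structure such as a B-tree, a model learns the mapping from key to position in a sorted array, and a bounded local search corrects the model's error. The code below is a toy sketch under stated assumptions, not the method of the cited paper: it uses a single least-squares line rather than a DNN or a hierarchy of models, it assumes a non-empty list of unique, sorted numeric keys, and the class name `LinearLearnedIndex` is hypothetical.

```python
import bisect


class LinearLearnedIndex:
    """Toy learned index over a sorted list of unique numeric keys."""

    def __init__(self, sorted_keys):
        self.keys = list(sorted_keys)
        n = len(self.keys)
        if n == 0:
            raise ValueError("at least one key is required")
        # Fit position ~ slope * key + intercept by least squares.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2.0
        var_k = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(self.keys))
        self.slope = cov / var_k if var_k else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Worst prediction error over the stored keys bounds the search window.
        self.max_err = max(abs(self._predict(k) - p)
                           for p, k in enumerate(self.keys))

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key):
        """Return the position of key, or -1 if it is not stored."""
        n = len(self.keys)
        guess = self._predict(key)
        lo = min(max(guess - self.max_err, 0), n)
        hi = min(max(guess + self.max_err + 1, lo), n)
        # Binary search only inside the error window around the prediction.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < n and self.keys[i] == key else -1


keys = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
index = LinearLearnedIndex(keys)
print(index.lookup(13), index.lookup(4))  # prints: 5 -1
```

The closer the key distribution is to what the model can represent, the smaller `max_err` becomes and the less searching is needed; this is the sense in which the structure is learned from the data rather than designed in advance.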