
"Balance" is an eternal theme in systems research and development

Three Stages of Computer Systems and Applications
• Computers were built for "computing" (1930s–1990s)
  – CPU chips, operating systems, storage systems, compilers, high-performance computing
  – The material and physical world was transformed into a digital world for fast computation and deep analysis
  – Human society achieved unprecedented scientific breakthroughs: weather forecasting, new materials, ...
• Computers were built for "connectivity" (1990s–2010s)
  – The Internet and wireless access are the foundation of an entirely new data world:
    • 1981–2017: bandwidth grew from 50 Kbps to 100 Gbps (2 million times)
    • 1981–2017: devices per user grew from 0.1 to 10 (100 times)
    • Internet telephony, Weibo, QQ, WeChat, online shopping, online search, ...
• Computers were built for "data" and data centers (since the start of the 21st century)
  – Today's big data explosion is not a continuation of the existing digitized physical and material world
  – This new data world precisely records and tracks human behavior itself
  – 90% of all data ever created was generated in the past two years

Two Trends in IT R&D
• Domain-Specific Software Development
  – Software is designed, developed & maintained in domain areas
  – Many non-software Fortune 500 companies own domain software:
    • Amazon (retail): cloud software
    • Walmart (grocery shopping): data analytics software
    • Google (Internet search): big data system software and Android
    • Facebook (social networks): big data warehouse software (Hive), Presto
    • ExxonMobil (oil company): 3D seismic data processing software
• Software-Defined Infrastructure (hardware)
  – Defining applications' values and functions on top of hardware
  – Translating users' requirements into the best hardware performance
  – Software-defined storage, software-defined networks, datacenter-defined cloud

Major Resources in Computing and Network Systems
• Good news in supply:
  – CPU cycles: oversupplied for many applications
  – Memory bandwidth: improved dramatically
  – Memory capacity: increasingly large and at low cost
  – I/O bandwidth: improved dramatically
  – Disk capacity: huge and at very low cost
  – Cluster and Internet bandwidths: very rich
• Bad news in demand:
  – CPU cycles per Watt are decreasing (less energy efficient)
  – Cache capacity: always limited
  – Improvement of data access latencies is very slow
  – Networking and energy costs are increasingly high
• Adam Smith: a commodity's price is set by an "invisible hand" in the market. We need to balance:
  – Oversupplied cycles, large storage capacity, and fast networks
  – High demand for low-latency accesses and low energy cost

Moore's Law Driven Computing Research (IEEE Spectrum, May 2008)
[Figure: single-processor clock speed, 1971–2008, plotted against the cost of 1 MB of DRAM. Processor speed in 1971: 400 KHz; cost of 1 MB of DRAM in 2006: $0.0009. Annotated eras: 25 years of golden age of parallel computing; 10 years of dark age of parallel computing, in which the CPU–memory gap is the major concern; a new era of multicore computing, in which the memory problem continues.]

Unbalanced System Improvements: A Disk Perspective

Latencies of cache, DRAM, and disk, in CPU cycles (Bryant and O'Hallaron, "Computer Systems: A Programmer's Perspective", Prentice Hall, 2003):

  Year | SRAM access | DRAM access | Disk seek
  1980 |        0.3  |      0.375  |     87,000
  1985 |        0.9  |      1.2    |    451,807
  1990 |        0.7  |      2      |    560,000
  1995 |        2.5  |     11.66   |  1,666,666
  2000 |        1.25 |     37.5    |  5,000,000

The disks in 2000 are 57 times "SLOWER" than their ancestors in 1980, an increasingly widening speed gap. A disk access has a 4-million-times delay over a cache hit.
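The two headline ratios on this slide follow directly from the table; a minimal sketch in Python, using only the values in the table above:

```python
# Latencies from the table above, in CPU cycles.
latencies = {
    1980: {"sram": 0.3, "dram": 0.375, "disk": 87_000},
    2000: {"sram": 1.25, "dram": 37.5, "disk": 5_000_000},
}

# How much slower a disk seek became, in CPU cycles, from 1980 to 2000.
disk_slowdown = latencies[2000]["disk"] / latencies[1980]["disk"]
print(round(disk_slowdown))          # 57 -> "57 times slower"

# Penalty of one disk access relative to one cache (SRAM) hit in 2000.
disk_vs_cache = latencies[2000]["disk"] / latencies[2000]["sram"]
print(f"{disk_vs_cache:,.0f}")       # 4,000,000 -> the 4-million-times delay
```

The counterintuitive point is that disks got faster in absolute terms over these 20 years, but CPUs improved so much more quickly that, measured in CPU cycles, a disk seek became dramatically more expensive.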

1 to 100 Million Times Delay Today for Disk Accesses
• Tier 0 storage: high-end disks connected by fast switches, for transactional data
• Tier 1 storage: SATA disk arrays, for mission-critical data
• Tier 2 storage: disk arrays for seldom-used archived data
(Jeff Richardson, "Bridging the I/O Gap", The Data Center Journal, 2012)

Technology Advancements in 45 Years
• Single-core CPU reached its peak performance
  – 1971 (2,300 transistors on the Intel 4004 chip): 0.4 MHz
  – 2005 (1 billion+ transistors on the Intel Pentium D): 3.75 GHz
  – After a 10,000-times improvement, clock rates stopped rising and then dropped
  – CPU improvement is now reflected in the number of cores per chip
• Increased DRAM capacity enables large working sets
  – 1971 ($400/MB) to 2014 (0.75 cents/MB): a reduction of 53,333 times
  – In-memory computing is a reality
• SSDs (flash memory) can further reduce access latency
  – Non-volatile devices with limited write endurance (can serve as an independent disk)
  – Low power (6–8× lower than disks, 2× lower than DRAM)
  – Fast random reads (200× faster than disks, 25× slower than DRAM)
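The SSD ratios on this slide imply where flash sits in the latency hierarchy; a small sketch using only the slide's round numbers (normalized values, not measurements of any specific device):

```python
# Relative random-read latencies implied by the slide's ratios,
# with DRAM normalized to 1 (round numbers, not device measurements).
dram_latency = 1.0
ssd_latency = dram_latency * 25      # SSD: 25x slower than DRAM
disk_latency = ssd_latency * 200     # disk: 200x slower than SSD

# Implied gap between disk and DRAM: 25 * 200 = 5,000x.
print(disk_latency / dram_latency)   # 5000.0
```

Under these numbers, SSDs close most of the gap between memory and disk, which is why they can serve as a distinct tier between the two rather than a mere disk replacement.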

Data-Intensive Scalable Computing (DISC)
• Massively accessing and processing data sets at fast speed
  – An initial big data report, endorsed by industry (Intel, Google, Microsoft, Sun) and by scientists in many areas
  – Applications in science, industry, and business
• Special requirements for DISC infrastructure:
  – A Top 500 for DISC would rank systems by data throughput as well as FLOPS
  – Frequent interactions between parallel CPUs and distributed storage; scalability is challenging
  – DISC is not an extension of supercomputing (SC), but demands new technology advancements

Systems Comparison (courtesy of Bryant)
• Conventional computer systems:
  – Disk data stored separately; no support for collection or management
  – Data brought in for computation: time consuming, limits interactivity
• DISC systems:
  – The system collects and maintains data as a shared, active data set
  – Computation is co-located with disks: faster access
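The "computation co-located with disks" contrast can be sketched as moving a map step to where each data partition lives, instead of shipping all raw data to one node. The node names and partition contents below are made-up illustration, not from the slide:

```python
# Hypothetical cluster: each node stores one partition of the data set.
partitions = {
    "node-1": [3, 1, 4],
    "node-2": [1, 5, 9],
    "node-3": [2, 6, 5],
}

def local_compute(records):
    """Runs on the node holding the partition, so only a small partial
    result crosses the network instead of the raw records."""
    return sum(records)

# Conventional model: ship all raw data to one place, then compute.
central_total = sum(r for part in partitions.values() for r in part)

# DISC model: compute next to each disk, then combine partial results.
disc_total = sum(local_compute(part) for part in partitions.values())

assert central_total == disc_total == 36  # same answer, far less data moved
```

Both models produce the same result; the difference is how many bytes cross the network, which is exactly the resource the latency slides above show to be the scarcest.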