
Advanced Computer Architecture Design and Its Applications in Data Centers and Cloud Computing
Lecture 9
Control Flow Transfer in Java Processing: Characterization and Architectural Implications

Control Flow Transfer in Modern Processors
• Problem: control flow transfers (branches) cause pipeline stalls
• Solution: branch prediction and speculative execution
• Prediction accuracy is therefore very important
• To improve prediction accuracy for emerging workloads, characterization is the first step
  – Java was selected due to its popularity on platforms ranging from embedded devices to servers
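Why accuracy matters can be made concrete with a common back-of-the-envelope model in which every mispredicted branch costs a fixed pipeline-flush penalty. The sketch below assumes that simple additive model; the base CPI, branch fraction, misprediction rates and 15-cycle penalty are hypothetical values, not measurements from this lecture.

```python
def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, penalty_cycles: float) -> float:
    """CPI including branch stalls under a simple additive model (an assumption):
    each mispredicted branch adds a fixed number of flush cycles."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Hypothetical workload: base CPI 1.0, 20% of instructions are branches,
# 15-cycle misprediction penalty.
for rate in (0.10, 0.05, 0.02):
    cpi = effective_cpi(1.0, 0.20, rate, 15)
    print(f"misprediction rate {rate:.0%}: CPI = {cpi:.2f}")
```

In this hypothetical, halving the misprediction rate from 10% to 5% halves the branch stall component (0.30 to 0.15 cycles per instruction), which is why small accuracy gains matter on deep pipelines.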

Java Processing
[Figure: the Java processing stack, top to bottom]
• Java programs (bytecode)
• Class loading & verification
• Class libraries (awt, net, I/O, RMI, ...)
• JVM: interpreter, JIT compiler, garbage collection, thread management
• Native code and the Java Native Interface
• Operating system
• Hardware platform

Major Features of Java
• Major run-time features of Java arise from either
  – the Java runtime environment
    • Java Virtual Machine (JVM)
      – Interpreter
      – Just-In-Time (JIT) compiler
    • Garbage collection, synchronization, class loading, etc.
  – Java language features (object orientedness, classes, virtual methods, etc.)
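Virtual methods are one reason Java code executes many indirect branches: the target of a virtual call depends on the receiver's run-time class. A common implementation, assumed here rather than taken from the lecture, is a per-class method table (a "vtable"): the call site loads the method pointer from the receiver's table and calls through it. The Python sketch below only models that lookup; the classes, VTABLES and AREA_SLOT are hypothetical.

```python
from typing import Any, Callable, Dict, List

Obj = Dict[str, Any]  # hypothetical object: its class name plus its fields


def square_area(obj: Obj) -> int:
    return obj["side"] * obj["side"]


def rect_area(obj: Obj) -> int:
    return obj["w"] * obj["h"]


# Hypothetical per-class method tables; slot 0 holds each class's 'area' method.
VTABLES: Dict[str, List[Callable[[Obj], int]]] = {
    "Square": [square_area],
    "Rect": [rect_area],
}
AREA_SLOT = 0


def invoke_virtual(obj: Obj, slot: int, trace: List[str]) -> int:
    """Model of one virtual call site: table lookup, then an indirect call."""
    target = VTABLES[obj["cls"]][slot]  # load method pointer from the receiver's vtable
    trace.append(target.__name__)       # record the branch target taken at this site
    return target(obj)                  # indirect call: target known only at run time


shapes = [{"cls": "Square", "side": 2}, {"cls": "Rect", "w": 2, "h": 3},
          {"cls": "Square", "side": 3}]
targets: List[str] = []
areas = [invoke_virtual(s, AREA_SLOT, targets) for s in shapes]
print(areas)    # [4, 6, 9]
print(targets)  # ['square_area', 'rect_area', 'square_area']
```

The single call site jumps to two different targets, so a predictor that remembers only one target per branch mispredicts every time the receiver's class changes.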

OS Activity in Java Processing
• SPECjvm98: 4% to 17% [ICS'00]
• Java servers: 65.3% [Luo et al., ISPASS'01]
• SPECInt95: <2% [ICS'00]

OS Activity in SPECjvm98 Benchmarks
Benchmark   % of cycles in OS
compress     4.3
jess        14.9
db          12.6
javac       14.9
mtrt         8.6
jack        16.9

Indirect Branch Frequency in Java Processing
• Java: uniformly high (due to OO features + JVM)
• C: relatively high only in gcc, perl and li (compilation and interpretation workloads)

% of Indirect Branches

Java (SPECjvm98)   Interpretation   JIT
db                 3.0              2.5
jess               3.3              2.4
javac              2.6              1.9
jack               2.5              2.1
mtrt               2.7              2.0
compress           4.3              1.6

C (SPECInt95)      %
go                 0.7
compress           0.4
m88ksim            0.8
gcc                1.1
ijpeg              0.2
li                 2.0
perl               2.2
vortex             0.9
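Interpretation shows a higher indirect-branch frequency than JIT-compiled code for every SPECjvm98 benchmark in this table. One reason is that a switch- or table-driven interpreter dispatches every bytecode through a single indirect jump whose target changes with each opcode. A minimal sketch, modeling only the target stream of that dispatch jump and feeding it to a hypothetical last-target predictor (one BTB entry that predicts the previously seen target); the bytecode loop and handler names are illustrative assumptions.

```python
from typing import List, Optional

# Hypothetical handler table: opcode -> handler label (stands in for a code address).
HANDLERS = {"iload": "h_iload", "iadd": "h_iadd", "istore": "h_istore", "goto": "h_goto"}


def dispatch_targets(bytecodes: List[str]) -> List[str]:
    """Targets taken by the interpreter's single dispatch jump, one per bytecode."""
    return [HANDLERS[op] for op in bytecodes]


def last_target_misses(targets: List[str]) -> int:
    """Count mispredictions of a BTB entry that always predicts the previous target."""
    predicted: Optional[str] = None  # empty entry before the first execution
    misses = 0
    for actual in targets:
        if actual != predicted:
            misses += 1
        predicted = actual  # remember the most recent target
    return misses


# A hypothetical loop body compiled to stack bytecodes, executed 10 times.
loop_body = ["iload", "iload", "iadd", "istore", "goto"]
targets = dispatch_targets(loop_body * 10)
print(last_target_misses(targets), "of", len(targets))  # 40 of 50
# A monomorphic branch (always the same target) mispredicts only on first use.
print(last_target_misses(["h_iadd"] * 50), "of", 50)    # 1 of 50
```

Only the repeated `iload` pair is predicted correctly, so the shared dispatch jump mispredicts 80% of the time in this toy loop, while a branch with a single target is almost free.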

Problem Statement
• The impact of OS activity on the branch behavior of Java is not well understood.
• Indirect branch behavior in Java is not well understood.
• Do the characteristics of OS branches or indirect branches motivate architectural enhancements in processors? If so, what modifications?

Characterization of OS Branch Behavior
– Kernel invocations are short-lived, and the kernel executes fewer branches per context than user code.
[Figure: Executed Branches in User and Kernel Contexts (5,000 sampling contexts on SPECjvm98 benchmark jack). Panels: jack (user), jack (kernel); x-axis: user / kernel context serial no.; y-axis: number of executed branches (log scale, 1 to 1000)]

[Figure: Average Number of Executed Branches per Context in User and Kernel Modes (series: kernel, user)]
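Per-context averages like those in this figure can be computed from a branch trace tagged with the execution mode. A minimal sketch, assuming a trace in which each executed branch is labeled 'user' or 'kernel' and a context is a maximal run of branches in one mode; the trace is hypothetical and is not the jack data.

```python
from itertools import groupby
from typing import Dict, List


def avg_branches_per_context(branch_modes: List[str]) -> Dict[str, float]:
    """Average executed branches per context, per mode.

    branch_modes[n] is the mode ('user' or 'kernel') of the n-th executed branch;
    a context is assumed to be a maximal run of consecutive branches in one mode.
    """
    branches: Dict[str, int] = {}
    contexts: Dict[str, int] = {}
    for mode, run in groupby(branch_modes):
        branches[mode] = branches.get(mode, 0) + sum(1 for _ in run)
        contexts[mode] = contexts.get(mode, 0) + 1
    return {mode: branches[mode] / contexts[mode] for mode in branches}


# Hypothetical trace: two long user contexts interrupted by two short kernel invocations.
trace = ["user"] * 300 + ["kernel"] * 20 + ["user"] * 500 + ["kernel"] * 40
print(avg_branches_per_context(trace))  # {'user': 400.0, 'kernel': 30.0}
```

Because only the mode is recorded, two back-to-back user contexts (for example, across a thread switch) would merge into one run; a real trace would also need explicit context-switch markers.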

The Predictability of OS Branches
[Figure: predictor structure. BHSRs are selected by branch-address bits; j branch-address bits and k BHSR bits index a Branch History Table (BHT) of 2^(j+k) entries, which produces the prediction]

Scheme (size, i = 1..9)   PC bits for BHSR selection   PC bits for BHT index (j)   BHSR bits for BHT index (k)   Total size (# of BHT entries)
2bc.2^i K                 0                            i+10                        0                             2^i K
GAg.2^i K                 0                            0                           i+10                          2^i K
GAs.2^i K                 0                            i+6                         4                             2^i K
Gshare.2^i K              0                            0                           i+10                          2^i K
SAs.2^i K                 i+8                          i+5                         4                             2^i K

BHT: Branch History Table; BHSR: Branch History Shift Register; K = 1024

[Figure: misprediction rate (%) of kernel branches for 2bc, GAs, SAs, GAg and Gshare at 2K, 4K, 8K, 16K and 32K entries]
Branch Predictor Configurations and Kernel Branch Predictability [ISPASS'01]
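The index construction in the scheme table can be expressed compactly in code. A minimal sketch, assuming 2-bit saturating counters initialized to weakly not-taken, PC values whose alignment bits are already removed, and gshare in its usual form (k history bits XORed with k branch-address bits, consistent with its PC-bit column being 0). The names `TwoLevelPredictor`, `make_predictor` and `misprediction_rate` are hypothetical, and the loop trace is illustrative, not the kernel trace behind the figure.

```python
from typing import List, Tuple


class TwoLevelPredictor:
    """BHT of 2^(j+k) two-bit counters (2^k when xor=True), indexed by j PC bits
    and k history bits taken from one of 2^s BHSRs chosen by s PC bits."""

    def __init__(self, j: int, k: int, s: int = 0, xor: bool = False) -> None:
        self.j, self.k, self.s, self.xor = j, k, s, xor
        size = 1 << (k if xor else j + k)
        self.bht: List[int] = [1] * size       # 2-bit counters, start weakly not-taken
        self.bhsr: List[int] = [0] * (1 << s)  # branch history shift registers

    def _select(self, pc: int) -> int:
        return pc & ((1 << self.s) - 1)        # s = 0 -> one global BHSR

    def _index(self, pc: int) -> int:
        hist = self.bhsr[self._select(pc)]     # already masked to k bits
        if self.xor:                           # gshare: PC XOR history
            return (pc ^ hist) & ((1 << self.k) - 1)
        return ((pc & ((1 << self.j) - 1)) << self.k) | hist  # concatenate PC, history

    def predict(self, pc: int) -> bool:
        return self.bht[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        idx = self._index(pc)
        self.bht[idx] = min(3, self.bht[idx] + 1) if taken else max(0, self.bht[idx] - 1)
        sel = self._select(pc)
        self.bhsr[sel] = ((self.bhsr[sel] << 1) | int(taken)) & ((1 << self.k) - 1)


def make_predictor(scheme: str, i: int) -> TwoLevelPredictor:
    """Bit widths from the scheme table (size parameter i = 1..9)."""
    widths = {
        "2bc": dict(j=i + 10, k=0),
        "GAg": dict(j=0, k=i + 10),
        "GAs": dict(j=i + 6, k=4),
        "Gshare": dict(j=0, k=i + 10, xor=True),
        "SAs": dict(j=i + 5, k=4, s=i + 8),
    }
    return TwoLevelPredictor(**widths[scheme])


def misprediction_rate(pred: TwoLevelPredictor, trace: List[Tuple[int, bool]]) -> float:
    """Percentage of branches mispredicted; the predictor trains as it runs."""
    misses = 0
    for pc, taken in trace:
        if pred.predict(pc) != taken:
            misses += 1
        pred.update(pc, taken)
    return 100.0 * misses / len(trace)


# Hypothetical trace: one loop branch at PC 0x40 with pattern T, T, T, N, repeated 100 times.
trace = [(0x40, n % 4 != 3) for n in range(400)]
for scheme in ("2bc", "GAg", "Gshare", "GAs", "SAs"):
    print(scheme, f"{misprediction_rate(make_predictor(scheme, 1), trace):.2f}%")
```

On this toy loop the history-free 2bc predictor mispredicts every loop exit (25.25% including warm-up), while the history-based schemes learn the period-4 pattern after a few misses. Real kernel branches, with their short contexts, give predictors far less time to warm up than this trace does.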