954 IEEE TRANSACTIONS ON COMPUTERS, S_中国高校课件下载中心

点击下载：《GPU并行编程 GPU Parallel Programming》课程教学资源（参考文献）Some Computer Organizations and Their Effectiveness

正在加载图片...

954 IEEE TRANSACTIONS ON COMPUTERS,SEPTEMBER 1972 SIMD and Its Effectiveness Control There are three basic types of SIMD processors, Unit that is,processors characterized by a master instruction ocessin applied over a vector of related operands.These in- clude (Fig.5)the following types. Memory 1)The Array Processor:One control unit and m di- rectly connected processing elements.Each processing element is independent,i.e.,has its own registers and N Universol Execution Resources storage,but only operates on command from the con- (a) trol unit. 2)The Pipelined Processor:A time-multiplexed ver- ADD AU儿TIPLY TC sion of the array processor,that is,a number of func- Unit tional execution units,each tailored to a particular func- Memory staged tion.The units are arranged in a production line fashion, dedicated resources staged to accept a pair of operands every At time units. The control unit issues a vector operation to memory. Memory is arranged so that it is suitable to a high-speed stage delo data transfer and produces the source operands which are entered a pair every At time units into the desig- (b) nated function.The result stream returns to memory. 3)The Associative Processor:This is a variation of the Control array processor.Processing elements are not directly Unit addressed.The processing elements of the associative processor are activated when a generalized match rela- tion is satisfied between an input register and charac- teristic data contained in each of the processing ele- ments.For those designated elements the control unit instruction is carried out.The other units remain idle. A number of difficulties can be anticipated for the SIMD organization.These would include the following problems. 1)Communications between processing elements. 2)Vector Fitting:This is the matching of size between Execute if match Is satisfled the logical vector to be performed and the size of the (c) physical array which will process the vector. Fig.5.SIMD processors.(a)Array processor.(b)Pipelined pro- 3)Sequential Code (Nonvector):This includes house- cessor.(c)Associative processor. keeping and bookkeeping operations associated with the preparation of a vector instruction.This corresponds to widely studied [17]-[20].Results to date,however, the Amdahl effect.Degradation due to this effect can be indicate that it is not as significant a problem as was masked out by overlapping the sequential instructions earlier anticipated.Neuhauser [17],in an analysis of with the execution of vector type instructions. several classical SIMD programs,noted that communi- 4)Degradation Due to Branching:When a branch cations time for an array-type organization rarely ex- point occurs,several of the executing elements will be in ceeded 40 percent of total job time and for the matrix one state,and the remainder will be in another.The inversion case was about 15 percent. master controller can essentially control only one of the The fitting problem is illustrated in Fig.6.Given a two states;thus the other goes idle. source vector of size m,performance is effected in an 5)Empirically,Minsky and Papert [29]have ob-array processor when the M physical processing ele- served that the SIMD organization has performance ments do not divide m [21].However,so long as m is portional to the logz m(m,the number of data streams substantially larger than M,this effect will not con- per instruction stream)rather than linear.If this is tribute significant performance degradation.The pipe- generally true,it is undoubtedly due to all of the pre-line processor exhibits similar behavior,as will be dis- ceding effects (and perhaps others).We will demon- cussed later. strate an interpretation of it based upon branching The Amdahl effect is caused by a lack of "parallelism" degradation. in the source program;this can be troublesome in any Communication in SIMD organizations has been multistream organization.Several SIMD organizations954 IEEE TRANSACTIONS ON COMPUTERS, SEPTEMBER 1972 SIMD and Its Effectiveness Control * * E There are three basic types of SIMX\D processors, Communicatlon that is, processors characterized by a master instruction npcissn applied over a vector of related operands. These in- Element P E PE clude (Fig. 5) the following types. 1) The Array Processor: One control unit and m directly connected processing elements. Each processing -N Universal Enecution Resources element is independent, i.e., has its own registers and storage, but only operates on command from the con- (a) trol unit. 2) The Pipelined Processor: A time-multiplexed ver- Control fADD MULTIPLY ETC, sion of the array processor, that is, a number of func- Unit Memory tional execution units, each tailored to a particular func- sted tion. The units are arranged in a production line fashion, resources staged to accept a pair of operands every LAt time units. _ The control unit issues a vector operation to memory. Memory is arranged so that it is suitable to a hiigh-speed execution tiYe, data transfer and produces the source operands which are entered a pair every A\t time units into the desig- (b) nated function. The result stream returns to memory. *3) The Associative Processor: This is a variation of the Control array processor. Processing elements are not directly Unit addressed. The processing elements of the associative Inquiry Register processor are activated when a generalized match relation is satisfied between an input register and clharacteristic data contained in each of the processing elements. For those designated elements the control unit instruction is carried out. The other units remain idle. (sreglstive A number of difficulties can be anticipated for the SI1\ID organization. These would include the following problems. 1) Communications betwreen processing elements. 2) Vector Fitting: This is tlhe matching of size between Execute If match s satisfied the logical vector to be performed and the size of the (c) physical array which will process the vector. Fig. 5. SIMD processors. (a) Array processor. (b) Pipeliined pro- 3) Sequential Code (Nonvector): This includes house- cessor. (c) Associative processor. keeping and bookkeeping operations associated with the preparation of a vector instruction. This corresponds to widely studied [17]-[20]. Results to date, however, the Amdalhl effect. Degradation due to this effect can be indicate tllat it is not as significant a problem as wTas masked out by overlapping the sequential instructions earlier anticipated. Neuhauser [17], in an analysis of with the execution of vector type instructions. several classical SII\ID programs, noted that conmmuni- 4) Degradation Due to Branching: When a branch cations timle for an array-type organization rarely expoint occurs, several of the executing elements will be in ceeded 40 percent of total job time and for the matrix one state, and the remiiainder will be in another. The inversion case was about 15 percent. master controller can essentially control only onie of the The fitting problem is illustrated in Fig. 6. Given a two states; thus the otlher goes idle. source vector of size m, performance is effected in an 5) Empirically, M\linsky and Papert [29] lhave ob- array processor w-vhen the M.1 phlysical processing eleserved that the SIMID organization has performiiance ments do not divide m [21]. However, so long as m. is portional to thlelog2m (in,thlenumlber of data streamns sub)stantially larger thlan M1, this effect wvill not conper instruction streaml) rather than linear. If thlis is tribute significant performance degradation. Thlepipe- generally true, it is undoubtedly due to all of thle pre- line processor exhlibits simlilar behlavior, as will be disceding effects (and perhlaps othlers). We will demlon- cussed later. strate an interpretation of it based upon branching Thle Amndahl effect is caused by a lack of "parallelism " degradation. in the source programl; this can be troublesomle in any Communication in SI i\ D organizations hlas been multistreaml organization. Several SI i\ D organizationls

<<向上翻页向下翻页>>

点击下载：《GPU并行编程 GPU Parallel Programming》课程教学资源（参考文献）Some Computer Organizations and Their Effectiveness