FLYNN: COMPUTER ORGANIZATIONS 951

Download: "GPU Parallel Programming" course teaching resources (references): Some Computer Organizations and Their Effectiveness


munication; and 2) the possibilities for high computational bandwidth within a stream.

Interstream Communications

There are two aspects of communications: operational resource accessing of a storage item ($O \times S$) and storage to storage transfer ($S \times S$). Both aspects can be represented by communications matrices, each of whose entries $t_{ij}$ is the time to transfer a datum in the $j$th storage resource to the $i$th resource (operational or storage). The operational communication matrix is quite useful for MIMD organizations, while the storage communications matrix is usually more interesting for SIMD organizations. An ($O \times O$) matrix can also be defined for describing MISD organizations.

An alternate form of the communications matrix, called the connection matrix, can also be developed for the square matrix cases. This avoids possibly large or infinite entries possible in the communications matrix (when interstream communications fail to exist). The reciprocal of the normalized access time $t_{ii}/t_{ij}$ (assuming $t_{ii}$ is the minimum entry for a row) is entered for the access time of an element of the $i$th data storage resource by the $j$th operational or storage resource $d_{ij}$. The minimum access time (resolution) is 1. If a particular item were inaccessible, there would be a zero entry. Notice that in comparing parallel organizations to the serial organization, the latter has immediate access to corresponding data. While it appears that under certain conditions an element expression can be zero due to lack of communication between resources, in practice this does not occur since data can be transferred from one stream to another in finite time, however slow. Usually such transfers occur in a common storage hierarchy.

Stream Inertia

It is well known that the action of a single-instruction stream may be telescoped for maximum performance by overlapping the various constituents of the execution of an individual instruction [4]. Such overlapping usually does not exceed the issuing of one instruction per instruction decode resolution time $\Delta t$. This avoids the possibly exponentially increasing number of decision elements required in such a decoder [1], [5]. A recent study [13] provides an analysis of the multiple-instruction issuing problem in a single-overlapped instruction stream. In any event, a certain number of instructions in a single-instruction stream are being processed during the latency time for one instruction execution. This number may be referred to as the confluence factor or inertia factor $J$ of the processor per individual instruction stream. Thus the maximum performance per instruction stream can be enhanced by a factor $J$. If the average instruction execution time is $L \cdot \Delta t$ time units, the maximum performance per stream would be

$$\text{perf.}_{\max} = \frac{J}{L \cdot \Delta t}$$

This is illustrated in Fig. 2. Successive instructions are offset in this example by $\Delta t$ time units.

Fig. 2. Stream inertia.

System Classification

Then to summarize, a technology independent macroscopic specification of a large computing system would include: 1) the number of instruction streams and the number of data streams (the "instruction" and "data" unit should be taken with respect to a convenient reference); 2) the appropriate communications (or connection) matrices; and 3) the stream inertia factor $J$ and the number of time units of instruction execution latency $L$.

EFFECTIVENESS IN PERFORMING THE COMPUTING PROCESS

Resolution of Entropy

Measures of the effectiveness are necessarily problem based. Therefore, comparisons between parallel and simplex organizations frequently are misleading since such comparisons can be based on different problem environments. The historic view of parallelism in problems is probably represented best by Amdahl [6] and is shown in Fig. 3. This viewpoint is developed by the observation that certain operations in a problem environment must be done in an absolutely sequential basis. These operations include, for example, the ordinary housekeeping operations in a program. In order to achieve any effectiveness at all, from this point of view, parallel organizations processing $N$ streams must have substantially less than $1/N \times 100$ percent of absolutely sequential instruction segments. One can then proceed to show that typically for large $N$ this does not exist in conventional programs. A major difficulty with this analysis lies in the concept of "conventional programs" since this implies that what exists today in the way of programming procedures and algorithms must also exist in the future. Another difficulty is that it ignores the possibility of overlapping some of this sequential processing with the execution of "parallel" tasks.

To review this problem from a general perspective, consider a problem in which $N_1$ words each of $p$ bits
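The connection matrix described under Interstream Communications can be sketched in a few lines. This is a minimal illustration, not code from the paper: the function name `connection_matrix` and the sample transfer times are hypothetical, and the row minimum is used as the normalizing entry, which matches the text's assumption that the diagonal entry is the smallest in its row.

```python
import math

def connection_matrix(t):
    """Convert a communications matrix of transfer times into a connection matrix.

    Each entry becomes (row minimum) / t[i][j], so the fastest access in a
    row maps to 1 and an inaccessible item (infinite time) maps to 0.
    Assumes all finite transfer times are strictly positive.
    """
    d = []
    for row in t:
        finite = [x for x in row if math.isfinite(x)]
        t_min = min(finite) if finite else math.inf
        d.append([0.0 if math.isinf(x) else t_min / x for x in row])
    return d

# Hypothetical 3x3 transfer times in arbitrary time units; inf means no path.
t = [
    [1.0, 4.0, math.inf],
    [2.0, 1.0, 8.0],
    [math.inf, 5.0, 1.0],
]
d = connection_matrix(t)
```

Every entry of `d` lies between 0 and 1, which is the property the text points to: the large or infinite entries of the communications matrix are avoided.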
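The stream inertia bound, maximum performance per stream equal to J divided by L times the decode resolution time, can be checked numerically. The sketch below uses a hypothetical function name and made-up values; the only relation taken from the text is the formula itself.

```python
def max_performance(j, latency_units, delta_t):
    """Peak instructions per time unit for one stream: J / (L * delta_t)."""
    return j / (latency_units * delta_t)

# Hypothetical values: latency of L = 4 decode intervals, delta_t = 2 time units.
overlapped = max_performance(4, 4, 2.0)   # J = 4 instructions in flight
serial = max_performance(1, 4, 2.0)       # J = 1, no overlap
gain = overlapped / serial
```

With these numbers the overlapped stream completes one instruction every `delta_t`, which appears to be the case Fig. 2 depicts: when one instruction issues per decode interval, J equals L and the bound reduces to 1 / `delta_t`.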
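The claim that N streams need substantially less than 1/N × 100 percent of sequential work can be illustrated with the usual textbook statement of Amdahl's argument, speedup = 1 / (s + (1 − s)/N). That formula is an assumption brought in here for illustration; it does not appear on this page, and the function name is hypothetical.

```python
def amdahl_speedup(sequential_fraction, n_streams):
    """Textbook Amdahl speedup for a given sequential fraction and stream count."""
    s = sequential_fraction
    return 1.0 / (s + (1.0 - s) / n_streams)

n = 64
at_threshold = amdahl_speedup(1.0 / n, n)   # sequential fraction exactly 1/N
efficiency = at_threshold / n               # fraction of the ideal N-fold gain
```

At a sequential fraction of exactly 1/N the expression simplifies to N / (2 − 1/N), so roughly half of the ideal speedup is already lost. This is why the sequential share has to be well below 1/N before N streams are used effectively.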

