958 IEEE TRANSACTIONS ON COMPUTERS, S_中国高校课件下载中心

点击下载：《GPU并行编程 GPU Parallel Programming》课程教学资源（参考文献）Some Computer Organizations and Their Effectiveness

正在加载图片...

958 IEEE TRANSACTIONS ON COMPUTERS,SEPTEMBER 1972 consider the use of multiple-skeleton processors sharing the execution resources.A "skeleton"processor consists Data HInstruction Counter Buffer of only the basic registers of a system and a minimum of 人 Instruction Register control (enough to execute a load or store type instruc- tion).From a program point of view,however,the accumulator skeleton processor is a complete and independent logical Index Registers computer. instruction These processors may share the resources in space Buffer or Cache [22],or time [23]-[25].If we completely rely on space sharing,we have a "cross-bar"type switch of processors (a) -each trying to access one of n resources.This is usu- Time divlslons ally unsatisfactory since the cross-bar switching time overhead can be formidable.On the other hand,time- Skeleton processor phased sharing (or time-multiplexing)of the resources can be attractive in that no switching overhead is in- volved and control is quite simple if the number of Pipelined multiplexed processors are suitably related to each of Execution the pipelining factors.The limitation here is that again Units only one of the m available resources is used at any one moment. A possible optimal arrangement is a combination of space-time switching (Fig.9).The time factor is the Rings number of skeleton processors multiplexed on a time- up to K simultaneous phase ring,while the space factor is the number of requests multiplexed processor "rings"K which simultaneously request resources.Note that K processors will contend for the resources and,up to K-1,may be denied service at that moment.Thus a rotating priority among the rings is suggested to guarantee a minimum performance. The partitioning of the resources should be determined by the expected request statistics. (b) When the amount of“parallelism”(or number of Fig.9.(a)Skeleton processor.(b)Time-multiplexed multiprocessing. identifable tasks)is less than the available processors, we are faced with the problem of accelerating these tasks.This can be accomplished by designing certain of the processors in each ring with additional staging and Ring I interlock [13](the ability to issue multiple instructions 16 Processors: simultaneously)facilities.The processor could issue /ie multiple-instruction execution requests in a single-ring time slots) revolution.For example,in a ring N=16,8 processors could issue 2 request/revolutions,or 4 processors could issue 4 request/revolutions;or 2 processors could issue 8 request/revolutions;or 1 processor could issue 16 re- quest/revolutions.This partition is illustrated in Fig.10. Ring 2 Resources Of course mixed strategies are possible.For a more de- only 4 processors {} tailed discussion the reader is referred to [25],[26], active and[31]. (I instruction resourceT access per 4 time slots) SOME CONSIDERATIONS ON SYSTEMS RESOURCES Fig.10.Subcommutation of processors on ring. The gross resources of a system consist of execution, instruction control,primary storage,and secondary implied in the instruction upon the specified operands. storage. The execution bandwidth of a system is usually referred to as being the maximum number of operations that can Execution Resources be performed per unit time by the execution area.No- tice that due to bottlenecks in issuing instructions,for The execution resources of a large system include the example,the execution bandwidth is usually substan- decision elements which actually perform the operations tially in excess of the maximum performance of a sys-958 IEEE TRANSACTIONS ON COMPUTERS, SEPTEMBER 1972 consider the use of multiple-skeleton processors sharing the execution resources. A "skeleton" processor consists B-f Data Instruction Counter of only the basic registers of a system and a minimum of Br H--Instruction Register control (enough to execute a load or store type instruc- accumulator tion). From a program point of view, however, the skeleton processor is' a complete and independent logical D Index Registers computer. Instruction computer.~~~~~~~~~~~~~~~~~~~~~~~~Bffro These processors may share the resources in space Cache [22], or time [23]-[25]. If we completely rely on space sharing, we have a "cross-bar" type switch of processors (a) -each trying to access one of n resources. This is usu- Time divisions ally unsatisfactory since the cross-bar switching time Shored overhead can be formidable. On the otlher hand, tinme- Memory N Processors Skeleton processor phased sharing (or time-multiplexing) of the resources r Ring can be attractive in that no switching overhead is in- \ volved and control is quite simple if the number of H Piplined nmultiplexed processors are suitably related to eaclh of pel R Execution the pipelining factors. TFhe limitation here is that again iq Units only one of the m available resources is used at any one fector moment. o A possible optimal arrangement is a combination Of K space-time switching (Fig. 9). Ihe timiie factor is the Rings number of skeleton processors multiplexed on a time- K simulptneous J phase ring, wlhile the space factor is the number of requests r multiplexed processor "rings" K which simultaneously q I Rm request resources. Note that K processors will contend for the resources and, up to K - 1, mnay be denied service at that moment. Thus a rotating priority aimong the ) rings is suggested to guarantee a mimniiiium performance. The partitioning of the resources should be determined by the expected request statistics. (b) When the amount of "parallelism" (or number of Fig.9. (a)Skeletoniprocessor. (b)Time-multiplexedmultiprocessing. identifiable tasks) is less than the available processors, we are faced with the problem of accelerating these tasks. This can be accomplislhed by designing certain of the processors in each ring with additional staging and 16Ring IP 5 interlock [13] (tlhe ability to issue multiple instructions 6 Procsor -1 (I instructioni- simultaneously) facilities. The processor could issue resource access/162 I multiple-instruction execution requests in a single-ring time slots) 11 revolution. For example, in a ring N= 16, 8 processors could issue 2 request/revolutions, or 4 processors could issue 4 request/revolutions; or 2 processors could issue 8 request/revolutions; or 1 processor could issue 16 re- 7- Roc quest/revolutions. This partition is illustrated in Fig. 10. 2Resources Ring Of course nmixed strategies are possible. For a more de- oniy4 pocessors R tailed discussion the reader is referred to [25], [26], active, {Rl and [31]. (i instruction resource / access per 4 time slots)/ _ _ SOME CONSIDERATIONS ON SYSTEMS RESOURCES Fig. 10. Subcommutation of processors on ring. The gross resources of a systeml consist of execution, instruction control, primary storage, and secondary implied in the instruction upon the specified operands. storage. The execution bandwidth of a system is usually referred to as being the maximum number of operations that can ExecutionResources ~~~~be performed per unit time by the execution area. No- - ~~~tice that due to bottlenecks in issuing instructions, for The execution resources of a large system include the example, the execution bandwvidth is usually substandecision elements whlichl actually perform the operations tially in excess of the maximum performance of a sys-

<<向上翻页向下翻页>>

点击下载：《GPU并行编程 GPU Parallel Programming》课程教学资源（参考文献）Some Computer Organizations and Their Effectiveness