The Ten Commandments of Excellent Design

Download from: http://www.fpga.com.cn

Peter Chambers
Engineering Fellow
VLSI Technology

© 1997, VLSI Technology

This report will give you some pointers that will help you design synchronous circuits that work first time. Ten commandments that should always be followed!

Using Synchronous Circuits

Synchronous digital systems are pervasive in today’s designs. Engineers create clocked circuits for every conceivable application, with frequencies from DC to GHz. Every synchronous system shares certain common characteristics, and is prone to a group of common faults. These faults can cause instability and unreliability, and may not be uncovered in the typical design process. The net result is a poor product that fails to meet the design criteria, and the engineer has to go through the suffering of design modification and revision. This is time-consuming and costly.

However, by applying a few simple rules, you can avoid synchronous design faults in your designs and achieve consistent first-pass success. In this article you’ll learn the sources of the most common problems and their solutions, and how to apply these ideas to your designs.
Digital Systems 101

We’ll begin by describing a typical synchronous circuit. Many variations are possible, but a simple example will be adequate to illustrate the sources of error. Figure 1 shows the circuit and timing for one clocked element of the example.

One issue that deserves mention is this: Why use synchronous logic at all? Wouldn’t asynchronous logic be faster? The answers to these questions could take a book, but here are some reasons to use synchronous designs:

• Synchronous designs eliminate the problems associated with speed variations through different paths of logic. By sampling signals at well-defined time intervals, fast paths and slow paths can be handled in a simple manner.
• Synchronous designs work well under variations of temperature, voltage and process. This stability is key for high-volume manufacturing.
• Many designs must be portable; that is, they must be easy to migrate to a new and improved technology (say, moving from 0.6 micron to 0.35 micron). The deterministic behavior of synchronous designs makes them much more straightforward to move to a new technology.
• Interfacing between two blocks of logic is simplified by defining standardized synchronous behavior. Asynchronous interfaces demand elaborate handshaking or token passing to ensure integrity of information; synchronous designs with known timing characteristics can guarantee correct reception of data.

Heck, I Know What a Flip-Flop Is!

Synchronous circuits are made with a mixture of combinatorial logic and clocked elements, such as flip-flops or registers. The clocked elements share a common clock, and all transition from one state to another on the rising edge of
the clock. When the rising edge occurs, the registers propagate the logic levels at their D inputs to their Q outputs.

FIGURE 1. Simple Example of a Synchronous Circuit
[Figure labels: Inputs, Combinatorial Logic, D, Clock, Q, Output; timing diagram: Clock, D, Q, Tsu, Th]

In Figure 1, two important timing parameters are defined:

• Setup Time (Tsu): Setup time is the time that the D input to a register must be valid before the clock transitions.
• Hold Time (Th): Hold time is the period that the D input to a register must be maintained valid after the clock has transitioned.

If the setup or hold time parameters are violated, terrible things happen. We’ll discuss this later in the section on synchronization.
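The setup and hold definitions above amount to a forbidden window around each clock edge. Here is a minimal Python sketch of that check. The function names and the nanosecond figures in the usage note are illustrative assumptions, not values from this article or from any datasheet, and the strict inequalities (a change exactly on the window boundary is treated as legal) are a modeling choice.

```python
def violates_setup_hold(data_change_ns, clock_edge_ns, t_su_ns, t_h_ns):
    """Return True if D changes inside the forbidden window around a clock edge.

    The window opens t_su_ns before the edge and closes t_h_ns after it.
    """
    window_start = clock_edge_ns - t_su_ns
    window_end = clock_edge_ns + t_h_ns
    return window_start < data_change_ns < window_end


def find_violations(data_changes_ns, clock_period_ns, t_su_ns, t_h_ns, n_edges):
    """List (data_change, clock_edge) pairs that violate setup or hold.

    Rising edges are assumed to occur at 1, 2, ..., n_edges clock periods.
    """
    edges = [k * clock_period_ns for k in range(1, n_edges + 1)]
    return [
        (change, edge)
        for change in data_changes_ns
        for edge in edges
        if violates_setup_hold(change, edge, t_su_ns, t_h_ns)
    ]
```

With an assumed 10 ns clock, Tsu = 2 ns and Th = 1 ns, a call such as find_violations([3.0, 9.0, 20.5, 25.0], 10.0, 2.0, 1.0, 3) flags the changes at 9.0 ns (too close before the 10 ns edge) and 20.5 ns (too soon after the 20 ns edge), and passes the other two.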
Clock Distribution (Yawn)

The distribution of clocks throughout a design has received considerable attention with the increase in logic speed. Common-or-garden personal computers have bus speeds of 66 MHz, and processor clocks run at 300 MHz or greater. In this article we’re concerned more with the possible pitfalls in the synchronous logic itself, not with the production of decent clocks. However, for completeness, here are the important parameters necessary for a good clock distribution system design:

• Skew Minimization: Clock skew is the variation in time of the clock’s active transition being detected by different devices within a system. Skew must be kept to a minimum to ensure that setup and hold times are not violated at any one device. Methods for managing skew include equal-length traces, zero-delay PLL-based buffers, and additional logic for extending hold times.
• Clock Fidelity: The clock’s waveform must be as clean and deterministic as possible. Techniques used to guarantee consistent clock behavior include transmission line termination, ground-bounce minimization, and the use of identical clock drivers.
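To see why skew eats into both setup and hold, it can help to write the timing budget down. The sketch below uses the usual textbook worst-case inequalities, which this article does not state explicitly: the clock period must cover clock-to-Q delay, the slowest logic path, setup time and skew, while the fastest path must still outlast hold time plus skew. All names and numbers are illustrative assumptions.

```python
def min_clock_period_ns(t_clk_to_q_ns, t_logic_max_ns, t_su_ns, t_skew_ns):
    """Smallest clock period that still meets setup at the receiving register.

    Assumes the worst case, where skew makes the receiving clock edge early.
    """
    return t_clk_to_q_ns + t_logic_max_ns + t_su_ns + t_skew_ns


def hold_margin_ns(t_clk_to_q_ns, t_logic_min_ns, t_h_ns, t_skew_ns):
    """Hold slack on the fastest path; a negative result is a hold violation.

    Assumes the worst case, where skew makes the receiving clock edge late.
    """
    return t_clk_to_q_ns + t_logic_min_ns - t_h_ns - t_skew_ns
```

For example, min_clock_period_ns(2.0, 9.0, 1.5, 0.5) gives 13.0 ns, and hold_margin_ns(1.0, 0.0, 1.0, 0.5) gives -0.5 ns: a register-to-register path with no logic in it fails hold once skew exceeds the difference between clock-to-Q delay and hold time. This is where the "additional logic for extending hold times" mentioned above earns its keep.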
Good State Machine Design

One of the designer’s most powerful constructs for synchronous design is the state machine. Combining combinatorial logic and a number of registers, the state machine is capable of making decisions based on its inputs and its current state. The behavior of the state machine is entirely synchronous, with all decisions taken at the time of the clock transition. There are two conventional forms of state machine: Mealy and Moore. The characteristics of these machines are shown in Figure 2.

FIGURE 2. Characteristics of Mealy and Moore Machines
[Figure labels, for both the Moore Machine and the Mealy Machine: Inputs, Combinatorial Logic, State Register, Clock, Combinatorial Logic, Outputs]

• Moore Machines: Moore machines are the simpler of the two standard types. The output is a function only of the current state of the machine.
• Mealy Machines: The outputs of Mealy machines are a function of the current state of the machine plus the inputs. This additional path provides more flexibility, but may complicate the understanding of the machine.
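The difference between the two forms is easiest to see side by side. The following Python sketch is a cycle-by-cycle behavioral model, not hardware: each loop iteration stands for one clock period. The "two consecutive 1s" detector and every function name here are hypothetical examples chosen for illustration.

```python
def run_moore(next_state, output, state, inputs):
    """Simulate a Moore machine: the output depends on the current state only."""
    outs = []
    for x in inputs:
        outs.append(output(state))       # decode of the state register alone
        state = next_state(state, x)     # state register updates at the clock edge
    return outs


def run_mealy(next_state, output, state, inputs):
    """Simulate a Mealy machine: the output depends on the state and the inputs."""
    outs = []
    for x in inputs:
        outs.append(output(state, x))    # decode of state register plus inputs
        state = next_state(state, x)
    return outs


# Hypothetical example: flag two consecutive 1s on a one-bit input.
def mealy_next(state, x):
    return 1 if x else 0                 # state 1 means "previous input was 1"


def mealy_out(state, x):
    return int(state == 1 and x == 1)


def moore_next(state, x):
    return min(state + 1, 2) if x else 0  # state counts consecutive 1s, saturating at 2


def moore_out(state):
    return int(state == 2)
```

Feeding both machines the input sequence [0, 1, 1, 1, 0, 1, 1] from state 0, the Mealy version yields [0, 0, 1, 1, 0, 0, 1] and the Moore version yields [0, 0, 0, 1, 1, 0, 0]. The Mealy output responds in the same cycle as the input, using one fewer state; the Moore output says the same thing one clock later.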
Books on hardware description languages (HDLs) expound at great length on the construction of state machines. The results are frequently disappointing. If you define your state machine in an HDL and run your design through a synthesizer, you may find spaghetti logic that no self-respecting designer would ever put together.

What’s Wrong with Mealy/Moore?

Figure 2 shows that the outputs of both the Mealy and Moore forms of state machine are combinatorial decodes of the current state and, in the Mealy form, the inputs. While this is fine in principle, there are pitfalls here waiting to trap the unwary.

The outputs of the state machine may include the following types of function:

• Latch enables (low- or high-going pulses to open or close latches)
• Tristate enables (signals to turn on and off drivers onto on-chip or off-chip buses)
• Register enables (enables to synchronously clocked registers)
• Other general control signals, such as counter enables, flags, and so on.

Most of these signals have one characteristic in common: glitches are absolutely unacceptable at any time. As the state registers and inputs of the Mealy or Moore state machines transition and settle, the combinatorial gates are quite capable of generating glitches as a consequence of the varying gate propagation delays. These transitory glitches may well contain enough energy to open latches, clock registers, and cause other highly undesirable effects.

Wouldn’t Gray Code Fix the Problem?

We all learn at an early age that Gray code counters are wonderful since only one bit changes at a time. When fed to an asynchronous decoder, theory suggests that the outputs should settle to their new state without noise. Your author is suspicious of this when the implementation is created by synthesized logic; unclocked feed-forward paths might well negate the advantage of Gray code.

There is, however, a greater challenge to the use of Gray code. The sequence of transitions taken by a state machine as it does its stuff is likely to be quite elaborate; many state machines are very complex with many branches between the possible states. Since Gray code-driven decodes are only glitch-free when a single bit changes at each clock edge, the designer must ensure that all possible state transitions result in only a single bit change of the state variable. This is practical in only the simplest of state machines.

A Much Better State Machine

Figure 3 shows a much better design for a state machine. By adding an output register (with cleanly clocked D-type flip-flops) that is reloaded at each clock edge, the outputs of the state machine are guaranteed to be always glitch-free.
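The glitch argument can be made concrete with a toy model. The Python sketch below assumes that state bits reach the decoder one at a time in some order (standing in for unequal propagation delays) and lists the intermediate codes the decoder sees. It is a deliberately crude illustration with hypothetical names, not a timing simulator.

```python
def transient_codes(old, new, bit_order):
    """Codes seen by the decoder while the changing bits arrive in bit_order."""
    codes = [old]
    current = old
    for bit in bit_order:
        mask = 1 << bit
        if (old ^ new) & mask:           # only bits that differ actually switch
            current ^= mask
            codes.append(current)
    return codes


def has_glitch(old, new, decode, bit_order):
    """True if the decoded output changes more often than a clean transition would."""
    outs = [decode(code) for code in transient_codes(old, new, bit_order)]
    changes = sum(1 for a, b in zip(outs, outs[1:]) if a != b)
    expected = 1 if outs[0] != outs[-1] else 0
    return changes > expected


def multi_bit_transitions(transitions):
    """Return the (old, new) state pairs that change more than one bit."""
    return [(a, b) for a, b in transitions if bin(a ^ b).count("1") > 1]
```

Take a decode that is active only in state 0b11. Moving from 0b01 to 0b10 with bit 1 arriving first passes through 0b11, so has_glitch(0b01, 0b10, lambda c: c == 0b11, [1, 0]) is True even though neither the old nor the new state asserts the output. With the opposite arrival order there is no glitch, which is exactly why such bugs come and go with temperature and process. The single-bit transition 0b01 to 0b11 is clean for any order, and multi_bit_transitions can be run over a state graph to find the branches that break the Gray code rule. An output register sidesteps all of this because it samples the decode once, at the clock edge, after everything has settled.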
It is suggested that all state machines be implemented in this registered-output form, since the quality of the outputs is independent of the number of states or outputs.

FIGURE 3. A Much Better State Machine
[Figure labels: Inputs, Combinatorial Logic, State Register, Output Register, Clock, Outputs]

Feeding Inputs and Resets to Your State Machine

Reset signals are traditionally asynchronous and are routed directly to the clear inputs of state machine register elements. When the reset is asserted, all registers (state and output bits) are cleared immediately. All well and good, but what happens when the reset is deasserted? Consider a state machine that will transition from the reset state to some other state directly after the reset is deasserted. If the reset deasserts close to a clock edge, some of the state bits will assume their new states, while others might not. The state machine ends up in an undefined error state, and, yet again, you have egg on your face.

The solution? Synchronize that darned reset! That way, the reset will be removed well before the clock edge, and all register elements will correctly transition to their new states.

Synchronize All State Machine Inputs

In fact, every input to your state machine must be synchronous. At the very least, you must be absolutely certain that no input will violate the setup and hold times of the state machine’s state and output registers.
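One common way to synchronize a reset, offered here as an assumption since the article does not prescribe a circuit, is to let assertion remain asynchronous while passing the release through two flip-flops. The Python sketch below is a behavioral model of that arrangement with hypothetical names; it ignores metastability in the first flip-flop, which real hardware must tolerate.

```python
class ResetSynchronizer:
    """Two-flop reset synchronizer: asynchronous assert, synchronous release.

    True means "reset active". This is a behavioral sketch, not a netlist.
    """

    def __init__(self):
        self.raw_reset = True   # assume the design powers up held in reset
        self.ff1 = True
        self.ff2 = True         # ff2 drives the state machine's reset

    def set_raw_reset(self, asserted):
        """Model the raw reset pin; assertion clears both flops immediately."""
        self.raw_reset = asserted
        if asserted:
            self.ff1 = True
            self.ff2 = True

    def clock_edge(self):
        """Rising clock edge: a release moves one flop closer to the output."""
        if not self.raw_reset:
            self.ff2 = self.ff1
            self.ff1 = False
        return self.ff2
```

After set_raw_reset(False), the first clock_edge() still returns True and the second returns False. The state machine therefore sees its reset go away just after a clock edge, with almost a full period in hand before the next one, no matter when the raw reset pin actually changed.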
Dead States: The Purgatory of State Machines

State machines with encoded state bits don’t always use all possible states. For example, if you have a 20-state state machine, you would use a five-bit state register. This would leave 12 unused state values. Since states are usually counted incrementally from zero, our example would look like this:

States    What the States Are Used For
0-19      Normal operation.
20-31     Not used: these are “dead” states.

If the state machine ever enters a state 20-31, errors are likely; worse, the machine may lock up totally, with the state machine forever in one of these illegal states. It may require a hard reset to recover from this condition.

Clearly, it’s best to ensure your state machine never reaches a dead state. However, a robust design will at a minimum ensure that if the state machine does enter a dead state, it will exit the dead state immediately and then perhaps enter a quiescent state.
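In next-state logic, the robust behavior described above boils down to an explicit branch for every unused code. A minimal Python sketch for the 20-state example follows. The choice of state 0 as the quiescent state and the simple count-up sequence for the legal states are assumptions made purely for illustration.

```python
NUM_STATES = 20   # states 0-19 are legal
STATE_BITS = 5    # a five-bit register can hold 0-31
IDLE = 0          # hypothetical quiescent state


def dead_states():
    """All register values that do not correspond to a legal state."""
    return [s for s in range(1 << STATE_BITS) if s >= NUM_STATES]


def next_state(state):
    """Next-state function with an explicit escape from every dead state."""
    if not 0 <= state < (1 << STATE_BITS):
        raise ValueError("state does not fit in the state register")
    if state >= NUM_STATES:
        return IDLE                      # dead state: leave on the next clock edge
    return (state + 1) % NUM_STATES      # placeholder for the real transition logic
```

Here dead_states() returns the 12 values from 20 to 31, and next_state maps each of them to IDLE in a single clock, so a stray upset costs one cycle instead of a hard reset.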
Crossing Clock Domains

Moving information from one clock domain to another is rather like descending into Dante’s Inferno. All sorts of evils lie in wait to beset the naive. Setup and hold violations, metastability conditions, unreliable data, and other perils are manifest when moving from one clock domain to another. Indeed, the whole issue of synchronization might merit its own article. Here, a few tips will be presented which might help in resolving the block-to-block synchronization issues. First, let’s define the problem; please see Figure 4.

FIGURE 4. Crossing Clock Domains
[Figure labels: Block A in Clock Domain A (Clock A), Block B in Clock Domain B (Clock B); signals between them: Strobe A-B, Data A-B, Strobe B-A, Data B-A]

We have two blocks of logic, A and B. Block A operates with Clock A, while Block B operates with Clock B. We make no assumptions at all about the frequencies of Clock A and Clock B; nor do we assume any integer or multiple relationship between the two. The two clocks are totally independent.

We need to send a strobe from Block A to Block B (Strobe A-B), and also some data, Data A-B. In response, Strobe B-A returns, together with Data B-A. The transmission of information between the blocks must be absolutely reliable. To accomplish this, we will look at several aspects of the cross-domain problem.
Synchronization 101

Crossing between clock domains is a similar issue to managing asynchronous inputs. Since no relationship between the multiple clock domains can be assumed, the inputs from Block A to Block B must be assumed to be asynchronous inputs. The traditional way of synchronizing an asynchronous input signal is shown in Figure 5:

FIGURE 5. Synchronizing an Asynchronous Input
[Figure labels: Input from Block A, two D-type flip-flops (D, Q) in series clocked by Clock B, Output to Block B’s Logic]

Two D-type flip-flops are used; two synchronization stages are usually sufficient. Only the rarest applications might demand three stages of synchronization. If your silicon library supports metastable-hardened flip-flops, then the first stage should use such a device. Typically, metastable-hardened flip-flops guarantee that their Q outputs will settle after a given maximum time, no matter how close the data transition is to the flip-flop’s clock edge.

This method of information interchange has one drawback. If the strobe has the form of a pulse, it may not be seen by the destination block if the pulse width is less than the destination block’s clock (sampling) period. This is not a problem if the two blocks exchange levels instead of pulses; however, this is slow, as typically four level exchanges must occur for a two-way handshake. The toggle method described later is an excellent solution to this problem.

Single-Point Information

Imagine that Block A needs to send two bits of information to Block B. We could simply duplicate the circuit in Figure 5, with one synchronization circuit for each bit. There is a serious problem which should be clear: occasionally, the circumstance will arise when one bit gets through the two-stage synchronization circuit, while the other does not. The result is ambiguous information and errors.

The solution is shown back in Figure 4: use a single strobe from Block A to Block B, and send the rest of the information separately. The single-point strobe from A to B informs the destination block that the Data A-B is valid; the originating block ensures that there is adequate setup time.
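The two-flop synchronizer of Figure 5, and the missed-pulse drawback that goes with it, can be sketched as a behavioral model in Python. The class and helper names are hypothetical, time is in arbitrary units, and the model ignores metastability entirely: it assumes every sample resolves cleanly to 0 or 1.

```python
class TwoFlopSynchronizer:
    """Behavioral model of two D-type flip-flops in series on the destination clock."""

    def __init__(self):
        self.ff1 = 0
        self.ff2 = 0

    def clock_edge(self, async_in):
        """Destination clock edge: ff2 takes the old ff1, ff1 samples the input."""
        self.ff2 = self.ff1
        self.ff1 = async_in
        return self.ff2


def pulse(start, width):
    """Return a function giving the level (0 or 1) of a single pulse at time t."""
    return lambda t: 1 if start <= t < start + width else 0


def sample_signal(signal_at, clock_period, n_edges):
    """Synchronizer output at each of n_edges destination clock edges."""
    sync = TwoFlopSynchronizer()
    return [
        sync.clock_edge(signal_at(k * clock_period))
        for k in range(1, n_edges + 1)
    ]
```

With a destination clock period of 10, sample_signal(pulse(12, 15), 10, 5) returns [0, 0, 1, 0, 0]: the pulse is caught at the edge at time 20 and emerges one edge later. A narrower pulse, sample_signal(pulse(12, 5), 10, 5), returns all zeros because it falls between two sampling edges and is never seen. A pulse at least one destination clock period wide is always present at some sampling edge in this model, which is why levels are safe where short pulses are not.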