



# Introduction

- The clock signal in a digital circuit synchronizes the transfer of data between processing elements.
- It defines the precise instants at which the circuit is allowed to change state.
- Ideally, the clock signal should arrive at all processing elements at the same instant.

# Review: Sequential Definitions

- Use two level-sensitive latches of opposite type to build one master-slave flip-flop that changes state on a clock edge (when the slave is transparent)
- Static storage
  - Static storage uses a bistable element with feedback to store its state, and thus preserves the state as long as the power is on
    - Loading new data into the element: 1) cutting the feedback path (mux based); 2) overpowering the feedback path (SRAM based)
- Dynamic storage
  - Dynamic storage holds state on parasitic capacitors, so the state is held for only a limited time (milliseconds) and requires periodic refresh
  - Dynamic storage is usually simpler (fewer transistors), faster, and lower power, but because of noise-immunity issues the circuit is always modified so that it is pseudostatic

# **Timing Classifications**

- Synchronous systems
  - All memory elements in the system are simultaneously updated using a globally distributed periodic synchronization signal (i.e., a global clock signal)
  - Functionality is ensured by strict constraints on clock signal generation and distribution, to minimize
    - Clock skew (spatial variations in clock edges)
    - Clock jitter (temporal variations in clock edges)
- Asynchronous systems
  - Self-timed (controlled) systems
  - No need for a globally distributed clock, but have asynchronous circuit overheads (handshaking logic, etc.)
- Hybrid systems
  - Synchronization between different clock domains
  - Interfacing between asynchronous and synchronous domains

### **Clock Definition and Parameters**

- The clock is a periodic synchronization signal used as a time reference for data transfer in a synchronous digital system.

- Clock skew
  - Spatial variation of the clock signal as it is distributed through the chip
- Clock jitter
  - Temporal variation of the clock with respect to a reference edge
- Duty cycle variation
  - 50/50 design target
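The three parameters above can be estimated from measured edge times. The following is a minimal sketch, assuming hypothetical rising-edge timestamps in picoseconds at two registers; the function names and sample data are illustrative and not taken from the slides.

```python
from statistics import mean

def clock_skew(edges_a, edges_b):
    """Mean arrival-time difference of the same clock edges at two points."""
    return mean(b - a for a, b in zip(edges_a, edges_b))

def peak_period_jitter(edges):
    """Largest deviation of any measured period from the mean period."""
    periods = [t1 - t0 for t0, t1 in zip(edges, edges[1:])]
    nominal = mean(periods)
    return max(abs(p - nominal) for p in periods)

def duty_cycle(rise, fall, next_rise):
    """Fraction of one period during which the clock is high."""
    return (fall - rise) / (next_rise - rise)

# Hypothetical rising-edge arrival times in picoseconds (assumed data).
edges_reg1 = [0, 10000, 20100, 29900, 40000]
edges_reg2 = [t + 200 for t in edges_reg1]   # same edges, seen 200 ps later

print(clock_skew(edges_reg1, edges_reg2))    # spatial variation between two points
print(peak_period_jitter(edges_reg1))        # temporal variation at one point
print(duty_cycle(0, 5000, 10000))            # 0.5 corresponds to the 50/50 target
```

Skew compares the same edge at two locations, while jitter compares successive edges at one location, which is why the two functions take different inputs.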

# **Review: Synchronous Timing Basics**



- Under ideal conditions (i.e., when $t_{clk1} = t_{clk2}$)
  - $T \ge t_{c-q} + t_{plogic} + t_{su}$
  - $t_{hold} \le t_{cdlogic} + t_{cdreg}$
- Under real conditions, the clock signal can have both spatial (clock skew) and temporal (clock jitter) variations
  - Skew is constant from cycle to cycle (by definition); skew can be positive (clock and data flowing in the same direction) or negative (clock and data flowing in opposite directions)
  - Jitter causes $T$ to change on a cycle-by-cycle basis

### Processor Frequency Trend



### Clock Skew Trend



# **Clock Jitter Trend**



# Sources of Clock Skew and Jitter in Clock Network

- Skew
  - manufacturing device variations in clock drivers
  - interconnect variations
  - environmental variations (power supply and temperature)
- Jitter
  - clock generation
  - capacitive loading and coupling
  - environmental variations (power supply and temperature)

# Sources of Clock Skew

- With a perfectly balanced distribution network, device mismatch is the largest contributor to clock skew.



# **Clock-special signal**

- Clock signals are often regarded as simple control signals; however, they have some very special characteristics and attributes. Clock signals typically
  - are loaded with the greatest fanout,
  - travel over the longest distances,
  - and operate at the highest speeds of any signal, either control or data, within the entire system.

# Integral part of system design

- Tradeoffs: system speed, physical die area, and power dissipation are greatly affected by the clock distribution network.
- The design methodology and structural topology of the clock distribution network should be considered in the development of a system for distributing the clock signals.

### Requirements

- Clock waveforms must be particularly clean and sharp.
- No skew

# Difficulty

 The requirement of distributing a tightly controlled clock signal to each synchronous register on a large hierarchically structured integrated circuit within specific temporal bounds is difficult.


# Positive Clock Skew

- Clock and data flow in the same direction

$$T + \delta \ge t_{c-q} + t_{plogic} + t_{su} \quad \text{so} \quad T \ge t_{c-q} + t_{plogic} + t_{su} - \delta$$

$$t_{hold} + \delta \le t_{cdlogic} + t_{cdreg} \quad \text{so} \quad t_{hold} \le t_{cdlogic} + t_{cdreg} - \delta$$

- $\delta > 0$: Improves performance, but makes $t_{hold}$ harder to meet. If $t_{hold}$ is not met (race conditions), the circuit malfunctions independent of the clock period!

# Negative Clock Skew

- Clock and data flow in opposite directions

$$T + \delta \ge t_{c-q} + t_{plogic} + t_{su} \quad \text{so} \quad T \ge t_{c-q} + t_{plogic} + t_{su} - \delta$$

$$t_{hold} + \delta \le t_{cdlogic} + t_{cdreg} \quad \text{so} \quad t_{hold} \le t_{cdlogic} + t_{cdreg} - \delta$$

- $\delta < 0$: Degrades performance, but $t_{hold}$ is easier to meet (eliminating race conditions)

# **Clock Jitter**

- Jitter causes $T$ to vary on a cycle-by-cycle basis

$$T - 2t_{jitter} \ge t_{c-q} + t_{plogic} + t_{su} \quad \text{so} \quad T \ge t_{c-q} + t_{plogic} + t_{su} + 2t_{jitter}$$

- Jitter directly reduces the performance of a sequential circuit

# Combined Impact of Skew and Jitter

Constraints on the minimum clock period ($\delta > 0$):

$$T \ge t_{c-q} + t_{plogic} + t_{su} - \delta + 2t_{jitter}$$

$$t_{hold} \le t_{cdlogic} + t_{cdreg} - \delta - 2t_{jitter}$$

- $\delta > 0$ with jitter: Degrades performance, and makes $t_{hold}$ even *harder* to meet. (The acceptable skew is reduced by jitter.)
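The setup constraint $T \ge t_{c-q} + t_{plogic} + t_{su} - \delta + 2t_{jitter}$ and the hold constraint $t_{hold} \le t_{cdlogic} + t_{cdreg} - \delta - 2t_{jitter}$ can be checked numerically. The sketch below is illustrative only: the function names and all delay values (in picoseconds) are assumptions, not figures from the slides.

```python
def min_clock_period(t_cq, t_plogic, t_su, skew=0, jitter=0):
    """Smallest T satisfying the setup constraint with skew and jitter."""
    return t_cq + t_plogic + t_su - skew + 2 * jitter

def hold_margin(t_hold, t_cdlogic, t_cdreg, skew=0, jitter=0):
    """Slack on the hold constraint; a negative value means a race."""
    return t_cdlogic + t_cdreg - skew - 2 * jitter - t_hold

# Hypothetical register and logic delays in picoseconds.
ideal_T = min_clock_period(t_cq=100, t_plogic=800, t_su=50)
real_T = min_clock_period(t_cq=100, t_plogic=800, t_su=50, skew=60, jitter=20)

ideal_slack = hold_margin(t_hold=40, t_cdlogic=70, t_cdreg=50)
real_slack = hold_margin(t_hold=40, t_cdlogic=70, t_cdreg=50, skew=60, jitter=20)

print(ideal_T, real_T)          # positive skew shortens T, jitter lengthens it
print(ideal_slack, real_slack)  # skew and jitter both eat into the hold slack
```

With these assumed numbers the hold slack changes sign once skew and jitter are included, which is the race condition that no choice of clock period can repair.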



Fig. 9. Floorplan of structured custom VLSI circuit requiring synchronous clock distribution.

# **Technology scaling**

- With technology scaling, long global interconnect lines become much more resistive as line dimensions decrease. This increased line resistance is one of the primary reasons for the growing importance of clock distribution to synchronous performance.
- Clock distribution strategies: only the relative phase between two clocking elements is important.

# Achieve Zero skew routing

 Route clock to destinations such that clock edges appear at the same time

### **Clock tree**

- Single driver: used when the interconnect resistance is small compared to the output resistance of the buffer at the clock source
- Maintain high-quality waveform shapes (i.e., short transition times)
- Use the Elmore formula to compute delay
- Balance delay paths
- Drawback: large delay; the drive capability must be high
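The Elmore delay to a node of an RC tree is the sum, over every branch on the path from the driver, of the branch resistance times the total capacitance downstream of that branch. A minimal sketch follows, assuming a hypothetical tree encoding (a parent index per node) and made-up element values in ohms and femtofarads.

```python
def elmore_delays(parent, resistance, capacitance):
    """Elmore delay from the driver to every node of an RC tree.

    parent[i]      index of the upstream node (None for node 0, the root)
    resistance[i]  resistance of the branch feeding node i
                   (for the root: the driver output resistance)
    capacitance[i] capacitance attached at node i
    Nodes must be ordered so that each parent precedes its children.
    """
    n = len(parent)
    downstream = list(capacitance)
    for i in range(n - 1, 0, -1):            # fold child capacitance into parents
        downstream[parent[i]] += downstream[i]
    delay = [0.0] * n
    for i in range(n):
        upstream = delay[parent[i]] if parent[i] is not None else 0.0
        delay[i] = upstream + resistance[i] * downstream[i]
    return delay

# Hypothetical two-leaf tree: ohms and femtofarads give femtoseconds.
parent = [None, 0, 0]
balanced = elmore_delays(parent, [100, 50, 50], [10, 20, 20])
unbalanced = elmore_delays(parent, [100, 50, 50], [10, 20, 40])

print(balanced[1] - balanced[2])      # equal loads: no skew between leaves
print(unbalanced[2] - unbalanced[1])  # heavier leaf arrives later
```

Balancing the delay paths then amounts to adjusting branch resistances or loads until the leaf delays returned by this function agree.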

# Terminology

- The unique clock source is frequently described as the root of the tree,
- the initial portion of the tree as the trunk,
- individual paths driving each register as the branches,
- and the registers being driven as the leaves

# Buffered clock tree (large interconnect resistance)

- The most common and general approach to equipotential clock distribution is the use of buffered trees.
- It leads to an asymmetric structure.
- All paths are balanced.





#### **Buffered clock Tree**

 Insert buffers either at the clock source and/or along a clock path, forming a tree structure.

#### **Buffers**

- The distributed buffers serve the double function of
  - amplifying the clock signals degraded by the distributed interconnect impedances, and
  - isolating the local clock nets from upstream load impedances

### DESIGN

- All nodes have capacitance
- All branches have resistance
- Fix the load (fanout) of each buffer
- Compute the number of levels required
- Position the buffers optimally

Guideline to minimize delay: buffer delay = segment delay
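Once the fanout of each buffer is fixed, the number of levels follows from the number of clocked loads. The sketch below assumes a uniform fanout at every level; the function names and the example of 1000 registers with a fanout of 4 are hypothetical, and buffer placement and the delay-matching guideline are not modeled.

```python
def buffer_levels(num_loads, fanout):
    """Number of buffer levels needed so one source can reach num_loads."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    levels, reach = 0, 1
    while reach < num_loads:
        reach *= fanout
        levels += 1
    return levels

def buffers_per_level(num_loads, fanout):
    """Buffer count at each level, listed from the source to the leaves."""
    counts = []
    loads = num_loads
    while loads > 1:
        loads = -(-loads // fanout)   # ceiling division
        counts.append(loads)
    return counts[::-1]

# Hypothetical design: 1000 registers, each buffer drives at most 4 loads.
print(buffer_levels(1000, 4))
print(buffers_per_level(1000, 4))
```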

#### Pentium<sup>®</sup> 4 Processor Clock Network



#### **3D Skew Visualization**



#### Mesh version of clock tree

- Shunt paths further down the clock distribution network are placed to minimize the interconnect resistance within the clock tree.
- This mesh structure effectively places the branch resistances in parallel, minimizing the clock skew.

# **Clock Distribution Networks**

Figure panels: Grid, H-Tree, Tapered H-Tree.

### **CDN** properties

- H-tree: symmetric, regular array; clock skew can be small
- X-tree: variant of the H-tree
- Zero skew is achieved by keeping the distributed interconnect and buffers identical from the clock signal source to the clocked register of each clock path.
- Each clock path from the clock source to a clocked register then has practically the same delay.

#### Skew

- The primary delay difference between the clock signal paths is due to variations in process parameters that affect the interconnect impedance and, in particular, any active distributed buffer amplifiers.
- The amount of clock skew within an H-tree structured clock distribution network is strongly dependent upon the physical size, the control of the semiconductor process, and the degree to which active buffers are distributed within the H-tree structure.

# **Clock Distribution**



H-tree

Clock is distributed in a tree-like fashion

### H-Tree Clock Network

• If the paths are perfectly balanced, clock skew is zero
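The balance property can be seen by constructing an idealized H-tree geometrically. This is a minimal sketch, assuming branches alternate between horizontal and vertical and halve in length after each vertical branch; the function name and dimensions are illustrative, and real trees deviate from this because of process and load variations.

```python
def h_tree_leaves(levels, half=1.0, x=0.0, y=0.0, horizontal=True, length=0.0):
    """Return (x, y, wire_length_from_root) for every leaf of an H-tree."""
    if levels == 0:
        return [(x, y, length)]
    leaves = []
    for sign in (-1, 1):
        nx = x + sign * half if horizontal else x
        ny = y + sign * half if not horizontal else y
        # Branch length halves after every vertical branch.
        next_half = half if horizontal else half / 2
        leaves += h_tree_leaves(levels - 1, next_half, nx, ny,
                                not horizontal, length + half)
    return leaves

leaves = h_tree_leaves(levels=4)
path_lengths = {length for _, _, length in leaves}

print(len(leaves))     # number of clocked destinations reached
print(path_lengths)    # a single value: every path has the same wire length
```

Equal wire length gives equal delay only if the wire and buffer parameters are also identical along every path.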



# More realistic H-tree



[Restle98]

#### Tapered H tree

- The conductor widths in H-tree structures are designed to progressively decrease as the signal propagates to lower levels of the hierarchy.
- This strategy minimizes reflections of the high-speed clock signals at the branching points.

$$Z_k = \frac{Z_{k+1}}{2} \quad \text{for an H-tree structure}$$
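The relation $Z_k = Z_{k+1}/2$ means each segment sees its two child branches in parallel as a matched load. The sketch below uses the standard transmission-line reflection coefficient $(Z_L - Z_0)/(Z_L + Z_0)$; the function names and the 25 ohm trunk impedance are assumptions for illustration.

```python
def tapered_impedances(z_trunk, levels):
    """Impedance of each level when Z_k = Z_{k+1} / 2, starting at the trunk."""
    return [z_trunk * 2 ** k for k in range(levels)]

def branch_reflection(z_in, z_branch):
    """Reflection coefficient where one line splits into two equal branches."""
    z_load = z_branch / 2          # two identical branches in parallel
    return (z_load - z_in) / (z_load + z_in)

# Hypothetical 25 ohm trunk.
levels = tapered_impedances(25.0, 4)
print(levels)

print(branch_reflection(levels[0], levels[1]))  # tapered: matched split
print(branch_reflection(50.0, 50.0))            # untapered: mismatched split
```

Higher impedance at the lower levels corresponds to the narrower conductors described above.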

#### H-tree: Difficulty 1

- The clock is routed in both the vertical and horizontal directions. For a standard two-level metal CMOS process, this Manhattan structure makes it difficult to route the clock lines without using either resistive interconnect or multiple high-resistance vias between the two metal lines.
- 3-level metal process

### H-tree: Difficulty 2

- Furthermore, the interconnect capacitance (and therefore the power dissipation) is much greater for the H-tree than for the standard clock tree, since the total wire length tends to be much greater.
- This illustrates an important tradeoff between clock delay and clock skew in the design of high-speed clock distribution networks.

#### Grid

- Low skew achievable
- Lots of excess interconnect
- Large power dissipation

#### Clock distribution-hierarchical

- Distribute global reference to various parts of the chip with zero skew
- Local distribution of the clock takes into account local load variations and the permitted clock skew. Power-saving strategies are used here.

### GCLK

- Gridded global clock signal (GCLK) is distributed over the entire IC in order to maintain a low-resistance reference clock signal and to distribute the power dissipated by the clock distribution network across the die area
- The global clock signal GCLK is the source of thousands of buffered and conditional (or gated) clock signals driving registers across the IC



# Example: DEC Alpha 21164 (EV5)

- 300 MHz clock (9.3 million transistors on a 16.5 x 18.1 mm die in 0.5 micron CMOS technology)
  - Single-phase clock
- 3.75 nF total clock load
  - Extensive use of dynamic logic
- 20 W (out of 50) in the clock distribution network
- Two-level clock distribution
  - Single 6-stage driver at the center of the chip
  - Secondary buffers drive the left and right sides of the clock grid in Metal3 and Metal4
- Total equivalent driver size of 58 cm!
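The quoted clock load and frequency can be put into the usual dynamic power estimate $P = C V_{dd}^2 f$. The sketch below is a rough illustration only: the 3.3 V supply is an assumption (the supply voltage is not given here), and it counts only the 3.75 nF load switching rail to rail once per cycle.

```python
def dynamic_power(capacitance_f, vdd_v, freq_hz):
    """Dynamic power of a capacitance switched full swing once per cycle."""
    return capacitance_f * vdd_v ** 2 * freq_hz

CLOCK_LOAD_F = 3.75e-9   # total clock load quoted above
CLOCK_FREQ_HZ = 300e6    # clock frequency quoted above
ASSUMED_VDD_V = 3.3      # assumption: supply voltage is not stated in the text

clock_load_power = dynamic_power(CLOCK_LOAD_F, ASSUMED_VDD_V, CLOCK_FREQ_HZ)
print(f"{clock_load_power:.1f} W")
```

Under that assumption the clock load alone accounts for a large share of the 20 W quoted for the clock distribution network; the rest is not broken down in these notes.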

# 21164 Clocking



- 2 phase single wire clock, distributed globally
- 2 distributed driver channels
  - Reduced RC delay/skew
  - Improved thermal distribution
  - 3.75nF clock load
  - 58 cm final driver width
- Local inverters for latching
- Conditional clocks in caches to reduce power
- More complex race checking
- Device variation



### EV6 (Alpha 21264) Clocking 600 MHz – 0.35 micron CMOS





- 2 Phase, with multiple conditional buffered clocks
  - 2.8 nF clock load
  - 40 cm final driver width
- Local clocks can be gated "off" to save power
- Reduced load/skew
- Reduced thermal issues
- Multiple clocks complicate race checking

# 21264 Clocking









# **EV6 Clock Results**





GCLK Skew (at Vdd/2 Crossings)

GCLK Rise Times (20% to 80% Extrapolated to 0% to 100%)