## Architecture Evaluation for Power-efficient FPGAs

Fei Li\*, Deming Chen+, Lei He\*, Jason Cong+

\* EE Department, UCLA

<sup>+</sup> CS Department, UCLA

Partially supported by NSF and SRC

## Outline

- Introduction
- Evaluation Flow
- Architecture Model
- Description Power Model
- Architecture Evaluation Results
- Conclusions

## Existing FPGAs are known to be power inefficient

E.g. [Kusse, ISLPED'98]

| Design Example | Vdd   | Energy     |
|----------------|-------|------------|
| Xilinx XC4003A | 5 V   | 4.2 mW/MHz |
| Static CMOS    | 3.3 V | 5.5 µW/MHz |

Table 1: 8-bit adder

100X power overhead

### Need to explore power efficient FPGAs

## **Evaluation Framework** — *fpgaEva-LP*

Based on fpgaEva-flex [Cong, et al, ICCD'00]



### Logic Block [Ahmed-Rose, FPGA2000]



Feb. 2002

FPGA Symposium 2003

### Routing Structure [Betz-Rose, FPGA1999]



Parameters:

- Wire segment length
- Switch-box type
- Buffer/pass transistor distribution
- Connection box configuration

## **BC-Netlist Generator**



### **Capacitance Extraction and Delay Calculation**



- Wires segmented by buffers and pass transistors
- Capacitance: lumped from all branches for wires, pass transistors and gates (buffers)
- Delay: Elmore Delay model
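The Elmore delay of a buffered or pass-transistor-segmented wire can be sketched as follows. This is a minimal illustration for a simple RC ladder, not the extraction code used in fpgaEva-LP; the function name `elmore_delay` and the stage values are hypothetical, except the 5300 Ohm driver resistance, which is the R_NMOS value listed in the experimental settings.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay (seconds) at the far end of an RC ladder.

    resistances[i]  : series resistance of stage i, in ohms
    capacitances[i] : lumped capacitance at the node after stage i, in farads
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistance")
    delay = 0.0
    upstream_resistance = 0.0
    for r, c in zip(resistances, capacitances):
        # Each node capacitance is charged through all resistance upstream of it.
        upstream_resistance += r
        delay += upstream_resistance * c
    return delay


# Hypothetical three-stage path: a pass transistor followed by two wire pieces.
stage_r = [5300.0, 100.0, 100.0]      # ohms
stage_c = [2e-15, 5e-15, 5e-15]       # farads (lumped per node)
delay_s = elmore_delay(stage_r, stage_c)
print(f"Elmore delay: {delay_s * 1e12:.1f} ps")
```

Lumping all branch capacitance onto each node, as the slide describes, keeps the calculation to a single pass over the stages.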

## Mixed-level Power Model – Overview

#### Dynamic power

- Switching power
- Short-circuit power
- Related to signal transitions
  - Functional switch

#### Static Power

- Sub-threshold leakage
- Reverse-biased leakage
- Depending on the input vector




## Macromodeling – Dynamic Power

Pre-characterized average power per access

- Based on SPICE simulation with random input vectors
- Covers both switching power and short-circuit power
- Applied to LUTs, whose connections are regular
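A macromodel of this kind can be applied roughly as below. This is only a sketch: it assumes the pre-characterized quantity is stored as an average energy per access, and the function name `lut_dynamic_power`, the access counts, and the energy value are all hypothetical.

```python
def lut_dynamic_power(access_counts, energy_per_access_j, cycles, clock_hz):
    """Average LUT dynamic power (watts) from a per-access macromodel.

    access_counts       : number of accesses recorded for each LUT
    energy_per_access_j : pre-characterized average energy per access, in joules
    cycles              : number of simulated clock cycles
    clock_hz            : clock frequency, in hertz
    """
    total_energy_j = sum(access_counts) * energy_per_access_j
    simulated_time_s = cycles / clock_hz
    return total_energy_j / simulated_time_s


# Hypothetical numbers: three LUTs, 1000 cycles at 100 MHz, 0.1 pJ per access.
power_w = lut_dynamic_power([120, 80, 50], 1e-13, cycles=1000, clock_hz=100e6)
print(f"LUT dynamic power: {power_w * 1e6:.2f} uW")
```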

## Verification

|                      | SPICE simulation | Our Power Model | Error  |
|----------------------|------------------|-----------------|--------|
| Total Energy (Joule) | 1.42E-11         | 1.27E-11        | 10.56% |

#### 200 random input vectors

## Macromodeling – Static Power

- Input pattern dependent
- Pre-characterized average static power
  - Input vectors are grouped into vector sets
  - Typical vectors are simulated in each set
  - Save SPICE simulation time

Applied to both LUTs and interconnect buffers
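One way to turn per-set SPICE results into a single average static power is a weighted mean over the vector sets. The sketch below assumes every input vector is equally likely, which the slides do not state; the function name `average_static_power` and the power values are hypothetical.

```python
def average_static_power(vector_sets):
    """Average static power (watts) over grouped input vectors.

    vector_sets : list of (vectors_in_set, power_of_typical_vector_w) pairs,
                  where only the typical vector of each set was simulated.
    Assumes all input vectors occur with equal probability.
    """
    total_vectors = sum(count for count, _ in vector_sets)
    if total_vectors == 0:
        raise ValueError("no input vectors given")
    weighted_sum = sum(count * power for count, power in vector_sets)
    return weighted_sum / total_vectors


# Hypothetical grouping of the 16 input vectors of a 4-input LUT into 3 sets.
sets = [(4, 1.0e-7), (8, 1.5e-7), (4, 2.0e-7)]
static_power_w = average_static_power(sets)
print(f"Average static power: {static_power_w * 1e9:.0f} nW")
```

Only three simulations are needed here instead of sixteen, which is the SPICE time saving the grouping is meant to provide.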

## Switch-level Model – Interconnect Switching Power

Switching power without glitches

$$\begin{aligned}
P_{sw} &= 0.5 f \cdot V_{dd}^2 \cdot \sum_{i=1}^n C_i \cdot E_i \\
&= 0.5 f \cdot V_{dd}^2 \cdot \sum_{i=1}^n C_i \cdot (N_i / \text{cycles})
\end{aligned}$$
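The glitch-free formula translates directly into code. The sketch below is illustrative only; the function name `switching_power` and the node values are hypothetical, and the switching activity of node i is taken as N_i / cycles as in the equation.

```python
def switching_power(clock_hz, vdd, capacitances, transitions, cycles):
    """Glitch-free switching power (watts).

    capacitances : C_i, lumped capacitance of each node, in farads
    transitions  : N_i, transition count of each node during simulation
    cycles       : number of simulated clock cycles
    """
    if len(capacitances) != len(transitions):
        raise ValueError("need one transition count per capacitance")
    # Sum of C_i * E_i, with switching activity E_i = N_i / cycles.
    weighted_cap = sum(c * n / cycles for c, n in zip(capacitances, transitions))
    return 0.5 * clock_hz * vdd ** 2 * weighted_cap


# Hypothetical two-node example: 100 MHz clock, 1.0 V supply, 1000 cycles.
p_sw = switching_power(100e6, 1.0, [10e-15, 20e-15], [500, 250], cycles=1000)
print(f"Switching power: {p_sw * 1e6:.2f} uW")
```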

#### Effective transition number

$$\hat{N}_{i}(\text{rising}) = \frac{(V_{1} - V_{2})(V_{1} + V_{2} - 2V_{dd})}{V_{dd}^{2}}N_{i}$$

Switching power with glitches

$$\begin{aligned}
P_{sw} &= 0.5 f \cdot V_{dd}^{2} \cdot \sum_{i=1}^{n} C_{i} \hat{E}_{i} \\
&= 0.5 f \cdot V_{dd}^{2} \cdot \sum_{i=1}^{n} C_{i} (\hat{N}_{i} / \text{cycles})
\end{aligned}$$
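As a worked illustration of the effective transition number, the sketch below evaluates the rising-transition weight for a full swing and for a partial (glitch) swing, then feeds the resulting effective count into the power formula. The function names and all numeric values are hypothetical, and only the rising case given on the slide is covered.

```python
def effective_rising_weight(v1, v2, vdd):
    """Weight of one rising transition from v1 to v2, relative to a full swing."""
    return (v1 - v2) * (v1 + v2 - 2.0 * vdd) / vdd ** 2


def glitch_aware_switching_power(clock_hz, vdd, nodes, cycles):
    """Switching power (watts) using effective transition counts.

    nodes : list of (capacitance_f, effective_transition_count) pairs
    """
    weighted_cap = sum(c * n_hat / cycles for c, n_hat in nodes)
    return 0.5 * clock_hz * vdd ** 2 * weighted_cap


VDD = 1.0
full_weight = effective_rising_weight(0.0, VDD, VDD)     # full 0 -> Vdd swing
partial_weight = effective_rising_weight(0.0, 0.5, VDD)  # glitch reaching Vdd/2

# Hypothetical 10 fF node: 400 full transitions and 100 partial ones in 1000 cycles.
n_hat = 400 * full_weight + 100 * partial_weight
p_glitch = glitch_aware_switching_power(100e6, VDD, [(10e-15, n_hat)], cycles=1000)
print(full_weight, partial_weight, n_hat)
```

With these inputs a full swing has weight 1, so the formula reduces to the glitch-free case when no partial transitions occur.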






## Switch-level Model – Interconnect Short-Circuit Power

### Short-circuit power

- Fixed ratio between short-circuit and switching power
- The ratio is decided by SPICE simulation (13%)
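With a fixed ratio, short-circuit power follows from switching power by one multiplication. The sketch below assumes the 13% figure means short-circuit power equals 0.13 times switching power; the function name and the example switching power are hypothetical.

```python
SHORT_CIRCUIT_RATIO = 0.13  # from SPICE simulation, per the slide; read here as P_sc / P_sw


def interconnect_dynamic_power(switching_power_w):
    """Return (short_circuit_power_w, total_dynamic_power_w) for the interconnect."""
    short_circuit_w = SHORT_CIRCUIT_RATIO * switching_power_w
    return short_circuit_w, switching_power_w + short_circuit_w


# Hypothetical interconnect switching power of 1 mW.
p_sc, p_dyn = interconnect_dynamic_power(1e-3)
print(f"short-circuit: {p_sc * 1e3:.2f} mW, total dynamic: {p_dyn * 1e3:.2f} mW")
```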




## **Power Simulator**



## **Experimental Settings**

| Technology | R_NMOS | R_wire       | C_wire     |
|------------|--------|--------------|------------|
| 0.1 µm     | 5300 Ω | 0.91667 MΩ/m | 73.8 aF/µm |
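To show how the per-unit-length values above scale, the sketch below computes the resistance and capacitance of a single wire. The 100 µm length is a hypothetical example, not a value from the experiments, and the function name `wire_rc` is illustrative.

```python
R_WIRE_OHM_PER_M = 0.91667e6   # 0.91667 MOhm/m, from the settings table
C_WIRE_F_PER_UM = 73.8e-18     # 73.8 aF/um, from the settings table


def wire_rc(length_um):
    """Return (resistance_ohm, capacitance_f) of a wire of the given length."""
    resistance_ohm = R_WIRE_OHM_PER_M * length_um * 1e-6  # convert um to m
    capacitance_f = C_WIRE_F_PER_UM * length_um
    return resistance_ohm, capacitance_f


# Hypothetical 100 um wire.
r_ohm, c_f = wire_rc(100.0)
print(f"R = {r_ohm:.3f} ohm, C = {c_f * 1e15:.2f} fF")
```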

| Logic Block Architectures |                                                     |
|---------------------------|-----------------------------------------------------|
| LUT Size <i>k</i>         | 3 – 7                                               |
| Cluster Size <i>N</i>     | 4, 8, 12, 16, 20                                    |
| **Routing Architectures** |                                                     |
| routing_default           | wire length 4, 50% buffers and 50% pass transistors |
| routing_fullbuf1          | wire length 4, 100% buffers                         |
| routing_fullbuf2          | wire lengths 4 and 8, 100% buffers                  |
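The logic block parameter space in the table can be enumerated as below. This is a sketch of the sweep only; it assumes every LUT size is paired with every cluster size, which the slides do not state explicitly, and the variable names are hypothetical.

```python
from itertools import product

LUT_SIZES = range(3, 8)                 # k = 3 .. 7
CLUSTER_SIZES = (4, 8, 12, 16, 20)      # N
ROUTING_ARCHITECTURES = ("routing_default", "routing_fullbuf1", "routing_fullbuf2")

# Assumed full cross product of logic block parameters.
logic_block_configs = [
    {"lut_size": k, "cluster_size": n}
    for k, n in product(LUT_SIZES, CLUSTER_SIZES)
]

print(len(logic_block_configs), "logic block architectures")
print(len(ROUTING_ARCHITECTURES), "routing architectures")
```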

### **Experiments on Logic Block Architectures**



- LUT Size = 4 is also optimal for power consumption
- Cluster Size = 12 is the optimal cluster size




**Cluster Size = 4** 


## **Power Breakdown**

Cluster Size = 12, LUT Size = 4

Cluster Size = 12, LUT Size = 6



#### Interconnect power is dominant


## **Power Breakdown**



#### Leakage power becomes increasingly important


## Conclusions

- Developed an architecture evaluation framework, *fpgaEva-LP*, for power efficiency
- Performed quantitative analysis of parameterized FPGA architectures
- Identified future directions for FPGA power optimization
  - Interconnect power is dominant
  - Leakage power is becoming important