# Outline

- 1. Derivation of efficient HDL description
- 2. Operator sharing
- 3. Functionality sharing
- 4. Layout-related circuits
- 5. General circuits

| RTL Hardware Design | Chapter 7 |
|---------------------|-----------|
| by P. Chu           |           |

#### 1. Derivation of efficient HDL description

Combinational Circuit Design: Practice

- Think "H", not "L", of HDL
- Right way:
  - Research to find an efficient design ("domain knowledge")
  - Develop VHDL code that accurately describes the design
- Wrong way:
  - Write a C program and convert it to HDL


#### An example 0.55-µm standard-cell CMOS implementation

Area (gate count) of VHDL operators. The subscripts $a$ and $d$ mark the area-optimized and delay-optimized versions of a circuit.

| width | nand | xor | $>_a$ | $>_d$ | =   | $+1_a$ | $+1_d$ | $+_a$ | $+_d$ | mux |
|-------|------|-----|-------|-------|-----|--------|--------|-------|-------|-----|
| 8     | 8    | 22  | 25    | 68    | 26  | 27     | 33     | 51    | 118   | 21  |
| 16    | 16   | 44  | 52    | 102   | 51  | 55     | 73     | 101   | 265   | 42  |
| 32    | 32   | 85  | 105   | 211   | 102 | 113    | 153    | 203   | 437   | 85  |
| 64    | 64   | 171 | 212   | 398   | 204 | 227    | 313    | 405   | 755   | 171 |

Delay (ns) of the same operators:

| width | nand | xor | $>_a$ | $>_d$ | =   | $+1_a$ | $+1_d$ | $+_a$ | $+_d$ | mux |
|-------|------|-----|-------|-------|-----|--------|--------|-------|-------|-----|
| 8     | 0.1  | 0.4 | 4.0   | 1.9   | 1.0 | 2.4    | 1.5    | 4.2   | 3.2   | 0.3 |
| 16    | 0.1  | 0.4 | 8.6   | 3.7   | 1.7 | 5.5    | 3.3    | 8.2   | 5.5   | 0.3 |
| 32    | 0.1  | 0.4 | 17.6  | 6.7   | 1.8 | 11.6   | 7.5    | 16.2  | 11.1  | 0.3 |
| 64    | 0.1  | 0.4 | 35.7  | 14.3  | 2.2 | 24.0   | 15.7   | 32.2  | 22.9  | 0.3 |

# Sharing

- Circuit complexity of VHDL operators varies
- Arithmetic operators
  - Large implementation
  - Limited optimization by synthesis software

- "Optimization" can be achieved by "sharing" in RT level coding
  - Operator sharing
  - Functionality sharing


# 2. Operator sharing

- The "value expressions" in a priority network and in a multiplexing network are mutually exclusive:
- Only one result is routed to the output
- Conditional signal assignment (if statement):

      sig_name <= value_expr_1 when boolean_expr_1 else
                  value_expr_2 when boolean_expr_2 else
                  value_expr_3 when boolean_expr_3 else
                  . . .
                  value_expr_n;



### Example 1

- Original code:

      r <= a+b when boolean_exp else
           a+c;

- Revised code:

      src0 <= b when boolean_exp else
              c;
      r <= a + src0;



# Example 2

- Original code:

      process(a,b,c,d,...)
      begin
         if boolean_exp_1 then
            r <= a+b;
         elsif boolean_exp_2 then
            r <= a+c;
         else
            r <= d+1;
         end if;
      end process;
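
One way the adder in Example 2 could be shared is sketched below: the if statement only selects the two operands, and a single adder follows it. The entity name `share_ex2`, the 8-bit `unsigned` ports, and the `sel1`/`sel2` inputs (standing in for `boolean_exp_1` and `boolean_exp_2`) are assumptions made only so that the fragment compiles on its own.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical wrapper: sel1/sel2 stand in for boolean_exp_1/boolean_exp_2.
entity share_ex2 is
   port(
      a, b, c, d: in unsigned(7 downto 0);
      sel1, sel2: in std_logic;
      r: out unsigned(7 downto 0)
   );
end share_ex2;

architecture shared_arch of share_ex2 is
   signal src0, src1: unsigned(7 downto 0);
begin
   -- routing network: choose the two operands first
   process(a, b, c, d, sel1, sel2)
   begin
      if sel1 = '1' then
         src0 <= a;
         src1 <= b;
      elsif sel2 = '1' then
         src0 <= a;
         src1 <= c;
      else
         src0 <= d;
         src1 <= "00000001";  -- d+1 expressed as an addition
      end if;
   end process;
   -- single shared adder
   r <= src0 + src1;
end shared_arch;
```

Under these assumptions the circuit needs one adder plus two operand-selection networks, instead of two adders and an incrementor feeding an output mux.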



#### Example 3

- Original code:

      with sel select
         r <= a+b when "00",
              a+c when "01",
              d+1 when others;

- Revised code:

      with sel select
         src0 <= a when "00"|"01",
                 d when others;
      with sel select
         src1 <= b when "00",
                 c when "01",
                 "00000001" when others;
      r <= src0 + src1;



- Area:
  - Original: 2 adders, 1 incrementor, 1 mux
  - Revised: 1 adder, 2 mux

#### Example 4

- Original code:

      process(a,b,c,d,...)
      begin
         if boolean_exp then
            x <= a + b;
            y <= (others=>'0');
         else
            x <= (others=>'0');
            y <= c + d;
         end if;
      end process;

- Area: 2 adders, 2 mux
- Revised code:

      process(a,b,c,d,sum,...)
      begin
         if boolean_exp then
            src0 <= a;
            src1 <= b;
            x <= sum;
            y <= (others=>'0');
         else
            src0 <= c;
            src1 <= d;
            x <= (others=>'0');
            y <= sum;
         end if;
      end process;
      sum <= src0 + src1;



- Area: 1 adder, 4 mux
- Is the sharing worthwhile?
  - 1 adder vs 2 mux
  - It depends . . .
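
As a rough illustration of "it depends", the gate counts from the 0.55-µm table earlier in this chapter can be plugged in for 8-bit operands. This assumes that the "mux" column is a 2-to-1 multiplexer of the given width and that the adder is the area-optimized $+_a$ version; neither assumption is stated on the slide.

$$
\begin{aligned}
\text{original (2 adders, 2 mux):}\quad & 2 \cdot 51 + 2 \cdot 21 = 144 \text{ gates} \\
\text{revised (1 adder, 4 mux):}\quad & 51 + 4 \cdot 21 = 135 \text{ gates}
\end{aligned}
$$

Under these assumptions the saving is only 9 gates, and the revised circuit has one extra mux delay (0.3 ns in the table) in front of the adder. With the delay-optimized adder ($+_d$, 118 gates) the same calculation gives 278 versus 202 gates, so sharing pays off more clearly.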


#### Summary

- Sharing is achieved by adding routing circuits
- The merit of sharing depends on the complexity of the operator and of the routing circuit
- Ideally, synthesis software should do this

#### 3. Functionality sharing

- A large circuit involves many functions
- Several functions may be related and have common characteristics
- Several functions can share the same circuit
- Done on an "ad hoc" basis, based on the understanding and insight of the designer (i.e., "domain knowledge")
- Difficult for software to do, since it does not know the "meaning" of the functions

#### e.g., add-sub circuit

| ctrl | operation |
|------|-----------|
| 0    | a + b     |
| 1    | a - b     |

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;
    entity addsub is
       port(
          a,b: in std_logic_vector(7 downto 0);
          ctrl: in std_logic;
          r: out std_logic_vector(7 downto 0)
       );
    end addsub;

    architecture direct_arch of addsub is
       signal src0, src1, sum: signed(7 downto 0);
    begin
       src0 <= signed(a);
       src1 <= signed(b);
       sum <= src0 + src1 when ctrl='0' else
              src0 - src1;
       r <= std_logic_vector(sum);
    end direct_arch;

- Observation: a - b can be computed as a + b' + 1, where b' is the bitwise complement of b

```
architecture shared_arch of addsub is
   signal src0, src1, sum: signed(7 downto 0);
   -- carry-in; 2 bits wide so that "01" is still +1 after the
   -- sign extension performed by the signed "+" of numeric_std
   -- (a 1-bit signed "1" would be extended to -1)
   signal cin: signed(1 downto 0);
begin
   src0 <= signed(a);
   src1 <= signed(b) when ctrl='0' else
           signed(not b);
   cin <= "00" when ctrl='0' else
          "01";
   sum <= src0 + src1 + cin;
   r <= std_logic_vector(sum);
end shared_arch;
```
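
The identity behind the shared adder can be checked exhaustively in simulation. The self-checking testbench below is only a sketch; the entity name `sub_identity_tb` is hypothetical, and it compares `a - b` with `a + (not b) + 1` for every pair of 8-bit signed operands (both sides wrap modulo 2^8).

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity sub_identity_tb is
end sub_identity_tb;

architecture sim of sub_identity_tb is
begin
   process
      variable a, b: signed(7 downto 0);
   begin
      for i in -128 to 127 loop
         for j in -128 to 127 loop
            a := to_signed(i, 8);
            b := to_signed(j, 8);
            -- two's complement: -b equals (not b) + 1
            assert (a - b) = (a + (not b) + 1)
               report "a - b differs from a + b' + 1"
               severity failure;
         end loop;
      end loop;
      report "identity checked for all 8-bit operand pairs";
      wait;  -- stop the process
   end process;
end sim;
```

The integer literal `1` is used for the carry-in here, which avoids any question about how a short signed vector is extended.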




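The carry-in can also be injected manually: append one extra LSB to each operand and add $x_7x_6x_5x_4x_3x_2x_1x_0 1$ and $y_7y_6y_5y_4y_3y_2y_1y_0c_{in}$. The appended bits produce a carry into bit 0 exactly when $c_{in}=1$, and the extra LSB of the sum is discarded.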

```
architecture manual_carry_arch of addsub is
   signal src0, src1, sum: signed(8 downto 0);
   signal b_tmp: std_logic_vector(7 downto 0);
   signal cin: std_logic; -- carry-in bit
begin
   src0 <= signed(a & '1');
   b_tmp <= b when ctrl='0' else
            not b;
   cin <= '0' when ctrl='0' else
          '1';
   src1 <= signed(b_tmp & cin);
   sum <= src0 + src1;
   r <= std_logic_vector(sum(8 downto 1));
end manual_carry_arch;
```



# e.g., signed-unsigned comparator

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;
    entity comp2mode is
       port(
          a,b: in std_logic_vector(7 downto 0);
          mode: in std_logic;
          agtb: out std_logic
       );
    end comp2mode;

    architecture direct_arch of comp2mode is
       signal agtb_signed, agtb_unsigned: std_logic;
    begin
       agtb_signed <= '1' when signed(a) > signed(b) else
                      '0';
       agtb_unsigned <= '1' when unsigned(a) > unsigned(b) else
                        '0';
       agtb <= agtb_unsigned when (mode='0') else
               agtb_signed;
    end direct_arch;



- Observation:
  - Unsigned: normal comparator
  - Signed:
    - Different sign bit: the positive number is larger
    - Same sign: compare the remaining 3 LSBs. This works for negative numbers, too! E.g., 1111 (-1), 1100 (-4), 1001 (-7): 111 > 100 > 001
  - The comparison of the 3 LSBs can be shared:

        architecture shared_arch of comp2mode is
           signal a1_b0, agtb_mag: std_logic;
        begin
           -- a has the MSB set and b does not
           a1_b0 <= '1' when a(7)='1' and b(7)='0' else
                    '0';
           -- magnitude comparison of the remaining bits, shared by both modes
           agtb_mag <= '1' when unsigned(a(6 downto 0)) > unsigned(b(6 downto 0)) else
                       '0';
           agtb <= agtb_mag when (a(7)=b(7)) else
                   a1_b0 when mode='0' else
                   not a1_b0;
        end shared_arch;
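
The sign/magnitude rule for signed comparison can be checked against the `signed` ">" operator of `numeric_std`. The testbench below is a minimal sketch with a hypothetical entity name (`signed_cmp_rule_tb`); it applies the rule to all pairs of 8-bit patterns, using the 7 LSBs as the magnitude part.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity signed_cmp_rule_tb is
end signed_cmp_rule_tb;

architecture sim of signed_cmp_rule_tb is
begin
   process
      variable a, b: std_logic_vector(7 downto 0);
      variable rule_gt: boolean;
   begin
      for i in 0 to 255 loop
         for j in 0 to 255 loop
            a := std_logic_vector(to_unsigned(i, 8));
            b := std_logic_vector(to_unsigned(j, 8));
            if a(7) = b(7) then
               -- same sign: compare the remaining 7 LSBs as magnitudes
               rule_gt := unsigned(a(6 downto 0)) > unsigned(b(6 downto 0));
            else
               -- different sign: the non-negative operand is larger
               rule_gt := (a(7) = '0');
            end if;
            assert rule_gt = (signed(a) > signed(b))
               report "rule disagrees with signed comparison"
               severity failure;
         end loop;
      end loop;
      report "rule matches signed comparison for all 8-bit pairs";
      wait;  -- stop the process
   end process;
end sim;
```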

# e.g., Full comparator

    library ieee;
    use ieee.std_logic_1164.all;
    entity comp3 is
       port(
          a,b: in std_logic_vector(15 downto 0);
          agtb, altb, aeqb: out std_logic
       );
    end comp3;

    architecture direct_arch of comp3 is
    begin
       agtb <= '1' when a > b else
               '0';
       altb <= '1' when a < b else
               '0';
       aeqb <= '1' when a = b else
               '0';
    end direct_arch;

    architecture share1_arch of comp3 is
       signal gt, lt: std_logic;
    begin
       gt <= '1' when a > b else
             '0';
       lt <= '1' when a < b else
             '0';
       agtb <= gt;
       altb <= lt;
       aeqb <= not (gt or lt);
    end share1_arch;

```
architecture share2_arch of comp3 is
    signal eq, lt: std_logic;
begin
    eq <= '1' when a = b else
        '0';
    lt <= '1' when a < b else
        '0';
    aeqb <= eq;
    altb <= lt;
    agtb <= not (eq or lt);
end share2_arch;
```

- Read Sections 7.3.3 and 7.3.5

#### 4. Layout-related circuits

- After synthesis, placement and routing will derive the actual physical layout of a digital circuit on a silicon chip.
- VHDL cannot specify the exact layout
- VHDL can outline the general "shape"


- Silicon chip is a "square"
- A "two-dimensional" shape (tree or rectangle) is better than a one-dimensional shape (cascading chain)
- A conditional signal assignment/if statement forms a single "horizontal" cascading chain
- A selected signal assignment/case statement forms a large "vertical" mux
- Neither is ideal


# e.g., Reduced-xor circuit

$a_7 \oplus a_6 \oplus a_5 \oplus a_4 \oplus a_3 \oplus a_2 \oplus a_1 \oplus a_0$

    library ieee;
    use ieee.std_logic_1164.all;
    entity reduced_xor is
       port(
          a: in std_logic_vector(7 downto 0);
          y: out std_logic
       );
    end reduced_xor;

    architecture cascade1_arch of reduced_xor is
    begin
       y <= a(0) xor a(1) xor a(2) xor a(3) xor
            a(4) xor a(5) xor a(6) xor a(7);
    end cascade1_arch;

    architecture tree_arch of reduced_xor is
    begin
       y <= ((a(7) xor a(6)) xor (a(5) xor a(4))) xor
            ((a(3) xor a(2)) xor (a(1) xor a(0)));
    end tree_arch;



Comparison of n-input reduced xor

- Cascading chain:
  - Area: (n-1) xor gates
  - Delay: (n-1)
  - Coding: easy to modify (scale)
- Tree:
  - Area: (n-1) xor gates
  - Delay: log<sub>2</sub>n
  - Coding: not so easy to modify
- Software should be able to do the conversion automatically
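
The "easy to modify (scale)" property of the cascading chain can be illustrated with a width-parameterized version. This is a sketch only: the entity name `reduced_xor_n`, the `WIDTH` generic, and the use of a generate loop are assumptions and are not taken from the slides.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical width-parameterized variant of reduced_xor.
entity reduced_xor_n is
   generic(WIDTH: natural := 8);
   port(
      a: in std_logic_vector(WIDTH-1 downto 0);
      y: out std_logic
   );
end reduced_xor_n;

architecture chain_arch of reduced_xor_n is
   -- p(i) holds a(i) xor a(i-1) xor ... xor a(0)
   signal p: std_logic_vector(WIDTH-1 downto 0);
begin
   p(0) <= a(0);
   -- one xor gate per remaining input bit, connected as a chain
   chain: for i in 1 to WIDTH-1 generate
      p(i) <= p(i-1) xor a(i);
   end generate;
   y <= p(WIDTH-1);
end chain_arch;
```

Changing the input width only requires a different `WIDTH` value, whereas the tree description has to be re-parenthesized by hand.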


# e.g., Reduced-xor-vector circuit

- $y_0 = a_0$
- $y_1 = a_1 \oplus a_0$
- $y_2 = a_2 \oplus a_1 \oplus a_0$
- $y_3 = a_3 \oplus a_2 \oplus a_1 \oplus a_0$
- $y_4 = a_4 \oplus a_3 \oplus a_2 \oplus a_1 \oplus a_0$
- $y_5 = a_5 \oplus a_4 \oplus a_3 \oplus a_2 \oplus a_1 \oplus a_0$
- $y_6 = a_6 \oplus a_5 \oplus a_4 \oplus a_3 \oplus a_2 \oplus a_1 \oplus a_0$
- $y_7 = a_7 \oplus a_6 \oplus a_5 \oplus a_4 \oplus a_3 \oplus a_2 \oplus a_1 \oplus a_0$

#### Direct implementation

    library ieee;
    use ieee.std_logic_1164.all;
    entity reduced_xor_vector is
       port(
          a: in std_logic_vector(7 downto 0);
          y: out std_logic_vector(7 downto 0)
       );
    end reduced_xor_vector;

    architecture direct_arch of reduced_xor_vector is
       signal p: std_logic_vector(7 downto 0);
    begin
       y(0) <= a(0);
       y(1) <= a(1) xor a(0);
       y(2) <= a(2) xor a(1) xor a(0);
       y(3) <= a(3) xor a(2) xor a(1) xor a(0);
       y(4) <= a(4) xor a(3) xor a(2) xor a(1) xor a(0);
       y(5) <= a(5) xor a(4) xor a(3) xor a(2) xor a(1) xor a(0);
       y(6) <= a(6) xor a(5) xor a(4) xor a(3) xor a(2) xor a(1)
               xor a(0);
       y(7) <= a(7) xor a(6) xor a(5) xor a(4) xor a(3) xor a(2)
               xor a(1) xor a(0);
    end direct_arch;

#### Functionality sharing

    architecture shared1_arch of reduced_xor_vector is
       signal p: std_logic_vector(7 downto 0);
    begin
       p(0) <= a(0);
       p(1) <= p(0) xor a(1);
       p(2) <= p(1) xor a(2);
       p(3) <= p(2) xor a(3);
       p(4) <= p(3) xor a(4);
       p(5) <= p(4) xor a(5);
       p(6) <= p(5) xor a(6);
       p(7) <= p(6) xor a(7);
       y <= p;
    end shared1_arch;

    architecture shared_compact_arch of reduced_xor_vector is
       constant WIDTH: integer := 8;
       signal p: std_logic_vector(WIDTH-1 downto 0);
    begin
       -- bit i of p is p(i-1) xor a(i); bit 0 is '0' xor a(0)
       p <= (p(WIDTH-2 downto 0) & '0') xor a;
       y <= p;
    end shared_compact_arch;

#### Direct tree implementation

    architecture direct_tree_arch of reduced_xor_vector is
       signal p: std_logic_vector(7 downto 0);
    begin
       y(0) <= a(0);
       y(1) <= a(1) xor a(0);
       y(2) <= a(2) xor a(1) xor a(0);
       y(3) <= (a(3) xor a(2)) xor (a(1) xor a(0));
       y(4) <= (a(4) xor a(3)) xor (a(2) xor a(1)) xor a(0);
       y(5) <= (a(5) xor a(4)) xor (a(3) xor a(2)) xor
               (a(1) xor a(0));
       y(6) <= ((a(6) xor a(5)) xor (a(4) xor a(3))) xor
               ((a(2) xor a(1)) xor a(0));
       y(7) <= ((a(7) xor a(6)) xor (a(5) xor a(4))) xor
               ((a(3) xor a(2)) xor (a(1) xor a(0)));
    end direct_tree_arch;

    architecture optimal_tree_arch of reduced_xor_vector is
       signal p01, p23, p45, p67, p012,
              p0123, p456, p4567: std_logic;
    begin
       p01 <= a(0) xor a(1);
       p23 <= a(2) xor a(3);
       p45 <= a(4) xor a(5);
       p67 <= a(6) xor a(7);
       p012 <= p01 xor a(2);
       p0123 <= p01 xor p23;
       p456 <= p45 xor a(6);
       p4567 <= p45 xor p67;
       y(0) <= a(0);
       y(1) <= p01;
       y(2) <= p012;
       y(3) <= p0123;
       y(4) <= p0123 xor a(4);
       y(5) <= p0123 xor p45;
       y(6) <= p0123 xor p456;
       y(7) <= p0123 xor p4567;
    end optimal_tree_arch;

#### "Parallel-prefix" implementation



- Comparison of n-input reduced-xor-vector
  - Cascading chain
    - Area: (n-1) xor gates
    - Delay: (n-1)
    - Coding: easy to modify (scale)
  - Multiple trees
    - Area: O(n<sup>2</sup>) xor gates
    - Delay: log<sub>2</sub>n
    - Coding: not so easy to modify
  - Parallel-prefix
    - Area: O(nlog<sub>2</sub>n) xor gates
    - Delay: log<sub>2</sub>n
    - Coding: difficult to modify
- Software is not able to convert a cascading chain to parallel-prefix
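
One possible 8-bit parallel-prefix description is sketched below. It is an illustration only: the architecture name `para_prefix_arch` and the stage signals `s1`, `s2`, `s3` are assumptions, and the entity declaration is repeated so that the fragment compiles on its own. Each stage combines every bit with the partial result 1, 2 and then 4 positions below it, so all eight outputs are ready after three levels of xor gates.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Entity repeated here only so that the sketch compiles on its own.
entity reduced_xor_vector is
   port(
      a: in std_logic_vector(7 downto 0);
      y: out std_logic_vector(7 downto 0)
   );
end reduced_xor_vector;

architecture para_prefix_arch of reduced_xor_vector is
   signal s1, s2, s3: std_logic_vector(7 downto 0);
begin
   -- stage 1: s1(i) = a(i) xor a(i-1); s1(0) = a(0)
   s1 <= a xor (a(6 downto 0) & '0');
   -- stage 2: s2(i) covers a(i) down to a(i-3), or down to a(0)
   s2 <= s1 xor (s1(5 downto 0) & "00");
   -- stage 3: s3(i) covers a(i) down to a(0)
   s3 <= s2 xor (s2(3 downto 0) & "0000");
   y <= s3;
end para_prefix_arch;
```

The positions that are xor-ed with a constant '0' need no gate, which leaves 7 + 6 + 4 = 17 xor gates in this 8-bit case.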


# e.g., Shifter (rotating right)

#### Direct implementation

    library ieee;
    use ieee.std_logic_1164.all;
    entity rotate_right is
       port(
          a: in std_logic_vector(7 downto 0);
          amt: in std_logic_vector(2 downto 0);
          y: out std_logic_vector(7 downto 0)
       );
    end rotate_right;

    architecture direct_arch of rotate_right is
    begin
       with amt select
          y <= a                             when "000",
               a(0) & a(7 downto 1)          when "001",
               a(1 downto 0) & a(7 downto 2) when "010",
               a(2 downto 0) & a(7 downto 3) when "011",
               a(3 downto 0) & a(7 downto 4) when "100",
               a(4 downto 0) & a(7 downto 5) when "101",
               a(5 downto 0) & a(7 downto 6) when "110",
               a(6 downto 0) & a(7)          when others; -- "111"
    end direct_arch;

#### Better implementation



    architecture multi_level_arch of rotate_right is
       signal le0_out, le1_out, le2_out:
          std_logic_vector(7 downto 0);
    begin
       -- level 0, shift 0 or 1 bit
       le0_out <= a(0) & a(7 downto 1) when amt(0)='1' else
                  a;
       -- level 1, shift 0 or 2 bits
       le1_out <=
          le0_out(1 downto 0) & le0_out(7 downto 2)
             when amt(1)='1' else
          le0_out;
       -- level 2, shift 0 or 4 bits
       le2_out <=
          le1_out(3 downto 0) & le1_out(7 downto 4)
             when amt(2)='1' else
          le1_out;
       y <= le2_out;
    end multi_level_arch;

- Comparison for n-bit shifter
  - Direct implementation
    - n n-to-1 mux
    - Vertical strip with O(n<sup>2</sup>) input wiring
    - Code not so easy to modify
  - Staged implementation
    - n\*log<sub>2</sub>n 2-to-1 mux
    - Rectangular shape
    - Code easier to modify
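
To put the two mux counts on a common footing, assume (this is not stated on the slide) that each n-to-1 mux is built as a tree of $n-1$ 2-to-1 muxes. The number of 2-to-1 muxes is then:

$$
\begin{aligned}
\text{direct:}\quad & n(n-1) && = 8 \cdot 7 = 56 \quad (n = 8) \\
\text{staged:}\quad & n\log_2 n && = 8 \cdot 3 = 24 \quad (n = 8)
\end{aligned}
$$

Under the same assumption the gap widens with the width: for $n = 32$ the counts are $32 \cdot 31 = 992$ versus $32 \cdot 5 = 160$.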

# 5. General examples


- Gray code counter
- Signed addition with status
- Simple combinational multiplier


#### e.g., Gray code counter

| binary code $b_3b_2b_1b_0$ | gray code $g_3g_2g_1g_0$ |
|----------------------------|--------------------------|
| 0000                       | 0000                     |
| 0001                       | 0001                     |
| 0010                       | 0011                     |
| 0011                       | 0010                     |
| 0100                       | 0110                     |
| 0101                       | 0111                     |
| 0110                       | 0101                     |
| 0111                       | 0100                     |
| 1000                       | 1100                     |
| 1001                       | 1101                     |
| 1010                       | 1111                     |
| 1011                       | 1110                     |
| 1100                       | 1010                     |
| 1101                       | 1011                     |
| 1110                       | 1001                     |
| 1111                       | 1000                     |

| gray code | incremented gray code |
|-----------|-----------------------|
| 0000      | 0001                  |
| 0001      | 0011                  |
| 0011      | 0010                  |
| 0010      | 0110                  |
| 0110      | 0111                  |
| 0111      | 0101                  |
| 0101      | 0100                  |
| 0100      | 1100                  |
| 1100      | 1101                  |
| 1101      | 1111                  |
| 1111      | 1110                  |
| 1110      | 1010                  |
| 1010      | 1011                  |
| 1011      | 1001                  |
| 1001      | 1000                  |
| 1000      | 0000                  |

#### • Direct implementation

```
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;  -- needed by compact_arch

entity g_inc is
   port(
      g: in std_logic_vector(3 downto 0);
      g1: out std_logic_vector(3 downto 0)
   );
end g_inc;

architecture table_arch of g_inc is
begin
   -- one table row per Gray code word: g1 is the successor of g
   with g select
      g1 <= "0001" when "0000",
            "0011" when "0001",
            "0010" when "0011",
            "0110" when "0010",
            "0111" when "0110",
            "0101" when "0111",
            "0100" when "0101",
            "1100" when "0100",
            "1101" when "1100",
            "1111" when "1101",
            "1110" when "1111",
            "1010" when "1110",
            "1011" when "1010",
            "1001" when "1011",
            "1000" when "1001",
            "0000" when others; -- "1000"
end table_arch;
```

- Observation
  - Require $2^n$ rows
  - No simple algorithm for Gray code increment
  - One possible method
    - Gray to binary
    - Increment the binary number
    - Binary to gray

| binary code $b_3b_2b_1b_0$ | gray code $g_3g_2g_1g_0$ |
|----------------------------|--------------------------|
| 0000                       | 0000                     |
| 0001                       | 0001                     |
| 0010                       | 0011                     |
| 0011                       | 0010                     |
| 0100                       | 0110                     |
| 0101                       | 0111                     |
| 0110                       | 0101                     |
| 0111                       | 0100                     |
| 1000                       | 1100                     |
| 1001                       | 1101                     |
| 1010                       | 1111                     |
| 1011                       | 1110                     |
| 1100                       | 1010                     |
| 1101                       | 1011                     |
| 1110                       | 1001                     |
| 1111                       | 1000                     |

- binary to gray: $g_i = b_i \oplus b_{i+1}$
  - $g_3 = b_3 \oplus 0 = b_3$
  - $g_2 = b_2 \oplus b_3$
  - $g_1 = b_1 \oplus b_2$
  - $g_0 = b_0 \oplus b_1$
- gray to binary: $b_i = g_i \oplus b_{i+1}$
  - $b_3 = g_3 \oplus 0 = g_3$
  - $b_2 = g_2 \oplus b_3 = g_2 \oplus g_3$
  - $b_1 = g_1 \oplus b_2 = g_1 \oplus g_2 \oplus g_3$
  - $b_0 = g_0 \oplus b_1 = g_0 \oplus g_1 \oplus g_2 \oplus g_3$
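As a quick check of the three-step method, the conversions can be traced by hand for one word. The sketch below takes $g = 0110$ as an arbitrary example; the names $b^{+}$ and $g^{+}$ for the incremented binary and Gray values are notation introduced here, not the slides'.

$$
\begin{aligned}
g &= 0110 \\
b_3 &= g_3 = 0, \quad b_2 = g_2 \oplus b_3 = 1, \quad b_1 = g_1 \oplus b_2 = 0, \quad b_0 = g_0 \oplus b_1 = 0
  &&\Rightarrow\; b = 0100 \\
b^{+} &= b + 1 = 0101 \\
g^{+}_3 &= b^{+}_3 = 0, \quad g^{+}_2 = b^{+}_2 \oplus b^{+}_3 = 1, \quad g^{+}_1 = b^{+}_1 \oplus b^{+}_2 = 1, \quad g^{+}_0 = b^{+}_0 \oplus b^{+}_1 = 1
  &&\Rightarrow\; g^{+} = 0111
\end{aligned}
$$

The result agrees with the direct implementation, which maps "0110" to "0111".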

```
architecture compact_arch of g_inc is
   constant WIDTH: integer := 4;
   signal b, b1: std_logic_vector(WIDTH-1 downto 0);
begin
   -- gray to binary: b(i) = g(i) xor b(i+1), with b(WIDTH) taken as '0'
   b <= g xor ('0' & b(WIDTH-1 downto 1));
   -- binary increment (unsigned comes from ieee.numeric_std)
   b1 <= std_logic_vector(unsigned(b) + 1);
   -- binary to gray: g1(i) = b1(i) xor b1(i+1)
   g1 <= b1 xor ('0' & b1(WIDTH-1 downto 1));
end compact_arch;
```
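One way to gain confidence in the compact description is to simulate it against the expected Gray code sequence. The following is a minimal self-checking testbench sketch, not part of the original slides: the names `g_inc_tb`, `tb_arch` and `to_gray` are hypothetical, and it assumes the `g_inc` entity and its `compact_arch` architecture have been compiled into the `work` library.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity g_inc_tb is
end g_inc_tb;

architecture tb_arch of g_inc_tb is
   constant WIDTH: integer := 4;
   signal g, g1: std_logic_vector(WIDTH-1 downto 0);

   -- hypothetical helper: binary-to-Gray conversion, g_i = b_i xor b_(i+1)
   function to_gray(n: natural) return std_logic_vector is
      variable b: unsigned(WIDTH-1 downto 0);
   begin
      b := to_unsigned(n, WIDTH);
      return std_logic_vector(b xor ('0' & b(WIDTH-1 downto 1)));
   end function;
begin
   -- unit under test; the architecture name selects the description to check
   uut: entity work.g_inc(compact_arch)
      port map(g => g, g1 => g1);

   process
   begin
      -- apply every Gray code word and compare with the next word in sequence
      for i in 0 to 2**WIDTH-1 loop
         g <= to_gray(i);
         wait for 10 ns;
         assert g1 = to_gray((i+1) mod 2**WIDTH)
            report "wrong successor at step " & integer'image(i)
            severity error;
      end loop;
      report "g_inc_tb finished";
      wait;
   end process;
end tb_arch;
```

Changing the architecture name in the instantiation to `table_arch` runs the same check on the direct implementation, so the two descriptions can be compared on identical stimuli.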


#### e.g., signed addition with status

#### · Adder with

- Carry-in: need an extra bit (LSB)
- Carry-out: need an extra bit (MSB)
- Overflow:
  - the two operands have the same sign but the sum has a different sign
  - $overflow = (s_a \cdot s_b \cdot s'_s) + (s'_a \cdot s'_b \cdot s_s)$
- Zero
- Sign (of the addition result)
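The overflow condition can be checked with two 8-bit two's-complement additions. The operand values below are arbitrary examples chosen for illustration. In the first case both operands are positive and the sum does not fit in 8 bits; in the second the operands have opposite signs, so neither product term can be 1.

$$
\begin{aligned}
01100100\ (100) + 00110010\ (50) &= 10010110\ (-106): &
s_a = 0,\; s_b = 0,\; s_s = 1 &\;\Rightarrow\; overflow = (0 \cdot 0 \cdot 0) + (1 \cdot 1 \cdot 1) = 1 \\
01100100\ (100) + 11001110\ (-50) &= 00110010\ (50): &
s_a = 0,\; s_b = 1,\; s_s = 0 &\;\Rightarrow\; overflow = (0 \cdot 1 \cdot 1) + (1 \cdot 0 \cdot 0) = 0
\end{aligned}
$$

In the second addition the carry out of the MSB is discarded, which is why carry-out and overflow are reported as separate status signals.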

```
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity adder_status is
   port(
      a, b: in std_logic_vector(7 downto 0);
      cin: in std_logic;
      sum: out std_logic_vector(7 downto 0);
      cout, zero, overflow, sign: out std_logic
   );
end adder_status;

architecture arch of adder_status is
   -- bit 9: carry-out, bits 8..1: operand bits, bit 0: carry-in position
   signal a_ext, b_ext, sum_ext: signed(9 downto 0);
   signal ovf: std_logic;
   alias sign_a: std_logic is a_ext(8);
   alias sign_b: std_logic is b_ext(8);
   alias sign_s: std_logic is sum_ext(8);
begin
   -- the extra LSBs ('1' and cin) generate a carry into bit 1 when cin = '1'
   a_ext <= signed('0' & a & '1');
   b_ext <= signed('0' & b & cin);
   sum_ext <= a_ext + b_ext;
   ovf <= (sign_a and sign_b and (not sign_s)) or
          ((not sign_a) and (not sign_b) and sign_s);
   cout <= sum_ext(9);
   -- true sign of the result: the sign bit is inverted when overflow occurs
   sign <= sum_ext(8) when ovf='0' else
           not sum_ext(8);
   zero <= '1' when (sum_ext(8 downto 1)=0 and ovf='0') else
           '0';
   overflow <= ovf;
   sum <= std_logic_vector(sum_ext(8 downto 1));
end arch;
```

#### e.g., simple combinational multiplier

$$
\begin{array}{rccccccccl}
 & & & & & a_3 & a_2 & a_1 & a_0 & \text{multiplicand} \\
\times & & & & & b_3 & b_2 & b_1 & b_0 & \text{multiplier} \\
\hline
 & & & & & a_3b_0 & a_2b_0 & a_1b_0 & a_0b_0 & \\
 & & & & a_3b_1 & a_2b_1 & a_1b_1 & a_0b_1 & & \\
 & & & a_3b_2 & a_2b_2 & a_1b_2 & a_0b_2 & & & \\
+ & & a_3b_3 & a_2b_3 & a_1b_3 & a_0b_3 & & & & \\
\hline
 & y_7 & y_6 & y_5 & y_4 & y_3 & y_2 & y_1 & y_0 & \text{product}
\end{array}
$$
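Each row of the partial-product array is the multiplicand ANDed with one multiplier bit and shifted left by that bit's position, so the product is the sum of the shifted rows. A small worked example, with $a = 1011$ (11) and $b = 0101$ (5) chosen arbitrarily for illustration:

$$
\begin{aligned}
y &= (a \cdot b_0) \cdot 2^0 + (a \cdot b_1) \cdot 2^1 + (a \cdot b_2) \cdot 2^2 + (a \cdot b_3) \cdot 2^3 \\
  &= 1011 \cdot 1 + 0000 \cdot 2 + 1011 \cdot 4 + 0000 \cdot 8 \\
  &= 00001011 + 00101100 \\
  &= 00110111 \quad (55 = 11 \times 5)
\end{aligned}
$$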

```
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mult8 is
   port(
      a, b: in std_logic_vector(7 downto 0);
      y: out std_logic_vector(15 downto 0)
   );
end mult8;

architecture comb1_arch of mult8 is
   constant WIDTH: integer:=8;
   signal au, bv0, bv1, bv2, bv3, bv4, bv5, bv6, bv7:
          unsigned(WIDTH-1 downto 0);
   signal p0,p1,p2,p3,p4,p5,p6,p7,prod:
          unsigned(2*WIDTH-1 downto 0);
begin
   au <= unsigned(a);
   -- replicate each multiplier bit across the width of the multiplicand
   bv0 <= (others=>b(0));
   bv1 <= (others=>b(1));
   bv2 <= (others=>b(2));
   bv3 <= (others=>b(3));
   bv4 <= (others=>b(4));
   bv5 <= (others=>b(5));
   bv6 <= (others=>b(6));
   bv7 <= (others=>b(7));
   -- partial products: (b(i) and a), shifted left by i and zero-padded to 16 bits
   p0 <= "00000000" & (bv0 and au);
   p1 <= "0000000" & (bv1 and au) & "0";
   p2 <= "000000" & (bv2 and au) & "00";
   p3 <= "00000" & (bv3 and au) & "000";
   p4 <= "0000" & (bv4 and au) & "0000";
   p5 <= "000" & (bv5 and au) & "00000";
   p6 <= "00" & (bv6 and au) & "000000";
   p7 <= "0" & (bv7 and au) & "0000000";
   -- sum of all partial products
   prod <= ((p0+p1)+(p2+p3))+((p4+p5)+(p6+p7));
   y <= std_logic_vector(prod);
end comb1_arch;
```
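The partial-product description can be checked in simulation against ordinary integer multiplication. The sketch below is a minimal self-checking testbench and not part of the original slides: the names `mult8_tb` and `tb_arch` are hypothetical, and it assumes `mult8` with architecture `comb1_arch` has been compiled into the `work` library.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mult8_tb is
end mult8_tb;

architecture tb_arch of mult8_tb is
   signal a, b: std_logic_vector(7 downto 0);
   signal y: std_logic_vector(15 downto 0);
begin
   -- unit under test
   uut: entity work.mult8(comb1_arch)
      port map(a => a, b => b, y => y);

   process
   begin
      -- exhaustive check over all 256 x 256 unsigned operand pairs
      for i in 0 to 255 loop
         for j in 0 to 255 loop
            a <= std_logic_vector(to_unsigned(i, 8));
            b <= std_logic_vector(to_unsigned(j, 8));
            wait for 10 ns;
            assert unsigned(y) = to_unsigned(i*j, 16)
               report "mismatch for " & integer'image(i) &
                      " * " & integer'image(j)
               severity error;
         end loop;
      end loop;
      report "mult8_tb finished";
      wait;
   end process;
end tb_arch;
```

An exhaustive sweep is practical here because an 8-bit multiplier has only 65,536 input combinations; for wider operands a set of corner cases and random vectors would be a more realistic choice.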