Register Allocation

## Announcements

- Programming Project 4 due Saturday, August 18 at 11:30AM
- OH all this week.
- Ask questions via email!
- Ask questions via Piazza!
- No late submissions.

Please evaluate this course on Axess.
Your feedback really makes a difference.

## Where We Are

| Lexical Analysis |
| :---: |
| Syntax Analysis |
| Semantic Analysis |
| IR Generation |
| IR Optimization |
| Code Generation |
| Optimization | Code

## Where We Are



## Achievement unlocked

 Fan-TAC-stic! Code
## Where We Are

| Lexical Analysis |
| :---: |
| Syntax Analysis |
| Semantic Analysis |
| IR Generation |
| IR Optimization |
| Code Generation |
| Optimization | Code

## Code Generation at a Glance

- At this point, we have optimized IR code that needs to be converted into the target language (e.g. assembly, machine code).
- Goal of this stage:
- Choose the appropriate machine instructions for each IR instruction.
- Divvy up finite machine resources (registers, caches, etc.)
- Implement low-level details of the runtime environment.
- Machine-specific optimizations are often done here, though some are treated as part of a final optimization phase.


## Overview

- Register Allocation (Today)
- How to assign variables to finitely many registers?
- What to do when it can't be done?
- How to do so efficienty?
- Garbage Collection (Monday)
- How to detect reclaimable memory?
- How to reclaim memory efficiently?


## Memory Tradeoffs

- There is an enormous tradeoff between speed and size in memory.
- SRAM is fast but very expensive:
- Can keep up with processor speeds in the GHz.
- As of 2007, cost is \$10/MB
- Good luck buying 1TB of the stuff!
- Hard disks are cheap but very slow:
- As of 2012, you can buy a 2TB hard drive for about $\$ 100$
- As of 2012, good disk seek times are measured in ms (about two to four million times slower than a processor cycle!)


## The Memory Hierarchy

- Idea: Try to get the best of all worlds by using multiple types of memory.


## The Memory Hierarchy

- Idea: Try to get the best of all worlds by using multiple types of memory.



## The Memory Hierarchy

- Idea: Try to get the best of all worlds by using multiple types of memory.



## The Memory Hierarchy

- Idea: Try to get the best of all worlds by using multiple types of memory.


| $256 \mathrm{~B}-8 \mathrm{~KB}$ | $0.25-1 \mathrm{~ns}$ |
| :---: | :---: |
| $16 \mathrm{~KB}-64 \mathrm{~KB}$ | $1 \mathrm{~ns}-5 \mathrm{~ns}$ |
| $1 \mathrm{MB}-4 \mathrm{MB}$ | $5 \mathrm{~ns}-25 \mathrm{~ns}$ |
| $4 \mathrm{~GB}-256 \mathrm{~GB}$ | $25 \mathrm{~ns}-100 \mathrm{~ns}$ |
| $500 \mathrm{~GB}+$ | $3-10 \mathrm{~ms}$ |
| HUGE | $10-2000 \mathrm{~ms}$ |

## The Challenges of Code Generation

- Almost all programming languages expose a coarse view of the memory hierarchy:
- All variables live in "memory."
- Disk and network explicitly handled separately.
- (Interesting exception: Stanford's Sequoia programming language)
- Challenges in code generation:
- Position objects in a way that takes maximum advantage of the memory hierarchy.
- Do so without hints from the programmer.


## Registers

- Most machines have a set of registers, dedicated memory locations that
- can be accessed quickly,
- can have computations performed on them, and
- exist in small quantity.
- Using registers intelligently is a critical step in any compiler.
- A good register allocator can generate code orders of magnitude better than a bad register allocator.


## Register Allocation

- In TAC, there are an unlimited number of variables.
- On a physical machine there are a small number of registers:
- x86 has four general-purpose registers and a number of specialized registers.
- MIPS has twenty-four general-purpose registers and eight special-purpose registers.
- Register allocation is the process of assigning variables to registers and managing data transfer in and out of registers.


## Challenges in Register Allocation

- Registers are scarce.
- Often substantially more IR variables than registers.
- Need to find a way to reuse registers whenever possible.
- Registers are complicated.
- x86: Each register made of several smaller registers; can't use a register and its constituent registers at the same time.
- x86: Certain instructions must store their results in specific registers; can't store values there if you want to use those instructions.
- MIPS: Some registers reserved for the assembler or operating system.
- Most architectures: Some registers must be preserved across function calls.


## Goals for Today

- Introduce register allocation for a MIPSstyle machine:
- Some number of indivisible, general-purpose registers.
- Explore three algorithms for register allocation:
- Naïve ("no") register allocation.
- Linear scan register allocation.
- Graph-coloring register allocation.


## An Initial Register Allocator

- Idea: Store every value in main memory, loading values only when they're needed.
- To generate a code that performs a computation:
- Generate load instructions to pull the values from main memory into registers.
- Generate code to perform the computation on the registers.
- Generate store instructions to store the result back into main memory.


## Our Register Allocator In Action

## Our Register Allocator In Action

$\mathrm{a}=\mathrm{b}+\mathrm{c}$;
$\mathrm{d}=\mathrm{a}$;
c $=a+d ;$

## Our Register Allocator In Action

$$
\begin{aligned}
& \mathrm{a}=\mathrm{b}+\mathrm{c} ; \\
& \mathrm{d}=\mathrm{a} ; \\
& \mathrm{c}=\mathrm{a}+\mathrm{d} ;
\end{aligned}
$$

| Param N | $\mathrm{fp}+\mathbf{4 N}$ |
| :---: | :---: |
| ... | ... |
| Param 1 | fp +4 |
| Stored fp | fp +0 |
| Stored ra | fp-4 |
| a | fp-8 |
| b | fp-12 |
| c | fp-16 |
| d | fp-20 |

## Our Register Allocator In Action

$$
\begin{aligned}
& \mathrm{a}=\mathrm{b}+\mathrm{c} ; \\
& \mathrm{d}=\mathrm{a} ; \\
& \mathrm{c}=\mathrm{a}+\mathrm{di}
\end{aligned}
$$

| Param N | $\mathrm{fp}+4 \mathrm{~N}$ |
| :---: | :---: |
| ... | ... |
| Param 1 | $\mathrm{fp}+4$ |
| Stored fp | $\mathrm{fp}+0$ |
| Stored ra | fp-4 |
| a | fp-8 |
| b | fp-12 |
| c | fp-16 |
| d | fp-20 |

## Our Register Allocator In Action

$$
\begin{aligned}
& \mathrm{a}=\mathrm{b}+\mathrm{c} ; \\
& \mathrm{d}=\mathrm{a} ; \\
& \mathrm{c}=\mathrm{a}+\mathrm{d} ;
\end{aligned}
$$

$$
\text { lw } \$ \mathrm{t} 0,-12(\mathrm{fp})
$$

| Param N | $\mathrm{fp}+\mathbf{4 N}$ |
| :---: | :---: |
| ... | ... |
| Param 1 | fp +4 |
| Stored fp | fp +0 |
| Stored ra | fp-4 |
| a | fp-8 |
| b | fp-12 |
| c | fp-16 |
| d | fp-20 |

## Our Register Allocator In Action

$\mathrm{a}=\mathrm{b}+\mathrm{c}$;
$d=a ;$
$c=a+d ;$
Param N $\quad$ fp $+\mathbf{4 N}$

Param 1
Stored fp
Stored ra
a
b
fp-12
c
fp-16
fp-20

| Param N | $\mathbf{f p}+\mathbf{4 N}$ |
| :---: | :---: |
| $\ldots$ | $\ldots$ |
| Param 1 | $\mathbf{f p}+\mathbf{4}$ |
| Stored fp | $\mathbf{f p}+\mathbf{0}$ |
| Stored ra | $\mathbf{f p}-\mathbf{4}$ |
| a | $\mathbf{f p}-\mathbf{8}$ |
| b | $\mathbf{f p}-12$ |
| c | $\mathbf{f p}-16$ |
| $d$ | $\mathbf{f p}-\mathbf{2 0}$ |

$$
\begin{array}{lll}
l w & \$ t 0, & -12(f p) \\
l w & \$ t 1, & -16(f p)
\end{array}
$$

## Our Register Allocator In Action

$\mathrm{a}=\mathrm{b}+\mathrm{c} ;$
$\mathrm{d}=\mathrm{a} ;$
$\mathrm{c}=\mathrm{a}+\mathrm{d} ;$
lw $\$ t 0,-12(f p)$
lw \$t1, -16 (fp)
add \$t2, \$t0, \$t1

| Param N | $\mathbf{f p}+\mathbf{4 N}$ |
| :---: | :---: |
| $\ldots$ | $\ldots$ |
| Param 1 | $\mathbf{f p}+\mathbf{4}$ |
| Stored fp | $\mathbf{f p}+\mathbf{0}$ |
| Stored ra | $\mathbf{f p}-\mathbf{4}$ |
| a | $\mathbf{f p}-\mathbf{8}$ |
| b | $\mathbf{f p}-12$ |
| c | $\mathbf{f p}-16$ |
| $d$ | $\mathbf{f p}-20$ |

## Our Register Allocator In Action

$\mathrm{a}=\mathrm{b}+\mathrm{c} ;$
$\mathrm{d}=\mathrm{a} ;$
$\mathrm{c}=\mathrm{a}+\mathrm{d} ;$
Param N
Param 1
Stored fp
Stored ra
a
c
fp-16
d fp-20
fp +4
fp $+\mathbf{0}$
fp-4
fp-8
fp-12
lw $\$ t 0,-12(f p)$
lw \$t1, -16(fp)
add \$t2, \$t0, \$t1
sw \$t2, -8(fp)

## Our Register Allocator In Action

$$
\begin{aligned}
& \mathrm{a}=\mathrm{b}+\mathrm{c} ; \\
& \mathrm{d}=\mathrm{a} ; \\
& \mathrm{c}=\mathrm{a}+\mathrm{d} ;
\end{aligned}
$$

$$
\text { Param } N \quad f p+\mathbf{4 N}
$$

Param 1
Stored fp
Stored ra
a
b
c
d
d
fp +4
fp $+\mathbf{0}$
fp-4
fp-8
fp-12
fp-16
fp-20
lw $\$ t 0,-12(f p)$
lw \$t1, -16(fp)
add \$t2, \$t0, \$t1
sw \$t2, -8(fp)

## Our Register Allocator In Action

$$
\begin{aligned}
& \mathrm{a}=\mathrm{b}+\mathrm{c} ; \\
& \mathrm{d}=\mathrm{a} ; \\
& \mathrm{c}=\mathrm{a}+\mathrm{di}
\end{aligned}
$$

| Param N | $\mathbf{f p}+\mathbf{4}$ |
| :---: | :---: |
| $\ldots$ | $\ldots$ |

Param $1 \quad \mathrm{fp}+4$
Stored fp
Stored ra
a
b
c
d
d
lw $\$ t 0,-12(f p)$
lw \$t1, -16(fp)
add \$t2, \$t0, \$t1
sw \$t2, -8(fp)
lw $\$ t 0,-8(f p)$

## Our Register Allocator In Action

$a=b+c ;$
$d=a ;$
$c=a+d ;$

| Param N | $f p+\mathbf{4}$ |
| :---: | :---: |
| $\ldots$ | $\ldots$ |

Param 1 fp + 4
Stored fp
Stored ra
a
b
c
d
d
lw $\$ t 0,-12(f p)$
lw \$t1, -16(fp)
add \$t2, \$t0, \$t1
sw \$t2, -8(fp)
lw \$t0, -8(fp)
sw \$t0, -20(fp)

## Our Register Allocator In Action

$\mathrm{a}=\mathrm{b}+\mathrm{c} ;$
d $=a ;$
$c=a+d ;$

| Param N | $f \mathbf{p}+\mathbf{4}$ |
| :---: | :---: |
| $\ldots$ | $\ldots$ |

Param 1
Stored fp
Stored ra

| a |
| :--- |
| b |
| c |
| d |

lw $\$ t 0,-12(f p)$
lw \$t1, -16(fp)
add \$t2, \$t0, \$t1
sw \$t2, -8(fp)
lw $\$ t 0,-8(f p)$
sw \$t0, -20(fp)

## Our Register Allocator In Action

$\mathrm{a}=\mathrm{b}+\mathrm{c} ;$
d $=a ;$
$c=a+d ;$

| Param N | $f p+4$ |
| :---: | :---: |
| $\ldots$ | $\ldots$ |

Param 1
Stored fp
Stored ra
a
b
c
d
lw $\$ t 0,-12(f p)$
lw \$t1, -16(fp)
add \$t2, \$t0, \$t1
sw \$t2, -8(fp)
lw $\$ t 0,-8(f p)$
sw \$t0, -20(fp)
lw \$t0, -8(fp)

## Our Register Allocator In Action

$\mathrm{a}=\mathrm{b}+\mathrm{c}$;
d $=a ;$
$c=a+d ;$

| Param N | fp $+\mathbf{4}$ |
| :---: | :---: |
| $\ldots$ | $\ldots$ |

Param $1 \quad \mathrm{fp}+4$
Stored fp
Stored ra
a
c
d
lw $\$ t 0,-12(f p)$
lw \$t1, -16(fp)
add \$t2, \$t0, \$t1
sw \$t2, -8(fp)
lw \$t0, -8(fp)
sw \$t0, -20(fp)
lw \$t0, -8(fp)
lw \$t1, -20(fp)

## Our Register Allocator In Action

$\mathrm{a}=\mathrm{b}+\mathrm{c}$;
d $=a ;$
$c=a+d ;$

| Param N | fp $+\mathbf{4}$ |
| :---: | :---: |
| $\ldots$ | $\ldots$ |

Param $1 \quad \mathrm{fp}+4$
Stored fp
Stored ra
a
c
d
lw $\$ t 0,-12(f p)$
lw \$t1, -16(fp)
add \$t2, \$t0, \$t1
sw \$t2, -8(fp)
lw \$t0, -8(fp)
sw \$t0, -20(fp)
lw \$t0, -8(fp)
lw \$t1, -20(fp)
add \$t2, \$t0, \$t1

## Our Register Allocator In Action

$\mathrm{a}=\mathrm{b}+\mathrm{c}$;
d $=a ;$
$c=a+d ;$

| Param N | fp $+\mathbf{4}$ |
| :---: | :---: |
| $\ldots$ | $\ldots$ |

Param $1 \quad \mathrm{fp}+4$
Stored fp
Stored ra

| a | fp $-\mathbf{8}$ |
| :---: | :---: |
| $b$ | fp -12 |
| $c$ | fp -16 |
| $d$ | fp -20 |

lw \$t0, -12(fp)
lw \$t1, -16(fp)
add \$t2, \$t0, \$t1
sw \$t2, -8(fp)
lw \$t0, -8(fp)
sw \$t0, -20(fp)
lw \$t0, -8(fp)
lw \$t1, -20(fp)
add \$t2, \$t0, \$t1
sw \$t2, -16(fp)

## Analysis of our Allocator

- Disadvantage: Gross inefficiency.
- Issues unnecessary loads and stores by the dozen.
- Wastes space on values that could be stored purely in registers.
- Easily an order of magnitude or two slower than necessary.
- Unacceptable in any production compiler.
- Advantage: Simplicity.
- Can translate each piece of IR directly to assembly as we go.
- Never need to worry about running out of registers.
- Never need to worry about function calls or special-purpose registers.
- Good if you just needed to get a prototype compiler up and running.


## Building a Better Allocator

- Goal: Try to hold as many variables in registers as possible.
- Reduces memory reads/writes.
- Reduces total memory usage.
- We will need to address these questions:
- Which registers do we put variables in?
- What do we do when we run out of registers?


## Register Consistency



## Register Consistency

- At each program point, each variable must be in the same location.
- Does not mean that each variable is always stored in the same location!
- At each program point, each register holds at most one live variable.
- Can assign several variables the same register if no two of them ever will be read together.


## Live Ranges and Live Intervals

- Recall: A variable is live at a particular program point if its value may be read later before it is written.
- Can find this using global liveness analysis.
- The live range for a variable is the set of program points at which that variable is live.
- The live interval for a variable is the smallest subrange of the IR code containing all a variable's live ranges.
- A property of the IR code, not the CFG.
- Less precise than live ranges, but simpler to work with.

Live Ranges and Live Intervals

## Live Ranges and Live Intervals

_L0 :

$$
\begin{aligned}
& e=d+a \\
& f=b+c \\
& f=f+b \\
& \text { IfZ e Goto LO } \\
& d=e+f \\
& \text { Goto LI } \\
& d=e-f
\end{aligned}
$$

_L1:

$$
g=d
$$

## Live Ranges and Live Intervals

$e=d+a$
$f=b+c$
$f=f+b$
IfZ e Goto LO
$d=e+f$
Goto_L1;
LO:
$d=e-f$
LI: $\quad$
$g=d$


$$
\mathrm{g}=\mathrm{d}
$$

## Live Ranges and Live Intervals

$e=d+a$
$f=b+c$
$f=f+b$
IfZ e Goto LO
$d=e+f$
Goto_L1;
LO:
$d=e-f$
LI: $\quad$
$g=d$


$$
\begin{aligned}
& g=d \\
& \{g\}
\end{aligned}
$$

## Live Ranges and Live Intervals

$e=d+a$
$f=b+c$
$f=f+b$
IfZ e Goto LO
$d=e+f$
Goto_L1;
LO:
$d=e-f$
L1: $\quad$
$g=d$


## Live Ranges and Live Intervals




## Live Ranges and Live Intervals




## Live Ranges and Live Intervals




## Live Ranges and Live Intervals




## Live Ranges and Live Intervals

$$
\begin{aligned}
& e=d+a \\
& f=b+c \\
& f=f+b
\end{aligned}
$$

IfZ e Goto _LO

$$
d=e+f
$$

Goto _L1;
_L0:

$$
d=e-f
$$

_L1:

$$
g=d
$$



## Live Ranges and Live Intervals

$$
\begin{aligned}
& e=d+a \\
& f=b+c \\
& f=f+b
\end{aligned}
$$

IfZ e Goto _LO

$$
d=e+f
$$

Goto _L1;
_LO:

$$
d=e-f
$$

_L1:

$$
g=d
$$



## Live Ranges and Live Intervals

$$
\begin{aligned}
& \mathrm{e}=\mathrm{d}+\mathrm{a} \\
& \mathrm{f}=\mathrm{b}+\mathrm{c} \\
& \mathrm{f}=\mathrm{f}+\mathrm{b} \\
& \text { IfZ e Goto LO } \\
& \mathrm{d}=\mathrm{e}+\mathrm{f} \\
& \text { Goto } \mathrm{L} 1
\end{aligned}
$$

_L0:


$$
g=d
$$

## Live Ranges and Live Intervals

$$
\begin{aligned}
& e=d+a \\
& f=b+c \\
& f=f+b \\
& \text { IfZ e Goto LO } \\
& d=e+f \\
& \text { Goto } L \mathcal{L} 1
\end{aligned}
$$

_L0:


$$
g=d
$$

## Live Ranges and Live Intervals

$$
e=d+a
$$

$$
\{b, c, e\}
$$

$$
\{b, c, e\}
$$

$$
f=b+c
$$

$$
\{b, e, f\}
$$

$$
\{b, e, f\}
$$

$$
f=f+b
$$

$$
\{e, f\}
$$

$$
\begin{aligned}
& e=d+a \\
& f=b+c \\
& f=f+b \\
& \text { IfZ e Goto LO } \\
& d=e+f \\
& \text { Goto LI' } \\
& d=e-f
\end{aligned}
$$

_L0:

$$
g=d
$$

_L1:

## Live Ranges and Live Intervals

$$
\begin{aligned}
& e=d+a \\
& f=b+c \\
& f=f+b \\
& \text { IfZ e Goto LO } \\
& d=e+f \\
& \text { Goto } L 1 ;
\end{aligned}
$$

_L0:


$$
g=d
$$

## Live Ranges and Live Intervals



$$
\begin{aligned}
& e=d+a \\
& f=b+c \\
& f=f+b \\
& \text { IfZ e Goto _LO } \\
& d=e+f
\end{aligned}
$$

Goto _L1;
LO :


## Live Ranges and Live Intervals

|  | a |
| :---: | :---: |
| $e=d+a$ |  |
| $\mathrm{f}=\mathrm{b}+\mathrm{c}$ |  |
| $\mathrm{f}=\mathrm{f}+\mathrm{b}$ |  |
| IfZ e Goto _LO |  |
| $d=e+f$ |  |
| Goto _L1; |  |
| _L0: |  |
| $d=e-f$ |  |
| _L1: |  |
| $g=d$ |  |



## Live Ranges and Live Intervals

|  | a b |
| :---: | :---: |
| $e=d+a$ |  |
| $\mathrm{f}=\mathrm{b}+\mathrm{c}$ |  |
| $\mathrm{f}=\mathrm{f}+\mathrm{b}$ |  |
| IfZ e Goto _LO |  |
| $d=e+f$ |  |
| Goto _L1; |  |
| L0: |  |
| $d=e-f$ |  |
| L1: |  |
| $g=d$ |  |



## Live Ranges and Live Intervals



## Live Ranges and Live Intervals



## Live Ranges and Live Intervals



## Live Ranges and Live Intervals



## Live Ranges and Live Intervals



## Live Ranges and Live Intervals



## Register Allocation with Live Intervals

- Given the live intervals for all the variables in the program, we can allocate registers using a simple greedy algorithm.
- Idea: Track which registers are free at each point.
- When a live interval begins, give that variable a free register.
- When a live interval ends, the register is once again free.
- We can't always fit everything into a register; we'll see what do to in a minute.



## Register Allocation with Live Intervals



Free Registers


# Register Allocation with Live Intervals 

$$
a b c d e f g
$$

Free Registers


# Register Allocation with Live Intervals 

$$
a b c d e f g
$$

Free Registers


| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{2}$ | $\mathbf{R}_{\mathbf{3}}$ |
| :--- | :--- | :--- | :--- |

$$
a b c d e f g
$$



Free Registers


$$
a b c d e f g
$$



$$
a b c d e f g
$$



Free Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{1}$ | $\mathbf{R}_{2}$ | $\mathbf{R}_{\mathbf{2}}$ |
| :--- | :--- | :--- | :--- |

Free Registers


$$
a b c d e f g
$$

Free Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ | $\mathbf{R}_{\mathbf{2}}$ |
| :--- | :--- | :--- | :--- |

$$
a b c d e f g
$$

Free Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ | $\mathbf{R}_{\mathbf{2}}$ |
| :--- | :--- | :--- | :--- |

$$
a b c d e f g
$$

Free Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ | $\mathbf{R}_{\mathbf{2}}$ |
| :--- | :--- | :--- | :--- |

Free Registers


$$
a b c d e f g
$$

Free Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ | $\mathbf{R}_{\mathbf{3}}$ |
| :--- | :--- | :--- | :--- |

$$
a b c d e f g
$$

Free Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{2}$ | $\mathbf{R}_{3}$ |
| :--- | :--- | :--- | :--- |

## Another Example

## Another Example



## Another Example



Free Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ |
| :--- | :--- | :--- |

## Another Example



Free Registers


## Another Example



Free Registers


## Another Example



Free Registers


## Another Example



Free Registers

## Another Example



Free Registers


What do we do now?

## Register Spilling

- If a register cannot be found for a variable $v$, we may need to spill a variable.
- When a variable is spilled, it is stored in memory rather than a register.
- When we need a register for the spilled variable:
- Evict some existing register to memory.
- Load the variable into the register.
- When done, write the register back to memory and reload the register with its original value.
- Spilling is slow, but sometimes necessary.


## Another Example



Free Registers


What do we do now?

## Another Example



Free Registers | $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ |
| :--- | :--- | :--- |

## Another Example



Free Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{1}$ | $\mathbf{R}_{2}$ |
| :--- | :--- | :--- |

## Another Example



Free Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{1}$ | $\mathbf{R}_{2}$ |
| :--- | :--- | :--- | :--- |

## Another Example



Free Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{1}$ | $\mathbf{R}_{2}$ |
| :--- | :--- | :--- |

## Another Example



Free Registers | $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ |
| :--- | :--- | :--- |

## Another Example



Free Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{1}$ | $\mathbf{R}_{\mathbf{2}}$ |
| :--- | :--- | :--- |

## Another Example



Free Registers

| $\mathbf{R}_{0}$ | $\mathbf{R}_{1}$ | $\mathbf{R}_{2}$ |
| :--- | :--- | :--- |

## Another Example



Free Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{1}$ | $\mathbf{R}_{2}$ |
| :--- | :--- | :--- |

## Another Example



Free Registers

| $\mathbf{R}_{0}$ | $\mathbf{R}_{1}$ | $\mathbf{R}_{2}$ |
| :--- | :--- | :--- |

## Linear Scan Register Allocation

- This algorithm is called linear scan register allocation and is a comparatively new algorithm.
- Advantages:
- Very efficient (after computing live intervals, runs in linear time)
- Produces good code in many instances.
- Allocation step works in one pass; can generate code during iteration.
- Often used in JIT compilers like Java HotSpot.
- Disadvantages:
- Imprecise due to use of live intervals rather than live ranges.
- Other techniques known to be superior in many cases.


## Correctness Proof Sketch

- No register holds two live variables at once:
- Live intervals are conservative approximations of live ranges.
- No two variables with overlapping live ranges placed in the same register.
- At each program point, every variable is in the same location:
- All variables assigned a unique location.


## Second-Chance Bin Packing

- A more aggressive version of linear-scan.
- Uses live ranges instead of live intervals.
- If a variable must be spilled, don't spill all uses of it.
- A later live range might still fit into a register.
- Requires a final data-flow analysis to confirm variables are assigned consistent locations.
- See "Quality and Speed in Linear-scan Register Allocation" by Traub, Holloway, and Smith.


## Second-Chance Bin Packing $a b c d e f g$ <br> Free Registers $\mathbf{R}_{\mathbf{0}} \quad \mathbf{R}_{1}$

## Second-Chance Bin Packing

## a b c d ef f



Free Registers | $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ |
| :--- | :--- | :--- |

## Second-Chance Bin Packing

## a b c d ef g



Free Registers

## Second-Chance Bin Packing

## a b c d ef g



Free Registers

## Second-Chance Bin Packing



Free Registers

## Second-Chance Bin Packing

## a b c d ef f

Free Registers


## Second-Chance Bin Packing



Free Registers | $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{1}$ | $\mathbf{R}_{2}$ |
| :--- | :--- | :--- |

## Second-Chance Bin Packing

## ab c def g

Free Registers


## Second-Chance Bin Packing



Free Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{1}$ | $\mathbf{R}_{2}$ |
| :--- | :--- | :--- |

## Second-Chance Bin Packing



Free Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{1}$ | $\mathbf{R}_{2}$ |
| :--- | :--- | :--- |

## Second-Chance Bin Packing



Free Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{1}$ | $\mathbf{R}_{2}$ |
| :--- | :--- | :--- |

## Second-Chance Bin Packing



Free Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{1}$ | $\mathbf{R}_{2}$ |
| :--- | :--- | :--- |

## Second-Chance Bin Packing

$$
a b c d e f g
$$

Free Registers


## Second-Chance Bin Packing



Free Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ |
| :--- | :--- | :--- |

## Second-Chance Bin Packing

$$
a b c d e f g
$$



Free Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{1}$ | $\mathbf{R}_{\mathbf{2}}$ |
| :--- | :--- | :--- |

## Second-Chance Bin Packing

$$
a b c d e f g
$$

Free Registers


## Second-Chance Bin Packing



Free Registers | $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{1}$ | $\mathbf{R}_{2}$ |
| :--- | :--- | :--- |

## Second-Chance Bin Packing

$$
a b c d e f g
$$

Free Registers


\section*{Second-Chance Bin Packing $a \mathrm{~b} c \mathrm{~d} e f \mathrm{~g}$ Free Registers | $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ |
| :--- | :--- | :--- |}

An Entirely Different Approach

## An Entirely Different Approach



## An Entirely Different Approach

What can we infer from all these variables being

$$
\begin{gathered}
\{e, f f\} \\
d=e+f \\
\{d\}
\end{gathered} \quad \begin{gathered}
\left\{\begin{array}{c}
e, f \\
d=e-f \\
\{d
\end{array}\right]
\end{gathered}
$$ live at this point?

$$
\begin{aligned}
& \text { \{ a, b, c, d \} } 4 \\
& e=d+a \\
& \text { \{ b, c, e \} } \\
& r \\
& \begin{array}{c}
\{b, c, e\} \\
f=b+c \\
\{b, e, f\}
\end{array} \\
& \{b, e, f \\
& \mathrm{f}=\mathrm{f}+\mathrm{b} \\
& \text { \{ e, f \} }
\end{aligned}
$$

## An Entirely Different Approach



An Entirely Different Approach


An Entirely Different Approach

(g)

f

An Entirely Different Approach


An Entirely Different Approach


## An Entirely Different Approach



## An Entirely Different Approach



## The Register Interference Graph

- The register interference graph (RIG) of a control-flow graph is an undirected graph where
- Each node is a variable.
- There is an edge between two variables that are live at the same program point.
- Perform register allocation by assigning each variable a different register from all of its neighbors.
- There's just one catch...


## The One Catch

- This problem is equivalent to graphcoloring, which is NP-hard if there are at least three registers.
- No good polynomial-time algorithms (or even good approximations!) are known for this problem.
- We have to be content with a heuristic that is good enough for RIGs that arise in practice.

The One Catch to The One Catch

## The One Catch to The One Catch

If you can figure out a way to assign registers to arbitrary RIGs, you've just proven $\mathbf{P}=\mathbf{N P}$ and will get a $\mathbf{\$ 1 , 0 0 0 , 0 0 0}$ check from the Clay Mathematics Institute.

## The One Catch to The One Catch CHALLENGE ACCEPTED



If you can figure out a way to assign registers to arbitrary RIGs, you've just proven $\mathbf{P}=\mathbf{N P}$ and will get a $\mathbf{\$ 1 , 0 0 0 , 0 0 0}$ check from the Clay Mathematics Institute.

## Battling NP-Hardness

## Chaitin's Algorithm

- Intuition:
- Suppose we are trying to $k$-color a graph and find a node with fewer than $k$ edges.
- If we delete this node from the graph and color what remains, we can find a color for this node if we add it back in.
- Reason: With fewer than $k$ neighbors, some color must be left over.
- Algorithm:
- Find a node with fewer than $k$ outgoing edges.
- Remove it from the graph.
- Recursively color the rest of the graph.
- Add the node back in.
- Assign it a valid color.


## Chaitin's Algorithm

## Chaitin's Algorithm



## Chaitin's Algorithm



Registers


## Chaitin's Algorithm



Registers


## Chaitin's Algorithm



Registers


## Chaitin's Algorithm



Registers


## Chaitin's Algorithm



C

Registers


## Chaitin's Algorithm



Registers


## Chaitin's Algorithm



Registers


## Chaitin's Algorithm



Registers


## Chaitin's Algorithm



Registers


## Chaitin's Algorithm



Registers


## Chaitin's Algorithm



Registers


## Chaitin's Algorithm



Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ | $\mathbf{R}_{\mathbf{3}}$ |
| :--- | :--- | :--- | :--- |

## Chaitin's Algorithm



Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ | $\mathbf{R}_{\mathbf{3}}$ |
| :--- | :--- | :--- | :--- |

## Chaitin's Algorithm



Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ | $\mathbf{R}_{\mathbf{3}}$ |
| :--- | :--- | :--- | :--- |

## Chaitin's Algorithm



Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ | $\mathbf{R}_{\mathbf{3}}$ |
| :--- | :--- | :--- | :--- |

## Chaitin's Algorithm



Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ | $\mathbf{R}_{\mathbf{3}}$ |
| :--- | :--- | :--- | :--- |

## Chaitin's Algorithm



Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ | $\mathbf{R}_{\mathbf{3}}$ |
| :--- | :--- | :--- | :--- |

## Chaitin's Algorithm



Registers


## Chaitin's Algorithm



Registers


## Chaitin's Algorithm



Registers


## Chaitin's Algorithm



Registers


## Chaitin's Algorithm



Registers


## Chaitin's Algorithm



Registers


## Chaitin's Algorithm



Registers


## Chaitin's Algorithm



Registers


## Chaitin's Algorithm



Registers


## Chaitin's Algorithm



Registers


## Chaitin's Algorithm



Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ | $\mathbf{R}_{\mathbf{3}}$ |
| :--- | :--- | :--- | :--- |

## One Problem

- What if we can't find a node with fewer than $k$ neighbors?
- Choose and remove an arbitrary node, marking it "troublesome."
- Use heuristics to choose which one.
- When adding node back in, it may be possible to find a valid color.
- Otherwise, we have to spill that node.


## Chaitin's Algorithm Reloaded



## Chaitin's Algorithm Reloaded


g


## Chaitin's Algorithm Reloaded



Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ |
| :--- | :--- | :--- |

## Chaitin's Algorithm Reloaded



Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ |
| :--- | :--- | :--- |

## Chaitin's Algorithm Reloaded



Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ |
| :--- | :--- | :--- |

## Chaitin's Algorithm Reloaded



Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ |
| :--- | :--- | :--- |

## Chaitin's Algorithm Reloaded



Registers


## Chaitin's Algorithm Reloaded



Registers


## Chaitin's Algorithm Reloaded



Registers


## Chaitin's Algorithm Reloaded



Registers


## Chaitin's Algorithm Reloaded



Registers


## Chaitin's Algorithm Reloaded



Registers


## Chaitin's Algorithm Reloaded



## Chaitin's Algorithm Reloaded



## Chaitin's Algorithm Reloaded



Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ |
| :--- | :--- | :--- |

## Chaitin's Algorithm Reloaded



Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ |
| :--- | :--- | :--- |

## Chaitin's Algorithm Reloaded



## Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ |
| :--- | :--- | :--- |

## Chaitin's Algorithm Reloaded



## Chaitin's Algorithm Reloaded



## Chaitin's Algorithm Reloaded


g


## Chaitin's Algorithm Reloaded


$g$


## Chaitin's Algorithm Reloaded



## Chaitin's Algorithm Reloaded



## A Smarter Algorithm



## A Smarter Algorithm



## A Smarter Algorithm



Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ |
| :--- | :--- | :--- |


(spilled)

## A Smarter Algorithm


d'

## Another Example

## Another Example



## Another Example



## Another Example



## Another Example



## Another Example



## Another Example



Registers


## Another Example



Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ |
| :--- | :--- | :--- |

## Another Example



Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ |
| :--- | :--- | :--- |

## Another Example



Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ |
| :--- | :--- | :--- |

## Another Example



Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ |
| :--- | :--- | :--- |

## Another Example



Registers


## Another Example



Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ |
| :--- | :--- | :--- |

## Another Example



## Another Example



## Another Example



## Another Example



Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ |
| :--- | :--- | :--- |

## Another Example



## Another Example



## Another Example



## Another Example



## Another Example



Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ |
| :--- | :--- | :--- |

## Another Example



Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ |
| :--- | :--- | :--- |

## Another Example



Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ |
| :--- | :--- | :--- |

## Another Example



Registers


## Another Example



Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ |
| :--- | :--- | :--- |

## Another Example



## Another Example



## Another Example



Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ |
| :--- | :--- | :--- |

## Another Example



Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ |
| :--- | :--- | :--- |

## Another Example



## Another Example



## Another Example



## Another Example



Registers

| $\mathbf{R}_{\mathbf{0}}$ | $\mathbf{R}_{\mathbf{1}}$ | $\mathbf{R}_{\mathbf{2}}$ |
| :--- | :--- | :--- |

## Chaitin's Algorithm

- Advantages:
- For many control-flow graphs, finds an excellent assignment of variables to registers.
- When distinguishing variables by use, produces a precise RIG.
- Often used in production compilers like GCC.
- Disadvantages:
- Core approach based on the NP-hard graph coloring problem.
- Heuristic may produce pathologically worst-case assignments.


## Correctness Proof Sketch

- No two variables live at some point are assigned the same register.
- Forced by graph coloring.
- At any program point each variable is always in one location.
- Automatic if we assign each variable one register.
- Requires a few tricks if we separate by use case.


## Improvements to the Algorithm

- Choose what to spill intelligently.
- Use heuristics (least-commonly used, greatest improvement, etc.) to determine what to spill.
- Handle spilling intelligently.
- When spilling a variable, recompute the RIG based on the spill and use a new coloring to find a register.


## Summary of Register Allocation

- Critical step in all optimizing compilers.
- The linear scan algorithm uses live intervals to greedily assign variables to registers.
- Often used in JIT compilers due to efficiency.
- Chaitin's algorithm uses the register interference graph (based on live ranges) and graph coloring to assign registers.
- The basis for the technique used in GCC.


## Next Time

- Garbage Collection
- Reference Counting
- Mark-and-Sweep
- Stop-and-Copy
- Incremental Collectors

