

# **COSMOS:** Coordination of High-Level Synthesis and Memory Optimization for Hardware Accelerators

Luca Piccolboni, Paolo Mantovani, Giuseppe Di Guglielmo, Luca Carloni

Columbia University, New York, USA





## Hardware Accelerators: Motivations

• Hardware accelerators are devices designed and optimized to realize very specific functionalities



## Hardware Accelerators: Architecture









Which knobs can be used to obtain several RTL implementations?

1. Loop unrolling

       for (k = 0; k < N; ++k)
           a[k] = b[k] + c[k];



Unrolling the loop by a factor of 2 (N assumed even):

    for (k = 0; k < N; k += 2) {
        a[k+0] = b[k+0] + c[k+0];
        a[k+1] = b[k+1] + c[k+1];
    }
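The 2-way unrolled loop matches the original loop only when N is even. A minimal sketch of an unrolled version with an epilogue for odd N follows; the function names `vadd_rolled` and `vadd_unrolled2` and the `int` element type are assumptions made for illustration, not taken from the slides.

```c
#include <stddef.h>

/* Reference (rolled) loop: one addition per iteration. */
void vadd_rolled(int *a, const int *b, const int *c, size_t n)
{
    for (size_t k = 0; k < n; ++k)
        a[k] = b[k] + c[k];
}

/* 2-way unrolled loop: two independent additions per iteration,
 * followed by an epilogue that handles the last element when n is odd. */
void vadd_unrolled2(int *a, const int *b, const int *c, size_t n)
{
    size_t k = 0;
    for (; k + 1 < n; k += 2) {
        a[k + 0] = b[k + 0] + c[k + 0];
        a[k + 1] = b[k + 1] + c[k + 1];
    }
    if (k < n)  /* epilogue: runs only when n is odd */
        a[k] = b[k] + c[k];
}
```

In an HLS flow the unrolling is typically requested through a tool directive rather than written by hand. The manual form only makes explicit the two additions that can be scheduled in parallel, provided the memories offer enough ports.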




2. Memory Ports



## **Motivational Examples**

- Performing an accurate and exhaustive design-space exploration for a hardware accelerator is complex:
  1. HLS tools do not always support the generation (and optimization) of the private local memories (PLMs)

## Motivational Examples: Need of multi-port memories

#### Using standard memories

#### Using multi-port memories



## **Motivational Examples**

- Performing an accurate and exhaustive design-space exploration for a hardware accelerator is complex:
  1. HLS tools do not always support the generation (and optimization) of the private local memories
  2. The algorithms adopted by HLS tools are based on heuristics that make it hard to set the knobs

## Motivational Examples: Unpredictability of HLS tools






ACM/IEEE CODES + ISSS 2017, Seoul, South Korea


## **Motivational Examples**

- Performing an accurate and exhaustive design-space exploration for a hardware accelerator is complex:
  1. HLS tools do not always support the generation (and optimization) of the private local memories
  2. The algorithms adopted by HLS tools are based on heuristics that make it hard to set the knobs
  3. HLS tools do not handle the simultaneous optimization of multiple components

## Motivational Examples: Need of compositionality



- We propose COSMOS, an automatic methodology for the design-space exploration of complex accelerators:
  1. COSMOS efficiently coordinates high-level synthesis and memory generator tools
  2. COSMOS leverages a scalable compositional design-space exploration methodology
- The methodology consists of two steps:
  - Step 1: Component Characterization
  - Step 2: Design-Space Exploration



## **Component Characterization**

- Goal: for each component of the accelerator, identify the regions that contain the Pareto-optimal implementations
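To make "Pareto-optimal" concrete, here is a minimal sketch of a Pareto filter. It assumes each implementation of a component is summarized by a pair (effective latency, area), both to be minimized; the `impl_t` type and the function names are hypothetical, and this is not the characterization algorithm used by COSMOS.

```c
#include <stdbool.h>
#include <stddef.h>

/* One synthesized implementation of a component (hypothetical fields). */
typedef struct {
    double latency; /* effective latency, lower is better */
    double area;    /* area cost, lower is better */
} impl_t;

/* p dominates q if p is no worse in both metrics and strictly better in one. */
static bool dominates(impl_t p, impl_t q)
{
    return p.latency <= q.latency && p.area <= q.area &&
           (p.latency < q.latency || p.area < q.area);
}

/* Copies the Pareto-optimal points of in[0..n) into out, preserving order,
 * and returns how many were copied. out must have room for n entries.
 * The scan is O(n^2), which is acceptable for small design spaces. */
size_t pareto_front(const impl_t *in, size_t n, impl_t *out)
{
    size_t m = 0;
    for (size_t i = 0; i < n; ++i) {
        bool dominated = false;
        for (size_t j = 0; j < n && !dominated; ++j)
            dominated = (j != i) && dominates(in[j], in[i]);
        if (!dominated)
            out[m++] = in[i];
    }
    return m;
}
```

An exhaustive filter like this needs every point to be synthesized first, which is the cost that a region-based characterization tries to avoid.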




## Component Characterization: Identifying the lower-right point








## Component Characterization: Identifying the upper-left point




## **Design-Space Exploration**




## Design-Space Exploration, Step 1: Synthesis Planning



Computational dependencies among the components of the accelerator



#### Timed Marked Graph (TMG)


The TMG sustains a throughput of

$$\vartheta = \min\left(\frac{1}{\lambda_1 + \lambda_2}, \frac{1}{\lambda_1 + \lambda_3}\right)$$
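A minimal sketch of the throughput computation for a TMG follows. It assumes that every cycle holds a single token and that λ_i is the effective latency of component i, so the cycle with the largest latency sum bounds the throughput. The `cycle_t` type and `tmg_throughput` are hypothetical names for this sketch, not part of COSMOS.

```c
#include <stddef.h>

/* One cycle of the graph: the indices of the components it traverses. */
typedef struct {
    const int *nodes; /* indices into lambda[] */
    size_t     len;   /* number of components on the cycle */
} cycle_t;

/* Returns 1 / (largest sum of latencies over all cycles).
 * By convention of this sketch, returns 0.0 if no cycle has positive latency. */
double tmg_throughput(const double *lambda, const cycle_t *cycles, size_t n_cycles)
{
    double worst = 0.0; /* largest cycle time seen so far */
    for (size_t i = 0; i < n_cycles; ++i) {
        double sum = 0.0;
        for (size_t j = 0; j < cycles[i].len; ++j)
            sum += lambda[cycles[i].nodes[j]];
        if (sum > worst)
            worst = sum;
    }
    return worst > 0.0 ? 1.0 / worst : 0.0;
}
```

For two cycles through components {1, 2} and {1, 3}, this evaluates 1 / max(λ1 + λ2, λ1 + λ3), which equals the minimum of the two reciprocals.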












## Design-Space Exploration, Step 2: Synthesis Mapping

**CASE 3**: $\lambda_k$ falls inside a region








## Experimental Results: Component Characterization






## Experimental Results: Design-Space Exploration (Efficiency)



## Experimental Results: Design-Space Exploration (Accuracy)




- We presented COSMOS, an automatic methodology for the design-space exploration (DSE) of accelerators that coordinates HLS and memory generator tools:
  1. COSMOS guarantees a richer DSE than methods that do not consider the accelerator PLMs
  2. COSMOS guarantees a much faster DSE than exhaustive methods for complex accelerators
  3. COSMOS is a scalable methodology for DSE



# COSMOS: Coordination of High-Level Synthesis and Memory Optimization for Hardware Accelerators

Questions?

**Speaker**: Luca Piccolboni, Columbia University, NY