# MARACAS: A Real-Time Multicore VCPU Scheduling Framework

Ying Ye, Richard West, Jingyi Zhang, Zhuoqun Cheng

Computer Science Department, Boston University



## Overview

1. Introduction
2. Quest RTOS
3. Background Scheduling
4. Memory-Aware Scheduling
5. Multicore VCPU Scheduling
6. Evaluation

## Introduction

- Multicore platforms are gaining popularity in embedded and real-time systems
  - concurrent workload support
  - less circuit area
  - lower power consumption
  - lower cost
- Complex on-chip memory hierarchies pose significant challenges for applications with real-time requirements

- Shared cache contention:
  - page coloring
  - hardware cache partitioning (Intel CAT)
  - static vs. dynamic
- Memory bus contention:
  - bank-aware memory management
  - memory throttling

## Contribution

- We propose a foreground (reservation) + background (surplus) scheduling model that
  - improves application performance
  - effectively reduces resource contention
  - integrates well with real-time scheduling algorithms
- We propose a new bus monitoring metric that accurately detects traffic

## Application

- Imprecise computation/Numeric integration
  - MPEG video decoding: mandatory to process I-frames, optional to process B- and P-frames to improve frame rate
- Mixed-criticality systems running performance-demanding applications
  - machine learning
  - computer vision

## Quest RTOS

- VCPU model (C, T) in Quest RTOS
  - C: Capacity
  - T: Period
- Partitioned scheduling using RMS
- Schedulability test: $\sum_{i=1}^{n} \frac{C_i}{T_i} \leq n(\sqrt[n]{2}-1)$
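The utilization bound above can be checked without computing an nth root, because $U \leq n(\sqrt[n]{2}-1)$ is equivalent to $(1 + U/n)^n \leq 2$. The following is a minimal C sketch of that check for the VCPUs on one core. The `struct vcpu_params` type and both function names are illustrative assumptions, not Quest's actual data structures, and the sketch assumes C and T share a time unit with T > 0.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical VCPU descriptor: capacity C and period T, same time unit. */
struct vcpu_params {
    unsigned int C;
    unsigned int T;
};

/* Total utilization sum(C_i / T_i) of the VCPUs assigned to one core. */
double core_utilization(const struct vcpu_params *v, size_t n)
{
    double u = 0.0;
    for (size_t i = 0; i < n; i++)
        u += (double)v[i].C / (double)v[i].T;
    return u;
}

/*
 * RMS utilization bound: U <= n * (2^(1/n) - 1).
 * Rearranged as (1 + U/n)^n <= 2 so that no nth root is needed.
 */
bool rms_schedulable(const struct vcpu_params *v, size_t n)
{
    if (n == 0)
        return true; /* an empty core is trivially schedulable */

    double base = 1.0 + core_utilization(v, n) / (double)n;
    double prod = 1.0;
    for (size_t i = 0; i < n; i++)
        prod *= base;
    return prod <= 2.0;
}
```

The bound is sufficient but not necessary, so a VCPU set that fails this test may still be schedulable under RMS.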



## Background Scheduling

- A VCPU enters background mode upon depleting its budget (C)
- A core enters background mode when all of its VCPUs are in background mode
- Background CPU Time (**BGT**): time a VCPU runs when its core is in background mode
- Background scheduling: schedule VCPUs when the core is in background mode
  - fair share of **BGT** amongst VCPUs on the core
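The two mode rules and the fair-share goal can be sketched in C as follows. This is only an illustration: the `struct bg_vcpu` fields and the "least BGT used so far runs next" policy are assumptions chosen to approximate fair sharing, not the scheduler's actual implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-VCPU state for one core. */
struct bg_vcpu {
    unsigned int budget;    /* remaining foreground budget in this period */
    unsigned long bgt_used; /* background CPU time consumed so far */
};

/* A core is in background mode when every VCPU has depleted its budget. */
bool core_in_background(const struct bg_vcpu *v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (v[i].budget > 0)
            return false;
    return true;
}

/*
 * Assumed fair-share policy: run the VCPU that has used the least BGT.
 * Returns the VCPU index, or -1 if the core is not in background mode
 * or has no VCPUs.
 */
int pick_background_vcpu(const struct bg_vcpu *v, size_t n)
{
    if (n == 0 || !core_in_background(v, n))
        return -1;

    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (v[i].bgt_used < v[best].bgt_used)
            best = i;
    return (int)best;
}
```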

## DRAM structure

[Figure: DRAM structure]



- Prior work [MemGuard] uses a "Rate Metric": the number of DRAM accesses over a certain period
- A raw access count does not reflect:
  - bank-level parallelism
  - row buffers
  - the sync effect

## Sync Effect



- Each task reduces its access rate by a factor of $(T-t)/T$
- Contention in $[0, t]$ remains the same





## Latency Metric

- UNC\_ARB\_TRK\_REQUEST.ALL (requests): counts all memory requests going to the memory controller request queue
- UNC\_ARB\_TRK\_OCCUPANCY.ALL (occupancy): counts cycles weighted by the number of pending requests in the queue
- Average latency: $\text{latency} = \frac{\text{occupancy}}{\text{requests}}$
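As an illustration of the ratio above, the sketch below computes the average latency over a sampling interval from two snapshots of the counters. The `struct arb_sample` type and the function name are hypothetical. How the counters are actually programmed and read is platform-specific and not shown here.

```c
#include <stdint.h>

/* Hypothetical snapshot of the two uncore counters. */
struct arb_sample {
    uint64_t requests;  /* cumulative memory requests */
    uint64_t occupancy; /* cumulative cycles weighted by pending requests */
};

/*
 * Average latency (in counter cycles per request) between two snapshots.
 * Returns 0 when no requests were issued in the interval.
 */
uint64_t avg_mem_latency(struct arb_sample prev, struct arb_sample cur)
{
    uint64_t req = cur.requests - prev.requests;
    uint64_t occ = cur.occupancy - prev.occupancy;
    return req ? occ / req : 0;
}
```

Working on counter deltas rather than absolute values keeps the result specific to the most recent interval.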

## Memory Throttling

- When a core gets throttled, background scheduling is disabled
- Latency threshold: MAX\_MEM\_LAT
  - if latency > MAX\_MEM\_LAT then num\_throttle++
- Proportional throttling
  - Every core is throttled at some point
  - Throttled time is proportional to the core's DRAM access rate
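The threshold rule and the proportionality idea can be sketched in C as below. The threshold value, the function names, and the way a throttling window is split among cores are all assumptions made for illustration. The slides state only that throttled time is proportional to each core's DRAM access rate.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_MEM_LAT 228u /* illustrative threshold, not a recommended value */

/* Increment the throttle count when the measured latency exceeds the threshold. */
unsigned int update_num_throttle(unsigned int num_throttle, uint64_t latency)
{
    if (latency > MAX_MEM_LAT)
        num_throttle++;
    return num_throttle;
}

/*
 * Assumed proportional split: divide `window` time units among n cores in
 * proportion to their DRAM access rates. out[i] receives core i's share.
 * Assumes window * rate[i] does not overflow 64 bits.
 */
void proportional_throttle(const uint64_t *rate, size_t n,
                           uint64_t window, uint64_t *out)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += rate[i];
    for (size_t i = 0; i < n; i++)
        out[i] = total ? window * rate[i] / total : 0;
}
```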

## Multicore VCPU Scheduling

- Run a migration thread with the highest priority on each core; it pushes local VCPUs to other cores, starting with the highest-utilization ones
- Only one migration thread is active during a migration period
- Executing its entire capacity C does not cause any other local VCPU to miss its deadline
- Constraint on C:

$$C \geq 2 \times E_{lock} + E_{struct}$$

- For every core, define Slack-Per-VCPU (**SPV**): $SPV = \frac{1 - \sum_{i=1}^{n} (C_i/T_i)}{n}$
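A minimal C sketch of the SPV definition follows, assuming a hypothetical `struct vcpu_ct` that holds C and T in the same time unit with T > 0. The formula is undefined for a core with no VCPUs, so returning 1.0 (the whole core is slack) in that case is an assumption of this sketch.

```c
#include <stddef.h>

/* Hypothetical VCPU descriptor: capacity C and period T, same time unit. */
struct vcpu_ct {
    unsigned int C;
    unsigned int T;
};

/* SPV = (1 - sum(C_i / T_i)) / n for the n VCPUs on one core. */
double slack_per_vcpu(const struct vcpu_ct *v, size_t n)
{
    if (n == 0)
        return 1.0; /* assumption: an empty core counts as fully slack */

    double u = 0.0;
    for (size_t i = 0; i < n; i++)
        u += (double)v[i].C / (double)v[i].T;
    return (1.0 - u) / (double)n;
}
```

For example, two VCPUs with (C, T) = (1, 4) give a utilization of 0.5 and an SPV of 0.25.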



- Balance Background CPU Time (**BGT**) used by every VCPU across cores: equalize **SPV**s of all cores
  - BGT fair sharing
  - balanced memory throttling capability on each core
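One way to move toward equal SPVs is to look for the core with the least slack per VCPU and the core with the most, and treat them as a candidate source and destination for a migration. The C sketch below shows only that selection step. The `struct migration_hint` type and the min/max pairing are assumptions for illustration, not the framework's actual balancing algorithm.

```c
#include <stddef.h>

/* Hypothetical result: candidate source/destination cores and their SPV gap. */
struct migration_hint {
    size_t src; /* core with the smallest SPV (least slack per VCPU) */
    size_t dst; /* core with the largest SPV (most slack per VCPU) */
    double gap; /* spv[dst] - spv[src]; 0 means the cores are balanced */
};

/* Scan per-core SPV values and report the most imbalanced pair. */
struct migration_hint spv_migration_hint(const double *spv, size_t ncores)
{
    struct migration_hint h = {0, 0, 0.0};
    if (ncores == 0)
        return h;

    for (size_t i = 1; i < ncores; i++) {
        if (spv[i] < spv[h.src])
            h.src = i;
        if (spv[i] > spv[h.dst])
            h.dst = i;
    }
    h.gap = spv[h.dst] - spv[h.src];
    return h;
}
```

Moving a VCPU off the source core raises its SPV and adding it to the destination lowers that core's SPV, so each such move narrows the gap as long as the destination stays schedulable.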



## Cache-Aware Scheduling

- Static cache partitioning amongst cores
  - page coloring
- New API:

bool vcpu\_create(uint C, uint T, uint cache);

- Extension of VCPU load balancing: the destination core must meet the VCPU's cache requirement
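The cache constraint on load balancing can be sketched as a filter over candidate destination cores. In the C sketch below, the `struct core_info` type, the function name, and the choice of the eligible core with the largest SPV are assumptions. The slides do not specify the unit of the `cache` argument to `vcpu_create`, so it is treated here as an abstract count of cache units.

```c
#include <stddef.h>

/* Hypothetical per-core summary used when choosing a migration target. */
struct core_info {
    double spv;         /* Slack-Per-VCPU of the core */
    unsigned int cache; /* cache units in the core's static partition */
};

/*
 * Return the index of the core (other than src) that satisfies the cache
 * requirement and has the largest SPV, or -1 if no core qualifies.
 */
int pick_destination(const struct core_info *cores, size_t n,
                     size_t src, unsigned int cache_req)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (i == src || cores[i].cache < cache_req)
            continue; /* skip the source and cores with too little cache */
        if (best < 0 || cores[i].spv > cores[(size_t)best].spv)
            best = (int)i;
    }
    return best;
}
```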

## Evaluation

- MARACAS running on the following hardware platform:

| Component | Specification                                            |
|-----------|----------------------------------------------------------|
| Processor | Intel Core i5-2500K quad-core                            |
| Caches    | 6 MB L3 cache, 12-way set associative, 4 cache slices    |
| Memory    | 8 GB 1333 MHz DDR3, 1 channel, 2 ranks, 8 KB row buffers |

- Micro-benchmark m\_jump:

      byte array[6M];
      for (uint32 j = 0; j < 8K; j += 64)
        for (uint32 i = j; i < 6M; i += 8K) {
          < Variable delay added here >
          (uint32)array[i] = i;
        }

- Three m\_jump instances (task1, task2, task3) run on separate cores without memory throttling, each with a utilization (C/T) of 50%
- In each run, a different time delay is inserted in task1 and task2; task3 has no delay
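A compilable C rendering of the m\_jump access pattern is sketched below. It makes several simplifying assumptions that are not in the slides: the variable delay is modeled as a busy-wait loop of `delay_iters` iterations, each access stores a single byte rather than a 32-bit value, and the function returns the number of stores so the pattern can be checked.

```c
#include <stddef.h>
#include <stdint.h>

#define ROW_STRIDE  (8u * 1024u) /* 8K step between accesses in the inner loop */
#define LINE_STRIDE 64u          /* 64-byte step between outer iterations */

/*
 * Walk `array` of `bytes` bytes with an 8K stride, shifting the starting
 * offset by 64 bytes on each outer iteration. Returns the number of stores.
 */
size_t m_jump(volatile uint8_t *array, size_t bytes, unsigned int delay_iters)
{
    size_t stores = 0;
    for (size_t j = 0; j < ROW_STRIDE; j += LINE_STRIDE) {
        for (size_t i = j; i < bytes; i += ROW_STRIDE) {
            /* Assumed stand-in for the variable delay: a busy-wait loop. */
            for (volatile unsigned int d = 0; d < delay_iters; d++)
                ;
            array[i] = (uint8_t)i; /* simplified one-byte store */
            stores++;
        }
    }
    return stores;
}
```

With the 6M array from the slides, every 64-byte line is written exactly once, and consecutive accesses are 8K apart, which matches the row buffer size listed for the test platform.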

| Level | Bus Traffic (GB) | Latency | task3 Instructions Retired ( $\times 10^8$ ) |
|-------|------------------|---------|----------------------------------------------|
| H     | 1128             | 228     | 249                                          |
| M     | 1049             | 183     | 304                                          |
| L     | 976              | 157     | 357                                          |

- Setting comparable thresholds:
  - rate-based: derived from Bus Traffic (1128/time)
  - latency-based: from Latency (228)
- The last column serves as a reference, showing the **expected** performance of task3 using the corresponding thresholds

- Repeat the experiment with memory throttling enabled and a fixed delay for task1/task2



## Conclusion

- MARACAS uses background time to improve task performance; when the memory bus is contended, background scheduling is disabled through throttling
- MARACAS uses a latency metric to trigger throttling, outperforming the prior rate-based approach
- MARACAS distributes background time evenly across cores, for both fairness and better throttling