

### Versal<sup>™</sup> AI Edge Series Announcement

Rehan Tahir, Senior Product Line Manager



### What's Happening at the Edge

#### The Edge





### Hypergrowth at the Edge



"Edge computing ... solves for weaknesses of the cloud"<sup>1</sup>


Edge AI chipset opportunity is 3X that of data center - \$65B in 2025<sup>2</sup>

Chart: deep learning chipset revenue (\$ billions), enterprise vs. edge, world markets, 2019–25

1: Gartner, "2021 Strategic Roadmap for Edge Computing", November 2020

2: Omdia, "Market Report: Deep Learning Chipsets", July 2020



### Now Bringing Versal ACAPs to the Edge

- Versal<sup>™</sup> ACAPs first introduced breakthrough compute for the cloud and network
- Now 'miniaturizing' this technology for performance/watt at the edge



### New Versal<sup>™</sup> Platform for Intelligence at the Edge





Smart Vision



Unmanned Aerial Vehicles



Collaborative Robotics



ADAS & Automated Drive



Endoscopy



Ultrasound





### Versal<sup>™</sup> AI Edge: Intelligence Unleashed



- 4X AI Performance/Watt vs. GPUs<sup>1</sup> with Innovations in AI Engines and Memory Hierarchy
- 10X Compute Density<sup>2</sup> with Highest Levels of Safety and Security
- World's Most Scalable and Adaptable Platform for Edge and Endpoint



1: vs. Jetson AGX Xavier, ResNet50 224x224, batch=1, <u>https://developer.nvidia.com/embedded/jetson-agx-xavier-dl-inference-benchmarks</u>

2: Compared to Zynq® UltraScale+™ MPSoCs





### 4X AI Performance/Watt



### **Proven AI Engine Architecture**

#### Array of Compute Cores

Flexible compute: fixed- & floating-point vector processors

HW adaptable to evolving algorithms

### **Tightly Coupled Memory**

Cache-less memory hierarchy

Maximizes bandwidth, ensures determinism & low latency

### **Flexible Interconnect**

- Connect any tile to any tile for custom microarchitecture
- High bandwidth

Diagram labels: AI Engine Tile, Distributed Data Memory, Interconnect

**AI Engines Array** 

(Part of ACAP Device)

#### Architected for Adaptability, Low Power, and Low Latency



### **Optimized AI Engines-ML for Machine Learning**

#### Optimized the compute core for ML

- Doubled the multipliers, doubled INT8 performance
- Native support for INT4 and BFLOAT16

#### Doubled the data memory

- From 32 kB to 64 kB
- Improved localization of data

#### **New** Memory Tile

- Up to 38 Megabytes across the AI engine array
- Higher bandwidth memory access

#### Optimized AI Engine-ML Array (Part of ACAP Device)



#### Delivering 4X ML Compute at <sup>1</sup>/<sub>2</sub> the Latency<sup>1</sup>

1: AI Engine-ML delivers 2X INT8 compute, 4X INT4 compute, and 16X BFLOAT16 compute vs. AI Engine (per core)

2: Native 32-bit support in AI Engines only
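The per-core multipliers in footnote 1 can be applied to any baseline AI Engine figure. The following is a minimal sketch of that arithmetic, assuming a normalized baseline of 1.0 per precision (a placeholder, not a measured value); the names `AIE_ML_SPEEDUP` and `scale_per_core` are illustrative only.

```python
# Per-core AI Engine-ML multipliers vs. AI Engine, as stated in footnote 1.
AIE_ML_SPEEDUP = {"INT8": 2, "INT4": 4, "BFLOAT16": 16}


def scale_per_core(baseline: float, precision: str) -> float:
    """Scale a per-core AI Engine figure to its AI Engine-ML equivalent."""
    if precision not in AIE_ML_SPEEDUP:
        raise ValueError(f"no multiplier listed for {precision!r}")
    return baseline * AIE_ML_SPEEDUP[precision]


# Hypothetical normalized baseline of 1.0 for each precision.
for precision in AIE_ML_SPEEDUP:
    print(precision, scale_per_core(1.0, precision))
```

Precisions without a listed multiplier raise an error rather than silently assuming parity.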



### **AIE-ML Complements AI Engines for Diverse Workloads**






### **Innovations in Memory Hierarchy: Accelerator RAM**

#### 4MB of On-Chip RAM for Massive Bandwidth

Keep AI compute data or safety-critical code on chip instead of storing it in DDR

#### Part of the Adaptable Memory Hierarchy

Select the right memory for bandwidth requirement





### Up to 4X Performance/Watt vs. GPUs



1: Jetson Xavier NX: https://mlcommons.org/en/inference-edge-10, batch size not provided

2: Jetson AGX Xavier run in a mid-performance & power configuration, categorized as "15 W-Mode": https://developer.nvidia.com/embedded/jetson-agx-xavier-dl-inference-benchmarks

3: Jetson AGX Xavier MAX N-Mode and Versal™ VE2802 ACAP represent the highest performing device configuration in their respective portfolios

Jetson Xavier device power estimated by subtracting published memory & I/O power from total module power

All charts are normalized
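The footnotes describe the method behind the comparison: device power is estimated as total module power minus published memory and I/O power, and the charted results are normalized. A minimal sketch of that arithmetic follows. The slide publishes no raw figures, so every number below is a placeholder, and the function names are illustrative.

```python
def device_power_w(module_power_w: float, memory_io_power_w: float) -> float:
    """Estimate device-only power by removing memory and I/O power from the module total."""
    power = module_power_w - memory_io_power_w
    if power <= 0:
        raise ValueError("memory and I/O power must be below total module power")
    return power


def perf_per_watt(throughput: float, power_w: float) -> float:
    """Throughput (for example, images per second) per watt."""
    return throughput / power_w


def normalize(values: dict[str, float], baseline: str) -> dict[str, float]:
    """Express each value relative to the baseline entry, so the baseline reads 1.0."""
    ref = values[baseline]
    return {name: value / ref for name, value in values.items()}


# Placeholder inputs for illustration only; these are NOT figures from the slide.
gpu_power = device_power_w(module_power_w=30.0, memory_io_power_w=10.0)
results = {
    "gpu": perf_per_watt(throughput=1000.0, power_w=gpu_power),
    "acap": perf_per_watt(throughput=1500.0, power_w=10.0),
}
print(normalize(results, baseline="gpu"))
```

Normalizing against one device keeps the chart readable without publishing absolute throughput or power.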





### 10X Compute Density with Highest Levels of Safety and Security



### **10X** Compute Density: Level 3 Semi-Automated Driving



\*Power levels are typical, approximate, and estimated at room temp



### Whole Application Acceleration for Real-Time Systems: From Sensor to AI to Real-Time Control

Execution time: Sense → Think (AI) → Act





### Versal<sup>™</sup> AI Edge ACAP in ADAS and Automated Driving



- Intelligent Engines for signal conditioning and low-latency AI
- Scalar Engine for decision making and vehicle control
- Scalable compute from edge sensor to domain controller<sup>1</sup>

1: Diagram demonstrates capabilities of architecture; does not represent a single chip AD system

Diagram labels: Next Gen 4/8-Mpix Multi-Camera, Radar, LiDAR, LPDDR, CPU Host Processor (Optional)

### **Fully Automotive-Qualified and Safety Certified**


Architected to Meet Stringent ISO 26262 Requirements




### **Supporting Multiple Safety Standards**





### Collaborative Robotics: AI-Based Systems Need to be Safe and Secure

#### Real-Time Precision and Control to Augment AI

Deterministic response, AI to navigate unpredictable movement of workers

#### Environmental Awareness and Perception

Sensor fusion for perception, self-learning to improve capabilities over time

#### **Predictive Maintenance**

Analyze sensor data for actionable insights to reduce downtime

#### Safety and Security are Connected Matters

Cyber-attacks create safety and data privacy risks; robotic systems require IEC 62443 compliance



### Whole Application Acceleration for Collaborative Robotics



#### Robotic Perception Systems for Real-Time Control, Safety Critical, and Predictive Maintenance

- Adaptable Engines for perception, control/networking, navigation
- AI to augment control for dynamic execution, predictive maintenance
- Scalar Engines for cybersecurity (IEC 62443), safety control, UI





### AI-Enabled Multi-Mission Payloads for UAVs

AI with Software Defined Radio (SDR), Signal Intelligence (SIGINT), Image/Video Processing

#### Vision AI for Real-Time Analysis and Response

Autonomous flight control, optimize navigation paths

#### Cognitive RF

Optimizing radio communication and protecting against malicious intrusion

#### **Diverse and Emerging Forms of AI**

AI is rapidly evolving in tactical applications and vendors will need to adapt over time

Need AI Compute in Limited Size, Weight, and Power (SWaP) and Thermal Envelope



### **Versal AI Edge for Unmanned Aerial Vehicles**




#### Multi-Mission and Situationally Aware UAVs with Low SWaP

- Adaptable Engines for sensor fusion and pre-processing
- Intelligent Engines for low power, low latency AI and signal conditioning
- Scalar Engines for command and control
- Ruggedized packaging and military-temp grade (XQ)



### **Versal ACAP Development Experience for All Developers**



| Scalar Engines         | Adaptable Engines                          | Intelligent Engines      |
|------------------------|--------------------------------------------|--------------------------|
| OS & Embedded Run-Time | Custom HW; HW IP & Accelerated Libraries   | HW Accelerated Libraries |

Versal<sup>™</sup> AI Edge ACAP



### **Market-Specific Application Stacks**

#### Examples for Automotive, Robotics, and Multi-Mission Payload Applications

- One platform with market-specific libraries, frameworks, and ecosystem to enable all developers
- Following industry standards for developing safety critical software on silicon







### World's Most Adaptable and Scalable Edge Platform



### Adaptability: From Domain Specific Architectures (DSAs) to Dynamic Function Exchange

HW/SW OVER THE AIR (OTA) UPDATE



#### DSAs for Diverse Platform Requirements

- Implement custom AI, vision, sensor strategies
- Design for different safety and security targets
- One platform for diverse end-customers' requirements

#### Hardware/Software Over-the-Air Updates

- Update your AI accelerator or fusion algorithms
- Future proof for emerging security threats
- Avoid recalls or costly re-deployment

#### Dynamic Function Exchange (DFx)

- Swap functionality in milliseconds
- Available in Adaptable Engines, DSP, AI Engines
- Fewer system components → reduce power and cost





### **Dynamic Function Exchange (DFx) in Automotive**

- **Drive Mode** (Lane Departure Warning)
- **Low Speed Mode** (Parking Assist)
- **Post-Drive Mode** (Dog Left Behind)

**Dynamic Regions** (Engines, Integrated Cores, I/O)

Fewer Devices to Reduce System-Wide Power and Cost
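Conceptually, DFx lets one dynamic region host different functions at different times, which is why fewer devices are needed. The sketch below models that idea as a mode-to-function lookup. It is a conceptual illustration only: `VehicleMode`, `DynamicRegion`, and `enter_mode` are hypothetical names, and no Xilinx tool or API is involved.

```python
from enum import Enum


class VehicleMode(Enum):
    DRIVE = "drive"
    LOW_SPEED = "low_speed"
    POST_DRIVE = "post_drive"


# Mode-to-function pairs taken from the slide.
FUNCTION_FOR_MODE = {
    VehicleMode.DRIVE: "lane_departure_warning",
    VehicleMode.LOW_SPEED: "parking_assist",
    VehicleMode.POST_DRIVE: "dog_left_behind",
}


class DynamicRegion:
    """Hypothetical stand-in for one reconfigurable region; holds one function at a time."""

    def __init__(self) -> None:
        self.loaded = None
        self.swaps = 0

    def enter_mode(self, mode: VehicleMode) -> str:
        wanted = FUNCTION_FOR_MODE[mode]
        if wanted != self.loaded:  # swap only when the required function changes
            self.loaded = wanted
            self.swaps += 1
        return self.loaded


region = DynamicRegion()
for mode in (VehicleMode.DRIVE, VehicleMode.DRIVE,
             VehicleMode.LOW_SPEED, VehicleMode.POST_DRIVE):
    print(mode.name, "->", region.enter_mode(mode))
```

Three functions share one region here, so the model needs one device where a static design would need resources for all three at once.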



#### Scale from Edge Sensor to CPU Accelerator

Segments: Intelligent Edge Sensor & End Point; Edge Aggregation & Autonomous Systems; Accelerator

| Versal AI Edge device | VE2002 | VE2102 | VE2202 | VE2302 | VE2602 | VE1752 | VE2802 |
|---|---|---|---|---|---|---|---|
| Total AI Compute (INT4) | 14 TOPS | 22 TOPS | 47 TOPS | 67 TOPS | 256 TOPS | 166 TOPS | 479 TOPS |
| Total AI Compute (INT8) | 7 TOPS | 10 TOPS | 21 TOPS | 31 TOPS | 120 TOPS | 124 TOPS | 228 TOPS |
| AI Engines (AIE / AIE-ML<sup>1</sup>) | 8 | 12 | 24 | 34 | 152 | 304 | 304 |
| Adaptable Engines | 20K LUTs | 37K LUTs | 105K LUTs | 150K LUTs | 375K LUTs | 449K LUTs | 521K LUTs |
| Total Memory | 95Mb | 103Mb | 156Mb | 172Mb | 554Mb | 253Mb | 575Mb |
| Estimated Power | 6–9W | 7–10W | 15–20W | ~20W | 50–60W | 50–60W | 75W |

Processing Subsystem: Dual-Core Arm® Cortex®-A72 Application Processing Unit / Dual-Core Arm Cortex-R5F Real-Time Processing Unit

Also listed per device: Accelerator RAM (4MB), 32G Transceivers (values shown: 32, 8, 8, 32, 44), PCIe®, and PCIe + CCIX

1: VE2xx2 devices based on AIE-ML; VE1752 device based on AIE
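The INT8 compute and estimated power rows of the table can be combined into a rough efficiency range per device. The sketch below is a back-of-envelope calculation, not a published metric. It assumes the columns read VE2002, VE2102, VE2202, VE2302, VE2602, VE1752, VE2802 from left to right, and it treats "~20W" as exactly 20 W.

```python
# (INT8 TOPS, min estimated W, max estimated W), copied column by column from the table.
# The device-to-column order is an assumption; "~20W" is treated as 20 W.
DEVICES = {
    "VE2002": (7, 6, 9),
    "VE2102": (10, 7, 10),
    "VE2202": (21, 15, 20),
    "VE2302": (31, 20, 20),
    "VE2602": (120, 50, 60),
    "VE1752": (124, 50, 60),
    "VE2802": (228, 75, 75),
}


def tops_per_watt_range(tops: float, min_w: float, max_w: float) -> tuple[float, float]:
    """Return (lowest, highest) INT8 TOPS per watt over an estimated power range."""
    return tops / max_w, tops / min_w


for name, (tops, min_w, max_w) in DEVICES.items():
    low, high = tops_per_watt_range(tops, min_w, max_w)
    print(f"{name}: {low:.2f} to {high:.2f} INT8 TOPS/W")
```

Because the power figures are estimates, the result is best read as a range rather than a single number.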



### The Only Edge AI Platform that Scales from Sensor to Accelerator on a Single Architecture<sup>1,2</sup>

| Vendor (1–100 Watts)   | 0–10 TOPS | 10–25 TOPS | 25–75 TOPS | 100+ TOPS |
|------------------------|-----------|------------|------------|-----------|
| NVIDIA (Jetson)        | •         | •          | •          |           |
| NVIDIA (T4)            |           |            |            | •         |
|                        |           |            | •          | •         |
|                        | •         | •          | •          |           |
| Texas Instruments      | •         | •          | •          |           |
| Qualcomm Snapdragon    | •         | •          |            |           |
| Qualcomm® Cloud AI 100 |           |            | •          | •         |
| (iMX8)                 | •         |            |            |           |
| Renesas                | •         |            | •          |           |

Shown in INT8 TOPS

2: Based on published sources


### Scalable for Different Requirements and Product Features



#### Scale for Varying Levels of Compute and Safety

- Scale number of sensors, AI compute, vision and video processing
- For example, scale from Level-3 ADAS to Level-5 automated driving on a single platform



#### Scale a Low-End to High-End End-Product Portfolio

- Design once, scale with same tools, SW, ecosystem, safety certification
- Scale for different price points and capabilities



#### **Explore Distributed vs. Centralized Architectures**

- "Load Balance" across the system
- Shift compute from edge sensor to central compute across a single system





### **How Customers Can Get Started**



### Availability

Documentation Available Now

Tools Available in 2<sup>nd</sup> Half of '21

ES & Production Silicon in 1<sup>st</sup> Half '22

Versal<sup>™</sup> AI Edge ACAP Eval Kit in 2<sup>nd</sup> Half '22





### **Start Prototyping Now**

### Start Now with Versal AI Core ACAP VCK190 Evaluation Kit, Migrate Later to Versal AI Edge Device



- Evaluate Key Blocks in Versal™ AI Edge
- Leverage Vitis™ Accelerated Libraries
- Breadth of Interfaces for System Testing
- System-Design Methodology Guides
- Guided Flows in Vitis and Vivado® Tools



### Versal AI Edge ACAP: Intelligence Unleashed, From Sensor to AI to Real-Time Control

- 4X AI Performance/Watt vs. GPUs<sup>1</sup> with Innovations in AI Engines and Memory Hierarchy
- 10X Compute Density<sup>2</sup> with Highest Levels of Safety and Security
- World's Most Scalable and Adaptable Platform for Edge Systems



1: vs. Jetson AGX Xavier (MAX N-Mode), ResNet50 224x224, batch=1, <u>https://developer.nvidia.com/embedded/jetson-agx-xavier-dl-inference-benchmarks</u>

2: Compared to Zynq® UltraScale+™ MPSoCs



### **XILINX**.

### **Thank You**

