

# FROM 3D TECHNOLOGY TO 2.5D AND 3D MANY-CORE ARCHITECTURES


Pascal Vivet | Cea-Leti | 21-22 Sept 2016












# **CHALLENGES OF HIGH PERFORMANCE COMPUTING**



# How to fit more ?

- ... More cores
- ... More Memory
- ... Memory closer to core
- ... Computing Model
- ... Power Efficiency
- ... Thermal Dissipation

### & cost !



# Computing Applications














- Introduction
- 3D Technology : an introduction
- State-of-Art on Circuits & Applications
- 3D Circuit Demonstrators
  - 3DNOC : A logic-on-logic multi-core
  - INTACT : An Active Interposer for computing
  - HUBEO : Photonic Interposer
- New Trends with High Density 3D technologies
- Conclusions & Perspectives










# **3D Stacking strategy : Wafer ? Die ?**



## 3D Si technologies – focus on interconnection

# **3D SILICON TECHNOLOGY / FINE PITCH CHIP-TO-WAFER ROADMAP**

## Cu/Sn solder µbumps

with pre-applied underfill

### Hybrid Cu-SiO2 bonding

Glue-less and self-alignment



5µm 10 µm pitch









<1 µm alignment accuracy using self-assembly with hybrid bonding



| µbump size        | Pitch  |
|-------------------|--------|
| Ø 80 µm           | 160 µm |
| Ø 20 µm           | 40 µm  |
| Ø 10 µm (in dev.) | 20 µm  |

# **Pre-applied underfill solution**

A. Garnier et al., ECTC 2014

# **TSV High Aspect Ratio, Metallization Challenges**


Silicon thickness ? → key contributor for thermal & stress management

Need a more aggressive TSV aspect ratio to trade off performance against thermal/stress constraints
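The link between TSV aspect ratio and silicon thickness can be sketched numerically. This is a minimal sketch, assuming the usual definition (aspect ratio = via depth / via diameter); the function name and the aspect-ratio values are illustrative and do not come from the slides.

```python
def max_silicon_thickness_um(tsv_diameter_um: float, aspect_ratio: float) -> float:
    """Deepest via (hence thickest die) reachable for a given TSV diameter.

    Assumes the usual definition: aspect ratio = via depth / via diameter.
    """
    return tsv_diameter_um * aspect_ratio


# Illustrative aspect ratios (assumed values, not taken from the slides).
for aspect_ratio in (5, 10, 20):
    thickness = max_silicon_thickness_um(10.0, aspect_ratio)
    print(f"10 um TSV at {aspect_ratio}:1 -> up to {thickness:.0f} um of silicon")
```

For a fixed TSV diameter, a higher aspect ratio allows a thicker die, which is the lever the slide points to for thermal and stress management.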






# **3D TECHNOLOGY : DESIGN CHALLENGES ?**








# 2.5/3DIC Commercial Announcements!





# **3D STACKED BACKSIDE IMAGERS**

# Most industrial players have adopted 3D Stacked BSI


"All in one" integration: cost effective for low-end image sensors, but large chip size & low performance for medium- to high-end applications








Source : JL Jaffard, Imaging Technologies and applications: Pioneers of TSV and 3D technologies, TSV Summit 2016











# **INTERPOSER (OR 2.5D) : XILINX VIRTEX 7 SERIES**

# • XILINX: The first 2.5D interposer product

- FPGA is split in slices, stacked onto an interposer
- Main advantages : gain in yield for very large dies
- A full product family & roadmap is available
- Xilinx is now moving to heterogeneous dies (for fast I/Os)
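The yield argument for splitting a very large die into slices can be illustrated with the classic Poisson yield model, Y = exp(-A·D0). This is a minimal sketch: the die area, defect density and slice count below are assumed values chosen for illustration, not Xilinx data.

```python
import math


def poisson_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Poisson yield model: probability that a die of this area has no killer defect."""
    return math.exp(-area_cm2 * defects_per_cm2)


# Assumed, illustrative numbers (not from the slides).
TOTAL_AREA_CM2 = 6.0      # hypothetical monolithic FPGA die area
DEFECT_DENSITY = 0.25     # hypothetical defects per cm^2
SLICES = 4                # hypothetical number of slices on the interposer

monolithic_yield = poisson_yield(TOTAL_AREA_CM2, DEFECT_DENSITY)
slice_yield = poisson_yield(TOTAL_AREA_CM2 / SLICES, DEFECT_DENSITY)

print(f"Monolithic die yield: {monolithic_yield:.1%}")
print(f"Per-slice yield:      {slice_yield:.1%}")
```

Because slices can be tested before assembly, the fraction of good silicon follows the per-slice yield rather than the much lower monolithic yield. Interposer and assembly yield are ignored in this sketch.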





TSV: Through-Silicon Via



# **3D DRAM : COMPARISON**



|                         | LPDDR4              | WideIO/2           | HBM                                               | HMC                     | DiRAM4              |
|-------------------------|---------------------|--------------------|---------------------------------------------------|-------------------------|---------------------|
| Standard                |                     | JEDEC              | JEDEC                                             | Hybrid Memory Cube      |                     |
| Vendors                 |                     | SK hynix           | SK hynix, Samsung                                 | Micron                  |                     |
| Interface type          | parallel            | wide data          | wide data                                         | serial                  | wide data or serial |
| Data bus                | 16b DDR             | 64b DDR            | 128b DDR                                          | 16 lanes                | 64b                 |
| Channel                 | 2                   | 4-8                | 8                                                 | 4-8                     |                     |
| I/O bandwidth           | 3.2Gbps<br>@1600MHz | 0.8Gbps<br>@400MHz | 1-2Gbps<br>@500-1000MHz                           | 10-15Gbps               |                     |
| Total bandwidth         | 12.8GBps            | 25.6-51.2GBps      | 128-256GBps                                       | 160-320GBps             | 2TBps               |
| Capacity                | 16GB                | 16GB               | 32GB<br>Currently 1GB (Gen1)<br>Next 4-8GB (Gen2) | 32GB<br>Currently 2-4GB | 8GB                 |
| Total I/O               | 66                  | 776                | 1616                                              | 256-512                 |                     |
| Integration / Packaging | POP, MCP            | 3D                 | 2.5D                                              | MCP                     |                     |
| Computing-In-Memory     |                     | NO                 | NO                                                | YES                     | NO                  |
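The "Total bandwidth" row can be cross-checked from the "Data bus", "Channel" and "I/O bandwidth" rows for the parallel interfaces. This is a minimal sketch, assuming total bandwidth = bus width × channels × per-pin rate / 8; the function and dictionary names are illustrative. HMC is left out because its serial lanes are counted differently.

```python
def total_bandwidth_gbytes(bus_width_bits: int, channels: int, gbps_per_pin: float) -> float:
    """Aggregate bandwidth in GB/s for a parallel DRAM interface."""
    return bus_width_bits * channels * gbps_per_pin / 8


# (bus width in bits, channels, per-pin rate in Gbps), taken from the table above.
configs = {
    "LPDDR4": (16, 2, 3.2),
    "WideIO/2, 4 channels": (64, 4, 0.8),
    "WideIO/2, 8 channels": (64, 8, 0.8),
    "HBM @ 1 Gbps": (128, 8, 1.0),
    "HBM @ 2 Gbps": (128, 8, 2.0),
}

for name, (width, channels, rate) in configs.items():
    print(f"{name:22s} {total_bandwidth_gbytes(width, channels, rate):6.1f} GB/s")
```

Under this assumption the results match the table's 12.8, 25.6-51.2 and 128-256 GB/s figures.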

# **3D DRAM MEMORY STACKING : HMC VS HBM**

HMC (Hybrid Memory Cube), e.g. Micron: 3D stack only, no passive silicon interposer





# **HBM PRODUCT EXAMPLES (1/2)**


- In 2015, AMD presented the first commercial GPU product including HBM Gen1 memories
- "Fiji" chip is part of the Radeon **Fury graphics card series**





Source: AMD, 2015

- **Combination of:**
  - HBM DRAM memory (3D)
  - Silicon interposer (2.5D)
- 3x performance per Watt
- 60% gain in Memory BW
- 95% less PCB area versus GDDR5



### **FPGA**

• Altera integrates HBM2 memories from SK hynix in Stratix 10 products



• Integration is performed using EMIB (Embedded Multi-die Interconnect Bridge)

Heterogeneous Integration using EMIB Technology



### GPU

• NVIDIA will integrate HBM2 memory from Samsung in the "Pascal" GPU module expected in 2017.









# **3D Integration**

Source: D. Dutoit, VLSI'13




# **Comparison with LPDDR3**

### > 4x gain in power efficiency with 3D-TSV interconnect

| Memory Type          |              | LPDDR3 [1]     | WideIO (this work) |
|----------------------|--------------|----------------|--------------------|
| Package              |              | PoP / Discrete | 3D-IC              |
| BW (Gbyte/s)         |              | 6.4 GB/s       | 12.8 GB/s          |
| Total power          |              |                | 293 mW\*           |
| VDD-MPSoC            | MPSoC power  |                | 121 mW\*           |
| VDD-Mem              | Memory power |                | 81 mW\*            |
| VDD-I/O              | I/O power    |                | 91 mW\*            |
| I/O power efficiency |              | 3.7 pJ/bit\*\* | 0.9 pJ/bit\*       |

[1] Yong-Cheol Bae, et al., "A 1.2V 30nm 1.6Gb/s/pin 4Gb LPDDR3 SDRAM with input skew calibration and enhanced control scheme," ISSCC 2012.

> Measurement conditions: \* at speed (200MHz), 13N MBIST, 80°C; \*\* Read, 5pF load without ODT

Source: Dutoit, 2013 Symposia on VLSI Technology and Circuits Slide 24
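The pJ/bit figure for WideIO can be reproduced from the table's I/O power and bandwidth. This is a minimal sketch, assuming energy per bit = I/O power / bit rate, with the 12.8 GB/s bandwidth fully used; the function name is illustrative.

```python
def energy_per_bit_pj(power_mw: float, bandwidth_gbytes_per_s: float) -> float:
    """Energy per transferred bit in pJ, from power (mW) and bandwidth (GB/s)."""
    bits_per_second = bandwidth_gbytes_per_s * 8e9
    return power_mw * 1e-3 / bits_per_second * 1e12


# WideIO figures from the table: 91 mW of I/O power at 12.8 GB/s.
wideio_pj_per_bit = energy_per_bit_pj(91, 12.8)

# Ratio of the two I/O power efficiencies quoted in the table.
gain_vs_lpddr3 = 3.7 / 0.9

# The three supply-rail powers should add up to the total power.
total_power_mw = 121 + 81 + 91

print(f"WideIO I/O energy: {wideio_pj_per_bit:.2f} pJ/bit")
print(f"Gain vs LPDDR3:    {gain_vs_lpddr3:.1f}x")
print(f"Total power:       {total_power_mw} mW")
```

The result rounds to the 0.9 pJ/bit in the table, and the ratio to LPDDR3 is consistent with the ">4x" claim.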



# **SRAM-ON-LOGIC : 3DMAPS MULTI-CORE**

- 3D MAssively Parallel processor with Stacked memory
- 130nm GLOBALFOUNDRIES + Tezzaron F2F bonding
- 64 cores, 5-stage/2-way VLIW architecture
- 256KB SRAM, 1-cycle access
- 5mm x 5mm, 230 IO cells
- 277MHz Fmax, 1.5V Vdd
- 64GB/s memory BW @ 4W

64 cores, split in 2 layers (CPU ⇔ SRAM), 5-stage VLIW pipeline

- TSV: 50K used for IO & dummy
- TSV: 1.2um diameter, 5um pitch
- F2F: 50K used for memory access
- F2F: 3.4um diameter, 5um pitch





2 logic tiers, face-to-face bonded

- Top die thinned to 12um, bottom die is 765um

- GLOBALFOUNDRIES 130nm technology + Artisan library/IP



[ISSCC'2012, GeorgiaTech]




# A 3D ASYNCHRONOUS NOC FOR ENERGY EFFICIENT MULTI-CORE ARCHITECTURES

# **Energy Efficient Multi-Core**


- Performance adaptation wrt. application requirements
- High energy efficiency : cores & system communications

Design Challenge ?

- High bandwidth & energy efficient communication infrastructure

# Use 3D technology for :

- Logic-on-Logic partitioning, to scale delivered performance
- Reduce inter-chip communication power consumption

# From 2D to 3D Network-on-Chip

- Scalable and modular chip-to-chip communication
- Target both homogeneous & heterogeneous cores & technologies
- Asynchronous logic avoids global clocking, robust to thermal variations





# **3DNOC CIRCUIT : A LOGIC-ON-LOGIC MULTI-CORE**



# • 3D Network-on-Chip based multi-core

- Heterogeneous multi-core, MIMO 4G-Telecom application
- Stack 2 similar dies on top of each other
- No global clock, robust asynchronous 3D links
- Serial link for throughput / #TSV trade-off
- 3D-DFT & Fault Tolerance Scheme





|                            | GeorgiaTech<br>ISSCC'2012 | Kobe Univ.<br>ISSCC'2013        | This Work                        |
|----------------------------|---------------------------|---------------------------------|----------------------------------|
| Architecture               | Cache-on-CPU<br>Manycore  | Memory-on-Logic<br>1 layer DRAM | Logic-on-Logic<br>2 layers 3DNOC |
| Process &<br>3D technology | 130nm<br>F2F CuCu         | 90nm<br>F2B TSV                 | 65nm<br>F2B TSV                  |
| 3D Bandwidth               | 277 Mbps                  | 200 Mbps                        | 326 Mbps                         |
| 3D I/O Power               | -                         | 0.56 pJ/bit                     | 0.32 pJ/bit                      |

# • 3D Link Performances


- Fastest link, +20% (326 Mflit/s)
- Best Energy Efficiency, +40% (0.32 pJ/bit)
- Self-Adaptation to Temperature, a strong 3D concern

### [P. Vivet et al. ISSCC'16]
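The rounded "+20%" and "+40%" figures can be compared against the values in the comparison table. This is a minimal sketch: the slide does not say which reference design each percentage uses, so the sketch computes both throughput comparisons, and the function names are illustrative.

```python
def relative_gain(new: float, reference: float) -> float:
    """Relative increase of `new` over `reference` (0.2 means +20%)."""
    return (new - reference) / reference


def relative_reduction(new: float, reference: float) -> float:
    """Relative decrease of `new` below `reference` (0.4 means 40% lower)."""
    return (reference - new) / reference


# Values from the comparison table.
throughput_vs_georgiatech = relative_gain(326, 277)
throughput_vs_kobe = relative_gain(326, 200)
energy_vs_kobe = relative_reduction(0.32, 0.56)

print(f"Throughput vs GeorgiaTech: +{throughput_vs_georgiatech:.0%}")
print(f"Throughput vs Kobe Univ.:  +{throughput_vs_kobe:.0%}")
print(f"3D I/O energy vs Kobe:     -{energy_vs_kobe:.0%}")
```

The table values give roughly +18% over the fastest prior link and about 43% lower I/O energy per bit, in line with the rounded figures quoted on the slide.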



# An efficient 3DPlug (asynchronous 3DNOC including test & fault tolerance): a first step towards 3D-based computing architectures



# 3D NOC & 3D LINK : OVERVIEW

### • 3DNOC router & topology

- No use of 7x7 ports router : too large & slow !

### • Hierarchical router

- 5x5 routers for intra-die com.
- 4x4 router for inter-die com. and cores

### • Performances

- One-hop latency for intra-die com.
- Two-hop latency for inter-die com.
- Preserve throughput
- Better area than 7x7 router
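A first-order view of the area argument is to count crossbar crosspoints. This is a rough sketch under two loud assumptions: crosspoint count (ports squared) is only a proxy for router area, and each node is taken to hold one 5x5 and one 4x4 router. Buffers, arbitration and asynchronous control are ignored, and the names are illustrative.

```python
def crosspoints(ports: int) -> int:
    """Number of crosspoints in a full ports x ports crossbar."""
    return ports * ports


# Flat alternative rejected on the slide: a single 7x7 router per node.
flat_router = crosspoints(7)

# Hierarchical option: one 5x5 intra-die router plus one 4x4 inter-die router
# per node (assumed pairing, used here only as an area proxy).
hierarchical_router = crosspoints(5) + crosspoints(4)

print(f"7x7 crossbar:       {flat_router} crosspoints")
print(f"5x5 + 4x4 crossbar: {hierarchical_router} crosspoints")
```

Under this proxy the hierarchical pair needs fewer crosspoints than a single 7x7 crossbar, which is consistent with the area claim. The price is the extra hop on inter-die paths listed above.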



Fully implemented in asynchronous logic

Robust 3D interface, no clocking issues

# Each bi-directional up/down 3D link composed of:

- 3D Routing stage
- Pipeline stage
- DFT stage
- µbuffer & physical stage





# **3D TECHNOLOGY & 3DNOC CIRCUIT**

[P. Vivet et al. ISSCC'16]





# 3DNOC CIRCUIT DEMONSTRATION : SELF-ADAPTATION OF ASYNCHRONOUS LINK PERFORMANCES WRT. TEMPERATURE

### Thermal impacts in 3D?

- Due to 3D: increased power density, use of thin dies (TSVs)
- Thermal impact on package, cost, reliability, & circuit performances

### Live demo of 3DNOC circuit

- Thermal throttling using active heaters
- On-chip thermal measurements
- 3D NoC asynchronous link performance measurements with traffic generators showing self-adaptation







# 3DNOC scalability : from 2 layers to 8 layers?

Is 3DNOC circuit scalable up to 8 layers ? 





Power Map & Budget ~ 800 mW / layer

Static IR-drop analysis (APACHE/RedHawk 3D simulations): voltage drop within the stack @ 1.2V

- Impact of TSVs and µ-bumps
- Impact of the current drawn by the top dies on the IR-drop over the PG network in the bottom die
- 8 layers: worst IR-drop ~ 125 mV

[Figure: worst-instance IR-drop (mV) versus die stack configuration (single-die, 2-die, 4-die and 8-die stack); current (A) per die (bottom die, top dies 1 to 7) and total current]

[P. Vivet, to appear in JSSC'17-01]



# 3DNOC scalability : from 2 layers to 8 layers?

Thermal Model & Study

Power Map & Budget ~ 800 mW / layer

Thermal model : 3D dies + package + socket + PCB









Thermal model (FloTHERM) for the 3D chip + package + board

Thermal analysis using SAHARA & FloTHERM



# • 3DNOC Thermal Dissipation



Thermal Dissipation with regular packaging (8 layers, Pmax=6 Watts, Tmax=94°C)

For limited power budget :

- Power delivery is sufficient (< 10% IRdrop)
- Max temperature < 100°C
- → multilayer 3DNOC is feasible up to 8 layers
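The power-delivery numbers behind this conclusion can be checked with simple arithmetic. This is a minimal sketch using the figures quoted on the scalability slides (about 800 mW per layer, 1.2 V supply, worst IR-drop of about 125 mV for 8 layers). It assumes all layers draw their full budget at the same time; the variable names are illustrative.

```python
LAYERS = 8
POWER_PER_LAYER_W = 0.8     # ~800 mW / layer (from the slides)
SUPPLY_V = 1.2              # analysis done @ 1.2V (from the slides)
WORST_IR_DROP_V = 0.125     # ~125 mV for the 8-layer stack (from the slides)

# Assumption: every layer consumes its full budget simultaneously.
total_power_w = LAYERS * POWER_PER_LAYER_W
total_current_a = total_power_w / SUPPLY_V
ir_drop_fraction = WORST_IR_DROP_V / SUPPLY_V

print(f"Total stack power:   {total_power_w:.1f} W")
print(f"Total supply current: {total_current_a:.2f} A")
print(f"Worst IR-drop:       {ir_drop_fraction:.1%} of the supply")
```

The stack total is close to the 6 W used for the thermal study. All of the supply current crosses the bottom die, and the worst-case drop works out to about 10% of the 1.2 V supply.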




# **ACTIVE INTERPOSER PARTITIONING FOR MANY-CORE**



« Active » Interposer : which added value ?

- Heterogeneous 3D
  - Advanced tech node for computation within chiplets
  - Mature tech node for communication/power/DFT/etc
- Chip-to-Chip Interconnect
  - Hierarchical NoC, for energy efficient communications
- System IOs
  - On Interposer, for off-chip memory accesses
- Power Management
  - Chiplet power supply, without any external passives
- And most of all ... preserve (active) interposer cost !

### Target low logic density (e.g. < 10%) to preserve interposer yield & cost





# **ACTIVE INTERPOSER FOR COMPUTING : 28FDSOI CHIPLETS 3D-STACKED ON A 65NM ACTIVE INTERPOSER OFFERING A 96 CORES COMPUTE FABRIC**

µ-bumps: Ø 10 µm, pitch 20 µm



### 28nm FDSOI chiplets (x6)

- Low Power Compute Fabric
- Wide Voltage Range (0.6V to 1.2V)
- Body Biasing for logic boost & leakage ctrl

### **65nm Active Interposer**

- Power unit (Switched Cap DC-DC conv.)
- Interconnect (Network-on-Chip)
- Test, clocking, thermal sensors, etc













**TSV**: Ø 10 µm, height 100 µm

#### **Performance Targets**

- ✓ 100 GOPS
- ✓ 10 GOPS/Watt
- ✓ 25 Watts total

#### **Application Targets**

- ✓ Big Data
- ✓ Networking
- ✓ High Performance Computing



### **Cache Coherent Compute Fabric**

- 96 cores (MIPS-32bit)
- L1/L2/L3 coherent caches
- Implemented with 3D-Plugs
- Full support of Linux OS

[D. Dutoit, VLSI-Symposium'2016]
[P. Vivet, S. Cheramy, 3DIC'2015]
[P. Vivet, E. Guthmuller, ISVLSI'2015]



• Chip-to-Chip Active or Passive NoC links: high throughput, low latency, robust interface



- 3D-Plug needs to cope with :
  - DFT interface : muxes for Boundary Scan cells
  - Electrical Interface : µ-buffer cell design
  - Physical interface : layout constraints of µ-bump/TSV array, PG grid, etc.
  - Logical interface : protocol signalling, timing margins, etc.




# ON-CHIP COMMUNICATION ON INTERPOSER : PASSIVE, ACTIVE OR PHOTONIC ?





|                       | Metallic (1-4 chiplets) | Active (6 chiplets) | Photonic (6-10 chiplets) |
|-----------------------|-------------------------|---------------------|--------------------------|
| Year                  | 2015                    | 2017                | 2020                     |
| Technology            | Metallic                | Active              | Photonic                 |
| On-chip bandwidth     | ≤ 250 Gb/s              | ≤ 2 Tb/s            | > 4 Tb/s **(>2x)**       |
| Number of cores       | ≤ 16                    | ≤ 36                | > 72 (>2x)               |
| Power for on-chip com | ~ 1 W                   | ~ 20 W              | ~ 20 W (~1x)             |

Photonic : The Scale-up/Scale-out Technology !

For a given power envelope, it will offer larger traffic bandwidth, & integrate more cores onto a single package

Source: Thonnart, Y., Zid, M. "Technology assessment of silicon interposers for manycore SoCs: Active, passive, or optical?" NoCS 2014
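One way to read the table is as communication energy per bit, i.e. power divided by bandwidth. This is a rough sketch that treats the table's bounds (≤, >, ~) as nominal values, which is an assumption; real operating points may differ. The names are illustrative.

```python
def energy_per_bit_pj(power_w: float, bandwidth_tbps: float) -> float:
    """Energy per bit in pJ: watts divided by Tb/s is numerically pJ/bit."""
    return power_w / bandwidth_tbps


# (power in W, on-chip bandwidth in Tb/s), using the table bounds as nominal values.
interposers = {
    "Metallic": (1.0, 0.25),
    "Active": (20.0, 2.0),
    "Photonic": (20.0, 4.0),
}

for name, (power_w, bandwidth_tbps) in interposers.items():
    print(f"{name:9s} {energy_per_bit_pj(power_w, bandwidth_tbps):5.1f} pJ/bit")
```

With these nominal values, the photonic interposer halves the energy per bit of the active one at the same ~20 W envelope, which matches the ">2x bandwidth for ~1x power" reading of the table.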






[3] Batude, P., et al. "3DVLSI with CoolCube process: An alternative path to scaling ." VLSI technology symposium 2015


# **3D TECHNOLOGY AND NEW COMPUTING PARADIGM**







## **N3XT Architecture**

- Monolithic 3D
- 3D RRAM
- CNT FET
- Tight memory-computing integration

Claims a ~1000x gain in energy efficiency (from technology and architecture)

M. Sabry et al., « Energy-Efficient Abundant-Data Computing: The N3XT 1,000x », Computer, 2015, Volume: 48, Issue: 12



# MONOLITHIC 3D : COOLCUBE<sup>™</sup> PROCESS & DESIGN



- Bottom FET processing
  - Bulk, FinFET, FDSOI
  - Standard process
  - W/SiO2 metal lines
- Top active by direct bonding
  - SOI and etch back, or SMART CUT<sup>TM</sup> process: gives access to a large variety of substrate materials and orientations
- Low thermal budget top FET
  - Activation: SPER or ns laser anneal
  - CDE low temperature epitaxy
  - Low-TB and low-k spacers
- 3D contact realization and BEOL
  - 3D via = standard W plug in oxide
  - Cu / low-k metal lines



- Top layer @ low thermal budget (500/550°C)<sup>[1]</sup>
- High alignment precision process
- Up to 10<sup>8</sup> 3D vias per mm<sup>2</sup> => 10<sup>4</sup> x more than Cu-Cu or HD-TSV
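The density figures can be related to interconnect pitch. This is a minimal sketch, assuming a regular square array so that density = (1000 / pitch in µm)² connections per mm²; the function names are illustrative, and the pitches are those quoted earlier in the deck for µbumps and hybrid bonding.

```python
import math


def density_per_mm2(pitch_um: float) -> float:
    """Connections per mm^2 for a square array at the given pitch (assumed layout)."""
    return (1000.0 / pitch_um) ** 2


def equivalent_pitch_um(density: float) -> float:
    """Square-array pitch (in um) that would give this density per mm^2."""
    return 1000.0 / math.sqrt(density)


# Pitches quoted in the technology roadmap slides.
for label, pitch_um in (("µbump, 40 um pitch", 40.0),
                        ("µbump, 20 um pitch", 20.0),
                        ("hybrid bonding, 10 um pitch", 10.0)):
    print(f"{label:28s} {density_per_mm2(pitch_um):10.0f} per mm^2")

print(f"10^8 vias per mm^2 <-> {equivalent_pitch_um(1e8):.2f} um equivalent pitch")
```

Under this assumption, a 10 µm pitch gives 10^4 connections per mm², so 10^8 vias per mm² is indeed four orders of magnitude denser and corresponds to an equivalent pitch of about 0.1 µm.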

- EDA collaboration : Architecture level (Atrenta) ; Signoff DRC/LVS (Mentor)
- EDA tools for 3D High Density Place and Route : *required* !
- Up to 60% area reduction & 25% better performance vs 2D 28 nm (preliminary result)<sup>[2]</sup>

→ Objective : 1 node gain without scaling : 28nm / 28 nm ⇔ 14 nm

[1] P. Batude, et al., "3DVLSI with CoolCube process: An alternative path to scaling", VLSI technology symposium 2015.
 [2] H. Sarhan, et al., "An Unbalanced Area Ratio Study for High Performance Monolithic 3D Integrated Circuits", ISVLSI 2015.







- RRAM ? It is a kind of 3D device, post-processed within a regular technology process
- Co-design between Circuit Architecture & Technology is mandatory
- **Circuit Design : Crossbar exploration & Sneak Path compensation**
- System Design : non-volatile processor for IoT : fast wake-up, NV-FF, NV-SRAM, NV-REG
- Going Further ? Advanced research on-going : Logic-in-Memory, Neuromorphic



# **4 LAYERS SMART IMAGER**

#### L1 : image capture

• BSI (Back Side Illumination)

#### L2 : read out circuit

- ADC (analog & digital)
- Analog processing

#### L3 : low level processing

- SIMD digital processing array
- Distributed Memory, 1<sup>st</sup> level

#### L4 : medium level processing

- Distributed Memory, 2<sup>nd</sup> level
- Host interface, System Communication
- Image processing





# LOGIC-ON-LOGIC : 3D NEURAL NETWORK CIRCUIT

### **Neural Networks**

- Classically divided in two layers of computation
- Difficult to implement in 2D, due to high routing congestion
- Very well adapted to 3D : one neuron layer per die !



Compared to 2D, 3D offers :

- 2x better total area
- 25% better in power







| Component<br>or Block | Power<br>(mW) | Power<br>(%) | Area $(\mu m^2)$ | Area<br>(%) | Critical<br>path (ns) |
|-----------------------|---------------|--------------|------------------|-------------|-----------------------|
| TOTAL                 | 353.90        | 100.00       | 3,634,195.44     | 100.00      | 6.63                  |
| Layer 1               | 247.62        | 69.97        | 911,395.45       | 25.08       |                       |
| Decoder               | 0.35          | 0.10         | 5,913.60         | 0.16        |                       |
| Configuration         | 0.03          | 0.01         | 2,442.40         | 0.07        |                       |
| Synapses (RAM)        | 208.10        | 58.80        | 431,636.64       | 11.88       |                       |
| Neuron                | 39.15         | 11.06        | 471,402.80       | 12.97       |                       |
| Layer 2               | 106.28        | 30.03        | 2,722,799.99     | 74.92       |                       |
| Decoder               | 0.42          | 0.12         | 7,495.99         | 0.21        |                       |
| Configuration         | 0.04          | 0.01         | 3,219.20         | 0.09        |                       |
| Synapses (RAM)        | 90.18         | 25.48        | 2,544,723.20     | 70.02       |                       |
| Neuron                | 15.64         | 4.42         | 167,361.60       | 4.61        |                       |

Table 1. Characteristics and breakdown of (two-layer) 3D circuit.

| Component<br>or Block | Power<br>(mW) | Power<br>(%) | Area<br>(µm <sup>2</sup> ) | Area<br>(%) | Critical<br>path (ns) |
|-----------------------|---------------|--------------|----------------------------|-------------|-----------------------|
| TOTAL                 | 428.24        | 100.00       | 7,974,762.94               | 100.00      | 9.00                  |
| Decoder               | 1.05          | 0.24         | 13,497.90                  | 0.17        |                       |
| Configuration         | 4.32          | 1.01         | 4,506,958.60               | 56.52       |                       |
| Synapses (RAM)        | 298.28        | 69.65        | 2,976,359.84               | 37.32       |                       |
| Neuron                | 124.60        | 29.09        | 477,946.59                 | 5.99        |                       |

Table 2. Characteristics and breakdown of (two-layer) 2D circuit.

[B. Belhadj, R. Heliot, P. Vivet, CASES'2014]

More layers ? Tighter integration of Neuron, Memory, and NVM ?



# **ARCHITECTURE "DATA CENTRIC": A 3D VISION ?**



Re-visit Processing-In-Memory thanks to new technologies ?

> Interposer integration for scaling



#### Distribute the processing within the memory hierarchy

- Memory hierarchy ? programming model ? some level of coherency ?

#### Heterogeneous 3D integration

- Active Interposer, Non Volatile Memory technology, advanced node for computing

#### Scalability

- Vertically : more memory layers
- Horizontally : more chiplets




# **CONCLUSIONS & PERSPECTIVES**

#### 3D technology is mature and is already on the market !

- Imagers (Sony), MEMS
- Memory Cubes (Samsung, Hynix), with HMC, HBM, WideIO
- Xilinx Virtex7 (Passive Interposer)
- AMD & NVIDIA (GPU & HBM cubes on interposer)
- → 3D Technology and Value chain are ready and available
- → 3D CAD tools are getting mature

### Logic-on-Logic partitioning

- A large number of demonstrators ...
- 3DNOC : a first large scale 3D Network-on-Chip architecture & circuit
  - Energy efficient 3D communication, 326 Mbit/s, 0.66pJ/bit
  - Demonstrated self-adaptation to temperature, can scale up to 8 dies

#### **Chiplet partitioning for scale-out architectures**

- Cost effective, heterogeneous technologies,
- Active Interposer, INTACT, offering 96 cores, target 100 GOPS, 25 Watts
- Photonic Interposer, for future large scale many-core










# **CONCLUSIONS & PERSPECTIVES**

### 3D technology is continuously evolving !

- Smaller pitch, new technologies
- Copper-Copper Hybrid bonding
- Monolithic 3D (CoolCube™)





#### An architecture **R-evolution**

- Smaller & denser 3D interconnects will be available soon
- Many design & CAD challenges
- Need to re-think system and computer architecture
- New opportunities for many applications
  - Imagers, Neuro, Processing-In-Memory, and many others







### **CEA-LETI** design & technology teams :

- S. Thuriès, Y. Thonnart, R. Lemaire, C. Santos, B. Giraud, D. Dutoit, F. Clermidy, J. Martin, E. Guthmuller, C. Bernard, I. Miro-Panadès, F. Darve, J. Durupt, G. Pillonnet, J. Pontès, D. Varreau,
- S. Cheramy, D. Lattard, L. Arnaud, F. Bana, A. Garnier, A. Jouve, T.Mourier

## **IRT-3D project**

• Part of this work was funded thanks to the French national program "Programme d'Investissements d'Avenir, IRT Nanoelec" ANR-10-AIRT-05




