AMD | Development of Cache Architecture from Planar to 3D Integration

Writer: Latitude Design Systems
Introduction

With the rapid advancement of computing technology, the relationship between cache architecture and chiplet technology plays a crucial role in driving progress in artificial intelligence and computing performance. This article explores the evolution from traditional planar design to complex 3D integration technology, focusing on major innovations in CPU and GPU architectures.

Advancements in CPU Cache Architecture

The evolution of CPU cache architecture reflects the advancement of computing technology. As workloads continue to grow, the demand for larger and more efficient cache systems has increased accordingly. Traditional methods of expanding cache capacity have reached their limits, necessitating innovative solutions.

Let's first analyze the evolution of AMD server LLC capacity across multiple generations.

Figure 1 illustrates the growth of AMD server CPU LLC capacity from Barcelona to Genoa across different generations, including comparisons between configurations with and without V-Cache.

The challenge of increasing cache capacity while maintaining performance has driven the development of revolutionary 3D stacking technologies. The introduction of hybrid bonding technology marks a significant advancement in this field.

Figure 2 compares micro-bump 3D cross-sections and hybrid bonding 3D cross-sections, demonstrating the significant increase in interconnect density achieved through hybrid bonding.
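
The density gain from hybrid bonding can be illustrated with a simple pitch calculation. The pitch values below are illustrative assumptions on the order of figures publicly discussed for micro-bump and hybrid-bond stacking, not exact AMD numbers:

```python
# Illustrative 3D interconnect density: micro-bump vs. hybrid bonding.
# Pitches are assumed round numbers, not AMD's exact figures.

def connections_per_mm2(pitch_um: float) -> float:
    """Connections per mm^2 for a square grid with the given pitch (in µm)."""
    per_mm = 1000.0 / pitch_um   # connections along 1 mm of edge
    return per_mm ** 2

micro_bump = connections_per_mm2(36.0)   # ~36 µm micro-bump pitch (assumed)
hybrid_bond = connections_per_mm2(9.0)   # ~9 µm hybrid-bond pitch (assumed)

print(f"micro-bump:  {micro_bump:.0f} connections/mm^2")
print(f"hybrid bond: {hybrid_bond:.0f} connections/mm^2")
print(f"density ratio: {hybrid_bond / micro_bump:.0f}x")
```

Because density scales with the inverse square of pitch, a 4x pitch reduction alone yields a 16x increase in interconnect density, which is why hybrid bonding enables cache stacking at bandwidths micro-bumps cannot reach.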

AMD’s 3D V-Cache™ technology represents a breakthrough in cache architecture.

Figure 3 illustrates the first-generation AMD 3D V-Cache™ structure, showing the stacking arrangement of structural chips, L3D cache, and CCD components.
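
The capacity benefit of this stacking can be sketched with the publicly stated first-generation figures (a 32 MB base L3 plus a 64 MB stacked cache die); the 12-CCD socket total is an illustrative extrapolation, not a figure from this article:

```python
# Back-of-the-envelope L3 capacity with 3D V-Cache. The 32 MB base die and
# 64 MB stacked die match AMD's first-generation (Zen 3) public figures;
# the 12-CCD per-socket total is an illustrative extrapolation.

BASE_L3_MB = 32    # L3 on the base CCD
VCACHE_MB = 64     # L3 on the stacked cache die

ccd_l3 = BASE_L3_MB + VCACHE_MB
print(f"L3 per CCD with V-Cache: {ccd_l3} MB")

# A hypothetical 12-CCD server socket would then carry:
print(f"L3 per 12-CCD socket: {12 * ccd_l3} MB")
```

Tripling per-CCD L3 without enlarging the CCD footprint is the key advantage: the extra SRAM sits above the existing die rather than beside it.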

This technology continues to evolve with improvements in manufacturing processes and Bond Pad Via (BPV) technology.

Figure 4 depicts the transition in BPV interface layers from MTop (M13) to Al RDL in the Zen 3 to Zen 4 3D V-Cache™ implementation.
Figure 5 compares the power delivery across BPV interfaces, illustrating how design improvements enhance power distribution efficiency.

Innovations in GPU Cache Architecture

GPU cache architecture has undergone a unique evolution, particularly with the introduction of AMD Infinity Cache™. This innovation addresses the limitations of traditional GDDR memory systems while improving both performance and energy efficiency.

Figure 6 presents a comparison of GDDR bandwidth and power consumption in "Navi 21," highlighting the bandwidth-per-watt advantages of Infinity Cache.
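
One way to see the bandwidth advantage is a simple hit-rate blending model: requests that hit the large on-die cache are served at cache bandwidth, while misses fall through to GDDR. The bandwidth and hit-rate numbers below are illustrative assumptions, not measured "Navi 21" figures:

```python
# Simple hit-rate blending model for effective bandwidth with a large
# last-level cache in front of GDDR. All numbers are illustrative
# assumptions, not measured "Navi 21" data.

def effective_bandwidth(hit_rate: float, cache_bw_gbs: float,
                        dram_bw_gbs: float) -> float:
    """Blended bandwidth if hits are served at cache speed, misses at DRAM speed."""
    return hit_rate * cache_bw_gbs + (1.0 - hit_rate) * dram_bw_gbs

dram_bw = 512.0     # e.g. a 256-bit GDDR6 interface at 16 Gbps ~= 512 GB/s
cache_bw = 1990.0   # assumed on-die cache bandwidth (illustrative)

for hit in (0.0, 0.5, 0.75):
    bw = effective_bandwidth(hit, cache_bw, dram_bw)
    print(f"hit rate {hit:.0%}: {bw:.0f} GB/s effective")
```

The model also shows the power angle: every hit avoids a GDDR transfer, so bandwidth-per-watt improves even faster than raw bandwidth.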

The implementation of Infinity Cache in the "Navi 21" architecture is a significant milestone.

Figure 7 showcases the "Navi 21" chip with Infinity Cache, demonstrating its integration into the GPU architecture.
Figure 8 illustrates a memory latency comparison between "Navi 21" (RX 6800) and previous-generation products, showing substantial latency improvements.

This evolution further developed into a chiplet-based approach in "Navi 31."

Figure 9 displays (a) the position of Infinity Cache and memory interfaces in "Navi 31" and (b) a cross-sectional view, showcasing the complexity of the chiplet-based implementation.

Advanced Integration in AI Accelerators

The development of AI accelerators has introduced new challenges and solutions for cache architectures. AMD Instinct™ MI300X represents the culmination of these advancements.

Figure 10 illustrates the AMD Instinct™ MI300X accelerator, highlighting the complex integration of multiple chips and HBM stacks.
Figure 11 presents a cross-sectional view of MI300X, providing details on the sophisticated packaging technology used to integrate multiple components.
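
The scale of such a memory system can be sketched by aggregating per-stack HBM figures. The per-stack values below are assumptions chosen to land near the published MI300X totals (8 HBM3 stacks, 192 GB), not figures taken from this article:

```python
# Aggregate capacity and bandwidth for a multi-stack HBM accelerator.
# Per-stack values are assumptions chosen to approximate published
# MI300X-class totals; they are not specified in this article.

N_STACKS = 8                 # HBM3 stacks around the compute dies
GB_PER_STACK = 24            # assumed capacity per stack
BW_PER_STACK_GBS = 665.6     # assumed bandwidth per stack

total_gb = N_STACKS * GB_PER_STACK
total_tbs = N_STACKS * BW_PER_STACK_GBS / 1000.0

print(f"capacity:  {total_gb} GB")
print(f"bandwidth: {total_tbs:.2f} TB/s")
```

The point of the aggregation is that both capacity and bandwidth scale linearly with stack count, which is exactly what the 2.5D/3D packaging shown in the cross-section makes physically possible.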

Design optimizations extend down to the finest architectural details.

Figure 12 shows the arrangement of SRAM arrays between power TSV columns, demonstrating meticulous space utilization optimization.
Figure 13 depicts the MI300A IOD, XCD, and CCD configuration, highlighting the flexible integration of different computing units.

Conclusion

The evolution of cache architecture illustrates the industry's relentless efforts to enhance computing performance. From the challenges of early planar designs to today's sophisticated 3D integration technologies, each step forward has increased computational capability. The synergy between chiplet technology and cache architecture continues to drive innovation, and future cache development will likely integrate new memory technologies to complement or replace traditional SRAM, meeting the growing demands of AI and high-performance computing applications.

References

[1] J. Wuu, M. Mantor, G. H. Loh, A. Smith, D. Johnson, D. Fisher, B. Johnson, C. Henrion, R. Schreiber, J. Lucas, S. Dussinger, A. Tomlinson, W. Walker, P. Moyer, D. Kulkarni, D. Ng, W. Jung, R. Swaminathan, and S. Naffziger, "Coevolution of Chiplet Technology and Cache Architecture for AI and Compute," in 2024 IEEE International Electron Devices Meeting (IEDM), San Francisco, CA, USA, 2024.
