Introduction
The rapid growth of data-intensive applications, such as Artificial Intelligence (AI), Machine Learning (ML), high-resolution video streaming, and Augmented Reality/Virtual Reality (AR/VR), has led to a surge in data traffic within hyperscale data centers and cloud service providers. This surge has driven a significant increase in the bandwidth demands on networking infrastructure, resulting in a doubling of the aggregate bandwidth of switch systems and Ethernet optics every two to three years.
However, this exponential increase in bandwidth comes with a significant challenge: a corresponding rise in power consumption. Data centers are typically built with fixed electrical power budgets and energy use forecasts, so bandwidth must either scale within strict power envelopes or force expensive infrastructure upgrades.
Co-Packaged Optics (CPO) is a promising answer to this problem. It is an emerging technology that integrates high-bandwidth optical engines next to a compute chip, such as a switch ASIC or a CPU/GPU, on the same substrate. By bringing the optical components closer to the compute chip, CPO offers several key advantages over traditional pluggable optical transceivers, including reduced power consumption, lower latency, and improved cost-efficiency.
In this tutorial, we will explore the fundamentals of CPO, its development, the critical components involved, system-level integration challenges, and the potential impact of this technology on hyperscale networking and high-performance computing applications.
The Need for Co-Packaged Optics
The growing demand for data center bandwidth and the associated power consumption challenges are driving the need for innovative solutions. Figure 1 illustrates the trends in ASIC and Ethernet bandwidth scaling over the past several years, highlighting the exponential growth in both areas.
Figure 1: ASIC and Ethernet bandwidth evolution with time depicting (a) ASIC bandwidth doubling roughly every 2 years, and (b) Ethernet speeds scaling with ASIC bandwidth [1].
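The doubling trend in Figure 1 can be turned into a quick projection. The following is a minimal sketch, assuming steady exponential growth; the function name, the 51.2T starting point, and the chosen horizons are illustrative assumptions, not data from the figure.

```python
def projected_bandwidth_tbps(base_tbps: float, years: float,
                             doubling_period_years: float = 2.0) -> float:
    """Bandwidth after `years`, assuming it doubles every `doubling_period_years`.

    This is a plain compound-growth model: B(t) = B0 * 2 ** (t / T).
    """
    if doubling_period_years <= 0:
        raise ValueError("doubling_period_years must be positive")
    return base_tbps * 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Hypothetical starting point: a 51.2T switch generation.
    for years in (0, 2, 4, 6):
        fast = projected_bandwidth_tbps(51.2, years, doubling_period_years=2.0)
        slow = projected_bandwidth_tbps(51.2, years, doubling_period_years=3.0)
        print(f"+{years} years: {fast:.1f}T (2-year doubling), "
              f"{slow:.1f}T (3-year doubling)")
```

Comparing the two-year and three-year doubling periods brackets the "every two to three years" range quoted in the introduction.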
As bandwidth density increases, the cost and power requirements of each successive generation also rise significantly. Figure 2 illustrates this trend: optical interconnects can represent up to 80% of the total switch system cost at higher data rates and approximately 50% of the total system power for 51.2T systems.
Figure 2: (a) Optical Interconnects represent ~80% of total switch system cost at higher data rates, and (b) Optical Systems represent ~50% of total system power for 51.2T systems [1].
To address these challenges, various efforts are underway in the industry to optimize interconnect and Serializer-Deserializer (SerDes) power across switch systems. These include power-efficient SerDes design, low-loss printed circuit board (PCB) materials, low-loss flyover cables as interconnects, and linear pluggable optics. Co-Packaged Optics (CPO) stands out as a particularly promising approach: by bringing the optical transceivers closer to the ASIC, it shortens the electrical channel and thereby eliminates the need for power-hungry retimers and signal processing in the optics.
The Co-Packaged Optics Approach
Co-Packaged Optics (CPO) is a paradigm shift in building switch systems: the optical engines are packaged on the same substrate or interposer as the switch ASIC (or the CPU/GPU IC in AI/ML applications). This allows the ASIC to drive the optical devices directly, without the need for a separate Digital Signal Processor (DSP) in the optical engine.
Figure 3 depicts various architectural choices available for reducing power consumption in switch systems, with the CPO approach being one of the most efficient in terms of system power optimization. By packaging optical engines next to the ASIC and allowing the ASIC to drive optical devices directly, CPO can reduce overall system power by an estimated 25-30% [11, 12]. The saving comes from eliminating the power-hungry DSP in the optics and from using lower-power SerDes on the ASIC (extra-short-reach, XSR, instead of long-reach, LR).
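To make the 25-30% estimate concrete, the arithmetic can be sketched as below. This is only an illustration: the 2000 W baseline is a hypothetical figure chosen for round numbers, and the function names are not from any vendor tool.

```python
def cpo_system_power_w(baseline_power_w: float, reduction_fraction: float) -> float:
    """System power after applying a fractional reduction to the baseline."""
    if not 0.0 <= reduction_fraction < 1.0:
        raise ValueError("reduction_fraction must be in [0, 1)")
    return baseline_power_w * (1.0 - reduction_fraction)


def cpo_power_range_w(baseline_power_w: float,
                      low_reduction: float = 0.25,
                      high_reduction: float = 0.30) -> tuple[float, float]:
    """(best case, worst case) system power for the quoted 25-30% saving range."""
    best = cpo_system_power_w(baseline_power_w, high_reduction)
    worst = cpo_system_power_w(baseline_power_w, low_reduction)
    return best, worst


if __name__ == "__main__":
    baseline_w = 2000.0  # hypothetical pluggable-optics switch system, in watts
    best, worst = cpo_power_range_w(baseline_w)
    print(f"Baseline: {baseline_w:.0f} W")
    print(f"With CPO: {best:.0f} W to {worst:.0f} W")
```

In a fixed rack power budget, the watts saved per switch translate directly into headroom for additional bandwidth or compute.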
The key components of a CPO system are the Optical Engine and the switch system assembly, which we will discuss in the following sections.
Silicon Photonics-Based Optical Engines
A critical enabler for CPO is the development of silicon photonics technology, which utilizes standard Complementary Metal-Oxide-Semiconductor (CMOS) fabrication processes to enable the dense integration of multiple optical and electrical devices on a single silicon chip. These devices include optical modulators, photodetectors, low-loss optical waveguides, optical coupling structures, thermistors, sensors, capacitors, and in some cases, even analog electrical devices such as Transimpedance Amplifiers (TIAs) and drivers.
This integration allows for cost-effective, scalable mass production of photonic interconnects that can be co-packaged with ASICs (or CPUs/GPUs) using semiconductor packaging and assembly techniques. Cisco has been at the forefront of developing highly scalable PAM-4 optical interconnects based on silicon photonics, with key features such as low-loss modulators, low-loss optical coupling structures, and an integrated multiplexer-demultiplexer (Mux-Demux) in a 400G FR4 architecture.
2.5D or 3D packaging architectures provide a distinct advantage in achieving high levels of integration between the Photonic Integrated Circuit (PIC), Electrical Integrated Circuits (EICs), and other components, and in optimizing signal and power integrity for the optical system. These packaging approaches enable tight integration of PIC and EIC components in a compact footprint and allow the use of well-established semiconductor assembly processes for optics co-packaging.
Fiber coupling to optical engines used for CPO is another significant challenge. Depending on the CPO application, optical engines can have anywhere from 24 to 72+ fiber channels, including polarization-maintaining (PM) fibers used to couple remote laser optical power to the silicon photonics IC. For such a large array of fibers, chip or package warpage plays a major role in determining optical coupling efficiency and the reliability of the coupling solution.
Optical coupling of large-channel-density fiber arrays thus requires the optimization of overall optical package warpage, fiber alignment axis, and the efficiency of the coupling structures used in the optical assembly.
System Integration for Co-Packaged Optics
Co-packaging optical engines on an ASIC platform requires complex chip-level and system-level assembly. This process involves several innovations in co-package mechanical and thermal design, power delivery architecture, package warpage management, and the selection of assembly materials.
Large body substrates (typically greater than 80 x 80 mm) are needed to house both the switch ASIC and the 8 to 16 optical tile assemblies. This gives rise to high warpage and high stress in the package due to the coefficient of thermal expansion (CTE) mismatch between various assembled components. Warpage management using the right combination of materials, assembly techniques, and the use of stiffener materials with optimized CTE is critical to achieving a high-performing, reliable co-package.
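The scale of the CTE mismatch problem can be illustrated with a first-order estimate of unconstrained differential expansion, ΔL = Δα · L · ΔT. The sketch below is a rough illustration, not a warpage prediction: the CTE values (about 2.6 ppm/°C for silicon, about 15 ppm/°C for an organic substrate), the 40 mm distance from package center to edge, and the 200 °C temperature excursion are all assumed example inputs. Real warpage depends on stiffness, layer stack-up, and assembly sequence, and is normally evaluated with finite-element modeling.

```python
def cte_mismatch_um(alpha_a_ppm_per_c: float, alpha_b_ppm_per_c: float,
                    length_mm: float, delta_t_c: float) -> float:
    """Unconstrained differential expansion between two materials, in micrometers.

    Implements delta_L = |alpha_a - alpha_b| * L * delta_T, with CTE given
    in ppm/degC and length in millimeters.
    """
    delta_alpha_per_c = abs(alpha_a_ppm_per_c - alpha_b_ppm_per_c) * 1e-6
    delta_l_mm = delta_alpha_per_c * length_mm * delta_t_c
    return delta_l_mm * 1000.0  # convert mm to micrometers


if __name__ == "__main__":
    # Assumed example values, not measurements from the source.
    silicon_cte = 2.6      # ppm/degC
    substrate_cte = 15.0   # ppm/degC, organic substrate
    half_edge_mm = 40.0    # center-to-edge distance on an 80 x 80 mm body
    excursion_c = 200.0    # cooldown from assembly temperature to room temperature
    mismatch = cte_mismatch_um(silicon_cte, substrate_cte, half_edge_mm, excursion_c)
    print(f"Differential expansion: {mismatch:.1f} um")
```

Because the mismatch grows linearly with both body size and temperature excursion, larger substrates make stiffener selection and CTE matching progressively more important.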
Advanced substrate materials and fabrication technologies are also essential to enable low-loss channels between the ASIC and the optical tiles, where reducing channel discontinuities, lowering channel loss, and enabling short channel lengths are all critical parameters.
Thermal management for CPO becomes crucial as both the high-power ASIC and the various optical engines share the same real estate, increasing the thermal density at the center of the system. Prevention of thermal crosstalk between the ASIC and the optical tiles is critical for the performance and reliability of the system. Design and strategic placement of enhanced surface area heat sink systems, along with advanced air-cooling mechanisms, help efficiently dissipate heat and prevent temperature-related performance degradation.
Redundancy in both electrical and optical power sources is essential to making a CPO system reliable and robust. The laser sources are therefore developed as a remote laser package that is assembled separately from the ASIC package, which also addresses concerns about thermal crosstalk and the lack of repair options in case of field failures.
Cisco, along with several other industry partners, has driven the development of a standard multi-source agreement (MSA) for ELSFP (External Laser Small Form Factor Pluggable) modules, with the objective of cost efficiency, manufacturing scalability, and standardization of optical power sources for CPO. These ELSFP modules allow lasers to be passively cooled, run with more efficiency and reliability, and be replaceable in the field.
Challenges in Scaling and Deployment
While CPO offers significant potential benefits, there are several challenges associated with its system integration, including signal integrity and high-speed channel optimization, packaging, thermal management, and overall system reliability.
Minimizing thermal crosstalk between the high-power ASIC and the optics, and managing stress and warpage across the overall package, present unique challenges that require innovative thermal, opto-mechanical, and electrical design solutions.
Modeling and demonstrating the reliability of various components is critical to the success and adoption of CPO across data center networking and AI/ML applications. Since the majority of optical engine components (except laser sources) are inside the switch box, field repair and replacement are not viable options in case of failure. Therefore, the reliability and Failure-in-Time (FIT) rates of optical engine components must be proven acceptable for data center use, and in certain cases may need to be proven equal to or better than those of switch ASIC system components.
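A FIT rate counts expected failures per billion (10^9) device-hours, so component FIT values can be rolled up into a system-level figure. The sketch below assumes a constant failure rate and a series system, in which any single component failure takes the system down; the component count and the per-engine FIT value are hypothetical, chosen only to show the arithmetic.

```python
import math

HOURS_PER_FIT_BASE = 1e9  # FIT is defined per 10^9 device-hours


def system_fit(component_fits: list[float]) -> float:
    """Series-system FIT: with no redundancy, component failure rates add."""
    return sum(component_fits)


def mtbf_hours(fit: float) -> float:
    """Mean time between failures implied by a constant FIT rate."""
    if fit <= 0:
        raise ValueError("fit must be positive")
    return HOURS_PER_FIT_BASE / fit


def failure_probability(fit: float, hours: float) -> float:
    """Probability of at least one failure within `hours` (exponential model)."""
    return 1.0 - math.exp(-fit * hours / HOURS_PER_FIT_BASE)


if __name__ == "__main__":
    # Hypothetical example: 16 optical engines at 100 FIT each.
    engines = [100.0] * 16
    total = system_fit(engines)
    five_years_h = 5 * 365 * 24
    print(f"System FIT: {total:.0f}")
    print(f"MTBF: {mtbf_hours(total):,.0f} hours")
    print(f"P(failure in 5 years): {failure_probability(total, five_years_h):.1%}")
```

Because rates add in a series system, the non-replaceable optical engines need per-component FIT rates low enough that their sum stays acceptable alongside the ASIC itself.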
Summary
Co-Packaged Optics (CPO) is an emerging technology that offers a promising solution to the power and cost challenges faced by hyperscale networking and high-performance computing applications. By integrating high-bandwidth optical engines next to the compute chip, CPO can significantly reduce overall system power and cost, while also improving latency.
Enabling CPO requires tight integration of various optical and electrical components, particularly high-bandwidth optical engines and ASIC/CPU/GPU packages. Integration challenges related to optical assembly, mechanical design and assembly, and thermal solutions need to be addressed for viable deployment of CPO in the data center environment.
Silicon Photonics plays a crucial role in enabling the miniaturization of optical and electrical components, allowing the co-packaging of optics and ASICs using standard semiconductor processes. This is critical to providing high-density optical and electrical channel bandwidth in small form factors and a path to reliable and cost-effective systems.
While several technical solutions have been developed to mitigate integration, yield, and reliability challenges, key obstacles to wide-scale adoption of CPO remain: concerns about field reliability, repairability, and possible system downtime, and the need to demonstrate significant cost and performance advantages over comparable pluggable optics solutions.
Despite these challenges, co-packaged optics technology shows immense potential to reduce overall system power, lower interconnect latency, and support environmental sustainability and governance goals for next-generation data centers and high-performance computing applications. As the industry continues to drive innovation in this space, the widespread adoption of CPO could significantly shape the future of hyperscale networking and computing.
Reference
[1] S. Razdan, M. Traverso, and A. Torza, "Co-Packaged Optics Integration for Hyperscale Networking," Cisco Systems, Inc., [Online]. Available: https://ieee802.org/3/B400G/public/21_02/chopra_b400g_01_210208.pdf