Latitude Design Systems

Scaling Datacenter Networks with Optical Circuit Switching: Google's Apollo Project

Introduction

Over the past few decades, the rise of hyperscale data centers has revolutionized global Internet services, enabling innovations in web search, e-commerce, social media, and cloud computing. More recently, machine learning workloads have further emphasized the importance of large-scale datacenter computing capabilities. At the heart of it all lies datacenter networking, which provides the connectivity and scale needed to efficiently execute these functions and services.

Traditional Datacenter Network Architectures

Conventionally, datacenter networks have used packet-based Clos topologies, also known as spine-and-leaf architectures. In this setup, racks of compute servers connect to top-of-rack (ToR) leaf switches, which then link through aggregation layers to a spine of Ethernet packet switches (EPS). While this has been the industry standard, EPS devices consume significant power. Figure 1a illustrates a conventional datacenter network with EPS-based spine blocks connecting aggregation blocks (ABs), with ToR switches linked to the ABs using parallel fiber optics.
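The hop structure of the spine layer can be made concrete with a toy Python model. It is a minimal sketch under simplifying assumptions: ToRs are omitted, every aggregation block (AB) has exactly one link to every spine block, and the names `build_clos` and `hops` are hypothetical. The point it illustrates is that traffic between any two ABs always passes through a powered EPS in the spine.

```python
from collections import deque


def build_clos(num_abs: int, num_spines: int) -> dict[str, set[str]]:
    """Toy spine layer: every aggregation block (AB) has one link to every spine block."""
    graph: dict[str, set[str]] = {}
    for a in range(num_abs):
        for s in range(num_spines):
            ab, spine = f"AB{a}", f"SP{s}"
            graph.setdefault(ab, set()).add(spine)
            graph.setdefault(spine, set()).add(ab)
    return graph


def hops(graph: dict[str, set[str]], src: str, dst: str) -> int:
    """Shortest hop count between two nodes, found by breadth-first search."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError(f"{dst} unreachable from {src}")


clos = build_clos(num_abs=4, num_spines=2)
print(hops(clos, "AB0", "AB3"))  # 2: AB0 -> a spine EPS -> AB3
```

Every spine block in this model terminates one link per AB, so spine port count (and the power drawn by those ports) grows with the number of ABs.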

Introducing Optical Circuit Switching

As an alternative, researchers have explored incorporating optical circuit switches (OCS) into datacenter networks. OCS devices offer several advantages:

  1. Data rate and wavelength agnostic

  2. Low power consumption

  3. Low latency

MEMS-based OCS systems, which use tilting mirror arrays to steer light from input to output ports, have shown the most promise in scaling to the high port counts required for datacenters at reasonable costs.
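Functionally, an OCS of this kind behaves like a crossbar: each input port is steered to at most one output port. The sketch below is a hypothetical model (the `CrossConnect` class is not a real API) that ignores mirror angles, loss, and timing and captures only the port mapping. Because the switch never decodes the light, the mapping is the same regardless of data rate or wavelength, and adding a circuit between free ports leaves existing circuits untouched.

```python
from typing import Optional


class CrossConnect:
    """Toy N x N circuit switch: each input port is steered to at most one output port.

    Hypothetical illustration only; mirror angles, loss, and timing are not modeled.
    """

    def __init__(self, num_ports: int) -> None:
        self.num_ports = num_ports
        self.in_to_out: dict[int, int] = {}
        self.out_to_in: dict[int, int] = {}

    def connect(self, inp: int, out: int) -> None:
        # A new circuit needs a free input and a free output; existing circuits are untouched.
        for port in (inp, out):
            if not 0 <= port < self.num_ports:
                raise ValueError(f"port {port} out of range")
        if inp in self.in_to_out or out in self.out_to_in:
            raise ValueError("port already in use")
        self.in_to_out[inp] = out
        self.out_to_in[out] = inp

    def disconnect(self, inp: int) -> None:
        # Tearing down a circuit frees both of its ports.
        out = self.in_to_out.pop(inp)
        del self.out_to_in[out]

    def route(self, inp: int) -> Optional[int]:
        # Light is steered by port, never decoded, so the mapping is the same
        # whatever data rate or wavelength the signal uses.
        return self.in_to_out.get(inp)


ocs = CrossConnect(136)
ocs.connect(0, 42)
ocs.connect(1, 7)
print(ocs.route(0), ocs.route(1), ocs.route(2))  # 42 7 None
```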

However, datacenter requirements pose challenges for OCS hardware:

  • Faster switching times (commercial OCS typically 10-20 ms)

  • Lower insertion and return loss

  • Wide wavelength range operation

  • Strictly non-blocking architectures

  • Lower costs
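The switching-time requirement can be quantified with simple arithmetic. Assuming a circuit is completely dark while it is being switched and is reconfigured once per fixed interval (both simplifications; `dark_fraction` is a hypothetical helper), the lost fraction is just switching time divided by interval. At 20 ms, reconfiguring every 100 ms leaves a circuit dark 20% of the time, while reconfiguring once a minute costs about 0.03%.

```python
def dark_fraction(switch_time_ms: float, reconfig_interval_s: float) -> float:
    """Fraction of time a circuit carries no traffic because it is being switched.

    Assumes the circuit is fully dark for the whole switching time and that
    reconfigurations happen once per interval without overlapping.
    """
    if reconfig_interval_s <= 0:
        raise ValueError("reconfiguration interval must be positive")
    return (switch_time_ms / 1000.0) / reconfig_interval_s


# 20 ms is the slow end of the commercial range quoted above.
for interval_s in (0.1, 1.0, 60.0, 3600.0):
    print(f"reconfigure every {interval_s:g} s -> dark {dark_fraction(20.0, interval_s):.4%}")
```

Slow switching is therefore tolerable for infrequent topology changes but rules out reconfiguring on short timescales.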

Google's Apollo OCS Platform

Google's Apollo represents the first large-scale deployment of OCS in datacenter networks. In production for nearly a decade, Apollo has served as the backbone of Google's datacenter networks, supporting all of their use cases.

Key components of the Apollo platform include:

  • Palomar: Google's internally developed OCS

  • Optical circulators

  • Custom WDM optical transceivers supporting bidirectional links

In Apollo's architecture, the OCS layer replaces the EPS spine blocks and connects the aggregation blocks directly through a patch panel (Figure 1b). In this direct-connect topology, the OCS acts as an optical cross-connect that sets up light paths between blocks rather than switching individual packets.
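The sketch below shows how a set of OCS circuits translates into a logical topology between aggregation blocks. It assumes each OCS port carries one bidirectional link to a single AB; the port labels ("N0", "S1", ...), the wiring, and the `logical_topology` helper are all hypothetical. The fibers never move: changing only the circuit list rewires which blocks are directly connected.

```python
from collections import Counter


def logical_topology(port_owner: dict[str, str],
                     circuits: list[tuple[str, str]]) -> Counter:
    """Count the direct AB-to-AB links that a set of OCS circuits creates.

    port_owner: hypothetical fiber wiring, mapping each OCS port to the
    aggregation block (AB) behind it.
    circuits: OCS cross-connects as (port, port) pairs.
    """
    links: Counter = Counter()
    for port_a, port_b in circuits:
        ab_a, ab_b = port_owner[port_a], port_owner[port_b]
        if ab_a == ab_b:
            continue  # a circuit looping back into the same block adds no inter-block capacity
        links[tuple(sorted((ab_a, ab_b)))] += 1
    return links


# The fibers stay put; only the circuit list changes.
wiring = {"N0": "AB0", "N1": "AB1", "S0": "AB2", "S1": "AB0"}
print(logical_topology(wiring, [("N0", "S0"), ("N1", "S1")]))
print(logical_topology(wiring, [("N0", "S1"), ("N1", "S0")]))
```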

Combining WDM transceivers with optical circulators over single-mode fiber enables full-duplex communication over each OCS port and fiber, doubling port and fiber efficiency compared with parallel single-mode optics. Placing a reconfigurable OCS layer in the network also adds flexibility to the traditionally static Clos topology.
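The factor of two can be checked with a short calculation. As a simplifying assumption, the baseline link uses one transmit fiber and one receive fiber, each needing its own OCS circuit, while a circulator-based link sends both directions over a single fiber; an N x N OCS is taken to offer N simultaneous circuits. `links_per_ocs` is a hypothetical helper.

```python
def links_per_ocs(ocs_circuits: int, fibers_per_link: int) -> int:
    """Bidirectional links one OCS can switch, if every fiber needs its own circuit."""
    return ocs_circuits // fibers_per_link


SEPARATE_TX_RX_FIBERS = 2  # assumed baseline: one transmit fiber plus one receive fiber
CIRCULATOR_FIBERS = 1      # WDM transceiver + circulator: both directions share one fiber

for name, fibers in (("separate Tx/Rx fibers", SEPARATE_TX_RX_FIBERS),
                     ("circulator, shared fiber", CIRCULATOR_FIBERS)):
    print(f"{name}: {links_per_ocs(136, fibers)} links on a 136x136 OCS")
```

Under these assumptions, a 136x136 OCS switches 136 bidirectional links instead of 68.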

Figure 1. Evolution of Network Architecture
Advantages of Google's Apollo OCS

Google overcame the typical drawbacks of OCS through its innovations:

  1. High upfront costs - Growing datacenter demand can drive costs down through economies of scale.

  2. Insertion loss and power - Apollo's 136x136 OCS is non-blocking and consumes only 108 W, compared with 3000 W for an equivalent 136-port EPS.

  3. Slow reconfiguration - Faster switching makes it practical to reconfigure the network topology, adding flexibility.

  4. Lack of drop-in support - Apollo is forward and backward compatible with any data rate or wavelength used in Google's datacenters.
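The power figures quoted above translate into per-port and fleet-level numbers as follows: roughly 0.8 W per port for the OCS versus about 22 W per port for the EPS, a ratio of about 28x. This sketch compares switch power only (transceivers and cooling are ignored), and the 100-switch fleet size is a hypothetical count, not a deployment figure.

```python
OCS_POWER_W = 108.0   # Apollo 136x136 OCS, as quoted above
EPS_POWER_W = 3000.0  # equivalent 136-port EPS, as quoted above
PORTS = 136


def per_port_watts(total_w: float, ports: int = PORTS) -> float:
    """Switch power divided evenly across its ports."""
    return total_w / ports


def fleet_savings_kw(num_switches: int) -> float:
    """Switch power saved by replacing num_switches EPS with OCS (switch power only)."""
    return num_switches * (EPS_POWER_W - OCS_POWER_W) / 1000.0


print(f"per port: OCS {per_port_watts(OCS_POWER_W):.2f} W, EPS {per_port_watts(EPS_POWER_W):.2f} W")
print(f"EPS / OCS power ratio: {EPS_POWER_W / OCS_POWER_W:.1f}x")
print(f"100 switches (hypothetical count): {fleet_savings_kw(100):.1f} kW saved")
```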

Future Directions

To meet evolving datacenter network demands, key areas for optical technology advancement include:

  1. Larger port count OCS for increased scale-out and topology flexibility

  2. Faster, lower-cost OCS for adoption in lower network layers

  3. Improved reliability and availability for larger failure domains

  4. Lower insertion and return loss to extend the optical interconnect roadmap
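Insertion loss matters because optical losses add up along a link, expressed in dB, and eat into a fixed power budget. The sketch below computes the margin left at the receiver; every number in it is a hypothetical placeholder rather than an Apollo specification, and return loss (back-reflection) is not modeled. Each dB removed from the OCS path becomes a dB of margin that a future transceiver with a tighter budget could use.

```python
def link_margin_db(tx_power_dbm: float, rx_sensitivity_dbm: float,
                   losses_db: list[float]) -> float:
    """Power margin (dB) left at the receiver after every loss in the optical path.

    Losses in dB add along the path; a negative margin means the link will not close.
    Return loss (back-reflections) is not modeled here.
    """
    received_dbm = tx_power_dbm - sum(losses_db)
    return received_dbm - rx_sensitivity_dbm


# Every number below is a hypothetical placeholder, not an Apollo specification.
TX_DBM, SENS_DBM = 1.0, -6.0
path = {"OCS insertion": 2.0, "circulator (Tx side)": 1.0,
        "circulator (Rx side)": 1.0, "fiber + connectors": 1.5}
print(f"margin: {link_margin_db(TX_DBM, SENS_DBM, list(path.values())):.1f} dB")

# Shaving 1 dB off the OCS leaves 1 dB more margin for future, more demanding optics.
path["OCS insertion"] = 1.0
print(f"margin: {link_margin_db(TX_DBM, SENS_DBM, list(path.values())):.1f} dB")
```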

Conclusion

Google's Apollo platform demonstrates the viability and advantages of deploying OCS at scale in datacenter networks. Through innovations in MEMS-based switching, custom transceivers, and network architecture, Apollo has provided a flexible, low-power, and cost-effective connectivity solution. As datacenter bandwidth demands continue to grow, driven by ML and cloud computing, OCS technologies are poised to play an increasingly important role in scaling and optimizing these networks.
