Latitude Design Systems

A Low-Power Inverter-Based AC-Coupled Link for Die-to-Die Communication

Introduction

As the demand for high-performance computing continues to grow, there is an increasing need for high-density, low-power interconnects to transfer large amounts of data between chips. This has led to a recent trend of moving multi-chip modules (MCMs) onto silicon interposers to accommodate higher bandwidth densities. However, existing medium-to-short-reach interfaces typically consume too much power for these interposer-based chiplet systems.

This article describes a new interconnect solution called the inverter-based short-reach ac-coupled toggle (ISR-ACT) link. The ISR-ACT link is designed for very short-reach die-to-die communication over a silicon interposer or similar high-density interconnect. It achieves ultra-low power consumption while providing dc voltage isolation between the transmitter (TX) and receiver (RX), enabling communication between chips from different process nodes.

Power Reduction Techniques

The ISR-ACT link incorporates several power reduction techniques compared to conventional signaling:

1. Removal of Receiver Termination

For short-reach channels like an interposer, termination is only needed at the transmit end since reflections mainly occur at the end points. Figure 1(a) shows an unterminated RX with a rail-to-rail signal swing from the TX driver.


Fig. 1. (a) Unterminated RX with rail-to-rail swing. (b) Swing reduction through a capacitor divider.

2. Swing Reduction Through Capacitor Divider

Full swing signaling is unnecessary for short channels with minimal attenuation. A small series capacitor at the TX forms a capacitive divider with the line capacitance to reduce the signal swing as shown in Figure 1(b), lowering the drive requirements and power.
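
The effect of the series capacitor can be sketched with first-order arithmetic. This is an illustration only, not the authors' model: it treats the line as a single lumped capacitance and ignores the driver's own parasitics. The names `c_series` and `c_line` and their values are hypothetical placeholders; only the 750mV supply comes from the article.

```python
import math

def divider_swing(c_series, c_line, vdd):
    """Voltage step on the line for a rail-to-rail step at the driver output."""
    return vdd * c_series / (c_series + c_line)

def effective_load(c_series, c_line):
    """Capacitance seen by the driver: the series combination of both capacitors."""
    return c_series * c_line / (c_series + c_line)

def supply_energy_per_rising_edge(c_load, vdd):
    """Energy drawn from the supply to charge c_load through a full VDD step."""
    return c_load * vdd ** 2

# Hypothetical capacitances, chosen only to make the ratio easy to follow.
c_series = 100e-15   # series capacitor at the TX (assumed value)
c_line = 300e-15     # lumped line plus RX capacitance (assumed value)
vdd = 0.75           # 750mV supply, as stated in the article

swing = divider_swing(c_series, c_line, vdd)
c_eff = effective_load(c_series, c_line)
e_direct = supply_energy_per_rising_edge(c_line, vdd)   # driving the line directly
e_divided = supply_energy_per_rising_edge(c_eff, vdd)   # driving through the divider

print(f"line swing: {swing * 1e3:.1f} mV")
print(f"effective load: {c_eff * 1e15:.1f} fF")
print(f"energy ratio (direct / divided): {e_direct / e_divided:.1f}")
```

With these placeholder values the driver sees a quarter of the line capacitance, so the supply energy per rising edge drops by the same factor while the line swing falls to a quarter of VDD.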

3. Addition of DC Path and Reflection Mitigation

To define the DC operating point and avoid excessive reflections, DC biasing paths are added at the TX and RX as shown in Figure 2. Intentionally making the signal traces lossy suppresses residual reflections.

Fig. 2. Addition of dc paths at the TX and RX and the impact of reflections

4. Decoupling RX DC From TX

To enable voltage isolation between TX and RX, the TX DC path is removed. The RX uses positive feedback to form a latch that establishes and maintains the DC levels on the line independently from the TX as shown in Figure 3.

Fig. 3. Decoupling TX and RX dc operating points through positive feedback from the RX output.
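
The toggle idea can be illustrated with a purely behavioral sketch. It is not a circuit model: it assumes ideal pulses, no noise, and a latch that flips on any pulse of the opposite polarity. The function names `tx_transitions` and `rx_latch` are hypothetical.

```python
def tx_transitions(bits, initial=0):
    """AC coupling passes only edges: +1 for a rising edge, -1 for falling, 0 for none."""
    pulses = []
    previous = initial
    for bit in bits:
        pulses.append(bit - previous)
        previous = bit
    return pulses

def rx_latch(pulses, initial=0):
    """Hold the current level until a pulse arrives, then take the level it points to."""
    state = initial
    levels = []
    for pulse in pulses:
        if pulse > 0:
            state = 1
        elif pulse < 0:
            state = 0
        levels.append(state)   # with no pulse, the latch keeps its DC level
    return levels

bits = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1]
pulses = tx_transitions(bits)
recovered = rx_latch(pulses)
print(pulses)
print(recovered == bits)
```

Long runs of identical bits produce no pulses, yet the data is recovered because the latch holds the line level on its own. In this sketch, an RX that starts in the wrong state resynchronizes at the first transition.
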
Circuit Implementation

The architecture of the ISR-ACT transceiver is shown in Figure 4. The TX transmits AC-coupled data transitions through a small on-chip capacitor Cac. The AC peak-to-peak amplitude Vac_ppk is set by the capacitor ratio:

Vac_ppk = Cac / (Cload + Cac) * VDD
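
As a quick numerical check of this ratio, the sketch below evaluates the formula and inverts it to find the Cac needed for a target swing. The 150fF Cac and 750mV supply are taken from elsewhere in this article; the `c_load` value is a hypothetical placeholder, since the article does not state the load capacitance.

```python
import math

def vac_ppk(c_ac, c_load, vdd):
    """AC peak-to-peak amplitude from the capacitor ratio: Cac / (Cload + Cac) * VDD."""
    return c_ac / (c_load + c_ac) * vdd

def c_ac_for_swing(target_swing, c_load, vdd):
    """Invert the divider formula: the Cac that gives target_swing for a given load."""
    if not 0 < target_swing < vdd:
        raise ValueError("target swing must lie strictly between 0 and VDD")
    return c_load * target_swing / (vdd - target_swing)

c_ac = 150e-15     # nominal Cac quoted in the article
vdd = 0.75         # 750mV supply quoted in the article
c_load = 450e-15   # assumed load capacitance, for illustration only

swing = vac_ppk(c_ac, c_load, vdd)
print(f"Vac_ppk = {swing * 1e3:.1f} mV")
print(f"Cac for that swing = {c_ac_for_swing(swing, c_load, vdd) * 1e15:.1f} fF")
```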

Fig. 4. ISR-ACT transceiver.

The RX is a two-stage latch with negative feedback through Rn to the input and positive feedback through Rp from the output. This toggles the RX input between two stable DC states defined by:

[Equation: the two stable DC levels at the RX input (peak-to-peak separation Vdc_ppk), expressed in terms of A1]

Where A1 is the gain of the first inverter stage. Proper sizing of Rn, Rp and Cac ensures Vac_ppk > Vdc_ppk for robust operation.

To optimize signal swing and eye quality, simulations sweeping Cac were performed, as shown in Figure 5 for a 1.2mm channel. A Cac of 80% of nominal gave the lowest jitter, but the 100% value (150fF nominal) was chosen to accommodate ±15% variation.
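
The margin arithmetic behind this choice can be sketched as follows. The sketch assumes the ±15% variation applies to Cac itself, which the article does not state explicitly, and it says nothing about eye quality at each value. `cap_range` is a hypothetical helper.

```python
import math

NOMINAL_CAC_FF = 150.0   # 100% design point from the article
TOLERANCE = 0.15         # ±15% variation (assumed to apply to Cac)

def cap_range(nominal_ff, design_fraction, tolerance):
    """Lowest and highest Cac when the design point varies by +/- tolerance."""
    center = nominal_ff * design_fraction
    return center * (1 - tolerance), center * (1 + tolerance)

low_100, high_100 = cap_range(NOMINAL_CAC_FF, 1.0, TOLERANCE)
low_80, high_80 = cap_range(NOMINAL_CAC_FF, 0.8, TOLERANCE)

print(f"designed at 100%: {low_100:.1f} to {high_100:.1f} fF")
print(f"designed at  80%: {low_80:.1f} to {high_80:.1f} fF")
```

Under this assumption, a design centered at 100% never falls below about 85% of nominal, whereas one centered at the 80% point could drop to about 68%.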

Fig. 5. Optimization of transmit coupling capacitor (Cac) based on simulated eye diagrams at 25 Gb/s.
Link Architecture

Figure 6 shows the ISR-ACT link architecture using a delay-matched clock-forwarding scheme. It has 19 data TX/RX lanes and one forwarded clock lane per direction in a 20-lane PHY.

Fig. 6. ISR-ACT PHY top level.

Multiple PHYs can be stacked for higher bandwidth as a multi-rank system (Figure 7). A 4-rank configuration with 4 PHYs provides 1.9Tb/s total bandwidth.
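
The lane counts and bandwidth can be checked with simple arithmetic. The sketch uses the 25.2Gb/s/wire rate reported in the measurement section and assumes every data lane runs at that rate; the function names are hypothetical.

```python
import math

DATA_LANES_PER_PHY = 19    # data lanes per direction in one PHY
CLOCK_LANES_PER_PHY = 1    # forwarded clock lanes per direction in one PHY
RATE_GBPS = 25.2           # per-wire data rate from the measurement section

def wire_counts(ranks, directions=2):
    """Total data and clock wires for a stack of PHYs, counting both directions."""
    data = ranks * DATA_LANES_PER_PHY * directions
    clocks = ranks * CLOCK_LANES_PER_PHY * directions
    return data, clocks

def bandwidth_gbps(ranks):
    """Aggregate data bandwidth in one direction, in Gb/s."""
    return ranks * DATA_LANES_PER_PHY * RATE_GBPS

data_wires, clock_wires = wire_counts(4)
print(data_wires, clock_wires)
print(f"{bandwidth_gbps(4) / 1000:.2f} Tb/s per direction")
```

Four ranks give 152 data wires and eight clock wires across both directions, matching the Figure 7 caption. By this arithmetic, the 1.9Tb/s figure corresponds to the 76 data lanes of one direction.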

Fig. 7. Four-rank ISR-ACT link (152 DQs and eight clocks) on four interposer routing layers.

The forwarded clock is generated, transmitted, and received using replicas of the data path circuitry to closely match delays, ensuring the received clock aligns properly with the received data regardless of PVT variations between the TX and RX chips.

Measurement Results

The ISR-ACT link was implemented in a 5nm test chip and measured at 25.2Gb/s/wire over a 1.2mm on-chip channel designed to model a 4-rank interposer system.

Bit Error Rate and Eye Margin

Figure 8(a) shows the bathtub curve at 25.2Gb/s with a horizontal eye opening of 0.66 UI at BER=1e-12. The eye margin remains over 0.53UI at BER=1e-25 after accounting for random jitter.
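
To put these unit-interval (UI) figures in absolute terms, the sketch below converts them to picoseconds at the stated data rate. It is plain unit conversion; the function names are hypothetical.

```python
import math

def unit_interval_ps(rate_gbps):
    """Duration of one bit period in picoseconds."""
    return 1000.0 / rate_gbps

def opening_ps(ui_fraction, rate_gbps):
    """Eye opening in picoseconds for an opening given as a fraction of one UI."""
    return ui_fraction * unit_interval_ps(rate_gbps)

RATE_GBPS = 25.2
print(f"1 UI    = {unit_interval_ps(RATE_GBPS):.2f} ps")
print(f"0.66 UI = {opening_ps(0.66, RATE_GBPS):.2f} ps")
print(f"0.53 UI = {opening_ps(0.53, RATE_GBPS):.2f} ps")
```

At 25.2Gb/s one UI is just under 40 ps, so the two openings correspond to roughly 26 ps and 21 ps.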

Fig. 8. (a) Bit error rate of four ISR-ACT lanes at 25.2 Gb/s/wire. (b) Eye margin across process and temperature.

Figure 8(b) plots the eye margin across process corners from 16 to 25.2Gb/s over 0 to 90°C. At 25.2Gb/s, >0.55UI margin is achieved at all corners.

Power Consumption

Figure 9(a) shows the per-PHY power breakdown based on simulations. At 25.2Gb/s/wire, the total PHY power is 90.8mW, with the output driver consuming only 11%.

Fig. 9. (a) Power consumption of a 19-DQ ISR-ACT link at 25.2 Gb/s/wire and the breakdown based on simulation. (b) Power versus activity using clock gating.

Figure 9(b) confirms that, with clock gating, over 90% of the power scales with activity, leaving only a 7.9mW static component, a low-power advantage of CMOS logic.
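
A simple way to picture this scaling is a linear power model. The sketch assumes power rises linearly with activity from the 7.9mW static floor to the 90.8mW total at full activity; the linear shape and the pairing of these two numbers are assumptions made here, not a fit to the measured curve.

```python
import math

TOTAL_MW = 90.8    # PHY power at 25.2Gb/s/wire, from the article
STATIC_MW = 7.9    # static component, from the article

def phy_power_mw(activity):
    """Estimated PHY power for an activity factor between 0 (idle) and 1 (full rate)."""
    if not 0.0 <= activity <= 1.0:
        raise ValueError("activity must be between 0 and 1")
    return STATIC_MW + activity * (TOTAL_MW - STATIC_MW)

dynamic_fraction = (TOTAL_MW - STATIC_MW) / TOTAL_MW
print(f"activity-dependent share: {dynamic_fraction:.1%}")
for activity in (0.0, 0.25, 0.5, 1.0):
    print(f"activity {activity:.2f}: {phy_power_mw(activity):.1f} mW")
```

Under this model about 91% of the full-rate power is activity-dependent, consistent with the "over 90%" statement.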

The ISR-ACT link achieves an energy efficiency of 0.190pJ/bit, which the authors report as the best energy efficiency to date for die-to-die interconnects at these data rates. Table I in [1] compares the ISR-ACT link to other recent work, showing competitive area efficiency while operating at a low 750mV supply that can be shared with processor core logic.
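
The energy-per-bit figure follows from the PHY power and the aggregate data rate. The sketch below reproduces the arithmetic, assuming the 90.8mW PHY power is divided over the 19 data lanes only (the clock lane consumes power but carries no payload). `energy_per_bit_pj` is a hypothetical helper.

```python
import math

def energy_per_bit_pj(power_mw, data_lanes, rate_gbps):
    """Energy per bit in pJ: power divided by aggregate data throughput."""
    # mW / (Gb/s) = 1e-3 W / 1e9 b/s = 1e-12 J/bit, i.e. pJ/bit
    return power_mw / (data_lanes * rate_gbps)

PHY_POWER_MW = 90.8
DATA_LANES = 19
RATE_GBPS = 25.2
DRIVER_SHARE = 0.11   # output driver share of total power, from the article

total_pj = energy_per_bit_pj(PHY_POWER_MW, DATA_LANES, RATE_GBPS)
driver_pj = energy_per_bit_pj(PHY_POWER_MW * DRIVER_SHARE, DATA_LANES, RATE_GBPS)

print(f"total:  {total_pj:.3f} pJ/bit")
print(f"driver: {driver_pj * 1000:.0f} fJ/bit (approximate, from the 11% share)")
```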

Longer Reach Potential

While optimized for 1.2mm channels, the ISR-ACT topology allows signaling over longer traces by increasing the coupling capacitance Cac. Figure 10 shows simulated eyes at 25Gb/s over 3.3mm channels with Cac doubled to 300fF, recovering eye margin with only a 7fJ/bit increase in energy.
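
To gauge what 7fJ/bit means at the PHY level, the sketch below converts it to added power and to a relative increase. It assumes the increase applies to each of the 19 data lanes at the simulated 25Gb/s and compares against the measured 0.190pJ/bit baseline; mixing simulated and measured figures this way is only a rough estimate.

```python
import math

def added_power_mw(delta_fj_per_bit, data_lanes, rate_gbps):
    """Extra PHY power in mW for a per-bit energy increase on every data lane."""
    # fJ/bit * Gb/s = 1e-15 J * 1e9 /s = 1e-6 W = 1e-3 mW
    return delta_fj_per_bit * data_lanes * rate_gbps * 1e-3

def relative_increase(delta_fj_per_bit, baseline_pj_per_bit):
    """Fractional increase in energy per bit relative to the baseline."""
    return (delta_fj_per_bit * 1e-3) / baseline_pj_per_bit

extra_mw = added_power_mw(7.0, 19, 25.0)
increase = relative_increase(7.0, 0.190)

print(f"added power: {extra_mw:.2f} mW per PHY")
print(f"relative increase: {increase:.1%}")
```

On these assumptions, the doubled capacitor costs a few milliwatts per PHY, under 4% of the baseline energy per bit.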

Fig. 10. Simulated ISR-ACT eye pattern at RX input at 25 Gb/s over 3.3-mm trace. Top: Cac = 150 fF. Bottom: Cac = 300 fF. 0–200 ns, PRBS31.
Conclusion

The ISR-ACT link presents a highly power-efficient solution for very short-reach die-to-die communication over interposers and high-density interconnects. Its use of AC coupling, capacitive signal swing reduction, and positive feedback latching enables ultra-low-energy operation at 0.190pJ/bit and 25.2Gb/s/wire while providing DC isolation between transmitting and receiving chips. With its low 750mV supply and high bandwidth density, the ISR-ACT architecture is well-suited for scaling future chiplet-based computing systems.

Reference

[1] Y. Nishi, J. W. Poulton, W. J. Turner, X. Chen, S. Song, B. Zimmer, S. G. Tell, N. Nedovic, J. M. Wilson, W. J. Dally, and C. T. Gray, "A 0.190-pJ/bit 25.2-Gb/s/wire Inverter-Based AC-Coupled Transceiver for Short-Reach Die-to-Die Interfaces in 5-nm CMOS," IEEE Journal of Solid-State Circuits, vol. 59, no. 4, pp. 1146-1157, April 2024, doi: 10.1109/JSSC.2023.3338478.
