Introduction
Silicon photonic wavelength filters based on asymmetric Mach-Zehnder interferometers (AMZIs) or micro-rings are critical components for on-chip wavelength-division multiplexing (WDM) systems in optical interconnects. However, these filters are highly sensitive to wavelength and temperature changes, which can cause functional failures: an abrupt shift in either misaligns the filter's spectral peak with the operating wavelength, resulting in significant performance degradation.
Traditional solutions, such as pre-calibration at different temperatures and implementing feedback control, are time-consuming and inefficient. Moreover, these approaches become increasingly challenging when dealing with multiple coupled or cascaded filter devices with multiple parameters. Therefore, a real-time, simple, and quick method for automatic wavelength alignment over a wide range, without requiring calibration, is highly desirable.
This article introduces an approach that uses reinforced Q-learning, a reinforcement-learning technique, to achieve automatic working-wavelength alignment of silicon photonic AMZI filters against wide laser-wavelength shifts and temperature variations. The method has been experimentally demonstrated and runs on a microcontroller (MCU) with only 8 KB of memory.
Experimental Setup
The experimental schematic and setup are illustrated in Figs. 1(a) and 1(b), respectively.
In this setup, light from a wavelength-tunable laser is modulated by a commercial modulator driven by a pulse pattern generator (PPG) and input into the AMZI filter. The multi-agent reinforced Q-learning algorithm is implemented in an Arduino microcontroller (Mega2560, 8-KB SRAM memory), which controls a 16-bit digital-to-analog converter (DAC) for applying voltage to the AMZI and a 16-bit analog-to-digital converter (ADC) for sensing the optical power of the photodetector.
The AMZI filter has two phase shifters (TiN heaters), each driven by one channel of the DAC and each corresponding to one agent in the Q-learning algorithm. Because both heaters act on the same interferometer phase, the two agents are phase rivals, a slightly harder case than fully independent parameters.
Q-Learning Algorithm
In the Q-learning algorithm, the device (AMZI filter) is treated as a black box. Each agent takes an action and receives an updated state and reward based on a predetermined reward policy. For wavelength alignment, the goal is to maximize the optical output; therefore, the reward policy is defined as follows: if the output is increased, the reward (r) = 1; otherwise, r = -1.
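The reward policy above can be written in a few lines. This is a simplified sketch, not the authors' MCU code; the ADC reading stands in for the measured optical power:

```python
def reward(adc_now: int, adc_prev: int) -> int:
    """Reward policy from the text: +1 if the optical output
    (ADC reading) increased after the agent's action, -1 otherwise."""
    return 1 if adc_now > adc_prev else -1
```

Note that an unchanged output also earns -1 under this policy, so the agents are pushed to keep climbing toward the transmission peak.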
Each agent is represented by a Q-matrix that describes the state-action relationship. The state space (s) is the discretized DAC value, and the action space (a) is a small set of DAC-value increments (e.g., [-, 0, +], i.e., decrement, hold, or increment). The Q-matrix is updated using the following equation:
Q[s, a] = (1 - α) * Q[s, a] + α * (r + γ * Q_max)
where α is the learning rate (set to 0.5), γ is the discount factor (set to 0.9), and Q_max is the maximum Q value over the actions of the next state.
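A minimal single-agent sketch of this update is shown below, assuming a 16-state discretization and an epsilon-greedy exploration policy; the exploration strategy and state count are assumptions for illustration, as the source does not specify them (the experiment runs two such agents, one per heater):

```python
import random

ALPHA, GAMMA = 0.5, 0.9   # learning rate and discount factor from the text
EPSILON = 0.1             # assumed exploration rate (not given in the source)
N_STATES = 16             # assumed discretization of the DAC range
ACTIONS = (-1, 0, +1)     # DAC-value increments: decrement, hold, increment

# One Q-matrix per agent, indexed by [state][action]
Q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]

def choose_action(s: int) -> int:
    """Epsilon-greedy selection: explore occasionally, else take the
    action with the highest Q value in the current state."""
    if random.random() < EPSILON:
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: Q[s][a])

def update(s: int, a: int, r: int, s_next: int) -> None:
    """Q[s, a] = (1 - alpha) * Q[s, a] + alpha * (r + gamma * Q_max),
    where Q_max is the best Q value available in the next state."""
    q_max = max(Q[s_next])
    Q[s][a] = (1 - ALPHA) * Q[s][a] + ALPHA * (r + GAMMA * q_max)
```

Starting from an all-zero Q-matrix, one update with r = 1 gives Q[s][a] = (1 - 0.5)·0 + 0.5·(1 + 0.9·0) = 0.5, consistent with the equation above. The small state-action table is what makes the algorithm fit in 8 KB of MCU memory.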
Experimental Results
Fig. 2(a) demonstrates automatic wavelength alignment against laser wavelength shifts. The laser wavelength was intentionally switched in 5 nm steps from 1540 to 1560 nm every 600 algorithm loops, while the algorithm was running continuously.
Without running the Q-learning algorithm (i.e., without locking), the ADC value (optical output) varies significantly due to the AMZI's free spectral range (FSR ~5.6 nm). However, with the Q-learning algorithm active (i.e., with locking), the ADC value converges to and maintains nearly the maximum value despite wavelength shifts, indicating automatic wavelength alignment of the AMZI filter to the laser wavelength.
Fig. 2(b) illustrates dynamic locking, where the laser wavelength is swept in steps between 1540 and 1545 nm with a dwell time of 0.5 seconds. Initially, without alignment, the optical output powers at both wavelengths are low. As the Q-learning algorithm runs, the AMZI filter is automatically aligned to the laser wavelength, and the optical output power is quickly recovered and maintained at the same locked level, even though the wavelength is periodically swept.
Temperature changes can also cause filter failures due to the misalignment between the spectral peak and the wavelength. Fig. 2(c) shows that the Q-learning algorithm can achieve automatic alignment against both continuous wavelength shifts and large temperature changes.
Finally, a 25-Gbps PRBS31 NRZ modulated signal was input into the AMZI filter, and the eye pattern was monitored while the Q-learning algorithm ran and the wavelength was abruptly tuned. Initially, the eye was completely closed, but it quickly opened as the algorithm converged. As illustrated in Fig. 3, the eye pattern auto-locked by the Q-learning method showed no performance degradation compared to the manually tuned case.
Even when the wavelength was abruptly shifted from 1545 to 1555 nm (a 10 nm step), the eye pattern quickly recovered and remained open, demonstrating the effectiveness of the proposed method.
Summary
This tutorial has introduced an experimental demonstration of running a reinforced Q-learning algorithm in a microcontroller for real-time automatic working-wavelength alignment of silicon photonic AMZI filters against abrupt wavelength shifts and large temperature variations. The proposed method significantly alleviates the challenges of using wavelength-sensitive silicon photonic devices, without time-consuming calibration procedures.
Reference
[1] G. Cong, R. Konoike, K. Suzuki, N. Yamamoto, R. Kou, Y. Maegami, M. Ohno, K. Ikeda, S. Namiki, and K. Yamada, "Real-time Automatic Wavelength Alignment for Silicon Photonic AMZI Filter by Running Reinforced Q-Learning in Microcontroller," Platform Photonics Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Ibaraki, Japan, 2024.