Microcomb-enabled parallel self-calibration optical convolution streaming processor

Principle of operation

The core structure integrated on the photonic chip comprises a passive streaming processing unit, a reference waveguide, and two pairs of fan-outs, for data processing and calibration respectively (Fig. 1b). The streaming processing unit can be configured as a finite impulse response (FIR) filter⁴². Mathematically, its transfer function in the frequency domain and its impulse response in the time domain are given by:

$$H\left(f\right)={\sum }_{n=0}^{N-1}{a}_{n}{e}^{j{\varphi }_{n}}{e}^{-j2\pi {fn}\Delta T}$$

$$h\left(t\right)={\sum }_{n=0}^{N-1}{a}_{n}{e}^{j{\varphi }_{n}}\delta (t-n\Delta T)$$

where f is the optical frequency, N denotes the number of taps, and nΔT is the delay of tap n. The amplitude and phase of tap n are represented by ${a}_{n}$ and ${\varphi }_{n}$, and can be fully adjusted by changing the splitting ratios of the Mach-Zehnder interferometers (MZIs) and the phase shifts of the phase shifters. H(f) is periodic with a free spectral range (FSR) of 1/ΔT. Thanks to this periodic interference characteristic, when the carrier frequency spacing of the input streams aligns with the filter’s FSR, the optical convolution streaming processor (OCSP) can perform parallel convolution across multiple data streams simultaneously, without additional on-chip components (as shown in Fig. 1a, b).
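The periodicity of H(f) can be checked numerically. The following is a minimal sketch, not the authors' code: the tap amplitudes, tap phases, and the 20 ps delay step are illustrative assumptions, and `fir_response` is a hypothetical helper that evaluates the transfer function exactly as written above.

```python
import cmath
import math


def fir_response(f, amps, phases, delta_t):
    """Evaluate H(f) = sum_n a_n * exp(j*phi_n) * exp(-j*2*pi*f*n*delta_t)."""
    return sum(
        a * cmath.exp(1j * phi) * cmath.exp(-2j * math.pi * f * n * delta_t)
        for n, (a, phi) in enumerate(zip(amps, phases))
    )


DELTA_T = 20e-12                        # assumed delay step in seconds
FSR = 1.0 / DELTA_T                     # free spectral range in Hz
amps = [0.5, 0.25, 0.125, 0.125]        # hypothetical tap amplitudes a_n
phases = [0.0, math.pi, 0.0, math.pi]   # hypothetical tap phases phi_n

f0 = 3.7e9  # arbitrary frequency offset in Hz
h_base = fir_response(f0, amps, phases, DELTA_T)

# Shifting the frequency by any integer number of FSRs leaves H unchanged,
# which is why carriers spaced by the FSR all see the same kernel.
h_shifted = [fir_response(f0 + m * FSR, amps, phases, DELTA_T) for m in range(1, 4)]
periodic = all(abs(h - h_base) < 1e-9 for h in h_shifted)
```

Here `periodic` evaluates to `True` because each tap's phase term advances by an integer multiple of 2π when f increases by one FSR.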

**Fig. 1: Parallel optical convolution streaming processing system.**

For an input optical data stream x(t), the filter output is given by the convolution:

$$y\left(t\right)=x\left(t\right)* h\left(t\right)={\sum }_{n=0}^{N-1}{a}_{n}{e}^{j{\varphi }_{n}}x(t-n\Delta T)$$

The input optical data streams, at a symbol rate of 1/ΔT Baud (1/ΔT is also the repetition frequency of the microcomb), are wavelength-division multiplexed and can be of unlimited length for streaming processing. The input optical data streams are simultaneously duplicated and weighted via cascaded MZIs, with the phases controlled by phase shifters (PSs). Next, the weighted replicas are progressively delayed with a delay step (between adjacent spatial paths) equal to ΔT, the symbol duration of the input data streams, effectively achieving time-space interleaving (i.e., the input data’s adjacent symbols are aligned in time). Finally, the weighted and delayed replicas are coupled together, interfering constructively or destructively with each other to yield the convolution results.

The convolution window effectively slides at the input data rate. Each output symbol is the dot product between the input data within a convolution window, or receptive field (determined by the scale of the chip, i.e., the number of spatial paths N), and the weights (implemented by the MZIs and PSs). Consequently, each output symbol is the result of N multiply-and-accumulate (MAC) operations. The OCSP directly processes the optical fields. When it processes real-valued data (i.e., intensity-modulated, with phases of 0 or π), each MAC takes 2 operations (1 multiplication and 1 accumulation), resulting in a peak computing speed of 2N/ΔT operations per second (OPS) per wavelength channel. This computing speed can be boosted by processing many wavelength-division multiplexed optical data streams in parallel, provided the frequency interval between wavelength channels is equal to, or an integer multiple of, the symbol rate 1/ΔT.
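The sliding dot product and the 2N/ΔT throughput formula can be illustrated with a short sketch. This is an idealized, discrete-time model under stated assumptions: one sample per symbol, real-valued weights, and zeros before the stream starts. The function names, the input stream, and the weights are hypothetical; N = 8 and ΔT = 20 ps are the values reported for the fabricated chip later in the text, and the resulting numbers are plain arithmetic on the formula rather than reported measurements.

```python
def stream_convolve(x, weights):
    """Return y[k] = sum_n w[n] * x[k - n], treating x as zero before the stream starts."""
    n_taps = len(weights)
    return [
        sum(weights[n] * x[k - n] for n in range(n_taps) if k - n >= 0)
        for k in range(len(x))
    ]


def peak_ops_per_second(n_taps, delta_t, n_channels=1):
    """Peak speed 2*N/delta_t per wavelength channel, scaled by the channel count."""
    return 2 * n_taps / delta_t * n_channels


x = [1, 2, 3, 4, 5]   # hypothetical input symbols
w = [1, -1, 0.5]      # hypothetical 3-tap kernel
y = stream_convolve(x, w)   # one output symbol per input symbol

single_channel = peak_ops_per_second(8, 20e-12)               # about 0.8e12 OPS
five_channels = peak_ops_per_second(8, 20e-12, n_channels=5)  # about 4e12 OPS
```

Each entry of `y` is produced one symbol period after the previous one, which mirrors the statement that the window slides at the input data rate.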

Kernel self-calibration of the OCSP

The wavelength-division multiplexing and time-space interleaving convolutional computing architecture relies on two factors: a) the alignment of the optical comb repetition frequency with the FSR of the OCSP, and b) the alignment of the phase shifts induced by each path on the OCSP. Yet even tiny delay variations, on the scale of the optical wavelength (~1550 nm), can lead to large phase errors in the convolution weights and thus in the computing results. This is a nontrivial issue because it is challenging to obtain the phase-related responses of the chip and to overcome dynamic on-chip thermal crosstalk and temperature disturbance (Fig. 2f, g). To address this, we first map the convolution kernel to the time-domain taps of the OCSP chip. Subsequently, we apply an on-chip phase recovery method⁴³ to the optical computing chip (Fig. 3a). This adds an optical reference path to the chip, which enables phase recovery based on the intensity-only spectral response of the chip via the gap method. The intensity response can be easily measured with an external tunable laser source and a power meter; by placing the reference path on the chip, we ensure that any phase errors in the paths that lead to external instrumentation do not affect the phase measurement. The measured power response of the entire OCSP chip can be regarded as a superposition of spectral components yielded by “internal” (between the streaming processing taps) and “external” (between the reference tap and the streaming processing taps) interferences. As such, the impulse response of the OCSP (i.e., the convolutional weights) can be obtained via the Fourier transform of the entire chip’s power response, provided the delay gap τ between the reference path and the first streaming processing tap exceeds the impulse response duration T of the streaming processing taps (τ > T).
With the dynamic convolutional weights obtained (both amplitudes and phases), the required updates of the electrical power supply can be determined and the desired kernel weights can be deterministically dialed up (for the detailed self-calibration process and robustness test, see Supplementary Notes 4, 5). Beyond successfully calibrating our OCSP, this phase recovery approach allows optical computing chips to obtain their full dynamic on-chip frequency and impulse responses (both phases and amplitudes), thus enabling accurate and trainable on-chip frequency responses that greatly enhance the computing accuracy and allow rapid training of coherent optical computing hardware.
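The reason the condition τ > T makes the weights recoverable from intensity alone can be seen in a small simulation. The sketch below is an idealized illustration and not the authors' calibration code. It assumes a noiseless measurement sampled uniformly over exactly one FSR, tap delays at exact integer multiples of ΔT, and a known real reference amplitude. The tap values and all names (`power_response`, `inverse_dft`, `GAP`, and so on) are hypothetical.

```python
import cmath
import math


def power_response(taps, n_points):
    """Intensity-only response |G(f_k)|^2 at n_points frequencies across one FSR.

    `taps` maps a delay index d (in units of delta_T) to a complex coefficient.
    """
    out = []
    for k in range(n_points):
        g = sum(c * cmath.exp(-2j * math.pi * k * d / n_points) for d, c in taps.items())
        out.append(abs(g) ** 2)
    return out


def inverse_dft(values):
    """Plain inverse DFT: maps the power spectrum to the delay domain."""
    m_pts = len(values)
    return [
        sum(v * cmath.exp(2j * math.pi * k * m / m_pts) for k, v in enumerate(values)) / m_pts
        for m in range(m_pts)
    ]


N_TAPS = 8      # streaming taps, so T corresponds to 7 delay steps
GAP = 8         # reference-to-first-tap gap in delay steps (tau > T)
N_POINTS = 64   # frequency samples; large enough to avoid delay-domain wrap-around
REF_AMP = 1.0   # assumed known, real reference-path amplitude

# Hypothetical complex tap weights with equal amplitude and assorted phases.
true_weights = [
    0.3 * cmath.exp(1j * phi)
    for phi in (0.0, math.pi, 0.4, -1.2, 2.0, 0.0, math.pi, 0.7)
]
taps = {0: REF_AMP}
taps.update({GAP + n: w for n, w in enumerate(true_weights)})

power = power_response(taps, N_POINTS)   # what a power meter would record
delay_domain = inverse_dft(power)        # autocorrelation of the tap sequence

# "Internal" beat terms occupy delays 0..7; the "external" terms between the
# reference and tap n sit at delay GAP + n, so the two sets never overlap.
recovered = [delay_domain[GAP + n] / REF_AMP for n in range(N_TAPS)]
max_error = max(abs(r - t) for r, t in zip(recovered, true_weights))
```

In this idealized setting `max_error` is at the level of floating-point rounding. If `GAP` were reduced below `N_TAPS`, the internal and external terms would overlap in the delay domain and the recovered weights would be corrupted.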

**Fig. 2: Chip fabrication and characterization.**

**Fig. 3: Calibrating processes of OCSP implementing three kernel settings.**

Convolutional kernel verification

To demonstrate our approach, we fabricated a 16-tap FIR chip on a standard silicon-on-insulator (SOI) platform⁴³. Taps 9–16 served as the OCSP, taps 2–8 were in the off state, and tap 1 served as the reference path. The delay step of the OCSP’s streaming processing taps is ΔT ≈ 20 ps, corresponding to an input optical data rate of 50 GBaud. This also gives the OCSP’s impulse response duration T = 7 × ΔT = 140 ps. The delay gap between the reference path and the OCSP’s first tap is τ ≈ 8 × ΔT, satisfying τ > T for subsequent phase recovery via Fourier transform. Four stages of tunable couplers were used to achieve the desired amplitudes of the convolutional weights, and eight PSs were used to manipulate the taps’ phases (i.e., the signs of the weights).
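The design numbers in this paragraph follow from the tap layout by simple arithmetic, as the short check below sketches. The variable names are hypothetical and the nominal 20 ps step is treated as exact for illustration.

```python
DELTA_T_PS = 20          # nominal delay step between adjacent taps, in picoseconds
REFERENCE_TAP = 1        # tap used as the reference path
FIRST_OCSP_TAP = 9       # first streaming processing tap
LAST_OCSP_TAP = 16       # last streaming processing tap

# A 20 ps symbol period corresponds to 1000 / 20 = 50 GBaud.
symbol_rate_gbaud = 1000 / DELTA_T_PS

# Impulse response duration: span between the first and last streaming taps.
impulse_duration_ps = (LAST_OCSP_TAP - FIRST_OCSP_TAP) * DELTA_T_PS

# Delay gap between the reference path and the first streaming tap.
reference_gap_ps = (FIRST_OCSP_TAP - REFERENCE_TAP) * DELTA_T_PS

# Condition needed for phase recovery from the power response.
gap_condition_met = reference_gap_ps > impulse_duration_ps
```

With these values the gap is 160 ps against an impulse response duration of 140 ps, so the condition τ > T holds with a margin of one delay step.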

Distinct convolutional kernels were first implemented using the OCSP chip for generic image processing functions (kernels 1, 2) and for verification of convolution accuracy at the critical point (kernel 3) (Fig. 3b). The windowed power response of the entire chip (with the reference path) within a frequency range of 3.5/ΔT was Fourier transformed to obtain the impulse response, which was then used to assess the errors against the desired tap coefficients and thus the needed update of the applied electrical power. After 65 iterations, the OCSP taps’ amplitudes and phases converged to their desired values and could thus be used for subsequent optical data processing (Fig. 3b, first two rows). Since the OCSP features a periodic frequency response with an FSR of 1/ΔT (Fig. 3b, last two rows), it can support simultaneous computing and feature extraction on multiple wavelength channels, provided the channel spacing matches an integer multiple of the OCSP’s FSR.

To verify the parallel computing capabilities of our OCSP chip, five wavelengths were simultaneously selected by waveshaper 1 (ws1) and transmitted through a shared optical path. The input optical data were encoded as the intensities of flattened image matrices at 50 GBaud and loaded onto each wavelength channel. At the output port, waveshaper 2 (ws2) dynamically selected the computational results of each wavelength, enabling wavelength-demultiplexed readout (Fig. 4a–c). The waveforms of the convolution results and their mean squared error curves are shown in Fig. 4d, f (see Supplementary Notes 2, 7 for a detailed discussion of the convolution results). The convolutional kernels directly implemented by the OCSP chip can achieve the desired image processing functions, including averaging (kernel 1) and edge enhancement (kernel 2). The OCSP’s convolutional weights need to satisfy a rule so that the optical carrier is maintained for detection: the sum of the weights corresponds to the transmission of the carrier, so it cannot be zero, as it is for the Sobel operator, for example. At that point the carrier is suppressed (i.e., located at a notch of the amplitude response). We note that this does not limit the OCSP’s capability. On one hand, a local oscillator laser at the same wavelength as the carrier can be supplied at detection to compensate for the carrier’s power loss. On the other hand, convolutional weights that do not satisfy the rule can always be decomposed into two sets of convolutional weights that are implemented simultaneously and then synthesized, as we have demonstrated in Fig. 4e (synthesized kernels 1 and 2). In this experiment, we break a kernel down into two sets of taps when the sum of its tap coefficients is lower than 2 (for the 8-tap OCSP used) (see Supplementary Notes 1, 3 for a detailed discussion).
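The decomposition argument rests on the linearity of convolution, which the sketch below illustrates. The split used here (positive taps in one set, negative taps in the other) is an assumption chosen for simplicity; the authors' actual decomposition is described in their Supplementary Notes and may differ. The kernel, input data, and function names are hypothetical.

```python
def stream_convolve(x, weights):
    """Return y[k] = sum_n w[n] * x[k - n], treating x as zero before the stream starts."""
    return [
        sum(w * x[k - n] for n, w in enumerate(weights) if k - n >= 0)
        for k in range(len(x))
    ]


def split_kernel(weights):
    """Split a kernel into a positive-tap set and a negative-tap set (assumed scheme)."""
    positive = [w if w > 0 else 0 for w in weights]
    negative = [w if w < 0 else 0 for w in weights]
    return positive, negative


kernel = [-1, 0, 1]               # zero-sum, edge-detection-like kernel
x = [3, 1, 4, 1, 5, 9, 2, 6]      # hypothetical input symbols

set_a, set_b = split_kernel(kernel)
# Neither sub-kernel sums to zero, so each keeps a nonzero carrier transmission.
sums = (sum(set_a), sum(set_b))

direct = stream_convolve(x, kernel)
synthesized = [
    a + b for a, b in zip(stream_convolve(x, set_a), stream_convolve(x, set_b))
]
```

Because convolution is linear in the kernel, `synthesized` equals `direct` element by element, even though the original kernel sums to zero and could not be implemented on its own.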

**Fig. 4: Image processing results of OCSP kernels.**

Optoelectronic hybrid neural network

To further highlight the capability of the OCSP system, we performed a proof-of-concept demonstration of its deployment in data centers and complex neural networks, where it works in coordination with electronic hardware to support advanced AI tasks. Figure 5a shows the schematic of the integrated parallel data interconnect system for large-capacity optical interconnection. At the WDM transmitter, each comb line carries an independent data stream. The OCSP is directly embedded in the WDM transmission link and remains compatible with the transceiver interfaces, enabling parallel feature extraction across all wavelength channels. At the WDM receiver, the multi-channel convolutional stream is detected by the photodetector and, after processing (such as sampling and retiming), transmitted to the next computing node (for detailed deployment in a data center, see Supplementary Note 9). In this proof-of-concept study, we instead use a microcomb as the multi-wavelength source, with a waveshaper and Mach-Zehnder modulators handling the dynamic loading of data onto the various wavelengths. We adopted the PAM-16 modulation format, a potential enabler for next-generation data center optical interconnects that offers higher spectral efficiency than the PAM-4 format commonly used in data centers while meeting the requirements envisioned for AI workloads. To validate the universality of our OCSP chip, we conducted tests on distinct datasets. Figure 5b shows the network model used in the CIFAR10⁴⁴ test. In this test, 200 images were divided into 5 batches, and five wavelengths each carried the image data of one batch. The input RGB image of dimensions 34 × 34 × 3 pixels was processed through three 2 × 2 × 3 convolutional kernels, generating three feature maps with a spatial resolution of 17 × 33 for subsequent network processing.
In this experiment, the first convolutional layer was implemented using photonic computing hardware, while the subsequent layers of the network were realized in electronic computing hardware. The electronic hardware segment comprises five basic layers followed by a fully connected (FC) layer. Each basic layer contains four consecutive convolutional layers that form a dense processing block. The feature maps and the expanded waveforms of kernel 1 obtained from the experiment and from the CPU are shown in Fig. 5c. The slight difference between experiment and calculation is mainly caused by the limited system bandwidth. The confusion matrix (Fig. 5d) shows that the prediction accuracies obtained from the experiment and from theoretical calculation are 85% and 92%, respectively. Additional experimental results on an ImageNet subset⁴⁵, Fashion-MNIST⁴⁶, and MNIST⁴⁷ are shown in Supplementary Note 10.
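The 17 × 33 feature-map size quoted for the 34 × 34 × 3 input can be reproduced with the standard output-size formula for an unpadded convolution. The stride values below are an assumption: the text does not state them, and a vertical stride of 2 with a horizontal stride of 1 is simply one setting that yields the stated dimensions for a 2 × 2 window. The function name is hypothetical.

```python
def conv_output_size(input_size, kernel_size, stride):
    """Output length of an unpadded convolution: floor((input - kernel) / stride) + 1."""
    return (input_size - kernel_size) // stride + 1


INPUT_SIZE = 34    # input height and width in pixels (from the text)
KERNEL_SIZE = 2    # kernel height and width (from the text)

# Assumed strides: 2 along the rows, 1 along the columns.
rows = conv_output_size(INPUT_SIZE, KERNEL_SIZE, stride=2)
cols = conv_output_size(INPUT_SIZE, KERNEL_SIZE, stride=1)

# Each of the three kernels yields one feature map of this size.
feature_map_shape = (rows, cols)
symbols_per_map = rows * cols
```

Under these assumed strides, each feature map has shape (17, 33), matching the spatial resolution reported above.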
