Principle of operation
The core structure integrated in the photonic chip comprises a passive streaming processing unit, a reference waveguide, and two pairs of fan-outs for data processing and calibration, respectively (Fig. 1b). The streaming processing unit can be modeled as a finite impulse response (FIR) filter42. Mathematically, its frequency response H(f) and impulse response h(t) are given by:
$$H\left(f\right)={\sum }_{n=0}^{N-1}{a}_{n}{e}^{j{\varphi }_{n}}{e}^{-j2\pi {fn}\Delta T}$$
$$h\left(t\right)={\sum }_{n=0}^{N-1}{a}_{n}{e}^{j{\varphi }_{n}}\delta (t-n\Delta T)$$
where f is the optical frequency, N denotes the number of taps, and nΔT is the delay of tap n. The amplitude and phase of tap n are represented by \({a}_{n}\) and \({\varphi }_{n}\); both can be fully adjusted by changing the splitting ratios of the MZIs and the phase shifts of the phase shifters. H(f) is periodic with a free spectral range (FSR) of 1/ΔT. Because of this periodic interference characteristic, when the carrier frequency spacing of the input streams aligns with the filter’s FSR, the OCSP can simultaneously perform parallel convolution across multiple data streams without requiring additional on-chip components (as shown in Fig. 1a, b).
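The periodicity of H(f) can be checked numerically. The following is a minimal sketch, not the authors' code: the tap weights are random placeholders, ΔT = 20 ps is taken from the experimental section, and f is treated as a frequency offset from the carrier rather than the absolute optical frequency.

```python
import numpy as np

def fir_response(f, weights, delta_t):
    """H(f) = sum_n w_n * exp(-j*2*pi*f*n*delta_t), with w_n = a_n * exp(j*phi_n)."""
    n = np.arange(len(weights))
    # Outer product f x n gives one row per frequency and one column per tap.
    return np.exp(-2j * np.pi * np.outer(f, n) * delta_t) @ weights

delta_t = 20e-12                      # tap delay step (s)
fsr = 1.0 / delta_t                   # free spectral range (Hz), 50 GHz here
rng = np.random.default_rng(0)
# Placeholder complex tap weights (assumption, not measured values).
weights = rng.uniform(0.1, 1.0, 8) * np.exp(1j * rng.uniform(-np.pi, np.pi, 8))

f = np.linspace(0.0, fsr, 201)        # frequency offsets spanning one FSR
h0 = fir_response(f, weights, delta_t)
h3 = fir_response(f + 3 * fsr, weights, delta_t)   # same grid, three FSRs away
max_dev = np.max(np.abs(h0 - h3))     # should vanish up to rounding error
print(max_dev)
```

Because the response repeats every FSR, a carrier placed three FSRs away experiences the same complex tap weights, which is what allows several wavelength channels to share one filter.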
a Optical spectra at critical locations of the parallel optical convolution streaming processing system. (i) the microcomb with a repetition rate of 1/ΔT serves as a multi-wavelength source. (ii) output optical spectrum of the modulated multi-wavelength signal, where each carrier is independently loaded with data at a symbol rate of 1/ΔT. (iii) the OCSP’s transmission spectrum with a free spectral range (FSR) of 1/ΔT. (iv) multi-wavelength data streams after parallel convolution processing by the OCSP. b System diagram for parallel image processing using the OCSP. Multiple image data streams are independently loaded onto distinct wavelength channels and transmitted through a shared optical fiber to the OCSP chip, enabling parallel convolution processing. Wavelength-selective switches can be employed to dynamically manage access to these data streams. The outputs are then demultiplexed for simultaneous, parallel readout of all processed streams. ISO optical isolator, BPF bandpass filter, DMUX demultiplexer, MUX multiplexer, MZM Mach-Zehnder modulator, PD photodetector
For an input optical data stream x(t), the filter output is given by the convolution:
$$y\left(t\right)=x\left(t\right)* h\left(t\right)={\sum }_{n=0}^{N-1}{a}_{n}{e}^{j{\varphi }_{n}}x(t-n\Delta T)$$
The input optical data streams, at a symbol rate of 1/ΔT (which is also the repetition frequency of the microcomb), are wavelength-division multiplexed and can be of unlimited length for streaming processing. The input optical data streams are simultaneously duplicated and weighted via cascaded MZIs, with the phases controlled by phase shifters (PSs). Next, the weighted replicas are progressively delayed with a delay step (between adjacent spatial paths) equal to ΔT, the symbol duration of the input data streams, effectively achieving time-space interleaving (i.e., the input data’s adjacent symbols are aligned in time). Finally, the weighted and delayed replicas are combined, interfering constructively or destructively with each other to yield the convolution results.
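The duplicate, weight, delay, and combine steps can be sketched at the symbol level as follows. This is an idealized illustration under stated assumptions: one sample per symbol, no noise or bandwidth limits, and a hypothetical 4-tap weight set and symbol stream chosen only for the example.

```python
import numpy as np

def ocsp_output(x, weights):
    """Sum of weighted copies of x, where the n-th copy is delayed by n symbols."""
    n_taps = len(weights)
    y = np.zeros(len(x) + n_taps - 1, dtype=complex)
    for n, w in enumerate(weights):
        y[n:n + len(x)] += w * x      # replica on path n arrives n symbols late
    return y

rng = np.random.default_rng(1)
x = rng.integers(0, 16, size=64).astype(float)    # hypothetical symbol stream
amplitudes = np.array([0.5, 1.0, 0.5, 0.25])      # set by the MZI splitting ratios
phases = np.array([0.0, 0.0, np.pi, np.pi])       # a phase of pi gives a negative weight
weights = amplitudes * np.exp(1j * phases)

y = ocsp_output(x, weights)
reference = np.convolve(x, weights)               # textbook discrete convolution
print(np.allclose(y, reference))
```

The coherent sum of delayed replicas matches `np.convolve` sample for sample, and with phases restricted to 0 or π the output field stays real-valued up to rounding error.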
The convolution window effectively slides at the input data rate. Each output symbol is the dot product between the weights (implemented by the MZIs and PSs) and the input data within a convolution window or receptive field, whose size is determined by the scale of the chip, i.e., the number of spatial paths N. Consequently, each output symbol is the result of N multiply-and-accumulate (MAC) operations. The OCSP operates directly on the optical fields. When it processes real-valued data (i.e., intensity-modulated data with phases of 0 or π), each MAC counts as 2 operations (1 multiplication and 1 accumulation), which results in a peak computing speed of 2N/ΔT operations per second (OPS) per wavelength channel. This computing speed can be boosted by processing many wavelength-division multiplexed optical data streams in parallel; the frequency interval between wavelength channels needs to be equal to, or an integer multiple of, the symbol rate 1/ΔT.
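As a worked example of this estimate, take the values used in the verification experiment (N = 8 active taps, ΔT ≈ 20 ps, five wavelength channels). The figures below follow directly from the 2N/ΔT formula and should be read as idealized peak values rather than measured throughput:

$$\frac{2N}{\Delta T}=\frac{2\times 8}{20\,{\rm{ps}}}=0.8\times {10}^{12}\,{\rm{OPS}}\ \text{per wavelength channel},\qquad 5\times 0.8\,{\rm{TOPS}}=4\,{\rm{TOPS}}\ \text{for five channels}$$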
Kernel self-calibration of the OCSP
The wavelength-division multiplexing and time-space interleaving convolutional computing architecture relies on two factors: (a) the alignment of the optical comb repetition frequency with the FSR of the OCSP, and (b) the alignment of the phase shifts induced by each path on the OCSP. Yet even tiny delay variations, on the order of the optical wavelength (~1550 nm), can lead to large phase errors in the convolution weights and thus in the computing results. This is a nontrivial issue due to the challenges in obtaining phase-related responses of the chip and overcoming the dynamic on-chip thermal crosstalk or temperature disturbance (Fig. 2f, g). To address this, we first map the convolution kernel to the time-domain taps of the OCSP chip. Subsequently, we apply an on-chip phase recovery method43 to optical computing chips (Fig. 3a). This adds an optical reference path to the chip, which enables phase recovery based on the intensity-only spectral response of the chip via the gap method. The intensity response can be easily measured with an external tunable laser source and a power meter; by placing the reference path on the chip, we ensure that any phase errors in the paths that lead to external instrumentation do not affect the phase measurement. The measured power response of the entire OCSP chip can be regarded as a superposition of spectral components yielded by “internal” (between the streaming processing taps) and “external” (between the reference tap and the streaming processing taps) interferences. As such, the impulse response of the OCSP (i.e., the convolutional weights) can be obtained via the Fourier transform of the entire chip’s power response, provided the delay gap τ between the reference path and the first streaming processing tap exceeds the duration T of the OCSP’s impulse response (τ > T), so that the internal and external components do not overlap in the time domain.
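The separation of internal and external interference terms can be illustrated with a small numerical sketch. It assumes an idealized chip: delays that are exact integer multiples of ΔT, a known real reference amplitude, noise-free power samples across one FSR, and no windowing. The tap values are random placeholders, and this is not the authors' processing code.

```python
import numpy as np

n_taps, gap = 8, 8                  # 8 taps spaced by delta_t; reference leads by gap * delta_t
rng = np.random.default_rng(2)
# Placeholder complex tap weights that the "measurement" should recover.
taps = rng.uniform(0.2, 1.0, n_taps) * np.exp(1j * rng.uniform(-np.pi, np.pi, n_taps))
ref = 1.0                           # reference-path amplitude (assumed real and known)

# Field impulse response of the whole chip on a time grid of delta_t.
field = np.zeros(gap + n_taps, dtype=complex)
field[0] = ref
field[gap:] = taps

# Intensity-only "measurement": power response sampled across one FSR.
m = 64                              # frequency samples; must exceed 2 * len(field) - 1
power = np.abs(np.fft.fft(field, m)) ** 2

# The inverse transform of the power response is the autocorrelation of the field response.
acorr = np.fft.ifft(power)
# Lags 0..n_taps-1 hold tap-to-tap ("internal") beats; lags gap..gap+n_taps-1
# hold reference-to-tap ("external") beats, equal to taps[n] * conj(ref).
recovered = acorr[gap:gap + n_taps] / np.conj(ref)
print(np.allclose(recovered, taps))
```

With `gap = 8` and internal lags reaching at most 7, the two groups occupy disjoint time bins, which is the discrete counterpart of τ > T. Reducing `gap` below `n_taps` makes them overlap and the recovery fails.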
With the dynamic convolutional weights obtained (both amplitudes and phases), the required updates of the electrical power supply can be determined and the desired kernel weights can be deterministically dialed in (for the detailed self-calibration process and robustness test, see Supplementary Notes 4, 5). Beyond successfully calibrating our OCSP, this phase recovery approach enables optical computing chips to obtain comprehensive dynamic on-chip frequency and impulse responses (both phases and amplitudes), thus enabling accurate and trainable on-chip frequency responses that greatly enhance the computing accuracy and allow rapid training of coherent optical computing hardware.
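The measure-compare-update idea can be sketched with a toy feedback loop. Everything in this model is a hypothetical stand-in: the linear phase-versus-power slope, the weak random crosstalk matrix, the damped proportional update, and the target phases. Only tap phases are calibrated here. The actual procedure is described in Supplementary Notes 4, 5 and may differ.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 8
slope = 0.1                                    # assumed nominal phase slope (rad per mW)
crosstalk = 0.05 * rng.uniform(-1, 1, (n, n))  # assumed weak thermal crosstalk
np.fill_diagonal(crosstalk, 0.0)
response = slope * (np.eye(n) + crosstalk)     # true phase-vs-power matrix, unknown to the loop
offset = rng.uniform(-np.pi, np.pi, n)         # unknown static phase errors

def wrap(phi):
    """Wrap phases to the interval (-pi, pi]."""
    return np.angle(np.exp(1j * phi))

def measure_phases(power_mw):
    """Stand-in for the phase-recovery measurement; returns wrapped tap phases."""
    return wrap(offset + response @ power_mw)

target = np.array([0, 0, np.pi, 0, np.pi, np.pi, 0, 0], dtype=float)  # hypothetical kernel signs
power_mw = np.full(n, 50.0)                    # initial heater powers (mW)
gain = 0.5                                     # damping factor for the update

for _ in range(65):
    error = wrap(measure_phases(power_mw) - target)
    power_mw -= gain * error / slope           # update uses only the nominal slope

final_error = np.abs(wrap(measure_phases(power_mw) - target))
print(final_error.max())
```

The loop never uses the crosstalk matrix. It converges because each iteration re-measures the phases, so residual coupling between heaters is corrected on later passes, provided the crosstalk is weak relative to the direct response.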
a Photograph of the packaged OCSP chip. Optical signals enter and leave the chip via an edge-coupled fiber array. b The spectrum of the microcomb, with a repetition frequency of ~50 GHz. c The zoom-in of microcomb. d The extracted relationship between the applied electrical power and the power splitting ratio of the MZI. e The extracted relationship between the applied electrical power and the phase shifts of the phase shifters. f The recovered power of the taps with sweeping electrical power applied onto MZI1. g The recovered phases of the taps with sweeping electrical power applied onto phase shifter1
a Kernel self-calibrating schematic diagram. b The convergence curve of the OCSP taps’ amplitudes (first row) and phases (second row) within 65 iterations. The power response (third row) and impulse responses (obtained via Fourier transform, fourth and fifth rows) of the trained OCSP chip with the reference path included (i.e., the power response measured via the calibration ports). The frequency responses of the trained OCSP without the reference path (i.e., the power response measured via the signal ports, last two rows). λ1 = 1556.66 nm, λ2 = 1557.07 nm, λ3 = 1557.48 nm, λ4 = 1557.89 nm and λ5 = 1558.3 nm denote the carriers of the input wavelength-division multiplexed data streams
Convolutional kernel verification
To demonstrate our approach, we fabricated a 16-tap FIR chip on a standard silicon-on-insulator (SOI) platform43. Taps 9–16 served as the OCSP, taps 2–8 were in the off state, and tap 1 served as the reference path. The delay step of the OCSP’s streaming processing taps is ΔT ≈ 20 ps, corresponding to an input optical data rate of 50 GBaud. This gives an OCSP impulse response duration of T = 7 × ΔT = 140 ps. The delay gap between the reference path and the OCSP’s first tap is τ ≈ 8 × ΔT, satisfying τ > T for subsequent phase recovery via Fourier transform. Four stages of tunable couplers were used to set the amplitudes of the convolutional weights, and eight PSs were used to set the taps’ phases (i.e., the signs of the weights).
Distinct convolutional kernels were first implemented using the OCSP chip for generic image processing functions (kernels 1, 2) and for verification of convolution accuracy at the critical point (kernel 3) (Fig. 3b). The windowed power response of the entire chip (with the reference path) within a frequency range of 3.5/ΔT was Fourier transformed to obtain the impulse response, which was then used to assess errors against the desired tap coefficients and thus to determine the needed update of the applied electrical power. After 65 iterations, the OCSP taps’ amplitudes and phases converged to their desired values and could thus be used for subsequent optical data processing (Fig. 3b, first two rows). Since the OCSP features a periodic frequency response with an FSR of 1/ΔT (Fig. 3b, last two rows), it can support simultaneous computing/feature extraction on multiple wavelength channels, provided the channel spacing matches an integer multiple of the OCSP’s FSR.
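The spacing condition can be checked against the five carrier wavelengths listed in the Fig. 3 caption. This is a rough consistency check only. The wavelengths are quoted to 0.01 nm, which corresponds to roughly 1.2 GHz near 1557 nm, and the nominal FSR of 50 GHz assumes ΔT is exactly 20 ps.

```python
C = 299_792_458.0                                   # speed of light in vacuum (m/s)
wavelengths_nm = [1556.66, 1557.07, 1557.48, 1557.89, 1558.30]
fsr_ghz = 50.0                                      # nominal 1/delta_t for delta_t = 20 ps

# Convert each carrier wavelength to an optical frequency in GHz.
freqs_ghz = [C / (lam * 1e-9) / 1e9 for lam in wavelengths_nm]
# Frequency falls as wavelength rises, so take f[i] - f[i + 1] for positive spacings.
spacings_ghz = [freqs_ghz[i] - freqs_ghz[i + 1] for i in range(len(freqs_ghz) - 1)]

for s in spacings_ghz:
    print(f"spacing = {s:.2f} GHz, ratio to nominal FSR = {s / fsr_ghz:.3f}")
```

Within the rounding of the quoted wavelengths, adjacent carriers are separated by about one FSR, consistent with the ~50 GHz microcomb repetition frequency.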
To verify the parallel computing capabilities of our OCSP chip, five wavelengths were simultaneously selected by waveshaper1 (ws1) and transmitted through a shared optical path. The input optical data was encoded as intensities of flattened image matrices at 50 GBaud and loaded onto each wavelength channel. At the output port, waveshaper2 (ws2) dynamically selected the computational results of each wavelength, enabling wavelength-demultiplexed readout (Fig. 4a–c). The waveforms of the convolution results and their mean squared error curves are shown in Fig. 4d, f (see Supplementary Notes 2, 7 for a detailed discussion of the convolution results). The convolutional kernels directly implemented by the OCSP chip can achieve desired image processing functions including averaging (kernel 1) and edge enhancement (kernel 2). The OCSP’s convolutional weights need to satisfy a rule so that the optical carrier is maintained for detection: the sum of the weights corresponds to the transmission of the carrier, so it cannot be set to zero. For a zero-sum kernel such as the Sobel operator, the carrier is suppressed (i.e., it sits at a notch of the amplitude response). We note that this does not limit the OCSP’s capability. On one hand, a local oscillator laser at the same wavelength as the carrier can be supplied at detection to compensate for the carrier’s power loss; on the other hand, convolutional weights that do not satisfy the rule can always be decomposed into two sets of convolutional weights that are implemented simultaneously and then synthesized, as we demonstrate in Fig. 4e (synthesized kernels 1 and 2). In this experiment, we split a kernel into two sets of taps when the sum of its tap coefficients is lower than 2 (for the 8-tap OCSP used) (see Supplementary Notes 1, 3 for a detailed discussion).
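The decomposition argument rests on the linearity of convolution, which can be illustrated as follows. The split into positive and negative parts, the 8-tap zero-sum kernel, and the input row are all hypothetical choices for the example. The sketch works on ideal numerical values and ignores photodetection, and the decomposition actually used in the experiment (Supplementary Notes 1, 3) may differ.

```python
import numpy as np

def split_kernel(kernel):
    """Split a kernel into non-negative parts so that kernel = k_pos - k_neg."""
    k_pos = np.where(kernel > 0, kernel, 0.0)
    k_neg = np.where(kernel < 0, -kernel, 0.0)
    return k_pos, k_neg

# Hypothetical zero-sum, edge-detecting 8-tap kernel (the carrier would be suppressed).
kernel = np.array([-1.0, -2.0, -1.0, 0.0, 0.0, 1.0, 2.0, 1.0])
k_pos, k_neg = split_kernel(kernel)

rng = np.random.default_rng(4)
x = rng.uniform(0.0, 1.0, 128)                 # hypothetical flattened image row

y_direct = np.convolve(x, kernel)              # what the zero-sum kernel would compute
y_synth = np.convolve(x, k_pos) - np.convolve(x, k_neg)   # synthesized from two partial results
print(kernel.sum(), k_pos.sum(), k_neg.sum())
print(np.allclose(y_direct, y_synth))
```

Each partial kernel has a nonzero weight sum, so its carrier survives, while the combination of the two outputs reproduces the zero-sum result.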
a Setup diagram. EDFA, erbium-doped fiber amplifier. BPF, bandpass filter. MZM, Mach-Zehnder modulator. PD, photodetector. b Original input image. c Spectrum diagram of five wavelengths after modulation of the signal. d The first column shows the experimental results of the output optical waveforms’ intensities, with the black dotted lines showing the simulated results and the second column illustrates the detailed view of the waveforms. e The feature maps generated after applying two different convolution kernels and the synthesized feature map (Sobel operator). f The convolution error curve after processing image data carried by five distinct wavelengths utilizing three different convolutional kernels
Optoelectronic hybrid neural network
To further highlight the capability of the OCSP system demonstrated in this work, we performed a proof-of-concept demonstration to validate the OCSP’s deployment in data centers and complex neural networks, where it works in coordination with electronic hardware to facilitate advanced AI tasks. Figure 5a shows the schematic of the integrated parallel data interconnect system for large-capacity optical interconnection. At the WDM transmitter, each comb line carries an independent data stream. The OCSP is directly embedded in the WDM transmission link, maintaining compatibility with transceiver interfaces and enabling parallel feature extraction across all wavelength channels. At the WDM receiver, the multi-channel convolutional stream is detected by the photodetector and, after processing (such as sampling and retiming), transmitted to the next computing node (for detailed deployment in data centers, see Supplementary Note 9). In this proof-of-concept study, we instead use microcombs as the multi-wavelength source, while a waveshaper and Mach-Zehnder modulators dynamically load data onto the various wavelengths. We adopted the PAM-16 modulation format, a potential enabler for next-generation data center optical interconnects that offers higher spectral efficiency than the PAM-4 format commonly used in data centers while meeting the requirements envisioned for AI workloads. To validate the universality of our OCSP chip, we conducted tests on distinct datasets. Figure 5b shows the network model used for the test on the CIFAR-10 dataset44: 200 images were divided into 5 batches, and five wavelengths each carried the image data of one batch. The input RGB image of dimensions 34 × 34 × 3 pixels was processed by three 2 × 2 × 3 convolutional kernels, generating three feature maps with a spatial resolution of 17 × 33 for subsequent network processing.
In this experiment, the first convolutional layer was implemented using photonic computing hardware, while the subsequent layers of the network were realized through electronic computing hardware. The electronic hardware segment comprises five basic layers followed by a fully connected (FC) layer. Each basic layer contains four consecutive convolutional layers that form a dense processing block. The feature maps and the expanded waveforms of kernel 1 obtained from the experiment and the CPU are shown in Fig. 5c. The slight difference between the experiment and the calculation is mainly caused by the limited system bandwidth. The confusion matrices (Fig. 5d) show prediction accuracies of 85% for the experiment and 92% for the theoretical calculation. Additional experimental results on the ImageNet subset45, Fashion-MNIST46, and MNIST47 datasets are shown in Supplementary Note 10.
a Schematic diagram of parallel optical interconnection in data centers. The OCSP chip is embedded in the WDM transmission link, serving as a multi-channel convolution accelerator and cooperating with electronic devices to perform complex network tasks. OPU optical processing unit. XPU accelerated Processing Unit. EIC electronic integrated circuits. b The architecture of the convolution neural network (CNN). Each basic layer contains four consecutive convolutional layers that form a dense processing block. To enhance gradient flow and mitigate vanishing gradients, residual connections are integrated by routing the input through a convolutional layer (for dimension alignment) before merging it with the output of the block via element-wise addition. Following these residual blocks, a pooling layer down-samples the extracted features, which are then flattened and projected onto a 10-dimensional space corresponding to the CIFAR-10 classes by the final FC layer. Conv, convolutional layer. FC, fully connected layer. c The experimental and simulated feature maps and waveform of kernel1. d The confusion matrices obtained from CPU implementation and the proposed optoelectronic hybrid architecture implementation





