Machine learning for real-time processing of ATLAS liquid argon calorimeter signals with FPGAs

The ATLAS experiment at CERN measures the energy of particles produced in proton-proton (p-p) collisions, which occur at a repetition frequency of 40 MHz at the Large Hadron Collider (LHC). The readout electronics of the liquid-argon (LAr) calorimeters are being prepared for High-Luminosity LHC (HL-LHC) operation as part of the Phase-II upgrade, anticipating a pileup of up to 200 simultaneous p-p interactions. The increased number of p-p interactions implies that calorimeter signals of up to 25 consecutive collisions overlap, making energy reconstruction more challenging. To meet the goals of the HL-LHC, field-programmable gate arrays (FPGAs) are used to process the digitized pulses, sampled at 40 MHz, in real time, and different machine learning approaches are being investigated to deal with signal pileup. Convolutional and recurrent neural networks outperform the optimal signal filter currently in use, both in assigning the reconstructed energy to the correct proton bunch crossing and in energy resolution. The improvements are largest for energies reconstructed from overlapping pulses. Because the neural networks are implemented on an FPGA, the number of parameters, resource usage, latency and operating frequency must be carefully analysed. Very good agreement is observed between the FPGA and software implementations of the neural networks.


Introduction
The ATLAS detector [1] is located at the Large Hadron Collider [2] (LHC) and is used to detect particles produced in high-energy p-p collisions. The proton bunches collide every 25 ns, resulting in a collision frequency of 40 MHz. Scheduled to begin with Run-4 in 2027, the high-luminosity phase of the LHC (HL-LHC) is projected to achieve instantaneous luminosities of (5-7) × 10³⁴ cm⁻² s⁻¹. This corresponds to 140-200 simultaneous p-p interactions per bunch crossing. The ATLAS liquid argon (LAr) calorimeters mainly exploit the ionisation signal to measure the energy of electromagnetic showers initiated by photons, electrons, and positrons. Up to 25 signal pulses produced in successive LHC bunch crossings (BCs) may overlap; this out-of-time pileup significantly degrades the energy resolution of the LAr calorimeter.
The deposited energy must be reconstructed for each of the 182,000 calorimeter cells at the correct BC and with high energy resolution. The calorimeter is also expected to provide real-time energy reconstruction to the ATLAS trigger system, so the digital processing of LAr calorimeter signals in Run-4 must handle a continuous data stream. Because of the huge input data bandwidth of about 250 Tbps, delivered through serial connections over 36,000 optical fibers, FPGA technology was chosen over alternative processing devices. In the current design options, one Intel Stratix-10 FPGA [3], with a latency requirement of about 150 ns [4,5], will process 384 or 512 LAr calorimeter cells, which corresponds to the data measured by three or four so-called front-end boards (FEBs), respectively.
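As a rough cross-check of the figures quoted above, the following sketch derives the average bandwidth per fiber, the latency budget in units of bunch crossings, and the number of FPGAs implied by each design option. The function names are ours, and the FPGA count is a naive estimate that ignores spares and any constraints on how cells are mapped to boards.

```python
# Back-of-the-envelope arithmetic using only numbers quoted in the text.
# The helper names are illustrative and not part of any ATLAS software.

TOTAL_BANDWIDTH_TBPS = 250   # approximate total input bandwidth
N_FIBERS = 36_000            # optical fibers
N_CELLS = 182_000            # calorimeter cells
BC_PERIOD_NS = 25            # bunch-crossing spacing
LATENCY_BUDGET_NS = 150      # approximate latency requirement


def per_fiber_gbps(total_tbps, n_fibers):
    """Average bandwidth per fiber in Gbps."""
    return total_tbps * 1000 / n_fibers


def latency_in_bcs(latency_ns, bc_period_ns):
    """Latency budget expressed in bunch crossings."""
    return latency_ns / bc_period_ns


def fpgas_needed(n_cells, cells_per_fpga):
    """Naive FPGA count (ceiling division, no spares)."""
    return -(-n_cells // cells_per_fpga)


if __name__ == "__main__":
    print(round(per_fiber_gbps(TOTAL_BANDWIDTH_TBPS, N_FIBERS), 2), "Gbps per fiber")
    print(latency_in_bcs(LATENCY_BUDGET_NS, BC_PERIOD_NS), "BCs of latency")
    for cells in (384, 512):
        print(cells, "cells per FPGA:", fpgas_needed(N_CELLS, cells), "FPGAs")
```

Both design options correspond to 128 cells per FEB (384/3 and 512/4).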

Energy reconstruction in the ATLAS liquid argon Calorimeter
To calculate the energy in each cell, the current readout electronics of the LAr calorimeters digitize the electronic pulse from the calorimeter at 40 MHz and apply an optimal filter [6] (OF) algorithm. Electronic noise and signal pileup are suppressed by using a linear combination of up to five digitized pulse samples. A peak finder is then employed to assign the energy to the correct BC. The optimal filter assumes an ideal pulse shape, which leads to reduced performance when the pulse is distorted by prior events. In this case, the peak finder fails to efficiently assign the energy to the correct BC.
We develop Artificial Neural Network (ANN) based methods to improve the energy resolution at the HL-LHC. The ANNs are trained using simulated HL-LHC data obtained with AREUS [7], which includes electronic noise and low-energy deposits of up to approximately 1 GeV from particles produced in inelastic p-p collisions. To emulate hard-scattering events, energy deposits drawn from a uniform transverse energy spectrum with a maximum of 5 GeV are overlaid randomly, at a mean interval of 30 BC with a standard deviation of 10 BC. The simulation is performed for one cell in the barrel section of the LAr calorimeter with an average pileup of ⟨µ⟩ = 140.
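The two steps described above, a linear combination of consecutive samples followed by a peak finder, can be sketched as follows. This is a minimal illustration and not the ATLAS firmware: real OF coefficients are derived from the expected pulse shape and the noise autocorrelation, whereas the coefficients and pedestal here are supplied by the caller, and the strict local-maximum criterion is an assumed form of the peak finder.

```python
def optimal_filter(samples, coeffs, pedestal=0.0):
    """Slide a linear filter over the samples.

    Each output is sum_i coeffs[i] * (sample_i - pedestal) over a window
    of len(coeffs) consecutive samples (up to five in the text).
    """
    n = len(coeffs)
    out = []
    for i in range(len(samples) - n + 1):
        window = samples[i:i + n]
        out.append(sum(a * (s - pedestal) for a, s in zip(coeffs, window)))
    return out


def peak_finder(values):
    """Keep a value only where it is a strict local maximum, else zero.

    Assumed form of the peak finder: the energy is assigned to the
    position whose filter output exceeds both of its neighbours.
    """
    kept = [0.0] * len(values)
    for i in range(1, len(values) - 1):
        if values[i - 1] < values[i] > values[i + 1]:
            kept[i] = values[i]
    return kept


if __name__ == "__main__":
    # Placeholder coefficients, chosen only to make the example readable.
    amplitudes = optimal_filter([0, 0, 1, 3, 2, 1, 0, 0], [0.25, 0.5, 0.25])
    print(peak_finder(amplitudes))
```

When a second pulse arrives before the first has decayed, the filter output no longer has a clean isolated maximum, which is the failure mode described in the text.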
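The overlay of hard-scattering deposits can be illustrated with a toy generator of the true energy per bunch crossing. This is only a sketch of the sampling described above and does not reproduce AREUS: it contains no pulse shaping, noise or pileup, and the Gaussian distribution of the gaps and the clipping to a minimum gap of one BC are our assumptions, since the text gives only the mean and standard deviation.

```python
import random


def hard_scatter_sequence(n_bc, e_max_gev=5.0, gap_mean=30.0,
                          gap_sigma=10.0, seed=0):
    """Return a list of true deposited energies (GeV), one per BC.

    Deposits are uniform in [0, e_max_gev]; the gaps between them are
    drawn from a Gaussian (assumption) and clipped to at least 1 BC.
    """
    rng = random.Random(seed)
    energies = [0.0] * n_bc
    bc = 0
    while True:
        gap = max(1, round(rng.gauss(gap_mean, gap_sigma)))
        bc += gap
        if bc >= n_bc:
            break
        energies[bc] = rng.uniform(0.0, e_max_gev)
    return energies


if __name__ == "__main__":
    seq = hard_scatter_sequence(200)
    hits = [(bc, round(e, 2)) for bc, e in enumerate(seq) if e > 0.0]
    print(hits)
```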

Neural network
Two neural network architectures, based on Convolutional and Recurrent Neural Networks (CNNs, RNNs), are evaluated for energy reconstruction. Keras [8] and TensorFlow [9] are used to develop and train the ANNs.

Convolutional neural networks
CNNs [10] are studied as an alternative to the OF approach. The networks use a sliding-window method to analyse the input data sequence. The best performance is achieved by splitting the CNN architecture into two subnetworks that are tuned for distinct tasks. The first, a tagging network, detects relatively high energy deposits more than 3σ above the electronic noise, which corresponds to 240 MeV. The resulting detection probability is passed, together with the sample sequence, to a second subnetwork that is trained to reconstruct the energy deposited in each calorimeter cell.
The two CNNs, named 3-Conv and 4-Conv (figure 1), have the same tagging configuration, but their energy reconstruction part consists of one (3-Conv) or two (4-Conv) convolutional layers. The OF achieves a maximum signal efficiency of about 80%, while the tagging CNN reaches efficiencies well above 90%.
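The data flow between the two subnetworks can be sketched in plain Python as below. This is a structural illustration only: each stage is reduced to a single convolutional layer with one output channel, the choice of sigmoid for tagging and ReLU for energy is an assumption, and all kernel values are placeholders, whereas the networks in the text are trained in Keras and have more layers.

```python
import math


def conv1d(channels, kernels, bias, activation):
    """'Valid' 1D convolution with one output channel.

    channels: list of equal-length input sequences.
    kernels:  one list of taps per input channel, all of equal length.
    """
    k = len(kernels[0])
    n_out = len(channels[0]) - k + 1
    out = []
    for i in range(n_out):
        acc = bias
        for ch, kern in zip(channels, kernels):
            acc += sum(w * x for w, x in zip(kern, ch[i:i + k]))
        out.append(activation(acc))
    return out


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    return max(0.0, x)


def two_stage_cnn(samples, tag_kernel, tag_bias, energy_kernels, energy_bias):
    """Tagging stage followed by an energy stage (hypothetical layout)."""
    # Stage 1: probability that a window contains a significant deposit.
    tag_prob = conv1d([samples], [tag_kernel], tag_bias, sigmoid)
    # Align each probability with the latest sample of its window.
    aligned = samples[len(tag_kernel) - 1:]
    # Stage 2: energy from two input channels, samples and tag probability.
    return conv1d([aligned, tag_prob], energy_kernels, energy_bias, relu)


if __name__ == "__main__":
    pulse = [0.0, 0.0, 0.2, 1.0, 0.7, 0.3, 0.0, -0.1, -0.1, 0.0]
    print(two_stage_cnn(pulse, [0.5, 1.0, 0.5], -1.0,
                        [[0.2, 0.6, 0.2], [0.0, 1.0, 0.0]], 0.0))
```

Because the convolutions only look at a fixed window of recent samples, the same weights are applied at every bunch crossing as the window slides along the sequence.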

Recurrent neural networks
RNNs are designed to process time-series data. They consist of an internal neural network that processes the input at the current time step together with the state computed from past inputs, which makes them excellent candidates for reconstructing the deposited energy from time-ordered digitized LAr signals. Two RNN architectures are explored: the vanilla RNN [11] and the long short-term memory (LSTM) [12]. The vanilla RNN has substantially fewer parameters and contains just one activation function, for which we choose ReLU. The LSTM, on the other hand, has a more sophisticated internal structure that uses neural network layers with sigmoid and tanh activation functions to gate the flow of information to the next time step. As a result, the LSTM can process longer sequences and can be operated in two different modes. In the sliding-window mode (figure 2), the digitized signal from the calorimeter is divided into overlapping sub-sequences, and a single reconstructed energy is produced for each sub-sequence. In the single-cell mode, the information is processed continuously at each time step, without a specified sequence length. The vanilla RNN, owing to its simplicity, is used only with the sliding-window approach.
Figure 3 shows a comparison of the energy resolution between the various NN algorithms and the optimal filtering algorithm. Only energy deposits more than 3σ above the electronic noise are considered. All four ANNs outperform the OF.
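The sliding-window operation of a vanilla RNN with ReLU activation can be sketched as follows. This is a minimal illustration with hypothetical, caller-supplied weights and a single linear read-out of the final hidden state; the window length, number of hidden units and output layer of the networks in the text are not specified here and are assumptions of the sketch.

```python
def sliding_windows(samples, window):
    """Split a sequence into overlapping sub-sequences with stride 1."""
    return [samples[i:i + window] for i in range(len(samples) - window + 1)]


def vanilla_rnn_energy(window, w_in, w_rec, bias, w_out, b_out):
    """Run a vanilla RNN over one window and return a single energy.

    h_t    = ReLU(w_in * x_t + W_rec @ h_{t-1} + bias)
    energy = w_out . h_T + b_out   (assumed linear read-out)
    """
    n = len(bias)
    h = [0.0] * n
    for x in window:
        h = [
            max(0.0, w_in[j] * x
                + sum(w_rec[j][k] * h[k] for k in range(n))
                + bias[j])
            for j in range(n)
        ]
    return sum(w_out[j] * h[j] for j in range(n)) + b_out


if __name__ == "__main__":
    pulse = [0.0, 0.2, 1.0, 0.7, 0.3, 0.0, -0.1]
    # Placeholder weights for a network with two hidden units.
    params = dict(w_in=[0.5, -0.2], w_rec=[[0.1, 0.0], [0.3, 0.2]],
                  bias=[0.0, 0.1], w_out=[1.0, 0.5], b_out=0.0)
    print([vanilla_rnn_energy(w, **params) for w in sliding_windows(pulse, 5)])
```

In the single-cell mode described for the LSTM, the hidden state would instead be carried over from one bunch crossing to the next rather than being reset at the start of each window.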

Neural network performance
Figure 4 shows the energy resolution as a function of the time gap between subsequent energy deposits. At small gaps, which lead to overlapping pulses, the OF performance degrades significantly. The NNs are able to recover the performance in this small-gap region.

FPGA performance
The CNN and RNN implementations were developed with different hardware design approaches. Very High-speed integrated circuit hardware Description Language (VHDL) is used to implement the CNNs and High Level Synthesis (HLS) is used for the RNNs. The implementations are simulated in Quartus 20.4 [13] and Questa Sim 10.7c [14], respectively, and their output is compared to that of Keras. The small differences observed in figure 5 are caused by quantization and by the LUT-based realisation of the activation functions.
Table 1 shows implementations on a Stratix-10 FPGA for a single data input channel, comparing the maximum execution frequency, latency, initiation interval and resource usage in terms of the number of digital signal processing (DSP) blocks and adaptive logic modules (ALMs). The maximum achievable processing frequency for all implementations is in the range of 480-600 MHz. Since the data arrive at the LHC BC frequency of 40 MHz, it is possible to implement fifteen-fold multiplexing of the input data for the vanilla RNN and six-fold for the CNNs (table 2). The VHDL implementation mainly targets low latency for fast execution, whereas the HLS implementation targets high frequency to allow higher multiplexing. This is reflected in the performance shown in table 2. Optimization of both implementations is ongoing to find an acceptable compromise between high frequency and low latency that fits the readout requirements of the LAr Phase-II upgrade.
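The relation between clock frequency and multiplexing can be illustrated with a short calculation. The sketch below gives an upper bound on the number of channels that one network instance can serve, assuming that a new input is accepted every `initiation_interval` clock cycles; this parameter and the function names are ours, and the values used are illustrative rather than taken from table 1.

```python
import math


def max_multiplexing(f_clock_mhz, f_bc_mhz=40.0, initiation_interval=1):
    """Upper bound on channels served by one network instance.

    One instance accepts f_clock / initiation_interval inputs per second,
    while each channel delivers f_bc samples per second.
    """
    return int(f_clock_mhz // (f_bc_mhz * initiation_interval))


def instances_needed(n_cells, multiplexing):
    """Network instances required to cover n_cells on one FPGA."""
    return math.ceil(n_cells / multiplexing)


if __name__ == "__main__":
    mux = max_multiplexing(600.0)  # upper end of the quoted frequency range
    print(mux, "channels per instance at 600 MHz")
    for cells in (384, 512):
        print(cells, "cells:", instances_needed(cells, mux), "instances")
```

At 600 MHz the bound equals the fifteen-fold multiplexing quoted for the vanilla RNN. The factor realised in firmware can be lower than this bound, for example when a design accepts a new input only every few clock cycles.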

Figure 1: Representation of the structure of the CNNs developed for energy reconstruction. The pulse tagging part detects energy deposits above the noise level. The energy reconstruction part uses the pulse tagging output and the digitized input from the calorimeter to reconstruct the energy [15].

Figure 4: Resolution as a function of the distance to the previous high energy deposit for the OF with maximum finder, LSTM, Vanilla-RNN and 3-Conv CNN algorithms [15].

Figure 5: Relative deviation of the firmware and software results [15].