Design and test performance of the ATLAS Feature Extractor trigger boards for the Phase-1 Upgrade

In Run 3, the ATLAS Level-1 Calorimeter Trigger will be augmented by an Electron Feature Extractor (eFEX), to identify isolated e/γ and τ particles, and a Jet Feature Extractor (jFEX), to identify energetic jets and calculate various local energy sums. Each module accommodates more than 450 differential signals that can operate at up to 12.8 Gb/s, some of which are routed over 30 cm between FPGAs. Here we present the module designs, the processes that have been adopted to meet the challenges associated with multi-Gb/s PCB design, and the results of tests that characterize the performance of these modules.

Keywords: Digital electronic circuits; Trigger algorithms; Trigger concepts and systems (hardware and software); Data acquisition circuits

Introduction
In Run 3 (starting in 2021), the LHC [1] luminosity will double (to ∼2.5 × 10^34 cm^−2 s^−1), which will greatly increase the pileup rate. However, the ATLAS [2] front-end detector electronics will remain largely unchanged. Hence the total ATLAS Level-1 Trigger [3] rate will still be limited by the readout bandwidth of the front-end electronics to 100 kHz or less. Moreover, the Level-1 Trigger must retain sensitivity to electroweak physics processes and stay within the current ATLAS Level-1 latency envelope of 2.5 µs. To meet these challenges, the Phase-I Upgrade [4] to the ATLAS Level-1 Trigger system is needed. Figure 1 shows the architecture of the Phase-I Upgrade of the ATLAS Level-1 Calorimeter Trigger (L1Calo) [5]. The current L1Calo system is augmented by three additional Feature Extractor (FEX) subsystems:

Figure 1: ATLAS Level-1 Calorimeter Trigger architecture for the Phase-I Upgrade.
• the electromagnetic Feature Extractor (eFEX), comprising eFEX modules and Hub modules with Readout Driver (ROD) daughter cards, which identifies isolated e/γ and τ candidates, using data of finer granularity than is currently available to the L1Calo Cluster Processor subsystem;
• the jet Feature Extractor (jFEX), comprising jFEX modules and Hub modules with ROD daughter cards, which identifies energetic jets and computes various local energy sums, using data of finer granularity than that available to the current L1Calo Jet Energy Processor subsystem;
• the global Feature Extractor (gFEX [6]), comprising one gFEX module, which identifies calorimeter trigger features requiring the complete calorimeter data.
In addition to these, the Phase-I upgrade of L1Calo includes the Tile Rear Extension (TREX) to the Pre-Processor subsystem, which digitizes Tile data and transmits them to the FEXs optically, the Fibre Optical Exchange (FOX), and the FEX Test Module (FTM), which facilitates the testing of FEX modules before system-level commissioning.
Apart from the small number of Pre-Processor modules that digitize Tile data for the FEXs, the current L1Calo system, comprising the Pre-Processor, Jet Energy Processor and Cluster Processor, will be decommissioned after the Phase-I Upgrade is fully commissioned.

Trigger algorithms and performance
In the current L1Calo system, the Cluster Processor processes data from the calorimeters and identifies energy deposits characteristic of isolated e/γ and τ particles, using Trigger Towers of a typical granularity of 0.1 × 0.1 (η × ϕ). The eFEX performs this same function using higher-granularity data from the Liquid Argon (LAr) electromagnetic calorimeter. For each LAr Trigger Tower, the eFEX receives data from 10 'supercells' in four layers, as shown in figure 2. This makes it possible to increase the discriminatory power of L1Calo by running a collection of new trigger algorithms, including R_η, f_3 and R_Had as defined in figure 2, that analyse shower shapes. These algorithms run in a window of 0.3 × 0.3 (η × ϕ) that slides by 0.1 in both η and ϕ (such that neighbouring instances of the window overlap).
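The overlapping sliding-window scheme can be illustrated with a short sketch. This is illustrative only: the real algorithms run in FPGA firmware on supercell data, and the grid of tower energies below is made up.

```python
# Illustrative sketch (not the eFEX firmware): slide a 3x3-tower
# (0.3 x 0.3 in eta x phi) window across a grid of 0.1 x 0.1 Trigger
# Towers in steps of one tower, so neighbouring windows overlap.

def sliding_windows(towers, w=3):
    """Yield (eta_index, phi_index, window_sum) for every w x w window."""
    n_eta = len(towers)
    n_phi = len(towers[0])
    for i in range(n_eta - w + 1):
        for j in range(n_phi - w + 1):
            s = sum(towers[i + di][j + dj]
                    for di in range(w) for dj in range(w))
            yield i, j, s

# A 5x5 patch of example tower E_T values (GeV) with one energetic deposit.
grid = [[0, 0, 0, 0, 0],
        [0, 1, 2, 1, 0],
        [0, 2, 9, 2, 0],
        [0, 1, 2, 1, 0],
        [0, 0, 0, 0, 0]]

best = max(sliding_windows(grid), key=lambda t: t[2])
print(best)  # the window centred on the deposit wins
```

Because the step (0.1) is smaller than the window (0.3), every tower is seen by several window instances, which is why partitioning the subsystem requires data duplication between modules and FPGAs.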
The mean number of pileup events will increase from 30 in Run 2 to 80 in Phase-I. Figure 3 shows the results of a simulation comparing the performance of the current (Run 2) algorithms with the eFEX (Phase-I) algorithms at 80 pileup (µ = 80). It shows that the eFEX can reduce the electromagnetic trigger rate by a factor of ∼3, or allow the trigger threshold to be lowered by ∼7 GeV at the 20 kHz reference point. Further optimization of the eFEX algorithms is under study.
The jFEX identifies jets, and calculates E_T and E_T^miss. In the current system, these functions are implemented by the Jet Energy Processor (JEP). The jFEX improves on the performance of the JEP in several ways. It receives higher-granularity calorimeter data (0.1 × 0.1 (η × ϕ) rather than 0.2 × 0.2) and implements a Gaussian-weighted filter, giving it greater discriminatory power; it can implement a larger algorithm window; and each jFEX module processes data from a complete ring of the calorimeter in ϕ, enabling in-time pileup suppression and improving the calculation of E_T and E_T^miss. Figure 4 shows the results of simulations comparing the performance of the algorithms that can be implemented on the jFEX (red curve) with those currently run on the JEP (black curve) at 80 pileup (µ = 80). It shows that the turn-on curves of the jFEX are sharper, a fact that can be exploited to raise the trigger thresholds without losing efficiency, leading to rate reductions similar to those of the eFEX.
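The benefit of a Gaussian-weighted filter over a flat sum can be sketched as follows. The window size and σ here are illustrative assumptions, not the jFEX firmware parameters.

```python
# Illustrative sketch of a Gaussian-weighted seed filter of the kind
# the jFEX can apply to 0.1 x 0.1 towers. Weights and window size are
# example values only.
import math

def gaussian_weights(w=5, sigma=1.0):
    """w x w weight matrix centred on the middle tower."""
    c = w // 2
    return [[math.exp(-((i - c) ** 2 + (j - c) ** 2) / (2 * sigma ** 2))
             for j in range(w)] for i in range(w)]

def weighted_et(towers, weights):
    """Weighted sum of tower E_T values over one window."""
    return sum(t * w for row_t, row_w in zip(towers, weights)
               for t, w in zip(row_t, row_w))

weights = gaussian_weights()
# A compact deposit scores higher than the same total energy spread
# uniformly over the window, which is the discrimination gain.
compact = [[0]*5, [0]*5, [0, 0, 10, 0, 0], [0]*5, [0]*5]
spread  = [[0.4]*5 for _ in range(5)]  # also 10 GeV in total
print(weighted_et(compact, weights) > weighted_et(spread, weights))
```

A flat (unweighted) sum would score both patterns identically; the Gaussian weighting favours collimated jet cores over diffuse pileup energy.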

Processing area
The eFEX subsystem processes data from the calorimeters within the region |η| ≤ 2.5 and 0 ≤ ϕ ≤ 2π, a total bandwidth of ∼14 Tb/s. Given the limits of current technology, it is impossible to receive this in a single module. Due to the overlapping nature of the eFEX algorithm windows, partitioning the subsystem into multiple modules means that a substantial volume of calorimeter data must be duplicated and/or shared between modules (and between FPGAs on the modules). This partitioning needs to balance the total number of modules, the number of FPGAs per module, the fibre count per module, the complexity of the fibre mapping between the calorimeters and the eFEX, and the difficulty of sharing data between adjacent modules and between FPGAs. Figure 5 shows the partitioning of the eFEX prototype design. The middle eFEX module processes a core calorimeter area of 1.6 × 0.8 (η × ϕ), whereas the eFEX modules on either side process a core calorimeter area of 1.7 × 0.8 (η × ϕ). Thus, three eFEX modules process a complete strip in η, and 24 modules are required in total.
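The module count can be cross-checked in units of 0.1-wide Trigger Towers (in ϕ, the nominal 0.1 granularity corresponds to 2π/64, so 64 towers close the ring). This simply verifies the arithmetic behind the numbers quoted above; it is not derived from the actual fibre mapping.

```python
# Cross-check of the eFEX partitioning in Trigger Tower units.
# Core coverages per module column: 1.7 + 1.6 + 1.7 in eta,
# i.e. 17 + 16 + 17 towers of width 0.1.
eta_core_towers = [17, 16, 17]
phi_core_towers = 8              # 0.8 in phi per module
n_phi_rows = 64 // phi_core_towers   # rows of modules to close the phi ring

assert sum(eta_core_towers) == 50    # 5.0 in eta, i.e. |eta| <= 2.5 covered
n_modules = len(eta_core_towers) * n_phi_rows
print(n_modules)                     # 24 modules in total
```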

eFEX prototype
The eFEX prototype, shown in figure 6, is an ATCA [12] module with a non-standard physical form: the front board is extended through Zone 3 into the rear shelf space to optimize the routing of the input fibres, which are connected to the module via a custom Rear Transition Module (RTM).
The eFEX PCB is a 22-layer board with six micro-via layers. It houses:
• 4 Xilinx Virtex-7 [13] FPGAs (XC7VX550T) for algorithm processing;
• 1 Xilinx Virtex-7 FPGA (XC7VX330T) for control and readout functions;
• 17 Avago MiniPODs [14] for optical input (144 signals) and output (36);
• 94 high-speed fan-out buffers (NB7VQ14M [15]) for data duplication between FPGAs.
The high-speed fan-out buffer NB7VQ14M was tested on a previous module, the High-Speed Demonstrator (HSD) [16]. It exhibited very good signal quality at 10 Gb/s with negligible propagation delay, and hence it was chosen for data duplication on the eFEX module. In total, about 450 high-speed multi-Gb/s differential tracks are routed on a single eFEX. Blind and buried vias are used to achieve this density of signal tracks. The PCB is made from a low-loss material (both Isola Itera and Megtron6 have been used on different prototype modules) and the PCB layout is rotated by 22° to minimize the effect of the PCB fibreglass weave on differential skew.

Processing area
The jFEX subsystem receives data from the calorimeters within the region |η| ≤ 4.9 and 0 ≤ ϕ ≤ 2π, a total bandwidth of ∼3 Tb/s. The jFEX subsystem is partitioned into 7 processing modules, each covering a ϕ ring as shown in figure 7. This ϕ-ring coverage enables each jFEX module to calculate the pile-up (i.e. energy density) for the η range it processes, and to apply this as a correction in the jet and E_T^miss algorithms.

jFEX prototype
The jFEX prototype is being manufactured. The PCB layout is shown in figure 8. It uses the same physical form factor as the eFEX prototype, so that the modules can share the same RTM design. The jFEX PCB is implemented as a 24-layer board with 8 micro-via layers. It houses:
• 4 Xilinx UltraScale [13] FPGAs (XCVU190) for algorithm processing and readout;
• 1 mezzanine card for control;
• 24 Avago MiniPODs for optical input (216 signals) and output (32).
Due to its larger algorithm window, the jFEX needs to share even more data between FPGAs than the eFEX. In total, about 540 high-speed multi-Gb/s differential tracks are routed on a single jFEX PCB. A loopback feature of the Xilinx Multi-Gb/s Transceiver (MGT), Far-End PMA loopback, is used for data sharing between FPGAs on the jFEX module. This makes use of the otherwise unused transmitters of the MGTs, at a small cost in latency. Figure 9 shows the results of loopback tests performed on a Xilinx VCU110 evaluation board (UltraScale XCVU190), which show a very good eye opening at 25 Gb/s.

PCB design method
The eFEX and jFEX prototypes share many PCB design challenges. Firstly, both are very high-density, high-speed PCBs. The baseline speed of the input links specified in the ATLAS Phase-I TDR is 6.4 Gb/s; however, there is a strong desire to run the links faster in order to further improve trigger performance and flexibility. Secondly, both the eFEX and jFEX require complex channel mapping and data sharing; as a consequence, some high-speed links must run over very long signal tracks across the whole PCB between FPGAs. Thirdly, both modules have very high power consumption, approaching 400 W per module. The cooling design is therefore very challenging, since virtually all the power consumed is dissipated as heat.
To meet these challenges, the systematic PCB design method developed with great success in the HSD project [16] has been adopted for both the eFEX and jFEX PCB designs. A series of PCB simulation, testing and validation techniques has been integrated into the PCB design flow. Notably, in addition to signal integrity simulation, power integrity simulation (as shown in figure 10) is particularly important for these modules. The power rails for the FPGA cores and MGTs on both the eFEX and jFEX need to carry currents larger than 100 A; the voltage drops across the power distribution networks thus become significant design constraints.
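The scale of this constraint can be seen from a back-of-envelope IR-drop estimate. The rail voltage, supply tolerance and PDN resistance below are illustrative assumptions, not measured eFEX/jFEX values; only the ∼100 A current is taken from the text.

```python
# Back-of-envelope IR-drop budget for a high-current FPGA core rail.
# All numbers except the current are illustrative assumptions.
rail_v = 0.95        # nominal core voltage, V (assumed)
tolerance = 0.03     # +/-3% supply tolerance (assumed)
current = 100.0      # rail current, A (order of magnitude from the text)
pdn_res = 0.2e-3     # plane + via resistance, ohms (assumed)

drop = current * pdn_res                 # Ohm's law: V = I * R
budget = rail_v * tolerance              # allowed deviation from nominal
print(f"IR drop {drop*1e3:.0f} mV of a {budget*1e3:.1f} mV budget")
```

Even a fraction of a milliohm in the power distribution network consumes most of the rail tolerance at these currents, which is why power integrity simulation sits alongside signal integrity simulation in the design flow.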

Prototype test
After passing initial power-on and boundary scan tests smoothly, the eFEX prototype (figure 6) was tested with the FTM and the LAr Digital Processing System (DPS) prototype, in a systematic check of all the eFEX high-speed input and output links.
In order to validate the TDR baseline link speed and probe the upper limit of the possible link speeds, all eFEX high-speed links were tested at three different speeds (6.4 Gb/s, 11.2 Gb/s and 12.8 Gb/s). The decision on the link speed has a significant impact on the FEX architecture, especially for the jFEX, where different link speeds would require completely different partitioning.
The test setup for these link speed tests closely resembles the final system shown in figure 1. For example, a FOX demonstrator was used to mimic the complex fibre mapping and insertion loss between the LAr DPS and the eFEX.
For the link tests with the LAr DPS, the link sources were Altera Arria 10 FPGAs [17] with MGTs capable of up to 14 Gb/s. The first eFEX prototype is fitted with Xilinx speed grade -2 Virtex-7 FPGAs, with MGTs specified up to 11.3 Gb/s. The FTM is fitted with Xilinx speed grade -3 Virtex-7 FPGAs, with MGTs specified up to 13.1 Gb/s.

Link speed test results
The test results obtained at the TDR baseline link speed of 6.4 Gb/s are extremely good, with wide-open 2-D eye scans and bit error rates of less than 10^−14 (no error over 3 × 10^14 bits) for 257 out of the 264 input links on the eFEX prototype.
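A zero-error measurement of this kind sets only an upper limit on the bit error rate; the quoted 10^−14 follows from the standard "rule of 3" at 95% confidence, as a short sketch shows:

```python
# With zero errors observed in n_bits, the one-sided upper limit on the
# bit error rate at confidence level cl is -ln(1 - cl) / n_bits.
# For cl = 0.95 this is ~3 / n_bits, the so-called "rule of 3".
import math

def ber_upper_limit(n_bits, cl=0.95):
    """Upper limit on BER when zero errors are seen in n_bits."""
    return -math.log(1.0 - cl) / n_bits

limit = ber_upper_limit(3e14)
print(f"BER < {limit:.2g} at 95% CL")  # ~1e-14, matching the text
```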
At 11.2 Gb/s, the opening of the 2-D eye scans on the eFEX is still very good, as shown in figure 11. Figure 12 shows the overall statistics of the open areas of the 2-D eye scans for all eFEX input links at 11.2 Gb/s. The bit error rates on 257 out of 264 eFEX input channels are still less than 10^−14. Of the remaining 7 links, 4 are correlated with less-than-optimal PCB routing (which can be improved in the next PCB iteration), and 3 are due to a faulty high-speed fan-out buffer (which can be repaired).
At 12.8 Gb/s, many links on the eFEX prototype still work, but a significant number fail, as this is outside of the FPGA MGT's specified speed range. In order to evaluate the eFEX performance at 12.8 Gb/s, another eFEX prototype will be fitted with Xilinx speed grade -3 Virtex-7 FPGAs, with MGTs capable of running up to 13.1 Gb/s.

Link speed decision
Based on the above excellent test results, and previous test results between the LAr DPS and the L1Calo gFEX, 11.2 Gb/s has been adopted as the new baseline link speed between LAr and L1Calo. This has greatly simplified the jFEX architecture, increased the dynamic range of the calorimeter data received by L1Calo, and simplified the link protocol, all of which improve L1Calo trigger performance.

Conclusion
The ATLAS Level-1 Calorimeter Trigger will be upgraded as part of the ATLAS Phase-I upgrades for 2019. Development of both the eFEX and jFEX is well underway, with prototypes under test or in manufacture. A systematic PCB design method, centred on PCB simulation and validation, has been used in developing these high-speed, high-density and high-power modules. The test results from the first eFEX prototype are very good; as a result, the baseline speed of the links into L1Calo has been increased from 6.4 Gb/s to 11.2 Gb/s, simplifying the architecture and improving the performance of the trigger.