Prototype hardware design and testing of the global common module for the global trigger subsystem of the ATLAS phase-II upgrade

The High-Luminosity Large Hadron Collider (HL-LHC) will deliver more than ten times the integrated luminosity of the previous runs combined. Meeting its stricter throughput requirements poses new challenges to the Trigger and Data Acquisition (TDAQ) systems of the LHC experiments. Introduced in the framework of the ATLAS experiment’s HL-LHC upgrade, the Global Trigger (GT) is a new subsystem that will run offline-like algorithms on full-granularity calorimeter data. The GT’s functionality is implemented primarily in firmware and is organized into three layers: multiplexing (or data aggregation), global event processing, and a demultiplexing interface to the central trigger processor. Each layer comprises several similar nodes, all hosted on replicas of identical hardware, the Global Common Module (GCM), an ATCA front board designed to be adopted throughout the entire GT subsystem. This article, which proceeds from the TWEPP 2021 conference, presents the GCM hardware design, completed in 2020, and focuses on key results of the extensive testing performed in 2021.


Introduction
While the High-Luminosity Large Hadron Collider (HL-LHC) will increase the accelerator's potential for new discoveries, it will also pose new challenges to the trigger and data acquisition (TDAQ) systems of the LHC experiments, which will have to cope with the higher throughput resulting from the increased rate of proton collisions. As part of the ATLAS experiment's HL-LHC ("Phase-II") TDAQ upgrade, a single-level ("Level-0", L0) trigger architecture will feature new and improved hardware and algorithms, with an increased maximum rate of 1 MHz and a latency of 10 μs [1]. The new Global Trigger (GT) subsystem will enable running offline-like (i.e. close to full reconstruction) trigger algorithms on full-granularity data gathered from several subdetectors and trigger-processing subsystems, bringing event-filter-like capability, previously only a high-level software feature, to the L0 trigger system. The GT subsystem design has a strong focus on firmware (FW)-based implementation of the trigger functionality. As shown in figure 1, the specialized components are topologically factorized into three main FW layers, each featuring one or more similar nodes. The "data aggregator", or "global multiplexer" (MUX), layer handles the 60 Tb/s input of the GT. Each MUX node gathers up to 64 serial input channels and time-multiplexes the bunch-crossing data so that every event is concentrated onto one of at least 48 Global Event Processor (GEP) nodes, each of which executes the actual trigger algorithms in parallel with the other nodes. Compared to the Phase-I hardware trigger, the event processor is decoupled from the LHC bunch-crossing rate, allowing the use of asynchronous and high-level algorithms and increasing the number of trigger objects that the L0Calo feature extractors can transmit.
At the output layer, the Central Trigger Processor (CTP) interface de-multiplexes the results to the CTP, the subsystem responsible for the final trigger decision [1].
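The time-multiplexing principle of the MUX layer can be sketched as follows. This is an illustrative model only, not the production firmware logic: the round-robin assignment rule and the node count of exactly 48 are simplifying assumptions based on the figures quoted above.

```python
# Illustrative sketch of GT time-multiplexing (not the production FW):
# each bunch crossing's data, arriving spread over many input fibres,
# is steered in its entirety to a single GEP node, chosen here by a
# simple round-robin rule, so that the GEPs process events in parallel.

N_GEP_NODES = 48  # the GT design foresees at least 48 GEP nodes

def gep_for_event(bunch_crossing_id: int) -> int:
    """Round-robin assignment of a whole event to one GEP node."""
    return bunch_crossing_id % N_GEP_NODES

# Consecutive bunch crossings land on different GEPs, so each GEP sees
# only every 48th event and can run slower, asynchronous algorithms.
assignments = [gep_for_event(bcid) for bcid in range(96)]
```

Because each GEP receives complete events at 1/48 of the bunch-crossing rate, its processing is decoupled from the 40 MHz LHC clock, which is what permits the asynchronous, offline-like algorithms described above.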

GCM hardware design
As figure 2 shows, the GT's unified hardware platform, called the Global Common Module (GCM), is an Advanced Telecommunications Computing Architecture (ATCA) front board [1]. The nodes of the three FW layers (MUX, GEP, or CTP-interface) are all hosted on replicas of the GCM, simplifying the system-level design and long-term maintenance. The board features two large Xilinx Virtex UltraScale+ (XCVU13P) processing field-programmable gate arrays ("pFPGA-A" and "pFPGA-B"), whose multi-gigabit transceivers (MGTs) each access 8 pairs of 12-channel, 25-Gb/s Samtec FireFly optical modules. A Xilinx Zynq UltraScale+ (XCZU19EG) multiprocessor system-on-chip (MPSoC), with dedicated GbE, SD, UART and JTAG interfaces, is added for readout, control, monitoring and debugging. Two 16-GB DDR4 DIMMs are available to each FPGA (six in total). Two SI5345 chips generate the MGTs' reference clocks: one generates a synchronous clock derived from the clock recovered from the ATLAS timing, trigger and control (TTC) link; the other generates an asynchronous clock for the 25.78125 Gb/s links. A CERN Intelligent Platform Management Controller (IPMC) [5] card is also installed to monitor the health of the board and protect it from over-voltage and over-temperature. The IPMC configures the alarm thresholds of the temperature sensors and monitors the temperatures, voltages and currents of the DC/DC power modules and of the FPGA devices.
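The threshold-based health monitoring performed by the IPMC can be sketched as below. The alarm limits and sensor names here are hypothetical placeholders for illustration, not the GCM's actual configuration.

```python
# Sketch of threshold-based health monitoring in the spirit of the
# IPMC's role: compare sensor readings against alarm limits.
# All limits and names below are hypothetical placeholders.

ALARM_LIMITS = {
    "VCCINT_V":    (0.81, 0.89),   # hypothetical rail window, volts
    "FPGA_T_C":    (None, 100.0),  # hypothetical junction limit, deg C
    "LTM4700_T_C": (None, 125.0),  # hypothetical power-module limit
}

def check_sensor(name: str, value: float) -> bool:
    """Return True if the reading is within its alarm limits."""
    lo, hi = ALARM_LIMITS[name]
    if lo is not None and value < lo:
        return False
    if hi is not None and value > hi:
        return False
    return True

readings = {"VCCINT_V": 0.85, "FPGA_T_C": 62.0, "LTM4700_T_C": 71.5}
all_ok = all(check_sensor(n, v) for n, v in readings.items())
```

In the real system such checks run continuously in the IPMC firmware, which can cut power to the board when a limit is violated.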

GCM hardware testing
All the major hardware functionalities and critical technologies were successfully verified [6]. First, all power rails were checked for short circuits and nominal voltages. No issues were found, and the power-on sequencer was programmed to enable the rails in the correct order. The power ripple on the FPGA rails (VCCINT, MGTAVCC and MGTAVTT) was measured to be smaller than 7 mVpp, meeting the 10 mVpp requirement from Xilinx. An operating system for the Zynq+ processing system (PS) was cross-compiled with Yocto 2020.2, and the dedicated interfaces were validated. The full functionality of the Zynq+ SoC was exercised through Python scripts handling board initialization, SI5345 chip configuration, and the control and monitoring of the power consumption and thermal performance of the FireFly modules and the LTM4700/4678 power modules. The connectivity of all the serial links was successfully validated down to a bit-error ratio (BER) of 1 × 10⁻⁹ with Xilinx's Integrated BER Tests (IBERT), at both the 12.8 Gb/s (L0Calo-to-MUX links) and 25.78125 Gb/s (calorimeter-cells-to-MUX and MUX-to-GEP links) line rates. Because high-speed FireFly modules were unavailable at the time, the latter case relied on the electrical PCB tracks connecting four MGT quads on the Zynq+ to two quads on each pFPGA, rather than on the 14 Gb/s FireFly modules used in the former. Figures 3 and 4 show the performance test results for the optical and electrical links of pFPGA-A, obtained with the following common configuration: FireFly RX de-emphasis = OFF, pFPGA TX swing = 450 mV, pFPGA RX termination = 600 mV and pFPGA TX pre/post-cursor = 0 dB. The FireFly RX output swing was adjusted as indicated in figure 3: an optimal value of 610 mV yielded a good open area, in the range 9000 to 12000, even for the links with the longest traces.
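As a side note on what validating a BER of 1 × 10⁻⁹ entails, the standard Poisson-statistics relation gives the number of error-free bits needed to bound the BER at a given confidence level. The 95% confidence level chosen below is a common convention, not a figure stated in the text.

```python
import math

def bits_for_ber(ber_target: float, confidence: float = 0.95) -> float:
    """Error-free bits needed to bound the true BER below ber_target
    at the given confidence level: N = -ln(1 - CL) / BER."""
    return -math.log(1.0 - confidence) / ber_target

line_rate = 25.78125e9          # b/s, the faster GCM link speed
n_bits = bits_for_ber(1e-9)     # about 3.0e9 error-free bits
seconds = n_bits / line_rate    # about 0.12 s of error-free running
```

At these line rates such a bound is reached in a fraction of a second per link, which is why much deeper BER levels are routinely probed in extended IBERT runs.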

Thermal analysis
The technical coordination for operation in the ATLAS experiment's cavern "USA15" limits the power consumption of ATCA boards to 350 W. The GCM's power dissipation is concentrated primarily in the FPGA devices and the power modules; given the impossibility of using on-device active cooling components, proper control of the operating temperatures is therefore critical to meet the power and temperature specifications and to preserve the lifetime of the devices. To characterize the board's thermal behavior under different power-consumption and chassis-fan-speed scenarios, computational fluid-dynamics (CFD) and heat-transfer simulations were performed with COMSOL Multiphysics 5.6. As figure 5 shows, the geometries of the GCM board and its major power-dissipating and flow-obstructing components (FPGAs, FireFly optical modules, DC/DC power modules, DDR4 DIMMs, connectors), along with their heat sinks and the vias connecting to the buried ground copper planes (which allow heat to spread across the board), were modeled and meshed manually. Several L-VEL CFD simulations were performed for different fan speeds (maximum air flow rate of 155 m³/hour), after recreating a realistic slot environment using air-flow data gathered from the cooling test reports of the ATCA chassis manufacturer (nVent Schroff).
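A back-of-envelope estimate, using standard air properties and the power and flow figures quoted above, shows why the detailed CFD study is needed: the bulk temperature rise of the cooling air is small, so the critical quantity is the local heat transfer at each component, not the air heating itself. This is a rough sanity check, not part of the actual simulation campaign.

```python
# Back-of-envelope bulk air temperature rise through the slot:
# dT = P / (rho * cp * Q). Air properties are standard textbook values;
# power budget and maximum flow rate are taken from the text.

RHO_AIR = 1.2     # kg/m^3, air density near room temperature
CP_AIR = 1005.0   # J/(kg*K), specific heat of air

def air_temp_rise(power_w: float, flow_m3_per_h: float) -> float:
    """Steady-state bulk temperature rise of the cooling air, in K."""
    flow_m3_per_s = flow_m3_per_h / 3600.0
    return power_w / (RHO_AIR * CP_AIR * flow_m3_per_s)

# 350 W ATCA power budget at the chassis' 155 m^3/h maximum flow rate
dt = air_temp_rise(350.0, 155.0)   # roughly 7 K
```

Since the bulk air heats by only a few kelvin, component junction temperatures are dominated by local convection around the heat sinks and by in-board heat spreading, exactly what the meshed CFD model resolves.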
Once the assembled prototype became available, power-stress tests were also performed on the ATCA chassis, under realistic scenarios, involving different fan levels and powers dissipated by the pFPGAs, ranging from 33 to 100 W (each). To this end, several bit-files were generated with Xilinx Vivado, implementing multiple replicas of a power-hungry FW "slice", composed of about 5% of all the available UltraScale+ resources (flip-flops, look-up tables, MGTs, digital signal processors, block-RAM and ultra-RAM tiles) and dissipating 9 W. The power and temperature information from on-board sensors and all the I2C devices was read out and saved by Python scripts running on the Zynq+ PS, making up a collection of over a thousand data-points over a couple-dozen tests run under the different conditions. Figure 5 shows the temperature results and the good matching observed between the CFD simulations and the actual stress tests. The design was proven to be in compliance with the temperature ratings of the most power-dissipating components and with the board's total power dissipation in the GT's use cases for MUX and GEP nodes.
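The slice-replication approach above can be sketched numerically. The calculation below ignores the device's static power, a simplifying assumption stated in the code, so the replica counts are rough illustrations of how the 33 W to 100 W range maps onto resource usage.

```python
import math

SLICE_POWER_W = 9.0         # one FW "slice" dissipates about 9 W
SLICE_RESOURCE_FRAC = 0.05  # about 5% of the UltraScale+ resources

def replicas_for_power(target_w: float) -> int:
    """Slice replicas needed to reach at least the target dissipation.
    Simplifying assumption: static device power is ignored."""
    return math.ceil(target_w / SLICE_POWER_W)

# Stepping a pFPGA from 33 W to 100 W spans roughly 4 to 12 replicas,
# i.e. roughly 20% to 60% of the device's resources.
low = replicas_for_power(33.0)
high = replicas_for_power(100.0)
resource_frac_high = high * SLICE_RESOURCE_FRAC
```

In practice the static power of a large UltraScale+ device contributes a non-negligible baseline, so the actual number of replicas per bit-file would be tuned against the measured board power rather than this simple division.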

Conclusions
The ATLAS GT is a new FW-focused subsystem designed to meet the new trigger requirements of the HL-LHC. The GT's unified hardware platform, the GCM, was designed, fabricated, and successfully tested in all of its functionalities, and it is now ready to host FW development. Thermal analysis was performed both with CFD simulations and with real-world tests; the thermal performance of the board was proven satisfactory and meets all requirements under different FW configurations, involving different FPGA resource usages and power-consumption hypotheses (low-power MUX and high-power GEP nodes), as well as different chassis cooling conditions.