Latest frontier technology and design of the ATLAS calorimeter trigger board dedicated to jet identification for the LHC Run 3

To cope with the enhanced luminosity of the beam delivered by the Large Hadron Collider (LHC) in 2020, the "A Toroidal LHC ApparatuS" (ATLAS) experiment has planned a major upgrade. As part of this, the Level-1 calorimeter trigger will be upgraded to exploit fine-granularity readout using a new system of Feature Extractors, each of which uses different physics objects for the trigger selection. This article focuses on the jet Feature EXtractor (jFEX) prototype, one of the three types of Feature Extractors. Up to 2 TB/s have to be processed to provide jet identification (including large-area jets) and measurements of global variables within a latency budget of a few hundred nanoseconds. This requires large Field Programmable Gate Arrays (FPGAs) with the largest number of Multi-Gigabit Transceivers (MGTs) available on the market. The jFEX board prototype hosts four large FPGAs from the Xilinx UltraScale family, each with 120 MGTs, connected to 24 opto-electrical devices, resulting in a densely populated high-speed signal board. MEGTRON6 was chosen as the material for the 24-layer jFEX board stack-up because of its low transmission loss for high-frequency signals (GHz range). To further preserve signal integrity, special care has been put into the design, accompanied by simulations to optimise the voltage drop and minimise the current density over the power planes. An integrated test stand has been installed at the ATLAS test facility to perform numerous tests and measurements with the jFEX prototype.

Fig. 1. Level-1 Calorimeter trigger system at the beginning of LHC Run 3. New components are shown in yellow.

The new system will run in parallel with the current legacy system until the new system is fully commissioned. The new system is composed of an optical plant (FOX) that distributes digitised data from the calorimeters to the Feature EXtractors (FEXs). The FEXs differ in the physics objects used for the trigger selection; they are: the electron FEX (eFEX) [3], the jet FEX (jFEX), and the global FEX (gFEX) [4].
In the following sections, the motivations, requirements and consequent challenges of the design and production of the jFEX board are described. Special emphasis is given to the simulation of the signal integrity, the power-plane configuration and the thermal behaviour. Measurements of the PMA loop-back as a local data duplication method, and of the FPGA power consumption as a function of FPGA resource usage (expressed as the number of DSPs in use), are also reported.

II. HARDWARE DESIGN
The jFEX board will identify jet and large-area tau candidates and calculate the scalar sum of transverse energy (ΣE_T) and the missing transverse energy (E_T^miss). It will receive data from the central and forward calorimeters with various granularities over the η range of ±4.9. The input bandwidth has been maximised so that the jFEX can exploit the calorimeter information at the highest possible granularity whilst maintaining the capability of large-area jet clustering. Several jet identification algorithms will run on the jFEX; the current baseline is the "sliding-window jet algorithm with Gaussian weighting", which locates local maxima with a sliding window and, for each local maximum, sums the surrounding cells with a Gaussian weighting distribution. Each FPGA covers an area of 2.4 × 3.2 in η × φ, with a core area of 0.8 × 1.6. The data are duplicated on the module within the FPGA using the PMA loop-back (see Section IV-A). Large processing power is expected to be needed because of the complexity of the algorithms to be implemented. Fig. 2 shows the jFEX block diagram, in which all the jFEX functionalities are sketched. The digitised calorimeter data are fed into the jFEX through the Fibre Optical Plant (FOX) and received on the board through electro-optical devices, the miniPODs [5]. Each module covers the whole ring in φ, with the overlapping regions between processors in η and φ duplicated on board using the PMA loop-back. The results of the jet identification and tau algorithms, together with the calculated total and missing transverse energies, are sent to the Level-1 Topological Processor board [6] (L1Topo in short) as Trigger Objects (TOBs).
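As an illustration, the local-maximum search and Gaussian-weighted sum of the baseline algorithm can be sketched as follows. This is illustrative Python, not the jFEX firmware: the window size, seed threshold and Gaussian width are assumed parameters, and φ wrap-around and the η-dependent granularity are ignored.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """2D Gaussian weights centred on the window."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    return np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))

def sliding_window_jets(et, window=3, sigma=1.0, seed_threshold=5.0):
    """Toy sliding-window jet search: locate local maxima and sum the
    surrounding towers with Gaussian weights. `et` is a 2D array of
    tower transverse energies (eta x phi); phi wrap-around is ignored."""
    half = window // 2
    kernel = gaussian_kernel(window, sigma)
    jets = []
    for i in range(half, et.shape[0] - half):
        for j in range(half, et.shape[1] - half):
            region = et[i - half:i + half + 1, j - half:j + half + 1]
            # a seed is a tower above threshold that dominates its window
            if et[i, j] > seed_threshold and et[i, j] == region.max():
                jets.append((i, j, float((region * kernel).sum())))
    return jets

# A single isolated 50 GeV tower yields one jet candidate centred on it.
towers = np.zeros((8, 8))
towers[4, 4] = 50.0
print(sliding_window_jets(towers))  # [(4, 4, 50.0)]
```

In the firmware the same logic is fully parallelised per seed position, which is why the resource usage scales with the number of DSPs instantiated rather than with a loop count.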
The board control is located on an extension mezzanine (see Section II-B) to allow flexibility in upgrading control functions and components without affecting the main board. This philosophy guarantees smooth and reliable board operation and compatibility with the surrounding trigger system for the next twenty years of data taking.
The data readout checks and external communications are performed via the ROD [2] and the HUB [2].

A. jFEX challenges and design
The high input bandwidth and the large processing power required drove the FPGA choice. The Xilinx UltraScale XCVU190-2FLGA2577 device [8] meets the requirements of a large number of Multi-Gigabit Transceivers (MGTs) and of processing power. This device has 120 MGTs, capable of up to 3.6 Tbps of input bandwidth. The jFEX board hosts four Xilinx UltraScale FPGAs and twenty-four miniPODs: twenty receivers and four transmitters. The jFEX is an ATCA [7] board, so the routing space available is quite limited given all the components on it. This constraint also affected the stack-up design, because of the number of signal layers dedicated to high-speed signals in a configuration of alternating ground and signal planes for best signal quality.
The challenges faced during the design of the board are summarised as follows:
• signal integrity
• FPGA power estimation
• FPGA power dissipation (not discussed in this article)
The jFEX is characterised by a modular design that consists of a single block replicated four times. The single block is defined by the FPGA and the six miniPODs adjacent to it. The modular design has the advantage of a symmetric board layout, so that the trace lengths are similar between processors and miniPODs and also for the FPGA interconnections. This is also a consequence of the design choice of organising the traces in buses. Each differential signal pair is 500 μm wide, with each trace 200 μm wide. Bus lengths have been minimised and there are no crossing buses, to optimise signal integrity.
The FPGA power supplies are located centrally between the four FPGAs to minimise the voltage drop between the power supply and each processor. Fig. 3 shows the jFEX layout, comprising more than sixteen thousand connections. The dashed white squares highlight the FPGA positions and the dashed yellow rectangles the miniPOD locations. Since there is only one transmitter miniPOD per block, it is labelled in the figure.
Concerning the FPGA power estimate, during the design phase only the estimate calculated with the Xilinx power estimation tool was available. The power-plane configuration was validated by the FPGA power-plane and thermal simulations (see Section III), taking into account the increase in resource usage due to the jet identification algorithm. Since the accuracy of the FPGA power consumption estimate affects the choice of the DC/DC converter, a campaign of measurements with the VCU110 evaluation board was performed; the results are reported in Section IV-B.

B. Mezzanines
The jFEX board control resides on the extension mezzanine, allowing modifications, implementation of new functionality and future upgrades without affecting the design of the main board. The current first version of the extension mezzanine provides several functions: configuration of the FPGAs through the JTAG chain and via the SelectMAP mode, the IPbus master, reception and decoding of the Trigger, Timing and Control (TTC) data, spare clocking circuitry that has been used for testing two different types of jitter cleaner (Si5338 and Si5345), I2C buses, and the PicoZed [11] plugged into an FCI connector.
To gain more flexibility during the prototype phase, the FPGA power supplies were also mounted on small power cards, where tuning and optimisation can be done without affecting the design of the main board. Once the power modules have been optimised for ripple and current stability, the power module design will be implemented directly on the main board for the pre-production phase. The requirements on the power module are low ripple and current stability even under sudden changes in the FPGA current draw, so as not to affect the FPGA performance. According to our estimates, the current draw on VCCINT could reach 60 A, which requires a DC/DC converter able to deliver such a current. The first power module design was based on Maxim [9] components, as recommended by Xilinx for their very low ripple. Fig. 4 shows the CAD drawing of the power module for a single FPGA based on a GE [10] device; the 80 A DC/DC converter from GE was, at the time of the power module design, the only available choice. The modules are currently assembled and under test with static and dynamic loads.

III. SIMULATION
The simulation activity was carried out in parallel with the jFEX design to validate the design choices. The FPGA power-plane configuration was a critical issue, in particular for the VCCINT plane, which draws 55% of the total current and contributes the same fraction of the total board heating. Every jFEX module hosts four processors, so the total current flow is not negligible. Fig. 5 shows the VCCINT power plane of one processor. The final shape of the plane was reached after several simulation iterations, with a special focus on minimising the voltage drop across the FPGA. The maximum voltage drop allowed by the UltraScale specification is only ±20 mV, while the expectation with the current power-plane design is 12 mV. The copper thickness of this power plane was increased to 105 μm. The effective voltage drop will be further reduced by the addition of sense lines at the centre of the FPGA. The current distribution was also investigated because of high current-density spots of 50-60 A/mm² seen in the simulation. Such spots typically occur on power planes close to the distribution pins. With the help of thermal simulation it was possible to quantify the power accumulated at these points. In all cases the accumulated power was of the order of μW or, in the worst situations, mW, and therefore harmless for the board. The calculated total heating caused by the current flow is, in the worst case, 6.4 °C. Fig. 6 shows the VCCINT plane for the four FPGAs. The asymmetric heat distribution comes from the presence of the 12 V regulator on another power plane, which, by thermal conduction, affects the heat distribution on the VCCINT plane.
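For orientation, the scale of the DC (IR) voltage drop across a copper plane can be estimated with a simple uniform-sheet model. This is an illustrative Python sketch, not the jFEX power-plane simulation: the 5 cm × 10 cm section and the 60 A load are assumed numbers; only the 105 μm copper thickness comes from the text.

```python
RHO_CU = 1.68e-8  # copper resistivity at 20 degC, ohm*m

def plane_voltage_drop(current_a, length_m, width_m, thickness_m):
    """DC voltage drop along a rectangular copper plane section,
    modelled as a uniform sheet: R = rho * L / (t * W), V = I * R."""
    resistance = RHO_CU * length_m / (thickness_m * width_m)
    return current_a * resistance

# Illustrative numbers (NOT the simulated jFEX geometry): 60 A flowing
# along a 5 cm long, 10 cm wide section of a 105 um thick plane.
drop = plane_voltage_drop(60.0, 0.05, 0.10, 105e-6)
print(f"{drop * 1e3:.1f} mV")  # 4.8 mV, within the +-20 mV spec
```

Even this crude model shows why the copper thickness was increased and why plane shape matters: halving the effective width or thickness doubles the drop, quickly eating into the ±20 mV budget.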
To assess the quality of the high-speed signals, post-layout simulations were performed on the whole board. Fig. 7 shows the behaviour of the reflected signals as the S-parameter (dB) against frequency (GHz). The bus of differential signal pairs is quite compact in its distribution over the frequency range because of the similar trace lengths. The red marker indicates the allowed region for the reflected signals according to the SFP+ specification [12]. A couple of traces of the signal bus cross the red marker around 6 GHz. Analysis of the transmitted signals was also performed, and the bus lies within the region specified by SFP+ for good-quality signals.

Fig. 7. Simulated reflected signals for one bus of twelve high-speed differential pairs located on signal layer 1. The red marker represents the SFP+ specification that defines the region of good-quality reflected signals. The jFEX bus is in the good region; only a couple of traces cross the red marker around 6 GHz.

IV. MEASUREMENTS

A. PMA loop-back measurement
The jFEX algorithms require data duplication between the processors. Even though Xilinx does not officially support the PMA loop-back as a data duplication method, it has been proven to work reliably with the Xilinx Virtex-7 on the L1Topo module. The Xilinx evaluation board VCU110 [13] was used to assess the quality of the PMA loop-back with a similar FPGA, albeit in a different package, to the one that will be mounted on the jFEX board. Fig. 8 shows the eye diagram of the IBERT [14] test for a link speed of 28 Gbps, with the Bit Error Rate (BER) limit set at 10^-13. In further tests, a BER limit of 2.25 × 10^-16 was reached without error.

Fig. 8. Eye diagram of the IBERT test set at the 10^-13 BER limit for one link running at 28 Gbps.
The PMA loop-back measurement proves the feasibility of the data duplication on the jFEX board, even at link speeds higher than the current baseline of 11.2 Gbps common to all three FEXs.

B. FPGA power consumption tests
One of the challenges faced during the jFEX design was to properly assess the FPGA power consumption, given that the algorithm might be modified and further optimised in the future. The strategy adopted was to implement, in the FPGA of the Xilinx VCU110 evaluation board, different algorithm configurations foreseeing different amounts of DSP usage, in order to map out the current draw as a function of the number of DSPs in use. Fig. 9 compares the measurements performed on the evaluation board (red dots) with the current values expected for different data toggle rates, as calculated by the Vivado tool for the implemented firmware.
A fit to the measured points confirms the linear current behaviour. Assuming a maximum FPGA usage of around 60%, corresponding to 1200 DSPs, for the implementation of the different algorithms on the processors, the expected maximum current is around 35 A, validating the choice of the high-current DC/DC converter for VCCINT.
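The extrapolation amounts to a simple linear fit. The sketch below is illustrative Python with made-up placeholder points chosen only to reproduce the quoted scale of ~35 A at 1200 DSPs; the real measurements are those of Fig. 9.

```python
import numpy as np

# Hypothetical (placeholder) measurements of VCCINT current vs DSPs in use;
# the actual VCU110 data points are shown in Fig. 9 of the article.
dsps_used = np.array([200, 400, 600, 800, 1000])
current_a = np.array([8.0, 13.5, 19.0, 24.5, 30.0])  # assumed values

# Linear fit I(n) = a*n + b, extrapolated to the expected maximum
# usage of ~1200 DSPs (about 60% of the device).
a, b = np.polyfit(dsps_used, current_a, 1)
i_max = a * 1200 + b
print(f"I = {a * 1e3:.1f} mA/DSP * n + {b:.1f} A -> {i_max:.1f} A at 1200 DSPs")
```

A static offset (the fitted intercept) plus a per-DSP slope is exactly the behaviour one expects from a static leakage term plus dynamic switching power, which is why a linear model is adequate for sizing the DC/DC converter.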
V. CONCLUSIONS AND PERSPECTIVES

The ATLAS Level-1 Calorimeter trigger will be upgraded for the start of LHC Run 3 to meet the higher luminosity of the LHC accelerator whilst preserving the sensitivity to electroweak processes. The jFEX is one of the three new Feature Extractors that will be installed; it is an ATCA board hosting four Xilinx UltraScale FPGAs, each able to handle up to 3.6 Tbps of input bandwidth. The processors share common data to run the algorithms on complete φ rings and use the PMA loop-back to duplicate these data on board.
In parallel with the design activity there has been extensive and complementary work on power and thermal simulation, in particular for VCCINT, which accounts for more than 50% of the current and heat load. The outcome of this investigation was the geometrical optimisation of the VCCINT plane and the quantification of the total heat deposited in the board. As a result, no issues with the power-plane or thermal performance are expected for the resulting design. Similarly, the post-layout simulations of the reflected and transmitted signals are in agreement with the SFP+ specification.
To gain flexibility during the prototype phase, the FPGA power supplies are located on four daughter modules mounted, in turn, on a larger daughter module plugged onto the main board. This modularity allows quick modifications and several iterations on the power mezzanines without affecting the design of the main board. The goal is to find a suitable configuration characterised by low ripple and stable current delivery, as specified by Xilinx for the UltraScale devices. With the same philosophy, the board control is located on an extension mezzanine, whose first version includes some functionality added for debugging purposes only. Unlike the FPGA power supply modules, the board control will always remain on the extension mezzanine, allowing any modifications required for smooth operation over the next twenty years of data taking.
The sliding-window jet algorithm with Gaussian weighting, the baseline algorithm for jet identification, has already been implemented in order to assess the impact of the firmware on the FPGA resource usage. Two versions are currently available: the first is used to measure the current load on the FPGA as a function of DSP usage, to better assess the FPGA power consumption, while the second is an optimised version intended for the final implementation used to extract physics results.
The jFEX prototype board will be delivered in early November and the final production of the full system is expected to be completed by July 2018, leaving 1.5 years for testing, system integration and commissioning. The time between installation in the ATLAS cavern and the LHC restart will be used for testing and commissioning.