Clock Distribution and Readout Architecture for the ATLAS Tile Calorimeter at the HL-LHC

The Tile Calorimeter (TileCal) is one of the detectors of the ATLAS experiment at the Large Hadron Collider (LHC). TileCal is a sampling calorimeter made of steel plates and plastic scintillators, which are read out using approximately 10 000 photomultiplier tubes. In 2024, the LHC will undergo a series of upgrades towards a High-Luminosity LHC (HL-LHC) that will deliver up to 7.5 times the current nominal instantaneous luminosity. The ATLAS Tile Phase II Upgrade will adapt the detector and the Data AcQuisition (DAQ) system to the HL-LHC requirements. The detector electronics will be redesigned using a new clock distribution and readout architecture with a full-digital trigger system. After the Long Shutdown 3 (2024–2026), the on-detector electronics will transfer digitized hadron calorimeter data for every bunch crossing (~25 ns) to the Tile PreProcessors (TilePPr) in the counting rooms, with a total data bandwidth of 40 Tb/s. The TilePPrs will store the detector data in pipeline memories to cope with the requirements of the new ATLAS DAQ architecture, and will interface with the Front-End LInk eXchange (FELIX) system and the first trigger level. The TilePPr boards will distribute the sampling clock to the on-detector electronics for synchronization with the LHC clock, using high-speed links configured for fixed and deterministic latency. The upgraded readout and clock distribution strategy was fully validated in a demonstrator system, using prototypes of the upgraded electronics, in several test beam campaigns between 2015 and 2018.


I. INTRODUCTION
TileCal [1] is the central hadronic calorimeter of the ATLAS experiment [2] at the Large Hadron Collider (LHC) at CERN. TileCal was designed to play a key role in the energy reconstruction of hadrons, jets and tau leptons, and of the missing transverse momentum.
It is a segmented sampling calorimeter that uses steel plates as the passive absorber and plastic scintillator tiles as the active material. The TileCal detector is composed of four cylinders, each subdivided azimuthally into 64 wedge modules. The light produced by the particles crossing the detector is read out by up to 9852 PhotoMultiplier Tubes (PMTs) for the entire detector. The signals from the PMTs are amplified, shaped, and digitized at 40 Msps using a clock synchronous with the LHC bunch crossing (Figure 2). The digital samples are stored in pipeline memories during the Level-1 trigger latency (2.5 μs). Simultaneously, the PMT analog signals are transmitted to the Level-1 Calorimeter trigger system. The digital samples corresponding to the events selected by the Level-1 trigger system are transmitted through optical fibers to the ReadOut Drivers (RODs) [3], located off-detector, at a maximum average trigger rate of 100 kHz.
The custom 9U VME ROD module is the core element of the off-detector electronics in the current system. Each ROD board can read out up to 8 detector modules; thus, 32 ROD modules installed in four VME crates, corresponding to the four detector barrels, are used to operate the entire detector. The ROD receives 7 digital samples for each event selected by the Level-1 trigger system and processes them within 10 μs to minimize the detector dead-time. The processed data are then transmitted to the Level-2 trigger system, which is part of the High Level Trigger (HLT).
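The legacy-system numbers above can be cross-checked with simple arithmetic. The Python sketch below (illustrative only; the function names are hypothetical) derives the number of RODs from the module count, and the pipeline depth, in bunch crossings, needed to cover the 2.5 μs Level-1 latency. It assumes the nominal 25 ns bunch spacing and ignores any extra pipeline margin a real implementation may add.

```python
# Back-of-the-envelope numbers for the legacy TileCal readout.
BUNCH_SPACING_NS = 25  # nominal LHC bunch spacing (40 MHz)


def module_count(cylinders: int = 4, wedges_per_cylinder: int = 64) -> int:
    """Total number of TileCal modules."""
    return cylinders * wedges_per_cylinder


def rods_needed(modules: int, modules_per_rod: int = 8) -> int:
    """Number of ROD boards, rounding up if the modules do not divide evenly."""
    return -(-modules // modules_per_rod)  # ceiling division on integers


def pipeline_depth(latency_ns: int, spacing_ns: int = BUNCH_SPACING_NS) -> int:
    """Bunch crossings to buffer so that a sample survives the trigger latency."""
    return -(-latency_ns // spacing_ns)


if __name__ == "__main__":
    modules = module_count()
    print(f"modules: {modules}")                  # 256
    print(f"RODs: {rods_needed(modules)}")        # 32
    print(f"L1 pipeline depth: {pipeline_depth(2500)} bunch crossings")  # 100
```

Working in integer nanoseconds keeps the ceiling divisions exact, so no floating-point rounding can shift the pipeline depth by one crossing.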

II. ATLAS PHASE II UPGRADE
The LHC has planned a series of upgrades culminating in the High-Luminosity LHC (HL-LHC), which will increase the instantaneous luminosity up to 7.5 × 10³⁴ cm⁻² s⁻¹ with a pileup close to 200 collisions per bunch crossing, providing a total integrated luminosity of 4000 fb⁻¹ over ten years. To cope with the new radiation levels and the increased data bandwidths at the HL-LHC, the TileCal readout electronics will be redesigned with a new readout strategy and a full-digital trigger system.
After the ATLAS Phase II Upgrade [4], the TileCal on-detector readout electronics will transmit digital detector data for every bunch crossing (~25 ns) to the Tile PreProcessor (TilePPr) boards in the counting rooms. As described in Table I, this new readout strategy implies a significant increase in the bandwidth required for the communication between the on- and off-detector electronics and the trigger systems, while keeping the same ratio between the number of readout channels and off-detector electronics boards.
All detector data received from the on-detector electronics will be stored in pipeline memories until a trigger accept signal is received. Selected data will be transmitted from the TilePPr board to the ATLAS Data AcQuisition (DAQ) system through the Front End LInk eXchange (FELIX) [5]. While storing the data in the pipelines, the TilePPr must provide reconstructed cell energies for every bunch crossing to the ATLAS trigger system with a fixed latency of at most 1.7 μs [4].
In addition, in the new readout architecture the TilePPr board will distribute the LHC clock to the on-detector electronics for the digitization of the PMT signals and for the synchronization with the rest of the readout electronics of the DAQ system. Figure 3 shows a sketch of the readout architecture envisaged for the TileCal detector at the HL-LHC.

In the test beam setup, the Demonstrator module is combined with two more TileCal modules, one extended barrel (EB) module and one long barrel (LB) module, which are instrumented with the legacy readout electronics in order to compare the performance of the upgraded and legacy electronics. Figure 4 shows a picture of the complete test beam setup, where the three TileCal modules are stacked on a movable table capable of placing the modules at any combination of angle and position with respect to the incident beam. The Demonstrator project plans include the replacement of one of the TileCal modules in ATLAS by the Demonstrator module during the Long Shutdown 2 (2019–2021). For this reason, the Demonstrator module was designed to provide both analog trigger signals to the current ATLAS Level-1 trigger system and full-digital trigger data for the upgraded system.

A. Upgraded on-detector electronics
The Demonstrator module includes all the upgraded on-detector electronics required for the acquisition of the PMT signals, the high-speed interface with the off-detector electronics, and the high-voltage distribution system for the PMTs.
The on-detector electronics of each TileCal module for the HL-LHC will be segmented into four identical and independent units, called mini-drawers, to improve the reliability of the system through redundancy. Figure 5 shows a drawing of a mini-drawer with the position of all the on-detector components. Each mini-drawer is composed of an aluminum mechanical structure, one Main Board [6], one Daughter Board [7] and one HV regulation board, which read out and operate up to 12 PMT blocks equipped with upgraded 3-in-1 Front End Boards (FEB) [6].
Each Main Board digitizes the analog signals received from up to 12 PMT blocks. The Main Board hosts a Daughter Board, which collects and transmits digitized data to the TilePPr prototype in the off-detector electronics at the LHC frequency.

IV. THE TILEPPR DEMONSTRATOR
The TilePPr Demonstrator board [8] was designed to acquire and process the detector data transmitted from the Demonstrator module, and to distribute the LHC clock and the Timing, Trigger and Control (TTC) information to the on-detector electronics. This prototype serves one TileCal module, representing 1/8th of the final TilePPr module for the HL-LHC.
The TilePPr Demonstrator board is a double mid-size Advanced Mezzanine Card (AMC), which can be operated in an Advanced Telecommunications Computing Architecture (ATCA) carrier or in a Micro Telecommunications Computing Architecture (µTCA) crate. The PCB has 16 layers: 8 layers are devoted to power distribution and ground planes, and 8 layers are used for data lines. NELCO 4000-13 SI dielectric material was selected to reduce high-frequency losses in the high-speed lines interfacing with the on-detector electronics.
This prototype is equipped with four QSFP modules connected to one Virtex 7 XC7VX485T FPGA (Readout FPGA), as well as one Kintex 7 XC7K420T FPGA (Trigger FPGA) and two Texas Instruments CDCE62005 jitter cleaners. It also hosts DDR3 memories and one Ethernet port on the front panel. The AMC backplane provides power to the TilePPr, as well as four point-to-point high-speed connections between the Readout FPGA and the Rear Transition Module (RTM), and several GbE and PCIe ports to connect both FPGAs with the rest of the boards in the ATCA shelf.
The Readout FPGA implements all the firmware required for the readout and operation of the Demonstrator module, and the Trigger FPGA preprocesses and transmits data to the upgraded trigger system. Figure 6 shows a picture of the TilePPr Demonstrator board.

A. TilePPr validation tests
As part of the validation process of the TilePPr prototype, extensive Bit Error Rate (BER) tests and jitter measurements were performed to evaluate the signal quality of the clock and data transmitted towards the on-detector electronics.
These jitter measurements are crucial for the validation of this high-speed readout system, since all the components involved in the data communication chain, such as clocks, jitter cleaners, high-speed transceivers and power supplies, contribute to the jitter and therefore affect the link performance.
Different types of jitter were measured at the optical output of the QSFP modules using a Keysight DCA-X 86100D sampling oscilloscope [9] equipped with an 86105C optical module. The optical signal quality was measured at the output of an Avago AFBR-79Q4Z QSFP module connected to the oscilloscope through a 1 m multimode fiber. The transceivers were configured to operate the GBT links at 4.8 Gbps and 9.6 Gbps without any adjustment of the pre- or post-emphasis settings. Table II presents the mean jitter values (µ) measured at the 4.8 Gbps (T_bit ≈ 208 ps) and 9.6 Gbps (T_bit ≈ 104 ps) data rates, together with the corresponding standard deviations (σ). The measurements show a good signal quality, with low jitter values for the communication towards the on-detector electronics.
The estimated Total Jitter (TJ) values at the given BER indicate that the design is robust enough to operate at the required data rates: at a BER of 10⁻¹⁸, the estimated TJ corresponds to 0.23 Unit Interval (UI) at 4.8 Gbps and to 0.55 UI at 9.6 Gbps. The Random Jitter (RJ) is similar at both data rates, while the Deterministic Jitter (DJ) is higher at 9.6 Gbps. The Keysight DCA-X 86100D provided a resolution of 5 fs and 2 fs for the RJ measurement at 4.8 Gbps and 9.6 Gbps respectively, and of 50 fs and 20 fs for the rest of the measurements. Figure 7 shows the eye diagrams obtained with the sampling oscilloscope during the test, with eyes wide open in time and amplitude at both data rates. Finally, BER tests were performed using the IBERT IP core [10] on the sixteen links running at 9.6 Gbps with a PRBS-31 data pattern over a period of 115 hours. No errors were found, which corresponds to a BER ≤ 5·10⁻¹⁷ at a 95% confidence level.
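The figures quoted above can be reproduced with a few lines of arithmetic. The Python sketch below (illustrative; function names are hypothetical) converts the TJ values from UI to picoseconds and derives the zero-error BER bound of the 115-hour test. It assumes independent bit errors (Poisson statistics), so that observing no errors in n bits bounds the BER by -ln(1 - CL)/n, and it counts bits at the nominal 9.6 Gbps line rate.

```python
import math


def bit_period_ps(rate_gbps: float) -> float:
    """Unit interval (bit period) in picoseconds."""
    return 1000.0 / rate_gbps


def ui_to_ps(ui: float, rate_gbps: float) -> float:
    """Convert a jitter value expressed in UI to picoseconds."""
    return ui * bit_period_ps(rate_gbps)


def zero_error_ber_bound(bits: float, confidence: float = 0.95) -> float:
    """Upper BER bound when no errors are seen in `bits` transmitted bits.

    With Poisson statistics, P(0 errors) = exp(-BER * bits); requiring this
    to be at most 1 - CL gives BER <= -ln(1 - CL) / bits.
    """
    return -math.log(1.0 - confidence) / bits


if __name__ == "__main__":
    for rate, tj_ui in ((4.8, 0.23), (9.6, 0.55)):
        print(f"{rate} Gbps: UI = {bit_period_ps(rate):.1f} ps, "
              f"TJ(1e-18) = {ui_to_ps(tj_ui, rate):.1f} ps")
    links, rate_bps, hours = 16, 9.6e9, 115
    total_bits = links * rate_bps * hours * 3600
    print(f"bits: {total_bits:.3e}, BER <= {zero_error_ber_bound(total_bits):.2e}")
```

This gives a TJ of about 48 ps at 4.8 Gbps and 57 ps at 9.6 Gbps, and about 6.4·10¹⁶ bits tested, for a bound of roughly 4.7·10⁻¹⁷, consistent with the BER ≤ 5·10⁻¹⁷ quoted above.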

V. CLOCK AND READOUT ARCHITECTURE AT THE TEST BEAM
The clock and readout architecture implemented for the readout and operation of the Demonstrator module is very close to the final one proposed for the HL-LHC. The difference between the Demonstrator and HL-LHC readout architectures lies in the capability of the Demonstrator module to provide analog trigger signals to the ATLAS Level-1 trigger system. This feature is required to operate the Demonstrator module within the current ATLAS DAQ architecture, should it be inserted into the TileCal detector to take data during Run 3 (2021–2023). Figure 8 shows a complete block diagram with all the components and interconnections of the readout architecture employed during the test beam campaigns.

A. Readout architecture at the test beam setup
The bi-directional communication between the TilePPr and the Demonstrator module is implemented using the GigaBit Transceiver (GBT) protocol [11] with an asymmetric bandwidth (4.8 Gbps downlink / 9.6 Gbps uplink). The sixteen GBT links transmit Detector Control System (DCS) and TTC commands to the on-detector electronics at 4.8 Gbps, and receive detector data in the TilePPr at 9.6 Gbps through the four QSFP modules. The sampling clock is transmitted to the on-detector electronics embedded in the GBT links, together with the TTC and DCS data.
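The line rates translate directly into a fixed number of bits per bunch crossing, which is what allows the links to run synchronously with the LHC clock. The Python sketch below assumes the standard 120-bit GBT frame in GBT (FEC) mode, with a 4-bit header, 2 IC (internal control) and 2 EC (external control) bits, 80 user data bits and 32 FEC bits, sent once per nominal 40 MHz cycle. It treats the 9.6 Gbps uplink simply as 240 bits per crossing, since the uplink frame layout used by the Daughter Board is not described here; names are hypothetical.

```python
# Bits available per bunch crossing on the GBT links (nominal 40 MHz frame rate).
FRAME_RATE_HZ = 40e6  # nominal; the LHC bunch clock is ~40.08 MHz

# Field widths of a standard GBT frame in GBT (FEC) mode, in bits.
GBT_FRAME_FIELDS = {"header": 4, "ic": 2, "ec": 2, "data": 80, "fec": 32}


def bits_per_crossing(line_rate_bps: float, frame_rate_hz: float = FRAME_RATE_HZ) -> int:
    """Bits sent per frame period; the rate must be an integer multiple."""
    bits = line_rate_bps / frame_rate_hz
    if abs(bits - round(bits)) > 1e-9:
        raise ValueError("line rate is not an integer multiple of the frame rate")
    return round(bits)


if __name__ == "__main__":
    downlink = bits_per_crossing(4.8e9)   # 120
    uplink = bits_per_crossing(9.6e9)     # 240
    assert downlink == sum(GBT_FRAME_FIELDS.values())
    print(f"downlink: {downlink} bits/BC, of which {GBT_FRAME_FIELDS['data']} user data bits")
    print(f"uplink:   {uplink} bits/BC")
```

Because every frame carries exactly one bunch crossing's worth of bits, the receiver can recover the 40 MHz clock from the frame boundaries, which is how the sampling clock travels embedded in the downlink.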
In the on-detector electronics, each Daughter Board receives four links from the TilePPr corresponding to one QSFP module. One of the receiver links is connected to the input of a GBTx chip [12] for the recovery of the reference clock and for the remote configuration of the on-detector electronics. The remaining three GBT links are connected to two Kintex 7 FPGAs for the reception of configuration and control commands.
The Daughter Board is plugged into a Main Board through a 400-pin FMC connector. This connector provides a high-speed path to receive the PMT digitized data and permits the configuration of the Main Board components. Both the Daughter Board and the Main Board are divided into two independent halves, called A-side and B-side, which are individually powered with 10 V from the Low Voltage Power Supplies (LVPS) for redundancy.
The Main Board is divided into four sections, each controlled by an Altera Cyclone IV FPGA. Each section contains the circuitry required to control and read out 3 PMTs, for a total of 12 PMTs per Main Board. Each Cyclone IV controls 3 dual-channel 12-bit ADCs that digitize at 40 Msps the shaped PMT signals (high and low gain) provided by the 3-in-1 cards, 6 DACs that set the bias voltage levels applied to the ADC inputs, and 3 slow ADCs that sample the integrators at 50 ksps. The Cyclone FPGAs are accessed from the Daughter Board via an SPI interface, while the digitized PMT signals are sent directly from the ADCs to the Daughter Board over LVDS lines at 560 Mbps. Two I2C buses (one per side) are dedicated to the readout of the integrator ADCs.
In the Daughter Board, the serial data coming from the 12 high-speed ADCs at 560 Mbps are deserialized with fixed and deterministic latency. The ADC data are packed with the integrator and monitoring data and transmitted to the TilePPr Demonstrator board through a pair of redundant GBT links per side running at 9.6 Gbps.
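A quick consistency check relates the ADC payload to the link capacity and scales the Demonstrator link count to the full detector. The Python sketch below assumes that each Main Board side serves 6 PMTs (two of the four sections), ignores integrator, monitoring and framing overhead, and assumes every one of the 256 modules uses the same 16 × 9.6 Gbps uplink configuration as the Demonstrator; names are hypothetical.

```python
# Raw ADC payload versus uplink capacity, nominal values, overheads ignored.
SAMPLE_RATE_HZ = 40e6
ADC_BITS = 12
GAINS = 2                  # high and low gain per PMT
PMTS_PER_SIDE = 6          # assumption: 2 of the 4 Main Board sections per side
UPLINK_BPS = 9.6e9


def payload_bps(pmts: int, gains: int = GAINS, bits: int = ADC_BITS,
                rate_hz: float = SAMPLE_RATE_HZ) -> float:
    """ADC payload, excluding integrator, monitoring and framing bits."""
    return pmts * gains * bits * rate_hz


def aggregate_tbps(modules: int = 256, links_per_module: int = 16,
                   link_bps: float = UPLINK_BPS) -> float:
    """Total uplink line rate if every module used the Demonstrator links."""
    return modules * links_per_module * link_bps / 1e12


if __name__ == "__main__":
    side = payload_bps(PMTS_PER_SIDE)
    print(f"ADC payload per side: {side / 1e9:.2f} Gbps "
          f"({side / UPLINK_BPS:.0%} of one 9.6 Gbps link)")
    print(f"TilePPr Demonstrator input: {16 * UPLINK_BPS / 1e9:.1f} Gbps")
    print(f"full detector: {aggregate_tbps():.1f} Tb/s")
```

Under these assumptions the ADC payload is 5.76 Gbps per side (60% of one 9.6 Gbps link before framing), the uplink line rate is 153.6 Gbps per module, close to the 160 Gbps quoted for the TilePPr input, and about 39.3 Tb/s for 256 modules, in line with the 40 Tb/s quoted for the full detector.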
The TilePPr receives the digitized data from four Daughter Boards at 40 MHz through the four QSFP modules, requiring a total input bandwidth of 160 Gbps. The data are unpacked and stored in circular pipeline memories, implemented with dedicated block RAM resources of the Readout FPGA, until a Level-1 trigger Accept (L1A) signal is received. When the L1A signal is received, the TilePPr collects the data for the selected event from the pipelines and transmits them, with the proper format, to the FELIX system (16 samples and 2 gains) and to the legacy ROD (7 samples and 1 gain), keeping backward compatibility with the current ATLAS DAQ system.
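The pipeline behaviour described above can be illustrated with a small software model. The Python sketch below is not the TilePPr firmware: it assumes a ring buffer with one entry per bunch crossing, a fixed trigger latency expressed in crossings, and readout windows of 16 samples (FELIX) and 7 samples (legacy ROD) that start a chosen number of crossings before the triggered one. The depth, latency and window placement are hypothetical parameters.

```python
from typing import List


class Pipeline:
    """Ring buffer holding one sample per bunch crossing (software model only)."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.buffer = [0] * depth
        self.bc = 0  # number of crossings written so far

    def write(self, sample: int) -> None:
        """Store the sample of the current crossing, overwriting the oldest one."""
        self.buffer[self.bc % self.depth] = sample
        self.bc += 1

    def read_window(self, latency: int, n_samples: int, pre: int) -> List[int]:
        """Return n_samples around the crossing that is `latency` crossings old.

        The window starts `pre` crossings before the triggered crossing; all
        requested samples must still be held in the buffer.
        """
        triggered = self.bc - 1 - latency
        first = triggered - pre
        last = first + n_samples - 1
        if first < max(0, self.bc - self.depth) or last > self.bc - 1:
            raise ValueError("window not available in the pipeline")
        return [self.buffer[i % self.depth] for i in range(first, last + 1)]


if __name__ == "__main__":
    pipe = Pipeline(depth=128)          # hypothetical depth
    for bc in range(1000):
        pipe.write(bc)                  # store the crossing number as the "sample"
    felix = pipe.read_window(latency=100, n_samples=16, pre=7)
    rod = pipe.read_window(latency=100, n_samples=7, pre=3)
    print(felix)
    print(rod)
```

Storing the crossing number as the sample makes the window placement visible: both readouts return consecutive crossings around crossing 899, the one that was 100 crossings old when the trigger arrived.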

B. Clock distribution at the test beam setup
For the distribution of the sampling clock to the on-detector electronics, the TilePPr recovers the 40 MHz clock from the TTC stream with an Analog Devices ADN2814 chip [13]. The recovered clock is fed to a CDCE62005 jitter cleaner to meet the jitter requirements of the transceiver reference clock [14]. As explained above, the 40 MHz clock is recovered by the GBTx chip in each Daughter Board and distributed as sampling clock to the Main Board, where the Cyclone IV FPGAs fan it out to the ADCs.
An important feature of this system is that the GBT links were configured for fixed and deterministic latency operation, minimizing the phase variations in the distributed clock after power cycling the readout electronics.
This operating mode of the GBT links permits the correct assignment of the bunch crossing identifier to the digitized data in the TilePPr, since the digitized data are not time-stamped in the on-detector electronics. For this reason, the transceivers of the Daughter Board FPGAs use the clock recovered by the GBTx chip for the transmission to the off-detector electronics, so that the on- and off-detector electronics operate in the same clock domain.
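Because the data carry no time stamp, the bunch crossing identifier (BCID) can only be attached in the TilePPr by counting crossings and subtracting a latency that is known and constant. The Python sketch below illustrates that idea under stated assumptions: an LHC orbit of 3564 bunch-crossing slots and a hypothetical total latency, in whole crossings, between the sampling of a crossing and the arrival of its data at the TilePPr. If that latency changed after a reset, every BCID would shift by the same amount, which is why fixed and deterministic latency is required.

```python
BC_PER_ORBIT = 3564  # bunch-crossing slots in one LHC orbit


def assign_bcid(local_bc_counter: int, link_latency_bc: int) -> int:
    """BCID of a frame received when the TilePPr counter reads local_bc_counter.

    Assumes the frame was sampled link_latency_bc crossings earlier and that
    this latency is identical after every power cycle or reset.
    """
    return (local_bc_counter - link_latency_bc) % BC_PER_ORBIT


if __name__ == "__main__":
    latency = 12  # hypothetical total latency in bunch crossings
    print(assign_bcid(100, latency))   # crossing sampled 12 BCs earlier
    print(assign_bcid(10, latency))    # wraps into the previous orbit
```

The modulo operation handles frames that were sampled in the previous orbit, so a constant latency maps every received frame to a unique BCID.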
All the data flow and control of the TilePPr, as well as the configuration commands for the on-detector electronics, are implemented over an Ethernet network using the IPbus protocol [15]. A set of IPbus registers is used to handle the TTC and DCS commands to the on-detector electronics, as well as for data monitoring and calibration. The internal configuration for the different operating modes and the remote monitoring of the TilePPr status are also managed through the IPbus registers.
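To make the register access concrete, the Python sketch below builds an IPbus control packet containing one write and one read transaction. It is based on our reading of the IPbus 2.0 protocol specification (32-bit packet and transaction headers, big-endian words) and should be checked against that document; in practice the uHAL software library builds these packets. The register address and all function names are hypothetical, not the TilePPr register map.

```python
import struct
from typing import List

# Field values as we read them in the IPbus 2.0 specification (assumption).
IPBUS_VERSION = 0x2
BYTE_ORDER_QUALIFIER = 0xF
PACKET_TYPE_CONTROL = 0x0
TYPE_READ = 0x0
TYPE_WRITE = 0x1
INFO_REQUEST = 0xF


def packet_header(packet_id: int) -> int:
    """32-bit control-packet header: version | reserved | packet ID | 0xF | type."""
    return ((IPBUS_VERSION << 28) | ((packet_id & 0xFFFF) << 8)
            | (BYTE_ORDER_QUALIFIER << 4) | PACKET_TYPE_CONTROL)


def transaction_header(transaction_id: int, words: int, type_id: int) -> int:
    """32-bit transaction header: version | transaction ID | words | type | info."""
    return ((IPBUS_VERSION << 28) | ((transaction_id & 0xFFF) << 16)
            | ((words & 0xFF) << 8) | (type_id << 4) | INFO_REQUEST)


def read_request(transaction_id: int, address: int, n_words: int) -> List[int]:
    return [transaction_header(transaction_id, n_words, TYPE_READ), address]


def write_request(transaction_id: int, address: int, data: List[int]) -> List[int]:
    return [transaction_header(transaction_id, len(data), TYPE_WRITE), address, *data]


def control_packet(packet_id: int, transactions: List[List[int]]) -> bytes:
    """Concatenate the header and transactions as big-endian 32-bit words."""
    words = [packet_header(packet_id)]
    for transaction in transactions:
        words.extend(transaction)
    return struct.pack(f">{len(words)}I", *words)


if __name__ == "__main__":
    TTC_CMD_REG = 0x00001000  # hypothetical register address
    packet = control_packet(1, [write_request(0, TTC_CMD_REG, [0x1]),
                                read_request(1, TTC_CMD_REG, 1)])
    print(packet.hex())
```

Grouping several transactions in one packet, as done here, reduces the number of network round trips when many registers must be configured.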

VI. CLOCK STABILITY TESTS
The stability of the clock distributed towards the on-detector electronics was studied using digital phase monitoring tools implemented in the logic of the Readout FPGA. The quality of the clock distributed as sampling clock plays a crucial role in the energy and time reconstruction of the physics events produced in the detector.
The phase monitoring module implemented in the TilePPr permits the detection and tracking of non-deterministic latency variations in the high-speed path between the on- and off-detector electronics. The jitter cleaners or FPGA transceivers could introduce a latency variation after a power cycle, or due to temperature and voltage drifts.

A. OverSampling to UnderSampling circuit
Each of the sixteen GBT links was connected to a phase monitoring module composed of one OverSampling to UnderSampling (OSUS) circuit [16]. The OSUS circuit tracks the phase difference between the transmitted 40 MHz clock and the clock recovered back from the on-detector electronics.
The OSUS circuit is a digital circuit based on the Digital Dual Mixer Time Difference (DDMTD) circuit [17] [18], but it combines both subsampling and oversampling techniques. The combination of both sampling techniques makes it possible to measure clock phase differences between signals with frequencies different from the one used to generate the sampling clock. In addition, this technique increases the acquisition rate of phase measurements with respect to the original DDMTD circuit. Figure 9 presents a functional block diagram of the OSUS circuit. The sampling clock is generated from the first input clock using a Phase Locked Loop (PLL). The PLL distributes to the samplers a sampling clock whose frequency is slightly offset from M times the input clock frequency, where M is the oversampling factor and N, which sets this offset, defines the resolution of the OSUS circuit. M and N are natural numbers.
Both input clocks are sampled with the sampling clock, producing two oversampled signals. These oversampled signals are passed to the Phase Control block, which multiplexes them in intervals of M·T_s, where T_s is the sampling clock period, and builds M pairs of subsampled signals. Figure 10 shows the timing diagram of an OSUS circuit with M = 4 and N = 8, representing the sampling clock, one input clock, its corresponding oversampled signal, and the regrouped samples forming the subsampled signals.
The phase difference between the two input clocks is then obtained from the number of sampling clock cycles counted between the rising edges of the filtered subsampled signals with the same index.
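The OSUS principle can be illustrated with an idealized simulation. The Python sketch below is an assumption-laden model, not the firmware of [16]: it takes the sampling period as T_s = T_in·(N+1)/(M·N), a DDMTD-style choice consistent with the 240 MHz sampling clock and the ~1.5 ps resolution (T_in/N) quoted for M = 6 and N = 16384 in the implementation described below. It uses jitter-free clocks, so no glitch filtering is needed, and converts the edge-to-edge count of sampling clock cycles into a delay as cycles·T_in/(M·N). Exact rational arithmetic avoids rounding artefacts; all names are hypothetical.

```python
from fractions import Fraction
from typing import List, Sequence


def clock_level(t: Fraction, period: Fraction, delay: Fraction) -> int:
    """Ideal 50 % duty-cycle clock delayed by `delay`: high in the first half-period."""
    return 1 if (t - delay) % period < period / 2 else 0


def first_rising_edge(samples: Sequence[int], start: int) -> int:
    """Index of the first 0 -> 1 transition at or after `start`."""
    for k in range(max(start, 1), len(samples)):
        if samples[k - 1] == 0 and samples[k] == 1:
            return k
    raise ValueError("no rising edge found")


def osus_delay_estimates(t_in: Fraction, delay: Fraction, m: int, n: int) -> List[Fraction]:
    """Estimate the delay of clock B with respect to clock A, once per stream.

    Assumption: T_s = T_in * (N + 1) / (M * N), i.e. the sampling clock runs
    slightly below M times the input frequency.
    """
    t_s = t_in * (n + 1) / (m * n)
    samples_per_stream = 3 * n + 2          # more than two beat periods
    estimates = []
    for stream in range(m):
        # Subsampled stream: every M-th sample, starting at sample `stream`.
        times = [(k * m + stream) * t_s for k in range(samples_per_stream)]
        a = [clock_level(t, t_in, Fraction(0)) for t in times]
        b = [clock_level(t, t_in, delay) for t in times]
        k_a = first_rising_edge(a, start=1)
        k_b = first_rising_edge(b, start=k_a)
        cycles = (k_b - k_a) * m            # sampling clock cycles between edges
        estimates.append(cycles * t_in / (m * n))
    return estimates


if __name__ == "__main__":
    t_in = Fraction(25)                      # 40 MHz period in ns
    print(osus_delay_estimates(t_in, Fraction(75, 8), m=4, n=8))
    print(f"resolution for N = 16384: {float(t_in / 16384) * 1000:.2f} ps")
    f_s = Fraction(40) * 6 * 16384 / 16385   # MHz, under the assumed formula
    print(f"sampling clock for M = 6: {float(f_s):.3f} MHz")
```

With M = 4 and N = 8 (the values of Figure 10), a delay of 3·T_in/N = 9.375 ns is recovered exactly by every subsampled stream; delays that are not multiples of T_in/N are quantized and the per-stream estimates can differ, which is where combining the M streams can help.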

B. Implementation in the Readout FPGA
All phase monitoring modules implemented in the Readout FPGA share the same sampling clock. Thus, a single Clock Management Tile (CMT) [19] is required to generate a 240 MHz sampling clock from the 40 MHz clock recovered from the TTC system.
The chosen N factor is 16384, which leads to a resolution of ~1.5 ps for the measurement of 40 MHz clock signals with M = 6. However, the accuracy of the phase monitoring module was measured to be ~30 ps RMS during the tests. The accuracy is limited by the jitter of the clock extracted from the transceiver and by the jitter introduced by the signal routing through the fabric logic [20] and clock buffers [19]. Figures 11 and 12 present the results of the stability studies of the clock distributed from the TilePPr to the on-detector electronics, comparing the clock used for transmission with the clock recovered from the on-detector electronics.
During the tests, the phase difference between the clocks was acquired after resetting the Daughter Board 100 times. After each reset, 1,000 phase measurements between the transmitted and recovered clocks were taken and read out through the IPbus interface, and their average value was then calculated in the computer. Figure 11 shows the histogram of all phase measurements taken during the tests for the four channels (A0, A1, B0 and B1) of one QSFP module. The phase difference depends on the propagation delay introduced by the optical fiber, the optoelectronic modules, and the FPGA resources and routing. As can be observed in Figure 11, the phase difference between the distributed and recovered clocks does not show significant variations, which validates the operation of the GBT links with fixed and deterministic latency.

Fig. 11. Histogram of the phase measurements taken after resetting the on-detector electronics 100 times.

Figure 12 presents the average of the 1,000 measurements taken after each reset as a function of the reset count. As shown in Figure 12, the maximum time variation between both clocks after resetting the on-detector electronics is 100 ps peak-to-peak. Such variations in the phase of the sampling clock have no impact on the detector performance in terms of energy and time reconstruction resolution [22], as observed during the data-taking period of Run 1 (2010–2012). Hence, the proposed clock distribution strategy fulfills the requirements for the HL-LHC.
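Two numbers help put these results in perspective. The Python sketch below assumes that the 1,000 measurements taken after each reset are independent, each with the ~30 ps RMS accuracy quoted above (correlated noise would weaken this estimate), and compares the resulting uncertainty of each per-reset average, and the 100 ps peak-to-peak variation, with the 25 ns sampling period. Names are hypothetical.

```python
import math


def standard_error(rms: float, n_measurements: int) -> float:
    """Standard error of the mean of n independent measurements."""
    return rms / math.sqrt(n_measurements)


def fraction_of_period(variation_ps: float, period_ns: float = 25.0) -> float:
    """Phase variation expressed as a fraction of the sampling period."""
    return variation_ps / (period_ns * 1000.0)


if __name__ == "__main__":
    print(f"per-reset mean uncertainty: {standard_error(30.0, 1000):.2f} ps")
    print(f"100 ps peak-to-peak = {fraction_of_period(100.0):.1%} of the 25 ns period")
```

Under this assumption each per-reset average is known to about 1 ps, far below the 100 ps peak-to-peak spread, and that spread amounts to 0.4% of the sampling period.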

VII. CONCLUSIONS
The ATLAS Tile Calorimeter plans a complete redesign of the readout electronics for the HL-LHC using a new readout strategy with a full-digital trigger. In the new readout architecture, the on-detector electronics will transmit all the digitized samples to the off-detector electronics at the LHC frequency. The sampling clock for the digitization of the PMT signals will be distributed from the off-detector electronics, embedded in the data links.
A Demonstrator module was built with prototypes of the upgraded electronics, using the new readout and clock distribution strategy for the HL-LHC. The Demonstrator module was tested in several test campaigns with particle beams between 2015 and 2018, where the on-detector electronics transmitted the digitized detector signals at 40 MHz to the TilePPr module.
Several tests were performed to validate the proposed readout and clock distribution strategy for the TileCal detector at the HL-LHC. Digital phase monitoring circuits implemented in the TilePPr were used to detect small phase deviations of the distributed clock after resetting the on-detector electronics. The stability of the clock ensures good performance of the energy and time reconstruction algorithms in TileCal.