An FPGA-based demonstrator for a topological processor in the future ATLAS L1-Calo trigger: “GOLD”

The existing ATLAS trigger consists of three levels. The first level (L1) is a custom-designed, FPGA-based trigger, while the second and third levels are software based. In the coming years the LHC machine plans to bring the beam energy to the maximum value of 7 TeV and to increase the luminosity, seriously challenging the current L1 trigger system. To cope with the resulting higher event rate, a new electronics module is foreseen to be added to the ATLAS Level-1 Calorimeter Trigger electronics chain as part of the ATLAS trigger upgrade: the Topological Processor (TP). Such a processor needs fast optical I/O and large aggregate bandwidth in order to use the information on trigger-object position in space (e.g. jets in the calorimeters or muons measured in the muon detectors) to improve the purity of the L1 trigger streams by applying topological cuts within the L1 latency budget. In this paper, an overview of the adopted technological solutions and of the R&D activities on the demonstrator for the TP (“GOLD”) is presented.

Figure 1. Simplified outline of the L1-Calo trigger chain with the inclusion of the topological processor.

Phase I upgrade and topological processor
Due to the gradual increase in the LHC luminosity, up to 10³⁴ cm⁻²s⁻¹ as foreseen in 2014 (Phase 0) and 2.5 × 10³⁴ cm⁻²s⁻¹ for 2018 (Phase I), the L1-Calo trigger needs to be upgraded to cope with the higher background rate, in order to keep L1 trigger rates at current levels without pre-scaling trigger streams of physics interest.
As mentioned before, the current L1-Calo trigger allows making selections on counts of objects (numbers of jets at various thresholds and clusters) and Missing Transverse Energy. To achieve additional background rate reduction at Level-1, the information on jet or muon direction in space can be used. The current L1-Calo scheme is not designed to handle such an amount of information in the Real-Time Data Path (RTDP). To exploit this information, a new element in the L1 chain is needed: the Topological Processor (TP). The TP will make it possible to use the Region-of-Interest (RoI) information in the L1 trigger. Technically, the TP works as a data concentrator (for data from the calorimeters and muon detectors) and feeds these data into specific topological algorithms, whose output is provided to the CTP. As a consequence, the TP requires optical connectivity (data concentrator), high bandwidth, and high processing power (algorithms coded into FPGAs). Some modules in the existing L1 trigger signal chain will have to be upgraded or replaced to make topological information available to the TP. An outline of the trigger chain including the TP is shown in figure 1.
Topological algorithms can perform many tasks. As an example, extensive studies of correlations among jet directions, cutting on the relative azimuthal angle between jets, have shown large potential for reducing the QCD background in the dijet plus Missing Transverse Energy stream. Optimization of specific topological algorithms is of central interest and is currently pursued on high-luminosity Monte Carlo samples.
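As an illustration only, a topological cut of this kind reduces to computing the azimuthal separation of the two leading jets and comparing it to a threshold. The sketch below uses hypothetical jet φ values and an arbitrary cut value; it is not the actual L1Calo firmware algorithm.

```python
import math

def delta_phi(phi1, phi2):
    """Azimuthal separation of two trigger objects, wrapped into [0, pi]."""
    dphi = abs(phi1 - phi2) % (2 * math.pi)
    return 2 * math.pi - dphi if dphi > math.pi else dphi

def dijet_dphi_cut(jet1_phi, jet2_phi, max_dphi=2.8):
    """Illustrative cut: reject back-to-back dijet topologies, which are
    characteristic of the QCD background (the threshold is arbitrary)."""
    return delta_phi(jet1_phi, jet2_phi) < max_dphi

# A nearly back-to-back dijet (delta-phi close to pi) fails the cut:
print(dijet_dphi_cut(0.2, math.pi + 0.1))  # False
```

In firmware the same comparison would be done on coarse-grained integer φ coordinates of the RoIs, but the logic is identical.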

GOLD requirements and implementation
The GOLD ("Generic Opto Link Demonstrator") is a demonstrator module for the technologies to be implemented in the TP design.

Main requirements
The GOLD main board is based on the AdvancedTCA form factor. The board needs to carry programmable logic devices (FPGAs and CPLDs), external interfaces, general module infrastructure and power regulators. Electro-optical converters for incoming data, as well as electrical fan-out, are located on a mezzanine module. Clock circuitry is located on a separate mezzanine board. Another small mezzanine is dedicated to pre-configuration access to the GOLD.
For the intended use as future modules in the L1Calo trigger chain, low processing latency on the Real-Time Data Path (RTDP) is crucial. The GOLD is designed for minimum latency data transmission and processing throughout the module.
The GOLD is envisaged to be used as a test bench for topological algorithms running on Xilinx Virtex-6 family FPGAs. Moreover, it can be used as an optical data sink (and source) for other L1Calo modules, or for stand-alone link tests.

Features and implementation
A full review of the GOLD specification is available in [4]. Here a summary of some key features is briefly discussed.

Processing power
The processing power is provided by FPGAs of the Xilinx Virtex-6 product family. The GOLD PCB has been designed to be used as a common test bench for two Virtex-6 product lines (LXT/HXT). Since HXT devices have become available on the market only recently, all current documentation is focused on the XC6VLXT. The Multi-Gigabit Transceivers (MGTs) of the LXT devices support data rates up to 6.6 Gb/s; higher data rates can be handled on the GOLD only if HXT devices are mounted. Both the LXT and the HXT subsystems are connected to the input mezzanine with two 400-pin FMC mezzanine connectors each. In figure 2 the outline of the board shows the physical locations of the FPGAs, indicated with the letters A to J. Initially, FPGAs will be mounted only on the upper half of the board.
The input processor FPGAs (A, B, C, D) are each fed electrically at up to 36 × 6.6 Gb/s from the centrally located input mezzanines. Their processing power is available for regional preprocessing at quadrant level. The input processors are connected to the main processor using parallel differential I/O.
The main processor FPGA (E) has access to the results from the input processors via a total of 320 differential lanes. A small fraction of lanes are required for control purposes on this demonstrator. The main processor has access to the detector information from the full solid angle and will therefore house global algorithms. The output of the topological algorithms is sent from the main processor via a single fibre-optical ribbon through the front panel to the CTP.

Real time data path
ATCA backplane zones 2 and 3 of the GOLD (Z2 and Z3 in figure 2) are used for real-time data transmission. Optical data enter the GOLD through the backplane. The fibres are fed in via five blind-mate MTP/MPO-type backplane connectors that can carry up to 72 fibres each. This limits the maximum aggregate bandwidth on the optical inputs to 3.6 Tb/s, if a 10 Gb/s line rate is used throughout. The opto-electrical conversion is performed by industry-standard 12-fibre receivers mounted on mezzanine modules, which feed the electrical signals through four FMC connectors with differential signal layout. CML fan-out circuits can be implemented on the mezzanines to explore the electrical duplication schemes required by topological algorithms. Data from the two top FMC connectors are routed to the four input FPGAs (A, B, C, D in figure 2) and de-serialised in MGT receivers.
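The aggregate-bandwidth figure follows directly from the connector count and line rate quoted above; a minimal check:

```python
# Aggregate optical input bandwidth of the GOLD (numbers from the text).
connectors = 5            # blind-mate MTP/MPO backplane connectors
fibres_per_connector = 72 # fibres carried by each connector
line_rate_gbps = 10.0     # Gb/s per fibre, the upper limit assumed in the text

total_gbps = connectors * fibres_per_connector * line_rate_gbps
print(total_gbps / 1000.0)  # 3.6 (Tb/s)
```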

Clock distribution
FPGA fabric and MGT clocks need appropriate signal levels and signal quality. The MGT clocks are CML-level signals, AC-coupled into the MGT clock pads of the FPGAs. Clock generation and jitter clean-up are performed on a clock and control mezzanine module, located towards the bottom of the GOLD. There are two independent clock trees for the fabric clocks into all FPGAs, and one common MGT clock tree into all FPGAs. A further MGT clock tree supplies the main processor MGTs, and yet another clock tree is available to supply the high-speed quads of the XC6VHXT devices.

Configuration and control
With the GOLD, various module-control schemes for future ATCA-based modules are prototyped. The GOLD is equipped with a USB microcontroller which supports the 480 Mb/s USB high-speed mode. The microcontroller is interfaced to the FPGAs via the JTAG protocol. The microcontroller design is fully compatible with the Xilinx platform USB solution. Either production firmware or service code can be loaded into the FPGAs, as required.
GOLD module control is initially done via a serially extended VME bus. The module is seen from a VME crate CPU as if it were a local VME module. As soon as the required software environment is available, this access mode will be converted to standard Ethernet access. The required hardware components are available on the GOLD.

GOLD production status
The GOLD PCB was produced in September 2011 and at the time of this conference is in the assembly phase. Initially the GOLD will be equipped with three XC6VLX240 devices: the two input FPGAs indicated with the letters A and B in figure 2, and the main processor. This will allow initial tests on the hardware, as explained in the next section. A picture of the bare GOLD PCB is shown in figure 3.

Plans for GOLD testing
After PCB production, an electrical test was performed in order to verify that all interconnections were functional. Moreover, after assembly, extended tests are planned, aimed at verifying the operation of the external hardware interfaces, board-level connectivity, and the basic operation of the hardware devices.
A schematic overview of the GOLD test setup is shown in figure 4. The GOLD is controlled via an optical fibre connection to a 6U VME-bus module carrying SerDes devices and optical transceivers. The opto-electrical interface will be an SFP module with LC-type optical connectors.
At first, the interconnections between the FPGAs are tested by means of a JTAG boundary scan performed with the GOEPEL boundary-scan software SYSTEMCASCON, interfaced via USB. In addition, specific tests aimed at verifying the functioning and performance of the GOLD are listed here:
Latency test: the latency is measured on the high-speed SerDes and on the SerDes used for the differential connections between the input and main FPGAs. The latency and functioning of the algorithms have already been simulated and will be validated on the GOLD.
Transceiver fine tuning: figure 5 illustrates an eye diagram. The empty region measures the interval in which the transceiver operates free from bit errors. The eye diagrams depend on the transceiver parameter settings. A combined test of the opto-electrical converter (AVAGO) connected to an MGT (inside the FPGA) is performed to maximize the eye opening by fine-tuning the transceiver settings. The Xilinx IBERT core has the logic to control, monitor, and change the transceiver parameters at runtime, accessible through JTAG. The parameters of the opto-electrical converter are controlled via I²C. The eye diagrams are then measured and the eye opening is maximized by fine-tuning these parameters at 6.4 Gb/s.
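Conceptually, the fine-tuning step is a search over transceiver settings for the largest eye opening. The sketch below shows this idea with a synthetic `measure_eye_opening` stand-in; on the real hardware the measurement would come from the IBERT core over JTAG (and I²C for the converter), and the parameter names and register ranges here are purely hypothetical.

```python
import itertools

def measure_eye_opening(tx_swing, tx_preemph):
    """Placeholder for a hardware eye measurement (e.g. via IBERT/JTAG).
    A synthetic response is used here purely for illustration: it peaks
    at tx_swing=5, tx_preemph=2."""
    return 100 - (tx_swing - 5) ** 2 - (tx_preemph - 2) ** 2

# Exhaustive sweep over hypothetical register ranges, keeping the
# setting with the widest error-free eye.
best = max(
    itertools.product(range(16), range(8)),
    key=lambda p: measure_eye_opening(*p),
)
print(best)  # (5, 2) for the synthetic response above
```

An exhaustive sweep is feasible here because the parameter space of a handful of small registers is tiny; on hardware each point simply costs one eye measurement.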
Bit Error Rate Test (IBERT): the ChipScope Pro Serial I/O toolkit integrates the debugging of the high-speed serial transceivers of Xilinx FPGAs. Eye diagrams are measured for each MGT with ChipScope, by sweeping the sampling point, and also directly with an oscilloscope (32 GHz; 80 GSa/s). By running up to 8 MGTs in parallel at 6.4 Gb/s (after fine tuning of their settings) for one week, no errors should be measured (bit error rate < 10⁻¹⁶).
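The quoted limit can be checked by simple counting: eight lanes at 6.4 Gb/s over one week transfer roughly 3 × 10¹⁶ bits, so observing zero errors bounds the bit error rate below about 1/N, comfortably under 10⁻¹⁶. A minimal check:

```python
lanes = 8
line_rate_bps = 6.4e9        # 6.4 Gb/s per MGT
duration_s = 7 * 24 * 3600   # one week

total_bits = lanes * line_rate_bps * duration_s
# Naive bound for zero observed errors (a confidence-level treatment
# would scale this by a small constant).
ber_upper_bound = 1.0 / total_bits
print(f"{total_bits:.2e} bits transferred, BER < {ber_upper_bound:.1e}")
```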
Test of the full chain of the RTDP: the 12-channel data source can be either the main FPGA, sending via one AVAGO transmitter on the main board, the input FPGA "B" (which is connected to one AVAGO transmitter on the input mezzanine), or the BLT module [5]. The received data are then processed in the two input FPGAs and the result is sent to the main FPGA.

Summary
This document gives an overview of the Generic Opto Link Demonstrator (GOLD), built to explore technological solutions that can be adopted in the future L1Calo TP design. The GOLD will be used to test topological algorithms in their firmware incarnation and to measure latency and performance on simulated input. At the time of this conference the GOLD PCB has been produced and assembled. The plan for testing the board functionality has been presented.