Upgrade of the PreProcessor system for the ATLAS level-1 calorimeter trigger

The ATLAS Level-1 Calorimeter Trigger is a hardware-based pipelined system designed to identify high-pT objects in the ATLAS calorimeters within a fixed latency of 2.5 µs. It consists of three subsystems: the PreProcessor, which conditions and digitises analogue signals, and two digital processors. The majority of the PreProcessor's tasks are performed on a dense Multi-Chip Module (MCM) consisting of FADCs, time-adjustment and digital-processing ASICs, and LVDS serialisers, all designed and implemented in ten-year-old technologies. An MCM substitute, based on today's components (dual-channel FADCs and an FPGA), is being developed to enhance the flexibility of the digital processing and to profit from state-of-the-art electronics. The development and first test results are presented.


Introduction
ATLAS [2] is one of the four major experiments built at the Large Hadron Collider (LHC). It is a general-purpose experiment for proton-proton collisions, designed to observe the large spectrum of physics processes expected at the LHC. The ATLAS detector consists of an inner tracker surrounded by electromagnetic and hadronic calorimeters enclosed by a muon spectrometer. It is a very complex system with ∼150×10⁶ channels producing approximately 1.5 MByte of data for each event. With an LHC bunch crossing rate of 40.08 MHz it generates ∼60 TByte of raw data every second, a rate that cannot be stored directly with current technologies. The task of the trigger system is to reduce the initial interaction rate to a suitable level, by an online selection of potentially interesting physics events with maximum efficiency.
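The raw data rate quoted above follows directly from the event size and the bunch crossing rate. The short check below reproduces the figure; the function name and the convention 1 TByte = 10¹² bytes are assumptions made for this illustration only.

```python
# Back-of-the-envelope check of the raw data rate quoted in the text.
EVENT_SIZE_BYTES = 1.5e6      # ~1.5 MByte per event (from the text)
BUNCH_CROSSING_HZ = 40.08e6   # LHC bunch crossing rate (from the text)


def raw_data_rate_tbyte_per_s(event_size_bytes, crossing_rate_hz):
    """Return the raw data rate in TByte/s, assuming 1 TByte = 1e12 bytes."""
    return event_size_bytes * crossing_rate_hz / 1e12


rate = raw_data_rate_tbyte_per_s(EVENT_SIZE_BYTES, BUNCH_CROSSING_HZ)
print(f"raw data rate: {rate:.1f} TByte/s")
```

The result, about 60 TByte/s, matches the value given in the text.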
The ATLAS trigger reduces the event rate using a three-level system: Level-1 (LVL1) and the High Level Trigger (HLT), where the latter is subdivided into Level-2 (LVL2) and the Event Filter (EF). The Level-1 trigger includes three major subsystems: the Level-1 Calorimeter and Muon triggers (L1Calo and L1Muon), and the Central Trigger Processor (CTP).
The Level-1 Calorimeter Trigger is a full-custom, fixed-latency, pipelined system [3]. The main parts of it are the PreProcessor system (PPr) and two digital processors: the Cluster Processor (CP) and the Jet/Energy-sum Processor (JEP). The PreProcessor system receives analogue signals from approximately 7200 trigger towers, with typical granularity of 0.1×0.1 in pseudorapidity η and azimuthal angle φ,¹ digitises them, determines the bunch crossing of the energy deposit, and sends data into two parallel digital processor systems. The Cluster Processor searches for isolated electromagnetic and τ/hadron cluster candidates, while the Jet/Energy-sum Processor looks for jet candidates and calculates the total and missing transverse energy for each event.
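The trigger-tower granularity is expressed in pseudorapidity, η = −ln tan(θ/2), with θ the polar angle measured from the beam axis. As a minimal sketch, the conversion in both directions can be written as follows; the function names are hypothetical and serve only this illustration.

```python
import math


def pseudorapidity(theta):
    """Pseudorapidity eta = -ln tan(theta/2) for a polar angle theta in radians."""
    if not 0.0 < theta < math.pi:
        raise ValueError("theta must lie strictly between 0 and pi")
    return -math.log(math.tan(theta / 2.0))


def polar_angle(eta):
    """Inverse relation: theta = 2 * atan(exp(-eta))."""
    return 2.0 * math.atan(math.exp(-eta))


# A direction perpendicular to the beam axis has eta close to 0.
print(pseudorapidity(math.pi / 2.0))
# Polar angles bounding a tower of width 0.1 in eta, starting at eta = 0.
print(math.degrees(polar_angle(0.0)), math.degrees(polar_angle(0.1)))
```

Because η is a nonlinear function of θ, towers of equal width in η cover progressively smaller ranges of polar angle towards the beam axis.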

The PreProcessor system
The main purposes of the PreProcessor system are the digitisation of the analogue input signals, the determination of the bunch crossing of the primary interaction and a precisely calibrated transverse energy measurement. The system is made out of 124 modules installed in standard VME crates with a VME64xP backplane mounted. Each PreProcessor Module (PPM) processes 64 channels in parallel. It is a highly modular system, with the majority of the processing performed on a dense Multi-Chip Module (MCM), shown in figure 1, consisting of FADCs, a single ASIC, and serialised output driver chips. The conditioned analogue input signals are digitised at 25 ns intervals. For optimal energy resolution it is necessary to set the point of sampling onto the maximum of the analogue signal shape. This can be achieved by means of a timing element called PHOS4. The device allows adjustment of the FADC's sampling strobe across one clock-period with a resolution of 1 ns. The heart of the system is the ASIC, a pipelined digital processing element which converts the digitised input signals into correctly time-aligned, calibrated and noise-suppressed outputs. The ASIC performs the assignment of the energy deposits to the correct LHC bunch-crossing referred to as the Bunch-Crossing Identification (BCID). In addition, the ASIC holds functionality for debugging and system tests (e.g. monitoring, playback memories, etc.).
¹ In the ATLAS coordinate system, θ is the polar angle and φ is the azimuthal angle with respect to the beam axis. The pseudorapidity η is defined as η = −ln tan(θ/2).
After some further processing (e.g. pre-summing of channels for the JEP system which works at lower granularity than the CP system) also performed in the ASIC, the information is sent to the Processor systems (CP and JEP) in three serial LVDS streams.
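The fine-timing adjustment described in this section amounts to choosing, out of the 25 possible 1 ns strobe delays within one 25 ns clock period, the one that places a sample on the maximum of the analogue pulse. The sketch below illustrates such a scan under strong assumptions: the Gaussian pulse shape, its peak position and all function names are hypothetical and do not describe the actual calibration procedure.

```python
import math

CLOCK_PERIOD_NS = 25  # sampling interval (from the text)
STEP_NS = 1           # PHOS4 delay resolution (from the text)


def toy_pulse(t_ns, peak_ns=62.0, width_ns=20.0):
    """Hypothetical pulse shape (Gaussian, unit amplitude), for illustration only."""
    return math.exp(-0.5 * ((t_ns - peak_ns) / width_ns) ** 2)


def best_strobe_delay(pulse, n_samples=5):
    """Scan every delay setting and return (delay, amplitude) of the best one.

    For each delay the pulse is sampled n_samples times at 25 ns intervals;
    the best delay is the one whose largest sample is highest.
    """
    best_delay, best_amp = 0, float("-inf")
    for delay in range(0, CLOCK_PERIOD_NS, STEP_NS):
        samples = [pulse(delay + k * CLOCK_PERIOD_NS) for k in range(n_samples)]
        amp = max(samples)
        if amp > best_amp:
            best_delay, best_amp = delay, amp
    return best_delay, best_amp


delay, amp = best_strobe_delay(toy_pulse)
print(f"best delay: {delay} ns, peak sample: {amp:.3f}")
```

With the toy peak at 62 ns the scan selects a delay of 12 ns, since 12 ns + 2 × 25 ns falls exactly on the maximum.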

The new MCM: from ASICs to FPGA
The MCM is an essential component of the PreProcessor system. A conservative spare policy (50%) has been implemented, but it would be impossible to quickly produce more MCMs in case of need. There are 9 unpackaged ASIC dies on each of the 2048 MCMs and most of them would be almost impossible to reproduce or purchase today:
• PHOS4 (time-adjustment chip with 1 ns resolution). The technological process for the production is still available, and GDSII files are available as well. However, the developers no longer provide support, so the success of a new production run is uncertain.
• AD9042 ADC is not available as an unpackaged die. A packaged version would be too big to be mounted on the MCM.
• PPrASIC was designed at the Kirchhoff Institute for Physics, Heidelberg University. Verilog code and GDSII files are available. The technological process is outdated (AMS, 0.6 µm), but still available.
• Serialisers are available, but National Semiconductor recommends the new, pin-compatible chip (DS92LV1023E).
The basic MCM technology decisions were taken a decade ago. At that time it was not feasible to find standard, packaged and truly flexible components that met all requirements. The compactness of the MCM prohibited the use of FPGAs, which were too small in terms of available resources and too big in size to be placed on the module. Therefore, a less flexible solution (ASIC) was used. Nowadays FPGAs are powerful and compact devices. The high degree of adaptability provided by an FPGA will allow us to adjust and to add new digital processing algorithms (event-by-event pedestal subtraction, more sophisticated BCID algorithms, etc.) if this is required for smooth operation after the LHC high-luminosity upgrade. In order to profit from the rapid progress of state-of-the-art electronics, a pin-, size- and latency-compatible substitute for the MCM based on today's components is being developed. Two dual-channel 105 MHz AD9218 FADCs are used for compact, packaged, fast, low-noise, low-power digitisation. A Xilinx Spartan-6 FPGA serves as a flexible, low-cost, configurable digital processing unit which is able to replace not only the PreProcessor ASIC but the time-adjustment chip and LVDS serialisers as well.
Standard components allow us to use commercially available evaluation boards, together with the existing test equipment for the current MCM, to develop and test the FPGA configuration bitstream in parallel with the design of the PCB layout for the new module. In the rest of the paper we briefly discuss the implementation and tests of the different digital MCM components inside the single FPGA. The Xilinx Spartan-6 FPGA Evaluation Kit SP601 is used for prototyping and testing. The development of the analogue part for the new MCM and the PCB layout design are not presented here.
PHOS4 - the time-adjustment chip: the PHOS4 is a fine time-adjustment chip with a resolution of 1 ns. The FPGA's Digital Clock Manager (DCM) [5] is well suited to this task. DCMs provide advanced clocking capabilities to FPGA applications, one of which is phase shifting of an input signal. The DCM provides a digital interface to dynamically advance or retard the phase-shift value in fixed steps. The step size is well below 1 ns [6], but it can differ from chip to chip. To calibrate this part of the design one could use an analogue delay line outside the FPGA (as is foreseen on the new MCM PCB). Four FADC strobe signals have to be adjusted independently. Therefore, an FPGA with more than four DCMs should be used (preferably at least six: four for the PHOS4 replacement, one for the serialisation clock generation, and at least one for the clock management of the rest of the design). The smallest Spartan-6 that meets this requirement is the XC6SLX45 with eight DCMs [4]. This is the largest Spartan-6 device in the CSG324 package, whose 15×15 mm² size is suitable for mounting on the MCM. However, the SP601 Evaluation Kit carries the XC6SLX16 chip with four DCMs. So, only two channels of the PHOS4 implementation were tested "standalone", with-

LVDS serialisers: pre-processed data is serialised with a ratio of 10:1 and sent from the MCM in three serial LVDS streams at 400 Mbit/s each (480 Mbit/s with overheads: 10 data bits + 1 start bit + 1 stop bit at 40 MHz). The LVDS serialisers have been implemented inside the FPGA using the Spartan-6 output serialiser blocks (OSERDES2) [7] and tested with the SP601 Evaluation Kit. The eye diagram (figure 2), taken on the development board connector, shows good signal quality.
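The serial link rates quoted above follow from the frame format: each 10-bit word is wrapped in one start and one stop bit and sent once per 40 MHz clock cycle. The sketch below models this framing and reproduces both rates. The bit polarity of the start and stop bits, the MSB-first ordering and the function names are assumptions made for this illustration, not a description of the FPGA implementation.

```python
BUNCH_CLOCK_HZ = 40e6  # word rate used in the text's estimate
DATA_BITS = 10         # serialisation ratio 10:1 (from the text)


def frame_word(word, start_bit=1, stop_bit=0):
    """Frame a 10-bit word as start bit + data bits + stop bit.

    The start/stop polarity and MSB-first bit order are assumptions.
    """
    if not 0 <= word < (1 << DATA_BITS):
        raise ValueError("word does not fit into 10 bits")
    data = [(word >> i) & 1 for i in range(DATA_BITS - 1, -1, -1)]
    return [start_bit] + data + [stop_bit]


def line_rate_mbit_per_s(bits_per_word, clock_hz=BUNCH_CLOCK_HZ):
    """Bit rate in Mbit/s when one word of the given length is sent per clock cycle."""
    return bits_per_word * clock_hz / 1e6


frame = frame_word(0x2A5)
payload_rate = line_rate_mbit_per_s(DATA_BITS)  # data bits only
line_rate = line_rate_mbit_per_s(len(frame))    # including start and stop bits
print(frame, payload_rate, line_rate)
```

The payload rate evaluates to 400 Mbit/s and the line rate to 480 Mbit/s per stream, so the two framing bits cost 20% additional bandwidth on top of the payload.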
The PreProcessor ASIC: the existing ASIC Verilog code was slightly adapted for the FPGA. Memory IP cores used for the ASIC production were replaced with the XILINX BlockRAM [8]. The parallel interface to the external serialiser chips was replaced with the LVDS serialisers implemented inside the FPGA. New code has been synthesised for the Spartan-6 FPGA. The device utilisation summary is shown in table 1. As one can see, ∼75% of the FPGA resources are free and could be used for the digital processing algorithm improvements.
The power consumption of the FPGA, as estimated by the Xilinx XPower Analyzer, is ∼200 mW. Together with the FADC power consumption of ∼275 mW per channel, this gives ∼1.5 W per module. This is about a factor of two less than the power consumed by the current MCM.

Conclusions
A substitute for the MCM based on today's components is being developed to obtain a more flexible pre-processing unit and to profit from ten years of technological progress since the original MCM development. Standard, packaged components make it possible to design and test partially implemented functional elements in parallel with the design of the PCB layout, simplify the PCB layout and make the final production of the new modules much easier.
Full compatibility with the current module allows transparent replacement within the existing concept of the PreProcessor mother- and daughter-cards, as well as mixed operation with old and new MCMs on the same board. Moreover, modern reconfigurable devices do not just reduce the number of components on the module; they also open the possibility to adjust and/or add new pre-processing algorithms for the higher luminosity expected after the LHC upgrade.