CMS level-1 upgrade calorimeter trigger prototype development

As the LHC increases luminosity and energy, it will become increasingly difficult to select interesting physics events while remaining within the readout bandwidth limitations. An upgrade to the CMS Calorimeter Trigger implementing more complex algorithms is proposed. It utilizes AMC cards with Xilinx FPGAs running in a microTCA crate, with card interconnections via the crate backplane and optical links operating at up to 10 Gbps. Prototype cards with Virtex-6 and Virtex-7 FPGAs have been built, and software frameworks for operation and monitoring have been developed. The physics goals, hardware architectures, and software are described in this talk. More details can be found in a separate poster at this conference.


Introduction
In the future, the LHC will exceed the original design luminosity of 10^34 cm^-2 s^-1 and further increase the number of interactions (pileup) in a single LHC crossing. Triggering on the events of interest will become increasingly difficult, and the current CMS trigger algorithms will become less efficient. A proposed solution for the CMS calorimeter trigger will utilize modern high-speed FPGAs and fast optical links, making more flexible and complex algorithms possible in the hardware.

Present level-1 calorimeter trigger
The current level-1 calorimeter trigger consists of the Regional Calorimeter Trigger (RCT) [1] and the Global Calorimeter Trigger (GCT) [2]. The RCT receives more than 8000 Trigger Primitives (TPs) from the hadronic, electromagnetic, and forward calorimeters and consists of one 6U crate and 18 9U crates of custom electronics. Each 9U crate finds and forwards 8 e/γ candidates (isolated and non-isolated), 14 central tower sums, 28 quality bits, and 8 forward calorimeter towers with quality bits to the Global Calorimeter Trigger (GCT). The GCT consists of one 9U and six 6U crates and reduces the number of candidates to 4 isolated and 4 non-isolated e/γ, finds central, tau, and forward jets, and calculates global quantities like missing E_T, total E_T, and H_T. All of these are sent to the Global Trigger (GT), where the final trigger decisions are made.

Motivation
The LHC is currently delivering start-of-fill luminosities of up to 7.5×10^33 cm^-2 s^-1 with 50 ns bunch spacing at the CMS and ATLAS experiments. Total sustained level-1 trigger rates of up to 90 kHz have been recorded at CMS, and the average number of interactions per crossing (pileup) is around 30-35 at the start of a fill. By the end of 2017, the LHC luminosity could reach 2×10^34 cm^-2 s^-1, with either 25 ns or 50 ns bunch spacing. With these beam conditions, pileup could range from 50 to almost 100 interactions per crossing. At CMS, trigger and detector upgrades will be essential to continue collecting good quality physics data. These upgrades will enable the level-1 trigger thresholds to be kept as low as possible and reduce the effects of pileup by improving the algorithms and resolution.

Planned improvements to the calorimeter algorithms
Upgrades to the calorimeter trigger hardware will benefit all of the calorimeter trigger objects (e/γ, jets, taus, and global quantities such as missing E_T) by improving position and energy resolution and increasing the complexity of the algorithms. For a detailed description of the current algorithms, see the CMS Level-1 Trigger TDR [3].
For the e/γ triggers, using the hadronic calorimeter depth segmentation to better separate hadronic from e/γ-like objects is under study. The granularity will be improved significantly, from the current 4x4 trigger towers (∆R of 0.35) to half a tower (∆R of about 0.044). Changing the topology and separating the hadronic and electromagnetic deposits in the calorimeter will improve the isolation of the e/γ candidates. The effects of these improvements can be seen in figure 1.
For the jet triggers, the resolution will improve from 4x4 towers to a single tower (∆R of about 0.088). In the forward region (3 < |η| < 5), the jet trigger will make use of the full granularity of the forward calorimeters, resulting in a resolution improvement of a factor of ∼6. The jet algorithm will be more flexible, with the diameter ranging from 8 to 12 towers and the jet shape either circular or square, rather than the current fixed 12x12 tower square. For the tau triggers, which are currently a version of the jet trigger, the cluster size will be significantly smaller than the current 12x12 towers.
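The flexible jet footprint described above can be illustrated with a simple sliding-window sum over a trigger-tower grid. This is a conceptual sketch only, not CMS firmware: the function name, grid layout, and threshold are assumptions for the example, but it shows how a square window of configurable diameter can be turned into a circular one by masking off the corner towers.

```python
# Illustrative sketch (not CMS firmware): a sliding-window jet finder on a
# trigger-tower E_T grid with a selectable square or circular footprint of
# configurable diameter (the upgrade allows diameters of 8-12 towers).

def jet_candidates(towers, diameter=8, shape="circle", threshold=5.0):
    """Return (eta_index, phi_index, sum_et) for every window position
    whose summed E_T passes the threshold.

    towers: 2D list indexed [eta][phi] of tower E_T values.
    """
    n_eta, n_phi = len(towers), len(towers[0])
    r = diameter / 2.0
    jets = []
    for e0 in range(n_eta - diameter + 1):
        for p0 in range(n_phi - diameter + 1):
            total = 0.0
            for de in range(diameter):
                for dp in range(diameter):
                    # For a circular footprint, drop the window corners:
                    # keep only towers whose centers lie inside radius r.
                    if shape == "circle":
                        if (de - r + 0.5) ** 2 + (dp - r + 0.5) ** 2 > r * r:
                            continue
                    total += towers[e0 + de][p0 + dp]
            if total >= threshold:
                jets.append((e0, p0, total))
    return jets
```

In real firmware the window sums are computed in parallel with tower sharing between FPGAs rather than by nested loops, but the footprint logic is the same.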
Overall, with an upgrade of the global trigger, there will be the possibility to increase the number of candidates, which is currently limited to 4 isolated e/γ, 4 non-isolated e/γ, 4 central (|η| < 3) jets, 4 tau jets (|η| < 3), and 4 forward (3 < |η| < 5) jets. Pileup subtraction can also be done, improving the accuracy of the first-level decision. Finally, the global quantities will also benefit from calculations with the clusters in the calorimeter, rather than broad swaths of towers.

Architecture of the upgraded calorimeter trigger
A two-layer system will be built, with two architecture options proposed for the processing algorithms. One is a more conventional pipelined trigger, more sophisticated and compact than what exists today at CMS, using modern FPGA link I/O and data sharing to perform the new trigger algorithms within the desired latency. Individual FPGAs will be dedicated to different algorithms, e.g. one for jets and one for taus. This proposal has been described in detail in reference [4]. The other option uses time multiplexing, where all trigger primitives of one LHC crossing are transmitted over several bunch crossings but all algorithms are performed in a single FPGA during the same number of crossings. In this case several FPGAs handle the data round-robin style so that the overall latency is not affected. A demonstrator has been built and is described in reference [5].
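The round-robin scheme can be sketched in a few lines. This is a conceptual model, not the demonstrator firmware; the node count and per-event processing time are assumed values chosen to illustrate the constraint that makes the scheme work: as long as the number of nodes covers the per-event processing time, each node is free again before its next event arrives, so throughput matches the crossing rate and only latency (not rate) is affected.

```python
# Conceptual sketch of time multiplexing (not real firmware): each LHC
# crossing's full set of trigger primitives goes to one processing node,
# nodes are chosen round-robin, and each node takes several crossing
# periods to finish its event.

PROCESSING_CROSSINGS = 6   # assumed: crossings one node needs per event
N_NODES = 6                # round-robin depth must cover the processing time

def assign_node(crossing_number):
    """Round-robin: crossing k is handled by node k mod N_NODES."""
    return crossing_number % N_NODES

# A given node sees a new event only every N_NODES crossings, so it is
# free again in time if N_NODES >= PROCESSING_CROSSINGS.
assert N_NODES >= PROCESSING_CROSSINGS
```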
The flexibility of modern FPGAs and links makes it possible to reconfigure the architecture during an LHC technical stop. Of the two layers, the first, to be built by the University of Wisconsin-Madison, is described in section 3. The second, to be built by CERN and a number of U.K. groups (Imperial College, Bristol, RAL, and Iceberg Technologies), is described in section 4. A graphic of the architectures and layer divisions is shown in figure 2.

CTP-6 and VadaTech 894 - layer 1
The first layer will consist of custom boards and a custom backplane to receive the trigger primitives via optical fiber from the calorimeters. It will either create 2x2 tower clusters, with half-tower position found using an E_T weighting technique, in a pipelined trigger processing system, or time-multiplex the trigger primitives to their designated layer 2 boards.
[Figure 3 caption: Left: processing cards are located in slots 2-5 and 8-11; CIO cards for inter-crate data sharing are in slots 1 and 12; slots 6 and 7 are reserved for either spare processing or CIO cards. Right: a photo of the VT894 in use with, from left to right, redundant power supplies, a Wisconsin CTP-6 card, an AMC13 above an MCH, and a second CTP-6 card for link testing.]
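The paper names an E_T weighting technique for the half-tower cluster position but does not give its exact form; a natural reading is an E_T-weighted mean of the tower coordinates within the 2x2 cluster, sketched below. The function name and tuple layout are assumptions for illustration.

```python
# Hedged sketch of an E_T-weighted cluster position (the exact firmware
# formula is not specified in the text): the position of a 2x2 cluster is
# the E_T-weighted mean of its tower coordinates, giving half-tower
# resolution when one side of the cluster carries more energy.

def cluster_position(cluster):
    """cluster: list of (eta, phi, et) tuples for the 2x2 towers.
    Returns the E_T-weighted (eta, phi); falls back to the first tower's
    coordinates if the cluster has no energy."""
    total_et = sum(et for _, _, et in cluster)
    if total_et == 0:
        return cluster[0][0], cluster[0][1]
    eta = sum(e * et for e, _, et in cluster) / total_et
    phi = sum(p * et for _, p, et in cluster) / total_et
    return eta, phi
```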

VadaTech 894
Depending on the final implementation, data sharing along a backplane and between crates may be necessary at this level. To help accomplish this, an enhancement to the preferred CMS µTCA crate, the VadaTech VT892 [6], has been developed. This new crate, the VadaTech VT894 [7], has the same configuration as the VT892, with additional connections to the previously unused ports. The crate supports 12 double-width, full-height AMC cards with redundant power supplies and two µTCA Carrier Hub (MCH) slots. The MCH1 slot houses a commercial MCH module, used for gigabit ethernet connectivity and IPMI control. The MCH2 slot has a custom module, the Boston University AMC13 [8], for the CMS Trigger Timing and Control (TTC) downlink and the CMS data acquisition interface to the crate. Each slot connects to 20 ports on the backplane, each with a transmit and receive pair:
• Ports 0-3 - gigabit ethernet, TTC, and data acquisition
• Ports 4-7 - star fabric to the slot for MCH1
• Ports 8-11 - star fabric to the slot for MCH2
• Ports 12-15 and 17-20 - not connected on the VT892, but enhanced with a custom fabric on the VT894
A diagram of the additional backplane fabric and a photo of the crate are shown in figure 3. The backplane allows sharing among adjacent processing cards as well as to a Crate Input/Output (CIO) card to enable data sharing among multiple crates.

Wisconsin Calorimeter Trigger Processor (CTP)
For the first layer, the University of Wisconsin has built a fully functional Calorimeter Trigger Processor card prototype with dual Xilinx Virtex-6 (XC6VHX250T or XC6VHX350T) FPGAs. Each FPGA logic core has its own dedicated 25 A power module. A photograph of the board and a description of some of the major components are in figure 4 and its caption. For more information, see reference [9]. Two of the CTP-6 cards have been built. Extensive loopback testing of the 12 optical outputs to the 48 optical inputs has been performed at 6.4 Gbps over 5 m fibers. The link driver and receiver settings affect the results, and settings have been found for error-free operation. The eye diagram from the test can be seen in figure 5.
Currently, a survey of the VT894 backplane links is being performed by moving the CTP-6 cards between different crate slots. At the time of the workshop, about 25% of the VT894 custom fabric links had been tested. Each test runs at 4.8 Gbps for about 30 minutes, and so far none have had errors.

MP7 - layer 2
The second layer will consist of a set of custom boards to receive either the calorimeter trigger clusters or the time-multiplexed trigger towers via optical fiber from the processing boards of the first layer. The MP7 card provides optical links with transmit and receive at 10 Gbps, as well as 50 Gbps of electrical I/O with 28 LVDS links running at 1.8 Gbps. For buffering, the MP7 uses dual QDR RAM of either 72 or 144 Mb clocked at 500 MHz. It also has extensive monitoring, with 15 voltage/current sensors and 16 temperature sensors on board. Firmware can be stored on a standard PROM or a Micro SD card, allowing fast storage of many firmware versions. In addition, it has a USB2 console via a microcontroller. A photograph and brief description of the card are in figure 6. For more information, see reference [9].
Testing of the MP7 is underway. JTAG access to the FPGA and microcontroller via a complex programmable logic device has been verified. The QDR RAM functionality has been tested to 375 MHz, with 2×13.5 Gbps on each port; the tests to 500 MHz are still to be completed. The Module Management Controller (MMC) code has been ported to the MP7 from the previous Mini-T [5] and includes more monitoring than before. Voltage, current, and power for all on-board supplies, as well as humidity, temperature, and more, are monitored.
A simultaneous 48-channel 8B/10B-encoding test has been completed. Over 7×10^23 bits per channel were transmitted and no bit or alignment errors were observed. This test includes data capture, counters, and synchronization. Additionally, a simultaneous 24-channel PRBS31 (a harsher pattern) test was performed with Xilinx IBERT [11]. This was limited to half of the channels by IBERT software limitations. Still, 10^13 bits were transmitted without any errors. Neither test required any special tuning.
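An error-free run of a given length does not prove a zero bit-error rate, only an upper bound on it. The standard counting-statistics bound (not stated in the paper, but standard practice for link qualification) is sketched below: with zero errors in N bits, the BER is below roughly 3/N at 95% confidence.

```python
# Sketch: with N bits transmitted and zero errors observed, the bit-error
# rate is bounded, not measured. For a Poisson error process, zero errors
# in N bits gives an upper limit of -ln(1 - CL)/N, i.e. about 3/N at 95% CL.
# This is textbook counting statistics, not a number from the paper.

import math

def ber_upper_limit(bits_transmitted, cl=0.95):
    """Confidence upper limit on BER after an error-free run of the
    given number of bits (default 95% confidence level)."""
    return -math.log(1.0 - cl) / bits_transmitted
```

For example, the error-free 10^13-bit PRBS31 run above bounds the per-channel BER below about 3×10^-13 at 95% confidence.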

IPbus/µHAL package
For the CMS experiment, a new hardware control standard, IPbus/µHAL, version 1.0, was released in August 2012 [12]. These tools allow hardware control via gigabit Ethernet, using UDP as the main transport protocol, with software support for TCP also available. A complete solution is provided: an IPbus/UDP firmware module to add to an FPGA design, the µHAL (microTCA Hardware Access Library) application programming library, and ControlHub for serializing concurrent accesses from multiple clients.
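The client-side access pattern this enables is queue-then-dispatch: register accesses are accumulated locally and sent to the device as one batched request. The mock class below is a conceptual model of that pattern only; it is not the real µHAL API, and all names in it are invented for the illustration.

```python
# Conceptual model (NOT the real µHAL API) of the queue-then-dispatch
# pattern: reads are queued on the client and only travel to the device,
# batched into a single request, when dispatch() is called.

class MockIPbusClient:
    def __init__(self, registers):
        self._registers = registers      # stand-in for the remote device
        self._queue = []
        self.packets_sent = 0

    def read(self, name):
        """Queue a read; the result is filled in only at dispatch time."""
        pending = {"name": name, "value": None}
        self._queue.append(pending)
        return pending

    def dispatch(self):
        """Send all queued requests as one 'packet' and fill in results."""
        self.packets_sent += 1
        for req in self._queue:
            req["value"] = self._registers[req["name"]]
        self._queue.clear()

# Usage: two reads, but only one round trip to the device.
regs = {"CTRL": 0x1, "STATUS": 0xABCD}
hw = MockIPbusClient(regs)
a = hw.read("CTRL")
b = hw.read("STATUS")
hw.dispatch()
```

Batching matters because, as discussed below, each round trip to the device carries a fixed latency cost.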
IPbus is based on well-established networking technology, making it very flexible, with uses ranging from a single crate on a bench with a PC to a system with multiple PCs, crates, and routers. The IPbus firmware footprint is small: real-world usage in a low-end Xilinx Spartan-6 FPGA (XC6SLX16-CS324) takes only a small fraction of the FPGA's available resources [11].
The current performance of IPbus/µHAL is dominated by latency: the current firmware supports only a single UDP packet in flight per target device. To minimize network round trips, requests are queued and dispatched together only when necessary. Plots showing latency vs. read size and read bandwidth vs. read size for the current version of IPbus/µHAL are in figure 8. The next release of IPbus aims to improve these performance figures by reducing the firmware latency and supporting multiple packets in flight. This update should be available in early 2013.
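A back-of-envelope model explains the shape of the figure 8 curves. With one packet in flight at a time, each packet costs a full round trip, so the read time is (number of packets) × (round-trip latency) and bandwidth drops every time a read spills into an extra packet. The latency and payload numbers below are illustrative assumptions, not measurements from the paper.

```python
# Toy model of single-packet-in-flight throughput: each UDP packet costs
# one round trip, so bandwidth climbs within a packet and drops at each
# packet boundary. Both constants are assumed for illustration.

ROUND_TRIP_US = 250.0        # assumed per-packet round-trip latency (us)
PAYLOAD_PER_PACKET = 1400    # assumed usable payload bytes per UDP packet

def read_time_us(nbytes):
    """Total time to read nbytes with only one packet in flight."""
    packets = -(-nbytes // PAYLOAD_PER_PACKET)   # ceiling division
    return packets * ROUND_TRIP_US

def bandwidth_mbps(nbytes):
    """Effective read bandwidth in Mbps (bits per microsecond)."""
    return nbytes * 8.0 / read_time_us(nbytes)
```

Under these assumptions a read one byte over the payload limit takes twice as long, which is exactly the discontinuity pattern seen in figure 8; allowing multiple packets in flight removes the per-packet round-trip cost.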
Figure 8. The left plot shows the latency in µs vs. the read size in bytes. As the read size increases, the number of packets used goes up and the read bandwidth suffers, as seen in the plot on the right. The discontinuities occur where the number of UDP packets increases from 1 to 2 and then from 2 to 3. This issue will be addressed in the next update of the IPbus package.

Outlook and conclusions
An upgrade to the current CMS Level-1 Calorimeter Trigger is essential for ensuring reliable physics performance as the luminosity of the LHC increases. Two FPGA-based high-speed calorimeter trigger processing boards and a new µTCA backplane have been built this year: the CTP-6, the MP7, and the VT894. Intensive testing of these is underway, and the cards will be used in the two layers of the new calorimeter trigger. Initial testing promises good performance. Additionally, the IPbus/µHAL firmware and software package will allow uniform operation and administration of these and other systems while in operation at CMS. The built-in modularity will allow staging of the new system, and a slice will be ready by the end of LHC Long Shutdown 1 (end of 2014).