CMS Level-1 Upgrade Calorimeter Trigger Prototype Development

Presented at TWEPP12: Topical Workshop on Electronics for Particle Physics

P. Klabbers, M. Bachtis, J. Brooke, M. Cepeda Hermida, K. Compton, S. Dasu, A. Farmahni-Farahani, S. Fayer, R. Fobes, R. Frazier, C. Ghabrous, T. Gorski, A. Gregerson, G. Hall, C. Hunt, G. Iles, J. Jones, C. Lucas, R. Lucas, M. Magrans, D. Newbold, I. Oljavo, A. Perugupalli, M. Pioppi, A. Rose, I. Ross, D. Sankey, M. Schulte, D. Seemuth, W.H. Smith, J. Tikalsky, A. Tapper, and T. Williams

a University of Wisconsin, Madison, WI, USA
b University of Bristol, Bristol, UK
c Imperial College, London, UK
d CERN, Geneva, Switzerland
e Iceberg Technology, UK
f Rutherford Appleton Laboratory, UK

ABSTRACT: As the LHC increases luminosity and energy, it will become increasingly difficult to select interesting physics events and remain within the readout bandwidth limitations. An upgrade to the CMS Calorimeter Trigger implementing more complex algorithms is proposed. It utilizes AMC cards with Xilinx FPGAs running in a micro-TCA crate, with card interconnections via crate backplanes and optical links operating at up to 10 Gbps. Prototype cards with Virtex-6 and Virtex-7 FPGAs have been built, and software frameworks for operation and monitoring have been developed. The physics goals, hardware architectures, and software will be described in this talk. More details can be found in a separate poster at this conference.


Introduction
In the future, the LHC will exceed the original design luminosity of 10³⁴ cm⁻²s⁻¹ and further increase the number of interactions (pileup) in a single LHC crossing. Triggering on the events of interest will become increasingly difficult, and the current CMS trigger algorithms will become less efficient. A proposed solution for the CMS calorimeter trigger will utilize modern high-speed FPGAs and fast optical links, making more flexible and complex algorithms possible in the hardware.

Present Level-1 Calorimeter Trigger
The current Level-1 calorimeter trigger consists of the Regional Calorimeter Trigger (RCT) [1] and the Global Calorimeter Trigger (GCT) [2]. The RCT receives more than 8000 Trigger Primitives (TPs) from the hadronic, electromagnetic, and forward calorimeters and consists of one 6U crate and 18 9U crates of custom electronics. In each 9U crate it finds and forwards 8 e/γ candidates (of the isolated and non-isolated types), and creates 14 central tower sums, 28 quality bits, and 8 forward calorimeter towers and quality bits for the Global Calorimeter Trigger (GCT).
The GCT consists of one 9U crate and six 6U crates. It reduces the number of candidates to 4 isolated and 4 non-isolated e/γ, finds central, tau, and forward jets, and calculates global quantities such as missing E_T, total E_T, and H_T. All of these are sent to the Global Trigger (GT), where the final trigger decisions are made.
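The candidate reduction performed by the GCT is essentially a rank-and-keep-top-four operation per category. A minimal sketch in Python, with invented field names and toy energy values (not the real firmware data format):

```python
# Hypothetical sketch of GCT-style candidate reduction: from the regional
# candidates, keep only the four highest-energy objects in a category.
# Field names and values are illustrative only.

def top_four(candidates):
    """Keep the four highest-E_T candidates, highest first."""
    return sorted(candidates, key=lambda c: c["et"], reverse=True)[:4]

# Example: regional isolated e/gamma candidates as (eta, phi, E_T) records.
regional_iso_eg = [
    {"eta": 1, "phi": 3, "et": 42},
    {"eta": 0, "phi": 7, "et": 18},
    {"eta": 2, "phi": 1, "et": 55},
    {"eta": 1, "phi": 9, "et": 27},
    {"eta": 0, "phi": 2, "et": 33},
]

best = top_four(regional_iso_eg)
print([c["et"] for c in best])  # -> [55, 42, 33, 27]
```

The same keep-top-four pattern would apply independently to each object category (isolated e/γ, non-isolated e/γ, central jets, tau jets, forward jets).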

Motivation
The LHC is currently delivering start-of-fill luminosities of up to 7.5×10³³ cm⁻²s⁻¹ with 50 ns bunch spacing at the CMS and ATLAS experiments. Total sustained Level-1 trigger rates of up to 90 kHz have been recorded at CMS, and the average number of interactions per crossing (pileup) is around 30-35 at the start of a fill. By the end of 2017, the LHC luminosity could reach 2×10³⁴ cm⁻²s⁻¹, with either 25 ns or 50 ns bunch spacing. Under these beam conditions, pileup could range from 50 to almost 100 interactions per crossing. At CMS, trigger and detector upgrades will be essential to continue collecting good-quality physics data. These upgrades will enable the Level-1 trigger thresholds to be kept as low as possible and reduce the effects of pileup by improving the algorithms and resolution.

Planned Improvements to the Calorimeter Algorithms
Upgrades to the calorimeter trigger hardware will benefit the calorimeter trigger objects (e/γ, jet, tau, and global quantities such as missing E_T) by improving position and energy resolution and increasing the complexity of the algorithms. For a detailed description of the current algorithms, see the CMS Level-1 Trigger TDR [3].
For the e/γ, the hadronic calorimeter depth segmentation can be used to better separate hadronic from e/γ-like objects. The position resolution will be improved significantly, from the current 4x4 trigger towers (ΔR of 0.35) to half a tower (ΔR of about 0.044). Changing the topology and separating the hadronic and electromagnetic deposits in the calorimeter will improve the isolation of the e/γ candidates. The effects of these improvements can be seen in Figure 1.
For the jet triggers, the resolution will go from 4x4 towers to 1 tower (ΔR of about 0.088). In the forward region (3<|η|<5), the jet trigger will make use of the full granularity of the forward calorimeters, resulting in a resolution improvement of a factor of ~6. The jet algorithm will be more flexible, with the diameter ranging from 8 to 12 towers and the jet shape circular or square, rather than the current fixed 12x12-tower square. For the tau triggers, which are currently a version of the jet trigger, the cluster size will be significantly smaller than the current 12x12 towers.
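The granularity figures above can be cross-checked with a line of arithmetic, assuming the standard CMS trigger-tower size of about 0.087 in both η and φ:

```python
# Back-of-envelope check of the Delta-R figures quoted in the text,
# assuming the standard barrel trigger-tower size of ~0.087 in eta and phi.

TOWER = 0.087  # approximate trigger-tower size in eta and phi

granularities = {
    "current e/gamma region (4x4 towers)": 4 * TOWER,    # ~0.35
    "upgraded jet position (1 tower)":     1 * TOWER,    # ~0.088
    "upgraded e/gamma (half tower)":       0.5 * TOWER,  # ~0.044
}

for name, dr in granularities.items():
    print(f"{name}: dR ~ {dr:.3f}")
```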
Overall, with an upgrade of the global trigger, it will be possible to increase the number of candidates, currently limited to 4 isolated e/γ, 4 non-isolated e/γ, 4 central (|η|<3) jets, 4 tau jets (|η|<3), and 4 forward (3<|η|<5) jets. Pileup subtraction can also be performed, improving the accuracy of the first-level decision. Finally, the global quantities will also benefit from calculations using the clusters in the calorimeter, rather than broad swaths of towers.
Figure 1: Effects of some of the planned improvements on isolated electrons. Left: the improvement in trigger rate vs. threshold for isolated electrons. Right: the trigger efficiency for isolated electrons.

Architecture of the Upgraded Calorimeter Trigger
A two-layer system will be built, with two architecture options proposed for the processing algorithms. One is a more conventional pipelined trigger, more sophisticated and compact than what exists at CMS today, using modern FPGA link I/O and data sharing to perform the new trigger algorithms within the desired latency. Individual FPGAs will be dedicated to different algorithms, e.g. one for jets and one for taus. This proposal has been described in detail in reference [4]. The other uses time multiplexing, where all the trigger primitives of one LHC crossing are transmitted over several bunch crossings, but all algorithms are performed in a single FPGA during the same number of crossings. In this case, several FPGAs handle the data round-robin style so that the overall latency is not affected. A demonstrator has been built and is described in reference [5].
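The round-robin scheduling at the heart of the time-multiplexed option can be sketched in a few lines. The node count and period here are illustrative placeholders, not the actual design parameters:

```python
# Minimal sketch of the time-multiplexed idea: all trigger primitives of
# one crossing go to a single processor, and successive crossings are
# assigned round-robin, so each node has several bunch-crossing periods to
# run the full algorithm without adding dead time. Numbers are illustrative.

N_NODES = 8  # processing FPGAs available in the pool (assumed)
PERIOD = 8   # crossings needed to ship and process one event (assumed)

def assign_node(bx):
    """Round-robin: crossing bx goes to node bx mod N_NODES."""
    return bx % N_NODES

# Each node sees a new event only every N_NODES crossings, so as long as
# N_NODES >= PERIOD it is free again before its next event arrives.
schedule = {bx: assign_node(bx) for bx in range(16)}
print(schedule)  # node 0 gets crossings 0 and 8, node 1 gets 1 and 9, ...
```

The overall latency is unchanged relative to a pipelined system because a given event is only ever handled by one node; the multiplexing simply spreads consecutive events across the pool.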
The flexibility of modern FPGAs and links makes it possible to reconfigure the architecture during an LHC technical stop. Of the two layers, the first, to be built by the University of Wisconsin-Madison, is described in Section 3. The second, to be built by a number of UK groups (Imperial College, Bristol, CERN, RAL, and Iceberg Technology), is described in Section 4. A graphic of the architectures and layer divisions is shown in Figure 2.

CTP-6 and VadaTech 894 - Layer 1
The first layer will consist of custom boards and a custom backplane to receive the trigger primitives via optical fiber from the calorimeters.It will either create 2x2 tower clusters for use in a pipelined trigger processing system, or time-multiplex the trigger primitives to their designated layer 2 boards.

VadaTech 894
Depending on the final implementation, data sharing along a backplane and between crates may be necessary at this level. To help accomplish this, an enhancement to the preferred CMS µTCA crate, the VadaTech VT892 [6], has been developed. This new crate, the VadaTech VT894, is described below.

Wisconsin Calorimeter Trigger Processor (CTP)
For the first layer, the University of Wisconsin has built a fully functional Calorimeter Trigger Processor card prototype with dual Xilinx Virtex-6 (XC6VHX250T or XC6VHX350T) FPGAs and 6.5 Gbps-capable links. A Front-End FPGA handles the communication with the front panel via 48 receive and 12 transmit optical links, and the Back-End FPGA manages DAQ and communication along the VT894 backplane. The board supports TCP/IP for GbE connections via the MCH. Each FPGA logic core has its own dedicated 25 A power module. A photograph of the board and a description of some of the major components are in Figure 4 and its caption. For more information, see reference [9].
Two of the CTP-6 cards have been built. Extensive loopback testing of the 12 optical outputs to the 48 optical inputs has been performed at 6.4 Gbps over 5 m fibers. The link driver and receiver settings affect the results, and settings have been found for error-free operation. The eye diagram from this test can be seen in Figure 5.

MP7 - Layer 2
The second layer will consist of a set of custom boards to receive either the calorimeter trigger clusters or the time-multiplexed trigger towers via optical fiber from the processing boards of the first layer; the prototype board, the MP7, is described below. It has extensive monitoring, with 15 voltage/current sensors and 16 temperature sensors on board. Firmware storage can be done via a standard PROM or a Micro SD card, allowing fast storage of many firmware versions. In addition, it has a USB2 console via a microcontroller. A photograph and a brief description of the card are in Figure 6. For more information, see reference [9].

Figure 6: The MP7 prototype trigger card. In the center is a Xilinx Virtex-7 (VX485T) with 6 Avago miniPOD receive and 6 transmit modules; 48 links are in use in each direction on this version of the MP7.
Testing of the MP7 is underway. JTAG access to the FPGA and microcontroller via a complex programmable logic device has been verified. The QDR RAM functionality has been tested to 375 MHz, with 2×13.5 Gbps on each port; the tests to 500 MHz are still to be completed. The Module Management Controller (MMC) code has been ported to the MP7 from the previous Mini-T [5] and includes more monitoring than before: the voltage, current, and power of all on-board supplies, as well as humidity, temperature, and more, are monitored. Neither test required any special tuning.

Outlook and Conclusions
An upgrade to the current CMS Level-1 Calorimeter Trigger is essential to ensure reliable physics performance as the luminosity of the LHC increases. Two FPGA-based high-speed calorimeter trigger-processing boards and a new µTCA backplane have been built this year: the CTP-6, the MP7, and the VT894. Intensive testing of these is underway, and the cards will be used in the two layers of the new calorimeter trigger. Initial testing promises good performance.
Additionally, the IPbus/µHAL firmware and software package will allow uniform operation and administration of these and other systems while in operation at CMS. The built-in modularity will allow staging of the new system, and a slice will be ready by the end of LHC Long Shutdown 1 (end of 2014).

Figure 2: Diagram of the pipelined trigger (left) and the time-multiplexed trigger (right). Layer 1 is boxed in each figure. Not shown for the time-multiplexed trigger is a de-multiplexing stage before the trigger decision is made at the global trigger.

The VT894 [7] has the same configuration as the VT892, with additional connections to the previously unused ports. The crate supports 12 double-width, full-height AMC cards with redundant power supply and MCH slots. The MCH1 slot houses a commercial MCH (µTCA Carrier Hub) module, used for GbE connectivity and IPMI control. The MCH2 slot has a custom module, the Boston University AMC13 [8], for the TTC downlink and the DAQ interface to the crate. Each AMC slot connects to 20 ports on the backplane, each with a transmit and receive pair:
• Ports 0-3: GbE, TTC, and DAQ
• Ports 4-7: star fabric to slot MCH1
• Ports 8-11: star fabric to slot MCH2
• Ports 12-15 and 17-20: not connected on the VT892, but enhanced with a custom fabric on the VT894
A diagram of the fabric and a photo of the crate are shown in Figure 3. The custom fabric allows sharing among adjacent processing cards as well as with a Crate Input/Output (CIO) card, enabling data sharing among multiple crates.
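The per-slot port assignment can be written out as a small lookup table. This is a descriptive sketch of the backplane wiring listed above, not real configuration code; the names are invented for illustration:

```python
# Descriptive sketch of the VT894 AMC backplane port roles (illustrative
# data structure only; names are invented, not VadaTech configuration).

VT894_AMC_PORTS = {
    "GbE/TTC/DAQ":         list(range(0, 4)),                      # ports 0-3
    "star fabric to MCH1": list(range(4, 8)),                      # ports 4-7
    "star fabric to MCH2": list(range(8, 12)),                     # ports 8-11
    "custom VT894 fabric": list(range(12, 16)) + list(range(17, 21)),
}

def role_of_port(port):
    """Return the role of a backplane port on a VT894 AMC slot."""
    for role, ports in VT894_AMC_PORTS.items():
        if port in ports:
            return role
    return "unassigned"

print(role_of_port(5))   # -> star fabric to MCH1
print(role_of_port(18))  # -> custom VT894 fabric
```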

Figure 3: Left: diagram of the additional backplane connections showing the interconnections. Processing cards will be located in slots 2-5 and 8-11; CIO cards for inter-crate data sharing will be in slots 1 and 12. Slots 6 and 7 are reserved for either spare processing or CIO cards. Right: a photo of the VT894 in use with, from left to right, redundant power supplies, a Wisconsin CTP-6 card, the AMC13 above an MCH, and a second CTP-6 card for link testing.

Figure 4: The Wisconsin CTP-6 card. From left to right at the top of the board are the MMC (Module Management Controller) circuitry, the power supplies, and the JTAG/USB console interface mezzanine. In the center of the board are the two Virtex-6 FPGAs and, between them, two SDRAMs. The 5 Avago 12-channel optical modules, 4 receive and one transmit, are at the far right.

Figure 5: Resulting eye diagram from the 12× loopback test of the back-end to the front-end FPGA.

Currently, a survey of the VT894 backplane links is being performed by moving the CTP-6 cards between different crate slots. At the time of the workshop, about 25% of the VT894 custom fabric links had been tested. Each test is run at 4.8 Gbps for about 30 minutes, and so far none have had errors.
The prototype board for this layer, the MP7 (Main Processor 7), is based on the Virtex-7, a large FPGA with high-bandwidth I/O. It can have from 1.0 to 1.4 Tb/s of optical I/O (48 to 72 links of transmit and receive at 10 Gbps), as well as 50 Gbps of electrical I/O (28 LVDS links running at 1.8 Gbps). For buffering, the MP7 uses dual QDR RAM of either 72 or 144 MB clocked at 500 MHz.
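The quoted optical-bandwidth range follows directly from the link count; a quick arithmetic cross-check (the helper function is purely illustrative):

```python
# Consistency check of the MP7 optical-bandwidth range: 48-72 bidirectional
# links at 10 Gbps give roughly 1.0-1.4 Tb/s of total optical I/O.

LINK_RATE_GBPS = 10

def total_io_tbps(n_links):
    """Total optical I/O (transmit + receive) in Tb/s for n bidirectional links."""
    return 2 * n_links * LINK_RATE_GBPS / 1000.0

print(total_io_tbps(48))  # -> 0.96  (~1.0 Tb/s)
print(total_io_tbps(72))  # -> 1.44  (~1.4 Tb/s)
```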

Figure 7: MP7 eye diagrams. A clean optical eye is seen in the left scope trace. On the right is the electrical received eye as measured by the Xilinx internal 2D eye-scan IBERT firmware, at the full scale available within the analyzer.

A simultaneous 48-channel 8B/10B-encoded test has been completed. Over 7×10¹³ bits per channel were transmitted and no bit or alignment errors were observed. This includes data capture, counters, and synchronization. Additionally, a simultaneous 24-channel PRBS31 (a harsher pattern) test was performed with Xilinx IBERT [10]. This was limited to half of the channels by the IBERT software limitations. Still, 10¹³ bits were transmitted without any errors.
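An error-free run of this length translates directly into an upper limit on the bit-error rate. A short sketch using the standard confidence-limit formula for zero observed errors (the 10¹³-bit figure is taken from the PRBS31 test above):

```python
# What an error-free run implies: with N bits transmitted and zero errors,
# the upper limit on the bit-error rate at confidence level cl is
# -ln(1 - cl) / N, i.e. about 3/N at 95% CL.

import math

def ber_upper_limit(n_bits, cl=0.95):
    """Upper limit on BER from an error-free run of n_bits at confidence cl."""
    return -math.log(1.0 - cl) / n_bits

# PRBS31 test above: ~1e13 error-free bits per channel.
print(f"BER < {ber_upper_limit(1e13):.1e} at 95% CL")  # -> BER < 3.0e-13
```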
Latency and bandwidth measurements are shown in Figure 8. The next release of IPbus aims to improve these performance figures by reducing the firmware latency and supporting multiple packets in flight. This update should be available in early 2013.

Figure 8: The left plot shows the latency in µs vs. the read size in bytes. As the read size increases, the number of packets used goes up and the read bandwidth suffers, as seen in the plot on the right. The discontinuities are where the number of UDP packets increases from 1 to 2 and then from 2 to 3. This issue will be addressed in the next update of the IPbus package.
For the CMS experiment, a new hardware control standard, IPbus/µHAL version 1.0, was released in August 2012 [11]. These tools allow hardware control via Gigabit Ethernet, using UDP as the main transport protocol, with software support for TCP also available. A complete solution is provided: an IPbus/UDP firmware module to add to an FPGA design, the µHAL (microTCA Hardware Access Library) application programming library, and the ControlHub for serializing concurrent accesses from multiple clients.