ATLAS IBL: integration of new HW/SW readout features for the additional layer of Pixel Detector

An additional inner layer for the existing ATLAS Pixel Detector, called Insertable B-Layer (IBL), is under design. The front-end electronics feature a new readout ASIC, named FE-I4, which requires new off-detector electronics. These are currently realized with two VME-based boards, complemented by a Timing Interface Module (TIM): the Back Of Crate module (BOC) implements optical I/O functionality and the ReadOut Driver module (ROD) implements data processing functionality. This paper presents a proposal for the IBL readout system, mainly focusing on the ROD board.


Introduction
An additional inner layer for the existing ATLAS Pixel Detector [1], named Insertable B-Layer (IBL) [2], is under design. The IBL will allow robust tracking despite effects arising from luminosity, hardware lifetime and radiation. It will also provide improved precision for vertexing and b-tagging with respect to the current detector.
The IBL front-end electronics feature a new readout ASIC (FE-I4 [3]), which requires new off-detector electronics, currently realized with two VME [4] boards: the Back Of Crate module (BOC) implements optical I/O functionality and the ReadOut Driver module (ROD) implements data processing functionality.
This document proposes a new architecture for the ROD card, which provides backward compatibility for operation with current Pixel-BOCs and support for a modified architecture of the off-detector readout, with improved performance and a clear separation of data-flow and calibration tasks.

Current ROD/BOC system
In the current readout system a BOC-ROD pair connects to up to 32 detector channels at 40 Mb/s via optical links (or 8 channels at 160 Mb/s). As shown in figure 1, the BOC provides an optical I/O interface to the Pixel Detector: it transmits configuration and trigger commands to the module front-end electronics, receives data from the detector channels and sends them to the ROD, which performs formatting and event fragment building. The event fragments are then encapsulated into S-Link packets and routed towards the ATLAS DAQ system through the S-Link [5] on the BOC. 132 ROD and BOC pairs and 132 S-Link boards are used in the ATLAS Pixel Detector.
The current ROD board hosts 11 Xilinx FPGAs for data formatting, event fragment building and routing and 5 Texas Instruments Digital Signal Processors (DSPs): one of them, the Master

Readout driver card proposal
The idea for the IBL readout chain is to keep the BOC-ROD task subdivision of the current Pixel readout chain, assigning the data path (optical link interface to the front-end chips and S-Links) to the BOC, and splitting the data processing between the ROD and an external computer farm. The two boards will benefit from up-to-date programmable devices and technologies in order to increase system performance and to obtain a more compact system. A description of the proposed BOC is available in [6], while this paper focuses on ROD details.
The aim of the new 9U VME ROD card is to process data coming from 16 IBL modules, for a total of 32 FE-I4 data links at a rate of 160 Mb/s each. This number of channels accounts for a total I/O bandwidth of 5.12 Gb/s, which requires the use of 4 S-Links since no data reduction is foreseen. The resulting data I/O bandwidth is four times that of the current ROD board.
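The bandwidth figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only illustrative: the link counts and rates are taken from the text, while the usable S-Link rate of 1280 Mb/s (160 MB/s) is an assumption that is consistent with the need for 4 S-Links but is not stated in this paper.

```python
import math

LINKS = 32                   # FE-I4 data links per new ROD (from the text)
LINK_RATE_MBPS = 160         # Mb/s per FE-I4 link (from the text)
CURRENT_ROD_MBPS = 32 * 40   # current ROD: 32 channels at 40 Mb/s (from the text)
SLINK_RATE_MBPS = 1280       # ASSUMPTION: usable S-Link rate, 160 MB/s

total_mbps = LINKS * LINK_RATE_MBPS                      # aggregate input rate
slinks_needed = math.ceil(total_mbps / SLINK_RATE_MBPS)  # no data reduction
ratio_to_current = total_mbps / CURRENT_ROD_MBPS

print(f"total input bandwidth: {total_mbps / 1000} Gb/s")
print(f"S-Links needed: {slinks_needed}")
print(f"ratio to current ROD: {ratio_to_current}")
```

Under these assumptions the aggregate rate is 5.12 Gb/s, matching the four S-Links and the factor of four quoted in the text.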
The proposed architecture for the new ROD card (see figure 2) is based on the use of modern FPGA devices, while no DSP chip is foreseen. The main idea is to perform on-board event fragment building and histogramming, while sending histograms (or even raw data) to a computer farm for fitting operations. In more detail, the new ROD runs the data taking needed to obtain the per-pixel calibration of noise and charge response by accumulating the per-pixel occupancies, the sums of Time Over Threshold (ToT) and the sums of ToT squared. Histograms are created, saved in RAM and eventually transferred via Gbit Ethernet to an off-line high-performance computer farm. Gbit Ethernet links will be used in order to avoid the data transfer bottlenecks experienced with the VME bus. Executing the data fitting on computers gives more flexibility for fit code development, since more convenient tools are available than in the DSP environment.
Another idea is to transfer the current ROD Master DSP functions to a PowerPC core inside a programmable device, in order to have an architecture without DSPs: this has the great advantage of leading to a single design and simulation environment within which both logic and PowerPC software can be verified together. This should reduce the design and debug time compared to having two different design systems for FPGAs and DSPs. In more detail, the foreseen architecture is based on one Virtex-5 FPGA with PowerPC capabilities implementing the ROD controller function, plus 2 Xilinx Spartan-6 programmable devices implementing the event builder and histogrammer blocks. These commercial devices also allow the reuse of most of the VHDL code that was designed to implement the firmware of the current ROD card for the ATLAS Pixel Detector. The next paragraphs provide more details about the main firmware blocks under design.

ROD controller
The ROD controller (see figure 3) is a Virtex-5 programmable device with an embedded PowerPC core performing all the basic ROD control operations. The PowerPC runs software which performs system control and non real-time functions, while the remaining FPGA logic performs all the real-time functions.
The PowerPC provides the FE-I4 configuration to a command interface block through a serial port, using data received from the VME host or from an external PC through a Gbit Ethernet connection, which can be used as a higher speed alternative to the VME bus. It also drives the calibration runs by providing software triggers through the same serial port and by checking that the data taking is working properly. The PowerPC core also drives the setup buses used to read or control all the configuration registers on the ROD and BOC boards; a bus arbiter block controlled from the VME host decides whether the setup buses are controlled by the PowerPC or by the VME host itself.
The VME host is able to access the processor memory in R/W mode when needed. The real-time functions include the decoding of trigger information from the TIM board and its consequent distribution to the BOC board, and the routing of the busy signal to the TIM in order to stop the generation of triggers when the ROD is not able to accept them.

Event builder block
The main purpose of the event builder block shown in figure 4 is to read and buffer data from 8 FE-I4 chips through the BOC by means of a FIFO-like interface: as soon as the event builder block recognizes that the BOC buffer contains data, it starts reading them on two 8-bit buses at 80 Mb/s and pushing them into two FIFOs. Data are then popped out and a frame is produced in which each event contains a header (along with L1ID + BCID information coming from the ROD controller), the data from the 8 FE-I4s in an ordered manner, and a trailer containing a data error summary. Such frames are then sent along 3 different pathways: they are stored in a FIFO before being transferred towards one S-Link, they are sent to the histogrammer block, and they are also sent towards the Gbit Ethernet block.
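The frame structure described above can be sketched in software as follows. This is a minimal illustration and not the firmware design: the tuple-based word layout, the names `build_event_frame` and `fan_out`, and the convention of setting one error bit per chip that delivered no data are all hypothetical choices made for the example.

```python
N_CHIPS = 8  # FE-I4 chips served by one event builder block (from the text)

def build_event_frame(l1id, bcid, chip_words):
    """Assemble header, chip-ordered data and trailer for one event.

    chip_words maps a chip index to its list of data words (hypothetical layout).
    """
    header = ("HEADER", l1id, bcid)
    body = []
    error_summary = 0
    for chip in range(N_CHIPS):          # fixed chip order, as in the text
        words = chip_words.get(chip)
        if words is None:
            error_summary |= 1 << chip   # ASSUMPTION: one error bit per silent chip
            continue
        body.extend((chip, word) for word in words)
    trailer = ("TRAILER", error_summary)
    return [header, *body, trailer]

def fan_out(frame, slink_fifo, histogrammer_in, ethernet_out):
    """Send the same frame along the three pathways named in the text."""
    for sink in (slink_fifo, histogrammer_in, ethernet_out):
        sink.append(frame)

frame = build_event_frame(l1id=7, bcid=42, chip_words={1: [0xA], 0: [0xB, 0xC]})
slink_fifo, histogrammer_in, ethernet_out = [], [], []
fan_out(frame, slink_fifo, histogrammer_in, ethernet_out)
print(frame[0], frame[-1])
```

In this example only chips 0 and 1 deliver data, so their words appear in chip order between header and trailer, and the assumed error summary flags chips 2 to 7.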

Histogrammer
In order to calibrate the detector, a number of histograms have to be collected during detector operation: this task is accomplished by the histogrammer block, which performs calculations during calibration tasks such as the threshold scan and the ToT scan. The goal is to accumulate per-pixel occupancy values, sums of ToT and sums of ToT² in order to create histograms that are saved on-the-fly in a very fast memory available inside the FPGA chips or in external synchronous static RAMs, and eventually transferred via Gbit Ethernet to an off-line high-performance computer, where the fit processing is executed. The logic required inside the FPGA to perform histogramming is simple and requires few logic resources apart from memory. Figure 5 shows the block diagrams for the threshold scan, which requires a 1-byte wide adder per pixel, and for the ToT scan, which requires additional adders for the sum of ToT (12 bits) and the sum of ToT² (16 bits). The internal memories are dual-ported and capable of reading and writing one word per clock period at the same time.
The two Gbit Ethernet links provide an overall maximum data bandwidth of 220 MB/s for sending the histograms towards the external PC farm. This value is much higher than the 4 MB/s bandwidth supported by the VME bus in the current system. At the same time, the use of GHz-class processors in the PC farm speeds up the fit operations compared to the 220 MHz DSPs used in the current ROD board.
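The per-pixel accumulation can be illustrated with a short software model. It is a sketch under stated assumptions rather than the firmware: the class name `PixelAccumulator` and the dictionary keyed by (column, row) are hypothetical, and Python integers are unbounded, so the fixed adder widths quoted in the text (8, 12 and 16 bits) are not modelled. The three sums are sufficient because the mean and variance of ToT can be recovered from them off-line.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class PixelAccumulator:
    occupancy: int = 0   # number of hits seen by this pixel
    sum_tot: int = 0     # running sum of ToT
    sum_tot2: int = 0    # running sum of ToT squared

    def add_hit(self, tot):
        self.occupancy += 1
        self.sum_tot += tot
        self.sum_tot2 += tot * tot

    def mean_tot(self):
        return self.sum_tot / self.occupancy

    def var_tot(self):
        mean = self.mean_tot()
        return self.sum_tot2 / self.occupancy - mean * mean

# One accumulator per pixel, created on first hit (hypothetical storage scheme).
histogram = defaultdict(PixelAccumulator)

# Hypothetical hits as (column, row, ToT).
for col, row, tot in [(3, 10, 3), (3, 10, 5), (3, 10, 7), (4, 11, 6)]:
    histogram[(col, row)].add_hit(tot)

pixel = histogram[(3, 10)]
print(pixel.occupancy, pixel.sum_tot, pixel.sum_tot2, pixel.mean_tot())
```

In a threshold scan only the occupancy counter would be filled; the two ToT sums are needed for the ToT scan.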
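To give a feeling for what the two bandwidth figures mean in practice, the sketch below estimates the time needed to ship one full set of ToT-scan histograms from a ROD. Several inputs are assumptions not stated in this paper: the FE-I4 pixel matrix of 80 x 336 pixels, unpadded packing of the 8 + 12 + 16 bits per pixel, 1 MB taken as 10^6 bytes, and the neglect of any protocol overhead.

```python
PIXELS_PER_CHIP = 80 * 336    # ASSUMPTION: FE-I4 matrix size, not given in the text
BITS_PER_PIXEL = 8 + 12 + 16  # occupancy + sum ToT + sum ToT^2 (widths from the text)
CHIPS_PER_ROD = 32            # FE-I4 links per ROD (from the text)

ETHERNET_BYTES_PER_S = 220e6  # two Gbit Ethernet links (from the text)
VME_BYTES_PER_S = 4e6         # VME bus in the current system (from the text)

# ASSUMPTION: histogram words are packed without padding.
hist_bytes = PIXELS_PER_CHIP * BITS_PER_PIXEL * CHIPS_PER_ROD // 8

t_ethernet = hist_bytes / ETHERNET_BYTES_PER_S
t_vme = hist_bytes / VME_BYTES_PER_S

print(f"histogram set: {hist_bytes / 1e6:.2f} MB")
print(f"Gbit Ethernet: {t_ethernet * 1e3:.1f} ms, VME: {t_vme * 1e3:.0f} ms")
print(f"speed-up: {t_vme / t_ethernet:.0f}x")
```

Whatever the real histogram size, the ratio of the two transfer times depends only on the quoted bandwidths and equals 220 / 4 = 55.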

JINST 6 C01018
The communication with the external PC through the Gbit Ethernet link uses a custom-made protocol in which the ROD board needs to receive continuous feedback from the PC. In particular, the ROD board transmits block n only after receiving from the PC the acknowledgement that all previous blocks, at least up to block n − 2, have been correctly received. In case of packet loss a timeout mechanism is foreseen.
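The acknowledgement rule can be modelled as a sliding window in which at most two blocks are unacknowledged at any time. The following sketch is a simplified simulation under assumptions that go beyond the text: blocks are numbered from 0, acknowledgements are cumulative and arrive in order, and after a timeout the ROD resends every unacknowledged block starting from the lost one. The names `can_send` and `transfer` are hypothetical.

```python
def can_send(n, highest_acked):
    """Block n may be sent once all blocks up to n - 2 are acknowledged."""
    return highest_acked >= n - 2

def transfer(num_blocks, lost_once=frozenset()):
    """Return the event log of a transfer; blocks in lost_once are dropped once."""
    log = []
    highest_acked = -1      # no block acknowledged yet
    next_block = 0
    in_flight = []          # sent but not yet acknowledged
    to_drop = set(lost_once)
    while highest_acked < num_blocks - 1:
        # Send as many blocks as the acknowledgement rule allows.
        while next_block < num_blocks and can_send(next_block, highest_acked):
            log.append(("send", next_block))
            in_flight.append(next_block)
            next_block += 1
        oldest = in_flight[0]
        if oldest in to_drop:
            # ASSUMPTION: on timeout, resend from the lost block onwards.
            to_drop.discard(oldest)
            log.append(("timeout", oldest))
            next_block = oldest
            in_flight.clear()
        else:
            in_flight.pop(0)
            highest_acked = oldest
            log.append(("ack", oldest))
    return log

clean_log = transfer(4)
lossy_log = transfer(4, lost_once={1})
print(clean_log)
```

In the loss-free run, block 2 is sent only after block 0 is acknowledged and block 3 only after block 1, which is exactly the n − 2 rule stated in the text.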

Conclusions
A new architecture for the IBL off-detector readout is proposed, with major modifications to the current BOC-ROD boards in order to manage a higher data bandwidth and achieve better performance. In particular, a ROD card without DSPs is presented, with the innovative approach of using a PowerPC core for high-level operations, a histogrammer based on FPGA logic blocks and a PC farm for fit operations. The presented design also has the important advantage of allowing the designers to use a single design and simulation environment for both the FPGA logic and the embedded processor software, leading to a simpler and more manageable system.