Upgrade of the ALICE-TPC read-out electronics

The ALICE experiment at the CERN LHC employs a large-volume time projection chamber (TPC) as its main tracking device. Motivated by analyses indicating that the high-level trigger is capable of selecting events containing rare physics probes, the aim is to read out the TPC an order of magnitude faster than was foreseen during the design of its read-out electronics. Based on an analysis of the read-out performance of the current system, an upgrade of the front-end read-out network is proposed. The performance of the foreseen architecture is simulated with raw data from real 7 TeV pp collisions. Events are superimposed in order to emulate the future ALICE running conditions: high-multiplicity events generated either by PbPb collisions or by the superposition (pile-up) of a large number of pp collisions. The first prototype of the main building block has been produced and characterised, demonstrating the feasibility of the approach.


Introduction
A large-volume (90 m³) time projection chamber (TPC) is the primary detector of the dedicated heavy-ion experiment ALICE ("A Large Ion Collider Experiment") at the CERN Large Hadron Collider (LHC). It has a hollow-cylindrical geometry with a length of 500 cm and inner and outer radii of 90 cm and 250 cm, respectively. The signal read-out is performed by 557,568 active channels distributed over 4,356 front-end cards (FECs) on two end-plates and is designed to resolve the signals of up to 20,000 particles emerging from a single, central PbPb collision [1].
The huge number of active elements, together with a fine sampling of the arrival time (940 samples over 94 µs of drift time), results in raw event sizes of about 700 MByte, which sophisticated on-detector digital signal processing reduces to around 70 MByte for central collisions of lead ions. Trigger rates above 100 Hz require a high-bandwidth, high-throughput read-out network.
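The quoted raw event size follows from simple arithmetic. A minimal check, assuming 10-bit ADC samples (the sample width of the ALTRO chip introduced below) and neglecting any header overhead:

```python
# Back-of-the-envelope check of the raw event size quoted above.
# Assumes 10-bit ADC samples and no header overhead.
N_CHANNELS = 4_356 * 128   # 4,356 FECs x 128 channels = 557,568 active channels
N_SAMPLES = 940            # time samples over 94 us of drift time
BITS_PER_SAMPLE = 10       # ALTRO ADC resolution

raw_bytes = N_CHANNELS * N_SAMPLES * BITS_PER_SAMPLE / 8
print(f"raw event size ~ {raw_bytes / 1e6:.0f} MByte")
```

This yields roughly 655 MByte, consistent with the "about 700 MByte" quoted above; the on-detector signal processing then brings central PbPb events down by about a factor of ten.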

Front-end electronics
The front-end electronics design is based on two custom integrated circuits: PASA (pre-amplifier and shaping amplifier) and ALTRO (digitisation, signal processing and event buffering). Each chip houses 16 channels, with 8 chips assembled onto a FEC, yielding 128 channels per card. In addition, a small FPGA, the "board controller" (BC), is located on each FEC; it provides monitoring (temperatures, voltages and currents) and implements a special read-out mode, "sparse read-out" (explained below).
The FECs are distributed over 216 read-out partitions: 2 sides × 18 sectors × 6 partitions. Depending on their radial position, the partitions comprise, from innermost to outermost, 18, 25, 18, 20, 20 and 20 FECs. Each FEC has a 40-bit-wide GTL (Gunning transceiver logic) bus interface as well as separate lines for triggers, clocks (10 MHz sampling and 40 MHz read-out) and a low-speed serial monitoring bus.
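These figures are mutually consistent, as a short sanity check of the partition layout shows (plain arithmetic, no assumptions beyond the numbers above):

```python
# Partition layout of one TPC sector, from inner- to outermost partition.
FECS_PER_PARTITION = [18, 25, 18, 20, 20, 20]
SECTORS = 2 * 18                  # 2 sides x 18 sectors
CHANNELS_PER_FEC = 8 * 16         # 8 chips x 16 channels per chip

fecs_per_sector = sum(FECS_PER_PARTITION)             # 121
total_fecs = SECTORS * fecs_per_sector                # 4,356
total_channels = total_fecs * CHANNELS_PER_FEC        # 557,568
total_partitions = SECTORS * len(FECS_PER_PARTITION)  # 216

print(total_fecs, total_channels, total_partitions)
```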

Running scenarios
One can distinguish three basic running scenarios of ALICE:
• low rate pp data taking without pile-up,
• high rate pp data taking with pile-up from different bunch crossings,
• PbPb data taking with minimum bias events and isolation trigger.
Being designed to record central PbPb collision events, the TPC is essentially empty in the first scenario, and neither tracking nor reconstruction prevents running the detector with pile-up. Simulations suggest that 10-30 overlaid events would not be difficult to handle in the current framework. In this very early stage of operation, however, ALICE takes data with very little pile-up in the TPC to ease the reconstruction.
To obtain performance figures in the other scenarios, for both the current read-out electronics and the proposed upgrade, simulations overlaying real events were carried out (figure 1). They are based on a sample of √s = 7 TeV minimum-bias pp events without pile-up and do not involve reconstruction of the data. Despite the duplication of noise (which is very localised and thus does not have a significant impact), this is the best model-free approximation available.
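The overlay procedure can be illustrated schematically. The sketch below is a deliberately simplified, hypothetical model (per-channel sample lists instead of real ALTRO raw data); the actual study operates on the raw data directly. Note how the additive superposition duplicates the noise, as mentioned above:

```python
def overlay_events(events, offsets, n_samples=940):
    """Superimpose pp events to emulate pile-up (simplified model).

    events  -- list of events, each a dict {channel: [ADC samples]}
    offsets -- time offset (in samples) of each event, standing in for
               its bunch crossing relative to the trigger
    Returns one merged event covering the full drift-time window.
    """
    merged = {}
    for event, offset in zip(events, offsets):
        for channel, samples in event.items():
            buf = merged.setdefault(channel, [0] * n_samples)
            for i, adc in enumerate(samples):
                t = offset + i
                if 0 <= t < n_samples:
                    buf[t] += adc  # additive superposition (also sums the noise)
    return merged

# Example: two single-channel signals piled up 100 samples apart.
piled_up = overlay_events([{7: [3, 12, 3]}, {7: [5, 20, 5]}], offsets=[0, 100])
```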
Current read-out

Architecture
Currently the read-out network is divided into 216 independent read-out partitions, with each partition connected individually to the central services by three links: a 160 MByte/s optical link (DDL) to the data acquisition (DAQ), an optical link (TTC) for clock and trigger reception from the central trigger processor (CTP), and a 10 MBit/s Ethernet connection to the detector control system (DCS). Within each read-out partition, a read-out control unit (RCU) interfaces the FECs to these links via two buses. This layout is depicted in figure 2a.

Performance
The electronics was designed to read out low-rate, high-occupancy PbPb events, with the focus on a robust, fault-tolerant protocol. At low occupancies, however, this implies a penalty, since the protocol overhead imposes a limit on the read-out rate. To overcome this limitation, a special read-out mode, "sparse read-out", has been implemented. In this mode the BC of each FEC compiles a list of non-empty channels and transmits it to the RCU, which in turn reads out only these channels. Because the BCs can build the lists in parallel, this significantly speeds up the read-out. The drawback is that the FECs are isolated from the bus during this operation and therefore blind to triggers, which makes the use of the multiple-event buffer impossible.
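The gain from sparse read-out can be sketched with a toy timing model. All constants and function names below are hypothetical illustrations, not the actual bus-protocol timings:

```python
# Toy timing model of sparse read-out (timing constants are hypothetical).
T_CHANNEL = 1.0  # time to read one channel over the GTL bus (arbitrary units)
T_LIST = 0.5     # time for the RCU to fetch one BC's channel list

def full_readout_time(fecs):
    """Conventional mode: the RCU reads every channel of every FEC."""
    return sum(len(channels) * T_CHANNEL for channels in fecs)

def sparse_readout_time(fecs):
    """Sparse mode: BCs build their non-empty-channel lists in parallel,
    then the RCU reads only the listed channels."""
    list_transfer = len(fecs) * T_LIST
    non_empty = sum(
        sum(1 for occupied in channels if occupied) * T_CHANNEL
        for channels in fecs
    )
    return list_transfer + non_empty

# A low-occupancy partition: 20 FECs x 128 channels, ~2% of them occupied.
fecs = [[ch % 64 == 0 for ch in range(128)] for _ in range(20)]
```

For this low-occupancy example the sparse mode is far faster; for fully occupied events the list transfer becomes pure overhead, matching the trade-off described above.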
The read-out performance obtained from the simulations described above is summarised in figure 4a. It shows the read-out time T_rdo = max_i{T_rdo,i} of the slowest of the 216 partitions and the maximum throughput ρ = max_i{S_rdo,i} / max_i{T_rdo,i}, together with the event-by-event spreads. It is obtained by injecting the previously described data samples into the read-out electronics (utilising a dedicated memory of the ALTROs [2]) and reading them out. The data is, however, not sent to the DAQ in order to avoid any backpressure, and thus represents the optimum case. Very good performance is observed for high-occupancy events, while the impact of the protocol overhead on the performance is clearly visible for events with low occupancy. The figure shows the performance without the derandomising multiple-event buffer, which would decrease the average read-out time but cannot be used in sparse read-out.
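In this notation the reported figures are determined by the slowest partition. A minimal sketch of the two observables, with hypothetical example values in place of the 216 per-partition measurements:

```python
def readout_metrics(times, sizes):
    """T_rdo and rho as defined above, over per-partition values.

    times -- read-out times T_rdo,i of the individual partitions
    sizes -- event sizes S_rdo,i of the individual partitions
    """
    t_rdo = max(times)            # the slowest partition dominates
    rho = max(sizes) / t_rdo      # maximum throughput
    return t_rdo, rho

# Hypothetical example: three partitions instead of 216 (seconds, bytes).
t, rho = readout_metrics([1e-3, 2e-3, 1.5e-3], [4e5, 8e5, 6e5])
```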

Proposed upgrade

Motivation
The main motivation for an upgrade is driven by physics: since, over the last years, the estimate of the charged-particle multiplicity of central PbPb collisions has decreased by a factor of three, down to dN_ch/dη ≈ 1500 [3], we are confident that the detector will be able to run at higher rates. In addition, the increasing capabilities of on-line data reconstruction and of the selection of interesting events by the high-level trigger allow for very high event rates, even if not all of them can be written to tape.
An upgrade can also enhance the read-out electronics by resolving the conflict between sparse read-out and multiple-event buffering and by removing the protocol-overhead limitation for small event sizes. Furthermore, a new design can increase the fault tolerance of the system. Figure 4b shows the performance of a single FEC, indicating that it can be read out ten times faster than a full partition. Figure 4a in turn shows that, for low occupancies, this would even fit into the currently available bandwidth. For central PbPb events, the off-detector bandwidth would need to be increased to yield a large gain in throughput, which is foreseen to happen in 2016. The planned upgrade is based on a different network topology, in which the bus architecture of the current network is replaced by a star-like one (figure 2). The links are furthermore implemented as high-speed serial lines instead of a parallel bus. This is done by 4,356 interface cards, "FECints", that establish a high-speed serial connection between the FECs and about 200-500 concentrator nodes, "master RCUs" (figure 2b):

Architecture/topology
Master RCU. The master RCU forms the star point in the data flow. It connects via high-speed serial links to the FECints and provides the interfaces to the central services. Depending on the ALICE-DAQ strategy, these will either be the current interfaces to the DAQ, CTP and DCS or a new link such as the GBT and/or versatile link. They are implemented in a single Xilinx Virtex-6 FPGA. We foresee at least 216 such master RCUs, but the modularity allows for an arbitrary number and partitioning.
FECint. The FECint is an interface to each FEC, translating its parallel GTL bus interface to a high-speed serial one. The link is dimensioned to cope with the peak FEC data rate of 1.6 GBit/s, using an 8b/10b-encoded protocol at a line rate of at least 2 GBit/s. Moreover, the FECint provides the FEC with the clock and trigger signals as well as with the slow-control communication.
The link to the master RCU currently foresees four differential lines for 1) data transmission, 2) data reception, 3) reference-clock reception and 4) trigger reception, as well as three lines for I²C communication.
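The line-rate requirement follows directly from the 8b/10b overhead, which transmits 10 line bits for every 8 payload bits:

```python
# 8b/10b encoding carries 8 payload bits in 10 line bits.
PEAK_PAYLOAD_GBPS = 1.6                       # peak FEC data rate
line_rate_gbps = PEAK_PAYLOAD_GBPS * 10 / 8   # minimum required line rate

print(f"required line rate: {line_rate_gbps} GBit/s")  # 2.0 GBit/s
```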

FECint prototype
A first prototype (figure 3) of the FECint has been designed and characterised. It was built to assess mechanical and signal-integrity issues as well as cost. It is based on a Xilinx Virtex-II Pro FPGA, which was selected as the smallest available FPGA with high-speed serial links that resembles a Xilinx Spartan-6. The latter is the device of choice for price reasons and will be used for subsequent prototyping.
In order to reduce space and component count, the FECs are interfaced directly using CMOS levels; the GTL drivers were omitted. This was first demonstrated in a dedicated set-up and shown to work correctly, mainly because a point-to-point connection rather than a bus is involved. The design study resulted in a 4 × 18.4 cm² PCB. It features four serial links, two of them with swapped receive and transmit lines, and two kinds of connectors (HDMI and mini-HDMI). This allows closing the loop and trying cables of different connector types and lengths.
The prototype was used to communicate with the FECs as well as through the serial links and was shown to work. The firmware features an integrated state analyser for the ALTRO bus as well as flexible clock generation that allows for variable phase adjustments. A two-dimensional scan of the phases of the sampling and read-out clocks and the resulting command transaction time is shown as an example in figure 5.
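Such a scan can be expressed generically. The sketch below is a hypothetical outline; `measure_transaction_time` stands in for the actual firmware/hardware measurement and is not part of the real code:

```python
def phase_scan(measure_transaction_time, sampling_phases, readout_phases):
    """Two-dimensional phase scan over sampling- and read-out-clock phases.

    measure_transaction_time(p_smp, p_rdo) is assumed to configure the two
    clock phases (in degrees) and return the command transaction time, or
    None if the transaction fails at that phase combination.
    """
    return [
        [measure_transaction_time(p_smp, p_rdo) for p_rdo in readout_phases]
        for p_smp in sampling_phases
    ]

# Dummy measurement standing in for the hardware: transactions fail in a
# narrow band around a 180-degree phase offset (purely illustrative).
def dummy_measure(p_smp, p_rdo):
    return None if abs((p_smp - p_rdo) % 360 - 180) < 20 else 1.0

grid = phase_scan(dummy_measure, range(0, 360, 45), range(0, 360, 45))
```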
Despite a minor mechanical issue concerning the FEC connector heights, the prototype works well.

Conclusions and planning
It has been shown that an upgrade is feasible and could yield an increase in read-out speed by a factor of about ten.
A second iteration of the FECint is ongoing and will be finalised by the end of the year. At the same time the first PbPb collisions will have been recorded and will provide grounds for a re-assessment of the detector stability and relevant data-rates.
If a significant improvement is confirmed, we are ready for component qualification and large-scale prototype engineering in 2012-2013, production and testing in 2014-2015, and installation during the long LHC shutdown in 2016.