Radiation tests of key components of the ALICE TOF TDC Readout Module

A. Alici¹, P. Antonioli²*, A. Mati², S. Meneghini¹, M. Pieracci², M. Rizzi¹, C. Tintori²

(1) INFN Bologna, Via Irnerio 46, Bologna, Italy
(2) CAEN S.p.A., Via Vetraria 11, Viareggio, Italy

Abstract

The ALICE Time-of-Flight (TOF) system will be a large-area (140 m²) detector made of Multigap Resistive Plate Chambers (MRPC). The read-out will be performed by VME TDC Readout Modules (TRM), each hosting 30 High Performance TDC (HPTDC) chips.

Radiation tests carried out at Zurich PSI with a 60 MeV proton beam line on key components of the TRM (FPGA, RAM, Flash memory, voltage regulators, microcontroller and others) are described, with an emphasis on the validation of the watchdog mechanisms for SEU events and of the protection against latchups. In particular, an Altera Stratix (0.13 µm CMOS technology) was tested, together with the internal CRC check mechanism for configuration bits provided by the vendor. The HPTDC, already exposed to a proton beam at Louvain CRC by a CMS group, was additionally tested with heavy ions at the SIRAD facility at the INFN Legnaro laboratories to fully characterise the SEU sensitivity as a function of the LET.

The combination of all measurements allowed a full assessment of the expected SEU error rates inside the TRM, validating all proposed components, with the notable exception of the Altera FPGA. A Flash-based FPGA from Actel is now our baseline option.

I. INTRODUCTION

The Time-Of-Flight (TOF) detector of the ALICE experiment [1,2] has to provide particle identification information in the momentum range between 0.5 GeV/c and 2.5 GeV/c in the central region (|η|≤1) through precise time measurements of pulses induced by particles crossing the MRPCs. Time digitisation will be carried out by the HPTDC chip [3]. Over 20,000 HPTDC chips will be installed in the detector. Each TDC Readout Module (TRM) card will house 30 chips. The conceptual design of the TRM, the results from board prototypes and the time resolution achieved have been presented elsewhere [4]. In preparation for final engineering and production, due in 2005/6, we conducted dedicated irradiation campaigns during 2004 to validate all the components.

A. The TDC Readout Module

A logic block diagram of the TRM is shown in Fig. 1. The card, a VME slave, will host 30 HPTDCs, organised in two separate 32-bit parallel readout chains. A central FPGA will act as readout controller and event manager and will also implement the VME interface (these three functions are shown separately in Fig. 1). To ease maintenance and to match the Front End granularity, the HPTDCs will be mounted on 10 piggy-back cards, connected to both sides of a central motherboard. A central aluminium bar will provide the cooling needed by the HPTDCs.

Figure 1: TRM logic blocks

After hit matching on L1 arrival, the FPGA will move matched hits from the HPTDC readout FIFOs to two coupled SRAMs that will mimic a dual-port RAM. Finally, an L2a signal will make the requested event available to the output FIFO.

Between L1 and L2, the FPGA will provide data packing and online compensation of the Integral Non Linearity (INL) of the HPTDC [3], accessing a LUT in SRAM. Flash memory will host a non-volatile copy of the HPTDC LUT, as well as two copies of the FPGA firmware, to allow remote programming of the FPGA. The microcontroller will act mainly as a latchup and SEU watchdog. It will additionally trace error conditions signalled by the HPTDCs (notably SEUs auto-detected inside the configuration bits) and react appropriately through the JTAG interface of the chips.
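The LUT-based INL compensation mentioned above can be illustrated with a minimal sketch. The table size, the indexing by the low-order bits of the time word and the function names are assumptions made for illustration only; they are not taken from the TRM firmware.

```python
# Illustrative sketch of LUT-based INL compensation.
# LUT_SIZE and the indexing scheme are assumptions, not the TRM design.
LUT_SIZE = 1024  # assumed number of fine-time bins covered by the table


def build_lut(inl_bins):
    """Return a per-bin integer correction table from measured INL values."""
    if len(inl_bins) != LUT_SIZE:
        raise ValueError("INL table must have one entry per fine-time bin")
    return [int(round(v)) for v in inl_bins]


def compensate(raw_time, lut):
    """Subtract the tabulated INL of the fine-time bin from a raw TDC word."""
    return raw_time - lut[raw_time % LUT_SIZE]


# Hypothetical table: every bin is perfect except bin 5, which reads 2 bins high.
inl = [0.0] * LUT_SIZE
inl[5] = 2.0
lut = build_lut(inl)
corrected = compensate(LUT_SIZE + 5, lut)
```

Because the correction is a single table read per hit, an upset in the table silently biases every hit falling in the affected bin until the table is reloaded, which is why the non-volatile copy in Flash matters.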

* presenting and corresponding author: antonioli@bo.infn.it

A DSP option, originally foreseen for data processing between L1 and L2, was dropped in favour of fully exploiting the FPGA capabilities. The key criteria for selecting the components that implement this scheme are the possibility of remotely upgrading the firmware, fault tolerance and “SEU-awareness” with respect to radiation-induced SEE.

The TOF system will be read out by 684 TRM cards, housed in 72 custom 12-slot VME crates.

B. The HPTDC

The TRM is based on the High Performance TDC (HPTDC) ASIC [3], developed by the CERN/EP Microelectronics group for LHC applications, with multi-hit and multi-event capabilities. The ASIC provides the time measurement of each hit relative to the arrival of the external trigger. The time digitization is based on a clock-synchronous counter and a DLL interpolator. The external 40 MHz clock is internally multiplied by a PLL to drive the DLL at the frequency needed to reach the required resolution. The HPTDC has a 647-bit configuration register plus a 40-bit control register, accessible through its JTAG interface. Digitised hits are stored in four L1 buffers, 256 hits deep and shared by 2 channels each. Hits matched with the external trigger are moved to a readout FIFO, 256 hits deep. The total amount of internal memory is 5.1 kB.
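As an illustration of how a coarse counter and an interpolator combine into one time stamp, the following sketch reconstructs a hit time and its offset from a trigger. Only the 40 MHz clock comes from the text; the number of fine bins per clock period and the coarse counter width are hypothetical values chosen for the example.

```python
# Sketch of counter + interpolator time reconstruction.
CLOCK_PERIOD_PS = 25_000     # 40 MHz external clock (from the text)
FINE_BINS_PER_CLOCK = 1024   # assumed interpolation granularity (hypothetical)
COARSE_BITS = 12             # assumed coarse counter width (hypothetical)


def hit_time_ps(coarse, fine):
    """Absolute time stamp in ps from the coarse count and the fine bin."""
    return coarse * CLOCK_PERIOD_PS + fine * CLOCK_PERIOD_PS / FINE_BINS_PER_CLOCK


def relative_time_ps(hit_ps, trigger_ps):
    """Hit time relative to the trigger, unwrapping coarse counter roll-over."""
    span = (1 << COARSE_BITS) * CLOCK_PERIOD_PS
    # Map the difference into the interval [-span/2, span/2).
    return (hit_ps - trigger_ps + span / 2) % span - span / 2


# A hit two clock cycles after a trigger that arrived just before roll-over.
trigger = hit_time_ps(4095, 0)
hit = hit_time_ps(1, 0)
delta = relative_time_ps(hit, trigger)
```

The modulo step matters because the coarse counter is finite: without it, a hit recorded just after roll-over would appear to precede its trigger by almost the full counter range.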

C. The ALICE/TOF radiation environment

The TRM will operate in a moderately hostile environment as far as total radiation levels are concerned. A total dose of 1.2 Gy is expected in 10 years, with a total fluence of charged hadrons and neutrons with energy above 20 MeV of $2.1 \times 10^9$ cm$^{-2}$ [4]. While damage from the total integrated dose is likely to be negligible, protection against latchups is needed, as well as an adequate prediction of SEU error rates. The maximum flux of charged hadrons and neutrons above 20 MeV will occur in ALICE/TOF during Pb-Pb collisions and was estimated to be 89 Hz/cm$^2$ [5].

We planned an irradiation campaign with proton beams to test the components used in the TRM other than the HPTDC, even though irradiation data are already available for some of these devices. Besides the RAM and Flash memories and a few other components such as the voltage regulators and the current-limited p-switch, we focused this test on the candidate FPGA, an Altera Stratix. Realized in 0.13 µm CMOS technology, this device provides a built-in mechanism to check for upsets in its configuration bits. Likewise, although already qualified for space applications, the microcontroller chosen to act as latchup watchdog and boot controller (an Atmel ATMEGA16) needed to be tested carefully. This test is described in section II of this paper.

The HPTDC was not implemented in a technology guaranteed to be radiation hard and SEU insensitive; nevertheless, it has built-in self-checking mechanisms that enable it to auto-detect SEU occurrences. The CMS group, which is using the HPTDC for the readout of the muon barrel drift chambers, tested the HPTDC for SEU at the Louvain CRC facility, irradiating 8 HPTDCs with 60 MeV protons for a total fluence of $5 \times 10^{10}$ cm$^{-2}$. They registered just one SEU. Extrapolating their measurement [7], and taking into account the different radiation levels and the number of HPTDCs used, we expect a total rate of 2.4 SEU per day in the whole detector. Given the large number of HPTDCs used, we wanted to characterize the SEU threshold of the device precisely. We therefore planned an additional irradiation campaign with heavy ions at the SIRAD facility [8] at the INFN Legnaro Laboratories, complementing the existing measurement. This test is described in section III of this paper.

II. RADIATION TESTS AT PSI

The test board used for this test (shown in Fig. 2 as block diagram and during the irradiation in Fig. 3) was designed to contain all the components to be housed in the central main board of the TRM. A complete list of the tested components is given in Table 1.

Figure 2: Block diagram of test board used at PSI

We connected the FAULT and ON signals of the MAX893L to the microcontroller, which was therefore responsible for handling any latchup. An additional latchup protection was inserted to protect the microcontroller itself. Finally, the Altera Stratix CRC_ERROR pin was also connected to the microcontroller. When the FPGA asserted that pin, the microcontroller rebooted the FPGA; any error condition caught by the microcontroller was reported through an RS232 interface. When a fault condition is asserted by the p-switch, the microcontroller switches the relevant component off and then on again, checking that the fault has cleared. Moreover, through a parallel port interface, we monitored the internal status, as well as the RAM, the Flash and the internal memories of the FPGA.

Table 1: Components irradiated at PSI

<table>
<thead>
<tr>
<th>Component</th>
<th>Functionality</th>
</tr>
</thead>
<tbody>
<tr>
<td>EP1S20F780</td>
<td>FPGA</td>
</tr>
<tr>
<td>IDT71V416S</td>
<td>RAM</td>
</tr>
<tr>
<td>ATMEGA16</td>
<td>Microcontroller</td>
</tr>
<tr>
<td>AT45DB161B</td>
<td>Flash memory</td>
</tr>
<tr>
<td>MAX893L</td>
<td>Current limited p-switch</td>
</tr>
<tr>
<td>ADP3339AKC-1.5/2.5</td>
<td>Low drop voltage regulator</td>
</tr>
<tr>
<td>Statek CX03M</td>
<td>Clock</td>
</tr>
</tbody>
</table>

We used 62% of the internal memory resources of the FPGA (1 Mbit of RAM). The design implemented in the FPGA included, besides the needed I/O interface, a large shift register (4 kbit deep), monitored for internal logic errors. A global cross section for the device of $6.5 \times 10^{-8}$ cm$^2$ was obtained using two independent methods. In the first, we stopped the irradiation immediately after CRC_ERROR was asserted (square point in Fig. 4). In the second, we reduced the beam intensity sufficiently ($\approx 5 \times 10^7$ Hz/cm$^2$) and ran the device continuously, that is, the reboot took place under irradiation but with a very low probability of observing an SEU during boot. This method is reported with full circles in Fig. 4. The two methods gave very similar results once the dead time during the boot phase was taken into account.

The cross sections per bit (in cm$^2$/bit) for the memories tested are shown in Table 2. It is worth noting that the cross section measured for the Stratix is reasonably consistent with results obtained for SRAM circuits realized in commercial 0.13 µm CMOS technology [9]. Moreover, since it exploits the configuration check provided by the vendor of the device, the measurement corresponds to an effective and complete monitoring of all the configuration bits.
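The watchdog behaviour of the microcontroller on the test board (power cycling on FAULT, FPGA reboot on CRC_ERROR, reporting of every error) can be sketched as a small software model. The class, its fields and the message strings are hypothetical stand-ins for the hardware signals and the RS232 report; this is not the ATMEGA16 firmware.

```python
from dataclasses import dataclass, field


@dataclass
class Board:
    """Hypothetical model of the signals seen by the microcontroller."""
    fault: bool = False       # FAULT output of the current-limited switch
    crc_error: bool = False   # CRC_ERROR pin of the FPGA
    power_on: bool = True     # ON input of the current-limited switch
    log: list = field(default_factory=list)  # stands in for the RS232 report

    def power_cycle(self):
        """Switch the protected component off and on again."""
        self.power_on = False
        self.power_on = True
        self.fault = False    # in this model a power cycle clears the latchup

    def reboot_fpga(self):
        """Reload the FPGA configuration, which clears the CRC flag."""
        self.crc_error = False


def watchdog_step(board):
    """One polling pass: handle a latchup first, then a configuration upset."""
    if board.fault:
        board.power_cycle()
        status = "cleared" if not board.fault else "persistent"
        board.log.append(f"LATCHUP {status}")
    if board.crc_error:
        board.reboot_fpga()
        board.log.append("SEU CRC_ERROR: FPGA rebooted")
    return board.log


board = Board(fault=True, crc_error=True)
report = watchdog_step(board)
```

Handling the latchup before the configuration upset is a choice made in this sketch: a latched device draws excess current, so restoring power is treated as the more urgent action.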

Table 2: Cross sections per bit measured at PSI for the memories tested

<table>
<thead>
<tr>
<th>Component</th>
<th>$\sigma$ (cm$^2$/bit)</th>
</tr>
</thead>
<tbody>
<tr>
<td>EP1S20F780 (conf.)</td>
<td>$1.1 \times 10^{-14}$</td>
</tr>
<tr>
<td>EP1S20F780 (mem.)</td>
<td>$3.4 \times 10^{-14}$</td>
</tr>
<tr>
<td>IDT71V416</td>
<td>$8.5 \times 10^{-15}$</td>
</tr>
<tr>
<td>AT45DB161B</td>
<td>$&lt;10^{-19}$</td>
</tr>
</tbody>
</table>
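The relation between the device cross section and the per-bit values can be checked with a few lines of arithmetic. The sketch below uses the two Stratix numbers quoted in the text; the upset count and fluence passed to `cross_section` are hypothetical and serve only to show the definition.

```python
def cross_section(n_upsets, fluence_cm2):
    """Device cross section in cm^2: observed upsets per unit fluence."""
    if fluence_cm2 <= 0:
        raise ValueError("fluence must be positive")
    return n_upsets / fluence_cm2


def per_bit(sigma_device_cm2, n_bits):
    """Cross section per bit in cm^2/bit for a device with n_bits sensitive bits."""
    return sigma_device_cm2 / n_bits


SIGMA_STRATIX_DEVICE = 6.5e-8     # cm^2, global value quoted in the text
SIGMA_STRATIX_CONF_BIT = 1.1e-14  # cm^2/bit, configuration entry of the table

# Number of monitored bits implied by the two measured values.
implied_bits = SIGMA_STRATIX_DEVICE / SIGMA_STRATIX_CONF_BIT

# Hypothetical run: 65 upsets collected over a fluence of 1e9 protons/cm^2.
example_sigma = cross_section(65, 1e9)
```

The ratio comes out at roughly 5.9 million bits, which is the size of the bit population that the CRC check would have to cover for the two numbers to be mutually consistent.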

No latchups were observed in any of the devices under test, and no SEE was observed in the Flash memories or in the ATMEGA16. All the devices were irradiated up to a total dose of 14 krad. Since the microcontroller is the main controller of the card, it was additionally irradiated up to 20 krad, without recording any error condition.

III. HEAVY IONS IRRADIATION OF THE HPTDC

The SIRAD facility provides ion species with kinetic energies ranging from 30 MeV (H) to 334 MeV (Au). The corresponding surface LET spans from 0.015 MeV cm$^2$/mg to 80 MeV cm$^2$/mg. On the basis of previous irradiation data for memories in 0.25 µm CMOS technology, we selected LET values from 3.9 to 41.7 MeV cm$^2$/mg. Because of the ion energies available at SIRAD, the HPTDC was decapsulated and mounted in a test socket, as shown in Fig. 5.

Figure 3: The test board at PSI

Figure 4: Device cross section for the Altera Stratix (see text for explanation of the square point at 60 MeV)

Figure 5: Test card used at SIRAD. The decapsulated HPTDC is visible on the right.

The test card provided access to the HPTDC through its JTAG interface using an Altera MAX CPLD. We handled the I/O through 10 LEMO connectors mounted on the back side of the card. In this case too, a MAX893L protected the irradiated HPTDC, with the FAULT and ON signals managed through the CPLD.

We irradiated the chip at each LET in three different setups: (a) no hits stored in the internal buffers; (b) readout FIFO half filled (128 hits stored); (c) all L1 buffers and the readout FIFO full (1280 hits stored). The beam intensity was varied so that the typical time between SEU occurrences was much longer than the period of our configuration check (every 100 ms). Parity errors in the readout FIFO and L1 buffers can be detected only when reading back data; data were read back every 10 seconds. No latchups were detected. The total dose integrated by the device was 4.1 krad.
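The reason for keeping the upset interval much longer than the check period can be made quantitative with Poisson statistics: if two upsets fall within the same check interval, they are counted as one. The sketch below assumes Poisson-distributed upsets; the upset rate used in the example is hypothetical, while the 100 ms interval is the one quoted above.

```python
import math


def prob_multiple_upsets(upset_rate_hz, check_interval_s):
    """Poisson probability of two or more upsets within one check interval."""
    mu = upset_rate_hz * check_interval_s
    return 1.0 - math.exp(-mu) * (1.0 + mu)


CHECK_INTERVAL_S = 0.1  # configuration check period from the text

# Hypothetical beam setting giving one upset every 10 s on average.
p_pileup = prob_multiple_upsets(0.1, CHECK_INTERVAL_S)
```

For small values of the product of rate and interval, this probability scales as half its square, so reducing the beam intensity by a factor of ten suppresses undercounting by about a factor of one hundred.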

The global cross section measured for a configuration upset of the HPTDC is shown in Fig. 6. The curve also includes errors detected in the state machine logic, not only in the configuration bits; this contribution is shown by the lower points.

A Weibull fit using the usual formula

$$\sigma = \sigma_{\text{ini}} \left(1 - \exp\left[-\left(\frac{\text{LET} - \text{LET}_{\text{ini}}}{W}\right)^{S}\right]\right)$$

is also shown. The fit for the configuration bits only gave $\sigma_{\text{ini}}=1.3 \times 10^{-7}$ cm$^2$/bit, LET$_{\text{ini}}=4.1$ MeV cm$^2$/mg, $W=27$ MeV cm$^2$/mg, $S=1.7$. These values are in good agreement with values measured for 0.25 µm CMOS memories and shift registers, as in [10].
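The fitted curve can be evaluated directly from these parameters. The following sketch implements the Weibull form with exponent $S$ and the values quoted for the configuration bits; setting the cross section to zero at or below LET$_{\text{ini}}$ is the usual convention and is assumed here.

```python
import math

# Fit parameters quoted in the text for the configuration bits.
SIGMA_INI = 1.3e-7  # cm^2/bit, plateau value of the curve
LET_INI = 4.1       # MeV cm^2/mg, onset of the curve
W = 27.0            # MeV cm^2/mg, width parameter
S = 1.7             # dimensionless shape parameter


def weibull_sigma(let, sigma_ini=SIGMA_INI, let_ini=LET_INI, w=W, s=S):
    """Weibull cross section in cm^2/bit at a given LET in MeV cm^2/mg."""
    if let <= let_ini:
        return 0.0  # assumed convention: no upsets at or below the onset LET
    return sigma_ini * (1.0 - math.exp(-(((let - let_ini) / w) ** s)))


# Evaluate at the two ends of the LET range used in the irradiation.
sigma_low = weibull_sigma(3.9)
sigma_high = weibull_sigma(41.7)
```

With these parameters the lowest LET used (3.9 MeV cm$^2$/mg) lies just below the fitted onset, while at the highest (41.7 MeV cm$^2$/mg) the curve has reached a little over 80% of its plateau.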

Fig. 7 shows the result for the L1 buffers and for the readout FIFO separately. Using the procedure described in [11] and the Weibull fit values, we estimated a cross-section for protons at 60 MeV. The result for the four discussed components of the HPTDC is given in Table 3.

Table 3: SEU cross sections (estimated for protons at 60 MeV) for each HPTDC component

<table>
<thead>
<tr>
<th>Component</th>
<th>$\sigma$ (cm$^2$/bit)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Configuration/control bits</td>
<td>$3.8 \times 10^{-12}$</td>
</tr>
<tr>
<td>Readout FIFO</td>
<td>$6.7 \times 10^{-12}$</td>
</tr>
<tr>
<td>L1 buffers</td>
<td>$2.4 \times 10^{-12}$</td>
</tr>
<tr>
<td>Global</td>
<td>$3.0 \times 10^{-12}$</td>
</tr>
</tbody>
</table>

IV. ERROR RATE ESTIMATIONS

From the cross sections obtained, it is straightforward to estimate error rates for the different components inside the TRM by multiplying them by the above-mentioned maximum foreseen flux of neutrons and charged hadrons above 20 MeV (100 Hz/cm$^2$, including some safety factors). It should be noted, however, that for some memories (the HPTDC L1 buffers and readout FIFOs, as well as the TRM internal event buffers) the rates also depend on the L1/L2 latencies, the L1 rate, the readout time and the occupancy of the detector. We computed these error rates under conservative assumptions (L1 rate = 1 kHz, TOF occupancy of 30%). They are presented in Table 4, for one card and for the whole system, in terms of mean time between failures (MTBF).

Table 4: MTBF for different TRM components

<table>
<thead>
<tr>
<th>Component</th>
<th>MTBF/TRM</th>
<th>MTBF/TOF</th>
</tr>
</thead>
<tbody>
<tr>
<td>Stratix (conf)</td>
<td>43 hours</td>
<td>3.8 min</td>
</tr>
<tr>
<td>INL LUT</td>
<td>6.8 days</td>
<td>14.3 min</td>
</tr>
<tr>
<td>Event buffer</td>
<td>2.9 years</td>
<td>2300 min</td>
</tr>
<tr>
<td>HPTDC (conf)</td>
<td>3.4 years</td>
<td>1.8 days</td>
</tr>
<tr>
<td>L1 buffers</td>
<td>&gt;400 years</td>
<td>260 days</td>
</tr>
<tr>
<td>Readout FIFO</td>
<td>&gt;1600 years</td>
<td>800 days</td>
</tr>
</tbody>
</table>
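As a cross-check of the Stratix entries, the MTBF follows directly from the device cross section and the flux. The sketch below uses only numbers quoted in the text (global cross section, flux with safety factors, number of TRM cards) and assumes one FPGA per card and statistically independent cards.

```python
FLUX_HZ_CM2 = 100.0     # maximum flux including safety factors (from the text)
N_TRM = 684             # TRM cards in the TOF system (from the text)
SIGMA_STRATIX = 6.5e-8  # cm^2, global Stratix cross section (from the text)


def mtbf_seconds(sigma_cm2, flux_hz_cm2=FLUX_HZ_CM2, n_devices=1):
    """Mean time between failures in seconds for n identical devices."""
    return 1.0 / (sigma_cm2 * flux_hz_cm2 * n_devices)


# One FPGA per card is assumed, so the system value scales with the card count.
mtbf_card_hours = mtbf_seconds(SIGMA_STRATIX) / 3600.0
mtbf_tof_minutes = mtbf_seconds(SIGMA_STRATIX, n_devices=N_TRM) / 60.0
```

The two results, just under 43 hours per card and about 3.7 minutes for the whole system, reproduce the Stratix row of the table to within rounding. The same division by 684 links the per-card and system columns of the other rows.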
Three main results are achieved:

1) The error rate in the whole system for the Altera Stratix is clearly unacceptable;

2) Despite the large number of chips deployed, the error rate inside the HPTDC is well under control. During the lifetime of ALICE, some boards will never experience an SEU inside the HPTDC internal memories;

3) Excluding the FPGA, the most frequent source of upsets will be the INL LUT for the HPTDC.

V. CONCLUSIONS AND OUTLOOK

During the 2004 irradiation campaign, we successfully validated almost all the components proposed for the ALICE TOF TDC Readout Module, including memories (RAM and Flash), microcontroller, clock, voltage regulator and current-limited p-switch. A detailed analysis showed that, once the FPGA is excluded, the most frequent upsets will occur in the HPTDC look-up tables. Firmware is under development to trace such upsets and force a reload of the LUT from the Flash memory.

The cross section measured for the tested FPGA is, as far as we know, one of the few existing for Altera 0.13 µm devices. The CRC control mechanism in the Altera Stratix worked very well. Depending on system size, radiation level and application, the use of this device, coupled with a quasi-SEU-immune device for boot and monitoring, such as our microcontroller or a CPLD, is a suitable scheme for LHC applications.

Unfortunately, as the numbers discussed in section IV show, this is not the case for the ALICE/TOF. The error rate is clearly too high, and we are currently moving our design to an Actel ProAsic Plus device, which is substantially immune to SEU [12]. Even if such a solution slightly complicates the design needed to handle the remote programming of the card (and an additional power supply also has to be provided), we will certainly gain greater robustness of the system with respect to SEU.

The irradiation of the HPTDC with heavy ions gave results in good agreement with existing data (irradiation with protons) and also allowed a test of the internal memory buffers. The HPTDC error rate in the whole system is well under control and, as expected, upsets in the configuration bits are the dominant error source.

A global estimation of TRM SEU rate in all its components has been obtained.

ACKNOWLEDGMENTS: we want to thank F. Facio for his continuous collaboration and in particular for providing us with the calculations needed to obtain cross sections for protons from those measured at a given LET. A. Candelori (SIRAD) and W. Hajdas (PSI) are gratefully acknowledged for their valuable support and help with the operation of the beams.

VI. REFERENCES


[12] E. Dénès et al., proceedings of this conference