First measurement of single event upsets in the readout control FPGA of the ALICE TPC detector

ABSTRACT: This paper presents the first measurement of Single Event Upsets (SEUs) in the configuration memory of the readout control FPGA of the ALICE Time Projection Chamber. The measurements were performed during pp collisions at a center-of-mass energy of √s = 7 TeV in the period from May to August 2011. A linear dependence was observed between the number of SEUs and the luminosity seen by ALICE. In addition, the spatial distribution of SEUs has been analysed for the 216 FPGAs installed on the two end-plates of the TPC detector.
KEYWORDS: Radiation damage monitoring systems; Radiation-hard electronics; Front-end electronics for detector readout

Introduction
The ALICE experiment [1] is an experiment at the Large Hadron Collider (LHC), where high-energy beams of particles (lead-lead and proton-proton) are collided. These collisions give rise to a high rate of primary particle production, and the primary particles in turn produce secondaries through hadronic and electromagnetic cascades in the structural elements of ALICE [1]. The resulting particle fluxes may pose a reliability risk to the front-end electronics of the main tracking detector of ALICE, the Time Projection Chamber (TPC) [2]. An important node in the TPC readout electronics is the Readout Control Unit (RCU) [3]. It uses an SRAM-based Xilinx Virtex-II Pro 7 Field Programmable Gate Array (FPGA) for data readout, hereafter called the RCU main FPGA. In total, 216 RCUs are distributed over the two sides ("A" and "C") of the TPC detector: on each side, 18 trapezoidal sectors in the azimuthal direction and 6 partitions in the radial direction, see figure 1. A major drawback of Static Random Access Memory (SRAM) based FPGAs is their susceptibility to radiation-induced effects [4], in particular Single Event Upsets (SEUs). An SEU is a change of the content of a memory cell into an erroneous state caused by an ionizing radiation event. For the front-end electronics of the TPC detector, the main concern is the hadronic cascade, and in particular the high-energy hadrons. These can undergo nuclear interactions with the device material and produce fragments with sufficient ionizing power to induce SEUs.
An SEU in the configuration memory of this FPGA can lead to a variety of undesirable effects and cause the FPGA to malfunction, potentially interrupting the readout of detector data [5]. To improve the radiation tolerance of the RCU main FPGA, a reconfiguration solution has been implemented to detect and correct SEUs in its SRAM configuration memory [6]. This solution runs continuously during normal operation and consequently provides online monitoring of SEUs in the configuration memory of all RCU main FPGAs. This paper presents results from the SEU measurements performed during stable beam conditions in the period from May to August 2011.

The RCU reconfiguration solution
As the RCU main FPGA is in charge of data readout for the TPC detector, it is important that this FPGA remains operational during a data-taking run. A reconfiguration network has therefore been added to correct SEUs in the configuration memory of the RCU main FPGA. The main parts of this network are the Detector Control System board (DCS board) [7] running an embedded computer, an Actel Flash-based support FPGA (Actel ProASIC plus APA075 [8]) and a Flash memory device [9]. The support FPGA acts as the configuration controller, while all required configuration files are stored on the Flash memory device. Figure 2 shows a conceptual schematic of the RCU motherboard, indicating how the data readout path passes through the RCU main FPGA. High-level housekeeping tasks are performed by the DCS embedded computer. It is connected to the reconfiguration network and the RCU main FPGA through the DCS bus and offers remote access to the reconfiguration controller. However, it is involved neither in the reconfiguration process nor in the readout data path.
The reconfiguration solution is based on partial reconfiguration [10], which allows a subset of the configuration memory to be read back and checked for corrupted bit locations. If an erroneous bit value is detected, it is corrected by rewriting the correct value to this bit location. The smallest portion of data that can be read from or written to the configuration memory in one operation is called a frame. The reconfiguration solution is therefore referred to as Frame-by-frame Readback, Verification and Correction (FRVC). One cycle of FRVC consists of sequentially reading back every frame in the configuration memory once. FRVC can be run as a single cycle or in continuous mode. The time to execute one cycle of FRVC has been measured to be 150 ms, while rewriting one frame of data has been measured to take 180 µs [6]. This gives a continuous operating frequency of approximately 7 Hz (1/150 ms ≈ 6.7 Hz). An important feature of this solution is that it runs in the background without interrupting the operation of the RCU main FPGA, so SEUs in the configuration memory can be corrected during normal operation.
Scrubbing [11] is an alternative to FRVC. Like FRVC, it reconfigures the RCU main FPGA without interrupting the operation of the device. However, in contrast to FRVC, it reconfigures the full device in one operation and can neither address individual frames nor count the number of SEUs. FRVC is therefore the preferred correction mechanism, as it also provides online monitoring of SEUs. Figure 3 shows a conceptual block diagram of the configuration controller in the RCU support FPGA. During the FRVC process, each configuration frame is read back from the RCU main FPGA and compared bit by bit with the original frame stored on the Flash memory. If they differ, the support FPGA reconfigures only the frame containing the corrupted data.
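The following Python sketch illustrates the FRVC loop described above by modelling one cycle. It is a toy simulation under stated assumptions and is not the firmware of the RCU support FPGA. `read_frame` and `write_frame` are hypothetical callbacks that stand in for readback and partial reconfiguration of a single frame. `golden` stands in for the reference frames stored on the Flash memory. Counting one SEU per corrected frame is also an assumption made only for this sketch.

```python
from typing import Callable, List

# Hypothetical stand-ins for the hardware operations (not a real API):
#   read_frame(i)        -> contents of configuration frame i read back from the FPGA
#   write_frame(i, data) -> rewrite frame i by partial reconfiguration
ReadFn = Callable[[int], bytes]
WriteFn = Callable[[int, bytes], None]


def frvc_cycle(read_frame: ReadFn, write_frame: WriteFn, golden: List[bytes]) -> List[int]:
    """Run one Frame-by-frame Readback, Verification and Correction cycle.

    Every frame is read back once and compared with its golden copy; only
    frames that differ are rewritten. Returns the ids of corrected frames.
    """
    corrected = []
    for frame_id, reference in enumerate(golden):
        if read_frame(frame_id) != reference:   # verification against the golden frame
            write_frame(frame_id, reference)    # correction of this frame only
            corrected.append(frame_id)
    return corrected


# Toy configuration memory: 936 frames of 424 bytes each, all zeros.
config = [bytearray(424) for _ in range(936)]
golden = [bytes(frame) for frame in config]


def read(i: int) -> bytes:
    return bytes(config[i])


def write(i: int, data: bytes) -> None:
    config[i][:] = data


config[17][5] ^= 0x08                   # inject a single-bit upset into frame 17
seu_counter = 0
fixed = frvc_cycle(read, write, golden)
seu_counter += len(fixed)               # assumption: one SEU counted per corrected frame
print(fixed, seu_counter)               # [17] 1
print(frvc_cycle(read, write, golden))  # [] (memory is clean again)

CYCLE_TIME_S = 0.150                    # measured FRVC cycle time [6]
print(f"{1 / CYCLE_TIME_S:.1f} Hz")     # 6.7 Hz
```

Scrubbing would correspond to calling `write_frame` for every frame unconditionally. That approach never compares the read-back data with the reference, which is why it cannot count SEUs.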
A frame in the Xilinx Virtex-II Pro FPGA contains 424 bytes of data, organized as a 1-bit-wide vertical column from the top to the bottom edge of the FPGA. For each FPGA, 936 frames are read back and checked during one cycle of FRVC. These are the frames associated with the configuration logic blocks, the input/output blocks, the global clock resources and the interconnects of the embedded user memory.
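The frame parameters above determine how much configuration data one FRVC cycle checks. The short calculation below uses only the numbers quoted in the text. Note that the average time per frame is derived from the 150 ms cycle time; it is not a separately measured quantity.

```python
FRAMES_PER_CYCLE = 936     # frames checked per FRVC cycle
BYTES_PER_FRAME = 424      # bytes per Virtex-II Pro frame
CYCLE_TIME_S = 0.150       # measured time for one FRVC cycle

bytes_checked = FRAMES_PER_CYCLE * BYTES_PER_FRAME
bits_checked = 8 * bytes_checked
# Average readback-and-verify time per frame (derived, not measured)
time_per_frame_us = CYCLE_TIME_S / FRAMES_PER_CYCLE * 1e6

print(bytes_checked, bits_checked)    # 396864 3174912
print(f"{time_per_frame_us:.0f} us")  # 160 us
```

The result of about 3.2 Mbit per FPGA refers only to the 936 frames checked in each cycle.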

Procedure
A dedicated SEU counter has been implemented to keep track of the number of detected and corrected SEUs. The SEU counter is continuously checked by the DCS embedded computer, and whenever a change is detected in this or other reconfiguration registers, the new values are time-stamped and stored in the experiment database. The database therefore effectively stores the cumulative number of SEUs until a reset is requested by the DCS embedded computer. Due to the limited resources (number of registers) in the RCU support FPGA, only the id of the last corrupted frame is stored. However, given the relatively low SEU rate, the probability of two or more SEUs occurring between consecutive database updates is very low. It is therefore also possible to study the distribution of SEUs over the frames.
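The bookkeeping described above can be sketched in Python. The record layout (`CounterUpdate`) and the function names are hypothetical. The sketch assumes only what the text states: the database holds time-stamped cumulative counter values together with the id of the last corrupted frame. It also assumes that a drop in the counter indicates a reset.

```python
from typing import Dict, List, NamedTuple, Tuple


class CounterUpdate(NamedTuple):
    """Hypothetical database record, written when a register change is seen."""
    timestamp: float  # time of the update in seconds
    seu_count: int    # cumulative SEU counter value
    last_frame: int   # id of the last corrupted frame


def seu_increments(updates: List[CounterUpdate]) -> List[Tuple[float, int, int]]:
    """Turn cumulative counter values into (timestamp, new_seus, last_frame).

    A decrease of the counter is treated as a reset, after which the new value
    is itself the number of SEUs since the reset. The counter is assumed to
    start from zero at the beginning of the data set.
    """
    increments = []
    previous = 0
    for u in updates:
        new = u.seu_count - previous if u.seu_count >= previous else u.seu_count
        if new > 0:  # updates caused by other registers give new == 0
            increments.append((u.timestamp, new, u.last_frame))
        previous = u.seu_count
    return increments


def frame_histogram(increments: List[Tuple[float, int, int]]) -> Dict[int, int]:
    """Count SEUs per frame, using only updates with exactly one new SEU,
    since only the id of the last corrupted frame is stored."""
    hist: Dict[int, int] = {}
    for _, new, frame in increments:
        if new == 1:
            hist[frame] = hist.get(frame, 0) + 1
    return hist


updates = [
    CounterUpdate(0.0, 0, -1),
    CounterUpdate(10.0, 1, 412),
    CounterUpdate(25.0, 2, 87),
    CounterUpdate(40.0, 0, 87),   # counter reset requested by the DCS computer
    CounterUpdate(55.0, 1, 903),
    CounterUpdate(70.0, 3, 5),    # two SEUs between updates: one frame id is lost
]
inc = seu_increments(updates)
print(sum(n for _, n, _ in inc))  # 5
print(frame_histogram(inc))       # {412: 1, 87: 1, 903: 1}
```

Updates with more than one new SEU still contribute to the total count. They are left out of the frame histogram, which is the limitation the text describes.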

Measurement period and conditions
The SEU measurement results presented in this paper are based on data collected during pp collisions at a center-of-mass energy of √s = 7 TeV in the period from May to August 2011. This period corresponds to LHC fills 1783 to 2010. Of these fills, only those in which the integrated luminosity during stable beam conditions exceeded 1 nb⁻¹ have been included in the analysis. The luminosity information was taken from the LHC Programme Coordination (LPC) web pages [12], which present the delivered luminosity as reported by the experiments after each fill containing stable beams [13].
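The fill selection can be expressed as a simple filter. In the sketch below, the `Fill` record, its field names and the example values are all hypothetical; only the 1 nb⁻¹ threshold comes from the text. An SEU is assigned to a fill when its time stamp falls inside that fill's stable-beam window.

```python
from typing import List, NamedTuple, Tuple


class Fill(NamedTuple):
    number: int
    stable_start: float  # start of stable beams (s)
    stable_end: float    # end of stable beams (s)
    lumi_nb: float       # delivered integrated luminosity (nb^-1)


MIN_LUMI_NB = 1.0  # fills must exceed 1 nb^-1 to be included


def seus_per_fill(fills: List[Fill], seu_times: List[float]) -> List[Tuple[int, float, int]]:
    """Return (fill number, luminosity, number of SEUs) for the selected fills."""
    selected = []
    for f in fills:
        if f.lumi_nb <= MIN_LUMI_NB:
            continue  # below threshold: excluded from the analysis
        n = sum(1 for t in seu_times if f.stable_start <= t < f.stable_end)
        selected.append((f.number, f.lumi_nb, n))
    return selected


# Hypothetical example data (not measured values)
fills = [
    Fill(1, 0.0, 36000.0, 40.0),
    Fill(2, 50000.0, 52000.0, 0.6),
    Fill(3, 60000.0, 90000.0, 25.0),
]
seu_times = [100.0, 20000.0, 51000.0, 61000.0, 89999.0, 95000.0]
result = seus_per_fill(fills, seu_times)
print(result)  # [(1, 40.0, 2), (3, 25.0, 2)]
```

If a single database update records several new SEUs, its time stamp would appear in `seu_times` once per SEU.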

Luminosity dependence
During the measurement period from May to August, a total of 1552 SEUs were detected and corrected in the configuration memory of the 216 RCU main FPGAs. The total delivered luminosity for the corresponding analyzed fills was 2840 nb⁻¹. A comparison between the evolution of the luminosity and the number of SEUs is shown in figure 4. The two curves follow each other remarkably well. This strongly indicates that the detected SEUs are indeed caused by radiation and not by, e.g., some other electrical problem. Assuming a linear dependence, figure 5 shows the number of detected SEUs as a function of the integrated luminosity per fill. For the lower values the dependence is less clear, probably due to poor statistics. However, with an increasing number of SEUs per fill, and consequently improved statistics, the linear dependence becomes clearer. The linear fit indicates a proportionality constant of approximately 0.5 SEUs per nb⁻¹. This is consistent with the ratio of the totals, 1552/2840 ≈ 0.55 SEUs per nb⁻¹.
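The proportionality constant can be estimated in more than one way. The sketch below shows two estimators: a least-squares fit of a line through the origin, N = k·L, and the ratio of the quoted totals. The text does not state whether the fit in figure 5 was constrained through the origin, so the fit shown here is an assumption. The per-fill values are hypothetical.

```python
from typing import Sequence


def fit_through_origin(lumi: Sequence[float], seus: Sequence[float]) -> float:
    """Least-squares slope k for seus ≈ k * lumi with no intercept:
    k = sum(x*y) / sum(x*x)."""
    sxx = sum(x * x for x in lumi)
    sxy = sum(x * y for x, y in zip(lumi, seus))
    return sxy / sxx


# Hypothetical per-fill values (nb^-1, SEUs) for illustration only
k = fit_through_origin([10.0, 20.0, 40.0], [5, 10, 20])
print(k)  # 0.5

# Ratio of the totals quoted in the text: 1552 SEUs over 2840 nb^-1
ratio = 1552 / 2840
print(f"{ratio:.2f} SEUs per nb^-1")  # 0.55 SEUs per nb^-1
```

The ratio of totals is a luminosity-weighted mean of the per-fill ratios. It is therefore dominated by the high-luminosity fills, where the linear dependence is clearest.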
Depending on how a given configuration memory cell is used by the design, an SEU may or may not result in a detectable malfunction of the system. In fact, only 1 out of every 10-40 configuration memory bits is utilized in a typical design [14]. When studying the impact of SEUs, Xilinx refers to this number as the single event upset probability impact (SEUPI). Dividing the SEU rate by the SEUPI factor therefore gives the expected rate of functional failures. This scaling factor, which is highly dependent on the implemented design, can be derived through accelerated beam tests or fault injection tests. For the RCU main FPGA, irradiation tests have only been performed for a simplified test design, such as a shift register [6]. However, fault injection studies are currently being performed on the design that is now installed and fully operational in the RCU main FPGAs of the TPC detector. The results of this study are expected to be published in the near future, once the analysis is completed. When the SEUPI scaling factor is unknown, Xilinx recommends a conservative factor of 10 [15]. With a proportionality constant of approximately 0.5 SEUs per nb⁻¹, a run with an integrated luminosity of 100 nb⁻¹ would produce about 50 SEUs in total. With the conservative SEUPI factor of 10, this corresponds to about 5 functional failures. The measurement results presented in figure 5 therefore show that radiation-induced interruptions of the data readout can be expected as runs reach integrated luminosities of the order of 100 nb⁻¹ and beyond.
Figure 6(a) shows a histogram of how SEUs are distributed in time within a fill. As each fill has a different length, the time stamp of each SEU is normalized to the length of the corresponding fill. The apparently random distribution in time is a good indication of stable beam conditions during the fills. Since the instantaneous luminosity decreases slightly during a fill, one might expect a similar trend for the SEUs. However, a significant improvement in statistics is probably needed to observe such a trend.
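Both the SEUPI scaling and the time normalization used for figure 6(a) can be written down explicitly. The sketch below follows the text in assuming about 0.5 SEUs per nb⁻¹, summed over all 216 RCU main FPGAs, and the conservative SEUPI factor of 10. The function names are illustrative.

```python
def expected_failures(lumi_nb: float, seus_per_nb: float = 0.5, seupi: float = 10.0) -> float:
    """Expected functional failures (all 216 RCU main FPGAs together)
    for a run with integrated luminosity lumi_nb, in nb^-1."""
    expected_seus = seus_per_nb * lumi_nb
    return expected_seus / seupi


def normalized_time(t: float, fill_start: float, fill_end: float) -> float:
    """Position of an SEU within its fill, from 0 (start) to 1 (end)."""
    return (t - fill_start) / (fill_end - fill_start)


print(expected_failures(100.0))              # 5.0 (conservative factor of 10)
print(expected_failures(100.0, seupi=40.0))  # 1.25 (1 in 40 bits utilized)
print(normalized_time(30.0, 0.0, 120.0))     # 0.25
```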
In section 3.1 it was mentioned that, due to the low SEU rate, it is also possible to study how the SEUs are distributed over the frames of the configuration memory. The result for the measurements presented in this paper is shown in figure 6(b), where the SEUs appear to be equally distributed over the frames. This has also been observed during irradiation tests [16] and is expected if the FPGA is exposed to a uniform radiation field. As some frames contain bit locations with non-existent memory [10,16], e.g. frames close to the embedded PowerPC, a perfectly equal distribution is not expected. Again, this effect is not visible in figure 6(b) due to insufficient statistics.
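One way to quantify "equally distributed" is a Pearson chi-square statistic against a uniform expectation. The paper does not perform this test; the sketch below is an illustration only. It assumes that every frame is equally sensitive, which, as noted above, does not hold exactly for frames with non-existent memory. With 1552 SEUs spread over 936 frames, the expected count is only about 1.66 per frame. The chi-square approximation is therefore rough, and in practice frames would need to be grouped.

```python
import math
from typing import Sequence, Tuple


def chi2_uniform(counts: Sequence[int]) -> Tuple[float, int, float]:
    """Pearson chi-square of per-frame counts against a uniform distribution.

    Returns (chi2, degrees of freedom, z) where z = (chi2 - dof) / sqrt(2 dof)
    is a rough large-dof measure of the deviation from uniformity.
    """
    n = sum(counts)
    expected = n / len(counts)
    chi2 = sum((c - expected) ** 2 / expected for c in counts)
    dof = len(counts) - 1
    return chi2, dof, (chi2 - dof) / math.sqrt(2 * dof)


print(round(1552 / 936, 2))      # 1.66 expected SEUs per frame
print(chi2_uniform([2, 2, 2]))   # (0.0, 2, -1.0)
```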

Spatial distribution
The spatial dependence was then analyzed for the full data set, and the results are shown in figure 7. Each point is normalized to the total number of SEUs detected during the measurement period. The top left plot indicates a radial dependence: the measured number of SEUs (open squares) decreases from the innermost to the outermost partitions. Assuming that SEUs are induced by high-energy hadrons, good agreement is found when comparing the measurements with Monte Carlo simulation results from [16]. The green triangles show the relative radial distribution of the high-energy hadron fluence obtained from simulations of Pb-Pb collisions. As for the number of SEUs, each point is normalized to the total fluence summed over all partitions. A comparison of absolute values has not been attempted, since simulations of pp collisions were not performed in [16]. Assuming that the fluence scales with the number of nucleons/participants in the collisions, the spatial distribution is expected to be roughly similar. The relative comparison shown in the top left plot of figure 7 is thus reasonable. A radial dependence similar to the one measured in the configuration memory of the RCU main FPGA has also been analytically predicted and measured in [17]. There, SEUs were measured in the user memory of ASICs mounted on each front-end card of the TPC detector (see figure 1). For the azimuthal dependence, the lower plot in figure 7 shows no correlation between the number of SEUs and the sector position. Again, this is similar to the results observed in [17]. On the other hand, the asymmetry between the A- and C-side, shown in the top right plot of figure 7, is opposite to what is predicted by simulations and measured in [17]. The reason for this is not yet fully understood and requires further investigation.
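The normalization used in figure 7 can be sketched as follows, together with a statistical uncertainty for the A/C-side asymmetry. The counts below are hypothetical placeholders, not the measured values. The binomial error treats the total number of SEUs as fixed and includes statistical fluctuations only.

```python
import math
from typing import Dict, Tuple


def normalize(counts: Dict[int, int]) -> Dict[int, float]:
    """Fraction of all SEUs in each bin (e.g. radial partition 0 = innermost)."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


def asymmetry(n_a: int, n_c: int) -> Tuple[float, float]:
    """A/C asymmetry (N_A - N_C) / (N_A + N_C) and its binomial uncertainty
    sqrt((1 - A^2) / N), with N = N_A + N_C held fixed."""
    n = n_a + n_c
    a = (n_a - n_c) / n
    return a, math.sqrt((1 - a * a) / n)


# Hypothetical counts per radial partition (innermost = 0)
partitions = {0: 40, 1: 30, 2: 15, 3: 8, 4: 5, 5: 2}
print(normalize(partitions)[0])   # 0.4

a, sigma = asymmetry(45, 55)      # hypothetical A- and C-side totals
print(f"A = {a:.2f} +/- {sigma:.3f}")  # A = -0.10 +/- 0.099
```

An asymmetry of this kind is statistically meaningful only if |A| is several times its uncertainty. The same `normalize` step applies to the simulated fluence per partition, which is what makes the relative comparison possible.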

Conclusion
This paper has presented the first quantitative measurements of single event upsets in the configuration memory of the FPGA in charge of data readout for the TPC detector. A linear dependence was found between the number of single event upsets measured and the integrated luminosity delivered to ALICE during stable beam operation. This result can be used to predict single event upset rates for future runs with similar operating conditions. Combined with the results from the ongoing fault injection studies, this will make it possible to better foresee the effect of single event upsets. It will thus allow a more accurate prediction of how often the data readout may be interrupted due to radiation. As LHC operation continues, more SEU data will be collected and added to improve the analysis.