The LHCb silicon tracker: running experience

The LHCb Silicon Tracker is part of the main tracking system of the LHCb detector at the LHC. It measures very precisely the trajectories of particles coming from the interaction point in the region of high occupancy around the beam axis. It covers the full acceptance angle in front of the dipole magnet in the Tracker Turicensis station and the innermost part around the beam axis in the three Inner Tracker stations downstream of the magnet. The Silicon Tracker covers a sensitive area of 12 m² using silicon micro-strip sensors with very long readout strips. We report on the running experience of the experiment. Focussing on electronic and hardware issues, we describe some of the lessons learned and pitfalls encountered after three years of successful operation.


Overview
The LHCb Silicon Tracker [2] is part of the LHCb main tracking system and provides data for the region of high track densities. It is composed of the TT (Tracker Turicensis) tracking station in front of the main dipole magnet, covering the full acceptance angle of the experiment (from 10 mrad to 300 mrad), and the IT (Inner Tracker) stations after the magnet, covering the region directly around the beam pipe. The charge signal of the silicon strip detectors is amplified by the Beetle [3] readout chip, and the analogue signals are transmitted via copper cables to the Service Boxes, which are located outside the acceptance area. The Service Boxes contain the Digitizer Boards, on which the analogue signals from the Beetle front-end chips are digitized and encoded into a Gigabit data stream for transmission via VCSEL (vertical-cavity surface-emitting laser) diodes and 120 m of multi-ribbon optical fiber to the counting house. In the counting house, the optical ribbons are connected directly to TELL1 [4] readout preprocessor boards equipped with two multi-channel optical receiver cards. The slow control system is implemented using SPECS [5] communication to the Control Board [2, pg. 63], also in the Service Boxes, and the clock and fast control signals are distributed from the TFC [6] modules via optical links. A full diagram of the system is displayed in figure 1.
Final results from the production of the detector modules and the readout electronic boards, as well as some other issues encountered during early commissioning, were already presented at TWEPP 2008 [7]. After running successfully for three years, we report on the most frequently recurring problems and the lessons learned, with particular focus on hardware and electronic issues as well as system operation and limitations.

VCSEL diodes
The Silicon Tracker uses VCSEL diodes in the readout system to transmit the data from the detector to the TELL1 readout preprocessor boards through optical fibers. In production, those VCSEL diodes were wave soldered to the digitizer boards. As a result, approximately 30% of the 2128 VCSEL diodes in the Silicon Tracker stations (1120 diodes in TT and 1008 diodes in IT) showed low output power. These were identified and replaced [7]. Since then, a further 2% of the VCSEL diodes have had to be replaced due to ageing or malfunction. These diodes stopped transmitting and had to be excluded from operation until they could be replaced. A single broken VCSEL means the loss of the data of one Beetle readout chip out of 1120 in TT or one out of 1008 in IT. The effect on the performance is difficult to quantify, as it depends on the location and the type of the sensor, but the track reconstruction has not been affected by this problem. No significant correlation with environmental conditions (temperature and humidity), peak luminosity or operation has been observed. Figure 2 shows the accumulated number of VCSELs replaced since the start of commissioning in 2009. Due to technical constraints, the VCSEL diodes are not exchanged in situ; in practice the whole digitizer board [2, pg. 22] is replaced. In the process, multiple cables need to be unplugged, sometimes including the optical fibre carrying the clock signal. The easy access to the TT digitizer boards is an important advantage: broken VCSEL diodes in TT can be replaced during the technical stops, every two months. For IT, the broken VCSEL diodes can only be replaced during the long winter technical stop once a year, as the detector frames need to be opened to access the digitizer boards. This exchange process is responsible for the clock synchronization problem, which had been observed since the start of operation but was only understood this year, as explained in the next section.
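As a rough illustration of the per-diode data loss quoted above, the following sketch computes the fraction of Beetle readout chips lost for a given number of broken VCSEL diodes. It assumes one VCSEL link per Beetle chip, as the text implies; the function name `lost_fraction` and the dictionary `BEETLES` are illustrative and not part of any LHCb software.

```python
# Number of Beetle readout chips (one VCSEL link each) per subdetector,
# as quoted in the text.
BEETLES = {"TT": 1120, "IT": 1008}


def lost_fraction(subdetector: str, broken_vcsels: int) -> float:
    """Fraction of Beetle chips lost, assuming one chip per broken VCSEL."""
    total = BEETLES[subdetector]
    if not 0 <= broken_vcsels <= total:
        raise ValueError("broken_vcsels must be between 0 and the number of links")
    return broken_vcsels / total


if __name__ == "__main__":
    for name in BEETLES:
        fraction = lost_fraction(name, 1)
        print(f"{name}: one broken VCSEL loses {fraction:.3%} of the Beetle chips")
```

A single broken diode thus corresponds to roughly 0.09% of the TT readout chips or 0.10% of the IT readout chips. This is only a channel count and says nothing about the impact on tracking, which depends on where the affected sensor sits.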

Synchronization errors
Since the start of operation in May 2010, the slow-control monitoring for TT presented unphysical readings for temperature sensors and low-voltage regulators connected to the SPECS [5] mezzanine DCU [8]. The communication server was able to access most of the hardware connected to the SPECS slave mezzanine, but not the DCU, giving back an error code and a wrong reading. This was always recovered after a power cycle of the Control Board [2, pg. 63] done automatically by the control system safety tree [2, pg. 228]. Figure 3a shows this in detail.
It was not possible to reproduce the problem at will. As the problem was believed to affect only the DCU, an automatic reset of the DCU was implemented, but it did not solve the problem. In March 2012 the problem also appeared in the IT subdetector, at a much higher rate, as shown in figure 3b.
The start date of the problem in IT is unknown, as it was masked by a software problem with the SPECS server, but it most likely started after January 2012, when multiple Digitizer Boards were exchanged to replace dead VCSELs. The connectors of the optical fibers were not properly protected during the intervention and some dirt accumulated in them, causing occasional synchronization errors.
Thanks to this, a relationship was established between the SPECS DCU problem and other alarms from devices connected through the SPECS as well as the occurrence of incomplete events (loss of data from the Beetle). The origin of the problem was traced back to the connection of the TFC fibers delivering the clock and the trigger commands to the Control Board.
The Control Board contains the SPECS slave mezzanine and other slow control hardware and is responsible for the clock fine delay and the decoding of the fast control commands. Unfortunately, the Control Board is designed to use the clock delivered by the TFC as the clock source for all the hardware on the board (except the internal registers of the SPECS mezzanine), which made it very difficult to identify the clock distribution as the source of the problem.
Once this was understood, it was concluded that the unphysical readings and the incomplete events were all caused by a deficient connection of the TFC signal, leading to synchronization errors.
An automatic recovery software workaround, triggered by the SPECS DCU error code, was put in place on 19 April 2012 to minimize the effect of the synchronization loss, as there is no easy access to the hardware until February 2013. The connectors of the optical fibers will then be carefully cleaned, which is hoped to fix the problem.
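The logic of such a workaround can be sketched as follows. This is a minimal illustration, not the code running in the experiment: the recovery action itself is not described here, so it is passed in as a hypothetical `recover` callback, and the convention that an error code of 0 means a valid reading, the `DcuReading` type and the retry limit are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DcuReading:
    value: float      # monitored quantity, e.g. a temperature
    error_code: int   # assumption for this sketch: 0 means a valid reading


def check_and_recover(
    read_dcu: Callable[[], DcuReading],
    recover: Callable[[], None],
    max_attempts: int = 3,
) -> bool:
    """Run recover() while the DCU reports an error, up to max_attempts times.

    Returns True once a valid reading is obtained, False if the error
    persists after max_attempts recovery actions.
    """
    attempts = 0
    while read_dcu().error_code != 0:
        if attempts == max_attempts:
            return False
        recover()  # hypothetical recovery action, e.g. a resynchronization
        attempts += 1
    return True
```

Bounding the number of attempts keeps a persistent hardware fault from causing an endless recovery loop; in that case the function returns False so that the condition can be escalated to an operator.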

HV peaks
Starting in October 2010, various TT detector biasing voltage channels had peaks in the current that sometimes reached 1 mA while taking data, and could reach 5 mA with no beam, leading several of these HV channels to trip.
When ramping the HV channels in the presence of beam, no current peak is observed until the instantaneous luminosity increases. This correlation can be observed in figure 4b: the current increases moderately fast just after an increase in the instantaneous luminosity, and decreases slowly. This effect is observed mostly in layers X1 (TTaX) and X2 (TTbX), as shown in figure 5a; those are the layers closest to the detector box walls.
When ramping the HV channels with no beam, as shown in figure 4a, a slightly different effect is observed. Just as the HV channel is switched on, the current ramps up very fast; it then decreases, quickly at first and more slowly after the first few seconds, until it finally reaches its nominal value. The peaks observed are higher in this scenario. An improvement through training was sometimes observed, but the evidence was not conclusive. When ramping with no beam, some channels in the U (TTaU) and V (TTbV) layers also show similar behaviour. Several tests have been performed to try to fix or mitigate the problem. Lowering the bias voltage from 350 V to 300 V reduced the maximum current, while lowering the temperature of the sensors from 5 °C to -15 °C made things worse. This last test confirmed that the current does not originate in the bulk of the silicon sensor, since bulk leakage current decreases at lower temperature. Based on the correlation observed with the luminosity ramp, an attempt was made to ramp the luminosity in small steps, which substantially reduced the peak currents.
Some effort was put into shielding the sensors and the rails holding them. In a first stage, in May 2011, kapton insulation was installed on the front and back walls of the detector box, confining the problem to the X layers. A second version of this electrostatic shield was installed in May 2012, adding a foil of aluminium and mylar to the existing kapton insulation. The aluminium side was then also connected to high voltage.
The problem disappeared in mid June 2012 (figure 5b) after a change in the beam filling scheme and a change in the operation mode of the detector; whereas before the HV was switched off just after the beam was dumped, it now stays fully powered until the start of the next fill.
The effect of these various attempts to fix the issue is clearly visible in the evolution of the maximum current measured in all the physics fills, as shown in figure 6.
Unfortunately, the origin of this problem is not yet understood. Peaks can still be observed when LHCb attempts to run above an instantaneous luminosity of 5 × 10³² cm⁻² s⁻¹ (the design luminosity for LHCb is 2 × 10³² cm⁻² s⁻¹), but further testing has been considered unnecessary, as it would incur a loss of good physics data and the detector is performing well.

Cooling reliability issues
The IT and TT detectors have independent cooling systems to cool down the silicon sensors. Once a year, maintenance work is carried out on each of the cooling systems, during which the silicon sensors could not be cooled. To avoid this, and to cover the case of one cooling plant stopping, the two systems were interconnected at the level of the C6F14 circuit. In this way, one cooling plant can cool both detectors for short periods of time. The design and development of this cooling solution was specific to the tracking stations of the LHCb and CMS experiments, and both have developed similar problems.
The first problem appeared very early in the IT cooling plant. Although several modifications have been introduced in the system, as shown in figure 7, the problem persisted and also appeared in TT a year ago. The oil present in the system, needed for the compressor, accumulates in the evaporation heat exchanger from R404A to C6F14. After some time, the concentration of oil in the heat exchanger is so high that the C6F14 starts to warm up. To correct this behaviour, the cooling plant has to be stopped regularly, from twice a month up to three times a week, for several minutes to let the oil drain out of the heat exchanger and be collected in the oil separator. During this short intervention, the C6F14 coolant still circulates on the detector side.
The second problem is located in the C6F14 circuit. There is some dirt mixed in the C6F14, whose origin is still not understood, although several hypotheses have been considered. Unfortunately, the few filters placed along the circuit are not able to remove these impurities, which at some point accumulate in the flow regulation valves, thus reducing the flow of C6F14 cooling the sensors. This appeared in both IT and TT, but the origin of the debris is believed to be located in only one of the circuits. Its presence in both systems is easily explained as a consequence of the cross-feeding interconnection between the two C6F14 circuits. A clear correlation exists between the appearance of this problem in the TT C6F14 circuit and the interconnection of the two C6F14 circuits for a variable amount of time. The problem was cured by closing and opening the flow regulation valves.

Conclusions
The Silicon Tracker is in good shape and has behaved extremely well during operation in the last three years, but some minor issues arose and had to be dealt with. Diagnosis and action can sometimes be tricky, and a good design and placement of the front-end and control electronics can prove advantageous.
The VCSEL diode failures were not foreseen, but as most of the front-end and readout electronics are outside the detector frame, this problem is not a major concern: the non-working channels can be recovered every two months for TT or every year for IT.
Sometimes, while fixing one problem, other problems may arise due to the human factor. The VCSEL exchange has been the source of the synchronization problems observed in the slow control system, which are now mitigated by a software workaround but not completely fixed, as access to the electronics is not always allowed.
Some TT high voltage channels showed peaks. The problem is not yet understood but after some modifications the HV is behaving correctly in the presence of beam.