VCSEL reliability in ATLAS and development of robust arrays

Vertical Cavity Surface Emitting Lasers (VCSELs) are used for optical transmission for all ATLAS detectors. There have been various problems with the reliability of the VCSELs used in some but not all of the sub-systems. This paper briefly reviews the design of VCSELs and the potential reliability issues. Some of the techniques used in Failure Analysis (FA) are explained with examples from studies of devices that failed during ATLAS operation. Estimates of the failure rates from the different systems are given and the implications for detector operations are discussed. The attempts to diagnose the causes of the problems are described. The mitigation strategies for the two sub-systems most seriously affected are described. Finally some conclusions about VCSEL reliability and the outlook for ATLAS operation will be given.


Introduction
VCSELs are used for the readout and control of all ATLAS detectors [1]. There have been various problems with the reliability of the VCSELs used in some but not all of the sub-systems. In the most seriously affected systems the failures have been end-of-life (wearout) failures rather than random failures. For the Liquid Argon calorimeter, on-detector VCSEL failures cause loss of data from a region of the detector, and replacements can only be installed after a detector opening during a shutdown. For this system, therefore, even a low failure rate is a cause for concern. For the SCT and Pixel off-detector VCSELs, access for replacements is possible, so the loss of data so far has been negligible, but there is concern for the long-term operation of the detectors. Section 2 gives a brief review of VCSELs and discusses some of the potential reliability concerns. Section 3 describes some of the techniques used in VCSEL Failure Analysis (FA), with examples from studies of devices that failed during ATLAS operation. Estimates of the failure rates or mean time to failure, and the nature of the failures in the different systems, are given in section 4. The attempts to diagnose the cause(s) of the failures from FA and from controlled experiments are also discussed. The mitigation strategies for the two sub-systems most seriously affected are described in section 5. The developments that have enabled some recent VCSEL arrays to have increased resistance to humidity are briefly discussed. Finally, section 6 gives some conclusions about VCSEL reliability and the outlook for ATLAS operation.

Advantages and disadvantages of VCSELs
VCSELs are widely used in multimode optical links for distances up to about 1 km. The main advantages of VCSELs over Edge Emitting Lasers (EELs) are cost (because they can be tested at the wafer level) and low power consumption (because of the low laser threshold). To lower the laser threshold, an insulating layer funnels the current through a small aperture, which provides current confinement and some wave guiding. The insulating layer can be made with either a proton implant or an oxide implant. The first VCSELs were produced using proton implants, but the smaller diameter apertures required for newer, faster VCSELs are made with oxide layers. As VCSELs are made from GaAs substrates they potentially suffer from the growth of Dark Line Defects (DLDs) [2]. The very high minority carrier concentrations in semiconductor lasers allow any defects to grow [3]. These defects act as centres for non-radiative transitions, so any defects in the active volume of the device will increase the laser threshold. Defects in the Distributed Bragg Reflector (DBR) mirror tend to grow slowly towards the active region, but once they reach the active region they grow rapidly, resulting in a rapid failure of the device. Reliable VCSELs therefore require very high quality starting material and extreme care over all environmental factors which can degrade the device. The first oxide VCSELs were designed for operation in a dry environment. Single channel VCSELs are usually mounted in hermetic TO can packages. Different methods for packaging arrays have been developed but there is no practical hermetic package for arrays (the array package used for the SCT and Pixel detectors is described in [5]). The lack of hermetic packaging led to poor reliability for VCSEL arrays, while excellent reliability was obtained for single channel VCSELs. Developments by VCSEL manufacturers have since led to successful arrays that can be operated with good reliability in non-hermetic packages (see section 5.2).

Failure analysis
In order to fabricate reliable VCSELs it is essential that causes of infant mortalities and random failures can be understood and be eliminated or reduced to an acceptable level. After random failures have been reduced to an acceptable level, accelerated aging tests are performed to validate the reliability of the VCSELs for many years of operation at standard operating conditions. A wide range of powerful diagnostic techniques are available and this paper gives a brief review of some of the methods that were used by ATLAS.

IV analysis
The simplest diagnostic test that can be performed is a measurement of the current versus voltage (IV) characteristic of the VCSEL. Forward IV curves for working channels are very uniform and damaged channels show a characteristic shift. However, this shift gives no indication as to the cause of the damage. Reverse IV curves are very sensitive to Electrostatic Discharge (ESD), or the very closely related effect of Electrical Over Stress (EOS), in that they show greatly enhanced reverse leakage below the breakdown threshold. The reverse IV curve can therefore be used as a powerful indicator of very low level ESD, but it cannot be used to prove unambiguously that ESD was the cause of the damage.

Electroluminescence
Electroluminescence (EL) involves imaging the VCSEL when it is operated below laser threshold. A working channel should show a very uniform bright area, whereas damaged channels show significant dark areas where the minority carrier lifetime is reduced by the non-radiative recombination from the trap sites. An example of EL for a dead VCSEL from the LAr detector [4] is shown in figure 1(a).

Electron beam induced current
Another technique which can resolve active areas that have been damaged is Electron Beam Induced Current (EBIC). In this technique the induced electron current is measured as the primary electron beam is scanned in a Scanning Electron Microscope (SEM). The resulting current is sensitive to trap sites, hence defects show up as dark areas. EBIC is not normally used for VCSELs because of scattering from metal contacts. However, it has been used successfully by ATLAS. An example EBIC picture from a dead channel from a VCSEL array (used in the pixel Timing, Trigger and Control (TTC) system [5]) is shown in figure 1(b). As with all first generation VCSEL arrays, these devices were designed for operation in dry environments and are expected to be sensitive to humidity during operation [8] (see section 4.2 for further discussion of the effects of humidity during operation).

Electron microscopy
The most detailed information on the exact location of the damage comes from Electron Microscopy. In plan view Scanning Electron Microscopy (SEM), only surface or shallow damage can be studied. If the location of the damage can be determined from EL or EBIC, a very thin cut can be made using a Focused Ion Beam (FIB) and the sample analysed using Scanning Transmission Electron Microscopy (STEM). This technique was applied to the channel analysed by EBIC (figure 1(b)). Defects can be seen clearly in the different layers of the device, as illustrated by the STEM picture in figure 2. The alternating layers of the DBR mirrors and the oxide implant can be seen clearly. Defects appear to spread out from the tip of the oxide and propagate to the active Multi-Quantum Well (MQW) layer. No structure is visible in the MQW because of the extensive damage.

Optical spectrum analysis
Optical Spectrum Analysis (OSA) is a powerful, non-destructive technique for detecting early signs of damage in working VCSELs. VCSELs are single longitudinal mode devices but have multiple transverse modes. While most of the optical power is contained in a few modes, the loss of the much lower power higher order modes is a strong indication of low-level damage [6].

VCSEL reliability in ATLAS
Failure rates have been estimated for 11 different sub-systems in ATLAS. Some systems with a significant number of links have been operating for over 3 years with no failures. Other systems have seen a low rate of random failures. In the case of the VCSELs used for the readout of the Resistive Plate Chambers (RPC) [7], this was traced by the manufacturer to insufficient ESD control during wafer processing. The rate of failures in the LAr links is of particular concern as access to the devices requires a long process for opening of the End Caps. The most serious reliability problem was seen in the off-detector VCSELs used for TTC distribution for the SemiConductor Tracker (SCT) and Pixels. In this case the failures were accelerating with time and were therefore end of life failures rather than random failures.
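The distinction between random and end of life failures can be illustrated with a Weibull hazard function. This is only a minimal sketch and not the analysis used by ATLAS: the choice of a Weibull model, the function name and all parameter values are assumptions made for illustration. A shape parameter of 1 gives a constant failure rate (random failures), whereas a shape parameter above 1 gives a failure rate that rises with time (wearout).

```python
def weibull_hazard(t_years, shape, scale_years):
    """Instantaneous failure rate h(t) = (k / s) * (t / s)**(k - 1)."""
    return (shape / scale_years) * (t_years / scale_years) ** (shape - 1)

# Hypothetical parameters, for illustration only (not fitted to ATLAS data).
times = [0.25, 0.5, 1.0, 2.0]

# shape = 1: constant hazard, i.e. random failures.
random_like = [weibull_hazard(t, shape=1.0, scale_years=10.0) for t in times]

# shape > 1: hazard increases with time, i.e. end of life (wearout) failures.
wearout_like = [weibull_hazard(t, shape=3.0, scale_years=1.0) for t in times]
```

In such a model, a fitted shape parameter significantly above 1 would correspond to the accelerating failures described here.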

Liquid Argon Calorimeter
The FA techniques used for the failed VCSELs from the Liquid Argon Calorimeter (LAr) showed very clear evidence of damage to the VCSELs but were unable to determine the cause. An alternative approach to determining the cause of failure was also used, in which some damage to the device was deliberately introduced to see if it would lead to device failure. The most common cause of field failures in VCSELs is ESD. Low level ESD pulses were applied and significant degradation was found using OSA after pulses of 300 V were applied. VCSELs are known to be sensitive to humidity, so another test was to intentionally expose a device to humidity. A hole was made deliberately in the TO can and the VCSEL was then run continuously in a normal lab environment with a relative humidity RH ∼ 50%. A clear decrease with time was seen in the spectral width (defined as the width of the spectrum which contains 95% of the optical power).
The VCSELs in the TO can are normally hermetically sealed, but there is a possibility that damage to some devices resulted in exposure to the outside atmosphere. As with the other FA techniques, these tests only indicate possible causes of damage and cannot unambiguously determine the cause of damage. In addition to these tests, OSA was used for all channels and a clear population of suspect channels with narrow spectra was observed, as shown in figure 3.
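The spectral width used here, the width of the spectrum containing 95% of the optical power, can be computed from a sampled spectrum. The sketch below is one possible reading of that definition and is an assumption: it takes the narrowest contiguous wavelength window holding at least the requested fraction of the total power. The function name and the sample spectra are hypothetical.

```python
def width_containing_fraction(wavelengths_nm, powers_mw, fraction=0.95):
    """Narrowest wavelength span holding at least `fraction` of total power.

    wavelengths_nm must be sorted in increasing order; powers_mw are
    linear (not dB) powers measured at those wavelengths.
    """
    target = fraction * sum(powers_mw)
    best = None
    lo = 0
    running = 0.0
    for hi in range(len(powers_mw)):
        running += powers_mw[hi]
        # Shrink the window from the left while it still holds enough power.
        while lo < hi and running - powers_mw[lo] >= target:
            running -= powers_mw[lo]
            lo += 1
        if running >= target:
            width = wavelengths_nm[hi] - wavelengths_nm[lo]
            if best is None or width < best:
                best = width
    return best

# Hypothetical spectra: a healthy device with several transverse modes and
# a suspect device in which the power has collapsed into one mode.
wavelengths = [848.0, 849.0, 850.0, 851.0, 852.0]
healthy = width_containing_fraction(wavelengths, [0.01, 0.5, 1.0, 0.5, 0.01])
suspect = width_containing_fraction(wavelengths, [0.0, 0.02, 1.0, 0.02, 0.0])
```

With these illustrative inputs the suspect spectrum yields a smaller width than the healthy one, which is the signature used to flag narrow-spectrum channels.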

SCT and Pixel TX VCSELs
The TTC distribution for the SCT and Pixel detectors is based on 12 way VCSEL arrays [5]. The Mean Time To Failure (MTTF) was less than 1 year. ESD was suspected as the primary cause because this is the most common cause of VCSEL field failures. It is also known that low level ESD pulses can cause delayed VCSEL deaths, and we performed tests that confirmed that this was possible for the Truelight VCSEL arrays. The ESD precautions used in the assembly of the packages were reviewed and problems were identified. The ESD precautions were greatly improved and a complete production of new packages was performed. This resulted in a slight improvement, but the devices still showed end of life failures with an MTTF ∼ 1 year.
Another common cause of oxide implant VCSEL failures is exposure to humidity [8] during device operation. Single channel VCSELs are usually packaged in hermetic TO cans, whereas arrays are usually exposed to the environment. High pressure steam is used to grow the oxide implant. Holes are deliberately introduced to allow this growth [9], so humidity in the environment can easily reach the oxide. This is of concern when the device is biased, because an electrolytic reaction can deplete the Ga in the GaAs and this can act as a source of point defects which can subsequently grow [8]. Some crates in the SCT had higher Relative Humidity (RH) than others and an inverse correlation between RH and MTTF was observed. Subsequent tests also confirmed that epoxy does not produce a humidity-tight seal. Elevated temperature testing in very low RH showed no failures, providing further indication that humidity was the cause of failure.
Figure 3. Spectral widths versus serial number for the ATLAS Liquid Argon Calorimeter VCSELs. The black squares refer to devices which failed after the OSA measurements. The orange squares refer to devices whose spectra were classified as narrow according to a slightly different definition of the spectral width.
In order to perform a more direct controlled test, samples of Truelight VCSEL arrays were operated at 10 mA drive current at 50% duty cycle in both normal lab air (RH in the range from 40% to 60%) and dry N2. OSA measurements were taken to monitor any changes in the spectral widths. The spectral width for these VCSELs was defined as the width of the region of the spectrum in which the power exceeds a level 30 dB below the peak value. This definition was chosen so as to be sensitive to the low power higher order modes in the spectrum. The results for VCSELs operated in air and dry N2 are shown in figure 4.
For the VCSELs operated in dry N2 there were no significant changes in the spectral width (slopes consistent with zero within about 1σ). For the devices operated in normal lab conditions there was a clear narrowing of the spectra, with a significance in the range 3.8σ to 9.8σ (figure 4).
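The two steps of this measurement, extracting a width at 30 dB below the peak and testing whether that width changes with time, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the analysis code used for figure 4: the function names, the equal per-point uncertainty `sigma` and all sample values are hypothetical, and the fit is an ordinary straight-line least-squares fit.

```python
import math

def width_above_drop(wavelengths_nm, powers_dbm, drop_db=30.0):
    """Span of wavelengths whose power exceeds (peak - drop_db)."""
    threshold = max(powers_dbm) - drop_db
    kept = [w for w, p in zip(wavelengths_nm, powers_dbm) if p > threshold]
    return max(kept) - min(kept)

def slope_significance(times, widths, sigma):
    """Least-squares slope of width versus time and |slope| / slope error.

    sigma is the assumed (equal) uncertainty on each width measurement.
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_w = sum(widths) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (w - mean_w) for t, w in zip(times, widths))
    slope = sxy / sxx
    slope_error = sigma / math.sqrt(sxx)
    return slope, abs(slope) / slope_error

# Hypothetical OSA trace (dBm) sampled every 1 nm.
wl = [846.0, 847.0, 848.0, 849.0, 850.0, 851.0, 852.0, 853.0, 854.0]
trace = [-45.0, -38.0, -20.0, -5.0, 0.0, -6.0, -22.0, -33.0, -50.0]
width = width_above_drop(wl, trace)

# Hypothetical width histories (arbitrary time units, widths in nm).
t = [0.0, 1.0, 2.0, 3.0, 4.0]
dry_slope, dry_sig = slope_significance(t, [10.0, 10.2, 9.9, 10.1, 10.0], 0.5)
air_slope, air_sig = slope_significance(t, [10.0, 9.0, 8.0, 7.0, 6.0], 0.5)
```

A significance of about 1 or less is consistent with no change, as reported for dry N2, while values well above 3 indicate a clear narrowing.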

Mitigation strategies

Liquid Argon Calorimeter VCSELs
For the LAr VCSELs the suspect channels with narrow spectra were replaced during the 2010/11 shutdown. Since then there have been no further failures. However a backup option with full optical redundancy has been developed in case there are further failures.

SCT and Pixel TX
These VCSELs are exposed to a laboratory environment with normal humidity, and the particular VCSELs used were not engineered to be robust in such an environment. It is very difficult to make completely hermetic packages for array VCSELs. However, it is possible to manufacture moisture resistant VCSELs. Although the precise details are proprietary, the critical feature is to cover the holes in the Distributed Bragg Reflector (DBR) mirror that are used to allow steam to enter the device to grow the oxide implant. This can be done with a dielectric layer. Several companies have produced VCSELs which have passed tests at 85 °C/85% RH ("damp heat") and would therefore be expected to survive 10 years of operation in normal lab humidity (see for example ref. [9]). Two different options are being pursued, as well as the use of a compressor to provide low humidity (RH ∼ 15%) air for the racks containing the VCSELs.
AOC arrays are being packaged in an identical way to the Truelight arrays. We have performed damp heat tests (85 °C/85% RH) on a sample of 31 channels and have seen no failures in 3200 hours of operation at a DC current of 8 mA. We have also performed accelerated aging tests on 60 channels for which we monitored the spectral width. We saw no significant changes in 2000 hours of operation at 70 °C/85% RH.
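A test with no observed failures still sets an upper limit on the failure rate. As a rough sketch, assuming a constant (random) failure rate so that failures follow Poisson statistics, zero failures in n·T device-hours gives λ < −ln(1 − CL)/(n·T) at confidence level CL. The function name and the choice of 90% confidence are assumptions. The limit applies at the stress conditions of the test, and no acceleration factor to normal operating conditions is applied here.

```python
import math

def zero_failure_rate_limit(n_channels, hours, confidence=0.90):
    """Upper limit on a constant failure rate (per device-hour).

    Assumes Poisson statistics with zero observed failures:
    exp(-rate * device_hours) = 1 - confidence at the limit.
    """
    device_hours = n_channels * hours
    return -math.log(1.0 - confidence) / device_hours

# Damp heat sample quoted in the text: 31 channels, 3200 hours, no failures.
limit_per_hour = zero_failure_rate_limit(31, 3200)

# The same limit expressed in FIT (failures per 1e9 device-hours).
limit_fit = limit_per_hour * 1e9
```

The limit scales inversely with the accumulated device-hours, so larger samples or longer tests are needed to demonstrate lower failure rates.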
ULM VCSEL arrays are being assembled in the iFlame package by Xloom. This uses VCSELs which are also qualified for operation in damp heat. Two attractive features are that the design is based on a minimal modification to a commercial 4 channel transceiver and that it uses flip chip bonding for the VCSEL, which is a less aggressive process than wire bonding, which can damage the chip. A modified version of the PCB which holds the BPM-12 laser driver chip [5] has been designed and built which is compatible with the iFlame package. Full functionality has been demonstrated with a 4 channel prototype. The next tests will involve reliability testing of the 12 channel iFlames. In order to increase the yield we are also investigating the use of higher power VCSEL arrays from ViS.

Summary
Some ATLAS systems have seen no VCSEL failures after operation of more than 1000 channels for over 3 years, which is consistent with the manufacturer's claims for very high VCSEL reliability. Other systems have seen a low rate of random failures, which in one case was traced to ESD during wafer manufacture. Finally, one system has shown large, systematic failures. Many FA techniques are available and in some but not all cases these can uniquely identify the cause of failure. In general the high reliability of VCSELs can easily be compromised by damage from a range of environmental factors and extensive reliability testing of the final packaged product is essential.
In the case of the LAr OTx VCSELs, both ESD and humidity were identified as plausible damage mechanisms. OSA was demonstrated to be a very powerful diagnostic technique and has been used to screen out suspect devices. VCSEL arrays were found to be very sensitive to humidity. The SCT/Pixel TX VCSELs are being replaced with two solutions using arrays designed for operation in high humidity environments. As an additional safety measure, the humidity in the racks is being reduced.