The GBT-SerDes ASIC prototype

In the framework of the GigaBit Transceiver (GBT) project, a prototype, the GBT-SerDes ASIC, was developed, fabricated and tested. To sustain high radiation doses while operating at 4.8 Gb/s, the ASIC was fabricated in a commercial 130 nm CMOS technology employing radiation tolerant techniques and circuits. The transceiver serializes and de-serializes, Reed-Solomon encodes and decodes, and scrambles and descrambles the data for transmission over optical fibre links. This paper describes the GBT-SerDes architecture and presents the test results.


Introduction
The GigaBit Transceiver (GBT) architecture and transmission protocol [1,2] have been proposed for data transmission in the future upgrades of the physics experiments at the Large Hadron Collider (LHC). Due to the planned higher beam luminosities, the experiments will require high data rate links and electronic components capable of sustaining high radiation doses. The goal of the GBT project is to produce the electrical components of a radiation hard optical link, whose architecture is shown in figure 1. One half of the link resides on the detector and hence in a radiation environment, therefore requiring custom electronics. The other half is in a radiation-free environment and can thus use commercially-available components. Optical data transmission is via a system of optoelectronic components produced by the Versatile Link project [3]. The architecture incorporates timing and trigger signals, detector data and slow controls all into one physical link, hence providing an economic solution for all data transmission applications in a particle physics experiment.
The on-detector part of the link consists of a set of three custom integrated circuits that operate at 4.8 Gb/s. The ASICs are: the GBTIA, a transimpedance amplifier [4], the GBLD, a laser driver [5] and the GBTX, a serial transceiver ASIC that implements the GBT protocol and communicates with the detector frontend devices through a series of low speed serial links (80 Mb/s, 160 Mb/s or 320 Mb/s) [1]. In order to be radiation hard to total dose effects, the ASICs will be manufactured in a commercial 130 nm CMOS technology to benefit from its inherent resistance to ionizing radiation. Details of the off-detector part of the GBT link can be found in [6].

JINST 5 C11022

Part of the radiation tolerance of the link is achieved through the transmission of a Forward Error Correction (FEC) code that allows recovery from errors induced by Single Event Upsets (SEUs) in the photo-diode (PD), the GBTIA and the transceiver data-path. The FEC is implemented with a Reed-Solomon code, RS(15,11), resulting in an error correction capability of up to 16 consecutive bits in error [7]. To achieve DC balanced data transmission over the optical fibre, the transmitted data is scrambled before encoding and descrambled after decoding [7]. The GBT transmits a 120-bit frame, with 32 bits reserved for the FEC code and 4 for a header. The link efficiency is therefore 84/120 = 70% and the available user bandwidth is 3.36 Gb/s (including 4 bits of slow-control data).
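The frame budget quoted above can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the text; the function and constant names are illustrative and not part of the GBT project.

```python
# Frame budget of the GBT link, computed from the figures quoted in the text.
FRAME_BITS = 120        # bits per transmitted frame
FEC_BITS = 32           # bits reserved for the Reed-Solomon FEC code
HEADER_BITS = 4         # bits reserved for the frame header
LINE_RATE_MBPS = 4800   # serial line rate, 4.8 Gb/s


def frame_budget():
    """Return (payload bits, link efficiency, frame rate in MHz, user bandwidth in Mb/s)."""
    payload_bits = FRAME_BITS - FEC_BITS - HEADER_BITS
    efficiency = payload_bits / FRAME_BITS
    frame_rate_mhz = LINE_RATE_MBPS / FRAME_BITS      # one frame per 40 MHz clock cycle
    user_bandwidth_mbps = payload_bits * frame_rate_mhz
    return payload_bits, efficiency, frame_rate_mhz, user_bandwidth_mbps


if __name__ == "__main__":
    payload, eff, rate, bw = frame_budget()
    print(f"{payload} payload bits, efficiency {eff:.0%}, "
          f"{rate:.0f} MHz frame rate, {bw / 1000:.2f} Gb/s user bandwidth")
```

The frame rate that falls out of the calculation is the 40 MHz reference clock frequency used throughout the paper.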
This paper describes the GBT-SerDes, in which a subset of the GBTX functionality is implemented, namely the Serializer circuit (SER), the Clock and Data Recovery (CDR) circuits of the De-serializer (DES), the Reed-Solomon Encoder/Decoder (CODEC), the Scrambler/Descrambler (SCR/DSCR) and three phase shifters with 50 ps resolution. Further robustness against SEUs is provided by the use of Triple Modular Redundancy (TMR) for the low-speed digital logic and the most critical high-speed digital functions. The GBT-SerDes was built in the same technology foreseen for the fabrication of the GBTX in order to study the feasibility of the high speed serializer and clock recovery circuits. The following sections describe the details of the GBT-SerDes architecture and present the test results.

GBT-SerDes architecture
The architecture of the GBT-SerDes is shown in figure 2. It is broadly composed of a transmitter (TX) and a receiver (RX) section. The TX receives parallel data through the Parallel Input (Parallel In) interface. This data is then scrambled and RS encoded before it is fed to the SER, where it is converted into a 4.8 Gb/s serial stream with the frame format described above. On the RX side, after serial to parallel conversion in the DES, the data is fed to the frame aligner, then RS decoded and de-scrambled before it is sent to the external parallel bus through the parallel output interface. To minimise prototyping costs, a time-division multiplexed 30-bit parallel bus was adopted for the input and output buses, thus significantly reducing the silicon area since the ASIC is pad limited.
In the receiver and transmitter data paths, switches have been inserted between the functional blocks. These switches allow routing the data, at different levels of depth down the data path, from either the RX into the TX or vice-versa. This functionality can be used for evaluation testing of the ASIC but it also aims at providing a diagnostic tool for field tests of the optical link that will use the GBTX. A further self-testing feature is a Pseudo Random Bit Sequence (PRBS) generator in the TX. The PRBS generator can also be programmed to produce constant data or a simple bit count. The critical high-speed blocks (shaded regions in figure 2) are implemented using full-custom design techniques while the remaining circuits are based on the standard library cells provided by the foundry. The SER operation is based on the division of the 120-bit frame into three 40-bit words which are serialized at 1.6 Gb/s and then time division multiplexed to form the final 4.8 Gb/s serial bit stream. This architecture minimises the number of components operating at full speed.
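The word-level multiplexing of the serializer can be illustrated with a small behavioural model. This is a minimal sketch, not the circuit itself: the assignment of frame bits to the three words (word k holds bits k, k+3, k+6, ...) is an assumption made here so that the multiplexed stream reproduces the frame in order, and the function names are hypothetical.

```python
FRAME_BITS = 120
WORDS = 3                          # three shift registers feeding the 3:1 multiplexer
WORD_BITS = FRAME_BITS // WORDS    # 40 bits per word


def split_frame(frame):
    """Split a 120-bit frame (list of 0/1) into three 40-bit words.

    Assumed ordering (not stated in the text): word k holds frame bits
    k, k+3, k+6, ...
    """
    if len(frame) != FRAME_BITS:
        raise ValueError("frame must contain exactly 120 bits")
    return [frame[k::WORDS] for k in range(WORDS)]


def serialize(frame):
    """Model the SER: three 1.6 Gb/s streams interleaved into one 4.8 Gb/s stream."""
    words = split_frame(frame)
    stream = []
    for i in range(WORD_BITS):     # one bit period of the 1.6 Gb/s shift registers
        for k in range(WORDS):     # the 3:1 multiplexer visits each register in turn
            stream.append(words[k][i])
    return stream


if __name__ == "__main__":
    frame = [(i * 7) % 2 for i in range(FRAME_BITS)]
    assert serialize(frame) == frame
    print("serial stream matches the input frame")
```

In this model only the final interleaving loop runs at the full bit rate, which mirrors the stated design goal of keeping the number of full-speed components small.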

Serializer
The PLL is designed to be tolerant to SEUs by using large transistor sizes in the VCO ring oscillator. Consequently, large currents are used in each stage of the VCO, making the VCO signal phase less sensitive to the charge perturbations deposited by ionizing particles [8]. Triple Modular Redundancy (TMR) is used in the feedback divider of the PLL to mitigate SEUs. To obtain 4.8 GHz divider operation it was necessary to use dynamic logic in the feedback divider. However, even in the case of the dynamic logic, standard TMR circuits imposed too high a penalty on the operation frequency. To overcome this problem a new TMR flip-flop was developed [8].
Figure 3 (right) shows the serializer eye diagram at 4.8 Gb/s; the measured total jitter is 52.4 ps.
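The principle of TMR applied to a divider can be sketched behaviourally as follows. This illustrates generic majority voting only; it does not model the dynamic logic or the new TMR flip-flop of [8], and the class and method names are hypothetical.

```python
def majority(a, b, c):
    """Bitwise majority vote over three redundant copies of a value."""
    return (a & b) | (a & c) | (b & c)


class TmrCounter:
    """Modulo-N counter whose state is stored in three copies and voted every clock."""

    def __init__(self, modulus):
        self.modulus = modulus
        self.copies = [0, 0, 0]

    def upset(self, index, bit):
        """Emulate a single event upset: flip one bit in one of the three copies."""
        self.copies[index] ^= 1 << bit

    def clock(self):
        """Vote, advance the count, and refresh all three copies with the result."""
        voted = majority(*self.copies)
        nxt = (voted + 1) % self.modulus
        self.copies = [nxt, nxt, nxt]
        return nxt


if __name__ == "__main__":
    hit, clean = TmrCounter(120), TmrCounter(120)
    for cycle in range(360):
        if cycle % 11 == 0:
            hit.upset(cycle % 3, cycle % 7)   # corrupt a single copy
        assert hit.clock() == clean.clock()
    print("single upsets were masked by the voter")
```

An upset in one copy is outvoted by the other two and overwritten on the next clock, so errors do not accumulate as long as no two copies are hit within the same cycle.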

De-serializer
The de-serializer block diagram is shown in figure 4 (left). Its main features are: a Half-Rate Phase/Frequency-Detector (HRPFD), frequency aided lock acquisition and a constant-latency "barrel-shifter" [1].
In the CDR, the HRPFD is used since it allows the CDR circuits to operate at half the frequency and hence with safer timing margins. Although the phase detector also detects frequency, its detection range is insufficient to cover all the process, voltage and temperature variations. To ensure that the CDR can always lock to the data it is thus necessary to pre-calibrate the VCO "free-running" oscillation frequency. For that, the VCO has two control inputs: a coarse control input that allows the centring of the VCO oscillation frequency, and a fine control input that is regulated by the CDR HRPFD and charge pump and allows the CDR circuit to lock to the serial data. The CDR VCO coarse voltage is derived from that of a reference PLL that is locked to the reference clock (figure 4, left). The VCOs in both PLLs are replicas of each other so that for the same control voltage they should have the same oscillation frequency. Due to statistical variations in the fabrication process this is however not exact, leading to a slight difference between the VCO frequencies. The CDR VCO fine control voltage is under control of the CDR loop and, due to the frequency detecting ability of the HRPFD, is able to pull the CDR VCO frequency to that of the incoming serial data. Figure 4 (right) shows the jitter histogram for the recovered 40 MHz clock, which has a total jitter of 63.2 ps.

Phase-shifter
The purpose of the phase shifter is to generate multiple clocks as local timing references that are synchronous with the accelerator clock. The frequency and phase of the output clocks are digitally programmable. The output clock frequency can be 40 MHz, 80 MHz, or 160 MHz and the phase resolution is 50 ps independent of the frequency.
To handle multiple output frequencies and a phase resolution of 50 ps in a range of 25 ns (for the 40 MHz clock), the phase shifter consists of three components: a PLL, Coarse De-skewing Logic (CDL), and Fine De-skewing Logic (FDL). Figure 5 depicts the overall system block diagram.
From the 40 MHz reference, the PLL generates a 1.28 GHz clock (with a period of 781 ps) for both the CDL and FDL blocks. The divider in the PLL is made of a 5-bit binary counter whose outputs are used by the CDL to produce the right output clock frequency. Since the output clocks are synchronized with the 1.28 GHz clock, the PLL guarantees the synchronization of the output clocks with the reference clock.
In addition to performing frequency selection, the CDL shifts the clock by multiple periods of the 1.28 GHz clock. The output of the CDL block is therefore a clock of the specified frequency with the phase shifted by multiples of 781 ps.
The FDL is designed to phase shift the clock by a fraction of 781 ps. It is based on a modified DLL structure with a 16-stage voltage controlled delay line (VCDL). The 16 delay stages each create a phase-shift of 1/16 of 781 ps (∼ 50 ps). This is achieved by feeding the CDL clock to the VCDL and connecting a version of the CDL clock, delayed by one clock cycle of the 1.28 GHz clock, to the phase detector (PD). The other input of the PD is the VCDL output. This architecture sets the delay through the VCDL to be exactly one period of the 1.28 GHz clock, thus the delay through each stage is ∼ 50 ps. A 16:1 MUX is used to select the desired phase. To generate multiple clock outputs simultaneously using this architecture, replicas of the CDL and FDL can be employed whereas one PLL can be shared among different channels. In the GBT-SerDes, three phase-shifting channels were implemented.
For the samples tested, the measured value of the differential non-linearity is 4.7 ps rms (21.5 ps PP) and that of the integral non-linearity is 4.3 ps rms (21.9 ps PP).
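The split of a requested phase into a coarse (CDL) and a fine (FDL) setting follows directly from the numbers above. The sketch below is an arithmetic illustration under stated assumptions: the function names are hypothetical, tap 0 of the 16:1 MUX is assumed to correspond to zero added delay, and the actual register encoding of the ASIC is not described in the text.

```python
REF_MHZ = 40          # reference clock
PLL_MULT = 32         # 5-bit feedback counter: 40 MHz x 32 = 1.28 GHz
FDL_STAGES = 16       # delay stages in the VCDL

PLL_PERIOD_PS = 1e6 / (REF_MHZ * PLL_MULT)    # 781.25 ps
FINE_STEP_PS = PLL_PERIOD_PS / FDL_STAGES     # about 48.8 ps, quoted as ~50 ps


def phase_setting(delay_ps, out_mhz=40):
    """Return (coarse, fine) for a requested delay.

    coarse: CDL shift in whole periods of the 1.28 GHz clock.
    fine:   index of the VCDL tap selected by the 16:1 MUX (assumed 0-based).
    """
    if out_mhz not in (40, 80, 160):
        raise ValueError("output frequency must be 40, 80 or 160 MHz")
    period_ps = 1e6 / out_mhz
    steps_per_period = round(period_ps / FINE_STEP_PS)   # 512, 256 or 128 steps
    step = round(delay_ps / FINE_STEP_PS) % steps_per_period
    return divmod(step, FDL_STAGES)


def achieved_delay_ps(coarse, fine):
    """Delay actually produced by a given (coarse, fine) pair."""
    return coarse * PLL_PERIOD_PS + fine * FINE_STEP_PS


if __name__ == "__main__":
    coarse, fine = phase_setting(1000.0)
    print(coarse, fine, achieved_delay_ps(coarse, fine))
```

With these figures the 25 ns period of the 40 MHz clock is covered by 512 steps, so the step size is the same at 80 MHz and 160 MHz and only the number of usable steps changes.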

Test-setup
To fully test and characterize the GBT-SerDes, a test bench was built, based on a custom board (in blue in figure 6) hosting the GBT chip, an SFP+ module, a Cyclone III FPGA to generate, analyze or translate the two 30-bit buses into LVDS, and two high-density connectors to transfer these 30-bit buses to a Stratix II GX Development Kit provided by Altera (in grey in figure 6). This board allows testing the GBT-SerDes in either transmitter or receiver mode, using the 4.8 Gb/s [...] and c) in figure 6. For the loopback test, the Stratix board is not required, and the custom board can be used standalone. These tests are controlled either via Ethernet UDP or I2C, and managed by a Graphical User Interface written in Java.

Summary
A serializer/de-serializer ASIC operating at 4.8 Gb/s has been designed and tested. The ASIC is custom made for the experiments of the future upgrade of the LHC accelerator. The ASIC architecture and circuits were designed to be tolerant to single event upsets, and the ASIC was fabricated in a 130 nm CMOS technology in order to be hard to the effects of ionizing radiation.

Figure 3 (left) shows the block diagram of the serializer. It consists of a 120-bit input register, three 40-bit shift registers, a frequency synthesizer consisting of a Phase-Locked Loop (PLL) with a divide-by-120 feedback divider (composed of two stages, the first dividing by 3 and the second dividing by 40), and a 3:1 high speed multiplexer.