Design and test of the electron Feature EXtractor (eFEX) pre-production module for the ATLAS Phase-I Upgrade

Weiming Qian
On behalf of the ATLAS Collaboration
Outline

➢ Introduction:
  – ATLAS Phase-I Upgrade

➢ Architecture:
  – ATLAS Level-1 Calorimeter Trigger at Phase-I

➢ Electron Feature Extractor (eFEX):
  – Algorithms
  – Pre-production board design
  – Firmware design and integration test

➢ Summary
Introduction

- ATLAS Phase-I upgrade
  - LHC luminosity $\rightarrow 2 \times 10^{34} \text{ cm}^{-2}\text{s}^{-1}$
    - Event pile-up $\mu=80$
  - But Level-1 trigger constraints remain:
    - Total trigger rate $\leq 100\,\text{kHz}$
    - Latency $\leq 2.5\,\mu\text{s}$

- Strategy for Level-1 Calorimeter Trigger
  - Use higher granularity data from calorimeter
    - $\rightarrow$ Multi-Gb/s serial links
  - More complex algorithms, e.g.
    - Shower-shape information
    - Event-wide information
    - Large-radius jets
ATLAS Level-1 Calorimeter Trigger

- Legacy & Phase-1 systems to run in parallel during commissioning in 2021
New at Phase-I Upgrade
New Hardware Level-1 Trigger Modules

- Fibre Optical eXchange (FOX)
- electron Feature EXtractor (eFEX)
- global Feature EXtractor (gFEX)
- Tile Rear Extension board (TREX)
- HUB-ROD
- jet Feature EXtractor (jFEX)
- FEX Test Module (FTM)
- L1Topo

Focus of this talk: eFEX

TWEPP 2019, Santiago de Compostela, Spain
eFEX Algorithms

- $e/\gamma$ algorithms well defined
  - Shower-shape & isolation distinguish $e/\gamma$ hits from dominant jet background

$w_s^2 = \frac{\sum E_i (i - i_{\text{max}})^2}{\sum E_i}$

$R_\eta = \frac{E_{\text{clu}}}{E_{\text{env}} + E_{\text{clu}}}$

$f_3 = \frac{E_{L3}}{E_{\text{Tot}}}$

$R_{\text{Had}} = \frac{E_{\text{Had}}}{E_{\text{Tot}}}$
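
The four discriminants above can be tried out on toy numbers. The sketch below is illustrative only: the function names, argument layout and sample energies are assumptions, not the eFEX firmware interface.

```python
def w_s2(energies):
    """Energy-weighted squared width around the most energetic cell."""
    total = sum(energies)
    i_max = max(range(len(energies)), key=lambda i: energies[i])
    return sum(e * (i - i_max) ** 2 for i, e in enumerate(energies)) / total


def r_eta(e_clu, e_env):
    """Fraction of energy in the cluster relative to cluster + environment."""
    return e_clu / (e_env + e_clu)


def f_3(e_l3, e_tot):
    """Fraction of energy deposited in layer 3."""
    return e_l3 / e_tot


def r_had(e_had, e_tot):
    """Hadronic energy as a fraction of the total."""
    return e_had / e_tot


# Hypothetical energies (arbitrary units) in five adjacent supercells.
strip = [1.0, 4.0, 10.0, 4.0, 1.0]
width = w_s2(strip)
```

A narrow EM shower gives a small `width` and an `r_eta` close to 1; jets tend to give broader deposits.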

EM shower in LAr

LAr supercell structure in a trigger tower of $\Delta \eta \times \Delta \phi = 0.1 \times 0.1$

Tile trigger tower
The seed finder searches for local maxima

- All 36 comparisons are done in parallel
  - In case of equal values
    - top wins over bottom and left wins over right
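
The tie-break rule can be sketched as a generic local-maximum test on a small 2D grid. This is not the 36-comparison firmware logic: the grid layout (row 0 is "top", column 0 is "left"), the function name and the treatment of diagonal neighbours are assumptions, and the comparisons run sequentially here rather than in parallel.

```python
def is_seed(grid, r, c):
    """Return True if grid[r][c] is a local maximum among its neighbours.

    Ties are broken so that only one of two equal cells wins: the upper
    cell beats the lower one, and within a row the left cell beats the
    right one. Applying the row rule first for diagonal neighbours is an
    assumption of this sketch.
    """
    e = grid[r][c]
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            rr, cc = r + dr, c + dc
            if not (0 <= rr < len(grid) and 0 <= cc < len(grid[0])):
                continue  # cells outside the toy grid are ignored
            n = grid[rr][cc]
            # This cell wins a tie against neighbours below it or to its right.
            wins_tie = dr > 0 or (dr == 0 and dc > 0)
            if n > e or (n == e and not wins_tie):
                return False
    return True


toy = [
    [1, 2, 1],
    [2, 5, 5],
    [1, 2, 1],
]
seeds = [(r, c) for r in range(3) for c in range(3) if is_seed(toy, r, c)]
```

The two equal cells in the middle row yield a single seed, the left one, so the same deposit is not counted twice.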

Calorimeter Cells

Comparison Table

<table>
<thead>
<tr>
<th></th>
<th>S₋₁</th>
<th>S₀</th>
<th>S₁</th>
<th>S₂</th>
<th>S₃</th>
<th>U₋₁</th>
<th>U₀</th>
<th>U₁</th>
<th>U₂</th>
<th>U₃</th>
<th>D₋₁</th>
<th>D₀</th>
<th>D₁</th>
<th>D₂</th>
<th>D₃</th>
<th>D₄</th>
</tr>
</thead>
<tbody>
<tr>
<td>S₋₁</td>
<td></td>
<td></td>
<td></td>
<td>x</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>S₀</td>
<td></td>
<td>x</td>
<td>o</td>
<td>o</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
</tr>
<tr>
<td>S₁</td>
<td></td>
<td>x</td>
<td>o</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
</tr>
<tr>
<td>S₂</td>
<td></td>
<td>x</td>
<td></td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
</tr>
<tr>
<td>S₃</td>
<td></td>
<td></td>
<td></td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
</tr>
<tr>
<td>S₄</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>x</td>
<td></td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>x</td>
</tr>
<tr>
<td>U₋₁</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>ud</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>U₀</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>ud</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>U₁</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>ud</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>U₂</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>ud</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

x = comparison
o = non-adjacent comparison to choose between 2 local maxima
ud = comparison to find highest-energy neighbour in φ
e/γ algorithm f/w implementation

- For a single 3×3 algorithm window

Each shower-shape algorithm is resolved to 2 adders, a multiplier and comparators
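
One plausible way to get a ratio cut down to adders, a multiplier and a comparator is to cross-multiply so that no divider is needed. The sketch below assumes integer energies and a threshold expressed as `coeff / 2**shift`; the function name, the coefficient encoding and the sample values are hypothetical and are not taken from the eFEX firmware.

```python
def passes_ratio_cut(core_cells, env_cells, coeff, shift=8):
    """Test core / (core + env) > coeff / 2**shift without dividing."""
    core = sum(core_cells)         # first adder (tree)
    total = core + sum(env_cells)  # second adder (tree)
    # One multiplier and one comparator; the left shift is only wiring.
    return (core << shift) > coeff * total


# Hypothetical integer energies: core = 90, environment = 10, ratio = 0.9.
tight = passes_ratio_cut([40, 50], [5, 5], coeff=231)  # 231/256 ~ 0.902
loose = passes_ratio_cut([40, 50], [5, 5], coeff=230)  # 230/256 ~ 0.898
```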
e/γ algorithm performance

- EM trigger rate reduced by a factor of ~3, or
- The threshold lowered by ~7 GeV
  - Compared at reference points of 20 kHz

(Figure 8a from LAr Phase-I TDR)

(Figure 11 from LAr Phase-I TDR)
eFEX subsystem partitioning

- 24 eFEX modules in total covering $|\eta| \leq 2.5$
  - 3 eFEX modules cover an $\eta$-strip

- Each eFEX module
  - Produces trigger candidates for an area $\leq 1.7(\eta) \times 0.8(\phi)$
  - Which requires data from an area of $1.8(\eta) \times 1.0(\phi)$
    - (as algorithms examine neighbouring cells)
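
A quick consistency check of these numbers, assuming each $\eta$-strip of 3 modules spans the full $|\eta| \leq 2.5$ range and the strips tile $\phi$ (the variable names are illustrative):

```python
import math

N_MODULES = 24
MODULES_PER_ETA_STRIP = 3
CORE_ETA, CORE_PHI = 1.7, 0.8    # upper bound on the area producing candidates
INPUT_ETA, INPUT_PHI = 1.8, 1.0  # area of data required by the algorithms

phi_strips = N_MODULES // MODULES_PER_ETA_STRIP   # strips around phi
phi_coverage = phi_strips * CORE_PHI              # compare with 2*pi
eta_coverage = MODULES_PER_ETA_STRIP * CORE_ETA   # compare with 2 * 2.5
# Extra input area needed because algorithms examine neighbouring cells.
overlap_fraction = INPUT_ETA * INPUT_PHI / (CORE_ETA * CORE_PHI) - 1

print(phi_strips, round(phi_coverage, 2), round(eta_coverage, 2))
print(f"extra input area: {overlap_fraction:.0%}")
```

Under these assumptions the quoted core areas are upper bounds that slightly exceed $2\pi$ in $\phi$ and 5.0 in $\eta$, and each module takes in roughly a third more area than it produces candidates for.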

ATLAS calorimeter

- Area for which trigger candidates are produced
- Area of data examined by algorithms
- Extra area in LAr + Tile carried within fibres, but not used by algorithms
- Extra area in Tile carried within fibres, but not used by algorithms
eFEX Module

- ATCA form factor
  - 4 Xilinx Virtex-7 FPGAs (XC7VX550T) - algorithm
  - 1 Xilinx Virtex-7 FPGA (XC7VX330T) - control + readout
  - 156 optical inputs @ 11.2 Gb/s (13 MiniPODs)
  - 48 optical outputs @ 11.2 Gb/s (4 MiniPODs)
  - 100 on-board high-speed fan-out buffers
    - Algorithm environment data-sharing
  - 450 differential track-pairs @ 11.2 Gb/s
  - 14 electrical links @ 6.4 Gb/s
    - TTC and Readout over ATCA backplane

- Final Design Review successful (Dec 2017)
  - Major functions demonstrated successfully
    - Prototype eFEX module power ~300 W

- Pre-production eFEX module designed and manufactured
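
The optical link counts quoted for the module imply the following raw line-rate totals. The sketch assumes 12 channels per MiniPOD (consistent with 156/13 and 48/4) and ignores encoding overhead, so the figures are upper bounds on payload bandwidth.

```python
LINK_RATE_GBPS = 11.2
CHANNELS_PER_MINIPOD = 12  # assumption, consistent with 156/13 and 48/4

inputs = 13 * CHANNELS_PER_MINIPOD
outputs = 4 * CHANNELS_PER_MINIPOD
input_bw_tbps = inputs * LINK_RATE_GBPS / 1000    # raw optical input, Tb/s
output_bw_gbps = outputs * LINK_RATE_GBPS         # raw optical output, Gb/s

print(inputs, outputs, round(input_bw_tbps, 4), round(output_bw_gbps, 1))
```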
eFEX PCB design method

- **eFEX challenges**
  - Very high density
  - Very high speed
  - Very complex signal mapping
  - Very long signal tracks
  - Very high power consumption
  - Cooling

- **Systematic method for PCB design**
  - Time-Domain Reflectometry/Transmission (TDR/TDT)
    - Measuring channel impedance and loss
  - Eye diagram test
    - Measuring link margin
  - Bit-Error Rate (BER) test
eFEX Signal Integrity simulation

3D via modelling

Single channel S-parameter extraction

Eye diagram simulation

Single channel simulation and optimization flow for 10G+

Parallel crosstalk channels

Multi channel S-parameter modelling

Crosstalk simulation

Multi-channel crosstalk simulation and optimization flow for 10G+
eFEX Power Integrity simulation

- High-current power plane optimization
  - Increase copper weight on power planes
  - Optimise DC-DC placement and breakout area

**Prototype** VMGTAVCC Power Plane (½ oz.)

**Pre-production** VMGTAVCC Power Plane (1 oz.)
eFEX Power Integrity simulation

- Power plane split optimization
  - Repartition the power plane

**Prototype** VMGTAVTT power distribution (pFPGA1, pFPGA2, cFPGA): 80 mV, MGTAVTT @ 20 A

**Pre-production** VMGTAVTT power distribution (pFPGA1, pFPGA2, pFPGA3, pFPGA4): 20 mV, MGTAVTT @ 20 A
eFEX PCB TDR test

- eFEX PCB has embedded coupon
  - Check PCB impedance before assembly
    - eFEX pre-production PCB impedances are within specification
      - Taking into account measurement error (a few Ω)

Best layer (104Ω)  
Worst layer (113Ω)
eFEX high-speed link tests

- LAr Trigger prOcessing Mezzanine (LATOME) → FOX → eFEX
  - optical input to eFEX @ 11.2G

- eFEX ↔ FTM
  - optical input/output @ 11.2G
  - electrical input/output @ 6.4G via ATCA BP

- eFEX → L1Topo
  - optical output @ 11.2G

- eFEX ↔ HUB-ROD
  - electrical input/output @ 6.4G via ATCA BP
eFEX link test results

- BER < $10^{-14}$
  - Very good margin

Typical eye diagram on eFEX Virtex-7 FPGA

Typical eye diagram on L1Topo UltraScale+ FPGA

Open Area of Eye Scans @ 11.2 Gbps on eFEX

Open Area of Eye Scans @ 11.2 Gbps on L1Topo
Optical attenuation link tests

- Complex Fibre Optical eXchange (FOX)
  - Between LATOME/TREX and FEXes
    - Mapping validated with eFEX
    - Many optical connections
      - Optical attenuation up to 5 dB
- Optical attenuation tests
Optical attenuation test results

- Link optical power margin ~7 dB
  - No errors observed with 7 dB attenuator
    - BER < $10^{-13}$ (30-minute tests)
  - Errors observed with 10 dB attenuator
    - BER limited to ~$10^{-12}$
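
How strong a BER bound an error-free test can support depends on how many bits were sent. A minimal sketch, assuming a single 11.2 Gb/s link, zero observed errors and Poisson statistics (the function names and the 95% confidence level are choices made here, not taken from the measurements):

```python
import math

LINE_RATE_BPS = 11.2e9


def ber_upper_limit(seconds, confidence=0.95, line_rate_bps=LINE_RATE_BPS):
    """Upper limit on BER after an error-free test of the given length."""
    bits = line_rate_bps * seconds
    return -math.log(1.0 - confidence) / bits


def seconds_for_limit(ber, confidence=0.95, line_rate_bps=LINE_RATE_BPS):
    """Error-free running time needed to claim BER below the given value."""
    return -math.log(1.0 - confidence) / (ber * line_rate_bps)


bits_30min = LINE_RATE_BPS * 30 * 60          # about 2e13 bits
limit_30min = ber_upper_limit(30 * 60)        # order 1e-13
hours_for_1e14 = seconds_for_limit(1e-14) / 3600.0
```

With these assumptions a 30-minute error-free run constrains the BER at the $10^{-13}$ level, and a single link needs several hours of error-free running to reach the $10^{-14}$ level.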
eFEX IPbus and DCS (Detector Control System)

- IPbus firmware implemented in FPGAs
  - Soak test very reliable
    - No packets lost under normal operation

- CERN IPMC (Intelligent Platform Management Card for ATCA module)
  - ATCA shelf address setting
  - V/I/T monitoring for DCS

```
[atlun01] ~$ ipmitool -I lan -H shelf1.pp.rl.ac.uk -A NONE -t 0x8a sensor

Hot Swap       0x0   discrete  0x1080 na  na  na
IPMB Physical  0x88  discrete  0x0880 na  na  na
Version change 0x0   discrete  0x0080 na  na  na
Internal temp. 29.000 degrees C ok  na  na  na
LMB2 internal  28.000 degrees C ok  na  na  na
LMB2 FPGA temp 49.000 degrees C ok  na  na  na
QBDW033 Vinput 48.250 Volts   ok  na  40.000 na
QBDW033 Voutput 11.640 Volts   ok  na  1.112 na
QBDW033 Ioutput 7.568   Amps   ok  na  na  na
QBDW033 temp   40.485 degrees C ok  na  na  na
MDT040 Vinput  11.983 Volts   ok  na  5.839 na
MDT040 Ioutput 1.892   Amps   ok  na  na  na
MDT041 Vinput  12.045 Volts   ok  na  4.543 na
MDT041 Ioutput 1.902   Amps   ok  na  na  na
```
eFEX MGT clock domain

- eFEX real-time path requires minimum and deterministic latency
  - Virtex-7 MGT internal elastic buffers are not used
  - Each MGT QUAD uses GT0_RXOUTCLK to drive its fabric user clocks
    - Resulting in too many clock domains in a single processor FPGA

- New MGT clocking scheme in eFEX processor FPGA
  - Single MMCM_clk280 to drive the fabric user clocks for all MGTs
    - eFEX and upstream/downstream modules are running off the same clock
  - It works with the following reset sequence
    - reset MGT → reset MMCM (repeat if needed)
eFEX real-time f/w

- Critical parts done and well tested
- Real-time latency measured with FTM
  - $\frac{512-377}{7} - 4$ (20 m fibre) $- 2.4$ (FTM MGT) $= 12.89$ BCs
  - eFEX latency budget: 13.5 BCs
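
The latency arithmetic can be reproduced as below. The sketch interprets 512 and 377 as counter readings taken at 7 ticks per bunch crossing (BC), which would match a 280 MHz counter against the 40 MHz bunch-crossing clock; that interpretation, the constant names and the ~5 ns/m fibre delay used for the cross-check are assumptions.

```python
TICKS_PER_BC = 7    # assumption: 280 MHz ticks per 40 MHz bunch crossing
FIBRE_BC = 4.0      # 20 m of fibre, as quoted
FTM_MGT_BC = 2.4    # FTM transceiver contribution, as quoted
BUDGET_BC = 13.5    # eFEX latency budget


def efex_latency_bc(stop_tick, start_tick):
    """Measured round trip minus fibre and FTM contributions, in BCs."""
    return (stop_tick - start_tick) / TICKS_PER_BC - FIBRE_BC - FTM_MGT_BC


latency = efex_latency_bc(512, 377)
margin = BUDGET_BC - latency
# Cross-check of the fibre term: ~5 ns/m over 20 m, 25 ns per BC.
fibre_check_bc = 20 * 5.0 / 25.0

print(f"latency = {latency:.2f} BCs, margin = {margin:.2f} BCs")
```

On these numbers the measured latency sits about 0.6 BC inside the budget.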
eFEX readout f/w

- Most parts done
- Segments of readout chain tested successfully

- Tested @ 1 MHz L1A
eFEX f/w management

- eFEX firmware repository is stored on GitLab
- A collaborative HDL management tool
  - A set of Tcl (Tool Command Language) scripts manages the firmware repository
  - A GitLab Continuous Integration script automatically synthesizes and implements HDL projects when a Git Merge Request is opened


https://pos.sissa.it/343/142/
eFEX pFPGA utilization

- Processor FPGA utilization
  - Fabric logic utilization < 50%
    - Comfortable spare capacity
Summary

- The eFEX is a complex high-density high-speed module.
- The prototype has been tested extensively with very good results.
- Many optimizations, e.g. in power distribution, have been made for the pre-production design.
- The firmware design for eFEX is well advanced.
- eFEX firmware development is well managed and coordinated with custom tools.
- eFEX is well positioned to take part in the upcoming Calorimeter-Trigger slice test.
Tau algorithm - seeding

- Needs to prevent double counting
- Most studies assume the same seeding as e/γ
  - Layer 2 supercell = ET maximum
  - 1 seed per tower
- Efficient for Tau, but two drawbacks
  - Insensitive to potential long-lived particles
  - One tau can produce multiple seeds
- Could address both by requiring central tower (EM+Had) = ET maximum
  - Some algorithms will still require most energetic cell in central tower to be identified
Many cluster definitions

- Performance comparable to Run 2
- Generally larger clusters give sharper turn-on
- Try to find smaller cluster with comparable performance
  - Potentially less pileup sensitivity
- Complexity of implementation
Tau algorithm – jet rejection

- Width of deposit is main jet discriminant
  - Preliminary studies focus on layer 2
    - Layer 1 also offers fine granularity
  - Mainly $R_\eta$-like variables
    - Ratio of ET in small/large regions
      - **TDR isolation** (from the Technical Design Report)
        - $EM_2(3 \times 2) / EM_2(9 \times 3)$
      - **Oregon isolation** (from Oregon group)
        - $EM_2(3 \times 2) / EM_2(12 \times 3)$
      - **Nagoya isolation**
        - $EM_2(1 \times 2 \text{ or } 2 \times 1) / EM_2(12 \times 3)$
  - Results so far comparable to Run 2 rejection (which is to say not spectacular)
    - ~15% rate reduction for ~5% efficiency loss
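
The $R_\eta$-like isolation variables are all ratios of $E_T$ sums over a small and a large window in layer 2. A toy sketch, assuming a grid indexed as `grid[phi][eta]`, windows given as ($\eta \times \phi$) sizes and a fixed window placement around the seed (the real algorithms may place the 2-wide $\phi$ window towards the higher-energy neighbour; all names here are hypothetical):

```python
def window_sum(grid, eta0, phi0, n_eta, n_phi):
    """Sum E_T in an n_eta x n_phi window placed around (eta0, phi0).

    Odd sizes are centred on the seed; an even size starts at the seed
    and extends towards higher index (an assumption of this sketch).
    """
    eta_lo = eta0 - (n_eta - 1) // 2
    phi_lo = phi0 - (n_phi - 1) // 2
    return sum(
        grid[p][e]
        for p in range(phi_lo, phi_lo + n_phi)
        for e in range(eta_lo, eta_lo + n_eta)
    )


def isolation(grid, eta0, phi0, core=(3, 2), env=(9, 3)):
    """Ratio of E_T in the core window to E_T in the environment window."""
    return window_sum(grid, eta0, phi0, *core) / window_sum(grid, eta0, phi0, *env)


# Toy layer-2 grid: flat background of 1 with a deposit of 10 at the seed.
toy = [[1] * 9 for _ in range(3)]
toy[1][4] = 10
iso = isolation(toy, eta0=4, phi0=1)  # default sizes: 3x2 over 9x3
```

Changing `env` to `(12, 3)` would need a wider toy grid; the ratio moves towards 1 for narrow deposits and towards the area ratio for broad ones.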
eFEX PCB
eFEX pre-production module