Tau lepton trigger and identification at CMS in Run-2

Abstract
For LHC Run-2, the Compact Muon Solenoid (CMS) detector was upgraded; in particular, the CMS trigger system and the particle reconstruction were improved. The CMS experiment uses a two-stage trigger system composed of a Level-1 trigger, implemented in custom-designed hardware boards, and software layers called High-Level-Triggers (HLT). A new Level-1 trigger architecture with improved performance has been installed and is used to maintain the thresholds of LHC Run-1 in the more challenging conditions of Run-2. Optimized software selection techniques have also been developed at the HLT. The hadronic τ reconstruction algorithm has been modified to better account for the $\pi^0$(s) from τ decays. In addition, the discriminators against QCD-induced jets and electrons have been improved. The results of these improvements are presented and the validation of the τ identification performance is shown.


Introduction
The τ lepton is a complex object used in many analyses at CMS: the study of the 125 GeV Higgs boson in the $H \to \tau\tau$ channel, the hunt for additional bosons predicted by the Minimal Supersymmetric Standard Model (MSSM), the search for lepton flavor violation in Higgs boson decays ($H \to \mu\tau$, $e\tau$), and searches for exotic particles such as the graviton, the radion, or $Z'$ and $W'$ bosons. It is therefore necessary to develop dedicated and powerful techniques to efficiently select and identify hadronically decaying τ leptons ($\tau_h$) at CMS.
In 2016, during Run-2, the LHC exceeded its nominal performance, reaching an instantaneous luminosity of $1.5 \times 10^{34}\,\mathrm{cm^{-2}\,s^{-1}}$ and an average number of simultaneous collisions per bunch crossing (pileup) of up to 52. These conditions pose a number of challenges for the detector, which the CMS Collaboration anticipated and addressed.
The design and implementation of an efficient τ lepton trigger has always been a challenge in high-energy physics, especially in intense collision environments such as that of the LHC. The CMS trigger [1] is organized in two levels: the electronics-based Level-1 (L1) trigger, which reduces the rate from 40 MHz to ∼100 kHz and has to take its decision within ∼3.8 µs, and a software-based High-Level-Trigger (HLT), which subsequently reduces the rate from ∼100 kHz to ∼1 kHz. The L1 trigger was fully replaced and upgraded between 2015 and 2016 [2], and was successfully used for the CMS data taking in 2016. The trigger algorithms and their performance, with a focus on the selection of $\tau_h$, are presented in this document.
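As a quick cross-check of the figures quoted above, the following minimal Python sketch computes the rejection factor of each trigger level and the number of bunch crossings that occur during the L1 latency. It uses only the approximate rates and latency stated in the text; the function names are illustrative and do not correspond to any CMS software.

```python
# Approximate values quoted in the text.
BUNCH_CROSSING_RATE_HZ = 40e6   # LHC bunch-crossing rate seen by L1
L1_OUTPUT_RATE_HZ = 100e3       # L1 accept rate
HLT_OUTPUT_RATE_HZ = 1e3        # HLT accept rate
L1_LATENCY_S = 3.8e-6           # time available for the L1 decision


def rejection_factor(rate_in_hz: float, rate_out_hz: float) -> float:
    """Factor by which a trigger level reduces the event rate."""
    return rate_in_hz / rate_out_hz


def crossings_during_latency(latency_s: float, crossing_rate_hz: float) -> int:
    """Number of bunch crossings that occur while one L1 decision is pending."""
    return round(latency_s * crossing_rate_hz)


if __name__ == "__main__":
    print("L1 rejection factor :", rejection_factor(BUNCH_CROSSING_RATE_HZ, L1_OUTPUT_RATE_HZ))
    print("HLT rejection factor:", rejection_factor(L1_OUTPUT_RATE_HZ, HLT_OUTPUT_RATE_HZ))
    print("Crossings in latency:", crossings_during_latency(L1_LATENCY_S, BUNCH_CROSSING_RATE_HZ))
```

With these inputs, L1 rejects roughly 399 out of every 400 bunch crossings and the HLT keeps about one L1-accepted event in 100, while about 152 crossings must be buffered during each L1 decision.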
The reconstruction of $\tau_h$ is key to the success of τ-related searches at CMS. Identification techniques are then applied to reject fake $\tau_h$ candidates arising from QCD-induced hadronic jets, electrons, and muons. The fundamental concepts and recent developments in the reconstruction and identification of $\tau_h$ are described in this document.
The 2016 proton-proton data set corresponds to an integrated luminosity of more than $40\,\mathrm{fb}^{-1}$.
2. The $\tau_h$ trigger at CMS

Level-1 Trigger
The reconstruction and selection of L1 $\tau_h$ trigger objects is based on calorimetric information. At L1, the basic reconstruction element is the Trigger Tower (TT), which covers roughly 0.087 × 0.087 in η (pseudorapidity) × φ (azimuthal angle) and, depending on its position, sums the energy from the electromagnetic calorimeter (ECAL), the hadronic calorimeter (HCAL), or the hadronic forward calorimeter (HF)². With respect to the previous system, there are two key conceptual changes to the L1 calorimeter trigger: all the data from a single event are streamed into one processing card, which allows a global view of the detector and pileup subtraction at L1; and the clustering of the TT energies into L1 objects (e/γ, $\tau_h$, etc.) is dynamic. In addition, the flexibility of the system allows calibrations to be applied at both the TT and L1-object levels, and enables powerful identification techniques such as isolation to be implemented.
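To make the TT granularity concrete, the sketch below maps a direction (η, φ) to a pair of tower indices. It is only an illustration under simplifying assumptions: the towers are taken to be uniform with size 0.087 in η, the φ direction is assumed to be divided into 72 towers (so that 2π/72 ≈ 0.087), and the index convention (signed η index without zero, φ index starting at 0) is hypothetical. As noted in the text, the real tower size varies with η.

```python
import math

TOWER_SIZE_ETA = 0.087                        # approximate size quoted in the text
N_TOWERS_PHI = 72                             # assumption: 2*pi / 72 is about 0.087
TOWER_SIZE_PHI = 2 * math.pi / N_TOWERS_PHI


def tower_index(eta: float, phi: float) -> tuple:
    """Return (ieta, iphi) for a uniform tower grid (illustrative convention).

    ieta is signed and never zero: +1 is the first tower at eta >= 0,
    -1 the first tower at eta < 0. iphi runs from 0 to N_TOWERS_PHI - 1.
    """
    ieta_abs = int(abs(eta) / TOWER_SIZE_ETA) + 1
    ieta = ieta_abs if eta >= 0 else -ieta_abs
    # Fold phi into [0, 2*pi) and guard against rounding at the upper edge.
    iphi = int((phi % (2 * math.pi)) / TOWER_SIZE_PHI) % N_TOWERS_PHI
    return ieta, iphi


if __name__ == "__main__":
    for eta, phi in [(0.0, 0.0), (-0.1, 0.0), (0.05, -0.01)]:
        print((eta, phi), "->", tower_index(eta, phi))
```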
The L1 $\tau_h$ candidates are built from a collection of TTs, which are dynamically clustered around a local energy maximum (the "seed TT") to capture the $\tau_h$ footprint while minimizing the effect of pileup. Two neighboring clusters may be merged (this happens for about 15% of genuine $\tau_h$) in order to reconstruct specifically the $\tau \to h^\pm \pi^0$(s) and $\tau \to h^\pm h^\mp h^\pm$ decays, where $h^\pm$ denotes a charged hadron. To improve the energy response and resolution, the L1 $\tau_h$ candidates are calibrated with coefficients that depend on the $E_T^{\mathrm{L1}}$, the $\eta^{\mathrm{L1}}$, the merging status, and the electromagnetic fraction of the L1 candidate. The energy response at L1 with respect to offline for $\tau_h$ candidates in 2016 data [3] is shown in Fig. 1.
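The idea of clustering around a local energy maximum can be sketched in a few lines of Python. This is a simplified illustration and not the firmware algorithm: the 3 × 3 neighborhood, the two thresholds, the 72-tower φ periodicity, and all function names are assumptions made for the example, and the η index is treated as a plain integer.

```python
N_PHI = 72  # assumption: number of towers around phi (periodic direction)


def neighbours(ieta, iphi):
    """Yield the 8 towers surrounding (ieta, iphi), wrapping around in phi."""
    for deta in (-1, 0, 1):
        for dphi in (-1, 0, 1):
            if (deta, dphi) != (0, 0):
                yield ieta + deta, (iphi + dphi) % N_PHI


def find_seeds(towers, seed_threshold=2.0):
    """Return towers above threshold that are strict local maxima.

    `towers` maps (ieta, iphi) to transverse energy in GeV.
    The threshold value is a placeholder, not a CMS setting.
    """
    seeds = []
    for position, et in towers.items():
        if et < seed_threshold:
            continue
        if all(et > towers.get(n, 0.0) for n in neighbours(*position)):
            seeds.append(position)
    return seeds


def build_cluster(towers, seed, tower_threshold=0.5):
    """Sum the seed and the neighbouring towers above a (placeholder) threshold."""
    members = [seed] + [n for n in neighbours(*seed)
                        if towers.get(n, 0.0) >= tower_threshold]
    return {"seed": seed, "towers": members,
            "et": sum(towers[m] for m in members)}


if __name__ == "__main__":
    toy = {(3, 10): 20.0, (3, 11): 6.0, (4, 10): 0.3, (10, 40): 1.0}
    for seed in find_seeds(toy):
        print(build_cluster(toy, seed))
```

Only towers adjacent to the seed are collected in this sketch, which keeps the cluster small and hence limits the pileup contribution; merging of two neighboring clusters, as described above, is not modelled.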
The isolation energy of an L1 $\tau_h$ candidate is defined as the sum of the transverse energies of all TTs in a 6 (η) × 9 (φ) TT window around the cluster seed, from which the transverse energy of the L1 $\tau_h$ cluster(s) is subtracted. To control the large rate of fake L1 $\tau_h$ candidates from QCD-induced jets, an upper cut on the L1 $\tau_h$ isolation energy is applied. The cut depends on $E_T^{\mathrm{L1}}$, $\eta^{\mathrm{L1}}$, and an estimator of pileup called $n_{TT}$, which is defined as the number of TTs with non-zero energy in a central-η ring of 8 TTs. The isolation energy cut values are chosen such that the efficiency to select a genuine $\tau_h$ is constant with η and pileup; the cut is relaxed with increasing $E_T^{\mathrm{L1}}$ to ensure a 100% plateau efficiency. The efficiency of the L1 $\tau_h$ trigger, as a function of $p_T^{\mathrm{offline}}$, is shown for isolated seeds in Fig. 2.
Figure 2: Level-1 trigger efficiency of isolated $\tau_h$ seeds (i.e. requiring the L1 $\tau_h$ candidate to pass a cut on its isolation transverse energy) as a function of the offline $\tau_h$ transverse momentum, for L1 $E_T$ thresholds of 28, 30 and 32 GeV.
² The size of the TT varies slightly with η.
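The isolation energy and the pileup estimator defined above can be written down compactly. The following Python sketch is illustrative only: the exact placement of the even-sized η window relative to the seed, the interpretation of the central ring as |iη| ≤ 4, the binned cut table, and all names are assumptions, since the text does not specify them.

```python
N_PHI = 72  # assumption: number of towers around phi (periodic direction)


def window_sum(towers, seed, eta_offsets=range(-3, 3), phi_offsets=range(-4, 5)):
    """Sum E_T over a 6 (eta) x 9 (phi) tower window around the seed.

    The asymmetric eta offsets (-3..+2) are one possible choice for an
    even-sized window; the real placement is not given in the text.
    """
    ieta0, iphi0 = seed
    return sum(towers.get((ieta0 + de, (iphi0 + dp) % N_PHI), 0.0)
               for de in eta_offsets for dp in phi_offsets)


def isolation_energy(towers, seed, cluster_et):
    """Window sum minus the transverse energy of the tau cluster(s)."""
    return window_sum(towers, seed) - cluster_et


def n_tt(towers, max_abs_ieta=4):
    """Count towers with non-zero energy in the central eta ring.

    Assumes signed eta indices without zero, so |ieta| <= 4 spans 8 towers.
    """
    return sum(1 for (ieta, _), et in towers.items()
               if ieta != 0 and abs(ieta) <= max_abs_ieta and et > 0.0)


def passes_isolation(iso_et, cut_table, et_bin, eta_bin, ntt_bin):
    """Apply an upper cut looked up in a hypothetical (E_T, eta, n_TT) table."""
    return iso_et <= cut_table[(et_bin, eta_bin, ntt_bin)]


if __name__ == "__main__":
    toy = {(10, 5): 30.0, (10, 6): 5.0, (11, 7): 2.0, (12, 5): 1.5,
           (14, 5): 9.0, (1, 0): 0.5, (-2, 3): 0.7, (3, 3): 0.0}
    iso = isolation_energy(toy, seed=(10, 5), cluster_et=35.0)
    print("isolation E_T:", iso, " n_TT:", n_tt(toy))
    print("isolated:", passes_isolation(iso, {(0, 0, 0): 4.0}, 0, 0, 0))
```

Indexing the cut by $n_{TT}$ is what allows the threshold to loosen as pileup grows, so that the efficiency for genuine $\tau_h$ stays flat.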

High-Level-Trigger
The High-Level-Trigger (HLT) for $\tau_h$ is based on the Particle Flow (PF) algorithm, which performs a global event reconstruction through an optimal combination of the information from the CMS sub-detectors. The PF algorithm produces a list of stable particles (e, µ, γ, charged and neutral hadrons), which are in turn combined to build higher-level objects such as jets, $\tau_h$, and missing transverse energy.
At the HLT, $\tau_h$ candidates are identified by combining charged and neutral PF candidates contained in jets reconstructed with the anti-$k_T$ algorithm with a cone size of ∆R = 0.4. Up to three leading charged hadron candidates within a narrow cone (typically ∆R = 0.1) around the direction of the seeding jet are taken as the signature of the τ decay into charged hadrons. Electromagnetic deposits in the ECAL compatible with the $\pi^0$ characteristic of τ decays to $h^\pm \pi^0$(s) are considered part of the signature as well. To reduce the contamination from QCD-induced jets faking $\tau_h$, isolation selections are applied to the HLT $\tau_h$ candidates. The isolation is defined as the sum of the $p_T$ of the non-signal charged and neutral PF candidates that lie within the jet cone, ∆R < 0.4. To reduce the dependence of the isolation on pileup, a subtraction term reflecting the average energy deposit per unit area from soft scattering ($\rho \cdot A_{\mathrm{eff}}$) is applied. The efficiency of the HLT $\tau_h$ trigger, measured in 2016 data [4], is shown in Fig. 3.
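A pileup-corrected isolation of this kind can be sketched as follows. The example is a simplified illustration: the candidate representation (dictionaries with `id`, `pt`, `eta`, `phi`), the function names, the clamping of the result at zero, and the application of the ρ · A_eff term to the full sum are assumptions, not details taken from the HLT implementation.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance, with the phi difference folded into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def hlt_isolation(jet_axis, candidates, signal_ids, rho, effective_area, cone=0.4):
    """Sum p_T of non-signal PF candidates in the cone, minus rho * A_eff.

    jet_axis   : (eta, phi) of the seeding jet
    candidates : iterable of dicts with keys 'id', 'pt', 'eta', 'phi'
    signal_ids : ids of the candidates assigned to the tau signature
    """
    raw_sum = sum(c["pt"] for c in candidates
                  if c["id"] not in signal_ids
                  and delta_r(jet_axis[0], jet_axis[1], c["eta"], c["phi"]) < cone)
    # Assumption: the corrected isolation is not allowed to become negative.
    return max(0.0, raw_sum - rho * effective_area)


if __name__ == "__main__":
    pf = [
        {"id": 0, "pt": 30.0, "eta": 0.0, "phi": 0.0},   # signal charged hadron
        {"id": 1, "pt": 2.0, "eta": 0.2, "phi": 0.1},    # inside the cone
        {"id": 2, "pt": 5.0, "eta": 0.5, "phi": 0.0},    # outside the cone
        {"id": 3, "pt": 1.0, "eta": 0.0, "phi": -0.3},   # inside the cone
    ]
    print(hlt_isolation((0.0, 0.0), pf, signal_ids={0}, rho=4.0, effective_area=0.5))
```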

$\tau_h$ reconstruction
The offline $\tau_h$ reconstruction at CMS is performed with the "Hadrons Plus Strips" (HPS) algorithm [5,6]. The $\tau_h$ candidates are built from a combination of Particle Flow charged hadron(s) and $\pi^0 \to \gamma\gamma$ candidate(s), the latter being reconstructed from clusters of e/γ candidates in the η-φ plane ("strips"). Using a set of constraints on the mass and transverse momenta of the τ decay products, the HPS algorithm reconstructs the different hadronic decay modes, notably the $h^\pm$, $h^\pm \pi^0$(s), and $h^\pm h^\mp h^\pm$ signatures. The HPS algorithm used in Run-2 is similar to that of Run-1, with a few modifications. In particular, a new feature is the dynamic strip reconstruction: the size of the window in which strips are reconstructed is a function of the transverse momentum of the e/γ candidate. This allows a better containment of the $\tau_h$ energy within the strips and prevents $\tau_h$ decay products that would otherwise fall outside the strip from contributing to the isolation energy, which is used as a discriminating variable to identify $\tau_h$ (see Sec. 3.2). Fig. 4 shows the strip size in the φ direction as a function of the e/γ $p_T$ for $\tau_h$ from simulation.
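The dynamic strip size can be illustrated with a window that shrinks as the e/γ transverse momentum increases, bounded from below and above. The text does not give the functional form, so the power law, its coefficients, and the bounds in the sketch below are illustrative assumptions only; Refs. [5,6] should be consulted for the parametrization actually used.

```python
def strip_size(pt, coefficient, exponent, lower, upper):
    """Window size coefficient * pt**(-exponent), clamped to [lower, upper].

    pt is the e/gamma transverse momentum in GeV. The power-law form is an
    assumption made for this illustration.
    """
    if pt <= 0.0:
        raise ValueError("pt must be positive")
    return min(upper, max(lower, coefficient * pt ** (-exponent)))


def strip_size_phi(pt):
    """Strip size in phi (placeholder coefficients and bounds)."""
    return strip_size(pt, coefficient=0.35, exponent=0.71, lower=0.05, upper=0.30)


def strip_size_eta(pt):
    """Strip size in eta (placeholder coefficients and bounds)."""
    return strip_size(pt, coefficient=0.20, exponent=0.66, lower=0.05, upper=0.15)


if __name__ == "__main__":
    for pt in (1.0, 2.0, 5.0, 10.0, 50.0):
        print(f"pT = {pt:5.1f} GeV  dphi = {strip_size_phi(pt):.3f}  deta = {strip_size_eta(pt):.3f}")
```

A wider window in φ than in η at low $p_T$ reflects the bending of low-momentum electrons from photon conversions in the magnetic field, which spreads the energy mostly along φ.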

$\tau_h$ identification
The $\tau_h$ identification at CMS aims at efficiently selecting genuine $\tau_h$ while rejecting fakes from jets, electrons, and muons. To reject jet fakes, the main handle is the isolation, which quantifies the activity around the $\tau_h$ candidate: on average, it is lower for genuine $\tau_h$ than for jets. The cut-based discriminator relies on the isolation-sum discriminating variable, which sums the $p_T$ of the PF candidates (charged and neutral, excluding those that belong to the $\tau_h$ itself) in a cone centered on the $\tau_h$. Corrections are applied to reduce the effect of pileup on the isolation. Several working points, each targeting a specific signal efficiency, are defined by imposing cuts on the isolation-sum variable. The mis-identification probability as a function of the $\tau_h$ identification efficiency is shown in Fig. 5.
An improved discriminant against jets, based on a multivariate analysis (MVA), has also been developed. The MVA approach allows the information from the isolation energy, the τ lifetime, and the shower shapes to be optimally combined. With respect to the cut-based approach to isolation, the MVA-based discriminant leads to a significant reduction of the mis-identification probability at constant signal efficiency, as demonstrated in Fig. 5.
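The relation between isolation-sum cuts, signal efficiency, and mis-identification probability for the cut-based discriminator can be illustrated with toy numbers. In the Python sketch below, the isolation-sum values and the three cut values are invented for the example and are not CMS working points; each cut yields one point of the kind of curve shown in Fig. 5.

```python
def pass_fraction(iso_sums, cut):
    """Fraction of candidates whose isolation sum (GeV) is below the cut."""
    return sum(1 for value in iso_sums if value < cut) / len(iso_sums)


def working_points(genuine_iso, fake_iso, cuts):
    """Map each working point to (signal efficiency, mis-identification probability)."""
    return {name: (pass_fraction(genuine_iso, cut), pass_fraction(fake_iso, cut))
            for name, cut in cuts.items()}


if __name__ == "__main__":
    # Toy isolation sums in GeV (invented values, for illustration only).
    genuine = [0.0, 0.3, 0.5, 1.0, 2.0, 3.0, 0.1, 0.7]
    fakes = [0.5, 2.0, 4.0, 6.0, 8.0, 10.0, 1.2, 3.0]
    # Hypothetical cut values defining three working points.
    cuts = {"loose": 2.5, "medium": 1.5, "tight": 0.8}
    for name, (eff, misid) in working_points(genuine, fakes, cuts).items():
        print(f"{name:6s} efficiency = {eff:.3f}  mis-id = {misid:.3f}")
```

Tightening the cut lowers both numbers at once, which is why each working point is chosen for a target signal efficiency.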
The $\tau_h$ identification efficiency has been measured in Run-2 data using three different data-driven techniques [6]: the Tag & Probe (T&P) method, which relies on $Z \to \tau_\mu \tau_h$ events; the $Z \to \tau_\mu \tau_h$ / $Z \to \mu\mu$ ratio method; and a method using $W^* \to \tau\nu$ events. Here, the results from the first method are described. The T&P sample of $Z \to \tau_\mu \tau_h$ candidates consists of events with at least one well-identified and isolated muon (the tag) and one loosely preselected $\tau_h$ candidate (the probe). A set of additional requirements is imposed to ensure a high purity in $Z \to \tau_\mu \tau_h$ events. The events are separated into pass and fail categories, depending on whether the $\tau_h$ candidate passes the isolation discriminator under study. The expected signal ($Z \to \tau_\mu \tau_h$) and background contributions are predicted using simulation, except for the QCD multi-jet and W+jets backgrounds, for which sideband techniques are used. The parameter of interest, the $\tau_h$ identification scale factor (defined as the ratio of the efficiencies in observed data and in simulation to select $\tau_h$ from the $Z \to \tau_\mu \tau_h$ process), is then extracted from a simultaneous maximum likelihood fit to both categories, including systematic uncertainties in the signal and background normalizations and shapes. In the fit, the low-purity fail category helps to constrain the backgrounds.
The results of the measurement of the $\tau_h$ identification scale factors with the T&P method, using the visible mass of the $\tau_h$ decay products as the fit variable, are summarized in Table 1. They show good consistency between the observed data and the simulation.
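The definition of the scale factor can be made explicit with a simple counting version of the measurement. The sketch below is a deliberately reduced illustration: it replaces the simultaneous maximum likelihood fit described above with plain counting, assumes that the pass and fail yields are already background-subtracted signal yields, uses binomial uncertainties only, and takes invented yields as input.

```python
import math


def efficiency(n_pass, n_fail):
    """Efficiency and binomial uncertainty from pass/fail signal yields."""
    total = n_pass + n_fail
    eff = n_pass / total
    return eff, math.sqrt(eff * (1.0 - eff) / total)


def scale_factor(data_pass, data_fail, sim_pass, sim_fail):
    """Ratio of data to simulation efficiencies with propagated uncertainty."""
    eff_data, err_data = efficiency(data_pass, data_fail)
    eff_sim, err_sim = efficiency(sim_pass, sim_fail)
    sf = eff_data / eff_sim
    # Relative uncertainties added in quadrature (uncorrelated inputs assumed).
    err = sf * math.sqrt((err_data / eff_data) ** 2 + (err_sim / eff_sim) ** 2)
    return sf, err


if __name__ == "__main__":
    # Invented yields, for illustration only.
    sf, err = scale_factor(data_pass=540, data_fail=460, sim_pass=600, sim_fail=400)
    print(f"scale factor = {sf:.3f} +/- {err:.3f}")
```

In the actual measurement the scale factor is a parameter of the fit, so that background contamination and systematic uncertainties are handled consistently in the pass and fail categories rather than subtracted beforehand.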
Details about anti-electron and anti-muon discriminators can be found in Ref. [

Conclusion
In this note, the systems and techniques used to trigger on, reconstruct, and identify $\tau_h$ at CMS in Run-2 have been presented. The L1 trigger system of the CMS detector has recently been fully upgraded, and excellent performance in the selection of $\tau_h$ at L1 was achieved in 2016. The High-Level-Trigger has also demonstrated good performance, thanks to its offline-like reconstruction of $\tau_h$. The offline $\tau_h$ reconstruction and identification techniques have also been updated for LHC Run-2. In particular, advanced discriminators against jets have been developed, leading to improved rejection of fakes.