Springer: Quantum anomaly detection in the latent space of proton collision events at the LHC

Belis, Vasilis; Vallecorsa, Sofia; Pierini, Maurizio; Grossi, Michele; Tavernelli, Ivano; Reiter, Florentin; Barkoutsos, Panagiotis; Woźniak, Kinga Anna; Dissertori, Günther; Puljak, Ema

doi:10.1038/s42005-024-01811-6

Quantum anomaly detection in the latent space of proton collision events at the LHC - Belis, Vasilis et al - arXiv:2301.10780

\textit{Classical-quantum pipeline.} LHC collision data (simulation) are passed through an autoencoder for dimensionality reduction and then fed to the quantum anomaly detection models: the \textit{unsupervised quantum kernel machine} and the \textit{quantum clustering algorithms} (QK-means/QK-medians). Each jet contains 100 particles, and each particle is described by three features $(\Delta\eta, \Delta\phi, p_T)$, where $\Delta$ denotes the distance from the jet axis. Hence, each jet is described by $100 \times 3 = 300$ features, and a dijet collision event by $2 \times 300 = 600$ features. The quantum models are trained on Standard Model data and learn to recognise anomalies in unseen data. All models are evaluated using the Receiver Operating Characteristic (ROC) curve and other metrics suited to anomaly detection, and are compared to their classical counterparts (see the ``Evaluation of model performance'' subsection in the Results).
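The feature counting above can be made concrete with array shapes. This minimal sketch assumes a hypothetical storage layout in which simulated events form an array of shape (n_events, 2, 100, 3), i.e. two jets per event, 100 particles per jet and $(\Delta\eta, \Delta\phi, p_T)$ per particle; the function name `flatten_jets` is illustrative only.

```python
import numpy as np

N_PARTICLES = 100  # particles per jet
N_FEATURES = 3     # (delta_eta, delta_phi, p_T) per particle

def flatten_jets(events):
    """Flatten each jet's particle list into a single feature vector.

    events: array of shape (n_events, 2, N_PARTICLES, N_FEATURES) (hypothetical layout).
    Returns an array of shape (n_events, 2, N_PARTICLES * N_FEATURES) = (n_events, 2, 300).
    """
    if events.shape[1:] != (2, N_PARTICLES, N_FEATURES):
        raise ValueError(f"unexpected shape {events.shape}")
    # Row-major reshape keeps the three features of each particle contiguous.
    return events.reshape(events.shape[0], 2, N_PARTICLES * N_FEATURES)

events = np.random.default_rng(0).normal(size=(5, 2, N_PARTICLES, N_FEATURES))
jets = flatten_jets(events)  # shape (5, 2, 300): 300 features per jet, 600 per event
```

Each 300-dimensional jet vector is what the autoencoder compresses to the latent dimension.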

\textit{The quantum circuits.} (\textbf{a}) Data encoding circuit $U(x)$ for a data point $x$. It implements the feature map of the unsupervised kernel machine and defines the quantum kernel $k(x,x') = \left|\braket{0|U^\dagger(x) U(x')|0}\right|^2$ for two data points $x$ and $x'$. Here $G(\theta,\phi,\lambda)\in\text{SU(2)}$ is a general single-qubit gate, and $x_i$, for $i=0,1,\dots, n$, denotes the elements of the input feature vector $x$. The entangling gates are CNOT gates. (\textbf{b}) Quantum distance calculation circuit used to compute the similarity between an input sample and a cluster center in the QK-means algorithm. The prepared states $\ket{\psi}$ and $\ket{\phi}$ depend on the input feature vectors (Methods).
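For a handful of qubits, the kernel of panel (a) can be evaluated exactly with a classical statevector simulation. The NumPy sketch below rests on hypothetical choices that are not necessarily those of the paper: one feature per qubit, fed into $G$ as $(\theta,\phi,\lambda) = (x_q, x_q, 0)$, a linear chain of CNOTs after each layer of $G$ gates, and `reps` repetitions of that block. $G$ uses the U3 matrix convention, which equals an SU(2) element up to a global phase; that phase drops out of $|\langle\cdot|\cdot\rangle|^2$.

```python
import numpy as np

def g_gate(theta, phi, lam):
    """General single-qubit gate G(theta, phi, lambda), U3 convention."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -np.exp(1j * lam) * s],
                     [np.exp(1j * phi) * s, np.exp(1j * (phi + lam)) * c]])

def apply_1q(state, gate, q):
    """Apply a 2x2 gate to qubit q of a state tensor of shape (2,)*n."""
    state = np.tensordot(gate, state, axes=([1], [q]))
    return np.moveaxis(state, 0, q)

def apply_cnot(state, control, target):
    """Flip the target axis on the slice where the control qubit is |1>."""
    new = state.copy()
    idx = [slice(None)] * state.ndim
    idx[control] = 1
    t = target if target < control else target - 1  # target axis after slicing
    new[tuple(idx)] = np.flip(state[tuple(idx)], axis=t)
    return new

def encode(x, reps=3):
    """Hypothetical U(x)|0...0>: G(x_q, x_q, 0) on each qubit q, then CNOT(q, q+1)
    for all neighbouring pairs; the block is repeated `reps` times."""
    n = len(x)
    state = np.zeros((2,) * n, dtype=complex)
    state[(0,) * n] = 1.0
    for _ in range(reps):
        for q, xq in enumerate(x):
            state = apply_1q(state, g_gate(xq, xq, 0.0), q)
        for q in range(n - 1):
            state = apply_cnot(state, q, q + 1)
    return state.ravel()

def quantum_kernel(x, y, reps=3):
    """k(x, y) = |<0|U^dagger(x) U(y)|0>|^2, the fidelity of the two encoded states."""
    return abs(np.vdot(encode(x, reps), encode(y, reps))) ** 2

rng = np.random.default_rng(0)
x, y = rng.uniform(0, 2 * np.pi, size=(2, 4))
print(quantum_kernel(x, x))  # 1.0 up to floating-point error
print(quantum_kernel(x, y))
```

Because the simulation cost grows as $2^{n_q}$, this approach is only practical for small qubit counts such as those studied here.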

\textit{Performance evaluation results.} Each subplot displays the Receiver Operating Characteristic (ROC) curve and the corresponding Area Under the Curve (AUC) on test data for each model and each parameter of interest. Rows show the model performance as a function of the parameters of interest: (\textbf{a}) new-physics anomalous signatures, (\textbf{b}) latent space dimension, and (\textbf{c}) the number of training samples. Columns correspond to the different anomaly detection models. A 5-fold testing procedure is used to assess the statistical significance of the results and of the differences in performance, with a test dataset of $10^5$ samples, half of which are anomalies and half \ac{SM} events. The best-performing classical model is a kernel model equipped with the Radial Basis Function (RBF) kernel. The uncertainty bands represent one standard deviation and are drawn only for the True Positive Rate (TPR) range of interest. For smaller TPR values, the uncertainties grow because of the low testing statistics, and the bands are omitted for readability.
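The ROC/AUC evaluation and a 5-fold mean ± standard deviation can be sketched in plain NumPy. This assumes that higher scores mean more anomalous and that labels are 1 for anomalies and 0 for SM events; splitting the test set into random, disjoint, equal-sized folds is our assumption about the procedure, not a description of the authors' code.

```python
import numpy as np

def roc_curve(scores, labels):
    """ROC points (FPR, TPR) for anomaly scores; labels: 1 = anomaly, 0 = SM."""
    order = np.argsort(-scores, kind="stable")
    s, y = scores[order], labels[order].astype(float)
    tp = np.cumsum(y)          # anomalies accepted above each threshold
    fp = np.cumsum(1.0 - y)    # SM events accepted above each threshold
    # Evaluate only at the last position of each group of tied scores.
    cut = np.r_[np.nonzero(np.diff(s))[0], len(s) - 1]
    tpr = np.r_[0.0, tp[cut] / tp[-1]]
    fpr = np.r_[0.0, fp[cut] / fp[-1]]
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve with the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

def kfold_auc(scores, labels, k=5, seed=0):
    """Split the test set into k disjoint folds; return mean and std of the AUC."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(scores)), k)
    aucs = [auc(*roc_curve(scores[f], labels[f])) for f in folds]
    return float(np.mean(aucs)), float(np.std(aucs))

scores = np.array([0.9, 0.8, 0.7, 0.1])
labels = np.array([1, 0, 1, 0])
print(auc(*roc_curve(scores, labels)))  # 0.75: 3 of 4 anomaly/SM pairs ranked correctly
```

The same per-fold spread is what one-standard-deviation bands such as those in the figure summarize.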

\textit{Background rejections}. Comparison of the background rejection $\varepsilon_b^{-1}(\varepsilon_s)$ of the unsupervised quantum and classical kernel machines for signal efficiencies $\varepsilon_s=0.6$ and $0.8$. In Table (\textbf{a}), the background rejection is computed for a fixed latent dimension of 8 while the number of training samples is varied. In Table (\textbf{b}), it is computed for a fixed training size of 600 while the dimensionality of the latent space is varied. Values are obtained with 5-fold testing on $N_\mathrm{test}=10^5$ samples.
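The quantity $\varepsilon_b^{-1}(\varepsilon_s)$ fixes a score threshold that accepts a fraction $\varepsilon_s$ of the signal (anomalies) and reports the inverse of the fraction of background (SM) events that pass it. A minimal sketch, assuming higher scores mean more anomalous and labels 1 = signal, 0 = background; using `np.quantile` with its default linear interpolation to place the threshold is our choice.

```python
import numpy as np

def background_rejection(scores, labels, eps_s):
    """1 / eps_b at the threshold that keeps a fraction eps_s of the signal.

    scores: anomaly scores (higher = more anomalous);
    labels: 1 = signal (anomaly), 0 = SM background.
    """
    sig = scores[labels == 1]
    bkg = scores[labels == 0]
    threshold = np.quantile(sig, 1.0 - eps_s)  # about eps_s of the signal lies at or above it
    eps_b = np.mean(bkg >= threshold)          # background efficiency (false positive rate)
    return np.inf if eps_b == 0 else 1.0 / eps_b

x = np.arange(10, dtype=float)
scores = np.r_[x, x]                          # identical signal and background scores
labels = np.r_[np.ones(10), np.zeros(10)]
print(background_rejection(scores, labels, 0.6))  # 1/0.6: a useless score gives eps_b = eps_s
```

Applying this per test fold and taking the mean and standard deviation yields table entries of the form value ± uncertainty.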

\textit{Performance of the unsupervised quantum kernel machine and role of entanglement}. (\textbf{a}) Performance of the unsupervised quantum kernel machine, quantified by $\Delta_\mathrm{QC}$ for different numbers of qubits $n_q$, as a function of the number of data encoding circuit repetitions (depth) $L$. ``NE'' denotes the case where no entanglement is present in the circuit. (\textbf{b}) Summary of the performance increase as a function of the system size for $L=7$. (\textbf{c}) Entanglement capability of the data encoding circuit as a function of the depth $L$. Uncertainties are comparable to the size of the displayed data points and are computed with the same 5-fold testing procedure as in Fig.~\ref{fig:results}.
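A common definition of entanglement capability, and the one assumed in this sketch, is the Meyer-Wallach measure $Q(\psi) = 2\left(1 - \frac{1}{n}\sum_k \mathrm{Tr}\,\rho_k^2\right)$, where $\rho_k$ is the reduced state of qubit $k$, averaged over states produced with sampled circuit parameters to give $\langle Q\rangle$. The code computes $Q$ for given state vectors; generating those states from the encoding circuit is left out.

```python
import numpy as np

def meyer_wallach_q(state, n_qubits):
    """Q(psi) = 2 * (1 - (1/n) * sum_k Tr(rho_k^2)) for a pure n-qubit state vector."""
    psi = np.asarray(state, dtype=complex).reshape((2,) * n_qubits)
    purity_sum = 0.0
    for k in range(n_qubits):
        m = np.moveaxis(psi, k, 0).reshape(2, -1)  # qubit k vs. the rest
        rho_k = m @ m.conj().T                     # reduced density matrix of qubit k
        purity_sum += np.real(np.trace(rho_k @ rho_k))
    return 2.0 * (1.0 - purity_sum / n_qubits)

def entangling_capability(states, n_qubits):
    """<Q>: average of Q over states obtained with sampled circuit parameters."""
    return float(np.mean([meyer_wallach_q(s, n_qubits) for s in states]))

product = np.array([1, 0, 0, 0])            # |00>, unentangled
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)  # (|00> + |11>)/sqrt(2), maximally entangled
print(meyer_wallach_q(product, 2), meyer_wallach_q(bell, 2))
```

$Q$ is 0 for product states, which is what a circuit without entangling gates (the ``NE'' case) always produces.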

\textit{Relating the performance of the unsupervised quantum kernel machine to properties of the quantum circuit}. The ratio $\Delta_\mathrm{QC}$, defined in Eq.~\ref{eq:delta_qc}, is plotted as a function of the parameters of the data embedding quantum circuit presented in Fig.~\ref{fig:circuits}a. $L$ denotes the number of repetitions of the ansatz, $\mathrm{NE_0}$ and $\mathrm{NE_1}$ denote cases without entanglement in the circuit, and $\mathrm{FE}$ refers to all-to-all (full) entanglement. For three repetitions, $\Delta_\mathrm{QC}$ is also shown as a function of the number of qubits $n_q$.

\textit{Characterization metrics of the data encoding circuit.} The metrics are calculated by sampling the circuit parameters from three different distributions, as indicated in the legends: the uniform distribution in $[0,2\pi]$, the QCD background data distribution, and the signal (anomaly) scalar boson data distribution. (\textbf{a}) The expressibility (Expr) as a function of the different circuit architectures. (\textbf{b}) The entanglement capability $\langle \mathrm{Q} \rangle$ of the data encoding circuit as a function of the different circuit architectures. (\textbf{c}) The expressibility of the data encoding circuit as a function of the number of qubits ($n_q$). (\textbf{d}) The variance of the kernel, $\mathrm{Var}_{z, z'}k(z,z')$, as a function of the number of qubits, where $k(z,z')$ is the kernel corresponding to the data encoding circuit and $z$, $z'$ are data feature vectors sampled from the signal or background distributions.
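Expressibility is commonly defined as the Kullback-Leibler divergence between the distribution of fidelities $F = |\langle\psi_\theta|\psi_{\theta'}\rangle|^2$, obtained from pairs of sampled parameter sets, and the Haar-random fidelity distribution $P_\mathrm{Haar}(F) = (N-1)(1-F)^{N-2}$ with $N = 2^{n_q}$. This sketch assumes that definition and a default of 75 histogram bins; it takes already-computed fidelities, so the choice of sampling distribution (uniform, QCD background, or signal) is left to the caller.

```python
import numpy as np

def haar_fidelity_bin_probs(n_qubits, bins):
    """Bin probabilities of P_Haar(F) = (N - 1)(1 - F)^(N - 2), N = 2**n_qubits.
    Uses the survival function (1 - F)^(N - 1) integrated over each bin."""
    n_states = 2 ** n_qubits
    edges = np.linspace(0.0, 1.0, bins + 1)
    tail = (1.0 - edges) ** (n_states - 1)
    # Guard against underflow to 0 in the high-fidelity bins for large N.
    return np.clip(tail[:-1] - tail[1:], np.finfo(float).tiny, None)

def expressibility(fidelities, n_qubits, bins=75):
    """KL(P_circuit || P_Haar) from a histogram of sampled fidelities.
    Smaller values mean the circuit's states are closer to Haar-random."""
    counts, _ = np.histogram(fidelities, bins=bins, range=(0.0, 1.0))
    p = counts / counts.sum()
    q = haar_fidelity_bin_probs(n_qubits, bins)
    mask = p > 0  # empty bins contribute 0 * log(0 / q) = 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Sanity check: fidelities drawn from P_Haar itself (inverse-CDF sampling) give a small value.
rng = np.random.default_rng(0)
u = rng.uniform(size=200_000)
haar_f = 1.0 - (1.0 - u) ** (1.0 / (2 ** 2 - 1))
print(expressibility(haar_f, n_qubits=2))
```

A circuit that always returns the same state (every $F = 1$) lands entirely in the last bin and gets a large divergence, i.e. poor expressibility.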

\textit{The Autoencoder architecture.} The model reduces the dimensionality of the high-energy physics dataset from 300 features per jet to a latent space of dimension $\ell$. The resulting latent representation serves as the input to the anomaly detection algorithms.
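The caption fixes only the input size (300 features per jet) and the latent size $\ell$. The NumPy sketch below illustrates that dimensional flow with an untrained dense autoencoder; the hidden width of 64, the ReLU activations, and the class name are hypothetical and need not match the architecture in the figure, and training is omitted.

```python
import numpy as np

class DenseAutoencoder:
    """Untrained toy autoencoder: 300 -> hidden -> latent_dim -> hidden -> 300.
    Layer sizes and activations are hypothetical."""

    def __init__(self, latent_dim, n_inputs=300, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        sizes = [n_inputs, hidden, latent_dim, hidden, n_inputs]
        # (weights, bias) per dense layer, scaled by 1/sqrt(fan_in).
        self.layers = [(rng.normal(0.0, 1.0 / np.sqrt(a), (a, b)), np.zeros(b))
                       for a, b in zip(sizes[:-1], sizes[1:])]

    @staticmethod
    def _forward(x, layers):
        for i, (w, b) in enumerate(layers):
            x = x @ w + b
            if i < len(layers) - 1:  # ReLU on all but the last layer of the stack
                x = np.maximum(x, 0.0)
        return x

    def encode(self, x):
        """Map (n, 300) jet vectors to (n, latent_dim) latent vectors."""
        return self._forward(x, self.layers[:2])

    def decode(self, z):
        """Map (n, latent_dim) latent vectors back to (n, 300)."""
        return self._forward(z, self.layers[2:])

jets = np.random.default_rng(0).normal(size=(4, 300))  # stand-in for 4 flattened jets
latent = DenseAutoencoder(latent_dim=8).encode(jets)    # shape (4, 8)
```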

\textit{Topology of physical qubits}. Layout of the physical qubits on the \texttt{ibm\_toronto} machine. Connections between nodes indicate that a 2-qubit gate can be executed between the neighboring qubits. The eight qubits circled in grey are the ones used to run the quantum kernel machine. Noise levels are color-coded on the nodes for single-qubit gates and on the connections for 2-qubit gates, with lighter colors indicating higher noise.

\textit{Characterization metrics of the data encoding circuit.} The metrics are calculated by sampling the circuit parameters from three different distributions, as indicated in the legends: the uniform distribution in $[0,2\pi]$, the QCD background data distribution, and the signal (anomaly) scalar boson data distribution. (\textbf{a}) The expressibility (Expr) as a function of the different circuit architectures. (\textbf{b}) The entanglement capability of the data encoding circuit ($\langle \mathrm{Q} \rangle$) as a function of the different circuit architectures. (\textbf{c}) The expressibility of the data encoding circuit as a function of the number of qubits ($n_q$). (\textbf{d}) The variance of the kernel, $\mathrm{Var}_{z, z'}k(z,z')$, as a function of the number of qubits, where $k(z,z')$ is the kernel of the data encoding circuit defined in Eq.~\ref{eq:quantum_kernel}, and $z$, $z'$ are data feature vectors sampled from the signal or background distributions.
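Panel (d) estimates $\mathrm{Var}_{z,z'}k(z,z')$ by sampling pairs of feature vectors and computing the empirical variance of the kernel values. The sketch shows that estimator on a toy kernel: a hypothetical product-state encoding $R_Y(z_q)\ket{0}$ on each qubit, for which $k(z,z') = \prod_q \cos^2\big((z_q - z'_q)/2\big)$. It is not the paper's entangling circuit, and features are drawn uniformly in $[0, 2\pi]$ rather than from the signal or background data.

```python
import numpy as np

def product_ry_kernel(z, z_prime):
    """Toy kernel of a hypothetical product-state RY encoding:
    k(z, z') = prod_q cos^2((z_q - z'_q) / 2)."""
    return np.prod(np.cos((z - z_prime) / 2.0) ** 2, axis=-1)

def kernel_variance(n_qubits, n_pairs, rng):
    """Monte Carlo estimate of Var_{z,z'} k(z, z') with features uniform in [0, 2*pi]."""
    z = rng.uniform(0.0, 2.0 * np.pi, size=(n_pairs, n_qubits))
    z_prime = rng.uniform(0.0, 2.0 * np.pi, size=(n_pairs, n_qubits))
    return float(np.var(product_ry_kernel(z, z_prime)))

rng = np.random.default_rng(1)
for n_q in (1, 2, 4, 8):
    # For this toy kernel the exact value is (3/8)**n_q - (1/4)**n_q.
    print(n_q, kernel_variance(n_q, 100_000, rng), (3 / 8) ** n_q - (1 / 4) ** n_q)
```

Even for this simple kernel the variance falls exponentially with $n_q$, which is the kind of concentration panel (d) monitors for the actual encoding circuit.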

\textit{Performance of the unsupervised quantum kernel machine and role of entanglement.} The performance of the unsupervised quantum kernel machine, quantified by $\Delta_\mathrm{QC}$ at $\varepsilon_s=0.6$ for different numbers of qubits $n_q$, is shown as a function of the number of data encoding circuit repetitions (depth) $L$. ``NE'' denotes the case where no entanglement is present in the circuit.
