APS : Transforming a rare event search into a not-so-rare event search in real-time with deep learning-based object detection

Schueler, J.; Majewski, P.A.; Cottle, A.; Cazzaniga, C.; Millins, L.; Balashov, S.N.; Marley, T.; Lilley, S.; Solovov, V.N.; Kudryavtsev, V.A.; Lopes, M.I.; Garcia, F.; Sumner, T.J.; Borg, J.E.; Oliveri, E.; Neep, T.; Knights, P.; Kaboth, A.C.; McCabe, C.; Lopez Asamar, E.; Nakhostin, M.; Dapica, P. Luna; Neves, F.; Lindote, A.; Araújo, H.M.; Tilly, E.; Ropelewski, L.; Nikolopoulos, K.; Kastriotou, M.; Veenhof, R.; Katsioulas, I.; Brunbauer, F.M.; Khazov, A.; Frost, C.D.; Hunt, D.; Loomba, D.; Tarrant, J.; Nandakumar, R.; Turnley, R.; Brew, C.; Mills, A.F.; Lisowska, M.; Kraus, H.

doi:10.1103/PhysRevD.111.072004

Transforming a rare event search into a not-so-rare event search in real-time with deep learning-based object detection - Schueler, J. et al - arXiv:2406.07538

Example outputs of common CNN-based computer vision tasks illustrated on snippets of image data recorded in the MIGDAL detector. Referring to each frame from left to right: (i) Classification maps an image input to a discrete set of outputs. The example shown here is for an image classifier trained for particle identification, where the classifier predicts a nuclear recoil (NR) from neutron scattering in the input image. (ii) Regression maps the input to a continuous set of outputs. The example illustrated here is for a model trained to reconstruct energies, so the regression model reconstructs the energy present in the input image as \SI{525}{keV} of visible energy. (iii) Object detection algorithms simultaneously classify and localize any number of objects in a single image. The two bounding boxes shown were predicted by our trained YOLOv8 pipeline (see Sec.$\,$\ref{sec:YOLOpipeline}) and indicate that the algorithm detected a NR track (red box) and an ER track (pink box). (iv) Key point detection takes object detection a step further and identifies key points within bounding boxes; shown here is an example of particle trajectory fitting with key points. (v) Finally, within each classified bounding box, instance segmentation assigns every pixel as belonging to the object-class or not. This example shows translucent segmentation masks overlaid on the NR and proton within the red and yellow bounding boxes, respectively, designating the pixels that the algorithm assigned as belonging to the track.

Orca-Quest camera image, postprocessed with $4\times 4$ pixel binning and Gaussian smoothing, recorded in the presence of both an $^{55}$Fe calibration x-ray source and neutrons from the D-D generator, with YOLOv8's bounding box predictions shown. This event is illustrative of the characteristic 2D Migdal effect topology where an NR (red box; $\sim$\SI{310}{keV} visible energy) and an ER (pink box; $\sim$\SI{6.0}{keV} visible energy) appear to share the same vertex. In actuality, this event is not the Migdal effect, but rather the coincidental occurrence of an ER spatially overlapping with an NR within the \SI{8.3}{ms} exposure window of the camera.
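As a rough illustration of the $4\times 4$ pixel binning mentioned in this caption, the sketch below sums non-overlapping $4\times 4$ blocks of a frame stored as a list of rows. The function name `bin_image`, the choice to sum rather than average, and the dropping of incomplete edge blocks are assumptions made for illustration; the caption does not specify them, and the Gaussian smoothing step is not shown.

```python
def bin_image(image, factor=4):
    """Sum non-overlapping factor x factor blocks of a 2D list of pixel values.

    Assumption: binning sums pixel intensities; rows and columns that do not
    fill a complete block at the image edge are dropped.
    """
    n_rows = len(image) // factor
    n_cols = len(image[0]) // factor if image else 0
    binned = [[0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows * factor):
        row = image[r]
        for c in range(n_cols * factor):
            # Each input pixel contributes to exactly one output pixel.
            binned[r // factor][c // factor] += row[c]
    return binned


# An 8 x 8 frame of ones bins down to a 2 x 2 frame of 16s.
frame = [[1] * 8 for _ in range(8)]
print(bin_image(frame))
```

Binning of this kind trades spatial resolution for fewer pixels per frame and more signal per pixel.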

Two consecutive camera frames at \SI{20}{ms} exposure (left and middle panels) and their sum (right) visualized with a logarithmic intensity scale. The pink, yellow, and red bounding boxes represent YOLOv8's bounding box predictions for an ER, a proton, and an NR, respectively (see Sec.$\,$\ref{sec:YOLOpipeline} for YOLO pipeline details). The white bounding boxes in the left and middle panels show YOLOv8's prediction that the rolling shutter clipped the NR and also estimate where the track was clipped. Summing these two frames together recovers the clipped NR at the expense of signal-to-noise ratio. The faint long track inside the yellow dashed bounding box shows an example of a proton ghost.

Snippet of the Label Studio front-end interface showing a logarithmic-intensity-scale image with preannotated bounding boxes and associated classification labels and scores, produced by a YOLO model pretrained on the training data from Table$\,$\ref{tab:i}.

Diagrammatic representation of $\mathrm{IoU}(B_1,B_2)$. $B_1$ and $B_2$ are distinct bounding boxes with their overlap region shaded in red.
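For reference, the standard definition is $\mathrm{IoU}(B_1,B_2)=|B_1\cap B_2|/|B_1\cup B_2|$, the overlap area divided by the area of the union. A minimal sketch for axis-aligned boxes follows; the `(x_min, y_min, x_max, y_max)` box layout and the function name `iou` are assumptions made for illustration, not the paper's code.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x_min, y_min, x_max, y_max) tuples (an assumed layout).
    """
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


# Two 2 x 2 boxes sharing a 1 x 1 corner: IoU = 1 / (4 + 4 - 1) = 1/7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

An IoU of 1 means the boxes coincide and 0 means they do not overlap at all.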

Unprocessed frame (linear intensity scale) used for the high occupancy batch with YOLO's bounding box predictions shown. The pink bounding box is an ER prediction and the red bounding boxes are NR predictions.

Pipeline processing rate as a function of number of frames processed for a batch with events corresponding to a typical \SI{20}{ms} exposure run (filled points), and our custom high occupancy batch (unfilled points). Downsampling, which we parallelize into three subprocesses, is the performance bottleneck for the typical run sample, leading to a larger variance in when the inputs of Process 2 are generated, hence the larger spread in overall processing rates for this sample.

Simulated ERs, measured NRs, and hybrids plotted on logarithmic intensity scales. Each row shows an example of stitching together the vertex of a simulated ER with the point of highest intensity of a measured NR to generate a hybrid Migdal event. Left column: Simulated ER tracks with gain scaling but no noise; truth ER vertices are shown in white. Middle column: Measured NRs with their point of highest intensity shown in white. Right column: Hybrid Migdal event formed by translating the ER so its vertex aligns with the point of highest intensity of the NR; vignetting scaling is applied to the ER once it is aligned with the NR. The bounding box predictions of YOLO trained on the Augment sample (Sec.$\,$\ref{subsec:SimResults}) are shown for the three hybrids with red (pink) boxes denoting NR (ER) predictions. The top and bottom rows show examples of positive detections. YOLO did not identify the ER in the middle-row hybrid, indicating a negative detection. The fractions of ER significant pixels, $\rm f_{sigpix}$ (see Sec.$\,$\ref{subsec:SimResults2}), in the top, middle, and bottom-row hybrids are 0.42, 0.27, and 0.10, respectively.

Top: Simulated Migdal (bars) and coincidence (solid curves) detection efficiency as defined in Eq.$\,$(\ref{eq:3.4}) versus ER energy, integrated over all NR energies. Bottom: Migdal detection efficiency versus NR energy for pure simulated and hybrid Migdal events with $\SI{5.0}{keV_{ee}}\leq E_\mathrm{ER}\leq\SI{6.0}{keV_{ee}}$. Simulated NR energies are ground truth energies with SRIM quenching factors applied, which is why the pure simulation histogram terminates at a lower NR energy than the hybrid simulation histograms. The training and evaluation samples for each color plotted in this figure are summarized in Table$\,$\ref{tab:iii}.

Distribution of $\rm f_{sigpix}$ (gray bars; left vertical axis) and Migdal detection efficiency versus $\rm f_{sigpix}$ (red histogram; right vertical axis) of the $\SI{5.0}{keV_{ee}}\leq E_\text{ER}\leq\SI{6.0}{keV_{ee}}$-subset of the hybrid Migdal simulation set trained on the Augment training sample.

Comparison of bounding box overlap between ground truth and YOLO's predictions for positive detections in the hybrid Migdal (black) and hybrid coincidence (red) simulation samples. The bold curves represent ER IoU overlap [$\mathrm{IoU}(B^\mathrm{ER}_\mathrm{p},B^\mathrm{ER}_\mathrm{t})$], while the thin curves represent NR IoU overlap [$\mathrm{IoU}(B^\mathrm{NR}_\mathrm{p},B^\mathrm{NR}_\mathrm{t})$].

Random coincidence background rejection and acceptance of simulated Migdal signal detected by YOLO shown at integral steps of $d(\overline{b}_\mathrm{ER},b^*_\mathrm{NR})$ -- the maximum allowed distance between the centroid of the ER bounding box and point of highest intensity of the NR bounding box. The zoomed-in inset shows the performance near the $d(\overline{b}_\mathrm{ER},b^*_\mathrm{NR})\leq\SI{5}{mm}$ optimum. Red dashed lines denote the background rejection and signal retention at this optimum.
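The selection described in this caption can be sketched as follows, under stated assumptions: boxes are integer `(x_min, y_min, x_max, y_max)` pixel bounds with exclusive upper edges, the image is a list of rows, and pixel distances are converted to millimetres with the \SI{3.1}{mm} per twenty pixel figure quoted elsewhere in this record for $4\times 4$-binned frames. Whether the paper computes the distance in that binned space is not stated here, and all function names are hypothetical.

```python
import math

# Assumption: conversion quoted for 4x4-binned frames (3.1 mm per 20 pixels).
MM_PER_PIXEL = 3.1 / 20.0


def box_centroid(box):
    """Centre (x, y) of a box given as (x_min, y_min, x_max, y_max)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def brightest_pixel(image, box):
    """(x, y) of the highest intensity pixel inside a non-empty box."""
    x0, y0, x1, y1 = box
    best = (x0, y0)
    for y in range(y0, y1):
        for x in range(x0, x1):
            if image[y][x] > image[best[1]][best[0]]:
                best = (x, y)
    return best


def passes_distance_cut(image, er_box, nr_box, max_mm=5.0):
    """True if the ER centroid lies within max_mm of the NR's brightest pixel."""
    cx, cy = box_centroid(er_box)
    px, py = brightest_pixel(image, nr_box)
    d_mm = math.hypot(cx - px, cy - py) * MM_PER_PIXEL
    return d_mm <= max_mm


# Toy frame with a single bright pixel standing in for the NR intensity peak.
image = [[0] * 50 for _ in range(50)]
image[10][10] = 5
print(passes_distance_cut(image, (8, 8, 14, 14), (5, 5, 20, 20)))
```

Scanning `max_mm` over integer values and counting the surviving signal and background frames would produce curves of the kind shown in the figure.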

Snapshot of a live online display that updates with 600 images' worth of data every five seconds. The top banner shows updating counters of objects of interest accumulated over the course of the run. The left plot shows energy versus 2D length of tracks with color representing YOLO's classification assignments of track species of interest. The right plot shows the energy spectrum of NRs that YOLO identified.

Distribution of $\rm f_{sigpix}$ (gray bars; left vertical axis) and Migdal detection efficiency versus $\rm f_{sigpix}$ (red histogram; right vertical axis) of the hybrid Migdal simulation set trained on the Augment training sample.

Selection of six event displays satisfying all search criteria applied in Table$\,$\ref{tab:v}. The frames are $4\times 4$-binned with Gaussian smoothing applied, leading to a \SI{3.1}{mm} per twenty pixel conversion. NR bounding boxes are colored red and ER bounding boxes are colored pink. Clockwise starting from the upper left frame, the reconstructed energies $\{E_\mathrm{NR},E_\mathrm{ER}\}$, in units of \SI{}{keV_{ee}}, within YOLO's predicted bounding boxes are (i) $\{140,5.8\}$, (ii) $\{68,5.3\}$, (iii) $\{110,5.9\}$, (iv) $\{160,5.4\}$, (v) $\{170,5.3\}$, and (vi) $\{200,5.3\}$.

Intensity spectrum of simulated $^{55}$Fe tracks identified by YOLO after applying gain scaling, vignetting, and adding noise. Data-driven vignetting corrections are computed following the procedure outlined in Appendix$\,$\ref{sec:A1} to achieve the shown spectrum. The solid portion of the fit curve marks our fit region for a single-peak Gaussian.

Surviving coincidence-background frame counts, $\rm N_{bg}$, versus Migdal signal efficiency as a function of distance between the bounding box centroids of YOLO's ER and NR predictions, $d(\overline{b}_\mathrm{ER},\overline{b}_\mathrm{NR})$, evaluated on around 25000 detected hybrid coincidence frames and about 27000 hybrid Migdal frames. The number above each curve represents the threshold fraction of significant ER pixels, $\mathrm{\widehat{f}_{sigpix}}$, for that curve. For example, the leftmost curve represents $\rm f_{sigpix}\geq 0$ and the next curve represents $\rm f_{sigpix}> 0$. For all other curves we use selections greater than or equal to the number shown, so the rightmost curve represents $\rm f_{sigpix}\geq 0.4$. Point sizes differ between the six curves for visual clarity.

Top: Electron recoil intensity spectrum for a D-D run with the $^{55}$Fe source present (``DD$\rm+^{55}$Fe''; black bars) and a D-D run without the $^{55}$Fe source present (``DD only''; blue shaded) scaled to the equivalent elapsed time of the D-D$\rm+^{55}$Fe run. Bottom: Recovered $^{55}$Fe spectrum after subtracting the time-scaled D-D run spectrum, taken as background, from the D-D$\rm+^{55}$Fe signal spectrum. The solid portion of the fit curve marks our fit region for a single-peak Gaussian.
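The time-scaled background subtraction described in this caption can be sketched as below. It assumes both spectra are histogrammed with identical binning and that the scale factor is simply the ratio of elapsed run times; the function and parameter names are hypothetical.

```python
def subtract_scaled_background(signal_counts, background_counts,
                               signal_time, background_time):
    """Subtract a background histogram scaled to the signal run's elapsed time.

    Assumptions: both histograms share the same bin edges, and the
    background rate is constant so counts scale linearly with run time.
    """
    if len(signal_counts) != len(background_counts):
        raise ValueError("histograms must have the same number of bins")
    scale = signal_time / background_time
    return [s - scale * b for s, b in zip(signal_counts, background_counts)]


# A background run half as long as the signal run is scaled up by 2.
print(subtract_scaled_background([10, 20, 30], [2, 4, 6], 100.0, 50.0))
```

Bins in the result can go negative through statistical fluctuations, which is expected for a subtracted spectrum.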

Localization and detection efficiencies versus truth energy (SRIM quenching factors applied to NR energies) for frames containing single simulated tracks. For ERs, $\mathrm {\varepsilon_{det}}=\mathrm {\varepsilon_{local}}$, so we do not include a separate $\rm \varepsilon_{local}$ trace for ERs.

Left: Effective intensity map generated from $^{55}$Fe x-rays. Right: $^{55}$Fe spectrum before (pale blue bars) and after (black bars with fit) applying the vignetting correction map.

Event displays with YOLO's predicted bounding boxes for six randomly selected frames in our Migdal search sample drawn from events satisfying all criteria applied in Table$\,$\ref{tab:v}. The frames are $4\times 4$ binned with Gaussian smoothing applied, leading to a \SI{3.1}{mm} per twenty pixel conversion. NR bounding boxes are colored red and ER bounding boxes are colored pink. Clockwise starting from the upper left frame, the reconstructed energies $\{E_\mathrm{NR},E_\mathrm{ER}\}$, in units of \SI{}{keV_{ee}}, within YOLO's predicted bounding boxes are (i) $\{140,5.8\}$, (ii) $\{61,4.9\}$, (iii) $\{64,5.6\}$, (iv) $\{170,4.3\}$, (v) $\{82,5.4\}$, and (vi) $\{290,4.1\}$.
