APFELgrid: a high-performance tool for parton density determinations

We present a new software package designed to reduce the computational burden of hadron collider measurements in Parton Distribution Function (PDF) fits. The APFELgrid package converts interpolated weight tables provided by APPLgrid files into a more efficient format for PDF fitting by combining them with the PDF and $\alpha_s$ evolution factors provided by APFEL. This combination significantly reduces the number of operations required to calculate hadronic observables in PDF fits and simplifies the structure of the calculation into a readily optimised scalar product. We demonstrate that our technique can lead to a substantial speed improvement when compared to existing methods without any reduction in numerical accuracy.


Introduction
Measurements at colliders such as the Tevatron and Large Hadron Collider (LHC) have a unique capacity to shed light upon the internal dynamics of the proton and provide constraints upon proton PDFs [1]. However, including large hadron collider datasets in PDF fits can pose a significant challenge, owing to the large computational footprint of performing accurate theoretical predictions over the many iterations required in a fitting procedure. In order to make the fullest use of current and future LHC results, efficient strategies for the computation of these observables must therefore be employed.
The typical Monte Carlo software packages used to perform predictions for hadron collider observables cannot be easily deployed in a PDF fit due to the processing time required to obtain accurate results (usually of the order of a few hours or more per data point). To overcome such limitations, the typical strategy adopted for fast cross section prediction relies on the precomputation of the partonic hard cross sections in such a way that the standard numerical convolution with any set of PDFs can be reliably approximated by means of interpolation techniques.
Such interpolation strategies are implemented in the APPLgrid [2] and FastNLO [3] projects. For the computation of the hard cross sections, these packages rely on external codes to which they are interfaced by means of a suite of functions allowing for the filling of PDF- and $\alpha_s$-independent look-up tables of cross section weights. Monte Carlo programs such as MCFM [4] and NLOJet++ [5] have been interfaced directly to APPLgrid/FastNLO, and more recently dedicated interfaces to automated general-purpose event generators have been developed. The aMCfast [6] and MCgrid [7] codes can generate interpolation grids in APPLgrid/FastNLO format by extracting the relevant information from the MadGraph5_aMC@NLO [8] and SHERPA [9] event generators respectively.
While these tools have proven to be invaluable in the extraction of parton densities, the volume of experimental data made available by LHC collaborations for use in PDF fits is already stretching the capabilities of the typical fitting technology. A standard global PDF fit may now include thousands of hadronic data points for which predictions have to be computed thousands of times during the minimisation process. As a consequence, performing these predictions using the standard interpolating tools, i.e. APPLgrid and FastNLO, starts to become prohibitively time-consuming. For this reason a high-performance tool tailored specifically to the requirements of PDF analysis becomes increasingly important.
The FastKernel method was developed to address this problem in the context of the NNPDF global analyses [10]. This method differs from the standard procedure à la APPLgrid or FastNLO in that it maximises the amount of information that is precomputed prior to fitting so as to minimise the number of operations required during the fit. More specifically, the FastKernel method relies on the combination of precomputed hard cross sections with DGLAP evolution kernels into a single look-up table, here called a FastKernel (FK) table. In this way the prediction for a given hadronic observable can be obtained by performing a simple matrix product between the respective FK table and PDFs evaluated directly at the fitting scale.
In this paper we present the APFELgrid package, a public implementation of the FastKernel method in which the hard partonic cross sections provided in an APPLgrid look-up table are combined with the DGLAP evolution kernels provided by the APFEL package [11].
This paper proceeds as follows. In Sect. 2 we present the technical details of the implementation of the FastKernel method. This is followed in Sect. 3 by a performance benchmark of the APFELgrid library and resulting FK tables. Finally, in Sect. 4 we summarise the results discussed in this work.

Interpolation tools for collider observables
Hadron collider observables are typically computed in QCD by means of a double convolution of parton densities with a hard scattering cross section. Consider for example the calculation of a general cross section $pp \to X$ with a set of PDFs $\{f\}$:

$$\sigma_{pp\to X} = \sum_{s}\sum_{p}\int dx_1\,dx_2\;\hat{\sigma}^{(p)(s)}\,\alpha_s^{p+p_{\rm LO}}(Q^2)\,F^{(s)}(x_1,x_2,Q^2), \tag{1}$$

where $Q^2$ is the typical hard scale of the process, the index $s$ sums over the active partonic subprocesses in the calculation, $p$ sums over the perturbative orders used in the expansion, $p_{\rm LO}$ is the leading-order power of $\alpha_s$ for the process and $\hat{\sigma}^{(p)(s)}$ is the N$^p$LO contribution to the cross section for the partonic subprocess scattering $(s) \to X$. $F^{(s)}$ represents the subprocess parton density:

$$F^{(s)}(x_1,x_2,Q^2) = \sum_{i,j} C^{(s)}_{ij}\,f_i(x_1,Q^2)\,f_j(x_2,Q^2), \tag{2}$$

where the $C^{(s)}_{ij}$ matrix enumerates the combinations of PDFs contributing to the $s$-th subprocess.

The central observation of tools such as APPLgrid and FastNLO is that the PDF and $\alpha_s$ dependence may be factorised out of the convolution via expansion over a set of interpolating functions, spanning $Q^2$ and the two values of parton-$x$. For example one may represent the subprocess PDFs and $\alpha_s$ in terms of Lagrange basis polynomials $I_\tau(Q^2)$, $I_\alpha(x_1)$ and $I_\beta(x_2)$ as:

$$\alpha_s^{p+p_{\rm LO}}(Q^2)\,F^{(s)}(x_1,x_2,Q^2) = \sum_{\alpha,\beta,\tau} \alpha_s^{p+p_{\rm LO}}(Q^2_\tau)\,F^{(s)}_{\alpha\beta,\tau}\,I_\tau(Q^2)\,I_\alpha(x_1)\,I_\beta(x_2), \tag{3}$$

where we use the shorthand $F^{(s)}_{\alpha\beta,\tau} = F^{(s)}(x_\alpha,x_\beta,Q^2_\tau)$. Using these expressions in the double convolution of Eq. (1) one can finally obtain an expression for the desired cross section which depends upon the subprocess PDFs only through a simple product:

$$\sigma_{pp\to X} = \sum_{p}\sum_{s}\sum_{\alpha,\beta,\tau} \alpha_s^{p+p_{\rm LO}}(Q^2_\tau)\,W^{(p)(s)}_{\alpha\beta,\tau}\,F^{(s)}_{\alpha\beta,\tau}, \tag{4}$$

where $W^{(p)(s)}_{\alpha\beta,\tau}$ consists of the convolution of the hard cross section with the interpolating polynomials. This information may be stored in a precomputed look-up table. Evaluating the final expression for the cross section in Eq. (4) is therefore a considerably simpler task to perform inside a fit than the direct evaluation of the double convolution.
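The interpolation idea can be illustrated with a one-dimensional toy sketch. Everything below is an assumption made for illustration (the grid `NODES`, the function `hard_xsec`, the polynomial `toy_pdf` and the midpoint quadrature); it is not the APPLgrid implementation. The weights are integrated once against Lagrange basis polynomials, after which any PDF only needs to be evaluated at the grid nodes.

```python
def lagrange_basis(nodes, a, x):
    """Lagrange basis polynomial I_a(x): 1 at nodes[a], 0 at the other nodes."""
    value = 1.0
    for b, x_b in enumerate(nodes):
        if b != a:
            value *= (x - x_b) / (nodes[a] - x_b)
    return value


def midpoint_integral(func, lo, hi, steps=2000):
    """Midpoint-rule quadrature, standing in for the expensive integration."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (i + 0.5) * h) for i in range(steps))


def hard_xsec(x):
    """Toy stand-in for a partonic cross section (not a physical formula)."""
    return 1.0 / (1.0 + x)


NODES = [0.1, 0.3, 0.5, 0.7, 0.9]  # hypothetical interpolation grid in x
LO, HI = NODES[0], NODES[-1]

# Done once, before any fit: W_a = integral of hard_xsec(x) * I_a(x).
WEIGHTS = [
    midpoint_integral(
        lambda x, a=a: hard_xsec(x) * lagrange_basis(NODES, a, x), LO, HI
    )
    for a in range(len(NODES))
]


def fast_prediction(pdf):
    """Grid product: only the PDF values at the nodes are needed."""
    return sum(w * pdf(x_a) for w, x_a in zip(WEIGHTS, NODES))


def direct_prediction(pdf):
    """Full convolution, which would be repeated for every PDF variation."""
    return midpoint_integral(lambda x: hard_xsec(x) * pdf(x), LO, HI)


def toy_pdf(x):
    """Degree-4 polynomial, so five nodes interpolate it exactly."""
    return x * (1.0 - x) ** 3


print(fast_prediction(toy_pdf), direct_prediction(toy_pdf))
```

Because the toy PDF is a polynomial of degree four, the five-node interpolation is exact and the two numbers agree to rounding error. For a realistic PDF the agreement is limited by the grid density.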

The FastKernel method
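A number of tools (e.g. APFEL, HOPPET [12] and QCDNUM [13]) are available which perform PDF evolution via an analogous interpolation procedure. In such a way PDFs at a general scale $Q_\tau$ may be expressed as a product of PDFs at some initial fitting scale $Q_0$ and an evolution operator obtained by the solution of the DGLAP equation:

$$f_i(x_\alpha,Q^2_\tau) = \sum_{j}\sum_{\beta} A^{\tau}_{\alpha\beta,ij}\,f_j(x_\beta,Q^2_0), \tag{6}$$

where latin indices run over PDF flavour, greek indices run over points in an initial-scale interpolating $x$-grid and the evolution operator $A$ may be accessed directly in the APFEL package. Given this operator, we may replace the (general-scale) PDFs used in the subprocess parton density Eq. (2) with their equivalent expressions evaluated at the fitting scale as

$$F^{(s)}_{\alpha\beta,\tau} = \sum_{k,l}\sum_{\gamma,\delta} \widetilde{C}^{(s),\tau}_{kl,\alpha\beta\gamma\delta}\,f_k(x_\gamma,Q^2_0)\,f_l(x_\delta,Q^2_0), \tag{7}$$

with the object $\widetilde{C}^{(s),\tau}_{kl,\alpha\beta\gamma\delta} = \sum_{i,j} C^{(s)}_{ij}\,A^{\tau}_{\alpha\gamma,ik}\,A^{\tau}_{\beta\delta,jl}$ combining the operations of subprocess density construction and PDF evolution. Going further and using the expression for subprocess parton densities in Eq. (7) in the full cross section calculation we obtain

$$\sigma_{pp\to X} = \sum_{p}\sum_{s}\sum_{\alpha,\beta,\tau} \alpha_s^{p+p_{\rm LO}}(Q^2_\tau)\,W^{(p)(s)}_{\alpha\beta,\tau} \sum_{k,l}\sum_{\gamma,\delta} \widetilde{C}^{(s),\tau}_{kl,\alpha\beta\gamma\delta}\,f_k(x_\gamma,Q^2_0)\,f_l(x_\delta,Q^2_0).$$

Performing some further contractions it is possible to obtain an extremely compact expression for the calculation of the cross section in question, in terms of only the initial-scale PDFs and summing only over the initial-scale interpolating $x$-grid and the incoming parton flavours:

$$\sigma_{pp\to X} = \sum_{k,l}\sum_{\gamma,\delta} \widetilde{W}_{kl,\gamma\delta}\,f_k(x_\gamma,Q^2_0)\,f_l(x_\delta,Q^2_0), \tag{10}$$

where the object

$$\widetilde{W}_{kl,\gamma\delta} = \sum_{p}\sum_{s}\sum_{\alpha,\beta,\tau} \alpha_s^{p+p_{\rm LO}}(Q^2_\tau)\,W^{(p)(s)}_{\alpha\beta,\tau}\,\widetilde{C}^{(s),\tau}_{kl,\alpha\beta\gamma\delta} \tag{11}$$

is referred to here as an FK table, and combines the information stored in APPLgrid-style interpolated weight grids with analogously interpolated DGLAP evolution operators. This combination allows for a maximally efficient expression for the calculation of observables at hadron colliders under PDF variation, without invoking any additional approximation.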
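The contraction described above can be checked numerically on toy tensors. The sketch below is a minimal illustration, not the APFELgrid code: all dimensions, the random tensor entries and the two $\alpha_s$ values are assumptions, and the names (`W`, `C`, `A`, `build_fk` and so on) are hypothetical. It builds an FK table from a weight grid and an evolution operator, then confirms that the FK product with initial-scale PDFs reproduces the APPLgrid-style product with evolved PDFs.

```python
import itertools
import random

rng = random.Random(1)
# Toy sizes: orders, subprocesses, x nodes, Q nodes, flavours, initial x nodes.
NP, NS, NX, NQ, NF, NX0, P_LO = 2, 2, 3, 2, 2, 3, 0


def tensor(*shape):
    """Nested lists of toy random numbers with the given shape."""
    if len(shape) == 1:
        return [rng.uniform(-1.0, 1.0) for _ in range(shape[0])]
    return [tensor(*shape[1:]) for _ in range(shape[0])]


W = tensor(NP, NS, NX, NX, NQ)   # weight grid W[p][s][a][b][t]
C = tensor(NS, NF, NF)           # subprocess matrix C[s][i][j]
A = tensor(NQ, NX, NX0, NF, NF)  # evolution operator A[t][a][g][i][k]
ALPHA_S = [0.118, 0.110]         # toy coupling values at the two Q nodes
GRID = list(itertools.product(range(NP), range(NS), range(NX), range(NX), range(NQ)))


def applgrid_style(f0):
    """Evolve the PDFs to each hard scale, then contract with the weights."""
    f = [[[sum(A[t][a][g][i][k] * f0[k][g]
               for k in range(NF) for g in range(NX0))
           for a in range(NX)] for i in range(NF)] for t in range(NQ)]
    total = 0.0
    for p, s, a, b, t in GRID:
        dens = sum(C[s][i][j] * f[t][i][a] * f[t][j][b]
                   for i in range(NF) for j in range(NF))
        total += ALPHA_S[t] ** (p + P_LO) * W[p][s][a][b][t] * dens
    return total


def build_fk():
    """Precompute fk[k][l][g][d], absorbing the coupling, W, C and A."""
    fk = [[[[0.0] * NX0 for _ in range(NX0)] for _ in range(NF)] for _ in range(NF)]
    flav = list(itertools.product(range(NF), repeat=4))
    for p, s, a, b, t in GRID:
        pref = ALPHA_S[t] ** (p + P_LO) * W[p][s][a][b][t]
        for (i, j, k, l), g, d in itertools.product(flav, range(NX0), range(NX0)):
            fk[k][l][g][d] += pref * C[s][i][j] * A[t][a][g][i][k] * A[t][b][d][j][l]
    return fk


def fk_product(fk, f0):
    """FK product: only initial-scale PDFs enter at fit time."""
    return sum(fk[k][l][g][d] * f0[k][g] * f0[l][d]
               for k, l, g, d in itertools.product(
                   range(NF), range(NF), range(NX0), range(NX0)))


FK = build_fk()                  # done once, before the fit
f0_trial = tensor(NF, NX0)       # one trial set of initial-scale PDFs
print(applgrid_style(f0_trial), fk_product(FK, f0_trial))
```

The table `FK` is built once and is independent of the PDFs, so the same table can be reused for every trial PDF in a fit. The two printed values agree up to floating-point rounding.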

Features and limitations of FK tables
The FK product of Eq. (10) differs from the product in Eq. (4) in several notable ways. Firstly, the typical APPLgrid or FastNLO products use as input PDFs at a general scale, requiring that PDF evolution, e.g. Eq. (6), be performed for every variation of the PDFs during the fit. In the FK product this evolution is pre-cached at the stage of FK table generation, requiring only initial-scale PDFs at the time of fitting. This pre-caching of the evolution also removes the need to sum over hard scale and perturbative order during the fit, further reducing the number of operations required. As the FK product acts directly at the fitting scale, it benefits from the typically reduced number of active partonic modes, with the sum over flavours in Eq. (10) being limited to those directly parametrised in the fit. Having reduced the calculation to such a simple form, it is also straightforward to apply standard computational tools such as multi-threading through e.g. OpenMP, or Single Instruction Multiple Data (SIMD) operations such as SSE or AVX, to further reduce computational expense.
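One way to picture why this form lends itself to SIMD and multi-threading is to flatten the flavour and $x$-grid indices into a single index, so that each data point becomes one scalar product over contiguous arrays. The sketch below uses assumed toy dimensions and random entries, and the flattened layout is a hypothetical choice for illustration rather than the actual FK table file format.

```python
import itertools
import random

rng = random.Random(3)
NF, NX0, NDATA = 2, 3, 4  # toy sizes: flavours, x nodes, data points

# Fixed ordering of the combined index (k, l, gamma, delta).
INDEX = list(itertools.product(range(NF), range(NF), range(NX0), range(NX0)))

# Toy FK entries keyed by (data point, k, l, gamma, delta).
fk = {(n,) + idx: rng.uniform(-1.0, 1.0) for n in range(NDATA) for idx in INDEX}
# Toy initial-scale PDFs f0[flavour][x node].
f0 = [[rng.uniform(0.0, 1.0) for _ in range(NX0)] for _ in range(NF)]

# One contiguous row per data point.
fk_rows = [[fk[(n,) + idx] for idx in INDEX] for n in range(NDATA)]

# The PDF products are shared by every data point, so they are formed once
# per trial PDF set.
pdf_products = [f0[k][g] * f0[l][d] for k, l, g, d in INDEX]

# Each prediction is a plain scalar product; the rows are independent of one
# another, so the outer loop could be split across threads.
predictions = [sum(w * q for w, q in zip(row, pdf_products)) for row in fk_rows]
print(predictions)
```

In a compiled implementation the inner sum maps directly onto vector multiply-add instructions, while the loop over data points has no dependencies between iterations.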
While these features provide significant performance enhancements, the FK table format is not suitable as a complete replacement for tools such as APPLgrid. The precomputation of the PDF evolution necessarily means that all theory parameters such as perturbative order, strong coupling and factorization/renormalization scales are inextricably embedded in each FK table. In order to perform PDF fits including variations of these parameters, multiple FK tables must be computed, each with different theory settings. While performing such a re-calculation directly from Monte Carlo codes would be exceptionally time consuming, the data representation in APPLgrid files allows for an efficient (re)-combination with varying theory parameters.

Performance benchmarks
We shall now examine the performance differences between the two expressions for fast interpolated cross section prediction, Eq. (4) (APPLgrid) and Eq. (10) (FK). In order to provide a comprehensive benchmark, we consider here a wide range of processes including LHC and Tevatron electroweak vector boson production measurements [14-22], $t\bar{t}$ total cross sections [23-26], double-differential Drell-Yan cross sections [27,28] and inclusive jet data [29-32]. Predictions are performed over a wide range of kinematics, for a total of 52 source APPLgrid files corresponding to the majority of available LHC and Tevatron datasets applicable to PDF determination. While the source APPLgrid files have varying grid densities in $x$ and $Q^2$, for the purposes of comparison the corresponding FK tables are produced consistently with 30 points in $x$, and at an initial scale below the charm threshold, therefore with seven active partonic species. These settings are chosen so as to provide a realistic comparison; in a production environment the density and distribution of the $x$-grid may be adjusted to match interpolation accuracy requirements. For these comparisons the FK table is stored in double precision in memory for table generation and in single precision for the purposes of computing the FK product.
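As a rough guide to the scale of the FK product under these settings, the number of terms per data point can be estimated from the quoted grid parameters. The sketch below assumes a fully dense table, ignoring any zeros or symmetries that a real table might exploit, so it should be read as an upper bound rather than a measured size.

```python
N_X = 30             # points in the initial-scale x-grid (from the text)
N_FLAVOURS = 7       # active partonic species below the charm threshold
BYTES_SINGLE = 4     # IEEE 754 single precision
BYTES_DOUBLE = 8     # IEEE 754 double precision

# Dense count: one entry per pair of (flavour, x node) combinations.
entries_per_point = (N_FLAVOURS * N_X) ** 2
size_single = entries_per_point * BYTES_SINGLE
size_double = entries_per_point * BYTES_DOUBLE

print(entries_per_point, size_single, size_double)
# prints: 44100 176400 352800
```

Under this dense assumption each data point involves at most 44100 multiply-add terms, and halving the storage precision halves the memory traffic of the product.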
In Fig. 1 we compare the average time taken per datapoint for the FK and APPLgrid calculations, for all of the 52 tables. We show timings for the FK calculation in four different configurations: AVX-OpenMP 2x (2 CPU cores), AVX, SSE3 and the standard double-precision product. Due to the inherent structural differences between the FK and APPLgrid procedures, results from the FK calculation are systematically faster than those from APPLgrid. In particular, when comparing FK AVX-OpenMP 2x to APPLgrid timings we obtain at least a factor of ten improvement in speed for electroweak vector boson production and up to a factor of 2000 improvement in predictions for inclusive jet data. Across all processes and kinematic regions we observe significant performance improvements from using the FK calculation even without the use of SIMD or multi-threading.
While sheer computational speed is typically the primary consideration when computing observables in a PDF fit, other factors such as table size in the filesystem and in memory, along with the computational cost of pre-computing FK tables, must also be considered. Indeed, the computation of the FK table in Eq. (11) requires a large number of operations, which can be time consuming, particularly in the case of source APPLgrid files with very high interpolation precision.
In Fig. 2 we examine the FK table generation time with APFELgrid, the FK table file size and the memory usage of the FK tables arising from the same source APPLgrid files as discussed in Fig. 1. When examining the table generation time per point, we observe timings from a few milliseconds to 3.5 minutes per point, with differences arising from the varying grid densities used in the input APPLgrid files. In terms of the grid size on disk, FK tables are typically larger than their corresponding APPLgrid files, primarily because the FK file format is encoded in plain text for compatibility whereas APPLgrid files are stored in binary as ROOT files. However, when measuring the in-memory resident set size used by the two procedures, the amount of system memory used by FK tables is systematically lower than that used by APPLgrid files, by at least 75% for all processes considered here. Note that this effect is in part due to the differing precisions of the default representations.

Conclusion
In this work we have demonstrated that by employing the so-called FastKernel method, it is possible to convert an APPLgrid weight table into a derived format, referred to as an FK table, including the effects of PDF and $\alpha_s$ evolution. This procedure has been implemented in the APFELgrid package, supplied as a set of C++ routines designed to supplement the PDF evolution library APFEL with FK table generation capabilities. The APFELgrid package allows one to obtain a computationally efficient expression for the calculation of hadronic cross sections, in terms of only the initial-scale PDFs and summing only over the initial-scale interpolating x-grid and the incoming parton flavours. The simple structure of the resulting product makes FK tables particularly suitable for the efficient use of computational tools such as SIMD and OpenMP.
We have shown that in several practical examples the numerical evaluation of an FK product is considerably faster than the corresponding APPLgrid product, even in the case where neither SIMD nor multi-threading is applied. FK tables are supplied in a simple plain-text format in order to simplify the construction of user interfaces, and therefore are typically larger than corresponding APPLgrids. However, we have shown that the in-memory resident set sizes occupied by FK tables are typically smaller than those required by APPLgrids, in our examples by at least 75%.
The substantial speed improvement of FK tables with respect to APPLgrid in association with the reduction in memory footprint makes the APFELgrid code a valuable tool for modern PDF fits including large collider datasets.