OFFLINE

Introduction

Since the last CMS Bulletin report, in December 2010, the LHC has mostly been in winter shutdown, and thus not delivering luminosity to the CMS Experiment. Nevertheless, activities on the Offline side have been frenetic, and the system that started taking data on 14th March is significantly different from the one that ended the 2010 run with heavy-ion (HI) collisions.

Activities after the machine shutdown have followed two distinct paths: on the one hand we wanted to close the book on the 2010 data, with a complete data and Monte Carlo reprocessing using the latest available calibrations and algorithms; on the other we wanted to improve our system in view of the 2011 LHC Run.

The CMSSW_3_9_X cycle had already been deployed successfully both Online and Offline for the HI run. After a tight agenda of calibrations and new algorithm development, it was also used for the complete end-of-year reprocessing of the 2010 data, which was launched before Christmas and essentially completed by the end of the vacation period. At the end of February, a release from the same cycle was used to reprocess the HI data and form the Tracker Zero-Suppressed Dataset, which will serve as the basis of future analyses. When CMSSW_3_9 entered production in early November, the next cycle, 3_10, started. Its goals were somewhat limited with respect to previous cycles, since the release was to be used to start the generation and simulation of 2011 LHC collisions at 8 TeV. The new version of Geant4 (v9.4) was adopted to take advantage of the most recent developments, and a huge production of simulated events was launched during the Christmas break.

The Offline Project had two major goals to accomplish before the restart of data-taking in 2011. In order to squeeze the last drop of CPU performance out of our code, and in particular to help the trigger selection, a switch to 64-bit compilation was requested. In addition, the version of ROOT underlying our code needed to be upgraded, to benefit from improvements in I/O performance and from new features for managing schema changes in data formats. The two tasks required hectic activity up until the beginning of March, with frequent special releases that were used by the physics groups to compare and validate results. The transition to 64-bit compilation was realised and validated in the 3_11 cycle by the middle of February, and was deployed on Tier 0 before the start of the 2011 Cosmics Run; the Online deployment happened at roughly the same time. Performance gains varied with the use case; in the case of reconstruction they were measured to be of the order of 20%. Transitioning to the new version of ROOT required more time, since we wanted to be sure that the old custodial raw data remained readable without problems. The Offline Project moved to the 4_1 release cycle (new ROOT, 64-bit) during the first week of March, followed the next week by the deployment Online.

Following the decision to run the LHC at 3.5 TeV per beam in 2011, the Monte Carlo production that had been started in December with the 3_10 release was abandoned and restarted at the correct, lower energy. This new production uses updated parameters for the Beam Spot and for the number of pile-up interactions, as well as the correct structure of pile-up events. It started in March and is currently ongoing. At the same time, a re-reconstruction of the 2010 Monte Carlo with 2011 pile-up parameters was requested, in order to get ready for high-luminosity conditions; this is also currently ongoing.

The current development cycle is CMSSW_4_2, which is not planned for deployment on Tier 0 before the end of April. It contains many changes in event reconstruction and, once again, new calibrations; most notably, it includes a “twist-free” alignment of the CMS Tracker. In view of the 2011 LHC Run, all the certification activities have restarted at full speed, and the Physics Validation team is gearing up to provide validated data within one week of the data being taken.

All the components delivered by the CMS Offline project have undergone important developments in the last six months:
– The plan to consolidate event display activities into a single project delivering a single application, called FireWorks, is now complete. FireWorks now contains all the features identified in the review that took place at the start of 2010, such as Geant4 geometry navigation, iSpy-like modes, integration with the full CMSSW framework, and reconstruction on demand while visualising events.
– Full simulation of the detector has improved in many respects, with new parameterised calorimeter shower libraries and with coverage extended to high η so that the CASTOR detector is included in standard Monte Carlo samples.
– Reconstruction delivered new features (e.g. improved Particle Flow code) while at the same time following day-by-day operations, with a negligible number of jobs failing prompt reconstruction in 2010. A lot of work went into streamlining our data formats and Data Tiers: 2011 will be a resource-constrained year, and disk space at Tier 1 and Tier 2 is a concern. A CMS-wide review of our data formats helped to bring down data sizes even in the presence of pile-up, with gains that in some cases reach 20%.
– Fast Simulation continues its efforts to match the data coming from the detector, and is used effectively in physics analyses to be presented at the winter conferences.
– Alignment/Calibration is continuously updating the CMS calibrations, not only as new data arrive, but also as completely new algorithms are deployed. Examples are the production of a “twist-free” Tracker alignment and the possibility of taking the sagitta (bowing) of Tracker modules into account. Moreover, the Prompt Calibration Loop at Tier 0 is now fully operational, and more and more calibration workflows are moving to it.
– The Generators project has adopted and integrated the Rivet and Professor tools in order to facilitate generator tuning and validation. Work has been done on the development of a Sample Generation Request Interface (PREP) that will ease the process of requesting hundreds of Monte Carlo datasets and will allow their production status to be tracked easily.
– The new job submission tool, WMAgent, is now in the final stage of validation and will become the single CMS component used to submit Grid jobs. It is being tested at Tier 1 for reprocessing workflows; it will soon be used also for Monte Carlo production at Tier 2 and, later in the year, for analysis tasks, serving as the basis for the CRAB3 analysis tool.
– The DQM project has been focussing on the automation of the certification workflows, with subsystems providing automatic certification answers soon after Tier 0 reconstruction; an improved Run Registry (v3) is also in preparation and should be ready in the next few months.

An Offline Workshop took place at CERN on 2nd and 3rd February 2011, focussed on operations for this year’s run. Given the scarcity of manpower, efforts will be made to automate the validation, certification and calibration workflows as much as possible, freeing people to address real development issues.

Another event aimed at the discussion of specific issues has taken place since the last Bulletin. From 24th to 26th January, a special workshop was held in Bari to discuss “Storage and Data Access Evolution” for the runs from 2012 onwards. CMS, like the other LHC experiments, is going to deploy a new Computing Model, with new access patterns to data; this clearly has consequences for the software components, starting with new Framework features that will need to be developed, tested and deployed. Five working groups have been formed to follow the tests over the coming months, and will report to the Offline and Computing Projects.

In the following we give more details of progress made in the various Offline sub-projects.

Generators

The activities of the Generator Tools group during the initial part of the year have been devoted to two main targets: preparing the mass production for 2011 and completing a number of required developments.

During the Christmas period, effort was put into producing the input samples needed for a mass production at 8 TeV, which at that time seemed the most likely energy for the 2011 run. The changes in LHC plans forced us to move to a different programme, i.e. to reprocess the existing samples at 7 TeV while planning a new production that will exploit improvements in the simulation and further extend the statistics. In this context, effort has been put into better coordination of the different requests for the main Standard Model samples, in order to avoid duplication.

In parallel, efforts have been made to finalise the deployment of the Production and REProcessing (PREP) management tool, which is now in an advanced commissioning phase, and to make progress in the integration of required new software components. The most relevant among them are the generator tuning tools (Rivet/Professor) and the state-of-the-art version of the LHAPDF library of parton distribution functions.

Full Simulation

There has been a large amount of activity in the Full Simulation group over the past three months. At the end of 2010, several improvements were added to the Simulation code base. A major addition was the new version of Geant4, with improved physics description of anti-protons and hyperons, and better shower models for low- and medium-energy showers. Substantial work was also completed for the forward calorimeter systems. A transition from a shower model description of HF to one using a combination of parametrised showers (GFlash) and Geant allowed the description of energy deposition in the HF photomultiplier windows and fibre bundles at the back of the calorimeter. This should allow the modelling of the large energy deposits seen when particles interact in these regions of the detector. The change in modelling strategy also allows the full integration of the CASTOR detector, which is now included in the default simulation configuration. The shower library simulation of CASTOR was validated against the full Geant simulation last year, and we look forward to a detailed comparison with collision data. In addition, the simulation has been updated to include the full components of the TOTEM T1 detector, which was installed during the winter shutdown.

A major focus of the Full Simulation work moving forward will be the study and validation of the pile-up simulation. With the LHC expected to deliver instantaneous luminosities corresponding to anywhere between 10 and 16 interactions per crossing, CMS will enter a new regime in terms of detector occupancy; the arithmetic behind such numbers is sketched below. (This can be compared to an average of slightly fewer than three interactions per crossing in 2010, a qualitatively lower value in many respects.) A considerable effort aimed at understanding pile-up issues, ranging from low-level detector-specific studies to the impact on physics analyses, was launched during the February Physics Week, with the Simulation and PVT groups providing a focus for the work. Studies have included the effects of out-of-time pile-up in the simulation, as well as work on isolation variables and pile-up subtraction in jets. There is much work still to be done; we look forward to having a substantial dataset of collisions for a true evaluation of the impact of pile-up on CMS.
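For orientation, the relation between instantaneous luminosity and the average number of interactions per crossing can be written down in a few lines. The following sketch uses assumed, round values for the machine parameters and the inelastic cross section (they are not taken from this report):

# Illustrative estimate of the average pile-up per bunch crossing.
# All numbers below are assumptions chosen for the example, not official
# CMS or LHC parameters.

F_REV = 11245.0       # LHC revolution frequency [Hz]
N_BUNCHES = 1380      # assumed number of colliding bunch pairs (50 ns scheme)
SIGMA_INEL = 70e-27   # assumed inelastic pp cross section [cm^2] (~70 mb)

def mean_pileup(inst_lumi):
    """Average number of inelastic interactions per bunch crossing,
    for an instantaneous luminosity given in cm^-2 s^-1."""
    return inst_lumi * SIGMA_INEL / (N_BUNCHES * F_REV)

for lumi in (2e32, 1e33, 3e33):   # hypothetical luminosity points
    print("L = %.1e cm^-2 s^-1  ->  <N_PU> ~ %.1f" % (lumi, mean_pileup(lumi)))

With these assumed values, luminosities of a few times 10^33 cm^-2 s^-1 indeed give averages in the 10-16 range quoted above.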

Reconstruction

At the end of 2010, the 3_8 and 3_9 releases were used for the reprocessing of the proton-proton data collected in 2010, with a view to presenting results at the winter conferences. The 3_8 release had been deployed for prompt reconstruction in September 2010, and the reprocessing with 3_8 was done to provide a consistent dataset for physics analysis. The 3_9 reprocessing included state-of-the-art calibrations and improvements in almost every aspect of the reconstruction software. Improvements were made to the reconstruction of muons with high transverse momentum, of track-corrected missing transverse energy, and of electron identification. Special attention was given to low-level reconstruction in the electromagnetic calorimeter, with improved energy calibration to account for the loss of transparency of the crystals, partial recovery of dead channels and better noise identification. Monte Carlo samples have been reprocessed consistently, and the results presented at the winter conferences are derived from the 3_9 datasets.

The 3_9 releases were used for data-taking during the heavy-ion collision operation of the LHC. The fact that operation and processing issues were promptly solved is an indicator of the excellence of the Offline and operation teams. However, this experience clearly revealed a lack of redundancy in our checks, which we look forward to consolidating, together with the Heavy-Ion Physics community, before the next lead-lead LHC campaign.

Over the 2010/2011 winter break, three major releases were produced. The 3_10 release cycle included minor software developments and was aimed at the production of Monte Carlo samples at a centre-of-mass energy of 8 TeV. This production, and the release, were later deprecated owing to the change in LHC planning.

The 3_11 release cycle includes part of the developments foreseen for the resumption of proton-proton operation. It has been deployed for the warm-up of the CMS detector and the cosmic data-taking campaign.

The 3_11 release cycle was mirrored into the 4_1 series in order to adapt to changes in ROOT, which provides the core of the file handling. Careful validation was performed, and the 4_1 series has been put into production for the 2011 proton-proton operation. The 4_2 release cycle primarily includes developments in the reconstruction of electrons and photons using a particle-flow approach. Changes in the structure of the data formats have been made in order to reduce the RECO and AOD event sizes, given the increased level of pile-up expected in 2011. A reduction of approximately 15% was achieved through the elimination of redundant information and improvements to the implementation of targeted data-format classes. Initial development to cope with additional pile-up events from the LHC started during the 4_2 cycle and will continue in future cycles, making use of the experience to be gained during 2011 data-taking.

As a continuous effort throughout release cycles, emphasis is placed on release-validation automation, release-feature documentation, and simplification of operational maintenance, all of which require attention and help from collaborators outside the Offline group. We look forward to consolidating the RECO team with new members in order to carry out smoothly all the tasks related to reconstruction.

Alignment and Calibration

At the end of the 2010 data-taking period a large number of conditions updates were prepared to best describe the status of the CMS detector through the whole period of running. These updates have been deployed in the reprocessing of the complete dataset in view of the winter conferences. The refined knowledge of the detector conditions that resulted from this effort is also beneficial for the online and prompt reconstruction of the data that CMS is going to collect in 2011.

A major step for the Alignment and Calibration group has been the full commissioning of the Prompt Calibration Loop, by means of which the values of time-dependent conditions are updated in time for the first full reconstruction of the data. Prompt workflows for the beam-line calibration at luminosity-section granularity, and for the determination of channel-status conditions for the ECAL and the strip tracker, were already prepared at the end of last year and are now in production. The ECAL laser corrections for the crystals’ transparency loss due to irradiation are now ready to be included in the Prompt Calibration Loop as well; they will be tested during the first weeks of data-taking on a dedicated stream and, after careful validation, will be deployed for use in the prompt reconstruction. The Tier 0 prompt processing is currently running with a delay of 48 hours, to allow time for the results of the prompt calibration workflows to be uploaded to the conditions database. This deployment represents an important milestone for alignment and calibration operations in CMS.

Cosmics data collected during the commissioning phase after the Christmas shutdown represent a key asset for the alignment of the silicon tracker. The alignment team used this dataset to assess promptly the small movements of the pixel detector and to prepare the alignment geometry for this year’s data-taking.

Finally, the conditions data used in the simulation have been updated in view of the massive Monte Carlo production, which will be used for the analyses being prepared for the summer conferences.

Database

In the first quarter of 2011, the Database project worked on further consolidating the various services it provides. Amongst the major milestones were the upgrade of all Oracle databases to version 10.2.0.5 in January and the addition of more disks to the online cluster in February, making good use of the time without beams. Both actions are intended to ensure the smooth running of the DB services during the upcoming 2011 data-taking and to cope with the expected growth of the databases. The IT DBA team has provided a first version of a tool to monitor DB-related information, and is sending e-mail notifications for expiring passwords well ahead of the expiration date. Work is ongoing to extend this service to other areas, such as account locking and unusual growth of accounts. The internal review of the usage of the DB in the CMS Computing projects continued during the quarter. Most projects have presented details of their applications and their view of how to handle the expected large data volumes during data-taking. The quality of the presentations was high, and they were followed by fruitful discussions.

Fast Simulation

Since the last Bulletin, the Fast Simulation group has continued its effort to provide a more realistic simulation, while maintaining the high standards of speed and performance implicit in its mandate. All of this must be seen in the context of preparing for the challenges of the 2011 analyses that lie ahead of us. The group has introduced complete flexibility in the distribution of the number of pile-up events overlapped (in time) with the main generated event: the simple Poissonian model assumed so far (which remains available as an option) can be inadequate for correctly describing datasets that span very different luminosity conditions. As a first application of this new feature, we provide the same distribution as agreed for the Spring ’11 Full Simulation production (flat up to 10 events, then exponentially decreasing); this will also ease the comparison of results obtained with the Fast and Full simulations.
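As a rough sketch of the kind of non-Poissonian distribution described above, the following (purely illustrative) function samples a multiplicity that is flat up to a cutoff and falls exponentially beyond it; the parameter values and the function name are assumptions, not the actual Fast Simulation implementation:

import math
import random

def sample_pileup_multiplicity(flat_up_to=10, tail_slope=0.5):
    """Draw a pile-up multiplicity from a distribution that is flat up to
    `flat_up_to` events and falls off exponentially beyond that.
    The parameter values here are illustrative, not the production settings."""
    flat_weight = flat_up_to + 1                         # bins 0..flat_up_to, weight 1 each
    tail_weight = math.exp(-tail_slope) / (1.0 - math.exp(-tail_slope))
    if random.random() < flat_weight / (flat_weight + tail_weight):
        return random.randint(0, flat_up_to)             # flat region
    # Exponential tail: geometric sampling of how far beyond the flat region we land.
    n = flat_up_to
    while True:
        n += 1
        if random.random() < 1.0 - math.exp(-tail_slope):
            return n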

In coordination with the Higgs PAG and the EGamma POG, we are starting a test production of H→γγ samples and are testing the possibility of filtering multi-jet QCD events at RECO level, exploiting the speed of the Fast Simulation to run the complete simulation-reconstruction chain before deciding whether to keep or reject an event according to loose cuts on photon identification (see the sketch after this paragraph). Filtering at the RECO level, which can only be done efficiently with a Fast Simulation, is expected to provide a significant increase in realism with respect to samples that are “EM-enriched” only at the generator level. We are witnessing a steady increase in the use of the Fast Simulation by the physics groups. This is natural, considering that more and more new-physics searches are becoming competitive with previous experiments, and that parameter scans (e.g. in SUSY searches) demand large numbers of events produced quickly. This implies a more urgent demand for person-power for development and maintenance tasks; readers are invited to take a look at the Twiki page https://twiki.cern.ch/twiki/bin/viewauth/CMS/FastSimNeeds for the list of open tasks, which also includes a preliminary assessment of the number of service points awarded to each task. Another implication of the growing interest in the Fast Simulation is an increased need for careful validation. As often remarked in the past, it is of paramount importance that the people in charge of release validation on behalf of the PAGs and POGs always provide their results for both the Full and the Fast simulations.
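A schematic illustration of the filter-at-RECO idea is given below; the event structure, helper names and cut values are hypothetical placeholders, not the actual CMSSW filter configuration:

def passes_loose_photon_filter(reco_photons,
                               min_pt=20.0,                 # hypothetical loose cut values
                               max_hadronic_over_em=0.15,
                               min_photons=2):
    """Decide whether a fast-simulated and reconstructed QCD event is kept
    for an H->gamma gamma background sample.  `reco_photons` is assumed to be
    a list of objects with `pt` and `hadronic_over_em` attributes."""
    loose = [p for p in reco_photons
             if p.pt > min_pt and p.hadronic_over_em < max_hadronic_over_em]
    return len(loose) >= min_photons

# Sketch of the production loop: simulate and reconstruct with the Fast
# Simulation first, then keep only events that could fake a diphoton signature.
# for event in generated_qcd_events:                # hypothetical event source
#     reco = fast_simulate_and_reconstruct(event)   # hypothetical helper
#     if passes_loose_photon_filter(reco.photons):
#         output.write(reco)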

Analysis Tools

The new structure for retrieving jet energy corrections from the database is now fully established. The L1Offset (L1FastJet) corrections, which have been derived by the JetMET POG and which will gain increasing importance under the expected pile-up scenarios, provided a good and successful test case for the new structure: the new corrections were accessible to end-user analyses very shortly after they had been derived.
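For readers less familiar with the factorised correction scheme these levels refer to, the following conceptual sketch shows how an L1 (pile-up offset) correction and the subsequent L2/L3 factors combine; the numbers are dummies, and in practice the corrections are of course retrieved from the database through CMSSW rather than hard-coded:

def correct_jet_pt(raw_pt, jet_area, rho, l2_relative=1.05, l3_absolute=1.02):
    """Apply factorised jet energy corrections to a raw jet pT (GeV).
    L1FastJet removes the pile-up offset using the event energy density rho
    and the jet catchment area; the L2 (relative) and L3 (absolute) factors
    used here are dummy values for illustration."""
    l1_corrected = max(raw_pt - rho * jet_area, 0.0)   # L1FastJet offset subtraction
    return l1_corrected * l2_relative * l3_absolute

# Example: a 50 GeV raw jet with area 0.5 in an event with rho = 8 GeV
print(correct_jet_pt(raw_pt=50.0, jet_area=0.5, rho=8.0))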

The re-organisation of particle flow as an integral part of the RECO objects themselves (especially particle flow isolation for leptons) implied a transition of particle flow reconstruction packages from PhysicsTools into the CommonTools area. It was accompanied by a change in the data structure of RECO Taus. Final clean-up work is ongoing.

The CMSSW_3_8_7 Analysis Release was used for the analyses of the 2010 data. Nearly all analyses of the Top PAG shown at the Moriond conferences made use of the PAT or other Analysis Tools in one way or another; indeed, the Analysis Tools closely accompanied the large majority of analyses and thus contributed to the big success of the 2010 data analysis campaign. A new Analysis Release, CMSSW_3_9_9, has been announced, and more releases of this kind will follow in the new release series for 2011. Users are highly encouraged to make use of them: they provide the best CMS reconstruction and analysis software available at the time, which can be applied out of the box with reasonable configurations, without any need to apply special recipes or play tricks.

Now that the Analysis Tools and the PAT are well established, the focus will shift to documentation and statistics tools. Work on the latter has already started in the area of multivariate analysis techniques, and will soon begin on the important subjects of combining results, calculating significances and determining limits using RooStats and other tools. This work is being done in close collaboration with the responsible person in the CMS Statistics Committee. Documentation will also gain emphasis through an adapted structure of tutorials and of WorkBook and SWGuide contributions. These new structures are planned to be in place after the next PAT tutorial at CERN at the beginning of April, and will be announced clearly by that time. This work is being done in close collaboration with the CMS User Support convenors.

Data Quality Management

With Cosmics, and finally with stable-beam collision data since 13th March, we are happy to see that the DQM tools are being used widely to assist the commissioning of the CMS detector in preparation for the 2011 data-taking period. The group has been very busy maintaining daily operations, as well as improving the DQM tools to prepare for sustainable future operation.

Data Quality Monitoring (DQM) group activities span both the online and offline areas. We manage the 24/7 online and offline central DQM shift operation, and follow up on the quality of the data all the way to providing JSON files for physics analysis through the process called “data certification”. The DQM group is also responsible for developing, maintaining and validating many tools and programs, such as the Run Registry, the DQM GUI and the DQM releases, in order to carry out DQM operations.
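The certification output is a JSON file listing, for each run, the ranges of certified luminosity sections. A minimal sketch of how an analysis might use such a file is given below (the file name is a placeholder):

import json

def load_certified(json_path):
    """Read a certification JSON of the form
    {"run_number": [[first_ls, last_ls], ...], ...}."""
    with open(json_path) as f:
        return dict((int(run), ranges) for run, ranges in json.load(f).items())

def is_certified(certified, run, lumi_section):
    """True if the given run / luminosity section appears in the certified list."""
    return any(first <= lumi_section <= last
               for first, last in certified.get(run, []))

# Usage sketch (the file name is a placeholder):
# good_lumis = load_certified("Cert_2011_placeholder.json")
# if is_certified(good_lumis, 160431, 21):
#     ...  # keep the event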

2011 started with intense offline DQM shifts to validate the 2010 Dec22 ReReco datasets in time for the winter conferences; double shifts were conducted simultaneously at each of the three shift locations. The rest of the 2011 online and offline DQM shifts have been scheduled, following the new 2011 central shift sign-up procedure that began in early February, together with the other central shifts.

Validating a new DQM release that incorporates many changes from a collection of subsystem tags is a non-trivial, time-consuming task. The testing and tagging procedure for each subsystem has been improved to simplify and improve the release-validation process.

We have completed a detailed new design and plan for the Run Registry upgrade (RR3) and began the development work in January 2011. The Run Registry keeps track of data-quality information and also plays a key role in producing the JSON files. RR3, with a much cleaner design, should provide a better and more robust service, and we look forward to its completion in several months’ time.

A run can last for hours, whereas the duration of a Lumi Section (LS) is about 23 seconds. The LS is the smallest data segment for which we can provide data-quality information, and we already provide LS-based DCS information using automated processes. This is another area of intensive development, and it involves working closely with the subsystem DQM experts.
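The ~23-second figure follows from the definition of a luminosity section as a fixed number of LHC orbits; a small back-of-the-envelope check, assuming the usual definition of one LS as 2^18 orbits:

# Back-of-the-envelope check of the luminosity-section length, assuming the
# usual definition of one LS as 2**18 LHC orbits.
LHC_CIRCUMFERENCE_M = 26659.0
SPEED_OF_LIGHT_M_S = 299792458.0
ORBITS_PER_LS = 2 ** 18

orbit_period_s = LHC_CIRCUMFERENCE_M / SPEED_OF_LIGHT_M_S   # ~88.9 microseconds
ls_duration_s = ORBITS_PER_LS * orbit_period_s
print("One luminosity section lasts about %.1f s" % ls_duration_s)   # ~23.3 s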

With the 2011 operation in full swing, we look forward to delivering high-quality Data Quality Monitoring, working closely with the shift crews and with colleagues from many different areas.

Data and Workflow Management

For the last few months, much of the DMWM effort has been focused on rolling out the WMAgent-based data processing system and commissioning it for Tier 1 operations. There has been continuous testing and refinement, with the Computing project’s Data Operations and Integration teams running and re-running test workflows to ensure that the new system works and meets the goals set for reproducibility, reliability and a much higher level of automation than the previous ProdAgent system. As the WMAgent system has matured for data processing, it has also started to form the basis of CRAB3, the next generation of analysis workflow management. The CRAB team is now shifting its effort to developing and testing the prototype, and wider testing will take place over the coming months.

The Data Aggregation Service (DAS) has been rolled out into production and is undergoing user tests; it will soon supersede the current Data Discovery interface. It allows users to query across many more CMS data services, rather than being confined to DBS, making it possible to refine data searches using information from several sources, such as DQM and the Run Registry, as they bring up data services.

Behind the scenes, the new DBS3 prototypes are up and running, with development tests running against them, migration tests and cross-checks of the old DBS2 data, and ever more performance tests. PhEDEx continues to move CMS data at rates of over 200 TB per week, with an improved schema and a redesigned web interface in the pipeline.

The HTTP Group, a joint effort between Computing and DMWM, has started a regular monthly release cycle focussed on making sure we can deploy reliable, secure, central web services. There has been a lot of good work from Lassi Tuura and his team in refining and cleaning up the deployment of these critical services for CMS.


by L. Silvestris, T. Boccali, L. Sexton-Kennedy. Edited by J. Harvey and K. Aspola, with contributions from F. Cossutti, P. Lenzi, S. Banerjee, M. Hildreth, D. Lange, J-R Vlimant, G. Cerminara, R. S. Beauceron, C. Rovelli, F. Cavallari, A. Pfeiffer, V. Innocente, S. Giammanco, A. Perrotta, S. Rappoccio, R. Wolf, K. Maeshima, M. Rovere, S. Dutta, D. Evans and S. Metson