DAQ

The DAQ system has been deployed for physics data taking as well as for supporting global test and commissioning activities. In addition to 24/7 operation, activities addressing performance and functional improvements are ongoing.

The DAQ system consists of the full detector readout, 8 DAQ slices with a 1 Tbit/s event-building capacity, an event filter farm of 720 8-core PCs running the HLT, and a 16-node storage manager system allowing a writing rate of up to 2 GBytes/s and a total capacity of 250 TBytes.
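
For orientation, these capacities can be turned into a couple of derived figures. The following minimal sketch (Python, using only the numbers quoted above) expresses the event-building capacity in bytes and estimates how long the storage manager disks last when writing at the maximum rate:

# Back-of-envelope figures from the DAQ parameters quoted above.
EVENT_BUILDING_CAPACITY_BITS = 1e12   # 1 Tbit/s aggregate event-building capacity
STORAGE_WRITE_RATE_BYTES = 2e9        # 2 GBytes/s storage manager writing rate
STORAGE_CAPACITY_BYTES = 250e12       # 250 TBytes total storage capacity

eb_capacity_bytes = EVENT_BUILDING_CAPACITY_BITS / 8
print(f"Event-building capacity: {eb_capacity_bytes / 1e9:.0f} GBytes/s")

# Time to fill the storage manager when writing at the maximum rate.
fill_time_hours = STORAGE_CAPACITY_BYTES / STORAGE_WRITE_RATE_BYTES / 3600
print(f"Time to fill 250 TBytes at 2 GBytes/s: ~{fill_time_hours:.0f} hours")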

Operation

The LHC delivered its highest luminosity in fills with 6-8 colliding bunches and reached peak luminosities of 1-2 x 10^29 cm^-2 s^-1. Under these conditions the DAQ was typically operating with a trigger rate of ~15 kHz, a raw event size of ~500 kBytes, and stream-A recording at ~150 Hz with an event size of ~50 kBytes. The CPU load on the HLT was ~10%.
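
These operating conditions translate directly into throughput figures that can be compared with the capacities listed above. A minimal sketch (Python, using only the numbers quoted in this section):

# Throughput implied by the typical operating conditions quoted above.
L1_RATE_HZ = 15e3        # ~15 kHz level-1 trigger rate
RAW_EVENT_BYTES = 500e3  # ~500 kBytes raw event size
STREAM_A_RATE_HZ = 150   # ~150 Hz stream-A recording rate
STREAM_A_BYTES = 50e3    # ~50 kBytes stream-A event size

eb_throughput = L1_RATE_HZ * RAW_EVENT_BYTES       # event-building throughput
sm_throughput = STREAM_A_RATE_HZ * STREAM_A_BYTES  # storage-manager throughput

print(f"Event building: {eb_throughput / 1e9:.1f} GBytes/s "
      f"({eb_throughput * 8 / 1e9:.0f} Gbit/s, vs. the 1 Tbit/s capacity)")
print(f"Recording:      {sm_throughput / 1e6:.1f} MBytes/s (vs. the 2 GBytes/s capacity)")

Both figures are far below the respective capacities.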

Tests for Heavy-Ion operation

Tests have been carried out to examine the conditions for data-taking in the future Heavy Ion (HI) run. The high occupancy expected in HI running was simulated with non-zero-suppressed (NZS) data. The Tracker was off at the time, so this simulated “virgin raw” operation. Data compression was not used, and the events were shipped to Tier0 with an average size of ~19 MBytes. Standard (lossless) ROOT compression is expected to reduce this size to ~12 MBytes. An NZS Tracker FED event size of 50 kBytes yields an FRL data record size of 100 kBytes at the event-builder input for those FRLs that merge two Tracker FEDs. This is a factor of 50 above the nominal p-p conditions of 2 kBytes per FRL and would correspond to a 2 kHz level-1 rate.
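
The 2 kHz figure follows from keeping the per-FRL throughput at its nominal p-p value. A minimal sketch of this scaling (Python; the nominal 100 kHz level-1 design rate used below is an assumption, it is not quoted in this section):

# Scaling of the sustainable level-1 rate with the per-FRL fragment size.
NOMINAL_FRL_BYTES = 2e3      # nominal p-p data record size per FRL (~2 kBytes)
HI_FRL_BYTES = 100e3         # NZS HI record size per FRL (two 50 kByte Tracker FEDs)
NOMINAL_L1_RATE_HZ = 100e3   # assumed nominal p-p level-1 design rate (100 kHz)

# At constant per-FRL bandwidth the sustainable rate scales down by the size ratio.
scale = HI_FRL_BYTES / NOMINAL_FRL_BYTES
print(f"Fragment size ratio: {scale:.0f}")
print(f"Equivalent level-1 rate: {NOMINAL_L1_RATE_HZ / scale / 1e3:.0f} kHz")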

To test the event-building limits for these very large events, recording was switched off and events were built without back-pressure at an input rate of 1.6 kHz. Although slightly lower than expected, this 1.6 kHz level-1 rate is around 10 times higher than the anticipated HI rate. When recording events with the storage manager, the data throughput saturated at ~2.6 GBytes/s, as expected for 16 storage manager nodes. This drops to ~1.6 GBytes/s if files are simultaneously transferred to Tier0, owing to disk access contention.
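
A minimal sketch of the aggregate and per-node figures implied by this test (Python, using only the numbers quoted above):

# Throughput figures implied by the heavy-ion event-building and recording test.
BUILD_RATE_HZ = 1.6e3      # event-building input rate without back-pressure
HI_EVENT_BYTES = 19e6      # ~19 MBytes average uncompressed HI event size
SM_SATURATION_BPS = 2.6e9  # ~2.6 GBytes/s storage manager saturation
SM_NODES = 16              # number of storage manager nodes

eb_throughput = BUILD_RATE_HZ * HI_EVENT_BYTES
print(f"Event building at 1.6 kHz: {eb_throughput / 1e9:.0f} GBytes/s")
print(f"Per storage-manager node: {SM_SATURATION_BPS / SM_NODES / 1e6:.0f} MBytes/s")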

There is the possibility that the total level-1 rate in HI running could be as high as 300 Hz, rather than the 80 Hz initially stated. Above 150 Hz, some data reduction before recording will be required, either by rejecting events at the level-1 trigger or by reducing the data volume in the HLT. From the test it is clear that, at the highest rates, the simultaneous transfer to Tier0 would quickly become the bottleneck. It is therefore likely that we will temporarily keep the bulk of the data at P5 and transfer it to Tier0 later, in non-collision periods.
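
Dividing the recording limits measured above by the uncompressed event size of ~19 MBytes gives a rough idea of the sustainable level-1 rates (an illustrative estimate only; compressing the data before recording would raise these rates accordingly):

# Sustainable recording rates implied by the storage-manager limits quoted above,
# assuming the ~19 MByte uncompressed HI event size (illustrative estimate only).
HI_EVENT_BYTES = 19e6         # ~19 MBytes per event
SM_ONLY_BPS = 2.6e9           # ~2.6 GBytes/s, writing only
SM_WITH_TRANSFER_BPS = 1.6e9  # ~1.6 GBytes/s, while also transferring to Tier0

print(f"Writing only:           ~{SM_ONLY_BPS / HI_EVENT_BYTES:.0f} Hz")
print(f"With transfer to Tier0: ~{SM_WITH_TRANSFER_BPS / HI_EVENT_BYTES:.0f} Hz")

These estimates are in the same ballpark as the 150 Hz and 80 Hz figures discussed above and illustrate why the simultaneous transfer to Tier0 becomes the bottleneck first.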

Selected Developments

To support the DAQ shifter and to analyze retrospectively the operation of the global DAQ, a new tool called the DaqDoctor has been developed. It correlates the monitoring information of the various components in the central DAQ in order to draw conclusions on the overall DAQ status. The correlation helps to identify the origin of a given problem and, if it corresponds to a known pattern, the tool presents instructions to the DAQ shifter. Furthermore, the shifters are alerted with sounds in case of problems. Cases handled include, for example, diagnostics when triggers have stopped, error states asserted by the sub-detectors on the TTS, and PCs not responding. The historical records of the DaqDoctor are also of interest to subsystem experts for post-mortem diagnosis of their systems.
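
The correlation approach can be pictured with a minimal sketch (purely illustrative; the status fields, patterns and messages below are hypothetical and do not reflect the actual DaqDoctor implementation):

# Purely illustrative sketch of pattern-based correlation of DAQ monitoring data.
# All status fields, patterns and messages are hypothetical, not the DaqDoctor code.
KNOWN_PATTERNS = [
    # (condition on the monitoring snapshot, diagnosis, instruction for the shifter)
    (lambda s: s["trigger_rate_hz"] == 0 and "BUSY" in s["tts_states"].values(),
     "Triggers stopped: a sub-detector asserts BUSY on the TTS",
     "Identify the sub-detector from the TTS state and call the on-call expert."),
    (lambda s: s["unresponsive_pcs"],
     "One or more PCs are not responding",
     "Note the host names and contact the sysadmin on call."),
]

def diagnose(snapshot):
    """Return (diagnosis, instruction) for every known pattern that matches."""
    return [(diag, instr) for match, diag, instr in KNOWN_PATTERNS if match(snapshot)]

# Example monitoring snapshot (hypothetical values).
status = {
    "trigger_rate_hz": 0,
    "tts_states": {"ECAL": "READY", "HCAL": "BUSY"},
    "unresponsive_pcs": [],
}
for diagnosis, instruction in diagnose(status):
    print(diagnosis, "->", instruction)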

A searchable browser is available on the private network at the URL:
http://cmsdaqweb/cgi-bin/daqpro/DoctorsNotes.cgi

A subset of error conditions ordered by subsystem can be found under:
http://cmsdaqweb/cgi-bin/daqpro/subsystemErrors.cgi



by F. Meijers and C. Schwick