Article
Title Automated agents for management and control of the ALICE Computing Grid
Author(s) Grigoras, C (CERN) ; Betev, L (CERN) ; Carminati, F (CERN) ; Legrand, I (Caltech) ; Voicu, R (Caltech)
Publication 2010
In: J. Phys.: Conf. Ser. 219 (2010) 062050
In: 17th International Conference on Computing in High Energy and Nuclear Physics, Prague, Czech Republic, 21 - 27 Mar 2009, pp.062050
DOI 10.1088/1742-6596/219/6/062050
Subject category Computing and Computers
Accelerator/Facility, Experiment CERN LHC ; ALICE
Abstract A complex software environment such as the ALICE Computing Grid infrastructure requires permanent control and management for the large set of services involved. Automating control procedures reduces the human interaction with the various components of the system and yields better availability of the overall system. In this paper we will present how we used the MonALISA framework to gather, store and display the relevant metrics in the entire system from central and remote site services. We will also show the automatic local and global procedures that are triggered by the monitored values. Decision-taking agents are used to restart remote services, alert the operators in case of problems that cannot be automatically solved, submit production jobs, replicate and analyze raw data, resource load-balance and other control mechanisms that optimize the overall work flow and simplify day-to-day operations. Synthetic graphical views for all operational parameters, correlations, state of services and applications as well as the full history of all monitoring metrics are available for the entire system that now encompasses 85 sites all over the world, more than 14000 CPU cores and 10 PB of storage.
Copyright/License publication: (License: CC-BY)


 Record created 2010-06-10, last modified 2022-08-17