New solutions for data management on the horizon

Almost all large-scale scientific experiments, including those at CERN, manage their data using relational databases, which are queried with SQL (Structured Query Language). But as the amount of data continues to grow, so do doubts that relational databases are the best solution.

New types of databases, collectively called NoSQL, promise a different way to access large amounts of data. Their query interfaces are typically far less complicated than SQL, making the initial set-up much easier. In addition, data can be stored in a more flexible way, often without a fixed table layout, which is expected to make it faster to access and manage.
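To make the contrast concrete, here is a minimal sketch in Python. It is an illustration only, not how any CERN system works: the table name `runs`, the record fields and all the numbers are invented for the example, and a plain dictionary stands in for a NoSQL document store. The relational half needs its table layout declared before any data goes in, while the document-style half accepts records with differing fields.

```python
import sqlite3

# Relational approach: the table layout is declared up front
# and every row must fit it. (Table and values are hypothetical.)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE runs (run_id INTEGER PRIMARY KEY, detector TEXT, events INTEGER)"
)
conn.executemany(
    "INSERT INTO runs VALUES (?, ?, ?)",
    [(1, "CMS", 1200), (2, "ATLAS", 950), (3, "CMS", 800)],
)
# The question "how many events did CMS record?" is expressed in SQL.
cms_events_sql = conn.execute(
    "SELECT SUM(events) FROM runs WHERE detector = ?", ("CMS",)
).fetchone()[0]

# Document-style approach: a plain dict stands in for a NoSQL store.
# Records are looked up by key and need not share the same fields.
documents = {
    "run-1": {"detector": "CMS", "events": 1200},
    "run-2": {"detector": "ATLAS", "events": 950, "notes": "cosmic run"},
    "run-3": {"detector": "CMS", "events": 800, "tags": ["test"]},
}
# The same question is answered in ordinary application code.
cms_events_doc = sum(
    doc["events"] for doc in documents.values() if doc["detector"] == "CMS"
)

print(cms_events_sql, cms_events_doc)
```

Both halves give the same answer here. The difference lies in where the structure lives: in the database's declared layout in the first case, and in the application's own code in the second.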

The CERN Database Group in the IT Department is participating in small-scale tests of NoSQL solutions with three of the four large detectors (CMS, ATLAS and LHCb). Over the past few months, non-relational database vendors – including Google, Hadapt, and Oracle – have also been presenting their NoSQL solutions to the IT Department.

“We have used the Oracle relational database for 30 years,” says Tony Cass, leader of the CERN Database Group. “Most people would probably expect this for administrative applications, but Oracle was introduced at first to support the construction and operation of the Large Electron–Positron Collider (LEP). Today, if Oracle doesn’t work, the LHC accelerator doesn’t work.”

“CERN’s Oracle databases have been highly optimized to deliver fast performance, and it takes time and expertise to adapt the databases for new queries,” continues Cass. “In contrast, creating NoSQL solutions for a novel application is often very rapid.”

“Some NoSQL databases are the best fit for certain problems,” says Simon Metson, who is in charge of the Data Management and Workflow Management team at CMS, which tested the implementation of NoSQL databases last year. “They do not require a lot of new written code to manage data.”

As yet, no one has compared relational and NoSQL databases on large-scale data at CERN. Will a NoSQL solution be faster? No one knows. “In a year’s time we’ll have a better understanding of the different NoSQL implementations, and we’ll have a growing realisation of what is appropriate,” says Cass.

This is an edited version of a story that first appeared in iSGTW.

by Adrian Giordani