CERN Computing Colloquium | Scientific Databases at Scale and SciDB | 27 May

by Dr. Michael Stonebraker (MIT - Massachusetts Institute of Technology - Cambridge MA, USA)

Monday 27 May 2013 from 2 p.m. to 4 p.m.
at CERN (222-R-001 - Filtration Plant)

Abstract: As a general rule, scientists have shunned relational database management systems (RDBMSs), choosing instead to “roll their own” on top of file system technology. We first discuss why file systems are a poor choice for science data storage, especially as data volumes become large and scalability becomes important.

We then turn to the reasons why RDBMSs work poorly on most science applications. These include a data model “impedance mismatch” and missing features. We discuss array DBMSs and why they are a much better choice for science applications, using SciDB as an exemplar of this new class of DBMSs.

Most science applications require a mix of data management and complex analytics. In most cases, the analytics entail a sequence of linear algebra computations. We discuss the possible ways of integrating a DBMS with statistical calculations, and conclude with the mechanism being used by SciDB.
