Access Restriction Open

Author Babik, M. ♦ Souto, F.
Source CERN Document Server
Content type Text
File Format PDF
Date Created 2012-06-26
Language English
Subject Domain (in DDC) Natural sciences & mathematics ♦ Physics ♦ Modern physics ♦ Technology ♦ Engineering & allied operations ♦ Applied physics
Subject Keyword sam-mr ♦ service availability ♦ service monitoring ♦ hadoop ♦ Computing and Computers ♦ monitoring ♦ non-relational storage ♦ sam ♦ service reliability
Abstract Service Availability Monitoring (SAM) is a well-established monitoring framework that performs regular measurements of the core site services and reports the corresponding availability and reliability of the Worldwide LHC Computing Grid (WLCG) infrastructure. One of the existing extensions of SAM is Site Wide Area Testing (SWAT), which gathers monitoring information from the worker nodes via instrumented jobs. This generates a large volume of monitoring data, as each job yields several data points and several million jobs are executed every day. The recent uptake of non-relational databases opens a new paradigm in large-scale storage and distributed processing for systems with heavy read-write workloads. For SAM this brings new possibilities to improve its model, from performing aggregation of measurements to storing raw data and subsequent re-processing. Both SAM and SWAT are currently tuned to run at top performance, reaching some of the limits in storage and processing power of their existing Oracle relational database. We investigated the usability and performance of non-relational storage together with its distributed data processing capabilities. For this, several popular systems have been compared. In this contribution we describe our investigation of the existing non-relational databases suited for monitoring systems, covering Cassandra, HBase and MongoDB. Further, we present our experiences in data modeling and prototyping map-reduce algorithms, focusing on the extension of the already existing availability and reliability computations. Finally, possible future directions in this area are discussed, analyzing the current deficiencies of the existing Grid monitoring systems and proposing solutions that leverage the benefits of non-relational databases to obtain more scalable and flexible frameworks.
Description Presented at: Computing in High Energy and Nuclear Physics 2012, New York, NY, USA, 21-25 May 2012. Published in: J. Phys.: Conf. Ser. 396 (2012) 052008
Learning Resource Type Article
Publisher Date 2012-01-01
Rights License Article: (License: CC-BY)
Organization CERN. Geneva. IT Department
Page Count 5