Using NoSQL technologies for handling of the CMS Conditions

Roland Sipos for the CMS Collaboration

Forum on Concurrent Programming Models and Frameworks, 17 June 2015



Overview
● Intro
  ○ CMS Conditions Database
  ○ NoSQL
● Candidates
  ○ Test framework
  ○ Deployment
● Results
● Outlook


Intro


Conditions Database

Alignment and calibration constants that record a given “state” of the CMS detector.

Essential for the analysis and reconstruction of the recorded data.

They are also critical for the dataflow and need to be properly re-synchronized during data processing.


CondDB - Details

Conditions are free from:
● Full table scans
  ○ Only “by key” (or range of keys) access
● Joins
● Complex, nested queries
● Transactions
  ○ Data is written once and never deleted or altered
● Absolute consistency
  ○ The only consistency criterion: newly appended data should be available for reads ASAP (in less than a few seconds)
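
The access pattern listed above (write once, read by key or by key range, no scans or joins) can be illustrated with a minimal Python sketch. The class name `AppendOnlyStore`, its methods, and the integer keys are hypothetical and chosen only for illustration; they are not taken from the CMS software.

```python
import bisect


class AppendOnlyStore:
    """Toy write-once store: payloads are read by key or key range only."""

    def __init__(self):
        self._keys = []      # sorted keys, so lookups can use binary search
        self._payloads = {}  # key -> payload bytes (a BLOB in practice)

    def append(self, key, payload):
        # Written once: an existing key is never altered or deleted.
        if key in self._payloads:
            raise KeyError(f"key {key!r} already written")
        bisect.insort(self._keys, key)
        self._payloads[key] = bytes(payload)

    def get(self, key):
        # "By key" access: no scan over the whole data set.
        return self._payloads[key]

    def get_range(self, first, last):
        # Range-of-keys access, inclusive on both ends.
        lo = bisect.bisect_left(self._keys, first)
        hi = bisect.bisect_right(self._keys, last)
        return [(k, self._payloads[k]) for k in self._keys[lo:hi]]


# Hypothetical keys and payloads, for illustration only.
store = AppendOnlyStore()
for key in (100, 200, 300):
    store.append(key, f"payload-{key}".encode())
```

Because nothing is ever updated in place, such a store needs no transactions: a reader either sees an appended entry or does not see it yet.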


CondDB - Motivations

Find alternative data storage technologies for the CMS Conditions data for:
● Storing BLOBs
● And their metadata
● In a read-heavy environment

Further requirements:
● Durability
● High availability
● (Optional scalability)

Do we really need relational access for such a use case?


Relational vs. Non-relational

Relational
● Based on the relational model: relational algebra
  ○ schema
● SQL: not necessarily, but the most widespread
● Transactions
  ○ ACID
● Well tested, "proven"

Non-relational
● Not based on the relational model
  ○ schema free
● Query languages may differ:
  ○ Datalog, XPath, etc.
● Unique operations (e.g. CRUD)
  ○ BASE
● Many, quite new solutions
  ○ beta phase versions, etc.


NoSQL - ACID vs. BASE

ACID
● Atomicity
  ○ "all or nothing" transactions
● Consistency
  ○ data is always valid
● Isolation
  ○ transactions are independent
● Durability
  ○ permanent state

BASE
● Basic Availability
  ○ it’s OK to give approximate answers
● Soft state
  ○ easier (schema) evolution
● Eventual consistency
  ○ stored data achieves a consistent state with time
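
Eventual consistency can be pictured with a toy Python model. This is only a sketch under simplifying assumptions: writes land on one replica, the others catch up when `sync()` is called, and the names `EventuallyConsistentStore`, `write`, `read`, and `sync` are hypothetical rather than the API of any real database.

```python
class EventuallyConsistentStore:
    """Toy model: a write hits replica 0, the others catch up on sync()."""

    def __init__(self, n_replicas=3):
        self.replicas = [{} for _ in range(n_replicas)]
        self.pending = []  # writes not yet propagated to the other replicas

    def write(self, key, value):
        self.replicas[0][key] = value
        self.pending.append((key, value))

    def read(self, key, replica=0):
        # A read may return stale data (None here) until sync() has run.
        return self.replicas[replica].get(key)

    def sync(self):
        # Propagate the pending writes, in order, to every other replica.
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("tag", "v1")
stale = store.read("tag", replica=1)   # not yet propagated
store.sync()
fresh = store.read("tag", replica=1)   # consistent after propagation
```

For write-once data such as conditions, the only visible effect of this model is the delay before a new entry can be read everywhere.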


NoSQL - CAP Theorem

"You can choose only two of the following three attributes." - Eric Brewer, 2000

● Consistency, a.k.a. "All clients see the same data at the same time!"
● Availability, a.k.a. "Every request gets a response about success or failure!"
● Partition tolerance

[Figure: CAP diagram with the regions labelled RDBMSs, Eventual consistency, and Enforced consistency (PAXOS)]

More than 10 years have passed since then, and a lot of misleading or false information has arisen based on the theorem.


Partitioning - General
● Distributed memory cache
● Clustering
  ○ scaling the persistency layer
● Separate operations (reads and writes)
  ○ dedicated master for writes, group of slaves for reads
● Sharding - horizontal partitioning
  ○ the storage volume is distributed among many nodes

Scale up: "put more RAM or a better processor into the server"
Scale out: distribute data among computing elements

Scale out = partitioning


Partitioning - Vertical

Splitting the data stored in one entity into multiple entities. (It’s a bit more than normalization.)

“Different columns on different resources.”

E.g.: frequently accessed columns in a separate table, cached in memory; rarely used columns stored on disk.
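
A minimal Python sketch of the idea, assuming rows are plain dictionaries. The column names (`tag`, `since`, `payload`) and the function `split_vertically` are hypothetical and only serve the illustration.

```python
def split_vertically(rows, hot_columns):
    """Split each row into a 'hot' part and a 'cold' part, both keyed by row id."""
    hot, cold = {}, {}
    for key, row in rows.items():
        hot[key] = {c: v for c, v in row.items() if c in hot_columns}
        cold[key] = {c: v for c, v in row.items() if c not in hot_columns}
    return hot, cold


# Hypothetical rows: small metadata columns plus a large payload column.
rows = {
    1: {"tag": "align_v1", "since": 100, "payload": b"\x00" * 8},
    2: {"tag": "calib_v3", "since": 250, "payload": b"\x01" * 8},
}

# Metadata could stay in an in-memory cache, payloads on disk.
hot, cold = split_vertically(rows, hot_columns={"tag", "since"})
```

The shared row id is what allows the two parts to be joined again when a full row is needed.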


Partitioning - Horizontal

One entity, but its data may be distributed over many storage elements.

“Different rows on different resources.”

E.g.: year-based distribution over different storage elements, using an “insertion time” constraint.
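
The year-based example can be sketched in Python as follows. The field name `insertion_time` and the function `shard_by_year` are hypothetical; each resulting group stands in for one storage element.

```python
from collections import defaultdict
from datetime import datetime


def shard_by_year(rows):
    """Group rows by the year of their insertion time (one group per shard)."""
    shards = defaultdict(list)
    for row in rows:
        shards[row["insertion_time"].year].append(row)
    return dict(shards)


# Hypothetical rows, for illustration only.
rows = [
    {"id": 1, "insertion_time": datetime(2013, 5, 2)},
    {"id": 2, "insertion_time": datetime(2014, 1, 15)},
    {"id": 3, "insertion_time": datetime(2014, 11, 30)},
]
shards = shard_by_year(rows)
```

All rows keep the same columns; only the placement of each row depends on the partitioning key.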


NoSQL - General

NoSQL in keywords:
● Only a buzzword
  ○ Meaning: “One size does not fit all!”
● CAP Theorem
● ACID vs BASE
● Different models
  ○ Doc. store, Key-Value, Column oriented, BigTable

NoSQL means: “we have options”!
Not against relational DBs, but a complement to them!


NoSQL - Options

[Figure: database landscape chart. Category labels: Relational, Non-Relational, Operational, Analytic, NoSQL, NewSQL, DaaS, Document, Key-value, Column oriented, Graph, Brand new RDBS, Add-on, "Flat, Hierarchical, Network, etc." Products shown: Oracle, IBM DB2, JustOneDB, MS SQL Server, Hadoop, Cloudera, Hadapt, Oracle TimesTen, IBM Infosphere, SAP (Hana, Sybase IQ), HP Vertica, SPARK, Lotus Notes, CouchDB, MongoDB, MySQL, PostgreSQL, Progress, Objectivity, Versant, McObject, MarkLogic, SQL Azure, RavenDB, Amazon RDS, Xeround, FathomDB, NuoDB, Riak, Redis, Voldemort, BerkeleyDB, Cassandra, Accumulo, BigTable, HyperTable, HBase, Neo4j, Couchbase, SimpleDB, App Engine, Clustrix, VoltDB, SnakeSQL, ScaleDB, MySQL Cluster, GenieDB, Tokutek, Drizzle.]

Source: Tim Gasper - Big Data Right Now: Five trendy open source technologies


Challenges 1.

How to choose? Rule of thumb: benchmarking!
But there is no exact method for NoSQL benchmarking. (It's not like TPC-X for RDBMSs.) Even if we want to compare them, we must contend with different...
  ○ problems and possibilities,
  ○ APIs,
  ○ partitioning techniques, etc.


Challenges 2.

The main issue is that the designs and preferred use cases of NoSQL databases are REALLY different. There is no "x is better than y" argument that holds for EVERY use case.

Benchmarks are based on several use cases:
● different computing elements,
● compared by write/read/update/scan operations (mixed with a ratio: 90%/10%/0%).
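
A mixed-operation workload of this kind can be generated with a few lines of Python. This is a sketch only: the mapping of the 90%/10%/0% ratio onto read/write/update is an assumption made for the example (the slide does not say which percentage belongs to which operation), and `make_workload` is a hypothetical helper, not part of any benchmark suite.

```python
import random


def make_workload(n_ops, mix, seed=0):
    """Return a list of n_ops operation names drawn according to the mix."""
    rng = random.Random(seed)  # fixed seed keeps the workload reproducible
    ops = list(mix)
    weights = [mix[op] for op in ops]
    return rng.choices(ops, weights=weights, k=n_ops)


# Assumed mapping of the ratio: 90% reads, 10% writes, 0% updates.
mix = {"read": 90, "write": 10, "update": 0}
workload = make_workload(1000, mix)
```

Replaying the same seeded workload against each candidate keeps the operation sequence identical, so differences in the results come from the database and its deployment rather than from the request mix.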


Prototypes - The candidates


Selection

In multiple phases...

Find:
● Showstopper problems (no-go)
● Barely usable (some issues)
● Promising candidates

Preliminary testing.


Candidates

No-go
● HBase (w/ HDFS)
  ○ BLOB size problem
● CouchDB
  ○ Drivers
● Hypertable
  ○ In development
● etc.: app layer needs, CAP characteristics, durability problems

Promising
● MongoDB
● Cassandra

So-so
● Riak
  ○ Query routing!
● (Couchbase)


Deployment

Automated virtual environments on OpenStack.
  ○ Personal tenant - biased by user interactions
  ○ Thanks to the collaboration with CERN IT, the evaluation was made on dedicated resources
  ○ SSD-cached vs. disk comparisons were also made

Details:
  ○ No overcommit
  ○ Instances are “equally” distributed on the hypervisors (for 5 nodes: 2-2-1 on 3 hypervisors)
  ○ 1 Gbit NICs (shared between co-hosted VMs)
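
The "equal" distribution can be expressed as a small Python helper. `distribute` is a hypothetical function written for this illustration; it is not an OpenStack API and says nothing about how the scheduler was actually configured.

```python
def distribute(n_instances, n_hypervisors):
    """Spread instances as evenly as possible; extras go to the first hosts."""
    base, extra = divmod(n_instances, n_hypervisors)
    return [base + 1 if i < extra else base for i in range(n_hypervisors)]


layout = distribute(5, 3)  # the 5-node case from the list above
```

For 5 instances on 3 hypervisors this yields the 2-2-1 layout quoted above, so at most two VMs share one 1 Gbit NIC.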


Evaluation

Empirical evaluation: check whether a given prototype meets the usability and performance criteria of the desired solution.

If more than one of them passes the criteria, choose the best one based on essential features and performance characteristics.


CustomSamplers 1.

An extension for JMeter, with CMS-specific needs, to measure the performance of different databases.

For each candidate the extension has:
● Deployers
  ○ To build up the data model
● QueryHandlers
  ○ Simulate the CMS workflow
● ConfigElements
  ○ Configure persistency objects
● Samplers
  ○ Report to the testplan listeners


CustomSamplers 2.

Testplans are XML configurations that set up the behaviour of the testing engine by controlling:
● number of threads
● ramp-up time of thread creation
● configuration of connection layers
● assignment of requests to threads

1 TPS: 1200 threads started in 1200 seconds, resulting in 20-minute constant-stage tests.
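
The arithmetic behind the 1 TPS figure can be written out in Python. The sketch assumes that the request rate equals the thread start rate (each thread issues its request as it starts) and that the ramp-up time stays at 1200 seconds for higher rates; the function names are hypothetical.

```python
def start_rate(threads, ramp_up_s):
    """Threads started per second during the ramp-up."""
    return threads / ramp_up_s


def threads_needed(tps, ramp_up_s):
    """Threads required to sustain a target rate over the whole ramp-up."""
    return tps * ramp_up_s


rate = start_rate(1200, 1200)        # the 1 TPS case from the slide
minutes = 1200 / 60                  # length of the constant stage
for_9_tps = threads_needed(9, 1200)  # under the same-ramp-up assumption
```

Under these assumptions, raising the rate means raising the thread count while the test length stays fixed.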


Results

Increasing request numbers: 1-9 TPS (for both remote and single testplans)
● Exploring limits for saturating factors like:
  ○ Network bandwidth
  ○ Access of persistency objects
  ○ Storage elements (Ephemeral disk/SSD, Ceph)
● Scaling out (different cluster setups):
  ○ Node numbers (5 x m1.large, 4 x m1.medium)
  ○ Routing techniques (Round robin, Token-aware)
  ○ Distributed testing (4 JMeter engines)
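
The difference between the two routing techniques can be sketched in Python. This is a simplified illustration, not the implementation of any database driver: the token-aware router below uses a plain hash ring with one token per node, and the class names and node names are hypothetical.

```python
import bisect
import hashlib
import itertools


class RoundRobinRouter:
    """Sends each request to the next node in turn, ignoring the key."""

    def __init__(self, nodes):
        self._cycle = itertools.cycle(nodes)

    def route(self, key):
        return next(self._cycle)


class TokenAwareRouter:
    """Sends a request to the node that owns the key's token on a hash ring."""

    def __init__(self, nodes):
        self._ring = sorted((self._token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    @staticmethod
    def _token(value):
        return int(hashlib.sha256(value.encode()).hexdigest(), 16)

    def route(self, key):
        # First node whose token is >= the key's token, wrapping around.
        i = bisect.bisect_left(self._tokens, self._token(key)) % len(self._ring)
        return self._ring[i][1]


nodes = ["node1", "node2", "node3"]  # hypothetical node names
round_robin = RoundRobinRouter(nodes)
token_aware = TokenAwareRouter(nodes)
```

Round robin spreads load evenly but may hit a node that does not hold the data, which then has to forward the request; token-aware routing always sends the same key to the same owner.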


Single node
Normal test - DISK

Single node
Remote test - DISK

Single node
Normal test - SSD

Single node
Remote test - SSD

Medium cluster
Remote test - DISK

Medium cluster
Remote test - SSD

Large cluster
Remote test - DISK

Large cluster
Remote test - SSD

Ceph volume

Outro - Present and future


Application layer

The current implementation of the session layer is highly modular and extendable with alternative storage backends.

Steps:
● Handling persistency objects
  ○ Extending the software framework with NoSQL support
● Implement the Session interfaces
  ○ Implementing the “equivalent” CondDB queries
● Testing
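
The idea of a session layer with interchangeable storage backends can be sketched in Python. Everything here is hypothetical: `SessionBackend`, `InMemoryBackend`, `Session`, and their methods are illustrative names and are not the interfaces of the CMS software. A NoSQL backend would be one more subclass implementing the same two methods.

```python
import hashlib
from abc import ABC, abstractmethod


class SessionBackend(ABC):
    """Minimal storage contract that every backend has to fulfil."""

    @abstractmethod
    def store_payload(self, payload_hash, blob):
        ...

    @abstractmethod
    def fetch_payload(self, payload_hash):
        ...


class InMemoryBackend(SessionBackend):
    """Stand-in backend; a MongoDB or Cassandra one would go here instead."""

    def __init__(self):
        self._blobs = {}

    def store_payload(self, payload_hash, blob):
        self._blobs.setdefault(payload_hash, blob)  # write once

    def fetch_payload(self, payload_hash):
        return self._blobs[payload_hash]


class Session:
    """Client code talks to the session only, never to a backend directly."""

    def __init__(self, backend):
        self._backend = backend

    def write(self, blob):
        payload_hash = hashlib.sha256(blob).hexdigest()
        self._backend.store_payload(payload_hash, blob)
        return payload_hash

    def read(self, payload_hash):
        return self._backend.fetch_payload(payload_hash)


session = Session(InMemoryBackend())
key = session.write(b"alignment constants")
```

With this separation, the testing step can run the same client code against each backend and compare the results.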


Integration
● Release validation
● Find differences between the current solution and the prototypes
  ○ Using real data
  ○ Real use cases - using CMSSW

This will be the final performance comparison between different deployments.


Outlook
● Understand and eliminate issues during the release validation
● Fine-tuning critical performance factors
● Formal evaluation and comparison of the different solutions

Long-term project!
Not a “by tomorrow” change, but for LS2.


The end

Thank you for your attention!

Any questions are welcome!

From: http://geek-and-poke.com/