Cassandra Summit 2014: The Cassandra Experience at Orange - Season 2

Jean Armel Luce, Orange France. Thursday, September 11, 2014. The C* Experience at Orange, Season 2


DESCRIPTION

Presenter: Jean Armel Luce, Cassandra Administrator at Orange. At the Cassandra Summit Europe 2013, Jean Armel presented "The Cassandra Experience at Orange - Season 1", explaining the first steps of Cassandra at Orange (the choice of Cassandra, a migration without any interruption of service, and the improvements in QoS after the migration). For "The Cassandra Experience at Orange - Season 2", Jean Armel focuses on two features added to the PnS application over the last months: graphs and analytics. A Cassandra table must have one and only one primary key, while some data have many logical identifiers; designing the data as a graph may help. As for analytics, Hadoop + Hive make it possible to run analytics on data stored in Cassandra. This presentation highlights a few tips about installing Hadoop/Hive over C*, and about the isolation between map reduce tasks and online queries.

TRANSCRIPT

Jean Armel Luce

Orange France

Thursday, September 11, 2014

The C* Experience at Orange - Season 2

2

The Cassandra experience at Orange - season 1

Summary  

1.  Why did we choose C* for our application PnS?

2.  Our migration strategy (without any interruption of service)

3.  After the migration …

§  For more details, watch the Cassandra Summit Europe 2013 talk: –  https://www.youtube.com/watch?v=mefOE9K7sLI&feature=youtube_gdata

 Jean Armel Luce - Orange-France

3

The Cassandra experience at Orange - season 2  

Summary

1.  Short description of the application PnS

2.  Keyring: why design customer ids with graphs in C*?

3.  BYOHH (Bring Your Own Hadoop & Hive) with Cassandra

If you missed season 1: Short description of PnS

5

PnS: Short description of the application

§  PnS means Profile and Syndication: a highly available service for collecting and serving live data about Orange customers

§  End users:

–  Orange customers (www.orange.fr)

–  Sellers in Orange shops

–  Some services in Orange (advertisements, …)

6

PnS: The Big Picture

(Diagram) End users send millions of HTTP requests (REST or SOAP) to PnS, which must be fast and highly available. PnS exposes a web service to get or set the stored data, with post-processing of each datum. Data providers inject thousands of files (CSV, JSON or XML) through scheduled data injections. PnS issues R/W queries against the database.

7

PnS: Architecture at the end of 2013

(Diagram) One multi-DC cluster spanning two datacenters (Bagnolet and Sophia Antipolis) for high availability, serving web services (reads and writes) and batch updates.

8

PnS: Some key dates about the PnS3.0 project

Season 1 (from April 2013 to October 2013): Migration to C*

Season 2, part 1 (November 2013): Keyring

Season 2, part 2 (April 2014): Hadoop & Hive for Analytics

Keyring: Why design customer ids with graphs in C*?

10

PnS database design

§  Nearly 35 tables at the end of 2013  

CREATE TABLE customers (
    customer_id varchar,
    col1 varchar,
    col2 bigint,
    col3 set<text>,
    ...,
    coln timestamp,
    PRIMARY KEY (customer_id));

§  SELECT colx, coly, colz FROM customers WHERE customer_id = '???' ;

 


11

Customer ids

§  What is a customer id?

–  cell number

–  internet account

–  email address

–  ISE (internal identifier used by many other Orange applications)

–  …

§  For many reasons, data is stored in tables with different primary keys

–  some data is often retrieved using a cell number → stored when possible in a table whose PK is a cell number

–  … but not all customers have a cell number → such data is stored in a table whose PK is not a cell number

–  …

12

Customer ids translation

§  A PnS user knows only 1 customer id

§  He often needs to retrieve data indexed by another kind of customer id in the DB

(Diagram) A user who only knows the cell number (209) 123-4567 needs a customer_id translation before the application can run: SELECT * FROM pns WHERE cust_id = 'ISE_QWERTY'

13

Database design in the old relational databases

§  Design with secondary indexes ?

SELECT email_address FROM customer_ids WHERE cell_number = ???;

§  Requires a lot of secondary indexes with values having high cardinality

§  With C*, secondary indexes with values having a high cardinality are wasteful

     

(Table sketch) One row per customer: ISE is the primary key; cell_number, email_address, intranet account, …, idtypeN would each require a secondary index.

14

Design with graph for C*

(Diagram) Example keyring graphs. One customer's graph links IdType='Internet'/IdValue='99999999999', IdType='EMAIL'/IdValue='priam@orange.com', IdType='ISE'/IdValue='myISE1' and IdType='Cell'/IdValue=(209) 123-4567. Other small graphs link IdType='ISE'/IdValue='myISE2' with 'hecuba@orange.com' and (209) 123-4568, and IdType='ISE'/IdValue='myISE3' with (209) 123-4569, 'paris@orange.com' and 'alexander@orange.com'.

15

The new « Customer ids » table in C*

§  Table of edges between customer ids

CREATE TABLE graph (
    idvalue1 text,        -- value of the initial vertex of the edge
    idtype1 text,         -- type of the initial vertex of the edge
    idvalue2 text,        -- value of the terminal vertex of the edge
    idtype2 text,         -- type of the terminal vertex of the edge
    attr map<text, text>, -- a map column for storing any kind of property
    t timestamp,
    PRIMARY KEY ((idvalue1), idtype1, idtype2, idvalue2));

SELECT * FROM graph WHERE idvalue1 IN ('???');
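A minimal sketch of how this edge table behaves (Python, with an in-memory dict standing in for Cassandra; a real deployment would go through a driver): because the partition key is idvalue1, all edges leaving a vertex live in one partition and can be fetched with a single query.

```python
from collections import defaultdict

# Partition key = idvalue1; clustering columns = (idtype1, idtype2, idvalue2).
graph_table = defaultdict(list)

def insert_edge(idvalue1, idtype1, idvalue2, idtype2, attr=None):
    graph_table[idvalue1].append(
        {"idtype1": idtype1, "idtype2": idtype2,
         "idvalue2": idvalue2, "attr": attr or {}})

# An undirected edge is stored in both directions.
insert_edge("(209) 123-4567", "Cell", "myISE1", "ISE")
insert_edge("myISE1", "ISE", "(209) 123-4567", "Cell")

# Equivalent of: SELECT * FROM graph WHERE idvalue1 IN ('(209) 123-4567')
rows = graph_table["(209) 123-4567"]
```

One dict lookup, like one partition read, returns the whole neighborhood of a vertex.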

 


16

Small independent graphs

§  500,000,000 edges in the graph

§  The keyring graph is not a single large graph

§  It is rather a lot of small independent undirected graphs

Ø  Each vertex has a small neighborhood.

Ø  The search for a customer id is limited to a small subset of the edges and vertices

17

Atomicity

§  The edges are bi-directional (undirected)

–  We need to insert or update 2 rows for each edge

–  The atomic batch mode guarantees that the 2 directions are updated atomically
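A sketch of the two-row write described above: assembling the logged (atomic) batch that inserts both directions of an undirected edge. The CQL text is built by hand here for illustration; a real client would use a driver's batch API instead.

```python
# Template for one direction of the edge (timestamp column omitted for brevity).
INSERT_EDGE = ("INSERT INTO graph (idvalue1, idtype1, idvalue2, idtype2) "
               "VALUES ('{v1}', '{t1}', '{v2}', '{t2}');")

def edge_batch(v1, t1, v2, t2):
    """Return a logged batch writing the edge in both directions atomically."""
    return "\n".join([
        "BEGIN BATCH",                                   # logged => atomic
        INSERT_EDGE.format(v1=v1, t1=t1, v2=v2, t2=t2),  # direction 1 -> 2
        INSERT_EDGE.format(v1=v2, t1=t2, v2=v1, t2=t1),  # direction 2 -> 1
        "APPLY BATCH;",
    ])

batch = edge_batch("myISE1", "ISE", "priam@orange.com", "EMAIL")
```

If the coordinator fails mid-batch, the logged batch guarantees that either both directions are eventually applied or neither is.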


18


Optimization of the search of the shortest path

§  We know which kinds of customer ids are used by the PnS users

§  We know which kinds of customer ids are used for indexation

§  For each pair, the shortest paths are predefined in our application PnS (according to the kind of customer ids)
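A hypothetical sketch of what such a predefined-path table could look like (the id types and routes below are invented for illustration; the real mapping is internal to PnS). Given the id type a user holds and the id type a table is indexed by, the application already knows the route, and therefore the exact number of SELECTs a search will cost.

```python
# (start id type, target id type) -> predefined route through id types.
PREDEFINED_PATHS = {
    ("Cell", "ISE"):      ["Cell", "ISE"],              # direct neighbour
    ("EMAIL", "ISE"):     ["EMAIL", "Cell", "ISE"],     # two hops
    ("Internet", "Cell"): ["Internet", "ISE", "Cell"],  # two hops
}

def selects_needed(user_id_type, index_id_type):
    """One SELECT per edge traversed along the predefined route."""
    path = PREDEFINED_PATHS[(user_id_type, index_id_type)]
    return len(path) - 1
```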

19


Search API in the graph

§  An in-house C++ library offers an API for an iterative breadth-first graph exploration

§  Example: looking for H from A

(Diagram) A sample graph with vertices A to I; starting from A, the search expands level by level (A, then B and F, then their neighbors, …) until H is found.

SELECT * FROM graph WHERE credval1 IN ('B', 'F');
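The iterative breadth-first exploration can be sketched as follows (Python standing in for the in-house C++ library): at each level the whole frontier is fetched in one query, mirroring SELECT * FROM graph WHERE credval1 IN (...), so the number of queries equals the number of hops between the two ids.

```python
from collections import defaultdict

edges = defaultdict(set)

def add_edge(a, b):
    # Undirected edge: both directions are stored, as in the keyring table.
    edges[a].add(b)
    edges[b].add(a)

# The example graph of the slide: looking for H starting from A.
for a, b in [("A", "B"), ("A", "F"), ("B", "C"), ("B", "D"),
             ("F", "G"), ("C", "E"), ("C", "H"), ("G", "I")]:
    add_edge(a, b)

def bfs_queries(start, target):
    """Return how many SELECTs the search needs, or None if unreachable."""
    seen, frontier, queries = {start}, {start}, 0
    while frontier:
        queries += 1  # one SELECT ... WHERE credval1 IN (frontier) per level
        reached = set().union(*(edges[v] for v in frontier)) - seen
        if target in reached:
            return queries
        seen |= reached
        frontier = reached
    return None
```

On the slide's graph, finding H from A visits {B, F} then {C, D, G} then {E, H, I}: three levels, three SELECTs.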

20


Number of queries per search

§  Looking for a direct neighbour requires only 1 SELECT

§  Looking for a neighbour of a neighbour requires 2 SELECT

§  Looking for a neighbour of a neighbour of a neighbour requires 3 SELECT

§  …

21

Search response time

§  A search executes in 1, 2 or 3 reads → very low response time (thanks to FusionIO and C++ code)

§  Nearly 700 searches/sec, with a response time per search between 2 ms and 3.5 ms

22

Conclusions about Keyring

§  We had to rethink this feature, because C* != RDBMS

§  At first glance, a graph looks like an exotic design … but for our use case, it works well with C* … and FusionIO.

§  Favoring access to data through the partition key is very efficient for achieving low response time and linear scalability.

BYOHH: Bring Your Own Hadoop & Hive with Cassandra

24

Basic architecture of the Cassandra cluster

(Diagram) A pool of web servers in DC1 and a pool of web servers in DC2, each querying the C* nodes of its own datacenter.

§  Cluster without Hadoop: 2 datacenters, 16 nodes in each DC

§  RF (DC1, DC2) = (3, 3)

§  CL = ONE or LOCAL_QUORUM for online queries

§  Requests from web servers in DC1 are sent to C* nodes in DC1

§  Requests from web servers in DC2 are sent to C* nodes in DC2

25


Adding a new datacenter for analytics

§  Cluster with Hadoop/Hive: 3 datacenters, 16 nodes in DC1, 16 nodes in DC2, 4 nodes in DC3

§  RF (DC1, DC2, DC3) = (3, 3, 1)

§  Because RF = 1 in DC3, we need less storage space in this datacenter

§  We favor cheaper disks (SATA) in DC3 rather than SSDs or FusionIO cards
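The RF (3, 3, 1) layout above maps directly onto NetworkTopologyStrategy. A small sketch of the statement that would apply it (the keyspace name "pns" is an assumption; the datacenter names and replication factors come from the slide):

```python
# Per-datacenter replication factors from the slide: RF (DC1, DC2, DC3) = (3, 3, 1).
REPLICATION = {"DC1": 3, "DC2": 3, "DC3": 1}

def alter_keyspace(keyspace, replication):
    """Build the CQL that sets per-DC replication with NetworkTopologyStrategy."""
    opts = ", ".join("'%s': %d" % (dc, rf)
                     for dc, rf in sorted(replication.items()))
    return ("ALTER KEYSPACE %s WITH replication = "
            "{'class': 'NetworkTopologyStrategy', %s};" % (keyspace, opts))

stmt = alter_keyspace("pns", REPLICATION)
```

With RF = 1 in DC3, each analytics node holds a single copy of its token ranges, which is why cheaper SATA disks suffice there.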

26

Architecture of the Cassandra cluster with the new datacenter for analytics

(Diagram) The pools of web servers in DC1 and DC2 keep querying their local datacenter; DC3 is added alongside for analytics.

27

Potential impacts of map reduce tasks on online queries

(Diagram) Map reduce tasks in DC3 take all the resources (CPU, RAM, IO, …). This causes timeouts for online READ queries sent with CL=ONE, and hinted handoffs accumulate for online update queries not replicated in DC3.

28

(Available consistency levels: ANY, LOCAL_ONE, ONE, LOCAL_QUORUM, EACH_QUORUM, QUORUM, ALL)

Solution for timeouts (online READ queries)

Isolation between online queries and map reduce tasks:

§  Use a LOCAL CONSISTENCY LEVEL:

–  For map reduce tasks in DC3: LOCAL_ONE

–  For online queries in DC1 or DC2: LOCAL_ONE or LOCAL_QUORUM

LOCAL_ONE is available since C* 1.2.12 (cf. JIRA CASSANDRA-6238)


29


Guarantee on resources for online queries

§  Use CGROUPS:

–  Can guarantee a minimum of CPU/RAM for online queries

–  Cgroups cannot be used for disk I/O (map tasks call C* processes when reading data on disk)

Solution for Hinted Handoffs (online WRITE queries) 1/2

(Diagram) Hinted handoffs accumulate in DC1 and DC2 for online update queries not replicated in DC3.

30


Solution for Hinted Handoffs (online WRITE queries) 2/2

Swap global and local read repair chances

§  By default, in C* 1.2:

–  read_repair_chance = 0.1

–  dclocal_read_repair_chance = 0.0

§  For highly read tables, the read repairs are not sent to DC3:

–  Set read_repair_chance = 0.0

–  Set dclocal_read_repair_chance = 0.1

Ø  Less load and disk IO in DC3

dclocal_read_repair_chance = 0.1 is now the default since C* 2.0.9 (cf. JIRA CASSANDRA-7320)
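A sketch of the CQL that applies this swap (the table names below are illustrative; the two property values come from the slide):

```python
def swap_read_repair(table):
    """Build the ALTER TABLE that swaps global and DC-local read repair chances."""
    return ("ALTER TABLE %s WITH read_repair_chance = 0.0 "
            "AND dclocal_read_repair_chance = 0.1;" % table)

# Applied to the highly read tables (names are examples, not the real schema).
statements = [swap_read_repair(t) for t in ("customers", "graph")]
```

With read_repair_chance = 0.0, read repairs no longer touch DC3 replicas; dclocal_read_repair_chance = 0.1 keeps repairs flowing within the local datacenter.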


31


Tradeoff “ease of exploitation vs optimization”

§  256 vnodes per C* node is usually recommended

§  There is at least 1 map task per virtual node in DC3

–  Disabling virtual nodes in DC3 shortens the execution time, but adding new nodes in DC3 is less easy

–  Enabling virtual nodes in DC3 makes adding new nodes easier, but lengthens the execution time

What is the right number of vnodes? 64 vnodes per node looks like a good tradeoff.

32


Contributions and open sourced modules

§  Hive Handler open sourced by Orange

§  Works with CDH4.4 and C* 1.2.13

§  Feature added to this handler: authentication

§  Github: https://github.com/Orange-OpenSource/cassandra_handler_for_hive

Thanks to Cyril Scetbon for this handler

33


Conclusions about BYOHH

§  The installation of Hadoop & Hive is tricky, but we didn't have a choice for analytics because CQL has many limitations

§  We had to rethink our architecture. Now, we are able to do analytics with Hadoop + Hive with a better isolation between online and analytics queries.

§  We have also discovered an interesting ecosystem around C* which offers more capabilities. With this ecosystem, we can benefit from the strengths of C* and work around some of its limitations.

Conclusions  

35


Season 2: Conclusions

1. Rethink

2. Adapt

3. Leverage

36


Thank you

37


Questions

38


A few answers about hardware/OS version /Java version/Cassandra version/Hadoop version

§  Hardware:

–  16 nodes in DC1 and DC2 at the end of 2013:

§  2 CPUs (6 cores each), Intel® Xeon® 2.00 GHz

§  64 GB RAM

§  FusionIO® 800 GB MLC

–  4 nodes in DC3:

§  2 CPUs (6 cores each), Intel® Xeon® 2.00 GHz

§  24 GB RAM

§  SATA disks, 15K rpm

§  OS: Ubuntu Precise (12.04 LTS)

§  Cassandra version: 1.2.13

§  Hadoop version: CDH 4.4 (with Hive 0.10): Hadoop 2 with MRv1

§  Hive handler: https://github.com/Orange-OpenSource/cassandra_handler_for_hive

§  Java version: Java7u45 (GC CMS with option CMSClassUnloadingEnabled)

39


A few answers about data and requests

§  Data types:

§  Volume: 6 TB at the end of 2013

§  elementary types: boolean, integer, string, date

§  collection types

§  complex types: json, xml (between 1 and 20 KB)

§  Requests:

§  10,000 requests/sec at the end of 2013

§  80% get

§  20% set

§  Consistency level used by PnS for online queries and batch updates:

§  LOCAL_ONE (95% of the queries)

§  LOCAL_QUORUM (5% of the queries)