c* summit eu 2013: using cassandra in a telco storage system
DESCRIPTION
Speaker: Pavel Pontryagin, Senior Engineer at Peter-Service Video: http://www.youtube.com/watch?v=Sf81x5V8xKY&list=PLqcm6qE9lgKLoYaakl3YwIWP4hmGsHm5e&index=7 Data volume grows and in telecommunication area it is painful to support and scale RDBMS systems. This presentation shows how we switched from SQL to NoSQL. This will be an overview of aspects: * how we model schema for call data using NoSQL vs SQL. * what hardware architecture we use * NoSQL vs SQL insert-select performance * how we store graph data using C*.TRANSCRIPT
#CASSANDRAEU
CASSANDRASUMMITEU
Using Cassandra in a Telco Storage System
Pavel Pontryagin, senior developer
#CASSANDRAEU
CASSANDRASUMMITEU
* What hardware architecture we use
* How we model schema for call, client and graph data using NoSQL vs SQL
* NoSQL vs SQL insert-select performance
What we will discuss today is how we switched from SQL to NoSQL
#CASSANDRAEU
CASSANDRASUMMITEU
SQL
What and how much is Telco data
#CASSANDRAEU
CASSANDRASUMMITEU
* What is CDR? Under the hood.
#CASSANDRAEU
CASSANDRASUMMITEU
500 millions/day 100 millions/day
Scaling data in a traditional schema
#CASSANDRAEU
CASSANDRASUMMITEU
* Speed up processing * Increase reliability * Decrease costs * Unify units * Add new functionality
Unanswered questions. How to:
Solution for us is
Switch to Cassandra
#CASSANDRAEU
CASSANDRASUMMITEU
noSQL
How to load, model and query data
#CASSANDRAEU
CASSANDRASUMMITEU
The schema is
Local Storage
SSTable Loader
Local Storage
SSTable Loader
Local Storage
SSTable Loader
CDR files CDR files CDR files
Thrift Thrift Thrift 62Gb RAM 12 x CPU 2.50GHz 10Tb array
#CASSANDRAEU
CASSANDRASUMMITEU
SQL data Query
SELECT started, numA, numB FROM events_1003_main WHERE started > sysdate -1 and numA = ‘765-23-14’;
Data modeling. Calls
started numA numB Event type
Lac Cell IMSI IMEI
20131015210458 765-23-14 765-23-18 VOICE 2901 35140 IMSI1 IMEI1
#CASSANDRAEU
CASSANDRASUMMITEU
noSQL data “raw” data CALLS_201301 index cf CALLS_MSISDN_201301 Query
SliceQuery<String, Composite, String> sliceQuery; //define query sliceQuery.setColumnFamily(“CALLS_MSISDN_201301”); //define CF
sliceQuery.setKey(“201301.765-23-18”); //Query given msidn for given date Composite start = new Composite(); start.addComponent(0, “201301”, AbstractComposite.ComponentEquality.EQUAL); // from date Composite end = new Composite(); end.addComponent(0, “201303”, AbstractComposite.ComponentEquality.GREATER_THAN_EQUAL); // to date sliceQuery.setRange(start, end, false, 1000); QueryResult<ColumnSlice<Composite, String>> qr = sliceQuery.execute(); //get the result
uuid1 20131015210458,765-23-14,765-23-18,VOICE,2901,35140
201301.765-‐23-‐14 {2013-‐10-‐15 21/04/58:uuid1} numB
201301.765-‐23-‐18 {2013-‐10-‐15 21/04/58:uuid1} numA.IMSI1.IMEI1
#CASSANDRAEU
CASSANDRASUMMITEU
… where num like ‘%123’ Using SolR
Clone Keep using
+SolR indexing
#CASSANDRAEU
CASSANDRASUMMITEU
noSQL data “raw” data CALLS_201301 index cf CALLS_LAC_CELL_201301 Query
SliceQuery<String, Composite, Date> sliceQuery; //define query sliceQuery.setColumnFamily(“CALLS_LAC_CELL_201301”); //define CF
sliceQuery.setKey(“2013.05.01 15:15”); //Query given time slice Composite start = new Composite(); start.addComponent(0, “2901”, AbstractComposite.ComponentEquality.EQUAL); // from date Composite end = new Composite(); end.addComponent(0, “2901”, AbstractComposite.ComponentEquality.GREATER_THAN_EQUAL); // to date sliceQuery.setRange(start, end, false, 1000); QueryResult<ColumnSlice<Composite, Date >> qr = sliceQuery.execute(); //get the result
uuid1 20131015210458,765-23-14,765-23-18,VOICE,2901,35140
2013.05.01 15:15 {2901:35140: uuid1} 20131015210458
#CASSANDRAEU
CASSANDRASUMMITEU
SQL data Query SELECT distinct dn.msisdn, tl.name, t.full FROM SUBS_DATA.DCT_NUMS dn, SUBS_DATA.SUBS_NUMS_HIST snh, subs_data.clnt_attrs_hist cah, SUBS_DATA.DCT_NAMES t, dicts.telcos tl Where (dn.num_id = subn_num_id) and (cah.clnt_clnt_id = snh.clnt_clnt_id) and (t.name_id = cah.reg_name_id) and (tl.telco_id = dn.tlco_tlco_id) and (snh.started > to_date('10.09.2012', 'DD.MM.YYYY')) and (snh.started < to_date('17.09.2013', 'DD.MM.YYYY')) order by dn.msisdn;
Data modeling. Subscribers
started numA numB Event type
Lac Cell IMSI IMEI
20131015210458 765-23-14 765-23-18 VOICE 2901 35140 IMSI1 IMEI1
#CASSANDRAEU
CASSANDRASUMMITEU
noSQL “raw” subscribers data first slice update index cf SUBS_NAME
uuid1 Pavel, 765-23-14, address1, document1, 2, 21-01-2013, 1(active_status)
uuid2 Vadim, 765-23-16, address2, document2, 2, 21-01-2013, 1(active_status)
Pavel {21-01-2013: document1: 1} {21-01-2014: document1: 0} uuid1 uuid3
Vadim {21-01-2013: document2: 1}
uuid2
uuid3 Pavel, 765-23-14, address1, document1, 2, 21-01-2014, 0(active_status)
#CASSANDRAEU
CASSANDRASUMMITEU
Data modeling. Graph data
#CASSANDRAEU
CASSANDRASUMMITEU
noSQL graph data First record from the first switch Second record from the second switch CALLS_EDGES column family
2013/01/12.msisdn1 {VOICE: msisdn2: 1:uuid1} {VOICE: msisdn2: 2:uuid2} OUT OUT
2013/01/12.msisdn2 {VOICE: msisdn1: 1:uuid1} {VOICE: msisdn1: 2:uuid2} IN IN
uuid started numA numB Event type
subsciber
uuid1 2013/01/12 msisdn1 msisdn2 VOICE 1
uuid started numA numB Event type
subsciber
uuid2 2013/01/12 msisdn1 msisdn2 VOICE 1
#CASSANDRAEU
CASSANDRASUMMITEU
* Loading speed * Query speed * Fun
Summary
Write, rec/sec
Oracle 10 000 Cassandra 5 000 per node
seconds
Oracle > 1, depends on plans, statistics etc. Cassandra < 1, stable
#CASSANDRAEU
CASSANDRASUMMITEU
#CASSANDRAEU
CASSANDRASUMMITEU
* How to model "traditional" data * How to add full-text search to intensive data load * A couple of hardware issues
What we discussed today…