nosql slideshare presentation

Data Research Day 2013 for Telco Prepared by Nicolas Seyvet Help from N. Hari Kumar P. Matray

Upload: ericsson-labs

Post on 12-May-2015

DESCRIPTION

Can No-SQL technologies hold for the specific requirements that apply to the Telco domain? This is the Slideshare Presentation by Ericsson Researcher Nicolas Seyvet to accompany his blog "NoSQL for Telco" http://labs.ericsson.com/blog/nosql-for-telco

TRANSCRIPT

Page 1: NoSQL Slideshare Presentation

Data Research Day 2013

for Telco

Prepared by

Nicolas Seyvet

Help from

N. Hari Kumar P. Matray

Page 2: NoSQL Slideshare Presentation

Ericsson Internal | 2013-06-03 | Page 2

› Software Developer: 10+ years at Ericsson

› HLR, PGM, IMS-M, MMS, MTV, BCS

› Joined Research late 2012
– BMUM -> BUSS (5+ years)
– DUCI (<6 months)

› Active member in various /// groups
– Linux (ELX, UMWP, etc.), Agile, SWAN, EQNA

› Open source contributor

Who am I?

Page 3: NoSQL Slideshare Presentation

› Why NoSQL?
› CAP
› Research activities
› Market trends

The Plan

Page 4: NoSQL Slideshare Presentation

NoSQL: Why?

Page 5: NoSQL Slideshare Presentation

NoSQL: Why? Trends – Usual Suspects

Gartner Data Center TCO Report, June 2012.

Internet Hypertext, RSS, wikis, blogs, tagging, user-generated content, RDF, ontologies

Gossip, SDN

Page 6: NoSQL Slideshare Presentation

NoSQL: Why? Trends: Architecture

› 1980s: Mainframe applications
› 1990s: Database as integration hub
› 2000s: Decoupled services

› Multicore
› Parallelization/Distributed
› Cloud
› Schemaless

[Diagram: multiple Application boxes]

Page 7: NoSQL Slideshare Presentation

Two Ways to Scale: Go BIG or many?

PARTITION

(replication)
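
The "many" option on this slide can be sketched as follows: keys are spread over nodes by hash (partitioning), and each key is also copied to the next node(s) (replication). This is a minimal illustration only; the node names, the `owners` helper, and the choice of MD5 with successor placement are assumptions made for the example, not something taken from the slides.

```python
import hashlib

def owners(key: str, nodes: list[str], replicas: int = 2) -> list[str]:
    """Return the nodes that hold `key`: a primary chosen by hash, then its successors."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    count = min(replicas, len(nodes))  # cannot have more copies than nodes
    return [nodes[(start + i) % len(nodes)] for i in range(count)]

# Hypothetical four-node cluster and subscriber keys.
nodes = ["node-a", "node-b", "node-c", "node-d"]
for key in ("subscriber:1001", "subscriber:1002", "subscriber:1003"):
    print(key, "->", owners(key, nodes))
```

Adding a node changes `len(nodes)` and therefore moves most keys; real systems typically use consistent hashing or range partitioning to limit that movement.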

Page 8: NoSQL Slideshare Presentation

CAP: Consistency, Availability, Partition tolerance

Page 9: NoSQL Slideshare Presentation

› 2000: Prof. Eric Brewer, PODC conference keynote
› 2002: Seth Gilbert and Nancy Lynch, ACM SIGACT News 33 (2)

CAP Theorem: Brewer’s Conjecture

“Of three properties of shared-data systems – data Consistency, system Availability and tolerance to network Partitions – only two can be achieved at any given moment in time.”

Page 10: NoSQL Slideshare Presentation

CAP Theorem: The business decision

Partition: CONSISTENT or Available

Page 11: NoSQL Slideshare Presentation

CAP Summary

› CA (Consistent, Available): traditional relational: MySQL, PostgreSQL, etc.

› CP (Consistent, Partition Tolerance): HBase, MongoDB, Redis, BigTable-like systems

› AP (Available, Partition Tolerance): Voldemort, Riak, Cassandra, CouchDB, Dynamo-like systems

AP: Requests will complete at any node possibly violating consistency

CP: Requests will complete at nodes that have quorum
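
The quorum idea in the CP line can be checked with a few lines of code. The sketch below enumerates every possible write set and read set in a small cluster and confirms that they always share a node exactly when W + R > N, which is why a quorum read sees the latest quorum write. The function names are made up for this example, and the brute-force check is only practical for small N.

```python
from itertools import combinations

def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """True if every write set of size w shares a node with every read set of size r."""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )

def majority(n: int) -> int:
    """Smallest number of nodes that forms a majority quorum in a cluster of n."""
    return n // 2 + 1

for n, w, r in [(3, 2, 2), (3, 1, 1), (5, 3, 3), (5, 2, 3)]:
    print(f"N={n} W={w} R={r} overlap={quorums_always_overlap(n, w, r)}")
```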

Page 12: NoSQL Slideshare Presentation

› Trends

Why NoSQL now?

“Internet size”, Cluster friendly

Rapid development / Solution oriented

Polyglot Persistence

Schemaless

Page 13: NoSQL Slideshare Presentation

Research Activities

› Telco Applicability
› Aggregation
› Event Streams

Page 14: NoSQL Slideshare Presentation

HBase: BigTable/Columnar

› ZooKeeper (cluster)
– Coordination, master selection, root region lookup, node registration…

› Hadoop (cluster)
– Data files, Write-Ahead Log (WAL), rack aware, default data replication x3

› HBase: 1 elected master / many region servers
– Master: region allocation, failover, log splitting, load balancing; one active (elected), many standby
– Region servers: hold regions, handle I/O requests, in-memory data (MemStore), split regions, compact regions
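
Since region servers each hold regions, and a region covers a contiguous range of row keys, finding the server for a row amounts to a sorted lookup on region start keys. The sketch below illustrates that idea only; the start keys, server names, and `locate_region` helper are hypothetical and do not reproduce HBase's actual lookup path.

```python
import bisect

# Hypothetical layout: region i covers keys from its start key up to the next start key.
REGION_START_KEYS = ["", "g", "n", "t"]        # "" marks the first region
REGION_SERVERS = ["rs1", "rs2", "rs3", "rs1"]  # server currently hosting each region

def locate_region(row_key: str) -> tuple[int, str]:
    """Return (region index, region server) for the region whose range contains row_key."""
    index = bisect.bisect_right(REGION_START_KEYS, row_key) - 1
    return index, REGION_SERVERS[index]

for key in ("alice", "g", "mallory", "zed"):
    print(key, "->", locate_region(key))
```

When a region splits, a new start key is inserted into the sorted list; when the master rebalances, only the server entry for that region changes.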

Page 15: NoSQL Slideshare Presentation

›Comprehensive report

›Using HBase is DOABLE!

Telco Applicability Study: HBase for HLR data?

OK!

Page 16: NoSQL Slideshare Presentation

HBase Bulk Processing: Event Processing & Aggregation

› 100 million rows
› Queries evaluated:
– SELECT col1 FROM table
– SELECT SUM(col1) FROM table WHERE col2=val2 GROUP BY col3

› CPU

› RAM

› Network

› Schema

› Map/Reduce

› Scan

› Co-processor
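
To show how the aggregation query on this slide fits the Map/Reduce option, here is a minimal in-memory sketch: the map step applies the WHERE filter and emits (col3, col1) pairs, and the reduce step sums per key. The sample rows and function names are invented for illustration; a real job would run these phases on the cluster rather than in one process.

```python
from collections import defaultdict

def map_phase(rows, val2):
    """Emit (col3, col1) for every row that passes the WHERE col2 = val2 filter."""
    for row in rows:
        if row["col2"] == val2:
            yield row["col3"], row["col1"]

def reduce_phase(pairs):
    """Sum the emitted values per key: SUM(col1) ... GROUP BY col3."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# Hypothetical sample data standing in for the table.
rows = [
    {"col1": 10, "col2": "x", "col3": "a"},
    {"col1": 5, "col2": "y", "col3": "a"},
    {"col1": 7, "col2": "x", "col3": "b"},
    {"col1": 3, "col2": "x", "col3": "a"},
]
result = reduce_phase(map_phase(rows, "x"))
print(result)
```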

Page 17: NoSQL Slideshare Presentation

Bulk Processing: Scaling out/Horizontally

› 100 Million rows

› Linear scaling!

SELECT SUM(col1) FROM table WHERE col2=val2 GROUP BY col3
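
One way to see why this query can scale out: each partition of the table can compute its own partial sums independently, and only the small partial results need to be merged. The sketch below shows that structure under assumed names and sample data. It uses threads purely to show the shape of the computation; Python threads do not demonstrate the speed-up reported on the slide.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def partial_sum(partition, val2):
    """Per-partition work: filter on col2 and sum col1 per col3."""
    totals = Counter()
    for col1, col2, col3 in partition:
        if col2 == val2:
            totals[col3] += col1
    return totals

def scaled_out_sum(partitions, val2):
    """Run partial_sum on every partition, then merge the partial results."""
    merged = Counter()
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        for partial in pool.map(lambda p: partial_sum(p, val2), partitions):
            merged.update(partial)  # Counter.update adds counts key by key
    return dict(merged)

# Hypothetical rows as (col1, col2, col3), split across two partitions.
partitions = [
    [(10, "x", "a"), (5, "y", "a")],
    [(7, "x", "b"), (3, "x", "a")],
]
print(scaled_out_sum(partitions, "x"))
```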

Page 18: NoSQL Slideshare Presentation

READ/WRITE: 100,000 iterations

› 150,000,000 rows
› row = key + 1 column (1K)

Entire cluster up and running: 8 nodes (1 master / 7 slaves)

Periodic degradation
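
A rough back-of-the-envelope for the data volume in this test can be worked out from the figures on the slide. The calculation below makes several assumptions that the slide does not state: "1K" is read as 1024 bytes, row keys and per-cell storage overhead are ignored, the default x3 replication is in effect, and data is spread evenly over the 7 slaves.

```python
ROWS = 150_000_000   # from the slide
VALUE_BYTES = 1024   # assumption: "1K" column value = 1024 bytes
REPLICATION = 3      # assumption: default replication factor is used
DATA_NODES = 7       # 7 slaves; the master is assumed to hold no data

def to_gb(num_bytes: float) -> float:
    """Convert bytes to decimal gigabytes."""
    return num_bytes / 1e9

raw_bytes = ROWS * VALUE_BYTES
replicated_bytes = raw_bytes * REPLICATION
per_node_bytes = replicated_bytes / DATA_NODES

print(f"raw payload:      {to_gb(raw_bytes):.1f} GB")
print(f"with replication: {to_gb(replicated_bytes):.1f} GB")
print(f"per data node:    {to_gb(per_node_bytes):.1f} GB")
```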

Page 19: NoSQL Slideshare Presentation

Robustness: Killing Them Softly…

Master

Slaves

Page 20: NoSQL Slideshare Presentation

How much Data can it Fit? ITK / Constellation / CEA

› Network produces events
– RNC, SGSN, S-&R-KPI
– Traffic DPI
– GTP-C

› CEA (Perfmon)
– Correlated events

1000+ K events/s

10+ K events/s

EventFeeder

HBaseBulkLoader

Lookup data

Staging data on HDFS

Map/Reduce

HBasePutLoader

Put.. Put.. Put…

10,000,000 subscribers
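
To get a feel for the scale on this slide, the figures can be combined into a daily event count and a per-subscriber rate. This is only a sketch: it takes "1000+ K events/s" as a lower bound of one million events per second and assumes that rate applies to the 10,000,000 subscribers, a link the slide does not state explicitly.

```python
EVENTS_PER_SECOND = 1_000_000   # "1000+ K events/s", taken as a lower bound
SUBSCRIBERS = 10_000_000        # from the slide
SECONDS_PER_DAY = 86_400

events_per_day = EVENTS_PER_SECOND * SECONDS_PER_DAY
events_per_subscriber_per_day = events_per_day // SUBSCRIBERS
seconds_between_events = SUBSCRIBERS / EVENTS_PER_SECOND  # per subscriber, on average

print(f"events per day:                {events_per_day:,}")
print(f"events per subscriber per day: {events_per_subscriber_per_day:,}")
print(f"seconds between events:        {seconds_between_events}")
```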

Page 21: NoSQL Slideshare Presentation

The Upcoming Fight

Storkluster: 18 machines

Bigdata: 2 machines

Page 22: NoSQL Slideshare Presentation

› It scales!

› TestDFSIO benchmark
– Read > 3000 GB/s
– Writes > 2000 GB/s

› But

…. it is not that simple…

What about HDFS?

Small files (250 B)

Larger files (1 KB)

[Chart labels: CPU and I/O, Network, CPU]

Page 23: NoSQL Slideshare Presentation

› It scales!

› And it gets…

more complicated

What about End to End? Writing to HBase included

200 K events/s

100 K events/s

Page 24: NoSQL Slideshare Presentation

› Within ~2 hours
– Rows/s: down to 7K/s
– CPU: up x2
– I/O: up, at 100%

But….

Page 25: NoSQL Slideshare Presentation

› Remember what we were doing?
– Hint: Creating lots of small files to add to HBase?..

› Major compaction storm!
– Manage compaction and region splitting

HDFS Curse: Compaction Storm

HBaseBulkLoader

M/R
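
Why many small files lead to a compaction storm can be illustrated with a toy model. In the sketch below, every bulk load adds one small file, and once the file count reaches a threshold all files are merged into one, which rewrites every byte stored so far. This is a deliberately simplified model with invented parameters, not HBase's actual compaction policy; it only shows how rewrite cost grows as loads accumulate.

```python
def simulate(loads: int, file_size: int, threshold: int) -> dict:
    """Toy model: merge all store files into one whenever their count reaches threshold."""
    files = []            # sizes of the files currently on disk
    bytes_loaded = 0
    bytes_rewritten = 0
    compactions = 0
    for _ in range(loads):
        files.append(file_size)
        bytes_loaded += file_size
        if len(files) >= threshold:
            merged = sum(files)
            bytes_rewritten += merged  # a full merge rewrites all existing data
            files = [merged]
            compactions += 1
    amplification = (bytes_loaded + bytes_rewritten) / bytes_loaded if bytes_loaded else 1.0
    return {
        "compactions": compactions,
        "bytes_rewritten": bytes_rewritten,
        "files_remaining": len(files),
        "write_amplification": amplification,
    }

for loads in (10, 100, 1000):
    print(loads, simulate(loads, file_size=1, threshold=3))
```

In this model the bytes written per byte loaded keep rising with the number of loads, which matches the slide's advice to manage compaction and region splitting explicitly.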

Page 26: NoSQL Slideshare Presentation

› Scalability … Scalability… Scalability

› It works but it is not so easy…

› Recommendation:
– Polyglot data storage

Conclusion

Page 27: NoSQL Slideshare Presentation

Page 28: NoSQL Slideshare Presentation

NoSQL

Page 29: NoSQL Slideshare Presentation

› It is not about saying SQL is bad or should not be used

› “An accidental neologism” – Martin Fowler

› A Twitter hashtag

› No prescriptive definition, just observations of common characteristics

– “Any database that is not a Relational Database”
– Running well on clusters (scalable)
– Schemaless

› Polyglot persistence
– Using different stores in different circumstances

NoSQL: The name

The term was coined at a meetup with the creators behind some prominent emerging databases... then there was a conference... and a mailing list... the name caught on... then there were more conferences... and here we are!

Page 30: NoSQL Slideshare Presentation

NoSQL: Why? Trend No 2/4: Connectedness

Internet Hypertext, RSS, wikis, blogs, tagging, user-generated content, RDF, ontologies

Application

M2M

Page 31: NoSQL Slideshare Presentation

NoSQL: Why? Trend No 3/4: Content Individualization

› Individualization of content
› Decentralization

Schemaless
• Extend at runtime
• De-normalize
• Domain design (not schema migration)
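
The schemaless points on this slide can be made concrete with a small sketch: records are plain key/value documents, a new field is added to one record at runtime without touching the others, and related data is nested inside the record (de-normalized) instead of living in a separate table. The store, field names, and values below are hypothetical.

```python
import json

subscribers = {}  # stand-in for a schemaless store: key -> document

def put(key, **fields):
    """Create or extend a record; no schema decides which fields are allowed."""
    subscribers.setdefault(key, {}).update(fields)

put("1001", msisdn="msisdn-0001", plan="prepaid")
put("1002", msisdn="msisdn-0002")

# Extend at runtime: only record 1002 gains the new field, and the device list
# is nested in the record rather than normalized into another table.
put("1002", devices=[{"type": "phone", "os": "android"}])

print(json.dumps(subscribers, indent=2))
```

The cost of this flexibility is that readers must cope with records that lack a field, so the implicit schema moves into application code.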

Page 32: NoSQL Slideshare Presentation

› 4 emerging categories
– Key-Value
– Graph
– BigTable
– Document

› (NewSQL)

› (Object)

NoSQL Landscape

DBN

Page 33: NoSQL Slideshare Presentation

Consistency

“A system is consistent if an update is applied to all relevant nodes at the same logical time”

NoSQL solutions DO support Transactions

Standard database replication (or caching) IS NOT strongly consistent; any solution that relies on either is therefore, by definition, eventually consistent at best

Strong consistency: Atomicity, Consistency, Isolation, Durability (ACID)

Weak consistency: Eventual consistency (inconsistency window)
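
The inconsistency window can be shown with a minimal sketch of asynchronous replication: a write is acknowledged by the primary immediately, and the replica only catches up when the replication log is applied. Between those two moments a read from the replica returns stale data. The class and method names are invented for this example.

```python
class AsyncReplicatedStore:
    """Toy primary/replica pair with asynchronous replication."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []  # replication log not yet applied to the replica

    def write(self, key, value):
        """Acknowledge as soon as the primary has the value."""
        self.primary[key] = value
        self.pending.append((key, value))

    def read_replica(self, key):
        return self.replica.get(key)

    def replicate(self):
        """Apply the pending log; this closes the inconsistency window."""
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()

store = AsyncReplicatedStore()
store.write("profile:1001", "v2")
print("inside the window:", store.read_replica("profile:1001"))
store.replicate()
print("after replication:", store.read_replica("profile:1001"))
```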

Page 34: NoSQL Slideshare Presentation

› “The network will be allowed to lose arbitrarily many messages sent from one node to another” [..]

› “For a distributed system to be continuously available, every request received by a non-failing node in the system must result in a response”

Gilbert and Lynch, SIGACT 2002

Partition Tolerance / Availability

High latency ~= Partition

CP: Requests will complete at nodes that have quorum

AP: Requests will complete at any node possibly violating consistency
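
The difference between the two lines above shows up when a node is cut off from its peers. In the sketch below, a CP node refuses to answer when it cannot see a majority of the cluster, while an AP node answers from local state, which may be stale. The `Node` class and its fields are hypothetical and ignore how real systems detect partitions.

```python
class Node:
    """Toy node that follows either a CP or an AP policy during a partition."""

    def __init__(self, mode: str, cluster_size: int):
        self.mode = mode                # "CP" or "AP"
        self.cluster_size = cluster_size
        self.reachable = cluster_size   # nodes this node can see, itself included
        self.value = None               # last value this node knows about

    def has_quorum(self) -> bool:
        return self.reachable > self.cluster_size // 2

    def read(self):
        if self.mode == "CP" and not self.has_quorum():
            raise RuntimeError("no quorum: refusing to answer")
        return self.value               # AP: answer locally, possibly stale

cp, ap = Node("CP", 3), Node("AP", 3)
cp.value = ap.value = "v1"
cp.reachable = ap.reachable = 1         # both nodes are isolated by a partition

print("AP read:", ap.read())
try:
    cp.read()
except RuntimeError as err:
    print("CP read:", err)
```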

Page 35: NoSQL Slideshare Presentation

HBase Bulk Processing: Event Processing & Aggregation

› 100 million rows
› Queries evaluated:
– SELECT col1 FROM table
– SELECT SUM(col1) FROM table WHERE col2=val2 GROUP BY col3