The Big Data Revolution is an Evolution

Post on 05-Dec-2014

Category:

Technology

DESCRIPTION

Dealing with data requires more than a data store; it requires an infrastructure. At SimpleReach, we have 5 data storage layers to service all of our data needs. These range from high-volume, high-velocity data ingestion with real-time analytics to ad-hoc historical analysis with search capabilities. To communicate effectively between applications, the data stores sit behind a service architecture that provides consistent data access patterns and failover/redundancy. This talk is the story of how we arrived at this architecture and some of the lessons we learned along the way.

TRANSCRIPT

Eric Lubow

@elubow

elubow@simplereach.co

Big Data Revolution is an Evolution

Eric Lubow @elubow #NYCassandra2013

Overview

• Evolution

• SimpleReach

• Data Stores / Languages

• Architecture Implementation

We're in the midst of an evolution, not a revolution.

The 2 Truths

Even with the right tools, 80% of the work of building a big data system is acquiring and refining

The Real Truth

30m plays/day + 4m user ratings + 75k movies metadata + 24.4m users metadata =

David Fincher + Kevin Spacey + British House of Cards

Mitch Hurwitz + Will Arnett + Jason Bateman + Arrested Development

BRING IT TOGETHER

evolution

revolution

Insufficient Capabilities

Scale/Need Changes

Development & Integration

New Products

• Millions of URLs per day

• Over 1 billion pageviews per month

• 250m events per day (~3k events/second)

• Auto-scale 90-130 machines depending on traffic

SimpleReach
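
The "~3k events/second" figure follows directly from the daily total. A minimal sketch of that arithmetic, assuming traffic is spread evenly over the day (real traffic peaks will be higher than this average); the function name is purely illustrative:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def events_per_second(events_per_day: int) -> float:
    """Average event rate implied by a daily total."""
    return events_per_day / SECONDS_PER_DAY


daily_events = 250_000_000  # "250m events per day" from the slide
avg_rate = events_per_second(daily_events)
print(f"{avg_rate:,.0f} events/second on average")  # prints "2,894 events/second on average"
```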

HUMBLE BEGINNINGS

Scale

AND THEN...

C*

• Large data volume ingestion at high velocity

• Really fast writes to many locations (eventual consistency)

• Query by column groups within rows (slicing)

• TTLs for small group aggregation

• Wrote Helenus, Node.js driver for Cassandra

Cassandra C*
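
To make "slicing" and TTL-based aggregation concrete, here is a toy in-memory model of a single wide row. It is not Cassandra's API or the Helenus driver; the class, its methods, and the timestamp-style column names are all hypothetical, chosen only to show how sorted column names allow a range read and how a TTL lets short-lived buckets disappear.

```python
import bisect
import time


class WideRow:
    """Toy model of one wide row: columns are kept sorted by name."""

    def __init__(self):
        self._names = []   # sorted column names
        self._cells = {}   # name -> (value, expires_at or None)

    def insert(self, name, value, ttl=None, now=None):
        """Write a column; with a ttl (seconds) it stops being readable later."""
        now = time.time() if now is None else now
        if name not in self._cells:
            bisect.insort(self._names, name)
        expires_at = None if ttl is None else now + ttl
        self._cells[name] = (value, expires_at)

    def slice(self, start, end, now=None):
        """Return live (name, value) pairs with start <= name <= end."""
        now = time.time() if now is None else now
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, end)
        live = []
        for name in self._names[lo:hi]:
            value, expires_at = self._cells[name]
            # Expired cells are only filtered at read time in this sketch.
            if expires_at is None or now < expires_at:
                live.append((name, value))
        return live


row = WideRow()
row.insert("2013-03-01T10:00", 120, ttl=3600, now=0)  # per-minute bucket, 1h TTL
row.insert("2013-03-01T10:01", 95, ttl=3600, now=0)
row.insert("2013-03-01T11:00", 40, now=0)             # no TTL, kept indefinitely

fresh = row.slice("2013-03-01T10:00", "2013-03-01T10:59", now=10)
later = row.slice("2013-03-01T10:00", "2013-03-01T11:59", now=4000)
print(fresh)
print(later)
```

The first read returns both 10:xx buckets. The second, taken after the one-hour TTL has elapsed, returns only the column written without a TTL.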

• Fast atomic increments (Node.js is native JSON)

• Sharding

• Solid ORM for Rails (Mongoid)

• B-Tree Indexes

• Document based via JSON

• TTLs for ephemeral data

MongoDB

• Supports hundreds of thousands of transactions per second

• Great caching engine

• Supports useful variable types like sets, sorted sets, and lists

• Everything is guaranteed to be Memory Mapped

Redis

• Works with standard MySQL driver

• Column Stores for ad-hoc analytics queries in SQL

• Heavy compression of data (avg 12:1)

Infobright

• Polyglottany doesn’t only apply to data stores

• Each language has its own benefit to each stack layer

• Each language has its own individual benefits

• Each language has its own development benefits

The c0dez

Cons

• Redis - Can only utilize a single core. SerDe price.

• Infobright - DELETE/UPDATEs are VERY expensive

• Cassandra - No B-Tree indexes or probabilistic counters

• Mongo - Indexes must fit in memory. Forced Replica ping times

• Python - Whitespace. Community

• Ruby - Not high performance enough for our standards

Evolution Takes Work

• Service Oriented Architecture (Internal API)

• Data accuracy checks: visual and programmatic

• Built framework for testing out engines (Storage, Queueing, etc)

• Access to many toolsets (for all languages, DBs, Engines)
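
The idea of putting data stores behind an internal API for consistent access and failover can be sketched roughly as follows. This is an illustrative assumption about the shape of such a service, not SimpleReach's implementation; `InternalAPI`, `register`, `get`, and the reader functions are all hypothetical names.

```python
from typing import Callable, Dict, List

Reader = Callable[[str], dict]


class InternalAPI:
    """Hypothetical facade: callers ask for a kind of data, not a data store."""

    def __init__(self) -> None:
        self._backends: Dict[str, List[Reader]] = {}

    def register(self, kind: str, *readers: Reader) -> None:
        """Attach one or more readers for a kind of data, in priority order."""
        self._backends.setdefault(kind, []).extend(readers)

    def get(self, kind: str, key: str) -> dict:
        """Try each registered reader in order; fall through on connection errors."""
        errors = []
        for reader in self._backends.get(kind, []):
            try:
                return reader(key)
            except ConnectionError as exc:
                errors.append(exc)
        raise RuntimeError(f"no backend available for {kind!r}: {errors}")


def down_primary(key: str) -> dict:
    # Stand-in for a store that is currently unreachable.
    raise ConnectionError("primary unreachable")


def replica(key: str) -> dict:
    # Stand-in for a healthy replica.
    return {"key": key, "source": "replica"}


api = InternalAPI()
api.register("article_stats", down_primary, replica)
result = api.get("article_stats", "url-123")
print(result)
```

Because applications only call `get`, a store can be swapped or a replica promoted without touching application code, which is the kind of abstraction the bullet list points at.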

Service

Internal API

Solr

Real-time C*

C*

Path of a Packet

[Diagram; labels: Internet, EP, Internal API, Solr, C*, Mongo, Redis, IB, API, Fire Hose, SC, Consumers, Queue]

Architecture Distribution

US-EAST-1a

MONGO-SHARD-0001-B

MONGO-SHARD-0000-A

CASSANDRA-0001

CASSANDRA-0010

REDIS-0001A

INFOBRIGHT-0001

iAPI-0001

US-EAST-1b

MONGO-SHARD-0002-B

MONGO-SHARD-0001-A

CASSANDRA-0002

CASSANDRA-0011

REDIS-0001B

iAPI-0002

US-EAST-1e

MONGO-SHARD-0002-A

MONGO-SHARD-0000-B

CASSANDRA-0003

CASSANDRA-0012

INFOBRIGHT-0002

iAPI-0003
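
One property visible in the host listing is that the two members of each Mongo shard sit in different availability zones. The sketch below checks that, using the hostnames from the slide. It assumes the trailing `-A`/`-B` suffix marks members of the same shard, which the slide does not state explicitly; the function name is illustrative.

```python
from collections import defaultdict

# Hostnames grouped by availability zone, copied from the slide.
ZONES = {
    "US-EAST-1a": ["MONGO-SHARD-0001-B", "MONGO-SHARD-0000-A", "CASSANDRA-0001",
                   "CASSANDRA-0010", "REDIS-0001A", "INFOBRIGHT-0001", "iAPI-0001"],
    "US-EAST-1b": ["MONGO-SHARD-0002-B", "MONGO-SHARD-0001-A", "CASSANDRA-0002",
                   "CASSANDRA-0011", "REDIS-0001B", "iAPI-0002"],
    "US-EAST-1e": ["MONGO-SHARD-0002-A", "MONGO-SHARD-0000-B", "CASSANDRA-0003",
                   "CASSANDRA-0012", "INFOBRIGHT-0002", "iAPI-0003"],
}


def shard_zones(zones):
    """Map each Mongo shard id to the set of zones holding one of its members."""
    placement = defaultdict(set)
    for zone, hosts in zones.items():
        for host in hosts:
            if host.startswith("MONGO-SHARD-"):
                shard_id = host.rsplit("-", 1)[0]  # drop the trailing -A / -B
                placement[shard_id].add(zone)
    return dict(placement)


placement = shard_zones(ZONES)
# Shards whose members all share one zone would not survive losing that zone.
colocated = sorted(shard for shard, zs in placement.items() if len(zs) < 2)
print(colocated)  # prints []
```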

The Schrute of the Problem

Evolving Amazon Tools

• Full Featured API

• Simple Queue Service

• Data Pipelining

• OpsWorks

• CloudFormation

• Redshift Analytics

• CloudSearch

• Elastic Beanstalk

• Elastic MapReduce

• Simple Workflow Service

• S3 / Glacier

DevOps Wizardry

• Extensive use of AWS

• Monitor: Nagios, StatsD, and Graphite

• Manage: Chef, OpsWorks, cSSHx

• Deployments

Summary

• Solutions Require Evolution

• Build, Use, and Integrate Tools

• Abstraction

• Distribution

• Monitoring & Automation

A revolution only lasts fifteen years, a period which coincides with the

Evolution Takes Time

We’re (Ask us about Food Coma Fridays)

Questions are guaranteed in life. Answers aren’t.

Eric Lubow

@elubow

elubow@simplereach.co

Thank you.
