Clustrix Database Overview

Posted on 03-Dec-2014


DESCRIPTION

Clustrix is the leading scale-out SQL database engineered for the cloud. With Clustrix, you can scale transaction throughput, run real-time analytics and simplify operations.

TRANSCRIPT

CLUSTRIX OVERVIEW

The Leading Scale-out SQL Database. Engineered for the Cloud

Presenters

PUBLIC CLOUD / PRIVATE CLOUD

WHAT IS CLUSTRIX

DBaaS

• Vertically integrated solution

• In-house: private data center / colocation facility

• Maximum flexibility

• The only scalable primary SQL database in AWS

• Fully managed

• Monthly subscription

• Uses flash appliance


Flash Appliance / DBaaS

TARGET APPLICATIONS

• E-commerce
• Gaming
• SaaS
• Consumer Web
• Advertising Analytics
• Healthcare Analytics

Agoge

BILLIONS OF ROWS

BILLIONS OF TRANSACTIONS

REAL-TIME ANALYTICS

MILLIONS OF USERS/DEVICES

IT’S TIME TO REINVENT THE SQL DATABASE

WEB-SCALE APPLICATIONS

GROWING DATA SETS

MILLIONS OF USERS

HIGH CONCURRENCY

REAL-TIME ANALYTICS

BILLIONS OF TRANSACTIONS

CLOUD COMPUTING

SCALE-OUT ARCHITECTURE

FAULT TOLERANT

EASY MANAGEMENT

SCALING A DATABASE IS HARD

Scale-up / Sharding / NoSQL

CUSTOMER PRIORITIES

Time to Market / Cost / Scale and Performance / Operational Simplicity

• Scale-up: expensive band-aid
• Sharding: engineering and ops overhead
• NoSQL: relational logic moves into the application; engineering and ops overhead

CLUSTRIX: BUILT FOR SCALE AND THE CLOUD

HIGH-SCALE TRANSACTIONS

• Linear scalability for writes/updates/reads

• Doubling the nodes doubles transactions/sec

REAL-TIME ANALYTICS

• Linear speedup for analytics

• Doubling the nodes halves the query time

ACID, SQL AND MYSQL

SELF-MANAGING

BUILT-IN FAULT TOLERANCE

SCALE-OUT

Add nodes as demand grows
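The two scaling claims can be written as an idealized model. With N nodes, throughput is N times the single-node rate, and a fully parallelizable analytic query takes the single-node time divided by N. The sketch below assumes perfectly linear behavior, while the later slides say "near-linear". The function names and per-node figures are hypothetical placeholders, not Clustrix measurements.

```python
def ideal_tps(nodes, tps_per_node):
    """Idealized linear scale-out: every node adds the same transaction throughput."""
    return nodes * tps_per_node

def ideal_query_time(nodes, single_node_seconds):
    """Idealized linear speedup: a query's work is split evenly across all nodes."""
    return single_node_seconds / nodes

def scaling_efficiency(measured_speedup, nodes):
    """Fraction of ideal linear scaling actually achieved (1.0 means perfectly linear)."""
    return measured_speedup / nodes

# Placeholder single-node numbers, for illustration only.
base_tps, base_query_s = 5_000, 40.0
for n in (3, 6, 12):
    print(n, ideal_tps(n, base_tps), ideal_query_time(n, base_query_s))
```

A "near-linear" system would score slightly below 1.0 on `scaling_efficiency`. For example, a 1.8x speedup from doubling the nodes gives 0.9.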

REAL WORKLOADS

PERFORMANCE AND SCALE

• Near-linear scalability for reads/writes/updates
• Add more nodes to handle more TPS
• Near-linear speedup for analytics
• More nodes, faster queries

Massive Media
• 20 million+ users / 70,000+ TPS
• Write-heavy workload; 1TB+ writes/day

High-Scale Transactions / Real-Time Analytics

CLUSTRIX DESIGN

• Intelligent Data Distribution
• Massively Parallel Query Processing
• Shared-Nothing Architecture

[Diagram: SQL requests arrive at any node; every node runs its own Query Compiler, Database Engine, and Data Map]

INTELLIGENT DATA DISTRIBUTION

Tables with billions of rows:
• Tables are split into slices
• Each slice has a replica on another node
• Adding a node triggers a re-balance
• Losing a node triggers a re-protect

[Diagram: slices S1 to S5 and their replicas spread across the nodes]
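The slide shows slicing and replica placement only as a diagram. The hypothetical Python model below makes several simplifying assumptions:

- rows are assigned to slices by hash;
- every slice has exactly two copies;
- the initial layout is round-robin;
- re-protect and re-balance follow simple least-loaded rules.

This is not a description of how Clustrix actually places data, and every function name is made up for illustration.

```python
import hashlib
from collections import Counter

COPIES = 2  # assumption: each slice has one primary copy and one replica

def slice_for_key(key, num_slices):
    """Map a row's distribution key to a slice with a stable hash (hypothetical scheme)."""
    digest = hashlib.sha1(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_slices

def place_slices(num_slices, nodes):
    """Initial layout: copy i of slice s goes to node (s + i) mod N, so copies never share a node."""
    if len(nodes) < COPIES:
        raise ValueError("need at least as many nodes as copies")
    return {s: [nodes[(s + i) % len(nodes)] for i in range(COPIES)]
            for s in range(num_slices)}

def reprotect(placement, nodes, lost_node):
    """After a node is lost, rebuild each missing copy on the least-loaded surviving node."""
    survivors = [n for n in nodes if n != lost_node]
    if len(survivors) < COPIES:
        raise ValueError("not enough surviving nodes to restore full protection")
    load = Counter(n for copies in placement.values() for n in copies if n != lost_node)
    repaired = {}
    for s, copies in sorted(placement.items()):
        kept = [n for n in copies if n != lost_node]
        while len(kept) < COPIES:
            target = min((n for n in survivors if n not in kept),
                         key=lambda n: (load[n], n))
            kept.append(target)
            load[target] += 1
        repaired[s] = kept
    return repaired

def rebalance(placement, nodes, new_node):
    """After a node is added, move copies onto it until it carries its even share."""
    load = Counter(n for copies in placement.values() for n in copies)
    fair_share = sum(load.values()) // (len(nodes) + 1)
    balanced = {s: list(copies) for s, copies in placement.items()}
    for s in sorted(balanced):
        if load[new_node] >= fair_share:
            break
        copies = balanced[s]
        donor = max(copies, key=lambda n: (load[n], n))  # busiest node holding this slice
        copies[copies.index(donor)] = new_node
        load[donor] -= 1
        load[new_node] += 1
    return balanced

nodes = ["n1", "n2", "n3"]
layout = place_slices(6, nodes)
print(slice_for_key(42, 6), layout)
print(reprotect(layout, nodes, "n2"))
print(rebalance(layout, nodes, "n4"))
```

Both `reprotect` and `rebalance` return a new layout and leave the input untouched, which keeps the before/after comparison easy to inspect.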

PARALLEL QUERY PROCESSING

Simple queries

• Fielded by any node

• Routed to the node that holds the data

Complex queries

• Split into query fragments

• Process fragments in parallel
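The split between simple and complex queries can be illustrated with a toy in-process cluster. A point lookup goes to the one node that owns the key. An aggregate is broken into one fragment per node, the fragments run concurrently, and the partial results are combined. The modulo "data map", the `Node` and `Cluster` classes, and the use of threads are all hypothetical simplifications for this sketch, and replicas are ignored.

```python
from concurrent.futures import ThreadPoolExecutor

class Node:
    """One shared-nothing node that stores only the rows it owns (replicas omitted)."""
    def __init__(self, name):
        self.name = name
        self.rows = {}  # primary key -> row

    def get(self, key):
        return self.rows.get(key)

    def partial_sum(self, column, predicate):
        """Query fragment: aggregate over this node's local rows only."""
        return sum(row[column] for row in self.rows.values() if predicate(row))

class Cluster:
    def __init__(self, names):
        self.nodes = [Node(name) for name in names]

    def owner(self, key):
        # Hypothetical data map: integer key modulo node count.
        return self.nodes[key % len(self.nodes)]

    def insert(self, key, row):
        self.owner(key).rows[key] = row

    def point_lookup(self, key):
        # Simple query: routed straight to the node that owns the key.
        return self.owner(key).get(key)

    def parallel_sum(self, column, predicate):
        # Complex query: one fragment per node, run concurrently, partial results summed.
        with ThreadPoolExecutor(max_workers=len(self.nodes)) as pool:
            return sum(pool.map(lambda n: n.partial_sum(column, predicate), self.nodes))

cluster = Cluster(["n1", "n2", "n3"])
for i in range(10):
    cluster.insert(i, {"id": i, "amount": i * 10})
print(cluster.point_lookup(7))
print(cluster.parallel_sum("amount", lambda row: row["amount"] >= 50))
```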

REPLICATION AND DISASTER RECOVERY

• MySQL-to-Clustrix and Clustrix-to-MySQL replication (asynchronous)
• Backups with mysqldump
• Clustrix Parallel Backup for fast backups
• Disaster recovery
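Asynchronous replication means the source commits a write without waiting for the replica to apply it, so a replica can trail behind. The toy model below shows that lag with an ordered change log and a replica read position. It is an illustrative assumption, not the MySQL binary-log protocol or Clustrix's replication code, and all class and method names are hypothetical.

```python
class Source:
    """Database that accepts writes and records them in an ordered change log."""
    def __init__(self):
        self.data = {}
        self.log = []  # ordered (key, value) change events

    def write(self, key, value):
        # Commit locally and return at once; no replica is consulted.
        self.data[key] = value
        self.log.append((key, value))

class Replica:
    """Copy that replays the source's change log at its own pace."""
    def __init__(self):
        self.data = {}
        self.position = 0  # index of the next log event to apply

    def catch_up(self, source, max_events=None):
        pending = source.log[self.position:]
        if max_events is not None:
            pending = pending[:max_events]
        for key, value in pending:
            self.data[key] = value
        self.position += len(pending)

    def lag(self, source):
        """Number of committed source events not yet applied here."""
        return len(source.log) - self.position

src, rep = Source(), Replica()
for i in range(3):
    src.write(f"k{i}", i)
rep.catch_up(src, max_events=2)
print(rep.lag(src), rep.data)
```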

CLUSTRIX TOOLS: INSIGHT

Real-time and historical insight into query performance

Monitor database health

DATABASE LANDSCAPE

Real-Time Analytics (OLAP): 10s of terabytes; online; best fit: either row or column stores
Data Warehousing: petabytes; offline; best fit: column stores
Transactions (OLTP): 10s of terabytes; online; best fit: row stores

• Single-node row stores: MySQL, MS SQL Server, IBM DB2, Oracle
• Shared-data row stores: Oracle RAC, NuoDB
• In-memory row stores: MemSQL, VoltDB, MySQL Cluster
• In-memory column stores: SAP HANA
• Shared-nothing row store: Clustrix
• Shared-nothing column stores: HP Vertica, EMC Greenplum, Amazon Redshift

[Chart: query complexity vs. concurrent writes/updates, with data size from 1TB to 100TBs, contrasting single-node query processing with massively parallel processing]

USE CASES

• High-Scale Transactions: 10x scale without DB experts or app changes
• MySQL Consolidation: 1/10th the TCO by eliminating database sprawl
• Business Critical MySQL: 90% lower downtime with 50% less TCO
• Operational Intelligence: 200% performance gain with 50% less TCO

QUESTIONS AND NEXT STEPS

Questions?

OPERATIONAL INTELLIGENCE

Analytics Application: Professionals provide expert advice to improve patient outcomes

• Microsoft SQL Server; MedExpert proprietary treatment research
• New DoD & Medicare contracts; expected 100x increase in usage
• One scale-out database: 4 nodes, growing to 20; minor application changes and tuning

Alternatives Considered: Fusion I/O gave a 20% boost; not enough time to market (TTM) to shard the application

Why Clustrix: POC showed a performance boost for analytics queries and linear scale for the long term

Clustrix Results: 50% to 200% faster query response; TCO less than 50% in the near term

THE CHALLENGE

HIGH-SCALE TRANSACTIONS

• Write-heavy workload with 1TB+ writes per day

• 20 million+ users / 70,000+ TPS

CLUSTRIX 18 NODES

• 11X+ the TPS of a single MySQL server

• 20B+ Rows of data

“Pre-Clustrix, we spent a lot of time on optimizing for performance and scale. Now we can spend those resources better.”

Toon Coppens, CTO and Co-Founder, Massive Media

BUSINESS CRITICAL MYSQL

SaaS Application: Low-cost course materials for education

• Chaotic/unstable MySQL environment
• Back-to-school expansion: uptime during critical peak season
• 3-node clusters in 2 geographic locations; automated fault tolerance and easy expansion

Alternatives Considered: HW upgrade was a stop gap; replication implementation was unstable and custom

Why Clustrix? POC showed it was easy to upgrade and expand a live Ruby on Rails application

Clustrix Results: 80% reduction in downtime; 50% TCO reduction

THE CHALLENGE

MYSQL CONSOLIDATION

E-Commerce: ¥1.2 trillion per year

• Availability is the #1 priority
• MySQL sprawl: 1,150 databases, 100 DBAs
• Private DBaaS: 10:1 compression; re-deploy staff

Alternatives Considered: Fusion I/O couldn’t keep up; MySQL tools were too unstable

Clustrix Results: 90% lower TCO; 14 nodes today, growing to 35

CLUSTRIX TECHNOLOGY

Intelligent Data Distribution (tables with billions of rows)
• Tables split into slices
• Auto-distribute, auto-protect, re-protect

Parallel Query Evaluation
• Normal queries: fielded by any node, routed to data node
• Complex queries: split into query fragments, fragments processed in parallel

[Diagram: SQL and JSON requests arriving at the nodes; query fragments 1, 2, 3; slices S1 and S2 with copies on different nodes]

• Application sees a linearly scalable, single-instance MySQL database
• Automatic fault tolerance
• Online expansion, data (re)distribution, and schema changes

SCALE AND FAULT TOLERANCE

• All data has multiple copies on different nodes
• Re-balance on adding a node
• Re-protect on losing a node

[Diagram: slices A to E, each with a copy on a different node]
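Keeping every slice on more than one node lets the cluster lose a node without losing data. The brute-force check below is a hypothetical illustration, not a Clustrix tool. It assumes a simple slice-to-nodes layout and tests every possible set of failed nodes. With two copies, any single node failure leaves every slice readable, but some two-node failures do not.

```python
from itertools import combinations

def survives(placement, nodes, failures):
    """True if, for every possible set of `failures` lost nodes, each slice keeps a copy."""
    for lost in combinations(nodes, failures):
        lost = set(lost)
        if any(all(n in lost for n in copies) for copies in placement.values()):
            return False
    return True

nodes = ["n1", "n2", "n3", "n4"]
# Hypothetical two-copy layout: slice s lives on nodes s mod 4 and (s + 1) mod 4.
placement = {s: [nodes[s % 4], nodes[(s + 1) % 4]] for s in range(8)}
print(survives(placement, nodes, 1))
print(survives(placement, nodes, 2))
```

In this model, tolerating f simultaneous node failures requires at least f + 1 copies of each slice on distinct nodes.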


ANALYTIC QUERY PROCESSING

SELECT a, b FROM A JOIN B USING (id) WHERE A.a = 15

• Start query
• Read A, apply filter
• Send each row to the correct node based on id
• Read B and join
• Return to user

Analytic queries get a speedup from massively parallel processing:
• Concurrent parallelism
• Pipeline parallelism
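The join flow in the slide can be simulated in a few lines. This is a hypothetical sketch under stated assumptions:

- table B is distributed by `id` with a simple modulo rule;
- table A is spread round-robin, so its rows must be re-sent to B's owning nodes before the join;
- the steps run one after another here, whereas the slide describes them running concurrently and pipelined across real nodes.

All helper names are made up.

```python
from collections import defaultdict

NUM_NODES = 3

def owner(key):
    # Assumption: B is distributed by `id` with a simple modulo rule.
    return key % NUM_NODES

def distribute_round_robin(rows):
    # Assumption: A is not distributed by `id`, so its rows must move for the join.
    nodes = [[] for _ in range(NUM_NODES)]
    for i, row in enumerate(rows):
        nodes[i % NUM_NODES].append(row)
    return nodes

def distribute_by_id(rows):
    nodes = [[] for _ in range(NUM_NODES)]
    for row in rows:
        nodes[owner(row["id"])].append(row)
    return nodes

def distributed_join(a_by_node, b_by_node):
    """SELECT a, b FROM A JOIN B USING (id) WHERE A.a = 15, simulated node by node."""
    # Read A on every node, apply the filter, send each row to the node owning its id.
    inbox = defaultdict(list)
    for local_a in a_by_node:
        for row in local_a:
            if row["a"] == 15:
                inbox[owner(row["id"])].append(row)
    # Each receiving node reads its local slice of B and joins.
    results = []
    for node, incoming in inbox.items():
        local_b = {row["id"]: row for row in b_by_node[node]}
        for row in incoming:
            match = local_b.get(row["id"])
            if match is not None:
                results.append((row["a"], match["b"]))
    # Gather on the node that started the query and return to the user.
    return sorted(results)

A = [{"id": 1, "a": 15}, {"id": 2, "a": 7}, {"id": 3, "a": 15}, {"id": 4, "a": 15}]
B = [{"id": 1, "b": "x"}, {"id": 2, "b": "y"}, {"id": 3, "b": "z"}]
print(distributed_join(distribute_round_robin(A), distribute_by_id(B)))
```

Sending each filtered A row to the node that already holds the matching B rows means each join is performed locally on one node.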

SQL FOR STRUCTURED DATA

[Timeline chart, 1970 to 2010: relational structured data vs. unstructured data]

• SQL wins; hierarchical and network databases lose (System R, Ingres)
• SQL wins; ER and object databases lose (Oracle, Postgres)
• Single-node SQL struggles
• Distributed SQL wins for structured data: Distributed SQL primary (Clustrix) and Distributed SQL warehousing (Vertica, Greenplum)
• NoSQL wins for unstructured data (MongoDB, CouchDB, Hadoop)

With increasing data size, struggling old SQL implementations are replaced by new Distributed SQL.

CLUSTRIX APPLIANCE

Clustrix Appliance 3-Node Cluster (CLX 4110)

• 24 Intel Xeon CPU cores
• 144GB RAM
• 6GB NVRAM
• 1.35TB protected data capacity on Intel SSDs (2.7TB raw)
• Low-latency InfiniBand interconnect
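The appliance figures are cluster totals. The small calculation below makes two assumptions that the slide does not state. First, resources are split evenly across the three nodes. Second, protected capacity is half of raw because every slice is stored twice, matching the "each slice has a replica on another node" design. Capacities are in decimal gigabytes.

```python
NODES = 3
CORES_TOTAL = 24
RAM_GB_TOTAL = 144
NVRAM_GB_TOTAL = 6
SSD_RAW_GB = 2700
COPIES = 2  # assumption: protected capacity reflects two copies of every slice

# Assumption: the three nodes are identical, so totals divide evenly.
per_node = {
    "cores": CORES_TOTAL // NODES,
    "ram_gb": RAM_GB_TOTAL // NODES,
    "nvram_gb": NVRAM_GB_TOTAL // NODES,
}
protected_gb = SSD_RAW_GB // COPIES
print(per_node, protected_gb)
```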
