nosql for sql professionals
TRANSCRIPT
Copyright © 2012 McKnight Consulting Group, LLC All Rights Reserved – Confidential and Proprietary Slide 1
Unlock Potential
William McKnight President McKnight Consulting Group October 16, 2012
NoSQL for SQL Professionals
Dipti Borkar Director, Product Management Couchbase
2
William McKnight
President, McKnight Consulting Group
• Frequent keynote speaker and trainer internationally
• Consulted to Pfizer, Scotiabank, Teva Pharmaceuticals, Verizon, and many other Global 1000 companies
• A prolific writer with hundreds of articles, blogs and white papers in publication
• Focused on delivering business value and solving business problems utilizing proven, streamlined approaches to information management
• Former Fortune 50 Information Technology executive
3
Former Enterprise Information Holy Grail
[Diagram labels: RDBMS; Legacy Sources; Data Integration; Data Warehouses; Data Marts; MDBS; Users/Reports; Operational; Analytical; Operational Applications and Users]
4
No More
5
The Relational Database Data Page
© McKnight Consulting Group, 2010
Page Header
Page Footer
Row IDs
Records
1120 Aris Doug Johnson Practice Director 206-676-5636
1121 Stolt Offshore MS Craig Lennox Mr
+66 1226 71269 [email protected]
1122Medtronic, Inc. Mark Kohls Principle Database Administrator
763.516.2557 [email protected]
6
What Does Big Data Mean?
• Data in NoSQL: No SQL allowed, or Not Only SQL?
• Sensor, social and web data?
• Data in a system that does not support SQL?
• A system with petabytes?
• Hadoop?
7
Why the Sudden Explosion of Interest?
• An increased number and variety of data sources that generate large quantities of data
  – Sensors (e.g. location, RFID, …)
  – Social (e.g. Twitter, wikis, …)
  – Web clicks
• Realization that data was “too valuable” to delete
  – Even when there is little signal among lots of noise
• Dramatic decline in the cost of hardware, especially storage
  – If storage were still $100/GB there would be no big data revolution underway
8
Why NoSQL for Big Data
• More data model flexibility
  – JSON as a data model (think XML)
  – No “schema first” requirement; load first
• Faster time to insight from data acquisition
• Relaxed ACID
  – Eventual consistency
  – Willing to trade consistency for availability
  – ACID would crush things like storing clicks on Google
• Low upfront software costs
• Utilizes Java
• Full scans
• Programmers love the freedoms
9
Hadoop, MapReduce and “Big Data”
• Parallel programming framework
• Hadoop is an open source distributed file system (HDFS) plus MapReduce
• Hadoop is used by those facing web-scale data challenges
10
Who uses Hadoop
40,000+ nodes running Hadoop Research for Ad systems and web search
Product search indexes Analytics from user sessions
Log analysis for reporting and analytics and machine learning
Log analysis, data mining, and machine learning
Large scale image conversion
High energy physics, genomics, Digital Sky Survey
11
ACID
• Atomicity – each transaction passes or fails as a whole
• Consistency – database is in a valid state after each transaction
• Isolation – transactions do not interfere with one another
• Durability – committed transactions remain committed no matter what (e.g., crashes)
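As an illustration of atomicity and consistency, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the names and the amounts are hypothetical and are not part of the original slides; the point is only that a transfer either applies both updates or neither.

```python
import sqlite3

# In-memory database with a hypothetical accounts table.
# The CHECK constraint defines what a "valid state" means here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The debit would break the CHECK constraint, so the credit
        # that already ran is rolled back as well (atomicity).
        return False


def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

A transfer that would overdraw an account returns False and leaves both balances exactly as they were, even though the credit statement had already executed inside the transaction.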
12
What Gives the CIO Heartburn About NoSQL
• Developer Skills
• Lack of ACID Compliance
• Tools lacking and Projects Flawed
• Fast Nature of Unburdened Projects
• Different Developers
• Schema-less/lite Models
• Lack of Payback Methodology
13
MapReduce
1. Take a large problem and divide it into sub-problems
2. Perform the same function on all sub-problems
3. Combine the output
[Diagram: MAP runs DoWork() on each sub-problem; REDUCE combines the results into the Output]
14
MapReduce (MR)
• Programming framework (library and runtime) for analyzing data sets stored in HDFS
• MapReduce jobs are composed of two functions
  – Map
  – Reduce
• User only writes the Map and Reduce functions
• MR framework provides all the “glue” and coordinates the execution of the Map and Reduce jobs on the cluster.
  – Fault tolerant
  – Scalable
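The division of labor described above can be sketched in plain Python. This is not Hadoop code: the thread pool and the shuffle helper stand in for the framework "glue", and the word-count task and all function names are assumptions chosen for illustration. Only map_words and reduce_counts correspond to what a user would write.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_words(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def reduce_counts(key, values):
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)


def shuffle(mapped):
    """Framework glue: group the mapped pairs by key."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def word_count(chunks):
    """Framework glue: run map on every chunk, then reduce per key."""
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_words, chunks))
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())
```

Calling word_count with a list of text chunks returns one total per distinct word. In a real cluster the chunks would be HDFS blocks and the map and reduce calls would run on different machines.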
15
A Quick Summary: Where to do big data analytics?
Aspect | Parallel DB Systems | NoSQL
Data Model | Structured data with known schema | Any data will fit in any format; (un)(semi)structured
Hardware Configuration | Purchased as an appliance | “User assembled” from commodity machines
Fault Tolerance | Failures assumed to be rare; no query-level fault tolerance | Failures assumed to be common; simple, yet efficient, fault tolerance
16
Key-Value Stores
• NoSQL OLTP
• A record may look like:
  – Book: “Of Mice and Men”: Author: “Steinbeck”
• Great for unstructured data centered on a single object.
• Typically used as a cache for data frequently requested by web applications such as online shopping carts or social-media sites.
17
Document Stores
• A record may look like:
  – “id” => 12345,
  – “name” => “Jane”,
  – “age” => 22,
  – “address” => number => 123, street => Main
• Often deployed for web-traffic analysis, social gaming, content stores, user-behavior/action analysis, or log-file analysis in real time.
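To make the record shape concrete, the sketch below holds the sample record as a nested Python dictionary and queries a small collection by a nested field. The second document, the collection, and the helper names get_path and find are hypothetical; a real document store would provide its own query interface.

```python
import json


def get_path(document, path, default=None):
    """Read a nested field using a dotted path such as 'address.street'."""
    current = document
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


def find(collection, path, value):
    """Return every document whose field at path equals value."""
    return [doc for doc in collection if get_path(doc, path) == value]


# The record from the slide, plus a hypothetical one with no address,
# to show that documents in one collection need not share a structure.
collection = [
    {"id": 12345, "name": "Jane", "age": 22,
     "address": {"number": 123, "street": "Main"}},
    {"id": 12346, "name": "Raj", "age": 31},
]

# Documents travel and persist as JSON text.
serialized = json.dumps(collection[0], sort_keys=True)
restored = json.loads(serialized)
```

The lookup simply skips documents that lack the requested field, which is how varying record formats are tolerated without a schema change.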
18
Graph Stores: Emphasizing Relationships as Primary Data
• Based on Graph Theory
  – Vertices (nodes), edges (relations) and properties
• Navigating social networks, configurations and recommendations
  – e.g., Get the cheapest flights from DFW to SYD leaving on 7/12/12 with a minimum number of stops and each stop less than 2 hours.
• e.g., Social Networks
  – Churn and Offer Management
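The flight example is a path search over vertices and edges. A rough sketch follows, assuming a made-up fare table (the airports other than DFW and SYD, and all prices, are invented) and leaving out the date and layover-time constraints for brevity. It uses a uniform-cost search limited by the number of stops.

```python
import heapq

# Hypothetical fares: airport -> list of (destination, price).
FLIGHTS = {
    "DFW": [("LAX", 200), ("SYD", 1500), ("HNL", 450)],
    "LAX": [("SYD", 900)],
    "HNL": [("SYD", 700)],
}


def cheapest_route(graph, origin, destination, max_stops):
    """Return (cost, path) of the cheapest route with at most max_stops
    intermediate airports, or None if no such route exists."""
    heap = [(0, [origin])]  # entries are (total cost so far, path so far)
    while heap:
        cost, path = heapq.heappop(heap)
        airport = path[-1]
        if airport == destination:
            return cost, path
        # A path with k edges has k - 1 stops, so stop extending once
        # one more edge would exceed the allowed number of stops.
        if len(path) - 1 > max_stops:
            continue
        for nxt, price in graph.get(airport, []):
            if nxt not in path:  # never revisit an airport
                heapq.heappush(heap, (cost + price, path + [nxt]))
    return None
```

With the sample fares, allowing one stop finds the connection through LAX, while allowing none falls back to the more expensive direct flight. A graph store answers the same kind of question by traversing stored relationships rather than joining tables.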
19
From “Picking the Right NoSQL Database Tool” by Mikayel Vardanyan
Picking the Right NoSQL Database
20
The NoSQL Challenge
21
There’s No Technology Silver Bullet
Source: eBay, eBay Extreme Analytics in a Virtual World, Nov 10,2010
22
Hybrid Information Universe
[Diagram labels: RDBMS; Legacy Sources; Data Integration; Data Warehouse; Data Warehouse Appliance; Data Marts; MDBS; Columnar Databases; Hadoop; NoSQL; Master Data; Data Stream Processing; Syndicated Data; Elements in the Cloud; Users/Reports; Operational; Analytical; Operational Applications and Users]
23
Data Integration
• Increasingly, data first lands in the unstructured universe
• NoSQL stores are big data "EL" tools
• The Need for Data Integration with the Enterprise
[Diagram labels: UnBig (RDBMS); Big (NoSQL)]
24
Agile Approaches
Stages: Justification, Planning, Business Analysis, Design, Construction, Deployment, Support
Steps:
1 Business Case Assessment
2 Enterprise Infrastructure Evaluation
3 Project Planning
4 Project Requirements Definition
5 Data Analysis
6 Application Prototyping
7 Metadata Repository Analysis
8 Database Design
9 ETL Design
10 Metadata Repository Design
11 ETL Development
12 Application Development
13 Data Mining
14 Metadata Repository Development
15 Implementation
16 Release Evaluation
17 Operate and Maintain
Source: Business Intelligence Roadmap, Larissa Moss & Shaku Atre
25
Cloud Services
The benefits of cloud computing are:
• On-Demand and Self Service
• Broad Network Access
• Resource Pooling
• Rapid Elasticity
• Measured Service
Source: Cloud Security and Privacy: An Enterprise Perspective on Risks and Compliance (Mather, Kumaraswamy & Latif)
26
Information Store Guidance
[Chart: information stores rated against workload characteristics]
Characteristics: Real-Time; Small Data OK; Terabytes; Petabytes; Historical Data; Unstructured Data; Source Data supplier to other systems; Random Queries; Ad-hoc
Information stores: Operational Systems; Columnar database; Data Mart (relational); Data Stream Processing; Data Warehouse; NoSQL; Master Data Management; Multidimensional Mart
27
What Will Motivate IT to Adopt NoSQL?
• Continuation of Big Vendor Legacy Seen as Too Expensive
• Scaling: Data > 1 Machine
• Schema Flexibility
• Mandatory Requirements to Keep Multiple Years of Highly Detailed Data
• Tired of Losing “Deals” to More Agile Hybrid IT Organizations
• NoSQL Tool Marketplace Innovations
28
NoSQL for Interactive Applications
29
2.0
NoSQL Database NoSQL Document Database
Couchbase Server
30
Market Adoption
Internet Companies
• Social Gaming
• Ad Networks
• Social Networks
• Online Business Services
• E-Commerce
• Online Media
• Content Management
• Cloud Services
Enterprises
• Communications
• Retail
• Financial Services
• Health Care
• Automotive/Airline
• Agriculture
• Consumer Electronics
• Business Systems
31
Market Adoption – Customers
Internet Companies, Enterprises
More than 300 customers, 5,000 production deployments worldwide
32
RELATIONAL VS NOSQL DOCUMENT DATABASES
33
Relational vs Document data model
Relational data model: Highly-structured table organization with rigidly-defined data formats and record structure.
Document data model: Collection of complex documents with arbitrary, nested data formats and varying “record” format.
[Diagram: a table with columns C1 to C4 beside a set of JSON documents]
34
Example: User Profile
User Info
KEY  First  ZIP_id  Last
1    Dipti  2       Borkar
2    Joe    2       Smith
3    Ali    2       Dodson
4    John   3       Doe

Address Info
ZIP_id  CITY  ZIP    STATE
1       DEN   30303  CO
2       MV    94040  CA
3       CHI   60609  IL
4       NY    10010  NY

To get information about a specific user, you perform a join across two tables. For example, user 1 has ZIP_id 2, which matches the Address Info row 2 MV 94040 CA.
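The join can be sketched with Python's built-in sqlite3 module, loading the two tables from the slide. The table and column names are adapted for SQL (for instance, KEY becomes user_key) and are assumptions rather than the presenters' schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE address_info (
    zip_id INTEGER PRIMARY KEY, city TEXT, zip TEXT, state TEXT);
CREATE TABLE user_info (
    user_key INTEGER PRIMARY KEY, first TEXT, last TEXT,
    zip_id INTEGER REFERENCES address_info(zip_id));
INSERT INTO address_info VALUES
    (1, 'DEN', '30303', 'CO'), (2, 'MV', '94040', 'CA'),
    (3, 'CHI', '60609', 'IL'), (4, 'NY', '10010', 'NY');
INSERT INTO user_info VALUES
    (1, 'Dipti', 'Borkar', 2), (2, 'Joe', 'Smith', 2),
    (3, 'Ali', 'Dodson', 2), (4, 'John', 'Doe', 3);
""")


def user_profile(conn, user_key):
    """Join user_info to address_info to assemble one user's profile."""
    row = conn.execute(
        """SELECT u.first, u.last, a.city, a.zip, a.state
           FROM user_info AS u
           JOIN address_info AS a ON a.zip_id = u.zip_id
           WHERE u.user_key = ?""",
        (user_key,),
    ).fetchone()
    if row is None:
        return None
    return dict(zip(("first", "last", "city", "zip", "state"), row))
```

Every profile read touches both tables; the address is stored once and shared by all users that reference the same ZIP_id.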
35
Document Example: User Profile
All data in a single document
{ "ID": 1, "FIRST": "Dipti", "LAST": "Borkar", "ZIP": "94040", "CITY": "MV", "STATE": "CA" }
[Diagram: a User Info row plus an Address Info row equals one JSON document]
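By contrast with a two-table lookup, a document model reads the whole profile with one key. The sketch below uses a plain dictionary as a stand-in for a document database; the key format "user::1", the second profile and its PHONES field are hypothetical and only illustrate that documents can differ in shape.

```python
import json

bucket = {}  # stand-in for a document database: key -> JSON text


def save(bucket, key, document):
    """Store a document as JSON text under its key."""
    bucket[key] = json.dumps(document)


def load(bucket, key):
    """Fetch and decode one document, or None if the key is absent."""
    raw = bucket.get(key)
    return None if raw is None else json.loads(raw)


# The profile from the slide, stored whole under a single key.
save(bucket, "user::1", {"ID": 1, "FIRST": "Dipti", "LAST": "Borkar",
                         "ZIP": "94040", "CITY": "MV", "STATE": "CA"})

# A hypothetical second profile with an extra nested field; nothing
# about the first document or any schema has to change to allow it.
save(bucket, "user::2", {"ID": 2, "FIRST": "Joe", "LAST": "Smith",
                         "ZIP": "94040", "CITY": "MV", "STATE": "CA",
                         "PHONES": {"mobile": "555-0100"}})
```

The trade-off is duplication: the city and state now repeat in every profile, where the relational layout stored them once.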
36
Relational Technology Scales Up
Web/App Server Tier: Application Scales Out. Just add more commodity web servers.
Relational Database: RDBMS Scales Up. Get a bigger, more complex server.
• Expensive and disruptive sharding, doesn’t perform at web scale
• Won’t scale beyond this point
[Charts: System Cost and Application Performance plotted against Users for each tier]
37
Couchbase Server Scales Out Like App Tier
Web/App Server Tier: Application Scales Out. Just add more commodity web servers.
Couchbase Distributed Data Store: NoSQL Database Scales Out. Cost and performance mirrors app tier.
• Scaling out flattens the cost and performance curves
[Charts: System Cost and Application Performance plotted against Users for each tier]
38
NoSQL Database Considerations
• Easy Scalability: Grow cluster without application changes, without downtime when needed
• Consistent High Performance: Always awesome experience for your application users.
• Always On 24x7x365: The sun never sets on the Internet, your application needs the database to always serve data.
• Flexible Data Model: Keep developers productive and allow fast and easy addition of new features
39
USE CASE AND APPLICATION EXAMPLES
40
Data driven use cases
• Support for unlimited data growth
• Data with non-homogenous structure
• Need to quickly and often change data structure
• 3rd party or user defined structure
• Variable length documents
• Sparse data records
• Hierarchical data
41
Performance driven use cases
• Low latency matters
• High throughput matters
• Large number of users
• Unknown demand with sudden growth of users/data
• Predominantly direct document access
• Workloads with very high mutation rate per document
42
Use Case Examples
Web app or Use-case | Couchbase Solution | Example Customer
Content and Metadata Management System | Couchbase document store + Elastic Search | McGraw-Hill…
Social Game or Mobile App | Couchbase stores game and player data | Zynga…
Ad Targeting | Couchbase stores user information for fast access | AOL…
User Profile Store | Couchbase Server as a key-value store | TuneWiki…
Session Store | Couchbase Server as a key-value store | Concur…
High Availability Caching Tier | Couchbase Server as a memcached tier replacement | Orbitz…
Chat/Messaging Platform | Couchbase Server | DOCOMO…
43
Use Case: Social Gaming
Social and Mobile Gaming
Types of Data
• User account information
• User game profile info
• User’s social graph
• State of the game
• Player badges and stats
Application Requirements
• Ability to support rapid growth
• Fast response times for awesome user experience
• Game uptime – 24x7x365
• Easy to update apps with new features
Why NoSQL and Couchbase
• Scalability ensures that games are ready to handle the millions of users that come with viral growth.
• High performance guarantees players are never left waiting to make their next move.
• Always-on operations means zero interruption to game play (and revenue)
• Flexible data model means games can be developed rapidly and updated easily with new features
44
Use Case: Ad Targeting
Ad Targeting
Types of Data
• User profile: preferences and psychographic data
• Ad serving history by user
• Ad buying history by advertiser
• Ad serving history by advertiser
Application Requirements
• High performance to meet limited ad serving budget; time allowance is typically <40 msec
• Scalability to handle hundreds of millions of user profiles and rapidly growing amount of data
• 24x7x365 availability to avoid ad revenue loss
Why NoSQL and Couchbase
• Sub-millisecond reads/writes means less time is needed for data access, more time is available for ad logic processing, and more highly optimized ads will be served
• Ease of scalability ensures that the data cluster can be grown seamlessly as the amount of user and ad data grows
• Always-on operations = always-on revenue. You will never miss the opportunity to serve an ad because of downtime.
45
Use Case: Content and metadata store
Building a self-adapting, interactive learning portal with Couchbase
46
The Problem
As learning moves online in great numbers, there is a growing need to build interactive learning environments that scale:
• Scale to millions of learners
• Serve MHE as well as third-party content, including open content
• Support learning apps
• Self-adapt via usage data
47
The Challenge
Backend is an Interactive Content Delivery Cloud that must:
• Allow for elastic scaling under spike periods
• Ability to catalog & deliver content from many sources
• Consistent low-latency for metadata and stats access
• Require full-text search support for content discovery
• Offer tunable content ranking & recommendation functions
Experimented with a combination of:
• XML Databases
• SQL/MR Engines
• In-memory Data Grids
• Enterprise Search Servers
Hmm...this looks kinda like:
+ Content Caching (Scale)
+ Social Gaming (Stats)
+ Ad Targeting (Smarts)
48
The Technologies
49
The Learning Portal
• Designed and built as a collaboration between MHE Labs and Couchbase
• Serves as proof-of-concept and testing harness for Couchbase + ElasticSearch integration
• Available for download and further development as open source code
50
COUCHBASE SOLUTION “THE BASICS”
51
COUCHBASE SERVER CLUSTER
Basic Operation
• Docs distributed evenly across servers
• Each server stores both active and replica docs
  – Only one server is active for a given doc at a time
• Client library provides app with simple interface to database
• Cluster map provides map to which server doc is on
  – App never needs to know
• App reads, writes, updates docs
• Multiple app servers can access same document at same time
User Configured Replica Count = 1
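A simplified sketch of how a client library might use a cluster map follows. The partition count, the CRC32 hash and the round-robin placement are assumptions made for illustration, not a description of Couchbase internals; the idea shown is only that the key determines a partition and the map says which server holds the active copy.

```python
import zlib

NUM_PARTITIONS = 64  # assumption for illustration; real systems differ


def partition_for(key):
    """Hash a document key to a partition number (deterministic)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def build_cluster_map(servers):
    """Place each partition's active copy on one server and its single
    replica on the next server, round-robin (replica count = 1)."""
    cluster_map = {}
    for partition in range(NUM_PARTITIONS):
        active = servers[partition % len(servers)]
        replica = servers[(partition + 1) % len(servers)]
        cluster_map[partition] = {"active": active, "replica": replica}
    return cluster_map


def server_for(cluster_map, key):
    """What the client library does on every read, write or update:
    look up the server holding the active copy of the document."""
    return cluster_map[partition_for(key)]["active"]
```

Because every app server holds the same map and applies the same hash, they all route a given key to the same server without the application code knowing where the document lives.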
[Diagram: Couchbase Server cluster of three servers (Server 1, Server 2, Server 3), each holding a set of ACTIVE docs and a set of REPLICA docs. App Server 1 and App Server 2 each use the Couchbase client library and its cluster map to READ/WRITE/UPDATE docs.]
52
Add Nodes to Cluster
• Two servers added with one-click operation
• Docs automatically rebalance across cluster
  – Even distribution of docs
  – Minimum doc movement
• Cluster map updated
• App database calls now distributed over larger number of servers
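The two rebalance goals, even distribution and minimum movement, can be illustrated with a small sketch. This is not Couchbase's rebalance algorithm; it works on a hypothetical partition-to-server assignment and assumes every currently used server is still in the new server list.

```python
from collections import Counter


def rebalance(active_by_partition, servers):
    """Return a new partition -> server assignment that is as even as
    possible while moving as few partitions as possible."""
    new = dict(active_by_partition)
    load = Counter(new.values())
    base, extra = divmod(len(new), len(servers))

    # Servers that already hold the most keep the "+1" remainder slots,
    # so they have to give up the fewest partitions.
    ordered = sorted(servers, key=lambda s: -load[s])
    target = {s: base + (1 if i < extra else 0)
              for i, s in enumerate(ordered)}

    # Collect partitions from servers holding more than their target.
    surplus = []
    for partition in sorted(new):
        server = new[partition]
        if load[server] > target[server]:
            load[server] -= 1
            surplus.append(partition)

    # Hand the collected partitions to servers below their target.
    for server in ordered:
        while load[server] < target[server]:
            new[surplus.pop()] = server
            load[server] += 1
    return new
```

Starting from 12 partitions spread over three servers and adding two more, only four partitions move, all of them onto the new servers; the cluster map would then be updated to reflect the new assignment.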
[Diagram: Couchbase Server cluster after adding Server 4 and Server 5. ACTIVE and REPLICA docs are now spread across five servers; App Server 1 and App Server 2 READ/WRITE/UPDATE docs through the Couchbase client library and its updated cluster map.]
User Configured Replica Count = 1
53
Fail Over Node
[Diagram: Couchbase Server cluster of five servers with ACTIVE and REPLICA docs. Server 3 fails and the replicas of its docs on the other servers are promoted to active; App Server 1 and App Server 2 access docs through the Couchbase client library and its cluster map.]
User Configured Replica Count = 1
• App servers accessing docs
• Requests to Server 3 fail
• Cluster detects server failed
  – Promotes replicas of docs to active
  – Updates cluster map
• Requests for docs now go to appropriate server
• Typically rebalance would follow
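The fail-over steps can be sketched as a pure function over a cluster map. The map layout (partition number mapped to an active and a replica server) and the sample data are assumptions for illustration, not Couchbase's actual data structures.

```python
# Hypothetical cluster map with replica count = 1.
cluster_map = {
    0: {"active": "server1", "replica": "server2"},
    1: {"active": "server2", "replica": "server3"},
    2: {"active": "server3", "replica": "server1"},
}


def fail_over(cluster_map, failed_server):
    """Return an updated map after failed_server is lost: replicas of
    its active partitions are promoted, and replicas it was holding
    are marked missing until a rebalance recreates them."""
    new_map = {}
    for partition, copies in cluster_map.items():
        active, replica = copies["active"], copies["replica"]
        if active == failed_server:
            active, replica = replica, None  # promote the replica
        elif replica == failed_server:
            replica = None  # the replica copy was lost with the server
        new_map[partition] = {"active": active, "replica": replica}
    return new_map
```

After the promotion, the affected partitions have no replica left, which is why a rebalance would typically follow to restore the configured replica count.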
54
Couchbase Server Admin Console
55
56
Q & A
57
William McKnight [email protected] www.mcknightcg.com
Dipti Borkar [email protected] www.couchbase.com