vfabric sqlfire for high performance data
DESCRIPTION
Learn how to manage high performance data using VMware vFabric SQLFireTRANSCRIPT
Confidential
Managing High Performance Data with vFabric SQLFire
2 Confidential
Agenda Various NoSQL attributes and why SQL
SQLFire features + Demo
Scalability patterns• Hash partitioning
• Entity groups and collocation
• Scaling behavior using “data aware stored procedures”
Consistency model • How we do distributed transactions
Shared nothing persistence
3 Confidential
3
Confidential
We Challenge the traditional RDBMS design NOT SQL
Too much I/O
Design roots don’t necessarily apply today
• Too much focus on ACID
• Disk synchronization bottlenecks
First write to LOG
Second write to Data files
Buffers primarily tuned
for IO
4 Confidential
4
Confidential
Common themes in next-gen DB architectures
“Shared nothing” commodity clustersfocus shifts to memory, distributing data and clustering
Scale by partitioning the data and move behavior to data nodes
HA within cluster and across data centers
Add capacity to scale dynamically
NoSQL, Data Grids, Data Fabrics, NewSQL
5 Confidential
What is different ?
Several data models Key-value
Column family (inspired by Google BigTable)
Document
Graph
Most focus on making model less rigid than SQL
Consistency model is not ACID
5
Low scale Low scale High scale High scale Very high scale Very high scale
STRICT – Full ACID (RDB)
Tunable Consistency
Eventual
6 Confidential
What is our take with SQLFire?
7 Confidential
So, what is vFabric SQLFire?
Distributed, Main memory oriented SQL Data management platform
NoSQL characteristics of scalability, performance, availability but retains support for distributed transactions, SQL querying
It is also designed so you can use it as a operational layer in front of your legacy databases through a caching framework
8 Confidential
Show me a picture
8
As data changes, subscribers are pushed
notification eventsData transparently replicated and/or partitioned;Redundant storage can be in memory and/or on
disk
Many physical machine nodes appear as one logical system
Other
Synchronous read through, write through or
Asynchronous write-behind to other data sources and sinks
JavaClient
Increase/Decrease capacity on the fly
C#Client
JDBC
JDBC or ADO.NET
Databases
File system
Shared Nothing disk persistence
Each cache instance can optionally persist to disk
9 Confidential
Comparing with NoSQL, Object Data Grids
The Good• Little vendor specific
• Use SQL for Application DML, Queries
• Vendor specific stuff in DDL
• Better Query engine• Cost based optimizer, skip-list indexing, parallel queries
• No deserialization headaches
• Maintain referential integrity
• Easier to integrate with existing relational DBs and other products• Plug-n-play is a myth
10 Confidential
Comparing with NoSQL, Object Data Grids
Not So Good• Not as efficient for simple key access
• You can only manage scalar types• Nested graphs is painful
• Complex data relationships that could be represented as a single object and fetched using a key now may require a join• Join processing is computationally expensive
• OR mapping can add latency
11 Confidential
Features in 1.0
Partitioning and Replication
Multiple Topologies
• Peer-2-peer, client-server, WAN
Events framework
• Listeners, triggers, Asynchronous write behind
Queries
• Distributed, optimized for main memory
Procedures and Functions
• Standard Java stored proc and parallel “data aware”
Caching
• Loader, writers, Eviction, Overflow and Expiration
Command line tool
Manageability, Security
12 Confidential
12
Confidential
Flexible Deployment Topologies
Java Application cluster can host an embedded clustered database by just changing the URLjdbc:sqlfire:;mcast-port=33666;host-data=truejdbc:sqlfire:;mcast-port=33666;host-data=true
13 Confidential
13
Confidential
Flexible Deployment Topologies
14 Confidential
Partitioning & Replication
15 Confidential
Explore features through example
FLIGHTS---------------------------------------------
FLIGHT_ID CHAR(6) NOT NULL , SEGMENT_NUMBER INTEGER NOT NULL , ORIG_AIRPORT CHAR(3), DEPART_TIME TIME,…..
PRIMARY KEY (FLIGHT_ID, SEGMENT_NUMBER)
FLIGHTAVAILABILITY---------------------------------------------
FLIGHT_ID CHAR(6) NOT NULL , SEGMENT_NUMBER INTEGER NOT NULL , FLIGHT_DATE DATE NOT NULL , ECONOMY_SEATS_TAKEN INTEGER ,…..
PRIMARY KEY ( FLIGHT_ID, SEGMENT_NUMBER, FLIGHT_DATE))
FOREIGN KEY (FLIGHT_ID, SEGMENT_NUMBER) REFERENCES FLIGHTS ( FLIGHT_ID, SEGMENT_NUMBER)
FLIGHTHISTORY---------------------------------------------
FLIGHT_ID CHAR(6), SEGMENT_NUMBER INTEGER, ORIG_AIRPORT CHAR(3), DEPART_TIME TIME, DEST_AIRPORT CHAR(3),…..
1 – M
1 – 1
SEVERAL CODE/DIMENSION TABLES---------------------------------------------
AIRLINES: AIRLINE INFORMATION (VERY STATIC)COUNTRIES : LIST OF COUNTRIES SERVED BY FLIGHTSCITIES: MAPS: PHOTOS OF REGIONS SERVED
Assume, thousands of flight rows, millions of flightavailability records
16 Confidential
TableTable
CREATE TABLE AIRLINES ( AIRLINE CHAR(2) NOT NULL PRIMARY KEY, AIRLINE_FULL VARCHAR(24), BASIC_RATE DOUBLE PRECISION, DISTANCE_DISCOUNT DOUBLE PRECISION,…. );
CREATE TABLE AIRLINES ( AIRLINE CHAR(2) NOT NULL PRIMARY KEY, AIRLINE_FULL VARCHAR(24), BASIC_RATE DOUBLE PRECISION, DISTANCE_DISCOUNT DOUBLE PRECISION,…. );
SQLFSQLF SQLF
Creating Tables
17 Confidential
CREATE TABLE AIRLINES ( AIRLINE CHAR(2) NOT NULL PRIMARY KEY, AIRLINE_FULL VARCHAR(24), BASIC_RATE DOUBLE PRECISION, DISTANCE_DISCOUNT DOUBLE PRECISION,…. ) REPLICATE;
CREATE TABLE AIRLINES ( AIRLINE CHAR(2) NOT NULL PRIMARY KEY, AIRLINE_FULL VARCHAR(24), BASIC_RATE DOUBLE PRECISION, DISTANCE_DISCOUNT DOUBLE PRECISION,…. ) REPLICATE;
Replicated TableReplicated TableReplicated TableReplicated Table Replicated TableReplicated Table
SQLFSQLF SQLF
Replicated Tables
18 Confidential
CREATE TABLE FLIGHTS ( FLIGHT_ID CHAR(6) NOT NULL , SEGMENT_NUMBER INTEGER NOT NULL , ORIG_AIRPORT CHAR(3), DEST_AIRPORT CHAR(3) DEPART_TIME TIME, FLIGHT_MILES INTEGER NOT NULL) PARTITION BY COLUMN(FLIGHT_ID);
CREATE TABLE FLIGHTS ( FLIGHT_ID CHAR(6) NOT NULL , SEGMENT_NUMBER INTEGER NOT NULL , ORIG_AIRPORT CHAR(3), DEST_AIRPORT CHAR(3) DEPART_TIME TIME, FLIGHT_MILES INTEGER NOT NULL) PARTITION BY COLUMN(FLIGHT_ID);
TableTable
Partitioned TablePartitioned TablePartitioned TablePartitioned TablePartitioned TablePartitioned Table
Replicated TableReplicated TableReplicated TableReplicated Table Replicated TableReplicated Table
SQLFSQLF SQLF
Partitioned Tables
19 Confidential
CREATE TABLE FLIGHTS ( FLIGHT_ID CHAR(6) NOT NULL , SEGMENT_NUMBER INTEGER NOT NULL , ORIG_AIRPORT CHAR(3), DEST_AIRPORT CHAR(3) DEPART_TIME TIME, FLIGHT_MILES INTEGER NOT NULL) PARTITION BY COLUMN (FLIGHT_ID) REDUNDANCY 1;
CREATE TABLE FLIGHTS ( FLIGHT_ID CHAR(6) NOT NULL , SEGMENT_NUMBER INTEGER NOT NULL , ORIG_AIRPORT CHAR(3), DEST_AIRPORT CHAR(3) DEPART_TIME TIME, FLIGHT_MILES INTEGER NOT NULL) PARTITION BY COLUMN (FLIGHT_ID) REDUNDANCY 1;
TableTable
Partitioned TablePartitioned Table
Redundant PartitionRedundant Partition
Partitioned TablePartitioned Table
Redundant PartitionRedundant Partition
Partitioned TablePartitioned Table
Redundant PartitionRedundant Partition
Replicated TableReplicated TableReplicated TableReplicated Table Replicated TableReplicated Table
SQLFSQLF SQLF
Partition Redundancy
20 Confidential
CREATE TABLE FLIGHTAVAILABILITY ( FLIGHT_ID CHAR(6) NOT NULL , SEGMENT_NUMBER INTEGER NOT NULL , FLIGHT_DATE DATE NOT NULL , ECONOMY_SEATS_TAKEN INTEGER DEFAULT 0, …) PARTITION BY COLUMN (FLIGHT_ID) COLOCATE WITH (FLIGHTS);
CREATE TABLE FLIGHTAVAILABILITY ( FLIGHT_ID CHAR(6) NOT NULL , SEGMENT_NUMBER INTEGER NOT NULL , FLIGHT_DATE DATE NOT NULL , ECONOMY_SEATS_TAKEN INTEGER DEFAULT 0, …) PARTITION BY COLUMN (FLIGHT_ID) COLOCATE WITH (FLIGHTS);
TableTable
Partitioned TablePartitioned Table
Redundant PartitionRedundant Partition
Partitioned TablePartitioned Table
Redundant PartitionRedundant Partition
Partitioned TablePartitioned Table
Redundant PartitionRedundant Partition
Replicated TableReplicated TableReplicated TableReplicated Table Replicated TableReplicated Table
SQLFSQLF SQLF
Partition Colocation
Colocated PartitionColocated PartitionColocated PartitionColocated Partition Colocated PartitionColocated Partition
Redundant PartitionRedundant PartitionRedundant PartitionRedundant Partition Redundant PartitionRedundant Partition
21 Confidential
CREATE TABLE FLIGHTAVAILABILITY ( FLIGHT_ID CHAR(6) NOT NULL , SEGMENT_NUMBER INTEGER NOT NULL , FLIGHT_DATE DATE NOT NULL , ECONOMY_SEATS_TAKEN INTEGER DEFAULT 0, …) PARTITION BY COLUMN (FLIGHT_ID) COLOCATE WITH (FLIGHTS) PERSISTENT persistentStore ASYNCHRONOUS;
CREATE TABLE FLIGHTAVAILABILITY ( FLIGHT_ID CHAR(6) NOT NULL , SEGMENT_NUMBER INTEGER NOT NULL , FLIGHT_DATE DATE NOT NULL , ECONOMY_SEATS_TAKEN INTEGER DEFAULT 0, …) PARTITION BY COLUMN (FLIGHT_ID) COLOCATE WITH (FLIGHTS) PERSISTENT persistentStore ASYNCHRONOUS;
TableTable
Partitioned TablePartitioned Table
Redundant PartitionRedundant Partition
Partitioned TablePartitioned Table
Redundant PartitionRedundant Partition
Partitioned TablePartitioned Table
Redundant PartitionRedundant Partition
Replicated TableReplicated TableReplicated TableReplicated Table Replicated TableReplicated Table
SQLFSQLF SQLF
Persistent Tables
Colocated PartitionColocated PartitionColocated PartitionColocated Partition Colocated PartitionColocated Partition
Redundant PartitionRedundant PartitionRedundant PartitionRedundant Partition Redundant PartitionRedundant Partition
sqlf backup /export/fileServerDirectory/sqlfireBackupLocationsqlf backup /export/fileServerDirectory/sqlfireBackupLocation
Data dictionary is always persisted in each serverData dictionary is always persisted in each server
22 Confidential
Demo default partitioned tables, colocation, persistent tables
FLIGHTS---------------------------------------------
FLIGHT_ID CHAR(6) NOT NULL , SEGMENT_NUMBER INTEGER NOT NULL , ORIG_AIRPORT CHAR(3), DEPART_TIME TIME,…..
PRIMARY KEY (FLIGHT_ID, SEGMENT_NUMBER)
FLIGHTAVAILABILITY---------------------------------------------
FLIGHT_ID CHAR(6) NOT NULL , SEGMENT_NUMBER INTEGER NOT NULL , FLIGHT_DATE DATE NOT NULL , ECONOMY_SEATS_TAKEN INTEGER ,…..
PRIMARY KEY ( FLIGHT_ID, SEGMENT_NUMBER, FLIGHT_DATE))
FOREIGN KEY (FLIGHT_ID, SEGMENT_NUMBER) REFERENCES FLIGHTS ( FLIGHT_ID, SEGMENT_NUMBER)
FLIGHTHISTORY---------------------------------------------
FLIGHT_ID CHAR(6), SEGMENT_NUMBER INTEGER, ORIG_AIRPORT CHAR(3), DEPART_TIME TIME, DEST_AIRPORT CHAR(3),…..
1 – M
1 – 1
SEVERAL CODE/DIMENSION TABLES---------------------------------------------
AIRLINES: AIRLINE INFORMATION (VERY STATIC)COUNTRIES : LIST OF COUNTRIES SERVED BY FLIGHTSCITIES: MAPS: PHOTOS OF REGIONS SERVED
23 Confidential
Scaling data with Partitioned tables
24 Confidential
Hash partitioning for linear scaling
Key Hashing provides single hop access to its partitionBut, what if the access is not based on the key … say, joins are involved
25 Confidential
Hash partitioning only goes so far
Consider this query :
Select * from flights, flightAvailability
where <equijoin flights with flightAvailability>
and flightId ='xxx';
If both tables are hash partitioned the join logic will need execution on all nodes where flightavailability data is stored
Distributed joins are expensive and inhibit scaling
• joins across distributed nodes could involve distributed locks and potentially a lot of intermediate data transfer across nodesEquiJOIN of rows across multiple nodes is not supported in SQLFire 1.0
26 Confidential
Partition aware DB design
• Designer thinks about how data maps to partitions
• The main idea is to:
minimize excessive data distribution by keeping the most frequently accessed and joined data collocated on partitions
• Read Pat Helland’s “Life beyond Distributed Transactions” and the Google MegaStore paper
27 Confidential
Partition aware DB design
• Turns out OLTP systems lend themselves well to this need• Typically it is the number of entities that grows over time and not the
size of the entity.
• Customer count perpetually grows, not the size of the customer info
• Most often access is very restricted and based on select entities
• given a FlightID, fetch flightAvailability records• given a customerID, add/remove orders, shipment records
• Identify partition key for “Entity Group”• "entity groups": set of entities across several related tables that can all
share a single identifier
• flightID is shared between the parent and child tables• CustomerID shared between customer, order and shipment
tables
28 Confidential
Partition aware DB design
Entity Groups
Table FlightAvailability partitioned by FlightID colocated with Flights
FlightID is the entity group Key
29 Confidential
Partition Aware DB design
STAR schema design is the norm in OLTP design
Fact tables (fast changing) are natural partitioning candidates
• Partition by: FlightID … Availability, history rows colocated with Flights
Dimension tables are natural replicated table candidates
• Replicate Airlines, Countries, Cities on all nodes
Dealing with Joins involving M-M relationships
• Can the one side of the M-M become a replicated table?
• If not, run the Join logic in a parallel stored procedure to minimize distribution
• Else, split the query into multiple queries in application
30 Confidential
Scaling Application logic with Parallel “Data Aware
procedures”
31 Confidential
Procedures
Java Stored Procedures may be created according to the SQL Standard
SQLFabric also supports the JDBC type Types.JAVA_OBJECT. A parameter of type JAVA_OBJECT supports an arbitrary Serializable Java object.
In this case, the procedure will be executed on the server to which a client is connected (or locally for Peer Clients)
CREATE PROCEDURE getOverBookedFlights
(IN argument OBJECT, OUT result OBJECT)
LANGUAGE JAVA PARAMETER STYLE JAVA
READS SQL DATA DYNAMIC RESULT SETS 1
EXTERNAL NAME com.acme.OverBookedFLights;
32 Confidential
Data Aware Procedures
Parallelize procedure and prune to nodes with required data
CALL [PROCEDURE]
procedure_name
( [ expression [, expression ]* ] )
[ WITH RESULT PROCESSOR processor_name ]
[ { ON TABLE table_name [ WHERE whereClause ] } |
{ ON {ALL | SERVER GROUPS (server_group_name [, server_group_name ]*) }}
]
Extend the procedure call with the following syntax:
Fabric Server 2Fabric Server 1
Client
Hint the data the procedure depends on
CALL getOverBookedFlights( <bind arguments>
ON TABLE FLIGHTAVAILABILITY
WHERE FLIGHTID = <SomeFLIGHTID> ;
If table is partitioned by columns in the where clause the procedure execution is pruned to nodes with the data (node with <someFLIGHTID> in this case)
33 Confidential
Parallelize procedure then aggregate (reduce)
CALL [PROCEDURE]
procedure_name
( [ expression [, expression ]* ] )
[ WITH RESULT PROCESSOR processor_name ]
[ { ON TABLE table_name [ WHERE whereClause ] } |
{ ON {ALL | SERVER GROUPS (server_group_name [, server_group_name ]*) }}
]
Fabric Server 2Fabric Server 1
Client
Fabric Server 3
CALL SQLF.CreateResultProcessor( processor_name, processor_class_name);
register a Java Result Processor (optional in some cases):
34 Confidential
Consistency model
35 Confidential
Consistency Model without Transactions
• Replication within cluster is always eager and synchronous
• Row updates are always atomic; No need to use transactions
• FIFO consistency: writes performed by a single thread are seen by all other processes in the order in which they were issued
• Consistency in Partitioned tables• a partitioned table row owned by one member at a point in time
• all updates are serialized to replicas through owner
• "Total ordering" at a row level: atomic and isolated
• Membership changes and consistency – need another hour
• Pessimistic concurrency support using ‘Select for update’
• Support for referential integrity
36 Confidential
Distributed Transactions
• Full support for distributed transactions (Single phase commit)
• Highly scalable without any centralized coordinator or lock manager
• We make some important assumptions• Most OLTP transactions are small in duration and sizeMost OLTP transactions are small in duration and size
• W-W conflicts are very rare in practiceW-W conflicts are very rare in practice
• How does it work?
• Each data node has a sub-coordinator to track TX state
• Eagerly acquire local “write” locks on each replica
• Object owned by a single primary at a point in time
• Fail fast if lock cannot be obtained
• Atomic and works with the cluster Failure detection system
• Isolated until commit
• Only support local isolation during commit
37 Confidential
Scaling disk access with shared nothing disk files and
a “journaling” store design
38 Confidential
Disk persistence in SQLF
Parallel log structured storage
Each partition writes in parallel
Backups write to disk also
• Increase reliability against h/w loss
MemoryTables
Append only Operation logs
OS Buffers
LOG Compressor
Record1
Record2
Record3
Record1
Record2
Record3
MemoryTables
Append only Operation logs
OS Buffers
LOG Compressor
Record1
Record2
Record3
Record1
Record2
Record3
• Don’t seek to disk
• Don’t flush all the way to disk
– Use OS scheduler to time write
• Do this on primary + secondary
• Realize very high throughput
39 Confidential
Performance benchmark
40 Confidential
How does it perform? Scale?
Scale from 2 to 10 servers (one per host)
Scale from 200 to 1200 simulated clients (10 hosts)
Single partitioned table: int PK, 40 fields (20 ints, 20 strings)
41 Confidential
How does it perform? Scale?
CPU% remained low per server – about 30% indicating many more clients could be handled
42 Confidential
Is latency low with scale?
Latency decreases with server capacity
50-70% take < 1 millisecond
About 90% take less than 2 milliseconds
43 Confidential
Q & A
VMWare vFabric SQLFire BETA available now
Checkout http://communities.vmware.com/community/vmtn/appplatform/vfabric_sqlfire
44 Confidential
Built using GemFire object data fabric + Derby
Storage – memory+disk, partitioning,
Replication, HA, events, Reliable distribution
JDBC
4.x
ADO.NET
GemFire CORE (from GFE) Simplifed Config
model
- Standard SQL DDL with extensions- Cluster wide
config
Query engine with Cost based optimizer; efficient tuple storage model,
skip list based indexing
Design focus: optimize for horizontally partitioned data models
- distributed scatter/gather- Rich SQL syntax
- read through- Write through
- parallel data-aware procedures
- write behind
QUERYING
FRAMEWORK for
Derby
NEW + Derby SQL façade on top of GFE framework
NEW
44