@ andy_pavlo

21
@andy_pavlo @andy_pavlo Automatic Database Automatic Database Partitioning in Partitioning in Parallel Parallel OLTP OLTP Systems Systems SIGMO SIGMO May 22 May 22 nd nd , 201 , 201

Upload: hedda

Post on 14-Jan-2016

31 views

Category:

Documents


0 download

DESCRIPTION

Automatic Database Partitioning in Parallel OLTP Systems. @ andy_pavlo. SIGMOD May 22 nd , 2012. Main Memory • Parallel • Shared-Nothing Transaction Processing. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: @ andy_pavlo

@andy_pavlo@andy_pavlo

Automatic Automatic Database Database

Partitioning in Partitioning in

Parallel Parallel OLTP OLTP

SystemsSystemsSIGMODSIGMOD

May 22May 22ndnd, 2012, 2012

Page 2: @ andy_pavlo

2

Page 3: @ andy_pavlo

3

Page 4: @ andy_pavlo

4

Page 5: @ andy_pavlo

5

Page 6: @ andy_pavlo

Main Memory • Parallel • Shared-NothingTransaction Processing

H-Store: A High-Performance, DistributedMain Memory Transaction Processing SystemProc. VLDB Endow., vol. 1, iss. 2, pp. 1496-1499, 2008.

Page 7: @ andy_pavlo

7

ClientApplication

Database Cluster

Procedure NameInput

Parameters

Transaction

Execution

Database Cluster

Transaction

Result

Page 8: @ andy_pavlo

TPC-C NewOrder

8

Page 9: @ andy_pavlo

9

Page 10: @ andy_pavlo

10

Page 11: @ andy_pavlo

Automatic Database Design Toolfor Parallel Systems

Skew-Aware Automatic Database Partitioningin Shared-Nothing, Parallel OLTP SystemsSIGMOD 2012

Page 12: @ andy_pavlo

CUSTOCUSTOMERMERORDEORDERSRS

ITEMITEM

CUSTOCUSTOMERMERORDEORDERSRS

ITEMITEM

CUSTOCUSTOMERMERORDEORDERSRS

ITEMITEM

CUSTCUSTOMEROMERORDEORDE

RSRSITEMITEM

Schema

Workload

-------

-------

-------

-------

-------

-------

DDL

DDL

SELECT * FROM WAREHOUSE WHERE W_ID = 10;

INSERT INTO ORDERS (O_W_ID, O_D_ID, O_C_ID) VALUES (10, 9, 12345);

SELECT * FROM WAREHOUSE WHERE W_ID = 10;

INSERT INTO ORDERS (O_W_ID, O_D_ID, O_C_ID) VALUES (10, 9, 12345);

SELECT * FROM WAREHOUSE WHERE W_ID = 10;

INSERT INTO ORDERS (O_W_ID, O_D_ID, O_C_ID) VALUES (10, 9, 12345);

SELECT * FROM WAREHOUSE WHERE W_ID = 10;

INSERT INTO ORDERS (O_W_ID, O_D_ID, O_C_ID) VALUES (10, 9, 12345);

SELECT * FROM WAREHOUSE WHERE W_ID = 10;

SELECT * FROM DISTRICT D_W_ID = 10 AND D_ID =9;

INSERT INTO ORDERS (O_W_ID, O_D_ID, O_C_ID) VALUES (10, 9, 12345);

SELECT * FROM WAREHOUSE WHERE W_ID = 10;

SELECT * FROM DISTRICT D_W_ID = 10 AND D_ID =9;

INSERT INTO ORDERS (O_W_ID, O_D_ID, O_C_ID) VALUES (10, 9, 12345);

SELECT * FROM WAREHOUSE WHERE W_ID = 10;

SELECT * FROM DISTRICT WHERE D_W_ID = 10 AND D_ID =9;

INSERT INTO ORDERS (O_W_ID, O_D_ID, O_C_ID,…) VALUES(10, 9, 12345,…);

SELECT * FROM WAREHOUSE WHERE W_ID = 10;

SELECT * FROM DISTRICT WHERE D_W_ID = 10 AND D_ID =9;

INSERT INTO ORDERS (O_W_ID, O_D_ID, O_C_ID,…) VALUES(10, 9, 12345,…);

NewOrdNewOrd

ererNewOrdNewOrd

ererDDLDDLCUSTOM

ERCUSTOM

ERORDERSORDERS

ITEMITEM

12

Page 13: @ andy_pavlo

o_id o_c_id

o_w_id

78703 1004 5 -

78704 1002 3 -

78705 1006 7 -

78706 1005 6 -

78707 1005 6 -

78708 1003 12 -

c_id c_w_id

c_last

1001 5 RZA -

1002 3 GZA -

1003 12 Raekwon

-

1004 5 Deck -

1005 6 Killah -

1006 7 ODB -

CUSTOMER ORDERS

CUSTOCUSTOMERMERORDERORDER

SS

CUSTOCUSTOMERMERORDERORDER

SS

CUSTOCUSTOMERMERORDERORDER

SS

ITEMi_id i_na

mei_price

603514

XXX 23.99 -

267923

XXX 19.99 -

475386

XXX 14.99 -

578945

XXX 9.98 -

476348

XXX 103.49

-

784285

XXX 69.99 -

ITEMITEM ITEMITEM ITEMITEM

CUSTOMERc_id c_w_i

dc_las

t…

1001 5 RZA -

1002 3 GZA -

1003 12 Raekwon

-

1004 5 Deck -

1005 6 Killah -

1006 7 ODB -

13

Page 14: @ andy_pavlo

CUSTOCUSTOMERMERORDERORDER

SS

CUSTOCUSTOMERMERORDERORDER

SS

CUSTOCUSTOMERMERORDERORDER

SSITEMITEM ITEMITEM ITEMITEM

Client Application

NewOrder(5, “Method Man”,

1234)

14

Page 15: @ andy_pavlo

Best DesignInput

Workload

-------

-------

-------

-------

-------

-------

Schema

DDL

DDL

Initial DesignRelaxationLocal Search

Restart

Large-Neighborhood

Search

15

Page 16: @ andy_pavlo

DistributedTransactions

WorkloadSkew Factor+

Cost Model

Page 17: @ andy_pavlo

Algorithm Comparison

(cost estimate)lower is better

TATP TPC-CTPC-C Skewed17

HorticultureState-of-the-Art

Page 18: @ andy_pavlo

+88% +16% +183%

HorticultureState-of-the-Art

Throughput

TATP TPC-CTPC-C Skewed18

(txn/sec)higher is better

Page 19: @ andy_pavlo

19

Page 20: @ andy_pavlo

Conclusion:Dating scene is still difficult.But partitioning your database is now easier.

Page 21: @ andy_pavlo

http://github.com/apavlo/h-storehttp://github.com/apavlo/h-storehttp://hstore.cs.brown.eduhttp://hstore.cs.brown.edu