cloudcamp chicago lightning talk "building warehousing systems on redshift" - tristan...

9
Redshift: Lessons Learned Tristan Crockett – Software Engineer, Edgeflip

Upload: cloudcamp-chicago

Post on 27-Jul-2015

45 views

Category:

Technology


2 download

TRANSCRIPT

Page 1: CloudCamp Chicago lightning talk      "Building warehousing systems on Redshift" - Tristan Crockett, Software Engineer at Edgeflip @thcrock

Redshift: Lessons Learned

Tristan Crockett – Software Engineer, Edgeflip

Page 2: CloudCamp Chicago lightning talk      "Building warehousing systems on Redshift" - Tristan Crockett, Software Engineer at Edgeflip @thcrock

Basics

● Analytical database● PostgreSQL with column storage engine● Automatic Data compression● No traditional indexes; specify a sort key (how

are records in the table sorted?) and distribution key (which node will house a record?)

Page 3: CloudCamp Chicago lightning talk      "Building warehousing systems on Redshift" - Tristan Crockett, Software Engineer at Edgeflip @thcrock

My Work with Redshift

● Data warehouse for Facebook user feeds and related app data

● Inputs– RDS (MySQL)

– DynamoDB

– Facebook

● Stats– ~2TB of compressed data

– Two main tables, ~5bil and ~25bil rows respectively

Page 4: CloudCamp Chicago lightning talk      "Building warehousing systems on Redshift" - Tristan Crockett, Software Engineer at Edgeflip @thcrock

Advantages / Disadvantages

● Fast at copying data in from S3● Fast at computing aggregate/analytical

functions over an entire table● Decent at intra-db operations (create table as

select, insert into select)● Most everything else is slow● Without traditional indexes, table design isn't as

flexible

Page 5: CloudCamp Chicago lightning talk      "Building warehousing systems on Redshift" - Tristan Crockett, Software Engineer at Edgeflip @thcrock

Lessons/Tips

● Optimize load size (1 MB to 1 GB per file)● Compress input● Upsert when needed, and always vacuum● Don't populate tables with 'CREATE TABLE AS'

if you like compression (which you do)● To avoid complicated joins, consider computing

single-table aggregates and join on the results

Page 6: CloudCamp Chicago lightning talk      "Building warehousing systems on Redshift" - Tristan Crockett, Software Engineer at Edgeflip @thcrock

Upsert

Page 7: CloudCamp Chicago lightning talk      "Building warehousing systems on Redshift" - Tristan Crockett, Software Engineer at Edgeflip @thcrock

Keep an Eye on Compression

Page 8: CloudCamp Chicago lightning talk      "Building warehousing systems on Redshift" - Tristan Crockett, Software Engineer at Edgeflip @thcrock

Single-Table Aggregates

Page 9: CloudCamp Chicago lightning talk      "Building warehousing systems on Redshift" - Tristan Crockett, Software Engineer at Edgeflip @thcrock

Thanks for Listening!

[email protected]

@thcrock