(BDT309) Delivering Results with Amazon Redshift, One Petabyte at a Time | AWS re:Invent 2014

Post on 02-Jul-2015

DESCRIPTION

The Amazon Enterprise Data Warehouse team, responsible for data warehousing across all of Amazon's divisions, spent 2014 working with Amazon Redshift on its largest datasets, including web log traffic. The key goal of this project was to provide a viable, enterprise-grade solution that enabled full scans of 2 trillion rows in under an hour at load. Key to success was the automation of routine DW tasks that become complicated at scale: backfilling erroneous data, recalculating statistics, re-sorting daily additions, and so forth. In this session, we discuss the scale and performance of a 100-node, 1 PB Amazon Redshift cluster and describe some of the technical aspects and best practices of running 100-node clusters in an enterprise environment.

TRANSCRIPT

November 12, 2014 | Las Vegas, NV

Erik Selberg (selberg@amazon.com)

Samar Sodhi (samars@amazon.com)

selberg@amazon.com

Use Case                               Goal        Benchmark
Scan 2.25 trillion rows (15 months)    60 min      14 min
Load 5 billion rows (1 day)            60 min      10 min
Load 150 billion rows (30 days)        24 hours    9.75 hours
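As a rough reading of the benchmark figures above, the following sketch converts each result into an average row rate. It is plain arithmetic on the numbers in the slide; the helper name `rows_per_second` and the dictionary keys are illustrative and do not come from the talk.

```python
def rows_per_second(rows: float, minutes: float) -> float:
    """Average rows processed per second over a run of the given length."""
    return rows / (minutes * 60)


# (row count, benchmark duration in minutes), taken from the slide figures
benchmarks = {
    "scan 2.25 trillion rows": (2.25e12, 14),
    "load 5 billion rows": (5e9, 10),
    "load 150 billion rows": (150e9, 9.75 * 60),
}

rates = {
    name: rows_per_second(rows, minutes)
    for name, (rows, minutes) in benchmarks.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:,.0f} rows/s")
```

On these numbers the scan averages roughly 2.7 billion rows per second across the cluster, the daily load about 8.3 million rows per second, and the 30-day load about 4.3 million rows per second.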

samars@amazon.com

– VACUUM is slow, physical partitions do not exist

• Doesn’t allow for parallel loads into the same table

• 15 concurrent queries

– “Bad” queries can impact the entire cluster

2x

– COMPUPDATE (samples the data) – fast but not optimal
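For context, COMPUPDATE is an option of the Redshift COPY command that controls whether compression encodings are chosen automatically from a sample of the incoming data. The sketch below only assembles such a statement as a string. The helper `build_copy_statement`, the table name, the S3 path, and the IAM role are all hypothetical placeholders, not details from the talk, and the string interpolation is for illustration rather than production use.

```python
def build_copy_statement(table: str, s3_path: str, iam_role: str,
                         compupdate: bool = True) -> str:
    """Return a Redshift COPY statement with COMPUPDATE switched on or off."""
    flag = "ON" if compupdate else "OFF"
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"COMPUPDATE {flag};"
    )


# Placeholder values, for illustration only
sql = build_copy_statement(
    table="weblogs_staging",
    s3_path="s3://example-bucket/weblogs/2014-11-12/",
    iam_role="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
print(sql)
```

Passing `compupdate=False` would skip the sampling step, which may suit tables whose encodings have already been set explicitly.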

FASTER                   86.35%
    GREATER THAN 15X     14.91%
    10X TO 15X           18.42%
    5X TO 10X            25.73%
    3X TO 5X             19.88%
    2X TO 3X              7.02%
    1X TO 2X              3.80%
SAME                      8.47%
SLOWER                    5.65%
    1X TO 2X              1.75%

FASTER                   14.85%
    3X TO 5X              0.56%
    2X TO 3X              3.64%
    1X TO 2X             10.64%
SAME                     19.05%
SLOWER                   66.11%
    1X TO 2X             18.49%
    2X TO 3X              8.96%
    3X TO 5X              9.8%
    5X TO 10X            10.08%
    10X TO 15X            5.04%
    SLOWER THAN 15X      13.73%

or

30 min

48 hours

48 hours

Daily (6B)            40 8XL nodes    100 8XL nodes
Vacuum                80 min          30 min
Stats Collection      90 sec          50 sec

Monthly (150B)        40 8XL nodes    100 8XL nodes
Vacuum (Deep Copy)    380 min         201 min
Stats Collection      22 min          4 min
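One way to read the 40-node versus 100-node maintenance timings above is to compare each speedup with the 2.5x increase in node count. The sketch below does only that arithmetic; the task labels are paraphrased from the slide, and the comparison against the node ratio is an interpretation rather than a claim made in the talk.

```python
NODE_RATIO = 100 / 40  # 2.5x more nodes in the larger cluster

# (time on 40 nodes, time on 100 nodes), same unit within each pair
timings = {
    "daily vacuum (min)": (80, 30),
    "daily stats collection (sec)": (90, 50),
    "monthly vacuum, deep copy (min)": (380, 201),
    "monthly stats collection (min)": (22, 4),
}

speedups = {task: t40 / t100 for task, (t40, t100) in timings.items()}

for task, speedup in speedups.items():
    scaling = "better than" if speedup > NODE_RATIO else "below"
    print(f"{task}: {speedup:.2f}x faster, {scaling} the {NODE_RATIO:.1f}x node increase")
```

By this measure the daily vacuum (about 2.67x) and the monthly stats collection (5.5x) improve by more than the node ratio, while the daily stats collection (1.8x) and the monthly deep copy (about 1.89x) improve by less.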

http://bit.ly/awsevals
