Amazon Redshift - meetupfiles.meetup.com/4035202/AmazonRedshiftMeetup.pdf
TRANSCRIPT
![Page 2: Amazon Redshift - Meetupfiles.meetup.com/4035202/AmazonRedshiftMeetup.pdf · Amazon Redshift runs on optimized hardware HS1.8XL: 128 GB RAM, 16 Cores, 24 Spindles, 16 TB compressed](https://reader033.vdocuments.us/reader033/viewer/2022060405/5f0f320f7e708231d442f59a/html5/thumbnails/2.jpg)
What is a Data Warehouse?
• Large data volumes (TB to PB)
• Queries are complex and I/O intensive
• Data typically loaded in batches
• Integrates with Business Intelligence tools for reporting and analysis
DW - Existing AWS landscape
Scale Out
Fully SQL Compatible
Optimized data import & export
Efficient Aggregates & Joins
Local storage
No single point of failure
RDS X X
DynamoDB X X X
EMR/Hadoop X X ½ X
Redshift X X X X X X
Introducing Amazon Redshift
• Fully managed database service
• Built from the ground up for DW
• Secure & Reliable – Fault tolerant, automatic backup, encryption
• Fast – Scale out, specialized hardware, columnar storage
• Inexpensive – 1/10th the cost of alternatives, pay as you go
• Easy to Use – Provision & resize with a few clicks
• Compatible – JDBC/ODBC, mostly PostgreSQL compatible
Why did we call it Amazon Redshift?
Edwin Hubble 1889 – 1953
>> How much storage is provisioned by Redshift customers?
>> How many Redshift clusters were created in the first 10 weeks?
Amazon Redshift architecture
• Leader Node
  – SQL endpoint
  – Stores metadata
  – Coordinates query execution
• Compute Nodes
  – Local, columnar storage
  – Execute queries in parallel
  – Load, backup, restore via Amazon S3
  – Parallel load from Amazon DynamoDB
• Single node version available
[Diagram: SQL clients/BI tools connect to the leader node over JDBC/ODBC. The leader node and three compute nodes (each 128 GB RAM, 16 TB disk, 16 cores) are linked by a 10 GigE (HPC) network. Ingestion, backup and restore go through Amazon S3.]
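As a rough illustration of the leader/compute split described above, the sketch below simulates the leader node fanning an aggregate out to the compute nodes and then combining their partial results. It is a minimal Python sketch built on assumptions: threads stand in for compute nodes, `compute_node_partial_sum` and `leader_node_total` are hypothetical names, and real query execution in Redshift is considerably more involved.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_node_partial_sum(amounts: list[int]) -> int:
    """Work done locally by one (simulated) compute node: aggregate its own slice."""
    return sum(amounts)


def leader_node_total(node_slices: list[list[int]]) -> int:
    """Simulated leader node: send the aggregate to every compute node in
    parallel, then combine the partial results into the final answer."""
    with ThreadPoolExecutor(max_workers=max(1, len(node_slices))) as pool:
        partials = list(pool.map(compute_node_partial_sum, node_slices))
    return sum(partials)


# Amount column from the example table, spread over three simulated compute nodes.
slices = [[500, 250], [125], [375]]
print(leader_node_total(slices))  # 1250
```

A `SUM` decomposes cleanly because partial sums can simply be added. An aggregate such as `AVG` would instead need each node to return a (sum, count) pair.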
Amazon Redshift dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• Large data block sizes
| ID | Age | State | Amount |
|-----|-----|-------|--------|
| 123 | 20 | CA | 500 |
| 345 | 25 | WA | 250 |
| 678 | 40 | FL | 125 |
| 957 | 37 | WA | 375 |
• With row storage, you do unnecessary I/O
• To get the total amount, you have to read every column of every row
Amazon Redshift dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• Large data block sizes
| ID | Age | State | Amount |
|-----|-----|-------|--------|
| 123 | 20 | CA | 500 |
| 345 | 25 | WA | 250 |
| 678 | 40 | FL | 125 |
| 957 | 37 | WA | 375 |
• With column storage, you only read the data you need
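The row-versus-column difference can be modelled by counting how many stored values each layout has to read to compute the total `Amount` from the example table. This simplified Python sketch counts values rather than bytes or disk blocks. The helper names are hypothetical.

```python
# The four-row example table: (ID, Age, State, Amount).
ROWS = [
    (123, 20, "CA", 500),
    (345, 25, "WA", 250),
    (678, 40, "FL", 125),
    (957, 37, "WA", 375),
]
COLUMNS = ("ID", "Age", "State", "Amount")


def to_row_store(rows):
    """Row layout: all fields of one record sit next to each other."""
    return [value for row in rows for value in row]


def to_column_store(rows):
    """Column layout: all values of one column sit next to each other."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def total_amount_row_store(storage):
    """Model: whole rows must be read, so every stored value counts as read."""
    values_read = len(storage)
    amount_pos = COLUMNS.index("Amount")
    total = sum(storage[amount_pos::len(COLUMNS)])
    return total, values_read


def total_amount_column_store(storage):
    """Model: only the Amount column is read."""
    amounts = storage["Amount"]
    return sum(amounts), len(amounts)


print(total_amount_row_store(to_row_store(ROWS)))        # (1250, 16)
print(total_amount_column_store(to_column_store(ROWS)))  # (1250, 4)
```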
Amazon Redshift dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• Large data block sizes
• Columnar compression saves space & reduces I/O
• Amazon Redshift analyzes and compresses your data
```
analyze compression listing;

 Table   | Column         | Encoding
---------+----------------+----------
 listing | listid         | delta
 listing | sellerid       | delta32k
 listing | eventid        | delta32k
 listing | dateid         | bytedict
 listing | numtickets     | bytedict
 listing | priceperticket | delta32k
 listing | totalprice     | mostly32
 listing | listtime       | raw
```
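To show what two of the listed encodings do, here is a conceptual Python sketch. Delta encoding stores the differences between consecutive values. Byte-dictionary encoding replaces each value with a one-byte index into a table of distinct values. The sketch uses simple in-memory lists with hypothetical sample values. It does not model Redshift's actual on-disk formats, per-block dictionaries or size limits.

```python
def delta_encode(values: list[int]) -> list[int]:
    """Keep the first value, then only the difference from the previous one."""
    if not values:
        return []
    return [values[0]] + [cur - prev for prev, cur in zip(values, values[1:])]


def delta_decode(encoded: list[int]) -> list[int]:
    """Rebuild the original values by running-summing the differences."""
    out: list[int] = []
    for i, diff in enumerate(encoded):
        out.append(diff if i == 0 else out[-1] + diff)
    return out


def bytedict_encode(values):
    """Replace each value with a one-byte index into a dictionary of its
    distinct values (so at most 256 distinct values in this sketch)."""
    dictionary, index_of, codes = [], {}, []
    for v in values:
        if v not in index_of:
            if len(dictionary) == 256:
                raise ValueError("more than 256 distinct values do not fit in one byte")
            index_of[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index_of[v])
    return dictionary, bytes(codes)


def bytedict_decode(dictionary, codes: bytes):
    """Look each one-byte code back up in the dictionary."""
    return [dictionary[c] for c in codes]


listids = [1001, 1002, 1003, 1005]  # hypothetical, steadily increasing IDs
print(delta_encode(listids))  # [1001, 1, 1, 2]

numtickets = [2, 4, 2, 2, 4, 1]  # hypothetical, few distinct values
dictionary, codes = bytedict_encode(numtickets)
print(dictionary, list(codes))  # [2, 4, 1] [0, 1, 0, 0, 1, 2]
```

Small, regular differences (such as sequential IDs) fit in far fewer bits than the raw values, which is why a delta-style encoding suits such columns. Low-cardinality columns suit a dictionary.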
Amazon Redshift dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• Large data block sizes
• Keep track of the minimum and maximum value for each block
• Skip over blocks that don’t contain the data needed for a given query
• Minimize unnecessary I/O
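Zone maps can be sketched as a per-block (min, max) pair that lets a range scan skip blocks entirely. This minimal Python sketch assumes a single integer column split into fixed-size blocks. `Block`, `build_blocks` and `scan_between` are hypothetical names, not a Redshift API.

```python
from dataclasses import dataclass


@dataclass
class Block:
    values: list[int]
    min_value: int  # zone map entry: smallest value in the block
    max_value: int  # zone map entry: largest value in the block


def build_blocks(column: list[int], block_size: int) -> list[Block]:
    """Split a column into fixed-size blocks and record min/max for each."""
    blocks = []
    for start in range(0, len(column), block_size):
        chunk = column[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def scan_between(blocks: list[Block], low: int, high: int):
    """Return values in [low, high] and the number of blocks actually read."""
    matches, blocks_read = [], 0
    for block in blocks:
        if block.max_value < low or block.min_value > high:
            continue  # the zone map proves nothing in this block can match
        blocks_read += 1
        matches.extend(v for v in block.values if low <= v <= high)
    return matches, blocks_read


sorted_column = list(range(1, 101))  # 100 values, already in sort order
blocks = build_blocks(sorted_column, block_size=10)
print(scan_between(blocks, 42, 47))  # ([42, 43, 44, 45, 46, 47], 1)
```

Skipping only pays off when nearby rows hold nearby values, which is why zone maps work best on data stored in sort order. If every block contains both very small and very large values, each block's range covers the predicate and every block still gets read.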
Amazon Redshift dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• Large data block sizes
• Use direct-attached storage to maximize throughput
• Hardware optimized for high performance data processing
• Large block sizes to make the most of each read
• Amazon Redshift manages durability for you
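The benefit of large blocks comes largely from issuing fewer I/O requests. The arithmetic below compares scanning 1 GiB of column data with 8 KB pages and with 1 MB blocks. The 8 KB figure is PostgreSQL's default page size, used only for contrast. The 1 MB figure is the block size Redshift documents for its columnar storage. This is a back-of-the-envelope sketch that ignores compression and caching.

```python
def blocks_to_scan(column_bytes: int, block_bytes: int) -> int:
    """Number of block reads needed to scan a column of the given size (ceiling division)."""
    return (column_bytes + block_bytes - 1) // block_bytes


ONE_GIB = 1024 ** 3
print(blocks_to_scan(ONE_GIB, 8 * 1024))     # 131072 reads with 8 KB pages
print(blocks_to_scan(ONE_GIB, 1024 * 1024))  # 1024 reads with 1 MB blocks
```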
Amazon Redshift runs on optimized hardware
• HS1.8XL: 128 GB RAM, 16 cores, 24 spindles, 16 TB compressed user storage, 2 GB/sec scan rate
• HS1.XL: 16 GB RAM, 2 cores, 3 spindles, 2 TB compressed customer storage
• Optimized for I/O intensive workloads
• High disk density
• Runs on a fast HPC network
• HS1.8XL available on Amazon EC2
• Need to leverage all the nodes
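The quoted 2 GB/sec scan rate also shows why a workload needs to leverage all the nodes. The sketch below estimates the ideal scan time when data is spread evenly, and what happens when one node holds most of the data. It assumes perfectly linear scaling and a constant per-node rate, which real clusters only approximate. The function names are hypothetical.

```python
HS1_8XL_SCAN_GB_PER_SEC = 2.0  # per-node scan rate quoted on the hardware slide


def ideal_scan_seconds(data_gb: float, nodes: int,
                       rate_gb_per_sec: float = HS1_8XL_SCAN_GB_PER_SEC) -> float:
    """Time to scan data spread evenly over `nodes` nodes scanning in parallel."""
    if nodes < 1:
        raise ValueError("need at least one node")
    return (data_gb / nodes) / rate_gb_per_sec


def skewed_scan_seconds(gb_per_node: list[float],
                        rate_gb_per_sec: float = HS1_8XL_SCAN_GB_PER_SEC) -> float:
    """With uneven data, the query waits for the node holding the most data."""
    return max(gb_per_node) / rate_gb_per_sec


print(ideal_scan_seconds(1000, 1))                # 500.0
print(ideal_scan_seconds(1000, 4))                # 125.0
print(skewed_scan_seconds([700, 100, 100, 100]))  # 350.0
```

Four nodes holding 1000 GB evenly finish in a quarter of the single-node time. The same 1000 GB with 700 GB on one node takes almost three times as long as the even split.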
Amazon Redshift parallelizes and distributes everything
• Query
• Load
• Backup/Restore
• Resize
Amazon Redshift parallelizes and distributes everything
• Load in parallel from Amazon S3 or Amazon DynamoDB
• Data automatically distributed and sorted according to DDL
• Scales linearly with number of nodes
[Diagram: data loaded in parallel from Amazon S3/DynamoDB into three compute nodes (each 128 GB RAM, 16 TB disk, 16 cores)]
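Here is a minimal sketch of "distributed and sorted according to DDL". Each row is routed to a node by hashing its distribution key, and each node keeps its rows ordered by the sort key. In Redshift DDL these roles are played by the `DISTKEY` and `SORTKEY` table attributes. The following are assumptions for illustration only: the CRC32 hash, placement at the node level (rather than per slice), and the sample rows.

```python
import zlib
from collections import defaultdict


def node_for(key, num_nodes: int) -> int:
    """Deterministic (assumed) hash of the distribution key, mapped onto a node."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def distribute_and_sort(rows: list[dict], dist_key: str, sort_key: str, num_nodes: int):
    """Send each row to the node chosen by hashing its distribution key,
    then keep every node's rows ordered by the sort key."""
    per_node = defaultdict(list)
    for row in rows:
        per_node[node_for(row[dist_key], num_nodes)].append(row)
    return {node: sorted(node_rows, key=lambda r: r[sort_key])
            for node, node_rows in per_node.items()}


listings = [  # hypothetical sample rows
    {"sellerid": 7, "listtime": "2008-03-02"},
    {"sellerid": 3, "listtime": "2008-01-15"},
    {"sellerid": 7, "listtime": "2008-01-09"},
    {"sellerid": 12, "listtime": "2008-02-20"},
]
for node, node_rows in sorted(distribute_and_sort(listings, "sellerid", "listtime", 3).items()):
    print(node, node_rows)
```

Rows with the same distribution key always land on the same node, so joins and aggregates on that key can run locally without moving data between nodes.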
Amazon Redshift parallelizes and distributes everything
• Backups to Amazon S3 are automatic, continuous and incremental
• Configurable system snapshot retention period
• Take user snapshots on-demand
• Streaming restores enable you to resume querying faster
[Diagram: three compute nodes (each 128 GB RAM, 16 TB disk, 16 cores) backing up to and restoring from Amazon S3]
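Incremental backup can be sketched as uploading only the blocks whose content has not been stored before, together with a manifest that records which blocks make up each snapshot. This Python sketch uses SHA-256 content fingerprints and an in-memory dict in place of an Amazon S3 bucket. Both are assumptions, since the slide does not describe Redshift's internal backup format.

```python
import hashlib


def fingerprint(block: bytes) -> str:
    """Content hash used to recognise blocks that were already backed up."""
    return hashlib.sha256(block).hexdigest()


def incremental_backup(blocks: list[bytes], stored: dict[str, bytes]):
    """Upload only blocks not already in the backup store, and return the
    manifest (ordered fingerprints) plus how many blocks were uploaded."""
    manifest = [fingerprint(b) for b in blocks]
    uploaded = 0
    for fp, block in zip(manifest, blocks):
        if fp not in stored:
            stored[fp] = block  # stands in for an upload to object storage
            uploaded += 1
    return manifest, uploaded


def restore(manifest: list[str], stored: dict[str, bytes]) -> list[bytes]:
    """Rebuild a snapshot's blocks, in order, from its manifest."""
    return [stored[fp] for fp in manifest]


store: dict[str, bytes] = {}  # stands in for the S3 bucket
day1 = [b"block-a", b"block-b", b"block-c"]
m1, n1 = incremental_backup(day1, store)
day2 = [b"block-a", b"block-B-changed", b"block-c"]
m2, n2 = incremental_backup(day2, store)
print(n1, n2)  # 3 1
assert restore(m1, store) == day1 and restore(m2, store) == day2
```

Both snapshots stay fully restorable even though the second one uploaded only the single changed block.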
Amazon Redshift parallelizes and distributes everything
• Resize while remaining online
• Provision a new cluster in the background
• Copy data in parallel from node to node
• You are only charged for the source cluster during the resize
[Diagram: SQL clients/BI tools stay connected to the source cluster (leader node plus three compute nodes) while a new cluster (leader node plus four compute nodes) is provisioned. Each node has 128 GB RAM, 48 TB disk and 16 cores.]
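During a resize, every row has to land on the correct node of the new cluster. The sketch below counts how many rows each source node sends to each target node when rows are re-hashed from three nodes to four. It shows why the copy can proceed in parallel: every source node works through its own rows independently. The CRC32-modulo placement and the key values are hypothetical, since the slide does not describe Redshift's actual redistribution logic.

```python
import zlib
from collections import Counter


def node_for(key, num_nodes: int) -> int:
    """Assumed placement rule: hash of the key modulo the node count."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def resize_copy_plan(keys, old_nodes: int, new_nodes: int) -> Counter:
    """Count rows each (source node, target node) pair must copy when every
    row is re-hashed onto the new cluster. Each source node can stream its
    own rows independently, which is what allows the copy to run in parallel."""
    plan = Counter()
    for key in keys:
        plan[(node_for(key, old_nodes), node_for(key, new_nodes))] += 1
    return plan


keys = range(10_000)  # hypothetical distribution-key values
plan = resize_copy_plan(keys, old_nodes=3, new_nodes=4)
for (src, dst), rows in sorted(plan.items()):
    print(f"node {src} -> new node {dst}: {rows} rows")
```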
Amazon Redshift parallelizes and distributes everything
[Diagram: SQL clients/BI tools now connected to the new cluster (leader node plus four compute nodes, each 128 GB RAM, 48 TB disk, 16 cores)]
• Automatic SQL endpoint switchover via DNS
• Decommission the source cluster
• Simple operation via AWS Console or API
Amazon Redshift lets you start small and grow big
• Extra Large Node (HS1.XL): 3 spindles, 2 TB, 16 GB RAM, 2 cores
  – Single node (2 TB)
  – Cluster of 2-32 nodes (4 TB – 64 TB)
• Eight Extra Large Node (HS1.8XL): 24 spindles, 16 TB, 128 GB RAM, 16 cores, 10 GigE
  – Cluster of 2-100 nodes (32 TB – 1.6 PB)
[Diagram: grids of XL and 8XL node icons illustrating the cluster sizes above]
Note: Nodes not to scale
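The storage ranges on the slide follow from multiplying per-node storage by the node count. This small Python helper reproduces that arithmetic using only the figures listed on the slide. `cluster_storage_tb` is a hypothetical name, and the limits reflect this presentation rather than current Redshift offerings.

```python
# Per-node compressed storage and cluster size limits as listed on the slide.
NODE_STORAGE_TB = {"HS1.XL": 2, "HS1.8XL": 16}
CLUSTER_SIZE_LIMITS = {"HS1.XL": (2, 32), "HS1.8XL": (2, 100)}


def cluster_storage_tb(node_type: str, nodes: int) -> int:
    """Total storage for a configuration; raises on sizes the slide doesn't list."""
    if node_type not in NODE_STORAGE_TB:
        raise ValueError(f"unknown node type: {node_type}")
    if nodes == 1:
        if node_type != "HS1.XL":
            raise ValueError("the slide lists a single-node option only for HS1.XL")
        return NODE_STORAGE_TB[node_type]
    low, high = CLUSTER_SIZE_LIMITS[node_type]
    if not low <= nodes <= high:
        raise ValueError(f"{node_type} clusters range from {low} to {high} nodes")
    return NODE_STORAGE_TB[node_type] * nodes


print(cluster_storage_tb("HS1.XL", 1))     # 2
print(cluster_storage_tb("HS1.XL", 32))    # 64
print(cluster_storage_tb("HS1.8XL", 2))    # 32
print(cluster_storage_tb("HS1.8XL", 100))  # 1600, i.e. 1.6 PB
```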