![Page 1: Building a data warehouse with AWS Redshift, Matillion and Yellowfin](https://reader033.vdocuments.us/reader033/viewer/2022042517/587eebb81a28ab17388b6dc9/html5/thumbnails/1.jpg)
Building a Data Warehouse on AWS
Amazon S3
Amazon Redshift
CollectCollect ProcessProcess AnalyzeAnalyzeStoreStore
Data Answers
Visualize
@Lynn Langit
![Page 2: Building a data warehouse with AWS Redshift, Matillion and Yellowfin](https://reader033.vdocuments.us/reader033/viewer/2022042517/587eebb81a28ab17388b6dc9/html5/thumbnails/2.jpg)
AWS MarketplaceEnterprise software store for business users who need simplified procurement
•2000+ product listings •to browse, test and buy software
•1-click deployment •to launch, in multiple regions around the world
•Pay-as-you-go pricing •to use on demand
Advanced Analytics
Data Enablement
Business Intelligence
![Page 3: Building a data warehouse with AWS Redshift, Matillion and Yellowfin](https://reader033.vdocuments.us/reader033/viewer/2022042517/587eebb81a28ab17388b6dc9/html5/thumbnails/3.jpg)
Building a Data Warehouse on AWS
Move data into Redshift from S3 for analysis
Amazon S3
Amazon Redshift
AWS Marketplace Partners
Matillion
Visualize
Yellowfin
CollectCollect ProcessProcess AnalyzeAnalyzeStoreStore
Data Answers
![Page 4: Building a data warehouse with AWS Redshift, Matillion and Yellowfin](https://reader033.vdocuments.us/reader033/viewer/2022042517/587eebb81a28ab17388b6dc9/html5/thumbnails/4.jpg)
Setup
![Page 5: Building a data warehouse with AWS Redshift, Matillion and Yellowfin](https://reader033.vdocuments.us/reader033/viewer/2022042517/587eebb81a28ab17388b6dc9/html5/thumbnails/5.jpg)
Our Scenario and Source FilesFile Types
-- Text - .csv-- Compressed - .gz
File CategoriesDetails / Events -- Flights
-- WeatherMetadata
-- Airports -- Carriers
“In this scenario we will use Matillion ETL for Redshift to prepare two separate data sources ready for analysis. The sample data is US airport flight information from 1995 -> 2008. Every flight to or from a US airport (and whether it left on time or not) is included.
The second data set is weather data, taken from NOAA, including the daily weather readings for each US Airport.”
![Page 6: Building a data warehouse with AWS Redshift, Matillion and Yellowfin](https://reader033.vdocuments.us/reader033/viewer/2022042517/587eebb81a28ab17388b6dc9/html5/thumbnails/6.jpg)
Loading data from S3 in to Redshift
![Page 7: Building a data warehouse with AWS Redshift, Matillion and Yellowfin](https://reader033.vdocuments.us/reader033/viewer/2022042517/587eebb81a28ab17388b6dc9/html5/thumbnails/7.jpg)
Using Matillion ETL for Redshift• Create Instance (AMI/EC2) of Matillion/AWS Marketplace• Connect Matillion to Redshift
![Page 8: Building a data warehouse with AWS Redshift, Matillion and Yellowfin](https://reader033.vdocuments.us/reader033/viewer/2022042517/587eebb81a28ab17388b6dc9/html5/thumbnails/8.jpg)
Loading Data in Redshift
![Page 9: Building a data warehouse with AWS Redshift, Matillion and Yellowfin](https://reader033.vdocuments.us/reader033/viewer/2022042517/587eebb81a28ab17388b6dc9/html5/thumbnails/9.jpg)
Table distribution stylesDistribution Key All
Node 1
Slice 1
Slice 2
Node 2
Slice 3
Slice 4
Node 1
Slice 1
Slice 2
Node 2
Slice 3
Slice 4
key1
key2
key3
key4
All data on every nodeSame key to same location
Node 1
Slice 1
Slice 2
Node 2
Slice 3
Slice 4
EvenRound robin distribution
![Page 10: Building a data warehouse with AWS Redshift, Matillion and Yellowfin](https://reader033.vdocuments.us/reader033/viewer/2022042517/587eebb81a28ab17388b6dc9/html5/thumbnails/10.jpg)
Sort Keys• Single Column - [ SORTKEY ( date ) ]
• Queries that use 1st column (i.e. date) as primary filter
• Compound - [ SORTKEY COMPOUND ( date, region, country) ]
• Queries that use 1st column as primary filter, then other columns
• Interleaved - [ SORTKEY INTERLEAVED ( date, region, country) ] • Queries that use different columns in filter
![Page 11: Building a data warehouse with AWS Redshift, Matillion and Yellowfin](https://reader033.vdocuments.us/reader033/viewer/2022042517/587eebb81a28ab17388b6dc9/html5/thumbnails/11.jpg)
Time Series Data – Vacuum Operation
Unsorte
dRegion
Sorte
dRegio
n Sorte
d
Sorte
d
Sorte
d
Append in Sort Key Order
Sort Unsorted Region
Merge
![Page 12: Building a data warehouse with AWS Redshift, Matillion and Yellowfin](https://reader033.vdocuments.us/reader033/viewer/2022042517/587eebb81a28ab17388b6dc9/html5/thumbnails/12.jpg)
Visualizing with Yellowfin
![Page 13: Building a data warehouse with AWS Redshift, Matillion and Yellowfin](https://reader033.vdocuments.us/reader033/viewer/2022042517/587eebb81a28ab17388b6dc9/html5/thumbnails/13.jpg)
Automate – https://github.com/lynnlangit/AWSDataWarehouse