sql server to redshift data load using ssis

28
SQL Server To Redshift Data Load Using SSIS Reach for the Clouds, Inc. Next Generation SSIS Tasks and Connectors Series AUTHOR: NAYAN PATEL | SR. ETL SSIS ARCHITECT [email protected]

Upload: marc-leinbach

Post on 13-Apr-2017

183 views

Category:

Software


0 download

TRANSCRIPT

Page 1: SQL Server to Redshift Data Load Using SSIS

SQL Server To Redshift Data Load Using SSISReach for the Clouds, Inc.Next Generation SSIS Tasks and Connectors Series

AUTHOR: NAYAN PATEL | SR. ETL SSIS ARCHITECTN PAT E L @ R FTC LO U D S . C O M

Page 2: SQL Server to Redshift Data Load Using SSIS

Content• Introduction – SQL Server to Redshift Load • Video Tutorial – Redshift Data Load • Right way but hard way • Steps for Amazon Redshift Data Load from On-Premise files or RDBMS (e.g. MySQL, SQL

Server)• Doing it easy way • Should I use SSIS to load Redshift

• Setup your Amazon Redshift Cluster • Add inbound rule for Redshift Cluster • Automate Redshift Cluster Creation • Create Sample table and data in Source – (in this example SQL Server) • Create Sample table in Amazon Redshift • SQL Server to Redshift Data Load using SSIS • Conclusion - Related Links

Page 3: SQL Server to Redshift Data Load Using SSIS

Introduction – SQL Server to Redshift Load• Before we talk data load from SQL Server to Redshift

using SSIS lets talk what is Amazon Redshift (or sometimes referred to as AWS Redshift). Amazon Redshift is a Cloud based Data warehouse service. This type of system also referred as MPP (Massively Parallel Processing). Amazon Redshift uses highly modified version of PostGrey SQL Engine behind the scene. Amazon Redshift provides advantage of Scale as you go, at very low cost compared to onsite dedicated hardware/software approach.

Page 4: SQL Server to Redshift Data Load Using SSIS

Right way but hard way

• If you are reading some of the guidelines published by Amazon regarding Redshift Data load then you will quickly realize that there is a lot to do under the cover to get it going right way. Here are few steps you will have to perform while loading data to Redshift from your On-Premise server (Data can be sitting in files or Relational source).

Page 5: SQL Server to Redshift Data Load Using SSIS

Right way but hard waySteps for Amazon Redshift Data Load from On-Premise files or RDBMS (e.g. MySQL, SQL Server)• Export local RDBMS data to flat files (Make sure you remove invalid characters, apply

escape sequence during export)• Split files into 10-15 MB each to get optimal performance during upload and final Data

load• Compress files to *.gz format so you don’t end up with $1000 surprise bill :) .. In my case

Text files were compressed 10-20 times• List all file names to manifest file so when you issue COPY command to Redshift its

treated as one unit of load• Upload manifest file to Amazon S3 bucket• Upload local *.gz files to Amazon S3 bucket• Issue Redshift COPY command with different options• Schedule file archiving from on-premises and S3 Staging area on AWS• Capturing Errors, setting up restart ability if something fails

Page 6: SQL Server to Redshift Data Load Using SSIS

Doing it the easy way

• So if you are not sure you ready to code many steps listed above then you can use Amazon Redshift Data Transfer Task.• In next few sections we will describe how to setup your

Redshift Cluster for Demo purpose and load Data from SQL Server to Redshift using SSIS.

Page 7: SQL Server to Redshift Data Load Using SSIS

Doing it the easy wayShould I use SSIS to load Redshift• If you are curious which approach to use to load data then consider few

facts• Do you have existing ETL processes written in SSIS?• Do you need more visual approach and better work flow management

(what SSIS Provides)?• Do you need connection string encryption and other goodies offered by

SSIS such as native logging, passing parameters from SSIS environment• Do you have expertise available for SSIS in-house or you better stay with

command line scripts?• Do you have need to create workflow which can run on any server where

SSIS is not installed?

Page 9: SQL Server to Redshift Data Load Using SSIS

Setup your Amazon Redshift Cluster

4. ON NODE CONFIGURATION PAGE SPECIFY NODE TYPE (THIS IS VM TYPE), CLUSTER TYPE AND NUMBER OF NODE. IF YOU ARE TRYING UNDER FREE TIRE THEN SELECT SMALLEST NODE POSSIBLE (IN THIS CASE IT WAS DW2.LARGE). CLICK CONTINUE TO GO TO NEXT PAGE

Page 10: SQL Server to Redshift Data Load Using SSIS

Setup your Amazon Redshift Cluster

5. ON ADDITIONAL CONFIGURATION PAGE YOU CAN PICK VPC (VIRTUAL PRIVATE CONNECTION), SECURITY GROUP FOR CLUSTER AND OTHER OPTIONS FOR ENCRYPTION. FOR DEMO PURPOSE SELECT AS BELOW SCREENSHOT . CLICK CONTINUE TO REVIEW YOUR SETTINGS AND CLICK CREATE CLUSTER

Page 11: SQL Server to Redshift Data Load Using SSIS

Setup your Amazon Redshift Cluster

6. GIVE IT FEW MINS WHILE YOUR CLUSTER IS BEING CREATED. AFTER FEW MINUTES (5-10 MINS) YOU CAN GO BACK TO SAME PAGE AND REVIEW CLUSTER STATUS AND OTHER PROPERTIES AS BELOW. COPY CLUSTER ENDPOINT TO SOMEWHERE BECAUSE WE WILL NEED IT LATER.

Page 12: SQL Server to Redshift Data Load Using SSIS

Add inbound rule for Redshift ClusterN O T E : S K I P T H I S S T E P I F YO U H AV E A L R E A D Y A D D E D YO U R I P T O I N B O U N D E XC L U S I O N R U L E . BY D E FA U LT YO U C A N N O T C O N N E C T T O A M A Z O N R E D S H I FT C LU S T E R F R O M O U T S I D E AW S N E T W O R K ( E . G . F R O M YO U R O N - P R E M I S E S M AC H I N E ) . I F YO U W I S H T O C O N N E C T T H E N YO U M U S T A D D I N BO U N D E XC E P T I O N R U L E T O A L LO W YO U R R E Q U E S T T O R E D S H I FT C LU S T E R O N S P E C I F I C P O RT.T O A D D C R E AT E N E W I N BO U N D R U L E P E R F O R M F O L LO W I N G S T E P S

1. UNDER REDSHIFT HOME PAGE CLICK [SECURITY] TAB. YOU MAY SEE FOLLOWING NOTICE DEPENDING ON WHICH REGION YOU ARE. CLICK ON [GO TO THE EC2 CONSOLE] LINK OR YOU CAN DIRECT GO TO EC2 BY CLICKING SERVICES -> EC2 MENU AT THE TOP

Page 13: SQL Server to Redshift Data Load Using SSIS

Add inbound rule for Redshift Cluster

2. ON EC2 SECURITY GROUPS PAGE SELECT SECURITY GROUP ATTACHED WITH YOUR REDSHIFT CLUSTER AND THEN IN THE BOTTOM PANE CLICK ON INBOUND TAB

3. ON INBOUND TAB CLICK EDIT OPTION TO MODIFY DEFAULT ENTRY OR YOU CAN ADD NEW RULE

4. CLICK ON ADD RULE IF YOU WISH TO ADD NEW ENTRY ELSE EDIT AS BELOW AND CLICK SAVE

Page 14: SQL Server to Redshift Data Load Using SSIS

Automate Redshift Cluster CreationIf you have need to automate Redshift Cluster Creation or any of the following things automatically then check Redshift Cluster management Task• Automate Amazon Redshift Cluster Create Action in few clicks. You can also

add Access Security Rule.• Automate Amazon Redshift Cluster Delete Action• Fetch Amazon Redshift Cluster Property to SSIS Variable (e.g. Fetch Cluster

Status)• Fetch all cluster and their properties as DataTable (Use ForEach Loop and

iterate through all clusters)• Automate Redshift Cluster Snapshot Creation• Automate Redshift Cluster Snapshot Delete Action• Support for Wait until Cluster operation is done

Page 15: SQL Server to Redshift Data Load Using SSIS

Create Sample table and data in Source – (in this example SQL Server)

Note: Skip this step if you wish to use your own table. If you do so please ignore certain steps and screenshots mentioned in this article.

For this demo we will use Free Northwind sample database supplied by Microsoft.• Download Sample Database from here.• Extract the zip file -> Open *.sql file and run it to create new database with sample tables and data.

Page 16: SQL Server to Redshift Data Load Using SSIS

Create Sample table in Amazon Redshift

4. DOUBLE CLICK ON THE TASK TO SEE UI.

5.CLICK ON [NEW] CONNECTION.

6. CONFIGURE REDSHIFT CONNECTION PROPERTIES AND CLICK TEST.

Page 17: SQL Server to Redshift Data Load Using SSIS

Create Sample table in Amazon Redshift

7. TEST CONNECTION IS SUCCESSFUL THEN CLICK OK TO SAVE CONNECTION DETAIL.

8. ENTER FOLLOWING SCRIPT IN THE SQL TEXTBOX AND HIT OK TO SAVE IT.

Page 18: SQL Server to Redshift Data Load Using SSIS

Create Sample table in Amazon Redshift

9. NOW RIGHT CLICK ON THE TASK AND EXECUTE. THIS SHOULD CREATE NEW TABLE IN REDSHIFT.

Page 19: SQL Server to Redshift Data Load Using SSIS

SQL Server to Redshift Data Load using SSIS

Once table is created now lets do real work to get data moving from SQL Server to Amazon Redshift. Perform the following steps to configure SSIS Amazon Redshift Data Transfer Task1. Drag Amazon Redshift Data Transfer Task on the SSIS designer surface.2. Double click on the task to edit properties.3. Select Action: In the top Action drop down select Bulk Import to Redshift from any RDBMS (e.g. MySQL, Oracle, SQL Server) option4. Configure Source: On the Source tab click [New] next to connection dropdown and configure Source connection or pick existing connection. In our case we are extracting data from SQl Server database (Northwind) on local server.Enter the following SQL Query to extract 100,000 rows from SQL Server

Page 20: SQL Server to Redshift Data Load Using SSIS

Create Sample table in Amazon Redshift

5. CONFIGURE SOURCE STAGING AREA: ON THE SOURCE TAB YOU HAVE TO ENTER FOLDER LOCATION WHERE STAGING FILES WILL BE SAVED BEFORE WE UPLOAD TO REDSHIFT (SEE ABOVE SCREEN).

Page 21: SQL Server to Redshift Data Load Using SSIS

Create Sample table in Amazon Redshift

6. CONFIGURE TARGET: ON TARGET TAB SELECT EXISTING REDSHIFT CONNECTION MANAGER (OR CREATE NEW), SELECT TARGET TABLE FROM THE DROPDOWN WHERE YOU WANT TO LOAD DATA. IF YOU HAVE LONG LIST OF TABLES THEN SIMPLY ENTER SCHEMA NAME IN THE SCHEMA FILTER TEXT BOX AND CLICK REFRESH TO RELOAD TABLE DROPDOWN WITH FEWER ITEMS.

Page 22: SQL Server to Redshift Data Load Using SSIS

Create Sample table in Amazon Redshift

7. CONFIGURE RELOAD OPTION AND TARGET STAGING AREA: ON TARGET TAB CHECK TRUNCATE TARGET TABLE OPTION IF YOU WANT TO RELOAD EACH TIME EXECUTE THIS TASK ELSE LEAVE IT UNCHECKED TO APPEND RECORDS. WE ALSO HAVE TO SPECIFY AMAZON S3 STAGING AREAS WHERE REDSHIFT WILL LOOK FOR FILES TO LOAD.

Page 23: SQL Server to Redshift Data Load Using SSIS

Create Sample table in Amazon Redshift

8. CONFIGURE FILE FORMAT: WE ARE GOING TO GENERATE CSV FILES FOR REDSHIFT LOAD SO MAKE SURE YOU SELECT CORRECT COLUMN DELIMITER. ALSO MAKE SURE YOU CHECK ALWAYS COMPRESS FILE OPTION TO REDUCE BANDWIDTH.

Page 24: SQL Server to Redshift Data Load Using SSIS

Create Sample table in Amazon Redshift

9. CONFIGURE ARCHIVE OPTIONS: ON ARCHIVE TAB WE CAN SPECIFY HOW TO ARCHIVE SOURCE AND TARGET FILES WE GENERATED. SOURCE FILES ARE CSV FILES AND SOURCE STAGE FILES ARE *.GZ FILES (IF YOU SELECT COMPRESSION). TARGET STAGE FILES ARE EITHER CSV OR *.GZ FILES.BY DEFAULT SOURCE CSV FILES ARE KEPT AND ALL OTHER STAGE FILES ARE DELETED. SEE BELOW SCREENSHOT

Page 25: SQL Server to Redshift Data Load Using SSIS

Create Sample table in Amazon Redshift

10. CONFIGURE ADVANCED OPTIONS: ON ADVANCED OPTIONS TAB YOU FINE TUNE LOAD PROCESS SUCH AS HOW TO HANDLE NULL DATA, HOW TO HANDLE DATA TRUNCATION ETC. READ HELP FILE FOR MORE INFO

Page 26: SQL Server to Redshift Data Load Using SSIS

Create Sample table in Amazon Redshift

11. CONFIGURE ERROR HANDLING OPTIONS: ON ERROR HANDLING TAB YOU CAN SPECIFY HOW MANY ERRORS YOU WANT TO IGNORE BEFORE FAILING ENTIRE LOAD. YOU CAN ALSO REPLACE SOME INVALID CHARACTERS DURING YOUR IF YOU CHECK [ALLOW INVALID CHARACTERS] OPTION.

Page 27: SQL Server to Redshift Data Load Using SSIS

Create Sample table in Amazon Redshift

12. NOW FINALLY WE READY TO EXECUTE OUR SSIS PACKAGE. ONCE ITS DONE YOU CAN REVIEW LOG. HERE IS THE SAMPLE EXECUTION LOG .

Page 28: SQL Server to Redshift Data Load Using SSIS

ConclusionSo in this article we outlined different steps needed to load data into Redshift from relational source (e.g. MySQL, SQL Server, Oracle). Redshift is a great way to offload your expensive data warehouse to cloud so you don’t have to worry about costly maintenance and future growth. With redshift you can grow your data size from Gigabyte to Petabyte. SSIS Amazon Redshift Data Transfer Task. can give you an easy way to maintain your Redshift data transfer process with ease of use and fast load options (for full or incremental load).

Again this was just proof of concept but we encourage you to do your own benchmarking and research see which approach suites best for your need.• Related Links:• SSIS Amazon Redshift Data Transfer Task

TAGS: amazon redshift Amazon Redshift Data Transfer Task aws command line csv excel export How-To json mysql PDF Redshift SSIS SSIS PowerPack