ssis etl within an hour · pdf filessis etl within an hour ... ssis naming conventions ......
TRANSCRIPT
SSIS ETL Within an Hour Alex Junk
Braden Smith
ITS – Business Intelligence Shared Service Center (BISSC)
What is ETL?
“Extract-Transform-Load”
Process of accessing and manipulating data from sources to load into Data Warehouse
Process can vary based on situation and goals
What is SSIS?
SQL Server Integration Services
Microsoft’s platform that can be used to build ETL solutions
Includes built in components for ETL
Control Flow
Task Execution Order, Containers, & Functionality
Data Flow
Sources, Transformations, Destinations
Extended with Task Factory components and others such as Attunity drivers
Task Factory Example - Fact Destination Table
UI Architecture
Can deploy to:
IOWASSISHOST1.iowa.uiowa.edu (2012 SSIS Server)
IOWASSISTEST3 & IOWASSISHOST3.iowa.uiowa.edu (2016 SSIS Server)
Secure Zone Servers:
IOWASSISHOST2.iowa.uiowa.edu (2012 SSIS Server)
IOWASSISTEST4 & IOWASSISHOST4.iowa.uiowa.edu (2016 SSIS Server)
*** TEST4 & HOST4 are still being provisioned at this time
SSIS Lifecycle
Initial Project Development:
Connect to and develop on REMOTEBI.iowa.uiowa.edu
Deploy the project to the Test SSIS server
Test and verify correct execution of the SSIS project
Promote the project to the Production SSIS server
Ongoing Project Development:
Re-factor or make changes to the project on REMOTEBI.iowa.uiowa.edu
Re-deploy the project to the Test SSIS server
Test and verify correct execution of revisions made to SSIS project
Re-deploy the project to the Production SSIS server
ETL in an Hour Demo
Install Visual Studio 2015
Visual Studio licensing
https://helpdesk.its.uiowa.edu/software/download/msselect/MS%20Select%20Products%205-19-2017.pdf
Install SQL Server Data Tools (SSDT)
https://docs.microsoft.com/en-us/sql/ssdt/download-sql-server-data-tools-ssdt
Setup GIT repository on the University BitBucket service
SSIS Naming Conventions
http://sqlblog.com/blogs/jamie_thomson/archive/2012/01/29/suggested-best-practises-and-naming-conventions.aspx
ETL in an Hour Resources
Visual Studio licensing
https://helpdesk.its.uiowa.edu/software/download/msselect/MS%20Select%20Products%205-19-2017.pdf
Access to REMOTEBI.iowa.uiowa.edu
Email [email protected] to request access
SQL Server Database Request Form (Workflow)
https://workflow.uiowa.edu/form/its-sql-server-database-request
BISSC
Alex Junk ([email protected])
Braden Smith ([email protected])
Creating Project
Open Visual Studio
File>New>Project…
Templates>Business Intelligence>Integration Services
Name: BIC_SSIS
Solution Name: BIC_Demo
* Make sure “Create Directory for Solution” is checked
Encrypting Solution
Right Click on BIC_SSIS>Properties to change settings
Change protection level to at minimum “Encrypt Sensitive Data with Password”
Choose a password and remember that for a later step
Configuration Properties
Navigate to “Configuration Properties”>”Target Server Version” within the properties pane on the right hand side
Select Appropriate SSIS deployment server version (2012, 2014, 2016, 2017)
Warning Message Prompts: Change Package Security Warning and Target Server warning (Click OK or Yes for both
SSIS Package Setup
Delete original Package.dtsx and recreate another one (in the Solution Explorer window, right click SSIS Packages, click on New SSIS Package)
Rename SSIS Package to something more descriptive (GL_Demo)
Creating a new SSIS package will inherit the encryption ProtectionLevel from the project
Create Connection Manager
Right-click in the “Connection Manager” Area, then select “New OLE DB Connection…”
Configure Connection Manager
Change Provider to Oracle Provider for OLE DB
Set Server or File Name to: DWTEST.WORLD
Use a specific user name and password for authentication (EX: Your_app_User)
Test Connection to verify connection is ok
Finish Connection Configuration
Rename connection to something more descriptive EX: Source_OracleDW
Convert connection to Project Connection (This allows SSIS packages to use the same connection)
Right Click Connection> “Convert to Project Connection”
Parameterize Connection
Right Click the connection, then select “Parameterize”
Create Parameters
As a guideline it is good to parameterize at least the following DB Connection Properties depending on authen:
Oracle DB Parameters: ServerName, UserName, Password
SQL DB Parameters: ServerName, InitialCatalog, UserName, Password
Repeat Process for each parameter needing to be parameterized
Parameter Location
These New Parameters will be located within the Project.Params location
Create a Data Flow Task
Drag and drop Data Flow Task from the SSIS Toolbox window onto the workspace area
Rename Data Flow Task to something more descriptive (DFT - Import GL data to SQL)
Double click on the Data Flow Task
Create Data Source
Create a data source by dragging and dropping the OLE DB Source item onto the workspace area
Rename OLE DB Source to something more descriptive (ORA_SRC – GL_DTL_DAILY)
Configure Data Source
Double click the ORA_SRC – GL_DTL_DAILY data source
Data access mode>SQL command
Add SQL Command Paste in SQL query for the data wanting to be retrieved (example below):
SELECT
GL_CORP_ELEM_CD,
GL_JRNL_ID,
GL_JRNL_EFFECT_DT,
GL_JRNL_SEQ_NBR,
GL_JRNL_LN_NBR,
ACCTG_PRD,
ACCTG_FISC_YR,
SUBLEDGER_ACCT,
SOURCE_CD,
GL_JRNL_DESC,
SSN,
CHECK_NBR,
VOUCH_NBR,
PO_NBR,
GL_JRNL_AMT,
GL_LOAD_DT,
SOURCE,
LEDGER
FROM
GL.GL_DTL_DAILY
View Columns
Click on the Columns on the left pane of the Source Editor to view the columns being loaded from the data source. Modifying the Output Column is optional and may be useful for some data loads.
Add Derived Column
Find Derived Column within the Common section of the SSIS Toolbox and drag it onto the workspace area.
Rename as appropriate for the column being created (DER – Concatenate columns together)
Configure Derived Column
Single click on ORA_SRC - GL_DTL_DAILY
Click and drag the blue arrow to DER – Concatenate columns together
Now the components are connected!
Add Sort
Drag and drop a Sort component onto the workspace area
Rename it SRT - Remove GL_DTL_DAILY duplicates
Single click on DER - Concatenate columns together
Click and drag the blue arrow to SRT - Remove GL_DTL_DAILY duplicates
Configure Sort
Double click on SRT - Remove GL_DTL_DAILY duplicates
Select all the Available Input Columns needing to be sorted (to remove duplicate rows, select all the columns that make up a unique identifier)
Click the checkbox next to Remove rows with duplicate sort values
Add Destination
Drag and drop an OLE DB Destination onto the workspace area
Rename it SS_DST - GL_DTL_DAILY
Single click on SRT - Remove GL_DTL_DAILY duplicates
Click and drag the blue arrow to SS_DST - GL_DTL_DAILY
Configure Destination Connection
Double click on SS_DST -GL_DTL_DAILY
Click on the OLE DB connection manager dropdown
Select the appropriate database destination
Configure Destination
Leave Data access mode as Table or view - fast load
Click to Name of the table or the view dropdown and select the appropriate destination table (GL_DTL_DAILY)
SSIS Create Table
*** If a table is not already setup, SSIS can create the table by clicking New… next to the Name of the table or the view dropdown
Make sure to modify the table name within the CREATE TABLE statement
For the CREATE TABLE statement to work correctly, the credentials used for the Connection Manager must have table create permissions within the database
A recommendation is to copy and paste the CREATE TABLE statement into SQL Server Management Studio (SSMS), DBVisualizer, or a text-editing program in order to add columns, such as an identity column, and constraints, such as a primary key
Also note that SSIS tries to map each column to the correct datatype, but sometimes this can be problematic (i.e. – Oracle timestamp columns)
SSIS Create Table…Cont.
Notepad++ (Changed table name to GL_DTL_DAILY and added an identity column GL_DTL_DAILY_ID
Added a primary key constraint at the end of the CREATE TABLE statement
When the CREATE TABLE statement is finished, copy and paste it back into the SSIS Create Table window
Click OK to execute the CREATE TABLE statement (this will create the table in the database)
Column Mappings
Click on Mappings to display the Available Input Columns, coming from the SSIS package and the Available Destination Columns, residing in the destination database
Make sure every Input Column needing to be stored in the destination database is mapped to the appropriate column
To map the columns, click and drag from one of the Available Input Columns to the appropriate Destination Column
Click OK
Congratulations! This first ETL is complete.
Deployment
Right click on SSIS project wanting to be deployed or re-deployed to SSIS server>Deploy
Click Next
Configure Deployment
Enter Server name if it isn’t populated already (IOWASSISTEST3.iowa.uiowa.edu)
Click Browse next to the Path textbox
Expand SSISDB and select your workgroup folder (BISSC)
Click OK
Advance Deployment
Click Next
Deploy
Click Deploy
Finish Deployment
Click Close when all the Results say Passed
Create Environment
Create Environments on the SSIS server
Connect to SSIS server using SSMS
Expand Integration Services Catalogs>SSISDB>WorkgroupCatalog
Right click Environments>Create Environments…
Name Environment
Enter Environment name (BIC_Demo - TEST)
Click OK
Create another environment (BIC_Demo - PROD)
Environmental Variables - Test
Right click on the test environment (BIC_Demo -TEST)
Click Properties
Select the Variables page (in the upper left hand corner)
*** Name the variables the same as what are in the SSIS Project Params
Click in the Name field to create the first environmental variable (Destination_SQL_DB_ServerName)
Select the Type of variable (String)
Place the value of the variable for the TEST environment (IOWASQLDMTEST1.iowa.uiowa.edu)
Repeat this process for all the TEST environment variables needed
Click OK when all the variables are created
Environmental Variables - PROD
Perform the same variable creation for the PROD environment (BIC_Demo -PROD)
Name the variables the same as what are in the TEST environment
Click in the Name field to create the first environmental variable (Destination_SQL_DB_ServerName)
Select the Type of variable (String) Place the value of the variable for the
TEST environment (IOWASQLDMHOST1.iowa.uiowa.edu)
Repeat this process for all the PROD environment variables needed
Linking Environmental variables to SSIS Projects
Expand Integration Services Catalogs>SSISDB>Workgroup Space (BISSC)>Projects
Right click on the SSIS project
Click Configure…
Linking Environmental Variables – Cont.
Click on References (upper left hand corner)
Click Add… to link the previously created environment to the SSIS project
Expand Local Folder
Click on TEST environment (BIC_Demo - TEST)
Click OK
Repeat to add PROD environment
Map Parameters
Switch back to Parameters (upper left hand corner)
There is a list of the Project Parameters which were created within the SSIS Project
Click on the ellipsis next to the Value of the first variable
Select Use environment variable radio button
Click the dropdown and select the environmental variable to use (they should be the same name)
Click OK Repeat this process for all the variables
listed
Complete Setup
Once completed, the Value column for all the Parameters should have an underlined value (this indicates that it is using an environmental variable)
Click OK
The SSIS Project is now setup to use environmental variables
Execute Using Environmental Variables
Expand Integration Services Catalogs>SSISDB>Workgroup Space (BISSC)>Projects>SSIS Project (BIC_SSIS)>Packages
Right click on the package needing to be executed (GL_Demo.dtsx)
Click Execute…
Select Environment
Click Environment checkbox
Click the dropdown and select the environment for the package execution (.\BIC_Demo - TEST)
Click OK
BI Architecture
SSIS Resources
Stairway to Integration Services (21 part tutorial)
http://www.sqlservercentral.com/stairway/72494/
Pluralsight series by Stacia Misner Varga
https://app.pluralsight.com/library/courses/ssis-basic
https://app.pluralsight.com/library/courses/ssis-advanced
https://app.pluralsight.com/library/courses/ssis-advanced-part2
How to get Pluralsight through the university
https://its.uiowa.edu/pluralsight
Roadmap
SSIS Scale Out
Tie Parameterized Variable to Visual Studio Configuration Settings
First, with the Project Parameters window open, click the Add Parameters to Configurations button.
Tie Parameterized Variable to Visual Studio Configuration Settings This opens the Manage Parameters Values window which lets you assign different values to each
configuration for each parameter. Once that is set up, as you change the active configuration in Visual Studio the appropriate values will be automatically assigned to the parameters. This is completely separate from the Environments that are set up in SQL Server, it’s just useful when running within Visual Studio so you can easily and automatically use different values depending on the active configuration.
Additional Resources
Visual Studio Licensing Information:
https://helpdesk.its.uiowa.edu/software/download/msselect/MS%20Select%20Products%205-19-2017.pdf
Request access to RemoteBI server (Workflow):
https://workflow.uiowa.edu/form/remotebi-access-request
RemoteBI server information:
https://its.uiowa.edu/support/article/110181
SQL server database request form (Workflow):
https://workflow.uiowa.edu/form/its-sql-server-database-request