mcse business intelligence certification - courseware · 31/07/2012 1 . 7/31/2012 . 1 ©2007...
TRANSCRIPT
Microsoft MCSE Business Intelligence Certification
Courseware Version 1.1
www.firebrandtraining.com
31/07/2012
1
7/31/2012 1 ©2007 – Body Temple 7/31/2012
1
Part 1: Business Intelligence
MSCE SQL Server 2012 BI
Designing Business Intelligence
with Microsoft SQL Server 2012
7/31/2012 2 ©2007 – Body Temple 7/31/2012
2
EQUIPPING THE ORGANIZATION FOR EFFECTIVE DECISION MAKING
• Effective Decision Making
• Keys to Effective Decision Making
• Business intelligence
31/07/2012
2
7/31/2012 3 ©2007 – Body Temple 7/31/2012
3
Who is a Decision Maker
Decision makers must be throughout the organization
• Effective plans and policies created at the top can be undone
by poor decision making at lower levels
• Good decisions made by those at the bottom can be quickly
overwhelmed by poor decisions made up the line
Effective decision making at every level leads to
success
7/31/2012 4 ©2007 – Body Temple 7/31/2012
4
What is an Effective Decision?
Effective decisions are choices that move an
organization closer to an agreed-on set of goals in a
timely manner.
Key ingredients necessary for making effective
decisions:
• There must be a set of goals to work toward.
• There must be a way to measure whether a chosen course is
moving toward or away from those goals.
• Information based on those measures must be provided to the
decision maker in a timely manner.
31/07/2012
3
7/31/2012 5 ©2007 – Body Temple 7/31/2012
5
Goals
Goals must be specific and should avoid being vague,
as in:
• “superior customer satisfaction” how to measure?
• “increased profit margin” what costs are impacted?
• “better meeting our community’s needs” what needs?
To function as part of effective decision making, a
goal must:
• Contain a specific target.
• Provide a means to measure whether
we are progressing toward that target.
7/31/2012 6 ©2007 – Body Temple 7/31/2012
6
Is Your Map Upside-Down?
Need a method of navigation to:
• Determine whether we are heading towards or away from the
goal
• Measure the steps by which we are moving towards the goal
31/07/2012
4
7/31/2012 7 ©2007 – Body Temple 7/31/2012
7
Business Intelligence and Microsoft SQL Server 2012
Business Intelligence is not simply a set of facts and
figures on a printed report
For the report to be an effective business intelligence
tool it needs to be:
• Formatted in a way that is easily understood
• Has concise summaries of relevant data
• Delivered in a timely fashion
7/31/2012 8 ©2007 – Body Temple 7/31/2012
8
SQL Server 2012 Editions for BI
Capabilities Enterprise Business
Intelligence
Standard
Basic corporate BI • • • Basic Data Integration • • • Self-Service BI • •
Advanced Corporate BI • •
Enterprise Data Management • •
Advanced Data Integration •
Data Warehousing •
Advanced High Availability •
31/07/2012
5
7/31/2012 9 ©2007 – Body Temple 7/31/2012
9
Microsoft BI Technologies
SQL Server
Analysis Services
SQL Server Database Engine
Microsoft SQL Server Integration Services
SQL Server Master Data Services
1011000110
SQL Server Data Quality
Services
Microsoft SQL Azure
and the Windows Azure
Marketplace
Microsoft SharePoint
Server
Microsoft PowerPivot
Technologies
Microsoft Excel • Data Mining Add-In
• PowerPivot Add-In • MDS Add-In
Power View
SQL Server
Reporting Services
Reports, KPIs, and Dashboards
EIM
& D
ata
Ware
housi
ng
Busi
ness
Inte
llig
ence
7/31/2012 10 ©2007 – Body Temple 7/31/2012
10
MAKING THE MOST OF WHAT YOU’VE GOT – USING BUSINESS
INTELLIGENCE
• What Business intelligence Can Do For You
• Business Intelligence at Many Levels
• Building the Foundation
31/07/2012
6
7/31/2012 11 ©2007 – Body Temple 7/31/2012
11
When We Know What We Are Looking For
Two approaches:
• Layout-led Discovery
• When we know the questions we want answered and where to
find the information needed
• The most common form of business intelligence and one we are
all familiar with
• Data-led Discovery
• We know the question, but we don’t know where to look for the
answer
• The information we find determines where we want to go next
7/31/2012 12 ©2007 – Body Temple 7/31/2012
12
Discovering New Questions and Their Answers
Data may hold answers to questions we have not
though to ask
Data may contain trends, correlations, and
dependencies at a level of detail that would be
impossible for a human being to notice using either
the layout-led or data-led discovery
• Discovery requires a computer to use data mining techniques
• Works at the lowest level of detail
• Uses advanced mathematical algorithms
31/07/2012
7
7/31/2012 13 ©2007 – Body Temple 7/31/2012
13
The Top of the Pyramid
Decision makers at the upper
levels of our organizations must
look at the big picture
They deal with the long-term
policies and direction
They need:
• Highly summarized measures
• Higher Latency
7/31/2012 14 ©2007 – Body Temple 7/31/2012
14
Mid-Level
Mid-level decision makers are managing the operation
of departments and other working units within the
organization
They set short-term goals
Do the planning for the functioning of these areas
They need:
• Summarized measures with drilldown
• Some latency acceptable
31/07/2012
8
7/31/2012 15 ©2007 – Body Temple 7/31/2012
15
The Broad Base
The firepersons, managers, and group leaders
Deal with daily operations
Set daily goals
Make decisions on resource allocation for the next week, the next day, or the next shift
Plan the next sales campaign or the next sales call
They need:
• Measures at the detail level
• Low latency
7/31/2012 16 ©2007 – Body Temple 7/31/2012
16
SEEKING THE SOURCE – THE SOURCE OF BUSINESS INTELLIGENCE
• Seeking the Source
• The Data Mart
• Snowflakes, Stars, and Analysis Services
31/07/2012
9
7/31/2012 17 ©2007 – Body Temple 7/31/2012
17
Transactional Data
Transactional data is the information stored to track
the interactions, or business transactions, carried out
by an organization
Online transaction processing (OLTP) systems record
business interactions as they happen. They support
the day-to-day operation of an organization
7/31/2012 18 ©2007 – Body Temple 7/31/2012
18
Difficulties Using Transactional Data for Business Intelligence
OLTP systems are the treasure chests holding the raw
data we need to calculate measures and create
business intelligence
Well-designed OLTP systems are optimized for
efficiently processing and storing transactions
• Normalized data
BI is concerned with aggregates
OLTP systems are usually not good at delivering large
aggregates
31/07/2012
10
7/31/2012 19 ©2007 – Body Temple 7/31/2012
19
The Data Mart
A data mart is a body of historical data in an
electronic repository that does not participate in the
daily operations of the organization. Instead, this
data is used to create business intelligence. The data
in the data mart usually applies to a specific area of
the organization.
• Data is available for our business intelligence needs
somewhere outside of our OLTP systems
7/31/2012 20 ©2007 – Body Temple 7/31/2012
20
Features of a Data Mart
Built for speed of access
Data is de-normalized (repeated) requiring fewer table joins for data retrieval
Design pattern is organized around “facts”, knows as stars and snowflakes schemas
Data Mart Characteristics:
• No Real-Time Data
• Data is copied from the OLTP systems periodically and written to the data mart.
• Consolidated
• Data from different OLTP systems are consolidated into a single mart.
• Cleansed
• Inconsistencies and errors are removed from transactional data so it has the consistency necessary for use in a data mart.
31/07/2012
11
7/31/2012 21 ©2007 – Body Temple 7/31/2012
21
Data Mart Structure
The data we use for business intelligence can be divided into four categories:
• Measures
• A measure is a numeric quantity expressing some aspect of the organization’s performance. The information represented by this quantity is used to support or evaluate the decision making and performance of the organization. A measure can also be called a fact.
• Dimensions
• A dimension is a categorization used to spread out an aggregate measure to reveal its constituent parts.
• Attributes
• An attribute is an additional piece of information pertaining to a dimension member that is not the unique identifier or the description of the member.
• Hierarchies
• A hierarchy is a structure made up of two or more levels of related dimensions. A dimension at an upper level of the hierarchy completely contains one or more dimensions from the next lower level of the hierarchy.
7/31/2012 22 ©2007 – Body Temple 7/31/2012
22
Data Mart Structure Example (1 of 2)
A measure of total sales as a single point of
information:
By applying categorization or a dimension to that
single point of data, we can spread it out, for example
for each year
31/07/2012
12
7/31/2012 23 ©2007 – Body Temple 7/31/2012
23
Data Mart Structure Example (2 of 2)
Next we can spread the total sales for each product
type
If we were to spread it further out by sales region, the
measure becomes a cube
7/31/2012 24 ©2007 – Body Temple 7/31/2012
24
The Star Schema
All attributes are directly related to the key attribute,
which enables users to browse the facts in the cube
based on any attribute hierarchy in the dimension
31/07/2012
13
7/31/2012 25 ©2007 – Body Temple 7/31/2012
25
The Snow Flakes Schema
An attribute is either directly linked to the key
attribute if their underlying table is directly linked to
the fact table, or is indirectly linked by means of the
attribute that is bound to the key in the underlying
table that links the snowflake table to the directly
linked table
7/31/2012 26 ©2007 – Body Temple 7/31/2012
26
ONE-STOP SHOPPING – THE UNIFIED DIMENSIONAL MODEL
• Online Analytical Processing
• The Unified Dimension Model
• Tools of the Trade
31/07/2012
14
7/31/2012 27 ©2007 – Body Temple 7/31/2012
27
Online Analytical Processing
Online analytical processing (OLAP) systems enable
users to quickly and easily retrieve information from
data, usually in a data mart, for analysis.
OLAP systems present data using measures,
dimensions, hierarchies, and cubes.
7/31/2012 28 ©2007 – Body Temple 7/31/2012
28
Building OLAP - Out of Cubes
A cube is a structure that contains a value for one or
more measures for each unique combination of the
members of all its dimensions.
• These are detail, or leaf-level, values.
The cube also contains aggregated values formed by
the dimension hierarchies or when one or more of the
dimensions is left out of the hierarchy.
An aggregate is a value formed by combining values
from a given dimension or set of dimensions to create
a single value.
31/07/2012
15
7/31/2012 29 ©2007 – Body Temple 7/31/2012
29
Features of an OLAP System
Multidimensional Database
• Structures around measures, dimensions, hierarchies, and cubes rather than tables, rows, columns and relations.
Preprocessed Aggregates
• OLAP systems preprocess a portion of the aggregates that are found throughout the cube.
• The preprocessing is done as part of the background task that loads or updates the data in the OLAP database.
Easily Understood
• If designed properly, dimensions and hierarchies should match the structure of the organization.
7/31/2012 30 ©2007 – Body Temple 7/31/2012
30
Architecture
ROLAP – Relational OLAP
• Stores the cube structure in a multidimensional database.
• The leaf-level measures are left in the relational data mart.
• The preprocessed aggregates are also stored in a relational database table.
MOLAP – Multidimensional OLAP
• Stores the cube structure in a multidimensional database.
• Both the preprocessed aggregate values and a copy of the leaf-level values are placed in the multidimensional database as well
HOLAP – Hybrid OLAP
• Combines ROLAP and MOLAP
• Stores the cube structure and the preprocessed aggregates in a multidimensional database.
• Leaves the leaf-level data in the relational data mart that serves as the source of the cube.
31/07/2012
16
7/31/2012 31 ©2007 – Body Temple 7/31/2012
31
Disadvantages
Complexity to Develop and Administer
• To maintain and easy to use environment for the end user, a certain amount of complexity is shifted to the development and administrative tasks of the system.
Data Mart Required
• Using either a star or a snow flake layout.
Latency
• Data needs to be migrated from the OLTP systems to the data mart.
Read-Only
• Not a disadvantage but could be problematic when changes to the data are necessary to project certain results.
7/31/2012 32 ©2007 – Body Temple 7/31/2012
32
The Unified Dimensional Model
UDM introduced with SQL Server 2005
UDM is designed to provide all the benefits of an OLAP
system with multidimensional storage and
preprocessed aggregates, while avoiding a number of
the drawbacks of more traditional OLAP systems
31/07/2012
17
7/31/2012 33 ©2007 – Body Temple 7/31/2012
33
UDM Structure
UDM is a structure that sits on top of a data mart and
looks exactly like an OLAP system to an end user.
• Does not require a data mart.
• Can be built over one or more OLTP systems.
• Can be built over both a data mart and OLTP systems data.
• Can include data from other vendors databases and XML.
A UDM can have one or more data sources.
UDM utilizes data views to determine which tables and
fields to use from the data source.
7/31/2012 34 ©2007 – Body Temple 7/31/2012
34
UDM Proactive Caching
UDM uses proactive caching technology to obtain the performance advantages of traditional OLAP systems.
The cache is created when needed and changed when the underlying data or the underlying structure changes.
• Items are created in the cache before they have been requested by the user.
• The UDM monitors the data in the data source. As the data is modified, the UDM updates its structures.
Proactive cache can be built using MOLAP, ROLAP, or HOLAP.
31/07/2012
18
7/31/2012 35 ©2007 – Body Temple 7/31/2012
35
UDM Advantages
OLAP Built on Transactional Data
Extremely Low Latency
Ease of Creation and Maintenance
Design Versioning with Source Control
7/31/2012 36 ©2007 – Body Temple 7/31/2012
36
Creating Analysis Solutions with SQL Server 2012
• SQL Server Data Tools
Multidimensional models
Tabular models
Data mining models
• Microsoft Excel
PowerPivot tabular models
Data mining models
31/07/2012
19
7/31/2012 37 ©2007 – Body Temple 7/31/2012
37
Creating Reporting Solutions with SQL Server 2012
• Project-Based Development with source control
• Sophisticated Design Environment
• Rich Design Capabilities
• ClickOnce Installation
• Reusable Report Elements
• Flexible Layout
• Interactive data Visualization in the Browser
• Drag and Drop from Existing Data Model
Auth
ori
ng
Delivery
BI Developer IT Pro Power User Information Worker
Interactive Subscriptions Data Alerts
Report Designer Report Builder Power View
31/07/2012
1
7/31/2012 1 ©2007 – Body Temple 7/31/2012
1
Part 2: Business Intelligence
MSCE SQL Server 2012 BI
Designing Business Intelligence
with Microsoft SQL Server 2012
7/31/2012 2 ©2007 – Body Temple 7/31/2012
2
BUILDING FOUNDATIONS – CREATING DATA MARTS
• Data Mart
• Designing a data Mart
• Table Compression
31/07/2012
2
7/31/2012 3 ©2007 – Body Temple 7/31/2012
3
Who Needs a Data Mart Anyway?
Even with the UDM, situations still exist where a data mart may be the best choice as a source for business intelligence data
• Legacy Databases
• Some databases may not have an appropriate OLE DB provider
• Data from Non-database Source
• Data would need to be imported into a data mart before it can be utilized by a UDM
• No Physical Connection
• No full-time connection available to the data
• Dirty data
• Data requires cleaning before it can used as a source
7/31/2012 4 ©2007 – Body Temple 7/31/2012
4
Designing a Data Mart
A data mart is made up of measures and dimensions
organized in hierarchies and attributes
Design must take into consideration:
• Identifying the information that our decision makers need
• Reconcile the information with the available data in the
OLTP systems
• Organize the data into the data mart components
31/07/2012
3
7/31/2012 5 ©2007 – Body Temple 7/31/2012
5
Decision maker’s Needs
Decision makers need to be involved in the design process:
• Decision makers are the ones in the trenches
• Decision makers ultimately determine the success or failure of a
project
Questions that need to be answered by decision makers:
• What facts, figures, statistics, and so forth do you need for
effective decision making? (foundation and feedback measures)
• How should this information be sliced and diced for analysis?
(dimensions)
• What additional information can aid in finding exactly what is needed? (attributes)
7/31/2012 6 ©2007 – Body Temple 7/31/2012
6
Available Data
Reality check: is the data available in the OLTP
systems?
If the data is not available, can we get that
information from another data source?
31/07/2012
4
7/31/2012 7 ©2007 – Body Temple 7/31/2012
7
Data Mart Structures
The structures, measure, dimensions, hierarchies, and
attributes, will lead us to the star or snow flake
schema that will define our data mart
The next set of slides will discuss the process by which
to address the design of each structure
7/31/2012 8 ©2007 – Body Temple 7/31/2012
8
Measures
The measures are the foundation and feedback information our decision makers require.
Reconcile the requirements with what is available in the OLTP data to come up with a list of measures.
Examples of numeric data that can be used as measures: Monetary Amounts, Counts, Time Periods
The following will be needed for each measure:
• Name of the measure
• What OLTP field or fields should be used to supply the data
• Data type (money, integer, decimal)
• Formula used to calculate the measure (if there is one)
31/07/2012
5
7/31/2012 9 ©2007 – Body Temple 7/31/2012
9
Dimensions and Hierarchies
While measures define what the decision makers want to see, the dimensions and hierarchies define how they want to see it.
Reconcile the requested dimensions and hierarchies with what is available from the OLTP data
The following is needed for each dimension:
• Name of the dimension
• What OLTP field or fields are to be used to supply the data
• Data type of the dimension’s key (the code that uniquely identifies each member of the dimension)
• Name of the parent dimension (if there is one)
7/31/2012 10 ©2007 – Body Temple 7/31/2012
10
Attributes
Attributes provide additional information about a dimension and may result from Information decision makers want to:
• Be readily available during analysis
• Filter on during the analysis process
We need to reconcile the requested attributes with the data available from the OLTP database to come up with the list of attributes in our design.
The following is needed for each attribute:
• Name of the attribute
• What OLTP field or fields are to be used to supply the data
• Data type
• Name of the dimension to which it applies
31/07/2012
6
7/31/2012 11 ©2007 – Body Temple 7/31/2012
11
Stars and Snowflakes
Measures are placed in a single table called the fact
table.
The dimensions at the lowest level of the hierarchies
are each placed in their own dimension table
In a Star Schema, all the information for a hierarchy is
stored in the same table.
In the snowflake schema, each level in the
dimensional hierarchy has its own table. Dimensions
are linked together with foreign key relationships to
form the hierarchy.
7/31/2012 12 ©2007 – Body Temple 7/31/2012
12
Table Compression
Table compression modifies the way data is physically
stored on the disk drive in order to save space
• It is transparent to applications making use of the data
SQL Server provides for 2 types of table compression:
• Row Compression
• Page Compression
To enable table compression:
ALTER TABLE ManufacturingFact
REBUILD WITH (DATA_COMPRESSION = PAGE)
ROW
31/07/2012
7
7/31/2012 13 ©2007 – Body Temple 7/31/2012
13
TRANSFORMERS – INTEGRATION SERVICES STRUCTURE AND
COMPONENTS
• Integration Services
• Package Items
7/31/2012 14 ©2007 – Body Temple 7/31/2012
14
Overview of Data Warehouse Load Cycles
• Extract changes from data sources
• Refresh the data warehouse based on changes
Special considerations for slowly changing dimensions
Data Warehouse Staging Database
ETL process inserts or modifies
data in the data warehouse
based on changes
ETL process extracts new
and modified data
Users modify data
in business
applications
31/07/2012
8
7/31/2012 15 ©2007 – Body Temple 7/31/2012
15
Review Options for ETL
Microsoft SQL Server Integration Services
The Import and Export Data Wizard
Transact-SQL
The bcp utility
Replication
7/31/2012 16 ©2007 – Body Temple 7/31/2012
16
Package Structure
SSIS creates structures called packages
• Used to move data between systems
• Contain data sources and data destinations
SSIS is an ETL (Extract, Transform, and Load) tool that
is:
• Easy to use
• Extremely flexible
• Exceedingly capable
• Highly scalable
31/07/2012
9
7/31/2012 17 ©2007 – Body Temple 7/31/2012
17
Package Items
Control Flow
• Control Flow Containers
• Control Flow Tasks
• Maintenance Plan tasks
Data Flow
• Data Flow sources
• Data Flow transformation
• Data Flow destinations
7/31/2012 18 ©2007 – Body Temple 7/31/2012
18
Event Handlers
Integration Services packages are event-driven
An event can be the completion of a task or an error
that occurs during task execution
An event handler is a
routine that is defined as a
control flow
Event handler tasks can be
created in the Event
Handlers Designer tab
31/07/2012
10
7/31/2012 19 ©2007 – Body Temple 7/31/2012
19
Precedence Arrows
Control the order in which tasks are executed
Three options are available:
• Success – Green
• Failure – Red
• Completion - Blue
7/31/2012 20 ©2007 – Body Temple 7/31/2012
20
Deploying SSIS Packages
Deployment from Development/Testing/Staging to a
Production environment involves 4 primary tasks:
• Package Configuration
• Creating a Package Deployment Utility
• Installing with a Package Deployment Utility
• Executing Integration Services Packages
31/07/2012
11
7/31/2012 21 ©2007 – Body Temple 7/31/2012
21
SSIS Deployment Models
Package Deployment Model
SSIS Packages are deployed and managed individually
Project Deployment Model
Multiple packages are deployed in a single project
Project
Package Package
Project-level parameter
Package-level parameter Package-level parameter
Deploy
Deploy
SSIS Catalog
Package
Deployment
Model
Project-level connection manager
Package connection manager Package connection manager
7/31/2012 22 ©2007 – Body Temple 7/31/2012
22
Package Deployment Model
• Storage
MSDB
File System
• Package Configurations
Property values to be set dynamically at run time
• Package Deployment Utility
Generate all required files for easier deployment
31/07/2012
12
7/31/2012 23 ©2007 – Body Temple 7/31/2012
23
Project Deployment Model
• The SSIS catalog
Storage and management for SSIS projects on a SQL Server 2012 instance
• Folders
A hierarchical structure for organizing and securing SSIS projects
7/31/2012 24 ©2007 – Body Temple 7/31/2012
24
Deployment Model Comparison
Feature Package Deployment Project Deployment
Unit of Deployment Package Project
Storage File system or MSDB SSIS Catalog
Dynamic configuration Package configurations Environment variables
mapped to project-level
parameters and
connection managers
Compiled format Multiple .dtsx files Single .ispac file
Troubleshooting Configure logging for
each package
SSIS catalog includes
built-in reports and views
31/07/2012
13
7/31/2012 25 ©2007 – Body Temple 7/31/2012
25
SSIS Catalog • Pre-requisites
SQL Server 2012
SQL CLR enabled
• Creating a catalog
Use SQL Server Management Studio
One SSIS catalog per SQL Server instance
• Catalog Security
Folder Security
Object Security
Catalog Encryption
Sensitive Parameters
7/31/2012 26 ©2007 – Body Temple 7/31/2012
26
Environments and Variables
• Environments
Execution contexts for projects
• Variables
Environment-specific values that can be mapped to project parameters and connection manager properties at run time
31/07/2012
14
7/31/2012 27 ©2007 – Body Temple 7/31/2012
27
Deploying an SSIS Project
• Integration Services Deployment Wizard
SQL Server Data Tools
SQL Server Management Studio
7/31/2012 28 ©2007 – Body Temple 7/31/2012
28
Viewing Project Execution Information
• Integration Services Dashboard provides built-in reports
• Additional sources of information:
Event Handlers
Error Outputs
Logging
Debug Dump Files
31/07/2012
15
7/31/2012 29 ©2007 – Body Temple 7/31/2012
29
Control Flow Containers
For Loop Container
• Enables us to repeat a segment of a control flow, the number of times is controlled by 3 properties:
• InitExpression, initial value
• EvalExpression, evaluated on ever loop, if true the loop content is executed
• AssignExpression, evaluated along with the EvalExpression after each execution of the loop
Foreach Loop Container
• Iterates one time for each item in the collection
Sequence Container
• No iteration, it only helps in organizing the tasks in a package
7/31/2012 30 ©2007 – Body Temple 7/31/2012
30
Control Flow Tasks
ActiveX Script Task
Analysis Services Execute DDL Task
Analysis Services Processing Task
Bulk Insert Task
Data Flow Task
Data Mining Query Task
Data Profiling Task
Execute DTS 2000 Package Task
Execute Package Task
Execute Process Task
Execute SQL Task
File System Task
FTP Task
Message Queue Task
Script Task
Send mail Task
Transfer Database Task
Transfer Error Messages Task
Transfer Jobs Task
Transfer Logins Task
Transfer Master Stored Procedures Task
Transfer SQL Server Objects Task
Web Service Task
WMI Data Reader Task
WMI Event Watcher Task
XML Task
31/07/2012
16
7/31/2012 31 ©2007 – Body Temple 7/31/2012
31
Maintenance Plan Tasks
Back Up Database Task
Check Database Integrity Task
Check Database Integrity Task
Execute T-SQL Statement Task
History Cleanup Task
Maintenance Cleanup Task
Notify Operator Task
Rebuild Index Task
Reorganize Index Task
Shrink Database Task
Update Statistics Task
Custom Tasks
7/31/2012 32 ©2007 – Body Temple 7/31/2012
32
Data Flow Sources
ADO.NET Data Source
Excel Source
Flat File Source
OLE DB Source
Raw File Source
XML Source
31/07/2012
17
7/31/2012 33 ©2007 – Body Temple 7/31/2012
33
Data Flow Transformations
Aggregate
Audit
Cache Transform (new)
Character Map
Conditional Split
Copy Column
Data Conversion
Data Mining Query
Derived Column
Export Column
Fuzzy Grouping
Fuzzy Lookup
Import Column
Lookup
Merge
Merge Join
Multicast
OLE DB Command
Percentage Sampling
Pivot
Row Count
Row Sampling
Script Component
Slowly Changing Dimension
Term Extraction
Term Lookup
Union All
Unpivot
7/31/2012 34 ©2007 – Body Temple 7/31/2012
34
Data Flow Destinations
ADO.NET
Data Mining Model
Training
Data Radar
Dimension Processing
Excel
Flat File
OLE DB
Partition Processing
Raw File
Recordset
SQL Server Compact
SQL Server
31/07/2012
18
7/31/2012 35 ©2007 – Body Temple 7/31/2012
35
Package Debugging
Setting Breakpoints
• We can set a breakpoint on any of the control flow tasks in a
package
Viewing Package State
• While the package execution is paused at a breakpoint, there
are several places to see the current execution state of the
package
Viewing Data Flow
• We can attach data viewers inside the data flow at various
steps along the way to view the doings inside package tasks
7/31/2012 36 ©2007 – Body Temple 7/31/2012
36
Change Data Capture
1. Enable Change Data Capture
1. Map start and end times to log sequence numbers
2. Handle null log sequence numbers
1. Extract changes between log sequence numbers
EXEC sys.sp_cdc_enable_db
EXEC sys.sp_cdc_enable_table @source_schema = N'dbo', @source_name = N'Customers',
@role_name = NULL, @supports_net_changes = 1
DECLARE @from_lsn binary(10), @to_lsn binary(10);
SET @from_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than', @StartDate)
SET @to_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @EndDate)
IF (@from_lsn IS NULL) OR (@to_lsn IS NULL)
-- There may have been no transactions in the timeframe
SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_Customers(@from_lsn, @to_lsn, 'all')
31/07/2012
19
7/31/2012 37 ©2007 – Body Temple 7/31/2012
37
The CDC Control Task and Data Flow Components Initial Extraction Incremental Extraction
CDC Control Mark Initial Load Start
Source
Staged Inserts
CDC Control Mark Initial Load End
CDC State Table
CDC
State
Variable
CDC Control Get Processing Range
CDC Source
Staged
Inserts
CDC Control Mark Processed Range
CDC
State
Variable
CDC Splitter
Staged
Updates
Staged
Deletes
1. A CDC Control Task records the starting LSN
2. A data flow extracts all records
3. A CDC Control task records the ending LSN
1. CDC Control Task establishes the range of LSNs to be extracted
2. A CDC Source extracts records and CDC metadata
3. Optionally, a CDC Splitter splits the data flow into inserts, updates, and deletes
4. A CDC Control task records the ending LSN
Data
Flo
w
Data
Flo
w
1
2
3
1
2
3
4
CDC
7/31/2012 38 ©2007 – Body Temple 7/31/2012
38
Loading a Data Warehouse from CDC Output Tables
Staging DB
Data Warehouse Staging DB Data Warehouse
Execute SQL Task INSERT… FROM
Execute SQL Task UPDATE… FROM
JOIN ON BizKey
Execute SQL Task DELETE WHERE BizKey IN
or
UPDATE… FROM
JOIN ON BizKey
Source Staged Inserts
Source Staged Updates
Source Staged Deletes
Destination Dimension Table
OLE DB Command UPDATE…
OLE DB Command UPDATE… or DELETE…
Staging and Data Warehouse Co-located Remote Data Warehouse
Data
Flo
w
Data
Flo
w
Data
Flo
w
31/07/2012
1
7/31/2012 1 ©2007 – Body Temple 7/31/2012
1
Part 3: Delivering
MSCE SQL Server 2012 BI
Designing Business Intelligence
with Microsoft SQL Server 2012
7/31/2012 2 ©2007 – Body Temple 7/31/2012
2
DELIVERING BUSINESS INTELLIGENCE WITH REPORTING SERVICES
• Reporting Services
• Report Server Architecture
• Designing and Creating Reports
31/07/2012
2
7/31/2012 3 ©2007 – Body Temple 7/31/2012
3
Reporting Scenarios
1. Scheduled Delivery of Standard Reports
2. On-Demand Access to Standard Reports
3. Embedded Reports and Dashboards
4. Request to IT for Custom Reports
5. Self-Service Reporting
1
2 4
5
3
7/31/2012 4 ©2007 – Body Temple 7/31/2012
4
Report Structure
A report project can contain a number of reports.
Each report contains two distinct sets of instructions:
• data definition, controls where the data for the report comes
from and what information is to be selected from that data
• Contains 2 parts: data source and data set
• report layout, controls how the information is presented on
the screen or on paper
The information in the data
definition and the report layout is
stored in XML format using the
Report Definition Language (RDL)
31/07/2012
3
7/31/2012 5 ©2007 – Body Temple 7/31/2012
5
Report Server
Report Catalog
• Hosts copies of the RDLs
Report Processor
• Retrieves the RDL for the report from the Report Catalog
Data Providers
• Knows how to retrieve the information from a data source
Renderers
• Works with the processor to read through the report layout
Request Handler
• Responsible for receiving requests for reports and passing those requests on to the report processor
7/31/2012 6 ©2007 – Body Temple 7/31/2012
6
The Distributed Installation
Reporting Services items are split between two
computers that work together to create a complete
Reporting Services system:
• Database Server, hosts SQL
Server 2012 which
in turn host the databases
that make up the Report
Catalog
• Report Server, runs
Reporting Services Windows
Service
31/07/2012
4
7/31/2012 7 ©2007 – Body Temple 7/31/2012
7
The Scale-Out Installation
A specialized form of the distributed installation
• A single database
server interacts with
several report servers
• Each of the report
servers uses the same
set of Report Catalog
databases for its
information
• Allows us to handle
more simultaneous
users
7/31/2012 8 ©2007 – Body Temple 7/31/2012
8
Review New & Enhanced Features SQL2012
SQL Server 2008 R2 Reporting Services
• Self-Service Reporting
Enhanced Report Builder
Shared Datasets
Report Parts
• Authoring Enhancements
Textbox Rotation
Lookup Functions
Aggregations of Aggregates
Pagination Enhancements
• New Data Visualizations
Sparklines and Data Bars
Indicators
Maps
SQL Server 2012 Reporting Services
• Power View – Interactive self-service reporting
• Greater integration with SharePoint Server
• Improved Rendering to Microsoft Word and Excel formats
• Data Alerts – E-mail based notifications of changes to report data
31/07/2012
5
7/31/2012 9 ©2007 – Body Temple 7/31/2012
9
Self-Service Reporting
• Empower information workers
• Supplement standard reports
• Reduce IT workload
• Supported by:
Report Builder 3.0
Shared Data Sources and Datasets
Report Parts
7/31/2012 10 ©2007 – Body Temple 7/31/2012
10
SharePoint Integration
• Implemented as a SharePoint 2010 Shared Service
• A SharePoint site provides the UI for report server content and operations.
• Managed through SharePoint Central Administration
• Creates an integrated, consistent reporting environment for organizations that use SharePoint
31/07/2012
6
7/31/2012 11 ©2007 – Body Temple 7/31/2012
11
Scaling Out Reporting Services in a SharePoint Farm
Network Load Balancer (NLB)
Web Front-End (WFE)
Application Service
Database Server
Reporting
Services
Add-in
Reporting Services
SharePoint
Mode
• Add Application servers to the SharePoint farm to scale-out report processing services
Install Reporting Services in SharePoint Mode
• Add Web front-end servers to the SharePoint farm to balance user requests
Install Reporting Services Add-in for SharePoint Products
7/31/2012 12 ©2007 – Body Temple 7/31/2012
12
FALLING INTO PLACE – MANAGING REPORTING SERVICES
• Report Manager
• Managing Reports on the Report Server
• Ad Hoc Reporting
31/07/2012
7
7/31/2012 13 ©2007 – Body Temple 7/31/2012
13
Folders and The Report Manager
Folders can be created in the Report Manager to group reports and other items and can contain:
• Reports
• Supporting files (eternal images, shared data sources, etc.)
• Other folders
The Report Manager application provides a straightforward method for creating and navigating folders in the Report Catalog.
By default, the Report Manager site is installed in the default website on the server. It is located in a virtual directory called Reports. http://ComputerName/reports
7/31/2012 14 ©2007 – Body Temple 7/31/2012
14
Deploying Reports Using the Report Designer
The most common method of moving reports to the
report server
31/07/2012
8
7/31/2012 15 ©2007 – Body Temple 7/31/2012
15
Uploading Reports Using Report Manager
Another common method of moving a report to the
report server is by using the Report Manager, aka
uploading the report
7/31/2012 16 ©2007 – Body Temple 7/31/2012
16
Security
In Reporting Services, security is designed for
• Flexibility
• Individual access rights can be assigned to each folder and to
each items within the folder
• Ease of management
• Security inheritance
• Security Roles
• Integration with Windows security
31/07/2012
9
7/31/2012 17 ©2007 – Body Temple 7/31/2012
17
Integration with Windows Security
Reporting Services does not maintain its own list of
users and passwords
It depends entirely on integration with Windows
security
note: Custom security is possible,
however not advisable due to complexity
7/31/2012 18 ©2007 – Body Temple 7/31/2012
18
Tasks and Rights
Each task in Reporting Services has a corresponding
right
Tasks come in 2 flavors:
• Security tasks
• System-wide security tasks
31/07/2012
10
7/31/2012 19 ©2007 – Body Temple 7/31/2012
19
Roles
The rights to perform tasks are grouped together to create roles
Reporting Services includes several predefined roles:
• The Browser Role
• The Publisher Role
• The My Reports Role
• The Content Manager Role
• The System User Role
• The System Administrator Role
Role assignments can be also be created for folders, reports or resources
Folders, except the home folder) also inherit the role assignments of their parent roles
7/31/2012 20 ©2007 – Body Temple 7/31/2012
20
Linked Reports
The linked report is deployed to one folder
It is then pointed to (linked to) from links placed
elsewhere within the report catalog
To the user the links look just like a report
31/07/2012
11
7/31/2012 21 ©2007 – Body Temple 7/31/2012
21
Report Caching
Report caching is an option that can be turned on
individually for each report on the report server to speed
the rendering of reports to users
• The report server saves a copy, or instance, of the report in a temporary location the first time the report is executed
• On subsequent executions, with the same parameter values chosen, the report server pulls the information necessary to
render the report from the report cache
Cached reports are assigned an expiration data and time
Cached reports must use stored credentials for the shared
data sources
7/31/2012 22 ©2007 – Body Temple 7/31/2012
22
Execution Snapshots
An execution snapshot is another way to create a
cached report instance
• Created automatically
• Can be created on a scheduled basis
• Can be created as soon as the feature is turned on for a
particular report
Advantage over caching:
• First use to retrieve the report after the cache has expired
does not need to for the report to be generated
31/07/2012
12
7/31/2012 23 ©2007 – Body Temple 7/31/2012
23
Report History
The report history feature of the Report Manager
enables us to keep copies of a report’s past execution
• Lets us save the state of our data without having to save
copies of the data itself
• Have to provide a default value for each report parameter
• Can start to pile up if we are not careful
• They are not lost if the definition of the underlying report is
changed
• Just like the cached report instance, the report history snapshot
contains both the report definition and the dataset
7/31/2012 24 ©2007 – Body Temple 7/31/2012
24
Standard Subscriptions
A request to push a particular report to a user or set of users
Self-serve operations
2 delivery options: Email and File Share
Can use multiple subscriptions on one report (different parameters, end of the week and end of the month)
To subscribe to a report or create a subscription for delivery to others, you must have rights to the Manage Individual Subscriptions task
• Browser, Content Manager, and My Reports roles have rights to manage individual subscriptions
31/07/2012
13
7/31/2012 25 ©2007 – Body Temple 7/31/2012
25
Data Driven Subscriptions
Aka “mass mailing”
Enables us to take a report and e-mail it to a number
of people on a mailing list
To create a data-driven subscription for a report, you
must have rights to the Manage All Subscriptions task
• Only the Content Manager role has the rights to this task
While a data-driven subscription is a scheduled
process rather than triggered by a particular event,
we can make it behave almost as if it were event-
driven using stored procedures
7/31/2012 26 ©2007 – Body Temple 7/31/2012
26
Ad Hoc Reporting
Users may need to create reports that are one-time in nature or cannot wait till a report developer is available
The Report Builder, along with the Report Models, provides a means for end users to explore their data without having to learn the ins and outs of SELECT statements and query builders
The Report Model
• Provides a nontechnical user with a view of database content without requiring an intimate knowledge of relational theory and practice
• Hides all of the complexity of primary keys and foreign key constraints
31/07/2012
14
7/31/2012 27 ©2007 – Body Temple 7/31/2012
27
Cleaning Up the Report Model
Remove any numeric aggregates that don’t make sense
Remove attributes that should not be present
Rename entities that have cryptic names
Put the proper items in the Lookup folder
Use folders to organize entities, attributes, and roles
Rearrange the entity, attribute, and role order
Manually create calculated attributes
Add descriptions
Create perspectives coinciding with business areas
7/31/2012 28 ©2007 – Body Temple 7/31/2012
28
Entities, Roles, and Fields
Reports are created in the Report Builder using entities, roles, and fields
Entities are the objects or processes that our data knows something about
• Can be grouped together in entity folders within the Report Model or in perspectives to help keep things organized
Roles show us how one entity relates to another entity
• Enable us to show information from multiple entities together on a single report in a meaningful manner
Fields are bits of information: a product name, a machine number, or a date of manufacture
• Fields are what we place on our reports to spit out these bits of information
31/07/2012
15
7/31/2012 29 ©2007 – Body Temple 7/31/2012
29
Using Reporting Services without the Report Manager
When using a custom application to access reports, it
is not feasible to use the report manager
Other approaches are:
• URL Access
• Web Services Access
• The Report Viewer Control
7/31/2012 30 ©2007 – Body Temple 7/31/2012
30
Tabular Data Model
• An in-memory database that uses xVelocity in-memory technologies
• Based on the widely understood relational model
• Quick and easy to create
• Faster time to deployment
• Easier to learn than multidimensional models, so has a lower barrier to entry
• Scalability from desktop BI to organizational BI
xVelocity
31/07/2012
16
7/31/2012 31 ©2007 – Body Temple 7/31/2012
31
Options for Creating Tabular Data Models
• Tabular Data Models in PowerPivot for Excel
Create a tabular data model in a Microsoft Excel workbook
Importing data automatically creates a tabular data model
The data is stored in the Excel workbook
• Tabular Data Models in Microsoft SQL Server 2012 Analysis Services
Create a tabular data model by using SQL Server Data Tools
The data is stored in SQL Server 2012 Analysis Services
There are additional features to support larger, more complex solutions:
• Row-level security
• Partitioning
• DirectQuery mode
• Deployment options
7/31/2012 32 ©2007 – Body Temple 7/31/2012
32
PowerPivot Technologies
• PowerPivot for Excel
Sophisticated desktop data analysis solution
Increased autonomy for information workers
Fast query response times
DAX for custom measures and calculated columns
Diagram view for management of tables and relationships
Hierarchies and perspectives
31/07/2012
17
7/31/2012 33 ©2007 – Body Temple 7/31/2012
33
PowerPivot Technologies
• PowerPivot for SharePoint
Portal for sharing and collaboration
Gallery to browse and access workbooks and reports
Server-side processing enables users to open workbooks in a browser
Central management and security for workbooks
7/31/2012 34 ©2007 – Body Temple 7/31/2012
34
Features in PowerPivot
•Diagram view
•Hierarchies
•Perspectives
•Support for multiple relationships between tables.
•The ability to sort one column by the values in another column.
•New DAX functions
•Reporting properties
31/07/2012
18
7/31/2012 35 ©2007 – Body Temple 7/31/2012
35
Importing Tables from a Data Source
• Create data source connections in Excel PowerPivot window
• Use a wide range of connection options including common third-party databases
• Automatically add related tables
• Filter out columns that are not required for analysis:
Improves PowerPivot performance
Simplifies user experience
• Provide table aliases for ease of use
7/31/2012 36 ©2007 – Body Temple 7/31/2012
36
Sharing PowerPivot for Excel Workbooks
• Upload PowerPivot for Excel workbooks to PowerPivot Gallery on SharePoint:
Browse workbooks and reports in the gallery
View them in Windows Internet Explorer
Open them in Excel for further analysis
• Use uploaded workbooks as data sources for Excel
31/07/2012
19
7/31/2012 37 ©2007 – Body Temple 7/31/2012
37
Using PowerPivot Gallery
• Shows thumbnail previews of PowerPivot workbooks
• Offers different viewing options:
Gallery
All Documents
Theater
Carousel
• Click a workbook to open it in Internet Explorer
7/31/2012 38 ©2007 – Body Temple 7/31/2012
38
Review of DAX • DAX is a formula-based language for building business logic and
queries in tabular data models:
Calculated columns
Measures
Queries
• Its syntax is similar to Microsoft Excel formulas:
It uses functions, operators, and values
It is easy to use and already familiar to information workers
• It differs from Excel formulas in key ways:
It is designed to work with relational data, not data ranges
It includes more advanced functionality
• There are new functions to expand DAX
31/07/2012
20
7/31/2012 39 ©2007 – Body Temple 7/31/2012
39
DAX Functions
• Text functions
• Information functions
• Filter and value functions
• Logical functions
• Mathematical and trigonometric functions
• Statistical and aggregation functions
• Date and time functions
• Time intelligence functions
7/31/2012 40 ©2007 – Body Temple 7/31/2012
40
DAX Syntax and Data Types
• DAX formulas start with the equal sign (=) followed by an expression
• Expressions can contain functions, operators, constants, and references to columns
• Column references:
Fully qualified name:
Unqualified name:
• Measure names must be enclosed in brackets
• DAX uses eight data types
'table name'[column name]
[column name]
31/07/2012
21
7/31/2012 41 ©2007 – Body Temple 7/31/2012
41
Aggregations
• Summarize underlying detailed data for analysis
• Use automatic aggregation for simple calculations:
SUM
COUNT
MIN
MAX
AVERAGE
• Create a measure for more complex aggregations
SUM('Reseller Sales'[Sales Amount])
7/31/2012 42 ©2007 – Body Temple 7/31/2012
42
Context
• A DAX measure or calculated column defines a field that you can use in a PivotTable table or a PivotChart chart
• The exact values that appear in PivotTable tables and PivotChart charts vary with context:
Row context
Query context
Filter context
31/07/2012
22
7/31/2012 43 ©2007 – Body Temple 7/31/2012
43
DAX Queries
• Client applications, such as the Power View reporting tool, issue DAX queries
• You can write queries manually by using the DAX query language
• You can filter, order, and summarize results
EVALUATE(
FILTER('Reseller Sales',
'Reseller Sales'[OrderDateKey]>20040101))
7/31/2012 44 ©2007 – Body Temple 7/31/2012
44
Calculated Columns • Named columns that are populated by using a DAX formula
• Create calculated columns in the Data View window of a PowerPivot for Excel workbook:
Add a new column
Provide a name
Enter a DAX expression in the formula bar
• A value is calculated for each row in the table when the calculated column is created
• Use calculated columns in PivotTable tables, PivotChart charts, slicers, and measure definitions
=CONCATENATE('Employee'[First Name], CONCATENATE(" ",
'Employee'[Last Name]))
31/07/2012
23
7/31/2012 45 ©2007 – Body Temple 7/31/2012
45
Measures
• Named formulas that can contain sophisticated business logic:
Implicit measures
Explicit measures
• Create explicit measures in two places:
The PowerPivot Field List in an Excel worksheet
The Measure Grid in the table view in the PowerPivot window
• Use measures in PivotTable tables and Pivot Chart charts
IF([Previous Year], ([Sum of Sales Amount] – [Previous
Year])/[Previous Year], BLANK())
7/31/2012 46 ©2007 – Body Temple 7/31/2012
46
Multiple Relationships
• Tabular data models support multiple relationships between tables
• Only one relationship is active at a time
• The active relationship is used by default in DAX formulas
• The USERELATIONSHIP function enables you to select the relationship that you want to use for a specific formula
=CALCULATE(SUM(Reseller Sales[Sales Amount]),
USERELATIONSHIP(Reseller
Sales[ShipDateKey],Date[DateKey]))
31/07/2012
24
7/31/2012 47 ©2007 – Body Temple 7/31/2012
47
Time Intelligence • Compare data from one time period against equivalent data from
a different time period
• The tabular model should contain a separate table that contains only date information
• The date table should have a continuous range of dates without any gaps
• The column in the date table that uses the date data type should use day as the lowest level of granularity
• Mark the table as Date Table to use time intelligence functions against that table
• Use time intelligence functions to build measures
CALCULATE([Sum of Sales Amount],
DATEADD('Date'[FullDateAlternateKey], -1, YEAR))
7/31/2012 48 ©2007 – Body Temple 7/31/2012
48
Dynamic Measures • Calculate different values for each row in a PivotTable table:
Create a linked table that contains the input values for the dynamic measure
Use the HASONEVALUE function to check that there is a single input value for each row
Use conditional logic to apply different calculations to each row based on the input value
IF([Check Single Values)], SWITCH(VALUES('Time
Period'[Period]), "Current Year", [Sum Of Sales
Amount], "Previous Year" [Previous Year], "YOY Growth"
, IF(NOT(ISBLANK([Previous Year])), [Sum of Sales
Amount] – [Previous Year], BLANK())), BLANK())
31/07/2012
1
7/31/2012 1 ©2007 – Body Temple 7/31/2012
1
Part 4: Cloud Technologies in a BI Solution
MSCE SQL Server 2012 BI
Designing Business Intelligence
with Microsoft SQL Server 2012
7/31/2012 2 ©2007 – Body Temple 7/31/2012
2
Cloud Technologies in a BI Solution
Cloud Data Sources
SQL Azure
SQL Azure Reporting Services
The Windows Azure Marketplace DataMarket
31/07/2012
2
7/31/2012 3 ©2007 – Body Temple 7/31/2012
3
Cloud Data Scenarios
Application Databases Third-Party Data
7/31/2012 4 ©2007 – Body Temple 7/31/2012
4
Microsoft Cloud Platform for Data
Databases
Windows Azure Marketplace Data
Market SQL Azure
Reporting Data Sync
31/07/2012
3
7/31/2012 5 ©2007 – Body Temple 7/31/2012
5
Cloud Data and Services in the BI Ecosystem
ETL Load
Process ETL Staging
Process
1011000110
Data Cleansing
Staging
Database Data Warehouse
Windows Azure Marketplace
DataMarket DQS KB
SQL Azure
Data Sync
SQL Azure
7/31/2012 6 ©2007 – Body Temple 7/31/2012
6
Comparing SQL Azure with SQL Server
V
31/07/2012
4
7/31/2012 7 ©2007 – Body Temple 7/31/2012
7
Topology of SQL Azure
Load Balancer
TDS
7/31/2012 8 ©2007 – Body Temple 7/31/2012
8
Using SQL Azure as a Data Source for a Data Warehouse
SSIS
Data Sync
31/07/2012
5
7/31/2012 9 ©2007 – Body Temple 7/31/2012
9
SQL Azure Reporting
• Cloud-based reporting
• Create reports with the same tools as on-premise Reporting Services
• Two core scenarios
Operational reports for Windows Azure SQL Database
Embedded reports in Windows or Azure applications
7/31/2012 10 ©2007 – Body Temple 7/31/2012
10
Windows Azure Marketplace DataMarket Data Scenarios
Windows Azure Marketplace DataMarket
1011000110
Data Cleansing
DQS KB
SSIS