ISQS 6339, Data Management and Business ISQS 6339, Data Management and Business Intelligence Intelligence
Cubism – Measures and Cubism – Measures and DimensionsDimensionsZhangxi Lin
Texas Tech University
1
OutlineOutlineMeasuresWhere we’ve beenPopulating fact tableTypes of dimensions
2
Structure and Components of Structure and Components of Business IntelligenceBusiness Intelligence
3
SSMSSSMS SSISSSIS SSASSSAS
SSRSSSRS
SASEM
SASEM
SASEG
SASEG
Snowflake Schema of the Data MartSnowflake Schema of the Data Mart
4
Manufacturingfact
DimProduct
DimProductSubType
DimProductType
DimBatch
DimMachine
DimMachineType
DimMaterial
DimPlant
DimCountry
1
2
3
4
5
8
6
7
910
Where we’ve been and where we Where we’ve been and where we are noware nowExercise 1: Getting started Exercise 2: Creating a data mart with
SSMSExercise 3: Creating data mart with
BIDSExercise 4: Populating dimensions of a
data martExercise 5: Loading fact tablesExercise 6: Create and customize a
cube
5
What we need to do with the half-What we need to do with the half-done data mart?done data mart?Populate DimBatch dimenstion tablePopulate ManufacturingFact tableBuild an OLAP cube (we already did this
before)Check measuresCheck dimensions
6
7
MEASURES MEASURES
FactsFactsFacts are measurements associated with a
specific business process.Many facts can be derived from other facts,
including additive and semiadditive facts. Non-additive facts can be avoided by
calculating it from additive facts.Measures are clustered together in a
group, called measure group.
8
Types of measuresTypes of measures Three types
◦ Additive measures. Most facts are additive (calculative), such as sum
◦ Semiadditive measures. The measures that can be added along some dimensions, but not along others. For example, inventory level can be added along product dimension but not time dimension.
◦ Non-additive (such as max, average), or descriptive (e.g. factless fact table).
Aggregate functions◦ Additive: Sum◦ Semiadditive: ByAccount, Count, FirstChild,
FirstNonEmpty, LastChild, LastNonEmpty, Max, Min◦ Nonadditive: DistinctCount, None.
Measures and dimensionsMeasures and dimensionsDimensions are used to
aggregate measures. Therefore, they must be somehow related to measures
Granularity◦Important for the analysis◦There could be missing values in the
fact table
LOADING FACT LOADING FACT TABLESTABLES
11
Exercise 5: Loading Fact Exercise 5: Loading Fact TablesTables Project name: MMMFactLoad-lastname Package name: FactLoad.dtsx Tasks
◦ Create Inventory Fact table◦ Load Dim Batch◦ Load Manufacturing Fact◦ Load Inventory Fact
Deliverable: email a screenshot of the “green” outcome of the ETL project to [email protected], with a subject title “ISQS 6339 EX5 - <lastname>”
12
Inventory Fact TableInventory Fact Table
Create a Table InventoryFact in your database.◦ Compound primary key: DateOfInventory,
ProductCode, and Material◦ Define two foreign keys
Column Name Data Type Allow Nulls
InventoryLevel Int No
NumberOnBackorder Int No
DateOfInventory Datatime No
ProductCode Int No
Material Varchar(30) No
13
Data Sources for Loading Data Sources for Loading FactFact For loading DimBatch table and ManufacturingFact
table◦ BatchInfo.CSV
For loading InventortyFact table ◦ Lin.OrderProcessingSystem Database
14
Control Flow for Loading Facts and the Control Flow for Loading Facts and the Remaining DimensionRemaining Dimension
Note: to ease debugging, you may use three packages and test them one by one, instead of doing everything in one package
15
Flat File Connection Flat File Connection Data types
◦BatchNumber, MachinNumber: four-byte signed integer [DT_I4]
◦ProductCode, NumberProduced, NumberRejected: four-byte signed integer [DT_I4]
◦TimeStarted, TimeStopped: database timestamp [DT_DBTimeStamp]
Only check BatchNumber as the input of Dim Batch
All columns are needed for fact tables
16
Some Frequently Used Some Frequently Used NodesNodes
Load DimBatch Data FlowLoad DimBatch Data Flow
18
Load DimBatch Data FlowLoad DimBatch Data Flow
19
Note: Because of duplication in the source file, we may insert An Aggregate item after the Flat File Source item.
The Flat File SourceThe Flat File Source
20
21
Sort Transformation
In the Aggregate item,Define “Group-by” BatchNumber.
In Derived column item, Define BatchName From BatchNumber
Use the expression(DT_WSTR, 50)[BatchNumber]To change the data typeOf BatchName.
Load Fact Data FlowLoad Fact Data Flow
22
Derived Columns for the Fact Derived Columns for the Fact tabletable
23
Expressions for the Derived ColumnsExpressions for the Derived Columns
AcceptedProducts◦ [NumberProduced] – [NumberRejected]
ElapsedTimeForManufacture◦ DATEDIFF(“mi”, [TimeStarted],[TimeStopped])
DateOfManufacture◦ (DT_DBTIMESTAMP)SUBSTRING((DT_WSTR,25)
[TimeStarted],1,10) This expression converts TimeStarted into a
string and selects the first ten characters of that string. This string is then converted back into a date time, without the time portion.
24
25
OLE DB DestinationFor loading the facttable
Load Inventory FactLoad Inventory Fact OLE DB Source
◦ OrderProcessingSystem.InventoryFact OLE DB Destination
◦ MaxMinManufacturingDM-lastname.InventoryFact No transformation
There are two ways to loading the table◦ Create the table and use ETL to load it◦ Import directly from the source to the database
MaxMinManufacturingDM-lastname
26
Debugging ResultsDebugging Results
27
Loading DimBatch Loading ManufacturingFact
28
BUILDING AN OLAP BUILDING AN OLAP CUBECUBE
Exercise 6: Design a CubeExercise 6: Design a Cube Project name: ISQS6339_EX6_2015_lastname Tasks
◦ Add in new date items (year, quarter, and month) to two fact tables
◦ Create time dimension using Manufacturing Fact table◦ Define calculated measures (Total Products, Percent
Rejected) ◦ Define hierarchies of attributes in dimension tables◦ Create a cube from the MaxMinManufacturing data
mart with hierarchical date dimension Deliverable:
◦ Screenshots: dimension hierarchies, dimensions, relationships of facts and dimensions, deployment result, format of measures, and browsing results.
29
Three Steps to Create a Cube from Three Steps to Create a Cube from Data SourcesData Sources Defining data source Defining data source view
◦ Add in three new columns of year, quarter, and month for the two fact tables
Building a cube. ◦ Define a new dimension Dim Time from
Manufacturing Fact table Customize the cube:
◦ Link two fact tables in a cube◦ Define new primary key for Dim Time◦ Define calculated measures◦ Relate dimensions to measures
30
T-SQL Expressions for DS View T-SQL Expressions for DS View Definition - ManufactureDefinition - Manufacture YearOfManufacture
CONVERT(char(4),YEAR(DateOfManufacture)) QuarterOfManufacture
CONVERT(char(4), YEAR(DateOfManufacture)) + CASE WHEN MONTH (DateOfManufacture) BETWEEN 1 AND 3
THEN 'Q1' WHEN MONTH (DateOfManufacture) BETWEEN 4 AND 6
THEN 'Q2' WHEN MONTH (DateOfManufacture) BETWEEN 7 AND 9
THEN 'Q3'ELSE 'Q4'END
MonthOfManufactureCONVERT(char(4), YEAR(DateOfManufacture)) +
RIGHT('0'+CONVERT(varchar(2), MONTH(DateOfManufacture)),2)
31
T-SQL Expressions for DS View T-SQL Expressions for DS View Definition - InventoryDefinition - Inventory YearOfInventory
CONVERT(char(4),YEAR(DateOfInventory)) QuarterOfInventory
CONVERT(char(4), YEAR(DateOfInventory)) + CASE WHEN MONTH (DateOfInventory) BETWEEN 1 AND 3 THEN
'Q1' WHEN MONTH (DateOfInventory) BETWEEN 4 AND 6 THEN
'Q2' WHEN MONTH (DateOfInventory) BETWEEN 7 AND 9 THEN
'Q3'ELSE 'Q4'END
MonthOfInventoryCONVERT(char(4), YEAR(DateOfInventory)) +
RIGHT('0'+CONVERT(varchar(2), MONTH(DateOfInventory)),2)
32
Data Source ViewData Source View
33
New columns
Select Measures PageSelect Measures Page
34
Uncheck ManufactureFact Count
35
The finished cube
36
CubeStructure
37
Defining a format string
38
Inventory measures
“Number on Backorder” is also set with these two parameters
Calculated measures – Calculated measures – made-up factsmade-up facts The definition of calculated measure is stored
in the OLAP cube itself. The actual values that result from a calculated
measure are not calculated, however, until a query containing that calculated measure is executed. The results of that calculation are then cached in the cube. The cached value is then delivered to any subsequent users requesting the same calculation.
The expressions of calculation are created using a language known as Multidimensional Expression Language (MDX) script. MDX is different from T-SQL. It is a special language with features designed to handle the advanced mathematics and formulas required by OLAP analysis. This is not found in T-SQL.
39
41
42
DIMENSIONSDIMENSIONSin SQL Serverin SQL Server
Types of DimensionsTypes of Dimensions Fact dimensions: the Dimensions created from attributes
in a fact table Parent-Child dimensions: Built on a table containing a
self-referential relationship, such as a parent attribute. Role playing dimensions: related to the same measure
group multiple times; each relationship represents a different role the dimension play; for example, time dimension plays three different roles: date of sale, data of shipment, and date of payment. ◦ To create a role playing dimension, add the dimension to the
Dimension Usage tab multiple times. Then create a relationship between each instance of the dimension and the measure group.
Reference dimensions: Not related directly to the measure group but to another regular dimension which in turn related to the measure group
Data mining dimensions: the information discovered by data mining
Many-to-many dimensions: e.g. multiple ship to addresses
Slowly changing dimensions
43
Slowly changing Slowly changing dimensionsdimensionsType 1 SCD – no trackType 2 SCD – tracking the entire history,
adding four attributes: SCD Original ID, SCD Start Date, SCD End Date, SCD Status
Type 3 SCD – Similar to Type 2 SCD but only track current state and the original state; two additional attribute: SCD Start Date, SCD Initial Value
Add a time dimension (a fact Add a time dimension (a fact dimension)dimension)
Rename time dimensionRename time dimension
Date HierarchyDate Hierarchy
Material Hierarchy & Plant Material Hierarchy & Plant HierarchyHierarchy
Product HierarchyProduct Hierarchy
Relating Dimensions in Relating Dimensions in the Cubethe Cube