analytics best practices

7/28/2019 Analytics Best Practices

http://slidepdf.com/reader/full/analytics-best-practices 1/30

The World’s Leading Provider of eBusiness Applications Software

Analytics 7.x MetaDataAnalytics 7.x MetaData

Construction Best PracticesConstruction Best PracticesMarch 14, 2003March 14, 2003



AgendaAgenda

The Basics•Best Practices for adding and extending Dimensions

•Best Practices for adding and extending Facts

Common Problems•Circular J oins•Many to Many Relationships•Combo Tables•No Link between tables

Additional Best Practices

DW Best PracticesSummary



How to add a DimensionHow to add a Dimension

Adding a dimension table seems to be quite simple, but there are many steps thatneed to be followed to ensure proper and optimal funct ioning

1. Import the Table into the Physical Layer

• It is a Best Practice to not import any FKs; it is best to manually add them to be sure thatyou know precisely which ones are added to reduce the possibility of lurking Circular J oins2. Add the appropriate Physical joins to each corresponding Fact table

• Create FK joins from the Dimension to the Fact (1:M)• Ideally, each join should be a simple inner join on one column. More complex joins

will reduce runtime performance, and should ideally be handled by the Data Model and theETL

3. Create a corresponding Logical table in the Logical Model and map columns4. Create appropriate Logical joins in the Business Model Diagram

• Each Logical join should be complex, not an FK join. For simple, single tabledimensions, this is not much of a concern. In more complex environments, it is best toallow Analytics to choose the proper Physical join path. By notusing a complex join,Analytics does not have the option to choose the appropriate join path

5. Make a Dimensional Hierarchy• Every Logical Dimension Table (except snowflakes) should have a Dimensional Hierarchy

associated with it, even for something as simple as a LOV or if there is no natural businesshierarchy

• Create the Totals Level and any lower levels that are necessary, and enter in theappropriate value for Number of elements at this level. These values are important whenAnalytics has to decide which Fact Source to use.



How to add a Dimension (cont.)How to add a Dimension (cont.)

6. Level Set Attributes• Place any Attributes from the Logical Dimension Table to their appropriate levels in the

hierarchy (e.g., put Quarter Desc to the Quarter Level in the Time Dimensional Hierarchy)7. Level Set appropriate Facts

• For any and all Fact Sources that will join to this new Dimension, you must add theDimensional Hierarchy and its appropriate level in the Content Tab for each source

• This step is f requently not performed, and is a significant cause of MetaData errors:

• A consistency error of a Logical Table does not join to any Fact source• Sometimes this will lead to the wrong Fact Source being used in queries



How to Extend a DimensionHow to Extend a Dimension

1. Add the new table into the physical model. Again, do not import any FKs, as it is best to do themyourself

2. Physically model the Dimension Extension table to its base table in the dimension via a 1:M FK

join (make it appear as a parent of the main table), even if the table is a 1:1.• This is a common problem, where Analytics over-joins to extension tables when they are

not needed, thereby slowing performance. Modeling it as a 1:M instead of a 1:1 willeliminate the over-joining problem3. Add the new table to the existing table source for the dimension4. Level set any attributes to the proper level in the Dimensional Hierarchy as appropriate

Frequently a Logical Dimension Table must be extended to include other tables. Inthe OOB Analytics Application, these are usually _DX tables, but may include other tables as well.

At this point, the table has been correctly added to an existing dimension. However, you most likelywill want to add continue with the following steps to tune the run-time performance.



How to Extend a Dimension (cont.)How to Extend a Dimension (cont.)

Sometimes it is necessary to support queries from a Dimension Extension Tableattribute (e.g., _DX or _DH) along with some facts without the need for anyattribute from the base Dimension table (e.g., _D). In addit ion to the steps justperformed on the previous sl ide,5. Create a physical 1:M join between the Dimension Extension table (e.g., the _DX or _DH table)

to all appropriate Facts. A full example would appear as below:

6. Create a new Table source for each extension table in the Logical Dimension Table sources list,without the base Dimension table (_D) included in the new source. If it is possible to have aquery where the _DH and _DX are used but not the _D, then make another source with onlythose two tables and not the _D.

7. Map the columns. Note that each attribute on the extension table should now appear undermore than one table source

Queries will now be able to bypass the _D table if its attributes are not used.



How to add FactsHow to add Facts

1. Import the table into the Physical layer, again without FKs2. Add physical joins to appropriate dimension tables.

• These joins should ideally be 1:M FK joins on single columns utilizing an inner join3. Create a new Logical Fact Table for the new table

• In Version 7.5 of Siebel Analytics, it is a best practice to add new fact tables to a new

Logical Fact table. Version 7.0 does not allow multiple Logical Fact tables; however bydoing so in 7.5 it is easier to identify extensions to the OOB application.

4. Map the appropriate columns

Adding a new Fact or Aggregate table involves the following steps:

5. Identify the appropriate Dimensional

Hierarchies and Levels in the Content Tab• This step is frequently not performed, and is

the major cause of Analytics selecting thewrong fact table

• By identifying the levels that the fact is theintersection of, Analytics can estimate the

“size”of the fact table, and can thus choosethe “smallest”fact source one to satisfy thequery

• By adjusting the #of elements for eachhierarchy level, different fact sources maybe selected



How to add FactsHow to add Facts -- AggregatesAggregates

• An aggregate table is handled like any other Fact source in Analytics.• It is added as a new source in an existing Logical Fact table, and maps to many of the same

metrics as a base level fact table• An aggregate table is an “aggregate”to Analytics based on how the table is described by its

Dimensions in the Content tab of the Source.• An aggregate will be higher up on one or more Dimensional Hierarchies• An aggregate usually has few dimensions associated with it

• Note that when moving up a dimensional hierarchy in a Star Schema, the dimension should bePhysically snowflaked. Ideally, you would want a monthly aggregate to contain a MONTH_WID

FK to a W_MONTH_D table, and not a link to the W_MONTH_WID in the W_DAY_D table.

Aggregate tables are handled cleanly in Analytics



How to Extend Fact TablesHow to Extend Fact Tables

A Fact table may be extended with the addition of other tables into the source. InOOB, these are the _FX tables.

1. Import the Fact extension table as like any other table, again without FKs2. Model the Extension table (_FX) to the base Fact table (_F) as a parent-child 1:M. This is

the same technique used for Dimension Extension tables, where a 1:1 should not be used3. Add the Fact Extension table to the existing Fact Source4. Map the columns5. Do not add an additional Fact Source for the Fact Extension table as was done for

Dimensional Extension tables.• Unlike a Dimension Extension Table, a Fact Extension Table is “helpless”without its base

Fact table. The Base Fact table contains all of the keys necessary to join to otherDimensions, and to describe the level of the Facts



How to Extend Fact Tables (cont.)How to Extend Fact Tables (cont.)

Sometimes a Fact Extension table wil l have additional Foreign Keys to other dimensions not associated into the base fact table.

6. Make the physical join necessary between the Fact Extension table and theappropriate Dimension(s)

7. Add a Logical complex join in the Business Model between the Dimension and theLogical Fact Table8. Add the new Dimension to the Fact Source’s Content tab

For example, W_ORDERITEM_F does not have a FK to W_ASSET_D. By adding

ASSET_WID or reusing an ATTRIB column on to the W_ORDERITEM_FX table, all of the Order Item facts can now be viewed by Asset:



Circular JoinsCircular Joins

•A circular join is where from any table in a dimension to any other table in either the

same or any other dimension there is more than one way to join to it (Fact tablesexcluded)

•Circular joins show themselves as extra joins in the physical SQL submitted to thedatabase

•Sometimes they may only slow performance

•More frequently, they reduce the number of records returned, creating an erroneousreport

•There are 2 types of Circular J oins, and both must be eliminated:•Intra-Dimensional, where the circular joins is entirely contained within one LogicalDimension Table

•Inter-Dimensional, where the circular join is across two Logical Dimension Tables

Circular joins are one of the 1st things to review on a repository – they causesignificant problems and can be difficult to find



Circular Joins : IntraCircular Joins : Intra

An Intra-Dimensional Circular join is where two join paths exist between two tablesin a single source for a Logical Dimension table.

Account Logical Table

•Alias one of the tables and re-link the alias to break the join.•Determine the context of the original table (Dim_Country) and the new alias(Dim_Customer_Country)

•If they do not have the same context, meaning that for an Account, the value of thecorresponding record in Dim_Country does not always equal the value inDim_Customer_Country, then the additional columns from the alias should be added as new

columns with a different name. For Example, Account.Account Country and Account.CustomerCountry•If they have the same context, meaning their values are identical and that the extra joinrepresents a denormalization, then an additional source will need to be created:

•One source should have Dim_Country and not Dim_Customer_Country•The new source would Dim_Customer_Country and not Dim_Country

Account Logical Table



Circular Joins : InterCircular Joins : Inter

An Inter-Dimensional Circular Join is where two Logical Dimension Tables arephysically connected by a join “ behind the scenes” . This scenario can also bewhere one physical table is in two Logical Dimensions, as shown below:

Time Logical Table Customer Logical Table

Analytics sees a physical join that crosses the Time Logical Table and the Customer Logical Table.When Time and Customer are used on a report together, a circular join case will arise as show below.For this report to return any rows, the Customer Acquisition Date must equal the Sales Date. If this isnot the case, then the row will not be returned, which is incorrect.

Dim_Day forCustomerAcquisition Date



Circular Joins : Inter (cont.)Circular Joins : Inter (cont.)

The solut ion involves aliasing the offending table to remove the cross Dimension join:

A report using both dimensions will now no longer join on incorrect FKs, yielding the correct result:

Analytics Aliases are identical to aliasing a table in a complexSQL query where a table is usedmore than once.



Many to Many RelationshipsMany to Many Relationships

•OOB, Siebel Analytics employs the use of Primaries to eliminate the many-to-many relationships thatexist in the Siebel OLTP.

•A standard example of this is the desire to see the Primary Employee for an Opportunity (1:M)•If the requirement is to see all of the Employees (the Sales team) for the Opportunity, then there is a

M:M situation•Requests of this type typically fall outside the scope of a pure analytical Data Warehouse, and are

more closely associated with traditional transactional or operational reporting off of the OLTP.•However, these requirements are becoming very common place as Analytics is being sold more and

more as a single reporting interface

•M:M situations present a huge potential to over count/over sum metrics, rendering them incorrect•As an example, for a $1M opportunity with 5 members on the sales team, a report that shows

all five members will each show $1M, and the total will be $5M•Other queries can simply show $5M without any indication that there is more than one record,

which is even more dangerous!•Most of the solutions presented here require an additional table which serves as the

breakout/relationship/intersection table between the opportunity and the employees. For thisexample, the W_OPTY_CON_F table suits this need.

A Star Schema works best with 1:M relationships between Dimensions and Facts. A requirement to make this into a M:M relationship is more difficult to handle in aData Warehousing environment.



Many to Many Relationships (cont.)Many to Many Relationships (cont.)

Technique #1: Select a Primary

The best technical solution to any problem is not to have to deal with it at all! See if the customeris open to the idea of only seeing primaries instead of all records. Putting a high price tag on theeffort might help dissuade this requirement.

There are several techniques to handle the M:M scenario.

Technique #2: Model the Relationship table into a lower level of the dimension•In our example, make a complete copy the entire Employee dimension, including the

Dimensional hierarchy. Rename it to something that indicates that it is a M:M version, such asEmployee (All on Opty)

•Modify the table source to include the W_OPTY_CON_F table•Create a new, lower level in the Employee (All on Opty) Dimensional Hierarchy•Add the OPTY_WID to the new lower level in the Hierarchy•For the Opportunity Fact source, link in the new dimensional hierarchy at the new, lowest level

created

•Create a new presentation table that clearly identifies the use of all employees as opposed to just the primary

•Note that over counting will s till occur




Technique #3a: Simple use of a Bridge TableA special feature of Siebel Analytics, called a Bridge Table, allows for modeling of M:M

requirements•Make the appropriate Physical joins for the Intersection table (W_OPTY_CON_F) to the

Dimension and Fact table•Create a new logical table (Opty Con Bridge), and map into it the intersection table

(W_OPTY_CON_F)•In the Business Model, place the Bridge Table between the Facts and the Dimension, with each

having a complex join to it.•Mark the intersection table as a Bridge Table•Note that over-counting will stil l occur

Physical Model Logical Model

A Bridge Table can be thought of in a similar manner as a Fact table, but without any Facts. They bothare at the “bottom”of the data model (they are only on the Many side of 1:Ms). In our example, weeven used a Fact table as a Bridge table!

The Bridge check box helps Analytics to identify that the table is a Factless Fact table and not aregular Fact table, which helps it to resolve requests and generate SQL.




Technique #3b: Use of a Weighted Bridge Table•Improving on the simple use of a Bridge table is a more complex solution where each

dimensional record associated with a fact is assigned a percentage of the whole.•In our example of 5 employees on one opportunity, each could be assigned 20% of the total

•These percentages (or weighting factors) can then be stored in the Bridge table•Metrics will then need to be adjusted to include in the weighting factor•In our example, there would still be 5 records, but each would be assigned a corresponding

percentage of the total, in this case $200K •The problem of over counting is solved

Physical Model Logical Model

•This technique requires significantly more effort, both on the ETL side as well as the Analyticsmodeling side.

•Furthermore, a business rule to spread the totals among the many dimensional records may bedifficult to arrive at – customers may not wish to do so




Technique #4: Use of a Level Based Measures•Level Based Measures are more of a way of not letting a user over count than anything else•A measure is assigned to a level in a dimensional hierarchy through the content tab of the

logical column

•By doing so, it stipulates that the metric cannot be summed up to higher levels•This includes the Total level, which is the same as not included at all•By forcing the metric to be at the “Many Level”(i.e., the All Employee level), a report will always

break the metric out over that level.•A LBM might be considered with other techniques as well.




In our example, if the OptyRev metric is Level set to the All Employee level, 5 records (for our 1opportunity example) would appear on every report that uses the OptyRev metric. This is most usefulwhen a report uses an attribute in the dimension that is identical for multiple records.

EastJohn

EastScott

WestBill

WestLarry

WestTom

Emp RegionEmp NameIf each of these 5 employee records were associatedwith the same $1M Opportunity, and our report askedfor Emp Region and OptyRev, we would get thefollowing result:

$2MEast

$3MWest

Opty RevEmp Region

The use of a level based measure would prevent the summing up, and might help to indicate to theuser that they tried to do something that they shouldn’t. The user might then realize the mistake, andthen add the names in to make sense of the report.

$1MEast

$1MEast

$1MWest

$1MWest

$1MWest

Opty RevEmp Region



Combo TablesCombo Tables

If the table is primarily a stand-alone table not used with any other dimensions, (as found frequentlywhen adding additional external data sources designed by the customer), then use the followingguidelines:

•Create a dimension table, using the combo table as its source•Create a fact source, using the same combo table as its source•No aliases are needed for this case•A query that incorporates the dimension and the fact will not perform a self-join, but instead

will issue tight SQL against that single table.

A Combo table is one that serves as both a Fact table and a Dimension table.Usage of the table will dictate how the table is to be modeled.



Combo Tables (cont.)Combo Tables (cont.)

•For example, assume that the Physical Table Activities isa combo table and is included in the Activities Dimension

and the Activities Facts without any aliases•Assume that as a Fact, the Combo table joins to the

Dim_Acct table (see top diagram to the right)•Assume that as a Dimension, the Combo table joins to

the W_FACTS_F Fact table (see bottom diagram to theright)

For Combo Tables that are more integrated with other dimensions and facts, thetable should be aliased to avoid over-joins.

When a report on W_FACTS_F is issued with the Account and Activities dimensions included, therewill be an overjoin between the Account and Activities Dimension tables. (Inter-Dimensional Circular

J oin)

Expected Behavior Actual Behavior



Combo Tables (cont.)Combo Tables (cont.)

Just as with the Inter-Dimensional Circular Join solution, the use of an aliasrectifies the problem:

Note that now when Activities Dimension and Activities Fact are used together by themselves,Analytics will issue a self join to the underlying table, which will negatively impact performance.



No Direct Physical LinkNo Direct Physical Link

There are several scenarios where this may occur, including:•A M:M between a dimension and a fact table (as previously discussed), where the intersection

table lies in between the two•Along with a Fact table, the dimension table is a snowflaked, higher table in a dimension (such

as W_HOUSEHOLD_D table in OOB)•When modeling on a physical database that resembles an OLTP type normalized system

Sometimes it is necessary to have a logical relationship between two tables thatare not directly physically related.

There are two strategies to address this type of problem:

1. Force fit any additional tables into the simple Star Schema Logical Model (as in Technique #2 forM:M)•Involves adding the new linking table(s) into one of the existing Logical table’s source•May require the creation of a new level in the hierarchy•Allows the existing logical model to remain intact•May overcomplicate the sources



No Direct Physical Link (cont.)No Direct Physical Link (cont.)

2. Enhance the Logical Model with additional Logical tables (i.e., Snowflakes and Bridges)• Add the table to the Logical Model as a new logical table with the same type of relationship,

either as• a Bridge table (as in Technique #3 for M:M)

• or as a Snowflaked table (e.g., Household is snowflaked in OOB)

Bridge Table Snowflaked Table



No Direct Physical Link (cont.)No Direct Physical Link (cont.)

To help with the distinction between techniques 1 and 2, consider how you could model the TimeDimension if instead of a single W_DAY_D table, it were completely snowflaked into W_DAY_D,(holding only day level attributes) W_WEEK_D, W_MONTH_D, W_QUARTER_D and W_YEAR_D:

Technique #1 would bring all of the associated time tables into a single Table Source for a singleLogical Time Table. The Business Model would remain identical to what it is in OOB.

Technique #2 would create a Logical Table for each of the physical snowflake tables, thereby makingthe logical resemble the physical

The tradeoffs specific to each case will drive the decision of which technique to use.



Additional Best PracticesAdditional Best Practices

A few other Best Practices to consider

• Save backups of the online and offline repositories before and after every completedunit of work.

• Ensure that aliases for presentation layer columns and tables are not used unlessnecessary – unintended effects can occur at report generation time.

• When creating table aliases in the Physical Layer, keep the original table name,followed by the alias name, for example W_DAY_D Hire Date.

• Opaque Views (A Physical Layer table that consists of a Select statement) should beused only as a last resort option.

• In general, push as much processing to the database as possible. This includes taskssuch as filtering, string manipulation and additive measures.

• When importing tables, do so without FKs. It is best to manually add them in an effortto never allow hidden circular joins to be created.



DW Best Practices StrategyDW Best Practices Strategy

•Do as much work at night as possible. In general, let the hard work be captured inthe Physical Model and calculated by the ETL. Doing so will simplify the Analyticsrepository and can significantly improve performance.

•If a Logical Dimension Table is joining all over the place at query time to gatherattributes into one place, then that work should be done in the Physical Data Modeland ETL

•As a rule of thumb, the closer your Warehouse Physical Model resembles a 3NF datamodel (like the Siebel OLTP), the more complex and slower your DW will be.

•Think about making your Physical Data Model resemble what your Logical Model willlook like in the end. Make it look like a clean, simple Star as much as possible

•This is what the Logical Model is set up to be – a simple (but large!) Star Schema

•When extending OOB Analytics, do not be afraid to create a whole new Star ordimension table! By overly modifying existing stars and dimension tables, the MetaDatacan become too complex and perform poorly.

Many of the difficult problems encountered in Analytics are ideally not solved by Analytics but by proper Data Modeling/Data Warehouse design. This isparticularly true with regards to performance.



DW Best PracticesDW Best Practices

Simple Dimensional Attribute Denormalization•One of the best ways to improve query performance and simplify the Analytics

MetaData is to denormalize attributes into the physical dimension tables.•Analytics can easily join to other tables to retrieve a piece of information within a

dimension. Here, Country information is linked into the Account Logical Table atrun time.

•It is recommended to do this in the Data Model by populating Country columns inthe Dim_Account table at ETL time

•Analytics will not have to join at query time to the Dim_Country table –performance improvement

Use of ROW_WIDs for LOVs and other tables•The use of a simple integer based FK join is the fastest type of join•With LOV’s do not join based on Name and Type columns – this makes things

overly complicated

Ensure Referential Integrity in the Database

•Doing so allows inner joins to be created, which better utilize indexes than outer joins•It also ensures that records are not dropped if an inner join is used

Convert Non-Indexable Common Filters to Indexable Filters via ETL•If a commonly used filter is slow, such as a Name not like ‘%SI%’, convert it to

an indexable flag column via the ETL



SummarySummary

1. Eliminate all Physical Layer Circular J oins – Use Aliases!2. Level set all Fact Sources via the Content tab

3. Add new tables into existing sources first, then…4. Create new sources for dimensions to help eliminate over-joins by giving Analytics a

“smaller”option to choose from5. The closer your Physical model resembles the Logical model, the easier and faster

running the Analytics environment will be.

6. Do the heavy lifting in the ETL! There is no replacement for proper Data Model andETL design

For more information and details around these and other Best Practices, please consultInfo Center:Implementation Resources > Competency Centers > Analytics Competency

The key take-aways from this Tech.Update:

analytics best practices

Documents