Modeling Pattern Awareness

2014 Authored by: Hans Hultgren

Modeling Pattern Awareness | 2/28/2014

Modeling Pattern Awareness
The importance of knowing your pattern

Foreword

Over the past decade Ensemble Modeling has been steadily becoming the preferred method for modeling the data warehouse. More recently we have also seen a rise in the schema-less alternatives associated with many of the big data deployments. Where we once had to choose between 3NF and Dimensional modeling, we now have a dynamic range of alternatives (and flavors) that translate into an expanding set of possible patterns. As we move forward in this new world of modeling possibilities, it becomes increasingly important to know and understand your modeling pattern.

When discussing the options, strategies and considerations related to your design pattern it becomes clear that there are no strictly right or wrong choices. The only choice that is inherently wrong is to proceed without defining or understanding your modeling pattern.

Index

Foreword
Index
Modeling the Data Warehouse
    Modeling Pattern Flavors and Alternatives
    Schema-less Alternatives
    Deployment and Architecture Alternatives
Determining Your Pattern
Modeling Pattern Awareness
Want to Learn More?
Links and Resources
About Hans Hultgren
Appendix: Types of Ensemble Infographic

Modeling the Data Warehouse

Today the modeling of the Enterprise Data Warehouse (EDW) is moving towards Ensemble Modeling approaches. These approaches specifically address the data warehouse requirements of agility, traceability, historization and low-latency performance. The Data Vault modeling approach leads this group of Ensemble Modeling methods, but there are several others in the group, including 2G, Anchor, Focal Point, Head & Version, Temporal and 6NF.

From a high-level perspective, the main modeling choices for the DW are now 3NF, Dimensional, and Ensemble modeling. However, two factors expand these choices into a broader array of possibilities. The first is the growing list of modeling pattern flavors and alternatives available today; the second is the pool of Big Data driven schema-less alternatives.

Modeling Pattern Flavors and Alternatives

The flavors of Ensemble modeling begin with the approaches mentioned above (see also the Appendix: Types of Ensemble Infographic). But within each of these flavors there are further choices that define a particular modeling pattern. For example, within the Data Vault modeling approach some of the more defining alternatives include a) Abstracted Concepts, b) Typed Relationships, c) Multi-Active or Multi-Valued Satellites, d) Raw vs Business Vault, e) Peg-Leg Links, f) Transactional Links, g) Link:Link relationships, h) Composite vs Concatenated BK, i) Stand-Alone Reference Tables, j) Standard Attribute Clusters, k) Unit Of Work and mixed cardinality Links, l) End Dating Alternatives, and so on. These alternatives lead to the definition of dozens of different patterns.

Within the traditional approaches of 3NF and Dimensional modeling there are also several defining alternatives to consider. For example, with Dimensional modeling you will need to make active decisions concerning a) Historization, b) Level of Conformity, c) Abstracted Dimensions, d) Outliers, e) Star vs Snowflaking, f) Factless Facts, g) Mini-Dimensions, h) Scope and Number of Facts, i) Bridging, j) Hierarchies, k) Persisted versus In Memory, l) Key Integration, and so on.

In all cases, the choices you make for each of these alternatives will define your specific pattern. While these choices provide you with a great deal of freedom and flexibility, it is critical that you actively make these decisions and then define and document your final pattern. This is because pattern consistency and predictability are critical success factors for your data warehousing team. Without a predictable standard, your automation strategies will not be effective. Without consistency, the agility of your team will be greatly compromised.
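One way to picture a documented pattern that automation can rely on is to record the decisions as data and check table definitions against them. The sketch below is only an illustration under stated assumptions: the class `ModelingPattern`, the function `check_satellite`, the column name `load_end_date`, and the table names are all hypothetical and do not come from any Data Vault standard or tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelingPattern:
    """Hypothetical record of two documented pattern decisions."""
    end_dating: bool  # do Satellites carry an end date column?
    documented_exceptions: frozenset = frozenset()  # tables known to deviate


def check_satellite(pattern, table_name, columns):
    """Return the deviations of one Satellite from the documented pattern."""
    if table_name in pattern.documented_exceptions:
        return []  # a known exception is accepted, not flagged
    problems = []
    has_end_date = "load_end_date" in columns  # assumed column name
    if pattern.end_dating and not has_end_date:
        problems.append(f"{table_name}: missing load_end_date")
    if not pattern.end_dating and has_end_date:
        problems.append(f"{table_name}: unexpected load_end_date")
    return problems


pattern = ModelingPattern(
    end_dating=True,
    documented_exceptions=frozenset({"sat_legacy_customer"}),
)
print(check_satellite(pattern, "sat_employee", ["hub_key", "load_date", "city"]))
print(check_satellite(pattern, "sat_legacy_customer", ["hub_key", "load_date"]))
```

The point of the sketch is that a deviation is only reported when it is not already listed as a documented exception, so the check distinguishes deliberate choices from accidental ones.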

The alternatives covered thus far are related to schema-based modeling techniques and approaches. The other paradigm we need to consider is schema-less modeling.

Schema-less Alternatives

The explosive growth of the Big Data movement has brought with it a great deal of public awareness of our industry. At the same time it has contributed to some misunderstandings and confusion. From the highest level we can look at Big Data as the data we could not manage given the standard tools and techniques we had available. Data falls into this category when a) it is too big, b) it comes at us too fast, c) it changes shape too fast, or d) it lacks structure. Today we have attacked Big Data primarily from the technical direction using Hadoop paired with cloud-based services. But from a modeling perspective the primary delineator is probably best described as “schema-on-read” versus “schema-on-write”. With schema-less, we store the data before we have it aligned with a specific schema and then apply the schema at the time we read the data.
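The schema-on-write versus schema-on-read distinction can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular platform: the `SCHEMA` dictionary, the function names, and the sample record are assumptions made for the example, and plain lists stand in for storage.

```python
import json

# Hypothetical schema: field name -> expected Python type.
SCHEMA = {"employee": str, "zip": str}


def write_with_schema(store, record):
    """Schema-on-write: reject records that do not fit before storing them."""
    for field, expected in SCHEMA.items():
        if not isinstance(record.get(field), expected):
            raise ValueError(f"field {field!r} missing or not {expected.__name__}")
    store.append(record)


def write_raw(store, payload):
    """Schema-on-read: store the payload as received, without interpretation."""
    store.append(payload)


def read_with_schema(store):
    """Apply the schema only at read time; fields not present become None."""
    for payload in store:
        raw = json.loads(payload)
        yield {field: raw.get(field) for field in SCHEMA}


raw_store = []
write_raw(raw_store, '{"employee": "Ann", "height": 170}')  # stored as-is
print(list(read_with_schema(raw_store)))
```

Passing the same record to `write_with_schema` would raise a `ValueError`, because the `zip` field is missing at the time of writing. In the schema-on-read path the record is accepted and the gap only becomes visible when the data is read.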

If you introduce schema-on-read modeling alternatives into your pattern then you have additional choices to make. For example, in Data Vault modeling you may consider using Name-Value Pair (NVP, also known as key-value pair) as an alternative pattern for your Satellites. This means that you would not need to design and model the context attributes (that define a core concept) in advance of loading the data. Each attribute name would be stored as data alongside its corresponding value. Looking at the model, there would be no schema to tell you which attributes you might expect (employee, address, city, state, zip, phone, hire date, height, etc.). To see which context attributes exist you would need to read the data and grab the tags (the “Names” in the NVP) associated with each value (schema-on-read).
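As a minimal sketch of the name-value pair idea described above, the following Python snippet stores each context attribute as a row and discovers the attribute names only when reading. The row layout (key, load date, name, value), the keys such as `EMP-001`, and the helper names are illustrative assumptions, not part of any Data Vault standard.

```python
from datetime import date

# Hypothetical NVP Satellite rows: (hub_key, load_date, name, value).
# There is no column per attribute; the attribute name is itself data.
nvp_satellite = [
    ("EMP-001", date(2014, 1, 1), "city", "Denver"),
    ("EMP-001", date(2014, 1, 1), "state", "CO"),
    ("EMP-001", date(2014, 2, 1), "city", "Stockholm"),
    ("EMP-002", date(2014, 1, 15), "hire date", "2013-06-01"),
]


def discover_attributes(rows):
    """Schema-on-read: find the attribute names by scanning the data."""
    return sorted({name for _, _, name, _ in rows})


def current_context(rows, hub_key):
    """Pivot the latest value of each attribute for one key into a record."""
    latest = {}
    for key, _load_date, name, value in sorted(rows, key=lambda r: r[1]):
        if key == hub_key:
            latest[name] = value  # later load dates overwrite earlier ones
    return latest


print(discover_attributes(nvp_satellite))
print(current_context(nvp_satellite, "EMP-001"))
```

A new attribute, for example `height`, could be loaded as one more row without any change to the table structure. The trade-off is that nothing in the model announces the attribute; a reader only finds it by scanning the names.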

This leads to a set of generic modeling alternatives that you can mix into your specific modeling pattern. Examples of these include a) Hyper-Agility modeling which uses the backbone of Data Vault modeling with NVP Satellites, and b) Focal Point modeling which is an Ensemble modeling method along the lines of Data Vault and Anchor but uses a combination of attributed Satellites for core static context and groupings of NVP Satellites for expanded and dynamic context.

Deployment and Architecture Alternatives

Beyond the modeling patterns themselves there are also deployment and architecture alternatives to consider. So far we have been discussing enterprise data warehouse (EDW) deployments. However, you may be working with other flavors of data integration initiatives such as a) Operational Integration, b) Operational Data Store or ODS, c) Regional or Subject Area DW, d) Analytical Applications, e) Raw DW only, f) BDW only, and so on. Your goals and requirements may or may not include full auditability, enterprise integration or full historization. These variables will have a big impact on your architecture and your modeling pattern. Just as with the modeling pattern, the alternatives we have represent a high degree of freedom; but in the end you will need to determine and define the characteristics of your specific deployment. This includes the requirements, goals, architecture and modeling pattern.

Determining Your Pattern

The road to defining your modeling pattern begins with an open mind, a willingness to learn, and the time and resources for analysis and education. Since many of the alternatives are new and evolving you should consider attending training courses and seminars to learn these techniques. In the end you should be comfortable with the mechanics and implications (the pros and cons) of each modeling pattern decision you make.

As with all initiatives, the goals should be defined by business requirements. The modeling decisions that follow are most often driven by some form of business-driven criteria as well. Constant communication with business stakeholders is perhaps the number one contributor to the success of your data warehouse program.

Modeling Pattern Awareness

The only consistently bad modeling decision is the uninformed decision. Knowing your requirements and goals, knowing the alternatives available to you, knowing the implications of each alternative, and selecting the one you think is best will always result in the best possible solution. And there is no modeling police force that will arrest you for applying exceptions to your modeling pattern. Modeling exceptions are inevitable. The only modeling pattern exception that is always wrong is the one you don’t know is an exception. If you are unaware that you are blending patterns, mixing paradigms, or deploying exceptions, then you will not have a consistent, predictable and effective modeling pattern. The resulting modeling mash-up will not result in an agile data warehouse program.

Awareness leads to consistency, which leads to standardization and repeatable patterns, which leads to efficiency and effective automation, which leads to an agile data warehouse program.

Want to Learn More?

The Data Vault Ensemble is the predominant Ensemble Modeling technique in the world today. Understanding the full benefits of the Data Vault Ensemble starts with getting your certification. This process is facilitated by Genesee Academy along with regional partners throughout the world. The course includes materials, online video lessons, and two (2) days in the classroom with lectures, labs and group modeling exercises. On the last day there is an exam which results in the Certified Data Vault Data Modeler (CDVDM) designation.

Please visit GeneseeAcademy.com for more information on course schedules and registration.

In Sweden please visit TopofMinds.se for course schedules and special events.

Links and Resources

For more free information and other resources please visit these links on Twitter, Blogspot, WordPress, LinkedIn and YouTube.

About Hans Hultgren

Hans Patrik Hultgren is President of Genesee Academy and a Principal at TopOfMinds AB. Hans is a Data Warehousing and Business Intelligence educator, author, trainer, advisor and consultant. He is currently working on Business Intelligence and Enterprise Data Warehousing (EDW) with a focus on Data Vault, primarily in Stockholm, Amsterdam, Denver and NYC. Hans recently published a book, “Modeling the Agile Data Warehouse with Data Vault”, which is available on Amazon websites. Specialties include EDW, Ensemble Modeling, Data Vault, Agile Data Warehousing, Big Data Integration, Education, Training, e-Learning, and Entrepreneurship.

Appendix: Types of Ensemble Infographic