information architech and dwh with powerdesigner

37
INFORMATION ARCHITECTURE FOR DWH PROJECTS RUAIRI PRENDIVILLE SENIOR CONSULTANT , SYBASE (UK) JUNE I4 TH , 2012, ISTANBUL

Upload: sybase-tuerkiye

Post on 21-Nov-2014

1.211 views

Category:

Technology


2 download

DESCRIPTION

14 Haziran 2012 tarihinde Sybase Türkiye tarafından yapılan PoerDesigner etkinliğindeki DWH ve IA projelerindeki PowerDesigner'in önemi konulu sunum

TRANSCRIPT

Page 1: Information Architech and DWH with PowerDesigner

INFORMATION ARCHITECTURE FOR DWH PROJECTS

RUAIRI PRENDIVILLE SENIOR CONSULTANT , SYBASE (UK) JUNE I4TH, 2012, ISTANBUL

Page 2: Information Architech and DWH with PowerDesigner

2 – Company Confidential – June 18, 2012

AGENDA

• Introduction to DWH

– Are DWH project complex, challenges, requirements

• Information Architecture (EA) for DWH

–Models, Workflows, Artifacts

• Mapping over Sybase PowerDesigner

–Mapping to IA models and artifacts

– Features of interest for DWH

• Demonstration

–One example of IA architecture on the project

• DWH Referent IA

– Layers and recommendations

Page 3: Information Architech and DWH with PowerDesigner

3 – Company Confidential – June 18, 2012

COMPLEXITY OF DWH PROJECTS

ARE DWH PROJECTS COMPLEX?

Page 4: Information Architech and DWH with PowerDesigner

4 – Company Confidential – June 18, 2012

Page 5: Information Architech and DWH with PowerDesigner

5 – Company Confidential – June 18, 2012

BI/DWH COMPLEXITY

Causes, sources

– Data Sources Different sources, technologies, business functions, legacy,

overlapping, concepts, elements

– Scope and Performances, • Never enough, never on time, content variations(!),

– Participants • Different backgrounds, knowledge, skills, motivation, visions,

– Requirements • Continuous changes and extensions,

– Growth and Development • Volume of data, people, reports and analysis,

–Quality • Clean, right, correct on time, cleansing (division of resp.)

Page 6: Information Architech and DWH with PowerDesigner

6 – Company Confidential – June 18, 2012

BI/DWH COMPLEXITY

Change, Heterogeneity

– Never enough Users/BI Analyst can not give definite, detailed, complete and

precise specification of all reports/views in advance (!),

Changes are coming on the end, they are inevitable and continual,

– Never on time Every change should be implemented and used in usable time

frame, before user forgets about it,

– Never one and exactly one data source, Different sources results in: – Different DMS technology, different refresh rate, different volume,

different performances (management).....

– Data overlapping – consolidation,

Page 7: Information Architech and DWH with PowerDesigner

7 – Company Confidential – June 18, 2012

BI/DWH COMPLEXITY

Volume and growth, data quality

– Large Volume and intensive Growth is inevitable, Operational Data Sources, – Keep only data set needed for operational work (year?),

– Keep it in the shape suitable for operational work (Relational),

– Analytical Extension, Keep data needed for sound analysis (many years?),

Keep it in the shape suitable for analysis (MDM),

– Growth is inevitable per time and volume,

– Compromises: Time: keep last (x) months or representative sample,

Nobody is actually happy, neither IT or Business

– Data Quality Clean, consolidated data source does not exists,

Every data source needs “housekeeping” constantly, Data Entry/ETL

Page 8: Information Architech and DWH with PowerDesigner

8 – Company Confidential – June 18, 2012

BI/DWH COMPLEXITY

Performance

– Never fast enough Use of technology not designed/suited for analytics (RDBMS),

Intensive use of „ad hoc“ request – indexing problem (RDBMS), – Free exploration over arbitrary data set is heavily limited,

Very intensive and heavy administration – never ending story,

–One and only one complete “Version of the truth” Similar or overlapping analysis are presenting different data!! – Which one is correct and right? What about the rest of it?

You are publisher – hold the responsibilities, – Hold the reader's trust,

– Publish on regular basis,

– Use variety of sources and edit them with quality and consistency,

Data consistency must be established and protected,

Page 9: Information Architech and DWH with PowerDesigner

9 – Company Confidential – June 18, 2012

INFORMATION ARCHITECTURE FOR DWH

WORKFLOWS AND MODELS

Page 10: Information Architech and DWH with PowerDesigner

10 – Company Confidential – June 18, 2012

INFORMATION ARCHITECTURE FOR DWH

Position of EA

Page 11: Information Architech and DWH with PowerDesigner

11 – Company Confidential – June 18, 2012

INFORMATION ARCHITECTURE FOR DWH

Position of IA

Motivation, goals, business principles organizational structure, Business Functions, Services and Processes

IT support for Business Architecture System Services, Applications, Databases, Components, Forms, Reports, Data Flows....

Technology for IS Architecture, network, servers, installed instances, Access points,

From current to planned

Page 12: Information Architech and DWH with PowerDesigner

12 – Company Confidential – June 18, 2012

INFORMATION ARCHITECTURE FOR DWH

Method (ADM)

– To Define Architecture Development Method Define at any point of the project Who is doing What, How and When

Define Phases, Workflows, Artifacts, Models, and Deliverable

– Essential for DWH/BI with Backward Requirement process

– Presented ADM and IA is: Agile – simplification of RUP, TOGAF

Iterative and Incremental – cyclic repetition of workflows,

Data Driven – based on Data Assets

Comprehensive – includes all activities including maintenance and RFC

Model Driven – all artifacts are represented with modeling artifacts

Requirement Driven – placed in the center of methodology

Sustainable – at any point knowledge is collected, formally specified and properly presented

Page 13: Information Architech and DWH with PowerDesigner

13 – Company Confidential – June 18, 2012

INFORMATION ARCHITECTURE FOR DWH

Main ADM Cycle

Requirements&Constraints in the center

Not all are mandatory

Presented main cycle, others possible

Many cycles are expected

All that is needed to obtain sustainable system

• Development

• Deployment

• Maintenance

Active, in IME

Page 14: Information Architech and DWH with PowerDesigner

14 – Company Confidential – June 18, 2012

INFORMATION ARCHITECTURE FOR DWH

B.Data Analysis

– Objective: Discover, identify, collect, elaborate, specify, define and present Data Assets

Different abstraction levels: from conceptual to implementation

– Viewpoints: Architectural viewpoint, Data Providers and Consumers, data flow process,

engaged systems, applications, components, usage, access rights

Structural viewpoint, structure, attributes, relationships, dependencies, rules applied on conceptual and physical level

– Inputs: IA (others not in the scope),

–Outputs DA, Sources Conceptual and Physical Data

Page 15: Information Architech and DWH with PowerDesigner

15 – Company Confidential – June 18, 2012

INFORMATION ARCHITECTURE FOR DWH

C.DWH Design and Implementation

– Objective: D&I of integrated and unified data collection, organized in dimension of

time, which is subject oriented used for analysis, planning and evaluation of business performances

Establish common view (unified/integrated/complete) over the enterprise data, stable source of historical information, accommodate data growth

– Activities: Full and detailed schema specification for DWH, Staging and ODS

– Inputs: IA & DA, Sources Conceptual/Physical Data,

–Outputs IA & DA, Conceptual and Physical DWH

Page 16: Information Architech and DWH with PowerDesigner

16 – Company Confidential – June 18, 2012

INFORMATION ARCHITECTURE FOR DWH

D.ETL Design and Implementation –Objective:

To analyze, elaborate, define, specify, present and implement full and incremental ETL flows between source2staging, staging2DWH, DWH2MDM

To discover, identify, collect, elaborate, specify, define and present all characteristics of ETL flows,

– Activities: Extraction Method, Schema, Condition and Frequency for increments,

– Source 2 Target Mapping,

Transformation processes on the appropriate level of details,

Data Flow Architecture, Trash management

– Inputs: IA&DA, Sources Conceptual/Physical Data and DWH,

–Outputs DA, Data Flow, Sources Conceptual/Physical Data,

Page 17: Information Architech and DWH with PowerDesigner

17 – Company Confidential – June 18, 2012

INFORMATION ARCHITECTURE FOR DWH

E.BI Design and Implementation –Objective:

Establish Multidimensional space (Business Universe) with Facts, Measures, Dimensions and Hierarchies,

Build visualization including Reports, Dashboards, OLAP views,

Check if requested KPI set is supported and presented,

– Activities: MDM Space (above)

Detailed specification of requested KPI with mapping

Detailed specification of Reports, Dashboards and OLAP Views

Access rights and delivery mechanisms

– Inputs: DWH, IA and DA

–Outputs IA, DA, DWH (MDM) models

Page 18: Information Architech and DWH with PowerDesigner

18 – Company Confidential – June 18, 2012

INFORMATION ARCHITECTURE FOR DWH

Paths – Simplified Analysis, Design and Implementation

Generally many paths are possible

Gap Analysis may discover missing info

Analysis, Design and Implementation of DWH, ETL and BI are tightly interconnected and dependent on each other

Page 19: Information Architech and DWH with PowerDesigner

19 – Company Confidential – June 18, 2012

INFORMATION ARCHITECTURE FOR DWH

Models – Business&Information Architecture

– Specifies Application systems and applications,

Data Assets, Databases, Data Source/Destination, Data Providers/Consumers,

Usage of Data Assets and Applications, cooperation and collaboration of Applications and/or services,

ownership over the Data Assets, Applications and Services, elements of SLA

ETL procedures on high abstraction level.

– Viewpoints Architecture of the system

– Represents a “hat” for the rest of the system

Page 20: Information Architech and DWH with PowerDesigner

20 – Company Confidential – June 18, 2012

INFORMATION ARCHITECTURE FOR DWH

Models - Source Model

– Specifies details to understand structural relationships and meaning (conceptual)

internal structure of data source with implementation details (physical)

– tables, columns, views, keys, procedures, indexes, rights, constraints, triggers

consolidated by bidirectional synchronization and associated transformation

Specifies extraction scheme for every data source,

Source for Source2Target mapping

– Viewpoint One or more diagrams per source to represent subject area

– Used to synchronize changes from source into DWH IA Starting point for change management

Page 21: Information Architech and DWH with PowerDesigner

21 – Company Confidential – June 18, 2012

INFORMATION ARCHITECTURE FOR DWH

Models – DWH model

– Specifies details to understand structural relationships and meaning (conceptual)

internal structure of DWH with implementation details (physical),

Internal MDM structure of Data Marts (physical),

For Staging, Trash, ODS, DWH and MDM

Target for Source2Target mapping

Relational2MDM mapping,

– Viewpoints One or more diagrams to represent subject area

One or more MDM diagrams to represent Data Marts

– Used to synchronize changes from DWH IA to actual RDBMS Ending point for change management

Page 22: Information Architech and DWH with PowerDesigner

22 – Company Confidential – June 18, 2012

INFORMATION ARCHITECTURE FOR DWH

Models – Data Flow model

– Specifies Connects all important data sources to destination

Data flows from source to destination with all attributes and constraints

– characteristics of flow processes, source and destination tables, kind of the flow (ETL, replication or federation), integration preconditions, used integration service, possible outcomes etc.

Mapping source2target within every step of the flow

– Viewpoints One or more diagrams to represent actual flow task

– Aggregation, sort, filter, projection, split, join, merge, lookup

One or more diagrams to represent transformation control flow

– Used to present integration, consolidation and migration

Page 23: Information Architech and DWH with PowerDesigner

23 – Company Confidential – June 18, 2012

MAPPING ON SYBASE POWERDESIGNER

DWH RELATED MODELS AND FEATURES

Page 24: Information Architech and DWH with PowerDesigner

24 – Company Confidential – June 18, 2012

MAPPING ON SYBASE POWERDESIGNER

Data Models

Page 25: Information Architech and DWH with PowerDesigner

25 – Company Confidential – June 18, 2012

MAPPING ON SYBASE POWERDESIGNER

Architecture and Requirements

Page 26: Information Architech and DWH with PowerDesigner

26 – Company Confidential – June 18, 2012

MAPPING ON SYBASE POWERDESIGNER

DHW related features – Dependency Matrix

–What Two dimensional hierarchical matrix

Present, review and create/delete links of particular kind between two artifacts

– Any model, any diagram, any two artifacts

– Indirect (two or more links) dependency, drilling

Hierarchy of objects on row/column, Copy to CVS

– Reasoning Full, rich, useful dependency analysis (network)

EAM to understand and present dependency between Data Assets and Data Providers and Consumers,

PDM to create mapping overview

DMM to present actual source to target dependencies

Page 27: Information Architech and DWH with PowerDesigner

27 – Company Confidential – June 18, 2012

MAPPING ON SYBASE POWERDESIGNER

DHW related features – Mappings

–What Modeling Connection between objects

– Mapping with transformation (O/R, R/R, O/O)

– Generation (Generate Mappings)

Wizard to convert mappings into Transformation Task (ILM)

Mapping Editor

– Reasoning Data Flow specification

Relational to Multidimensional,

– DWH2MDM

Relational to Relational,

– Source2Target

Any descriptive dependency

– Federation concept (not Replication or ETL)

Page 28: Information Architech and DWH with PowerDesigner

28 – Company Confidential – June 18, 2012

MAPPING ON SYBASE POWERDESIGNER

DHW related features – Impact/Lineage Analysis

–What Impact Analysis – consequences of the change

Lineage Analysis – objects forming the basis for object

Temporary View for Review

IAM for permanent view, snapshot (Drilling, Exploring)

Analysis Rules changeable (Impact/Lineage),

– Reasoning Change Management evaluation, estimation and planning,

– To asses change impact before it happens (costs, time, resources)

Snapshots

– Development points, different version of system,

Meta Data BI

– To explore meta-data set, discover implicit dependencies,

Page 29: Information Architech and DWH with PowerDesigner

29 – Company Confidential – June 18, 2012

DEMONSTRATION

EXAMPLE

Page 30: Information Architech and DWH with PowerDesigner

30 – Company Confidential – June 18, 2012

REFERENT INFORMATION ARCHITECTURE

MAIN LAYERS OF INFORMATION ARCHITECTURE

Page 31: Information Architech and DWH with PowerDesigner

31 – Company Confidential – June 18, 2012

REFERENT INFORMATION ARCHITECTURE

Recommended architecture - example

Page 32: Information Architech and DWH with PowerDesigner

32 – Company Confidential – June 18, 2012

REFERENT INFORMATION ARCHITECTURE

Data Access Layer - recommendations

– Set of processes, tools, activities and models: Data extraction (E) from operational system to DWH,

Transformation to suitable shape (T),

– Data cleansing, consistency check, integrity,

– Translation from operational to enterprise format,

– Enterprise DWH data structure is inevitable different then operational,

Data loading into DWH (L),

– Recommendations : Understand OS structure, rules and dynamics,

Dynamics of data refresh rate should be realistic,

Changed Data Capture Algorithm: Intrusive/Non Intrusive,

Apply effective ETL/ELT, use staging area,

Document everything - very intensive and complex changes,

– Mappings between Data Sources and DWH Destination,

Page 33: Information Architech and DWH with PowerDesigner

33 – Company Confidential – June 18, 2012

REFERENT INFORMATION ARCHITECTURE

DWH Layer and Recommendations – DWH, Staging, Trash and ODS

Common view on enterprise data, regardless of how/who will use it,

Unification offers flexibility in how the data is later interpreted

A stable source of historical information,

Efficient accommodation of a data explosion (growth),

Supply data for Analytical layer on required granularity,

Trash and Alerts&Matching to jump over initial cleansing (blocker),

– Recommendations: Use pre-packaged solution and existing experience,

Use relational and multidimensional modeling (document all),

Relational to address performance issues, follow OS paradigm,

Multidimensional to present later Business Universe for Analytics,

Use views to transform and map Relational to Multidimensional and back,

Page 34: Information Architech and DWH with PowerDesigner

34 – Company Confidential – June 18, 2012

REFERENT INFORMATION ARCHITECTURE

Delivery Layer and Recommendations – Set of processes, tools, activities and models:

Selection of data subset to be delivered,

Reorganization (format) of the data to be delivered,

– Aggregation, Summing, Counting, additional classification,

– Data transformation (Date, Time), slowly changing dimensions

Transform DWH structures to “Business Universe”,

– Facts, Measurements, Dimensions, Hierarchies, Business lang. abstraction

Granularity accordingly to the End User needs,

– Recommendations: Use multidimensional modeling,

Model transformation/mappings between DWH and Data Mart(s),

Extend multidimensional model with required BI meta-data,

Define refresh rate on the basis on user needs (constraints),

Aggregations are difficult for incremental update

Page 35: Information Architech and DWH with PowerDesigner

35 – Company Confidential – June 18, 2012

REFERENT INFORMATION ARCHITECTURE

MDM Layer and Recommendations – Data marts

DWH derivatives, provide the business community answers to asked questions and strategic analysis,

Tailored for a particular capability or function of enterprise,

Vertically organized and bounded to one business function,

Organized in multidimensional structure,

OLAP – On line Analytical Processing – ROLAP/MOLAP

– Recommendations: Choose proper storage technology (ROLAP/MOLAP),

– Special storage may not be standard and may narrow your choices,

– Common storage may not be performative enough,

Choose Virtual Marts as basement of Analytics,

Use separate V. Mart for separate business concerns,

Adjust to BI meta-data requirements, use automated access,

Page 36: Information Architech and DWH with PowerDesigner

INFORMATION ARCHITECTURE FOR DWH PROJECTS

QUESTIONS?

RUAIRI PRENDIVILLE SENIOR CONSULTANT , SYBASE (UK) JUNE I4TH, 2012, ISTANBUL

Page 37: Information Architech and DWH with PowerDesigner