lecture @dhbw: data warehouse part iv: …buckenhofer/20181dwh/buckenhofer-d… · top-down vs...

Post on 05-Aug-2018

217 Views

Category:

Documents

1 Downloads

Preview:

Click to see full reader

TRANSCRIPT

A company of Daimler AG

LECTURE @DHBW: DATA WAREHOUSE

PART IV: FRONTEND, METADATA, PROJECTS, ADVANCED TOPICSANDREAS BUCKENHOFER, DAIMLER TSS

ABOUT ME

https://de.linkedin.com/in/buckenhofer

https://twitter.com/ABuckenhofer

https://www.doag.org/de/themen/datenbank/in-memory/

http://wwwlehre.dhbw-stuttgart.de/~buckenhofer/

https://www.xing.com/profile/Andreas_Buckenhofer2

Andreas BuckenhoferSenior DB Professionalandreas.buckenhofer@daimler.com

Since 2009 at Daimler TSS Department: Big Data Business Unit: Analytics

As a 100% Daimler subsidiary, we give

100 percent, always and never less.

We love IT and pull out all the stops to

aid Daimler's development with our

expertise on its journey into the future.

Our objective: We make Daimler the

most innovative and digital mobility

company.

NOT JUST AVERAGE: OUTSTANDING.

Daimler TSS

INTERNAL IT PARTNER FOR DAIMLER

+ Holistic solutions according to the Daimler guidelines

+ IT strategy

+ Security

+ Architecture

+ Developing and securing know-how

+ TSS is a partner who can be trusted with sensitive data

As subsidiary: maximum added value for Daimler

+ Market closeness

+ Independence

+ Flexibility (short decision making process,

ability to react quickly)

Daimler TSS 4

Daimler TSS

LOCATIONS

Data Warehouse / DHBW

Daimler TSS China

Hub Beijing

10 employees

Daimler TSS Malaysia

Hub Kuala Lumpur

42 employeesDaimler TSS IndiaHub Bangalore22 employees

Daimler TSS Germany

7 locations

1000 employees*

Ulm (Headquarters)

Stuttgart

Berlin

Karlsruhe

* as of August 2017

5

• After the end of this lecture you will be able to

• Understand function of Frontend Tools and Information Design

• Understand the necessity for metadata

• Understand lifecycle of DWH projects

• Advanced topics like Operational BI, DWH Appliances, Cloud BI

WHAT YOU WILL LEARN TODAY

Data Warehouse / DHBWDaimler TSS 6

FRONTEND

LOGICAL STANDARD DATA WAREHOUSE ARCHITECTURE

Data Warehouse / DHBWDaimler TSS 8

Data Warehouse

FrontendBackend

External data sources

Internal data sources

Staging Layer(Input Layer)

OLTP

OLTP

Core Warehouse

Layer(Storage

Layer)

Mart Layer(Output Layer)

(Reporting Layer)

Integration Layer

(Cleansing Layer)

Aggregation Layer

Metadata Management

Security

DWH Manager incl. Monitor

VISUALIZATION IN THE USUAL CASE OF LIFE

Data Warehouse / DHBWDaimler TSS 9

RUSSIAN CAMPAIGN OF NAPOLEON

Data Warehouse / DHBWDaimler TSS 10

Source: https://de.wikipedia.org/wiki/Charles_Joseph_Minard

MAPPING THE 1854 LONDON CHOLERA OUTBREAK

Data Warehouse / DHBWDaimler TSS 11

Source: https://www1.udel.edu/johnmack/frec682/cholera/

MAPPING THE 1854 LONDON CHOLERA OUTBREAK

Data Warehouse / DHBWDaimler TSS 12

EXCERCISE: VISUALIZE AS MUCH AS POSSIBLE

Data Warehouse / DHBWDaimler TSS 13

Umsatz in €

2014 2015 2016

Kanada 16.000 14.000 17.000

England 8.000 9.000 8.000

Frankreich 7.000 4.000 5.000

USA 60.000 85.000 90.000

Deutschland 4.000 10.000 15.000

Australien 10.000 8.000 15.000

Umsatz 105.000 130.000 150.000

POSSIBLE SOLUTION 1

14Data Warehouse / DHBWDaimler TSS

POSSIBLE SOLUTION 2

15Data Warehouse / DHBWDaimler TSS

• Reporting (Standard, ad-hoc)

• OLAP

• Dashboards, Scorecards

• Advanced Analytics / Data Mining / Text Mining

• Search & Discovery

INTERFACE TO THE END USER

Data Warehouse / DHBWDaimler TSS 16

Standard Reports

• Prepared static reports that can be executed at request by end users

• Are executed at the end of an ETL process and e.g. send by email to end users

• Normally based on fact tables and its dimensions

• Reports are often lists similar to Excel-Sheets but can also contain graphics (e.g. line charts)

Ad-hoc Reports

• End users create their own reports („Self service“)

REPORTING (STANDARD, AD-HOC)

Data Warehouse / DHBWDaimler TSS 17

ROLAP / MOLAP Client Frontend

• Prepared cubes (multidimensional or relational fact tables)

• User can perform interactive analysis of data

• Rollup / drill-down

• Pivot

• Slicing

• Dicing

OLAP

Data Warehouse / DHBWDaimler TSS 18

„Progress reports“

Provide an overall view of KPIs (Key Performance Indicators)

Combination of several elements from Reporting and/or OLAP (e.g. line charts) into an overall view (like a „cockpit“)

Dashboard is more focused on operational goals

• High-level overview what is happening

Scorecard is more focused on strategic goals

• Plan a strategy and identify why something happens

DASHBOARDS, SCORECARDS

Data Warehouse / DHBWDaimler TSS 19

See Mr. Bollinger‘s lecture

ADVANCED ANALYTICS / DATA MINING / TEXT MINING

Data Warehouse / DHBWDaimler TSS 20

Not just numerical data

Analysis of new data types gets more and more important

• Text

• GPS coordinates

• Pictures

• Videos

Data can be available in RDBMS (e.g. text modules/indexes available), Hadoop or SQL DBs

SEARCH & DISCOVERY

Data Warehouse / DHBWDaimler TSS 21

MANY GRAPHICAL ELEMENTS TO USE IN REPORTS

Data Warehouse / DHBWDaimler TSS 22

Source: https://github.com/d3/d3/wiki/Gallery

MANY GRAPHICAL ELEMENTS … CHAMBER OF HORROR

Data Warehouse / DHBWDaimler TSS 23

Source: Hichert / Faisst, http://www.backup-page.hichert.com/

DO YOU USE 3D USUALLY ?

Data Warehouse / DHBWDaimler TSS 24

Some remarks about previous slide

• 3D elements introduce clutter and give not more information

• Pie chart most often does not make sense

• Line chart barely readable

• Labels are placed outside of the graphic

• Tachometer costs a lot of space and show

• Too much color in general

• Color without meaning, e.g. red should be used for alarms / errors

MANY GRAPHICAL ELEMENTS … CHAMBER OF HORROR

Data Warehouse / DHBWDaimler TSS 25

STORY TELLING WITH APPROPRIATE VISUALIZATION

Famous example by Hans Rosling (watch 3:08 onwards)

https://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen?language=de

Data Warehouse / DHBWDaimler TSS 26

Information design is the practice of presenting information in a way that fosters efficient and effective understanding of it.(source: Wikipedia, https://en.wikipedia.org/wiki/Information_design )

Some authors are well known for their criticism of many graphical representations - they provide rules for good information design

• Edward Tufte

• Stephen Few

• Rolf Hichert

INFORMATION DESIGN

Data Warehouse / DHBWDaimler TSS 27

Define standards, e.g.

• use always the same colors and with care, e.g.

• red = negative

• green = positive

• pie charts are rarely useful and should be avoided

• better use bar chart or line chart

• No 3D elements as these elements don’t enhance information but introduce clutter

• Standardize abbreviations, e.g. PY = previous year

INFORMATION DESIGNREDUCE TO THE ESENTIALS

Data Warehouse / DHBWDaimler TSS 28

TABLE WITH INTEGRATED BAR CHARTS

Data Warehouse / DHBWDaimler TSS 29

Source: Hichert, http://www.hichert.com/de/resource/table-template-02/

WHICH PRODUCTGROUP HAS THE HIGHEST WIN IN JUNE?

30Data Warehouse / DHBWDaimler TSS

WHICH PRODUCTGROUP HAS THE HIGHEST WIN IN JUNE?EYE TRACKING

31Data Warehouse / DHBWDaimler TSS

WHICH PRODUCTGROUP HAS THE HIGHEST WIN IN JUNE?IMPROVED VERSION

32Data Warehouse / DHBWDaimler TSS

WHICH PRODUCTGROUP HAS THE HIGHEST WIN IN JUNE?EYE TRACKING

33Data Warehouse / DHBWDaimler TSS

EYE-TRACKING - BEFORE AND AFTER

34Data Warehouse / DHBWDaimler TSS

Consumers / BI Users

• use reports, OLAP and dashboards to obtain information

Power Users

• Use reports , OLAP and dashboards to obtain information

• Create new reports and dashboards

Data Scientists

• Statistical / mathematical geeks

• Analyze / explore data

• Need to analyze raw (non-cleansed, non-transformed) data

BI END USER ROLES

Data Warehouse / DHBWDaimler TSS 35

META DATA MANAGEMENT

WHAT IS METADATA?

Data Warehouse / DHBWDaimler TSS 37

Data about

other data

Business Metadata

• Definition of business vocabulary and relationships

• Definition of the value range

• Linkage to physical representation

TYPES OF METADATA

Data Warehouse / DHBWDaimler TSS 38

Report and ETL metadata

• Report definitions

• Data sources

• Column definitions

• Computations

Logical and physical metadata of data model

• Table structure

• Definition of columns

• Relationships between tables and columns

• Dimension hierarchy

TYPES OF METADATA

Data Warehouse / DHBWDaimler TSS 39

THE AREAS OF METADATA

Data Warehouse / DHBWDaimler TSS 40

THE AREAS OF METADATA CONNECTED

Data Warehouse / DHBWDaimler TSS 41

Components of a data warehouse system are interconnected

• BI report user has to know

• the meaning, definitions of the shown measures, „KPIs“ (key performance indicators)

• BI report designer has to know

• the table definitions

• the meaning of the column values

• ETL job designer has to know

• the table definitions or the exact definition of the measures

• Database administrator has to know

• Which tables are used by ETL jobs, reports

WHY A COMMON METADATA REPOSITORY?

Data Warehouse / DHBWDaimler TSS 42

Metadata driven ETL development

• Generate parts of ETL code

• increasing interest for Data Vault development projects

• Tools e.g. MID Innovator, Quipu, AnalytiX DS, Talend, Pentaho, Wherescape, andothers

Common metadata repository ensures consistency across all components

• Many tools involved (DB, ETL, Frontend, …)

Enables cross component metadata analysis

• Data Lineage

• Impact Analysis

WHY A COMMON METADATA REPOSITORY?

Data Warehouse / DHBWDaimler TSS 43

“Data lineage”

• Import & Browse Full BI Report Metadata

• Navigate through report attributes

• Visually navigate through data lineage across tools

• Combines operational & design viewpoint

WHY A COMMON METADATA REPOSITORY?

Data Warehouse / DHBWDaimler TSS 44

“Impact Analysis”

• Show complete change impact in graphical or list form

• Includes impact on reports in BI tools

• Visually navigate through impacted objects across tools

• Allows impact analysis on any object type

WHAT HAPPENS IF I CHANGE THIS COLUMN?

Data Warehouse / DHBWDaimler TSS 45

• Show relationships between business terms, data model entities, and technical and report fields

• Requires cross-tool mapping of business terms

• Allows field meaning to be understood

• Allows business term relationships to be understood

WHAT DOES THIS FIELD MEAN?

Data Warehouse / DHBWDaimler TSS 46

• Shows objects that user manages

• Shows stewardship relationships on business terms

• Shows user group associations

WHAT OBJECTS DOES THIS USER OWN?

Data Warehouse / DHBWDaimler TSS 47

• Navigation through complete job details

• Navigation of complete operational metadata

WHAT HAPPENED ON THE LAST JOB RUN?

Data Warehouse / DHBWDaimler TSS 48

DATA WAREHOUSING PROJECTS

LOGICAL STANDARD DATA WAREHOUSE ARCHITECTURE

Data Warehouse / DHBWDaimler TSS 50

Data Warehouse

FrontendBackend

External data sources

Internal data sources

Staging Layer(Input Layer)

OLTP

OLTP

Core Warehouse

Layer(Storage

Layer)

Mart Layer(Output Layer)

(Reporting Layer)

Integration Layer

(Cleansing Layer)

Aggregation Layer

Metadata Management

Security

DWH Manager incl. Monitor

Top Down (Inmon)

Bottom Up (Kimball)

Top-Down (Inmon)

• Comprehensive approach regarding available data

• Design Core Warehouse Layer = integrated data model first considering all requirements

• Design data marts afterwards

Bottom-Up (Kimball)

• Approach focusing on fast delivery of first results

• Design one data mart first

• Next Marts are modeled afterwards usually using Kimball architecture

• conformed dimensions to integrate different data marts / fact tables

TOP-DOWN VS BOTTOM-UP APPROACH

Data Warehouse / DHBWDaimler TSS 51

TOP-DOWN VS BOTTOM-UP APPROACHADVANTAGES AND DISADVANTAGES

Data Warehouse / DHBWDaimler TSS 52

Top-Down (Inmon) Bottom-Up (Kimball)

☺ Core Warehouse Layer is designed optimal ☺ Early involvement of end users

☺ Data from Core Warehouse Layer is reused in many Marts

☺ Fast results

Time-consuming approach with high preparatory effort

Focus on single Marts leads to risk that overall view is lost, esp. properly designed Core Warehouse Layer

High risk with changing requirements Data often not reused but inconsistently copied across Marts

Both approaches have their down-sides

• Top-Down takes enormous initial effort to build data model for Core Warehouse Layer

• Bottom-Up is risky as central / integrated focus is lost

Think big, start small

• Think Big: Design conceptual data model for Core Warehouse Layer covering whole enterprise

• Start small: Implement physical data model for Core and Mart Layer in iterations by each business department

THINK BIG, START LOCAL

Data Warehouse / DHBWDaimler TSS 53

• DWH is not a product

• DWH databases are more complex with different layers and data models

• Data first, code is secondary

• Data quality is a major concern

• Data integration is a challenging objective

• Business need difficult to justify quantitatively

WHAT’S DIFFERENT IN DWH PROJECTS?

Data Warehouse / DHBWDaimler TSS 54

WHY DO DWH PROJECTS FAIL?

Data Warehouse / DHBWDaimler TSS 55

Feasibility study Analysis Design Implementation TestOperations and maintenance

PROJECT PHASESSMALL ITERATIONS INSTEAD OF LONG PHASES

Data Warehouse / DHBWDaimler TSS 56

Organizational team that coordinate and standardize DWH activities within an (end user) organization

• Define standards and create BI portfolio (e.g. which tools/products to use)

• Create DWH architecture and govern BI activities

• Establish processes for business and IT interaction

• Monitor DWH/BI market for new trends

• Determine skills and experience of Business users

BICC: BI CENTER OF EXCELLENCE

Data Warehouse / DHBWDaimler TSS 57

Define 3-5 criteria for the evaluation of an ETL tool

How does a relational DBMS (like Oracle, DB2, MS SQL Server) meet these requirements?

EXERCISE

Data Warehouse / DHBWDaimler TSS 58

• Supplier profile

• Support

• HW/SW requirements

• License / maintenance Costs

• Usability

• Reliability

• Performance and scalability

• Multi-tenant

• Interfaces

• Scheduling

EXERCISE - DEFINE 5 CRITERIA FOR THE EVALUATION OF AN ETL TOOL

Data Warehouse / DHBWDaimler TSS 59

• RDBMS provide many of the functionalities but additional programming required

• RDBMS are often used for ETL/ELT by programming with SQL, PL/SQL, SQLT, etc

EXERCISE - HOW DOES A RELATIONAL DBMS MEET THESE REQUIREMENTS?

Data Warehouse / DHBWDaimler TSS 60

ETL Tool Manual ETL

Informatica, Talend, Oracle ODI, etc. SQL, PL/SQL, SQLT, etc.

Separate license No additional license

Workflow, error handling, and restart/recovery functionality included

Workflow, error handling, and restart/recovery functionality must be implemented manually

Impact analysis and where-used (lineage) functionality available

Impact analysis and where-used (lineage) functionality difficult

Faster development, easier maintenance Slower development, more difficult maintenance

Additional (Tool-) Know How required Know How often available

NEWER / ADVANCED TOPICS

• OPERATIONAL DATA WAREHOUSING

• DATA WAREHOUSE APPLIANCES

• CLOUD BI

OPERATIONAL DATA WAREHOUSING

Classical“ Data Warehouses

• Information in the warehouse used to support strategic business decisions

• Kept separate from operational systems

• Load of new data only in larger intervals (mostly weekly or monthly)

• Shorter intervals not required by users

• Huge system resources of the ETL process

Near Real Time Operational Data Warehousing

• Information in the warehouse used for tactical business decisions as well

• Low latency of information in data warehouse therefore needed

• Not only mathematical aggregations

OPERATIONAL DATA WAREHOUSING

Data Warehouse / DHBWDaimler TSS 63

With classical data warehouses users have to access two types of systems to get a complete image of a customer (for instance for CRM applications or in call centers)

• the data warehouse to see what happened in the past

• the OLTP systems to get the most current information

With an operational data warehouse

• all this information is in one system

• tighter integration with operational systems is easier

• for instance personalized offers „closing the loop“

WHY OPERATIONAL DATA WAREHOUSING?

Data Warehouse / DHBWDaimler TSS 64

SMARTFACTORY: OPERATIONAL BI SERVICE PLATFORM

Data Warehouse / DHBWDaimler TSS 65

Source: Gluchowski: Analytische Informationssysteme, 5.Aufl., p. 279

Workers getting alarms on their

watch

Containing and displaying

complex manuals, e.g. during repair

New data sources

sending lots of data with high speed

(sensor data, logs, etc.)

Right-Time data

required for automated actions, e.g.

cordless screwdriver knows and

adjusts torque

Near Real time /Right time ETL

• Executed asynchronously; triggered by business transactions in the OLTP

• Incremental real-time load

• Tighter integration of operational and data warehouse systems

DWHs become „mission critical“

• Higher requirements on availability and performance

Higher „transactional“ system load on data warehouse system

• DWH DB has to deal with typical DWH system load and transactional load

Data Quality mandatory

• Data is used for automated decisions

CHALLENGES FOR OPERATIONAL DATA WAREHOUSING

Data Warehouse / DHBWDaimler TSS 66

COMPARISON CLASSICAL DWH – OPERATIONAL DWH

Data Warehouse / DHBWDaimler TSS 67

Classical DWH Operational DWH

Strategic• Passive• Historical trends

Tactical• Prediction• Automatic execution of decisions

Batch• E.g. daily batch

Near Real-Time / Right-Time• Up-to-date view

Lower Availability• System can be down for

maintenance and longer response times for some reports are accepted

Availability• System becomes critical and must

fulfill high availability and performance requirements

DATA WAREHOUSE APPLIANCES

Setting up and configuring a data warehouse system is a complex task

• Hardware

• Servers + Storage + Network

• Connectivity to source systems

• Software

• Database management system

• ETL software

• Reporting and analytics software

• ...

An optimal performance of the whole system is difficult to achieve

DATA WAREHOUSE APPLIANCES

Data Warehouse / DHBWDaimler TSS 69

Data Warehouse Appliances are

• Pre-configured and pre-tested hard- and software configurations developed for running a data warehouse

• Optimized for data warehousing workload / Only suited for running OLAP• In contrast one size fits all: RDBMS are suited for OLTP, OLAP and mixed workloads

• Ready to be used after they are delivered to the customer

• Products, e.g. Teradata, HP Vertica, Exasol, Oracle Exadata, IBM Netezza(IBM PureData System for Analytics), MS Analytic Platform System

DATA WAREHOUSE APPLIANCES

Data Warehouse / DHBWDaimler TSS 70

APPLIANCE SIMPLICITY (E.G. ORACLE EXADATA)

Data Warehouse / DHBWDaimler TSS 71

• Up to 912 CPU core and 28.5TB memory per rack

• 2 to 19 DB servers per rack

• 3 to 18 Storage servers per rack

• Maximum of 920TB flash capacity

• 2.1PB of disk capacity

• 10TB size disk (10TB x 12 = 120TB RAW per storage server)

• About 4.8 million reads and about 4.3 million writes per second

ORACLE EXADATA X7 – SOME KEY FIGURES

Data Warehouse / DHBWDaimler TSS 72

Source: http://jaffardba.blogspot.com/2017/10/whats-new-in-exadata-x7.html

• Move as many operations as possible to storage cell instead of moving data to the DB server

• E.g. filter data already at storage cell and not at DB server

• Avoid transferring unnecessary data

• Column-oriented In-memory storage with high compression

• Many appliances are based on shared nothing architecture• Each node is independent

• Each node has its own storage or memory

• Parallel processing simpler and faster as no overhead due to contention

TYPICAL ENHANCEMENTS

Data Warehouse / DHBWDaimler TSS 73

CLOUD BI

• BI applications (database, ETL tools, Frontend) are hosted in a public cloud, e.g.

• AWS (Amazon Web Services)

• Microsoft Azure

• …

• Many tools nowadays are available in the cloud first • Vendors try to force customers to use clouds

• Or even available in the cloud only• E.g. Microsoft Power BI

CLOUD BI

Data Warehouse / DHBWDaimler TSS 75

CLOUD BI ARCHITECTURE

Data Warehouse / DHBWDaimler TSS 76

Source: Lang: Business Intelligence erfolgreich umsetzen, 5.Aufl., p. 185

• Analytics as a service

• Provide complete BI (Analytics) SW stack including

• data storage

• data integration (ETL)

• data visualization and/or data modeling (Frontend)

• Meta data management

• Data as a service

• Provide quality data for further usage

• Data marketplace

CLOUD BI ARCHITECTURE

Data Warehouse / DHBWDaimler TSS 77

CLOUD BI – DATA WAREHOUSING SERVICES

Data Warehouse / DHBWDaimler TSS 78

Source: http://db-engines.com/en/system/Amazon+Redshift%3BSnowflake

SNOWFLAKE ARCHITECTURE

Data Warehouse / DHBWDaimler TSS 79

Don‘t confuseSnowflake

product withSnowflake

dimensional model from

session 2

Snowflake Storage

• Snowflake loads data into its internal optimized, compressed, columnar format

• Snowflake itself uses (!) Amazon Web Service’s S3 (Simple Storage Service) cloud storage

Query Processing

• Each virtual warehouse is an MPP (Multi Parallel Processing) compute cluster composed of multiple compute nodes allocated by Snowflake from Amazon EC2

• Each virtual warehouse is an independent compute cluster that does not share compute resources with other virtual warehouses

SNOWFLAKE ARCHITECTURE

Data Warehouse / DHBWDaimler TSS 80

• Frontend

• Meta Data Management

• DWH Project Management

• New / Advanced topics

• Operational Data Warehousing

• Data Warehouse Appliances

• Cloud BI

SUMMARY

Data Warehouse / DHBWDaimler TSS 81

END OF DWH LECTUREGOOD TEXT BOOKS

• Kimball et al: The Data Warehouse Lifecycle Toolkit, Wiley 2008

• Bauer / Günzel: Data-Warehouse-Systeme, dpunkt, 2013

• Köppen et al: Data Warehouse Technologien, mitp, 2016

Data Warehouse / DHBWDaimler TSS 82

Source: http://dilbert.com/strip/2014-05-07

Daimler TSS GmbHWilhelm-Runge-Straße 11, 89081 Ulm / Telefon +49 731 505-06 / Fax +49 731 505-65 99

tss@daimler.com / Internet: www.daimler-tss.com/ Intranet-Portal-Code: @TSSDomicile and Court of Registry: Ulm / HRB-Nr.: 3844 / Management: Christoph Röger (CEO), Steffen Bäuerle

Data Warehouse / DHBWDaimler TSS 83

THANK YOU

How to document / identify requirements?

• Must be easy to understand from non-technical users during Analysis/Technical concept phase

• Must provide sufficient information for System Design phase

The following slides provide some example work products that are produced during Analysis/Technical concept phase and may be refine during System Design phase

POSSIBLE DWH ANALYSIS AND DESIGN WORK PRODUCTS

Data Warehouse / DHBWDaimler TSS 84

• Answer most important questions of participating business units

• Provide high-quality data

• Introduction in time

• Usage of modern technology

• Business orientation

• Easy to use

• Executive sponsor

• Patience – user acceptance evolves over time

CRITICAL SUCCESS FACTORS FOR BUILDING A DATA WAREHOUSE

Data Warehouse / DHBWDaimler TSS 85

• New applications and data sources

• Increase demand for an

• Operational DWH, e.g.

• Industry 4.0 / Smart Factory

• Internet Of Things

• Internet of medical things

• Connected Cars

EXAMPLES OF OPERATIONAL DATA WAREHOUSING

Data Warehouse / DHBWDaimler TSS 86

Source: Gluchowski: Analytische Informationssysteme, 5.Aufl., p. 277

Replace pen & paper with electronic workflows

Decision support for each end user and not only management

Increasing demand to publish same content on different devices

Source: http://www.agiledata.org/essays/disciplinedAgileDW.html

DISCIPLINED AGILE DATA WAREHOUSING (SCOTT AMBLER) -PHASES

Data Warehouse / DHBWDaimler TSS 87

• Project initiation like initial scoping, initial architectural modeling

Inception

• Produce potentially consumable solution on an incremental basis

Construction

• Deploying the solution to their stakeholders

• Continuous deployment: Over time this phase becomes shorter and ideally disappears

Transition

DISCIPLINED AGILE DATA WAREHOUSING (SCOTT AMBLER) -PHASES OVERVIEW

Data Warehouse / DHBWDaimler TSS 88

Source: http://www.agiledata.org/essays/disciplinedAgileDW.html

• Take a usage-driven approach

• Keep your initial modelling light-weight

• User stories are only one option• As a Professor I would like to analyze the current grades of my students so that I can adjust the

difficulty of future tests and assignments

• As a Student I would like to know the drop out rates by course and professor from previous years to determine the likely difficulty of my course choices

• As a Registrar I would like to know the rate of enrollments within a class over time to determine the popularity of them

• As a Student I would like to know the estimated travel time between back-to-back classes so that I can determine whether I can make it to class on time

Source: http://www.disciplinedagiledelivery.com/user-stories-for-data-warehousebusiness-intelligence-a-disciplined-agile-approach/

DISCIPLINED AGILE DATA WAREHOUSING (SCOTT AMBLER) -USAGE-DRIVEN APPROACH

Data Warehouse / DHBWDaimler TSS 89

• Classical

• Waterfall model

• Incremental model

• Agile

• Scrum

• Kanban

PROCESS MODEL

Data Warehouse / DHBWDaimler TSS 90

• Benefits of DWH

• Cost-effectiveness

• SW selection

• HW selection

• Staff requirement including external Know-How

• Data protection and data security agreement, data classification

• Proof of Concept (PoC) to challenge different possible solutions

• Architectural concept

FEASIBILITY STUDY

Data Warehouse / DHBWDaimler TSS 91

• Documentation of user requirements: specification sheet

• Backend including ETL

• Frontend

• Security

• Metadata, Business glossary

• Non-functional requirements

• Analysis of data sources

• Data quality

• Data models

• Data security

ANALYSIS

Data Warehouse / DHBWDaimler TSS 92

• Technical description how to implement specifications

• Data model for different DWH layers

• Data integration design

• Frontend design

• Security concept

• Capacity planning

DESIGN

Data Warehouse / DHBWDaimler TSS 93

• Installation of development, test, integration, production, maintenance environment

• Usage of Metadata repository for implementation of data model, etl, frontend, security

• Launch of DWH

• Release Management

IMPLEMENTATION

Data Warehouse / DHBWDaimler TSS 94

• Functional

• Data quality / data validation

• Usability

• Performance

• Operational

• Security

TEST

Data Warehouse / DHBWDaimler TSS 95

• Deployment of new features, changes or bug fixes

• End user training

• Monitoring

• Production concept

• Initial load and future delta loads

• Keep the system running

OPERATIONS AND MAINTENANCE

Data Warehouse / DHBWDaimler TSS 96

• Security concerns for sensitive data

• But new data source coming from Internet. Storing the data in a (public) cloud can make sense, e.g.

• Connected Cars, IOT in general

CLOUD BI

Data Warehouse / DHBWDaimler TSS 97

Cloud Services

• Authentication and access control

• Infrastructure management

• Metadata management

• Query parsing and optimization

• Security

SNOWFLAKE ARCHITECTURE

Data Warehouse / DHBWDaimler TSS 98

top related