lecture @dhbw: data warehouse part iv: …buckenhofer/20181dwh/buckenhofer-d… · top-down vs...
TRANSCRIPT
A company of Daimler AG
LECTURE @DHBW: DATA WAREHOUSE
PART IV: FRONTEND, METADATA, PROJECTS, ADVANCED TOPICSANDREAS BUCKENHOFER, DAIMLER TSS
ABOUT ME
https://de.linkedin.com/in/buckenhofer
https://twitter.com/ABuckenhofer
https://www.doag.org/de/themen/datenbank/in-memory/
http://wwwlehre.dhbw-stuttgart.de/~buckenhofer/
https://www.xing.com/profile/Andreas_Buckenhofer2
Andreas BuckenhoferSenior DB [email protected]
Since 2009 at Daimler TSS Department: Big Data Business Unit: Analytics
As a 100% Daimler subsidiary, we give
100 percent, always and never less.
We love IT and pull out all the stops to
aid Daimler's development with our
expertise on its journey into the future.
Our objective: We make Daimler the
most innovative and digital mobility
company.
NOT JUST AVERAGE: OUTSTANDING.
Daimler TSS
INTERNAL IT PARTNER FOR DAIMLER
+ Holistic solutions according to the Daimler guidelines
+ IT strategy
+ Security
+ Architecture
+ Developing and securing know-how
+ TSS is a partner who can be trusted with sensitive data
As subsidiary: maximum added value for Daimler
+ Market closeness
+ Independence
+ Flexibility (short decision making process,
ability to react quickly)
Daimler TSS 4
Daimler TSS
LOCATIONS
Data Warehouse / DHBW
Daimler TSS China
Hub Beijing
10 employees
Daimler TSS Malaysia
Hub Kuala Lumpur
42 employeesDaimler TSS IndiaHub Bangalore22 employees
Daimler TSS Germany
7 locations
1000 employees*
Ulm (Headquarters)
Stuttgart
Berlin
Karlsruhe
* as of August 2017
5
• After the end of this lecture you will be able to
• Understand function of Frontend Tools and Information Design
• Understand the necessity for metadata
• Understand lifecycle of DWH projects
• Advanced topics like Operational BI, DWH Appliances, Cloud BI
WHAT YOU WILL LEARN TODAY
Data Warehouse / DHBWDaimler TSS 6
FRONTEND
LOGICAL STANDARD DATA WAREHOUSE ARCHITECTURE
Data Warehouse / DHBWDaimler TSS 8
Data Warehouse
FrontendBackend
External data sources
Internal data sources
Staging Layer(Input Layer)
OLTP
OLTP
Core Warehouse
Layer(Storage
Layer)
Mart Layer(Output Layer)
(Reporting Layer)
Integration Layer
(Cleansing Layer)
Aggregation Layer
Metadata Management
Security
DWH Manager incl. Monitor
VISUALIZATION IN THE USUAL CASE OF LIFE
Data Warehouse / DHBWDaimler TSS 9
RUSSIAN CAMPAIGN OF NAPOLEON
Data Warehouse / DHBWDaimler TSS 10
Source: https://de.wikipedia.org/wiki/Charles_Joseph_Minard
MAPPING THE 1854 LONDON CHOLERA OUTBREAK
Data Warehouse / DHBWDaimler TSS 11
Source: https://www1.udel.edu/johnmack/frec682/cholera/
MAPPING THE 1854 LONDON CHOLERA OUTBREAK
Data Warehouse / DHBWDaimler TSS 12
EXCERCISE: VISUALIZE AS MUCH AS POSSIBLE
Data Warehouse / DHBWDaimler TSS 13
Umsatz in €
2014 2015 2016
Kanada 16.000 14.000 17.000
England 8.000 9.000 8.000
Frankreich 7.000 4.000 5.000
USA 60.000 85.000 90.000
Deutschland 4.000 10.000 15.000
Australien 10.000 8.000 15.000
Umsatz 105.000 130.000 150.000
POSSIBLE SOLUTION 1
14Data Warehouse / DHBWDaimler TSS
POSSIBLE SOLUTION 2
15Data Warehouse / DHBWDaimler TSS
• Reporting (Standard, ad-hoc)
• OLAP
• Dashboards, Scorecards
• Advanced Analytics / Data Mining / Text Mining
• Search & Discovery
INTERFACE TO THE END USER
Data Warehouse / DHBWDaimler TSS 16
Standard Reports
• Prepared static reports that can be executed at request by end users
• Are executed at the end of an ETL process and e.g. send by email to end users
• Normally based on fact tables and its dimensions
• Reports are often lists similar to Excel-Sheets but can also contain graphics (e.g. line charts)
Ad-hoc Reports
• End users create their own reports („Self service“)
REPORTING (STANDARD, AD-HOC)
Data Warehouse / DHBWDaimler TSS 17
ROLAP / MOLAP Client Frontend
• Prepared cubes (multidimensional or relational fact tables)
• User can perform interactive analysis of data
• Rollup / drill-down
• Pivot
• Slicing
• Dicing
OLAP
Data Warehouse / DHBWDaimler TSS 18
„Progress reports“
Provide an overall view of KPIs (Key Performance Indicators)
Combination of several elements from Reporting and/or OLAP (e.g. line charts) into an overall view (like a „cockpit“)
Dashboard is more focused on operational goals
• High-level overview what is happening
Scorecard is more focused on strategic goals
• Plan a strategy and identify why something happens
DASHBOARDS, SCORECARDS
Data Warehouse / DHBWDaimler TSS 19
See Mr. Bollinger‘s lecture
ADVANCED ANALYTICS / DATA MINING / TEXT MINING
Data Warehouse / DHBWDaimler TSS 20
Not just numerical data
Analysis of new data types gets more and more important
• Text
• GPS coordinates
• Pictures
• Videos
Data can be available in RDBMS (e.g. text modules/indexes available), Hadoop or SQL DBs
SEARCH & DISCOVERY
Data Warehouse / DHBWDaimler TSS 21
MANY GRAPHICAL ELEMENTS TO USE IN REPORTS
Data Warehouse / DHBWDaimler TSS 22
Source: https://github.com/d3/d3/wiki/Gallery
MANY GRAPHICAL ELEMENTS … CHAMBER OF HORROR
Data Warehouse / DHBWDaimler TSS 23
Source: Hichert / Faisst, http://www.backup-page.hichert.com/
DO YOU USE 3D USUALLY ?
Data Warehouse / DHBWDaimler TSS 24
Some remarks about previous slide
• 3D elements introduce clutter and give not more information
• Pie chart most often does not make sense
• Line chart barely readable
• Labels are placed outside of the graphic
• Tachometer costs a lot of space and show
• Too much color in general
• Color without meaning, e.g. red should be used for alarms / errors
MANY GRAPHICAL ELEMENTS … CHAMBER OF HORROR
Data Warehouse / DHBWDaimler TSS 25
STORY TELLING WITH APPROPRIATE VISUALIZATION
Famous example by Hans Rosling (watch 3:08 onwards)
https://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen?language=de
Data Warehouse / DHBWDaimler TSS 26
Information design is the practice of presenting information in a way that fosters efficient and effective understanding of it.(source: Wikipedia, https://en.wikipedia.org/wiki/Information_design )
Some authors are well known for their criticism of many graphical representations - they provide rules for good information design
• Edward Tufte
• Stephen Few
• Rolf Hichert
INFORMATION DESIGN
Data Warehouse / DHBWDaimler TSS 27
Define standards, e.g.
• use always the same colors and with care, e.g.
• red = negative
• green = positive
• pie charts are rarely useful and should be avoided
• better use bar chart or line chart
• No 3D elements as these elements don’t enhance information but introduce clutter
• Standardize abbreviations, e.g. PY = previous year
INFORMATION DESIGNREDUCE TO THE ESENTIALS
Data Warehouse / DHBWDaimler TSS 28
TABLE WITH INTEGRATED BAR CHARTS
Data Warehouse / DHBWDaimler TSS 29
Source: Hichert, http://www.hichert.com/de/resource/table-template-02/
WHICH PRODUCTGROUP HAS THE HIGHEST WIN IN JUNE?
30Data Warehouse / DHBWDaimler TSS
WHICH PRODUCTGROUP HAS THE HIGHEST WIN IN JUNE?EYE TRACKING
31Data Warehouse / DHBWDaimler TSS
WHICH PRODUCTGROUP HAS THE HIGHEST WIN IN JUNE?IMPROVED VERSION
32Data Warehouse / DHBWDaimler TSS
WHICH PRODUCTGROUP HAS THE HIGHEST WIN IN JUNE?EYE TRACKING
33Data Warehouse / DHBWDaimler TSS
EYE-TRACKING - BEFORE AND AFTER
34Data Warehouse / DHBWDaimler TSS
Consumers / BI Users
• use reports, OLAP and dashboards to obtain information
Power Users
• Use reports , OLAP and dashboards to obtain information
• Create new reports and dashboards
Data Scientists
• Statistical / mathematical geeks
• Analyze / explore data
• Need to analyze raw (non-cleansed, non-transformed) data
BI END USER ROLES
Data Warehouse / DHBWDaimler TSS 35
META DATA MANAGEMENT
WHAT IS METADATA?
Data Warehouse / DHBWDaimler TSS 37
Data about
other data
Business Metadata
• Definition of business vocabulary and relationships
• Definition of the value range
• Linkage to physical representation
TYPES OF METADATA
Data Warehouse / DHBWDaimler TSS 38
Report and ETL metadata
• Report definitions
• Data sources
• Column definitions
• Computations
Logical and physical metadata of data model
• Table structure
• Definition of columns
• Relationships between tables and columns
• Dimension hierarchy
TYPES OF METADATA
Data Warehouse / DHBWDaimler TSS 39
THE AREAS OF METADATA
Data Warehouse / DHBWDaimler TSS 40
THE AREAS OF METADATA CONNECTED
Data Warehouse / DHBWDaimler TSS 41
Components of a data warehouse system are interconnected
• BI report user has to know
• the meaning, definitions of the shown measures, „KPIs“ (key performance indicators)
• BI report designer has to know
• the table definitions
• the meaning of the column values
• ETL job designer has to know
• the table definitions or the exact definition of the measures
• Database administrator has to know
• Which tables are used by ETL jobs, reports
WHY A COMMON METADATA REPOSITORY?
Data Warehouse / DHBWDaimler TSS 42
Metadata driven ETL development
• Generate parts of ETL code
• increasing interest for Data Vault development projects
• Tools e.g. MID Innovator, Quipu, AnalytiX DS, Talend, Pentaho, Wherescape, andothers
Common metadata repository ensures consistency across all components
• Many tools involved (DB, ETL, Frontend, …)
Enables cross component metadata analysis
• Data Lineage
• Impact Analysis
WHY A COMMON METADATA REPOSITORY?
Data Warehouse / DHBWDaimler TSS 43
“Data lineage”
• Import & Browse Full BI Report Metadata
• Navigate through report attributes
• Visually navigate through data lineage across tools
• Combines operational & design viewpoint
WHY A COMMON METADATA REPOSITORY?
Data Warehouse / DHBWDaimler TSS 44
“Impact Analysis”
• Show complete change impact in graphical or list form
• Includes impact on reports in BI tools
• Visually navigate through impacted objects across tools
• Allows impact analysis on any object type
WHAT HAPPENS IF I CHANGE THIS COLUMN?
Data Warehouse / DHBWDaimler TSS 45
• Show relationships between business terms, data model entities, and technical and report fields
• Requires cross-tool mapping of business terms
• Allows field meaning to be understood
• Allows business term relationships to be understood
WHAT DOES THIS FIELD MEAN?
Data Warehouse / DHBWDaimler TSS 46
• Shows objects that user manages
• Shows stewardship relationships on business terms
• Shows user group associations
WHAT OBJECTS DOES THIS USER OWN?
Data Warehouse / DHBWDaimler TSS 47
• Navigation through complete job details
• Navigation of complete operational metadata
WHAT HAPPENED ON THE LAST JOB RUN?
Data Warehouse / DHBWDaimler TSS 48
DATA WAREHOUSING PROJECTS
LOGICAL STANDARD DATA WAREHOUSE ARCHITECTURE
Data Warehouse / DHBWDaimler TSS 50
Data Warehouse
FrontendBackend
External data sources
Internal data sources
Staging Layer(Input Layer)
OLTP
OLTP
Core Warehouse
Layer(Storage
Layer)
Mart Layer(Output Layer)
(Reporting Layer)
Integration Layer
(Cleansing Layer)
Aggregation Layer
Metadata Management
Security
DWH Manager incl. Monitor
Top Down (Inmon)
Bottom Up (Kimball)
Top-Down (Inmon)
• Comprehensive approach regarding available data
• Design Core Warehouse Layer = integrated data model first considering all requirements
• Design data marts afterwards
Bottom-Up (Kimball)
• Approach focusing on fast delivery of first results
• Design one data mart first
• Next Marts are modeled afterwards usually using Kimball architecture
• conformed dimensions to integrate different data marts / fact tables
TOP-DOWN VS BOTTOM-UP APPROACH
Data Warehouse / DHBWDaimler TSS 51
TOP-DOWN VS BOTTOM-UP APPROACHADVANTAGES AND DISADVANTAGES
Data Warehouse / DHBWDaimler TSS 52
Top-Down (Inmon) Bottom-Up (Kimball)
☺ Core Warehouse Layer is designed optimal ☺ Early involvement of end users
☺ Data from Core Warehouse Layer is reused in many Marts
☺ Fast results
Time-consuming approach with high preparatory effort
Focus on single Marts leads to risk that overall view is lost, esp. properly designed Core Warehouse Layer
High risk with changing requirements Data often not reused but inconsistently copied across Marts
Both approaches have their down-sides
• Top-Down takes enormous initial effort to build data model for Core Warehouse Layer
• Bottom-Up is risky as central / integrated focus is lost
Think big, start small
• Think Big: Design conceptual data model for Core Warehouse Layer covering whole enterprise
• Start small: Implement physical data model for Core and Mart Layer in iterations by each business department
THINK BIG, START LOCAL
Data Warehouse / DHBWDaimler TSS 53
• DWH is not a product
• DWH databases are more complex with different layers and data models
• Data first, code is secondary
• Data quality is a major concern
• Data integration is a challenging objective
• Business need difficult to justify quantitatively
WHAT’S DIFFERENT IN DWH PROJECTS?
Data Warehouse / DHBWDaimler TSS 54
WHY DO DWH PROJECTS FAIL?
Data Warehouse / DHBWDaimler TSS 55
Feasibility study Analysis Design Implementation TestOperations and maintenance
PROJECT PHASESSMALL ITERATIONS INSTEAD OF LONG PHASES
Data Warehouse / DHBWDaimler TSS 56
Organizational team that coordinate and standardize DWH activities within an (end user) organization
• Define standards and create BI portfolio (e.g. which tools/products to use)
• Create DWH architecture and govern BI activities
• Establish processes for business and IT interaction
• Monitor DWH/BI market for new trends
• Determine skills and experience of Business users
BICC: BI CENTER OF EXCELLENCE
Data Warehouse / DHBWDaimler TSS 57
Define 3-5 criteria for the evaluation of an ETL tool
How does a relational DBMS (like Oracle, DB2, MS SQL Server) meet these requirements?
EXERCISE
Data Warehouse / DHBWDaimler TSS 58
• Supplier profile
• Support
• HW/SW requirements
• License / maintenance Costs
• Usability
• Reliability
• Performance and scalability
• Multi-tenant
• Interfaces
• Scheduling
EXERCISE - DEFINE 5 CRITERIA FOR THE EVALUATION OF AN ETL TOOL
Data Warehouse / DHBWDaimler TSS 59
• RDBMS provide many of the functionalities but additional programming required
• RDBMS are often used for ETL/ELT by programming with SQL, PL/SQL, SQLT, etc
EXERCISE - HOW DOES A RELATIONAL DBMS MEET THESE REQUIREMENTS?
Data Warehouse / DHBWDaimler TSS 60
ETL Tool Manual ETL
Informatica, Talend, Oracle ODI, etc. SQL, PL/SQL, SQLT, etc.
Separate license No additional license
Workflow, error handling, and restart/recovery functionality included
Workflow, error handling, and restart/recovery functionality must be implemented manually
Impact analysis and where-used (lineage) functionality available
Impact analysis and where-used (lineage) functionality difficult
Faster development, easier maintenance Slower development, more difficult maintenance
Additional (Tool-) Know How required Know How often available
NEWER / ADVANCED TOPICS
• OPERATIONAL DATA WAREHOUSING
• DATA WAREHOUSE APPLIANCES
• CLOUD BI
OPERATIONAL DATA WAREHOUSING
Classical“ Data Warehouses
• Information in the warehouse used to support strategic business decisions
• Kept separate from operational systems
• Load of new data only in larger intervals (mostly weekly or monthly)
• Shorter intervals not required by users
• Huge system resources of the ETL process
Near Real Time Operational Data Warehousing
• Information in the warehouse used for tactical business decisions as well
• Low latency of information in data warehouse therefore needed
• Not only mathematical aggregations
OPERATIONAL DATA WAREHOUSING
Data Warehouse / DHBWDaimler TSS 63
With classical data warehouses users have to access two types of systems to get a complete image of a customer (for instance for CRM applications or in call centers)
• the data warehouse to see what happened in the past
• the OLTP systems to get the most current information
With an operational data warehouse
• all this information is in one system
• tighter integration with operational systems is easier
• for instance personalized offers „closing the loop“
WHY OPERATIONAL DATA WAREHOUSING?
Data Warehouse / DHBWDaimler TSS 64
SMARTFACTORY: OPERATIONAL BI SERVICE PLATFORM
Data Warehouse / DHBWDaimler TSS 65
Source: Gluchowski: Analytische Informationssysteme, 5.Aufl., p. 279
Workers getting alarms on their
watch
Containing and displaying
complex manuals, e.g. during repair
New data sources
sending lots of data with high speed
(sensor data, logs, etc.)
Right-Time data
required for automated actions, e.g.
cordless screwdriver knows and
adjusts torque
Near Real time /Right time ETL
• Executed asynchronously; triggered by business transactions in the OLTP
• Incremental real-time load
• Tighter integration of operational and data warehouse systems
DWHs become „mission critical“
• Higher requirements on availability and performance
Higher „transactional“ system load on data warehouse system
• DWH DB has to deal with typical DWH system load and transactional load
Data Quality mandatory
• Data is used for automated decisions
CHALLENGES FOR OPERATIONAL DATA WAREHOUSING
Data Warehouse / DHBWDaimler TSS 66
COMPARISON CLASSICAL DWH – OPERATIONAL DWH
Data Warehouse / DHBWDaimler TSS 67
Classical DWH Operational DWH
Strategic• Passive• Historical trends
Tactical• Prediction• Automatic execution of decisions
Batch• E.g. daily batch
Near Real-Time / Right-Time• Up-to-date view
Lower Availability• System can be down for
maintenance and longer response times for some reports are accepted
Availability• System becomes critical and must
fulfill high availability and performance requirements
DATA WAREHOUSE APPLIANCES
Setting up and configuring a data warehouse system is a complex task
• Hardware
• Servers + Storage + Network
• Connectivity to source systems
• Software
• Database management system
• ETL software
• Reporting and analytics software
• ...
An optimal performance of the whole system is difficult to achieve
DATA WAREHOUSE APPLIANCES
Data Warehouse / DHBWDaimler TSS 69
Data Warehouse Appliances are
• Pre-configured and pre-tested hard- and software configurations developed for running a data warehouse
• Optimized for data warehousing workload / Only suited for running OLAP• In contrast one size fits all: RDBMS are suited for OLTP, OLAP and mixed workloads
• Ready to be used after they are delivered to the customer
• Products, e.g. Teradata, HP Vertica, Exasol, Oracle Exadata, IBM Netezza(IBM PureData System for Analytics), MS Analytic Platform System
DATA WAREHOUSE APPLIANCES
Data Warehouse / DHBWDaimler TSS 70
APPLIANCE SIMPLICITY (E.G. ORACLE EXADATA)
Data Warehouse / DHBWDaimler TSS 71
• Up to 912 CPU core and 28.5TB memory per rack
• 2 to 19 DB servers per rack
• 3 to 18 Storage servers per rack
• Maximum of 920TB flash capacity
• 2.1PB of disk capacity
• 10TB size disk (10TB x 12 = 120TB RAW per storage server)
• About 4.8 million reads and about 4.3 million writes per second
ORACLE EXADATA X7 – SOME KEY FIGURES
Data Warehouse / DHBWDaimler TSS 72
Source: http://jaffardba.blogspot.com/2017/10/whats-new-in-exadata-x7.html
• Move as many operations as possible to storage cell instead of moving data to the DB server
• E.g. filter data already at storage cell and not at DB server
• Avoid transferring unnecessary data
• Column-oriented In-memory storage with high compression
• Many appliances are based on shared nothing architecture• Each node is independent
• Each node has its own storage or memory
• Parallel processing simpler and faster as no overhead due to contention
TYPICAL ENHANCEMENTS
Data Warehouse / DHBWDaimler TSS 73
CLOUD BI
• BI applications (database, ETL tools, Frontend) are hosted in a public cloud, e.g.
• AWS (Amazon Web Services)
• Microsoft Azure
• …
• Many tools nowadays are available in the cloud first • Vendors try to force customers to use clouds
• Or even available in the cloud only• E.g. Microsoft Power BI
CLOUD BI
Data Warehouse / DHBWDaimler TSS 75
CLOUD BI ARCHITECTURE
Data Warehouse / DHBWDaimler TSS 76
Source: Lang: Business Intelligence erfolgreich umsetzen, 5.Aufl., p. 185
• Analytics as a service
• Provide complete BI (Analytics) SW stack including
• data storage
• data integration (ETL)
• data visualization and/or data modeling (Frontend)
• Meta data management
• Data as a service
• Provide quality data for further usage
• Data marketplace
CLOUD BI ARCHITECTURE
Data Warehouse / DHBWDaimler TSS 77
CLOUD BI – DATA WAREHOUSING SERVICES
Data Warehouse / DHBWDaimler TSS 78
Source: http://db-engines.com/en/system/Amazon+Redshift%3BSnowflake
SNOWFLAKE ARCHITECTURE
Data Warehouse / DHBWDaimler TSS 79
Don‘t confuseSnowflake
product withSnowflake
dimensional model from
session 2
Snowflake Storage
• Snowflake loads data into its internal optimized, compressed, columnar format
• Snowflake itself uses (!) Amazon Web Service’s S3 (Simple Storage Service) cloud storage
Query Processing
• Each virtual warehouse is an MPP (Multi Parallel Processing) compute cluster composed of multiple compute nodes allocated by Snowflake from Amazon EC2
• Each virtual warehouse is an independent compute cluster that does not share compute resources with other virtual warehouses
SNOWFLAKE ARCHITECTURE
Data Warehouse / DHBWDaimler TSS 80
• Frontend
• Meta Data Management
• DWH Project Management
• New / Advanced topics
• Operational Data Warehousing
• Data Warehouse Appliances
• Cloud BI
SUMMARY
Data Warehouse / DHBWDaimler TSS 81
END OF DWH LECTUREGOOD TEXT BOOKS
• Kimball et al: The Data Warehouse Lifecycle Toolkit, Wiley 2008
• Bauer / Günzel: Data-Warehouse-Systeme, dpunkt, 2013
• Köppen et al: Data Warehouse Technologien, mitp, 2016
Data Warehouse / DHBWDaimler TSS 82
Source: http://dilbert.com/strip/2014-05-07
Daimler TSS GmbHWilhelm-Runge-Straße 11, 89081 Ulm / Telefon +49 731 505-06 / Fax +49 731 505-65 99
[email protected] / Internet: www.daimler-tss.com/ Intranet-Portal-Code: @TSSDomicile and Court of Registry: Ulm / HRB-Nr.: 3844 / Management: Christoph Röger (CEO), Steffen Bäuerle
Data Warehouse / DHBWDaimler TSS 83
THANK YOU
How to document / identify requirements?
• Must be easy to understand from non-technical users during Analysis/Technical concept phase
• Must provide sufficient information for System Design phase
The following slides provide some example work products that are produced during Analysis/Technical concept phase and may be refine during System Design phase
POSSIBLE DWH ANALYSIS AND DESIGN WORK PRODUCTS
Data Warehouse / DHBWDaimler TSS 84
• Answer most important questions of participating business units
• Provide high-quality data
• Introduction in time
• Usage of modern technology
• Business orientation
• Easy to use
• Executive sponsor
• Patience – user acceptance evolves over time
CRITICAL SUCCESS FACTORS FOR BUILDING A DATA WAREHOUSE
Data Warehouse / DHBWDaimler TSS 85
• New applications and data sources
• Increase demand for an
• Operational DWH, e.g.
• Industry 4.0 / Smart Factory
• Internet Of Things
• Internet of medical things
• Connected Cars
EXAMPLES OF OPERATIONAL DATA WAREHOUSING
Data Warehouse / DHBWDaimler TSS 86
Source: Gluchowski: Analytische Informationssysteme, 5.Aufl., p. 277
Replace pen & paper with electronic workflows
Decision support for each end user and not only management
Increasing demand to publish same content on different devices
Source: http://www.agiledata.org/essays/disciplinedAgileDW.html
DISCIPLINED AGILE DATA WAREHOUSING (SCOTT AMBLER) -PHASES
Data Warehouse / DHBWDaimler TSS 87
• Project initiation like initial scoping, initial architectural modeling
Inception
• Produce potentially consumable solution on an incremental basis
Construction
• Deploying the solution to their stakeholders
• Continuous deployment: Over time this phase becomes shorter and ideally disappears
Transition
DISCIPLINED AGILE DATA WAREHOUSING (SCOTT AMBLER) -PHASES OVERVIEW
Data Warehouse / DHBWDaimler TSS 88
Source: http://www.agiledata.org/essays/disciplinedAgileDW.html
• Take a usage-driven approach
• Keep your initial modelling light-weight
• User stories are only one option• As a Professor I would like to analyze the current grades of my students so that I can adjust the
difficulty of future tests and assignments
• As a Student I would like to know the drop out rates by course and professor from previous years to determine the likely difficulty of my course choices
• As a Registrar I would like to know the rate of enrollments within a class over time to determine the popularity of them
• As a Student I would like to know the estimated travel time between back-to-back classes so that I can determine whether I can make it to class on time
Source: http://www.disciplinedagiledelivery.com/user-stories-for-data-warehousebusiness-intelligence-a-disciplined-agile-approach/
DISCIPLINED AGILE DATA WAREHOUSING (SCOTT AMBLER) -USAGE-DRIVEN APPROACH
Data Warehouse / DHBWDaimler TSS 89
• Classical
• Waterfall model
• Incremental model
• Agile
• Scrum
• Kanban
PROCESS MODEL
Data Warehouse / DHBWDaimler TSS 90
• Benefits of DWH
• Cost-effectiveness
• SW selection
• HW selection
• Staff requirement including external Know-How
• Data protection and data security agreement, data classification
• Proof of Concept (PoC) to challenge different possible solutions
• Architectural concept
FEASIBILITY STUDY
Data Warehouse / DHBWDaimler TSS 91
• Documentation of user requirements: specification sheet
• Backend including ETL
• Frontend
• Security
• Metadata, Business glossary
• Non-functional requirements
• Analysis of data sources
• Data quality
• Data models
• Data security
ANALYSIS
Data Warehouse / DHBWDaimler TSS 92
• Technical description how to implement specifications
• Data model for different DWH layers
• Data integration design
• Frontend design
• Security concept
• Capacity planning
DESIGN
Data Warehouse / DHBWDaimler TSS 93
• Installation of development, test, integration, production, maintenance environment
• Usage of Metadata repository for implementation of data model, etl, frontend, security
• Launch of DWH
• Release Management
IMPLEMENTATION
Data Warehouse / DHBWDaimler TSS 94
• Functional
• Data quality / data validation
• Usability
• Performance
• Operational
• Security
TEST
Data Warehouse / DHBWDaimler TSS 95
• Deployment of new features, changes or bug fixes
• End user training
• Monitoring
• Production concept
• Initial load and future delta loads
• Keep the system running
OPERATIONS AND MAINTENANCE
Data Warehouse / DHBWDaimler TSS 96
• Security concerns for sensitive data
• But new data source coming from Internet. Storing the data in a (public) cloud can make sense, e.g.
• Connected Cars, IOT in general
CLOUD BI
Data Warehouse / DHBWDaimler TSS 97
Cloud Services
• Authentication and access control
• Infrastructure management
• Metadata management
• Query parsing and optimization
• Security
SNOWFLAKE ARCHITECTURE
Data Warehouse / DHBWDaimler TSS 98