experiences from a data vault pilot exploiting the internet of things
DESCRIPTION
Experiences from a Data Vault Pilot Exploiting the Internet of Things by USG ICT ProfessionalsTRANSCRIPT
Business Analytics For All
BA4All – Insight Session April 29th 2014 Guy Van der Sande – Vincent Greslebin
Unlocking the Value within the Data Vault
Fifthplay : Architecture
Smart Homes Platform Data Warehouse
Data Vault
Data mart
Gebruikers
Marketing & SSC
Utility Portal
ETL Dag - 1
- Controle data kwaliteit - Toepassing business
rules - Aggregatie - Filtering
Facility Portal
Fifthplay : Why Data Vault ?
• Pattern based design which allows agility to take place
• Easy to add new data sources making it future proof. This allows
Fifthplay to stay innovative
• Large volume of data
• Build up history that is not available in the operational system
• Possibility of performing analysis on raw data (cfr quality checks)
• Development speed (Pilot : 37 working days)
Data Vault ?
Data Vault ?
Data Vault ?
The Data Vault is a detail oriented, historical tracking and
uniquely linked set of normalized tables that support one or
more functional areas of business. It is a hybrid approach
encompassing the best of breed between 3rd normal form
(3NF) and star schema.
The design is flexible, scalable, consistent and adaptable to
the needs of the enterprise.
Standard architecture
The centerpiece of the Enterprise Data Warehouse
History is build-up
Granularity as ‘detailed’ as possible
No use of business rules
Use of business keys that are horizontal in nature and
provide visibility across lines of business
A new layer which has the benefits of the RAW Data
Vault, but with the business data embedded
In the Business Data Vault the data has been altered,
cleansed and changed to meet the business rules
Downstream of the raw data vault
Starting point for Master Data Management
Metadata is absolutely vital
The Data Vault Model exists of 3 basic entity types
• Hubs : contains a unique list of business keys
• Links : associations across or between business keys
• Satellites : holds descriptive data (about the business key) over time
Component parts of the Data Vault model
• Represents a Core Business Concept
• Is formed around the Business Key of this concept
• Is established the first time a new instance of that
business key is introduced
• Must be 1:1 with a single instance
• Consists of the business key, a sequence id, a load
date/time stamp and a record source.
Component parts - Hub
• Represents a natural business relationship between business keys
• Is established the first time this new unique association is presented
• Can represent an association between several Hubs and sometimes other
Links.
• maintains a 1:1 relationship with the unique and specific business defined
association between that set of keys.
• Consists of the sequence ids from the Hubs and Links
• Contains sequence id, a load date/time stamp and a
• record source.
Component parts - Link
• The Satellite contains the descriptive information
(context) for a business key.
• A Satellite can only describe one key (Hub or a Link).
• The Satellite is the only construct that manages time
slice data (data warehouse historical tracking of
values over time).
Component parts - Satellite
Fact
Dimension 1
Dimension 3
Dimension 2
Dimension 4
Data Vault – Why ?
Fact
Dimension 1
Dimension 3
Dimension 2
Dimension 4
Data Vault – Why ?
Fact
Dimension 1
Dimension 3
Dimension 2
Dimension 4
Fact
Dimension 5
Data Vault – Why ?
Data Vault – Why ? DV
DM
DV
DM
S
S
S S
H
S
L
H
H
H
Data Vault – Why ?
DV
DM
S
S
S S
H
S
L
H
H
H
Dimension Fact
Data Vault – Why ?
Data Vault – How did we do it with Fifthplay ?
HubServicePartner HubCustomer HubHomeAreaManager
HubSmartPlug
HubDeviceGroup
HubEnergyLogType
LinkServicePartnerCustomer LinkCustomerHomeAreaManager
LinkHomeAreaManagerSmartPlug
LinkCustomerDeviceGroup
LinkDeviceGroupSmartPlug
LinkDeviceSubGroupSmartPlug
LinkSmartPlugApplianceEnergyLogType
HubCityLinkHomeAreaManagerCity
HubCountry
LinkCountryCity
HubSatServicePartnerHubSatCustomer
HubSatHomeAreaManager
LinkSatHomeAreaManagerCity
LinkSatCountryCity
HubSatCountry
HubSatDeviceGroup
HubSatSmartPlug
HubAppliance
HubSatAppliance
LinkSatSmartPlugApplianceEnergyLogType
HubSatHomeAreaManagerAddress
SeqServicePartnerPK
ServicePartnerID
LoadDateTime
RecordSource
SeqCustomerPK
CustomerID
LoadDateTime
RecordSource
SeqHomeAreaManagerPK
HomeAreaManagerNumber
LoadDateTime
RecordSource
SeqSmartPlugPK
SmartPlugID
LoadDateTime
RecordSource
SeqDeviceGroupPK
DeviceGroupID
LoadDateTime
RecordSource
SeqEnergyLogTypePK
EnergyLogName
LoadDateTime
RecordSource
SeqServicePartnerCustomerPK
SeqCustomer
LoadDateTime
RecordSource
SeqServicePartner
SeqCustomerHomeAreaManager
PK
SeqCustomer
LoadDateTime
RecordSource
SeqHomeAreaManager
SeqHomeAreaManagerSmartPlug
PK
SeqHomeAreaManager
LoadDateTime
RecordSource
SeqSmartPlug
SeqCustomerDeviceGroupPK
SeqCustomer
LoadDateTime
RecordSource
SeqDeviceGroup
LoadDateTime
SeqDeviceGroupSmartPlugPK
LoadDateTime
RecordSource
SeqDeviceGroup
SeqDeviceSubGroupSmartPlug
PK
LoadDateTime
RecordSource
SeqDeviceGroup
SeqSmartPlug
SeqSmartPlug
SeqSmartPlugApplianceEnergyLogType
PK
SeqEnergyLogType
LoadDateTime
RecordSource
SeqSmartPlug
SeqCityPK
CityPostalCode
LoadDateTime
RecordSource
CityName
SeqHomeAreaManagerCityPK
SeqCity
LoadDateTime
RecordSource
SeqHomeAreaManager
SeqCountryPK
CountryIsoCode
LoadDateTime
RecordSource
SeqCountryCityPK
SeqCity
LoadDateTime
RecordSource
SeqCountry
SeqSatServicePartnerPK
SeqServicePartner
LoadDateTime
RecordSource
LoadEndDateTime
ServicePartnerCode
ServiucePartnerEmail
ServicePartnerCustomerContact
SeqSatCustomerPK
SeqCustomer
LoadDateTime
RecordSource
LoadEndDateTime
CustomerEmail
CustomerFirstName
CustomerLastName
CustomerLanguage
SeqSatHomeAreaManagerPK
SeqHomeAreaManager
LoadDateTime
RecordSource
LoadEndDateTime
HomeAreaManagerMode
HomeAreaManagerArchitecture
SeqSatHomeAreaManagerCity
PK
SeqHomeAreaManagerCity
LoadDateTime
RecordSource
LoadEndDateTime
HAMCityAddressLine1
HAMCityPhoneNumber
HAMCityAddressLine2
SeqSatCountryCityPK
SeqCountryCity
LoadDateTime
RecordSource
LoadEndDateTime
CountryCityRegion
CountryCityState
SeqSatCountryPK
SeqCountry
LoadDateTime
RecordSource
LoadEndDateTime
CountryName
SeqSatDeviceGroupPK
SeqDeviceGroup
LoadDateTime
RecordSource
LoadEndDateTime
DeviceGroupName
DeviceGroupDescription
SeqSatSmartPlugPK
SeqSmartPlug
LoadDateTime
RecordSource
LoadEndDateTime
SmartPlugDisplayName
SmartPlugManufacturer
SmartPlugModel
SmartPlugIsGenerator
SmartPlugHasChildren
SmartPlugHasSchedule
SeqAppliancePK
ApplianceID
LoadDateTime
RecordSource
SeqSatAppliancePK
SeqAppliance
LoadDateTime
RecordSource
LoadEndDateTime
ApplianceCategory
SeqSatSmartPlugApplianceEnergyLogType
PK
SeqSmartPlugApplianceEnergyLogType
LoadDateTime
RecordSource
LoadEndDateTime
EnergyLogDateTime
EnergyLogValue
SeqAppliance
EnergyLogValueUnit
Legend
Hub
Link
Satellite
ServicePartnerWebPage
SeqSatHomeAreaManagerAddress
PK
SeqHomeAreaManager
LoadDateTime
RecordSource
LoadEndDateTime
HomeAreaManagerAddressLine1
HomeAreaManagerPostalCode
HomeAreaManagerAddressLine2
HomeAreaManagerCityName
HomeAreaManagerProvince
HomeAreaManagerState
HomeAreaManagerCountry
Fifthplay Raw Data Vault Architecture
Fifthplay Raw Data Vault Architecture
HubSmartPlug
HubEnergyLogType
LinkSmartPlugApplianceEnergyLogType
HubAppliance
HubSatAppliance
LinkSatSmartPlugApplianceEnergyLogType
SeqSmartPlugPK
SmartPlugID
LoadDateTime
RecordSource
SeqEnergyLogTypePK
EnergyLogName
LoadDateTime
RecordSource
SeqSmartPlugApplianceEnergyLogType
PK
SeqEnergyLogType
LoadDateTime
RecordSource
SeqSmartPlug
SeqAppliancePK
ApplianceID
LoadDateTime
RecordSource
SeqSatAppliancePK
SeqAppliance
LoadDateTime
RecordSource
LoadEndDateTime
ApplianceCategory
SeqSatSmartPlugApplianceEnergyLogType
PK
SeqSmartPlugApplianceEnergyLogType
LoadDateTime
RecordSource
LoadEndDateTime
EnergyLogDateTime
EnergyLogValue
SeqAppliance
EnergyLogValueUnit
Legend
Hub
Link
Satellite
Fifthplay : Data Vault – lessons learned
• Don’t stop with data vault; A combination with classic
dimensional Kimball-methodology is advised
• Be creative; get out of your comfort zone, dare to walk
the thine line
• While setting up the data vault, operational issues
where discovered early in the process
• ETL-development goes very quickly because of the
typical pattern design of the data vault;
Data Vault – What’s next ?
2013 : Dan Linstedt
releases Data Vault
2.0 specs
History and what’s next ?
Relational modeling
(E.F.Codd)
Bill Inmon began
discussing Data
Warehousing
• Barry Devlin and
Dr Kimball
release
“Business Data
Warehouse”
• Bill Inmon
popularizes Data
Warehousing
• Dr Kimball
popularizes Star
Schema
Dan Linstedt begins
R&D on Data Vault
Modeling
Dan Linstedt
releases first 5
articles on Data
Vault Modeling
2012 : Dan Linstedt
announces Data
Vault 2.0
1960 1970 1980 1990 2000 2010
Thank You
http://www.linkedin.com/company/usgprofessionalsbe
+32 3 231 94 84 www.usgict.be
https://www.facebook.com/usgictbe
“In the Data Warehousing/BI world, we
should store the data as it stands on the
source system and interpret it on the
way out to the data marts. This is
absolutely critical to remember.” Dan Linstedt
@BICC_at_USG