Denodo DataFest 2017: Multi-zone Data Virtualization for Data Lakes
Post on 21-Jan-2018
Multi-zone Data Virtualization for Data Lakes
How to share data with other government agencies while preserving privacy and security guidelines
Paul Grooten
October 26th, 2017
Statistics Netherlands (CBS) Key Characteristics
Autonomous Public Body with a Legal Entity (“ZBO”)
Official Statistics: Economic - Social - Census
National and Regional
180 mEur 2000+
The Hague Heerlen Bonaire
Founded in 1899 (5 fte, 2 rooms), now 3 offices
Ambition: to become the Data Hub of the Dutch Government
Statistical process: Data Collection → Data Processing → Publishing
Which problems do we want to solve
• Current methods and technologies are no longer sufficient to share data easily on a larger scale
• We want to share more statistical data (also with external parties)
• We want to become faster and need a shorter time to market
• We need to reduce costs (storage, infrastructure)
• We need to work on secure and privacy-preserving data sharing
• Data sets should be easy to find
The layered Data Architecture
[Figure: layered data architecture, running from Supply to Demand. Labels shown:
(Legacy) data sources: CSV, SQL DB, Web Srv, ETL tooling, XLS, App, CBDS
Data Source Layer (DSL)
Data Transformation Layer (DTL)
Data Provisioning Layer (DPL): Building Blocks 1-4, Web-Services A, B and C, OData
Consumer Layer (CL): Web Page, S2S, Tooling (P = Data Prep, V = Data Visualization, A = Data Analytics)
Data Virtualization: Denodo
Side bars: Security & Authorisation, Data Governance, Metadata Management (TechMeta, Import Conceptual Meta, Conn.String, UserQue.)
Legend: Existing / New
Source: CIO office | Version 1.81]
…towards a multi zone DaaS Architecture
[Figure: multi-zone DaaS architecture. The layers (DSL, DTL, DPL, CL) are shown with Data Virtualization, Data Governance, Metadata Management and Security & Authorisation, and are divided into zones:
Zone CBS: Building Blocks 1 and 2, Web-Service A
Zone DDC1: Building Blocks 7 and 8, Web-Service D, secured VPN
Zone UDC1: Building Blocks 3 and 4, Web-Service B, secured VPN
Zone UDC2: Building Blocks 5 and 6, Web-Service C, secured VPN
P/V/A tooling is shown for each zone (P = Data Prep, V = Data Visualization, A = Data Analytics).
Other labels: CBDS, DSC, EHB, Existing / New
Legend: DDC = Departmental Data Center | UDC = Urban Data Center | CL = Consumer Layer | DPL = Data Provisioning Layer | DTL = Data Transformation Layer | DSL = Data Source Layer]
So what is a Zone?
[Figure: the multi-zone DaaS architecture again, with zones Zone CBS (Building Blocks 1 and 2, Web-Service A), Zone EZ (Building Blocks 7 and 8, Web-Service D, DDC1, secured VPN), Zone UDC1 (Building Blocks 3 and 4, Web-Service B, secured VPN) and Zone UDC2 (Building Blocks 5 and 6, Web-Service C, secured VPN).
Legend: DDC = Departmental Data Center | UDC = Urban Data Center | CL = Consumer Layer | DPL = Data Provisioning Layer | DTL = Data Transformation Layer | DSL = Data Source Layer]
A Zone:
• Is a virtual container in which a specified set of Data Governance rules apply
• Has a specific user group
• Contains virtual datasets
• Has its own authorization (which can and will differ from other zones)
• Has an owner
• Has its own Change Advisory Board
• Can have its own cache database on its own hardware
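
The zone properties listed above can be pictured as a small data structure. The following Python sketch is only an illustration and does not describe how Denodo or CBS implement zones; the class `Zone`, its field names, the user names and the dataset names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Zone:
    """Hypothetical model of a zone: a virtual container with its own
    governance rules, user group, virtual datasets and authorization."""
    name: str
    owner: str
    governance_rules: List[str] = field(default_factory=list)
    user_group: Set[str] = field(default_factory=set)
    # Maps a virtual dataset name to the users authorized to read it
    # in this zone. Other zones keep their own, possibly different, map.
    dataset_readers: Dict[str, Set[str]] = field(default_factory=dict)
    # Optional cache database that belongs to this zone only.
    cache_database: Optional[str] = None

    def can_read(self, user: str, dataset: str) -> bool:
        """Allow access only to members of this zone's user group who
        are also authorized for the dataset within this zone."""
        readers = self.dataset_readers.get(dataset, set())
        return user in self.user_group and user in readers


# Two illustrative zones with different authorization (assumed values).
cbs = Zone(
    name="Zone CBS",
    owner="cbs-owner",
    governance_rules=["internal statistical use"],
    user_group={"alice", "bob"},
    dataset_readers={"building_block_1": {"alice", "bob"}},
)
udc1 = Zone(
    name="Zone UDC1",
    owner="udc1-owner",
    governance_rules=["aggregated data only"],
    user_group={"carol"},
    dataset_readers={"building_block_3": {"carol"}},
    cache_database="udc1-cache",
)
```

In this sketch a user from one zone gets no access in another zone unless that zone lists them explicitly, which mirrors the point that authorization can and will differ between zones.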
What do we want to achieve with the Data Lake
[Figure: goals for the Data Lake, grouped under "stimulate" and "reduce". Labels shown: Cost, Data access, Statistical, Risk, Growth, Re-use, Time to Market]
What are our next steps
• Finish Proof-of-Concept by end of 2017
• Develop product (MVP)
• Get approval from Board of Management
• Implement Minimum Viable Product in 2018/H2
• Enhance MVP with new functionalities, like disclosure control (confidentiality on-the-fly) protection
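
The slides do not say which disclosure control method is intended. As one possible reading of "confidentiality on-the-fly", the Python sketch below applies primary suppression of small cells at query time: counts below a threshold are withheld before a result leaves the zone. The function name, the threshold of 10 and the sample data are assumptions made for illustration.

```python
from typing import Dict, Optional


def suppress_small_cells(
    counts: Dict[str, int], threshold: int = 10
) -> Dict[str, Optional[int]]:
    """Return a copy of the table in which every count below the
    threshold is replaced by None (suppressed). The threshold value
    is an assumption, not a rule taken from the slides."""
    return {
        cell: (value if value >= threshold else None)
        for cell, value in counts.items()
    }


# Illustrative query result: number of records per municipality.
raw_result = {"Heerlen": 120, "Bonaire": 3, "The Hague": 450}
published = suppress_small_cells(raw_result, threshold=10)
```

Primary suppression alone is usually not sufficient when totals are published as well, because a suppressed cell can be recovered by subtraction; a production setup would also need secondary suppression or another protection method.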
Recommendations
• Check whether your strategy is in line with your plans (and vice versa)
• Start experimenting with Data Virtualization at an early stage (start with the express version)
• Build a culture that embraces change and communicate your plans as often as possible