NIST BIG DATA WG
Reference Architecture Subgroup
Intermediate Report
Co-chairs: Orit Levin (Microsoft), James Ketner (AT&T), Don Krapohl (Augmented Intelligence)
July 24th, 2013
Reference Architecture Objectives
• Addresses a broad range of stakeholders (e.g., data owners, industries, academia, policy makers)
• Wide scope:
  • Encompasses the whole data life cycle within the ecosystem
  • Can be applied to different use cases (including various verticals)
  • Represents different system architectures (e.g., an enterprise data warehouse, a distributed cloud-based system using multiple service providers)
• Focus
  • Potentially with initial focus on Big Data analytics and tools
  • Assists in identifying security and privacy issues
• Agnostic to any specific technologies
7/24/2013 NIST Big Data WG / Ref Arch Sub-group
RA Diagram Independent Submissions
• Different styles and perspectives, but easy to map between them
  • Data centric (Wo Chang)
  • Data Flow centric (Orit Levin, Bob Marcus)
  • Technology Layers / Stack diagram (Gary Mazzaferro)
• The vocabulary used in these submissions and on the mailing list has been compiled and submitted as M-0057
Abstract Reference Architecture by Wo Chang / NIST
Independent RA Proposals: Big Data Sources, Usage, Transformation, and Infrastructure
Data Flow Diagram by Bob Marcus
Technology Stack / Layers Diagram by G. Mazzaferro
Data Flow Ecosystem Diagram by Orit Levin
Data Sources and Usage
Data Flow Diagram by Bob Marcus
Technology Stack / Layers Diagram by G. Mazzaferro
Data Flow Ecosystem Diagram by Orit Levin
Infrastructure: Storage, Security, and Management
Data Flow Diagram by Bob Marcus
Technology Stack / Layers Diagram by G. Mazzaferro
Data Flow Ecosystem Diagram by Orit Levin
Data Transformation: Processing, Analytics, and Visualization
Data Flow Diagram by Bob Marcus
Technology Stack / Layers Diagram by G. Mazzaferro
Data Flow Ecosystem Diagram by Orit Levin
Draft Agreement / Rough Consensus
• Transformation includes
  • Processing functions
  • Analytic functions
  • Visualization functions
• Data Infrastructure includes
  • Data stores
  • In-memory DBs
  • Analytic DBs
Sources
Transformation
Usage
Data Infrastructure
Security
Management
Cloud Computing
Network
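The draft consensus above can be written down as a small lookup structure. This is only an illustrative sketch: the block and component names are taken from the slide, while the dictionary layout, the `OTHER_DIAGRAM_LABELS` list, and the `block_of` helper are assumptions introduced here and are not part of the subgroup's output.

```python
# Hypothetical encoding of the draft consensus blocks.
# Names come from the slide; the data layout is an assumption.
RA_BLOCKS = {
    "Sources": [],
    "Transformation": [
        "Processing functions",
        "Analytic functions",
        "Visualization functions",
    ],
    "Usage": [],
    "Data Infrastructure": [
        "Data stores",
        "In-memory DBs",
        "Analytic DBs",
    ],
}

# Remaining labels on the diagram; their relationship to the blocks
# above is not stated in the slide text, so they are kept separate.
OTHER_DIAGRAM_LABELS = ["Security", "Management", "Cloud Computing", "Network"]


def block_of(component):
    """Return the top-level block that lists the component, or None."""
    for block, components in RA_BLOCKS.items():
        if component in components:
            return block
    return None
```

For example, `block_of("In-memory DBs")` returns `"Data Infrastructure"`, which matches the agreement that in-memory and analytic databases belong to the infrastructure rather than to transformation.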
Next Steps and AIs
• Deliverable I: Write the White Paper draft showing one or more approaches (e.g., Data Flow and Stack) using the same or similar terminology
  • AI: Chairs will start the draft of the document incorporating the submissions to the Ref Arch subgroup
  • AI: Close cooperation between the “Ref Arch” and “Def&Tax” sub-groups. Output: taxonomy for the RA diagrams with definitions for major entities/blocks; Input: M-0057.
• Deliverable II: A draft of a single RA; requires more discussion and inputs based on the work of all sub-groups
  • AI: Chairs will start the draft of the document incorporating the findings of the Ref Arch subgroup
  • AI: Review the latest contributions to the Ref Arch and incorporate their findings (see email from Yuri Demchenko / University of Amsterdam)
  • AI: Close cooperation with the “Use Cases” and “Security” sub-groups to identify the areas of focus for “zooming” into their architecture
Backup Slides
Submitted RAs
Data Centric by Wo Chang / NIST
Data Flow Diagram by Bob Marcus
Data Flow Ecosystem Diagram by Orit Levin (Microsoft)
[Diagram labels: Individual Data Transfer; Big Data Transfer; Selected Data Storage and Retrieval; Big Data Storage and Retrieval; Data Objects; Data Sources; Data Usage: Government (incl. health & financial institutions), Industries / Businesses, Network Operators / Telecom, Academia; Data Transformation; Collection; Aggregation; Matching; Data Mining; Data Infrastructure; Storage & Retrieval; Conditioning; Security; Management; Anonymized; Pseudo-anonymized; PII; Volume; Variety; Velocity]
Technology Layers / Stack diagram by Gary Mazzaferro
Mapping to Technologies and Use Cases
Prepared by the authors of the original RAs
An Example of Cloud Computing Usage in Big Data Ecosystem
[Diagram labels: Individual Data Transfer; Big Data Transfer; Selected Data Storage and Retrieval; Big Data Storage and Retrieval; Data Objects; Data Sources; Data Usage: Government (incl. health & financial institutions), Industries / Businesses, Network Operators / Telecom, Academia; Data Transformation; Collection; Aggregation; Matching; Data Mining; Data Infrastructure; Data Warehouse; Cloud Provider / Service Layer: SaaS, PaaS, IaaS; Volume; Variety; Velocity]
Use Case: Advertising
[Diagram labels: Online Data Aggregator; Offline Data Aggregator; Data Subject / Person; Online Sources; Offline Sources; Public Records (commons, government, etc.); Internal Records; Other devices (Smart Grid, surveillance, scientific, etc.); End User devices incl. OS (mobile phones, etc.); Applications (search, publishers, etc.); Appl. with customers (communications, social network, etc.); Web Browsers; Networks; Match/Bridge Service; Data Management Platforms (DMPs); Collection; Contextual Data Collection; Behavioral Data Creation; Data Mining; Person Attribution; Control; UI: Do Not Track (DNT); HTTP: DNT; DPI; Analytic Cookie; DMP Cookie; Match Cookie; Match Container Tag or Pixel request; DMP Container Tag or Pixel request; Aggregated; De-identified; PII; 1st Party; 2nd Party; 3rd Party; Government, health, financial institutions, academia; Industries / Businesses; Network Operators; Users; SSP; DSP; AdNet; AdX; Agency; Publisher; Advertiser; Advertising Industry Ecosystem; Big Data Transfer; Individual Data Transfer]
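The "HTTP: DNT" label in the advertising use case refers to the Do Not Track request header that a web browser can send alongside each request. As a rough illustration of how a collection component might read that signal, here is a minimal sketch. The function name `dnt_preference` and its return labels are hypothetical, and the slide itself does not say how any party in the ecosystem acts on the header.

```python
def dnt_preference(headers):
    """Classify the Do Not Track signal found in a mapping of request headers.

    Returns 'opt-out', 'opt-in', or 'unset' (hypothetical labels).
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    value = normalized.get("dnt")
    if value == "1":
        return "opt-out"  # user asks not to be tracked
    if value == "0":
        return "opt-in"   # user consents to tracking
    return "unset"        # header absent or value not recognized
```

A collector could call `dnt_preference(request_headers)` before setting an analytic or match cookie; whether it must honor the result is a policy question outside this sketch.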
Use Case: Enterprise Data Warehouse
[Diagram labels: Individual Data Transfer; Big Data Transfer; Selected Data Storage and Retrieval; Big Data Storage and Retrieval; Data Objects; Data Sources: Archives, Files, Online Transaction Processing (OLTP) Systems, MS Office Documents, Manual; Data Transformation; Extraction, Transformation, and Loading (ETL); Online Analytical Processing (OLAP); Data Mining / Knowledge Discovery in Databases (KDD); Managed Report Environment (MRE); Data Infrastructure; Staging Area; Operational Data Store; Central Data Warehouse; Security; Management; Data Usage: Department Data Mart, Regional Data Mart, Subject Data Mart, Application Data Mart, Functional Data Mart]
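The enterprise data warehouse use case names ETL, a Staging Area, a Central Data Warehouse, and several data marts. The sketch below shows one way those pieces can fit together, assuming the conventional order of source, staging, warehouse, then mart, which the slide text does not spell out. All function names, field names, and sample records are hypothetical and serve only to illustrate the flow.

```python
from collections import defaultdict


def extract(source_rows):
    """Extract: copy raw records from a source system into a staging list."""
    return [dict(row) for row in source_rows]


def transform(staged_rows):
    """Transform: normalize department names and cast amounts to float."""
    return [
        {"department": r["department"].strip().lower(), "amount": float(r["amount"])}
        for r in staged_rows
    ]


def load(warehouse, rows):
    """Load: append cleaned rows to the central warehouse table."""
    warehouse.extend(rows)
    return warehouse


def department_mart(warehouse):
    """Derive a department data mart: total amount per department."""
    totals = defaultdict(float)
    for r in warehouse:
        totals[r["department"]] += r["amount"]
    return dict(totals)


# Hypothetical records standing in for an OLTP source system.
oltp_rows = [
    {"department": " Sales ", "amount": "100.5"},
    {"department": "sales", "amount": "20"},
    {"department": "Support", "amount": "7"},
]

warehouse = load([], transform(extract(oltp_rows)))
mart = department_mart(warehouse)
```

Keeping the warehouse as the single cleaned copy and deriving each mart from it mirrors the separation between Data Infrastructure and Data Usage in the diagram; a regional or subject mart would be another aggregation over the same `warehouse` list.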