managing scientific data with ndn · 2015-09-29 · managing scientific data with ndn chengyu fan,...
TRANSCRIPT
MANAGING SCIENTIFIC DATA WITH NDN
Chengyu Fan, Susmit Shannigrahi, Steve DiBenedetto,
Catherine Olschanowsky, Christos Papadopoulos
NDNcomm 2015
Sept 28, 2015 Los Angeles, CA
Supported by NSF #13410999 and NSF #1345236
1
Introduction
Scientific data is often very large and complex
Climate - CMIP5: 3.5 PB, CMIP6: 350 PB-3 EB
Physics - ATLAS: 4 PB/year
Astronomy, bioinformatics, others…
Science infrastructure
Cutting-edge hardware but often incompatible domain software (ESGF, xrootd, etc.)
Complexity, replication, redundancy
2
Our Project
Build and deploy software to evaluate NDN in scientific applications over a dedicated hardware infrastructure
Evaluate NDN in the context of:
Application services: publishing, discovery, retrieval, access control, load balancing, failover, caching, etc.
Network integration (OSCARS, SDN, etc.)
Metrics
Performance, reduced complexity, ease of deployment, interoperability, reuse, efficiency, routing, security/trust, etc.
3
NDN Layer Structure
[Figure, built up over seven slides: two hosts are first shown connected over UDP/IP, then APP and NDN layers are added; in the final build, hosts (APP over NDN) and routers (NDN) are joined by LINK layers that can be UDP/IP, ETH, or Other]
10
Methodology
Investigate the use of NDN as a common platform for scientific data applications by:
Understanding data management challenges of various scientific domains
Developing and evaluating prototype applications that leverage NDN's features
Using prototypes to further drive NDN research
11
First Step – Build a Catalog
Create a shared resource – a distributed, synchronized catalog of names over NDN
Provide common operations such as publishing, discovery, access control
Catalog only deals with name management, not dataset retrieval
Platform for further research and experimentation
Research questions:
Namespace construction, distributed publishing, key management, UI design, failover, etc.
Functional services such as subsetting
Mapping of name-based routing to tunneling services (VPN, OSCARS, MPLS)
12
Overview of Catalog Workflow
[Figure, built up over ten slides: a publisher, a consumer, two data storage nodes, and catalog nodes 1-3, all connected through NDN]
(1) Publish dataset names
(2) Sync changes
(3) Query for dataset names
(4) Retrieve data
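The four numbered workflow steps can be walked through in a toy, in-memory model. This is only a sketch and not the project's catalog implementation: the names `CatalogNode`, `sync_catalogs`, and `retrieve` are hypothetical, sync is reduced to copying one node's name set to the others, and no NDN packets are involved.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogNode:
    """Hypothetical catalog node: stores dataset names only, never the data."""
    label: str
    names: set = field(default_factory=set)

    def publish(self, added, removed=()):
        # Step 1: a publisher tells one catalog node which names to add/remove.
        self.names |= set(added)
        self.names -= set(removed)

    def query(self, component):
        # Step 3: return every known name that contains the given component.
        return sorted(n for n in self.names
                      if component in n.strip("/").split("/"))


def sync_catalogs(source, others):
    # Step 2: naive sync; every other node adopts the source node's name set.
    for node in others:
        node.names = set(source.names)


def retrieve(storage, name):
    # Step 4: data comes from data storage, not from the catalog.
    return storage[name]


storage = {"/CMIP5/output/BCC/6hr/1998": b"...dataset bytes..."}
nodes = [CatalogNode("catalog-1"), CatalogNode("catalog-2"), CatalogNode("catalog-3")]

nodes[0].publish(added=storage.keys())   # (1) publish dataset names
sync_catalogs(nodes[0], nodes[1:])       # (2) sync changes
found = nodes[2].query("6hr")            # (3) query a different catalog node
data = retrieve(storage, found[0])       # (4) retrieve data by name
```

The consumer queries catalog node 3 even though the publisher talked to node 1, which is the point of keeping the catalogs synchronized.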
22
NDN-Science Testbed
NSF CC-NIE campus infrastructure award
10G testbed (courtesy of ESnet, UCAR, and CSU Research LAN)
Currently ~50TB of CMIP5, ~70TB of HEP data
23
Demos
Search
Publication and Sync
Access control
Retrieval and failover
24
Conclusions
IP encourages common host access, not common data access methods
Does not encourage interoperability at the application level
NDN has the potential to unify the service interface required by scientific applications
Science testbed and prototypes to test hypotheses and drive research and experimentation
Ready-to-try catalog; we invite you to try it with your data
Catalog is general, supports a variety of applications
Currently CMIP5 and HEP applications
UI for data search and retrieval
25
Our sponsors: NSF and ESnet
Join us @
http://www.netsec.colostate.edu/mailman/listinfo/ndn-sci
10
Backup Slides
27
Current Example: xrootd
[Figure, built up over four slides: a client, a manager (a.k.a. redirector) running cmsd and xrootd, and data servers A, B, and C, each running cmsd and xrootd; two copies of /my/file are shown on the data servers]
4: Try open() at A
Fragile, fairly complex middleware
31
xrootd under NDN
Significantly reduced system complexity
Better service abstraction
[Figure, built up over five slides: a client, the NDN network, and data servers A, B, and C (each running cmsd and xrootd) with two copies of /my/file; the client issues the request "? /my/file"]
36
Data Publication
Publisher / Catalog
1) Catalog listens on /<catalog-prefix>/publish
2) Publisher generates NDN names for datasets/services
3) Publisher requests publish
4) Catalog fetches the published name list
5) Catalog authenticates the Data and validates the data names against the trust model
6) Catalog shares the names with other catalogs
42
Keys for ndn-atmos
Self-signed root key: /cmip5/KEY
Site’s keys: /cmip5/lbl/KEY, /cmip5/nwsc/KEY, …
Application’s keys:
/cmip5/lbl/<DataPublisher>/KEY (dataset names publishing)
/cmip5/nwsc/<operator>/KEY, /cmip5/nwsc/<router>/KEY (NLSR)
[Figure: key hierarchy tree; edges between levels are labeled "signs"]
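A key hierarchy like this is typically checked by following signer links from a key up to the trust anchor. The sketch below works on key names only and does no cryptography. Which key signs which is an assumption read off the slide's layout (root signs site keys, site keys sign application keys, the operator key signs the router key); `SIGNED_BY` and `chain_to_anchor` are hypothetical names, and the `<...>` placeholders are kept as literal strings.

```python
# Assumed signer of each key: key name -> name of the key that signed it.
SIGNED_BY = {
    "/cmip5/KEY": "/cmip5/KEY",                                # self-signed root
    "/cmip5/lbl/KEY": "/cmip5/KEY",                            # site key
    "/cmip5/nwsc/KEY": "/cmip5/KEY",                           # site key
    "/cmip5/lbl/<DataPublisher>/KEY": "/cmip5/lbl/KEY",        # name publishing
    "/cmip5/nwsc/<operator>/KEY": "/cmip5/nwsc/KEY",           # NLSR operator
    "/cmip5/nwsc/<router>/KEY": "/cmip5/nwsc/<operator>/KEY",  # NLSR router
}
TRUST_ANCHOR = "/cmip5/KEY"


def chain_to_anchor(key_name, max_depth=10):
    """Follow signer links from key_name up to the trust anchor.

    Returns the list of key names visited, or None if the chain is broken,
    loops, or is longer than max_depth.
    """
    chain = [key_name]
    while chain[-1] != TRUST_ANCHOR:
        signer = SIGNED_BY.get(chain[-1])
        if signer is None or signer in chain or len(chain) >= max_depth:
            return None
        chain.append(signer)
    return chain


publisher_chain = chain_to_anchor("/cmip5/lbl/<DataPublisher>/KEY")
unknown_chain = chain_to_anchor("/other/site/KEY")
```

A real validator would also verify each signature along the chain; here a key is accepted as soon as its chain of names ends at the root.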
44
Trust Model
Only namespace owners are allowed to publish data
Data provenance built into the data packet
Valid publish message
Content Name: /PublisherA/publish
Signature: Publisher A’s signature
Data payload:
- /PublisherA/publish/file/1
- /PublisherA/publish/file/2
+ /PublisherA/publish/file/3
+ /PublisherA/publish/file/4
Invalid publish message
Content Name: /PublisherA/publish
Signature: Publisher A’s signature
Data payload:
- /PublisherB/publish/file
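The namespace-ownership rule can be illustrated with a small check over the payload of a publish message. This is a sketch under stated assumptions: the signature is taken to be already verified, the payload is modeled as text lines of the form "+ name" or "- name" (presumably names to add and remove), and `parse_payload`, `is_under`, and `validate_publish` are hypothetical helper names.

```python
def parse_payload(lines):
    """Split '+ name' / '- name' payload lines into (op, name) pairs."""
    entries = []
    for line in lines:
        op, name = line.split(maxsplit=1)
        if op not in ("+", "-"):
            raise ValueError(f"unknown operation: {op!r}")
        entries.append((op, name))
    return entries


def is_under(prefix, name):
    """True if name equals prefix or extends it by whole name components."""
    p = prefix.strip("/").split("/")
    n = name.strip("/").split("/")
    return n[:len(p)] == p


def validate_publish(signer_namespace, payload_lines):
    """Accept the message only if every listed name is in the signer's namespace."""
    return all(is_under(signer_namespace, name)
               for _, name in parse_payload(payload_lines))


valid = validate_publish("/PublisherA", [
    "- /PublisherA/publish/file/1",
    "- /PublisherA/publish/file/2",
    "+ /PublisherA/publish/file/3",
    "+ /PublisherA/publish/file/4",
])
invalid = validate_publish("/PublisherA", ["- /PublisherB/publish/file"])
```

Comparing whole components rather than raw string prefixes keeps a name such as /PublisherAB/file from passing as part of /PublisherA.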
48
Name Discovery
Consumer / Catalog
1) Catalog listens on /<catalog-prefix>/query
2) Consumer queries with parameters (model=cmip5 AND frequency=6hr)
3) Catalog queries its local DB, packetizes the results under /<catalog-prefix>/query-results/<params>, and replies with an ACK
4) Consumer fetches the query results (name list)
5) Consumer fetches the desired dataset(s) or re-queries
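The catalog side of step 3 (query the local DB, then packetize the results) can be sketched with an in-memory SQLite table. Everything here is an assumption made for illustration: the table layout, the `ALLOWED_FIELDS` whitelist, the function names, the packet size, and the third sample name (the first two names appear elsewhere in these slides). It does not describe the catalog's actual schema.

```python
import sqlite3

ALLOWED_FIELDS = ("model", "frequency")   # hypothetical searchable columns


def build_db(rows):
    """Create an in-memory table of (name, model, frequency) rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE datasets (name TEXT, model TEXT, frequency TEXT)")
    db.executemany("INSERT INTO datasets VALUES (?, ?, ?)", rows)
    return db


def query_names(db, params):
    """AND together the given field=value pairs and return matching names."""
    for key in params:
        if key not in ALLOWED_FIELDS:
            raise ValueError(f"unsupported field: {key}")
    where = " AND ".join(f"{key} = ?" for key in params) or "1"
    sql = f"SELECT name FROM datasets WHERE {where} ORDER BY name"
    return [row[0] for row in db.execute(sql, list(params.values()))]


def packetize(names, per_packet=2):
    """Group the name list into fixed-size chunks, one chunk per result packet."""
    return [names[i:i + per_packet] for i in range(0, len(names), per_packet)]


db = build_db([
    ("/CMIP5/output/BCC/6hr/1998", "cmip5", "6hr"),
    ("/CMIP5/output1/VA/6hr/2016", "cmip5", "6hr"),
    ("/CMIP5/output/BCC/mon/1998", "cmip5", "mon"),   # hypothetical entry
])
names = query_names(db, {"model": "cmip5", "frequency": "6hr"})
packets = packetize(names, per_packet=1)
```

Field names are checked against a whitelist before being placed in the SQL text, and values are always passed as bound parameters.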
54
Data Publication
Catalog
Accept publish requests: /<catalog-prefix>/publish
Authenticate and retrieve data names from publisher
Sync names with other catalogs
Publisher
Generate NDN names for datasets/services
Inform catalog of names to add/remove
Exchange (Publisher / Catalog):
Request publish
Fetch published name list
Validate data name against trust model
Share names with other catalogs
59
Name Discovery
Catalog
Accept queries on /<catalog-prefix>/query
Query local DB
Packetize the returned names under /<catalog-prefix>/query-results/<params>
User
Query catalog for names with specified components, e.g.: model=cmip5 AND frequency=6hr
Fetch generated name list
Fetch desired dataset(s) or re-query
Exchange (Consumer / Catalog):
Query with parameters
Query local DB; packetize results
ACK
Fetch query results
Fetch data with standard NDN
65
Name Discovery Optimization
Catalog
Accept queries on /<catalog-prefix>/queryParams
Query local DB
Packetize the returned names under /<catalog-prefix>/queryParams/seg#
In case of failure, queries get redirected to another catalog
Consumers
Can query any catalog instance
Can transparently fail over to another catalog
• Avoids maintaining state between user and catalog
• Enables graceful failover
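One way to see why this avoids per-user state: if the result name itself carries the full query, any catalog instance can answer any segment. The sketch below is an assumption-laden illustration, not the catalog's wire format. The parameter encoding, `query_name`, and `fetch_with_failover` are hypothetical, and the consumer-side loop only stands in for the redirection that the slide attributes to the network.

```python
from urllib.parse import quote


def query_name(catalog_prefix, params, segment):
    """Build a result name that carries the whole query, so no session is needed."""
    # Sorting makes the name independent of the order the parameters were given in.
    encoded = "&".join(f"{k}={params[k]}" for k in sorted(params))
    return f"{catalog_prefix}/{quote(encoded, safe='')}/{segment}"


def fetch_with_failover(catalogs, name):
    """Ask each catalog instance in turn; the first one that answers wins."""
    for catalog in catalogs:
        try:
            return catalog(name)
        except ConnectionError:
            continue
    raise ConnectionError("no catalog instance answered")


def failed_catalog(name):
    raise ConnectionError("catalog unreachable")


def healthy_catalog(name):
    return ["/CMIP5/output/BCC/6hr/1998"]


name = query_name("/catalog", {"model": "cmip5", "frequency": "6hr"}, 0)
result = fetch_with_failover([failed_catalog, healthy_catalog], name)
```

Because the second instance needs nothing beyond the name to answer, the failover requires no handover of session state.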
66
Simplified xrootd Under NDN
NDN integrates discovery, failover, retrieval …
Provides a better abstraction to the applications
[Figure, built up over five slides: a client, the NDN network, and data servers A, B, and C (each running cmsd and xrootd) with two copies of /my/file; the client issues the request "? /my/file"]
71
Name Discovery Challenges
Users may need to discover content/services without knowing the full NDN name prefix structure
NDN names are contiguous prefixes
Users may only know a few disjoint name components (e.g. frequency=6hr)
But cannot use wildcards for name discovery
[Figure, built up over five slides: a consumer exchanging requests with the NDN network]
User wants: /CMIP5/output1/VA/6hr/2016
Consumer requests /CMIP5; receives /CMIP5/output/BCC/6hr/1998
Consumer requests /CMIP5/output/BCC/6hr (exclude 1998)
. . .
May take too many requests to find desired data or service
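The cost difference can be made concrete with a simplified model. In the sketch below, each prefix request returns one name that has not been seen before, which is a coarse stand-in for NDN's exclude-based enumeration. A catalog query instead matches on components wherever they occur in the name. All function names are hypothetical, and the 1999 entry is a made-up sibling added for illustration.

```python
def components(name):
    return name.strip("/").split("/")


def one_interest(names, prefix, excluded):
    """Simplified request: return one name under prefix that was not seen yet."""
    p = components(prefix)
    for name in sorted(names):
        if components(name)[:len(p)] == p and name not in excluded:
            return name
    return None


def discover_by_prefix(names, prefix, wanted):
    """Keep asking, excluding earlier answers, until a name has all wanted components.

    Returns (matching name or None, number of requests sent).
    """
    seen = set()
    while True:
        name = one_interest(names, prefix, seen)
        if name is None:
            return None, len(seen) + 1
        seen.add(name)
        if set(wanted) <= set(components(name)):
            return name, len(seen)


def discover_by_catalog(names, wanted):
    """One catalog query: match components wherever they appear in the name."""
    return sorted(n for n in names if set(wanted) <= set(components(n)))


published = [
    "/CMIP5/output/BCC/6hr/1998",
    "/CMIP5/output/BCC/6hr/1999",   # hypothetical sibling
    "/CMIP5/output1/VA/6hr/2016",
]
by_prefix, requests_sent = discover_by_prefix(published, "/CMIP5", ["VA", "6hr"])
by_catalog = discover_by_catalog(published, ["VA", "6hr"])
```

With only three published names the prefix walk already needs one request per candidate, while the catalog lookup is a single query; the gap grows with the size of the namespace.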
76
NDN Support for Big Science
NDN Names separate data from hosts
Discovery: Names directly translate to network queries
Failover: Network can get verifiable data from anywhere
Retrieval: Data can be fetched from optimal source(s)
Investigate the use of NDN as a platform for scientific data applications
Understand data management challenges of various scientific domains
Develop prototype applications to leverage NDN's built-in features
Use these applications as case studies to drive NDN research aspects
77
Summary
NDN improves scientific data management at scale
Apps benefit from transparent multipath, automatic failover, etc.
Built-in security provides publisher provenance
Names are the common building block for content and services
Names are flexible: can refer to static content or dynamic services
Catalog supports efficient publication and non-contiguous name discovery
Users can discover content and services with minimal a priori knowledge
Catalog validates publication requests for authorization
78
Managing Scientific Data with NDN
Science testbed
10G testbed (courtesy of ESnet, UCAR, and CSU Research LAN)
Nodes strategically located near scientific data (climate + HEP)
CC-NIE NSF award
Distributed, synchronized catalog of names and services
Common functionality: publishing, discovery, access control, etc.
Search and retrieval UI
Platform for further research and experimentation
Research questions:
Namespace construction, distributed publishing, key management, UI design, failover, etc.
Functional services such as subsetting
Mapping of name-based routing to tunneling services (VPN, OSCARS, MPLS)
79
Managing Scientific Data with NDN
Science testbed
10G testbed (courtesy of ESnet, UCAR, and CSU Research LAN)
CMIP5 and HEP data
CC-NIE NSF award
Name-based Internet architecture
Name the data, not the host
All data digitally signed
Unifies and pushes common functionality to the network: publishing, discovery, access control, etc.
Data-intensive applications
Automatic pervasive in-network caching, parallel retrieval, automatic failover, and more
Simpler alternative middleware implementation, e.g., ESGF, xrootd