managing scientific data with ndn · 2015-09-29 · managing scientific data with ndn chengyu fan,...


MANAGING SCIENTIFIC DATA WITH NDN

Chengyu Fan, Susmit Shannigrahi, Steve DiBenedetto, Catherine Olschanowsky, Christos Papadopoulos

NDNcomm 2015

Sept 28, 2015 Los Angeles, CA

Supported by NSF #13410999 and NSF #1345236

Introduction

Scientific data is often very large and complex

Climate - CMIP5: 3.5 PB, CMIP6: 350 PB-3 EB

Physics - ATLAS: 4 PB/year

Astronomy, bioinformatics, others…

Science infrastructure

Cutting-edge hardware but often incompatible domain software (ESGF, xrootd, etc.)

Complexity, replication, redundancy

Our Project

Build and deploy software to evaluate NDN in scientific applications over a dedicated hardware infrastructure

Evaluate NDN in the context of:

Application services: publishing, discovery, retrieval, access control, load balancing, failover, caching, etc.

Network integration (OSCARS, SDN, etc.)

Metrics

Performance, reduced complexity, ease of deployment, interoperability, reuse, efficiency, routing, security/trust, etc.

NDN Layer Structure

[Figure, built up over seven slides: protocol stacks along a host, router, router, host path. Recoverable labels: APP, NDN, UDP/IP, ETH, Other, LINK. The first build shows two hosts over UDP/IP; later builds add APP and NDN on the hosts, NDN on the routers, and UDP/IP, ETH, or other links underneath NDN.]

Methodology

Investigate the use of NDN as a common platform for scientific data applications by:

Understanding data management challenges of various scientific domains

Developing and evaluating prototype applications that leverage NDN's features

Using prototypes to further drive NDN research

First Step – Build a Catalog

Create a shared resource – a distributed, synchronized catalog of names over NDN

Provide common operations such as publishing, discovery, access control

Catalog only deals with name management, not dataset retrieval

Platform for further research and experimentation

Research questions:

Namespace construction, distributed publishing, key management, UI design, failover, etc.

Functional services such as subsetting

Mapping of name-based routing to tunneling services (VPN, OSCARS, MPLS)

Overview of Catalog Workflow

[Figure, built up over ten slides: a Publisher, a Consumer, two Data storage nodes, and Catalog nodes 1-3 connected through NDN.]

(1) Publish dataset names

(2) Sync changes

(3) Query for dataset names

(4) Retrieve data
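The four workflow steps can be sketched in a few lines of Python. This is a toy in-memory model, not the project's catalog code: `CatalogNode`, `retrieve`, and the `storage` dictionary are hypothetical names, and sync is reduced to copying new names to every peer.

```python
class CatalogNode:
    """Hypothetical in-memory stand-in for one catalog instance."""

    def __init__(self, label):
        self.label = label
        self.names = set()  # the catalog holds names only, never dataset bytes
        self.peers = []

    def publish(self, names):
        # (1) A publisher registers dataset names with one catalog node.
        added = set(names) - self.names
        self.names |= added
        # (2) The change is synced to the other catalog nodes.
        for peer in self.peers:
            peer.names |= added

    def query(self, component):
        # (3) A consumer asks any node for names containing a component.
        return sorted(
            n for n in self.names if component in n.strip("/").split("/")
        )


def retrieve(storage, name):
    # (4) Data is fetched by name from data storage, not from the catalog.
    return storage[name]


nodes = [CatalogNode(f"catalog-{i}") for i in (1, 2, 3)]
for node in nodes:
    node.peers = [p for p in nodes if p is not node]

storage = {"/CMIP5/output/BCC/6hr/1998": b"<dataset bytes>"}
nodes[0].publish(storage.keys())      # publish at catalog node 1
found = nodes[2].query("6hr")         # query at catalog node 3
payload = retrieve(storage, found[0])
```

Publishing at one node and querying at another mirrors the figure: the consumer never needs to know which catalog node received the publication.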

NDN-Science Testbed

NSF CC-NIE campus infrastructure award

10G testbed (courtesy of ESnet, UCAR, and CSU Research LAN)

Currently ~50 TB of CMIP5, ~70 TB of HEP data

Demos

Search

Publication and Sync

Access control

Retrieval and failover

Conclusions

IP encourages common host access, not common data access methods

Does not encourage interoperability at the application level

NDN has the potential to unify the service interface required by scientific applications

Science testbed and prototypes to test hypotheses and drive research and experimentation

Ready-to-try catalog; we invite you to try it with your data

Catalog is general, supports a variety of applications

Currently CMIP5 and HEP applications

UI for data search and retrieval

Our sponsors: NSF and ESnet

Join us @ http://www.netsec.colostate.edu/mailman/listinfo/ndn-sci

Backup Slides

Current Example: xrootd

Fragile, fairly complex middleware

[Figure, built up over four slides: a Client, a Manager (a.k.a. Redirector) running cmsd and xrootd, and Data Servers A, B, and C, each running cmsd and xrootd. /my/file is stored on two of the data servers. Callout: "4: Try open() at A".]

xrootd under NDN

Significantly reduced system complexity

Better service abstraction

[Figure, built up over five slides: a Client and Data Servers A, B, and C (each running cmsd and xrootd) attached to an NDN cloud; no Manager (Redirector) appears. /my/file is stored on two of the data servers, and the Client sends the request "? /my/file" into NDN.]

Data Publication

[Figure, built up over six slides: message sequence between Publisher and Catalog.]

1) Catalog: listening on /<catalog-prefix>/publish

2) Publisher: generate NDN names for datasets/services

3) Publisher to Catalog: request publish

4) Catalog to Publisher: fetch published name list

5) Catalog: authenticate the Data and validate data name against trust model

6) Catalog: share names with other catalogs
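The six publication steps can be laid out as a small sketch. It assumes a simplified model in which objects call each other directly instead of exchanging Interest and Data packets; the `Publisher` and `Catalog` classes, their methods, and the example prefixes are hypothetical, and signature checking in step 5 is reduced to a namespace check.

```python
from dataclasses import dataclass, field


@dataclass
class Publisher:
    prefix: str
    pending: list = field(default_factory=list)

    def name_datasets(self, files):
        # Step 2: generate NDN-style names for local datasets.
        self.pending = [f"{self.prefix}/{f}" for f in files]

    def serve_name_list(self):
        # Step 4 (publisher side): answer the catalog's fetch.
        return list(self.pending)


@dataclass
class Catalog:
    prefix: str
    names: set = field(default_factory=set)
    peers: list = field(default_factory=list)

    def listen_name(self):
        # Step 1: the catalog listens on /<catalog-prefix>/publish.
        return f"{self.prefix}/publish"

    def on_publish_request(self, publisher):
        # Step 3 arrives here; step 4: fetch the published name list.
        name_list = publisher.serve_name_list()
        # Step 5: accept only names inside the publisher's own namespace
        # (real signature verification is omitted in this sketch).
        accepted = {n for n in name_list if n.startswith(publisher.prefix + "/")}
        self.names |= accepted
        # Step 6: share the accepted names with other catalogs.
        for peer in self.peers:
            peer.names |= accepted
        return accepted


cat_a, cat_b = Catalog("/catalog/a"), Catalog("/catalog/b")
cat_a.peers.append(cat_b)
pub = Publisher("/cmip5/lbl")
pub.name_datasets(["output/6hr/1998", "output/6hr/1999"])
accepted = cat_a.on_publish_request(pub)
```

The catalog pulls the name list rather than receiving it in the request, which matches the request-then-fetch order of steps 3 and 4.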

Keys for ndn-atmos

[Figure, built up over two slides: key hierarchy with arrows labeled "signs" from each level to the level below.]

Self-signed root key: /cmip5/KEY

Site’s keys: /cmip5/lbl/KEY, /cmip5/nwsc/KEY, …

Application’s keys: /cmip5/lbl/<DataPublisher>/KEY (dataset names publishing); /cmip5/nwsc/<operator>/KEY (NLSR)

/cmip5/nwsc/<router>/KEY
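One way to read the hierarchy is as a naming rule: a key is expected to be signed by the key one name level above it, up to the self-signed root. The sketch below encodes only that rule. It is an assumption drawn from the figure, not the ndn-atmos validator; the function names are hypothetical, and the actual operator and router key relationship used by NLSR may differ from a plain parent rule.

```python
def expected_signer(key_name: str) -> str:
    """Return the key name expected to sign key_name under a parent-signs-child rule."""
    parts = key_name.strip("/").split("/")
    if len(parts) < 2 or parts[-1] != "KEY":
        raise ValueError(f"not a key name: {key_name}")
    owner = parts[:-1]
    if len(owner) == 1:
        return key_name  # the root key is self-signed
    return "/" + "/".join(owner[:-1] + ["KEY"])


def chain_to_root(key_name: str) -> list:
    """List the keys to check, from key_name up to the self-signed root."""
    chain = [key_name]
    while expected_signer(chain[-1]) != chain[-1]:
        chain.append(expected_signer(chain[-1]))
    return chain


# "publisher1" is a made-up value for the <DataPublisher> placeholder.
chain = chain_to_root("/cmip5/lbl/publisher1/KEY")
```

A validator built on this rule would verify each signature along `chain` and accept the key only if the walk ends at the trusted root.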

Trust Model

Only namespace owners are allowed to publish data

Data provenance built into the data packet

[Figure, built up over four slides: two publish messages, each a data packet with Content Name, Signature, and Data payload.]

Valid publish message

Content Name: /PublisherA/publish

Signature: Publisher A’s signature

Data payload:

- /PublisherA/publish/file/1

- /PublisherA/publish/file/2

+ /PublisherA/publish/file/3

+ /PublisherA/publish/file/4

Invalid publish message

Content Name: /PublisherA/publish

Signature: Publisher A’s signature

Data payload:

- /PublisherB/publish/file
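The valid and invalid messages above can be checked mechanically. The sketch below assumes the payload is one name per line prefixed with "+" (add) or "-" (remove), as drawn on the slide, and uses an HMAC as a stand-in for the publisher's public-key signature; `sign`, `validate_publish`, and the demo key are hypothetical.

```python
import hashlib
import hmac


def sign(key: bytes, content_name: str, payload: str) -> str:
    # Stand-in signature: HMAC-SHA256 over content name and payload.
    message = f"{content_name}\n{payload}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def validate_publish(content_name, signature, payload, key):
    """Return (to_add, to_remove); raise ValueError for an invalid message."""
    if not hmac.compare_digest(sign(key, content_name, payload), signature):
        raise ValueError("signature does not match")
    to_add, to_remove = [], []
    for line in payload.splitlines():
        op, name = line.split(maxsplit=1)
        if op not in ("+", "-"):
            raise ValueError(f"unknown operation: {op}")
        # Only names inside the signer's own namespace are acceptable.
        if not name.startswith(content_name + "/"):
            raise ValueError(f"{name} is outside {content_name}")
        (to_add if op == "+" else to_remove).append(name)
    return to_add, to_remove


key_a = b"publisher-a-demo-key"
content_name = "/PublisherA/publish"
valid_payload = "\n".join([
    "- /PublisherA/publish/file/1",
    "- /PublisherA/publish/file/2",
    "+ /PublisherA/publish/file/3",
    "+ /PublisherA/publish/file/4",
])
invalid_payload = "- /PublisherB/publish/file"

adds, removes = validate_publish(
    content_name, sign(key_a, content_name, valid_payload), valid_payload, key_a
)
```

The second message fails even though Publisher A signed it correctly, because the name it lists belongs to Publisher B's namespace.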

Name Discovery

[Figure, built up over six slides: message sequence between Consumer and Catalog.]

1) Catalog: listening on /<catalog-prefix>/query

2) Consumer to Catalog: query with parameters (model=cmip5 AND frequency=6hr)

3) Catalog: query local DB; packetize results under /<catalog-prefix>/query-results/<params>; ACK to Consumer

4) Consumer: fetch query results (name list)

5) Consumer: fetch desired dataset(s) or re-query
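Step 3 (query the local DB, then packetize the results) can be sketched with Python's built-in sqlite3 module. The table layout, the two searchable fields, the segment size, and the sample names are assumptions for illustration; the slides do not describe the catalog's actual schema or segment encoding.

```python
import sqlite3

# Hypothetical catalog table: one row per published name plus searchable fields.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE names (name TEXT, model TEXT, frequency TEXT)")
db.executemany(
    "INSERT INTO names VALUES (?, ?, ?)",
    [
        ("/CMIP5/output/BCC/6hr/1998", "cmip5", "6hr"),
        ("/CMIP5/output/BCC/6hr/1999", "cmip5", "6hr"),
        ("/CMIP5/output/BCC/6hr/2000", "cmip5", "6hr"),
        ("/CMIP5/output/BCC/mon/1998", "cmip5", "mon"),
    ],
)

SEARCHABLE = ("model", "frequency")


def query_names(params):
    """AND together the requested field=value pairs and return matching names."""
    if not params or any(k not in SEARCHABLE for k in params):
        raise ValueError("unsupported query parameters")
    keys = sorted(params)
    # Column names come from the whitelist above; values are bound parameters.
    where = " AND ".join(f"{k} = ?" for k in keys)
    rows = db.execute(
        f"SELECT name FROM names WHERE {where} ORDER BY name",
        [params[k] for k in keys],
    )
    return [row[0] for row in rows]


def packetize(catalog_prefix, params, names, per_segment=2):
    """Split a name list into segments named under query-results/<params>."""
    tag = "&".join(f"{k}={params[k]}" for k in sorted(params))
    base = f"{catalog_prefix}/query-results/{tag}"
    return {
        f"{base}/{i // per_segment}": names[i:i + per_segment]
        for i in range(0, len(names), per_segment)
    }


params = {"model": "cmip5", "frequency": "6hr"}
segments = packetize("/catalog", params, query_names(params))
```

Each dictionary key plays the role of a Data packet name that the consumer fetches in step 4.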

Data Publication

Catalog

Accept publish requests: /<catalog-prefix>/publish

Authenticate and retrieve data names from publisher

Sync names with other catalogs

Publisher

Generate NDN names for datasets/services

Inform catalog of names to add/remove

[Figure, built up over five slides: message sequence between Publisher and Catalog: request publish; fetch published name list; validate data name against trust model; share names with other catalogs.]

Name Discovery

Catalog

Accept queries on /<catalog-prefix>/query

Query local DB

Packetize the returned names under /<catalog-prefix>/query-results/<params>

User

Query catalog for names with specified components, e.g.: model=cmip5 AND frequency=6hr

Fetch generated name list

Fetch desired dataset(s) or re-query

[Figure, built up over six slides: message sequence between Consumer and Catalog: query with parameters; query local DB and packetize results; ACK; fetch query results; fetch data with standard NDN.]

Name Discovery Optimization

Catalog

Accept queries on /<catalog-prefix>/queryParams

Query local DB

Packetize the returned names under /<catalog-prefix>/queryParams/seg#

In case of failure, queries get redirected to another catalog

Consumers

Can query any catalog instance

Can transparently fail over to another catalog

• Avoids maintaining state between user and catalog

• Enables graceful failover
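The optimization works because the query parameters travel inside the name, so any catalog instance can answer any segment without remembering an earlier exchange. A minimal sketch follows, assuming a JSON-plus-percent-encoding scheme for the queryParams component; that encoding, the `CatalogInstance` class, and the explicit retry loop are illustrative choices, and in NDN the redirection would be done by the network rather than by the consumer.

```python
import json
from urllib.parse import quote, unquote


def encode_query(catalog_prefix, params, segment):
    # Canonical JSON keeps the name identical no matter how params were ordered.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return f"{catalog_prefix}/{quote(canonical, safe='')}/{segment}"


def decode_query(name):
    prefix, encoded, segment = name.rsplit("/", 2)
    return prefix, json.loads(unquote(encoded)), int(segment)


class CatalogInstance:
    """Hypothetical catalog replica holding the full synchronized name set."""

    def __init__(self, names, up=True, per_segment=2):
        self.names, self.up, self.per_segment = names, up, per_segment

    def answer(self, name):
        if not self.up:
            raise ConnectionError("catalog unreachable")
        _, params, segment = decode_query(name)
        hits = sorted(
            n for n in self.names
            if all(v in n.strip("/").split("/") for v in params.values())
        )
        start = segment * self.per_segment
        return hits[start:start + self.per_segment]


def fetch(catalogs, name):
    # Stand-in for network-level redirection: try replicas until one answers.
    for catalog in catalogs:
        try:
            return catalog.answer(name)
        except ConnectionError:
            continue
    raise ConnectionError("no catalog reachable")


names = [f"/CMIP5/output/BCC/6hr/{year}" for year in (1998, 1999, 2000)]
replicas = [CatalogInstance(names, up=False), CatalogInstance(names)]
second_segment = fetch(replicas, encode_query("/catalog", {"frequency": "6hr"}, 1))
```

Because the segment number and parameters are both in the name, a consumer that fetched segment 0 from one replica can fetch segment 1 from another.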

Simplified xrootd Under NDN

NDN integrates discovery, failover, retrieval …

Provides a better abstraction to the applications

[Figure, built up over five slides: a Client and Data Servers A, B, and C (each running cmsd and xrootd) attached to an NDN cloud. /my/file is stored on two of the data servers, and the Client sends the request "? /my/file" into NDN.]

Name Discovery Challenges

Users may need to discover content/services without knowing the full NDN name prefix structure

NDN names are contiguous prefixes

Users may only know a few disjoint name components (e.g. frequency=6hr)

But cannot use wildcards for name discovery

[Figure, built up over five slides: a Consumer exploring the namespace through NDN.]

User wants: /CMIP5/output1/VA/6hr/2016

Consumer requests /CMIP5

NDN returns /CMIP5/output/BCC/6hr/1998

Consumer requests /CMIP5/output/BCC/6hr (exclude 1998)

. . .

May take too many requests to find desired data or service
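The cost difference can be illustrated with a toy comparison. The sketch below is a simplification: exclusion is modeled on whole names rather than on single name components, each request returns exactly one name, and the published name list is made up for the example.

```python
def discover_by_exclusion(published, prefix, wanted):
    """Count prefix requests when each reply returns one not-yet-excluded name."""
    excluded, requests = set(), 0
    while True:
        requests += 1
        candidates = sorted(
            n for n in published
            if n.startswith(prefix + "/") and n not in excluded
        )
        if not candidates:
            return None, requests
        if candidates[0] == wanted:
            return wanted, requests
        excluded.add(candidates[0])  # ask again, excluding what was just seen


def match_components(published, components):
    """Catalog-style lookup: names containing every component, in any position."""
    return sorted(
        n for n in published if components <= set(n.strip("/").split("/"))
    )


published = [f"/CMIP5/output/BCC/6hr/{year}" for year in range(1998, 2017)]
wanted = "/CMIP5/output/BCC/6hr/2016"

found, requests = discover_by_exclusion(published, "/CMIP5", wanted)
direct = match_components(published, {"6hr", "2016"})
```

With 19 names under the prefix, the exclusion walk needs 19 requests to reach the 2016 entry, while the component match answers in a single catalog query.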

NDN Support for Big Science

NDN names separate data from hosts

Discovery: Names directly translate to network queries

Failover: Network can get verifiable data from anywhere

Retrieval: Data can be fetched from optimal source(s)

Investigate the use of NDN as a platform for scientific data applications

Understand data management challenges of various scientific domains

Develop prototype applications to leverage NDN's built-in features

Use these applications as case studies to drive NDN research aspects

Summary

NDN improves scientific data management at scale

Apps benefit from transparent multipath, automatic failover, etc.

Built-in security provides publisher provenance

Names are the common building block for content and services

Names are flexible: can refer to static content or dynamic services

Catalog supports efficient publication, non-contiguous name discovery

Users can discover content and services with minimal a priori knowledge

Catalog validates publication requests for authorization

Managing Scientific Data with NDN

Science testbed

10G testbed (courtesy of ESnet, UCAR, and CSU Research LAN)

Nodes strategically located near scientific data (climate + HEP)

CC-NIE NSF award

Distributed, synchronized catalog of names and services

Common functionality: publishing, discovery, access control, etc.

Search and retrieval UI

Platform for further research and experimentation

Research questions:

Namespace construction, distributed publishing, key management, UI design, failover, etc.

Functional services such as subsetting

Mapping of name-based routing to tunneling services (VPN, OSCARS, MPLS)

Managing Scientific Data with NDN

Science testbed

10G testbed (courtesy of ESnet, UCAR, and CSU Research LAN)

CMIP5 and HEP data

CC-NIE NSF award

Name-based Internet architecture

Name the data, not the host

All data digitally signed

Unifies and pushes common functionality to the network: publishing, discovery, access control, etc.

Data-intensive applications

Automatic pervasive in-network caching, parallel retrieval, automatic failover, and more

Simpler alternative middleware implementation, e.g., ESGF, xrootd
