presentation.ppt - csc.lsu.edubalman/pdfs/prst/mslsu_presentation.pdf · 11/12/2008 3 data deluge...

17
11/12/2008 1 FailureAwareness and Dynamic Adaptation in Data Scheduling Mehmet Balman Mehmet Balman MS Thesis Department of Computer Science Louisiana State University Research Goal “Reliability and Efficiency” for widearea Data Access

Upload: others

Post on 16-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: presentation.ppt - csc.lsu.edubalman/pdfs/prst/MSLSU_presentation.pdf · 11/12/2008 3 Data Deluge • Scientific and Business applications becoming more data‐intensive • Huge

11/12/2008

1

Failure‐Awareness and Dynamic Adaptation in Data Scheduling

Mehmet BalmanMehmet BalmanMS Thesis 

Department of Computer Science

Louisiana State University

Research Goal

“Reliability and Efficiency” for wide‐area Data Access

Page 2: presentation.ppt - csc.lsu.edubalman/pdfs/prst/MSLSU_presentation.pdf · 11/12/2008 3 Data Deluge • Scientific and Business applications becoming more data‐intensive • Huge

11/12/2008

2

Outline

• The Data Placement Challenge

• Lessons Learned from Computer Architecture

• Adaptive Data Scheduling

• Failure‐Aware Data Placement

• Conclusion

Large Scale Applications

• Science• Astronomy ‐ SuperNova LSST(Large Synoptic Survey Telescope)Astronomy  SuperNova, LSST(Large Synoptic Survey Telescope) • Biology (bimolecular computing) • Climate research

• High Energy Physics (Cern)

• Business• Credit Card Fraud detection

• (historical data analyze transactions)• (historical data, analyze transactions)

• Data mining for brokerage and customer services

• Oil and electronic design companies 

• (long term batch processes) • Medical institutions 

• (computational network, large image transfers)

Page 3: presentation.ppt - csc.lsu.edubalman/pdfs/prst/MSLSU_presentation.pdf · 11/12/2008 3 Data Deluge • Scientific and Business applications becoming more data‐intensive • Huge

11/12/2008

3

Data Deluge

• Scientific and Business applications becoming more data‐intensive

• Huge Computational requirements

• Immense data sets (real time processing of data)

Data‐intensive Computing

• Using Distributed Resources to satisfy i t ti i texcessive computation requirements

• Data to be shared between geographically distributed sites

• Complex workflow characteristics• Complex workflow characteristics• High capacity, fast storage systems

Page 4: presentation.ppt - csc.lsu.edubalman/pdfs/prst/MSLSU_presentation.pdf · 11/12/2008 3 Data Deluge • Scientific and Business applications becoming more data‐intensive • Huge

11/12/2008

4

Data Scheduling

• Make data placement a first class citizen 

• Orchestrating data placement jobs

Stork   www.storkproject.org

Data‐Aware System Model

Page 5: presentation.ppt - csc.lsu.edubalman/pdfs/prst/MSLSU_presentation.pdf · 11/12/2008 3 Data Deluge • Scientific and Business applications becoming more data‐intensive • Huge

11/12/2008

5

Key Attributes affecting Data Placement Performance

In  Single Host

Between a Pair of Hosts

Multiple Servers to

Between DistributedHost Pair of Hosts Servers to 

Single ServerDistributed Servers

Available Storage SpaceCPU Load and Memory UsageTransfer Protocol PerformanceNumber of ParallelConnectionsConnections

Network Bandwidth and Latency Number of Concurrent OperationsOrdering of Data Placement Tasks

Contribution

• Failure‐Aware Data Placement Paradigm for increased Fault‐Tolerance

• Adaptive Scheduling of Data Placement Tasks

Page 6: presentation.ppt - csc.lsu.edubalman/pdfs/prst/MSLSU_presentation.pdf · 11/12/2008 3 Data Deluge • Scientific and Business applications becoming more data‐intensive • Huge

11/12/2008

6

Outline

• The Data Placement Challenge

• Lessons Learned from Computer Architecture

• Adaptive Data Scheduling

• Failure‐Aware Data Placement

• Conclusion

Generic Model

Page 7: presentation.ppt - csc.lsu.edubalman/pdfs/prst/MSLSU_presentation.pdf · 11/12/2008 3 Data Deluge • Scientific and Business applications becoming more data‐intensive • Huge

11/12/2008

7

Microprocessor

Operating System

Page 8: presentation.ppt - csc.lsu.edubalman/pdfs/prst/MSLSU_presentation.pdf · 11/12/2008 3 Data Deluge • Scientific and Business applications becoming more data‐intensive • Huge

11/12/2008

8

Distributed Systems

Outline

• The Data Placement Challenge

• Lessons Learned from Computer Architecture

• Adaptive Data Scheduling

• Failure‐Aware Data Placement

• Conclusion

Page 9: presentation.ppt - csc.lsu.edubalman/pdfs/prst/MSLSU_presentation.pdf · 11/12/2008 3 Data Deluge • Scientific and Business applications becoming more data‐intensive • Huge

11/12/2008

9

Adaptive Scheduling

• Dynamic Parameter Tuning– Parallel Stream

• Aggregate TCP connections

– Concurrent Jobs

• Aggregation of Data Placement JobAggregation of Data Placement Job• Source/Destination pair

Impact of Parallelism

Page 10: presentation.ppt - csc.lsu.edubalman/pdfs/prst/MSLSU_presentation.pdf · 11/12/2008 3 Data Deluge • Scientific and Business applications becoming more data‐intensive • Huge

11/12/2008

10

Concurrent Jobs

Dynamic Parameter Setting

• Low integration cost (no external profilers)

• Adapt to changing network conditions

• No high level predictors

• Increase level of parallelism gradually

• Can we set the number of parallel streams while transfer is in progress?

Page 11: presentation.ppt - csc.lsu.edubalman/pdfs/prst/MSLSU_presentation.pdf · 11/12/2008 3 Data Deluge • Scientific and Business applications becoming more data‐intensive • Huge

11/12/2008

11

Adaptive Tuning of Parallel Streams

Adaptive Tuning of Parallel Streams

Page 12: presentation.ppt - csc.lsu.edubalman/pdfs/prst/MSLSU_presentation.pdf · 11/12/2008 3 Data Deluge • Scientific and Business applications becoming more data‐intensive • Huge

11/12/2008

12

Job Aggregation

• Aggregate data transfer jobs into a single job

• Eliminate the cost of connection for each transfer

• Major performance improvement E i ll i h ll fil– Especially with small files

Job Aggregation

Page 13: presentation.ppt - csc.lsu.edubalman/pdfs/prst/MSLSU_presentation.pdf · 11/12/2008 3 Data Deluge • Scientific and Business applications becoming more data‐intensive • Huge

11/12/2008

13

Outline

• The Data Placement Challenge

• Lessons Learned from Computer Architecture

• Adaptive Data Scheduling

• Failure‐Aware Data Placement

• Conclusion

Failure‐Awareness

• Early Error Detection– Network Exploration

• Error Classification and Reporting

• Adapt  to Failures (Retry?)

Page 14: presentation.ppt - csc.lsu.edubalman/pdfs/prst/MSLSU_presentation.pdf · 11/12/2008 3 Data Deluge • Scientific and Business applications becoming more data‐intensive • Huge

11/12/2008

14

Error Reporting Framework

Data Transfer Life Cycle

Tracing Data Transfer Operations

Page 15: presentation.ppt - csc.lsu.edubalman/pdfs/prst/MSLSU_presentation.pdf · 11/12/2008 3 Data Deluge • Scientific and Business applications becoming more data‐intensive • Huge

11/12/2008

15

Integration

Failure‐Awareness

Page 16: presentation.ppt - csc.lsu.edubalman/pdfs/prst/MSLSU_presentation.pdf · 11/12/2008 3 Data Deluge • Scientific and Business applications becoming more data‐intensive • Huge

11/12/2008

16

Outline

• The Data Placement Challenge

• Lessons Learned from Computer Architecture

• Adaptive Data Scheduling

• Failure‐Aware Data Placement

• Conclusion

Conclusion

• An Adaptive Approach for Parameter Tuning

• Early Error Detection and Error Classification

• Failure‐Awareness in Scheduling

• Aggregation of Data Placement Jobs

Page 17: presentation.ppt - csc.lsu.edubalman/pdfs/prst/MSLSU_presentation.pdf · 11/12/2008 3 Data Deluge • Scientific and Business applications becoming more data‐intensive • Huge

11/12/2008

17

Broader Impact

• Stork– http://www.storkproject.org/

• Petashare (petaFS & petaShell)– http://www.petashare.org/

• I/O aggregation • IRODS FUSE and IRODS Parrot clients

– 3‐fold performance increase

• Stork.globus‐url‐copy – Extending globus‐url‐copy 

• New features:– Checkpointing (rescue file for restart)– Network explorations– Checksum verificationAuto Tuning the number of Parallel Streams

Future Research Problems

• Semantic Compression– For better end‐to‐end performance

• Utilizing Replicated Data 

• Distributed Scheduling– Job delegation