Tape Access Optimization with TReqS Faster, Better, Stronger Andrés Gómez Casanova Storage Team IN2P3 Computing Center / CNRS 18/03/2010 1 TReqS - IN2P3 Computing Center

Upload: andres-gomez-casanova

Post on 08-Jun-2015


Page 1: Tape Access Optimization With TReqS

Tape Access Optimization with TReqS: Faster, Better, Stronger

Andrés Gómez Casanova

Storage Team

IN2P3 Computing Center / CNRS

18/03/2010

TReqS - IN2P3 Computing Center

Page 2: Tape Access Optimization With TReqS

Outline

Context
Problem
Solution
Results and numbers
Conclusion

Page 4: Tape Access Optimization With TReqS

Context: Hierarchy

HPSS is used as a backend system.

dCache and xRootD are the frontend systems.

They communicate with HPSS via RFIO.
– UNIX-like commands (rfdir, rfcp, rfchmod…)
– One file transfer = One rfcp

[Diagram: Users, Frontend (dCache, xRootD), RFIO, Backend (HPSS).]

Page 5: Tape Access Optimization With TReqS

Context: Users and data

LHC experiments are important clients.
Massive reads.
– Staging campaigns of several thousand files.
– Reads at a rate of 400 MB/s.
Experiments share resources (tapes, movers, libraries).
Reads from clients cannot be throttled.
Data is spread across multiple tapes.
Simultaneous massive reads from different clients are possible.
– This has a big impact on HPSS.

Page 6: Tape Access Optimization With TReqS

Context: HPSS and tapes

Mounting a tape into a drive can take 90 seconds.

Positioning a tape can take 60 seconds.

HPSS processes requests in FIFO order.

Page 7: Tape Access Optimization With TReqS

Context

N requests for different files on M tapes could imply N mounts.
– Where M << N.
N requests for different files on one tape could imply N positionings.
– Files are read in reverse order.

Page 8: Tape Access Optimization With TReqS

Context: Files in multiple tapes

Tape 1: File 1, File 2, File 3
Tape 2: File 5, File 6
Tape 3: File 7, File 8, File 9
Tape 4: File 10

Requests:
• File 5
• File 3
• File 9
• File 1
• File 6

Drive: Tape 2 (File 5, File 6), Tape 1 (File 1, File 2, File 3), Tape 3 (File 7, File 8, File 9)

Page 9: Tape Access Optimization With TReqS

Context: Files in multiple tapes

Tape 1: File 1, File 2, File 3
Tape 2: File 5, File 6
Tape 3: File 7, File 8, File 9
Tape 4: File 10

Requests:
• File 5
• File 3
• File 9
• File 1
• File 6

Drive: Tape 1 (File 1, File 2, File 3), Tape 2 (File 5, File 6)
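The cost of serving this request list in arrival order can be estimated with a small sketch. It is an illustrative model, not TReqS code: it assumes a single drive, that every change of tape costs one mount, and it uses the 90-second mount time quoted earlier. The names `TAPE_OF`, `REQUESTS` and `count_mounts` are made up for the example.

```python
# Illustrative cost model, not TReqS code. Assumes one drive and that every
# change of tape costs one mount.
MOUNT_SECONDS = 90  # mount time quoted in the slides

# File-to-tape layout and request order taken from the slide example.
TAPE_OF = {1: 1, 2: 1, 3: 1, 5: 2, 6: 2, 7: 3, 8: 3, 9: 3, 10: 4}
REQUESTS = [5, 3, 9, 1, 6]


def count_mounts(file_order):
    """Count tape mounts when files are read in the given order."""
    mounts = 0
    mounted = None
    for file_id in file_order:
        tape = TAPE_OF[file_id]
        if tape != mounted:
            mounts += 1
            mounted = tape
    return mounts


fifo_mounts = count_mounts(REQUESTS)

# Group requests by tape; the stable sort keeps arrival order within a tape.
grouped = sorted(REQUESTS, key=lambda file_id: TAPE_OF[file_id])
grouped_mounts = count_mounts(grouped)

print(fifo_mounts, fifo_mounts * MOUNT_SECONDS)        # 5 mounts, 450 s
print(grouped_mounts, grouped_mounts * MOUNT_SECONDS)  # 3 mounts, 270 s
```

Under these assumptions, grouping the five requests by tape saves two mounts, about three minutes of drive time.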

Page 10: Tape Access Optimization With TReqS

Context: Files within one tape

[Diagram: File 1 to File 9 laid out in sequence between the tape’s beginning and the tape’s ending, with the drive head marked.]

Requests:
• File 5
• File 3
• File 9
• File 1
• File 6
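A rough way to see the effect of request order on a single tape is to add up how far the head travels. The sketch below is only an approximation under stated assumptions: positions are file indices (File 1 nearest the tape's beginning), the head starts at position 0, and travel is measured in file slots rather than seconds. `head_travel` is a name invented for the example.

```python
# Illustrative only: positions are file indices along the tape and the head is
# assumed to start at position 0. Real drives locate by block address, and
# positioning time is not strictly proportional to distance.
REQUESTS = [5, 3, 9, 1, 6]


def head_travel(order, start=0):
    """Total distance (in file slots) the head moves to visit files in order."""
    travel = 0
    position = start
    for target in order:
        travel += abs(target - position)
        position = target
    return travel


fifo_travel = head_travel(REQUESTS)            # 5 + 2 + 6 + 8 + 5 = 26
sorted_travel = head_travel(sorted(REQUESTS))  # 1 + 2 + 2 + 1 + 3 = 9
print(fifo_travel, sorted_travel)
```

With this toy metric, reading the same five files in position order moves the head about a third as far as reading them in arrival order.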

Page 12: Tape Access Optimization With TReqS

Problem

The data delivery is slow.
The required transfer rate cannot be reached if reads are not controlled.
A “grid job” that reads several files could have a longer wait time when requesting a file.
A tape can be mounted several times in a short period of time.
A tape can be forwarded and rewound several times.

Page 14: Tape Access Optimization With TReqS

Attempted manual solution

Ask for the list of files to stage.
Read the HPSS metadata and sort the files according to:
– Which tape they are stored on.
– The position on the tape.
Stage them manually.
– hpss_cache command.
This takes a lot of manual work and it is very slow.
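The sorting step of this manual procedure can be sketched as follows. This is a minimal illustration, assuming the tape name and position of each file have already been read from the HPSS metadata; the `StagedFile` record, the paths and the tape labels are hypothetical, and the sketch does not call hpss_cache.

```python
from dataclasses import dataclass
from itertools import groupby
from operator import attrgetter


@dataclass(frozen=True)
class StagedFile:
    path: str      # hypothetical HPSS path
    tape: str      # tape the file is stored on
    position: int  # position of the file on that tape


def staging_plan(files):
    """Return [(tape, [paths in position order]), ...], one entry per tape."""
    ordered = sorted(files, key=attrgetter("tape", "position"))
    return [
        (tape, [f.path for f in group])
        for tape, group in groupby(ordered, key=attrgetter("tape"))
    ]


files = [
    StagedFile("/hpss/exp/f5", "T2", 10),
    StagedFile("/hpss/exp/f3", "T1", 30),
    StagedFile("/hpss/exp/f1", "T1", 5),
    StagedFile("/hpss/exp/f6", "T2", 20),
]
plan = staging_plan(files)
for tape, paths in plan:
    print(tape, paths)
```

Each tape appears once in the plan, and its files are listed in the order in which the head would meet them.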


Page 15: Tape Access Optimization With TReqS

Desired solution

A software component that sorts read requests by querying the files’ metadata in HPSS.
A software component with a client/server architecture.

Page 16: Tape Access Optimization With TReqS

State of the art

The BNL approach tried to solve this problem.
– Client/server-like architecture.
– Requests are stored in files.
– Written by David Yu, in C.
– Based on the Oak Ridge Batch Scheduler.
It was adapted for the needs of CC IN2P3.
– In production since August 2009 for dCache.
However, it does not meet our site requirements.

Page 17: Tape Access Optimization With TReqS

Proposed solution

A piece of software called TReqS.
– Tape Request Scheduler.
Robust.
– Requests are stored in a database.
Multi-client and scalable to thousands of clients.

Page 18: Tape Access Optimization With TReqS

TReqS: Features

The end user gets a comprehensible error message when there is a problem with a request.

Generates metrics for monitoring and accounting.

Modular client:
– It wraps the RFIO rfcp command.
– Can use any other file transfer mechanism.
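One way to picture a modular client is a function that takes the transfer mechanism as a parameter. The sketch below is an assumption about how such a design could look, not the actual TReqS client: `fetch` and `rfcp_transfer` are hypothetical names, and the only external call is `rfcp source destination`, which requires the RFIO tools to be installed.

```python
import subprocess
from typing import Callable

# A transfer mechanism is any callable taking (source, destination).
Transfer = Callable[[str, str], None]


def rfcp_transfer(source: str, destination: str) -> None:
    """Copy one file with the RFIO rfcp command (must be on the PATH)."""
    subprocess.run(["rfcp", source, destination], check=True)


def fetch(source: str, destination: str,
          transfer: Transfer = rfcp_transfer) -> None:
    """Hypothetical client step: hand the copy to the chosen mechanism."""
    transfer(source, destination)


# Usage with a stand-in mechanism, so the example runs without rfcp.
copied = []
fetch("/hpss/exp/file1", "/tmp/file1",
      transfer=lambda src, dst: copied.append((src, dst)))
print(copied)
```

Swapping the transfer mechanism then means passing a different callable, with no change to the rest of the client.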


Page 19: Tape Access Optimization With TReqS

TReqS: New Hierarchy

[Diagram: Frontend (dCache, xRootD), TReqS and Backend (HPSS); two connections are labelled RFIO.]

Page 20: Tape Access Optimization With TReqS

TReqS: How this stuff works

[Diagram: Frontend with the TReqS Client, TReqS Server, DB (MySQL), HPSS and RFIO, connected by steps numbered 1 to 6.]

Page 21: Tape Access Optimization With TReqS

TReqS: Terminology

FileRequest:
– File to be transferred.
Queue:
– Ordered set of files to stage from the same tape.
Activator:
– Processes the queues at specified time intervals.
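These three terms can be pictured as small data structures. The sketch below is a simplified illustration, not the TReqS classes (which are written in C++): the field names, the `submit` and `run_once` methods and the `stage` callback are assumptions, and the timer that would call `run_once` at each interval is left out.

```python
from dataclasses import dataclass, field


@dataclass
class FileRequest:
    path: str      # file to be transferred
    tape: str      # tape holding the file
    position: int  # position of the file on the tape


@dataclass
class Queue:
    tape: str
    files: list = field(default_factory=list)

    def add(self, request):
        # Keep the files of one tape ordered by position.
        self.files.append(request)
        self.files.sort(key=lambda r: r.position)


class Activator:
    """Processes the queues; run_once() would be called at each interval."""

    def __init__(self, stage):
        self.queues = {}
        self.stage = stage  # callback that stages one FileRequest

    def submit(self, request):
        queue = self.queues.setdefault(request.tape, Queue(request.tape))
        queue.add(request)

    def run_once(self):
        # Process every pending queue, one tape at a time, then clear them.
        for tape in sorted(self.queues):
            for request in self.queues[tape].files:
                self.stage(request)
        self.queues.clear()


staged = []
activator = Activator(stage=lambda r: staged.append(r.path))
for path, tape, pos in [("f5", "T2", 1), ("f3", "T1", 3), ("f9", "T3", 3),
                        ("f1", "T1", 1), ("f6", "T2", 2)]:
    activator.submit(FileRequest(path, tape, pos))
activator.run_once()
print(staged)  # ['f1', 'f3', 'f5', 'f6', 'f9']
```

Requests arrive in any order, but each tape's files are staged together and in position order.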


Page 22: Tape Access Optimization With TReqS

TReqS

Allows resources (drives) to be reserved per client.
– Minimum resources per client when there are many simultaneous requests.
Algorithms use several criteria to select the files to stage:
– “Best User”
• User with the most files / the most space used on a tape.
– “Best Queue” (Tape read)
• Largest number of files.
• Largest files.
• Oldest request.
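A “Best Queue” choice over these criteria could be sketched as a single comparison key. The slides do not say how the criteria are combined, so the order used here (file count first, then total size, then age of the oldest request) is an assumption, as are the `QueueStats` fields. “Best User” selection and drive reservation are not covered.

```python
from dataclasses import dataclass


@dataclass
class QueueStats:
    tape: str
    file_count: int        # number of files waiting on this tape
    total_bytes: int       # combined size of those files
    oldest_request: float  # submission time; a smaller value means older


def best_queue(queues):
    """Pick the queue with most files, then largest size, then oldest request.

    The priority order of the three criteria is an assumption.
    """
    return max(
        queues,
        key=lambda q: (q.file_count, q.total_bytes, -q.oldest_request),
    )


queues = [
    QueueStats("T1", 2, 4_000, 100.0),
    QueueStats("T2", 2, 4_000, 50.0),
    QueueStats("T3", 1, 9_000, 10.0),
]
print(best_queue(queues).tape)  # T2
```

T1 and T2 tie on file count and size, so the older request on T2 decides; T3 loses despite holding the largest and oldest file because it has fewer files.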


Page 23: Tape Access Optimization With TReqS

TReqS tuning

Each component is an independent class.
The algorithms of the classes can be changed in order to improve the operation.
All parameters can be modified dynamically in the configuration file.
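Dynamic parameters can be approximated by re-reading the configuration file whenever the values are needed, for example at the start of each processing cycle. The sketch below shows that idea only: the section name `treqs`, the two parameter names and their defaults are hypothetical, and the real TReqS configuration format may differ.

```python
import configparser
import os
import tempfile

# Hypothetical parameters and defaults, used when the file omits them.
DEFAULTS = {"activator_interval_seconds": "60", "max_drives_per_client": "2"}


def load_settings(path):
    """Read the settings afresh; call again to pick up edits to the file."""
    parser = configparser.ConfigParser()
    parser.read_dict({"treqs": DEFAULTS})
    parser.read(path)  # a missing file is ignored, leaving the defaults
    section = parser["treqs"]
    return {key: section.getint(key) for key in DEFAULTS}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "treqs.conf")
    with open(path, "w") as handle:
        handle.write("[treqs]\nactivator_interval_seconds = 30\n")
    first = load_settings(path)

    # Simulate an operator editing the file while the server is running.
    with open(path, "w") as handle:
        handle.write("[treqs]\nactivator_interval_seconds = 10\n"
                     "max_drives_per_client = 4\n")
    second = load_settings(path)

print(first)
print(second)
```

Re-reading on every cycle is the simplest option; watching the file for changes would avoid repeated parsing at the cost of more code.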


Page 24: Tape Access Optimization With TReqS

TReqS: Technical details

Written in C++ (OOP).
Transactional database (MySQL).
Model Driven Engineering (MDE).
Several unit tests (CUTE).
Log4cxx for logging.
Doxygen for the documentation.
Open Source license.
– GPL-like license (CeCILL).
Git as Distributed Version Control System.
– Public access from: https://git.in2p3.fr/cgit/treqs

Page 26: Tape Access Optimization With TReqS

Results

Reduced mounting time.
Less “waiting time” for batch jobs in the computing farm.
HPSS can handle more requests simultaneously.
Fewer robotic issues.
– Fewer mounts for the same batch process.

Page 27: Tape Access Optimization With TReqS

Roadmap

First BNL adaptation was deployed in production in July 2009.
– Several bugs fixed.
– Real-time monitoring.
TReqS version 1.0.
– Will be deployed shortly, in April.
– Tests are being finished.

Page 28: Tape Access Optimization With TReqS

Requirements

Hardware:
– A virtual machine with at least 512 MB RAM.
– Access to a MySQL database.
– HPSS API.

Page 30: Tape Access Optimization With TReqS

Conclusion

TReqS has improved the way HPSS is used.

It reduces redundant tape mounts and lowers seek time on tapes.

It provides file access control per experiment.

Forge:
– https://forge.in2p3.fr/projects/show/treqs

Page 31: Tape Access Optimization With TReqS

Thanks for your attention

Questions?


https://forge.in2p3.fr/projects/show/treqs