Tape Access Optimization with TReqS
Faster, Better, Stronger
Andrés Gómez Casanova
Storage Team
IN2P3 Computing Center / CNRS
18/03/2010
TReqS - IN2P3 Computing Center
Outline
Context
Problem
Solution
Results and numbers
Conclusion
Context: Hierarchy
HPSS is used as a backend system.
dCache and xRootD are the frontend systems.
They communicate with HPSS via RFIO.
– UNIX-like commands (rfdir, rfcp, rfchmod…)
– One file transfer = One rfcp
[Diagram: users access the frontend (dCache, xRootD), which reaches the backend (HPSS) via RFIO.]
Context: Users and data
LHC experiments are important clients, and they read massively.
– Staging campaigns of several thousand files.
– Reads at a rate of 400 MB/s.
Experiments share resources (tapes, movers, libraries).
Reads from clients cannot be throttled.
Data is spread across multiple tapes.
Simultaneous massive reads from different clients are possible.
– This has a big impact on HPSS.
Context: HPSS and tapes
A tape could take 90 seconds to mount into a drive.
The positioning of a tape can take 60 seconds.
HPSS processes the requests in a FIFO fashion.
Context
N requests for different files on M tapes can require N mounts.
– Where M << N.
N requests for different files on one tape can require N positioning operations.
– Worst case: files are read in reverse order.
18/03/2010TReqS - IN2P3 Computing Center 7
Context: Files in multiple tapes
Tape 1: File 1, File 2, File 3
Tape 2: File 5, File 6
Tape 3: File 7, File 8, File 9
Tape 4: File 10
Requests: File 5, File 3, File 9, File 1, File 6
[Diagram: the drive mounts Tape 2, then Tape 1, then Tape 3.]
[Diagram, continued: same tapes and requests; the drive then mounts Tape 1 and Tape 2.]
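The cost of serving the example requests in arrival order can be estimated with a short sketch. It assumes a single drive, that a tape stays mounted until a request needs a different tape, and that every mount costs the 90 seconds quoted on the "HPSS and tapes" slide. The function and variable names are illustrative and are not part of HPSS or TReqS.

```python
from itertools import groupby

MOUNT_SECONDS = 90  # mount time quoted on the "HPSS and tapes" slide

# File-to-tape layout taken from the diagram.
TAPE_OF = {1: 1, 2: 1, 3: 1, 5: 2, 6: 2, 7: 3, 8: 3, 9: 3, 10: 4}


def count_mounts(requests, tape_of=TAPE_OF):
    """Count mounts on one drive when requests are served in the given order.

    Assumption: the tape stays in the drive until a different tape is needed.
    """
    tapes_in_order = [tape_of[f] for f in requests]
    # Consecutive requests for the same tape share one mount.
    return sum(1 for _ in groupby(tapes_in_order))


def group_by_tape(requests, tape_of=TAPE_OF):
    """Reorder requests so that all files of one tape are served together."""
    return sorted(requests, key=lambda f: tape_of[f])


requests = [5, 3, 9, 1, 6]  # arrival order from the slide
fifo_mounts = count_mounts(requests)
grouped_mounts = count_mounts(group_by_tape(requests))
print(fifo_mounts, fifo_mounts * MOUNT_SECONDS)        # 5 mounts, 450 s
print(grouped_mounts, grouped_mounts * MOUNT_SECONDS)  # 3 mounts, 270 s
```

Under these assumptions, grouping the five requests by tape saves two mounts, and the gap grows with the number of requests per tape.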
Context: Files within one tape
[Diagram: one tape holding File 1 to File 9 in order, from the tape’s beginning to the tape’s ending, with the read head.]
Requests: File 5, File 3, File 9, File 1, File 6
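The effect of request order on head movement within one tape can be illustrated with a toy model. It assumes the nine files are equal-sized slots numbered 1 to 9, that the head starts at the beginning of the tape (slot 0), and that only head travel between requests is counted, not the time spent reading. None of these simplifications come from HPSS itself.

```python
def head_travel(positions, start=0):
    """Total distance the head moves to visit positions in the given order.

    Assumption: distance is measured in file slots, and the head starts at
    the beginning of the tape (slot 0).
    """
    travel = 0
    current = start
    for pos in positions:
        travel += abs(pos - current)
        current = pos
    return travel


requests = [5, 3, 9, 1, 6]            # arrival order from the slide
print(head_travel(requests))          # 26 slots, with several rewinds
print(head_travel(sorted(requests)))  # 9 slots, one forward pass
```

In this model, reading the files in tape order turns several forward and rewind movements into a single forward pass.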
Problem
Data delivery is slow.
The required transfer rate cannot be reached if reads are not controlled.
A “grid job” that reads several files can wait longer each time it requests a file.
A tape can be mounted several times in a short period of time.
A tape can be forwarded and rewound several times.
Attempted manual solution
Ask for the list of files to stage.
Read the HPSS metadata and sort the files according to:
– Which tape they are stored on.
– Their position on the tape.
Stage them manually.
– hpss_cache command.
This takes a lot of manual work and is very slow.
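The sorting step of this manual procedure can be sketched as follows. The `FileInfo` record, its field names, and the example paths are hypothetical; real HPSS metadata has its own structure, and the actual staging call is left out.

```python
from collections import namedtuple
from itertools import groupby

# Hypothetical metadata record: which tape holds the file, and where.
FileInfo = namedtuple("FileInfo", ["path", "tape", "position"])


def staging_plan(files):
    """Group files by tape and order each group by position on the tape."""
    ordered = sorted(files, key=lambda f: (f.tape, f.position))
    return {
        tape: [f.path for f in group]
        for tape, group in groupby(ordered, key=lambda f: f.tape)
    }


# Example data only; the paths are made up for illustration.
files = [
    FileInfo("/hpss/exp/file5", "T2", 1),
    FileInfo("/hpss/exp/file3", "T1", 3),
    FileInfo("/hpss/exp/file9", "T3", 3),
    FileInfo("/hpss/exp/file1", "T1", 1),
    FileInfo("/hpss/exp/file6", "T2", 2),
]

for tape, paths in staging_plan(files).items():
    print(tape, paths)
```

Each tape then appears once in the plan, with its files listed in the order the head meets them.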
Desired solution
A software component that sorts read requests by querying the files’ metadata in HPSS; a software component with a Client/Server architecture.
State of the art
The BNL approach tried to solve this problem.
– Client/Server-like architecture.
– Requests are stored in files.
– Written by David Yu, in C.
– Based on the Oak Ridge Batch Scheduler.
It was adapted to the needs of CC-IN2P3.
– In production since August 2009 for dCache.
However, it does not meet our site requirements.
Proposed solution
Software called TReqS.
– Tape Request Scheduler.
Robust.
– Requests are stored in a database.
Multi-client and scalable to thousands of clients.
TReqS: Features
End users get a comprehensible error message when there is a problem with their requests.
Generates metrics for monitoring and accounting.
Modular client:
– It wraps the RFIO rfcp command.
– It can use any other file transfer mechanism.
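One way to picture a modular client is a transfer function whose copy command is pluggable. This is only a sketch of the idea, not the TReqS client: `transfer`, `build_command`, and `run` are hypothetical names, and the only external fact used is that rfcp takes a source and a destination.

```python
import subprocess


def rfcp_command(source, destination):
    """Build the argument list for an RFIO copy: rfcp <source> <destination>."""
    return ["rfcp", source, destination]


def transfer(source, destination, build_command=rfcp_command, run=subprocess.run):
    """Copy one file with a pluggable transfer mechanism.

    build_command and run are hypothetical extension points: the first turns
    a (source, destination) pair into a command line, the second executes it.
    Returns True when the command exits with status 0.
    """
    result = run(build_command(source, destination))
    return result.returncode == 0
```

Swapping the mechanism then means passing another builder, for example `transfer(src, dst, build_command=lambda s, d: ["cp", s, d])`.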
TReqS: New Hierarchy
[Diagram: the frontend (dCache, xRootD) now reaches the backend (HPSS) through TReqS; the links are labeled RFIO.]
TReqS: How this stuff works
[Diagram: numbered steps 1 to 6 linking the frontend, the TReqS client, the TReqS server, the MySQL database (DB), and HPSS via RFIO.]
TReqS: Terminology
FileRequest:
– File to be transferred.
Queue:
– Ordered set of files to stage from the same tape.
Activator:
– Processes the queues at specified intervals.
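These three terms can be pictured with a small sketch. The class names follow the slide, but every field, method, and the 60-second interval are assumptions made for illustration; the real TReqS classes are written in C++ and may look quite different.

```python
from dataclasses import dataclass, field


@dataclass
class FileRequest:
    """A file to be transferred (field names are illustrative)."""
    path: str
    tape: str
    position: int


@dataclass
class Queue:
    """Files to stage from one tape, kept ordered by position."""
    tape: str
    requests: list = field(default_factory=list)

    def add(self, request):
        self.requests.append(request)
        self.requests.sort(key=lambda r: r.position)


def enqueue(queues, request):
    """Put a request into the queue of its tape, creating the queue if needed."""
    queues.setdefault(request.tape, Queue(request.tape)).add(request)


class Activator:
    """Processes pending queues; a real loop would call run_once every interval."""

    def __init__(self, interval_seconds, process):
        self.interval_seconds = interval_seconds  # assumed parameter
        self.process = process                    # callback that stages one queue

    def run_once(self, queues):
        """One activation pass over all pending queues, tape by tape."""
        for tape in sorted(queues):
            self.process(queues.pop(tape))


queues = {}
for req in [
    FileRequest("file5", "T2", 1),
    FileRequest("file3", "T1", 3),
    FileRequest("file9", "T3", 3),
    FileRequest("file1", "T1", 1),
    FileRequest("file6", "T2", 2),
]:
    enqueue(queues, req)

staged = []
activator = Activator(60, lambda q: staged.extend(r.path for r in q.requests))
activator.run_once(queues)
print(staged)  # ['file1', 'file3', 'file5', 'file6', 'file9']
```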
TReqS
Allows reserving resources (drives) per client.
– A minimum share of resources per client when there are many simultaneous requests.
The algorithms use several criteria to select the files to stage:
– “Best User”
• User with the most files / the most space used on a tape.
– “Best Queue” (tape to read)
• Largest number of files.
• Largest files.
• Oldest request.
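The selection criteria can be sketched as ranking functions. The slide lists the criteria but not how they are combined, so the priority order used here (number of files, then total size, then age of the oldest request) is an assumption, as are the `Pending` record, its fields, and the example data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Pending:
    """One pending file request (field names are illustrative)."""
    user: str
    tape: str
    size: int            # bytes
    requested_at: float  # submission time, in seconds


def best_queue(pending):
    """Pick the tape to read next.

    Assumed priority: most files, then most bytes, then oldest request.
    """
    by_tape = {}
    for p in pending:
        by_tape.setdefault(p.tape, []).append(p)

    def score(tape):
        reqs = by_tape[tape]
        oldest = min(r.requested_at for r in reqs)
        # Negate the timestamp so that an older request ranks higher.
        return (len(reqs), sum(r.size for r in reqs), -oldest)

    return max(by_tape, key=score)


def best_user(pending, tape):
    """Pick the user with the most requested files on the given tape."""
    counts = Counter(p.user for p in pending if p.tape == tape)
    return counts.most_common(1)[0][0]


# Example data only.
pending = [
    Pending("atlas", "T1", 100, 10.0),
    Pending("atlas", "T1", 200, 12.0),
    Pending("cms", "T1", 50, 11.0),
    Pending("cms", "T2", 500, 5.0),
    Pending("lhcb", "T2", 400, 6.0),
]

tape = best_queue(pending)
print(tape, best_user(pending, tape))  # T1 atlas
```

A different weighting, for example favoring the oldest request first to avoid starvation, only requires reordering the tuple returned by `score`.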
TReqS tuning
Each component is an independent class.
The algorithms of the classes can be changed to improve operation.
All parameters can be modified dynamically in the configuration file.
TReqS: Technical details
Written in C++ (OOP).
Transactional database (MySQL).
Model-Driven Engineering (MDE).
Several unit tests (CUTE).
Log4cxx for logging.
Doxygen for the documentation.
Open-source license.
– GPL-like license (CeCILL).
Git as the distributed version control system.
– Public access from: https://git.in2p3.fr/cgit/treqs
Results
Reduced mounting time.
Less “waiting time” for batch jobs in the computing farm.
HPSS can handle more requests simultaneously.
Fewer robotic issues.
– Fewer mounts for the same batch process.
Roadmap
The first BNL adaptation was deployed in production in July 2009.
– Several bugs fixed.
– Real-time monitoring.
TReqS version 1.0.
– Will be deployed shortly, in April.
– Tests are being finished.
Requirements
Hardware:
– A virtual machine with at least 512 MB RAM.
– Access to a MySQL database.
– HPSS API.
Conclusion
TReqS has improved the way HPSS is used.
It reduces redundant tape mounts and lowers seek time on tapes.
It provides file access control per experiment.
Forge:
– https://forge.in2p3.fr/projects/show/treqs
Thanks for your attention
Questions?