![Page 1: HPSS Best Practices - CISL Home Best Practices Erich Thanhardt Bill Anderson ... Stage from HPSS back to GLADE/GPFS …. process staged data BEST PRACTICE](https://reader031.vdocuments.us/reader031/viewer/2022022007/5ad534037f8b9a0d2d8d583d/html5/thumbnails/1.jpg)
HPSS Best Practices
Erich Thanhardt
Bill Anderson
Marc Genty
B
Overview
● The idea is to “look under the hood” of HPSS to help you better understand the Best Practices
  ○ Expose you to concepts, architecture, and tape technology
  ○ Cite Best Practices in context along the way
  ○ The talk ends with references to further resources
● The talk is interactive; please ask questions along the way
HPSS - What is it?
● Acronym: stands for High Performance Storage System
● “HPSS is software that manages petabytes of data on disk and robotic tape libraries.”
  ■ Quoted from: http://www.hpss-collaboration.org
HPSS - What makes it different?
● Hardware: use of tape technology is a distinguishing characteristic of HPSS
● Use case: HPSS is an archive, not a (parallel) file system
  ○ The system is remote, not cross-mounted
  ○ The operation set is limited to metadata operations and file transfers
Best Practice: Be aware of what makes HPSS very different from GLADE: its intended use
HPSS Main Use Cases
● Archive
  ○ Data is stored and preserved indefinitely
    ■ While system components come and go
    ■ Model data and observational data collections
● Disaster Recovery
  ○ Leverage dual sites for geographic separation
  ○ Additional level of archival preservation
HPSS Software Architecture
[Figure: HPSS software architecture. Labels: HPSS End User; Client Interface (CLI); HSI/HTAR Client; Linux/Unix Host; Gateway (4x Gateway Servers); Authentication (AUTH); Metadata; Data; Control; HPSS]
HPSS Software Architecture
● Best Practice: Reporting errors via EV ticket
  ○ include: name, host, datetime, -d4 error tracing
  ○ authentication problems
  ○ those pesky parallel file transfer limits
    ■ your guaranteed on-ramp to the system
    ■ “data bandwidth” allocation
    ■ will be increasing over the next few months
HPSS Software Architecture
● Best Practice: Validating that a file was written
  ○ Run “ls -l” both locally and on HPSS
  ○ Compare pathname and size
  ○ Seeing the pathname alone (ls) is not sufficient
● Here is what can happen:
  ○ Creating the pathname in HPSS happens first
  ○ Then the data transfer between client and HPSS takes place
  ○ That transfer can be interrupted
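The size comparison described on this slide can be sketched in Python. This is only an illustration under stated assumptions, not an HPSS tool: `transfer_looks_complete` is a hypothetical helper, and the HPSS-side size is assumed to have been read by the user from the “ls -l” listing on HPSS.

```python
import os


def transfer_looks_complete(local_path: str, hpss_size_bytes: int) -> bool:
    """Return True when the local file size equals the size reported by HPSS.

    hpss_size_bytes is an assumption of this sketch: the caller supplies the
    size shown for the file in the "ls -l" listing on HPSS.
    """
    local_size = os.path.getsize(local_path)
    return local_size == hpss_size_bytes
```

A size match only rules out a truncated transfer; it does not compare file contents. A mismatch (for example, a size of 0 on HPSS) suggests the pathname was created but the data transfer was interrupted.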
HPSS - One System/Two Sites
[Figure: one HPSS system spanning two sites, NWSC (Cheyenne) and MLCF (Boulder). Labels: Oracle SL8500 Tape Library; Oracle Tape Drives + Media; Disk Cache; Archive; Disaster Recovery]
HPSS Libraries - Oracle SL8500
HPSS Tape Libraries Frontal View
[Figure: frontal view of the tape libraries. Labels: ACSLS Server; MLCF; SL8500 Tape Library]
HPSS Libraries Top View
Tape Library
HPSS Libraries - Photos
ORACLE DRIVE & MEDIA
Small File Problem
● Cost of a random read:
  ○ Robot retrieval, mount, seek: 70 secs to reach the average file
  ○ Transfer data rate: 240 MB/sec
  ○ For a 184 MB file this means 99% latency, 1% transfer
● Cost of returning the tape
  ○ Double it: an indirect cost to you
  ○ For a 368 MB file this means 99% latency, 1% transfer
● Compare these with the average file size of 166 MB
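The percentages on this slide follow from simple arithmetic, sketched below in Python. The 70-second latency, the 240 MB/sec rate, and the file sizes are the slide's figures; the function name `latency_share` is hypothetical and chosen only for this illustration.

```python
def latency_share(file_mb: float, latency_s: float = 70.0,
                  rate_mb_s: float = 240.0) -> float:
    """Fraction of one tape read spent on retrieval, mount, and seek."""
    transfer_s = file_mb / rate_mb_s
    return latency_s / (latency_s + transfer_s)


# Slide figures: 184 MB at 70 s latency, 368 MB with the latency doubled
# (to account for returning the tape), and the 166 MB average file size.
for size_mb, latency_s in [(184, 70.0), (368, 140.0), (166, 70.0)]:
    share = latency_share(size_mb, latency_s)
    print(f"{size_mb} MB: {share:.1%} latency, {1 - share:.1%} transfer")
```

In all three cases the share of time spent on latency rounds to 99%, which is why files near the average size are dominated by positioning time rather than data transfer.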
Small File Problem
● Best Practice: it is best to avoid small files, but where they are needed, aggregate them with htar
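A rough estimate of why aggregation helps can be sketched as follows. The 70-second latency and 240 MB/sec rate are taken from the preceding slide; everything else is an assumption made for illustration: the function names are hypothetical, the 1000 files of 10 MB are made-up numbers, and the "separate" case assumes the worst case in which every file pays its own positioning delay.

```python
LATENCY_S = 70.0   # robot retrieval, mount, seek (figure from the preceding slide)
RATE_MB_S = 240.0  # tape transfer rate (figure from the preceding slide)


def read_time_separate(n_files: int, file_mb: float) -> float:
    """Assumed worst case: every file pays the full positioning latency."""
    return n_files * (LATENCY_S + file_mb / RATE_MB_S)


def read_time_aggregated(n_files: int, file_mb: float) -> float:
    """One aggregate holding all files: one positioning delay, one transfer."""
    return LATENCY_S + (n_files * file_mb) / RATE_MB_S


# Hypothetical workload: 1000 files of 10 MB each.
separate = read_time_separate(1000, 10.0)
aggregated = read_time_aggregated(1000, 10.0)
print(f"separate: {separate:.0f} s, aggregated: {aggregated:.0f} s")
```

Real reads will fall somewhere between the two estimates, since files that sit on the same tape do not each require a new mount.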
File Deletion
● Deleting files
  ○ Deleting data on tape creates unusable gaps on the tape because tape is linear and continuous
  ○ Mischaracterizations and system data migrations
● Best Practice: delete unneeded files, but also avoid temporary files (whether rewriting or create/deletes)
Repeated Reads and Writes
● Best Practice: avoid both repeated reads from and repeated writes to an archive file; bring the file out and park it somewhere else
File Rescue
● Adopting orphaned files from others
  ○ The user/project combination becomes invalid after a period of time
  ○ Someone needs to take ownership and pay the storage costs
● Best Practice: never use “cp” to copy data internally in order to move it if you don’t have proper permissions; open a ticket
Optimizing Reads
● Best Practice: if you are reading back data at large scales, contact the Helpdesk at [email protected] for ways to order your requests. It can be done!
● The process is not perfect but usually has a positive effect
Storage Hierarchy Concept
[Figure: storage hierarchy. Labels: CPU; Memory; Disk; Tape]
Attributes of Storage Hierarchy
● Cost & Characteristics
  ○ Speed & Capacity
  ○ Persistence & Reliability
    ■ hardware, RAID/RAIT, dual copy
  ○ Availability
    ■ online/nearline/offline
  ○ Location
    ■ onsite/offsite
HPSS Storage Pyramid
[Figure: storage pyramid. Disk: disk cache. Tape: tape libraries, robotics, drives & media]
Hierarchical Storage Manager (HSM)
[Figure: data movement between DISK and TAPE. Labels: Stage; Migrate; Purge]
User Interaction with HPSS
[Figure: user interaction with the DISK and TAPE tiers. Labels: Stage; Migrate; Purge]
Basic Stats Jun-Aug 2014
● Writes/Reads ratio ~4-5 to 1
● User response times
  ○ ~116 sec/read vs. ~9-10 sec/write
  ○ ratio of read/write response times ~13 to 1
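The quoted ratio can be checked directly from the response times. The sketch below uses the slide's approximate figures; the function name `response_ratio` is hypothetical.

```python
def response_ratio(read_s: float, write_s: float) -> float:
    """How many times longer a read takes than a write."""
    return read_s / write_s


# Slide figures: ~116 sec per read, ~9-10 sec per write.
for write_s in (9.0, 10.0):
    print(f"write {write_s:.0f} s -> ratio {response_ratio(116.0, write_s):.1f} to 1")
```

With these inputs the ratio comes out at roughly 12 to 13 to 1, consistent with the slide's approximate figure.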
Tape Technology Upgrades
[Figure: DISK and TAPE tiers. Labels: Stage; Migrate; Purge; and a second Migrate]
Data Services Pyramid - Workflow
[Figure: data services pyramid. Labels: GLADE GPFS; HPSS; PFS; Archive; DR; 90 GB/sec; 9 GB/sec]
Workflow - Optimal
➔ Create data on GLADE/GPFS
➔ Post process (new data plus deletes)
➔ Commit data selectively to HPSS
➔ Best Practice!
Workflow - Realistic
➔ Create data on GLADE/GPFS
➔ Commit to HPSS (back it up)
➔ Post process (new data)
  ◆ Commit post-processed data (selectively?) to HPSS
Workflow - To Avoid
➔ Create data on GLADE/GPFS
➔ Commit to HPSS (back it up)
➔ Delete from GLADE/GPFS
➔ …. time passes
➔ Stage from HPSS back to GLADE/GPFS
➔ …. process staged data
BEST PRACTICE - contact [email protected]
Additional Resources
● CISL Support & Allocations
  ○ Helpdesk & CISL Consulting
    ■ send email to [email protected]
● HPSS Documentation
  ○ http://www2.cisl.ucar.edu/docs/hpss
● Best Practices doc
  ○ http://www2.cisl.ucar.edu/docs/best_practices
The End