TRANSCRIPT
1
FreeLoader: borrowing desktop resources for large transient data
Vincent Freeh (1)
Xiaosong Ma (1,2)
Stephen Scott (2)
Jonathan Strickland (1)
Nandan Tammineedi (1)
Sudharshan Vazhkudai (2)
(1) North Carolina State University  (2) Oak Ridge National Laboratory
September, 2004
2
Roadmap
- Motivation
- FreeLoader architecture
- Design choices
- Results
- Future work
3
Motivation: Data Avalanche
- More data to process: science, industry, government
- Example: scientific data
  - Better instruments
  - More simulation power
  - Higher resolution
[Pictures: space telescope; P&E gene sequencer, from http://www.genome.uci.edu/. Courtesy: Jim Gray, SLAC Data Management Workshop.]
4
Data acquisition and storage
The pipeline: data acquisition, reduction, analysis, visualization, storage.
[Diagram: a data acquisition system feeds raw data over a high-speed network to supercomputers and remote storage (data plus metadata), serving local users as well as remote users with their own computing and storage.]
5
Remote Data Sources
- Data serving at supercomputing sites
  - Shared file systems (e.g., GPFS)
  - Archiving systems (e.g., HPSS)
- Data centers: expensive, high-end solutions with guaranteed capacity and access rates
- Tools used in access: FTP, GridFTP, Grid file systems, customized data migration programs, web browsers
6
User perspective
- End users typically process data locally
  - Convenience and control; often better CPU/memory configurations
  - Problem 1: needs local space to hold the data
  - Problem 2: getting data from remote sources is slow
- The remote source is a central point of failure: high contention from multiple incoming requests hurts availability
- Dataset characteristics
  - Write-once, read-many access patterns
  - Raw data often discarded
  - Shared interest in the same data among groups
  - Primary copy archived elsewhere (cf. Squirrel, a P2P web cache)
7
Harnessing idle disk storage
- Harnessing the storage resources of individual workstations is analogous to harnessing idle CPU cycles
- LAN environments: desktops with 100 Mbps or Gbps connectivity
- Increasing hard disk capacities; an increasing fraction of total capacity is unused (50% and upwards)
- Even when each contribution is far smaller than the available space, the aggregate storage is impressive
- Increasing numbers of workstations are online most of the time
- Benefits: access locality, aggregate I/O and network bandwidth, data sharing
8
Use Cases
FreeLoader storage cloud as a:
- Cache
- Local, client-side scratch space
- Intermediate hop
- Grid replica
9
Intended Role of FreeLoader
What the scavenged storage is not:
- Not a replacement for high-end storage
- Not a file system
- Not intended for integrating resources at wide-area scale
- Does not emphasize replica discovery, routing protocols, or consistency the way P2P storage systems do
What it is:
- A low-cost, best-effort alternative to remote high-end storage
- Intended to facilitate transient access to large, read-only datasets and data sharing within an administrative domain
- To be used in conjunction with higher-end storage systems
10
FreeLoader Architecture
[Diagram: Grid data access tools sit atop a management layer (data placement, replication, Grid awareness, metadata management), which coordinates a storage layer of pools A through n. Each pool registers with the management layer and handles morsel access, data integrity, and non-invasiveness.]
11
Storage Layer
- Donors/benefactors:
  - Morsels as the unit of contribution
  - Basic morsel operations: new(), free(), get(), put(), ...
  - Space reclamation on user withdrawal / space shrinkage
  - Data integrity through checksums
  - Performance history kept per benefactor
- Pools:
  - Benefactor registrations (soft state)
  - Dataset distributions
  - Proximity and performance characteristics
[Diagram: dataset 1 (morsels 1, 2, 3) and dataset n (morsels 1a, 2a, 3a, 4a) distributed, with replicas, across the benefactors of a pool.]
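The morsel operations named above can be pictured with a small sketch. This is an illustrative, in-memory model only: the class name, the dict-based store, and the use of MD5 are assumptions for the example; the real benefactor daemon persists morsels on local disk.

```python
import hashlib

MORSEL_SIZE = 1 << 20  # 1 MB, the morsel size used in the experiments


class Benefactor:
    """In-memory sketch of a benefactor's morsel store (illustrative only)."""

    def __init__(self):
        self._store = {}      # morsel id -> bytes
        self._checksums = {}  # morsel id -> hex digest
        self._next_id = 0

    def new(self):
        """Allocate an empty morsel and return its id."""
        mid = self._next_id
        self._next_id += 1
        self._store[mid] = b""
        return mid

    def free(self, mid):
        """Release a morsel, e.g. on donor withdrawal or space shrinkage."""
        self._store.pop(mid, None)
        self._checksums.pop(mid, None)

    def put(self, mid, data):
        """Write morsel contents and record a checksum for integrity."""
        if len(data) > MORSEL_SIZE:
            raise ValueError("data exceeds morsel size")
        self._store[mid] = data
        self._checksums[mid] = hashlib.md5(data).hexdigest()

    def get(self, mid):
        """Read a morsel, verifying its checksum before returning it."""
        data = self._store[mid]
        if hashlib.md5(data).hexdigest() != self._checksums[mid]:
            raise IOError("morsel %d failed integrity check" % mid)
        return data
```

The checksum recorded at put() time is what lets the system detect corruption on scavenged, untrusted disks before serving data to a client.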
12
Management Layer
- Manager:
  - Pool registrations
  - Metadata: datasets-to-pools, pools-to-benefactors, etc.
- Availability:
  - Redundant Array of Replicated Morsels
  - Minimum replication factor for morsels
  - Where to replicate? Which morsel replica to choose?
- Clients are oblivious to metadata layout: all metadata requests are sent to the manager
- Cache replacement policy
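The two availability questions above ("where to replicate?" and "which replica to choose?") can be sketched in a few lines. The function names, the throughput-based scoring, and the replica-map shape are assumptions for illustration; the slides do not specify the manager's actual policy beyond a minimum replication factor and per-benefactor performance history.

```python
def choose_replica(replicas, perf_history):
    """Pick the replica on the historically fastest benefactor.

    replicas: list of benefactor ids holding a morsel.
    perf_history: benefactor id -> observed throughput (MB/s).
    Unknown benefactors default to 0.0 so measured ones are preferred.
    """
    return max(replicas, key=lambda b: perf_history.get(b, 0.0))


def needs_replication(replica_map, min_factor):
    """Return morsels whose replica count is below the minimum factor.

    replica_map: morsel id -> list of benefactor ids holding it.
    """
    return [m for m, reps in replica_map.items() if len(reps) < min_factor]
```

A manager loop could periodically call needs_replication() and copy under-replicated morsels to additional benefactors, while clients call choose_replica() per morsel fetch.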
13
Dataset Striping
- Stripe datasets across benefactors; the morsel doubles as the basic unit of striping
- The manager decides the allocation of data blocks to morsels across benefactors
- Multiple benefits: higher aggregate access bandwidth, lower impact per benefactor, load balancing
- A greedy algorithm makes the best use of available space
- Stripe width and stripe size can be varied as striping parameters
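A minimal sketch of such an allocation, assuming a simple policy: greedily pick the stripe-width benefactors with the most free space, then lay morsels out round-robin in runs of stripe-size, so consecutive runs land on different donors and can be fetched in parallel. The function name and the free-space map are hypothetical; the actual manager policy may differ.

```python
def stripe_plan(num_morsels, benefactors, stripe_width, stripe_size):
    """Assign morsels of one dataset to benefactors (illustrative sketch).

    benefactors: benefactor id -> free space, in morsels.
    Returns a list of (morsel index, benefactor id) pairs.
    """
    # Greedy step: prefer the donors with the most available space.
    chosen = sorted(benefactors, key=benefactors.get, reverse=True)[:stripe_width]
    plan = []
    for m in range(num_morsels):
        # Runs of `stripe_size` consecutive morsels rotate across donors.
        donor = chosen[(m // stripe_size) % stripe_width]
        plan.append((m, donor))
    return plan
```

With stripe_width = 2 and stripe_size = 2, morsels 0-1 go to the first donor, 2-3 to the second, 4-5 back to the first, and so on, which is what lets a client overlap transfers from multiple 100 Mbps benefactors.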
14
Client interface
- Obtains metadata from the manager, then performs gets or puts directly to the benefactors
- All control messages are exchanged via UDP; all data transfers use TCP
- Morsel requests are sent to benefactors in parallel; the striping strategy ensures these blocks are contiguous
- Efficient buffering strategy:
  - Buffer pool of size (stripe size + 1) * stripe width
  - Double-buffering scheme allows network and I/O to proceed in parallel
  - After the pool fills, buffer contents are flushed to disk
  - Reduces disk seeks by waiting for buffered contents to form contiguous blocks before writing to disk
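The buffering scheme above can be sketched as follows. This is a simplified, single-threaded model with hypothetical names: morsels arrive as (offset, data) pairs, possibly out of order across benefactors, and are flushed in offset order once the pool fills. In the real client, network receive and disk flush run in parallel via double buffering.

```python
def double_buffered_write(morsel_stream, stripe_size, stripe_width, flush):
    """Buffer incoming morsels and flush them to disk in contiguous runs.

    morsel_stream: iterable of (offset, data) pairs, possibly out of order.
    flush: callback flush(offset, data) standing in for a disk write.
    """
    # Pool size from the slides: (stripe size + 1) * stripe width morsels.
    pool_slots = (stripe_size + 1) * stripe_width
    pool = {}
    for offset, data in morsel_stream:
        pool[offset] = data
        if len(pool) >= pool_slots:
            # Flushing in offset order turns scattered network arrivals
            # into large sequential writes, reducing disk seeks.
            for off in sorted(pool):
                flush(off, pool[off])
            pool.clear()
    for off in sorted(pool):  # flush any remainder at end of stream
        flush(off, pool[off])
```

Even in this serial sketch, the key property holds: writes reach the disk as sorted, contiguous batches regardless of the arrival order over the parallel TCP streams.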
15
Current Status
[Diagram: an application calls the client's I/O interface (reserve(), cancel(), store(), retrieve(), delete(), open(), close(), read(), write()); the client communicates with the manager and with benefactors, which run on top of the local OS and export new(), free(), get(), put().]
- (A) UDP: dataset creation/deletion, space reservation
- (B) UDP/TCP: dataset retrieval, hints
- (C) UDP: registration; benefactor alerts, warnings, and alarms to the manager
- (D) UDP/TCP: dataset store, morsel requests
- Simple data striping
16
Results: Experiment Setup
- FreeLoader prototype running at ORNL
- Client box: AMD Athlon 700 MHz, 400 MB memory, Gig-E card, Linux 2.4.20-8
- Benefactors: a group of heterogeneous Linux workstations, contributing 7-30 GB each, with 100 Mb cards
17
Data Sources
- Local GPFS: attached to ORNL supercomputers; accessed through GridFTP with a 1 MB TCP buffer and 4 parallel streams
- Local HPSS: accessed through the highly optimized HSI client
  - Hot: data in the disk cache, no tape unloading
  - Cold: data purged; retrievals done at large intervals
- Remote NFS: at the NCSU HPC center; accessed through GridFTP with a 1 MB TCP buffer and 4 parallel streams
- FreeLoader: 1 MB morsel size for all experiments; varying configurations
18
Testbed
19
Best of class performance comparisons
[Chart: throughput (MB/s) for each data source.]
20
Effect of stripe width variation (stripe size = 1 morsel)
21
Effect of stripe width variation (stripe size = 8 morsels)
22
Effect of stripe size variation (stripe width = 4 benefactors)
23
Impact Tests
- How uncomfortable do the donors feel when running CPU-intensive, disk-intensive, or network-intensive tasks?
- A set of tests at NCSU: a benefactor performs local tasks while a client retrieves datasets at a given rate
- The rate is varied to study the impact on the user
- Test machine: Pentium 4, 512 MB memory, 100 Mbps connectivity
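Pacing the client to "a given rate" can be done with simple delay arithmetic. This helper is an illustrative assumption, not FreeLoader's actual throttling mechanism: given bytes already requested and elapsed time, it returns how long to pause so the average request rate stays at the target.

```python
def throttle_delay(bytes_sent, elapsed_s, target_mbps):
    """Seconds to pause so the average request rate stays at target_mbps.

    Returns 0.0 when the client is already at or below the target rate.
    """
    target_bps = target_mbps * 1024 * 1024          # MB/s -> bytes/s
    ideal_elapsed = bytes_sent / target_bps          # time the transfer
    return max(0.0, ideal_elapsed - elapsed_s)       # should have taken
```

A retrieval loop would call this after each morsel and sleep for the returned duration; lowering target_mbps is how the impact tests dial the donor's discomfort up or down.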
24
CPU-intensive and Mixed Tasks
[Chart: task time (s).]
25
Network-intensive Task
[Chart: normalized download time.]
26
Disk-intensive Task
[Chart: impact on I/O performance — local read and write throughput (MB/s) vs. client request rate (0-7 MB/s).]
27
Sample application - formatdb
- A subset of the basic file APIs has been implemented
- formatdb (from the NCBI BLAST toolkit) preprocesses a biological sequence database into a set of sequence and index files
- The raw database is an ideal candidate for caching on FreeLoader
- formatdb is not the ideal application for FreeLoader
Source:    Local  NFS  FreeLoader (1 benefactor)  (2)   (4)
Time (s):  598    585  599                        563   556
28
Significant results
29
Significant results – contd.
- 2x and 4x speedups w.r.t. GPFS and HPSS, respectively
- Management overhead is minimal
- 14% worst-case performance hit for CPU-intensive tasks; <= 25% for network-intensive tasks
- formatdb tests the upper bound of FreeLoader's internal overhead: same as local disk with 1 benefactor (2% slower than NFS); 5% faster than NFS with 4 benefactors
- 10 MB/s performance gain for each benefactor added, until saturation
30
Conclusions
- The goal is to achieve saturation from the client side; striping helps achieve this
- Built from low-cost commodity parts, harnessing idle disk bandwidth
- Low impact on donors, controlled by throttling the request rate
- Better availability; more suitable for large transient datasets than a regular file system
31
In-progress and Future Work
In progress:
- Windows support
Future:
- Complete pool structure and registration
- Intelligent data distribution, service profiling
- Benefactor impact control, self-configuration
- Naming and replication
- Grid awareness
Potential extensions:
- Harnessing local storage at cluster nodes?
- Complementing commercial storage servers?
32
Further Information
http://www.csm.ornl.gov/~vazhkuda/Morsels/