active storage processing in parallel file systems jarek nieplocha evan felix

12
Active Storage Processing in Parallel File Systems Jarek Nieplocha Evan Felix Juan Piernas-Canovas SDM CENTER

Upload: lamont

Post on 07-Jan-2016

26 views

Category:

Documents


4 download

DESCRIPTION

SDM CENTER. Active Storage Processing in Parallel File Systems Jarek Nieplocha Evan Felix Juan Piernas-Canovas. Active Storage in Parallel Filesystems. Active Storage exploits the old concept of moving computing to the data source - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Active Storage Processing in Parallel File Systems Jarek Nieplocha       Evan Felix

Active Storage Processing in Parallel File Systems

Jarek Nieplocha Evan Felix

Juan Piernas-Canovas

SDMCENTER

Page 2: Active Storage Processing in Parallel File Systems Jarek Nieplocha       Evan Felix

2

Active Storage in Parallel FilesystemsActive Storage in Parallel FilesystemsActive Storage in Parallel FilesystemsActive Storage in Parallel Filesystems

Active Storage exploits the old concept of moving computing to the data source

Avoids data movement across the network in parallel machine by allowing applications use compute resources on the I/O nodes of the cluster for data processing

P

P

P

P Net

wor

k

FS

FS

computenodes

I/O nodes

Y=foo(X)

x

Y

P

P

P

P Net

wor

k

FS

FS

computenodes

I/O nodes

Y=foo(X)

Active StorageTraditional Approach

Page 3: Active Storage Processing in Parallel File Systems Jarek Nieplocha       Evan Felix

3

ExampleExampleExampleExample

BLAS DSCAL on diskY = α .Y

Experiment Traditional: The input file is read from

filesystem, and the output file is written to the same file system. The input file has 120,586,240 doubles.

Active Storage: Each server receives the factor, reads the array of doubles from its disk locally, and stores the resulting array on the same disk. Each server processes 120,586,240/N doubles, where N is the number of servers

Speedup contributed to avoiding data movement between client and servers

DSCAL

0

5

10

15

20

25

30

35

0 10 20 30 40

Number of OSTs

Sp

eed

-up

Page 4: Active Storage Processing in Parallel File Systems Jarek Nieplocha       Evan Felix

4

Related WorkRelated WorkRelated WorkRelated Work

Active Disk/Storage concept was introduced a decade ago to use Processing resources ‘Near’ the disk On the Disk Controller. On Processors connected to disks. Reduce network bandwidth/latency limitations.

References DiskOS Stream Based model (ASPLOS’98: Acharya, Uysal, Saltz) Active Storage For Large-Scale Data Mining and Multimedia (VLDB

’98: Riedel, Gibson, Faloutsos)

Research proved Active Disk idea interesting, but Difficult to take advantage of in practice Processors in disk controllers not designed for the purpose Vendors have not been providing SDK

Y=foo(X)

Page 5: Active Storage Processing in Parallel File Systems Jarek Nieplocha       Evan Felix

5

Lustre ArchitectureLustre ArchitectureLustre ArchitectureLustre Architecture

Client

OSTMDSMDSMDSMDS

O(10)OSTOSTOSTOSTOSTOSTOST

OSTOSTOSTOSTOSTOSTOSTOSTOST OSTO(1000)

O(10000)

NALNAL

Directory Metadata& concurrencyFile IO

& Locking

Recovery, File Status,File Creation

Page 6: Active Storage Processing in Parallel File Systems Jarek Nieplocha       Evan Felix

6

Lustre ClientLustre ClientLustre ClientLustre Client

OSC

NALNAL

LLITE

LOV

Application

User Space

•Application IO requests

•LLITE module implements Linux VFS layer

•LOV stripes object and targets IO to correct Object Client

•OSC packages up request for transmission over the NAL

Page 7: Active Storage Processing in Parallel File Systems Jarek Nieplocha       Evan Felix

7

Lustre Object Storage ServerLustre Object Storage ServerLustre Object Storage ServerLustre Object Storage Server

OST

OBDfilter

ext3

NAL NAL

•Requests arrive from Portals NAL

•Object Storage Target directs Request to appropriate lower level OBD

•OBDfilter presents ext3 as Object Based Disk

Page 8: Active Storage Processing in Parallel File Systems Jarek Nieplocha       Evan Felix

8

Current Implementation of Active StorageCurrent Implementation of Active StorageCurrent Implementation of Active StorageCurrent Implementation of Active Storage

OST

OBDfilter

ext3

ASOBD ASDEV

ProcessingComponent

User Space

•Extra Module passes data, until told to pipe data elsewhere•Data is sent to user space process through Unix Character Device File.•Processed Data is written back to disk•Pattern: 1W->2W

NAL

NAL

Page 9: Active Storage Processing in Parallel File Systems Jarek Nieplocha       Evan Felix

10

SC’2004 StorCloud Most Innovative Use AwardSC’2004 StorCloud Most Innovative Use AwardSC’2004 StorCloud Most Innovative Use AwardSC’2004 StorCloud Most Innovative Use Award

Sustained 4GB/sActive Storage write processing

320 TB Lustre984 400GB disks•40 Lustre OSS's running Active Storage

•4 Logical Disks (160 OST’s)•2 Xeon Processors

•1 MDS•1 Client creating files

LustreOST

Client System

Gigabit Network

LustreOSS 39

LustreOST

LustreOST

LustreMDS

LustreOSS 0

LustreOSS 38

Page 10: Active Storage Processing in Parallel File Systems Jarek Nieplocha       Evan Felix

11

Real Time VisualizationReal Time VisualizationReal Time VisualizationReal Time Visualization

Page 11: Active Storage Processing in Parallel File Systems Jarek Nieplocha       Evan Felix

12

Active Storage Processing PatternsActive Storage Processing PatternsActive Storage Processing PatternsActive Storage Processing Patterns

Pattern Description1W->2W Data will be written to the original raw file. A new file will be created

that will receive the data after it has been sent out to a processing component.

1W->1W Data will be processed then written to the original file

1R->1W Data that was previously stored on the OBD can be re-processed into a new file.

1W->0 Data will be written to the original file, and also passed out to a processing component. There is no return path for data, the processing component will do 'something' with the data.

1R->0 Data that was previously stored on the OBD is read and sent to a processing component. There is no return path

1W->#W Data is read from one file and processed, but there may be many files that are output from

#W->1W There are many inputs from various files being written as outputs from the processing component.

1R->1R Data is read from a file on disk, sent to a processing component, then the output is sent to the reading process.

Page 12: Active Storage Processing in Parallel File Systems Jarek Nieplocha       Evan Felix

13

Status and Future WorkStatus and Future WorkStatus and Future WorkStatus and Future Work

Status Proof of concept 1W->2W code works now Difficult to administer and use – 2 people Memory copies between user and kernel space

Future Implement other processing patterns Optimize performance by eliminating memory copies Implement Active Storage for PVFS Support different striping in files HDF, NetCDF, Database Stored Procedure Calls ….