The Future of Filesystems on Linux

Stephen Tweedie, Kernel Engineer

Upload: kgiyer

Post on 11-Dec-2015

217 views

Category:

Documents


1 download

DESCRIPTION

Technical

TRANSCRIPT


What we'll be covering

● What's in a filesystem?
● Recent activity in Open Source filesystems
● The consequences for the RHEL user
● Future expectations?

So, what's in a filesystem?

What we'll be looking at in this talk:

● Local disk filesystems
● Clustered filesystems
● Not distributed filesystems

Related technologies:

● Storage infrastructure
● Cluster infrastructure

Open Source Filesystems

An overview of the “state of the art”

Who is involved in development?

● The traditional Linux hobbyist
● Dedicated Linux companies/organisations
    ● Commercial
    ● OSDL
● Partner companies
● Government organisations

Who is involved in testing?

● The Linux kernel community
● Cutting edge distributions
● Alpha / Beta cycles for RHEL

Open Source Filesystems: New Core Features

The Linux 2.6 kernel brings:

Security

● Extended attributes: ACLs, SELinux
● NFS security extensions

Scalability

● New IO subsystem:
    ● Very large (>2TB) filesystems
    ● Much-improved SCSI layer
● New locking infrastructures for massive SMP scalability
● Core “RCU” lockless data structures
● Ext3 locking and fragmentation improvements

Manageability

● LVM2-based virtual driver stack
● Filesystem online resize
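As a small illustration of the extended attributes mentioned above, the following sketch stores and reads back one attribute on a temporary file using Python's Linux-only `os.setxattr` and `os.getxattr`. The attribute name `user.comment` and the helper `roundtrip_xattr` are made up for this example, and whether the call succeeds depends on the filesystem and mount options, so the helper returns `None` rather than failing when attributes are not available.

```python
import os
import tempfile


def roundtrip_xattr(path, name="user.comment", value=b"hello"):
    """Set one extended attribute and read it back.

    Returns the stored bytes, or None when extended attributes are not
    available (non-Linux platform, or a filesystem without user xattrs).
    """
    if not hasattr(os, "setxattr"):
        return None  # the os.*xattr functions exist only on Linux
    try:
        os.setxattr(path, name, value)
        return os.getxattr(path, name)
    except OSError:
        return None  # unsupported on this filesystem, or not permitted


with tempfile.NamedTemporaryFile() as tmp:
    result = roundtrip_xattr(tmp.name)
    print(result)
```

ACLs and SELinux labels are stored through the same mechanism, under the `system.` and `security.` attribute namespaces rather than `user.`.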

Open Source Filesystems: Add-ons

It's not just the core kernel distribution that is progressing.

● Reiserfs4 local filesystem
● Clustered filesystems outside the core kernel:
    ● Sistina / Red Hat's GFS
    ● CFS's Lustre
    ● Oracle's OCFS/OCFS2
● Leads to a very different storage model from traditional local-disk filesystems
● Ideal for SANs
● LVM2-based cluster LVM under development

So what's new for the RHEL ext3 user?

Ext3 features at a glance:

                             2.4 kernel   RHEL-3     RHEL-4
Functionality:
  Max. filesystem size       1TB          2TB (U5)   8TB (U1)
  Max. file size             2TB          2TB        2TB
  Extended attributes        No           Yes        Yes
  POSIX ACLs                 No           Yes        Yes
  SELinux labels             No           No         Yes
  Online resize              No           No         Yes
Performance:
  Tree-based directories     No           No         Yes
  Reservations               No           No         Yes
  SMP scaling improvements   No           No         Yes

● Backwards compatibility is guaranteed, of course!
● Forwards compatibility at the kernel level
● RHEL-3 e2fsck will not understand all RHEL-4 features
● Undesired features can be removed at will

Ext3 performance: “htree” directory indexing

[Chart: Sequential directory performance. Y axis: time (seconds), ticks from 0.01 to 100000. X axis: Create, ls, Delete and CPU(create), each at 1000, 10000, 100000 and 1000000 entries. Series: No htree, Htree.]
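Why an index helps so much on large directories can be sketched with a toy model. This is not ext3's on-disk format: the classes `LinearDir` and `IndexedDir` are hypothetical, CRC32 stands in for the real directory hash, and a sorted list searched by bisection stands in for the hash-keyed tree. The point is only that an unindexed lookup examines entries one by one, while an indexed lookup needs a number of comparisons that grows with the logarithm of the directory size.

```python
import bisect
import zlib


def name_hash(name):
    # Stand-in hash for illustration; ext3 uses its own directory hashes.
    return zlib.crc32(name.encode())


class LinearDir:
    """Unindexed directory: entries are searched in insertion order."""

    def __init__(self):
        self.entries = []

    def add(self, name):
        self.entries.append(name)

    def lookup_cost(self, name):
        """Number of entries examined before `name` is found (all on a miss)."""
        for examined, entry in enumerate(self.entries, start=1):
            if entry == name:
                return examined
        return len(self.entries)


class IndexedDir:
    """Directory with a sorted (hash, name) index searched by bisection."""

    def __init__(self):
        self.index = []

    def add(self, name):
        bisect.insort(self.index, (name_hash(name), name))

    def contains(self, name):
        key = (name_hash(name), name)
        pos = bisect.bisect_left(self.index, key)
        return pos < len(self.index) and self.index[pos] == key


linear, indexed = LinearDir(), IndexedDir()
for i in range(1000):
    linear.add(f"file{i}")
    indexed.add(f"file{i}")

print(linear.lookup_cost("file999"))   # examines every entry
print(indexed.contains("file999"))     # about log2(1000) comparisons
```

Since creating a file normally involves checking that the name does not already exist, the per-lookup cost of the unindexed case is paid once per create, which is consistent with create times growing steeply as directories get larger.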

Ext3 performance: reservations

[Chart: ext3 read performance. Y axis: megabytes/second (0 to 100). X axis: number of threads (1, 2, 4, 8). Series: reads, no reservations; reads with reservations.]

[Chart: ext3 write performance. Y axis: megabytes/second (0 to 100). X axis: number of threads (1, 2, 4, 8). Series: writes, no reservations; writes with reservations.]
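The idea behind reservations can be sketched with a toy allocator. This is a simplified model under stated assumptions, not the ext3 allocator: the functions `allocate` and `fragments` are hypothetical, writers take turns appending one block at a time, and a reservation is modelled as a fixed-size window of contiguous blocks claimed per file. Without a window, concurrent writers interleave their blocks; with one, each file stays contiguous within its window.

```python
def allocate(num_files, blocks_per_file, window=1):
    """Round-robin writers each append one block at a time.

    window=1 models no reservation: each request takes the next free block.
    window>1 models a reservation: a file claims `window` contiguous blocks
    at once and fills them before claiming more.
    Returns {file_id: [block numbers in file order]}.
    """
    next_free = 0
    layout = {f: [] for f in range(num_files)}
    reserved = {f: [] for f in range(num_files)}  # unused blocks per file
    for _ in range(blocks_per_file):
        for f in range(num_files):
            if not reserved[f]:
                reserved[f] = list(range(next_free, next_free + window))
                next_free += window
            layout[f].append(reserved[f].pop(0))
    return layout


def fragments(blocks):
    """Count contiguous runs in a non-empty list of block numbers."""
    return 1 + sum(1 for a, b in zip(blocks, blocks[1:]) if b != a + 1)


for window in (1, 8):
    layout = allocate(num_files=4, blocks_per_file=64, window=window)
    print(window, fragments(layout[0]))
```

In this model a single writer is contiguous either way, so the benefit only appears once several threads allocate at the same time, which matches the charts being plotted against thread count.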

The future for ext3

There's still lots to look forward to!

Larger on-disk “inodes” (file structures):

● Extended attributes in the inode
● Larger/more-fine-grained metadata: timestamps, block/link counts etc.

Extent maps:

● Far more efficient mapping of really large files
● 48-bit block pointers (1 Exabyte filesystem size)
    ● (But backup and fsck can become slow!)

Performance improvements:

● Background deletes
● Deferred deallocations
● Multi-page operations
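Two of the extent-map points above can be checked with a short sketch. The first function shows where the 1 Exabyte figure comes from, assuming a 4 KiB block size (the slide does not state the block size, so that is an assumption). The second shows why extents map large files efficiently: a run of contiguous blocks collapses into a single (start, length) pair instead of one pointer per block. Both function names are made up for this example.

```python
def max_filesystem_bytes(pointer_bits, block_size=4096):
    """Largest addressable size for a given block pointer width.

    block_size=4096 is an assumption; the slide does not state it.
    """
    return (2 ** pointer_bits) * block_size


def to_extents(blocks):
    """Collapse a list of physical block numbers into (start, length) runs."""
    extents = []
    for block in blocks:
        if extents and block == extents[-1][0] + extents[-1][1]:
            start, length = extents[-1]
            extents[-1] = (start, length + 1)  # extend the current run
        else:
            extents.append((block, 1))         # start a new run
    return extents


print(max_filesystem_bytes(48) == 2 ** 60)     # 2**60 bytes is 1 EiB

# A 1000-block file stored contiguously, plus a 2-block tail elsewhere.
file_blocks = list(range(100, 1100)) + [5000, 5001]
print(to_extents(file_blocks))
```

Here 1002 per-block pointers shrink to two extents; a badly fragmented file would gain much less, since every discontinuity starts a new extent.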

RHEL users and GFS

GFS features at a glance:

                                         ext3          GFS             GFS2
Supports internal disks                  Yes           No (1)          No (1)
Supports external disks (FC/iSCSI/SAN)   Yes           Yes             Yes
Requires cluster infrastructure (2)      No            Yes             Yes
Coordinates multiple concurrent mounts   No            Yes             Yes
Incurs cluster locking overhead          No            Yes             Yes
Online resize                            Yes (RHEL4)   Yes             Yes
Extended attributes / ACLs               Yes           Yes             Yes
SELinux attributes                       Yes (RHEL4)   No              Yes
“Ordered data” integrity                 Yes           No              Yes
Static inode placement                   Yes           Yes             No
Max filesystem size                      8TB (RHEL4)   8 Exabyte (3)   8 Exabyte (3)

Notes:
(1) Sharing internal disks is possible via gnbd, but introduces SPOF
(2) Cluster infrastructure includes: lock manager, membership/connection manager, fencing agents
(3) Theoretical maximum, untested! Limited to 16TB on 32-bit platforms

● All open source: http://sources.redhat.com/cluster/
● Packaged and included in Fedora Core 4

The future for GFS

Work still going on with both GFS and GFS2:

● 2.6 (RHEL4) port of GFS
● GFS and GFS2 to be able to use Distributed Lock Manager (DLM)
● GFS2 features:
    ● Online shrink; defragment (planned)
    ● Ordered data mode
    ● Performance improvements:
        ● Fuzzy “df” statfs
        ● Faster directory scans, synchronous IO
    ● SELinux attributes
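For context on the “df” item, the sketch below shows the counters that a tool like `df` reads, using Python's `os.statvfs`. The helper name `free_space` is made up for this example. On a local filesystem these counters are cheap to report; on a clustered filesystem an exact answer would need agreement between all nodes, which is presumably why an approximate (“fuzzy”) answer is listed as a performance improvement.

```python
import os


def free_space(path="."):
    """Return (total, free, available) bytes for the filesystem holding path."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize      # size of the filesystem
    free = st.f_bfree * st.f_frsize        # free blocks, including reserved
    available = st.f_bavail * st.f_frsize  # free blocks for unprivileged users
    return total, free, available


total, free, available = free_space("/")
print(total, free, available)
```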

Ongoing...

We've seen some common themes:

● Performance, performance, performance
● Scaling up:
    ● Large SMP systems
    ● Large filesystems
● Big business contributing to ongoing development
● Advanced cluster support
● Compatibility/migration

Some of the challenges:

● “Lost” projects like InterMezzo
● Harder and harder for hobbyists to do proper testing
● Limited real time support for now

Development model is proving extremely scalable and sustainable!

Q&A