file system concepts - androbenchcsl.skku.edu/uploads/swe2015-41/swe2015s16fs_concept.pdf · file...

Post on 02-Aug-2020


File system concepts

• Ease of searching for specific data

– File to group data: variable size, naming

– Directory to group files

[Figure: a directory holds (file name, file offset) entries, and each entry locates that file's data.]

Unix file systems history

[Timeline, 1970 to 2010s]

• 1974: Unix file system (System V)

• 1984: Berkeley fast file system (BSD 4.2)

• 1985: HFS

• 1987: Minix file system (Minix)

• 1990: Journaling file system (AIX)

• 1991: Log-structured file system

• 1992: Extended file system (Linux)

• 1993: Ext2 file system

• 1994: XFS (IRIX)

• 1998: HFS+

• 1999: Journaling file system (OS/2)

• 2001: Ext3 file system; Journaling file system (Linux)

• 2002: XFS (Linux)

• 2008: Ext4 file system

• 2009: BTRFS

• 2012: F2FS

DOS/Windows file systems history

• File Allocation Table

– FAT (8-bit, 1977) / FAT12 (1980) / FAT16 (1984): originally targeted at floppy disks

– HPFS (OS/2, 1989)

– FAT32/VFAT (1996)

– exFAT (2006)

• NTFS

– Since Windows NT 3.1 (1993)

Network/distributed file systems

• Network file systems

– Mount remote file system to local directory

– Network File System

– Server Message Block / CIFS (Samba)

– AppleTalk Filing Protocol

• Distributed file systems

– Share storage device to build a large file system

– Andrew File System

– Google file system

– Hadoop file system (HDFS)

File system interfaces

• R. C. Daley, P. G. Neumann, A General-Purpose File System For Secondary Storage, 1965

– Defined what a file system is and how it works

– Concepts of user, file, directory, directory hierarchy

– Backup storage and its usage

  • Incremental backup / weekly full backup recovery

• POSIX (IEEE 1003, first published in 1988; the name was suggested by Richard Stallman)

– Standardized file system interfaces

– Standard I/O API

– Direct I/O API

– Memory mapped I/O API

File system interface : stream I/O

• Buffered and line-by-line I/O interface

• Header: <stdio.h>

• Handle: FILE *fp;

• Functions

– fopen, fclose

– fprintf, fscanf

– fgets, fputs

– fread, fwrite

– fseek, ftell

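#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *fp;
    char *str;

    /* Open this program's own source file for reading. */
    if ((fp = fopen("main.c", "r")) != NULL)
    {
        str = malloc(4096);
        if (str != NULL)
        {
            /* fgets reads at most size-1 characters and appends '\0'. */
            while (fgets(str, 4096, fp))
                printf("%s", str);
            free(str);
        }
        fclose(fp);
    }
    return 0;
}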
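The fseek and ftell functions listed above are not used in the example. A minimal sketch of how they could be combined to find a file's size is shown here; the helper name stream_file_size is hypothetical, and ftell on a binary stream is assumed to report a byte offset, as it does on POSIX systems.

```c
#include <stdio.h>

/* Hypothetical helper: returns the size of the file in bytes,
 * or -1 if the file cannot be opened or positioned. */
long stream_file_size(const char *path)
{
    FILE *fp = fopen(path, "rb");
    long size = -1;

    if (fp == NULL)
        return -1;
    /* Move the stream position to the end, then ask where we are. */
    if (fseek(fp, 0, SEEK_END) == 0)
        size = ftell(fp);
    fclose(fp);
    return size;
}
```

Calling stream_file_size("main.c") would report the length of the source file read by the example above.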

File system interface : direct I/O

• Header: <fcntl.h>, <unistd.h>, …

• Handle: int fd;

• Functions

– open, creat, close

– read, write

– lseek, lseek64

– posix_fallocate, posix_fadvise

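#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    int fd;
    ssize_t n;
    void *buf;

    /* open() takes flag constants, not an fopen-style mode string;
       0 is a valid descriptor, so test for >= 0. */
    if ((fd = open("main.c", O_RDONLY)) >= 0)
    {
        buf = malloc(4096);
        /* Write only the n bytes that read() actually returned. */
        while (buf != NULL && (n = read(fd, buf, 4096)) > 0)
            write(STDOUT_FILENO, buf, n);
        close(fd);
        free(buf);
    }
    return 0;
}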
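With the descriptor-based interface, write may transfer fewer bytes than requested, so callers commonly loop until everything is written. The following is a rough sketch of such a loop; the name write_all is hypothetical and not part of POSIX.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keeps calling write() until len bytes are out.
 * Returns len on success, -1 on error. */
ssize_t write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t left = len;

    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n <= 0) {
            if (n < 0 && errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        p += n;                     /* skip the bytes already written */
        left -= (size_t)n;
    }
    return (ssize_t)len;
}
```

In the example above, write(STDOUT_FILENO, buf, n) could be replaced with write_all(STDOUT_FILENO, buf, n) to tolerate short writes.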

File system interface : mmap I/O

• Memory access to read/write a file

• Header: <sys/mman.h>

• Handle: void *ptr;

• Functions

– void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset)

– int munmap(void *addr, size_t length)

• Example

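#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
    int fd;
    off_t length;
    void *buf;

    /* open() takes flag constants such as O_RDONLY, not "r". */
    if ((fd = open("main.c", O_RDONLY)) >= 0)
    {
        /* Seeking to the end yields the file size in bytes. */
        length = lseek(fd, 0, SEEK_END);
        if (length > 0)
        {
            buf = mmap(NULL, length, PROT_READ, MAP_PRIVATE, fd, 0);
            /* mmap reports failure as MAP_FAILED, not NULL. */
            if (buf != MAP_FAILED)
            {
                write(STDOUT_FILENO, buf, length);
                munmap(buf, length);
            }
        }
        close(fd);
    }
    return 0;
}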
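Because a mapped file is accessed like ordinary memory, it can be scanned with plain pointer indexing instead of read calls. The sketch below counts newline characters that way; the function name mmap_count_newlines is hypothetical, and fstat is used here (rather than lseek) to obtain the size.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: counts '\n' bytes in a file through a read-only
 * private mapping. Returns -1 on error. */
long mmap_count_newlines(const char *path)
{
    struct stat st;
    long count = 0;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size > 0) {           /* a zero-length mapping is an error */
        char *p = mmap(NULL, (size_t)st.st_size, PROT_READ,
                       MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }
        for (off_t i = 0; i < st.st_size; i++)
            if (p[i] == '\n')       /* plain memory access, no read() */
                count++;
        munmap(p, (size_t)st.st_size);
    }
    close(fd);
    return count;
}
```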

Stream I/O illustrated

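[Figure: stream I/O across four layers: Application, libc, VFS, page cache. Call paths shown: fopen → open → sys_open; fgets → read → sys_read; fprintf, then fflush → write → sys_write; fclose → close → sys_close. Sample data: the file holds "Hello, Guys"; the first fgets loads it into the libc buffer and later fgets calls are served from that buffer; fprintf places "World" in the buffer, giving "Hello, World" once it is flushed.]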

Memory mapped I/O illustrated

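[Figure: memory-mapped I/O across four layers: Application, libc, VFS, page cache. mmap enters the kernel as sys_mmap. Reading c = buf[0] raises a page fault, and the page is loaded through aops->readpage(). Writing buf[1] = '\n' modifies the cached page; on page replacement or munmap (sys_munmap) the page is written back through aops->writepage(). The sample file content is the opening lines of the Korean national anthem.]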

File system benchmarks

• IOzone

• Iometer

• Filebench

• FFSB

• sysbench

• Bonnie

• Postmark

• TPC

• SPECsfs

• dbench

IOzone

• File I/O performance analysis

• Installation

– apt-get install iozone3

• Parameters

– -s filesize_Kb

– -r record_size_Kb

– -f [path]filename

– -i test

– -a / -A / -z / -Z : auto mode

– -t children

-i Description

0 write/rewrite

1 read/re-read

2 random-read/write

3 read-backwards

4 re-write-record

5 stride-read

6 fwrite/re-fwrite

7 fread/re-fread

8 random_mix

9 pwrite/re-pwrite

10 pread/re-pread

11 pwritev/re-pwritev

12 preadv/re-preadv

Filebench

• File system operation analysis

• Installation

– http://sourceforge.net/projects/filebench/files/latest/download

– configure ; make; make install

• Execution

– go_filebench

  • load workload (…/share/filebench/workloads/*)

  • set $dir=path

  • run duration

  • quit

– Workloads: filemicro_..., singlestream..., fivestream..., random..., fileserver, networkfs, oltp, varmail, webserver, videoserver

Postmark

• Mail-server workload simulation

• Installation

– apt-get install postmark

– Distributed as a C source file

• Execution

– postmark config

– Commands: set size, set number, set transactions, set location, run, quit

Sysbench

• A modular, cross-platform and multi-threaded benchmark tool

– Target: CPU, memory, threads, mutex, fileio, oltp

• Installation

– apt-get install sysbench

• Parameters

– --test=fileio

– --file-test-mode=rndwr

– --file-total-size=1G

– --file-block-size=16K

– command: prepare, run, cleanup

File system design

File system design elements

• Space allocation

– Contiguous allocation vs. fragmented allocation

– File to block mapping management

– Managing free space

• Name space management

– File naming: name length, case sensitivity, …

  • e.g. early UNIX file systems limit names to 14 characters; FAT uses the 8.3 naming scheme

– Directory hierarchy

  • Single-level array

  • Tree-structured multi-level directory

  • Graph-structured directory

Disk layout and file abstraction

• Abstractions in file system

– File data

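– Inode: per file metadata

  • name, size, data location, modified time, owner, …

– Directory hierarchy

– Superblock

– Metadata for free space management

[Figure: a disk holding the superblock, directory b, inode a, and blocks 0 and 1 of file a, posing the question of where each should be placed.]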

Allocated/free space management

• Bitmap approach (ext*fs)

– Low storage overhead (one bit per block)

– High free space search cost

• Linked list approach (FAT)

– Low free space search cost

Example bitmap: 11011000
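The search cost of the bitmap approach comes from scanning bits until a zero is found. A minimal sketch follows, assuming 1 means allocated, 0 means free, and that bit 0 is the most significant bit of byte 0 so that the layout matches the left-to-right drawing 11011000; real file systems may order bits differently, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the first free block (bit == 0), or -1 if every
 * block is allocated. Assumes bit 0 is the MSB of byte 0. */
long bitmap_find_free(const uint8_t *bitmap, size_t nblocks)
{
    for (size_t i = 0; i < nblocks; i++) {
        uint8_t mask = (uint8_t)(0x80u >> (i % 8));
        if ((bitmap[i / 8] & mask) == 0)
            return (long)i;
    }
    return -1;
}

/* Marks block i as allocated. */
void bitmap_set(uint8_t *bitmap, size_t i)
{
    bitmap[i / 8] |= (uint8_t)(0x80u >> (i % 8));
}
```

For the bitmap 11011000 (byte value 0xD8), the first free block is index 2; after allocating it, the next free block is index 5.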

Allocated/free space management

• Tree-based approach

– Inode and indirect blocks

– Extents: (start block number, contiguous blocks)

[Figure: an inode holding the file name, attributes, direct block pointers, and single, double, and triple indirect pointers. Direct pointers reference data blocks; each indirect pointer references a tree of indirect blocks, one, two, or three levels deep, whose leaves are data blocks.]
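To see how the inode and indirect blocks divide up a file, the sketch below works out which level of the tree holds a given logical block. The number of direct pointers and the number of pointers per indirect block are parameters here because the slide does not state them; 12 direct pointers and 1024 pointers per block (4 KB blocks, 4-byte pointers) are typical ext2-style values used only as an assumption.

```c
#include <stdint.h>

enum map_level { MAP_DIRECT, MAP_SINGLE, MAP_DOUBLE, MAP_TRIPLE, MAP_TOO_BIG };

/* lblk:    0-based logical block number within the file
 * ndirect: direct pointers in the inode (assumed, e.g. 12)
 * ptrs:    block pointers per indirect block (block size / pointer size) */
enum map_level block_level(uint64_t lblk, uint64_t ndirect, uint64_t ptrs)
{
    if (lblk < ndirect)
        return MAP_DIRECT;
    lblk -= ndirect;
    if (lblk < ptrs)
        return MAP_SINGLE;          /* one indirect block to follow */
    lblk -= ptrs;
    if (lblk < ptrs * ptrs)
        return MAP_DOUBLE;          /* two levels of indirect blocks */
    lblk -= ptrs * ptrs;
    if (lblk < ptrs * ptrs * ptrs)
        return MAP_TRIPLE;          /* three levels of indirect blocks */
    return MAP_TOO_BIG;             /* beyond the maximum file size */
}
```

With those assumed values, logical blocks 0 to 11 are direct, 12 to 1035 go through the single indirect block, and block 1036 is the first to need the double indirect tree.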

Allocated/free space management

• Tree-based approach

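– B-Tree (XFS, btrfs, …)

  • Useful for extent-based allocation

[Figure: two example trees of (start block, length) extents. File allocation: extents (1, 3), (7, 1), (10, 4) cover logical blocks 1 to 8 and are labeled with the running block counts 3, 4, 8. Free space: extents (14, 5), (4, 3), (8, 2), (0, 1), labeled with their lengths 5, 3, 2.]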
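The file allocation extents (1, 3), (7, 1), (10, 4) can be used to translate a logical block number into a physical block number. The sketch below does this with a linear walk over an array, standing in for the B-tree search that a real file system would use; the struct and function names are hypothetical, and logical blocks are numbered from 0 here.

```c
#include <stddef.h>
#include <stdint.h>

struct extent {
    uint32_t start;     /* first physical block of the run */
    uint32_t count;     /* number of contiguous blocks */
};

/* Maps a 0-based logical block to its physical block, or returns -1 if
 * the logical block lies past the end of the file. */
int64_t extent_lookup(const struct extent *ext, size_t n, uint32_t lblk)
{
    for (size_t i = 0; i < n; i++) {
        if (lblk < ext[i].count)
            return (int64_t)ext[i].start + lblk;
        lblk -= ext[i].count;       /* skip this extent's blocks */
    }
    return -1;
}
```

With the three extents above, logical blocks 0 to 2 map to physical blocks 1 to 3, logical block 3 maps to 7, and logical blocks 4 to 7 map to 10 to 13.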

Directory implementation

• Array

– Easy to manage

– File name length limit

• Linear list

– Variable length file name

– Hard to manage

• Hash table

– Indexed by file name: fast search

– Hash collision

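[Figure: an array directory with fixed-size slots holding RUN.EXE, README.TXT, and DATA.DB, next to a linear list holding the same entries plus the variable-length name "Long named file.docx".]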
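The hash table option can be sketched as an in-memory table indexed by a hash of the file name, with colliding names chained in a list. Everything in this sketch is illustrative: the bucket count, the hash function, the struct layout, and the inode numbers are assumptions, not the on-disk format of any particular file system.

```c
#include <stddef.h>
#include <string.h>

#define DIR_BUCKETS 8

struct dir_entry {
    const char *name;
    unsigned inode;                 /* hypothetical inode number */
    struct dir_entry *next;         /* chain of names with the same hash */
};

struct hash_dir {
    struct dir_entry *bucket[DIR_BUCKETS];
};

/* Simple multiplicative string hash reduced to a bucket index. */
static unsigned name_hash(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % DIR_BUCKETS;
}

void dir_insert(struct hash_dir *d, struct dir_entry *e)
{
    unsigned b = name_hash(e->name);
    e->next = d->bucket[b];         /* push onto the bucket's chain */
    d->bucket[b] = e;
}

const struct dir_entry *dir_lookup(const struct hash_dir *d, const char *name)
{
    const struct dir_entry *e;
    for (e = d->bucket[name_hash(name)]; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e;               /* collisions resolved by comparing names */
    return NULL;
}
```

A lookup touches only one bucket, which is where the fast search comes from; the chain walk is the price paid for hash collisions.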

Example: FAT

Characteristics

• Background: 1970s

– Personal computer

– Floppy disks (~ 1MB)

• 8.3 name space

– Case insensitive

– Long name format extension

• No protection mechanism

• No consistency guarantee

– Repaired after the fact with tools such as chkdsk and scandisk

• File data location management

– Linked list approach

• FAT entry (1 entry / 1 cluster)

– Next cluster number (cluster: 512 bytes ~ 32 KB)

– 0: free, -1: end of file

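[Figure: on-disk layout in order: boot block, file allocation table (FAT), backup FAT, root directory, data area. The root directory entry for A.EXE points to the file's first cluster, and the FAT entries 00003, 00005, 00006, -1 chain its clusters to the end of the file; entries holding 0 are free.]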
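Reading a file under the linked list approach means following next-cluster numbers through the table until the end-of-file marker. The sketch below uses the slide's simplified encoding (0 for free, -1 for end of file) rather than the real FAT12/16/32 bit patterns, and the function name fat_chain is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define FAT_FREE 0
#define FAT_EOF  (-1)

/* Follows the chain that starts at cluster `first`, storing up to `max`
 * cluster numbers in out. Returns the chain length, or -1 if the chain
 * hits a free entry, leaves the table, or loops. */
long fat_chain(const int32_t *fat, size_t nclusters, int32_t first,
               int32_t *out, size_t max)
{
    size_t n = 0;
    int32_t c = first;

    while (c != FAT_EOF) {
        if (c <= FAT_FREE || (size_t)c >= nclusters || n >= nclusters)
            return -1;              /* corrupt chain */
        if (n < max)
            out[n] = c;
        n++;
        c = fat[c];                 /* entry c holds the next cluster */
    }
    return (long)n;
}
```

For a table in which entry 2 holds 3, entry 3 holds 5, entry 5 holds 6, and entry 6 holds -1, a file starting at cluster 2 (an assumed starting point) occupies clusters 2, 3, 5, 6.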

Directory

• A special file with 32 bytes directory entries

• Entries

– File name: 11 bytes (name 8, extension 3)

– Attributes

  • Read-only, hidden, system, sub-directory, archive, long file name

– ctime, atime, mtime

  • Bit widths: year (7), month (4), day (5), hour (5), min (6), second/2 (5)

– First data cluster

– File size (max. 4 GB)
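The bit widths listed for the timestamps pack into one 16-bit date word and one 16-bit time word. The sketch below unpacks them, assuming the usual FAT convention that the year is stored as an offset from 1980 and that seconds are stored divided by two; the struct and function names are hypothetical.

```c
#include <stdint.h>

struct fat_datetime {
    int year, month, day, hour, minute, second;
};

/* date: year-1980 (7 bits) | month (4 bits) | day (5 bits)
 * time: hour (5 bits) | minute (6 bits) | second/2 (5 bits) */
struct fat_datetime fat_decode(uint16_t date, uint16_t time)
{
    struct fat_datetime t;

    t.year   = 1980 + (date >> 9);
    t.month  = (date >> 5) & 0x0F;
    t.day    = date & 0x1F;
    t.hour   = time >> 11;
    t.minute = (time >> 5) & 0x3F;
    t.second = (time & 0x1F) * 2;   /* stored with 2-second resolution */
    return t;
}
```

Since only 5 bits hold the seconds, odd second values cannot be represented, which is why FAT timestamps have a 2-second granularity.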

Long name extension

• Combining consecutive directory entries

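– First entry: normal directory entry (first 11 characters)

– LFN entries

  • File name segment: 26 bytes

  • Reserved critical entries

    – First data cluster

    – File type, sequence number, etc.

[Figure: the name "Introduction to File System.pptx" split across directory entries. The normal entry holds "Introductio" together with ctime, atime, mtime, first data cluster (FDC), and length. The LFN entries hold the remaining segments "n to File" and "System.pptx", each with a sequence number, a file type, and a first-cluster field set to 0 for compatibility.]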

Boot sector

• Bootstrap code

• File system summary

– File system size (sectors)

– Logical sector size

– Cluster size

– # of FATs

– Root directory entries

  • Root directory first cluster

– Volume label

– Drive number

Free space management

• Next free cluster pointer

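– FAT32 maintains the last allocated cluster number

  • Freed clusters are not reused right away, so recently deleted files can often be undeleted

– Produces fragmentation

[Figure: a FAT with a run of free (0) entries followed by the chain 00003, 00005, 00006, -1, and a pointer marking the last allocated cluster.]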
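The next free cluster pointer amounts to a next-fit search: allocation resumes just after the last allocated cluster and wraps around, instead of restarting from the beginning of the table. The sketch below illustrates the idea with the slide's simplified encoding (0 for free, -1 for end of file); the function name and the choice of cluster 2 as the first data cluster are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Allocates one cluster, searching from just after *last_alloc and
 * wrapping around. Returns the cluster number, or -1 if none is free. */
long fat_alloc(int32_t *fat, size_t nclusters, size_t *last_alloc)
{
    const size_t first = 2;         /* assumed first data cluster */
    size_t span, start, i;

    if (nclusters <= first)
        return -1;
    span = nclusters - first;
    start = (*last_alloc >= first && *last_alloc < nclusters)
                ? *last_alloc - first + 1 : 0;
    for (i = 0; i < span; i++) {
        size_t c = first + (start + i) % span;
        if (fat[c] == 0) {
            fat[c] = -1;            /* a new cluster ends its chain */
            *last_alloc = c;
            return (long)c;
        }
    }
    return -1;
}
```

If clusters 2 and 3 are allocated and cluster 2 is then freed, the next allocation returns cluster 4 rather than 2. The freed cluster keeps its old data for a while, but files end up scattered across the disk.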

Example: ext3

Characteristics

• Background

– Linux operating system: multi-user

– Evolving from desktop to server and real-time systems

• Based on block groups

– Each block group works as an independent file system

– Inode, directory, file data

• Inodes for allocation and attribute management

• Journaling supported since ext3

Block group

• Ext file system = an array of block groups

• Block group size: determined by block size

– 4 KB block → 128 MB block group

– Why? The data block bitmap must fit in one block

Group descriptor fields: bg_block_bitmap, bg_inode_bitmap, bg_inode_table, bg_free_blocks_count, bg_free_inodes_count, …
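The 128 MB figure follows from the bitmap constraint: a one-block bitmap has 8 bits per byte, so it can track block_size × 8 blocks, and each of those blocks is block_size bytes. A small sketch of the arithmetic, with a hypothetical function name:

```c
#include <stdint.h>

/* Size in bytes of a block group whose data block bitmap occupies
 * exactly one block. */
uint64_t block_group_bytes(uint64_t block_size)
{
    uint64_t blocks_per_group = block_size * 8;   /* one bit per block */
    return blocks_per_group * block_size;
}
```

For 4096-byte blocks this gives 32768 blocks per group, or 128 MB; for 1024-byte blocks it gives 8192 blocks, or 8 MB.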

Inode

• Size: 128-byte / 256-byte (ext4)

Directory

• ext3 and later support HTree: hashing for entry lookup [Daniel Phillips, A Directory Index for Ext2, Linux Symposium '02]

Free space management

• Data block bitmap / inode bitmap in each block group

• Block allocation rule

– Top-level directory's inode

  • In an empty block group, if possible

  • Block group with maximum free inodes

– Other inodes and data blocks

  • In the block group where its inode or parent resides, if possible

  • Nearest following block group with more free blocks than average

Example top-level directories: /usr, /home, /var, /etc
