file system

Post on 23-Jan-2016


1

File System

1. Data Structure
2. Functions

Lecture 05: File System

2

Kernel Data Structure for File

[Figure: Process 1 and Process 2 each have a PCB describing their CPU and memory objects; a file object is described by an FCB. Legend: table (data structure) vs. object (hardware or software).]

3

Meta-data for a File

• Information the kernel needs for a file:
  – owner (e.g. Clinton)
  – protection (e.g. rwx r-- r--)
  – device (e.g. disk)
  – content (e.g. sector address)
  – device driver routines (e.g. read(), open())
  – current access position (e.g. offset)
  – …

• In the Linux kernel, the read/write system calls are sequential: each call starts at the current offset and advances it.
• Try "man 2 read" for the system call parameters: there is no offset parameter, because the offset is implicit.
• For random access, use the lseek() system call, which moves the offset.
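The sequential-offset behaviour can be observed from user space with a small POSIX sketch. It writes a string to a scratch file (the path /tmp/lseek_demo.txt and the helper name byte_at() are assumptions for illustration), calls lseek() to move the offset, and then read()s one byte there. Note that read() itself never takes a position: the offset is kept by the kernel.

```c
#include <fcntl.h>    /* open(), O_* flags */
#include <string.h>   /* strlen() */
#include <unistd.h>   /* read(), write(), lseek(), close(), unlink() */

/* Hypothetical demo helper: write `data` to a scratch file, use lseek()
 * to jump straight to byte `off`, and read() one byte there.
 * Returns the byte read, or -1 on any error or at end of file. */
int byte_at(const char *data, off_t off)
{
    const char *path = "/tmp/lseek_demo.txt";   /* assumed scratch path */
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    int result = -1;
    size_t len = strlen(data);
    unsigned char c;

    /* write() leaves the offset at the end of the data, so we must move it
     * explicitly before reading; read() has no position argument. */
    if (write(fd, data, len) == (ssize_t)len
        && lseek(fd, off, SEEK_SET) == off
        && read(fd, &c, 1) == 1)
        result = c;

    close(fd);
    unlink(path);
    return result;
}
```

For example, byte_at("0123456789", 5) should return '5'; asking for a position past the end returns -1, because read() then reports end of file.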

4

[Figure: contiguous allocation vs. scattered allocation of file content on disk]

5

[Figure: blocks of file content scattered across the disk]

The contents of file FA may be stored on disk non-contiguously, in units of disk sectors.

• Why not contiguous allocation?
  (O) Fast, if the whole content is read or written sequentially; used for swap, device copy, …
  (X) Space management: many small, useless holes form (external fragmentation)
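External fragmentation can be illustrated with a toy model. Everything here is hypothetical: the disk is an array of sector-in-use flags and first_fit() is a simple first-fit contiguous allocator. The point is that enough free sectors can exist in total while no single hole is large enough.

```c
#include <stddef.h>

/* Toy disk: map[i] != 0 means sector i is in use. */

/* Total number of free sectors anywhere on the disk. */
size_t free_sectors(const unsigned char *map, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (!map[i])
            count++;
    return count;
}

/* First-fit contiguous allocation: index of the first run of `need`
 * consecutive free sectors, or -1 if no hole is large enough. */
long first_fit(const unsigned char *map, size_t n, size_t need)
{
    size_t run = 0;                        /* length of the current free run */
    if (need == 0)
        return 0;
    for (size_t i = 0; i < n; i++) {
        run = map[i] ? 0 : run + 1;
        if (run == need)
            return (long)(i + 1 - need);   /* start of the hole */
    }
    return -1;                             /* every hole is too small */
}
```

With the map 1 0 1 0 0 1 0 1 0 0 there are 6 free sectors, yet a 3-sector file cannot be placed contiguously; a scattered allocator could use any 3 of the free sectors.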

6

[Figure: file content blocks on disk plus a metadata block for the file]

The kernel maintains metadata for each file.

7

The file metadata includes pointers to the data sectors.

8

open() copies the file's metadata from disk into main memory, but not its contents, which are too big.

9

This in-memory metadata holds the pointers to the data sectors.

10

Split Metadata for file

[Figure: processes PA, PB, and PC each holding a copy of the metadata for file FX]

11

Split Metadata for file

• Systemwide information: owner, protection information, device, pointer to file content, device driver routines
  – All processes share a single copy in memory: the "inode" struct
• Private information: offset
  – Each process gets its own copy, since processes access different parts of the file: the "file" struct

12

So we have two data structures for each file:

• (system) file table: private information, per-process data
  – offset: the next byte position to read/write
• inode table: shared (systemwide) information, a single copy globally
  – other information, which changes less frequently

[Figure: processes PA and PB each with their own offset entry, both pointing to the same inode]
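A minimal sketch of the split, assuming simplified structures: inode_s, file_s, open_entry(), and read_entry() are illustrative names, not kernel code. Two opens share one inode but each gets its own offset, so PA reading 100 bytes does not move PB's position.

```c
/* Systemwide part: one in-core copy per file. */
struct inode_s {
    int  owner;
    int  mode;          /* protection bits, e.g. 0644 */
    long size;          /* file size in bytes */
    int  refcount;      /* number of file entries pointing here */
};

/* Private part: one per open(). */
struct file_s {
    struct inode_s *ip; /* shared inode */
    long offset;        /* next byte position to read/write */
};

/* "open": a new private entry at offset 0 that shares the inode. */
struct file_s open_entry(struct inode_s *ip)
{
    ip->refcount++;
    struct file_s f = { ip, 0 };
    return f;
}

/* Sequential "read" of up to n bytes: only this entry's offset moves.
 * Returns the number of bytes that would be read. */
long read_entry(struct file_s *f, long n)
{
    long left = f->ip->size - f->offset;
    if (n > left)
        n = left < 0 ? 0 : left;
    f->offset += n;
    return n;
}
```

The shared inode is touched only on open (refcount) or when file attributes change, while the offset changes on every read, which is why it lives in the private structure.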

13

/*
 * One file structure is allocated for each open/creat/pipe call.
 * Main use is to hold the read/write offset.
 */
struct file {
    char  f_flag;
    char  f_count;       /* reference count */
    int   f_inode;       /* pointer to inode structure */
    char *f_offset[2];   /* read/write character pointer */
} file[NFILE];

/* flags */
#define FREAD   01
#define FWRITE  02
#define FPIPE   04

14

struct inode {
    char  i_flag;
    char  i_count;       /* reference count */
    int   i_dev;         /* device where inode resides */
    int   i_number;      /* i number, 1-to-1 with device address */
    int   i_mode;
    char  i_nlink;       /* directory entries */
    char  i_uid;         /* owner */
    char  i_gid;         /* group of owner */
    char  i_size0;       /* most significant of size */
    char *i_size1;       /* least sig */
    int   i_addr[8];     /* device addresses constituting file */
    int   i_lastr;       /* last logical block read (for read-ahead) */
} inode[NINODE];

15

Sharing Files

• Example
  – (Case 1) who, grep: pipe file
    • % who | grep
    • share the inode (pipe file)
    • do not share the offset
  – (Case 2) parent/child: tty file
    • % vi
    • share the inode (tty file)
    • share the offset

[Figure: who and grep connected by a pipe; the shell sh and its child vi both reading tty (in)]
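Both cases can be observed with real POSIX calls. When the shell forks a child, the child's descriptors point to the parent's file table entries, so the offset is shared. The sketch below reproduces that sharing inside one process with dup(), which also makes two descriptors share one file table entry, and contrasts it with a second open() of the same file, which gets a new entry (same inode, separate offset). The scratch path and the function name are assumptions.

```c
#include <fcntl.h>    /* open(), O_* flags */
#include <unistd.h>   /* dup(), write(), lseek(), close(), unlink() */

/* Write "abc" through fd1, then report the offset seen through fd2.
 * use_dup != 0: fd2 = dup(fd1) shares ONE file table entry -> offset 3.
 * use_dup == 0: fd2 from a second open() has its own entry -> offset 0.
 * Returns -1 on error. */
long offset_seen_by_second_fd(int use_dup)
{
    const char *path = "/tmp/share_demo.txt";   /* assumed scratch path */
    int fd1 = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd1 < 0)
        return -1;

    int fd2 = use_dup ? dup(fd1) : open(path, O_RDWR);
    long off = -1;
    if (fd2 >= 0 && write(fd1, "abc", 3) == 3)
        off = (long)lseek(fd2, 0, SEEK_CUR);    /* query without moving */

    if (fd2 >= 0)
        close(fd2);
    close(fd1);
    unlink(path);
    return off;
}
```

offset_seen_by_second_fd(1) should return 3 (shared offset, like parent and child on the tty), while offset_seen_by_second_fd(0) should return 0 (shared inode only).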

16

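Sharing files

[Figure: processes who, grep, sh, vi, and game; four offset entries in the (system) file table point into an inode table holding three inodes: the pipe file, the game file, and the tty device. who and grep share the pipe file ($ who | grep); sh and vi, started by $ vi, form a process group sharing the tty.]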

17

Device switch table

• A 2-dim array that maps
  (device name, operation name) => device driver routine
• Provides device independence (above: file, below: device)

[Figure: devswtab[], with one column per operation (open, close, read, write, ioctl); each entry holds the starting address of a driver routine, e.g. Read_lp]

18

struct cdevsw {
    int (*d_open)();
    int (*d_close)();
    int (*d_read)();
    int (*d_write)();
    int (*d_sgtty)();
} cdevsw[];

[Figure: each cdevsw[] entry holds d_open, d_close, d_read, d_write, and d_ioctl; for the line printer, d_read points to Read_lp]

Actually, this is a one-dimensional array of structs, not a two-dimensional array.
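The one-dimensional layout can be sketched with modern C prototypes. This is an illustration, not the historical source: the struct name cdevsw_s, the two stub drivers, and the argument lists are assumptions. The array index plays the role of the device (major number) and the struct field plays the role of the operation, so the "2-dim" lookup becomes cdevsw_tab[major].d_read(...).

```c
#include <stddef.h>

/* One struct of driver entry points per device. */
struct cdevsw_s {
    int (*d_open)(int minor);
    int (*d_close)(int minor);
    int (*d_read)(int minor, char *buf, int n);
};

static int nop_open(int minor)  { (void)minor; return 0; }
static int nop_close(int minor) { (void)minor; return 0; }

/* Hypothetical line-printer read: a printer is output-only. */
static int lp_read(int minor, char *buf, int n)
{
    (void)minor; (void)buf; (void)n;
    return -1;
}

/* Hypothetical tty read: pretends the user typed one 'x'. */
static int tty_read(int minor, char *buf, int n)
{
    (void)minor;
    if (n < 1)
        return 0;
    buf[0] = 'x';
    return 1;
}

/* Row = device (major number), field = operation. */
static const struct cdevsw_s cdevsw_tab[] = {
    /* major 0: lp  */ { nop_open, nop_close, lp_read },
    /* major 1: tty */ { nop_open, nop_close, tty_read },
};

/* Kernel-side dispatch: (major, read) -> driver routine. */
int dev_read(int major, int minor, char *buf, int n)
{
    if (major < 0 || (size_t)major >= sizeof cdevsw_tab / sizeof cdevsw_tab[0])
        return -1;                                   /* no such device */
    return cdevsw_tab[major].d_read(minor, buf, n);
}
```

The file-system layer above only ever calls dev_read(); adding a device means adding a row, which is the device independence the slide describes.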

19

Kernel tables after open(/a/b)

[Figure, built up over three steps: (1) the inode table holds the in-core inodes of /, a, and b, each tagged with its device name and pointing to its data blocks; (2) a new (system) file table entry holding the offset points to b's inode; (3) in user process PA's u_ofile[] (slots 0 to 4), slot fd = 4 points to that file table entry.]

22

File descriptor table (or open file table)

• An array in struct user (the u_ofile[] array)
• Holds per-process open-file information
• Filled in whenever the program calls open() or creat()

fd = open("/a/b", …)

– fd is an integer (the "file descriptor"): 0, 1, 2, …
  • 0, 1, 2 are reserved for the standard input, standard output, and standard error files
– fd is used as an index into
  • the u_ofile[] array (file descriptor table, open file table)
  • the starting point for accessing the file (it points into the system file table)

(1) the pathname of the file to open (symbolic name) → (2) the kernel opens the file → (3) a file descriptor (internal representation) is returned
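A sketch of descriptor allocation, under assumptions: a fixed-size table (NOFILE = 8 here is illustrative), a simplified struct file_s, and the rule "take the lowest free slot", which is what POSIX requires of open(). With 0, 1, 2 taken by the standard files and 3 already in use, the next open gets fd = 4, matching the figure.

```c
#include <stddef.h>   /* NULL */

#define NOFILE 8      /* assumed per-process table size (illustrative) */

struct file_s {       /* simplified system file table entry */
    long offset;
    int  count;
};

/* u_ofile[fd] points to a system file table entry, or is NULL if free.
 * Installs fp in the lowest free slot and returns that slot as the fd,
 * or returns -1 if the table is full. */
int alloc_fd(struct file_s *u_ofile[NOFILE], struct file_s *fp)
{
    for (int fd = 0; fd < NOFILE; fd++) {
        if (u_ofile[fd] == NULL) {
            u_ofile[fd] = fp;
            return fd;
        }
    }
    return -1;
}
```

If a process closes fd 1 and then opens another file, the new file lands in slot 1; shells rely on this lowest-slot rule when redirecting standard output.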

23

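Kernel data structure for file

[Figure: in process PA's user structure, fd indexes u_ofile[] (slots 0 to 4), the per-process open file table or file descriptor table; each entry points to an offset entry in the (system) file table, which points to an inode in the inode table; devswtab then leads to the device routine.]

("File handle", the Windows name, extends this notion to the network.)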

24

Kernel Data Structure

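[Figure: a read() by Process 1 goes from its user structure through the offset entry and the inode of file FX to the devswtab row for the device (entries r, w, o, c: read, write, open, close).]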

25

(System) file table

• struct file
• One entry for each open/creat/pipe
• May be shared (if the offset is shared)
• Content
  – offset
  – counter (number of processes sharing this entry)
  – pointer to the inode table
  – r/w/p flag

26

Inode table

• Includes most of the information for a file
• Shared by all processes
• Changed less frequently (than the offset)
• Content (while on disk)
  – protection mode
  – owner
  – size
  – pointers to sectors
  – etc.

27

In-core inode

• Content (while on disk)
  – protection mode
  – owner
  – size
  – time
  – array of pointers to disk blocks
• Plus (added at load time)
  – counter (number of processes sharing the file)
  – device name (major/minor device number)
  – i-number (location of the inode on disk)
  – status (locked, mount point, …)

28

[Figure: the pointer array within an inode pointing to scattered file content blocks]

Now you can reach any data block through the in-core inode. These pointers are stored in an array within the inode.

29

Balanced tree

• Example: 10 GB disk, 1 KB sectors
  # of sectors = 10,000,000,000 / 1000 = 10,000,000 sectors
  each sector pointer: 24 bits (2^24 = 16,777,216 ≥ 10,000,000)
• A sector can hold (1000 × 8 bits) / 24 bits ≈ 333 sector pointers

[Figure: a tree of index sectors growing from one level to three, each index sector fanning out to the next level]
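The arithmetic can be checked with a few helper functions (all names are illustrative). They compute how many bits a sector pointer needs, how many such pointers fit in one sector, and how many index levels are needed to reach every sector. Counting bits rather than bytes, a 1000-byte sector holds 8000 / 24 = 333 pointers of 24 bits, and three levels of 333-way index sectors (333^3 ≈ 37 million) cover all 10,000,000 sectors.

```c
/* Smallest b such that 2^b >= n: bits needed to name n sectors. */
int bits_needed(unsigned long long n)
{
    int b = 0;
    while (b < 64 && (1ULL << b) < n)
        b++;
    return b;
}

/* Pointers of ptr_bits bits that fit in a sector of sector_bytes bytes. */
unsigned long ptrs_per_sector(unsigned long sector_bytes, unsigned ptr_bits)
{
    return sector_bytes * 8 / ptr_bits;
}

/* Index levels needed so that fanout^levels >= total (fanout must be >= 2). */
int levels_needed(unsigned long long total, unsigned long fanout)
{
    int levels = 0;
    unsigned long long reach = 1;
    while (reach < total) {
        reach *= fanout;
        levels++;
    }
    return levels;
}
```

For the example disk: bits_needed(10000000) is 24, ptrs_per_sector(1000, 24) is 333, and levels_needed(10000000, 333) is 3.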

32

Balanced tree

• Balanced tree of order ~M (special insert/delete algorithm)
• Top level: master index
• UNIX: skewed tree

33

Block pointer array within inode

[Figure: the inode's pointer array: direct 0 through direct 9, then single indirect, double indirect, and triple indirect, each leading to data blocks]

Fast for small files (those created by human beings at terminal keyboards, as in timesharing applications); slower for big files.

[Figure: Offset vs Disk Block: with data blocks of ~1 KB, the file offsets covered by successive pointers are marked ~1 KB, ~2 KB, ~9 KB, ~109 KB, and ~10109 KB; example offset: 57821]
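A sketch of the offset-to-pointer mapping for the layout above (10 direct pointers, then single, double, and triple indirect). Block size and pointers per indirect block are parameters because the figure does not state them; its marks (~109 KB, ~10109 KB) appear consistent with 1000-byte blocks and 100 pointers per indirect block, which is an assumption. The helper name locate() is hypothetical.

```c
#define NDIRECT 10   /* direct 0 .. direct 9 */

/* Which inode pointer covers byte `off`?
 * Returns 0 = direct, 1 = single, 2 = double, 3 = triple indirect,
 * or -1 if the offset is beyond the triple-indirect range.
 * *idx receives the block's index within that pointer's range
 * (for direct blocks: the direct slot number). */
int locate(unsigned long long off, unsigned long bsize, unsigned long nptr,
           unsigned long long *idx)
{
    unsigned long long lbn = off / bsize;   /* logical block number */
    unsigned long long span = 1;            /* blocks covered at this level */

    if (lbn < NDIRECT) {
        *idx = lbn;
        return 0;
    }
    lbn -= NDIRECT;
    for (int level = 1; level <= 3; level++) {
        span *= nptr;                       /* nptr, nptr^2, nptr^3 blocks */
        if (lbn < span) {
            *idx = lbn;
            return level;
        }
        lbn -= span;
    }
    return -1;
}
```

For the example offset 57821: with 1000-byte blocks it is logical block 57, past the 10 direct pointers, so it is entry 47 of the single-indirect block; with 1024-byte blocks and 256 pointers per block it is logical block 56, entry 46.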

36

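Linux (ext2/ext3)

• 1st to 12th pointers: direct pointers
• 13th pointer: indirect pointer
• 14th pointer: doubly indirect pointer
• 15th pointer: triply indirect pointer
• Max file data ≈ 4096 GB, if
  – block address = 32 bits
  – block size = 4096 bytes
  (each block then holds 4096 / 4 = 1024 pointers, and the triply indirect pointer alone reaches 1024^3 blocks × 4 KB = 4 TB)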
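The 4096 GB figure can be reproduced by counting reachable blocks. This sketch assumes the ext2-style layout above (12 direct, then single, double, and triple indirect pointers) and ignores any other on-disk limits; max_file_blocks() is an illustrative name.

```c
/* Data blocks reachable from an ext2-style inode:
 * 12 direct + p (single) + p^2 (double) + p^3 (triple),
 * where p = pointers per block. */
unsigned long long max_file_blocks(unsigned long long bsize, unsigned ptr_bytes)
{
    unsigned long long p = bsize / ptr_bytes;   /* pointers per block */
    return 12 + p + p * p + p * p * p;
}
```

With 4096-byte blocks and 4-byte addresses, max_file_blocks(4096, 4) is 1,074,791,436 blocks, or 4,402,345,721,856 bytes: roughly 4 TiB, almost all of it from the triple-indirect term.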

37

Disk Space for ...

[Figure: the disk is divided into space for inodes and space for data blocks]

• File data size: varies
• inode size: fixed

38

Space for inodes on disk (each inode has a fixed size)

[Figure: inode 0, inode 1, inode 2, …, inode n stored one after another]

i-number: the ordinal number (position) of an inode on disk

If I know (disk name, i-number), I can find the inode and, through it, access the file content.
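Because every inode has the same size, an i-number is enough to find the inode with plain arithmetic. The layout parameters below (where the inode area starts, sector size, inode size) and the names are hypothetical; inodes are numbered from 0 as on the slide.

```c
/* Position of an inode on disk. */
struct inode_loc {
    unsigned long long sector;   /* disk sector holding the inode */
    unsigned offset;             /* byte offset of the inode within it */
};

/* inode `inum` starts inum * inode_size bytes into the inode area. */
struct inode_loc locate_inode(unsigned long inum,
                              unsigned long long itable_start_sector,
                              unsigned sector_size, unsigned inode_size)
{
    unsigned long long byte = (unsigned long long)inum * inode_size;
    struct inode_loc loc = {
        itable_start_sector + byte / sector_size,
        (unsigned)(byte % sector_size)
    };
    return loc;
}
```

With the inode area starting at sector 2, 512-byte sectors, and 64-byte inodes, inode 10 starts 640 bytes in: sector 3, byte offset 128. Variable-size inodes would require an extra index to find them.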

39

Directory file (it is also a file; its content is a list of <name, pointer> pairs)

file name    i-number
"a"          7
"b"          1
"bin"        3
"dev"        772

i-number = 3 → inode 3 on disk → data blocks

Q: Is there a limit on the number of characters in a file name?
Q: Is there a limit on the number of files?
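A directory lookup is a linear search over <name, i-number> entries. The sketch assumes a fixed 14-byte name field, modeled on early UNIX directories (such a field is also one answer to the first question, since it caps name length); struct dirent_s and dir_lookup() are illustrative names, and the test entries reuse the table above.

```c
#include <string.h>   /* strncmp() */

#define NAMELEN 14    /* fixed name field (assumed, early-UNIX style) */

struct dirent_s {
    unsigned short inum;     /* i-number of the named file */
    char name[NAMELEN];      /* not NUL-terminated if exactly NAMELEN chars */
};

/* Search n entries of a directory for `name`.
 * Returns the matching i-number, or -1 if the name is absent. */
long dir_lookup(const struct dirent_s *d, int n, const char *name)
{
    for (int i = 0; i < n; i++)
        if (strncmp(d[i].name, name, NAMELEN) == 0)
            return d[i].inum;
    return -1;
}
```

The search is linear, so large directories make each pathname component expensive; this is one reason open() is costly.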

40

Kernel tables before open(/a/b)

[Figure, built up over three steps: user process PA, the (system) file table, and an inode table holding only the root inode /; on disk, the inodes and data blocks of /, a, and b. The root directory's data block lists a, bin, x with i-numbers 7, 11, 8, and directory a's data block lists b, usr, y with i-numbers 3, 21, 6.]

43

open("/a/b")

[Figure: three files on disk, each an inode (i) plus data blocks:
/ (directory): entries a, x, y, bin, dev; i-numbers 7, 6, 8, 11, 40
/a (directory): entries w, u, b, ch, temp; i-numbers 7, 6, 8, 11, 40
/a/b (regular file): data blocks holding the content of this file]

44

open("/a/b", …)

• The kernel system call open() scans the pathname
  – 1st, the root directory file:
    • get inode 0 from the disk inode space
    • read the data blocks of the root directory file
    • search for the file name "a"
    • get the corresponding i-number for file "a"
  – 2nd, the "a" file:
    • get the inode of file "a" from disk (it is a directory file)
    • get the data blocks of directory file "a"
    • search for the file name "b"
    • get the corresponding i-number for file "b"

[Figure: pathname / a / b; root directory entries a → 7, bin → 12, dev → 11]

45

(continued)

  – 3rd, file "b":
    • read the inode of "b" from disk (a regular file)
    ---- the given pathname "/a/b" ends here ----
• Set up the kernel data structures for file "b"
  – insert the inode into the in-core inode table
  – make a new entry in the system file table (offset ← 0)
  – make a new entry in u_ofile[] in user
  – return the file descriptor
  – open() is done
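The pathname walk can be sketched over a toy in-memory "disk". It is named after the classic UNIX kernel routine namei(), but it is a hypothetical model: struct toy_inode, the fixed entry count, and the use of inode 0 as the root (following the slides) are assumptions, and real kernels also check permissions and cache lookups.

```c
#include <string.h>   /* strcspn(), memcpy(), strcmp() */

#define MAXENT 4

/* Toy inode: a directory lists nent <name, i-number> entries;
 * a regular file has nent == 0. */
struct dent { const char *name; int inum; };
struct toy_inode { int nent; struct dent ent[MAXENT]; };

/* Resolve an absolute path one component at a time: start at the root,
 * look each name up in the current directory, move to the i-number found.
 * Assumes every i-number found is a valid index into disk[].
 * Returns the final i-number, or -1 if some component is missing. */
int namei(const struct toy_inode *disk, int root, const char *path)
{
    int cur = root;
    char comp[32];

    if (path[0] != '/')
        return -1;                              /* absolute paths only */
    while (*path) {
        while (*path == '/')
            path++;                             /* skip separators */
        if (*path == '\0')
            break;
        size_t len = strcspn(path, "/");        /* length of next component */
        if (len >= sizeof comp)
            return -1;
        memcpy(comp, path, len);
        comp[len] = '\0';
        path += len;

        const struct toy_inode *dir = &disk[cur];   /* "get inode from disk" */
        int next = -1;
        for (int i = 0; i < dir->nent; i++)         /* "search data blocks" */
            if (strcmp(dir->ent[i].name, comp) == 0)
                next = dir->ent[i].inum;
        if (next < 0)
            return -1;                          /* no such file */
        cur = next;
    }
    return cur;
}
```

With root inode 0 listing a → 7 and directory 7 listing b → 3, namei(disk, 0, "/a/b") returns 3; a path that continues past a regular file, such as "/a/b/c", fails.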

46

Kernel tables after open(/a/b)

[Figure: user process PA; the inode table now holds the in-core inodes of / and a; on disk, the inodes of /, a, and b with their data blocks]


50

Kernel tables after open(/a/b)

[Figure: process PA's u_ofile[] (slots 0 to 4); slot 4 points to the (system) file table entry holding the offset, which points to b's in-core inode; fd = 4 is returned]

Once you have fd, you can reach b's inode after only 3 memory accesses (u_ofile[fd], then the file table entry, then the inode).

51

(continued)

• open("/a/b") is very costly in the number of disk accesses
  – doing it once (open or creat) is enough: translate (pathname => fd) once and save the result
  – do not use the pathname in subsequent calls
    • read(), write(), close(), …
  – use the file descriptor instead
    • read(fd, ...), write(fd, ...), …
• Try "man 2 read" … only one system call …
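The open-once pattern in a short POSIX sketch (count_bytes() is an illustrative name): the pathname is translated exactly once by open(), and every read() after that names the file only by its fd.

```c
#include <fcntl.h>    /* open() */
#include <unistd.h>   /* read(), close() */

/* Count the bytes in a file. Returns -1 if it cannot be opened or read. */
long count_bytes(const char *path)
{
    char buf[512];
    long total = 0;
    ssize_t n;

    int fd = open(path, O_RDONLY);            /* pathname => fd, once */
    if (fd < 0)
        return -1;
    while ((n = read(fd, buf, sizeof buf)) > 0)   /* fd only from here on */
        total += n;
    close(fd);
    return n < 0 ? -1 : total;
}
```

count_bytes("/dev/null") returns 0, and a missing path returns -1.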

52

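Try …

• man 2 read: ssize_t read(int fd, void *buf, size_t count);
• man 2 open: int open(const char *pathname, int flags);
• man 3 fread: size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);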

53

C functions for file

• Wait a minute …
  – I used printf(), scanf(), getchar(), … but never used read() or write() before …?
  – I used FILE * … but never used an fd (file descriptor) before …?

Right: most people use library functions, and the library then invokes system calls. Remember? The library cannot perform I/O directly.

Library functions are in my address space (user).

54

System calls for files

• creat()
• open(), close()
• read(), write()
• lseek(): move the offset
• stat(): get inode content
• All others are library functions
  – e.g. scanf(), gets(), getchar(), …
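stat() returns inode content without touching the file's data blocks. A minimal sketch using the POSIX struct stat fields st_size and st_mode (the wrapper names file_size() and is_directory() are illustrative):

```c
#include <sys/stat.h>   /* stat(), struct stat, S_ISDIR() */

/* File size in bytes from the inode, or -1 if stat() fails. */
long long file_size(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_size;
}

/* Nonzero if the inode's mode says "directory". */
int is_directory(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}
```

Neither function opens the file, so no fd or file table entry is created; only the pathname lookup and the inode are involved.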

55

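System call vs. library call

• System call (in the kernel): read(), for all files; it takes an fd
• Library calls (in a.out, user code); they work through a FILE *, a struct in the library:
  – tty files: scanf() (formatted), getchar() (one char), gets() (a string)
  – all files: fscanf() (formatted), fgetc() (one char), fgets() (a string), fread() (any number of bytes)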

56

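FILE vs fd

[Figure: the user's a.out holds my code (main(), add(), sub()) and library code (fopen(), printf()). The library's FILE struct holds a count of characters left in the buffer, a pointer into the buffer, and the file descriptor, plus a local buffer. A library call such as printf() invokes a system call such as write(), which enters the kernel a.out through trap(); there, u_ofile[] (slots 0 to 4), indexed by fd, leads to the (system) file table entry (offset), the inode table, and the data blocks.]

When the local buffer (in FILE) becomes empty, the read() system call fills it again.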

57

Example: open

1. My a.out calls the library function fopen("/a/b/c").
2. fopen() creates a struct FILE for /a/b/c.
3. The library invokes the system call open("/a/b/c"):
   – the kernel sets up its tables (inode, user, …, u_ofile[])
   – the kernel returns the file descriptor fd
   – fopen() saves fd in the FILE (for future use)
   – fopen() returns a FILE *
4. My a.out saves the FILE * (for future use).
5. All future calls use the FILE *, e.g. getc(FILE *).
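Steps 2 to 5 can be sketched with a stripped-down, hypothetical FILE replacement. MYFILE, my_fopen(), my_getc(), and my_fclose() are illustrative names, not the real stdio internals; what they show is that the library stores the fd returned by open() together with a local buffer, and calls read() only when that buffer runs dry.

```c
#include <fcntl.h>    /* open() */
#include <stdio.h>    /* EOF */
#include <stdlib.h>   /* malloc(), free() */
#include <unistd.h>   /* read(), close() */

typedef struct {
    int   fd;          /* file descriptor saved from open() */
    int   cnt;         /* characters left in buf */
    char *ptr;         /* next character in buf */
    char  buf[1024];   /* library local buffer */
} MYFILE;

MYFILE *my_fopen(const char *path)          /* read-only, for brevity */
{
    int fd = open(path, O_RDONLY);          /* step 3: the system call */
    if (fd < 0)
        return NULL;
    MYFILE *fp = malloc(sizeof *fp);        /* step 2: create the struct */
    if (fp == NULL) {
        close(fd);
        return NULL;
    }
    fp->fd = fd;                            /* save fd for future use */
    fp->cnt = 0;
    fp->ptr = fp->buf;
    return fp;                              /* step 4: caller keeps this */
}

int my_getc(MYFILE *fp)                     /* step 5 */
{
    if (fp->cnt <= 0) {                     /* buffer empty: refill */
        ssize_t n = read(fp->fd, fp->buf, sizeof fp->buf);
        if (n <= 0)
            return EOF;                     /* end of file or error */
        fp->cnt = (int)n;
        fp->ptr = fp->buf;
    }
    fp->cnt--;
    return (unsigned char)*fp->ptr++;
}

void my_fclose(MYFILE *fp)
{
    close(fp->fd);
    free(fp);
}
```

Reading one character from /dev/zero fills the 1024-byte buffer with a single read(), leaving 1023 characters that can be served without further system calls.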

58

Example: getchar()

#include <stdio.h>    /* BUFSIZ, EOF */
#include <unistd.h>   /* read() */

int getchar(void)     /* library function -- copied into my a.out */
{
    static char buf[BUFSIZ];   /* library local buffer */
    static char *bufp = buf;   /* pointer */
    static int n = 0;          /* counter */

    /* Is the library local buffer empty? */
    if (n == 0) {
        /* Yes: invoke the read() system call & fill up the local buffer */
        n = read(0, buf, sizeof(buf));   /* system call */
        bufp = buf;
    }
    /* return a character, or EOF once read() reports end of file or an error */
    return (--n >= 0) ? (unsigned char) *bufp++ : EOF;
}

(buf, bufp, and n form the data structure in the library.)

59

Functions for file handling

• So you usually use the library:
  – printf() for formatting (such as %s, %d)
  – getchar() for performance
  – …
• But all library I/O functions end up invoking system calls (library functions are "user" code and cannot do I/O directly).
• They are a front end that provides convenience, performance, …

Many library functions may exist, but there's only one system call for read().
