1
File System
1. Data Structure
2. Functions
Lecture 05: File System
2
Kernel Data Structure for File
[Figure: Process 1 and Process 2, each described by a PCB that refers to its CPU and memory objects; a file, described by an FCB. Legend: table = data structure; object = hardware or software]
3
Meta-data for a File
• Information the kernel needs for a file:
  – owner (e.g., Clinton)
  – protection (e.g., rwx r-- r--)
  – device (e.g., disk)
  – content (e.g., sector addresses)
  – device driver routines (e.g., read(), open())
  – current access position (e.g., offset*)
  – …
• In the Linux kernel, the read/write system calls are sequential: each call starts at the current offset and advances it.
• Try “man 2 read” for the system call parameters: there is no offset parameter, because the current offset is assumed.
• For random access, use the lseek() system call, which moves the offset.
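The sequential-offset rule and lseek() can be seen in a few lines of POSIX C. This is a minimal sketch: the function name demo_lseek and the /tmp path template are illustrative choices, not taken from the slides.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>     /* mkstemp() */
#include <string.h>     /* strlen() */
#include <unistd.h>     /* write(), lseek(), read(), close(), unlink() */

/* Writes "hello world" to a fresh temporary file, moves the offset to
 * byte 6 with lseek(), then reads up to n bytes from there into out.
 * Returns the number of bytes read, or -1 on error. */
ssize_t demo_lseek(char *out, size_t n)
{
    char path[] = "/tmp/lseek_demoXXXXXX";
    int fd = mkstemp(path);            /* create and open a unique file */
    if (fd < 0)
        return -1;
    unlink(path);                      /* name goes now; data goes at close() */

    const char *text = "hello world";
    ssize_t len = (ssize_t)strlen(text);
    ssize_t got = -1;
    /* After write() the offset is 11 (end of file), so a plain read()
     * would return 0 bytes. lseek() moves the offset back to byte 6. */
    if (write(fd, text, (size_t)len) == len && lseek(fd, 6, SEEK_SET) == 6)
        got = read(fd, out, n);        /* continues from offset 6 */
    close(fd);
    return got;
}
```

Calling demo_lseek(buf, 5) fills buf with "world"; asking for more than 5 bytes still returns 5, because read() stops at the end of the file.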
4
[Figure: contiguous allocation vs. scattered allocation of file content on disk]
5
[Figure: the content blocks of file FA spread over the disk]
Contents of file FA may be stored on disk non-contiguously* (*in units of disk sectors)
• Why not contiguous allocation?
  (O) fast, if the whole content is read/written sequentially (used for swap, device copy, …)
  (X) space management: many small, useless holes (external fragmentation)
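External fragmentation can be made concrete with a toy disk. In this sketch (the 16-sector disk and the bool-per-sector model are assumptions for illustration), free_total() is the space scattered allocation can use, while longest_free_run() is the largest file contiguous allocation can still place.

```c
#include <stdbool.h>
#include <stddef.h>

#define NSECT 16   /* toy disk size (assumption for illustration) */

/* Total free sectors: everything scattered allocation can use. */
size_t free_total(const bool used[NSECT])
{
    size_t n = 0;
    for (size_t i = 0; i < NSECT; i++)
        if (!used[i])
            n++;
    return n;
}

/* Longest run of adjacent free sectors: the largest file that
 * contiguous allocation can still place. */
size_t longest_free_run(const bool used[NSECT])
{
    size_t best = 0, run = 0;
    for (size_t i = 0; i < NSECT; i++) {
        run = used[i] ? 0 : run + 1;
        if (run > best)
            best = run;
    }
    return best;
}
```

If every other sector is in use, half the disk (8 sectors) is free, yet not even a 2-sector file can be stored contiguously. Those 1-sector holes are the "useless holes" above.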
6
The kernel maintains metadata for each file
7
File metadata includes pointers to data sectors
8
open() retrieves the metadata from disk into main memory,
but not the contents: they are too big!
9
This metadata has pointers to data sectors
10
[Figure: processes PA, PB and PC all using file FX; FX's metadata is shown on disk and in memory]
Split Metadata for file
11
Split Metadata for file
• Systemwide information: owner, protection information, device, pointer to file content, device driver routines
  – all processes share a single copy in memory: the “inode” struct
• Private information: offset
  – each process has a private copy, since processes access different parts of the file: the “file” struct
12
So we have two data structures for each file
• (system) file table: private info (per-process data)
  – offset: the next byte position to read/write
• inode table: shared info (systemwide), a single copy globally
  – other info: information that changes less frequently
[Figure: processes PA and PB, each with its own offset in the file table, sharing one inode]
13
/*
 * One file structure is allocated for each open/creat/pipe call.
 * Main use is to hold the read/write offset.
 */
struct file {
    char  f_flag;
    char  f_count;      /* reference count */
    int   f_inode;      /* pointer to inode structure */
    char *f_offset[2];  /* read/write character pointer */
} file[NFILE];

/* flags */
#define FREAD  01
#define FWRITE 02
#define FPIPE  04
14
struct inode {
    char  i_flag;
    char  i_count;      /* reference count */
    int   i_dev;        /* device where inode resides */
    int   i_number;     /* i number, 1-to-1 with device address */
    int   i_mode;
    char  i_nlink;      /* directory entries */
    char  i_uid;        /* owner */
    char  i_gid;        /* group of owner */
    char  i_size0;      /* most significant of size */
    char *i_size1;      /* least sig */
    int   i_addr[8];    /* device addresses constituting file */
    int   i_lastr;      /* last logical block read (for read-ahead) */
} inode[NINODE];
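The same private/shared split can be sketched in modern C. The names sim_inode, sim_file and sim_read are hypothetical, not from any real kernel; the point is that two file entries can share one inode while each keeps its own offset.

```c
#include <stddef.h>
#include <string.h>

/* Shared, systemwide part (one per file): like the inode. */
struct sim_inode {
    int         count;   /* how many file entries point here */
    const char *data;    /* stands in for the file's data blocks */
    size_t      size;    /* file size in bytes */
};

/* Private part (one per open): like struct file. */
struct sim_file {
    int               count;   /* how many descriptors share this entry */
    size_t            offset;  /* next byte position to read/write */
    struct sim_inode *inode;   /* points to the shared inode */
};

/* Sequential read: copy up to n bytes from the entry's current offset,
 * then advance that offset. Returns the number of bytes copied. */
size_t sim_read(struct sim_file *f, char *buf, size_t n)
{
    size_t left = f->inode->size - f->offset;   /* assumes offset <= size */
    if (n > left)
        n = left;
    memcpy(buf, f->inode->data + f->offset, n);
    f->offset += n;
    return n;
}
```

Reading 4 bytes through one entry moves only that entry's offset; a second entry on the same inode still starts at byte 0.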
15
Sharing Files
• Example
  – (Case 1) who, grep: pipe file
    • % who | grep
    • share the inode (pipe file)
    • do not share the offset
  – (Case 2) parent/child: tty file
    • % vi
    • share the inode (tty file)
    • share the offset
[Figure: who writing into a pipe that grep reads; sh and its child vi both using the tty (input)]
16
Sharing files
[Figure: processes who, grep, sh, vi and game; four offset entries in the (system) file table point into the inode table, whose three inodes are the pipe file, the game file and the tty device; sh and vi form one process group; commands: $ who | grep and $ vi]
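Both cases can be observed from user space. In this POSIX sketch, a second open() of the same file gets its own file table entry (Case 1), while dup() shares the existing entry. After fork() the child's descriptors share the parent's entries in the same way (Case 2); dup() is used here only so the demo runs in a single process. The function name demo_sharing is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>      /* open(), O_RDONLY */
#include <stdlib.h>     /* mkstemp() */
#include <unistd.h>     /* dup(), read(), write(), lseek(), close(), unlink() */

/* Opens one temporary file three ways, reads 4 bytes through fd1, then
 * reports the offset seen through the other two descriptors:
 *   dup_fd  = dup(fd1): shares fd1's file table entry
 *   open_fd = a second open(): gets its own entry
 * Returns 0 on success, -1 on error. */
int demo_sharing(off_t *dup_off, off_t *open_off)
{
    char path[] = "/tmp/share_demoXXXXXX";
    char buf[4];
    int fd1 = mkstemp(path);                /* entry #1, offset 0 */
    if (fd1 < 0)
        return -1;
    int dup_fd = dup(fd1);                  /* also entry #1 */
    int open_fd = open(path, O_RDONLY);     /* entry #2, offset 0 */
    unlink(path);                           /* data freed after last close */

    int ok = dup_fd >= 0 && open_fd >= 0
          && write(fd1, "0123456789", 10) == 10   /* entry #1 offset: 10 */
          && lseek(fd1, 0, SEEK_SET) == 0         /* entry #1 offset: 0 */
          && read(fd1, buf, sizeof buf) == 4;     /* entry #1 offset: 4 */
    if (ok) {
        *dup_off = lseek(dup_fd, 0, SEEK_CUR);    /* sees entry #1: 4 */
        *open_off = lseek(open_fd, 0, SEEK_CUR);  /* sees entry #2: 0 */
    }
    if (dup_fd >= 0)
        close(dup_fd);
    if (open_fd >= 0)
        close(open_fd);
    close(fd1);
    return ok ? 0 : -1;
}
```

lseek(fd, 0, SEEK_CUR) is the usual idiom for asking where an entry's offset currently is without moving it.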
17
Device switch table
• 2-dim array which maps
  (device name, operation name) => device driver routine
• device independence (above: file, below: device)
[Figure: devswtab[], one row per device, with columns open, close, read, write, ioctl; each cell holds the starting address of a driver routine, e.g. Read_lp]
18
struct cdevsw {
    int (*d_open)();
    int (*d_close)();
    int (*d_read)();
    int (*d_write)();
    int (*d_sgtty)();
} cdevsw[];
[Figure: the table's columns d_open, d_close, d_read, d_write, d_ioctl, with Read_lp in one read slot]
Actually, it is a one-dimensional array of struct, not a two-dimensional array.
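The one-dimensional array of structs of function pointers is easy to sketch in modern C. The struct chardev_ops, the two made-up devices and dev_read() are hypothetical; only the idea (row = device, column = operation) comes from the slides.

```c
#include <string.h>   /* memset() */

/* Hypothetical character-device switch: one struct of function
 * pointers per device, indexed by major device number. */
struct chardev_ops {
    int (*d_open)(int minor);
    int (*d_read)(int minor, char *buf, int n);
};

static int null_open(int minor) { (void)minor; return 0; }
static int null_read(int minor, char *buf, int n)
{
    (void)minor; (void)buf; (void)n;
    return 0;                           /* always end of file */
}

static int zero_open(int minor) { (void)minor; return 0; }
static int zero_read(int minor, char *buf, int n)
{
    (void)minor;
    memset(buf, 0, (size_t)n);          /* endless stream of zero bytes */
    return n;
}

/* The switch table: row = major number, column = operation. */
static const struct chardev_ops devsw[] = {
    { null_open, null_read },           /* major 0 */
    { zero_open, zero_read },           /* major 1 */
};

#define NDEV ((int)(sizeof devsw / sizeof devsw[0]))

/* Device-independent read: the file layer knows only the major number
 * and calls through the table, never a driver by name. */
int dev_read(int major, int minor, char *buf, int n)
{
    if (major < 0 || major >= NDEV || n < 0)
        return -1;                      /* no such device / bad length */
    return devsw[major].d_read(minor, buf, n);
}
```

Adding a device means adding a row; code above the table does not change, which is the device independence the slide describes.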
19
Kernel tables after open(/a/b)
[Figure, built up over three slides: in the user area of process PA, u_ofile[] slot 4 (fd = 4) points to an entry holding the offset in the (system) file table; that entry points to b's inode in the inode table, which also holds the inodes of / and a; the inodes record the device name and point to their data blocks]
22
File descriptor table (or open file table)
• An array in struct user (the u_ofile[] array)
• Per-process open file information
• An entry is filled whenever the program calls open() or creat()
  fd = open(“/a/b”, …)
  – fd is an integer (the “file descriptor”), starting from 0, 1, 2, …
    • 0, 1, 2 are reserved for the standard input, output and error files
  – fd is used as an index into the u_ofile[] array (file descriptor table, open file table)
  – it is the starting point to access the file (u_ofile[fd] points to the system file table)
[Figure: (1) the pathname of the file to open (symbolic name) → (2) the kernel opens the file → (3) the file descriptor (internal representation) is returned]
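A toy u_ofile[] shows why an fd is "just an index". The table size NOFILE = 8 and the struct names are hypothetical; choosing the lowest free slot matches what POSIX requires of open().

```c
#include <stddef.h>

#define NOFILE 8                               /* hypothetical per-process limit */

struct inode_entry { int number; };            /* in-core inode (i-number only) */
struct file_entry  { long offset; struct inode_entry *inode; };  /* file table entry */

/* Per-process descriptor table, like u_ofile[]: an fd is just an index. */
struct file_entry *u_ofile[NOFILE];

/* Put f in the lowest unused slot and return that index as the fd;
 * -1 if the table is full. */
int fd_alloc(struct file_entry *f)
{
    for (int fd = 0; fd < NOFILE; fd++) {
        if (u_ofile[fd] == NULL) {
            u_ofile[fd] = f;
            return fd;
        }
    }
    return -1;
}

/* Follow fd -> file table entry -> inode; NULL if fd is not open. */
struct inode_entry *fd_to_inode(int fd)
{
    if (fd < 0 || fd >= NOFILE || u_ofile[fd] == NULL)
        return NULL;
    return u_ofile[fd]->inode;
}
```

With slots 0, 1 and 2 taken by the standard files, the next open gets fd 3; closing fd 1 and opening again reuses slot 1.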
23
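Kernel data structure for file
[Figure: in the user area of process PA, fd indexes the per-process u_ofile[] (slots 0–4); each slot points to an offset entry in the (system) file table, which points to an inode in the inode table; the inode leads through devswtab to the device routine]
• Per-process open file table = file descriptor table (“file handle”, the Windows name, extends this notion to the network)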
24
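Kernel Data Structure
[Figure: the user area of Process 1 leads to file FX's offset entry and then its inode; a read() call is dispatched through devswtab, whose columns are r, w, o, c]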
25
(System) file table
• struct file
• One entry for each open/creat/pipe call
• May be shared (if the offset is shared)
• Content
  – offset
  – counter (number of processes sharing this entry)
  – pointer to the inode table
  – r/w/p flag
26
Inode table
• Includes most of the information for a file
• Shared by all processes
• Changed less frequently (than the offset)
• Content (while on disk)
  – protection mode
  – owner
  – size
  – pointer to sectors
  – etc.
27
In-core inode
• Content (while on disk)
  – protection mode
  – owner
  – size
  – time
  – array of pointers to disk blocks
• Plus (added at load time)
  – counter (number of processes sharing the file)
  – device name (major/minor device number)
  – i-number (location of the inode on disk)
  – status (locked, mount point, …)
28
[Figure: an inode whose pointer array points to the file's content blocks]
Now you can reach any data block through the in-core inode.
These pointers are stored in an array within the inode.
29
Balanced tree
• Example: 10 GB disk, 1 KB sectors
  – # of sectors = 10,000,000,000 / 1000 = 10,000,000 sectors
  – each sector pointer: 24 bits (2^24 ≈ 16.8 million ≥ 10 million sectors)
• A sector can hold 1000 bytes / 3 bytes ≈ 333 sector pointers
[Figure, built up over three slides: a tree of pointer sectors; each sector fans out to ~333 sectors at the next level]
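The arithmetic above can be checked mechanically. This sketch assumes pointers are stored in whole bytes (24 bits = 3 bytes); the helper names are illustrative.

```c
#include <limits.h>
#include <stdint.h>

/* Smallest b with 2^b >= n: bits a pointer needs to name n sectors. */
unsigned bits_for(uint64_t n)
{
    unsigned b = 0;
    while (b < 64 && ((uint64_t)1 << b) < n)
        b++;
    return b;
}

/* Pointers per sector, with each pointer stored in whole bytes. */
uint64_t pointers_per_sector(uint64_t sector_bytes, unsigned ptr_bits)
{
    if (ptr_bits == 0)
        return 0;                        /* degenerate input */
    return sector_bytes / ((ptr_bits + 7) / 8);
}

/* Levels of pointers needed so that fanout^levels >= n.
 * (No overflow check: fine for the sizes used here.) */
unsigned tree_depth(uint64_t n, uint64_t fanout)
{
    if (fanout < 2)
        return n <= 1 ? 0 : UINT_MAX;    /* a fan-out of 0 or 1 never grows */
    unsigned depth = 0;
    for (uint64_t reach = 1; reach < n; reach *= fanout)
        depth++;
    return depth;
}
```

For the 10 GB example this gives 24-bit pointers, 333 pointers per sector, and a depth of 3, since 333² ≈ 111 thousand falls short of 10 million while 333³ ≈ 37 million exceeds it.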
32
Balanced tree
• Balanced tree of order ~M (special insert/delete algorithm)
• Top level: master index
• UNIX: skewed tree
33
Block pointer array within inode
[Figure: the inode's pointer array holds direct 0 through direct 9, then single indirect, double indirect and triple indirect pointers, all leading to data blocks]
34
• Fast for small files (those created by people at terminal keyboards, typical of timesharing applications); slower for big files
35
[Figure: the same pointer array, labeled with approximate file offsets reached as the levels deepen: ~1 KB, ~2 KB, ~9 KB, ~109 KB, ~10109 KB]
Offset vs. Disk Block (example offset: 57821)
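Mapping a byte offset to the pointers that reach it is a short calculation. The parameters here (1024-byte blocks, 4-byte block addresses, so 256 addresses per indirect block) are assumptions; the 10 direct pointers and three indirect levels follow the figure.

```c
#include <stdint.h>

/* Assumed parameters: 1024-byte blocks, 4-byte block addresses,
 * and the 10 direct pointers shown in the figure. */
#define BLOCK_SIZE 1024u
#define NDIRECT    10u
#define NPTR       (BLOCK_SIZE / 4u)        /* addresses per indirect block: 256 */

struct block_path {
    int      level;     /* 0 direct, 1/2/3 = single/double/triple indirect, -1 too big */
    uint32_t index[4];  /* index[0]: slot in the inode; index[1..level]: slot in each indirect block */
    uint32_t in_block;  /* byte position inside the final data block */
};

/* Map a byte offset in the file to the pointers followed to reach it. */
struct block_path map_offset(uint64_t offset)
{
    struct block_path p = { -1, { 0, 0, 0, 0 }, 0 };
    uint64_t lbn = offset / BLOCK_SIZE;     /* logical block number */
    p.in_block = (uint32_t)(offset % BLOCK_SIZE);

    if (lbn < NDIRECT) {                    /* direct 0 .. direct 9 */
        p.level = 0;
        p.index[0] = (uint32_t)lbn;
        return p;
    }
    lbn -= NDIRECT;
    uint64_t span = NPTR;                   /* blocks reachable at this level */
    for (int level = 1; level <= 3; level++) {
        if (lbn < span) {
            p.level = level;
            p.index[0] = NDIRECT + (uint32_t)(level - 1);  /* inode slot 10, 11 or 12 */
            for (int k = level; k >= 1; k--) {             /* last slot first */
                p.index[k] = (uint32_t)(lbn % NPTR);
                lbn /= NPTR;
            }
            return p;
        }
        lbn -= span;
        span *= NPTR;
    }
    return p;                               /* beyond triple indirect */
}
```

Under these assumptions, offset 57821 is logical block 56: it goes through the single indirect pointer (inode slot 10), entry 46 of the indirect block, then byte 477 of the data block.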
36
Linux (ext2/ext3)
• 1st–12th pointers: direct pointers
• 13th pointer: indirect pointer
• 14th pointer: doubly indirect pointer
• 15th pointer: triply indirect pointer
• Max ≈ 4096 GB of file data (dominated by the triply indirect pointer: 1024³ blocks × 4 KB) if
  – block address: 32 bits
  – block size: 4096 bytes
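The maximum can be computed from the pointer layout alone: each 4096-byte block holds 4096 / 4 = 1024 block addresses. This sketch counts only what the pointer tree can address; a real file system may impose other limits.

```c
#include <stdint.h>

/* Bytes addressable through an inode with `ndirect` direct pointers plus
 * single, double and triple indirect pointers. Each indirect block holds
 * block_size / addr_bytes block addresses. */
uint64_t max_file_bytes(uint64_t block_size, uint64_t addr_bytes, uint64_t ndirect)
{
    uint64_t p = block_size / addr_bytes;          /* pointers per block */
    uint64_t blocks = ndirect + p + p * p + p * p * p;
    return blocks * block_size;
}
```

max_file_bytes(4096, 4, 12) is 4,402,345,721,856 bytes, a little over 4100 GB (GiB), which the slide rounds to 4096 GB.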
37
Disk Space for ...
[Figure: the disk divided into a space for inodes and a space for data blocks]
• File data size: varies
• inode size: fixed
38
Space for inode in Disk (each inode has a fixed size)
[Figure: inode 0, inode 1, inode 2, …, inode n stored one after another]
• i-number: the ordinal number of the inode on disk
• If I know (disk, i-number), I can access the file content:
  disk name + i-number → inode → content
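Because every inode has the same size, an i-number locates its inode with one multiplication and no search. The layout constants below (where the inode area starts, 128-byte inodes, 512-byte sectors) are assumptions for illustration.

```c
#include <stdint.h>

/* Hypothetical layout: INODE_SIZE-byte inodes stored back to back from
 * byte INODE_AREA_START, with i-number 0 first, as in the figure. */
#define INODE_AREA_START 8192u
#define INODE_SIZE        128u
#define SECTOR_SIZE       512u

/* Byte position of an inode on the disk. */
uint64_t inode_byte_offset(uint32_t inumber)
{
    return INODE_AREA_START + (uint64_t)inumber * INODE_SIZE;
}

/* Sector that holds the inode, and where it starts inside that sector. */
uint64_t inode_sector(uint32_t inumber)
{
    return inode_byte_offset(inumber) / SECTOR_SIZE;
}

uint32_t inode_offset_in_sector(uint32_t inumber)
{
    return (uint32_t)(inode_byte_offset(inumber) % SECTOR_SIZE);
}
```

With these numbers, inode 3 sits 384 bytes into sector 16, and inode 772 starts sector 209.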
39
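Directory file (it is also a file; its content is <name, pointer> pairs, where the pointer is an i-number)
[Figure: a directory's data blocks]
file name:   “a”   “b”   “bin”   “dev”
i-number:     7     1     3      772
• i-number = 3 → inode 3 on disk
Q: Is there a limit on the length of a file name (in characters)?
Q: Is there a limit on the # of files?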
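A directory's content can be sketched as an array of fixed-size entries. The layout here is an assumption in the style of early UNIX, whose entries were 16 bytes: a 2-byte i-number and a 14-character name field, with i-number 0 marking an unused entry.

```c
#include <string.h>

/* Hypothetical fixed-size directory entry in the style of early UNIX. */
#define DIRSIZ 14

struct dir_entry {
    unsigned short d_ino;           /* i-number; 0 marks an unused entry */
    char           d_name[DIRSIZ];  /* not NUL-terminated if exactly DIRSIZ chars */
};

/* Linear search of a directory's content for `name`, comparing at most
 * DIRSIZ characters. Returns the i-number, or 0 if the name is absent. */
unsigned short dir_lookup(const struct dir_entry *dir, int n, const char *name)
{
    for (int i = 0; i < n; i++)
        if (dir[i].d_ino != 0 && strncmp(dir[i].d_name, name, DIRSIZ) == 0)
            return dir[i].d_ino;
    return 0;
}
```

With the table above, looking up "bin" returns 3 and "dev" returns 772. A fixed name field is one concrete answer to the first question on the slide.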
40
Kernel tables before open(/a/b)
[Figure, built up over three slides: process PA's user area, the (system) file table, and the inode table holding the inode of /; on disk, the root directory's data blocks list a → 7, bin → 11, x → 8, and directory a's data blocks list b → 3, usr → 21, y → 6]
43
open(“/a/b”)
[Figure: resolving the path through three files, each an inode plus data blocks:
 / (directory): a, x, y, bin, dev with i-numbers 7, 6, 8, 11, 40
 /a (directory): w, u, b, ch, temp with i-numbers 7, 6, 8, 11, 40
 /a/b (regular file): the content of this file]
44
open(“/a/b”, …)
• The kernel system call open() scans the pathname
  – 1st, the root directory file:
    • get the root directory's inode (at a fixed, well-known i-number) from the disk inode space
    • read the data blocks of the root directory file
    • search for the file name “a”
    • get the corresponding i-number for file “a”
  – 2nd, file “a”:
    • get the inode of file “a” from disk (it is a directory file)
    • get the data blocks of directory file “a”
    • search for the file name “b”
    • get the corresponding i-number for file “b”
[Figure: scanning / a / b; the root directory maps a → 7, bin → 12, dev → 11]
45
(continued)
  – 3rd, file “b”:
    • read the inode of “b” from disk (a regular file); the given pathname “/a/b” ends here
• Set up the kernel data structures for file “b”
  – insert the inode into the in-core inode table
  – new entry in the system file table (offset ← 0)
  – new entry in u_ofile[] in the user area
  – return the file descriptor
  – open() is done
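The pathname scan can be sketched over an in-memory "disk". Everything here is hypothetical: the directory contents, ROOT_INO = 2 (real UNIX systems use a small, fixed i-number for the root), and load_dir(), which stands in for reading an inode and its data blocks.

```c
#include <stddef.h>
#include <string.h>

#define ROOT_INO 2   /* assumption for this sketch */

struct dent    { const char *name; int ino; };           /* <name, i-number> */
struct dirfile { int ino; const struct dent *ents; int n; };

static const struct dent root_ents[] = { { "a", 7 }, { "bin", 11 }, { "x", 8 } };
static const struct dent a_ents[]    = { { "b", 3 }, { "usr", 21 }, { "y", 6 } };
static const struct dirfile dirs[] = {
    { ROOT_INO, root_ents, 3 },
    { 7,        a_ents,    3 },
};

/* Stand-in for "get the inode, then read the directory's data blocks".
 * Returns NULL if ino is not a directory. */
static const struct dirfile *load_dir(int ino)
{
    for (size_t i = 0; i < sizeof dirs / sizeof dirs[0]; i++)
        if (dirs[i].ino == ino)
            return &dirs[i];
    return NULL;
}

/* Resolve an absolute pathname one component at a time, as open() does.
 * Returns the final i-number, or -1 if a component is missing or an
 * intermediate component is not a directory. */
int namei_sketch(const char *path)
{
    if (path[0] != '/')
        return -1;                          /* absolute paths only */
    int ino = ROOT_INO;                     /* 1st: start at the root */
    const char *p = path;
    for (;;) {
        while (*p == '/')
            p++;                            /* skip separators */
        if (*p == '\0')
            return ino;                     /* pathname ends here */
        size_t len = strcspn(p, "/");       /* length of next component */
        const struct dirfile *d = load_dir(ino);
        if (d == NULL)
            return -1;
        int next = -1;
        for (int i = 0; i < d->n; i++) {
            if (strlen(d->ents[i].name) == len
                && strncmp(d->ents[i].name, p, len) == 0) {
                next = d->ents[i].ino;      /* found the name */
                break;
            }
        }
        if (next < 0)
            return -1;
        ino = next;                         /* descend one level */
        p += len;
    }
}
```

Each loop iteration is one "get inode, read directory, search for name" round from the steps above, so a longer pathname costs proportionally more (in a real kernel, disk) accesses.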
46
Kernel tables after open(/a/b)
[Figure: the inodes of / and a are now in the inode table, alongside process PA's user area, the (system) file table, and the data blocks of /, a and b]
50
Kernel tables after open(/a/b)
[Figure: fd = 4 is returned to process PA; u_ofile[4] points to the (system) file table entry holding the offset, which points to b's inode]
Once you have fd, you can access b’s inode after only 3 memory accesses (u_ofile[fd], then the file table entry, then the inode).
51
(continued)
• open(“/a/b”) is very costly in # of disk accesses
  – once (open or creat) is enough: translate (pathname => fd) once and save it
  – do not use the pathname in subsequent calls
    • read(), write(), close(), …
  – use the file descriptor instead
    • read(fd, …), write(fd, …)
• Try “man read” … only one system call …
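The "translate once, then use fd" pattern looks like this in POSIX C. count_bytes() is a hypothetical helper: the pathname is looked up exactly once, and every later call works from the fd and its saved offset.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>      /* open(), O_RDONLY */
#include <stdlib.h>     /* mkstemp(), for the usage example */
#include <unistd.h>     /* read(), close() */

/* Translate the pathname once with open(), then use only the fd.
 * Returns the number of bytes in the file, or -1 on error. */
long count_bytes(const char *path)
{
    int fd = open(path, O_RDONLY);   /* the only pathname lookup */
    if (fd < 0)
        return -1;
    char buf[4096];
    long total = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)   /* each call resumes at the offset */
        total += n;
    close(fd);
    return n < 0 ? -1 : total;
}
```

However many read() calls the loop makes, the costly directory scan happens only once, inside open().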
52
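Try …
• man 2 read: ssize_t read(int fd, void *buf, size_t count);
• man 2 open: int open(const char *pathname, int flags, ...);
• man 3 fread: size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);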
53
C functions for file
• Wait a minute …
  – I used printf(), scanf(), getchar() …, but I never used read() or write() before …?
  – I used FILE * …, but I never used fd (file descriptor) before …?
Right, most people use library functions,
and the library then invokes system calls. Remember? The library cannot perform I/O directly …
Library functions are in my address space (user).
54
System calls for files
• creat(), open(), close()
• read(), write()
• lseek(): move the offset
• stat(): get the inode content
• All others are library functions
  – e.g. scanf(), gets(), getchar(), …
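stat() exposes the inode identity directly: st_dev is the device and st_ino the i-number, i.e. the (disk, i-number) pair that identifies a file. A common use is checking whether two pathnames name the same file; same_file() is an illustrative name.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>   /* stat(), struct stat */

/* Two pathnames name the same file exactly when they lead to the same
 * inode: same device (st_dev) and same i-number (st_ino).
 * Returns 1 if same, 0 if different, -1 if either stat() fails. */
int same_file(const char *p1, const char *p2)
{
    struct stat s1, s2;
    if (stat(p1, &s1) != 0 || stat(p2, &s2) != 0)
        return -1;
    return s1.st_dev == s2.st_dev && s1.st_ino == s2.st_ino;
}
```

For example, "/tmp" and "/tmp/." resolve to the same inode, while "/" and "/tmp" do not.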
55
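System call vs. library call
• System call (in the kernel): read() reads any number of bytes, from all files; it uses fd
• Library calls (in a.out, user):
  – tty files: scanf() format, getchar() char, gets() string
  – all files: fscanf() format, fgetc() char, fgets() string, fread() any number
  – they use FILE * (a struct in the library)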
56
FILE vs fd
[Figure: the user a.out contains my code (main(), add(), sub()) and library code (fopen(), printf()); the library's FILE struct holds a count, a buffer pointer, a local buffer and the file descriptor fd; fd indexes u_ofile[] (slots 0–4) in the kernel, leading through the (system) file table (offset) and the inode table to b's data blocks; printf() enters the kernel through the write() system call (trap())]
When the local buffer (in FILE) becomes empty, the read() system call fills this buffer again.
57
Example: open
1. My a.out calls the library function fopen(“/a/b/c”)
2. fopen() creates a struct FILE for /a/b/c
3. The library invokes the system call open(“/a/b/c”)
   – the kernel sets up its tables (inode, user, …, u_ofile[])
   – the kernel returns the file descriptor fd
   – fopen() saves fd in the FILE (for future use)
   – fopen() returns a FILE *
4. My a.out saves the FILE * (for future use)
5. All future calls use it, e.g. getc(fp)
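Steps 2 and 3 can be sketched with a stripped-down FILE. MYFILE, my_fopen, my_getc and my_fclose are hypothetical names; a real stdio FILE has more fields and handles writing, but the core is the same: a struct that holds the fd plus a user-space buffer refilled by read().

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>      /* open(), O_RDONLY */
#include <stdlib.h>     /* malloc(), free() */
#include <unistd.h>     /* read(), close() */

#define MYBUFSIZ 8   /* deliberately tiny so refills happen often */

/* Hypothetical, stripped-down FILE: the library struct that wraps an fd. */
typedef struct {
    int   fd;              /* file descriptor returned by open() */
    int   cnt;             /* bytes left in buf */
    char *ptr;             /* next byte to hand out */
    char  buf[MYBUFSIZ];   /* local buffer in user space */
} MYFILE;

/* Allocate the struct, call open(), save the fd. */
MYFILE *my_fopen(const char *path)
{
    MYFILE *fp = malloc(sizeof *fp);
    if (fp == NULL)
        return NULL;
    fp->fd = open(path, O_RDONLY);
    if (fp->fd < 0) {
        free(fp);
        return NULL;
    }
    fp->cnt = 0;
    fp->ptr = fp->buf;
    return fp;
}

/* Return the next byte, calling read() only when the buffer is empty.
 * Returns -1 (like EOF) at end of file or on error. */
int my_getc(MYFILE *fp)
{
    if (fp->cnt == 0) {
        ssize_t n = read(fp->fd, fp->buf, sizeof fp->buf);
        if (n <= 0)
            return -1;
        fp->cnt = (int)n;
        fp->ptr = fp->buf;
    }
    fp->cnt--;
    return (unsigned char)*fp->ptr++;
}

void my_fclose(MYFILE *fp)
{
    close(fp->fd);
    free(fp);
}
```

Reading a 13-byte file byte by byte this way makes three read() system calls (8 bytes, 5 bytes, then 0 for end of file) instead of 14.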
58
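Example: getchar()
#include <stdio.h>    /* BUFSIZ, EOF */
#include <unistd.h>   /* read() */

/* Library function, copied into my a.out.
 * (Named my_getchar here so it does not clash with <stdio.h>'s getchar.) */
int my_getchar(void)
{
    /* data structure in the library */
    static char buf[BUFSIZ];   /* library local buffer */
    static char *bufp = buf;   /* pointer to the next character */
    static int n = 0;          /* counter: characters left in buf */

    /* Is the library local buffer empty? */
    if (n == 0) {
        /* Yes: invoke the read() system call and fill up the local buffer */
        n = (int) read(0, buf, sizeof(buf));   /* system call */
        bufp = buf;
    }
    /* >= (not >), so the last character in the buffer is not lost */
    return (--n >= 0) ? (unsigned char) *bufp++ : EOF;   /* return a character */
}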
59
Functions for file handling
• So, you usually use the library:
  – printf() for formatting (such as %s, %d)
  – getchar() for performance
• But all library I/O functions end up calling system calls
  (library functions are “user” code and cannot do I/O directly)
• They are a front end that provides convenience, performance, …
Many library functions may exist,
but there is only one system call for reading: read()