Download - Youjip Won - oslab.kaist.ac.kr
Youjip Won
inode
Data structure that represents the attributes of a file:
File type: regular file, directory, device file, or 0 (unused)
creation time, modification time, size
access permissions
location of data blocks
There is one inode per file.
An inode exists in two forms: the on-disk inode and the in-memory inode.
Functions
iget(), iput(), ialloc(), iupdate()
On-disk inode
struct dinode
The inode structure stored on disk.
size: 64 bytes
dinodes are stored contiguously in the inode area of the file system.
i-number: index of the dinode within the inode area
inode area
25 blocks
8 inodes per block (512/64)
total number of inodes = 25 × 8 = 200
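The layout arithmetic above can be checked with a short sketch. It uses the block size, dinode size, and block count given on the slide. `INODESTART` (the first block of the inode area) is a value assumed only for illustration, and `inode_block`, `inode_slot`, and `total_inodes` are hypothetical helpers, not xv6 functions.

```c
#include <assert.h>

enum {
    BSIZE = 512,               /* block size in bytes (from the slide) */
    DINODE_SIZE = 64,          /* on-disk inode size in bytes (from the slide) */
    IPB = BSIZE / DINODE_SIZE, /* inodes per block: 8 */
    INODE_BLOCKS = 25,         /* blocks in the inode area (from the slide) */
    INODESTART = 32            /* ASSUMPTION: first block of the inode area */
};

/* Block number that holds the dinode with i-number inum. */
static unsigned inode_block(unsigned inum) {
    return (unsigned)INODESTART + inum / (unsigned)IPB;
}

/* Index of that dinode inside its block. */
static unsigned inode_slot(unsigned inum) {
    return inum % (unsigned)IPB;
}

/* Total number of dinodes that fit in the inode area. */
static unsigned total_inodes(void) {
    return (unsigned)INODE_BLOCKS * (unsigned)IPB;
}
```

With these numbers, i-number 9 lives in the second block of the inode area, at slot 1 of that block.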
struct dinode
type: file type (directory, file, special file, or unused)
0: unused
major: major device number
minor: minor device number
nlink: number of directory entries that refer to the inode
When nlink is zero and no in-memory reference remains, the inode is freed.
size: file size in bytes
addrs: data block addresses
struct dinode {
  short type;              // file type; 0 means unused
  short major;             // major device number
  short minor;             // minor device number
  short nlink;             // number of directory entries referring to this inode
  uint size;               // file size in bytes
  uint addrs[NDIRECT+1];   // data block addresses
};
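The 64-byte size quoted earlier can be verified at compile time. The sketch below assumes `NDIRECT` is 12 (the value commonly used in xv6, not stated on the slide) and a platform where `short` is 2 bytes and `unsigned int` is 4 bytes; `dinodes_per_block` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

#define NDIRECT 12          /* ASSUMPTION: usual xv6 value, not given on the slide */
#define BSIZE   512         /* block size used elsewhere in the slides */

typedef unsigned int uint;

struct dinode {
    short type;
    short major;
    short minor;
    short nlink;
    uint  size;
    uint  addrs[NDIRECT + 1];
};

/* 4 shorts (8 bytes) + size (4 bytes) + 13 addresses (52 bytes) = 64 bytes. */
_Static_assert(sizeof(struct dinode) == 64, "dinode is expected to be 64 bytes");

/* Number of dinodes that fit in one disk block. */
static size_t dinodes_per_block(void) {
    return BSIZE / sizeof(struct dinode);
}
```

If the structure were padded or `NDIRECT` changed, the static assertion would fail at compile time rather than silently changing the number of inodes per block.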
in-memory inode
struct inode
The inode data structure cached in memory.
It includes a copy of the on-disk inode contents.
It also holds bookkeeping fields: reference count, lock, and so on.
struct inode
dev: device number
inum: inode number (index into the inode area)
ref: number of references to this in-memory inode, for example from processes that currently have the file open
If ref is 0, nothing references the inode, so its icache slot can be reused.
This field is increased by iget and decreased by iput.
lock: sleep lock
Gives exclusive access to the inode's fields, so that the on-disk inode and the in-memory inode stay synchronized.
valid: nonzero once the on-disk inode has been copied into this entry
struct inode {
  uint dev;
  uint inum;
  int ref;
  struct sleeplock lock;
  int valid;
  // copy of the on-disk inode
  short type;
  short major;
  short minor;
  short nlink;
  uint size;
  uint addrs[NDIRECT+1];
};
icache
Array of in-memory inodes (NINODE = 50)
Makes inode access faster.
Write-through policy: whenever an on-disk attribute of a cached inode changes, the modified inode is written to disk immediately by calling iupdate().
Spinlock for icache (icache.lock)
Gives exclusive access to the in-memory bookkeeping fields of the inodes.
Ensures that icache holds at most one copy of each inode.
struct {
  struct spinlock lock;         // protects the whole table
  struct inode inode[NINODE];   // inode[0] ... inode[49]
} icache;
inode APIs
ialloc: allocate a new inode on disk.
iget: find the inode's entry in the inode cache, or reserve a free entry for it.
itrunc: reduce the file size to 0 and release all allocated data blocks.
iput: drop a reference to the inode (decrease its reference count).
ilock: lock the in-memory inode,
copying the on-disk inode into it first if it has not been loaded yet.
iunlock: release the sleep lock acquired by ilock.
ialloc
Scans the inode structures on disk for a free one.
If it finds one, it
claims it by writing the new type to disk,
reserves an entry for it in the inode cache,
and returns the pointer (iget).
The returned inode is not locked; the caller must lock it (ilock) before using it.
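allocating a new inode
[Figure: the inode array in a disk block is read into the buffer cache in memory; an entry for the new inode is then reserved in icache.]
1. bread(): read the disk block that holds the inode array into the buffer cache.
2. Mark the free inode as used in the buffer.
3. log_write(): log the modified block so that the inode is marked used on disk.
4. iget(): reserve an icache entry for the new inode.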
Code: inode * ialloc()
struct inode* ialloc(uint dev, short type){
  int inum;
  struct buf *bp;
  struct dinode *dip;
  // Read the blocks of the inode area one by one.
  for(inum = 1; inum < sb.ninodes; inum++){
    bp = bread(dev, IBLOCK(inum, sb));
    dip = (struct dinode*)bp->data + inum%IPB;
    if(dip->type == 0){   // a free inode
      memset(dip, 0, sizeof(*dip));
      dip->type = type;
      log_write(bp);      // mark it as 'allocated' on the disk
      brelse(bp);
      return iget(dev, inum);
    }
    brelse(bp);           // not free: release the buffer before trying the next inode
  }
  ...
}
bread() is called once per inode, so the same block is requested up to IPB times in a row. Can we improve it?
typical inode operation
Thus a typical sequence is:
  ip = iget(dev, inum)
  ilock(ip)
  ... examine and modify ip->xxx ...
  iunlock(ip)
  iput(ip)
inode * iget(dev,inum)
Returns a pointer to the in-core copy of the struct inode identified by dev and inum.
Increases the reference count.
Holds icache.lock from start to finish.
The returned pointer stays valid until the matching call to iput(); the icache entry can be reused only after the reference count drops to 0.
If the requested inode is not in icache, iget reserves a free entry for it with the valid field set to 0.
It does not read the inode from disk.
This separates reserving a slot in icache from reading the associated inode from disk.
inode * iget(dev,inum)
static struct inode* iget(uint dev, uint inum)
{
  struct inode *ip, *empty;
  acquire(&icache.lock);
  ...
  empty = 0;
  for(ip = &icache.inode[0]; ip < &icache.inode[NINODE]; ip++){
    if(ip->ref > 0 && ip->dev == dev && ip->inum == inum){
      ip->ref++;
      release(&icache.lock);
      return ip;
    }
    if(empty == 0 && ip->ref == 0)   // remember the first empty slot
      empty = ip;
  }
  ...
}
Checks whether the inode is already in the in-memory inode cache.
If it is, the reference count is increased by one.
inode * iget(dev,inum) (cont.)
static struct inode* iget(uint dev, uint inum)
{
  struct inode *ip, *empty;
  ...
  empty = 0;
  for(ip = &icache.inode[0]; ip < &icache.inode[NINODE]; ip++){
    ...
    if(empty == 0 && ip->ref == 0)
      empty = ip;
  }
  if(empty == 0)                // no free slot left in icache
    panic("iget: no inodes");
  ip = empty;
  ip->dev = dev;
  ip->inum = inum;
  ip->ref = 1;
  ip->valid = 0;                // contents are loaded later, by ilock
  release(&icache.lock);
  return ip;
}
If the inode is not in icache, a free slot is claimed for it.
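The lookup-or-claim behaviour of iget and the matching iput can be tried out in user space. The following is a simplified single-threaded model, not the xv6 code: it has no locks, uses a four-entry cache, and returns NULL where xv6 would panic. All `toy_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NINODE 4   /* small cache for illustration; the slides use 50 */

struct toy_inode {
    unsigned dev, inum;
    int ref;     /* number of live pointers to this slot */
    int valid;   /* 1 once the on-disk contents have been loaded */
};

static struct toy_inode toy_icache[NINODE];

/* Return the cached slot for (dev, inum), claiming a free slot if needed.
   Returns NULL when the cache is full (xv6 panics instead). */
static struct toy_inode *toy_iget(unsigned dev, unsigned inum) {
    struct toy_inode *empty = NULL;
    for (struct toy_inode *ip = toy_icache; ip < toy_icache + NINODE; ip++) {
        if (ip->ref > 0 && ip->dev == dev && ip->inum == inum) {
            ip->ref++;               /* already cached: share the slot */
            return ip;
        }
        if (empty == NULL && ip->ref == 0)
            empty = ip;              /* remember the first free slot */
    }
    if (empty == NULL)
        return NULL;
    empty->dev = dev;
    empty->inum = inum;
    empty->ref = 1;
    empty->valid = 0;                /* nothing has been read from disk yet */
    return empty;
}

/* Drop one reference; a slot with ref == 0 may be reused by toy_iget. */
static void toy_iput(struct toy_inode *ip) {
    assert(ip != NULL && ip->ref > 0);
    ip->ref--;
}
```

Two calls with the same (dev, inum) return the same slot with ref equal to 2; once both references are dropped, the next toy_iget for a different inode reuses that slot.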
ilock and iunlock
xv6 allows only one process at a time to read or write a file's content or metadata.
How can we guarantee exclusive access?
ilock() and iunlock()
lock the inode in the icache.
It is a sleep lock!
ilock (inode *ip)
1. Sleep-lock the inode cache entry pointed to by ip.
2. Bring the inode with number ip->inum from disk into icache if necessary:
bread() the disk block that holds the inode array into the buffer cache, then copy the inode into the icache entry ip.
ilock
void ilock(struct inode *ip){
  struct buf *bp;
  struct dinode *dip;
  if(ip == 0 || ip->ref < 1) panic("ilock");
  acquiresleep(&ip->lock);
  if(ip->valid == 0){
    bp = bread(ip->dev, IBLOCK(ip->inum, sb));
    dip = (struct dinode*)bp->data + ip->inum%IPB;
    ip->type = dip->type;
    ip->major = dip->major;
    ip->minor = dip->minor;
    ip->nlink = dip->nlink;
    ip->size = dip->size;
    memmove(ip->addrs, dip->addrs, sizeof(ip->addrs));
    brelse(bp);
    ip->valid = 1;
    if(ip->type == 0)
      panic("ilock: no type");
  }
}
If the in-memory inode is not yet valid, its fields are initialized from the on-disk inode.
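The effect of the valid flag, loading from disk only on the first lock, can be illustrated with a small model. This sketch leaves out the sleep lock and the buffer cache entirely; `toy_disk`, `toy_disk_reads`, and `toy_load` are hypothetical names, and the array stands in for the on-disk inode area.

```c
#include <assert.h>

struct toy_dinode { short type; unsigned size; };

struct toy_inode {
    unsigned inum;
    int valid;       /* 0 until the on-disk copy has been loaded */
    short type;
    unsigned size;
};

/* Stand-in for the inode area on disk: inode 3 is a file of 100 bytes. */
static struct toy_dinode toy_disk[8] = { [3] = { .type = 2, .size = 100 } };
static int toy_disk_reads;   /* counts simulated disk reads */

/* Models only the "if(ip->valid == 0)" branch of ilock. */
static void toy_load(struct toy_inode *ip) {
    if (ip->valid == 0) {
        toy_disk_reads++;
        ip->type = toy_disk[ip->inum].type;
        ip->size = toy_disk[ip->inum].size;
        ip->valid = 1;
    }
}
```

Calling toy_load twice on the same entry performs one simulated disk read; the second call finds valid already set and returns immediately.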
iunlock
void
iunlock(struct inode *ip)
{
  if(ip == 0 || !holdingsleep(&ip->lock) || ip->ref < 1)
    panic("iunlock");
  releasesleep(&ip->lock);
}
Panics if ip is null, if the caller does not hold the inode's sleep lock, or if nothing references the inode (ref < 1).
Otherwise, releases the inode's lock.
void iput(inode * ip)
[Figure: the inode entry ip in icache is copied through the buffer cache to the inode array in a disk block.]
Writing the inode in the inode cache to disk is done by iupdate.
void iput (inode * ip)
Decreases ref, the reference count of the in-memory inode.
When the reference count reaches 0, the slot in the icache can be recycled.
If nlink is 0 (no directory entry links to the inode) and this is the last reference, iput
frees the inode on disk:
it frees all blocks associated with the inode (itrunc),
sets the in-core inode type to unused (0), and logs the updated inode to disk (iupdate).
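The condition under which iput frees the on-disk inode can be written as a single predicate. In this sketch `ref` is the reference count sampled before iput decrements it, so "last reference" shows up as `ref == 1`; `should_free_on_disk` is a hypothetical helper, not part of xv6.

```c
#include <assert.h>
#include <stdbool.h>

/* True when iput should truncate the file and mark the inode unused on disk.
   valid: the in-memory copy has been loaded from disk
   nlink: number of directory entries that refer to the inode
   ref:   reference count BEFORE this iput decrements it */
static bool should_free_on_disk(int valid, int nlink, int ref) {
    return valid && nlink == 0 && ref == 1;
}
```

An unlinked file that another holder still references (ref of 2 or more) is therefore kept on disk until the final iput.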
iput
void iput(struct inode *ip)
{
  acquiresleep(&ip->lock);
  if(ip->valid && ip->nlink == 0){
    acquire(&icache.lock);
    int r = ip->ref;
    release(&icache.lock);
    if(r == 1){          // no links and no other references
      itrunc(ip);        // Free all data blocks of the file using itrunc()
      ip->type = 0;      // Modify type to 0 (0 means unused inode)
      iupdate(ip);       // Apply modified data
      ip->valid = 0;
    }
  }
  releasesleep(&ip->lock);
  acquire(&icache.lock);
  ip->ref--;
  release(&icache.lock);
}
If nlink is 0 and this is the last reference, the inode is released.
The reference count of the inode is decreased by one.
itrunc (inode * ip)
1. Free the data blocks pointed to by the direct pointers.
2. Free the data blocks pointed to by the indirect block.
3. Free the indirect block itself.
4. Set the file size to 0.
5. Write the updated inode back to disk (iupdate).
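The five steps above can be sketched against a tiny in-memory model of the disk. The sizes here are deliberately small and assumed for illustration, block number 0 means "no block", and the final write-back (iupdate in xv6) is omitted. All `toy_` names are hypothetical.

```c
#include <assert.h>

enum { NDIRECT = 2, NINDIRECT = 4, NBLOCKS = 16 };   /* tiny ASSUMED sizes */

static unsigned char toy_used[NBLOCKS];          /* 1 = block is allocated */
static unsigned toy_blocks[NBLOCKS][NINDIRECT];  /* block contents as block numbers */

struct toy_inode {
    unsigned size;
    unsigned addrs[NDIRECT + 1];   /* last entry is the indirect block */
};

/* Mark block b as free; freeing an unallocated block is a bug. */
static void toy_bfree(unsigned b) {
    assert(b < NBLOCKS && toy_used[b]);
    toy_used[b] = 0;
}

static void toy_itrunc(struct toy_inode *ip) {
    /* 1. free the blocks referenced by the direct pointers */
    for (int i = 0; i < NDIRECT; i++) {
        if (ip->addrs[i]) {
            toy_bfree(ip->addrs[i]);
            ip->addrs[i] = 0;
        }
    }
    if (ip->addrs[NDIRECT]) {
        /* 2. free the blocks listed in the indirect block */
        unsigned *a = toy_blocks[ip->addrs[NDIRECT]];
        for (int j = 0; j < NINDIRECT; j++)
            if (a[j])
                toy_bfree(a[j]);
        /* 3. free the indirect block itself */
        toy_bfree(ip->addrs[NDIRECT]);
        ip->addrs[NDIRECT] = 0;
    }
    /* 4. the file is now empty */
    ip->size = 0;
    /* 5. xv6 calls iupdate(ip) here; this model has no on-disk inode to update */
}

/* Number of allocated blocks, used to check that nothing leaks. */
static int toy_used_count(void) {
    int n = 0;
    for (int b = 0; b < NBLOCKS; b++)
        n += toy_used[b];
    return n;
}
```

The order matters: the indirect block must be read before it is freed, because it is the only record of the data blocks it points to.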
itrunc
static void itrunc(struct inode *ip)
{
  int i, j;
  struct buf *bp;
  uint *a;
  for(i = 0; i < NDIRECT; i++){
    if(ip->addrs[i]){
      bfree(ip->dev, ip->addrs[i]);
      ip->addrs[i] = 0;
    }
  }
  ...
itrunc
  ...
  if(ip->addrs[NDIRECT]){
    bp = bread(ip->dev, ip->addrs[NDIRECT]);
    a = (uint*)bp->data;
    for(j = 0; j < NINDIRECT; j++){
      if(a[j])
        bfree(ip->dev, a[j]);
    }
    brelse(bp);
    bfree(ip->dev, ip->addrs[NDIRECT]);
    ip->addrs[NDIRECT] = 0;
  }
  ip->size = 0;
  iupdate(ip);
}
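iupdate(inode * ip)
Updates the on-disk inode:
1. bread: read the disk block that holds the inode array into the buffer cache.
2. copy: copy the fields of the icache entry ip into the dinode in the buffer.
3. log_write: log the modified buffer so that it is written to disk.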
void iupdate (inode * ip)
Copies a modified in-memory inode to disk.
Must be called after every change to an ip->xxx field that lives on disk, since the inode cache is write-through.
The caller must hold ip->lock.
void iupdate (inode * ip)
void iupdate(struct inode *ip)
{
  struct buf *bp;
  struct dinode *dip;
  bp = bread(ip->dev, IBLOCK(ip->inum, sb));
  dip = (struct dinode*)bp->data + ip->inum%IPB;
  dip->type = ip->type;
  dip->major = ip->major;
  dip->minor = ip->minor;
  dip->nlink = ip->nlink;
  dip->size = ip->size;
  memmove(dip->addrs, ip->addrs, sizeof(ip->addrs));
  log_write(bp);
  brelse(bp);
}
Putting everything together: filewrite()
int filewrite(struct file *f, char *addr, int n) {
  int r;
  …
  while(i < n){
    …
    begin_op();
    ilock(f->ip);
    if ((r = writei(f->ip, addr + i, f->off, n1)) > 0)
      f->off += r;
    iunlock(f->ip);
    end_op();
    …
  }
}
readi/writei/stati
Now let's look at how the real system calls are implemented using iget/iput/memmove.
readi: read data from the file
writei: write data to the file
stati: copy the inode's metadata (stat information)
readi(inode * ip, char *dst, uint off, uint n)
Reads n bytes into dst, starting at offset off of ip.
If the inode is a device, the device's read routine reads directly into the user buffer.
If the inode represents a file, each block is first read into the buffer cache and then copied to the user buffer.
int
readi(struct inode *ip, char *dst, uint off, uint n)
{
  uint tot, m;
  struct buf *bp;
  if(ip->type == T_DEV){
    if(ip->major < 0 || ip->major >= NDEV || !devsw[ip->major].read)
      return -1;
    return devsw[ip->major].read(ip, dst, n);
  }
  if(off > ip->size || off + n < off)
    return -1;
  if(off + n > ip->size)
    n = ip->size - off;
  for(tot=0; tot<n; tot+=m, off+=m, dst+=m){
    bp = bread(ip->dev, bmap(ip, off/BSIZE));
    m = min(n - tot, BSIZE - off%BSIZE);
    memmove(dst, bp->data + off%BSIZE, m);
    brelse(bp);
  }
  return n;
}
raw device read vs. buffered read
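The loop in readi never copies across a block boundary: each pass moves at most the bytes left in the current block. The sketch below isolates that arithmetic; `split_chunks` and `min_u` are hypothetical helpers, and BSIZE is taken to be 512 as in the inode-area calculation earlier in the slides.

```c
#include <assert.h>

enum { BSIZE = 512 };

static unsigned min_u(unsigned a, unsigned b) { return a < b ? a : b; }

/* Record the per-block chunk sizes for transferring n bytes starting at
   byte offset off. Stores at most max sizes and returns how many were stored. */
static int split_chunks(unsigned off, unsigned n, unsigned *chunks, int max) {
    int count = 0;
    unsigned m;
    for (unsigned tot = 0; tot < n && count < max; tot += m, off += m) {
        /* bytes still wanted vs. bytes left in the current block */
        m = min_u(n - tot, (unsigned)BSIZE - off % (unsigned)BSIZE);
        chunks[count++] = m;
    }
    return count;
}
```

For example, a request for 600 bytes at offset 500 is served in three chunks of 12, 512, and 76 bytes: the tail of the first block, one full block, and the head of a third.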
writei(inode *ip, char *src, uint off, uint n)
Writes n bytes from src, starting at offset off of ip.
If the inode is a device, the device's write routine writes directly from the user buffer.
If the inode represents a file, the data is first written to the buffer cache and then log_write() is called.
int
writei(struct inode *ip, char *src, uint off, uint n)
{
  uint tot, m;
  struct buf *bp;
  if(ip->type == T_DEV){
    if(ip->major < 0 || ip->major >= NDEV || !devsw[ip->major].write)
      return -1;
    return devsw[ip->major].write(ip, src, n);
  }
  if(off > ip->size || off + n < off)
    return -1;
  if(off + n > MAXFILE*BSIZE)
    return -1;
  for(tot=0; tot<n; tot+=m, off+=m, src+=m){
    bp = bread(ip->dev, bmap(ip, off/BSIZE));
    m = min(n - tot, BSIZE - off%BSIZE);
    memmove(bp->data + off%BSIZE, src, m);
    log_write(bp);
    brelse(bp);
  }
  if(n > 0 && off > ip->size){
    ip->size = off;
    iupdate(ip);
  }
  return n;
}
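The range checks at the top of writei can be examined in isolation. The sketch below assumes the usual xv6 definitions, which are not shown on the slides: NDIRECT is 12, NINDIRECT is BSIZE divided by the size of a 4-byte block address, and MAXFILE is their sum. `write_in_range` is a hypothetical helper that mirrors only the checks for regular files.

```c
#include <assert.h>
#include <stdint.h>

enum {
    BSIZE = 512,
    NDIRECT = 12,                            /* ASSUMPTION: usual xv6 value */
    NINDIRECT = BSIZE / sizeof(uint32_t),    /* 128 addresses per indirect block */
    MAXFILE = NDIRECT + NINDIRECT            /* 140 blocks */
};

/* Returns 1 if writing n bytes at offset off into a file of the given
   size passes writei's range checks, 0 otherwise. */
static int write_in_range(uint32_t off, uint32_t n, uint32_t size) {
    if (off > size || off + n < off)         /* starts past EOF, or off + n wraps */
        return 0;
    if (off + n > (uint32_t)(MAXFILE * BSIZE))   /* would exceed the maximum file size */
        return 0;
    return 1;
}
```

Under these assumptions the largest possible file is 140 × 512 = 71680 bytes. A write may begin exactly at the end of the file, which is how a file grows, but not beyond it.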
summary
inode structures: dinode (on disk), inode (in memory)
iget, iput
ilock/iunlock
Updates to the on-disk fields of an inode are write-through, via iupdate().
protection
icache.lock: spin lock that protects changes to the in-memory bookkeeping fields of the inodes (such as ref)
inode.lock: sleep lock that synchronizes changes to the in-memory and on-disk inodes