Post on 21-Dec-2015
Avishai Wool, lecture 12
Introduction to Systems Programming Lecture 12
File Systems
Long-term Information Storage
• Computers have long-term storage devices (hard disks, floppy disks, tapes, CD-ROM, DVD)
• The hardware & disk driver expose a block interface: “read/write 512 bytes from block NN”
• This is too crude for most applications.
• Next level of organization: organize the disk into files
What is a File?
• A computer file is a collection of information that can be identified and referenced in its entirety by a unique name. [Wikipedia]
• Unlike the disk, files can:
– Get created, deleted, and renamed
– Grow larger and smaller
• The OS maintains the mapping between files and storage units
• This work is done by an OS component called a “File System”
File Names
• Unix
– long file names (255 bytes)
– may include special characters
– case sensitive (Avishai.txt != avishai.txt)
• MS-DOS
– file name = 8 chars, ‘.’, 3-char extension
– case insensitive (Grades.txt, grades.txt, GRADES.txt are all the same file)
File Structure
• In Unix and Windows, a file has no structure (that the OS knows about). It is just a sequence of characters.
• A file can use part of a disk block
• A program can impose its own organization inside a file.
• OS does not know where the lines start/end!
• Conventions for text files:
– Unix: lines end with “Line feed” (LF; 0x0A)
– DOS/Windows: lines end with “Carriage return + line feed” (CR+LF, 0x0D 0x0A)
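Since the OS does not interpret line endings, a program has to convert between the two conventions itself. A minimal sketch, assuming the whole text is already in a memory buffer (crlf_to_lf is a hypothetical helper name, not a library call):

```c
#include <stddef.h>

/* Convert DOS/Windows line endings (CR+LF, 0x0D 0x0A) to Unix line
 * endings (LF, 0x0A) in place. Returns the new length of the text.
 * A lone CR that is not followed by LF is kept unchanged. */
size_t crlf_to_lf(char *buf, size_t len)
{
    size_t out = 0;
    for (size_t in = 0; in < len; in++) {
        if (buf[in] == 0x0D && in + 1 < len && buf[in + 1] == 0x0A)
            continue;            /* drop the CR; the LF is copied next */
        buf[out++] = buf[in];
    }
    return out;
}
```

The text can only shrink, so the conversion is safe to do in the same buffer.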
File Access
• Sequential access
– Read all bytes, in order, from the beginning
– Cannot jump around (too much): can rewind or back up
– Efficient: disks are optimized for this access pattern
• Random access
– Read bytes in any order
– Essential for database systems
– OS maintains a “file marker” for open files: this is the offset to the next byte to be read
– “Seek” system call moves the file marker
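On Unix the “seek” call is lseek(). A minimal sketch of random access using POSIX calls (byte_at is a hypothetical helper name used only for illustration):

```c
#include <fcntl.h>
#include <unistd.h>

/* Read the single byte at absolute offset `pos` of the file at `path`.
 * Returns the byte value (0..255), or -1 on error or if `pos` is
 * negative or at/after the end of the file. */
int byte_at(const char *path, off_t pos)
{
    unsigned char c;
    int result = -1;

    if (pos < 0)
        return -1;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    /* lseek() moves the file marker; the next read() starts there. */
    if (lseek(fd, pos, SEEK_SET) == pos && read(fd, &c, 1) == 1)
        result = c;
    close(fd);
    return result;
}
```

No bytes before `pos` are read: only the file marker changes.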
File Attributes
[Figure: table of file attributes, with entries labelled “Windows”]
File operations / system calls
• Create
• Delete
• Open
• Close
• Read
• Write
• Append
• Seek
• Get attributes
• Set attributes
• Rename
Hierarchical Directory Systems
Path Names
• Absolute path name:
– Unix: /usr/home/yash/teaching/isp.txt
– Windows: c:\My Documents\teaching\isp.txt
• Relative path name
– use the “current directory” as a starting point
– “.” (dot) for “this directory”
– “..” (dotdot) for “parent of this directory”
– if current directory is /usr/home/yash/research:
• ../teaching/isp.txt is the same file as /usr/home/yash/teaching/isp.txt
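The “.” and “..” rules can be sketched as a small routine that turns a relative path into an absolute one. This is a simplification under stated assumptions: Unix-style ‘/’ separators only, symbolic links ignored, and resolve_path is a hypothetical name rather than a system call.

```c
#include <stdio.h>
#include <string.h>

/* Resolve `path` against the absolute directory `cwd`, handling "." and
 * "..". Writes the absolute result into `out` (capacity `outsz`).
 * Returns 0 on success, -1 if a buffer is too small. */
int resolve_path(const char *cwd, const char *path, char *out, size_t outsz)
{
    char work[1024];
    int n;

    if (path[0] == '/')
        n = snprintf(work, sizeof work, "%s", path);   /* already absolute */
    else
        n = snprintf(work, sizeof work, "%s/%s", cwd, path);
    if (n < 0 || (size_t)n >= sizeof work || outsz < 2)
        return -1;

    size_t len = 0;                    /* length of the result so far */
    out[0] = '\0';
    for (char *tok = strtok(work, "/"); tok != NULL; tok = strtok(NULL, "/")) {
        if (strcmp(tok, ".") == 0)
            continue;                  /* "this directory": nothing to do */
        if (strcmp(tok, "..") == 0) {  /* parent: drop the last component */
            while (len > 0 && out[--len] != '/')
                ;
            out[len] = '\0';
            continue;
        }
        size_t tl = strlen(tok);
        if (len + 1 + tl + 1 > outsz)
            return -1;
        out[len++] = '/';
        memcpy(out + len, tok, tl + 1);  /* copies the terminating NUL too */
        len += tl;
    }
    if (len == 0)
        strcpy(out, "/");              /* everything cancelled out: root */
    return 0;
}
```

Called with cwd /usr/home/yash/research and path ../teaching/isp.txt, the routine produces /usr/home/yash/teaching/isp.txt.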
A Unix Program Using File System Calls (1/2)
Called by “copyfile abc xyz”
A Unix Program Using File System Calls (2/2)
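A minimal sketch of how such a copy program might use the open, read, write and close system calls. This is not the listing from the slides; the buffer size and the 0644 permission bits are arbitrary choices made for this example, and copy_file is a hypothetical name.

```c
#include <fcntl.h>
#include <unistd.h>

#define BUF_SIZE 4096   /* arbitrary buffer size chosen for this sketch */

/* Copy the file `src` to `dst`, creating or truncating `dst`.
 * Returns 0 on success, -1 on any error. */
int copy_file(const char *src, const char *dst)
{
    char buf[BUF_SIZE];
    ssize_t nread;
    int status = 0;

    int in_fd = open(src, O_RDONLY);
    if (in_fd < 0)
        return -1;
    int out_fd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out_fd < 0) {
        close(in_fd);
        return -1;
    }

    while (status == 0 && (nread = read(in_fd, buf, sizeof buf)) > 0) {
        ssize_t done = 0;
        while (done < nread) {            /* write() may be partial */
            ssize_t nw = write(out_fd, buf + done, (size_t)(nread - done));
            if (nw <= 0) {
                status = -1;
                break;
            }
            done += nw;
        }
    }
    if (status == 0 && nread < 0)
        status = -1;                      /* read() failed */

    close(in_fd);
    if (close(out_fd) < 0)
        status = -1;
    return status;
}
```

A main() for “copyfile abc xyz” would check that argc is 3 and then call copy_file(argv[1], argv[2]).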
Memory-Mapped Files
• Some operating systems allow another interface: char *p = mmap (filename, …)
• Maps the contents of the file to a memory area
• Now p[0], p[1], … are the bytes of the file (implicit open & read of the whole file)
• p[3] = ‘a’ puts the character ‘a’ into the file
• munmap (…): write the file back to disk and close
Under the hood
• mmap() call does not read the file.
• Makes the file the “backing store” of the pages of the memory area
• Accessing the memory area causes page-faults that bring the data into or out of memory
• munmap() flushes all the pages to disk
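A minimal sketch of the same idea with the actual POSIX calls. Note that the real mmap() takes an open file descriptor and a length rather than a file name. poke_byte is a hypothetical helper, and the explicit msync() is an extra step added here so the write-back does not rely on munmap() alone.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Overwrite the byte at `offset` in the file `path` through a memory
 * mapping. Returns 0 on success, -1 on error (including an offset that
 * lies outside the file, since the mapping cannot extend the file). */
int poke_byte(const char *path, off_t offset, char value)
{
    struct stat st;
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || offset < 0 || offset >= st.st_size) {
        close(fd);
        return -1;
    }

    /* MAP_SHARED makes the file the backing store of these pages.
     * Nothing is read here; pages are faulted in on first access. */
    char *p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    p[offset] = value;              /* a plain memory store changes the file */

    /* MS_SYNC: wait until the dirty pages have been written back. */
    int rc = msync(p, (size_t)st.st_size, MS_SYNC);
    if (munmap(p, (size_t)st.st_size) < 0)
        rc = -1;
    return rc;
}
```

For example, poke_byte("notes.txt", 3, 'a') has the effect of the slide's p[3] = ‘a’ (the file name is made up for this example).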
Properties of memory-mapped files
• Can’t extend the file
– On Unix the length is specified in the call to mmap()
– Operating system determines exact semantics
• Can be used to share memory between processes:
– Both processes mmap() the same file
– See each other’s changes to the content
– Need concurrency control: semaphore, mutex, etc.
• Dangerous to access a file via mmap() and fread() at the same time:
– Contents are unpredictable until munmap()!
File System Implementations
A possible file system layout
Structure of disk on all PCs
Internal structure of a partition of a Unix file-system
Basic disk organization
• MBR = Master Boot Record. Sector 0 of the disk. Contains the initial program loaded at power-up.
• Partition table = divides the physical disk into logical disks (C:, D:, etc)
• Each partition is viewed as a sequential array of numbered blocks, modeling the physical sectors
Files: Contiguous allocation
(a) Contiguous allocation of disk space for 7 files
(b) State of the disk after files D and E have been removed
Properties of contiguous allocation
• Simple: need to keep start position and size
• High performance: to read whole file, just one seek followed by sequential reads
• But: over time disk becomes fragmented.
• A good design for CD-ROM and DVD: write once, file sizes known in advance.
Files: linked list
The links are on the disk (using block numbers)
Properties of linked-list files
• No fragmentation
• but:
– sequential access within a file slows down if file blocks are not consecutive
– random access very slow (how to do “fseek to byte 100,000 in this file”?)
Linked list: File Allocation Table
Keep the list links in a separate table in memory
Properties of a FAT
• Random access much better: linked list is in memory so no need to do disk access for every link
• but:
– whole FAT needs to be in memory to be efficient
– FAT can be large
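Why random access gets cheaper can be seen in a sketch of the lookup: finding the block that holds a given byte means following links in an in-memory array, with no disk reads. The 16-bit entries and the FAT_EOF marker value are assumptions made for this example, not the on-disk FAT format.

```c
#include <stdint.h>

#define FAT_EOF 0xFFFFu  /* hypothetical end-of-chain marker for this sketch */

/* fat[b] holds the number of the block that follows block b in its file,
 * or FAT_EOF if b is the file's last block. Returns the disk block that
 * contains byte `offset` of the file whose first block is `first_block`,
 * or -1 if the file is shorter than that. */
long block_for_offset(const uint16_t *fat, uint16_t first_block,
                      unsigned long offset, unsigned block_size)
{
    unsigned long hops = offset / block_size;   /* links to follow */
    uint16_t b = first_block;

    while (hops-- > 0) {
        if (fat[b] == FAT_EOF)
            return -1;                          /* ran off the end */
        b = fat[b];
    }
    return b;
}
```

The loop still takes time proportional to the offset, but every step is a memory access rather than a disk seek.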
i-nodes
Properties of i-nodes
• Instead of a global file table, each file’s i-node keeps track of its own blocks.
• Only need to keep i-node in memory if the file is open.
• Total memory (RAM) needed proportional to (size of i-node) x (max number of open files)
Example file systems
The MS-DOS file system
• File names 8+3 (UPPERCASE)
• No ownership: all files accessible to user
• Maintained via a File Allocation Table (FAT)
• Attributes:
– read-only
– hidden
– system
– archive
The MS-DOS directory entry
• Time is inaccurate: 2 bytes give only 65536 values, but there are 86400 seconds per day, so seconds are stored with 2-second resolution
• Date uses 7 bits for the year, starting 1/1/1980. Runs out in 2107
• First-block-number: index into FAT, with 64K entries
• 10 bytes (of 32) unused!
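A sketch of decoding the 2-byte time and 2-byte date fields. The bit layout used here (5 bits hours, 6 bits minutes, 5 bits for seconds/2; 7 bits year, 4 bits month, 5 bits day) is the usual FAT layout and is not spelled out on the slide; the struct and function names are hypothetical.

```c
#include <stdint.h>

struct dos_stamp {
    int year, month, day;
    int hour, minute, second;
};

/* Decode the packed 16-bit date and time fields of a directory entry. */
struct dos_stamp decode_dos_stamp(uint16_t date, uint16_t time)
{
    struct dos_stamp s;
    s.year   = 1980 + (date >> 9);      /* 7 bits: 1980..2107 */
    s.month  = (date >> 5) & 0x0F;      /* 4 bits */
    s.day    = date & 0x1F;             /* 5 bits */
    s.hour   = time >> 11;              /* 5 bits */
    s.minute = (time >> 5) & 0x3F;      /* 6 bits */
    s.second = (time & 0x1F) * 2;       /* 5 bits, in units of 2 seconds */
    return s;
}
```

Odd seconds cannot be represented at all, which is the inaccuracy noted above.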
FAT-12/16
• Block (also called cluster): a multiple of 512 bytes.
• FAT-12: 12-bit block addresses, 512-byte blocks
– largest partition: 4096 x 512 = 2MB. OK for floppy
• For disks, MS allowed blocks of 1KB, 2KB, 4KB. Largest partition: 4096 x 4KB = 16MB
• FAT-16: switch to 16-bit addresses, block size up to 32KB.
– Largest partition: 65536 x 32KB = 2GB
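The partition limits above all follow from one product: number of addressable blocks times block size. A small sketch (max_partition_bytes is a hypothetical name; it assumes fewer than 64 address bits and ignores any reserved FAT entries):

```c
#include <stdint.h>

/* Largest partition addressable with `address_bits`-bit block numbers
 * and blocks of `block_size` bytes. Assumes address_bits < 64. */
uint64_t max_partition_bytes(unsigned address_bits, uint64_t block_size)
{
    return ((uint64_t)1 << address_bits) * block_size;
}
```

For example, max_partition_bytes(12, 512) gives the 2MB FAT-12 limit and max_partition_bytes(16, 32768) gives the 2GB FAT-16 limit.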
FAT-32
a) Win95 2nd Edition / Win98 / Win ME
b) Really FAT-28: 28-bit block addresses
c) Potentially 2^28 x 2^15 bytes per partition, but in reality only 2^41 bytes = 2TB
d) FAT itself now occupies a lot of RAM: for a 2GB disk with 4KB blocks there are 512K blocks, so the FAT (4 bytes per entry) uses 2MB of RAM.
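The RAM estimate can be checked with a short calculation: one table entry per block, times the entry size. A sketch, assuming 4-byte entries for FAT-32 (fat_table_bytes is a hypothetical name):

```c
#include <stdint.h>

/* Size in bytes of a FAT that has one `entry_bytes`-byte entry for each
 * block of a disk of `disk_bytes` bytes with `block_size`-byte blocks. */
uint64_t fat_table_bytes(uint64_t disk_bytes, uint64_t block_size,
                         unsigned entry_bytes)
{
    uint64_t blocks = (disk_bytes + block_size - 1) / block_size; /* round up */
    return blocks * entry_bytes;
}
```

With a 2GB disk, 4KB blocks and 4-byte entries this gives 524288 entries and a 2MB table.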
File system compatibility
• The Win95 2e / Win98 file system added:
– FAT-32
– long file names
• But needed to allow older MS-DOS & Win95 to read directories (backward compatibility).
• Result:
– every file has 2 names (one 8+3, one long)
– directory entries needed to be patched
– Try “dir /x” in a cmd window to see…
– Still case-insensitive under the hood…
The Unix V7 file system
The Classical Unix file system
• Invented with Unix V7 for PDP-11 (1970’s)
• 14-character names
• all ASCII chars except ‘/’ and NUL (0x00)
• every file has a 2-byte i-node number
– at most 64K files per file-system
• Allows some weird filenames:
– “ ” (a single space)
– “\n” (a “newline” character)
A UNIX V7 directory entry
Reminder: Disk organization
A UNIX i-node
Max file size, with d direct pointers and n pointers per indirect block: Blocksize * (d + n + n^2 + n^3)
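The formula counts the blocks reachable directly (d), through the single indirect block (n), the double indirect block (n^2) and the triple indirect block (n^3). A sketch of the calculation (inode_max_file_size is a hypothetical name; the caller must pick values small enough not to overflow 64 bits):

```c
#include <stdint.h>

/* Maximum file size in bytes for an i-node with `d` direct pointers and
 * single, double and triple indirect blocks that each hold `n` pointers. */
uint64_t inode_max_file_size(uint64_t block_size, uint64_t d, uint64_t n)
{
    return block_size * (d + n + n * n + n * n * n);
}
```

As an illustration with made-up parameters, 1KB blocks, d = 10 and 4-byte block pointers (so n = 256) give roughly 16GB, almost all of it from the triple indirect block.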
The steps in looking up /usr/ast/mbox
BSD Improvements
• File names extended to 255 chars
• Divide disk into cylinder groups, try to keep i-node and file close together to avoid long seeks.
• Use 2 block sizes, one for large files, one for small files.
• Similar improvements also in the Linux file system (ext2).
The Win2000 (NTFS) file system
NTFS
• Designed from scratch
• Not compatible with Win95 / Win98
• Usually 4KB blocks (clusters)
• Blocks referred to by 64-bit numbers
• Main data structure: Master File Table (MFT)
• Each MFT entry describes a file or directory
• MFT entry = 1KB
• MFT is a file, can be anywhere on disk
Block runs
• Idea: blocks of a file often sequential on disk
• A “run” is a set of consecutive blocks that belong to the same file
• No need to keep pointer to each block:
– Enough to keep start/length of each run
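The saving can be sketched by compressing a file's block list into (start, length) pairs. This illustrates the idea only and is not the NTFS on-disk encoding; struct run and blocks_to_runs are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

struct run {
    uint64_t start;    /* first block of the run */
    uint64_t length;   /* number of consecutive blocks */
};

/* Compress the block list blocks[0..n-1] (in file order) into runs.
 * Stores at most `max_runs` runs in `runs`. Returns the number of runs,
 * or -1 if `max_runs` is too small. */
int blocks_to_runs(const uint64_t *blocks, size_t n,
                   struct run *runs, int max_runs)
{
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        if (count > 0 &&
            blocks[i] == runs[count - 1].start + runs[count - 1].length) {
            runs[count - 1].length++;          /* extends the current run */
        } else {
            if (count == max_runs)
                return -1;
            runs[count].start = blocks[i];     /* starts a new run */
            runs[count].length = 1;
            count++;
        }
    }
    return count;
}
```

For a 9-block file stored in blocks 20-23, 64-65 and 80-82 (made-up numbers), nine block pointers shrink to three (start, length) pairs.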
An MFT record for a 3-run, 9-block file
Concepts for review
• File
• File name
• File structure
• Sequential access / Random access
• File attributes
• Hierarchical directories
• Path names
• Memory-mapped files
• Master boot record (MBR)
• Partition table
• Contiguous block allocation
• File allocation table (FAT)
• i-node
• NTFS
• MFT