1
11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 1
CS 537 Lecture 13
File Systems
Michael Swift
11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 2
Role of OS for I/O • Standard library
– Provide abstractions, consistent interface – Simplify access to hardware devices
• Resource coordination – Provide protection across users/processes – Provide fair and efficient performance
• Requires understanding of underlying device characteristics • User processes do not have direct access to devices
– Could crash entire system – Could read/write data without appropriate permissions – Could hog device unfairly
• OS exports higher-level functions – File system: Provides file and directory abstractions – File system operations: mkdir, create, read, write
11/08/07 © 2005 Steve Gribble 3
File systems
• The concept of a file system is simple – the implementation of the abstraction for secondary storage
• abstraction = files – logical organization of files into directories
• the directory hierarchy – sharing of data between processes, people and machines
• access control, consistency, …
11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 4
Abstraction: File • User view
– Named collection of bytes • Untyped or typed • Examples: text, source, object, executables, application-specific
– Permanently and conveniently available
2
11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 5
Files • A file is a collection of data with some properties
– contents, size, owner, last read/write time, protection … • Files may also have types
– understood by file system • device, directory, symbolic link
– understood by other parts of OS or by runtime libraries • executable, dll, source code, object code, text file, …
• Type can be encoded in the file’s name or contents – file extension: .com, .exe, .bat, .dll, .jpg, .mov, .mp3, … – content: #! for scripts
• Operating system view – Map bytes as collection of blocks on physical non-volatile storage device
• Magnetic disks, tapes, NVRAM, battery-backed RAM • Persistent across reboots and power failures
11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 6
File Meta-Data • Meta-data: Additional system information associated with each
file – Name of file – Type of file – Pointer to data blocks on disk – File size – Times: Creation, access, modification – Owner and group id – Protection bits (read or write) – Special file? (directory? symbolic link?)
• Meta-data is stored on disk – Conceptually: meta-data can be stored as array on disk
11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 7
File access methods • Some file systems provide different access methods that specify
ways the application will access data – sequential access
• read bytes one at a time, in order – Random access
• access given a part of a file by block/byte # – record access
• file is array of fixed- or variable-sized records – Search access
• FS contains an index of file contents • apps query the FS for files containing a word
• Why do we care about distinguishing sequential from random access? – what might the FS do differently in these cases?
11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 8
File Operations • Create file with given pathname /a/b/file
– Traverse pathname, allocate meta-data and directory entry • Read from (or write to) offset in file
– Find (or allocate) blocks of file on disk; update meta-data • Delete
– Remove directory entry, free disk space allocated to file • Truncate file (set size to 0, keep other attributes)
– Free disk space allocated to file • Rename file
– Change directory entry • Copy file
– Allocate new directory entry, find space on disk and copy • Change access permissions
– Change permissions in meta-data
3
11/08/07 © 2005 Steve Gribble
Opening Files Expensive to access files with full pathnames
– On every read/write operation: • Traverse directory structure • Check access permissions
Open() file before first access – User specifies mode: read and/or write – Search directories for filename and check permissions – Copy relevant meta-data to open file table in memory – Return index in open file table to process (file descriptor) – Process uses file descriptor to read/write to file
Per-process open file table – Current position in file (offset for reads and writes) – Open mode
Enables redirection from stdout to particular file 11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and
Remzi Arpaci-Dussea, Michael Swift 10
Directories • Directories provide:
– a way for users to organize their files – a convenient file name space for both users and FS’s – a map from file name to blocks of file data on disk
• Actually, map file name to file meta-data (which enables one to find data on disk)
• Most file systems support multi-level directories – naming hierarchies (/, /usr, /usr/local, /usr/local/bin, …)
• Most file systems support the notion of current directory – absolute names: fully-qualified starting from root of FS
bash$ cd /usr/local
– relative names: specified with respect to current directory bash$ cd /usr/local (absolute) bash$ cd bin (relative, equivalent to cd /usr/local/bin)
11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 11
Directories: Tree-Structured • Directory listing contains <name, index>, but name can be directory
– Directory is stored and treated like a file – Special bit set in meta-data for directories
• User programs can read directories • Only system programs can write directories
– Specify full pathname by separating directories and files with special characters (e.g., \ or /)
• Special directories – Root: Fixed index for meta-data (e.g., 2) – This directory: . – Parent directory: ..
11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 12
Path name translation • Let’s say you want to open “/one/two/three”
fd = open(“/one/two/three”, O_RDWR);
• What goes on inside the file system? – open directory “/” (well known, can always find) – search the directory for “one”, get location of “one” – open directory “one”, search for “two”, get location of “two” – open directory “two”, search for “three”, get loc. of “three” – open file “three” – (of course, permissions are checked at each step)
• FS spends lots of time walking down directory paths – this is why open is separate from read/write (session state) – OS will cache prefix lookups to enhance performance
• /a/b, /a/bb, /a/bbb all share the “/a” prefix
4
11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 13
Acyclic-Graph Directories • More general than tree structure
– Add connections across the tree (no cycles) – Create links from one file (or directory) to another
• Hard link: “ln a b” (“a” must exist already) – Idea: Can use name “a” or “b” to get to same file data – Implementation: Multiple directory entries point to same meta-data – What happens when you remove a? Does b still exist?
• How is this feature implemented??? – Unix: Does not create hard links to directories. Why?
11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 14
File System Workloads drive designs • Motivation: Workloads influence design of file system • File characteristics (measurements of UNIX and NT)
– Most files are small (about 8KB) – Most of the disk is allocated to large files
• (90% of data is in 10% of files) • Access patterns
– Sequential: Data in file is read/written in order • Most common access pattern
– Random (direct): Access block without referencing predecessors • Difficult to optimize
– Access files in same directory together • Spatial locality
– Access meta-data when access file • Need meta-data to find data
11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 15
Goals • OS allocates LBNs (logical block numbers) to meta-data, file
data, and directory data – Workload items accessed together should be close in LBN space
• Implications – Large files should be allocated sequentially – Files in same directory should be allocated near each other – Data should be allocated near its meta-data
• Meta-Data: Where is it stored on disk? – Embedded within each directory entry – In data structure separate from directory entry
• Directory entry points to meta-data
11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 16
Allocation Strategies • Progression of different approaches
– Contiguous – Extent-based – Linked – File-allocation Tables – Indexed – Multi-level Indexed
• Questions – Amount of fragmentation (internal and external)? – Ability to grow file over time? – Seek cost for sequential accesses? – Speed to find data blocks for random accesses? – Wasted space for pointers to data blocks?
5
Per-file Metadata
• In unix, the data representing a file is called an inode (for indirect node) – Inodes contain file size, access times, owner, permissions – Inodes contain information on how to find the file data
(locations on disk)
• Every inode has a location on disk.
11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 17 11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and
Remzi Arpaci-Dussea, Michael Swift 18
Contiguous Allocation • Allocate each file to contiguous blocks on disk
– Meta-data: Starting block and size of file – OS allocates by finding sufficient free space
• Must predict future size of file; Should space be reserved? – Example: IBM OS/360
• Advantages – Little overhead for meta-data – Excellent performance for sequential accesses – Simple to calculate random addresses
• Drawbacks – Horrible external fragmentation (Requires periodic compaction) – May not be able to grow file without moving it
Contiguous Allocation of Disk Space
11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 20
Extent-Based Allocation • Allocate multiple contiguous regions (extents) per file
– Meta-data: Small array (2-6) designating each extent • Each entry: starting block and size
• Improves contiguous allocation – File can grow over time (until run out of extents) – Helps with external fragmentation
• Advantages – Limited overhead for meta-data – Very good performance for sequential accesses – Simple to calculate random addresses
• Disadvantages (Small number of extents): – External fragmentation can still be a problem – Not able to grow file when run out of extents
D � A � A � A � B � B � B � B � C� C� C� B � B �D � D �
6
11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 21
Linked Allocation • Allocate linked-list of fixed-sized blocks
– Meta-data: Location of first block of file • Each block also contains pointer to next block
– Examples: TOPS-10, Alto
• Advantages – No external fragmentation – Files can be easily grown, with no limit
• Disadvantages – Cannot calculate random addresses w/o reading previous blocks – Sequential bandwidth may not be good
• Try to allocate blocks of file contiguously for best performance – Sensitivity to corruption
• Trade-off: Block size (does not need to equal sector size) – Larger --> ?? – Smaller --> ??
D � A � A � A � B � B � B � B � C� C� C� B � B �D � D � D � D �B �
Linked Allocation
11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 23
File-Allocation Table (FAT) • Variation of Linked allocation
– Keep linked-list information for all files in on-disk FAT table – Meta-data: Location of first block of file
• And, FAT table itself
• Comparison to Linked Allocation – Same basic advantages and disadvantages – Disadvantage: Read from two disk locations for every data read – Optimization: Cache FAT in main memory
• Advantage: Greatly improves random accesses
File-Allocation Table
7
11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 25
Indexed Allocation
• Allocate fixed-sized blocks for each file – Meta-data: Fixed-sized array of block pointers
• Allocate space for ptrs at file creation time
• Advantages – No external fragmentation – Files can be easily grown, with no limit – Supports random access
• Disadvantages – Large overhead for meta-data:
• Wastes space for unneeded pointers (most files are small!)
Indexed Allocation • Brings all pointers together into the index block • Logical view
index table"
Example of Indexed Allocation
11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 28
Multi-Level Indexed Files • Variation of Indexed Allocation
– Dynamically allocate hierarchy of pointers to blocks as needed – Meta-data: Small number of pointers allocated statically
• Additional pointers to blocks of pointers – Examples: original UNIX-based file systems
• Comparison to Indexed Allocation – Advantage: Does not waste space for unneeded pointers
• Still fast access for small files • Can grow to what size??
– Disadvantage: Need to read indirect blocks of pointers to calculate addresses (extra disk read)
• Keep indirect blocks cached in main memory
indirect �double�indirect � indirect � triple�
indirect �
8
Multi-level Indexes
• Inode = data on disk containing metadata for a file • Algorithm for looking up a block
– If block number < # direct • Read block number from inode
– If block number < (# direct) + (block size/index size) • Read indirect block address from inode • Read indirect block from disk • Find index in indirect block (block no. - # direct)
– If block number < …
• Compare to multi-level page tables in the file system – What are the differences? – Why? – As with page tables, block size affects how big files can be
11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 29 11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and
Remzi Arpaci-Dussea, Michael Swift 30
Free space management
• How do you remember which blocks are free? – What operations are needed?
• Free a block • Get a free block(s) -- in some particular location
• Free list: linked list of free blocks – Advantages: simple, constant-time operation – Disadvantage: rapidly loses locality – Used in Unix UFS and FAT
• Bitmap: bitmap of all blocks indicating which are free – Advantages: can find strings of consecutive free blocks
• X86 provides instructions to find 1 bits – Disadvantages: space overhead – Used in Unix FFS
Linked Free Space List on Disk
11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 32
Directory internals
• A directory is typically just a file that happens to contain special metadata – directory = list of (name of file, file attributes) – attributes include such things as:
• size, protection, location on disk, creation time, access time, … – the directory list is usually unordered (effectively random)
• when you type “ls”, the “ls” command sorts the results for you
9
Directory Implementation
• A directory is a file containing – name – metadata about file (Windows)
• size • owner • data locations
– Pointer to file metadata (Unix) • Organization
– Linear list of file names with pointer to the data blocks • simple to program • time-consuming to execute
– BTree – balanced tree sorted by name • Faster searching for large directories
11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 34
The tree (directory, hierarchical) file system
• A directory is a flat file of fixed-size entries • Each entry consists of an i-node number and a file
name i-node number File name
152 . 18 ..
216 my_file 4 another_file
93 oh_my_god 144 a_directory
• It’s as simple as that!
11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 35
Using directories • How do you find files?
– Read the directory, search for the name you want (checking for wildcards)
• How do you list files (ls) – Read directory contents, print name field
• How do you list file attributes (ls -l) – Read directory contents, open inodes, print name +
attributes
• How do you sort the output (ls -S, ls -t) – The FS doesn’t do it -- ls does it!
Mounting Directories
• When a system has multiple disks or partitions, how are they named? – Dos/Windows: drive letters:
• c:\programs files\... • D:\data
– Unix: mount points • /mnt/
11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 36
10
File System Mounting • Mounting stores in memory a pointer from a
directory on one partition to the root of another partition
• During access, the OS checks every directory to see if it is a real directory or a mount point – If mount point, follows pointer in memory to find the
root directory of the other file system.
Mounting Replaces Directories
What happens if you mount (b) under /users?
Resulting File System Steps to Create a File
1. Check if name is in use a. Find directory inode b. Read directory contents for existing files
2. Allocate inode a. Update from free inode bitmap/list b. Fill in inode contents
3. Add inode to directory a. Write back directory contents
What happens if you do this in the wrong order?
11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 40
11
11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 41
File system consistency
• Metadata, directory and file blocks are cached in memory
• The “sync” command forces memory-resident disk information to be written to disk – system does a sync every few seconds
• A crash or power failure between sync’s can leave an inconsistent disk
• You could reduce the frequency of problems by reducing caching, but performance would suffer big-time
11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 42
i-check: consistency of the flat file system
• Is each block on exactly one list? – create a bit vector with as many entries as there are blocks – follow the free list and each i-node block list – when a block is encountered, examine its bit
• If the bit was 0, set it to 1 • if the bit was already 1
– if the block is both in a file and on the free list, remove it from the free list and cross your fingers
– if the block is in two files, call support!
– if there are any 0’s left at the end, put those blocks on the free list
11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 43
d-check: consistency of the directory file system
• Do the directories form a tree? • Does the link count of each file equal the number of
directories links to it? – I will spare you the details
• uses a zero-initialized vector of counters, one per i-node • walk the tree, then visit every i-node
11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 44
Protection systems
• FS must implement some kind of protection system – to control who can access a file (user) – to control how they can access it (e.g., read, write, or exec)
• More generally: – generalize files to objects (the “what”) – generalize users to principals (the “who”, user or program) – generalize read/write to actions (the “how”, or operations)
• A protection system dictates whether a given action performed by a given principal on a given object should be allowed – e.g., you can read or write your files, but others cannot – e.g., your can read /etc/motd but you cannot write to it
12
11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 45
Model for representing protection • Two different ways of thinking about it:
– access control lists (ACLs) • for each object, keep list of principals and principals’ allowed actions • Like a guest list (check identity of caller on each access)
– capabilities • for each principal, keep list of objects and principal’s allowed actions • Like a key (something you present to open a door)
• Both can be represented with the following matrix:
/etc/passwd /home/swift /home/guest
root rw rw rw
swift r rw r
guest r principals
objects
ACL
capability
11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 46
ACLs vs. Capabilities • Capabilities are easy to transfer
– they are like keys: can hand them off – they make sharing easy
• ACLs are easier to manage – object-centric, easy to grant and revoke
• to revoke capability, need to keep track of principals that have it • hard to do, given that principals can hand off capabilities
• ACLs grow large when object is heavily shared – can simplify by using “groups”
• put users in groups, put groups in ACLs • you are could be in the “cs537-students” group
– additional benefit • change group membership, affects ALL objects that have this group in
its ACL
11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 47
Protection in the Unix FS
• Objects: individual files • Principals: owner/group/world • Actions: read/write/execute
• This is pretty simple and rigid, but it has proven to be about what we can handle!