CPSC 231 D.H. 1
Learning Objectives
• Understanding the disk versus RAM performance gap.
• Understanding the definition, design goals and design problems of file structures.
• Understanding the history of file structure research.
• Understanding and naming key terms used in file structures.
Secondary Storage in Computer Systems
• Data can be stored on:
• hard disks
• floppy disks
• tapes
• CD-ROMs
• Zip and Jaz disks
• network servers
• Most data is stored on hard disks.
Disks
• Disks provide enormous capacity to store information.
• Disks are orders of magnitude slower than main memory (a single disk access can take a quarter of a million times longer than a single RAM access).
• DISK = LARGE and SLOW and CHEAP
• RAM = SMALL and FAST
RAM versus Disk Performance Gap
• Example:
– 120 nanoseconds to access RAM (main memory)
– 30 milliseconds to access disk
• Analogy:
– 20 seconds versus 58 days
• CONCLUSION:
– Application programs have to spend a lot of time waiting for data to be read from the disk or to be written to the disk.
Questions
• What is a millisecond, microsecond and nanosecond?
– Millisecond = 1/1,000 s
– Microsecond = 1/1,000,000 s
– Nanosecond = 1/1,000,000,000 s
• How many times is RAM access faster than disk access?
• Assume:
– 120 nanoseconds to access RAM (main memory)
– 30 milliseconds to access disk
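The last question can be worked through in a few lines of C++. This is only a sketch: the constant and function names are made up for illustration, and the 20-second figure is the one from the analogy on the performance gap slide.

```cpp
#include <cassert>
#include <cstdint>

// Access times from the slide, both expressed in nanoseconds.
const std::int64_t kRamAccessNs  = 120;                // 120 ns
const std::int64_t kDiskAccessNs = 30LL * 1000 * 1000; // 30 ms = 30,000,000 ns

// How many RAM accesses fit into the time of one disk access.
std::int64_t speedRatio() {
    return kDiskAccessNs / kRamAccessNs;
}

// Scale a human-sized duration by the same ratio: if a RAM access took
// ramSeconds, how many days would a disk access take?
double scaledDays(double ramSeconds) {
    assert(ramSeconds >= 0.0);
    const double secondsPerDay = 60.0 * 60.0 * 24.0;
    return ramSeconds * static_cast<double>(speedRatio()) / secondsPerDay;
}
```

With these numbers speedRatio() is 250000 (a quarter of a million), and scaledDays(20) is about 57.9, which the analogy rounds to 58 days.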
File Structure
• Definition:
– A file structure is a combination of:
• a representation for data in files and
• operations for accessing the data.
– A file structure allows applications to read, write and modify data.
– A good file structure design will give an application efficient (fast) access to the needed data.
File Structure Design Goals
• Minimize the total disk access time
– by clustering related data together
– by keeping adjacent blocks close to each other on the disk
– ideally, get all the needed data in just ONE disk access
• Maximize the total disk space utilization
– disk defragmentation procedures
– data compression
File structure design problems
• One of the most difficult problems in meeting the design goals of a file structure is the fact that files are quite dynamic, i.e. they:
• grow
• shrink
• change their data
• The design goals would be easier to meet if files were static. WHY?
Historical view of file structure design
• Early work
– presumed that files were located on tapes
– access was sequential
• Recent work
– most files are stored on direct access devices (such as hard disks, floppy disks, CD-ROMs, Zip disks, etc.)
– large files required indexing
– indexes and keys allowed for speedy searches of data on the disk
File structure history cont.
• Indexed files grew and became slow to access => tree structures emerged.
• Unfortunately, some trees grew very unevenly, resulting in slow (almost sequential) searches => AVL trees emerged (self-adjusting, height-balanced binary trees).
• AVL trees grew large and required multiple disk accesses => B-trees emerged.
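The "uneven growth" problem can be seen with a plain binary search tree. The sketch below is illustrative only (the Node type and function names are not from the course): it inserts keys without any rebalancing and reports the resulting height.

```cpp
#include <algorithm>
#include <memory>
#include <vector>

struct Node {
    int key;
    std::unique_ptr<Node> left, right;
    explicit Node(int k) : key(k) {}
};

// Plain binary search tree insert with no rebalancing.
void insert(std::unique_ptr<Node>& root, int key) {
    std::unique_ptr<Node>* slot = &root;
    while (*slot) {
        slot = (key < (*slot)->key) ? &(*slot)->left : &(*slot)->right;
    }
    slot->reset(new Node(key));
}

// Number of nodes on the longest root-to-leaf path.
int height(const std::unique_ptr<Node>& node) {
    if (!node) return 0;
    return 1 + std::max(height(node->left), height(node->right));
}

// Build a tree from the keys in the given order and return its height.
int heightAfterInserting(const std::vector<int>& keys) {
    std::unique_ptr<Node> root;
    for (int k : keys) insert(root, k);
    return height(root);
}
```

Inserting 1, 2, ..., 7 in sorted order gives a height of 7, so a search degenerates into a walk down a linked list. Inserting the same keys in the order 4, 2, 6, 1, 3, 5, 7 gives a height of 3. An AVL tree rebalances on insertion so that the height stays close to the second case regardless of the insertion order.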
[Figure: tree file]
[Figure: B-tree]
File structure history cont.
• B-trees provided excellent performance for non-sequential files but sequential access was very slow => B+ trees emerged.
• B-trees and B+ trees became the basis for many commercial file systems, since they provide access times that grow in proportion to log_k N, where N is the number of entries in the file and k is the number of entries indexed in a single block of the B-tree.
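To get a feel for log_k N, the following sketch computes the smallest number of levels L with k^L >= N. It assumes an idealized tree in which every block indexes exactly k entries; real B-trees allow partially filled blocks, so this is a rough estimate, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Smallest number of levels L such that k^L >= n, i.e. ceil(log_k n).
// n: number of entries in the file; k: entries indexed per block.
int btreeLevels(std::uint64_t n, std::uint64_t k) {
    assert(n >= 1 && k >= 2);
    int levels = 0;
    std::uint64_t reachable = 1;  // entries reachable with `levels` block reads
    while (reachable < n) {
        ++levels;
        if (reachable > n / k) {
            break;  // one more level already covers n; also avoids overflow
        }
        reachable *= k;
    }
    return levels;
}
```

For a file of 1,000,000 entries, btreeLevels(1000000, 100) is 3, while a binary tree (k = 2) needs 20 levels. Since each level can cost one disk access, a large k is what keeps B-tree lookups cheap.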
[Figure: B+ trees]
Hashing
• Hashing is a data access mechanism that is based on converting the search key into a storage address.
• A good hashing algorithm can significantly reduce the number of disk accesses.
• Extendible hashing is a form of hashing that works well with files whose size changes substantially over time.
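A minimal sketch of "converting the search key into a storage address" follows. The hash function (multiply by 31 and add each character) and the fixed-size bucket layout are assumptions chosen for illustration, not the scheme used in the course.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Map a search key to a bucket number in the range [0, bucketCount).
std::size_t bucketFor(const std::string& key, std::size_t bucketCount) {
    assert(bucketCount > 0);
    std::size_t h = 0;
    for (unsigned char c : key) {
        h = h * 31 + c;  // unsigned arithmetic wraps, which is fine for hashing
    }
    return h % bucketCount;
}

// Byte offset of a bucket in a file made of fixed-size buckets.
std::size_t bucketOffset(std::size_t bucket, std::size_t bucketSizeBytes) {
    return bucket * bucketSizeBytes;
}
```

For example, bucketFor("AB", 1000) is 81 (65 * 31 + 66 = 2081, and 2081 mod 1000 = 81), and with 512-byte buckets the record would be looked up at byte offset 41472. The address is computed directly from the key, so in the best case the record is found with a single disk access.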
Key Terms
• AVL tree - a self-adjusting binary tree that can guarantee good access times for data stored in memory (but not on the disk).
• B-tree - a tree structure that provides fast access to data stored in files. A B-tree does NOT have to be a binary tree.
• B+ tree - a variation of the B-tree structure that provides fast sequential access to data as well as indexed access.
Key Terms Cont.
• File structure
– the organization of data on secondary storage devices such as disks, together with the operations defined for the data
• Sequential access
– access of data that takes records in serial order, looking at the first, second, and so on.
• Random access
– access of data that takes records in any order, not necessarily serial.
Physical files and logical files
• Files are collections of related information.
• Physical files exist on secondary storage devices. Operating systems are responsible for managing physical files.
• Logical files are visible to application programs. Application programs do not know the physical locations of the files (often they do not even know whether the data is coming from a file or from a keyboard).
Association between physical and logical files
• Applications have to make an association between physical and logical file names. In C++ this can be done in the following way:
• ofstream outClientFile("clients.dat", ios::out);
• The application can write to outClientFile while the operating system sees clients.dat.
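A slightly fuller sketch of the same association is shown below. The helper name writeAndReadBack and the sample record are invented for illustration; only the ofstream declaration comes from the slide.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Write one line through a logical file, then read it back.
// `physicalName` is the file name that the operating system sees.
std::string writeAndReadBack(const std::string& physicalName,
                             const std::string& line) {
    {
        // The logical file outClientFile is bound to the physical file here.
        std::ofstream outClientFile(physicalName.c_str(), std::ios::out);
        assert(outClientFile.is_open());
        outClientFile << line << '\n';
    }   // leaving this scope closes the file and ends the association

    // A second logical file, bound to the same physical file for reading.
    std::ifstream inClientFile(physicalName.c_str(), std::ios::in);
    std::string readBack;
    std::getline(inClientFile, readBack);
    return readBack;
}
```

Calling writeAndReadBack("clients.dat", "100 Jones 24.98") returns the same line. The program works only with the logical names outClientFile and inClientFile, while the physical name appears once, at the point where each association is made.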
Special Characters in Files
• All computer systems have reserved a number of characters for specific system functions.
• Examples:
– Control-Z often indicates end-of-file in MS-DOS programs
– Control-D often indicates end-of-file in Unix programs
– CR (carriage return) and LF (line feed) characters together indicate end-of-line in MS-DOS text files (Unix uses LF alone)
Directory Structures
• Files are stored in directories. Thus directories are collections of files.
• Most modern systems maintain a tree directory structure. (WHY?)
I/O Redirection
• I/O redirection allows the source of input to be changed so that it comes from a file instead of the keyboard:
– program < file /* program reads input from a file instead of the keyboard */
• I/O redirection allows the output to be directed to a file instead of the screen:
– program > file /* program writes to a file instead of the screen */
• < and > are the redirection operators.
Pipes
• The output of one program can be used as the input to another program by using pipes.
• Example:
– program1 | program2
• | is the pipe operator.