

File Systems in Real-Time Embedded Applications

March 4th

Eric Julien

Introduction to File Systems

Week Agenda

• Day 1: Introduction to File Systems

• Day 2: Understanding how the File Allocation Table (FAT) Operates

• Day 3: Balancing performance, safety and resource usage in an embedded file system

• Day 4: Choosing the right storage media

• Day 5: The challenges of using NAND flash memory in embedded systems

Definition of a file system

From the user’s perspective, the file system provides a means of organizing, storing and retrieving data on a permanent storage device.

Definition of a file system

From the designer’s perspective, the file system refers to all the internal data structures and algorithms that support these services.


Historical overview

• 1973: The CP/M operating system was first introduced. Its FS was very simple and had no directory hierarchy.

• 1980: QDOS was written as a CP/M work-alike. The QDOS FS was based on a data structure called the File Allocation Table.

• 1981: Microsoft bought QDOS and its FS and marketed them as MS-DOS and FAT.

FS in embedded systems

Embedded systems, as opposed to full-fledged computers, have strict limitations both in terms of processor speed and memory.

File systems designed for huge data centers (e.g. ZFS) are therefore not well-suited for small, less capable embedded systems.


Files

The file abstraction provides the user with a convenient way to retrieve previously stored pieces of data using their name. A file can be seen as a labeled data container.


File metadata

File metadata refers to pieces of information stored on disk that describe a file. The metadata is not part of the file content. Examples of file metadata are:

• File name

• File creation date

• File size

• Security attributes
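As a rough illustration of the difference between metadata and content, the Python sketch below reads a few metadata fields with `os.stat` without ever opening the file. The `describe` helper and the temporary file are made up for this example, and the modification time is shown instead of the creation time because the creation time is not exposed portably.

```python
import os
import tempfile
import time


def describe(path):
    """Return a few metadata fields for `path` without reading its content."""
    st = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size": st.st_size,                   # size in bytes
        "modified": time.ctime(st.st_mtime),  # last-modification time
        "mode": oct(st.st_mode & 0o777),      # permission bits
    }


# Create a small throwaway file, then look only at its metadata.
with tempfile.NamedTemporaryFile(delete=False, suffix=".log") as f:
    f.write(b"hello")
    path = f.name

info = describe(path)
os.remove(path)
```

Here `info["size"]` is 5 even though the five bytes of content were never read back.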


Directories

The directory abstraction provides the user with a convenient way to group related files.

Internally, the directory stores information that allows file names to be associated with corresponding data block locations.

Some old FS (such as early versions of DOS) had a single directory containing all files. Such FS are called flat file systems.
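The name-to-location association can be sketched with plain dictionaries. This is only an illustration: the file names, block numbers and the `lookup` helper are invented, and a real FS stores directory entries on disk, not in a Python dict.

```python
# A directory maps names to entries. An entry is either the number of a
# file's first data block (int) or another directory (dict).
# All names and block numbers below are made up.
flat_fs = {"README.TXT": 12, "DATA.BIN": 40}  # flat: one directory for all files

tree_fs = {                                    # hierarchical: directories nest
    "logs": {"boot.txt": 7, "app.txt": 19},
    "config.ini": 3,
}


def lookup(root, path):
    """Resolve a '/'-separated path to an entry, or None if it does not exist."""
    entry = root
    for name in path.strip("/").split("/"):
        if not isinstance(entry, dict) or name not in entry:
            return None
        entry = entry[name]
    return entry
```

With these tables, `lookup(tree_fs, "logs/app.txt")` walks two directories before reaching block 19, while every lookup in `flat_fs` is a single step.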


Device, partition and volume

The device refers to the physical storage media (e.g. hard disk, SD card, flash memory).

The partition is a logical unit obtained by dividing the physical space of the underlying device (not FS specific).

The volume is a formatted partition or device where the FS resides (FS specific).

Common internal structures

Although internal architectures vary widely from one FS to another, the basic ingredients remain the same:

• Arrays

• Bitmaps

• Linked lists

• Unbalanced trees

• Balanced trees

Bitmaps

Often used to keep track of resource allocation.

Used by ext2/3/4, NTFS, HFS, ReiserFS among others.

0000000000000110

0000000000001000

Resources 1, 2 and 19 are allocated (bits are numbered from the right starting at 0; the second row holds bits 16 to 31).
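A minimal sketch of such an allocation bitmap, assuming one bit per resource with 1 meaning allocated. The class and method names are hypothetical and not taken from any of the file systems listed above.

```python
class AllocationBitmap:
    """One bit per resource: 1 = allocated, 0 = free."""

    def __init__(self, count):
        self.count = count
        self.bits = bytearray((count + 7) // 8)

    def is_allocated(self, index):
        return bool(self.bits[index // 8] & (1 << (index % 8)))

    def mark(self, index):
        self.bits[index // 8] |= 1 << (index % 8)

    def release(self, index):
        self.bits[index // 8] &= ~(1 << (index % 8)) & 0xFF

    def allocate(self):
        """Mark and return the lowest free index, or None if none is free."""
        for index in range(self.count):
            if not self.is_allocated(index):
                self.mark(index)
                return index
        return None


# Reproduce the state described above: resources 1, 2 and 19 allocated.
bitmap = AllocationBitmap(32)
for resource in (1, 2, 19):
    bitmap.mark(resource)
allocated = [i for i in range(32) if bitmap.is_allocated(i)]
```

Tracking 32 resources costs only 4 bytes here, which is why bitmaps are attractive when RAM is scarce.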

Linked lists

Used to store and manage directory content (a) and file content (b).

Used by ext2/3 (a) and FAT12/16/32 (b).

Figure (a): Dir X with File A, File B, Dir Y and File C chained as a linked list.

Figure (b): File X with Block A, Block B, Block C and Block D chained as a linked list.
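Case (b) can be sketched as a table in which each entry names the next block of the file, which is the general idea behind the FAT. The cluster numbers, the `END_OF_CHAIN` value and the helper names are assumptions made for this example; the real on-disk markers differ between FAT12, FAT16 and FAT32.

```python
END_OF_CHAIN = -1  # stand-in for the real end-of-chain marker

# fat[n] holds the number of the cluster that follows cluster n in its file.
# Made-up layout: a file starting at cluster 5 occupies 5 -> 9 -> 6 -> 12.
fat = {5: 9, 9: 6, 6: 12, 12: END_OF_CHAIN}


def cluster_chain(fat, first_cluster):
    """Return every cluster of a file, in order, by following the links."""
    chain = []
    cluster = first_cluster
    while cluster != END_OF_CHAIN:
        chain.append(cluster)
        cluster = fat[cluster]
    return chain


def cluster_at(fat, first_cluster, n):
    """Return the n-th cluster (0-based) of a file; n links must be followed."""
    cluster = first_cluster
    for _ in range(n):
        cluster = fat[cluster]
        if cluster == END_OF_CHAIN:
            raise IndexError("offset is past the end of the file")
    return cluster
```

Note that `cluster_at` has no shortcut: reaching the last cluster of a long file means walking the whole chain.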

Unbalanced trees

Heavily used by ext2/3 to organize data blocks. More levels of indirection are added as the file grows.

Figure: an unbalanced tree of blocks A to N rooted at the metadata of File X.
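To make "more levels of indirection as the file grows" concrete, the sketch below computes how many levels must be crossed to reach a given block of a file. It assumes the classic ext2 layout of 12 direct pointers followed by one single, one double and one triple indirect pointer, with 4-byte block numbers and 1 KiB blocks; the function name is made up.

```python
DIRECT_POINTERS = 12  # direct block pointers in a classic ext2 inode (assumed)


def indirection_level(block_index, block_size=1024, pointer_size=4):
    """Return 0 for a direct block, or 1, 2 or 3 for the indirection depth."""
    per_block = block_size // pointer_size  # pointers held by one indirect block
    limit = DIRECT_POINTERS
    if block_index < limit:
        return 0
    for level in (1, 2, 3):
        limit += per_block ** level          # blocks reachable at this depth
        if block_index < limit:
            return level
    raise ValueError("beyond what three levels of indirection can address")
```

With these assumptions the first 12 blocks are reached directly, the next 256 through one indirect block, and so on, so small files stay cheap while large files pay for extra lookups.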

Balanced trees (B-trees)

Figure (a) shows a balanced tree, as opposed to the unbalanced tree in figure (b). The B-tree is a self-balancing tree that provides logarithmic-time search at the expense of more complex node insertion and deletion.

Figure: (a) balanced tree; (b) unbalanced tree.

B+-tree vs. linked list

The B+-tree (a variant of the B-tree) provides fast random access.

In a B+-tree, the search time is logarithmic and deterministic.

In a linked list, the search time is linear and non-deterministic.

Figure: looking up block I of File X, B+-tree vs. linked list. In the B+-tree (node keys E H, then B C, E F and H I): I>=H so branch right, I>=I so branch right, data found in 3 hops. In the linked list (C B D F A H G I E): data found in 8 hops.
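The hop counts can be reproduced with a small sketch. The tree below is a hand-built, static three-level tree with assumed separator keys (D and G at the root), not a full B+-tree with insertion and rebalancing, and it is not the exact key layout of the slide. The chain uses the block order C B D F A H G I E.

```python
from bisect import bisect_right

# Internal node: (separator_keys, children). Leaf: a plain string (block name).
# A key greater than or equal to a separator branches to the right of it.
TREE = (
    ["D", "G"],
    [
        (["B", "C"], ["A", "B", "C"]),
        (["E", "F"], ["D", "E", "F"]),
        (["H", "I"], ["G", "H", "I"]),
    ],
)

CHAIN = ["C", "B", "D", "F", "A", "H", "G", "I", "E"]


def tree_hops(node, key):
    """Count the nodes visited from the root down to the leaf holding `key`."""
    hops = 1  # the root itself
    while not isinstance(node, str):
        keys, children = node
        node = children[bisect_right(keys, key)]
        hops += 1
    return hops if node == key else None


def chain_hops(chain, key):
    """Count the list elements visited before `key` is found."""
    for hops, block in enumerate(chain, start=1):
        if block == key:
            return hops
    return None
```

Every key in the tree costs exactly 3 hops, whereas the list costs between 1 hop (block C) and 9 hops (block E), which is what "deterministic" versus "non-deterministic" refers to.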

File systems

• FAT

• exFAT

• Ext2/3/4

• NTFS

• HFS/HFS Plus

• Btrfs

• ZFS

• Log-structured file systems (YAFFS, JFFS)

FAT

- 3 flavors: FAT12, FAT16 and FAT32.

- DOS and Windows 9x file system.

- Simple architecture based on linked lists.

- Well-suited for embedded systems because of its low footprint (both on disk and in RAM).

- Poor performance on big volumes (remember linked lists vs. B-trees?).

- More on FAT later…

exFAT

- Smaller footprint than NTFS (more on NTFS later) but better performance than FAT32.

- Bitmaps used to track unallocated clusters (much faster than browsing the FAT).

- Huge file size limit (16 exabytes).

Ext2/3/4

• Default file system for many Linux distributions.

• Internal structure based on unbalanced trees with up to 3 levels of indirection.

• Journaling (in ext3 and ext4) as a means of providing metadata reliability.

• Extents (variable-length runs of contiguous blocks) in ext4 allow better large-file performance.
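The benefit of extents can be sketched as follows: a large file is described by a handful of (start, length) records instead of one pointer per block. The tuple layout, the block numbers and the `physical_block` helper are assumptions made for this example and do not mirror the ext4 on-disk format.

```python
from bisect import bisect_right

# Each extent: (first_logical_block, first_physical_block, length_in_blocks).
# Made-up values, sorted by logical block: 112 blocks described by 3 records.
extents = [(0, 1000, 8), (8, 5000, 100), (108, 200, 4)]


def physical_block(extents, logical):
    """Map a logical block of the file to its physical block, or None."""
    starts = [extent[0] for extent in extents]
    i = bisect_right(starts, logical) - 1  # last extent starting at or before
    if i < 0:
        return None
    start, physical, length = extents[i]
    if logical >= start + length:
        return None                        # past the end of that extent
    return physical + (logical - start)
```

A block-pointer scheme would need 112 pointers for the same file; here three records are enough because the blocks are laid out in contiguous runs.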


NTFS

• Default Windows file system since XP.

• Based on extents.

• Directory entries stored in a B-tree, providing much better performance than FAT for huge directories.

• Clever handling of small files: data is stored with the metadata for fast access and low internal fragmentation.

HFS/HFS Plus

• Default file system for Mac OS.

• All file and directory metadata is stored in a single giant B-tree.

• HFS Plus basically provides additional support for bigger files and longer file names.

• Journaling possible with HFS Plus.

Btrfs (B-tree file system)

• Almost everything (files, directories, resource allocation management) is stored in B-trees.

• Copy-on-write is used as a means of improving reliability. Data or metadata is never overwritten. Instead, a modified block is written out-of-place and the pointers to it are then adjusted to reflect the new block location.
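The out-of-place update can be illustrated with a toy model in which a root is simply a tuple of pointers to immutable blocks. This is a sketch of the copy-on-write idea only, with a made-up `cow_update` helper; it says nothing about how Btrfs actually lays out its trees.

```python
def cow_update(root, index, data):
    """Return a new root; `root` and every block it points to stay untouched."""
    new_block = bytes(data)                                   # written out of place
    return root[:index] + (new_block,) + root[index + 1:]    # adjusted pointers


old_root = (b"AAAA", b"BBBB", b"CCCC")
new_root = cow_update(old_root, 1, b"bbbb")
```

After the update, `old_root` still describes the previous, consistent state, and the unchanged blocks are shared between both roots. If power is lost before the new root is recorded, the old one remains valid.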


ZFS

• More than a regular file system: also a logical volume manager.

• Transactional model based on copy-on-write.

• Provides metadata AND data integrity by checksumming almost everything.

• Many advanced features such as data deduplication, snapshots and clones.
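The integrity idea can be sketched as follows: a checksum is computed when a block is written, kept apart from the block itself, and verified on every read. The helper names are hypothetical and SHA-256 is used here simply because it is available in the Python standard library; this is not a description of the ZFS on-disk format.

```python
import hashlib


def write_block(data):
    """Return the block and its checksum; the checksum is stored separately."""
    block = bytes(data)
    return block, hashlib.sha256(block).hexdigest()


def read_block(block, checksum):
    """Return the block only if it still matches the checksum taken at write time."""
    if hashlib.sha256(block).hexdigest() != checksum:
        raise ValueError("checksum mismatch: block is corrupted")
    return block


block, checksum = write_block(b"payload")
```

Because the checksum does not live inside the block it protects, a block that was silently corrupted or written to the wrong place fails verification instead of being returned as valid data.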


Log-structured file systems

• Storage media treated as a log.

• Good reliability: logging implies copy-on-write.

• High write throughput: logging allows long sequential write operations.

• Well-suited for flash media as it inherently provides wear leveling.

• Used by YAFFS and JFFS (both flash FS).
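A toy model of the log idea, assuming an in-memory list stands in for the storage media. The `LogStore` class is invented for this example and ignores garbage collection, which real log-structured file systems need in order to reclaim obsolete records.

```python
class LogStore:
    """Every write is appended to the log; nothing is updated in place."""

    def __init__(self):
        self.log = []    # sequential, append-only records: (key, value)
        self.index = {}  # key -> position of its latest record (RAM only)

    def put(self, key, value):
        self.log.append((key, value))
        self.index[key] = len(self.log) - 1

    def get(self, key):
        return self.log[self.index[key]][1]

    def rebuild_index(self):
        """Recover the index by scanning the log, as would happen at mount time."""
        self.index = {}
        for position, (key, _value) in enumerate(self.log):
            self.index[key] = position


store = LogStore()
store.put("config", b"v1")
store.put("data", b"x")
store.put("config", b"v2")  # the old record stays in the log, untouched
```

Writes always land at the end of the log, so they are sequential and spread over the media, and the in-RAM index can be lost without harm because `rebuild_index` reconstructs it from the log alone.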