
WHITEPAPER

A Comparison of Journaling and Transactional File Systems

by Datalight Staff




Executive Summary

As end-user expectations rise and embedded devices get more complex, reliable file management is rapidly becoming a commonplace requirement. The file systems available in most embedded operating systems were not specifically designed with the needs of the embedded marketplace in mind, but instead evolved out of solutions developed for desktop and server environments. These file systems have significant shortcomings in embedded devices:

1) They were not designed for use in an environment where power may be lost.

2) The error recovery processes are slow, which is typically not acceptable when “instant-on” is the user’s expectation.

3) Desktop and server file systems were not designed for use in an environment with limited resources, such as is often found in embedded devices.

This white paper looks at the limitations of desktop/server file systems in embedded devices, and then reviews the operation of two different solutions, journaling and transactional file systems, and the key differences between them.

Note: The paper assumes familiarity with file system basics. File system concepts are covered in Appendix A for those new to the topic.

Contents

Executive Summary
Introduction
Challenges with Traditional File Systems
Challenges in Building a Reliable File System
Journaling Versus Transactional File Systems
    User-data Integrity
    Performance
    Disk Space Efficiency
    Programmability
Overview of Journaling File Systems
    Examples of Journaling File Systems
Overview of Transactional File Systems
    Examples of Transactional File Systems
Appendix A
    File System Basics
Bibliography




Introduction

In the past, most embedded devices did not require file management systems. But data storage needs in the embedded marketplace have increased dramatically over the last 10 years, putting greater demands on embedded file systems.

The most popular file system for embedded devices today, FAT, originated in the desktop environment. Other embedded file systems, such as ext3, originated in the server environment.

The problem is that desktop and server environments provide controlled startup and shutdown procedures for file systems, while in the embedded world many devices operate in environments where power may be unexpectedly lost or interrupted.

Developers are currently exploring alternatives to traditional file systems like FAT and ext3 that have proven to be inadequate for today’s embedded devices. To address file system reliability in embedded devices, developers first tried journaling file systems. File systems in this category include ext3, JFS, ReiserFS, and XFS. Originally developed for use in server environments, these file systems were adopted to address the problems of power loss and system crashes seen in embedded devices.

Journaling file systems are reliable. However, there is another category of file systems called transactional file systems that are not only more reliable but, unlike journaling file systems, were specifically designed for small, resource-constrained embedded devices. Datalight Reliance is an example of such a file system.

Transactional file systems also offer better performance. The combination of reliability and performance is attractive to embedded developers who are being challenged to produce not only more reliable devices, but also devices that provide end users with a problem-free, high-performance data storage experience.

Challenges with Traditional File Systems

File systems originally created for use in desktop environments were not designed to accommodate the loss of power while writing to the disk.

When writing to any given file in any file system, several different areas of the disk must be updated. For example, in a typical FAT file system, appending data to the end of a file entails writing to four different areas on the disk:

1) Allocating a new block and marking it as “used” in the first file allocation table.

2) Marking the same block as “used” in the second file allocation table.

3) Updating the file’s directory entry to record the new length and timestamp.

4) Writing the actual data to the disk.

The first three steps involve updating the file system “metadata,” while the fourth step is updating the “user data.” Metadata is simply the logical information about the file system structure, while the user data is the actual contents of a file.

If power is interrupted at any point in this process, the file system is left in an inconsistent state because the metadata is out of sync. A reliable file system must ensure that all four steps are completed in an atomic fashion: either they are all completed in their entirety, or none of them are performed. This is referred to as “atomicity,” and it is the foundational concept for both journaling and transactional file systems.
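The effect of an interrupted append can be illustrated with a toy model. The sketch below is not a FAT implementation; the dictionary layout and the names `append_block`, `is_consistent`, and `PowerLoss` are assumptions made purely for illustration. It performs the four writes in order and simulates power failing after a chosen step.

```python
class PowerLoss(Exception):
    """Raised to simulate power failing partway through an update."""


def new_disk():
    # Hypothetical, heavily simplified stand-in for the on-disk areas.
    return {"fat1": [], "fat2": [], "dir_length": 0, "blocks": []}


def append_block(disk, data, fail_after=None):
    """Perform the four writes of an append, losing power after step `fail_after`."""
    def mark_fat1():
        disk["fat1"].append("used")          # 1) first file allocation table

    def mark_fat2():
        disk["fat2"].append("used")          # 2) second file allocation table

    def update_directory():
        disk["dir_length"] += len(data)      # 3) directory entry (length)

    def write_data():
        disk["blocks"].append(data)          # 4) the user data itself

    steps = [mark_fat1, mark_fat2, update_directory, write_data]
    for number, step in enumerate(steps, start=1):
        step()
        if fail_after == number:
            raise PowerLoss(f"power lost after step {number}")


def is_consistent(disk):
    """Both FAT copies, the directory length, and the stored data must agree."""
    stored = sum(len(block) for block in disk["blocks"])
    return (disk["fat1"] == disk["fat2"]
            and len(disk["fat1"]) == len(disk["blocks"])
            and disk["dir_length"] == stored)


results = {}
for fail_after in (None, 1, 2, 3):
    disk = new_disk()
    try:
        append_block(disk, b"hello", fail_after)
    except PowerLoss:
        pass
    results[fail_after] = is_consistent(disk)
```

In this model only the uninterrupted run leaves the four areas in agreement; every interruption between the first and last write leaves at least one area out of step with the others.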

Created with the idea that they would be used in power-stable environments, traditional file systems were not designed to provide atomicity. In order to deal with situations where the file system’s metadata structures do become corrupted, utilities such as chkdsk, scandisk, or fsck were created to scan the entire file system (usually at system startup time) for problems. In addition to being time-consuming, the scanning process provides no guarantees that the actual file data got written, only that the metadata structures are fixed.

Challenges in Building a Reliable File System

Journaling and transactional file systems were developed to address the limitations of traditional desktop file systems. The journaling approach was primarily developed to address the needs of the server market, while the transactional approach was developed to address the needs of the embedded market.

Both journaling file systems and transactional file systems use the concept of a transaction. A transaction is defined to be a file system event that is atomic. For example, a text file is either saved or not saved to a disk. By keeping file system events atomic, the state of the file system is always known, and therefore, the file system is perceived by the user to be reliable.

Always knowing the state of the file system is critical if power is lost or the system crashes. It allows the system to power back up with the file system intact. The user experiences minimal lost data.

Every file system requires writes to different areas of the disk when writing to any given file. Implementing these writes as a transaction requires that data be written in such a fashion that it can be determined which operations are valid and which operations are invalid should the process be interrupted at an inopportune moment. This requires that the on-media format be specifically designed for this, which is why it is impossible to write a FAT-compatible file system that is completely reliable. To do so would require media format changes, and this would make the system incompatible with FAT.

There are two aspects of modern computer hardware that make it very challenging to implement

a transaction:

1) Modern disk controllers may write data to the disk in a different order than that requested by the file system. For example, writes are often queued up inside the disk controller hardware. Depending on where the disk heads are in relation to the media when a write request is issued, a given sector may be written before other sectors that were already queued up.

2) Depending on the media and the block device driver design, individual sector writes may not be atomic. In most cases, physical sector writes are atomic (either completely written, or not modified at all). A truly reliable file system, however, cannot count on this.




Journaling Versus Transactional File Systems

Journaling file systems use a logging approach to manage transactions. Disk operations are recorded as transactions in a circular buffer known as “the journal.” Transactional file systems use a dual-state set of data structures to manage transactions. The two states are referred to as the “committed-state” and the “working-state.”

As a result of these two different approaches, the key differences between journaling file systems and transactional file systems fall into the following areas:

User-data Integrity

The primary focus of journaling file systems is the preservation of the file system metadata, whereas a transactional file system ensures the integrity of both the metadata and the user’s data. One notable exception is ext3, which does have modes for preserving user data as well. These options are discussed in more detail later.

A second aspect of user-data integrity is how blocks are written to the disk. In a transactional file system, user data belonging to the committed state is never overwritten. File operations that would overwrite existing data are instead written to free blocks. Should the power be lost at an inopportune moment, the committed disk state remains unchanged.
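The rule that committed data is never overwritten can be sketched as a small copy-on-write model. This is an illustration under stated assumptions, not how any particular product lays out its media: the class `CowVolume`, its one-block-per-file simplification, and its method names are all hypothetical.

```python
class CowVolume:
    """Toy volume in which blocks referenced by the committed state are never overwritten."""

    def __init__(self, num_blocks):
        self.blocks = [None] * num_blocks   # simulated media
        self.committed = {}                 # file name -> block index at last transaction point
        self.working = {}                   # file name -> block index for changes since then

    def _free_block(self):
        # A block is free only if neither state refers to it.
        in_use = set(self.committed.values()) | set(self.working.values())
        for index in range(len(self.blocks)):
            if index not in in_use:
                return index
        raise OSError("no free blocks")

    def write(self, name, data):
        """Write to a free block; the committed copy of `name` stays untouched."""
        index = self._free_block()
        self.blocks[index] = data
        self.working[name] = index

    def transaction_point(self):
        """Make the working state the new committed state."""
        self.committed.update(self.working)
        self.working.clear()

    def power_loss(self):
        """Everything written since the last transaction point is discarded."""
        self.working.clear()

    def read_committed(self, name):
        return self.blocks[self.committed[name]]


vol = CowVolume(num_blocks=4)
vol.write("config", b"v1")
vol.transaction_point()

vol.write("config", b"v2")      # lands in a free block, not on top of b"v1"
vol.power_loss()                # interrupted before the next transaction point
after_crash = vol.read_committed("config")

vol.write("config", b"v2")
vol.transaction_point()
after_commit = vol.read_committed("config")
```

After the simulated power loss the committed contents are still the old version; the new version becomes visible only once a transaction point completes.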

In a journaling file system, user data may be overwritten during the normal course of operations. Upon startup after a power loss, the journal will be replayed to fix any metadata problems. The user data, however, is in an unknown state, and it becomes the responsibility of the application to determine the state of the data. This is a difficult problem to resolve for a variety of reasons, not the least of which is that the hard disk may write the data out of order, as previously described.

Performance

Performance issues fall into two categories:

1) Operational performance. A journaling file system is typically slower than a transactional file system because metadata changes must be written twice: once to the journal and once to the actual disk. A transactional file system only writes metadata changes a single time.

2) System startup performance. In the event of a power loss, a journaling file system must open the journal and replay the events to ensure the file system integrity. This will take a variable amount of time depending on the number of events in the journal. A transactional file system needs only to perform a simple checksum on two logical disk blocks to determine which one points to the valid disk state.
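The startup check in the second item can be sketched as follows. The record layout here is an assumption for illustration only: each of the two blocks holds a sequence counter, a root block number, and a CRC32 over both. The sequence counter, the 12-byte format, and the function names are not taken from any real on-media format.

```python
import struct
import zlib


def pack_metaroot(sequence, root_block):
    """Serialize a hypothetical metaroot: sequence counter, root block number, CRC32."""
    payload = struct.pack("<II", sequence, root_block)
    return payload + struct.pack("<I", zlib.crc32(payload))


def unpack_metaroot(raw):
    """Return (sequence, root_block), or None if the record fails its checksum."""
    if len(raw) != 12:
        return None
    payload = raw[:8]
    (stored_crc,) = struct.unpack("<I", raw[8:])
    if zlib.crc32(payload) != stored_crc:
        return None
    return struct.unpack("<II", payload)


def select_committed(slot_a, slot_b):
    """Pick the valid metaroot with the highest sequence number."""
    candidates = [m for m in (unpack_metaroot(slot_a), unpack_metaroot(slot_b))
                  if m is not None]
    if not candidates:
        raise ValueError("no valid metaroot found")
    return max(candidates)   # tuples compare by sequence number first


older = pack_metaroot(sequence=7, root_block=100)
newer = pack_metaroot(sequence=8, root_block=240)
# Simulate a write that was interrupted: one payload byte is corrupted.
torn = bytes([newer[0] ^ 0xFF]) + newer[1:]

clean_choice = select_committed(older, newer)
crash_choice = select_committed(older, torn)
```

The work done at startup is the same two checksum computations regardless of how much activity preceded the power loss, which is the property the text contrasts with journal replay.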

Disk Space Efficiency

Journaling file systems require that a fixed amount of disk space be set aside for the journal itself. This is in addition to the standard file system metadata. The journal size must be large enough to contain the maximum number of events the system could ever need, and the size is determined at format time.

Transactional file systems have no such requirement, and in fact, the space needed to record the dual-state information is smaller than the overhead required by most FAT implementations.

Programmability

Journaling file systems typically operate in a completely automated fashion, about which the running applications have no specific knowledge. Automated operation is ideal for legacy programs that won’t be modified for use in the embedded system.

Transactional file systems can run in a similar automated fashion, or can be specifically controlled by an application. Many programs used in embedded devices are specifically designed for that environment and can benefit greatly by using a transactional file system that allows the application to control how transactions are committed to disk.

For example, it is not uncommon for an application to need to update several files on disk in an atomic fashion. This is a difficult problem to solve if a power interruption occurs and the application has to contain logic to recover from the interrupted operation. With programmable transactions, this is easily accommodated.

Overview of Journaling File Systems

Journaling file systems were developed by the database community¹ to solve the problem of data loss due to a system crash or power loss in server environments. Journaling file systems ensure that file system operations are atomic. This approach enables the file system to be structurally sound at all times. The user perceives that her data is reliable.

The most basic unit of journaling is called a “transaction.” In the context of a journaling file system, a transaction can be considered to be a single file operation. For example, a transaction could be “to create file A” or “to delete file B.”

Each transaction consists of a record of a sequence of changes made to separate disk sectors during a file operation. When the last modification within a transaction is complete, the contents of the transaction are written to a log.

A log is a fixed-sized, continuous area on the disk that the journaling code uses as a circular buffer. The log is written only during normal operation, and when old transactions complete, their space in the log is reclaimed.

The key to journaling is that the disk blocks modified during a transaction are not written until after the entire transaction is successfully written to the log.

By buffering the transaction in memory until it is complete, journaling avoids partially written transactions. If the system crashes before successfully writing the journal, the entry is not considered valid. If the system goes down after the journal is written, then when the device reboots, it examines the log and replays outstanding transactions.

1 Practical File System Design with the Be File System, Dominic Giampaolo, Morgan Kaufmann Publishers, Inc., San Francisco, CA, 1999, page 112

Strengths of Journaling File Systems

• Protects file system structures

• Atomic operations

Weaknesses of Journaling File Systems

• Protection for user data negatively impacts performance and overhead

• Slows down as disk utilization increases

• Mount time slows in proportion to the size of the journal

Two different approaches to journaling are used by journaling file systems². The difference relates to what information is written to the log:

1) Journaling file systems that log changes to metadata.

2) Journaling file systems that can log changes to both metadata and user data.

With either approach, logging changes to file system metadata is what guarantees the integrity of a journaling file system. After a system crash, the structure of files, directories, and the file system can be made consistent by re-executing any pending changes that are completely described in the log.
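The log-then-write rule and the replay step can be sketched with a toy metadata journal. This is a minimal illustration, not the design of ext3, JFS, ReiserFS, or XFS; the class `JournaledDisk`, the `committed` flag standing in for an on-disk commit record, and the unbounded Python list standing in for the fixed-size circular log are all assumptions.

```python
class JournaledDisk:
    """Toy metadata journal: changes are logged in full before home blocks are touched."""

    def __init__(self):
        self.blocks = {}    # home locations: block number -> contents
        self.journal = []   # log records not yet copied to their home locations

    def log_transaction(self, changes, crash_before_commit=False):
        """Append a transaction to the log; it is valid only once its commit mark is set."""
        record = {"changes": dict(changes), "committed": False}
        self.journal.append(record)
        if crash_before_commit:
            return                     # power lost: the record is left incomplete
        record["committed"] = True     # the commit mark is written last

    def checkpoint(self):
        """Copy completely logged changes to their home blocks, then reclaim log space."""
        for record in self.journal:
            if record["committed"]:
                self.blocks.update(record["changes"])
        self.journal.clear()

    def recover(self):
        """At mount time, replay complete transactions and ignore incomplete ones."""
        self.checkpoint()


disk = JournaledDisk()
disk.log_transaction({10: "directory entry for A", 11: "allocation bitmap"})
disk.log_transaction({12: "directory entry for B"}, crash_before_commit=True)
disk.recover()
```

After recovery the first transaction has been applied in full and the interrupted one has left no trace, so the metadata is consistent. Note that every logged change is written twice in this scheme, once to the journal and once to its home block.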

Journaling file systems that support the logging of user data are rarely implemented. In addition to being very slow, another shortcoming is that the log must be much larger due to the need to record both user data and metadata.

Examples of Journaling File Systems

The operations³ of several journaling file systems that were developed for use in a server environment and later adapted for use in embedded devices are described in this section.

Linux Ext3

Ext3 is a Linux-based journaling file system. Ext3 users can specify whether they want to log all changes to both file data and metadata or whether they want to log only metadata changes.

Selecting between logging all data and metadata changes (the ext3 journaled mode, data=journal) or simply logging metadata changes (the ext3 file system’s writeback mode, data=writeback) is done through mount options supplied when an ext3 file system is mounted.

Logging changes to both data and metadata is both more robust and substantially slower than logging metadata changes only. It is more robust because it includes a complete record of changes to all file system data in the log; it is slower because each committed file system update actually causes two sets of writes: the first set to the log when all the pending changes are logged, and the second set when those changes are actually made to the file system.

The ext3 file system’s third logging mode, ordered logging (data=ordered), provides most of the guarantees of fully journaled data mode without the performance penalties inherent in that mode. It does this by flushing all data associated with a transaction to the disk before the transaction itself commits.

IBM JFS

JFS is IBM’s full 64-bit journaling file system. It logs information about changes to the file system metadata as atomic transactions. If an embedded device is restarted without cleaning (or unmounting) a JFS fileset, any transactions in the log that are not marked as having been completed on disk are replayed when the file system is checked for consistency before it is mounted. This restores the consistency of the file system but not the contents of the files in the file system. Files being edited when the system went down will not reflect any updates not successfully written to the disk.

2 Linux File Systems, William von Hagen, Sams Publishing, 2002

3 Ibid.

ReiserFS

ReiserFS is built into every version of Linux running a 2.4.1 or greater kernel. ReiserFS journals file system metadata updates rather than both data changes and metadata updates. ReiserFS uses some clever strategies to maximize metadata consistency, even in the event of a sudden system failure. For example, when updating file system metadata, ReiserFS does not overwrite the existing metadata but instead writes it to a new location as close as possible to the existing metadata.

SGI XFS

XFS was developed by SGI for its UNIX multimedia workstations. XFS provides high throughput for streaming video and audio, support for huge files, and the ability to store large amounts of data. XFS file systems are full 64-bit file systems composed of three areas: the data section, the log, and an optional real-time section. The log includes only file system metadata.

Overview of Transactional File Systems

Transactional file systems were developed to solve the problem of data loss in embedded systems. Rather than ensuring that each file system operation is atomic, a transactional file system ensures that all file system operations between two points in time are atomic. The advantages of this approach are higher reliability, better performance, and improved disk space efficiency.

These file system operations can be file additions, changes, or deletions, or directory additions or deletions. The transactional file system writes data to the disk. The data is defined to be committed or “live” data only when the transaction point occurs.

Examples of Transactional File Systems

Datalight Reliance

Committed and Working States

The Reliance file system has two distinct states: the committed (on-disk) state and the working state. The committed state is the state found on the media at initialization time. The committed state is also the state of the file system as written to the media at the last completed transaction point.

Reliance uses a concept of a “metaroot” to refer to the start of the disk state. The committed state and the working state each have their own metaroot. The metaroot is the base of a hierarchical structure of file metadata that represents everything stored on the disk.

The working state is essentially a set of deltas since the last transaction point. On a freshly formatted disk, and immediately after a transaction point, the working state is empty. As changes are made, the working state is built by updating the metaroot with the changes made to the committed state.

The working state consists of the file system’s logical structures in memory, data in the disk

Strengths of Transactional File Systems

• Protects file system structures

• Protects user data

• Live data never overwritten

• Atomic operations

• Consistent mount times regardless of disk size

Weaknesses of Transactional File Systems

• For optimal performance, transaction settings should be adapted for specific use cases



Copyright © 2013 Datalight, Inc. All rights reserved. DATALIGHT, Datalight, the Datalight Logo, FlashFX, FlashFX Pro, FlashFX Tera, FlashFXe, Reliance, Reliance Nitro, ROM-DOS, and Sockets are trademarks or registered trademarks of Datalight, Inc. All other product names are trademarks of their respective holders. Specification and price change privileges reserved.


gram is doing data logging, timed transactions might be perfect. However, when the program is updating its configuration data files, explicit control is often useful. Applications often have the need to update a group of files in an atomic fashion: either they all get updated, or none of them do. This is a difficult problem to solve in an unstable power environment.

With Datalight Reliance, the application developer can set the default model to automatic or timed transactions, and then programmatically disable that mode, perform operations on a whole group of files, perform an explicit transaction point, and then re-enable the default transaction mode.
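The sequence just described (disable the default mode, update the group, perform an explicit transaction point, re-enable the default mode) can be sketched as a pattern. The code below does not use or imitate the Reliance API; `TransactionalVolume`, `write_file`, `transaction_point`, and `atomic_group` are hypothetical names on a toy in-memory model, chosen only to show the control flow.

```python
from contextlib import contextmanager


class TransactionalVolume:
    """Toy dual-state volume with an automatic transaction mode that can be suspended."""

    def __init__(self, auto_transact=True):
        self.auto_transact = auto_transact
        self.committed = {}   # state as of the last transaction point
        self.working = {}     # changes made since then

    def write_file(self, name, data):
        self.working[name] = data
        if self.auto_transact:
            self.transaction_point()   # default mode: commit after every operation

    def transaction_point(self):
        self.committed.update(self.working)
        self.working.clear()

    def power_loss(self):
        self.working.clear()           # uncommitted changes are discarded


@contextmanager
def atomic_group(volume):
    """Suspend automatic transactions, commit once at the end, then restore the mode."""
    previous = volume.auto_transact
    volume.auto_transact = False
    try:
        yield volume
        volume.transaction_point()     # reached only if the whole group succeeded
    finally:
        volume.auto_transact = previous


vol = TransactionalVolume()
vol.write_file("net.cfg", "old")
vol.write_file("ui.cfg", "old")

try:
    with atomic_group(vol):
        vol.write_file("net.cfg", "new")
        raise RuntimeError("simulated power loss before ui.cfg was written")
except RuntimeError:
    vol.power_loss()
after_crash = dict(vol.committed)

with atomic_group(vol):
    vol.write_file("net.cfg", "new")
    vol.write_file("ui.cfg", "new")
after_commit = dict(vol.committed)
```

In the interrupted run neither file changes; in the completed run both change together, and the volume returns to its default automatic mode afterwards.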

Datalight Reliance Nitro

Reliance Nitro is the next-generation file system from Datalight that improves upon Reliance. It shares the same transaction models and data protection features and adds a more sophisticated metadata structure to improve performance, particularly for systems that have many small files or complex directory structures. For more about the advantages gained from the Reliance Nitro architecture, see our whitepaper: “Achieving Breakthrough Performance From Tree-based File Systems.”

Appendix A

File System Basics

This white paper assumes some familiarity with file system basics. Basic file system definitions and concepts are outlined in this section.

A file system is a way to organize, store, retrieve, and manage information on a permanent storage medium such as a hard disk or flash memory.⁴

Each file system has a block size. The block size is defined to be the smallest unit that a file system can write. Everything a file system does is composed of operations done on blocks. Basic file system operations include creating a file, opening a file, writing to a file, and so on.

A file system block is a logical unit rather than a physical unit. The logical block size of a file system is either the same size or a multiple of the sector size of the underlying storage medium. Selecting the right logical block size is a compromise between wasting as little disk space as possible and minimizing the number of blocks that have to be allocated to store a file.
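The block size compromise can be made concrete with a little arithmetic. The sketch below assumes each file occupies a whole number of blocks; the three file sizes are made-up values chosen for illustration, and `allocation_cost` is a hypothetical helper.

```python
def allocation_cost(file_sizes, block_size):
    """Return (blocks allocated, bytes wasted) when each file occupies whole blocks."""
    # Integer ceiling division: blocks needed to hold `size` bytes.
    blocks = sum((size + block_size - 1) // block_size for size in file_sizes)
    wasted = blocks * block_size - sum(file_sizes)
    return blocks, wasted


file_sizes = [100, 5_000, 70_000]            # hypothetical file sizes in bytes

small = allocation_cost(file_sizes, 512)     # (148, 676)
large = allocation_cost(file_sizes, 4096)    # (21, 10916)
```

With 512-byte blocks these files need 148 blocks and waste 676 bytes; with 4096-byte blocks they need only 21 blocks but waste 10,916 bytes.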

User data is the named piece of information contained in a file. This piece of information may be any of the following: text such as a letter, text such as program source code, a database, or a graphic image.

Metadata is a piece of information about a file. Metadata may include the name of a file as well as other information such as its owner, creation time, size, and date of last modification.

An i-node is a location where a file system stores metadata about a file. The i-node also provides a pointer to the contents of the file on disk.

The volume of a file system refers to the embedded disk or disks that have been initialized with a file system. The term volume may refer to all the blocks on a disk, a portion of the blocks, or it can even refer to a span of blocks across several disks.

The superblock of a file system is an area where a file system stores its critical volume-wide information. A superblock contains information such as the name and size of a volume. In some systems the superblock may be referred to as the master block or the boot record.

Sector size or block size is the minimum unit that the storage medium can read or write. The block or sector size of most modern hard disks is 512 bytes. Flash memory management software manages flash memory so that it appears as a hard drive with 512-byte sectors even though typical block sizes of flash memory are…

A file system directory is a way to name and organize multiple files. The main purpose of a directory is to manage a list of files and to connect the name in the directory with the associated files.

4 Practical File System Design with the Be File System, Dominic Giampaolo, Morgan Kaufmann Publishers, Inc., San Francisco, CA, 1999, page 7.

Basic file system operations include initialize, mount, unmount, create a file, open a file, create a directory, write to files, read files, delete files, rename files, open directories, and read directories. These operations are fairly self-explanatory with the exception of initialization, mounting, and unmounting, which are defined below.

Initialization of a file system occurs when an operating system creates an empty file system on a given volume. The file system uses the volume size and any other user-specified options to determine the size and placement of its internal data structures.

Mounting a file system consists of several tasks: accessing a raw storage device, reading the superblock and other file system metadata, and then preparing the file system for access to a volume. A part of this preparation is verifying that the file system is valid. An alternate term for a valid file system is a “clean” or “consistent” file system, meaning that the metadata and the user data are consistent with each other. Full verification of a file system can take a long time, especially if the superblock indicates that the volume is “dirty,” usually the result of an unexpected power loss.

Unmounting of a file system involves flushing out to disk all in-memory state associated with the volume. Once all the in-memory data (data in RAM or other volatile memory) is written to the volume, the volume is said to be “clean.” The last operation of unmounting is to mark the superblock to indicate that a normal shutdown occurred.
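The clean and dirty states can be sketched with a toy volume. The model assumes the superblock is marked dirty when the volume is mounted, which is a common convention but is an assumption here rather than something stated above; the class `Volume` and its `full_checks` counter are hypothetical.

```python
class Volume:
    """Toy volume whose superblock records whether the last shutdown was clean."""

    def __init__(self):
        self.superblock = {"clean": True}
        self.mounted = False
        self.full_checks = 0   # counts slow full-volume verifications

    def mount(self):
        if not self.superblock["clean"]:
            self.full_checks += 1            # stand-in for a lengthy consistency scan
        self.superblock["clean"] = False     # assumption: marked dirty while in use
        self.mounted = True

    def unmount(self):
        # Flushing in-memory state to the volume would happen here.
        self.superblock["clean"] = True      # last step: record the normal shutdown
        self.mounted = False

    def power_loss(self):
        self.mounted = False                 # the clean mark is never written


vol = Volume()
vol.mount()
vol.unmount()
vol.mount()                      # previous shutdown was normal: no full check
checks_after_clean = vol.full_checks

vol.power_loss()
vol.mount()                      # superblock still says dirty: full check needed
checks_after_crash = vol.full_checks
```

Only the mount that follows the simulated power loss triggers the slow verification path.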

Bibliography

UNIX Filesystems: Evolution, Design, and Implementation, Steve D. Pate, Wiley, 2003

Linux File Systems, William von Hagen, Sams Publishing, 2002

Practical File System Design with the Be File System, Dominic Giampaolo, Morgan Kaufmann Publishers, Inc., San Francisco, CA, 1999
