Chap 5. Managing Files of Records

Uploaded by sarai-wiles on 14-Dec-2015


Page 1: Chap 5. Managing Files of Records. Chapter Objectives  Extend the file structure concepts of Chapter 4: Search keys and canonical forms Sequential search

Chap 5. Managing Files of Records


Chapter Objectives

Extend the file structure concepts of Chapter 4:

Search keys and canonical forms

Sequential search and Direct access

File access and file organization

Examine other kinds of file structures in terms of

Abstract data models

Metadata

Object-oriented file access

Extensibility

Examine issues of portability and standardization.


Contents

5.1 Record Access

5.2 More about Record Structures

5.3 Encapsulating Record I/O Ops in a Single Class

5.4 File Access and File Organization

5.5 Beyond Record Structures

5.6 Portability and Standardization


Record Access

Record Key

Canonical form: a standard form of a key

– e.g. Ames, ames, and AMES must be converted to a single form before they are compared

Distinct keys: uniquely identify a single record

Primary keys, secondary keys, candidate keys

– Primary keys should be dataless (they should not hold data that may need updating)

– Primary keys should be unchanging

Social Security number: a seemingly good primary key

– but if 999-99-9999 is used for all non-registered aliens, the key is not distinct
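A minimal sketch of key canonicalization, assuming the canonical form is all uppercase with leading and trailing blanks removed. `CanonicalKey` and `KeysMatch` are hypothetical helper names, not part of the chapter's classes.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the canonical form of a key, assumed here to be
// all uppercase with leading and trailing blanks removed.
std::string CanonicalKey(const std::string& key) {
    const std::size_t first = key.find_first_not_of(' ');
    if (first == std::string::npos) return "";   // empty or all-blank key
    const std::size_t last = key.find_last_not_of(' ');
    std::string result = key.substr(first, last - first + 1);
    for (char& c : result) {
        // cast to unsigned char so toupper is well defined for every byte value
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return result;
}

// Two keys refer to the same record only if their canonical forms are identical.
bool KeysMatch(const std::string& a, const std::string& b) {
    return CanonicalKey(a) == CanonicalKey(b);
}
```

With this convention, `KeysMatch("Ames", "AMES")` is true, so a search for either spelling finds the same record.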


Sequential Search (1)

O(n), where n is the number of records

Use record blocking

• A block holds several records: fields < records < blocks

• Still O(n), but blocking decreases the number of seeks

• e.g. 4000 records, each 512 bytes long

– Unblocked (sector-sized buffers): 512-byte buffer => on average 2000 READ() calls

– Blocked (16 records/block): 8K buffer => on average 125 READ() calls
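The READ() counts for this example follow from simple arithmetic: the file occupies ceil(records / records-per-block) blocks, and a successful sequential search reads about half of them on average. A small sketch of that calculation (`AverageReads` is an illustrative name):

```cpp
// Average number of READ() calls for a successful sequential search, assuming
// one READ() per block and that the target is found halfway through the file.
long AverageReads(long numRecords, long recordsPerBlock) {
    const long blocks = (numRecords + recordsPerBlock - 1) / recordsPerBlock;  // round up
    return (blocks + 1) / 2;  // half of the blocks, rounded up
}
```

`AverageReads(4000, 1)` gives 2000 and `AverageReads(4000, 16)` gives 125; the blocked buffer is 16 × 512 = 8192 bytes, the 8K buffer in the example. Blocking does not change the O(n) growth, only the constant factor.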


Sequential Search (2)

UNIX tools for sequential processing

cat, wc, grep

When sequential search is useful

Searching patterns in ASCII files

Processing files with few records

Searching for records with a certain secondary key value
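As a sketch of the last case, the function below visits every record in order (an O(n) pass, much like `grep`) and collects those whose secondary-key field matches. It assumes one record per line with fields separated by `|`; the function name and the field-index parameter are hypothetical.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: sequentially reads every record (one per line, fields
// separated by '|') and returns those whose field at fieldIndex equals value.
std::vector<std::string> FindBySecondaryKey(std::istream& in, std::size_t fieldIndex,
                                            const std::string& value) {
    std::vector<std::string> matches;
    std::string record;
    while (std::getline(in, record)) {              // every record is examined: O(n)
        std::istringstream fields(record);
        std::string field;
        for (std::size_t i = 0; std::getline(fields, field, '|'); ++i) {
            if (i == fieldIndex) {
                if (field == value) matches.push_back(record);
                break;                              // only the key field matters
            }
        }
    }
    return matches;
}
```

Because a secondary key such as a state code may match many records, the scan cannot stop at the first hit, which is why sequential search is a natural fit here.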


Direct Access

O(1) operation

RRN (Relative Record Number)

Gives the relative position of the record

For fixed-length records: byte offset = n × r

r: record size, n: RRN value

Class IOBuffer includes

direct read (DRead)

direct write (DWrite)
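A minimal sketch of direct access by RRN for fixed-length records: compute the byte offset n × r and seek straight to it. It assumes no header record precedes the data (a header would add a constant to the offset); `ReadByRRN` is a hypothetical name, not the IOBuffer `DRead` method.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>   // std::istringstream, for trying the function on in-memory data
#include <string>

// Hypothetical helper: reads the record whose relative record number is rrn from a
// stream of fixed-length records, each recordSize bytes long, with no header.
// Returns false if the record does not exist.
bool ReadByRRN(std::istream& in, long rrn, long recordSize, std::string& record) {
    const std::streamoff offset = static_cast<std::streamoff>(rrn) * recordSize;  // n * r
    in.clear();                          // clear EOF/fail state left by an earlier read
    in.seekg(offset, std::ios::beg);     // jump straight to the record: O(1)
    record.assign(static_cast<std::size_t>(recordSize), '\0');
    in.read(&record[0], recordSize);
    return in.gcount() == recordSize;    // short read means the record is missing
}
```

For a stream holding `"AAAABBBBCCCC"` with 4-byte records, RRN 1 yields `"BBBB"` after a single seek, without reading record 0.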


Choosing a Record Length and Structure

• Record length is related to the size of the fields

• Trade-offs: access vs. fragmentation vs. implementation

• Fixed-length records

(a) With fixed-length fields

(b) With variable-length fields

The unused space is filled with null characters in C

(a) Fixed-length fields:
Ames      John      123 Maple      Stillwater    OK74075
Mason     Alan      90 Eastgate    Ada           OK74820

(b) Variable-length fields (the rest of each record is unused space):
Ames|John|123 Maple|Stillwater|OK|74075|
Mason|Alan|90 Eastgate|Ada|OK|74820|
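A sketch of case (b): the fields are written with `|` delimiters and the remainder of the fixed-length record is padded with null characters. The function name and the 64-byte record size used in the example are illustrative assumptions.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: packs variable-length fields, each followed by '|',
// into a fixed-length record; the unused space is filled with '\0'.
// Returns an empty string if the fields do not fit in recordSize bytes.
std::string PackFixedRecord(const std::vector<std::string>& fields, std::size_t recordSize) {
    std::string record;
    for (const std::string& f : fields) {
        record += f;
        record += '|';
    }
    if (record.size() > recordSize) return "";   // fields too long for this record size
    record.resize(recordSize, '\0');             // pad the unused space with nulls
    return record;
}
```

The Ames record above needs 40 bytes, so in a 64-byte record 24 bytes are wasted: this is the fragmentation side of the trade-off, paid in exchange for being able to compute record offsets directly.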


Header Records

General information about the file

e.g. date and time of the most recent update, count of the number of records

The header record is often placed at the beginning of the file

Header records are a widely used, important file design tool
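A minimal sketch of a header record at the beginning of a file. It assumes a hypothetical binary layout holding the record count and the last-update time, written in the machine's native byte order (so the file is not portable across architectures with different byte orders).

```cpp
#include <cstdint>
#include <iostream>
#include <sstream>

// Hypothetical header layout: record count and last-update time (seconds since
// the epoch), stored in native byte order at offset 0 of the file.
struct FileHeader {
    std::uint32_t recordCount = 0;
    std::int64_t lastUpdate = 0;
};

bool WriteFileHeader(std::ostream& out, const FileHeader& h) {
    out.seekp(0, std::ios::beg);   // the header lives at the beginning of the file
    out.write(reinterpret_cast<const char*>(&h.recordCount), sizeof h.recordCount);
    out.write(reinterpret_cast<const char*>(&h.lastUpdate), sizeof h.lastUpdate);
    return static_cast<bool>(out);
}

bool ReadFileHeader(std::istream& in, FileHeader& h) {
    in.seekg(0, std::ios::beg);
    in.read(reinterpret_cast<char*>(&h.recordCount), sizeof h.recordCount);
    in.read(reinterpret_cast<char*>(&h.lastUpdate), sizeof h.lastUpdate);
    return static_cast<bool>(in);
}
```

Since the header has a fixed size (12 bytes here), data records start at a known offset, and the record count can be updated in place after each append.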


IOBuffer Class Definition (1)

class IOBuffer
// Abstract base class for file buffers
{
public:
    virtual int Read (istream &) = 0;          // read a buffer from the stream
    virtual int Write (ostream &) const = 0;   // write a buffer to the stream
    // these are the direct access read and write operations
    virtual int DRead (istream &, int recref);          // read specified record
    virtual int DWrite (ostream &, int recref) const;   // write specified record
    // these header operations return the size of the header
    virtual int ReadHeader (istream &);
    virtual int WriteHeader (ostream &) const;
protected:
    int Initialized;   // TRUE if buffer is initialized
    char * Buffer;     // character array to hold field values
    // ... (other members not shown on this slide)
};


IOBuffer Class Definition (2)

The full definition of the buffer class hierarchy

Write method: adds a header to a file and returns the number of bytes in the header

Read method: reads the header and checks it for consistency

WriteHeader method: writes the string "IOBuffer" at the beginning of the file

ReadHeader method: reads the record size from the header and checks that its value is the same as that of the BufferSize member of the buffer object

DWrite/DRead methods: operate using the byte address of the record as the record reference; the DRead method begins by seeking to the requested spot


ReadHeader, WriteHeader Member Functions

#include <cstring>    // strncmp
#include <iostream>
using namespace std;

static const char headerStr[] = "IOBuffer";
// sizeof yields a compile-time constant, so headerSize can legally size the array str
static const int headerSize = sizeof(headerStr) - 1;

int IOBuffer::ReadHeader (istream & stream)
{
    char str[headerSize+1];
    stream.seekg (0, ios::beg);
    stream.read (str, headerSize);
    if (!stream.good ()) return -1;
    if (strncmp (str, headerStr, headerSize) == 0) return headerSize;
    else return -1;
}

int IOBuffer::WriteHeader (ostream & stream) const
{
    stream.seekp (0, ios::beg);
    stream.write (headerStr, headerSize);
    if (!stream.good ()) return -1;
    return headerSize;
}


WriteHeader Member Function – VariableLengthBuffer

int VariableLengthBuffer::WriteHeader (ostream & stream) const
// write a buffer header to the beginning of the stream
// A header consists of the
//   IOBUFFER header
//   header string
//   Variable sized record of length fields
//   that describes the file records
{
    int result;
    // write the parent (IOBuffer) header
    result = IOBuffer::WriteHeader (stream);
    if (result < 0) return FALSE;   // IOBuffer::WriteHeader returns -1 on failure
    // write the header string
    stream.write (headerStr, headerSize);
    if (!stream.good ()) return FALSE;
    // no record description is written in this version; return the header size
    return stream.tellp ();
}


ReadHeader Member Function - VariableLengthBuffer

int VariableLengthBuffer::ReadHeader (istream & stream)
// read the header and check for consistency
{
    char str[headerSize+1];
    int result;
    // read the IOBuffer header
    result = IOBuffer::ReadHeader (stream);
    if (result < 0) return FALSE;   // IOBuffer::ReadHeader returns -1 on failure
    // read the header string
    stream.read (str, headerSize);
    if (!stream.good ()) return FALSE;
    if (strncmp (str, headerStr, headerSize) != 0) return FALSE;
    // no record description is read in this version; return the header size
    return stream.tellg ();
}


Read Member Function - VariableLengthBuffer

int VariableLengthBuffer::Read (istream & stream)
// read one record: an unsigned short record length followed by that many bytes
// returns the address of the record, or -1 on failure
{
    if (stream.eof ()) return -1;
    int recaddr = stream.tellg ();
    Clear ();
    unsigned short bufferSize;
    stream.read ((char *)&bufferSize, sizeof(bufferSize));
    if (!stream.good ()) { stream.clear (); return -1; }
    BufferSize = bufferSize;
    if (BufferSize > MaxBytes) return -1;   // buffer overflow
    stream.read (Buffer, BufferSize);
    if (!stream.good ()) { stream.clear (); return -1; }
    return recaddr;
}


Write Member Function - VariableLengthBuffer

int VariableLengthBuffer::Write (ostream & stream) const
// write the length and buffer into the stream
{
    int recaddr = stream.tellp ();
    unsigned short bufferSize;
    bufferSize = BufferSize;
    stream.write ((char *)&bufferSize, sizeof(bufferSize));
    if (!stream) return -1;
    stream.write (Buffer, BufferSize);
    if (!stream.good ()) return -1;
    return recaddr;
}


DRead, DWrite Member Functions

int IOBuffer::DRead (istream & stream, int recref)
// read specified record
{
    stream.seekg (recref, ios::beg);
    if (stream.tellg () != recref) return -1;
    return Read (stream);
}

int IOBuffer::DWrite (ostream & stream, int recref) const
// write specified record
{
    stream.seekp (recref, ios::beg);
    if (stream.tellp () != recref) return -1;
    return Write (stream);
}


Encapsulating Record I/O Ops in a Single Class (1)

Good design for making objects persistent

provide operations to read and write objects directly

Write operation until now:

two operations: pack into a buffer + write the buffer to a file

Class ‘RecordFile’

supports a write operation that takes an object of some class and writes it to a file, and a matching read operation that reads an object from the file

the use of buffers is hidden inside the class

Problem with defining class ‘RecordFile’:

– how to make it possible to support files for different object types without needing different versions of the class


BufferFile Class Definition

class BufferFile   // file with buffers
{
public:
    BufferFile (IOBuffer &);               // create with a buffer
    int Open (char * fname, int MODE);     // open an existing file
    int Create (char * fname, int MODE);   // create a new file
    int Close ();
    int Rewind ();                         // reset to the first data record
    // Input and Output operations
    int Read (int recaddr = -1);
    int Write (int recaddr = -1);
    int Append ();                         // write the current buffer at the end of file
protected:
    IOBuffer & Buffer;   // reference to the file's buffer
    fstream File;        // the C++ stream of the file
};

Usage: DelimFieldBuffer buffer; BufferFile file (buffer);

file.Open (myfile, ios::in); file.Read (); myobject.Unpack (buffer);


Encapsulating Record I/O Ops in a Single Class (2)

Class ‘RecordFile’

uses C++ template features to solve the problem

definition of the template class RecordFile

template <class RecType>
class RecordFile : public BufferFile
{
public:
    int Read (RecType & record, int recaddr = -1);
    int Write (const RecType & record, int recaddr = -1);
    RecordFile (IOBuffer & buffer) : BufferFile (buffer) { }
};


// template method bodies
// (default arguments belong only in the class definition; repeating them here is an error)
template <class RecType>
int RecordFile<RecType>::Read (RecType & record, int recaddr)
{
    int readAddr, result;
    readAddr = BufferFile::Read (recaddr);
    if (readAddr < 0) return -1;   // assumes BufferFile::Read, like IOBuffer::Read, returns -1 on failure
    result = record.Unpack (Buffer);
    if (!result) return -1;
    return readAddr;
}

template <class RecType>
int RecordFile<RecType>::Write (const RecType & record, int recaddr)
{
    int result;
    result = record.Pack (Buffer);
    if (!result) return -1;
    return BufferFile::Write (recaddr);
}
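The template technique can be tried without the full buffer class hierarchy. The sketch below is a simplified stand-in, not the chapter's classes: `SimpleBuffer`, `SimpleRecordFile`, and `Person` are hypothetical, the buffer holds one `|`-delimited text line, and any type that provides `Pack` and `Unpack` with these signatures can be stored.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for IOBuffer: holds one record as a line of text.
struct SimpleBuffer {
    std::string data;
};

// Any record type with these Pack/Unpack members works with SimpleRecordFile.
struct Person {
    std::string last, first;
    bool Pack(SimpleBuffer& b) const { b.data = last + "|" + first + "|"; return true; }
    bool Unpack(const SimpleBuffer& b) {
        std::istringstream in(b.data);
        return std::getline(in, last, '|') && std::getline(in, first, '|');
    }
};

// Simplified analogue of RecordFile<RecType>: the buffer is hidden inside the class.
template <class RecType>
class SimpleRecordFile {
public:
    explicit SimpleRecordFile(std::iostream& file) : file_(file) {}
    bool Write(const RecType& record) {
        if (!record.Pack(buffer_)) return false;        // object -> buffer
        file_ << buffer_.data << '\n';                  // buffer -> file
        return static_cast<bool>(file_);
    }
    bool Read(RecType& record) {
        if (!std::getline(file_, buffer_.data)) return false;   // file -> buffer
        return record.Unpack(buffer_);                           // buffer -> object
    }
private:
    std::iostream& file_;
    SimpleBuffer buffer_;
};
```

The caller only ever sees `Person` objects; a second record type would reuse the same template without a new version of the file class, which is the problem the template is meant to solve.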


File Access and File Organization

There is a difference between file access and file organization.

Variable-length records: sequential access is suitable

Fixed-length records: both direct access and sequential access are possible

File organization          File access
Variable-length records    Sequential access
Fixed-length records       Direct access


Abstract Data Model

Data objects such as documents, images, and sound

e.g. color raster images, FITS image files

An abstract data model does not view data as it appears on a particular medium; it takes an application-oriented view

Headers and self-describing files


Metadata

Data that describe the primary data in a file

A place to store metadata in a file is the header record

Standard format

FITS (Flexible Image Transport System), a standard endorsed by the International Astronomical Union (see Figure 5.7)
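FITS stores this metadata as 80-character header lines ("card images") of the form `KEYWORD = value`. The sketch below formats one such line in a simplified fixed format, assuming the keyword occupies columns 1-8, `= ` occupies columns 9-10, and the value is right-justified to end in column 30; comments and string quoting are omitted, and `MakeCard` is a hypothetical name.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: builds a simplified FITS-style header card.
// Columns are 1-based in the comments and 0-based in the code.
std::string MakeCard(const std::string& keyword, const std::string& value) {
    std::string card(80, ' ');                          // every card is 80 characters
    const std::string key = keyword.substr(0, 8);       // keyword: columns 1-8
    card.replace(0, key.size(), key);
    card.replace(8, 2, "= ");                           // value indicator: columns 9-10
    const std::string val = value.substr(0, 20);        // value field: columns 11-30
    card.replace(30 - val.size(), val.size(), val);     // right-justified to column 30
    return card;
}
```

For example, `MakeCard("NAXIS", "2")` yields a card beginning `NAXIS   = ` with `2` in column 30. Because the layout is plain fixed-width text, a reader on any machine can parse the metadata without knowing the writer's byte order.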


Mixing Object Types in a File

Each field is identified using “keyword = value”

Index table with tags

e.g.
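A sketch of keyword-based field identification: the function parses lines of the form `keyword = value` into a table keyed by keyword, so fields can appear in any order and keywords the program does not use are still kept. The line-oriented text layout and the function names are assumptions for illustration.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Removes leading and trailing blanks.
static std::string Trim(const std::string& s) {
    const auto first = s.find_first_not_of(' ');
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(' ');
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: reads "keyword = value" lines; lines without '=' are skipped.
std::map<std::string, std::string> ParseKeywordValues(std::istream& in) {
    std::map<std::string, std::string> fields;
    std::string line;
    while (std::getline(in, line)) {
        const auto eq = line.find('=');
        if (eq == std::string::npos) continue;   // not a keyword = value line
        fields[Trim(line.substr(0, eq))] = Trim(line.substr(eq + 1));
    }
    return fields;
}
```

The cost of this flexibility is space: every field carries its own keyword, unlike a fixed-position record.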


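Object-Oriented File Access

Separate the translation to and from the physical format from the application (representation-independent file access)

[Figure: the program find_star reads the image "star1" from disk into an in-memory image object in RAM; the disk holds the images star1 and star2]

program find_star
    ...
    read_image ("star1", image)
    ...
    process image
    ...
end find_star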
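A minimal C++ sketch of the same idea, under assumptions stated here: the application function works only with an in-memory `Image`, and an abstract `ImageSource` hides how images are laid out on disk. `Image`, `ImageSource`, `TextImageSource`, and `FindBrightest` are hypothetical names, and the simple text format stands in for a real physical format such as FITS.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// In-memory representation the application sees (hypothetical).
struct Image {
    int width = 0, height = 0;
    std::vector<int> pixels;                 // row-major pixel values
};

// Abstract reader: translates from some physical format into an Image.
class ImageSource {
public:
    virtual ~ImageSource() = default;
    virtual bool ReadImage(const std::string& name, Image& image) const = 0;
};

// One possible physical format: "width height p0 p1 ..." stored as text per name.
class TextImageSource : public ImageSource {
public:
    void Store(const std::string& name, const std::string& raw) { files_[name] = raw; }
    bool ReadImage(const std::string& name, Image& image) const override {
        const auto it = files_.find(name);
        if (it == files_.end()) return false;
        std::istringstream in(it->second);
        if (!(in >> image.width >> image.height)) return false;
        if (image.width < 0 || image.height < 0) return false;
        image.pixels.assign(static_cast<std::size_t>(image.width) * image.height, 0);
        for (int& p : image.pixels)
            if (!(in >> p)) return false;
        return true;
    }
private:
    std::map<std::string, std::string> files_;
};

// Application code (like find_star): depends only on Image and ImageSource.
// Returns the index of the brightest pixel, or -1 if the image cannot be read.
int FindBrightest(const ImageSource& source, const std::string& name) {
    Image image;
    if (!source.ReadImage(name, image) || image.pixels.empty()) return -1;
    std::size_t best = 0;
    for (std::size_t i = 1; i < image.pixels.size(); ++i)
        if (image.pixels[i] > image.pixels[best]) best = i;
    return static_cast<int>(best);
}
```

Supporting a new on-disk format means writing another `ImageSource` subclass; `FindBrightest` does not change.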


Extensibility

Advantage of using tags

An advantage of using tags to identify objects within files is that we do not have to know a priori what all of the objects will look like

When we encounter a new type of object, we implement methods for reading and writing that object and add them.
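A sketch of that extensibility, assuming each object in the file is introduced by a text tag at the start of a line: readers are kept in a table keyed by tag, so supporting a new object type means registering one more reader rather than changing the code that scans the file. `Reader`, `ReaderTable`, `ReadTaggedObjects`, and the tag names are hypothetical.

```cpp
#include <functional>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// A reader turns one object's data (here: the rest of the tagged line) into a result.
using Reader = std::function<std::string(const std::string& data)>;
using ReaderTable = std::map<std::string, Reader>;

// Scans lines of the form "TAG data..." and dispatches each to the reader
// registered for its tag; objects with unknown tags are skipped, not rejected.
std::vector<std::string> ReadTaggedObjects(std::istream& in, const ReaderTable& readers) {
    std::vector<std::string> results;
    std::string line;
    while (std::getline(in, line)) {
        const auto space = line.find(' ');
        const std::string tag = line.substr(0, space);
        const std::string data = (space == std::string::npos) ? "" : line.substr(space + 1);
        const auto it = readers.find(tag);
        if (it != readers.end()) results.push_back(it->second(data));
    }
    return results;
}
```

Registering a `SOUND` reader later makes previously skipped `SOUND` objects readable, with no change to `ReadTaggedObjects`.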


Factors Affecting Portability

Differences among operating systems

Differences among languages

Differences in machine architectures

Differences among platforms

– e.g. EBCDIC and ASCII


Achieving Portability (1)

Standardization

Standard physical record format

– extensible, simple

Standard binary encoding for data elements

– IEEE, XDR

File structure conversion

Number and text conversion
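As an example of a standard binary encoding, XDR (RFC 4506) represents a 32-bit integer as four bytes in big-endian order (most significant byte first), whatever the host's native byte order. A sketch of encoding and decoding that representation with shifts, which behaves identically on big- and little-endian machines; the function names are hypothetical.

```cpp
#include <array>
#include <cstdint>

// Encodes a 32-bit signed integer as XDR does: 4 bytes, most significant first.
// Shifts operate on values, not memory, so the result is the same on any host.
std::array<unsigned char, 4> EncodeXdrInt(std::int32_t value) {
    const std::uint32_t u = static_cast<std::uint32_t>(value);  // two's complement bits
    return std::array<unsigned char, 4>{{
        static_cast<unsigned char>(u >> 24), static_cast<unsigned char>(u >> 16),
        static_cast<unsigned char>(u >> 8),  static_cast<unsigned char>(u)}};
}

// Decodes 4 big-endian bytes back into a native 32-bit signed integer.
std::int32_t DecodeXdrInt(const std::array<unsigned char, 4>& b) {
    const std::uint32_t u = (std::uint32_t{b[0]} << 24) | (std::uint32_t{b[1]} << 16) |
                            (std::uint32_t{b[2]} << 8) | std::uint32_t{b[3]};
    // uint32 -> int32 wraps modulo 2^32 (guaranteed since C++20, and what
    // mainstream compilers already did before that)
    return static_cast<std::int32_t>(u);
}
```

A file written this way on a little-endian PC decodes to the same values on a big-endian machine, which is not true of writing the integer's memory bytes directly.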


Achieving Portability (2)

File system differences

e.g. a block size of 512 bytes on a UNIX system

vs. a block size of 2880 bytes on some non-UNIX systems

UNIX and Portability

UNIX supports portability by being available on a large number of platforms

UNIX provides a utility called dd

– dd: facilitates data conversion
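One conversion `dd` offers is `conv=swab`, which swaps every pair of input bytes, for instance to exchange 16-bit integers between machines with opposite byte orders. A rough C++ equivalent, sketched under the assumption that a trailing odd byte is copied unchanged (GNU dd documents this behavior; other implementations may differ). `SwapBytePairs` is a hypothetical name.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Swaps each pair of adjacent bytes, like dd conv=swab.
// A trailing odd byte has no partner and is copied unchanged.
std::string SwapBytePairs(std::string data) {
    for (std::size_t i = 0; i + 1 < data.size(); i += 2)
        std::swap(data[i], data[i + 1]);
    return data;
}
```

With the real tool, the corresponding command is `dd if=in.dat of=out.dat conv=swab`.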


Portability

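File sharing

Files can be accessed on different computers and by different programs

Portability and Standardization

Factors affecting portability

Two companies share a file

– Company A: Sun computer, C programming; Company B: Turbo PASCAL programming on an IBM PC

Differences among operating systems

– The ultimate physical format of a file can vary because of differences among operating systems

Differences among programming languages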


Portability

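Achieving portability

Agree on a standard physical record format and follow it

– Physical standard: a format that is represented physically the same way regardless of language, machine, or operating system

– e.g. FITS

Agree on a standard binary encoding for data elements

– Basic data elements: text, numbers

– e.g. IEEE standard formats and XDR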


Portability

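Conversion 1: direct conversion between native formats

Conversion 2: conversion through an intermediate standard format

[Figure: machines IBM, VAX, Cray, Sun 3, and IBM PC; in conversion 1 each machine's format is converted directly to every other machine's format, in conversion 2 every machine converts to and from the standard XDR format]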
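The advantage of the intermediate form can be counted. Assuming one routine per one-way conversion, direct conversion among n native formats needs a routine for every ordered pair of distinct machines, n(n-1), while going through a standard form such as XDR needs only one routine to and one from the standard per machine, 2n. A small sketch of that count (function names hypothetical):

```cpp
// Number of one-way conversion routines needed among n native formats.
int DirectConversions(int n) { return n * (n - 1); }   // every ordered pair of machines
int ViaStandardForm(int n)   { return 2 * n; }         // to and from the standard, per machine
```

With the five machines in the figure, that is 20 routines versus 10. For three machines the counts are equal, and the gap widens quadratically as more machines are added.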


Let’s Review !!!

5.1 Record Access

5.2 More about Record Structures

5.3 Encapsulating Record I/O Ops in a Single Class

5.4 File Access and File Organization

5.5 Beyond Record Structures

5.6 Portability and Standardization