representing data elements

Representing Data Elements

Fields, Records, BlocksVariable-length DataModifying Records

Source: our textbook

Overview

Attributes are represented by sequences of bytes, called fields

Tuples are represented by collections of fields, called records

Relations are represented by collections of records, called files

Files are stored in blocks, using specialized data structures to support efficient modification and querying

Representing SQL Data Types

integers and reals: built-in CHAR(n): array of n bytes VARCHAR(n): array of n+1 bytes

(extra byte is either string length or null char)

dates and times: fixed length strings etc.

Representing Tuples

For now, assume all attributes (fields) are fixed length.

Concatenate the fields Store the offset of each field in

schema0 30 286 287 297

nameCHAR(30)30 bytes

addressVARCHAR(255)256 bytes

genderCHAR(1)1 byte

birthdateDATE10 bytes

More on Tuples

Due to hardware considerations, certain types of data need to start at addresses that are multiples of 4 or 8

Previous example becomes:0 32 288 292 304

nameCHAR(30)30 bytes+ 2

addressVARCHAR(255)256 bytes

genderCHAR(1)1 byte+ 3

birthdateDATE10 bytes+ 2

Record Headers

Often it is convenient to keep some "header" information in each record: a pointer to schema information

(attributes/fields, types, their order in the tuple, constraints)

length of the record/tuple timestamp of last modification

Packing Records into Blocks

Start with block header: timestamp of last modification/access offset of each record in the block, etc.

Follow with sequence of records May end with some unused space

headerrecord 1record 2 … record n-1record n

Representing Addresses

Often addresses (pointers) are part of records: the application data in object-oriented databases as part of indexes and other data structures

supporting the DBMS Every data item (block, record, etc.) has

two addresses: database address: address on the disk (typically 8-16 bytes) memory address, if the item is in memory

(typically 4 bytes)

Translation Table

Provides mapping from database addresses to memory addresses for all blocks currently in memory

Later we'll discuss how to implement it

Pointer Swizzling

When a block is moved from disk into main memory, change all the disk addresses that point to items in this block into main memory addresses.

Need a bit for each address to indicate if it is a disk address or a memory address.

Why? Faster to follow memory pointers (only uses a single machine instruction).

Example of Swizzling

Block 1

Block 2

Disk Main Memory

read intomain memory swizzled

unswizzled

Swizzling Policies

Automatic swizzling: as soon as block is brought into memory, swizzle all relevant pointers

Swizzling on demand: only swizzle a pointer if and when it is actually followed

No swizzling: always refer to translation table

Programmer control

Automatic Swizzling

Locating all pointers within a block: refer to the schema, which will indicate where

addresses are in the records for index structures, pointers are at known

locations Update translation table with memory

addresses of items in the block Update pointers in the block (in memory)

with memory addresses, when possible, as obtained from translation table

Unswizzling

When a block is moved from memory back to disk, all pointers must go back to database (disk) addresses

Use translation table again Important to have an efficient data

structure for the translation table

Pinned Records and Blocks

A block in memory is pinned if it cannot be safely written back to disk

Indicate with a bit in the block header

Reasons for pinning: related to failure recovery (more later) because of pointer swizzling

If block B1 has swizzled pointer to an item in block B2, then B2 is pinned.

Unpinning a Block Consider each item in the block to be

unpinned Keep in the translation table the places

in memory holding swizzled pointers to that item (e.g., with a linked list)

Unswizzle those pointers: use translation table to replace the memory addresses with database (disk) addresses

Variable Length Data

Data items with varying size (e.g., if maximum size of a field is large but most of the time the values are small)

Variable-format records (e.g., NULLs method for representing a hierarchy of entity sets as relations)

Records that do not fit in a block (e.g., an MPEG of a movie)

Variable-Length Fields

Store the fixed-length fields before the variable-length fields in each record

Keep in the record header record length pointers to the beginnings of all the

variable-length fields Book discusses variations on this idea

Variable Length Fieldsotherheaderinfo

recordlength

to var lenfield 2

var lenfield 2

var lenfield 3

fixed lenfield 2

var lenfield 1

fixed lenfield 1

to var lenfield 3

Variable-Format Records

Represent by a sequence of tagged fields

Each tagged field contains name type length, if not deducible from the type value

Splitting Records Across Blocks

Called spanned records Useful when

record size exceeds block size putting an integral number of records in a

block wastes a lot of the block (e.g., record size is 51% of block size)

Each record or fragment header contains bit indicating if it is a fragment if fragment then pointers to previous and

next fragments of the record (i.e., a linked list)

Record Modification

Modifications to records: insert delete update

issues even with fixed-length records and fields

even more involved with variable-length data

Inserting New Records

If records need not be any particular order, then just find a block with enough empty space

Later we'll see how to keep track of all the tuples of a given relation

But what if blocks should be kept in a certain order, such as sorted on primary key?

Insertion in Order

If there is space in the block, then add the record(going right to left), add a pointer to it (going leftto right) and rearrange the pointers as needed.

record4

record3

record2

record1

unused

header

What if Block is Full?

Records are stored in several blocks, in sorted order

One approach: keep a linked list of "overflow" blocks for each block in the main sequence

Another approach is described in the book

Deleting Records

Try to reclaim space made available after a record is deleted

If using an offset table, then rearrange the records to fill in any hole that is left behind and adjust the pointers

Additional mechanisms are based on keeping a linked list of available space and compacting when possible

Tombstones

What about pointers to deleted records?

We place a tombstone in place of each deleted record

Tombstone is permanent Issue of where to place the tombstone Keep a tombstone bit in each record

header: if this is a tombstone, then no need to store additional data

Updating Records

For fixed-length records, there is no effect on the storage system

For variable-length records: if length increases, like insertion if length decreases, like deletion

except tombstones are not necessary

representing data elements

Documents

representing data

u1l4 - representing data

representing data with charts and graphs

chapter 1 representing data in a computer

1 lecture 20: indexes friday, february 25, 2005. 2 outline...

worksheet 12.2 representing and interpreting data name: ·...

review set: representing data

representing data with patterns

1.3 representing data - pioneer science

chapter three representing data: constants and variables

representing and interpreting data 1

mapping and representing soil information and data

graphically representing data

natural elements representing the cycle of life and death

representing the reality underlying demographic data

representing and storing structured data

chapter 3 representing data elements 1.how to lay out data...

representing the structural elements of a freely faceted

representing contextualized data using semantic web...

representing contextual aspects of data