dbms topics for bca

Entity – Relationship Model

Data Modelling & ER Diagram | Prepared by Jayaprabha 1

UNIT 2 - Syllabus

• Data Modeling using ER model

• High level conceptual data models for DB design with an example,

• Entity types, Entity sets, Attributes and Keys,

• ER models, Notation for ER diagram, Proper Naming of Schema constructs,

• Relationship types of of degree higher than two,

• Record Storage and Primary file Organisation, Secondary storage devices, Buffering of blocks, placing file records on disk,

• Operations of files, File of unordered records (heap files), files of ordered records (sorted files),

• Hashing techniques and other primary file organization


Entity – Relationship Notations


Entity

• Is a real world object

• Eg: Student, Project, Bank, Department, Phone, Car, Employee . . . .


Attribute

• Describes the characteristic of an Entity.

• Eg1: Employee id, name, designation, salary . . . .

• Eg2: Bank A/c no., a/c name, Customer name, balance . . . .


Types of Attributes

1. Simple Attribute : Is an attribute which can be further subdivided

Eg: Empid, Studno, Phoneno, . . . .


Types of Attributes

2. Composite attribute: An attribute that can be further sub divided.


Types of Attributes

3. Single valued attribute: Attribute that can take only 1 value

4: Multivalued attribute: Attribute having more than 1 value

Eg: Hobbies, Subjects, Degrees. . . .


Types of Attributes

5. Stored attribute: attribute that cannot be derived from other attributes.

Eg: DOB


Types of Attributes

6. Derived Attribute: Value of 1 attribute derived from the other attribute

Eg: Age is derived from DOB


Types of Attributes

7. Null values: attribute having no values/ not known

Eg: An application may have 2 Phone nos. where only 1 is known . . . .


Weak Entity

• Is an entity that cannot be uniquely identified by its attributes alone;

• It must use a foreign key in conjunction with its attributes to create a primary key


http://en.wikipedia.org/wiki/Foreign_key

http://en.wikipedia.org/wiki/Primary_key

Relation/ Relationship

• Is association of 2 or more entities


Weak Relationship


Keys

• A Key is an important concept in Relational DBMS (RDBMS)

• They used to establish a relation between multiple tables

• They also ensure that each record in the table is uniquely identified by combination of one or more fields/ attribute names


Primary Key

• It uniquely identifies each record in the table.


Primary Key


Foreign Key

1. Is a column that references a column of another table. The purpose of the foreign key is to ensure referential integrity of the data.

2. A foreign key is a field in one table that uniquely identifies a row of another table.

3. foreign key is defined in a second table, but it refers to the primary key in the first table

4. A foreign key is a key used to link two tables together. This is sometimes called a referencing key.


Foreign Key


Types of relation

1. Unary

2. Binary

3. Ternary

4. Quarternary


Unary Relation


Binary Relation


Ternary Relation


Quaternary Relation


Cardinality

• It expresses the maximum number of occurrences between 2 related entities.

• Cardinality ratio for binary relationship

• 1 : 1

• 1 : N

• N : 1

• M : N


Cardinality


Dependency/ Weak entity


Storage Medium

• In computers, a storage medium is any technology used to place, keep, and retrieve data.


Primary Storage

•Primary storage (or main memory or internal

memory), is the only one directly accessible to

the CPU.

•The CPU continuously reads instructions

stored in it and executes them when required.


Primary Storage

1. Is directly operated by CPU

2. Eg: RAM, ROM, CACHE memory

3. Provides fast access to data

4. But, has limited storage

5. Is expensive

6. Is volatile


DB Secondary storage device


Secondary storage device

1. Cannot be directly accessed by CPU

2. Eg: Pendrive, Magnetic tape, Magnetic Disk . . . .

3. Have large storage capacity

4. Cost less


Seek Time & Latency Time


Key Words

• Seek Time: Seek time is the time taken for a hard disk controller to locate a specific piece of stored data.

• Rotational delay / Latency time: the time required to locate the first bit or character in a storage location

• Block Transfer Time: Is the time required to transfer a Block of data


Buffering of Blocks


Buffering of Blocks

• The Buffer Manager is responsible for allocating the main memory to the process as per the need and minimizing the delays and unsatisfiable requests


Double Buffering


Double buffering

• When data is transmitted from primary to secondary memory, CPU can start processing the block.

• Simultaneously the I/O processor can read and transfer the next block into the buffer.

• Permits continuous reading or writing of data into consecutive blocks

• Since data is ready for processing waiting time is reduced.


Double Buffering

• Data/ block of data is transferred from primary to secondary memory

• The CPU will start processing the blocks

• The I/O processor will read 1 block of data – transfer the data of 1st

block while reading the the 2nd block of data.

• This technique is called Double Buffering

• Adv:

• Permits continuous reading & writing/ transferring of blocks of data

• Waiting time is reduced as data is read & written continuously


Spanned & Unspanned Records


Data allocation in memory

• Is allocating/ storing data into the memory

• Data allocation can be done in different formats like


Contiguous Allocation


Linked Allocation


Indexed Allocation


Hashing

• Is a technique for searching

• Collision – occurs when data is placed in a location where data already exists

• Bucket – is a block of data

• There are 2 types of hashing

• Internal hashing

• External hashing


Internal Hashing – collision resolution

• Open addressing – place the records in the next available free space

• Chaining – a pointer is placed at the end of every record. If the record overflows the data is placed in the next available free space and a pointer is used to point to the overflow location

• Multiple hashing – if overflow occurs, a next hash function is used find the new location. If that location is full then another hash function is used. In case of collision then open addressing is used.


B Tree

• Is a method of placing / locating the block of data on the disk

• The B-tree minimizes the number of times a medium must be accessed to locate a desired record, thereby speeding up the process.

• a B-tree is a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions .

• The B-tree is a binary search tree in which a node can have more than two children

• the B-tree is optimized for systems that read and write large blocks of data.

• It is commonly used in databases and file systems.


File Operations

1. Open

2. Close

Record- at- a- time

1. Reset

2. Find/ Locate

3. Read

4. Findnext

5. Delete

6. Modify

7. Insert

8. Scan


File Operations

• Set- at- a- time

1. Find all

2. Find

3. Locate N

4. Find ordered

5. Reorganize


Hashing

• In computing, a hash table is a data structure used to implement an associative array, a structure that can map keys to values.

• A hash table uses a hash function to compute an index into an array of buckets or slots, from which the correct value can be found.

• Dynamic perfect hashing is a programming technique for resolving collisions in a hash table data structure.

• This technique is useful for situations where fast queries, insertions, and deletions must be made on a large set of elements.


http://en.wikipedia.org/wiki/Collision_(computer_science)

http://en.wikipedia.org/wiki/Hash_table

http://en.wikipedia.org/wiki/Data_structure

External Hashing


Overflow of buckets by chaining


Extendible hashing


Extendible hashing

• Dynamic hashing provides a mechanism in which data buckets are added and removed dynamically and on-demand.

• Dynamic hashing is also known as extended hashing.


Magnetic disk

1. It stores a large amount of data

2. A bit is a single unit of storage

3. A bit can be obtained by magnetizing a part of the disk

4. It is single sided when it is magnetized on one side and double sided when it is magnetized on both sides

5. Information is stored in tracks (Concentric circles)


Magnetic Disk

6. TRACKS of same diameter in a disk pack is called cylinder

7. Data stored in one cylinder is retrieved faster when data stored in multiple cylinders.

8. A track is divided into smaller blocks called SECTORS

9. Division of blocks/ sectors is made during disk formatting

10. Their size range b/w 512- 4096 bytes


Magnetic Disk

11. Every block is separated by Inter Block Gap which contain some information.

12. Inter Block Gap acts as a bridge b/w the information contained in different blocks.

13. A disk/ disk drive is mounted on a spindle that has a motor

14. The motor allows the disk to rotate and read- write head is used to read/ write information from/ into the disk


Magnetic Disk

15. a disk controller controls the disk drive and interfaces with the computer system.

16. It takes commands from the computer and activates the read- write on the tape accordingly (ie if user wants to read data from disk then read is activated/ if user wants to write data into disk then write is activated


Placing file/ records on disk

1. Record an record types:

• Data is stored in the form of records

• Each record contains some related values

• Record format is collection of attributes with data type

• Data type may be int, decimal, char, varchar

• There are some unstructured data like images, video, audio or some free text


Placing file/ records on disk

• These unstructured large data are called Binary Large Objects (BLOB)

• It is stored separately and has a pointer to point to it


2. Files, Fixed length rec & var length rec

• File is a collection records

• Fixed length rec – is every rec in a file are of equal length

• Var length rec – rec having different length

• Reasons for variable length


Reasons for variable length records

• 1 or more attributes may have varying size like name, address. . . .

• 1 or more attributes have multiple fields/ repeating fields like DOB, age

• There may be few optional fields ie some of the fields may or may not have values like phone no.

• Files may be of different types & varying like mixed file


3. Rec blocking – Spanned & Unspanned rec

• Files contain records

• Records are stored in Disks

• Disk is divided into several Blocks

• Unit of data transfer between disk and memory is thru Blocks

• If Block size >= Record size i.e., B >= R then Blocking Factor bfr = floor(B/R) records/block

• bfr = B/R = x = floor functions to the prev decimal


Blocking Factor

• bfr = B/ R = 10/ 2 = 5 ie 5 records/ block

• bfr = B/ R = 10/ 3 = 3.5 ie 3 records/ block

10 bytes 2 recs

10 bytes 3 recs


Spanned record

• If record size is larger than block size it is Unspanned Record

Block Record


• Unused space in each Block = B-(bfr*R) bytes

• A pointer is placed at the end of the record to point to the continuation of record in the next block

• This organization is called unspanned records i.e., when record size is larger than block size

• When records are not allowed to cross the boundary then it is unspanned records


• Example

• Block size B = 100

• Rec Size R = 30

• Therefore, # records per clock = bfr = B/R = 100/30 = 3.33 = 3

• Therefore, total unused space = B – (bfr *R )

• 100 – (3 * 30) = 10 bytes


4. Allocating File Blocks on Disk

1. Contiguous allocation : Blocks are allocated to consecutive disk blocks so reading of the files is faster

2. Linked allocation : Each block contains a pointer to the next block; it is slow to read records but easily expandable

3. Indexed allocation : Separate blocks are allocated to maintain an index that contain pointer to the actual file block


5. File Headers

1. Contains info about the file

2. The header contains info about :

• Disk address

• Record format (Field Length, Field Type, Separator Char and Record Type {Spanned, Unspanned Record})

• To search a record in the disk, first the blocks are copied to main memory and then the record is searched using “Linear Search” by using address in the file header


File Organization – Heap File

• Files with unordered records (Heap files)

Records are placed in the order they are entered

New records are entered at the end of the file

This leads to secondary indices

Inserting new record is very efficient



New records are added into memory first and then added to disk

Searching the blocks by linear search -- is time consuming

For record deletion, the program must first find the block containing the record – copy record to memory and then delete it which leaves a blank space in disk

When many records have to be deleted, it results in wastage of space in the disk



To avoid such wastage, records are not deleted but instead marked for deletion; when there is a periodical reorganization, the marked records are purged and new records are inserted

Spanned or Unspanned org can be used for fixed / variable length records

To sort all records in the file based on certain field, the sorted file is maintained separately


Heap file Organization


Heap file Organization- using index


Sorted file

• Files are placed in a particular order

• So reading/ accessing the files data is fast and easy

• Searching can be done based on some searching key

• Binary search is the technique used for searching a record in sorted file


Sort file - insert

• In case a record starting with ‘j’ has to be searched then it uses binary search technique which uses log2(b) formula where b= block

• Inserting a new record is a very tedious task because pushing the other records further down and inserting a new rec at the required place is time consuming


Sort file - insert

• To avoid this problem, some free space can be reserved in every block so that a new records can be added into it.

• If the record is too big to be stored in the free space then the rest of the data will move to the overflow area.

• A pointer is used to pointer to the data in the overflow area.


Sort file - delete

• Records are not deleted at once but marked for deletion.

•

• During re-organization, the marked records are permanently deleted.


Data striping

• Data striping transparently distributes data over multiple disks to make them appear as a single fast, large disk.

• Striping improves the I/O performance by allowing multiple I/Os to be serviced in parallel.


RAID

• Redundant Array of Inexpensive/ Independent Disks

• RAID is a storage technology that combines multiple disk drive components into a logical unit for the purposes of data redundancy and performance improvement.

• Data is distributed across the drives in one of several ways, referred to as RAID levels


RAID – Redundant Array of Inexpensive/ Redundant Disks


RAID


Raid 1


RAID 2

• A RAID 2 stripes data at the bit (rather than block) level, and uses a Hamming code for error correction.

• The disks are synchronized by the controller to spin at the same angular orientation so it generally cannot service multiple requests simultaneously.

• Extremely high data transfer rates are possible.


Parity bit

A parity bit, or check bit is a bit added to the end

of a string of binary code that indicates whether

the number of bits in the string with the value one

is even or odd.

Parity bits are used as the simplest form of error

detecting code.


RAID 2


RAID 3

• Uses a single parity disk and figures out which disk has failed with a help of a controller

• Used for large volume of storage

• Gives higher data transfer


RAID 4

• Uses block level data striping


RAID 5

• Used for storing large volume of data

• Uses block level striping

• Distributes data and parity across all disks

• It requires all drives -but atleast one drive must be present to operate.

• Upon failure of a single drive, subsequent reads can be calculated from the distributed parity such that no data is lost.


RAID 5


RAID 6

• RAID 6 extends RAID 5 by adding an additional parity block; thus it uses block-level striping with two parity blocks distributed across all member disks.


RAID 6


B Tree

• A B-tree is a method of placing and locating files (called records) in a database

• The B-tree algorithm minimizes the number of times a medium must be accessed to locate a desired record, thereby speeding up the process.

• B-tree is a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in a very short time.

• The B-tree is a generalization of a binary search tree in that a node can have more than two children


Important Questions

• What is a Data model? Explain its different types

• 3 schema architecture

• Centralized architecture

• Buffering of blocks

• Data independence, data abstraction

• Client- server or 3 tier architecture


dbms topics for bca

Software

dobdata modelling er

knowndata modelling

er models

derived attribute

entity types

multivalued attribute

simple attribute

composite attribute