chapter 2 - er
DESCRIPTION
safTRANSCRIPT
Module 2: Entity-Relationship Model
• Introduction
• Benefits of Data Modeling
• Types of Models
• Phases of Database Modeling
• The Entity-Relationship (ER) Model
• Extended Entity-Relationship(EER) Model
• Generalization
• Specialization
• Aggregation
Introduction
Building a database system is a complex process that normally requires analyzing the user's requirements followed by a design process and finally concluding with an implementation. During analysis and design, the database designer normally needs to build a model of the proposed database system. Database modeling is essential to manage the database development process.
Models have not been invented recently. Humans have used models for a long time for a variety of purposes and perhaps that is why modeling is interpreted differently by different people. Some examples of model building:
• Architects build models of buildings to show what a building or a house would look like.
• Landscape architects build models of landscaping to show what a landscape or garden will look like.
• Aeronautical engineers build models of planes to show what a plane would look like. Scaled models may also be used in testing to evaluate a plane's performance.
• Computer architects build models of new computers or new circuit boards that are implemented in hardware.
• Traffic engineers build models of traffic for a city or part of a city to show why there are traffic problems or how they could be solved.
• A model is an abstract and often incomplete representation of the design or definition of a complex product, process or situation
Benefits of Data Modeling • Focusing on essentials
• Ease of communication and understanding
• Product or process improvement
• Exploring alternatives
Data Models A collection of tools for describing
• Data
• Data relationships
• Data semantics
• Data constraints
• Entity-Relationship data model (mainly for database
design)
• Relational model
• Object-based data models (Object-oriented and
Object-relational)
• Semistructured data model (XML)
• Other older models: • Network model • Hierarchical model
•Models an enterprise as a collection of
entities and relationships
Entity: a “thing” or “object” in the enterprise
that is distinguishable from other objects
Described by a set of attributes
• Relationship: an association among several
entities
e.g. each employee is an entity described by empno, empname, designation etc.
Entity-Relationship Model
• Entity set – set of all entities of the same
type
• Relationship set – set of all relationships of
same type
• Mapping cardinality – number of entities
to which another entities can be
associated via relationship set
The overall logical design of database
can be expressed graphically by an E-R
diagram, which has following
components:
Divided Rectangles - entity set
Diamonds –relationship among entity sets
Lines – link attributes to entity sets and
entity sets to relationship sets
Each component of E-R diagram is labeled with entity or
relationship that it represents
Example of schema in the entity-relationship model
Relational Model
• It uses collection of tables to represent
data as well as relationship among those
data.
• Each table has multiple columns, each
column has unique name.
Other Models
Network Model
• Data are represented by collection of
records, and relationships among those
data are represented by links, which
are viewed as pointers.
• Records are organized as a collection
of arbitrary graph
Hierarchical Model
• Data are represented by collection of
records, and relationships among those
data are represented by links, which are
viewed as pointers.
• Records are organized as a collection of
trees rather than arbitrary graph
Entity-Relationship Model
ER modeling: A graphical technique for understanding and organizing the data independent of the actual database implementation OR
The ER model describes data as entities, attributes and relationships
Basic Notations of ER model are
• Entity
• Attributes
• Relationship
Entity
• Entity An entity is a thing or object in the real world that is distinguishable from all other objects OR
Any thing from the real world that have an independent existence and about which we intend to collect data
• An entity may be an object with physical existence (person, book, car, house) or with a conceptual existence (Company, job, or university course)
Entity
• Each Entity has attributes, the particular properties that describe it
• Entity type - Set of entities of the same type that share the same attributes. Each Entity type is described by its name and attributes
• Entity Set – Collection of all entities of a particular entity type in DB at any point in time
• Extension of the entity Set - The individual entities that constitute a set
• Entity sets do not need to be disjoint
Key Attributes of an Entity Set • Uniqueness Constraint - An important
constraint on the entities of an entity type
• Key attribute - An attribute whose values are distinct for each individual entities in the entity set. Its values can be used to identify each entity uniquely – e. g. the account number of an account, an
employee id of an employee
• composite key attribute - If several attributes form a key i.e. combination of attribute values must be distinct for each entity. A composite key must be minimal
Attributes • Each entity has attributes—the particular
properties that describe it
• An entity is represented by a set of attributes
• Attributes are descriptive properties possessed by each member of an entity set
• Each entity may have its own value for each attribute
• The set of possible values for each attribute of a particular entity is called the domain of the attribute – The domain of attribute marital status is just the four
values of set of alphabetic characters single, married, divorced, widowed
– The domain of the attribute month is set of twelve values ranging from January to December
Figure: Two entities, EMPLOYEE e1, and COMPANY c1, and their attributes
Domain of attributes
• Each Simple attribute of an entity is associated with a value set or domain of values, which specifies the set of values that may be assigned to that attribute for each individual entity
– If range of ages allowed for employees is between 16 & 70 then value set for age attribute is set of integer numbers between 16 & 70
– Value set for name attribute is set of strings of alphabetic characters separated by blank spaces
• Value sets are not displayed in ERD. They are specified using the basic data types available in DBMS
Types of attributes Simple Vs composite attribute
• Simple (Atomic) - cannot be divided into smaller subparts (simpler) components – age of an employee
• Composite - can be divided into smaller subparts (simpler) components, which represents more basic attributes with independent meaning – joining date of the employee. Can be divided
into day, month and year
If composite attribute is referred only as a
whole then no need to divide it into component attributes, the whole composite attribute can be referred as simple attribute
Composite Attributes
• Composite attributes can form a hierarchy • The value of a composite attribute is the concatenation of the values of its components i.e. simple attributes • Composite attributes are useful to model situations in which a user sometimes refers to the composite attribute as a unit but at other times refers specifically to its components • If the composite attribute is referenced only as a whole, there is no need to subdivide it into component attributes
Single Vs Multi-valued Attributes
• Single valued - Can take only a single value for each particular entity
– age of a person. There can be only one value for this
• Multi-valued - Can take set of values for the same entity
– skill set of an employee, colors for a car, degrees for a person, dependents of an employee, nominees of an a/c holder
A multi-valued attribute may have lower and upper bounds to limit the number of values allowed for each individual entity
Stored Vs Derived attribute • Stored - Attribute values that need to be
stored permanently – name of an employee, title of a book
• Derived – In some cases, two (or more) attribute values are related. Attribute values that can be calculated (derived) from value of other attribute value – years of service of employee can be calculated
from date of joining and current date
Some attribute values can be derived from related Entities
—an attribute Number_of_employees of a DEPARTMENT entity can be derived by counting the number of employees related to (working for) that department.
Null attribute • Null - In some cases, a particular entity may not have an
applicable value for an attribute
• Apartment_number attribute of an address applies only to addresses that are in apartment buildings and not to other types of residences, such as single-family homes (Bunglow)
• College_degrees attribute applies only to people with college degrees
The meaning of the above type of NULL is not applicable
NULL can also be used if we do not know the value of an attribute for a particular entity
• if we do not know the home phone number of ‘John Smith’
The meaning of the above type of NULL is not known The unknown category of NULL can be further classified into two
cases. • The first case arises when it is known that the attribute value
exists but is missing—for instance, if the Height attribute of a person is listed as NULL.
• The second case arises when it is not known whether the attribute value exists—for example, if the Home_phone attribute of a person is NULL.
Complex Attributes • Composite and multi-valued attributes can be nested
arbitrarily.
• We can represent arbitrary nesting by grouping components
of a composite attribute between parentheses () and
separating the components with commas, and by displaying
multi-valued attributes between braces { }. Such attributes
are called complex attributes
• if a person can have more than one residence and each residence
can have a single address and multiple phones, an attribute
Address_phone for a person can be specified as
A complex attribute: Address_phone.
Relationship
• A relationship is an association among several entity sets
Example:
Hayes depositor A-102 customer entity set relationship set account entity set
• A relationship set is a mathematical relation among
n 2 entities, each taken from entity sets {(e1, e2, … en) | e1 E1, e2 E2, …, en En} where (e1, e2, …, en) is a relationship – Example: (Hayes, A-102) depositor
Relationship Set borrower
• An attribute can also be property of a relationship set, called as descriptive attribute.
• For instance, the depositor relationship set between entity sets customer and account may have the attribute access-date
• Refers to number of entity sets that participate in a relationship set.
• One Unary • Two Binary • Three Ternary
• Relationship sets that involve two entity sets are binary (or degree two).
• Relationship sets may involve more than two entity sets. – E.g. Suppose employees of a bank may have
jobs (responsibilities) at multiple branches, with different jobs at different branches. Then there is a ternary relationship set between entity sets employee, job and branch
• Relationships between more than two entity sets are rare. Most relationships are binary.
Degree of a Relationship Set
Mapping Constraints
E-R model may define certain constraints to which the contents of the database must confirm. Two most important constraints are
• Existence dependencies
• Mapping Cardinalities
Existence Dependencies
• If the existence of entity set ‘x’ depends on the existence of entity set ‘y’, then ‘x’ is said to be existence dependent on ‘y’. If ‘y’ is deleted, so is ‘x’. Entity set ‘y’ is said to be dominant entity set and ‘x’ is said to be subordinate entity set
Mapping Cardinalities
• Express the number of entities to which another entities can be associated via a relationship set.
• Most useful in describing binary relationship sets.
• For a binary relationship set the mapping cardinality must be one of the following types: – One to one (1:1) – One to many (1:N) – Many to one (N:1)
– Many to many (M:N)
Mapping Cardinalities
Some elements in A and B may not be mapped to any elements in the other set
Mapping Cardinalities
Some elements in A and B may not be mapped to any elements in the other set
Participation of an Entity Set in a Relationship Set
• Total participation - every entity in the entity set participates in at least one relationship in the relationship set
• Partial participation - some entities may not participate in any relationship in the relationship set
Participation of an Entity Set in a Relationship Set
Entity’s Role
• Each entity set that participates in a relationship set plays a particular role in a relationship
• The role name signifies the role that a participating entity from the entity set plays in each relationship instance, and helps to explain what the relationship means • in the WORKS_FOR relationship set, EMPLOYEE plays
the role of employee or worker and DEPARTMENT plays the role of department or employer.
• Role names are not necessary in relationship types where all the participating entity set are distinct, since each participating entity set name can be used as the role name.
• In some cases the same entity set participates more than once in a relationship set in different roles. In such cases the role name becomes essential for distinguishing the meaning of the role that each participating entity plays
•Entity sets of a relationship need
not be distinct
•The labels “Supervisor” and
“worker” are called roles; they
specify how employee entities
interact via the “Supervision”
relationship set.
•Roles are indicated in E-R
diagrams by labeling the lines
that connect diamonds to
rectangles.
•Role labels are optional, and are
used to clarify semantics of the
relationship
Weak Entity Set
• Strong Entity set - Entity set that has its own key attribute – Employee, student, customer, department
• Weak Entity set - Entity sets that do not have key attributes of their own – Dependent of an Employee, nominees of an a/c
holder
• Identifying / Owner Entity set - Entities belonging to a weak entity set are identified by being related to specific entities from another entity set in combination with one of their attribute values. The other entity set is identifying (owner / parent / dominant) entity set
Weak Entity Set • Identifying Relationship – The relationship
that relates a weak entity set to the owner
• Weak Entity set always has a total participation constraint i.e. existence dependency with respect to its identifying relationship
• Partial Key / Discriminator – Set of attributes that can uniquely identify weak entities that are related to the same owner entity
• In ERD identifying relationship are represented by double line diamond. Partial key attribute is underlined with a dashed or dotted line
E-R Diagram
Overall logical structure of a database expressed graphically
Major Components of ERD consists of • Rectangles divided into two parts represent entity
sets. The first part contains the name of the entity set. The second part contains the names of all the attributes of the entity set. Attributes that are part of the primary key are underlined.
• Diamonds represent relationship sets. • Undivided rectangles represent the attributes of a
relationship set. • Lines link entity sets to relationship sets. • Dashed lines link attributes of a relationship set to the
relationship set. • Double lines indicate total participation of an entity in
a relationship set. • Double diamonds represent identifying relationship
sets linked to weak entity sets.
Design Issues
• Use of entity sets vs. attributes Choice mainly depends on the structure of the enterprise being modeled, and on the semantics associated with the attribute in question
• Use of entity sets vs. relationship sets Possible guideline is to designate a relationship set to describe an action that occurs between entities
• Binary versus n-ary relationship sets Although it is possible to replace any nonbinary (n-ary, for n > 2) relationship set by a number of distinct binary relationship sets, a n-ary relationship set shows more clearly that several entities participate in a single relationship.
• Placement of relationship attributes
An Example Database Application • The company is organized into departments. Each department
has a unique name, a unique number, and a particular employee who manages the department. We keep track of the start date when that employee began managing the department. A department may have several locations
• A department controls number of projects, each of which has a unique name, a unique number, and a single location
• We store each employee’s name, number, address, salary, gender, and DOB. An employee is assigned to one department but may work on several projects, which are not necessarily controlled by the same department. We keep track of the current number of hours per week that an employee works on each project. We also keep track of the direct supervisor of each employee (who is another employee).
• We want to keep track of the dependents of each employee for insurance purposes. We keep each dependent’s first name, gender, DOB and relationship to the employee
A banking Scenario Banks have customers. Customers are identified by name,
customer id, phone number and address. Accounts can be held by more than one customer and a customers can have more than one accounts. Accounts are identified by an account number, account type (savings, current) and a balance. Customers can avail loans. Loans are identified by loan id, loan type (car, home, personal) and an amount. Banks are identified by a name, code and the address of the main office. Banks are organized into branches. Branches are identified by a branch number, branch name and an address. Accounts and loans are related to the banks’ branches.
Draw an ER diagram for a database to represent this application
University Database The university is organized into departments. Each department is identified by a
unique name (dept_name), is located in particular building, and has a budget.
Each department has a list of courses it offers. Each course has associated with it a course_id, title, dept_name, and credits, and may also have associated prerequisites.
Instructors are identified by their unique ID. Each instructor has name, associated department (dept_name), and salary.
Students are identified by their unique ID. Each student has a name, an associated major department (dept_name), and tot_cred (total credit hours the student earned thus far).
The university maintains a list of classrooms, specifying the name of the building, room_number, and room capacity.
The university maintains a list of all classes (sections) taught. Each section is identified by a course_id, sec_id, year, and semester, and has associated with it a semester, year, building, room_number, and time_slot_id (the time slot when the class meets).
The department has a list of teaching assignments specifying, for each instructor, the sections the instructor is teaching.
The university maintains a list of all student course registrations, specifying, for each student, the courses and the associated sections that the student has taken (registered for).
Enhanced / Extended ER Modeling (EER)
In late 1970’s designers of the DB applications designed more accurate database schemas that reflect the data properties and constraint more precisely for newer applications of database such as GIS (Geographic Information Systems), CAD/CAM. These databases have more complex requirements than traditional applications. This led to the development of additional semantic data modeling concepts that were incorporated into conceptual data models such as ER model. EER model includes all the modeling concepts of the ER model
Sub-classes, Super-classes And Inheritance
• Sub-Classes – In many cases an entity type has various sub-groupings of its entities that are meaningful and need to be represented explicitly because of their significance to the DB applications – Entities of Employee entity type may be
grouped into Secretary, Engineer, Manager, Salaried_employee, Hourly_Employee
Each of the subgroup is sub-class of the Employee entity type. The Employee entity is called Super-class for each of these sub-classes
• An entity in sub-class is same as entity in the super-class but in a distinct specific role
Sub-classes, Super-classes And Inheritance
The relationship between a super-class and any of its sub-classes is a super-class/sub-class or class/sub-class relationship. A specialization is depicted by a hollow arrow-head pointing from the specialized (subtype) entity to the other (supertype) entity. It is often called an ISA or ISAN relationship
• Type Inheritance – A concept associated with sub-class. An entity that is member of a sub-class inherits all the attributes of the entity that is member of super-class
• Sub-class entity also inherits all the relationships in which the super-class participates
Specialization
• A process of defining a set of sub-classes of an entity type (set); this entity type is called super-class of the specialization. The sub-class is defined on the basis of some distinguishing characteristic of the super-class.
• Specialization is Top-down process
Recent EER Notation
Generalization • A reverse process of abstraction in which the
differences among several entity types are suppressed. Common features of sub-class entity types are identified and generalized them into a single super-class. It is a Bottom-up process
• The term generalization refer to the process of defining a generalized entity type from the given entity types
Specialization and Generalization
• Can have multiple specializations of an entity set based on different features – permanent-employee vs. temporary-
employee, in addition to secretary, manager, engineer
• Each particular employee would be – a member of one of permanent-employee or
temporary-employee
– and also a member of one of secretary, manager, engineer
Design Constraints on a Specialization/Generalization
Constraint on which entities can be members of a given subclass (lower-level) entity type
Condition-defined – An entity will become member of which subclass is determined by placing a condition on the value of some attribute of the superclass.
– E.g. membership in the SECRETARY subclass by the condition (Job_type = ‘Secretary’)
User-defined - When we do not have a condition for determining membership in a subclass, the subclass is called user-defined. Membership in such a subclass is determined by the database users when they apply the operation to add an entity to the subclass; hence, membership is specified individually for each entity by the user, not by any condition that may be evaluated automatically.
Constraint on whether or not entities may belong to more than one subclass (lower-level) entity type within a single generalization
–Disjoint - an entity in super class can belong to
only one subclass (lower-level) entity type
• Noted in E-R diagram by writing disjoint next to the ISA triangle
–Overlapping - an entity in super class can belong
to more than one subclass (lower-level) entity type
Design Constraints on a Specialization/Generalization
Completeness constraint -- specifies whether or not an entity in the super class (higher-level) entity set must belong to at least one of the subclass (lower-level) entity sets within a generalization
– Total : an entity in super class must belong to one of the subclass (lower-level) entity type
– Partial: an entity in super class need not belong to one of the subclass (lower-level) entity type
Design Constraints on a Specialization/Generalization
Aggregation
• Consider the ternary relationship works-on
• Suppose we want to record managers for tasks performed by an employee at a branch
• Eliminate this redundancy via aggregation
–Treat relationship as an abstract entity
–Allows relationships between relationships
–Abstraction of relationship into new entity
Aggregation
E-R Diagram With Aggregation
Keys
• A super key of an entity set is a set of one or more attributes whose values uniquely determine each entity.
• A candidate key of an entity set is a minimal super key – Customer-id is candidate key of customer
– account-number is candidate key of account
• Although several candidate keys may exist, one of the candidate keys is selected to be the primary key.
Keys
• A super key of an entity set is a set of one or more attributes whose values uniquely determine each entity.
• A candidate key of an entity set is a minimal super key – Customer-id is candidate key of customer
– account-number is candidate key of account
• Although several candidate keys may exist, one of the candidate keys is selected to be the primary key.
• Composite primary key - A primary key which is a combination of more than one attribute is called a composite primary key
• Overlapping candidate keys - Two candidate keys overlap if they involve any attribute in common.
e.g., in an student table, S#, C#
and Emailid, C# are two overlapping candidate keys. (they have C# in common)
• Non-Key Attributes
The attributes other than the Candidate Key attributes are called Non-Key attributes.
OR
• The attributes which do not participate in any of the Candidate keys
• Foreign key
A foreign key is a “copy” of a primary key that has been exported from one relation into another to represent the existence of a relationship between them. A foreign key is a copy of the whole of its parent primary key i.e. if the primary key is composite, then so is the foreign key
• Weak entity set represented by double rectangles.
• underline the discriminator of a weak entity set with a dashed line.
• payment-number – discriminator of the payment entity set
• Primary key for payment – (loan-number, payment-number)
• Note: the primary key of the strong
entity set is not explicitly stored with the weak entity set, since it is implicit in the identifying relationship.
• If loan-number were explicitly stored, payment could be made a strong entity, but then the relationship between payment and loan would be duplicated by an implicit relationship defined by the attribute loan-number common to payment and loan
• Each strong entity set becomes a table
• Each single-valued attribute becomes a column
• Derived attributes are ignored
• Composite attributes are represented by components
• Multi-valued attributes are represented by a separate table
• The key attribute of the entity set becomes the primary key of the table
Reduction of an E-R Schema to Tables
Entity example
• Here address is a composite attribute
• Years of service is a derived attribute (can be calculated from date of joining and current date)
• Skill set is a multi-valued attribute
• The relational Schema Employee (E#, Name, Door_No, Street, City, Pincode, Joining_Date)
Emp_Skillset( E#, Skillset)
Each value of the multi-valued attribute maps to a separate row of the table
Converting weak entity types
• Weak entity types are converted into a table of their own, with the primary key of the strong entity acting as a foreign key in the table
• This foreign key along with the key of the weak entity (discriminator) form the composite primary key of this table
The Relational Schema Employee (E# ,…….) Dependant (Employee, Dependant_ID, Name, Address)
Converting relationships • The way relationships are represented depends
on the cardinality and the degree of the relationship and participation
• The possible cardinalities :
1:1, 1:M, N:M
• The degrees :
Unary
Binary
Ternary
● The participation :
Total
Partial
Binary 1:1
The primary key of the partial participant will become the foreign key of the total participant Employee( E#, Name,…) Department (Dept#, Name…,Head)
Binary 1 : 1
Case 2: Uniform participation types The primary key of either of the participants can become a foreign key in the other Employee (E#,name…) Chair( item#, model, location, used_by) (or) Employee ( E#, Name….Sits_on) Chair (item#,….)
Binary 1:N
The primary key of the relation on the “1” side of the relationship becomes a foreign key in the relation on the “N” side Teacher (ID, Name, Telephone, ...) Subject (Code, Name, ..., Teacher)
Binary 1 : N
• A new table is created to represent the relationship
• New table contains two foreign keys - one from each of the participants in the relationship
• The primary key of the new table is the combination of the two foreign keys
Student (Sid#,Title…) Course(C#,CName,…)
Enrolls (Sid#, C#)
Binary M : N
Unary 1 : 1 • Consider employees who are
also a couple
• The primary key field itself will become foreign key in the same table
Employee( E#, Name,... , Spouse)
Unary 1 : 1
Employee Table
EmpCode PK
EmpName
DateofJoining
Spouse FK
Unary 1:N
• The primary key field itself will become foreign key in the same table
• Same as unary 1:1
Employee( E#, Name,…,Manager)
Unary 1 : N
Employee Table
EmpCode PK
EmpName
DateofJoining
Manager FK
Unary M:N
• Employee( E#, Name,…)
• Guaranty( Guarantor, beneficiary)
• There will be two resulting tables. One to represent the entity and another to represent the M:N relationship as follows
Unary M : N
Employee Table
EmpCode PK
EmpName
DateofJoining
Spouse FK
Employee Table
Guarantor PK/FK
Beneficiary PK /FK
Ternary relationship
Ternary relationship
• Represented by a new table
• The new table contains three foreign keys
– one from each of the participating Entities
• The primary key of the new table is the combination of all three foreign keys
Prescription (Doctor#, Patient #, Medicine_Name)
• Descriptive attribute of relationship becomes a separate column of new table
Prescription (Doctor#, Patient #, Medicine_Name, doses, duration, next_visit)
Representing Specialization as Tables
• Method 1 – create a table for the higher level entity
– create a table for each lower level entity set, include primary key of higher level entity set and local attributes
table table attributes
person name, street, city customer name, credit-rating employee name, salary
Drawback - getting information about customer
and employee requires accessing two tables
• Method 2
– create a table for each entity set with all local and inherited attributes
table table attributes
customer name, street, city, credit-rating
employee name, street, city, salary If specialization is total, no need to create table for generalized entity (person)
Drawback - street and city may be stored
redundantly for persons who are both customers
and employees
Representing Aggregation as Tables
Relational Schema Diagram
Car Insurance Company