database design - for all levels

Upload: sri-janam

Post on 14-Apr-2018

234 views

Category:

Documents


0 download

TRANSCRIPT

  • 7/27/2019 Database Design - For ALL LEVELS

    1/50

    2010 Wipro Ltd - Confidential

    Sandeep Kar

    Database Design

  • 7/27/2019 Database Design - For ALL LEVELS

    2/50

    2010 Wipro Ltd - Confidential2

    Logical Database DesignPhysical Vs Logical Database ModelingNormalization TechniquesDe-normalization TechniquesData Modeling Tool - ERWIN

    Topics We are going to discuss

  • 7/27/2019 Database Design - For ALL LEVELS

    3/50

    2010 Wipro Ltd - Confidential3

    Database design is the process of producing a detailed data model of adatabase.

    Data modeling in software engineering is the process of creating a datamodel by applying formal data model descriptions using data modelingtechniques.

    Data modeling is atechnique for defining business requirements for adatabase.

    Definition

  • 7/27/2019 Database Design - For ALL LEVELS

    4/50

    2010 Wipro Ltd - Confidential4

    Types of Data ModelsTypes of Data Models

    Model

    Conceptual Logical Physical

  • 7/27/2019 Database Design - For ALL LEVELS

    5/50

    2010 Wipro Ltd - Confidential5

    Conceptual Data Model:A conceptual data model identifies thehighest-level relationships between the different entities.

    Logical Data Modelinguses Entity-Relationship Modeling technique(E/R Modeling technique)

    Physical Data Modelinguses Relational Modeling technique

    Logical Data Independence :The ability to change the logical(conceptual) schema without changing the External schema is called logicaldata independence. E.g. Views can shield users from changes in thestructure of the real tables .

    Modeling techniquesModeling techniques

  • 7/27/2019 Database Design - For ALL LEVELS

    6/50

    2010 Wipro Ltd - Confidential6

    Communicate via Business

    Analyst

    Data Modeler DBA Data Architect

    Business

    Analyst

    --- Produces

    Data Requirements

    Use Cases

    --- Data Requirements

    Use Cases

    LDM

    Data Modeler Logical Data Model --- Physical Data Model Logical and PhysicalData Models

    DBA --- Physical Data Model --- Physical Data Model

    Data Architect Data RequirementsUse Cases

    LDM

    Logical and PhysicalData Models Performs the detailedphysical design

    Creates the DDL and/or

    required DML

    Implements the model

    into the database

    Reviews the DataModels and assesses

    the completeness for

    the physical

    implementation

    Roles and responsibilitiesRoles and responsibilities

  • 7/27/2019 Database Design - For ALL LEVELS

    7/50

    2010 Wipro Ltd - Confidential7

    Includes the important entities and the relationships among them.

    No attribute is specified.

    No primary key is specified.

    A conceptualentity-relationship model shows how the business world sees

    information. It suppresses non-critical details in order to emphasize business rules and

    user objects.

    It typically includes only significant entities which have business

    meaning, along with theirrelationships.

    Conceptual Data Model:

  • 7/27/2019 Database Design - For ALL LEVELS

    8/50

    2010 Wipro Ltd - Confidential8

    Logical database designLogical database design

    Derive a logical model from the information represented in theER model (conceptual model)

    Validate the logical model to check if it fulfils clients

    data and

    transaction requirements

    We focus on one type of logical model which is relationalmodel

    Logical model = relational model Relational model means Collection of connected tables Mapping from conceptual model to logical model mainly

    involves

    Designing tables with primary keysAnd linking tables with foreign keys

    Logical modelling is independent of database type

  • 7/27/2019 Database Design - For ALL LEVELS

    9/50

    2010 Wipro Ltd - Confidential9

    Logical Data ModelLogical Data Model

    Creates/

    Iscreatedbya

    Customer

    CUSTOMERNUMBER:Number

    CUSTOMERFIRSTNAME:String

    CUSTOMERLASTNAME:String

    CUSTOMERBIRTHDATE:String

    Transaction

    TRANSACTIONIDENTIFIER:Number

    CUSTOMERNUMBER:Number(FK)TRANSACTIONAMOUNT:Number

    TRANSACTIONDATE:Datetime

  • 7/27/2019 Database Design - For ALL LEVELS

    10/50

    2010 Wipro Ltd - Confidential10

    How to read the cardinalitiesHow to read the cardinalities

    Must have one and only one

    May have one/Must not exceed one/Must have zero or one

    May have zero, one or more

    Must have at least one and possibly more

  • 7/27/2019 Database Design - For ALL LEVELS

    11/50

    2010 Wipro Ltd - Confidential11

    Entity is an LDM construct that has uniformstructure storing data instances in pre-definedformat

    Attribute is part of an entity and stores atomic dataof uniform structure (e.g. number, string etc.)

    Relationship is a modeling object that defines thedependency between two entities

    Logical Data ModelLogical Data Model

  • 7/27/2019 Database Design - For ALL LEVELS

    12/50

    2010 Wipro Ltd - Confidential12

    Individual tables are derived from strong entities(entities with a clear Primary key)

    Fields in the tables are derived from attributesassociated with entities

    Define the data types of the fields Define the primary key of the table

    Unique and Not Null

    Foreign keys are decided later while modelling therelationships Not all tables (relations) have foreign keys However, relation model is incomplete without deciding

    foreign keys

    Entities & their AttributesEntities & their Attributes

  • 7/27/2019 Database Design - For ALL LEVELS

    13/50

    2010 Wipro Ltd - Confidential13

    Table is a physical implementation of entity (s). Ithas uniform structure and comprises of records

    Column is a physical implementation of an attribute,stores a single value (although not necessarily) andhas uniform validation rule (s)

    Reference is a physical implementation of arelationship that defines basic integrity rules in themodel. It can be implemented using the database,

    trigger or application (e.g. Stored Procedure ofexternal application code)

    Physical Data ModelPhysical Data Model

  • 7/27/2019 Database Design - For ALL LEVELS

    14/50

    2010 Wipro Ltd - Confidential14

    Constraints are also defined, including primary keys, foreign keys, otherunique keys, and check constraints.

    Views can be created from database tables to summarize data or to simplyprovide the user with another perspective of certain data.

    When physical modeling occurs, objects are being defined at the schemalevel. A schema is a group of related objects in a database. A databasedesign effort is normally associated with one schema.

    Other objects such as indexes and snapshots can also be defined duringphysical modeling. Physical modeling is when all the pieces come togetherto complete the process of defining a database for a business.

    Physical Database Modeling

  • 7/27/2019 Database Design - For ALL LEVELS

    15/50

    2010 Wipro Ltd - Confidential15

    Physical modeling involves the actual design of a database according to therequirements that were established during logical modeling.

    Logical modeling mainly involves gathering the requirements of thebusiness, with the latter part of logical modeling directed toward the goalsand requirements of the database.

    Physical modeling deals with the conversion of the logical, or businessmodel, into a relational database model.

    During physical modeling, objects such as tables and columns are createdbased on entities and attributes that were defined during logical modeling.

    Logical Vs Physical Database Modeling

  • 7/27/2019 Database Design - For ALL LEVELS

    16/50

    2010 Wipro Ltd - Confidential16

    Identifying Relationship

    Non-Identifying Relationship

    Weak Relationship

    Binary relationships can be

    One-to-one(1:1)

    One-to-many(1:*)

    Many-to-many(*:*)

    Each of these is modelled differently

    Understanding 1:* type is particularly important Many real world relationships are of type 1:*

    Relationships

  • 7/27/2019 Database Design - For ALL LEVELS

    17/50

    2010 Wipro Ltd - Confidential17

    An identifying relationship means that the child table cannot be uniquely

    identified without the parent. For example, you have this situation in theintersection table used to resolve a many-to-many relationship where the

    intersecting table's Primary Key is a composite of the left and right

    (parents) table's Primary Keys. Example...

    Account (AccountID, AccountNum, AccountTypeID)

    PersonAccount (AccountID, PersonID, Balance)

    Person(PersonID, Name)

    The Account to PersonAccount relationship and the Person to

    PersonAccount relationship are identifying because the child row

    (PersonAccount) cannot exist without having been defined in the parent

    (Account or Person). In other words: there is no personaccount when there

    is no Person or when there is no Account.

    Identifying Relationships

  • 7/27/2019 Database Design - For ALL LEVELS

    18/50

    2010 Wipro Ltd - Confidential18

    A non-identifying relationship is one where the child can be identified

    independently of the parent ( Account - AccountType)

    Example...

    Account( AccountID, AccountNum, AccountTypeID )

    AccountType( AccountTypeID, Code, Name, Description )

    The relationship between Account and AccountType is non-identifying

    because each AccountType can be identified without having to exist in the

    parent table.

    Non-Identifying Relationships

  • 7/27/2019 Database Design - For ALL LEVELS

    19/50

    2010 Wipro Ltd - Confidential19

    These are the most common type of relationships Also known as parent:child relationship

    One parent can have many children The entity on the One side of the relationship is known as the Parent

    entity

    The entity on the many side is known as the Child entity Our task: how 1:* relationship between two entities at ER

    Model level is represented in a relational model We assume that both the participating entities are modelled as tables (

    as explained earlier) Do we make any changes to these tables to reflect the relationship

    between them? Yes, we use a foreign keyto mark the relationship

    We make foreign key decision while modelling 1:* relationship

    One-to-many (1:*) relationships

  • 7/27/2019 Database Design - For ALL LEVELS

    20/50

    2010 Wipro Ltd - Confidential20

    In a 1:* relationship Foreign key is designed as a column in the child table (table one the *

    side) Foreign key references the parent table (table on the 1 side)

    In other words, when you post a foreign key to a table it

    means This table is the child table and For every row in the parent table, this table may havemore than one

    (many) corresponding rows

    Create a few rows of data in the tables participating in the 1:*relationship and check if the foreign key is acting as a link for

    information from the child table to the information from theparent table Example data is always useful in designing foreign keys

    Foreign Key Design

  • 7/27/2019 Database Design - For ALL LEVELS

    21/50

    2010 Wipro Ltd - Confidential21

    Consider the 1:* relationship Oversees between Staff and PropertyForRent

    In this case, Staff is the Parent entity

    Because it is on the one side of the relationship PropertyForRent is the child entity

    Because it is one the many side of the relationship

    When we model this relationship at the relational level We assume that Staff and PropertyForRent are modelled as tables as discussed earlier We post a copy of the PrimaryKey, StaffNo from the Parent entity, Staff as a foreign key

    in the child entity,PropertyForRent

    Our final tables are Staff(StaffNo, lName, fName, Position)

    Primary key StaffNo PropertyForRent(PropertyNo, Street, Town, StaffNo)

    Primary key ProperrtyNoForeign key StaffNo references Staff(StaffNo)

    Example

    Staff PropertyForRentOversees0..1 0..*

  • 7/27/2019 Database Design - For ALL LEVELS

    22/50

    2010 Wipro Ltd - Confidential22

    There are two methods to tackle *:* relationships Method: Create a new table to represent the relationship

    We assume that the two entities participating in the relationship arealready modelled as tables as explained earlier

    The third table is created to represent the relationship

    This methods result in similar solutions Three tables, where one of the tables (relationship table) links both

    the entity tables through foreign keys

    Many-to-many (*:*) Relationships

  • 7/27/2019 Database Design - For ALL LEVELS

    23/50

    2010 Wipro Ltd - Confidential23

    Example: Method for modelling *:* relationship

    Here, the *:* relationship between Client and

    PropertyForRent is directly represented as a new table

    viewing

    Primary key for the new entity includes the two foreign

    keys from the two participating entities

  • 7/27/2019 Database Design - For ALL LEVELS

    24/50

    2010 Wipro Ltd - Confidential24

    Generally, in relationship modelling we always identify theparent table Then post a copy of its primary key as the foreign key in the child

    table

    In this case of 1:1, max (cardinality) constraints which are 1:1do not help to identify the parent table

    Therefore we use min (participation) constraints to identifythe parent table

    For example, we choose the entity with min value zero as theparent entity, if the other participating entity has min value ofone

    One-to-one (1:1) relationships

  • 7/27/2019 Database Design - For ALL LEVELS

    25/50

    2010 Wipro Ltd - Confidential25

    Complex relationships too can be simplified into simpler 1:1 or 1:*relationships first and then modelled at the logical level

    Alternatively, a new table can be created to represent a complexrelationship.

    Foreign keys are posted in the new table from all the participating entities

    Complex Relationships

  • 7/27/2019 Database Design - For ALL LEVELS

    26/50

    2010 Wipro Ltd - Confidential26

    Complexity

    Conceptual

    Data Model

    Logical

    Data Model

    Physical

    Data Model

    Dimensional

    Data Model

    Physical

    Data Model

  • 7/27/2019 Database Design - For ALL LEVELS

    27/50

    2010 Wipro Ltd - Confidential27

    Enterprise Data Modeling tools:

    Rational Rose (Object Oriented and Relational modeling)

    Data Modeling tools:

    AllFusion (ERWin)-This tool is a Market leader (according to GG and

    Foresters) Power Designer-This tool is a major contender

    Embarkaderos E/R Studio

    Model repositories:

    Model Mart (ERWin/AllFusion)

    Power Designer Model repository

    Classification of Data Modeling tools

  • 7/27/2019 Database Design - For ALL LEVELS

    28/50

    2010 Wipro Ltd - Confidential28

    Normalization is the process of efficiently organizing data in a database.

    There are two goals of the normalization process:

    eliminating redundant data

    ensuring data dependencies (only storing related data in a table)

    What Is Database Normalization?

  • 7/27/2019 Database Design - For ALL LEVELS

    29/50

    2010 Wipro Ltd - Confidential29

    Decreased storage requirements by effectively choosing the data type

    Faster search performance!

    Smaller file for table scans.

    More directed searching.

    Improved data integrity!

    What are the Benefits of Database Normalization?

  • 7/27/2019 Database Design - For ALL LEVELS

    30/50

    2010 Wipro Ltd - Confidential30

    First Normal Form (1NF)

    Second Normal Form (2NF)

    Third Normal Form (3NF)

    Boyce-Codd Normal Form (BCNF)

    Fourth Normal Form (4NF) Fifth Normal Form (5NF)

    What are the Normal Forms?

  • 7/27/2019 Database Design - For ALL LEVELS

    31/50

    2010 Wipro Ltd - Confidential31

    Our Table

    name phone1 phone2 email1 email2

    Mike Hillyer 403-555-1717 403-555-1919 [email protected] [email protected]

    Tom Jensen 403-555-1919 403-555-1313 [email protected] [email protected]

    Ray Smith 403-555-1919 403-555-1111 [email protected]

    user

    name

    nickname

    phone1

    phone2

    phone3

    cell

    pager

    address

    city

    province

    postal_code

    country

    email1

    email2web_url

    company

    department

    picture

    notes

    email_format

  • 7/27/2019 Database Design - For ALL LEVELS

    32/50

    2010 Wipro Ltd - Confidential32

    Eliminate duplicative columns from the same table.

    Create separate tables for each group of related data and identify eachrow with a unique column or set of columns (the primary key).

    Benefits

    Easier to query/sort the data

    More scalable

    Each row can be identified for updating

    First Normal Form

  • 7/27/2019 Database Design - For ALL LEVELS

    33/50

    2010 Wipro Ltd - Confidential33

    One Solution

    first_name last_name phone email

    Mike Hillyer 403-555-1717 [email protected]

    Mike Hillyer 403-555-1919 [email protected]

    Tom Jensen 403-555-1919 [email protected]

    Tom Jensen 403-555-1313 [email protected]

    Ray Smith 403-555-1919 [email protected]

    Ray Smith 403-555-1111

    Multiple rows per user

    Emails are associated with only one other

    phone

    Hard to Search

    user

    first_name

    last_name

    nickname

    phonecell

    pager

    address

    city

    province

    postal_code

    country

    web_url

    department

    picture

    notes

  • 7/27/2019 Database Design - For ALL LEVELS

    34/50

    2010 Wipro Ltd - Confidential34

    Satisfying 1NF

    user

    PK user_id

    first_name

    last_namenickname

    address

    city

    province

    postal_code

    country

    web_url

    company

    department

    picture

    notes

    phone

    PK phone_id

    country_code

    number

    extension

    email

    PK email_id

    address

  • 7/27/2019 Database Design - For ALL LEVELS

    35/50

    2010 Wipro Ltd - Confidential35

    Three Forms

    One to (zero or) One

    One to (zero or) Many

    Many to Many

    One to One

    Same Table?

    One to Many

    Place PK of the One in the Many

    Many to Many

    Create a joining table

    Forming Relationships

  • 7/27/2019 Database Design - For ALL LEVELS

    36/50

    2010 Wipro Ltd - Confidential36

    Joining Tables

    user

    PK user_id

    first_name

    last_name

    nickname

    address

    city

    province

    postal_code

    countryweb_url

    picturenotes

    email_format

    email

    PK address

    FK1 user_id

    user_phone

    PK,FK1 phone_idPK user_id

    type

    phone

    PK phone_id

    country_code

    number

    extension

  • 7/27/2019 Database Design - For ALL LEVELS

    37/50

    2010 Wipro Ltd - Confidential37

    user

    PK user_id

    first_name

    last_name

    nickname

    address

    city

    province

    postal_code

    country

    web_url

    picture

    notes

    email_format

    email

    PK address

    FK1 user_id

    user_phone

    PK,FK1 phone_id

    PK user_id

    type

    phone

    PK phone_id

    country_code

    number

    extension

    Our User Table

    first_name last_name company department

    Mike Hillyer MySQL Documentation

    Tom Jensen CPNS Finance

    Ray Smith CPNS Documentation

  • 7/27/2019 Database Design - For ALL LEVELS

    38/50

    2010 Wipro Ltd - Confidential38

    Meet all the requirements of the first normal form.

    Remove subsets of data that apply to multiple rows of a table and place

    them in separate tables.

    Create relationships between these new tables and their predecessors

    through the use of foreign keys.

    Benefits

    Increased storage efficiency

    Less data repetition

    Second Normal Form

  • 7/27/2019 Database Design - For ALL LEVELS

    39/50

    2010 Wipro Ltd - Confidential39

    Satisfying 2NF

    email

    PK address

    type

    FK1 user_id

    phone

    PK phone_id

    country_code

    number

    extension

    type

    user

    PK user_id

    first_name

    last_name

    nickname

    address

    city

    province

    postal_code

    country

    web_url

    picture

    notes

    user_phone

    PK,FK1 user_id

    PK,FK2 phone_id

    company

    PK company_id

    name

    user_company

    PK,FK1 user_id

    PK,FK2 company_id

    department

    email

    PK address

    FK1 user_id

    user

    PK user_id

    first_name

    last_name

    nickname

    address

    city

    province

    postal_code

    country

    web_url

    picture

    notes

    email_format

  • 7/27/2019 Database Design - For ALL LEVELS

    40/50

    2010 Wipro Ltd - Confidential40

    Table must be in Second Normal Form

    If your table is 2NF, there is a good chance it is 3NF

    Remove columns that are not dependent upon the primary key.

    All attributes that are not part of the key must not depend on

    any other non-key attributes.

    Benefits

    No extraneous data

    Third Normal Form

  • 7/27/2019 Database Design - For ALL LEVELS

    41/50

    2010 Wipro Ltd - Confidential41

    Satisfying 3NF

    email

    PK address

    FK1 user_id

    format

    phone

    PK phone_id

    country_codenumber

    type

    user

    PK user_id

    first_namelast_name

    nickname

    address

    city

    province

    postal_code

    country

    web_url

    picture

    notes

    user_phone

    PK,FK1 user_id

    PK,FK2 phone_id

    extension

    company

    PK company_id

    name

    user_company

    PK,FK1 user_id

    PK,FK2 company_id

    department

  • 7/27/2019 Database Design - For ALL LEVELS

    42/50

    2010 Wipro Ltd - Confidential42

    Finding Balance

    email

    PK address

    FK1 user_id

    format

    phone

    PK phone_id

    FK1 type_id

    area_code

    NXX

    NCXFK2 country_id

    user

    PK user_id

    first_name

    last_name

    nickname

    unit

    street_number

    street_name

    street_type

    quadrant

    web_url

    picture

    notes

    FK1 postal_code

    user_phone

    PK,FK1 user_id

    PK,FK2 phone_id

    extension

    company

    PK company_id

    name

    user_department

    PK,FK1 user_id

    PK,FK2 department_id

    type

    PK type_id

    type

    country

    PK country_id

    Name

    phone_code

    department

    PK department_id

    name

    FK1 company_id

    postal_code

    PK postal_code

    FK1 city_id

    province

    PK province_id

    Name

    Abbreviation

    FK1 country_id

    city

    PK city_id

    name

    FK1 province_id

  • 7/27/2019 Database Design - For ALL LEVELS

    43/50

    2010 Wipro Ltd - Confidential43

    Only one valid reason exists for de-normalizing a relational design - to

    enhance performance.

    De-normalization is the process of putting one fact in numerous places.

    This speeds data retrieval at the expense of data modification.

    Normalize first, then de-normalize

    Use only when you cannot optimize

    Try temp tables, UNIONs, VIEWs,

    Always monitor and periodically re-evaluate all de-normalized

    applications.

    I/O saved , CPU saved , complexity of update programming , cost of

    returning to a normalized design

    Goal of de-normalization

  • 7/27/2019 Database Design - For ALL LEVELS

    44/50

    2010 Wipro Ltd - Confidential44

    Types of De-normalization

  • 7/27/2019 Database Design - For ALL LEVELS

    45/50

    2010 Wipro Ltd - Confidential45

    Surrogate Key: A unique {primary key}generated by the {RDBMS}that isnot derived from any data in the database and whose only significance is toact as the primary key. A surrogate key is frequently a sequential number(e.g. a {Sybase}"{identity column})

    Candidate Key: is a combination of attributes that can be uniquely used to

    identify a database record without any extraneous data. Each table mayhave one or more candidate keys. One of these candidate keys is selectedas the table primary key.

    Vertical slicing: A Projection is a vertical slicing of the table. You simplyindicate which columns you want.

    Horizontal slicing: A Selection is a horizontal slicing of the table. Theselection defines which records you returned out of all possible records inthe table. The projection is defined in the WHERE clause of the SQLstatement.

    Concepts

  • 7/27/2019 Database Design - For ALL LEVELS

    46/50

    2010 Wipro Ltd - Confidential46

    Using the Data Modeling tools

    CASE tool

    repository

    Target

    DB

    CASE

    Tool

    Model

    Meta-model

    DB

    Meta-model

    CompareGenerate

    DDL

  • 7/27/2019 Database Design - For ALL LEVELS

    47/50

    2010 Wipro Ltd - Confidential47

    Purpose of Model Mart in Erwin is to allow centralized sharing of models

    automate programming tasks and forward and reverse engineer databases

    include data types, macros and model storage that a data modeler must

    address.

    We can generate the Erwin met model to create a data dictionary that

    stores information about the data structures used in Erwin models

    Attribute Name as it appears in the logical model.

    Data type lists the default data type assigned to the table column. The

    actual data type used is dependent on which target server you use to hold

    the Erwin dictionary.

    Description indicates the type of information stored by the attribute.

    Valid Values lists the valid values, the reference table that contains thevalid value list, or validation expression for the attribute. Where no

    validation rule has been defined, the data must conform to the attributes

    data type.

    FK indicates if the attribute is a foreign key (FK) attribute. This column is

    blank for non-foreign key attributes.

    Data Modeling Tool - ERWIN

  • 7/27/2019 Database Design - For ALL LEVELS

    48/50

    2010 Wipro Ltd - Confidential48

    Pick the topic for analysis

    Create the Conceptual Data Model

    Create LDM

    Validate the LDM (Formal validation, normalization etc.)

    Perform the transition from LDM to PDM using ERWin

    Create PDM

    Perform the Physical design (indexes, table spaces, buffer pools, partitioning etc.) Validate the PDM (make sure that de-normalization did not change the data integirty,

    validate reference implementation through RI)

    Perform the model walkthrough (attack the PDM to see if we have any holes)

    Implement OLTP model (enjoy the work of DBA)

    Play with Data Model/Database to see what we can/cant do (experience the SQL againstour masterpiece)

    Define the requirements for Decision support system (understand what Data Analyticsmeans and why we have to create a separate data model for this)

    Create DSS model (Star, Snow flake, Constellation, Data Mart, Data Warehouse)

    PlanPlan

  • 7/27/2019 Database Design - For ALL LEVELS

    49/50

    Questions?

  • 7/27/2019 Database Design - For ALL LEVELS

    50/50

    2010 Wipro Ltd - Confidential

    Thank You

    Sandeep Kar

    Module Leader

    [email protected]