4. spatial dbms

Upload: sujan-singh

Post on 08-Jul-2018

214 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/19/2019 4. Spatial DBMS

    1/21

    4. Spatial DBMS 

    Introduction

    Computers and Information Technology have driven the human society towards a comfortable life wherethousands of activities are performed on a click of a button and help save a lot of time. Everyday activities suchas withdrawing money from a bank, making airline reservations, accessing a book from a library etc., are the

    processes that involve computer programs that access databases and database systems to facilitate all theseservices to the users.

    The database systems have been growing and are involved in diverse applications such as:

      A multimedia database to store pictures, video clips, and sound messages.

      A real time database technology to control industrial and manufacturing processes.

      A GIS database used for storing the geographical information.  Other common applications of databases include bank transactions, sales, reservations, registration etc.

    Let us start understanding the world of databases with its functional unit known as data.

    Data is defined as a collection of observations that could be in forms of facts, values, measurement, images etc.The data when used with a context becomes the information. It can be further processed, organized or simplysummarized.

    Given below is a list of numbers. This list of numbers is data since it communicates no meaning as such.

    25

    2324

    27

    As soon as it is qualified that the numbers refer to the temperature recorded in degree Celsius in different citiesof India on a particular day, the numbers become the temperature information.

  • 8/19/2019 4. Spatial DBMS

    2/21

    Date Cities Temperature (ºC)

    23 October 2011

    Delhi 25

    Mumbai 23

    Kolkata 24

    Chennai 27

    Data Storage

    There are two approaches of storing data:

    1.  File based2.  Database

    File based approach

    File-based system is a collection of application programs that perform services for the users with each programdefining and managing its data.

    A flat file is an ordinary file where records of the file do not contain any information to communicate the file

    structure or relationship among the records to the application which is using the file.

    Types of File Structure

     

    Unordered files  Ordered files  Index files

    Unordered files 

    Also known as heap files, the unordered sequential files have the basic type of organization where records areplaced in the file in the order in which they are inserted i.e. new records are inserted at the end of the file.

  • 8/19/2019 4. Spatial DBMS

    3/21

     

    Ordered files 

    The records in such a file can be ordered based on the values of one of their fields known as ordering fields. If

    the ordering field is the field whose value are distinct for each individual entity of the file, then the field is knownas ordering key. Reading the records in order of ordering key values becomes efficient as no sorting is required.

  • 8/19/2019 4. Spatial DBMS

    4/21

     Index Files 

    Indexes are additional access structures which are used to speed up the retrieval of the records in response to asearch condition. These provide alternative ways of accessing the records without affecting the physicalplacement of records.

    Data Structure Models

    Data models are the conceptual models that describe the structures of databases. Structure of a database isdefined by the data types, the constraints and the relationships for the description or storage of data. Followingare the most often used data models:

    Hierarchical Data Structure Model 

    It is the earliest database model that is evolved from file system where records are arranged in a hierarchy or as

    a tree structure. Records are connected through pointers that store the address of the related record. Eachpointer establishes a parent child relationship where a parent can have more than one child but a child can onlyhave one parent. There is no connection between the elements at the same level. To locate a particular record,you have to start at the top of the tree with a parent record and trace down the tree to the child.

    F igur e 3 : Example showing hi erarchical data structure  

  • 8/19/2019 4. Spatial DBMS

    5/21

     The figure above describes the electronic gadgets in day today use. We can see that flash  is a child of mp 3 players, which is a child of portable electronics, which is a child of  electronics. The topmostelement electronicshas no parent.

    Tube, LCD, plasma, CD players and 2 way radios are leaf nodes (don’t have any children). 

     Advantages 

      Easy to understand: The organization of database parallels a family tree understanding which is quiteeasy.

      Accessing records or updating records are very fast since the relationships have been predefined.

    Disadvantages 

      Large index files are to be maintained and certain attribute values are repeated many times which lead todata redundancy and increased storage.

      The rigid structure of this model doesn’t allow alteration of tables, therefore to add a new relationshipentire database is to be redefined.

    Network Data Structure Model 

    A network is a generalized graph that captures relationships between objects using connectivity. A networkdatabase consists of a collection of records that are connected to each other through links. A link is an association

    between two records. It allows each record to have many parents and many children thus allowing a naturalmodel of relationships between entities.

     Advantages   The many to many relationships are easily implemented in a network data model.

      Data access and flexibility in network model is better than that in hierarchical model. An application canaccess an owner record and the member records within a set .

      It enforces data integrity as a user must first define owner record and then the member records.

      The model eliminated redundancy but at the expense of more complicated relationships.

  • 8/19/2019 4. Spatial DBMS

    6/21

    Disadvantages

      The network model has a complex structure that requires familiarity from user’s as well as programmer’send.

    Relational Data Structure Model 

    The relational data model was introduced by Codd in 1970. The relational database relates or connects data indifferent files through the use of a common field. A flat file structure is used with a relational database model. Inthis arrangement, data is stored in different tables made up of rows and columns. The columns of a table arenamed by attributes. Each row in the table is called a tuple and represents a basic fact. No two rows of the sametable may have identical values in all columns.

    There are two crucial data integrity constraints viz. primary key and foreign key. A primary key is an attributewhose value is unique across all tuples (rows) in a relation (table). The primary key of one table appearing as anattribute of another table is known as a foreign key in that table.

  • 8/19/2019 4. Spatial DBMS

    7/21

     

     Advantages 

      The manager or administrator does not have to be aware of any data structure or data pointer. One caneasily add, update, delete or create records using simple logic.

    Disadvantages 

      A few search commands in a relational database require more time to process compared with otherdatabase models.

  • 8/19/2019 4. Spatial DBMS

    8/21

     

    Working with Tables 

    Most databases instead of keeping their data together in a single table, organize the data into multiple tables

    each focusing on a specific topic. A user can link these tables if required information isn’t present in a single table.The records in one table can be associated with records in the other table through a common field. The temporaryassociations can be made by joining and relating the tables.

    Join 

    Joining appends the fields of one table to fields of another through an attribute/field common to both the tables.

    Relate 

    Relate defines a relationship between two tables. Relates are bidirectional which means both tables involved willbe able to use the relate regardless of which table owns the relate. The associated data isn’t appended in the

    table like it is in join. However one can access the related data by selecting a particular record and then going tothe related tables against that record.

  • 8/19/2019 4. Spatial DBMS

    9/21

      Database and Database Management System

    A database is a collection of logically related data. It represents an aspect of a real world and is designed, builtor populated with data for a specific purpose. Many databases exist for many applications, and each one of them

    is maintained by a collection of programs known as a database management system.

    A database management system (DBMS) is a computer program that stores and manages large amounts of data.One can define, construct, edit and share the database among various users and applications.

    A database is defined by the data types, structures, and data constraints that are stored in the database.Constructing the database means storing the data in the database. Manipulating involves querying the databaseto retrieve specific data, updating the database, and generation of reports. Sharing is allowing multiple users andprograms to access the database concurrently. Database contents are divided into two parts viz. schema anddata. Schema is the structure of database. It indicates the rule which data must obey. Data on the other hand

    are the facts.

    Imagine that we want to store the information about the students of a particular class. This information in adatabase would be stored together in a single container called table. The table has rows (with different students)and columns (that contain facts on the students such as studentname, age etc.) The table is

    namedSTUDENT_INFO (see below).

    STUDENT_INFO 

    STUDENTNAME  STUDENTID   AGE   SEX  

    ALICE CE01 18 F

    ANDREW CE02 17 M

    DAVID CE03 18 M

    DONA CE04 18 F

    The schema would define that STUDENT_INFO has four facts/attributes viz. ‘ STUDENTNAME’, ‘STUDENTID’, ‘AGE’, ‘SEX’. To ensure that correct data is filled in all the columns of the table one can also enforce rules andconstraints for data input.

  • 8/19/2019 4. Spatial DBMS

    10/21

    Advantages of DBMS 

      Controlling Redundancy  

    Redundancy means storing the same data multiple times. DBMS checks redundancy and prevents duplication ofefforts, saves storage space and preserves the data files from becoming inconsistent.

      Restricting Unauthorized Access 

    A DBMS provides a security and authorization system, which the database administrator uses to create accounts

    and to specify account restriction.

      Providing Storage Structures for Efficient Query Processing 

    Database systems provide capabilities for efficient execution of queries and updates. Because the database istypically stored on disk, it provides specialized data structures to speed up disk search for the desired records.

    Auxiliary files called indexesare used for this purpose.

      Providing Backup and Recovery  

    The backup and recovery subsystem of the DBMS helps in recovering from hardware or software failures.

      Enforcing Integrity Constraints 

    Most database applications have certain integrity constraints that must be held for the data. A DBMS should

    provide capabilities for defining and enforcing these constraints.

    Along with the advantages, the DBMS usage involves overhead costs that are not incurred in conventional file

    processing. These overhead costs are due to:

      High initial investment in hardware, software and training  Overhead for providing security, concurrency control, recovery and integrity functions

  • 8/19/2019 4. Spatial DBMS

    11/21

    One may use regular files instead of a DBMS under the following circumstances:

      The database and applications are simple, well defined and are not expected to change.  Multiple user access to data isn’t required 

    Database Architecture 

    We can consider the database on three levels of abstraction:

    1.  External Level  refers to user’s view of the database. It describes a part of the database for particulargroup of users. Depending on their needs, different users access different parts of the database. Itemploys a powerful and flexible security mechanism by hiding parts of the database from certain users.

    2.  Conceptual Level refers to the logical structure of the entire database. It describes data as well as the

    relationships among the data.

    3.   Internal Level refers to the details of physical storage of the database on the computer. It consists ofdescription of storage space allocation for data and indexes, record placements and data compression.

    Figure 1 : Three tier architecture of a database 

  • 8/19/2019 4. Spatial DBMS

    12/21

      Entity Relationship Model

    The entity relationship (ER) model represents the conceptual design of a database. The ER diagram helps inunderstanding the components of a database and relationships among them.

    Entity Record 

    An entity is a real world item that exists on its own. The set of all possible values for an entity is the entity type.For example, a particular student such as ‘Ravi Kumar’  is an entity record. Student is the entity type in this case.

    In ER diagram we show entity type as a rectangle containing the type name.

    Attribute

    Properties that describe an entity are known as its attributes. The value of an attribute could be expressed innumbers or in text. The set of all possible values of an attribute is known as attribute domain. In ER diagramattributes are represented by ovals attached to the entity by a line.

    Attributes can be classified as:

    Key attributes: An attribute whose values are distinct for each individual entity record and are used for

    identifying an individual entity record are known as key attributes. For example in the student entity type,StudentID is the key attribute since no two students can have same StudentID.

    A key attribute is underlined in ER diagram.

  • 8/19/2019 4. Spatial DBMS

    13/21

    Non-key attributes : Attributes that are not unique but are used to describe the entities are known as non-keyattributes. Names, age, address of a student are the non key attributes.

    Simple : Attributes that can’t be divided into subparts are called simple attributes. For example StudentIDwhichis just a number is a simple attribute.

    Composite : Attributes that can be divided into subparts with each subpart having their own independentmeaning are composite attributes. For example Name of a student can be divided into two parts i.e. firstnameand last name. This could be illustrated by branching off the components of the attribute.

    Single valued: Attributes that can hold only single value at a time are called single valued attributes.  Age of a

    student can’t have more than one value and hence it is a single valued attribute. 

    Multiple valued: Attributes that can have more than one value are called multiple valued attributes. Forexample the contact number  of a student can have two or more than two phone numbers.

    A multi valued attribute is shown as:

  • 8/19/2019 4. Spatial DBMS

    14/21

    Derived attributes: The attributes that are derived using a mathematical formula and operations on otherattributes are called derived attributes.

    Stored attributes: The attributes from which another attributes can be derived are called stored attributes.Theage of a student can be calculated by counting the number of years starting from his date of birth to the present

    date. In this case age is the derived attribute and date of birth is the stored attribute. In ER diagram a derivedattribute is represented with a dotted oval and a line.

    Relationship 

    A relationship is an association among entity types. It is represented as a diamond in ER diagram.

    For example an entity ‘student’  can be associated with another entity ‘class’  as follows:

     ‘Attends’ is the relationship between the two entities.

    The degree of a relationship type is the number of participating entity types. The above example has degree 2

    and is therefore a binary relationship.

    Cardinality 

    Cardinality denotes the occurrences of data on either side of a relation. The cardinality ratio for a binaryrelationship specifies the maximum number of relationship instances an entity can participate in.

  • 8/19/2019 4. Spatial DBMS

    15/21

    A one to one relationship indicates that a single instance of one entity is associated with a single instance inthe related entity.

    A student performs a particular experiment.

    A one to many or a many to one relationship indicates that a single instance of one entity is associated with

    one or more instances of the related entity.

    A student associated with many projects indicates a one to many relationship, and many students associated

    with a single project indicate a many to one relationship.

    A many to many relationship indicates that either entity participating in the relationship may have manyinstances.

    Every student is attending one or more classes. Every class has one or more students.

  • 8/19/2019 4. Spatial DBMS

    16/21

    Example: The diagram shown below represents the academic functioning of a college. There are five entities viz.Department, Faculty, Student, Course, and Hostel . All the five entities have their own attributes.DNumber,FacultyID, StudentID, CourseID, and HostelID are the key attributes of Department, Faculty, Student,Course and Hostel  respectively. The entities are related to each other and the respective relationships areexplained below:

    A college has many departments. A department would have students as well as faculty. The one to manyrelationship between department and students, and, department and faculty states that a department belongs tomany students and it employs many faculty members. Looking at these relationships in a reverse direction

    conveys that a student as well as a faculty belongs to a single department and thus establishes one to onerelationship.

    A student can register himself into various courses; similarly a course can be studied by many students. A

    student lives in a single hostel but a hostel accommodates many students. A department offers many coursesbut a particular course belongs to a particular department. A faculty teaches many courses but a particularcourse is taught by a single faculty only.

  • 8/19/2019 4. Spatial DBMS

    17/21

     

    F igur e 2 : ER-Diagram showing academic functioning of a college  

  • 8/19/2019 4. Spatial DBMS

    18/21

      Normalization

    Normalization is a design technique which helps designing relational databases. The objective of normalization isto create a set of relational tables that are free of redundant data and to make data consistent.

    First normal form 

      Eliminate repeating groups in individual tables.

      Create a separate table for each set of related data.  Identify each set of related data with a primary key.

    Second normal form 

      Create separate tables for sets of values that apply to multiple records .  Relate these tables with a foreign key

    Third normal form 

      Eliminate fields that do not depend on the key

    An example from Microsoft Support ( http://support.microsoft.com/kb/100139 ) explains the process ofnormalization which is described as following:

    Unnormalized table 

    http://support.microsoft.com/kb/100139http://support.microsoft.com/kb/100139http://support.microsoft.com/kb/100139

  • 8/19/2019 4. Spatial DBMS

    19/21

    The example shows a table (Table1) in which the group ‘class’ is mentioned three times. Since one student hasseveral classes, these classes should be listed in a separate table. Another table in first normal form (Table2) iscreated by eliminating the repeating group (Class#), as shown below:

    First Normalization (No Repeating Groups) 

    Note the Class# values for each Student# value in the above table. Class# is not functionally dependent onStudent# (primary key), so this relationship is not in second normal form. Class# is separated from the firstnormalization table and is placed in another table (Table4).

    Second Normalization (Eliminate Redundant Data) 

  • 8/19/2019 4. Spatial DBMS

    20/21

    The attribute Adv-Room is functionally dependent on the Advisor attribute. It must be moved from the studenttable to some other table let’s say faculty table (Table6). 

    Third Normalization (Eliminate data not dependent on key)  

    Object Oriented Database

    According to Worboys (1995) spatial data can’t be managed properly by relational data model approach. Spatialdata don’t naturally fit into tabular structures. More attention has been given to the development of objectoriented approach of database design. The aim of object oriented model is to allow data modeling which is closer

    to real world. An object-oriented database uses objects as elements within database files. An object is a logicalgrouping of related data that represents a real world entity. Each object is a distinct entity which is identifiedusing a key attribute called ObjectID. The object can be grouped together to form a class. Objects of the sameclass have same attributes, behavior and relationships with other objects.

    The attributes of the object are known as its states. In addition to the states of an object information about itsbehavior is stored. These are the operations that are performed on the object.

    For example ‘hotel’ is an object with states such as name, address, number of rooms. The operations that can beperformed on this object can vary as plotting on map, adding to a database, increasing or decreasing number ofrooms etc. The object hotel belongs to a class called ‘buildings’. Any object in the class hotel will inherit theproperties of the class building. Thus the states and operations applicable to ‘building’ will also apply to ‘hotel’. 

  • 8/19/2019 4. Spatial DBMS

    21/21

    Longley et al. (2001) list the following features of object data models which make them good for modeling GISsystems:

      Encapsulation: packaging together of the description of state and behavior in each object  Inheritance: ability to use some or all characteristics of one object in another object 

    Polymorphism: specific implementation of operations such as create, delete etc for each object.

    References

    http://wofford-ecs.org/dataandvisualization/ermodel/material.htm viewed on 8 November 2011 Oppel, A2010,Databases demystified , 2 nd edn, McGraw Hill, United States

    http://support.microsoft.com/kb/100139 viewed on 14 November 2011 Elmasri, R & Navathe, S2000,Fundamentals of database systems, 3 rd edn, Pearson Education, Singapore

    Goodchild, M.F., Longley, P.A., Maguire, D. J. & Rhind, D.W 2001, Geographic information systems and science,John Wiley & Sons Ltd. , England.

    http://wofford-ecs.org/dataandvisualization/ermodel/material.htm%208%20November%202011http://support.microsoft.com/kb/100139http://support.microsoft.com/kb/100139http://support.microsoft.com/kb/100139http://support.microsoft.com/kb/100139http://wofford-ecs.org/dataandvisualization/ermodel/material.htm%208%20November%202011http://wofford-ecs.org/dataandvisualization/ermodel/material.htm%208%20November%202011