object oriented databases - aminer · pdf fileobject oriented databases connolly & begg....

38
1 Object Oriented Databases Connolly & Begg. Chapters 25-29. Fourth edition COMP 302 Valentina Tamma Objectives Advanced database applications. Unsuitability of RDBMSs for advanced database applications. Object-oriented concepts. Problems of storing objects in relational database. The next generation of database systems. Basics of object-oriented database analysis and design. COMP 302 Valentina Tamma Advanced Database Applications Computer-Aided Design/Manufacturing (CAD/CAM) Computer-Aided Software Engineering (CASE) Network Management Systems Office Information Systems (OIS) and Multimedia Systems Digital Publishing Geographic Information Systems (GIS) Interactive and Dynamic Web sites Other applications with complex and interrelated objects and procedural data. COMP 302 Valentina Tamma Computer-Aided Design (CAD) Stores data relating to mechanical and electrical design, for example, buildings, airplanes, and integrated circuit chips. Designs of this type have some common characteristics: Data has many types, each with a small number of instances. Designs may be very large Design is not static but evolves through time. Updates are far-reaching. Involves version control and configuration management. Cooperative engineering.

Upload: hoangnhu

Post on 21-Mar-2018

221 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

1

Object Oriented Databases

Connolly & Begg. Chapters 25-29. Fourth edition

COMP 302 Valentina Tamma

Objectives

• Advanced database applications.

• Unsuitability of RDBMSs for advanced databaseapplications.

• Object-oriented concepts.

• Problems of storing objects in relational database.

• The next generation of database systems.

• Basics of object-oriented database analysis and design.

COMP 302 Valentina Tamma

Advanced Database Applications

• Computer-Aided Design/Manufacturing (CAD/CAM)• Computer-Aided Software Engineering (CASE)• Network Management Systems• Office Information Systems (OIS) and Multimedia Systems• Digital Publishing• Geographic Information Systems (GIS)• Interactive and Dynamic Web sites• Other applications with complex and interrelated objects and

procedural data.

COMP 302 Valentina Tamma

Computer-Aided Design (CAD)

• Stores data relating to mechanical and electricaldesign, for example, buildings, airplanes, andintegrated circuit chips.

• Designs of this type have some commoncharacteristics:– Data has many types, each with a small number of

instances.– Designs may be very large– Design is not static but evolves through time.– Updates are far-reaching.– Involves version control and configuration management.– Cooperative engineering.

Page 2: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

2

COMP 302 Valentina Tamma

Advanced Database Applications

• Computer-Aided Manufacturing (CAM)– Stores similar data to CAD, plus data about discrete

production.

• Computer-Aided Software Engineering (CASE)– Stores data about stages of software development lifecycle.

COMP 302 Valentina Tamma

Network Management Systems

• Coordinate delivery of communication servicesacross a computer network.

• Perform such tasks as network path management,problem management, and network planning.

• Systems handle complex data and require real-timeperformance and continuous operation.

• To route connections, diagnose problems, andbalance loadings, systems have to be able to movethrough this complex graph in real-time.

COMP 302 Valentina Tamma

Office Information Systems (OIS) and MultimediaSystems

• Stores data relating to computer control of informationin a business, including electronic mail, documents,invoices, and so on.

• Modern systems now handle free-form text,photographs, diagrams, audio and video sequences.

• Documents may have specific structure, perhapsdescribed using mark-up language such as SGML,HTML, or XML.

COMP 302 Valentina Tamma

Digital Publishing

• Becoming possible to store books, journals, papers,and articles electronically and deliver them over high-speed networks to consumers.

• As with OIS, digital publishing is being extended tohandle multimedia documents consisting of text,audio, image, and video data and animation.

• Amount of information available to be put online is inthe order of petabytes (1015 bytes), making themlargest databases DBMS has ever had to manage.

Page 3: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

3

COMP 302 Valentina Tamma

Geographic Information Systems (GIS)

• GIS database stores spatial and temporalinformation, such as that used in land managementand underwater exploration.

• Much of data is derived from survey and satellitephotographs, and tends to be very large.

• Searches may involve identifying features based, forexample, on shape, color, or texture, usingadvanced pattern-recognition techniques.

COMP 302 Valentina Tamma

Interactive and Dynamic Web Sites

• Consider online catalog for selling clothes. Web sitemaintains a set of preferences for previous visitorsand allows a visitor to:– obtain 3D rendering of any item based on color, size,

fabric, etc.;– modify rendering to account for movement, illumination,

backdrop, occasion, etc.;– select accessories to go with the outfit, from items

presented in a sidebar;• Need to handle multimedia content and to

interactively modify display based on userpreferences and user selections. Added complexity ofproviding 3D rendering.

COMP 302 Valentina Tamma

Weaknesses of RDBMSs

• Poor Representation of “Real World” Entities– Normalization leads to relations that do not correspond to

entities in “real world”.

• Semantic Overloading– Relational model has only one construct for representing data

and data relationships: the relation.– Relational model is semantically overloaded.

• Homogeneous Data Structure– Relational model assumes both horizontal and vertical

homogeneity.– Many RDBMSs now allow Binary Large Objects (BLOBs).

COMP 302 Valentina Tamma

Weaknesses of RDBMSs

• Poor Support for Integrity and Enterprise Constraints• Limited Operations

– RDBMs only have a fixed set of operations which cannot beextended.

• Difficulty Handling Recursive Queries– Extremely difficult to produce recursive queries.– Extension proposed to relational algebra to handle this type of

query is unary transitive (recursive) closure operation.

Page 4: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

4

COMP 302 Valentina Tamma

Example - Recursive Query

SELECT managerstaffNoFROM StaffWHERE staffNo=‘S005’UNIONSELECT managerstaffNoFROM StaffWHERE staffNo= (SELECT managerstaffNo FROM Staff WHERE staffNo=‘S005’)

COMP 302 Valentina Tamma

Weaknesses of RDBMSs

• Impedance Mismatch– Most DMLs lack computational completeness.– To overcome this, SQL can be embedded in a high-level 3rd

Generation Language.– This produces an impedance mismatch - mixing different

programming paradigms.– Estimated that as much as 30% of programming effort and

code space is expended on this type of conversion.

• Other Problems with RDBMSs– Transactions are generally short-lived and concurrency

control protocols not suited for long-lived transactions.– Schema changes are difficult.– RDBMSs are poor at navigational access.

COMP 302 Valentina Tamma

Object-Oriented Concepts

• Abstraction, encapsulation, information hiding.• Objects and attributes.• Object identity.• Methods and messages.• Classes, subclasses, superclasses, and inheritance.• Overloading.• Polymorphism and dynamic binding.

COMP 302 Valentina Tamma

Abstraction

• Process of identifying essential aspects of an entityand ignoring unimportant properties.

• Concentrate on what an object is and what it does,before deciding how to implement it.

Page 5: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

5

COMP 302 Valentina Tamma

Encapsulation and Information Hiding

Encapsulation– Object contains both data structure and set of

operations used to manipulate it.

Information Hiding– Separate external aspects of an object from its

internal details, which are hidden from outside.

• Allows internal details of an object to be changed withoutaffecting applications that use it, provided external detailsremain same.

• Provides data independence.

COMP 302 Valentina Tamma

Object

Uniquely identifiable entity that contains both theattributes that describe the state of a real-world objectand the actions associated with it.

– Definition very similar to that of an entity, however, objectencapsulates both state and behavior; an entity only modelsstate.

COMP 302 Valentina Tamma

Attributes

Contain current state of an object.

• Attributes can be classified as simple or complex.• Simple attribute can be a primitive type such as

integer, string, etc., which takes on literal values.• Complex attribute can contain collections and/or

references.• Reference attribute represents relationship.• An object that contains one or more complex attributes

is called a complex object.

COMP 302 Valentina Tamma

Object Identity

Object identifier (OID) assigned to object when it iscreated that is:

– System-generated.– Unique to that object.– Invariant.– Independent of the values of its attributes (that is, its state).– Invisible to the user (ideally).

Page 6: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

6

COMP 302 Valentina Tamma

Object Identity - Implementation

• In RDBMS, object identity is value-based: primary keyis used to provide uniqueness.

• Primary keys do not provide type of object identityrequired in OO systems:– key only unique within a relation, not across entire system;– key generally chosen from attributes of relation, making it

dependent on object state.

COMP 302 Valentina Tamma

Object Identity - Implementation

• Programming languages use variable names andpointers/virtual memory addresses, which alsocompromise object identity.

• In C/C++, OID is physical address in process memoryspace, which is too small - scalability requires thatOIDs be valid across storage volumes, possibly acrossdifferent computers.

• Further, when object is deleted, memory is reused,which may cause problems.

COMP 302 Valentina Tamma

Advantages of OIDs

• They are efficient.• They are fast.• They cannot be modified by the user.• They are independent of content.

COMP 302 Valentina Tamma

Methods and Messages

Method– Defines behavior of an object, as a set of encapsulated

functions.

Message– Request from one object to another asking second object to

execute one of its methods.

Page 7: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

7

COMP 302 Valentina Tamma

Object Showing Attributes and Methods

COMP 302 Valentina Tamma

Example of a Method

COMP 302 Valentina Tamma

Class

Blueprint for defining a set of similar objects.

• Objects in a class are called instances.• Class is also an object with own class attributes and

class methods.• Class different from type

COMP 302 Valentina Tamma

Class Instance Share Attributes and Methods

Page 8: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

8

COMP 302 Valentina Tamma

Subclasses, Superclasses, and Inheritance

Inheritance allows one class of objects to be definedas a special case of a more general class.

• Special cases are subclasses and more generalcases are superclasses.

• Process of forming a superclass is generalization;forming a subclass is specialization.

• Subclass inherits all properties of its superclass andcan define its own unique properties.

• Subclass can redefine inherited methods.

COMP 302 Valentina Tamma

Subclasses, Superclasses, and Inheritance

• All instances of subclass are also instances ofsuperclass.

• Principle of substitutability states that instance ofsubclass can be used whenever method/constructexpects instance of superclass.

• Relationship between subclass and superclass knownas A KIND OF (AKO) relationship.

• Four types of inheritance: single, multiple, repeated,and selective.

COMP 302 Valentina Tamma

Single Inheritance

COMP 302 Valentina Tamma

Multiple Inheritance

Page 9: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

9

COMP 302 Valentina Tamma

Repeated Inheritance

COMP 302 Valentina Tamma

Overriding, Overloading, and Polymorphism

Overriding– Process of redefining a property within a subclass.

Overloading– Allows name of a method to be reused with a class or

across classes.

Polymorphism– Means ‘many forms’. Three types: operation, inclusion,

and parametric.

COMP 302 Valentina Tamma

Example of Overriding

• Might define method in Staff class to increment salarybased on commission:method void giveCommission(float branchProfit) {

salary = salary + 0.02 * branchProfit; }

• May wish to perform different calculation for commissionin Manager subclass:method void giveCommission(float branchProfit) {

salary = salary + 0.05 * branchProfit; }

COMP 302 Valentina Tamma

Overloading Print Method

Page 10: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

10

COMP 302 Valentina Tamma

Dynamic Binding

Runtime process of selecting appropriate methodbased on an object’s type.

• With list consisting of an arbitrary number of objectsfrom the Staff hierarchy, we can write:

list[i]. print

• and runtime system will determine which print()method to invoke depending on the object’s(sub)type.

COMP 302 Valentina Tamma

Complex Objects

An object that consists of subobjects but is viewed asa single object.

• Objects participate in a A-PART-OF (APO)relationship.

• Contained object can be encapsulated within complexobject, accessed by complex object’s methods.

• Or have its own independent existence, and only anOID is stored in complex object.

COMP 302 Valentina Tamma

Object-Oriented Languages

Object-oriented concepts can be used in a database systemin different ways– Object-orientation can be used as a design tool, and be encoded

into, for example, a relational database– Analogous to modeling data with E-R diagram and then converting

to a set of relations)– The concepts of object orientation can be incorporated into a

programming language that is used to manipulate the database. Object-relational systems – add complex types and object-

orientation to relational language. Persistent programming languages – extend object-oriented

programming language to deal with databases by addingconcepts such as persistence and collections.

COMP 302 Valentina Tamma

Persistent Programming Languages

• Persistent Programming languages allow objects to be created and stored ina database, and used directly from a programming language– allow data to be manipulated directly from the programming language

No need to go through SQL.– No need for explicit format (type) changes

format changes are carried out transparently by system Without a persistent programming language, format changes

becomes a burden on the programmer– More code to be written– More chance of bugs

– allow objects to be manipulated in-memory no need to explicitly load from or store to the database

– Saved code, and saved overhead of loading/storing large amountsof data

Page 11: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

11

COMP 302 Valentina Tamma

Persistent Programming Languages

• Drawbacks of persistent programming languages– Due to power of most programming languages, it is easy to make

programming errors that damage the database.– Complexity of languages makes automatic high-level optimization

more difficult.– Do not support declarative querying as well as relational

databases

COMP 302 Valentina Tamma

Storing Objects in Relational Databases

1. One approach to achieving persistence with an OOPL isto use an RDBMS as the underlying storage engine.

2. Requires mapping class instances (i.e. objects) to oneor more tuples distributed over one or more relations.

3. To handle class hierarchy, have two basics tasks toperform:(1) design relations to represent class hierarchy;(2)design how objects will be accessed:

a) Writing Code to decompose the objects into tuples and storethem in relations

b) Writing code to read tuples from the relations andreconstruct the objects

COMP 302 Valentina Tamma

Storing Objects in Relational Databases

COMP 302 Valentina Tamma

Mapping Classes to Relations

Number of strategies for mapping classes to relations, althougheach results in a loss of semantic information (usually on thehierarchical structure).

(1) Map each class or subclass to a relation:

Staff (staffNo, fName, lName, position, sex, DOB, salary)Manager (staffNo, bonus, mgrStartDate)SalesPersonnel (staffNo, salesArea, carAllowance)Secretary (staffNo, typingSpeed)

The datatype of each attribute must be supported by the RDBMS!

Page 12: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

12

COMP 302 Valentina Tamma

Mapping Classes to Relations

(2) Map each subclass to a relationManager (staffNo, fName, lName, position, sex, DOB, salary, bonus, mgrStartDate)SalesPersonnel (staffNo, fName, lName, position, sex, DOB, salary, salesArea,carAllowance)Secretary (staffNo, fName, lName, position, sex, DOB, salary, typingSpeed)

(3) Map the hierarchy to a single relationStaff (staffNo, fName, lName, position, sex, DOB, salary, bonus, mgrStartDate,salesArea, carAllowance, typingSpeed, typeFlag)

TypeFlag is used to distinguish which type each tuple is (1 Manager, 2 SalesStaff,…)

COMP 302 Valentina Tamma

Next Generation Database Systems

First Generation DBMS: Network and Hierarchical– Required complex programs for even simple queries.– Minimal data independence.– No widely accepted theoretical foundation.

Second Generation DBMS: Relational DBMS– Helped overcome these problems.

Third Generation DBMS: OODBMS and ORDBMS.

COMP 302 Valentina Tamma

History of Data Models

Object-Oriented DBMSs – Conceptsand Design

Connolly and Begg, Chapter 26

Page 13: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

13

COMP 302 Valentina Tamma

Object-Oriented Database Design

COMP 302 Valentina Tamma

Relationships

• Relationships represented using reference attributes,typically implemented using OIDs.

• Consider how to represent following binary relationshipsaccording to their cardinality:– 1:1– 1:*– *:*.

COMP 302 Valentina Tamma

1:1 Relationship Between Objects A and B

• Add reference attribute to A and, to maintainreferential integrity, reference attribute to B.

COMP 302 Valentina Tamma

1:* Relationship Between Objects A and B

• Add reference attribute to B and attribute containing setof references to A.

Page 14: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

14

COMP 302 Valentina Tamma

*:* Relationship Between Objects A and B

• Add attribute containing set of references to eachobject.

• For relational database design, would decompose *:Ninto two 1:* relationships linked by intermediate entity.Can also represent this model in an ODBMS.

COMP 302 Valentina Tamma

*:* Relationships

COMP 302 Valentina Tamma

Alternative Design for *:* Relationships

COMP 302 Valentina Tamma

Referential Integrity

Several techniques to handle referential integrity:

• Do not allow user to explicitly delete objects.– System is responsible for “garbage collection”.

• Allow user to delete objects when they are no longerrequired.– System may detect invalid references automatically and

set reference to NULL or disallow the deletion.

Page 15: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

15

COMP 302 Valentina Tamma

Referential Integrity

• Allow user to modify and delete objects andrelationships when they are no longer required.– System automatically maintains the integrity of objects.– Inverse attributes can be used to maintain referential

integrity.

COMP 302 Valentina Tamma

Behavioral Design

• EER approach must be supported with technique thatidentifies behavior of each class (processingrequirements)

• Involves identifying:– public methods: visible to all users– private methods: internal to class.

• Three types of methods:– constructors and destructors– access– transform.

COMP 302 Valentina Tamma

Behavioral Design - Methods

• Constructor - creates new instance of class (with aunique OID).

• Destructor - deletes class instance no longer required.• Access - returns value of one or more attributes (Get).

Can return a single value or a collection.• Transform - changes state of class instance (Put).

COMP 302 Valentina Tamma

Identifying Methods

• Several methodologies for identifying methods, typicallycombine following approaches:– Identify classes and determine methods that may be

usefully provided for each class.– Decompose application in top-down fashion and

determine methods required to provide requiredfunctionality.

Page 16: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

16

COMP 302 Valentina Tamma

Reading material

• Object oriented analysis and design with UML: Connollyand Begg, 4th edition, 25.7

• Slides together with the notes on OO

COMP 302 Valentina Tamma

Objectives

• Framework for an OODM.• Basics of persistent programming languages.• Main points of OODBMS Manifesto.• Main strategies for developing an OODBMS.• Single-level v. two-level storage models.• Pointer swizzling.

COMP 302 Valentina Tamma

Objectives

• How an OODBMS accesses records.• Persistent schemes.• Advantages and disadvantages of orthogonal

persistence.• Issues underlying OODBMSs.• Advantages and disadvantages of OODBMSs.

COMP 302 Valentina Tamma

Object-Oriented Data Model

No one agreed object data model. One definition:

Object-Oriented Data Model (OODM)– Data model that captures semantics of objects supported

in object-oriented programming.

Object-Oriented Database (OODB)– Persistent and sharable collection of objects defined by an

ODM.

Object-Oriented DBMS (OODBMS)– Manager of an ODB.

Page 17: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

17

COMP 302 Valentina Tamma

Object-Oriented Data Model

• Zdonik and Maier present a threshold model that anOODBMS must, at a minimum, satisfy:

– It must provide database functionality.– It must support object identity.– It must provide encapsulation.– It must support objects with complex state.

• Inheritance useful but not essential

COMP 302 Valentina Tamma

Object-Oriented Data Model

• Khoshafian and Abnous define OODBMS as:1. OO = ADTs + Inheritance + Object identity2. OODBMS = OO + Database capabilities.

• Parsaye et al. gives:1. High-level query language with query optimization.2. Support for persistence, atomic transactions: concurrency

and recovery control.3. Support for complex object storage, indexes, and access

methods.4. OODBMS = OO system + (1), (2), and (3).

COMP 302 Valentina Tamma

Commercial OODBMSs

• GemStone from Gemstone Systems Inc.,• Objectivity/DB from Objectivity Inc.,• ObjectStore from Progress Software Corp.,• Ontos from Ontos Inc.,• FastObjects from Poet Software Corp.,• Jasmine from Computer Associates/Fujitsu,• Versant from Versant Corp.

COMP 302 Valentina Tamma

Origins of the Object-Oriented Data Model

Page 18: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

18

COMP 302 Valentina Tamma

Persistent Programming Languages (PPLs)

• Language that provides users with ability to(transparently) preserve data across successiveexecutions of a program, and even allows such data tobe used by many different programs.

• In contrast, database programming language (e.g.SQL) differs by its incorporation of features beyondpersistence, such as transaction management,concurrency control, and recovery.

COMP 302 Valentina Tamma

Persistent Programming Languages (PPLs)

• PPLs eliminate impedance mismatch by extendingprogramming language with database capabilities.– In PPL, language’s type system provides data model,

containing rich structuring mechanisms.

• In some PPLs procedures are ‘first class’ objects andare treated like any other object in language.– Procedures are assignable, may be result of expressions,

other procedures or blocks, and may be elements ofconstructor types.

– Procedures can be used to implement ADTs.

COMP 302 Valentina Tamma

Persistent Programming Languages (PPLs)

• PPL also maintains same data representation inmemory as in persistent store.– Overcomes difficulty and overhead of mapping between

the two representations.• Addition of (transparent) persistence into a PPL is

important enhancement to IDE, and integration of twoparadigms provides more functionality and semantics.

COMP 302 Valentina Tamma

OODBMS Manifesto

• Complex objects must be supported.• Object identity must be supported.• Encapsulation must be supported.• Types or Classes must be supported.• Types or Classes must be able to inherit from their

ancestors.• Dynamic binding must be supported.• The DML must be computationally complete (general

purpose programming language).

Page 19: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

19

COMP 302 Valentina Tamma

OODBMS Manifesto

• The set of data types must be extensible.• Data persistence must be provided.• The DBMS must be capable of managing very large

databases.• The DBMS must support concurrent users.• DBMS must be able to recover from hardware/software

failures.• DBMS must provide a simple way of querying data.

COMP 302 Valentina Tamma

OODBMS Manifesto

• The manifesto proposes the following optionalfeatures:– Multiple inheritance, type checking and type inferencing,

distribution across a network, design transactions andversions.

• No direct mention of support for security, integrity,views or even a declarative query language.

COMP 302 Valentina Tamma

Alternative Strategies for Developing an OODBMS

• Extend existing object-oriented programming language.– GemStone extended Smalltalk.

• Provide extensible OODBMS library.– Approach taken by Ontos, Versant, and ObjectStore.

• Embed OODB language constructs in a conventionalhost language.– Approach taken by O2,which has extensions for C.

COMP 302 Valentina Tamma

Alternative Strategies for Developing an OODBMS

• Extend existing database language with object-oriented capabilities.– Approach being pursued by RDBMS and OODBMS

vendors.– Ontos and Versant provide a version of OSQL.

• Develop a novel database data model/language.

Page 20: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

20

COMP 302 Valentina Tamma

Single-Level v. Two-Level Storage Model

Performance vs ease of use

• Traditional programming languages lack built-insupport for many database features.

• Increasing number of applications now requirefunctionality from both database systems andprogramming languages.

• Such applications need to store and retrieve largeamounts of shared, structured data.

COMP 302 Valentina Tamma

Single-Level v. Two-Level Storage Model

• With a traditional DBMS, programmer has to:– Decide when to read and update objects.– Write code to translate between application’s object

model and the data model of the DBMS.– Perform additional type-checking when object is read

back from database, to guarantee object will conform toits original type.

COMP 302 Valentina Tamma

Single-Level v. Two-Level Storage Model

• Difficulties occur because conventional DBMSs havetwo-level storage model: storage model in memory,and database storage model on disk.

• In contrast, OODBMS gives illusion of single-levelstorage model, with similar representation in bothmemory and in database stored on disk.– Requires clever management of representation of

objects in memory and on disk (called “pointerswizzling”).

COMP 302 Valentina Tamma

Two-Level Storage Model for RDBMS

Page 21: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

21

COMP 302 Valentina Tamma

Single-Level Storage Model for OODBMS

COMP 302 Valentina Tamma

Pointer Swizzling Techniques

The action of converting object identifiers (OIDs) tomain memory pointers.

• Logical OIDs vs Physical OIDs, both different from inmemory pointers

• Aim is to optimize access to objects.• Should be able to locate any referenced objects on

secondary storage using their OIDs.• Once objects have been read into cache, want to

record that objects are now in memory to prevent themfrom being retrieved again.

COMP 302 Valentina Tamma

Pointer Swizzling Techniques

• Could hold lookup table that maps OIDs to memorypointers (e.g. using hashing), but this is slow

• Pointer swizzling attempts to provide a more efficientstrategy by storing memory pointers in the place ofreferenced OIDs, and vice versa when the object iswritten back to disk.

COMP 302 Valentina Tamma

No Swizzling

• Easiest implementation is not to do any swizzling.• Objects faulted into memory, and handle passed to

application containing object’s OID.• OID is used every time the object is accessed.• System must maintain some type of lookup table -

Resident Object Table (ROT) - so that object’s virtualmemory pointer can be located and then used toaccess object.

• Inefficient if same objects are accessed repeatedly.• Acceptable if objects only accessed once.

Page 22: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

22

COMP 302 Valentina Tamma

Resident Object Table (ROT)

COMP 302 Valentina Tamma

Object Referencing

• Need to distinguish between resident and non-residentobjects.

• Most techniques variations of edge marking or nodemarking.

• Virtual memory as a directed graph where objects arenodes, and references the directed edges

• Edge marking marks every object pointer with a tag bit:– if bit set, reference is to memory pointer;– else, still pointing to OID and needs to be swizzled when

object it refers to is faulted into.

COMP 302 Valentina Tamma

Object Referencing

• Node marking requires that all object references areimmediately converted to virtual memory pointerswhen object is faulted into memory.

• First approach is software-based technique butsecond can be implemented using software orhardware-based techniques.

COMP 302 Valentina Tamma

Hardware-Based Schemes

• Use virtual memory access protection violations todetect accesses of non-resident objects.

• Use standard virtual memory hardware to triggertransfer of persistent data from disk to memory.

• Once page has been faulted in, objects are accessedvia normal virtual memory pointers and no furtherobject residency checking is required.

• Avoids overhead of residency checks incurred bysoftware approaches.

Page 23: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

23

COMP 302 Valentina Tamma

Pointer Swizzling - Other Issues

• Three other issues that affect swizzling techniques:

– Copy versus In-Place Swizzling.– Eager versus Lazy Swizzling.– Direct versus Indirect Swizzling.

COMP 302 Valentina Tamma

Copy versus In-Place Swizzling

• When faulting objects in, data can either be copied intoapplication’s local object cache or accessed in-placewithin object manager’s database cache .

• Copy swizzling may be more efficient as, in the worstcase, only modified objects have to be swizzled backto their OIDs.

• In-place may have to unswizzle entire page of objectsif one object on page is modified.

COMP 302 Valentina Tamma

Eager versus Lazy Swizzling

• Moss defines eager swizzling as swizzling all OIDs forpersistent objects on all data pages used byapplication, before any object can be accessed.

• More relaxed definition restricts swizzling to allpersistent OIDs within object the application wishes toaccess.

• Lazy swizzling only swizzles pointers as they areaccessed or discovered.

COMP 302 Valentina Tamma

Direct versus Indirect Swizzling

• Only an issue when swizzled pointer can refer to objectthat is no longer in virtual memory.

• With direct swizzling, virtual memory pointer ofreferenced object is placed directly in swizzled pointer.

• With indirect swizzling, virtual memory pointer is placedin an intermediate object, which acts as a placeholderfor the actual object.– Allows objects to be uncached without requiring swizzled

pointers to be unswizzled.

Page 24: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

24

COMP 302 Valentina Tamma

Accessing an Object with a RDBMS

COMP 302 Valentina Tamma

Accessing an Object with an OODBMS

COMP 302 Valentina Tamma

Persistent Schemes

• Consider three persistent schemes to make objectspersistent:

– Checkpointing.– Serialization.– Explicit Paging.

• Note, persistence can also be applied to (object) codeand to the program execution state.

COMP 302 Valentina Tamma

Checkpointing

• Copy all or part of program’s address space tosecondary storage.

• If complete address space saved, program can restartfrom checkpoint.

• In other cases, only program’s heap saved.• Two main drawbacks:

– Can only be used by program that created it.– May contain large amount of data that is of no use in

subsequent executions.

Page 25: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

25

COMP 302 Valentina Tamma

Serialization

• Copy closure of a data structure to disk.• Write on a data value may involve traversal of graph of

objects reachable from the value, and writing offlattened version of structure to disk.

• Reading back flattened data structure produces newcopy of original data structure.

• Sometimes called serialization, pickling, or in adistributed computing context, marshaling.

COMP 302 Valentina Tamma

Serialization

• Two inherent problems:– Does not preserve object identity.– Not incremental, so saving small changes to a large data

structure is not efficient.

COMP 302 Valentina Tamma

Explicit Paging

• Explicitly ‘page’ objects between application heap andpersistent store.

• Usually requires conversion of object pointers fromdisk-based scheme to memory-based scheme.

• Two common methods for creating/updating persistentobjects:– Reachability-based.– Allocation-based.

COMP 302 Valentina Tamma

Explicit Paging - Reachability-Based Persistence

• Object will persist if it is reachable from a persistent rootobject.

• Programmer does not need to decide at object creationtime whether object should be persistent.

• Object can become persistent by adding it to thereachability tree.

• Maps well onto language that contains garbagecollection mechanism (e.g. Smalltalk or Java).

Page 26: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

26

COMP 302 Valentina Tamma

Explicit Paging - Allocation-Based Persistence

• Object only made persistent if it is explicitly declaredas such within the application program.

• Can be achieved in several ways:

– By class.– By explicit call.

COMP 302 Valentina Tamma

Explicit Paging - Allocation-Based Persistence

• By class– Class is statically declared to be persistent and all

instances made persistent when they are created.– Class may be subclass of system-supplied persistent

class.• By explicit call

– Object may be specified as persistent when it is createdor dynamically at runtime.

COMP 302 Valentina Tamma

Orthogonal Persistence

• Three fundamental principles:

– Persistence independence.– Data type orthogonality.– Transitive persistence (originally referred to as‘persistence identification’ but ODMG term ‘transitivepersistence’ used here).

COMP 302 Valentina Tamma

Persistence Independence

• Persistence of object independent of how programmanipulates that object.

• Conversely, code fragment independent of persistenceof data it manipulates.

• Should be possible to call function with its parameterssometimes objects with long term persistence andsometimes only transient.

• Programmer does not need to control movement ofdata between long-term and short-term storage.

Page 27: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

27

COMP 302 Valentina Tamma

Data Type Orthogonality

• All data objects should be allowed full range ofpersistence irrespective of their type.

• No special cases where object is not allowed to belong-lived or is not allowed to be transient.

• In some PPLs, persistence is quality attributable toonly subset of language data types.

COMP 302 Valentina Tamma

Transitive Persistence

• Choice of how to identify and provide persistent objectsat language level is independent of the choice of datatypes in the language.

• Technique that is now widely used for identification isreachability-based.

COMP 302 Valentina Tamma

Orthogonal Persistence - Advantages

• Improved programmer productivity from simplersemantics.

• Improved maintenance.• Consistent protection mechanisms over whole

environment.• Support for incremental evolution.• Automatic referential integrity.

COMP 302 Valentina Tamma

Orthogonal Persistence - Disadvantages

• Some runtime expense in a system where everypointer reference might be addressing persistentobject.– System required to test if object must be loaded in from

disk-resident database.• Although orthogonal persistence promotes

transparency, system with support for sharing amongconcurrent processes cannot be fully transparent.

Page 28: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

28

COMP 302 Valentina Tamma

Versions

Allows changes to properties of objects to be managedso that object references always point to correct objectversion.

• Itasca identifies 3 types of versions:– Transient Versions.– Working Versions.– Released Versions.

COMP 302 Valentina Tamma

Versions and Configurations

COMP 302 Valentina Tamma

Versions and Configurations

COMP 302 Valentina Tamma

Schema Evolution

• Some applications require considerable flexibility indynamically defining and modifying database schema.

• Typical schema changes:

(1) Changes to class definition:(a) Modifying Attributes.(b) Modifying Methods.

Page 29: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

29

COMP 302 Valentina Tamma

Schema Evolution

(2) Changes to inheritance hierarchy:(a) Making a class S superclass of a class C.(b) Removing S from list of superclasses of C.(c) Modifying order of superclasses of C.

(3) Changes to set of classes, such as creating anddeleting classes and modifying class names.

• Changes must not leave schema inconsistent.

COMP 302 Valentina Tamma

Schema Consistency

1. Resolution of conflicts caused by multipleinheritance and redefinition of attributes andmethods in a subclass.

1.1 Rule of precedence of subclasses oversuperclasses.

1.2 Rule of precedence between superclasses of adifferent origin.

1.3 Rule of precedence between superclasses of thesame origin.

COMP 302 Valentina Tamma

Schema Consistency

2. Propagation of modifications to subclasses.

2.1 Rule for propagation of modifications.2.2 Rule for propagation of modifications in the event

of conflicts.2.3 Rule for modification of domains.

COMP 302 Valentina Tamma

Schema Consistency

3. Aggregation and deletion of inheritancerelationships between classes and creation andremoval of classes.

3.1 Rule for inserting superclasses.3.2 Rule for removing superclasses.3.3 Rule for inserting a class into a schema.3.4 Rule for removing a class from a schema.

Page 30: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

30

COMP 302 Valentina Tamma

Schema Consistency

COMP 302 Valentina Tamma

Client-Server Architecture

• Three basic architectures:

– Object Server.– Page Server.– Database Server.

COMP 302 Valentina Tamma

Object Server

• Distribute processing between the two components.• Typically, client is responsible for transaction

management and interfacing to programming language.• Server responsible for other DBMS functions.• Best for cooperative, object-to-object processing in an

open, distributed environment.

COMP 302 Valentina Tamma

Page and Database Server

Page Server• Most database processing is performed by client.• Server responsible for secondary storage and providing

pages at client’s request.

Database Server• Most database processing performed by server.• Client simply passes requests to server, receives

results and passes them to application.• Approach taken by many RDBMSs.

Page 31: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

31

COMP 302 Valentina Tamma

Client-Server Architecture

COMP 302 Valentina Tamma

Architecture - Storing and Executing Methods

• Two approaches:– Store methods in external files.– Store methods in database.

• Benefits of latter approach:– Eliminates redundant code.– Simplifies modifications.

COMP 302 Valentina Tamma

Architecture - Storing and Executing Methods

– Methods are more secure.– Methods can be shared concurrently.– Improved integrity.

• Obviously, more difficult to implement.

COMP 302 Valentina Tamma

Architecture - Storing and Executing Methods

Page 32: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

32

COMP 302 Valentina Tamma

Benchmarking - Wisconsin benchmark

• Developed to allow comparison of particular DBMSfeatures.

• Consists of set of tests as a single user covering:– updates/deletes involving key and non-key attributes;– projections involving different degrees of duplication in the

attributes and selections with different selectivities onindexed, non-index, and clustered attributes;

– joins with different selectivities;– aggregate functions.

COMP 302 Valentina Tamma

Benchmarking - Wisconsin benchmark

• Original benchmark had 3 relations: one called Onektupwith 1000 tuples, and two others calledTenktup1/Tenktup2 with 10000 tuples.

• Generally useful although does not cater for highlyskewed attribute distributions and join queries used arerelatively simplistic.

• Consortium of manufacturers formed TransactionProcessing Council (TPC) in 1988 to create series oftransaction-based test suites to measure database/TPenvironments.

COMP 302 Valentina Tamma

TPC Benchmarks

• TPC-A and TPC-B for OLTP (now obsolete).• TPC-C replaced TPC-A/B and based on order entry

application.• TPC-H for ad hoc, decision support environments.• TPC-R for business reporting within decision support

environments.• TPC-W, a transactional Web benchmark for

eCommerce.

COMP 302 Valentina Tamma

Object Operations Version 1 (OO1) Benchmark

• Intended as generic measure of OODBMSperformance. Designed to reproduce operationscommon in advanced engineering applications, such asfinding all parts connected to a random part, all partsconnected to one of those parts, and so on, to a depthof seven levels.

• About 1990, benchmark was run on GemStone, Ontos,ObjectStore, Objectivity/DB, and Versant, and INGRESand Sybase. Results showed an average 30-foldperformance improvement for OODBMSs overRDBMSs.

Page 33: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

33

COMP 302 Valentina Tamma

Advantages of OODBMSs

• Enriched Modeling Capabilities.• Extensibility.• Removal of Impedance Mismatch.• More Expressive Query Language.• Support for Schema Evolution.• Support for Long Duration Transactions.• Applicability to Advanced Database Applications.• Improved Performance.

COMP 302 Valentina Tamma

Disadvantages of OODBMSs

• Lack of Universal Data Model.• Lack of Experience.• Lack of Standards.• Query Optimization compromises Encapsulation.• Object Level Locking may impact Performance.• Complexity.• Lack of Support for Views.• Lack of Support for Security.

Additional reading

COMP 302 Valentina Tamma

UML

• Represents unification and evolution of severalOOAD methods, particularly:– Booch method,– Object Modeling Technique (OMT),– Object-Oriented Software Engineering (OOSE).

• Adopted as a standard by OMG and accepted bysoftware community as primary notation for modelingobjects and components.

Page 34: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

34

COMP 302 Valentina Tamma

UML

• Defined as “a standard language for specifying,constructing, visualizing, and documenting the artifactsof a software system”.

• The UML does not prescribe any particularmethodology, but instead is flexible and customizable tofit any approach and can be used in conjunction with awide range of software lifecycles and developmentprocesses.

COMP 302 Valentina Tamma

UML – Design Goals

• Provide ready-to-use, expressive visual modelinglanguage so users can develop and exchangemeaningful models.

• Provide extensibility and specialization mechanisms toextend core concepts.

• Be independent of particular programming languagesand development processes.

• Provide a formal basis for understanding the modelinglanguage.

• Encourage growth of object-oriented tools market.• Support higher-level development concepts such as

collaborations, frameworks, patterns, and components.• Integrate best practices.

COMP 302 Valentina Tamma

UML - Diagrams

• Structural:– class diagrams– object diagrams– component diagrams– deployment diagrams.

• Behavioral:– use case diagrams– sequence diagrams– collaboration diagrams– statechart diagrams– activity diagrams.

COMP 302 Valentina Tamma

UML – Object Diagrams

• Model instances of classes and used to describesystem at a particular point in time.

• Can be used to validate class diagram with “realworld” data and record test cases.

Page 35: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

35

COMP 302 Valentina Tamma

UML – Component Diagrams

• Describe organization and dependencies amongphysical software components, such as source code,run-time (binary) code, and executables.

COMP 302 Valentina Tamma

UML – Deployment Diagrams

• Depict configuration of run-time system, showinghardware nodes, components that run on thesenodes, and connections between nodes.

COMP 302 Valentina Tamma

UML – Use Case Diagrams

• Model functionality provided by system (use cases),users who interact with system (actors), and associationbetween users and the functionality.

• Used in requirements collection and analysis phase torepresent high-level requirements of system.

• More specifically, specifies a sequence of actions,including variants, that system can perform and thatyields an observable result of value to a particular actor.

COMP 302 Valentina Tamma

UML – Use Case Diagrams

Page 36: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

36

COMP 302 Valentina Tamma

UML – Use Case Diagrams

COMP 302 Valentina Tamma

UML – Sequence Diagrams

• Model interactions between objects over time, capturingbehavior of an individual use case.

• Show the objects and the messages that are passedbetween these objects in the use case.

COMP 302 Valentina Tamma

UML – Sequence Diagrams

COMP 302 Valentina Tamma

UML – Collaboration Diagrams

• Show interactions between objects as a series ofsequenced messages.

• Cross between an object diagram and a sequencediagram.

• Unlike sequence diagram, which has column/rowformat, collaboration diagram uses free-formarrangement, which makes it easier to see allinteractions involving a particular object.

Page 37: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

37

COMP 302 Valentina Tamma

UML – Collaboration Diagrams

COMP 302 Valentina Tamma

UML – Statechart Diagrams

• Show how objects can change in response to externalevents.

• Usually model transitions of a specific object.

COMP 302 Valentina Tamma

UML – Activity Diagrams

• Model flow of control from one activity to another.• Typically represent invocation of an operation, a step in

a business process, or an entire business process.• Consist of activity states and transitions between them.

COMP 302 Valentina Tamma

UML – Activity Diagrams

Page 38: Object Oriented Databases - AMiner · PDF fileObject Oriented Databases Connolly & Begg. ... • A m ou ntf inr aio vila ble ep lin is ... –RDBMS sar ep otnv ig l c

38

COMP 302 Valentina Tamma

UML – Usage in Database Design Methodology

• Produce use case diagrams from requirementsspecification or while producing requirementsspecification to depict main functions required ofsystem. Can be augmented with use case descriptions.

• Produce first cut class diagram (ER model).• Produce a sequence diagram for each use case or

group of related use cases.• May be useful to add a control class to class diagram to

represent interface between the actors and the system.

COMP 302 Valentina Tamma

UML – Usage in Database Design Methodology

• Update class diagram to show required methods in eachclass.

• Create state diagram for each class to show how classchanges state in response to messages. Messages areidentified from sequence diagrams.

• Revise earlier diagrams based on new knowledgegained during this process.