cis550 handout 41 cis 550 fall 2001 handout 4 object oriented databases
TRANSCRIPT
CIS550 Handout 4 1
CIS 550Fall 2001
Handout 4 Object Oriented Databases
CIS550 Handout 4 2
Why OODBS?• Consider a relational database in which we
have information about Books:
• Problem: Redundancy in design is forced by the fact that we cannot have set valued attributes.
ISBN Title Author Keyword Publisher0-201 Compilers J.D. Ullman Compiler Addison Wesley0-201 Compilers J.D. Ullman Grammar Addison Wesley0-201 Compilers A.V. Aho Compiler Addison Wesley0-201 Compilers A.V. Aho Grammar Addison Wesley0-201 Compilers J.E. Hopcroft Compiler Addison Wesley0-201 Compilers J.E. Hopcroft Grammar Addison Wesley
CIS550 Handout 4 3
“Normalized” Design• We could imagine another design:
ISBN Title Publisher 0-201 Compilers Addison Wesley
ISBN Author ISBN Keyword0-201 J.D. Ullman 0-201 Compiler0-201 A.V. Aho 0-201 Grammar0-201 J.E. Hopcroft
CIS550 Handout 4 4
Problems with NormalizedDesign
• The “object” book has been split over different relations.
• Artificial keys are introduced when modeling sets of sets. E.g. Cars can be be bought with certain features (CD, air conditioning, leather seats, etc). For each car, various packages of these features are available, each package being a particular set of features.
CIS550 Handout 4 5
Problems, cont.
• Impossible to represent order in relational data model
• While SQL can perform “group-by”, it cannot return a grouped result. E.g. “Give me all the authors together with a list of books written by that author”.
• We can call SQL from a “host” PL, but can we call external programs from SQL?
CIS550 Handout 4 6
The natural solution
• This is an example of the nested relational model, in which values may themselves be sets or tuples.
• What we’d like is a type system that allows us freely to combine sets and tuples (non-1NF relations)
• This implies new operators: nest and unnest.
ISBN Title Authors Keywords Publisher 0-201 Compilers {J.D. Ullman, {Compiler, Addison Wesley A.V. Aho, Grammar} J.E. Hopcroft}
CIS550 Handout 4 7
Unnest• Given nested relation R on the previous
slide, we can unnest on one of the set-valued fields:
ISBN Title Author Keyword Publisher 0-201 Compilers J.D. Ullman {Compiler, Addison Wesley Grammar}0-201 Compilers A.V. Aho {Compiler, Addison Wesley Grammar}0-201 Compilers J.E. Hopcroft {Compiler, Addison Wesley Grammar}
:UNNEST (R){Author}
CIS550 Handout 4 8
Nest
• We can also and get back the original relation.
• But this isn't true in general! Consider the following example:
(R){Author}
NEST
A B1 {2,3}1 {3,4}
CIS550 Handout 4 9
Object-orientation• Complex type system, including atomic
types (string, integer, float, char, etc); collection types (sets, bags, lists); and the record type.
• Object identity gives identification of objects independent of value. Contrast this with the notion of a key in the relational model.
• Values with identity are called objects.
CIS550 Handout 4 10
Object-orientation, cont.• Methods model the behavior of an
object. For example, modifying the value of an object is often dangerous, and is encapsulated by methods (e.g. RaiseSalary and MovePoint).
• Types + Identity + Methods are Classes.• Each of these new types gives rise to
new operations on those types(as with nest and unnest)
CIS550 Handout 4 11
Classes, Types and Names
• Consider an schema representing persons, employees and consultants.
2O
class Person inherit Object public type tuple(name:string, male: boolean, spouse: Person, children: list(Person), dob: Date) public method age: integerend;
CIS550 Handout 4 12
Schema, cont.class Employee inherit Person public type tuple(salary: integer, subordinates: unique set(Employee))end;
class Consultant inherit Person public type tuple(phone: integer, rate: integer, num_hours:integer) public method earnings: integerend;
CIS550 Handout 4 13
Schema, cont.
• Note that Date is a user-defined type, and that Persons and Employees are the only entry-points into the database.
type Date tuple(month: integer, day: integer, year: integer)
name Persons: unique set(Person)name Employees: unique set(Employee)
CIS550 Handout 4 14
Select…from…where• Print all females in the database.
• Print the values of all females in the database.
select pfrom p in Personswhere not(p.male)
Result is of type set(Person)
select *pfrom p in Personswhere not(p.male)
Result is of type set(tuple(name:string, male: boolean, spouse: Person, children: list(Person), dob: Date))
CIS550 Handout 4 15
Select…from…where, cont.
• Print the names of all females in the database.
Result is of type set(string).
select distinct p.namefrom p in Personswhere not(p.male)
CIS550 Handout 4 16
More queries...• Print the name and birth-date of all
females in the database who were born in July.
Result is of type set(tuple(name:string, dob: Date))
select struct(name: p.name, dob: p.dob)from p in Personswhere not(p.male) and p.dob.month= 7
CIS550 Handout 4 17
Path expressions• Print the names of all spouses.
• Print the names and dob of all spouses who were born in June.
Result is of type set(tuple(name:string, dob:Date)).
select p.spouse.namefrom p in Persons
select struct(name: p.spouse.name, dob: p.spouse.dob)from p in Personswhere p.spouse.dob.month=6
Result is of type set(string).
CIS550 Handout 4 18
Method Invocation
• Methods are functions which implicitly have as a parameter “self” and optional additional parameters. None of the example methods have any additional parameters.
• Print the names and age of all employees.
select struct(name: e.name, age: e.age())from e in Employees
CIS550 Handout 4 19
Aggregate Operators• Aggregate operator count operates on
collections of elements of any type, min and max operate on collections of ordered types, and sum operates on collections of numbers.
• Print the number of Employees.
• Note the difference from SQL:
count(Employees) Result is of type integer.
select count(*)from Employees
CIS550 Handout 4 20
Aggregate operators, cont.• Print the maximum number of children
of any person.
• Print the name and names of children of the persons with the maximum number of children.
max(select count(p.children) from p in Persons)
select struct(name: p. name, children: select c.name from c in p.children)from p in Persons where count(p.children)= max(select count(p1.children) from p1 in Person)
CIS550 Handout 4 21
Unnesting• Print all pairs (manager, employee) of
managers in Employees.
• Print the names of all subordinates of Joe Brown.
select struct(manager: e.name, employee: s.name)from e in Employees, s in e.subordinateswhere count(e.subordinates) >0
select s.namefrom e in Employees, s in e.subordinateswhere e.name="Joe Brown"
CIS550 Handout 4 22
An incorrect version• Compare the previous query with the
following (incorrect) query:
• The incorrect query would cause a compile time error because subordinates is of type set(Employee), and you can only use projection over a record type.
select e.subordinates.namefrom e in Employees
CIS550 Handout 4 23
Flatten• Flatten gets rid of one level of set
(collection) nesting, e.g. it takes something of type set(set(element)) and produces something of type set(element) by taking the union of all sets of elements. (This is not the same as the “unnest” operator we have discussed.)
flatten({{1,2},{3,4},{1,5}}) = {1,2,3,4,5,1}
CIS550 Handout 4 24
An example...• Print the names of employees who are
supervised by someone.
Note that
is of type set(set(Employees)).
select s.name from s in flatten(select e.supervises from e in Employees)
(select e.supervises from e in Employees)
CIS550 Handout 4 25
Group by• Group by introduces a level of set nesting,
and produces something of type set(tuple(label:..., partition: set(...))).
• For each salary, print the employees earning that salary.
Result is of type set(tuple(salary: integer,
partition: set(struct(e:Employee)))).
select * from e in Employeesgroup by e.salary
CIS550 Handout 4 26
Example, cont.
• We could also have written the previous query as:
select struct(salary: e.salary, partition:(select * from e1 in Employees where e.salary=e1.salary))from e in Employees
CIS550 Handout 4 27
Group by … having• “For each salary earned by more than 2
employees, print the set of names of employees earning that salary.”
• Result is of type set(tuple(salary:integer, name:set(string))).
select struct(salary: e.salary, names: (select e1.name from e1 in partition))from e in Employeegroup by e.salary having count(partition)>2
CIS550 Handout 4 28
Indexing into a list.• Lists are ordered sets, and are treated as
arrays indexed starting from 0. We cannot into sets.
• “Print all first-born children.”
• Result is of type set(Person). Note the explicit test that the list was non-empty. We could also have used count.
select p.children[0]from p in Personwhere p.children<>list()
CIS550 Handout 4 29
Lists, cont.
• “Print the names of all second-born children.” (Recall that list indexing starts with 0.)
• Result is of type set(string).
select p.children[1].namefrom p in Personwhere count(p.children)> 1
CIS550 Handout 4 30
Sort... in… by• “select...from...where” creates a bag of
elements from an input set, “select distinct...from...where” creates a set. To create a list of element as output from an input set, we use “sort...in...by”.
• “Print the names and ages of employees by increasing age.”
• Result type is list(tuple(name:string, age:integer)).
sort x in (select struct(name:e.name, age:e.age()) from e in Employees)by x.age()
CIS550 Handout 4 31
Creating sets from lists• A list can be converted to a set using
listtoset. For example, suppose we want to print each person with children represented as a set rather than a list.
• Result type is set(tuple(name:string,children:set(Person)).
select struct(name:p.name, children: listtoset(p.children))from p in Person
CIS550 Handout 4 32
Set operations: difference
• “Print the employees who are supervised by noone.”
• Result type is set(Employee).
select e from Employees where not(e in flatten(select e1.supervises from e1 in Employees)) OR(select e in Employees) -flatten(select e.supervises from e in Employees)
CIS550 Handout 4 33
Set operations: union• “Print the names of all people who are
either employees or consultants.”
• But we cannot do this because “Consultants” is not a named entry point into our database!
select e.name from e in Employees+ select c.name from c in Consultants
CIS550 Handout 4 34
Casting• Casting is sometimes necessary to support type
inferencing. • For example, suppose we know that all people in
our database are either Employees or Consultants: “Print the names of all consultants.”
• Result type is set(string). The type of the expression “select (Person)e from e in Employees” is set(Person).
select c.name from c in(Persons - (select (Person)e from e in Employees))
CIS550 Handout 4 35
Casting, cont.
• We could also have said
• Since Employee inherits Person, the type of the expression would have been the most general type (Person), and Employee would be implicitly cast to a Person.
Persons - Employees
CIS550 Handout 4 36
Casting, cont.• “Print names and earnings of consultants.”
• Note that
is of type set(Person), and that the Person type does not have the method earnings.
select struct(name:p.name, earnings:((Consultant)p).earnings())from p in (Persons - (select (Person)e from e in Employees))
Persons - (select (Person)e from e in Employees
CIS550 Handout 4 37
Quantifier Expressions• Expressions “exists” and “for all” also
exist in OQL. For example:• “Print the names of people who have had
at least one boy.”
• “Print the names of people who had all boys.”
select struct(name:p.name)from p in Personswhere exists c in p.children: c.male
select struct(name: p.name)from p in Personswhere forall c in p.children: c.male
CIS550 Handout 4 38
Element• Sometimes we want to turn a collection
containing one element into the element type.
• Note that this is not always safe, i.e. testing whether a collection has a single element is not something that can be checked at compile time, so a run-time error may occur.
CIS550 Handout 4 39
Element, an example• “For all people with single-child families, print
the person’s name and their child’s name.”
• Note that this is safe since we explicitly checked the number of children.
select struct(name:p.name, cname: element(select c.name from c in p.children))from p in Personswhere count(p.children)=1
CIS550 Handout 4 40
Element, another example• “Print the tuple of information about John
Kramer.”• If we know “name” is a key for Person,
then the following query will be correct:
• However, we do not really have the ability to specify keys in so there is no way that this can be checked at compile time!
element(select *p from p in Persons where p.name="John Kramer")
2O
CIS550 Handout 4 41
Summary• OQL is a language for complex object
databases. The types include:– base types- integer, string, boolean– tuple type - tuple( )– the collection types - unique set (set), set
(bag) and list – objects (class types)
• For each type, basic operations (including constructor and destructor operations) are defined.
NNlll :,...,:,: 2211
CIS550 Handout 4 42
Summary, cont• For example,
– For objects o, we can dereference (*o) and cast ((Person)o).
– For tuples, we can project over attributes (e.g. p.name).
– For sets, we can select elements, flatten, extract an element from a singleton set, take unions (+ or union), set difference (- or except) and intersection (* or intersect).
– For lists, we can extract indexed elements and create lists by specifying an ordering on elements.