2009.11.03 slide 1is 257 – fall 2009 jdbc and java access to dbms & introduction to data...

63
IS 257 – Fall 2009 2009.11.03 SLIDE 1 JDBC and Java Access to DBMS & Introduction to Data Warehouses University of California, Berkeley School of Information IS 257: Database Management

Post on 21-Dec-2015

215 views

Category:

Documents


1 download

TRANSCRIPT

IS 257 – Fall 2009 2009.11.03 SLIDE 1

JDBC and Java Access to DBMS&

Introduction to Data Warehouses

University of California, Berkeley

School of Information

IS 257: Database Management

IS 257 – Fall 2009 2009.11.03 SLIDE 2

Lecture Outline

• Review:– Object-Relational DBMS– OR features in Oracle– OR features in PostgreSQL– Extending OR databases (examples from

PostgreSQL)

• Java and JDBC

• Introduction to Data Warehouses

IS 257 – Fall 2009 2009.11.03 SLIDE 3

Lecture Outline

• Object-Relational DBMS– OR features in Oracle– OR features in PostgreSQL

• Extending OR databases (examples from PostgreSQL)

• Java and JDBC

• Introduction to Data Warehouses

IS 257 – Fall 2009 2009.11.03 SLIDE 4

Object Relational Data Model

• Class, instance, attribute, method, and integrity constraints

• OID per instance• Encapsulation• Multiple inheritance hierarchy of classes• Class references via OID object

references• Set-Valued attributes• Abstract Data Types

IS 257 – Fall 2009 2009.11.03 SLIDE 5

Object Relational Extended SQL (Illustra)

• CREATE TABLE tablename {OF TYPE Typename}|{OF NEW TYPE typename} (attr1 type1, attr2 type2,…,attrn typen) {UNDER parent_table_name};

• CREATE TYPE typename (attribute_name type_desc, attribute2 type2, …, attrn typen);

• CREATE FUNCTION functionname (type_name, type_name) RETURNS type_name AS sql_statement

IS 257 – Fall 2009 2009.11.03 SLIDE 6

Object-Relational SQL in ORACLE

• CREATE (OR REPLACE) TYPE typename AS OBJECT (attr_name, attr_type, …);

• CREATE TABLE OF typename;

IS 257 – Fall 2009 2009.11.03 SLIDE 7

Example

• CREATE TYPE ANIMAL_TY AS OBJECT (Breed VARCHAR2(25), Name VARCHAR2(25), Birthdate DATE);

• Creates a new type

• CREATE TABLE Animal of Animal_ty;

• Creates “Object Table”

IS 257 – Fall 2009 2009.11.03 SLIDE 8

Constructor Functions

• INSERT INTO Animal values (ANIMAL_TY(‘Mule’, ‘Frances’, TO_DATE(‘01-APR-1997’, ‘DD-MM-YYYY’)));

• Insert a new ANIMAL_TY object into the table

IS 257 – Fall 2009 2009.11.03 SLIDE 9

PostgreSQL Classes

• The fundamental notion in Postgres is that of a class, which is a named collection of object instances. Each instance has the same collection of named attributes, and each attribute is of a specific type. Furthermore, each instance has a permanent object identifier (OID) that is unique throughout the installation. Because SQL syntax refers to tables, we will use the terms table and class interchangeably. Likewise, an SQL row is an instance and SQL columns are attributes.

IS 257 – Fall 2009 2009.11.03 SLIDE 10

Creating a Class

• You can create a new class by specifying the class name, along with all attribute names and their types:

CREATE TABLE weather ( city varchar(80), temp_lo int, -- low temperature temp_hi int, -- high temperature prcp real, -- precipitation date date);

IS 257 – Fall 2009 2009.11.03 SLIDE 11

PostgreSQL

• Postgres can be customized with an arbitrary number of user-defined data types. Consequently, type names are not syntactical keywords, except where required to support special cases in the SQL92 standard.

• So far, the Postgres CREATE command looks exactly like the command used to create a table in a traditional relational system. However, we will presently see that classes have properties that are extensions of the relational model.

IS 257 – Fall 2009 2009.11.03 SLIDE 12

Inheritance

CREATE TABLE cities ( name text, population float, altitude int -- (in ft));

CREATE TABLE capitals ( state char(2)) INHERITS (cities);

IS 257 – Fall 2009 2009.11.03 SLIDE 13

Inheritance

• In Postgres, a class can inherit from zero or more other classes.

• A query can reference either – all instances of a class – or all instances of a class plus all of its

descendants

IS 257 – Fall 2009 2009.11.03 SLIDE 14

Non-Atomic Values - Arrays

• The preceding SQL command will create a class named SAL_EMP with a text string (name), a one-dimensional array of int4 (pay_by_quarter), which represents the employee's salary by quarter and a two-dimensional array of text (schedule), which represents the employee's weekly schedule

• Now we do some INSERTSs; note that when appending to an array, we enclose the values within braces and separate them by commas.

IS 257 – Fall 2009 2009.11.03 SLIDE 15

PostgreSQL Extensibility

• Postgres is extensible because its operation is catalog-driven– RDBMS store information about databases, tables, columns, etc.,

in what are commonly known as system catalogs. (Some systems call this the data dictionary).

• One key difference between Postgres and standard RDBMS is that Postgres stores much more information in its catalogs– not only information about tables and columns, but also

information about its types, functions, access methods, etc.

• These classes can be modified by the user, and since Postgres bases its internal operation on these classes, this means that Postgres can be extended by users– By comparison, conventional database systems can only be

extended by changing hardcoded procedures within the DBMS or by loading modules specially-written by the DBMS vendor.

IS 257 – Fall 2009 2009.11.03 SLIDE 16

Rules System

• CREATE RULE name AS ON event

TO object [ WHERE condition ]

DO [ INSTEAD ] [ action | NOTHING ]

• Rules can be triggered by any event (select, update, delete, etc.)

IS 257 – Fall 2009 2009.11.03 SLIDE 17

Views as Rules

• Views in Postgres are implemented using the rule system. In fact there is absolutely no difference between a

CREATE VIEW myview AS SELECT * FROM mytab;

• compared against the two commands CREATE TABLE myview (same attribute list as

for mytab);CREATE RULE "_RETmyview" AS ON SELECT

TO myview DO INSTEAD SELECT * FROM mytab;

IS 257 – Fall 2009 2009.11.03 SLIDE 18

Extensions to Indexing

• Access Method extensions in Postgres

• GiST: A Generalized Search Trees – Joe Hellerstein, UC Berkeley

IS 257 – Fall 2009 2009.11.03 SLIDE 19

Indexing in OO/OR Systems

• Quick access to user-defined objects

• Support queries natural to the objects

• Two previous approaches– Specialized Indices (“ABCDEFG-trees”)

• redundant code: most trees are very similar• concurrency control, etc. tricky!

– Extensible B-trees & R-trees (Postgres/Illustra)

• B-tree or R-tree lookups only!• E.g. ‘WHERE movie.video < ‘Terminator 2’

IS 257 – Fall 2009 2009.11.03 SLIDE 20

GiST Approach

• A generalized search tree. Must be:

• Extensible in terms of queries

• General (B+-tree, R-tree, etc.)

• Easy to extend

• Efficient (match specialized trees)

• Highly concurrent, recoverable, etc.

IS 257 – Fall 2009 2009.11.03 SLIDE 21

GiST Applications

• New indexes needed for new apps...– find all supersets of S– find all molecules that bind to M– your favorite query here (multimedia?)

• ...and for new queries over old domains:– find all points in region from 12 to 2 o’clock

– find all text elements estimated relevant to a query string

IS 257 – Fall 2009 2009.11.03 SLIDE 22

Lecture Outline

• Review– Object-Relational DBMS– OR features in Oracle– OR features in PostgreSQL– Extending OR databases (examples from

PostgreSQL)

• Java and JDBC

• Introduction to Data Warehouses

IS 257 – Fall 2009 2009.11.03 SLIDE 23

Java and JDBC

• Java is probably the high-level language used in instruction and development today one of the earliest “enterprise” additions to Java was JDBC

• JDBC is an API that provides a mid-level access to DBMS from Java applications

• Intended to be an open cross-platform standard for database access in Java

• Similar in intent to Microsoft’s ODBC

IS 257 – Fall 2009 2009.11.03 SLIDE 24

JDBC Architecture

• The goal of JDBC is to be a generic SQL database access framework that works for any database system with no changes to the interface code

Oracle MySQL Postgres

Java Applications

JDBC API

JDBC Driver Manager

Driver Driver Driver

IS 257 – Fall 2009 2009.11.03 SLIDE 25

JDBC

• Provides a standard set of interfaces for any DBMS with a JDBC driver – using SQL to specify the databases operations.

Resultset

Statement

Resultset Resultset

Connection

PreparedStatement CallableStatement

DriverManager

Oracle Driver ODBC Driver Postgres Driver

Oracle DB Postgres DBODBC DB

Application

IS 257 – Fall 2009 2009.11.03 SLIDE 26

JDBC Simple Java Implementation

import java.sql.*;import oracle.jdbc.*;

public class JDBCSample {

public static void main(java.lang.String[] args) {

try { // this is where the driver is loaded //Class.forName("jdbc.oracle.thin"); DriverManager.registerDriver(new OracleDriver());

}catch (SQLException e) { System.out.println("Unable to load driver Class"); return;}

IS 257 – Fall 2009 2009.11.03 SLIDE 27

JDBC Simple Java Impl.

try { //All DB access is within the try/catch block... // make a connection to ORACLE on Dream Connection con = DriverManager.getConnection(

"jdbc:oracle:thin:@dream.sims.berkeley.edu:1521:dev", “mylogin", “myoraclePW");

// Do an SQL statement... Statement stmt = con.createStatement(); ResultSet rs = stmt.executeQuery("SELECT NAME FROM DIVECUST");

IS 257 – Fall 2009 2009.11.03 SLIDE 28

JDBC Simple Java Impl. // show the Results...

while(rs.next()) {System.out.println(rs.getString("NAME"));

} // Release the database resources... rs.close(); stmt.close(); con.close();}catch (SQLException se) { // inform user of errors... System.out.println("SQL Exception: " + se.getMessage()); se.printStackTrace(System.out);}

}}

IS 257 – Fall 2009 2009.11.03 SLIDE 29

JDBC

• Once a connection has been made you can create three different types of statement objects

• Statement– The basic SQL statement as in the example

• PreparedStatement– A pre-compiled SQL statement

• CallableStatement– Permits access to stored procedures in the

Database

IS 257 – Fall 2009 2009.11.03 SLIDE 30

JDBC Resultset methods

• Next() to loop through rows in the resultset

• To access the attributes of each row you need to know its type, or you can use the generic “getObject()” which wraps the attribute as an object

IS 257 – Fall 2009 2009.11.03 SLIDE 31

JDBC “GetXXX()” methods

SQL data type Java Type GetXXX()

CHAR String getString()

VARCHAR String getString()

LONGVARCHAR String getString()

NUMERIC Java.math.BigDecimal

GetBigDecimal()

DECIMAL Java.math.BigDecimal

GetBigDecimal()

BIT Boolean getBoolean()

TINYINT Byte getByte()

IS 257 – Fall 2009 2009.11.03 SLIDE 32

JDBC GetXXX() Methods

SQL data type Java Type GetXXX()

SMALLINT Integer (short) getShort()

INTEGER Integer getInt()

BIGINT Long getLong()

REAL Float getFloat()

FLOAT Double getDouble()

DOUBLE Double getDouble()

BINARY Byte[] getBytes()

VARBINARY Byte[] getBytes()

LONGVARBINARY Byte[] getBytes()

IS 257 – Fall 2009 2009.11.03 SLIDE 33

JDBC GetXXX() Methods

SQL data type

Java Type GetXXX()

DATE java.sql.Date getDate()

TIME java.sql.Time getTime()

TIMESTAMP Java.sql.Timestamp getTimeStamp()

IS 257 – Fall 2009 2009.11.03 SLIDE 34

Large Object Handling

• Large binary data can be read from a resultset as streams using:– getAsciiStream()– getBinaryStream()– getUnicodeStream()

ResultSet rs = stmt.executeQuery(“SELECT IMAGE FROM PICTURES WHERE PID = 1223”));if (rs.next()) {

BufferedInputStream gifData = new BufferedInputSteam( rs.getBinaryStream(“IMAGE”)); byte[] buf = new byte[4*1024]; // 4K buffer

int len;while ((len = gifData.read(buf,0,buf.length)) != -1) {

out.write(buf, 0, len);}

}

IS 257 – Fall 2009 2009.11.03 SLIDE 35

JDBC Metadata

• There are also methods to access the metadata associated with a resultSet– ResultSetMetaData rsmd = rs.getMetaData();

• Metadata methods include…– getColumnCount();– getColumnLabel(col);– getColumnTypeName(col)

IS 257 – Fall 2009 2009.11.03 SLIDE 36

JDBC access to MySQL

• The basic JDBC interface is the same, the only differences are in how the drivers are loadedpublic class JDBCTestMysql { public static void main(java.lang.String[] args) {

try { // this is where the driver is loaded Class.forName("com.mysql.jdbc.Driver").newInstance();}catch (InstantiationException i) { System.out.println("Unable to load driver Class"); return;}catch (ClassNotFoundException e) { System.out.println("Unable to load driver Class"); …

IS 257 – Fall 2009 2009.11.03 SLIDE 37

JDBC for MySQL

try { //All DB access is within the try/catch block... // make a connection to MySQL on Dream Connection con = DriverManager.getConnection(

"jdbc:mysql://localhost/ (this is really one line) MyDatabase?user=MyLogin&password=MySQLPW");

// Do an SQL statement... Statement stmt = con.createStatement(); ResultSet rs = stmt.executeQuery("SELECT NAME FROM DIVECUST");

• Otherwise everything is the same as in the Oracle example

• For connecting to the machine you are running the program on, you can use “localhost” instead of the machine name

IS 257 – Fall 2009 2009.11.03 SLIDE 38

Demo – JDBC for MySQL

• Demo of JDBC code on Harbinger

• Code is available on class web site

IS 257 – Fall 2009 2009.11.03 SLIDE 39

Lecture Outline

• Review– Object-Relational DBMS– OR features in Oracle– OR features in PostgreSQL– Extending OR databases (examples from

PostgreSQL)

• Java and JDBC

• Introduction to Data Warehouses

IS 257 – Fall 2009 2009.11.03 SLIDE 40

Overview

• Data Warehouses and Merging Information Resources

• What is a Data Warehouse?

• History of Data Warehousing

• Types of Data and Their Uses

IS 257 – Fall 2009 2009.11.03 SLIDE 41

Problem: Heterogeneous Information Sources

“Heterogeneities are everywhere”

Different interfaces Different data representations Duplicate and inconsistent information

PersonalDatabases

Digital Libraries

Scientific DatabasesWorldWideWeb

Slide credit: J. Hammer

IS 257 – Fall 2009 2009.11.03 SLIDE 42

Problem: Data Management in Large Enterprises

• Vertical fragmentation of informational systems (vertical stove pipes)

• Result of application (user)-driven development of operational systems

Sales Administration Finance Manufacturing ...

Sales PlanningStock Mngmt

...

Suppliers

...Debt Mngmt

Num. Control

...Inventory

Slide credit: J. Hammer

IS 257 – Fall 2009 2009.11.03 SLIDE 43

Goal: Unified Access to Data

Integration System

• Collects and combines information• Provides integrated view, uniform user interface• Supports sharing

WorldWideWeb

Digital Libraries Scientific Databases

PersonalDatabases

Slide credit: J. Hammer

IS 257 – Fall 2009 2009.11.03 SLIDE 44

The Traditional Research Approach

Source SourceSource. . .

Integration System

. . .

Metadata

Clients

Wrapper WrapperWrapper

• Query-driven (lazy, on-demand)

Slide credit: J. Hammer

IS 257 – Fall 2009 2009.11.03 SLIDE 45

Disadvantages of Query-Driven Approach

• Delay in query processing– Slow or unavailable information sources– Complex filtering and integration

• Inefficient and potentially expensive for frequent queries

• Competes with local processing at sources

• Hasn’t caught on in industry

Slide credit: J. Hammer

IS 257 – Fall 2009 2009.11.03 SLIDE 46

The Warehousing Approach

DataDataWarehouseWarehouse

Clients

Source SourceSource. . .

Extractor/Monitor

Integration System

. . .

Metadata

Extractor/Monitor

Extractor/Monitor

• Information integrated in advance

• Stored in WH for direct querying and analysis

Slide credit: J. Hammer

IS 257 – Fall 2009 2009.11.03 SLIDE 47

Advantages of Warehousing Approach

• High query performance– But not necessarily most current information

• Doesn’t interfere with local processing at sources– Complex queries at warehouse– OLTP at information sources

• Information copied at warehouse– Can modify, annotate, summarize, restructure, etc.– Can store historical information– Security, no auditing

• Has caught on in industry

Slide credit: J. Hammer

IS 257 – Fall 2009 2009.11.03 SLIDE 48

Not Either-Or Decision

• Query-driven approach still better for– Rapidly changing information– Rapidly changing information sources– Truly vast amounts of data from large

numbers of sources– Clients with unpredictable needs

Slide credit: J. Hammer

IS 257 – Fall 2009 2009.11.03 SLIDE 49

Data Warehouse EvolutionT

IME

200019951990198519801960 1975

Information-Based Management

DataRevolution

“MiddleAges”

“PrehistoricTimes”

RelationalDatabases

PC’s andSpreadsheets

End-userInterfaces

1st DW Article

DWConfs.

Vendor DWFrameworks

CompanyDWs

“Building theDW”

Inmon (1992)Data Replication

Tools

Slide credit: J. Hammer

IS 257 – Fall 2009 2009.11.03 SLIDE 50

What is a Data Warehouse?

“A Data Warehouse is a –subject-oriented,– integrated,– time-variant,–non-volatile

collection of data used in support of management decision making processes.”

-- Inmon & Hackathorn, 1994: viz. Hoffer, Chap 11

IS 257 – Fall 2009 2009.11.03 SLIDE 51

DW Definition…

• Subject-Oriented:– The data warehouse is organized around the

key subjects (or high-level entities) of the enterprise. Major subjects include

• Customers• Patients• Students• Products• Etc.

IS 257 – Fall 2009 2009.11.03 SLIDE 52

DW Definition…

• Integrated– The data housed in the data warehouse are

defined using consistent• Naming conventions• Formats• Encoding Structures• Related Characteristics

IS 257 – Fall 2009 2009.11.03 SLIDE 53

DW Definition…

• Time-variant– The data in the warehouse contain a time

dimension so that they may be used as a historical record of the business

IS 257 – Fall 2009 2009.11.03 SLIDE 54

DW Definition…

• Non-volatile– Data in the data warehouse are loaded and

refreshed from operational systems, but cannot be updated by end-users

IS 257 – Fall 2009 2009.11.03 SLIDE 55

What is a Data Warehouse?A Practitioners Viewpoint

• “A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context.”

• -- Barry Devlin, IBM Consultant

Slide credit: J. Hammer

IS 257 – Fall 2009 2009.11.03 SLIDE 56

A Data Warehouse is...

• Stored collection of diverse data– A solution to data integration problem– Single repository of information

• Subject-oriented– Organized by subject, not by application– Used for analysis, data mining, etc.

• Optimized differently from transaction-oriented db

• User interface aimed at executive decision makers and analysts

IS 257 – Fall 2009 2009.11.03 SLIDE 57

… Cont’d

• Large volume of data (Gb, Tb)• Non-volatile

– Historical– Time attributes are important

• Updates infrequent• May be append-only• Examples

– All transactions ever at WalMart– Complete client histories at insurance firm– Stockbroker financial information and portfolios

Slide credit: J. Hammer

IS 257 – Fall 2009 2009.11.03 SLIDE 58

Warehouse is a Specialized DB

Standard DB• Mostly updates• Many small transactions• Mb - Gb of data• Current snapshot• Index/hash on p.k.• Raw data• Thousands of users (e.g.,

clerical users)

Warehouse• Mostly reads• Queries are long and

complex• Gb - Tb of data• History• Lots of scans• Summarized, reconciled

data• Hundreds of users (e.g.,

decision-makers, analysts)

Slide credit: J. Hammer

IS 257 – Fall 2009 2009.11.03 SLIDE 59

Summary

Operational SystemsEnterpriseModeling

BusinessInformation Guide

DataWarehouse

Catalog

Data WarehousePopulation

DataWarehouse

Business InformationInterface

Slide credit: J. Hammer

IS 257 – Fall 2009 2009.11.03 SLIDE 60

Warehousing and Industry

• Warehousing is big business– $2 billion in 1995– $3.5 billion in early 1997– Predicted: $8 billion in 1998 [Metagroup]

• Wal-Mart is said to have the largest warehouse– 1000-CPU, 583 Terabyte, Teradata system

(InformationWeek, Jan 9, 2006)– “Half a Petabyte” in warehouse (Ziff Davis Internet,

October 13, 2004)– 1 billion rows of data or more are updated every day

(InformationWeek, Jan 9, 2006)– Some Government and Scientific database are larger,

however

Slide credit: J. Hammer

IS 257 – Fall 2009 2009.11.03 SLIDE 61

Other Large Data Warehouses• Not including Wal-Mart and Ebay

(InformationWeek, Jan 9, 2006)

IS 257 – Fall 2009 2009.11.03 SLIDE 62

Types of Data

• Business Data - represents meaning– Real-time data (ultimate source of all business data)– Reconciled data– Derived data

• Metadata - describes meaning– Build-time metadata– Control metadata– Usage metadata

• Data as a product* - intrinsic meaning– Produced and stored for its own intrinsic value– e.g., the contents of a text-book

Slide credit: J. Hammer

IS 257 – Fall 2009 2009.11.03 SLIDE 63

Next Time

• More on Data Warehouses

• Introduction to data mining