2009.11.03 slide 1is 257 – fall 2009 jdbc and java access to dbms & introduction to data...
Post on 21-Dec-2015
215 views
TRANSCRIPT
IS 257 – Fall 2009 2009.11.03 SLIDE 1
JDBC and Java Access to DBMS&
Introduction to Data Warehouses
University of California, Berkeley
School of Information
IS 257: Database Management
IS 257 – Fall 2009 2009.11.03 SLIDE 2
Lecture Outline
• Review:– Object-Relational DBMS– OR features in Oracle– OR features in PostgreSQL– Extending OR databases (examples from
PostgreSQL)
• Java and JDBC
• Introduction to Data Warehouses
IS 257 – Fall 2009 2009.11.03 SLIDE 3
Lecture Outline
• Object-Relational DBMS– OR features in Oracle– OR features in PostgreSQL
• Extending OR databases (examples from PostgreSQL)
• Java and JDBC
• Introduction to Data Warehouses
IS 257 – Fall 2009 2009.11.03 SLIDE 4
Object Relational Data Model
• Class, instance, attribute, method, and integrity constraints
• OID per instance• Encapsulation• Multiple inheritance hierarchy of classes• Class references via OID object
references• Set-Valued attributes• Abstract Data Types
IS 257 – Fall 2009 2009.11.03 SLIDE 5
Object Relational Extended SQL (Illustra)
• CREATE TABLE tablename {OF TYPE Typename}|{OF NEW TYPE typename} (attr1 type1, attr2 type2,…,attrn typen) {UNDER parent_table_name};
• CREATE TYPE typename (attribute_name type_desc, attribute2 type2, …, attrn typen);
• CREATE FUNCTION functionname (type_name, type_name) RETURNS type_name AS sql_statement
IS 257 – Fall 2009 2009.11.03 SLIDE 6
Object-Relational SQL in ORACLE
• CREATE (OR REPLACE) TYPE typename AS OBJECT (attr_name, attr_type, …);
• CREATE TABLE OF typename;
IS 257 – Fall 2009 2009.11.03 SLIDE 7
Example
• CREATE TYPE ANIMAL_TY AS OBJECT (Breed VARCHAR2(25), Name VARCHAR2(25), Birthdate DATE);
• Creates a new type
• CREATE TABLE Animal of Animal_ty;
• Creates “Object Table”
IS 257 – Fall 2009 2009.11.03 SLIDE 8
Constructor Functions
• INSERT INTO Animal values (ANIMAL_TY(‘Mule’, ‘Frances’, TO_DATE(‘01-APR-1997’, ‘DD-MM-YYYY’)));
• Insert a new ANIMAL_TY object into the table
IS 257 – Fall 2009 2009.11.03 SLIDE 9
PostgreSQL Classes
• The fundamental notion in Postgres is that of a class, which is a named collection of object instances. Each instance has the same collection of named attributes, and each attribute is of a specific type. Furthermore, each instance has a permanent object identifier (OID) that is unique throughout the installation. Because SQL syntax refers to tables, we will use the terms table and class interchangeably. Likewise, an SQL row is an instance and SQL columns are attributes.
IS 257 – Fall 2009 2009.11.03 SLIDE 10
Creating a Class
• You can create a new class by specifying the class name, along with all attribute names and their types:
CREATE TABLE weather ( city varchar(80), temp_lo int, -- low temperature temp_hi int, -- high temperature prcp real, -- precipitation date date);
IS 257 – Fall 2009 2009.11.03 SLIDE 11
PostgreSQL
• Postgres can be customized with an arbitrary number of user-defined data types. Consequently, type names are not syntactical keywords, except where required to support special cases in the SQL92 standard.
• So far, the Postgres CREATE command looks exactly like the command used to create a table in a traditional relational system. However, we will presently see that classes have properties that are extensions of the relational model.
IS 257 – Fall 2009 2009.11.03 SLIDE 12
Inheritance
CREATE TABLE cities ( name text, population float, altitude int -- (in ft));
CREATE TABLE capitals ( state char(2)) INHERITS (cities);
IS 257 – Fall 2009 2009.11.03 SLIDE 13
Inheritance
• In Postgres, a class can inherit from zero or more other classes.
• A query can reference either – all instances of a class – or all instances of a class plus all of its
descendants
IS 257 – Fall 2009 2009.11.03 SLIDE 14
Non-Atomic Values - Arrays
• The preceding SQL command will create a class named SAL_EMP with a text string (name), a one-dimensional array of int4 (pay_by_quarter), which represents the employee's salary by quarter and a two-dimensional array of text (schedule), which represents the employee's weekly schedule
• Now we do some INSERTSs; note that when appending to an array, we enclose the values within braces and separate them by commas.
IS 257 – Fall 2009 2009.11.03 SLIDE 15
PostgreSQL Extensibility
• Postgres is extensible because its operation is catalog-driven– RDBMS store information about databases, tables, columns, etc.,
in what are commonly known as system catalogs. (Some systems call this the data dictionary).
• One key difference between Postgres and standard RDBMS is that Postgres stores much more information in its catalogs– not only information about tables and columns, but also
information about its types, functions, access methods, etc.
• These classes can be modified by the user, and since Postgres bases its internal operation on these classes, this means that Postgres can be extended by users– By comparison, conventional database systems can only be
extended by changing hardcoded procedures within the DBMS or by loading modules specially-written by the DBMS vendor.
IS 257 – Fall 2009 2009.11.03 SLIDE 16
Rules System
• CREATE RULE name AS ON event
TO object [ WHERE condition ]
DO [ INSTEAD ] [ action | NOTHING ]
• Rules can be triggered by any event (select, update, delete, etc.)
IS 257 – Fall 2009 2009.11.03 SLIDE 17
Views as Rules
• Views in Postgres are implemented using the rule system. In fact there is absolutely no difference between a
CREATE VIEW myview AS SELECT * FROM mytab;
• compared against the two commands CREATE TABLE myview (same attribute list as
for mytab);CREATE RULE "_RETmyview" AS ON SELECT
TO myview DO INSTEAD SELECT * FROM mytab;
IS 257 – Fall 2009 2009.11.03 SLIDE 18
Extensions to Indexing
• Access Method extensions in Postgres
• GiST: A Generalized Search Trees – Joe Hellerstein, UC Berkeley
IS 257 – Fall 2009 2009.11.03 SLIDE 19
Indexing in OO/OR Systems
• Quick access to user-defined objects
• Support queries natural to the objects
• Two previous approaches– Specialized Indices (“ABCDEFG-trees”)
• redundant code: most trees are very similar• concurrency control, etc. tricky!
– Extensible B-trees & R-trees (Postgres/Illustra)
• B-tree or R-tree lookups only!• E.g. ‘WHERE movie.video < ‘Terminator 2’
IS 257 – Fall 2009 2009.11.03 SLIDE 20
GiST Approach
• A generalized search tree. Must be:
• Extensible in terms of queries
• General (B+-tree, R-tree, etc.)
• Easy to extend
• Efficient (match specialized trees)
• Highly concurrent, recoverable, etc.
IS 257 – Fall 2009 2009.11.03 SLIDE 21
GiST Applications
• New indexes needed for new apps...– find all supersets of S– find all molecules that bind to M– your favorite query here (multimedia?)
• ...and for new queries over old domains:– find all points in region from 12 to 2 o’clock
– find all text elements estimated relevant to a query string
IS 257 – Fall 2009 2009.11.03 SLIDE 22
Lecture Outline
• Review– Object-Relational DBMS– OR features in Oracle– OR features in PostgreSQL– Extending OR databases (examples from
PostgreSQL)
• Java and JDBC
• Introduction to Data Warehouses
IS 257 – Fall 2009 2009.11.03 SLIDE 23
Java and JDBC
• Java is probably the high-level language used in instruction and development today one of the earliest “enterprise” additions to Java was JDBC
• JDBC is an API that provides a mid-level access to DBMS from Java applications
• Intended to be an open cross-platform standard for database access in Java
• Similar in intent to Microsoft’s ODBC
IS 257 – Fall 2009 2009.11.03 SLIDE 24
JDBC Architecture
• The goal of JDBC is to be a generic SQL database access framework that works for any database system with no changes to the interface code
Oracle MySQL Postgres
Java Applications
JDBC API
JDBC Driver Manager
Driver Driver Driver
IS 257 – Fall 2009 2009.11.03 SLIDE 25
JDBC
• Provides a standard set of interfaces for any DBMS with a JDBC driver – using SQL to specify the databases operations.
Resultset
Statement
Resultset Resultset
Connection
PreparedStatement CallableStatement
DriverManager
Oracle Driver ODBC Driver Postgres Driver
Oracle DB Postgres DBODBC DB
Application
IS 257 – Fall 2009 2009.11.03 SLIDE 26
JDBC Simple Java Implementation
import java.sql.*;import oracle.jdbc.*;
public class JDBCSample {
public static void main(java.lang.String[] args) {
try { // this is where the driver is loaded //Class.forName("jdbc.oracle.thin"); DriverManager.registerDriver(new OracleDriver());
}catch (SQLException e) { System.out.println("Unable to load driver Class"); return;}
IS 257 – Fall 2009 2009.11.03 SLIDE 27
JDBC Simple Java Impl.
try { //All DB access is within the try/catch block... // make a connection to ORACLE on Dream Connection con = DriverManager.getConnection(
"jdbc:oracle:thin:@dream.sims.berkeley.edu:1521:dev", “mylogin", “myoraclePW");
// Do an SQL statement... Statement stmt = con.createStatement(); ResultSet rs = stmt.executeQuery("SELECT NAME FROM DIVECUST");
IS 257 – Fall 2009 2009.11.03 SLIDE 28
JDBC Simple Java Impl. // show the Results...
while(rs.next()) {System.out.println(rs.getString("NAME"));
} // Release the database resources... rs.close(); stmt.close(); con.close();}catch (SQLException se) { // inform user of errors... System.out.println("SQL Exception: " + se.getMessage()); se.printStackTrace(System.out);}
}}
IS 257 – Fall 2009 2009.11.03 SLIDE 29
JDBC
• Once a connection has been made you can create three different types of statement objects
• Statement– The basic SQL statement as in the example
• PreparedStatement– A pre-compiled SQL statement
• CallableStatement– Permits access to stored procedures in the
Database
IS 257 – Fall 2009 2009.11.03 SLIDE 30
JDBC Resultset methods
• Next() to loop through rows in the resultset
• To access the attributes of each row you need to know its type, or you can use the generic “getObject()” which wraps the attribute as an object
IS 257 – Fall 2009 2009.11.03 SLIDE 31
JDBC “GetXXX()” methods
SQL data type Java Type GetXXX()
CHAR String getString()
VARCHAR String getString()
LONGVARCHAR String getString()
NUMERIC Java.math.BigDecimal
GetBigDecimal()
DECIMAL Java.math.BigDecimal
GetBigDecimal()
BIT Boolean getBoolean()
TINYINT Byte getByte()
IS 257 – Fall 2009 2009.11.03 SLIDE 32
JDBC GetXXX() Methods
SQL data type Java Type GetXXX()
SMALLINT Integer (short) getShort()
INTEGER Integer getInt()
BIGINT Long getLong()
REAL Float getFloat()
FLOAT Double getDouble()
DOUBLE Double getDouble()
BINARY Byte[] getBytes()
VARBINARY Byte[] getBytes()
LONGVARBINARY Byte[] getBytes()
IS 257 – Fall 2009 2009.11.03 SLIDE 33
JDBC GetXXX() Methods
SQL data type
Java Type GetXXX()
DATE java.sql.Date getDate()
TIME java.sql.Time getTime()
TIMESTAMP Java.sql.Timestamp getTimeStamp()
IS 257 – Fall 2009 2009.11.03 SLIDE 34
Large Object Handling
• Large binary data can be read from a resultset as streams using:– getAsciiStream()– getBinaryStream()– getUnicodeStream()
ResultSet rs = stmt.executeQuery(“SELECT IMAGE FROM PICTURES WHERE PID = 1223”));if (rs.next()) {
BufferedInputStream gifData = new BufferedInputSteam( rs.getBinaryStream(“IMAGE”)); byte[] buf = new byte[4*1024]; // 4K buffer
int len;while ((len = gifData.read(buf,0,buf.length)) != -1) {
out.write(buf, 0, len);}
}
IS 257 – Fall 2009 2009.11.03 SLIDE 35
JDBC Metadata
• There are also methods to access the metadata associated with a resultSet– ResultSetMetaData rsmd = rs.getMetaData();
• Metadata methods include…– getColumnCount();– getColumnLabel(col);– getColumnTypeName(col)
IS 257 – Fall 2009 2009.11.03 SLIDE 36
JDBC access to MySQL
• The basic JDBC interface is the same, the only differences are in how the drivers are loadedpublic class JDBCTestMysql { public static void main(java.lang.String[] args) {
try { // this is where the driver is loaded Class.forName("com.mysql.jdbc.Driver").newInstance();}catch (InstantiationException i) { System.out.println("Unable to load driver Class"); return;}catch (ClassNotFoundException e) { System.out.println("Unable to load driver Class"); …
IS 257 – Fall 2009 2009.11.03 SLIDE 37
JDBC for MySQL
try { //All DB access is within the try/catch block... // make a connection to MySQL on Dream Connection con = DriverManager.getConnection(
"jdbc:mysql://localhost/ (this is really one line) MyDatabase?user=MyLogin&password=MySQLPW");
// Do an SQL statement... Statement stmt = con.createStatement(); ResultSet rs = stmt.executeQuery("SELECT NAME FROM DIVECUST");
• Otherwise everything is the same as in the Oracle example
• For connecting to the machine you are running the program on, you can use “localhost” instead of the machine name
IS 257 – Fall 2009 2009.11.03 SLIDE 38
Demo – JDBC for MySQL
• Demo of JDBC code on Harbinger
• Code is available on class web site
IS 257 – Fall 2009 2009.11.03 SLIDE 39
Lecture Outline
• Review– Object-Relational DBMS– OR features in Oracle– OR features in PostgreSQL– Extending OR databases (examples from
PostgreSQL)
• Java and JDBC
• Introduction to Data Warehouses
IS 257 – Fall 2009 2009.11.03 SLIDE 40
Overview
• Data Warehouses and Merging Information Resources
• What is a Data Warehouse?
• History of Data Warehousing
• Types of Data and Their Uses
IS 257 – Fall 2009 2009.11.03 SLIDE 41
Problem: Heterogeneous Information Sources
“Heterogeneities are everywhere”
Different interfaces Different data representations Duplicate and inconsistent information
PersonalDatabases
Digital Libraries
Scientific DatabasesWorldWideWeb
Slide credit: J. Hammer
IS 257 – Fall 2009 2009.11.03 SLIDE 42
Problem: Data Management in Large Enterprises
• Vertical fragmentation of informational systems (vertical stove pipes)
• Result of application (user)-driven development of operational systems
Sales Administration Finance Manufacturing ...
Sales PlanningStock Mngmt
...
Suppliers
...Debt Mngmt
Num. Control
...Inventory
Slide credit: J. Hammer
IS 257 – Fall 2009 2009.11.03 SLIDE 43
Goal: Unified Access to Data
Integration System
• Collects and combines information• Provides integrated view, uniform user interface• Supports sharing
WorldWideWeb
Digital Libraries Scientific Databases
PersonalDatabases
Slide credit: J. Hammer
IS 257 – Fall 2009 2009.11.03 SLIDE 44
The Traditional Research Approach
Source SourceSource. . .
Integration System
. . .
Metadata
Clients
Wrapper WrapperWrapper
• Query-driven (lazy, on-demand)
Slide credit: J. Hammer
IS 257 – Fall 2009 2009.11.03 SLIDE 45
Disadvantages of Query-Driven Approach
• Delay in query processing– Slow or unavailable information sources– Complex filtering and integration
• Inefficient and potentially expensive for frequent queries
• Competes with local processing at sources
• Hasn’t caught on in industry
Slide credit: J. Hammer
IS 257 – Fall 2009 2009.11.03 SLIDE 46
The Warehousing Approach
DataDataWarehouseWarehouse
Clients
Source SourceSource. . .
Extractor/Monitor
Integration System
. . .
Metadata
Extractor/Monitor
Extractor/Monitor
• Information integrated in advance
• Stored in WH for direct querying and analysis
Slide credit: J. Hammer
IS 257 – Fall 2009 2009.11.03 SLIDE 47
Advantages of Warehousing Approach
• High query performance– But not necessarily most current information
• Doesn’t interfere with local processing at sources– Complex queries at warehouse– OLTP at information sources
• Information copied at warehouse– Can modify, annotate, summarize, restructure, etc.– Can store historical information– Security, no auditing
• Has caught on in industry
Slide credit: J. Hammer
IS 257 – Fall 2009 2009.11.03 SLIDE 48
Not Either-Or Decision
• Query-driven approach still better for– Rapidly changing information– Rapidly changing information sources– Truly vast amounts of data from large
numbers of sources– Clients with unpredictable needs
Slide credit: J. Hammer
IS 257 – Fall 2009 2009.11.03 SLIDE 49
Data Warehouse EvolutionT
IME
200019951990198519801960 1975
Information-Based Management
DataRevolution
“MiddleAges”
“PrehistoricTimes”
RelationalDatabases
PC’s andSpreadsheets
End-userInterfaces
1st DW Article
DWConfs.
Vendor DWFrameworks
CompanyDWs
“Building theDW”
Inmon (1992)Data Replication
Tools
Slide credit: J. Hammer
IS 257 – Fall 2009 2009.11.03 SLIDE 50
What is a Data Warehouse?
“A Data Warehouse is a –subject-oriented,– integrated,– time-variant,–non-volatile
collection of data used in support of management decision making processes.”
-- Inmon & Hackathorn, 1994: viz. Hoffer, Chap 11
IS 257 – Fall 2009 2009.11.03 SLIDE 51
DW Definition…
• Subject-Oriented:– The data warehouse is organized around the
key subjects (or high-level entities) of the enterprise. Major subjects include
• Customers• Patients• Students• Products• Etc.
IS 257 – Fall 2009 2009.11.03 SLIDE 52
DW Definition…
• Integrated– The data housed in the data warehouse are
defined using consistent• Naming conventions• Formats• Encoding Structures• Related Characteristics
IS 257 – Fall 2009 2009.11.03 SLIDE 53
DW Definition…
• Time-variant– The data in the warehouse contain a time
dimension so that they may be used as a historical record of the business
IS 257 – Fall 2009 2009.11.03 SLIDE 54
DW Definition…
• Non-volatile– Data in the data warehouse are loaded and
refreshed from operational systems, but cannot be updated by end-users
IS 257 – Fall 2009 2009.11.03 SLIDE 55
What is a Data Warehouse?A Practitioners Viewpoint
• “A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context.”
• -- Barry Devlin, IBM Consultant
Slide credit: J. Hammer
IS 257 – Fall 2009 2009.11.03 SLIDE 56
A Data Warehouse is...
• Stored collection of diverse data– A solution to data integration problem– Single repository of information
• Subject-oriented– Organized by subject, not by application– Used for analysis, data mining, etc.
• Optimized differently from transaction-oriented db
• User interface aimed at executive decision makers and analysts
IS 257 – Fall 2009 2009.11.03 SLIDE 57
… Cont’d
• Large volume of data (Gb, Tb)• Non-volatile
– Historical– Time attributes are important
• Updates infrequent• May be append-only• Examples
– All transactions ever at WalMart– Complete client histories at insurance firm– Stockbroker financial information and portfolios
Slide credit: J. Hammer
IS 257 – Fall 2009 2009.11.03 SLIDE 58
Warehouse is a Specialized DB
Standard DB• Mostly updates• Many small transactions• Mb - Gb of data• Current snapshot• Index/hash on p.k.• Raw data• Thousands of users (e.g.,
clerical users)
Warehouse• Mostly reads• Queries are long and
complex• Gb - Tb of data• History• Lots of scans• Summarized, reconciled
data• Hundreds of users (e.g.,
decision-makers, analysts)
Slide credit: J. Hammer
IS 257 – Fall 2009 2009.11.03 SLIDE 59
Summary
Operational SystemsEnterpriseModeling
BusinessInformation Guide
DataWarehouse
Catalog
Data WarehousePopulation
DataWarehouse
Business InformationInterface
Slide credit: J. Hammer
IS 257 – Fall 2009 2009.11.03 SLIDE 60
Warehousing and Industry
• Warehousing is big business– $2 billion in 1995– $3.5 billion in early 1997– Predicted: $8 billion in 1998 [Metagroup]
• Wal-Mart is said to have the largest warehouse– 1000-CPU, 583 Terabyte, Teradata system
(InformationWeek, Jan 9, 2006)– “Half a Petabyte” in warehouse (Ziff Davis Internet,
October 13, 2004)– 1 billion rows of data or more are updated every day
(InformationWeek, Jan 9, 2006)– Some Government and Scientific database are larger,
however
Slide credit: J. Hammer
IS 257 – Fall 2009 2009.11.03 SLIDE 61
Other Large Data Warehouses• Not including Wal-Mart and Ebay
(InformationWeek, Jan 9, 2006)
IS 257 – Fall 2009 2009.11.03 SLIDE 62
Types of Data
• Business Data - represents meaning– Real-time data (ultimate source of all business data)– Reconciled data– Derived data
• Metadata - describes meaning– Build-time metadata– Control metadata– Usage metadata
• Data as a product* - intrinsic meaning– Produced and stored for its own intrinsic value– e.g., the contents of a text-book
Slide credit: J. Hammer