© Copyright 2000 M. Rodriguez-Martinez, All Rights Reserved
Automatic Deployment of Application-Specific Metadata and Code in MOCHA
Manuel Rodriguez-Martinez
Nick Roussopoulos
EDBT 2000 M. Rodriguez-Martinez – N. Roussopoulos
Introduction
Database Middleware Systems: used to integrate data from multiple sources.
Help to keep clients simple
• thin clients
• economical ($$$) to deploy
• Web-based GUI
Re-use existing servers
• replacing them can be expensive and dangerous
Examples
• TSIMMIS, Garlic, DISCO, Oracle, Sybase, ...
[Figure: a Client talks to an Integration Server with a Catalog; three Translators sit in front of the Oracle, Images and XML data sources]
Limitations of this Solution
Code Deployment Problem
– Code for data types and operators is user-defined
  • Polygon
  • Perimeter()
– Need to manually install the code to:
  • clients
  • integration servers
  • translators
– Must be ported (C/C++ code)
– Security (do not crash system)
Does not scale well as the number of sites increases
• hard to deploy, upgrade and maintain the code
[Figure: Clients, an Integration Server with a Catalog, and three Translators in front of the Oracle, Images and XML data sources]
Limitations of this Solution
Query Processing Problem
– Availability of code limits operator placement options.
  • not all sites can evaluate the operators in a query
– Integration server ends up doing most of the processing.
  • data must be shipped to it
– Too much data movement!
Does not scale well
• network becomes a major performance bottleneck
• limited bandwidth increases query execution time
[Figure: Clients, an Integration Server with a Catalog, and three Translators in front of the Oracle, Images and XML data sources; three data transfers in the diagram are each labeled 100MB]
The MOCHA Solution
Middleware system automatically deploys the code
– ship Java classes for data types and operators
– done at run time in dynamic fashion
Provide information on how to use the code
– metadata and control in XML and RDF
Exploit these features in query operator placement
– place operators at sites that minimize data movement
  • remote data sources get operators that filter the data
  • integration server gets operators that expand the data
– more on this: SIGMOD 2000 paper
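The placement rule on this slide can be sketched in a few lines. This is only an illustration, not MOCHA's optimizer (which the slide defers to the SIGMOD 2000 paper): the `Operator` class, the `choose_site` function, the `Enlarge` operator and all byte estimates are hypothetical, and the sketch is in Python although MOCHA itself ships Java classes.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    """An operator with assumed (hypothetical) data-volume estimates."""
    name: str
    input_bytes: int    # estimated volume the operator reads
    output_bytes: int   # estimated volume the operator produces


def choose_site(op: Operator) -> str:
    """Place data-reducing operators at the data source, the rest at the integration site.

    Returns "DAP" when the operator shrinks its input (it filters the data),
    and "QPC" when the output is at least as large as the input.
    """
    return "DAP" if op.output_bytes < op.input_bytes else "QPC"


# Hypothetical estimates: an aggregate that shrinks its input,
# and an operator that inflates it.
composite = Operator("Composite", input_bytes=200 * 1024**2, output_bytes=200 * 1024)
enlarge = Operator("Enlarge", input_bytes=1 * 1024**2, output_bytes=4 * 1024**2)

for op in (composite, enlarge):
    print(op.name, "->", choose_site(op))
```

Comparing only input and output volume ignores CPU cost and the size of the shipped code; a real optimizer would weigh those as well.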
MOCHA Architecture
[Figure: Clients connect over the Network to the QPC, which has a Catalog and a Code Repository; four DAPs sit in front of the data sources: Oracle 8i, Informix, an XML Repository and Text Files]
Automatic Code Deployment
Select location, Composite(image)
From Rasters
Where week BETWEEN t1 and t2
Group By location
[Figure: a Client and the QPC, with its Code Repository and Catalog, connected over the Internet to two DAPs, one in front of Informix and one in front of Oracle; site labels: Texas, Virginia, Maryland]
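To make the idea of run-time code deployment concrete, here is a minimal sketch of a receiving site loading an operator it did not have installed. It is a Python analogue and an assumption throughout: MOCHA ships compiled Java classes, whereas this sketch ships source text; `load_operator`, `SHIPPED_SOURCE` and the toy per-pixel-maximum behavior of `Composite` are hypothetical and not taken from the talk.

```python
import types

# Hypothetical operator source, as it might arrive from a code repository.
SHIPPED_SOURCE = '''
class Composite:
    """Toy aggregate: keeps the per-pixel maximum of the images seen so far."""

    def __init__(self):
        self.state = None

    def update(self, image):
        if self.state is None:
            self.state = list(image)
        else:
            self.state = [max(a, b) for a, b in zip(self.state, image)]

    def result(self):
        return self.state
'''


def load_operator(module_name: str, source: str, class_name: str):
    """Compile shipped source into a fresh module and return the named class."""
    module = types.ModuleType(module_name)
    exec(compile(source, module_name, "exec"), module.__dict__)
    return getattr(module, class_name)


# The receiving site can now run an operator that was never installed on it.
Composite = load_operator("shipped_ops", SHIPPED_SOURCE, "Composite")
agg = Composite()
agg.update([1, 5, 2])
agg.update([3, 4, 9])
print(agg.result())
```

Executing received code like this is only safe when the sender is trusted; the slides list security as one of the concerns with manually installed C/C++ code, and a sandboxed runtime addresses it better than this sketch does.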
Answering the Query
Select location, Composite(image)
From Rasters
Where week BETWEEN t1 and t2
Group By location
[Figure: a Client and the QPC, with its Code Repository and Catalog, connected over the Internet to two DAPs, one in front of Informix and one in front of Oracle; site labels: Texas, Virginia, Maryland. Data-volume labels: tuples of 200MB and 100MB at the sources; results of 200KB and 150KB from the DAPs; combined results of 350KB]
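The volume labels on this slide (200MB and 100MB of tuples at the sources, 200KB and 150KB of partial results, 350KB of combined results) allow a quick estimate of how much network traffic is saved when the aggregate runs at the DAPs. The sketch below is back-of-the-envelope arithmetic only: it assumes the alternative plan would ship all tuples to the QPC, and it ignores the size of the shipped code and of the final transfer to the client.

```python
MB = 1024 * 1024
KB = 1024

tuples_at_sources = [200 * MB, 100 * MB]   # tuple volumes read at the two sources
results_from_daps = [200 * KB, 150 * KB]   # partial results leaving the two DAPs

# Plan A (assumed baseline): ship every tuple to the QPC and aggregate there.
data_shipping = sum(tuples_at_sources)

# Plan B: ship Composite() to the DAPs and move only the partial results.
code_shipping = sum(results_from_daps)

reduction = 1 - code_shipping / data_shipping
print(f"bytes moved: {data_shipping} vs {code_shipping}")
print(f"reduction in data movement: {reduction:.2%}")
```

Under these assumptions the reduction comes out at about 99.9%.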
Components of MOCHA
• Client Application
• QPC (Query Processing Coordinator)
  – parsing (SQL)
  – optimizing
  – catalog management
  – code deployment
  – query execution
• DAP (Data Access Provider)
  – data translation
  – query execution
• Data Server
  – storage server
[Figure: a Client and the QPC, with its Catalog and Code Repository, connected over the Internet to three DAPs in front of Oracle, XML and Text sources]
Catalog Organization
• Holds information describing the structure and proper use of tables, data types and query operators.
  – Generically referred to as "resources"
• Each resource is uniquely identified by a URI:
  – mocha://cs1.umd.edu/EarthSci/Polygon
• Metadata is encoded using RDF (a metadata model with an XML syntax)
  – makes it easy to understand, use and exchange metadata
• Each resource has a catalog entry of the form:
  (URI, RDF File)
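A catalog of (URI, RDF File) entries can be pictured as a simple keyed lookup. The sketch below is an assumption about shape only, not MOCHA's catalog implementation: the `Catalog` class, its method names and the file name `Polygon.rdf` are hypothetical. Only the `mocha://` URI comes from the slide.

```python
from urllib.parse import urlparse


class Catalog:
    """Hypothetical in-memory catalog mapping resource URIs to RDF file names."""

    def __init__(self):
        self._entries = {}  # URI -> RDF file

    def register(self, uri: str, rdf_file: str) -> None:
        """Add one (URI, RDF File) entry, rejecting non-mocha URIs."""
        if urlparse(uri).scheme != "mocha":
            raise ValueError(f"not a mocha:// URI: {uri}")
        self._entries[uri] = rdf_file

    def lookup(self, uri: str) -> str:
        """Return the RDF file describing the resource; raises KeyError if unknown."""
        return self._entries[uri]


catalog = Catalog()
catalog.register("mocha://cs1.umd.edu/EarthSci/Polygon", "Polygon.rdf")  # file name assumed

parts = urlparse("mocha://cs1.umd.edu/EarthSci/Polygon")
print(parts.netloc, parts.path)   # host and resource path encoded in the URI
print(catalog.lookup("mocha://cs1.umd.edu/EarthSci/Polygon"))
```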
Metadata Requirements
Query:
Select location, Composite(image)
From Rasters
Where week BETWEEN t1 and t2
Group By location
Table Rasters: location, image, week, band
1. What kind of metadata is needed?
2. How to specify it?
RDF Model: Data Types
[Figure: RDF graph for the data type resource mocha://cs1.umd.edu/EarthSci/Raster, with these properties]
mocha:Type: Raster
mocha:Class: Raster.class
mocha:Repository: cs1.umd.edu/EarthSci
mocha:Size: 1 megabyte
mocha:Creator: . . .
RDF Model: Query Operators
[Figure: RDF graph for the operator resource mocha://cs1.umd.edu/EarthSci/Composite, with these properties]
mocha:Aggregate: Composite
mocha:Class: Composite.class
mocha:Repository: cs1.umd.edu/EarthSci
mocha:Creator: . . .
mocha:Arguments (rdf:type rdf:Seq): . . . entries with mocha:Type Raster and mocha:URI . . .
mocha:Result: mocha:Type Raster, mocha:URI . . .
RDF Model: Tables
[Figure: RDF graph for the table resource mocha://cs1.umd.edu/EarthSciDB/Rasters, with these properties]
mocha:Table: Rasters
mocha:Database: cs1.umd.edu/EarthSciDB
mocha:Owner: [email protected]
mocha:Columns (rdf:type rdf:Seq), one mocha:Column per entry:
  – location: mocha:Type Polygon, mocha:URI . . .
  – image: mocha:Type Raster, mocha:URI . . .
  – . . .
Metadata and Control Exchange
• QPC sends to each DAP:
  – metadata for the data types and operators it will receive
  – query plan specifying the task to do
• Metadata is serialized as XML
  – RDF serialization syntax
• Plans
  – XML documents
  – easy to use and understand
  – can be mapped to suitable form
    • tree, DAG, graph, etc.
  – prevents version inconsistencies
    • changes in Java classes
<rdf:Description about="mocha://cs1.umd.edu/EarthSci/Raster">
  <mocha:Type>Raster</mocha:Type>
  <mocha:Class>Raster.class</mocha:Class>
  <mocha:Repository>cs1.umd.edu/EarthSci</mocha:Repository>
  <mocha:Size>1MB</mocha:Size>
  <mocha:Creator>[email protected]</mocha:Creator>
</rdf:Description>
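A receiving site has to turn such a description back into usable properties. The sketch below reads the Raster description with Python's standard XML parser. Several parts are assumptions: the slide does not show namespace declarations, so the `rdf:RDF` wrapper is added here and the `mocha` namespace URI is a placeholder; `read_description` is a hypothetical helper; and the Creator element is left out because its value is not legible in the transcript.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MOCHA_NS = "http://example.org/mocha#"  # placeholder: the real namespace is not shown

# The slide's description, wrapped in an rdf:RDF element so the prefixes resolve.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:mocha="{MOCHA_NS}">
  <rdf:Description about="mocha://cs1.umd.edu/EarthSci/Raster">
    <mocha:Type>Raster</mocha:Type>
    <mocha:Class> Raster.class </mocha:Class>
    <mocha:Repository>cs1.umd.edu/EarthSci</mocha:Repository>
    <mocha:Size>1MB</mocha:Size>
  </rdf:Description>
</rdf:RDF>"""


def read_description(xml_text: str):
    """Return (resource URI, {property name: value}) for the first rdf:Description."""
    root = ET.fromstring(xml_text)
    desc = root.find(f"{{{RDF_NS}}}Description")
    # Child tags look like "{namespace}Type"; keep only the local name.
    props = {child.tag.split("}")[1]: child.text.strip() for child in desc}
    return desc.get("about"), props


uri, props = read_description(DOC)
print(uri)
print(props)
```

With the class name and repository in hand, a site would know which code to request before executing its part of the plan.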
Processing a Query in MOCHA
1. Query Parsing
2. Resource Discovery
3. Query Optimization
4. Metadata and Control Exchange
5. Code Deployment Phase
6. Query Execution
Query:
Select location, Composite(image)
From Rasters
Where week BETWEEN t1 and t2
Group By location
Table Rasters: location, image, week, band
Performance of MOCHA
Query:
Select location, Composite(image)
From Rasters
Where week BETWEEN t1 and t2
Group By location
[Figure: bar chart. Y axis: Running Time (secs), 0 to 600. X axis: Middleware Type (Non-MOCHA, MOCHA). Bar segments: DB, CPU, NET, MISC]
Shipping Composite() code to the DAP cuts data movement by 99%: a 4-to-1 performance improvement
Benefits of MOCHA
• Middle-tier solution
• Extensible
• Java Code Re-usability
  – across platforms
• Automatic Code Deployment
  – "Plug-n-Play"
• Easier to Administer
• XML-based Metadata
• XML-based Control
• Efficient Query Processing
  – data movement reduction
  – moving code vs. data
Conclusions
• Identified limitations in existing middleware systems
  – Code Deployment Problem
  – Query Processing Problem
• Proposed a new framework to automate the deployment of new functionality:
  – automatic code deployment
  – efficient query processing
• Described its implementation in MOCHA, based on well-accepted technologies: Java, XML, RDF.
http://www.cs.umd.edu/projects/mocha/