Information and Security Analytics Lecture #1 Unit #1: Data Management: Overview Dr. Bhavani Thuraisingham May 27, 2010


Page 1: Information and  Security Analytics Lecture #1 Unit #1:  Data Management: Overview

Information and Security Analytics

Lecture #1

Unit #1: Data Management: Overview

Dr. Bhavani Thuraisingham

May 27, 2010


Objective of the Unit

0 This unit provides an overview of the developments in data management. It also gives an overview of data management, information management, and knowledge management, and illustrates a framework.

0 Reference: Data Management Systems: Evolution and Interoperation, Thuraisingham, CRC Press, 1997


Outline of the Unit

0 What is Data Management?

0 Developments in Data Management

0 Current Status and Trends

0 Note on Data Administration

0 Data management, Information management, and Knowledge Management


What is Data Management?

0 One proposal: Data Management = Database System Management + Data Administration

0 Includes data analysis, data administration, database administration, auditing, data modeling, database system development, database application development

0 The tutorial will focus mainly on database system aspects of data management


Developments in Database Systems

Network, hierarchical database systems

Relational database systems, transaction processing, distributed database systems

Heterogeneous database integration, migrating legacy databases

Next generation database systems: object-oriented, deductive, ...

Data warehousing, data mining, multimedia database systems, Internet database


Current Status

Figure labels: Database Systems; Multimedia Database Systems; Data Warehousing Systems; Data Mining Systems; Sensor Database Systems; Heterogeneous Database Systems

Limited integration between the different types of systems

Often stovepiped by technology


Vision for Database Management

Figure labels: Optical Storage; Robotic Tape Storage; Multimedia Databases; High-speed networks - voice, video, images; Audio, Video, Text, Imagery; Applications; Dissemination

Collaborative Analysis and Decision Making between Different Organizations

Distributed Multimedia Data Management

Integration of Different Technologies


Some Outstanding Problems

0 Heterogeneous Database Integration

- Semantic heterogeneity, inferencing, transaction processing, integrity, security

0 Multimedia Database Management

- Data model, index strategies, synchronization, data manipulation

0 Real-time Database Management

- Quality of service, operating system services, transaction processing, active databases

0 Integration with other Technologies

- Distributed processing, mass storage, information management, knowledge management

0 Migrating Legacy Applications

- Modernization, enterprise modeling, schema transformation

Integration


Some Current Trends in Data Management

0 Heterogeneous database integration

- Query, transactions, semantics, security and integrity

0 Migrating legacy databases

- Fine-grained encapsulation, distributed objects

0 Multimedia databases

- Query, model, quality-of-service, index

0 Data Warehousing

- Building a warehouse, query

0 Data Mining

- Multimedia databases, web data mining

0 Data management for collaboration

- Architecture, transactions

0 Web databases and digital libraries

- Query, transactions, index, security


Interoperability of Heterogeneous Database Systems

Database System A (Relational), Database System B (Object-Oriented), and Database System C (Legacy), connected by a network

Transparent access to heterogeneous databases for both users and application programs; query and transaction processing


Note on Data Administration

0 Identifying the data

- Data may be in files, paper, databases, etc.

0 Analyzing the data

- Is the data of good quality?

- Is the data complete?

0 Data standardization

- Should one standardize all the data elements and metadata?

- Repositories for handling semantic heterogeneity?

0 Data modeling

- Structure the data, model the data and the processes


Data, Information and Knowledge Management

0 Data Management

- Data: stored in databases, files or some media

- Data management includes modeling, storing, retrieving and analyzing the data

0 Information Management

- Information is what is obtained by making sense out of the data; E.g., Data with context

- Information management is about modeling, storing, retrieving and analyzing the information

0 Knowledge Management

- Knowledge is what is obtained when the information is understood; it enables one to take actions

- Knowledge management is about utilizing the knowledge to improve the business of an organization


Data, Information and Knowledge Management: Alternative View: MITRE Model 1999/2000

Communication, Network, Operating System, Middleware

Data Management

Information Management

Knowledge Management

Decision Support


Information and Security Analytics

Lecture #1

Unit #2: Database Systems

Dr. Bhavani Thuraisingham

May 27, 2010


Objective of the Unit

0 This unit will provide an overview of the concepts and developments in database systems

0 Reference: Data Management Systems: Evolution and Interoperation, Thuraisingham, CRC Press, 1997


Outline of the Unit

0 Concepts in database systems

0 Types of database systems


Concepts in Database Systems

0 Definition of a Database system

0 Early systems

0 Metadata

0 Architectural Issues

- Schema, Functional

0 DBMS Design Issues

0 Other Issues

- Database design, Administration


Database System

0 Consists of database, hardware, Database Management System (DBMS), and users

0 Database is the repository for persistent data

0 Hardware consists of secondary storage volumes, processors, and main memory

0 DBMS handles all users’ access to the database

0 Users include application programmers, end users, and the Database Administrator (DBA)

0 Need: reduce redundancy, avoid inconsistency, share data, enforce standards, apply security restrictions, maintain integrity, balance conflicting requirements

0 We have used the definition of a database management system given in C. J. Date’s Book (Addison Wesley, 1990)


An Example Database System

Database

Database Management System

Application Programs

Users

Adapted from C. J. Date, Addison Wesley, 1990


Early systems: Hierarchical and Network Database Systems

Hierarchical Data Model (figure labels): SUPPLIERS, SUPPLIES, PARTS, SUPPLIES

Network Data Model (figure labels): SUPPLIERS, SUPPLIES, PARTS


Metadata

0 Metadata describes the data in the database

- Example: Database D consists of a relation EMP with attributes SS#, Name, and Salary

0 Metadatabase stores the metadata

- Could be physically stored with the database

0 Metadatabase may also store constraints and administrative information

0 Metadata is also referred to as the schema or data dictionary
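
The EMP example above can be made concrete with a small sketch. It assumes SQLite, reached through Python's standard `sqlite3` module, as a stand-in DBMS (the slides name no product), and it writes the attribute `SS#` as `SSN` only to avoid quoting. In SQLite the metadata lives in the built-in `sqlite_master` table, which is stored in the same file as the data.

```python
import sqlite3

# Assumption: SQLite stands in for the DBMS; SSN replaces the slide's SS#.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMP (SSN INTEGER PRIMARY KEY, Name TEXT, Salary INTEGER)"
)

# The metadatabase: sqlite_master records each relation and its definition.
relations = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# PRAGMA table_info returns one row per attribute; index 1 is the column name.
attributes = [row[1] for row in conn.execute("PRAGMA table_info(EMP)")]

print(relations)   # relation names recorded in the metadata
print(attributes)  # attribute names of EMP
```

Querying the metadata with ordinary SQL, as done here, is one way a DBMS can expose its schema or data dictionary to users and tools.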


Three-level Schema Architecture: Details

User A1, User A2, User A3, User B1, User B2

External Model A (External Schema A); External Model B (External Schema B)

External/Conceptual Mapping A; External/Conceptual Mapping B

Conceptual Model (Conceptual Schema)

Conceptual/Internal Mapping

Stored Database, Internal Model (Internal Schema)
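
One common way to realize the external level is with views. The sketch below is an illustration only: it assumes SQLite through Python's `sqlite3` module, and the table `EMP`, the views `EMP_A` and `EMP_B`, and the sample rows are hypothetical. The base table plays the role of the conceptual schema, each view definition acts as an external/conceptual mapping, and the engine's own storage layout stands in for the internal schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema: the community-wide logical definition of EMP.
conn.execute(
    "CREATE TABLE EMP (SSN INTEGER PRIMARY KEY, Name TEXT, Salary INTEGER, Dept INTEGER)"
)
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?, ?)",
    [(1, "John", 20, 10), (2, "Paul", 30, 20)],
)

# External schema A (hypothetical): users in group A never see Salary.
conn.execute("CREATE VIEW EMP_A AS SELECT SSN, Name, Dept FROM EMP")

# External schema B (hypothetical): users in group B see only department 20.
conn.execute("CREATE VIEW EMP_B AS SELECT Name, Salary FROM EMP WHERE Dept = 20")

view_a = conn.execute("SELECT * FROM EMP_A ORDER BY SSN").fetchall()
view_b = conn.execute("SELECT * FROM EMP_B").fetchall()
print(view_a)
print(view_b)
```

Because users query only the views, the base table can change (for example, gain a column) without affecting them, as long as the view definitions are kept valid.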


Functional Architecture

Data Management: User Interface Manager; Query Manager; Transaction Manager; Schema (Data Dictionary) Manager (metadata); Security/Integrity Manager

Storage Management: File Manager; Disk Manager


DBMS Design Issues

0 Query Processing

- Optimization techniques

0 Transaction Management

- Techniques for concurrency control and recovery

0 Metadata Management

- Techniques for querying and updating the metadatabase

0 Security/Integrity Maintenance

- Techniques for processing integrity constraints and enforcing access control rules

0 Storage management

- Access methods and index strategies for efficient access to the database


Other Issues

0 Database design

- Generally a two-step process

=Semantic data model to capture the entities of the application and the relationships between the entities

=Generate the conceptual schema; theory of normal forms for relational databases

- Research on object-oriented approaches for database design

0 Database Administration

- Creating and deleting databases; backup and recovery, enforcing policies, auditing, etc.


Types of Database Systems

0 Relational Database Systems

0 Object Database Systems

0 Deductive Database Systems

0 Other

- Real-time, Secure, Parallel, Scientific, Temporal, Wireless, Functional, Entity-Relationship, Sensor/Stream Database Systems, etc.


Relational Database: Informal Overview

0 Collection of tables also called relations

0 Table has one or more columns also called attributes

0 Each table has zero or more rows also called tuples

0 Elements of a row take values from a pool of legal values

0 The values of one or more columns in a row uniquely identify the row. These columns form an identifier (also called key)

0 One identifier is designated as the unique identifier (also called primary key)

0 Relational databases are queried using a language called SQL (Structured Query Language)


Relational Database: Example

Relation S:

S#  SNAME  STATUS  CITY
S1  Smith  20      London
S2  Jones  10      Paris
S3  Blake  30      Paris
S4  Clark  20      London
S5  Adams  30      Athens

Relation P:

P#  PNAME  COLOR  WEIGHT  CITY
P1  Nut    Red    12      London
P2  Bolt   Green  17      Paris
P3  Screw  Blue   17      Rome
P4  Screw  Red    14      London
P5  Cam    Blue   12      Paris
P6  Cog    Red    19      London

Relation SP:

S#  P#  QTY
S1  P1  300
S1  P2  200
S1  P3  400
S1  P4  200
S1  P5  100
S1  P6  100
S2  P1  300
S2  P2  400
S3  P2  200
S4  P2  200
S4  P4  300
S4  P5  400


SQL: Data Manipulation

0 Select, Update, Delete, Insert

Examples:

SELECT S.S#, S.STATUS
FROM S
WHERE S.CITY = 'Paris'

SELECT *
FROM S

SELECT S.*, P.*
FROM S, P
WHERE S.CITY = P.CITY

UPDATE P
SET COLOR = 'Yellow',
    WEIGHT = WEIGHT + 5,
    CITY = NULL
WHERE P# = 'P2'
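
The statements above can be tried against the suppliers and parts data from the example relations. This is a sketch under stated assumptions: it uses SQLite through Python's `sqlite3` module (not named in the slides), and it renames the key columns `S#` and `P#` to `SNO` and `PNO` because `#` is not valid in an unquoted SQLite identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE S (SNO TEXT PRIMARY KEY, SNAME TEXT, STATUS INTEGER, CITY TEXT);
CREATE TABLE P (PNO TEXT PRIMARY KEY, PNAME TEXT, COLOR TEXT, WEIGHT INTEGER, CITY TEXT);
INSERT INTO S VALUES
  ('S1','Smith',20,'London'), ('S2','Jones',10,'Paris'), ('S3','Blake',30,'Paris'),
  ('S4','Clark',20,'London'), ('S5','Adams',30,'Athens');
INSERT INTO P VALUES
  ('P1','Nut','Red',12,'London'), ('P2','Bolt','Green',17,'Paris'),
  ('P3','Screw','Blue',17,'Rome'), ('P4','Screw','Red',14,'London'),
  ('P5','Cam','Blue',12,'Paris'), ('P6','Cog','Red',19,'London');
""")

# Selection and projection: suppliers located in Paris.
paris_suppliers = conn.execute(
    "SELECT S.SNO, S.STATUS FROM S WHERE S.CITY = 'Paris' ORDER BY S.SNO"
).fetchall()

# Join: supplier/part pairs located in the same city.
colocated = conn.execute(
    "SELECT S.SNO, P.PNO FROM S, P WHERE S.CITY = P.CITY"
).fetchall()

# Update: several columns of part P2 are changed in one statement.
conn.execute(
    "UPDATE P SET COLOR = 'Yellow', WEIGHT = WEIGHT + 5, CITY = NULL WHERE PNO = 'P2'"
)
p2 = conn.execute("SELECT COLOR, WEIGHT, CITY FROM P WHERE PNO = 'P2'").fetchone()

print(paris_suppliers)
print(len(colocated))
print(p2)
```

Note that string literals such as `'Paris'` and `'P2'` need quotes, and the assignments in `SET` are separated by commas.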


Features of Object-Oriented Database Systems Suitable for Advanced Applications

0 Objects (support for large and variable sized data blocks)

0 Class hierarchy (reusability)

0 Instance variables, composite and complex objects (complex data structures)

0 Methods, and message passing (object encapsulation)

0 Pointer swizzling (performance)

0 Tighter integration with programming languages (application program support)

0 Special mechanisms for long transactions and concurrency control, multimedia information management, schema management, versions management, storage management


Concepts in Object Database Systems

0 Objects: every entity is an object

- Example: Book, Film, Employee, Car

0 Class

- Objects with common attributes are grouped into a class

0 Attributes or Instance Variables

- Properties of an object class inherited by the object instances

0 Class Hierarchy

- Parent-Child class hierarchy

0 Composite objects

- Book object with paragraphs, sections etc.

0 Methods

- Functions associated with a class


Example Class Hierarchy

Document Class (instances D1, D2)

- Attributes: ID, Name, Author, Publisher

- Method1: Print-doc-att(ID)

- Method2: Print-doc(ID)

Book Subclass (instance B1)

Journal Subclass (instance J1)

Subclass attributes: # of Chapters, Volume #
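
The class hierarchy in this figure maps naturally onto classes in an object-oriented language. The following Python sketch is illustrative only: it assumes that "# of Chapters" belongs to the Book subclass and "Volume #" to the Journal subclass (the extracted figure does not make the assignment explicit), and the sample values are made up.

```python
class Document:
    """Parent class holding the attributes shared by all documents."""

    def __init__(self, doc_id, name, author, publisher):
        self.doc_id = doc_id
        self.name = name
        self.author = author
        self.publisher = publisher

    def print_doc_att(self):
        """Counterpart of Print-doc-att(ID): describe the document's attributes."""
        return f"{self.doc_id}: {self.name} by {self.author} ({self.publisher})"


class Book(Document):
    """Subclass; inherits the Document attributes and methods."""

    def __init__(self, doc_id, name, author, publisher, num_chapters):
        super().__init__(doc_id, name, author, publisher)
        self.num_chapters = num_chapters  # assumed to be the Book-specific attribute


class Journal(Document):
    """Subclass; inherits the Document attributes and methods."""

    def __init__(self, doc_id, name, author, publisher, volume):
        super().__init__(doc_id, name, author, publisher)
        self.volume = volume  # assumed to be the Journal-specific attribute


# Hypothetical instances corresponding to B1 and J1 in the figure.
b1 = Book("B1", "Sample Book", "A. Author", "Sample Press", 12)
j1 = Journal("J1", "Sample Journal", "B. Editor", "Sample Press", 3)
print(b1.print_doc_att())
```

The method is defined once on the parent class and reused by both subclasses, which is the reusability benefit attributed to class hierarchies.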


Example Composite Object

Composite Document Object

Section 1 Object

Section 2 Object

Paragraph 1 Object

Paragraph 2 Object
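
A composite object of this kind can be sketched as objects that contain other objects. The example below is a minimal illustration in Python; which section owns the two paragraphs is an assumption, since the extracted figure lists only the object names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Paragraph:
    text: str


@dataclass
class Section:
    title: str
    paragraphs: List[Paragraph] = field(default_factory=list)


@dataclass
class CompositeDocument:
    title: str
    sections: List[Section] = field(default_factory=list)

    def paragraph_count(self) -> int:
        # Traverse the part-of hierarchy: document -> sections -> paragraphs.
        return sum(len(section.paragraphs) for section in self.sections)


# Assumption: both paragraphs belong to Section 1.
doc = CompositeDocument(
    "Composite Document",
    [
        Section("Section 1", [Paragraph("Paragraph 1"), Paragraph("Paragraph 2")]),
        Section("Section 2"),
    ],
)
print(doc.paragraph_count())
```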


Deductive Database Systems

0 Database systems augmented with inference engines to deduce new data from existing data and rules

0 Example

- Rule: parent of a parent is a grandparent

- Data: John is Jane’s parent; Jane is Robert’s parent

- From the above, infer John is Robert’s grandparent

0 Loose and tight coupling architectures between the database system and inference engine
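
The grandparent example can be written as a tiny inference step over stored facts. This is a minimal Python sketch, not how any particular deductive database is implemented; the function name `infer_grandparents` is hypothetical.

```python
# Stored data: (parent, child) facts from the example above.
parent_facts = {("John", "Jane"), ("Jane", "Robert")}


def infer_grandparents(parents):
    """Rule: if X is a parent of Y and Y is a parent of Z, X is a grandparent of Z."""
    return {
        (grandparent, grandchild)
        for (grandparent, middle) in parents
        for (middle_again, grandchild) in parents
        if middle == middle_again
    }


grandparents = infer_grandparents(parent_facts)
print(grandparents)
```

The inferred fact is not stored anywhere; it is derived from the data and the rule when requested, which is the job of the inference engine.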


Current Status

0 Database Systems is a mature technology; numerous products and prototypes

0 Much work followed in distributed and heterogeneous databases

0 Current directions include web database management as well as data management support for novel applications including E-commerce, Bioinformatics and Geoinformatics

0 Work still continues on developing new kinds of database systems including stream/sensor database systems


Information and Security Analytics

Lecture #1

Unit #3: Distributed and Heterogeneous Database Systems

Dr. Bhavani Thuraisingham

May 27, 2010


Objective of the Unit

0 This unit provides an overview of concepts in distributed and heterogeneous databases. In particular, definitions and functions are discussed.

0 Reference:

- Data Management Systems: Evolution and Interoperation, Thuraisingham, CRC Press, 1997

- Heterogeneous Information Exchange and Organizational Hubs, Kluwer, 2002, Editors: Bestougeff, Dubois and Thuraisingham


Outline of the Unit

0 Distributed Database Systems

- Architecture, Data Distribution, Functions

0 Heterogeneous Database Integration

0 Federated Database Management

0 Client-Server Database Management

0 Migrating Legacy Databases

0 Current Status and Directions


A Definition of a Distributed Database System

0 A collection of database systems connected via a network

0 The software that is responsible for interconnection is a Distributed Database Management System (DDBMS)

0 Each DBMS executes local applications and should be involved in at least one global application (Ceri and Pelagatti)

0 Homogeneous environment


Architecture

Communication Network

Site 1: Distributed Processor 1, DBMS 1, Database 1

Site 2: Distributed Processor 2, DBMS 2, Database 2

Site 3: Distributed Processor 3, DBMS 3, Database 3


Distributed Processor

Distributed Query/Update Processor

Distributed Transaction Manager

Distributed Metadata Management

Network Interface

Local DBMS Interface

Integrity/Security Manager


Data Distribution

SITE 1

EMP1
SS#  Name    Salary  D#
1    John    20      10
2    Paul    30      20
3    James   40      20
4    Jill    50      20
5    Mary    60      10
6    Jane    70      20

DEPT1
D#   Dname    MGR
10   C. Sci.  Jane
30   English  David
40   French   Peter

SITE 2

EMP2
SS#  Name    Salary  D#
9    Mathew  70      50
7    David   80      30
8    Peter   90      40

DEPT2
D#   Dname    MGR
50   Math     John
20   Physics  Paul


Distributed Database Functions

0 Distributed Query Processing

- Optimization techniques across the databases

0 Distributed Transaction Management

- Techniques for distributed concurrency control and recovery

0 Distributed Metadata Management

- Techniques for managing the distributed metadata

0 Distributed Security/Integrity Maintenance

- Techniques for processing integrity constraints and enforcing access control rules across the databases


Query Processing Example (Concluded)

DQP: Distributed Query Processor

Site 1 (DBMS 1, DQP): EMP1 (20)

Site 2 (DBMS 2, DQP): EMP2 (30), DEPT2 (20)

Site 3 (DBMS 3, DQP): EMP1 (20), EMP3 (50), DEPT3 (30)

The sites are connected by a network.

Query at site 1: Join EMP and DEPT on D#

- Move EMP2 to site 3; merge EMP1, EMP2, EMP3 to form EMP

- Move DEPT2 to site 3; merge DEPT2 and DEPT3 to form DEPT

- Join EMP and DEPT; move result to site 1
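
The plan on this slide can be traced with a small simulation. The sketch below is an illustration under several assumptions: fragments are plain Python lists, the rows are tiny made-up samples (the slide gives only fragment names and sizes), EMP1 is taken to be available at site 3 as the fragment labels suggest, and the helper names `merge` and `join` are hypothetical. It also counts how many tuples cross the network.

```python
# Hypothetical fragment contents; "d" is the department number D#.
site2 = {
    "EMP2": [{"ss": 2, "name": "Paul", "d": 20}],
    "DEPT2": [{"d": 10, "dname": "C. Sci."}],
}
site3 = {
    "EMP1": [{"ss": 1, "name": "John", "d": 10}],
    "EMP3": [{"ss": 3, "name": "Mary", "d": 30}],
    "DEPT3": [{"d": 20, "dname": "Physics"}],
}


def merge(*fragments):
    """Union of horizontal fragments (rows assumed disjoint across fragments)."""
    return [row for fragment in fragments for row in fragment]


def join(emp, dept, key="d"):
    """Hash join of EMP and DEPT on the department number."""
    dept_by_key = {}
    for dept_row in dept:
        dept_by_key.setdefault(dept_row[key], []).append(dept_row)
    return [
        {**emp_row, **dept_row}
        for emp_row in emp
        for dept_row in dept_by_key.get(emp_row[key], [])
    ]


shipped = 0  # number of tuples moved over the network

# Step 1: move EMP2 to site 3 and merge the EMP fragments there.
shipped += len(site2["EMP2"])
emp = merge(site3["EMP1"], site2["EMP2"], site3["EMP3"])

# Step 2: move DEPT2 to site 3 and merge the DEPT fragments there.
shipped += len(site2["DEPT2"])
dept = merge(site2["DEPT2"], site3["DEPT3"])

# Step 3: join at site 3, then move the result to site 1.
result = join(emp, dept)
shipped += len(result)

print(sorted(row["name"] for row in result), shipped)
```

A distributed query optimizer would compare plans like this one by their estimated transfer and processing costs; the tuple count here is the simplest possible cost measure.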


Transaction Processing Example

Site 1: Coordinator, Transaction Tj

Site 2: Participant, Subtransaction Tj2

Site 3: Participant, Subtransaction Tj3

Site 4: Participant, Subtransaction Tj4

Issues: Concurrency control, Recovery, Data Replication

Two-phase commit: Coordinator queries participants whether they are ready to commit. If all participants agree, then coordinator sends request for the participants to commit.

DTM (Distributed Transaction Manager) responsible for executing the distributed transaction
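
The two-phase commit exchange described on this slide can be sketched as follows. This is a deliberately simplified Python illustration: the class and function names are hypothetical, and logging, timeouts, and recovery from coordinator or participant failure are all omitted.

```python
class Participant:
    """A site executing one subtransaction of the distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1 reply: vote yes (ready to commit) or no.
        self.state = "ready" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator logic: commit only if every participant votes yes."""
    # Phase 1: ask every participant whether it is ready to commit.
    votes = [p.prepare() for p in participants]

    # Phase 2: send the global decision to all participants.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"


all_ready = [Participant("site2"), Participant("site3"), Participant("site4")]
one_refuses = [Participant("site2"), Participant("site3", can_commit=False)]

outcome_ok = two_phase_commit(all_ready)
outcome_fail = two_phase_commit(one_refuses)
print(outcome_ok, outcome_fail)
```

A single "no" vote aborts the transaction at every site, which is what keeps the subtransactions atomic as a group.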


Interoperability of Heterogeneous Database Systems

Database System A (Relational), Database System B (Object-Oriented), and Database System C (Legacy), connected by a network

Transparent access to heterogeneous databases for both users and application programs; query and transaction processing


Technical Issues on the Interoperability of Heterogeneous Database Systems

0 Heterogeneity with respect to data models, schema, query processing, query languages, transaction management, semantics, integrity, and security policies

0 Interoperability based on client-server architectures

0 Federated database management

- Collection of cooperating, autonomous, and possibly heterogeneous component database systems, each belonging to one or more federations


Different Data Models

Node A: Database, Relational Model

Node B: Database, Network Model

Node C: Database, Object-Oriented Model

Node D: Database, Hierarchical Model

The nodes are connected by a network.

Developments: Tools for interoperability; commercial products

Challenges: Global data model


Schema Integration and Transformation: An approach

External Schema I; External Schema II; External Schema III

Global Schema: Integrate the generic schemas

Generic schemas describing the relational database, the network database, the hierarchical database, and the object-oriented database

Schemas describing the relational database, the network database, the hierarchical database, and the object-oriented database

Challenges: Selecting appropriate generic representation; maintaining consistency during transformations; schema evolution


Semantic Heterogeneity

0 Semantic heterogeneity occurs when there is a disagreement about the meaning or interpretation of the same data

Node A (Database): Object O interpreted as a passenger ship

Node B (Database): Object O interpreted as a submarine

Challenges: Standard definitions; Repositories


Federated Database Management

Database System A Database System B

Database System C

Cooperating database systems yet maintaining some degree of autonomy

Federation F1

Federation F2


Autonomy

Component A Component B

Component C

Local request; request from component

Communication through federation

Component A does not communicate with component C

Component A honors the local request first

Challenges: Adapt techniques to handle autonomy, e.g., transaction processing, schema integration; transition research to products


Schema Integration and Transformation in a Federated Environment

Adapted from Sheth and Larson, ACM Computing Surveys, September 1990

External Schema 1.1; External Schema 1.2; External Schema 2.1; External Schema 2.2

Federated Schema for FDS - 1; Federated Schema for FDS - 2

Export Schema for Component A; Export Schema I for Component B; Export Schema II for Component B; Export Schema for Component C

Generic Schema for Component A; Generic Schema for Component B; Generic Schema for Component C

Component Schema for Component A; Component Schema for Component B; Component Schema for Component C

Local Schema 1; Local Schema 2


Security Policy Integration

Policies at the component level: e.g., component policies for components A, B, and C

Generic policies for the components: e.g., generic policies for components A, B, and C

Export policies for the components: e.g., export policies for components A, B, and C (note: a component may export different policies to different federations)

Federated policies: integrate export policies of the components of the federation

External policies: policies for the various classes of users

Layer 1

Layer 2

Layer 3

Layer 4

Layer 5


Federated Data and Policy Management

Data/Policy for Federation

Export Data/Policy; Component Data/Policy for Agency A

Export Data/Policy; Component Data/Policy for Agency B

Export Data/Policy; Component Data/Policy for Agency C


Client-Server Architecture: Example

Network

Client from Vendor A

Client from Vendor B

Server from Vendor C, with its database

Server from Vendor D, with its database


Remote Database Access (RDA) Model

RDA Service Provider

RDA Client; RDA Server; Database

RDA Client Interface

RDA Server Interface

Interface between client and service provider can operate in synchronous or asynchronous modes


Example Three-Tier Architecture

Client: User Interface Processing

Intermediate: Distributed Processor, Business Rules, Logic

Server: Local DBMS

The tiers are connected by a network.


Object-based Interoperability

Object Request Broker

Client

Object

Server

Object

Example Object Request Broker: Object Management Group’s (OMG) CORBA (Common Object Request Broker Architecture)


Javasoft’s RMI (Remote Method Invocation)

RMI Business Objects

Clients Java-based Servers


Microsoft’s Open Database Connectivity

Microsoft Application C; Microsoft Application D

Microsoft’s ODBC

ODBC Driver for DBMS A; ODBC Driver for DBMS B

DBMS Vendor A; DBMS Vendor B

Database A; Database B


Overview: Migrating Legacy Systems

0 Many of the current systems and applications may become obsolete

0 Need an approach to migrate these systems to new architectures

0 Evolutionary approach: incremental transition of today's systems into more flexible systems

0 Extensible system architecture ultimately replaces today's hardware and software architecture

0 Open systems approach, standards


Migrating Legacy Database and Applications

0 Build business model in a sub-domain and relate data to existing databases and systems.

0 Wrap existing systems to provide access as needed.

0 Incorporate middle tier services and begin migrating workflow.

0 Gradually migrate business logic and rely on business objects for end-user systems.


Migrating Business Logic

Figure labels: client tier; middle tier; server tier; container; business objects; business logic; distribution services; CORBA; data entry; visualization; word processing; existing databases; existing systems; existing processes; EDI artifacts; sample Airspace tables with time, turnpoints, and elevations columns


Application vs. Database Migration

0 Extract schema from the legacy code

- Use reengineering tools

0 Extract metadata associated with the data

0 Deal with incomplete data and fill in the gaps

0 Build schemas in the target system from the extracted schema

0 Build the database
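
The database migration steps above can be illustrated end to end on a toy example. The sketch below makes several simplifying assumptions: both the legacy and the target system are SQLite databases accessed through Python's `sqlite3` module, the schema is read from the legacy catalog rather than recovered from legacy code with reengineering tools, and the gap-filling rule (a default salary of 0) is purely hypothetical.

```python
import sqlite3

# A stand-in "legacy" database with one incomplete record.
legacy = sqlite3.connect(":memory:")
legacy.execute("CREATE TABLE EMP (SSN INTEGER, Name TEXT, Salary INTEGER)")
legacy.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?)",
    [(1, "John", 20), (2, "Paul", None)],
)

# Extract the schema and metadata: (column name, declared type) pairs.
columns = [(row[1], row[2]) for row in legacy.execute("PRAGMA table_info(EMP)")]

# Build the schema in the target system from the extracted schema.
target = sqlite3.connect(":memory:")
ddl = ", ".join(f"{name} {col_type}" for name, col_type in columns)
target.execute(f"CREATE TABLE EMP ({ddl})")

# Deal with incomplete data: fill missing salaries with a hypothetical default.
DEFAULT_SALARY = 0
rows = [
    (ssn, name, salary if salary is not None else DEFAULT_SALARY)
    for ssn, name, salary in legacy.execute(
        "SELECT SSN, Name, Salary FROM EMP ORDER BY SSN"
    )
]

# Build the database: load the cleaned rows into the target.
target.executemany("INSERT INTO EMP VALUES (?, ?, ?)", rows)
migrated = target.execute("SELECT * FROM EMP ORDER BY SSN").fetchall()
print(columns)
print(migrated)
```

In a real migration the hardest step is usually the first one, because legacy schemas are often implicit in application code rather than recorded in a catalog.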


Example: Legacy Migration using Objects

Figure labels (legacy subsystems and their interfaces): labels include STOMPS, AUTODIN, CSP, IMOM, REM, JOTS, CAFMS, BASS, CIDB, ICM, ADS, RAAP, CMS, JMAPS, WCCS, JDSS, APS, and IPL; the links are labeled with data and formats such as ATO, ACO, WX data, logistics data, USMTF, IDBTF, JANAP128, ASCII text, SQL, and X.25

CTAPS - Contingency Theater Automated Planning System

Object Request Broker

Application Interfaces: Targeting, Planning/ATO, Collection Mgt, ...

Domain Interfaces: MCG&I, Messaging, Weather, ...

Common Facilities: User Interface, Compound Data, System & Task Mgt, ...

Object Services: Security, Concurrency, Transactions, ...


Example Lessons Learned: Experience with CORBA

0 CORBA provides an evolvable system integration platform

0 CORBA provides a path for legacy migration

- Applications can be coarsely wrapped as CORBA objects, providing 100% reuse

=Wrapping is a relatively straightforward technique

=Need to dig to uncover hidden dependencies

=Does not address duplication of common functions

- Applications can be reengineered to replace duplicated functions with CORBA based common services

=Substantially more difficult than coarse wrapping

Page 68: Information and  Security Analytics Lecture #1 Unit #1:  Data Management: Overview


Example: Migration using Objects for Real-time Systems

[Figure: object-based architecture for a real-time system. Components shown: Hardware; Display Processor & Refresh Channels; Consoles (14); Navigation; Sensors; Data Links; Data Analysis Programming Group (DAPG); Real Time Operating System; Infrastructure Services; Data Mgmt.; Data Xchg.; MSI App; three Future Apps. Data flows labeled Sensor Detections and Multi-Sensor Tracks. A legend marks the technology provided by the project.]

Interface to DAPG, etc., will be simulated for project demonstration

Page 69: Information and  Security Analytics Lecture #1 Unit #1:  Data Management: Overview


Current Status and Directions

0 Developments

- Several prototypes and some commercial products

- Tools for schema integration and transformation

- Standards for interoperable database systems

0 Challenges being addressed

- Semantic heterogeneity

- Autonomy and federation

- Global transaction management

- Integrity and Security

0 New challenges

- Scale

- Web data management

Page 70: Information and  Security Analytics Lecture #1 Unit #1:  Data Management: Overview


Information and Security Analytics

Dr. Bhavani Thuraisingham

The University of Texas at Dallas

Lecture #1

Unit #4

Data Warehousing

May 28, 2010

Page 71: Information and  Security Analytics Lecture #1 Unit #1:  Data Management: Overview


Outline

0 Data Warehousing

0 Data Warehouse to Data Mining

Page 72: Information and  Security Analytics Lecture #1 Unit #1:  Data Management: Overview


What is a Data Warehouse?

0 A Data Warehouse is a:

- Subject-oriented

- Integrated

- Nonvolatile

- Time variant

- Collection of data in support of management’s decisions

- From: Building the Data Warehouse by W. H. Inmon, John Wiley and Sons

0 Integration of heterogeneous data sources into a repository

0 Summary reports, aggregate functions, etc.

Page 73: Information and  Security Analytics Lecture #1 Unit #1:  Data Management: Overview


Example Data Warehouse

Oracle DBMS for Employees

Sybase DBMS for Projects

Informix DBMS for Medical

Data Warehouse: Data correlating Employees with Medical Benefits and Projects

Could be any DBMS; usually based on the relational data model

Users Query the Warehouse
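
A minimal sketch of this example, assuming three in-memory SQLite databases as stand-ins for the Oracle, Sybase, and Informix sources. All table names, column names, and rows are hypothetical; a real warehouse load would use each vendor's own connector and a proper extraction process.

```python
import sqlite3

# Three separate in-memory databases stand in for the source DBMSs.
employees_db = sqlite3.connect(":memory:")
employees_db.execute("CREATE TABLE employees (emp_id INTEGER, name TEXT)")
employees_db.executemany(
    "INSERT INTO employees VALUES (?, ?)", [(1, "Ann"), (2, "Bob")]
)

projects_db = sqlite3.connect(":memory:")
projects_db.execute("CREATE TABLE projects (emp_id INTEGER, project TEXT)")
projects_db.executemany(
    "INSERT INTO projects VALUES (?, ?)",
    [(1, "Alpha"), (2, "Beta"), (1, "Gamma")],
)

medical_db = sqlite3.connect(":memory:")
medical_db.execute("CREATE TABLE medical (emp_id INTEGER, plan TEXT)")
medical_db.executemany(
    "INSERT INTO medical VALUES (?, ?)", [(1, "Gold"), (2, "Basic")]
)

# The warehouse is a fourth database holding the correlated data.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE emp_summary (name TEXT, plan TEXT, project TEXT)")

# Extract from each source and correlate on the shared employee id.
names = dict(employees_db.execute("SELECT emp_id, name FROM employees"))
plans = dict(medical_db.execute("SELECT emp_id, plan FROM medical"))
rows = [
    (names[emp_id], plans[emp_id], project)
    for emp_id, project in projects_db.execute(
        "SELECT emp_id, project FROM projects"
    )
]
warehouse.executemany("INSERT INTO emp_summary VALUES (?, ?, ?)", rows)

# Users query the warehouse, here with an aggregate summary report.
report = warehouse.execute(
    "SELECT plan, COUNT(*) FROM emp_summary GROUP BY plan ORDER BY plan"
).fetchall()
print(report)
```

Users never touch the three sources: the summary query runs against the warehouse copy only.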

Page 74: Information and  Security Analytics Lecture #1 Unit #1:  Data Management: Overview


Some Data Warehousing Technologies

0 Heterogeneous Database Integration

0 Statistical Databases

0 Data Modeling

0 Metadata

0 Access Methods and Indexing

0 Language Interface

0 Database Administration

0 Parallel Database Management

Page 75: Information and  Security Analytics Lecture #1 Unit #1:  Data Management: Overview


Data Warehouse Design

0 Appropriate Data Model is key to designing the Warehouse

0 Higher Level Model in stages

- Stage 1: Corporate data model

- Stage 2: Enterprise data model

- Stage 3: Warehouse data model

0 Middle-level data model

- A model, possibly one for each subject area in the higher level model

0 Physical data model

- Include features such as keys in the middle-level model

0 Need to determine appropriate levels of granularity of data in order to build a good data warehouse

Page 76: Information and  Security Analytics Lecture #1 Unit #1:  Data Management: Overview


Distributing the Data Warehouse

0 Issues similar to distributed database systems

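[Figure: two warehouse configurations for a bank with a Central Bank, Branch A, and Branch B. Non-distributed Warehouse: the Central Bank and both branches are connected to a single Central Warehouse. Distributed Warehouse: in addition to the Central Warehouse, there is a Branch A Warehouse and a Branch B Warehouse.]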

Page 77: Information and  Security Analytics Lecture #1 Unit #1:  Data Management: Overview


Multidimensional Data Model

[Figure: multidimensional data model for projects. Dimensions: Project Name, Project Leader, Project Sponsor, Project Cost (Dollars, Pounds, Yen), Project Duration (Years, Months, Weeks).]
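
One way to picture a multidimensional model is as a set of facts that can be summarized along any chosen dimension. The sketch below is illustrative only: the project records, the field names, and the `roll_up` helper are hypothetical, and cost is kept in a single unit (dollars) to avoid assuming any exchange rates.

```python
from collections import defaultdict

# Hypothetical project facts: dimensions (name, leader, sponsor)
# plus two measures (cost in dollars, duration in months).
projects = [
    {"name": "P1", "leader": "Lee", "sponsor": "S1",
     "cost_dollars": 100, "duration_months": 12},
    {"name": "P2", "leader": "Lee", "sponsor": "S2",
     "cost_dollars": 250, "duration_months": 6},
    {"name": "P3", "leader": "Kim", "sponsor": "S1",
     "cost_dollars": 50, "duration_months": 24},
]


def roll_up(facts, dimension, measure):
    """Sum one measure over all facts, grouped by one dimension."""
    totals = defaultdict(int)
    for fact in facts:
        totals[fact[dimension]] += fact[measure]
    return dict(totals)


cost_by_sponsor = roll_up(projects, "sponsor", "cost_dollars")
duration_by_leader = roll_up(projects, "leader", "duration_months")

# The same measure viewed at a coarser unit: months to years.
duration_years_by_leader = {
    leader: months / 12 for leader, months in duration_by_leader.items()
}
print(cost_by_sponsor, duration_years_by_leader)
```

The same facts answer different questions depending on which dimension is grouped and at which unit the measure is reported.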

Page 78: Information and  Security Analytics Lecture #1 Unit #1:  Data Management: Overview


Indexing for Data Warehousing

0 Bit-Maps

0 Multi-level indexing

0 Storing parts or all of the index files in main memory

0 Dynamic indexing
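
A minimal sketch of a bitmap index, assuming one Python integer per distinct column value with bit i set when row i holds that value. The column data and the helper names `build_bitmap_index` and `rows_from_bitmap` are hypothetical; production systems typically also compress the bitmaps.

```python
def build_bitmap_index(values):
    """Map each distinct value to an integer whose bit i marks row i."""
    index = {}
    for row, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row)
    return index


def rows_from_bitmap(bitmap):
    """Return the row numbers whose bits are set in the bitmap."""
    rows = []
    row = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


# Two low-cardinality columns of a hypothetical five-row table.
region = ["east", "west", "east", "north", "west"]
status = ["open", "open", "closed", "open", "closed"]

region_index = build_bitmap_index(region)
status_index = build_bitmap_index(status)

# region = 'east' AND status = 'open' becomes a bitwise AND.
east_open = rows_from_bitmap(region_index["east"] & status_index["open"])

# region = 'west' OR region = 'north' becomes a bitwise OR.
west_or_north = rows_from_bitmap(region_index["west"] | region_index["north"])
print(east_open, west_or_north)
```

Combining predicates reduces to bitwise operations on the bitmaps, which is why this kind of index suits warehouse queries that filter on several low-cardinality columns.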

Page 79: Information and  Security Analytics Lecture #1 Unit #1:  Data Management: Overview


Metadata Mappings

[Figure: Metadata for Data source A, Metadata for Data source B, and Metadata for Data source C are each connected, through Metadata for Mappings and Transformations, to Metadata for the Warehouse.]
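
The relationship between the three kinds of metadata can be sketched as plain dictionaries. Everything here is hypothetical (the field names, the two sources, and the `to_warehouse` helper); it only illustrates how mapping metadata connects a source schema to the warehouse schema.

```python
# Metadata for two hypothetical sources: field name -> type name.
SOURCE_METADATA = {
    "A": {"emp_no": "int", "full_name": "str"},
    "B": {"id": "str", "name": "str"},
}

# Metadata for the warehouse: the common target schema.
WAREHOUSE_METADATA = {"employee_id": "int", "employee_name": "str"}

# Metadata for mappings and transformations:
# warehouse field -> (source field, transformation function).
MAPPINGS = {
    "A": {
        "employee_id": ("emp_no", int),
        "employee_name": ("full_name", str.strip),
    },
    "B": {
        "employee_id": ("id", int),
        "employee_name": ("name", str.title),
    },
}


def to_warehouse(source, record):
    """Apply the mapping metadata for one source to one record."""
    mapping = MAPPINGS[source]
    return {
        target: transform(record[field])
        for target, (field, transform) in mapping.items()
    }


row_a = to_warehouse("A", {"emp_no": 7, "full_name": " Ann Lee "})
row_b = to_warehouse("B", {"id": "42", "name": "bob ray"})
print(row_a, row_b)
```

Because the mappings are data rather than code, adding a source C would mean adding one more entry to each dictionary instead of writing a new loader.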

Page 80: Information and  Security Analytics Lecture #1 Unit #1:  Data Management: Overview


Data Mining

Data Mining

Knowledge Mining

Knowledge Discovery in Databases

Data Archaeology

Data Dredging

Database Mining

Knowledge Extraction

Data Pattern Processing

Information Harvesting

Siftware

The process of discovering meaningful new correlations, patterns, and trends by sifting through large amounts of data, often previously unknown, using pattern recognition technologies and statistical and mathematical techniques (Thuraisingham 1998)