Information and Security Analytics
Lecture #1
Unit #1: Data Management: Overview
Dr. Bhavani Thuraisingham
May 27, 2010
1-204/21/23 00:11
Objective of the Unit
0 This unit provides an overview of the developments in data management. It also provides an overview of data management, information management and knowledge management and illustrates a framework
0 Reference: Data Management Systems: Evolution and Interoperation, Thuraisingham, CRC Press, 1997
Outline of the Unit
0 What is Data Management?
0 Developments in Data Management
0 Current Status and Trends
0 Note on Data Administration
0 Data management, Information management, and Knowledge Management
What is Data Management?
0 One proposal: Data Management = Database System Management + Data Administration
0 Includes data analysis, data administration, database administration, auditing, data modeling, database system development, database application development
0 The tutorial will focus mainly on database system aspects of data management
Developments in Database Systems
Network, hierarchical database systems
Relational database systems, transaction processing, distributed database systems
Heterogeneous database integration, migrating legacy databases
Next generation database systems: object-oriented, deductive, - - -
Data warehousing, data mining, multimedia database systems, Internet database
Current Status
Database Systems
Multimedia Database Systems
Data Warehousing Systems
Data Mining Systems
Sensor Database Systems
Heterogeneous Database Systems
Limited integration between the different types of systems
Often stovepiped by technology
Vision for Database Management
Optical Storage
Robotic Tape Storage
Multimedia Databases: audio, video, text, imagery
High-speed networks - voice, video, images
Distributed Multimedia Data Management
Applications
Dissemination
Collaborative Analysis and Decision Making between Different Organizations
Integration of Different Technologies
Some Outstanding Problems
0 Heterogeneous Database Integration
- Semantic heterogeneity, inferencing, transaction processing, integrity, security
0 Multimedia Database Management
- Data model, index strategies, synchronization, data manipulation
0 Real-time Database Management
- Quality of service, operating system services, transaction processing, active databases
0 Integration with other Technologies
- Distributed processing, mass storage, information management, knowledge management
0 Migrating Legacy Applications
- Modernization, enterprise modeling, schema transformation, integration
Some Current Trends in Data Management
0 Heterogeneous database integration
- Query, transactions, semantics, security and integrity
0 Migrating legacy databases
- Fine-grained encapsulation, distributed objects
0 Multimedia databases
- Query, model, quality-of-service, index
0 Data Warehousing
- Building a warehouse, query
0 Data Mining
- Multimedia databases, web data mining
0 Data management for collaboration
- Architecture, transactions
0 Web databases and digital libraries
- Query, transactions, index, security
Interoperability of Heterogeneous Database Systems
Database System A (Relational)
Database System B (Object-Oriented)
Database System C (Legacy)
Network
Transparent access to heterogeneous databases - both users and application programs; query, transaction processing
Note on Data Administration
0 Identifying the data
- Data may be in files, paper, databases, etc.
0 Analyzing the data
- Is the data of good quality?
- Is the data complete?
0 Data standardization
- Should one standardize all the data elements and metadata?
- Repositories for handling semantic heterogeneity?
0 Data modeling
- Structure the data, model the data and the processes
Data, Information and Knowledge Management
0 Data Management
- Data: stored in databases, files or some media
- Data management includes modeling, storing, retrieving and analyzing the data
0 Information Management
- Information is what is obtained by making sense out of the data; e.g., data with context
- Information management is about modeling, storing, retrieving and analyzing the information
0 Knowledge Management
- Knowledge is what is obtained when the information is understood; it enables one to take actions
- Knowledge management is about utilizing the knowledge to improve the business of an organization
Data, Information and Knowledge Management: Alternative View: MITRE Model 1999/2000
Communication, Network, Operating System, Middleware
Data Management
Information Management
Knowledge Management
Decision Support
Information and Security Analytics
Lecture #1
Unit #2: Database Systems
Dr. Bhavani Thuraisingham
May 27, 2010
Objective of the Unit
0 This unit will provide an overview of the concepts and developments in database systems
0 Reference: Data Management Systems: Evolution and Interoperation, Thuraisingham, CRC Press, 1997
Outline of the Unit
0 Concepts in database systems
0 Types of database systems
Concepts in Database Systems
0 Definition of a Database system
0 Early systems
0 Metadata
0 Architectural Issues
- Schema, Functional
0 DBMS Design Issues
0 Other Issues
- Database design, Administration
Database System
0 Consists of database, hardware, Database Management System (DBMS), and users
0 Database is the repository for persistent data
0 Hardware consists of secondary storage volumes, processors, and main memory
0 DBMS handles all users’ access to the database
0 Users include application programmers, end users, and the Database Administrator (DBA)
0 Need: reduce redundancy, avoid inconsistency, share data, enforce standards, apply security restrictions, maintain integrity, balance conflicting requirements
0 We have used the definition of a database management system given in C. J. Date’s book (Addison Wesley, 1990)
An Example Database System
Users
Application Programs
Database Management System
Database
Adapted from C. J. Date, Addison Wesley, 1990
Early systems: Hierarchical and Network Database Systems
Figure: the Hierarchical Data Model and the Network Data Model, each relating SUPPLIERS, SUPPLIES, and PARTS
Metadata
0 Metadata describes the data in the database
- Example: Database D consists of a relation EMP with attributes SS#, Name, and Salary
0 Metadatabase stores the metadata
- Could be physically stored with the database
0 Metadatabase may also store constraints and administrative information
0 Metadata is also referred to as the schema or data dictionary
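The EMP example above can be tried out with Python's built-in sqlite3 module. This is only a sketch: SQLite stands in for "Database D", the column types are assumptions (the slide names the attributes but not their types), and `describe` is a hypothetical helper. It shows that the metadata is itself queryable, here through SQLite's catalog.

```python
import sqlite3

# In-memory database standing in for "Database D" (illustrative only).
conn = sqlite3.connect(":memory:")

# SS# contains '#', so the column name is double-quoted.
# The column types are assumed; the slide lists only the attribute names.
conn.execute('CREATE TABLE EMP ("SS#" INTEGER PRIMARY KEY, Name TEXT, Salary INTEGER)')


def describe(connection, table):
    """Return (column name, declared type) pairs read from the catalog."""
    rows = connection.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2]) for row in rows]


# Names of the relations stored in the database.
tables = [r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]

# Attributes of the relation EMP.
emp_metadata = describe(conn, "EMP")
print(tables, emp_metadata)
```

Here the metadata is physically stored with the database, which is one of the options the slide mentions.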
Three-level Schema Architecture: Details
Users: User A1, User A2, User A3, User B1, User B2
External level: External Schema A / External Model A; External Schema B / External Model B
External/Conceptual Mapping A; External/Conceptual Mapping B
Conceptual level: Conceptual Schema / Conceptual Model
Conceptual/Internal Mapping
Internal level: Internal Schema / Stored Database (Internal Model)
Functional Architecture
User Interface Manager
Query Manager
Transaction Manager
Schema (Data Dictionary) Manager (metadata)
Security/Integrity Manager
File Manager
Disk Manager
Data Management
Storage Management
DBMS Design Issues
0 Query Processing
- Optimization techniques
0 Transaction Management
- Techniques for concurrency control and recovery
0 Metadata Management
- Techniques for querying and updating the metadatabase
0 Security/Integrity Maintenance
- Techniques for processing integrity constraints and enforcing access control rules
0 Storage management
- Access methods and index strategies for efficient access to the database
Other Issues
0 Database design
- Generally a two-step process
=Semantic data model to capture the entities of the application and the relationships between the entities
=Generate the conceptual schema; theory of normal forms for relational databases
- Research on object-oriented approaches for database design
0 Database Administration
- Creating and deleting databases; backup and recovery, enforcing policies, auditing, etc.
Types of Database Systems
0 Relational Database Systems
0 Object Database Systems
0 Deductive Database Systems
0 Other
- Real-time, Secure, Parallel, Scientific, Temporal, Wireless, Functional, Entity-Relationship, Sensor/Stream Database Systems, etc.
Relational Database: Informal Overview
0 Collection of tables also called relations
0 Table has one or more columns also called attributes
0 Each table has zero or more rows also called tuples
0 Elements of a row take values from a pool of legal values
0 The values of one or more columns in a row uniquely identify the row. These columns form an identifier (also called key)
0 One identifier is designated as the unique identifier (also called primary key)
0 Relational databases are queried using a language called SQL (Structured Query Language)
Relational Database: Example
Relation S:
S#  SNAME  STATUS  CITY
S1  Smith  20      London
S2  Jones  10      Paris
S3  Blake  30      Paris
S4  Clark  20      London
S5  Adams  30      Athens
Relation P:
P#  PNAME  COLOR  WEIGHT  CITY
P1  Nut    Red    12      London
P2  Bolt   Green  17      Paris
P3  Screw  Blue   17      Rome
P4  Screw  Red    14      London
P5  Cam    Blue   12      Paris
P6  Cog    Red    19      London
Relation SP:
S#  P#  QTY
S1  P1  300
S1  P2  200
S1  P3  400
S1  P4  200
S1  P5  100
S1  P6  100
S2  P1  300
S2  P2  400
S3  P2  200
S4  P2  200
S4  P4  300
S4  P5  400
SQL: Data Manipulation
0 Select, Update, Delete, Insert
Examples:
SELECT S.S#, S.STATUS
FROM S
WHERE S.CITY = 'Paris'

SELECT *
FROM S

SELECT S.*, P.*
FROM S, P
WHERE S.CITY = P.CITY

UPDATE P
SET COLOR = 'Yellow', WEIGHT = WEIGHT + 5, CITY = NULL
WHERE P# = 'P2'
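The statements above can be run against relations S and P from the previous slide. The sketch below uses Python's built-in sqlite3 module, which is an assumption of this example rather than the system used in the lecture. Because `S#` and `P#` are not plain identifiers in SQLite, they are double-quoted, and an ORDER BY is added so the first result is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Column types are assumed; the key columns are double-quoted because of '#'.
cur.execute('CREATE TABLE S ("S#" TEXT PRIMARY KEY, SNAME TEXT, STATUS INTEGER, CITY TEXT)')
cur.execute('CREATE TABLE P ("P#" TEXT PRIMARY KEY, PNAME TEXT, COLOR TEXT, WEIGHT INTEGER, CITY TEXT)')

cur.executemany("INSERT INTO S VALUES (?, ?, ?, ?)", [
    ("S1", "Smith", 20, "London"), ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"), ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens"),
])
cur.executemany("INSERT INTO P VALUES (?, ?, ?, ?, ?)", [
    ("P1", "Nut", "Red", 12, "London"), ("P2", "Bolt", "Green", 17, "Paris"),
    ("P3", "Screw", "Blue", 17, "Rome"), ("P4", "Screw", "Red", 14, "London"),
    ("P5", "Cam", "Blue", 12, "Paris"), ("P6", "Cog", "Red", 19, "London"),
])

# Suppliers located in Paris.
paris_suppliers = cur.execute(
    'SELECT "S#", STATUS FROM S WHERE CITY = ? ORDER BY "S#"', ("Paris",)
).fetchall()

# Supplier/part pairs located in the same city (a join on CITY).
colocated = cur.execute("SELECT S.*, P.* FROM S, P WHERE S.CITY = P.CITY").fetchall()

# Update several columns of part P2 in one statement.
cur.execute(
    'UPDATE P SET COLOR = ?, WEIGHT = WEIGHT + 5, CITY = NULL WHERE "P#" = ?',
    ("Yellow", "P2"),
)
updated_p2 = cur.execute('SELECT * FROM P WHERE "P#" = ?', ("P2",)).fetchone()

print(paris_suppliers, len(colocated), updated_p2)
```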
Features of Object-Oriented Database Systems Suitable for Advanced Applications
0 Objects (support for large and variable sized data blocks)
0 Class hierarchy (reusability)
0 Instance variables, composite and complex objects (complex data structures)
0 Methods, and message passing (object encapsulation)
0 Pointer swizzling (performance)
0 Tighter integration with programming languages (application program support)
0 Special mechanisms for long transactions and concurrency control, multimedia information management, schema management, versions management, storage management
Concepts in Object Database Systems
0 Objects - every entity is an object
- Example: Book, Film, Employee, Car
0 Class
- Objects with common attributes are grouped into a class
0 Attributes or Instance Variables
- Properties of an object class inherited by the object instances
0 Class Hierarchy
- Parent-Child class hierarchy
0 Composite objects
- Book object with paragraphs, sections etc.
0 Methods
- Functions associated with a class
Example Class Hierarchy
Document Class: attributes ID, Name, Author, Publisher; instances D1, D2
Book Subclass: attribute # of Chapters; instance B1
Journal Subclass: attribute Volume #; instance J1
Method1: Print-doc-att(ID)
Method2: Print-doc(ID)
Example Composite Object
Composite Document Object
Section 1 Object
Section 2 Object
Paragraph 1 Object
Paragraph 2 Object
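The class hierarchy and the composite document from the last two slides can be sketched with ordinary Python classes. This is an illustration of the concepts, not an object database: nothing is persisted, the attribute values are made up, and attaching the print method to the parent class is an assumption, since the slide does not say which class owns each method.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Document:
    """Parent class: attributes shared by every document."""
    doc_id: str
    name: str
    author: str
    publisher: str

    def print_doc(self) -> str:
        # Method defined once on the class and inherited by subclasses.
        return f"{self.doc_id}: {self.name} by {self.author}"


@dataclass
class Section:
    """Component object; a composite document is built from sections."""
    title: str
    paragraphs: List[str] = field(default_factory=list)


@dataclass
class Book(Document):
    """Subclass: inherits the Document attributes and adds its own."""
    num_chapters: int = 0
    sections: List[Section] = field(default_factory=list)

    def paragraph_count(self) -> int:
        # Traverses the composite structure: book -> sections -> paragraphs.
        return sum(len(s.paragraphs) for s in self.sections)


@dataclass
class Journal(Document):
    """Subclass with a volume number instead of chapters."""
    volume: int = 0


# Hypothetical instances named after B1 and J1 on the slide.
b1 = Book("B1", "Data Management", "A. Author", "A Publisher", num_chapters=2,
          sections=[Section("Section 1", ["Paragraph 1", "Paragraph 2"]),
                    Section("Section 2", ["Paragraph 1"])])
j1 = Journal("J1", "A Journal", "B. Author", "A Publisher", volume=7)

print(b1.print_doc(), b1.paragraph_count(), j1.print_doc())
```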
Deductive Database Systems
0 Database systems augmented with inference engines to deduce new data from existing data and rules
0 Example
- Rule: parent of a parent is a grandparent
- Data: John is Jane’s parent; Jane is Robert’s parent
- From the above, infer John is Robert’s grandparent
0 Loose and tight coupling architectures between the database system and inference engine
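The grandparent example can be written as a tiny rule evaluation over stored facts. The sketch below is a plain Python stand-in for an inference engine, with a hypothetical `grandparents` function; a real deductive database would express the rule declaratively (for example in a Datalog-style language) and evaluate it inside or alongside the DBMS.

```python
# Stored data: (parent, child) facts.
parent_of = {("John", "Jane"), ("Jane", "Robert")}


def grandparents(parent_facts):
    """Apply the rule: parent(X, Y) and parent(Y, Z) implies grandparent(X, Z)."""
    return {
        (x, z)
        for (x, y1) in parent_facts
        for (y2, z) in parent_facts
        if y1 == y2
    }


# Derived data: not stored anywhere, deduced from the facts and the rule.
derived = grandparents(parent_of)
print(derived)
```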
Current Status
0 Database Systems is a mature technology; numerous products and prototypes
0 Much work followed in distributed and heterogeneous databases
0 Current directions include web database management as well as data management support for novel applications including E-commerce, Bioinformatics and Geoinformatics
0 Work still continues on developing new kinds of database systems including stream/sensor database systems
Information and Security Analytics
Lecture #1
Unit #3: Distributed and Heterogeneous Database Systems
Dr. Bhavani Thuraisingham
May 27, 2010
Objective of the Unit
0 This unit provides an overview of concepts in distributed and heterogeneous databases. In particular, definitions and functions are discussed
0 Reference:
- Data Management Systems: Evolution and Interoperation, Thuraisingham, CRC Press, 1997
- Heterogeneous Information Exchange and Organizational Hubs, Kluwer, 2002, Editors: Bestougeff, Dubois and Thuraisingham
Outline of the Unit
0 Distributed Database Systems
- Architecture, Data Distribution, Functions
0 Heterogeneous Database Integration
0 Federated Database Management
0 Client-Server Database Management
0 Migrating Legacy Databases
0 Current Status and Directions
A Definition of a Distributed Database System
0 A collection of database systems connected via a network
0 The software that is responsible for interconnection is a Distributed Database Management System (DDBMS)
0 Each DBMS executes local applications and should be involved in at least one global application (Ceri and Pelagatti)
0 Homogeneous environment
Architecture
Communication Network
Site 1: Distributed Processor 1, DBMS 1, Database 1
Site 2: Distributed Processor 2, DBMS 2, Database 2
Site 3: Distributed Processor 3, DBMS 3, Database 3
Distributed Processor
Distributed Query/Update Processor
Distributed Transaction Manager
Distributed Metadata Management
Network Interface
Local DBMS Interface
Integrity/Security Manager
Data Distribution
SITE 1
EMP1
SS#  Name   Salary  D#
1    John   20      10
2    Paul   30      20
3    James  40      20
4    Jill   50      20
5    Mary   60      10
6    Jane   70      20
DEPT1
D#  Dname    MGR
10  C. Sci.  Jane
30  English  David
40  French   Peter
SITE 2
EMP2
SS#  Name    Salary  D#
7    David   80      30
8    Peter   90      40
9    Mathew  70      50
DEPT2
D#  Dname    MGR
50  Math     John
20  Physics  Paul
Distributed Database Functions
0 Distributed Query Processing
- Optimization techniques across the databases
0 Distributed Transaction Management
- Techniques for distributed concurrency control and recovery
0 Distributed Metadata Management
- Techniques for managing the distributed metadata
0 Distributed Security/Integrity Maintenance
- Techniques for processing integrity constraints and enforcing access control rules across the databases
Query Processing Example (Concluded)
DQP (Distributed Query Processor)
DBMS 1, DBMS 2, and DBMS 3, each with a DQP, connected by a Network
EMP1 (20), EMP2 (30), DEPT2 (20)
EMP1 (20), EMP3 (50), DEPT3 (30)
Query at site 1: Join EMP and DEPT on D#
- Move EMP2 to site 3; merge EMP1, EMP2, EMP3 to form EMP
- Move DEPT2 to site 3; merge DEPT2 and DEPT3 to form DEPT
- Join EMP and DEPT; move result to site 1
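The strategy in this example (ship fragments to one site, merge them, join, ship the result back) can be sketched in a few lines of Python. Everything concrete here is an assumption for illustration: the rows are invented, the placement of fragments on sites is a guess from the slide, EMP1 is assumed to be shipped to site 3 as well, and `ship` is a hypothetical helper that only counts tuples moved.

```python
# Hypothetical fragments keyed by site; EMP rows are (SS#, Name, D#),
# DEPT rows are (D#, Dname). Which site holds which fragment is assumed.
sites = {
    1: {"EMP1": [(1, "John", 10)]},
    2: {"EMP2": [(2, "Paul", 20)], "DEPT2": [(20, "Physics")]},
    3: {"EMP3": [(3, "Mary", 10)], "DEPT3": [(10, "Math")]},
}


def ship(fragment, src, dst):
    """Move a fragment from one site to another; return tuples shipped."""
    rows = sites[src].pop(fragment)
    sites[dst][fragment] = rows
    return len(rows)


# Step 1: bring every fragment to site 3 (EMP1's move is an assumption).
shipped = ship("EMP1", 1, 3) + ship("EMP2", 2, 3) + ship("DEPT2", 2, 3)

# Step 2: merge the horizontal fragments by union.
site3 = sites[3]
emp = site3["EMP1"] + site3["EMP2"] + site3["EMP3"]
dept = site3["DEPT2"] + site3["DEPT3"]

# Step 3: join EMP and DEPT on D# using a lookup table on the DEPT side.
dname_by_dno = dict(dept)
result = sorted(
    (ss, name, dno, dname_by_dno[dno])
    for ss, name, dno in emp
    if dno in dname_by_dno
)

# Step 4: the result is shipped back to site 1, where the query was posed.
shipped += len(result)
print(result, shipped)
```

Counting shipped tuples hints at what a distributed query optimizer weighs: a different choice of join site would move a different amount of data over the network.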
Transaction Processing Example
Site 1: Coordinator, Transaction Tj
Site 2: Participant, Subtransaction Tj2
Site 3: Participant, Subtransaction Tj3
Site 4: Participant, Subtransaction Tj4
Issues: Concurrency control, Recovery, Data Replication
Two-phase commit: Coordinator queries participants whether they are ready to commit. If all participants agree, then coordinator sends request for the participants to commit
DTM (Distributed Transaction Manager) responsible for executing the distributed transaction
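The two-phase commit exchange described above can be sketched as follows. This is a deliberately minimal, in-process model: the class and function names are hypothetical, and logging, timeouts, and recovery from coordinator or participant failure, which a real protocol needs, are all left out.

```python
class Participant:
    """A site executing one subtransaction of the distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote on whether this site is ready to commit.
        self.state = "ready" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator logic: commit only if every participant votes yes."""
    votes = [p.prepare() for p in participants]  # phase 1: collect votes
    if all(votes):
        for p in participants:  # phase 2: global commit
            p.commit()
        return "committed"
    for p in participants:  # phase 2: global abort
        p.abort()
    return "aborted"


ok_sites = [Participant("site2"), Participant("site3"), Participant("site4")]
outcome_ok = two_phase_commit(ok_sites)

bad_sites = [Participant("site2"), Participant("site3", can_commit=False), Participant("site4")]
outcome_bad = two_phase_commit(bad_sites)
print(outcome_ok, outcome_bad)
```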
Interoperability of Heterogeneous Database Systems
Database System A (Relational)
Database System B (Object-Oriented)
Database System C (Legacy)
Network
Transparent access to heterogeneous databases - both users and application programs; query, transaction processing
Technical Issues on the Interoperability of Heterogeneous Database Systems
0 Heterogeneity with respect to data models, schema, query processing, query languages, transaction management, semantics, integrity, and security policies
0 Interoperability based on client-server architectures
0 Federated database management
- Collection of cooperating, autonomous, and possibly heterogeneous component database systems, each belonging to one or more federations
Different Data Models
Node A: Database, Relational Model
Node B: Database, Network Model
Node C: Database, Object-Oriented Model
Node D: Database, Hierarchical Model
Nodes connected via a Network
Developments: Tools for interoperability; commercial products
Challenges: Global data model
Schema Integration and Transformation: An approach
Schema describing the relational database
Schema describing the network database
Schema describing the hierarchical database
Schema describing the object-oriented database
Generic schema describing the relational database
Generic schema describing the network database
Generic schema describing the hierarchical database
Generic schema describing the object-oriented database
Global Schema: Integrate the generic schemas
External Schema I
External Schema II
External Schema III
Challenges: Selecting appropriate generic representation; maintaining consistency during transformations; schema evolution
Semantic Heterogeneity
0 Semantic heterogeneity occurs when there is a disagreement about the meaning or interpretation of the same data
Node A: Database; Object O interpreted as a passenger ship
Node B: Database; Object O interpreted as a submarine
Challenges: Standard definitions; Repositories
Federated Database Management
Database System A, Database System B, Database System C
Cooperating database systems yet maintaining some degree of autonomy
Federation F1
Federation F2
Autonomy
Component A, Component B, Component C
- Local request
- Request from component
- Communication through federation
- Component A does not communicate with component C
- Component A honors the local request first
Challenges: Adapt techniques to handle autonomy - e.g., transaction processing, schema integration; transition research to products
Schema Integration and Transformation in a Federated Environment
Local Schema 1, Local Schema 2
Component Schema for Component A
Component Schema for Component B
Component Schema for Component C
Generic Schema for Component A
Generic Schema for Component B
Generic Schema for Component C
Export Schema for Component A
Export Schema I for Component B
Export Schema II for Component B
Export Schema for Component C
Federated Schema for FDS - 1
Federated Schema for FDS - 2
External Schema 1.1
External Schema 1.2
External Schema 2.1
External Schema 2.2
Adapted from Sheth and Larson, ACM Computing Surveys, September 1990
Security Policy Integration
Policies at the component level: e.g., component policies for components A, B, and C
Generic policies for the components: e.g., generic policies for components A, B, and C
Export policies for the components: e.g., export policies for components A, B, and C (note: component may export different policies to different federations)
Federated policies: integrate export policies of the components of the federation
External policies: policies for the various classes of users
Layer 1, Layer 2, Layer 3, Layer 4, Layer 5
Federated Data and Policy Management
Component Data/Policy for Agency A; Export Data/Policy
Component Data/Policy for Agency B; Export Data/Policy
Component Data/Policy for Agency C; Export Data/Policy
Data/Policy for Federation
Client-Server Architecture: Example
Network
Client from Vendor A
Client from Vendor B
Server from Vendor C
Server from Vendor D
Database, Database
Remote Database Access (RDA) Model
RDA Client
RDA Client Interface
RDA Service Provider
RDA Server Interface
RDA Server
Database
Interface between client and service provider can operate in synchronous or asynchronous modes
Example Three-Tier Architecture
Client: User Interface Processing
Network
Intermediate: Distributed Processor, Business Rules, Logic
Server: Local DBMS
Object-based Interoperability
Object Request Broker
Client
Object
Server
Object
Example Object Request Broker: Object Management Group’s (OMG) CORBA (Common Object Request Broker Architecture)
Javasoft’s RMI (Remote Method Invocation)
RMI Business Objects
Clients Java-based Servers
Microsoft’s Open Database Connectivity
Microsoft Application C
Microsoft Application D
Microsoft’s ODBC
ODBC Driver for DBMS A
ODBC Driver for DBMS B
DBMS Vendor A
DBMS Vendor B
Database A
Database B
Overview: Migrating Legacy Systems
0 Many of the current systems and applications may become obsolete
0 Need an approach to migrate these systems to new architectures
0 Evolutionary approach: incremental transition of today's systems into more flexible systems
0 Extensible system architecture ultimately replaces today's hardware and software architecture
0 Open systems approach, standards
Migrating Legacy Database and Applications
0 Build business model in a sub-domain and relate data to existing databases and systems.
0 Wrap existing systems to provide access as needed.
0 Incorporate middle tier services and begin migrating workflow.
0 Gradually migrate business logic and rely on business objects for end-user systems.
Migrating Business Logic
Figure: three tiers (client tier, middle tier, server tier)
Labels: container, business objects, business logic, data entry, visualization, word processing, distribution services, CORBA, existing databases, existing systems, existing processes, EDI Artifacts
Sample data shown: Airspace tables with columns time, turnpoints, Elevations
Application vs. Database Migration
0 Extract schema from the legacy code
- Use reengineering tools
0 Extract metadata associated with the data
0 Deal with incomplete data and fill in the gaps
0 Build schemas in the target system from the extracted schema
0 Build the database
Example: Legacy Migration using Objects
Figure: legacy components of CTAPS (Contingency Theater Automated Planning System) exchanging messages and data in formats including USMTF, ASCII text, IDBTF, SQL, and JQL, with some links over X.25
Object-based architecture labels:
- Object Request Broker
- Application Interfaces: Targeting, Planning/ATO, Collection Mgt, ...
- Domain Interfaces: MCG&I, Messaging, Weather, ...
- Common Facilities: User Interface, Compound Data, System & Task Mgt, ...
- Object Services: Security, Concurrency, Transactions, ...
Example Lessons Learned: Experience with CORBA
0 CORBA provides an evolvable system integration platform
0 CORBA provides a path for legacy migration
- Applications can be coarsely wrapped as CORBA objects, providing 100% reuse
=Wrapping is a relatively straightforward technique
=Need to dig to uncover hidden dependencies
=Does not address duplication of common functions
- Applications can be reengineered to replace duplicated functions with CORBA based common services
=Substantially more difficult than coarse wrapping
Example: Migration using Object for Real-time Systems
Figure labels: Hardware; Display Processor & Refresh Channels; Consoles (14); Navigation; Sensors; Data Links; Data Analysis Programming Group (DAPG); Multi-Sensor Tracks; Sensor Detections; Real Time Operating System; Infrastructure Services; Data Mgmt.; Data Xchg.; MSI App; Future App (three); Technology provided by Project
Interface to DAPG, etc., will be simulated for project demonstration
Current Status and Directions
0 Developments
- Several prototypes and some commercial products
- Tools for schema integration and transformation
- Standards for interoperable database systems
0 Challenges being addressed
- Semantic heterogeneity
- Autonomy and federation
- Global transaction management
- Integrity and Security
0 New challenges
- Scale
- Web data management
Information and Security Analytics
Dr. Bhavani Thuraisingham
The University of Texas at Dallas
Lecture #1
Unit #4
Data Warehousing
May 28, 2010
Outline
0 Data Warehousing
0 Data Warehouse to Data Mining
What is a Data Warehouse?
0 A Data Warehouse is a:
- Subject-oriented
- Integrated
- Nonvolatile
- Time variant
- Collection of data in support of management’s decisions
- From: Building the Data Warehouse by W. H. Inmon, John Wiley and Sons
0 Integration of heterogeneous data sources into a repository
0 Summary reports, aggregate functions, etc.
Example Data Warehouse
Oracle DBMS for Employees
Sybase DBMS for Projects
Informix DBMS for Medical
Data Warehouse: Data correlating Employees with Medical Benefits and Projects
Could be any DBMS; usually based on the relational data model
Users query the Warehouse
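The example warehouse above, which correlates employees with medical benefits and projects, can be mimicked on a very small scale. In this sketch three Python lists stand in for the three source DBMSs, SQLite stands in for the warehouse, and all rows, table names, and column names are invented for illustration. The final query is the kind of summary report with aggregate functions that a warehouse is meant to serve.

```python
import sqlite3

# Hypothetical extracts from the three source systems.
employees = [(1, "John"), (2, "Jane")]                         # (emp_id, name)
projects = [(1, "P1", 100.0), (1, "P2", 50.0), (2, "P1", 70.0)]  # (emp_id, project, cost)
medical = [(1, "Plan A"), (2, "Plan B")]                       # (emp_id, plan)

# The warehouse: one integrated relational table, loaded once from the sources.
wh = sqlite3.connect(":memory:")
wh.execute(
    "CREATE TABLE fact (emp_id INTEGER, name TEXT, plan TEXT, project TEXT, cost REAL)"
)
names = dict(employees)
plans = dict(medical)
wh.executemany(
    "INSERT INTO fact VALUES (?, ?, ?, ?, ?)",
    [(e, names[e], plans[e], proj, cost) for e, proj, cost in projects],
)

# Summary report: number of project assignments and total cost per medical plan.
summary = wh.execute(
    "SELECT plan, COUNT(*), SUM(cost) FROM fact GROUP BY plan ORDER BY plan"
).fetchall()
print(summary)
```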
Some Data Warehousing Technologies
0 Heterogeneous Database Integration
0 Statistical Databases
0 Data Modeling
0 Metadata
0 Access Methods and Indexing
0 Language Interface
0 Database Administration
0 Parallel Database Management
Data Warehouse Design
0 Appropriate Data Model is key to designing the Warehouse
0 Higher Level Model in stages
- Stage 1: Corporate data model
- Stage 2: Enterprise data model
- Stage 3: Warehouse data model
0 Middle-level data model
- A model, possibly one for each subject area in the higher level model
0 Physical data model
- Include features such as keys in the middle-level model
0 Need to determine appropriate levels of granularity of data in order to build a good data warehouse
Distributing the Data Warehouse
0 Issues similar to distributed database systems
Non-distributed Warehouse
Central Bank, Branch A, Branch B
Central Warehouse
Distributed Warehouse
Central Bank, Branch A, Branch B
Central Warehouse, Branch A Warehouse, Branch B Warehouse
Multidimensional Data Model
Project Name
Project Leader
Project Sponsor
Project Cost
Project Duration
Dollars
Pounds
Yen
Years
Months
Weeks
Indexing for Data Warehousing
0 Bit-Maps
0 Multi-level indexing
0 Storing parts or all of the index files in main memory
0 Dynamic indexing
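Of the techniques listed above, bit-map indexing is the easiest to show in a few lines. The sketch below keeps one bitmap per distinct column value, stored as a Python integer, and answers a two-condition query with a bitwise AND. The function names are hypothetical, the column data is borrowed from relation S earlier in the lecture, and real warehouse engines add compression and other refinements that are omitted here.

```python
def build_bitmap_index(values):
    """Map each distinct value to an integer whose bit i is set if row i has it."""
    index = {}
    for row, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row)
    return index


def rows_of(bitmap):
    """Decode a bitmap back into a list of row numbers."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Two columns of a five-row table (CITY and STATUS of relation S).
city = ["London", "Paris", "Paris", "London", "Athens"]
status = [20, 10, 30, 20, 30]

city_idx = build_bitmap_index(city)
status_idx = build_bitmap_index(status)

# Rows where CITY = 'London' AND STATUS = 20: a single bitwise AND.
hits = rows_of(city_idx["London"] & status_idx[20])

# Rows where CITY = 'Paris' OR CITY = 'Athens': a single bitwise OR.
either = rows_of(city_idx["Paris"] | city_idx["Athens"])
print(hits, either)
```

Bitmaps suit warehouse columns with few distinct values, since each predicate combination reduces to cheap bitwise operations.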
Metadata Mappings
Metadata for Data source A
Metadata for Data source B
Metadata for Data source C
Metadata for Mappings and Transformations
Metadata for Mappings and Transformations
Metadata for Mappings and Transformations
Metadata for the Warehouse
Data Mining
Data Mining
Knowledge Mining
Knowledge Discovery in Databases
Data Archaeology
Data Dredging
Database Mining
Knowledge Extraction
Data Pattern Processing
Information Harvesting
Siftware
The process of discovering meaningful new correlations, patterns, and trends by sifting through large amounts of data, often previously unknown, using pattern recognition technologies and statistical and mathematical techniques (Thuraisingham 1998)