chapter 10 data and knowledge management. agenda information processing database data administrator...

67
Chapter 10 Data and Knowledge Management

Upload: violet-daniels

Post on 29-Dec-2015

219 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Chapter 10

Data and Knowledge Management

Page 2: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Agenda

• Information processing

• Database

• Data Administrator

• The DBMS

• Distributing data

• Data warehousing and data mining

Page 3: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Data

• Set of discrete, objective facts about events

• Business - structured records of transactions

• Little relevance or purpose

Page 4: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Information

• Message with sender and receiver

• Meant to change way receiver perceives something

• Have an impact on his judgment / behavior

Page 5: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Data Processing

• Contextualize - why was data gathered?

• Categorize - what are its key components?

• Calculate - analyze mathematically

• Condense - summarize in more concise form

Page 6: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Information Processing

• Compare - in kind and in time

• Consequences - how used in decisions / actions

• Connections - relation to other information

• Conversation - what other people think about this information

Page 7: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Agenda

• Information processing

• Database

• Data Administrator

• The DBMS

• Distributing data

• Data warehousing and data mining

Page 8: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Database

• Element

• Types

• Structure

• Models

• Creation

• Topology

Page 9: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Element

• Bit, byte, field, record, file, database

• Entity, attribute, key field

• Relation

• Class, object

Page 10: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Database Types

• Business database• Geographical information database• Knowledge database / deductive database• Multimedia database• Data warehouse• Data marts• Multimedia and hypermedia database• Object-oriented database

Page 11: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Database Structure

• Data definition language– Schema & subschema

• Data Manipulation language– Structured Query Language (SQL)– Query By Example (QBE)

• Data dictionary

Page 12: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Database Models• Hierarchical

– One to many– TPS or routine MIS

• Network– Many to many– TPS or routine MIS

• Relational– Normalization– Ad hoc reports or DSS

• Object-oriented– E-commerce

Page 13: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Database Creation

• Conceptual design– Logical view– Entity-relationship (ER) diagram– Normalization

Page 14: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Entity Relationship Diagram

• Entity: object or concept• Relationship: meaning association between

objects• Attribute: property of an object

– Simple & Composite

– Single-valued & multi-valued

– Derived

• Key– Primary key

– Foreign key

Page 15: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Normalization

• A technique for identifying a true primary key for a relation

• Types– First normal form: not repeating group– Second normal form: every non-primary-key

attribute is fully functionally dependent on the entire primary key

– Third normal form: no transit dependency

Page 16: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Structured Query Language

• Select

• Join

Page 17: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

SQL DML - SELECT

• SELECT [DISTINCT|ALL] {* | [col-expr [AS newname]][,...]

FROM table-name [alias] [,...] [WHERE condition] [GROUP by colm [, colm] [HAVING condition]] ORDER BY colm [, colm]

Page 18: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

SQL DML - SELECT

• SELECT attributes (or calculations: +, -, /, *)

FROM relation

• SELECT DISTINCT attributes FROM relation

Page 19: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Examples

• SELECT stunameFROM student;

• SELECT stuid, stuname, creditFROM student;

• SELECT stuid, stuname, credit+10FROM student;

• SELECT DISTINCT majorFROM student;

Page 20: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

SQL DML - SELECT

• SELECT attributes (or * wild card) FROM relation WHERE condition

Page 21: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Examples

• SELECT *FROM student;

• SELECT stuname, major, creditFROM student

WHERE stuid = ‘S114’;• SELECT *

FROM facultyWHERE dept = ‘MIS’;

Page 22: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

SELECT - WHERE condition

• AND OR• NOT IN• NOT IN BETWEEN• IS NULL IS NOT NULL• LIKE '%' multiple characters • LIKE ‘_’ single characters

Page 23: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Examples• SELECT *

FROM facultyWHERE dept = ‘MIS’

AND rank = ‘full professor’;• SELECT *

FROM facultyWHERE dept = ‘MIS’ OR

rank = ‘full professor’;• SELECT *

FROM facultyWHERE dept = ‘MIS’

NOT rank = ‘full professor’;

Page 24: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

• SELECT *FROM class

WHERE room LIKE ‘B_S%’;

• SELECT *FROM class

WHERE room NOT LIKE ‘BUS%’;

• SELECT productid, productnameFROM inventory

WHERE onhand BETWEEN 50 and 100;

Page 25: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

• SELECT companyid, companynameFROM companyWHERE companyname BETWEEN

‘G’ AND ‘K’;

• SELECT productid, productnameFROM inventoryWHERE onhand NOT BETWEEN

50 and 100;

• SELECT companyid, companynameFROM companyWHERE companyname NOT

BETWEEN ‘G’ AND ‘K’;

Page 26: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

• SELECT facnameFROM facultyWHERE dept IN (‘MIS’, ‘ACT’);

• SELECT facnameFROM facultyWHERE rank NOT IN (‘assistant’, ‘lecture’);

• SELECT customernameFROM customerWHERE emailadd IS NOT NULL;

Page 27: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

• SELECT customernameFROM customer

WHERE creditlimit IS NULL;

Page 28: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

SELECT - aggregate functions

• COUNT (*)• COUNT• SUM• AVG• MIN• MAX

Page 29: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Examples

• SELECT COUNT(*)FROM student;

• SELECT COUNT(major)FROM student;

• SELECT COUNT(DISTINCT major)FROM student;

Page 30: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

• SELECT COUNT(stuid), SUM(credit), AVG(credit), MAX(credit),

MIN(credit)FROM student;

Page 31: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

SELECT - GROUP

• GROUP BY• HAVING

Page 32: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Examples

• SELECT major, AVG(credit)FROM studentGROUP BY major

HAVING COUNT(*) > 2;• SELECT course#, COUNT(stuid)

FROM enrollmentGROUP BY course#

HAVING COUNT(*) > 2;

Page 33: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

• SELECT major, AVG(credit)FROM student

WHERE major IN (‘MIS’, ‘ACT’) GROUP BY major

HAVING COUNT(*) > 2;

Page 34: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

SELECT - ORDER BY

• ORDER BY• ORDER BY ... DESC

Page 35: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Examples

• SELECT facname, rank FROM faculty

ORDER BY facname;

• SELECT facname, rank FROM faculty

ORDER BY rank DESC, facname;

Page 36: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

SELECT - JOIN Tables

• Multiple tables in FROM clause• MUST have join conditions!!!

Page 37: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Examples

• SELECT stuname, gradeFROM student, enrollmentWHERE student.stuid =

enrollment.stuid;

Page 38: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

• SELECT enrollment.course#, stuname, major

FROM class, enrollment, studentWHERE class.course# =

enrollment.course#AND

enrollment.stuid = student.stuid

AND facid = ‘F114’ ORDER BY enrollment.course#;

Page 39: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

SUBQUERY, EXIST, NOT EXIST

• SELECT s.stuname, majorFROM student s

WHERE EXIST(SELECT *

FROM enrollment e WHERE

s.stuid = e.stuid);

Page 40: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

• SELECT s.stuname, majorFROM student s

WHERE NOT EXIST(SELECT *

FROM enrollment e WHERE

s.stuid = e.stuid);

Page 41: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Database Creation• Physical design

– Physical view– Data topology (organization)

• Centralized• Distributed database

– Replicated database– Partitioned

• Organization & access method– Sequential file– Indexed sequential file– Direct or random file

• Security– Logical, physical, and transmitting

Page 42: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Selection Criteria

• User’ needs (type of application)• Compatibility• Portability• Reliability• Cost• Feature• Performance• Vendor’s support• Others?

Page 43: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Agenda

• Information processing

• Database

• Data Administrator

• The DBMS

• Distributing data

• Data warehousing and data mining

Page 44: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Data Administrator

• Clean up data definitions

• Control shared data

• Manage distributed data

• Maintain data quality

Page 45: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Clean Up Definitions

• Synonyms / aliases

• Standard data definitions – Names and formats

• Data Dictionary– Active– Integrated

Page 46: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Control Shared Data

• Local - used by one unit

• Shared - used by two or more activities

• Impact of proposed program changes on shared data

• Program-to-data element matrix

Page 47: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Manage Distributed Data

• Geographically dispersed– Whether shared data or not

• Different levels of detail– Different management levels

Page 48: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Maintain Data Quality

• Put owners in charge of data– Verify data accuracy and quality

• Purge old data

Page 49: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Agenda

• Information processing

• Database

• Data Administrator

• The DBMS

• Distributing data

• Data warehousing and data mining

Page 50: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

The DBMS

Data Base Management System: software that permits a firm to:– Centralize data– Manage them efficiently– Provide access to applications

• Such as payroll, inventory

Page 51: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

DBMS Components

• Data Definition Language (DDL)

• Data Manipulation Language (DML)

• Inquiry Language (IQL)

• Teleprocessing Interface (TP)

Page 52: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Definitions

• Views:– Physical - how stored– Logical - how viewed and used by users

• Schema - Overall logical layout of records and fields in a database

• Subschema: Individual user’s logical portion of database (view)

Page 53: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Agenda

• Information processing

• Database

• Data Administrator

• The DBMS

• Distributing data

• Data warehousing and data mining

Page 54: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Distributing Data

• Centralized files

• Fragemented files– Distribute data without duplication– Users unaware of where data located

Page 55: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Distributing Data

• Replicated files– Data duplicated – One site has master file– Problem with data synchronization

• Decentralized files– Local data autonomy

Page 56: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Distributing Data

• Distributed files– Client / server systems– Stored centrally– Portion downloaded to workstation– Workstation can change data– Changes uploaded to central computer

Page 57: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Agenda

• Information processing

• Database

• Data Administrator

• The DBMS

• Distributing data

• Data warehousing and data mining

Page 58: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Data Warehousing

• Collect large amounts of data from multiple sources over several years

• Classify each record into multiple categories– Age– Location– Gender

Page 59: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Data Warehousing

• Rapidly select and retrieve by multiple dimensions– All females in Chicago under 25 years of

age

• Provide tailored, on-demand reports

• Data mart: a replicated subset of the data warehouse– A functional or regional area

Page 60: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Data Mining

• Fitting models to, or determining patterns from, warehoused data

• Purposes:– Analyze large amount of data– Find critical points of knowledge– Perform automatic analyses

Page 61: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Data Mining Terms

• Data Visualization

• Drill-down Analysis– Hierarchical structure– Leads to increasing level of detail

• Expert System (ES) methodology– e.g., neural networks

Page 62: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Applications

• Finance - fraud detection

• Stock Market - forecasting

• Real estate - property evaluation

• Airlines - customer retention

• Retail - customer targeting

Page 63: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Data Mining Example

• What type customers are buying specific products?

• When are the times customers will most likely shop?

• What types of products can be sold together?

Page 64: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Points to Remember

• Information processing

• Database

• Data Administrator

• The DBMS

• Distributing data

• Data warehousing and data mining

Page 65: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Discussion Questions

• How can a database help an organization?

• Why normalization is very important for building a database?

• Do you see any problem of the database in your organization?

Page 66: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Discussion Questions• What kind of database model is most suitable for

– School?– Department store?– Police?

• Some organizations are hesitant to distribute data. These organizations feel that they may lose control.– Do they lose control? Why?– Could you suggest a “good” tactic?

• Could Data Mining pose a threat to individual privacy?– Why or why not?– If so, how can we mitigate that threat?– Do the advantages outweigh the disadvantages?

Page 67: Chapter 10 Data and Knowledge Management. Agenda Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data

Assignment

• Review chapters 10

• Read chapter 8, 9, and 11

• Group assignment

• Research paper