chapter 6: foundations of business intelligence - databases and information management dr. andrew p....

Post on 29-Dec-2015

219 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Chapter 6:Foundations of Business

Intelligence - Databases and Information Management

Dr. Andrew P. Ciganek, Ph.D.

File Organization Concepts

• Computer system uses hierarchies– Database: Group of related files

– File: Group of records of same type

– Record: Group of related fields• Record: Describes an entity (person, place, thing)

– Field: Group of characters• Attribute: Characteristic describing the entity

– e.g., Date or Grade belong to entity COURSE

The Data Hierarchy

Problems With Traditional File Environment

• Files maintained by different departments– Data redundancy and inconsistency

• Data redundancy: Duplicate data in multiple files• Data inconsistency: Same attribute, different values

– Program-data dependence• Changes in program requires changes to data

accessed by program

Traditional File Processing

The Database Approach to Data Management

• Database– Data organized to serve many applications by

centralizing data and controlling redundant data• Database management system (DBMS)

– Separates logical and physical views of data– Solves problems of traditional file environment

• Controls redundancy• Eliminated inconsistency• Enables central management and security

Human Resources Database with Multiple Views

Relational DBMS

• Data as 2-dimension tables called relations or files• Each table contains data on entity and attributes• Table: Grid of columns and rows

– Rows: Records for different entities

– Columns: Represents attribute (field) for entity

– Key field: Field used to uniquely identify each record• Primary key: Field in table used for key fields• Foreign key: Primary key used in second table as

look-up field to identify records from original table

Relational Database Tables

Relational Database Tables

Operations of a Relational DBMS

• Basic operations to develop useful sets of data– SELECT: Creates subset of data of all records that

meet stated criteria

– JOIN: Combines relational tables to provide more information than available in individual tables

Basic Relational DBMS Operations

Select Part_Number = 137 or 150, Join by Supplier_Number

Example of an SQL Query

• Select Statement: Query data for specific info• Conditional Selection: ID which rows of a table are

displayed, based on criteria contained in the WHERE clause

• Joining Two Tables: Used to combine data from two or more tables and display the results

An Access Query

Designing Databases

• Design process identifies– Relationships among data elements, redundant

database elements

– Most efficient way to group data elements to meet business requirements, needs of app programs

• Normalization– Minimize redundant data elements

Normalization of Order

Using Databases to Improve Performance and Decision Making

• For very large databases and systems, special capabilities and tools are required for analyzing large quantities of data and for accessing data from multiple systems– Data warehousing

– Data mining

Database Warehouses

• Store current and historical data from many core operational transaction systems

• Consolidates and standardizes information for use across enterprise, but data cannot be altered

• Provide query, analysis, and reporting tools

Components of a Data Warehouse

Business Intelligence

• Tools for consolidating, analyzing, and providing access to vast amounts of data to help users make better business decisions– e.g., Harrah’s Entertainment analyzes customers to

develop gambling profiles and identify most profitable customers

• Principle tools include– Software for database query and reporting

– Online analytical processing (OLAP)

– Data mining

Online Analytical Processing (OLAP)

• Supports multidimensional data analysis– Gives first glimpse of possible relationships

• Enables viewing data using multiple dimensions– Each aspect of information (product, pricing, cost,

region, time period) is different dimension• e.g., How many washers sold in East in June?

• OLAP enables rapid, online answers to ad hoc queries

Multidimensional Data Model

Data Mining

• More discovery driven than OLAP– Finds hidden patterns, relationships in large dbs

– Infers rules to predict future behavior

– The patterns and rules are used to guide decision making and forecast the effect of those decisions

• Popularly used to provide detailed analyses of patterns in customer data for 1:1 marketing campaigns or to identify profitable customers

Using Databases to Improve Performance and Decision Making

• Predictive analysis– Uses data mining techniques, historical data, and

assumptions about future conditions to predict outcomes of events

• e.g., Probability a customer will respond to an offer or purchase a specific product

• Data mining seen as challenge to individual privacy– Combines information from many diverse sources to

create detailed “data image” about each of us • e.g., income, driving habits, hobbies, families, and

political interests

Text Mining:For’ and ‘Against’ Exercise

1. Read the article and the following statement.

2. Summarize the best evidence you can give FOR, or in support of, the statement.

3. Summarize the best evidence you can give AGAINST the statement.

4. Include only accurate evidence

• The benefits of text mining greatly outweigh the costs.

Web Mining

• Discovery and analysis of useful patterns and information from WWW– e.g., to understand customer behavior, evaluate

effectiveness of Web site, etc.

• Web content mining– Knowledge extracted from content of Web pages

• Web structure mining– e.g., links to and from Web page

• Web usage mining– User interaction data recorded by Web server

Managing Data Resources

• Establishing an information policy– Information policy: Specifies firm’s rules, procedures,

roles for sharing, standardizing data

– Data administration: Responsible for specific policies and procedures; data governance

– Database administration: Database design and management group responsible for defining, organizing, implementing, maintaining database

Ensuring Data Quality

• More than 25% critical data in Fortune 1000 company databases is inaccurate or incomplete– Before new database in place, need to identify and

correct faulty data and establish better routines for editing data once database in operation

– Most data quality problems stem from faulty input

Managing Data Resources

• Data quality audit– Structured survey of the accuracy and level of

completeness of the data in an IS

• Data cleansing– Detecting, and correcting data that are incorrect,

incomplete, improperly formatted, or redundant

– Enforces consistency among different sets of data from separate IS

top related