BUS 516 - Dr. Ummaha Hazra @NSU

BUS 516 Foundations of Business Intelligence: Databases and Information Management

Post on 13-Oct-2020

Page 1: BUS 516 - Dr. Ummaha Hazra @NSU · personnel, payroll, and benefits, the corporation could ... database capabilities at a lower price than in-house database products. Database and

BUS 516

Foundations of Business Intelligence: Databases and Information Management

Data and Information Quality

• Why do we need accurate and timely information?

– Accurate information is free of errors.

– Information is timely when it is available to decision makers when it is needed.

– Information is relevant when it is useful and appropriate for the types of work and decisions that require it

Data Organization

• A bit represents the smallest unit of data a computer can handle

• A group of bits, called a byte, represents a single character, which can be a letter, a number, or another symbol

• A grouping of characters into a word, a group of words, or a complete number (such as a person’s name or age) is called a field

• A group of related fields, such as the student’s name, the course taken, the date, and the grade, comprises a record

• A group of records of the same type is called a file

• A group of related files makes up a database
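
The hierarchy above (bit, byte, field, record, file, database) can be sketched in a few lines of Python. This is only an illustration: the student names, course values, and file names are hypothetical and do not come from the slides.

```python
# Hypothetical student-course example; all names and values are illustrative.

# A byte is a group of 8 bits and can represent one character.
char = "A"
byte_value = char.encode("ascii")[0]   # numeric value of the byte
bits = format(byte_value, "08b")       # the same byte written as 8 bits

# A field is a group of characters, such as a name or a grade.
# A record is a group of related fields.
record = {"student_name": "John Doe", "course": "BUS 516",
          "date": "2024-01-15", "grade": "A"}

# A file is a group of records of the same type.
course_file = [
    record,
    {"student_name": "Jane Roe", "course": "BUS 516",
     "date": "2024-01-15", "grade": "B"},
]

# A database is a group of related files.
database = {"course": course_file, "financial": [], "personal": []}

print(bits, len(course_file), sorted(database))
```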

Traditional File Management

• What are the problems with traditional file management?

– Data redundancy

– Inconsistency

– Program data dependence

– Lack of flexibility

– Poor security

– Lack of data sharing and availability
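
A small sketch may make the first two problems concrete. It assumes two hypothetical departmental files (payroll and benefits) that each keep their own copy of an employee's address; the function name `find_inconsistencies` and all values are invented for illustration.

```python
# Two hypothetical departmental files, each keeping its own copy of employee data.
# The duplicated fields are the redundancy; the differing address is the inconsistency.
payroll_file = {101: {"name": "A. Rahman", "address": "12 Lake Road"}}
benefits_file = {101: {"name": "A. Rahman", "address": "45 Hill Street"}}

def find_inconsistencies(file_a, file_b):
    """Return (employee_id, field, value_a, value_b) for shared fields that disagree."""
    problems = []
    for emp_id in sorted(file_a.keys() & file_b.keys()):
        shared_fields = file_a[emp_id].keys() & file_b[emp_id].keys()
        for field in sorted(shared_fields):
            a, b = file_a[emp_id][field], file_b[emp_id][field]
            if a != b:
                problems.append((emp_id, field, a, b))
    return problems

print(find_inconsistencies(payroll_file, benefits_file))
```

Because each file is updated by a different program, nothing forces the two addresses to stay in step, which is the situation the database approach is meant to avoid.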

Database Approach

• A database is a collection of data organized to serve many applications efficiently by centralizing the data and controlling redundant data.

• Rather than storing data in separate files for each application, data are stored so as to appear to users as being stored in only one location

• Instead of a corporation storing employee data in separate information systems and separate files for personnel, payroll, and benefits, the corporation could create a single common human resources database

Database Management Systems (DBMS)

• A DBMS is software that permits an organization to centralize data, manage them efficiently, and provide access to the stored data by application programs

• The DBMS acts as an interface between application programs and the physical data files

• When the application program calls for a data item, the DBMS finds this item in the database and presents it to the application program.

• Using traditional data files, the programmer would have to specify the size and format of each data element used in the program and then tell the computer where they were located
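
As a minimal sketch of the DBMS acting as an interface, the example below uses SQLite through Python's standard `sqlite3` module. SQLite is only a stand-in chosen for convenience; the `employee` table, its columns, and the helper `get_salary` are hypothetical.

```python
import sqlite3

# In-memory SQLite database used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary REAL)"
)
conn.execute("INSERT INTO employee VALUES (?, ?, ?)", (101, "A. Rahman", 52000.0))
conn.commit()

def get_salary(connection, emp_id):
    """The application names the data item it wants; the DBMS locates it."""
    row = connection.execute(
        "SELECT salary FROM employee WHERE emp_id = ?", (emp_id,)
    ).fetchone()
    return None if row is None else row[0]

print(get_salary(conn, 101))
```

Note that the application code never states where the salary is physically stored or how many bytes it occupies; it only names the table and column.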

DBMS Solves the Problems of the Traditional File Environment

• Reduces data redundancy and inconsistency by minimizing isolated files in which the same data are repeated

• Even if the organization maintains some redundant data, using a DBMS eliminates data inconsistency because the DBMS can help the organization ensure that every occurrence of redundant data has the same values

• DBMS uncouples programs and data, enabling data to stand on their own

• Access and availability of information will be increased and program development and maintenance costs reduced because users and programmers can perform ad hoc queries of data in the database

• DBMS enables the organization to centrally manage data, their use, and security

Relational DBMS

• Relational databases represent data as two-dimensional tables (called relations). Tables may be referred to as files. Each table contains data on an entity and its attributes

• Microsoft Access is a relational DBMS for desktop systems, whereas DB2, Oracle Database, and Microsoft SQL Server are relational DBMS for large mainframes and midrange computers. MySQL is an open-source DBMS, and Oracle Database Lite is a DBMS for small handheld computing devices
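
To illustrate tables that each describe one entity and its attributes, here is a small sketch using SQLite via Python's `sqlite3` module. The `supplier` and `part` tables and their rows are hypothetical; the slides do not specify a schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One table per entity; each column is an attribute of that entity.
    CREATE TABLE supplier (
        supplier_id   INTEGER PRIMARY KEY,
        supplier_name TEXT
    );
    CREATE TABLE part (
        part_id     INTEGER PRIMARY KEY,
        part_name   TEXT,
        supplier_id INTEGER REFERENCES supplier(supplier_id)
    );
    INSERT INTO supplier VALUES (1, 'Acme Supply'), (2, 'Delta Parts');
    INSERT INTO part VALUES
        (10, 'Door latch', 1), (11, 'Side mirror', 2), (12, 'Door seal', 1);
""")

# Relate the two tables through the shared supplier_id column.
rows = conn.execute("""
    SELECT part.part_name, supplier.supplier_name
    FROM part JOIN supplier ON part.supplier_id = supplier.supplier_id
    ORDER BY part.part_id
""").fetchall()
print(rows)
```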

NoSQL Non-Relational Databases

• Non-relational database management systems use a more flexible data model and are designed for managing large data sets across many distributed machines and for easily scaling up or down.

• They are useful for accelerating simple queries against large volumes of structured and unstructured data, including Web, social media, graphics, and other forms of data that are difficult to analyze with traditional SQL-based tools

• Examples include Oracle NoSQL and Amazon’s SimpleDB
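
The "more flexible data model" can be pictured with a toy key-value store in which every value is a JSON document and documents need not share the same fields. This is a sketch in plain Python, not the API of Oracle NoSQL or SimpleDB; the `put` and `get` helpers and the sample documents are hypothetical.

```python
import json

# A toy key-value/document store: each value is a JSON document, and
# documents in the same store need not share the same fields.
store = {}

def put(key, document):
    """Serialize and store a document under a key."""
    store[key] = json.dumps(document)

def get(key):
    """Fetch a document by key (a simple query), or None if absent."""
    raw = store.get(key)
    return None if raw is None else json.loads(raw)

put("post:1", {"type": "tweet", "text": "Great service!", "likes": 12})
put("post:2", {"type": "photo", "url": "https://example.com/p.jpg",
               "tags": ["store", "sale"]})

print(get("post:1")["likes"], get("post:2")["tags"])
```

No table definition had to change to store the second document, which has different fields from the first.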

Databases in the Cloud

• Amazon and other cloud computing vendors provide relational database services as well as NoSQL non-relational database services.

• Cloud-based data management services have special appeal for Web-focused start-ups or small to medium-sized businesses seeking database capabilities at a lower price than in-house database products.

Database and Business Performance

• Businesses use their databases to keep track of basic transactions

• Databases provide information that will help the company run the business more efficiently, and help managers and employees make better decisions

• If a company wants to know which product is the most popular or who is its most profitable customer, the answer lies in the data

Big Data

• Most data collected by organizations used to be transaction data that could easily fit into rows and columns of relational database management systems.

• Recently, there has been an explosion of data from Web traffic, e-mail messages, and social media content (tweets, status messages), as well as machine-generated data from sensors (used in smart meters, manufacturing sensors, and electrical meters) or from electronic trading systems.

• These data may be unstructured or semi-structured and thus not suitable for relational database products that organize data in the form of columns and rows.

• We now use the term big data to describe these datasets with volumes so huge that they are beyond the ability of typical DBMS to capture, store, and analyze.

• Big data doesn’t refer to any specific quantity, but usually refers to data in the petabyte and exabyte range—in other words, billions to trillions of records, all from different sources.

Business Value of Big Data

• Businesses are interested in big data because they can reveal more patterns and interesting anomalies than smaller data sets, with the potential to provide new insights.

• However, to derive business value from these data, organizations need new technologies and tools capable of managing and analyzing non-traditional data along with their traditional enterprise data.

Business Intelligence Infrastructure: Data Warehouse & Data Mart

• A data warehouse is a database that stores current and historical data of potential interest to decision makers throughout the company

• The data warehouse makes the data available for anyone to access as needed, but it cannot be altered.

• Many firms use intranet portals to make the data warehouse information widely available throughout the firm

• A data mart is a subset of a data warehouse in which a summarized or highly focused portion of the organization’s data is placed in a separate database for a specific population of users

• A company might develop marketing and sales data marts to deal with customer information

Business Intelligence Infrastructure: Hadoop

• Relational DBMS and data warehouse products are not well-suited for organizing and analyzing big data or data that do not easily fit into columns and rows used in their data models.

• For handling unstructured and semi-structured data in vast quantities, as well as structured data, organizations are using Hadoop.

• Hadoop is an open source software framework managed by the Apache Software Foundation that enables distributed parallel processing of huge amounts of data across inexpensive computers.

• It breaks a big data problem down into sub-problems, distributes them among up to thousands of inexpensive computer processing nodes, and then combines the result into a smaller data set that is easier to analyze.

• You’ve probably used Hadoop to find the best airfare on the Internet, get directions to a restaurant, do a search on Google, or connect with a friend on Facebook.
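
The "break down, distribute, combine" idea described above is commonly expressed as map and reduce steps. The sketch below imitates that pattern in plain Python on a single machine with a word count; it does not use Hadoop or its API, and the function names and sample text are hypothetical.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit (word, 1) pairs for one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(mapped_chunks):
    """Group all emitted values by key across chunks."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values for each key into one result."""
    return {key: sum(values) for key, values in groups.items()}

# Each chunk is a sub-problem; on a cluster, each could go to a different node.
chunks = ["big data big insight", "data beats opinion", "big data"]
mapped = [map_phase(chunk) for chunk in chunks]
word_counts = reduce_phase(shuffle(mapped))
print(word_counts)
```

The final dictionary is the "smaller data set that is easier to analyze": five word totals instead of the raw text.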

Business Intelligence Infrastructure: In-Memory Computing

• Another way of facilitating big data analysis is to use in-memory computing, which relies primarily on a computer’s main memory (RAM) for data storage. (Conventional DBMS use disk storage systems.)

• In-memory processing makes it possible for very large sets of data, amounting to the size of a data mart or small data warehouse, to reside entirely in memory.

• Complex business calculations that used to take hours or days are able to be completed within seconds, and this can even be accomplished on handheld devices.

• Powerful high-speed processors, multicore processing, and falling computer memory prices make in-memory computing possible.

Analytical Tools

• Online Analytical Processing (OLAP)

– OLAP supports multidimensional data analysis, enabling users to view the same data in different ways using multiple dimensions

– Each aspect of information (product, pricing, cost, region, or time period) represents a different dimension.

– A product manager could use a multidimensional data analysis tool to learn how many washers were sold in the East in June, how that compares with the previous month and the previous June, and how it compares with the sales forecast
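
The washer question above can be sketched as slicing a small set of sales facts along the product, region, and month dimensions. The figures and the helper `total_units` are hypothetical, and a real OLAP tool would precompute and store such aggregates rather than scan a list.

```python
# Hypothetical sales facts; each row is one cell in a product x region x month cube.
sales = [
    {"product": "washer", "region": "East", "month": "2024-05", "units": 120},
    {"product": "washer", "region": "East", "month": "2024-06", "units": 150},
    {"product": "washer", "region": "West", "month": "2024-06", "units": 90},
    {"product": "dryer",  "region": "East", "month": "2024-06", "units": 70},
    {"product": "washer", "region": "East", "month": "2023-06", "units": 130},
]

def total_units(facts, **dimensions):
    """Sum units over all facts matching the given dimension values."""
    return sum(
        fact["units"] for fact in facts
        if all(fact[dim] == value for dim, value in dimensions.items())
    )

june = total_units(sales, product="washer", region="East", month="2024-06")
may = total_units(sales, product="washer", region="East", month="2024-05")
last_june = total_units(sales, product="washer", region="East", month="2023-06")

# Washers sold in the East in June, change vs. previous month, change vs. previous June.
print(june, june - may, june - last_june)
```

Dropping a keyword argument widens the view: `total_units(sales, month="2024-06")` totals all products and regions for that month.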

Tools for Business Intelligence

• Data mining provides insights into corporate data by finding hidden patterns and relationships in large databases and inferring rules from them to predict future behavior

• The patterns and rules are used to guide decision making and forecast the effect of those decisions.

• The types of information obtainable from data mining include associations, sequences, classifications, clusters, and forecasts

• Predictive analytics use data mining techniques, historical data, and assumptions about future conditions to predict outcomes of events, such as the probability a customer will respond to an offer or purchase a specific product
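
Of the information types listed above, associations are the easiest to show in a few lines. The sketch below counts how often items appear together in hypothetical market baskets and derives a simple rule confidence; the baskets and the `confidence` helper are invented for illustration, and production data mining tools use far more scalable algorithms.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets (one set of items per purchase).
baskets = [
    {"chips", "cola"},
    {"chips", "cola", "salsa"},
    {"chips", "salsa"},
    {"cola", "bread"},
]

# How many baskets contain each item, and each unordered pair of items.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)

def confidence(antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]

print(confidence("chips", "cola"))
```

A high confidence for a rule such as "salsa implies chips" is the kind of pattern that might guide a promotion decision.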

Text mining & Web Mining

• Text mining tools are now available to help businesses analyze textual data. These tools are able to extract key elements from large unstructured data sets, discover patterns and relationships, and summarize the information

• The discovery and analysis of useful patterns and information from the World Wide Web is called Web mining

– Businesses might turn to Web mining to help them understand customer behavior, evaluate the effectiveness of a particular Web site, or quantify the success of a marketing campaign

• Web mining looks for patterns in data through content mining, structure mining, and usage mining

Web Mining

• Content mining:

– Web content mining is the process of extracting knowledge from the content of Web pages, which may include text, image, audio, and video data

• Structure mining:

– Web structure mining extracts useful information from the links embedded in Web documents.

– For example, links pointing to a document indicate the popularity of the document, while links coming out of a document indicate the richness or perhaps the variety of topics covered in the document
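
The link-counting idea in the example above can be sketched directly: count links pointing to each page (a popularity signal) and links leaving each page (a variety signal). The page names and link graph are hypothetical.

```python
from collections import Counter

# Hypothetical link graph: page -> pages it links to.
links = {
    "home.html": ["products.html", "about.html"],
    "about.html": ["home.html"],
    "products.html": ["home.html", "washer.html", "dryer.html"],
    "washer.html": ["products.html"],
    "dryer.html": ["products.html"],
}

# Links coming out of each document.
out_links = {page: len(targets) for page, targets in links.items()}

# Links pointing to each document.
in_links = Counter(target for targets in links.values() for target in targets)

most_linked = max(in_links, key=in_links.get)
print(most_linked, in_links[most_linked], out_links["products.html"])
```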

Web Mining

• Usage mining:

– Web usage mining examines user interaction data recorded by a Web server whenever requests for a Web site’s resources are received.

– The usage data records the user’s behavior when the user browses or makes transactions on the Web site and collects the data in a server log.

– Analyzing such data can help companies determine the value of particular customers, cross marketing strategies across products, and the effectiveness of promotional campaigns.
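
A minimal sketch of usage mining is to read server log entries and tally page views and purchasing visitors. The log format here is deliberately simplified and hypothetical (visitor ID, path, status code); real Web server logs contain more fields and need a proper parser.

```python
from collections import Counter

# Hypothetical, simplified log lines: "<visitor_id> <path> <status>".
log_lines = [
    "u1 /home 200",
    "u1 /products/washer 200",
    "u2 /home 200",
    "u1 /checkout 200",
    "u2 /products/washer 200",
    "u3 /missing 404",
]

page_views = Counter()
visitors_who_bought = set()
for line in log_lines:
    visitor, path, status = line.split()
    if status != "200":
        continue                     # ignore failed requests
    page_views[path] += 1
    if path == "/checkout":
        visitors_who_bought.add(visitor)

print(dict(page_views), visitors_who_bought)
```

Comparing visitors who viewed a product page with those who reached checkout is one simple way to gauge the effectiveness of a campaign.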

Information Policy and Data Governance

• A firm’s data are an important resource

• An information policy specifies the organization’s rules for sharing, disseminating, acquiring, standardizing, classifying, and inventorying information

• Data governance deals with the policies and processes for managing the availability, usability, integrity, and security of the data employed in an enterprise

• Data governance emphasizes promoting privacy, security, data quality, and compliance with government regulations

Ensuring Data Quality

• Data that are inaccurate, untimely, or inconsistent with other sources of information lead to incorrect decisions, product recalls, and financial losses.

• Inaccurate data in criminal justice and national security databases might even subject you to unnecessary surveillance or detention

• Analysis of data quality often begins with a data quality audit, which is a structured survey of the accuracy and level of completeness of the data in an information system

• Data cleansing, also known as data scrubbing, consists of activities for detecting and correcting data in a database that are incorrect, incomplete, improperly formatted, or redundant
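
The cleansing activities named above can be sketched on a handful of records. The example assumes hypothetical customer records and a hypothetical `cleanse` function: it corrects formatting, drops redundant records, and flags incomplete or malformed values for follow-up rather than guessing at a correction.

```python
# Hypothetical customer records with typical quality problems.
raw_records = [
    {"id": 1, "name": "  alice khan ", "email": "ALICE@EXAMPLE.COM"},   # bad formatting
    {"id": 2, "name": "Bob Roy", "email": ""},                          # incomplete
    {"id": 1, "name": "Alice Khan", "email": "alice@example.com"},      # redundant
    {"id": 3, "name": "Carol Das", "email": "carol-at-example.com"},    # incorrect
]

def cleanse(records):
    """Return (clean_records, issues) after normalizing and de-duplicating."""
    clean, issues, seen_ids = [], [], set()
    for rec in records:
        if rec["id"] in seen_ids:
            issues.append((rec["id"], "duplicate"))
            continue                                  # drop the redundant copy
        seen_ids.add(rec["id"])
        name = " ".join(rec["name"].split()).title()  # fix spacing and capitalization
        email = rec["email"].strip().lower()
        if not email:
            issues.append((rec["id"], "missing email"))
        elif "@" not in email:
            issues.append((rec["id"], "malformed email"))
        clean.append({"id": rec["id"], "name": name, "email": email})
    return clean, issues

clean_records, issues = cleanse(raw_records)
print(clean_records[0], issues)
```

The `issues` list doubles as a very small data quality audit: it records which records were inaccurate or incomplete and why.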