Corporate Affairs and Marketing (CA&M) Brand and Event Management 1 STRATEGIC INFORMATION SYSTEMS IV STV401T / B BTIP05 / BTIX05 - BTECH DEPARTMENT OF INFORMATICS LECTURE: 03 (A) DATA MINING By: Dr. Tendani J. Lavhengwa [email protected]


Post on 24-Jul-2020


Page 1: STRATEGIC INFORMATION SYSTEMS IV STV401T / B BTIP05 / …lavhengwaonline.weebly.com/uploads/8/4/7/8/84787622/dr... · 2018-02-17 · Corporate Affairs and Marketing (CA&M) Brand and

Corporate Affairs and Marketing (CA&M) Brand and Event Management


STRATEGIC INFORMATION SYSTEMS IV

STV401T / B

BTIP05 / BTIX05 - BTECH

DEPARTMENT OF INFORMATICS

LECTURE: 03 – (A)

DATA MINING

By: Dr. Tendani J. Lavhengwa

[email protected]


Inspirational Quotes

• My personal quote:

“Always be a thought ahead. Do not fear the blank page, everything started somewhere”

• Quotes to consider as inspiration:

"We're entering a new world in which data may be more important than software" ~ Tim O'Reilly

• Your quotes?

???

LECTURE: 05(A) - DATA MINING (DM) / KNOWLEDGE DISCOVERY FROM DATA (KDD)


Start-up items to discuss

1. Literature in context and evolution

2. Data Mining - multiple definitions

3. Key terms in the definition

4. Goal of DM and Strategic Competitive Advantage

5. Blending DM with confluence from multiple disciplines

6. Data in Data Mining

7. How Data Mining works

8. Data Mining applications and derived value

9. Databases vs Data Mining processing and query examples


1. Literature in context and evolution


In 1999, Dr Arno Penzias (Nobel laureate and former chief scientist, Bell Labs) said:

• “Data mining will become much more important and companies will throw away nothing about customers because it will be so valuable. If you’re not doing this, you’re out of business”

In 2006, Thomas Davenport (Harvard Business Review) argued:

• “the latest strategic weapon for companies is analytical decision making”


2. Data Mining - multiple definitions

Def. 1:

• Discovering or ‘mining’ knowledge from large amounts of data (Turban, 2011)

Def. 2:

• Knowledge discovery in databases: extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases

Def. 3:

• Advanced methods for exploring and modelling relationships in large amounts of data

Def. 4:

• The non-trivial extraction of implicit, previously unknown, and potentially useful information from large databases

Def. 5:

• The use of automated data analysis techniques to uncover relationships among data items

Def. 6:

• Finding hidden information in a database


3. Key terms in the definition


Data:

• a set of facts (items), usually stored in a database

Attribute:

• a field in an item

Pattern:

• an expression that describes a subset of facts

Interestingness:

• a function that maps an expression into a measure space
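The four terms can be made concrete with a small Python sketch. The customer rows, the `under_40_buys_insurance` pattern and the `coverage` measure are hypothetical and chosen only for illustration. Coverage is one possible interestingness function among many.

```python
from typing import Any, Callable, Dict, List

# One fact (item). Each dictionary key is an attribute (a field in the item).
Item = Dict[str, Any]

# Data: a set of facts, as they might be stored in a database table.
# These rows are invented for illustration.
data: List[Item] = [
    {"customer": "A", "age": 25, "bought_insurance": True},
    {"customer": "B", "age": 31, "bought_insurance": True},
    {"customer": "C", "age": 47, "bought_insurance": False},
    {"customer": "D", "age": 52, "bought_insurance": False},
]


# Pattern: an expression that describes a subset of the facts.
def under_40_buys_insurance(item: Item) -> bool:
    return item["age"] < 40 and item["bought_insurance"]


# Interestingness: a function that maps a pattern to a measure.
# Here the measure is the fraction of facts the pattern describes.
def coverage(pattern: Callable[[Item], bool], facts: List[Item]) -> float:
    matching = [fact for fact in facts if pattern(fact)]
    return len(matching) / len(facts)


score = coverage(under_40_buys_insurance, data)
print(f"Pattern covers {score:.0%} of the facts")
```

A different measure (for example, how surprising the pattern is) could replace `coverage` without changing the data or the pattern.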


4. Goal of DM and Strategic Competitive Advantage


Goal of Data Mining

Decision Making Chain: Pre-Processing → Data Mining → [info] → Decision Making


5. Blending DM with confluence from multiple disciplines


Data Mining is a confluence of multiple disciplines:

• Database Technology

• Statistics

• Machine Learning

• Visualisation

• Information Science

• Other disciplines…


6. Data in Data Mining


7. How Data Mining works


• DM builds models to identify patterns among the attributes presented in the dataset

• DM seeks to identify four major types of patterns:

– Associations

– Predictions

– Clusters

– Sequential patterns / relationships

Note: discussed later in: Mining Algorithms and technologies


Major types of patterns, with examples

Association:

• establishing relationships among items that occur together

• example: beer and diapers going together in a market-basket analysis

Prediction / forecasting:

• the act of telling about a future occurrence

• example: predicting the winner of a soccer match or the temperature for a particular day

Clustering:

• finding groups of entities with similar characteristics

• example: assigning customers to groups based on income bracket and past purchase behaviour

Sequence relationships / discovery:

• finding time-based associations

• discovering time-ordered events

• example: predicting that a customer who opens a vehicle finance account will subsequently require an insurance policy (account), and later a minor scratches-and-dents cover
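The beer-and-diapers association can be sketched in Python with two commonly used measures, support and confidence. This is a minimal illustration, not a full association-rule miner. The baskets are invented, and the helper names `support` and `confidence` are assumptions made for this example.

```python
from typing import FrozenSet, List

# Hypothetical market baskets: each basket is the set of items in one sale.
baskets: List[FrozenSet[str]] = [
    frozenset({"beer", "diapers", "chips"}),
    frozenset({"beer", "diapers"}),
    frozenset({"milk", "bread"}),
    frozenset({"diapers", "milk"}),
    frozenset({"beer", "chips"}),
]


def support(itemset: FrozenSet[str], sales: List[FrozenSet[str]]) -> float:
    """Fraction of baskets that contain every item in the itemset."""
    hits = sum(1 for basket in sales if itemset <= basket)
    return hits / len(sales)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               sales: List[FrozenSet[str]]) -> float:
    """Of the baskets containing the antecedent, the share that also
    contain the consequent. Returns 0.0 if the antecedent never occurs."""
    base = support(antecedent, sales)
    if base == 0:
        return 0.0
    return support(antecedent | consequent, sales) / base


diapers = frozenset({"diapers"})
beer = frozenset({"beer"})
print("support(beer, diapers):", support(diapers | beer, baskets))
print("confidence(diapers -> beer):", round(confidence(diapers, beer, baskets), 2))
```

A rule such as "diapers → beer" would normally be reported only if both measures exceed thresholds chosen by the analyst.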


8. Data Mining applications and derived value


• Auto insurance

– Detect a group of people who stage accidents to collect on insurance

• Money laundering

– Detect suspicious money transactions (US Treasury's Financial Crimes Enforcement Network)

• Medical insurance

– Detect professional patients, rings of doctors and rings of references

• Detecting inappropriate medical treatment

– Australian Health Insurance Commission identifies that in many cases blanket screening tests were requested (saving A$1m per year)

• Detecting telephone fraud

– Telephone call model: destination of the call, duration, time of day or week. Analyse patterns that deviate from an expected norm.

– British Telecom identified discrete groups of callers with frequent intra-group calls, especially mobile phones, and broke a multimillion-dollar fraud.

• Retail

– Analysts estimate that 38% of retail shrink is due to dishonest employees.
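The idea of analysing calls that deviate from an expected norm, from the telephone call model above, can be sketched as a simple z-score check in Python. This is an assumption-laden illustration: it uses only call duration (the model above also lists destination and time of day or week), the history values are invented, and `is_unusual` with its threshold of 3 standard deviations is a hypothetical helper, not a method described in the lecture.

```python
from statistics import mean, stdev
from typing import List

# Hypothetical history of one subscriber's call durations, in minutes.
normal_durations: List[float] = [3.0, 4.5, 2.0, 5.0, 3.5, 4.0, 2.5, 3.0]


def is_unusual(duration: float, history: List[float],
               threshold: float = 3.0) -> bool:
    """Flag a call whose duration lies more than `threshold` standard
    deviations from the subscriber's historical mean."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # No variation in the history: anything different is unusual.
        return duration != mu
    return abs(duration - mu) / sigma > threshold


for call in (4.0, 45.0):
    label = "unusual" if is_unusual(call, normal_durations) else "normal"
    print(f"{call} minutes: {label}")
```

A flagged call is only a candidate for investigation. A real system would combine several attributes and would need thresholds tuned on known fraud cases.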


9. Databases vs Data Mining processing and Query examples


Database (DB)

• Query

– Well defined

– SQL

• Data

– Operational data

• Output

– Precise

– Subset of database

Data Mining (DM)

• Query

– Poorly defined

– No precise query language

• Data

– Not operational data

• Output

– Fuzzy

– Not a subset of database

Query examples

Database

– Find all credit applicants with first name of Sane.

– Identify customers who have purchased more than Rs. 10,000 in the last month.

– Find all customers who have purchased milk.

Data Mining

– Find all credit applicants who are poor credit risks. (classification)

– Identify customers with similar buying habits. (clustering)

– Find all items which are frequently purchased with milk. (association rules)
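The contrast between the two kinds of query can be sketched in Python. The first query is a precise filter whose output is a subset of the stored rows. The second groups customers with similar spending without being told where the boundary lies, and its output (group membership) is not stored anywhere in the data. The customer names and amounts are invented, and `two_means` is a hypothetical one-dimensional, two-group variant of k-means written only for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical data: customer -> amount purchased in the last month.
purchases: Dict[str, float] = {
    "Ann": 500.0, "Ben": 700.0, "Cat": 650.0,
    "Dan": 12000.0, "Eve": 11000.0, "Fay": 15000.0,
}

# Database-style query: well defined, and the result is an exact
# subset of the stored rows.
big_spenders = sorted(name for name, amount in purchases.items()
                      if amount > 10000)


def two_means(values: Dict[str, float],
              rounds: int = 10) -> Tuple[List[str], List[str]]:
    """Data-mining-style query: split customers into two groups with
    similar spending, without a threshold supplied by the analyst."""
    low_centre = min(values.values())
    high_centre = max(values.values())
    if low_centre == high_centre:
        # Everyone spends the same amount, so there is only one group.
        return sorted(values), []
    low_group: List[str] = []
    high_group: List[str] = []
    for _ in range(rounds):
        # Assign each customer to the nearer centre.
        low_group = [n for n, v in values.items()
                     if abs(v - low_centre) <= abs(v - high_centre)]
        high_group = [n for n in values if n not in low_group]
        # Move each centre to the mean of its group.
        low_centre = sum(values[n] for n in low_group) / len(low_group)
        high_centre = sum(values[n] for n in high_group) / len(high_group)
    return sorted(low_group), sorted(high_group)


print("Exact query result:", big_spenders)
print("Discovered groups:", two_means(purchases))
```

On this toy data the discovered high-spending group happens to match the filter, but the grouping was found from the data rather than from a threshold written into the query.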


QUESTIONS & ENQUIRIES

[email protected]
