

Page 1: Business Intelligence & Process Modellingliacs.leidenuniv.nl/~takesfw/BIPM/lecture3.pdf · 2018-03-15 · Recap Business Intelligence: anything that aims at providing actionable information

Business Intelligence & Process Modelling

Frank Takes

Universiteit Leiden

Lecture 3 — BI & Descriptive Analytics

BIPM — Lecture 3 — BI & Descriptive Analytics 1 / 84


IP Viking map

http://map.norsecorp.com


Recap

Business Intelligence: anything that aims at providing actionable information that can be used to support business decision making

Business Intelligence

Visual Analytics

Descriptive Analytics

Predictive Analytics

Process Modelling (April and May)


Visual Analytics (“last week’s leftovers” or: “how it’s not done”)


Visualization

Visualization: mapping data properties to visual attributes

Good visualization: “proper” mapping of data attributes to visual attributes and properly “balancing” the number of data properties and visual attributes used

Bad visualization:

False data input

Misleading visual attributes

Abusing human background knowledge


“Unbiased” data


Rainbow colors

http://poynter.org/uncategorized/224413


Parts and sums

https://hbr.org/2014/12/vision-statement-how-to-lie-with-charts


2D bars and icons


2D bars explained

http://en.wikipedia.org/wiki/Misleading_graph


3D pies

http://en.wikipedia.org/wiki/Misleading_graph


Color-coding geographic regions

https://hbr.org/2014/12/vision-statement-how-to-lie-with-charts


Axis ranges

https://hbr.org/2014/12/vision-statement-how-to-lie-with-charts


Who understands?

http://www.multimension.com/project/upgrading-clinical-infographics/


Data Mining in a BI context


Overview

Data warehouse

Data preparation

Data mining theory recap

Data mining case studies

Data mining evaluation techniques


Data warehouse

Data warehouse: a copy of transaction data specifically structured for query and analysis (R. Kimball)

Data warehouse: a system used for reporting and data analysis (Wikipedia)

Data warehouse: a subject oriented, integrated, nonvolatile, timestamped collection of data designed to support management’s decision support needs (B. Inmon)


Data warehouse data

In a data warehouse, data is organized around subjects (whereas information systems are organized around applications)

Data is collected from heterogeneous sources and may already be aggregated (for example from an ERP or CRM system)

Data is timestamped

Data is nonvolatile


Data warehouse

http://savis.vn/


Transactional system vs. Data warehouse

Transactional system | Data warehouse
Holds current data | Current and historic data
Detailed data | Detailed and aggregated data
Volatile data | Nonvolatile data
High transaction frequency | Medium-low frequency
Oriented on daily operations | Oriented on data analysis
Support for daily decisions | Support for strategic decisions
Many operational users | Few decision-making users
Availability very important | Availability not so important
Data storage focus | Information acquisition focus

https://www.fer.unizg.hr/ (Business Intelligence)


Data mining

Data mining: the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems (Wikipedia)

Data mining: the practice of examining large pre-existing databases in order to generate new information (Oxford)

Data mining: knowledge discovery from data (or information) in an automated way (DIKW pyramid)


DIKW Pyramid


DIKW Gaps

ZPR FER Zagreb - Business Intelligence 20113


Data mining . . .

KDD: Knowledge Discovery in Databases

Data archeology

Information harvesting

Knowledge extraction

Machine learning

Big data techniques?

Data science?

Business intelligence?


Data mining

http://blogs.sas.com/content/subconsciousmusings/2014/08/22


KDD

Knowledge Discovery in Databases is the non-trivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data.

Fayyad et al., Advances in Knowledge Discovery and Data Mining, MIT Press, 1996.


Why data mining now?

Data flood / data explosion

Cloud computing power

Cheap storage

Algorithms have matured

Software is available

Competition is killing


Data mining in businesses

Process management

Market basket analysis

Marketing

Customer loyalty

Fraud detection

Trend analysis


Data mining in practice

1 Learn about the problem domain

2 Data selection

3 Data cleaning, preprocessing and reduction

4 Data mining

5 Interpretation of information

6 Apply knowledge in domain


Data preprocessing

Sampling

Normalization

Missing data

Data conflicts

Duplicate data

Ambiguity in data
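
Three of the preprocessing topics listed above (duplicate data, missing data and normalization) can be illustrated with a minimal sketch. The record layout, the field names and the choice of mean imputation and min-max scaling are assumptions made for this example; the slides do not prescribe particular techniques.

```python
from statistics import mean

def drop_duplicates(rows):
    """Keep the first occurrence of each record, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def impute_mean(rows, field):
    """Replace missing (None) values of a numeric field by the mean of the known values."""
    known = [r[field] for r in rows if r[field] is not None]
    fill = mean(known)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]

def min_max_normalize(rows, field):
    """Rescale a numeric field linearly to the range [0, 1]."""
    values = [r[field] for r in rows]
    lo, hi = min(values), max(values)
    span = hi - lo
    return [{**r, field: (r[field] - lo) / span if span else 0.0} for r in rows]

# Hypothetical toy records; "customer" and "revenue" are illustrative field names.
records = [
    {"customer": "A", "revenue": 100.0},
    {"customer": "B", "revenue": None},    # missing value
    {"customer": "A", "revenue": 100.0},   # exact duplicate of the first record
    {"customer": "C", "revenue": 300.0},
]

clean = min_max_normalize(impute_mean(drop_duplicates(records), "revenue"), "revenue")
```

The order of the steps matters in this sketch: duplicates are removed first so that they do not distort the mean used for imputation.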


Guidelines for successful data mining

The data must be available

The data must be relevant, adequate and clean

There must be a well-defined problem

The problem should not be solvable by means of ordinary query or OLAP tools

The results must be actionable


Successful data mining in businesses

Use a small team with a strong internal integration and a loose management style

Carry out a small pilot project before a major data mining project

Identify a clear problem owner responsible for the project, e.g., from sales or marketing

Try to realize a positive return on investment within 6 to 12 months

Have top management back the project up


Break?


Data attribute types

Categorical attributes: discrete

Nominal attribute: has no logical ordering (e.g., colors or names)

Ordinal attribute: has ordering (e.g., bad, OK, good, perfect)

Numerical attributes: continuous (e.g., 4.815 m and EUR 162 342)


Data quality

Accuracy

Completeness

Consistency (uniformity)

Validity

Timeliness

Data cleaning, data cleansing, data scrubbing, . . .


Data quality

http://www.hicxsolutions.com/supplier-management-programmes/


http://halobi.com/wp-content/uploads/data-quality-infographic.png


Example: Corporate data quality


Data quality

ORBIS database (Bureau van Dijk, http://orbis.bvdinfo.com)

Aggregates data from Chambers of Commerce across the world

Snapshot from September 2015

Extracted all firms (including meta-data such as operating revenue, employees, assets and market capitalization)

140,087,471 firms found. Is that all?


Observed data

Figure: Observed average revenue per country (darker is more)


Completeness per size category

[Figure: x-axis Country (AT, BE, BG, CY, CZ, DE, DK, EE, ES, FI, FR, GB, GR, HR, HU, IE, IT, LT, LU, LV, MT, NL, NO, PL, PT, RO, SE, SI, SK); y-axis Completeness (% of number of companies), 0 to 200; size categories 0-9, 10-19, 20-49, 50-249, GE250]

Figure: Percentage of companies present, segmented by number of employees.


Assessing completeness

Lognormal distribution for firm revenue in a country

Idea: fix distribution scale based on known high quality countries

Estimate mean revenue for each country using World Bank indicators

Result: GDP per capita ∼ Mean revenue

Mean revenue ∼ Distribution location

Assess completeness by comparing observed average revenue with estimated average revenue
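
The relation between mean revenue and the location of the distribution can be made concrete with the standard lognormal identity mean = exp(mu + sigma^2 / 2). The sketch below assumes that "location" and "scale" on the slide refer to the lognormal parameters mu and sigma. The numeric values are hypothetical, and the ratio at the end is only one possible way to compare observed and estimated averages, not necessarily the measure used in the lecture.

```python
import math

def lognormal_location(mean_revenue, sigma):
    """Location mu of a lognormal with the given mean and fixed scale sigma.

    Follows from mean = exp(mu + sigma**2 / 2).
    """
    return math.log(mean_revenue) - sigma ** 2 / 2

def lognormal_mean(mu, sigma):
    """Mean of a lognormal distribution with location mu and scale sigma."""
    return math.exp(mu + sigma ** 2 / 2)

def revenue_ratio(observed_mean, estimated_mean):
    """Observed average revenue relative to the estimated average revenue."""
    return observed_mean / estimated_mean

# Hypothetical numbers, for illustration only.
sigma = 2.0               # scale, assumed fixed from high-quality countries
estimated_mean = 1000.0   # estimated mean revenue for some country
observed_mean = 2500.0    # average revenue observed in the database

mu = lognormal_location(estimated_mean, sigma)
ratio = revenue_ratio(observed_mean, estimated_mean)
```

A ratio far from 1 flags a country whose observed data deviates from the estimate and may deserve a closer look.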


Mean vs. standard deviation


Understanding average revenue

Figure: Observed average revenue

Figure: Estimated average revenue


Low average in rich countries

[Figure: four panels, each with x-axis Company Turnover (thds dollars) from 10^0 to 10^5 and y-axis Frequency from 10^-9 to 10^-2]


Real completeness


Completeness per country

[Figure: panels A to D. Completeness on a color scale from 10^-3.5 to ≥10^0 for the countries NO, SE, EE, FI, SK, GB, CZ, CL, FR, US, HU, CH, PT, SI, ES, IT, CA, DK, KR, DE, JP, AU, PL, GR, IE, BE, LU, NL, AT, IL, NZ; three further panels with axes running from 1 to 5E8 and from 10^-1 to 10^6 (axis names not recoverable)]


Categories of techniques

Machine learning

Supervised learning: learning on labeled data

Semi-supervised learning: partially labeled data

Unsupervised learning: learning/mining on unlabeled data

Reinforcement learning: agents learning to act in an environment


Unsupervised learning


Categories of techniques

Unsupervised learning: learning/mining on unlabeled data

Supervised learning: learning on labeled data

Semi-supervised learning: partially labeled data

Reinforcement learning: agents learning to act in an environment


Unsupervised learning

Clustering

Anomaly detection

Pattern recognition

Data summarization


Clustering

Data is unlabeled

Label data: grouping

Grouping based on similar attributes: relatively close “neighbors” in n-dimensional space


k-means Clustering

1 k means are randomly placed

2 k clusters are created by assigning each observation to the nearestmean (according to some distance notion)

3 the centroid of each cluster becomes the new mean

4 steps 2–3 are repeated until convergence
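
The steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation: it assumes Euclidean distance as the distance notion, initializes the means on k randomly chosen observations, and treats "the means no longer move" as convergence. The function name `kmeans`, its parameters and the toy points are hypothetical.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Cluster a list of equal-length numeric tuples into k groups."""
    rng = random.Random(seed)
    # Step 1: place the k means at random; here, on k distinct observations.
    means = rng.sample(points, k)
    clusters = []
    for _ in range(max_iter):
        # Step 2: assign each observation to the nearest mean (Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, means[i]))
            clusters[nearest].append(p)
        # Step 3: the centroid of each cluster becomes the new mean
        # (an empty cluster keeps its previous mean).
        new_means = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else means[i]
            for i, cluster in enumerate(clusters)
        ]
        # Step 4: repeat the assignment and update until the means stop moving.
        if new_means == means:
            break
        means = new_means
    return means, clusters

# Two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
means, clusters = kmeans(points, k=2)
```

The result depends on the random initialization in general; running the algorithm several times with different seeds and keeping the best result is a common precaution.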


Hierarchical clustering

1 Define a distance function between objects

2 Assign each object to its own cluster

3 Merge the two nearest clusters (based on the distance between their objects) into one cluster

4 Repeat step 3 until only one cluster remains

5 Pick a level in the resulting dendrogram as the preferred clustering
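
The procedure above can be sketched as a naive agglomerative clustering. The sketch assumes Euclidean distance between objects and single linkage (the distance between two clusters is the smallest distance between any two of their objects); the slide leaves both choices open. Picking a level in the dendrogram is modelled here by stopping once `num_clusters` clusters remain. All names are hypothetical.

```python
import math

def single_linkage(a, b):
    """Distance between two clusters: the smallest distance between their objects."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerate(points, num_clusters=1):
    """Merge clusters bottom-up; return the final clusters and the merge history."""
    # Step 2: every object starts in its own cluster.
    clusters = [[p] for p in points]
    merges = []
    # Steps 3-4: keep merging the two nearest clusters.
    while len(clusters) > num_clusters:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]))
        distance = single_linkage(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], distance))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters, merges

# One-dimensional toy data, written as 1-tuples.
points = [(0.0,), (1.0,), (10.0,), (11.0,), (25.0,)]
clusters, merges = agglomerate(points, num_clusters=3)
```

The merge history records the distance at which each pair of clusters was joined, which is the information a dendrogram displays. Recomputing all pairwise cluster distances in every round keeps the sketch short but makes it slow for large data sets.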


Clustering validation

Expectation-Maximization (EM) clustering: https://en.wikipedia.org/wiki/Expectation-maximization_algorithm


Hierarchical vs. k-means clustering

Time complexity (k-means: linear in the number of objects per iteration; hierarchical: at least quadratic)

Predefined number of clusters

Influence of outliers

Assumption of the presence of a hierarchical structure


Case: Anomalies in energy expenditure


BSc project by J. Kalmeijer in cooperation with “Rijkswaterstaat”

Total of 254 objects all over the Netherlands

Energy expenditure over 3 years known for each object

Measurements every 15 minutes: 365 days × 24 hours × 4 measurements ≈ 35,000 measurements per year
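
As a quick check of the data volume, the arithmetic can be spelled out. This is a back-of-the-envelope sketch that assumes no leap days and no missing measurements; the variable names are illustrative.

```python
MEASUREMENTS_PER_HOUR = 4  # one measurement every 15 minutes

per_year = 365 * 24 * MEASUREMENTS_PER_HOUR  # measurements per object per year
per_object = 3 * per_year                    # 3 years of data per object
total = 254 * per_object                     # 254 objects in total

print(per_year, per_object, total)  # 35040 105120 26700480
```

Under these assumptions the full data set holds on the order of 27 million measurements.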


Objects

Public lighting or traffic control

Office

Tunnel

Radar post

Pumping station

Floodgate or weir

Traffic control center

Bridge or dam

Small building


Goal

Clustering on all data:

Public lighting

All other objects

Clustering to detect object groups

Identify regular energy usage pattern of objects

Objects are of different types

Detect anomalies in energy usage per object type

Data-driven!


Approach


A clustering result

Figure: Public lighting objects


Anomaly detection results

Figure: Outlier in seasonal behavior


Project results and conclusion

Objects clustered into types based on the data

Some anomalies detected for various types of objects

Correlations between weather and object (types) identified

Data-driven insight!


Unsupervised learning

Clustering

Anomaly detection

Pattern recognition

Data summarization


Market basket analysis

Han & Kamber, Data mining: Concepts and techniques, 2006


Association

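X and Y are variables. There are $N$ instances, of which $N_X$ instances have variable X ($N_Y$ and $N_{X \wedge Y}$ are defined analogously)

Derive rules of the form IF (X) THEN Y, written $X \Rightarrow Y$

$\mathrm{support}(X \Rightarrow Y) = N_{X \wedge Y} / N$

$\mathrm{confidence}(X \Rightarrow Y) = N_{X \wedge Y} / N_X$

$\mathrm{lift}(X \Rightarrow Y) = \dfrac{N_{X \wedge Y} \, N}{N_X \, N_Y}$

support: higher is better

confidence: close to 1 is better

lift: higher is better; a value above 1 means X and Y occur together more often than expected if they were independent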
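
The three measures can be computed directly from the counts in the definitions above. The following is a minimal sketch; the function name `rule_metrics` and the basket data are hypothetical and invented purely for illustration.

```python
def rule_metrics(transactions, x, y):
    """Return (support, confidence, lift) for the rule x => y.

    Assumes x and y each occur in at least one transaction,
    otherwise the divisions below fail.
    """
    n = len(transactions)
    n_x = sum(1 for t in transactions if x in t)
    n_y = sum(1 for t in transactions if y in t)
    n_xy = sum(1 for t in transactions if x in t and y in t)
    support = n_xy / n
    confidence = n_xy / n_x
    lift = (n_xy * n) / (n_x * n_y)
    return support, confidence, lift

# Hypothetical market baskets, invented for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

support, confidence, lift = rule_metrics(baskets, "bread", "milk")
print(support, confidence, lift)  # 0.6 0.75 0.9375
```

In this toy example the rule bread ⇒ milk has a fairly high confidence (0.75), yet its lift is slightly below 1 because milk is already present in 4 of the 5 baskets. This is why confidence alone can be misleading and lift is reported alongside it.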


Association rules

http://www.saedsayad.com




Anomaly detection

Supervised: normal/outlier can be learned as a class attribute

Semi-supervised: train on a labeled dataset, determine outliers in unlabeled data based on likelihood of a deviation

Unsupervised: identify patterns (for example, using clustering) and then select small clusters or instances that do not logically fall in any of the large clusters
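
The unsupervised variant can be sketched in a few lines. This is an illustrative simplification, not the method used in the case study: it clusters 1-D values by splitting at large gaps and then flags every value in a cluster below a minimum size. The function names, the thresholds `max_gap` and `min_size`, and the data are all assumptions.

```python
def gap_clusters(values, max_gap):
    """Group sorted 1-D values; a new cluster starts when the gap exceeds max_gap."""
    ordered = sorted(values)
    clusters = [[ordered[0]]]
    for v in ordered[1:]:
        if v - clusters[-1][-1] > max_gap:
            clusters.append([v])      # gap too large: open a new cluster
        else:
            clusters[-1].append(v)    # close enough: extend the current cluster
    return clusters

def small_cluster_anomalies(values, max_gap, min_size):
    """Flag every value that ends up in a cluster smaller than min_size."""
    return [
        v
        for cluster in gap_clusters(values, max_gap)
        if len(cluster) < min_size
        for v in cluster
    ]

# Hypothetical daily energy usage values (unit irrelevant for the sketch).
usage = [10, 11, 12, 10, 11, 30, 31, 29, 30, 95]
anomalies = small_cluster_anomalies(usage, max_gap=5, min_size=2)
print(anomalies)  # [95]
```

The two large clusters (around 10 and around 30) are treated as regular patterns, and the isolated value 95 is reported as an anomaly. In practice the thresholds would have to be chosen per object type.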


Assignment 1

Gaming industry context

Sales log spanning 4 years of sales

Apply and compare BI techniques

Inspect, visualize, aggregate, segment, score . . .

Deliverables:

1. Web-based BI dashboard

2. Short assignment report in LaTeX


Assignment 1 — Hints

Model: MySQL database containing the data

View: HTML page using JavaScript that reads JSON

Controller: PHP outputs relevant data in JSON
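
The division of responsibilities can be sketched as follows. The assignment itself uses PHP and MySQL; this Python sketch only mirrors the same roles, with an in-memory SQLite database standing in for MySQL. The table name `sales`, its columns, the rows, and the function name are hypothetical.

```python
import json
import sqlite3

# Model: an in-memory SQLite database stands in for the MySQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(2014, 100.0), (2014, 50.0), (2015, 75.0)],  # hypothetical rows
)

def sales_per_year_json(connection):
    """Controller: query the model and return the aggregate as a JSON string."""
    rows = connection.execute(
        "SELECT year, SUM(amount) FROM sales GROUP BY year ORDER BY year"
    ).fetchall()
    return json.dumps([{"year": year, "total": total} for year, total in rows])

# View: the dashboard page would fetch this JSON and render it as a chart.
payload = sales_per_year_json(conn)
print(payload)
```

Keeping the aggregation in the controller means the view only has to parse a small JSON document rather than the raw sales log.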


Lab session February 23

Make serious progress with Assignment 1

Continue with dashboard and data integration

Error reporting in PHP and other handy tricks: http://liacs.leidenuniv.nl/ict

Start thinking about the BI questions

Ask all relevant questions


Credits

Lecture partially based on (slides of the (previous edition of the)) course book: W. van der Aalst, Process Mining: Data Science in Action, 2nd edition, Springer, 2016.

Slides partially based on “From Data Mining to Knowledge Discovery: An Introduction” by Gregory Piatetsky-Shapiro (KDnuggets.com)
