
#PIWorld ©2018 OSIsoft, LLC

Better Data Quality for Better Data Science with the PI System

Brandon Perry


Symptom: losing money to shutdowns

Cause: unexpected equipment failure

customer

Project: predict equipment failure

Symptom: many false alerts

Cause: poor data accuracy

Project: improve the data accuracy

Symptom: many false diagnoses

Cause: poor data interpretation


Data Quality

some common dimensions:

- Accuracy
- Believability
- Completeness
- Ease of understanding
- Relevancy
- Timeliness
- Accessibility


Fermenter 13 bottom heater

Bottom TIC OUT [Control Value]: acsbrew.BREWERY.B2_CL_C1_FV13_TIC1550A/OUT.CV

Raw data for 1h

Time             Value
8/13/18 18:03     2.77
8/13/18 18:08     3.28
8/13/18 18:13     3.00
8/13/18 18:18     0.28
8/13/18 18:23    18.78
8/13/18 18:28     1.23
8/13/18 18:33     4.79
8/13/18 18:38    12.10
8/13/18 18:43    33.90
8/13/18 18:48    11.84
8/13/18 18:53    13.42
8/13/18 18:58     0.00

Averages every 10 minutes

Time     Average (PI)   Average (Excel)   % Error
18:00     2.95           3.02               3
18:10     2.36           1.64             -30
18:20     9.58          10.00               4
18:30     7.73           8.44               9
18:40    22.45          22.87               2
18:50     7.89           6.71             -15

[Chart: PI and Excel averages. Note log scale to show relative error.]
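The "Average (Excel)" column can be reproduced from the raw table by averaging the events that fall in each 10-minute window, without regard to how long each value was in effect. The sketch below does only that; the function name is illustrative, and the "Average (PI)" column is not reproduced because it depends on values and interpolation rules the slide does not show.

```python
from collections import defaultdict
from datetime import datetime

# Raw values copied from the "Raw data for 1h" table on the slide.
RAW = [
    ("8/13/18 18:03", 2.77), ("8/13/18 18:08", 3.28),
    ("8/13/18 18:13", 3.00), ("8/13/18 18:18", 0.28),
    ("8/13/18 18:23", 18.78), ("8/13/18 18:28", 1.23),
    ("8/13/18 18:33", 4.79), ("8/13/18 18:38", 12.10),
    ("8/13/18 18:43", 33.90), ("8/13/18 18:48", 11.84),
    ("8/13/18 18:53", 13.42), ("8/13/18 18:58", 0.00),
]


def naive_window_means(rows, minutes=10):
    """Average the events in each window, ignoring their spacing in time."""
    buckets = defaultdict(list)
    for stamp, value in rows:
        t = datetime.strptime(stamp, "%m/%d/%y %H:%M")
        # Floor the timestamp to the start of its window.
        start = t.replace(minute=t.minute - t.minute % minutes)
        buckets[start.strftime("%H:%M")].append(value)
    return {label: sum(vals) / len(vals) for label, vals in buckets.items()}


means = naive_window_means(RAW)
for label in sorted(means):
    print(label, f"{means[label]:.3f}")
```

Each window here happens to contain exactly two events, so the naive average is simply their midpoint, whatever the signal did in between.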


Data Quality

I. Why it matters (Impact)

II. What it is (Understanding)

III. What to do (Action)


II. What it is


Time Series “Samples”

Time Sequence “Signal”

( t, v )

( t, v )


( t, v_temperature )

Interpolation

Gap

?


Known Gaps

Unknown Gaps

[Figure labels: "?", "Gap"]


☑ Questionable: this value might not be useful

☑ Substituted: this value was modified

☑ Annotated: this value has a note attached
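A minimal sketch of how the three flags could travel with each sample and be used to filter data before analysis. The `Sample` class, its field names, and the example values are hypothetical and are not a PI System API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One (t, v) sample plus the three flags named on the slide."""
    timestamp: str
    value: float
    questionable: bool = False  # this value might not be useful
    substituted: bool = False   # this value was modified
    annotated: bool = False     # this value has a note attached


def trusted(samples, allow_substituted=False):
    """Drop questionable samples and, by default, substituted ones too."""
    return [
        s for s in samples
        if not s.questionable and (allow_substituted or not s.substituted)
    ]


# Hypothetical example values.
samples = [
    Sample("18:00", 42.0),
    Sample("18:10", 41.7, questionable=True),
    Sample("18:20", 42.3, substituted=True),
    Sample("18:30", 42.1, annotated=True),
]

print([s.timestamp for s in trusted(samples)])
print([s.timestamp for s in trusted(samples, allow_substituted=True)])
```

Annotated samples are kept here because a note does not by itself say the value is wrong; whether to keep substituted values is left as an explicit choice.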


( t, v )

°C

value: 42.0 quality: Uncertain – Last Usable Value

Quality as reported by some sources

Complex Quality Metadata


Uncompressed

Compressed

( t, v )

( t, v )
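To make the uncompressed versus compressed distinction concrete, here is a minimal sketch of a simple deadband filter: a sample is stored only when it differs from the last stored value by more than a chosen deviation. This is an illustrative stand-in, not the PI System's own compression algorithm, and the function name, deviation, and sample values are assumptions.

```python
def deadband_compress(samples, deviation):
    """Keep a sample only when it moves more than `deviation` from the last kept value.

    `samples` is a time-ordered list of (t, v) pairs. The first and last
    samples are always kept so the time range is preserved.
    """
    if not samples:
        return []
    kept = [samples[0]]
    for t, v in samples[1:-1]:
        if abs(v - kept[-1][1]) > deviation:
            kept.append((t, v))
    if len(samples) > 1:
        kept.append(samples[-1])
    return kept


# Hypothetical signal: small wiggles around 10, a step to 12, then a drop to 9.
raw = [(0, 10.0), (1, 10.1), (2, 10.05), (3, 12.0), (4, 12.1), (5, 12.05), (6, 9.0)]
compressed = deadband_compress(raw, deviation=0.5)
print(compressed)
```

Seven samples shrink to three. Anything computed from the compressed series (averages, spectra, counts) now depends on how the missing samples are reconstructed.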


Well-sampled

Under-sampled

Well-sampled, compressed


Trends @ 10x compression

Frequency spectra (black original)

from Thornhill, Nina F., Choudhury, M.A.A. Shoukat, Shah, Shirish L.: The impact of compression on data-driven process analyses. In: Journal of Process Control, 14(2), 389-398 (2004)

Reproduced here under fair use for critique of this work


Spike

Stick


Sensor accuracy: ±2%

Falsely precise: 42.018382

Realistic: 84.3


Raw

Time             Value A   Value B
8/21/18 17:50              78.751
8/21/18 17:52    33.899
8/21/18 17:53              94.162
8/21/18 18:07              79.858
8/21/18 18:16    37.222    79.656
8/21/18 18:27    68.398
8/21/18 18:30              97.063
8/21/18 18:41              35.461
8/21/18 18:50              42.960
8/21/18 19:00    72.527

Interpolated together

Time             Value A   Value B
8/21/18 17:50    82.663    78.751
8/21/18 17:52    33.899    86.657
8/21/18 17:53    12.679    94.162
8/21/18 18:07    56.308    79.858
8/21/18 18:16    37.222    79.656
8/21/18 18:27    68.398    64.163
8/21/18 18:30    79.185    97.063
8/21/18 18:41    18.486    35.461
8/21/18 18:50     8.759    42.960
8/21/18 19:00    72.527    74.234
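One way to fill the empty cells of the raw table is to interpolate each series at the union of both series' timestamps. The sketch below assumes linear interpolation and uses small hypothetical series with times in minutes; it does not attempt to reproduce the slide's numbers, whose interpolation method is not stated.

```python
from bisect import bisect_left


def interpolate(series, t):
    """Linearly interpolate a time-sorted, non-empty [(time, value), ...] at t.

    Returns None outside the series' time range instead of extrapolating.
    """
    times = [ts for ts, _ in series]
    if t < times[0] or t > times[-1]:
        return None
    i = bisect_left(times, t)
    if times[i] == t:
        return series[i][1]
    (t0, v0), (t1, v1) = series[i - 1], series[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def align(a, b):
    """Return (t, value_a, value_b) rows at every timestamp seen in a or b."""
    stamps = sorted({t for t, _ in a} | {t for t, _ in b})
    return [(t, interpolate(a, t), interpolate(b, t)) for t in stamps]


# Hypothetical series with timestamps in minutes.
series_a = [(0, 10.0), (10, 20.0)]
series_b = [(5, 1.0), (15, 3.0)]

rows = align(series_a, series_b)
for row in rows:
    print(row)
```

Rows outside a series' own time range stay empty (None) so that the gap remains visible rather than being filled by extrapolation.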


What is the average value in this window?

Naïve (e.g. AVG() in SQL or Excel)

Time-weighted

*There are certainly times where event weighting is the right thing, but this choice should be made deliberately.
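The difference between the two answers can be sketched as follows. The naive mean weights every event equally; the time-weighted mean integrates the signal over the window and divides by the window length. Linear interpolation between samples is an assumption here (a stepped signal would need a different rule), and the sample values are hypothetical.

```python
def naive_mean(samples):
    """Event-weighted mean: every sample counts once, like AVG() in SQL or Excel."""
    return sum(v for _, v in samples) / len(samples)


def time_weighted_mean(samples, start, end):
    """Mean of the linearly interpolated signal over [start, end].

    Assumes `samples` is sorted by strictly increasing time and covers the window.
    """
    def at(t):
        for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
            if t0 <= t <= t1:
                return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
        raise ValueError("window extends beyond the samples")

    inside = [(t, v) for t, v in samples if start < t < end]
    pts = [(start, at(start))] + inside + [(end, at(end))]
    # Trapezoid rule over consecutive points.
    area = sum((t1 - t0) * (v0 + v1) / 2 for (t0, v0), (t1, v1) in zip(pts, pts[1:]))
    return area / (end - start)


# Hypothetical burst: densely sampled while high, sparsely sampled while zero.
samples = [(0, 0.0), (1, 10.0), (2, 10.0), (3, 10.0), (4, 0.0), (10, 0.0)]

print(naive_mean(samples))                 # 5.0
print(time_weighted_mean(samples, 0, 10))  # 3.0
```

The naive mean overstates the level because the signal was sampled more often while it was high.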



III. What to do

Adjust your data collection settings

Filtering

Compression

Sampling rate

Add sensor metadata to your PI Assets

Cleanse your raw data right in the PI System so others can benefit too

[Figure: "Original" and "Cleansed" trends, with a "No Data" label]


PI Integrators

PI SQL

PI Web API

PI DataLink

Interpolate when you need regularity

[Figure: 10-minute samples, numbered 1 to 11]
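A minimal sketch of turning irregular raw samples into regularly spaced ones by linear interpolation. The function name and the sample values are hypothetical, times are in minutes, and linear interpolation is an assumption that suits continuous signals better than stepped ones.

```python
def resample(series, start, step, count):
    """Linearly interpolate time-sorted (minute, value) pairs on a regular grid.

    Produces `count` points at start, start + step, ... and raises if a grid
    point falls outside the raw data. Needs at least two raw samples.
    """
    out, i = [], 0
    for k in range(count):
        t = start + k * step
        # Advance to the segment [series[i], series[i + 1]] that contains t.
        while i + 1 < len(series) - 1 and series[i + 1][0] <= t:
            i += 1
        (t0, v0), (t1, v1) = series[i], series[i + 1]
        if not t0 <= t <= t1:
            raise ValueError(f"t={t} is outside the raw data")
        out.append((t, v0 + (v1 - v0) * (t - t0) / (t1 - t0)))
    return out


# Hypothetical raw samples arriving every 5 minutes, offset from the grid.
raw = [(3, 2.0), (8, 4.0), (13, 3.0), (18, 1.0), (23, 5.0)]

regular = resample(raw, start=10, step=10, count=2)
print(regular)
```

Raising on out-of-range grid points keeps extrapolated values from entering a model unnoticed.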

Use time-weighted aggregates when appropriate, and set a minimum quality
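One reading of "set a minimum quality" is to require that good data cover a minimum share of the window before an aggregate is reported. The sketch below assumes a stepped signal (each value holds until the next sample) and a hypothetical `is_good` flag per sample; the names and the 80% threshold are illustrative, not PI System settings.

```python
def time_weighted_mean_min_good(samples, start, end, min_percent_good=80.0):
    """Time-weighted mean over [start, end] using only good samples.

    `samples` is a time-sorted list of (time, value, is_good). Each value is
    held until the next sample. Returns (mean, percent_good); mean is None
    when good data cover less than `min_percent_good` of the window.
    """
    good_time = 0.0
    weighted = 0.0
    # Pair each sample with the next one; the last value holds until `end`.
    following = samples[1:] + [(end, None, None)]
    for (t0, v, ok), (t1, _, _) in zip(samples, following):
        lo, hi = max(t0, start), min(t1, end)
        if hi > lo and ok:
            good_time += hi - lo
            weighted += v * (hi - lo)
    percent_good = 100.0 * good_time / (end - start)
    if good_time == 0 or percent_good < min_percent_good:
        return None, percent_good
    return weighted / good_time, percent_good


# Hypothetical window: good for 4 minutes, bad for 2, good for 4.
samples = [(0, 10.0, True), (4, 50.0, False), (6, 20.0, True)]

print(time_weighted_mean_min_good(samples, 0, 10))
print(time_weighted_mean_min_good(samples, 0, 10, min_percent_good=90.0))
```

Returning the percentage alongside the mean lets the caller see why an aggregate was withheld.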

Aggregate on phases or states

Batch   Duration   Min Rate   Rate Variance   Mean Temperature
1       4.65       10.1       0.20            32.9
2       4.22       10.8       0.19            33.0
3       7.41       0.02       4.2             13.5

Phases: Fill, React, Settle, Decant, Idle

PI Event Frames
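A minimal sketch of aggregating on phases: each sample is assigned to the phase whose time interval contains it, and a statistic is computed per phase. The phase intervals and samples are hypothetical stand-ins for event frames, and the mean used here is event-weighted for brevity.

```python
from statistics import mean


def aggregate_by_phase(samples, frames):
    """Mean value per phase.

    `samples` is a list of (time, value); `frames` is a list of
    (name, start, end) intervals, with start inclusive and end exclusive.
    Phases with no samples map to None.
    """
    result = {}
    for name, start, end in frames:
        values = [v for t, v in samples if start <= t < end]
        result[name] = mean(values) if values else None
    return result


# Hypothetical phases and temperature samples (times in hours).
frames = [("Fill", 0, 3), ("React", 3, 7), ("Settle", 7, 8), ("Idle", 8, 9)]
samples = [(0, 20.0), (1, 22.0), (2, 24.0), (3, 30.0), (5, 34.0), (8, 21.0)]

per_phase = aggregate_by_phase(samples, frames)
print(per_phase)
```

Aggregating per phase keeps, for example, the idle period from diluting the statistics of the reaction step.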


and now…

Contact Information

Brandon Perry

Research

OSIsoft

[email protected]

Data Quality at TransCanada

Keary Rogers & Ionuţ Buse


TransCanada Corporation (TSX/NYSE: TRP)

One of North America’s Largest Natural Gas Pipeline Networks

• Operate 91,900 km (57,100 mi.) of pipelines

• Transport ~25 per cent of continental demand

• Over 650 Bcf of gas storage capacity

One of Canada’s Largest Private Sector Power Generators

• 11 power facilities, approximately 6,100 MW

• Diversified portfolio including wind, nuclear and natural gas

Premier Liquids Pipeline System

• 4,900 km (3,000 mi.)

• Keystone System transports ~20 per cent of Western Canadian exports

• Safely delivered more than 1.9 billion barrels of Canadian oil to U.S. markets


North America Natural Gas Demand Growth


City Centers


Universities


Schools


Our Children


Medical Facilities


Elderly


Why Does Real-Time Data Quality Matter?

OpsVision Condition Monitoring

Early Detection of Functional Degradation

Fleet Optimization

Expose Data to Operations Personnel

Asset Performance & Efficiency


How Real-time Data Impacts Our Business

• Functional degradation starts occurring on the gas producer bearing drain packing

• Abnormal Oil Tank Pressure increase is flagged through SQC anomaly detection

• Reliability Analyst performs data analysis & communicates to Maintenance Lead

• Unit is taken offline in a planned, controlled & safe manner. The drain packing is replaced

• Unit is back in service. Failure was mitigated without any customer impact
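The slide does not say which SQC rule flagged the pressure increase. As a minimal sketch under that caveat, the simplest control-chart check derives limits from a baseline period (mean ± 3 standard deviations) and flags values outside them. All names and numbers below are hypothetical.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Lower and upper control limits: baseline mean -/+ k sample standard deviations."""
    m, s = mean(baseline), stdev(baseline)
    return m - k * s, m + k * s


def out_of_control(values, limits):
    """Return (index, value) for every value outside the control limits."""
    lo, hi = limits
    return [(i, v) for i, v in enumerate(values) if v < lo or v > hi]


# Hypothetical oil tank pressure readings.
baseline = [10.0, 10.2, 9.8, 10.1, 9.9]  # known-healthy period
recent = [10.1, 10.3, 11.2]              # new readings to check

limits = control_limits(baseline)
flagged = out_of_control(recent, limits)
print(limits)
print(flagged)
```

Limits of this kind are only as good as the baseline, so stale, flat, or bad values in that period tend to produce either false alerts or missed ones.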


Real-time Data | Technology & People

Hardware + Software: Sensors & PLC Network, PI Interfaces, PI Data Archive, PI Asset Framework

People: Automation & Control, Network Support, Real-time Systems, Core Reliability

24/7 365


Real-time Data | Process

Ensuring Data Completeness & Timeliness

Automation & Control

PI Asset Framework

Data Quality Check

Network Support

Real-time Systems

Core Reliability

Dashboard

Statistics & Context

Communication

Documentation


Real-time Data | Process Management Dashboard


Real-time Data Quality | Failure Scenarios

Bad Value (Accomplished): Unexpected system state is written to the current value.

Stale (Accomplished): Data has stopped updating and the last timestamp is older than exception max.

Flat Line (Accomplished): Data is updating but the same value gets written. Identified by leveraging the asset structure.

Granularity (Future Work): Data is not collected at adequate granularity to be used in statistical and machine learning methods. In-depth data analysis is required to address this issue.

Complexity
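The first three scenarios can be sketched as simple checks on a point's most recent samples. This is an illustration under assumptions, not TransCanada's implementation: the state names in `bad_values`, the `max_age` threshold, and the repeated-value test for flat lines are all hypothetical, and the slide indicates flat lines were actually identified using the asset structure.

```python
def classify(samples, now, max_age, flat_count=5, bad_values=("Bad", "I/O Timeout")):
    """Classify a point as "Bad Value", "Stale", "Flat Line" or "OK".

    `samples` is a time-sorted, non-empty list of (time, value) pairs.
    `bad_values` holds hypothetical system-state names written in place of data.
    """
    last_time, last_value = samples[-1]
    if last_value in bad_values:
        return "Bad Value"   # a system state was written as the current value
    if now - last_time > max_age:
        return "Stale"       # the point has stopped updating
    recent = [v for _, v in samples[-flat_count:]]
    if len(recent) == flat_count and len(set(recent)) == 1:
        return "Flat Line"   # still updating, but always the same value
    return "OK"


# Hypothetical points, with times in minutes.
print(classify([(0, 1.0), (1, "Bad")], now=2, max_age=10))
print(classify([(0, 1.0), (1, 2.0)], now=100, max_age=10))
print(classify([(i, 5.0) for i in range(5)], now=5, max_age=10))
print(classify([(0, 1.0), (1, 2.0)], now=2, max_age=10))
```

The checks run in order of increasing cost, so a bad or stale point is reported before any flat-line test is attempted.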

Contact Information

Keary Rogers

Manager, Core Reliability

TransCanada US Gas Operations

[email protected]

Ionuţ Buse

Team Leader, Enterprise Analytics

TransCanada US Gas Operations

[email protected]


Questions?

Please wait for the microphone

State your name & company

Please rate this session in the mobile app!

Search “OSIsoft” in your app store
