legacy infrastructure and big data · 25.09.2018 · national grid 7 the main use case dominated...

18
Sam Young Analytics Development Leader 25 th September 2018 Legacy Infrastructure and Big Data

Upload: others

Post on 28-Oct-2019

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Legacy Infrastructure and Big Data · 25.09.2018 · National Grid 7 The main use case dominated the data model design 1. We have a large number of input tables from different systems

Sam Young

Analytics Development Leader

25th September 2018

Legacy Infrastructure and Big Data

Page 2: Legacy Infrastructure and Big Data · 25.09.2018 · National Grid 7 The main use case dominated the data model design 1. We have a large number of input tables from different systems

2National Grid

Contents page

| Legacy Infrastructure and Big Data | 25th September 2018

01 Business need & technology stack 03

02 Lessons on performance 05

03 Managing data quality across legacy systems 08

04 Two-way integration 11

05 Organisational challenges and opportunities 13

Page 3: Legacy Infrastructure and Big Data · 25.09.2018 · National Grid 7 The main use case dominated the data model design 1. We have a large number of input tables from different systems

3National Grid

The business need

| Legacy Infrastructure and Big Data | 25th September 2018

Data Lake

Capital Project

Management

(Oracle

Primavera)

Maintenance

Planning

(Ellipse)

Asset health &

condition monitoring

(Spreadsheets/IBM

cloud)

Asset Register

(Ellipse)

Outage Planning

(TOGA)

Workforce

Scheduling

(MyCalendar)

Staffing

Levels

(SAP)

Single

View of

the Plan

Page 4: Legacy Infrastructure and Big Data · 25.09.2018 · National Grid 7 The main use case dominated the data model design 1. We have a large number of input tables from different systems

4National Grid

The technology stack

| Legacy Infrastructure and Big Data | 25th September 2018

Storage

ETL

Integration

Modelling

Visualisation

On Premises

Page 5: Legacy Infrastructure and Big Data · 25.09.2018 · National Grid 7 The main use case dominated the data model design 1. We have a large number of input tables from different systems

5National Grid

Performance challenges

We experienced a performance bottleneck on the

ETL servers, which had an adverse effect on

stakeholder engagement

We needed to continually re-optimise the ETL job schedule

Performance worsened non-linearly with new use cases

Predicting production performance was hard because pre-

prod did not have continually refreshed data

| Legacy Infrastructure and Big Data | 25th September 2018

ETL Schedule

Page 6: Legacy Infrastructure and Big Data · 25.09.2018 · National Grid 7 The main use case dominated the data model design 1. We have a large number of input tables from different systems

6National Grid

“It is difficult to

make predictions,

especially about the

future”

Danish Proverb

On premises vs cloud

On Premises

Expensive and slow to scale

up infrastructure

Accurate initial predictions

are therefore very

important

Cloud

Cheap and quick to scale up

infrastructure

Accurate initial predictions

are therefore less

important

| Legacy Infrastructure and Big Data | 25th September 2018

Original:

On Premises

Revised:

Oracle Cloud

Page 7: Legacy Infrastructure and Big Data · 25.09.2018 · National Grid 7 The main use case dominated the data model design 1. We have a large number of input tables from different systems

7National Grid

The main use case dominated the data model

design

1. We have a large number of input tables from different

systems

2. We have a clearly defined set of ways they will be used

Solution: combine all the tables from different sources into a

handful of big tables with “everything” in

Implementing new use cases was made harder

Lack of intermediate tables led to rework/duplication and

reduced performance

Data model

| Legacy Infrastructure and Big Data | 25th September 2018

Original:

Big tables

Revised:

Intermediate tables

Page 8: Legacy Infrastructure and Big Data · 25.09.2018 · National Grid 7 The main use case dominated the data model design 1. We have a large number of input tables from different systems

8National Grid

Data quality in legacy systems always needs

improving

Integrating systems reveals this poor data quality

New data platforms often offer better data management

tools than legacy systems

These tools can be used to improve data quality in legacy

systems

Business ownership of data can strengthen because data

quality is more visible

Data quality in legacy systems

| Legacy Infrastructure and Big Data | 25th September 2018

Page 9: Legacy Infrastructure and Big Data · 25.09.2018 · National Grid 7 The main use case dominated the data model design 1. We have a large number of input tables from different systems

9National Grid

Example process for correcting data

| Legacy Infrastructure and Big Data | 25th September 2018

Identify incorrect

asset data

Raise via data

quality portal

Triage

Investigate

Request evidence

Update data in

legacy system

Provide evidence

Request update

En

gin

eer

Data

Qu

ality

Team

Data

Delivery

Team

Page 10: Legacy Infrastructure and Big Data · 25.09.2018 · National Grid 7 The main use case dominated the data model design 1. We have a large number of input tables from different systems

10National Grid

Example process for correcting data

| Legacy Infrastructure and Big Data | 25th September 2018

Identify incorrect asset data

DQ rules to

monitor

changes

Update data in

legacy system

En

gin

eer

Data

Qu

ality

Ex

pe

rt

Raise via data quality portal

Data

Delivery

Team

Page 11: Legacy Infrastructure and Big Data · 25.09.2018 · National Grid 7 The main use case dominated the data model design 1. We have a large number of input tables from different systems

11National Grid

“Almost anything is

easier to get into

than out of”

Agnes Allen

“Except for legacy

systems, where it’s

easier to get things

out than put them

in”

Sam Young

Agnes Allen’s Law

Legacy systems are often

not designed with nice APIs

e.g. raising new workorders

Robotic Process Automation

may provide a workaround

| Legacy Infrastructure and Big Data | 25th September 2018

Page 12: Legacy Infrastructure and Big Data · 25.09.2018 · National Grid 7 The main use case dominated the data model design 1. We have a large number of input tables from different systems

12National Grid

Robotic Process Automation

Robotic Process Automation automates

processes using existing front end interfaces to

reduce the need for back end integration.

“Gaffer tape for systems integration” – not an enduring

solution, but surprisingly useful

Ideal for well defined processes with limited permutations

Not robust to front end interface changes

National Grid has deployed it in various back office tasks

- e.g. translating shopping carts into orders in supplier

system

We are considering the possibility of automating the

creation of work orders in a legacy system using RPA

| Legacy Infrastructure and Big Data | 25th September 2018

Page 13: Legacy Infrastructure and Big Data · 25.09.2018 · National Grid 7 The main use case dominated the data model design 1. We have a large number of input tables from different systems

13National Grid

“Organisations

which design

systems are

constrained to

produce designs

which are copies of

the communication

structures of these

organisations”

Melvin Conway

Conway’s Law

“Systems reflect the

organisation that designed

them”

e.g. siloed organisations

produce siloed systems

Your systems and data

structures are a mirror

| Legacy Infrastructure and Big Data | 25th September 2018

Page 14: Legacy Infrastructure and Big Data · 25.09.2018 · National Grid 7 The main use case dominated the data model design 1. We have a large number of input tables from different systems

14National Grid

Leverage insights into organisation

If you’re struggling to integrate systems/data,

there’s usually an organisational reason as well

as a technical one

The classic “these two systems don’t have compatible keys”

is a symptom of processes that don’t join up well

We need to be pragmatic – implementing workarounds in

systems is often necessary

However suppressing the symptoms is detrimental in the

long term, so communicate the organisational insights and

ensure they are followed up

| Legacy Infrastructure and Big Data | 25th September 2018

Case study:

Tracking efficiency savings throughout the lifecycle of

capital projects

Page 15: Legacy Infrastructure and Big Data · 25.09.2018 · National Grid 7 The main use case dominated the data model design 1. We have a large number of input tables from different systems

15National Grid

Legacy systems fit a legacy organisation

Your current (and future) organisation is different

from the one that designed the legacy systems

Legacy systems can make organisational change harder

Design integration to enable rather than restrict future

change

- e.g. intermediate tables vs big tables

| Legacy Infrastructure and Big Data | 25th September 2018

Page 16: Legacy Infrastructure and Big Data · 25.09.2018 · National Grid 7 The main use case dominated the data model design 1. We have a large number of input tables from different systems

16National Grid

Integrate relationships not just systems

One of the biggest benefits from major

integration projects is the relationships that

develop across the organisation

Work out ways to maintain and reinforce those relationships

after the end of the project

• Coffee catchups

• Co-location

• Cross-functional teams

• Restructure

| Legacy Infrastructure and Big Data | 25th September 2018

Page 17: Legacy Infrastructure and Big Data · 25.09.2018 · National Grid 7 The main use case dominated the data model design 1. We have a large number of input tables from different systems

17National Grid

We integrate legacy systems to enable

change.

So value flexibility and consider using

“gaffer tape” for quick wins.

Remember that systems reflect

organisations, so help integrate the

organisation not just the systems.

| Legacy Infrastructure and Big Data | 25th September 2018

Conclusions

Page 18: Legacy Infrastructure and Big Data · 25.09.2018 · National Grid 7 The main use case dominated the data model design 1. We have a large number of input tables from different systems

| Legacy Infrastructure and Big Data | 25th September 2018