legacy infrastructure and big data · 25.09.2018 · national grid 7 the main use case dominated...
TRANSCRIPT
![Page 1: Legacy Infrastructure and Big Data · 25.09.2018 · National Grid 7 The main use case dominated the data model design 1. We have a large number of input tables from different systems](https://reader033.vdocuments.us/reader033/viewer/2022041416/5e1c0a29f781ec5b93193226/html5/thumbnails/1.jpg)
Sam Young
Analytics Development Leader
25th September 2018
Legacy Infrastructure and Big Data
![Page 2: Legacy Infrastructure and Big Data · 25.09.2018 · National Grid 7 The main use case dominated the data model design 1. We have a large number of input tables from different systems](https://reader033.vdocuments.us/reader033/viewer/2022041416/5e1c0a29f781ec5b93193226/html5/thumbnails/2.jpg)
2National Grid
Contents page
| Legacy Infrastructure and Big Data | 25th September 2018
01 Business need & technology stack 03
02 Lessons on performance 05
03 Managing data quality across legacy systems 08
04 Two-way integration 11
05 Organisational challenges and opportunities 13
![Page 3: Legacy Infrastructure and Big Data · 25.09.2018 · National Grid 7 The main use case dominated the data model design 1. We have a large number of input tables from different systems](https://reader033.vdocuments.us/reader033/viewer/2022041416/5e1c0a29f781ec5b93193226/html5/thumbnails/3.jpg)
3National Grid
The business need
| Legacy Infrastructure and Big Data | 25th September 2018
Data Lake
Capital Project
Management
(Oracle
Primavera)
Maintenance
Planning
(Ellipse)
Asset health &
condition monitoring
(Spreadsheets/IBM
cloud)
Asset Register
(Ellipse)
Outage Planning
(TOGA)
Workforce
Scheduling
(MyCalendar)
Staffing
Levels
(SAP)
Single
View of
the Plan
![Page 4: Legacy Infrastructure and Big Data · 25.09.2018 · National Grid 7 The main use case dominated the data model design 1. We have a large number of input tables from different systems](https://reader033.vdocuments.us/reader033/viewer/2022041416/5e1c0a29f781ec5b93193226/html5/thumbnails/4.jpg)
4National Grid
The technology stack
| Legacy Infrastructure and Big Data | 25th September 2018
Storage
ETL
Integration
Modelling
Visualisation
On Premises
![Page 5: Legacy Infrastructure and Big Data · 25.09.2018 · National Grid 7 The main use case dominated the data model design 1. We have a large number of input tables from different systems](https://reader033.vdocuments.us/reader033/viewer/2022041416/5e1c0a29f781ec5b93193226/html5/thumbnails/5.jpg)
5National Grid
Performance challenges
We experienced a performance bottleneck on the
ETL servers, which had an adverse effect on
stakeholder engagement
We needed to continually re-optimise the ETL job schedule
Performance worsened non-linearly with new use cases
Predicting production performance was hard because pre-
prod did not have continually refreshed data
| Legacy Infrastructure and Big Data | 25th September 2018
ETL Schedule
![Page 6: Legacy Infrastructure and Big Data · 25.09.2018 · National Grid 7 The main use case dominated the data model design 1. We have a large number of input tables from different systems](https://reader033.vdocuments.us/reader033/viewer/2022041416/5e1c0a29f781ec5b93193226/html5/thumbnails/6.jpg)
6National Grid
“It is difficult to
make predictions,
especially about the
future”
Danish Proverb
On premises vs cloud
On Premises
Expensive and slow to scale
up infrastructure
Accurate initial predictions
are therefore very
important
Cloud
Cheap and quick to scale up
infrastructure
Accurate initial predictions
are therefore less
important
| Legacy Infrastructure and Big Data | 25th September 2018
Original:
On Premises
Revised:
Oracle Cloud
![Page 7: Legacy Infrastructure and Big Data · 25.09.2018 · National Grid 7 The main use case dominated the data model design 1. We have a large number of input tables from different systems](https://reader033.vdocuments.us/reader033/viewer/2022041416/5e1c0a29f781ec5b93193226/html5/thumbnails/7.jpg)
7National Grid
The main use case dominated the data model
design
1. We have a large number of input tables from different
systems
2. We have a clearly defined set of ways they will be used
Solution: combine all the tables from different sources into a
handful of big tables with “everything” in
Implementing new use cases was made harder
Lack of intermediate tables led to rework/duplication and
reduced performance
Data model
| Legacy Infrastructure and Big Data | 25th September 2018
Original:
Big tables
Revised:
Intermediate tables
![Page 8: Legacy Infrastructure and Big Data · 25.09.2018 · National Grid 7 The main use case dominated the data model design 1. We have a large number of input tables from different systems](https://reader033.vdocuments.us/reader033/viewer/2022041416/5e1c0a29f781ec5b93193226/html5/thumbnails/8.jpg)
8National Grid
Data quality in legacy systems always needs
improving
Integrating systems reveals this poor data quality
New data platforms often offer better data management
tools than legacy systems
These tools can be used to improve data quality in legacy
systems
Business ownership of data can strengthen because data
quality is more visible
Data quality in legacy systems
| Legacy Infrastructure and Big Data | 25th September 2018
![Page 9: Legacy Infrastructure and Big Data · 25.09.2018 · National Grid 7 The main use case dominated the data model design 1. We have a large number of input tables from different systems](https://reader033.vdocuments.us/reader033/viewer/2022041416/5e1c0a29f781ec5b93193226/html5/thumbnails/9.jpg)
9National Grid
Example process for correcting data
| Legacy Infrastructure and Big Data | 25th September 2018
Identify incorrect
asset data
Raise via data
quality portal
Triage
Investigate
Request evidence
Update data in
legacy system
Provide evidence
Request update
En
gin
eer
Data
Qu
ality
Team
Data
Delivery
Team
![Page 10: Legacy Infrastructure and Big Data · 25.09.2018 · National Grid 7 The main use case dominated the data model design 1. We have a large number of input tables from different systems](https://reader033.vdocuments.us/reader033/viewer/2022041416/5e1c0a29f781ec5b93193226/html5/thumbnails/10.jpg)
10National Grid
Example process for correcting data
| Legacy Infrastructure and Big Data | 25th September 2018
Identify incorrect asset data
DQ rules to
monitor
changes
Update data in
legacy system
En
gin
eer
Data
Qu
ality
Ex
pe
rt
Raise via data quality portal
Data
Delivery
Team
![Page 11: Legacy Infrastructure and Big Data · 25.09.2018 · National Grid 7 The main use case dominated the data model design 1. We have a large number of input tables from different systems](https://reader033.vdocuments.us/reader033/viewer/2022041416/5e1c0a29f781ec5b93193226/html5/thumbnails/11.jpg)
11National Grid
“Almost anything is
easier to get into
than out of”
Agnes Allen
“Except for legacy
systems, where it’s
easier to get things
out than put them
in”
Sam Young
Agnes Allen’s Law
Legacy systems are often
not designed with nice APIs
e.g. raising new workorders
Robotic Process Automation
may provide a workaround
| Legacy Infrastructure and Big Data | 25th September 2018
![Page 12: Legacy Infrastructure and Big Data · 25.09.2018 · National Grid 7 The main use case dominated the data model design 1. We have a large number of input tables from different systems](https://reader033.vdocuments.us/reader033/viewer/2022041416/5e1c0a29f781ec5b93193226/html5/thumbnails/12.jpg)
12National Grid
Robotic Process Automation
Robotic Process Automation automates
processes using existing front end interfaces to
reduce the need for back end integration.
“Gaffer tape for systems integration” – not an enduring
solution, but surprisingly useful
Ideal for well defined processes with limited permutations
Not robust to front end interface changes
National Grid has deployed it in various back office tasks
- e.g. translating shopping carts into orders in supplier
system
We are considering the possibility of automating the
creation of work orders in a legacy system using RPA
| Legacy Infrastructure and Big Data | 25th September 2018
![Page 13: Legacy Infrastructure and Big Data · 25.09.2018 · National Grid 7 The main use case dominated the data model design 1. We have a large number of input tables from different systems](https://reader033.vdocuments.us/reader033/viewer/2022041416/5e1c0a29f781ec5b93193226/html5/thumbnails/13.jpg)
13National Grid
“Organisations
which design
systems are
constrained to
produce designs
which are copies of
the communication
structures of these
organisations”
Melvin Conway
Conway’s Law
“Systems reflect the
organisation that designed
them”
e.g. siloed organisations
produce siloed systems
Your systems and data
structures are a mirror
| Legacy Infrastructure and Big Data | 25th September 2018
![Page 14: Legacy Infrastructure and Big Data · 25.09.2018 · National Grid 7 The main use case dominated the data model design 1. We have a large number of input tables from different systems](https://reader033.vdocuments.us/reader033/viewer/2022041416/5e1c0a29f781ec5b93193226/html5/thumbnails/14.jpg)
14National Grid
Leverage insights into organisation
If you’re struggling to integrate systems/data,
there’s usually an organisational reason as well
as a technical one
The classic “these two systems don’t have compatible keys”
is a symptom of processes that don’t join up well
We need to be pragmatic – implementing workarounds in
systems is often necessary
However suppressing the symptoms is detrimental in the
long term, so communicate the organisational insights and
ensure they are followed up
| Legacy Infrastructure and Big Data | 25th September 2018
Case study:
Tracking efficiency savings throughout the lifecycle of
capital projects
![Page 15: Legacy Infrastructure and Big Data · 25.09.2018 · National Grid 7 The main use case dominated the data model design 1. We have a large number of input tables from different systems](https://reader033.vdocuments.us/reader033/viewer/2022041416/5e1c0a29f781ec5b93193226/html5/thumbnails/15.jpg)
15National Grid
Legacy systems fit a legacy organisation
Your current (and future) organisation is different
from the one that designed the legacy systems
Legacy systems can make organisational change harder
Design integration to enable rather than restrict future
change
- e.g. intermediate tables vs big tables
| Legacy Infrastructure and Big Data | 25th September 2018
![Page 16: Legacy Infrastructure and Big Data · 25.09.2018 · National Grid 7 The main use case dominated the data model design 1. We have a large number of input tables from different systems](https://reader033.vdocuments.us/reader033/viewer/2022041416/5e1c0a29f781ec5b93193226/html5/thumbnails/16.jpg)
16National Grid
Integrate relationships not just systems
One of the biggest benefits from major
integration projects is the relationships that
develop across the organisation
Work out ways to maintain and reinforce those relationships
after the end of the project
• Coffee catchups
• Co-location
• Cross-functional teams
• Restructure
| Legacy Infrastructure and Big Data | 25th September 2018
![Page 17: Legacy Infrastructure and Big Data · 25.09.2018 · National Grid 7 The main use case dominated the data model design 1. We have a large number of input tables from different systems](https://reader033.vdocuments.us/reader033/viewer/2022041416/5e1c0a29f781ec5b93193226/html5/thumbnails/17.jpg)
17National Grid
We integrate legacy systems to enable
change.
So value flexibility and consider using
“gaffer tape” for quick wins.
Remember that systems reflect
organisations, so help integrate the
organisation not just the systems.
| Legacy Infrastructure and Big Data | 25th September 2018
Conclusions
![Page 18: Legacy Infrastructure and Big Data · 25.09.2018 · National Grid 7 The main use case dominated the data model design 1. We have a large number of input tables from different systems](https://reader033.vdocuments.us/reader033/viewer/2022041416/5e1c0a29f781ec5b93193226/html5/thumbnails/18.jpg)
| Legacy Infrastructure and Big Data | 25th September 2018