Creating a Modern Data Architecture for Digital Transformation

Rich Cullen, Manager – Solutions Architecture, UK & NEUR

Uploaded by mongodb, posted on 16-Apr-2017


Category: Data & Analytics




Agenda

01 Transformation Challenges

02 Architecture Patterns

03 Summary

Digital Transformation Challenges

Building Blocks – The New Enterprise Stack

LAYER | TRADITIONAL | MODERNISED
APPS | On-Premise, Monoliths | SaaS, Microservices
DATABASE | Relational | Non-Relational
EDW | Teradata, Oracle, etc. | Hadoop
COMPUTE | Scale-Up Server | Containers / Commodity Server / Cloud
STORAGE | SAN | Local Storage & Data Lakes
NETWORK | Routers and Switches | Software-Defined Networks

Challenges of Digital Transformation

Growth in Data Silos

Lack of Real-Time Insight

Existing Systems Overwhelmed

Architecture Patterns

• Single View
• Event Sourcing
• CQRS
• Data Domains
• Polyglot Processing
• Data Lake
• Microservices
• Containers
• Continuous Delivery
• Data-as-a-Service

Modern Approaches & Architecture Patterns

Turn Data into a Cross-Enterprise Asset

Single View | Data-as-a-Service | Data Lake

Single View

• Also known as: Data Hub, 360-Degree View, Multi-Channel Display

• A system that gathers data…

• …from multiple, disconnected sources…

• …and aggregates to provide a single view

• Foundation for analytics – cross-sell, upsell, churn risk

What is a Single View?

• Customer

• Product

• Employee

• Asset

• Risk

• City

• Anything meaningful to a business

A Single View… of what?

Sources: Mobile App, Web, Call Centre, CRM, Social Feed, …

But Data Is From Different Sources…

Why Not Use The Usual Tech – Relational Databases?

The database MUST simultaneously handle the complexity of every source system

Untenable change management

Complex data access


COMMON FIELDS: CustomerID | eMail

DYNAMIC FIELDS: can vary from record to record (location, action)
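A minimal sketch of aggregating with a dynamic schema, written in plain Python rather than against any MongoDB driver: records from several sources are merged into one document per CustomerID. The common fields (CustomerID, eMail) sit at the top level, and any other field a source sends, such as location or action, is kept as a dynamic field under that source's name. The source names, the `sources` sub-document and the sample values are hypothetical.

```python
from collections import defaultdict
from typing import Any

# Fields assumed to be present in every source record (the "common fields").
COMMON_FIELDS = ("CustomerID", "eMail")


def build_single_view(records: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Merge per-source records into one document per CustomerID.

    Each record is a dict with a "source" key plus arbitrary fields.
    Common fields go at the top level; every other field is a dynamic
    field stored under the (hypothetical) name of the source that sent it.
    """
    views: dict[str, dict[str, Any]] = defaultdict(lambda: {"sources": {}})
    for record in records:
        doc = views[record["CustomerID"]]
        # Copy common fields; a later record overwrites an earlier one.
        for field in COMMON_FIELDS:
            if field in record:
                doc[field] = record[field]
        # Everything else is dynamic: new fields need no schema change.
        dynamic = {
            k: v for k, v in record.items()
            if k not in COMMON_FIELDS and k != "source"
        }
        doc["sources"].setdefault(record["source"], {}).update(dynamic)
    return dict(views)


if __name__ == "__main__":
    feed = [
        {"source": "web", "CustomerID": "c1", "eMail": "a@example.com",
         "action": "viewed_quote"},
        {"source": "crm", "CustomerID": "c1", "eMail": "a@example.com",
         "location": "Leeds"},
        {"source": "mobile", "CustomerID": "c2", "eMail": "b@example.com"},
    ]
    for cid, doc in build_single_view(feed).items():
        print(cid, doc)
```

Nesting dynamic fields per source means a new field from one source cannot collide with a same-named field from another; flattening them instead is an equally valid design choice that the slide does not prescribe.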

Single View

Solution: Aggregate With A Dynamic Schema

• Flexible data model
• Rich query, aggregation, search & reporting
• High availability
• Predictable scalability
• Flexible deployment model

Single View – Required Database Capabilities

Single View – High Level Data Flow

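Diagram: sources (Web App, CRM App, Mainframe System) load data in batch or real time, through an Update Queue and Validation, into Documents in the single view. Consumers (Customer Service App, Churn Analytics, Risk Model) get Real-Time Access, and the database performs in-database operations such as Group, Filter, Sort, Count, Average, Deviations, …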
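The in-database operations named in the diagram correspond to MongoDB aggregation stages and operators such as `$match` (filter), `$group`, `$sort`, `$count`, `$avg` and `$stdDevPop`. The sketch below only mimics such a pipeline in plain Python over a list of documents, preceded by a simple validation step on incoming updates; the field names (`region`, `premium`) and the validation rule are hypothetical.

```python
import statistics
from typing import Any

# Hypothetical validation rule: required field -> accepted type(s).
REQUIRED = {"CustomerID": str, "region": str, "premium": (int, float)}


def validate(doc: dict[str, Any]) -> bool:
    """Reject updates that lack a required field or have the wrong type."""
    return all(isinstance(doc.get(k), t) for k, t in REQUIRED.items())


def premium_by_region(docs: list[dict[str, Any]],
                      min_premium: float = 0.0) -> list[dict[str, Any]]:
    """Filter -> group -> count/average/deviation -> sort, like a pipeline."""
    # Filter (cf. $match): keep valid documents at or above a threshold.
    matched = [d for d in docs if validate(d) and d["premium"] >= min_premium]
    # Group (cf. $group on region).
    groups: dict[str, list[float]] = {}
    for d in matched:
        groups.setdefault(d["region"], []).append(float(d["premium"]))
    # Count, average and population standard deviation per group.
    rows = [
        {
            "region": region,
            "count": len(vals),
            "avg": statistics.fmean(vals),
            "stddev": statistics.pstdev(vals),
        }
        for region, vals in groups.items()
    ]
    # Sort (cf. $sort): highest average premium first.
    return sorted(rows, key=lambda r: r["avg"], reverse=True)


if __name__ == "__main__":
    docs = [
        {"CustomerID": "c1", "region": "North", "premium": 100},
        {"CustomerID": "c2", "region": "North", "premium": 300},
        {"CustomerID": "c3", "region": "South", "premium": 150},
        {"CustomerID": "c4", "region": "South"},  # invalid: no premium
    ]
    for row in premium_by_region(docs):
        print(row)
```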

• Flexible data model
• Rich query, aggregation, search & reporting
• High availability
• Predictable scalability
• Flexible deployment model

Why MongoDB for Single View?

Single View of Customer: insurance leader generates coveted single view of customers in 90 days with “The Wall”

Problem

• No single view of customer, leading to poor customer experience and churn
• 145 years of policy data, 70+ systems, 24 toll-free 800 numbers, 15+ front-end apps that are not integrated
• Spent 2 years and $25M trying to build a single view with Oracle, and failed

Solution

• Built “The Wall,” pulling in disparate data and serving a single view to customer service reps in real time

Why MongoDB

• Flexible data model to aggregate disparate data into a single data store
• Expressive query language and secondary indexes to serve any field in real time

Results

• Prototyped in 2 weeks
• Deployed to production in 90 days
• Decreased churn and improved ability to upsell/cross-sell

Operationalised Data Lake

• Centralised repository for data collected from operational systems

• Exploratory analytics

• Extension of EDW: often based on Hadoop

• 50% of organisations invested in data lakes*

* Gartner

What is a Data Lake?

Image courtesy of Cloudera

Data Warehouse/Data Lake Challenges

http://www.infoworld.com/article/2980316/big-data/why-your-big-data-strategy-is-a-bust.html

“Thru 2018, 70 percent of Hadoop deployments will not meet cost savings and revenue generation objectives due to skills and integration challenges.”

Nick Heudecker, Research Director, Data Management & Integration

• Unify analytics with operational applications

• Create smart, contextually aware, data-driven apps & insights

• Integrate operational database with data lake

How To Avoid Being In The 70%

• Smart/native integration with the data lake

• Powerful real-time analytics

• Flexible, governed data model

• Scale with the data lake

• Sophisticated management & security

• MongoDB provides all these capabilities

Operational Database Requirements

Diagram: Sensors, User Data, Clickstreams and Logs flow through a Message Queue. Raw Data lands in the data lake (Hadoop/HDFS), where Distributed Processing Frameworks perform Batch Processing to produce Batch Views such as Churn Analysis, Enriched Customer Profiles, Risk Modeling and Predictive Analytics. Processed Events go to the operational database (MongoDB), which gives Customer Data Mgmt, the Mobile App, the IoT App and Live Dashboards Real-Time Access.

• Operational database: millisecond latency; expressive querying & flexible indexing against subsets of data; updates in place; in-database aggregations & transformations
• Data lake: multi-minute latency with scans across TB/PB of data; no indexes; data stored in 128MB blocks; write-once-read-many & append-only storage model
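To make the contrast between the two tiers concrete, here is a toy model, not MongoDB or HDFS code: the operational store keeps one mutable document per key plus a secondary index, so a lookup touches only matching documents and an update happens in place; the lake store only appends records and answers the same question by scanning everything. The class names, the indexed field and the sample data are hypothetical.

```python
from typing import Any


class OperationalStore:
    """Toy operational tier: mutable documents plus one secondary index."""

    def __init__(self, indexed_field: str) -> None:
        self.docs: dict[str, dict[str, Any]] = {}
        self.indexed_field = indexed_field
        self.index: dict[Any, set[str]] = {}

    def upsert(self, key: str, fields: dict[str, Any]) -> None:
        doc = self.docs.setdefault(key, {})
        old = doc.get(self.indexed_field)
        doc.update(fields)  # update in place, no rewrite of other documents
        new = doc.get(self.indexed_field)
        if old != new:  # keep the secondary index in sync
            if old is not None:
                self.index[old].discard(key)
            if new is not None:
                self.index.setdefault(new, set()).add(key)

    def find(self, value: Any) -> list[dict[str, Any]]:
        # Index lookup: only documents listed under this value are touched.
        return [self.docs[k] for k in sorted(self.index.get(value, ()))]


class LakeStore:
    """Toy data-lake tier: append-only records, answered by full scans."""

    def __init__(self) -> None:
        self.records: list[dict[str, Any]] = []

    def append(self, record: dict[str, Any]) -> None:
        self.records.append(dict(record))  # write once, never modified

    def scan(self, field: str, value: Any) -> tuple[list[dict[str, Any]], int]:
        # No index: every record is read; returns matches and records scanned.
        matches = [r for r in self.records if r.get(field) == value]
        return matches, len(self.records)


if __name__ == "__main__":
    ops = OperationalStore("segment")
    ops.upsert("c1", {"segment": "gold", "score": 0.2})
    ops.upsert("c1", {"segment": "silver", "score": 0.7})  # changed in place
    print(ops.find("silver"))
    lake = LakeStore()
    for seg in ("gold", "silver", "silver"):
        lake.append({"segment": seg})
    print(lake.scan("segment", "silver"))
```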

Design Pattern: Operationalised Data Lake


Configure where to land incoming data
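One way to express “configure where to land incoming data” is a routing table keyed by event type: every event is kept in the data lake as raw data, and only the types the operational apps need are also sent to the operational database. This is a hypothetical sketch in plain Python; the event types, the sink names and the rule that all raw data goes to the lake are assumptions, and real writes to a queue or database are replaced by in-memory lists.

```python
from typing import Any

# Hypothetical routing table: event type -> sinks in addition to the lake.
ROUTES: dict[str, set[str]] = {
    "quote_requested": {"operational_db"},
    "email_clicked": {"operational_db"},
    "page_viewed": set(),  # analytics only: lake, not operational DB
}
# Assumption: every raw event lands in the lake regardless of type.
DEFAULT_SINKS = {"data_lake"}


def route(event: dict[str, Any]) -> set[str]:
    """Return the set of sinks an event should be written to."""
    return DEFAULT_SINKS | ROUTES.get(event.get("type", ""), set())


def dispatch(events: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Group events by sink, as a stand-in for writing to real stores."""
    out: dict[str, list[dict[str, Any]]] = {"data_lake": [], "operational_db": []}
    for event in events:
        for sink in sorted(route(event)):
            out.setdefault(sink, []).append(event)
    return out


if __name__ == "__main__":
    batch = [{"type": "page_viewed"}, {"type": "email_clicked"}]
    for sink, landed in dispatch(batch).items():
        print(sink, landed)
```

Keeping the routing in data rather than code means changing where an event type lands is a configuration change, which fits the slide's framing of this step as configuration.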


Raw data processed to generate analytics models
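As a stand-in for the distributed batch job, the sketch below scans raw events in one pass and derives a per-customer profile: event count, days since last activity and a toy churn-risk flag. The event fields, the scoring rule and the 30-day threshold are invented for illustration; in the architecture above this kind of aggregation would run in a distributed processing framework over the data lake.

```python
from datetime import date
from typing import Any


def build_profiles(raw_events: list[dict[str, Any]], today: date,
                   inactive_days: int = 30) -> dict[str, dict[str, Any]]:
    """Batch view: one profile per customer computed from raw events."""
    profiles: dict[str, dict[str, Any]] = {}
    for e in raw_events:  # full scan, as in the append-only lake tier
        p = profiles.setdefault(e["customer"], {"events": 0, "last_seen": None})
        p["events"] += 1
        if p["last_seen"] is None or e["day"] > p["last_seen"]:
            p["last_seen"] = e["day"]
    for p in profiles.values():
        p["days_inactive"] = (today - p["last_seen"]).days
        # Toy rule (assumption): long inactivity means higher churn risk.
        p["churn_risk"] = "high" if p["days_inactive"] > inactive_days else "low"
    return profiles


if __name__ == "__main__":
    events = [
        {"customer": "c1", "day": date(2017, 1, 2)},
        {"customer": "c1", "day": date(2017, 3, 30)},
        {"customer": "c2", "day": date(2017, 1, 15)},
    ]
    for cid, profile in build_profiles(events, today=date(2017, 4, 1)).items():
        print(cid, profile)
```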


MongoDB exposes analytics models to operational apps and handles real-time updates
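A sketch of the serving side, with a plain dict standing in for the operational database: model output computed in the lake is upserted into each customer's document under its own key, and real-time events then update live fields of the same document in place, so an app always reads one current record. The function names and fields are hypothetical; against MongoDB itself the equivalent would typically be an update with `$set` and `upsert=True` (for example PyMongo's `update_one`), which is not shown here.

```python
from typing import Any

Store = dict[str, dict[str, Any]]  # customer id -> document (in-memory stand-in)


def publish_model(store: Store, profiles: dict[str, dict[str, Any]]) -> None:
    """Upsert batch model output (cf. $set with upsert) under a 'model' key."""
    for cid, profile in profiles.items():
        # Only the 'model' key is replaced; live fields are left untouched.
        store.setdefault(cid, {"_id": cid})["model"] = dict(profile)


def apply_event(store: Store, cid: str, event: dict[str, Any]) -> None:
    """Real-time update in place: touch only the live fields."""
    doc = store.setdefault(cid, {"_id": cid})
    doc["last_action"] = event["action"]
    doc["actions_today"] = doc.get("actions_today", 0) + 1


if __name__ == "__main__":
    store: Store = {}
    publish_model(store, {"c1": {"churn_risk": "high"}})
    apply_event(store, "c1", {"action": "viewed_offer"})
    print(store["c1"])
```

Separating model output from live fields is one way to let a new batch run refresh the model without overwriting updates that arrived in real time.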


Compute new models against MongoDB & HDFS

UK’s Leading Price Comparison Site: out-pacing Internet search giants with a continuous delivery pipeline powered by microservices & Docker, running MongoDB, Kafka and Hadoop in the cloud

Problem

• Existing EDW with nightly batch loads
• No real-time analytics to personalize the user experience
• Application changes broke the ETL pipeline
• Unable to scale as services expanded

Solution

• Microservices architecture running on AWS
• All application events written to a Kafka queue, then routed to MongoDB and Hadoop
• Events that personalize the real-time experience (e.g. triggering an email send, additional questions, offers) written to MongoDB
• All event data aggregated with other data sources and analyzed in Hadoop; updated customer profiles written back to MongoDB

Results

• 2x faster delivery of new services after migrating to the new architecture
• Enabled continuous delivery: pushing new features every day
• Personalized user experience, plus higher uptime and scalability

Data-as-a-Service

• Development agility

• Data re-use

• Operational efficiency

• Corporate governance and data lineage

• Cost accountability

Standardising the Database Environment

Diagram: App1, App2 and App3 call an API Access Layer over Operational Data, organised into data domains (Customers, Products, Accounts, Transactions), on top of the Physical Infrastructure.

• Shared, multi-tenant database accessible via a common API
• Exposes CRUD, search, geospatial, graph and analytics operations
• Each data domain isolated into its own replica set
• Logically managed as one service, with a UI for self-service provisioning & scaling
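The sketch below illustrates the shape of such an API access layer: apps call one common CRUD interface, and the layer maps each data domain to its own isolated backend, standing in for a dedicated replica set. The domain list comes from the diagram; the `mongodb://` connection strings, host names, class names and in-memory backends are hypothetical placeholders, and no database driver is used.

```python
from typing import Any, Optional

# Hypothetical mapping: each data domain gets its own replica set.
DOMAINS = {
    "customers": "mongodb://cust-a,cust-b,cust-c/?replicaSet=rs-customers",
    "products": "mongodb://prod-a,prod-b,prod-c/?replicaSet=rs-products",
    "accounts": "mongodb://acct-a,acct-b,acct-c/?replicaSet=rs-accounts",
    "transactions": "mongodb://txn-a,txn-b,txn-c/?replicaSet=rs-transactions",
}


class DomainBackend:
    """In-memory stand-in for one isolated replica set."""

    def __init__(self, uri: str) -> None:
        self.uri = uri  # would be used to connect in a real deployment
        self.docs: dict[str, dict[str, Any]] = {}


class DataService:
    """Common API layer shared by all apps (multi-tenant)."""

    def __init__(self, domains: dict[str, str]) -> None:
        self.backends = {name: DomainBackend(uri) for name, uri in domains.items()}

    def _backend(self, domain: str) -> DomainBackend:
        # Every request is routed by domain; unknown domains are rejected.
        if domain not in self.backends:
            raise KeyError(f"unknown data domain: {domain}")
        return self.backends[domain]

    def create(self, domain: str, key: str, doc: dict[str, Any]) -> None:
        self._backend(domain).docs[key] = dict(doc)

    def read(self, domain: str, key: str) -> Optional[dict[str, Any]]:
        return self._backend(domain).docs.get(key)

    def update(self, domain: str, key: str, fields: dict[str, Any]) -> None:
        self._backend(domain).docs[key].update(fields)

    def delete(self, domain: str, key: str) -> None:
        self._backend(domain).docs.pop(key, None)


if __name__ == "__main__":
    svc = DataService(DOMAINS)
    svc.create("customers", "c1", {"name": "Ann"})
    svc.update("customers", "c1", {"tier": "gold"})
    print(svc.read("customers", "c1"))
```

Keeping the domain-to-backend map in one place is what lets the service be managed logically as one while each domain stays physically isolated; authentication, quotas and per-tenant cost tracking are omitted.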

Data-as-a-Service High Level Architecture

Wrapping Up

Patterns for Modern Data Architectures

Challenges: Existing Systems Overwhelmed, Growth in Data Silos, Lack of Real-Time Insight

Patterns: Single View, Data-as-a-Service, Operationalised Data Lake