Develop a Custom Data Solution Architecture with NorthBay


Upload: amazon-web-services

Post on 14-Apr-2017


TRANSCRIPT

Page 1: Develop a Custom Data Solution Architecture with NorthBay

“Teaching Old Data New Tricks™”

Brian Barker • CEO • NorthBay Solutions

John Puopolo • SVP • Engineering • Eliza Corporation

Ali Khan • Director, Business Intelligence and Analytics • Scholastic

Sai Reddy Thangirala • Solutions Architect • Amazon Web Services

Page 2: Develop a Custom Data Solution Architecture with NorthBay

Agenda

• Big Data on AWS
• NorthBay
• Eliza Corporation Case Study
    • Challenges Eliza Faced
    • Strategic Goals
    • Why a Data Lake Approach was Chosen
    • Outcomes & Benefits Eliza Achieved
• Scholastic Case Study
    • Challenges
    • Goals
    • The AWS/NorthBay Decision
    • How the Initiative Unfolded
    • Key Learnings

Page 3: Develop a Custom Data Solution Architecture with NorthBay

Data is Growing

1.7MB of new data will be created every second for every human being on the planet by 2020

http://www.whizpr.be/upload/medialab/21/company/Media_Presentation_2012_DigiUniverseFINAL1.pdf

58% compound annual growth rate forecasted for the Hadoop market, surpassing $1 billion by 2020

http://www.ap-institute.com/big-data-articles/big-data-what-is-hadoop-%E2%80%93-an-explanation-for-absolutely-anyone.aspx

http://www.marketanalysis.com/?p=279

<0.5% of all data is ever analyzed and used at the moment

http://www.technologyreview.com/news/514346/the-data-made-me-do-it/

Page 4: Develop a Custom Data Solution Architecture with NorthBay

Big Data Is for Everyone

The market for Big Data technologies is growing more than six times faster than the information technology market as a whole….

…and those companies who use their data well win.

Page 5: Develop a Custom Data Solution Architecture with NorthBay

Why AWS for Big Data?

Immediately Available

Broad and Deep Capabilities

Trusted and Secure

Scalable

Page 6: Develop a Custom Data Solution Architecture with NorthBay

AWS Provides the Most Complete Platform for Big Data

It’s easy to get data to AWS, store it securely, and analyze it with the engine of your choice, without any long-term commitment or vendor lock-in

• Collect: Import/Export, Snowball, Direct Connect, VM Import/Export
• Store: Amazon S3, EMR, Amazon Glacier, Amazon Redshift, DynamoDB, Aurora
• Analyze: Amazon Kinesis, Lambda, EMR, EC2

Page 7: Develop a Custom Data Solution Architecture with NorthBay

What Can You Do With Big Data on AWS?

• Big Data Repositories
• Clickstream Analysis
• ETL Offload
• Machine Learning
• Online Ad Serving
• BI Applications

Page 8: Develop a Custom Data Solution Architecture with NorthBay

“Teaching Old Data New Tricks™” with NorthBay

Page 9: Develop a Custom Data Solution Architecture with NorthBay

“Teaching Old Data New Tricks™”

Untapped wealth: companies gain tremendous leverage when “Teaching Old Data New Tricks™”

• So what does that mean?
• You’ll hear 2 exciting customer examples/use cases presented today

Customer Examples/Use Cases

• Building a HIPAA-compliant data lake
• Re-tooling old on-premises technology on the fly

Page 10: Develop a Custom Data Solution Architecture with NorthBay

Scholastic Preview of Coming Attractions

• How did an old-school, $1.5B, 100-year-old company re-invent its old-school IBM- and Microsoft-based big data and analytics systems on the fly?
• What was their starting point?
• What factors did they consider when making their decision?
• What did they decide on for technology and partners, and why?
• How did they implement?
• What were the results?
• Lessons learned?

Page 11: Develop a Custom Data Solution Architecture with NorthBay

AWS & NorthBay Background

Global Provider of Big Data Solutions

250+ Full-time professionals

145+ Clients

200+ Solutions launched

Page 12: Develop a Custom Data Solution Architecture with NorthBay

Conceptual Data Lake Architecture

Page 13: Develop a Custom Data Solution Architecture with NorthBay

Eliza Preview of Coming Attractions

• How does a high-flying healthcare services company re-platform its Enterprise Data Platform while processing millions of 'interactions' every day?
• Why the need to change?
• What strategic goals had to be achieved?
• What is so tough about "name-value pairs"?
• Why a Data Lake and why NorthBay?
• Which AWS services were chosen to leverage?
• What did they decide on for technology and partners, and why?
• How did it turn out?
• What did they learn?

Page 14: Develop a Custom Data Solution Architecture with NorthBay

Eliza Corporation

John Puopolo, SVP, Engineering, Eliza Corporation

Page 15: Develop a Custom Data Solution Architecture with NorthBay

About Eliza Corporation

• Founded 2000
• Leader in Health Engagement Management (HEM) outreach services
• Hundreds of millions of outreaches for intensive operation and analytics processing
• High-volume semi-structured data, complex business flow of data
• Variety of analytics/consumption needs ranging from portal for customers to ML workloads

Page 16: Develop a Custom Data Solution Architecture with NorthBay

Challenges Eliza Faced

Eliza Corporation analyzes more than 300 million interactions per year

Outreach questions and responses form a decision tree, and each question and its response are captured as a pair, e.g., <question, response> = <“Did you visit your physician in the last 30 days?”, “Yes”>

Diverse downstream consumption requirements

Challenging to process and analyze data
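
To make the <question, response> structure concrete, here is a minimal Python sketch of how outreach answers stored as name-value rows might be pivoted into one record per interaction. The field names (`interaction_id`, `visited_physician_30d`, `refill_needed`) and the `pivot_interactions` helper are illustrative assumptions, not Eliza's actual data model.

```python
from collections import defaultdict

# Hypothetical name-value rows: one row per <question, response> pair.
# Field names and values are illustrative only.
pairs = [
    {"interaction_id": 1, "question": "visited_physician_30d", "response": "Yes"},
    {"interaction_id": 1, "question": "refill_needed", "response": "No"},
    {"interaction_id": 2, "question": "visited_physician_30d", "response": "No"},
]


def pivot_interactions(rows):
    """Group name-value rows by interaction and pivot them into wide records."""
    wide = defaultdict(dict)
    for row in rows:
        wide[row["interaction_id"]][row["question"]] = row["response"]
    return dict(wide)


records = pivot_interactions(pairs)
print(records[1])
print(records[2])
```

Because the questions form a decision tree, different interactions answer different questions, so the pivoted records are sparse: interaction 2 in this sample never reaches the refill question. That sparsity is one reason a fixed relational schema can be awkward for this kind of data.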

Page 17: Develop a Custom Data Solution Architecture with NorthBay

Strategic Goals

Create next generation data architecture

Decouple Storage and Compute

Ability to process old & new data streams

Achieve HIPAA compliance

Ingest & store original datasets

Allow both real-time & batch processing

Enable access through entitlements and governance

Increase self-service for end-users

Page 18: Develop a Custom Data Solution Architecture with NorthBay

Conceptual Data Lake Architecture

[Architecture diagram. Stages: Data Sources & Ingestion; Processing & Storage; Consumption & Analytics]

Components shown:

• Streaming Data Sources
• Batch Data Sources
• Data Lake Storage
• Data Lake Archive
• ETL
• EDW
• Catalog & Search & Data Discovery
• API & UI
• Entitlements & Authorizations
• Data Quality & Governance
• Data System Analytics (Lineage, Profiling)
• Monitoring, auditing, management, and alerting
• Real Time Analytics
• BI tools
• BI UI
• Hadoop (Shared services)
• Hadoop, SAS (Business Unit Dedicated)
• Business Units

Page 19: Develop a Custom Data Solution Architecture with NorthBay

Benefits of the New Enterprise Data Platform Architecture

• Hub & spoke model for one original copy of all enterprise analytics data
• Quality layer for consistent transformations and cleansing of data
• Governance layer for entitlements and security management
• Enable multiple consumption patterns called projections
• A purpose-designed schema for an Enterprise Data Warehouse (Redshift) for efficient reporting of known queries
• Streamlined and automated ingestion of source batch and streaming data, reducing human/manual touch points

Page 20: Develop a Custom Data Solution Architecture with NorthBay

Technical Architecture

Page 21: Develop a Custom Data Solution Architecture with NorthBay

Major AWS Services Used

• Aurora
• Kinesis + Kinesis Streams
• Amazon Redshift
• DynamoDB
• Hive, Presto, Spark on EMR
• CloudSearch, EC2

Page 22: Develop a Custom Data Solution Architecture with NorthBay

Benefits of a New Enterprise Data Platform

• Streamlined data load process by enabling schema on read

• Improved business agility by allowing schema on read

• Improved ability to manage costs by allowing separation of costs

• Provided ability to enable resources to scale on-demand

• Reduced end-to-end client analytics time
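
As a rough illustration of the schema-on-read idea mentioned above, the Python sketch below keeps raw events exactly as received and applies a schema only when a consumer reads them. The sample events, field names, and the `read_with_schema` helper are assumptions made for this example and do not come from the presentation.

```python
import json

# Raw events are stored exactly as received (hypothetical sample data).
RAW_LINES = [
    '{"member": "a1", "question": "visited_physician_30d", "response": "Yes"}',
    '{"member": "b2", "question": "refill_needed", "response": "No", "channel": "sms"}',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time: pick and cast only the requested fields."""
    for line in lines:
        raw = json.loads(line)
        # Fields missing from a raw event become None instead of failing the load.
        yield {name: (cast(raw[name]) if name in raw else None)
               for name, cast in schema.items()}


# Two consumers read the same raw data with different schemas.
reporting_schema = {"member": str, "response": str}
channel_schema = {"member": str, "channel": str}

reporting_rows = list(read_with_schema(RAW_LINES, reporting_schema))
channel_rows = list(read_with_schema(RAW_LINES, channel_schema))
```

Nothing has to be reloaded when a new field such as `channel` starts appearing in the source: existing readers ignore it, and a new reader can ask for it.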

Page 23: Develop a Custom Data Solution Architecture with NorthBay

Key Learnings

• The nature of our data is name-value. We were doing too many transformations due to our original storage formats.

• Using mini-PoCs to form hypotheses and prove/disprove them led to an emergent architecture, which pointed us towards a data lake

• A data lake architecture fits our core business and growth plans extremely well

Page 24: Develop a Custom Data Solution Architecture with NorthBay

Scholastic

Ali Khan, Director, Business Intelligence and Analytics, Scholastic

Page 25: Develop a Custom Data Solution Architecture with NorthBay

About Scholastic

• $1.6B in annual revenue. The world’s largest publisher and distributor of children’s books
• #1 website for U.S. elementary school teachers
• 8,400+ employees globally
• 165+ countries
• 45+ languages
• A leader in comprehensive educational solutions

Page 26: Develop a Custom Data Solution Architecture with NorthBay

Existing Platform & Challenges

• We taught old data new tricks

• IBM AS/400 was primary data warehouse platform, supplemented by Microsoft SQL Server to enable business intelligence

• 5,500+ AS/400 workloads, 350+ SQL Server workloads

• Inflexible architecture – slow time to market

• Unable to meet internal SLAs due to performance of daily ETL processes

• Scalability limitations with SQL Server Analysis Services (SSAS) for dashboards/reports

• Limited ability to perform self-service business intelligence


Page 27: Develop a Custom Data Solution Architecture with NorthBay

Project Goals

Improve performance, scalability, availability, logging, security

Enable self-service business intelligence

Integrate with existing technology stack

Align with the tech strategy (DevOps model, Cloud First)

Leverage the skill set of current team (SQL/relational)

Team up with an experienced partner

Page 28: Develop a Custom Data Solution Architecture with NorthBay

The Decision

• AWS was chosen because of agility, scalability, elasticity, security and alignment with corporate strategy

• Redshift was chosen to replace AS400 and SQL Server for its relational-style high performance data store

• NorthBay was chosen for their expertise in Big Data and Amazon Redshift migrations

Page 29: Develop a Custom Data Solution Architecture with NorthBay

Pilot Plans

Migrate a functional area in a key business unit during a 3-month pilot

Demonstrate immediate business value

Stand up the AWS environment to allow IT to gain competence with AWS

Page 30: Develop a Custom Data Solution Architecture with NorthBay

Pilot Outcomes

Create core framework for migration

Implement ELT architecture and perform validation

Establish visualization/self-service capability through Tableau
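
The ELT pattern named in the pilot outcomes (load the extracted data unchanged, then transform and validate it with SQL inside the database) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the warehouse, and the table names `staging_orders` and `fact_sales` and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging_orders (order_id TEXT, amount TEXT, region TEXT)")
conn.execute("CREATE TABLE fact_sales (region TEXT PRIMARY KEY, total_amount REAL)")

# Load: copy the extracted rows into staging unchanged (everything is text).
extracted = [("1", "10.50", "east"), ("2", "4.25", "east"), ("3", "7.00", "west")]
conn.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)", extracted)

# Transform: cast and aggregate inside the database with SQL.
conn.execute("""
    INSERT INTO fact_sales (region, total_amount)
    SELECT region, SUM(CAST(amount AS REAL))
    FROM staging_orders
    GROUP BY region
""")

# Validate: totals in staging should reconcile with totals in the fact table.
staged_total = conn.execute(
    "SELECT SUM(CAST(amount AS REAL)) FROM staging_orders").fetchone()[0]
fact_total = conn.execute("SELECT SUM(total_amount) FROM fact_sales").fetchone()[0]
facts = dict(conn.execute("SELECT region, total_amount FROM fact_sales"))
```

Keeping the transformation in SQL fits a team whose skills are SQL/relational, and the untouched staging table gives the validation step something to reconcile against.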

Page 31: Develop a Custom Data Solution Architecture with NorthBay

Technical Architecture

[Architecture diagram. Components shown:]

• AS400 / DB2 (Source DB)
• EMR Cluster running Sqoop Script
• Output Bucket
• EC2 Instance running Copy Command
• Redshift (Staging)
• Redshift (Data Warehouse)
• Tableau (Reporting Tool)
• Data Pipeline
• SNS Topic (Pipeline Status, Pipeline Failure)
• SNS Email Notification
• Lambda (Save Pipeline Stats)
• RDS MySQL Instance (Save Pipeline Stats)
• DynamoDB (Pipeline Configurations)
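
The load step in this architecture (extracted files land in an output bucket, then a copy command loads them into Redshift staging) could look roughly like the Python sketch below, which only builds the text of a Redshift `COPY` statement. The table name, bucket path, IAM role ARN, delimiter, and the `build_copy_statement` helper are all hypothetical; the presentation does not show the actual command.

```python
def build_copy_statement(table, s3_prefix, iam_role_arn, delimiter="|"):
    """Return a Redshift COPY statement that loads delimited files from S3."""
    # Reject embedded quotes so the values cannot break out of the SQL literals.
    for value in (table, s3_prefix, iam_role_arn, delimiter):
        if "'" in value:
            raise ValueError(f"unexpected quote in {value!r}")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"DELIMITER '{delimiter}';"
    )


# All identifiers below are placeholders for illustration.
statement = build_copy_statement(
    table="staging.orders",
    s3_prefix="s3://example-output-bucket/orders/2017-04-14/",
    iam_role_arn="arn:aws:iam::123456789012:role/example-redshift-load",
)
print(statement)
```

The statement would be run by whatever client connects to the cluster; this sketch deliberately stops at building the string, so it makes no assumption about the driver used.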

Page 32: Develop a Custom Data Solution Architecture with NorthBay

Core Framework

• Jobs and job groups are defined as metadata in DynamoDB
• Control-M Scheduler, Custom Application and Data Pipeline for orchestration
• ELT process: EMR/Sqoop for extraction, then load into Redshift and transform the data through SQL scripts
• Core Framework allows for
    • Restart capability from point of failure
    • Capturing of operational statistics (# of rows updated)
    • Audit capability (which feed caused the fact to change)
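
A minimal Python sketch of the restart-from-point-of-failure and operational-statistics ideas listed for the core framework follows. It is an illustration only: plain dictionaries stand in for the job metadata and checkpoint stores, and the job names, the `run_group` function, and the simulated failure are assumptions rather than the framework's actual design.

```python
# In-memory stand-ins for the metadata and checkpoint stores (hypothetical).
JOB_GROUPS = {"daily_sales": ["extract_orders", "load_staging", "transform_facts"]}
checkpoints = {}  # job group -> index of the next job to run
stats = []        # operational statistics, one entry per completed job


def run_group(group, run_job):
    """Run a job group in order, resuming from the last recorded checkpoint."""
    jobs = JOB_GROUPS[group]
    start = checkpoints.get(group, 0)
    for index in range(start, len(jobs)):
        rows = run_job(jobs[index])      # may raise; the checkpoint stays put
        stats.append({"job": jobs[index], "rows": rows})
        checkpoints[group] = index + 1   # advance only after success
    checkpoints.pop(group, None)         # group finished; next run starts fresh


attempts = []


def flaky_job(name):
    """Simulated job: the load step fails the first time it is attempted."""
    attempts.append(name)
    if name == "load_staging" and attempts.count(name) == 1:
        raise RuntimeError("simulated load failure")
    return 100  # pretend row count


try:
    run_group("daily_sales", flaky_job)
except RuntimeError:
    pass  # the first run stops at load_staging

run_group("daily_sales", flaky_job)  # the second run resumes at load_staging
```

In this sample the extract step runs once even though the group is started twice, because the checkpoint is advanced only after a job succeeds.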

Page 33: Develop a Custom Data Solution Architecture with NorthBay

Data Visualization Through Tableau

• Business users have access to facts/dimensions for standard reports through Tableau

• Power users have access to Staging tables for Ad-Hoc queries through Tableau

• Data Scientists have access to Files in S3 (from all extracts serving as Data Archive) using Hive and/or Presto


Page 34: Develop a Custom Data Solution Architecture with NorthBay

Accelerating the Program Timeline


• CTO moved budget forward to:

• Reduce project timeline by 50%

• Eliminate overhead of 2 platforms

• Parallel work streams (swim lanes) utilized the same core framework for migrating data for other business units

• NorthBay partners with each of those work streams to accelerate migration

• Users wanted to be on the new platform sooner

Page 35: Develop a Custom Data Solution Architecture with NorthBay

Lessons Learned - Technology

Isolate core framework with project specific code repositories

Make appropriate schema changes when migrating to new platform

Customize the framework for gathering operational stats (e.g., # of rows loaded)

Start with test automation tools and Acceptance Test Driven Development (ATDD) earlier in the project

Page 36: Develop a Custom Data Solution Architecture with NorthBay

Lessons Learned – Program Execution

Creating new data platforms and migrating data into them is easy, especially with AWS. Decommissioning existing data platforms is hard!

“Data Champion” / “Data Guide” partnership absolutely critical for successful adoption of new platforms and working models

Importance of strong Agile coaches while scaling out Agile teams

Page 37: Develop a Custom Data Solution Architecture with NorthBay

Questions & Answers

Brian Barker • CEO • NorthBay Solutions

John Puopolo • SVP • Engineering • Eliza Corporation

Ali Khan • Director, Business Intelligence and Analytics • Scholastic

Sai Reddy Thangirala • Solutions Architect • Amazon Web Services

www.northbaysolutions.com