hybrid data platform


Upload: hadoop-summit

Post on 07-Jan-2017


TRANSCRIPT

Page 1: Hybrid Data Platform

Shankar Radhakrishnan, Impetus

Hybrid Data Platform

Cloud Environment Connected with On-Premise Data Environment

Page 2: Hybrid Data Platform

About Me
• Director of Big Data Engineering with Impetus
• Focus on Enterprise data architecture, Data platform solution deployment, High Performance & Optimization
• Believer of “Data is the most important digital asset”

Page 3: Hybrid Data Platform

Need For Hybrid Data Platform
• Mixed workload scenarios on Hadoop
• Applications’ long-tail usage of data platforms
• More time spent on data preparation than on processing
• Time spent on data movement
• Geo-centric data processing and provisioning requirements
• Cost-effective solution options
• Untapped scale-up and scale-out capabilities of Cloud
• Limitations of a physical data center/platform setup

Page 4: Hybrid Data Platform

Hybrid Data Platform

“Combination of on-premise physical data infrastructure with Cloud based Big Data platform - to use as one extended, complementary, scalable data infrastructure”

Page 5: Hybrid Data Platform

Considerations
• Changes to current architecture
– Impact on on-premise infrastructure
– Impact on business processes
– Data availability and accessibility in the Cloud
• Impact on data exchange policy and procedures
– Data Characteristics
– Data at rest & in-motion
– Geographical considerations
• Data Security
• Virtual Cloud Geo-Fencing, Cloud Boundaries
• Investment considerations
– Technology Choices, Maturity and Adoption
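The data exchange policy and geo-fencing considerations above can be pictured as a simple rule check that runs before any dataset leaves the data center. The following is a minimal sketch, not the platform's actual mechanism: the sensitivity classes, region names, and the `ALLOWED_REGIONS` table are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

# Hypothetical policy table: the cloud regions each sensitivity class may
# be sent to. An empty set means the data stays on-premise.
ALLOWED_REGIONS = {
    "public": {"us-east", "eu-west", "ap-south"},
    "internal": {"us-east", "eu-west"},
    "sensitive": set(),
}


@dataclass(frozen=True)
class Dataset:
    name: str
    sensitivity: str  # one of the keys in ALLOWED_REGIONS


def may_move_to_cloud(dataset: Dataset, target_region: str) -> bool:
    """Return True if the policy table permits sending the dataset to the region."""
    # Unknown sensitivity classes are denied by default.
    allowed = ALLOWED_REGIONS.get(dataset.sensitivity, set())
    return target_region in allowed
```

Denying unknown classes by default keeps a missing or mistyped label from silently widening the fence.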

Page 6: Hybrid Data Platform

Hybrid Data Platform Architecture

[Figure: architecture diagram. Labels shown:]
• Data sources: Databases; Other Data Sources; Sensitive Data; Text Files, Binary Files
• Ingress: Smart Interface Layer; Security & Access Control
• Platforms: Hadoop On Cloud; On-Premise Hadoop Landing Zone; On-Premise Hadoop Data Lake; Integration Check-point (On-Prem/Cloud)
• Egress: Security & Access Control; Application Interfaces
• Consumers: 3rd Parties; Analytics; Data Scientists; Business
• Layers: Data Acquisition Layer; Data Integration Layer; Data Provisioning Layer
• Data Governance Layer: User Management; Access Audit and Control; Metadata Management; Data Security Management; BAR Management; DR Management; Workload Management; Key Management; Master Data Management; Data Quality Management; Operations Management

Page 7: Hybrid Data Platform

Data Integration

[Figure: data integration diagram. Labels shown:]
• Hadoop On Cloud; Job/Task Profiler
• On-Premise Hadoop Data Lake
• Integration Check-point (On-Prem/Cloud); Data Upload
• Workflow Organizer; Payload Organizer
• User Profile; Network Profile; Data Profile
• Private, Secured Tunnel; Transmission Channel; Security Checks
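The slide names a Payload Organizer alongside Data and Network Profiles but does not describe how it works. One plausible reading is that it groups files into transfer batches sized for the tunnel. The sketch below illustrates that reading only: `NetworkProfile`, its `max_payload_mb` field, and the greedy grouping rule are assumptions, not the presenter's design.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class NetworkProfile:
    # Hypothetical cap on how much data one transfer over the tunnel carries.
    max_payload_mb: int


def organize_payloads(
    files: List[Tuple[str, int]], network: NetworkProfile
) -> List[List[str]]:
    """Greedily group (name, size_mb) files, in order, into capped payloads.

    A single file larger than the cap is placed in a payload of its own.
    """
    payloads: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size_mb in files:
        # Close the current payload if adding this file would exceed the cap.
        if current and current_size + size_mb > network.max_payload_mb:
            payloads.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size_mb
    if current:
        payloads.append(current)
    return payloads
```

For example, with a 100 MB cap, files of 40, 50, 30 and 120 MB are grouped as `[["a", "b"], ["c"], ["d"]]`.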

Page 8: Hybrid Data Platform

Execution Workflow

[Figure: execution workflow diagram. Labels shown:]
• On-Premise Hadoop Data Lake; Payload Organizer
• Private, Secured Tunnel; Transmission Channel; Security Checks; Payload Delivery
• Cloud HSM; Identity & Access Management; Key Management Service; Certificate Manager
• S3 (Data Landing); Data Pipeline; SQS (Queue Service); SNS (Push Notification)
• Kinesis; EMR/MapReduce; Redshift Data warehouse; QuickSight

Page 9: Hybrid Data Platform

Data Exchange & Security

[Figure: the Data Center connects to the Cloud VPC over Direct Connect and a Secure Tunnel; the Cloud side shows Cloud HSM, Identity & Access Management, Key Management Service and Certificate Manager. Callouts 1 to 5 are explained below.]

1. On-premise Data Center hosts Hadoop Cluster and has connectivity established to the Cloud
2. Uses Direct Connect option to connect to the private Cloud setup
3. Uses secured VPN tunnel to the dedicated Cloud setup for data exchange
4. Hadoop on Cloud setup connected with data center, secured behind firewall and access restrictions
5. Role based access control, process execution privileges, Identity management
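The role based access control and process execution privileges in the last callout can be illustrated with a small lookup. This is a generic sketch of the idea, not the presenter's implementation or any cloud provider's API: every role name, user name and privilege below is made up.

```python
# Hypothetical role-to-privilege and user-to-role mappings.
ROLE_PRIVILEGES = {
    "data_engineer": {"upload_data", "run_job"},
    "data_scientist": {"run_job", "read_data"},
    "business_user": {"read_data"},
}

USER_ROLES = {
    "asha": {"data_engineer"},
    "ben": {"business_user"},
}


def is_allowed(user: str, action: str) -> bool:
    """Return True if any role held by the user grants the requested action."""
    roles = USER_ROLES.get(user, set())  # unknown users hold no roles
    return any(action in ROLE_PRIVILEGES.get(role, set()) for role in roles)
```

Privileges attach to roles rather than to individuals, so changing what a role may do takes effect for every user who holds it.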

Page 10: Hybrid Data Platform

Benefits
• Comprehensive Solution Options
– Modular and complementary data management options
• Flexibility
– Meets dynamic business and technology demands
• Performance and Scalability
– Scale up and out
• Best of both worlds
– Play to platform’s strengths
• Economic$
– Hybrid model provides best of TCO and ROI

Page 11: Hybrid Data Platform

Case Study
• One of the world's largest producers of commodities, natural ores, and conventional and unconventional energy resources, with suppliers and consumers as end users of data analytics
• Need to build a Hybrid Data Analytics Environment covering areas such as Productivity, Supply Chain and Operations
• Data to be loaded in less than 20 minutes
• 95% of analytics queries to run in less than 5 seconds
• Highly available environment with both on-premise and Cloud connectivity
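The query target in the case study (95% of queries under 5 seconds) is a simple ratio check over measured latencies. The sketch below shows that arithmetic. The function name and the sample latencies are illustrative assumptions; only the two default values come from the slide.

```python
from typing import Sequence


def meets_query_sla(
    latencies_s: Sequence[float],
    threshold_s: float = 5.0,
    required_fraction: float = 0.95,
) -> bool:
    """Return True if enough queries finished in under threshold_s seconds."""
    if not latencies_s:
        raise ValueError("at least one latency measurement is required")
    within = sum(1 for t in latencies_s if t < threshold_s)
    return within / len(latencies_s) >= required_fraction
```

With 97 of 100 queries under 5 seconds the target is met; with 18 of 20 (90%) it is not.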

Page 12: Hybrid Data Platform

Thank You!

@shankariyer www.linkedin.com/in/2shankar