big data ready enterprise

© 2016 WIPRO LTD | WWW.WIPRO.COM | CONFIDENTIAL 1 Big Data Ready Enterprise Sri Harsha Boda – Wipro Technologies

Upload: bigdata-meetup-kochi

Post on 08-Jan-2017

Category:

Data & Analytics


TRANSCRIPT

Page 1: Big data ready Enterprise

Big Data Ready Enterprise

Sri Harsha Boda – Wipro Technologies

Page 2: Big data ready Enterprise

Big Data Ready Enterprise

Page 3: Big data ready Enterprise

Agenda

1. About BDRE
2. Typical use cases around Big Data Platform
3. Common challenges when implementing at scale
4. How BDRE addresses the needs across the lifecycle
5. FastTrack Implementation using BDRE
6. Demo
7. Typical enterprise deployment view with BDRE
8. Metadata Management in depth

Page 4: Big data ready Enterprise

About BDRE

BDRE is an open source project released under the Apache License 2.0. The code is available on GitHub.

Wipro’s largest open source contribution to date.

Community choice winner of the modern data applications track – Hadoop Summit San Jose, 2016.

Page 5: Big data ready Enterprise

Typical use cases organizations are embarking on with Big Data Analytics

Information Delivery

Enterprise Data Hub / Lake

Information, Integration & Governance

Batch Data Processing

Event Stream & Micro batch Processing

Enterprise Data Provisioning Platform

Low Latency Store

Complex multistep pipeline transformation

Migration of EDW workloads

Data as a Service

Enterprise Analytical Platform

Page 6: Big data ready Enterprise

Common challenges when implementing these use cases at scale

Skilled resources, shorter implementation cycles

Rapid Ingestion of data

Rework across several complex multi-step processes

Robust application deployment support

Support flexible operations & SLA management

Robust operational metadata across technologies

Page 7: Big data ready Enterprise

Pluggable Architecture

Community Driven

Distribution Compatible

How BDRE addresses the needs across the lifecycle

Operational functions you would like to build

Development effort from scratch

Basic Hadoop – at the base

“Pre-built operational functions”

Brought in by BDRE

HADOOP

APPLICATIONS

Minimal development effort through customization on BDRE components

Supporting Operational Functions

OPERATIONAL METADATA
RAPID INGESTION
VISUAL DATA PIPELINE
AUTOMATED WORKFLOW
ONE TOUCH DEPLOYMENT
SLA MANAGEMENT
RICH VISUALIZATION

Value-add through BDRE

With BDRE Without BDRE

Implementation Jumpstart

Big Data Ready Enterprise

Page 8: Big data ready Enterprise

FastTrack Implementation using BDRE

Key features that can be rapidly implemented using the product

Data Ingestion via Multiple Sources

Abstraction layer: Component to ingest a variety of data (CPY, XML, DB, Mainframes)

Streaming Data Ingest – 16 sources, including Twitter, Flume, logs and message queues

File Monitoring: Component to check validity of incoming data at file and record level

Cluster to Cluster Hive Table Migration

Job Automation & Security Integration

UI based Workflow Designer

Supports Hive, Pig, Map Reduce, Spark, R, Python

Automated Workflow Generator – Oozie/Airflow

Authentication: Integration with Kerberos & JAAS

Data Quality and Data Profiling

Enforce Data Quality and Data processing rules (during ingestion or post ingestion)

DQ Analysis, Integrity & Failure Handling

Data Loading - Test Data Generation

One Touch Deployment

Automated central deployment and application management.

Registry of all workflow processes / templates

Automated Process flow Planner

Operational Metadata & Lineage

Job registry

Configuration management

Dependency management - Pipelining

Batch management/tracking

Real Time Execution status

Ingestion registry

Job monitoring and proactive/reactive alerting

Restartability

Analytics & Visualization

Support for Executing Models – R, Python, Spark

Zero Coding UI based configuration for common use cases

User Interface based metadata interaction & search

Data Exploration integration with notebooks

Visual Representation of workflow

Page 9: Big data ready Enterprise

DEMO

Page 10: Big data ready Enterprise

Typical enterprise deployment view with BDRE

NN (NameNode), RM (ResourceManager)

Browser

App Server

Eventing Framework

Espresso Email

Oozie Workflow Generator

Data Quality Workflow

Non-Hadoop Workflows

Ingestion Workflow

Semantic Workflow

Bulk Data Generation Workflow

Job Deploy Scripts

SLA notification

BDRE UI App

BDRE Rest API

App Server

JAAS

Edge Node

Operational Metadata

RDBMS Metastore

Rule Engine (for DQ)

Job

Job

Job

Job

Job

Hadoop Cluster

Proactive Reporting

App Store (Git Repo)

Job Export/Import

Page 11: Big data ready Enterprise

BDRE Metadata Management system

Page 12: Big data ready Enterprise

Intra and Inter Process Dependency

Pid    Enq id    Parent id
300    Null      Null
301    100       300
302    Null      300
303    Null      300
304    200       300

Figure: parent processes 100, 200, 300 and 400 with their sub-processes (101-103, 201-205, 301-304 and 401-402).
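The table on this slide can be read as a small data structure. The sketch below is only an illustration, not BDRE's actual schema: it assumes that "Parent id" links a sub-process to its parent (the intra-process dependency) and that "Enq id" names the upstream parent process whose output a sub-process consumes (the inter-process dependency). The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProcessRow:
    pid: int
    enq_id: Optional[int]     # assumed: upstream process feeding this sub-process
    parent_id: Optional[int]  # assumed: parent process; None for the parent itself


# Rows copied from the table on this slide.
ROWS = [
    ProcessRow(300, None, None),
    ProcessRow(301, 100, 300),
    ProcessRow(302, None, 300),
    ProcessRow(303, None, 300),
    ProcessRow(304, 200, 300),
]


def sub_processes(rows, parent_pid):
    """Intra-process dependency: the sub-processes that belong to a parent."""
    return [r.pid for r in rows if r.parent_id == parent_pid]


def upstream_parents(rows, parent_pid):
    """Inter-process dependency: upstream processes a parent depends on."""
    return sorted(
        {r.enq_id for r in rows
         if r.parent_id == parent_pid and r.enq_id is not None}
    )


children = sub_processes(ROWS, 300)
upstream = upstream_parents(ROWS, 300)
print(children, upstream)
```

Under these assumptions, process 300 owns sub-processes 301 to 304 and depends on the upstream processes 100 and 200.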

Page 13: Big data ready Enterprise

Job Status Management

InitJob

HaltJob (success)

TermJob (failure)

InitStep

HaltStep (success)

TermStep (failure)

BDRE Operational Metadata

Fail queue

Success queue

Consumer

JIRA

MQ

The HaltJob and TermJob APIs can send a message to the MQ for proactive alerting.

Alternatively, BDRE could connect directly to any alerting or ticket management system, skipping the MQ.
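The job and step lifecycle on this slide can be sketched as follows. This is a minimal in-memory illustration, not BDRE's API: the class `OperationalMetadata`, its method names, the status strings and the use of Python's `queue.Queue` as a stand-in for the MQ are all assumptions made for the example.

```python
import queue


class OperationalMetadata:
    """Hypothetical in-memory stand-in for the operational metadata store."""

    def __init__(self):
        self.job_status = {}
        self.step_status = {}
        self.success_queue = queue.Queue()  # stand-in for the MQ success queue
        self.fail_queue = queue.Queue()     # stand-in for the MQ fail queue

    def init_job(self, job_id):
        self.job_status[job_id] = "RUNNING"

    def halt_job(self, job_id):
        # Successful end of a job; also publishes a message for consumers.
        self.job_status[job_id] = "SUCCESS"
        self.success_queue.put({"job": job_id, "status": "SUCCESS"})

    def term_job(self, job_id):
        # Failed end of a job; the message can drive proactive alerting.
        self.job_status[job_id] = "FAILED"
        self.fail_queue.put({"job": job_id, "status": "FAILED"})

    def init_step(self, step_id):
        self.step_status[step_id] = "RUNNING"

    def halt_step(self, step_id):
        self.step_status[step_id] = "SUCCESS"

    def term_step(self, step_id):
        self.step_status[step_id] = "FAILED"


def run_job(meta, job_id, steps):
    """Run (step_id, callable) pairs in order, recording status at each point."""
    meta.init_job(job_id)
    for step_id, action in steps:
        meta.init_step(step_id)
        try:
            action()
        except Exception:
            meta.term_step(step_id)
            meta.term_job(job_id)
            return False
        meta.halt_step(step_id)
    meta.halt_job(job_id)
    return True


def drain_alerts(meta):
    """Consumer side: collect failure messages, e.g. to open tickets."""
    alerts = []
    while not meta.fail_queue.empty():
        alerts.append(meta.fail_queue.get())
    return alerts


def failing_step():
    raise RuntimeError("bad input")


meta = OperationalMetadata()
ok = run_job(meta, 300, [(301, lambda: None), (302, failing_step)])
alerts = drain_alerts(meta)
print(ok, meta.job_status, alerts)
```

A consumer like `drain_alerts` is where a ticketing integration such as JIRA would plug in. Calling the alerting system directly from `term_job` would correspond to the variant that skips the MQ.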

Page 14: Big data ready Enterprise

Batch Management

Figure: logical pipeline between the processes. Processes 100 (101-103), 200 (201-205), 300 (301-304) and 400 (401-402) each run as a workflow and hand batches to one another through queues.

Logical pipeline between the processes

A row is added to the queue table for every downstream process upon each successful execution of an upstream process.

The downstream process looks up the queue and processes all pending batches enqueued by upstream.

Multiple source batches consumed = one target batch produced
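The queue behaviour described above can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not BDRE's implementation: the class `BatchQueue`, its method names, the sequential batch ids and the example wiring (100 and 200 feeding 300, and 300 feeding 400) are hypothetical.

```python
from collections import defaultdict
from itertools import count


class BatchQueue:
    """Hypothetical in-memory version of the queue table."""

    def __init__(self, downstream_of):
        # Maps an upstream process id to the ids of its downstream processes.
        self.downstream_of = downstream_of
        self.pending = defaultdict(list)  # downstream pid -> source batch ids
        self._next_batch = count(1)

    def on_success(self, upstream_pid):
        """Upstream finished: add one queue row per downstream process."""
        batch_id = next(self._next_batch)
        for pid in self.downstream_of.get(upstream_pid, []):
            self.pending[pid].append(batch_id)
        return batch_id

    def consume(self, downstream_pid):
        """Downstream run: take all pending batches, produce one target batch.

        For brevity the run is treated as successful, so the target batch
        is immediately enqueued for the next process in the pipeline.
        """
        source_batches = self.pending.pop(downstream_pid, [])
        if not source_batches:
            return None
        target_batch = self.on_success(downstream_pid)
        return source_batches, target_batch


# Assumed wiring for the example: 100 and 200 feed 300, and 300 feeds 400.
q = BatchQueue({100: [300], 200: [300], 300: [400]})
q.on_success(100)
q.on_success(100)
q.on_success(200)
result = q.consume(300)
print(result, dict(q.pending))
```

Here process 300 consumes three source batches in a single run and produces one target batch, which then waits in the queue for process 400.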

Workflow 300


Page 15: Big data ready Enterprise

Data Quality Component

Map-only MR job

Mapper 1 Mapper 2 Mapper n

Rules

Guvnor API

Rule definition

Rule engine UI

Bad records Good records

Hadoop

Original file with all records
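The record-level split performed by each mapper can be illustrated with plain Python. This is a sketch only: on the slide the rules are defined in a rule engine UI and fetched through the Guvnor API, whereas here simple Python predicates stand in for them. The rule helpers, column names and sample records are all made up for the example.

```python
def not_empty(field):
    """Rule factory: the field must be present and non-blank."""
    return lambda record: bool(record.get(field, "").strip())


def is_integer(field):
    """Rule factory: the field must parse as an integer."""
    def check(record):
        try:
            int(record[field])
            return True
        except (KeyError, ValueError):
            return False
    return check


def map_records(lines, rules, columns):
    """What a single mapper would do with its share of the input file.

    Each line is parsed and checked against every rule. There is no reduce
    phase: a record goes straight to the good or the bad output.
    """
    good, bad = [], []
    for line in lines:
        record = dict(zip(columns, line.rstrip("\n").split(",")))
        if all(rule(record) for rule in rules):
            good.append(line)
        else:
            bad.append(line)
    return good, bad


columns = ["id", "name"]
rules = [is_integer("id"), not_empty("name")]
lines = ["1,alice", "x,bob", "3,"]
good, bad = map_records(lines, rules, columns)
print(good, bad)
```

Because each record is judged independently, the work splits cleanly across mappers 1 to n without any shuffle or reduce step.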

Page 16: Big data ready Enterprise

Important Links

BDRE GitHub repo: https://github.com/WiproOpenSourcePractice/openbdre

Contains source code, setup instructions and demo videos.

To contribute, please sign up at the BDRE Jira: https://openbdre.atlassian.net/

Please join the community: https://groups.google.com/forum/#!forum/bdre

If you have any questions or suggestions, please email [email protected].

Page 17: Big data ready Enterprise

Sri Harsha Boda

Thank You

[email protected]