© 2016 WIPRO LTD | WWW.WIPRO.COM | CONFIDENTIAL
Big Data Ready Enterprise
Sri Harsha Boda – Wipro Technologies
Agenda
1. Typical use cases around Big Data Platform
2. Common challenges when implementing at scale
3. How BDRE addresses the needs across the lifecycle
4. FastTrack Implementation using BDRE
5. Demo
6. Typical enterprise deployment view with BDRE
7. About BDRE
8. Metadata Management in depth
About BDRE
BDRE is an open source project released under the Apache License 2.0. The code is available on GitHub.
Wipro's largest open source contribution to date.
Community Choice winner, Modern Data Applications track, Hadoop Summit San Jose 2016.
Typical use cases organizations are embarking on with Big Data Analytics
Information Delivery
Enterprise Data Hub / Lake
Information, Integration & Governance
Batch Data Processing
Event Stream & Micro batch Processing
Enterprise Data Provisioning Platform
Low Latency Store
Complex multistep pipeline transformation
Migration of EDW workloads
Data as a Service
Enterprise Analytical Platform
Common challenges when implementing these use cases at scale
Skilled resources, shorter implementation cycles
Rapid Ingestion of data
Rework across several complex multi-step processes
Robust application deployment support
Support flexible operations & SLA management
Robust operational metadata across technologies
How BDRE addresses the needs across the lifecycle
Pluggable Architecture
Community Driven
Distribution Compatible
Without BDRE
Basic Hadoop at the base
Operational functions you would like to build
Development effort from scratch
With BDRE
Hadoop at the base
"Pre-built operational functions" brought in by BDRE
Applications
Minimal development effort through customization on BDRE components
Value-add through BDRE: implementation jumpstart
Supporting Operational Functions
Operational Metadata
Rapid Ingestion
Visual Data Pipeline
Automated Workflow
One Touch Deployment
SLA Management
Rich Visualization
FastTrack Implementation using BDRE
Key features that can be rapidly implemented using the product
Data Ingestion via Multiple Sources
Abstraction layer: component to ingest a variety of data (CPY, XML, DB, mainframes)
Streaming data ingest: 16 sources, including Twitter, Flume, logs and message queues
File monitoring: component to check the validity of incoming data at file and record level
Cluster to Cluster Hive Table Migration
Job Automation & Security Integration
UI based Workflow Designer
Supports Hive, Pig, MapReduce, Spark, R, Python
Automated Workflow Generator – Oozie/Airflow
Authentication: integration with Kerberos & JAAS
Data Quality and Data Profiling
Enforce Data Quality and Data processing rules (during ingestion or post ingestion)
DQ Analysis, Integrity & Failure Handling
Data Loading - Test Data Generation
One Touch Deployment
Automated central deployment and application management.
Registry of all workflow processes / templates
Automated Process flow Planner
Operational Metadata & Lineage
Job registry
Configuration management
Dependency management - Pipelining
Batch management/tracking
Real Time Execution status
Ingestion registry
Job monitoring and proactive/reactive alerting
Restartability
Analytics & Visualization
Support for Executing Models – R, Python, Spark
Zero Coding UI based configuration for common use cases
User interface based metadata interaction & search
Data Exploration integration with notebooks
Visual Representation of workflow
DEMO
Typical enterprise deployment view with BDRE
[Architecture diagram; component labels as they appear on the slide]
Browser
Edge Node
App Server
BDRE UI App
BDRE REST API
JAAS
App Server
Eventing Framework
Espresso Email
SLA Notification
Proactive Reporting
Oozie Workflow Generator
Job Deploy Scripts
Job Export/Import
App Store (Git Repo)
Ingestion Workflow
Data Quality Workflow
Semantic Workflow
Bulk Data Generation Workflow
Non-Hadoop Workflows
Rule Engine (for DQ)
Operational Metadata
RDBMS Metastore
Hadoop Cluster (NN, RM)
Jobs
BDRE Metadata Management system
Intra and Inter Process Dependency
Pid Enq id Parent id
300 Null Null
301 100 300
302 Null 300
303 Null 300
304 200 300
[Diagram: parent processes and their sub-processes]
Process 100: Process 101, Process 102, Process 103
Process 200: Process 201, Process 202, Process 203, Process 204, Process 205
Process 300: Process 301, Process 302, Process 303, Process 304
Process 400: Process 401, Process 402
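The dependency table on this slide can be read as a small data structure. The sketch below is a minimal Python illustration, assuming that `Parent id` ties a sub-process to its parent process (intra-process dependency) and that `Enq id` names the upstream process that enqueues batches for it (inter-process dependency). The class and function names are hypothetical and are not BDRE APIs; only the five rows come from the slide.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Process:
    pid: int
    enq_id: Optional[int]     # assumed: upstream process that enqueues batches for this one
    parent_id: Optional[int]  # assumed: parent process this sub-process belongs to


# Rows copied from the slide's table (Null becomes None).
ROWS = [
    Process(300, None, None),
    Process(301, 100, 300),
    Process(302, None, 300),
    Process(303, None, 300),
    Process(304, 200, 300),
]


def sub_processes(rows: List[Process], parent_pid: int) -> List[int]:
    """Intra-process dependency: the sub-processes that make up one parent."""
    return [p.pid for p in rows if p.parent_id == parent_pid]


def upstream_of(rows: List[Process], parent_pid: int) -> List[int]:
    """Inter-process dependency: upstream processes feeding any sub-process of the parent."""
    return sorted(
        {p.enq_id for p in rows if p.parent_id == parent_pid and p.enq_id is not None}
    )


print(sub_processes(ROWS, 300))  # [301, 302, 303, 304]
print(upstream_of(ROWS, 300))    # [100, 200]
```

Under this reading, Process 300 cannot usefully run until Process 100 and Process 200 have each produced something for it to consume.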
Job Status Management
InitJob
HaltJob (success)
TermJob (failure)
InitStep
HaltStep (success)
TermStep (failure)
BDRE Operational Metadata
MQ
Success queue
Fail queue
Consumer
JIRA
The HaltJob and TermJob APIs can send a message to the MQ for proactive alerting.
Alternatively, BDRE could connect directly to any alerting or ticket management system, skipping the MQ.
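The job status flow on this slide can be sketched in a few lines of Python. This is an in-memory illustration only: the method names mirror the InitJob, HaltJob and TermJob labels, but the `JobTracker` class, its signatures, the message fields and the `consume_failures` helper are assumptions made for the example, not the BDRE REST API. Step-level calls (InitStep, HaltStep, TermStep) would follow the same pattern and are omitted.

```python
from collections import deque
from enum import Enum


class Status(Enum):
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"


class JobTracker:
    """In-memory stand-in for the operational metadata store plus the MQ."""

    def __init__(self):
        self.status = {}              # job id -> Status
        self.success_queue = deque()  # messages for successful jobs
        self.fail_queue = deque()     # messages for failed jobs

    def init_job(self, job_id):
        self.status[job_id] = Status.RUNNING

    def halt_job(self, job_id):
        """Mark a running job as successful and publish a message."""
        self._finish(job_id, Status.SUCCESS, self.success_queue)

    def term_job(self, job_id, reason):
        """Mark a running job as failed and publish a message."""
        self._finish(job_id, Status.FAILED, self.fail_queue, reason)

    def _finish(self, job_id, status, queue, reason=None):
        if self.status.get(job_id) is not Status.RUNNING:
            raise ValueError(f"job {job_id} is not running")
        self.status[job_id] = status
        queue.append({"job_id": job_id, "status": status.value, "reason": reason})


def consume_failures(tracker):
    """Consumer: drain the fail queue; a real one might open a ticket per message."""
    tickets = []
    while tracker.fail_queue:
        msg = tracker.fail_queue.popleft()
        tickets.append(f"job {msg['job_id']} failed: {msg['reason']}")
    return tickets


tracker = JobTracker()
tracker.init_job(301)
tracker.halt_job(301)
tracker.init_job(304)
tracker.term_job(304, "input file missing")
print(consume_failures(tracker))  # ['job 304 failed: input file missing']
```

Keeping the consumer separate from the tracker reflects the slide's point that the MQ is optional: the same messages could be handed straight to an alerting or ticketing system.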
Batch Management
[Diagram: processes 100 to 103, 200 to 205, 300 to 304 and 400 to 402 exchange batches through queues]
Logical pipeline between the processes: Process 100, Process 200, Process 300 and Process 400 (workflow ids 100, 200, 300 and 400)
A row is added to the queue table for every downstream process upon each successful execution of an upstream process.
Each downstream process looks up the queue and processes all pending batches enqueued by its upstream processes.
Multiple source batches consumed = one target batch produced.
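The queueing rule described on this slide can be sketched as follows. This is a minimal in-memory illustration, assuming a mapping from each upstream process to its downstream processes; the `BatchQueue` class, its method names and the integer batch ids are hypothetical and do not reflect the BDRE schema. The wiring 100 to 301 and 200 to 304 is borrowed from the dependency table earlier in the deck.

```python
from collections import defaultdict
from itertools import count


class BatchQueue:
    """In-memory stand-in for the queue table."""

    def __init__(self, downstream_of):
        self.downstream_of = downstream_of  # upstream pid -> list of downstream pids
        self.pending = defaultdict(list)    # downstream pid -> pending source batch ids
        self._ids = count(1)

    def new_batch(self):
        """Allocate a fresh batch id."""
        return next(self._ids)

    def enqueue(self, upstream_pid, batch_id):
        """Upstream succeeded: add one queue row per downstream process."""
        for pid in self.downstream_of.get(upstream_pid, []):
            self.pending[pid].append(batch_id)

    def consume(self, pid):
        """Downstream run: take every pending source batch and produce one target batch."""
        sources = self.pending.pop(pid, [])
        target = self.new_batch() if sources else None
        return sources, target


queue = BatchQueue({100: [301], 200: [304]})
queue.enqueue(100, queue.new_batch())  # first successful run of process 100
queue.enqueue(100, queue.new_batch())  # second successful run of process 100
sources, target = queue.consume(301)
print(sources, target)  # [1, 2] 3
```

Because `consume` drains everything that is pending, a downstream process that runs less often than its upstream simply picks up several source batches at once and still emits a single target batch.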
Data Quality Component
Rule definition: Rule engine UI, Guvnor API, Rules
Map-only MR job: Mapper 1, Mapper 2, ..., Mapper n
Input: original file with all records (Hadoop)
Output: bad records, good records
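The data quality flow on this slide, where each mapper checks records against rules and routes them to a good or bad output, can be sketched in plain Python. This is only an illustration of the map-only pattern: the rules below are hypothetical Python predicates standing in for rules that the slide shows being served through the Guvnor API, and the record layout is invented for the example.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Rule = Callable[[Record], bool]

# Hypothetical rules; in the slide's design these come from the rule engine.
RULES: Dict[str, Rule] = {
    "id_present": lambda r: bool(r.get("id", "").strip()),
    "amount_is_number": lambda r: r.get("amount", "").replace(".", "", 1).isdigit(),
}


def failed_rules(record: Record, rules: Dict[str, Rule]) -> List[str]:
    """Per-record check, the work a single mapper call would do."""
    return [name for name, rule in rules.items() if not rule(record)]


def split_records(
    records: List[Record], rules: Dict[str, Rule]
) -> Tuple[List[Record], List[Tuple[Record, List[str]]]]:
    """Route each record to the good or bad output; no reduce step is needed."""
    good, bad = [], []
    for record in records:
        failures = failed_rules(record, rules)
        if failures:
            bad.append((record, failures))
        else:
            good.append(record)
    return good, bad


records = [
    {"id": "1", "amount": "10.50"},
    {"id": "", "amount": "7"},
    {"id": "3", "amount": "n/a"},
]
good, bad = split_records(records, RULES)
print(len(good), len(bad))  # 1 2
```

Each record is judged independently of the others, which is why the job can be map-only and scale by adding mappers.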
Important Links
BDRE GitHub repo: https://github.com/WiproOpenSourcePractice/openbdre
Contains source code, setup instructions and demo videos.
To contribute, please sign up at the BDRE Jira: https://openbdre.atlassian.net/
Please join the community: https://groups.google.com/forum/#!forum/bdre
If you have any questions or suggestions, please email [email protected].