Overview of RedPoint Data Management for Hortonworks Hadoop 2014

Upload: redpoint-global-inc

Post on 14-Jun-2015


TRANSCRIPT

Page 1: Overview of RedPoint Data Management for Hortonworks Hadoop

Overview of RedPoint Data Management for Hortonworks Hadoop

2014

Page 2: Overview of RedPoint Data Management for Hortonworks Hadoop

2 RedPoint Global Inc. April 13, 2023 © Confidential

What is Hadoop/Hadoop 2.0?

Hadoop 1.0

• All operations based on MapReduce

• Intrinsic inconsistency of code-based solutions

• Highly skilled and expensive resources needed

• Third-party applications constrained by the need to generate code

Lower-cost scaling

No need for structure

Ease of data capture

Hadoop 2.0

• Introduction of YARN: “a general-purpose, distributed, application management framework that supersedes the classic Apache Hadoop MapReduce framework for processing data in Hadoop clusters.”

• Mature applications can now operate directly on Hadoop

• Reduced skill requirements and increased consistency
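
To make the "all operations based on MapReduce" point concrete, here is a minimal sketch of the programming model in plain Python. It is not Hadoop code and not taken from the deck: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are hypothetical, and everything runs in one process. It only illustrates that under Hadoop 1.0 even a simple count had to be expressed as a map step, a grouping step, and a reduce step.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group emitted values by key, the step a framework performs between map and reduce."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Collapse the list of values for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Chain the three phases in a single process (illustration only)."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input standing in for click-stream text.
    print(word_count(["click view click", "view purchase"]))
```

Every new question means writing and testing another set of functions like these, which is the consistency and skills concern the Hadoop 1.0 bullets raise.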

Page 3: Overview of RedPoint Data Management for Hortonworks Hadoop


Challenges to Hadoop Adoption

Skills Gap

• Severe shortage of MapReduce-skilled resources

• Resources are very expensive and hard to retain

• Inconsistent skills lead to inconsistent results

• Underutilizes existing resources

• Prevents broad leverage of investments across the enterprise

Maturity & Governance

• A nascent technology ecosystem around Hadoop

• Emerging technologies address only narrow slivers of functionality

• New applications are not enterprise class

• Legacy applications have built short-term capabilities

Data Into Information

• Data is not useful in its raw state; it must be turned into information

• A benefit of Hadoop is that the same data can be used from many perspectives

• Analysts must now structure the data based on its intended use
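
The "Data Into Information" bullets describe structuring the same raw data differently depending on its intended use. A minimal Python sketch of that idea follows. The raw JSON events, the field names (`user`, `page`, `ms`), and the two functions are all hypothetical and are not taken from the deck. The point is only that the raw records stay unchanged while each analyst imposes the structure that suits their question.

```python
import json
from collections import Counter

# Hypothetical raw events, stored as captured with no schema imposed up front.
RAW_EVENTS = [
    '{"user": "u1", "page": "/home", "ms": 120}',
    '{"user": "u2", "page": "/cart", "ms": 340}',
    '{"user": "u1", "page": "/cart", "ms": 95}',
]


def page_views(raw_lines):
    """One perspective: how often was each page requested?"""
    return Counter(json.loads(line)["page"] for line in raw_lines)


def latency_by_user(raw_lines):
    """Another perspective on the same data: total response time per user."""
    totals = Counter()
    for line in raw_lines:
        event = json.loads(line)
        totals[event["user"]] += event["ms"]
    return totals


if __name__ == "__main__":
    print(page_views(RAW_EVENTS))
    print(latency_by_user(RAW_EVENTS))
```

Both functions read the identical raw lines. The structure lives in the reading code rather than in the stored data.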

Page 4: Overview of RedPoint Data Management for Hortonworks Hadoop


How RedPoint Helps

First YARN-compliant ETL/data quality toolset on the market: brings together both Big Data and traditional data to create Big Information!

Ranked #1 by [source not captured in transcript] in:

• Customer or Party Data

• Processing Speed

• Match Quality

• Ease of Use

The power to make your data the biggest asset your organization has

Page 5: Overview of RedPoint Data Management for Hortonworks Hadoop


RedPoint in a Hortonworks environment

[Architecture diagram; recoverable labels follow]

Layers: Applications, Data Systems, Sources

Sources:

• OLTP, ERP, CRM systems

• Documents, emails

• Web logs, click streams

• Social networks

• Machine-generated data

• Sensor data

• Geolocation data

Repositories: RDBMS, EDW, MPP

Hadoop platform: Governance & Integration, Data Access, Data Management, Security, Operations

RedPoint: Data Quality, Data Integration

One application, one graphical user interface for traditional and Big Data

ELT, ETL, Cleanse, Match, De-dupe, Merge/Purge, Household, Partition, Parse, Append, Standardize, Key, Automate, Monitor, Notify

• Pre-built adapters and ODBC drivers

• Pure YARN application

• No MapReduce needed

• No in-cluster installation
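
For readers unfamiliar with the data quality terms listed on this slide, the following Python sketch shows what "standardize", "match", and "de-dupe" mean on a few party records. It is a generic illustration only, not RedPoint's algorithm or API: the record fields, the exact-email match rule, and the function names are all assumptions. Real matching typically uses fuzzier rules than this.

```python
import re


def standardize(record: dict) -> dict:
    """Normalize whitespace and casing, and keep only the digits of the phone number."""
    return {
        "name": " ".join(record["name"].split()).title(),
        "email": record["email"].strip().lower(),
        "phone": re.sub(r"\D", "", record["phone"]),
    }


def match_key(record: dict) -> str:
    """Assumed match rule: two records are the same party if their emails are equal."""
    return record["email"]


def dedupe(records: list[dict]) -> list[dict]:
    """Standardize each record and keep the first one seen for each match key."""
    survivors: dict[str, dict] = {}
    for raw in records:
        clean = standardize(raw)
        survivors.setdefault(match_key(clean), clean)
    return list(survivors.values())


if __name__ == "__main__":
    # Hypothetical customer records with inconsistent formatting.
    customers = [
        {"name": " jane  DOE", "email": "Jane@Example.com ", "phone": "(555) 010-9999"},
        {"name": "Jane Doe", "email": "jane@example.com", "phone": "555.010.9999"},
        {"name": "sam lee", "email": "sam@example.com", "phone": "555 010 1234"},
    ]
    for row in dedupe(customers):
        print(row)
```

Standardizing before matching is what lets the first two records collapse into one. Without it, the differing case and punctuation would hide the duplicate.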

Page 6: Overview of RedPoint Data Management for Hortonworks Hadoop


Typical Hadoop architecture without RedPoint

[Architecture diagram; recoverable labels follow]

• Monitoring and management tools: Ambari

• Source data: sensor logs, clickstream, flat files, unstructured, sentiment, customer, inventory

• Data sources: DBs, JMS queues, files, RDBMS, EDW

• Load: Sqoop, Flume, WebHDFS, NFS, Sqoop/Hive

• Data refinement: Hive, Pig

• Structure: HCatalog (metadata services)

• Interactive: HiveServer2

• Other components: MapReduce, YARN, HDFS (labeled 1 to n)

• Interfaces: REST, HTTP, stream

• Query/visualization/reporting/analytical tools and apps

Page 7: Overview of RedPoint Data Management for Hortonworks Hadoop


Typical Hadoop architecture with RedPoint

[Architecture diagram; recoverable labels follow]

• Monitoring and management tools: Ambari

• Source data: sensor logs, clickstream, flat files, unstructured, sentiment, customer, inventory

• Data sources: DBs, JMS queues, files, RDBMS, EDW

• Load: Sqoop, WebHDFS, Flume, NFS, Sqoop/Hive

• Data refinement: Hive, Pig

• Structure: HCatalog (metadata services)

• Interactive: HiveServer2

• Other components: MapReduce, YARN, HDFS (labeled 1 to n)

• Interfaces: REST, HTTP, stream

• Query/visualization/reporting/analytical tools and apps
