
New Database Replication and Data Integration with Hadoop and BI

Jeffrey Surretsky

NYOUG

December 2013

Big Data: Hadoop®

The explosion of data continues to burden the data tool chain

[Figure: data volume scale (terabyte, petabyte, exabyte, zettabyte) across the mainframe, PC, internet, mobile, and machine eras]

Transactional data: Traditionally, only transactional data was generated and stored in databases.

• Structured

• Measured growth

Human files: Over time, we started creating unstructured data.

• Docs, images, video

• Multiple formats

• Fast growth

Social and machines: These sources have added data exponentially.

• Likes, tweets, relationships (social)

• Log files (machine)

• Exponential growth

Big data market drivers

• Proliferation of new user-generated data creation and data capture technologies

• Increased “interconnectedness” drives consumption (which creates more data)

• Inexpensive storage makes it possible to keep more data for longer

• Need to extract actionable insights from all data assets to gain a competitive edge

*Source: IDC 2011

3Vs

• Velocity: batch, near time, real time, streams

• Volume: petabytes, records, transactions, tables, files

• Variety: structured, unstructured, semi-structured, all of the above

Big data: scaling up on RDBMSs

• Partitioning

• Materialized views

• In-memory cache

• …and who are we kidding here!

RDBMS Yodabytes handle cannot!

Big data: RDBMS cluster

[Figure: an RDBMS cluster in which a controller sends SQL to a series of database nodes, each holding one month of data (Jan 1990, Feb 1990, … Aug 1990, … Jun 2013)]
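The cluster slide appears to show a controller fronting one database node per month, which means the controller must route each query to the relevant nodes and merge their partial answers. A minimal sketch of that scatter-gather pattern, assuming one in-memory SQLite database per month; the `sales` table, `make_shard`, `Controller`, and `total_sales` are hypothetical names chosen for illustration, not anything from the talk:

```python
import sqlite3
from typing import Dict, List, Tuple


def make_shard(rows: List[Tuple[str, int]]) -> sqlite3.Connection:
    """One cluster node: an in-memory SQLite database holding one month of sales."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (day TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


class Controller:
    """Hypothetical controller: sends the query only to the month shards in
    range (scatter) and adds up their partial sums (gather)."""

    def __init__(self, shards: Dict[str, sqlite3.Connection]) -> None:
        self.shards = shards  # keyed by "YYYY-MM", which sorts correctly as text

    def total_sales(self, first_month: str, last_month: str) -> int:
        total = 0
        for month, conn in self.shards.items():
            if first_month <= month <= last_month:  # skip shards outside the range
                (partial,) = conn.execute(
                    "SELECT COALESCE(SUM(amount), 0) FROM sales").fetchone()
                total += partial
        return total


shards = {
    "1990-01": make_shard([("1990-01-05", 100), ("1990-01-20", 50)]),
    "1990-02": make_shard([("1990-02-11", 70)]),
    "1990-03": make_shard([("1990-03-02", 30)]),
}
controller = Controller(shards)
print(controller.total_sales("1990-01", "1990-02"))  # 220
```

Every new query shape (joins across months, ordering, distinct counts) needs its own merge logic in the controller, which is the kind of work that grows as the number of shards grows.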

Big data: Hadoop

Big data: Hadoop benefits

• Scalable storage

• Massive parallel processing

• Cost-effective

Hadoop operational use cases

1. Staging

2. Warehousing

3. Archiving

Not glamorous, but highly effective.

Today’s solutions

[Figure: OLTP systems, a data warehouse, and analytics]

Log-based CDC replication

• Near real-time log-based change data capture (CDC) from Oracle

• Applying changes to Hadoop
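Log-based CDC reads changes from the database's redo and archive logs rather than querying the source tables, then applies them to the target in log order. A minimal sketch of the apply side, assuming a hypothetical `Change` record ordered by an Oracle-style system change number (SCN) and an in-memory dictionary standing in for the target. The checkpoint of the last applied SCN is an illustrative design choice that lets a restarted apply skip changes it has already processed:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Change:
    """Hypothetical change record, as a CDC reader might emit it after
    parsing a redo/archive log entry. Field names are illustrative."""
    scn: int                                  # ordering key (Oracle-style SCN)
    op: str                                   # "INSERT", "UPDATE" or "DELETE"
    table: str
    key: int
    values: Optional[Dict[str, object]] = None


@dataclass
class Target:
    """In-memory stand-in for a target: table name -> primary key -> row."""
    tables: Dict[str, Dict[int, Dict[str, object]]] = field(default_factory=dict)
    checkpoint: int = 0                       # highest SCN already applied

    def apply(self, changes: List[Change]) -> int:
        """Apply changes in SCN order, skipping any at or below the checkpoint.
        Returns the number of changes actually applied."""
        applied = 0
        for ch in sorted(changes, key=lambda c: c.scn):
            if ch.scn <= self.checkpoint:
                continue                      # already applied before a restart
            rows = self.tables.setdefault(ch.table, {})
            if ch.op == "INSERT":
                rows[ch.key] = dict(ch.values or {})
            elif ch.op == "UPDATE":
                rows.setdefault(ch.key, {}).update(ch.values or {})
            elif ch.op == "DELETE":
                rows.pop(ch.key, None)
            else:
                raise ValueError(f"unknown operation {ch.op!r}")
            self.checkpoint = ch.scn
            applied += 1
        return applied


log = [
    Change(101, "INSERT", "orders", 1, {"status": "new", "amount": 50}),
    Change(102, "UPDATE", "orders", 1, {"status": "shipped"}),
    Change(103, "INSERT", "orders", 2, {"status": "new", "amount": 75}),
    Change(104, "DELETE", "orders", 2),
]
target = Target()
print(target.apply(log))        # 4
print(target.apply(log))        # 0: replaying the same log is a no-op
print(target.tables["orders"])  # {1: {'status': 'shipped', 'amount': 50}}
```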

Log-based CDC from Oracle-to-Oracle architecture

Source: redo/archive logs → Capture → capture queue → Read → export queue → Export

Target: Import → post queue → Post → SQL applied to the target database
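The Oracle-to-Oracle architecture is a chain of processes connected by queues. The sketch below models those hand-offs with in-memory deques and uses SQLite as a stand-in target database. It is not SharePlex code: the redo entry format, the `id` key column, and the rule that Read filters by a set of replicated tables are assumptions made for illustration, and Export and Import are collapsed into a single in-process transfer.

```python
import sqlite3
from collections import deque
from typing import Deque, Dict, List, Set, Tuple

# Hypothetical, simplified redo entry: (operation, table, row values).
RedoEntry = Tuple[str, str, Dict[str, object]]


def capture(redo_log: List[RedoEntry], capture_q: Deque[RedoEntry]) -> None:
    """Capture: read redo/archive log entries onto the capture queue."""
    capture_q.extend(redo_log)


def read(capture_q: Deque[RedoEntry], export_q: Deque[RedoEntry], replicated: Set[str]) -> None:
    """Read: forward only entries for replicated tables (assumed filtering rule)."""
    while capture_q:
        entry = capture_q.popleft()
        if entry[1] in replicated:
            export_q.append(entry)


def export_import(export_q: Deque[RedoEntry], post_q: Deque[RedoEntry]) -> None:
    """Export (source) and Import (target), collapsed into one transfer."""
    while export_q:
        post_q.append(export_q.popleft())


def post(post_q: Deque[RedoEntry], conn: sqlite3.Connection) -> None:
    """Post: apply each entry to the target as a parameterized SQL statement."""
    while post_q:
        op, table, row = post_q.popleft()
        if op == "INSERT":
            cols = list(row)
            placeholders = ", ".join(["?"] * len(cols))
            conn.execute(f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})",
                         [row[c] for c in cols])
        elif op == "UPDATE":
            cols = [c for c in row if c != "id"]
            set_clause = ", ".join(f"{c} = ?" for c in cols)
            conn.execute(f"UPDATE {table} SET {set_clause} WHERE id = ?",
                         [row[c] for c in cols] + [row["id"]])
        elif op == "DELETE":
            conn.execute(f"DELETE FROM {table} WHERE id = ?", [row["id"]])
        else:
            raise ValueError(f"unknown operation {op!r}")
    conn.commit()


redo_log = [
    ("INSERT", "orders", {"id": 1, "status": "new"}),
    ("INSERT", "audit", {"id": 9, "note": "not replicated"}),
    ("UPDATE", "orders", {"id": 1, "status": "shipped"}),
    ("INSERT", "orders", {"id": 2, "status": "new"}),
    ("DELETE", "orders", {"id": 2}),
]
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
capture_q, export_q, post_q = deque(), deque(), deque()
capture(redo_log, capture_q)
read(capture_q, export_q, replicated={"orders"})
export_import(export_q, post_q)
post(post_q, target)
print(target.execute("SELECT id, status FROM orders").fetchall())  # [(1, 'shipped')]
```

Post passes row values as bind parameters rather than splicing them into the SQL text; table and column names are assumed to come from trusted configuration.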

Log-based CDC replication: impact-free and limitless!

Log-based CDC data integration architecture

Oracle source: redo/archive logs → Capture → capture queue → Read → post queue → JMS post

Target(s): JMS queues feeding a custom app, a Dell app, …and more

• Combined source and target process implementation

• Near real-time data integration
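In the data integration variant, the post step writes changes to JMS queues instead of a database, and each consuming application reads its own queue. Python has no JMS client in its standard library, so this sketch uses `queue.Queue` as a stand-in for a JMS queue; the JSON message layout and the `jms_post` and `drain` names are hypothetical.

```python
import json
import queue
from typing import Dict, List, Tuple

# Hypothetical change tuple: (scn, operation, table, row values).
Change = Tuple[int, str, str, Dict[str, object]]


def to_message(change: Change) -> str:
    """Serialize one change as a JSON text message (hypothetical layout)."""
    scn, op, table, row = change
    return json.dumps({"scn": scn, "op": op, "table": table, "row": row}, sort_keys=True)


def jms_post(changes: List[Change], targets: List[queue.Queue]) -> None:
    """Stand-in for the JMS post step: put every change, in order, on every
    target queue so each consuming application sees the full stream."""
    for change in changes:
        message = to_message(change)
        for q in targets:
            q.put(message)


def drain(q: queue.Queue) -> List[dict]:
    """A consuming application: read and decode all messages currently queued."""
    received = []
    while True:
        try:
            raw = q.get_nowait()
        except queue.Empty:
            return received
        received.append(json.loads(raw))


custom_app_q, other_app_q = queue.Queue(), queue.Queue()
changes = [
    (101, "INSERT", "orders", {"id": 1, "status": "new"}),
    (102, "UPDATE", "orders", {"id": 1, "status": "shipped"}),
]
jms_post(changes, [custom_app_q, other_app_q])
received = drain(custom_app_q)
print([m["op"] for m in received])  # ['INSERT', 'UPDATE']
print(other_app_q.qsize())          # 2
```

Posting one copy per queue mirrors the several JMS queues in the diagram; a JMS topic with multiple subscribers is the other common way to fan out, and the slides do not say which one is used.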

Log-based CDC database replication and near real-time data integration: summary

[Figure: a log-based CDC source feeding multiple targets: database replication, and near real-time data integration through JMS queues to custom apps, …and more]

Connector for Hadoop

• Provides near real-time data replication from Oracle to Hadoop environments, enabling organizations to affordably replicate live data from Oracle tables:

– In near real time to HDFS and Hive environments

– In real time to HBase
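The slide distinguishes real-time delivery to HBase from near real-time delivery to HDFS and Hive. One plausible way to see the difference, sketched under stated assumptions: HBase supports keyed puts and deletes, so each change can be applied as it arrives, while HDFS files are generally written once and not updated in place, so changes are buffered and exposed a batch at a time. Both classes are hypothetical in-memory stand-ins, not the connector's API or the HBase/HDFS client APIs.

```python
from typing import Dict, List, Tuple

# Hypothetical change tuple: (operation, row key, column values).
Change = Tuple[str, str, Dict[str, object]]


class HBaseLikeTable:
    """Keyed, mutable store: every change is applied as soon as it arrives."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, object]] = {}

    def apply(self, change: Change) -> None:
        op, key, cols = change
        if op == "DELETE":
            self.rows.pop(key, None)
        else:  # INSERT and UPDATE both become a put of the given columns
            self.rows.setdefault(key, {}).update(cols)


class HdfsLikeSink:
    """Append-only sink: changes are buffered as text records and written out
    as one immutable 'file' (a list of records) per full batch."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self.buffer: List[str] = []
        self.files: List[List[str]] = []

    def apply(self, change: Change) -> None:
        op, key, cols = change
        fields = [op, key] + [f"{k}={v}" for k, v in sorted(cols.items())]
        self.buffer.append("\t".join(fields))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Write the buffered records out as a new file, if there are any."""
        if self.buffer:
            self.files.append(self.buffer)
            self.buffer = []


hbase, hdfs = HBaseLikeTable(), HdfsLikeSink(batch_size=3)
stream = [
    ("INSERT", "order#1", {"status": "new"}),
    ("UPDATE", "order#1", {"status": "shipped"}),
    ("INSERT", "order#2", {"status": "new"}),
    ("DELETE", "order#2", {}),
]
for ch in stream:
    hbase.apply(ch)
    hdfs.apply(ch)
print(hbase.rows)                         # {'order#1': {'status': 'shipped'}}
print(len(hdfs.files), len(hdfs.buffer))  # 1 1
```

In the sketch, anything reading the HDFS side sees the final DELETE only after the next flush, while the HBase side reflects it immediately.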

SharePlex Connector for Hadoop architecture

[Figure, built up over several slides: SharePlex for Oracle (log-based CDC), JMS, the Connector for Hadoop, and SQOOP, delivering data into HDFS and HBase]

SharePlex Connector for Hadoop: use case

[Figure: Siebel CRM, PeopleSoft HR, SAP Manufacturing, and Oracle Financials feeding a data warehouse, stage, and archive that supports reporting, dashboards, and analytics]

...

Questions
