Oracle GoldenGate and Apache Kafka: A Deep Dive Into Real-Time Data Streaming

[email protected] www.rittmanmead.com @rittmanmead 1

Upload: michael-rainey

Post on 16-Apr-2017

Page 1: Oracle GoldenGate and Apache Kafka A Deep Dive Into Real-Time Data Streaming

Michael Rainey | KScope16

Oracle GoldenGate and Apache Kafka: A Deep Dive Into Real-Time Data Streaming

Introduction

• Michael Rainey - Data Integration Lead - America
  - Oracle Data Integration expertise
  - Blog: http://ritt.md/mRainey
  - Oracle ACE Director

@mRainey

About Rittman Mead

• World’s leading specialist partner for technical excellence, solutions delivery and innovation in Oracle Data Integration, Business Intelligence, Analytics and Big Data
• Providing our customers targeted expertise; we are a company that doesn’t try to do everything… only what we excel at
• 70+ consultants worldwide, including 1 Oracle ACE Director and 3 Oracle ACEs, offering training courses, global services, and consulting
• Founded on the values of collaboration, learning, integrity and getting things done
• Comprehensive service portfolio designed to support the full lifecycle of any analytics solution

Unlock the potential of your organization’s data

• Visual Redesign
• Business User Training
• Ongoing Support
• Engagement Toolkit

Average user adoption for BI platforms is below 25%

Rittman Mead’s User Engagement Service can help

More info: http://ritt.md/ue

Today’s New Data Challenge

Data Integration Today

Typical Example - Marketing

• Financial data stored in RDBMS
• Social media data, web logs, Google Analytics, etc., all in various formats
• Bring it all together for analysis
  ‣ Marketing campaign effect on sales

Relational Data Replication - Oracle GoldenGate

Oracle GoldenGate for Big Data (Then)

Oracle GoldenGate for Big Data (Now)

Streaming Data - Apache Kafka

“Publish-subscribe messaging rethought as a distributed commit log”

Image source: kafka.apache.org/
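The "distributed commit log" idea can be illustrated with a toy, single-process model. This is only a sketch under stated assumptions: the `CommitLog` and `Consumer` classes are hypothetical names invented for illustration, not part of any Kafka client API, and the model ignores partitions, replication and persistence. It shows just the core behavior: records are appended, never changed, and each subscriber reads at its own offset.

```python
from dataclasses import dataclass, field


@dataclass
class CommitLog:
    """Toy append-only log: records are appended, never modified."""
    records: list = field(default_factory=list)

    def append(self, value):
        """Append a record and return its offset (position in the log)."""
        self.records.append(value)
        return len(self.records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at the given offset."""
        return self.records[offset:offset + max_records]


@dataclass
class Consumer:
    """Each subscriber tracks its own offset, so readers do not interfere."""
    log: CommitLog
    offset: int = 0

    def poll(self, max_records=10):
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch


log = CommitLog()
for event in ["insert row 1", "update row 1", "delete row 1"]:
    log.append(event)

fast = Consumer(log)
slow = Consumer(log)
fast_batch = fast.poll()                # reads all three records
slow_batch = slow.poll(max_records=1)   # reads only the first record
```

Because the log itself holds no per-reader state, a slow consumer never blocks a fast one; each simply resumes from its own offset.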

Kafka - How is it used?

• Pure Event Streams
• System Metrics
• Derived Streams
• Hadoop Data Loads / Data Publishing
• Application Logs
• Database Changes
  - Log Compaction
  - Data cleansing

Image source: confluent.io
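For the database changes use case, log compaction can be pictured as keeping only the most recent record per key. The following is a minimal sketch of that idea, not Kafka's actual implementation: the `compact` function, the `emp:1` style keys and the sample rows are all made up for illustration, and treating `None` as an immediate delete is a simplifying assumption.

```python
def compact(log):
    """Keep only the most recent record for each key, preserving log order.

    `log` is a list of (key, value) pairs. A value of None is treated here
    as a delete marker that removes the key entirely (a simplification
    made for this sketch).
    """
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later records overwrite earlier ones
    survivors = sorted(
        (offset, key, value)
        for key, (offset, value) in latest.items()
        if value is not None
    )
    return [(key, value) for _, key, value in survivors]


# Hypothetical change records keyed by primary key.
changes = [
    ("emp:1", {"name": "Ann", "dept": "HR"}),
    ("emp:2", {"name": "Bob", "dept": "IT"}),
    ("emp:1", {"name": "Ann", "dept": "Finance"}),
    ("emp:2", None),
]
compacted = compact(changes)
```

After compaction the log still describes the current state of every surviving row, which is why a compacted topic can act as a snapshot of a changing table.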

Let’s Jump Right In

• An example… near and dear to my heart

One single view of the Oracle Data Integrator logs!

  - Oracle Data Integrator session logs stored in the repository
  - ODI Agent logs are text-based log files
  - To see the full picture of your ODI environment, they must be combined

oracle.com/technetwork/database/bigdata-appliance/downloads

Oracle GoldenGate for Big Data - we talked about this…

ODI Agent Logs to Kafka via Logstash

Extract from the ODI Repository with GoldenGate 12c

• Prepare the database
• Set up GoldenGate for Oracle Database
  - Install and configure
• Set up Manager, Extract and Pump parameter files
• Add Extract and Pump process groups
• Start!

Prepare the Database for GoldenGate Extract - OGG User

Prepare the Database for GoldenGate Extract - Logging

Add Table Supplemental Logging

GoldenGate Manager Parameter File - Source

GoldenGate Extract Parameter File

GoldenGate Pump Parameter File

Add Extract and Pump Process Groups

Stream ODI Agent Logs to Kafka via Logstash

• Application log processing is a standard use for Kafka
  - Many approaches to extract logs
• Logstash
  - Part of the Elastic (formerly ELK) stack
  - Robin Moffatt blogged: http://ritt.md/kafka-elk
  - Producer configuration for Kafka

Logstash to Kafka - Setup and Startup

• Start up Zookeeper
  - Already installed on Big Data Lite
• Set Kafka server.properties
  - Broker ID
  - Number of partitions
  - Log retention period
  - Zookeeper connection
• Start Kafka
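As a rough illustration of the four server.properties settings listed above, the sketch below renders them as properties text. The key names (`broker.id`, `num.partitions`, `log.retention.hours`, `zookeeper.connect`) are standard Kafka broker settings, but every value is an assumed placeholder for a single-node sandbox and not taken from the talk; `render_properties` is a hypothetical helper.

```python
def render_properties(settings):
    """Render a dict as Java-style `key=value` properties text."""
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


# Values are placeholders for a single-node sandbox, not recommendations.
broker_settings = {
    "broker.id": 0,                         # Broker ID, unique per broker
    "num.partitions": 1,                    # default partition count for new topics
    "log.retention.hours": 168,             # log retention period
    "zookeeper.connect": "localhost:2181",  # Zookeeper connection string
}

server_properties = render_properties(broker_settings)
print(server_properties)
```

Generating the file from a dictionary keeps the sandbox settings in one reviewable place; editing server.properties by hand works just as well.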

Setup Logstash Configuration File

• Configuration File

• Start Logstash

ODI Agent Logs to Kafka!

• Start the Kafka Console Consumer - delivered with Kafka

• Start the ODI Agent and…messages!

GoldenGate Transactions to Kafka

Oracle GoldenGate for Big Data

• Kafka is one of many handlers
  - HDFS, HBase, Flume
• Pluggable Formatters
  - Convert trail file transactions to alternate format
  - Avro, delimited text, JSON, XML
• Metadata Provider
  - Handles mapping of source to target columns that differ in structure/name
  - Similar to SOURCEDEF file in GoldenGate
  - Avro or Hive

Oracle GoldenGate for Big Data - Kafka Handler

• Standard GoldenGate Extract / Pump processes
  - We just set this up
• Replicat parameter file & process group
• Kafka Handler configuration
• Kafka Producer properties
  - Note: Kafka 0.9.0+ now certified with GoldenGate for Big Data 12.2.1.1

GoldenGate and Kafka…Prerequisites

• Zookeeper & Kafka up and running
• Add topic to broker up front vs dynamically
  - Option to create a topic per table (12.2.0.1.1)
• Kafka Handler must have access to broker server
• Kafka libraries must match Kafka version

GoldenGate and Kafka…Replicat Parameters

GoldenGate and Kafka…Kafka Handler Properties

• Properties allow communication between the GoldenGate adapter and Kafka

• gg.handlerlist = kafkahandler
• gg.handler.kafkahandler.type = kafka
• gg.handler.kafkahandler.KafkaProducerConfigFile = kafka_producer.properties
• gg.handler.kafkahandler.TopicName = odirepo
  - Kafka topic name
• gg.handler.kafkahandler.format = json
  - Pluggable Formatter
  - Avro recommended for Kafka…
• gg.handler.kafkahandler.BlockingSend = true|false
• gg.handler.kafkahandler.includeTokens = true|false
• gg.handler.kafkahandler.mode = tx
  - Transaction vs Operation mode
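To show how these properties hang together, here is a small sketch that parses the handler properties and groups them by the handler named in `gg.handlerlist`. It is not how GoldenGate itself reads the file: `parse_properties` and `handler_settings` are hypothetical helpers, and the `false` values for `BlockingSend` and `includeTokens` are arbitrary picks from the true|false options on the slide.

```python
def parse_properties(text):
    """Parse simple `key = value` lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def handler_settings(props):
    """Return the settings of each handler named in gg.handlerlist."""
    handlers = {}
    for name in props.get("gg.handlerlist", "").split(","):
        name = name.strip()
        if not name:
            continue
        prefix = f"gg.handler.{name}."
        handlers[name] = {
            key[len(prefix):]: value
            for key, value in props.items()
            if key.startswith(prefix)
        }
    return handlers


# Property names come from the slide; the BlockingSend and includeTokens
# values are arbitrary picks from the true|false options shown there.
KAFKA_PROPS = """
gg.handlerlist = kafkahandler
gg.handler.kafkahandler.type = kafka
gg.handler.kafkahandler.KafkaProducerConfigFile = kafka_producer.properties
gg.handler.kafkahandler.TopicName = odirepo
gg.handler.kafkahandler.format = json
gg.handler.kafkahandler.BlockingSend = false
gg.handler.kafkahandler.includeTokens = false
gg.handler.kafkahandler.mode = tx
"""

handlers = handler_settings(parse_properties(KAFKA_PROPS))
```

The grouping makes the naming pattern visible: every handler-specific property is prefixed with `gg.handler.<name>.`, where `<name>` is the entry listed in `gg.handlerlist`.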

GoldenGate and Kafka…One Topic Per Table

• gg.handler.kafkahandler.topicPartitioning=table
  - Option to split schema into one topic per table
  - Can be created dynamically
• gg.handler.kafkahandler.mode=op
  - Operation mode to track individual table operations
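The difference between transaction mode with a single topic and operation mode with one topic per table can be sketched as a routing decision. This is a conceptual model only: the message layout, the function names and the table names are hypothetical, and naming each topic after the qualified table name is an assumption of the sketch rather than something stated on the slide.

```python
import json
from collections import defaultdict


def messages_tx_mode(transaction, topic):
    """Transaction mode: all operations of a transaction in one message."""
    return {topic: [json.dumps(transaction)]}


def messages_op_mode_per_table(transaction):
    """Operation mode with a topic per table: one message per operation.

    Naming the topic after the qualified table name is an assumption
    of this sketch.
    """
    topics = defaultdict(list)
    for op in transaction:
        topics[op["table"]].append(json.dumps(op))
    return dict(topics)


# Hypothetical transaction touching two made-up tables.
transaction = [
    {"table": "DEMO.TABLE_A", "op_type": "I", "after": {"ID": 1}},
    {"table": "DEMO.TABLE_B", "op_type": "I", "after": {"ID": 10}},
    {"table": "DEMO.TABLE_A", "op_type": "U", "after": {"ID": 1}},
]

single_topic = messages_tx_mode(transaction, "odirepo")
per_table = messages_op_mode_per_table(transaction)
```

In this model the per-table layout lets a consumer subscribe to only the tables it cares about, at the cost of losing the single-message view of a whole transaction.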

GoldenGate and Kafka…Kafka Handler Properties

• goldengate.userexit.timestamp = utc
• goldengate.userexit.writers = javawriter
• javawriter.stats.display = TRUE
• javawriter.stats.full = TRUE
• gg.log = log4j
• gg.log.level = INFO
• gg.report.time = 30sec
• gg.classpath = dirprm/:/u01/kafka/kafka_2.10-0.8.2.1/libs/*:
  - Location of the Kafka libraries
• javawriter.bootoptions = -Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar

GoldenGate and Kafka…Kafka Producer Configuration

• Access to the Kafka producer configuration parameters

More on this later

GoldenGate and Kafka…Startup

• Create a topic in Kafka

• Add Replicat process group to GoldenGate on target

• Start Kafka console consumer

• Start GoldenGate extract/pump on source, replicat on target

GoldenGate and Kafka Integration Complete!

Schemas…

• Schema automatically created
  - Stored in <ogg_home>/dirdef directory
  - Based on gg.handler.kafkahandler.format setting

GoldenGate Big Data Adapter - What to Think About

• GoldenGate could be a single point of failure
  - Kafka is a fault-tolerant, distributed system
• Source transactions may end up larger than expected
  - max.request.size
• Need for speed?
  - batch.size
  - linger.ms
  - BlockingSend = false
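The interplay of `batch.size`, `linger.ms` and `max.request.size` can be pictured with a toy batcher. This sketch is not the Kafka producer: `BatchingSender` is a hypothetical class, sizes are plain byte counts, and the linger check only runs when the next message arrives (a real producer flushes on a timer), which is a deliberate simplification.

```python
class BatchingSender:
    """Toy producer-side batcher driven by explicit timestamps (ms)."""

    def __init__(self, batch_size, linger_ms, max_request_size):
        self.batch_size = batch_size
        self.linger_ms = linger_ms
        self.max_request_size = max_request_size
        self.pending = []       # messages waiting to be sent
        self.pending_bytes = 0
        self.first_ts = None    # arrival time of the oldest pending message
        self.sent = []          # batches that were "sent"

    def _flush(self):
        if self.pending:
            self.sent.append(self.pending)
        self.pending, self.pending_bytes, self.first_ts = [], 0, None

    def send(self, message, now_ms):
        size = len(message)
        if size > self.max_request_size:
            raise ValueError(f"message of {size} bytes exceeds max request size")
        # Flush first if the oldest pending message has waited long enough
        # or the new message would not fit in the current batch.
        if self.pending and (
            now_ms - self.first_ts >= self.linger_ms
            or self.pending_bytes + size > self.batch_size
        ):
            self._flush()
        if self.first_ts is None:
            self.first_ts = now_ms
        self.pending.append(message)
        self.pending_bytes += size

    def close(self):
        """Send whatever is still pending."""
        self._flush()


sender = BatchingSender(batch_size=10, linger_ms=5, max_request_size=20)
sender.send(b"aaaa", now_ms=0)
sender.send(b"bbbb", now_ms=1)   # fits in the same batch
sender.send(b"cccc", now_ms=2)   # would exceed batch_size, so flush first
sender.send(b"dd", now_ms=9)     # linger_ms elapsed, so flush first
sender.close()
```

Larger batches and a longer linger trade a little latency for fewer requests, while a source transaction bigger than the request limit is rejected outright, which is why large transactions deserve attention before go-live.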

Why GoldenGate with Kafka?

• GoldenGate
  - …is non-invasive
  - …has checkpoints for recovery
  - …moves data quickly
  - …is easy to set up

In conclusion

• The new data challenge, not quite as challenging with Kafka
  - Kafka throughout

Questions?

• Websites
  - kafka.apache.org
  - rittmanmead.com/blog
• Contact
  - [email protected]
  - [email protected]
• Twitter
  - @rittmanmead
  - @apachekafka
  - @mRainey

Rittman Mead at KScope16

Oracle GoldenGate and Apache Kafka: A Deep Dive into Real-Time Data Streaming
Michael Rainey | Monday Jun 27, 4:30pm | Level 2 - Missouri

Free-Form Data Visualizations: First Session
Charles Elliott | Tuesday Jun 28, 8:30am | Level 2 - Superior A

Lunch & Learn: BI and Data Warehousing
Michael Rainey | Tuesday Jun 28, 12:45pm | Ballroom Level - Sheraton II

Lunch & Learn: Big Data and Advanced Analytics
Mark Rittman | Tuesday Jun 28, 12:45pm | Ballroom Level - Sheraton III

OBIEE 12c and Essbase: What’s New for Integration and Reporting Against EPM Sources
Mark Rittman | Wednesday Jun 29, 10:15am | Ballroom Level - Sheraton III

A Walk Through the Kimball ETL Subsystems with Oracle Data Integration
Michael Rainey | Wednesday Jun 29, 11:30am | Level 2 - Mayfair

How to Brand and Own Your OBIEE Interface: Past, Present, and Future
Andy Rocha & Pete Tamisin | Wednesday Jun 29, 2:00pm | Ballroom Level - Sheraton III

Free-Form Data Visualizations: Second Session
Charles Elliott | Wednesday Jun 29, 2:00pm | Level 2 - Superior A

Oracle Big Data Discovery: Extending into Machine Learning and Advanced Visualizations
Mark Rittman | Wednesday Jun 29, 3:15pm | Level 2 - Missouri

Join Your Community! Tonight from 8:00 – 10:00 in Chicago IX

BI Warrior Trivia

Time to get out the funny hats, battle axes, and your passion for trivia because BI Community Trivia is back! This quirky event is quickly becoming a Kscope must-see. This year we are getting down medieval-style with BI Warrior Trivia, so gather your clan and prepare to clash (mentally, of course) as we square off in this truly silly battle of wits. Join us as we drink some beer, give away prizes, and put our knowledge to the test at BI Trivia Night!
