Oracle GoldenGate and Apache Kafka: A Deep Dive Into Real-Time Data Streaming

Posted on 09-Jan-2017


info@rittmanmead.com www.rittmanmead.com @rittmanmead

Michael Rainey | Oracle OpenWorld 2016


Introduction

• Michael Rainey (@mRainey)
- Data Integration Lead
- Oracle Data Integration expertise
- Blog: http://ritt.md/mRainey
- Oracle ACE Director

About Rittman Mead

• World’s leading specialist partner for technical excellence, solutions delivery and innovation in Oracle Data Integration, Business Intelligence, Analytics and Big Data
• Providing our customers targeted expertise; we are a company that doesn’t try to do everything… only what we excel at
• 70+ consultants worldwide including 1 Oracle ACE Director and 2 Oracle ACEs, offering training courses, global services, and consulting
• Founded on the values of collaboration, learning, integrity and getting things done

Unlock the potential of your organization’s data
• Comprehensive service portfolio designed to support the full lifecycle of any analytics solution

Data Integration Architecture


Example - Marketing

• Financial data stored in RDBMS
• Social media data, web logs, Google Analytics, etc., all in various formats
• Bring it all together for analysis
‣ Marketing campaign effect on sales

Relational Data Replication - Oracle GoldenGate


Oracle GoldenGate for Big Data (Then)


Oracle GoldenGate for Big Data (Now)


Streaming Data - Apache Kafka

“Publish-subscribe messaging rethought as a distributed commit log”

Image source: kafka.apache.org/
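
To make the "distributed commit log" idea concrete, here is a minimal Python sketch of a single-partition, in-memory log. It is an illustration only: the names CommitLog, append and poll are assumptions made for this example, not Kafka's client API, and partitions, replication and persistence are left out. What it shows is that publishers only ever append, while each consumer group keeps its own read offset.

```python
from collections import defaultdict


class CommitLog:
    """Toy single-partition, in-memory log: append-only, read by offset."""

    def __init__(self):
        self._records = []                # records are never modified once appended
        self._offsets = defaultdict(int)  # next offset to read, per consumer group

    def append(self, record):
        """Publish a record and return the offset it was assigned."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, group, max_records=10):
        """Return the next records for a group and advance only that group's offset."""
        start = self._offsets[group]
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


log = CommitLog()
for event in ["insert row 1", "update row 1", "delete row 1"]:
    log.append(event)

first_for_reporting = log.poll("reporting", max_records=2)  # reads offsets 0 and 1
all_for_audit = log.poll("audit")                           # independent offset, starts at 0
rest_for_reporting = log.poll("reporting")                  # resumes at offset 2
```

Because reading does not remove anything, a new subscriber can be added later and still replay the stream from the beginning.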


Kafka - How is it used?

• Pure Event Streams
• System Metrics
• Derived Streams
• Hadoop Data Loads / Data Publishing
• Application Logs
• Database Changes
- Log Compaction
- Data cleansing

Image source: confluent.io
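
Log compaction is what makes a topic of database changes usable as a current-state table: for each key, the newest value is retained. The sketch below is a simplified model of that rule in plain Python, not Kafka's implementation. The keys and values are made up, and a value of None stands in for a delete marker, which this sketch drops immediately (a real broker keeps such markers for a configurable period before removing them).

```python
def compact(log):
    """Keep only the newest record per key, preserving the order of the survivors.

    `log` is a list of (key, value) pairs, oldest first. A value of None is
    treated as a delete marker and removes the key entirely.
    """
    # Map each key to the index of its most recent record.
    last_index = {key: i for i, (key, _) in enumerate(log)}
    return [
        (key, value)
        for i, (key, value) in enumerate(log)
        if last_index[key] == i and value is not None
    ]


changes = [
    ("cust:1", "name=Ann"),
    ("cust:2", "name=Bob"),
    ("cust:1", "name=Anne"),  # supersedes the first cust:1 record
    ("cust:2", None),         # delete marker for cust:2
    ("cust:3", "name=Cy"),
]
compacted = compact(changes)
```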


Enterprise Data Bus


A simple example…

One view of the Oracle Data Integrator logs
• ODI session logs - stored in the repository database
• ODI Agent logs - text files

To see the full picture of your ODI environment, they must be combined

Steps to extract from the database

• Prepare the database
• Set up GoldenGate for Oracle Database
- Install and configure
• Set up Manager, Extract and Pump parameter files
• Add Extract and Pump process groups
• Start the Extract and Pump processes

Prepare the Database - OGG User Permissions


Prepare the Database - Logging Settings

• Enable supplemental logging

• Set GoldenGate Replication parameter

Add Table Supplemental Logging


GoldenGate Extract Setup

No change


GoldenGate Manager Parameter File


GoldenGate Extract Parameter File


GoldenGate Pump Parameter File


Adding the Extract and Pump Process Groups


Stream ODI Agent Logs to Kafka via Logstash

• Application log processing is a standard use for Kafka
• Logstash
- Part of the Elastic (formerly ELK) stack
- Robin Moffatt’s post: http://ritt.md/kafka-elk
- Producer configuration for Kafka

Logstash to Kafka - Setup and Startup

• Start up Zookeeper
- Elects controller broker
- Tracks brokers and topic config
- Manages access control and quotas
• Set Kafka server.properties
- Broker ID
- Number of partitions
- Log retention period
- Zookeeper connection
• Start Kafka
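
The four server.properties settings listed above map onto the broker keys broker.id, num.partitions, log.retention.hours and zookeeper.connect. As a rough sketch, the Python below renders them as key=value lines. The values are assumptions for a single local broker (they mirror the sample file shipped with Kafka), and render_properties is a helper written for this example only.

```python
def render_properties(settings):
    """Render a dict as Java-style `key=value` lines, one per setting."""
    return "".join(f"{key}={value}\n" for key, value in settings.items())


# Assumed single-broker, local values; adjust for a real cluster.
server_properties = {
    "broker.id": 0,                         # Broker ID: must be unique per broker
    "num.partitions": 1,                    # default number of partitions for new topics
    "log.retention.hours": 168,             # log retention period (7 days)
    "zookeeper.connect": "localhost:2181",  # Zookeeper connection string
}

rendered = render_properties(server_properties)
print(rendered, end="")
```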

Setup Logstash Configuration File

• Configuration File - logstash-odiagent-kafka-producer.conf

• Start Logstash

ODI Agent Logs to Kafka!

• Start the Kafka Console Consumer - delivered with Kafka

• Start the ODI Agent


GoldenGate Transactions to Kafka


Oracle GoldenGate for Big Data

• Kafka - one of many handlers
- HDFS, HBase, Flume, Hive
• Pluggable Formatters
- Convert trail file transactions to alternate format
- Avro, delimited text, JSON, XML
• Metadata Provider
- Handles mapping of source to target columns that differ in structure/name
- Similar to SOURCEDEF file in GoldenGate
- Avro or Hive
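
The pluggable-formatter idea can be pictured as a lookup from a format name to a function that serializes one change record. The Python below is only a sketch of that pattern: the record layout, the table and column names, and the functions to_json, to_delimited and format_operation are all hypothetical, and the output does not reproduce the layout GoldenGate's own formatters emit.

```python
import csv
import io
import json

# Hypothetical change record; names and structure are invented for illustration.
operation = {
    "table": "APP.ORDERS",
    "op_type": "I",
    "columns": {"ORDER_ID": 1001, "STATUS": "D"},
}


def to_json(op):
    """Serialize the whole record as a JSON document."""
    return json.dumps(op, sort_keys=True).encode("utf-8")


def to_delimited(op, delimiter="|"):
    """Serialize as one delimited line: operation type, table, then column values."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow([op["op_type"], op["table"], *op["columns"].values()])
    return buffer.getvalue().encode("utf-8")


# Registry keyed by format name, so a new format is a new entry, not a code change.
FORMATTERS = {"json": to_json, "delimitedtext": to_delimited}


def format_operation(op, fmt):
    """Look up the formatter registered under `fmt` and apply it."""
    try:
        formatter = FORMATTERS[fmt]
    except KeyError:
        raise ValueError(f"no formatter registered for {fmt!r}") from None
    return formatter(op)


payload_json = format_operation(operation, "json")
payload_text = format_operation(operation, "delimitedtext")
```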

Oracle GoldenGate for Big Data - Kafka Handler Setup

• Standard GoldenGate Extract / Pump processes
- Remember, no change here
• Replicat for Java parameter file & process group
• Kafka Handler configuration
• Kafka Producer properties
- Note: Kafka 0.9.0+ now certified with GoldenGate for Big Data 12.2.1.1

Another approach…

• Kafka Connect Handler (Open Source)
- java.net/downloads/oracledi/GoldenGate
- Uses the Kafka Connect framework
- Can integrate with Confluent Platform & Schema Registry
- Tables = Topics
• Differences?
- OGG for Big Data Kafka Handler uses pluggable formatters
- Kafka Connect Handler builds up schemas and structs via the Kafka Connect API

Oracle GoldenGate for Big Data - Prerequisites

• Zookeeper & Kafka up and running
• Add topic to broker up front vs dynamically
- Option to create a topic per table (OGG for Big Data 12.2.0.1.1)
• Kafka Handler must have access to broker server
• Kafka libraries must match Kafka version

Kafka topic per table

More on this later

GoldenGate and Kafka - Replicat Parameters


GoldenGate and Kafka - Kafka Handler Properties

Properties describe how communication between the GoldenGate adapter and Kafka will occur


• gg.handlerlist = kafkahandler
• gg.handler.kafkahandler.type = kafka
• gg.handler.kafkahandler.KafkaProducerConfigFile = kafka_producer.properties
• gg.handler.kafkahandler.TopicName = odirepo
- Kafka topic name
• gg.handler.kafkahandler.format = xml | delimitedtext | json | avro_row | avro_op
- Pluggable Formatter
- Avro recommended for Kafka…
• gg.handler.kafkahandler.BlockingSend = true | false
- true - synchronous (wait for acknowledgement before sending next message)
• gg.handler.kafkahandler.mode = tx | op
- Transaction vs Operation mode
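
The difference between mode = tx and mode = op is easiest to see in how many Kafka messages come out. The Python sketch below assumes that tx mode concatenates the serialized operations of one source transaction into a single message, while op mode sends one message per operation. That behaviour is stated here as an assumption, and the tuples, table names and build_messages function are invented for the example. Operations are assumed to arrive grouped by transaction.

```python
from itertools import groupby

# Hypothetical serialized operations: (transaction id, table, payload).
operations = [
    ("tx1", "APP.ORDERS", b"insert-1;"),
    ("tx1", "APP.ORDER_LINES", b"insert-2;"),
    ("tx2", "APP.ORDERS", b"update-3;"),
]


def build_messages(ops, mode):
    """Return the message values that would be produced for `ops` in the given mode."""
    if mode == "op":
        # Operation mode: every operation becomes its own message.
        return [payload for _, _, payload in ops]
    if mode == "tx":
        # Transaction mode: payloads of consecutive operations that share a
        # transaction id are concatenated into one message.
        return [
            b"".join(payload for _, _, payload in group)
            for _, group in groupby(ops, key=lambda op: op[0])
        ]
    raise ValueError(f"mode must be 'tx' or 'op', got {mode!r}")


op_messages = build_messages(operations, "op")
tx_messages = build_messages(operations, "tx")
```

Fewer, larger messages in tx mode are also why a big source transaction can run into the producer's request size limit.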


• goldengate.userexit.timestamp = utc
• goldengate.userexit.writers = javawriter
• javawriter.stats.display = TRUE
• javawriter.stats.full = TRUE
• gg.log = log4j
• gg.log.level = INFO
• gg.report.time = 30sec
• gg.classpath = dirprm/:/u01/kafka/kafka_2.10-0.8.2.1/libs/*:
- Location of the Kafka libraries
• javawriter.bootoptions = -Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar

GoldenGate and Kafka - One Topic Per Table

• gg.handler.kafkahandler.topicPartitioning = table
- Option to split schema into one topic per table
- Topics can be created dynamically
• gg.handler.kafkahandler.mode = op
- Operation mode required to track individual table operations
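
Since topic-per-table only works in operation mode, it can be worth checking the properties file before starting the Replicat. The Python below is a small pre-flight check written for this example. parse_properties and check_topic_per_table are hypothetical helpers, not part of GoldenGate, and the parser only handles plain key = value lines.

```python
def parse_properties(text):
    """Parse simple `key = value` lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_topic_per_table(props, handler="kafkahandler"):
    """Return a list of problems found in a topic-per-table handler configuration."""
    prefix = f"gg.handler.{handler}."
    problems = []
    per_table = props.get(prefix + "topicPartitioning") == "table"
    if per_table and props.get(prefix + "mode") != "op":
        problems.append(f"{prefix}mode must be 'op' when topicPartitioning = table")
    return problems


good = parse_properties("""
gg.handlerlist = kafkahandler
gg.handler.kafkahandler.type = kafka
gg.handler.kafkahandler.topicPartitioning = table
gg.handler.kafkahandler.mode = op
""")
# Same configuration, but left in transaction mode.
bad = dict(good, **{"gg.handler.kafkahandler.mode": "tx"})

print(check_topic_per_table(good))
print(check_topic_per_table(bad))
```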

GoldenGate and Kafka - Kafka Producer Configuration

• Access to the Kafka producer configuration parameters

GoldenGate and Kafka - Startup

• Create a topic in Kafka (or one per table)
• Add Replicat process group to GoldenGate on target
• Start Kafka console consumer
• Start GoldenGate Extract/Pump on source, Replicat on target

GoldenGate and Kafka Integration Complete!


Schemas

• Schema automatically created
- Stored in <ogg_home>/dirdef directory
- Based on gg.handler.kafkahandler.format setting


GoldenGate Big Data Adapter - What to Think About

• GoldenGate might be a single point of failure
- Kafka is a fault-tolerant, distributed system
• Source transactions may end up larger than expected
- max.request.size
• Performance considerations
- batch.size and linger.ms
  • higher values = increased latency, better throughput
- BlockingSend = false and Mode = tx
- GROUPTRANSOPS
• Monitoring
- Confluent? Custom?
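
The batch.size and linger.ms trade-off can be shown with a small simulation. The Python below is a simplified model, not the Kafka producer: it sends a batch when the next record would not fit in batch_size bytes, or when linger_ms has passed since the batch's first record. The function batch_sends and the arrival pattern are assumptions made for illustration, and a real producer can still batch records with linger.ms = 0 when they queue up faster than they are sent.

```python
def batch_sends(arrivals, batch_size, linger_ms):
    """Simulate producer batching.

    `arrivals` is a list of (arrival_ms, size_bytes) sorted by time.
    Returns a list of (send_ms, record_count) tuples, one per batch sent.
    """
    sends = []
    first_ms, used, count = None, 0, 0
    for arrival_ms, size in arrivals:
        # The linger timer ran out before this record arrived: send what we have.
        if count and arrival_ms - first_ms >= linger_ms:
            sends.append((first_ms + linger_ms, count))
            first_ms, used, count = None, 0, 0
        # The record does not fit in the current batch: send the batch now.
        if count and used + size > batch_size:
            sends.append((arrival_ms, count))
            first_ms, used, count = None, 0, 0
        if count == 0:
            first_ms = arrival_ms  # this record opens a new batch
        used += size
        count += 1
    if count:
        # No more records: the last batch goes out when its linger timer expires.
        sends.append((first_ms + linger_ms, count))
    return sends


# Ten 100-byte records arriving 1 ms apart.
arrivals = [(ms, 100) for ms in range(10)]

low_latency = batch_sends(arrivals, batch_size=16384, linger_ms=0)
high_throughput = batch_sends(arrivals, batch_size=16384, linger_ms=5)
```

With linger_ms=0 every record goes out on its own (ten requests); with linger_ms=5 the same records go out in two requests, but the first record waits 5 ms before it is sent.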

Why GoldenGate with Kafka?

• GoldenGate
- …is non-invasive
- …has checkpoints for recovery
- …moves data quickly
- …is easy to set up

Data Integration Architecture - Kafka throughout


Questions?


• Websites
- kafka.apache.org
- rittmanmead.com/blog
• Contact
- info@rittmanmead.com
- michael.rainey@rittmanmead.com
• Twitter
- @rittmanmead
- @apachekafka
- @mRainey