Oracle GoldenGate and Apache Kafka: A Deep Dive Into Real-Time Data Streaming

Post on 09-Jan-2017

  • info@rittmanmead.com www.rittmanmead.com @rittmanmead

    Michael Rainey | Oracle OpenWorld 2016

  • Introduction

    Michael Rainey - Data Integration Lead

    - Oracle Data Integration expertise
    - Blog: http://ritt.md/mRainey
    - Oracle ACE Director
    - Twitter: @mRainey

  • About Rittman Mead

    World's leading specialist partner for technical excellence, solutions delivery and innovation in Oracle Data Integration, Business Intelligence, Analytics and Big Data

    Providing our customers with targeted expertise; we are a company that doesn't try to do everything, only what we excel at

    70+ consultants worldwide including 1 Oracle ACE Director and 2 Oracle ACEs, offering training courses, global services, and consulting

    Founded on the values of collaboration, learning, integrity and getting things done

    Unlock the potential of your organization's data

    Comprehensive service portfolio designed to support the full lifecycle of any analytics solution

  • Data Integration Architecture

  • Example - Marketing

    Financial data stored in RDBMS

    Social media data, web logs, Google Analytics, etc., all in various formats

    Bring it all together for analysis

    Marketing campaign effect on sales

  • Relational Data Replication - Oracle GoldenGate

  • Oracle GoldenGate for Big Data (Then)

  • Oracle GoldenGate for Big Data (Now)

  • Streaming Data - Apache Kafka

    Publish-subscribe messaging rethought as a distributed commit log

    Image source: http://kafka.apache.org/
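The "distributed commit log" idea on this slide can be illustrated with a minimal in-memory sketch. It is not Kafka's implementation: it models a single partition on one machine, and the class name `CommitLog` and the subscriber names are invented for illustration. What it does show is that records are only ever appended, each record gets an offset, and every subscriber keeps its own position.

```python
from dataclasses import dataclass, field


@dataclass
class CommitLog:
    """Append-only log for a single topic partition (in memory only)."""
    records: list = field(default_factory=list)

    def append(self, value):
        """Add a record to the end of the log and return its offset."""
        self.records.append(value)
        return len(self.records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records starting at the given offset."""
        return self.records[offset:offset + max_records]


log = CommitLog()
for event in ["insert row 1", "update row 1", "delete row 1"]:
    log.append(event)

# Each subscriber tracks its own position, so reading never removes data.
offsets = {"reporting": 0, "audit": 2}
reporting_batch = log.read(offsets["reporting"])
audit_batch = log.read(offsets["audit"])
print(reporting_batch)
print(audit_batch)
```

Because consumers only move their own offset, a slow or newly added subscriber can replay the log from any retained position without affecting the others.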

  • Kafka - How is it used?

    Pure Event Streams

    System Metrics

    Derived Streams

    Hadoop Data Loads / Data Publishing

    Application Logs

    Database Changes

    - Log Compaction
    - Data cleansing

    Image source: http://confluent.io
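Log compaction, listed above under database changes, keeps the latest record for each key so that a topic can act as a current-state snapshot of a table. The following is a simplified sketch of that idea only. It treats a `None` value as a delete marker and drops the key immediately, whereas Kafka retains such markers for a configurable period. The keys and values are invented for illustration.

```python
def compact(log):
    """Keep only the most recent record for each key, preserving log order.

    `log` is a list of (key, value) pairs. A value of None is treated as a
    delete marker and removes the key from the compacted result.
    """
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later offsets overwrite earlier ones
    survivors = sorted(
        (offset, key, value)
        for key, (offset, value) in latest.items()
        if value is not None
    )
    return [(key, value) for _, key, value in survivors]


changes = [
    ("cust:1", "Ann"),
    ("cust:2", "Bob"),
    ("cust:1", "Anne"),   # update replaces the earlier value for cust:1
    ("cust:2", None),     # delete marker for cust:2
    ("cust:3", "Cy"),
]
compacted = compact(changes)
print(compacted)
```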

  • Enterprise Data Bus

  • A simple example

    One view of the Oracle Data Integrator logs

    ODI session logs - stored in the repository database

    ODI Agent logs - text files

    To see the full picture of your ODI environment, they must be combined

  • Steps to extract from the database

    Prepare the database

    Set up GoldenGate for Oracle Database

    - Install and configure

    Set up Manager, Extract and Pump parameter files

    Add Extract and Pump process groups

    Start the Extract and Pump processes
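The last two steps above are normally typed into GGSCI or placed in an obey file. As a rough sketch, the helper below generates the usual command sequence for a classic-capture Extract and a Pump. The group names `EODI` and `PODI` and the trail prefixes are hypothetical, and an integrated-capture Extract would need different options (for example, registering the Extract with the database first).

```python
def ggsci_setup_commands(extract, pump, local_trail, remote_trail):
    """Return GGSCI commands that register and start an Extract and a Pump."""
    return [
        # Extract reads the database transaction log and writes a local trail.
        f"ADD EXTRACT {extract}, TRANLOG, BEGIN NOW",
        f"ADD EXTTRAIL {local_trail}, EXTRACT {extract}",
        # Pump reads the local trail and writes a trail on the target server.
        f"ADD EXTRACT {pump}, EXTTRAILSOURCE {local_trail}",
        f"ADD RMTTRAIL {remote_trail}, EXTRACT {pump}",
        f"START EXTRACT {extract}",
        f"START EXTRACT {pump}",
    ]


# Hypothetical group and trail names, for illustration only.
commands = ggsci_setup_commands("EODI", "PODI", "./dirdat/oe", "./dirdat/pe")
print("\n".join(commands))
```

Writing the printed lines to a file and running it with GGSCI's `OBEY` command is one way to make the setup repeatable.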

  • Prepare the Database - OGG User Permissions

  • Prepare the Database - Logging Settings

    Enable supplemental logging

    Set GoldenGate Replication parameter
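The statements behind these two bullets were shown as a screenshot in the original slide and did not survive in the transcript. As a hedged sketch, the two settings are commonly applied with the statements below. The helper accepts any Python DB-API cursor, and the `RecordingCursor` class is a stand-in invented here so the example runs without a database. A privileged account would be needed to run these for real.

```python
LOGGING_STATEMENTS = [
    # Minimal database-level supplemental logging.
    "ALTER DATABASE ADD SUPPLEMENTAL LOG DATA",
    # Allow GoldenGate capture on databases that have this parameter.
    "ALTER SYSTEM SET ENABLE_GOLDENGATE_REPLICATION = TRUE SCOPE = BOTH",
]


def apply_logging_settings(cursor, statements=LOGGING_STATEMENTS):
    """Run each statement on a DB-API cursor and return how many were run."""
    for statement in statements:
        cursor.execute(statement)
    return len(statements)


class RecordingCursor:
    """Stand-in cursor that records statements instead of contacting a database."""

    def __init__(self):
        self.executed = []

    def execute(self, statement):
        self.executed.append(statement)


cursor = RecordingCursor()
count = apply_logging_settings(cursor)
print(count, "statements prepared")
```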

  • Add Table Supplemental Logging

  • GoldenGate Extract Setup

    No change

  • GoldenGate Manager Parameter File

  • GoldenGate Extract Parameter File

  • GoldenGate Pump Parameter File

  • Adding the Extract and Pump Process Groups

  • Stream ODI Agent Logs to Kafka via Logstash

    Application log processing is a standard use for Kafka

    Logstash

    - Part of the Elastic (formerly ELK) stack
    - Robin Moffatt's post: http://ritt.md/kafka-elk
    - Producer configuration for Kafka

  • Logstash to Kafka - Setup and Startup

    Start up Zookeeper

    - Elects controller broker
    - Tracks brokers and topic config
    - Manages access control and quotas

    Set Kafka server.properties

    - Broker ID
    - Number of partitions
    - Log retention period
    - Zookeeper connection

    Start Kafka
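The four server.properties settings listed above map onto the broker keys `broker.id`, `num.partitions`, `log.retention.hours` and `zookeeper.connect`. The sketch below renders them as properties text. The values are assumptions suitable for a single local broker, not the values used in the talk, and `render_properties` is a helper written for this example.

```python
def render_properties(settings):
    """Render a dict as Java-style properties text, one key=value per line."""
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


broker_settings = {
    "broker.id": 0,                         # unique ID for this broker
    "num.partitions": 1,                    # default partitions for new topics
    "log.retention.hours": 168,             # keep messages for seven days
    "zookeeper.connect": "localhost:2181",  # where Zookeeper is listening
}

server_properties = render_properties(broker_settings)
print(server_properties)
```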

  • Setup Logstash Configuration File

    Configuration File - logstash-odiagent-kafka-producer.conf

    Start Logstash

  • ODI Agent Logs to Kafka!

    Start the Kafka Console Consumer - delivered with Kafka

    Start the ODI Agent

  • GoldenGate Transactions to Kafka

  • Oracle GoldenGate for Big Data

    Kafka - one of many handlers

    - HDFS, HBase, Flume, Hive

    Pluggable Formatters

    - Convert trail file transactions to alternate format
    - Avro, delimited text, JSON, XML

    Metadata Provider

    - Handles mapping of source to target columns that differ in structure/name
    - Similar to SOURCEDEF file in GoldenGate
    - Avro or Hive
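To make the pluggable formatter idea concrete, here is a small sketch that serializes one change record in three of the formats named above. It does not reproduce the actual output layout of the GoldenGate formatters: the record structure, field names and table name are hypothetical. Avro is left out because it needs a third-party library.

```python
import json
from xml.etree import ElementTree as ET

# Hypothetical change record; the field names are illustrative only.
operation = {
    "table": "ODI_REPO.SNP_SESSION",
    "op_type": "I",
    "columns": {"SESS_NO": 1001, "SESS_NAME": "LOAD_SALES"},
}


def format_json(op):
    """Serialize the whole record as a JSON document."""
    return json.dumps(op, sort_keys=True)


def format_delimited(op, delimiter="|"):
    """Serialize as operation type, table name, then column values."""
    fields = [op["op_type"], op["table"]] + [str(v) for v in op["columns"].values()]
    return delimiter.join(fields)


def format_xml(op):
    """Serialize as one element per operation with a child element per column."""
    root = ET.Element("operation", table=op["table"], type=op["op_type"])
    for name, value in op["columns"].items():
        ET.SubElement(root, "col", name=name).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Selecting a formatter by name mirrors choosing one through a property value.
FORMATTERS = {"json": format_json, "delimitedtext": format_delimited, "xml": format_xml}

for name, formatter in FORMATTERS.items():
    print(name, "->", formatter(operation))
```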

  • Oracle GoldenGate for Big Data - Kafka Handler Setup

    Standard GoldenGate Extract / Pump processes

    - Remember, no change here

    Replicat for Java parameter file & process group

    Kafka Handler configuration

    Kafka Producer properties

    - Note: Kafka 0.9.0+ now certified with GoldenGate for Big Data 12.2.1.1

  • Another approach

    Kafka Connect Handler (Open Source)

    - http://java.net/downloads/oracledi/GoldenGate
    - Uses the Kafka Connect framework
    - Can integrate with Confluent Platform & Schema Registry
    - Tables = Topics

    Differences?

    - OGG for Big Data Kafka Handler uses pluggable formatters
    - Kafka Connect Handler builds up schemas and structs via the Kafka Connect API

  • Oracle GoldenGate for Big Data - Prerequisites

    Zookeeper & Kafka up and running

    Add topic to broker up front vs dynamically

    - Option to create a topic per table (OGG for Big Data 12.2.0.1.1)

    Kafka Handler must have access to broker server

    Kafka libraries must match Kafka version

  • Kafka topic per table

    More on this later

  • GoldenGate and Kafka - Replicat Parameters

  • GoldenGate and Kafka - Kafka Handler Properties

    Properties describe how communication between the GoldenGate adapter and Kafka will occur

  • GoldenGate and Kafka - Kafka Handler Properties

    gg.handlerlist = kafkahandler

    gg.handler.kafkahandler.type = kafka

    gg.handler.kafkahandler.KafkaProducerConfigFile = kafka_producer.properties

    gg.handler.kafkahandler.TopicName = odirepo

    - Kafka topic name

    gg.handler.kafkahandler.format = xml | delimitedtext | json | avro_row | avro_op

    - Pluggable Formatter
    - Avro recommended for Kafka

    gg.handler.kafkahandler.BlockingSend = true | false

    - true - synchronous (wait for acknowledgement before sending next message)

    gg.handler.kafkahandler.mode = tx | op

    - Transaction vs Operation mode
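The difference between the two `mode` values is easiest to see by counting messages. The sketch below assumes, as a simplification, that `op` mode sends one Kafka message per source operation and that `tx` mode combines all operations of a source transaction into a single message. The transaction IDs, table names and tuple layout are invented for illustration.

```python
from itertools import groupby

# Hypothetical operations read from a trail, in commit order: (txn_id, table, op).
operations = [
    (1, "SNP_SESSION", "INSERT"),
    (1, "SNP_SESS_TASK", "INSERT"),
    (2, "SNP_SESSION", "UPDATE"),
]


def to_messages(ops, mode):
    """Group operations into message payloads for 'op' or 'tx' mode."""
    if mode == "op":
        # One message per operation.
        return [[op] for op in ops]
    if mode == "tx":
        # One message per transaction; relies on ops being in commit order.
        return [list(group) for _, group in groupby(ops, key=lambda op: op[0])]
    raise ValueError(f"unknown mode: {mode}")


op_messages = to_messages(operations, "op")
tx_messages = to_messages(operations, "tx")
print(len(op_messages), "messages in op mode")
print(len(tx_messages), "messages in tx mode")
```

Fewer, larger messages in `tx` mode reduce the number of sends, at the cost of bigger requests and of mixing several tables in one message.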

  • GoldenGate and Kafka - Kafka Handler Properties

    goldengate.userexit.timestamp = utc

    goldengate.userexit.writers = javawriter

    javawriter.stats.display = TRUE

    javawriter.stats.full = TRUE

    gg.log = log4j

    gg.log.level = INFO

    gg.report.time = 30sec

    gg.classpath = dirprm/:/u01/kafka/kafka_2.10-0.8.2.1/libs/*:

    - Location of the Kafka libraries

    javawriter.bootoptions = -Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar

  • GoldenGate and Kafka - One Topic Per Table

    gg.handler.kafkahandler.topicPartitioning = table

    - Option to split schema into one topic per table

    - Topics can be created dynamically

    gg.handler.kafkahandler.mode = op

    - Operation mode required to track individual table operations
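A short sketch of table-based routing shows why operation mode goes with this setting: a transaction can touch several tables, so it cannot be sent whole to any single table's topic. The example assumes the topic name is simply the qualified table name, which is an assumption made here rather than something stated on the slide. The table names are illustrative.

```python
from collections import defaultdict


def route_by_table(ops):
    """Group operations by topic, using the qualified table name as the topic."""
    topics = defaultdict(list)
    for table, op_type in ops:
        topics[table].append(op_type)
    return dict(topics)


# One hypothetical transaction that spans two tables.
transaction = [
    ("ODI_REPO.SNP_SESSION", "INSERT"),
    ("ODI_REPO.SNP_SESS_TASK", "INSERT"),
    ("ODI_REPO.SNP_SESSION", "UPDATE"),
]

topics = route_by_table(transaction)
for topic, ops in topics.items():
    print(topic, ops)
```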

  • GoldenGate and Kafka - Kafka Producer Configuration

    Access to the Kafka producer configuration parameters

  • GoldenGate and Kafka - Startup

    Create a topic in Kafka (or one per table)

    Add Replicat process group to GoldenGate on target

    Start Kafka console consumer

    Start GoldenGate extract/pump on source, replicat on target

  • GoldenGate and Kafka Integration Complete!

  • Schemas

    Schema automatically created

    - Stored in /dirdef directory
    - Based on gg.handler.kafkahandler.format setting

  • GoldenGate Big Data Adapter - What to Think About

    GoldenGate might be a single point of failure

    - Kafka is a fault-tolerant, distributed system

    Source transactions may end up larger than expected

    - max.request.size

    Performance considerations

    - batch.size and linger.ms: higher values = increased latency, better throughput
    - BlockingSend = false and Mode = tx
    - GROUPTRANSOPS

    Monitoring

    - Confluent? Custom?
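The latency versus throughput trade-off behind `batch.size` and `linger.ms` can be shown with a toy simulation. This is a deliberately simplified model, not the Kafka producer's actual algorithm: a batch is closed either when the next record would push it past the size limit or when a record arrives after the linger window has passed. The arrival pattern and record sizes are invented.

```python
def simulate_batches(arrivals, batch_size, linger_ms):
    """Return a list of batches, each a list of arrival times in milliseconds.

    `arrivals` is a list of (arrival_ms, size_bytes) pairs in time order.
    A batch is closed when adding a record would exceed batch_size bytes, or
    when a record arrives more than linger_ms after the batch's first record.
    """
    batches, current, current_bytes = [], [], 0
    for arrival_ms, size in arrivals:
        expired = current and arrival_ms - current[0] > linger_ms
        full = current and current_bytes + size > batch_size
        if expired or full:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(arrival_ms)
        current_bytes += size
    if current:
        batches.append(current)
    return batches


# Ten 100-byte records arriving 1 ms apart.
arrivals = [(t, 100) for t in range(10)]
low_latency = simulate_batches(arrivals, batch_size=16384, linger_ms=0)
high_throughput = simulate_batches(arrivals, batch_size=16384, linger_ms=5)
print(len(low_latency), "requests with linger.ms=0")
print(len(high_throughput), "requests with linger.ms=5")
```

In this model a longer linger window sends the same records in fewer requests, while the first record of each batch waits longer before it leaves the producer.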

  • Why GoldenGate with Kafka?

    GoldenGate

    - is non-invasive
    - has checkpoints for recovery
    - moves data quickly
    - is easy to set up

  • Data Integration Architecture - Kafka throughout

  • Questions?

    Websites

    - kafka.apache.org
    - rittmanmead.com/blog

    Contact

    - info@rittmanmead.com
    - michael.rainey@rittmanmead.com

    Twitter

    - @rittmanmead
    - @apachekafka
    - @mRainey