Oracle GoldenGate and Apache Kafka: A Deep Dive Into Real-Time Data Streaming

Post on 09-Jan-2017

Category:

Data & Analytics

TRANSCRIPT

<ul><li><p>info@rittmanmead.com www.rittmanmead.com @rittmanmead </p><p>Michael Rainey | Oracle OpenWorld 2016</p><p>Oracle GoldenGate and Apache Kafka: A Deep Dive Into Real-Time Data Streaming</p></li><li><p>Introduction</p><p>Michael Rainey - Data Integration Lead</p><p>- Oracle Data Integration expertise</p><p>- Blog: http://ritt.md/mRainey</p><p>- Oracle ACE Director</p><p>@mRainey</p></li><li><p>About Rittman Mead</p><p>World's leading specialist partner for technical excellence, solutions delivery and innovation in Oracle Data Integration, Business Intelligence, Analytics and Big Data</p><p>Providing our customers targeted expertise; we are a company that doesn't try to do everything, only what we excel at</p><p>70+ consultants worldwide including 1 Oracle ACE Director and 2 Oracle ACEs, offering training courses, global services, and consulting</p><p>Founded on the values of collaboration, learning, integrity and getting things done</p><p>Unlock the potential of your organization's data</p><p>Comprehensive service portfolio designed to support the full lifecycle of any analytics solution</p></li><li><p>Data Integration Architecture</p></li><li><p>Example - Marketing</p><p>Financial data stored in RDBMS</p><p>Social media data, web logs, Google Analytics, etc., all in various formats</p><p>Bring it all together for analysis</p><p>Marketing campaign effect on sales</p></li><li><p>Relational Data Replication - 
Oracle GoldenGate</p></li><li><p>Oracle GoldenGate for Big Data (Then)</p></li><li><p>Oracle GoldenGate for Big Data (Now)</p></li><li><p>Streaming Data - Apache Kafka</p><p>Publish-subscribe messaging rethought as a distributed commit log</p><p>Image source: http://kafka.apache.org/</p></li><li><p>Streaming Data - Apache Kafka</p><p>Image source: http://kafka.apache.org/</p></li><li><p>Kafka - How is it used?</p><p>Pure Event Streams</p><p>System Metrics</p><p>Derived Streams</p><p>Hadoop Data Loads / Data Publishing</p><p>Application Logs</p><p>Database Changes</p><p>- Log Compaction</p><p>- Data cleansing</p><p>Image source: http://confluent.io</p></li><li><p>Enterprise Data Bus</p></li><li><p>A simple example</p><p>One view of the Oracle Data Integrator logs</p><p>ODI session logs - stored in the repository database</p><p>ODI Agent logs - text files</p><p>To see the full picture of your ODI environment, they must be 
combined</p></li><li><p>Steps to extract from the database</p><p>Prepare the database</p><p>Set up GoldenGate for Oracle Database</p><p>- Install and configure</p><p>Set up Manager, Extract and Pump parameter files</p><p>Add Extract and Pump process groups</p><p>Start the Extract and Pump processes</p></li><li><p>Prepare the Database - OGG User Permissions</p></li><li><p>Prepare the Database - Logging Settings</p><p>Enable supplemental logging</p><p>Set GoldenGate Replication parameter</p></li><li><p>Add Table Supplemental Logging</p></li><li><p>GoldenGate Extract Setup</p><p>No change</p></li><li><p>GoldenGate Manager Parameter File</p></li><li><p>GoldenGate Extract Parameter File</p></li><li><p>GoldenGate Pump Parameter File</p></li><li><p>Adding the Extract and Pump Process Groups</p></li><li><p>Stream ODI Agent Logs to Kafka via Logstash</p><p>Application log processing is a standard use for Kafka</p><p>Logstash</p><p>- Part of the Elastic (formerly ELK) stack</p><p>- Robin Moffatt's post: http://ritt.md/kafka-elk</p><p>- Producer configuration for Kafka</p></li><li><p>Logstash to Kafka - Setup and Startup</p><p>Start up ZooKeeper</p><p>- Elects controller broker</p><p>- Tracks brokers and 
topic config</p><p>- Manages access control and quotas</p><p>Set Kafka server.properties</p><p>- Broker ID</p><p>- Number of partitions</p><p>- Log retention period</p><p>- ZooKeeper connection</p><p>Start Kafka</p></li><li><p>Set Up Logstash Configuration File</p><p>Configuration File - logstash-odiagent-kafka-producer.conf</p><p>Start Logstash</p></li><li><p>ODI Agent Logs to Kafka!</p><p>Start the Kafka Console Consumer - delivered with Kafka</p><p>Start the ODI Agent</p></li><li><p>GoldenGate Transactions to Kafka</p></li><li><p>Oracle GoldenGate for Big Data</p><p>Kafka - one of many handlers</p><p>- HDFS, HBase, Flume, Hive</p><p>Pluggable Formatters</p><p>- Convert trail file transactions to alternate format</p><p>- Avro, delimited text, JSON, XML</p><p>Metadata Provider</p><p>- Handles mapping of source to target columns that differ in structure/name</p><p>- Similar to SOURCEDEF file in GoldenGate</p><p>- Avro or Hive</p></li><li><p>Oracle GoldenGate for Big Data - Kafka Handler Setup</p><p>Standard GoldenGate Extract / Pump processes</p><p>- Remember, no change here</p><p>Replicat for Java parameter file &amp; process group</p><p>Kafka Handler configuration</p><p>Kafka Producer properties</p><p>- Note: Kafka 0.9.0+ now certified with GoldenGate for Big Data 12.2.1.1</p></li><li><p>Another approach</p><p>Kafka Connect Handler (Open Source)</p><p>- java.net/downloads/oracledi/GoldenGate</p><p>- Uses the Kafka Connect framework</p><p>- Can integrate with Confluent Platform &amp; Schema Registry</p><p>- Tables = 
Topics</p><p>Differences?</p><p>- OGG for Big Data Kafka Handler uses pluggable formatters</p><p>- Kafka Connect Handler builds up schemas and structs via the Kafka Connect API</p></li><li><p>Oracle GoldenGate for Big Data - Prerequisites</p><p>ZooKeeper &amp; Kafka up and running</p><p>Add topic to broker up front vs dynamically</p><p>- Option to create a topic per table (OGG for Big Data 12.2.0.1.1)</p><p>Kafka Handler must have access to broker server</p><p>Kafka libraries must match Kafka version</p></li><li><p>Kafka topic per table</p><p>More on this later</p></li><li><p>GoldenGate and Kafka - Replicat Parameters</p></li><li><p>GoldenGate and Kafka - Kafka Handler Properties</p><p>Properties describe how communication between the GoldenGate adapter and Kafka will occur</p></li><li><p>GoldenGate and Kafka - Kafka Handler Properties</p><p>gg.handlerlist = kafkahandler</p><p>gg.handler.kafkahandler.type = kafka</p><p>gg.handler.kafkahandler.KafkaProducerConfigFile = kafka_producer.properties</p><p>gg.handler.kafkahandler.TopicName = odirepo</p><p>- Kafka topic name</p><p>gg.handler.kafkahandler.format = xml | delimitedtext | json | avro_row | avro_op</p><p>- Pluggable Formatter</p><p>- Avro recommended for Kafka</p><p>gg.handler.kafkahandler.BlockingSend = true | false</p><p>- true = synchronous (wait for acknowledgement before sending next message)</p><p>gg.handler.kafkahandler.mode = tx | op</p><p>- Transaction vs Operation mode</p></li><li><p>GoldenGate and Kafka - Kafka Handler Properties</p><p>goldengate.userexit.timestamp = utc</p><p>
goldengate.userexit.writers = javawriter</p><p>javawriter.stats.display = TRUE</p><p>javawriter.stats.full = TRUE</p><p>gg.log = log4j</p><p>gg.log.level = INFO</p><p>gg.report.time = 30sec</p><p>gg.classpath = dirprm/:/u01/kafka/kafka_2.10-0.8.2.1/libs/*:</p><p>- Location of the Kafka libraries</p><p>javawriter.bootoptions = -Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar</p></li><li><p>GoldenGate and Kafka - One Topic Per Table</p><p>gg.handler.kafkahandler.topicPartitioning = table</p><p>- Option to split schema into one topic per table</p><p>- Topics can be created dynamically</p><p>gg.handler.kafkahandler.mode = op</p><p>- Operation mode required to track individual table operations</p></li><li><p>GoldenGate and Kafka - Kafka Producer Configuration</p><p>Access to the Kafka producer configuration parameters</p></li><li><p>GoldenGate and Kafka - Startup</p><p>Create a topic in Kafka (or one per table)</p><p>Add Replicat process group to GoldenGate on target</p><p>Start Kafka console consumer</p><p>Start GoldenGate extract/pump on source, replicat on target</p></li><li><p>GoldenGate and Kafka Integration Complete!</p></li><li><p>Schemas</p><p>Schema automatically created 
</p><p>- Stored in /dirdef directory</p><p>- Based on gg.handler.kafkahandler.format setting</p></li><li><p>GoldenGate Big Data Adapter - What to Think About</p><p>GoldenGate might be a single point of failure</p><p>- Kafka is a fault-tolerant, distributed system</p><p>Source transactions may end up larger than expected</p><p>- max.request.size</p><p>Performance considerations</p><p>- batch.size and linger.ms: higher values = increased latency, better throughput</p><p>- BlockingSend = false and Mode = tx</p><p>- GROUPTRANSOPS</p><p>Monitoring</p><p>- Confluent? Custom?</p></li><li><p>Why GoldenGate with Kafka?</p><p>GoldenGate</p><p>- is non-invasive</p><p>- has checkpoints for recovery</p><p>- moves data quickly</p><p>- is easy to set up</p></li><li><p>Data Integration Architecture - Kafka throughout</p></li><li><p>Questions?</p><p>Websites</p><p>- kafka.apache.org</p><p>- rittmanmead.com/blog</p><p>Contact</p><p>- info@rittmanmead.com</p><p>- michael.rainey@rittmanmead.com</p><p>Twitter</p><p>- @rittmanmead</p><p>- @apachekafka</p><p>- @mRainey</p></li></ul>
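
The Kafka Handler properties slide lists the settings as one run of text, so it may help to see them assembled into a file. The following is a minimal sketch, not part of the original deck: the property keys and allowed values are taken from the slides, while the function name, its arguments, and the chosen defaults are hypothetical and only illustrate how the handler section could be generated and sanity-checked.

```python
# Sketch only: renders the Kafka Handler properties named on the slides into
# the text of a Java-style .properties file. The function name and defaults
# are hypothetical; the property keys and allowed values come from the slides.

ALLOWED_FORMATS = {"xml", "delimitedtext", "json", "avro_row", "avro_op"}
ALLOWED_MODES = {"tx", "op"}


def render_kafka_handler_properties(
    topic_name="odirepo",
    fmt="avro_op",
    mode="tx",
    blocking_send=False,
    producer_config_file="kafka_producer.properties",
    handler="kafkahandler",
):
    """Return the handler section of the properties file as a string."""
    # Reject values the slides do not list as options.
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported mode: {mode!r}")

    prefix = f"gg.handler.{handler}"
    lines = [
        f"gg.handlerlist = {handler}",
        f"{prefix}.type = kafka",
        f"{prefix}.KafkaProducerConfigFile = {producer_config_file}",
        f"{prefix}.TopicName = {topic_name}",
        f"{prefix}.format = {fmt}",
        f"{prefix}.BlockingSend = {'true' if blocking_send else 'false'}",
        f"{prefix}.mode = {mode}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_kafka_handler_properties(), end="")
```

Generating the file from one function keeps the handler name consistent across every key, which is the easiest thing to get wrong when editing the properties by hand.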

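The "What to Think About" slide notes that higher batch.size and linger.ms values mean increased latency but better throughput. The toy model below is one way to see that trade-off; it is an illustrative simulation under simplifying assumptions (fixed record size, a single partition, a batch sent when its linger deadline passes or when the next record would not fit) and is not how the Kafka producer client is actually implemented. All function names are hypothetical.

```python
# Toy model of producer batching, for illustration only. It is NOT the Kafka
# client's algorithm; it just mimics the effect of batch.size and linger.ms.


def simulate_batching(arrivals_ms, record_bytes, batch_size, linger_ms):
    """Group records into batches; return a list of (send_time_ms, arrivals)."""
    batches = []
    current = []  # arrival times of the records in the open batch
    for t in arrivals_ms:
        if current:
            deadline = current[0] + linger_ms
            would_overflow = (len(current) + 1) * record_bytes > batch_size
            if t >= deadline:
                # Linger time ran out before this record arrived.
                batches.append((deadline, current))
                current = []
            elif would_overflow:
                # The open batch is full, so it goes out now.
                batches.append((t, current))
                current = []
        current.append(t)
    if current:
        # The last batch waits out its full linger time.
        batches.append((current[0] + linger_ms, current))
    return batches


def summarize(batches):
    """Return (number of requests, average time a record waited in ms)."""
    waits = [send - t for send, records in batches for t in records]
    return len(batches), sum(waits) / len(waits)


if __name__ == "__main__":
    arrivals = list(range(0, 100, 10))  # one 100-byte record every 10 ms
    for linger in (0, 50):
        requests, avg_wait = summarize(
            simulate_batching(arrivals, 100, 16384, linger)
        )
        print(f"linger.ms={linger}: {requests} requests, "
              f"avg wait {avg_wait} ms")
```

In this model a linger of 0 ms sends every record on its own with no added wait, while a linger of 50 ms collapses the same ten records into two requests at the cost of each record waiting 30 ms on average.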