Kafka Connect & Streams - the ecosystem around Kafka


TRANSCRIPT

1. Kafka Connect & Streams - the Ecosystem around Kafka
   Guido Schmutz, @gschmutz, DOAG 2017

2. Guido Schmutz
   - Working at Trivadis for more than 20 years
   - Oracle ACE Director for Fusion Middleware and SOA
   - Consultant, Trainer, Software Architect for Java, Oracle, SOA and Big Data / Fast Data
   - Head of Trivadis Architecture Board
   - Technology Manager @ Trivadis
   - More than 30 years of software development experience
   - Contact: guido.schmutz@trivadis.com
   - Blog: http://guidoschmutz.wordpress.com
   - Slideshare: http://www.slideshare.net/gschmutz
   - Twitter: gschmutz

3. Our company
   Trivadis is a market leader in IT consulting, system integration, solution engineering and the provision of IT services, focusing on … technologies, in Switzerland, Germany, Austria and Denmark. We offer our services in the following strategic business fields: … Trivadis Services takes over the operation of your IT systems.

4. With over 600 specialists and IT experts in your region
   Branches: Copenhagen, Munich, Lausanne, Bern, Zurich, Brugg, Geneva, Hamburg, Düsseldorf, Frankfurt, Stuttgart, Freiburg, Basel, Vienna
   - 14 Trivadis branches and more than 600 employees
   - 200 Service Level Agreements
   - Over 4,000 training participants
   - Research and development budget: CHF 5.0 million
   - Financially self-supporting and sustainably profitable
   - Experience from more than 1,900 projects per year at over 800 customers

5. Agenda
   1. What is Apache Kafka?
   2. Kafka Connect
   3. Kafka Streams
   4. KSQL
   5. Kafka and "Big Data" / "Fast Data" Ecosystem
   6. Kafka in Software Architecture

6. Demo Example
   Trucks (Truck-1, Truck-2, Truck-3) publish their positions to MQTT topics truck/nn/position. An mqtt-source connector writes them to the Kafka topic truck_position, a detect_dangerous_driving stream writes to dangerous_driving, a jdbc-source loads the Truck Driver table into trucking_driver, and join_dangerous_driving_driver produces dangerous_driving_driver, which is read by a console consumer.
   Sample position message:
   2016-06-02 14:39:56.605|98|27|803014426|Wichita to Little Rock Route2|Normal|38.65|90.21|5187297736652502631
   Sample driver record (source row and resulting JSON message):
   27,Walter,Ward,Y,24-JUL-85,2017-10-02 15:19:00
   {"id":27,"firstName":"Walter","lastName":"Ward","available":"Y","birthdate":"24-JUL-85","last_update":1506923052012}

7. What is Apache Kafka?

8. Apache Kafka - History
   - 0.7 (2012): cluster mirroring, data compression
   - 0.8 (2013): intra-cluster replication
   - 0.9 (2015): data integration (Connect API)
   - 0.10 (2016): data processing (Streams API)
   - 0.11 (2017): exactly-once semantics, performance improvements, KSQL developer preview
   - 1.0 (2017): JBOD support, Java 9 support

9. Apache Kafka - Unix Analogy

```bash
$ cat < in.txt | grep "kafka" | tr a-z A-Z > out.txt
```

   The pipeline maps onto the Kafka ecosystem: the Kafka Connect API feeds data in and out (cat and the redirections), the Kafka Streams API and KSQL do the processing (grep and tr), and the Kafka Core cluster carries the data in between. (Adapted from Confluent.)

10. Kafka High Level Architecture
    The who is who: producers write data to brokers, consumers read data from brokers, and all of this is distributed.
    The data: data is stored in topics; topics are split into partitions, which are replicated.
    A Kafka cluster consists of several brokers (Broker 1, 2 and 3 in the diagram) coordinated by a Zookeeper ensemble, with producers on one side and consumers on the other. A minimal consumer sketch follows below.
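To make the producer/consumer roles on the previous slide concrete, here is a minimal sketch of a Java consumer that joins a consumer group and reads the truck_position topic used in the demos. It is a sketch only: the broker address, group id and deserializer settings are assumptions, not taken from the slides, and the API matches the 0.11/1.0-era clients the talk covers.

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class TruckPositionConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker-1:9092");   // assumed broker address
        props.put("group.id", "truck-position-viewer");    // consumers sharing a group.id split the partitions
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("auto.offset.reset", "earliest");         // start from the earliest offset if none is committed

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("truck_position"));
            while (true) {
                // poll fetches the records appended since the last poll, per assigned partition
                ConsumerRecords<String, String> records = consumer.poll(500);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                            record.partition(), record.offset(), record.key(), record.value());
                }
            }
        }
    }
}
```

Offsets are committed automatically here (enable.auto.commit defaults to true), which is enough for a console-style consumer like the ones used in the demos.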
11. Kafka Producer - Write-Ahead Log / Commit Log
    - Producers always append to the tail of the log (append to a file, i.e. a segment)
    - Order is preserved for messages within the same partition
    (Diagram: a truck producer appends message 6 after messages 1-5 of the movement topic on the Kafka broker.)

12. Kafka Consumer - Partition Offsets
    - Offset: a sequential id number assigned to messages in the partitions; it uniquely identifies a message within a partition
    - Consumers track their position via (offset, partition, topic) tuples
    - Since Kafka 0.10: seek to an offset by timestamp using KafkaConsumer#offsetsForTimes (see the sketch after this group of slides)
    (Diagram: Consumer Group A reading at the "earliest" offset, Consumer Group B at the "latest" offset, another consumer at a specific offset, while new data arrives from the producer.)

13. How to Get a Kafka Environment
    - On premises: bare-metal installation, Docker, Mesos / Kubernetes, Hadoop distributions
    - Cloud: Oracle Event Hub Cloud Service, Confluent Cloud

14. Demo (I)
    Trucks send their positions directly to the Kafka topic truck_position, which is read by a console consumer. Test data comes from the Hortonworks test-data generator.
    Sample message:
    2016-06-02 14:39:56.605|98|27|803014426|Wichita to Little Rock Route2|Normal|38.65|90.21|5187297736652502631

15. Demo (I) - Create Kafka Topic

```bash
$ kafka-topics --zookeeper zookeeper:2181 --create --topic truck_position --partitions 8 --replication-factor 1

$ kafka-topics --zookeeper zookeeper:2181 --list
__consumer_offsets
_confluent-metrics
_schemas
docker-connect-configs
docker-connect-offsets
docker-connect-status
truck_position
```

16. Demo (I) - Run Producer and Kafka Console Consumer

17. Demo (I) - Java Producer to "truck_position"
    Constructing a Kafka producer:

```java
private Properties kafkaProps = new Properties();
kafkaProps.put("bootstrap.servers", "broker-1:9092");
kafkaProps.put("key.serializer", "...StringSerializer");
kafkaProps.put("value.serializer", "...StringSerializer");
producer = new KafkaProducer<String, String>(kafkaProps);

ProducerRecord<String, String> record =
    new ProducerRecord<>("truck_position", driverId, eventData);
try {
    metadata = producer.send(record).get();   // synchronous send: block until the broker acknowledges
} catch (Exception e) {
    // handle the failed send (log, retry, ...)
}
```

18. Demo (II) - Devices Send to MQTT Instead of Kafka
    The trucks now publish the same position messages to MQTT topics truck/nn/position instead of directly to Kafka.

19. Demo (II) - Devices Send to MQTT Instead of Kafka (live demo)

20. Demo (II) - Devices Send to MQTT Instead of Kafka: How to Get the Data into Kafka?
    The positions sit in the MQTT topics truck/nn/position; a question mark stands between them and the raw truck position topic in Kafka.

21. Kafka Connect

22. Kafka Connect - Overview
    Source connectors bring data from external systems into Kafka topics; sink connectors deliver data from Kafka topics to external systems.
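Slide 12 above mentions seeking to an offset by timestamp with KafkaConsumer#offsetsForTimes. Before moving on to the Connect transforms, here is a minimal sketch of how that call is typically used; the topic, broker address, group id and the "one hour ago" timestamp are assumptions for illustration, not taken from the talk.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

public class SeekByTimestamp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker-1:9092");   // assumed broker address
        props.put("group.id", "seek-by-timestamp-demo");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        long oneHourAgo = System.currentTimeMillis() - 60 * 60 * 1000L;

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // manually assign all partitions of the topic
            List<TopicPartition> partitions = consumer.partitionsFor("truck_position").stream()
                    .map(pi -> new TopicPartition(pi.topic(), pi.partition()))
                    .collect(Collectors.toList());
            consumer.assign(partitions);

            // ask the broker, per partition, for the first offset with a timestamp >= oneHourAgo
            Map<TopicPartition, Long> query = new HashMap<>();
            for (TopicPartition tp : partitions) {
                query.put(tp, oneHourAgo);
            }
            Map<TopicPartition, OffsetAndTimestamp> result = consumer.offsetsForTimes(query);

            // seek each partition to the returned offset (null means no message at or after that timestamp)
            result.forEach((tp, oat) -> {
                if (oat != null) {
                    consumer.seek(tp, oat.offset());
                }
            });
            // a subsequent consumer.poll(...) now starts reading from the requested point in time
        }
    }
}
```

offsetsForTimes returns, per partition, the earliest offset whose timestamp is greater than or equal to the requested timestamp; seeking to those offsets lets the next poll start "one hour back" in the topic.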
23. Kafka Connect - Single Message Transforms (SMT)
    - Simple transformations for a single message
    - Defined as part of Kafka Connect; some useful transforms are provided out of the box
    - Easily implement your own
    - Optionally deploy one or more transforms with each connector
    - Modify messages produced by a source connector
    - Modify messages sent to sink connectors
    - Makes it much easier to mix and match connectors
    Some of the currently available transforms: InsertField, ReplaceField, MaskField, ValueToKey, ExtractField, TimestampRouter, RegexRouter, SetSchemaMetadata, Flatten, TimestampConverter

24. Kafka Connect - Many Connectors
    60+ connectors since the first release (0.9+), 20+ from Confluent and partners: Confluent-supported connectors, certified connectors and community connectors.
    Source: http://www.confluent.io/product/connectors

25. Demo (III)
    Trucks publish to MQTT (truck/nn/position); an "mqtt to kafka" connector writes to the Kafka topic truck_position, which is read by a console consumer.

26. Demo (III) - Create MQTT Connector through the REST API

```bash
#!/bin/bash
curl -X "POST" "http://192.168.69.138:8083/connectors" \
     -H "Content-Type: application/json" \
     -d $'{
  "name": "mqtt-source",
  "config": {
    "connector.class": "com.datamountaineer.streamreactor.connect.mqtt.source.MqttSourceConnector",
    "connect.mqtt.connection.timeout": "1000",
    "tasks.max": "1",
    "connect.mqtt.kcql": "INSERT INTO truck_position SELECT * FROM truck/+/position",
    "name": "MqttSourceConnector",
    "connect.mqtt.service.quality": "0",
    "connect.mqtt.client.id": "tm-mqtt-connect-01",
    "connect.mqtt.converter.throw.on.error": "true",
    "connect.mqtt.hosts": "tcp://mosquitto:1883"
  }
}'
```

27. Demo (III) - Call REST API and Kafka Console Consumer

28. Demo (III)
    Same pipeline as before; what about some analytics?

29. Kafka Streams

30. Kafka Streams - Overview
    - Designed as a simple and lightweight library in Apache Kafka; no external dependencies on systems other than Apache Kafka
    - Part of open source Apache Kafka, introduced in 0.10+
    - Leverages Kafka as its internal messaging layer
    - Supports fault-tolerant local state
    - Event-at-a-time processing (not micro-batch) with millisecond latency
    - Windowing with out-of-order data using a Google DataFlow-like model

31. Kafka Stream DSL and Processor Topology

```java
KStream stream1 = builder.stream("in-1");
KStream stream2 = builder.stream("in-2");
KStream joined = stream1.leftJoin(stream2, ...);
KTable aggregated = joined.groupBy(...).count("store");
aggregated.to("out-1");
```

    (Topology, roughly: the two source nodes feed a left-join node, then an aggregation node backed by a local state store, then a sink node writing to out-1. A complete application sketch follows below.)
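The DSL snippet above shows only the topology definition; the configuration and bootstrap code around it are omitted on the slide. Below is a minimal sketch of a complete Streams application that filters non-normal truck positions, roughly anticipating the Demo (IV) stream shown later. It uses the StreamsBuilder API from Kafka 1.0 rather than the older KStreamBuilder on the slide, and the application id, serdes, broker address and filter predicate are assumptions.

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class DetectDangerousDrivingApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "detect-dangerous-driving");  // also names internal topics
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-1:9092");          // assumed broker address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // read the input topic and keep only records whose value is not a "Normal" event
        KStream<String, String> positions = builder.stream("truck_position");
        KStream<String, String> dangerous =
                positions.filter((key, value) -> !value.contains("Normal"));
        dangerous.to("dangerous_driving");

        // build and start the topology; close cleanly on shutdown
        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Because the library runs inside a plain Java application, scaling out is just starting more instances with the same application id, as the partition-assignment slides below illustrate.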
32. Kafka Stream DSL and Processor Topology (repeats the DSL code and topology from slide 31)

33. Kafka Streams - Processor Topology on the Cluster
    The Streams application reads the topics input-1 and input-2 from the Kafka cluster, keeps its aggregation state in a local store backed by a changelog topic, and writes its result to the output topic.

34. Scaling the Processor Topology
    input-1 and input-2 each have four partitions (0-3); the partitions are shared between two Kafka Streams instances (Kafka Streams 1 and 2).

35. Scaling the Processor Topology (continued)
    The same partitions distributed over four Kafka Streams instances (Kafka Streams 1-4).

36. Stream vs. Table
    Event Stream (KStream):
    2017-10-02T20:18:46  11,Normal,41.87,-87.67
    2017-10-02T20:18:55  11,Normal,40.38,-89.17
    2017-10-02T20:18:59  21,Normal,42.23,-91.78
    2017-10-02T20:19:01  21,Normal,41.71,-91.32
    2017-10-02T20:19:02  11,Normal,38.65,-90.2
    2017-10-02T20:19:23  21,Normal,41.71,-91.32

    State Stream / Change Log Stream (KTable), keyed by driver id:
    11 -> 2017-10-02T20:18:46,11,Normal,41.87,-87.67
    11 -> 2017-10-02T20:18:55,11,Normal,40.38,-89.17
    21 -> 2017-10-02T20:18:59,21,Normal,42.23,-91.78
    21 -> 2017-10-02T20:19:01,21,Normal,41.71,-91.32
    11 -> 2017-10-02T20:19:02,11,Normal,38.65,-90.2
    21 -> 2017-10-02T20:19:23,21,Normal,41.71,-91.32

37. Kafka Streams - Key Features
    - Native, 100%-compatible Kafka integration
    - Secure stream processing using Kafka's security features
    - Elastic and highly scalable
    - Fault-tolerant
    - Stateful and stateless computations
    - Interactive queries
    - Time model
    - Windowing
    - Supports late-arriving and out-of-order data
    - Millisecond processing latency, no micro-batching
    - At-least-once and exactly-once processing guarantees

38. Demo (IV)
    Trucks publish to MQTT (truck/nn/position); the "mqtt to kafka" connector writes to truck_position_s; the Kafka Streams application detect_dangerous_driving writes to dangerous_driving, which is read by a console consumer.

39. Demo (IV) - Create Stream

```java
final KStreamBuilder builder = new KStreamBuilder();

KStream<String, String> source =
    builder.stream(stringSerde, stringSerde, "truck_position");

KStream<String, TruckPosition> positions =
    source.map((key, value) -> new KeyValue<>(key, TruckPosition.create(value)));

KStream<String, TruckPosition> filtered =
    positions.filter(TruckPosition::filterNonNORMAL);

filtered.map((key, value) -> new KeyValue<>(key, value._originalRecord))
        .to("dangerous_driving");
```

40. KSQL

41. KSQL: a Streaming SQL Engine for Apache Kafka
    - Enables stream processing with zero coding required
    - The simplest way to process streams of data in real time
    - Powered by Kafka and Kafka Streams: scalable, distributed, mature
    - All you need is Kafka, no complex deployments
    - Available as a developer preview!
    - STREAM and TABLE as first-class citizens: a STREAM is data in motion, a TABLE is the collected state of a stream, and a STREAM can be joined with a TABLE

42. KSQL Deployment Models
    Standalone mode and cluster mode. (Source: Confluent)

43. Demo (V)
    The full pipeline: trucks publish to MQTT (truck/nn/position); the mqtt-source connector writes to truck_position; detect_dangerous_driving writes to dangerous_driving; a jdbc-source loads the Truck Driver table into trucking_driver (e.g. the row 27,Walter,Ward,Y,24-JUL-85,2017-10-02 15:19:00 arrives as {"id":27,"firstName":"Walter","lastName":"Ward","available":"Y","birthdate":"24-JUL-85","last_update":1506923052012}); join_dangerous_driving_driver produces dangerous_driving_driver, which is read by a console consumer.

44. Demo (V) - Start Kafka KSQL

```bash
$ docker-compose exec ksql-cli ksql-cli local --bootstrap-server broker-1:9092
(KSQL ASCII-art banner)
Streaming SQL Engine for Kafka
Copyright 2017 Confluent Inc.
CLI v0.1, Server v0.1 located at http://localhost:9098
Having trouble? Type 'help' (case-insensitive) for a rundown of how things work!
ksql>
```

45. Demo (V) - Create Stream

```sql
ksql> CREATE STREAM dangerous_driving_s (
          ts VARCHAR, truckid VARCHAR, driverid BIGINT, routeid BIGINT,
          routename VARCHAR, eventtype VARCHAR, latitude DOUBLE,
          longitude DOUBLE, correlationid VARCHAR)
      WITH (kafka_topic='dangerous_driving', value_format='DELIMITED');

 Message
----------------
 Stream created
```

…