apache kafka - scalable message processing and more!

Post on 22-Jan-2018

185 Views

Category:

Data & Analytics

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF HAMBURG KOPENHAGEN LAUSANNE MÜNCHEN STUTTGART WIEN ZÜRICH

Apache Kafka – Scalable Stream Processing and more!Guido Schmutz – 5.12.2017

@gschmutz guidoschmutz.wordpress.com

Guido Schmutz

Working at Trivadis for more than 20 yearsOracle ACE Director for Fusion Middleware and SOAConsultant, Trainer Software Architect for Java, Oracle, SOA andBig Data / Fast DataHead of Trivadis Architecture BoardTechnology Manager @ Trivadis

More than 30 years of software development experience

Contact: guido.schmutz@trivadis.comBlog: http://guidoschmutz.wordpress.comSlideshare: http://www.slideshare.net/gschmutzTwitter: gschmutz

Apache Kafka – Scalable Stream Processing and more!

Agenda

1. What is Apache Kafka?2. Kafka Connect3. Kafka Integration with other components4. Kafka Streams5. KSQL

Apache Kafka – Scalable Stream Processing and more!

What is Apache Kafka?

Apache Kafka – Scalable Stream Processing and more!

Apache Kafka History

2012 2013 2014 2015 2016 2017

Clustermirroringdatacompression

Intra-clusterreplication0.7

0.8

0.9

DataProcessing(StreamsAPI)

0.10

DataIntegration(ConnectAPI)

0.11

2018

ExactlyOnceSemanticsPerformanceImprovements

KSQLDeveloperPreview

Apache Kafka – Scalable Stream Processing and more!

1.0 JBODSupportSupportJava9

Apache Kafka – A Streaming Platform

Apache Kafka – Scalable Stream Processing and more!

High-Level Architecture

Distributed Log at the Core

Scale-Out Architecture

Logs do not (necessarily) forget

Strong Ordering Guarantees

most business systems need strong ordering guarantees

messages that require relative ordering need to be sent to the same partition

supply same key for all messages that require a relative order

To maintain global ordering use a single partition topic

Producer 1

Consumer 1

Broker 1

Broker 2

Broker 3

Consumer 2

Consumer 3

Key-1

Key-2

Key-3Key-4

Key-5

Key-6

Key-3

Key-1

Apache Kafka – Scalable Stream Processing and more!

Durable and Highly Available Messaging

Producer 1

Broker 1

Broker 2

Broker 3

Producer 1

Broker 1

Broker 2

Broker 3

Consumer 1 Consumer 1

Consumer 2Consumer 2

Apache Kafka – Scalable Stream Processing and more!

Durable and Highly Available Messaging (II)

Producer 1

Broker 1

Broker 2

Broker 3

Producer 1

Broker 1

Broker 2

Broker 3

Consumer 1 Consumer 1

Consumer 2

Consumer 2

Apache Kafka – Scalable Stream Processing and more!

How to get a Kafka environent

Apache Kafka – Scalable Stream Processing and more!

On Premises• Bare Metal Installation

• Docker

• Mesos / Kubernetes

• Hadoop Distributions

Cloud• Oracle Event Hub Cloud Service

• Azure HDInsight Kafka

• Confluent Cloud

• …

Demo - Kafka

Truck-2 truckposition

Truck-1

Truck-3

consoleconsumer

2016-06-0214:39:56.605|98|27|803014426|Wichita toLittle RockRoute2|Normal|38.65|90.21|5187297736652502631

Testdata-GeneratorbyHortonworks

Apache Kafka – Scalable Stream Processing and more!

Demo – Create Kafka Topic

$ kafka-topics --zookeeper zookeeper:2181 --create \--topic truck_position --partitions 8 --replication-factor 1

$ kafka-topics --zookeeper zookeeper:2181 –list__consumer_offsets_confluent-metrics_schemasdocker-connect-configsdocker-connect-offsetsdocker-connect-statustruck_position

Apache Kafka – Scalable Stream Processing and more!

Demo – Run Producer and Kafka-Console-Consumer

Apache Kafka – Scalable Stream Processing and more!

Demo – Java Producer to "truck_position"

Constructing a Kafka Producer

private Properties kafkaProps = new Properties();kafkaProps.put("bootstrap.servers","broker-1:9092);kafkaProps.put("key.serializer", "...StringSerializer");kafkaProps.put("value.serializer", "...StringSerializer");

producer = new KafkaProducer<String, String>(kafkaProps);

ProducerRecord<String, String> record =new ProducerRecord<>("truck_position", driverId, eventData);

try {metadata = producer.send(record).get();

} catch (Exception e) {}

Apache Kafka – Scalable Stream Processing and more!

Demo - MQTT instead of Kafka

Truck-2 truck/nn/position

Truck-1

Truck-3

2016-06-0214:39:56.605|98|27|803014426|Wichita toLittle RockRoute2|Normal|38.65|90.21|5187297736652502631

Apache Kafka – Scalable Stream Processing and more!

Demo –MQTT instead of Kafka

Apache Kafka – Scalable Stream Processing and more!

Demo MQTT instead of Kafka – how to get the data into Kafka?

Truck-2 truck/nn/position

Truck-1

Truck-3

truckposition raw

?

2016-06-0214:39:56.605|98|27|803014426|Wichita toLittle RockRoute2|Normal|38.65|90.21|5187297736652502631

Apache Kafka – Scalable Stream Processing and more!

Apache Kafka – wait there is more!

Apache Kafka – Scalable Stream Processing and more!

Source Connector

trucking_driver

Kafka BrokerSink

Connector

Stream Processing

Kafka Connect

Apache Kafka – Scalable Stream Processing and more!

Kafka Connect - Overview

SourceConnector

SinkConnector

Apache Kafka – Scalable Stream Processing and more!

Kafka Connect – Single Message Transforms (SMT)

Simple Transformations for a single message

Defined as part of Kafka Connect• some useful transforms provided out-of-the-box• Easily implement your own

Optionally deploy 1+ transforms with each connector• Modify messages produced by source

connector• Modify messages sent to sink connectors

Makes it much easier to mix and match connectors

Some of currently available transforms:• InsertField• ReplaceField• MaskField• ValueToKey• ExtractField• TimestampRouter• RegexRouter• SetSchemaMetaData• Flatten• TimestampConverter

Apache Kafka – Scalable Stream Processing and more!

Kafka Connect – Many Connectors

60+ since first release (0.9+)

20+ from Confluent and Partners

Source:http://www.confluent.io/product/connectors

ConfluentsupportedConnectors

CertifiedConnectors CommunityConnectors

Apache Kafka – Scalable Stream Processing and more!

Demo – Kafka Connect

Truck-2 truck/nn/position

Truck-1

Truck-3

mqtt tokafka

truck_position

consoleconsumer

Apache Kafka – Scalable Stream Processing and more!

2016-06-0214:39:56.605|98|27|803014426|Wichita toLittle RockRoute2|Normal|38.65|90.21|5187297736652502631

Demo – Create MQTT Connect through REST API #!/bin/bashcurl -X "POST" "http://192.168.69.138:8083/connectors" \

-H "Content-Type: application/json" \-d $'{

"name": "mqtt-source","config": {"connector.class":

"com.datamountaineer.streamreactor.connect.mqtt.source.MqttSourceConnector","connect.mqtt.connection.timeout": "1000","tasks.max": "1","connect.mqtt.kcql":

"INSERT INTO truck_position SELECT * FROM truck/+/position","name": "MqttSourceConnector","connect.mqtt.service.quality": "0", "connect.mqtt.client.id": "tm-mqtt-connect-01","connect.mqtt.converter.throw.on.error": "true","connect.mqtt.hosts": "tcp://mosquitto:1883"}

}'

Apache Kafka – Scalable Stream Processing and more!

Demo – Call REST API and Kafka Console Consumer

Apache Kafka – Scalable Stream Processing and more!

Kafka Integration with othercomponents

Apache Kafka – Scalable Stream Processing and more!

Kafka and the Big Data / Fast Data ecosystem

Kafka integrates with many popular products / frameworks

• Apache Spark Streaming

• Apache Flink

• Apache Storm

• Apache Apex

• Apache NiFi

• StreamSets

• Oracle Stream Analytics

• Oracle Service Bus

• Oracle GoldenGate

• Oracle Event Hub Cloud Service

• Debezium CDC

• …

AdditionalInfo:https://cwiki.apache.org/confluence/display/KAFKA/EcosystemApache Kafka – Scalable Stream Processing and more!

StreamSets Data Collector

• Founded by ex-Cloudera, Informaticaemployees

• Continuous open source, intent-driven, big data ingest

• Visible, record-oriented approach fixes combinatorial explosion

• Batch or stream processing• Standalone, Spark cluster, MapReduce cluster

• IDE for pipeline development by ‘civilians’• Relatively new - first public release September

2015• So far, vast majority of commits are from

StreamSets staff

Apache Kafka – Scalable Stream Processing and more!

Demo StreamSets Data Collector

Truck-3

truckposition raw

truck/nn/positionTruck-4

Truck-5

KafkatoCassandra

{"truckid":"57","driverid":"15","routeid":"1927624662","eventtype":"Normal","latitude":"38.65","longitude":"-90.21","correlationId":"4412891759760421296"}

MQTT-2to Kafka

Edge

Port: 1883

trucking

Apache Kafka – Scalable Stream Processing and more!

Demo StreamSets Data Collector

Apache Kafka – Scalable Stream Processing and more!

Demo StreamSets Data Collector

Apache Kafka – Scalable Stream Processing and more!

Demo StreamSets Data Collector

Apache Kafka – Scalable Stream Processing and more!

Demo StreamSets Data Collector

Apache Kafka – Scalable Stream Processing and more!

Demo StreamSets Data Collector

Truck-3

truckposition raw

truck/nn/positionTruck-4

Truck-5

KafkatoCassandra

{"truckid":"57","driverid":"15","routeid":"1927624662","eventtype":"Normal","latitude":"38.65","longitude":"-90.21","correlationId":"4412891759760421296"}

MQTT-2to Kafka

Edge

Port: 1883

trucking

whataboutsomeanalytics?

Apache Kafka – Scalable Stream Processing and more!

Kafka Streams

Apache Kafka – Scalable Stream Processing and more!

Kafka Streams - Overview

• Designed as a simple and lightweight library in Apache Kafka

• no external dependencies on systems other than Apache Kafka

• Part of open source Apache Kafka, introduced in 0.10+• Leverages Kafka as its internal messaging layer• Supports fault-tolerant local state• Event-at-a-time processing (not microbatch) with millisecond

latency• Windowing with out-of-order data using a Google DataFlow-like

model

Apache Kafka – Scalable Stream Processing and more!

Kafka Stream DSL and Processor Topology

KStream<Integer, String> stream1 =builder.stream("in-1");

KStream<Integer, String> stream2=builder.stream("in-2");

KStream<Integer, String> joined =stream1.leftJoin(stream2, …);

KTable<> aggregated = joined.groupBy(…).count("store");

aggregated.to("out-1");

1 2

lj

a

t

State

Apache Kafka – Scalable Stream Processing and more!

Kafka Stream DSL and Processor Topology

KStream<Integer, String> stream1 =builder.stream("in-1");

KStream<Integer, String> stream2=builder.stream("in-2");

KStream<Integer, String> joined =stream1.leftJoin(stream2, …);

KTable<> aggregated = joined.groupBy(…).count("store");

aggregated.to("out-1");

1 2

lj

a

t

State

Apache Kafka – Scalable Stream Processing and more!

Kafka Streams Cluster

Processor Topology

Kafka Cluster

input-1

input-2

store(changelog)

output

1 2

lj

a

tState

Apache Kafka – Scalable Stream Processing and more!

Kafka Cluster

Processor Topology

input-1Partition0

Partition1

Partition2

Partition3

input-2Partition0

Partition1

Partition2

Partition3

Kafka Streams 1 Kafka Streams 2

Kafka Streams 3 Kafka Streams 4

Apache Kafka – Scalable Stream Processing and more!

Demo – Kafka Streams

Truck-2 truck/nn/position

Truck-1

Truck-3

mqtt tokafka

truck_position_s

detect_dangerous_driving

dangerous_driving

consoleconsumer

2016-06-0214:39:56.605|98|27|803014426|Wichita toLittle RockRoute2|Normal|38.65|90.21|5187297736652502631

Apache Kafka – Scalable Stream Processing and more!

KafkatoCassandra trucking

Demo (IV) - Create Stream

final KStreamBuilder builder = new KStreamBuilder();

KStream<String, String> source = builder.stream(stringSerde, stringSerde, "truck_position");

KStream<String, TruckPosition> positions = source.map((key,value) ->

new KeyValue<>(key, TruckPosition.create(value)));

KStream<String, TruckPosition> filtered = positions.filter(TruckPosition::filterNonNORMAL);

filtered.map((key,value) -> new KeyValue<>(key,value._originalRecord))

.to("dangerous_driving");

Apache Kafka – Scalable Stream Processing and more!

KSQL

Apache Kafka – Scalable Stream Processing and more!

KSQL: a Streaming SQL Engine for Apache Kafka

• Enables stream processing with zero coding required• The simples way to process streams of data in real-time• Powered by Kafka and Kafka Streams: scalable, distributed, mature• All you need is Kafka – no complex deployments• available as Developer preview!

• STREAM and TABLE as first-class citizens• STREAM = data in motion• TABLE = collected state of a stream• join STREAM and TABLE

Apache Kafka – Scalable Stream Processing and more!

Demo – KSQL

Truck-2 truck/nn/position

Truck-1

Truck-3

mqtt tokafka

truck_position

detect_dangerous_driving

dangerous_driving

consoleconsumer

2016-06-0214:39:56.605|98|27|803014426|Wichita toLittle RockRoute2|Normal|38.65|90.21|5187297736652502631

Apache Kafka – Scalable Stream Processing and more!

KafkatoCassandra trucking

Demo (V) - Start Kafka KSQL$ docker-compose exec ksql-cli ksql-cli local --bootstrap-server broker-1:9092

======================================= _ __ _____ ____ _ == | |/ // ____|/ __ \| | == | ' /| (___ | | | | | == | < \___ \| | | | | == | . \ ____) | |__| | |____ == |_|\_\_____/ \___\_\______| == == Streaming SQL Engine for Kafka =

Copyright 2017 Confluent Inc.

CLI v0.1, Server v0.1 located at http://localhost:9098

Having trouble? Type 'help' (case-insensitive) for a rundown of how things work!

ksql>

Apache Kafka – Scalable Stream Processing and more!

Demo (IV) - Create Streamksql> CREATE STREAM truck_position_s \(ts VARCHAR, \truckid VARCHAR, \driverid BIGINT, \routeid BIGINT, \routename VARCHAR, \eventtype VARCHAR, \latitude DOUBLE, \longitude DOUBLE, \correlationid VARCHAR) \WITH (kafka_topic='truck_position', \

value_format='DELIMITED');

Message----------------Stream created

Apache Kafka – Scalable Stream Processing and more!

Demo (IV) - Create Stream

ksql> describe truck_position_s;

Field | Type---------------------------------ROWTIME | BIGINTROWKEY | VARCHAR(STRING)TS | VARCHAR(STRING)TRUCKID | VARCHAR(STRING)DRIVERID | BIGINTROUTEID | BIGINTROUTENAME | VARCHAR(STRING)EVENTTYPE | VARCHAR(STRING)LATITUDE | DOUBLELONGITUDE | DOUBLECORRELATIONID | VARCHAR(STRING)

Apache Kafka – Scalable Stream Processing and more!

Demo (IV) - Create Stream

ksql> SELECT * FROM truck_position_s;

1506922133306 | "truck/13/position0 | �2017-10-02T07:28:53 | 31 | 13 | 371182829 | Memphis to Little Rock | Normal | 41.76 | -89.6 | -20842639519146641061506922133396 | "truck/16/position0 | �2017-10-02T07:28:53 | 19 | 16 | 160405074 | Joplin to Kansas City Route 2 | Normal | 41.48 | -88.07 | -20842639519146641061506922133457 | "truck/30/position0 | �2017-10-02T07:28:53 | 26 | 30 | 160779139 | Des Moines to Chicago Route 2 | Normal | 41.85 | -89.29 | -20842639519146641061506922133485 | "truck/23/position0 | �2017-10-02T07:28:53 | 32 | 23 | 1090292248 | Peoria to Ceder Rapids Route 2 | Normal | 41.48 | -88.07 | -20842639519146641061506922133497 | "truck/12/position0 | �2017-10-02T07:28:53 | 80 | 12 | 1961634315 | Saint Louis to Memphis | Normal | 41.74 | -91.47 | -20842639519146641061506922133547 | "truck/14/position0 | �2017-10-02T07:28:53 | 73 | 14 | 1927624662 | Springfield to KC Via Columbia | Normal | 35.12 | -90.68 | -2084263951914664106

Apache Kafka – Scalable Stream Processing and more!

Demo (IV) - Create Stream

ksql> SELECT * FROM truck_position_s WHERE eventtype != 'Normal';

1506922264016 | "truck/11/position0 | �2017-10-02T07:31:04 | 27 | 11 | 1325712174 | Saint Louis to Tulsa Route2 | Lane Departure | 38.5 | -90.69 | -20842639519146641061506922281156 | "truck/11/position0 | �2017-10-02T07:31:21 | 27 | 11 | 1325712174 | Saint Louis to Tulsa Route2 | Unsafe tail distance | 37.81 | -92.31 | -20842639519146641061506922284436 | "truck/10/position0 | �2017-10-02T07:31:24 | 93 | 10 | 1384345811 | Joplin to Kansas City | Unsafe following distance | 37.02 | -94.54 | -20842639519146641061506922297887 | "truck/11/position0 | �2017-10-02T07:31:37 | 27 | 11 | 1325712174 | Saint Louis to Tulsa Route2 | Unsafe following distance | 37.09 | -94.23 | -2084263951914664106

Apache Kafka – Scalable Stream Processing and more!

Demo (IV) - Create Stream

ksql> CREATE STREAM dangerous_driving_s \WITH (kafka_topic= dangerous_driving', \

value_format='DELIMITED') \AS SELECT * FROM truck_position_s \WHERE eventtype != 'Normal';

Message----------------------------Stream created and running

ksql> select * from dangerous_driving_s;1506922849375 | "truck/11/position0 | �2017-10-02T07:40:49 | 90 | 11 | 160779139 | Des Moines to Chicago Route 2 | Overspeed | 41.48 | -88.07 | 35691830713478983661506922866488 | "truck/11/position0 | �2017-10-02T07:41:06 | 90 | 11 | 160779139 | Des Moines to Chicago Route 2 | Overspeed | 40.38 | -89.17 | 3569183071347898366

Apache Kafka – Scalable Stream Processing and more!

Demo (IV) - Create Stream

ksql> describe dangerous_driving_s;

Field | Type---------------------------------ROWTIME | BIGINTROWKEY | VARCHAR(STRING)TS | VARCHAR(STRING)TRUCKID | VARCHAR(STRING)DRIVERID | BIGINTROUTEID | BIGINTROUTENAME | VARCHAR(STRING)EVENTTYPE | VARCHAR(STRING)LATITUDE | DOUBLELONGITUDE | DOUBLECORRELATIONID | VARCHAR(STRING)

Apache Kafka – Scalable Stream Processing and more!

Demo - All

Truck-2 truck/nn/position

Truck-1

Truck-3

mqtt-source

truck_position

detect_dangerous_driving

dangerous_driving

TruckDriver

jdbc-source trucking_driver

join_dangerous_driving_driver

dangerous_driving_driver

27,Walter,Ward,Y,24-JUL-85,2017-10-0215:19:00

consoleconsumer

2016-06-0214:39:56.605|98|27|803014426|Wichita toLittle RockRoute2|Normal|38.65|90.21|5187297736652502631

{"id":27,"firstName":"Walter","lastName":"Ward","available":"Y","birthdate":"24-JUL-85","last_update":1506923052012}

Apache Kafka – Scalable Stream Processing and more!

KafkatoCassandra trucking

Demo (V) – Create JDBC Connect through REST API #!/bin/bashcurl -X "POST" "http://192.168.69.138:8083/connectors" \

-H "Content-Type: application/json" \-d $'{

"name": "jdbc-driver-source","config": {

"connector.class": "JdbcSourceConnector","connection.url":"jdbc:postgresql://db/sample?user=sample&password=sample","mode": "timestamp","timestamp.column.name":"last_update","table.whitelist":"driver","validate.non.null":"false","topic.prefix":"trucking_","key.converter":"org.apache.kafka.connect.json.JsonConverter","key.converter.schemas.enable": "false","value.converter":"org.apache.kafka.connect.json.JsonConverter","value.converter.schemas.enable": "false","name": "jdbc-driver-source","transforms":"createKey,extractInt", "transforms.createKey.type":"org.apache.kafka.connect.transforms.ValueToKey", "transforms.createKey.fields":"id", "transforms.extractInt.type":"org.apache.kafka.connect.transforms.ExtractField$Key", "transforms.extractInt.field":"id"

}}'

Apache Kafka – Scalable Stream Processing and more!

Demo (V) – Create JDBC Connect through REST API

Apache Kafka – Scalable Stream Processing and more!

Demo (V) - Create Table with Driver Stateksql> CREATE TABLE driver_t \

(id BIGINT, \first_name VARCHAR, \last_name VARCHAR, \available VARCHAR) \WITH (kafka_topic='trucking_driver', \

value_format='JSON');Message----------------Table created

Apache Kafka – Scalable Stream Processing and more!

Demo (V) - Create Table with Driver Stateksql> CREATE STREAM dangerous_driving_and_driver_s \WITH (kafka_topic='dangerous_driving_and_driver_s', \

value_format='JSON') \AS SELECT driverid, first_name, last_name, truckid, routeid,routename, eventtype \FROM truck_position_s \LEFT JOIN driver_t \ON dangerous_driving_and_driver_s.driverid = driver_t.id;

Message----------------------------Stream created and running

ksql> select * from dangerous_driving_and_driver_s;1511173352906 | 21 | 21 | Lila | Page | 58 | 1594289134 | Memphis to Little Rock Route 2 | Unsafe tail distance1511173353669 | 12 | 12 | Laurence | Lindsey | 93 | 1384345811 | Joplin to KansasCity | Lane Departure1511173435385 | 11 | 11 | Micky | Isaacson | 22 | 1198242881 | Saint Louis to Chicago Route2 | Unsafe tail distance

Apache Kafka – Scalable Stream Processing and more!

Apache Kafka – Scalable Stream Processing and more!

Technology on its own won't help you.You need to know how to use it properly.

top related