Apache Kafka: A Distributed Streaming Platform

Posted on 08-Feb-2017

Apache Kafka

A Distributed Streaming Platform

StreamProcessing.be - Belgium Wednesday, 18th January 2017

< paolo @ confluent.io >

https://www.confluent.io/blog/stream-data-platform-1/

Industry shift from Big Data to Fast Data and Stream Processing

Apache Kafka APIs and UNIX analogy

$ cat < in.txt | grep "apache" | tr a-z A-Z > out.txt

Connect APIs

Producer/Consumer APIs

Streams APIs
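As a rough sketch of the analogy, the grep and tr stages of the command can be written in plain Java with java.util.stream. The names UnixPipelineSketch and run are hypothetical, and no Kafka library is used here. In a Kafka Streams application the same two steps would typically be expressed with filter and mapValues on a KStream, with topics taking the place of in.txt and out.txt.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical illustration only: plain Java, no Kafka dependency.
class UnixPipelineSketch {

    // Mirrors: grep "apache" | tr a-z A-Z   (ASCII input assumed)
    public static List<String> run(List<String> lines) {
        return lines.stream()
                .filter(line -> line.contains("apache"))     // grep "apache"
                .map(line -> line.toUpperCase(Locale.ROOT))  // tr a-z A-Z
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList("apache kafka", "something else", "the apache way");
        run(in).forEach(System.out::println);
    }
}
```

Each stage takes records in and hands records on, which is what lets the stages be composed like commands joined by UNIX pipes.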

Streams APIs part of Apache Kafka

http://kafka.apache.org/documentation/streams

Build applications, not clusters

<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-streams</artifactId>
  <version>0.10.1.1</version>
</dependency>

Spot the difference(s)

How do I run in production?

Like any other Java application...

Uncool Cool

Typical High Level Architecture

Data Publishing / Visualization

Stream Processing

Storage

Real-time Data Ingestion

How many clusters do you count?

NoSQL (Cassandra, HBase, Couchbase, MongoDB, …) or Elasticsearch, Solr

Storm, Flink, Spark Streaming, Ignite, Akka Streams, Apex, …

HDFS, NFS, Ceph, GlusterFS, Lustre, …

Apache Kafka

Simplicity is the ultimate sophistication

Apache Kafka Distributed Streaming Platform

Publish & Subscribe to streams of data like a messaging system

Store streams of data safely in a distributed replicated cluster

Process streams of data efficiently and in real-time
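The capabilities above can be pictured with a toy in-memory log. This is only a sketch under simplifying assumptions (single process, one partition, no replication), and the names ToyLog, append and readFrom are hypothetical, not Kafka APIs. It shows the core idea: records are appended to an ordered log and each subscriber reads from its own offset, so several subscribers can consume the same stored data independently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model of a single log partition; hypothetical, not Kafka's implementation.
class ToyLog {
    private final List<String> records = new ArrayList<>();

    // Publish: append a record and return the offset it was stored at.
    public synchronized long append(String record) {
        records.add(record);
        return records.size() - 1;
    }

    // Subscribe: read every record from the given offset onward.
    // Reading removes nothing, so other readers are unaffected.
    public synchronized List<String> readFrom(long offset) {
        if (offset < 0 || offset > records.size()) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        return Collections.unmodifiableList(
                new ArrayList<>(records.subList((int) offset, records.size())));
    }

    public static void main(String[] args) {
        ToyLog log = new ToyLog();
        log.append("a");
        log.append("b");
        log.append("c");
        System.out.println(log.readFrom(0)); // one reader starts at the beginning
        System.out.println(log.readFrom(2)); // another reader starts later
    }
}
```

Because a read leaves the log untouched, a new reader can start from offset 0 and reprocess everything, which is the property that sets a stored stream apart from a queue that deletes messages on delivery.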

Apache Kafka and Streams APIs benefits

• Build applications, not clusters
• Native integration with Apache Kafka
• Elastic, fast, distributed, fault-tolerant, secure
• Scalable: S, M, L, XL, XXL
• Run everywhere: from containers to cloud
• Streams (with KStream) and tables (with KTable)
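One way to read the streams and tables bullet is that a stream is a sequence of updates, while a table holds the latest value per key. The sketch below models that relationship in plain Java. Update and toTable are hypothetical names, and nothing here uses the Kafka Streams KStream or KTable classes.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the stream/table relationship; no Kafka dependency.
class StreamTableSketch {

    // One record of the stream: a new value for a key.
    static final class Update {
        final String key;
        final long value;

        Update(String key, long value) {
            this.key = key;
            this.value = value;
        }
    }

    // Fold the stream into a table: later updates overwrite earlier ones.
    public static Map<String, Long> toTable(List<Update> stream) {
        Map<String, Long> table = new LinkedHashMap<>();
        for (Update u : stream) {
            table.put(u.key, u.value);
        }
        return table;
    }

    public static void main(String[] args) {
        List<Update> stream = Arrays.asList(
                new Update("alice", 1), new Update("bob", 5), new Update("alice", 3));
        // The stream has 3 records; the table has 2 rows.
        System.out.println(toTable(stream));
    }
}
```

Replaying the same updates always rebuilds the same table, which is the intuition behind keeping a copy of local state in Kafka for recovery.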

• Local state replicated to Kafka for fault-tolerance
• Windowing and event time semantics out of the box
• Supports late-arriving and out-of-order events
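To illustrate event-time windowing with out-of-order data, the following plain-Java sketch counts events per tumbling window using the timestamp carried by each event rather than its arrival order. The names EventTimeWindowSketch and countPerWindow and the 60-second window size are assumptions made for the example. It does not use or reproduce the Kafka Streams windowing API, and it leaves out how long a window keeps accepting late events.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of event-time tumbling windows; no Kafka dependency.
class EventTimeWindowSketch {

    // Start of the tumbling window containing the timestamp (non-negative input assumed).
    public static long windowStart(long eventTimeMs, long windowSizeMs) {
        return eventTimeMs - (eventTimeMs % windowSizeMs);
    }

    // Count events per window, keyed by window start, using event time only.
    public static Map<Long, Integer> countPerWindow(long[] eventTimesMs, long windowSizeMs) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long t : eventTimesMs) {
            counts.merge(windowStart(t, windowSizeMs), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Arrival order differs from event-time order: the 30_000 event arrives last.
        long[] arrivals = {10_000L, 70_000L, 65_000L, 130_000L, 30_000L};
        System.out.println(countPerWindow(arrivals, 60_000L));
    }
}
```

The event stamped 30_000 arrives last but is still counted in the first window, because the window is chosen from the event's own timestamp.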

Apache Kafka adoption across the industry… everybody loves simplicity!

References

• http://kafka.apache.org/

• http://kafka.apache.org/documentation/streams

• http://docs.confluent.io/

• http://docs.confluent.io/current/streams/

• http://blog.confluent.io/

• http://github.com/confluentinc/examples

• http://github.com/apache/kafka/tree/trunk/streams

The easiest way to get started

https://www.confluent.io/download/

SIMPLICITY

WE ♥ YOUR FEEDBACK!

Discount code: kafcom17

Use the Apache Kafka community discount code to get $50 off

www.kafka-summit.org

Kafka Summit New York: May 8

Kafka Summit San Francisco: August 28
