stream processing at scale with spring xd and kafka
TRANSCRIPT
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/
SPRINGONE2GXWASHINGTON, DC
Stream Processing at Scale with Spring XD and Kafka
By Marius Bogoevici @mariusbogoevici
(and more)
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/
About me
• Staff Engineer at Pivotal
• Works on Spring XD, Spring Cloud Stream, Spring Cloud Data Flow, Spring Integration, Spring Integration
• Formerly: InfinityQuick, Red Hat (JBoss), SpringSource
2
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/
Safe Harbor Statement
The following is intended to outline the general direction of Pivotal's offerings. It is intended for information purposes only and may not be incorporated into any contract. Any information regarding pre-release of Pivotal offerings, future updates or other planned modifications is subject to ongoing evaluation by Pivotal and is subject to change. This information is provided without warranty or any kind, express or implied, and is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions regarding Pivotal's offerings. These purchasing decisions should only be based on features currently available. The development, release, and timing of any features or functionality described for Pivotal's offerings in this presentation remain at the sole discretion of Pivotal. Pivotal has no obligation to update forward looking information in this presentation.
3
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/ 4
Stream Processing in the Big Data
landscape
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/
The domain of stream processing
• Now have the ability to cheaply store and analyze huge quantities of data
• Traditionally the realm of batch processing but also demand for real-time
• aka ‘Stream Processing’
• Some characteristics of Stream Processing
• High volume
• Low latency
• Often data is grouped and ordered
5
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/
Streaming and real-time analysis examples
• “Internet of Things”
• Fraud Detection
• Measuring Quality of Service
• Predictive Maintenance
• Log aggregation
6
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/ 7
Our Toolkit
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/ 8
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/
Apache Kafka message
• Initially developed at LinkedIn, open-sourced in 2011
• High-throughput distributed messaging system
• Publish-subscribe messaging rethought as a message log
• Decoupled data pipelines between producers and consumers
9
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/
Apache Kafka: Topics and logs
• Each topic, divided in many partitions read/written independently
• Producers can target partitions and topics directly
• Distributed, replicated
• Linear reads, writes
10
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/
Producers and consumers
• Producers, consumers completely decoupled from each other
• Partitions are distributed and replicated
• Reads/writes from the partition leader
11
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/
Why Kafka for Stream Processing
• High throughput, low latency due to linear reads/writes
• High performance on commodity hardware
• Replayability (consuming from existing offsets)
• Can be used as a master dataset
• Consumer groups are completely independent of each other
• Guaranteed ordered delivery within a partition
• Competing consumer scenarios are difficult to implement
• Generally, not an issue with stream processing
12
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/ 13
+
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/
Spring Integration Kafka
• Started in 2011
• Goal: adapting to the abstractions Spring Messaging and Spring Integration
• Easy access to the unique features of Kafka;
• Namespace, Java DSL support
• To migrate to 0.8.3 once available
• Defaults focused towards performance (disable ID generation, timestamp)
14
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/
Spring Integration Kafka: Channel Adapters
15
Kafka Inbound Channel Adapter Kafka Outbound Channel AdapterSpring Messaging Message Channel
Spring Messaging Message
Spring Messaging Message
DI Configuration DI Configuration
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/
Spring Integration Kafka Producer Configuration
• Default producer configuration
• Distinct per-topic producer configurations
• Destination target or partition controlled via SPEL expressions or headers
16
<int-‐kafka:producer-‐context id="kafkaProducerContext"> <int-‐kafka:producer-‐configurations> <int-‐kafka:producer-‐configuration broker-‐list="localhost:9092" key-‐class-‐type="java.lang.String" value-‐class-‐type="java.lang.String" topic="test1" value-‐serializer="kafkaSerializer" key-‐serializer="kafkaSerializer" compression-‐type="none"/> <int-‐kafka:producer-‐configuration broker-‐list="localhost:9092" topic="regextopic.*" compression-‐type="gzip"/> </int-‐kafka:producer-‐configurations> </int-‐kafka:producer-‐context>
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/
Spring Integration Kafka Consumer• Own client based on Simple Consumer API
• Listen to specific partitions!
• Offset control - when to be written and where (no Zookeeper);
• Programmer-controlled acknowledgment;
• SI defaults focused towards performance
• No ID, timestamp generation (can be turned on, optionally)
• Concurrent message processing (preserving per-partition ordering) via Reactor
• Basic operations via KafkaTemplate
• Kafka specific headers
17
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/
Spring Integration Kafka Consumer Example
18
<int-‐kafka:zookeeper-‐connect id="zkConnect" zk-‐connect=“localhost:2181"/>
<bean id="kafkaConfiguration" class="org.springframework.integration.kafka.core.ZookeeperConfiguration"> <constructor-‐arg ref="zkConnect"/> </bean>
<int-‐kafka:message-‐driven-‐channel-‐adapter id="adapter" channel="output" connection-‐factory="connectionFactory" key-‐decoder="decoder" payload-‐decoder="decoder" max-‐fetch="100" topics="${kafka.test.topic}"/>
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/
Spring Integration Kafka Message Listener
• Auto-acknowledging
• With manual acknowledgment
19
public interface MessageListener { void onMessage(KafkaMessage message); }
public interface AcknowledgingMessageListener { void onMessage(KafkaMessage message, Acknowledgment acknowledgment); }
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/
Spring Integration Kafka: Message Headers
20
public abstract class KafkaHeaders {private static final String PREFIX = "kafka_";public static final String TOPIC = PREFIX + "topic";public static final String MESSAGE_KEY = PREFIX + "messageKey";public static final String PARTITION_ID = PREFIX + "partitionId";public static final String OFFSET = PREFIX + "offset";public static final String NEXT_OFFSET = PREFIX + "nextOffset";public static final String ACKNOWLEDGMENT = PREFIX +
"acknowledgment";}
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/
Spring Integration Kafka: Offset Management
• Injectable strategy
• Allows customizing the starting offsets
• Implementations: SI MetadataStore-backed (e.g. Redis, Gemfire), Kafka compacted topic-backed (pre-0.8.2), Kafka 0.8.2 native
• Messages can be auto acknowledged (by the adapter) or manually acknowledged (by the user)
• Manual acknowledgment useful when messages are processed asynchronously
• Acknowledgment passed as message header or as argument
21
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/
DEMO: Spring Integration Kafka at Work
22
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/ 23
++
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/
Spring XD and Stream processing
• Higher abstractions are required
• Integrating seamlessly and transparently with the middleware
• Building on top of Spring Integration and Spring Batch
• Pre-built modules using the entire power of the Spring ecosystem
24
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/
Streams in Spring XD
25
HTTP$JMS$Ka*a$
RabbitMQ$JMS$
Gemfire$File$SFTP$Mail$JDBC$Twi;er$Syslog$TCP$UDP$MQTT$Trigger$
Filter$Transformer$
Spli;er$Aggregator$HTTP$Client$
JPMML$Evaluator$Shell$Python$Groovy$Java$RxJava$
Spark$Streaming$
File$HDFS$HAWQ$Ka*a$
RabbitMQ$Redis$Splunk$Mongo$Redis$JDBC$TCP$Log$Mail$
Gemfire$MQTT$
Dynamic$Router$Counters$
Note: Named channels allow for a directed graph of data flow
channel
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/
Spring XD: Stream DSL
26
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/
Spring XD - Message Bus abstraction
• Binds module inputs and outputs to a transport
27
Binds module inputs and outputs to a transport Performs Serialization (Kryo) Local, Rabbit, Redis, and Kafka
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/
Spring XD Distributed Runtime
28
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/
Spring XD and Kafka - the message bus
• Each pipe between modules is a topic;
• Spring XD creates topics automatically;
• Topics are pre-partitioned based on module count and concurrency;
• Overpartitioning is available as an option;
• Multiple consumer modules ‘divide’ the partition set of a topic using a deterministic algorithm;
29
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/
Spring XD and Kafka: Partitioning
• Required in distributed stateful processing: related data must be processed on the same node;
• Partitioning logic configured in Spring XD via deployment manifest
• partitionKeyExpression=payload.sensorId
• When using Kafka as a bus, partition key logic maps directly to Kafka transport partitioning natively
30
Partition 0
Partition 1
HTTP
HTTP
HTTP
Average Processor
Average Processor
Topic
http | avg-temperatures
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/
Spring XD and Kafka:
31
Combine the high performance of Kafka with the powerful abstractions of Spring XD
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/
Performance metrics of Spring XD 1.2
32
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/
Performance metrics of Spring XD 1.2
33
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/
Demo: Stream processing at scale with Spring XD
34
High%volume%
Low%Latency%
Programming%Model%
Distributed%Run;me%
Ka=a%%(moving%data)%
RxJava%(processing)%
Spring%XD%(distribu;ng)%
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/ 35
Cloud Native
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/
Stream Processing at Scale: Message-driven Microservices
36
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/
Scaling with streams
• Parallelism is key for scaling
• A single machine’s resources can be exhausted really fast
• CPU: serialization
• Memory: keeping state, caching lookup data
• Disk: especially for brokers
• Network: saturating the interfaces
• Vertical scaling == exponential cost
• And failover redundancy is still a requirement
• Cluster sizes can get really big
• tens, hundreds of module instances
• which leads to ….
37
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/ 38
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/
Spring Cloud Stream/Spring Cloud Data Flow
• Successors to Spring XD 1.x
• New goals:
• Cloud native;
• Micro service architecture for modules;
• Most valuable features should be preserved;
• Abstracting out the connection to the transport
• Same abstractions, backwards compatible DSL;
39
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/ 40Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/
Spring XD Modules
• Execute inside of a Spring XD Container
• Consist of beans in an application context
• Multiple modules may run in a container, each with their own class loader
5
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/ 41Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/ 7
• OOTB modules • Executable Boot Apps
• Boot for Spring Integration and Batch • Auto-Configuration, Bindings
XD
Container
Modules
Admin
Spring Cloud Data Flow
Spring Cloud Stream
Modules
Spring Cloud Task Modules
Spring Cloud Stream
Spring Cloud Task
• REST API, Shell, UI • Module Deployer SPI:
• Singlenode • Lattice • YARN • Cloud Foundry
From Modules to Microservices
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/ 42Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/ 10
Container Container
Spring XD 1.x
Host
Spring Cloud Stream
• Modules run inside containers • Container is a Boot app • Modules are ApplicationContexts
• Modules are executable Boot apps • Easier to use Spring Cloud features • Easier to test, consistent dev experience • Portable and “cloud-native”
M
Host
M
M M
M M M
M M
M
ZooKeeper Spring Cloud Data Flow (optional) SPI Implementation
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/
Transparent development model
43
public interface Source { String OUTPUT = "output"; @Output(Source.OUTPUT) MessageChannel output(); }
@EnableBinding(Source.class) @EnableConfigurationProperties(TimeSourceProperties.class) @Import(PeriodicTriggerConfiguration.class) public class TimeSource { @Autowired private TimeSourceProperties properties; @InboundChannelAdapter(value = Source.OUTPUT, poller = @Poller( trigger = PeriodicTriggerConfiguration.TRIGGER_BEAN_NAME, maxMessagesPerPoll = "1")) public String publishTime() { return new SimpleDateFormat(this.properties.getFormat()).format(new Date()); } }
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/
Where is Kafka ?
• Out of sight, but not out of mind
• On the class path ( just like all the other binders)
• Bindings/Channel Adapters are created transparently
• Channel names are topic names (overridable by Boot properties)
• Reusing code of the Spring XD 1.x Kafka message bus
• Partitioning/multiple instance support, with a microservice twist
• Spring Boot properties to compute the target partition set (instanceIndex, instanceCount)
44
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/ 45
INSTANCE_INDEX=0
HTTP
…
INSTANCE_INDEX=0
INSTANCE_INDEX=1
INSTANCE_INDEX=6
…
LOG
Kafka Service
http.0 (0) http.0 (1)
http.0 (2)
http.0 (6)
…
Broker 0 Broker 1 Broker 4
stream create logger --definition "http | log"
stream deploy logger --properties module.log.partitioned=true, module.log.count=7
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/ 46
++++
= massive scale!
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/
Demo: Scaled, Partitioned Streams on PCF and Lattice
47
Unless otherwise indicated, these s l ides are © 2013-2015 Pivotal Software, Inc. and l icensed under a Creat ive Commons Attr ibut ion-NonCommercial l icense: ht tp: / /creat ivecommons.org/ l icenses/by-nc/3.0/ 48
Tell your friends!
Developing Real-Time Data Pipelines with Apache Kafka - Joe Stein, Thursday, 10:30
Learn More. Stay Connected.
@springcentral Spring.io/video