developing real-time data pipelines with spring and kafka
TRANSCRIPT
Developing real-time data pipelines with Spring and
KafkaMarius Bogoevici
Staff Engineer, Pivotal
@mariusbogoevici
Agenda• The Spring ecosystem today
• Spring Integration and Spring Integration Kafka
• Data integration
• Spring XD
• Spring Cloud Data Flow
Spring Framework
• Since 2002
• Java-based enterprise application development
• “Plumbing” should not be a developer concern
• Platform agnostic
Have you seen Spring lately?• XML-less operation (since Spring 3.0, 2009)
• Component detection via @ComponentScan
• Declarative stereotypes:
• @Component, @Controller, @Repository
• Dependency injection @Autowired
• Extensive ecosystem
A simple REST controller@RestControllerpublic class GreetingController {
private static final String template = "Hello, %s!"; private final AtomicLong counter = new AtomicLong();
@RequestMapping("/greeting") public Greeting greeting(@PathVariable(value="name", defaultValue="World") String name) { return new Greeting(counter.incrementAndGet(), String.format(template, name)); }}
Spring ecosystem
Spring Data: case study
Spring Data• Spring-based data access model
• Data mapping and repository abstractions
• Retains the characteristics of underlying data store
• Framework generated implementation
• Customized query support
Spring Data Repositories
public interface PersonRepository extends CrudRepository<Person, Long> {Person findByFirstName(String firstName);
}
@RestControllerpublic class PersonController {
@Autowired PersonRepository repository;
@RequestMapping(“/“) public List<Person> getAll() { return repository.findAll(); }
@RequestMapping(“/{firstName}”) public Person readOne(@PathVariable String firstName) { return repository.findByFirstname(String name); }}
Only declare the interfaces
Implementation is generated and injected
Spring DataJPA
Spring Data
REST
Spring Boot• Auto configuration: infrastructure automatically
created based on class path contents
• Smart defaults
• Standalone executable artifacts (“just run”)
• Uberjar + embedded runtime
• Configuration via CLI, environment
Spring Boot Application@Controller@EnableAutoConfigurationpublic class SampleController {
@RequestMapping("/") @ResponseBody String home() { return "Hello World!"; }
public static void main(String[] args) throws Exception { SpringApplication.run(SampleController.class, args); }}
java -jar application.jar
Spring Integration• Since 2007
• Pipes and Filters: Messages, Channels, Endpoints
• Enterprise Integration Patterns as first-class constructs
• Large set of adapters
• Java DSL
Spring IntegrationMessage
Encapsulates Data
(headers + payload)
Channel
Transports Data
Endpoint
Handles Data
Example: a simple pipeline
Message Translator
integerMessageSource inputChannel queueChannelmyFlow (transform, filter)
Example: a simple pipeline@Configuration@EnableIntegrationpublicclassMyConfiguration{@BeanpublicMessageSource<?>integerMessageSource(){MethodInvokingMessageSourcesource=newMethodInvokingMessageSource();source.setObject(newAtomicInteger());source.setMethodName("getAndIncrement");returnsource;}
@BeanpublicDirectChannelinputChannel(){returnnewDirectChannel();}
@BeanpublicIntegrationFlowmyFlow(){returnIntegrationFlows.from(this.integerMessageSource(),c->c.poller(Pollers.fixedRate(100))).channel(this.inputChannel()).filter((Integerp)->p>0).transform(Object::toString).channel(MessageChannels.queue()).get();}}
Spring Integration: Cafe Example
Spring Integration Components
• Enterprise Integration Patterns:
• Filter, Transform, Gateway, Service Activator, Aggregator, Channel Adapter, Routing Slip
• Adapters:
• JMS, RabbitMQ, Kafka, MongoDB, JDBC, Splunk, AWS (S3, SQS), Twitter, Email, etc.
Spring Integration Kafka• Started in 2011
• Goal: adapting to the abstractions Spring Messaging and Spring Integration
• Easy access to the unique features of Kafka;
• Namespace, Java DSL support
• To migrate to 0.9 once available
• Defaults focused towards performance (disable ID generation, timestamp)
Spring Integration Kafka: Channel Adapters
Kafka Inbound Channel Adapter
Kafka Outbound Channel AdapterMessage Channel
Message Message
Spring Integration Kafka Producer Configuration
• Default producer configuration
• Distinct per-topic producer configurations
• Destination target or partition controlled via expression evaluation or headers
Spring Integration Kafka Consumer
• Own client based on Simple Consumer API
• Listen to specific partitions!
• Offset control - when to be written and where (no Zookeeper);
• Programmer-controlled acknowledgment;
• Concurrent message processing (preserving per-partition ordering)
• Basic operations via KafkaTemplate
• Kafka specific headers
Spring Integration Kafka Message Listener
• Auto-acknowledging
• With manual acknowledgment
publicinterfaceMessageListener{ voidonMessage(KafkaMessagemessage);}
publicinterfaceAcknowledgingMessageListener{ voidonMessage(KafkaMessagemessage,Acknowledgmentacknowledgment);}
Spring Integration Kafka: Offset Management
• Injectable strategy
• Allows customizing the starting offsets
• Implementations: SI MetadataStore-backed (e.g. Redis, Gemfire), Kafka compacted topic-backed (pre-0.8.2), Kafka 0.8.2 native
• Messages can be auto acknowledged (by the adapter) or manually acknowledged (by the user)
• Manual acknowledgment useful when messages are processed asynchronously
• Acknowledgment passed as message header or as argument
Stream processing with Spring XD
• Higher abstractions are required
• Integrating seamlessly and transparently with the middleware
• Building on top of Spring Integration and Spring Batch
• Pre-built modules using the entire power of the Spring ecosystem
Streams in Spring XD
HTTP$JMS$Ka*a$
RabbitMQ$JMS$
Gemfire$File$SFTP$Mail$JDBC$Twi;er$Syslog$TCP$UDP$MQTT$Trigger$
Filter$Transformer$
Spli;er$Aggregator$HTTP$Client$
JPMML$Evaluator$Shell$Python$Groovy$Java$RxJava$
Spark$Streaming$
File$HDFS$HAWQ$Ka*a$
RabbitMQ$Redis$Splunk$Mongo$Redis$JDBC$TCP$Log$Mail$
Gemfire$MQTT$
Dynamic$Router$Counters$
Note: Named channels allow for a directed graph of data flow
channel
Spring XD: Stream DSL
Spring XD - Message Bus abstraction
• Binds module inputs and outputs to a transportBinds module inputs and outputs to a transport Performs Serialization (Kryo) Local, Rabbit, Redis, and Kafka
XD Modules
XD Admin
XD Containers
Zoo
Keep
erZo
o Ke
eperAdmin / Flo UI
Shell
CURL
Spring XD Architecture
Database© Copyright 2015 Pivotal. All rights reserved.
Spring XD and Kafka - the message bus
• Each pipe between modules is a topic;
• Spring XD creates topics automatically;
• Topics are pre-partitioned based on module count and concurrency;
• Overpartitioning is available as an option;
• Multiple consumer modules ‘divide’ the partition set of a topic using a deterministic algorithm;
Partitioning in Spring XD• Required in distributed stateful processing: related data must be
processed on the same node;
• Partitioning logic configured in Spring XD via deployment manifest
• partitionKeyExpression=payload.sensorId
• When using Kafka as a bus, partition key logic maps directly to Kafka transport partitioning natively
Partitioned Streams with Kafka
Partition 0
Partition 1
HTTP
HTTP
HTTP
Average Processor
Average Processor
Topic
http | avg-temperatures
Performance metrics of Spring XD 1.2
Performance metrics of Spring XD 1.2
Spring Cloud Data Flow (Spring XD 2.0)
© Copyright 2015 Pivotal. All rights reserved.
Goals• Scale without undeploying running stream or
batch pipelines • Avoid hierarchical ‘classloader' issues, inadvertent
spiral of ‘xd/lib’ • Skip network hops within a stream • Do rolling upgrades and continuous deployments
Spring Cloud Data Flow isa cloud native programming and operating model
for composable data microservices on a structured platform
© Copyright 2015 Pivotal. All rights reserved.
Spring Cloud Data Flow isa cloud native programming and operating model for composable data microservices on a
structured platform
© Copyright 2015 Pivotal. All rights reserved.
Spring Cloud Data Flow isa cloud native programming and operating
model for composable data microservices on a structured platform
@EnableBinding(Source.class)public class Greeter { @InboundChannelAdapter(Source.OUTPUT) public String greet() { return "hello world”; }}
@EnableBinding(Source.class)
@EnableBinding(Processor.class)
@EnableBinding(Sink.class)
public interface Source { String OUTPUT = "output"; @Output(Source.OUTPUT) MessageChannel output();}
© Copyright 2015 Pivotal. All rights reserved.
Spring Cloud Data Flow isa cloud native programming and operating
model for composable data microservices on a structured platform
continuous delivery
continuous deployment
monitoring
© Copyright 2015 Pivotal. All rights reserved.
Spring Cloud Data Flow isa cloud native programming and operating model for composable data microservices on a
structured platform
http transform jdbc
job foo < bar || baz & jaz
> bye
Streams
Jobsfoo
bar jaz
baz
bye
| |
© Copyright 2015 Pivotal. All rights reserved.
Spring Cloud Data Flow isa cloud native programming and operating model for composable data microservices on a
structured platform
YARN
?LATTICE
© Copyright 2015 Pivotal. All rights reserved.
AdminAdmin / Flo UI
Shell
CURL ??X
YARN
Bootified Modules
New Architecture
© Copyright 2015 Pivotal. All rights reserved.
XD [ Container ] Orchestration
ZooKeeper
HOST
XD ContainerXD Container
XD Modules© Copyright 2015 Pivotal. All rights reserved.
Messaging-Driven Data Microservices
HOST
Spring Cloud Stream Modules© Copyright 2015 Pivotal. All rights reserved.
Orchestrate Composable Data Microservices
HOST
Cloud Foundry YARN X
Spring Cloud Data Flow
Lattice
Spring Cloud Stream ModulesSpring Cloud Stream Binders [Rabbit, Kafka, Redis]
Partitioned stream scaling with SCDF and Kafka
INSTANCE_INDEX=0
HTTP
…
INSTANCE_INDEX=0
INSTANCE_INDEX=1
INSTANCE_INDEX=6
…
LOG
Kafka Service
http.0 (0) http.0 (1)
http.0 (2)
http.0 (6)
…
Broker 0 Broker 1 Broker 4
stream create logger --definition "http | log"
stream deploy logger --properties module.log.partitioned=true, module.log.count=7
Summary• Scalable pipelines composed of Spring Boot cloud
native applications
• Spring Cloud Stream provides the programming model
• Transparently mapping to Kafka-native concepts
• Spring Cloud Data Flow provides the orchestration model
Questions?
https://spring.io
http://projects.spring.io/spring-xd/http://cloud.spring.io/spring-cloud-dataflow/
http://projects.spring.io/spring-integration/
http://cloud.spring.io/spring-cloud-stream/