Developing real-time data pipelines with Spring and Kafka Marius Bogoevici Staff Engineer, Pivotal @mariusbogoevici


Page 1: Developing real-time data pipelines with Spring and Kafka

Developing real-time data pipelines with Spring and Kafka

Marius Bogoevici

Staff Engineer, Pivotal

@mariusbogoevici

Page 2: Developing real-time data pipelines with Spring and Kafka

Agenda

• The Spring ecosystem today

• Spring Integration and Spring Integration Kafka

• Data integration

• Spring XD

• Spring Cloud Data Flow

Page 3: Developing real-time data pipelines with Spring and Kafka

Spring Framework

• Since 2002

• Java-based enterprise application development

• “Plumbing” should not be a developer concern

• Platform agnostic

Page 4: Developing real-time data pipelines with Spring and Kafka

Have you seen Spring lately?

• XML-less operation (since Spring 3.0, 2009)

• Component detection via @ComponentScan

• Declarative stereotypes:

• @Component, @Controller, @Repository

• Dependency injection via @Autowired (see the sketch below)

• Extensive ecosystem
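
To make the annotation-driven style concrete, here is a minimal sketch of component scanning and constructor injection (class names are illustrative, not from the slides; imports omitted as in the rest of the deck):

@Configuration
@ComponentScan  // picks up the stereotyped classes below
class AppConfig { }

@Component
class GreetingService {

    String greet(String name) {
        return "Hello, " + name + "!";
    }
}

@Component
class GreetingClient {

    private final GreetingService greetingService;

    @Autowired  // constructor injection, no XML required
    GreetingClient(GreetingService greetingService) {
        this.greetingService = greetingService;
    }

    String greetWorld() {
        return greetingService.greet("World");
    }
}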

Page 5: Developing real-time data pipelines with Spring and Kafka

A simple REST controller

@RestController
public class GreetingController {

    private static final String template = "Hello, %s!";
    private final AtomicLong counter = new AtomicLong();

    @RequestMapping("/greeting")
    public Greeting greeting(@RequestParam(value = "name", defaultValue = "World") String name) {
        return new Greeting(counter.incrementAndGet(), String.format(template, name));
    }
}

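The Greeting payload type is not shown on the slide; a plausible minimal version (an assumption, mirroring the id/content pair used in the Spring getting-started guides) would be:

// Hypothetical payload class: an id plus the formatted message.
public class Greeting {

    private final long id;
    private final String content;

    public Greeting(long id, String content) {
        this.id = id;
        this.content = content;
    }

    public long getId() { return id; }

    public String getContent() { return content; }
}
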
Page 6: Developing real-time data pipelines with Spring and Kafka

Spring ecosystem

Page 7: Developing real-time data pipelines with Spring and Kafka
Page 8: Developing real-time data pipelines with Spring and Kafka

Spring Data: case study

Page 9: Developing real-time data pipelines with Spring and Kafka

Spring Data

• Spring-based data access model

• Data mapping and repository abstractions

• Retains the characteristics of the underlying data store

• Framework-generated implementations

• Custom query support

Page 10: Developing real-time data pipelines with Spring and Kafka

Spring Data Repositories

public interface PersonRepository extends CrudRepository<Person, Long> {

    Person findByFirstName(String firstName);
}

@RestController
public class PersonController {

    @Autowired
    PersonRepository repository;

    @RequestMapping("/")
    public Iterable<Person> getAll() {
        return repository.findAll();
    }

    @RequestMapping("/{firstName}")
    public Person readOne(@PathVariable String firstName) {
        return repository.findByFirstName(firstName);
    }
}

Only declare the interfaces

Implementation is generated and injected
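
Query methods are derived from the method name. A hedged sketch of a few more derived queries (assuming Person also has lastName and age properties, which the slides do not state):

public interface PersonRepository extends CrudRepository<Person, Long> {

    Person findByFirstName(String firstName);

    // Derived from the method name: WHERE last_name = ? AND age > ?
    List<Person> findByLastNameAndAgeGreaterThan(String lastName, int age);

    // Derived ordering: WHERE last_name = ? ORDER BY first_name ASC
    List<Person> findByLastNameOrderByFirstNameAsc(String lastName);
}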

Page 11: Developing real-time data pipelines with Spring and Kafka

Spring Data JPA

Spring Data REST

Page 12: Developing real-time data pipelines with Spring and Kafka

Spring Boot

• Auto-configuration: infrastructure automatically created based on classpath contents

• Smart defaults

• Standalone executable artifacts ("just run")

• Uberjar + embedded runtime

• Configuration via CLI, environment

Page 13: Developing real-time data pipelines with Spring and Kafka

Spring Boot Application

@Controller
@EnableAutoConfiguration
public class SampleController {

    @RequestMapping("/")
    @ResponseBody
    String home() {
        return "Hello World!";
    }

    public static void main(String[] args) throws Exception {
        SpringApplication.run(SampleController.class, args);
    }
}

java -jar application.jar

Page 14: Developing real-time data pipelines with Spring and Kafka

Spring Integration

• Since 2007

• Pipes and Filters: Messages, Channels, Endpoints

• Enterprise Integration Patterns as first-class constructs

• Large set of adapters

• Java DSL

Page 15: Developing real-time data pipelines with Spring and Kafka

Spring Integration

• Message: encapsulates data (headers + payload)

• Channel: transports data

• Endpoint: handles data (see the sketch below)

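A minimal sketch of these three concepts in code (illustrative, not from the slides): a Message is built, sent over a Channel, and handled by a subscribed endpoint.

DirectChannel channel = new DirectChannel();          // Channel: transports data

channel.subscribe(m ->                                // Endpoint: handles data
        System.out.println("Received: " + m.getPayload()));

Message<String> message = MessageBuilder              // Message: headers + payload
        .withPayload("hello")
        .setHeader("source", "example")
        .build();

channel.send(message);
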
Page 16: Developing real-time data pipelines with Spring and Kafka

Example: a simple pipeline

[Diagram: integerMessageSource → inputChannel → myFlow (filter, transform / Message Translator) → queue channel]

Page 17: Developing real-time data pipelines with Spring and Kafka

Example: a simple pipeline

@Configuration
@EnableIntegration
public class MyConfiguration {

    @Bean
    public MessageSource<?> integerMessageSource() {
        MethodInvokingMessageSource source = new MethodInvokingMessageSource();
        source.setObject(new AtomicInteger());
        source.setMethodName("getAndIncrement");
        return source;
    }

    @Bean
    public DirectChannel inputChannel() {
        return new DirectChannel();
    }

    @Bean
    public IntegrationFlow myFlow() {
        return IntegrationFlows.from(this.integerMessageSource(),
                        c -> c.poller(Pollers.fixedRate(100)))
                .channel(this.inputChannel())
                .filter((Integer p) -> p > 0)
                .transform(Object::toString)
                .channel(MessageChannels.queue())
                .get();
    }
}

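In this flow, the poller pulls a new value from the AtomicInteger-backed source every 100 ms; values that are not greater than zero are filtered out, the remaining ones are transformed to Strings and end up on a queue channel.
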
Page 18: Developing real-time data pipelines with Spring and Kafka

Spring Integration: Cafe Example

Page 19: Developing real-time data pipelines with Spring and Kafka

Spring Integration Components

• Enterprise Integration Patterns:

• Filter, Transform, Gateway, Service Activator, Aggregator, Channel Adapter, Routing Slip

• Adapters:

• JMS, RabbitMQ, Kafka, MongoDB, JDBC, Splunk, AWS (S3, SQS), Twitter, Email, etc.

Page 20: Developing real-time data pipelines with Spring and Kafka

Spring Integration Kafka

• Started in 2011

• Goal: adapt Kafka to the Spring Messaging and Spring Integration abstractions

• Easy access to the unique features of Kafka

• Namespace and Java DSL support

• Will migrate to Kafka 0.9 once it is available

• Defaults geared towards performance (message ID and timestamp generation disabled)

Page 21: Developing real-time data pipelines with Spring and Kafka

Spring Integration Kafka: Channel Adapters

[Diagram: the Kafka inbound channel adapter publishes Messages to a message channel; the Kafka outbound channel adapter takes Messages from a message channel and writes them to Kafka]

Page 22: Developing real-time data pipelines with Spring and Kafka

Spring Integration Kafka Producer Configuration

• Default producer configuration

• Distinct per-topic producer configurations

• Target topic or partition controlled via expression evaluation or message headers

Page 23: Developing real-time data pipelines with Spring and Kafka

Spring Integration Kafka Consumer

• Own client based on the Simple Consumer API

• Listen to specific partitions

• Offset control: when offsets are written and where (no ZooKeeper)

• Programmer-controlled acknowledgment

• Concurrent message processing (preserving per-partition ordering)

• Basic operations via KafkaTemplate

• Kafka-specific headers

Page 24: Developing real-time data pipelines with Spring and Kafka

Spring Integration Kafka Message Listener

• Auto-acknowledging

• With manual acknowledgment

public interface MessageListener {

    void onMessage(KafkaMessage message);
}

public interface AcknowledgingMessageListener {

    void onMessage(KafkaMessage message, Acknowledgment acknowledgment);
}

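A minimal sketch of a manually acknowledging listener (illustrative; the processing method is a placeholder, not from the slides):

public class AuditingListener implements AcknowledgingMessageListener {

    @Override
    public void onMessage(KafkaMessage message, Acknowledgment acknowledgment) {
        process(message);             // hand the message off to business logic
        acknowledgment.acknowledge(); // commit the offset only after processing succeeds
    }

    private void process(KafkaMessage message) {
        // application-specific handling
    }
}
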
Page 25: Developing real-time data pipelines with Spring and Kafka

Spring Integration Kafka: Offset Management

• Injectable strategy

• Allows customizing the starting offsets

• Implementations: SI MetadataStore-backed (e.g. Redis, Gemfire), Kafka compacted topic-backed (pre-0.8.2), Kafka 0.8.2 native

• Messages can be auto-acknowledged (by the adapter) or manually acknowledged (by the user)

• Manual acknowledgment useful when messages are processed asynchronously

• Acknowledgment passed as message header or as argument

Page 26: Developing real-time data pipelines with Spring and Kafka

Stream processing with Spring XD

• Higher-level abstractions are required

• Integrating seamlessly and transparently with the middleware

• Building on top of Spring Integration and Spring Batch

• Pre-built modules using the entire power of the Spring ecosystem

Page 27: Developing real-time data pipelines with Spring and Kafka

Streams in Spring XD

[Diagram: streams are composed from a palette of source, processor, and sink modules]

• Sources: HTTP, JMS, Kafka, RabbitMQ, Gemfire, File, SFTP, Mail, JDBC, Twitter, Syslog, TCP, UDP, MQTT, Trigger

• Processors: Filter, Transformer, Splitter, Aggregator, HTTP Client, JPMML Evaluator, Shell, Python, Groovy, Java, RxJava, Spark Streaming

• Sinks: File, HDFS, HAWQ, Kafka, RabbitMQ, Redis, Splunk, Mongo, JDBC, TCP, Log, Mail, Gemfire, MQTT, Dynamic Router, Counters

Note: Named channels allow for a directed graph of data flow

Page 28: Developing real-time data pipelines with Spring and Kafka

Spring XD: Stream DSL
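
The DSL itself appears as an image on this slide; representative definitions (the classic ticktock example plus a stream with a processor; stream names are illustrative) look like this:

stream create ticktock --definition "time | log" --deploy

stream create uppercase --definition "http | transform --expression=payload.toUpperCase() | log" --deploy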

Page 29: Developing real-time data pipelines with Spring and Kafka

Spring XD - Message Bus abstraction

• Binds module inputs and outputs to a transport

• Performs serialization (Kryo)

• Transports: Local, Rabbit, Redis, and Kafka

Page 30: Developing real-time data pipelines with Spring and Kafka

Spring XD Architecture

[Diagram: XD Admin and XD Containers hosting XD Modules, coordinated via ZooKeeper and backed by a database; driven from the Admin / Flo UI, the Shell, and curl]

Page 31: Developing real-time data pipelines with Spring and Kafka

Spring XD and Kafka - the message bus

• Each pipe between modules is a topic;

• Spring XD creates topics automatically;

• Topics are pre-partitioned based on module count and concurrency;

• Overpartitioning is available as an option;

• Multiple consumer modules ‘divide’ the partition set of a topic using a deterministic algorithm;

Page 32: Developing real-time data pipelines with Spring and Kafka

Partitioning in Spring XD

• Required for distributed stateful processing: related data must be processed on the same node

• Partitioning logic is configured in Spring XD via the deployment manifest

• partitionKeyExpression=payload.sensorId

• When using Kafka as the bus, the partition key logic maps directly onto Kafka's native transport partitioning (see the sketch below)

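A hedged sketch of what this looks like from the XD shell (stream and module names are illustrative, and the deployment-property names are an assumption based on the Spring XD deployment-manifest conventions, so the exact prefixes may differ by version):

stream create temperatures --definition "http | avg-temperatures | log"

stream deploy temperatures --properties "module.http.producer.partitionKeyExpression=payload.sensorId,module.avg-temperatures.count=2"
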
Page 33: Developing real-time data pipelines with Spring and Kafka

Partitioned Streams with Kafka

[Diagram: multiple HTTP source instances publish to a topic with partitions 0 and 1; each Average Processor instance consumes one partition]

http | avg-temperatures

Page 34: Developing real-time data pipelines with Spring and Kafka

Performance metrics of Spring XD 1.2

Page 35: Developing real-time data pipelines with Spring and Kafka

Performance metrics of Spring XD 1.2

Page 36: Developing real-time data pipelines with Spring and Kafka

Spring Cloud Data Flow (Spring XD 2.0)


Page 37: Developing real-time data pipelines with Spring and Kafka

Goals

• Scale without undeploying running stream or batch pipelines

• Avoid hierarchical 'classloader' issues and the inadvertent spiral of 'xd/lib'

• Skip network hops within a stream

• Do rolling upgrades and continuous deployments

Page 38: Developing real-time data pipelines with Spring and Kafka

Spring Cloud Data Flow is a cloud-native programming and operating model for composable data microservices on a structured platform

Page 39: Developing real-time data pipelines with Spring and Kafka

Spring Cloud Data Flow is a cloud-native programming and operating model for composable data microservices on a structured platform

Page 40: Developing real-time data pipelines with Spring and Kafka

Spring Cloud Data Flow is a cloud-native programming and operating model for composable data microservices on a structured platform

@EnableBinding(Source.class)
public class Greeter {

    @InboundChannelAdapter(Source.OUTPUT)
    public String greet() {
        return "hello world";
    }
}

@EnableBinding(Source.class)
@EnableBinding(Processor.class)
@EnableBinding(Sink.class)

public interface Source {

    String OUTPUT = "output";

    @Output(Source.OUTPUT)
    MessageChannel output();
}

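To round out the picture, a hedged sketch of a matching processor and sink (this assumes the early Spring Cloud Stream programming model, in which plain Spring Integration annotations such as @Transformer and @ServiceActivator work on the bound channels; class names are illustrative):

@EnableBinding(Processor.class)
public class Upcaser {

    @Transformer(inputChannel = Processor.INPUT, outputChannel = Processor.OUTPUT)
    public String transform(String payload) {
        return payload.toUpperCase();
    }
}

@EnableBinding(Sink.class)
public class LoggingSink {

    @ServiceActivator(inputChannel = Sink.INPUT)
    public void log(String payload) {
        System.out.println(payload);
    }
}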

Page 41: Developing real-time data pipelines with Spring and Kafka

Spring Cloud Data Flow is a cloud-native programming and operating model for composable data microservices on a structured platform

continuous delivery

continuous deployment

monitoring


Page 42: Developing real-time data pipelines with Spring and Kafka

Spring Cloud Data Flow is a cloud-native programming and operating model for composable data microservices on a structured platform

Streams: http | transform | jdbc

Jobs: foo < bar || baz & jaz > bye

[Diagram: the stream and the composed job rendered as graphs of their modules]

Page 43: Developing real-time data pipelines with Spring and Kafka

Spring Cloud Data Flow is a cloud-native programming and operating model for composable data microservices on a structured platform

[Diagram: target runtimes, including YARN and Lattice]

Page 44: Developing real-time data pipelines with Spring and Kafka

New Architecture

[Diagram: the Admin server, driven from the Admin / Flo UI, the Shell, and curl, deploys "Bootified" modules to runtimes such as YARN]

Page 45: Developing real-time data pipelines with Spring and Kafka

XD [ Container ] Orchestration

[Diagram: ZooKeeper coordinates XD Containers on a host, with XD Modules deployed inside the containers]

Page 46: Developing real-time data pipelines with Spring and Kafka

Messaging-Driven Data Microservices

[Diagram: Spring Cloud Stream modules running as standalone processes on a host]

Page 47: Developing real-time data pipelines with Spring and Kafka

Orchestrate Composable Data Microservices

[Diagram: Spring Cloud Data Flow orchestrates Spring Cloud Stream modules, built on Spring Cloud Stream binders (Rabbit, Kafka, Redis), across runtimes such as Cloud Foundry, YARN, and Lattice]

Page 48: Developing real-time data pipelines with Spring and Kafka

Partitioned stream scaling with SCDF and Kafka

[Diagram: an http source publishes to the partitioned topic http.0, whose partitions 0 through 6 are spread across Kafka brokers 0, 1, and 4; seven log sink instances, each with its own INSTANCE_INDEX (0 to 6), consume one partition apiece]

stream create logger --definition "http | log"

stream deploy logger --properties module.log.partitioned=true, module.log.count=7

Page 49: Developing real-time data pipelines with Spring and Kafka

Summary

• Scalable pipelines composed of Spring Boot cloud-native applications

• Spring Cloud Stream provides the programming model

• Transparently mapping to Kafka-native concepts

• Spring Cloud Data Flow provides the orchestration model

Page 50: Developing real-time data pipelines with Spring and Kafka

Questions?

https://spring.io

http://projects.spring.io/spring-xd/

http://cloud.spring.io/spring-cloud-dataflow/

http://projects.spring.io/spring-integration/

http://cloud.spring.io/spring-cloud-stream/