Spark Meets Spring
2 © Copyright 2015 Pivotal. All rights reserved.
[Diagram: Spring IO layers and projects]
• IO Execution
  – Spring XD: Stream, Taps, Jobs
  – Boot: Bootable, Minimal, Ops-Ready
  – Grails: Full-stack, Web
• IO Coordination
  – Spring Cloud
• IO Foundation
  – Web: Controllers, REST, WebSocket
  – Integration: Channels, Adapters, Filters, Transformers
  – Batch: Jobs, Steps, Readers, Writers
  – Big Data: Ingestion, Export, Orchestration, Hadoop
  – Data: Relational Data Access, Non-Relational Data Access
  – Spring Core: Framework, Security, Reactor
Spring XD – 10,000 ft view
[Diagram: Spring XD Runtime. Labels: Streams, Jobs, ingest, workflow, export, taps, Bidirectional, Compute, HDFS, RDBMS, NoSQL, Redis, Predictive Modelling (R, SAS), shell prompt (>_)]
Core Concepts
• Modules
  – Source: polls an external source or is event driven
  – Processor: takes input and produces output
  – Sink: consumes input and outputs to an external system
• Streams
  – Source | {Processor}0…n | Sink
• Taps
  – Dynamically add taps to listen for events
• Jobs
  – Directed graph of steps
  – ETL jobs based on Spring Batch
  – Workflow orchestration on Hadoop or Spark
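The Source | {Processor}0…n | Sink shape above can be pictured with a small Python sketch. This is not Spring XD code: run_stream, drop_empty and upper_case are hypothetical names invented here, and the in-memory list stands in for a real sink such as HDFS.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical stand-ins for XD module types; none of these names come from Spring XD.
Source = Callable[[], Iterable[str]]
Processor = Callable[[Iterator[str]], Iterator[str]]
Sink = Callable[[Iterator[str]], None]


def run_stream(source: Source, processors: List[Processor], sink: Sink) -> None:
    """Wire a source through zero or more processors into a sink."""
    messages: Iterator[str] = iter(source())
    for processor in processors:
        # Each processor wraps the previous stage, like one "|" in the DSL.
        messages = processor(messages)
    sink(messages)


def drop_empty(messages: Iterator[str]) -> Iterator[str]:
    """Filter-style processor: skip blank messages."""
    for message in messages:
        if message.strip():
            yield message


def upper_case(messages: Iterator[str]) -> Iterator[str]:
    """Transformer-style processor: upper-case each message."""
    for message in messages:
        yield message.upper()


# Usage: source | drop_empty | upper_case | sink
collected: List[str] = []
run_stream(lambda: ["hello", "", "xd"], [drop_empty, upper_case], collected.extend)

# A stream with zero processors is still valid: source | sink
passthrough: List[str] = []
run_stream(lambda: ["a", "b"], [], passthrough.extend)
```

The point of the sketch is only that each module sees messages from the stage before it, so processors can be added or removed without touching the source or sink.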
Ingestion
• Stream data from a variety of sources
• Write data to a variety of sinks
• Dozens of sources/sinks out of the box
  – Kafka, Files, Gemfire, HTTP, HDFS…
• How to do this in XD?
  – Pipes and filters DSL
stream create tweets --definition "twitterstream | hdfs"
Streams
Sources: HTTP, Tail, File, Mail, Twitter, Gemfire, Syslog, TCP, UDP, JMS, RabbitMQ, MQTT, Trigger, Reactor TCP/UDP
Processors: Filter, Transformer, Object-to-JSON, JSON-to-Tuple, Splitter, Aggregator, HTTP Client, JPMML Evaluator, Shell, Groovy, Python, Java
Sinks: File, HDFS, JDBC, TCP, Log, Mail, RabbitMQ, Gemfire, Splunk, MQTT, Dynamic Router, Counters
Real Time Processing
• Counters
• Model Scoring
• Functional Stream Processing
  – RxJava, Spark Streaming
• Custom Java, Python Code
• Spring Data Repositories
  – Map data structures to objects
  – Store in Cassandra, Gemfire, Neo4j, MongoDB, Elasticsearch, Couchbase, JPA…
Real Time Processing
• Custom Java code
• Tap to count events
• Tap to count occurrences of each language
stream create tweets --definition "twitterstream | myProcessor | hdfs"
stream create tweetcount --definition "tap:stream:tweets > aggregate-counter"
stream create tweetlang --definition "tap:stream:tweets > field-value-counter --fieldName=lang"
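To make the tap idea concrete, here is a rough Python sketch of a main stream with two taps that count events without changing what reaches the sink. TappableStream and its methods are hypothetical names made up for this illustration, not XD APIs, and the total counter is a simplification: it only keeps a running total and does no time bucketing.

```python
from collections import Counter
from typing import Callable, Dict, List

Message = Dict[str, str]


class TappableStream:
    """Hypothetical in-memory stream: taps observe messages, the sink still gets them all."""

    def __init__(self, sink: Callable[[Message], None]) -> None:
        self._sink = sink
        self._taps: List[Callable[[Message], None]] = []

    def add_tap(self, tap: Callable[[Message], None]) -> None:
        # Taps can be attached at any time, after the stream already exists.
        self._taps.append(tap)

    def send(self, message: Message) -> None:
        for tap in self._taps:
            tap(message)        # side channel: observe only
        self._sink(message)     # main flow is unaffected by the taps


stored: List[Message] = []
tweets = TappableStream(sink=stored.append)

total = Counter()      # stands in for "tap to count events"
by_lang = Counter()    # stands in for counting values of the "lang" field

tweets.add_tap(lambda msg: total.update(["tweets"]))
tweets.add_tap(lambda msg: by_lang.update([msg["lang"]]))

for tweet in [
    {"text": "hi", "lang": "en"},
    {"text": "hola", "lang": "es"},
    {"text": "hey", "lang": "en"},
]:
    tweets.send(tweet)
```

After the loop, every tweet is still in stored, while total and by_lang hold the counts gathered on the side.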
Dashboard and REST APIs
• Spring XD REST APIs for Analytics
  – Easy to create counters, gauges
  – Invoked by JavaScript libraries, e.g. D3.js
• Spring Data Repositories
  – Map data structures to objects
  – Store in Cassandra, Gemfire, Neo4j, MongoDB, Elasticsearch, Couchbase, JPA…
• Spring Data REST
  – Easy to expose REST APIs for repositories
Batch Processing
• Job Orchestration
  – Hadoop (M/R, Pig, Hive)
  – Spark Batch
• ETL
  – CSV to HDFS
  – HDFS to JDBC
job create myjob --definition "hdfsjdbc --resources=/xd/data/*.csv --names=forename,surname --tableName=people"
Runtime
[Diagram: XD Admin instances (Leader) and ZK; XD Containers hosting modules, linked by Kafka/RabbitMQ/Redis; XD UI and XD Shell; Batch Job State DB and Analytics Repository]
• XD Admin
  – Assigns modules to containers
  – Re-assigns on failures for HA
• ZooKeeper
  – Tracks container state
• XD Container
  – Standalone, YARN, or Cloud Foundry
  – Loads modules
    ▪ Isolates class loader and context
  – Connects to data bus
    ▪ In-memory direct channel
    ▪ Kafka, RabbitMQ, Redis
Data Partitioning
• PartitionKey
  – Which field in the data to partition on
  – e.g. payload.customer.id
• Partition ID
  – key.hashCode() % count
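The partition rule above can be tried out with a short Python sketch. It emulates Java's String.hashCode() so the numbers match what a JVM would compute for the same string key. The function names and the sample payloads are hypothetical, and taking the absolute value of the hash is an assumption added here to keep the index non-negative; the slide only states key.hashcode % count.

```python
from typing import Callable, Dict


def java_string_hash(s: str) -> int:
    """Emulate Java's String.hashCode() as a signed 32-bit int.

    Assumes characters in the Basic Multilingual Plane, where Python code
    points and Java UTF-16 code units coincide.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF   # wrap like 32-bit int arithmetic
    return h - 0x100000000 if h >= 0x80000000 else h


def partition_id(key: str, count: int) -> int:
    """Map a key to a partition index in [0, count).

    abs() is an assumption of this sketch, not something the slide specifies.
    """
    return abs(java_string_hash(key)) % count


def partition_for(payload: Dict, key_of: Callable[[Dict], str], count: int) -> int:
    """Extract the partition key from a payload, then pick its partition."""
    return partition_id(key_of(payload), count)


# Usage: partition on payload["customer"]["id"], mirroring payload.customer.id
def customer_id(payload: Dict) -> str:
    return payload["customer"]["id"]


first = partition_for({"customer": {"id": "hello"}, "amount": 10}, customer_id, 4)
second = partition_for({"customer": {"id": "hello"}, "amount": 99}, customer_id, 4)
```

Because the partition depends only on the key, both payloads for the same customer land on the same partition, which is what lets a downstream module keep per-customer state locally.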
Spring XD – Spark Stream Processing
• XD handles the input/output
  – Message Bus Receiver in Spark cluster
  – Message Bus Sender in Spark cluster
• Events processed at micro-batch level
• Java and Scala interfaces
  – Implement process method
  – Process DStream input
  – Create DStream output
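The micro-batch model above can be pictured without Spark at all. The sketch below is plain Python, not the XD or Spark API: micro_batches and process are hypothetical names, and batches are cut by event count purely to keep the example deterministic, whereas Spark Streaming cuts them by time interval.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List


def micro_batches(events: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group an event sequence into small batches.

    Assumption: batches are cut by count here; a real streaming engine
    would cut them by time interval.
    """
    batch: List[str] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:               # flush the final, possibly short, batch
        yield batch


def process(batch: List[str]) -> Dict[str, int]:
    """Hypothetical user-supplied step: turn one input batch into one output.

    Here it counts how often each value occurs within the batch.
    """
    return dict(Counter(batch))


# Usage: input events arrive from the bus, outputs would be sent back to it.
incoming = ["en", "es", "en", "en", "fr"]
outputs = [process(batch) for batch in micro_batches(incoming, 2)]
```

Each output covers only its own batch, which is the main behavioural difference from per-event processing: results are emitted once per batch rather than once per message.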
Resources
• Code
  – https://github.com/spring-projects/spring-xd
  – https://github.com/spring-projects/spring-xd-samples
• Docs
  – http://docs.spring.io/spring-xd/docs/current/reference/html/
September 14-17, 2015 Washington, DC
http://springone2gx.com/