AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)
www.immobilienscout24.de
Visual Dataflows with Apache NiFi
- and how they interact with AWS
AWS Berlin UserGroup – 2016-04-19 – Speaker: @KayLerch
Agenda
1. Why Apache NiFi?
2. That’s Apache NiFi - Exploring the UI
3. AWS Integration Capabilities
4. AWS IoT – Basics (Recap)
5. Apache NiFi and AWS IoT
Apache NiFi & AWS | Kay Lerch
Why Apache NiFi
A brief overview of data processing and analysis
A brief overview of data processing and analysis
Stone age: no tooling at all
[Diagram: Data Producers, Data Consumers, IoAT. Annotations: potential bottleneck; integration challenges; unreliable delivery; left alone with analytic challenges]
A brief overview of data processing and analysis
Bronze age: invent the wheel (event broker) for reliable (message) transportation
[Diagram: Data Producers, Event Broker, Data Consumers. Annotations: limited durability; hidden complexities; left alone with analytic challenges]
A brief overview of data processing and analysis
Industrialization: stores for massive production of durable yet unstructured information
[Diagram: Data Producers, Event Broker, (Big) data stores, Data Consumers. Annotations: hidden complexities; ingestion challenges; realtime lag; data processing challenges; data analysis challenges; data security challenges; left alone with analytic challenges]
A brief overview of data processing and analysis
Digital age: realtime processing and analysis of (streaming) data
[Diagram: Data Producers, Event Broker, (Big) data stores, (Realtime) Data Processing & Analytics, Data Consumers. Annotations: hidden complexities; ingestion challenges; data analysis challenges; integration challenges; data security challenges]
A brief overview of data processing and analysis
That’s quite a lot of tooling and technology …
[Diagram: Data Producers, Event Broker, (Big) data stores, (Realtime) Data Processing & Analytics, Data Consumers. Annotations: hidden complexities; ingestion challenges; data analysis challenges; integration challenges; data security challenges]
If you want …
to see a (realtime) big picture of your dataflows
to trace the lineage of each data element
to have the flexibility to change things on the fly
to prioritize data
to overcome the challenges of integrating a variety of technologies with one overarching solution
to enforce security and compliance along dataflows
to rely on extensibility driven by an open-source community
to satisfy all those needs and keep your tools
to get rid of only those tools focused on moving data, without making concessions to overall performance
… then you might love:
That’s Apache NiFi
That’s Apache NiFi
in one page
Apache NiFi supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic.
Web-based user interface
Seamless experience between design, control, feedback, and monitoring
Highly configurable
Loss tolerant vs guaranteed delivery, Low latency vs high throughput, Dynamic prioritization, Flow can be modified at runtime, Back pressure
Data Provenance
Track dataflow from beginning to end
Designed for extension
Build your own processors and more, Enables rapid development and effective testing
Secure
SSL, SSH, HTTPS, encrypted content, etc..., Pluggable role-based authentication/authorization
Source: https://nifi.apache.org/
That’s Apache NiFi
in real and feel
Go to NiFi’s interface and understand:
Processors
Templates
Concept of back pressure
Concept of data prioritization
Provenance Graph
NiFi Clustered Architecture
[Diagram: a NiFi Cluster with the following boxes.
NiFi Cluster Manager (NCM): JVM, Webserver, REST API, Admin UI, Cluster Manager
Node (Primary) and Node (Slave), each with: JVM, Webserver, REST API, Admin UI, Flow Controller, Processor 1, Processor 2, Isolated Processor, Controller Service 1 to n, Provenance Repository, Content Repository, Flowfile Repository
Embedded Apache ZooKeeper
Arrow labels: heartbeat; leader election; report change; sync state]
AWS Integration Capabilities
Demos
to be covered
AWS Credential Provider Service
Integrating SQS
Integrating S3
Integrating Lambda
Integrating Kinesis Firehose
Integrating SNS
Integrating IoT
AWS IoT
Basics (Recap)
AWS IoT
The Shadow
[Diagram: AWS IoT with Thing, Thing Shadow, and Rule.
Thing: reports state; receives desired state; fulfills desired state
AWS IoT: mirrors state in shadow; propagates desired state
AWS Services: get reported state or set desired state
Rule: subscribes to particular messages; routes message to some AWS resource
Connection labels: TLS 1.2; Policy]
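The relationship between reported and desired state can be illustrated with a small sketch. It assumes the shadow document keeps both under `state.desired` and `state.reported`, which is how I understand the AWS IoT shadow JSON layout; the slide itself does not show the document. The function name `shadow_delta` and the sample values are hypothetical.

```python
# Sketch only: computes which desired values a thing still has to fulfill.
# The state.desired / state.reported layout is an assumption, not from the slide.
_MISSING = object()  # marks keys that the thing has not reported at all


def shadow_delta(desired: dict, reported: dict) -> dict:
    """Return the desired entries that differ from the reported state."""
    delta = {}
    for key, want in desired.items():
        have = reported.get(key, _MISSING)
        if isinstance(want, dict) and isinstance(have, dict):
            nested = shadow_delta(want, have)  # compare nested objects key by key
            if nested:
                delta[key] = nested
        elif have is _MISSING or want != have:
            delta[key] = want
    return delta


# Hypothetical shadow document for illustration.
shadow = {
    "state": {
        "desired": {"color": "red", "power": "on"},
        "reported": {"color": "green", "power": "on"},
    }
}
delta = shadow_delta(shadow["state"]["desired"], shadow["state"]["reported"])
print(delta)
```

Only the attributes the thing has not yet fulfilled end up in the result, which is roughly what a thing receives as its desired state to act on.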
AWS IoT
MQTT topics
[Diagram: AWS IoT Thing Shadow topics, with step numbers]
get: Request state (1)
get/accepted: Get shadow state (2)
get/rejected: Get error (2)
update: Update state (1)
update/accepted: Confirmation (2)
update/rejected: Get error (2)
update/delta: Changed state (3)
Thing topics name pattern: $aws/things/thing_name/...
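A minimal sketch of how a client might build these topic names. The topic suffixes are the ones listed on the slide; the `shadow` path segment between the thing name and the suffix is an assumption based on my understanding of the AWS IoT topic layout and should be checked against the AWS documentation. The helper name `shadow_topic` and the thing name `my-thing` are hypothetical.

```python
# Sketch only: builds shadow topic names from the pattern on the slide.
# The "/shadow/" segment is an assumption and is not shown on the slide.
SHADOW_SUFFIXES = (
    "get", "get/accepted", "get/rejected",
    "update", "update/accepted", "update/rejected", "update/delta",
)


def shadow_topic(thing_name: str, suffix: str) -> str:
    """Return the MQTT topic for one shadow operation of a thing."""
    if suffix not in SHADOW_SUFFIXES:
        raise ValueError(f"unknown shadow topic suffix: {suffix}")
    return f"$aws/things/{thing_name}/shadow/{suffix}"


# "my-thing" is a made-up thing name for illustration.
topic = shadow_topic("my-thing", "update/delta")
print(topic)
```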
Apache NiFi & AWS IoT
New processors
Apache NiFi & AWS IoT
Where NiFi comes in
If the managed services you want to integrate with your “things” run on AWS, you are good to go => Thing rules
If not, you need either an MQTT client (=> live data) or an application that communicates with the managed AWS API (for shadow data)
AWS announced MQTT over WebSockets in January 2016
Which means you’re not limited to TLS connections anymore
Establish durable connection to AWS IoT endpoint
Then talk MQTT over WebSockets in order to subscribe or publish to the thing topics
AWS service limit on connection duration: 300 seconds
You need a way to reconnect your client to hold your MQTT subscriptions
NiFi processors have the potential to become MQTT clients
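Because of the connection limit, a client has to decide when to renew its connection. The sketch below shows one way to compute that moment. The 300-second limit comes from the slide; the 30-second safety margin and the function name `seconds_until_reconnect` are assumptions made for illustration and are not taken from the processors presented in the talk.

```python
# Sketch only: decides how long a client may wait before renewing its connection.
CONNECTION_LIMIT_S = 300  # service limit on connection duration, as stated on the slide
SAFETY_MARGIN_S = 30      # assumption: renew this many seconds before the limit


def seconds_until_reconnect(connected_at: float, now: float,
                            limit: float = CONNECTION_LIMIT_S,
                            margin: float = SAFETY_MARGIN_S) -> float:
    """Return the remaining seconds before the client should reconnect."""
    deadline = connected_at + limit - margin
    return max(0.0, deadline - now)


# Connected at t=0, asked at t=100: 170 seconds remain before renewal.
remaining = seconds_until_reconnect(connected_at=0.0, now=100.0)
print(remaining)
```

A real client would pass `time.monotonic()` values so that wall-clock adjustments do not distort the schedule.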
Apache NiFi & AWS IoT
GetIOTMqtt – an MQTT client
[Diagram: the thing updates its state on the update topic of its Thing Shadow in AWS IoT. GetIOTMqtt steps: 1. establish connection, 2. subscribe, 3. receive state and emit it as a flowfile]
Apache NiFi & AWS IoT
GetIOTMqtt – Reconnect accordingly
First of all: I don’t want to wait for the auto-termination. I want to act upfront
AWS IoT does not support persistent client sessions
Therefore:
When disconnecting and then reconnecting, there is a short gap in which I may miss a message
When reconnecting and then disconnecting, there is a short overlap in which I may receive messages twice
Fortunately, one of these effects is officially accepted by the client anyway due to the quality of service level
if a subscription is desired with QoS=0 (“at most once message delivery”)
=> disconnect, then reconnect
=> maybe message loss
=> that’s fine
if a subscription is desired with QoS=1 (“at least once message delivery”)
=> reconnect, then disconnect
=> maybe duplicate message
=> that’s fine
QoS=2 (“exactly once message delivery”) is not supported by AWS IoT
[Timeline diagrams: with close-then-connect, Session 1 and Session 2 are separated by a gap with potential message loss; with connect-then-close, Sessions 1, 2, and 3 overlap with potential duplicates]
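The QoS-dependent ordering described above can be sketched as follows. This is an illustration under stated assumptions, not the GetIOTMqtt implementation: `reconnect_plan`, `renew_session`, and the two callbacks are hypothetical names, and a real client would pass functions that open and close actual MQTT sessions.

```python
# Sketch only: picks the order of connect/close when renewing a session.
from typing import Callable, List


def reconnect_plan(qos: int) -> List[str]:
    """Return the step order for renewing a session at the given QoS level."""
    if qos == 0:
        # At most once: a gap (possible message loss) is acceptable.
        return ["close_old", "connect_new"]
    if qos == 1:
        # At least once: an overlap (possible duplicates) is acceptable.
        return ["connect_new", "close_old"]
    raise ValueError("only QoS 0 and 1 are handled; the slide states QoS 2 is unsupported")


def renew_session(qos: int, connect_new: Callable[[], None],
                  close_old: Callable[[], None]) -> None:
    """Run the two callbacks in the order that matches the QoS level."""
    steps = {"connect_new": connect_new, "close_old": close_old}
    for name in reconnect_plan(qos):
        steps[name]()


# Record the call order with stand-in callbacks.
log: List[str] = []
renew_session(1, lambda: log.append("connect"), lambda: log.append("close"))
print(log)
```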
Apache NiFi & AWS IoT
GetIOTMqtt – Configuration
Apache NiFi & AWS IoT
GetIOTMqtt – Live demo
Apache NiFi & AWS IoT
PutIOTMqtt – instruct a “thing” (but bypass the shadow)
[Diagram: PutIOTMqtt and the Thing Shadow in AWS IoT, with the topics update / delta and the label "update state". Steps: 1. establish connection, 2. publish state. Two flowfile icons]
Apache NiFi & AWS IoT
PutIOTMqtt – Configuration
Apache NiFi & AWS IoT
PutIOTMqtt – Live demo
Apache NiFi & AWS IoT
GetIOTShadow – constantly check last reported state
[Diagram: the thing reports its state to the Thing Shadow in AWS IoT via the update topic; GetIOTShadow requests the shadow. Two flowfile icons]
Apache NiFi & AWS IoT
GetIOTShadow – Configuration
Apache NiFi & AWS IoT
PutIOTShadow – instruct a “thing” over its shadow
[Diagram: PutIOTShadow updates the Thing Shadow in AWS IoT with a desired state; topics update / delta. Two flowfile icons]
Apache NiFi & AWS IoT
PutIOTShadow – Configuration
More to come
MiNiFi (lightweight agents as data collectors)
Variable registry
Improvement on HA / Cluster management
Multi tenancy
More Processors
Extension registry (choose NARs from a central repository)
www.immobilienscout24.de
Thanks for your attention. Any questions?
Contact:
Immobilien Scout GmbH
Andreasstraße 10
10243 Berlin
Kay Lerch
Phone +49 30 24 301-1149