AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)

www.immobilienscout24.de Visual Dataflows with Apache NiFi - and how they interact with AWS AWS Berlin UserGroup – 2016-04-19 – Speaker: @KayLerch

Upload: kay-lerch

Post on 23-Jan-2018


TRANSCRIPT

Page 1: AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)

www.immobilienscout24.de

Visual Dataflows with Apache NiFi

- and how they interact with AWS

AWS Berlin UserGroup – 2016-04-19 – Speaker: @KayLerch

Page 2: AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)

Agenda

1. Why Apache NiFi?

2. That’s Apache NiFi - Exploring the UI

3. AWS Integration Capabilities

4. AWS IoT – Basics (Recap)

5. Apache NiFi and AWS IoT


Apache NiFi & AWS | Kay Lerch

Page 3: AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)

Why Apache NiFi

A brief overview of data processing and analysis


Page 4: AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)

A brief overview of data processing and analysis

Stone age: no tooling at all

[Diagram labels: Data Producers, Data Consumers, IoAT. Annotations: potential bottleneck, integration challenges, unreliable delivery, left alone with analytic challenges.]

Page 5: AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)

A brief overview of data processing and analysis

Bronze age: invent the wheel (event broker) for reliable (message) transportation

[Diagram labels: Data Producers, EventBroker, Data Consumers. Annotations: limited durability, hidden complexities, left alone with analytic challenges.]

Page 6: AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)

A brief overview of data processing and analysis

Industrialization: stores for massive production of durable yet unstructured information

[Diagram labels: Data Producers, EventBroker, (Big) data stores, Data Consumers. Annotations: hidden complexities, ingestion challenges, realtime lag, data processing challenges, data analysis challenges, data security challenges, left alone with analytic challenges.]

Page 7: AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)

A brief overview of data processing and analysis

Digital age: realtime processing and analysis of (streaming) data

[Diagram labels: Data Producers, EventBroker, (Big) data stores, (Realtime) Data Processing & Analytics, Data Consumers. Annotations: hidden complexities, ingestion challenges, integration challenges, data security challenges, data analysis challenges.]

Page 8: AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)

A brief overview of data processing and analysis

That’s quite a lot of tooling and technology …

[Diagram labels: Data Producers, EventBroker, (Big) data stores, (Realtime) Data Processing & Analytics, Data Consumers. Annotations: hidden complexities, ingestion challenges, integration challenges, data security challenges, data analysis challenges.]

Page 9: AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)

A brief overview of data processing and analysis

That’s quite a lot of tooling and technology …

[Diagram labels: Data Producers, EventBroker, (Big) data stores, (Realtime) Data Processing & Analytics, Data Consumers. Annotations: hidden complexities, ingestion challenges, integration challenges, data security challenges, data analysis challenges.]

If you want to …

get a (realtime) big picture of your dataflows

trace the lineage of each data element

have the flexibility to change things on the fly

prioritize data

overcome the challenges of integrating a variety of technologies with one overarching solution

enforce security and compliance along dataflows

rely on extensibility driven by the open source community

satisfy all those needs and keep your tools

get rid of only those tools focused on moving data, without making concessions to overall performance

… then you might love:

Page 10: AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)

That’s Apache NiFi


Page 11: AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)

That’s Apache NiFi

in one page

Apache NiFi supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic.

Web-based user interface

Seamless experience between design, control, feedback, and monitoring

Highly configurable

Loss tolerant vs guaranteed delivery, Low latency vs high throughput, Dynamic prioritization, Flow can be modified at runtime, Back pressure

Data Provenance

Track dataflow from beginning to end

Designed for extension

Build your own processors and more, Enables rapid development and effective testing

Secure

SSL, SSH, HTTPS, encrypted content, etc.; pluggable role-based authentication/authorization

Source: https://nifi.apache.org/

Page 12: AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)

That’s Apache NiFi

look and feel

Go to NiFi’s interface and understand:

Processors

Templates

Concept of back pressure

Concept of data prioritization

Provenance graph
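Back pressure and data prioritization can be illustrated outside the UI with a toy model. The sketch below is not NiFi's API: the `Connection` class, its threshold parameter and the numeric priorities are hypothetical names chosen for illustration. It only shows the idea that a queue between two processors refuses new items once a threshold is reached, and hands out the most urgent item first.

```python
import heapq
from typing import Any, List, Tuple


class Connection:
    """Toy queue between two processors (hypothetical, not NiFi's API)."""

    def __init__(self, backpressure_threshold: int) -> None:
        self.backpressure_threshold = backpressure_threshold
        self._heap: List[Tuple[int, int, Any]] = []
        self._counter = 0  # tie-breaker: keeps FIFO order within one priority

    def is_full(self) -> bool:
        """True once the number of queued items reaches the threshold."""
        return len(self._heap) >= self.backpressure_threshold

    def offer(self, flowfile: Any, priority: int) -> bool:
        """Enqueue unless back pressure applies; lower number = more urgent."""
        if self.is_full():
            return False  # the upstream side should pause for now
        heapq.heappush(self._heap, (priority, self._counter, flowfile))
        self._counter += 1
        return True

    def poll(self) -> Any:
        """Hand the most urgent queued item to the downstream side."""
        return heapq.heappop(self._heap)[2]


conn = Connection(backpressure_threshold=2)
conn.offer("routine reading", priority=5)
conn.offer("alarm", priority=1)
accepted = conn.offer("late arrival", priority=0)  # rejected: queue is full
first_out = conn.poll()                            # the alarm leaves first
```

In this model the upstream side learns about back pressure from the return value of `offer`; a real scheduler would simply stop running the upstream processor until the queue drains.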


Page 13: AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)

NiFi Clustered Architecture

[Diagram: a NiFi cluster made up of a NiFi Cluster Manager (NCM), a primary node and a slave node, each in its own JVM, plus an embedded Apache ZooKeeper. Component labels: web server, REST API, admin UI, flow controller, cluster manager, processor 1, processor 2, isolated processor, controller services 1 to n, FlowFile repository, content repository, provenance repository. Arrow labels: heartbeat, report change, leader election, sync state.]

Page 14: AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)

AWS Integration Capabilities


Page 15: AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)

Demos to be covered

AWS Credential Provider Service

Integrating SQS

Integrating S3

Integrating Lambda

Integrating Kinesis Firehose

Integrating SNS

Integrating IoT


Page 16: AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)

AWS IoT

Basics (Recap)


Page 17: AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)

AWS IoT

The Shadow

[Diagram: a thing, its thing shadow and a rule inside AWS IoT, next to other AWS services. The thing reports its state and the thing shadow mirrors it. Through the shadow, the reported state can be read or a desired state can be set; the shadow propagates the desired state, and the thing receives and fulfills it. A rule subscribes to particular messages and routes each message to some AWS resource. Connection labels: TLS 1.2, Policy.]
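The relationship between reported and desired state can be sketched in a few lines. This is a simplified illustration, not the service's implementation: the `shadow_delta` function and the sample attributes (`color`, `power`) are made up for this example. It assumes the shadow document keeps `desired` and `reported` sections under `state`, as described in the AWS IoT documentation, and computes what a thing would still have to change.

```python
from typing import Any, Dict


def shadow_delta(desired: Dict[str, Any], reported: Dict[str, Any]) -> Dict[str, Any]:
    """Return the desired attributes that are missing from, or differ in, reported."""
    delta: Dict[str, Any] = {}
    for key, value in desired.items():
        if key not in reported:
            delta[key] = value
        elif isinstance(value, dict) and isinstance(reported[key], dict):
            nested = shadow_delta(value, reported[key])
            if nested:  # only keep sub-sections that actually differ
                delta[key] = nested
        elif reported[key] != value:
            delta[key] = value
    return delta


# Hypothetical shadow document for a lamp-like thing.
shadow = {
    "state": {
        "desired": {"color": "red", "power": "on"},
        "reported": {"color": "green", "power": "on"},
    }
}

delta = shadow_delta(shadow["state"]["desired"], shadow["state"]["reported"])
```

Here only `color` ends up in the delta, because `power` already matches the desired value.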

Page 18: AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)

AWS IoT

MQTT topics

Thing topics name pattern: $aws/things/thing_name/...

Thing shadow topics:

get: request state

get/accepted: get shadow state

get/rejected: get error

update: update state

update/accepted: confirmation

update/rejected: get error

update/delta: changed state
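A small helper can spell out the full topic names for one thing. The slide abbreviates the pattern; the `/shadow/` path segment used below comes from the AWS IoT documentation rather than from the slide, and the function name `shadow_topics` and the thing name `lamp` are made up for this sketch.

```python
from typing import Dict, List, Tuple

# Shadow operations and the response suffixes listed on the slide.
SHADOW_OPERATIONS: Dict[str, Tuple[str, ...]] = {
    "get": ("accepted", "rejected"),
    "update": ("accepted", "rejected", "delta"),
}


def shadow_topics(thing_name: str) -> List[str]:
    """List the request and response topics of one thing's shadow."""
    # Assumption: reserved shadow topics live under $aws/things/<name>/shadow.
    base = f"$aws/things/{thing_name}/shadow"
    topics: List[str] = []
    for operation, responses in SHADOW_OPERATIONS.items():
        topics.append(f"{base}/{operation}")
        topics.extend(f"{base}/{operation}/{suffix}" for suffix in responses)
    return topics


topics = shadow_topics("lamp")
```

A client would publish to the bare `get` or `update` topic and subscribe to the matching `accepted`, `rejected` and `delta` topics to see the outcome.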

Page 19: AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)

Apache NiFi & AWS IoT

New processors


Page 20: AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)


Apache NiFi & AWS IoT

Where NiFi comes in

If the managed services you want to integrate with your “things” run on AWS, you are good to go => Thing rules

If not, you need either an MQTT client (=> live data) or an application which communicates with the managed AWS API (for shadow data)

AWS announced MQTT over WebSockets in January 2016

Which means you’re no longer limited to MQTT connections authenticated with TLS client certificates

Establish durable connection to AWS IoT endpoint

Then talk MQTT over websockets in order to subscribe or publish to the thing topics

AWS service limit on connection duration: 300 seconds

You need a way to reconnect your client to hold your MQTT subscriptions

NiFi processors have the potential to become MQTT clients

Page 21: AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)

Apache NiFi & AWS IoT

GetIOTMqtt – an MQTT client

[Diagram: GetIOTMqtt (1) establishes a connection to AWS IoT, (2) subscribes to the thing shadow's update topic and (3) receives the state whenever the thing updates it, emitting it as a flow file.]

Page 22: AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)

Apache NiFi & AWS IoT

GetIOTMqtt – Reconnect accordingly

First of all: I don’t want to wait for the auto-termination. I want to act upfront

AWS IoT does not support persistent client sessions

Therefore:

If I disconnect and then reconnect, there is a short gap in which I may miss a message

If I reconnect and then disconnect, there is a short overlap in which I may receive messages twice

Fortunately, one of these effects is already accepted by the client anyway, depending on the quality of service level

If a subscription is desired with QoS=0 (“at most once” message delivery)

=> disconnect, then reconnect

=> maybe message loss

=> that’s fine

If a subscription is desired with QoS=1 (“at least once” message delivery)

=> reconnect, then disconnect

=> maybe duplicate message

=> that’s fine

QoS=2 (“exactly once” message delivery) is not supported by AWS IoT

[Diagram: two session timelines. Close, then connect (Session 1, Session 2): potential message loss in the gap between sessions. Connect, then close (Session 1, Session 2, Session 3): potential duplicates while two sessions overlap.]
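The reconnect rule above comes down to the order of two calls. The sketch below uses a hypothetical `FakeMqttSession` stand-in that only records what happens; it is not the processor's actual code and not a real MQTT client. `rotate_session` picks the order from the subscription's QoS level as described on this page.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeMqttSession:
    """Hypothetical stand-in for an MQTT-over-WebSockets client session."""
    name: str
    log: List[str] = field(default_factory=list)

    def connect(self) -> None:
        self.log.append(f"connect {self.name}")

    def close(self) -> None:
        self.log.append(f"close {self.name}")


def rotate_session(old: FakeMqttSession, new: FakeMqttSession, qos: int) -> FakeMqttSession:
    """Replace an expiring session before the service terminates it."""
    if qos == 0:
        # At most once: a gap (possible message loss) is acceptable.
        old.close()
        new.connect()
    elif qos == 1:
        # At least once: an overlap (possible duplicates) is acceptable.
        new.connect()
        old.close()
    else:
        raise ValueError("only QoS 0 and QoS 1 are handled in this sketch")
    return new


events: List[str] = []
session_1 = FakeMqttSession("session-1", events)
session_2 = FakeMqttSession("session-2", events)
session_3 = FakeMqttSession("session-3", events)

rotate_session(session_1, session_2, qos=0)  # break before make
rotate_session(session_2, session_3, qos=1)  # make before break
```

With QoS 1 the two sessions briefly coexist, so a consumer downstream has to tolerate or filter duplicate messages.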

Page 23: AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)

Apache NiFi & AWS IoT

GetIOTMqtt – Configuration

Page 24: AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)

Apache NiFi & AWS IoT

GetIOTMqtt – Live demo

Page 25: AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)

Apache NiFi & AWS IoT

GetIOTMqtt – Live demo

Page 26: AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)

Apache NiFi & AWS IoT

PutIOTMqtt – instruct a “thing” (but bypass the shadow)

[Diagram: PutIOTMqtt takes a flow file, (1) establishes a connection to AWS IoT and (2) publishes the state to the thing shadow's update / delta topic, from which the thing updates its state; a flow file is passed on.]

Page 27: AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)

Apache NiFi & AWS IoT

PutIOTMqtt – Configuration

Page 28: AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)

Apache NiFi & AWS IoT

PutIOTMqtt – Live demo

Page 29: AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)

Apache NiFi & AWS IoT

GetIOTShadow – constantly check last reported state

[Diagram: the thing reports its state to the thing shadow (update). GetIOTShadow requests the shadow; flow files are shown on both sides of the processor.]

Page 30: AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)

Apache NiFi & AWS IoT

GetIOTShadow – Configuration

Page 31: AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)

Apache NiFi & AWS IoT

PutIOTShadow – instruct a “thing” over its shadow

[Diagram: PutIOTShadow takes a flow file and updates the thing shadow with a desired state; the thing learns about it via update / delta. Flow files are shown on both sides of the processor.]
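Instructing a thing over its shadow means sending a document that sets only the desired part of the state. A minimal sketch of such a request body follows, assuming the `state` / `desired` layout from the AWS IoT documentation; the function name `desired_state_payload` and the `power` attribute are made up for this example, and the processor's real input format is not shown in this transcript.

```python
import json
from typing import Any, Dict


def desired_state_payload(desired: Dict[str, Any]) -> str:
    """Serialize a shadow update that sets only the desired state."""
    document = {"state": {"desired": desired}}
    # sort_keys keeps the output stable, which helps when comparing payloads.
    return json.dumps(document, sort_keys=True)


payload = desired_state_payload({"power": "on"})
```

The reported section is deliberately left out: it belongs to the thing, which reports it once the desired state has been fulfilled.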

Page 32: AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)

Apache NiFi & AWS IoT

PutIOTShadow – Configuration

Page 33: AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)

More to come

MiNiFi (lightweight agents as data collectors)

Variable registry

Improvements to HA / cluster management

Multi-tenancy

More processors

Extension registry (choose NARs from a central repository)

Page 34: AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)

www.immobilienscout24.de

Thanks for your attention. Any questions?

Contact:

Immobilien Scout GmbH

Andreasstraße 10

10243 Berlin

Kay Lerch

Fon +49 30 24 301-1149

[email protected]