Flink and NiFi, Two Stars in the Apache Big Data Constellation

Matthew Ring, Chicago Apache Flink Meetup, Jan. 19, 2016



About me:

● Matthew Ring is currently a Senior Software Engineer at HP Enterprise.
● Matt has been a professional Java developer in multiple industries, including finance, healthcare, and education, since 1999.
● Prior to that, he was an electrical engineer in defense communications.
● He is currently working on a new Investigative Analytics product for HP Enterprise.
● He has presented talks at JavaOne and Bank of America's developer conferences.
● His GitHub is https://github.com/mring33621

What is NiFi?

Origin:

NSA -> Onyara -> Apache NiFi -> Hortonworks DataFlow

Summary:

Visual Dataflow Programming for Big Data/Fast Data Ingestion!

(Or, yet another package where you drop stuff on the screen and connect it with arrows)

What is NiFi?

IMHO, good for:

● Ingestion
● Format Conversion
● Light (simple) Processing
● Delivery to other systems

Screenshot?

from: https://www.silvercloudcomputing.com/nifi.html

What is Flink?

I’m pretty sure you’ve already heard about it...

Together?

● Similar, but different...
● Friends in common:
  ○ Sockets
  ○ Kafka
  ○ HDFS
  ○ Flume
  ○ RabbitMQ
  ○ NATS Messaging
  ○ Elasticsearch
  ○ Solr
● There is also the option of direct NiFi <-> Flink connections!

Together?

● NiFi is visual
● NiFi keeps a paper trail RE: the data running through it
● Supports monitoring/metrics reporting
  ○ Ambari
  ○ Ganglia
  ○ Riemann
● Oh, and you can modify flows while they are LIVE!
● NiFi has more friends to bring to the party:
  ○ JSON/Avro/Parquet/Kite
  ○ HTTP/S, UDP, S/FTP
  ○ Text matching/parsing with regex
  ○ Tagging (metadata)
  ○ Scripting
  ○ AWS S3, SQS, SNS, Azure events
  ○ Tailing/Syslog
  ○ HL7
  ○ MongoDB
  ○ HBase
  ○ SQL
  ○ JMS
  ○ Images
  ○ ...AND MORE!

Paper Trail!

NiFi records:

● Content
● Metadata
● Provenance (touches)

Sooooo what?

● Allows replay of individual items!
● Queryable through UI or REST interface
● Assists in post hoc data forensics (compliance? legal discovery?)
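To make the paper-trail idea concrete, here is a minimal sketch of an append-only log that records each "touch" and lets you look up an item's history or recover its content at an earlier step for replay. Everything in it is an assumption made for illustration: `ProvenanceSketch`, `Event`, `record`, `historyOf`, and `contentAt` are hypothetical names, not NiFi classes or its REST interface, and NiFi's real repositories are far more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: none of these types are NiFi classes.
final class ProvenanceSketch {

    // One "touch": which step handled which item, and the content at that point.
    static final class Event {
        final String itemId;
        final String step;
        final String content;

        Event(String itemId, String step, String content) {
            this.itemId = itemId;
            this.step = step;
            this.content = content;
        }
    }

    // Append-only list: events are added, never edited or removed.
    private final List<Event> events = new ArrayList<>();

    // Every step that touches an item records an event.
    void record(String itemId, String step, String content) {
        events.add(new Event(itemId, step, content));
    }

    // Query: the ordered history of a single item.
    List<Event> historyOf(String itemId) {
        return events.stream()
                .filter(e -> e.itemId.equals(itemId))
                .collect(Collectors.toList());
    }

    // Replay support: recover the content an item had at a given step,
    // so it could be re-submitted into the flow from there.
    String contentAt(String itemId, String step) {
        return events.stream()
                .filter(e -> e.itemId.equals(itemId) && e.step.equals(step))
                .map(e -> e.content)
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException("no such event"));
    }

    public static void main(String[] args) {
        ProvenanceSketch log = new ProvenanceSketch();
        log.record("order-1", "received", "BUY 100 XYZ");
        log.record("order-1", "converted", "{\"side\":\"BUY\",\"qty\":100}");
        log.record("order-2", "received", "SELL 5 ABC");

        System.out.println(log.historyOf("order-1").size());
        System.out.println(log.contentAt("order-1", "received"));
    }
}
```

The point of the sketch is only that keeping content alongside each touch is what makes per-item replay and after-the-fact forensics possible.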

Downsides?

● Weak deployment paradigm
  ○ Can import/export flow templates
  ○ But various processor config values will need to be updated by hand when moving from env to env
● Weak clustering story
  ○ non-elastic
  ○ SPOF master node
● Weak querying capability from UI
● Most processors are micro-batching (event-time stream processing is still experimental)
● Sometimes tedious -- have to think in terms of several little, built-in pieces to get a simple job done

NOW IS TIME FOR QUIZ!

...err, how ‘bout a demo?

Demo Notes

Custom Java code provides:

● synthetic intraday ticks
● trader state management
● glue logic
● websocket backend for dashboard UI

Custom HTML/JS code provides:

● live dashboard UI
● smoothie.js charts
● knockout.js binding/templating

NiFi:

● observes orders
  ○ can deny orders based on ‘compliance rules’
● observes executions
  ○ routes ‘suspicious’ executions to file system for future scrutiny
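The slides do not say what the demo's compliance rules or its test for "suspicious" were, so the following is only a sketch of the two routing decisions under invented rules. The quantity limit, the price-deviation tolerance, and every type and method name here (`ComplianceRoutingSketch`, `Order`, `Execution`, `routeOrder`, `routeExecution`) are hypothetical; in the demo these decisions were made by NiFi processors, not by code like this.

```java
import java.util.function.Predicate;

// Hypothetical sketch: illustrative types and rules only, not NiFi classes
// and not the rules used in the actual demo.
final class ComplianceRoutingSketch {

    static final class Order {
        final String trader;
        final String symbol;
        final int quantity;

        Order(String trader, String symbol, int quantity) {
            this.trader = trader;
            this.symbol = symbol;
            this.quantity = quantity;
        }
    }

    static final class Execution {
        final String symbol;
        final double price;
        final double referencePrice;

        Execution(String symbol, double price, double referencePrice) {
            this.symbol = symbol;
            this.price = price;
            this.referencePrice = referencePrice;
        }
    }

    enum OrderRoute { ACCEPT, DENY }

    enum ExecutionRoute { PASS_THROUGH, WRITE_TO_FILE }

    // Assumed rule: an order complies if its quantity is at or below a limit.
    static Predicate<Order> withinQuantityLimit(int limit) {
        return order -> order.quantity <= limit;
    }

    // Orders that fail the rule are denied; the rest are accepted.
    static OrderRoute routeOrder(Order order, Predicate<Order> rule) {
        return rule.test(order) ? OrderRoute.ACCEPT : OrderRoute.DENY;
    }

    // Assumed meaning of "suspicious": the execution price deviates from a
    // reference price by more than the given fraction (0.05 means 5%).
    static ExecutionRoute routeExecution(Execution e, double tolerance) {
        double deviation = Math.abs(e.price - e.referencePrice) / e.referencePrice;
        return deviation > tolerance
                ? ExecutionRoute.WRITE_TO_FILE
                : ExecutionRoute.PASS_THROUGH;
    }

    public static void main(String[] args) {
        Predicate<Order> rule = withinQuantityLimit(1000);
        System.out.println(routeOrder(new Order("t1", "XYZ", 500), rule));
        System.out.println(routeOrder(new Order("t2", "XYZ", 5000), rule));
        System.out.println(routeExecution(new Execution("XYZ", 120.0, 100.0), 0.05));
    }
}
```

Expressing each rule as a `Predicate` keeps the routing step independent of any particular rule, which loosely mirrors how a flow separates the "route" step from the condition it evaluates.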

Flink Streaming provides:

● trade recommendation engine
● execution engine

Demo: Screenshot of NiFi Flow

Demo: Screenshot of Live Web Dashboard

Questions?

Thank you!