Flink and NiFi, Two Stars in the Apache Big Data Constellation
Matthew Ring, Chicago Apache Flink Meetup, Jan. 19, 2016
About me:
● Matthew Ring is currently a Senior Software Engineer at HP Enterprise.
● Matt has been a professional Java developer in multiple industries, including finance, healthcare and education, since 1999.
● Prior to that, he was an electrical engineer in defense communications.
● He is currently working on a new Investigative Analytics product for HP Enterprise.
● He has presented talks at JavaOne and Bank of America's developer conferences.
● His GitHub is https://github.com/mring33621
What is NiFi?
Origin:
NSA -> Onyara -> Apache NiFi -> Hortonworks DataFlow
Summary:
Visual Dataflow Programming for Big Data/Fast Data Ingestion!
(Or, yet another package where you drop stuff on the screen and connect it with arrows)
What is NiFi?
IMHO, good for:
● Ingestion
● Format Conversion
● Light (simple) Processing
● Delivery to other systems
Together?
● Similar, but different...
● Friends in common:
○ Sockets
○ Kafka
○ HDFS
○ Flume
○ RabbitMQ
○ NATS Messaging
○ Elasticsearch
○ Solr
● There is also the option of direct NiFi <-> Flink connections!
Together?
● NiFi is visual
● NiFi keeps a paper trail RE: the data running through it
● Supports monitoring/metrics reporting
○ Ambari
○ Ganglia
○ Riemann
● Oh, and you can modify flows while they are LIVE!
● NiFi has more friends to bring to the party:
○ JSON/Avro/Parquet/Kite
○ HTTP/S, UDP, S/FTP
○ Text matching/parsing with regex
○ Tagging (metadata)
○ Scripting
○ AWS S3, SQS, SNS, Azure events
○ Tailing/Syslog
○ HL7
○ MongoDB
○ HBase
○ SQL
○ JMS
○ Images
○ ...AND MORE!
Paper Trail!
NiFi records:
● Content
● Metadata
● Provenance (touches)
Sooooo what?
● Allows replay of individual items!
● Queryable through UI or REST interface
● Assists in post hoc data forensics (compliance? legal discovery?)
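To make the paper-trail idea concrete, here is a toy Python sketch of a store that keeps content, metadata and provenance "touches" per item, then answers a lineage query and a replay request. This is not NiFi's API or storage model; `PaperTrail`, `ProvenanceEvent`, the event-type labels and the sample order are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProvenanceEvent:
    """One 'touch' of an item by a processing step (hypothetical shape)."""
    item_id: str
    step: str
    event_type: str  # labels such as "RECEIVE" / "MODIFY" are chosen for this sketch


@dataclass
class PaperTrail:
    """Toy store keeping content, metadata and provenance per item."""
    content: Dict[str, bytes] = field(default_factory=dict)
    metadata: Dict[str, Dict[str, str]] = field(default_factory=dict)
    events: List[ProvenanceEvent] = field(default_factory=list)

    def record(self, item_id, step, event_type, content, attrs):
        # Keep the first version of the content so the item can be replayed later.
        self.content.setdefault(item_id, content)
        # Merge new attributes into the item's metadata.
        self.metadata.setdefault(item_id, {}).update(attrs)
        self.events.append(ProvenanceEvent(item_id, step, event_type))

    def lineage(self, item_id):
        # Every step that touched the item, in recording order.
        return [e.step for e in self.events if e.item_id == item_id]

    def replay(self, item_id):
        # Hand back the originally stored content plus a copy of the metadata.
        return self.content[item_id], dict(self.metadata[item_id])


trail = PaperTrail()
trail.record("order-1", "ingest", "RECEIVE", b"BUY 100 XYZ", {"source": "socket"})
trail.record("order-1", "convert", "MODIFY", b'{"side": "BUY"}', {"format": "json"})
steps = trail.lineage("order-1")
original, attrs = trail.replay("order-1")
```

In this sketch, replay returns the content as first received, which is the property that makes after-the-fact forensics possible: the item can be pushed through a corrected flow without asking the source system to resend it.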
Downsides?
● Weak deployment paradigm
○ Can import/export flow templates
○ But various processor config values will need to be updated by hand when moving from env to env
● Weak clustering story
○ non-elastic
○ SPOF master node
● Weak querying capability from UI
● Most processors are micro-batching (event-time stream processing is still experimental)
● Sometimes tedious -- have to think in terms of several little, built-in pieces to get a simple job done
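On the deployment point: the by-hand edits when moving a flow from env to env could, in principle, be scripted outside NiFi. The sketch below is only an assumption about how one might do that with Python's standard `string.Template`; the property names, placeholder names and host values are hypothetical and do not reflect NiFi's template format.

```python
import string

# Hypothetical fragment of exported flow settings, with placeholders
# standing in for the values that differ per environment.
TEMPLATE = "kafka.brokers=${kafka_brokers}\nhdfs.dir=${hdfs_dir}\n"

# Made-up per-environment values.
ENVIRONMENTS = {
    "dev": {"kafka_brokers": "localhost:9092", "hdfs_dir": "/tmp/ticks"},
    "prod": {"kafka_brokers": "kafka1:9092,kafka2:9092", "hdfs_dir": "/data/ticks"},
}


def render(template, env):
    """Fill the placeholders for one environment.

    substitute() raises KeyError if a placeholder has no value,
    so a missing setting fails loudly instead of shipping blank.
    """
    return string.Template(template).substitute(ENVIRONMENTS[env])


dev_config = render(TEMPLATE, "dev")
prod_config = render(TEMPLATE, "prod")
```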
Demo Notes
Custom Java code provides:
● synthetic intraday ticks
● trader state management
● glue logic
● websocket backend for dashboard UI
Custom HTML/JS code provides:
● live dashboard UI
● smoothie.js charts
● knockout.js binding/templating
NiFi:
● observes orders
○ can deny orders based on ‘compliance rules’
● observes executions
○ routes ‘suspicious’ executions to file system for future scrutiny
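In the demo this logic lives in NiFi processors, and the slides do not spell out the rules. The Python sketch below only illustrates the shape of such checks: a deny/allow decision on orders and a two-way route for executions. The rules, thresholds and field names are invented for this example.

```python
def check_order(order, max_qty=1000, restricted=frozenset({"BANNED"})):
    """Return (allowed, reason) for an order under two made-up rules."""
    if order["symbol"] in restricted:
        return False, "restricted symbol"
    if order["qty"] > max_qty:
        return False, "quantity above limit"
    return True, "ok"


def route_execution(execution, reference_price, tolerance=0.02):
    """Label an execution 'suspicious' if its price strays too far
    from a reference price (hypothetical rule), else 'normal'."""
    deviation = abs(execution["price"] - reference_price) / reference_price
    return "suspicious" if deviation > tolerance else "normal"


small_order = check_order({"symbol": "XYZ", "qty": 100})
big_order = check_order({"symbol": "XYZ", "qty": 5000})
far_fill = route_execution({"price": 110.0}, reference_price=100.0)
near_fill = route_execution({"price": 100.5}, reference_price=100.0)
```

Here the "suspicious" label stands in for the route that would write the execution to the file system for later scrutiny.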
Flink Streaming provides:
● trade recommendation engine
● execution engine