mcubed london - data science at the edge
![Page 1: mcubed london - data science at the edge](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6e50f27f8b9a635a8b5913/html5/thumbnails/1.jpg)
Data science at the Edge
With NiFi, TensorFlow and a proper cluster for good measure
Simon Elliston Ball
@sireb
![Page 2: mcubed london - data science at the edge](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6e50f27f8b9a635a8b5913/html5/thumbnails/2.jpg)
Simon Elliston Ball
• Product Manager
• Data Scientist
• Elephant herder
• @sireb
![Page 3: mcubed london - data science at the edge](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6e50f27f8b9a635a8b5913/html5/thumbnails/3.jpg)
Data gravity
588,000,000 km
• Size
• Distance
![Page 4: mcubed london - data science at the edge](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6e50f27f8b9a635a8b5913/html5/thumbnails/4.jpg)
Other types of data gravity
• Compliance
• Legislation
• Political
• Paranoia
Photo: https://flic.kr/p/JvW7qh
![Page 5: mcubed london - data science at the edge](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6e50f27f8b9a635a8b5913/html5/thumbnails/5.jpg)
Sampling vs Big Data: a quick history
• Before we had cloud, clusters and GPUs…
• MPP
• Super Computers
• Grids
• Cut down data size to fit in memory
![Page 6: mcubed london - data science at the edge](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6e50f27f8b9a635a8b5913/html5/thumbnails/6.jpg)
A quick intro to NiFi
• Guaranteed Delivery
• Prioritized queuing and buffering
• Data provenance
• Bi-directional communication
• Security – Authentication and multi-role authorization
• Visual command and control
• Templating
• Robust API
![Page 7: mcubed london - data science at the edge](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6e50f27f8b9a635a8b5913/html5/thumbnails/7.jpg)
and lots of adapters
![Page 8: mcubed london - data science at the edge](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6e50f27f8b9a635a8b5913/html5/thumbnails/8.jpg)
Demo: sending stuff around
• Pushing camera frames to the cloud
![Page 9: mcubed london - data science at the edge](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6e50f27f8b9a635a8b5913/html5/thumbnails/9.jpg)
Face detection
Key point locations
Lightweight models
Low contextual data
![Page 10: mcubed london - data science at the edge](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6e50f27f8b9a635a8b5913/html5/thumbnails/10.jpg)
Face detection
• Simple Haar cascade in OpenCV: https://github.com/simonellistonball/nifi-OpenCV
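To make the Haar cascade idea concrete, here is a minimal Python sketch. It is not the code from the nifi-OpenCV repository linked above. It assumes the `opencv-python` package, which bundles `haarcascade_frontalface_default.xml`; the function names `detect_faces` and `largest_face` and the parameter values are illustrative choices, not anything from the talk.

```python
def detect_faces(image_path, scale_factor=1.1, min_neighbors=5):
    """Return face boxes as (x, y, w, h) tuples using OpenCV's bundled cascade."""
    # Imported lazily so the pure-Python helper below works without OpenCV.
    import cv2

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)

    # Haar cascades operate on a single-channel (greyscale) image.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    boxes = cascade.detectMultiScale(
        gray, scaleFactor=scale_factor, minNeighbors=min_neighbors
    )
    return [tuple(int(v) for v in box) for box in boxes]


def largest_face(boxes):
    """Pick the box with the largest area, or None if nothing was detected."""
    return max(boxes, key=lambda b: b[2] * b[3], default=None)
```

The cascade itself is a small XML file, which is what makes this kind of detector practical on an edge device; the trade-off is accuracy, which depends heavily on `scaleFactor` and `minNeighbors`.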
![Page 11: mcubed london - data science at the edge](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6e50f27f8b9a635a8b5913/html5/thumbnails/11.jpg)
Dlib Face Detection
• 68 Facial Point Model
• c. 100MB
![Page 12: mcubed london - data science at the edge](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6e50f27f8b9a635a8b5913/html5/thumbnails/12.jpg)
TensorFlow in NiFi
• Our Haar cascade was… Face detection didn’t do a great job
• Neural Networks
• Relatively large models
• Haar cascade: 677KB of XML
• Facenet trained model on LFW: 168 MB (and that’s zipped protobufs)
• TensorFlow: https://github.com/tspannhw/nifi-tensorflow-processor.git
![Page 13: mcubed london - data science at the edge](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6e50f27f8b9a635a8b5913/html5/thumbnails/13.jpg)
Face recognition
• Huge databases of face hashes and feature measures
• Extra information and context around the person
• Computationally expensive and heavy network use
• Apple Face ID demo… too many people had tried the device beforehand, blew the database. One or two faces is easy, millions is another matter
![Page 14: mcubed london - data science at the edge](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6e50f27f8b9a635a8b5913/html5/thumbnails/14.jpg)
Rocket ship to the cloud
https://www.nasa.gov/sites/default/files/thumbnails/image/s83-35620-3k.jpg
![Page 15: mcubed london - data science at the edge](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6e50f27f8b9a635a8b5913/html5/thumbnails/15.jpg)
Cloud: ML all packaged up… for a price
![Page 16: mcubed london - data science at the edge](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6e50f27f8b9a635a8b5913/html5/thumbnails/16.jpg)
TensorFlow on Spark
• Why?
• Doesn’t TensorFlow already have a distributed compute model?
Existing clusters, multi-purpose clusters:
• TensorFrames, TensorFlowOnSpark, CaffeOnSpark, Spark ML, SQL
• When?
• Training, batch scoring
![Page 17: mcubed london - data science at the edge](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6e50f27f8b9a635a8b5913/html5/thumbnails/17.jpg)
Broadening the example
• Where is your context?
• Why do you need context?
• Detection
• Explanation
![Page 18: mcubed london - data science at the edge](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6e50f27f8b9a635a8b5913/html5/thumbnails/18.jpg)
Body worn video
• Record everything
• Record when you remember to press the button
• Record when it matters
What about?
• Live assist
• Evidence and accountability
![Page 19: mcubed london - data science at the edge](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6e50f27f8b9a635a8b5913/html5/thumbnails/19.jpg)
Netflow
Cybersecurity: progressive context
• Record everything: PCAP
• Send up the (maybe) interesting bits
• Fetch detail on demand
[Diagram labels: PCAP at Edge; 1st Pass Model; Security Data Analytics Platform (adds context, more compute-intensive modelling, etc.); “Hmmm… That’s interesting”; “Let me tell you more…”; “small” data flow]
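The three bullets on this slide (record everything, send up the interesting bits, fetch detail on demand) can be sketched in plain Python. This is only an illustration under stated assumptions: the `Flow` fields, the `EdgeNode` class, and the port and byte-count rule used as the "first pass model" are all hypothetical stand-ins, and a dictionary stands in for a real PCAP store.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Flow:
    """Netflow-style summary of one conversation (hypothetical fields)."""
    flow_id: int
    dst_port: int
    byte_count: int


class EdgeNode:
    """Keeps raw packets locally and lets only small summaries leave."""

    def __init__(self, max_flows=1000):
        # Bounded local store standing in for a PCAP ring buffer.
        self.max_flows = max_flows
        self.pcap = {}        # flow_id -> raw packet bytes
        self.order = deque()  # flow ids, oldest first

    def record(self, flow, packets):
        """Record everything at the edge, evicting the oldest flow when full."""
        if len(self.order) >= self.max_flows:
            self.pcap.pop(self.order.popleft(), None)
        self.pcap[flow.flow_id] = packets
        self.order.append(flow.flow_id)

    def first_pass(self, flow):
        """Cheap first-pass check: an illustrative rule, not a real detector."""
        return flow.dst_port not in (80, 443) or flow.byte_count > 1_000_000

    def fetch_detail(self, flow_id):
        """Answer an on-demand request from the central platform."""
        return self.pcap.get(flow_id)


edge = EdgeNode(max_flows=2)
flows = [Flow(1, 443, 5_000), Flow(2, 4444, 800), Flow(3, 80, 2_000_000)]
sent_upstream = []
for flow in flows:
    edge.record(flow, packets=b"raw packets for flow %d" % flow.flow_id)
    if edge.first_pass(flow):
        sent_upstream.append(flow)  # only the small summary leaves the edge

# The central platform asks for detail on one flow it found interesting.
detail = edge.fetch_detail(sent_upstream[0].flow_id)
```

The bounded store makes the cost of "record everything" explicit: detail can only be fetched while it is still in the edge buffer, so the first pass has to flag a flow before it is evicted.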
![Page 20: mcubed london - data science at the edge](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6e50f27f8b9a635a8b5913/html5/thumbnails/20.jpg)
ANPR: or why you can’t hide from parking fines
![Page 21: mcubed london - data science at the edge](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6e50f27f8b9a635a8b5913/html5/thumbnails/21.jpg)
Summary: progressive enhancement of context
• Is it worth processing? 677KB of local model
• Rough-cut and hashing: O(100MB) models
• Expensive deep analysis: cloud scale models and data
Example output attributes:
• name: Simon Elliston Ball
• cognitive.face.emotion: surprise
• cognitive.face.exposure: overExposure
• cognitive.face.noise: high
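One way to picture the three tiers on this slide is as a chain of stages, each adding attributes and deciding whether the next, more expensive stage is worth running. The sketch below is a hedged illustration only: `run_tiers`, the stage functions, and the attribute names `face.count` and `face.hash` are hypothetical, and a SHA-256 of the frame bytes stands in for a real face embedding.

```python
import hashlib


def run_tiers(item, tiers):
    """Run an item through increasingly expensive stages.

    Each tier is (name, enrich, keep_going): `enrich` returns new attributes
    and `keep_going` decides whether the next tier is worth running.
    """
    attributes = {}
    reached = []
    for name, enrich, keep_going in tiers:
        attributes.update(enrich(item, attributes))
        reached.append(name)
        if not keep_going(attributes):
            break
    return attributes, reached


def edge_detect(frame, attrs):
    # Tier 1: small local model answers "is it worth processing?"
    return {"face.count": frame.get("faces", 0)}


def rough_cut(frame, attrs):
    # Tier 2: rough-cut and hash; SHA-256 stands in for a face embedding.
    return {"face.hash": hashlib.sha256(frame["pixels"]).hexdigest()[:12]}


def make_cloud_lookup(known):
    def cloud_lookup(frame, attrs):
        # Tier 3: expensive lookup against a large central database.
        return {"name": known.get(attrs["face.hash"], "unknown")}
    return cloud_lookup


known = {hashlib.sha256(b"simon-frame").hexdigest()[:12]: "Simon Elliston Ball"}
tiers = [
    ("edge", edge_detect, lambda a: a["face.count"] > 0),
    ("rough-cut", rough_cut, lambda a: True),
    ("cloud", make_cloud_lookup(known), lambda a: True),
]

attrs_empty, reached_empty = run_tiers({"pixels": b"blank-wall", "faces": 0}, tiers)
attrs_face, reached_face = run_tiers({"pixels": b"simon-frame", "faces": 1}, tiers)
```

A frame with no faces stops at the edge tier and never costs any network or cloud time, while a frame with a face accumulates attributes as it moves up.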
![Page 22: mcubed london - data science at the edge](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6e50f27f8b9a635a8b5913/html5/thumbnails/22.jpg)
Thank you!
@sireb