![Page 1: Synapse - The hive big data platform by Mohan Reddy - Chief Architect, The Hive](https://reader034.vdocuments.us/reader034/viewer/2022052503/544604c1b1af9fca0b8b45e8/html5/thumbnails/1.jpg)
Synapse: The Hive Big Data Platform
Mohan Reddy, Chief Architect, The Hive
![Page 2: Synapse - The hive big data platform by Mohan Reddy - Chief Architect, The Hive](https://reader034.vdocuments.us/reader034/viewer/2022052503/544604c1b1af9fca0b8b45e8/html5/thumbnails/2.jpg)
Vision
The Hive Portfolio: Online, Enterprise, Internet of Things
The Hive Big Data Stack: Applications, Synapse Big Data, Data Infrastructure
Knowledge Action
![Page 3: Synapse - The hive big data platform by Mohan Reddy - Chief Architect, The Hive](https://reader034.vdocuments.us/reader034/viewer/2022052503/544604c1b1af9fca0b8b45e8/html5/thumbnails/3.jpg)
Goals of Synapse
• Accelerate product development & go‐to‐market of The Hive portfolio companies
• Plug in the latest open source innovations in data science & infrastructure
• Engage with & contribute back to relevant open source communities
• Share insights & experiences with The Hive Think Tank
![Page 4: Synapse - The hive big data platform by Mohan Reddy - Chief Architect, The Hive](https://reader034.vdocuments.us/reader034/viewer/2022052503/544604c1b1af9fca0b8b45e8/html5/thumbnails/4.jpg)
Synapse for IoT Applications
The Hive IoT Portfolio: Smart Home, Smart Building, Smart Factory
Business Apps
Data‐driven Control, Deep Learning, Security
Synapse Data Infrastructure
![Page 5: Synapse - The hive big data platform by Mohan Reddy - Chief Architect, The Hive](https://reader034.vdocuments.us/reader034/viewer/2022052503/544604c1b1af9fca0b8b45e8/html5/thumbnails/5.jpg)
Synapse IoT Compute Models
![Page 6: Synapse - The hive big data platform by Mohan Reddy - Chief Architect, The Hive](https://reader034.vdocuments.us/reader034/viewer/2022052503/544604c1b1af9fca0b8b45e8/html5/thumbnails/6.jpg)
Trends Driving Synapse Design
• Fast‐changing open source technologies add complexity to application design
• Realtime stream analytics for operations that can respond to patterns in live data streams
• Rethinking trade‐offs between scale‐up & scale‐out architectures, especially for realtime use cases
• Faster machine learning through smarter partitioning of data & parallelism in model building
• Data management, lineage and curation add significant overhead to product development
![Page 7: Synapse - The hive big data platform by Mohan Reddy - Chief Architect, The Hive](https://reader034.vdocuments.us/reader034/viewer/2022052503/544604c1b1af9fca0b8b45e8/html5/thumbnails/7.jpg)
Synapse Infrastructure Services
Visualization Service APIs
Machine Learning Provisioning & Deployment
Stream Processing Batch Processing
Storage
Data Ingestion & Lineage
![Page 8: Synapse - The hive big data platform by Mohan Reddy - Chief Architect, The Hive](https://reader034.vdocuments.us/reader034/viewer/2022052503/544604c1b1af9fca0b8b45e8/html5/thumbnails/8.jpg)
Synapse Service Abstractions
Visualization: Taswira
Service APIs: Alchemy
Machine Learning: Akili
Provisioning & Deployment: Chombo
Stream Processing: Tempus
Batch Processing: Huduma
Storage: Duka
Data Ingestion & Lineage: Ukoo
Lambda Architecture
![Page 9: Synapse - The hive big data platform by Mohan Reddy - Chief Architect, The Hive](https://reader034.vdocuments.us/reader034/viewer/2022052503/544604c1b1af9fca0b8b45e8/html5/thumbnails/9.jpg)
Extendable Service Implementations by Present/Future Open Source Projects
Visualization Service APIs
Machine Learning Provisioning & Deployment
Stream Processing Batch Processing
Storage
Data Ingestion & Lineage
Morphlines, Kite, Falcon
![Page 10: Synapse - The hive big data platform by Mohan Reddy - Chief Architect, The Hive](https://reader034.vdocuments.us/reader034/viewer/2022052503/544604c1b1af9fca0b8b45e8/html5/thumbnails/10.jpg)
Ukoo ‐ Data Ingestion, Lineage and Management
• A framework to build, reuse, link, manage and run data and job pipelines
• A pipeline is a collection of procedural steps, interactions, inputs and outputs: the steps needed to describe a big data business process
• Datasets come from different sources through industry‐standard and proprietary adapters (Apache Flume, MQTT, iBeacon, etc.)
• Based on Apache Falcon, Kite SDK and Morphlines
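The idea of a pipeline as named steps with inputs, outputs and recorded lineage can be illustrated with a minimal Python sketch. This is not Ukoo's API: the `Step` and `Pipeline` classes, the `lineage` list and the sample sensor rows are all hypothetical, and a real deployment would presumably delegate this work to Apache Falcon, Kite SDK or Morphlines.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Step:
    """One named stage of a pipeline (hypothetical, for illustration only)."""
    name: str
    fn: Callable[[Any], Any]


@dataclass
class Pipeline:
    """An ordered collection of steps that records lineage as it runs."""
    steps: List[Step]
    lineage: List[str] = field(default_factory=list)

    def run(self, data: Any) -> Any:
        self.lineage.clear()
        for step in self.steps:
            data = step.fn(data)
            # Record which step produced the current dataset.
            self.lineage.append(step.name)
        return data


def parse(lines):
    """Split raw "device,value" strings into fields."""
    return [line.split(",") for line in lines]


def drop_invalid(rows):
    """Keep only rows whose value parses as a number."""
    out = []
    for device, value in rows:
        try:
            out.append((device, float(value)))
        except ValueError:
            pass  # skip rows whose value is not numeric
    return out


# Hypothetical sensor readings; one row is deliberately malformed.
raw = ["thermostat,21.5", "thermostat,bad", "door,1"]
pipeline = Pipeline([Step("parse", parse), Step("drop_invalid", drop_invalid)])
result = pipeline.run(raw)
```

After the run, `pipeline.lineage` lists the steps that produced `result`, which is the smallest possible form of the lineage tracking the slide describes.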
![Page 11: Synapse - The hive big data platform by Mohan Reddy - Chief Architect, The Hive](https://reader034.vdocuments.us/reader034/viewer/2022052503/544604c1b1af9fca0b8b45e8/html5/thumbnails/11.jpg)
Tempus ‐ Realtime Processor
• An extensible framework to process realtime data, with an API to compute realtime rankings and aggregations
• Works with Spark Streaming and Storm
• Realtime classification
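To make "realtime ranking and aggregations" concrete, here is a small single-process sketch of a sliding-window counter that ranks keys by frequency. It is an assumption-laden illustration, not the Tempus API: the `WindowedRanker` class and its method names are invented here, and timestamps are assumed to arrive in order. In Spark Streaming or Storm the same idea would be expressed with the engine's own windowing operators.

```python
from collections import Counter, deque
from typing import Deque, List, Tuple


class WindowedRanker:
    """Counts keys over the last `window_seconds` and ranks them (illustrative)."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self.events: Deque[Tuple[float, str]] = deque()
        self.counts: Counter = Counter()

    def add(self, timestamp: float, key: str) -> None:
        """Record one event; timestamps are assumed to arrive in order."""
        self.events.append((timestamp, key))
        self.counts[key] += 1
        self._expire(timestamp)

    def _expire(self, now: float) -> None:
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def top(self, k: int) -> List[Tuple[str, int]]:
        """Return the k most frequent keys in the current window."""
        return self.counts.most_common(k)


# Hypothetical device events as (timestamp, device) pairs.
ranker = WindowedRanker(window_seconds=10)
for ts, device in [(0, "door"), (1, "lamp"), (2, "lamp"), (12, "lamp")]:
    ranker.add(ts, device)
```

Only the event at timestamp 12 is still inside the 10-second window at the end, so the ranking reflects recent activity rather than the full history.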
![Page 12: Synapse - The hive big data platform by Mohan Reddy - Chief Architect, The Hive](https://reader034.vdocuments.us/reader034/viewer/2022052503/544604c1b1af9fca0b8b45e8/html5/thumbnails/12.jpg)
Tempus Speed Layer
• Stream processing
• Continuous computation
• Transactional
• Stores limited window of data
• Complexity isolated in this layer only
• Fault tolerant by autocorrection in the next batch run
• Compensates for batch latency
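The interplay between a speed layer and a batch layer can be sketched in a few lines of Python. This is a toy model of the general Lambda Architecture pattern under stated assumptions, not Tempus code: the `LambdaCounts` class, the `(timestamp, key)` event shape and the simulated double count are all hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Event = Tuple[int, str]  # (timestamp, key); a hypothetical event shape


class LambdaCounts:
    """Toy batch view plus speed view over one event log (illustrative)."""

    def __init__(self) -> None:
        self.log: List[Event] = []          # master dataset, append-only
        self.batch_view: Counter = Counter()
        self.batch_horizon = -1             # latest timestamp covered by batch
        self.speed_view: Counter = Counter()

    def ingest(self, event: Event) -> None:
        """Append to the log and update the speed view immediately."""
        self.log.append(event)
        self.speed_view[event[1]] += 1

    def run_batch(self, up_to: int) -> None:
        """Recompute the batch view from the log, then rebuild the speed view
        so it only covers events newer than the batch horizon."""
        self.batch_view = Counter(k for ts, k in self.log if ts <= up_to)
        self.batch_horizon = up_to
        self.speed_view = Counter(k for ts, k in self.log if ts > up_to)

    def query(self, key: str) -> int:
        """Merge both views: batch result plus recent increments."""
        return self.batch_view[key] + self.speed_view[key]


store = LambdaCounts()
for e in [(1, "lamp"), (2, "lamp"), (3, "door")]:
    store.ingest(e)

store.speed_view["lamp"] += 1    # simulate a double count in the speed layer
before = store.query("lamp")     # still includes the error
store.run_batch(up_to=2)
after = store.query("lamp")      # error discarded by the batch recomputation
```

The speed view answers queries before any batch has run (compensating for batch latency), holds only what the batch has not yet covered (a limited window), and has its mistakes overwritten when the batch recomputes from the log (autocorrection).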
![Page 13: Synapse - The hive big data platform by Mohan Reddy - Chief Architect, The Hive](https://reader034.vdocuments.us/reader034/viewer/2022052503/544604c1b1af9fca0b8b45e8/html5/thumbnails/13.jpg)
Huduma ‐ Batch Processor
• Data adapters and pipelines for different sources
• DSL‐based jobs using Scalding
• Data connectors to the storage layer, supporting HBase, Cassandra and Redis
• Input to machine learning models
![Page 14: Synapse - The hive big data platform by Mohan Reddy - Chief Architect, The Hive](https://reader034.vdocuments.us/reader034/viewer/2022052503/544604c1b1af9fca0b8b45e8/html5/thumbnails/14.jpg)
Akili ‐ Machine Learning as a Service
• Framework and infrastructure to run machine learning models
• Embedded models with code generation in R, JavaScript and Java
• Online classification service
• Large‐scale collaborative‐filtering‐based recommendation engine
• Uses MLlib, GraphLab and 0xdata
• Model selection based on SMAC/Auto‐WEKA/GhostFace
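As a rough illustration of what a collaborative-filtering recommender computes, the sketch below ranks unrated items by their cosine similarity to items a user has already rated. It is a small in-memory example with made-up ratings, not Akili's implementation; the function names are hypothetical, and a large-scale engine built on MLlib or GraphLab would more likely use a distributed factorization method.

```python
from math import sqrt
from typing import Dict, List

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors keyed by user."""
    shared = set(a) & set(b)
    dot = sum(a[u] * b[u] for u in shared)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(ratings: Ratings, user: str, k: int = 1) -> List[str]:
    """Rank items the user has not rated by similarity to items they rated."""
    # Invert the ratings to item -> {user: rating}.
    by_item: Dict[str, Dict[str, float]] = {}
    for u, items in ratings.items():
        for item, r in items.items():
            by_item.setdefault(item, {})[u] = r
    seen = ratings[user]
    # Score each unseen item by similarity to seen items, weighted by rating.
    scores = {
        cand: sum(cosine(by_item[cand], by_item[i]) * r for i, r in seen.items())
        for cand in by_item
        if cand not in seen
    }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))[:k]


# Made-up ratings for hypothetical smart-home products.
ratings = {
    "ann": {"lamp": 5.0, "lock": 4.0},
    "bob": {"lamp": 5.0, "lock": 5.0, "camera": 1.0},
    "cat": {"lamp": 1.0, "camera": 5.0, "sensor": 4.0},
}
suggestions = recommend(ratings, "ann", k=2)
```

Item-based similarity is used here only because it fits in a few lines; the choice of algorithm in a production service is a separate decision that the slides do not specify.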
![Page 15: Synapse - The hive big data platform by Mohan Reddy - Chief Architect, The Hive](https://reader034.vdocuments.us/reader034/viewer/2022052503/544604c1b1af9fca0b8b45e8/html5/thumbnails/15.jpg)
Akili – Schematic Description
![Page 16: Synapse - The hive big data platform by Mohan Reddy - Chief Architect, The Hive](https://reader034.vdocuments.us/reader034/viewer/2022052503/544604c1b1af9fca0b8b45e8/html5/thumbnails/16.jpg)
Alchemy ‐ Service Layer
• Real‐time and batch views of data
• REST interface
• Scalable and highly available
• Generic service that interfaces with data storage and other realtime and batch processes
![Page 17: Synapse - The hive big data platform by Mohan Reddy - Chief Architect, The Hive](https://reader034.vdocuments.us/reader034/viewer/2022052503/544604c1b1af9fca0b8b45e8/html5/thumbnails/17.jpg)
Taswira ‐ Visualization Framework
• Scalable interactive visualization
• Uses D3, Aperture and Gephi
• Works with Tableau
![Page 18: Synapse - The hive big data platform by Mohan Reddy - Chief Architect, The Hive](https://reader034.vdocuments.us/reader034/viewer/2022052503/544604c1b1af9fca0b8b45e8/html5/thumbnails/18.jpg)
Chombo ‐ Deployment Provisioning
• Components are deployed as lightweight, portable, self‐sufficient containers that run virtually anywhere
• Docker‐based containers