Stream Analytics in the Enterprise

Post on 16-Apr-2017

About Us

• Emerging technology firm focused on helping enterprises build breakthrough software solutions
• Building software solutions powered by disruptive enterprise software trends
  - Machine learning and data science
  - Cyber-security
  - Enterprise IoT
  - Powered by Cloud and Mobile
• Bringing innovation from startups and academic institutions to the enterprise
• Award-winning agencies: Inc 500, American Business Awards, International Business Awards

Agenda

• The elements of stream analytic solutions
• Stream analytic platforms: on-premise vs. cloud
• On-premise stream analytic platforms
• Cloud stream analytic services
• Complementary technologies

The elements of enterprise stream analytic solutions

Capabilities of Stream Analytic Solutions

• Real-time data ingestion
• Execute SQL queries on dynamic streams of data
• Time window queries
• Connect query outputs to new data streams
• Leverage reference data in the stream queries
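
As a rough illustration of two of the capabilities above (time window queries and reference data), the sketch below groups events into fixed windows and looks up each device in a small reference table. It is not tied to any product: `DEVICE_SITE`, `tumbling_window_avg` and the sample events are made up for this example, and it processes a finite list, whereas a real engine would emit results incrementally as windows close.

```python
from collections import defaultdict

# Hypothetical reference data: maps a device id to the site it belongs to.
DEVICE_SITE = {"d1": "plant-a", "d2": "plant-a", "d3": "plant-b"}

def tumbling_window_avg(events, window_seconds):
    """Average reading per (window start, site).

    Roughly what a streaming SQL engine computes for a GROUP BY over a
    tumbling time window joined with a reference table.
    `events` is an iterable of (timestamp_seconds, device_id, value).
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for ts, device_id, value in events:
        site = DEVICE_SITE.get(device_id, "unknown")  # reference-data lookup
        window_start = ts - (ts % window_seconds)     # assign event to its window
        acc = sums[(window_start, site)]
        acc[0] += value
        acc[1] += 1
    return {key: total / count for key, (total, count) in sorted(sums.items())}

events = [(0, "d1", 10.0), (3, "d2", 20.0), (7, "d3", 5.0), (12, "d1", 30.0)]
window_averages = tumbling_window_avg(events, window_seconds=10)
```

Each key of `window_averages` pairs a window start time with a site name, so the output could itself be fed into another stream.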

Stream analytic platforms

Cloud vs. On-premise stream analytic platforms

On-premise stream analytic platforms

Benefits
• Extensibility
• Control
• Rich programming model
• Integration with on-premise big data pipeline

Challenges
• Complex infrastructure
• Scalability
• Maintenance and monitoring

Cloud stream analytic services

Benefits
• Simple provisioning
• Elastic scalability
• Integrated with PaaS offerings
• Rich monitoring and management experience

Challenges
• Integration with on-premise systems
• Extensibility
• Lack of customization

On-premise stream analytic platforms

Leading Platforms

• Apache Storm
• Apache Spark
• Apache Samza
• Apache Flink
• Akka

Apache Storm

• Stream processing framework with micro-batching capabilities
• Included in most Hadoop distributions
• Main model (spouts and bolts)
  - One at a time
  - Lower latency
  - Operates on tuple streams
• Trident
  - Micro-batching
  - Higher throughput
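
The difference between the two processing modes above can be sketched in plain Python. This is only an analogy and does not use the Storm API: `sentence_spout`, `split_bolt` and both runner functions are hypothetical stand-ins for a spout, a bolt and the framework's scheduling.

```python
def sentence_spout():
    """Stand-in for a spout: emits one tuple at a time."""
    for sentence in ["storm emits tuples", "bolts process tuples"]:
        yield (sentence,)

def split_bolt(tup):
    """Stand-in for a bolt: turns one sentence tuple into word tuples."""
    (sentence,) = tup
    return [(word,) for word in sentence.split()]

def run_one_at_a_time(spout, bolt):
    """Each tuple goes through the bolt as soon as it is emitted."""
    out = []
    for tup in spout:
        out.extend(bolt(tup))
    return out

def run_micro_batched(spout, bolt, batch_size):
    """Tuples are buffered and handed to the bolt in groups."""
    out, batch = [], []
    for tup in spout:
        batch.append(tup)
        if len(batch) == batch_size:
            out.append([w for t in batch for w in bolt(t)])
            batch = []
    if batch:  # flush the final partial batch
        out.append([w for t in batch for w in bolt(t)])
    return out

words = run_one_at_a_time(sentence_spout(), split_bolt)
batches = run_micro_batched(sentence_spout(), split_bolt, batch_size=2)
```

In the one-at-a-time run each word is available right after its sentence is read, while the batched run produces nothing until a full batch has been collected. That is the latency versus throughput trade-off the slide describes.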

Apache Storm: Benefits vs. Challenges

Benefits
• Broad adoption
• Included in Hadoop distributions
• Vibrant community
• Extensibility
• Support for different programming languages

Challenges
• Increasing competition from newer stacks
• Performance limitations at very large scale

Apache Spark

• Micro-batching processing framework
• Elastic scalability models
• Receivers split data into batches
• Spark Streaming processes batches and produces results
• High throughput at the cost of higher latency
• Functional APIs
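
A minimal sketch of the batching idea above, written with the Python standard library rather than the Spark API. The `discretize` and `process_batch` helpers and the 1.0 second interval are assumptions made for illustration. Records are assumed to arrive in timestamp order, and intervals with no records simply produce no batch here.

```python
from functools import reduce
from itertools import groupby

def discretize(records, batch_interval):
    """Group (timestamp, value) records into consecutive time-based batches.

    Assumes records arrive ordered by timestamp.
    """
    keyed = groupby(records, key=lambda r: r[0] // batch_interval)
    return [[value for _, value in group] for _, group in keyed]

def process_batch(batch):
    """Functional-style pipeline: keep even values, square them, sum."""
    evens = filter(lambda v: v % 2 == 0, batch)
    squares = map(lambda v: v * v, evens)
    return reduce(lambda a, b: a + b, squares, 0)

records = [(0.1, 2), (0.4, 3), (0.9, 4), (1.2, 6), (2.5, 5)]
batch_results = [process_batch(b) for b in discretize(records, batch_interval=1.0)]
```

A result for a batch only exists once its interval has closed, which is where the extra latency of micro-batching comes from.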

Spark Streaming: Benefits vs. Challenges

Benefits
• MPP infrastructure
• Interoperability with other Spark programming models (Java, Python, SQL)
• Integration with messaging frameworks
• Extensibility
• Included in most Hadoop distributions

Challenges
• Time window queries
• Complex infrastructure setup
• Integration with line of business systems

Apache Samza

• Built to address some of the limitations of Apache Storm

• Deep integration with Kafka and YARN

• Simple API comparable to map-reduce

• Leverages YARN for task distribution, fault tolerance and scalability
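
To give a feel for a simple, callback-style API with local state, here is a hedged Python analogue. Samza itself is a JVM framework, so nothing below is its real API: `PageViewCounterTask`, `run` and the message shape are hypothetical names chosen for this sketch.

```python
class PageViewCounterTask:
    """Per-message callback with local state, loosely modeled on a stream task."""

    def __init__(self):
        self.counts = {}  # local state: page -> views seen so far

    def process(self, message, collector):
        page = message["page"]
        self.counts[page] = self.counts.get(page, 0) + 1
        collector.append((page, self.counts[page]))  # emit to the output stream

def run(task, messages):
    """Stand-in for the framework loop that feeds messages to the task."""
    output = []
    for message in messages:
        task.process(message, output)
    return output

emitted = run(
    PageViewCounterTask(),
    [{"page": "/home"}, {"page": "/docs"}, {"page": "/home"}],
)
```

The task author writes only `process`. Distribution, restarts and scaling are left to the surrounding framework, which is the role the slide assigns to the resource manager.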

Apache Samza: Benefits vs. Challenges

Benefits
• Highly scalable, fault-tolerant model
• Stateful stream data processing
• Extensibility
• Simple infrastructure

Challenges
• Small adoption
• Low level API
• Heavy IO operations

Apache Flink Streaming

• Alternative to Spark
• Everything is a stream
• Platform to unify batch and stream processing
• True streaming with adjustable latency and throughput
• Supports different stream sources and transformations
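
The "everything is a stream" idea can be illustrated without any Flink code: if an operator is written against a stream, a batch is just a stream that happens to end. The `running_max` operator below is a made-up example and only shows the shape of the idea.

```python
from itertools import count, islice

def running_max(stream):
    """Emit the maximum seen so far after each element."""
    best = None
    for x in stream:
        best = x if best is None or x > best else best
        yield best

# Batch input: a finite stream, consumed to the end.
bounded = list(running_max([3, 1, 4, 1, 5]))

# Streaming input: an endless source, so only the first 3 results are taken.
unbounded = list(islice(running_max(count(1)), 3))
```

The same operator serves both cases. Only the source, and how much of the output is consumed, differs.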

Apache Flink Streaming: Benefits vs. Challenges

Benefits
• Combine batch and stream data processing
• Expressive APIs
• Data flows and transformation
• Extensibility

Challenges
• Small adoption
• Limited state management
• High availability models

Akka Streams

• Micro-service, actor-oriented model
• Message-driven
• Isolated failures
• Reactive programming model based on sources, sinks and flows
• DSL for stream data manipulation
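
The source, flow and sink vocabulary above maps loosely onto Python generators. The sketch below is an analogy only and is not the Akka Streams DSL. All function names are invented for the example.

```python
def source(items):
    """Source: the start of the stream."""
    yield from items

def flow_parse(upstream):
    """Flow: turns text into integers, dropping input that is not numeric."""
    for raw in upstream:
        if raw.strip().isdigit():
            yield int(raw)

def flow_double(upstream):
    """Flow: doubles every element."""
    for n in upstream:
        yield n * 2

def sink_collect(upstream):
    """Sink: the end of the stream; here it collects into a list."""
    return list(upstream)

doubled = sink_collect(flow_double(flow_parse(source(["1", "x", " 7 ", "20"]))))
```

Because generators are pulled by their consumer, no stage produces an element until the sink asks for it, which gives a rough intuition for demand-driven processing.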

Akka Streams: Benefits vs. Challenges

Benefits
• Rich stream data processing model
• Extensibility
• Concurrency and thread safety
• Leverage mainstream Java and Scala programming models

Challenges
• Small adoption
• Dependent on Akka’s architecture style
• Support for languages outside the JVM

Cloud stream analytic platforms

Leading Platforms

• AWS Kinesis Analytics
• Azure Stream Analytics
• Bluemix Streaming Analytics

AWS Kinesis

• Native stream data services in AWS

• Combines three products in a single platform
  - Kinesis Streams
  - Kinesis Firehose
  - Kinesis Analytics
• Kinesis Streams collects data streams from any application
• Kinesis Firehose provides a model to load streaming data into AWS
• Kinesis Analytics allows the execution of SQL queries over data streams

AWS Kinesis: Benefits vs. Challenges

Benefits
• Elastic scalability model
• Simple provisioning
• Interoperable APIs
• Very complete suite of platforms

Challenges
• AWS Kinesis Analytics hasn’t been released
• Interoperability with on-premise data streams

Azure Stream Analytics

• Native stream analytic service in the Azure platform

• Allows the execution of SQL queries over dynamic streams of data

• Integrates with the other components of the Cortana Analytics suite

• Leverages Azure Event Hubs for high volume data ingestion

• Very rich monitoring and analytic capabilities

Azure Stream Analytics: Benefits vs. Challenges

Benefits
• Elastic scalability model
• Simple provisioning
• Interoperable APIs
• Very complete suite of platforms
• Rich SQL query and analytics model

Challenges
• Interoperability with on-premise data streams
• Extensibility

Bluemix Streaming Analytics

• Native stream analytic service in the IBM Bluemix platform

• Built upon IBM Streams technology

• Allows the execution of SQL queries over dynamic streams of data

• Supports interactive and programmatic query models

• Rich analytic and monitoring capabilities

• Stream visualization graph

Bluemix Streaming Analytics: Benefits vs. Challenges

Benefits
• Elastic scalability model
• Simple provisioning
• Interoperable APIs
• Rich SQL query and analytics model

Challenges
• Adoption
• Interoperability with on-premise data streams
• Extensibility

You can’t buy everything!

Capabilities of Enterprise Stream Analytic Solutions

• Stream tracking
• Replay and simulation
• Stream data testing
• Integration with line of business systems
• Stream data search
• Integration with mainstream analytic tools
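
Replay and stream data testing are typical of the pieces that end up custom-built. A minimal sketch, assuming events were recorded as (timestamp, payload) pairs: the `replay` function, its `speed` parameter and the injectable `sleep` are all hypothetical design choices for this example.

```python
import time

def replay(recorded, handler, speed=1.0, sleep=time.sleep):
    """Feed recorded (timestamp, payload) events to handler, preserving gaps.

    speed=2.0 replays twice as fast; pass a fake `sleep` in tests so that
    no real time passes.
    """
    previous_ts = None
    for ts, payload in recorded:
        if previous_ts is not None:
            sleep((ts - previous_ts) / speed)  # wait out the original gap
        handler(payload)
        previous_ts = ts

recorded = [(100.0, "login"), (100.5, "click"), (102.5, "logout")]
waits, seen = [], []
# Record the requested waits instead of sleeping, so the run is instant.
replay(recorded, seen.append, speed=2.0, sleep=waits.append)
```

Passing a fake `sleep` keeps the test deterministic, while leaving the default in place replays the stream with realistic timing for simulation.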

Complementary technologies

Other Relevant Technologies in Stream Analytic Solutions

• Enterprise messaging platforms
• Time series databases
• Stream data connectors

Enterprise Messaging Platforms

• Persistent messaging
• Pub-sub messaging
• Support for multiple messaging patterns
• Ordered messaging
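
The properties listed above can be pictured with a toy broker that keeps one append-only log per topic and a read position per subscriber. It is an in-memory sketch with invented names (`Broker`, `publish`, `poll`). "Persistent" here only means messages are retained after being read, not that they survive a restart.

```python
from collections import defaultdict

class Broker:
    """Toy in-memory broker: one append-only log per topic."""

    def __init__(self):
        self.logs = defaultdict(list)    # topic -> ordered list of messages
        self.offsets = defaultdict(int)  # (topic, subscriber) -> next index to read

    def publish(self, topic, message):
        self.logs[topic].append(message)  # order of arrival is preserved

    def poll(self, topic, subscriber):
        """Return messages this subscriber has not seen yet, in order."""
        start = self.offsets[(topic, subscriber)]
        messages = self.logs[topic][start:]
        self.offsets[(topic, subscriber)] = len(self.logs[topic])
        return messages

broker = Broker()
broker.publish("orders", "o-1")
broker.publish("orders", "o-2")
first_billing = broker.poll("orders", "billing")    # both messages so far
broker.publish("orders", "o-3")
second_billing = broker.poll("orders", "billing")   # only the new message
all_shipping = broker.poll("orders", "shipping")    # independent subscriber
```

Because each subscriber tracks its own offset, a late subscriber still receives the full ordered history.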

Time Series Databases

• Store time-stamped data
• Time series query functions
• Integrate real time and reference data
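
As a small illustration of time series query functions, the sketch below keeps points sorted by timestamp and answers range and mean queries with a binary search. The `Series` class is a hypothetical stand-in and not the API of any time series database.

```python
from bisect import bisect_left, bisect_right

class Series:
    """Toy time series kept sorted by timestamp."""

    def __init__(self):
        self.times, self.values = [], []

    def append(self, ts, value):
        # Assumes points arrive in timestamp order, as in a simple ingest path.
        self.times.append(ts)
        self.values.append(value)

    def range(self, start, end):
        """Values with start <= ts <= end."""
        lo = bisect_left(self.times, start)
        hi = bisect_right(self.times, end)
        return self.values[lo:hi]

    def mean(self, start, end):
        """Mean of the values in the range, or None if the range is empty."""
        window = self.range(start, end)
        return sum(window) / len(window) if window else None

cpu = Series()
for ts, v in [(1, 10.0), (2, 20.0), (5, 30.0), (9, 40.0)]:
    cpu.append(ts, v)
```

For example, `cpu.mean(2, 5)` averages the two points recorded at timestamps 2 and 5, and a range with no points returns `None` rather than raising.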

Stream data connectors

• Develop stream data sources from line of business systems

• Integrate real time and reference data from enterprise systems into the stream data pipeline

• Combine real time data from multiple line of business systems into single data streams
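
Combining several line of business feeds into a single stream can be sketched with `heapq.merge` from the Python standard library, assuming each feed is already ordered by timestamp. The `crm` and `erp` feeds and their contents are invented for the example.

```python
import heapq

def merge_streams(*streams):
    """Merge timestamp-ordered streams into one timestamp-ordered stream."""
    return heapq.merge(*streams, key=lambda event: event[0])

# Each event is (timestamp, source system, description).
crm = [(1, "crm", "lead created"), (4, "crm", "lead qualified")]
erp = [(2, "erp", "order placed"), (3, "erp", "invoice sent")]
combined = list(merge_streams(crm, erp))
```

`heapq.merge` is lazy, so the same function also works when the inputs are generators reading from live systems rather than lists.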

Summary

• Stream data processing and analytics is a key element of modern enterprise data pipelines

• Some of the leading on-premise stream analytic stacks include: Apache Storm, Apache Samza, Spark Streaming, Flink Streaming, Akka…

• Some of the leading cloud stream analytic services include: AWS Kinesis, Azure Stream Analytics, Bluemix Streaming Analytics…

• You can’t buy everything! Stream analytic solutions require custom implementations

• When building stream analytic solutions, consider complementary technologies such as enterprise messaging stacks or time series databases

Thanks
http://Tellago.com
Info@Tellago.com
