

STREAMANALYTIX 2.1.6

CHANNELS

Learn about Channels


Introduction

Welcome to StreamAnalytix! The StreamAnalytix platform enables enterprises to analyze and respond to events in real time at Big Data scale. With its multi-engine architecture, StreamAnalytix provides an abstraction layer that lets you run each data pipeline on the stream processing engine that best fits the application use case, weighing the relative strengths of Storm and Spark Streaming in terms of processing methodology (CEP, ESP) and latency.

About This Guide

This guide describes Channels and their configuration.

More Information

Please visit www.streamanalytix.com. To give us feedback on your experience with the application, or to report bugs or problems, mail us at [email protected]. To receive updated documentation in the future, please register at www.streamanalytix.com. We welcome your feedback.


Terms & Conditions This manual, the accompanying software and other documentation, is protected by U.S. and international copyright laws, and may be used only in accordance with the accompanying license agreement. Features of the software, and of other products and services of Impetus Technologies, may be covered by one or more patents. All rights reserved.

All other company, brand, and product names are registered trademarks or trademarks of their respective holders. Impetus Technologies disclaims any responsibility for specifying which companies or organizations own which marks.

USA, Los Gatos
Impetus Technologies, Inc.
720 University Avenue, Suite 130
Los Gatos, CA 95032, USA
Ph.: 408.252.7111, 408.213.3310
Fax: 408.252.7114

© 2017 Impetus Technologies, Inc. All rights reserved.

If you have any comments or suggestions regarding this document, please send them via e-mail to [email protected]


Table of Contents

Introduction
    About This Guide
Terms & Conditions
CHANNELS
    Kafka
    RabbitMQ
    S3 Receiver
    DFS Receiver
    Socket
    Data Generator
    ActiveMQ
    Replay
    Custom Channel


CHANNELS

In StreamAnalytix, data access is handled by Channels: built-in drag-and-drop operators that consume data from various data sources such as message queues, transactional databases, log files, and sensors producing IoT data. For Spark pipelines, you can use the following channels:

Kafka: Reads messages from Kafka.
RabbitMQ: Reads messages from RabbitMQ.
MapRStreams: Reads messages from MapR Streams.
S3 Receiver: Reads objects from Amazon S3.
DFS Receiver: Reads data from the Hadoop Distributed File System (HDFS).
Socket: Reads data via a TCP socket connection.
Data Generator: Generates random data for testing.
Custom: Reads data from any source.

For Storm pipelines, you can use the following channels:

Kafka: Reads messages from Kafka.
RabbitMQ: Reads messages from RabbitMQ.
MapRStreams: Reads messages from MapR Streams.
ActiveMQ: Reads messages from ActiveMQ.
Replay: Replays failed messages.
Custom: Reads data from any source.

Kafka

Apache Kafka is a publish-subscribe messaging system that can handle hundreds of megabytes of reads and writes per second from thousands of clients. The Kafka channel reads data from an Apache Kafka cluster and is available for both Spark and Storm data pipelines.

Configure Kafka Channel

When you configure a Kafka channel to read data from a Kafka cluster, you provide Kafka and ZooKeeper details and, as needed, configure additional properties such as partitions and replication. You can optionally add custom Kafka properties.

Step 1: Drag the Kafka channel onto the pipeline canvas and right-click it to configure. On the Select Message tab, select a Message Type and provide a Message Name.


Message Type: Single if only one message needs to be parsed in the pipeline; Multi if multiple messages need to be parsed in the pipeline.
Message Name: Click the Message Name input field to list all available messages, then select the message(s).
Schema Identifier: If you have written your own code for schema identification, check this box and provide the fully qualified name of your class.

Step 2: Configure Kafka properties on the Configuration tab. These properties differ for Spark and Storm pipelines. Configuration properties for the Kafka channel for Spark pipelines:

Connection Name: Select a Kafka connection.
Topic Name: Kafka topic from which messages will be read.
ZK ID: An ID used by ZooKeeper to track messages.
Partitions: Number of partitions. Each partition is an ordered, immutable sequence of messages that is continually appended to a commit log.
Replication Factor: Number of replicas. Replication provides stronger durability and higher availability; for example, a topic with replication factor N can tolerate up to N-1 server failures without losing messages committed to the log.
Force from Start: TRUE pulls data from the start of the topic.
Add Configuration: Adds additional custom Kafka properties as key-value pairs.
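StreamAnalytix wires up the consumer for you; purely as a point of reference, a comparable read against the stock Kafka consumer API (recent client versions) might look like the sketch below. The broker address, group id, and topic name are placeholders, and setting auto.offset.reset to earliest loosely corresponds to Force from Start = TRUE:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class KafkaChannelSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");   // placeholder broker list
            props.put("group.id", "sax-channel-demo");        // placeholder consumer group
            props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("auto.offset.reset", "earliest");       // ~ Force from Start = TRUE

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
                while (true) {
                    // Poll the cluster and print each record's offset and value
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                    }
                }
            }
        }
    }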


Configuration properties for the Kafka channel for Storm pipelines:

Connection Name: Select a Kafka connection.
Parallelism: Number of executors (threads) for the channel.
Max Spout Pending: Number of incoming messages the channel will hold at any point.
Topic Name: Kafka topic from which messages will be read.
ZK ID: An ID used by ZooKeeper to track messages.
Broker Zk Root: ZooKeeper directory under which all topic and partition information is stored.
Partitions: Number of partitions on Kafka. Each partition is an ordered, immutable sequence of messages that is continually appended to a commit log.
Replication Factor: Number of replicas. For a topic with replication factor N, Kafka tolerates up to N-1 server failures without losing messages committed to the log.
Force From Start: TRUE pulls data from the start of the topic.
Fetch Size: Number of bytes of messages to fetch from the Kafka cluster in one request.
Buffer Size: Socket buffer size (in bytes).
Add Configuration: Adds custom Kafka properties as key-value pairs.


Enable Kafka Topic Administration

Enabling Kafka topic administration allows you to create a topic in the Kafka cluster directly from the Kafka channel. To enable Kafka Topic Administration, go to Superuser UI > Connections > edit Kafka Connection Properties > check Enable Topic Administration. Note: Kafka Topic Administration is not enabled by default.

RabbitMQ

The RabbitMQ channel reads messages from a RabbitMQ cluster through its exchanges and queues. The message flow in a RabbitMQ system is as follows (a client-level sketch of this flow appears after the list):

1. A producer publishes a message to an exchange.
2. The exchange receives the message and its attributes, such as the routing key, depending on the exchange type.
3. The exchange routes the message into a queue according to the message attributes.
4. The message stays in the queue until it is handled by a consumer.
5. The consumer handles the message.
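The sketch below walks through this flow with the plain RabbitMQ Java client (amqp-client 5.x), not the StreamAnalytix channel itself; the host, exchange, queue, and routing key names are placeholders:

    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;
    import com.rabbitmq.client.GetResponse;

    public class RabbitFlowSketch {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("localhost"); // placeholder host
            try (Connection conn = factory.newConnection();
                 Channel channel = conn.createChannel()) {
                // Declare a durable DIRECT exchange and a durable queue, then bind them
                channel.exchangeDeclare("demo-exchange", "direct", true);
                channel.queueDeclare("demo-queue", true, false, false, null);
                channel.queueBind("demo-queue", "demo-exchange", "demo-key");

                // Steps 1-3: the producer publishes; the exchange routes into the queue
                channel.basicPublish("demo-exchange", "demo-key", null,
                    "hello".getBytes("UTF-8"));

                // Steps 4-5: the consumer takes the message off the queue (auto-ack)
                GetResponse response = channel.basicGet("demo-queue", true);
                if (response != null) {
                    System.out.println(new String(response.getBody(), "UTF-8"));
                }
            }
        }
    }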

The RabbitMQ channel is available for both Spark and Storm data pipelines.

Configure RabbitMQ Channel

When you configure a RabbitMQ channel to read data from a RabbitMQ cluster, you provide exchange details and message attributes. You can optionally add custom RabbitMQ properties.

Step 1: Drag the RabbitMQ channel onto the pipeline canvas and right-click it to configure. On the Select Message tab, select a Message Type and provide a Message Name.

Message Type: Single if only one message needs to be parsed in the pipeline; Multi if multiple messages need to be parsed in the pipeline.
Message Name: Click the Message Name input field to list all available messages, then select the message(s).
Schema Identifier: If you have written your own code for schema identification, check this box and provide the fully qualified name of your class.

Step 2: Configure RabbitMQ properties on the Configuration tab. These properties differ for Spark and Storm pipelines. Configuration properties for the RabbitMQ channel for Spark pipelines:


Connection Name: Select a RabbitMQ connection.
Exchange Name: Name of the RabbitMQ exchange.
Exchange Type: Exchange type: DIRECT, TOPIC, or FANOUT.
Exchange Durable: TRUE for a durable exchange.
Exchange Auto Delete: TRUE to enable auto-delete.
Routing Key: Routing key that binds the exchange to a queue.
Queue Name: Name of the RabbitMQ queue.
Queue Durable: TRUE for a durable queue.
Queue Auto Delete: TRUE to enable auto-delete.
Add Configuration: Adds custom RabbitMQ properties as key-value pairs.

Configuration properties for the RabbitMQ channel for Storm pipelines:


Connection Name: Select a RabbitMQ connection.
Parallelism: Number of executors (threads) for the channel.
Max Spout Pending: Number of messages the channel will hold at any point.
Exchange Name: Name of the RabbitMQ exchange.
Exchange Type: Exchange type: DIRECT, TOPIC, or FANOUT.
Exchange Durable: TRUE for a durable exchange.
Exchange Auto Delete: TRUE to enable auto-delete.
Routing Key: Routing key that binds the exchange to a queue.
Queue Name: Name of the RabbitMQ queue.
Queue Durable: TRUE for a durable queue.
Queue Auto Delete: TRUE to enable auto-delete.
Add Configuration: Adds custom RabbitMQ properties as key-value pairs.

S3 Receiver

The S3 Receiver channel reads objects from an Amazon S3 bucket. Amazon S3 stores data as objects within resources called buckets. The S3 Receiver channel is available for Spark data pipelines.

Configure S3 Receiver Channel


Step 1: Drag the S3 Receiver channel onto the pipeline canvas and right-click it to configure. On the Select Message tab, select a Message Type and provide a Message Name.

Message Type: Single if only one message needs to be parsed in the pipeline; Multi if multiple messages need to be parsed in the pipeline.
Message Name: Click the Message Name input field to list all available messages, then select the message(s).
Schema Identifier: If you have written your own code for schema identification, check this box and provide the fully qualified name of your class.

Step 2: Configure S3 Receiver properties on the Configuration tab.

Connection Name: Select an S3 connection.
Bucket Name: Amazon S3 bucket name.
Path: File or directory path from which data is to be read.
Add Configuration: Adds custom S3 properties as key-value pairs.
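For reference, fetching a single object with the AWS SDK for Java (v1) looks roughly like this sketch; the bucket and key names are placeholders, and credentials are assumed to come from the default provider chain:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;
    import com.amazonaws.services.s3.model.S3Object;

    public class S3ReadSketch {
        public static void main(String[] args) throws Exception {
            // Uses the default credentials and region configuration
            AmazonS3 s3 = AmazonS3ClientBuilder.standard().build();
            try (S3Object object = s3.getObject("my-bucket", "path/data.json"); // placeholders
                 BufferedReader reader = new BufferedReader(
                     new InputStreamReader(object.getObjectContent()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line); // each line is one record from the object
                }
            }
        }
    }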

DFS Receiver

The DFSReceiver channel enables you to read data from HDFS. The DFSReceiver channel is available for Spark data pipelines.

Configure DFSReceiver Channel


Step 1: Drag the DFSReceiver channel onto the pipeline canvas and right-click it to configure. On the Select Message tab, select a Message Type and provide a Message Name.

Message Type: Single if only one message needs to be parsed in the pipeline; Multi if multiple messages need to be parsed in the pipeline.
Message Name: Click the Message Name input field to list all available messages, then select the message(s).
Schema Identifier: If you have written your own code for schema identification, check this box and provide the fully qualified name of your class.

Step 2: Configure DFSReceiver properties on the Configuration tab.

Connection Name: Select an HDFS connection.
HDFS Path: File path on HDFS.
Add Configuration: Adds custom HDFS properties as key-value pairs.
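For reference, a direct HDFS read with the Hadoop FileSystem API looks roughly like the sketch below; the NameNode URI and file path are placeholders:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode:8020"); // placeholder NameNode URI
            try (FileSystem fs = FileSystem.get(conf);
                 BufferedReader reader = new BufferedReader(
                     new InputStreamReader(fs.open(new Path("/data/input.txt"))))) { // placeholder path
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line); // each line is one record from the file
                }
            }
        }
    }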

Socket

The Socket channel allows a pipeline to consume data from a TCP data source. Configure the message type and choose a Socket connection to start streaming data into a pipeline. The Socket channel is available for Spark data pipelines.

Configure Socket Channel

Step 1: Drag the Socket channel onto the pipeline canvas and right-click it to configure. On the Select Message tab, select a Message Type and provide a Message Name.

Message Type: Single if only one message needs to be parsed in the pipeline; Multi if multiple messages need to be parsed in the pipeline.
Message Name: Click the Message Name input field to list all available messages, then select the message(s).
Schema Identifier: If you have written your own code for schema identification, check this box and provide the fully qualified name of your class.

Step 2: Configure Socket properties on the Configuration tab.

Connection Name: Select a Socket connection.
Add Configuration: Adds custom properties as key-value pairs.
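The channel consumes whatever the TCP source emits; as a rough standalone equivalent, reading newline-delimited records from a TCP source with plain java.net.Socket looks like this (host and port are placeholders):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.Socket;

    public class SocketReadSketch {
        public static void main(String[] args) throws Exception {
            // Connect to the TCP data source (placeholder host/port)
            try (Socket socket = new Socket("localhost", 9999);
                 BufferedReader reader = new BufferedReader(
                     new InputStreamReader(socket.getInputStream()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line); // each line is one incoming record
                }
            }
        }
    }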

Data Generator

The Data Generator channel generates random data for testing your pipelines. Once a pipeline is tested with random data, you can replace the Data Generator channel with a channel for an actual data source. The Data Generator channel is available for Spark data pipelines.

Configure Data Generator Channel

Step 1: Drag the Data Generator channel onto the pipeline canvas and right-click it to configure. On the Select Message tab, select a Message Type and provide a Message Name.

Message Type: Single if only one message needs to be parsed in the pipeline; Multi if multiple messages need to be parsed in the pipeline.
Message Name: Click the Message Name input field to list all available messages, then select the message(s).
Schema Identifier: If you have written your own code for schema identification, check this box and provide the fully qualified name of your class.

Step 2: On the Configuration tab, select a method to generate random data.

Method 1: Generate Data. Enter boundary values for the message fields in the UI; random data is generated within those bounds.


Method 2: Upload Data. Upload a text file containing test data in either delimited or JSON format.
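For illustration only (the actual fields depend on your message definition; the name and age fields below are hypothetical), a JSON test file carries one record per line, and a delimited file carries the same values separated by a delimiter.

JSON format, one record per line:

    {"name": "user1", "age": 34}
    {"name": "user2", "age": 27}

Delimited format, same hypothetical fields:

    user1,34
    user2,27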

ActiveMQ

The ActiveMQ channel reads data from ActiveMQ message topics, which maintain the order of messages and dispatch them to consumers. The ActiveMQ channel is available for Storm data pipelines.

Configure ActiveMQ Channel

Step 1: Drag the ActiveMQ channel onto the pipeline canvas and right-click it to configure. On the Select Message tab, select a Message Type and provide a Message Name.

Message Type: Single if only one message needs to be parsed in the pipeline; Multi if multiple messages need to be parsed in the pipeline.
Message Name: Click the Message Name input field to list all available messages, then select the message(s).
Schema Identifier: If you have written your own code for a custom parser, check this box and provide the fully qualified name of your class.

Step 2: Configure ActiveMQ properties on the Configuration tab.


Connection Name: Select an ActiveMQ connection.
Parallelism: Number of executors (threads) for the channel.
Max Spout Pending: Number of messages the channel will hold at any point.
Topic Name: Topic from which the ActiveMQ consumer reads messages.
Routing Key: Key for redirecting messages to a specific topic.
Add Configuration: Adds custom ActiveMQ properties as key-value pairs.
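For reference, a plain JMS topic consumer against ActiveMQ looks roughly like the sketch below; the broker URL and topic name are placeholders:

    import javax.jms.Connection;
    import javax.jms.MessageConsumer;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import javax.jms.Topic;
    import org.apache.activemq.ActiveMQConnectionFactory;

    public class ActiveMqTopicSketch {
        public static void main(String[] args) throws Exception {
            ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://localhost:61616"); // placeholder broker URL
            Connection connection = factory.createConnection();
            connection.start(); // deliveries begin only after start()
            try {
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                Topic topic = session.createTopic("demo-topic"); // placeholder topic
                MessageConsumer consumer = session.createConsumer(topic);
                TextMessage message = (TextMessage) consumer.receive(5000); // wait up to 5s
                if (message != null) {
                    System.out.println(message.getText());
                }
            } finally {
                connection.close();
            }
        }
    }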

Replay

The Replay channel re-processes messages that failed to process during pipeline execution. It ensures that every message is processed in the pipeline, which is critical for enterprise-grade applications that need guaranteed stream processing. The Replay channel is available for Storm data pipelines.

Configure Replay Channel


Parallelism: Number of executors (threads) for the channel.
Max Spout Pending: Number of messages the channel will hold at any point.
Do you want to use Discarded Message Config: Check this box to hold discarded messages for the time entered in the maximum retries limit field; otherwise, discarded messages are not saved.
Max Retries: Maximum number of times a message can be replayed.
X-message-Ttl: Duration (in minutes) after which a message becomes ready for replay.
Add Configuration: Adds custom properties as key-value pairs.

Custom Channel

The Custom channel allows you to read data from any data source. You can write your own custom code to ingest data from any source, build it as a custom channel, and use it in your pipelines or even share it with other workspace users. The Custom channel is available for both Spark and Storm data pipelines.

Prerequisites: Create a Custom Code Jar

To use a custom channel, first create a jar file containing your custom code, then upload the jar file in a pipeline or as a registered component. To write the custom logic for your channel, download the Sample Project.

Import the downloaded Sample Project as a Maven project in Eclipse. Ensure that Apache Maven is installed on your machine and that its PATH is set. Implement your custom code, then build the project to create a jar file containing your code, using the following command: mvn clean install -DskipTests


The custom code implementation differs for Spark and Storm pipelines. For a Custom channel in a Spark pipeline, add your custom logic in the implemented methods of the class com.streamanalytix.framework.api.spark.channel.Channel. For a Custom channel in a Storm pipeline, add your custom logic in the implemented methods of the class com.streamanalytix.framework.api.storm.channel.Channel.

There are nine methods for you to implement (a skeleton sketch follows this list):

1. init: Add initialization calls.
2. openChannel: Called when a task is initialized within a worker on the cluster.
3. closeChannel: Called when the Apache Storm spout for the channel shuts down.
4. activateChannel: Called when the spout is activated.
5. deactivateChannel: Called when the spout is deactivated.
6. nextMessage: Called to emit tuples from the spout to the output collector.
7. acknowledge: Called when a tuple emitted from the spout successfully reaches the output collector.
8. failure: Called when a tuple emitted from the spout fails to reach the output collector.
9. declareChannelOutputFields: Declares the output stream schema.
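The authoritative method signatures come from the Channel interface shipped in the Sample Project; the skeleton below only illustrates the overall shape of such a class, and every signature in it is an assumption rather than the actual API:

    // Hypothetical skeleton only; take the real interface and signatures from the
    // Sample Project's com.streamanalytix.framework.api.storm.channel.Channel.
    public class MyCustomChannel /* implements Channel */ {

        public void init(java.util.Map<String, Object> config) {
            // initialization calls (assumed signature)
        }

        public void openChannel() {
            // called when a task is initialized within a worker on the cluster
        }

        public void closeChannel() {
            // called when the Storm spout for this channel shuts down
        }

        public void activateChannel() {
            // called when the spout is activated
        }

        public void deactivateChannel() {
            // called when the spout is deactivated
        }

        public Object nextMessage() {
            // fetch and return the next record to emit to the output collector
            return null;
        }

        public void acknowledge(Object messageId) {
            // the tuple emitted for messageId reached the output collector
        }

        public void failure(Object messageId) {
            // the tuple emitted for messageId failed; e.g., requeue it for replay
        }

        public void declareChannelOutputFields() {
            // declare the output stream schema
        }
    }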

Configure Custom Channel

Step 1: Drag the Custom channel onto the pipeline canvas and right-click it to configure. On the Select Message tab, select a Message Type and provide a Message Name.


Message Type: Single if only one message needs to be parsed in the pipeline; Multi if multiple messages need to be parsed in the pipeline.
Message Name: Click the Message Name input field to list all available messages, then select the message(s).
Schema Identifier: If you have written your own code for schema identification, check this box and provide the fully qualified name of your class.

Step 2: Configure Custom channel properties on the Configuration tab.

Parallelism: Number of executors (threads) for the channel.
Max Spout Pending: Number of incoming messages the channel will hold at any point.
Channel Plugin: Fully qualified name of the custom code class.
Add Configuration: Adds custom properties as key-value pairs.

Step 3: Upload the custom code jar file from the top-right corner of the pipeline canvas.

To give us your feedback on your experience with the application, or to report bugs or problems, mail us at [email protected].