Confluent Platform on the AWS Cloud

Quick Start Reference Deployment

April 2017

Last update: August 2017 (revisions)

Confluent, Inc.

AWS Quick Start Reference Team

Contents

Overview

Costs and Licenses

Architecture

Prerequisites

Specialized Knowledge

Technical Requirements

Deployment Options

Deployment Steps

Step 1. Prepare Your AWS Account

Step 2. Subscribe to the Appropriate Linux AMI

Step 3. Launch the Quick Start

Step 4. Test the Deployment

FAQ

Additional Resources

GitHub Repository

Document Revisions

Amazon Web Services – Confluent Platform on the AWS Cloud August 2017

This Quick Start deployment guide was created by Amazon Web Services (AWS) in

partnership with Confluent, Inc.

Quick Starts are automated reference deployments that use AWS CloudFormation

templates to launch, configure, and run the AWS compute, network, storage, and other

services required to deploy a specific workload on AWS.

Overview

This Quick Start reference deployment guide provides step-by-step instructions for

deploying Confluent Platform on the Amazon Web Services (AWS) Cloud.

Confluent Platform is the complete streaming platform for large-scale distributed

environments. Built on the core technology of Apache Kafka, Confluent Platform enables all

your interfaces and data systems to be connected. This connectivity allows you to make

decisions leveraging all your internal systems in real time. The Quick Start supports two

software editions: Confluent Open Source and Confluent Enterprise.

This Quick Start is for users who are looking to evaluate and use the full range of Confluent

Platform and Apache Kafka capabilities in the managed infrastructure environment of

AWS.

Alternatively, you can use Confluent Cloud, which is a fully managed Apache Kafka as a service on AWS.

Costs and Licenses

You are responsible for the cost of the AWS services used while running this Quick Start

reference deployment. There is no additional cost for using the Quick Start.

The AWS CloudFormation template for this Quick Start includes configuration parameters

that you can customize. Some of these settings, such as instance type, will affect the cost of

deployment. For cost estimates, see the pricing pages for each AWS service you will be

using. Prices are subject to change.

The Confluent software is provided in a bring-your-own-license model. The Quick Start

supports two software editions: Confluent Open Source and Confluent Enterprise.

Confluent Open Source does not require a license, while Confluent Enterprise requires the

purchase of a license directly from Confluent. For convenience, the Confluent Enterprise

offering will be deployed with a 30-day trial license by default.

Architecture

Deploying this Quick Start for a new virtual private cloud (VPC) with default parameters

builds the following Confluent Platform environment in the AWS Cloud.

Figure 1: Quick Start Confluent Platform architecture on AWS

The Quick Start sets up the following:

A VPC configured across two Availability Zones. For each Availability Zone, this Quick

Start provisions one public subnet. The Confluent Platform deployment uses both

subnets.

Groups of EC2 instances for each logical role in the Confluent Platform deployment.

Instances are distributed evenly between the Availability Zones yet remain a single,

logical cluster.

– Zookeepers: Host the quorum management service and manage topic metadata.

– Brokers: Host the Kafka broker service and maintain topic data.

– Workers: Support the additional Confluent Platform services. Confluent REST

Proxy and Kafka Connect containers are deployed on all worker instances. Single

copies of Confluent Control Center and Schema Registry services are deployed

separately on the first two workers.

Security group settings to allow for necessary network traffic into the instances:

– All Kafka client traffic to zookeepers and brokers

– REST and HTTP/HTTPS access to additional Confluent Platform services

Prerequisites

Specialized Knowledge

Before you deploy this Quick Start, we recommend that you become familiar with the

following AWS services. (If you are new to AWS, see Getting Started with AWS.)

Amazon VPC

Amazon EC2

Amazon EBS

You should also take the time to review the basics of the Confluent Platform architecture.

Excellent references are available in the Confluent Platform documentation.

Technical Requirements

The AWS account into which the Quick Start will be launched must support IAM

delegation.

Deployment Options

This Quick Start provides two deployment options:

Deployment of Confluent Platform into a new VPC (end-to-end deployment)

builds a new AWS environment consisting of the VPC, subnets, security groups, and

other infrastructure components, and then deploys Confluent Platform into the EC2

instances within this new VPC.

Deployment of Confluent Platform into an existing VPC provisions new EC2

instances with Confluent Platform in an existing AWS network infrastructure.

The Quick Start allows customization of CIDR blocks, instance types, and Confluent

Platform cluster settings, as discussed later in this guide. You can also customize the size

and type of persistent storage used for the Kafka brokers.

Deployment Steps

Step 1. Prepare Your AWS Account

1. If you don’t already have an AWS account, create one at https://aws.amazon.com by

following the on-screen instructions. Your AWS account must support IAM delegation.

2. Use the region selector in the navigation bar to choose the AWS Region where you want

to deploy Confluent Platform on AWS.

3. Create a key pair in your preferred region.

4. If necessary, request a service limit increase for the Amazon EC2 instance type(s) you

will use. You might need to do this if you already have an existing deployment that uses

this instance type, and you think you might exceed the default limit with this reference

deployment.

Step 2. Subscribe to the Appropriate Linux AMI

1. Sign in to the AWS Marketplace at https://aws.amazon.com/marketplace.

2. Open the page for the Amazon Linux, CentOS 7, or Ubuntu Server 16.04 LTS AMI

(whichever you plan to use for your deployment), and choose Continue. Be sure to

select the x86_64/HVM versions of the AMI to support the maximum range of AWS

instance types.

3. Use the 1-Click Launch option to launch the AMI into your account on Amazon EC2.

This involves accepting the terms of the license agreement and receiving a confirmation

email. For detailed instructions, see the AWS Marketplace documentation.

Step 3. Launch the Quick Start

Note You are responsible for the cost of the AWS services used while running this

Quick Start reference deployment. There is no additional cost for using this Quick

Start. For full details, see the pricing pages for each AWS service you will be using in

this Quick Start. Prices are subject to change.

1. Choose one of the following options to launch the AWS CloudFormation template into

your AWS account. For help, see deployment options earlier in this guide.

Option 1: Deploy Confluent Platform into a new VPC on AWS

Option 2: Deploy Confluent Platform into an existing VPC on AWS

Important If you’re deploying Confluent Platform into an existing VPC, you may

want to select one with multiple subnets in different Availability Zones for high

availability. You can elect whether to distribute your cluster instances across one or

more Availability Zones when you launch the Quick Start.

Each deployment takes 10-15 minutes to complete (depending on the number and size of

instances).

2. Check the region that’s displayed in the upper-right corner of the navigation bar, and

change it if necessary. This is where the network and instance infrastructure for the

Confluent Platform deployment will be built. The template is launched in the US West

(Oregon) Region by default.

3. On the Select Template page, keep the default setting for the template URL, and then

choose Next.

4. On the Specify Details page, change the stack name if needed. Review the parameters

for the template. Provide values for the parameters that require input. For all other

parameters, review the default settings and customize them as necessary. When you

finish reviewing and customizing the parameters, choose Next.

In the following tables, parameters are listed by category and described separately for

the two deployment options:

– Parameters for deploying Confluent Platform into a new VPC

– Parameters for deploying Confluent Platform into an existing VPC

Option 1: Parameters for deploying Confluent Platform into a new VPC

Network Configuration:

Parameter label (name) | Default | Description

Availability Zones (AvailabilityZones) | Requires input | The list of Availability Zones to use for the subnets in the VPC. The Quick Start uses two Availability Zones from your list and preserves the logical order you specify.

VPC CIDR (VPCCIDR) | 10.0.0.0/16 | CIDR block for the VPC.

Private Subnet 1 CIDR (PrivateSubnet1CIDR) | 10.0.0.0/19 | CIDR block for the private subnet located in Availability Zone 1.

Private Subnet 2 CIDR (PrivateSubnet2CIDR) | 10.0.32.0/19 | CIDR block for the private subnet located in Availability Zone 2.

Public Subnet 1 CIDR (PublicSubnet1CIDR) | 10.0.128.0/20 | CIDR block for the public (DMZ) subnet located in Availability Zone 1.

Public Subnet 2 CIDR (PublicSubnet2CIDR) | 10.0.144.0/20 | CIDR block for the public (DMZ) subnet located in Availability Zone 2.

Allowed External Access CIDR (RemoteAccessCIDR) | Requires input | The CIDR IP range that is permitted to access Confluent Platform services (including the Kafka brokers). We recommend that you set this value to a trusted IP range. For example, you might want to grant only your corporate network access to the software.

Allowed SSH Access CIDR (SSHAccessCIDR) | Requires input | The CIDR IP range that is permitted to access the EC2 instances in the deployment via SSH. This constraint may need to be more restrictive than the RemoteAccessCIDR setting.
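
If you customize the CIDR blocks, it can help to confirm that every subnet still falls inside the VPC block and that no two subnets overlap. The following is a minimal sketch using Python's standard ipaddress module and the default values listed above; the function name check_subnets is hypothetical and is not part of the Quick Start.

```python
import ipaddress

# Default CIDR blocks from the Network Configuration parameters above.
VPC_CIDR = "10.0.0.0/16"
SUBNET_CIDRS = {
    "PrivateSubnet1CIDR": "10.0.0.0/19",
    "PrivateSubnet2CIDR": "10.0.32.0/19",
    "PublicSubnet1CIDR": "10.0.128.0/20",
    "PublicSubnet2CIDR": "10.0.144.0/20",
}


def check_subnets(vpc_cidr, subnet_cidrs):
    """Return a list of problems; an empty list means the layout is consistent."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = {name: ipaddress.ip_network(cidr) for name, cidr in subnet_cidrs.items()}
    problems = []
    # Every subnet must be contained in the VPC block.
    for name, net in subnets.items():
        if not net.subnet_of(vpc):
            problems.append(f"{name} ({net}) is outside the VPC block {vpc}")
    # No pair of subnets may share addresses.
    names = sorted(subnets)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if subnets[first].overlaps(subnets[second]):
                problems.append(f"{first} overlaps {second}")
    return problems


if __name__ == "__main__":
    print(check_subnets(VPC_CIDR, SUBNET_CIDRS) or "no problems found")
```

Running the same check against your own values before you launch the stack is cheaper than waiting for AWS CloudFormation to reject an inconsistent network layout.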

Confluent Platform Configuration:

Parameter label (name) | Default | Description

Cluster Name (ClusterName) | awsqs | Logical name for the Confluent Platform deployment. This name will be used to tag the individual instances, so we recommend that you select a unique name within your organization.

Confluent Platform Edition (ConfluentEdition) | Confluent Enterprise | The Confluent Platform offering to deploy (Confluent Enterprise or Confluent Open Source). Confluent Enterprise requires an additional software license after the 30-day free trial that is included with the AWS Quick Start.

Confluent Platform Version (ConfluentVersion) | 3.2.2 | The version of Confluent Platform you want to deploy. Supported versions will include those available from Confluent after the April 2017 release of this Quick Start.

Repo(s) for additional Connector jars (ConnectorURLs) | (none) | A comma-delimited list of public locations (e.g., https://s3.amazonaws.com/connector-bucket/dynamo) from which additional jar files will be downloaded to expand the Kafka Connectors available to this deployment. By default, only the Connectors included with Confluent Platform are available.

Common Amazon EC2 Configuration (applied to all instance roles):

Parameter label (name) | Default | Description

Key Pair Name (KeyPairName) | Requires input | Public/private key pair, which allows you to connect securely to your instance after it launches. This is the key pair you created in your preferred region when you prepared your AWS account (step 1).

Linux Operating System AMI (LinuxOSAMI) | CentOS-7-HVM | Desired Linux operating system for the instances used for the deployment. The Quick Start supports Amazon Linux, CentOS, and Ubuntu. Make sure that you’ve subscribed to the appropriate AMI as described earlier in step 2.

Boot Disk Capacity (BootDiskSize) | 24 | Size (in GiB) of the boot disk for the deployed instances. Sufficient capacity should be allocated for application logs and optional software components such as additional Kafka Connectors.

Allocate a public IP for each instance (AssignPublicIP) | true | Set to false if you don’t want to allocate a public IP address to each instance.

Broker Instance Configuration:

Parameter label (name) | Default | Description

Broker Count (NumBrokers) | 3 | Number of Kafka brokers to be deployed.

Instance Type (BrokerNodeInstanceType) | m4.xlarge | Instance type for the Kafka brokers.

Persistent Storage (BrokerNodeStorage) | 512 | Capacity (in GiB) of external EBS storage to be associated with each instance and used to store Kafka data. A value of 0 will result in ephemeral storage being allocated (when supported by the selected instance type). If no external storage is available, the Kafka data will be stored on the boot disk volume.

EBS Volume Type (BrokerNodeStorageType) | st1 | EBS volume type (when external persistent storage is allocated). For details on the characteristics and costs of the different storage types, see the Amazon EC2 documentation.

Spot Price (optional) (BrokerNodeSpotPrice) | 0.00 | Change the default setting if you’d like to use Spot Instances for your deployment instead of On-Demand Instances. This may be appropriate to save costs for trial or PoC deployments, but is not recommended for production workloads.
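
The Persistent Storage description implies a simple fallback order for where Kafka data ends up. The sketch below encodes one reading of that description: an EBS volume when the capacity is greater than 0, ephemeral storage when the capacity is 0 and the instance type offers it, and the boot disk otherwise. The function name and the has_ephemeral_storage flag are hypothetical; the Quick Start scripts may decide this differently.

```python
def kafka_data_location(broker_node_storage_gib, has_ephemeral_storage):
    """Return where Kafka data would be stored under the reading described above.

    broker_node_storage_gib: value of the BrokerNodeStorage parameter (GiB).
    has_ephemeral_storage:   whether the chosen instance type offers ephemeral
                             (instance store) volumes; supplied by the caller.
    """
    if broker_node_storage_gib < 0:
        raise ValueError("storage capacity cannot be negative")
    if broker_node_storage_gib > 0:
        return "ebs"
    if has_ephemeral_storage:
        return "ephemeral"
    return "boot"


if __name__ == "__main__":
    # Default BrokerNodeStorage of 512 GiB.
    print(kafka_data_location(512, False))
    # Capacity 0 on an instance type without ephemeral volumes.
    print(kafka_data_location(0, False))
```

With a capacity of 0 on an instance type that has no ephemeral volumes, Kafka data shares the boot disk, so the Boot Disk Capacity setting then needs to account for topic data as well as logs.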

Zookeeper Instance Configuration:

Parameter label (name) | Default | Description

Zookeeper Count (NumZookeepers) | 0 | Number of Zookeeper server nodes to be deployed. Supported values are 0, 1, 3, or 5. Because the Zookeeper service can be optionally supported on the broker nodes, the default value of 0 directs the Quick Start to co-deploy the Zookeeper service on the first three broker nodes.

Instance Type (ZookeeperNodeInstanceType) | m4.large | Instance type for the Zookeeper server nodes.

Persistent Storage (ZookeeperNodeStorage) | 0 | Capacity (in GiB) of external EBS storage to be associated with each instance. The default of 0 will result in ephemeral storage being allocated (when supported by the selected instance type). This external storage is used only for Zookeeper state information.

Spot Price (optional) (ZookeeperNodeSpotPrice) | 0.00 | Change the default setting if you’d like to use Spot Instances for your deployment instead of On-Demand Instances. This may be appropriate to save costs for trial or PoC deployments, but is not recommended for production workloads.
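
The Zookeeper Count rule can be illustrated with a short sketch: only 0, 1, 3, or 5 are accepted, and 0 places the Zookeeper service on the first three broker nodes. The node labels follow the BROKERNODEn and ZOOKEEPERNODEn names shown in the sample host files later in this guide. The function name is hypothetical, and the handling of clusters with fewer than three brokers is an assumption that the guide does not address.

```python
SUPPORTED_ZOOKEEPER_COUNTS = (0, 1, 3, 5)


def zookeeper_hosts(num_zookeepers, num_brokers):
    """Return the node labels expected to run the Zookeeper service."""
    if num_zookeepers not in SUPPORTED_ZOOKEEPER_COUNTS:
        raise ValueError(
            f"NumZookeepers must be one of {SUPPORTED_ZOOKEEPER_COUNTS}, got {num_zookeepers}"
        )
    if num_zookeepers == 0:
        # Co-deploy on the first three brokers. Capping at num_brokers is an
        # assumption for clusters with fewer than three brokers.
        return [f"BROKERNODE{i}" for i in range(min(3, num_brokers))]
    # Dedicated Zookeeper nodes.
    return [f"ZOOKEEPERNODE{i}" for i in range(num_zookeepers)]


if __name__ == "__main__":
    print(zookeeper_hosts(0, 3))
    print(zookeeper_hosts(1, 3))
```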

Worker Instance Configuration:

Parameter label (name) | Default | Description

Worker Count (NumWorkers) | 2 | Number of hosts deployed to support additional Confluent Platform services (Schema Registry, REST Proxy, Kafka Connect, and Confluent Control Center).

Instance Type (WorkerNodeInstanceType) | m4.xlarge | Instance type for the worker nodes.

Persistent Storage (WorkerNodeStorage) | 0 | Capacity (in GiB) of external EBS storage to be associated with each instance. The default of 0 will result in ephemeral storage being allocated (when supported by the selected instance type). External storage is generally not needed for worker nodes.

Spot Price (optional) (WorkerNodeSpotPrice) | 0.00 | Change the default setting if you’d like to use Spot Instances for your deployment instead of On-Demand Instances. This may be appropriate to save costs for trial or PoC deployments, but is not recommended for production workloads.

AWS Quick Start Configuration:

Parameter label (name) | Default | Description

Quick Start S3 Bucket Name (QSS3BucketName) | aws-quickstart | S3 bucket where the Quick Start templates and scripts are installed. Use this parameter to specify the S3 bucket name you’ve created for your copy of Quick Start assets, if you decide to customize or extend the Quick Start for your own use. The bucket name can include numbers, lowercase letters, uppercase letters, and hyphens, but should not start or end with a hyphen.

Quick Start S3 Key Prefix (QSS3KeyPrefix) | quickstart-confluent-kafka/ | The S3 key name prefix used to simulate a folder for your copy of Quick Start assets, if you decide to customize or extend the Quick Start for your own use. This prefix can include numbers, lowercase letters, uppercase letters, hyphens, and forward slashes.
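
If you prefer to script the launch rather than fill in the console form, the parameter names above map onto the list-of-pairs shape that the AWS CloudFormation API expects. The following is a minimal sketch: build_parameters is a hypothetical helper, the Availability Zone, CIDR, and key pair values are placeholders, and the create_stack call is left as a comment because it needs the boto3 package and the template URL behind the launch link, neither of which is given in this guide.

```python
# Parameters marked "Requires input" for the new-VPC option.
REQUIRED = ("AvailabilityZones", "RemoteAccessCIDR", "SSHAccessCIDR", "KeyPairName")


def build_parameters(values):
    """Turn a {name: value} dict into CloudFormation's Parameters list."""
    missing = [name for name in REQUIRED if not values.get(name)]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    return [
        {"ParameterKey": name, "ParameterValue": str(value)}
        for name, value in sorted(values.items())
    ]


example = build_parameters({
    "AvailabilityZones": "us-west-2a,us-west-2b",  # placeholder choice
    "RemoteAccessCIDR": "203.0.113.0/24",          # placeholder documentation range
    "SSHAccessCIDR": "203.0.113.0/28",             # placeholder, narrower than above
    "KeyPairName": "my-key-pair",                  # hypothetical key pair name
    "NumBrokers": 3,
    "BrokerNodeStorage": 512,
})

# With boto3 installed, a list like this could be passed along these lines:
#   boto3.client("cloudformation").create_stack(
#       StackName="confluent-qs",                  # hypothetical stack name
#       TemplateURL="<template URL from the launch link>",
#       Parameters=example,
#       Capabilities=["CAPABILITY_IAM"],
#   )

if __name__ == "__main__":
    for item in example:
        print(item)
```

Any parameter left out of the list keeps the default shown in the tables.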

Option 2: Parameters for deploying Confluent Platform into an existing VPC

Network Configuration:

Parameter label (name) | Default | Description

VPC ID (VPCID) | Requires input | ID of your existing VPC (e.g., vpc-0343606e).

Subnet ID (Subnet1ID) | Requires input | Comma-separated list of one or more public subnets in your existing VPC (e.g., subnet-a0246dcd).

Allowed External Access CIDR (RemoteAccessCIDR) | Requires input | The CIDR IP range that is permitted to access Confluent Platform services (including the Kafka brokers). We recommend that you set this value to a trusted IP range. For example, you might want to grant only your corporate network access to the software.

Allowed SSH Access CIDR (SSHAccessCIDR) | Requires input | The CIDR IP range that is permitted to access the EC2 instances in the deployment via SSH. This constraint may need to be more restrictive than the RemoteAccessCIDR setting.

Confluent Platform Configuration:

Parameter label (name) | Default | Description

Cluster Name (ClusterName) | awsqs | Logical name for the Confluent Platform deployment. This name will be used to tag the individual instances, so we recommend that you select a unique name within your organization.

Confluent Platform Edition (ConfluentEdition) | Confluent Enterprise | The Confluent Platform offering to deploy (Confluent Enterprise or Confluent Open Source). Confluent Enterprise requires an additional software license after the 30-day free trial that is included with the AWS Quick Start.

Confluent Platform Version (ConfluentVersion) | 3.2.2 | The version of Confluent Platform you want to deploy. Supported versions will include those available from Confluent after the April 2017 release of this Quick Start.

Repo(s) for additional Connector jars (ConnectorURLs) | (none) | A comma-delimited list of public locations (e.g., https://s3.amazonaws.com/connector-bucket/dynamo) from which additional jar files will be downloaded to expand the Kafka Connectors available to this deployment. By default, only the Connectors included with Confluent Platform are available.

Common Amazon EC2 Configuration (applied to all instance roles):

Parameter label (name) | Default | Description

Key Pair Name (KeyPairName) | Requires input | Public/private key pair, which allows you to connect securely to your instance after it launches. This is the key pair you created in your preferred region when you prepared your AWS account (step 1).

Linux Operating System AMI (LinuxOSAMI) | CentOS-7-HVM | Desired Linux operating system for the instances used for the deployment. The Quick Start supports Amazon Linux, CentOS, and Ubuntu. Make sure that you’ve subscribed to the appropriate AMI as described earlier in step 2.

Boot Disk Capacity (BootDiskSize) | 24 | Size (in GiB) of the boot disk for the deployed instances. Sufficient capacity should be allocated for application logs and optional software components such as additional Kafka Connectors.

Allocate a public IP for each instance (AssignPublicIP) | true | Set to false if you don’t want to allocate a public IP address to each instance.

Broker Instance Configuration:

Parameter label (name) | Default | Description

Broker Count (NumBrokers) | 3 | Number of Kafka brokers to be deployed.

Instance Type (BrokerNodeInstanceType) | m4.xlarge | Instance type for the Kafka brokers.

Persistent Storage (BrokerNodeStorage) | 512 | Capacity (in GiB) of external EBS storage to be associated with each instance and used to store Kafka data. A value of 0 will result in ephemeral storage being allocated (when supported by the selected instance type). If no external storage is available, the Kafka data will be stored on the boot disk volume.

EBS Volume Type (BrokerNodeStorageType) | st1 | EBS volume type (when external persistent storage is allocated). For details on the characteristics and costs of the different storage types, see the Amazon EC2 documentation.

Spot Price (optional) (BrokerNodeSpotPrice) | 0.00 | Change the default setting if you’d like to use Spot Instances for your deployment instead of On-Demand Instances. This may be appropriate to save costs for trial or PoC deployments, but is not recommended for production workloads.

Zookeeper Instance Configuration:

Parameter label (name) | Default | Description

Zookeeper Count (NumZookeepers) | 0 | Number of Zookeeper server nodes to be deployed. Supported values are 0, 1, 3, or 5. Because the Zookeeper service can be optionally supported on the broker nodes, the default value of 0 directs the Quick Start to co-deploy the Zookeeper service on the first three broker nodes.

Instance Type (ZookeeperNodeInstanceType) | m4.large | Instance type for the Zookeeper server nodes.

Persistent Storage (ZookeeperNodeStorage) | 0 | Capacity (in GiB) of external EBS storage to be associated with each instance. The default of 0 will result in ephemeral storage being allocated (when supported by the selected instance type). This external storage is used only for Zookeeper state information.

Spot Price (optional) (ZookeeperNodeSpotPrice) | 0.00 | Change the default setting if you’d like to use Spot Instances for your deployment instead of On-Demand Instances. This may be appropriate to save costs for trial or PoC deployments, but is not recommended for production workloads.

Worker Instance Configuration:

Parameter label (name) | Default | Description

Worker Count (NumWorkers) | 2 | Number of hosts deployed to support additional Confluent Platform services (Schema Registry, REST Proxy, Kafka Connect, and Confluent Control Center).

Instance Type (WorkerNodeInstanceType) | m4.xlarge | Instance type for the worker nodes.

Persistent Storage (WorkerNodeStorage) | 0 | Capacity (in GiB) of external EBS storage to be associated with each instance. The default of 0 will result in ephemeral storage being allocated (when supported by the selected instance type). External storage is generally not needed for worker nodes.

Spot Price (optional) (WorkerNodeSpotPrice) | 0.00 | Change the default setting if you’d like to use Spot Instances for your deployment instead of On-Demand Instances. This may be appropriate to save costs for trial or PoC deployments, but is not recommended for production workloads.

AWS Quick Start Configuration:

Parameter label (name) | Default | Description

Quick Start S3 Bucket Name (QSS3BucketName) | aws-quickstart | S3 bucket where the Quick Start templates and scripts are installed. Use this parameter to specify the S3 bucket name you’ve created for your copy of Quick Start assets, if you decide to customize or extend the Quick Start for your own use. The bucket name can include numbers, lowercase letters, uppercase letters, and hyphens, but should not start or end with a hyphen.

Quick Start S3 Key Prefix (QSS3KeyPrefix) | quickstart-confluent-kafka/ | The S3 key name prefix used to simulate a folder for your copy of Quick Start assets, if you decide to customize or extend the Quick Start for your own use. This prefix can include numbers, lowercase letters, uppercase letters, hyphens, and forward slashes.
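
The naming rules for the S3 bucket name and key prefix can be expressed as two regular expressions. The sketch below derives the patterns from the parameter descriptions in this guide; they are not copied from the template, and Amazon S3 itself enforces further bucket naming rules that this check does not cover.

```python
import re

# Letters, digits, and hyphens; must not start or end with a hyphen.
BUCKET_NAME_RE = re.compile(r"[0-9a-zA-Z]([0-9a-zA-Z-]*[0-9a-zA-Z])?")
# Letters, digits, hyphens, and forward slashes; may be empty.
KEY_PREFIX_RE = re.compile(r"[0-9a-zA-Z/-]*")


def valid_bucket_name(name):
    """True if name follows the QSS3BucketName description."""
    return BUCKET_NAME_RE.fullmatch(name) is not None


def valid_key_prefix(prefix):
    """True if prefix follows the QSS3KeyPrefix description."""
    return KEY_PREFIX_RE.fullmatch(prefix) is not None


if __name__ == "__main__":
    print(valid_bucket_name("aws-quickstart"))
    print(valid_key_prefix("quickstart-confluent-kafka/"))
```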

5. On the Options page, you can specify tags (key-value pairs) for resources in your stack

and set advanced options. When you’re done, choose Next.

6. On the Review page, review and confirm the template settings. Under Capabilities,

select the check box to acknowledge that the template will create IAM resources.

7. Choose Create to deploy the stack.

8. Monitor the status of the stack. When the status is CREATE_COMPLETE, the

Confluent Platform cluster is ready.

9. Use the URL displayed in the Outputs tab for the stack to view the resources that were

created. The ClusterInfo output will include Kafka client details for applications

within the VPC (private host names) and publicly accessible URLs for other services.

Figure 2: AWS CloudFormation Outputs tab

In Figure 2, the important link is “control.center.console”: “http://54.191.83.74:9021”,

which will take you to the Confluent Enterprise Management interface.
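Steps 8 and 9 can also be followed from a terminal. The sketch below assumes that the AWS CLI is installed and configured for the region you deployed to; the function name show_stack_outputs and the AWS override variable are illustrative and not part of the Quick Start.

```shell
# Sketch: wait for the stack, then print its outputs (including ClusterInfo).
# Assumes the AWS CLI is installed and configured for the deployment region.
AWS="${AWS:-aws}"   # command used to call the AWS CLI; overridable for dry runs

show_stack_outputs() {
    stack_name="$1"   # the name you gave the stack when you launched it
    # Blocks until the stack reaches CREATE_COMPLETE; returns non-zero otherwise.
    "$AWS" cloudformation wait stack-create-complete --stack-name "$stack_name" || return 1
    # Prints the same key-value pairs that the Outputs tab shows.
    "$AWS" cloudformation describe-stacks --stack-name "$stack_name" \
        --query 'Stacks[0].Outputs' --output table
}
```

Call it with your own stack name, for example `show_stack_outputs my-confluent-stack` (a placeholder name).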

Step 4. Test the Deployment

The Confluent Control Center graphical user interface is the best place to monitor your

deployment. The home screen provides a system health overview along with the ability to

track data rates across the different Kafka topics.

There are several different ways to deploy client applications that will produce data to, and

consume data from, the Confluent Platform cluster. The template will have enabled direct

network connectivity from the nodes specified in the RemoteAccessCIDR parameter, as

well as secure login support for nodes in SSHAccessCIDR. Secure login (via SSH or

equivalent tools) is available as user “kadmin”, using the same SSH key pair specified in the

deployment.

The following sections describe some simple test commands. You can run these tests from

any of the worker nodes, as those are guaranteed to have the proper Kafka client utilities.

Additionally, each worker node will have a number of files in the /tmp directory that will

list out the necessary hosts to support the commands. As an example, consider the

following files from a simple deployment:

kadmin:~> cat /tmp/brokers
ip-10-20-10-79 BROKERNODE0 i-0af010fe4625d0386
ip-10-20-1-34 BROKERNODE1 i-014f34d98944c5a02
ip-10-20-2-145 BROKERNODE2 i-0238af1ceb7e1fdd9

kadmin:~> cat /tmp/zookeepers
ip-10-20-14-186 ZOOKEEPERNODE0 i-0669390bac2109ec4

kadmin:~> cat /tmp/workers
ip-10-20-12-209 WORKERNODE0 i-0b7a77ba419967056
ip-10-20-9-215 WORKERNODE1 i-09f615a2e94ce86d4

Commands that require a broker address would reference ip-10-20-10-79 (or any other host listed in /tmp/brokers). The same pattern applies to the other services.
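Many Kafka tools expect a comma-separated host:port list rather than a single host. The helper below is a minimal sketch that builds such a list from one of the files shown above. It assumes one node per line with the host name in the first field; the function name host_list is hypothetical.

```shell
# host_list FILE PORT
# Print "host1:PORT,host2:PORT,..." using the first column of FILE.
# Assumes one node per line with the host name in the first field.
host_list() {
    awk -v port="$2" 'NF { printf "%s%s:%s", sep, $1, port; sep = "," } END { print "" }' "$1"
}
```

For example, `host_list /tmp/zookeepers 2181` yields a value suitable for --zookeeper. For brokers you would pass the broker listener port; 9092 is the Kafka default, but this guide does not state which port the deployment uses, so treat that value as an assumption.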

Listing and creating Kafka topics

For Kafka topic metadata operations, use the kafka-topics command:

$ kafka-topics --zookeeper ip-10-20-14-186:2181 --list \
    | grep -v controlcenter
__consumer_offsets
_confluent-command
_confluent-metrics
_confluent-monitoring
_schemas
connect-configs
connect-offsets
connect-status
wikipedia.parsed
wikipedia.raw

$ kafka-topics --zookeeper ip-10-20-14-186:2181 --create \
    --topic ztest1 --partitions 1 --replication-factor 3
Created topic "ztest1".

A subsequent --list call will show the newly created ztest1 topic.
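To check the partition and replica assignment of a topic rather than just its name, kafka-topics also accepts --describe. The wrapper below is a sketch only: it takes the first host from /tmp/zookeepers and assumes port 2181 as in the commands above. The function name describe_topic and the two override variables are hypothetical.

```shell
# Sketch: describe a topic using the first ZooKeeper host listed on this node.
ZK_FILE="${ZK_FILE:-/tmp/zookeepers}"          # host file written by the deployment
KAFKA_TOPICS="${KAFKA_TOPICS:-kafka-topics}"   # overridable for dry runs

describe_topic() {
    # First field of the first non-empty line is the ZooKeeper host name.
    zk_host=$(awk 'NF { print $1; exit }' "$ZK_FILE")
    "$KAFKA_TOPICS" --zookeeper "${zk_host}:2181" --describe --topic "$1"
}
```

Running `describe_topic ztest1` on a worker node should report the single partition and three replicas requested at creation time.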

Producing and consuming messages via the REST Proxy interface

The Confluent REST Proxy service is deployed on all the worker nodes. That

configuration allows you to send or receive messages on a given topic without the need

to develop a standalone Kafka client program. You can post messages to the newly

created ztest1 topic as follows:

$ export RPURL=http://ip-10-20-9-215:8082

$ curl -X POST -H "Content-Type: application/vnd.kafka.json.v1+json" \
    --data '{"records":[{"value":{"foo":"bar"}}]}' $RPURL/topics/ztest1
{"offsets":[{"partition":0,"offset":0,"error_code":null,"error":null}],"key_schema_id":null,"value_schema_id":null}

$ curl -X POST -H "Content-Type: application/vnd.kafka.json.v1+json" \
    --data '{"records":[{"value":{"foo":"baz"}}]}' $RPURL/topics/ztest1
{"offsets":[{"partition":0,"offset":1,"error_code":null,"error":null}],"key_schema_id":null,"value_schema_id":null}

Those exact commands could be used against the public IP address of the worker nodes

from outside the AWS instances if desired.
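The records array in the request body can carry more than one message, so several values can be sent in a single POST. The helper below is a small sketch that wraps already-encoded JSON values into that envelope; the function name records_payload is hypothetical, and it does not validate or escape its arguments.

```shell
# records_payload VALUE...
# Wrap each argument (already valid JSON) as {"value": ...} inside "records".
records_payload() {
    payload='{"records":['
    sep=''
    for value in "$@"; do
        payload="${payload}${sep}{\"value\":${value}}"
        sep=','
    done
    printf '%s]}\n' "$payload"
}
```

It could then be combined with the command above, for example: `curl -X POST -H "Content-Type: application/vnd.kafka.json.v1+json" --data "$(records_payload '{"foo":"bar"}' '{"foo":"baz"}')" $RPURL/topics/ztest1`.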

To read the messages back, you can create a temporary consumer (which you should be

careful to delete when you’re finished).

$ curl -X POST -H "Content-Type: application/vnd.kafka.v1+json" \
    --data '{"name": "ext_consumer_instance", "format": "json", "auto.offset.reset": "smallest"}' \
    $RPURL/consumers/ext_json_consumer
{"instance_id":"ext_consumer_instance",
"base_uri":"http://ip-10-20-9-215:8082/consumers/ext_json_consumer/instances/ext_consumer_instance"}

$ curl -X GET -H "Accept: application/vnd.kafka.json.v1+json" \
    $RPURL/consumers/ext_json_consumer/instances/ext_consumer_instance/topics/ztest1
[{"key":null,"value":{"foo":"bar"},"partition":0,"offset":0},{"key":null,"value":{"foo":"baz"},"partition":0,"offset":1}]

$ curl -X DELETE \
    $RPURL/consumers/ext_json_consumer/instances/ext_consumer_instance

# No content in response
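Because a forgotten consumer instance keeps holding resources on the REST Proxy, it can be convenient to wrap the three calls so that the DELETE always runs. The following is a sketch under the same assumptions as the commands above (same consumer group and instance names, REST Proxy v1 content types); the function name read_topic_once and the CURL override variable are hypothetical.

```shell
# Sketch: create a temporary REST Proxy consumer, read once, always clean up.
CURL="${CURL:-curl}"                            # overridable for dry runs
RPURL="${RPURL:-http://ip-10-20-9-215:8082}"    # worker node from the example

read_topic_once() {
    topic="$1"
    base="$RPURL/consumers/ext_json_consumer"
    instance="ext_consumer_instance"

    # 1. Create the consumer instance; give up if curl itself fails.
    "$CURL" -s -X POST -H "Content-Type: application/vnd.kafka.v1+json" \
        --data "{\"name\": \"$instance\", \"format\": \"json\", \"auto.offset.reset\": \"smallest\"}" \
        "$base" > /dev/null || return 1

    # 2. Read whatever is currently available on the topic.
    "$CURL" -s -X GET -H "Accept: application/vnd.kafka.json.v1+json" \
        "$base/instances/$instance/topics/$topic"
    status=$?

    # 3. Delete the instance whether or not the read succeeded.
    "$CURL" -s -X DELETE "$base/instances/$instance" > /dev/null
    return "$status"
}
```

With RPURL exported as shown earlier, `read_topic_once ztest1` prints the JSON array of messages and leaves no consumer instance behind.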

For additional sample commands and trial applications, see the Quick Start examples at

http://docs.confluent.io/3.2.0/quickstart.html. (You can start with step 5 since the AWS

Quick Start will have correctly deployed Confluent Platform.)

FAQ

Q. I encountered a CREATE_FAILED error when I launched the Quick Start. What should

I do?

A. If AWS CloudFormation fails to create the stack, we recommend that you relaunch the

template with Rollback on failure set to No. (This setting is under Advanced in the

AWS CloudFormation console, Options page.) With this setting, the stack’s state will be

retained and the instance will be left running, so you can troubleshoot the issue. (You'll

want to look at the log files in /var/log/cfn-*.log as well as the output from the custom

deployment scripts in /tmp.)

Important When you set Rollback on failure to No, you’ll continue to

incur AWS charges for this stack. Please make sure to delete the stack when

you’ve finished troubleshooting.

For additional information, see Troubleshooting AWS CloudFormation on the AWS website.
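When you log in to a retained instance, a quick scan of the CloudFormation helper logs is often a reasonable first step. The function below is a minimal sketch: the name scan_cfn_logs is hypothetical, and the search pattern (any line mentioning "error" or "fail") is an assumption about what is worth looking at first, not an exhaustive filter.

```shell
# scan_cfn_logs [DIR]
# Print file name, line number, and text of suspicious lines in cfn-*.log.
# DIR defaults to /var/log, where this guide says the log files are written.
scan_cfn_logs() {
    log_dir="${1:-/var/log}"
    grep -H -n -i -E 'error|fail' "$log_dir"/cfn-*.log
}
```

Reading /var/log may require elevated privileges on some AMIs; the custom deployment script output in /tmp can be inspected separately.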

Q. I encountered a size limitation error when I deployed the AWS CloudFormation templates.

A. We recommend that you launch the Quick Start templates from the location we’ve provided or from another S3 bucket. If you deploy the templates from a local copy on your computer or from a non-S3 location, you might encounter template size limitations when you create the stack. For more information about AWS CloudFormation limits, see the AWS documentation.

Q. The Quick Start deployment for Confluent Enterprise shows success, but the link to the Confluent Control Center does not show any content. What should I do?

A. Confirm that the other REST interfaces in the cluster are working. This can be done by using command-line tools such as curl or from your browser. You’ll use the public IP address of the first worker node (which will be tagged as cluster-worker-0 in your EC2 console, as shown in Figure 3).

Figure 3: Worker nodes in the Amazon EC2 console

The following simple command can verify some basic functionality:

curl http://<ip-of-worker-0>:9021/2.0/status/os

This command confirms that the Control Center Service has started. You can try the

primary control.center.console address after a few minutes; the issue was likely a delay in

the retrieval of the metrics information from the cluster.
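Since the remedy is mostly to wait, the status check can be repeated in a loop until it succeeds. The function below is a sketch of that idea; the name wait_for_url, the CURL override variable, and the default of 30 attempts spaced 10 seconds apart are illustrative choices rather than values taken from the Quick Start.

```shell
# wait_for_url URL [TRIES] [DELAY_SECONDS]
# Poll URL until curl gets a successful HTTP response or TRIES is exhausted.
CURL="${CURL:-curl}"   # overridable for dry runs

wait_for_url() {
    url="$1"; tries="${2:-30}"; delay="${3:-10}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -f makes curl return non-zero on HTTP errors; the body is discarded.
        if "$CURL" -s -f -o /dev/null "$url"; then
            echo "up after $i retries"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "still down after $tries attempts" >&2
    return 1
}
```

For example, `wait_for_url "http://<ip-of-worker-0>:9021/2.0/status/os"` with the placeholder replaced by the public IP address of the first worker node.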

Additional Resources

AWS services

Amazon EC2

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/

AWS CloudFormation

https://aws.amazon.com/documentation/cloudformation/

Amazon VPC

https://aws.amazon.com/documentation/vpc/

Confluent Platform

Confluent Platform Overview

http://docs.confluent.io/current/platform.html

Deploying / Managing Connectors

http://docs.confluent.io/current/connect/managing.html

Confluent Control Center

http://docs.confluent.io/current/control-center/docs/userguide.html

Quick Start reference deployments

AWS Quick Start home page

https://aws.amazon.com/quickstart/

GitHub Repository

You can visit our GitHub repository to download the templates and scripts for this Quick

Start, to post your comments, and to share your customizations with others.

Document Revisions

Date | Change | In sections
August 2017 | Parameter updates for storage disk types, new versions of the Confluent Platform, and S3 portability | Step 3
April 2017 | Initial publication | —

© 2017, Amazon Web Services, Inc. or its affiliates and Confluent, Inc. All rights reserved.

Terms of use

Notices

This document is provided for informational purposes only. It represents AWS’s current product offerings

and practices as of the date of issue of this document, which are subject to change without notice. Customers

are responsible for making their own independent assessment of the information in this document and any

use of AWS’s products or services, each of which is provided “as is” without warranty of any kind, whether

express or implied. This document does not create any warranties, representations, contractual

commitments, conditions or assurances from AWS, its affiliates, suppliers or licensors. The responsibilities

and liabilities of AWS to its customers are controlled by AWS agreements, and this document is not part of,

nor does it modify, any agreement between AWS and its customers.

The software included with this paper is licensed under the Apache License, Version 2.0 (the "License"). You

may not use this file except in compliance with the License. A copy of the License is located at

http://aws.amazon.com/apache2.0/ or in the "license" file accompanying this file. This code is distributed on

an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

See the License for the specific language governing permissions and limitations under the License.