
Page 1 of 33

Clickstream Analytics on the AWS Cloud

Quick Start Reference Deployment

September 2019

Cambridge Technology

AWS Quick Start team

Visit our GitHub repository for source files and to post feedback, report bugs, or submit feature ideas for this Quick Start.

Contents

Overview

Clickstream analytics on AWS

Cost and licenses

Architecture

Planning the deployment

Specialized knowledge

AWS account

Technical requirements

Deployment options

Deployment steps

Step 1. Sign in to your AWS account

Step 2. Launch the Quick Start

Option 1: Parameters for deploying clickstream analytics into a new VPC

Option 2: Parameters for deploying clickstream analytics into an existing VPC

Step 3. Test the deployment


Amazon Web Services – Clickstream Analytics on the AWS Cloud September 2019


Quick Start datasets

Optional: Analyzing and visualizing data with Amazon QuickSight

Signing in to Amazon QuickSight

Setting up Amazon QuickSight

Connecting to the Amazon Redshift cluster/database

Optional: Ingesting Apache web access logs with Kinesis Data Firehose

Best practices for using clickstream analytics on AWS

Security

FAQ

GitHub repository

Additional resources

Document revisions

This Quick Start was created by Cambridge Technology in collaboration with Amazon Web Services (AWS). Cambridge Technology is an AWS Premier Consulting partner specializing in big data.

Quick Starts are automated reference deployments that use AWS CloudFormation templates to deploy key technologies on AWS, following AWS best practices.

Overview

This Quick Start reference deployment guide provides step-by-step instructions for deploying clickstream analytics on the AWS Cloud.

Clickstream analytics is the process of collecting, analyzing, and reporting aggregate data about which webpages someone visits and in what order. The path that a visitor takes through a website is called a clickstream. Clickstream analytics can be a powerful tool for doing market research and generating valuable business insights from the data logs of online platforms.

This Quick Start is for users who want to get started with AWS-native components for a clickstream analytics solution in the AWS Cloud. Once this foundational layer is in place, you can use it to ingest, analyze, and generate business insights from your websites' clickstream data.
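To make the definition of a clickstream concrete, the following sketch rebuilds each visitor's path from a list of page-view events. It groups events by a visitor identifier and orders them by timestamp. The `(visitor_id, timestamp, page)` record shape, the use of a client IP as the visitor identifier, and the `build_clickstreams` name are illustrative assumptions, not part of the Quick Start.

```python
from collections import defaultdict
from datetime import datetime


def build_clickstreams(events):
    """Group (visitor_id, timestamp, page) events into per-visitor page paths."""
    by_visitor = defaultdict(list)
    for visitor_id, timestamp, page in events:
        by_visitor[visitor_id].append((timestamp, page))
    # Sort each visitor's events by time so the path reflects the order of visits.
    return {
        visitor: [page for _, page in sorted(visits)]
        for visitor, visits in by_visitor.items()
    }


events = [
    ("10.0.0.5", datetime(2019, 9, 1, 10, 2), "/products"),
    ("10.0.0.5", datetime(2019, 9, 1, 10, 0), "/"),
    ("10.0.0.9", datetime(2019, 9, 1, 10, 1), "/blog"),
    ("10.0.0.5", datetime(2019, 9, 1, 10, 5), "/checkout"),
]
print(build_clickstreams(events))
# {'10.0.0.5': ['/', '/products', '/checkout'], '10.0.0.9': ['/blog']}
```

In practice, a visitor path is often also split into sessions after a period of inactivity; that refinement is left out here to keep the idea visible.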


Clickstream analytics on AWS

This Quick Start builds a clickstream analytics solution that integrates AWS services such as Amazon Kinesis Data Firehose, Amazon Simple Storage Service (Amazon S3), Amazon Elasticsearch Service (Amazon ES), Amazon Redshift, and Amazon QuickSight. The clickstream analytics solution provides these capabilities:

• Streaming data ingestion, which can process millions of website clicks (clickstream data) a day from global websites.

• Near real-time visualizations and recommendations, with web usage metrics that include events per hour, visitor count, web/HTTP user agents (for example, a web browser), abnormal events, aggregate event count, referrers, and recent events. You can build a recommendation engine with Amazon Redshift application programming interfaces (APIs).

• Publishing of your website clickstream data to Amazon S3, Amazon Redshift, and Amazon ES.

• Analysis and visualization of your clickstream data by using Kibana (an open-source tool that comes with Amazon ES) and Amazon QuickSight.
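The Quick Start presents these web usage metrics through its managed services; the standalone sketch below only illustrates what a few of them mean by computing them from Apache log lines. It assumes the web servers write Apache's combined log format and treats the client IP as a rough proxy for a visitor. The `summarize` function, the regular expression, and the sample lines are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Apache combined format: host ident user [time] "request" status size "referrer" "agent"
COMBINED_LOG = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize(lines):
    """Compute events per hour, visitor count, top referrers, and top user agents."""
    events_per_hour = Counter()
    visitors = set()
    referrers = Counter()
    agents = Counter()
    for line in lines:
        match = COMBINED_LOG.match(line)
        if match is None:
            continue  # skip lines that aren't in combined format
        timestamp = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
        events_per_hour[timestamp.strftime("%Y-%m-%d %H:00")] += 1
        visitors.add(match["host"])  # client IP as a rough visitor proxy
        if match["referrer"] != "-":  # "-" means no referrer was sent
            referrers[match["referrer"]] += 1
        agents[match["agent"]] += 1
    return {
        "events_per_hour": dict(events_per_hour),
        "visitor_count": len(visitors),
        "top_referrers": referrers.most_common(3),
        "top_user_agents": agents.most_common(3),
    }


sample = [
    '203.0.113.7 - - [01/Sep/2019:10:00:01 +0000] "GET / HTTP/1.1" 200 512 '
    '"https://www.example.com/" "Mozilla/5.0"',
    '203.0.113.7 - - [01/Sep/2019:10:04:12 +0000] "GET /products HTTP/1.1" 200 2048 '
    '"-" "Mozilla/5.0"',
    '198.51.100.3 - - [01/Sep/2019:11:15:00 +0000] "GET /blog HTTP/1.1" 200 1024 '
    '"-" "curl/7.61.1"',
    "not a log line",
]
print(summarize(sample))
```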

Cost and licenses

You are responsible for the cost of the AWS services used while running this Quick Start reference deployment. There is no additional cost for using the Quick Start.

The AWS CloudFormation template for this Quick Start includes configuration parameters that you can customize. Some of these settings, such as instance type, affect the cost of deployment. For cost estimates, see the pricing pages for each AWS service you will be using. Prices are subject to change.

Because this Quick Start uses AWS-native solution components, there are no costs or license requirements beyond AWS infrastructure costs. This also applies to Kibana, which the Quick Start deploys as the open-source visualization tool that comes with Amazon ES.

Tip: After you deploy the Quick Start, we recommend that you enable the AWS Cost and Usage Report to track costs associated with the Quick Start. This report delivers billing metrics to an S3 bucket in your account. It provides cost estimates based on usage throughout each month, and finalizes the data at the end of the month. For more information about the report, see the AWS documentation.


Architecture

Deploying this Quick Start for a new virtual private cloud (VPC) with default parameters builds the following clickstream analytics environment in the AWS Cloud.

Figure 1: Quick Start architecture for clickstream analytics on AWS

• A highly available architecture that spans two Availability Zones.*

• A VPC configured with public and private subnets according to AWS best practices, to provide you with your own virtual network on AWS.*

• In the public subnets:

– Managed network address translation (NAT) gateways to allow outbound internet access for resources in the private subnets.*

– A Linux bastion host in an Auto Scaling group to allow inbound Secure Shell (SSH) access to Amazon Elastic Compute Cloud (Amazon EC2) instances in public and private subnets.*

– A publicly accessible Amazon Redshift cluster for data aggregation, analysis, transformation, and creation of new clickstream datasets.


• In the private subnets, two web server instances running Apache in an Auto Scaling group, with Amazon Kinesis Agent installed.

• Security groups (stateful firewalls) at the EC2 instance level.

• An Application Load Balancer (ALB) to balance traffic between the two web servers. A separate target group is created for SSH access to the backend instances through the ALB, as an alternative to using the bastion host.

• A publicly accessible Amazon ES domain running Elasticsearch version 6.3 (the default) for indexing and searching the clickstream data.

• Three Kinesis Data Firehose delivery streams that deliver clickstream data to three destinations: Amazon S3, Amazon Redshift, and Amazon ES.

• An Amazon S3 bucket for the Kinesis Data Firehose delivery stream.

• Integration with other AWS services such as Amazon S3, Amazon Kinesis Data Firehose, Amazon ES with Kibana, and Amazon QuickSight.

• AWS Identity and Access Management (IAM) roles to provide permissions to access AWS resources, for example, permitting Amazon ES to access VPC resources and allowing Kinesis Data Firehose to access Amazon S3, Amazon Redshift, and Amazon ES.

• Amazon Simple Notification Service (Amazon SNS) to notify you about automatic scaling operations and about rollback of AWS CloudFormation stack creation.

• Optional demo data. If you choose to include it, two schemas are created in Amazon Redshift and loaded with sample data: one dataset of a website's traffic from Google Analytics, and one of clickstream data for January 2015, released by Wikipedia. For more information, see Quick Start datasets, later in this guide.

* The template that deploys the Quick Start into an existing VPC skips the components marked by asterisks and prompts you for your existing VPC configuration.

Figure 2 shows how these components work together in a typical end-to-end process flow.


Figure 2: Clickstream analytics process flow

• As users navigate the website, their clickstream data is captured in the web server logs, and Kinesis Agent, installed on the web servers, sends it to the Kinesis Data Firehose delivery stream.

• After the data is processed, the Kinesis Data Firehose delivery stream sends it in near real time to Amazon Redshift.

• Kinesis Data Firehose with an Amazon S3 destination persists the managed feed to a curated-datasets bucket in Amazon S3.

• Kinesis Data Firehose with an Amazon ES destination stores and indexes the dataset in Amazon ES.

• Amazon CloudWatch metrics monitor the health of the services.
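The first step of this flow depends on how Kinesis Agent is configured on each web server. The guide does not reproduce the Quick Start's agent configuration, so the sketch below is an assumption: it generates a minimal configuration with one flow that tails Apache access logs and converts each combined-format line to JSON before sending it to a single delivery stream. The keys `cloudwatch.emitMetrics`, `flows`, `filePattern`, `deliveryStream`, and `dataProcessingOptions` (with `LOGTOJSON` and `COMBINEDAPACHELOG`) are Kinesis Agent settings; the log path, stream name, and function name are hypothetical.

```python
import json


def kinesis_agent_config(delivery_stream, log_pattern="/var/log/httpd/access_log*"):
    """Return a Kinesis Agent config that sends Apache access logs to one Firehose stream."""
    return {
        "cloudwatch.emitMetrics": True,  # publish agent metrics to Amazon CloudWatch
        "flows": [
            {
                "filePattern": log_pattern,  # hypothetical Apache log location
                "deliveryStream": delivery_stream,
                # Parse each combined-format Apache line into a JSON record.
                "dataProcessingOptions": [
                    {"optionName": "LOGTOJSON", "logFormat": "COMBINEDAPACHELOG"}
                ],
            }
        ],
    }


print(json.dumps(kinesis_agent_config("clickstream-delivery-stream"), indent=2))
```

The agent reads its configuration from /etc/aws-kinesis/agent.json by default, so the printed JSON would be written there and the agent restarted. Whether records should be JSON or delimited text depends on the destination; a Redshift destination loaded through a column list may need a different format than the Amazon ES destination.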

If you need to transform the clickstream data (website tracking logs), you can add an AWS Lambda function to the Kinesis Data Firehose delivery stream, or create a Kinesis Data Analytics application that uses custom Structured Query Language (SQL). Neither option is included in this Quick Start.
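As an illustration of the Lambda option, the sketch below follows the Kinesis Data Firehose data-transformation contract: the function receives base64-encoded records, each with a `recordId`, and must return every `recordId` with a `result` of `Ok`, `Dropped`, or `ProcessingFailed` plus base64-encoded `data`. The filtering rule (dropping records whose `agent` field mentions "bot") and the assumption that records are JSON objects with an `agent` field are hypothetical choices for this example.

```python
import base64
import json


def lambda_handler(event, context):
    """Firehose transformation: drop bot traffic, pass other records through unchanged."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
        except ValueError:  # bad base64, bad UTF-8, or invalid JSON
            output.append({"recordId": record["recordId"],
                           "result": "ProcessingFailed",
                           "data": record["data"]})
            continue
        agent = payload.get("agent", "") if isinstance(payload, dict) else ""
        result = "Dropped" if "bot" in agent.lower() else "Ok"
        # Returning the original base64 data leaves kept records unmodified.
        output.append({"recordId": record["recordId"],
                       "result": result,
                       "data": record["data"]})
    return {"records": output}
```

Every input `recordId` appears exactly once in the response, which Kinesis Data Firehose requires; records marked `ProcessingFailed` are treated as failures by the delivery stream rather than silently discarded.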

You can run ad hoc queries on the data in Amazon S3 with Amazon Athena, create and share visualization dashboards on the Amazon Redshift data with Amazon QuickSight, and use Kibana to visualize the data in Amazon ES. Amazon Athena and Amazon QuickSight, however, are not included in this Quick Start.


Planning the deployment

Specialized knowledge

This Quick Start assumes familiarity with website traffic logs and data visualization software like Amazon QuickSight, Kibana, or Tableau.

This deployment guide also requires a moderate level of familiarity with AWS services. If you're new to AWS, visit the Getting Started Resource Center and the AWS Training and Certification website for materials and programs that can help you develop the skills to design, deploy, and operate your infrastructure and applications on the AWS Cloud.

AWS account

If you don't already have an AWS account, create one at https://aws.amazon.com by following the on-screen instructions. Part of the sign-up process involves receiving a phone call and entering a PIN using the phone keypad.

Your AWS account is automatically signed up for all AWS services. You are charged only for the services you use.

Technical requirements

Before you launch the Quick Start, your account must be configured as specified in the following table. Otherwise, deployment might fail.

Resources: If necessary, request service limit increases for the following resources. You might need to do this if you already have an existing deployment that uses these resources and you think you might exceed the default limits with this deployment. For default limits, see the AWS documentation. AWS Trusted Advisor offers a service limits check that displays your usage and limits for some aspects of some services.

Resource: number this deployment uses

• VPCs: 1

• Elastic IP addresses: 1

• Security groups: 3

• IAM roles: 8

• Auto Scaling groups: 2

• Application Load Balancers: 1

• t2.micro instances: 3


Regions: This deployment includes Amazon Kinesis Data Firehose, which isn't currently supported in all AWS Regions. For a current list of supported Regions, see AWS Regions and Endpoints in the AWS documentation.

Key pair: Make sure that at least one Amazon EC2 key pair exists in your AWS account in the Region where you are planning to deploy the Quick Start. Make note of the key pair name. You'll be prompted for this information during deployment. To create a key pair, follow the instructions in the AWS documentation. If you're deploying the Quick Start for testing or proof-of-concept purposes, we recommend that you create a new key pair instead of specifying a key pair that's already being used by a production instance.

IAM permissions: To deploy the Quick Start, you must log in to the AWS Management Console with IAM permissions for the resources and actions the templates will deploy. The AdministratorAccess managed policy within IAM provides sufficient permissions, although your organization may choose to use a custom policy with more restrictions.

Deployment options

This Quick Start provides two deployment options:

• Deploy clickstream analytics into a new VPC (end-to-end deployment). This option builds a new AWS environment consisting of the VPC, subnets, NAT gateways, security groups, bastion hosts, and other infrastructure components, and then deploys clickstream analytics into this new VPC.

• Deploy clickstream analytics into an existing VPC. This option provisions clickstream analytics in your existing AWS infrastructure.

The Quick Start provides separate templates for these options. It also lets you configure CIDR blocks, instance types, and clickstream analytics settings, as discussed later in this guide.

Deployment steps

Step 1. Sign in to your AWS account

1. Sign in to your AWS account at https://aws.amazon.com with an IAM user or role that has the necessary permissions. For details, see Planning the deployment earlier in this guide.

2. Make sure that your AWS account is configured correctly, as discussed in the Technical requirements section.

3. Create a service-linked role for Amazon ES, if you do not already have one in your AWS account.
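Step 3 can also be scripted. The sketch below uses the AWS SDK for Python (Boto3) calls `get_role` and `create_service_linked_role`: it looks for the Amazon ES service-linked role, `AWSServiceRoleForAmazonElasticsearchService`, and creates it for the service principal `es.amazonaws.com` only if it is missing. The function name is hypothetical, and the client is passed in so the logic can be exercised without AWS credentials.

```python
ES_ROLE_NAME = "AWSServiceRoleForAmazonElasticsearchService"


def ensure_es_service_linked_role(iam):
    """Create the Amazon ES service-linked role if it doesn't exist yet.

    `iam` is a Boto3 IAM client, for example boto3.client("iam").
    Returns True if the role was created, False if it already existed.
    """
    try:
        iam.get_role(RoleName=ES_ROLE_NAME)
        return False  # role already present; nothing to do
    except iam.exceptions.NoSuchEntityException:
        iam.create_service_linked_role(AWSServiceName="es.amazonaws.com")
        return True


# Usage (requires Boto3 and credentials with IAM permissions):
#   import boto3
#   ensure_es_service_linked_role(boto3.client("iam"))
```

The equivalent AWS CLI command is `aws iam create-service-linked-role --aws-service-name es.amazonaws.com`, which returns an error if the role already exists.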


Step 2. Launch the Quick Start

Notes: The instructions in this section reflect the older version of the AWS CloudFormation console. If you're using the redesigned console, some of the user interface elements might be different.

You are responsible for the cost of the AWS services used while running this Quick Start reference deployment. There is no additional cost for using this Quick Start. For full details, see the pricing pages for each AWS service you will be using in this Quick Start. Prices are subject to change.

1. Sign in to your AWS account, and choose one of the following options to launch the AWS CloudFormation template. For help choosing an option, see Deployment options earlier in this guide.

• Deploy clickstream analytics into a new VPC on AWS

• Deploy clickstream analytics into an existing VPC on AWS

Important: If you're deploying clickstream analytics into an existing VPC, make sure that your VPC has two private subnets in different Availability Zones for the workload instances and that the subnets aren't shared. This Quick Start doesn't support shared subnets. These subnets require NAT gateways or NAT instances in their route tables to allow the instances to download packages and software without exposing them to the internet. You will also need the domain name option configured in the Dynamic Host Configuration Protocol (DHCP) options, as explained in the Amazon VPC documentation. You will be prompted for your VPC settings when you launch the Quick Start.

Each deployment takes about 30 minutes to complete.

2. Check the AWS Region that's displayed in the upper-right corner of the navigation bar, and change it if necessary. This is where the network infrastructure for clickstream analytics will be built. The template is launched in the US West (Oregon) Region by default.



Note: This deployment includes services that aren't supported in all AWS Regions. For a list of supported Regions, see the pages for Amazon Kinesis Data Firehose and Amazon QuickSight.

3. On the Select Template page, keep the default setting for the template URL, and then choose Next.

4. On the Specify Details page, change the stack name if needed. Review the parameters for the template. Provide values for the parameters that require input. For all other parameters, review the default settings and customize them as necessary.

In the following tables, parameters are listed by category and described separately for the two deployment options:

– Parameters for deploying clickstream analytics into a new VPC

– Parameters for deploying clickstream analytics into an existing VPC

When you finish reviewing and customizing the parameters, choose Next.
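If you prefer to launch the stack programmatically instead of through the console, the parameters from the following tables are passed to CloudFormation as a list of `ParameterKey`/`ParameterValue` pairs. The sketch below builds that list and refuses to proceed if a parameter marked "Requires input" for the new-VPC option is missing. `WebsiteContent` is treated as optional because its description says to leave it blank when there is no site. The function name and `TEMPLATE_URL` are hypothetical placeholders; the guide does not list the template URL.

```python
# Parameters marked "Requires input" for the new-VPC option (WebsiteContent may be blank).
REQUIRED_NEW_VPC = {
    "AvailabilityZones", "KeyPairName", "RemoteAccessCIDR", "AppKeyPairName",
    "OperatorEMail", "ESDomainName", "ESIndex", "ESType", "DatabaseName",
    "MasterUserPassword", "RedshiftColumns",
}


def build_stack_parameters(values, required=REQUIRED_NEW_VPC):
    """Turn {name: value} into CloudFormation's Parameters list, checking required names."""
    missing = sorted(required - set(values))
    if missing:
        raise ValueError(f"missing required parameters: {', '.join(missing)}")
    params = []
    for key, value in sorted(values.items()):
        if isinstance(value, (list, tuple)):
            value = ",".join(value)  # list parameters such as AvailabilityZones
        params.append({"ParameterKey": key, "ParameterValue": str(value)})
    return params


# Usage sketch with Boto3 (TEMPLATE_URL is a hypothetical placeholder):
#   cfn = boto3.client("cloudformation")
#   cfn.create_stack(StackName="clickstream-analytics", TemplateURL=TEMPLATE_URL,
#                    Parameters=build_stack_parameters(values),
#                    Capabilities=["CAPABILITY_IAM"])
```

Because the template creates IAM roles, CloudFormation requires an IAM capability acknowledgment; if the template gives its roles explicit names, `CAPABILITY_NAMED_IAM` is needed instead of `CAPABILITY_IAM`.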

OPTION 1: PARAMETERS FOR DEPLOYING CLICKSTREAM ANALYTICS INTO A NEW VPC


Network configuration:

• Availability Zones (AvailabilityZones). Default: requires input. The list of Availability Zones to use for the subnets in the VPC. The Quick Start uses two Availability Zones from your list and preserves the logical order you specify.

• VPC CIDR (VPCCIDR). Default: 10.0.0.0/16. The CIDR block for the VPC.

• Private subnet 1 CIDR (PrivateSubnet1CIDR). Default: 10.0.0.0/19. The CIDR block for the private subnet located in Availability Zone 1.

• Private subnet 2 CIDR (PrivateSubnet2CIDR). Default: 10.0.32.0/19. The CIDR block for the private subnet located in Availability Zone 2.

• Public subnet 1 CIDR (PublicSubnet1CIDR). Default: 10.0.128.0/20. The CIDR block for the public subnet located in Availability Zone 1.

• Public subnet 2 CIDR (PublicSubnet2CIDR). Default: 10.0.144.0/20. The CIDR block for the public subnet located in Availability Zone 2.
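If you change the CIDR defaults, each subnet must fit inside the VPC CIDR and the subnets must not overlap. The sketch below checks both conditions with Python's standard `ipaddress` module before you launch; the `check_cidrs` function is a hypothetical helper, and the dictionary keys mirror the parameter names above.

```python
import ipaddress
from itertools import combinations

DEFAULTS = {
    "VPCCIDR": "10.0.0.0/16",
    "PrivateSubnet1CIDR": "10.0.0.0/19",
    "PrivateSubnet2CIDR": "10.0.32.0/19",
    "PublicSubnet1CIDR": "10.0.128.0/20",
    "PublicSubnet2CIDR": "10.0.144.0/20",
}


def check_cidrs(params):
    """Return problems: subnets outside the VPC CIDR, or subnets that overlap."""
    vpc = ipaddress.ip_network(params["VPCCIDR"])
    subnets = {name: ipaddress.ip_network(value)
               for name, value in params.items() if name != "VPCCIDR"}
    problems = []
    for name, net in subnets.items():
        if not net.subnet_of(vpc):
            problems.append(f"{name} {net} is outside the VPC {vpc}")
    for (name_a, net_a), (name_b, net_b) in combinations(subnets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{name_a} {net_a} overlaps {name_b} {net_b}")
    return problems


print(check_cidrs(DEFAULTS))  # []
```

With the defaults, the two /19 private subnets cover 10.0.0.0 to 10.0.63.255 and the two /20 public subnets cover 10.0.128.0 to 10.0.159.255, leaving the rest of the /16 free for later growth.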


Bastion host configuration:

• Bastion host key pair name (KeyPairName). Default: requires input. A public/private key pair, which allows you to connect securely to your instance after it launches. This is the key pair you created in your preferred AWS Region; see the Technical requirements section. If you don't have a key pair in this Region, create it before continuing.

• Bastion instance type (BastionInstanceType). Default: t2.micro. The bastion host EC2 instance type.

• Allowed CIDR for external access to bastion (RemoteAccessCIDR). Default: requires input. A CIDR block that's allowed external access to the bastion. We recommend that you use a constrained CIDR range to reduce the potential of inbound attacks from unknown IP addresses (see http://checkip.dyndns.org/).

• Bastion AMI operating system (BastionAMIOS). Default: Amazon-Linux-HVM. The Amazon Linux distribution for the Amazon Machine Image (AMI) to be used for the bastion instances.
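The most constrained value for RemoteAccessCIDR is a /32 block that admits a single address, such as the public IP reported by http://checkip.dyndns.org/. The sketch below turns an IPv4 address into that form and flags the wide-open 0.0.0.0/0 range; both helper names are hypothetical.

```python
import ipaddress


def remote_access_cidr(public_ip):
    """Return a /32 CIDR that admits only the given IPv4 address."""
    address = ipaddress.IPv4Address(public_ip)  # raises ValueError for invalid input
    return str(ipaddress.IPv4Network(f"{address}/32"))


def is_wide_open(cidr):
    """True if the CIDR admits every IPv4 address (0.0.0.0/0)."""
    return ipaddress.IPv4Network(cidr).prefixlen == 0


print(remote_access_cidr("203.0.113.25"))  # 203.0.113.25/32
```

A /32 range stops working if your public IP changes, for example on a home or mobile connection; a small office range (such as a /28) is a common compromise.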

Web server configuration

• Application instance type (AppInstanceType). Default: t2.micro. The application server EC2 instance type.

• Application host key pair name (AppKeyPairName). Default: requires input. Public/private key pairs allow you to securely connect to your application instance after it launches. If you don't have a key pair in this Region, create it before continuing.

SNS configuration

• Email ID to receive alert notifications (OperatorEMail). Default: requires input. The email address that receives notifications about scaling operations.

Encrypt data configuration

• Encrypt data at rest (EncryptData). Default: no. Set to yes to encrypt the data as it leaves your Amazon Kinesis Data Firehose delivery stream.


Amazon ES configuration

• Amazon ES domain name (ESDomainName). Default: requires input. The user-defined Amazon ES domain name.

• Amazon ES version (ESVersion). Default: 6.3. The user-defined Amazon ES version.

• Number of instances to run in cluster (ESClusterInstanceCount). Default: 1. The number of instances in the Amazon ES cluster. For two Availability Zones, you must choose a multiple of two.

• Cluster instance type (ESInstanceType). Default: m4.large.elasticsearch. The instance type for Amazon ES nodes.

• Instance type for dedicated master (DedicatedMasterType). Default: m4.large.elasticsearch. The instance type for Amazon ES dedicated master nodes.

• Number of dedicated masters in a cluster (DedicatedMasterCount). Default: 0. The number of dedicated master instances to run. Keep the default if you don't want dedicated master instances.

• Provisioned IOPS (IOPS). Default: 0. The Provisioned IOPS value for the EBS volumes. If you set it, it must be an integer between 1000 and 16000.

• EBS volume size (VolumeSize). Default: 10. The EBS volume size, in GB, for each instance. The total cluster storage is the EBS volume size multiplied by the instance count.

• Everyday snapshot time (AutomatedSnapshotStartHour). Default: 0. The hour of the day (0 to 23, UTC) at which Amazon ES takes its daily automated snapshot.

• EBS volume type (VolumeType). Default: gp2. The type of EBS volume used for instances in the cluster.
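The constraints in this table can be checked before launch. The sketch below encodes the rules stated above: an even instance count across two Availability Zones, a Provisioned IOPS value between 1000 and 16000, and a snapshot hour between 0 and 23. It also reports total storage as volume size multiplied by instance count. Treating an IOPS value of 0 as "Provisioned IOPS not used", the `two_availability_zones` flag, and the function name are assumptions made for this example.

```python
def check_es_settings(instance_count=1, volume_size_gb=10, iops=0,
                      snapshot_hour=0, two_availability_zones=False):
    """Pre-check Amazon ES parameters; defaults mirror the Quick Start defaults."""
    problems = []
    if two_availability_zones and instance_count % 2 != 0:
        problems.append("ESClusterInstanceCount must be a multiple of two "
                        "for two Availability Zones")
    # Assumption: 0 means Provisioned IOPS is not used.
    if iops != 0 and not 1000 <= iops <= 16000:
        problems.append("IOPS must be an integer between 1000 and 16000")
    if not 0 <= snapshot_hour <= 23:
        problems.append("AutomatedSnapshotStartHour must be between 0 and 23")
    return {
        "total_storage_gb": volume_size_gb * instance_count,
        "problems": problems,
    }


print(check_es_settings())  # {'total_storage_gb': 10, 'problems': []}
```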

Amazon Elasticsearch destination configuration for Amazon Kinesis Data Firehose:

• Index name (ESIndex). Default: requires input. The name of the index in the Amazon ES domain.

• Type name (ESType). Default: requires input. The name of the Amazon ES type.

• Index rotation (ESIndexRotation). Default: NoRotation. The frequency at which the Amazon ES index is rotated.


• Buffer interval (ESBufferInterval). Default: 300. The number of seconds (60 to 900) to buffer data before delivering it to Amazon ES.

• Buffer size (ESBufferSize). Default: 5. The amount of data, in MB (1 to 100), to buffer before delivering it to Amazon ES.
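When ESIndexRotation is set to anything other than NoRotation, Kinesis Data Firehose appends part of the record's UTC arrival timestamp to the index name you supply in ESIndex, so each period gets its own index. The sketch below reproduces that naming for the hourly, daily, and monthly options as the Firehose documentation describes them. OneWeek is deliberately omitted because its week-number suffix is not reproduced here; the function name and the example index name are hypothetical.

```python
from datetime import datetime, timezone

# strftime suffixes appended to the index name for each rotation option.
SUFFIX_FORMATS = {
    "NoRotation": None,
    "OneHour": "%Y-%m-%d-%H",
    "OneDay": "%Y-%m-%d",
    "OneMonth": "%Y-%m",
}


def rotated_index_name(index, rotation, arrival):
    """Return the index a record arriving at `arrival` (aware datetime) is written to."""
    if rotation not in SUFFIX_FORMATS:
        raise ValueError(f"rotation not covered by this sketch: {rotation}")
    fmt = SUFFIX_FORMATS[rotation]
    if fmt is None:
        return index
    return f"{index}-{arrival.astimezone(timezone.utc).strftime(fmt)}"


arrival = datetime(2019, 9, 14, 13, 5, tzinfo=timezone.utc)
print(rotated_index_name("clickstream", "OneDay", arrival))  # clickstream-2019-09-14
```

Rotation keeps individual indexes small and makes it easy to delete old data by dropping whole indexes; Kibana index patterns such as clickstream-* can still query across all of them.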

Amazon Redshift cluster configuration

• Database name (DatabaseName). Default: requires input. The name of the first database to be created when the Amazon Redshift cluster is created.

• Cluster type (ClusterType). Default: single-node. The type of Amazon Redshift cluster.

• Number of nodes (NumberOfNodes). Default: 1. The number of compute nodes in the Amazon Redshift cluster. For multi-node clusters, the NumberOfNodes parameter must be greater than 1.

• Node type (NodeType). Default: dc2.large. The type of Amazon Redshift node to be provisioned.

• Redshift port number (RedshiftPortNumber). Default: 5439. The Amazon Redshift publicly accessible port number.

• Include demo data (isDemo). Default: no. Set to yes if you want to ingest demo data into Amazon Redshift.

Amazon Redshift configuration for Amazon Kinesis Data Firehose

• Master user name (MasterUser). Default: masteruser. The name of the master user of the Amazon Redshift cluster.

• Master user password (MasterUserPassword). Default: requires input. The master user password for the Amazon Redshift cluster.

• Table name (RedshiftTableName). Default: apache_logs.access_logs. The name of the table in the Amazon Redshift cluster. Do not change it.

• Column pattern (RedshiftColumns). Default: requires input. The comma-separated list of the columns in the destination Amazon Redshift table.

• Buffer interval (RedshiftBufferInterval). Default: 300. The number of seconds (60 to 900) to buffer data before delivering it to Amazon Redshift.

• Buffer size (RedshiftBufferSize). Default: 5. The amount of data, in MB (1 to 128), to buffer before delivering it to Amazon S3.
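Two Amazon Redshift values are common causes of failed launches: a node count that doesn't match the cluster type and a master password that Amazon Redshift rejects. The sketch below pre-checks both. The node rule comes from the table above. The password rules follow Amazon Redshift's documented master-password constraints (8 to 64 characters; at least one uppercase letter, one lowercase letter, and one digit; printable ASCII except ' " \ / @ and space). The template may enforce its own pattern, so treat this as a hypothetical pre-check only.

```python
import string

FORBIDDEN_PASSWORD_CHARS = set("'\"\\/@")


def redshift_password_problems(password):
    """List the Amazon Redshift master-password rules that `password` breaks."""
    problems = []
    if not 8 <= len(password) <= 64:
        problems.append("must be 8 to 64 characters long")
    if not any(c in string.ascii_uppercase for c in password):
        problems.append("needs an uppercase letter")
    if not any(c in string.ascii_lowercase for c in password):
        problems.append("needs a lowercase letter")
    if not any(c in string.digits for c in password):
        problems.append("needs a digit")
    # Printable ASCII is codes 33 to 126, which also excludes the space character.
    if any(not 33 <= ord(c) <= 126 or c in FORBIDDEN_PASSWORD_CHARS for c in password):
        problems.append("only printable ASCII other than ' \" \\ / @ and space is allowed")
    return problems


def redshift_node_problems(cluster_type, number_of_nodes):
    """Apply the rule that multi-node clusters need more than one node."""
    if cluster_type == "multi-node" and number_of_nodes <= 1:
        return ["NumberOfNodes must be greater than 1 for a multi-node cluster"]
    return []
```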


Amazon S3 destination configuration for Amazon Kinesis Data Firehose:

• Buffer interval (S3BufferInterval). Default: 300. The number of seconds (60 to 900) to buffer data before delivering it to Amazon S3.

• Buffer size (S3BufferSize). Default: 5. The amount of data, in MB (1 to 128), to buffer before delivering it to Amazon S3.

• Destination prefix (S3DestinationPrefix). Default: AggregatedData. The name of the prefix where the aggregated data will be stored.
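Each delivery stream has a buffer interval and a buffer size, and Kinesis Data Firehose delivers a batch as soon as either threshold is reached, whichever comes first. The sketch below checks the ranges given in the three destination tables and models that flush rule. The function names are hypothetical, and treating MB as MiB (1,048,576 bytes) is an assumption.

```python
# Buffer size ranges in MB from the parameter tables; every interval range is 60 to 900 s.
SIZE_RANGE_MB = {"s3": (1, 128), "redshift": (1, 128), "elasticsearch": (1, 100)}
INTERVAL_RANGE_S = (60, 900)


def check_buffer_hints(destination, size_mb, interval_s):
    """Return range violations for one destination's buffering hints."""
    low, high = SIZE_RANGE_MB[destination]
    problems = []
    if not low <= size_mb <= high:
        problems.append(f"buffer size must be {low} to {high} MB for {destination}")
    if not INTERVAL_RANGE_S[0] <= interval_s <= INTERVAL_RANGE_S[1]:
        problems.append("buffer interval must be 60 to 900 seconds")
    return problems


def flush_due(buffered_bytes, seconds_since_last_flush, size_mb, interval_s):
    """True once either buffering hint is reached, whichever comes first."""
    size_limit = size_mb * 1024 * 1024  # assumption: MB means MiB
    return buffered_bytes >= size_limit or seconds_since_last_flush >= interval_s
```

With the defaults (5 MB, 300 seconds), a low-traffic site flushes every 5 minutes, while a busy site flushes each time 5 MB accumulates. Lowering the interval reduces latency in Amazon ES and Amazon Redshift at the cost of more, smaller deliveries.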

Custom website configuration:

• Website content S3 bucket (WebsiteContent). Default: requires input. The Amazon S3 location where your custom website contents are uploaded. This Quick Start will deploy your website from this location. Leave blank if there is no site to deploy.

AWS Quick Start configuration:

Note: We recommend that you keep the default settings for the following two parameters, unless you are customizing the Quick Start templates for your own deployment projects. Changing the settings of these parameters will automatically update code references to point to a new Quick Start location. For additional details, see the AWS Quick Start Contributor's Guide.

Parameter label

(name) Default Description

Quick Start S3 bucket

name

(QSS3BucketName)

aws-quickstart The S3 bucket you created for your copy of Quick Start assets,

if you decide to customize or extend the Quick Start for your

own use. The bucket name can include numbers, lowercase

letters, uppercase letters, and hyphens, but should not start or

end with a hyphen.

Quick Start S3 key

prefix

(QSS3KeyPrefix)

quickstart-ct-

clickstream-

analytics/

The S3 key name prefix used to simulate a folder for your copy

of Quick Start assets, if you decide to customize or extend the

Quick Start for your own use. This prefix can include numbers,

lowercase letters, uppercase letters, hyphens, and forward

slashes.
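The naming rules for QSS3BucketName and QSS3KeyPrefix can be expressed as regular expressions. The following sketch is an assumption that restates only the rules in the table; it is not the template's own validation, and Amazon S3 applies additional bucket-naming rules of its own. The sketch also assumes an empty key prefix is acceptable.

```python
import re

# Assumed patterns that restate the rules in the table above.
# Bucket: letters, digits, and hyphens; no leading or trailing hyphen.
BUCKET_NAME = re.compile(r"[0-9A-Za-z](?:[0-9A-Za-z-]*[0-9A-Za-z])?")
# Key prefix: letters, digits, hyphens, and forward slashes (empty accepted).
KEY_PREFIX = re.compile(r"[0-9A-Za-z/-]*")


def is_valid_bucket_name(name: str) -> bool:
    """Check QSS3BucketName against the stated character rules."""
    return BUCKET_NAME.fullmatch(name) is not None


def is_valid_key_prefix(prefix: str) -> bool:
    """Check QSS3KeyPrefix against the stated character rules."""
    return KEY_PREFIX.fullmatch(prefix) is not None
```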


OPTION 2: PARAMETERS FOR DEPLOYING CLICKSTREAM ANALYTICS INTO AN EXISTING VPC

View template

Network configuration:

Parameter label (name) | Default | Description
Availability Zones (AvailabilityZones) | Requires input | The list of Availability Zones to use for the subnets in the VPC. The Quick Start uses two Availability Zones from your list and preserves the logical order you specify.
Existing VPC ID (VPC) | Requires input | Choose an existing VPC.

Subnet configuration:

Parameter label (name) | Default | Description
Existing public subnet ID in AZ-1 (PublicSubnetA) | Requires input | The public subnet in Availability Zone 1.
Existing public subnet ID in AZ-2 (PublicSubnetB) | Requires input | The public subnet in Availability Zone 2.
Existing private subnet ID in AZ-1 (PrivateSubnetA) | Requires input | The private subnet in Availability Zone 1.
Existing private subnet ID in AZ-2 (PrivateSubnetB) | Requires input | The private subnet in Availability Zone 2.

Bastion host configuration:

Parameter label (name) | Default | Description
Bastion host key pair name (KeyPairName) | Requires input | A public/private key pair, which allows you to connect securely to your instance after it launches. This is the key pair you created in your preferred AWS Region; see the Technical requirements section. If you do not have one in this Region, create it before continuing.
Bastion instance type (BastionInstanceType) | t2.micro | The bastion host EC2 instance type.
Allowed CIDR for external access to bastion (RemoteAccessCIDR) | Requires input | A CIDR block that’s allowed external access to the bastion. We recommend that you use a constrained CIDR range to reduce the potential of inbound attacks from unknown IP addresses (to find your current public IP address, see http://checkip.dyndns.org/).
Bastion AMI operating system (BastionAMIOS) | Amazon-Linux-HVM | The Amazon Linux distribution for the Amazon Machine Image (AMI) to be used for the bastion instances.
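RemoteAccessCIDR is an ordinary CIDR block, so before deployment you can check whether your current public IP address falls inside it and how broad the block is. This sketch uses Python's standard ipaddress module; the addresses in the tests come from the RFC 5737 documentation ranges and are placeholders.

```python
import ipaddress


def is_allowed(client_ip: str, remote_access_cidr: str) -> bool:
    """True if client_ip lies inside the RemoteAccessCIDR block."""
    # strict=True rejects blocks with host bits set, such as 203.0.113.25/24.
    network = ipaddress.ip_network(remote_access_cidr, strict=True)
    return ipaddress.ip_address(client_ip) in network


def address_count(remote_access_cidr: str) -> int:
    """How many addresses the block admits; smaller means more constrained."""
    return ipaddress.ip_network(remote_access_cidr, strict=True).num_addresses
```

A /32 block admits exactly one address, while 0.0.0.0/0 admits every IPv4 address and offers no protection against inbound attacks from unknown IP addresses.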

Web server configuration:

Parameter label (name) | Default | Description
Application instance type (AppInstanceType) | t2.micro | The application server EC2 instance type.
Application host key pair name (AppKeyPairName) | Requires input | A public/private key pair, which allows you to connect securely to your application instance after it launches. If you don’t have a key pair in this Region, create it before continuing.

SNS configuration:

Parameter label (name) | Default | Description
Email ID to receive alert notifications (OperatorEMail) | Requires input | The email address that receives notifications about scaling operations.

Encrypt data configuration:

Parameter label (name) | Default | Description
Encrypt data at rest (EncryptData) | no | Set to yes to encrypt the data as it leaves your Amazon Kinesis Data Firehose delivery stream.

Amazon ES configuration:

Parameter label (name) | Default | Description
Elasticsearch domain name (ESDomainName) | Requires input | The user-defined Amazon ES domain name.
Elasticsearch version (ESVersion) | 6.3 | The user-defined Amazon ES version.
Number of instances to run in cluster (ESClusterInstanceCount) | 1 | The number of instances to run in the Amazon ES cluster. For two Availability Zones, you must choose an even number of instances.
Cluster instance type (ESInstanceType) | m4.large.elasticsearch | The instance type for Amazon ES nodes.
Instance type for dedicated master (DedicatedMasterType) | m4.large.elasticsearch | The instance type for Amazon ES dedicated master nodes.
Number of dedicated masters in a cluster (DedicatedMasterCount) | 0 | The number of dedicated master nodes to run. Keep the default value if you don’t want a dedicated master instance.
IOPS for cluster (IOPS) | 0 | The provisioned IOPS value, which must be an integer between 1000 and 16000.
EBS volume size (VolumeSize) | 10 | The EBS volume size, in GB, for each instance. Total cluster storage equals the EBS volume size × the instance count.
Everyday snapshot time (AutomatedSnapshotStartHour) | 0 | The hour (UTC) at which daily automated snapshots start. The value must be between 0 and 23.
EBS volume type (VolumeType) | gp2 | The EBS volume type used by the instances in the cluster.
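The Amazon ES parameters interact: total storage depends on the per-instance EBS volume size and the instance count, and the instance count must be even when two Availability Zones are used. The sketch below checks those rules before launch. The `availability_zones` field is a hypothetical input that does not appear in the parameter table, and treating an IOPS value of 0 as "no provisioned IOPS" is an assumption based on the default.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EsParams:
    instance_count: int = 1      # ESClusterInstanceCount
    volume_size_gb: int = 10     # VolumeSize, per instance
    iops: int = 0                # IOPS; 0 assumed to mean "not provisioned"
    snapshot_hour: int = 0       # AutomatedSnapshotStartHour
    availability_zones: int = 1  # hypothetical input, not in the parameter table


def total_storage_gb(p: EsParams) -> int:
    """Total cluster storage = EBS volume size x instance count."""
    return p.volume_size_gb * p.instance_count


def problems(p: EsParams) -> List[str]:
    """Return the rule violations found; an empty list means none."""
    found = []
    if p.availability_zones == 2 and p.instance_count % 2 != 0:
        found.append("instance count must be a multiple of two for two AZs")
    if p.iops != 0 and not 1000 <= p.iops <= 16000:
        found.append("IOPS must be between 1000 and 16000")
    if not 0 <= p.snapshot_hour <= 23:
        found.append("snapshot hour must be between 0 and 23")
    return found
```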

Amazon Elasticsearch destination configuration for Amazon Kinesis Data Firehose:

Parameter label (name) | Default | Description
Index name (ESIndex) | Requires input | The name of the index in the Amazon ES domain.
Type name (ESType) | Requires input | The name of the Amazon ES type.
Index rotation (ESIndexRotation) | NoRotation | How often the Amazon ES index is rotated.
Buffer interval (ESBufferInterval) | 300 | The number of seconds to buffer data before delivering it to Amazon ES (60 to 900).
Buffer size (ESBufferSize) | 5 | The amount of data, in MB, to buffer before delivering it to Amazon ES (1 to 100).
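With index rotation enabled, Kinesis Data Firehose appends a portion of the UTC arrival timestamp to the ESIndex value, so documents land in a new index each period. The following sketch approximates the resulting index names. The suffix formats are assumptions based on the examples in the Firehose documentation, and OneWeek is deliberately left out because its week numbering is not modeled here. Pass timezone-aware datetimes.

```python
from datetime import datetime, timezone

# Assumed suffix formats per rotation option (OneWeek intentionally omitted).
SUFFIX_FORMATS = {
    "NoRotation": None,
    "OneHour": "%Y-%m-%d-%H",
    "OneDay": "%Y-%m-%d",
    "OneMonth": "%Y-%m",
}


def rotated_index_name(index: str, rotation: str, arrival: datetime) -> str:
    """Return the index a record arriving at `arrival` would be written to."""
    fmt = SUFFIX_FORMATS[rotation]  # KeyError for options this sketch omits
    if fmt is None:
        return index
    return f"{index}-{arrival.astimezone(timezone.utc).strftime(fmt)}"
```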


Amazon Redshift cluster configuration:

Parameter label (name) | Default | Description
Database name (DatabaseName) | Requires input | The name of the first database to be created when the Amazon Redshift cluster is created.
Cluster type (ClusterType) | single-node | The type of Amazon Redshift cluster.
Number of nodes (NumberOfNodes) | 1 | The number of compute nodes in the Amazon Redshift cluster. For multi-node clusters, NumberOfNodes must be greater than 1.
Node type (NodeType) | dc2.large | The type of Amazon Redshift node to be provisioned.
Redshift port number (RedshiftPortNumber) | 5439 | The publicly accessible port number of the Amazon Redshift cluster.
Include demo data (isDemo) | no | Set to yes if you want to ingest demo data into Amazon Redshift.
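ClusterType and NumberOfNodes must agree: a single-node cluster has exactly one node, and a multi-node cluster needs more than one. The sketch below checks that pairing and builds a JDBC URL from the cluster endpoint, RedshiftPortNumber, and DatabaseName, which is the form many SQL clients expect. The function names are hypothetical, and the endpoint in the tests is a made-up placeholder.

```python
def check_node_count(cluster_type: str, number_of_nodes: int) -> None:
    """Raise ValueError when ClusterType and NumberOfNodes disagree."""
    if cluster_type not in ("single-node", "multi-node"):
        raise ValueError(f"unknown ClusterType: {cluster_type}")
    if cluster_type == "single-node" and number_of_nodes != 1:
        raise ValueError("single-node clusters use exactly one node")
    if cluster_type == "multi-node" and number_of_nodes <= 1:
        raise ValueError("multi-node clusters need NumberOfNodes greater than 1")


def jdbc_url(endpoint: str, port: int, database: str) -> str:
    """Build a JDBC URL for connecting a SQL client to the cluster."""
    return f"jdbc:redshift://{endpoint}:{port}/{database}"
```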

Amazon Redshift configuration for Amazon Kinesis Data Firehose:

Parameter label (name) | Default | Description
Master user name (MasterUser) | masteruser | The name of the master user of the Amazon Redshift cluster.
Master user password (MasterUserPassword) | Requires input | The master user password for the Amazon Redshift cluster.
Table name (RedshiftTableName) | apache_logs.access_logs | The name of the table in the Amazon Redshift cluster. Do not change this value.
Column pattern (RedshiftColumns) | Requires input | The comma-separated list of the columns in the destination Amazon Redshift table.
Buffer interval (RedshiftBufferInterval) | 300 | The number of seconds to buffer data before delivering it to Amazon Redshift (60 to 900).
Buffer size (RedshiftBufferSize) | 5 | The amount of data, in MB, to buffer before delivering it to Amazon Redshift (1 to 128).
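The column pattern is a plain comma-separated list. Assuming you load the apache_logs.access_logs table described later in this guide, a value such as host,ident,authuser,datetime,request,response,bytes matches its columns. The hypothetical helper below normalizes a RedshiftColumns value and reports unknown, uncovered, and repeated names before you launch the stack.

```python
from typing import Dict, List, Sequence

# Columns of apache_logs.access_logs as listed in this guide (lowercase,
# because Amazon Redshift folds unquoted identifiers to lowercase).
ACCESS_LOG_COLUMNS = ("host", "ident", "authuser", "datetime",
                      "request", "response", "bytes")


def parse_columns(pattern: str) -> List[str]:
    """Split a RedshiftColumns value into trimmed, lowercase names."""
    return [c.strip().lower() for c in pattern.split(",") if c.strip()]


def column_report(pattern: str,
                  table_columns: Sequence[str] = ACCESS_LOG_COLUMNS) -> Dict[str, List[str]]:
    """List names unknown to the table, table columns not covered, and repeats."""
    cols = parse_columns(pattern)
    return {
        "unknown": [c for c in cols if c not in table_columns],
        "missing": [c for c in table_columns if c not in cols],
        "duplicates": sorted({c for c in cols if cols.count(c) > 1}),
    }
```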

Amazon S3 destination configuration for Amazon Kinesis Data Firehose:

Parameter label (name) | Default | Description
Buffer interval (S3BufferInterval) | 300 | The number of seconds to buffer data before delivering it to Amazon S3 (60 to 900).
Buffer size (S3BufferSize) | 5 | The amount of data, in MB, to buffer before delivering it to Amazon S3 (1 to 128).
Destination prefix (S3DestinationPrefix) | AggregatedData | The name of the prefix where the aggregated data is stored.

Custom website configuration:

Parameter label (name) | Default | Description
Website content S3 bucket (WebsiteContent) | Requires input | The Amazon S3 location where your custom website content is uploaded. This Quick Start deploys your website from this location. Leave blank if there is no site to deploy.

AWS Quick Start configuration:

Note We recommend that you keep the default settings for the following two parameters, unless you are customizing the Quick Start templates for your own deployment projects. Changing the settings of these parameters will automatically update code references to point to a new Quick Start location. For additional details, see the AWS Quick Start Contributor’s Guide.

Parameter label (name) | Default | Description
Quick Start S3 bucket name (QSS3BucketName) | aws-quickstart | The S3 bucket you created for your copy of Quick Start assets, if you decide to customize or extend the Quick Start for your own use. The bucket name can include numbers, lowercase letters, uppercase letters, and hyphens, but should not start or end with a hyphen.
Quick Start S3 key prefix (QSS3KeyPrefix) | quickstart-ct-clickstream-analytics/ | The S3 key name prefix used to simulate a folder for your copy of Quick Start assets, if you decide to customize or extend the Quick Start for your own use. This prefix can include numbers, lowercase letters, uppercase letters, hyphens, and forward slashes.

5. On the Options page, you can specify tags (key-value pairs) for resources in your stack and set advanced options. When you’re done, choose Next.

6. On the Review page, review and confirm the template settings. Under Capabilities, select the two check boxes to acknowledge that the template will create IAM resources and that it might require the capability to auto-expand macros.


7. Choose Create to deploy the stack.

8. Monitor the status of the stack. When the status is CREATE_COMPLETE, the Amazon ES cluster is ready.

9. Use the URLs displayed in the Outputs tab for the stack to view the resources that were created and to verify the deployment, as discussed in the next step.
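If you prefer to script steps 5 through 8 rather than use the console, the same inputs map onto the AWS CloudFormation CreateStack API. This sketch only builds the request dictionary and interprets a stack status; it does not call AWS. The stack name, template URL, and parameter values are placeholders, and passing the dictionary to boto3's `create_stack(**request)` is left to you. If the templates give IAM resources explicit names, CAPABILITY_NAMED_IAM is required instead of CAPABILITY_IAM.

```python
from typing import Dict, Optional


def build_create_stack_request(stack_name: str, template_url: str,
                               params: Dict[str, object],
                               tags: Optional[Dict[str, str]] = None) -> dict:
    """Shape a CreateStack request; keys follow the CloudFormation API."""
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        "Parameters": [
            {"ParameterKey": k, "ParameterValue": str(v)} for k, v in params.items()
        ],
        # Equivalent to the two check boxes under Capabilities on the Review page.
        "Capabilities": ["CAPABILITY_IAM", "CAPABILITY_AUTO_EXPAND"],
        "Tags": [{"Key": k, "Value": v} for k, v in (tags or {}).items()],
    }


def stack_state(status: str) -> str:
    """Collapse a CloudFormation stack status into ready/failed/waiting/other."""
    if status == "CREATE_COMPLETE":
        return "ready"
    if "ROLLBACK" in status or status.endswith("_FAILED"):
        return "failed"
    if status.endswith("_IN_PROGRESS"):
        return "waiting"
    return "other"
```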

Step 3. Test the deployment

When the Quick Start deployment is complete, you can validate and test the deployment by checking the resources in the Outputs tab of the AWS CloudFormation console.

Figure 3: Clickstream analytics outputs after successful deployment

Confirm the following:

– When you open the ALB endpoint (LoadBalancerDNSEndpoint) listed in the Outputs tab in a web browser, it displays the default Apache web server home page.
– The Amazon ES cluster (ElasticSearchDomainEndpoint) listed in the Outputs tab for the stack is available in the Amazon Elasticsearch Service console at https://console.aws.amazon.com/es/, and Kibana is accessible in a web browser at the location shown in the Amazon Elasticsearch Service console.
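If you script this check, the same values are available from the DescribeStacks API response, where each output is an OutputKey/OutputValue pair. This sketch works on a response dictionary shaped like the one boto3's `describe_stacks` returns and reports which of the two outputs named above are missing. The sample values in the tests are placeholders.

```python
from typing import Dict, List

# Output keys named in this section of the guide.
EXPECTED = ("LoadBalancerDNSEndpoint", "ElasticSearchDomainEndpoint")


def outputs_by_key(describe_stacks_response: dict) -> Dict[str, str]:
    """Flatten the first stack's Outputs list into {OutputKey: OutputValue}."""
    stack = describe_stacks_response["Stacks"][0]
    return {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}


def missing_outputs(outputs: Dict[str, str]) -> List[str]:
    """Return the expected output keys that the stack did not report."""
    return [key for key in EXPECTED if key not in outputs]
```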


Note S3 buckets are retained after you delete the AWS CloudFormation stacks created by this Quick Start, so your sample data for the Clickstream Analytics solution remains available in your AWS account. To remove those buckets, delete the contents of each bucket, and then delete each bucket. For more information, see the Amazon S3 documentation.
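The cleanup described in this note can be scripted. This sketch takes a boto3-style S3 client (for example, `boto3.client("s3")`), deletes the objects page by page, and then deletes the bucket. It assumes versioning is not enabled on the bucket; a versioned bucket also requires deleting object versions and delete markers, which this sketch does not do.

```python
def empty_and_delete_bucket(s3, bucket: str) -> int:
    """Delete every object in `bucket`, then the bucket; return objects deleted.

    `s3` is expected to behave like boto3.client("s3"). Object versions are
    not removed, so this assumes an unversioned bucket.
    """
    deleted = 0
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # An empty page has no "Contents" key.
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            # A list_objects_v2 page holds at most 1,000 keys, which is also
            # the DeleteObjects per-request limit.
            s3.delete_objects(Bucket=bucket, Delete={"Objects": keys})
            deleted += len(keys)
    s3.delete_bucket(Bucket=bucket)
    return deleted
```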

Quick Start datasets

You can deploy this Quick Start with optional sample datasets and later extend it with your own dataset when needed.

If you opted to deploy the Quick Start with sample data in Amazon Redshift, a clickstream_demo schema is created in the Amazon Redshift cluster during deployment. The clickstream_demo schema contains the following two tables:

ga_demo_data: This table holds sample data from Google Analytics for a website's traffic.

Column | Data type
CIT | VARCHAR(100)
COUNTRY_REGION | VARCHAR(1000)
DATE | DATE
EXITS | INTEGER
MEDIUM | VARCHAR(1000)
NUMBER_OF_RECORDS | INTEGER
PAGE | VARCHAR(1000)
PAGEVIEWS | INTEGER
SECTION | VARCHAR(1000)
TIME_ON_PAGE | INTEGER
TOTAL_DOWNLOADS | INTEGER
UNIQUE_VISITORS | INTEGER
VISITS | INTEGER

schematic_log: This table holds a sample dataset of clickstream data for January 2015 released by Wikipedia. The dataset contains referrer-article pairs from the English-language desktop version of Wikipedia, a sample of the 4 billion total requests made in January 2015.

Column | Data type
ACTION | VARCHAR(100)
BYTES | INTEGER
ITEM | VARCHAR(1000)
NUMBER_OF_PURCHASES | INTEGER
RESPONSE | VARCHAR(100)
PCT_PURCHASE | NUMERIC(8,2)
NUMBER_OF_VIEWS | INTEGER
BRAND | VARCHAR(1000)
CLICKHERE | VARCHAR(1000)
CATEGORY | VARCHAR(1000)
CLIENTIP | VARCHAR(1000)
ITEMID | VARCHAR(1000)
MSG | VARCHAR(1000)
NUMBER_OF_RECORDS | INTEGER
PRODUCTID | VARCHAR(1000)
RBYTES | INTEGER
RSTAT | INTEGER
SERVERIP | VARCHAR(1000)
SESSIONID | VARCHAR(1000)
TIMESTAMP | TIMESTAMP
URL | VARCHAR(2000)

Optional: Analyzing and visualizing data with Amazon QuickSight

After deployment, you can use Amazon QuickSight to import or connect to your data, analyze it, and share your data visualizations in reports and dashboards.

SIGNING IN TO AMAZON QUICKSIGHT

1. Go to the Amazon QuickSight page at https://quicksight.aws.amazon.com/.

2. In QuickSight account name, enter your account name. This is the same name you used to create your Amazon QuickSight subscription; keep it handy in case you need it again.

3. Provide your email address, if prompted.

4. If the user name field is blank, type your user name.

5. Use the user name that matches your account type:
– For organizational users: The user name provided by your administrator. Your account can be based on IAM credentials, a single sign-on (SSO) service, or your email address. If you have received an email invitation from another Amazon QuickSight user, it mentions the type of credentials to use.
– For individual users: The user name you created for yourself. This is usually the IAM credentials you created. User names that contain a semicolon (;) aren't supported.

6. In Password, type the associated password. If you aren't sure, ask the administrator. If you create a new password, retype it in Confirm password. Passwords are case-sensitive, must be between 8 and 64 characters in length, and must contain at least one character from three of the following categories:
– Lowercase letters (a–z)
– Uppercase letters (A–Z)
– Numbers (0–9)
– Non-alphanumeric characters (~!@#$%^&*_-+=`|\(){}[]:;"'<>,.?/)

7. Choose Sign in. In some cases, this button is labeled Create account and sign in.

(Only for users invited by email.) You are prompted to type the account name provided in your email invitation. If you mistype it, you get an authentication error. To change the account name, choose the account name in Account name, and type the correct one.
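The password rule in step 6 (8 to 64 characters, with characters from at least three of the four categories) can be checked locally before you submit a new password. This is a sketch of the stated rule only; Amazon QuickSight performs its own validation, which may differ.

```python
import string

# Non-alphanumeric characters listed in the password rules above.
SPECIAL = set("~!@#$%^&*_-+=`|\\(){}[]:;\"'<>,.?/")


def meets_quicksight_password_rules(password: str) -> bool:
    """Check length (8 to 64) and at least three of the four categories."""
    if not 8 <= len(password) <= 64:
        return False
    categories = [
        any(c in string.ascii_lowercase for c in password),
        any(c in string.ascii_uppercase for c in password),
        any(c in string.digits for c in password),
        any(c in SPECIAL for c in password),
    ]
    return sum(categories) >= 3
```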

SETTING UP AMAZON QUICKSIGHT

Follow the instructions provided in Setting Up Amazon QuickSight.

CONNECTING TO THE AMAZON REDSHIFT CLUSTER/DATABASE

1. After you sign in to Amazon QuickSight, select the Region where you launched the Quick Start, so that Amazon QuickSight can access the sample datasets in the Amazon Redshift cluster/database.

You should be able to see all Amazon Redshift clusters available in the selected Region, including the Amazon Redshift cluster created by the AWS CloudFormation template.

2. Create a new data source in Amazon QuickSight by choosing Manage data, and then choosing New data set. Choose Redshift Auto-discovered, and then provide the following details:
– Data source name: SampleDataSource
– Instance ID: The RedshiftCluster value in the Outputs tab of the AWS CloudFormation stack
– Connection type: Public network
– Database name: <DatabaseName parameter>
– Username: <MasterUser parameter>
– Password: <MasterUserPassword parameter>

Figure 4: Amazon Redshift data source in QuickSight

3. Choose the required table. If you imported the demo data, this is ga_demo_data. You need to create a second dataset for schematic_log.

4. Choose New analysis to start creating a new analysis or report in Amazon QuickSight. Amazon QuickSight should show two datasets to select from for creating an analysis, as depicted in the following figure.


Figure 5: Your datasets in Amazon QuickSight

a. Select ga_demo_data to create an analysis of the sample dataset.

b. Select the desired fields and the visual types, as shown in the following figure. Drag and drop the fields for X axis, Value, and Color to see a page view distribution from the ga_demo_data dataset.

Figure 6: Page view metrics

5. Go back to the QuickSight home page, and then choose New analysis to start creating a new analysis or report in Amazon QuickSight. This time, select schematic_log as the dataset.

6. Select the fields and the visual type as shown in the following figure. Drag and drop the fields for Group by and Value to see a session-wise analysis of the schematic_log dataset, which shows the number of views for each session by action for a specific URL.

Figure 7: Session wise analysis

7. Select the fields and the visual type as shown in the following figure. Drag and drop the fields for Y axis, Value, and Group/Color to see a client-wise analysis of the schematic_log dataset, which shows the total number of sessions for each client IP address.


Figure 8: Session frequency analysis – client wise

8. Select the fields and the visual type, as shown in the following figure. Drag and drop the fields for Y axis, Value, and Group/Color to see a response analysis of the schematic_log dataset. This visualizes the number of events captured with each response status, such as Successful, Request Lost, No Response, and Error Response.


Figure 9: Response analysis
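The three schematic_log analyses in steps 6 through 8 are aggregations that you can reproduce outside QuickSight to sanity-check a chart. The sketch below computes them over a few hypothetical rows that use schematic_log column names. It assumes "number of sessions" means distinct SESSIONID values per client IP address.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical rows shaped like schematic_log (only the columns used here).
ROWS: List[dict] = [
    {"SESSIONID": "s1", "CLIENTIP": "203.0.113.10", "ACTION": "view", "RESPONSE": "Successful", "NUMBER_OF_VIEWS": 3},
    {"SESSIONID": "s1", "CLIENTIP": "203.0.113.10", "ACTION": "purchase", "RESPONSE": "Successful", "NUMBER_OF_VIEWS": 1},
    {"SESSIONID": "s2", "CLIENTIP": "203.0.113.10", "ACTION": "view", "RESPONSE": "No Response", "NUMBER_OF_VIEWS": 2},
    {"SESSIONID": "s3", "CLIENTIP": "198.51.100.7", "ACTION": "view", "RESPONSE": "Error Response", "NUMBER_OF_VIEWS": 5},
]


def views_per_session_and_action(rows: List[dict]) -> Dict[Tuple[str, str], int]:
    """Sum NUMBER_OF_VIEWS by (SESSIONID, ACTION), as in the session-wise chart."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for r in rows:
        totals[(r["SESSIONID"], r["ACTION"])] += r["NUMBER_OF_VIEWS"]
    return dict(totals)


def sessions_per_client(rows: List[dict]) -> Dict[str, int]:
    """Count distinct sessions for each client IP address."""
    sessions = defaultdict(set)
    for r in rows:
        sessions[r["CLIENTIP"]].add(r["SESSIONID"])
    return {ip: len(ids) for ip, ids in sessions.items()}


def events_per_response(rows: List[dict]) -> Counter:
    """Count captured events by RESPONSE status."""
    return Counter(r["RESPONSE"] for r in rows)
```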

Optional: Ingesting Apache web access logs with Kinesis Data Firehose

The Kinesis Agent is installed and configured on the web servers, where it watches the Apache access logs and ships newly written log data to the configured destinations through Kinesis Data Firehose. The following schema/table (apache_logs.access_logs) is created in Amazon Redshift to hold Apache access logs:

Column | Data type
HOST | VARCHAR(1000)
IDENT | VARCHAR(1000)
AUTHUSER | VARCHAR(1000)
DATETIME | TIMESTAMP
REQUEST | VARCHAR(4000)
RESPONSE | VARCHAR(4000)
BYTES | VARCHAR(4000)


The Kinesis Agent is configured with the optionName LOGTOJSON and the logFormat COMMONAPACHELOG, which transform common Apache access logs to JSON format before sending them to the Kinesis Data Firehose delivery stream.

COMMONAPACHELOG is the Apache Common Log format. By default, each log entry has the following pattern:

"%{host} %{ident} %{authuser} [%{datetime}] \"%{request}\" %{response} %{bytes}"
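The LOGTOJSON conversion splits each access-log line into the fields of the pattern above. The following sketch approximates that conversion with a regular expression so you can preview the JSON records that reach Kinesis Data Firehose. It is not the Kinesis Agent's implementation: mapping the "-" placeholder to null is an assumption, and request lines containing escaped quotes are not handled.

```python
import json
import re

# Regex for the Common Log Format pattern shown above:
# %{host} %{ident} %{authuser} [%{datetime}] "%{request}" %{response} %{bytes}
CLF = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<datetime>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<response>\S+) (?P<bytes>\S+)$'
)


def clf_to_json(line: str) -> str:
    """Convert one access-log line to a JSON object keyed by the pattern fields."""
    match = CLF.match(line.strip())
    if match is None:
        raise ValueError(f"not a Common Log Format line: {line!r}")
    # Assumption: a "-" placeholder becomes JSON null.
    record = {k: (None if v == "-" else v) for k, v in match.groupdict().items()}
    return json.dumps(record)
```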

If you have existing web logs that you want to deliver to Amazon Redshift as a one-time activity, you can add the following configuration to your Kinesis Agent. Set deliveryStream to the delivery stream name provisioned by your stack, and copy the existing web log data into a temporary file (/tmp/mylog.txt). This configuration uses the LOGTOJSON option with the COMMONAPACHELOG log format. If you deployed the stack in a Region other than us-west-2, change firehose.endpoint to the Kinesis Data Firehose endpoint for that Region.

{
  "cloudwatch.emitMetrics": true,
  "firehose.endpoint": "https://firehose.us-west-2.amazonaws.com",
  "flows": [
    {
      "filePattern": "/tmp/mylog.txt",
      "deliveryStream": "<enter the Redshift delivery stream name provisioned using the stack>",
      "initialPosition": "START_OF_FILE",
      "dataProcessingOptions": [
        {
          "optionName": "LOGTOJSON",
          "logFormat": "COMMONAPACHELOG"
        }
      ]
    }
  ]
}
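If you deploy in a Region other than us-west-2, or reuse the flow for several files, generating this configuration can be less error-prone than editing it by hand. The sketch below builds the same structure as a Python dictionary. The delivery stream name in the tests is a placeholder, and `agent_config` is a hypothetical helper. Write the serialized result to the agent's configuration file (commonly /etc/aws-kinesis/agent.json) and restart the agent so that it takes effect.

```python
import json


def agent_config(delivery_stream: str, region: str,
                 file_pattern: str = "/tmp/mylog.txt") -> dict:
    """Build a one-time Kinesis Agent flow like the configuration above."""
    return {
        "cloudwatch.emitMetrics": True,
        # Regional Firehose endpoint; use the Region where the stack runs.
        "firehose.endpoint": f"https://firehose.{region}.amazonaws.com",
        "flows": [
            {
                "filePattern": file_pattern,
                "deliveryStream": delivery_stream,
                "initialPosition": "START_OF_FILE",
                "dataProcessingOptions": [
                    {"optionName": "LOGTOJSON", "logFormat": "COMMONAPACHELOG"}
                ],
            }
        ],
    }


def agent_config_json(delivery_stream: str, region: str) -> str:
    """Serialize the configuration with indentation for the agent's file."""
    return json.dumps(agent_config(delivery_stream, region), indent=2)
```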

You can customize the Amazon Redshift schema and Kinesis Agent configuration for other web servers to suit your use case.

Best practices for using clickstream analytics on AWS

If you have sensitive data, you can enable server-side data encryption in Amazon Kinesis Data Firehose. Use AWS CloudTrail to record actions taken by a user, role, or AWS service in Kinesis Data Firehose. Similarly, to protect sensitive data in Amazon ES, you can enable encryption of data at rest by using AWS Key Management Service (AWS KMS), and you can also enable node-to-node encryption. In Amazon S3, you can encrypt objects by using server-side encryption with either Amazon S3-managed keys or AWS KMS-managed keys. In Amazon Redshift, you can enable database encryption for your cluster using AWS KMS to help protect the data at rest.

For Amazon ES, it is a best practice to use three dedicated master nodes and to deploy the domain across three Availability Zones.

Security

The Amazon Redshift cluster is in a VPC and is publicly accessible through a public IP address so that Kinesis Data Firehose can deliver clickstream data to it. Access is secured by allowing only the Kinesis Data Firehose IP addresses for each available AWS Region.

The Amazon ES domain is protected from public access. The access policy of the Amazon ES domain allows access only from specific IP addresses, namely your own IP address.

FAQ

Q. I encountered a CREATE_FAILED error when I launched the Quick Start.

A. If AWS CloudFormation fails to create the stack, we recommend that you relaunch the template with Rollback on failure set to No. (This setting is under Advanced in the AWS CloudFormation console, Options page.) With this setting, the stack’s state will be retained and the instance will be left running, so you can troubleshoot the issue. (For Windows, look at the log files in %ProgramFiles%\Amazon\EC2ConfigService and C:\cfn\log.)

Important When you set Rollback on failure to No, you will continue to incur AWS charges for this stack. Please make sure to delete the stack when you finish troubleshooting.

For additional information, see Troubleshooting AWS CloudFormation on the AWS website.

Q. I encountered a size limitation error when I deployed the AWS CloudFormation templates.

A. We recommend that you launch the Quick Start templates from the links in this guide or from another S3 bucket. If you deploy the templates from a local copy on your computer or from a non-S3 location, you might encounter template size limitations when you create the stack. For more information about AWS CloudFormation limits, see the AWS documentation.

Q. I deployed the Quick Start in the EU (London) Region, but it didn’t work.

A. This Quick Start includes services that aren’t supported in all Regions. See the page for Amazon QuickSight on the AWS website for a list of supported Regions.

Q. I encountered an “S3 bucket already exists” error during deployment.

A. S3 buckets created by this Quick Start are retained after you delete the CloudFormation stacks, so your Clickstream Analytics sample data remains available in your AWS account. To remove those buckets, delete the contents of each bucket, and then delete each bucket.

Q. I encountered a problem accessing the Kibana dashboard in Amazon ES.

A. Amazon ES is protected from public access. Make sure that your IP address falls within the range specified by the RemoteAccessCIDR input parameter, which is the range allowed to access Amazon ES.

GitHub repository

To post feedback, submit feature ideas, or report bugs, use the Issues section of the GitHub repository for this Quick Start. If you’d like to submit code, please review the Quick Start Contributor’s Guide.

Additional resources

AWS resources

Getting Started Resource Center

AWS General Reference

AWS Glossary

AWS services

Amazon Athena

AWS CloudFormation

Amazon CloudWatch

Amazon EBS

Amazon EC2


Amazon Elasticsearch Service

Kibana plug-in

Amazon Kinesis

Amazon QuickSight

Amazon Redshift

Amazon S3

Amazon SNS

Amazon VPC

Cambridge Technology products and documentation

Cambridge Technology website

Other Quick Start reference deployments

AWS Quick Start home page

Document revisions

Date Change In sections

September 2019 Initial publication —


© 2019, Amazon Web Services, Inc. or its affiliates, and Cambridge Technology. All rights reserved.

Notices

This document is provided for informational purposes only. It represents AWS’s current product offerings and practices as of the date of issue of this document, which are subject to change without notice. Customers are responsible for making their own independent assessment of the information in this document and any use of AWS’s products or services, each of which is provided “as is” without warranty of any kind, whether express or implied. This document does not create any warranties, representations, contractual commitments, conditions or assurances from AWS, its affiliates, suppliers or licensors. The responsibilities and liabilities of AWS to its customers are controlled by AWS agreements, and this document is not part of, nor does it modify, any agreement between AWS and its customers.

The software included with this paper is licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the "license" file accompanying this file. This code is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.