Clickstream Analytics on the AWS Cloud
Quick Start Reference Deployment
September 2019
Cambridge Technology
AWS Quick Start team
Visit our GitHub repository for source files and to post feedback,
report bugs, or submit feature ideas for this Quick Start.
Contents
Overview
Clickstream analytics on AWS
Cost and licenses
Architecture
Planning the deployment
Specialized knowledge
AWS account
Technical requirements
Deployment options
Deployment steps
Step 1. Sign in to your AWS account
Step 2. Launch the Quick Start
Option 1: Parameters for deploying clickstream analytics into a new VPC
Option 2: Parameters for deploying clickstream analytics into an existing VPC
Step 3. Test the deployment
Amazon Web Services – Clickstream Analytics on the AWS Cloud September 2019
Quick Start datasets
Optional: Analyzing and visualizing data with Amazon QuickSight
Signing in to Amazon QuickSight
Setting up Amazon QuickSight
Connecting to the Amazon Redshift cluster/database
Optional: Ingesting Apache web access logs with Kinesis Data Firehose
Best practices for using clickstream analytics on AWS
Security
FAQ
GitHub repository
Additional resources
Document revisions
This Quick Start was created by Cambridge Technology in collaboration with Amazon Web
Services (AWS). Cambridge Technology is an AWS Premier Consulting partner specializing
in big data.
Quick Starts are automated reference deployments that use AWS CloudFormation
templates to deploy key technologies on AWS, following AWS best practices.
Overview
This Quick Start reference deployment guide provides step-by-step instructions for
deploying clickstream analytics on the AWS Cloud.
Clickstream analytics is the process of collecting, analyzing, and reporting aggregate data
about which webpages someone visits and in what order. The path that a visitor takes
through a website is called a clickstream. Clickstream analytics can be a powerful tool for
doing market research and generating valuable business insights from the data logs of
online platforms.
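To make the definition concrete, the following minimal sketch groups page views by visitor and orders them by time, which yields one clickstream per visitor. The event tuples, visitor IDs, and page names are hypothetical sample data, not output of this Quick Start.

```python
from collections import defaultdict


def build_clickstreams(events):
    """Group page views by visitor and order each visitor's pages by time."""
    paths = defaultdict(list)
    # Sorting by (visitor, timestamp) keeps each visitor's pages in visit order.
    for visitor_id, _timestamp, page in sorted(events, key=lambda e: (e[0], e[1])):
        paths[visitor_id].append(page)
    return dict(paths)


# Hypothetical events: (visitor_id, unix_timestamp, page)
sample_events = [
    ("v1", 1567296060, "/products"),
    ("v2", 1567296005, "/home"),
    ("v1", 1567296000, "/home"),
    ("v1", 1567296120, "/checkout"),
    ("v2", 1567296090, "/about"),
]

clickstreams = build_clickstreams(sample_events)
for visitor, path in clickstreams.items():
    print(visitor, " -> ".join(path))
```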
This Quick Start is for users who want to get started with AWS-native components for a
clickstream analytics solution in the AWS Cloud. Once this foundational layer is in place,
you can use it to ingest, analyze, and generate business insights from your websites’
clickstream data.
Clickstream analytics on AWS
This Quick Start builds a clickstream analytics solution that integrates AWS services such as
Amazon Kinesis Data Firehose, Amazon Simple Storage Service (Amazon S3), Amazon
Elasticsearch Service (Amazon ES), Amazon Redshift, and Amazon QuickSight. The
clickstream analytics solution provides these capabilities:
• Streaming data ingestion, which can process millions of website clicks (clickstream data) a day from global websites.
• Near real-time visualizations and recommendations, with web usage metrics that include events per hour, visitor count, web/HTTP user agents (e.g., a web browser), abnormal events, aggregate event count, referrers, and recent events. You can build a recommendation engine with Amazon Redshift application programming interfaces (APIs).
• Publishing of your website clickstream data to Amazon S3, Amazon Redshift, and Amazon ES.
• Analysis and visualizations of your clickstream data by using Kibana (an open-source tool that comes with Amazon ES) and Amazon QuickSight.
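As a rough illustration of the web usage metrics mentioned above, the following sketch computes events per hour, visitor count, user agents, and referrers from a few parsed log events. The field names (ts, client_ip, agent, referrer) and the choice of client IP as the visitor key are assumptions made for this example; in the deployed solution these aggregations would typically be done in Kibana, Amazon Redshift, or Amazon QuickSight.

```python
from collections import Counter
from datetime import datetime, timezone


def usage_metrics(events):
    """Summarize parsed log events into a few common web usage metrics."""
    events_per_hour = Counter()
    user_agents = Counter()
    referrers = Counter()
    visitors = set()
    for event in events:
        # Bucket each event by the UTC hour in which it occurred.
        hour = datetime.fromtimestamp(event["ts"], tz=timezone.utc).strftime("%Y-%m-%d %H:00")
        events_per_hour[hour] += 1
        user_agents[event["agent"]] += 1
        referrers[event["referrer"]] += 1
        visitors.add(event["client_ip"])  # assumption: one visitor per client IP
    return {
        "events_per_hour": dict(events_per_hour),
        "visitor_count": len(visitors),
        "user_agents": dict(user_agents),
        "referrers": dict(referrers),
    }


# Hypothetical events; 1567296000 is 2019-09-01 00:00:00 UTC.
sample = [
    {"ts": 1567296000, "client_ip": "203.0.113.7", "agent": "Mozilla/5.0", "referrer": "-"},
    {"ts": 1567297800, "client_ip": "203.0.113.8", "agent": "curl/7.61", "referrer": "-"},
    {"ts": 1567299600, "client_ip": "203.0.113.7", "agent": "Mozilla/5.0", "referrer": "https://example.com/"},
]
metrics = usage_metrics(sample)
print(metrics)
```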
Cost and licenses
You are responsible for the cost of the AWS services used while running this Quick Start
reference deployment. There is no additional cost for using the Quick Start.
The AWS CloudFormation template for this Quick Start includes configuration parameters
that you can customize. Some of these settings, such as instance type, will affect the cost of
deployment. For cost estimates, see the pricing pages for each AWS service you will be
using. Prices are subject to change.
Because this Quick Start uses AWS-native solution components, there are no costs or
license requirements beyond AWS infrastructure costs. This Quick Start also deploys
Kibana.
Tip After you deploy the Quick Start, we recommend that you enable the AWS Cost
and Usage Report to track costs associated with the Quick Start. This report delivers
billing metrics to an S3 bucket in your account. It provides cost estimates based on
usage throughout each month, and finalizes the data at the end of the month. For
more information about the report, see the AWS documentation.
Architecture
Deploying this Quick Start for a new virtual private cloud (VPC) with default parameters
builds the following clickstream analytics environment in the AWS Cloud.
Figure 1: Quick Start architecture for clickstream analytics on AWS
• A highly available architecture that spans two Availability Zones.*
• A VPC configured with public and private subnets according to AWS best practices, to provide you with your own virtual network on AWS.*
• In the public subnets:
– Managed network address translation (NAT) gateways to allow outbound internet access for resources in the private subnets.*
– A Linux bastion host in an Auto Scaling group to allow inbound Secure Shell (SSH) access to Amazon Elastic Compute Cloud (Amazon EC2) instances in public and private subnets.*
– A publicly accessible Amazon Redshift cluster for data aggregation, analysis, transformation, and creation of new clickstream datasets.
• In the private subnets, two web server instances running Apache in an Auto Scaling group with Amazon Kinesis Agent installed.
• Security groups (stateful firewalls) at the EC2 instance level.
• An Application Load Balancer (ALB) to balance traffic between the two web servers. A separate target group is created for SSH access to the backend instances via the ALB, as an alternative to using the bastion host.
• Publicly accessible Amazon ES with Elasticsearch version 6.3 (default) for indexing and searching functionality on the clickstream data.
• Three Kinesis Data Firehose delivery streams to push clickstream data to the destinations: Amazon S3, Amazon Redshift, and Amazon ES.
• An Amazon S3 bucket for the Kinesis Data Firehose delivery stream.
• Integration with other Amazon services such as Amazon S3, Amazon Kinesis Data Firehose, Amazon ES with Kibana, and Amazon QuickSight.
• AWS Identity and Access Management (IAM) roles to provide permissions to access AWS resources. Examples include permitting Amazon ES to access VPC resources, and allowing Amazon Kinesis Data Firehose to access Amazon S3, Amazon Redshift, and Amazon ES.
• Amazon Simple Notification Service (Amazon SNS) to notify you about automatic scaling operations and rollback of AWS CloudFormation stack creation.
• Optionally, you can choose to include demo data. In this case, two schemas are created in Amazon Redshift and loaded with sample data. One dataset is from Google Analytics for a website’s traffic. The second dataset has clickstream data for January 2015, released by Wikipedia. For more information, see Quick Start datasets, later in this guide.
* The template that deploys the Quick Start into an existing VPC skips the components
marked by asterisks and prompts you for your existing VPC configuration.
Figure 2 shows how these components work together in a typical end-to-end process flow.
Figure 2: Clickstream analytics process flow
• As users navigate through a website, the clickstream data captured in the web server logs is sent to the Kinesis Data Firehose delivery stream by the Kinesis Agent installed on the web servers.
• Once the data is processed, the Kinesis Data Firehose delivery stream sends the data in near real-time to Amazon Redshift.
• Kinesis Data Firehose with an Amazon S3 destination persists managed feeds to a curated datasets bucket in Amazon S3.
• Kinesis Data Firehose with an Amazon ES destination stores and indexes the dataset in Amazon ES.
• Amazon CloudWatch metrics monitor the health of the services.
• If you need to transform the clickstream data (website tracking logs), you can use an AWS Lambda function in the Kinesis Data Firehose delivery stream, or create a Kinesis Data Analytics application and use custom structured query language (SQL). These are not included in this Quick Start.
• You can run ad-hoc queries on the data in Amazon S3 with Amazon Athena, create and share visualization dashboards on the Amazon Redshift data using Amazon QuickSight, and use Kibana to visualize the data in Amazon ES. Amazon Athena and QuickSight, however, are not included in this Quick Start.
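The optional Lambda transformation mentioned above is not part of this Quick Start, but a minimal sketch can show what such a function might look like. It assumes the standard Kinesis Data Firehose transformation contract (base64-encoded records in, records with a recordId, result, and data out) and assumes that the web servers write the Apache combined log format; the regular expression and the output field names are illustrative choices, not the Quick Start's own schema.

```python
import base64
import json
import re

# Apache combined log format; the group names are illustrative field names.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>\S+)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def lambda_handler(event, context):
    """Convert each raw log line in a Firehose batch into a JSON document."""
    output = []
    for record in event["records"]:
        line = base64.b64decode(record["data"]).decode("utf-8").strip()
        match = LOG_PATTERN.match(line)
        if match is None:
            # Return unparseable lines unchanged and flag them as failed.
            output.append({"recordId": record["recordId"],
                           "result": "ProcessingFailed",
                           "data": record["data"]})
            continue
        fields = match.groupdict()
        fields["status"] = int(fields["status"])
        payload = (json.dumps(fields) + "\n").encode("utf-8")
        output.append({"recordId": record["recordId"],
                       "result": "Ok",
                       "data": base64.b64encode(payload).decode("utf-8")})
    return {"records": output}


# Local usage example with one hypothetical log line.
raw = ('203.0.113.7 - - [01/Sep/2019:00:00:00 +0000] "GET /home HTTP/1.1" '
       '200 512 "https://example.com/" "Mozilla/5.0"\n')
test_event = {"records": [{"recordId": "1",
                           "data": base64.b64encode(raw.encode("utf-8")).decode("utf-8")}]}
print(lambda_handler(test_event, None)["records"][0]["result"])
```

Emitting one JSON document per line keeps the records easy to load into a downstream destination; whether that fits your tables and indexes depends on how you configure the delivery streams.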
Planning the deployment
Specialized knowledge
This Quick Start assumes familiarity with website traffic logs and data visualization
software like Amazon QuickSight, Kibana, or Tableau.
This deployment guide also requires a moderate level of familiarity with AWS services. If
you’re new to AWS, visit the Getting Started Resource Center and the AWS Training and
Certification website for materials and programs that can help you develop the skills to
design, deploy, and operate your infrastructure and applications on the AWS Cloud.
AWS account
If you don’t already have an AWS account, create one at https://aws.amazon.com by
following the on-screen instructions. Part of the sign-up process involves receiving a phone
call and entering a PIN using the phone keypad.
Your AWS account is automatically signed up for all AWS services. You are charged only for
the services you use.
Technical requirements
Before you launch the Quick Start, your account must be configured as specified in the
following table. Otherwise, deployment might fail.
Resources: If necessary, request service limit increases for the following resources. You might need to do this if you already have an existing deployment that uses these resources, and you think you might exceed the default limits with this deployment. For default limits, see the AWS documentation. AWS Trusted Advisor offers a service limits check that displays your usage and limits for some aspects of some services.

| Resource | This deployment uses |
|---|---|
| VPCs | 1 |
| Elastic IP addresses | 1 |
| Security groups | 3 |
| IAM roles | 8 |
| Auto Scaling groups | 2 |
| Application Load Balancers | 1 |
| t2.micro instances | 3 |

Regions: This deployment includes Amazon Kinesis Data Firehose, which isn’t currently supported in all AWS Regions. For a current list of supported Regions, see AWS Regions and Endpoints in the AWS documentation.

Key pair: Make sure that at least one Amazon EC2 key pair exists in your AWS account in the Region where you are planning to deploy the Quick Start. Make note of the key pair name. You’ll be prompted for this information during deployment. To create a key pair, follow the instructions in the AWS documentation. If you’re deploying the Quick Start for testing or proof-of-concept purposes, we recommend that you create a new key pair instead of specifying a key pair that’s already being used by a production instance.

IAM permissions: To deploy the Quick Start, you must log in to the AWS Management Console with IAM permissions for the resources and actions the templates will deploy. The AdministratorAccess managed policy within IAM provides sufficient permissions, although your organization may choose to use a custom policy with more restrictions.
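To decide whether a limit increase is needed, you can add the resource counts this deployment uses to what your account already consumes and compare the total with your limits. The following sketch does that arithmetic; the current usage and limit values are hypothetical inputs that you would look up yourself (for example, in AWS Trusted Advisor), and only the per-deployment counts come from this guide.

```python
# Resource counts used by this deployment, as listed in this guide.
DEPLOYMENT_USES = {
    "VPCs": 1,
    "Elastic IP addresses": 1,
    "Security groups": 3,
    "IAM roles": 8,
    "Auto Scaling groups": 2,
    "Application Load Balancers": 1,
    "t2.micro instances": 3,
}


def resources_needing_increase(current_usage, limits):
    """Return {resource: shortfall} for resources that would exceed their limit."""
    shortfalls = {}
    for resource, needed in DEPLOYMENT_USES.items():
        limit = limits.get(resource)
        if limit is None:
            continue  # no limit supplied for this resource, so nothing to compare
        projected = current_usage.get(resource, 0) + needed
        if projected > limit:
            shortfalls[resource] = projected - limit
    return shortfalls


# Hypothetical account state: five VPCs already in use against a limit of five.
usage = {"VPCs": 5, "IAM roles": 10}
limits = {"VPCs": 5, "Elastic IP addresses": 5, "IAM roles": 1000}
shortfalls = resources_needing_increase(usage, limits)
print(shortfalls)
```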
Deployment options
This Quick Start provides two deployment options:
• Deploy clickstream analytics into a new VPC (end-to-end deployment). This option builds a new AWS environment consisting of the VPC, subnets, NAT gateways, security groups, bastion hosts, and other infrastructure components, and then deploys clickstream analytics into this new VPC.
• Deploy clickstream analytics into an existing VPC. This option provisions clickstream analytics in your existing AWS infrastructure.
The Quick Start provides separate templates for these options. It also lets you configure
CIDR blocks, instance types, and clickstream analytics settings, as discussed later in this
guide.
Deployment steps
Step 1. Sign in to your AWS account
1. Sign in to your AWS account at https://aws.amazon.com with an IAM user role that has
the necessary permissions. For details, see Planning the deployment earlier in this
guide.
2. Make sure that your AWS account is configured correctly, as discussed in the Technical
requirements section.
3. Create a service-linked role for Amazon ES, if you do not already have one in your AWS
account.
Step 2. Launch the Quick Start
Notes The instructions in this section reflect the older version of the AWS
CloudFormation console. If you’re using the redesigned console, some of the user
interface elements might be different.
You are responsible for the cost of the AWS services used while running this Quick
Start reference deployment. There is no additional cost for using this Quick Start.
For full details, see the pricing pages for each AWS service you will be using in this
Quick Start. Prices are subject to change.
1. Sign in to your AWS account, and choose one of the following options to launch the
AWS CloudFormation template. For help choosing an option, see Deployment options
earlier in this guide.
• Deploy clickstream analytics into a new VPC on AWS
• Deploy clickstream analytics into an existing VPC on AWS
Important If you’re deploying clickstream analytics into an existing VPC, make
sure that your VPC has two private subnets in different Availability Zones for the
workload instances and that the subnets aren’t shared. This Quick Start doesn’t
support shared subnets. These subnets require NAT gateways or NAT instances in
their route tables, to allow the instances to download packages and software without
exposing them to the internet. You will also need the domain name option
configured in the Dynamic Host Configuration Protocol (DHCP) options as
explained in the Amazon VPC documentation. You will be prompted for your VPC
settings when you launch the Quick Start.
Each deployment takes about 30 minutes to complete.
2. Check the AWS Region that’s displayed in the upper-right corner of the navigation bar,
and change it if necessary. This is where the network infrastructure for clickstream
analytics will be built. The template is launched in the US West (Oregon) Region by
default.
Note This deployment includes services that aren’t supported in all AWS Regions.
For a list of supported Regions, see the pages for Amazon Kinesis Data Firehose and
Amazon QuickSight.
3. On the Select Template page, keep the default setting for the template URL, and then
choose Next.
4. On the Specify Details page, change the stack name if needed. Review the parameters
for the template. Provide values for the parameters that require input. For all other
parameters, review the default settings and customize them as necessary.
In the following tables, parameters are listed by category and described separately for
the two deployment options:
– Parameters for deploying clickstream analytics into a new VPC
– Parameters for deploying clickstream analytics into an existing VPC
When you finish reviewing and customizing the parameters, choose Next.
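If you prefer to script the launch instead of filling in the console form, the parameter values can be collected in a file. The following sketch builds the list of ParameterKey/ParameterValue pairs that AWS CloudFormation accepts as a parameters file and checks that every parameter marked "Requires input" for the new-VPC option has a value. The parameter names come from the tables in this guide; all sample values (key pair names, email address, password, column list) are placeholders, and the helper itself is an illustration rather than part of the Quick Start.

```python
import json

# Parameters marked "Requires input" for the new-VPC option in this guide.
# WebsiteContent is omitted because the guide allows it to be left blank.
REQUIRED_NEW_VPC = [
    "AvailabilityZones", "KeyPairName", "RemoteAccessCIDR", "AppKeyPairName",
    "OperatorEMail", "ESDomainName", "ESIndex", "ESType", "DatabaseName",
    "MasterUserPassword", "RedshiftColumns",
]


def build_parameters(values, required=REQUIRED_NEW_VPC):
    """Return CloudFormation-style parameter entries, failing on missing input."""
    missing = [name for name in required if not values.get(name)]
    if missing:
        raise ValueError("Missing required parameters: " + ", ".join(missing))
    return [{"ParameterKey": key, "ParameterValue": str(value)}
            for key, value in values.items()]


# Placeholder values for illustration only.
sample_values = {
    "AvailabilityZones": "us-west-2a,us-west-2b",
    "KeyPairName": "my-bastion-key",
    "RemoteAccessCIDR": "203.0.113.25/32",
    "AppKeyPairName": "my-app-key",
    "OperatorEMail": "ops@example.com",
    "ESDomainName": "clickstream-demo",
    "ESIndex": "weblogs",
    "ESType": "accesslog",
    "DatabaseName": "clickstream",
    "MasterUserPassword": "REPLACE_ME",
    "RedshiftColumns": "host,request_time,request,status",
}
parameters = build_parameters(sample_values)
print(json.dumps(parameters[:2], indent=2))
```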
OPTION 1: PARAMETERS FOR DEPLOYING CLICKSTREAM ANALYTICS INTO A NEW VPC
Network configuration:
| Parameter label (name) | Default | Description |
|---|---|---|
| Availability Zones (AvailabilityZones) | Requires input | The list of Availability Zones to use for the subnets in the VPC. The Quick Start uses two Availability Zones from your list and preserves the logical order you specify. |
| VPC CIDR (VPCCIDR) | 10.0.0.0/16 | The CIDR block for the VPC. |
| Private subnet 1 CIDR (PrivateSubnet1CIDR) | 10.0.0.0/19 | The CIDR block for the private subnet located in Availability Zone 1. |
| Private subnet 2 CIDR (PrivateSubnet2CIDR) | 10.0.32.0/19 | The CIDR block for the private subnet located in Availability Zone 2. |
| Public subnet 1 CIDR (PublicSubnet1CIDR) | 10.0.128.0/20 | The CIDR block for the public subnet located in Availability Zone 1. |
| Public subnet 2 CIDR (PublicSubnet2CIDR) | 10.0.144.0/20 | The CIDR block for the public subnet located in Availability Zone 2. |
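If you change the default CIDR blocks, each subnet must still fall inside the VPC range and must not overlap another subnet. The following sketch checks both conditions with Python's standard ipaddress module, using the default values from this guide; the helper function is illustrative and not part of the Quick Start templates.

```python
import ipaddress
from itertools import combinations


def check_subnets(vpc_cidr, subnet_cidrs):
    """Return a list of problems found in a VPC/subnet CIDR layout."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = {name: ipaddress.ip_network(cidr) for name, cidr in subnet_cidrs.items()}
    problems = []
    for name, net in subnets.items():
        if not net.subnet_of(vpc):
            problems.append(f"{name} ({net}) is outside the VPC range {vpc}")
    for (name_a, net_a), (name_b, net_b) in combinations(subnets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{name_a} ({net_a}) overlaps {name_b} ({net_b})")
    return problems


# Default values from the network configuration parameters.
defaults = {
    "PrivateSubnet1CIDR": "10.0.0.0/19",
    "PrivateSubnet2CIDR": "10.0.32.0/19",
    "PublicSubnet1CIDR": "10.0.128.0/20",
    "PublicSubnet2CIDR": "10.0.144.0/20",
}
default_problems = check_subnets("10.0.0.0/16", defaults)
print(default_problems or "Default layout is consistent")
```

With the defaults, each /19 private subnet spans 8,192 addresses and each /20 public subnet spans 4,096, all within the /16 VPC range.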
Bastion host configuration:
| Parameter label (name) | Default | Description |
|---|---|---|
| Bastion host key pair name (KeyPairName) | Requires input | A public/private key pair, which allows you to connect securely to your instance after it launches. This is the key pair you created in your preferred AWS Region; see the Technical requirements section. If you don’t have a key pair in this Region, please create it before continuing. |
| Bastion instance type (BastionInstanceType) | t2.micro | The bastion host EC2 instance type. |
| Allowed CIDR for external access to bastion (RemoteAccessCIDR) | Requires input | A CIDR block that’s allowed external access to the bastion. We recommend that you use a constrained CIDR range to reduce the potential of inbound attacks from unknown IP addresses (see http://checkip.dyndns.org/). |
| Bastion AMI operating system (BastionAMIOS) | Amazon-Linux-HVM | The Amazon Linux distribution for the Amazon Machine Image (AMI) to be used for the bastion instances. |
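One way to follow the recommendation for a constrained RemoteAccessCIDR value is to allow only your own public IP address. The following sketch turns a single IPv4 address into a /32 block and flags ranges that are wider than a chosen threshold. The sample address is from a documentation range, and the /24 threshold is an arbitrary assumption for illustration, not a rule from this guide.

```python
import ipaddress


def remote_access_cidr(ip_address):
    """Return a single-address (/32) CIDR block for one IPv4 address."""
    address = ipaddress.IPv4Address(ip_address)  # raises ValueError if invalid
    return str(ipaddress.ip_network(f"{address}/32"))


def is_constrained(cidr, min_prefix=24):
    """Return True if the block is no wider than the chosen prefix length."""
    network = ipaddress.ip_network(cidr, strict=False)
    return network.prefixlen >= min_prefix


my_cidr = remote_access_cidr("203.0.113.25")  # placeholder address
print(my_cidr, is_constrained(my_cidr))
print("0.0.0.0/0", is_constrained("0.0.0.0/0"))
```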
Web server configuration
| Parameter label (name) | Default | Description |
|---|---|---|
| Application instance type (AppInstanceType) | t2.micro | The application server EC2 instance type. |
| Application host key pair name (AppKeyPairName) | Requires input | Public/private key pairs allow you to securely connect to your application instance after it launches. If you don’t have a key pair in this Region, please create it before continuing. |
SNS configuration
| Parameter label (name) | Default | Description |
|---|---|---|
| Email ID to receive alert notifications (OperatorEMail) | Requires input | The email address that receives notifications about scaling operations. |
Encrypt data configuration
| Parameter label (name) | Default | Description |
|---|---|---|
| Encrypt data at rest (EncryptData) | no | Set to yes to encrypt the data as it leaves your Amazon Kinesis Data Firehose delivery stream. |
Amazon ES configuration
| Parameter label (name) | Default | Description |
|---|---|---|
| Amazon ES domain name (ESDomainName) | Requires input | The user-defined Amazon ES domain name. |
| Amazon ES version (ESVersion) | 6.3 | The Elasticsearch version for the Amazon ES domain. |
| Number of instances to run in cluster (ESClusterInstanceCount) | 1 | The number of instances in the cluster. For two Availability Zones, you must choose instances in multiples of two. |
| Cluster instance type (ESInstanceType) | m4.large.elasticsearch | The instance type for Amazon ES nodes. |
| Instance type for dedicated master (DedicatedMasterType) | m4.large.elasticsearch | The instance type for the dedicated master nodes. |
| Number of dedicated masters in a cluster (DedicatedMasterCount) | 0 | The number of dedicated masters to run. Leave the default value if you don't want dedicated master instances. |
| Provisioned IOPS (IOPS) | 0 | The provisioned IOPS value must be an integer between 1000 and 16000. |
| EBS volume size (VolumeSize) | 10 | The EBS volume size in GB. The total cluster size is the EBS volume size x the instance count. |
| Everyday snapshot time (AutomatedSnapshotStartHour) | 0 | The hour at which the daily automated snapshot is scheduled. The value should be between 0 and 23. |
| EBS volume type (VolumeType) | gp2 | The type of volume used for instances in a cluster. |
Amazon Elasticsearch destination configuration for Amazon Kinesis Data Firehose:
| Parameter label (name) | Default | Description |
|---|---|---|
| Index name (ESIndex) | Requires input | The index name of the Amazon ES domain. |
| Type name (ESType) | Requires input | The name of the Amazon ES type. |
| Index rotation (ESIndexRotation) | NoRotation | The frequency at which the Amazon ES index will be rotated. |
| Buffer interval (ESBufferInterval) | 300 | The number of seconds to buffer data before delivering to Amazon S3 to be copied to Amazon ES (60 to 900). |
| Buffer size (ESBufferSize) | 5 | MB of data to buffer before delivering to Amazon S3 to be copied to Elasticsearch (1 to 100). |
Amazon Redshift cluster configuration
| Parameter label (name) | Default | Description |
|---|---|---|
| Database name (DatabaseName) | Requires input | The name of the first database to be created when the Amazon Redshift cluster is created. |
| Cluster type (ClusterType) | single-node | The type of Amazon Redshift cluster. |
| Number of nodes (NumberOfNodes) | 1 | The number of compute nodes in the Amazon Redshift cluster. For multi-node clusters, the NumberOfNodes parameter must be greater than 1. |
| Node type (NodeType) | dc2.large | The type of Amazon Redshift node to be provisioned. |
| Redshift port number (RedshiftPortNumber) | 5439 | The Amazon Redshift publicly accessible port number. |
| Include demo data (isDemo) | no | Set to yes if you want to ingest demo data into Amazon Redshift. |
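The ClusterType and NumberOfNodes parameters have to agree with each other. The following sketch checks the combination before you launch the stack. The multi-node rule comes from the parameter description; treating single-node as exactly one node is an assumption based on the default values, and the function is illustrative rather than part of the Quick Start.

```python
def validate_cluster(cluster_type, number_of_nodes):
    """Raise ValueError if the Redshift cluster type and node count disagree."""
    if cluster_type == "single-node":
        # Assumption: a single-node cluster has exactly one node (the defaults).
        if number_of_nodes != 1:
            raise ValueError("single-node clusters use NumberOfNodes = 1")
    elif cluster_type == "multi-node":
        if number_of_nodes <= 1:
            raise ValueError("multi-node clusters need NumberOfNodes greater than 1")
    else:
        raise ValueError(f"unknown ClusterType: {cluster_type}")
    return True


print(validate_cluster("single-node", 1))
print(validate_cluster("multi-node", 2))
```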
Amazon Redshift configuration for Amazon Kinesis Data Firehose
| Parameter label (name) | Default | Description |
|---|---|---|
| Master user name (MasterUser) | masteruser | The name of the master user of the Amazon Redshift cluster. |
| Master user password (MasterUserPassword) | Requires input | The master user password for the Amazon Redshift cluster. |
| Table name (RedshiftTableName) | apache_logs.access_logs | The name of the table in the Amazon Redshift cluster. Do not change it. |
| Column pattern (RedshiftColumns) | Requires input | The comma-separated list of the columns in the destination Amazon Redshift table. |
| Buffer interval (RedshiftBufferInterval) | 300 | The number of seconds to buffer data before delivering to Amazon Redshift (60 to 900). |
| Buffer size (RedshiftBufferSize) | 5 | MB of data to buffer before delivering to Amazon S3 (1 to 128). |
Amazon S3 destination configuration for Amazon Kinesis Data Firehose:
| Parameter label (name) | Default | Description |
|---|---|---|
| Buffer interval (S3BufferInterval) | 300 | The number of seconds to buffer data before delivering to Amazon S3 (60 to 900). |
| Buffer size (S3BufferSize) | 5 | MB of data to buffer before delivering to Amazon S3 (1 to 128). |
| Destination prefix (S3DestinationPrefix) | AggregatedData | The name of the prefix where the aggregated data will be stored. |
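The buffer interval and buffer size work together: Kinesis Data Firehose delivers a batch as soon as either threshold is reached, whichever comes first. The following sketch is a simplified model of that behavior, using the default values of 300 seconds and 5 MB and the ranges given for the Amazon S3 destination. The record arrival times and sizes are hypothetical, and the model ignores details of the real service such as compression and retries.

```python
def simulate_buffering(records, buffer_mb=5, buffer_seconds=300):
    """Model Firehose-style buffering.

    records: list of (arrival_second, size_mb) tuples, sorted by arrival time.
    Returns a list of (delivery_second, delivered_mb, trigger) tuples.
    """
    if not 60 <= buffer_seconds <= 900:
        raise ValueError("buffer interval must be between 60 and 900 seconds")
    if not 1 <= buffer_mb <= 128:
        raise ValueError("buffer size must be between 1 and 128 MB")
    deliveries = []
    batch_start, batch_mb = None, 0.0
    for arrival, size_mb in records:
        # Deliver the open batch first if its interval elapsed before this record.
        if batch_start is not None and arrival - batch_start >= buffer_seconds:
            deliveries.append((batch_start + buffer_seconds, batch_mb, "interval"))
            batch_start, batch_mb = None, 0.0
        if batch_start is None:
            batch_start = arrival
        batch_mb += size_mb
        # Deliver immediately once the size threshold is reached.
        if batch_mb >= buffer_mb:
            deliveries.append((arrival, batch_mb, "size"))
            batch_start, batch_mb = None, 0.0
    if batch_start is not None:
        deliveries.append((batch_start + buffer_seconds, batch_mb, "interval"))
    return deliveries


# Hypothetical traffic: a burst that fills the buffer, then a trickle.
traffic = [(0, 2.0), (10, 2.0), (20, 2.0), (400, 1.0), (420, 1.0)]
deliveries = simulate_buffering(traffic)
for delivery in deliveries:
    print(delivery)
```

In this model, a lower buffer interval shortens the delay before quiet periods are delivered, while a larger buffer size produces fewer, larger objects during bursts.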
Custom website configuration:
| Parameter label (name) | Default | Description |
|---|---|---|
| Website content S3 bucket (WebsiteContent) | Requires input | The Amazon S3 location where your custom website contents are uploaded. This Quick Start will deploy your website from this location. Leave blank if there is no site to deploy. |
AWS Quick Start configuration:
Note We recommend that you keep the default settings for the following two
parameters, unless you are customizing the Quick Start templates for your own
deployment projects. Changing the settings of these parameters will automatically
update code references to point to a new Quick Start location. For additional details,
see the AWS Quick Start Contributor’s Guide.
| Parameter label (name) | Default | Description |
|---|---|---|
| Quick Start S3 bucket name (QSS3BucketName) | aws-quickstart | The S3 bucket you created for your copy of Quick Start assets, if you decide to customize or extend the Quick Start for your own use. The bucket name can include numbers, lowercase letters, uppercase letters, and hyphens, but should not start or end with a hyphen. |
| Quick Start S3 key prefix (QSS3KeyPrefix) | quickstart-ct-clickstream-analytics/ | The S3 key name prefix used to simulate a folder for your copy of Quick Start assets, if you decide to customize or extend the Quick Start for your own use. This prefix can include numbers, lowercase letters, uppercase letters, hyphens, and forward slashes. |
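If you point these two parameters at your own copy of the Quick Start assets, you can check the values against the character rules described above before you launch the stack. The regular expressions in the following sketch are written from those descriptions and are an assumption; they are not copied from the Quick Start templates.

```python
import re

# Letters, numbers, and hyphens; must not start or end with a hyphen.
BUCKET_NAME = re.compile(r"[0-9a-zA-Z]([0-9a-zA-Z-]*[0-9a-zA-Z])?")
# Letters, numbers, hyphens, and forward slashes.
KEY_PREFIX = re.compile(r"[0-9a-zA-Z/-]*")


def valid_bucket_name(name):
    """Return True if the bucket name follows the rules described in this guide."""
    return BUCKET_NAME.fullmatch(name) is not None


def valid_key_prefix(prefix):
    """Return True if the key prefix follows the rules described in this guide."""
    return KEY_PREFIX.fullmatch(prefix) is not None


print(valid_bucket_name("aws-quickstart"))
print(valid_key_prefix("quickstart-ct-clickstream-analytics/"))
print(valid_bucket_name("-my-bucket"))
```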
OPTION 2: PARAMETERS FOR DEPLOYING CLICKSTREAM ANALYTICS INTO AN EXISTING VPC
Network configuration:
| Parameter label (name) | Default | Description |
|---|---|---|
| Availability Zones (AvailabilityZones) | Requires input | The list of Availability Zones to use for the subnets in the VPC. The Quick Start uses two Availability Zones from your list and preserves the logical order you specify. |
| Existing VPC ID (VPC) | Requires input | Choose an existing VPC. |
Subnet configuration:
| Parameter label (name) | Default | Description |
|---|---|---|
| Existing public subnet ID in AZ-1 (PublicSubnetA) | Requires input | The public subnet in Availability Zone 1. |
| Existing public subnet ID in AZ-2 (PublicSubnetB) | Requires input | The public subnet in Availability Zone 2. |
| Existing private subnet ID in AZ-1 (PrivateSubnetA) | Requires input | The private subnet in Availability Zone 1. |
| Existing private subnet ID in AZ-2 (PrivateSubnetB) | Requires input | The private subnet in Availability Zone 2. |
Bastion host configuration:
| Parameter label (name) | Default | Description |
|---|---|---|
| Bastion host key pair name (KeyPairName) | Requires input | A public/private key pair, which allows you to connect securely to your instance after it launches. This is the key pair you created in your preferred AWS Region; see the Technical requirements section. If you do not have one in this Region, please create it before continuing. |
| Bastion instance type (BastionInstanceType) | t2.micro | The bastion host EC2 instance type. |
| Allowed CIDR for external access to bastion (RemoteAccessCIDR) | Requires input | A CIDR block that’s allowed external access to the bastion. We recommend that you use a constrained CIDR range to reduce the potential of inbound attacks from unknown IP addresses (see http://checkip.dyndns.org/). |
| Bastion AMI operating system (BastionAMIOS) | Amazon-Linux-HVM | The Amazon Linux distribution for the Amazon Machine Image (AMI) to be used for the bastion instances. |
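The recommendation to use a constrained CIDR range for RemoteAccessCIDR can be checked before launching the stack. The following is a minimal sketch that uses Python's standard ipaddress module; the function name and the 256-address threshold are assumptions chosen for illustration and are not rules enforced by the Quick Start.

```python
import ipaddress


def check_remote_access_cidr(cidr, max_addresses=256):
    """Classify a candidate RemoteAccessCIDR value.

    max_addresses is a hypothetical threshold, not a Quick Start rule.
    """
    # strict=True raises ValueError for malformed input and for values
    # whose host bits are set, such as 203.0.113.7/24.
    network = ipaddress.ip_network(cidr, strict=True)
    if network.num_addresses > max_addresses:
        return "too broad"
    return "ok"


print(check_remote_access_cidr("203.0.113.7/32"))  # ok
print(check_remote_access_cidr("0.0.0.0/0"))       # too broad
```

A single address in /32 form is usually the tightest choice when only one workstation needs access to the bastion.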
Web server configuration
| Parameter label (name) | Default | Description |
|---|---|---|
| Application instance type (AppInstanceType) | t2.micro | The application server EC2 instance type. |
| Application host key pair name (AppKeyPairName) | Requires input | Public/private key pairs allow you to securely connect to your application instance after it launches. If you don’t have a key pair in this Region, please create it before continuing. |
SNS configuration
| Parameter label (name) | Default | Description |
|---|---|---|
| Email ID to receive alert notifications (OperatorEMail) | Requires input | The email address to notify you, if there are any scaling operations. |
Encrypt data configuration
| Parameter label (name) | Default | Description |
|---|---|---|
| Encrypt data at rest (EncryptData) | no | Set to yes to encrypt the data as it leaves your Amazon Kinesis Data Firehose delivery stream. |
Amazon ES configuration
| Parameter label (name) | Default | Description |
|---|---|---|
| Elasticsearch domain name (ESDomainName) | Requires input | The user-defined Amazon ES domain name. |
| Elasticsearch version (ESVersion) | 6.3 | The user-defined Amazon ES version. |
| Number of instances to run in cluster (ESClusterInstanceCount) | 1 | For two Availability Zones, you must choose instances in multiples of two. |
| Cluster instance type (ESInstanceType) | m4.large.elasticsearch | The instance type for Amazon ES nodes. |
| Instance type for dedicated master (DedicatedMasterType) | m4.large.elasticsearch | The master instance type for Amazon ES nodes. |
| Number of dedicated masters in a cluster (DedicatedMasterCount) | 0 | The number of dedicated masters to run. Leave the default value for this field, if you don’t want a dedicated master instance. |
| IOPS for cluster (IOPS) | 0 | The provisioned IOPS value must be an integer between 1000 and 16000. |
| EBS volume size (VolumeSize) | 10 | IOPS total cluster size in GB (EBS volume size x instance count). |
| Everyday snapshot time (AutomatedSnapshotStartHour) | 0 | Schedule automated snapshots. Value should be between 0-23. |
| EBS volume type (VolumeType) | gp2 | The type of volume used for instances in a cluster. |
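The numeric constraints in the Amazon ES table can be sanity-checked before you submit the parameters. The sketch below is illustrative only: the function name and the availability_zones argument are hypothetical, and treating an IOPS value of 0 (the default) as "no provisioned IOPS" is an assumption that this guide does not state explicitly.

```python
def validate_es_parameters(instance_count, iops, snapshot_hour, availability_zones=2):
    """Return a list of problems found in the Amazon ES parameter values."""
    problems = []
    # From the table: for two Availability Zones, use multiples of two.
    if availability_zones == 2 and instance_count % 2 != 0:
        problems.append("ESClusterInstanceCount must be a multiple of two")
    # Assumption: 0 (the default) means provisioned IOPS are not used.
    if iops != 0 and not 1000 <= iops <= 16000:
        problems.append("IOPS must be between 1000 and 16000")
    # From the table: the snapshot hour is in the range 0-23.
    if not 0 <= snapshot_hour <= 23:
        problems.append("AutomatedSnapshotStartHour must be between 0 and 23")
    return problems


print(validate_es_parameters(instance_count=2, iops=0, snapshot_hour=0))  # []
print(len(validate_es_parameters(instance_count=3, iops=500, snapshot_hour=24)))  # 3
```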
Amazon Elasticsearch destination configuration for Amazon Kinesis Data Firehose:
| Parameter label (name) | Default | Description |
|---|---|---|
| Index name (ESIndex) | Requires input | The index name of the Amazon ES domain. |
| Type name (ESType) | Requires input | The name of the Amazon ES type. |
| Index rotation (ESIndexRotation) | NoRotation | The frequency at which the Amazon ES index will be rotated. |
| Buffer interval (ESBufferInterval) | 300 | The number of seconds to buffer data before delivering to Amazon S3 to be copied to Amazon ES (60 to 900). |
| Buffer size (ESBufferSize) | 5 | MB of data to buffer before delivering to Amazon S3 to be copied to Amazon ES (1 to 100). |
Amazon Redshift cluster configuration
| Parameter label (name) | Default | Description |
|---|---|---|
| Database name (DatabaseName) | Requires input | The name of the first database to be created when the Amazon Redshift cluster is created. |
| Cluster type (ClusterType) | single-node | The type of Amazon Redshift cluster. |
| Number of nodes (NumberOfNodes) | 1 | The number of compute nodes in the Amazon Redshift cluster. For multi-node clusters, the NumberOfNodes parameter must be greater than 1. |
| Node type (NodeType) | dc2.large | The type of Amazon Redshift node to be provisioned. |
| Redshift port number (RedshiftPortNumber) | 5439 | The Amazon Redshift publicly accessible port number. |
| Include demo data (isDemo) | no | Set to yes if you want to ingest demo data into Amazon Redshift. |
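The relationship between ClusterType and NumberOfNodes can be expressed as a small check. This is a sketch with a hypothetical function name; the multi-node rule comes from the table above, while requiring exactly one node for a single-node cluster is an assumption based on the default values.

```python
def redshift_node_settings_ok(cluster_type, number_of_nodes):
    """Check that NumberOfNodes is consistent with ClusterType."""
    if cluster_type == "multi-node":
        # From the table: multi-node clusters need more than one node.
        return number_of_nodes > 1
    if cluster_type == "single-node":
        # Assumption: a single-node cluster uses exactly one node.
        return number_of_nodes == 1
    raise ValueError("unknown cluster type: " + cluster_type)


print(redshift_node_settings_ok("single-node", 1))  # True
print(redshift_node_settings_ok("multi-node", 1))   # False
```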
Amazon Redshift configuration for Amazon Kinesis Data Firehose
| Parameter label (name) | Default | Description |
|---|---|---|
| Master user name (MasterUser) | masteruser | The name of the master user of the Amazon Redshift cluster. |
| Master user password (MasterUserPassword) | Requires input | The master user password for the Amazon Redshift cluster. |
| Table name (RedshiftTableName) | apache_logs.access_logs | The name of the table in the Amazon Redshift cluster. Do not change it. |
| Column pattern (RedshiftColumns) | Requires input | The comma-separated list of the columns in the destination Amazon Redshift table. |
| Buffer interval (RedshiftBufferInterval) | 300 | The number of seconds to buffer data before delivering to Amazon Redshift (60 to 900). |
| Buffer size (RedshiftBufferSize) | 5 | MB of data to buffer before delivering to Amazon Redshift (1 to 128). |
Amazon S3 destination configuration for Amazon Kinesis Data Firehose
| Parameter label (name) | Default | Description |
|---|---|---|
| Buffer interval (S3BufferInterval) | 300 | Number of seconds to buffer data before delivering to Amazon S3 (60 to 900). |
| Buffer size (S3BufferSize) | 5 | MB of data to buffer before delivering to Amazon S3 (1 to 128). |
| Destination prefix (S3DestinationPrefix) | AggregatedData | The name of the prefix where the aggregated data will be stored. |
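The buffer interval and buffer size ranges differ slightly between the three Kinesis Data Firehose destinations (the Amazon ES buffer size tops out at 100 MB, the other two at 128 MB). A lookup table makes the differences easy to check. The following sketch uses the ranges listed in this guide; the dictionary and function names are hypothetical.

```python
# Allowed ranges as listed in the parameter tables of this guide.
BUFFER_LIMITS = {
    "ES": {"interval": (60, 900), "size": (1, 100)},
    "Redshift": {"interval": (60, 900), "size": (1, 128)},
    "S3": {"interval": (60, 900), "size": (1, 128)},
}


def check_buffering(destination, interval_seconds, size_mb):
    """Return a list of out-of-range buffering settings for a destination."""
    limits = BUFFER_LIMITS[destination]
    problems = []
    low, high = limits["interval"]
    if not low <= interval_seconds <= high:
        problems.append("interval must be %d to %d seconds" % (low, high))
    low, high = limits["size"]
    if not low <= size_mb <= high:
        problems.append("size must be %d to %d MB" % (low, high))
    return problems


print(check_buffering("ES", 300, 5))    # [] (the defaults are valid)
print(check_buffering("ES", 300, 128))  # ['size must be 1 to 100 MB']
```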
Custom website configuration:
| Parameter label (name) | Default | Description |
|---|---|---|
| Website content S3 bucket (WebsiteContent) | Requires input | The Amazon S3 location where your custom website contents are uploaded. This Quick Start will deploy your website from this location. Leave blank if there is no site to deploy. |
AWS Quick Start configuration:
Note We recommend that you keep the default settings for the following two
parameters, unless you are customizing the Quick Start templates for your own
deployment projects. Changing the settings of these parameters will automatically
update code references to point to a new Quick Start location. For additional details,
see the AWS Quick Start Contributor’s Guide.
| Parameter label (name) | Default | Description |
|---|---|---|
| Quick Start S3 bucket name (QSS3BucketName) | aws-quickstart | The S3 bucket you have created for your copy of Quick Start assets, if you decide to customize or extend the Quick Start for your own use. The bucket name can include numbers, lowercase letters, uppercase letters, and hyphens, but should not start or end with a hyphen. |
| Quick Start S3 key prefix (QSS3KeyPrefix) | quickstart-ct-clickstream-analytics/ | The S3 key name prefix used to simulate a folder for your copy of Quick Start assets, if you decide to customize or extend the Quick Start for your own use. This prefix can include numbers, lowercase letters, uppercase letters, hyphens, and forward slashes. |
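If you do customize these two parameters, the character rules described above can be tested locally before you launch the stack. The regular expressions below are a sketch written from the descriptions in this guide; they are not copied from the Quick Start templates, and the function names are hypothetical.

```python
import re

# Letters, digits, and hyphens; must not start or end with a hyphen.
BUCKET_NAME = re.compile(r"[0-9a-zA-Z](?:[0-9a-zA-Z-]*[0-9a-zA-Z])?")
# Letters, digits, hyphens, and forward slashes.
KEY_PREFIX = re.compile(r"[0-9a-zA-Z/-]*")


def is_valid_bucket_name(name):
    """Check a QSS3BucketName value against the rules described above."""
    return BUCKET_NAME.fullmatch(name) is not None


def is_valid_key_prefix(prefix):
    """Check a QSS3KeyPrefix value against the rules described above."""
    return KEY_PREFIX.fullmatch(prefix) is not None


print(is_valid_bucket_name("aws-quickstart"))                       # True
print(is_valid_bucket_name("-my-bucket"))                           # False
print(is_valid_key_prefix("quickstart-ct-clickstream-analytics/"))  # True
```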
5. On the Options page, you can specify tags (key-value pairs) for resources in your stack
and set advanced options. When you’re done, choose Next.
6. On the Review page, review and confirm the template settings. Under Capabilities,
select the two check boxes to acknowledge that the template will create IAM resources
and that it might require the capability to auto-expand macros.
7. Choose Create to deploy the stack.
8. Monitor the status of the stack. When the status is CREATE_COMPLETE, the
Amazon ES cluster is ready.
9. Use the URLs displayed in the Outputs tab for the stack to view the resources that were
created and to verify the deployment, as discussed in the next step.
Step 3. Test the deployment
When the Quick Start deployment is complete, you can validate and test the deployment by
checking the resources in the Outputs tab of the AWS CloudFormation console.
Figure 3: Clickstream analytics outputs after successful deployment
Confirm the following:
The ALB endpoint (LoadBalancerDNSEndpoint) listed in the Outputs tab should open
the default Apache web server homepage, if you open it in a web browser.
The Amazon ES cluster (ElasticSearchDomainEndpoint) listed in the Outputs tab for
the stack is available in the Amazon Elasticsearch Service console at
https://console.aws.amazon.com/es/, and Kibana is accessible via a web browser at the
location mentioned in the Amazon Elasticsearch Service console.
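The same output values can be read programmatically. The sketch below assumes you already have a response in the shape returned by the AWS CloudFormation DescribeStacks API (for example, from an AWS SDK call); it does not call AWS itself. The function name and the sample endpoint values are hypothetical, while the two output keys are the ones named in this section.

```python
def stack_outputs(describe_stacks_response):
    """Return the outputs of the first stack as a key-to-value dictionary."""
    stack = describe_stacks_response["Stacks"][0]
    if stack["StackStatus"] != "CREATE_COMPLETE":
        raise RuntimeError("stack is not ready: " + stack["StackStatus"])
    return {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}


# Hypothetical response; the endpoint values are placeholders.
sample_response = {
    "Stacks": [
        {
            "StackStatus": "CREATE_COMPLETE",
            "Outputs": [
                {"OutputKey": "LoadBalancerDNSEndpoint", "OutputValue": "alb.example.com"},
                {"OutputKey": "ElasticSearchDomainEndpoint", "OutputValue": "es.example.com"},
            ],
        }
    ]
}

outputs = stack_outputs(sample_response)
print(outputs["LoadBalancerDNSEndpoint"])
```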
Note S3 buckets are retained after you delete the AWS CloudFormation stacks
created by this Quick Start, so your sample data for the Clickstream Analytics
solution remains available in your AWS account. To remove those buckets, delete the
contents of each bucket, and then delete each bucket. For more information, see the
Amazon S3 documentation.
Quick Start datasets
You can deploy this Quick Start with optional sample datasets and later extend it with your
own dataset when needed.
If you opted for the Quick Start deployment with sample data in Amazon Redshift, a
clickstream_demo schema is created in the Amazon Redshift cluster during deployment.
The clickstream_demo schema has the following two tables and structure:
ga_demo_data: This table holds sample data from Google Analytics for a website's traffic.
Column Data type
CIT VARCHAR (100)
COUNTRY_REGION VARCHAR (1000)
DATE DATE
EXITS INTEGER
MEDIUM VARCHAR (1000)
NUMBER_OF_RECORDS INTEGER
PAGE VARCHAR (1000)
PAGEVIEWS INTEGER
SECTION VARCHAR (1000)
TIME_ON_PAGE INTEGER
TOTAL_DOWNLOADS INTEGER
UNIQUE_VISITORS INTEGER
VISITS INTEGER
schematic_log: The following table holds a sample dataset of clickstream data for
January 2015 released by Wikipedia. The dataset contains referrer-article pairs from the
English language, desktop version of Wikipedia—just a sample of the 4 billion total requests
made in January 2015.
Column Data Type
ACTION VARCHAR (100)
BYTES INTEGER
ITEM VARCHAR (1000)
NUMBER_OF_PURCHASES INTEGER
RESPONSE VARCHAR (100)
PCT_PURCHASE NUMERIC (8,2)
NUMBER_OF_VIEWS INTEGER
BRAND VARCHAR (1000)
CLICKHERE VARCHAR (1000)
CATEGORY VARCHAR (1000)
CLIENTIP VARCHAR (1000)
ITEMID VARCHAR (1000)
MSG VARCHAR (1000)
NUMBER_OF_RECORDS INTEGER
PRODUCTID VARCHAR (1000)
RBYTES INTEGER
RSTAT INTEGER
SERVERIP VARCHAR (1000)
SESSIONID VARCHAR (1000)
TIMESTAMP TIMESTAMP
URL VARCHAR (2000)
Optional: Analyzing and visualizing data with Amazon QuickSight
Post-deployment, using Amazon QuickSight, you can import or connect to your data,
analyze it, and share your data visualizations in reports and dashboards.
SIGNING IN TO AMAZON QUICKSIGHT
1. Go to the Amazon QuickSight page at https://quicksight.aws.amazon.com/.
2. In QuickSight account name, enter your account name. This is the same name you
used to create an Amazon QuickSight subscription. Keep it handy, in case you need it.
3. Provide your email address, if prompted.
4. If the user name is blank, type your user name.
5. Choose one of the following:
– For organizational users: The user name provided by your administrator. Your
account can be based on IAM credentials, a single sign-on (SSO) service, or your
email address. If you have received an email invitation from another Amazon
QuickSight user, it will mention the type of credentials to use.
– For individual users: The user name you created for yourself. This is usually the
IAM credentials you created. User names that contain a semicolon (;) aren't
supported.
6. In Password, type the associated password. If you aren't sure, ask the administrator. If
you create a new password, in Confirm password, retype your password. Passwords
are case-sensitive, must be between 8 and 64 characters in length, and must contain at
least one character from three of the following categories:
– Lowercase letters (a–z)
– Uppercase letters (A–Z)
– Numbers (0–9)
– Non alphanumeric characters (~!@#$%^&*_-+=`|\(){}[]:;"'<>,.?/)
7. Choose Sign in. In some cases, this button is labeled Create account and sign in.
(Only for users invited by email.) You are prompted to type the account name
provided in your email invitation. If you mistype it, you get an authentication error. To
change the account name, choose the account name in Account name, and type in the
correct one.
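The password rules in step 6 can be expressed as a short check. This sketch follows the rules as written in this guide (8 to 64 characters, at least one character from three of the four categories); the function name is hypothetical, and Amazon QuickSight performs its own validation regardless.

```python
import string

# The non-alphanumeric characters listed in step 6.
SPECIAL = "~!@#$%^&*_-+=`|\\(){}[]:;\"'<>,.?/"


def is_valid_quicksight_password(password):
    """Apply the length and character-category rules from step 6."""
    if not 8 <= len(password) <= 64:
        return False
    categories = [
        any(c in string.ascii_lowercase for c in password),
        any(c in string.ascii_uppercase for c in password),
        any(c in string.digits for c in password),
        any(c in SPECIAL for c in password),
    ]
    # At least three of the four categories must be present.
    return sum(categories) >= 3


print(is_valid_quicksight_password("Abcdefg1"))  # True
print(is_valid_quicksight_password("abcdefgh"))  # False
```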
SETTING UP AMAZON QUICKSIGHT
Follow the instructions provided in Setting Up Amazon QuickSight.
CONNECTING TO THE AMAZON REDSHIFT CLUSTER/DATABASE
1. Once you have logged in to Amazon QuickSight, select the Region where you have
launched the Quick Start, so that Amazon QuickSight can access the sample datasets
available in the Amazon Redshift cluster/database.
You should be able to see all Amazon Redshift clusters available in the selected Region,
including the Amazon Redshift cluster created by the AWS CloudFormation template.
2. Create a new data source in Amazon QuickSight by choosing Manage data, and then
choosing New data set. Choose Redshift Auto-discovered, and then provide the
following details:
– Data source name: SampleDataSource
– Instance ID: See the RedshiftCluster value in the CloudFormation outputs section
– Connection type: Public network
– Database name : <DatabaseName parameter>
– Username: <Master user name parameter>
– Password: <Master user password parameter>
Figure 4: Amazon Redshift data source in QuickSight
3. Choose the required table. If you have imported the demo data, this will be
ga_demo_data. A second dataset will need to be created for schematic_log.
4. Choose New Analysis to start creating a new analysis or report in Amazon QuickSight.
Amazon QuickSight should show two datasets to select from for creating an analysis, as
depicted in the following figure.
Figure 5: Your datasets in Amazon QuickSight
a. Select ga_demo_data to create analysis on the sample dataset.
b. Select the desired fields and the visual types, as shown in the following figure.
Drag and drop the fields for X axis, Value, and Color to see a page view
distribution from the ga_demo_data dataset.
Figure 6: Page view metrics
5. Go back to the QuickSight home page, and then choose New analysis to start creating a
new analysis or report in Amazon QuickSight. This time, select schematic_log as the
dataset.
6. Select the fields and the visual type as shown in the following figure. Drag and drop the
fields for Group by and Value to see a session-wise analysis from the schematic_log
dataset: the number of views for each session with respect to action for a specific
URL.
Figure 7: Session wise analysis
7. Select the fields and the visual type as shown in the following figure. Drag and drop the
fields for Y axis, Value, and Group/Color to see a client-wise analysis from the
schematic_log dataset: the total number of sessions for each client IP address.
Figure 8: Session frequency analysis – client wise
8. Select the fields and the visual type, as shown in the following figure. Drag and drop the
fields for Y axis, Value, and Group/Color to see a response analysis from the
schematic_log dataset. Visualize the number of events captured with a response status
such as Successful, Request Lost, No Response, Error Response, etc.
Figure 9: Response analysis
Optional: Ingesting Apache web access logs with Kinesis Data Firehose
The Kinesis Agent is installed and configured on the web servers. It watches the Apache
access logs and ships newly written data to the configured destinations through Kinesis Data
Firehose. The following schema/table (apache_logs.access_logs) is created in Amazon
Redshift to hold Apache access logs.
Column Data Type
HOST VARCHAR (1000)
IDENT VARCHAR (1000)
AUTHUSER VARCHAR (1000)
DATETIME TIMESTAMP
REQUEST VARCHAR (4000)
RESPONSE VARCHAR (4000)
BYTES VARCHAR (4000)
The Kinesis agent is configured with the optionName LOGTOJSON and logFormat
COMMONAPACHELOG to transform common Apache access logs to JSON format before
sending to the Kinesis Data Firehose delivery stream.
COMMONAPACHELOG is the Apache Common Log format. Each log entry has the
following pattern by default: "%{host} %{ident} %{authuser} [%{datetime}] \"%{request}\"
%{response} %{bytes}"
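To make the pattern concrete, the following sketch parses one Common Log Format line into the same seven fields and prints it as JSON. It only approximates what the LOGTOJSON option does and is not the Kinesis Agent's implementation; the regular expression, the function name, and the sample log line are assumptions for illustration.

```python
import json
import re

# One named group per field of the COMMONAPACHELOG pattern shown above.
COMMON_LOG = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<datetime>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<response>\d{3}) (?P<bytes>\S+)'
)


def common_log_to_json(line):
    """Return a JSON string for one log line, or None if it does not match."""
    match = COMMON_LOG.fullmatch(line.strip())
    if match is None:
        return None
    return json.dumps(match.groupdict())


sample_line = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326'
)
print(common_log_to_json(sample_line))
```

The field names map one-to-one to the columns of the apache_logs.access_logs table described above.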
If you have existing web logs that you want to deliver to Amazon Redshift as a one-time
activity, you can add the following configuration to your Kinesis Agent. Use the
deliveryStream name as provisioned in your stack, and create a temp file with the existing
web log data (/tmp/mylog.txt). This configuration applies the LOGTOJSON option with the
COMMONAPACHELOG log format.
{
  "cloudwatch.emitMetrics": true,
  "firehose.endpoint": "https://firehose.us-west-2.amazonaws.com",
  "flows": [
    {
      "filePattern": "/tmp/mylog.txt",
      "deliveryStream": "<enter the Redshift delivery stream name provisioned using the stack>",
      "initialPosition": "START_OF_FILE",
      "dataProcessingOptions": [
        {
          "optionName": "LOGTOJSON",
          "logFormat": "COMMONAPACHELOG"
        }
      ]
    }
  ]
}
You can customize the Amazon Redshift schema and Kinesis Agent configuration for other
web servers to suit your use case.
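If you deploy in a Region other than us-west-2, the endpoint in the agent configuration must change accordingly. The sketch below generates the same configuration structure for a given Region and delivery stream; the function name and the stream name used in the example are hypothetical, so substitute the delivery stream name from your own stack.

```python
import json


def build_agent_config(delivery_stream, region, file_pattern="/tmp/mylog.txt"):
    """Build a Kinesis Agent configuration for a one-time log upload."""
    return {
        "cloudwatch.emitMetrics": True,
        "firehose.endpoint": "https://firehose.%s.amazonaws.com" % region,
        "flows": [
            {
                "filePattern": file_pattern,
                "deliveryStream": delivery_stream,
                "initialPosition": "START_OF_FILE",
                "dataProcessingOptions": [
                    {"optionName": "LOGTOJSON", "logFormat": "COMMONAPACHELOG"}
                ],
            }
        ],
    }


# "my-redshift-delivery-stream" is a placeholder, not a name created by the stack.
config = build_agent_config("my-redshift-delivery-stream", "eu-west-1")
print(json.dumps(config, indent=2))
```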
Best practices for using clickstream analytics on AWS
If you have sensitive data, you can enable server-side data encryption in Amazon Kinesis
Data Firehose. Use AWS CloudTrail to record actions taken by a user, role, or an AWS
service in Kinesis Data Firehose. Similarly, to protect sensitive data in Amazon ES, you can
enable encryption of data at rest by using AWS Key Management Service (AWS KMS) and
also use node-to-node encryption. In Amazon S3, you can encrypt objects by using server-
side encryption with either Amazon S3-managed keys or AWS KMS-managed keys. In
Amazon Redshift, you can enable database encryption for your cluster using AWS KMS to
help protect the data at rest.
For Amazon ES, it is a best practice to use three dedicated master nodes and to deploy the
domain across three Availability Zones.
Security
The Amazon Redshift cluster is in a VPC and publicly accessible with a public IP address for
Kinesis Data Firehose to deliver clickstream data. The access is secured by allowing only the
Kinesis Data Firehose IP addresses for each available AWS Region.
The Amazon ES domain is protected from public access. Its access policy is configured to
allow access only from specific IP addresses, in this case your IP address only.
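An IP-based access policy of this kind typically uses an aws:SourceIp condition. The following sketch builds such a policy document as a Python dictionary. It illustrates the general pattern only and is not the policy that the Quick Start templates create; the function name, account ID, domain name, and IP address are all placeholders.

```python
import json


def ip_restricted_es_policy(region, account_id, domain_name, allowed_cidrs):
    """Build an Amazon ES access policy that allows only the given CIDR ranges."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                "Action": "es:ESHttp*",
                "Resource": "arn:aws:es:%s:%s:domain/%s/*"
                % (region, account_id, domain_name),
                # Requests from any other source address are not allowed.
                "Condition": {"IpAddress": {"aws:SourceIp": list(allowed_cidrs)}},
            }
        ],
    }


# All argument values below are placeholders.
policy = ip_restricted_es_policy(
    "us-west-2", "123456789012", "my-domain", ["203.0.113.7/32"]
)
print(json.dumps(policy, indent=2))
```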
FAQ
Q. I encountered a CREATE_FAILED error when I launched the Quick Start.
A. If AWS CloudFormation fails to create the stack, we recommend that you relaunch the
template with Rollback on failure set to No. (This setting is under Advanced in the
AWS CloudFormation console, Options page.) With this setting, the stack’s state will be
retained and the instance will be left running, so you can troubleshoot the issue. (For
Windows, look at the log files in %ProgramFiles%\Amazon\EC2ConfigService and
C:\cfn\log.)
Important When you set Rollback on failure to No, you will continue to incur
AWS charges for this stack. Please make sure to delete the stack when you finish
troubleshooting.
For additional information, see Troubleshooting AWS CloudFormation on the AWS
website.
Q. I encountered a size limitation error when I deployed the AWS CloudFormation
templates.
A. We recommend that you launch the Quick Start templates from the links in this guide or
from another S3 bucket. If you deploy the templates from a local copy on your computer or
from a non-S3 location, you might encounter template size limitations when you create the
stack. For more information about AWS CloudFormation limits, see the AWS
documentation.
Q. I deployed the Quick Start in the EU (London) Region, but it didn’t work.
A. This Quick Start includes services that aren’t supported in all Regions. See the page for
Amazon QuickSight on the AWS website for a list of supported Regions.
Q. I encountered an “S3 bucket already exists” error during deployment.
A. S3 buckets created by this Quick Start are retained after you delete the CloudFormation
stacks, so your Clickstream Analytics sample data remains available in your AWS account.
To remove those buckets, delete the contents of each bucket, and then delete each bucket.
Q. I encountered a problem accessing the Kibana dashboard in Amazon ES.
A. Amazon ES is protected from public access. Make sure that your IP matches the input
parameter Remote Access CIDR, which is whitelisted for Amazon ES.
GitHub repository
To post feedback, submit feature ideas, or report bugs, use the Issues section of the
GitHub repository for this Quick Start. If you’d like to submit code, please review the Quick
Start Contributor’s Guide.
Additional resources
AWS resources
Getting Started Resource Center
AWS General Reference
AWS Glossary
AWS services
Amazon Athena
AWS CloudFormation
Amazon CloudWatch
Amazon EBS
Amazon EC2
Amazon Elasticsearch Service
Kibana plug-in
Amazon Kinesis
Amazon QuickSight
Amazon Redshift
Amazon S3
Amazon SNS
Amazon VPC
Cambridge Technology products and documentation
Cambridge Technology website
Other Quick Start reference deployments
AWS Quick Start home page
Document revisions
Date Change In sections
September 2019 Initial publication —
© 2019, Amazon Web Services, Inc. or its affiliates, and Cambridge Technology. All rights
reserved.
Notices
This document is provided for informational purposes only. It represents AWS’s current product offerings
and practices as of the date of issue of this document, which are subject to change without notice. Customers
are responsible for making their own independent assessment of the information in this document and any
use of AWS’s products or services, each of which is provided “as is” without warranty of any kind, whether
express or implied. This document does not create any warranties, representations, contractual
commitments, conditions or assurances from AWS, its affiliates, suppliers or licensors. The responsibilities
and liabilities of AWS to its customers are controlled by AWS agreements, and this document is not part of,
nor does it modify, any agreement between AWS and its customers.
The software included with this paper is licensed under the Apache License, Version 2.0 (the "License"). You
may not use this file except in compliance with the License. A copy of the License is located at
http://aws.amazon.com/apache2.0/ or in the "license" file accompanying this file. This code is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and limitations under the License.