Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech...
TRANSCRIPT
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Log Analytics with Amazon Kinesis and Amazon Elasticsearch
Service
What to do with a terabyte of logs?
Data source → Amazon Kinesis Firehose → Amazon Elasticsearch Service → Kibana
Log analytics architecture
Amazon Elasticsearch Service is a cost-effective managed service that makes it easy to deploy, manage, and scale open source Elasticsearch for log analytics, full-text search, and more.
Amazon Elasticsearch Service benefits
Easy to use
Open-source compatible
Secure
Highly available
AWS integrated
Scalable
Adobe Developer Platform (Adobe I/O)
PROBLEM
• Cost-effective monitoring for an XL amount of log data
• Over 200,000 API calls per second at peak - destinations, response times, bandwidth
• Integrate seamlessly with other components of the AWS ecosystem
SOLUTION
• Log data is routed with Amazon Kinesis to Amazon Elasticsearch Service, then displayed using the Amazon ES Kibana
• The Adobe team can easily see traffic patterns and error rates, quickly identifying anomalies and potential challenges
BENEFITS
• Management and operational simplicity
• Flexibility to try out different cluster configurations during dev and test
Data sources → Amazon Kinesis Streams → Spark Streaming → Amazon Elasticsearch Service
McGraw Hill Education
PROBLEM
• Supporting a wide catalog across multiple services in multiple jurisdictions
• Over 100 million learning events each month
• Tests, quizzes, and learning modules begun / completed / abandoned
SOLUTION
• Search and analyze test results, student/teacher interaction, teacher effectiveness, and student progress
• Application and infrastructure analytics are now integrated to understand operations in real time
BENEFITS
• Confidence to scale throughout the school year: from 0 to 32 TB in 9 months
• Focus on their business, not their infrastructure
Get set up right
Amazon ES overview
Amazon Route 53
Elastic Load Balancing
IAM
CloudWatch
Elasticsearch API
CloudTrail
Data pattern
Amazon ES cluster
logs_01.21.2017
logs_01.22.2017
logs_01.23.2017
logs_01.24.2017
logs_01.25.2017
logs_01.26.2017
logs_01.27.2017
Shard 1
Shard 2
Shard 3
Each index has multiple shards
Each shard contains a set of documents
Each document contains a set of fields and values (host, ident, auth, timestamp, etc.)
One index per day
Deployment of indices to a cluster
• Index 1: Shard 1, Shard 2, Shard 3
• Index 2: Shard 1, Shard 2, Shard 3
Amazon ES cluster
[Diagram: primary and replica copies of shards 1-3 distributed across Instance 1 (master), Instance 2, and Instance 3]
How many instances?
Size based on storage requirements:
• The index size will be about the same as the corpus of source documents
• Double this if you are deploying an index replica
• Use either local storage or up to 1.5 TB of EBS per instance
• Example: a 2 TB corpus will need 4 instances, assuming a replica and using EBS, or with i2.2xlarge nodes (1.6 TB ephemeral storage)
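The storage math above can be sketched as a quick calculation. This is illustrative only: the function applies the slide's rule (index size ≈ corpus size, doubled per replica) and rounds up to whole nodes; the slide's own example provisions four instances, which leaves headroom beyond this bare minimum.

```python
import math

def data_nodes_needed(corpus_gb, replicas=1, storage_per_node_gb=1536):
    """Estimate data node count: index size ~= corpus size, doubled per replica,
    divided by the usable storage on each node, rounded up."""
    total_gb = corpus_gb * (1 + replicas)
    return math.ceil(total_gb / storage_per_node_gb)

# 2 TB corpus, one replica, nodes capped at 1.5 TB of EBS each:
minimum = data_nodes_needed(2048, replicas=1, storage_per_node_gb=1536)
```

In practice you would size above this floor to absorb growth, node failures, and indexing overhead.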
Instance type recommendations
Instance  Workload
T2        Entry point. Dev and test.
M3, M4    Equal read and write volumes.
R3, R4    Read-heavy or workloads with high memory demands (e.g., aggregations).
C4        High-concurrency/indexing workloads.
I2        Up to 1.6 TB of SSD instance storage.
Cluster with no dedicated masters
Amazon ES cluster
[Diagram: shards 1-3 and their replicas spread across Instance 1 (which also serves as master), Instance 2, and Instance 3]
Cluster with dedicated masters
Amazon ES cluster
[Diagram: dedicated master nodes sit apart from the data nodes; the data nodes (Instances 1-3) hold the shards and handle queries and updates]
Master node selection
• < 10 nodes: m3.medium, c4.large
• 11-20 nodes: m4.large, r4.large, m3.large, r3.large
• 21-40 nodes: c4.xlarge, m4.xlarge, r4.xlarge, m3.xlarge
Cluster with zone awareness
Amazon ES cluster
[Diagram: Instances 1-4 split across Availability Zone 1 and Availability Zone 2, with primary and replica shards placed in different zones]
Small use cases
• Logstash co-located on the application instance
• SigV4 signing via the provided output plugin
• Up to 200 GB of data
• m3.medium + 100 GB EBS data nodes
• 3x m3.medium master nodes
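The SigV4 signing the output plugin performs rests on AWS's documented signing-key derivation chain. A minimal stdlib sketch of just that chain (the date, region, and string-to-sign values below are placeholders, not from the slide):

```python
import hashlib
import hmac

def _hmac_sha256(key, msg):
    """HMAC-SHA256 of a string message with a bytes key."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

def sigv4_sign(secret_key, date_stamp, region, service, string_to_sign):
    """Derive the SigV4 signing key (date -> region -> service -> aws4_request)
    and return the hex signature of the string-to-sign."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    k_signing = _hmac_sha256(k_service, "aws4_request")
    return hmac.new(k_signing, string_to_sign.encode("utf-8"),
                    hashlib.sha256).hexdigest()

# Placeholder inputs; in the plugin these come from your credentials and request.
sig = sigv4_sign("secret", "20170301", "us-east-1", "es", "example-string-to-sign")
```

In real use you would rely on the plugin or an AWS SDK rather than hand-rolling this; the sketch only shows why requests to an Amazon ES endpoint need credentials and a region.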
Large use cases
(Diagram: Amazon DynamoDB, AWS Lambda, Amazon S3 bucket, Amazon CloudWatch)
• Data flows from instances and applications via Lambda; CloudWatch Logs is implicit
• SigV4 signing via Lambda and IAM roles
• Up to 5 TB of data
• r3.2xlarge + 512 GB EBS data nodes
• 3x m3.medium master nodes
XL use cases
(Diagram: Amazon Kinesis, Amazon EMR)
• Ingest supported through high-volume technologies like Spark or Kinesis
• Up to 60 TB of data
• r3.8xlarge + 640 GB data nodes
• 3x m3.xlarge master nodes
Best practices
• Data nodes = storage needed / storage per node
• Use GP2 EBS volumes
• Use 3 dedicated master nodes for production deployments
• Enable Zone Awareness
• Set indices.fielddata.cache.size = 40
Amazon Kinesis
Amazon Kinesis: streaming data made easy. These services make it easy to capture, deliver, and process streams on AWS.
Amazon Kinesis Streams
Amazon Kinesis Analytics
Amazon Kinesis Firehose
Amazon Kinesis Streams
• Easy administration
• Build real-time applications with the framework of your choice
• Low cost
Amazon Kinesis Firehose
• Zero administration
• Direct-to-data-store integration
• Seamless elasticity
Amazon Kinesis Analytics
• Interact with streaming data in real time using SQL
• Build fully managed and elastic stream-processing applications that process data for real-time visualizations and alarms
Amazon Kinesis - Firehose vs. Streams
Amazon Kinesis Streams is for use cases that require custom processing, per incoming record, with sub-1 second processing latency, and a choice of stream processing frameworks.
Amazon Kinesis Firehose is for use cases that require zero administration, ability to use existing analytics tools based on Amazon S3, Amazon Redshift and Amazon Elasticsearch, and a data latency of 60 seconds or higher.
Kinesis Firehose overview
Delivery Stream: Underlying AWS resource
Destination: Amazon ES, Amazon Redshift, or Amazon S3
Record: Put records in streams to deliver to destinations
Kinesis Firehose data transformation
• Firehose buffers up to 3 MB of ingested data
• When the buffer is full, Firehose automatically invokes the Lambda function, passing an array of records to be processed
• The Lambda function processes and returns an array of transformed records, with a status for each record
• Transformed records are saved to the configured destination
[{" "recordId": "1234", "data": "encoded-data" }, { "recordId": "1235", "data": "encoded-data" }]
[{ "recordId": "1234", "result": "Ok" "data": "encoded-data" }, { "recordId": "1235", "result": "Dropped" "data": "encoded-data" }]
Kinesis Firehose delivery architecture with transformations
data source → source records → Firehose delivery stream → data transformation function → transformed records → Amazon Elasticsearch Service
Source records, delivery failures, and transformation failures are written to an S3 bucket.
Kinesis Firehose features for ingest
• Serverless scale
• Error handling
• S3 backup
Best practices
• Use smaller buffer sizes to increase throughput, but be careful of concurrency
• Use index rotation based on sizing
• Default delivery stream limits: 2,000 transactions/second, 5,000 records/second, and 5 MB/second
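When pushing records yourself with PutRecordBatch rather than through an agent, each call is limited to 500 records and 4 MB. A simple chunker, sketched here under those limits, keeps batches legal before handing them to the API:

```python
def chunk_records(records, max_records=500, max_bytes=4 * 1024 * 1024):
    """Yield batches of encoded records that stay within PutRecordBatch limits."""
    batch, batch_bytes = [], 0
    for record in records:
        size = len(record)
        # Start a new batch when adding this record would break a limit.
        if batch and (len(batch) >= max_records or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(record)
        batch_bytes += size
    if batch:
        yield batch
```

Each yielded batch would then be sent with the Firehose client's PutRecordBatch call, retrying any records it reports as failed.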
Log analysis with aggregations
Amazon ES aggregations
Buckets - a collection of documents meeting some criterion
Metrics - calculations on the content of buckets
Bucket: time; Metric: count
Query: host:199.72.81.55 with <histogram of verb>
Look up 199.72.81.55 in the field data (matching document IDs: 1, 4, 8, 12, 30, 42, 58, 100...)
Field data values: GET, GET, POST, GET, PUT, GET, GET, POST
Buckets and counts: GET = 5, POST = 2, PUT = 1
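The bucket/metric walk-through above is exactly what a terms aggregation computes. A local sketch with `Counter`, plus the equivalent query body (the index and field names are assumptions, not from the slide):

```python
from collections import Counter

# Verbs pulled from field data for the documents matching the host filter.
verbs = ["GET", "GET", "POST", "GET", "PUT", "GET", "GET", "POST"]

# Bucket by verb, count per bucket -- what the terms aggregation does server-side.
buckets = Counter(verbs)  # GET: 5, POST: 2, PUT: 1

# Equivalent Elasticsearch query body; size: 0 skips the document hits and
# returns only the aggregation results.
query_body = {
    "size": 0,
    "query": {"term": {"host": "199.72.81.55"}},
    "aggs": {"verbs": {"terms": {"field": "verb"}}},
}
```

Elasticsearch returns each bucket with its key and a doc_count, which is the count metric shown above.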
A more complicated aggregation
Bucket: ARN → Bucket: Region → Bucket: eventName → Metric: Count
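That nesting translates into sub-aggregations in the query DSL. A sketch, where the exact field names (such as `userIdentity.arn`) are assumptions for CloudTrail-style events, not from the slide:

```python
# Each outer terms bucket is sub-bucketed again; doc_count at the innermost
# level is the count metric.
nested_aggs = {
    "size": 0,
    "aggs": {
        "by_arn": {
            "terms": {"field": "userIdentity.arn"},
            "aggs": {
                "by_region": {
                    "terms": {"field": "awsRegion"},
                    "aggs": {"by_event": {"terms": {"field": "eventName"}}},
                }
            },
        }
    },
}
```

The response mirrors this shape: buckets of ARNs, each containing buckets of regions, each containing buckets of event names with their counts.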
Best practices
Make sure that your fields are not_analyzed
Visualizations are based on buckets/metrics
Use a histogram on the x-axis first, then sub-aggregate
Run Elasticsearch in the AWS cloud with Amazon Elasticsearch Service
Use Kinesis Firehose to ingest data simply
Use Kibana for monitoring, and Elasticsearch queries for deeper analysis
What to do next
Qwiklab: https://qwiklabs.com/searches/lab?keywords=introduction%20to%20amazon%20elasticsearch%20service
Centralized logging solution: https://aws.amazon.com/answers/logging/centralized-logging/
Our overview page on AWS: https://aws.amazon.com/elasticsearch-service/
Q&A
Thank you for joining!