Cloudreach Voices - AWS CloudTrail for Detecting DNS Gotchas in the Cloud

Upload: cloudreach

Post on 20-Jan-2017


TRANSCRIPT

Copyright ©2015 Cloudreach Limited

Not if. When

Cloudreach Voices Cloudy Issues Explored Our take on Cloud Technology

Cloudreach Voices: AWS CloudTrail and Resolving DNS Gotchas in the Cloud

Copyright ©2016 Cloudreach Limited

What’s the Problem?

Many services in cloud environments use DNS to reference resources.

The hosts underlying services like RDS and ELB in AWS change from time to time, which can cause unintended outages that are hard to detect.

The main problem

The main problem is that some applications like to resolve DNS only once.

If an application stores the result in memory the first time it resolves example.eu-west-1.elb.amazonaws.com, it saves a little time and resources the next time the same DNS name is used. DNS is designed to be cached, but only for a set time.

Every DNS entry comes with a “Time to Live” (TTL), which specifies how long the entry is supposed to be valid. It usually ranges from about 5 minutes to 24 hours.
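To make the difference between "resolve once" and "cache for the TTL" concrete, here is a minimal Python sketch of a cache that honours the TTL. It is an illustration, not how any particular application implements caching: the `TtlDnsCache` class and the `resolve` callable (which is assumed to return both the addresses and the TTL) are hypothetical names introduced here. Python's standard `socket` module does not expose TTLs, so a real resolver would have to come from a DNS library.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical resolver signature: hostname -> (addresses, ttl_seconds).
Resolver = Callable[[str], Tuple[List[str], float]]


class TtlDnsCache:
    """Caches DNS answers, but only for as long as their TTL allows."""

    def __init__(self, resolve: Resolver,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve
        self._clock = clock
        # hostname -> (addresses, time at which the entry expires)
        self._entries: Dict[str, Tuple[List[str], float]] = {}

    def lookup(self, hostname: str) -> List[str]:
        now = self._clock()
        cached = self._entries.get(hostname)
        if cached is not None and now < cached[1]:
            return cached[0]  # still within the TTL: reuse the answer
        # Expired or never seen: resolve again and remember the new expiry.
        addresses, ttl = self._resolve(hostname)
        self._entries[hostname] = (addresses, now + ttl)
        return addresses
```

An application that resolves only once behaves like this cache with an infinite TTL, which is why it never notices when the addresses behind a name change.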

Here’s an example

This one has caused us issues in the past: Nginx. If you’re not familiar with Nginx, it’s a web server that can be used as a reverse proxy. A common setup is to have an ELB with Nginx instances behind it caching responses from your other web servers. In this configuration, you’d typically have a proxy_pass directive in the Nginx config that points directly at the DNS name of the web tier ELB.

This will likely work for a while, until one day your entire cache layer goes down at once and only comes back up when you restart Nginx on each instance. The IP addresses of the ELB have changed, but Nginx never re-resolves the ELB's name into the new IP addresses.

Is there a solution?

To get around this with the community version of Nginx, you have to force it to re-evaluate DNS: define a resolver (pointing at a relevant DNS server) and reference the web ELB through a variable rather than directly in proxy_pass.

With this in place, Nginx caches the DNS entry only for as long as its TTL. This aligns much better with how DNS is supposed to work (and also means there's no real performance impact).

AWS CloudTrail for Detection

Most other applications that have this problem have an option to get around it, but the problem itself can be difficult to detect.

You could go through every part of your application and test for it (e.g. by referencing a DNS entry, updating it and then checking whether the change is eventually reflected in the application), but if you’re using AWS, there’s a good way to detect when it has happened.
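The manual check described above can be sketched roughly in Python: compare the addresses an application is actually using with what the DNS name resolves to right now. The helper names `current_addresses` and `stale_addresses` are hypothetical, and how you obtain the set of addresses in use (for example from the application's open connections) is left as an assumption.

```python
import socket
from typing import Iterable, Set


def current_addresses(hostname: str, port: int = 80) -> Set[str]:
    """Resolve hostname now and return the set of IP addresses it maps to."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # the IP address is the first element of sockaddr.
    return {info[4][0] for info in infos}


def stale_addresses(in_use: Iterable[str], current: Iterable[str]) -> Set[str]:
    """Addresses the application still uses that DNS no longer returns."""
    return set(in_use) - set(current)
```

A non-empty result from `stale_addresses` that persists well beyond the TTL suggests the application resolved the name once and never looked again.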

Take the scenario described previously as an example (an Nginx cache in front of a web tier):

When the IP addresses for the web ELB change, none of the Nginx instances will be able to act as a reverse proxy. It’s worth noting that the IP addresses I’m referring to are those of the ELB itself, not those of the instances behind it. The IPs only need to change when the ELB itself needs to scale or the underlying hosts serving it need to be refreshed.

Fortunately, there’s a good way to see when this happens.

The AWS CloudTrail service logs the API actions taken on your AWS account.

This includes not just your actions, but also the actions of some internal AWS systems. The screenshot on this slide shows a CloudTrail event in which the ELB service deletes one of its unused network interfaces.

If your application has problems every time something like this happens, then DNS being cached for too long is likely your problem.

Some important things to note are the source IP address (elasticloadbalancing.amazonaws.com) and the username (root). Together they show that the API call comes from the ELB service itself, and not from someone making the change manually.
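As a rough illustration of looking for these events programmatically, the Python sketch below filters CloudTrail records for network interface changes made by the ELB service. It assumes the records are dictionaries in CloudTrail's JSON record format (with `sourceIPAddress`, `eventName` and `eventTime` fields). The function name and the choice to also watch `CreateNetworkInterface` are assumptions added here; the deck only shows a delete event.

```python
from typing import Any, Dict, Iterable, List

# Source shown in the deck for calls made by the ELB service itself.
ELB_SOURCE = "elasticloadbalancing.amazonaws.com"

# Assumption: interface creation is as interesting as the deletion
# shown in the deck, so both EC2 event names are watched.
INTERFACE_EVENTS = {"CreateNetworkInterface", "DeleteNetworkInterface"}


def elb_interface_events(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only network interface changes made by the ELB service."""
    return [
        record for record in records
        if record.get("sourceIPAddress") == ELB_SOURCE
        and record.get("eventName") in INTERFACE_EVENTS
    ]


def event_times(records: Iterable[Dict[str, Any]]) -> List[str]:
    """Timestamps to line up against application error logs."""
    return [str(record.get("eventTime")) for record in elb_interface_events(records)]
```

If the timestamps returned by `event_times` line up with the start of your outages, stale DNS is a likely suspect.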

Copyright ©2014 Cloudreach limited

Liked this Deck?

Follow our Twitter, LinkedIn and Blog below
