DNS CLUSTER
Automated Internal DNS Service with Amazon VPC integration
Sławomir Skowron, System Engineer (DevOps Team)
2013
DevOps Krakow #Meet 1
WHAT IS DNS ?
• Domain Name System is a hierarchical and distributed naming system
• Essentially a name service for TCP/IP networks
• Provides an IP address resolution mechanism
• Adds a tree-based domain name space
• The name space is subdivided into zones and starts with the root zone
• One of the first NoSQL key-value databases
DOMAIN NAME SERVERS
Software on servers that stores, manages and serves information about its own part of the domain namespace, called a zone
Two types of servers: master and slave
DNS QUERIES
• Recursive - the server queries other servers until it gets a positive response
• Iterative - the server answers from local data (cache, local zone) or gives a referral saying where to look next
Two types of external queries: Recursive and Iterative
Cached Queries - DNS Cache - improves latency and throughput
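The effect of a DNS cache can be illustrated with a toy example. This is a minimal sketch, not how any real resolver is implemented: `TtlCache`, `cached_lookup` and the `upstream` callable are hypothetical names, and a real cache would honour the TTL carried in each DNS answer rather than a fixed value.

```python
import time


class TtlCache:
    """Toy DNS answer cache: name -> (address, expiry timestamp)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so the cache can be tested
        self._entries = {}

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it so the caller asks upstream again.
            del self._entries[name]
            return None
        return address

    def put(self, name, address, ttl):
        self._entries[name] = (address, self._clock() + ttl)


def cached_lookup(name, cache, upstream, ttl=60):
    """Answer from the cache when possible, otherwise ask `upstream` once."""
    address = cache.get(name)
    if address is None:
        address = upstream(name)   # the slow path: a real network query
        cache.put(name, address, ttl)
    return address
```

Repeated lookups of the same name within the TTL never reach `upstream`, which is where the latency and throughput gain comes from.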
AMAZON EC2 DNS (VPC) PROBLEMS
• Route 53 does not (right now) support internal DNS domains
• Amazon VPC internal DNS supports only ec2.internal domains
• Amazon VPC DHCP supports only AWS DNS by default
USE CASE
• Available only in LAN and through VPN
• Only A and SRV - infrastructure DNS
• Resolve locally and forward if the name does not exist locally
• No zone transfers, no slaves, no masters
• Updates are simple, secure and fast
Our own DNS Service
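The "resolve locally, forward otherwise" behaviour from this use case can be sketched in a few lines. This is an illustration only: the zone contents, the `resolve` function and the `forward` callable are hypothetical and stand in for what the DNS server itself does.

```python
# Hypothetical infrastructure zone holding only A and SRV records.
EXAMPLE_ZONE = {
    ("web1.example.internal.", "A"): ["10.0.1.10"],
    ("_app._tcp.example.internal.", "SRV"): ["0 0 8080 web1.example.internal."],
}


def resolve(name, rtype, local_zone, forward):
    """Answer from the local zone if the record exists, else forward.

    `forward` is any callable (name, rtype) -> list of records; in a real
    deployment it would be a query to an upstream resolver.
    """
    records = local_zone.get((name, rtype))
    if records is not None:
        return records              # local data wins, no upstream traffic
    return forward(name, rtype)     # unknown locally: hand the query on
```

With this shape, internal names never leave the cluster, while everything else still resolves through the upstream resolver.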
SOLUTION
• Clustering for High Availability and Performance
• Integration with our VPCs' DHCP
• Availability in every Amazon Region
• Caching
• Fully Automated and Integrated with Instance Provisioning
• Support for our name space
Our own DNS Service
SOLUTION
• Puppet 3 as Configuration Management solution
• Puppet Hiera, PuppetDB integration
• TheForeman - http://theforeman.org/
• Foreman integrates with BIND
• Unbound as DNSCluster core - local zones, forwarder, cache
• Git for storing zones and versioning
WHAT’S WRONG WITH PUPPET ?
• Puppet is slow
• Development workflow with Puppet is hard and slow
• Hard to integrate on machines that were already running before Puppet
• PuppetDB is OK, but it's not scalable enough
• Everything goes through Foreman and BIND in our case
ANSIBLE
• Minimal setup - Python + libs - pip install ansible
• Uses existing auth (root, sudo) over SSH as the default transport, or accelerated mode
• Ad-hoc operations built in
• Async, sync and parallel operations
• Predictable, easy to extend (plugins, connectors, filters, modules)
• Powerful templates in Jinja2
• Outputs in JSON
• Configuration in YAML
ANSIBLE @ BASE
• Two months of work, all in Git
• 15 playbooks (Universal Flow)
• 25 roles
• 180 YAML files
• 52 templates
SOLUTION
• Ansible
• Unbound as DNSCluster core - local zones, forwarder, cache
• Git for storing zones and versioning
• Amazon VPC DHCP integration - under development
• ETCD integration - under development
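To show what "local zones, forwarder, cache" can look like in Unbound terms, here is a hedged sketch that renders a small configuration fragment from a host list. It is not the template used in the talk: `render_unbound_conf`, the zone name, the hosts and the forwarder address are all hypothetical. The `local-zone`, `local-data`, `forward-zone` and `forward-addr` directives are standard Unbound options, and the `transparent` zone type lets names without local data be resolved normally.

```python
def render_unbound_conf(zone, hosts, forwarders):
    """Render an Unbound config fragment as text.

    zone       -- zone name without the trailing dot, e.g. "example.internal"
    hosts      -- mapping of short host name -> IPv4 address (A records only)
    forwarders -- upstream resolver addresses for everything else
    """
    lines = [
        "server:",
        # transparent: answer from local data, resolve other names normally
        f'    local-zone: "{zone}." transparent',
    ]
    for host, ip in sorted(hosts.items()):
        lines.append(f'    local-data: "{host}.{zone}. IN A {ip}"')

    # Forward every query that local data cannot answer.
    lines += ["", "forward-zone:", '    name: "."']
    for addr in forwarders:
        lines.append(f"    forward-addr: {addr}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_unbound_conf(
        "example.internal",
        {"web1": "10.0.1.10", "db1": "10.0.1.20"},
        ["10.0.0.2"],
    ))
```

Generating the file from data kept in Git keeps every zone change versioned and reviewable, which fits the "Git for zones" bullet above.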
IMPROVEMENT
• Simple workflow
• Faster development
• Fast Deploy with low memory/cpu consumption
• No central DB
• All data are stored in 3 places and can be restored from running machines
• Works as a push or pull workflow
• Integrated with VPC DHCP when a new DNSCluster is created
KISS as core thinking
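The VPC DHCP integration amounts to pointing a VPC's DHCP options at the DNSCluster nodes. The slides do not show how this was done, so the following is only a sketch under assumptions: `dhcp_options_for_cluster` is a hypothetical helper, and the use of boto3 mentioned in the comments is an assumption, not something named in the talk.

```python
def dhcp_options_for_cluster(dns_node_ips, domain_name):
    """Build DHCP option entries that point instances at DNSCluster nodes.

    The result has the Key/Values shape used by the EC2 API. Assuming
    boto3, it could be passed as `DhcpConfigurations` to
    `ec2.create_dhcp_options(...)`, and the resulting options set attached
    to a VPC with `ec2.associate_dhcp_options(...)`.
    """
    if not dns_node_ips:
        raise ValueError("at least one DNS node address is required")
    return [
        {"Key": "domain-name-servers", "Values": list(dns_node_ips)},
        {"Key": "domain-name", "Values": [domain_name]},
    ]


if __name__ == "__main__":
    # Hypothetical node addresses and domain.
    print(dhcp_options_for_cluster(["10.0.1.5", "10.0.2.5"], "example.internal"))
```

Keeping the payload construction separate from the API call makes it easy to check the options before anything touches the VPC.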
DNSCLUSTER PERFORMANCE
Queries per second / Concurrency
[Chart: QPS (y-axis, 500 to 2500) against concurrency (x-axis: 1, 500, 10000). Series:]
• AWS DNS
• DNSCLUSTER 1 node (1 cpu core – ec2.x1.small)
• UNBOUND local cache (forwarders: 3 dnscluster nodes – 3 x ec2.x1.small), pass 1 – 1 unbound thread
• UNBOUND local cache (forwarders: 3 dnscluster nodes – 3 x ec2.x1.small), pass 2 – from cache – 1 unbound thread
• UNBOUND local cache (forwarders: 3 dnscluster nodes – 3 x ec2.x1.small), pass 2 – from cache – 2 unbound threads
DNSCLUSTER PERFORMANCE
Latency / Concurrency
[Chart: latency in seconds (y-axis, 0.02 to 0.12) against concurrency (x-axis: 1, 500, 10000). Series:]
• AWS DNS
• DNSCLUSTER 1 node (1 cpu core – ec2.x1.small)
• UNBOUND local cache (forwarders: 3 dnscluster nodes – 3 x ec2.x1.small), pass 1 – 1 unbound thread
• UNBOUND local cache (forwarders: 3 dnscluster nodes – 3 x ec2.x1.small), pass 2 – from cache – 1 unbound thread
• UNBOUND local cache (forwarders: 3 dnscluster nodes – 3 x ec2.x1.small), pass 2 – from cache – 2 unbound threads
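The slides do not say which tool produced these measurements. As a rough illustration of how queries per second and latency can be measured at a given concurrency, here is a minimal thread-based sketch; `benchmark` and its parameters are hypothetical, and the numbers it yields are not comparable with the charts.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def benchmark(resolve, names, concurrency):
    """Run `resolve(name)` for every name using `concurrency` threads.

    Returns the number of queries, overall queries per second, and the
    mean per-query latency in seconds.
    """
    def timed(name):
        start = time.perf_counter()
        resolve(name)
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed, names))
    wall = time.perf_counter() - wall_start

    return {
        "queries": len(latencies),
        "qps": len(latencies) / wall,
        "mean_latency": sum(latencies) / len(latencies),
    }
```

Passing `socket.gethostbyname` as `resolve` would exercise the system resolver; running the same name list twice shows the difference between a first pass and a pass served from cache.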
SOON / NEXT TIME ?
Monitoring and Alerting
second element for our auto scaling
Ansible Universal Template Flow
Created @ Base for simple, consistent creation and destruction of instances