novel multi-region clusters - instaclustr...cassandra multi-region clusters cs2014.key created date:...
![Page 1: Novel Multi-region Clusters - Instaclustr...Cassandra Multi-Region Clusters CS2014.key Created Date: 20151013042310Z](https://reader033.vdocuments.us/reader033/viewer/2022050314/5f770ea4147af10df51fb6b3/html5/thumbnails/1.jpg)
Novel Multi-region Clusters
Cassandra Deployments Split Between Heterogeneous Data Centres with NAT & DNS-SD
#CassandraSummit
Instaclustr
• Instaclustr provides Cassandra-as-a-service in the cloud (Currently only on AWS — Google Cloud in private beta)
• We currently manage 50+ Cassandra nodes for various customers
• We often get requests to do cool things, and we try to make them happen!
Multi-DC @ Instaclustr
• Cloud ⇄ cloud, “classic” internet-facing data centre ⇄ cloud
• Works out-of-the-box today.
• Requires per-node public IP
• Private network clusters ⇄ Cloud clusters
• Easy if your private network allocates per-node public IP addresses
• VPNs
• Something else?
• Overview of multi-region/data centre clusters
• What is supported out-of-the-box
• Alternative solutions
• Supporting technology overview (NAT/PAT and DNS-SD)
• Implementation
Single Node
• What you get from running apt-get install cassandra and /usr/bin/cassandra
• Fragile (no redundancy)
• Dev/test/sandbox only
[Diagram: a single Cassandra (C*) node]
Multi-node, Single Data Centre
• Two or more servers running Cassandra within one DC
• Replication of data (redundancy)
• Increased capacity (storage + throughput)
• Baseline for production clusters
[Diagram: three Cassandra (C*) nodes in one data centre]
Multi-node, Multi-DC
• Cassandra running in two or more data centres
• Global deployments
• Data near your customers (reduced latency)
• Supported out-of-the-box
[Diagram: three groups of three Cassandra (C*) nodes, one group per data centre]
Snitches
• Understand data centres and racks
• Implementation may automatically determine node DC and rack (Ec2MultiRegionSnitch uses the AWS internal metadata service, GossipingPropertyFileSnitch loads a .properties file)
• Node DC and rack are advertised via Gossip
• Determine node proximity (estimated link latency)
• Cluster may use a combination of Snitch implementations
Data Centres
• Collection of Racks
• Complete replications
• Geographically separate
• Possibly high-latency interconnects (e.g. East Coast US → Sydney, ~300ms round-trip)
Racks
• Collection of nodes
• May fail as a single unit
• Modelled on the traditional DC rack/cage (n servers running off a UPS)
☁
• Amazon Web Services (use Ec2MultiRegionSnitch)
• Data Centre ≡ AWS Region (e.g. US_EAST_1, AP_SOUTHEAST_2)
• Rack ≡ Availability Zone (e.g. us-east-1a, ap-southeast-2b)
• Google Cloud Platform (no out-of-the-box auto-configuring snitch; use GossipingPropertyFileSnitch, or roll your own!)
• Data Centre ≡ GCP Region (e.g. US, Europe)
• Rack ≡ Zone (e.g. us-central1-a, europe-west1-a)
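As a rough illustration of the AWS mapping above, the sketch below splits an availability zone name into a data centre (the region) and a rack (the zone). The class and method names are hypothetical, and this is not how Cassandra's EC2 snitch is implemented; it only mirrors the naming convention on this slide and assumes a zone name is the region name plus one trailing letter (which does not hold for GCP zone names such as us-central1-a).

```java
/** Hypothetical helper: derives DC and rack names from an AWS availability zone name. */
final class ZoneTopology {
    final String dataCentre; // AWS region, e.g. "us-east-1"
    final String rack;       // full availability zone name, e.g. "us-east-1a"

    private ZoneTopology(String dataCentre, String rack) {
        this.dataCentre = dataCentre;
        this.rack = rack;
    }

    /** Assumption: the zone name is the region name followed by a single letter. */
    static ZoneTopology fromAvailabilityZone(String zone) {
        if (zone == null || zone.length() < 2
                || !Character.isLetter(zone.charAt(zone.length() - 1))) {
            throw new IllegalArgumentException("Unexpected zone name: " + zone);
        }
        // Drop the trailing zone letter to obtain the region.
        return new ZoneTopology(zone.substring(0, zone.length() - 1), zone);
    }

    public static void main(String[] args) {
        ZoneTopology t = fromAvailabilityZone("ap-southeast-2b");
        System.out.println("DC=" + t.dataCentre + " rack=" + t.rack);
    }
}
```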
Data Centre Aware
• Cassandra is data centre aware
• Only fetch data from a remote DC if absolutely required (remote data is more “expensive”)
• Clients can be made data centre aware
• If your app knows its DC, client will talk to the closest DC
Cluster cluster = Cluster.builder()
    .addContactPoint(…)
    .withLoadBalancingPolicy(new DCAwareRoundRobinPolicy("US_EAST_1"))
    .build();
Multi DC Support
• Per-node public (internet-facing) IP address
• Optionally, per-node private IP address
• Per-node public address is used for inter-data centre connectivity
• Per-node private address is used for intra-data centre connectivity
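The address-selection rule above can be sketched as follows. This is a minimal model, not Cassandra's internal code: the NodeAddresses class, its fields, and the addressFor method are hypothetical names introduced for illustration, and addresses are kept as plain strings to stay self-contained.

```java
/** Hypothetical model of a node with a public address and an optional private address. */
final class NodeAddresses {
    final String dataCentre;
    final String publicAddress;
    final String privateAddress; // may be null when the node has no private address

    NodeAddresses(String dataCentre, String publicAddress, String privateAddress) {
        this.dataCentre = dataCentre;
        this.publicAddress = publicAddress;
        this.privateAddress = privateAddress;
    }

    /** Returns the address a peer located in callerDataCentre should connect to. */
    String addressFor(String callerDataCentre) {
        boolean sameDataCentre = dataCentre.equals(callerDataCentre);
        // Private addresses are only routable inside the node's own data centre.
        return (sameDataCentre && privateAddress != null) ? privateAddress : publicAddress;
    }

    public static void main(String[] args) {
        NodeAddresses node = new NodeAddresses("US_EAST_1", "54.209.123.195", "192.168.1.4");
        System.out.println(node.addressFor("US_EAST_1"));      // intra-DC peer
        System.out.println(node.addressFor("AP_SOUTHEAST_2")); // inter-DC peer
    }
}
```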
Multi DC Support
• Cloud ⇄ cloud, traditional ⇄ cloud, traditional ⇄ traditional
• Easy to set up per-node public and private addresses
• Private network clusters ⇄ Cloud clusters
• Private networks: 𝑛 public addresses, shared by 𝑥 private addresses. Not 1 ↔ 1 (where often 𝑥 > 𝑛)
• Done via Network Address Translation
IPv4 Address Space Exhaustion
Source: http://www.potaroo.net/tools/ipv4/
Multi-DC Support
• IPv4
• Address exhaustion
• Over time, will become more expensive to purchase addresses
• Wasteful (being a good internet citizen)
Alternatives
• IPv6
• Java supports it ∴ Cassandra probably supports it (untested by us)
• Global IPv6 adoption is ~4% (according to Google: google.com/intl/en/ipv6/statistics.html)
• IPv6/IPv4 hybrid (Teredo, 6over4, et al.)
• AWS EC2 does not support IPv6. End of story. (Elastic Load Balancer does support IPv6)
Alternatives
• VPNs
• tinc, OpenVPN, etc.
• All private address space — no dual addressing
• Requires multiple links — between every DC and per client
• Address space overlaps between multiple VPNs
• Connectivity to multiple clusters is an issue (for multi-cluster apps, centralised monitoring, etc.)
| Data Centres | Links |
|---|---|
| 3 | 3 |
| 5 | 10 |
| 7 | 21 |
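The figures in this table match a full mesh, where every pair of data centres needs its own link: n data centres require n(n − 1)/2 links. A small sketch of that arithmetic (the class and method names are hypothetical):

```java
/** Hypothetical helper: number of point-to-point links in a full mesh of data centres. */
final class FullMesh {
    /** Each of the n data centres links to the other n - 1; each link is shared by two. */
    static long links(int dataCentres) {
        if (dataCentres < 0) {
            throw new IllegalArgumentException("dataCentres must be non-negative");
        }
        return (long) dataCentres * (dataCentres - 1) / 2;
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 5, 7}) {
            System.out.println(n + " data centres -> " + links(n) + " links");
        }
    }
}
```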
Alternatives
• Network Address Translation (NAT) (aka IP Masquerading or Port Address Translation (PAT))
• Deployed on most private networks
• Connectivity between private network clusters ⇄ Cloud clusters
• Supports client connectivity to multiple clusters
NAT Basics
• Re-maps IP address spaces (e.g. Public 96.31.81.80 ↔ Private 192.168.*.*)
• 𝑛 public addresses, shared by 𝑥 private addresses. Not 1 ↔ 1 (where often 𝑛 = 1, 𝑥 > 𝑛)
• Port Address Translation
• Private port ↔ Public port
• Outbound connections only without port forwarding or NAT traversal
• Per DC gateway device — performs NAT and port forwarding
NAT with Inbound Connections
• Static port forwarding (configured on the gateway)
• Automatic port forwarding — UPnP, NAT-PMP/PCP (configured by the application, e.g. Cassandra)
• NAT Traversal — STUN, ICE, etc.
NAT + C*
Situation: 𝑛 Cassandra nodes, 1 public address per data centre
• Port forward different public ports for each node
• Advertise assigned ports
• Modify Cassandra and client applications to connect to advertised ports
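The first step, forwarding a different public port to each node behind a single public address, can be sketched as a gateway-side table. This is an illustrative model only: PortForwardTable and its methods are hypothetical names, and ports are handed out sequentially here for simplicity rather than by any particular gateway's allocation scheme.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical gateway-side table assigning one public port to each private node. */
final class PortForwardTable {
    private final Map<String, Integer> publicPortByPrivateAddress = new LinkedHashMap<>();
    private int nextPort;

    PortForwardTable(int firstPublicPort) {
        this.nextPort = firstPublicPort;
    }

    /** Returns the node's public port, assigning the next free port on first use. */
    int forward(String privateAddress) {
        Integer existing = publicPortByPrivateAddress.get(privateAddress);
        if (existing != null) {
            return existing; // already forwarded; keep the mapping stable
        }
        if (nextPort > 65535) {
            throw new IllegalStateException("No public ports left");
        }
        publicPortByPrivateAddress.put(privateAddress, nextPort);
        return nextPort++;
    }

    public static void main(String[] args) {
        PortForwardTable table = new PortForwardTable(1236);
        for (String node : new String[] {"192.168.1.2", "192.168.1.3", "192.168.1.4"}) {
            System.out.println(node + " -> public port " + table.forward(node));
        }
    }
}
```

The resulting mappings are what then need to be advertised so that remote nodes and clients know which public port reaches which node.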
Advertising Port Mappings
• Extend Cassandra Gossip
• Include port numbers in node address announcements
• Allow seed node addresses to include port numbers
• Allow multiple nodes to have identical public & private addresses (only port numbers differ per DC)
• How to bootstrap? SIP?
• Cassandra must be aware of the allocated ports in order to advertise
• Hard if C* is not directly responsible for the port mapping (e.g. static port forwarding)
• Too many modifications to internals
Advertising Port Mappings
• DNS-SD, dns-sd.org (aka Bonjour/Zeroconf)
• Reads work with existing DNS implementations (it’s just a DNS query)
• Even inside restrictive networks, DNS usually works
• Combination of DNS TXT, SRV and PTR records
• Updates
• via DNS Update & TSIG (supported by BIND)
• via API (e.g. for AWS Route 53)
Advertising Port Mappings
• DNS-SD cont’d.
• SRV records contain hostname and port (i.e., hostname of the NAT gateway and public C* port)
• TXT records contain key=value pairs (useful for additional connection & config details)
• Modify C* connection code to look up foreign node port from DNS
• Modify client driver connection code to look up ports from DNS
• Can be queried & updated out-of-band (updated by the NAT device or central management server which knows which ports were mapped)
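To make the record contents concrete, here is a minimal sketch that parses SRV record data (in the standard "priority weight port target" layout) and a list of TXT key=value strings. The SrvRecord class is a hypothetical helper written for this illustration; it performs no DNS query itself and assumes the record data has already been fetched as text.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical holder for the fields of one SRV record. */
final class SrvRecord {
    final int priority;
    final int weight;
    final int port;      // public port forwarded by the NAT gateway
    final String target; // hostname of the NAT gateway

    SrvRecord(int priority, int weight, int port, String target) {
        this.priority = priority;
        this.weight = weight;
        this.port = port;
        this.target = target;
    }

    /** Parses SRV record data of the form "priority weight port target". */
    static SrvRecord parse(String rdata) {
        String[] fields = rdata.trim().split("\\s+");
        if (fields.length != 4) {
            throw new IllegalArgumentException("Expected 4 SRV fields: " + rdata);
        }
        return new SrvRecord(Integer.parseInt(fields[0]), Integer.parseInt(fields[1]),
                Integer.parseInt(fields[2]), fields[3]);
    }

    /** Collects TXT strings of the form key=value; strings without '=' are skipped. */
    static Map<String, String> parseTxt(List<String> strings) {
        Map<String, String> pairs = new LinkedHashMap<>();
        for (String s : strings) {
            int eq = s.indexOf('=');
            if (eq > 0) {
                pairs.put(s.substring(0, eq), s.substring(eq + 1));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        SrvRecord srv = parse("0 0 1236 gateway.example.");
        Map<String, String> txt = parseTxt(java.util.Arrays.asList("version=2.0.7", "cqlport=1237"));
        System.out.println(srv.target + ":" + srv.port + " " + txt);
    }
}
```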
Advertised Details
• Each cluster is its own browse domain
• Each NAT gateway device has an A record in the browse domain
• Each DNS-SD service is named based on the private IP address
• Requires unique private IP addresses across data centres
• SRV port is the C* Thrift port
• Additional ports are advertised via TXT records
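The naming scheme above (one service instance per node, named after its private IP address, inside the cluster's browse domain) can be sketched as a small helper. ServiceNames and its methods are hypothetical; the dots-to-dashes convention and the _cassandra._tcp service type are taken from the dns-sd output shown in this deck.

```java
/** Hypothetical helper that builds DNS-SD names from a node's private IP address. */
final class ServiceNames {
    static final String SERVICE_TYPE = "_cassandra._tcp";

    /** Instance name: the private IP with dots replaced by dashes, e.g. 192-168-1-4. */
    static String instanceName(String privateIp) {
        return privateIp.replace('.', '-');
    }

    /** Fully qualified service name inside the cluster's browse domain. */
    static String fullName(String privateIp, String browseDomain) {
        // Normalise to an absolute name with a trailing dot.
        String domain = browseDomain.endsWith(".") ? browseDomain : browseDomain + ".";
        return instanceName(privateIp) + "." + SERVICE_TYPE + "." + domain;
    }

    public static void main(String[] args) {
        String domain = "1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.";
        System.out.println(fullName("192.168.1.4", domain));
    }
}
```

Because the instance name is derived only from the private IP, two nodes in different data centres with the same private address would collide, which is why unique private addresses are required across data centres.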
Configuration
• Cassandra is configured to only use private addresses
• On cluster creation
• Establish a new DNS-SD browse domain
• Create A records for each gateway device
• NAT gateway device is notified when a new C* node is started
• Allocates random public ports for C* and configures port forwarding
• Updates DNS-SD
• New SRV and TXT records
$ dns-sd -B _cassandra._tcp 1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.
Browsing for _cassandra._tcp
A/R  Flags  if  Domain  Service Type  Instance Name
Add  3  0  1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.  _cassandra._tcp.  192-168-2-4
Add  3  0  1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.  _cassandra._tcp.  192-168-1-2
Add  3  0  1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.  _cassandra._tcp.  192-168-2-3
Add  3  0  1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.  _cassandra._tcp.  192-168-2-2
Add  3  0  1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.  _cassandra._tcp.  192-168-1-4
Add  2  0  1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.  _cassandra._tcp.  192-168-1-3
$ dns-sd -L 192-168-1-4 _cassandra._tcp 1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.
Lookup 192-168-1-4._cassandra._tcp.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.
192-168-1-4._cassandra._tcp.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au. can be reached at aws-us-east1-gateway.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.:1236 (interface 0)
 version=2.0.7 cqlport=1237
$ nslookup aws-us-east1-gateway.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.
Non-authoritative answer:
Name: aws-us-east1-gateway.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au
Address: 54.209.123.195
Output of dns-sd (can also use avahi-browse, dig, or any other DNS query tool)
Java Driver Modifications
• AddressTranslater.translate() is usually a no-op (the default implementation is IdentityTranslater)
• Modify translate() to perform a DNS-SD lookup.
• The address parameter is a node private IP address.
• Locate a service with a name = private IP address to determine public IP/port.
public interface AddressTranslater {
    public InetSocketAddress translate(InetSocketAddress address);
}
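A possible shape for such a translater is sketched below. It is not the authors' implementation: DnsSdAddressTranslater is a hypothetical class, the lookup function is a stand-in for a real DNS-SD query (SRV target and port), and the AddressTranslater interface is redeclared locally only so the sketch compiles on its own without the driver on the classpath.

```java
import java.net.InetSocketAddress;
import java.util.function.Function;

/** Hypothetical translater: maps a node's private address to its NAT gateway endpoint. */
final class DnsSdAddressTranslater implements AddressTranslater {
    // Stand-in for a DNS-SD query: instance name -> public endpoint, or null if unknown.
    private final Function<String, InetSocketAddress> lookup;

    DnsSdAddressTranslater(Function<String, InetSocketAddress> lookup) {
        this.lookup = lookup;
    }

    @Override
    public InetSocketAddress translate(InetSocketAddress address) {
        if (address.getAddress() == null) {
            return address; // unresolved input: nothing to look up
        }
        // Services are named after the private IP, with dots replaced by dashes.
        String instance = address.getAddress().getHostAddress().replace('.', '-');
        InetSocketAddress translated = lookup.apply(instance);
        return translated != null ? translated : address; // fall back to identity
    }
}

/** Local stand-in for the driver interface shown on the slide. */
interface AddressTranslater {
    InetSocketAddress translate(InetSocketAddress address);
}
```

Falling back to the untranslated address keeps the behaviour of the default identity translater for nodes that have no advertised mapping.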
Modifying Cassandra
• OutboundTcpConnectionPool is responsible for managing socket connections.
• Modify newSocket() to perform a DNS-SD lookup.
• The endpoint parameter is a node private IP address.
• Locate a service with a name = private IP address to determine public IP/port
public class OutboundTcpConnectionPool {
    ⋮
    public static Socket newSocket(InetAddress endpoint) throws IOException {…}
    ⋮
}
[Diagram: two groups of three Cassandra (C*) nodes, each group behind its own NAT Gateway; a DNS (+ DNS-SD) Server (Route 53, self-hosted, etc.); a Client Application]