keeping the internet fast and resilient for you and your customers
Unreliable Internet
Nick Wondra @ Cloudflare
Martin J. Levy @ Cloudflare
November 2017
Today’s agenda
● Introduction (Tim Fong)
○ Why does the Internet sometimes “misbehave” when it comes to delivering applications?
○ What are some ways to solve this?
● Martin J. Levy (20 min)
○ The Internet and how it’s tied together
○ BGP and topology
○ Testing (example of tools and techniques)
● Nick Wondra (20 min)
○ Approaches to solve the problem
○ Examples of mechanisms in place
● Summary (5 min)
● Audience Q/A (10 min)
The Internet and how it’s tied together
Martin J. Levy
The Internet
● Technically – a somewhat complex subject
○ The Internet is a collection of networks
○ No network stands alone (all interconnected)
○ Robustness can be created
○ Multi-homing (more than one transit/path)
○ Peering between “like” networks
○ Diversity (physical and logical)
○ Nothing is static!
● Internet was developed for something different
● Many types of data (and data layers)
● TCP/UDP vs FTP/HTTP/SMTP vs TLS vs XML/JSON
The Internet - just how complex? (hint: very!)
This is the representation of a single network (a medium-sized telco) and its interconnections globally to various other backbone networks.
A full diagram would depict upwards of 60,000 independent networks, which would be hard to follow.
Glueing the Internet together - BGP routing
● The IETF specified a protocol (BGP4) that can handle:
○ Massive routing tables
○ CIDR routing (ability to specify IP network address plus a network size)
○ IPv4 & IPv6
○ Rules for routing internally within a network
○ Rules for routing to an external network
○ Much more!
● In real life, BGP is used by every network on the Internet
○ Every destination on the globe exists within the BGP global routing tables
○ Everything is public, visible, exposed, and recorded
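The CIDR point above (an IP network address plus a network size) can be made concrete with a small sketch. It uses Python's standard `ipaddress` module to pick the most specific matching prefix, which is how a router chooses among overlapping routes. The routing table, prefixes (taken from documentation ranges), and next-hop names are hypothetical and purely illustrative; this is not how any particular router or BGP implementation stores its table.

```python
import ipaddress

# Hypothetical routing table: CIDR prefix -> next-hop label (illustrative only).
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "default-transit",
    ipaddress.ip_network("198.51.100.0/24"): "peer-a",
    ipaddress.ip_network("198.51.100.128/25"): "peer-b",
    ipaddress.ip_network("2001:db8::/32"): "peer-v6",
}


def lookup(address: str) -> str:
    """Return the next hop for the longest (most specific) matching prefix."""
    addr = ipaddress.ip_address(address)
    # Only compare prefixes of the same IP version as the address.
    matches = [
        net for net in ROUTES
        if net.version == addr.version and addr in net
    ]
    if not matches:
        raise LookupError(f"no route to {address}")
    # A larger prefix length means a smaller, more specific network.
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]
```

An address inside both the /24 and the /25 resolves to the /25, because the more specific prefix wins; an address that matches nothing else falls through to the default route.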
What works? What breaks? What’s the fix?
● There’s no steady state within the Internet
○ The path from A to Z is forever changing. Sometimes for the better
○ The BGP routing protocol has to address many factors:
■ Physical interruptions (a fiber break)
■ Planned maintenance (upgrades to facilities or services)
■ Increases in capabilities - for example, a new undersea cable
■ Third party “hiccups”
○ Commercial agreements (and disagreements)
■ Purchasing from a different Internet service provider
■ Ending contracts and changing service providers
● What we do know is that it keeps our network engineers busy!
The Internet - it keeps on growing
A new undersea cable is laid between the African coast and the Seychelles islands (replacing a satellite connection)
The Internet - When it breaks, it breaks!
This is an example of what happens every day all around the globe. The physical layer of the Internet is fragile.
All those bright spray-painted lines you see on a street (before someone digs it up) are meant to stop this from happening. They don’t!
Protocol stack - what’s above the physical layer
● Layers provide capabilities
○ Application - the end user’s view
○ Transport - HTTP & TLS
○ Internet - IP and routing
○ Data link - that fiber in the ground
● Each layer has its possible failures
OSI Network Model: Physical, Data Link, Network, Transport, Session, Presentation, Application
The TCP/IP Model: Data Link, Internet, Transport, Application
Distance, Latency, Variable Paths, and more
Map labels: 150 msec, 70 msec, 230 msec, 400 msec
● Speed of light
○ Very constant![1]
● Distance ~= Hops
○ Reliability decreases
● Variable paths
○ Redundancy vs. non-determinism
● Variable providers
○ Sometimes useful
[1] https://www.quora.com/What-is-precisely-the-speed-of-light-in-fiber-optics
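As a rough worked example of the speed-of-light point, the sketch below estimates the lowest possible round-trip time over a fiber path of a given length. The refractive index of 1.47 is an assumed typical value (real fiber varies), and the function name is made up for this illustration. Real paths are longer than the straight-line distance and add queuing and equipment delay, so measured latencies will be higher.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum
FIBER_REFRACTIVE_INDEX = 1.47      # assumption: a typical value, real fiber varies


def min_rtt_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over distance_km of fiber, in milliseconds."""
    speed_in_fiber_km_s = SPEED_OF_LIGHT_KM_S / FIBER_REFRACTIVE_INDEX
    one_way_s = distance_km / speed_in_fiber_km_s
    # Round trip is twice the one-way propagation delay.
    return 2 * one_way_s * 1000
```

Under these assumptions, a 10,000 km fiber path cannot do better than roughly 98 ms round trip, no matter how fast the equipment at either end is.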
Monitoring tools and more
● Beyond ping - or what’s really happening to your packets?
[1] http://bgp.he.net/
[2] http://atlas.ripe.net/
[3] http://stat.ripe.net/
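One simple way to go "beyond ping" is to time the TCP handshake to the service itself, since ICMP and application traffic are not always treated the same way. The sketch below is a minimal illustration using only Python's standard library; it is not one of the tools listed above, and the function name is hypothetical.

```python
import socket
import time


def tcp_connect_ms(host: str, port: int, timeout: float = 3.0) -> float:
    """Time how long it takes to open a TCP connection, in milliseconds.

    If host is a name rather than an IP address, the figure also
    includes DNS resolution time.
    """
    start = time.perf_counter()
    # create_connection resolves the host and completes the TCP handshake.
    with socket.create_connection((host, port), timeout=timeout):
        elapsed = time.perf_counter() - start
    return elapsed * 1000
```

Running this repeatedly from several vantage points gives a rough picture of how the connection setup time to one destination varies by location and over time.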
Approaches to solve the problem
Nick Wondra
Change the model of the Internet?
● Address the content, not the server
○ Content centric networking, et al
○ Route requests based on content location
○ Content is decentralized, moves through the network
● Requires changes deep in the protocol stack
○ … but lots of investment built into current infrastructure
Change the core Internet protocols?
● Can we build a better BGP?
○ Low-level distance and performance metrics may not translate to application performance
○ Many networks = many systems to change
● Can we build a better transport?
○ TCP and UDP deeply ingrained in end-user systems and network middleboxes (firewalls, LBs, WAN optimizers, etc)
● Evolve new solutions on top of existing frameworks
○ Solve for problems in the malleable network layers
○ Example: TLS 1.3
■ More secure and faster (fewer RTTs)
○ Example: TCP+HTTP => UDP+QUIC
■ Speed: connection establishment, session multiplexing
■ Resilience: congestion control, forward error correction
■ Flexibility: connection migration
● The challenge is distribution
○ Clients and servers must opt-in
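The "fewer RTTs" point can be put into numbers. The sketch below counts round trips needed before the first request can be sent on a brand-new connection: one for the TCP handshake, plus two for a full TLS 1.2 handshake or one for a full TLS 1.3 handshake, while QUIC combines transport and crypto setup into a single round trip. Session resumption and 0-RTT modes are left out, and the function and table names are made up for this illustration.

```python
from typing import Dict

# Round trips before the first request on a brand-new connection
# (full handshakes only; resumption and 0-RTT are not modeled).
SETUP_RTTS: Dict[str, int] = {
    "TCP + TLS 1.2": 1 + 2,
    "TCP + TLS 1.3": 1 + 1,
    "QUIC (new connection)": 1,
}


def setup_delay_ms(rtt_ms: float) -> Dict[str, float]:
    """Connection setup delay for each stack at a given round-trip time."""
    return {name: rtts * rtt_ms for name, rtts in SETUP_RTTS.items()}
```

On a 150 ms path, this gives 450 ms of setup for TCP with TLS 1.2, 300 ms with TLS 1.3, and 150 ms for a new QUIC connection, which is why handshake savings matter most on long-distance paths.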
Evolution, not revolution
● Cloudflare has Points of Presence (PoPs) across the globe
○ PoPs close to every Internet user and server
○ Transit/peering with multiple networks at every PoP
○ Proxies 10% of all web requests
● Global Internet performance and reliability monitoring
○ Real-time feedback as data traverses the network
○ Can “test” network paths that BGP wouldn’t use
○ Use performance metrics that matter to web applications (TTFB, response time)
Value of a large global network
Global footprint = path control
● Force routing paths by pinning to intermediate PoPs
Diagram labels: 150ms, 200ms
● Evolution inside the network, transparent to client and server
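The idea of pinning traffic to intermediate PoPs can be sketched as a simple comparison of measured path latencies. The node names and per-segment latencies below are invented for illustration; only the 150 ms and 200 ms totals echo the slide's figures, and how they break down per segment is an assumption. This is not Cloudflare's actual routing logic.

```python
from typing import Dict, List, Sequence, Tuple

Link = Tuple[str, str]


def path_latency_ms(path: Sequence[str], link_ms: Dict[Link, float]) -> float:
    """Sum the measured latency of each consecutive segment along a path."""
    return sum(link_ms[(a, b)] for a, b in zip(path, path[1:]))


def best_path(candidates: List[Sequence[str]],
              link_ms: Dict[Link, float]) -> Sequence[str]:
    """Pick the candidate path with the lowest total measured latency."""
    return min(candidates, key=lambda p: path_latency_ms(p, link_ms))


# Hypothetical measurements: the default route is 200 ms, while pinning
# through two PoPs adds up to 150 ms.
LINK_MS: Dict[Link, float] = {
    ("client", "origin"): 200.0,
    ("client", "pop-a"): 20.0,
    ("pop-a", "pop-b"): 110.0,
    ("pop-b", "origin"): 20.0,
}

CANDIDATES: List[Sequence[str]] = [
    ["client", "origin"],
    ["client", "pop-a", "pop-b", "origin"],
]
```

With these numbers, `best_path(CANDIDATES, LINK_MS)` selects the route through the two PoPs, even though it has more hops than the default route.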
Global footprint = distribution channel
Diagram labels: TCP+HTTP, TCP+HTTP, UDP+QUIC
Summary
Nick Wondra & Martin J. Levy
Summary, Questions, and Thank You!
Martin J. Levy - Network Strategy
@Cloudflare
@mahtin
Nick Wondra - Systems Engineer
@Cloudflare
@nickwondra
Appendix
Additional Reading (via Cloudflare blog)
● Argo & Warp:
○ https://blog.cloudflare.com/argo/
○ https://blog.cloudflare.com/the-making-of-cloudflare-warp/
● Railgun:
○ https://blog.cloudflare.com/cacheing-the-uncacheable-cloudflares-railgun-73454/
● Load Balancing:
○ https://blog.cloudflare.com/introducing-load-balancing-intelligent-failover-with-cloudflare/
● TLS:
○ https://blog.cloudflare.com/introducing-tls-client-auth/