Wide-Area Route Control for Distributed Services

Upload: sheryl

Post on 08-Jan-2016

DESCRIPTION

Vytautas Valancius, Nick Feamster, Akihiro Nakao, and Jennifer Rexford. Wide-Area Route Control for Distributed Services. Cloud computing is on the rise: it provides computing resources and storage in cloud data centers, and amounts to hosting on steroids for Internet services.

TRANSCRIPT

  • Vytautas Valancius, Nick Feamster, Akihiro Nakao, and Jennifer Rexford

  • Cloud computing is on the rise

    Provides computing resources and storage in cloud data centers

    Hosting on steroids for Internet services

  • Hosted services have different requirements

    [Figure: a cloud data center router connects to the Internet through ISP1 and ISP2, exchanging routing updates and packets.]

    Too slow for an interactive service, or too costly for bulk transfer!

  • Multiple upstream ISPs

    Amazon EC2 has at least 58 routing peers in its Virginia data center

    The data center router picks one route to a destination for all hosted services

    Packets from all hosted applications use the same path

  • Obtain connectivity to upstream ISPs: physical connectivity, contracts, and routing sessions

    Obtain Internet numbered resources from authorities

    Expensive and time-consuming!

  • [Figure: the Transit Portal sits between Virtual Routers A and B in the cloud data center and ISP1 and ISP2 on the Internet; arrows are labeled Routes and Packets.]

    Full Internet route control for hosted cloud services!

  • Motivation and Overview

    Connecting to the Transit Portal

    Advanced Transit Portal Applications

    Scaling the Transit Portal

    Future Work & Summary

  • Separate Internet router for each service: virtual or physical routers

    Links between service router and TP: each link emulates a connection to an upstream ISP

    Routing sessions to upstream ISPs: TP exposes a standard BGP route control interface

  • Cloud client with two upstream ISPs

    ISP 1 is preferred

    ISP 1 exhibits excessive jitter

    Cloud client reroutes through ISP 2

    [Figure: the client's virtual BGP router holds BGP sessions through the Transit Portal to ISP 1 and ISP 2; traffic follows the selected ISP.]
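
The rerouting scenario on this slide can be sketched as a small selection rule. This is only an illustration: the `Upstream` class, the preference values, the latency samples, and the 5 ms jitter threshold are all assumptions, and the slides do not say how a client measures jitter or applies its decision to BGP.

```python
from dataclasses import dataclass
from statistics import pstdev


@dataclass
class Upstream:
    name: str
    base_preference: int      # higher is preferred, similar in spirit to BGP local preference
    latency_samples_ms: list  # recent round-trip samples (hypothetical probe data)


def jitter_ms(samples):
    """Approximate jitter as the standard deviation of the latency samples."""
    return pstdev(samples) if len(samples) > 1 else 0.0


def choose_upstream(upstreams, max_jitter_ms=5.0):
    """Pick the most preferred upstream whose jitter is acceptable.

    Falls back to the lowest-jitter upstream if none meets the threshold.
    """
    acceptable = [u for u in upstreams if jitter_ms(u.latency_samples_ms) <= max_jitter_ms]
    if acceptable:
        return max(acceptable, key=lambda u: u.base_preference)
    return min(upstreams, key=lambda u: jitter_ms(u.latency_samples_ms))


# ISP 1 is preferred but its latency varies widely; ISP 2 is steady.
isp1 = Upstream("ISP 1", base_preference=200, latency_samples_ms=[20, 45, 18, 60, 22])
isp2 = Upstream("ISP 2", base_preference=100, latency_samples_ms=[30, 31, 29, 30, 32])
selected = choose_upstream([isp1, isp2])
print(selected.name)
```

In a real deployment the client would express this choice through its own BGP configuration on the sessions the TP exposes; the sketch only shows the decision itself.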

  • Server with custom routing software: 4 GB RAM, 2 x 2.66 GHz Xeon cores

    Three active sites with upstream ISPs: Atlanta, Madison, and Princeton

    A number of active experiments: BGP poisoning (University of Washington), IP anycast (Princeton University), Advanced Networking class (Georgia Tech)

  • Internet services require fast name resolution

    IP anycast for name resolution: DNS servers share the same IP address, the address is announced to ISPs in multiple locations, and Internet routing converges to the closest server

    Available only to large organizations
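
As a rough illustration of how anycast steers each client to the closest server, the sketch below reduces routing to a single metric. The client names, site names, and AS-path lengths are invented for the example; real BGP convergence also depends on each network's policy, which this ignores.

```python
# The same prefix is announced from every site (a documentation prefix is used here).
ANYCAST_PREFIX = "192.0.2.0/24"

# Hypothetical AS-path lengths from two client networks to each announcing site.
path_lengths = {
    "client-tokyo": {"Asia": 2, "North America": 5},
    "client-boston": {"Asia": 6, "North America": 2},
}


def anycast_site(client):
    """Return the site a client's traffic reaches: the one with the shortest path."""
    sites = path_lengths[client]
    return min(sites, key=sites.get)


for client in path_lengths:
    print(client, "->", anycast_site(client), "for", ANYCAST_PREFIX)
```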

  • [Figure: Transit Portals in Asia and North America announce anycast routes to ISP1 through ISP4.]

    TP allows hosted applications to use IP anycast

  • Internet services in geographically diverse data centers

    Operators migrate Internet users' connections

    Two conventional methods: DNS name re-mapping (slow), and virtual machine migration with local re-routing (requires a globally routed network)

  • [Figure: Transit Portals in Asia and North America, ISP1 through ISP4, and the Internet, with tunneled sessions.]

  • Scale to dozens of sessions to ISPs and hundreds of sessions to hosted services

    At the same time: present each client with sessions that have the appearance of direct connectivity to an ISP

    Prevent clients from abusing Internet routing protocols

  • Conventional BGP router:

    Receives routing updates from peers

    Propagates routing updates about one path only

    Selects one path to forward packets

    Scalable but not transparent or flexible

    [Figure: ISP1 and ISP2 send updates to a BGP router, which exchanges updates and packets with two client BGP routers.]
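
The behavior of a conventional router described on this slide can be sketched as follows. The decision rule is heavily simplified (highest local preference, then shortest AS path), and the `Route` class, prefix, and AS numbers are illustrative placeholders rather than anything taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    neighbor: str    # the ISP the route was learned from
    local_pref: int  # higher wins
    as_path: tuple   # shorter wins when local_pref ties


def best_path(routes):
    """Simplified decision: highest local_pref, then shortest AS path."""
    return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))


def conventional_export(learned):
    """Return one selected route per prefix, as a conventional router would."""
    by_prefix = {}
    for route in learned:
        by_prefix.setdefault(route.prefix, []).append(route)
    return {prefix: best_path(candidates) for prefix, candidates in by_prefix.items()}


learned = [
    Route("203.0.113.0/24", "ISP1", 100, (64496, 64499)),
    Route("203.0.113.0/24", "ISP2", 100, (64497, 64498, 64499)),
]
exported = conventional_export(learned)
print(exported["203.0.113.0/24"].neighbor)
```

A client behind such a router only ever learns the selected route, so it cannot choose the alternative through the other ISP. That is the lack of transparency and flexibility the slide refers to.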

  • Routing process

    Store and propagate all BGP routes from ISPs: separate routing tables

    Reduce memory consumption: single routing process with shared data structures

    Reduces memory use from 90 MB/ISP to 60 MB/ISP

    [Figure: ISP1 and ISP2 feed Routing Table 1 and Routing Table 2, each serving a virtual router.]
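
One way to picture "shared data structures" across per-ISP routing tables is attribute interning: identical attribute sets are stored once and referenced from every table. The slides do not say which structures the TP actually shares, so the `SharedAttributePool` class and the choice of attributes below are assumptions made for illustration.

```python
class SharedAttributePool:
    """Stores each distinct attribute tuple once and hands out the shared copy."""

    def __init__(self):
        self._pool = {}

    def intern(self, attrs):
        # setdefault returns the already stored tuple when an equal one exists.
        return self._pool.setdefault(attrs, attrs)

    def __len__(self):
        return len(self._pool)


pool = SharedAttributePool()
tables = {"ISP1": {}, "ISP2": {}}  # one routing table per upstream ISP


def install(table_name, prefix, origin_as, communities):
    """Install a route whose attributes come from the shared pool."""
    tables[table_name][prefix] = pool.intern((origin_as, tuple(communities)))


install("ISP1", "198.51.100.0/24", 64500, ["no-export"])
install("ISP2", "198.51.100.0/24", 64500, ["no-export"])
install("ISP2", "203.0.113.0/24", 64501, [])

shared = tables["ISP1"]["198.51.100.0/24"] is tables["ISP2"]["198.51.100.0/24"]
print(shared, len(pool))
```

The two tables stay logically separate, but the entries with equal attributes point at the same object, so three installed routes need only two stored attribute sets.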

  • Routing process

    Hundreds of routing sessions to clients: high CPU load

    Schedule and send routing updates in bundles: reduces CPU load from 18% to 6% for 500 client sessions

    [Figure: ISP1 and ISP2 feed Routing Table 1 and Routing Table 2, each serving a virtual router.]
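
The idea of sending updates in bundles can be sketched like this: updates are queued per session, and on each flush the sessions with identical pending updates share one encoded message. The `UpdateBundler` class and the string "encoding" are stand-ins invented for the example; the TP's actual scheduler is not described in the slides.

```python
from collections import defaultdict


class UpdateBundler:
    def __init__(self):
        self.pending = defaultdict(list)  # session -> queued updates
        self.sent_messages = []           # (session, encoded message) pairs

    def queue(self, session, update):
        self.pending[session].append(update)

    def flush(self):
        """Send all queued updates; return how many bundles had to be encoded."""
        # Group sessions whose pending updates are identical.
        groups = defaultdict(list)
        for session, updates in self.pending.items():
            groups[tuple(updates)].append(session)
        for bundle, sessions in groups.items():
            message = "|".join(bundle)  # stands in for encoding a BGP UPDATE once
            for session in sessions:
                self.sent_messages.append((session, message))
        encoded = len(groups)
        self.pending.clear()
        return encoded


bundler = UpdateBundler()
for client in range(500):
    for update in ("announce 203.0.113.0/24", "withdraw 198.51.100.0/24"):
        bundler.queue(f"client-{client}", update)

encoded_bundles = bundler.flush()
print(encoded_bundles, len(bundler.sent_messages))
```

With 500 sessions receiving the same two updates, the message is built once and delivered 500 times, instead of being built 1000 times.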

  • Forwarding table

    Connecting clients: tunneling and VLANs

    Curbing memory usage: separate virtual routing tables with a default route to the upstream ISP

    Forwarding table memory use drops from 50 MB/ISP to about 0.1 MB/ISP

    [Figure: ISP1 and ISP2 attach to Forwarding Table 1 and Forwarding Table 2, each serving a virtual BGP router.]
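
A minimal sketch of a per-ISP forwarding table that holds only client prefixes and sends everything else to the upstream by default. The class name, next-hop labels, and prefixes are hypothetical, and the linear longest-prefix match is for clarity only; a real forwarding plane would use the kernel's tables.

```python
import ipaddress


class VirtualForwardingTable:
    """Per-ISP table: a few client prefixes plus a default to the upstream ISP."""

    def __init__(self, upstream_next_hop):
        self.default_next_hop = upstream_next_hop
        self.entries = {}  # network -> next hop, for client prefixes only

    def add_client_prefix(self, prefix, next_hop):
        self.entries[ipaddress.ip_network(prefix)] = next_hop

    def lookup(self, address):
        """Longest-prefix match over client prefixes, else the default route."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.entries if addr in net]
        if not matches:
            return self.default_next_hop
        return self.entries[max(matches, key=lambda net: net.prefixlen)]


table_isp1 = VirtualForwardingTable(upstream_next_hop="isp1-gateway")
table_isp1.add_client_prefix("203.0.113.0/24", "tunnel-client-a")

print(table_isp1.lookup("203.0.113.7"))   # traffic toward a hosted client
print(table_isp1.lookup("198.51.100.1"))  # everything else goes upstream
```

Because each table is tied to a single upstream, the full set of Internet routes does not need to be installed for forwarding: one default entry covers it.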

  • Future work:

    More deployment sites

    Making TP accessible for network research test-beds (e.g., GENI, CoreLab)

    Faster forwarding (NetFPGA, OpenFlow)

    Lightweight interface to route control

  • Limited routing control for hosted services

    Transit Portal gives wide-area route control

    Advanced applications with many TPs

    Open source implementation

    Scales to hundreds of client sessions

    The deployment is real: it can be used today for research and education

    More information: http://valas.gtnoise.net/tp

    Questions?

  • Used in a Next-Generation Internet Course at Georgia Tech in Spring 2010

    Students set up virtual networks and connect directly to TP via OpenVPN

    Live feed of BGP routes

    Routable IP addresses for in-class topology inference and performance measurements

  • Three active sites: Princeton, NJ; Atlanta, GA; Madison, WI

    Internet numbered resources from ARIN: AS 47065, IPv4 prefix 168.62.16.0/21

    Whether you like the cloud buzzword or not, cloud computing is here.

    Cloud computing is supported by many industry players in different forms, from pure or virtualized hardware access to application execution engines.

    All these cloud platforms have something in common. They are replacing traditional company-owned data centers and, more importantly, they lower the entry barrier for small and innovative startups.

    In general, most cloud platforms provide computing resources and storage, but rather limited networking control, which will be the focus of today's talk.

    So what precisely is the problem?

    Imagine you have a cloud facility with two users in it, startup A and startup B. One offers a bulk transfer service (e.g., file transfer) and the other offers an interactive service (e.g., a VoIP service provider). You might imagine that the routing requirements for these two services are different.

    Now, the cloud provider has dozens of upstreams. For example, using RouteViews data we estimated that the Amazon EC2 data center in Virginia has at least 58 peering connections with upstream ISPs. This number is a lower bound, since RouteViews shows only part of the peerings.

    What matters, however, is that EC2 selects one route to a given destination for all of its services. The problem is that no one controls that selection but Amazon. For example, Amazon might select the cheapest route, but as we know the cheapest route is not always the best.

    Maybe you don't need this slide. You could just say it out loud on the previous slide.

    Add more. List this pain in the ass.

    So let's say we need route control. Let's say we want the cheapest providers. Or let's say we want the best performance. In any case, if you want more control you need to get the following: Internet numbered resources, such as an AS number and IP addresses. It is not easy: in the States you need to present ARIN with a business certificate and a contract with two ISPs to get an AS number. To get an IP prefix that you can route, you need to explain to ARIN why you need it.

    You also need to acquire space in a colocation facility where you are going to put your router, which you also need to purchase.

    You also need to get connectivity to the colocation facility, to connect the servers in your data center or cloud to the routers in the colocation facility. Sometimes this connectivity to popular colos can be very expensive.

    You need to get contracts with upstream ISPs, negotiate prices, and so on.

    All of that is very expensive and infeasible for startups and smaller companies.

    Hopefully by now we see that some services might need wide-area route control.

    To understand why it is challenging to use existing platforms to provide wide-area route control to cloud users, we first need to go over briefly

    This slide was a bit boring. You should perhaps explain why (or in some more detail) what you are going to describe for each part of the talk (e.g., to understand TP, we first need to understand how Internet routing works), etc. The introduction of the word BGP here is a bit abrupt. Also, what are the control plane and the data plane? This audience will not know.

    So, given this high-level motivation and the idea of how TP works, we need to delve deeper.

    We will discuss how TP can provide cloud services with transparent routing. More specifically, we will talk about how the control plane works and how the data plane works.

    We will then briefly mention how TP performs and then present some advanced Transit Portal applications.

    We'll finish with future work and a summary.

    For example, your application wants to control its incoming traffic. Your application might be extremely sensitive to jitter, while most other applications are doing just fine with the cheaper provider.

    In contrast, TP has higher requirements than a standard router.

    We need to avoid selecting a single route: we need to send all of them while having only a single session to each upstream ISP.

    To save memory, we need to make sure we reuse redundant information in routes. For 20 routing tables, shared memory saves about 30% of the space: a reduction from 1800 MB to 1100 MB.

    To save CPU, we need to use update grouping. For 500 sessions, CPU savings are around 60% (or about 9 percentage points to 12% CPU load).

    Since clients don't own resources, we need to use some update rewriting to trick BGP.

    As you might remember, clients need to perceive their connection to the Transit Portal as a direct connection to an ISP. To accomplish that, we set up separate tunnels.

    Maybe delete?

    Merge with deployment.