robotron: top-down network - sigcommconferences.sigcomm.org/.../session05-paper02... · robotron:...
TRANSCRIPT
Robotron: Top-down Network Management at Scale
Yu-Wei Eric Sung, Xiaozheng Tie, Starsky H.Y. Wong, Hongyi Zeng
ACM SIGCOMM 2016
August 25, 2016
Scale of Facebook Community
• 1.7 billion monthly on Facebook
• 1 billion monthly on WhatsApp
• 500 million monthly on Instagram
• 1 billion monthly on Messenger
Network Management at Facebook: What’s involved?
• Goals: build and evolve the FB network
• Example tasks: circuit/device turnup, network monitoring
• Human interactions -> outages
Network Management at Facebook: Why is it hard?
• Distributed configurations
• Multiple domains
• Versioning
• Dependency
• Vendor differences
Network Management at Facebook: Early days…
Timeline starting 2004-2007: manual configuration and monitoring with ad-hoc scripts
Contribution
Timeline starting 2004-2007: manual configuration and monitoring with ad-hoc scripts, then Robotron started, then our paper
Our paper sheds light on:
• Network management tasks
• Robotron’s usage
• Evolution of Robotron
• Our experiences using Robotron
Overview of Facebook’s Network: Lifecycle of user requests
Users -> Internet -> POPs -> Backbone -> Data Centers
Point of Presence (POP)
• Standardized topology
• Services: LB, Cache
• Common tasks:
  • Build/upgrade a cluster
  • Provisioning new peering circuits
Backbone
• Irregular, demand-driven topology
• Common tasks:
  • Add/migrate circuits
  • Add/remove routers
Datacenter
• Standardized topology
• Services: Web, Cache, Database
• Common tasks:
  • Build/decomm a cluster
  • Cluster capacity upgrade
Overview of Facebook’s Network
Figure: # of clusters (normalized) over time, by cluster architecture generation. POP panel: Gen1, Gen2-A, Gen2-B, Gen2-C, Gen2-D, Gen2V6, Gen3, Gen3V6. DC panel: Gen1, Gen2.
Multiple versions of FB cluster architectures co-exist (8 generations)
Robotron: “Top-Down” Network Management System @ FB: Overview
Network Design -> Config Generation -> Deployment -> Monitoring, built around the FBNet DB
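FBNet: Modeling the Network: Example 4-post POP cluster
Diagram: PR1 and PR2 connect upstream to the Internet and to backbone routers BB1 and BB2, and downstream to four switches PSWa-PSWd over 20G links; the PSWs connect down to top-of-rack switches and servers.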
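FBNet: Modeling the Network: Objects
Example: PSWa and PR1 are connected by two 10G circuits. On PSWa, physical interfaces et1/1 and et1/2 (on a linecard) form aggregated interface ae0 with prefix 2001::1; on PR1, et2/1 and et3/1 form ae1 with prefix 2001::2; an eBGP session runs between the two prefixes.
Objects: NetworkSwitch, Linecard, PhysicalInterface (×2), AggregatedInterface, V6Prefix, BgpV6Session, Circuit (×2)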
FBNet: Modeling the Network: Values
Same PSWa-PR1 example; each object carries its own values:
• NetworkSwitch: name=PSWa
• Linecard: slot=1, model=X
• PhysicalInterface: name=et1/1
• PhysicalInterface: name=et1/2
• AggregatedInterface: name=ae0
• V6Prefix: prefix=2001::1
• Circuit (×2): speed=10G
FBNet: Modeling the Network: Relationships
Same PSWa-PR1 example; objects also point to each other:
• Linecard: device= -> NetworkSwitch
• PhysicalInterface: linecard= -> Linecard, agg_interface= -> AggregatedInterface
• V6Prefix: interface= -> AggregatedInterface
• BgpV6Session: a_prefix=, z_prefix= -> V6Prefix
• Circuit: a_endpoint=, z_endpoint= -> PhysicalInterface
It’s complicated
FBNet Model Snippet: related models and model inheritance
PhysicalInterface inherits from Interface (model inheritance) and refers to its related models, Linecard and AggregatedInterface, through foreign keys:

    from django.db import models

    # Interface, Linecard and AggregatedInterface are other FBNet models.
    class PhysicalInterface(Interface):
        # Relationship: the linecard this port sits on
        linecard = models.ForeignKey(Linecard, on_delete=models.CASCADE)
        # Relationship: the aggregated interface this port is bundled into
        agg_interface = models.ForeignKey(
            AggregatedInterface, on_delete=models.CASCADE)
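To make the object/value/relationship split concrete, here is a minimal sketch of the PSWa side of the example using plain Python dataclasses in place of Django models (an assumption made so the example runs without a Django project). Field names follow the slides; the `interfaces_of` helper is hypothetical, and the PR1 side, circuits, and BGP session are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NetworkSwitch:
    name: str                              # value


@dataclass
class Linecard:
    slot: int                              # value
    model: str                             # value
    device: NetworkSwitch                  # relationship -> NetworkSwitch


@dataclass
class AggregatedInterface:
    name: str                              # value


@dataclass
class PhysicalInterface:
    name: str                              # value
    linecard: Linecard                     # relationship -> Linecard
    agg_interface: Optional[AggregatedInterface] = None  # relationship -> AggregatedInterface


@dataclass
class V6Prefix:
    prefix: str                            # value
    interface: AggregatedInterface         # relationship -> AggregatedInterface


def interfaces_of(device: NetworkSwitch, pifs):
    """Hypothetical helper: follow relationships from ports back to their device."""
    return [p.name for p in pifs if p.linecard.device is device]


# PSWa side of the PR1 <-> PSWa example
pswa = NetworkSwitch(name="PSWa")
lc = Linecard(slot=1, model="X", device=pswa)
ae0 = AggregatedInterface(name="ae0")
et11 = PhysicalInterface(name="et1/1", linecard=lc, agg_interface=ae0)
et12 = PhysicalInterface(name="et1/2", linecard=lc, agg_interface=ae0)
v6 = V6Prefix(prefix="2001::1", interface=ae0)

print(interfaces_of(pswa, [et11, et12]))  # ['et1/1', 'et1/2']
```

Even this fragment needs five object types and four kinds of links, which hints at why the relationship graph gets "complicated" as more models are added.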
FBNet: Architecture
API layer (Read Service, Write Service):
• RPC services
  • Read: fine-grained per-model query
  • Write: task-based
• High availability: multiple replicas per DC
DB layer:
• 1 primary, multiple secondary DBs
• Scalability: 1 slave per DC
Diagram: the primary DB feeds the secondary and slave DBs through a replication stream.
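The read/write split and replication layout can be sketched as a small router: writes always go to the single primary, while reads are served by the replica in the caller's own data center, falling back to the primary when that DC has none. This is a hypothetical illustration; `FBNetRouter`, the DB and DC names, and the fallback policy are assumptions, not Robotron's actual API.

```python
from typing import Dict


class FBNetRouter:
    """Hypothetical sketch: pick which FBNet DB serves a request."""

    def __init__(self, primary: str, replicas_by_dc: Dict[str, str]):
        self.primary = primary                  # the one writable DB
        self.replicas_by_dc = replicas_by_dc    # one read replica ("slave") per DC

    def route_read(self, caller_dc: str) -> str:
        # Fine-grained reads stay in the caller's DC so read capacity scales with DCs.
        return self.replicas_by_dc.get(caller_dc, self.primary)

    def route_write(self) -> str:
        # Task-based writes go to the primary, which streams changes to replicas.
        return self.primary


router = FBNetRouter("db-primary", {"dc1": "db-dc1", "dc2": "db-dc2"})
print(router.route_read("dc1"))   # db-dc1
print(router.route_read("dc9"))   # db-primary (no local replica)
print(router.route_write())       # db-primary
```

A consequence of this layout is that a read right after a write may hit a replica that has not yet applied the replication stream; how FBNet handles that is not covered on the slide.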
Robotron’s management life cycle
Network Design -> Config Generation -> Deployment -> Monitoring, built around the FBNet DB
Network Design: Design intent -> FBNet objects
Template for a POP cluster:

    Cluster(
        devices={
            PR: DeviceSpec(hardware="Router_Vendor1", num_devices=2),
            PSW: DeviceSpec(hardware="Switch_Vendor2", num_devices=4),
        },
        link_groups=[
            LinkGroup(a_device=PR, z_device=PSW, pifs_per_agg=2, ip=V6),
        ],
    )

Resulting FBNet objects (94 objects across 7 models):
• BackboneRouters: 2
• NetworkSwitches: 4
• Circuits: 16
• PhysicalInterfaces: 32
• AggregatedInterfaces: 16
• V6Prefixes: 16
• BgpV6Sessions: 8
Devices: PR1, PR2; PSWa, PSWb, PSWc, PSWd
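The object counts appear to follow directly from the template: 2 PRs × 4 PSWs gives 8 PR-PSW pairs; each pair gets `pifs_per_agg=2` circuits, one aggregated interface per side, one V6 prefix per aggregated interface, and one BGP session. The minimal sketch below reproduces that arithmetic. The `expand_cluster` function and the per-pair rules are assumptions inferred from the numbers on the slide, not Robotron's code.

```python
from collections import Counter


def expand_cluster(num_pr: int, num_psw: int, pifs_per_agg: int) -> Counter:
    """Hypothetical sketch: count FBNet objects produced by a PR x PSW link group."""
    counts = Counter()
    counts["BackboneRouter"] = num_pr
    counts["NetworkSwitch"] = num_psw
    for _pr in range(num_pr):
        for _psw in range(num_psw):               # one bundle per PR-PSW pair
            counts["Circuit"] += pifs_per_agg
            counts["PhysicalInterface"] += 2 * pifs_per_agg  # a port at each end
            counts["AggregatedInterface"] += 2               # one bundle per side
            counts["V6Prefix"] += 2                          # one prefix per bundle
            counts["BgpV6Session"] += 1                      # one session per pair
    return counts


objs = expand_cluster(num_pr=2, num_psw=4, pifs_per_agg=2)
print(sum(objs.values()), len(objs))  # 94 7
```

The point of the template is this leverage: eight lines of design intent fan out into 94 interdependent objects that nobody has to enter by hand.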
Config Generation: FBNet objects -> Device configs
FBNet objects (PR1, PR2, PSWa-PSWd) are split into per-device, vendor-agnostic objects that follow a config schema:

    struct Device {
      1: list<AggregatedInterface> aggs,
    }
    struct AggregatedInterface {
      1: string name,
      2: i32 number,
      3: string v4_prefix,
      4: string v6_prefix,
      5: list<PhysicalInterface> pifs,
    }
    struct PhysicalInterface {
      1: string name,
    }
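The regrouping step (cluster-wide FBNet objects into one vendor-agnostic object per device) can be sketched with dataclasses that mirror the schema structs above. The flattened row format and `build_device_objects` are hypothetical; the PR1-side names (et2/1, et3/1, ae1, 2001::2) follow the earlier example slide, and the `number` values are assumed from the interface names.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class PhysicalInterface:
    name: str


@dataclass
class AggregatedInterface:
    name: str
    number: int
    v4_prefix: Optional[str] = None
    v6_prefix: Optional[str] = None
    pifs: List[PhysicalInterface] = field(default_factory=list)


@dataclass
class Device:
    aggs: List[AggregatedInterface] = field(default_factory=list)


# Hypothetical flattened FBNet rows: (device, agg name, agg number, v6 prefix, member port)
Row = Tuple[str, str, int, str, str]


def build_device_objects(rows: List[Row]) -> Dict[str, Device]:
    """Regroup cluster-wide rows into one vendor-agnostic Device per router/switch."""
    aggs: Dict[Tuple[str, str], AggregatedInterface] = {}
    devices: Dict[str, Device] = defaultdict(Device)
    for dev, agg_name, number, v6, pif in rows:
        key = (dev, agg_name)
        if key not in aggs:                      # first port seen for this bundle
            aggs[key] = AggregatedInterface(name=agg_name, number=number, v6_prefix=v6)
            devices[dev].aggs.append(aggs[key])
        aggs[key].pifs.append(PhysicalInterface(name=pif))
    return dict(devices)


rows = [
    ("PSWa", "ae0", 0, "2001::1", "et1/1"),
    ("PSWa", "ae0", 0, "2001::1", "et1/2"),
    ("PR1", "ae1", 1, "2001::2", "et2/1"),
    ("PR1", "ae1", 1, "2001::2", "et3/1"),
]
devs = build_device_objects(rows)
print([p.name for p in devs["PSWa"].aggs[0].pifs])  # ['et1/1', 'et1/2']
```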
Config Generation: FBNet objects -> Device configs
FBNet objects -> per-device objects (vendor agnostic, Config Schema) -> vendor-specific device configs (PR1, PR2, PSWa, PSWb, PSWc, PSWd configs). Each vendor (Vendor 1, Vendor 2) has its own set of templates: interface template, BGP template, MPLS template, …
Example vendor-specific interface template:

    {% for agg in device.aggs %}
    interface {{agg.name}}
      mtu 9192
      no switchport
      load-interval 30
      {% if agg.v4_prefix %}ip addr {{agg.v4_prefix}}{% endif %}
      {% if agg.v6_prefix %}ipv6 addr {{agg.v6_prefix}}{% endif %}
      no shutdown
    !
    {% endfor %}
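The interface template's logic can be mirrored in plain Python to see what it emits for one aggregated interface. This sketch deliberately avoids a Jinja dependency; `render_interfaces` and the dict-based input are assumptions, and the exact whitespace will differ from real Jinja output.

```python
from typing import Dict, List, Optional


def render_interfaces(aggs: List[Dict[str, Optional[str]]]) -> str:
    """Hypothetical plain-Python equivalent of the vendor interface template."""
    lines: List[str] = []
    for agg in aggs:                                   # {% for agg in device.aggs %}
        lines.append(f"interface {agg['name']}")
        lines += ["  mtu 9192", "  no switchport", "  load-interval 30"]
        if agg.get("v4_prefix"):                       # {% if agg.v4_prefix %}
            lines.append(f"  ip addr {agg['v4_prefix']}")
        if agg.get("v6_prefix"):                       # {% if agg.v6_prefix %}
            lines.append(f"  ipv6 addr {agg['v6_prefix']}")
        lines += ["  no shutdown", "!"]
    return "\n".join(lines)


print(render_interfaces([{"name": "ae0", "v6_prefix": "2001::1"}]))
```

Because ae0 has only a v6 prefix, the `ip addr` line is skipped and only `ipv6 addr 2001::1` appears; a second vendor would reuse the same per-device input with different literal lines.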
Usage Statistics
• How many times does the FBNet model change?
• How many FBNet objects change per design change?
• How frequent and how large are config changes?
FBNet Model Changes: How much does the FBNet model change over time?
• Still many changes over time
• Reasons: new models, values, relationships
Design Changes: How many FBNet objects are changed per design change?
Figure: CDF across design changes of the # of FBNet objects changed (log scale, 1 to 10,000), for All, Interface, Circuit, v6 Prefix, v4 Prefix, and Device objects; panels: POP/DC and Backbone.
• POP/DC: bigger design changes
• Backbone: smaller design changes
Configuration Changes: What’s the frequency and size of configuration changes?
• Median number of config lines changed per week
  • POP/DC devices: 500 lines
  • Backbone devices: <100 lines
• Average number of times changes happen per week
  • POP/DC devices: 2.53
  • Backbone devices: 12.46
• POP/DC: few, bigger config changes
• Backbone: many, smaller config changes
Evolution of Robotron: Bottom-up, experience driven
Milestones on a 2008-2016 timeline, leading to Robotron:
• FBNet modeling started
• Active monitoring
• Passive monitoring
• Basic deployment
• Basic design and config generation
Experience: Modeling is laborious
Problem scenario: new eBGP session configuration
• A new eBGP session needed a proper import policy
• Robotron was used without proper support -> egress link saturated
• Most development time was spent on model changes
• Lesson: Modeling is hard
• Open problem: lack of a network model widely accepted by vendors
Experience: Coupling changes is key
Problem scenario: POP cluster switch turnup
1. An engineer updated FBNet to add a new rack, but forgot to generate config
2. The engineer pushed stale config
3. The added rack never came online
• Lesson: Network design, config generation and deployment should be tightly coupled
• Open problems:
  • Atomicity
  • Conflict resolution
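One way to apply the lesson is to make deployment check that the config was generated from the current FBNet state and refuse it otherwise. A minimal sketch, assuming a single version counter bumped on every FBNet write; `FBNetStore`, `generate_config`, and `deploy` are hypothetical stand-ins, and the sketch does not address the open problems of atomicity across tasks or conflict resolution.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FBNetStore:
    """Hypothetical store: the design version is bumped on every FBNet write."""
    version: int = 0
    racks: List[str] = field(default_factory=list)

    def add_rack(self, rack: str) -> None:
        self.racks.append(rack)
        self.version += 1


@dataclass
class GeneratedConfig:
    version: int   # FBNet version this config was generated from
    text: str


def generate_config(db: FBNetStore) -> GeneratedConfig:
    return GeneratedConfig(db.version, "\n".join(f"rack {r}" for r in db.racks))


def deploy(db: FBNetStore, cfg: GeneratedConfig, device: Dict[str, str]) -> None:
    # Refuse stale configs so design, generation and deployment stay in lockstep.
    if cfg.version != db.version:
        raise RuntimeError(f"stale config v{cfg.version}, FBNet is at v{db.version}")
    device["running"] = cfg.text


db, sw = FBNetStore(), {}
old = generate_config(db)
db.add_rack("r42")             # design change, but config not regenerated
try:
    deploy(db, old, sw)
except RuntimeError as err:
    print(err)                 # stale config v0, FBNet is at v1
deploy(db, generate_config(db), sw)
print(sw["running"])           # rack r42
```

In the scenario on the slide, the stale push in step 2 would have been rejected instead of silently leaving the new rack offline.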
Experience: Fallback is important
Problem scenario: Robotron-less management
• An engineer bypassed Robotron to manually configure devices:
  • SSH into device
  • Make config change
  • Log out
• Needed during emergencies
• Passively curtailed with config monitoring
• Lesson: A bypass mechanism is needed
• Open problems:
  • How to reliably account for such activities?
  • How to safely revert such activities?
Conclusion
• First work sharing experience on a production network management system
• Open research problems:
  • Network modeling
  • Atomicity and conflict resolution across management tasks
  • Making network management systems work with manual fallback mechanisms
Overview of Facebook’s Network: Backbone: Interconnecting POPs/DCs
• Irregular, demand-driven topology
• PRs/DRs form an iBGP mesh
• Common tasks:
  • Add/migrate circuits
  • Add/remove BBs/PRs/DRs
Diagram: a mesh of backbone routers (BBs) connecting PR1/PR2 (to POPs & Internet) and DR1/DR2 (to DCs).
Overview of Facebook’s Network: Point of Presence (POP)
• Standardized topology
• Services: LB (Proxygen), Cache
• Common tasks:
  • Build/upgrade a cluster
  • Provisioning new peering circuits
Diagram: the Internet and backbone routers BB1/BB2 connect through PR1/PR2 to the POP clusters.
Overview of Facebook’s Network: Data Center
• Standardized topology
• Services: Web, Cache (TAO), Database
• Common tasks:
  • Build/decomm a cluster
  • Cluster capacity upgrade
Diagram: backbone routers BB3/BB4 connect through DR1/DR2 to the DC clusters.
Dependencies between FBNet models
Figure: CDF across models of the # of related models (x-axis 0 to 30).
Experience: Fallback is needed
Problem scenario: manual changes to devices
• Manual config changes on devices are error-prone
• Ideal: all changes made through Robotron
• Reality: Robotron has latency, bugs and missing features, so quick fixes are needed during emergencies
• Alternatives to discourage manual changes:
  • Config monitoring
  • Automatic config override after the emergency window
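The two alternatives above can be combined into one check: config monitoring diffs each device's running config against the Robotron-generated config, and once a manual change has outlived the emergency window it is overwritten. A minimal sketch; the one-hour window, `DeviceState`, and `check_device` are assumptions, and a real override would push config through the deployment path rather than assign a string.

```python
import difflib
from dataclasses import dataclass
from typing import List, Optional

EMERGENCY_WINDOW_S = 3600  # assumed grace period for manual emergency fixes


@dataclass
class DeviceState:
    name: str
    running: str                        # config currently on the device
    generated: str                      # config Robotron would deploy
    drift_since: Optional[float] = None


def check_device(dev: DeviceState, now: float) -> List[str]:
    """Report drift from the generated config; override it after the window."""
    diff = list(difflib.unified_diff(
        dev.generated.splitlines(), dev.running.splitlines(),
        "generated", "running", lineterm=""))
    if not diff:
        dev.drift_since = None
        return []
    if dev.drift_since is None:
        dev.drift_since = now                        # first time drift is seen
    elif now - dev.drift_since > EMERGENCY_WINDOW_S:
        dev.running = dev.generated                  # automatic override
        dev.drift_since = None
    return diff


d = DeviceState("PSWa", running="mtu 1500", generated="mtu 9192")
print(len(check_device(d, now=0.0)) > 0, d.running)   # True mtu 1500
check_device(d, now=4000.0)
print(d.running)                                      # mtu 9192
```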
Related Work
• Bottom-up config analysis: [Benson11, Sung09, Kim11, …]
• Abstraction-driven design and config generation:
  • Top-down config optimization: [Condor, Sun13]
  • Centralized platform for network management: [Onix, Statesman]
  • Template-based config generation: [Enck09]
  • Config modeling: [OpenConfig, DMTF]
FBNet: Modeling the Network: Desired versus Derived
Diagram: FBNet holds a Desired view (A, B, C) and a Derived view (A, B, C=?).
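Keeping separate desired and derived views lets monitoring ask whether the network matches the design by diffing the two. A minimal sketch, assuming desired links come from FBNet design data and derived links come from data collected from devices; the link tuples, `normalize`, and `compare` are hypothetical.

```python
from typing import List, Set, Tuple

Link = Tuple[str, str]


def normalize(links: Set[Link]) -> Set[Link]:
    # Treat links as undirected: (A, B) and (B, A) are the same link.
    return {tuple(sorted(link)) for link in links}


def compare(desired: Set[Link], derived: Set[Link]) -> Tuple[List[Link], List[Link]]:
    d, o = normalize(desired), normalize(derived)
    return sorted(d - o), sorted(o - d)    # (missing links, unexpected links)


desired = {("A", "B"), ("A", "C")}
derived = {("B", "A")}                     # nothing observed for C yet (C=?)
missing, unexpected = compare(desired, derived)
print(missing, unexpected)                 # [('A', 'C')] []
```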
Deployment: Device configs -> Devices
• New device: full config replacement
• Existing devices: incremental “live” updates
• Dry run, atomic, phased, etc.
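The deployment modes can be sketched together. A dry run computes the change set without touching the device: a full replacement for a new device (empty running config), an incremental diff for an existing one. A phased rollout then pushes one batch at a time and halts at the first unhealthy batch. This is a hypothetical sketch: `plan`, `phased_deploy`, the batch size, and the health callback are assumptions, and atomic commit and rollback are not modeled.

```python
import difflib
from typing import Callable, Dict, List


def plan(running: str, desired: str) -> List[str]:
    """Dry run: the change set that would be applied, without touching the device."""
    if not running:                                    # new device: full replacement
        return desired.splitlines()
    return [line for line in difflib.ndiff(running.splitlines(), desired.splitlines())
            if line.startswith(("+ ", "- "))]          # existing device: incremental


def phased_deploy(devices: Dict[str, str], desired: Dict[str, str],
                  healthy: Callable[[str], bool], batch_size: int = 2) -> List[str]:
    """Push one batch at a time; stop after the first batch that fails health checks."""
    done: List[str] = []
    names = sorted(devices)
    for i in range(0, len(names), batch_size):
        batch = names[i:i + batch_size]
        for name in batch:
            devices[name] = desired[name]              # stand-in for a real push
        if not all(healthy(n) for n in batch):
            break                                      # unhealthy batch: halt (no rollback here)
        done += batch
    return done


running = {"PSWa": "mtu 1500", "PSWb": "", "PSWc": "mtu 1500"}
desired = {n: "mtu 9192" for n in running}
print(plan(running["PSWa"], desired["PSWa"]))
print(plan(running["PSWb"], desired["PSWb"]))                          # ['mtu 9192']
print(phased_deploy(running, desired, healthy=lambda n: n != "PSWc"))  # ['PSWa', 'PSWb']
```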
Monitoring: Is the network healthy?
• Passive monitoring
• Active monitoring
• Config monitoring