TRANSCRIPT
Cisco ACI Multi-Pod Design and Deployment
John Weston – Technical Marketing Engineer
BRKACI-2003
© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public
Cisco Spark
Questions? Use Cisco Spark to communicate with the speaker after the session.
How:
1. Find this session in the Cisco Live Mobile App
2. Click "Join the Discussion"
3. Install Spark or go directly to the space
4. Enter messages/questions in the space
cs.co/ciscolivebot#BRKACI-2003
Session Objectives
At the end of the session, participants should be able to:
• Articulate the different deployment options to interconnect Cisco ACI networks (Multi-Pod vs. Multi-Site)
• Understand the functionalities and specific design considerations associated with the ACI Multi-Pod fabric option
Initial assumption: the audience already has a good knowledge of the main ACI concepts (Tenant, BD, EPG, L2Out, L3Out, etc.)
Agenda
• ACI Network and Policy Domain Evolution
• ACI Multi-Pod Deep Dive
  - Overview, Use Cases and Supported Topologies
  - APIC Cluster Deployment Considerations
  - Inter-Pod Connectivity Deployment Considerations
  - Control and Data Planes
  - Connecting to the External Layer 3 Domain
  - Network Services Integration
  - Migration Scenarios
ACI Network and Policy Domain Evolution
Cisco ACI Fabric and Policy Domain Evolution
• ACI 1.0 – Leaf/spine single-Pod fabric (ACI Single Pod Fabric with one APIC cluster)
• ACI 1.1 – Geographically stretch a single Pod: ACI Stretched Fabric across DC1 and DC2, one APIC cluster
• ACI 2.0 – Multiple networks (Pods) in a single Availability Zone (Fabric): ACI Multi-Pod Fabric, Pod 'A' … Pod 'n' interconnected by an IPN with MP-BGP EVPN, single APIC cluster
• ISE 2.1 & ACI 1.2 – Federation of identity and interconnect of TrustSec and ACI using IP-based EPG/SGT
• ACI 3.0 – Multiple Availability Zones (Fabrics) in a single Region 'and' Multi-Region policy management: ACI Multi-Site, Fabric 'A' … Fabric 'n' interconnected over IP with MP-BGP EVPN
• ACI 3.1/3.2 – Remote Leaf and vPod extend an Availability Zone (Fabric) to remote locations
Fabric and Policy Domain Evolution – Deployment Options
• Single APIC cluster / single fabric:
  - Stretched Fabric: one ACI fabric stretched across DC1 and DC2, single APIC cluster
  - Multi-Pod (from the 2.0 release): Pod 'A' … Pod 'n' interconnected by an IPN, MP-BGP EVPN between Pods, single APIC cluster
• Multiple APIC clusters / multiple fabrics:
  - Multi-Fabric (with L2 and L3 DCI): Fabric 'A' … Fabric 'n' interconnected with L2/L3 DCI
  - Multi-Site (3.0 release, Q3CY17): Fabric 'A' … Fabric 'n' interconnected over IP with MP-BGP EVPN, inter-site application policy managed by ACI Multi-Site
Terminology
• Pod – a leaf/spine network sharing a common control plane (IS-IS, BGP, COOP, …); Pod == network fault domain
• Fabric – the scope of an APIC cluster; it can comprise one or more Pods; Fabric == Availability Zone (AZ) or tenant change domain
• Multi-Pod – a single APIC cluster with multiple leaf/spine networks; Multi-Pod == multiple networks within a single Availability Zone (Fabric)
• Multi-Fabric – multiple APIC clusters plus their associated Pods (you can have Multi-Pod with Multi-Fabric)*; Multi-Fabric == Multi-Site == a DC infrastructure Region with multiple AZs
* Available from ACI release 3.1
ACI Multi-Site Overview
• Separate ACI fabrics with independent APIC clusters
• ACI Multi-Site Orchestrator pushes cross-fabric configuration to multiple APIC clusters, providing scoping of all configuration changes
• MP-BGP EVPN control plane between sites
• Data-plane VXLAN encapsulation across sites
• End-to-end policy definition and enforcement
(Diagram: Multi-Site Orchestrator, driven via REST API and GUI, managing Site 1 / Availability Zone 'A' and Site 2 / Availability Zone 'B' across an IP network)
For more information on ACI Multi-Site: BRKACI-2125
Typical Requirement – Creation of Two Independent Fabrics/AZs
Application workloads deployed across availability zones: Fabric 'A' (AZ 1) and Fabric 'B' (AZ 2)
Typical Requirement – Creation of Two Independent Fabrics/AZs (with ACI Multi-Site)
Application workloads deployed across availability zones, with each AZ itself a 'classic' active/active Multi-Pod fabric:
• Multi-Pod Fabric 'A' (AZ 1): Pod '1.A' and Pod '2.A'
• Multi-Pod Fabric 'B' (AZ 2): Pod '1.B' and Pod '2.B'
ACI Multi-Pod Deep Dive
Overview, Use Cases and Supported Topologies
ACI Multi-Pod Overview
• Multiple ACI Pods connected by an IP Inter-Pod L3 network (IPN); each Pod consists of leaf and spine nodes
• Managed by a single APIC cluster: a single management and policy domain, forming one Availability Zone
• Forwarding control plane (IS-IS, COOP) fault isolation per Pod
• MP-BGP EVPN between Pods; data-plane VXLAN encapsulation between Pods
• End-to-end policy enforcement
Single Availability Zone with Maintenance & Configuration Zones – Scoping 'Network Device' Changes
• Configuration Zones can span any required set of switches; the simplest approach may be to map a configuration zone to a Pod within the Multi-Pod fabric. Applies to infrastructure configuration and policy only.
• Maintenance Zones – groups of switches managed as an "upgrade" group
Reducing the Impact of Configuration Errors – Introducing Configuration Zones
Three different zone deployment modes:
• Enabled (default): updates are immediately sent to all nodes that are part of the zone. Note: a node that is not part of any zone behaves like a node in a zone set to Enabled.
• Disabled: updates are postponed until the zone deployment mode is changed (or a node is removed from the zone)
• Triggered: sends the postponed updates to the nodes in the zone
The deployment mode can be configured for an entire Pod or for a specified set of leaf switches (in the GUI: select an entire Pod or specific leaf switches, change the deployment mode, and view the changes not yet applied to a Disabled zone).
Single Availability Zone with Tenant Isolation – Isolation for 'Virtual Network and Application' Changes
• The ACI 'Tenant' construct provides a domain of application and associated virtual network policy change
• It is also a domain of operational change for an application (e.g. production vs. test): Tenant 'Prod' and Tenant 'Dev' each form their own configuration/change domain across the Multi-Pod fabric
ACI Multi-Pod Supported Topologies
• Two DC sites directly connected: Pod 1 and Pod 2 over dark fiber/DWDM (up to 50 msec RTT**), 10G*/40G/100G links
• 3 (or more) DC sites directly connected: dark fiber/DWDM (up to 50 msec RTT**), 10G*/40G/100G links
• Intra-DC: Pod 1 … Pod n interconnected with 10G*/40G/100G links
• Multiple sites interconnected by a generic L3 network (up to 50 msec RTT**), 10G*/40G/100G links
* 10G only with QSA adapters on EX/FX spines
** 50 msec support added in SW release 2.3(1)
ACI Multi-Pod SW/HW Support and Scalability Values
• All existing Nexus 9000 hardware supported as leaf and spine nodes
• Maximum number of supported ACI leaf nodes (across all Pods):
  - Up to 80 leaf nodes with a 3-node APIC cluster
  - 300 leaf nodes (across Pods) with a 5-node APIC cluster
  - 400 leaf nodes (across Pods) with a 7-node APIC cluster (from ACI release 2.2(2e))
  - Maximum 200 leaf nodes per Pod
• Up to 6 spines per Pod
• Maximum number of supported Pods:
  - 4 in the 2.0(1)/2.0(2) releases
  - 6 in the 2.1(1) release
  - 10 in the 2.2(2e) release
  - 12 in the 3.0(1) release
APIC Cluster Deployment Considerations
APIC – Distributed Multi-Active Data Base
• Processes are active on all nodes (not active/standby)
• The database is replicated across APIC nodes: every attribute is distributed as one active plus two backup instances (shards)
• One copy is 'active' for every specific portion of the database
APIC Cluster Deployment Considerations – Single Pod Scenario
• With a 3-node cluster, APIC allows read-only access to the DB when only one node remains active (standard DB quorum); a hard failure of two nodes causes all shards to go into 'read-only' mode (rebooting the failed nodes heals the cluster once the APIC nodes are back up)
• Additional APIC nodes increase the system scale (up to 7 nodes supported) but do not add more redundancy
• In a larger cluster, a hard failure of two nodes causes inconsistent behaviour across shards (some shards in 'read-only' mode, some in 'read-write' mode)
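The shard behaviour above can be sketched with a small thought experiment. This is a conceptual illustration only, not Cisco code: the replica placement and the 2-of-3 majority rule are assumptions used to show why, in a 5-node cluster, losing two nodes leaves some shards writable and others read-only.

```python
# Conceptual sketch (not APIC code): each shard has 3 replicas spread
# across the cluster; a shard stays read-write only while a majority
# (2 of 3) of its replica-holding nodes are alive.
from itertools import combinations

def shard_status(replica_nodes, alive_nodes):
    """Return 'read-write' if a majority of the shard's replicas survive."""
    alive = sum(1 for n in replica_nodes if n in alive_nodes)
    return "read-write" if alive >= 2 else "read-only"

# In a 5-node cluster, a shard's 3 replicas land on 3 of the 5 nodes.
nodes = {1, 2, 3, 4, 5}
shards = list(combinations(sorted(nodes), 3))   # every possible placement

# Hard failure of two nodes (e.g. nodes 4 and 5, say both in one Pod):
alive = nodes - {4, 5}
statuses = [shard_status(s, alive) for s in shards]
print(statuses.count("read-write"), "shards read-write,",
      statuses.count("read-only"), "shards read-only")
# -> 7 shards read-write, 3 shards read-only
```

This matches the slide: with more than 3 nodes, a two-node hard failure does not take the whole DB read-only; the outcome differs shard by shard depending on where the replicas landed.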
APIC Cluster Deployment Considerations – Multi-Pod: 2 Pods Scenario
With a 3-node APIC cluster split across two Pods (two nodes in Pod 1, one in Pod 2, up to 50 msec RTT apart):
• Pod isolation scenario: changes are still possible on the APIC nodes in Pod 1 (read/write) but not on the node in Pod 2 (read-only)
• Pod 1 hard failure scenario: the recommendation is to activate a standby APIC node in Pod 2 to make the cluster fully functional again
With a 5-node APIC cluster split across two Pods (three nodes in Pod 1, two in Pod 2):
• Pod isolation scenario: same considerations as with a single Pod (different behaviour across shards)
• Pod 1 hard failure scenario: it is possible to restore the whole fabric state to the latest configuration snapshot ('ID Recovery' procedure – needs BU and TAC involvement)
APIC Cluster Deployment Considerations – What about a 4-Node APIC Cluster? (Q2CY18)
• Intermediate scalability values compared to a 3- or 5-node cluster (up to 170 leaf nodes supported)
• Pod isolation scenario: same considerations as with 5 nodes (different behaviour across shards)
• Pod hard failure scenario:
  - No chance of total loss of information for any shard
  - A standby node can be brought up in the second site to regain full majority for all the shards
• Pending internal validation, scoped for Q2CY18
APIC Cluster Deployment Considerations – Deployment Recommendations
• Main recommendation: deploy a 3-node APIC cluster when fewer than 80 leaf nodes are deployed across Pods
• From Q2CY18, a 4-node cluster can be deployed if it meets the scalability requirements
• When 5 (or 7) nodes are really needed for scalability reasons, follow the rule of thumb of never placing more than two APIC nodes in the same Pod (when possible), e.g. for a 5-node cluster:
  - 2 Pods*: 3 + 2
  - 3 Pods: 2 + 2 + 1
  - 4 Pods: 2 + 1 + 1 + 1
  - 5 Pods: 1 + 1 + 1 + 1 + 1
  - 6+ Pods: one APIC in each of 5 Pods
* 'ID Recovery' procedure possible for recovering lost information
Inter-Pod Connectivity Deployment Considerations
ACI Multi-Pod – Inter-Pod Network (IPN) Requirements
• Not managed by APIC; must be separately configured (day-0 configuration)
• The IPN topology can be arbitrary; it is not mandatory to connect to all spine nodes
• Main requirements:
  - Multicast (PIM Bidir) to handle Layer 2 BUM* traffic
  - OSPF to peer with the spine nodes and learn VTEP reachability
  - Increased MTU support to handle VXLAN-encapsulated traffic
  - DHCP Relay
* Broadcast, Unknown unicast, Multicast
Inter-Pod Connectivity – Frequently Asked Questions
Q: What platforms can or should I deploy in the IPN?
A: Nexus 9200s and 9300-EX, but also any other switch or router supporting all the IPN requirements. First-generation Nexus 9300s/9500s are not supported as IPN nodes.
Q: Can I use a 10G connection between the spines and the IPN network?
A: Yes, with QSA adapters supported on the ACI spine devices, available from the 2.1(1h) release on EX/FX-based hardware. There are no plans to introduce support for first-generation spines (including the 9336-PQ 'baby spine').
Inter-Pod Connectivity – Frequently Asked Questions (2)
Q: Do I need a dedicated pair of IPN devices in each Pod?
A: A single pair of IPN devices can be used, but before the 2.1(1h) release this mandates the use of 40G/100G inter-Pod links.
Q: I have two sites connected with dark fiber/DWDM circuits – can I connect the spines back-to-back?
A: No, because of the multicast requirement for L2 multi-destination inter-Pod traffic.
Control and Data Planes
ACI Multi-Pod – Auto-Provisioning of Pods
A single APIC cluster is built across Pods, starting from a 'Seed' Pod 1:
1. APIC Node 1 is connected to a leaf node in 'Seed' Pod 1
2. Discovery and provisioning of all the devices in the local Pod
3. Provisioning of the interfaces on the spines facing the IPN and of the EVPN control-plane configuration
4. Spine 1 in Pod 2 connects to the IPN and generates DHCP requests
5. The DHCP requests are relayed by the IPN devices back to the APIC in Pod 1
6. The DHCP response reaches Spine 1, allowing its full provisioning
7. Discovery and provisioning of all the devices in the local Pod 2
8. APIC Node 2 is connected to a leaf node in Pod 2
9. APIC Node 2 joins the cluster
10. Other Pods are discovered following the same procedure
For more information on how to set up an ACI Fabric from scratch: BRKACI-2004, BRKACI-2820
ACI Multi-Pod – IPN Control Plane
• Separate IP address pools for VTEPs are assigned by APIC to each Pod
  - Summary routes are advertised toward the IPN via OSPF routing
  - IS-IS convergence events local to a Pod are not propagated to remote Pods
• Spine nodes redistribute the other Pods' summary routes into the local IS-IS process (IS-IS to OSPF mutual redistribution)
  - Needed for local VTEPs to communicate with remote VTEPs
Example: with Pod 1 TEP pool 10.0.0.0/16 and Pod 2 TEP pool 10.1.0.0/16, the IPN routing table contains both summaries, and a Pod 1 leaf's underlay VRF points 10.1.0.0/16 at its local spines (next-hops Pod1-S1, Pod1-S2, Pod1-S3, Pod1-S4).
ACI Multi-Pod – Inter-Pod MP-BGP EVPN Control Plane
• MP-BGP EVPN syncs endpoint (EP) and multicast group information between Pods
• All remote-Pod entries are associated with a Proxy VTEP next-hop address (not part of the local TEP pool): e.g. in Pod 1's COOP database, local EP1 points to Leaf 1 while remote EP3/EP4 point to Proxy B, and vice versa in Pod 2
• Same BGP AS across all the Pods (single BGP ASN)
• iBGP EVPN sessions between spines in separate Pods:
  - Full-mesh MP-iBGP EVPN sessions between local and remote spines (default behavior)
  - Optional route-reflector (RR) deployment (recommended: one RR in each Pod for resiliency)
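The route-reflector option above can be motivated with a quick session count. This is a back-of-the-envelope sketch, not Cisco tooling: the four-spines-per-Pod figure and the one-RR-spine-per-Pod peering model are illustrative assumptions.

```python
# Illustrative math: inter-Pod MP-iBGP EVPN session counts for a full
# mesh of spines vs. a simplified one-route-reflector-per-Pod design.

def full_mesh_sessions(spines_per_pod, pods):
    """Every spine peers with every spine in every *other* Pod."""
    total = spines_per_pod * pods
    remote = spines_per_pod * (pods - 1)
    return total * remote // 2   # each session counted once

def rr_sessions(spines_per_pod, pods):
    """Assume one RR spine per Pod: non-RR spines peer with every RR,
    and the RRs peer with each other."""
    rrs = pods
    non_rr = spines_per_pod * pods - rrs
    return non_rr * rrs + rrs * (rrs - 1) // 2

# Six Pods with four spines each (assumed values):
print(full_mesh_sessions(4, 6), "full-mesh vs", rr_sessions(4, 6), "with RRs")
```

As the Pod count grows, the full mesh grows quadratically in the total spine count, which is why the session recommends RRs mainly for larger deployments.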
ACI Multi-Pod – Inter-Pod Data Plane
(A contract C between EP1's EPG and EP2's EPG is configured on APIC; EP1 sits in Pod 1, EP2 in Pod 2.)
1. EP1 (VM1) sends traffic destined to remote EP2
2. EP2 is unknown on the ingress leaf, so the traffic is encapsulated to the local Proxy A spine VTEP (adding the source class S_Class information)
3. The spine encapsulates the traffic to the remote Proxy B spine VTEP
4. The receiving spine encapsulates the traffic to the local leaf where EP2 resides
5. Policy and network information is carried across Pods in the VXLAN header (VTEP IP, VNID, Class-ID, tenant packet)
6. The egress leaf learns the remote EP1 location and enforces policy; if the policy allows it, EP2 receives the packet
ACI Multi-Pod – Inter-Pod Data Plane (2)
7. EP2 (VM2) sends traffic back to remote EP1
8. The ingress leaf enforces policy and, if the traffic is allowed, encapsulates it directly to the remote leaf node where EP1 resides (EP1's location was learned in step 6)
9. The leaf in Pod 1 learns the remote EP2 location (no need to enforce policy again, as it was applied at ingress)
10. EP1 (VM1) receives the packet
ACI Multi-Pod – Inter-Pod Data Plane (3)
11. From this point on, EP1-to-EP2 communication is encapsulated leaf to leaf (VTEP to VTEP), and policy is always applied at the ingress leaf (this applies to both L2 and L3 communication)
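Steps 1–11 boil down to a simple decision on the ingress leaf. The sketch below is a conceptual model only, not ACI code: the table structure and names ("Proxy-A", "Pod2-Leaf") are illustrative assumptions.

```python
# Conceptual model of the ingress-leaf forwarding decision: known remote
# endpoints are reached leaf-to-leaf (VTEP to VTEP); unknown destinations
# are bounced to the local spine proxy, and return traffic populates the
# endpoint table so later packets skip the proxy.

def forward(ep_table, dst_ep, local_proxy):
    """Return the VTEP the ingress leaf encapsulates to."""
    return ep_table.get(dst_ep, local_proxy)

ep_table = {}                     # Pod 1 ingress leaf's endpoint table
print(forward(ep_table, "EP2", "Proxy-A"))   # first packet -> Proxy-A

ep_table["EP2"] = "Pod2-Leaf"     # learned from EP2's return traffic
print(forward(ep_table, "EP2", "Proxy-A"))   # now -> Pod2-Leaf
```

The spine proxy thus acts only as a bootstrap path; steady-state traffic is leaf-to-leaf, with policy always enforced at ingress.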
ACI Multi-Pod – Use of Multicast for Inter-Pod Layer 2 BUM* Traffic
• Ingress replication for BUM traffic is not supported with Multi-Pod; PIM Bidir is the only validated and supported option:
  - Scalable: only a single (*,G) entry is created in the IPN for each BD
  - Fast-convergent: no requirement for data-driven multicast state creation
• Each Bridge Domain is assigned a multicast group (GIPo), e.g. BD1 GIPo1: 225.1.1.128
• A spine is elected authoritative for each Bridge Domain:
  - It generates an IGMP Join for (*, GIPo1) on a specific link toward the IPN
  - It always sends/receives BUM traffic (whether originated in the local Pod or from a remote Pod) on that link
* BUM: Broadcast, Unknown Unicast, Multicast
ACI Multi-Pod – Use of Multicast for Inter-Pod BUM* Traffic
1. EP1 (VM1) in BD1 generates a BUM frame
2. BD1 has an associated multicast group MG1; traffic is flooded intra-Pod via one multi-destination tree
3. Spine 2 is designated to send MG1 traffic toward the IPN
4. The IPN replicates the traffic to all the Pods that joined MG1 (optimized delivery to Pods)
5. In the remote Pod, the BUM frame is flooded along one of the trees associated with MG1
6. EP2 (VM2) receives the BUM frame
* BUM: Layer 2 Broadcast, Unknown Unicast, Multicast
ACI Multi-Pod – PIM Bidir for BUM: Supported Topologies
• Option 1: create full-mesh connections between the IPN devices
  - More costly for geo-dispersed Pods, as it requires more links between sites
• Option 2: alternatively, connect the local IPN devices with a port-channel interface (for resiliency)
• In both cases, it is critical to ensure that the preferred path toward the RP from any IPN device is not via a spine
  - Recommendation: increase the OSPF cost of the interfaces between the IPN devices and the spines, e.g. on IPN1 e1/49 toward Pod1-Spine1:

interface Ethernet1/49.4
  description L3 Link to Pod1-Spine1
  mtu 9150
  encapsulation dot1q 4
  ip address 192.168.1.1/31
  ip ospf cost 100
  ip ospf network point-to-point
  ip router ospf IPN area 0.0.0.0
  ip pim sparse-mode
  ip dhcp relay address 10.1.0.2
  ip dhcp relay address 10.1.0.3
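The per-interface configuration above omits the Bidir RP definition itself. On a Nexus-style IPN device it could look something like the sketch below; this is not from the session, and the RP address, group range, and phantom-RP prefix are illustrative assumptions that must be matched to the GIPo range actually configured on APIC.

```
! Illustrative sketch (assumed values): PIM Bidir RP for the ACI GIPo range
! on redundant IPN devices, using the common "phantom RP" technique.
ip pim rp-address 192.168.100.1 group-list 225.0.0.0/8 bidir

! Phantom RP: the RP address itself is unowned; each IPN device advertises
! the RP subnet with a different prefix length, so the longest prefix wins
! and RP redundancy is achieved without any RP failover protocol.
interface loopback1
  ip address 192.168.100.2/30    ! e.g. on IPN1 (primary, longest prefix)
  ip ospf network point-to-point
  ip router ospf IPN area 0.0.0.0
  ip pim sparse-mode
```

A second IPN device would advertise the same RP subnet with a shorter mask (e.g. /29) to act as backup, consistent with the slide's warning that the preferred path toward the RP must not transit a spine.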
Connecting to the External Layer 3 Domain
Connecting ACI to the Layer 3 Domain – 'Traditional' L3Out on the BL Nodes
• Connection to the WAN edge (PE) devices at border leaf nodes
• Definition of an L3Out logical construct
• VRF-lite hand-off for extending L3 multi-tenancy outside the ACI fabric
• Each tenant defines one (or more) L3Outs with a set of logical nodes, logical interfaces and a peering protocol
Connecting Multi-Pod to the Layer 3 Domain – 'Traditional' L3Out on the BL Nodes
• A Pod does not need to have a dedicated WAN connection (i.e. a Pod can offer transit services to other Pods)
• Multiple WAN connections can be deployed across Pods
• Outbound traffic: by default, VTEPs always select a WAN connection in the local Pod based on preferred metric; traffic from a Pod without a local L3Out is hashed across the L3Outs of the remote Pods
Connecting Multi-Pod to the Layer 3 Domain – 'Traditional' L3Out on the BL Nodes (2)
• Asymmetric traffic paths create issues when independent active perimeter firewalls are deployed across Pods
• Routing can be tuned to ensure ingress/egress traffic always leverages the same Pod's L3Out
  - This reverts the deployed firewalls to an "Active/Standby" mode of operation
• Host-route advertisement is the best option to ensure all the deployed firewalls are actively utilized
  - Support for host-route advertisement on BL nodes is planned for a future ACI release
  - Requires an L3Out connection in each Pod
Connecting ACI to the Layer 3 Domain – 'GOLF' Design
• Direct or indirect connection from the spines to the WAN edge ('GOLF') routers (ASR 9000, ASR 1000, Nexus 7000)
• Better scalability: one protocol session for all VRFs, no longer constrained by the border-leaf hardware tables
• VXLAN data-plane hand-off with MP-BGP EVPN control plane
• Simplified tenant L3Out configuration
• Support for host-route advertisement out of the ACI fabric
• VRF configuration automation on the GOLF router through OpFlex exchange
For more information on GOLF deployment: LABACI-2101
GOLF and Multi-Pod Integration – Centralized and Distributed Models
• Centralized WAN edge devices:
  - Common when Pods represent rooms/halls in the same physical DC
  - MP-BGP EVPN peering required between the spines in each Pod and the centralized WAN edge devices
• Distributed WAN edge devices:
  - Pods usually represent separate physical DCs
  - Full mesh of MP-BGP EVPN peerings between the Pods and the WAN edge routers
For more info on GOLF and Multi-Pod integration: https://www.ciscolive.com/online/connect/sessionDetail.ww?SESSION_ID=94038&backBtn=true
GOLF and Multi-Pod Integration – Inter-DC Scenario
• MP-BGP EVPN control plane between the spines in each Pod and the local WAN edge devices
• Host routes for endpoints belonging to public BD subnets in Pod 'A' and Pod 'B' are advertised to the respective WAN edge devices
• The WAN edge devices inject the host routes into the WAN or register them in the LISP database
GOLF and Multi-Pod Integration – Inter-DC Scenario (2)
Granular inbound path optimization via host-route advertisement into the WAN (or integration with LISP). Example with endpoint 10.10.10.10 in Pod 'A' and 10.10.10.11 in Pod 'B':
• G1,G2 (Pod 'A' GOLF routers) routing table: 10.10.10.0/24 via A; 10.10.10.10/32 via A
• G3,G4 (Pod 'B' GOLF routers) routing table: 10.10.10.0/24 via B; 10.10.10.11/32 via B
• Remote router table: 10.10.10.10/32 via G1,G2; 10.10.10.11/32 via G3,G4
Network Services Integration
ACI Multi-Pod – Network Services Integration Models
• Active/Standby pair deployed across Pods:
  - No issues with asymmetric flows, but may cause traffic hair-pinning across the IPN
  - Works in all scenarios from release 2.3
• Independent Active/Standby pair deployed in each Pod:
  - Use PBR (managed or unmanaged mode)
  - Only for the perimeter-firewall use case, assuming a proper solution is adopted to keep ingress/egress flows symmetric
• Firewall cluster deployed across Pods:
  - Not currently supported (scoped for 1HCY18)
Active/Standby Pair across Pods – Option 1: FW in L2 Mode
(Diagram: active firewall in Pod 1 and standby in Pod 2, both in L2 mode in front of the gateway MAC G; WAN connectivity via L3Out-1 and L3Out-2; east-west and north-south traffic flows indicated.)
Active/Standby Pair across Pods – Option 2: FW in L3 Mode and PBR
(Diagram: active firewall in L3 mode in Pod 1, standby in Pod 2; the PBR policy is applied in the fabric to redirect east-west and north-south traffic through the active firewall; WAN connectivity via L3Out-1 and L3Out-2.)
© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public
FW in L3 Mode and L3Outs - Single L3Out Defined across Pods
[Diagram: Web VM1 and Web VM2 in different Pods; a single L3Out with ASA In/Out interfaces; active FW (L3 mode) in one Pod, standby FW in the other; Pods interconnected via the IPN; APIC cluster stretched across Pods]
BDs associated to the L3Outs are extended via Multi-Pod
FW in L3 Mode and L3Outs - Single L3Out Defined across Pods (Dynamic Routing)
[Diagram: Web VM1 (192.168.1.201) and Web VM2 (192.168.1.202) in different Pods; ASA In/Out interfaces on a single L3Out; secondary IP 192.168.3.254 on the service leaf nodes; active ASA 192.168.3.1 establishes the routing peering, standby ASA 192.168.3.2; traffic is bounced across Pods through the IPN toward the active unit]
Routing table (VRF1): external IP prefix via Pod2-sLeaf1/2
Routing table (VRF1): external IP prefix via Pod1-sLeaf1/2
Note: supported from ACI SW releases 2.1(3), 2.2(3), 2.3(1) and 3.0(1), and requires EX/FX hardware for the ACI service leaf nodes
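A detail worth noting in this scenario: with ASA Active/Standby failover, the standby unit takes over the active unit's interface IP, so the next-hop address the fabric resolves toward the firewall (192.168.3.1 on the slide) does not change when the roles flip. A toy model of that behavior (a simplification with hypothetical unit names, not the ASA implementation):

```python
class AsaFailoverPair:
    """Toy model of ASA Active/Standby failover: the data-plane IP
    follows the active role, so routing toward the pair is stable."""
    ACTIVE_IP = "192.168.3.1"   # always owned by whichever unit is active
    STANDBY_IP = "192.168.3.2"  # always owned by the standby unit

    def __init__(self):
        self.active_unit, self.standby_unit = "unit-A", "unit-B"

    def fail_active(self):
        # On failure, the standby promotes itself and assumes the
        # active unit's IP (and MAC), preserving adjacencies.
        self.active_unit, self.standby_unit = self.standby_unit, self.active_unit

    def next_hop(self) -> str:
        return self.ACTIVE_IP

pair = AsaFailoverPair()
before = pair.next_hop()
pair.fail_active()
assert pair.active_unit == "unit-B" and pair.next_hop() == before
```

What does change after a failover is the Pod in which the active unit lives, which is why traffic from the other Pod is bounced across the IPN rather than black-holed.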
FW in L3 Mode and L3Outs - Single L3Out Defined across Pods (Static Routing)
[Diagram: Web VM1 (192.168.1.201) and Web VM2 (192.168.1.202) in different Pods; ASA In/Out interfaces on a single L3Out; secondary IP 192.168.3.254 on the service leaf nodes in each Pod; active ASA 192.168.3.1, standby ASA 192.168.3.2; traffic is bounced across Pods through the IPN toward the active unit]
The static route is injected into the MP-BGP VPNv4 fabric control plane.
Routing table (VRF1): external IP prefix via Pod2-sLeaf1/2
Routing table (VRF1): external IP prefix via Pod1-sLeaf1/2
Note: supported from ACI SW releases 2.1(3), 2.2(3), 2.3(1) and 3.0(1), and requires EX/FX hardware for the ACI service leaf nodes
Migration Scenarios
Migration Scenarios - Adding Pods to an Existing ACI Fabric
[Diagram: Pod1 and Pod2 interconnected via MP-BGP EVPN]
1. Add connections to the IPN network; connect and auto-provision the other Pod(s)
2. Distribute the APIC nodes across Pods
Migration Scenarios - Converting a Stretched Fabric to Multi-Pod
[Diagram: Pod1 and Pod2 interconnected via MP-BGP EVPN]
3. Convert the stretched fabric:
- Re-cabling of the physical interconnection is required (especially when using DWDM circuits that must be reused)
- Re-addressing the VTEP address space of the second Pod is a disruptive procedure, as it requires a clean install on that Pod
This conversion has not been internally QA-validated and is not recommended.
Conclusions and Q&A
ACI Multi-Pod & Multi-Site - A Reason for Both
[Diagram: Multi-Pod Fabric ‘A’ (AZ 1) with Pods ‘1.A’ and ‘2.A’, and Multi-Pod Fabric ‘B’ (AZ 2) with Pods ‘1.B’ and ‘2.B’; each fabric is a ‘classic’ Active/Active deployment; the two fabrics are interconnected via ACI Multi-Site, with application workloads deployed across availability zones]
Conclusions
Cisco ACI offers different multi-fabric options that can be deployed today
There is a solid roadmap to evolve these options in the short and medium term
Multi-Pod represents the natural evolution of the existing Stretched Fabric design
Multi-Site will replace the Dual-Fabric approach
Cisco will offer migration options to drive the adoption of those new solutions
Where to Go for More Information
ACI Stretched Fabric White Paper
http://www.cisco.com/c/en/us/td/docs/switches/datacenter/aci/apic/sw/kb/b_kb-aci-stretched-fabric.html#concept_524263C54D8749F2AD248FAEBA7DAD78
ACI Multi-Pod White Paper
http://www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-centric-infrastructure/white-paper-c11-737855.html?cachemode=refresh
ACI Dual Fabric Design Guide
http://www.cisco.com/c/en/us/solutions/data-center-virtualization/application-centric-infrastructure/white-paper-c11-737077.html?cachemode=refresh
ACI and GOLF High Level Integration Paper
http://www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-centric-infrastructure/white-paper-c11-736899.html
Complete Your Online Session Evaluation
• Please complete your Online Session Evaluations after each session
• Complete 4 Session Evaluations & the Overall Conference Evaluation (available from Thursday) to receive your Cisco Live T-shirt
• All surveys can be completed via the Cisco Live Mobile App or the Communication Stations
Don’t forget: Cisco Live sessions will be available for viewing on-demand after the event at www.ciscolive.com/global/on-demand-library/.
Continue Your Education
• Demos in the Cisco campus
• Walk-in Self-Paced Labs
• Tech Circle
• Meet the Engineer 1:1 meetings
• Related sessions
Thank you