Published: 2024-06-09
This article covers three interesting problems I faced when deploying a FortiGate firewall (FW) cluster in a VXLAN EVPN (MLAG) datacenter topology. The customer had requested that the FW members be deployed on separate leaf-pairs for increased redundancy. The cluster was to provide stateful inter-VRF inspection. Let us quickly walk through the topology before we get to the problems:
Our two FW members, FW1a and FW1b, are connected to different leaf-pairs: LE03 and LE04 respectively. While this decision makes perfect sense, it brought along a few interesting design considerations that we would not have faced had both FW members been connected to the same leaf-pair.
I have also added two servers to the topology so that we can generate traffic to test the design, both connected to LE05. The servers live in vrfs A and B respectively, forcing traffic between them to traverse our FW. The diagram below displays the logical topology and traffic flow:
The IPv6 range 2001:db8:dc01::/48 was allocated to this data center site. From that range we allocate 2001:db8:dc01:a00::/56 to vrf A and 2001:db8:dc01:b00::/56 to vrf B. As FW1 should inspect traffic traveling between the two vrfs, it needs connectivity to both vrfs. We set up a linknet on vlan 100 for vrf A and a linknet on vlan 200 for vrf B. SRV6 was placed in vlan 106 and SRV7 in vlan 207.
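To make the addressing plan concrete, here is a minimal Python sketch that derives the per-vlan /64 networks from the two /56 vrf allocations. The helper name nth_subnet64 is my own, and the rule that the vlan's last two digits pick the /64 is only an assumption inferred from the addresses used in the configuration.

```python
import ipaddress

# Prefixes taken from the text above.
site = ipaddress.ip_network("2001:db8:dc01::/48")
vrf_a = ipaddress.ip_network("2001:db8:dc01:a00::/56")
vrf_b = ipaddress.ip_network("2001:db8:dc01:b00::/56")


def nth_subnet64(block, index):
    """Return the index-th /64 (0-based) inside a larger IPv6 block."""
    if not 0 <= index < 2 ** (64 - block.prefixlen):
        raise ValueError("index does not fit inside the block")
    base = int(block.network_address)
    # Consecutive /64 networks differ by 1 in the upper 64 bits.
    return ipaddress.IPv6Network((base + (index << 64), 64))


# Assumption: one /64 per vlan, chosen by the vlan's last two digits.
linknet_a = nth_subnet64(vrf_a, 0x00)  # vlan 100, FW1 linknet in vrf A
linknet_b = nth_subnet64(vrf_b, 0x00)  # vlan 200, FW1 linknet in vrf B
srv6_net = nth_subnet64(vrf_a, 0x06)   # vlan 106, SRV6
srv7_net = nth_subnet64(vrf_b, 0x07)   # vlan 207, SRV7

for name, net in [("vlan 100", linknet_a), ("vlan 200", linknet_b),
                  ("vlan 106", srv6_net), ("vlan 207", srv7_net)]:
    print(name, net)
```

Each /56 leaves room for 256 such /64 networks per vrf.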
We run OSPF as the underlay and iBGP for the EVPN overlay, using unnumbered Ethernet links. This reduces the configuration, as interfaces Ethernet1 through Ethernet6 on the spines are identical. It also greatly reduces the number of IP addresses required for the topology. You can see the full (initial) configuration below:
service routing protocols model multi-agent
!
hostname SP01
!
interface Ethernet1-6
no switchport
ip address unnumbered Loopback0
ip ospf network point-to-point
ip ospf area 0.0.0.0
!
interface Loopback0
ip address 10.0.0.1/32
ip ospf area 0.0.0.0
!
ip routing
!
router bgp 65000
neighbor EVPN peer group
neighbor EVPN remote-as 65000
neighbor EVPN update-source Loopback0
neighbor EVPN route-reflector-client
neighbor EVPN timers 5 15
neighbor EVPN send-community
neighbor 10.0.0.31 peer group EVPN
neighbor 10.0.0.32 peer group EVPN
neighbor 10.0.0.41 peer group EVPN
neighbor 10.0.0.42 peer group EVPN
neighbor 10.0.0.51 peer group EVPN
neighbor 10.0.0.52 peer group EVPN
!
address-family evpn
neighbor EVPN activate
!
address-family ipv4
no neighbor EVPN activate
!
router ospf 1
redistribute connected
max-lsa 12000
!
end
service routing protocols model multi-agent
!
hostname SP02
!
interface Ethernet1-6
no switchport
ip address unnumbered Loopback0
ip ospf network point-to-point
ip ospf area 0.0.0.0
!
interface Loopback0
ip address 10.0.0.2/32
ip ospf area 0.0.0.0
!
ip routing
!
router bgp 65000
neighbor EVPN peer group
neighbor EVPN remote-as 65000
neighbor EVPN update-source Loopback0
neighbor EVPN route-reflector-client
neighbor EVPN timers 5 15
neighbor EVPN send-community
neighbor 10.0.0.31 peer group EVPN
neighbor 10.0.0.32 peer group EVPN
neighbor 10.0.0.41 peer group EVPN
neighbor 10.0.0.42 peer group EVPN
neighbor 10.0.0.51 peer group EVPN
neighbor 10.0.0.52 peer group EVPN
!
address-family evpn
neighbor EVPN activate
!
address-family ipv4
no neighbor EVPN activate
!
router ospf 1
redistribute connected
max-lsa 12000
!
end
service routing protocols model multi-agent
!
hostname LE03a
!
vlan 100
name "FW1;VRF=A"
!
vlan 200
name "FW1;VRF=B"
!
vlan 4094
name "MLAG"
trunk group PEER-LINK
!
vrf instance A
!
vrf instance B
!
interface Port-Channel3
description "PEER-LINK"
switchport mode trunk
switchport trunk group PEER-LINK
!
interface Port-Channel4
description "FW1"
switchport mode trunk
mlag 4
!
interface Ethernet1
description "SP01"
no switchport
ip address unnumbered Loopback0
ip ospf network point-to-point
ip ospf area 0.0.0.0
!
interface Ethernet2
description "SP02"
no switchport
ip address unnumbered Loopback0
ip ospf network point-to-point
ip ospf area 0.0.0.0
!
interface Ethernet3
description "LE03b"
channel-group 3 mode active
!
interface Ethernet4
description "FW1a"
channel-group 4 mode active
!
interface Loopback0
description "OSPF/BGP ROUTER-ID"
ip address 10.0.0.31/32
ip ospf area 0.0.0.0
!
interface Loopback1
description "VXLAN SOURCE-INTERFACE"
ip address 10.0.0.3/32
ip ospf area 0.0.0.0
!
interface Vlan100
description "FW1;VRF=A"
vrf A
arp aging timeout 290
ipv6 address 2001:db8:dc01:a00::3/64
!
interface Vlan200
description "FW1;VRF=B"
vrf B
arp aging timeout 290
ipv6 address 2001:db8:dc01:b00::3/64
!
interface Vlan4094
no autostate
ip address 10.0.3.1/30
ip ospf network point-to-point
ip ospf area 0.0.0.0
!
interface Vxlan1
vxlan source-interface Loopback1
vxlan udp-port 4789
vxlan vlan 100 vni 100
vxlan vlan 200 vni 200
vxlan vrf A vni 5001
vxlan vrf B vni 5002
!
ip virtual-router mac-address 00:11:22:33:44:55
!
ip routing
!
ipv6 unicast-routing
ipv6 unicast-routing vrf A
ipv6 unicast-routing vrf B
!
mlag configuration
domain-id LE03
local-interface Vlan4094
peer-address 10.0.3.2
peer-link Port-Channel3
reload-delay mlag 60
reload-delay non-mlag 30
!
router bgp 65000
no bgp default ipv4-unicast
distance bgp 20 200 200
neighbor EVPN peer group
neighbor EVPN remote-as 65000
neighbor EVPN update-source Loopback0
neighbor EVPN send-community
neighbor 10.0.0.1 peer group EVPN
neighbor 10.0.0.2 peer group EVPN
!
vlan 100
rd 10.0.0.31:100
route-target both 65000:100
redistribute learned
!
vlan 200
rd 10.0.0.31:200
route-target both 65000:200
redistribute learned
!
address-family evpn
neighbor EVPN activate
!
address-family ipv4
no neighbor EVPN activate
!
router ospf 1
redistribute connected
max-lsa 12000
!
end
service routing protocols model multi-agent
!
hostname LE03b
!
vlan 100
name "FW1;VRF=A"
!
vlan 200
name "FW1;VRF=B"
!
vlan 4094
name "MLAG"
trunk group PEER-LINK
!
vrf instance A
!
vrf instance B
!
interface Port-Channel3
description "PEER-LINK"
switchport mode trunk
switchport trunk group PEER-LINK
!
interface Port-Channel4
description "FW1"
switchport mode trunk
mlag 4
!
interface Ethernet1
description "SP01"
no switchport
ip address unnumbered Loopback0
ip ospf network point-to-point
ip ospf area 0.0.0.0
!
interface Ethernet2
description "SP02"
no switchport
ip address unnumbered Loopback0
ip ospf network point-to-point
ip ospf area 0.0.0.0
!
interface Ethernet3
description "LE03a"
channel-group 3 mode active
!
interface Ethernet4
description "FW1a"
channel-group 4 mode active
!
interface Loopback0
description "OSPF/BGP ROUTER-ID"
ip address 10.0.0.32/32
ip ospf area 0.0.0.0
!
interface Loopback1
description "VXLAN SOURCE-INTERFACE"
ip address 10.0.0.3/32
ip ospf area 0.0.0.0
!
interface Vlan100
description "FW1;VRF=A"
vrf A
arp aging timeout 290
ipv6 address 2001:db8:dc01:a00::4/64
!
interface Vlan200
description "FW1;VRF=B"
vrf B
arp aging timeout 290
ipv6 address 2001:db8:dc01:b00::4/64
!
interface Vlan4094
no autostate
ip address 10.0.3.2/30
ip ospf network point-to-point
ip ospf area 0.0.0.0
!
interface Vxlan1
vxlan source-interface Loopback1
vxlan udp-port 4789
vxlan vlan 100 vni 100
vxlan vlan 200 vni 200
vxlan vrf A vni 5001
vxlan vrf B vni 5002
!
ip virtual-router mac-address 00:11:22:33:44:55
!
ip routing
!
ipv6 unicast-routing
ipv6 unicast-routing vrf A
ipv6 unicast-routing vrf B
!
mlag configuration
domain-id LE03
local-interface Vlan4094
peer-address 10.0.3.1
peer-link Port-Channel3
reload-delay mlag 60
reload-delay non-mlag 30
!
router bgp 65000
no bgp default ipv4-unicast
distance bgp 20 200 200
neighbor EVPN peer group
neighbor EVPN remote-as 65000
neighbor EVPN update-source Loopback0
neighbor EVPN send-community
neighbor 10.0.0.1 peer group EVPN
neighbor 10.0.0.2 peer group EVPN
!
vlan 100
rd 10.0.0.32:100
route-target both 65000:100
redistribute learned
!
vlan 200
rd 10.0.0.32:200
route-target both 65000:200
redistribute learned
!
address-family evpn
neighbor EVPN activate
!
address-family ipv4
no neighbor EVPN activate
!
router ospf 1
redistribute connected
max-lsa 12000
!
end
service routing protocols model multi-agent
!
hostname LE04a
!
vlan 100
name "FW1;VRF=A"
!
vlan 200
name "FW1;VRF=B"
!
vlan 4094
name "MLAG"
trunk group PEER-LINK
!
vrf instance A
!
vrf instance B
!
interface Port-Channel3
description "PEER-LINK"
switchport mode trunk
switchport trunk group PEER-LINK
!
interface Port-Channel4
description "FW1"
switchport mode trunk
mlag 4
!
interface Ethernet1
description "SP01"
no switchport
ip address unnumbered Loopback0
ip ospf network point-to-point
ip ospf area 0.0.0.0
!
interface Ethernet2
description "SP02"
no switchport
ip address unnumbered Loopback0
ip ospf network point-to-point
ip ospf area 0.0.0.0
!
interface Ethernet3
description "LE04b"
channel-group 3 mode active
!
interface Ethernet4
description "FW1b"
channel-group 4 mode active
!
interface Loopback0
description "OSPF/BGP ROUTER-ID"
ip address 10.0.0.41/32
ip ospf area 0.0.0.0
!
interface Loopback1
description "VXLAN SOURCE-INTERFACE"
ip address 10.0.0.4/32
ip ospf area 0.0.0.0
!
interface Vlan100
description "FW1;VRF=A"
vrf A
arp aging timeout 290
ipv6 address 2001:db8:dc01:a00::5/64
!
interface Vlan200
description "FW1;VRF=B"
vrf B
arp aging timeout 290
ipv6 address 2001:db8:dc01:b00::5/64
!
interface Vlan4094
no autostate
ip address 10.0.4.1/30
ip ospf network point-to-point
ip ospf area 0.0.0.0
!
interface Vxlan1
vxlan source-interface Loopback1
vxlan udp-port 4789
vxlan vlan 100 vni 100
vxlan vlan 200 vni 200
vxlan vrf A vni 5001
vxlan vrf B vni 5002
!
ip virtual-router mac-address 00:11:22:33:44:55
!
ip routing
!
ipv6 unicast-routing
ipv6 unicast-routing vrf A
ipv6 unicast-routing vrf B
!
mlag configuration
domain-id LE04
local-interface Vlan4094
peer-address 10.0.4.2
peer-link Port-Channel3
reload-delay mlag 60
reload-delay non-mlag 30
!
router bgp 65000
distance bgp 20 200 200
neighbor EVPN peer group
neighbor EVPN remote-as 65000
neighbor EVPN update-source Loopback0
neighbor EVPN send-community
neighbor 10.0.0.1 peer group EVPN
neighbor 10.0.0.2 peer group EVPN
!
vlan 100
rd 10.0.0.41:100
route-target both 65000:100
redistribute learned
!
vlan 200
rd 10.0.0.41:200
route-target both 65000:200
redistribute learned
!
address-family evpn
neighbor EVPN activate
!
address-family ipv4
no neighbor EVPN activate
!
router ospf 1
redistribute connected
max-lsa 12000
!
end
service routing protocols model multi-agent
!
hostname LE04b
!
vlan 100
name "FW1;VRF=A"
!
vlan 200
name "FW1;VRF=B"
!
vlan 4094
name "MLAG"
trunk group PEER-LINK
!
vrf instance A
!
vrf instance B
!
interface Port-Channel3
description "PEER-LINK"
switchport mode trunk
switchport trunk group PEER-LINK
!
interface Port-Channel4
description "FW1"
switchport mode trunk
mlag 4
!
interface Ethernet1
description "SP01"
no switchport
ip address unnumbered Loopback0
ip ospf network point-to-point
ip ospf area 0.0.0.0
!
interface Ethernet2
description "SP02"
no switchport
ip address unnumbered Loopback0
ip ospf network point-to-point
ip ospf area 0.0.0.0
!
interface Ethernet3
description "LE04a"
channel-group 3 mode active
!
interface Ethernet4
description "FW1b"
channel-group 4 mode active
!
interface Loopback0
description "OSPF/BGP ROUTER-ID"
ip address 10.0.0.42/32
ip ospf area 0.0.0.0
!
interface Loopback1
description "VXLAN SOURCE-INTERFACE"
ip address 10.0.0.4/32
ip ospf area 0.0.0.0
!
interface Vlan100
description "FW1;VRF=A"
vrf A
arp aging timeout 290
ipv6 address 2001:db8:dc01:a00::6/64
!
interface Vlan200
description "FW1;VRF=B"
vrf B
arp aging timeout 290
ipv6 address 2001:db8:dc01:b00::6/64
!
interface Vlan4094
no autostate
ip address 10.0.4.2/30
ip ospf network point-to-point
ip ospf area 0.0.0.0
!
interface Vxlan1
vxlan source-interface Loopback1
vxlan udp-port 4789
vxlan vlan 100 vni 100
vxlan vlan 200 vni 200
vxlan vrf A vni 5001
vxlan vrf B vni 5002
!
ip virtual-router mac-address 00:11:22:33:44:55
!
ip routing
ip routing vrf A
ip routing vrf B
!
ipv6 unicast-routing
ipv6 unicast-routing vrf A
ipv6 unicast-routing vrf B
!
mlag configuration
domain-id LE04
local-interface Vlan4094
peer-address 10.0.4.1
peer-link Port-Channel3
reload-delay mlag 60
reload-delay non-mlag 30
!
router bgp 65000
distance bgp 20 200 200
neighbor EVPN peer group
neighbor EVPN remote-as 65000
neighbor EVPN update-source Loopback0
neighbor EVPN send-community
neighbor 10.0.0.1 peer group EVPN
neighbor 10.0.0.2 peer group EVPN
!
vlan 100
rd 10.0.0.42:100
route-target both 65000:100
redistribute learned
no redistribute host-route
!
vlan 200
rd 10.0.0.42:200
route-target both 65000:200
redistribute learned
no redistribute host-route
!
address-family evpn
neighbor EVPN activate
!
address-family ipv4
no neighbor EVPN activate
!
router ospf 1
redistribute connected
max-lsa 12000
!
end
service routing protocols model multi-agent
!
hostname LE05a
!
vlan 106
name SRV6
!
vlan 207
name SRV7
!
vlan 4094
name "MLAG"
trunk group PEER-LINK
!
vrf instance A
!
vrf instance B
!
interface Port-Channel3
description "PEER-LINK"
switchport mode trunk
switchport trunk group PEER-LINK
!
interface Ethernet1
description "SP01"
no switchport
ip address unnumbered Loopback0
ip ospf network point-to-point
ip ospf area 0.0.0.0
!
interface Ethernet2
description "SP02"
no switchport
ip address unnumbered Loopback0
ip ospf network point-to-point
ip ospf area 0.0.0.0
!
interface Ethernet3
description "LE05b"
channel-group 3 mode active
!
interface Ethernet6
description SRV6
switchport access vlan 106
!
interface Loopback0
description "OSPF/BGP ROUTER-ID"
ip address 10.0.0.51/32
ip ospf area 0.0.0.0
!
interface Loopback1
description "VXLAN SOURCE-INTERFACE"
ip address 10.0.0.5/32
ip ospf area 0.0.0.0
!
interface Vlan106
vrf A
arp aging timeout 290
ipv6 address virtual 2001:db8:dc01:a06::1/64
!
interface Vlan207
vrf B
arp aging timeout 290
ipv6 address virtual 2001:db8:dc01:b07::1/64
!
interface Vlan4094
no autostate
ip address 10.0.5.1/30
ip ospf network point-to-point
ip ospf area 0.0.0.0
!
interface Vxlan1
vxlan source-interface Loopback1
vxlan udp-port 4789
vxlan vlan 106 vni 106
vxlan vlan 207 vni 207
vxlan vrf A vni 5001
vxlan vrf B vni 5002
!
ip virtual-router mac-address 00:11:22:33:44:55
!
ip routing
ip routing vrf A
ip routing vrf B
!
ipv6 unicast-routing
ipv6 unicast-routing vrf A
ipv6 unicast-routing vrf B
!
mlag configuration
domain-id LE05
local-interface Vlan4094
peer-address 10.0.5.2
peer-link Port-Channel3
reload-delay mlag 60
reload-delay non-mlag 30
!
router bgp 65000
router-id 10.0.0.51
neighbor EVPN peer group
neighbor EVPN remote-as 65000
neighbor EVPN update-source Loopback0
neighbor EVPN send-community
neighbor 10.0.0.1 peer group EVPN
neighbor 10.0.0.2 peer group EVPN
!
vlan 106
rd 10.0.0.51:106
route-target both 65000:106
redistribute learned
!
vlan 207
rd 10.0.0.51:207
route-target both 65000:207
redistribute learned
!
address-family evpn
neighbor EVPN activate
!
address-family ipv4
no neighbor EVPN activate
!
vrf A
rd 10.0.0.51:5001
route-target import evpn 65000:5001
route-target export evpn 65000:5001
router-id 10.0.0.51
bgp default ipv6-unicast
redistribute connected
!
vrf B
rd 10.0.0.51:5002
route-target import evpn 65000:5002
route-target export evpn 65000:5002
router-id 10.0.0.51
bgp default ipv6-unicast
redistribute connected
!
router ospf 1
redistribute connected
max-lsa 12000
!
end
service routing protocols model multi-agent
!
hostname LE05b
!
vlan 106
name SRV6
!
vlan 207
name SRV7
!
vlan 4094
name "MLAG"
trunk group PEER-LINK
!
vrf instance A
!
vrf instance B
!
interface Port-Channel3
description "PEER-LINK"
switchport mode trunk
switchport trunk group PEER-LINK
!
interface Ethernet1
description "SP01"
no switchport
ip address unnumbered Loopback0
ip ospf network point-to-point
ip ospf area 0.0.0.0
!
interface Ethernet2
description "SP02"
no switchport
ip address unnumbered Loopback0
ip ospf network point-to-point
ip ospf area 0.0.0.0
!
interface Ethernet3
description "LE05a"
channel-group 3 mode active
!
interface Ethernet7
description SRV7
switchport access vlan 207
!
interface Loopback0
description "OSPF/BGP ROUTER-ID"
ip address 10.0.0.52/32
ip ospf area 0.0.0.0
!
interface Loopback1
description "VXLAN SOURCE-INTERFACE"
ip address 10.0.0.5/32
ip ospf area 0.0.0.0
!
interface Vlan106
vrf A
arp aging timeout 290
ipv6 address virtual 2001:db8:dc01:a06::1/64
!
interface Vlan207
vrf B
arp aging timeout 290
ipv6 address virtual 2001:db8:dc01:b07::1/64
!
interface Vlan4094
no autostate
ip address 10.0.5.2/30
ip ospf network point-to-point
ip ospf area 0.0.0.0
!
interface Vxlan1
vxlan source-interface Loopback1
vxlan udp-port 4789
vxlan vlan 106 vni 106
vxlan vlan 207 vni 207
vxlan vrf A vni 5001
vxlan vrf B vni 5002
!
ip virtual-router mac-address 00:11:22:33:44:55
!
ip routing
ip routing vrf A
ip routing vrf B
!
ipv6 unicast-routing
ipv6 unicast-routing vrf A
ipv6 unicast-routing vrf B
!
mlag configuration
domain-id LE05
local-interface Vlan4094
peer-address 10.0.5.1
peer-link Port-Channel3
reload-delay mlag 60
reload-delay non-mlag 30
!
router bgp 65000
router-id 10.0.0.52
neighbor EVPN peer group
neighbor EVPN remote-as 65000
neighbor EVPN update-source Loopback0
neighbor EVPN send-community
neighbor 10.0.0.1 peer group EVPN
neighbor 10.0.0.2 peer group EVPN
!
vlan 106
rd 10.0.0.52:106
route-target both 65000:106
redistribute learned
!
vlan 207
rd 10.0.0.52:207
route-target both 65000:207
redistribute learned
!
address-family evpn
neighbor EVPN activate
!
address-family ipv4
no neighbor EVPN activate
!
vrf A
rd 10.0.0.52:5001
route-target import evpn 65000:5001
route-target export evpn 65000:5001
router-id 10.0.0.52
bgp default ipv6-unicast
redistribute connected
!
vrf B
rd 10.0.0.52:5002
route-target import evpn 65000:5002
route-target export evpn 65000:5002
router-id 10.0.0.52
bgp default ipv6-unicast
redistribute connected
!
router ospf 1
redistribute connected
max-lsa 12000
!
end
config system global
set hostname "FW1a"
end
config system ha
set group-id 1
set group-name "FW1"
set mode a-p
set hbdev "port3" 0
set session-pickup enable
set session-pickup-delay enable
set override enable
set priority 200
end
config system interface
edit "port1"
set mode static
next
edit "LEAF"
set vdom "root"
set type aggregate
set member "port1" "port2"
set lldp-transmission enable
set snmp-index 9
next
edit "LEAF;VRF=A"
set vdom "root"
set device-identification enable
set role lan
set snmp-index 10
config ipv6
set ip6-address 2001:db8:dc01:a00::1/64
set ip6-allowaccess ping
end
set interface "LEAF"
set vlanid 100
next
edit "LEAF;VRF=B"
set vdom "root"
set device-identification enable
set role lan
set snmp-index 12
config ipv6
set ip6-address 2001:db8:dc01:b00::1/64
set ip6-allowaccess ping
end
set interface "LEAF"
set vlanid 200
next
end
config firewall policy
edit 1
set srcintf "any"
set dstintf "any"
set action accept
set srcaddr "all"
set dstaddr "all"
set srcaddr6 "all"
set dstaddr6 "all"
set schedule "always"
set service "ALL"
next
end
config system global
set hostname "FW1b"
end
config system ha
set group-id 1
set group-name "FW1"
set mode a-p
set hbdev "port3" 0
set session-pickup enable
set session-pickup-delay enable
set override enable
set priority 150
end
We decided that we wanted to use a routing protocol between FW1 and the leaves in each vrf to exchange routes. We chose to use eBGP with FW1 residing in AS 65001 and the leaves in AS 65000. The FW should advertise a default route into each vrf; the leaves should advertise directly-connected subnets.
This leads us to our first interesting design choice. The initial plan was for each FW member to establish adjacencies only to the leaf-pair it was physically connected to: FW1a would peer with LE03a/b and FW1b would peer with LE04a/b. One benefit of this plan would be that only the leaf-pair connected to the active FW member would receive the default route, ensuring that all traffic leaving the vrf would always hit LE03 when FW1a was active and LE04 when FW1b was active. This plan does not work for multiple reasons. The largest problem is that the configuration is synchronized between FW1a/b, so whichever adjacencies you configure on one member will also exist on the other.
The only viable solution here is to configure FW1a/b to peer with both LE03a/b and LE04a/b, totaling four adjacencies per vrf. This is what that configuration looks like:
FW1a # show router bgp
config router prefix-list
edit "DEFAULT_ROUTE"
config rule
edit 1
set prefix 0.0.0.0 0.0.0.0
next
end
next
end
config router prefix-list6
edit "DEFAULT_ROUTE"
config rule
edit 1
set prefix6 ::/0
unset ge
unset le
next
end
next
end
config router route-map
edit "RM_LEAF_OUT"
config rule
edit 4
set match-ip-address "DEFAULT_ROUTE"
next
edit 6
set match-ip6-address "DEFAULT_ROUTE"
next
end
next
end
config router bgp
set as 65001
config neighbor
edit "2001:db8:dc01:a00::3"
set advertisement-interval 0
set description "LE03a;VRF=A"
set remote-as 65000
set route-map-out6 "RM_LEAF_OUT"
next
edit "2001:db8:dc01:a00::4"
set advertisement-interval 0
set description "LE03b;VRF=A"
set remote-as 65000
set route-map-out6 "RM_LEAF_OUT"
next
edit "2001:db8:dc01:a00::5"
set advertisement-interval 0
set description "LE04a;VRF=A"
set remote-as 65000
set route-map-out6 "RM_LEAF_OUT"
next
edit "2001:db8:dc01:a00::6"
set advertisement-interval 0
set description "LE04b;VRF=A"
set remote-as 65000
set route-map-out6 "RM_LEAF_OUT"
next
edit "2001:db8:dc01:b00::3"
set advertisement-interval 0
set description "LE03a;VRF=B"
set remote-as 65000
set route-map-out6 "RM_LEAF_OUT"
next
edit "2001:db8:dc01:b00::4"
set advertisement-interval 0
set description "LE03b;VRF=B"
set remote-as 65000
set route-map-out6 "RM_LEAF_OUT"
next
edit "2001:db8:dc01:b00::5"
set advertisement-interval 0
set description "LE04a;VRF=B"
set remote-as 65000
set route-map-out6 "RM_LEAF_OUT"
next
edit "2001:db8:dc01:b00::6"
set advertisement-interval 0
set description "LE04b;VRF=B"
set remote-as 65000
set route-map-out6 "RM_LEAF_OUT"
next
end
config redistribute6 "static"
set status enable
end
end
# Default route to be advertised into each VRF.
config router static6
edit 1
set blackhole enable
next
end
interface Vlan100
description "FW1;VRF=A"
vrf A
ipv6 address 2001:db8:dc01:a00::3/64
!
router bgp 65000
!
vlan 100
rd 10.0.0.31:100
route-target both 65000:100
redistribute learned
!
vlan 200
rd 10.0.0.31:200
route-target both 65000:200
redistribute learned
!
vrf A
rd 10.0.0.31:5001
route-target import evpn 65000:5001
route-target export evpn 65000:5001
router-id 10.0.0.31
bgp default ipv6-unicast
neighbor 2001:db8:dc01:a00::1 remote-as 65001
neighbor 2001:db8:dc01:a00::1 description FW1;VRF=A
neighbor 2001:db8:dc01:a00::1 timers 10 30
redistribute connected
!
vrf B
rd 10.0.0.31:5002
route-target import evpn 65000:5002
route-target export evpn 65000:5002
router-id 10.0.0.31
bgp default ipv6-unicast
neighbor 2001:db8:dc01:b00::1 remote-as 65001
neighbor 2001:db8:dc01:b00::1 description FW1;VRF=B
neighbor 2001:db8:dc01:b00::1 timers 10 30
redistribute connected
interface Vlan100
description "FW1;VRF=A"
vrf A
ipv6 address 2001:db8:dc01:a00::6/64
!
router bgp 65000
!
vlan 100
rd 10.0.0.42:100
route-target both 65000:100
redistribute learned
!
vlan 200
rd 10.0.0.42:200
route-target both 65000:200
redistribute learned
!
vrf A
rd 10.0.0.42:5001
route-target import evpn 65000:5001
route-target export evpn 65000:5001
router-id 10.0.0.42
bgp default ipv6-unicast
neighbor 2001:db8:dc01:a00::1 remote-as 65001
neighbor 2001:db8:dc01:a00::1 description FW1;VRF=A
neighbor 2001:db8:dc01:a00::1 timers 10 30
redistribute connected
!
vrf B
rd 10.0.0.42:5002
route-target import evpn 65000:5002
route-target export evpn 65000:5002
router-id 10.0.0.42
bgp default ipv6-unicast
neighbor 2001:db8:dc01:b00::1 remote-as 65001
neighbor 2001:db8:dc01:b00::1 description FW1;VRF=B
neighbor 2001:db8:dc01:b00::1 timers 10 30
redistribute connected
Alright, that was relatively easy. Or was it..?
When deploying the configuration above, we noticed that when FW1a was the active member, it would only establish BGP adjacencies to its physically connected neighbors LE03a/b. The adjacencies to LE04a/b would not come up. If we failed the cluster over to FW1b, it would establish adjacencies to LE04a/b, but not LE03a/b. ICMPv6 worked just fine; the FortiGate could reach all leaves in the subnet. The NDP table showed the correct MAC addresses for all neighbors, also as expected. But BGP just wouldn't establish.
Finding the answer required some information gathering and digging through packet captures. Let's analyze the output below:
LE04b(vrf:A)#show bgp evpn route-type mac-ip vni 100 detail
"L2VNI"
BGP routing table entry for mac-ip 0009.0f09.0100
RD: 10.0.0.31:100
Paths: 1 available
Local
10.0.0.3 from 10.0.0.1 (10.0.0.1)
Origin IGP, metric -, localpref 100, weight 0, tag 0, valid, internal, best
Originator: 10.0.0.31, Cluster list: 10.0.0.1
Extended Community:
Route-Target-AS:65000:100
TunnelEncap:tunnelTypeVxlan
VNI: 100
"L2VNI + L3VNI"
BGP routing table entry for mac-ip 0009.0f09.0100 2001:db8:dc01:a00::1
RD: 10.0.0.32:100
Paths: 1 available
Local
10.0.0.3 from 10.0.0.1 (10.0.0.1)
Origin IGP, metric -, localpref 100, weight 0, tag 0, valid, internal, best
Originator: 10.0.0.32, Cluster list: 10.0.0.1
Extended Community:
Route-Target-AS:65000:100
Route-Target-AS:65000:5001
TunnelEncap:tunnelTypeVxlan
EvpnRouterMac:5001.0000.0032
VNI: 100
L3 VNI: 5001
The above output shows the EVPN routes learned for VNI 100. We have received two routes: one containing only L2VNI information and the other containing both L2VNI and L3VNI. We can see this because the second entry tells us the IPv6 address in addition to the MAC address. Each route generates its own set of entries in its respective tables. For example, the L2VNI-only route generates these MAC/VXLAN address table entries:
"L2VNI"
LE04b(vrf:A)#show mac address-table vlan 100
Vlan Mac Address Type Ports Moves Last Move
---- ----------- ---- ----- ----- ---------
100 0009.0f09.0100 DYNAMIC Vx1 1 1:20:08 ago
LE04b(vrf:A)#show vxlan address-table vlan 100
VLAN Mac Address Type Prt VTEP Moves Last Move
---- ----------- ---- --- ---- ----- ---------
100 0009.0f09.0100 EVPN Vx1 10.0.0.3 1 1:20:08 ago
The L2VNI-only EVPN route generated the MAC address table entry above. It also created an entry in the VXLAN address table, telling VXLAN which VTEP the MAC address lives behind.
Moving on to the L2VNI+L3VNI route, this route generated the following ND entry and host route:
"L2VNI + L3VNI"
LE04b(vrf:A)#show ipv6 neighbors 2001:db8:dc01:a00::1
IPv6 Address Age Hardware Addr Interface
2001:db8:dc01:a00::1 - 0009.0f09.0100 Vl100, Vxlan1
LE04b(vrf:A)#show ipv6 route 2001:db8:dc01:a00::1
B I 2001:db8:dc01:a00::1/128 [200/0]
via VTEP 10.0.0.3 VNI 5001 router-mac 5001.0000.0032
C 2001:db8:dc01:a00::/64 [0/0]
via Vlan100, directly connected
The L2VNI+L3VNI route gave our leaf enough information to generate an entry in the IPv6 neighbor table as well as install a /128 BGP host route pointing at a remote VXLAN VTEP.
I couldn't find anything conclusive based on the above output, so I resorted to performing a packet capture. Let's analyze the packets captured on FW1:
"Request"
Ethernet II, Src: 0009.0f09.0100, Dst: 5001.0000.0042
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 100
Internet Protocol Version 6, Src: 2001:db8:dc01:a00::1, Dst: 2001:db8:dc01:a00::6
Internet Control Message Protocol v6
Type: Echo request
"Reply"
Ethernet II, Src: 5001.0000.0032, Dst: 0009.0f09.0100
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 100
Internet Protocol Version 6, Src: 2001:db8:dc01:a00::6, Dst: 2001:db8:dc01:a00::1
Internet Control Message Protocol v6
Type: Echo reply
The above capture shows an ICMPv6 request/reply from FW1 to LE04b. In the request we can see that the destination MAC address is set to 5001.0000.0042, the MAC address of LE04b. However, the reply was sent from 5001.0000.0032. That MAC address belongs to LE03b. That's weird.
Let's continue and look at the packets captured between SP01 and LE04b:
"Request"
Ethernet II, Src: 5001.0000.0001, Dst: 5001.0000.0042
Internet Protocol Version 4, Src: 10.0.0.3, Dst: 10.0.0.4
User Datagram Protocol, Src Port: 41830, Dst Port: 4789
Virtual eXtensible Local Area Network
VXLAN Network Identifier (VNI): 100
Ethernet II, Src: 0009.0f09.0100, Dst: 5001.0000.0042
Internet Protocol Version 6, Src: 2001:db8:dc01:a00::1, Dst: 2001:db8:dc01:a00::6
Internet Control Message Protocol v6
Type: Echo (ping) request (128)
"Reply"
Ethernet II, Src: 5001.0000.0042, Dst: 5001.0000.0001
Internet Protocol Version 4, Src: 10.0.0.4, Dst: 10.0.0.3
User Datagram Protocol, Src Port: 53928, Dst Port: 4789
Virtual eXtensible Local Area Network
VXLAN Network Identifier (VNI): 5001
Ethernet II, Src: 5001.0000.0042, Dst: 5001.0000.0032
Internet Protocol Version 6, Src: 2001:db8:dc01:a00::6, Dst: 2001:db8:dc01:a00::1
Internet Control Message Protocol v6
Type: Echo (ping) reply (129)
MAC-address 5001.0000.0001 belongs to SP01
This time the ICMPv6 packets are VXLAN-encapsulated. The VXLAN packet was sent from LE03 (10.0.0.3) to LE04 (10.0.0.4) on L2VNI 100. The inner frame has FW1 as the source MAC address and LE04b (5001.0000.0042) as the destination. So far so good.
When looking at the reply from LE04b back to FW1, we see a few differences. One difference is that LE04b decided to use the L3VNI 5001. We can also see that the inner packet destination MAC-address is set to LE03b (5001.0000.0032) instead of FW1.
In case the output above was difficult to parse, I took the liberty of creating two diagrams to help visualize what's happening. I also simplified the MAC-addresses to instead show hostnames. Let's see if we can spot what's going wrong:
ICMP packet walk from FW1 to LE04b
The above diagram pieces together the packet captures we took on the FW1-LE03b and SP01-LE04b links, showing the ICMP request as it travels from FW1 to LE04b. Everything looks correct, so let us review the opposite direction:
ICMP reply from LE04b back to FW1
As we can see in the diagram above, there are lots of things going wrong. One example is that the inner frame generated by LE04b has LE03b as its destination MAC address, not FW1. But this is just a symptom. The root cause is that LE04b is using L3VNI 5001 to get the packet to FW1, not L2VNI 100.
The host route on LE04b tells us everything we need to know! As we learned in the L3VPN article, forwarding over an L3VNI requires some gymnastics to be performed by the sending and receiving leaf:
LE04b(vrf:A)#show ipv6 route 2001:db8:dc01:a00::1
B I 2001:db8:dc01:a00::1/128 [200/0]
via VTEP 10.0.0.3 VNI 5001 router-mac 5001.0000.0032 (LE03b)
According to the host-route installed on LE04b, for a packet to reach FW1, LE04b must VXLAN-encapsulate the packet with VNI 5001 and set LE03b as the destination MAC address. LE04b is a good boy and does as requested, causing all sorts of problems.
LE03b receives the VXLAN packet and sees that VNI 5001 maps to vrf A. When the inner Ethernet frame is processed, LE03b realizes that the destination MAC address is its own, so it knows it is the intended destination for the frame. At this point the Ethernet frame has done its job and its header is discarded.
Now LE03b examines the IP header. It sees that the destination IP 2001:db8:dc01:a00::1 (FW1) is not a local IP-address, so LE03b is not the intended destination for the IP packet. A routing lookup is performed to learn that the destination is directly connected via the Vlan100 interface.
LE03b generates a new Ethernet header, setting itself (5001.0000.0032) as the source and FW1 (0009.0f09.0100) as the destination. The Ethernet header is prepended to the IP packet and the frame is sent to FW1.
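The decision LE04b makes can be sketched in a few lines of Python. This is a simplified model of my own, not how EOS implements forwarding: the table contents are copied from the show outputs earlier in the article, while the Route class and the function names are hypothetical. It shows that the /128 host route wins the longest-prefix match over the connected /64, which is what pushes the reply onto L3VNI 5001 with LE03b's router MAC as the inner destination.

```python
import ipaddress
from dataclasses import dataclass

FW1_IP = "2001:db8:dc01:a00::1"
FW1_MAC = "0009.0f09.0100"


@dataclass(frozen=True)
class Route:
    prefix: ipaddress.IPv6Network
    source: str           # "connected" or "evpn"
    vtep: str = ""
    vni: int = 0
    router_mac: str = ""


# Values copied from "show ipv6 route" on LE04b (vrf A).
CONNECTED = Route(ipaddress.ip_network("2001:db8:dc01:a00::/64"), "connected")
HOST_ROUTE = Route(ipaddress.ip_network("2001:db8:dc01:a00::1/128"), "evpn",
                   vtep="10.0.0.3", vni=5001, router_mac="5001.0000.0032")

# Values copied from the neighbor and VXLAN address tables on LE04b.
NEIGHBORS = {FW1_IP: FW1_MAC}
L2_TABLE = {FW1_MAC: ("10.0.0.3", 100)}  # MAC -> (VTEP, L2VNI)


def longest_match(routes, dst):
    """Pick the most specific route covering dst."""
    addr = ipaddress.ip_address(dst)
    matches = [r for r in routes if addr in r.prefix]
    return max(matches, key=lambda r: r.prefix.prefixlen)


def encapsulate(routes, dst):
    """Return the VXLAN parameters the leaf would use to reach dst."""
    route = longest_match(routes, dst)
    if route.source == "evpn":
        # Routed over the L3VNI: the inner destination is the remote router MAC.
        return {"vtep": route.vtep, "vni": route.vni,
                "inner_dst_mac": route.router_mac}
    # Directly connected: bridge in the vlan using the host's own MAC.
    mac = NEIGHBORS[dst]
    vtep, vni = L2_TABLE[mac]
    return {"vtep": vtep, "vni": vni, "inner_dst_mac": mac}


with_host_route = encapsulate([CONNECTED, HOST_ROUTE], FW1_IP)
without_host_route = encapsulate([CONNECTED], FW1_IP)
print(with_host_route)
print(without_host_route)
```

With the host route present, the model picks VNI 5001 and inner destination MAC 5001.0000.0032, matching the capture. Without it, the reply stays on L2VNI 100 and is addressed directly to FW1.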
I had to spend a lot of time thinking about how to solve this. My initial plan was to use the no redistribute host-route command on the FW1 vlans. This would stop the L2VNI+L3VNI EVPN route from being advertised into the VRF, which in turn would stop the other leaves from generating that pesky FW1 /128 host-route:
router bgp 65000
!
vlan 100
no redistribute host-route
!
vlan 200
no redistribute host-route
However, not advertising the host-route creates a problem on LE05, as this leaf-pair relies on the host-route to figure out which leaf-pair FW1 is currently connected to. Without the host-route, LE05 only knows how to reach 2001:db8:dc01:a00::/64. This route is advertised by both LE03 and LE04, so traffic would be load-balanced between the two leaf-pairs. This would cause suboptimal routing: whenever FW1a was active, traffic along the LE05-SP01-LE04 path would have to be VXLAN-forwarded back via LE04-SP01-LE03 before it could be sent to FW1.
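To make the effect on LE05 concrete, here is a minimal longest-prefix-match sketch in Python. It is a simplified model, not how the switch implements its FIB: the `lookup` function and the dictionary-based routing table are hypothetical, and the next hops are written as leaf-pair names instead of VTEP addresses.

```python
import ipaddress

def lookup(rib, dst):
    """Return the next hops of the most specific route covering dst."""
    addr = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(prefix), next_hops)
        for prefix, next_hops in rib.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    # Longest prefix wins; several next hops on that route means ECMP.
    _, next_hops = max(matches, key=lambda m: m[0].prefixlen)
    return sorted(next_hops)

fw1 = "2001:db8:dc01:a00::1"

# With the host-route: LE05 sends traffic straight to the leaf-pair holding FW1.
rib_with_host_route = {
    "2001:db8:dc01:a00::/64": {"LE03", "LE04"},
    "2001:db8:dc01:a00::1/128": {"LE03"},
}
# Without it: only the /64 remains, so traffic is shared across both leaf-pairs.
rib_without_host_route = {
    "2001:db8:dc01:a00::/64": {"LE03", "LE04"},
}

print(lookup(rib_with_host_route, fw1))     # ['LE03']
print(lookup(rib_without_host_route, fw1))  # ['LE03', 'LE04']
```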
We need a way to selectively block the L3VNI routes on LE03 and LE04 while still advertising the host-routes to LE05.
I ended up creating an inbound EVPN route-map on LE03 and LE04 to reject any route containing both the 65000:100 and 65000:5001 extended communities. Let's look at the config below:
ip extcommunity-list ECL_FW1_L2VNI permit 65000:100
ip extcommunity-list ECL_FW1_L2VNI permit 65000:200
ip extcommunity-list ECL_FW1_L3VNI permit 65000:5001
ip extcommunity-list ECL_FW1_L3VNI permit 65000:5002
route-map "RM_EVPN_IN" deny 10
match extcommunity ECL_FW1_L2VNI
sub-route-map "RM_MATCH_FW1_L3VNI"
!
route-map "RM_EVPN_IN" permit 100
route-map "RM_MATCH_FW1_L3VNI" permit 10
match extcommunity ECL_FW1_L3VNI
!
route-map "RM_MATCH_FW1_L3VNI" deny 100
router bgp 65000
neighbor EVPN route-map "RM_EVPN_IN" in
To make this work I had to combine two route-maps, as it was the only way I found to match only when two communities are present on the same route. Let's start with RM_EVPN_IN, which is applied inbound on our EVPN BGP neighbors SP01 and SP02.
Whenever a route is received from these neighbors, the entry RM_EVPN_IN deny 10 is checked first. This entry only matches if the route contains extended community 65000:100 or 65000:200, as configured in ECL_FW1_L2VNI. If there is a match, the sub-route-map command calls RM_MATCH_FW1_L3VNI, which checks whether the route also contains extended community 65000:5001 or 65000:5002. If both route-maps find a match, the route is rejected and no ::/128 host-route is installed. Every other route falls through to RM_EVPN_IN permit 100 and is accepted.
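The combined behavior of the two route-maps boils down to a logical AND, which can be modeled in a few lines of Python. This is only a sketch of the decision logic as described above, not of how EOS evaluates policy internally; the function names simply mirror the route-map names, and a route is represented as a plain set of its extended communities.

```python
L2VNI_RTS = {"65000:100", "65000:200"}    # ECL_FW1_L2VNI
L3VNI_RTS = {"65000:5001", "65000:5002"}  # ECL_FW1_L3VNI

def rm_match_fw1_l3vni(route_rts):
    """permit 10 matches any L3VNI community; deny 100 rejects the rest."""
    return bool(route_rts & L3VNI_RTS)

def rm_evpn_in(route_rts):
    """Return True if the route is accepted, False if it is rejected."""
    # deny 10: needs an L2VNI community AND a match in the sub-route-map
    if route_rts & L2VNI_RTS and rm_match_fw1_l3vni(route_rts):
        return False
    # permit 100: everything else is accepted
    return True

# MAC+IP route for FW1 carrying both communities: rejected
print(rm_evpn_in({"65000:100", "65000:5001"}))  # False
# MAC-only route carrying just the L2VNI community: accepted
print(rm_evpn_in({"65000:100"}))                # True
# Route carrying only the L3VNI community: accepted
print(rm_evpn_in({"65000:5001"}))               # True
```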
This solution stops LE03 and LE04 from installing the FW1 host-route while still allowing LE05 to install it and avoid suboptimal routing. We can see the routes from LE03 being rejected on LE04b below:
LE04b#show bgp evpn route-type mac-ip
Network Next Hop
* > RD: 10.0.0.31:100 mac-ip 0009.0f09.010
10.0.0.3
* > RD: 10.0.0.31:200 mac-ip 0009.0f09.010
10.0.0.3
* > RD: 10.0.0.32:100 mac-ip 0009.0f09.010
10.0.0.3
RD: 10.0.0.31:100 mac-ip 0009.0f09.0100 2001:db8:dc01:a00::
PolicyReject
RD: 10.0.0.32:100 mac-ip 0009.0f09.0100 2001:db8:dc01:a00::
PolicyReject
RD: 10.0.0.31:200 mac-ip 0009.0f09.0100 2001:db8:dc01:b00::
PolicyReject
Note: I must confess that I do not know if there is a better way to achieve this. The EVPN route filtering options are limited, at least on the Arista Lab platform that I'm using. I get the feeling that the design I'm going for is not something you're supposed to do, that I'm doing something wrong. Anyway, whatever.
We now have a fully functioning topology where FW1 can establish BGP adjacencies to the LE03 and LE04 leaves. LE05 receives the host-route, allowing it to forward traffic to the leaf closest to FW1. Everything is great, but then a FW1 HA failover is triggered...
You are doing a great job keeping up thus far. We're getting close to the end but we have one more big problem to solve.
What happens to the routing when the FW1 cluster performs a failover? If I perform a redundancy test and kill FW1a while pinging from SRV6 to SRV7, I can see that traffic continues to flow for about 10 seconds. After that point no traffic is getting through. Then, after around 180 seconds, traffic starts flowing again. This is of course unacceptable in this day and age; an HA failover should be seamless.
To understand this problem we need to explain a few things. The first thing we need to know is what state is synchronized between the two FW1 members:
TCP sessions flowing through the cluster are synchronized thanks to the set session-pickup enable command under config system ha. This allows FW1b to seamlessly continue traffic inspection when it becomes the active member. As the Fortigate is a stateful device, FW1b would not allow traffic for an established TCP session if there was no matching entry in its session table. So synchronizing the session table is vital for a failover to not impact active sessions flowing through it.
The kernel routing table on the active member is also synchronized. This table is the FIB and is used to make routing decisions for incoming packets. Synchronizing this table ensures that FW1b can continue forwarding packets after a failover, before it has had a chance to establish any routing protocol adjacencies. The default route-TTL is 10 seconds.
When this timer expires, the synchronized kernel routes are removed. This essentially gives FW1b a 10 second window to establish new routing protocol adjacencies, learn new routes and replace the ones learned from FW1a.
BGP sessions are not synchronized to the passive member. This means that any established BGP session has to be reestablished on failover. The reason this is a problem for us is that, by default, BGP will not accept more than one active session per adjacency. If LE03a has an established adjacency to FW1 and it suddenly receives a new BGP OPEN from the same neighbor, LE03a will ignore it. Not until the current BGP session has timed out will LE03a allow a new BGP session to establish.
Based on this information we can figure out what's happening here. In the first ten seconds after failover, the new primary firewall can keep forwarding traffic based on the synchronized kernel routes. During this time the firewall attempts to establish the BGP adjacencies it finds in its configuration, but receives no response from the leaves. After ten seconds, these routes time out and are removed from the kernel routing table. As the BGP adjacencies on the leaves are still active, they won't allow the firewall to establish a new session until after 180 seconds, when their current BGP session finally times out.
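The timeline can be put into numbers with a tiny Python sketch. It is a deliberately simplified model with loudly stated assumptions: the failover happens at t=0, the leaf's hold timer starts counting at roughly the same moment, and the new BGP session comes up as soon as the old one has expired. The function name is hypothetical.

```python
def blackhole_window(route_ttl=10, hold_time=180):
    """Return (start, end) in seconds after failover during which the new
    primary has no routes, or None if there is no such gap.

    Assumes failover at t=0 and a new BGP session right after the leaf's
    hold timer expires. Defaults are the values quoted in the article.
    """
    if route_ttl >= hold_time:
        # Synchronized kernel routes outlive the stale BGP session.
        return None
    # Routes vanish at route_ttl; new routes can only arrive after hold_time.
    return (route_ttl, hold_time)

print(blackhole_window())  # (10, 180)
```

With the defaults this matches the observation from the ping test: traffic flows for about 10 seconds, then stops until roughly 180 seconds after the failover.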
One way to solve this could be to make sure the route-TTL is set to 200 seconds so that BGP with its 180 second default hold timer has a chance to establish before the cluster-learned kernel routes expire on FW1b. Alternatively, we could lower the BGP hold time to 9 seconds or less to accomplish the same goal.
However, neither approach fully works. The reason is that when the BGP session times out, LE03a will withdraw the default route it learned from FW1a. This route withdrawal is enough to create a small window in time where no default route exists inside the VRF. Traffic from any server that can't be routed by any of the leaves due to missing routes will yield an ICMP Host unreachable message back to that server. The server OS will then terminate that TCP session.
Any subsequent packet that was received for that TCP session will be dropped as there's no matching connection. If that was a long-running database synchronization process, the application now has to handle the fact that the TCP session has to reestablish. Some applications handle this better than others. So even if the route is gone for only a second, the impact may be great.
To the rescue comes BGP Graceful Restart. This is a feature that allows an established BGP adjacency to gracefully restart. By configuring set graceful-restart enable on FW1 and utilizing the enabled-by-default graceful-restart-helper command on the Arista leaves, LE03a and its leaf buddies will allow FW1b to establish a BGP adjacency before the stale adjacency to FW1a times out. As the same session is restarted, there is no withdrawal and re-advertisement of the default route, avoiding the ICMP Host unreachable problem.
config router bgp
set graceful-restart enable
end
*Graceful restart capability is automatically advertised to all BGP neighbors.*
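To illustrate why Graceful Restart avoids the route withdrawal, here is a minimal Python model of the helper side (the leaf). It is a sketch of the general idea only, where routes from a restarting peer are kept as stale instead of being removed. The class and method names are hypothetical and timers such as the restart time are left out.

```python
class GracefulRestartHelper:
    """Toy model of how a leaf treats routes learned from one BGP peer."""

    def __init__(self, gr_negotiated):
        self.gr_negotiated = gr_negotiated
        self.routes = {}  # prefix -> "active" or "stale"

    def learn(self, prefix):
        self.routes[prefix] = "active"

    def session_lost(self):
        if self.gr_negotiated:
            # Keep forwarding on the old routes, but mark them stale.
            for prefix in self.routes:
                self.routes[prefix] = "stale"
        else:
            # Without Graceful Restart the routes are withdrawn at once.
            self.routes.clear()

    def end_of_rib(self):
        # The peer has resent its routes; drop whatever is still stale.
        self.routes = {p: s for p, s in self.routes.items() if s == "active"}

    def can_forward(self, prefix):
        return prefix in self.routes

leaf = GracefulRestartHelper(gr_negotiated=True)
leaf.learn("::/0")
leaf.session_lost()              # FW1a dies, FW1b takes over
print(leaf.can_forward("::/0"))  # True: the default route never disappears
leaf.learn("::/0")               # FW1b re-advertises the default route
leaf.end_of_rib()
print(leaf.routes)               # {'::/0': 'active'}
```

Creating the same object with `gr_negotiated=False` shows the other case: after `session_lost()` the default route is gone until the new session delivers it again.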
Note: Applying this will hard restart all adjacencies. Also, this command unfortunately means that neither BGP neighbor-ranges nor neighbor-groups can be used on FW1. This is because the firewall has to initiate the adjacencies itself when it comes online. When using a neighbor-range, it can only accept BGP adjacencies from neighbors, not initiate its own sessions.
The BGP sessions usually come up within a couple of seconds after a failover, so there is generally little reason to increase the route-TTL, but you may do so if it is required in your environment. Graceful Restart should not be used together with BFD, as BFD would take down the old adjacency and cause a route withdrawal before a new adjacency even has a chance to establish.
The seemingly simple customer requirement of connecting firewall members to different leaf-pairs ended up sending me down a deep EVPN-shaped rabbit hole. From figuring out how L3VNIs take priority over L2VNIs to learning why Graceful Restart is a better option than BFD in this topology, I had a lot of new ground to cover.
I want to thank you for getting all the way to the end of this article. I hope you learned something and that reading it was worth your time.
If you want more to read, please consider other posts in my VXLAN series: