Firewall Cluster and EVPN

Published: 2024-06-09

This article covers three interesting problems I faced when deploying a Fortigate Firewall (FW) cluster in a VXLAN EVPN (MLAG) datacenter topology. The customer had requested that the two FW members be deployed in separate leaf-pairs for increased redundancy. The cluster was to provide stateful inter-VRF inspection. Let us quickly walk through the topology before we get to the problems:

Our two FW members FW1a and FW1b are deployed to different leaf-pairs: LE03 and LE04 respectively. While this decision makes perfect sense, it brought along a few interesting design decisions that we would not have had to consider had the two FW members been connected to the same leaf-pair.

I have also added two servers to the topology so that we can generate traffic to test the design, both connected to LE05. The servers live in vrfs A and B respectively, forcing traffic between them to traverse our FW. The diagram below displays the logical topology and traffic flow:

The IPv6 range 2001:db8:dc01::/48 was allocated to this data center site. From that range we allocate 2001:db8:dc01:a00::/56 to vrf A and 2001:db8:dc01:b00::/56 to vrf B. As FW1 should inspect traffic traveling between the two vrfs, it needs connectivity to both vrfs. We set up a linknet on vlan 100 for vrf A and a linknet on vlan 200 for vrf B. SRV6 was placed in vlan 106 and SRV7 in vlan 207.
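
As a quick sanity check of the addressing plan, here is a minimal Python sketch (not part of the deployment itself) that verifies each vrf range sits inside the site allocation and that each VLAN subnet sits inside its vrf. The /64 subnets are taken from the SVI addresses in the configuration further down; the names `check_plan`, `VRFS` and `VLANS` are only illustrative.

```python
import ipaddress

SITE = ipaddress.ip_network("2001:db8:dc01::/48")

# vrf name -> /56 allocated from the site range
VRFS = {
    "A": ipaddress.ip_network("2001:db8:dc01:a00::/56"),
    "B": ipaddress.ip_network("2001:db8:dc01:b00::/56"),
}

# vlan id -> (vrf name, /64 used on that vlan)
VLANS = {
    100: ("A", ipaddress.ip_network("2001:db8:dc01:a00::/64")),  # FW1 linknet
    200: ("B", ipaddress.ip_network("2001:db8:dc01:b00::/64")),  # FW1 linknet
    106: ("A", ipaddress.ip_network("2001:db8:dc01:a06::/64")),  # SRV6
    207: ("B", ipaddress.ip_network("2001:db8:dc01:b07::/64")),  # SRV7
}


def check_plan():
    """Return a list of human-readable problems; empty means the plan is consistent."""
    problems = []
    for name, net in VRFS.items():
        if not net.subnet_of(SITE):
            problems.append(f"vrf {name}: {net} is outside {SITE}")
    if VRFS["A"].overlaps(VRFS["B"]):
        problems.append("vrf A and vrf B overlap")
    for vlan, (vrf, net) in VLANS.items():
        if not net.subnet_of(VRFS[vrf]):
            problems.append(f"vlan {vlan}: {net} is outside vrf {vrf}")
    return problems


if __name__ == "__main__":
    print(check_plan() or "addressing plan is consistent")
```
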

We run OSPF in the underlay and iBGP for the EVPN overlay, using unnumbered Ethernet links. This setup reduces the configuration as interfaces Ethernet1 through Ethernet6 on the spines are identical. It also greatly reduces the number of IP-addresses required for the topology. You can see the full (initial) configuration below:

service routing protocols model multi-agent
!
hostname SP01
!
interface Ethernet1-6
   no switchport
   ip address unnumbered Loopback0
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Loopback0
   ip address 10.0.0.1/32
   ip ospf area 0.0.0.0
!
ip routing
!
router bgp 65000
   neighbor EVPN peer group
   neighbor EVPN remote-as 65000
   neighbor EVPN update-source Loopback0
   neighbor EVPN route-reflector-client
   neighbor EVPN timers 5 15
   neighbor EVPN send-community
   neighbor 10.0.0.31 peer group EVPN
   neighbor 10.0.0.32 peer group EVPN
   neighbor 10.0.0.41 peer group EVPN
   neighbor 10.0.0.42 peer group EVPN
   neighbor 10.0.0.51 peer group EVPN
   neighbor 10.0.0.52 peer group EVPN
   !
   address-family evpn
      neighbor EVPN activate
   !
   address-family ipv4
      no neighbor EVPN activate
!
router ospf 1
   redistribute connected
   max-lsa 12000
!
end

service routing protocols model multi-agent
!
hostname SP02
!
interface Ethernet1-6
   no switchport
   ip address unnumbered Loopback0
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Loopback0
   ip address 10.0.0.2/32
   ip ospf area 0.0.0.0
!
ip routing
!
router bgp 65000
   neighbor EVPN peer group
   neighbor EVPN remote-as 65000
   neighbor EVPN update-source Loopback0
   neighbor EVPN route-reflector-client
   neighbor EVPN timers 5 15
   neighbor EVPN send-community
   neighbor 10.0.0.31 peer group EVPN
   neighbor 10.0.0.32 peer group EVPN
   neighbor 10.0.0.41 peer group EVPN
   neighbor 10.0.0.42 peer group EVPN
   neighbor 10.0.0.51 peer group EVPN
   neighbor 10.0.0.52 peer group EVPN
   !
   address-family evpn
      neighbor EVPN activate
   !
   address-family ipv4
      no neighbor EVPN activate
!
router ospf 1
   redistribute connected
   max-lsa 12000
!
end

service routing protocols model multi-agent
!
hostname LE03a
!
vlan 100
   name "FW1;VRF=A"
!
vlan 200
   name "FW1;VRF=B"
!
vlan 4094
   name "MLAG"
   trunk group PEER-LINK
!
vrf instance A
!
vrf instance B
!
interface Port-Channel3
   description "PEER-LINK"
   switchport mode trunk
   switchport trunk group PEER-LINK
!
interface Port-Channel4
   description "FW1"
   switchport mode trunk
   mlag 4
!
interface Ethernet1
   description "SP01"
   no switchport
   ip address unnumbered Loopback0
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet2
   description "SP02"
   no switchport
   ip address unnumbered Loopback0
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet3
   description "LE03b"
   channel-group 3 mode active
!
interface Ethernet4
   description "FW1a"
   channel-group 4 mode active
!
interface Loopback0
   description "OSPF/BGP ROUTER-ID"
   ip address 10.0.0.31/32
   ip ospf area 0.0.0.0
!
interface Loopback1
   description "VXLAN SOURCE-INTERFACE"
   ip address 10.0.0.3/32
   ip ospf area 0.0.0.0
!
interface Vlan100
   description "FW1;VRF=A"
   vrf A
   arp aging timeout 290
   ipv6 address 2001:db8:dc01:a00::3/64
!
interface Vlan200
   description "FW1;VRF=B"
   vrf B
   arp aging timeout 290
   ipv6 address 2001:db8:dc01:b00::3/64
!
interface Vlan4094
   no autostate
   ip address 10.0.3.1/30
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Vxlan1
   vxlan source-interface Loopback1
   vxlan udp-port 4789
   vxlan vlan 100 vni 100
   vxlan vlan 200 vni 200
   vxlan vrf A vni 5001
   vxlan vrf B vni 5002
!
ip virtual-router mac-address 00:11:22:33:44:55
!
ip routing
!
ipv6 unicast-routing
ipv6 unicast-routing vrf A
ipv6 unicast-routing vrf B
!
mlag configuration
   domain-id LE03
   local-interface Vlan4094
   peer-address 10.0.3.2
   peer-link Port-Channel3
   reload-delay mlag 60
   reload-delay non-mlag 30
!
router bgp 65000
   no bgp default ipv4-unicast
   distance bgp 20 200 200
   neighbor EVPN peer group
   neighbor EVPN remote-as 65000
   neighbor EVPN update-source Loopback0
   neighbor EVPN send-community
   neighbor 10.0.0.1 peer group EVPN
   neighbor 10.0.0.2 peer group EVPN
   !
   vlan 100
      rd 10.0.0.31:100
      route-target both 65000:100
      redistribute learned
   !
   vlan 200
      rd 10.0.0.31:200
      route-target both 65000:200
      redistribute learned
   !
   address-family evpn
      neighbor EVPN activate
   !
   address-family ipv4
      no neighbor EVPN activate
!
router ospf 1
   redistribute connected
   max-lsa 12000
!
end

service routing protocols model multi-agent
!
hostname LE03b
!
vlan 100
   name "FW1;VRF=A"
!
vlan 200
   name "FW1;VRF=B"
!
vlan 4094
   name "MLAG"
   trunk group PEER-LINK
!
vrf instance A
!
vrf instance B
!
interface Port-Channel3
   description "PEER-LINK"
   switchport mode trunk
   switchport trunk group PEER-LINK
!
interface Port-Channel4
   description "FW1"
   switchport mode trunk
   mlag 4
!
interface Ethernet1
   description "SP01"
   no switchport
   ip address unnumbered Loopback0
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet2
   description "SP02"
   no switchport
   ip address unnumbered Loopback0
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet3
   description "LE03a"
   channel-group 3 mode active
!
interface Ethernet4
   description "FW1a"
   channel-group 4 mode active
!
interface Loopback0
   description "OSPF/BGP ROUTER-ID"
   ip address 10.0.0.32/32
   ip ospf area 0.0.0.0
!
interface Loopback1
   description "VXLAN SOURCE-INTERFACE"
   ip address 10.0.0.3/32
   ip ospf area 0.0.0.0
!
interface Vlan100
   description "FW1;VRF=A"
   vrf A
   arp aging timeout 290
   ipv6 address 2001:db8:dc01:a00::4/64
!
interface Vlan200
   description "FW1;VRF=B"
   vrf B
   arp aging timeout 290
   ipv6 address 2001:db8:dc01:b00::4/64
!
interface Vlan4094
   no autostate
   ip address 10.0.3.2/30
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Vxlan1
   vxlan source-interface Loopback1
   vxlan udp-port 4789
   vxlan vlan 100 vni 100
   vxlan vlan 200 vni 200
   vxlan vrf A vni 5001
   vxlan vrf B vni 5002
!
ip virtual-router mac-address 00:11:22:33:44:55
!
ip routing
!
ipv6 unicast-routing
ipv6 unicast-routing vrf A
ipv6 unicast-routing vrf B
!
mlag configuration
   domain-id LE03
   local-interface Vlan4094
   peer-address 10.0.3.1
   peer-link Port-Channel3
   reload-delay mlag 60
   reload-delay non-mlag 30
!
router bgp 65000
   no bgp default ipv4-unicast
   distance bgp 20 200 200
   neighbor EVPN peer group
   neighbor EVPN remote-as 65000
   neighbor EVPN update-source Loopback0
   neighbor EVPN send-community
   neighbor 10.0.0.1 peer group EVPN
   neighbor 10.0.0.2 peer group EVPN
   !
   vlan 100
      rd 10.0.0.32:100
      route-target both 65000:100
      redistribute learned
   !
   vlan 200
      rd 10.0.0.32:200
      route-target both 65000:200
      redistribute learned
   !
   address-family evpn
      neighbor EVPN activate
   !
   address-family ipv4
      no neighbor EVPN activate
!
router ospf 1
   redistribute connected
   max-lsa 12000
!
end

service routing protocols model multi-agent
!
hostname LE04a
!
vlan 100
   name "FW1;VRF=A"
!
vlan 200
   name "FW1;VRF=B"
!
vlan 4094
   name "MLAG"
   trunk group PEER-LINK
!
vrf instance A
!
vrf instance B
!
interface Port-Channel3
   description "PEER-LINK"
   switchport mode trunk
   switchport trunk group PEER-LINK
!
interface Port-Channel4
   description "FW1"
   switchport mode trunk
   mlag 4
!
interface Ethernet1
   description "SP01"
   no switchport
   ip address unnumbered Loopback0
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet2
   description "SP02"
   no switchport
   ip address unnumbered Loopback0
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet3
   description "LE04b"
   channel-group 3 mode active
!
interface Ethernet4
   description "FW1b"
   channel-group 4 mode active
!
interface Loopback0
   description "OSPF/BGP ROUTER-ID"
   ip address 10.0.0.41/32
   ip ospf area 0.0.0.0
!
interface Loopback1
   description "VXLAN SOURCE-INTERFACE"
   ip address 10.0.0.4/32
   ip ospf area 0.0.0.0
!
interface Vlan100
   description "FW1;VRF=A"
   vrf A
   arp aging timeout 290
   ipv6 address 2001:db8:dc01:a00::5/64
!
interface Vlan200
   description "FW1;VRF=B"
   vrf B
   arp aging timeout 290
   ipv6 address 2001:db8:dc01:b00::5/64
!
interface Vlan4094
   no autostate
   ip address 10.0.4.1/30
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Vxlan1
   vxlan source-interface Loopback1
   vxlan udp-port 4789
   vxlan vlan 100 vni 100
   vxlan vlan 200 vni 200
   vxlan vrf A vni 5001
   vxlan vrf B vni 5002
!
ip virtual-router mac-address 00:11:22:33:44:55
!
ip routing
!
ipv6 unicast-routing
ipv6 unicast-routing vrf A
ipv6 unicast-routing vrf B
!
mlag configuration
   domain-id LE04
   local-interface Vlan4094
   peer-address 10.0.4.2
   peer-link Port-Channel3
   reload-delay mlag 60
   reload-delay non-mlag 30
!
router bgp 65000
   distance bgp 20 200 200
   neighbor EVPN peer group
   neighbor EVPN remote-as 65000
   neighbor EVPN update-source Loopback0
   neighbor EVPN send-community
   neighbor 10.0.0.1 peer group EVPN
   neighbor 10.0.0.2 peer group EVPN
   !
   vlan 100
      rd 10.0.0.41:100
      route-target both 65000:100
      redistribute learned
   !
   vlan 200
      rd 10.0.0.41:200
      route-target both 65000:200
      redistribute learned
   !
   address-family evpn
      neighbor EVPN activate
   !
   address-family ipv4
      no neighbor EVPN activate
!
router ospf 1
   redistribute connected
   max-lsa 12000
!
end

service routing protocols model multi-agent
!
hostname LE04b
!
vlan 100
   name "FW1;VRF=A"
!
vlan 200
   name "FW1;VRF=B"
!
vlan 4094
   name "MLAG"
   trunk group PEER-LINK
!
vrf instance A
!
vrf instance B
!
interface Port-Channel3
   description "PEER-LINK"
   switchport mode trunk
   switchport trunk group PEER-LINK
!
interface Port-Channel4
   description "FW1"
   switchport mode trunk
   mlag 4
!
interface Ethernet1
   description "SP01"
   no switchport
   ip address unnumbered Loopback0
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet2
   description "SP02"
   no switchport
   ip address unnumbered Loopback0
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet3
   description "LE04a"
   channel-group 3 mode active
!
interface Ethernet4
   description "FW1b"
   channel-group 4 mode active
!
interface Loopback0
   description "OSPF/BGP ROUTER-ID"
   ip address 10.0.0.42/32
   ip ospf area 0.0.0.0
!
interface Loopback1
   description "VXLAN SOURCE-INTERFACE"
   ip address 10.0.0.4/32
   ip ospf area 0.0.0.0
!
interface Vlan100
   description "FW1;VRF=A"
   vrf A
   arp aging timeout 290
   ipv6 address 2001:db8:dc01:a00::6/64
!
interface Vlan200
   description "FW1;VRF=B"
   vrf B
   arp aging timeout 290
   ipv6 address 2001:db8:dc01:b00::6/64
!
interface Vlan4094
   no autostate
   ip address 10.0.4.2/30
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Vxlan1
   vxlan source-interface Loopback1
   vxlan udp-port 4789
   vxlan vlan 100 vni 100
   vxlan vlan 200 vni 200
   vxlan vrf A vni 5001
   vxlan vrf B vni 5002
!
ip virtual-router mac-address 00:11:22:33:44:55
!
ip routing
ip routing vrf A
ip routing vrf B
!
ipv6 unicast-routing
ipv6 unicast-routing vrf A
ipv6 unicast-routing vrf B
!
mlag configuration
   domain-id LE04
   local-interface Vlan4094
   peer-address 10.0.4.1
   peer-link Port-Channel3
   reload-delay mlag 60
   reload-delay non-mlag 30
!
router bgp 65000
   distance bgp 20 200 200
   neighbor EVPN peer group
   neighbor EVPN remote-as 65000
   neighbor EVPN update-source Loopback0
   neighbor EVPN send-community
   neighbor 10.0.0.1 peer group EVPN
   neighbor 10.0.0.2 peer group EVPN
   !
   vlan 100
      rd 10.0.0.42:100
      route-target both 65000:100
      redistribute learned
      no redistribute host-route
   !
   vlan 200
      rd 10.0.0.42:200
      route-target both 65000:200
      redistribute learned
      no redistribute host-route
   !
   address-family evpn
      neighbor EVPN activate
   !
   address-family ipv4
      no neighbor EVPN activate
!
router ospf 1
   redistribute connected
   max-lsa 12000
!
end

service routing protocols model multi-agent
!
hostname LE05a
!
vlan 106
   name SRV6
!
vlan 207
   name SRV7
!
vlan 4094
   name "MLAG"
   trunk group PEER-LINK
!
vrf instance A
!
vrf instance B
!
interface Port-Channel3
   description "PEER-LINK"
   switchport mode trunk
   switchport trunk group PEER-LINK
!
interface Ethernet1
   description "SP01"
   no switchport
   ip address unnumbered Loopback0
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet2
   description "SP02"
   no switchport
   ip address unnumbered Loopback0
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet3
   description "LE05b"
   channel-group 3 mode active
!
interface Ethernet6
   description SRV6
   switchport access vlan 106
!
interface Loopback0
   description "OSPF/BGP ROUTER-ID"
   ip address 10.0.0.51/32
   ip ospf area 0.0.0.0
!
interface Loopback1
   description "VXLAN SOURCE-INTERFACE"
   ip address 10.0.0.5/32
   ip ospf area 0.0.0.0
!
interface Vlan106
   vrf A
   arp aging timeout 290
   ipv6 address virtual 2001:db8:dc01:a06::1/64
!
interface Vlan207
   vrf B
   arp aging timeout 290
   ipv6 address virtual 2001:db8:dc01:b07::1/64
!
interface Vlan4094
   no autostate
   ip address 10.0.5.1/30
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Vxlan1
   vxlan source-interface Loopback1
   vxlan udp-port 4789
   vxlan vlan 106 vni 106
   vxlan vlan 207 vni 207
   vxlan vrf A vni 5001
   vxlan vrf B vni 5002
!
ip virtual-router mac-address 00:11:22:33:44:55
!
ip routing
ip routing vrf A
ip routing vrf B
!
ipv6 unicast-routing
ipv6 unicast-routing vrf A
ipv6 unicast-routing vrf B
!
mlag configuration
   domain-id LE05
   local-interface Vlan4094
   peer-address 10.0.5.2
   peer-link Port-Channel3
   reload-delay mlag 60
   reload-delay non-mlag 30
!
router bgp 65000
   router-id 10.0.0.51
   neighbor EVPN peer group
   neighbor EVPN remote-as 65000
   neighbor EVPN update-source Loopback0
   neighbor EVPN send-community
   neighbor 10.0.0.1 peer group EVPN
   neighbor 10.0.0.2 peer group EVPN
   !
   vlan 106
      rd 10.0.0.51:106
      route-target both 65000:106
      redistribute learned
   !
   vlan 207
      rd 10.0.0.51:207
      route-target both 65000:207
      redistribute learned
   !
   address-family evpn
      neighbor EVPN activate
   !
   address-family ipv4
      no neighbor EVPN activate
   !
   vrf A
      rd 10.0.0.51:5001
      route-target import evpn 65000:5001
      route-target export evpn 65000:5001
      router-id 10.0.0.51
      bgp default ipv6-unicast
      redistribute connected
   !
   vrf B
      rd 10.0.0.51:5002
      route-target import evpn 65000:5002
      route-target export evpn 65000:5002
      router-id 10.0.0.51
      bgp default ipv6-unicast
      redistribute connected
!
router ospf 1
   redistribute connected
   max-lsa 12000
!
end

service routing protocols model multi-agent
!
hostname LE05b
!
vlan 106
   name SRV6
!
vlan 207
   name SRV7
!
vlan 4094
   name "MLAG"
   trunk group PEER-LINK
!
vrf instance A
!
vrf instance B
!
interface Port-Channel3
   description "PEER-LINK"
   switchport mode trunk
   switchport trunk group PEER-LINK
!
interface Ethernet1
   description "SP01"
   no switchport
   ip address unnumbered Loopback0
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet2
   description "SP02"
   no switchport
   ip address unnumbered Loopback0
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet3
   description "LE05a"
   channel-group 3 mode active
!
interface Ethernet7
   description SRV7
   switchport access vlan 207
!
interface Loopback0
   description "OSPF/BGP ROUTER-ID"
   ip address 10.0.0.52/32
   ip ospf area 0.0.0.0
!
interface Loopback1
   description "VXLAN SOURCE-INTERFACE"
   ip address 10.0.0.5/32
   ip ospf area 0.0.0.0
!
interface Vlan106
   vrf A
   arp aging timeout 290
   ipv6 address virtual 2001:db8:dc01:a06::1/64
!
interface Vlan207
   vrf B
   arp aging timeout 290
   ipv6 address virtual 2001:db8:dc01:b07::1/64
!
interface Vlan4094
   no autostate
   ip address 10.0.5.2/30
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Vxlan1
   vxlan source-interface Loopback1
   vxlan udp-port 4789
   vxlan vlan 106 vni 106
   vxlan vlan 207 vni 207
   vxlan vrf A vni 5001
   vxlan vrf B vni 5002
!
ip virtual-router mac-address 00:11:22:33:44:55
!
ip routing
ip routing vrf A
ip routing vrf B
!
ipv6 unicast-routing
ipv6 unicast-routing vrf A
ipv6 unicast-routing vrf B
!
mlag configuration
   domain-id LE05
   local-interface Vlan4094
   peer-address 10.0.5.1
   peer-link Port-Channel3
   reload-delay mlag 60
   reload-delay non-mlag 30
!
router bgp 65000
   router-id 10.0.0.52
   neighbor EVPN peer group
   neighbor EVPN remote-as 65000
   neighbor EVPN update-source Loopback0
   neighbor EVPN send-community
   neighbor 10.0.0.1 peer group EVPN
   neighbor 10.0.0.2 peer group EVPN
   !
   vlan 106
      rd 10.0.0.52:106
      route-target both 65000:106
      redistribute learned
   !
   vlan 207
      rd 10.0.0.52:207
      route-target both 65000:207
      redistribute learned
   !
   address-family evpn
      neighbor EVPN activate
   !
   address-family ipv4
      no neighbor EVPN activate
   !
   vrf A
      rd 10.0.0.52:5001
      route-target import evpn 65000:5001
      route-target export evpn 65000:5001
      router-id 10.0.0.52
      bgp default ipv6-unicast
      redistribute connected
   !
   vrf B
      rd 10.0.0.52:5002
      route-target import evpn 65000:5002
      route-target export evpn 65000:5002
      router-id 10.0.0.52
      bgp default ipv6-unicast
      redistribute connected
!
router ospf 1
   redistribute connected
   max-lsa 12000
!
end

config system global
    set hostname "FW1a"
end
config system ha
    set group-id 1
    set group-name "FW1"
    set mode a-p
    set hbdev "port3" 0
    set session-pickup enable
    set session-pickup-delay enable
    set override enable
    set priority 200
end
config system interface
    edit "port1"
        set mode static
    next
    edit "LEAF"
        set vdom "root"
        set type aggregate
        set member "port1" "port2"
        set lldp-transmission enable
        set snmp-index 9
    next
    edit "LEAF;VRF=A"
        set vdom "root"
        set device-identification enable
        set role lan
        set snmp-index 10
        config ipv6
            set ip6-address 2001:db8:dc01:a00::1/64
            set ip6-allowaccess ping
        end
        set interface "LEAF"
        set vlanid 100
    next
    edit "LEAF;VRF=B"
        set vdom "root"
        set device-identification enable
        set role lan
        set snmp-index 12
        config ipv6
            set ip6-address 2001:db8:dc01:b00::1/64
            set ip6-allowaccess ping
        end
        set interface "LEAF"
        set vlanid 200
    next
end
config firewall policy
    edit 1
        set srcintf "any"
        set dstintf "any"
        set action accept
        set srcaddr "all"
        set dstaddr "all"
        set srcaddr6 "all"
        set dstaddr6 "all"
        set schedule "always"
        set service "ALL"
    next
end

config system global
    set hostname "FW1b"
end
config system ha
    set group-id 1
    set group-name "FW1"
    set mode a-p
    set hbdev "port3" 0
    set session-pickup enable
    set session-pickup-delay enable
    set override enable
    set priority 150
end

Problem One - BGP Adjacencies

We decided that we wanted to use a routing protocol between FW1 and the leaves in each vrf to exchange routes. We chose to use eBGP with FW1 residing in AS 65001 and the leaves in AS 65000. The FW should advertise a default route into each vrf; the leaves should advertise directly-connected subnets.

This leads us to our first interesting design choice. The initial plan was for each FW member to establish adjacencies to the leaf-pair it was physically connected to. FW1a would peer with LE03a/b; FW1b would peer with LE04a/b. One benefit of this plan would be that only the leaf-pair connected to the active FW member would receive the default route, ensuring that all traffic exiting the vrf would always hit LE03 when FW1a was active and LE04 when FW1b was active. This plan does not work for multiple reasons. The largest problem is that the configuration is synchronized between FW1a/b, so whichever adjacencies you configure on one member will also exist on the other.

The only viable solution here is to configure FW1a/b to peer with both LE03a/b and LE04a/b, totaling four adjacencies per vrf. This is what that configuration looks like:

FW1a # show router bgp
config router prefix-list
    edit "DEFAULT_ROUTE"
        config rule
            edit 1
                set prefix 0.0.0.0 0.0.0.0
            next
        end
    next
end
config router prefix-list6
    edit "DEFAULT_ROUTE"
        config rule
            edit 1
                set prefix6 ::/0
                unset ge
                unset le
            next
        end
    next
end
config router route-map
    edit "RM_LEAF_OUT"
        config rule
            edit 4
                set match-ip-address "DEFAULT_ROUTE"
            next
            edit 6
                set match-ip6-address "DEFAULT_ROUTE"
            next
        end
    next
end
config router bgp
    set as 65001
    config neighbor
        edit "2001:db8:dc01:a00::3"
            set advertisement-interval 0
            set description "LE03a;VRF=A"
            set remote-as 65000
            set route-map-out6 "RM_LEAF_OUT"
        next
        edit "2001:db8:dc01:a00::4"
            set advertisement-interval 0
            set description "LE03b;VRF=A"
            set remote-as 65000
            set route-map-out6 "RM_LEAF_OUT"
        next
        edit "2001:db8:dc01:a00::5"
            set advertisement-interval 0
            set description "LE04a;VRF=A"
            set remote-as 65000
            set route-map-out6 "RM_LEAF_OUT"
        next
        edit "2001:db8:dc01:a00::6"
            set advertisement-interval 0
            set description "LE04b;VRF=A"
            set remote-as 65000
            set route-map-out6 "RM_LEAF_OUT"
        next
        edit "2001:db8:dc01:b00::3"
            set advertisement-interval 0
            set description "LE03a;VRF=B"
            set remote-as 65000
            set route-map-out6 "RM_LEAF_OUT"
        next
        edit "2001:db8:dc01:b00::4"
            set advertisement-interval 0
            set description "LE03b;VRF=B"
            set remote-as 65000
            set route-map-out6 "RM_LEAF_OUT"
        next
        edit "2001:db8:dc01:b00::5"
            set advertisement-interval 0
            set description "LE04a;VRF=B"
            set remote-as 65000
            set route-map-out6 "RM_LEAF_OUT"
        next
        edit "2001:db8:dc01:b00::6"
            set advertisement-interval 0
            set description "LE04b;VRF=B"
            set remote-as 65000
            set route-map-out6 "RM_LEAF_OUT"
        next
    end
    config redistribute6 "static"
        set status enable
    end
end
# Default route to be advertised into each VRF.
config router static6
    edit 1
        set blackhole enable
    next
end
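
The eight neighbor stanzas above follow a regular pattern: four leaves times two vrfs. As a minimal sketch, assuming you template your FW configuration from a script, the following Python generates the same stanzas from the leaf host suffixes and vrf prefixes used in this lab. The helper name `neighbor_stanzas` is hypothetical; the FortiOS lines it emits are copied from the configuration above.

```python
# Host part of each leaf's linknet address, as used on Vlan100/Vlan200.
LEAVES = {"LE03a": 3, "LE03b": 4, "LE04a": 5, "LE04b": 6}

# Linknet prefix per vrf (vlan 100 for vrf A, vlan 200 for vrf B).
VRF_PREFIX = {"A": "2001:db8:dc01:a00::", "B": "2001:db8:dc01:b00::"}


def neighbor_stanzas():
    """Return the FortiOS 'config neighbor' entries, one stanza per leaf and vrf."""
    lines = []
    for vrf, prefix in VRF_PREFIX.items():
        for leaf, host in LEAVES.items():
            lines += [
                f'edit "{prefix}{host:x}"',
                "    set advertisement-interval 0",
                f'    set description "{leaf};VRF={vrf}"',
                "    set remote-as 65000",
                '    set route-map-out6 "RM_LEAF_OUT"',
                "next",
            ]
    return lines


if __name__ == "__main__":
    print("\n".join(neighbor_stanzas()))
```
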

interface Vlan100
   description "FW1;VRF=A"
   vrf A
   ipv6 address 2001:db8:dc01:a00::3/64
!
router bgp 65000
   !
   vlan 100
      rd 10.0.0.31:100
      route-target both 65000:100
      redistribute learned
   !
   vlan 200
      rd 10.0.0.31:200
      route-target both 65000:200
      redistribute learned
   !
   vrf A
      rd 10.0.0.31:5001
      route-target import evpn 65000:5001
      route-target export evpn 65000:5001
      router-id 10.0.0.31
      bgp default ipv6-unicast
      neighbor 2001:db8:dc01:a00::1 remote-as 65001
      neighbor 2001:db8:dc01:a00::1 description FW1;VRF=A
      neighbor 2001:db8:dc01:a00::1 timers 10 30
      redistribute connected
   !
   vrf B
      rd 10.0.0.31:5002
      route-target import evpn 65000:5002
      route-target export evpn 65000:5002
      router-id 10.0.0.31
      bgp default ipv6-unicast
      neighbor 2001:db8:dc01:b00::1 remote-as 65001
      neighbor 2001:db8:dc01:b00::1 description FW1;VRF=B
      neighbor 2001:db8:dc01:b00::1 timers 10 30
      redistribute connected

interface Vlan100
   description "FW1;VRF=A"
   vrf A
   ipv6 address 2001:db8:dc01:a00::6/64
!
router bgp 65000
   !
   vlan 100
      rd 10.0.0.42:100
      route-target both 65000:100
      redistribute learned
   !
   vlan 200
      rd 10.0.0.42:200
      route-target both 65000:200
      redistribute learned
   !
   vrf A
      rd 10.0.0.42:5001
      route-target import evpn 65000:5001
      route-target export evpn 65000:5001
      router-id 10.0.0.42
      bgp default ipv6-unicast
      neighbor 2001:db8:dc01:a00::1 remote-as 65001
      neighbor 2001:db8:dc01:a00::1 description FW1;VRF=A
      neighbor 2001:db8:dc01:a00::1 timers 10 30
      redistribute connected
   !
   vrf B
      rd 10.0.0.42:5002
      route-target import evpn 65000:5002
      route-target export evpn 65000:5002
      router-id 10.0.0.42
      bgp default ipv6-unicast
      neighbor 2001:db8:dc01:b00::1 remote-as 65001
      neighbor 2001:db8:dc01:b00::1 description FW1;VRF=B
      neighbor 2001:db8:dc01:b00::1 timers 10 30
      redistribute connected

Alright, that was relatively easy. Or was it..?

Problem Two - BGP does not establish

When deploying the configuration above, we noticed that when FW1a was the active member, it would only establish BGP adjacencies to its physically connected neighbors LE03a/b. The adjacencies to LE04a/b would not come up. If we failed the cluster over to FW1b, it would establish adjacencies to LE04a/b, but not LE03a/b. ICMPv6 worked just fine; the Fortigate could reach all leaves in the subnet. The NDP table showed the correct MAC-addresses for the neighbors, also as expected. But BGP just wouldn't establish.

Finding the answer required some information gathering and digging through packet captures. Let's analyze the output below:

LE04b(vrf:A)#show bgp evpn route-type mac-ip vni 100 detail

"L2VNI"
BGP routing table entry for mac-ip 0009.0f09.0100
 RD: 10.0.0.31:100
 Paths: 1 available
  Local
    10.0.0.3 from 10.0.0.1 (10.0.0.1)
      Origin IGP, metric -, localpref 100, weight 0, tag 0, valid, internal, best
      Originator: 10.0.0.31, Cluster list: 10.0.0.1
      Extended Community:
        Route-Target-AS:65000:100
        TunnelEncap:tunnelTypeVxlan
      VNI: 100

"L2VNI + L3VNI"
BGP routing table entry for mac-ip 0009.0f09.0100 2001:db8:dc01:a00::1
 RD: 10.0.0.32:100
 Paths: 1 available
  Local
    10.0.0.3 from 10.0.0.1 (10.0.0.1)
      Origin IGP, metric -, localpref 100, weight 0, tag 0, valid, internal, best
      Originator: 10.0.0.32, Cluster list: 10.0.0.1
      Extended Community:
        Route-Target-AS:65000:100
        Route-Target-AS:65000:5001
        TunnelEncap:tunnelTypeVxlan
        EvpnRouterMac:5001.0000.0032
      VNI: 100
      L3 VNI: 5001

The above output shows the EVPN routes learned for VNI 100. We have received two routes: one containing only L2VNI information and the other containing both L2VNI and L3VNI. We can see this because the second entry tells us the IPv6-address in addition to the MAC-address. Each route generates its own set of entries in the respective tables. For example, the L2VNI-only route generates these MAC/VXLAN-address table entries:

"L2VNI"
LE04b(vrf:A)#show mac address-table vlan 100
Vlan    Mac Address       Type        Ports      Moves   Last Move
----    -----------       ----        -----      -----   ---------
 100    0009.0f09.0100    DYNAMIC     Vx1        1       1:20:08 ago

LE04b(vrf:A)#show vxlan address-table vlan 100
VLAN  Mac Address     Type      Prt  VTEP             Moves   Last Move
----  -----------     ----      ---  ----             -----   ---------
 100  0009.0f09.0100  EVPN      Vx1  10.0.0.3         1       1:20:08 ago

The L2VNI-only EVPN route generated the MAC-address table entry above. It also created an entry in the VXLAN-address table, telling VXLAN behind which VTEP the MAC-address lives.

Moving on to the L2VNI+L3VNI route, this route generated the following ND-entry and host-route:

"L2VNI + L3VNI"
LE04b(vrf:A)#show ipv6 neighbors 2001:db8:dc01:a00::1
IPv6 Address           Age Hardware Addr   Interface
2001:db8:dc01:a00::1   -   0009.0f09.0100  Vl100, Vxlan1

LE04b(vrf:A)#show ipv6 route 2001:db8:dc01:a00::1

 B I      2001:db8:dc01:a00::1/128 [200/0]
           via VTEP 10.0.0.3 VNI 5001 router-mac 5001.0000.0032
 C        2001:db8:dc01:a00::/64 [0/0]
           via Vlan100, directly connected

The L2VNI+L3VNI route gave our leaf enough information to generate an entry in the IPv6 Neighbor-table as well as install a /128 host-route pointing behind a VXLAN VTEP.
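
To summarize the two outputs above, here is a small Python model of which table entries each of the two mac-ip routes produced on LE04b. It is only a restatement of what the show commands displayed, not a description of how EOS works internally; the class and function names are made up for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MacIpRoute:
    """An EVPN mac-ip route as seen in 'show bgp evpn route-type mac-ip'."""
    mac: str
    vtep: str
    l2vni: int
    ip: Optional[str] = None          # only present on the L2VNI + L3VNI route
    l3vni: Optional[int] = None
    router_mac: Optional[str] = None


def installed_entries(route):
    """Return (table, key, value) tuples matching the outputs shown above."""
    if route.ip is None:
        # L2VNI-only route: MAC table and VXLAN address table.
        return [
            ("mac address-table", route.mac, "Vx1"),
            ("vxlan address-table", route.mac, route.vtep),
        ]
    # L2VNI + L3VNI route: neighbor entry and /128 host-route.
    next_hop = f"VTEP {route.vtep} VNI {route.l3vni} router-mac {route.router_mac}"
    return [
        ("ipv6 neighbors", route.ip, route.mac),
        ("ipv6 route", f"{route.ip}/128", next_hop),
    ]


mac_only = MacIpRoute(mac="0009.0f09.0100", vtep="10.0.0.3", l2vni=100)
mac_ip = MacIpRoute(
    mac="0009.0f09.0100", vtep="10.0.0.3", l2vni=100,
    ip="2001:db8:dc01:a00::1", l3vni=5001, router_mac="5001.0000.0032",
)

if __name__ == "__main__":
    for route in (mac_only, mac_ip):
        for entry in installed_entries(route):
            print(entry)
```
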

Packet captures

I couldn't find anything conclusive based on the above output, so I resorted to performing a packet capture. Let's analyze the packets captured on FW1:

"Request"
  Ethernet II, Src: 0009.0f09.0100, Dst: 5001.0000.0042
  802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 100
  Internet Protocol Version 6, Src: 2001:db8:dc01:a00::1, Dst: 2001:db8:dc01:a00::6
  Internet Control Message Protocol v6
    Type: Echo request

"Reply"
  Ethernet II, Src: 5001.0000.0032, Dst: 0009.0f09.0100
  802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 100
  Internet Protocol Version 6, Src: 2001:db8:dc01:a00::6, Dst: 2001:db8:dc01:a00::1
  Internet Control Message Protocol v6
    Type: Echo reply

The above capture shows an ICMPv6 request/reply between FW1 and LE04b. In the request we can see that the destination MAC-address is set to 5001.0000.0042, the MAC-address of LE04b. However, the reply was sent from 5001.0000.0032. That MAC-address belongs to LE03b. That's weird.

Let's continue and look at the packets captured between SP01 and LE04b:

"Request"
  Ethernet II, Src: 5001.0000.0001, Dst: 5001.0000.0042
  Internet Protocol Version 4, Src: 10.0.0.3, Dst: 10.0.0.4
  User Datagram Protocol, Src Port: 41830, Dst Port: 4789
  Virtual eXtensible Local Area Network
    VXLAN Network Identifier (VNI): 100
  Ethernet II, Src: 0009.0f09.0100, Dst: 5001.0000.0042
  Internet Protocol Version 6, Src: 2001:db8:dc01:a00::1, Dst: 2001:db8:dc01:a00::6
  Internet Control Message Protocol v6
    Type: Echo (ping) request (128)

"Reply"
  Ethernet II, Src: 5001.0000.0042, Dst: 5001.0000.0001
  Internet Protocol Version 4, Src: 10.0.0.4, Dst: 10.0.0.3
  User Datagram Protocol, Src Port: 53928, Dst Port: 4789
  Virtual eXtensible Local Area Network
    VXLAN Network Identifier (VNI): 5001
  Ethernet II, Src: 5001.0000.0042, Dst: 5001.0000.0032
  Internet Protocol Version 6, Src: 2001:db8:dc01:a00::6, Dst: 2001:db8:dc01:a00::1
  Internet Control Message Protocol v6
    Type: Echo (ping) reply (129)

MAC-address 5001.0000.0001 belongs to SP01

This time the ICMPv6 packets are VXLAN-encapsulated. The VXLAN-packet was sent from LE03 (10.0.0.3) to LE04 (10.0.0.4) on L2VNI 100. The inner frame has FW1 as the source MAC-address and LE04b (5001.0000.0042) as the destination. So far so good.

When looking at the reply from LE04b back to FW1, we see a few differences. One difference is that LE04b decided to use the L3VNI 5001. We can also see that the inner packet destination MAC-address is set to LE03b (5001.0000.0032) instead of FW1.

Packet capture diagrams

In case the output above was difficult to parse, I took the liberty of creating two diagrams to help visualize what's happening. I also simplified the MAC-addresses to instead show hostnames. Let's see if we can spot what's going wrong:

ICMP packet walk from FW1 to LE04b

The above diagram has pieced together the packet captures we did on the FW1-LE03b and SP01-LE04b links, showing the ICMP request as it travels from FW1 to LE04b. Everything looks correct, so let us review the opposite direction:

ICMP reply from LE04b back to FW1

As we can see in the diagram above, there are lots of things going wrong. One example is that the inner packet generated by LE04b has LE03b as its destination MAC-address, not FW1. But this is just a symptom. The root cause is that LE04b is using L3VNI 5001 to get the packet to FW1, not L2VNI 100.

We figured it out!

The above output tells us everything we need to know! As we learned in the L3VPN article, an L3VNI requires some gymnastics to be performed by both the sending and the receiving leaf:

Sending leaf (LE04b)

LE04b(vrf:A)#show ipv6 route 2001:db8:dc01:a00::1
 B I      2001:db8:dc01:a00::1/128 [200/0]
           via VTEP 10.0.0.3 VNI 5001 router-mac 5001.0000.0032 (LE03b)

According to the host-route installed on LE04b, for a packet to reach FW1, LE04b must VXLAN-encapsulate the packet with VNI 5001 and set LE03b as the destination MAC-address. LE04b is a good boy and does as requested, causing all sorts of problems.
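To make the lookup on the sending leaf concrete, here is a minimal Python sketch of a longest-prefix match over the two vrf A routes shown in the LE04b output. The ROUTES list and the lookup function are my own illustration, not how EOS implements its FIB; the point is only that the /128 host-route always beats the connected /64.

```python
import ipaddress

# Simplified copy of the two vrf A routes seen on LE04b (from the output above).
ROUTES = [
    ("2001:db8:dc01:a00::1/128", "VTEP 10.0.0.3 VNI 5001 router-mac 5001.0000.0032"),
    ("2001:db8:dc01:a00::/64", "Vlan100, directly connected"),
]

def lookup(destination, routes=ROUTES):
    """Return (prefix, next_hop) for the longest matching prefix, or None."""
    addr = ipaddress.ip_address(destination)
    matches = []
    for prefix, next_hop in routes:
        network = ipaddress.ip_network(prefix)
        if addr in network:
            matches.append((network, next_hop))
    if not matches:
        return None
    # The most specific (longest) prefix wins, regardless of route source.
    network, next_hop = max(matches, key=lambda m: m[0].prefixlen)
    return str(network), next_hop

# FW1 is matched by the /128 host-route, so the reply leaves via L3VNI 5001.
print(lookup("2001:db8:dc01:a00::1"))
# Any other address in the linknet still uses the connected /64.
print(lookup("2001:db8:dc01:a00::6"))
```

As long as the host-route exists, the connected route on Vlan100 is never consulted for FW1's address.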

Receiving leaf (LE03b)

LE03b receives the VXLAN packet and sees that VNI 5001 matches vrf A. When the inner Ethernet frame is processed, LE03b realizes that the destination MAC-address is its own MAC-address, so it knows it is the intended destination for the frame. At this point the Ethernet frame has done its job and its header is discarded.

Now LE03b examines the IP header. It sees that the destination IP 2001:db8:dc01:a00::1 (FW1) is not a local IP-address, so LE03b is not the intended destination for the IP packet. A routing lookup is performed to learn that the destination is directly connected via the Vlan100 interface.

LE03b generates a new Ethernet header, setting itself (5001.0000.0032) as the source and FW1 (0009.0f09.0100) as the destination. The Ethernet header is prepended to the IP packet and the frame is sent to FW1.
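The receiving-leaf steps above can be sketched in a few lines of Python. The table layouts and the function name receive_l3vni are hypothetical and only mirror the walk-through; the MAC and IP values are the ones from the captures.

```python
LE03B_MAC = "5001.0000.0032"

# L3VNI to VRF mapping on LE03b (VNI 5001 belongs to vrf A).
L3VNI_TO_VRF = {5001: "A"}

# Hypothetical neighbor table: (vrf, IPv6 address) -> (interface, MAC-address).
NEIGHBORS = {("A", "2001:db8:dc01:a00::1"): ("Vlan100", "0009.0f09.0100")}

def receive_l3vni(vni, inner_dst_mac, dst_ip):
    """Model LE03b handling a decapsulated frame; return the frame sent to the host."""
    vrf = L3VNI_TO_VRF.get(vni)
    if vrf is None:
        return None  # VNI does not map to any VRF on this leaf
    if inner_dst_mac != LE03B_MAC:
        return None  # frame is not addressed to our router-mac
    # The inner Ethernet header is discarded here; only the IP packet remains.
    neighbor = NEIGHBORS.get((vrf, dst_ip))
    if neighbor is None:
        return None  # no directly connected neighbor for this address
    interface, neighbor_mac = neighbor
    # A new Ethernet header is built with LE03b as source and the host as destination.
    return {
        "interface": interface,
        "src_mac": LE03B_MAC,
        "dst_mac": neighbor_mac,
        "dst_ip": dst_ip,
    }

frame = receive_l3vni(5001, "5001.0000.0032", "2001:db8:dc01:a00::1")
print(frame)
```

The resulting source MAC-address is LE03b's, which is exactly what the capture on FW1 showed in the reply.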

Not the solution

I had to spend a lot of time thinking about how to solve this. My initial plan was to utilize the no redistribute host-route command on the FW1 vlans. This would stop the L2VNI+L3VNI EVPN route from being advertised into the VRF, which in turn would stop the other leaves from generating that pesky FW1 ::/128 host-route:

router bgp 65000
   !
   vlan 100
      no redistribute host-route
   !
   vlan 200
      no redistribute host-route

However, not advertising the host-route creates a problem on LE05, as this leaf-pair relies on the host-route to figure out which leaf-pair FW1 is currently connected to. Without the host-route, LE05 only knows how to reach 2001:db8:dc01:a00::/64. This route is advertised by both LE03 and LE04, so traffic would be load-balanced between the two leaf-pairs. This would cause suboptimal routing: whenever FW1a was active, traffic along the LE05-SP01-LE04 path would have to be VXLAN-forwarded back via LE04-SP01-LE03 before it could be sent to FW1.
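A small Python sketch of LE05's view illustrates the trade-off. The route tables are hand-written assumptions based on the description above (10.0.0.3 and 10.0.0.4 being the VTEP addresses of LE03 and LE04), and candidate_vteps is a hypothetical helper, not an EOS feature.

```python
import ipaddress

def candidate_vteps(destination, routes):
    """Return the VTEPs LE05 may pick for a destination (longest prefix wins)."""
    addr = ipaddress.ip_address(destination)
    best_len, best_vteps = -1, []
    for prefix, vteps in routes.items():
        network = ipaddress.ip_network(prefix)
        if addr in network and network.prefixlen > best_len:
            best_len, best_vteps = network.prefixlen, vteps
    return best_vteps

FW1 = "2001:db8:dc01:a00::1"

# With the host-route, only the leaf-pair FW1a sits behind is a candidate.
with_host_route = {
    "2001:db8:dc01:a00::/64": ["10.0.0.3", "10.0.0.4"],
    "2001:db8:dc01:a00::1/128": ["10.0.0.3"],
}
# Without it, both leaf-pairs advertise the /64 and traffic is load-balanced.
without_host_route = {
    "2001:db8:dc01:a00::/64": ["10.0.0.3", "10.0.0.4"],
}

print(candidate_vteps(FW1, with_host_route))
print(candidate_vteps(FW1, without_host_route))
```

Every flow that lands on 10.0.0.4 while FW1a is active takes the extra trip back through the spine.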

We need a way to selectively block the L3VNI routes on LE03 and LE04 while still advertising the host-routes to LE05.

Problem Two Solution

I ended up creating an inbound EVPN route-map on LE03 and LE04 to reject any route carrying both an FW1 L2VNI extended community (65000:100 or 65000:200) and an L3VNI extended community (65000:5001 or 65000:5002). Let's look at the config below:

ip extcommunity-list ECL_FW1_L2VNI permit 65000:100
ip extcommunity-list ECL_FW1_L2VNI permit 65000:200
ip extcommunity-list ECL_FW1_L3VNI permit 65000:5001
ip extcommunity-list ECL_FW1_L3VNI permit 65000:5002

route-map "RM_EVPN_IN" deny 10
   match extcommunity ECL_FW1_L2VNI
   sub-route-map "RM_MATCH_FW1_L3VNI"
!
route-map "RM_EVPN_IN" permit 100

route-map "RM_MATCH_FW1_L3VNI" permit 10
   match extcommunity ECL_FW1_L3VNI
!
route-map "RM_MATCH_FW1_L3VNI" deny 100

router bgp 65000
   neighbor EVPN route-map "RM_EVPN_IN" in

To make this work I had to combine two route-maps, as it was the only way I found to match only when both communities are present on the same route. Let's start with RM_EVPN_IN, which is applied inbound on our EVPN BGP neighbors SP01 and SP02.

Whenever a route was received from these neighbors then the route-map RM_EVPN_IN deny 10 would first be checked. This entry would only match if the route contained extended communities 65000:100 or 65000:200, as configured in ECL_FW1_L2VNI. If there was a match, we would then use the sub-route-map command to call RM_MATCH_FW1_L3VNI which would check if the route contained extended communities 65000:5001 or 65000:5002. If both route-maps found a match, the route would be rejected and no ::/128 host-route would be installed.
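The combined logic of the two route-maps boils down to a logical AND between the two community lists. Here is a minimal Python sketch of that decision, assuming (as described above) that an extcommunity-list matches when any one of its entries is present. The function names are my own and only mirror the route-map names.

```python
# Extended communities from the two extcommunity-lists in the config above.
ECL_FW1_L2VNI = {"65000:100", "65000:200"}
ECL_FW1_L3VNI = {"65000:5001", "65000:5002"}

def rm_match_fw1_l3vni(communities):
    """Sub-route-map: matches if the route carries any FW1 L3VNI community."""
    return bool(set(communities) & ECL_FW1_L3VNI)

def rm_evpn_in(communities):
    """Return True if the route is accepted, False if it is rejected."""
    communities = set(communities)
    # Sequence 10 (deny): both the L2VNI list and the sub-route-map must match.
    if communities & ECL_FW1_L2VNI and rm_match_fw1_l3vni(communities):
        return False
    # Sequence 100 (permit): everything else is accepted.
    return True

# L2VNI-only route for FW1: accepted, so the MAC-address is still learned.
print(rm_evpn_in({"65000:100"}))
# L2VNI+L3VNI route for FW1: rejected, so no /128 host-route is installed.
print(rm_evpn_in({"65000:100", "65000:5001"}))
# A route carrying only the L3VNI community is untouched.
print(rm_evpn_in({"65000:5001"}))
```

Only the second case is rejected, which matches the PolicyReject entries on the routes that carry an IPv6 address.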

This solution stops LE03 and LE04 from installing the FW1 host-route while still allowing LE05 to install it and avoid suboptimal routing. We can see the routes from LE03 being rejected on LE04b below:

LE04b#show bgp evpn route-type mac-ip 
          Network                Next Hop        
 * >      RD: 10.0.0.31:100 mac-ip 0009.0f09.010
                                 10.0.0.3        
 * >      RD: 10.0.0.31:200 mac-ip 0009.0f09.010
                                 10.0.0.3        
 * >      RD: 10.0.0.32:100 mac-ip 0009.0f09.010
                                 10.0.0.3        
          RD: 10.0.0.31:100 mac-ip 0009.0f09.0100 2001:db8:dc01:a00::
                                 PolicyReject
          RD: 10.0.0.32:100 mac-ip 0009.0f09.0100 2001:db8:dc01:a00::
                                 PolicyReject
          RD: 10.0.0.31:200 mac-ip 0009.0f09.0100 2001:db8:dc01:b00::
                                 PolicyReject

Note: I must confess that I do not know if there is a better way to achieve this. The EVPN route filtering options are limited, at least on the Arista Lab platform that I'm using. I get the feeling that the design I'm going for is not something you're supposed to do, that I'm doing something wrong. Anyway, whatever.

We now have a fully functioning topology where FW1 can establish BGP adjacencies to the LE03 and LE04 leaves. LE05 receives the host-route, allowing it to forward traffic to the leaf closest to FW1. Everything is great, but then a FW1 HA failover is triggered...

Problem Three - HA failover Convergence

You are doing a great job keeping up thus far. We're getting close to the end but we have one more big problem to solve.

What happens to the routing when the FW1 cluster performs a failover? If I perform a redundancy test and kill FW1a while pinging from SRV6 to SRV7, I can see that traffic continues to flow for about 10 seconds. After that point no traffic is getting through. Then, after around 180 seconds or so, traffic starts flowing again. This is of course unacceptable in this day and age; an HA failover should be seamless.

To understand this problem we need to explain a few things. The first thing we need to know is what state is synchronized between the two FW1 members:

  • TCP sessions flowing through the cluster are synchronized thanks to the set session-pickup enable command under config system ha. This allows FW1b to seamlessly continue traffic inspection when it becomes the active member. As the Fortigate is a stateful device, FW1b would not allow traffic for an established TCP session if there was no matching entry in its session table. So synchronizing the session table is vital for a failover to not impact active sessions flowing through it.

  • The kernel routing table on the active member is also synchronized. This table is the FIB and is used to make routing decisions for incoming packets. Synchronizing this table ensures that FW1b can continue forwarding packets after a failover, before it has had a chance to establish any routing protocol adjacencies. The default route-TTL is 10 seconds.
    When this timer expires, the synchronized kernel routes are removed. This essentially gives FW1b a 10-second window to establish new routing protocol adjacencies, learn new routes and replace the ones learned from FW1a.

  • BGP sessions are not synchronized to the passive member. This means that any established BGP session has to be reestablished on failover. The reason this is a problem for us is that, by default, BGP will not accept more than one active session per adjacency. If LE03a has an established adjacency to FW1 and it suddenly receives a new BGP connection attempt from the same neighbor, LE03a will ignore it. Not until the current BGP session has timed out will LE03a allow a new BGP session to establish.

Based on this information we can figure out what's happening here. In the first ten seconds after failover, the new primary firewall can keep forwarding traffic based on the synchronized kernel routes. During this time the firewall attempts to establish the BGP adjacencies it finds in its configuration, but receives no response from the leaves. After ten seconds, these routes time out and are removed from the kernel routing table. As the BGP adjacencies on the leaves are still active, they won't allow the firewall to establish a new session until after 180 seconds, when their current BGP session finally times out.

One way to solve this could be to make sure the route-TTL is set to 200 seconds so that BGP with its 180 second default hold timer has a chance to establish before the cluster-learned kernel routes expire on FW1b. Alternatively, we could lower the BGP hold time to 9 seconds or less to accomplish the same goal.
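The timer arithmetic behind both options can be written down as a tiny Python model. It is a deliberate simplification with assumptions of my own: the hold timer is treated as starting at the moment of failover, the new BGP session is assumed to come up the instant the old one times out, and only the kernel route-TTL versus hold time race is modelled.

```python
def outage_window(route_ttl=10, hold_time=180):
    """Return (start, end) in seconds after failover where FW1b has no routes.

    Returns None when the synchronized kernel routes outlive the old BGP session.
    """
    if route_ttl >= hold_time:
        # Kernel routes are still present when the new session can establish.
        return None
    # Routes expire at route_ttl; new routes can only arrive after hold_time.
    return (route_ttl, hold_time)

# Defaults from the article: traffic stops at 10 s and resumes around 180 s.
print(outage_window())
# Option one: raise the route-TTL above the hold time.
print(outage_window(route_ttl=200))
# Option two: lower the BGP hold time below the route-TTL.
print(outage_window(hold_time=9))
```

In this model both options close the window, which is why they look attractive at first glance.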

However, neither approach fully works. The reason is that when the BGP session times out, LE03a will withdraw the default route it learned from FW1a. This route withdrawal is enough to create a small window in time where no default route exists inside the VRF. Traffic from any server that can't be routed by any of the leaves due to missing routes will yield an ICMP Host unreachable message back to that server. The server OS will then terminate that TCP session.

Any subsequent packet that was received for that TCP session will be dropped as there's no matching connection. If that was a long-running database synchronization process, the application now has to handle the fact that the TCP session has to reestablish. Some applications handle this better than others. So even if the route is gone for only a second, the impact may be great.

Problem Three Solution

To the rescue comes BGP Graceful Restart. This is a feature that allows an established BGP adjacency to gracefully restart. By configuring set graceful-restart enable on FW1 and utilizing the enabled-by-default graceful-restart-helper command on the Arista leaves, LE03a and its leaf buddies will allow FW1b to establish a BGP adjacency before the stale adjacency to FW1a times out. As the same session is restarted, there is no withdrawal/readvertisement of the default route, avoiding the ICMP Host unreachable problem.

config router bgp
   set graceful-restart enable
end

Graceful restart capability is automatically advertised to all BGP neighbors.
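The difference Graceful Restart makes on the leaf side can be summed up in a short Python sketch. This is a conceptual model only: the function and its parameters are hypothetical, and the restart time of 120 seconds used in the examples is an illustrative value, not one taken from the configuration in this article.

```python
def default_route_survives(gr_enabled, reconnect_after_s, restart_time_s):
    """Return True if the leaf keeps FW1's default route across the failover.

    gr_enabled:        Graceful Restart negotiated between FW1 and the leaf.
    reconnect_after_s: seconds until FW1b has re-established the BGP session.
    restart_time_s:    how long the helper keeps the stale routes (assumed value).
    """
    if not gr_enabled:
        # Without Graceful Restart the old session must go down first,
        # and its routes are withdrawn when it does.
        return False
    # With Graceful Restart the helper retains the routes as stale and
    # only removes them if the peer fails to return in time.
    return reconnect_after_s <= restart_time_s

# FW1b reconnects within a couple of seconds: the default route never disappears.
print(default_route_survives(True, 3, 120))
# Same failover without Graceful Restart: the route is withdrawn.
print(default_route_survives(False, 3, 120))
```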

Note: Applying this will hard restart all adjacencies. Also, this command unfortunately means that neither BGP neighbor-ranges nor neighbor-groups can be used on FW1. This is because the firewall has to initiate the adjacencies itself when it comes online. When using a neighbor-range, it can only accept BGP adjacencies from neighbors, not initiate its own sessions.

The BGP sessions usually come up within a couple of seconds after a failover, so there is generally little reason to increase the route-TTL, but you may do so if it is required in your environment. Graceful Restart should not be used together with BFD, as BFD would take down the old adjacency and cause a route withdrawal before a new adjacency even has a chance to establish.

Conclusion

The seemingly simple customer requirement of connecting firewall members to different leaf-pairs ended up sending me down a deep EVPN-shaped rabbit hole. From figuring out how L3VNIs take priority over L2VNIs to learning why Graceful Restart is a better option than BFD in this topology, I had a lot of new ground to cover.

I want to thank you for getting all the way to the end of this article. I hope you learned something and that reading it was worth your time.

Copyright 2021-2023, Emil Eliasson.
All Rights Reserved.