VXLAN L2VPN

Published: 2023-01-13
Updated: 2023-01-20

This article covers the main feature and selling point of VXLAN: tunneling Ethernet frames over a routed network to create L2VPN topologies. I will focus on Arista vEOS in my lab topology as it is the vendor whose VXLAN implementation I know best. The article starts by looking into VXLAN flood and learn, the default and simplest mode. In this mode there is no separate control plane, and MAC-addresses are learned as Ethernet frames from end devices are forwarded through the network.

After examining flood and learn we will venture into a modern approach using the BGP EVPN address family to create a separate control plane for our VXLAN topology. This involves dynamically building VXLAN flood lists and advertising MAC-addresses between switches. We end the article with a look at the powerful but tricky ARP Suppression mechanism enabled by EVPN.

This article builds on my VXLAN Introduction post, so you may want to skim through that post first.


Topology

Our configuration is based on the topology diagram described below. SW1-SW3 are VXLAN L2VPN capable devices with two L2VPNs configured, BLUE and GREEN.


VXLAN Flood and learn

This is the default VXLAN operating mode. The switches perform normal MAC-learning whenever they receive an Ethernet frame, adding the source MAC-address as an entry in the local MAC-address table describing which switchport the MAC-address was learned on. Any future Ethernet frame that is received with a destination MAC-address matching this entry is then only forwarded out on the corresponding switchport. This saves a lot of unnecessary flooding when unicast packets are transported between end devices.
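To make the learning and forwarding logic concrete, here is a minimal Python sketch of a per-VLAN MAC-address table. It is only an illustration and not how EOS implements it; the class name MacTable and its methods are made up for this example.

```python
class MacTable:
    """Minimal model of a MAC-address table keyed on (vlan, mac). Illustrative only."""

    def __init__(self):
        self.entries = {}  # (vlan, mac) -> port the MAC-address was learned on

    def learn(self, vlan, src_mac, in_port):
        # Remember which port the source MAC-address was seen on.
        self.entries[(vlan, src_mac)] = in_port

    def egress_ports(self, vlan, dst_mac, in_port, vlan_ports):
        # Known unicast: forward out of exactly one port.
        port = self.entries.get((vlan, dst_mac))
        if port is not None:
            return [port]
        # Unknown destination or broadcast: flood to every other port in the VLAN.
        return [p for p in vlan_ports if p != in_port]


table = MacTable()
ports = ["Et1", "Et4", "Vx1"]

# PC11 sends a frame towards PC31 before PC31 has been learned: the frame is flooded.
table.learn(10, "aabb.cc00.0011", "Et4")
flooded = table.egress_ports(10, "aabb.cc00.0031", "Et4", ports)

# Once PC31 has been learned behind Vx1, the same frame leaves on one port only.
table.learn(10, "aabb.cc00.0031", "Vx1")
unicast = table.egress_ports(10, "aabb.cc00.0031", "Et4", ports)

print(flooded, unicast)
```

The only difference between the two lookups is whether an entry exists, which is exactly why learning reduces flooding.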

Let's examine the configuration for the topology above using VXLAN forwarding:

! device: SW1 (vEOS-lab, EOS-4.28.3M)
service routing protocols model multi-agent
vlan 10
   name BLUE
!
vlan 20
   name GREEN
!
interface Ethernet1
   description R1
   switchport mode trunk
!
interface Ethernet2
   description SW2
   no switchport
   ip address 10.1.2.1/29
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet3
   description SW3
   no switchport
   ip address 10.1.3.1/29
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet4
   description PC11
   switchport access vlan 10
!
interface Ethernet5
   description PC12
   switchport access vlan 20
!
interface Loopback0
   description VXLAN-VTEP
   ip address 10.0.0.1/32
!
interface Vxlan1
   vxlan source-interface Loopback0
   vxlan udp-port 4789
   vxlan vlan 10 vni 10
   vxlan vlan 20 vni 20
   vxlan flood vtep 10.0.0.2 10.0.0.3
!
ip routing
!
router ospf 1
   redistribute connected
!
end

! device: SW2 (vEOS-lab, EOS-4.28.3M)
service routing protocols model multi-agent
vlan 10
   name BLUE
!
vlan 20
   name GREEN
!
interface Ethernet1
   no switchport
   ip address 10.1.2.2/29
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet2
   ip ospf network point-to-point
!
interface Ethernet3
   no switchport
   ip address 10.2.3.2/29
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet4
   switchport access vlan 10
!
interface Ethernet5
   switchport access vlan 20
!
interface Loopback0
   description VXLAN-VTEP
   ip address 10.0.0.2/32
!
interface Vxlan1
   vxlan source-interface Loopback0
   vxlan udp-port 4789
   vxlan vlan 10 vni 10
   vxlan vlan 20 vni 20
   vxlan flood vtep 10.0.0.1 10.0.0.3
!
ip routing
!
router ospf 1
   redistribute connected
!
end

! device: SW3 (vEOS-lab, EOS-4.28.3M)
service routing protocols model multi-agent
vlan 10
   name BLUE
!
vlan 20
   name GREEN
!
interface Ethernet1
   no switchport
   ip address 10.1.3.3/29
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet2
   no switchport
   ip address 10.2.3.3/29
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet3
   ip ospf network point-to-point
!
interface Ethernet4
   switchport access vlan 10
!
interface Ethernet5
   switchport access vlan 20
!
interface Loopback0
   description VXLAN-VTEP
   ip address 10.0.0.3/32
!
interface Vxlan1
   vxlan source-interface Loopback0
   vxlan udp-port 4789
   vxlan vlan 10 vni 10
   vxlan vlan 20 vni 20
   vxlan flood vtep 10.0.0.1 10.0.0.2
!
ip routing
!
router ospf 1
   redistribute connected
!
end

! device: R1
interface Ethernet0/0.10
 encapsulation dot1Q 10
 ip address 10.0.10.1 255.255.255.0
!
interface Ethernet0/0.20
 encapsulation dot1Q 20
 ip address 10.0.20.1 255.255.255.0

Looking at the SW1 configuration we can see that two VLANs, BLUE and GREEN, are defined with vlan id 10 and 20. The configuration under interface Vxlan1 is what enables VXLAN flood and learn behavior.

  • vxlan source-interface Loopback0 specifies which IP-address to use as source-IP when VXLAN-encapsulating traffic. This is also the IP-address that the switch will expect as the destination IP when receiving VXLAN-encapsulated traffic from SW2 or SW3. The UDP port is left at the default, 4789, as there is no need to change it.

  • vxlan vlan 10 vni 10 maps local vlan 10 to vni 10.

  • vxlan flood vtep 10.0.0.2 10.0.0.3 defines a VXLAN flood list, telling SW1 to flood BUM-traffic to SW2 (10.0.0.2) and SW3 (10.0.0.3). You can configure per-VLAN flood lists, but I decided against it here to keep things simple.
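To show where the VNI from the vlan-to-vni mapping ends up on the wire, here is a small Python sketch that builds and parses the 8-byte VXLAN header as I understand it from RFC 7348 (a flags byte with the VNI-valid bit set, followed by a 24-bit VNI and reserved bits). The function names and the VLAN_TO_VNI dictionary are my own, chosen to mirror the configuration above.

```python
import struct

# Mirrors "vxlan vlan 10 vni 10" and "vxlan vlan 20 vni 20" from the configuration.
VLAN_TO_VNI = {10: 10, 20: 20}
VXLAN_UDP_PORT = 4789  # the default destination UDP port


def vxlan_header(vni):
    """Build the 8-byte VXLAN header: flags (0x08 = VNI valid), reserved, VNI, reserved."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", 0x08 << 24, vni << 8)


def parse_vni(header):
    """Return the VNI from a VXLAN header, checking that the VNI-valid flag is set."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & 0x08:
        raise ValueError("VNI flag not set")
    return vni_word >> 8


header = vxlan_header(VLAN_TO_VNI[10])
print(header.hex(), parse_vni(header))
```

The receiving switch does the reverse lookup: it reads the VNI from the header and maps it back to its own local VLAN.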

Using this configuration, our switches are able to VXLAN-tunnel Ethernet frames between devices in a VLAN over routed links.

With the configuration explained, let's examine the VXLAN output after the network has been running for a while:

SW1#show mac address-table vlan 10
Vlan Mac Address Type    Ports
---- ----------- ----    -----
  10 R1          DYNAMIC Et1
  10 PC11        DYNAMIC Et4
  10 PC21        DYNAMIC Vx1
  10 PC31        DYNAMIC Vx1

SW1#show vxlan address-table vlan 10
VLAN  Mac Address Type    Prt VTEP
----  ----------- ----    --- ----
  10  PC21        DYNAMIC Vx1 10.0.0.2
  10  PC31        DYNAMIC Vx1 10.0.0.3

SW1#show vxlan vni 
VNI       VLAN       Source       Interface       802.1Q Tag
--------- ---------- ------------ --------------- ----------
10        10         static       Ethernet1       10        
                                  Ethernet4       untagged  
                                  Vxlan1          10        

SW1#show ip route
 C        10.0.0.1/32 is directly connected, Loopback0
 O E2     10.0.0.2/32 [110/1] via 10.1.2.2, Ethernet2
 O E2     10.0.0.3/32 [110/1] via 10.1.3.3, Ethernet3
 C        10.1.2.0/29 is directly connected, Ethernet2
 C        10.1.3.0/29 is directly connected, Ethernet3
 O        10.2.3.0/29 [110/20] via 10.1.2.2, Ethernet2
                               via 10.1.3.3, Ethernet3

SW2#show mac address-table vlan 10
Vlan Mac Address Type    Ports
---- ----------- ----    -----
  10 PC11        DYNAMIC Vx1
  10 PC21        DYNAMIC Et4
  10 PC31        DYNAMIC Vx1

SW2#show vxlan address-table vlan 10
VLAN  Mac Address Type    Prt VTEP
----  ----------- ----    --- ----
  10  PC11        DYNAMIC Vx1 10.0.0.1
  10  PC31        DYNAMIC Vx1 10.0.0.3

SW2#show vxlan vni 
VNI       VLAN       Source       Interface       802.1Q Tag
--------- ---------- ------------ --------------- ----------
10        10         static       Ethernet4       untagged  
                                  Vxlan1          10        

SW2#show ip route
 O E2     10.0.0.1/32 [110/1] via 10.1.2.1, Ethernet1
 C        10.0.0.2/32 is directly connected, Loopback0
 O E2     10.0.0.3/32 [110/1] via 10.2.3.3, Ethernet3
 C        10.1.2.0/29 is directly connected, Ethernet1
 O        10.1.3.0/29 [110/20] via 10.2.3.3, Ethernet3
 C        10.2.3.0/29 is directly connected, Ethernet3

SW3#show mac address-table vlan 10
Vlan Mac Address Type    Ports
---- ----------- ----    -----
  10 PC11        DYNAMIC Vx1
  10 PC21        DYNAMIC Vx1
  10 PC31        DYNAMIC Et4

SW3#show vxlan address-table vlan 10
VLAN  Mac Address Type    Prt VTEP
----  ----------- ----    --- ----
  10  PC11        DYNAMIC Vx1 10.0.0.1
  10  PC21        DYNAMIC Vx1 10.0.0.2

SW3#show vxlan vni
VNI       VLAN       Source       Interface       802.1Q Tag
--------- ---------- ------------ --------------- ----------
10        10         static       Ethernet4       untagged  
                                  Vxlan1          10  

SW3#show ip route
 O E2     10.0.0.1/32 [110/1] via 10.2.3.2, Ethernet2
 O E2     10.0.0.2/32 [110/1] via 10.2.3.2, Ethernet2
 C        10.0.0.3/32 is directly connected, Loopback0
 O        10.1.2.0/29 [110/20] via 10.2.3.2, Ethernet2
 C        10.1.3.0/29 is directly connected, Ethernet1
 C        10.2.3.0/29 is directly connected, Ethernet2 

Focusing on the MAC-address table of SW1, it has four entries: PC11, PC21, PC31 and R1. PC11 and R1 are locally connected to their respective Ethernet ports. PC21 and PC31, however, are reachable via VXLAN. There are no more VXLAN-details in this table.

Looking at the VXLAN-address table of SW1, we can see more details about PC21 and PC31. PC21 is reachable via VTEP 10.0.0.2, so we now know what destination IP-address to set when VXLAN-encapsulating traffic to PC21.

Last in the output is the routing table, showing that the best path to 10.0.0.2 is via Ethernet2.
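The three tables form a lookup chain: MAC-address table, then VXLAN-address table, then routing table. Here is a rough Python sketch of that chain using values copied from the SW1 output above. The dictionaries and the function names route_lookup and resolve are only for illustration, and I use the host names as stand-ins for real MAC-addresses just like the output does.

```python
import ipaddress

# Values copied from the SW1 show output above.
mac_table = {"R1": "Et1", "PC11": "Et4", "PC21": "Vx1", "PC31": "Vx1"}
vxlan_table = {"PC21": "10.0.0.2", "PC31": "10.0.0.3"}
routes = {
    "10.0.0.2/32": ("10.1.2.2", "Ethernet2"),
    "10.0.0.3/32": ("10.1.3.3", "Ethernet3"),
    "10.1.2.0/29": (None, "Ethernet2"),  # directly connected
    "10.1.3.0/29": (None, "Ethernet3"),  # directly connected
}


def route_lookup(dst_ip):
    """Longest-prefix match of dst_ip against the routes dictionary."""
    dst = ipaddress.ip_address(dst_ip)
    matches = [ipaddress.ip_network(p) for p in routes if dst in ipaddress.ip_network(p)]
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[str(best)]


def resolve(mac):
    """Follow the chain: MAC table -> VXLAN address table -> routing table."""
    port = mac_table[mac]
    if port != "Vx1":
        return {"port": port}  # locally attached, no encapsulation needed
    vtep = vxlan_table[mac]
    next_hop, interface = route_lookup(vtep)
    return {"port": port, "vtep": vtep, "next_hop": next_hop, "interface": interface}


print(resolve("PC11"))
print(resolve("PC21"))
```

For PC21 the result is VTEP 10.0.0.2 reached via next hop 10.1.2.2 on Ethernet2, matching what we read from the tables by hand.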

ARP request packet walk

To understand how VXLAN flood and learn operates, let's imagine that the network has just booted up and that PC11 (10.0.10.11) is trying to communicate with PC31 (10.0.10.31). Because all nodes are fresh, PC11 has no idea about the MAC-address of PC31, so it sends out an ARP request. We will examine how SW1, SW2 and SW3 handle MAC-learning and populate their MAC-address and VXLAN-address tables as the ARP request flows through the network. Let's go through it in detail, starting with PC11.

Step 1:

PC11 doesn't know the MAC-address of PC31 (10.0.10.31) so it generates an ARP request. The Ethernet header destination MAC-address is FF:FF:FF:FF:FF:FF, making it a broadcast frame. The diagram below shows the ARP request from PC11 sent to SW1:

SW1 receives the broadcast frame on vlan 10 (Blue). The source MAC-address in the Ethernet header is added as an entry in the vlan Blue MAC-address table, allowing SW1 to remember where to send traffic destined for PC11. This is standard Ethernet MAC-learning.

Then SW1 examines the destination MAC-address and realizes that it's a broadcast frame, so it floods it out on all other ports in the VLAN. As part of the flooding process, SW1 sends one copy to each VTEP in the Flood list: 10.0.0.2 and 10.0.0.3. So SW1 creates two VXLAN-encapsulated copies of the broadcast frame and sends one each to SW2 and SW3.

Step 2:

SW2 and SW3 receive their copy of the VXLAN packet from SW1. They decapsulate the broadcast frame, mapping it to vlan BLUE based on the VNI. While decapsulating, they perform MAC-learning on the source MAC-address, remembering that PC11 (00:00:11) is reachable via VTEP 10.0.0.1. SW2 and SW3 then flood the broadcast frame out on all vlan BLUE switchports.

Step 3:

PC21 and PC31 receive their own copy of the ARP request sent by PC11. PC21 ignores the ARP request since it is not the intended target. PC31 processes the ARP request and generates an ARP reply, which we will cover shortly.

This packet walk is a typical example of how BUM traffic is handled by VXLAN. The ingress VTEP performs Ingress Replication and sends each egress VTEP a copy, allowing them to flood the frame out on their local switchports. Any time an Ethernet frame is received by a switch, either on a local port or on the virtual VXLAN interface after decapsulation, the source MAC-address is added as an entry in the MAC-address table for future forwarding reference. The destination MAC-address is then checked against existing MAC-address table entries and, if an entry exists, the frame is forwarded based on that information. If no entry exists, the frame is flooded out on all other local switchports (and, if it arrived on a local port, to the VTEPs in the flood list).

Because the MAC-address is only learned when the switch receives a frame, the dataplane is also the control plane. By forwarding and receiving customer/user traffic, the network learns where all endpoints live.

ARP reply packet walk

Let's look at the ARP reply from PC31. The ARP reply is a unicast frame sent with PC11 set as the destination MAC-address.

SW3 receives the ARP reply from PC31. It learns the PC31 source MAC-address in a normal MAC-learning fashion. The destination MAC-address is a unicast MAC-address, and there is a matching entry in the local MAC-address table with Vx1 as egress interface. The VXLAN-address table shows that the MAC-address is reachable via VTEP 10.0.0.1, so SW3 VXLAN-encapsulates the unicast response and sends it to SW1.

SW1 receives the response from SW3, performs MAC-learning on 00:00:31 and sends the frame to PC11.

Because the ARP reply is unicast, SW2 doesn't receive it. So if PC21 wants to communicate with PC31, it has to send an ARP request of its own.
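The two packet walks can be replayed with a small Python model. This is a simplified sketch under my own assumptions: the Vtep class and its methods are invented for the example, flooding on local switchports is left out, and host names stand in for MAC-addresses.

```python
class Vtep:
    """Toy flood-and-learn VTEP. Only tunnel forwarding and MAC-learning are modelled."""

    def __init__(self, ip, flood_list):
        self.ip = ip
        self.flood_list = flood_list
        self.mac_table = {}    # mac -> local port or "Vx1"
        self.vxlan_table = {}  # mac -> remote VTEP IP

    def from_host(self, fabric, src, dst, port):
        self.mac_table[src] = port  # normal MAC-learning on the local port
        out = self.mac_table.get(dst)
        if out == "Vx1":
            targets = [self.vxlan_table[dst]]  # known remote unicast: one tunnel copy
        elif out is None:
            targets = self.flood_list          # BUM traffic: ingress replication
        else:
            targets = []                       # destination is local, nothing to tunnel
        for vtep_ip in targets:
            fabric[vtep_ip].from_tunnel(src, self.ip)

    def from_tunnel(self, src, remote_vtep):
        # MAC-learning during decapsulation: the source lives behind remote_vtep.
        self.mac_table[src] = "Vx1"
        self.vxlan_table[src] = remote_vtep


fabric = {
    "10.0.0.1": Vtep("10.0.0.1", ["10.0.0.2", "10.0.0.3"]),
    "10.0.0.2": Vtep("10.0.0.2", ["10.0.0.1", "10.0.0.3"]),
    "10.0.0.3": Vtep("10.0.0.3", ["10.0.0.1", "10.0.0.2"]),
}
sw1, sw2, sw3 = (fabric[ip] for ip in ("10.0.0.1", "10.0.0.2", "10.0.0.3"))

sw1.from_host(fabric, "PC11", "ffff.ffff.ffff", "Et4")  # ARP request (broadcast)
sw3.from_host(fabric, "PC31", "PC11", "Et4")            # ARP reply (unicast)

print(sw1.vxlan_table)  # SW1 learned PC31 from the reply
print(sw2.vxlan_table)  # SW2 only ever saw the broadcast from PC11
```

After both frames, SW2 knows PC11 but has no entry for PC31, which is the gap that EVPN closes later in the article.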

VXLAN Flood and learn conclusion

While VXLAN is an impressive protocol, flood and learn does require quite a lot of static configuration. If we add a VXLAN switch to the topology, then all other switches must update their flood list configuration to enable communication. Additionally, there is no separate control plane, so a MAC-address is only learned by a switch when it receives a frame. This creates unnecessary flooding when PC21 wants to communicate with PC31, because SW2 doesn't know where PC31 resides.

Next up we are looking at EVPN and how it can help us solve some of the issues with VXLAN flood and learn.


VXLAN EVPN

EVPN is a BGP address family that was designed to advertise MAC-addresses. This creates a separate control plane that can be used to efficiently spread information about MAC-addresses (and IP-addresses) in the network without having to flood BUM packets. While EVPN was initially designed for use with MPLS, VXLAN was quickly added and today VXLAN + EVPN is a very popular and powerful combination.

EVPN has multiple route types:

  • Type 2: MAC-IP. Advertises MAC-addresses and IP-addresses, creating a separate control plane for VXLAN.

  • Type 3: IMET. A route for telling other VXLAN switches that we want to receive BUM-traffic for a specific VNI.

  • Type 5: IP-prefix. I covered this in my VXLAN L3VPN post, so it will not be covered here.

IMET route type (Type 3)

IMET is short for Inclusive Multicast Ethernet Tag. The IMET route type allows a switch to dynamically build a flood list for a VNI. We no longer have to configure flood lists manually in the switches. Whenever a switch joins a VNI, it sends out an IMET route advertisement, telling other switches that it wants to receive BUM traffic for that VNI.

Let's configure EVPN for vlan 10 (Blue) and then examine the relevant output:

! device: SW1
service routing protocols model multi-agent
!
vlan 10
   name BLUE
!
vlan 20
   name GREEN
!
interface Ethernet1
   description R1
   switchport mode trunk
!
interface Ethernet2
   no switchport
   ip address 10.1.2.1/29
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet3
   no switchport
   ip address 10.1.3.1/29
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet4
   switchport access vlan 10
!
interface Ethernet5
   switchport access vlan 20
!
interface Loopback0
   ip address 10.0.0.1/32
!
interface Vxlan1
   vxlan source-interface Loopback0
   vxlan udp-port 4789
   vxlan vlan 10 vni 10
   vxlan vlan 20 vni 20
!
ip routing
!
router bgp 65000
   neighbor EVPN peer group
   neighbor EVPN remote-as 65000
   neighbor EVPN update-source Loopback0
   neighbor EVPN send-community
   neighbor 10.0.0.2 peer group EVPN
   neighbor 10.0.0.3 peer group EVPN
   !
   vlan 10
      rd 65000:10
      route-target import 65000:10
      route-target export 65000:10
   !
   address-family evpn
      neighbor EVPN activate
!
router ospf 1
   redistribute connected

! device: SW2
service routing protocols model multi-agent
!
vlan 10
   name BLUE
!
vlan 20
   name GREEN
!
interface Ethernet1
   no switchport
   ip address 10.1.2.2/29
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet2
   ip ospf network point-to-point
!
interface Ethernet3
   no switchport
   ip address 10.2.3.2/29
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet4
   switchport access vlan 10
!
interface Ethernet5
   switchport access vlan 20
!
interface Loopback0
   ip address 10.0.0.2/32
!
interface Vxlan1
   vxlan source-interface Loopback0
   vxlan udp-port 4789
   vxlan vlan 10 vni 10
   vxlan vlan 20 vni 20
!
ip routing
!
router bgp 65000
   neighbor EVPN peer group
   neighbor EVPN remote-as 65000
   neighbor EVPN update-source Loopback0
   neighbor EVPN send-community
   neighbor 10.0.0.1 peer group EVPN
   neighbor 10.0.0.3 peer group EVPN
   !
   vlan 10
      rd 65000:10
      route-target import 65000:10
      route-target export 65000:10
   !
   address-family evpn
      neighbor EVPN activate
!
router ospf 1
   redistribute connected

! device: SW3
service routing protocols model multi-agent
!
vlan 10
   name BLUE
!
vlan 20
   name GREEN
!
interface Ethernet1
   no switchport
   ip address 10.1.3.3/29
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet2
   no switchport
   ip address 10.2.3.3/29
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet3
   ip ospf network point-to-point
!
interface Ethernet4
   switchport access vlan 10
!
interface Ethernet5
   switchport access vlan 20
!
interface Loopback0
   ip address 10.0.0.3/32
!
interface Vxlan1
   vxlan source-interface Loopback0
   vxlan udp-port 4789
   vxlan vlan 10 vni 10
   vxlan vlan 20 vni 20
!
ip routing
!
router bgp 65000
   neighbor EVPN peer group
   neighbor EVPN remote-as 65000
   neighbor EVPN update-source Loopback0
   neighbor EVPN send-community
   neighbor 10.0.0.1 peer group EVPN
   neighbor 10.0.0.2 peer group EVPN
   !
   vlan 10
      rd 65000:10
      route-target import 65000:10
      route-target export 65000:10
   !
   address-family evpn
      neighbor EVPN activate
!
router ospf 1
   redistribute connected

Looking at the SW1 configuration, we have removed the vxlan flood vtep x.x.x.x y.y.y.y command from the interface Vxlan1 configuration, which stops BUM-traffic from being flooded between the VTEPs. To get BUM-traffic flooding again we configured BGP EVPN adjacencies between the VTEP addresses (Loopback0) of our VXLAN switches.

The most important part of the configuration is adding vlan 10 under the router bgp 65000 configuration mode. We set a route-distinguisher (RD) and a route-target (RT) as these are required for the routes to be exported and imported correctly. If you want to learn more about RDs and RTs you can read more in my article on MPLS L3VPN.

Let's examine the IMET routes that were generated after the EVPN configuration was applied to SW1-3:

SW1#show bgp evpn
          Network                Next Hop     Metric  LocPref Weight  Path
 * >      RD: 65000:10 imet 10.0.0.1
                                 -            -       -       0       i
 * >      RD: 65000:10 imet 10.0.0.2
                                 10.0.0.2     -       100     0       i
 * >      RD: 65000:10 imet 10.0.0.3
                                 10.0.0.3     -       100     0       i

SW1#show bgp evpn detail 
BGP routing table entry for imet 10.0.0.1, Route Distinguisher: 65000:10
 Paths: 1 available
  Local
    - from - (0.0.0.0)
      Origin IGP, metric -, localpref -, weight 0, valid, local, best
      Extended Community: 
         Route-Target-AS:65000:10
         TunnelEncap:tunnelTypeVxlan
      VNI: 10
      PMSI Tunnel: Ingress Replication, Tunnel ID: 10.0.0.1
BGP routing table entry for imet 10.0.0.2, Route Distinguisher: 65000:10
 Paths: 1 available
  Local
    10.0.0.2 from 10.0.0.2 (10.0.0.2)
      Origin IGP, metric -, localpref 100, weight 0, valid, internal, best
      Extended Community: 
         Route-Target-AS:65000:10
         TunnelEncap:tunnelTypeVxlan
      VNI: 10
      PMSI Tunnel: Ingress Replication, Tunnel ID: 10.0.0.2
BGP routing table entry for imet 10.0.0.3, Route Distinguisher: 65000:10
 Paths: 1 available
  Local
    10.0.0.3 from 10.0.0.3 (10.0.0.3)
      Origin IGP, metric -, localpref 100, weight 0, valid, internal, best
      Extended Community: 
         Route-Target-AS:65000:10
         TunnelEncap:tunnelTypeVxlan
      VNI: 10
      PMSI Tunnel: Ingress Replication, Tunnel ID: 10.0.0.3

SW2#show bgp evpn detail
BGP routing table entry for imet 10.0.0.1, Route Distinguisher: 65000:10
 Paths: 1 available
  Local
    10.0.0.1 from 10.0.0.1 (10.0.0.1)
      Origin IGP, metric -, localpref 100, weight 0, valid, internal, best
      Extended Community:
         Route-Target-AS:65000:10
         TunnelEncap:tunnelTypeVxlan
      VNI: 10
      PMSI Tunnel: Ingress Replication, Tunnel ID: 10.0.0.1
BGP routing table entry for imet 10.0.0.2, Route Distinguisher: 65000:10
 Paths: 1 available
  Local
    - from - (0.0.0.0)
      Origin IGP, metric -, localpref -, weight 0, valid, local, best
      Extended Community:
         Route-Target-AS:65000:10
         TunnelEncap:tunnelTypeVxlan
      VNI: 10
      PMSI Tunnel: Ingress Replication, Tunnel ID: 10.0.0.2
BGP routing table entry for imet 10.0.0.3, Route Distinguisher: 65000:10
 Paths: 1 available
  Local
    10.0.0.3 from 10.0.0.3 (10.0.0.3)
      Origin IGP, metric -, localpref 100, weight 0, valid, internal, best
      Extended Community:
         Route-Target-AS:65000:10
         TunnelEncap:tunnelTypeVxlan
      VNI: 10
      PMSI Tunnel: Ingress Replication, Tunnel ID: 10.0.0.3

SW3#show bgp evpn detail
BGP routing table entry for imet 10.0.0.1, Route Distinguisher: 65000:10
 Paths: 1 available
  Local
    10.0.0.1 from 10.0.0.1 (10.0.0.1)
      Origin IGP, metric -, localpref 100, weight 0, valid, internal, best
      Extended Community:
         Route-Target-AS:65000:10
         TunnelEncap:tunnelTypeVxlan
      VNI: 10
      PMSI Tunnel: Ingress Replication, Tunnel ID: 10.0.0.1
BGP routing table entry for imet 10.0.0.2, Route Distinguisher: 65000:10
 Paths: 1 available
  Local
    10.0.0.2 from 10.0.0.2 (10.0.0.2)
      Origin IGP, metric -, localpref 100, weight 0, valid, internal, best
      Extended Community:
         Route-Target-AS:65000:10
         TunnelEncap:tunnelTypeVxlan
      VNI: 10
      PMSI Tunnel: Ingress Replication, Tunnel ID: 10.0.0.2
BGP routing table entry for imet 10.0.0.3, Route Distinguisher: 65000:10
 Paths: 1 available
  Local
    - from - (0.0.0.0)
      Origin IGP, metric -, localpref -, weight 0, valid, local, best
      Extended Community:
         Route-Target-AS:65000:10
         TunnelEncap:tunnelTypeVxlan
      VNI: 10
      PMSI Tunnel: Ingress Replication, Tunnel ID: 10.0.0.3

Focusing on the SW1 output, there are three routes in the BGP EVPN RIB, all of the IMET route-type. The top route (for imet 10.0.0.1) was originated by SW1 and advertised to its BGP neighbors. The other two routes were received from SW2 and SW3, respectively.

The PMSI Tunnel attribute has its tunnel-type set to Ingress Replication (IR). When IR is set as the tunnel-type, RFC 6514 says the advertising switch must also advertise its VTEP IP-address as Tunnel ID. When a VXLAN switch receives this route, it adds the Tunnel ID IP-address to its vxlan flood list for the VLAN matching the VNI.

Thanks to these routes we have a working flood list on our VXLAN switches for vlan 10. This has eliminated the need to manually edit the vxlan flood list on every single switch whenever a switch is added or removed. BGP now does this for us!
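Here is a rough Python sketch of how a switch could turn received IMET routes into a per-VNI flood list. It is my own simplification and not the EOS implementation: the FloodListBuilder class is hypothetical, and the route with VTEP 10.0.0.4 and route-target 65000:20 is invented purely to show a route that is not imported.

```python
from collections import defaultdict


class FloodListBuilder:
    """Builds per-VNI flood lists from IMET routes (illustrative model only)."""

    def __init__(self, local_vtep, import_rts):
        self.local_vtep = local_vtep
        self.import_rts = set(import_rts)
        self.flood_lists = defaultdict(set)  # vni -> set of remote VTEP IPs

    def on_imet(self, tunnel_id, vni, route_targets, withdrawn=False):
        if tunnel_id == self.local_vtep:
            return  # our own route, never flood to ourselves
        if not self.import_rts & set(route_targets):
            return  # no matching route-target, the route is not imported
        if withdrawn:
            self.flood_lists[vni].discard(tunnel_id)
        else:
            self.flood_lists[vni].add(tunnel_id)


sw1 = FloodListBuilder("10.0.0.1", {"65000:10"})
for vtep in ("10.0.0.1", "10.0.0.2", "10.0.0.3"):
    sw1.on_imet(vtep, 10, {"65000:10"})

# Hypothetical route carrying a route-target that SW1 does not import.
sw1.on_imet("10.0.0.4", 20, {"65000:20"})
flood_list_vni10 = sorted(sw1.flood_lists[10])

# If SW3 leaves the VNI and withdraws its IMET route, it drops out of the list.
sw1.on_imet("10.0.0.3", 10, {"65000:10"}, withdrawn=True)
after_withdraw = sorted(sw1.flood_lists[10])

print(flood_list_vni10, after_withdraw)
```

The resulting flood list for VNI 10 is the same pair of VTEPs that we previously had to type in by hand with vxlan flood vtep.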

One final thing to note is that the network is still using flood and learn. The data plane is also the control plane as the switch must receive a frame for MAC-learning. Let's fix that with Type-2 EVPN routes.

MAC-IP route type (Type 2)

It's time to start advertising MAC-addresses with EVPN and see what that looks like. With this added configuration, our switches are able to build a control plane that is separated from the data plane. The main benefit of this separation is that the switches don't have to rely on BUM-traffic flooding to learn where in the topology a MAC-address lives.

For example, as soon as PC11 starts sending frames, SW1 will perform MAC-learning and add PC11's MAC-address to its vlan 10 MAC-address table. By also advertising the MAC-address to SW2 and SW3, SW1 lets them know that PC11 is reachable via VTEP 10.0.0.1. If PC11 stops talking and its MAC-address entry times out on SW1, SW1 withdraws the BGP route, removing the entry from the MAC-address tables of SW2 and SW3 as well.

While this may not make a huge difference yet, if we look back at the ARP Reply Packet-walk example from earlier, SW2 would not learn the MAC-address of PC31 because the ARP reply was sent as unicast to PC11. With EVPN active, as soon as SW3 learns the MAC-address of PC31, the MAC-address will be advertised with BGP EVPN, allowing SW1 and SW2 to also learn the MAC-address.

Note: PC21 must still send out an ARP request for PC31 if it wants to start communicating, which SW2 has to flood into the network. We will look into reducing this flooding with EVPN ARP Suppression later in the article.
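The advertise-and-withdraw behaviour can be sketched in a few lines of Python. This is a toy model with names I made up (EvpnMacSync, learn_local, age_out); BGP sessions are replaced by direct method calls between the objects.

```python
class EvpnMacSync:
    """Toy model of Type 2 advertisement: local learning triggers updates to peers."""

    def __init__(self, vtep_ip):
        self.vtep_ip = vtep_ip
        self.local_macs = {}   # mac -> local port
        self.remote_macs = {}  # mac -> VTEP IP learned via EVPN
        self.peers = []

    def learn_local(self, mac, port):
        is_new = mac not in self.local_macs
        self.local_macs[mac] = port
        if is_new:  # advertise once, when the MAC-address is first learned
            for peer in self.peers:
                peer.on_update(mac, self.vtep_ip)

    def age_out(self, mac):
        # The local entry timed out: withdraw the route from all peers.
        if self.local_macs.pop(mac, None) is not None:
            for peer in self.peers:
                peer.on_withdraw(mac, self.vtep_ip)

    def on_update(self, mac, vtep):
        self.remote_macs[mac] = vtep

    def on_withdraw(self, mac, vtep):
        if self.remote_macs.get(mac) == vtep:
            del self.remote_macs[mac]


sw1, sw2, sw3 = (EvpnMacSync(ip) for ip in ("10.0.0.1", "10.0.0.2", "10.0.0.3"))
for sw in (sw1, sw2, sw3):
    sw.peers = [other for other in (sw1, sw2, sw3) if other is not sw]

sw3.learn_local("aabb.cc00.0031", "Et4")
learned_on_sw2 = dict(sw2.remote_macs)  # SW2 knows PC31 without seeing any frame

sw3.age_out("aabb.cc00.0031")
print(learned_on_sw2, sw2.remote_macs)
```

Compare this with flood and learn, where SW2 would only learn PC31 after receiving a frame sourced from it.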

What configuration must be added to deploy this awesome MAC-address advertising feature? Let's find out:

SW1#show running-config section router bgp
router bgp 65000
   vlan 10
      rd 65000:10
      route-target import 65000:10
      route-target export 65000:10
      redistribute learned

The magic command is redistribute learned under vlan 10 in the BGP configuration. Such a simple command yet so powerful! Let's examine the BGP EVPN RIB again to see what routes have been added:

SW1#show bgp evpn
          Network                Next Hop     Metric  LocPref Weight  Path
 * >      RD: 65000:10 mac-ip aabb.cc00.0011
                                 -            -       -       0       i
 * >      RD: 65000:10 mac-ip aabb.cc00.0021
                                 10.0.0.2     -       100     0       i
 * >      RD: 65000:10 mac-ip aabb.cc00.0031
                                 10.0.0.3     -       100     0       i
 * >      RD: 65000:10 imet 10.0.0.1
                                 -            -       -       0       i
 * >      RD: 65000:10 imet 10.0.0.2
                                 10.0.0.2     -       100     0       i
 * >      RD: 65000:10 imet 10.0.0.3
                                 10.0.0.3     -       100     0       i

SW1#show mac address-table vlan 10
Vlan    Mac Address       Type        Ports  
----    -----------       ----        -----  
  10    aabb.cc00.0011    DYNAMIC     Et4    
  10    aabb.cc00.0021    DYNAMIC     Vx1    
  10    aabb.cc00.0031    DYNAMIC     Vx1    

SW1#show vxlan address-table vlan 10
VLAN  Mac Address     Type      Prt  VTEP    
----  -----------     ----      ---  ----    
  10  aabb.cc00.0021  EVPN      Vx1  10.0.0.2
  10  aabb.cc00.0031  EVPN      Vx1  10.0.0.3

SW1#show bgp evpn detail
BGP routing table entry for mac-ip aabb.cc00.0011, Route Distinguisher: 65000:10
 Paths: 1 available
  Local
    - from - (0.0.0.0)
      Origin IGP, metric -, localpref -, weight 0, valid, local, best
      Extended Community: 
         Route-Target-AS:65000:10 
         TunnelEncap:tunnelTypeVxlan
      VNI: 10
      ESI: 0000:0000:0000:0000:0000
BGP routing table entry for mac-ip aabb.cc00.0021, Route Distinguisher: 65000:10
 Paths: 1 available
  Local
    10.0.0.2 from 10.0.0.2 (10.0.0.2)
      Origin IGP, metric -, localpref 100, weight 0, valid, internal, best
      Extended Community: 
         Route-Target-AS:65000:10 
         TunnelEncap:tunnelTypeVxlan
      VNI: 10
      ESI: 0000:0000:0000:0000:0000
BGP routing table entry for mac-ip aabb.cc00.0031, Route Distinguisher: 65000:10
 Paths: 1 available
  Local
    10.0.0.3 from 10.0.0.3 (10.0.0.3)
      Origin IGP, metric -, localpref 100, weight 0, valid, internal, best
      Extended Community: 
         Route-Target-AS:65000:10 
         TunnelEncap:tunnelTypeVxlan
      VNI: 10
      ESI: 0000:0000:0000:0000:0000

SW2#show bgp evpn detail
BGP routing table entry for mac-ip aabb.cc00.0011, Route Distinguisher: 65000:10
 Paths: 1 available
  Local
    10.0.0.1 from 10.0.0.1 (10.0.0.1)
      Origin IGP, metric -, localpref 100, weight 0, valid, internal, best
      Extended Community: 
         Route-Target-AS:65000:10 
         TunnelEncap:tunnelTypeVxlan
      VNI: 10
      ESI: 0000:0000:0000:0000:0000
BGP routing table entry for mac-ip aabb.cc00.0021, Route Distinguisher: 65000:10
 Paths: 1 available
  Local
    - from - (0.0.0.0)
      Origin IGP, metric -, localpref -, weight 0, valid, local, best
      Extended Community: 
         Route-Target-AS:65000:10 
         TunnelEncap:tunnelTypeVxlan
      VNI: 10
      ESI: 0000:0000:0000:0000:0000
BGP routing table entry for mac-ip aabb.cc00.0031, Route Distinguisher: 65000:10
 Paths: 1 available
  Local
    10.0.0.3 from 10.0.0.3 (10.0.0.3)
      Origin IGP, metric -, localpref 100, weight 0, valid, internal, best
      Extended Community: 
         Route-Target-AS:65000:10 
         TunnelEncap:tunnelTypeVxlan
      VNI: 10
      ESI: 0000:0000:0000:0000:0000

SW3#show bgp evpn
Route status codes: * - valid, > - active, S - Stale, E - ECMP head, e - ECMP
                    c - Contributing to ECMP, % - Pending BGP convergence
Origin codes: i - IGP, e - EGP, ? - incomplete
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop
          Network                Next Hop              Metric  LocPref Weight  Path
 * >      RD: 65000:10 mac-ip aabb.cc00.0011
                                 10.0.0.1              -       100     0       i
 * >      RD: 65000:10 mac-ip aabb.cc00.0021
                                 10.0.0.2              -       100     0       i
 * >      RD: 65000:10 mac-ip aabb.cc00.0031
                                 -                     -       -       0       i
 * >      RD: 65000:10 imet 10.0.0.1
                                 10.0.0.1              -       100     0       i
 * >      RD: 65000:10 imet 10.0.0.2
                                 10.0.0.2              -       100     0       i
 * >      RD: 65000:10 imet 10.0.0.3
                                 -                     -       -       0       i

To keep the output at a manageable length I removed the IMET routes from the detailed output. We can see that the MAC-addresses of PC11, PC21 and PC31 are now advertised by EVPN. The receiving switches have installed the MAC-addresses into their local MAC-address and VXLAN-address tables.

A new field, ESI (Ethernet Segment Identifier), can be spotted in the output. The default value is all zeroes, meaning this Ethernet segment is single-homed. ESI-based multihoming is an open, standards-based alternative to setting up an MLAG Port-Channel. It allows a device to set up a Port-Channel (LAG) to two VXLAN switches while eliminating the need for an MLAG configuration on those two switches. I cover ESI in my EVPN Multihoming article.


ARP Suppression

One powerful feature of the type 2 MAC-IP route is that it can advertise ARP-entries, which in turn allow a VXLAN switch to perform ARP Suppression. As soon as an ARP entry for PC21 is generated on SW2, it will advertise the MAC-address + IP-address combo with EVPN to SW1 and SW3.

If PC11 then wants to communicate with PC21, it will send an ARP request. Instead of SW1 flooding the ARP request to SW2 and SW3, it uses its local EVPN-generated ARP entry to immediately send an ARP reply back to PC11. Thanks to this ARP Suppression mechanism, many ARP packets are never flooded into the network, lowering BUM-traffic overhead as shown below:

When the ARP-entry on SW1 expires, the IP-address is removed from the MAC-IP route. This removal tells SW2 and SW3 to remove their ARP entries.
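The decision a switch makes for each incoming ARP request can be sketched as follows. This is a simplified illustration with a hypothetical function name, handle_arp_request, and a plain dictionary standing in for the EVPN-learned ARP entries.

```python
def handle_arp_request(target_ip, arp_cache, flood_list):
    """Answer locally if EVPN already supplied the binding, otherwise flood."""
    mac = arp_cache.get(target_ip)
    if mac is not None:
        # ARP Suppression: reply on behalf of the target, nothing enters the fabric.
        return {"action": "reply", "mac": mac, "flooded_to": []}
    # No binding known: fall back to flooding the request to every VTEP.
    return {"action": "flood", "mac": None, "flooded_to": list(flood_list)}


# SW1 has learned PC21's MAC-address and IP-address from a MAC-IP route.
sw1_arp_cache = {"10.0.10.21": "aabb.cc00.0021"}
sw1_flood_list = ["10.0.0.2", "10.0.0.3"]

known = handle_arp_request("10.0.10.21", sw1_arp_cache, sw1_flood_list)
unknown = handle_arp_request("10.0.10.31", sw1_arp_cache, sw1_flood_list)
print(known)
print(unknown)
```

The request for PC21 is answered directly by SW1, while the request for an address without an EVPN-learned entry is still flooded as before.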

For a switch to process ARP packets and perform ARP Suppression, it must have a VLAN-interface and IP-address configured in that subnet. Let's examine that configuration and see how it operates:

SW1#sh run int vlan10
interface Vlan10
   ip address 10.0.10.1/24
   arp aging timeout 290

SW2#sh run int vlan10
interface Vlan10
   ip address 10.0.10.2/24
   arp aging timeout 290

SW3#sh run int vlan10
interface Vlan10
   ip address 10.0.10.3/24
   arp aging timeout 290

A very simple configuration: we add a VLAN-interface for vlan 10 and configure a unique IP-address on each switch to avoid IP-address collisions. The switches are now able to create ARP entries that can be advertised with EVPN. I will cover the arp aging timeout command further down.

To generate an ARP entry, we need to communicate with a device. So I tell SW2 to communicate with PC21 with the ping 10.0.10.21 command, forcing SW2 to generate an ARP request to ask PC21 which destination MAC-address to use. In the real world, the IP-address on the VLAN-interface is typically the default gateway for all devices in the subnet, so ARP entries are naturally kept valid while the switch is routing traffic in and out of the subnet.

SW1#show bgp evpn
 * >      RD: 65000:10 mac-ip aabb.cc00.0021 10.0.10.21
                                 10.0.0.2              -       100     0       i
 * >      RD: 65000:10 mac-ip aabb.cc00.0031
                                 10.0.0.3              -       100     0       i
SW1#show arp
Address         Age (sec)  Hardware Addr   Interface
10.0.10.21              -  aabb.cc00.0021  Vlan10, Vxlan1

SW1#show mac address-table 
Vlan    Mac Address       Type        Ports      Moves   Last Move
----    -----------       ----        -----      -----   ---------
  10    aabb.cc00.0021    DYNAMIC     Vx1        1       0:39:46 ago

SW1#show vxlan address-table vlan 10
VLAN  Mac Address     Type      Prt  VTEP             Moves   Last Move
----  -----------     ----      ---  ----             -----   ---------
  10  aabb.cc00.0021  EVPN      Vx1  10.0.0.2         1       0:39:50 ago

This shows the output on SW1 after receiving the MAC-IP route for PC21 from SW2. The most interesting line is the first mac-ip route: the MAC-IP route for PC21 now shows both a MAC-address and an IP-address. These two are combined into an ARP entry on SW1, also shown above. Of course, the MAC-address is imported into the MAC-address and VXLAN-address tables as usual.

Thanks to this ARP-entry on SW1, whenever PC11 wants to communicate with PC21 and sends an ARP request, SW1 can immediately reply without having to flood the request to the rest of the network.
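To make the relationship between the route and the tables concrete, here is a rough Python sketch that reads the two mac-ip routes from the show bgp evpn output above and derives a MAC-to-VTEP mapping and an IP-to-MAC (ARP) mapping. The function name and the parsing approach are my own assumptions, written only against the exact output format shown in this article.

```python
import re

SAMPLE = """\
 * >      RD: 65000:10 mac-ip aabb.cc00.0021 10.0.10.21
                                 10.0.0.2              -       100     0       i
 * >      RD: 65000:10 mac-ip aabb.cc00.0031
                                 10.0.0.3              -       100     0       i
"""

# The IP-address after the MAC-address is optional in a mac-ip route.
ROUTE_RE = re.compile(r"mac-ip\s+(?P<mac>\S+)(?:\s+(?P<ip>\S+))?")


def tables_from_routes(text):
    """Return (mac_to_vtep, ip_to_mac) built from the mac-ip routes in text."""
    mac_to_vtep, ip_to_mac = {}, {}
    lines = text.splitlines()
    for i, line in enumerate(lines):
        match = ROUTE_RE.search(line)
        if match is None or i + 1 >= len(lines):
            continue
        fields = lines[i + 1].split()
        # A next hop of "-" marks a locally originated route, which is skipped.
        if not fields or fields[0] == "-":
            continue
        mac_to_vtep[match["mac"]] = fields[0]
        if match["ip"] is not None:
            ip_to_mac[match["ip"]] = match["mac"]
    return mac_to_vtep, ip_to_mac


mac_table, arp_table = tables_from_routes(SAMPLE)
print(mac_table)
print(arp_table)
```

Both MAC-addresses end up in the MAC mapping, but only PC21 gets an ARP entry, since the route for PC31 carries no IP-address.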

ARP Suppression Drawbacks

One drawback of ARP suppression in our scenario is that many applications rely on ARP and Gratuitous ARP for keepalive between nodes in a cluster. If we suppress these ARP packets then these applications break.

Another problem appears if you include Inter-VLAN routing (with Asymmetric IRB) in your VXLAN design. Here the problem is related to the synchronization of MAC-address entries and ARP entries. By default, a MAC-address entry times out and is removed after 5 minutes of inactivity, while an ARP entry times out after 4 hours.

Let's imagine a scenario where SW1 is doing Inter-VLAN routing for traffic from PC11 (vlan BLUE) to PC32 (vlan GREEN). This involves SW1 locally learning the PC11 ARP entry on vlan BLUE and learning the PC32 ARP entry on vlan GREEN from SW3. PC11 and PC32 run a long-lived TCP connection that for some reason goes quiet for 10 minutes. The MAC-address for PC32 times out on SW3, which withdraws the MAC-IP route and forces SW1 to remove its ARP entry for PC32.

When PC11 starts communicating with PC32 again, SW1 has no way of forwarding the traffic as it doesn't know how to reach PC32 anymore, so it holds the packets from PC11 while it sends out an ARP request on vlan GREEN to find PC32 again. If the reply is too slow, SW1 has to drop the packet from PC11 and return an "ICMP Host Unreachable". This will trigger PC11 to kill the long-lived TCP session.

To combat this problem, it is a best practice to lower the arp aging timeout to slightly less than that of the MAC-address entry timeout. In my case I lowered the value to 290 seconds. The Arista switch has a built-in ARP Refresh mechanism which sends out a new ARP Request whenever an ARP entry is about to expire. With this configuration SW3 will generate an ARP Request for PC32 every 290 seconds and flood it out on its local switchports. As long as PC32 responds, the MAC and ARP entries stay active and no routes are withdrawn.
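The timer interaction is easy to check with a little arithmetic. The Python sketch below is a simplified model under my own assumptions: both timers start when the host sends its last frame, the host answers every ARP refresh, and each reply resets the MAC aging timer. The function name is hypothetical, and the defaults are the 300 second MAC timeout and 4 hour ARP timeout mentioned earlier.

```python
def mac_entry_survives(silence, arp_timeout, mac_timeout=300):
    """Return True if the MAC entry outlives `silence` seconds without traffic.

    Simplified model: the only frames from the host during the silent
    period are replies to the switch's ARP refresh requests.
    """
    last_seen = 0                 # last time a frame from the host was seen
    next_refresh = arp_timeout    # when the switch sends its next ARP request
    while next_refresh <= silence:
        if next_refresh - last_seen >= mac_timeout:
            return False          # MAC entry aged out before the refresh fired
        last_seen = next_refresh  # the ARP reply refreshes the MAC entry
        next_refresh += arp_timeout
    return silence - last_seen < mac_timeout


# 10 minutes of silence with the default 4 hour ARP timeout
print(mac_entry_survives(silence=600, arp_timeout=14400))
# Same silence with the ARP timeout lowered to 290 seconds
print(mac_entry_survives(silence=600, arp_timeout=290))
```

In this model the default timers lose the MAC entry during the 10 minute pause, while a 290 second ARP timeout refreshes it at 290 and 580 seconds, so the entry is still present when traffic resumes.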

So the fix is simple: make the ARP entry time out before the MAC entry does. There may be other caveats to this of which I am unaware. Search your favorite engine for "ARP Problems in EVPN" if you want to learn more.


Conclusion

This ends my article on VXLAN L2VPN. We have looked at pure VXLAN configuration as well as VXLAN + EVPN configuration. We have looked at some of the benefits and drawbacks associated with flood and learn versus separating the data plane from the control plane. Lastly we looked at ARP Suppression, a fascinating topic in my opinion.

If you want more to read, please consider other posts in my VXLAN series:


Copyright 2021-2023, Emil Eliasson.
All Rights Reserved.