MPLS L3VPNs – Part 1

MPLS Layer 3 VPNs (MPLS L3VPN), also called MPLS IP VPNs, are one of the most widely used applications of MPLS in today’s networks. This service is most popular with Service Providers but can also be found in Enterprise WAN and data center environments. This post gives an overview of the different technologies used to create an MPLS IP VPN network and how they interact with each other.

MPLS L3 VPN is, like the name says, a way to create a Layer 3 VPN by harnessing the power of MPLS. “L3” or Layer 3 refers to the third layer of the OSI model, and it is basically just a fancy way of saying that we are segregating at the network layer by creating a separate routing and forwarding table for each VPN. This means that VPNs can use overlapping IP address space while running on a shared backbone network.

MPLS VPN Connection Model:

Let’s start by looking at a common topology for an MPLS VPN network. This is what a simplified Service Provider network would look like:

[Figure: simplified Service Provider MPLS VPN topology]

The purpose of the network in this example is to send traffic for each customer from one site to another over the routed Service Provider network while maintaining a separate routing and forwarding instance for each customer. In an enterprise environment, the same topology could be used, but with the goal of separating traffic between different departments, applications, groups, services or any other logical domain the business requires. This network uses several components and technologies, and we will explore them one by one.

The first part of this network is the Core MPLS Network, made of Provider (P) devices.

[Figure: the core MPLS network, made of P devices]

P devices sit inside the MPLS network and run a link-state routing protocol (OSPF or IS-IS) along with an MPLS label distribution protocol on all of their interfaces. Several mechanisms can be used to distribute labels: Label Distribution Protocol (LDP), Resource Reservation Protocol with Traffic Engineering extensions (RSVP-TE), Constraint-based Routing LDP (CR-LDP), Multiprotocol BGP (MP-BGP) and Segment Routing. We will focus on LDP for simplicity and because it is possibly the most widely used.

A quick overview of LDP:

Label Distribution Protocol (LDP) is a protocol used to form Label-Switched Paths (LSPs) by using the existing routing table built by an IGP. LSPs are sequences of MPLS-enabled devices that forward packets of a certain Forwarding Equivalence Class (FEC). A FEC is a set of packets that a device forwards to the same next hop, out of the same interface and with the same treatment. If this all sounds confusing, just think of an LSP as a unidirectional tunnel between two devices for packets that share common characteristics. Without going into too much detail, LDP will create an LSP through the P devices, and that LSP will be used to forward customer traffic across the MPLS network.

The second part of this network is the Provider Edge (PE) devices.

[Figure: PE devices at the edge of the MPLS network]

PE devices sit at the edge of the MPLS network, in between Provider (P) and Customer Edge (CE) devices. They run MPLS/LDP towards the P devices and IP towards the CE devices. By running LDP with the P devices, the PEs are able to create an LSP to the other PEs. The labels forming this LSP are used to forward packets over the MPLS core and are commonly called the IGP label or transport label.

PEs also need to create an MP-BGP session to the other PEs to exchange VPNv4 information as follows:

[Figure: MP-BGP sessions between the PE devices]

We will see later exactly what these BGP updates contain, but one of the key pieces of information exchanged is the label that the remote PE will impose for each VPN route, more commonly known as the VPN label.

PE devices also have another very important role: holding a separate routing instance for every customer. This is made possible by using a separate Virtual Routing and Forwarding (VRF) instance for each customer.

A quick overview of VRFs:

Virtual Routing and Forwarding (VRF) is a technology that allows multiple routing and forwarding tables to exist on a single device. It is basically the Layer 3 equivalent of a VLAN. Using VRFs on their own, without MPLS, is called VRF-lite; within the context of MPLS it is just called a VRF. The real difference is that VRF-lite does not make use of two critical components of MPLS IP VPNs: Route Distinguishers (RDs) and Route Targets (RTs).
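To make the idea of per-VRF tables concrete, here is a minimal Python sketch, not anything a router actually runs. The VRF names, prefixes and next-hop names are invented for illustration. It only shows that the same destination address can resolve differently depending on which VRF the lookup happens in.

```python
import ipaddress

# Hypothetical per-VRF routing tables on a single PE. The VRF names,
# prefixes and next-hop names are assumptions made up for this sketch.
vrf_tables = {
    "CUSTOMER1": {"10.0.0.0/24": "CE1-A", "0.0.0.0/0": "CE1-HQ"},
    "CUSTOMER2": {"10.0.0.0/24": "CE2-A"},
}


def lookup(vrf, destination):
    """Longest-prefix match restricted to the table of one VRF."""
    address = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in vrf_tables[vrf].items():
        network = ipaddress.ip_network(prefix)
        if address in network:
            if best is None or network.prefixlen > best[0].prefixlen:
                best = (network, next_hop)
    return best[1] if best else None


# The same destination resolves per VRF, so overlapping prefixes do not clash.
print(lookup("CUSTOMER1", "10.0.0.5"))
print(lookup("CUSTOMER2", "10.0.0.5"))
```

Because each lookup only ever sees one table, overlapping customer address space is not a problem on the PE itself.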

Route Distinguishers (RDs) are defined within a VRF’s configuration and their only purpose in L3VPNs is to make an IPv4 prefix globally unique for route exchange. This is important so that the service provider can distinguish between identical prefixes from different customers.

Let’s say Customer 1 and Customer 2 both use the same network and are advertising this prefix through the Service Provider network. How would the Service Provider know which customer a route came from? It can’t, unless there is something to differentiate the two routes. That is exactly why the Route Distinguisher exists. During the route exchange process, the Route Distinguisher is prepended to the IPv4 prefix as follows:

[Figure: Route Distinguisher prepended to the IPv4 prefix to form a VPNv4 prefix]
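As a rough illustration of the same idea in code, the sketch below treats a VPNv4 prefix as the pair (RD, IPv4 prefix). The RD 200:1 is the one used later in this post. The second RD (200:2) and the prefix 10.0.0.0/24 are assumptions chosen only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vpnv4Prefix:
    rd: str      # Route Distinguisher, e.g. "200:1"
    prefix: str  # the customer's IPv4 prefix

    def __str__(self):
        # The RD goes in front of the IPv4 prefix.
        return f"{self.rd}:{self.prefix}"


def to_vpnv4(rd, ipv4_prefix):
    """Build the VPNv4 form of a customer prefix."""
    return Vpnv4Prefix(rd, ipv4_prefix)


# Two customers advertising the same (hypothetical) IPv4 prefix.
customer1 = to_vpnv4("200:1", "10.0.0.0/24")
customer2 = to_vpnv4("200:2", "10.0.0.0/24")

# Both routes can coexist in one table because the keys differ.
vpnv4_table = {customer1: "Customer1", customer2: "Customer2"}
print(customer1, customer2, len(vpnv4_table))
```

Without the RD, the second entry would simply overwrite the first one in the table.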

Route Targets (RTs) are also defined within a VRF configuration, in a format similar to the Route Distinguisher, but they have a completely different purpose. They are carried as a BGP extended community attribute and tell the PE device which prefixes should be exported from a VRF into VPNv4 and which VPNv4 prefixes should be imported into a VRF.

Again, we will see how this whole process works later in this post, but for now just remember that the Route Distinguisher makes an IPv4 prefix unique across the MPLS network and the Route Target defines which prefixes get imported or exported on the PE devices.

The third and final part of the network that we need to look at is the CE devices:
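The import/export logic can be sketched in a few lines of Python. This is a simplified model, not how any BGP implementation is written: the VRF names, the second Route Target (200:2) and the prefix are assumptions, and for brevity the same VRF definitions are used for the exporting and the importing PE.

```python
# Hypothetical VRF definitions. Route Target 200:1 and label 21 appear in
# this post's example; everything else here is invented for illustration.
vrfs = {
    "CUSTOMER1": {"import": {"200:1"}, "export": {"200:1"}},
    "CUSTOMER2": {"import": {"200:2"}, "export": {"200:2"}},
}


def export_route(vrf_name, rd, prefix, label):
    """Turn a VRF route into a VPNv4 route tagged with the VRF's export RTs."""
    return {
        "rd": rd,
        "prefix": prefix,
        "label": label,
        "route_targets": set(vrfs[vrf_name]["export"]),
    }


def importing_vrfs(vpnv4_route):
    """Return the VRFs whose import RTs match at least one RT on the route."""
    return sorted(
        name
        for name, vrf in vrfs.items()
        if vrf["import"] & vpnv4_route["route_targets"]
    )


route = export_route("CUSTOMER1", "200:1", "10.0.0.0/24", 21)
print(importing_vrfs(route))
```

Note that the RD plays no part in the import decision here. Only the Route Targets do, which is why the two values can be chosen independently.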

[Figure: CE devices attached to the PE devices]

CE devices sit at the edge of the customer network and exchange IP routes with the PE devices using an IP routing protocol (BGP, OSPF, EIGRP, IS-IS, RIP). On the CE devices, routing is done in the global routing table (the normal non-VRF routing table), but on the PE, routing is done in a separate VRF for each customer. VRFs are locally significant, which means that the CE does not need to know whether the PE interface is in a VRF or not. From the perspective of the customer, the routes advertised to the PE device are tunneled transparently across the service provider network and advertised to the CE at the other end without any special configuration.

To better understand the whole process, let’s look at the life of an IP packet as it moves across the Service Provider network, from signaling (control plane) to forwarding (data plane). For simplicity, we will only look at Customer1 in a single direction (unidirectional traffic).

  1. The first step is to have LDP do its magic. LDP label binding and signaling is done hop by hop between the loopbacks of the two PE devices, forming an LSP.

[Figure: LDP label signaling along the LSP between the PE loopbacks]

In this case the LSP is created for the Loopback0 prefix, and the labels signaled are: Null -> 33 -> 16 -> 40. The first label is signaled as (implicit) Null because of the Penultimate Hop Popping (PHP) function. PHP is simply an optimization that avoids a double lookup at the last device of the LSP, the Label Edge Router (LER). You can read more about this feature in RFC 3031, section 3.16.
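Here is a small Python sketch of how those four advertisements turn into per-hop forwarding entries. The labels (Null, 33, 16, 40) come from the example above. The router names and the assumption of exactly three P devices between the PEs are mine (four advertisements imply five devices on the path). Implicit Null is the reserved label value 3.

```python
IMPLICIT_NULL = 3  # reserved label value that asks the upstream router to pop

# Assumed path from ingress PE to egress PE (router names are hypothetical).
path = ["PE1", "P1", "P2", "P3", "PE2"]

# Label each router advertises to its upstream neighbor for the egress PE's
# loopback: Null -> 33 -> 16 -> 40, read from the egress PE backwards.
advertised = {"PE2": IMPLICIT_NULL, "P3": 33, "P2": 16, "P1": 40}


def build_lfib(path, advertised):
    """Derive each router's label operation from its neighbors' bindings."""
    lfib = {}
    for index, router in enumerate(path[:-1]):
        downstream = path[index + 1]
        out_label = advertised[downstream]
        in_label = advertised.get(router)  # None on the ingress PE
        if out_label == IMPLICIT_NULL:
            action = "pop"    # penultimate hop popping
        elif in_label is None:
            action = "push"   # ingress PE imposes the label
        else:
            action = "swap"
        lfib[router] = {
            "in": in_label,
            "action": action,
            "out": None if action == "pop" else out_label,
            "next_hop": downstream,
        }
    return lfib


lfib = build_lfib(path, advertised)
for router, entry in lfib.items():
    print(router, entry)
```

The last P device ends up with a pop entry rather than a swap, which is exactly the PHP behavior described above.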

  2. In the second step, the customer sends a route update from the CE device, advertised through the routing protocol configured between the CE and the PE. When it reaches the PE, it is signaled through MP-BGP to the other PE as a VPNv4 route: the prefix with a Route Distinguisher prepended, plus a Route Target and a label. In this case, the route is sent with Route Distinguisher 200:1, Route-Target 200:1 and Label 21.

[Figure: VPNv4 route advertisement between the PEs over MP-BGP]

The update packet in Wireshark looks like this:


In the first box you can see the Route-Target value of 200:1. In the third box you can see the Label value of 21 sent for the IPv4 route. Finally, in the last box you can see the Route-Distinguisher value that makes the route globally unique.

  3. Finally, the customer at the opposite side sends packets for the signaled route. These packets are forwarded from the CE to the PE as IPv4. The PE encapsulates the IPv4 packets with two labels to forward them over the MPLS backbone: the VPN label and the IGP label. Again, the VPN label was signaled by MP-BGP and tells the remote PE which VRF the traffic belongs to. The IGP label was signaled by LDP and is used to forward packets hop by hop across the MPLS backbone.

[Figure: packet forwarding across the MPLS backbone with the IGP and VPN labels]

In this example, when the IPv4 packet enters the ingress PE from the CE, the PE identifies the correct VRF from the interface on which the packet arrived and looks up the destination IP address in that VRF’s forwarding table. It encapsulates the packet with both the VPN label (21) and the IGP label (40) and sends it to the first P device. Hop by hop, the P devices swap the IGP label, which follows the LSP towards the BGP next-hop address, and the last P device removes it before the packet reaches the PE because of the Null label. The egress PE looks up the VPN label in its forwarding table and forwards the packet without any labels (as an IP packet) towards the CE device.

Again, this example only showed unidirectional forwarding. If bidirectional traffic were required, Step 1 and Step 2 would have to happen again in the reverse direction so that the route back towards the source is installed on the remote CE device.

In conclusion, MPLS L3VPN is a mix of several different protocols forming a highly scalable and flexible VPN solution. In Part 2, I will be looking at different use cases for MPLS L3VPNs and some of the most common deployment scenarios.
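The forwarding walk can be traced with a short Python sketch. The label values (VPN label 21, IGP labels 40, 16 and 33) are the ones from this example. The router names, the number of P devices and the VRF name are assumptions for illustration, and the tables are hard-coded rather than learned.

```python
# Hypothetical label forwarding tables on the P routers:
# incoming label -> (action, outgoing label, next hop).
LFIB = {
    "P1": {40: ("swap", 16, "P2")},
    "P2": {16: ("swap", 33, "P3")},
    "P3": {33: ("pop", None, "PE2")},  # penultimate hop pops the IGP label
}

# VPN label table on the egress PE (VRF name is an assumption).
VPN_LABELS_ON_PE2 = {21: "CUSTOMER1"}


def forward():
    """Return the egress VRF and the label stack as it leaves each device."""
    stack = [40, 21]  # top of stack first: IGP label, then VPN label
    trace = [("PE1", list(stack))]
    node = "P1"
    while node in LFIB:
        action, out_label, next_hop = LFIB[node][stack[0]]
        if action == "swap":
            stack[0] = out_label
        else:
            stack.pop(0)
        trace.append((node, list(stack)))
        node = next_hop
    # The egress PE only sees the VPN label and uses it to pick the VRF.
    vrf = VPN_LABELS_ON_PE2[stack.pop(0)]
    trace.append((node, list(stack)))
    return vrf, trace


vrf, trace = forward()
for device, labels in trace:
    print(device, labels)
print("delivered into VRF", vrf)
```

Notice that the P routers never look past the top label, so they need no knowledge of customer routes or VRFs.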


Hour 440: BGP Conditional Route Injection

BGP Conditional Route Injection and Conditional Route Advertisement are two of the more advanced BGP features that I have encountered in the CCIE R&S Lab. It is important to understand the difference between them. The first will INJECT a route based on a condition and the second will ADVERTISE an already existing route based on another condition. Also, Conditional Route Injection is much more dangerous and complex to configure than Conditional Advertisement. Today, I will present a case showing why and how to configure BGP Conditional Route Injection.

Case A:


Let’s pretend you are a customer with a dual-homed ISP connection. You are currently receiving an aggregated route from both ISPs. You want to be able to route, using BGP, certain more specific prefixes through ISP1 and others through ISP2. This would be pretty easy if you had control over the ISP1 and ISP2 routers, as you could ask them to unsuppress these networks and receive them without having to change any configuration on your side. Unfortunately, the process of going through the ISPs and asking them to change configurations takes too long and you need this done today. We will use Conditional Route Injection to accomplish this. The configuration can be done in 5 steps.

Hour 413: BGP Administrative Distance manipulation

In some design situations, you might need to set up a BGP interconnection that will act as a backup link to an IGP. The problem with this situation is that by default, eBGP has an AD of 20 and takes precedence over any IGP (OSPF = 110, EIGRP = 90, IS-IS = 115, RIP = 120). Today, I will be talking about the different options to implement this sort of design and the different problems you can encounter with each of them.

Option 1: Change BGP Administrative Distance per neighbor

This is probably the easiest and most scalable way of changing the AD of BGP routes. It is done under the address-family (unicast, multicast or vrf) with the distance <AD> <neighbor> <wildcard> <optional ACL> command. This solution is scalable because you can specify the neighbor whose routes should have their AD changed, as well as an ACL that matches prefixes from that neighbor. One problem with this method is that the AD of these prefixes is changed locally only, and BGP will still re-advertise them to any eBGP neighbors that don’t have inbound filtering. This can lead to asymmetrical routing for these routes. To explain this situation, let’s take this topology as an example:


If SW1 sends traffic to SW2, by default the traffic will go through SW1-R1-R3-R2-SW2 and come back through SW2-R2-R3-R1-SW1. If we change the AD on R1, the traffic will go through SW1-R1-R2-SW2 and come back through SW2-R2-R3-R1-SW1. Again, the reason is that the AD is changed for the prefix locally only, which results in asymmetrical routing. We can fix this by also changing the AD on R2 or by strictly filtering inbound prefixes on R1 (do not accept the prefix from neighbor R2).
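The asymmetry is easier to see when each router’s decision is modeled separately. This Python sketch assumes a topology consistent with the paths above: R1 and R2 share a direct IGP link and both also learn the remote prefix over eBGP through R3. OSPF as the IGP and 200 as the new BGP distance are assumptions picked for the example.

```python
# Default administrative distances (OSPF is an assumed choice of IGP).
DEFAULT_AD = {"ebgp": 20, "ospf": 110}

# Candidate routes on each router towards the other side's network.
R1_TO_SW2 = [
    {"source": "ebgp", "path": "R1-R3-R2"},
    {"source": "ospf", "path": "R1-R2"},
]
R2_TO_SW1 = [
    {"source": "ebgp", "path": "R2-R3-R1"},
    {"source": "ospf", "path": "R2-R1"},
]


def best_route(candidates, ad_overrides=None):
    """Pick the route with the lowest AD, using this router's local values."""
    ad = {**DEFAULT_AD, **(ad_overrides or {})}
    return min(candidates, key=lambda route: ad[route["source"]])


def round_trip(r1_overrides=None, r2_overrides=None):
    """Forward path chosen by R1 and return path chosen by R2."""
    forward = best_route(R1_TO_SW2, r1_overrides)["path"]
    back = best_route(R2_TO_SW1, r2_overrides)["path"]
    return forward, back


print(round_trip())                              # defaults on both routers
print(round_trip({"ebgp": 200}))                 # AD changed on R1 only
print(round_trip({"ebgp": 200}, {"ebgp": 200}))  # AD changed on both
```

Only the third case is symmetric over the direct link, because the AD is a purely local value and R2 keeps preferring eBGP until its own distance is changed too.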

Option 2: Change BGP Administrative Distance per address-family

The second option is, of course, changing the AD of BGP per address-family. This means that you change the AD of all routes in the unicast, multicast or vrf address-family. This is done under the address-family section of the BGP process with the distance bgp <ebgp> <ibgp> <local routes> command. The problem is that this is not scalable, as all future BGP routes in that address-family will have their AD changed. Why would you ever use this then? Well, if you just don’t care about future BGP connections, this could be an option. Another reason is that on some platforms, like Nexus 3000 switches running NX-OS, which have limited BGP capabilities, Option 1 is not available as a command and this is the only solution.

Option 3: BGP backdoor

BGP backdoor is a command introduced to avoid the problem encountered in Option 1. Under the BGP process address-family, you use the network <network> mask <network mask> backdoor command. Any eBGP prefix in that address-family that matches the network command will have its AD changed from 20 to 200, and the command will not cause BGP to generate an advertisement for that network. This last part is the important one: because the network is not advertised, we avoid the problem we had in Option 1. The drawback of this option is that if you have 200 unique prefixes to change, you will have to enter 200 network <network> mask <network mask> backdoor commands in the BGP process.

Option 4: Change IGP Administrative Distance

Another approach to a BGP backup link design is to lower the IGP AD. For all IGPs in IOS (OSPF, EIGRP, RIP and IS-IS), the command is the same: under the IGP process, use distance <AD> <IP source address> <wildcard mask> <optional ACL>. This method is very similar to Option 1 and has the same problems. To avoid asymmetry, you will need to change the AD on neighboring routers as well or apply inbound filtering for the IGP.

Option 5: Change AD through PBR

This option is not available on all platforms (only Nexus 7000 switches running NX-OS, as far as I know), but I have to mention it as it might be available in future releases. It is called policy-based administrative distance. Using this method, you can change the distance of a prefix by creating a route-map. The configuration goes something like this:

route-map CHANGE-AD permit 10
  match ip address prefix-list <prefix list name>
  set distance <eBGP AD> <iBGP AD> <local AD>
router bgp <AS number>
  address-family ipv4 unicast
    table-map CHANGE-AD

Again, this option is not offered on IOS or on every NX-OS platform, so you are most likely to use the other methods.

Hope this was informative.

Hour 280: CCIE Lab vs Production example

The CCIE Lab does not follow the conventional “best practices” that you would see in a production network. To demonstrate this, I will take this topology as an example:


This is a typical CCIE Lab topology, where BGP peering is established between the loopbacks and an IGP provides connectivity between these loopbacks. In this scenario, we are trying to advertise R1’s Loopback IP address to R3 and R3’s Loopback IP address to R1. Let’s take a look at each router’s configuration:

Hour 72: BGP Review Part 2

Took me some time for this one. Part 1 can be found here.

Consult the symbols legend at the end of the post for information on symbols.


  • [(RTR)neighbor <ip> send-community] By default no communities are exchanged between any peers
  • [(RM)set community no-advertise] Do not send beyond local router
  • [(RM)set community no-export] Do not send beyond local AS
  • [(RM)set community local-AS] Do not send to EBGP sub-AS peers within a confederation. Within a single AS it works the same as no-export, but it is not recommended
  • [(RM)set community internet] Permit any: overwrites all communities and allows the prefix to be announced everywhere
  • [(RM)set comm-list <id | name> delete] Deletes the communities matched by the community list
  • [(RM)set community none] Deletes all communities
  • [ip community-list <1-99> permit|deny <value…>] Standard list, max 16 single community numbers
  • [ip community-list 1 permit 2000:100 100:2000] Multiple values in one entry are a logical AND
  • [ip extcommunity-list standard | expanded <name> <seq> permit | deny <values>]
  • [ip bgp-community new-format] Change default numbered NN:AA (represented as a single number) community format to AA:NN (AS number followed by the community number)
  • [ip community-list <100-199> permit|deny <regexp>] Expanded list, allows regular expressions
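Two of the points above can be checked with a little Python: the AA:NN format is just the 32-bit community value split into its high and low 16 bits, and a standard community-list entry with several values only matches when all of them are present. The function names are mine, and this is only a sketch of the logic, not of how IOS implements it.

```python
def to_new_format(value):
    """Split a 32-bit community into AA:NN (high 16 bits : low 16 bits)."""
    return f"{value >> 16}:{value & 0xFFFF}"


def from_new_format(text):
    """Convert an AA:NN string back into the single 32-bit number."""
    asn, number = (int(part) for part in text.split(":"))
    return (asn << 16) | number


def entry_matches(route_communities, entry_values):
    """An entry with several values is a logical AND: all must be present."""
    return set(entry_values) <= set(route_communities)


print(from_new_format("2000:100"))
print(to_new_format(from_new_format("2000:100")))
print(entry_matches({"2000:100", "100:2000", "300:1"}, ["2000:100", "100:2000"]))
print(entry_matches({"2000:100"}, ["2000:100", "100:2000"]))
```

To get a logical OR instead, you would write the values as separate entries in the same list, since a route only has to match one entry.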

REGEXP

Hour 65: BGP Review Part 1

Consult the symbols legend at the end of the post for information on symbols.


  • AD 20 EBGP
  • AD 200 IBGP
  • BGP Best Path Selection is used to determine the path used. Decision process mnemonic:

We Love Oranges As Oranges Mean Pure Perfect Refreshment

1. Weight – largest preferred

  • [(RTR)neighbor <ip> weight <weight>] Sets the weight for routes learned from that neighbor (by default 0 for routes learned from another BGP peer and 32768 for locally originated routes)
  • [(RTR)neighbor filter-list <acl> weight <#>] references an AS_PATH ACL. Any routes from the peer whose weights are not set by the [(RTR)neighbor filter-list <acl> weight <#>] command have their weights set by [(RTR)neighbor <ip> weight <weight>]
  • [(route-map)set weight <weight>] only the AS_PATH can be matched
  • Any route locally originated (network, aggregate, redistribute) is assigned weight 32768

2. Local-preference – largest preferred

  • Default Local-preference is 100
  • [(RTR)bgp default local-preference <pref>] Globally set
  • [(Route-map)set local-preference <pref>]

3. Originated Locally (in decreasing preference)

  • [(RTR)neighbor <ip> default-originate] does not require default route to be in routing table
  • [(RTR)network] default-route must be in routing table
  • [(RTR)default-information originate] use explicitly with redistribution & has to be in routing table
  • [(RTR)aggregate-address <net> <mask>] route must be in routing table

4. AS_PATH – Shortest preferred

  • Private AS range: 64512-65535 (last 1024)
  • [(RTR)bgp bestpath as-path ignore] HIDDEN COMMAND
  • Up to 4 different components: AS_SEQ, AS_SET, AS_CONFED_SEQ, AS_CONFED_SET
  • [(RTR)neighbor <ip> remove-private-as] Private AS is removed toward that neighbor. Only tail AS is removed.
  • [(RTR)neighbor <ip> local-as <as> %no-prepend% %replace-as% %dual-as%] Local AS is also seen on the router where it is configured. Local AS is prepended to all paths received from that peer, so internal routers with that native AS will see a loop. no-prepend: works for prefixes sent toward own AS. Local AS is removed. replace-as: works for outbound prefixes, replaces real AS in path with local AS.
  • [(Route-map)set as-path prepend <as> %<as>%]
  • [(RTR)bgp maxas-limit <#>] Drop paths with number of AS’s exceeding this number. Default is 75.
  • [(RTR)neighbor <ip> allowas-in] Allow own AS in the path (split AS)

5. ORIGIN Code – Lowest preferred
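The first five steps of the decision process can be sketched as a sort key in Python. This is a simplified illustration only: it stops at step 5, counts every AS in the path individually (real BGP counts an AS_SET as one and skips confederation segments), and the AS numbers and attribute values are invented.

```python
# Lower rank wins for the ORIGIN code: IGP < EGP < incomplete.
ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


def preference_key(path):
    """Tuple that sorts the most preferred path first."""
    return (
        -path["weight"],                         # 1. highest weight
        -path["local_pref"],                     # 2. highest local preference
        0 if path["locally_originated"] else 1,  # 3. locally originated
        len(path["as_path"]),                    # 4. shortest AS_PATH
        ORIGIN_RANK[path["origin"]],             # 5. lowest ORIGIN code
    )


def best_path(paths):
    return min(paths, key=preference_key)


# Hypothetical candidate paths for the same prefix.
path_a = {"name": "A", "weight": 0, "local_pref": 100,
          "locally_originated": False, "as_path": [65001, 65002], "origin": "igp"}
path_b = {"name": "B", "weight": 0, "local_pref": 200,
          "locally_originated": False, "as_path": [65001, 65002, 65003],
          "origin": "incomplete"}
path_c = {"name": "C", "weight": 100, "local_pref": 50,
          "locally_originated": False, "as_path": [65001, 65002, 65003, 65004],
          "origin": "incomplete"}

print(best_path([path_a, path_b])["name"])
print(best_path([path_a, path_b, path_c])["name"])
```

Because the comparison is done step by step, a higher weight wins even when every later attribute looks worse, which is why path C beats the other two.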

Hour 60: BGP Default Routes Originated Locally

Today, let’s have a look at the third path selection attribute: BGP routes originated locally. Let’s look in particular at default routes originated locally. There are a few ways to originate a default route in BGP: the default-information originate command, the network command and the neighbor <ip> default-originate command. But what are the differences between these methods?

The default-information originate command requires that the default route be learned through redistribution AND be in the routing table. This can be through an IGP or a static route. The network command requires the default route to be in the routing table. Again, this can be through an IGP or a static route. Finally, the neighbor <ip> default-originate command does not require a default route to be in the routing table, but it will only originate a default route for the neighboring router. What if all 3 commands are in place? What is the order of preference? Well, I didn’t find any information on this, so I set up a quick lab to test it.