It’s been almost a year since I left the military, and I thought it would be a good idea to dedicate a blog post to it. More specifically, I want to write a technical post on a type of problem I sometimes encountered while working there: GRE recursive routing. Any enterprise that runs tunneling protocols such as GRE/IPsec or VTIs will most likely encounter this type of problem at one point or another.
First, I want to split this post into two cases: Case A and Case B. They are different scenarios, but they share the same root cause.
Case A: Recursive Routing due to less specific route
In case A, we have a topology where R2 and R3 are VPN devices and R1 and R4 are routers running a GRE tunnel between each other sharing routing information through an IGP.
When working for the DND, we used INEs (Inline Network Encryptors) as VPN devices. These devices performed hardware encryption between one another. In this example, the two nodes R2 and R3 will simulate these INEs. As with most VPNs, we would run static routes between them to establish connectivity. To simulate these devices, I will run static routes on R2 and R3 to get full reachability in the network:
R1 and R4, being stubs, would each point a default route at their VPN device. With this setup, R1’s loopback can reach R4’s loopback.
Now we establish the GRE tunnel between R1 and R4, using the loopbacks as source and destination:
The tunnel comes up correctly and passes ICMP just fine. Now we will use the tunnel to carry IGP information (in this case OSPF) so that the 10.110.1.0/24 and 10.110.2.0/24 networks become reachable. So let’s add the tunnel, the loopbacks, and both 10.110.1.0/24 and 10.110.2.0/24 to OSPF area 0:
As soon as we add the loopback, the tunnel goes down and gives us a recursive routing error. A minute after that the tunnel comes back up and goes down again, continuously flapping.
Why is this happening? This is a classic recursive routing error. A recursive routing error occurs when a lookup for prefix X points at prefix Y for the next hop, and a lookup for prefix Y points back at prefix X for the next hop. In this case, the tunnel is originally established over a less specific prefix (the default route). When the router then learns a more specific route to the tunnel destination (R4’s loopback) through the tunnel itself, the lookup becomes circular: the route to R4’s loopback points at 192.168.255.4 as the next hop, and 192.168.255.4 is reached through the tunnel, whose destination is R4’s loopback. When this happens, the router disables the tunnel due to recursive routing.
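To make the loop concrete, here is a minimal Python sketch of a longest-prefix-match lookup. It is a toy model, not how IOS implements its routing table, and the loopback address 10.0.0.4 and the interface names are hypothetical values chosen only for illustration.

```python
import ipaddress

def lookup(rib, dest):
    """Longest-prefix match: return (prefix, exit interface) of the most specific route."""
    addr = ipaddress.ip_address(dest)
    matches = [(ipaddress.ip_network(p), intf) for p, intf in rib.items()
               if addr in ipaddress.ip_network(p)]
    net, intf = max(matches, key=lambda m: m[0].prefixlen)
    return str(net), intf

def tunnel_is_recursive(rib, tunnel_dest, tunnel_intf="Tunnel0"):
    """True when the best route to the tunnel destination exits via the tunnel itself."""
    _, intf = lookup(rib, tunnel_dest)
    return intf == tunnel_intf

TUNNEL_DEST = "10.0.0.4"  # hypothetical address for R4's loopback

# Only the default route exists: the tunnel destination resolves via the physical link.
rib = {"0.0.0.0/0": "FastEthernet0/0"}
before = tunnel_is_recursive(rib, TUNNEL_DEST)

# OSPF now installs a /32 for R4's loopback, learned across the tunnel.
rib["10.0.0.4/32"] = "Tunnel0"
after = tunnel_is_recursive(rib, TUNNEL_DEST)

print(before, after)  # False True
```

The /32 wins over the default route purely because it is more specific, which is why the tunnel was fine until the loopbacks were advertised.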
Resolving this problem is pretty simple: do not advertise either router’s tunnel destination interface in the IGP.
As you can see, the tunnel comes back up after a couple of seconds once the loopbacks are taken out of OSPF. This case is pretty simple to fix, but it can be hard to diagnose if you do not know how recursive routing behaves.
Case B: Recursive Routing due to better metric
In this second scenario, we will remove the VPN devices from the topology and simply have three devices connected in a V shape, as follows:
For this topology, we will run RIPv2 as our IGP, and we will run it everywhere. You may be asking yourself: why RIPv2 and not OSPF or EIGRP? We will come back to this later; for now, let’s just run RIP on every interface.
We can send data through and see that we are learning R3’s loopback through RIPv2. Now let’s establish the GRE tunnel and add it to RIPv2.
A couple of seconds after the tunnel interface is added to RIPv2, we get a recursive routing error. Again, this occurs when a lookup for prefix X points at prefix Y for the next hop, and a lookup for prefix Y points back at prefix X for the next hop. In this scenario, the tunnel destination is dynamically learned through the tunnel interface itself because the metric of the tunnel (1 hop) is better than the metric through the physical path (2 hops). Before we explore the solution to this problem, let’s go back and explain why I said we wouldn’t be using EIGRP or OSPF for this scenario.
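The metric comparison behind this can be sketched in a few lines of Python. This is only an illustration of lowest-hop-count selection; the interface names are hypothetical, and the hop counts are the ones quoted above.

```python
def best_route(candidates):
    """Pick the candidate with the lowest hop count, which is how RIP compares routes."""
    return min(candidates, key=lambda route: route["hops"])

# Two ways for R1 to reach R3's loopback (the tunnel destination).
candidates = [
    {"via": "FastEthernet0/0", "hops": 2},  # physical path through the middle router
    {"via": "Tunnel0", "hops": 1},          # the GRE tunnel looks like a direct link
]

winner = best_route(candidates)
print(winner["via"])  # Tunnel0
```

Because RIP only counts hops, it cannot tell that the one-hop tunnel actually rides on top of the two-hop physical path, so the tunnel route replaces the very route the tunnel depends on.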
In OSPF and EIGRP, something special happens when we create a tunnel: the tunnel interface gets an artificially poor metric, which prevents this kind of problem. OSPF uses cost as its metric and EIGRP uses a combination of bandwidth and delay (by default). Let’s look at these two metrics on an enabled tunnel interface:
As you can see, both of these IGPs end up with an artificially high metric on the tunnel interface. This means that in order to reproduce this problem, we would have to manually lower the metric on the tunnel interface with the ip ospf cost x (for OSPF) or delay x (for EIGRP) command.
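For the OSPF side, the effect can be checked with a little arithmetic. The sketch below assumes the default reference bandwidth of 100 Mbit/s and integer division. The tunnel bandwidth values of 9 kbit/s and 100 kbit/s are assumptions based on defaults commonly reported for older and newer IOS releases, so check show interface on your own platform rather than trusting them.

```python
def ospf_cost(bandwidth_kbps, reference_bw_mbps=100):
    """OSPF interface cost = reference bandwidth / interface bandwidth, clamped to 1..65535."""
    cost = (reference_bw_mbps * 1000) // bandwidth_kbps
    return max(1, min(cost, 65535))

# Assumed tunnel bandwidths (platform dependent), compared with a 100 Mbit/s physical link.
print(ospf_cost(9))        # 11111
print(ospf_cost(100))      # 1000
print(ospf_cost(100_000))  # 1
```

With costs in that range, any physical path is preferred over the tunnel, so the route to the tunnel destination is never learned through the tunnel unless someone overrides the cost.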
How can we solve this problem? There are several solutions:
– Do not advertise the tunnel endpoints in the routing protocol that runs over the GRE path
– Use route filtering to prevent the tunnel endpoints from being learned over the GRE tunnel
– Do not use the same routing protocol over both the physical and the tunnel path: this is the easiest fix, but it requires running a second routing protocol
– Do not advertise the same network in the routing protocol over both the physical and the GRE path
– Use VRF-aware GRE tunnels to separate the routing tables
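The route filtering option from the list above can be modelled as a small Python sketch. It is a conceptual illustration of an inbound filter on the tunnel, not router configuration, and the addresses (10.0.0.3 for R3’s loopback, 172.16.3.0/24 for a LAN behind R3) are hypothetical.

```python
import ipaddress

def filter_tunnel_updates(updates, tunnel_endpoints):
    """Drop any prefix received over the tunnel that covers one of the tunnel endpoints."""
    endpoints = [ipaddress.ip_address(e) for e in tunnel_endpoints]
    kept = []
    for prefix in updates:
        net = ipaddress.ip_network(prefix)
        if any(endpoint in net for endpoint in endpoints):
            continue  # accepting this route would point the tunnel destination at the tunnel
        kept.append(prefix)
    return kept

# Prefixes R1 hears from R3 across the tunnel (hypothetical values).
updates = ["10.0.0.3/32", "172.16.3.0/24"]
accepted = filter_tunnel_updates(updates, ["10.0.0.3"])

print(accepted)  # ['172.16.3.0/24']
```

The tunnel endpoint stays reachable only over the physical path, while every other prefix can still be learned through the tunnel.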
Unless you are running RIPv2 or unusual metrics in your network, this scenario is unlikely to happen, but you should know it for the CCIE exam. I hope this was informative.