EdgeRouter: Policy Based Routing for OpenVPN when Load Balancing dual WANs

May 9, 2016

I was trying to fix my OpenVPN site-to-site link, and because my environment had changed I had to make some adjustments. The initial OpenVPN setup is described here: https://blog.voina.org/edgerouter-dual-wan-hair-pin-multiple-networks-openvpn-site-to-site-vpn/

First of all, there is a new EdgeRouter ER-8 directly linked to the main ISP (I got it from Amazon.de, listed as "Ubiquiti ER-8 Netzwerk/Router"). In the future the plan is to also link it to a third ISP over a cable land line, so it will load balance between two land-line ISPs as well.

The current setup looks like:

[Figure: routers]

Primary Site:

ER-8 (load balancing WAN 1 and WAN 2):
– WAN 1: eth0, linked to ISP 1 through a Hitron cable modem in bridge mode, so the ER-8 gets its IP directly from the ISP.
– WAN 2: eth1, not connected.
– LAN 7: eth7, to internal LAN 2
– LAN 11: eth2, internal LAN 11

D-Link DWR-921 LTE:
– WAN 1: LTE link to Mobile service ISP.

EdgeRouter POE:
– WAN 1: eth0, IP = 192.168.7.10, linked to EdgeRouter ER-8 eth7 with gateway 192.168.7.1
– WAN 2: eth1, IP = 192.168.0.50, linked to D-Link DWR-921 LTE eth4 with gateway 192.168.0.1
– LAN 2: switch0, all the internal LAN

Remote Site:

UPC Cable Modem:
– WAN 1: eth0 linked to the ISP 1

EdgeRouter Lite:
– WAN 1: eth0, linked to UPC Cable Modem eth1 with gateway 192.168.0.1
– LAN 9: eth1, local service LAN
– LAN 10: eth2, local management LAN

Now I can safely apply what ubnt-stig suggested in this thread: http://community.ubnt.com/t5/EdgeMAX/Dual-WAN-failover-OpenVPN-site-to-site/m-p/1524860/highlight/false#M104986

In effect, I have to define policy-based routing for my OpenVPN site-to-site connection.

STEP 1: Define new routing tables with static routes for each load-balanced WAN

The problem with load balancing with failover is that the way it works is sometimes counterintuitive. If nothing else is specified, a new routing table is forked when a failover occurs, with some default values copied from the main table. As a result, if you have DHCP WANs with default routes, you may end up with missing or wrong routes.
The safest way is to specify the default route for each WAN statically. Of course, this implies that both your WANs have static IPs and that your default gateways are static IPs too. Sadly, if one of your WAN IPs is obtained by DHCP, there is still no valid solution as of firmware 1.8.

Define two new routing tables:
– table 1: the routing table for WAN 1
– table 2: the routing table for WAN 2

In table 1 we add the default route for eth0.
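A minimal EdgeOS sketch of this route, assuming it is configured on the EdgeRouter POE (WAN 1 = eth0 with gateway 192.168.7.1, as listed above) and the `protocols static table` syntax of firmware 1.8:

```shell
configure
# Routing table 1: default route for WAN 1 (eth0) via its static gateway
set protocols static table 1 route 0.0.0.0/0 next-hop 192.168.7.1
commit
save
exit
```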

In table 2 we add the default route for eth1.
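The same sketch for WAN 2, again assuming the EdgeRouter POE values listed above (eth1 with gateway 192.168.0.1):

```shell
configure
# Routing table 2: default route for WAN 2 (eth1) via the LTE router's static gateway
set protocols static table 2 route 0.0.0.0/0 next-hop 192.168.0.1
commit
save
exit
```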

STEP 2: Define a firewall modify policy to select a different routing table for load-balance “group G”.
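For reference, an EdgeOS load-balance setup usually steers LAN traffic into the group with a firewall modify rule. This is a sketch only: the rule set name `balance`, rule number 10 and the use of switch0 as the LAN interface are hypothetical here, not taken from the original configuration.

```shell
configure
# Hypothetical modify rule set "balance": mark LAN traffic for load-balance group G
set firewall modify balance rule 10 action modify
set firewall modify balance rule 10 modify lb-group G
# Apply the rule set to traffic entering from the LAN (switch0 on the EdgeRouter POE)
set interfaces switch switch0 firewall in modify balance
commit
```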

Change the routing table used for load balancing.
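Assuming the per-interface `route table` option for load-balance interfaces in EdgeOS 1.8 (confirm it with tab completion on your firmware), the change could look like this, pointing each WAN at the table defined in STEP 1:

```shell
configure
# WAN 1 in group G uses the static routing table 1 from STEP 1
set load-balance group G interface eth0 route table 1
# WAN 2 in group G uses the static routing table 2 from STEP 1
set load-balance group G interface eth1 route table 2
commit
```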

List the new load-balance configuration
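A sketch of how the listing can be produced, assuming the standard EdgeOS configuration-mode `show` and the operational-mode load-balance status commands; the output depends on the firmware and setup, so it is not reproduced here.

```shell
configure
# Configuration mode: print the load-balance configuration tree
show load-balance
exit
# Operational mode: state of each interface in the group and its route-test results
show load-balance status
show load-balance watchdog
```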

STEP 3: Add a static route to the remote LAN 9

Add the route to the remote site to the main table as a static route as well. This is important because the router has to be told that this network is reachable through the OpenVPN vtun0 interface.
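A sketch of that route. The original does not give the address of remote LAN 9, so 192.168.9.0/24 below is a hypothetical placeholder; replace it with the real subnet.

```shell
configure
# Main table: reach remote LAN 9 (hypothetical 192.168.9.0/24) through the OpenVPN tunnel
set protocols static interface-route 192.168.9.0/24 next-hop-interface vtun0
commit
```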

STEP 4: Add the static routes to the WANs with different distance

Initially both my default routes were obtained by DHCP. Because of that, two default routes were still present, so I was losing packets while pinging the remote networks. Even after I switched to static IPs and defined the static routes by hand, I got the same results.

In fact, I got lost packets even when pinging the external WAN 1 IP. This means that, by default, packets were routed over both eth0 and eth1.

Then I tried deleting the default routes, leaving only the default routes in table 1 and table 2.
This still did not work because, for some reason, the router needs a “route of last resort”.

Then I tried adding only the route via WAN 1 as the route of last resort. In this case the load balancer was no longer able to verify that WAN 2 was up. Strangely, the route-test ping seems to ignore table 2 and goes through the main routing table instead. Because the main routing table had no route via WAN 2, the test failed.

The miracle solution was to define static routes via both WANs in the main table, but with different distances. Defining the route via WAN 1 with distance 1 and the route via WAN 2 with distance 200 (allowed values are between 1 and 250) solves the problem:
– all the packets routed by the main table go through the route with the smaller distance. I no longer get lost packets for the VPN or when pinging WAN 1.
– packets that explicitly need to go through WAN 2 can still do so, because there is a static route via WAN 2.
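The two main-table routes described above, as a sketch using the gateways listed for the EdgeRouter POE:

```shell
configure
# Preferred default route via WAN 1 (eth0): the lowest distance wins in the main table
set protocols static route 0.0.0.0/0 next-hop 192.168.7.1 distance 1
# Default route via WAN 2 (eth1): higher distance, so it is less preferred than WAN 1
set protocols static route 0.0.0.0/0 next-hop 192.168.0.1 distance 200
commit
save
```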

The final routing configuration looks like:
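To inspect the resulting routing state on the router, the operational-mode route view and the Linux iproute2 per-table view can be used (outputs omitted, since they depend on the setup):

```shell
# Operational mode: main routing table, including the selected default route
show ip route
# Linux iproute2 view of the per-WAN tables from STEP 1
ip route show table 1
ip route show table 2
```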

There are no changes to the remote site.

STEP 5: Apply the changes

Reset load balancing with ubnt-stig's trick.

Reset the OpenVPN tunnel to apply the new configuration:
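Assuming the tunnel interface is vtun0 (as in STEP 3) and the EdgeOS operational-mode `reset openvpn interface` command, this can be done with:

```shell
# Operational mode: reset the OpenVPN site-to-site tunnel on vtun0
reset openvpn interface vtun0
```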
