Strange routing issue across multiple sites

Question

I'll start with a diagram of what everything looks like today and then get into the history:

Diagram

From that diagram, from top to bottom:

RTR 2 is at a remote site. (Site A)

RTR 3 is at a remote site. (Site B)

RTR1, CoreSW2, CoreSW1, and SW3 are all in the same building (HQ)

SW1 and SW2 are at the same remote site, but in 2 separate buildings. (Site C)

I'll try to explain this as clearly as possible.

Site C used to be set up like Sites A and B, in that there was a Cisco router at the site with a serial link to RTR1. Then there was a layer 2 switch hanging off that router handling the users. We are in the process of getting rid of all our T1s at remote sites and moving to fiber, and handling all of our switching and routing at the main site (HQ). I created the SVIs and used HSRP v2 in our 2 core switches, so:

Vlan 550

xxx.xxx.120.2 -CoreSW1

xxx.xxx.120.3 -CoreSW2

xxx.xxx.120.1 -Standby 550 ip

xxx.xxx.120.4 -SW1(L2)

xxx.xxx.120.5 -SW2 (L2)

The connection between SW1 and SW2 is a fiber link from one building to the other. Here is where it gets strange.

RTR 2 can ping and communicate with every device without an issue, and every device can communicate with it.

RTR 3 cannot ping SW1 (120.4) but it CAN ping SW2 (120.5). It can also reach 120.1, 120.2, and 120.3, which would be expected since those are technically at HQ and not at Site C. It can also get elsewhere on the network; however, I can't wrap my head around it being able to reach SW2 when the only route there is through SW1.

A traceroute to RTR 3 from SW1 stops at 120.2. A traceroute to SW1 from RTR 3 stops at xxx.xxx.8.69. (Network 8.68/30 - 8.69 is the serial interface on RTR 1, 8.70 is Ser0/0/0 on RTR 3, also its default route.)

I assume I am missing something simple here, even though this was a complicated lead-in. However, I've been looking at this way too long and I can't figure it out.

Full disclosure, I am also in the process of getting rid of a lot of old static routes that existed in this network and enabling EIGRP.

Is there any smoking gun as to these communication problems? I can give more info as needed. Also, there are obviously other devices on the network here: there are other routers and T1s connected to RTR 1 via serial cables, etc. I tried to just include relevant stuff. Maybe a problem with my SVIs, although the configuration on those is pretty light?

Thanks~

score 0 · Accepted Answer · edited Jun 11 '20 at 10:02

A traceroute to RTR 3 from SW1 stops at 120.2. A traceroute to SW1 from RTR 3 stops at xxx.xxx.8.69. (Network 8.68/30 - 8.69 is the serial interface on RTR 1, 8.70 is Ser0/0/0 on RTR 3, also its default route.)

I created the SVIs and used HSRP v2 in our 2 core switches

Possible Issue: You may be having an issue with the return route.

Update: Southbound traffic through the core may be taking an asymmetric return path.

Update 2: You do have some asymmetric routing. If a traceroute from SW1 to RTR3 stops at 120.2, then it is hitting CoreSW1 rather than CoreSW2, which is the most direct path according to your diagram.

Troubleshooting: I would make sure that each L3 device has a specific or default route back to RTR3. I suggest looking at each routing table. I would also verify that each core switch and RTR1 can ping xxx.xxx.120.5.

You could have a problem in the Core where one core has a route back but no route on the other core switch.
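
One way to run that comparison, as a rough sketch: the commands below are standard IOS show commands, but the masked addresses are copied from the question as placeholders and need the real octets filled in. Run them on RTR1, CoreSW1, and CoreSW2 and compare the results side by side.

```
! Is there a route back toward RTR3's serial interface?
show ip route xxx.xxx.8.70
! How does this device reach the Vlan 550 hosts?
show ip route xxx.xxx.120.4
! Are the expected EIGRP adjacencies up?
show ip eigrp neighbors
! Is there an ARP entry for SW1? (only meaningful on a device with an interface in Vlan 550)
show ip arp xxx.xxx.120.4
```

If one device is missing a route or ARP entry that the others have, that is the place to dig.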

Update: What mechanism do you have in place to prevent asymmetric routing? Check the links between Core SW2--SW3 and Core SW1--SW3. If they are L2 links, make sure that you have Core SW2 as the STP Root Bridge. If they are L3, make sure that Core SW2 is the preferred next hop.
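
For the L2 case, something like the following should show which switch is currently the root. This is only a sketch: VLAN 550 is assumed here because it is the VLAN from the question, so substitute whichever VLAN those links actually carry.

```
! On each core switch: look for "This bridge is the root" in the output
show spanning-tree vlan 550

! On CoreSW2 only, in global configuration mode, if it should be the root
spanning-tree vlan 550 root primary
```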

Update 2: Make sure that HSRP is working as expected and that CoreSW2 is the primary and active router in HSRP. You have a general design issue because both Core SW2 and Core SW1 should have direct paths to RTR1.
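
A quick way to confirm the HSRP state, sketched under the assumption that the group number is 550 on interface Vlan550 as described in the question. The priority value of 110 is an arbitrary example (the default is 100), not something taken from your config.

```
! On both core switches: one should show Active, the other Standby
show standby brief
show standby vlan 550

! On CoreSW2, if it should always win the active role
interface Vlan550
 standby version 2
 standby 550 priority 110
 standby 550 preempt
```

If both switches report Active for the same group, they are not hearing each other's hellos, which points back at the L2 path between the cores.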

answered Dec 22 '16 at 20:14

TDurden


I'd second this, check your return route. – boopzz Dec 22 '16 at 20:30
Thanks- I just double checked and verified pings: RTR1 can ping 120.5 but not 120.4. CoreSW1 and CoreSW2 can ping both 120.4 AND 120.5. From RTR1 - sh ip route xxx.xxx.254.81 (RTR3's Fa0/1) is known via "eigrp 1", sh ip route xxx.xxx.8.70 (RTR3's Serial Interface) is known via "connected". From CoreSW1 and CoreSW2, sh ip route xxx.xxx.254.81 and xxx.xxx.8.70 are both known through eigrp via RTR1's ip as the next hop. – CD305865 Dec 22 '16 at 20:57
Regarding the updates: SW3 has a default route sending all traffic to xxx.xxx.0.1, which is an SVI between CoreSW2 and CoreSW1. In the standby config, CoreSW2 is currently the active interface; however, just to be sure, I went ahead and changed the default route in SW3 to CoreSW2's main IP. No change there though: I still cannot ping 120.4 from RTR 3, but can ping 120.5, as well as everything else on that range. 120.4 does (obviously) ping from SW3, if I had not mentioned that before. – CD305865 Dec 27 '16 at 15:58
No ACLs or Firewalls in effect? Can you post a Routing Table for RTR3 and SW3? – TDurden Dec 27 '16 at 20:17
Sure. http://pastebin.com/bU1KHxBK There is an old ACL on a lot of my routers pointing to the old Solarwinds servers, but they aren't applied anywhere. I'll go ahead and remove them just to be sure though. – CD305865 Dec 28 '16 at 13:50
Looks to be an issue with HSRP. I ended up putting the link from RTR 1 back into CoreSW1 (it used to be there, but I had issues with host flapping between the switchport it was connected to and Po1) and put a static route in RTR 1 that sends traffic for that network directly across CoreSW2, as opposed to the SVI ip. I will troubleshoot HSRP separately. Thanks for the assist. – CD305865 Dec 29 '16 at 17:40
