I have a network where my WAN (a cable modem) goes into a (dubiously configured) Ubuntu box which acts as the router for the whole company.

We recently switched ISPs, and since then we've been seeing some very weird behavior. On Vista/Server 2008 machines, some websites don't work. This comes and goes a bit, but there's one, www.dilbert.com, that never works.

We've been experimenting with this over weeks, and these are the things we've found so far...

  • The Linux server itself CAN load the Dilbert homepage
  • Windows XP/2003 machines work PERFECTLY
  • Windows Vista/2008 machines have problems
  • Sometimes, on my machine (2008), the page loads in Firefox. It never loads in Chrome, for example. Once it has loaded in Firefox, pressing F5 kills it immediately; it only ever loads once. The error is "the connection was reset".
  • Restarting my machine, the router, the cable modem, etc. won't make it load again; I still get "connection reset", and it doesn't work from other machines either. Some time afterwards (days), however, it will load again, once.
  • Also, I can sometimes open a telnet connection (using the Windows telnet client) to www.dilbert.com:80 and do a "get /", in which case I get a bunch of HTML, and the connection closes, expectedly. Now, if I open the same connection again, it connects, but as soon as I press "g", it disconnects me. This is the lowest level kind of information I could find.
  • Using "Fiddler" to inspect the network traffic, with some browsers the request shows with "response status 0", an empty response, and no error details. With other browsers, it doesn't even appear.
  • Finally, if I connect the cable modem directly to a 2008 PC, without the router in between, everything works wonderfully, so I'm 99.99% sure the problem is inside my network, and not with my ISP.

Now, my best guess is that this is some kind of weird interaction between Vista's network layer (which, as I understand it, was rewritten and is not the same as XP's) and Ubuntu's.
That's all I got. Besides that, I'm completely dumbfounded, and starting to believe there's a curse on my building.

Can any of you think of any plausible ideas as to what might be going on / what I could do to fix / diagnose this?

NOTE: I know NOTHING about linux, although I can run commands if you can think of something that'll give me some useful info.

Thanks!
Daniel

wfaulk
Daniel Magliola

2 Answers


Feels like an MTU issue to me. I don't know what Vista's MTU algorithm is, though.

As a simple test, set your Vista machine's MTU artificially low, like 500, and see if that resolves the problem. If so, we can say that MTU is the cause and go back to looking at exactly how to fix that.


What is MTU?

MTU stands for Maximum Transmission Unit. Effectively, it's the largest packet that your computer will send out of a given network interface; on Ethernet that is normally 1500 bytes. The IP spec says that if a router receives a packet too large to send out its (other) interface, it can either fragment the packet or, if the packet is marked "Don't Fragment", drop it and send back an ICMP message that says "this is too big". Fragmenting is bad for performance, though, so modern OSes mark their packets "Don't Fragment" and use a trick to determine the optimal MTU for a particular path: they send packets in decreasing sizes until they no longer hear a response back about the packet being too big. This is Path MTU Discovery, or PMTUD.

What's happening to you is that a router somewhere is refusing to fragment a packet, but your Vista machine is never hearing its refusal notification. When you set your MTU really small, you're telling the OS to always transmit packets small enough that any modern circuit is likely to accept them, so your packets will never encounter the need to fragment.
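
To make the failure mode concrete, here is a minimal simulation of the behavior described above. It is only a sketch: the link MTUs (1500, 1400, 1500) are made-up numbers, not measurements from this network, and `send` and `discover_path_mtu` are hypothetical helpers rather than anything Vista or Linux actually exposes.

```python
from typing import List, Optional


def send(packet_size: int, link_mtus: List[int], icmp_delivered: bool) -> str:
    """Simulate one Don't-Fragment packet crossing a path of links.

    Returns "delivered", "too-big:<mtu>" if the sender hears the router's
    refusal, or "timeout" if the refusal is lost on the way back.
    """
    for mtu in link_mtus:
        if packet_size > mtu:
            # The router drops the packet and emits an ICMP "too big" reply.
            return f"too-big:{mtu}" if icmp_delivered else "timeout"
    return "delivered"


def discover_path_mtu(start_mtu: int, link_mtus: List[int],
                      icmp_delivered: bool, max_tries: int = 10) -> Optional[int]:
    """Shrink the packet size each time a router says it is too big."""
    size = start_mtu
    for _ in range(max_tries):
        result = send(size, link_mtus, icmp_delivered)
        if result == "delivered":
            return size
        if result == "timeout":
            # Black hole: the sender never learns why the packet vanished.
            return None
        size = int(result.split(":")[1])
    return None


path = [1500, 1400, 1500]  # hypothetical link MTUs along the route

print(discover_path_mtu(1500, path, icmp_delivered=True))   # 1400
print(discover_path_mtu(1500, path, icmp_delivered=False))  # None
print(discover_path_mtu(500, path, icmp_delivered=False))   # 500
```

The last call is the MTU=500 workaround: the packets are already smaller than every link, so it no longer matters whether the "too big" replies get through.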


Update

Now that we know it's an MTU issue, and one that involves only your Ubuntu router, that would imply that your Ubuntu router is breaking packet fragmentation somehow.

Is it running a firewall? Is it blocking ICMP? If so, try disabling that. (Obviously, move back to the standard MTU of 1500 first; otherwise you'll be debugging a problem that doesn't exist any more.)
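
When checking what the firewall lets through, note that ping and PMTUD rely on different ICMP messages. Ping uses echo request/reply (types 8 and 0), while the "this is too big" notification is Destination Unreachable (type 3) with code 4, carrying the next-hop MTU in its header as described in RFC 1191. The sketch below only illustrates that layout; `build_frag_needed` and `parse_frag_needed` are hypothetical helper names, the checksum is left at zero for brevity, and nothing here is taken from the firewall script in question.

```python
import struct
from typing import Optional

ICMP_DEST_UNREACHABLE = 3   # ICMP type used for "fragmentation needed"
ICMP_CODE_FRAG_NEEDED = 4   # code 4: fragmentation needed and DF set
ICMP_ECHO_REQUEST = 8       # what ping sends

# Header layout: type (1 byte), code (1), checksum (2), unused (2), next-hop MTU (2)
HEADER_FORMAT = "!BBHHH"


def build_frag_needed(next_hop_mtu: int) -> bytes:
    """Build the 8-byte ICMP header of a 'fragmentation needed' message.

    The checksum is left as 0 here; a real packet needs a valid one.
    """
    return struct.pack(HEADER_FORMAT, ICMP_DEST_UNREACHABLE,
                       ICMP_CODE_FRAG_NEEDED, 0, 0, next_hop_mtu)


def parse_frag_needed(icmp: bytes) -> Optional[int]:
    """Return the next-hop MTU if this is a type 3 / code 4 message, else None."""
    if len(icmp) < 8:
        return None
    icmp_type, code, _checksum, _unused, mtu = struct.unpack(HEADER_FORMAT, icmp[:8])
    if icmp_type == ICMP_DEST_UNREACHABLE and code == ICMP_CODE_FRAG_NEEDED:
        return mtu
    return None


too_big = build_frag_needed(1400)
echo_request = struct.pack(HEADER_FORMAT, ICMP_ECHO_REQUEST, 0, 0, 0, 0)

print(parse_frag_needed(too_big))       # 1400
print(parse_frag_needed(echo_request))  # None
```

The practical point is that a rule set can allow ping while still dropping type 3 messages, so "I can ping it" does not by itself show that PMTUD replies are getting through.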

wfaulk
  • Hmmmm. I love you. I mean it. Can I propose? :-) MTU=500 fixed it... Now the question is, what should I set it to? (I don't know what MTU really is, honestly) – Daniel Magliola Oct 02 '09 at 22:05
  • Oh, btw, it was set to 1500 before I changed it, in case it helps – Daniel Magliola Oct 02 '09 at 22:07
  • Please don't. My wife will hunt you down. – wfaulk Oct 02 '09 at 22:09
  • Running a firewall: yes. Blocking ICMP: I can ping my server from the inside of the network, not from the outside (destination port unreachable), not sure what that means. What should I disable? – Daniel Magliola Oct 02 '09 at 22:21
  • NOTE: I just disabled ICMP blocking, now I can ping from the outside, but the problem still persists. You can see my firewall script here: http://pastebin.com/m38aa9950 THANK YOU!!! – Daniel Magliola Oct 02 '09 at 22:22
  • Thanks for the explanation on what MTU is. Now, I had a weird scenario. I set out to empirically determine it, so I started doing a binary search... 1000, 800, 600, 580 didn't work. 550, 560, 570, 579 worked. After 579, I set it to 580 and it DID work. 600, 700, 800, 1000 worked. 1200 didn't work, and after that, 1000, 800, etc didn't work... So there's something funky going on here... – Daniel Magliola Oct 02 '09 at 22:30
  • I'm not an iptables expert, so I think I'm going to have to defer. As for the variable MTU, it's possible that your packets are not always taking the same route, and the routers on each path have different MTUs. You can probably see this with a traceroute/tracert, but I'm betting that they won't work, since they also need to receive ICMP responses. – wfaulk Oct 02 '09 at 22:37
  • So what do you think I should do? If I set MTU's in all my Vista machines to 500, will that end up destroying my network performance, or is that an acceptable workaround? Also, why do you think this affects Dilbert.com and not other stuff? – Daniel Magliola Oct 02 '09 at 22:42
  • It's not going to destroy your performance. If getting to those web sites is a real problem, it's certainly a viable short term workaround. As for what to do: wait for an iptables expert to pop around? – wfaulk Oct 02 '09 at 23:06
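
The binary search described in the comments above can be sketched as follows. This is a rough illustration, not a tool: `works` is a hypothetical probe (for example, "does the page load with this MTU?") that you would have to supply yourself, and the search is only meaningful if the results are stable and monotonic, which the 580-fails-then-works observation suggests they were not. The lower bound of 500 is the value reported as working in this thread; 20 and 8 are the standard IPv4 and ICMP header sizes.

```python
from typing import Callable

IP_HEADER_BYTES = 20    # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def ping_payload_for(mtu: int) -> int:
    """Largest ping payload that fits in one unfragmented packet of this MTU."""
    return mtu - IP_HEADER_BYTES - ICMP_HEADER_BYTES


def largest_working_mtu(works: Callable[[int], bool],
                        low: int = 500, high: int = 1500) -> int:
    """Binary-search the largest MTU in [low, high] for which works(mtu) is True.

    Assumes works(low) is True and that every MTU above the answer fails.
    """
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if works(mid):
            low = mid
        else:
            high = mid - 1
    return low


# Hypothetical probe: pretend the path silently drops anything above 1400 bytes.
def fake_probe(mtu: int) -> bool:
    return mtu <= 1400


print(largest_working_mtu(fake_probe))  # 1400
print(ping_payload_for(1500))           # 1472
```

On Windows, one way to probe by hand is `ping -f -l <payload> <host>`, where `-f` sets Don't Fragment and `-l` sets the payload size, so an MTU of 1500 corresponds to `-l 1472`. Because the measurements in this thread changed between runs, it is worth repeating each probe several times before trusting the result.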

If you can, I would switch to something that was actually designed for use as a router.

I personally recommend pfSense. One of its common deployments is as a perimeter firewall.

Brad Gilbert
  • +1 for using a router, especially for someone who's not an expert. (Don't know anything specifically about pfSense.) – wfaulk Oct 04 '09 at 07:15
  • Hmm, VERY interesting. Now, I don't quite get it... Is that its own operating system? Is it a piece of software that runs on top of another OS? (windows/linux/whatever) Is it easy to use / configure, by someone who is a "power user", but hasn't got sysadmin-level experience? (this last part is key) – Daniel Magliola Oct 05 '09 at 23:18
  • http://pfsense.org/ – Brad Gilbert Oct 07 '09 at 16:51