I'm seeing massive numbers of HTTP timeouts to any external service through ISA running on SBS 2003 SP2 as a LAN gateway, no matter how many clients in the LAN are active. The server has two network cards: one for LAN/DMZ (different IP ranges, using VLANs on a single port) and one for WAN. ISA is not used as a web proxy, and I verified the timeouts with various tools, from the usual visual browsers over wget and telnet to e.g. PHP applications.
I used some scripts and rrdtool to make this graph, which measures the load time of an external resource. I have already tested seven different external websites (with appropriate permission, of course), and they all look the same. The unit on the left axis is seconds; I set a 30-second maximum timeout while gathering the data.
(Note: this image is around 270 kB in size and 16,000 pixels wide!) ISA Timeout RRD graph: http://markus.fischer.name/tmp/isa_timeout.png
This graph spans 24 hours; the outage from around 11:00 to 13:20 was due to a reconfiguration (which obviously didn't change anything).
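For reference, the measurement behind the graph can be sketched roughly like this. This is a hypothetical reconstruction, not the original script: the probe URL, the RRD file name, and the exact error handling are assumptions; only the 30-second cap and the "time one external fetch per sample" idea come from the post.

```python
#!/usr/bin/env python3
"""Timing probe: fetch an external URL and record how long it took.

Hypothetical sketch of the kind of script that fed the RRD graph.
"""
import time
import urllib.request
import urllib.error

TIMEOUT = 30  # hard cap in seconds, matching the 30-second max in the post


def probe(url, timeout=TIMEOUT):
    """Return the load time of `url` in seconds, or `timeout` on failure.

    Any network error (refused connection, DNS failure, socket timeout)
    is counted as a full timeout, so stalls show up as 30 s spikes.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
    except (urllib.error.URLError, OSError):
        return float(timeout)
    return time.monotonic() - start


if __name__ == "__main__":
    # Hypothetical test URL; the post used seven external sites.
    elapsed = probe("http://example.org/")
    # One sample per run, e.g. from cron, pushed into an assumed RRD:
    #   rrdtool update isa_timeout.rrd N:<elapsed>
    print(f"{elapsed:.3f}")
```

Run from cron on a LAN host, a loop like this produces exactly the flat baseline with 30-second spikes visible in the graph whenever a fetch stalls.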
I've already verified the following things:
- traffic from LAN/DMZ through the server to WAN causes timeouts
- traffic from the server itself to WAN causes no timeouts
- traffic from LAN/DMZ to the server causes no timeouts
Hardware such as switches and cables has already been ruled out as the cause.
Update:
I decided to escalate this and open an M$ support ticket for the issue. I'll append updates as I receive them.
Update 2:
Two weeks have passed, not much progress. I'm not chasing the ticket myself; we have a company that does that for us. I think that was a smart move, as it saves me time for other things.
Anyway, M$ forgot about the ticket in the first week, so there was only progress last week, leading to an ISA patch being deployed, which unfortunately didn't change anything.
Their next move was to request extensive reporting information, which they received yesterday.
Update 3:
It's now the 10th of August. The problem suddenly disappeared on the 6th of August: right in the middle of the day, at around 11:17, the last of the continuously measured timeouts occurred. Since then, no such problems could be detected from any network or any external host in this kind of scenario.
No single action could be identified in connection with this sudden disappearance. The night before there was a partial outage within the company, and at 12:30 we reset some hardware which hadn't fully recovered from the outage (we only became aware that the problem was gone late that afternoon).
Besides gathering logs and reports, nothing has come up from my support company or from M$ itself, either before or after this. Since time is money, I have to suspend further research into this for now ...