
I have a customer who has problems with our software crashing from time to time. Unfortunately, if you unplug the network cable in the middle of a transaction, our software will crash every time, and there is nothing that can be (or at least nothing that ever will be) done about that.

I believe their network is experiencing hiccups from time to time that are causing the software to crash, but I'm not sure how to go about proving it (my background is programming and databases, but not so much networking).

When I ping any of the client machines from the server, they all respond in less than a millisecond. But whenever the software crashes, we find some kind of error message in the event log about losing the connection to a file (sorry, I know that's really vague, but I haven't been to the site myself yet and the person who went didn't write it down).

I'm not even sure how to test for something like this, so I don't know what kind of networking tools I should be looking for. If someone could point me in the right direction, I'd greatly appreciate it!

BVernon

2 Answers


Smokeping is a pretty slick monitoring tool if packet loss and latency are all you're interested in.

If that doesn't fit the bill, this question may have some answers that interest you.


To be clear: these sorts of issues can be exceedingly frustrating to troubleshoot. It's quite likely that monitoring tools alone won't give you any useful information. The tool most likely to help you track this down is a packet capture. Take one on both the client and the server, then correlate the timestamps with when the customer's app crashes, and possibly also with data from Smokeping (or whichever other monitoring tool you choose).
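Once you have timestamps for the crashes and for anything suspicious you find (retransmissions or resets in the captures, failed pings, Smokeping loss spikes), lining them up by hand gets tedious. Here is a minimal Python sketch of that correlation step, assuming you have already turned both sets of timestamps into `datetime` values. The function name `events_near_crashes`, the `(timestamp, description)` tuple format, and the 30-second default window are all hypothetical choices, not part of any tool mentioned above.

```python
from datetime import datetime, timedelta


def events_near_crashes(crashes, events, window=timedelta(seconds=30)):
    """For each crash time, return the events that fall within +/- window of it.

    crashes: list of datetime objects (when the app crashed)
    events:  list of (datetime, description) tuples, e.g. notable packets
             exported from a capture or failed pings (hypothetical format)
    """
    result = {}
    for crash in crashes:
        # Keep only events close enough in time to plausibly be related.
        nearby = [(ts, desc) for ts, desc in events if abs(ts - crash) <= window]
        result[crash] = sorted(nearby)  # chronological order
    return result
```

If the window turns up nothing, widen it before concluding the network is innocent; clock skew between machines can easily push related events apart.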

EEAA
  • Before you take the packet captures, make sure NTP is set up and running on both the client and the server. – Ladadadada Aug 06 '13 at 07:16

If the client machine in question is connected to a decent managed switch, a network engineer should be able to tell you whether the machine's switch port is flapping (repeatedly dropping and re-establishing link). Bad cables, flaky NICs, bad switch ports, and driver issues can all cause a port to flap.

Check the Windows event logs on the client machine around the time of the crash. In Event Viewer, create a custom view that includes all of the event logs, filtered to a time range around the crash; that lets you see everything Windows recorded during that period.
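If clicking through Event Viewer gets tedious, the same kind of time-window filter can be run from the command line with the built-in `wevtutil` tool. A minimal Python sketch, assuming Windows and that you know the crash time in UTC (the event log stores `TimeCreated/@SystemTime` in UTC). The helper names `event_window_query` and `dump_events` and the 10-minute default are hypothetical:

```python
import subprocess
from datetime import datetime, timedelta, timezone


def event_window_query(crash_utc, minutes=10):
    """Build an event log XPath filter for events within +/- minutes of crash_utc.

    crash_utc must be a UTC datetime, because @SystemTime is stored in UTC.
    """
    start = crash_utc - timedelta(minutes=minutes)
    end = crash_utc + timedelta(minutes=minutes)
    fmt = "%Y-%m-%dT%H:%M:%S.000Z"
    return "*[System[TimeCreated[@SystemTime>='{}' and @SystemTime<='{}']]]".format(
        start.strftime(fmt), end.strftime(fmt)
    )


def dump_events(log_name, crash_utc, minutes=10):
    """Run wevtutil (Windows only) and return matching events from one log as text."""
    cmd = ["wevtutil", "qe", log_name,
           "/q:" + event_window_query(crash_utc, minutes), "/f:text"]
    # No shell is involved, so the < and > in the query need no escaping.
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

For example, `print(dump_events("System", datetime(2013, 8, 6, 7, 16, tzinfo=timezone.utc)))` would dump the System log around that (made-up) crash time; running it again with `"Application"` shows the software's own errors.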

You could set up a ping from the server to the client at a short interval (say, twice a second) and let it run in the background until the problem recurs. Log the output to a file so you don't lose the results.
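A minimal Python sketch of that background ping logger, assuming it runs on the server and shells out to the operating system's own `ping` command (Windows flags on Windows, Linux iputils flags otherwise). The names `ping_once` and `monitor` are hypothetical, and `probe` is a parameter only so the loop can be exercised without a network:

```python
import platform
import subprocess
import time
from datetime import datetime


def ping_once(host, timeout_s=1):
    """Send one ICMP echo request via the OS ping command; return True on a reply."""
    if platform.system() == "Windows":
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]  # -w is in ms
    else:  # Linux iputils syntax; -W is in seconds there
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    result = subprocess.run(cmd, capture_output=True, text=True)
    # Windows ping can exit 0 on "Destination host unreachable", so also
    # require a real echo reply, whose output line contains "TTL=".
    return result.returncode == 0 and "TTL=" in result.stdout.upper()


def monitor(host, log_path, interval_s=0.5, probe=ping_once, max_probes=None):
    """Probe host every interval_s seconds, appending one timestamped line per probe."""
    count = 0
    # Line-buffered so each result reaches the file immediately.
    with open(log_path, "a", buffering=1) as log:
        while max_probes is None or count < max_probes:
            started = time.monotonic()
            ok = probe(host)
            stamp = datetime.now().isoformat(timespec="milliseconds")
            log.write(f"{stamp} {host} {'OK' if ok else 'FAIL'}\n")
            count += 1
            # Sleep only for whatever is left of the interval.
            time.sleep(max(0.0, interval_s - (time.monotonic() - started)))
```

Something like `monitor("CLIENT-PC", "ping_CLIENT-PC.log")` (hostname made up) runs until you stop it; afterwards, search the log for `FAIL` lines near the crash times.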

If you are going to try to correlate logs across multiple machines, make sure their clocks are reasonably in sync.
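One way to check how far a machine's clock has drifted is to ask an NTP server for its time and compute the offset with the standard NTP formula, offset = ((t1 - t0) + (t2 - t3)) / 2. A minimal Python sketch, assuming UDP port 123 is reachable and using `pool.ntp.org` only as a placeholder server. Run it on both the client and the server against the same NTP source and compare the two offsets. The function names are hypothetical, and a one-shot SNTP query like this is a spot check, not a replacement for a running time service:

```python
import socket
import struct
import time

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def ntp_offset(t0, t1, t2, t3):
    """Standard NTP clock offset in seconds (positive = local clock is behind).

    t0: client send time, t1: server receive time,
    t2: server transmit time, t3: client receive time.
    """
    return ((t1 - t0) + (t2 - t3)) / 2


def query_offset(server="pool.ntp.org", timeout_s=2.0):
    """Send one SNTP client request and return this machine's clock offset."""
    packet = b"\x1b" + 47 * b"\0"  # LI=0, version 3, mode 3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        t0 = time.time()
        sock.sendto(packet, (server, 123))
        data, _ = sock.recvfrom(48)
        t3 = time.time()

    def read_ts(pos):
        # 64-bit NTP timestamp: 32-bit seconds since 1900 + 32-bit fraction.
        secs, frac = struct.unpack("!II", data[pos:pos + 8])
        return secs - NTP_EPOCH_OFFSET + frac / 2**32

    # Receive timestamp is at byte 32, transmit timestamp at byte 40.
    return ntp_offset(t0, read_ts(32), read_ts(40), t3)
```

If the two machines' offsets differ by more than your correlation window, timestamps from their logs cannot be lined up reliably.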

1z2z