How to investigate the cause for unresponsiveness on Linux?

I apologize that the problem I'm writing about is not very concrete. I use KDE4 on Debian testing and use KDE's file manager Dolphin very often, most of the time without problems. Recently, I suppose after a system update, Dolphin has often been very unresponsive. This may happen directly at startup (a minute or so passes before the window comes up), or it may happen later, after everything has been fine for some time. The window contents are no longer updated, it takes ages until a file opens after I click on it, etc. A reboot sometimes makes the problem go away, but not for long. I thought it might have to do with optical drive access, but the problem persists even if there is no medium in the drive. I don't have any mounted network file systems. There are also no other processes eating up CPU time and/or disk bandwidth.

Now, the question I'm asking is not about this particular problem with Dolphin, but this:

How can I generally deal with the situation that a program becomes unresponsive? Is there a standard strategy to find out what causes such a problem, so that I can 1) find a fix or workaround for myself and/or 2) submit a useful bug report?

In this case, because I thought it might have to do with Dolphin trying to access certain files and hanging on some kind of block, I started Dolphin under strace and tried to make sense of the messages. However, there are lots and lots of "errors" of type "EAGAIN (Resource temporarily unavailable)" or "ENOENT (No such file or directory)", most of which don't appear to represent a problem. The only thing I learned reliably is that even if Dolphin doesn't react to user input, that does not mean that there isn't a lot going on in response to mouse movements and mouse clicks...

Is strace the right tool? If yes, what should I look for in its output? If not, what should I use instead?

A. Donda

Posted 2013-12-10T19:27:26.693

Reputation: 660

Question was closed 2013-12-15T19:21:12.340

Answers

Well, strace prints a list of system calls made by the program. It may be useful and educational to use it, but if you are not a programmer it may not be very practical.
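One way to make strace output more manageable is to ignore the individual EAGAIN/ENOENT lines and look at which calls took a long time instead. Running `strace -f -T -o dolphin.trace dolphin` writes the trace to a file; `-f` follows threads and child processes, and `-T` appends the time spent in each call as `<seconds>` at the end of the line. What follows is only a minimal sketch under those assumptions, not something from the original answer: the file name `dolphin.trace` and the helper `slowest_calls` are made up for illustration.

```python
import re
import sys

# `strace -T` appends the time spent in each completed call as "<seconds>".
# Lines such as "<unfinished ...>" carry no number and are skipped.
DURATION = re.compile(r"<(\d+\.\d+)>\s*$")


def slowest_calls(lines, top=10):
    """Return up to `top` (seconds, line) pairs, slowest call first."""
    timed = []
    for line in lines:
        match = DURATION.search(line)
        if match:
            timed.append((float(match.group(1)), line.rstrip()))
    timed.sort(key=lambda pair: pair[0], reverse=True)
    return timed[:top]


# Usage (hypothetical file name): python slowest.py dolphin.trace
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as trace:
        for seconds, line in slowest_calls(trace):
            print(f"{seconds:9.3f}s  {line}")
```

Long poll or select calls are normal while a GUI program sits idle, so the interesting entries are the ones that coincide with a freeze. A call that blocked for seconds on a particular path or file descriptor is a reasonable thing to quote in a bug report.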

htop

If what you want is to get a poorly responsive system back into a working state, then one of the most useful programs I have found is htop. Basically it shows you real-time system usage in a terminal. You should read a bit about it; it is very well documented and quite a few articles have been published about it. Because it runs in a terminal, it still works when your desktop has frozen, as long as you can log in to your computer via ssh, e.g. from a Windows machine via PuTTY. It gives you a list of processes and shows the most important information about them. With F6 you sort processes by a specific resource (e.g. processor, memory, swap), so you can see which program is the resource hog. With F4 you can filter by program name: just start typing. F5 shows the process tree, and pressing l on a selected process lists the files it has open (htop runs lsof for this). With F9 you can send whichever signal you want to the program. The nice thing is that you can simply move up and down with the arrow keys and press numbers to select options; you should experiment a bit to appreciate this.

My rule of thumb is: if the system has not hung so badly that pressing Num Lock no longer toggles the Num Lock light, then chances are that some simple investigation and a SIGHUP or SIGKILL from htop will bring it back to stability. If the situation repeats, then you can file a bug report.
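When a single program hangs without using CPU, its process state can be a useful first hint. htop shows it in the S column, and the same information is available under /proc. The sketch below is an illustration added here, not part of the original answer; the helper names `parse_state` and `describe` are made up. It reads the `State:` line of `/proc/<pid>/status` and, where the kernel exposes it, `/proc/<pid>/wchan`.

```python
def parse_state(status_text):
    """Return the one-letter state code from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            # The line looks like "State:\tS (sleeping)"; keep only the letter.
            return line.split()[1]
    return None


def describe(pid):
    """Return (state, wchan) for a running process on Linux."""
    with open(f"/proc/{pid}/status") as status_file:
        state = parse_state(status_file.read())
    try:
        # Kernel function the process is sleeping in, if the kernel reports it.
        with open(f"/proc/{pid}/wchan") as wchan_file:
            wchan = wchan_file.read().strip()
    except OSError:
        wchan = "?"
    return state, wchan
```

Calling `describe(pid)` with the PID reported by `pidof dolphin` returns the state letter and the wait channel. As a rough guide, D (uninterruptible sleep) usually points at the process waiting inside the kernel, often on disk or device I/O, while S with no CPU use suggests it is waiting for something else to answer, such as a socket, a lock, or another process.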

r0berts

Posted 2013-12-10T19:27:26.693

Reputation: 1 585

Hello r0berts, thanks for your reply. The thing is, it's not the system or desktop that is frozen, but just this particular program. I did use top to look at the state of system resources, and I couldn't find anything out of the ordinary. htop looks really nice, thanks for the tip! But I don't think it will shed light on this problem. And while I'm far from being a professional programmer, I'm not completely clueless. So do you have an idea how I could identify a problem with one program, where the freezing is not related to high CPU, memory, or bandwidth use? – A. Donda – 2013-12-11T23:31:40.467

Hi, yes, I think I fall into the same category: not a programmer, but not completely clueless. If resources are not the problem, then we are probably looking at some other programming glitch. A couple more tools to look specifically at what happens with Dolphin would be

lsof -p <PID of dolphin> – r0berts – 2013-12-12T12:17:05.447

Sorry about the previous comment. A couple more useful tools: lsof, run as lsof -p <PID of dolphin> (get the PID from top or pidof), lists the files opened by Dolphin. wireshark shows whether there are outgoing requests that never come back; file managers often wait for the network to time out, and that may appear as a freeze. It could be a glitch in the program or in a plugin. Switch the network off altogether and see whether the same behaviour occurs. nmon is another useful way to look at your system. Plus an interesting forum discussion describing somewhat similar conditions: link

– r0berts – 2013-12-12T12:38:17.177
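The lsof suggestion above can also be approximated without lsof, because on Linux each open file descriptor of a process appears as a symbolic link under `/proc/<pid>/fd`. The following is a rough sketch added for illustration, not the commenter's method; the function name `open_files` and its `proc_root` parameter are assumptions made up for this example.

```python
import os


def open_files(pid, proc_root="/proc"):
    """Map each open file descriptor of `pid` to the target of its /proc link."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listing and reading it.
            continue
    return targets
```

Printing the result of `open_files(pid)` a few times while the program is frozen shows which paths, sockets or pipes it is holding. Reading another user's process this way needs the same permissions lsof would need, so it may have to be run as that user or as root.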

When you find the solution, perhaps you could post it here. It would be interesting to know why a program (and a widely used one at that) becomes unresponsive without showing any impact on system resources or networking. – r0berts – 2013-12-12T18:49:23.777