
I'm using libvirt with Xen 4. Every time I use the virsh tool, it takes a long time until it's started. I start virsh on the same machine where the Xen hypervisor resides.

Example:

root@xen1:~#: time virsh list
 Id Name                   State
------------------------------------
  0 Domain-0               running


real    0m6.505s
user    0m0.000s
sys     0m0.020s

How can I speed this up? It also happens when I run virsh without arguments. I don't get any errors, even in the log file.

Daniel
  • Try running `strace -ff virsh list` to find out what it does when it seems idle. Either it will pause at some point, in which case the last line will probably tell you what it's waiting on (add the relevant last lines here if you need help figuring out what it's doing), or it will output a continuous stream. Then run it like `strace -fo strace.out -ff -tt virsh list` and try to find what it spends most of its time doing. `strace -c -ff virsh list` may also help. – Fox Dec 11 '11 at 19:36
  • `09:45:09.432100 poll([{fd=14, events=POLLIN}, {fd=15, events=POLLIN}], 2, -1) = 1 ([{fd=14, revents=POLLIN}]) 09:45:11.661586 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0` Not sure what that means, though. – Daniel Dec 12 '11 at 08:48
  • Well, that means it's waiting on input from file descriptors 14 and 15. Either try to find what has been written to these fds to figure out what it is waiting on, or try looking up what those fds are (by searching the strace output for the open or connect calls that returned these fds, or by looking them up in `lsof`). – Fox Dec 12 '11 at 10:42
  • Having the same issue here. All virsh commands related to the hypervisor are slow, sometimes taking close to a minute. I see similar things with strace and lsof tells me that these two fds are a unix socket and a fifo pipe. Any help? – r.v Nov 05 '15 at 03:29
  • I have the same problem, it waits on fd=5 and 6 and then on fd=3. – Xdg Dec 13 '17 at 18:23
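
To locate pauses like the one Daniel pasted without reading the whole trace by eye, the gaps between consecutive timestamps can be computed. This is only a minimal sketch, assuming the trace was written with `-tt` so that each line starts with `HH:MM:SS.microseconds`, optionally preceded by a PID. `find_pauses`, `parse_line` and the 1-second threshold are illustrative names and values, not part of strace.

```python
import re

# Matches lines from `strace -tt`, with an optional leading PID column.
LINE_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\d{2}):(\d{2}):(\d{2})\.(\d+)\s+(.*)$"
)


def parse_line(line):
    """Return (seconds_since_midnight, rest_of_line), or None if no timestamp."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    hours, minutes, seconds, fraction, rest = m.groups()
    stamp = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
    return stamp + float("0." + fraction), rest


def find_pauses(lines, threshold=1.0):
    """Return (gap_seconds, call) pairs where the next traced line started
    more than `threshold` seconds after `call` started, longest gap first."""
    pauses = []
    previous = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # not a timestamped line
        if previous is not None:
            gap = parsed[0] - previous[0]
            if gap < 0:  # the trace crossed midnight
                gap += 86400
            if gap > threshold:
                pauses.append((gap, previous[1]))
        previous = parsed
    return sorted(pauses, reverse=True)


if __name__ == "__main__":
    import sys

    # Usage (file name is hypothetical): python3 find_pauses.py strace.out.*
    for path in sys.argv[1:]:
        with open(path) as trace:
            for gap, call in find_pauses(trace):
                print(f"{path}: {gap:.3f}s  {call}")
```

Fed the two lines from Daniel's comment, this reports a gap of about 2.23 seconds attributed to the `poll` call on fds 14 and 15, which matches the reading that virsh is blocked waiting for input on those descriptors.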

1 Answer


Not really an answer, but I don't have enough rep to just comment on your post.

If you note, the user and sys times are very low, so virsh itself is not consuming much CPU time. When the real (i.e., wall clock) time is high while user and sys are low, the process is spending its time waiting, either because your system is very busy with other things and it's taking a while to get to you, or because the process is blocked on something else.
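
The gap between real and user+sys time can be reproduced with a command that simply waits. The following is a rough sketch, assuming a Unix-like system where Python's `resource` module is available; `time_command` is a hypothetical helper that mimics the shell's `time`, and the sleeping child is only a stand-in for a command that is not doing CPU work.

```python
import resource
import subprocess
import sys
import time


def time_command(argv):
    """Run argv and return (real, user, sys) seconds, like the shell's `time`."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(argv, check=False)
    real = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    user = after.ru_utime - before.ru_utime
    system = after.ru_stime - before.ru_stime
    return real, user, system


if __name__ == "__main__":
    # Stand-in for a waiting command: a child process that only sleeps.
    sleeper = [sys.executable, "-c", "import time; time.sleep(1)"]
    real, user, system = time_command(sleeper)
    print(f"real {real:.3f}s  user {user:.3f}s  sys {system:.3f}s")
```

The sleeping child shows the same pattern as the `time virsh list` output in the question: real time of a second or more, with user and sys times that are only a small fraction of it.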

Try running `top` to see what is keeping your machine busy. Look especially at the %CPU column to find the busiest programs; you can use "<" and ">" to change the sort column. Look also at the %MEM column to see if something is eating up a large amount of your RAM (compare with the RES column, which gives the resident set size of each process: the memory actually in use, as opposed to VIRT, which is the total virtual memory the process has mapped). In the "S" column, a lot of processes in state "D" indicates that you are I/O bound somewhere; these processes are blocked waiting for I/O.
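
The check for processes in state "D" can also be scripted instead of watching `top`. This is a minimal sketch, assuming a Linux system with `/proc` mounted; `parse_state` and `pids_in_state` are hypothetical helper names, and the state letter is read from `/proc/<pid>/stat`, which is the same information `top` shows in its "S" column.

```python
import os


def parse_state(stat_text):
    """Extract the one-letter state from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split after the last ')'.
    """
    after_comm = stat_text[stat_text.rfind(")") + 1:]
    return after_comm.split()[0]


def pids_in_state(state="D", proc_root="/proc"):
    """Return the PIDs of processes whose current state equals `state`."""
    matches = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                if parse_state(handle.read()) == state:
                    matches.append(int(entry))
        except OSError:
            continue  # the process exited while we were scanning
    return matches
```

Calling `pids_in_state("D")` a few times while the slow command is running gives a quick indication of whether anything is stuck in uninterruptible sleep; an empty list each time suggests the machine is not I/O bound.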

jlp
  • Good point, jlp. I ran top while invoking `virsh`, but there is no process that consumes a lot of CPU or memory. Actually, there's nothing running on that host, not even a domU. So the load is 0.00 all the time, and the machine has 12 CPU cores. The only thing I see is the process `xenstore` which consumes 3% of CPU when invoking `virsh`. – Daniel Dec 11 '11 at 10:41
  • I agree with @Fox that you need to find out what's on the other end of those two FDs. Do you have the full strace output that you can put in a pastebin or somewhere? – jlp Dec 13 '11 at 07:01
  • `strace` actually created two files.. The first is [here](http://pastebin.com/pjuiFUrR), the second one [here](http://pastebin.com/W2H3QLVL). I guess there's something wrong with the libvirt socket; but I have this issue on all Debian 6 boxes. – Daniel Dec 13 '11 at 07:50
  • Okay, reviewing the logs in the pastebins, the pauses occur while reading from fd 4 (these reads are in the second paste: there's a four-second pause and a three-second pause while it's reading from fd 4). Backtracking into the first paste, you can see that fd 4 is attached to a unix domain socket, /var/run/xenstored/socket (see around line 510 in the first paste). To me this indicates that there is something going on with whoever is on the other end of that socket, likely xenstored. You probably want to do some similar tracing on xenstored and see what it's doing. – jlp Dec 13 '11 at 18:33
  • Try `lsof /var/run/xenstored/socket` to check what's on the other end, and when you get the PID, use `strace -p <pid>` to attach to it. – jlp Dec 13 '11 at 18:35
  • jlp, I gave the bounty (+50) to you. The problem is not yet solved, but your answers were really helpful to me. The `strace` output from xenstored is quite big. I just did a `virsh list`, and the trace is over 1MB. Couldn't find a pastebin that allows to post that amount of data, so I've uploaded it [here](http://80.74.157.156/strace.txt). Couldn't find any errors, though. – Daniel Dec 15 '11 at 17:59
  • Well, I wouldn't necessarily expect any errors. We're really looking for places where there are unusual pauses or delays. I'll take a look at your long trace and let you know if I see anything. – jlp Dec 19 '11 at 18:02
  • Took a quick look and I don't see any obvious pauses. There are a lot of lock and unlock operations on /var/lib/xenstored/tdb*, which I suppose are expected. Perhaps there is some underlying filesystem issue on the dom0 where /var/lib/xenstored is mounted? You might also try "xenstore-control check" to check the integrity of the DB. I'm pretty much out of ideas at this point. – jlp Dec 19 '11 at 18:15
  • I did find this: http://www.devco.net/archives/2007/12/05/xen_no_space_left_on_device_sillyness.php – jlp Dec 19 '11 at 18:17
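
The `lsof` lookup suggested in the comments above can be approximated by hand when `lsof` is not installed. This is only a sketch, assuming a Linux dom0 with `/proc` mounted and root privileges to read other processes' fd directories; `inodes_for_socket` and `pids_holding` are hypothetical helper names. It reads `/proc/net/unix` to find the inodes of sockets bound to a path, then scans `/proc/<pid>/fd` for processes holding those inodes.

```python
import os


def inodes_for_socket(proc_net_unix_text, socket_path):
    """Return inode numbers (as strings) of unix sockets bound to socket_path.

    Expects the contents of /proc/net/unix, whose columns are:
    Num RefCount Protocol Flags Type St Inode Path
    """
    inodes = set()
    for line in proc_net_unix_text.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 7)
        if len(fields) == 8 and fields[7].strip() == socket_path:
            inodes.add(fields[6])
    return inodes


def pids_holding(inodes, proc_root="/proc"):
    """Return the PIDs that have any of the given socket inodes open."""
    targets = {f"socket:[{inode}]" for inode in inodes}
    owners = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            for fd in os.listdir(fd_dir):
                if os.readlink(os.path.join(fd_dir, fd)) in targets:
                    owners.add(int(entry))
        except OSError:
            continue  # no permission, or the process went away
    return sorted(owners)


if __name__ == "__main__" and os.path.exists("/proc/net/unix"):
    with open("/proc/net/unix") as table:
        found = inodes_for_socket(table.read(), "/var/run/xenstored/socket")
    print("PIDs holding the socket:", pids_holding(found))
```

Any PID it prints can then be traced with `strace -p`, as suggested in the comments. Only the side that bound the socket path is listed this way; client ends of the connection do not carry the path in `/proc/net/unix`.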