
I have a two-node, two-replica GlusterFS setup. I am planning to use it as an OpenStack instance store, where the VM disk images are stored.

From my tests, if the GlusterFS node that the hypervisor currently mounts from fails (using default GlusterFS settings), it takes about 45 seconds for the connection to time out and for the GlusterFS client to fail over to the other node. During those 45 seconds I/O operations hang, which from the VM's perspective means the disk becomes unresponsive.

I know that on Linux, if the disk becomes unresponsive, the kernel will remount the filesystem read-only after some time (I'm not sure how long).
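From what I can tell, on ext4 this is governed by the filesystem's configured error behavior rather than a fixed time: the remount happens when the kernel actually reports an I/O error. A minimal sketch of checking and setting it, assuming /dev/sda1 is the backing device:

```
# Show ext4's configured reaction to I/O errors ("Errors behavior" field):
tune2fs -l /dev/sda1 | grep -i 'errors behavior'

# Remount read-only on the first detected filesystem error:
tune2fs -e remount-ro /dev/sda1
# (equivalently, mount with -o errors=remount-ro)
```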

I can also lower the GlusterFS volume's network.ping-timeout value, which reduces the failover time.
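For illustration, lowering it would look something like this (the volume name vmstore and the 10-second value are placeholders):

```
# Lower the client-side ping timeout on the volume (value in seconds):
gluster volume set vmstore network.ping-timeout 10

# Confirm the change under "Options Reconfigured":
gluster volume info vmstore
```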

My question is: how low should I set this value so that most operating systems can tolerate the virtual disk being unresponsive for that long without side effects?

To be more precise, I would like to know how long an unresponsive disk can be tolerated by Windows NTFS, FreeBSD UFS/ZFS and Linux ext4. What are the parameters involved? (for example, /sys/block/sda/device/timeout on Linux)

Update: @the-wabbit has answered for Linux and Windows; I would also like to know about the FreeBSD case.

– Pellaeon

2 Answers


The disk driver will typically wait until a configurable timeout is exceeded before even reporting an error for the requested operation.

As you have found out, this is /sys/block/<devicename>/device/timeout in Linux and defaults to 30 seconds.
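A quick sketch of inspecting and lowering it (sda is an example device; the echo only lasts until reboot):

```
# Current SCSI command timeout for the device, in seconds:
cat /sys/block/sda/device/timeout

# Lower it to 10 seconds for this boot (not persistent; use a udev rule
# or boot script to make it stick):
echo 10 > /sys/block/sda/device/timeout
```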

Windows stores this configuration as a global setting, TimeoutValue (REG_DWORD), in HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Disk with a 60-second default.
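From an elevated command prompt it can be queried and set roughly like this (the value may be absent, in which case the 60-second driver default applies; a reboot is needed for changes to take effect):

```
:: Query the current disk timeout (fails if the value was never set):
reg query HKLM\System\CurrentControlSet\Services\Disk /v TimeoutValue

:: Set it to 60 seconds explicitly:
reg add HKLM\System\CurrentControlSet\Services\Disk /v TimeoutValue /t REG_DWORD /d 60 /f
```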

As long as no error is reported upstream, you will see no immediate action (like a read-only remount of the FS). Even after the timeout is up, you will typically see more error-handler action (logging, resetting the device, etc.) before an error is passed back to the upper layer.

But be aware that there will be other implications affecting overall availability.

  • applications or system services might implement their own timeouts and throw exceptions on expiration
  • on servers with a high request turnaround, you will see queues fill up and memory become exhausted as new clients keep submitting requests while the old requests are still waiting for the storage to respond.
  • if you happen to have swap space on the failed device, all page-in / page-out requests will stall, effectively blocking the processes working on those memory pages.

In general, you will want to keep the failover time as low as possible while still avoiding premature failovers due to occasional load spikes or network glitches. Determining the right value for your specific use case is very much trial-and-error work over a prolonged period of operation. For general-use server VMs I would aim for something on the order of 10 seconds, if feasible and supported by your infrastructure.

– the-wabbit

    Strange, I spot checked a couple of Linux boxes and found the timeout set to 30 seconds. I don't recall having changed this, though it's possible that tuned did it. – Michael Hampton Sep 18 '15 at 09:49
  • 1
    @MichaelHampton no, you are right - [the default seems to be 30 seconds](http://www.docs.cloudbyte.com/knowledge-base-articles/customizing-disk-timeout-values-for-ha-settings-on-various-operating-systems/) – the-wabbit Sep 18 '15 at 09:53
  • Are there other parameters involved? Because `/sys/block/<devicename>/device/timeout` doesn't seem to have anything to do with the filesystem, I believe there should be some other related filesystem parameters? – Pellaeon Sep 19 '15 at 02:01

FreeBSD has geom_mountver (https://www.freebsd.org/cgi/man.cgi?gmountver), which can be used to make it tolerate any failover time. If you're using ZFS, you might need to disable the deadman timer; it will panic the box if an I/O doesn't complete within 15 minutes (IIRC).
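A rough sketch of both knobs (da0 is an example disk; the sysctl name assumes a reasonably recent FreeBSD):

```
# Wrap the disk in a mountver provider and use /dev/da0.mountver instead:
kldload geom_mountver
gmountver create da0

# Disable the ZFS deadman timer so a long stall does not panic the box:
sysctl vfs.zfs.deadman_enabled=0
```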