I have a two-node GlusterFS setup with replica 2. I plan to use it as the OpenStack instance store, where the VM disk images are kept.
From my tests, if the GlusterFS node that the hypervisor currently mounts from fails, it takes about 45 seconds (with default GlusterFS settings) for the connection to time out and the GlusterFS client to fail over to the other node. During these 45 seconds, I/O operations hang; from the VM's perspective, the disk becomes unresponsive.
I know that on Linux, if the disk becomes unresponsive, after some time (I'm not sure how long) the kernel will remount the filesystem read-only.
I can also lower the GlusterFS volume's network.ping-timeout value, which will reduce the failover time.
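For concreteness, here is a minimal sketch of how I would apply that setting from a script. The volume name `vmstore` and the value of 10 seconds are placeholders of mine, not recommendations, and the helper names are hypothetical; the only real piece is the `gluster volume set <volume> network.ping-timeout <seconds>` command that they wrap.

```python
import subprocess


def build_ping_timeout_cmd(volume: str, seconds: int) -> list[str]:
    """Return the argv for `gluster volume set <volume> network.ping-timeout <seconds>`."""
    if seconds <= 0:
        raise ValueError("ping timeout must be a positive number of seconds")
    return ["gluster", "volume", "set", volume, "network.ping-timeout", str(seconds)]


def set_ping_timeout(volume: str, seconds: int) -> None:
    """Apply the setting. Needs the gluster CLI and sufficient privileges on a pool node."""
    subprocess.run(build_ping_timeout_cmd(volume, seconds), check=True)
```

Calling `set_ping_timeout("vmstore", 10)` on one of the GlusterFS nodes would then lower the timeout for that (hypothetical) volume.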
My question is: what should I set this value to so that most operating systems can tolerate the period during which the virtual disk is unresponsive, without side effects?
To be more precise, I would like to know how long a disk can stay unresponsive before Windows (NTFS), FreeBSD (UFS/ZFS) and Linux (ext4) run into trouble. What parameters are involved (for example, /sys/block/sda/device/timeout on Linux)?
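As an illustration of the Linux parameter I mean, this is a small sketch that reads the timeout from sysfs inside a guest. The function name and the `sysfs_root` argument are my own (the latter only makes the function easy to try against a fake directory tree), and I am assuming the file holds a plain integer number of seconds. The file may not exist for every device type, in which case the function returns `None`.

```python
from pathlib import Path
from typing import Optional


def read_disk_timeout(device: str = "sda", sysfs_root: str = "/sys") -> Optional[int]:
    """Return the command timeout in seconds for a block device, or None if not exposed."""
    # Builds e.g. /sys/block/sda/device/timeout
    path = Path(sysfs_root) / "block" / device / "device" / "timeout"
    try:
        return int(path.read_text().strip())
    except FileNotFoundError:
        # The device does not exist or does not expose a timeout file.
        return None
```

Running `read_disk_timeout("sda")` in the guest gives the value I would want to compare against network.ping-timeout.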
Update: @the-wabbit has answered for Linux and Windows; I would also like to know how FreeBSD behaves.