
I believe everyone who works with virtualized environments such as Xen has experienced some sort of I/O problem that forces a guest machine into read-only mode. We run into this a couple of times a year, and the I/O problems do not always affect every VM, which is interesting because the storage was unavailable to all of them.

This raised a question in my mind: can I configure a Linux guest that is I/O-error proof?

As a bit of research, I compared problematic and non-problematic Linux machines and saw that in their "fstab" files one has "errors=remount-ro" and the other has "defaults". I believe this is the only difference between them.

But since the storage is not there at that moment, can the "defaults" option harm Linux? How can I build an I/O-error-proof Linux guest that will not crash when the storage is gone? What happens when there is no storage for 1 minute and the mount option is "defaults"? Should I use a mount option other than "defaults"?

A related thread that I think is asking about the same thing: noatime & nodiratime

Harun Baris Bulut

1 Answer


If you have I/O errors on the host machines for whatever reason, you really shouldn't try to ignore them and carry on. Stopping on errors is an absolutely fundamental safeguard against losing your data.

Instead of messing around with the VMs, build a system where the backing store doesn't disappear several times a year. I have never experienced anything like this; it is not normal and indicates massive problems in your setup.

If the issue is failover time, it should help to increase the timeout for the device so that it waits longer before generating a failure.

This has to be done on the host machine:

echo 180 > /sys/block/_devicename_/device/timeout

and if you use IDE disk emulation, on the guest as well.

You can add this information to your udev rules or add it to rc.local to make it persistent across reboots.
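As a rough sketch of the rc.local approach, the function below writes the same timeout to every block device that exposes a `device/timeout` attribute. The function name `set_disk_timeouts` and its optional second argument (an alternative sysfs root, handy for a dry run against a scratch directory) are made up here for illustration; only the `/sys/block/<device>/device/timeout` path comes from the command above.

```shell
#!/bin/sh
# Hypothetical helper: set the command timeout on all disks that have one.
#   $1 = timeout in seconds (e.g. 180)
#   $2 = sysfs root, defaults to /sys (assumption: only useful for dry runs)
set_disk_timeouts() {
    timeout_secs="$1"
    sysfs_root="${2:-/sys}"
    for f in "$sysfs_root"/block/*/device/timeout; do
        # Skip devices without a writable timeout attribute
        # (this also covers the case where the glob matches nothing).
        [ -w "$f" ] || continue
        echo "$timeout_secs" > "$f"
    done
}

# Example call from rc.local (must run as root):
# set_disk_timeouts 180
```

Note that this only covers devices present when the script runs; disks attached later would need a udev rule instead.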

Note: 180 seconds is often recommended as a timeout value by storage vendors. I would still be worried if this happens multiple times a year.

Sven
  • What about this scenario: you have storage with two controllers, such as EqualLogic, and when one controller goes down the second takes 10 to 40 seconds to boot up. What about that? There is no storage connection for at least 10 seconds. – Harun Baris Bulut Aug 31 '14 at 11:03
  • Okay, now this is the kind of answer I was looking for, thank you very much. Unfortunately this happens with brand-new products; you might think we are living under Murphy's law. – Harun Baris Bulut Aug 31 '14 at 12:23