
Linux systems sometimes remount the root file system as read-only, e.g. if there's an I/O error.

I have a machine that becomes useless when this happens, and I end up rebooting it manually.

Is there a way to make Linux just automatically reboot when this happens? A read-only mount is useless to me.

Asked by user541686, edited by Peter Mortensen
  • I'd also investigate the source of these I/O errors. The last time an ext2 filesystem went read-only for me was in 1994, and the cause could be traced to a broken CPU fan. – Simon Richter Nov 23 '21 at 15:35
  • You have an [XY problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem) here. The correct solution is not to make the system reboot on an IO error (the accepted answer explains how to do that, _but_ that’s actually rather risky for multiple reasons), it’s to _fix the root cause of the IO errors_, because then the filesystem will not randomly get mounted read-only. If it’s only intermittent and the storage device is good, you probably have suspect RAM or a flaky PSU, both of which can cause much bigger issues than a simple filesystem error. – Austin Hemmelgarn Nov 23 '21 at 21:27
  • @AustinHemmelgarn: I don't have an XY problem here. You're just making a lot of assumptions that don't hold true in the case(s) I'm asking about. – user541686 Nov 24 '21 at 00:35
  • @SimonRichter: I indeed have tried looking into the cause, but thanks for the reminder, others should probably do that before rebooting. – user541686 Nov 24 '21 at 00:39
  • I just realized somehow I posted this on ServerFault rather than Unix.SE as I had intended to! Glad it's still on-topic I guess, but feel free to migrate if needed. – user541686 Nov 24 '21 at 00:45
  • Rebooting rather than sorting out the reason for the R/O remount has a high likelihood of making the problem *worse*, especially if it fails to mount the system on reboot and you're now stuck with an entirely unresponsive system. – Shadur Nov 24 '21 at 15:02
  • @user541686 You have random IO errors. That _will_ cause other problems eventually (and trust me, they will be much more of a pain to fix than just rebooting the system), hence my assertion that this is an XY problem. The fact that you do not recognize the X as a problem does not make it any less of an XY problem. – Austin Hemmelgarn Nov 24 '21 at 17:34
  • @AustinHemmelgarn: I'm well aware of what's going on in my situation and why I resorted to this solution. Unfortunately you're not. The fact that you don't recognize you're still making unfounded assumptions about my situation doesn't make you more correct, but admittedly I can't stop you from lecturing. – user541686 Nov 24 '21 at 18:34
  • @Shadur: I fully understand all that, believe it or not. Nobody is saying this solution should be used in every situation. I'm just telling you I have **a** situation where this solution makes sense. If you can't imagine why, that's fine. Just have faith in me that I'm not stupid and that I'm only asking this because there's information I have that you don't. – user541686 Nov 24 '21 at 18:38
  • @user541686, if there's relevant information, provide it. Don't just say "trust me". – Mark Nov 24 '21 at 21:13
  • @Mark: No, I won't provide irrelevant info. It's quite literally nobody's business what situation I'm dealing with that gave rise to this question. If you would rather believe it's out of my stupidity, feel free to continue believing that; don't feel obligated to "trust me". It's not like I can force you. – user541686 Nov 25 '21 at 02:31
  • @user541686 You're papering over I/O errors on the root filesystem with a reboot, on the ***HOPE*** that your system will return to operational status. You're coming across as someone who thinks they know everything but in reality is just smart enough to be dangerous. You may think you know why you're getting IO errors, but what happens **when** you get one that's not like you think? You get a dead system that you can't access. "I know what's going on!" doesn't provide any limits as to what **can** go on - the universe doesn't care about what you think you know. – Andrew Henle Nov 25 '21 at 12:23
  • @Mark (and others) _"... if there's relevant information, provide it. Don't just say trust me."_ I don't think it's worth barking up the XY tree here. First of all, the question as it stands (panic/reboot instead of remounting ro) is a perfectly valid and answerable question. Secondly, the OP seems well aware that I/O errors are, ahem, not ideal, and has now explicitly declared that area off-topic. Sadly, sometimes there's just nothing you can do to fix the root cause _right now_, and a workaround is needed. With that in mind, I don't think we're in a place to demand OP provide more context. – marcelm Nov 25 '21 at 16:20

2 Answers


I deduce you are using ext3 or ext4 as the file system. If so, you can mount it with the `errors=panic` option and configure `watchdog` to reboot your system in case a panic happens.

While more complex than roelvanmeer's answer (which I upvoted), this has the added bonus of working for all panic-level kernel crashes.

As suggested by NikitaKipriyanov, setting the `panic=5` kernel boot option, which reboots the machine 5 seconds after a panic, can be a simpler alternative to `watchdog` (which has more configuration options but is slightly more complex as a result).
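
A minimal sketch of both steps, assuming an ext4 root filesystem and a root shell. The device name `/dev/sdXN` and the UUID are placeholders, not values from this answer; the sysctl file name is the one suggested in the comments. It uses the `kernel.panic` sysctl instead of `watchdog`, which would need its own configuration.

```sh
# 1) Make ext3/ext4 panic instead of remounting read-only on errors.
#    Either add errors=panic to the options of the root entry in /etc/fstab,
#    e.g. (placeholder UUID):
#      UUID=<root-uuid>  /  ext4  defaults,errors=panic  0  1
#    or store it as the default error behavior in the superblock
#    (/dev/sdXN is a placeholder for your root device; see tune2fs(8)):
tune2fs -e panic /dev/sdXN

# 2) Reboot automatically 5 seconds after any kernel panic.
echo 'kernel.panic = 5' > /etc/sysctl.d/panic-reboot.conf
sysctl -p /etc/sysctl.d/panic-reboot.conf
```

Afterwards, `cat /proc/sys/kernel/panic` should print `5`, and `tune2fs -l /dev/sdXN` should report `Panic` on its "Errors behavior" line if you used the superblock route.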

Answered by shodanshok, edited by Peter Mortensen
  • Alternative to watchdog might be adding something like `kernel.panic = 5` into `/etc/sysctl.d/panic-reboot.conf`. – Nikita Kipriyanov Nov 23 '21 at 09:51
  • Thank you! I'll give this a shot. Hopefully it won't [fail to reboot](https://forums.debian.net//viewtopic.php?f=5&t=102033)! – user541686 Nov 23 '21 at 10:07
  • @NikitaKipriyanov good suggestion, I'll edit my answer. Thanks. – shodanshok Nov 23 '21 at 10:45
  • warning: probable reboot loop – joshudson Nov 23 '21 at 15:54
  • @joshudson: Yeah I'm planning to watch out for that, that's definitely an important warning for anyone trying this. – user541686 Nov 24 '21 at 00:38
  • @joshudson If it reboots at all. Relying on a system that knows its root filesystem might be corrupt and/or its root disk broken to reboot is based on wishful thinking and unicorns. – Andrew Henle Nov 25 '21 at 12:25
  • @AndrewHenle: I've brought a lot of systems up with a trashed root filesystem. Usually I can take over the boot process and get fsck to run, because the damage rarely hits `/sbin` or files that haven't changed in a while. – joshudson Nov 25 '21 at 16:13
  • @joshudson You hope... ;-) My thoughts here are based on the idea that trying to soldier on when your root filesystem device is tossing IO errors is a misguided effort in the first place and throwing in a reboot only makes significant issues more likely - "My root device is going bad, so let's do something that ***really*** depends on the root device being fully functional and having proper access to the bulk of the filesystem!" – Andrew Henle Nov 25 '21 at 16:24

Maybe not a very pretty solution, but my first thought would be to run a command from cron every minute:

test -w / || reboot
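
One way to wire that up, assuming a cron implementation that reads `/etc/cron.d`, where each entry carries a user field. The file name `root-ro-reboot` is hypothetical. The full path to `reboot` is used because cron typically runs jobs with a minimal `PATH` such as `/usr/bin:/bin`, which may not include the directory `reboot` lives in.

```sh
# Run as root. Every minute, check whether / is writable and reboot if not.
# /sbin/reboot is an assumption: use the path printed by `command -v reboot`.
echo '* * * * * root test -w / || /sbin/reboot' > /etc/cron.d/root-ro-reboot
```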
Answered by roelvanmeer
  • +1 thanks, this'll be a great fallback if the other solution fails! – user541686 Nov 23 '21 at 10:07
  • I think it is not guaranteed that `test -w` checks whether the filesystem is read-write, though GNU `test` and the `test` built into `bash` seem to do that. Here you can see what a POSIX-compliant `test` should do (https://pubs.opengroup.org/onlinepubs/9699919799/utilities/test.html#tag_20_128_05); as I understand it, `test` is only required to check the access rights of the file. – pabouk - Ukraine stay strong Nov 23 '21 at 18:19
  • In which case `tee -a /root/.bash_history < /dev/null || reboot` will work. – joshudson Nov 23 '21 at 19:15
  • @joshudson Even simpler: `touch /writecheck || reboot` – shodanshok Nov 23 '21 at 20:38
  • @shodanshok That's great if you don't mind a file called `/writecheck` lying around at the root of your filesystem, since it'll be created when the filesystem _isn't_ read-only. The other proposed methods were attempting to avoid creating any spurious empty files. (Though if pabouk is right — which I'm 50/50 on, personally — actually-creating a file may be unavoidable, in order to fully determine the filesystem's read-only state.) – FeRD Nov 23 '21 at 22:07
  • Another question about the problem of testing write access: [How to non-invasively test for write access to a file?](https://unix.stackexchange.com/q/159557/19702) – pabouk - Ukraine stay strong Nov 24 '21 at 10:49
  • @shodanshok that could lead to unexpected reboots, or reboot loops, from error conditions unrelated to filesystem errors, e.g. temporary upsets of the libc installation, OOM conditions, or anything else that could make `touch` fail. – rackandboneman Nov 24 '21 at 19:23
  • @rackandboneman sure, but *any* script with `|| reboot` is subject to these issues. Moreover, if `touch` fails on your system due to libc issues, you probably have worse problems than a reboot loop. Anyway, as stated in my answer, `watchdog` is the way to go for more advanced needs. – shodanshok Nov 24 '21 at 20:07
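
A minimal sketch that sidesteps the `test -w` and `touch` concerns raised above, assuming Linux `/proc/mounts` and a POSIX `sh` with `awk`: instead of probing write access, it asks the kernel whether `/` is mounted with the `ro` option, so it creates no files and does not depend on how `test` interprets `-w`. The script path, the `root_is_readonly` function, and the `--reboot` flag are hypothetical names chosen for this example.

```sh
#!/bin/sh
# Hypothetical helper, e.g. saved as /usr/local/sbin/root-ro-check.

# root_is_readonly [MOUNTS_FILE]
# Succeeds (exit status 0) if the last "/" entry in MOUNTS_FILE
# (default: /proc/mounts) carries the "ro" mount option.
root_is_readonly() {
    mounts_file=${1:-/proc/mounts}
    # Field 2 is the mount point, field 4 the comma-separated options.
    # "/" can appear more than once (e.g. an early rootfs entry), so keep
    # the last match, which is normally the mount visible at "/".
    opts=$(awk '$2 == "/" { o = $4 } END { print o }' "$mounts_file")
    # Wrap in commas so "ro" only matches as a whole option, not inside
    # something like "errors=remount-ro".
    case ",$opts," in
        *,ro,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Only act when explicitly asked to, so the file can be sourced safely.
if [ "${1:-}" = "--reboot" ] && root_is_readonly; then
    exec /sbin/reboot
fi
```

Installed and made executable, the cron entry would become `* * * * * root /usr/local/sbin/root-ro-check --reboot`. Like any `|| reboot` approach, it cannot tell an error-triggered remount from a read-only mount made on purpose. On systems with util-linux, `findmnt -no OPTIONS /` prints a similar option string.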