I'm running into a problem with some disks on an Ubuntu system: they keep going into read-only mode. We have systems running in 5 different ski resorts, and this problem has now occurred in 3 of them.

The machines process movies and run a web server and some other basic services. All machines used to have regular SATA disks. We have now installed SSDs in two machines; so far these haven't gone read-only, and they have been running for weeks. Some resorts have a lot of traffic and data to process and some have less. We haven't been able to establish a relationship between load, the problem, and the type of disk.

The video processing system is not ours, and we rely on a partner to keep it running. They claim there is no problem with the OS. We are responsible for the hardware and for the system as a whole. The problem with this is that I can't log in to check any logs, and the partner won't give me access. What we would like to know is what causes a disk to enter read-only mode, so we can take measures to fix it.

Things we have done so far:

  • Change disks on two machines to SSD => now running well
  • Improve/fix wiring to ensure stable power supply
  • Run memory test on machine with this problem => no problems found
  • Replace a broken CPU on one machine that had this problem. The problem came back after we changed the CPU, so we also replaced the disk with another regular one, since we had run out of SSDs. It has now been running well for 18 hours.

I ran into this thread claiming this could be a kernel bug. Any comments on that?

I will be running checks on one of the replaced disks today.

Happy with all feedback! - Abel

Abel

2 Answers

My experience with this is as follows:

When Linux puts a disk into read-only mode, it is trying to prevent further damage. Chances are the kernel has detected something wrong with the drive, or with the filesystem on it, and is trying to preserve the data for you.
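To confirm which filesystems have actually been switched to read-only, the mount table can be inspected. Below is a minimal Python sketch, assuming someone with shell access can read `/proc/mounts` on the affected machine; the function name `read_only_mounts` is made up for illustration.

```python
from typing import List, Tuple


def read_only_mounts(mounts_text: str) -> List[Tuple[str, str]]:
    """Return (device, mount point) pairs whose mount options include 'ro'.

    Expects text in the /proc/mounts layout:
    device, mount point, filesystem type, comma-separated options, ...
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, _fstype, options = fields[:4]
        # Match the exact option 'ro', not substrings such as
        # 'errors=remount-ro'.
        if "ro" in options.split(","):
            found.append((device, mount_point))
    return found
```

On the machine itself, one would pass in the contents of `/proc/mounts`, for example `read_only_mounts(open("/proc/mounts").read())`. Some filesystems are mounted read-only on purpose, so only entries for the data or root disks are interesting here.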

You should back up the data on it while you can, then run badblocks and smartctl on the disk to see whether there are any issues.
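As a rough sketch of those checks, the Python wrapper below runs `smartctl -H` (overall health), `smartctl -A` (SMART attributes) and `badblocks -sv` (a read-only surface scan with progress output). The device path passed in is a placeholder, the helper names are made up for illustration, and the commands normally need root.

```python
import shutil
import subprocess
from typing import List


def build_check_commands(device: str) -> List[List[str]]:
    """Return the diagnostic commands to run against one disk."""
    return [
        ["smartctl", "-H", device],    # overall SMART health verdict
        ["smartctl", "-A", device],    # vendor-specific SMART attributes
        ["badblocks", "-sv", device],  # read-only scan (the default mode)
    ]


def run_checks(device: str) -> None:
    """Run each command in turn, skipping tools that are not installed."""
    for cmd in build_check_commands(device):
        if shutil.which(cmd[0]) is None:
            print(f"{cmd[0]} is not installed, skipping")
            continue
        print("$", " ".join(cmd))
        # check=False: a non-zero exit status is itself a finding here,
        # so it should not raise an exception.
        subprocess.run(cmd, check=False)
```

A call would look like `run_checks("/dev/sdX")`, with `/dev/sdX` replaced by the disk under test. A badblocks scan of a large disk can take hours, so it is best run on a disk that has already been pulled from service.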

Mike

"They claim there is no problem with the OS. We are responsible for hardware and the entire system. The problem with this is that I can't login to check any logs and the partner won't give me access."

You should ask your partner for a copy of the logs so that you can diagnose possibly failing hardware. If your partner is unwilling to help, then they are your adversary, not your partner, and you should look for a real partner. </soapbox>
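Once you have a copy of the logs, a quick filter can pull out the lines worth reading first. This is a minimal Python sketch; the pattern list is an assumption about what disk trouble typically looks like in a kernel log (I/O errors, ext4 errors, read-only remounts, SATA link resets) and should be adjusted to the filesystem and kernel in use.

```python
import re
from typing import Iterable, List

# Illustrative patterns only; extend or replace them as needed.
SUSPECT_PATTERNS = [
    r"I/O error",
    r"Remounting filesystem read-only",
    r"EXT4-fs error",
    r"hard resetting link",
]
_SUSPECT_RE = re.compile("|".join(SUSPECT_PATTERNS), re.IGNORECASE)


def suspicious_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that match any of the suspect patterns."""
    return [line.rstrip("\n") for line in lines if _SUSPECT_RE.search(line)]
```

An open file object can be passed straight in, since iterating over it yields lines. The timestamps on the matching lines are what you want to compare against the times the disks went read-only.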

It is most likely a hardware problem: the hardware may fail under load (i.e., it sucks), it may be faulty and need replacing, or there may be loose connections in the computers.

A possible avenue of mitigation is to buy better hardware, including RAID and a UPS.

Mark Wagner