
I recently got my webhost (Hetzner) to add a pair of 16TB SATA drives to my webserver. I'm currently using 2.5TB of them. They're RAID 1 mirrored.

I also have two 4TB NVMe drives with 700GB currently on them, also RAID 1 mirrored.

Every week CentOS kicks off a cron job to run a "check" on both of my md arrays. They run concurrently; the NVMe one finishes after 5 hours, while the SATA one takes a painful 18 hours, at 200MB/sec the whole time.

    # Run system wide raid-check once a week on Sunday at 1am by default
    0 1 * * Sun root /usr/sbin/raid-check

My server is plenty powerful, with a 32-core EPYC and 128GB of RAM, but I do notice an I/O slowdown while this check is running.

  1. Is it necessary to run these weekly?

  2. 200MB/sec × 18 hours is roughly 13TB, so it's clearly walking the whole 16TB, not just the occupied space. Can this be made smarter/lazier in any way, so it only runs on the occupied space?

  3. Could this job be niced or similar? I appreciate it would take longer, but that might be preferable. See edit below.

  4. Would scripting pauses into this be a bad idea? So instead of 18 hours in one hit I could do (say) 3 hours per night?

  5. Is this a problem everyone suffers, or have I made some poor decisions? Would getting a hardware RAID card installed make me much happier, for example?

Edit

I have now discovered `/etc/sysconfig/raid-check` and changed `NICE=low` to `NICE=idle`. I guess I won't know what difference that makes until next week.
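For reference, the relevant part of the file now looks like this on my CentOS box (`NICE` accepts idle, low, normal or high; the other values shown are, as far as I can tell, the defaults):

    # /etc/sysconfig/raid-check (excerpt)
    ENABLED=yes        # must be yes for the cron job to do anything
    CHECK=check        # "check" only verifies; "repair" also rewrites mismatches
    NICE=idle          # was "low"
    CHECK_DEVS=""      # empty means check all md devices
    SKIP_DEVS=""       # space-delimited devices to skip, e.g. "md0 md3"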

Codemonkey
Use btrfs-raid1 (by using the btrfs filesystem) instead of stupid mdadm raid1. – paladin May 12 '22 at 14:06
  • Can you tell me more @paladin - why would that be better? And I assume I can't convert it in-place, I'd need to move the data to other drives first, then move back? I'm a full stack dev running my own business/server/site, I'm happy to admit this ain't my field of expertise. Hell, I don't have a field of expertise these days! – Codemonkey May 12 '22 at 14:20
  • The btrfs filesystem supports RAID at the filesystem level, while mdadm does RAID at the block level. btrfs also creates checksums of all files and all data, while mdadm doesn't; mdadm is just stupid. btrfs compares all metadata and all data against checksums and is also able to compare them with a copy (raid1 or dup copy). Should something be corrupt, only the corrupt file will be repaired; there is no need for an entire disk block-level check. But please read about btrfs first, as some functions of this filesystem are different from your usual ext4 and co. – paladin May 12 '22 at 14:52
  • You should really read more about it [here](https://btrfs.wiki.kernel.org/index.php/Main_Page). btrfs is production ready and stable to use when you use it in the right way. I'll write a small summary later. PS: you should really not use btrfs-raid5 or btrfs-raid6 mode, as those modes are experimental and highly dangerous (more dangerous than raid0). A btrfs filesystem should also always be mounted with the `noatime` mount option. – paladin May 12 '22 at 14:54

1 Answer


No, MD RAID can't be smarter than this. If you want to check only used areas, use ZFS, or perhaps BTRFS.
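For comparison, a scrub in those filesystems walks only allocated blocks, so with 2.5TB of data it reads roughly 2.5TB instead of the full 16TB. A sketch (the pool name and mount point are placeholders):

    # ZFS: verify checksums of allocated blocks only
    zpool scrub tank
    zpool status tank            # progress and any repaired errors

    # btrfs equivalent, per mounted filesystem
    btrfs scrub start /mnt/data
    btrfs scrub status /mnt/data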

A weekly check is too often. Do this on a monthly basis, or even every other month.
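If your schedule lives in `/etc/cron.d/raid-check` as on CentOS, that's a one-line change, for example to 1am on the 1st of each month:

    # Run system wide raid-check once a month instead of weekly
    0 1 1 * * root /usr/sbin/raid-check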

I don't know what this NICE really does. If it sets the I/O niceness of the `[mdX_resync]` kernel process, that's good; use idle. What you can limit is the bandwidth of the check: it's set in the `/sys/block/mdX/md/sync_speed_max` file, in kB/s. This is a virtual file, i.e. it'll be reset after a system restart.
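For example (`md0` is a placeholder; the value is in kB/s and only lasts until reboot):

    cat /sys/block/md0/md/sync_speed_max               # current ceiling, 200000 by default
    echo 100000 > /sys/block/md0/md/sync_speed_max     # cap the check at ~100 MB/s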

By the way, it's limited to 200 MB/s by default, and you seem to be hitting that limit. You may increase the speed for the SSDs (set 5000000 and see how long they take to be checked). And instead of "pausing" it for the HDDs, I'd play with the limits: during periods of high load I'd set a lower limit, and during idle time I'd set 600000, the maximum bandwidth of the SATA 6 Gb/s interface.
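A sketch of that idea as cron entries (the file name, device, hours and limits here are mine, purely to illustrate; tune them to your load pattern):

    # hypothetical /etc/cron.d/md-sync-throttle
    # let the check run at SATA line rate overnight...
    0 1 * * * root echo 600000 > /sys/block/md0/md/sync_speed_max
    # ...and throttle it hard during the busy daytime hours
    0 7 * * * root echo 50000 > /sys/block/md0/md/sync_speed_max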

I doubt a HW RAID card will make things much better.

Nikita Kipriyanov
  • A hardware RAID card will make things much better. The md checkarray command scans every sector of every disk for consistency and bit rot. This is done by the host reading every block, so it is I/O intensive and somewhat CPU intensive. With hardware RAID, these functions run within the card itself, so there's no I/O on the bus and the CPU is not involved. – doneal24 May 12 '22 at 13:51
  • Interesting, thank you. I certainly thought it odd that the NVMe check took so long; the 200MB/s limit makes sense. Although I would LIKE to run the job less often, I believe that Debian opts for monthly and RHEL weekly. Who's to say which is correct... can you flesh out why you believe weekly to be "too often"? – Codemonkey May 12 '22 at 13:57
  • Additionally, do you know at what point raid-check will re-load the conf file? Or how to make it do so? I've tried idling the checks (`echo idle > /sys/devices/virtual/block/mdX/md/sync_action`) and then starting again but that doesn't seem to do it. (I've set `MAX_CONCURRENT=1` and it's happily doing both at the same time right now) – Codemonkey May 12 '22 at 14:05