How often you should scan depends on several things:
- Age of the disks. The older they are, the more likely they are to contain evil.
- The original quality of the disks in question. Stuff sold as 'enterprise' is more likely to last error-free, and the 1+ TB disks of 2014 are a lot more reliable than their 2009 equivalents were when they shipped.
- How sensitive your production I/O is to the scrubbing I/O.
- How much of your dataset you consider to be your working set.
Hardware RAID vendors often include a background scrub process for this very reason. Some even let you tune the I/O priority of the scrubbing process, which lets you avoid (or greatly reduce) the production I/O penalty for a scrub. Of course, if the scrub priority is low and your prod I/O runs the disks mostly flat out, the scrub will probably never complete, and you won't notice until you get a failure.
Unfortunately, I don't know whether the Linux kernel deprioritizes scrubbing I/O. Either way, it's a good idea to test it with your prod loads to be sure any performance hit is acceptable. If it is acceptable, good! If it isn't, you get to choose between adding spindles to allow scrub+prod I/O and accepting the risk of possible array failures down the road.
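One thing that can be checked directly, assuming the array is Linux md software RAID (an assumption; none of this applies to hardware RAID or other volume managers): md exposes throttle settings for resync and check I/O in /proc/sys/dev/raid/speed_limit_min and speed_limit_max, and reports what the array is currently doing in /sys/block/<md device>/md/sync_action. A minimal sketch for reading them before and during a test scrub follows. The function names and the directory parameters are made up for illustration.

```python
from pathlib import Path


def read_md_speed_limits(proc_raid_dir="/proc/sys/dev/raid"):
    """Return (speed_limit_min, speed_limit_max) in KiB/s per device.

    Assumes Linux md software RAID; the directory is a parameter only
    so the function can be pointed at a test fixture.
    """
    base = Path(proc_raid_dir)
    low = int((base / "speed_limit_min").read_text().strip())
    high = int((base / "speed_limit_max").read_text().strip())
    return low, high


def read_sync_action(md_name, sys_block_dir="/sys/block"):
    """Return the array's current sync_action, e.g. 'idle' or 'check'."""
    path = Path(sys_block_dir) / md_name / "md" / "sync_action"
    return path.read_text().strip()
```

Calling read_sync_action("md0") while a scrub is running should report check, and idle once it has finished. Whether the configured limits protect your production I/O well enough is still something only a test under real load will tell you.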
Another thing that impacts scrubbing frequency is I/O usage pattern. If the production loads only hit a minority of the disks, the only I/O that would normally find a bad block in the idle portion would be your scrub; in that case you want to scrub more often. If your production loads routinely read the whole disk-set (like daily full backups), then production I/O is going to stumble across problems sooner and you can scrub less often.
A good plan of action would be:
- Run some tests to see if scrubbing will get in the way of production.
- Figure out how long a full scrub takes while you're at it.
- Figure out what percentage of your disk-set will get multiple accesses in a given week (include backup I/O, if any, in this calculation).
- Based on those results, decide whether you're in the less-often or more-often camp.
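The two measurements in that plan come down to simple arithmetic. The sketch below is illustrative only: it assumes you already know the size of a member disk and a sustained scrub rate from your own test, and that you have some per-region access counts from whatever I/O tracing you use. The function names and the shape of the input data are hypothetical.

```python
def scrub_hours(member_bytes, scrub_bytes_per_second):
    """Estimate hours for one full pass over a member disk.

    Assumes a constant scrub rate, which a busy array will not give you;
    treat the result as a lower bound.
    """
    return member_bytes / scrub_bytes_per_second / 3600.0


def repeat_access_fraction(access_counts, min_accesses=2):
    """Fraction of regions accessed at least min_accesses times in the week.

    access_counts is a hypothetical list with one weekly access count
    per fixed-size region of the disk-set (backup I/O included).
    """
    if not access_counts:
        return 0.0
    hot = sum(1 for count in access_counts if count >= min_accesses)
    return hot / len(access_counts)
```

For example, scrub_hours(4e12, 100e6) estimates a 4 TB member disk at a sustained 100 MB/s, which works out to a little over 11 hours. A low repeat_access_fraction means most of the disk-set is only ever read by the scrub, which puts you in the more-often camp.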
Once you have that data...
- If a full scan takes under a day and doesn't impact production noticeably, you can go as often as once a week.
- If a full scan takes under a day and does impact production, figure out what part of your week/month is least affected and try to run it then.
- If a full scan takes over a day but under a week and doesn't impact production, run it as often as every other week, or as rarely as once every other month.
- If a full scan takes over a day but under a week and does impact production, consider adding resources to allow it to be run, requiring scans to be run during arranged maintenance windows, or taking advantage of the idle/check ability of scrubbing to do it in fits and starts continually.
- If a full scan takes over a week, once a month is often enough. But if it impacts production, you will need to add resources to allow it to complete.
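The rules of thumb above can be written down as a small lookup, which is handy if you want to record the decision next to your measurements. This is only a restatement of the list, not a policy engine. The function name is hypothetical, and the handling of a scan that takes exactly a day or exactly a week is an assumption, since the list doesn't say.

```python
HOURS_PER_DAY = 24
HOURS_PER_WEEK = 7 * 24


def recommend_scrub(scan_hours, impacts_production):
    """Map a measured scan time and production impact to a rule of thumb.

    Assumption: exactly 24 h falls in the 'over a day' bucket and
    exactly 168 h falls in the 'over a week' bucket.
    """
    if scan_hours < HOURS_PER_DAY:
        if impacts_production:
            return "run it in the least-affected part of the week or month"
        return "as often as once a week"
    if scan_hours < HOURS_PER_WEEK:
        if impacts_production:
            return ("add resources, use maintenance windows, "
                    "or scrub in fits and starts")
        return "every other week to once every other month"
    if impacts_production:
        return "once a month, and add resources so the scan can complete"
    return "once a month"
```

For instance, recommend_scrub(11, False) returns "as often as once a week", matching the first bullet.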