How likely am I to even encounter actual data corruption making files unreadable? How?
Obviously, given infinite time you're certain to encounter it.
Realistically though, it's still pretty likely unless you have very expensive enterprise-grade hardware, and even then it's not hugely unlikely.
More likely though, you'll encounter data corruption that just changes file contents but doesn't make them unreadable (unless you've got insane numbers of tiny files, simple statistics means you're more likely to have corruption in file data than in file metadata). When this happens, you can get all kinds of odd behaviors, just like with bad hardware (though usually more consistent and localized than bad hardware). If you're lucky, it's some non-critical data that gets corrupted, and you can easily fix things. If you're moderately unlucky, you have to rebuild the system from scratch. If you're really unlucky, the corruption happened to hit critical data in a production system, and your service is now down while you rebuild the whole thing from scratch and try to put the database back the way it should be, which can get expensive enough to sink the business.
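To put some rough numbers on it, here's a back-of-envelope sketch. The unrecoverable-read-error (URE) rate below is the commonly quoted consumer spec-sheet figure of about 1 per 10^14 bits read; it's an assumption here (check your drive's datasheet), real-world rates vary, and this ignores other corruption sources such as bad RAM, cables, or firmware:

```python
# Back-of-envelope estimate: expected unrecoverable read errors (UREs)
# when reading a given amount of data from a drive.
# The URE rate is a typical consumer spec-sheet value, assumed here.

URE_PER_BIT = 1 / 1e14          # ~1 unrecoverable error per 10^14 bits read
BITS_PER_TB = 8 * 1e12          # 1 TB = 8e12 bits (decimal TB)

def expected_errors(tb_read: float) -> float:
    """Expected number of unrecoverable read errors for tb_read terabytes."""
    return tb_read * BITS_PER_TB * URE_PER_BIT

# Reading a full 12 TB drive once:
print(expected_errors(12))            # ~0.96, roughly one error per full read

# A home NAS re-reading 12 TB monthly for 5 years (720 TB total):
print(expected_errors(12 * 12 * 5))   # ~57.6 expected errors over that period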
Short answer: data corruption is likely enough that even home users should be worrying about it.
Can Ext4 or the system file manager already detect data errors on copy/move operations, making me at least aware of a problem?
Ext4 is notoriously bad on this point. Its default behavior when running into an internal consistency error is to mark the filesystem for a check on the next remount, and then continue as if nothing is wrong. I've lost whole systems in the past because of this behavior.
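You can at least inspect and tighten what ext4 does on error. A minimal sketch, assuming an ext4 filesystem on /dev/sda1 (an example device) and root privileges; it reads the superblock's "Errors behavior" field with `tune2fs -l` and shows the command to switch it to remounting read-only:

```python
# Inspect (and optionally tighten) ext4's on-error behavior via tune2fs.
# 'remount-ro' at least stops further writes when ext4 hits an internal
# inconsistency, instead of continuing as if nothing happened.
import subprocess

DEV = "/dev/sda1"   # example device, adjust to your filesystem

def error_behavior(dev: str) -> str:
    """Return the 'Errors behavior' field from the ext4 superblock."""
    out = subprocess.run(["tune2fs", "-l", dev],
                         capture_output=True, text=True, check=True).stdout
    for line in out.splitlines():
        if line.startswith("Errors behavior:"):
            return line.split(":", 1)[1].strip()
    return "unknown"

print(error_behavior(DEV))
# To switch to read-only remount on errors (uncomment to actually apply):
# subprocess.run(["tune2fs", "-e", "remount-ro", DEV], check=True)
```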
More generically, in most cases the best you can hope for from a filesystem not specifically designed to verify its data is for it to remount read-only if it runs into an internal error in its own data structures or file metadata. The thing is, unless the filesystem specifically verifies its own internal structures beyond simple things like bounds checking, this won't catch everything; things will just go wrong in odd ways.
To get anything more, you need the filesystem to verify its own internal data structures with checksums, error-correcting codes, erasure coding, or some similar approach. Even then, unless it does the same for file data, you're still at non-negligible risk of data loss.
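As a rough illustration of the principle (not how any particular filesystem implements it), here's a minimal file-level sketch: record a SHA-256 checksum per file, then re-read and verify later, which is essentially what a periodic scrub does at the block level. The paths and manifest name are made up for the example:

```python
# Minimal file-level checksum "scrub": record SHA-256 checksums, then
# verify them later. A checksumming filesystem does the same per block,
# transparently, and can pair it with redundancy to self-heal.
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def record(root: Path, manifest: Path) -> None:
    """Walk `root` and write a checksum manifest (paths are examples)."""
    sums = {str(p): sha256_of(p) for p in root.rglob("*") if p.is_file()}
    manifest.write_text(json.dumps(sums, indent=2))

def verify(manifest: Path) -> None:
    """Re-read every file and report any checksum mismatch."""
    for name, old in json.loads(manifest.read_text()).items():
        p = Path(name)
        if not p.exists():
            print(f"MISSING  {name}")
        elif sha256_of(p) != old:
            print(f"CORRUPT  {name}")

# Example usage (hypothetical paths):
# record(Path("/srv/data"), Path("/srv/data.sums.json"))
# verify(Path("/srv/data.sums.json"))
```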
What happens if one of the mdadm RAID1 drives holds different data due to one drive having bad sectors? Will I still be able to retrieve the correct file or will the array be unable to decide which file is the correct one and lose it entirely?
It depends on the RAID level, the exact RAID implementation, and whether or not you have it set to auto-recover. Assuming you have auto-recovery on:
For RAID1 and RAID10:
- With hardware RAID and only two replicas, it usually picks the first replica and syncs the array to that.
- In some hardware RAID systems with more than two replicas, it checks whether a majority of the replicas match, and if so overwrites the ones that don't with the majority copy.
- With software RAID, it usually does the same as with hardware RAID unless it has a clear indication that the discrepancy is a result of a failed write (in which case it picks the copy that it knows was completely written).
- With BTRFS, it looks at which copy has a correct checksum, and replaces the one that doesn't with that (there's a toy sketch of this idea after the list).
- I believe that ZFS works like BTRFS here.
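To make the difference concrete, here's a toy sketch (not real mdadm or BTRFS code) of why a per-block checksum lets a two-copy mirror pick the right replica, while a plain mirror with two disagreeing copies can only guess:

```python
# Toy model of two-way mirroring. Plain RAID1 has no way to tell which
# of two disagreeing replicas is correct; a checksumming filesystem
# (BTRFS, ZFS) stores a checksum separately and keeps the copy that
# still matches it. Illustration only, not real RAID code.
import zlib

def plain_raid1_resync(copy_a: bytes, copy_b: bytes) -> bytes:
    # No checksum: if the copies differ, just pick one (often the first)
    # and overwrite the other with it, right or wrong.
    return copy_a

def checksummed_resync(copy_a: bytes, copy_b: bytes, stored_crc: int) -> bytes:
    # With a stored checksum, keep whichever copy still matches it.
    for copy in (copy_a, copy_b):
        if zlib.crc32(copy) == stored_crc:
            return copy
    raise IOError("both replicas fail their checksum: unrecoverable block")

good = b"important data"
bad = b"important dbta"          # one flipped byte, e.g. from a bad sector
crc = zlib.crc32(good)           # checksum recorded at write time

print(plain_raid1_resync(bad, good))        # may happily keep the bad copy
print(checksummed_resync(bad, good, crc))   # always recovers the good copy
```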
For RAID4/5/6 and other cases of erasure coding, almost everything behaves the same when it comes to recovery: either the data gets rebuilt from the remaining devices if it can be, or the array is effectively lost. ZFS and BTRFS in this case just give you a quicker (in terms of total I/O) way to check whether the data is correct or not.
Note that none of these operate on a per-file basis, and most don't let you easily pick the 'correct' copy yourself; they either work completely, fail completely, or alternately return good or bad data for the out-of-sync region.
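With Linux md (mdadm) you can at least ask the array how out of sync it is: writing `check` to the array's `sync_action` sysfs file starts a scrub, and `mismatch_cnt` afterwards reports how many sectors disagreed between replicas (it still can't tell you which copy is right). A minimal sketch, assuming the array is /dev/md0 and the script runs as root:

```python
# Trigger a consistency check on a Linux md array and report the result.
# Uses the standard md sysfs interface; assumes the array is md0 and
# root privileges. This only counts mismatches between replicas, it
# cannot say which replica holds the correct data.
import time
from pathlib import Path

MD = Path("/sys/block/md0/md")   # assumed array name

def scrub_md() -> int:
    (MD / "sync_action").write_text("check\n")        # start the scrub
    while (MD / "sync_action").read_text().strip() != "idle":
        time.sleep(10)                                 # wait for completion
    return int((MD / "mismatch_cnt").read_text())

if __name__ == "__main__":
    mismatches = scrub_md()
    print(f"sectors with disagreeing replicas: {mismatches}")
```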