I have 3 identical internal 7200 RPM SATA hard disk drives on a Linux machine. I'm looking for a storage set-up that will give me all of this:
- Different data sets (filesystems or subtrees) can have different RAID levels so I can choose performance, space overhead, and risk trade-offs differently for different data sets while having a few number of physical disks (very important data can be 3xRAID1, important data can be 3xRAID5, unimportant reproducible data can be 3xRAID0).
- If each data set has an explicit size or size limit, then the ability to grow and shrink the size limit (offline if need be)
- Avoid out-of-kernel modules
- R/W or read-only COW snapshots. If it's a block-level snapshots, the filesystem should be synced and quiesced during a snapshot.
- Ability to add physical disks and then grow/redistribute RAID1, RAID5, and RAID0 volumes to take advantage of the new spindle and make sure no spindle is hotter than the rest (e.g., in NetApp, growing a RAID-DP raid group by a few disks will not balance the I/O across them without an explicit redistribution)
Not required but nice-to-haves:
- Transparent compression, per-file or subtree. Even better if, like NetApps, analyzes the data first for compressibility and only compresses compressible data
- Deduplication that doesn't have huge performance penalties or require obscene amounts of memory (NetApp does scheduled deduplication on weekends, which is good)
- Resistance to silent data corruption like ZFS (this is not required because I have never seen ZFS report any data corruption on these specific disks)
- Storage tiering, either automatic (based on caching rules) or user-defined rules (yes, I have all-identical disks now but this will let me add a read/write SSD cache in the future). If it's user-defined rules, these rules should have the ability to promote to SSD on a file level and not a block level.
- Space-efficient packing of small files
I tried ZFS on Linux but the limitations were:
- Upgrading is additional work because the package is in an external repository and is tied to specific kernel versions; it is not integrated with the package manager
- Write IOPS does not scale with number of devices in a raidz vdev.
- Cannot add disks to raidz vdevs
- Cannot have select data on RAID0 to reduce overhead and improve performance without additional physical disks or giving ZFS a single partition of the disks
ext4 on LVM2 looks like an option except I can't tell whether I can shrink, extend, and redistribute onto new spindles RAID-type logical volumes (of course, I can experiment with LVM on a bunch of files). As far as I can tell, it doesn't have any of the nice-to-haves so I was wondering if there is something better out there. I did look at LVM dangers and caveats but then again, no system is perfect.