2

I have 3 identical internal 7200 RPM SATA hard disk drives on a Linux machine. I'm looking for a storage set-up that will give me all of this:

  • Different data sets (filesystems or subtrees) can have different RAID levels, so I can choose performance, space overhead, and risk trade-offs differently for different data sets while using a small number of physical disks (very important data can be 3xRAID1, important data can be 3xRAID5, unimportant reproducible data can be 3xRAID0).
  • If each data set has an explicit size or size limit, the ability to grow and shrink that limit (offline if need be)
  • Avoid out-of-kernel modules
  • R/W or read-only COW snapshots. If they are block-level snapshots, the filesystem should be synced and quiesced while the snapshot is taken.
  • Ability to add physical disks and then grow/redistribute RAID1, RAID5, and RAID0 volumes to take advantage of the new spindle and make sure no spindle is hotter than the rest (e.g., in NetApp, growing a RAID-DP raid group by a few disks will not balance the I/O across them without an explicit redistribution)

Not required but nice-to-haves:

  • Transparent compression, per-file or per-subtree. Even better if, like NetApp, it analyzes the data for compressibility first and only compresses compressible data
  • Deduplication that doesn't have huge performance penalties or require obscene amounts of memory (NetApp does scheduled deduplication on weekends, which is good)
  • Resistance to silent data corruption like ZFS (this is not required because I have never seen ZFS report any data corruption on these specific disks)
  • Storage tiering, either automatic (based on caching rules) or via user-defined rules (yes, I have all-identical disks now, but this would let me add a read/write SSD cache in the future). If it's user-defined rules, they should be able to promote to SSD at the file level, not the block level.
  • Space-efficient packing of small files

I tried ZFS on Linux but the limitations were:

  • Upgrading is additional work because the package is in an external repository and is tied to specific kernel versions; it is not integrated with the package manager
  • Write IOPS do not scale with the number of devices in a raidz vdev.
  • Cannot add disks to raidz vdevs
  • Cannot put selected data on RAID0 to reduce overhead and improve performance, short of adding physical disks or giving ZFS only a partition of each disk

ext4 on LVM2 looks like an option, except that I can't tell whether I can shrink, extend, and redistribute RAID-type logical volumes onto new spindles (of course, I can experiment with LVM on loop devices backed by a bunch of files, roughly as sketched below). As far as I can tell, it doesn't have any of the nice-to-haves, so I was wondering whether there is something better out there. I did look at LVM dangers and caveats, but then again, no system is perfect.
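Something like this is what I mean by experimenting on files (the file names, loop devices, and testvg/testlv names are placeholders; whether the shrink step is even allowed on RAID-type LVs is exactly what I would be testing):

# Three sparse files attached as loop devices to stand in for disks
truncate -s 1G /tmp/disk1.img /tmp/disk2.img /tmp/disk3.img
losetup /dev/loop1 /tmp/disk1.img
losetup /dev/loop2 /tmp/disk2.img
losetup /dev/loop3 /tmp/disk3.img

# A RAID5-type LV striped across the three "disks"
pvcreate /dev/loop1 /dev/loop2 /dev/loop3
vgcreate testvg /dev/loop1 /dev/loop2 /dev/loop3
lvcreate --type raid5 --stripes 2 --size 512M --name testlv testvg

# The operations in question: extend, then attempt a shrink
lvextend --size +256M testvg/testlv
lvreduce --size -256M testvg/testlv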

Easter Sunshine
  • 246
  • 1
  • 5
  • 11

2 Answers

4

You're asking for too much. These are unrealistic requirements, especially given that you're talking about a trio of low-speed consumer disks. What are you planning?

  • Use ZFS on Linux and FOUR disks. If you're talking about expansion, you probably have room for four data disks. You don't mention your Linux distribution, but upgrades have not been an issue with CentOS 6.x. Mirrors are expandable. RAID-Z1/2/3 sets simply are not. Set compression per-filesystem and be done with it. You can rebalance data by copying. But plan better, and the expansion issues won't be a limitation. This gives you compression, snapshots, tiering and data integrity. Forget dedupe on ZFS. You probably don't need it. If you do, you should be planning for the resource requirements it needs.

  • As for the limitations of ZFS on Linux, you should understand the basics of planning a ZFS storage setup. Use a dedicated log (ZIL/SLOG) device to improve synchronous write IOPS; a short command sketch follows the controller output below.

  • There are hardware RAID solutions like the HP Smart Array controller line, which would allow the different RAID protections on a single group of drives... It automatically rebalances/redistributes data during expansions. Reductions are not possible. You can export HP logical drives as block devices for ZFS, so you end up with the filesystem benefits, but smart use of the underlying hardware devices as well. In the example below, the zpools vol1 and vol2 are comprised of single devices that correspond to logicaldrive 2 and logicaldrive 3 from the HP RAID controller output:

ZFS pool information:

[root@ZFS-on-Linux ~]# zpool list
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
vol1   119G  48.2G  70.8G    40%  1.00x  ONLINE  -
vol2  99.5G  42.6G  56.9G    42%  1.00x  ONLINE  -

HP RAID controller output:

Smart Array P400 in Slot 8                (sn: P61630G9SVN702)

   array A (SAS, Unused Space: 609621  MB)


      logicaldrive 1 (72.0 GB, RAID 1+0, OK)
      logicaldrive 2 (120.0 GB, RAID 1+0, OK)
      logicaldrive 3 (100.0 GB, RAID 5, OK)

      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 146 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 146 GB, OK)
      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 146 GB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 146 GB, OK)
      physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SAS, 146 GB, OK)
      physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SAS, 146 GB, OK)
      physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SAS, 146 GB, OK)
      physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SAS, 146 GB, OK)

Combine the two above, and you get most of your requirements.
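To make that concrete, here is a rough sketch of the ZFS side, assuming the two logical drives show up as /dev/sdb and /dev/sdc (the device names and dataset names are illustrative, not taken from the system above):

# One pool per HP logical drive
zpool create vol1 /dev/sdb
zpool create vol2 /dev/sdc

# Compression is a per-filesystem property; turn it on only where it helps
zfs create vol1/documents
zfs set compression=lz4 vol1/documents   # lz4 needs a recent ZoL; older builds offer lzjb/gzip
zfs create vol2/scratch
zfs set compression=off vol2/scratch

# Copy-on-write snapshots, per filesystem
zfs snapshot vol1/documents@nightly

# If synchronous write IOPS become a problem, add a dedicated log (ZIL) device
zpool add vol1 log /dev/sdd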

ewwhite
  • 194,921
  • 91
  • 434
  • 799
2

That's quite a wish list you have.

Linux md RAID (mdadm) + LVM will get you most of the requirements and nearly none of the nice-to-haves.

You can carve your disks into partitions and do md RAID with different levels across the different partitions. Put the md RAID volumes in LVM volume groups, create logical volumes out of the VGs, and put a filesystem (like ext4) on each LV. You can boot from md RAID1 as long as there is no LVM on top of it, so use plain RAID1 without LVM for /boot. md RAID volumes can grow (and shrink) by adding or removing disks, as can LVM and ext4. A command-level sketch of the stack follows.
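For one slice of the stack, it might look roughly like this, assuming a RAID5 array built from the third partition of each disk (the array, VG, and LV names are placeholders):

# RAID5 across the third partition of each disk
mdadm --create /dev/md2 --level=5 --raid-devices=3 /dev/sda3 /dev/sdb3 /dev/sdc3

# LVM on top, with the LV sized smaller than the VG to leave snapshot room
pvcreate /dev/md2
vgcreate datavg /dev/md2
lvcreate --size 100G --name data datavg

# ext4 on the LV, mounted as /data
mkfs.ext4 /dev/datavg/data
mount /dev/datavg/data /data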

LVM can do snapshots, but they are not COW in the same way ZFS snapshots are, so there is a performance penalty when the snapshot lives on the same physical disks as the origin. They are best suited to short-lived, point-in-time captures taken for backups, not to keeping unlimited snapshots around the way ZFS does.
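A minimal snapshot workflow, assuming the data LV from the sketch above and some unallocated space left in the datavg VG (names and sizes are placeholders):

# 5G of COW space for the snapshot; writes to the origin now pay the copy penalty
lvcreate --snapshot --size 5G --name data_snap /dev/datavg/data

# Back up from the snapshot, then drop it
mount -o ro /dev/datavg/data_snap /mnt/snap
# ... run the backup against /mnt/snap ...
umount /mnt/snap
lvremove -f /dev/datavg/data_snap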

The overall layout could look something like this:

sda1 + sdb1 + sdc1 = md0 RAID1
sda2 + sdb2 + sdc2 = md1 RAID10 (yes, you can do odd-numbered RAID10)
sda3 + sdb3 + sdc3 = md2 RAID5
sda4 + sdb4 + sdc4 = md3 RAID0

md0 = no LVM, format as ext4, mount as /boot
md1 = LVM, divide into two LVs
      Format one LV as ext4, mount as / (root)
      Use other LV for swap
md2 = LVM, create one LV
      make the LV smaller than the VG, to leave space for snapshots
      Format the LV as ext4, mount as /data
md3 = LVM, one LV taking the whole VG
      Format as ext4, mount as /stuff_i_want_to_lose
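If a fourth disk (say /dev/sdd, partitioned the same way) is added later, growing the md2 slice and everything stacked on it would look roughly like this (sizes and names are again placeholders):

# Add the new partition and reshape the RAID5 across four devices
mdadm --add /dev/md2 /dev/sdd3
mdadm --grow /dev/md2 --raid-devices=4

# Let LVM see the bigger PV, then grow the LV and the ext4 filesystem on it
pvresize /dev/md2
lvextend --size +50G /dev/datavg/data
resize2fs /dev/datavg/data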
Anton Cohen
  • 1,112
  • 6
  • 7