17

I'm debating about using LVM for a media/file server because I would like to combine multiple physical hard disks into one volume. I do not wish to use any RAID in my LVM so my question is:

If one of the multiple hard disks in my volume were to go down would I lose all my data or would I just lose the data that was stored on that individual disk?

Also, if I were to just lose the data on the individual disk, would it be as simple as replacing that disk and restoring what was on it from a backup to recover?

slm
Fujin

7 Answers

12

If one of the multiple hard disks in my volume were to go down would I lose all my data or would I just lose the data that was stored on that individual disk?

You would lose the data stored on the whole LVM volume, not just the data on the failed disk.

Also, if I were to just lose the data on the individual disk, would it be as simple as replacing that disk and restoring what was on it from a backup to recover?

No, it isn't that simple.

A similar question is discussed here: LVM and disaster recovery

B14D3
8

Simple: You are looking for mhddfs.
It pretends to be one large filesystem, writes to the disks in the order they were listed, and moves large files to a different device if the first one is too full. It can also use subfolders on the disks, which gives the same functionality.
The individual disks have to be mounted first and remain accessible. It does not alter the filesystems at all and does not care which filesystem is in place (as long as free space is correctly reported by the filesystem). In case a disk is lost, you'll have to remount your mhddfs again (on the fly) and the data on that disk is gone.
Usage:

mhddfs /dir1,/dir2[,/path/to/dir3] /path/to/mount [-o options]

or in /etc/fstab

 mhddfs#/path/to/dir1,/path/to/dir2 /mnt/point fuse defaults 0 0
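
The placement behaviour described for mhddfs above (fill the branches in the order they were listed, fall through to the next one when a file does not fit) can be modelled roughly as follows. This is only a simplified sketch: the free-space figures and the `pick_branch` helper are made up for illustration, and the real mhddfs applies its own free-space threshold and can move a file while it is being written.

```shell
#!/bin/sh
# Hypothetical free space (in MB) on each branch, in the order given to mhddfs.
FREE_MB="50 4000 12000"

# Print the 1-based index of the first branch with room for a file
# of the given size (in MB), or "none" if no branch can hold it.
pick_branch() {
    need=$1
    i=1
    for free in $FREE_MB; do
        if [ "$free" -ge "$need" ]; then
            echo "$i"
            return 0
        fi
        i=$((i + 1))
    done
    echo "none"
    return 1
}

echo "10 MB file   -> branch $(pick_branch 10)"
echo "700 MB file  -> branch $(pick_branch 700)"
echo "8000 MB file -> branch $(pick_branch 8000)"
```

Because every file lands whole on exactly one branch, losing a disk in this model only loses the files that were placed on it.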

Complex & Powerful: You want unionfs.
While mhddfs is nice and extremely simple, I've had problems with file permissions when granting others access via SSH. I couldn't find any solution, but found unionfs.
Unionfs also lets you mount several folders across different filesystems into one, but it does its magic on permissions. You can merge several read-only folders and one writable folder so that they appear as one. People you share the merged folder with can then write to what appears to them as a read-only folder, but the files end up in the single writable one. Linux boot CDs work like this, with a ramdisk as the writable branch. People can even delete files in read-only folders; this does not really delete the file, but creates a hidden whiteout file in the writable directory. If you use all the options, you can basically use your filesystem as a poor man's SVN.
If you use the SVN-like options too much, you might overlook data that exists twice (improbable in your scenario, but possible), while your writable folder fills up with tiny, hidden whiteout files. Other than that, it keeps your disks clean and individually usable. I don't yet know what happens if a file is too large for a disk.
Usage:

 unionfs-fuse -o cow,max_files=32768 \
                 -o allow_other,use_ino,suid,dev,nonempty \
                 /path/to/dir1=rw:/path/to/dir2=ro:/dir3 \
                 /u/union/etc

where =rw makes the folder readable and writable and =ro makes it read-only, even if the permissions would state otherwise. In /etc/fstab this is

unionfs-fuse#/path/to/dir1=rw:/path/to/dir2=ro:/dir3 /path/to/mount fuse cow,allow_other 0 0
DennisH
  • How do these deal with a removable disk not being plugged in? – endolith May 10 '15 at 13:59
  • @endolith With mhddfs on a removable disk you would not know what ends up on it. Unionfs makes more sense there, but it has to be remounted differently depending on whether the removable device is available. – DennisH May 12 '15 at 19:44
  • I don't understand your comment. If you combined 3 USB drives using mhddfs and put files on them, and then remove one, does anything permanently bad happen? If you plug it back in, does it go back to the way it was before? – endolith May 14 '15 at 18:52
  • 3
    @endolith Unplugging during write is never good. If you unplug after writing, the USB drives are all individually OK, no problem. You can unplug a drive, plug it into PC2, use it, plug it back into PC1 and remount to continue using them as before. In between the mount point on your usual PC1 needs to be informed that you took the drive (remount). But, assume you share 2 files with PC2, you can't tell which drive of the mhddfs mount they'll end up in, or if it is even the same drive. Drives are still reachable individually to check/ensure the location, but that doesn't involve mhddfs anymore. – DennisH May 15 '15 at 08:45
4

If you're just connecting multiple devices together there wouldn't be any redundancy, so you could lose the data. But if you're using a media/file server for a business, you shouldn't lose anything because you have everything backed up to a backup server/tape drive.

Why are you avoiding RAID? The point of RAID is availability; if you don't want to lose time due to disk failure, you can use a RAID 1 configuration, which can also speed up your reads. RAID cards are not too expensive and pay for themselves the first time you have a disk failure. If you are REALLY avoiding having to pay for a card, you can set up Linux to use software RAID, although it takes a little more care in the setup and troubleshooting to make sure you replace the correct drive.

Otherwise you'd have to jump through some hoops to try to recover what data you can from the remaining disks. It would be possible, but you're kinda' asking for a lot more trouble than you should have. Get a good backup in place, and reconsider the RAID.

Bart Silverstrim
  • 1
    I'm avoiding RAID because I'm not concerned about availability. This is for personal use. I'm trying to avoid RAID because I'm looking more for reliability. One little thing goes wrong in RAID and you lose the entire array, and I don't feel like dealing with that. I'd rather have one large file depository that I can back up than several. – Fujin Mar 01 '11 at 14:05
  • 1
    I'm not sure I understand that. RAID makes you not lose the array if one thing goes wrong. Not using RAID is when one failure will kill the whole thing. – JOTN Mar 01 '11 at 14:11
  • 4
    @user72630: You have serious misconceptions about how a RAID works. First of all, there are different RAID levels, most of which are designed to avoid data loss in case of a disk failure. Then you can configure a RAID to have only one file system on multiple disks, just like you are planning with LVM. Please read http://en.wikipedia.org/wiki/RAID – Sven Mar 01 '11 at 14:17
  • What the others said. The whole point of having RAID is keeping availability...i.e., drive fails, data is still accessible. With appropriate gear you can hot-swap the failed drive and not worry about downtime *at all*. – Bart Silverstrim Mar 01 '11 at 14:49
3

If you are using one file system spanning all LVM physical volumes, the whole file system will be damaged, because the FS doesn't know about the underlying physical volumes and won't create structures aligned to them. It may be possible to rescue some of the parts on the working disks, but there is no guarantee of that.

And just recovering the files of the damaged disk won't work either for the same reason.

Sven
2

I think a much simpler route would be to configure mdadm for your media partition. If you don't have the hardware for "real RAID", going the mdadm route would be considerably easier, and it seems to meet your requirements for redundancy and simple disk replacement.

# Partition your drives first
# Create the MD array
mdadm --create /dev/md1 --level=5 --raid-devices=3 /dev/sda2 /dev/sdb2 /dev/sdc2

# In the event that a drive fails, mark it as failed and remove it from the array
mdadm /dev/md1 --fail /dev/sda2
mdadm /dev/md1 --remove /dev/sda2
# Partition the replacement drive, then add it to the array
mdadm --add /dev/md1 /dev/sda2

For more information: http://en.wikipedia.org/wiki/Mdadm

If one of the multiple hard disks in my volume were to go down would I lose all my data or would I just lose the data that was stored on that individual disk?

If you use mdadm and RAID 5, you can lose one drive and still have a functional array, albeit with degraded performance until the drive is replaced.
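
As a rough illustration of why a RAID 5 array survives the loss of one drive: each stripe stores a parity block that is the XOR of its data blocks, so any single missing block can be recomputed from the others. The sketch below uses two made-up data bytes to stand in for one stripe of the three-disk array above; it only shows the arithmetic and is not how mdadm is actually operated.

```shell
#!/bin/sh
# Two example data bytes, one per data disk in a single stripe
# of a hypothetical three-disk RAID 5 (the third disk holds parity).
d1=165   # 0xA5
d2=60    # 0x3C

# The parity block is the XOR of the data blocks.
parity=$(( d1 ^ d2 ))

# Pretend the disk holding d2 failed: XOR the surviving data with the parity.
rebuilt=$(( d1 ^ parity ))

echo "parity=$parity rebuilt_d2=$rebuilt"
# prints "parity=153 rebuilt_d2=60"
```

Reading a degraded array requires this recomputation for every block that lived on the failed drive, which is where the performance penalty comes from.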

Aaron Lake
1

I think the important thing to understand, which hasn't been mentioned, is that a file in a filesystem is not necessarily sitting in one spot on a disk. It's broken up into blocks which may reside anywhere inside the filesystem. The first 4K of your file might be on disk1, the next on disk2, and so on. You can imagine the mess of trying to recover anything if you lost a chunk of the filesystem.
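
To make that concrete, here is a small sketch of how logical block numbers map onto disks when several disks are simply joined end to end, as in a linear (non-striped) volume. The disk sizes and the `disk_for_block` helper are invented for the example; a real LVM setup maps extents rather than individual blocks, but the idea is the same.

```shell
#!/bin/sh
# Hypothetical sizes (in 4 KiB blocks) of three disks joined end to end.
DISK_SIZES="1000 2000 1500"

# Print which disk holds the given logical block number.
disk_for_block() {
    block=$1
    start=0
    n=1
    for size in $DISK_SIZES; do
        end=$((start + size))
        if [ "$block" -lt "$end" ]; then
            echo "disk$n"
            return 0
        fi
        start=$end
        n=$((n + 1))
    done
    echo "out of range"
    return 1
}

# A file whose blocks were allocated at scattered logical addresses
# ends up with pieces on several disks.
for b in 10 999 1000 3200; do
    echo "block $b -> $(disk_for_block "$b")"
done
```

The filesystem only sees the logical block numbers, so it has no reason to keep all blocks of one file, or its metadata, on the same physical disk.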

JOTN
0

Btrfs is a good choice here. You can make the metadata resilient to the loss of one disk (the "raid1" chunk profile), and the data on the other disks will still be reachable. Just so we're clear, that translates to files full of holes wherever the missing disk is referenced. This is done by running btrfs balance with a filter:

sudo btrfs balance start -mconvert=raid1 /mnt/point
Gabriel