Linux: How to create image of used files only?

0

I want to create an image of a Windows file system, while running Linux. The file system is encrypted but will be decrypted for the purpose of the backup.

How can I create an image of only my files? So if I have a 4GB USB with 1 .txt inside, an image would only be of the 1 .txt and would not take up 4GB of space.

This doesn't seem like a big issue on a small scale, but I will be backing up Windows, which will be encrypted by Linux, by following: https://veracrypt.codeplex.com/wikipage?title=How%20to%20Back%20Up%20Securely

So I only want my image to take up the amount of space that Windows is actually using, NOT 250GB. Most of the search results I found (I may have missed some) say to use dd, but dd creates complete images. When someone has previously asked for their image to only take up the necessary space, people have said to overwrite the spare space with 0s, which 1) doesn't resolve the question and 2) is impossible with SSDs because of wear leveling.

user138072

Posted 2016-12-30T08:29:25.467

Reputation: 101

Answers

1

The suggestion of filling free space with zeros does help, both with raw images (which can then be efficiently gzip'd) and with specialized formats like qcow/vmdk/vhd (which have built-in support for 'sparse' areas).
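To illustrate why zero-filled free space matters for a compressed raw image, here is a minimal Python sketch. It is not an imaging tool: it only compares how well a block of zeros compresses against a block of random bytes, which stand in for stale or encrypted leftovers in free space. The helper name `compressed_size` and the 1 MiB block size are made up for this example.

```python
import os
import zlib

MIB = 1024 * 1024

def compressed_size(data: bytes) -> int:
    """Return the size in bytes of `data` after zlib (DEFLATE, as used by gzip) compression."""
    return len(zlib.compress(data, 6))

# 1 MiB of zeros, like free space that was overwritten with null data.
zeroed_free_space = bytes(MIB)
# 1 MiB of random bytes, standing in for leftover or encrypted data in free space.
leftover_data = os.urandom(MIB)

print("zeros :", compressed_size(zeroed_free_space), "bytes after compression")
print("random:", compressed_size(leftover_data), "bytes after compression")
```

The zero block should shrink to a tiny fraction of its size, while the random block stays at roughly 1 MiB. That is the whole reason the zero-fill advice keeps coming up for dd images.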

It shouldn't actually hurt the SSD – I think all recent disks are smart enough to recognize a block of all-zeros and quietly map all such blocks to the same flash cell. (Besides, it's still only one write per block, and you can TRIM it back afterwards to mark as unused.)

Otherwise, you'd need a tool which understands your specific filesystem in order to know which areas are actually unused – for Windows NTFS, ntfsclone can do this. By default it creates raw images but skips the unused areas (telling the OS to mark them as "sparse"), so even though the resulting .img appears to be 4 GB in size, it occupies only several MB on disk.
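The "appears large, occupies little" behaviour of a sparse file can be checked on Linux with a short Python sketch. It does not involve ntfsclone; it creates a small sparse file itself and compares the apparent size with the space actually allocated. The function names and the file name `demo.img` are hypothetical. It assumes a filesystem that supports sparse files and that `st_blocks` is counted in 512-byte units, as on Linux.

```python
import os
import tempfile

def make_sparse_file(path, size, payload):
    """Write `payload` at the start of the file, then extend it to `size` bytes
    without writing the remainder (the rest becomes a hole where supported)."""
    with open(path, "wb") as f:
        f.write(payload)
        f.truncate(size)

def apparent_and_allocated(path):
    """Return (apparent size, allocated bytes) for `path`.
    Assumes st_blocks is in 512-byte units, as on Linux."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512

with tempfile.TemporaryDirectory() as tmp:
    image = os.path.join(tmp, "demo.img")  # hypothetical file name
    make_sparse_file(image, 4 * 1024 * 1024, b"one small text file")
    size, allocated = apparent_and_allocated(image)
    print("apparent size :", size, "bytes")
    print("allocated size:", allocated, "bytes")
```

On the command line, the same comparison is `ls -l` (apparent size) versus `du` (allocated size). Copying such an image with a tool that is not sparse-aware can expand it back to full size.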

You can later convert the .img file to a dynamic-size .vhd or .vmdk using qemu-img.
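As a sketch of that conversion, the Python snippet below only builds and prints the `qemu-img convert` command line; it does not run it. The file names are placeholders, and `qemu_img_convert_cmd` is a helper invented for this example. Note that qemu-img calls the VHD format `vpc`.

```python
import shlex

def qemu_img_convert_cmd(raw_image, output, fmt):
    """Build a `qemu-img convert` command that reads a raw image and writes
    `output` in format `fmt` (e.g. "vpc" for VHD, or "vmdk")."""
    return ["qemu-img", "convert", "-f", "raw", "-O", fmt, raw_image, output]

# Placeholder file names; substitute your own image paths.
print(shlex.join(qemu_img_convert_cmd("windows.img", "windows.vhd", "vpc")))
print(shlex.join(qemu_img_convert_cmd("windows.img", "windows.vmdk", "vmdk")))
```

To actually run it, the list can be passed to `subprocess.run(cmd, check=True)`, assuming qemu-img is installed.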

Alternatively, ntfsclone --save-image will directly output its own sparse image format (for use with ntfsclone itself only).
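A rough sketch of the ntfsclone invocations, again written as Python that only prints the commands. The options follow the ntfsclone man page (`--output`, `--save-image`, `--restore-image`, `--overwrite`). The device path `/dev/sdXN` and the image names are placeholders, and the helper names are made up. The NTFS volume should not be mounted while it is being cloned.

```python
import shlex

def ntfsclone_raw_cmd(device, image):
    """Default mode: clone `device` to a raw (sparse) image file."""
    return ["ntfsclone", "--output", image, device]

def ntfsclone_save_cmd(device, image):
    """Save `device` in ntfsclone's own special image format."""
    return ["ntfsclone", "--save-image", "--output", image, device]

def ntfsclone_restore_cmd(image, device):
    """Restore a special-format image back onto `device` (overwrites it)."""
    return ["ntfsclone", "--restore-image", "--overwrite", device, image]

DEVICE = "/dev/sdXN"  # placeholder: the (decrypted) NTFS partition

print(shlex.join(ntfsclone_raw_cmd(DEVICE, "windows.img")))
print(shlex.join(ntfsclone_save_cmd(DEVICE, "windows.ntfsclone")))
print(shlex.join(ntfsclone_restore_cmd("windows.ntfsclone", DEVICE)))
```

The special format stays small even when copied or moved around, since it does not rely on sparse-file support. A raw image, on the other hand, can be loop-mounted or converted with qemu-img.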

Another imaging tool is Clonezilla.

Though personally I often just use disk2vhd via Windows itself – it creates sparse VHD format images via Volume Shadow Copy.

user1686

Posted 2016-12-30T08:29:25.467

Reputation: 283 655

So I think I understand your first point. If the SSD is 200GB and the OS is 100GB, you're saying I should fill the rest with 'raw images' (what do you mean by that? Empty jpeg pictures?). So I'll have to create 100GB of pictures before I can back up the OS? That sounds like an awful lot of effort and time wasted for a backup to me. Not to mention a fine way to wear out the SSD. Also, I'll be repeating this backup method, if it works, with my 1TB HDD, meaning I'll have to create hundreds of GB of images on an HDD. This sounds like an awful lot of effort to back up an NTFS file system. – user138072 – 2016-12-30T14:45:14.817

Second paragraph. What do you mean? I'd be writing 0s to my SSD, which would 1) wear it out unnecessarily and 2) be inefficient. I also generally don't understand what you mean. As for "TRIM it back afterwards", do you mean securely erasing the drive afterwards via manufacturer software? I tried that software in the past and it didn't work. Not to mention I'd be wasting time and writing a crapload of bits to my SSD just for a backup, after the backup. And then restore the backup. Unless I've misunderstood you? – user138072 – 2016-12-30T14:50:03.990

@user138072: No (and no); no; it's not; and about time you mention that. 1) I was saying that you should fill the rest of the disk (with null data) before imaging it, not with images. You used the same term 'disk image' in your question, you know it has other meanings besides pictures. 2) A single write won't wear out the SSD, because it's still a single write out of millions that the SSD can handle. (Also, the disk's firmware will collapse all of them into one flash cell anyway.) – user1686 – 2016-12-30T14:50:44.130

@user138072: 3) For repeated backups, see the list of alternative methods (including ntfsclone and disk2vhd). 4) No, that's not what TRIM means. It's how the OS tells the disk's wear-leveling software which space is no longer used. – user1686 – 2016-12-30T14:53:20.727

Third paragraph. "ntfsclone" sounds promising. How can I use it? I could not find it in the Linux Mint repository. Fourth paragraph. Why would I want to convert it to a virtual hard disk? I want the disk to be a backup of an OS on an SSD and don't intend to use it in a virtual machine. I'll look into disk2vhd, but I'd ideally want to be doing this in Linux. It would be easier to manage my other backups without the fear of malware, if my computer gets infected. If it did get infected, I'd have to risk infecting my backups to restore my OS. Comment chain ended. – user138072 – 2016-12-30T14:54:27.120

5) ntfsclone is part of "ntfs-3g" / "ntfsprogs". 6) It doesn't matter where you want to use it. A disk image is a disk image; the VM-oriented ones (VHD, VMDK, etc.) only hold extra metadata describing which areas of the disk/image are actually in use – so the image file can be much smaller than the disk's full size. – user1686 – 2016-12-30T15:00:54.493

Oh. My bad, I thought you meant pictures. Yeah I understand disk images. By null data, do you mean 0s? I don't think this would be possible, especially before installing the OS. It would be encrypted by Veracrypt and so it will overwrite the drive for security - to ensure an adversary cannot determine which bits are encrypted data and which bits are just encrypted in place. Hence my desire to create the ntfs image via Linux. I will be creating an image every week, maybe more often if there is a need to; I imagine this would have a bad effect in the long run? – user138072 – 2016-12-30T15:03:42.797

In that case, yes; use the specialized tools. – user1686 – 2016-12-30T15:06:11.497