How do I convert a Linux disk image into a sparse file?

12

7

I have a bunch of disk images, made with ddrescue, on an EXT partition, and I want to reduce their size without losing data, while still being mountable.

How can I fill the empty space in the image's filesystem with zeros, and then convert the file into a sparse file so this empty space is not actually stored on disk?

For example:

> du -s --si --apparent-size Jimage.image 
120G Jimage.image
> du -s --si Jimage.image 
121G Jimage.image

This actually only has 50G of real data on it, though, so the second measurement should be much smaller.

This supposedly will fill empty space with zeros:

cat /dev/zero > zero.file
rm zero.file

But if sparse files are handled transparently, it might actually create a sparse file without writing anything to the virtual disk, ironically preventing me from turning the virtual disk image into a sparse file itself. :) Does it?

Note: For some reason, sudo dd if=/dev/zero of=./zero.file works when cat does not on a mounted disk image.

endolith

Posted 2010-07-31T03:58:39.090

Reputation: 6 626

Note: sudo cat /dev/zero > zero.file doesn't work because your bash (running as you, not root) does the redirection before executing the sudo command. See http://unix.stackexchange.com/questions/1416/redirecting-stdout-to-a-file-you-dont-have-write-permission-on – Fritz – 2016-09-16T15:12:26.187

Writing zeroes into a file will not create a sparse file; it's a different concept. When you seek into or read a sparse file and the OS discovers that a block of data isn't really there (the block list has no entry for that region), the OS automagically fills the read buffer with zero bytes. – hotei – 2010-07-31T20:27:14.220

Answers

19

First of all, sparse files are only handled transparently if you seek, not if you write zeroes.

To make this clearer, the example from Wikipedia

dd if=/dev/zero of=sparse-file bs=1k count=0 seek=5120

does not write any zeroes. It opens the output file, seeks (jumps over) 5 MiB, and then writes zero zeroes (i.e. nothing at all). This command (not from Wikipedia)

dd if=/dev/zero of=sparse-file bs=1k count=5120

will write 5 MiB of zeroes and will not create a sparse file!

As a consequence, a file that is already non-sparse will not magically become sparse later.
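
The difference between the two dd commands above can be checked directly. This is only a minimal sketch, assuming GNU coreutils (for stat -c) and a filesystem that supports sparse files; the file names seeked and written are made up for illustration:

```shell
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT   # clean up the scratch directory on exit

# Seek past 5 MiB without writing anything: the file becomes one big hole.
dd if=/dev/zero of="$workdir/seeked" bs=1k count=0 seek=5120 2>/dev/null

# Write 5 MiB of literal zero bytes: every block is really stored.
dd if=/dev/zero of="$workdir/written" bs=1k count=5120 2>/dev/null

# %s = apparent size in bytes, %b = allocated blocks, %B = size of one such block
stat -c '%n: %s bytes apparent, %b blocks of %B bytes allocated' \
    "$workdir/seeked" "$workdir/written"
```

Both files should report the same apparent size, while the seeked one should normally report few or no allocated blocks. Exact block counts depend on the filesystem.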

Second, to make a file with lots of zeroes sparse, you have to cp it:

cp --sparse=always original sparsefile

or you can use tar's or rsync's --sparse option as well.
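
To see the effect of cp --sparse=always without touching a real image, something like the following can be tried. It is a sketch only, assuming GNU coreutils; the file original here is a small made-up stand-in (1 MiB of random data followed by 8 MiB of written zeroes), not a real disk image:

```shell
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT   # clean up the scratch directory on exit

# Build a stand-in "image": 1 MiB of data, then 8 MiB of real zero bytes.
head -c 1048576 /dev/urandom > "$workdir/original"
head -c 8388608 /dev/zero >> "$workdir/original"

# Copy it, turning runs of zero bytes into holes.
cp --sparse=always "$workdir/original" "$workdir/sparsefile"

# The copy must still be byte-for-byte identical to the original.
cmp "$workdir/original" "$workdir/sparsefile" && echo "contents identical"

# Apparent size (should match) versus space actually used (copy should be smaller).
du -k --apparent-size "$workdir/original" "$workdir/sparsefile"
du -k "$workdir/original" "$workdir/sparsefile"
```

Once the copy has been verified with cmp (or a checksum), the original can be deleted to reclaim the space.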

mihi

Reputation: 3 217

According to Wikipedia, writing zeros with dd will create a sparse file. Can you explain what "seeking" means? – endolith – 2010-07-31T19:33:23.303

What about cat then? There is nothing in the man page about sparse files, so I assume cat /dev/zero > zero.file is perfectly OK to fill empty space with zeros? – Ludwig Weinzierl – 2010-08-01T08:03:59.360

@endolith: Updated my answer to clarify the difference between using dd to write zeroes and using it to seek. – mihi – 2010-08-01T11:54:47.163

@Ludwig Weinzierl: Yes, that cat command will fill your entire disk (or at least the amount not reserved for root or by quotas) with "real" zeroes, and create no sparse files. – mihi – 2010-08-01T11:58:26.767

@mihi: Any dd command with a count of 0 (zero) is guaranteed to do nothing by virtue of the count=0. Has nothing to do with sparse etc. I like where you're going with this but you need a better example. – hotei – 2010-08-08T23:07:13.470

@hotei: Even if you give count=0, it will still honour the seek option before writing zero bytes. And the example is from Wikipedia. A seek beyond the end of the file will create a sparse file, regardless of whether you write after the seek or not. – mihi – 2010-08-11T16:13:40.507

tar or rsync with sparse is still making a copy of the file, right? So you need space for two copies of it. – endolith – 2011-05-24T00:01:46.457

@endolith: You will need extra space, yes. But since you can compress the tarball, you will only need space for the original file and a compressed version of the sparse file. – mihi – 2011-06-01T16:16:37.767

Just used this to reduce a file from 21G to 86M. :D – endolith – 2013-04-03T01:04:14.573

12

Perhaps the easiest way to sparsify a file in place would be to use the fallocate utility as follows:

fallocate -v --dig-holes {file_name}

fallocate(1) is provided by the util-linux package on Debian.
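
A small experiment along these lines, as a sketch only: it assumes GNU coreutils and util-linux, the file names image and reference are invented for the test, and hole punching may simply not be supported on some filesystems, which the snippet reports instead of failing:

```shell
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT   # clean up the scratch directory on exit

# Stand-in "image": 1 MiB of data followed by 8 MiB of real zero bytes.
head -c 1048576 /dev/urandom > "$workdir/image"
head -c 8388608 /dev/zero >> "$workdir/image"
cp "$workdir/image" "$workdir/reference"   # untouched copy for comparison

echo "allocated before: $(du -k "$workdir/image" | cut -f1) KiB"

# Punch holes in place; needs no second copy of the file.
fallocate --dig-holes "$workdir/image" \
    || echo "hole punching not available on this system or filesystem"

echo "allocated after:  $(du -k "$workdir/image" | cut -f1) KiB"

# Size and contents must be unchanged by the operation.
cmp "$workdir/image" "$workdir/reference" && echo "contents unchanged"
```

Unlike the cp approach, this works in place, so it does not need room for a second copy of the image.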

Onlyjob

Reputation: 324

For some reason, fallocate --dig-holes produced a 103 GiB file from a 299 GiB original, while cp --sparse=always gave me 93 GiB, all with the same SHA1 sum (sizes checked via du -B1G vs du --apparent-size -B1G). So fallocate seems to give inferior results. – Ruslan – 2017-05-21T09:08:10.490

3

Editing my answer for completeness:

  1. Balloon empty FS space with zeroes (WARNING: this changes your disk image):

losetup --partscan --find --show disk.img

Assume it gives /dev/loop1 as the disk and that there is only one partition. Otherwise, repeat the following for every partition that contains a mountable FS (ignore swap partitions etc.).

mkdir -p /mnt/tmp
mount /dev/loop1p1 /mnt/tmp
dd if=/dev/zero of=/mnt/tmp/tempfile

Let it run until it fails with ENOSPC (no space left on device).

/bin/rm -f /mnt/tmp/tempfile
umount /mnt/tmp
losetup -d /dev/loop1

  2. Copy into a sparse image:

'dd' has an option to convert a file with zeroes to a sparse file:

dd if=disk.img of=disk-sparse.img conv=sparse
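
The copy step can be tried on a throwaway file first. A sketch, assuming GNU dd (conv=sparse is a GNU coreutils option); the disk.img built here is a small made-up stand-in, and bs=4k is my own choice so that all-zero 4 KiB blocks are skipped:

```shell
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT   # clean up the scratch directory on exit

# Stand-in "image": data, then 8 MiB of written zeroes, then data again.
head -c 1048576 /dev/urandom > "$workdir/disk.img"
head -c 8388608 /dev/zero >> "$workdir/disk.img"
head -c 1048576 /dev/urandom >> "$workdir/disk.img"

# Output blocks that are entirely zero are seeked over instead of written.
dd if="$workdir/disk.img" of="$workdir/disk-sparse.img" bs=4k conv=sparse 2>/dev/null

# Same contents, same apparent size, less space used (filesystem permitting).
cmp "$workdir/disk.img" "$workdir/disk-sparse.img" && echo "contents identical"
du -k --apparent-size "$workdir/disk.img" "$workdir/disk-sparse.img"
du -k "$workdir/disk.img" "$workdir/disk-sparse.img"
```

Holes are detected per output block, so a smaller block size can find more of them, at the cost of more system calls.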

Lam Das

Reputation: 31

Yes, this option did not exist at the time the OP asked. This was more of a "leave a bread crumb for other searchers"... :-) – Lam Das – 2018-08-20T02:02:30.087

Depending on the filesystem type, zerofree may be faster than mounting and writing zeroes to the filesystem, and it makes the disk image grow less if it already contained lots of zeroes. – mihi – 2018-09-06T19:19:25.637

2

Do you mean that your ddrescue-created image is, say, 50 GB and in reality something much smaller would suffice?

If that's the case, couldn't you just first create a new image with dd:

dd if=/dev/zero of=some_image.img bs=1M count=20000

and then create a filesystem in it:

mkfsofyourchoice some_image.img

then just mount the image, and copy everything from the old image to new one? Would that work for you?

Janne Pikkarainen

Reputation: 6 717

2

PartImage can create disk images that only store the used blocks of a filesystem, thus drastically reducing the required space by ignoring unused blocks. I don't think you can directly mount the resulting images, but going:

image -> partimage -> image -> cp --sparse=always

should produce what you want (it might even be possible to skip the last step; I haven't tried).

Grumbel

Reputation: 3 100

Unfortunately the images created by partimage are not mountable without expanding them out again, making them suitable only for archival purposes. – Perkins – 2017-02-19T18:28:57.010

0

There's now a tool called virt-sparsify which will do this. It fills up the empty space with zeros and then copies the image to a sparse file. It requires installing a lot of dependencies, though.

endolith

Reputation: 6 626

-2

I suspect you'll require a custom program written to that spec if that's REALLY what you want to do. But is it...?

If you've actually got lots of all-zero areas then any good compression tool will get it down significantly. And trying to write sparse files won't work in all cases. If I recall correctly, even sparse files take up a minimum of 1 block of output storage where the input block contains ANY bits that are non-zero. For instance - say you had a file that had an average of even 1 non-zero bit per 512 byte block - it can't be written "sparsely". By the way, you're not going to lose data if you compress the file with zip, bzip, bzip2 or p7zip. They aren't like mpeg or jpeg compression that is lossy.

On the other hand, if you need to do random seek reads into the file then compression might be more trouble than it's worth and you're back to the sparse write. A competent C or C++ programmer should be able to write something like that in an hour or less.

hotei

Reputation: 3 645

One way to have your compressed images and mount them too is to simply store them on a filesystem that supports native compression. Makes data recovery awful if you have a drive crash, but that's what backups are for, right? – Perkins – 2017-02-19T18:31:05.250

Interesting - a downvote yet I notice there's no refutation of what I wrote. If it's accurate but unhelpful that's not a reason to downvote. If it's not accurate and not helpful then it does deserve it. – hotei – 2010-07-31T16:03:34.603

I see elsewhere that the OP had a question relating to mounting compressed images. I'm assuming this is a continuation of that thread. Knowing that, I can now see why my suggestion of compression wasn't accepted. A simple C program is still an easy way to create sparse files. BUT will the (unspecified) OS let you mount a sparse ISO? As picky as the Ubuntu ISO mounter is, I'm not 100% sure that's going to work either... but best of luck in any case. – hotei – 2010-07-31T16:35:04.060

Why reinvent the wheel? cp --sparse=always does the job fine. – mihi – 2010-07-31T18:36:34.113

@mihi: That's a good idea. I didn't know about the sparse option as it's not available in BSD flavors (http://www.freebsd.org/cgi/man.cgi?query=cp&apropos=0&sektion=0&manpath=FreeBSD+8.1-RELEASE&format=html) and I have never had the requirement to look at a Linux man page for cp (until today). – hotei – 2010-07-31T20:23:42.243