51

I have a disk, say /dev/sda.

Here is fdisk -l:

    Disk /dev/sda: 64.0 GB, 64023257088 bytes
    255 heads, 63 sectors/track, 7783 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk identifier: 0x0000e4b5

       Device Boot      Start         End      Blocks   Id  System
    /dev/sda1   *           1          27      209920   83  Linux
    Partition 1 does not end on cylinder boundary.
    /dev/sda2              27         525     4000768    5  Extended
    Partition 2 does not end on cylinder boundary.
    /dev/sda5              27         353     2621440   83  Linux
    /dev/sda6             353         405      416768   83  Linux
    /dev/sda7             405         490      675840   83  Linux
    /dev/sda8             490         525      282624   83  Linux

I need to make an image to store on our file server for use in flashing other devices we are manufacturing, so I only want the used space (only about 4 GB). I want to keep the MBR etc., as this device should be boot-ready as soon as the copy is finished.

Any ideas? I had previously been using dd if=/dev/sda of=[//fileserver/file], but at that time my master copy was on a 4 GB IDE flash drive.

Jonathan Henson

7 Answers

46

Back in the day I ran into a similar problem with embedded Linux distributions: get rid of all the junk before compressing the image.

dd if=/dev/zero of=asdf.txt, with asdf.txt sitting on the filesystem you want blanked. Wait until it dies. Delete asdf.txt.

You've just written zeros to all free space on the device.

Now take a disk image and run it through gzip. Voila, sparse image.
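For instance, a minimal sketch of the whole sequence (assuming the filesystem to blank is mounted at /mnt and the disk is /dev/sda; both are placeholders here):

    # Fill the free space with zeros, then remove the filler file.
    dd if=/dev/zero of=/mnt/asdf.txt bs=1M   # runs until "No space left on device"
    rm /mnt/asdf.txt
    sync

    # Image the whole device; the zeroed regions compress to almost nothing.
    dd if=/dev/sda bs=1M | gzip > sda.img.gz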

Probably doesn't scale very well and could cause problems if you actually need to write to the disk, but hey.

You could take an rsync snapshot of the disk to another volume, zero that, and then take that disk image.
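A rough sketch of that variant (paths and device names are placeholders; rsync's -aHAX preserves permissions, hard links, ACLs, and xattrs):

    rsync -aHAX /mnt/source/ /mnt/scratch/         # file-level copy to a scratch volume
    dd if=/dev/zero of=/mnt/scratch/filler bs=1M   # zero the scratch volume's free space
    rm /mnt/scratch/filler
    dd if=/dev/sdY bs=1M | gzip > snapshot.img.gz  # image the scratch volume's device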

Note: this could be hazardous for an SSD; the user should consider this before committing to the operation.

Diamond
Rob Bos
  • 1
    if I run it through gzip, will I have to unzip it before using it? And by run it through gzip, do I just pipe it during the dd process? – Jonathan Henson Oct 17 '12 at 15:30
  • 5
    Yes. `dd if=sda2.gz | gunzip > /dev/sda2` and `dd if=/dev/sda2 | gzip > sda2.gz` – Rob Bos Oct 20 '12 at 15:56
  • 5
    "You've just written zeros to all free space on the device". You mean partition, not device, I think. So you'd need to run that command with an `of` path for each partition. – jiggunjer Feb 05 '16 at 04:46
  • If the physical media is an SSD, it may now think every sector on the device has been used. This will give the SSD fewer spare sectors to work with and possibly decrease performance as a result. If the driver and firmware have TRIM support, that condition will only apply until you delete the file again. If you keep the file in place while you create the image, you will have to delete the file again after restoring the image. That may be useful if the image is restored to an SSD. – kasperd Feb 19 '16 at 11:14
  • There are a few additional concerns to keep in mind. Since this method requires the file system to be mounted read-write, there is a risk that changes to the underlying file system while the copying is in progress will lead to an inconsistent image. On one occasion I have seen the resulting copy be so inconsistent that fsck would actually segfault when trying to repair the inconsistencies on the copy. Also, filling up the device can cause other processes needing to write to the media to fail. – kasperd Feb 19 '16 at 11:18
  • You can use `pv` instead of `dd` to get the same result. `pv /dev/zero > ~/zero.txt; rm ~/zero.txt` – CousinCocaine Apr 04 '18 at 12:21
  • @kasperd: that's what the TRIM thing is for on SSDs: to tell the drive which sectors can be freed. And if the filesystem knows that you removed `zero.txt` already, it knows the sectors it occupied are now free, so they will be reclaimed. In fact, without TRIM the drive would never know which sectors are unused, as it has no idea what the "data" is, and you would end up with a permanently full SSD once you wrote as many bytes to it as its capacity. – Marcin Orlowski Apr 19 '19 at 01:47
  • @CousinCocaine you can accomplish the same result with the `status=progress` dd parameter. Though not all dd versions have that option, so `pv` is useful on systems without it. – TheTechRobo Stands for Ukraine Nov 02 '20 at 20:34
29

Assuming you want to save /dev/sdXN to /tgtfs/image.raw and you are root:

  1. mkdir /srcfs && mount /dev/sdXN /srcfs

  2. Use zerofill or just:
    dd if=/dev/zero of=/srcfs/tmpzero.txt
    to fill unused blocks with zero; wait for it to fill the file system completely then:
    rm /srcfs/tmpzero.txt

  3. Take the image with dd and use conv=sparse to punch zeros on-the-fly:
    dd conv=sparse if=/dev/sdXN of=/tgtfs/image.raw

If you want to use compression, you don't need to punch the zeros with dd, as zero blocks are highly compressible:

dd if=/dev/sdXN | gz -c | dd of=/tgtfs/image.raw
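Either way, restoring later is just the reverse pipeline, and for the sparse variant you can verify how little space the image really occupies (a sketch, using the same placeholder names):

    # Compressed route: decompress straight back onto the target device.
    gunzip -c /tgtfs/image.raw | dd of=/dev/sdXN bs=1M

    # conv=sparse route: compare apparent size vs. blocks actually allocated.
    du -h --apparent-size /tgtfs/image.raw
    du -h /tgtfs/image.raw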

PS: You should note that it is not a good idea to do this (filling the file system with zeros) on flash-memory-based storage media (i.e. if your source file system is on an SSD) on a regular basis, as it will cause extensive writes to the SSD and reduce its lifespan. (But it's alright for the occasional transfer of data.)

  • 6
    This is the correct answer. Use `dd conv=sparse`. – bahamat Jan 06 '15 at 22:06
  • 1
    What's wrong with doing this on flash storage? – Dan Mar 08 '17 at 19:18
  • 2
    @Dan (depending on the hardware and software design and configuration) it may cause extensive writes to your SSD and reduce its lifespan. And overall, it is OK for moving data from an old disk to a new one (or what the OP wanted to do), but disk/partition-level backup isn't a good solution for regular backup and restore, even on HDDs. File-level backup (i.e. copying files from one file system to another), or file-system-level backup (with file systems like BTRFS and its `btrfs snapshot` and `btrfs send` tools), is a better solution IMHO. – Microsoft Linux TM Mar 09 '17 at 07:23
  • 1
    Hint: If you don't have `gz` on your `PATH` (like I didn't, on GParted Live), you can use `gzip -c` instead. – XtraSimplicity Feb 11 '19 at 01:12
  • What do you mean by `punch zeros on-the-fly`? Can I use this method to make a new `image.raw` file whose file size is the same as the used space on `/dev/sdXN`? – FlexMcMurphy May 05 '21 at 11:47
  • 1
    @FlexMcMurphy After you fill the filesystem with binary zeros, using dd's `conv=sparse` option essentially tells it to omit those all-zero regions from the image; thus your image will take only roughly as much space as the actual used space of your filesystem. – Microsoft Linux TM May 06 '21 at 09:10
  • OK. I mount the loop device pointing to my big .img file locally, then dd a binary zeros file into it until full, then delete that file, then dd the device file pointing to it into another .img file using `conv=sparse`. My new .img file is still the same size as the original .img file? The orig .img file contains a partition from another HDD I made like this: `dd if=/dev/sdc6 of=partition.img bs=1M`. Maybe your thing won't work in this case because I don't have a partition table in my .img file... only a filesystem? – FlexMcMurphy May 08 '21 at 11:26
  • @FlexMcMurphy What filesystem are you using? – Microsoft Linux TM May 09 '21 at 08:04
  • It's a FAT32 filesystem. I solved my problem in another way... I made a blank .img file using dd, then partitioned it, then again used dd to copy `partition.img` INTO the partition on the blank img. Now I could resize the partition containing partition.img using GParted. – FlexMcMurphy May 10 '21 at 14:41
  • Thanks a lot for the tip about compressing with gzip combined with dd. In my case, a 1 TB disk with 700 GB unallocated was compressed to a 123 GB gzipped raw image. I created a new partition over the 700 GB of unallocated space and filled it with zeros in advance. In the end, the remaining unused blocks in the other partition were compressed well too. I ran it over the entire disk. – Amadeu Barbosa Nov 28 '21 at 13:54
21

Use dd, with the count option.

In your case you were using fdisk, so I will take that approach. Your "sudo fdisk -l" produced:

    Disk /dev/sda: 64.0 GB, 64023257088 bytes
    255 heads, 63 sectors/track, 7783 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk identifier: 0x0000e4b5

    Device Boot      Start         End      Blocks   Id  System
    /dev/sda1   *           1          27      209920   83  Linux
    Partition 1 does not end on cylinder boundary.
    /dev/sda2              27         525     4000768    5  Extended
    Partition 2 does not end on cylinder boundary.
    /dev/sda5              27         353     2621440   83  Linux
    /dev/sda6             353         405      416768   83  Linux
    /dev/sda7             405         490      675840   83  Linux
    /dev/sda8             490         525      282624   83  Linux

The two things you should take note of are 1) the unit size, and 2) the "End" column. In your case you have cylinders that are equal to 8225280 bytes. In the "End" column, sda8 terminates at 525 (which is 525 [cylinders] * 16065 * 512 bytes = ~4.3 GB).

dd can do a lot of things, such as starting after an offset, or stopping after a specific number of blocks. We will do the latter using the count option in dd. The command would appear as follows:

    sudo dd if=/dev/sda of=/your_directory/image_name.iso bs=8225280 count=526

Where bs is the block size (it is easiest to use the unit that fdisk uses, but any unit will do so long as the count option is declared in these units), and count is the number of units we want to copy (note that we increment the count by 1 to capture the last block).
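Before flashing, you can sanity-check the truncated image by exposing its partitions through a loop device (a sketch; requires util-linux losetup with partition-scanning support):

    sudo losetup -fP --show /your_directory/image_name.iso   # prints e.g. /dev/loop0
    lsblk /dev/loop0                                         # partitions appear as loop0p1, loop0p2, ...
    sudo mount /dev/loop0p1 /mnt                             # inspect a filesystem, then clean up
    sudo umount /mnt && sudo losetup -d /dev/loop0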

plasmapotential
  • 1
    FYI: to show units in cylinders, use `fdisk -l -u=cylinders /dev/sda` – xinthose Sep 28 '17 at 18:37
  • 7
    Why isn't this the accepted answer? It seems to be the least intrusive option as it doesn't modify the source. – user33326 Jan 13 '18 at 11:23
  • 5
    @user33326 because this answer is good for not copying unpartitioned space on a drive, not unused space within partitions, which is what the OP cares about. – GDorn Apr 08 '19 at 20:18
  • Exactly what I was after, ty! Creating a Raspberry Pi custom image, but have a 128G sd-card. This gets me an image of just the used size, not a 128G image that other solutions provide. – Justin Dec 01 '21 at 21:59
16

While zeroing the free disk space from /dev/zero and using dd conv=sparse / gz -c is possible, on huge disks with empty space running into hundreds of GBs zeroing is painfully slow, not to mention that, as other answers noted, it fills an SSD with writes all the way to the end.

Here's what I did when I ran into this situation:

  • On a Lubuntu live CD, used gparted to 'shrink' the partition to the minimum possible size, leaving the rest of the space unallocated

  • Used
    dd bs=1M count=<size_in_MBs> if=/dev/sdX | gzip -c --fast | dd of=/path/to/image.gz to create the fast-compressed image (needless to say, you may want to skip the compression if you have sufficient space to store raw data, or are otherwise inclined to reduce CPU loading); a sketch for computing <size_in_MBs> is given at the end of this answer

  • Used
    dd if=/path/to/image.gz | gunzip -c | dd bs=1M of=/dev/sdY to copy the data back onto a different disk
  • Used gparted again to 'expand' the partition

I haven't tried it with multiple partitions, but I believe the process above can be adapted to copy partitions if the partition table on the destination disk is created first and only the data contained in each partition is copied via dd; reading/writing offsets (the skip/seek options of dd, respectively) would be required as appropriate.
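If you'd rather compute <size_in_MBs> than read it off gparted, the kernel exposes each partition's start and size in 512-byte sectors under /sys. An untested sketch, where sdX and sdX3 stand for your disk and its last (shrunken) partition:

    # End of the last partition, in 512-byte sectors.
    end=$(( $(cat /sys/block/sdX/sdX3/start) + $(cat /sys/block/sdX/sdX3/size) ))

    # Round up to whole MiB (2048 sectors each) and image up to that point.
    mbs=$(( (end + 2047) / 2048 ))
    dd bs=1M count=$mbs if=/dev/sdX | gzip -c --fast | dd of=/path/to/image.gz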

Ashish Chopra
  • 1
    this is the real answer, just use the `count` parameter – Gordy Jan 03 '17 at 00:18
  • Besides gparted, one can also use `resize2fs` sometimes (may be handy e.g. on virtual machines, where gparted is often not available). – Suma May 14 '20 at 17:21
9

You can't. dd is a very low-level tool and it has no means of distinguishing between files and empty space.

On the other hand, the empty space will compress very, very nicely, so if you are only concerned about storage space (not, for example, write time), then just pipe it through gzip.

c2h5oh
  • 8
    Assuming the free space hasn't previously been used. You can zero fill the free space first to ensure the compression works as expected. – Sirex Oct 16 '12 at 23:41
  • 3
    True. And it only complicates the process and makes it take even longer. – c2h5oh Oct 16 '12 at 23:52
7

Assuming the rest of the drive is empty (all zeros), you could pipe your dd output through gzip, which should compress the empty space quite nicely. You can use a tool like zerofree to make sure the empty space is actually blank so that it compresses well.
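For example, zerofree operates on an ext2/3/4 filesystem that is unmounted or mounted read-only (a sketch; the partition name is a placeholder):

    sudo umount /dev/sdXN        # or: mount -o remount,ro /dev/sdXN
    sudo zerofree -v /dev/sdXN   # -v reports progress while zeroing free blocks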

If you use a tool like partimage, clonezilla, or some of the other Linux cloning tools, they would handle most of this for you automatically.
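For instance, partclone (the engine Clonezilla uses for many filesystems) copies only the used blocks; a sketch with placeholder names:

    partclone.ext4 -c -s /dev/sdXN -o /backup/sdXN.img   # clone: used blocks only
    partclone.ext4 -r -s /backup/sdXN.img -o /dev/sdXN   # restore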

Zoredache
  • partimage and clonezilla are actually smart enough to skip *reading* the free space, rather than relying on you to write zeros to it, and then have dd or gzip drop or compress the zeros after reading them in. – psusi Jan 07 '15 at 00:39
6

The accepted answer is not right. I agree with the comment above. I use dd with the count parameter to back up my disk on a regular basis. Simply fill in BACKUP_FOLDER and substitute the letter of your device for "X":

Define the last used block of the disk by reading the "End" sector of its last partition; per the correction in the comments, match the last partition (e.g. /dev/sdX9), not the bare device. Column 3 is the End sector for a partition without the boot flag:

ct=$(fdisk -l | awk '$1 == "/dev/sdX9" { print $3 }')

Then clone the disk (excluding its empty space), adding 1 because the End sector is inclusive:

dd if=/dev/sdX bs=512 count=$((ct + 1)) | gzip > BACKUP_FOLDER/sdX_$(date +"%Y-%m-%d").img.gz
Aloha D
  • Sorry, but `fdisk -l | awk '$1 == "/dev/sdX" { print $3 }'` doesn't print anything (yes, I've replaced sdX with the intended disk) – Henrique de Sousa Dec 22 '21 at 12:55
  • @HenriquedeSousa: you have to replace with the _last partition_ (`/dev/sdx9`), not the device. – MestreLion May 13 '22 at 05:25
  • I am not an expert on filesystems and low-level block copying at all, but your approach, looking up the last used block of the source FS and copying only up to that point, seems the most plausible way to clone only the relevant parts. – porg Jul 18 '22 at 17:56
  • I have a 64GB SD card with an EXT4 FS on it (from an SBC NAS, Armbian 32-bit, OpenMediaVault 6 on top). On `/dev/mmcblk0p1` only 2.7GB are used. Now `fdisk -l /dev/mmcblk0` tells me `Sectors 124018688`, which multiplied by 512 is the entire 64GB. So `dd` will read ALL of the SD card, including loads of continuous 0x00 runs. `gzip` ensures these long 0x00 runs are well compressed, but the backup takes way longer than necessary. Is there a) a clever way to back up only the allocated space? Or b) must I resize the partition to something smaller (with some headroom) and then back up? – porg Jul 18 '22 at 21:29