The case is the following:

We have two servers: a remote server with ONLY FTP access, and a local Linux server where we have to set up a script that backs up the whole system using the dd command. The backup must be saved on the remote server.

Here is my script:

CT=$(fdisk -l | awk '$1 == "/dev/sda1" { print $3 }')
dd if=/dev/sda1 bs=512 count=$CT | gzip | split -b 1000M - /mnt/remoteftp/PATH/sda_$(uname -r).img.gz

The script works, but I expected the backup to skip empty space and it does not. In fact:

root@linuxserver:~/# df -h
Filesystem                           Size  Used Avail Use% Mounted on
udev                                 961M     0  961M   0% /dev
tmpfs                                195M  3,1M  192M   2% /run
/dev/sda1                             20G  2,1G   18G  11% /
tmpfs                                973M     0  973M   0% /dev/shm
tmpfs                                5,0M     0  5,0M   0% /run/lock
tmpfs                                973M     0  973M   0% /sys/fs/cgroup
/dev/sda15                           105M  3,6M  101M   4% /boot/efi
tmpfs                                195M     0  195M   0% /run/user/1001
curlftpfs#ftp://ftp.remotesrv.com/   954G     0  954G   0% /mnt/remoteftp

The /dev/sda1 disk has only 2.1 GB in use, so I'm expecting 3 files when my script finishes:

one file of 1 GB, another file of 1 GB, and a last file of 100 MB.

Instead I have:

root@linuxserver:~/batch# ls -lah /mnt/remoteftp/
total 17G
drwxr-xr-x 2 ubuntu ubuntu  4,0K nov 20  2019 .
drwxr-xr-x 7 ubuntu ubuntu  4,0K nov 20 12:02 ..
-rwxr-xr-x 0 ubuntu ubuntu   13G nov 20  2019 .fuse_hidden0000007900000001
-rwxr-xr-x 1 ubuntu ubuntu 1000M nov 20  2019 sda_4.15.0-70-generic.img.gzaa
-rwxr-xr-x 1 ubuntu ubuntu 1000M nov 20  2019 sda_4.15.0-70-generic.img.gzab
-rwxr-xr-x 1 ubuntu ubuntu 1000M nov 20  2019 sda_4.15.0-70-generic.img.gzac
-rwxr-xr-x 1 ubuntu ubuntu 1000M nov 20  2019 sda_4.15.0-70-generic.img.gzad
-rwxr-xr-x 1 ubuntu ubuntu  326M nov 20  2019 sda_4.15.0-70-generic.img.gzae
root@linuxserver:~/batch# ls -lah /mnt/remoteftp/
total 17G
drwxr-xr-x 2 ubuntu ubuntu  4,0K nov 20  2019 .
drwxr-xr-x 7 ubuntu ubuntu  4,0K nov 20 12:02 ..
-rwxr-xr-x 0 ubuntu ubuntu   13G nov 20  2019 .fuse_hidden0000007900000001
-rwxr-xr-x 1 ubuntu ubuntu 1000M nov 20  2019 sda_4.15.0-70-generic.img.gzaa
-rwxr-xr-x 1 ubuntu ubuntu 1000M nov 20  2019 sda_4.15.0-70-generic.img.gzab
-rwxr-xr-x 1 ubuntu ubuntu 1000M nov 20  2019 sda_4.15.0-70-generic.img.gzac
-rwxr-xr-x 1 ubuntu ubuntu 1000M nov 20  2019 sda_4.15.0-70-generic.img.gzad
-rwxr-xr-x 1 ubuntu ubuntu  374M nov 20  2019 sda_4.15.0-70-generic.img.gzae
root@linuxserver:~/batch# 

Why does this happen?

Payedimaunt

1 Answer

When you use dd, you are copying data at the block level. Neither dd nor the partition knows anything about the filesystem or what is on it.

If you copy a 10GB file to the server and delete it right away, you only remove the references to it in the filesystem. The data itself is still there. So when you copy the partition afterwards with dd, you copy this 'ghost' file as well.

To get the result you are expecting, you would need to actually wipe all unused space on the disk by overwriting it with other data, preferably something that is easily compressed by gzip. Usually you do this by overwriting everything with zeroes.
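One common way to do the overwrite is to write a file of zeroes until the filesystem is full and then delete it. The following is only a sketch under assumptions: the function name `zero_fill_free_space`, the filler file name `zero.fill`, and the optional size cap are all made up for illustration. Filling the root filesystem completely can disturb running services, so treat it with care.

```shell
#!/bin/sh
# zero_fill_free_space DIR [CAP_MIB]
# Overwrites free space of the filesystem that holds DIR with zeroes,
# then removes the filler file. CAP_MIB (hypothetical safety knob)
# limits the fill to that many MiB instead of filling the filesystem.
zero_fill_free_space() {
    dir=$1
    cap_mib=${2:-}
    filler="$dir/zero.fill"   # hypothetical file name

    if [ -n "$cap_mib" ]; then
        dd if=/dev/zero of="$filler" bs=1M count="$cap_mib" 2>/dev/null
    else
        # Without a cap, dd stops with "No space left on device";
        # that error is expected here, so it is ignored.
        dd if=/dev/zero of="$filler" bs=1M 2>/dev/null || true
    fi

    sync              # make sure the zeroes actually reach the disk
    rm -f "$filler"   # give the space back to the filesystem
}

# Example (not run here): zero_fill_free_space /
```

Running it with `/` as the directory before the dd backup should leave the unused blocks of /dev/sda1 as zeroes, which gzip reduces to almost nothing.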

After that, the result should be what you expect.
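To see why zeroed space makes such a difference, this small experiment compares how gzip handles 1 MiB of zeroes and 1 MiB of random bytes (used here as a stand-in for leftover data from deleted files, which is an assumption; real leftovers may compress somewhat better).

```shell
#!/bin/sh
# Compress 1 MiB of zeroes and 1 MiB of random data, then compare sizes.
zeros=$(head -c 1048576 /dev/zero | gzip -c | wc -c)
random=$(head -c 1048576 /dev/urandom | gzip -c | wc -c)

echo "1 MiB of zeroes compresses to: $zeros bytes"
echo "1 MiB of random data compresses to: $random bytes"
```

The zeroes shrink to a tiny fraction of their original size, while the random data stays at roughly 1 MiB. The same applies to the image: every block still holding old data costs space in the backup, every zeroed block costs next to nothing.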


Personally, I wouldn't run the backup on the block device level. I'd just use an existing backup tool that actually knows and uses the filesystem.
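As one possible illustration of a filesystem-level backup, the sketch below uses tar, which only reads files that actually exist, and splits the compressed archive into chunks. tar is just an example here, not a specific recommendation from this answer; the function name `backup_fs` and the output prefix are hypothetical.

```shell
#!/bin/sh
# backup_fs SRC DEST_PREFIX [CHUNK]
# Archives the files under SRC (staying on one filesystem), compresses
# the stream with gzip and splits it into CHUNK-sized pieces named
# DEST_PREFIXaa, DEST_PREFIXab, ...
backup_fs() {
    src=$1
    prefix=$2
    chunk=${3:-1000M}
    tar --one-file-system -czf - -C "$src" . | split -b "$chunk" - "$prefix"
}

# Example (not run here):
# backup_fs / "/mnt/remoteftp/PATH/root_$(uname -r).tar.gz."
```

To restore, the pieces can be concatenated back into one stream, for example `cat PREFIX* | tar -xzf - -C /some/target`. Because only existing files are read, the archive size follows the used space rather than the partition size.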

Gerald Schneider