
Problem

We have two mount points on two disks of exactly the same type. Both disks are formatted as ext4.

An rsync command is run to synchronize the source to the destination.

After the rsync is performed, the following data is shown:

Disk 1 - Source

315.8 GiB (339,148,905,125) - 476,038 files, 21,975 sub-folders.

Disk 2 - Destination

315.8 GiB (339,098,108,411) - 476,038 files, 21,975 sub-folders.

Difference

50,796,714 bytes (~50 MB)

Command Used

rsync -r -t -p -o -g -v --progress --delete --ignore-existing -s /media/user/disk1 /media/user/disk2


Question

Why are the total byte sizes different?


Update

The suggested answer was attempted. The byte sizes of the source and the destination still differ by the same amount.

Suggested answer command included the -a and -l switches, adding archiving and symbolic link transfer:

rsync -a -r -t -p -o -g -v -l --progress --delete --ignore-existing -s /media/user/disk1 /media/user/disk2

(Results)

Disk 1 - Source

315.8 GiB (339,148,905,125) - 476,038 files, 21,975 sub-folders.

Disk 2 - Destination

315.8 GiB (339,098,108,411) - 476,038 files, 21,975 sub-folders.

Difference

50,796,714 bytes (~50 MB)


Status

Problem not solved.


Further Research

A similar problem found on SuperUser:

https://superuser.com/questions/442539/why-do-two-directory-hierarchies-that-are-in-sync-have-different-sizes

From ServerFault:

Rsync size is difference from source to destination


Update 2

Requests were made for du and df outputs, and the results were:

root@system:/# du -s /media/user/disk1

332172440 /media/user/disk1

root@system:/# du -s /media/user/disk2/

332119568 /media/user/disk2/

root@system:/# df /media/user/disk1

Filesystem 1K-blocks Used Available Use% Mounted on

/dev/sdc 528316088 332243868 169212316 67% /media/user/disk1

root@system:/# df /media/user/disk2/

Filesystem 1K-blocks Used Available Use% Mounted on

/dev/sdb 528316088 332190996 169265164 67% /media/user/disk2

Thus there is still a difference of 52,872 1K-blocks (about 51.6 MiB) between disk1 and disk2, in both the du and the df output.
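The arithmetic behind these differences can be checked directly in the shell. This is only a sketch using the numbers posted above; the variable names are made up for illustration, and it assumes `du -s` reported 1 KiB blocks (the GNU default, matching the `1K-blocks` header in the df output).

```shell
# Byte counts posted in the question (how they were measured is not stated).
src_bytes=339148905125
dst_bytes=339098108411
byte_diff=$((src_bytes - dst_bytes))

# du -s values from Update 2, assumed to be in 1 KiB blocks.
src_kib=332172440
dst_kib=332119568
kib_diff=$((src_kib - dst_kib))
alloc_diff_bytes=$((kib_diff * 1024))

echo "reported byte difference: $byte_diff"
echo "du difference: $kib_diff KiB = $alloc_diff_bytes bytes"
```

The two differences are close but not identical (50,796,714 versus 54,140,928 bytes), which is consistent with the two tools measuring different things.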

Frugal Rasin
  • How are you measuring the total size? If that's the sum of the file sizes (not including directories which don't have a defined size), which files have changed? If that's some different measure, why do you care? Have you read all of [Why are there so many different ways to measure disk usage?](http://unix.stackexchange.com/questions/120311/why-are-there-so-many-different-ways-to-measure-disk-usage)? – Gilles 'SO- stop being evil' Nov 22 '15 at 23:55
  • If both disks are measured using the exact same method, and there is a discrepancy in size, and the data is important, and it should be the exact same, then yes, it is worth 'caring' about. – Frugal Rasin Nov 23 '15 at 00:01
  • So have you read the thread I linked to, and have you checked if the reasons I list apply? I guess not, since you think it should be the same. The **size** of the files should be the size, but not necessarily their disk usage. – Gilles 'SO- stop being evil' Nov 23 '15 at 00:10
  • If that is what you have determined then you may reply as an answer, but it will not be marked as correct by me. The sizes should be the _exact_ same when using the same method of measurement against the same file system, and in this case, even the same brand, type, and size of storage. You are free to express your opinions as you like. – Frugal Rasin Nov 23 '15 at 00:32
  • I cannot determine anything since you have given very little information; you didn't even say how you got these numbers. All I can say is that there are reasons why the sizes `du` reports might be different, which I explained in the thread I linked to. I'll stick to facts. If you want to stick to opinions, your loss. – Gilles 'SO- stop being evil' Nov 23 '15 at 00:36

1 Answer


For many reasons. For example, you don't use -a or -l, so symbolic links are not copied as links (rsync skips them as non-regular files). You don't use -H, so hard links become separate normal files.

And there is also a phenomenon on Unix/Linux, that a removed file still consumes the space until all the processes that had it opened decide to close it. Might there be open files on Destination before rsync?

Lastly, it could be a problem with Source's sparse files not being properly handled, but it's less likely since rsync usually deals nicely with them.
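To see whether any of these cases apply, the source tree can be inspected for symbolic links, hard-linked files, and a gap between apparent and allocated size (a large gap hints at sparse files). The following is a rough sketch, not a definitive check: the function names are hypothetical, GNU `du` is assumed for `--apparent-size` and `-B1`, and file names containing newlines would be miscounted.

```shell
# Count symbolic links under a tree.
count_symlinks() {
    find "$1" -type l | wc -l | tr -d ' '
}

# Count regular files that have more than one hard link.
count_hardlinked() {
    find "$1" -type f -links +1 | wc -l | tr -d ' '
}

# Print "apparent_bytes allocated_bytes" for a tree (GNU du assumed).
size_pair() {
    apparent=$(du -s --apparent-size -B1 "$1" | cut -f1)
    allocated=$(du -s -B1 "$1" | cut -f1)
    echo "$apparent $allocated"
}
```

For example, `count_symlinks /media/user/disk1` and `count_symlinks /media/user/disk2` should print the same number if links were transferred as links.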

PS rsync FAQ gives some other possibilities (source):

  1. If your target is slightly smaller than your source the likely cause is a difference in directory sizes. This is simply due to how directories allocate disk space and can't really be helped. I have devised a quick shell command to add up all of the file sizes in the current directory without including the directory sizes:
     echo `find . -type f -ls | awk '{print $7 "+"}'`0 | bc
  2. There are also differences in filesystem types, block sizes, file slack overhead, etc. that can cause the outcome to be different.
     tune2fs -l /dev/your_block_device | grep -i 'block size'
  3. If you have checked all of this and you are still bugged by an unexplained size difference then I would like to point out that simple sizes are not a very useful check for completeness or accuracy on a data copying operation. It doesn't check the contents of the files at all and it is subject to variations I explained above. I would suggest using an actual file verification utility such as cfv to verify your files using real cryptographic hashes. The cfv utility is very similar to the simple md5sum utility except that it is recursive, faster, and has a %completion bar.
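The first and third suggestions can be sketched with standard tools. This is an illustration under assumptions, not the FAQ's own method: it uses `sha256sum` from GNU coreutils in place of cfv, relies on GNU find's `-printf`, and the function names are hypothetical.

```shell
# Sum the sizes of regular files only, so directory sizes are excluded.
# "%.0f" avoids integer overflow in awk implementations with 32-bit %d.
sum_file_bytes() {
    find "$1" -type f -printf '%s\n' |
        awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Print a sorted "hash  ./relative/path" manifest for a tree.
manifest() {
    ( cd "$1" && find . -type f -exec sha256sum {} + | sort -k 2 )
}

# Succeed only when both trees contain the same paths with the same contents.
same_content() {
    left=$(mktemp)
    right=$(mktemp)
    manifest "$1" > "$left"
    manifest "$2" > "$right"
    cmp -s "$left" "$right"
    rc=$?
    rm -f "$left" "$right"
    return $rc
}
```

A possible use would be `same_content /media/user/disk1 /media/user/disk2 && echo identical`. If the summed file sizes match and the manifests match, a remaining difference in `du` or `df` comes from allocation overhead rather than from missing data.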
kubanczyk
  • What does the -a (archive) option do exactly? The `man` section on it does not go into detail that can be understood in the context of the original question's conditions here. (Attempting to create a perfect mirror of files from the source to the destination). – Frugal Rasin Nov 22 '15 at 20:13
  • Manual says quite precisely, that -a stands for -rlptgoD – kubanczyk Nov 23 '15 at 09:02
  • `du` and `df` results posted to main question under heading **Update 2**. – Frugal Rasin Nov 23 '15 at 15:55