
This has never happened to me before: I'm unable to perform a simple task such as compressing an 18.5 GB file on Ubuntu Linux 18.04 with any of the popular compression tools such as gzip, bzip2 and 7z. All of them report a similar warning (not error) message claiming that the file size changed during compression, when actually no other process is accessing the file. For example, when I try to tar-gzip it, tar reports File shrank by <nnnnnnnn> bytes; padding with zeros and exits with error code 1, which, according to tar's manpage, indicates that a file changed while being archived:

exit code 1: Some files differ. If tar was invoked with the --compare (--diff, -d) command line option, this means that some files in the archive differ from their disk counterparts. If tar was given one of the --create, --append or --update options, this exit code means that some files were changed while being archived and so the resulting archive does not contain the exact copy of the file set.
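
For reference, a minimal reproduction looks like the commands below (the file names are the ones from my setup; the other compressors behave similarly). Checking the exit status right afterwards is how I confirmed tar exits with 1:

tar czSpf backup.tar.gz testmachine.vmdk
echo $?    # prints 1 when tar believes the file shrank/changed while being archived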

The file is a VMDK and, of course, the associated VM is completely shut down when I compress it. I've also noticed that all the compression tools fail when the compressed output reaches a size of around 280 MB.

I've already checked other similar questions on Server Fault, but I still haven't found a hint that explains what's happening. The most-voted answer to the linked question says that this is not an error and that the compression tool is just collapsing a run of zero bytes, but if I attempt to run the VM after decompressing the VMDK file, it fails, claiming the disk is corrupted.

I'm completely stuck on this. Any ideas about what could be happening?

UPDATE

While trying to copy the file to another directory with the cp command, I got an I/O error while reading the file. In addition, dmesg reported I/O errors while reading a specific block of the file. Everything points to a disk error (although e2fsck says everything is OK and there are 0 bad blocks). Since I already have a backup of the VM, I will try to replace the host computer's disk, reinstall a fresh copy of Ubuntu and see what happens. I'll keep this question posted until I get some results.
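
For completeness, before swapping the disk it's worth checking the drive's own SMART status; something along these lines (requires the smartmontools package; /dev/sda is just an example device name):

sudo smartctl -H /dev/sda    # overall health self-assessment
sudo smartctl -A /dev/sda    # attributes such as Reallocated_Sector_Ct and Current_Pending_Sector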

Claudix
  • To make really sure no process is accessing the file, you can use the `lsof` command. – João Alves Mar 22 '21 at 08:33
  • @JoãoAlves Hello! I already did that! I even restarted the host machine and attempted to compress the file without starting the VM. – Claudix Mar 22 '21 at 08:35
  • "actually no other process is accessing the file ....". How can you be sure? To be really, really sure, you can do `sudo chattr +i my-file.vmdk` and then try to compress. Also, please paste an example of the failing compress command to help us understand exactly what's happening. – spinkus Mar 22 '21 at 09:45
  • Please execute `stat` and triple-check the `mtime/ctime` timestamps to be sure nothing is changing the file. – shodanshok Mar 22 '21 at 10:06
  • @shodanshok, confirmed that nothing is changing the file. – Claudix Mar 22 '21 at 10:33
  • @semisecure, done, but got the same results. Example of the compression command: `tar cvzSpf backup.tar.gz testmachine.vmdk` – Claudix Mar 22 '21 at 10:35
  • I would suggest copying the file (i.e. `cp original.vmdk temp.vmdk`) and trying to compress the copy. Does that change anything? – shodanshok Mar 22 '21 at 11:03
  • @shodanshok, precisely, I was attempting exactly that when I got an error from `cp`. Please read the update in my question. – Claudix Mar 22 '21 at 11:05
  • Check the disk's SMART data. Run `e2fsck -c -f` to check for bad blocks; a plain `e2fsck` won't do a full disk surface scan for bad blocks. Or better, immediately stop using this disk and make a dump of it (with `ddrescue`) if it holds any valuable information. – Nikita Kipriyanov Mar 22 '21 at 11:28

1 Answer


OK, I'm answering my own question, as it may help someone else realize that they are actually having hardware problems.

After trying multiple times to compress the problematic file (even with different compressors), I simply tried to copy it to another directory using cp, which reported an I/O error while reading the file:

cp: reading `filename': Input/output error

A quick glance at dmesg's output confirmed a hardware error: it reported an I/O error while reading a specific block on the disk.
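
The check itself is nothing fancy; something like the following is enough to spot it (the exact wording of the kernel messages varies by driver, so treat the pattern as an example):

dmesg | grep -iE 'i/o error|sector'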

I booted the OS in emergency mode and ran e2fsck -vf /dev/sda1, yet it didn't report any bad blocks. In the comments on my question, user Nikita Kipriyanov suggested running e2fsck -c -f, which I couldn't try because I had already replaced the disk. According to the manpage, the -c flag deals specifically with bad blocks:

causes e2fsck to use badblocks(8) program to do a read-only scan of the device in order to find any bad blocks. If any bad blocks are found, they are added to the bad block inode to prevent them from being allocated to a file or directory. If this option is specified twice, then the bad block scan will be done using a non-destructive read-write test.
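
A minimal sketch of how that would be run, assuming the filesystem is /dev/sda1 and is unmounted (e.g. from a live/rescue environment):

sudo e2fsck -f -c /dev/sda1     # read-only badblocks(8) surface scan
sudo e2fsck -f -cc /dev/sda1    # passing -c twice does a non-destructive read-write test instead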

The reader may want to run this command, as Nikita suggests, as a workaround, but when a disk starts giving hardware errors the best option is to save as much information as possible and move the system to a fresh new disk.
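
If you do need to salvage data from a drive that is already throwing I/O errors, the ddrescue approach Nikita mentioned in the comments looks roughly like this (device and paths are placeholders; the image must go on a different, healthy disk):

sudo ddrescue -d -r3 /dev/sda /mnt/backup/sda.img /mnt/backup/sda.map
# -d uses direct disc access, -r3 retries bad areas three times, the last argument is the map/log file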

Good luck!

Claudix
  • You can always check the disk after it has been replaced; just connect it to another computer :) We always do such post-mortem analysis, just to be sure. And drives *always* die after some use, so you must have essential drive health monitoring with alerts! It saves data... – Nikita Kipriyanov Apr 11 '21 at 17:29