dump/restore used to take 40 minutes for 4 filesystems, now takes 24 hours


On my small-business Linux server, I use dump for backups. I have four file systems (root, home, app1, app2); root is a "real" partition, while the others are LVM logical volumes. All are ext4.
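
For context, a full dump is invoked roughly like the following; the exact paths and compression flags are illustrative rather than my actual script:

    # Illustrative only - not the exact invocation.
    # Level 0 (full) dump of /home to the staging disk, zlib-compressed,
    # with -u updating /etc/dumpdates so the level 1 dumps key off it.
    dump -0 -u -z2 -f /backups/home.0.dump.gz /home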

I used to be able to do the level 0 dumps in 40-45 minutes. Then one day they suddenly started taking 8 hours, and since then they have been getting slower and slower ... now they take more than 24 hours. The level 1 dumps still zoom through in 10-15 minutes most days.

My first thought was that there might be "dirt" in the largest file system (/home), which was the first to slow down, but fsck did not cure it.
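
(The check was a forced full e2fsck on the unmounted filesystem, along these lines; the device path is illustrative:)

    # Illustrative: forced, verbose check of the /home logical volume while unmounted
    umount /home
    e2fsck -f -v /dev/mapper/vg0-home
    mount /home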

I was inspired to check the SMART status (smartctl) of the drive holding the live filesystems, and indeed found a few hundred thousand transient read errors. I replaced the drive and restored from backups (which were good); the problem persisted. smartctl on the replacement drive showed millions of transient read errors. Some articles on the net suggested that this might be normal for modern terabyte drives. Nevertheless, I replaced the drive with an SSD, but nothing changed.
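
(For what it is worth, the SMART checks were along these lines; the device name is illustrative:)

    # Illustrative: health summary, attribute counters, error log, short self-test
    smartctl -H /dev/sda
    smartctl -A /dev/sda          # look at Raw_Read_Error_Rate and friends
    smartctl -l error /dev/sda
    smartctl -t short /dev/sda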

The live file systems were on a Seagate Barracuda 500 GB drive. I was always told that Seagate Barracuda was the gold standard of drives.

The backup staging disk is a WD 1TB drive. smartctl shows 0 errors on it.

Any idea why this problem showed up out of nowhere and what may be causing it?

Some people will say that dump/restore is too old-school to use today, but I find it much easier to manage for this use than rsync. I have daily incrementals, and I archive the weekly level 0 to a BD-R DL disc.
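
(The weekly archiving step is essentially burning the four level 0 files to disc, roughly like this; the device and paths are illustrative:)

    # Illustrative: burn the weekly level 0 dump files to a BD-R DL disc
    growisofs -Z /dev/sr0 -R -J -V "level0-$(date +%F)" /backups/*.0.dump.gz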

HELP

Lars Poulsen

Posted 2017-03-08T02:09:01.793

Reputation: 61

Can you tell us more about your backup process? Where are you dumping your filesystems? Could the target be your bottleneck? Alternatively, could you tell us about rsync not providing you with incrementals, ...? Have you considered solutions such as Bacula? – SYN – 2017-03-08T02:16:51.077

Crontab runs the 4 level 0 dumps at 00:15 on Saturday and the 4 level 1 dumps Monday through Friday, to /backups, which is an LVM partition taking up all of /dev/sdb (the production file systems live on /dev/sda). – Lars Poulsen – 2017-03-08T02:39:17.020
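
(Schematically, the crontab looks like this; the wrapper script name is made up for illustration:)

    # Illustrative crontab entries, not the actual ones
    # m  h  dom mon dow  command
    15   0  *   *   6    /usr/local/sbin/dump-all.sh 0   # Saturday: four level 0 dumps
    15   0  *   *   1-5  /usr/local/sbin/dump-all.sh 1   # Mon-Fri: four level 1 dumps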

I don't like GUI subsystems - not easy to script. Yes, I know you can do incrementals with rsync by playing games with hard links, but (a) it is more complicated than I like, (b) I end up with thousands of little files to put on my off-site media rather than 4 compressed dump files, and (c) we like what we are used to, and I have used this for a decade. (Actually, I used to have a tape jukebox to spool the dump files to, until the set no longer fit on tape; then it became DVD, DVD-DL, BD-R and now BD-R DL.) – Lars Poulsen – 2017-03-08T02:42:44.793

Yes, I considered the staging disk as a possible bottleneck, but (a) why all of a sudden? and (b) smartctl says no errors. – Lars Poulsen – 2017-03-08T02:44:21.370

While I am still hoping to hear an answer to the problem, I am exploring the workaround of using rsync with hard links. I see that the rsync view of the 4 file systems adds up to 70 GB, instead of the 47 GB of the (compressed) dump files - one level 0 plus one level 1. – Lars Poulsen – 2017-03-08T19:50:24.073
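
(The hard-link scheme I am trying is the usual rsync --link-dest rotation; the paths here are illustrative:)

    # Illustrative sketch: daily snapshot of /home hard-linked against the previous one
    prev=/backups/rsync/home-latest            # symlink to yesterday's snapshot
    dest=/backups/rsync/home-$(date +%F)
    rsync -aH --delete --link-dest="$prev" /home/ "$dest"/
    ln -sfn "$dest" /backups/rsync/home-latest # advance the "latest" pointer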

No answers