
Hello, I have an issue with a backup script for ZFS snapshots.

Basically, the breakdown of the script is this:

### START OF SCRIPT
# These variables are defined first because they are used inside other variables.
snap_prefix=snap
retention=10

# Full paths to these utilities are needed when running the script from cron.
#date=/usr/bin/date
GDATE="/opt/csw/bin/gdate"
grep=/usr/bin/grep
#mbuffer=/usr/local/bin/mbuffer
sed=/usr/bin/sed
sort=/usr/bin/sort   
xargs=/usr/bin/xargs
zfs=/sbin/zfs
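# Source dataset to replicate, plus the destination dataset and SSH target on the backup host.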
src_0="ServerStoreR10SSD"

dst_0="zpoolRZ5SATA3/backuppool4/ServerStoreR10SSD"
host="root@hostbk"
today="$snap_prefix-`date +%Y%m%d`"
#yesterday="$snap_prefix-`date -v -1d +%Y%m%d`"
yesterday=$snap_prefix-`$GDATE -d "-1 day" +"%Y%m%d"`
snap_today="$src_0@$today"
snap_yesterday="$src_0@$yesterday"
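# Collect the snapshots beyond the newest $retention, one per line; these are destroyed at the end of the script.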
snap_old=`$zfs list -t snapshot -o name | $grep "$src_0@$snap_prefix*" | $sort -r | $sed 1,${retention}d | $sort | $xargs -n 1`
log=/root/bin/zfsreplication/cronlog/ServerStoreR10SSD.txt
# Create a blank line between the previous log entry and this one.
echo >> $log

# Print the name of the script.
echo "zfsrep_ServerStoreR10SSD.sh" >> $log

# Print the current date/time.
$GDATE >> $log

echo >> $log

# Look for today's snapshot and, if not found, create it.
if $zfs list -H -o name -t snapshot | $sort | $grep "$snap_today$" > /dev/null
then
        echo "Today's snapshot '$snap_today' already exists." >> $log
        # Uncomment if you want the script to exit when it does not create today's snapshot:
        #exit 1
else
        echo "Taking today's snapshot: $snap_today" >> $log
        $zfs snapshot -r $snap_today >> $log 2>&1
fi

echo >> $log

# Look for yesterday's snapshot and, if found, perform incremental replication; otherwise print an error message.
if $zfs list -H -o name -t snapshot | $sort | $grep "$snap_yesterday$" > /dev/null
then
        echo "Yesterday's snapshot '$snap_yesterday' exists. Proceeding with replication..." >> $log
        $zfs send -R -i $snap_yesterday $snap_today | ssh $host $zfs receive -vudF $dst_0 >> $log 2>&1
        #For use in local snapshots
        #$zfs send -R -i $snap_yesterday $snap_today | $zfs receive -vudF $dst_0 >> $log 2>&1
        echo >> $log
        echo "Replication complete." >> $log
else
        echo "Error: Replication not completed. Missing yesterday's snapshot." >> $log
fi

echo >> $log

# Remove snapshot(s) older than the value assigned to $retention.
echo "Attempting to destroy old snapshots..." >> $log

if [ -n "$snap_old" ]
then
        echo "Destroying the following old snapshots:" >> $log
        echo "$snap_old" >> $log
        $zfs list -t snapshot -o name | $grep "$src_0@$snap_prefix*" | $sort -r | $sed 1,${retention}d | $sort | $xargs -n 1 $zfs destroy -r >> $log 2>&1
else
        echo "Could not find any snapshots to destroy." >> $log
fi

# Mark the end of the script with a delimiter.
echo "**********" >> $log

# END OF SCRIPT

The log shows the following:

Yesterday's snapshot 'ServerStoreR10SSD@snap-20170419' exists. Proceeding with replication...
cannot receive: specified fs (zpoolRZ5SATA3/backuppool4/ServerStoreR10SSD) does not exist
attempting destroy zpoolRZ5SATA3/backuppool4/ServerStoreR10SSD failed - trying rename zpoolRZ5SATA3/backuppool4/ServerStoreR10SSD to zpoolRZ5SATA3/backuppool4/ServerStoreR10SSDrecv-5424-1
cannot open 'zpoolRZ5SATA3/backuppool4/ServerStoreR10SSD': dataset does not exist

The script was running successfully up until a power outage. The main issue is that every time it runs the incremental portion, the receiving ZFS pool gets renamed to something weird like "..recv-5424-1", hence it cannot open the destination pool and the backup fails...

Any suggestions, please?

2 Answers


Your script does not show the rename or destroy operations, and we don't know the source and destination snapshots, so this answer is generic advice that you can apply to your situation:

Potential cause of error

For incremental ZFS send/recv to work, you always need snapshots N and N-1 on the source side and N-1 on the target side. You then send the delta (difference) between N-1 and N to the target side, where it becomes N. Afterwards, you can delete N-1 on the source and repeat the process.
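As a minimal sketch of that cycle (the dataset, host, and snapshot names here are placeholders, not taken from your systems):

# One incremental cycle; assumes sourcepool@snap-20170420 (N-1) already exists on both sides.
zfs snapshot -r sourcepool@snap-20170421                             # take today's snapshot (N)
zfs send -R -i sourcepool@snap-20170420 sourcepool@snap-20170421 \
    | ssh root@backuphost zfs receive -vudF backuptank/sourcepool    # send only the delta N-1 -> N
zfs destroy -r sourcepool@snap-20170420                              # afterwards, N-1 can be dropped on the source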

Immediate fix for now

If one of your snapshots does not fit this scheme, for example because it was deleted or renamed, you have two possible ways of correcting it:

  1. Delete all data on the remote side, then do a full/initial/normal send/recv (sketched below). This takes more time, but you will not have to troubleshoot much.
  2. Find out what exactly is wrong and correct all problems by hand, if possible (this may not be possible if you have already deleted snapshots that you need).
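A rough sketch of option 1 with the same placeholder names (double-check every dataset name before running anything like this, since the destroy is irreversible):

# Remove the broken copy on the target, recreate an empty target dataset, then send a full stream.
ssh root@backuphost zfs destroy -r backuptank/sourcepool
ssh root@backuphost zfs create backuptank/sourcepool
zfs send -R sourcepool@snap-20170421 \
    | ssh root@backuphost zfs receive -vudF backuptank/sourcepool    # full initial send; later runs can switch back to -i/-I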

Improvement for future problems

Aside from that, you should check your script's flow to see how this error surfaced. It helps to refactor it into smaller functions like send_initial, send_diff, create_snap, delete_snap, etc., so that you get a clearer picture of what happens when. Then draw a state machine diagram (DFA) with the possible branches and flows and look at each state change: what happens if errors occur (network lost, power lost, user cancels the script, permission denied, ...) and how could you mitigate them?
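A rough sketch of what such a refactoring could look like, reusing the variable names from the posted script (illustrative only, not a drop-in replacement):

# Small helper functions; each call site can then check the exit status and abort cleanly.
create_snap() {        # $1 = snapshot to create, e.g. $snap_today
        $zfs snapshot -r "$1"
}

send_initial() {       # $1 = snapshot, $2 = remote host, $3 = destination dataset
        $zfs send -R "$1" | ssh "$2" $zfs receive -vudF "$3"
}

send_diff() {          # $1 = older snapshot, $2 = newer snapshot, $3 = remote host, $4 = destination dataset
        $zfs send -R -I "$1" "$2" | ssh "$3" $zfs receive -vudF "$4"
}

delete_snap() {        # $1 = snapshot to destroy
        $zfs destroy -r "$1"
}

# Example call site:
# send_diff "$snap_yesterday" "$snap_today" "$host" "$dst_0" >> $log 2>&1 || {
#         echo "Error: incremental send failed." >> $log
#         exit 1
# }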

If this is too much work, you may also use existing solutions that have already fixed those problems. For some of them, have a look at this question from two weeks ago.

user121391
  • Thanks - I forgot to add the whole script -- I just edited it, so now it has everything -- but even with N and N-1 on the source and N-1 on the target, it still fails when trying to do a rename operation: failed - trying rename zpoolRZ5SATA3/backuppool3 to zpoolRZ5SATA3/backuppool3recv-23838-1 – user1814718 Apr 21 '17 at 15:42
  • The tricky part is that I have other pools that run a similar script and they are working correctly... – user1814718 Apr 21 '17 at 15:43
  • @user1814718 Please post both your filesystems and your snapshots related to the question from both systems. The script looks fine at first glance, although I would prefer option `-I` instead of `-i` for `zfs send`, because it includes all interim snapshots between N and N-1 (snapshots from other sources, like users or other scripts); see the short illustration after these comments. – user121391 Apr 21 '17 at 15:54
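To illustrate the `-i` vs. `-I` difference from the last comment (names are again placeholders):

# -i sends only the delta between exactly the two named snapshots:
zfs send -R -i sourcepool@snap-20170420 sourcepool@snap-20170421 | ssh root@backuphost zfs receive -vudF backuptank/sourcepool

# -I additionally includes any intermediate snapshots that exist between the two endpoints:
zfs send -R -I sourcepool@snap-20170420 sourcepool@snap-20170421 | ssh root@backuphost zfs receive -vudF backuptank/sourcepool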

On my source, `zfs list` shows:

ServerStoreR10SSD                       380G   321G  44.9K  /ServerStoreR10SSD
ServerStoreR10SSD/DataStore2R10SSD      380G   321G   296G  /ServerStoreR10SSD/DataStore2R10SSD

On my source, the snapshots are:

ServerStoreR10SSD@snap-20170411                          0      -  44.9K  -
ServerStoreR10SSD@snap-20170412                          0      -  44.9K  -
ServerStoreR10SSD@snap-20170413                          0      -  44.9K  -
ServerStoreR10SSD@snap-20170414                          0      -  44.9K  -
ServerStoreR10SSD@snap-20170415                          0      -  44.9K  -
ServerStoreR10SSD@snap-20170416                          0      -  44.9K  -
ServerStoreR10SSD@snap-20170417                          0      -  44.9K  -
ServerStoreR10SSD@snap-20170418                          0      -  44.9K  -
ServerStoreR10SSD@snap-20170419                          0      -  44.9K  -
ServerStoreR10SSD@snap-20170420                          0      -  44.9K  -
ServerStoreR10SSD@snap-20170421                          0      -  44.9K  -
ServerStoreR10SSD/DataStore2R10SSD@snap-20170411     8.77G      -   295G  -
ServerStoreR10SSD/DataStore2R10SSD@snap-20170412     3.95G      -   295G  -
ServerStoreR10SSD/DataStore2R10SSD@snap-20170413     3.11G      -   295G  -
ServerStoreR10SSD/DataStore2R10SSD@snap-20170414     2.99G      -   295G  -
ServerStoreR10SSD/DataStore2R10SSD@snap-20170415     5.61G      -   296G  -
ServerStoreR10SSD/DataStore2R10SSD@snap-20170416     3.31G      -   296G  -
ServerStoreR10SSD/DataStore2R10SSD@snap-20170417     2.76G      -   296G  -
ServerStoreR10SSD/DataStore2R10SSD@snap-20170418     3.74G      -   296G  -
ServerStoreR10SSD/DataStore2R10SSD@snap-20170419     3.65G      -   296G  -
ServerStoreR10SSD/DataStore2R10SSD@snap-20170420     2.72G      -   296G  -
ServerStoreR10SSD/DataStore2R10SSD@snap-20170421     2.27G      -   296G  -

`zfs list` on my destination shows:

zpoolRZ5SATA3/backuppool3                                                1.19T  1.77T   202K  /zpoolRZ5SATA3/backuppool3
zpoolRZ5SATA3/backuppool3/DataStoreR10 

On my destination, the list of snapshots is:

zpoolRZ5SATA3/backuppool4@moving                                                        139K      -   202K  -
zpoolRZ5SATA3/backuppool4/ServerStoreR10SSDrecv-9540-1/DataStore2R10SSD@snap-20170418  11.8G      -   296G  -
zpoolRZ5SATA3/backuppool4/ServerStoreR10SSDrecv-9540-1/DataStore2R10SSD@snap-20170419  3.67G      -   296G  -
zpoolRZ5SATA3/backuppool4/ServerStoreR10SSDrecv-9540-1/DataStore2R10SSD@snap-20170420      0      -   296G  -