
We recently ran into a situation where we had 3 volumes in an Error state that were clogging up our 'Default' pool.

We have several media pools for different purposes and thus have the Maximum Volumes directive in place so we become aware of problems (e.g. this problem, sudden increase of data volume, etc.).

My Default pool is:

Pool { 
  Name = Default
  Pool Type = Backup
  Recycle = yes
  Recycle Oldest Volume = yes
  RecyclePool = Scratch
  AutoPrune = yes                     # Prune expired volumes
  Volume Retention = 60 days
  Maximum Volumes = 35
  Cleaning Prefix = "CLN"
} 

Job retention parameters were set such that we couldn't purge a tape or add another one into the pool:

17-Mar 23:05 server8-dir JobId 10652: Start Backup JobId 10652, Job=server1.2012-03-17_23.05.00_57
17-Mar 23:05 server8-dir JobId 10652: Warning: Unable add Scratch Volume, Pool "Default" full MaxVols=35
17-Mar 23:05 server8-dir JobId 10652: Pruning oldest volume "000026L2"
17-Mar 23:05 server8-dir JobId 10652: Using Device "TS3200-1a"
17-Mar 23:05 server8-dir JobId 10652: Warning: Unable add Scratch Volume, Pool "Default" full MaxVols=35
17-Mar 23:05 server8-dir JobId 10652: Pruning oldest volume "000026L2"
17-Mar 23:05 server8-sd JobId 10652: Job server1.2012-03-17_23.05.00_57 is waiting. Cannot find any appendable volumes.
Please use the "label" command to create a new Volume for:
    Storage:      "TS3200-1a" (/dev/nst0)
    Pool:         Default
    Media type:   LTO3

Is there a way to tell Bacula to move any Errored volumes out of a pool automatically so they don't take up space? One of them had been there for a while (since 2011-08-20 00:10:34) so I don't think it would have ever been moved out.

The emphasis here is for this to happen automatically - I think it makes sense for a volume to get moved out of the pool when RecyclePool is set.

(background: we maintain Bacula for quite a few different customers and we try to have things happen as automatically as possible. While this isn't a massive problem, perhaps this just doesn't exist yet and ought to be submitted as a feature request.)

MikeyB

1 Answer


The great thing about Bacula is that there are a whole bunch of ways to resolve any problem you may encounter.
Here are a few options for this situation:

Option 1:
Delete the errored volumes outright (run delete volume at the Bacula console, then select the ones you want to get rid of).
This is my choice if you're sure the volume is bad or defective: there's no sense keeping it around anywhere if you can't use it. Volume deletion is a catalog operation that doesn't affect the tape, so if you ever need to recover data from it you can use bscan to recreate the catalog entries and recover whatever can still be read.
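
As a rough sketch of what that looks like at the console, assuming a hypothetical errored volume named 000099L2 (a placeholder, not a name taken from the logs in the question), you could list the pool to spot the volumes whose VolStatus is Error and then delete one by name:

```
list volumes pool=Default
delete volume=000099L2
```

bconsole asks for confirmation before it removes the volume and its job records from the catalog.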

Option 2:
Get rid of the Maximum Volumes directive in your pool.
This guarantees you'll never have to deal with this problem again, but it also means you can create an unlimited number of volumes if you screw up a label command. (I generally don't set Maximum Volumes on my pools: it makes adding new tapes as your backup size grows more annoying than it needs to be.)
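
For illustration, this is the Default pool from the question with that one directive dropped and everything else left as posted. It is only a sketch of the change, and the Director has to reload its configuration before it takes effect:

```
Pool {
  Name = Default
  Pool Type = Backup
  Recycle = yes
  Recycle Oldest Volume = yes
  RecyclePool = Scratch
  AutoPrune = yes                     # Prune expired volumes
  Volume Retention = 60 days
  Cleaning Prefix = "CLN"
}
```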

Option 3:
Relabel the volume (run relabel at the Bacula console, then select the volume(s) you want to move).
Note that the volume must be marked as Purged or Recycled before you can relabel it. These status changes are catalog operations (they don't need or affect the actual tape), so you can run purge volume on the failed volumes and then relabel them into another pool if you so desire.
(purge volume ignores job retention parameters; more accurately, it WILL purge the volume regardless of any retention parameters. It's a tactical nuke and should be handled with care.)
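
A minimal sketch of that sequence, again using the placeholder volume name 000099L2 (hypothetical, substitute one of your errored volumes). Entered with no arguments, relabel prompts for the storage, the old volume, the new name and the pool:

```
list volumes pool=Default
purge volume=000099L2
relabel
```

Check the list output carefully before the purge step, since nothing in the retention settings will stop it.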

Option 4:
I've been assuming we're talking about real tapes. If these are virtual tape files and the "error" state is left over from some previous transient incident, you can use update volume to clear the error (set it to any appropriate non-error status). You should obviously only do this if you are certain the volume in question is good.
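
For example, assuming the same placeholder volume name 000099L2 (hypothetical) and that Append is the status you want it back in, something along these lines should reset it:

```
update volume=000099L2 volstatus=Append
```

Running update volume without the volstatus argument instead walks you through a menu of the parameters you can change.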

voretaq7
  • Good options! This alone would be a handy synopsis for "what to do when a Bacula volume gets into the Error state". – MikeyB Mar 19 '12 at 19:00
  • @MikeyB it's missing Option 0 (panic and cry in a corner praying to the gods that the data you need to recover is still on the tape somewhere) – voretaq7 Mar 19 '12 at 19:08