7

Bacula won't make use of 2 tape devices simultaneously. (Search for #-#-# for the TL;DR)

A little background, perhaps.

In the process of trying to get a decent working backup solution (backing up >20TB ain't cheap, or easy) at $dayjob, we bought a bunch of things to make it work.

Firstly, there's a Spectra Logic T50e autochanger, 40 slots of LTO5 goodness, and that robot's got a pair of IBM HH5 Ultrium LTO5 drives, connected via FibreChannel Arbitrated Loop to our backup server.

There's the backup server.. A Dell R715 with 2x 16 core AMD 62xx CPUs, and 32GB of RAM. Yummy. That server's got 2 Emulex FCe-12000E cards, and an Intel X520-SR dual port 10GE NIC.

We were also sold Commvault Backup (non-NDMP).

Here's where it gets really complicated.

Spectra Logic and Commvault both sent respective engineers, who set up the library and the software. Commvault was running fine, in so far as the controller was working fine. The Dell server has Ubuntu 12.04 server, and runs the MediaAgent for CommVault, and mounts our BlueArc NAS as NFS to a few mountpoints, like /home, and some stuff in /mnt.

When backing up from the NFS mountpoints, we were seeing ~= 290GB/hr throughput. That's CRAP, considering we've got 20-odd TB to get through, in a <48 hour backup window. The rated maximum on the BlueArc is 700MB/s (2460GB/hr), the rated maximum write speed on the tape devices is 140MB/s, per drive, so that's 492GB/hr (or double it, for the total throughput).

So, the next step was to benchmark NFS performance with IOzone, and it turns out that we get epic write performance (across >20 threads), and it's like 1.5-2.5TB/hr write, but read performance is fecking hopeless. I couldn't ever get higher than 343GB/hr maximum. So let's assume that the 343GB/hr is a theoretical maximum for read performance on the NAS, then we should in theory be able to get that performance out of a) CommVault, and b) any other backup agent.

Not the case. Commvault seems to only ever give me 200-250GB/hr throughput, and out of experimentation, I installed Bacula to see what the state of play there is. If, for example, Bacula gave consistently better performance and speeds than Commvault, then we'd be able to say "**$.$ Refunds Plz $.$**"

#-#-#

Alas, I found a different problem with Bacula. Commvault seems pretty happy to read from one part of the mountpoint with one thread, and stream that to a Tape device, whilst reading from some other directory with the other thread, and writing to the 2nd drive in the autochanger.

I can't for the life of me get Bacula to mount and write to two tape drives simultaneously.

Things I've tried:

  • Setting Maximum Concurrent Jobs = 20 in the Director, File and Storage Daemons
  • Setting Prefer Mounted Volumes = no in the Job Definition
  • Setting multiple devices in the Autochanger resource.

Documentation seems to be very single-drive centric, and we feel a little like we've strapped a rocket to a hamster, with this one. The majority of example Bacula configurations are for DDS4 drives, manual tape swapping, and FreeBSD or IRIX systems.

I should probably add that I'm not too bothered if this isn't possible, but I'd be surprised. I basically want to use Bacula as proof to stick it to the software vendors that they're overpriced ;)

I read somewhere that @KyleBrandt has done something similar with a modern Tape solution..

Configuration Files: bacula-dir.conf

#
# Default Bacula Director Configuration file

Director {                            # define myself
  Name = backuphost-1-dir
  DIRport = 9101                # where we listen for UA connections
  QueryFile = "/etc/bacula/scripts/query.sql"
  WorkingDirectory = "/var/lib/bacula"
  PidDirectory = "/var/run/bacula"
  Maximum Concurrent Jobs = 20
  Password = "yourekiddingright"         # Console password
  Messages = Daemon
  DirAddress = 0.0.0.0
  #DirAddress = 127.0.0.1
}

JobDefs {
  Name = "DefaultFileJob"
  Type = Backup
  Level = Incremental
  Client = backuphost-1-fd 
  FileSet = "Full Set"
  Schedule = "WeeklyCycle"
  Storage = File
  Messages = Standard
  Pool = File
  Priority = 10
  Write Bootstrap = "/var/lib/bacula/%c.bsr"
}

JobDefs {
  Name = "DefaultTapeJob"
  Type = Backup
  Level = Incremental
  Client = backuphost-1-fd
  FileSet = "Full Set"
  Schedule = "WeeklyCycle"
  Storage = "SpectraLogic"
  Messages = Standard
  Pool = AllTapes
  Priority = 10
  Write Bootstrap = "/var/lib/bacula/%c.bsr"
  Prefer Mounted Volumes = no

}

#
# Define the main nightly save backup job
#   By default, this job will back up to disk in /nonexistant/path/to/file/archive/dir
Job {
  Name = "BackupClient1"
  JobDefs = "DefaultFileJob"
}

Job {
  Name = "BackupThisVolume"
  JobDefs = "DefaultTapeJob"
  FileSet = "SpecialVolume"
}
#Job {
#  Name = "BackupClient2"
#  Client = backuphost-12-fd
#  JobDefs = "DefaultJob"
#}

# Backup the catalog database (after the nightly save)
Job {
  Name = "BackupCatalog"
  JobDefs = "DefaultFileJob"
  Level = Full
  FileSet="Catalog"
  Schedule = "WeeklyCycleAfterBackup"
  # This creates an ASCII copy of the catalog
  # Arguments to make_catalog_backup.pl are:
  #  make_catalog_backup.pl <catalog-name>
  RunBeforeJob = "/etc/bacula/scripts/make_catalog_backup.pl MyCatalog"
  # This deletes the copy of the catalog
  RunAfterJob  = "/etc/bacula/scripts/delete_catalog_backup"
  Write Bootstrap = "/var/lib/bacula/%n.bsr"
  Priority = 11                   # run after main backup
}

#
# Standard Restore template, to be changed by Console program
#  Only one such job is needed for all Jobs/Clients/Storage ...
#
Job {
  Name = "RestoreFiles"
  Type = Restore
  Client=backuphost-1-fd                 
  FileSet="Full Set"                  
  Storage = File                      
  Pool = Default
  Messages = Standard
  Where = /srv/bacula/restore
}

FileSet {
  Name = "SpecialVolume"
  Include {
    Options {
      signature = MD5
    }
  File = /mnt/SpecialVolume
  }
  Exclude {
    File = /var/lib/bacula
    File = /nonexistant/path/to/file/archive/dir
    File = /proc
    File = /tmp
    File = /.journal
    File = /.fsck
  }
}


# List of files to be backed up
FileSet {
  Name = "Full Set"
  Include {
    Options {
      signature = MD5
    }
    File = /usr/sbin
  }

  Exclude {
    File = /var/lib/bacula
    File = /nonexistant/path/to/file/archive/dir
    File = /proc
    File = /tmp
    File = /.journal
    File = /.fsck
  }
}

Schedule {
  Name = "WeeklyCycle"
  Run = Full 1st sun at 23:05
  Run = Differential 2nd-5th sun at 23:05
  Run = Incremental mon-sat at 23:05
}

# This schedule does the catalog. It starts after the WeeklyCycle
Schedule {
  Name = "WeeklyCycleAfterBackup"
  Run = Full sun-sat at 23:10
}

# This is the backup of the catalog
FileSet {
  Name = "Catalog"
  Include {
    Options {
      signature = MD5
    }
    File = "/var/lib/bacula/bacula.sql"
  }
}

# Client (File Services) to backup
Client {
  Name = backuphost-1-fd
  Address = localhost
  FDPort = 9102
  Catalog = MyCatalog
  Password = "surelyyourejoking"          # password for FileDaemon
  File Retention = 30 days            # 30 days
  Job Retention = 6 months            # six months
  AutoPrune = yes                     # Prune expired Jobs/Files
}

#
# Second Client (File Services) to backup
#  You should change Name, Address, and Password before using
#
#Client {
#  Name = backuphost-12-fd                
#  Address = localhost2
#  FDPort = 9102
#  Catalog = MyCatalog
#  Password = "i'mnotjokinganddontcallmeshirley"         # password for FileDaemon 2
#  File Retention = 30 days            # 30 days
#  Job Retention = 6 months            # six months
#  AutoPrune = yes                     # Prune expired Jobs/Files
#}


# Definition of file storage device
Storage {
  Name = File
# Do not use "localhost" here    
  Address = localhost                # N.B. Use a fully qualified name here
  SDPort = 9103
  Password = "lalalalala"
  Device = FileStorage
  Media Type = File
}

Storage {
  Name = "SpectraLogic"
  Address = localhost
  SDPort = 9103
  Password = "linkedinmakethebestpasswords"
  Device = Drive-1
  Device = Drive-2
  Media Type = LTO5
  Autochanger = yes
}



# Generic catalog service
Catalog {
  Name = MyCatalog
# Uncomment the following line if you want the dbi driver
# dbdriver = "dbi:sqlite3"; dbaddress = 127.0.0.1; dbport =  
  dbname = "bacula"; DB Address = ""; dbuser = "bacula"; dbpassword = ""
}

# Reasonable message delivery -- send most everything to email address
#  and to the console
Messages {
  Name = Standard

  mailcommand = "/usr/lib/bacula/bsmtp -h localhost -f \"\(Bacula\) \<%r\>\" -s \"Bacula: %t %e of %c %l\" %r"
  operatorcommand = "/usr/lib/bacula/bsmtp -h localhost -f \"\(Bacula\) \<%r\>\" -s \"Bacula: Intervention needed for %j\" %r"
  mail = root@localhost = all, !skipped            
  operator = root@localhost = mount
  console = all, !skipped, !saved
#
# WARNING! the following will create a file that you must cycle from
#          time to time as it will grow indefinitely. However, it will
#          also keep all your messages if they scroll off the console.
#
  append = "/var/lib/bacula/log" = all, !skipped
  catalog = all
}


#
# Message delivery for daemon messages (no job).
Messages {
  Name = Daemon
  mailcommand = "/usr/lib/bacula/bsmtp -h localhost -f \"\(Bacula\) \<%r\>\" -s \"Bacula daemon message\" %r"
  mail = root@localhost = all, !skipped            
  console = all, !skipped, !saved
  append = "/var/lib/bacula/log" = all, !skipped
}

# Default pool definition
Pool {
  Name = Default
  Pool Type = Backup
  Recycle = yes                       # Bacula can automatically recycle Volumes
  AutoPrune = yes                     # Prune expired volumes
  Volume Retention = 365 days         # one year
}

# File Pool definition
Pool {
  Name = File
  Pool Type = Backup
  Recycle = yes                       # Bacula can automatically recycle Volumes
  AutoPrune = yes                     # Prune expired volumes
  Volume Retention = 365 days         # one year
  Maximum Volume Bytes = 50G          # Limit Volume size to something reasonable
  Maximum Volumes = 100               # Limit number of Volumes in Pool
}

Pool {
  Name = AllTapes
  Pool Type = Backup
  Recycle = yes
  AutoPrune = yes                     # Prune expired volumes
  Volume Retention = 31 days         # one Moth
}

# Scratch pool definition
Pool {
  Name = Scratch
  Pool Type = Backup
}

#
# Restricted console used by tray-monitor to get the status of the director
#
Console {
  Name = backuphost-1-mon
  Password = "LastFMalsostorePasswordsLikeThis"
  CommandACL = status, .status
}

bacula-sd.conf

#
# Default Bacula Storage Daemon Configuration file
#

Storage {                             # definition of myself
  Name = backuphost-1-sd
  SDPort = 9103                  # Director's port      
  WorkingDirectory = "/var/lib/bacula"
  Pid Directory = "/var/run/bacula"
  Maximum Concurrent Jobs = 20
  SDAddress = 0.0.0.0
#  SDAddress = 127.0.0.1
}

#
# List Directors who are permitted to contact Storage daemon
#
Director {
  Name = backuphost-1-dir
  Password = "passwordslinplaintext"
}

#
# Restricted Director, used by tray-monitor to get the
#   status of the storage daemon
#
Director {
  Name = backuphost-1-mon
  Password = "totalinsecurityabound"
  Monitor = yes
}


Device {
  Name = FileStorage
  Media Type = File
  Archive Device = /srv/bacula/archive
  LabelMedia = yes;                   # lets Bacula label unlabeled media
  Random Access = Yes;
  AutomaticMount = yes;               # when device opened, read it
  RemovableMedia = no;
  AlwaysOpen = no;
}


Autochanger {
   Name = SpectraLogic
   Device = Drive-1
   Device = Drive-2
   Changer Command = "/etc/bacula/scripts/mtx-changer %c %o %S %a %d"
   Changer Device = /dev/sg4
}

Device {
   Name = Drive-1
   Drive Index = 0
   Archive Device = /dev/nst0
   Changer Device = /dev/sg4
   Media Type = LTO5
   AutoChanger = yes
   RemovableMedia = yes;
   AutomaticMount = yes;
   AlwaysOpen = yes;
   RandomAccess = no;
   LabelMedia = yes

}

Device {
   Name = Drive-2
   Drive Index = 1
   Archive Device = /dev/nst1
   Changer Device = /dev/sg4
   Media Type = LTO5
   AutoChanger = yes
   RemovableMedia = yes;
   AutomaticMount = yes;
   AlwaysOpen = yes;
   RandomAccess = no;
   LabelMedia = yes
}

# 
# Send all messages to the Director, 
# mount messages also are sent to the email address
#
Messages {
  Name = Standard
  director = backuphost-1-dir = all
}

bacula-fd.conf

#
# Default  Bacula File Daemon Configuration file
#

#
# List Directors who are permitted to contact this File daemon
#
Director {
  Name = backuphost-1-dir
  Password = "hahahahahaha"
}

#
# Restricted Director, used by tray-monitor to get the
#   status of the file daemon
#
Director {
  Name = backuphost-1-mon
  Password = "hohohohohho"
  Monitor = yes
}

#
# "Global" File daemon configuration specifications
#
FileDaemon {                          # this is me
  Name = backuphost-1-fd
  FDport = 9102                  # where we listen for the director
  WorkingDirectory = /var/lib/bacula
  Pid Directory = /var/run/bacula
  Maximum Concurrent Jobs = 20
  #FDAddress = 127.0.0.1
  FDAddress = 0.0.0.0
}

# Send all messages except skipped files back to Director
Messages {
  Name = Standard
  director = backuphost-1-dir = all, !skipped, !restored
}
Tom O'Connor
  • 27,440
  • 10
  • 72
  • 148
  • To clarify - are you trying to have a single job writing to both slots in the autochanger? I don't think you can do that, but I *do* think that multiple jobs will write to separate devices... – voretaq7 Jun 08 '12 at 18:55
  • 1
    Commvault allows a single job to write to both slots. I don't think it's too unfair to expect other software to do the same. – Tom O'Connor Jun 08 '12 at 23:09
  • 1
    If what you're trying to do is prove performance, why not just run 2 jobs at the same time to different devices, it'll prove what you want. – EightBitTony Jun 09 '12 at 00:14
  • I'd like to replicate as closely to what CommVault does, in order to get some decent comparison. Two jobs is not that. – Tom O'Connor Jun 09 '12 at 10:42
  • 1
    Just because Commvault logs it under one job, doesn't mean its not separating it out. For instance Syncsort backup express does this by mountpoint, and will typically exhaust all given mount points to separate threads (in the same job) before it spans a single job across two tapes. I think there are some good reasons for this... don't ask me what they are :) – SpacemanSpiff Jun 11 '12 at 03:23
  • Oh and in Syncsort it was not a global setting, but a per-job option to allow a single mount point to span two tapes I believe. – SpacemanSpiff Jun 11 '12 at 03:28
  • Are you first backing up to fast disk storage and then having cv copying to tapes? Ideally, you would first backup to fast disk storage, and then spin off copies to tape so you could have two streams writing to two drives on the same job at once.. since the disk storage is the source... If not, your copies will be using one drive as the source and the other for copies which would grind things to a halt... I may just be confused but thought I would ask. – InChargeOfIT Jun 12 '12 at 04:16
  • Nope, straight from fast disks to fast tapes. Tried spooling to the local disk array, but that slowed things down 5-7x roughly. Officially out of ideas. – Tom O'Connor Jun 12 '12 at 08:12
  • I would advise backing up to disk storage first as that will always be the fastest... then let your copy jobs spin to tape so you can have two streams running at once... else all of your jobs will queue up while one drive is used as the source... tape to tape copies will be slower as well – InChargeOfIT Jun 13 '12 at 21:28
  • Whut? No way dude.. We've got a HUGE NAS array, like 60 spindles, and 2 NAS "heads". That thing can support concurrent streams. Writing to a local spool disk, as I've already said, slows things down 5-7x. – Tom O'Connor Jun 14 '12 at 06:05
  • 1
    I'm considering abandoning this question, and VTC Too Localised. – Tom O'Connor Jun 14 '12 at 17:50

2 Answers2

1

When you setup a fileset in bacula, it will literally read the pathspec line-by-line and back up like this.

It wont create two threads to read the different file paths in the agent.

As @SpacemanSpiff said, if you wanted to to do this, the way forward would be to setup different jobs, one for each filespec you wanted to backup.

Matthew Ife
  • 22,927
  • 2
  • 54
  • 71
0

I've got three tips for you:

  • Use multiple storage daemons. You can run multiple storage daemons on different ports on the same machine.
  • Use base jobs for de-duplication. Saves on time and space.
  • Use compression - if your tape drives do compression, well and good, but you may need to weigh it against and experiment with bacula-fd compression. That happens on the client and consequently saves on bandwidth too for a small CPU time sacrifice.
nearora
  • 445
  • 2
  • 8