1

I have a Bacula setup with 9 clients, and it's working happily. Today I had to add another client, so I went and copied+adapted the existing configuration files from another client, but when I schedule a job for the new client, I get these errors:

20-Mar 17:50 tools-dir JobId 39: Start Backup JobId 39, Job=BackupPresenze2.2012-03-20_17.50.49_04
20-Mar 17:50 tools-dir JobId 39: Using Device "FileStorage"
20-Mar 17:50 presenze2-fd JobId 39: Fatal error: Failed to connect to Storage daemon: bacula.mylan.local:9103
20-Mar 17:50 tools-dir JobId 39: Fatal error: Bad response to Storage command: wanted 2000 OK storage
, got 2902 Bad storage

From the client I can telnet to bacula.mylan.local:9103 just fine, and jobs for other clients work successfully... What could I check?
(Server and client run Ubuntu 10.04, if it's relevant)

Joril

3 Answers

4

It looks like it was a "slow DNS" kind of problem... I added the hostname to /etc/hosts and now Bacula works happily.
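To check whether name resolution really is the slow part, you can time the lookup on the client. Below is a minimal sketch, assuming Python 3 is installed on the client; `time_lookup` is a hypothetical helper name and not part of Bacula. It goes through the system resolver, which consults /etc/hosts, whereas nslookup queries the DNS server directly and ignores /etc/hosts.

```python
import socket
import time


def time_lookup(host, port=9103):
    """Resolve host through the system resolver (which honours /etc/hosts).

    Returns (elapsed_seconds, sorted list of resolved addresses).
    Raises socket.gaierror if the name cannot be resolved at all.
    """
    start = time.monotonic()
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    elapsed = time.monotonic() - start
    # info[4] is the socket address tuple; its first element is the IP.
    addresses = sorted({info[4][0] for info in infos})
    return elapsed, addresses


# Example, run on the client (hostname taken from the question):
#   elapsed, addresses = time_lookup("bacula.mylan.local")
#   print(f"resolved to {addresses} in {elapsed:.2f}s")
```

If the lookup takes several seconds before the /etc/hosts entry is added and returns almost instantly afterwards, DNS is a likely culprit.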

Joril
  • 1
    Thanks for replying back when you fixed it. Helped me out :) – user3227965 May 19 '14 at 10:36
  • 1
    In my case the file daemon machine used an external DNS server that doesn't know about the internal DNS zone (and therefore about the storage daemon). nslookup returned "NXDOMAIN" when I tried to verify the IP address. After adding the storage daemon's IP/FQDN pair to /etc/hosts on the file daemon machine, nslookup still returns "NXDOMAIN", but ping works and so does Bacula. Thanks to Joril's answer above, which gave me the clue to fix my problem. –  Sep 25 '14 at 05:57
1
[rt@bacula user]# netstat -anp | grep bac
tcp        0      0 127.0.0.1:9101          0.0.0.0:*               LISTEN      48075/bacula-dir
tcp        0      0 0.0.0.0:9102            0.0.0.0:*               LISTEN      48077/bacula-fd
tcp        0      0 10.x.y.z:9103           0.0.0.0:*               LISTEN      48076/bacula-sd #<---

Storage {                             # definition of myself
  Name = bacula-sd
  SDPort = 9103                  # Director's port
  WorkingDirectory = "/var/spool/bacula"
  Pid Directory = "/var/run"
  Maximum Concurrent Jobs = 20
# SDAddress = {{ ansible_fqdn }}
}

These "Bad storage" errors started showing up after I added another interface to the storage system. On investigation, the storage daemon turned out to be listening on only one interface, as the netstat output above shows. Commenting out SDAddress makes the SD listen on all interfaces.

1

In my case it was a firewall: the FD (client) could not connect to the SD (storage daemon) on port tcp/9103. The director, however, could still retrieve the client status with status client=myclient-fd, since the DIR could connect to the FD on port tcp/9102.

The error is confusing because at a casual glance it sounds like the SD is returning an error to the FD (which would imply that the FD could connect to the SD after all), while in reality it is the DIR noticing that the FD did not connect to the SD and issuing the error.
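A quick way to test that path is to attempt the same TCP connection from the client machine itself. This is a rough sketch, assuming Python 3 on the client; `can_connect` is a hypothetical helper, not a Bacula tool, and it only checks that the port accepts a TCP connection, not that Bacula authentication succeeds.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Example, run on the FD machine (hostname taken from the question):
#   print(can_connect("bacula.mylan.local", 9103))
```

A result of False from the client while the same check succeeds from another host points at a firewall or at the SD not listening on the interface the client reaches.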

Matija Nalis