I have an environment like this:

  • master
  • some satellites assigned to master
  • many agents assigned to satellites and some assigned to master (without a satellite).

All systems are up and the PKI setup is complete. Most of the default checks (apt, disk, cpu) are running, and I can see their current state on the master. I have now started to implement custom checks, such as check_eth to monitor network traffic. I have deployed the script to all hosts and defined the following command on every host:

object CheckCommand "check_eth" {
  import "plugin-check-command"
  command = [ "/usr/bin/sudo", PluginDir + "/check_eth" ]
 
  arguments       = {
   "-w" = {
      value                     = "$eth_warning$"
      description               = "Percent free/used when to warn"
      required                  = true
    }
    "-c" = {
      value                     = "$eth_critical$"
      description               = "Percent free/used when critical"
      required                  = true
    }
    "-i" = {
      value                     = "$eth_interface$"
      description               = "Given network interface"
      required                  = true
    }
  }

  vars.eth_interface = "enp0s31f6"
  vars.eth_warning = "2048G"
  vars.eth_critical = "4096G"
}

I can run the script on all hosts. On the master, on the satellites, and on all hosts that are assigned directly to the master, the check result is visible. On all hosts whose parent is a satellite, the state is UNKNOWN. That is my problem: why?

The zone configuration on the master looks like this:

# master: /etc/icinga2/zones.conf

object Endpoint "monitor.domain" {
}

object Zone "master" {
  endpoints = [ "monitor.domain" ]
}

object Endpoint "satellite1.domain" {
    host = "<ip>"
    port = "<port>"
}

object Zone "satellite1.domain" {
    parent = "master"
    endpoints = [ "satellite1.domain" ]
}

The satellite configuration looks like this:

# master: /etc/icinga2/zones.d/satellite1.domain/hosts.conf

object Host "satellite1.domain" {
    import "generic-host"
    check_command = "hostalive"
    zone = "master"

    address = "<ipv4>"
    address6 = "<ipv6>"
    
    vars.agent_endpoint = name
    ...
}

object Host "agent1.domain" {
    import "generic-host"
    check_command = "hostalive"
    zone = "satellite1.domain"

    address = "<ipv4>"
    address6 = "<ipv6>"
    
    vars.agent_endpoint = name
    ...
}
...

The agent's zone and endpoint, which sit below the satellite, are also defined on the master:

# master: /etc/icinga2/zones.d/satellite1.domain/zones.conf
object Zone "agent1.domain" {
    parent = "satellite1.domain"
    endpoints = [ "agent1.domain" ]
}

object Endpoint "agent1.domain" {
    host = "<ip>"
    port = "<port>"
}

And here is how the command is applied to the hosts as a service (also defined on the master):

# master: /etc/icinga2/zones.d/satellite1.domain/services.conf

apply Service "Network Traffic" {
  import "generic-service"

  check_command = "check_eth"
  command_endpoint = host_name

  assign where host.name == "satellite1.domain"
}

apply Service "Network Traffic" {
  import "generic-service"

  check_command = "check_eth"
  command_endpoint = host_name

  assign where host.name == "agent1.domain"
}

What am I missing?

TRW

1 Answer

Ah, I found the problem. The CheckCommand definition contains a default value for eth_interface. That interface exists on the master and on the satellites, but the VMs use a different interface name. After removing the default vars from the CheckCommand and setting the variable on each Host object instead, everything works.
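
For illustration, a per-host assignment could look roughly like the sketch below. The interface name "ens3" is only a placeholder (an assumption, not taken from my real setup); use whatever interface actually exists on the VM. The thresholds can be set on the Host object in the same way.

# master: /etc/icinga2/zones.d/satellite1.domain/hosts.conf

object Host "agent1.domain" {
    import "generic-host"
    check_command = "hostalive"
    zone = "satellite1.domain"

    address = "<ipv4>"
    address6 = "<ipv6>"

    vars.agent_endpoint = name

    # Placeholder interface name: replace with the interface present on this host
    vars.eth_interface = "ens3"
    # Thresholds moved here from the CheckCommand defaults
    vars.eth_warning = "2048G"
    vars.eth_critical = "4096G"
}

The CheckCommand "check_eth" then keeps its arguments block but no longer carries the vars.eth_interface default, so a host that lacks the variable fails visibly instead of silently checking the wrong interface.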

TRW