On our supercomputer's management node we receive numerous errors such as:

pbs_server: LOG_ERROR::is_request, bad attempt to connect from 10.10.0.254:1023 (address not trusted - check entry in server_priv/nodes)

Nearly every minute they are followed by this one:

last message repeated 16 times

where the repeat count varies from time to time.

The address 10.10.0.254 is one of the management node's own addresses. According to "netstat -pa | grep 1023", port 1023 belongs to pbs_mom.
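To see which sources pbs_server is rejecting and how often, the log lines can be summarized with standard tools. This is only a rough sketch: the function name untrusted_sources is made up for illustration, and it assumes the messages look exactly like the one quoted above.

```shell
# Read pbs_server log lines on stdin and print each rejected "address:port"
# together with the number of matching lines, most frequent first.
# Assumes the message format quoted in the question.
untrusted_sources() {
    grep 'address not trusted' \
        | sed -n 's/.*bad attempt to connect from \([0-9.]*:[0-9]*\).*/\1/p' \
        | sort | uniq -c | sort -rn
}

# Example (the log path is an assumption, adjust to your syslog setup):
#   untrusted_sources < /var/log/messages
```

Note that syslog's "last message repeated N times" lines are not counted here, so the totals are a lower bound.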

So it seems that the management node tries to connect to itself several times per minute and is refused. The advice in the error text doesn't help much: as far as I understand, the management node should not be in the "nodes" file.

Could anybody suggest how to solve this problem?

2 Answers

Your management node is not defined as a node in PBS. Open qmgr and run "create node <hostname>", substituting the management node's hostname. The other option is to stop pbs_mom on it, since you probably don't want to run compute jobs on your head node.
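The two options could look roughly like this on a Torque installation. Treat it as a sketch: the function names and the hostname "mgmt01" are placeholders, and the commands need to be run with sufficient privileges on the management node.

```shell
# Option 1: register the head node with pbs_server so that its pbs_mom
# is accepted as a trusted source.
register_head_node() {
    # $1: hostname of the management node as pbs_server should know it
    qmgr -c "create node $1"
}

# Option 2: stop the pbs_mom running on this host so that it no longer
# tries to report to pbs_server.
stop_local_mom() {
    # momctl -s asks the local pbs_mom to shut down
    momctl -s
}

# Usage ("mgmt01" is a placeholder hostname):
#   register_head_node mgmt01 && pbsnodes -a    # then check the node list
#   stop_local_mom
```

If pbs_mom is started at boot by an init script or service unit, it would also have to be disabled there, otherwise the errors come back after a reboot.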

chuck

I faced this problem too. In my case the cause was that the compute nodes have multiple network interfaces (GE, IB), both of which can reach the admin node.

On the affected compute node, the admin node was defined on a separate subnet, one that is supposed to be reached through a different NIC.
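One way to check for this kind of mismatch is to compare the address from the error message with what the names in the nodes file resolve to. A minimal sketch, assuming a Linux host with getent; the function names and the file path in the usage line are illustrative only.

```shell
# Print the host names listed in a nodes file: the first field of each
# non-empty line, skipping lines that start with "#".
node_names() {
    awk 'NF && $1 !~ /^#/ { print $1 }' "$1"
}

# Print every listed name that resolves to the given address.
# $1: path to the nodes file, $2: address taken from the error message
names_for_addr() {
    for name in $(node_names "$1"); do
        if getent hosts "$name" \
            | awk -v a="$2" '$1 == a { found = 1 } END { exit !found }'; then
            echo "$name"
        fi
    done
}

# Usage (the path is an assumption, it depends on where Torque is installed):
#   names_for_addr /var/spool/torque/server_priv/nodes 10.10.0.254
```

If nothing is printed, no listed name resolves to the address pbs_server is seeing, which would fit a node reaching the server through an unexpected interface.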

Wei