I'm about to install our new cluster. I've installed the first node and used it as the golden image. As queuing software we use SGE (Sun Grid Engine).
After installing the first node, I tested job submission with qsub and reading queue statistics with qstat. Both worked as expected. However, after cloning the image to another node, SGE does not work there: I can't start the daemon, and if I run qstat -f, after a long wait I see the message:
"error: unable to send message to qmaster using port 535 on host "myHOST": got send timeout"
I'm not sure where this comes from, as /etc/services and the firewall settings are the same on both hosts.
Another thing is that the spool directory for the new node was not created (which is understandable).
Can somebody advise me how to install SGE using systemimager without unnecessary pain? I would rather not go through all the compute nodes and run ./install_execd on each one.