Questions tagged [gridengine]

Grid Engine is a distributed resource management (DRM) system that manages the distribution of users' workloads to available compute resources.


73 questions
3 votes, 1 answer

Trying to install Sun Grid Engine on Ubuntu 10.04 - can't connect more execution hosts

I'm using Ubuntu 10.04 and trying to install Sun Grid Engine from the Ubuntu repository. It works on a single machine; I can submit jobs etc. But I can't make it work with any other machine. I added another execution host and installed…
klew
2 votes, 0 answers

Sun Grid Engine (SGE) / limiting simultaneous array job sub-tasks

I am installing a Sun Grid Engine environment and I have a scheduler limit that I can't quite figure out how to implement. My users will create array jobs that have hundreds of sub-tasks. I would like to be able to limit those jobs to only running…
wfaulk
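The limit the asker wants can be modeled as a cap on how many sub-tasks of one array job run at the same time. The Python sketch below is a hypothetical simulation of that policy, not an SGE feature or setting: it assumes sub-tasks are released in submission order and that a new one starts as soon as any running one finishes (greedy list scheduling). The function name and the task counts are illustrative.

```python
import heapq


def throttled_start_times(durations, max_running):
    """Start time of each array sub-task when at most `max_running` run at once.

    Hypothetical model: tasks are released in submission order, and a new task
    starts the moment any running task finishes (greedy list scheduling).
    """
    if max_running < 1:
        raise ValueError("max_running must be at least 1")
    running = []  # min-heap of finish times of the tasks currently running
    starts = []
    for duration in durations:
        if len(running) < max_running:
            start = 0.0  # a free slot exists at time zero
        else:
            start = heapq.heappop(running)  # wait for the earliest finisher
        starts.append(start)
        heapq.heappush(running, start + duration)
    return starts


if __name__ == "__main__":
    # 300 equal-length (10-minute) sub-tasks, at most 20 running at a time.
    starts = throttled_start_times([10] * 300, max_running=20)
    print(max(starts) + 10)  # finish time of the last task, in minutes
```

With equal-length tasks the job simply runs in waves of at most 20; with mixed durations the heap lets a waiting task take over whichever slot frees up first.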
2 votes, 1 answer

Set up SGE to Fill Each Node Completely Rather than Distribute Jobs

Originally posted on Stack Overflow by mistake... See PS at bottom for response from that post. I've searched for this for a while, but cannot find the answer. The problem I have is this: assume I have an SGE setup with two 12-CPU machines. I have two…
Andrew
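The difference between the two behaviors can be shown with a small placement simulation. In this hypothetical Python sketch (host names, job sizes, and the policy labels "fill" and "spread" are illustrative, not SGE configuration values), "fill" puts each job on the host with the fewest free slots that can still hold it, while "spread" picks the host with the most free slots. In SGE itself this is governed by scheduler and parallel-environment settings (for example, a PE's allocation_rule can be $fill_up or $round_robin); the code only models the resulting placement.

```python
def place(job_slots, free_slots, policy):
    """Assign each job (given as a slot count) to a host.

    free_slots: dict of host name -> free slots (hypothetical hosts).
    policy: "fill" packs hosts one at a time; "spread" balances across hosts.
    Ties are broken by host name so the result is deterministic.
    """
    free = dict(free_slots)  # copy so the caller's dict is not modified
    placement = []
    for need in job_slots:
        candidates = [h for h, n in free.items() if n >= need]
        if not candidates:
            raise RuntimeError(f"no host has {need} free slots")
        if policy == "fill":
            host = min(candidates, key=lambda h: (free[h], h))
        elif policy == "spread":
            host = min(candidates, key=lambda h: (-free[h], h))
        else:
            raise ValueError(f"unknown policy: {policy!r}")
        free[host] -= need
        placement.append(host)
    return placement


if __name__ == "__main__":
    hosts = {"node1": 12, "node2": 12}  # two 12-CPU machines
    print(place([6, 6], hosts, "fill"))    # both jobs land on node1
    print(place([6, 6], hosts, "spread"))  # one job per node
```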
2 votes, 1 answer

Why is there a concept of slots in SGE?

According to the SGE 5.3 manual, slots are "the number of jobs which may be executed concurrently in that queue". I am new to these concepts and want to start by understanding them one by one. Suppose, for example, that RAM is 10G and there are 10 slots, and hence 1G…
GP92
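The 1G-per-slot figure in the question follows from assuming memory is split evenly across slots. The quoted manual definition, however, makes a slot a count of concurrent jobs, so in this reading the slot count alone does not cap how much memory those jobs use. A minimal Python sketch of both points, with illustrative function names and job sizes:

```python
def per_slot_memory_gb(total_ram_gb, slots):
    """Memory per slot under the even-split assumption from the question."""
    if slots < 1:
        raise ValueError("slots must be at least 1")
    return total_ram_gb / slots


def oversubscribed(job_mem_gb, total_ram_gb, slots):
    """True if the jobs filling the first `slots` slots need more RAM than the host has.

    Models slots purely as a limit on concurrent jobs, per the manual quote.
    """
    running = job_mem_gb[:slots]
    return sum(running) > total_ram_gb


if __name__ == "__main__":
    print(per_slot_memory_gb(10, 10))        # even split: 1 GB per slot
    print(oversubscribed([1] * 10, 10, 10))  # 10 jobs x 1 GB: fits in 10 GB
    print(oversubscribed([2] * 10, 10, 10))  # 10 jobs x 2 GB: exceeds 10 GB
```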
2 votes, 0 answers

Programmatically add EC2 execute nodes to Grid Engine cluster

I am running Grid Scheduler (fka Sun Grid Engine) on Amazon Web Services. The master node runs all the time, but I want to programmatically add nodes to the cluster (and also remove them, but removal is not a problem). I launch an instance from existing…
Felix
2 votes, 1 answer

Prevent users running processes on cluster head node

What ways are there to prevent users from starting long-running, resource-intensive processes on the head node of a Rocks cluster? I've tried: asking politely; setting the nice level in limits.conf to 19, which didn't have the expected effect. Processes…
pufferfish
2 votes, 1 answer

qsub: How can I find out what DRM middleware exactly is installed on a cluster?

I have a user account on a very big cluster. I have previous experience with Grid Engine and want to use the cluster for array jobs. The documentation tells me to use "qsub" for load balancing / submission of many jobs. Therefore I assumed this…
user116990
2 votes, 1 answer

Using ionice Over Cluster

Background: I use a computing cluster at work (4 slave nodes and 1 head node) that uses the SGE job scheduler. Recently we've been running jobs that do some heavy IO and it has been slowing down shell/vim usage (small IO, but we need it running…
sequenceGeek
2 votes, 1 answer

Is there a way to tell SGE to run specific jobs as root on the execution node?

The title kinda says it all... We're using SGE/OGE to submit jobs to a set of worker nodes that then do things with specific pieces of equipment. The programs and scripts that have been created that manipulate this equipment rely on running as…
Rick Reynolds
2 votes, 1 answer

SGE - limit a user to a certain host, using resource quota configuration

Is it possible to limit a user to a particular host, using the Resource Quota Configuration option in qmon for Sun Grid Engine? I'm thinking of a line to the effect of: { ... limit users {john} to hostname=compute-1-1.local } The documentation…
pufferfish
2 votes, 1 answer

How can one run a prologue script as root in gridengine?

In one of our compute clusters, we have systems with unique hardware resources to which access is controlled by device-file permissions. Each node has two or four of these, and multiple CPU cores. We'd like to be able to schedule different users'…
mattdm
2 votes, 4 answers

Best way to monitor a Grid of computers?

I've installed Sun Grid Engine on 10 nodes, and one virtual master host. Now I have to monitor all the resources prior to launching it into production, but I don't know which is the best way. I've tried using xml-qstat, but it seems unstable. Any…
Marc Riera
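One lightweight alternative to a full web front end is to poll qstat's XML output and summarize it yourself. The Python sketch below parses a made-up sample document; the element names (job_list with a state attribute, queue_name, slots) are assumed to match typical Grid Engine 6.x qstat -xml output and should be checked against your installation. On a live system the text could come from running qstat -xml -u "*" through subprocess.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up sample in the shape qstat -xml is assumed to produce.
SAMPLE = """<job_info>
  <queue_info>
    <job_list state="running">
      <JB_job_number>101</JB_job_number>
      <JB_name>align</JB_name>
      <JB_owner>alice</JB_owner>
      <state>r</state>
      <queue_name>all.q@node1</queue_name>
      <slots>4</slots>
    </job_list>
  </queue_info>
  <job_info>
    <job_list state="pending">
      <JB_job_number>102</JB_job_number>
      <JB_name>sort</JB_name>
      <JB_owner>bob</JB_owner>
      <state>qw</state>
      <slots>1</slots>
    </job_list>
  </job_info>
</job_info>"""


def summarize(xml_text):
    """Count jobs per state and running slots per host from qstat-style XML."""
    root = ET.fromstring(xml_text)
    jobs_by_state = Counter()
    slots_by_host = Counter()
    for job in root.iter("job_list"):  # running and pending lists alike
        state = job.get("state", "unknown")
        jobs_by_state[state] += 1
        if state == "running":
            queue = job.findtext("queue_name", default="")
            host = queue.split("@", 1)[-1]  # "all.q@node1" -> "node1"
            slots_by_host[host] += int(job.findtext("slots", default="0"))
    return jobs_by_state, slots_by_host


if __name__ == "__main__":
    states, slots = summarize(SAMPLE)
    print(dict(states))
    print(dict(slots))
```

Run periodically, the two counters give a quick view of queue backlog and per-node slot usage without depending on an extra web service.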
1 vote, 1 answer

Trying to get qsub to work on my cluster

Trying to get qsub to work on my cluster (a single node right now, but more are coming). So far, submitting with qsub has returned the error: commlib error: got select error (Connection refused) Unable to run job: unable to send message to qmaster…
OMRY VOLK
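At the TCP level, "Connection refused" means nothing accepted the connection at that host and port (or a firewall rejected it), so a first check is whether the qmaster port is reachable at all. This Python sketch assumes the qmaster listens on the port given in the SGE_QMASTER_PORT environment variable, falling back to 6444 (the commonly registered sge_qmaster port); the host name is a placeholder you would replace, for example with the name typically stored in $SGE_ROOT/$SGE_CELL/common/act_qmaster.

```python
import os
import socket


def qmaster_port():
    """Port to test: $SGE_QMASTER_PORT if set, else 6444 (assumed default)."""
    return int(os.environ.get("SGE_QMASTER_PORT", "6444"))


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` s."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, unknown host...
        return False


if __name__ == "__main__":
    master = "localhost"  # hypothetical: replace with the qmaster host name
    port = qmaster_port()
    print(f"{master}:{port} reachable: {port_open(master, port)}")
```

If the port is closed on the master itself, the qmaster daemon is probably not running or is bound elsewhere; if it is open locally but not from the submit host, a firewall or name-resolution problem is more likely.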
1 vote, 0 answers

SGE qsub concatenate different requests for different hosts

I wonder whether it is possible to concatenate multiple requests for different hosts in a single SGE qsub command. For example, I tried this: qsub -l h="compute-0-[0-9]" -pe smp 6 -l h="compute-0-2[0-9]" -pe smp 4 However, SGE will ignore the first -l and -pe…
Youngman
1 vote, 1 answer

OGE no value for load_avg

There is a problem with my OGE configuration. The load_avg for the nodes does not get set (remains at -NA-). Because of this and because of the np_load_avg threshold on the queue no jobs are being run. [ce@node1 ce]$ qhost -F -l h=node2 HOSTNAME …
Adversus
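The interaction described in the question can be sketched as follows. Grid Engine's np_load_avg is the load average divided by the number of processors; this hypothetical Python model treats a load reported as -NA- as unknown and, mirroring what the asker observes, counts an unknown value as over the queue's threshold. Function names and the 1.75 threshold are illustrative, and the comparison is a simplification of the real scheduler logic.

```python
def parse_load(value):
    """Convert a qhost-style load field to float; '-NA-' becomes None."""
    value = value.strip()
    return None if value == "-NA-" else float(value)


def np_load_avg(load_avg, num_proc):
    """Load average normalized by processor count, or None if unknown."""
    if load_avg is None or num_proc < 1:
        return None
    return load_avg / num_proc


def over_threshold(load_avg, num_proc, threshold):
    """Model of an np_load_avg load threshold; an unknown load counts as over it."""
    normalized = np_load_avg(load_avg, num_proc)
    return normalized is None or normalized > threshold


if __name__ == "__main__":
    print(over_threshold(parse_load("-NA-"), 8, 1.75))  # unknown load: blocked
    print(over_threshold(parse_load("4.00"), 8, 1.75))  # 0.5 per CPU: allowed
```

Under this model, fixing the missing load report (so the execution daemon delivers a real load_avg) matters more than tuning the threshold, since no finite threshold admits an unknown value.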