
This is a very open question, since this is my first time creating a cluster. I'm wondering what types of security concerns there will be and how to prevent them.

Background information

Using SGE (currently installing it and figuring out which scheduler setup is best) on an internal cluster.

It will run PVM/MPI programs as well as Perl programs that use one of those, or perhaps just fork(), since the workloads are embarrassingly parallel. (If I remember correctly, SGE allows forking, but I read that a while ago, before I had gathered much more information; could someone comment on this?)

There will be an external node that connects to the cluster; this node will submit the jobs received from the Internet/server.

All users must submit their requests to run a job through the Internet to the server (I'm trying to think of ways to prevent them from bypassing this when they're connected locally).

Goals of this project:

Eventually allow people anywhere on the Internet to submit jobs, be notified when their program finishes, and then view the results, perhaps even downloading the data for offline viewing.

Unlikely but possible: maybe even allow users to upload their own programs to fine-tune their data when our program is insufficient.

Kamil Kisiel
  • Does locally connect mean that they are sitting at a node of the cluster? – 3dinfluence Feb 26 '10 at 20:56
  • No, but people can ssh into the nodes to perform maintenance. I'm just worried that some of those people will try to run their programs through that. The hardware is specialized as in they are dedicated to running programs and rarely have user interactions. –  Mar 01 '10 at 13:40
  • I run a cluster that uses Maui+Torque. Torque has a module for the prologue and epilogue (http://www.clustersinc.com/products/torque/docs/3.4hostsecurity.shtml) that will not allow unauthorized users to access the node, and kill completed jobs. I think you can adapt this to your SGE cluster if needed. – ryanlim Jul 16 '10 at 02:16

2 Answers


A simple way to prevent people from submitting jobs locally (from compute nodes) or via remote shell sessions is to forbid SSH logins for regular users on the compute and I/O nodes; there are a couple of how-tos on doing this without breaking SGE. It goes without saying that this also lets you control which host acts as the submission machine.
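As a sketch of that approach, an `sshd_config` fragment on each compute/I/O node might restrict interactive logins to administrators (the group names here are assumptions; SGE's own `sge_execd` daemon starts jobs without needing SSH, though `qrsh` may need its rsh/ssh wrappers configured accordingly):

```
# /etc/ssh/sshd_config on compute and I/O nodes (illustrative):
# only root and the cluster admin group may log in interactively;
# everyone else reaches the nodes solely through SGE-started jobs.
AllowGroups root clusteradmin
```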

The primary hardening work has to be done on the login/portal node, with properly documented and well-defined interfaces for people to do whatever they are intended to do there. There is tooling like the Globus Toolkit's GridFTP for transfers over SSH, or with a full-blown PKI for that matter.

Alternatively, you could prepare a web portal that uses the DRMAA API to submit jobs from Python, Ruby, Java, etc., and that offers ways and means to upload and download programs or data.
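As a hedged sketch of that idea, a portal back end could collect job parameters from a web request and hand them to the Python `drmaa` binding (the helper names `build_job_spec` and `submit` are illustrative, and the `drmaa` module requires SGE's `libdrmaa` on the submit host):

```python
def build_job_spec(script_path, args, out_dir):
    """Collect the fields a DRMAA JobTemplate needs from a web request."""
    return {
        "remoteCommand": script_path,
        "args": list(args),
        # DRMAA path attributes take the form "[hostname]:path";
        # a leading ":" means "on the submit host's filesystem".
        "outputPath": ":" + out_dir + "/$JOB_ID.out",
        "errorPath": ":" + out_dir + "/$JOB_ID.err",
    }

def submit(spec):
    """Hand a job spec to SGE through the python-drmaa binding."""
    import drmaa  # needs libdrmaa from the SGE installation
    with drmaa.Session() as s:
        jt = s.createJobTemplate()
        jt.remoteCommand = spec["remoteCommand"]
        jt.args = spec["args"]
        jt.outputPath = spec["outputPath"]
        jt.errorPath = spec["errorPath"]
        job_id = s.runJob(jt)
        s.deleteJobTemplate(jt)
        return job_id  # the portal can poll this ID and notify the user
```

The portal would then poll the returned job ID (e.g. via `Session.jobStatus`) to drive the "notify when finished" part of the question.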

Usually security is not a big concern for most HPC installations, and the usual UNIX multi-user security principles apply in full. The distributed resource manager even helps protect you from resource abuse and the like.

For the data-viewing part of the problem: I usually set up a couple of desktop nodes that are reserved for interactive work such as development and debugging. Most of the time they also contain GPUs, and I configure TurboVNC + VirtualGL so people can look at their data locally before they start lengthy transfers to other storage and/or their desktops (they submit VNC desktop sessions to SGE). This helps them stay local to the cluster, and VNC, when set up properly, gives a very responsive experience with hardware-accelerated 3D visualization even over WAN-type links. You can also embed a (slower) VNC viewer into your web portal.

pfo

We wrote a housekeeping script that runs twice an hour and kills all processes that SGE doesn't know about. This works well, and it also cleans out processes that have for some reason been left running on the nodes.
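A minimal sketch of such a housekeeping check, assuming the node's process list comes from `ps` and the list of legitimate users is parsed from SGE's `qstat` output (the function names and the idea of keying on the job owner, rather than on PIDs, are assumptions about how the original script worked):

```python
import os
import signal

def rogue_pids(procs, sge_users):
    """PIDs of user-owned processes whose owner has no SGE job on this node.

    procs:     iterable of (pid, username) pairs for non-system processes,
               e.g. gathered from `ps -eo pid,user --no-headers`.
    sge_users: usernames SGE reports as running jobs on this node,
               e.g. parsed from `qstat -s r` output filtered to this host.
    """
    return [pid for pid, user in procs if user not in sge_users]

def kill_rogues(procs, sge_users):
    """Send SIGKILL to every process that rogue_pids() flags."""
    for pid in rogue_pids(procs, sge_users):
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the process exited on its own in the meantime
```

Run from cron twice an hour on each compute node, this both blocks out-of-band jobs and reaps leftovers from completed ones.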

Jimmy Hedman