
I'm setting up a computer cluster of 20+ machines. I have a working central LDAP server for authenticating users and keeping UIDs and GIDs synced across the cluster. One machine acts as a head node, which is exposed to the wider network; users ssh into this and from there can ssh into the other machines. This works fine. Users can also ssh into the other machines directly without issue.

When a user logs in to the head node for the first time, PAM creates a homedir for them.
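(For reference, the creation is handled by pam_mkhomedir; on Ubuntu the relevant line lives in `/etc/pam.d/common-session` and looks something like this:)

```
# /etc/pam.d/common-session -- creates the homedir on first login,
# seeded from /etc/skel
session optional pam_mkhomedir.so skel=/etc/skel umask=0022
```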

I need, when this happens, to simultaneously create a homedir on all the other machines as well.

I'm thinking of possible solutions:

  • Have a script that does this, triggered when a user first logs into the head node. I'm not sure of the most elegant way to do this, short of a .sh script with 20+ ssh commands (see the sketch after this list)
  • Have a cron job do the same as above, frequently
  • Configure some sort of PAM voodoo to do it
  • Have a cron job create a homedir for every user in the LDAP directory (I don't want to do this; I only want users that have logged into the head node)
  • Something else (suggestions welcome)
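For the first option, here is a rough sketch of what that script might look like, hooked in via pam_exec. The script path and node names are placeholders, and it assumes root ssh keys from the head node to every compute node:

```bash
#!/bin/bash
# /usr/local/sbin/mkhome-cluster.sh (hypothetical path)
# Hook it into the head node's PAM session stack with something like:
#   session optional pam_exec.so /usr/local/sbin/mkhome-cluster.sh
# pam_exec exports PAM_USER as the name of the user logging in.
user="${PAM_USER:?PAM_USER not set}"

# UIDs/GIDs come from LDAP and are synced, so resolving the home
# directory once on the head node gives the right path everywhere.
home="$(getent passwd "$user" | cut -d: -f6)"
[ -n "$home" ] || exit 0

# Placeholder node list -- substitute your real hostnames.
for node in node{01..20}; do
    ssh -o BatchMode=yes "root@$node" \
        "mkdir -p '$home' && chown '$user:' '$home' && chmod 700 '$home'" &
done
wait
```

One caveat: pam_exec runs synchronously, so first logins would stall while the ssh fan-out completes; backgrounding the whole script, or dropping a per-user "done" marker and skipping the loop on later logins, would keep logins fast.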

At the moment, users have to ssh into every node to create their homedirs.

A rough analogy to what we are running, and how I set this up, can be found in these Dockerfiles: https://github.com/dooglz/slurm_docker/blob/master/slurm/ldap_host.dockerfile https://github.com/dooglz/slurm_docker/blob/master/slurm/slurm_node.dockerfile

The cluster actually runs 100% in Docker, but on 20+ bare-metal Ubuntu 18 servers. This is so I can change configurations easily.

Why: we are running the SLURM job scheduler, which runs batch jobs on the cluster as the user's UID. If the homedir doesn't already exist (i.e. the job is running on a node that the user hasn't ssh'd into yet), we get errors. Users can change the default job directory, but I want to avoid having to do this.

Suggestions and comments welcome. Thanks

PS

Mounting /home over NFS is a possibility, but due to the nature of the use case, we need /home reads and writes to hit the fast local disks.


EDIT: I now have a solution, but it's only for my exact workflow. Using the SLURM prolog command, which runs as root on a node that's about to run a job, I mkdir the user's home. This works for me, but it hasn't solved how I would do this otherwise, or with PAM.

    Usually home directories are mounted by automount via nfs in such use cases. – Gerald Schneider May 16 '18 at 12:37
  • pam_mkhomedir is usually called from `/etc/pam.d/login`. You may have some luck with adding it to, for instance, `/etc/pam.d/common-session` on your compute nodes, which gets used for both interactive and non-interactive sessions of any kind, so the home directory gets created the moment SLURM starts a job for that particular user. – HBruijn May 16 '18 at 13:06
  • @HBruijn thanks for the idea. I tested with a pam_exec.so that logged to a file every time common-session and common-session-non-interactive fired; this fires whenever I log in with ssh, but not when I run a job. It seems SLURM doesn't pass through the usual PAM modules when it runs a job. I tested by putting the same script under all of the /pam.d/ configs, nada. – Dooglz May 16 '18 at 15:22
  • It seems that you can do [interesting PAM stuff](https://slurm.schedmd.com/faq.html#pam) by setting `UsePAM=1` in [`slurm.conf`](https://slurm.schedmd.com/slurm.conf.html) and setting up `/etc/pam.d/slurm` – HBruijn May 16 '18 at 15:32
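For reference, the pieces HBruijn's comments point at would look roughly like this (an untested sketch; see the linked SLURM FAQ and slurm.conf docs for details):

```
# /etc/pam.d/common-session on each compute node -- covers interactive
# and non-interactive sessions that actually go through PAM:
session optional pam_mkhomedir.so skel=/etc/skel umask=0022

# slurm.conf -- make slurmd open a PAM session when launching jobs:
UsePAM=1

# /etc/pam.d/slurm -- the stack slurmd then consults; creating the
# homedir here would catch jobs on nodes the user never ssh'd into:
session required pam_mkhomedir.so skel=/etc/skel umask=0022
```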

1 Answer


I now have a solution, but it's only for my exact workflow. Using the SLURM prolog command, which runs as root on a node that's about to run a job, I mkdir the user's home. This works for me, but it hasn't solved how I would do this otherwise, or with PAM.
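A minimal sketch of the prolog, assuming the standard `Prolog=` hook in slurm.conf (slurmd runs it as root on each allocated node before the job starts and exports SLURM_JOB_USER into its environment; the script path is hypothetical):

```bash
#!/bin/bash
# /usr/local/sbin/slurm_prolog.sh -- wired up in slurm.conf with:
#   Prolog=/usr/local/sbin/slurm_prolog.sh
# Runs as root on a node just before a job starts there.
user="$SLURM_JOB_USER"
home="$(getent passwd "$user" | cut -d: -f6)"
[ -n "$home" ] || exit 0   # unknown user; let SLURM surface the error

if [ ! -d "$home" ]; then
    mkdir -p "$home"
    cp -rT /etc/skel "$home"      # seed dotfiles from the skeleton dir
    chown -R "$user:" "$home"
    chmod 700 "$home"
fi

exit 0   # a non-zero exit would drain the node
```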
