I'm setting up a computer cluster of 20+ machines. I have a working central LDAP server that authenticates users and keeps UIDs and GIDs in sync across the cluster. One machine acts as a head node and is exposed to the wider network. Users SSH into it and can then SSH on to the other machines. This all works fine.
When a user logs in for the first time to the head node, PAM creates a homedir for them.
When this happens, I need a homedir to be created on all the other machines as well.
Possible solutions I'm considering:
- Have a script that does this, triggered when a user first logs into the head node. I'm not sure of the most elegant way to do this, other than a .sh script with 20+ ssh commands
- Have a cron job do the same as above at a frequent interval
- Configure some sort of PAM voodoo to do it
- Have a cron job create a homedir for every user in the LDAP directory (I don't want to do this; I only want homedirs for users who have logged into the head node)
- Something else (suggestions welcome)
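For the first and third options, here is a minimal sketch of a hook the head node could run through `pam_exec`, for example via a line like `session optional pam_exec.so /usr/local/sbin/mirror_home.py` in `/etc/pam.d/sshd`, placed after the `pam_mkhomedir` line. It assumes root on the head node has passwordless SSH to every node, that the nodes follow a hypothetical `node01`..`node20` naming scheme, and that GNU `install` exists on the nodes; the script name, path, and node list are all hypothetical. The remote command is `test -d ... || install -d ...`, so it is idempotent and harmless on repeat logins. All ssh sessions run in parallel rather than as 20+ sequential commands.

```python
#!/usr/bin/env python3
"""Hypothetical pam_exec hook: create the user's homedir on every compute node."""
import os
import pwd
import shlex
import subprocess
import sys

# Hypothetical node names; replace with the cluster's real hostnames.
NODES = [f"node{i:02d}" for i in range(1, 21)]


def remote_mkhome_argv(host, user, home, gid, mode="0700"):
    """Build an ssh argv that creates `home` on `host` only if it is missing."""
    h = shlex.quote(home)
    remote = (f"test -d {h} || install -d -m {mode} "
              f"-o {shlex.quote(user)} -g {int(gid)} {h}")
    # BatchMode stops ssh from prompting for a password inside a login hook.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, remote]


def main():
    # pam_exec exports PAM_TYPE and PAM_USER to the hook's environment.
    if os.environ.get("PAM_TYPE") != "open_session":
        return 0
    user = os.environ.get("PAM_USER")
    if not user:
        return 0
    try:
        entry = pwd.getpwnam(user)  # resolved through LDAP via NSS
    except KeyError:
        return 0
    # Launch every ssh at once instead of one after another.
    procs = {
        host: subprocess.Popen(
            remote_mkhome_argv(host, user, entry.pw_dir, entry.pw_gid),
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        for host in NODES
    }
    for host, proc in procs.items():
        if proc.wait() != 0:
            print(f"mirror_home: failed on {host}", file=sys.stderr)
    # Always succeed so an unreachable node never blocks a login.
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Marking the PAM line `optional` and always returning 0 keeps a down node from locking users out; the trade-off is that a failed node only shows up in the log. The same script could also be run from cron for the second option, looping over the homedirs that already exist on the head node instead of reading `PAM_USER`.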
At the moment, users have to SSH into every node to get their homedir created there.
A rough analogy of what we are running, and how I set it up, can be found in these Dockerfiles: https://github.com/dooglz/slurm_docker/blob/master/slurm/ldap_host.dockerfile https://github.com/dooglz/slurm_docker/blob/master/slurm/slurm_node.dockerfile
The cluster actually runs entirely in Docker, on 20+ bare-metal Ubuntu 18 servers. This lets me change configurations easily.
Why: We are running the SLURM job scheduler, which runs batch jobs on the cluster as the user's UID. If the homedir doesn't exist yet (i.e. the job is running on a node the user hasn't SSH'd into), we get errors. Users can change the default job directory, but I'd rather not make them do this.
Suggestions and comments welcome. Thanks
PS
Mounting /home over NFS is a possibility, but due to the nature of the use case, /home needs to read from and write to the fast local disks.
EDIT: I now have a solution, but it only fits my exact workflow. Using the SLURM Prolog, which runs as root on a node just before a job runs there, I mkdir the user's home. This works for me, but it doesn't answer how I would do this otherwise, or with PAM.
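A minimal Python sketch of that Prolog step, assuming `slurm.conf` points `Prolog=` at it and that SLURM exposes the job owner as `SLURM_JOB_USER` in the Prolog environment (worth checking against the prolog/epilog documentation for the SLURM version in use). The script and its name are hypothetical. Unlike `pam_mkhomedir`, it does not copy `/etc/skel`. In this Docker setup it would have to run inside the container where `slurmd` and the local `/home` live.

```python
#!/usr/bin/env python3
"""Hypothetical SLURM Prolog: make sure the job owner's homedir exists."""
import os
import pwd
import sys


def ensure_home(path, uid, gid, mode=0o700):
    """Create `path` owned by uid:gid; return True if it was created."""
    try:
        os.mkdir(path, mode)
    except FileExistsError:
        return False
    os.chown(path, uid, gid)
    os.chmod(path, mode)  # mkdir's mode argument is filtered by the umask
    return True


def main():
    # Assumed: SLURM exports the job owner's name to the Prolog environment.
    user = os.environ.get("SLURM_JOB_USER")
    if not user:
        return 0
    try:
        entry = pwd.getpwnam(user)  # resolved through LDAP via NSS
        ensure_home(entry.pw_dir, entry.pw_uid, entry.pw_gid)
    except (KeyError, OSError) as exc:
        print(f"mkhome prolog: {exc}", file=sys.stderr)
    # A non-zero Prolog exit drains the node, so log errors but exit 0.
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Exiting 0 on error is a judgment call: the job then fails as it did before, rather than SLURM taking the whole node out of service.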