I set up an HDP 2.2 cluster successfully (1 NM, 3 DNs and 1 client). I created user accounts for accessing the cluster on the client node and verified that these users can submit jobs by SSHing to the client node and running sample jobs.

Next, I enabled Kerberos authentication and created user principals corresponding to the users on the client. Everything went as expected. I then SSHed to the client machine as one of those users, obtained a Kerberos ticket with kinit, and tried to run a sample job, but the submission failed with the message: user <user name> not found.

To run a job as a given user in a secure HDP cluster, do I have to create that user on every node in the cluster?

HopelessN00b

1 Answer

In short:

Yes, when running Hadoop with Kerberos, the authenticated user must exist in the passwd file (or an equivalent user directory such as LDAP) on every node where a TaskTracker (MRv1) or YARN NodeManager runs.

For MRv1, the TaskTracker launches a program called the task-controller before each task starts. The task-controller is a setuid root tool that allows the mapred user to change the runtime user of the task. Think of it as the equivalent of the TaskTracker doing:

sudo -u youruser /usr/bin/java yourtask

If the user cannot be found in the passwd file, the task-controller cannot complete the switch to that user, and the task fails.
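A quick way to see whether a node can resolve an account is to ask the system's own lookup, which covers both the local passwd file and any configured directory such as LDAP. The following is only a minimal sketch: the function names are mine, and it assumes you run it on each node that launches tasks.

```shell
#!/bin/sh
# Hypothetical helpers: check whether an account resolves on this node.
# `id` goes through the normal system lookup (local passwd file, LDAP, ...),
# so it succeeds only if the node can resolve the user name.
user_resolves() {
    id -u "$1" >/dev/null 2>&1
}

# Print a one-line verdict for the given user name.
check_user() {
    if user_resolves "$1"; then
        echo "$1: found"
    else
        echo "$1: NOT found"
    fi
}
```

Running check_user youruser on every worker node should show which nodes are missing the account.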

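YARN has a similar mechanism: on a secure cluster the NodeManager uses the LinuxContainerExecutor, which relies on a setuid container-executor binary to launch containers as the submitting user.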

On a cluster that is not secured, the TaskTracker does NOT use this. Instead, the task actually runs as the mapred user on each node, but the JobTracker reports it as the submitting user.

Your options at this point are to:

  1. Add the user to /etc/passwd (and /etc/shadow) with something like adduser on every node that will launch tasks.
  2. Or configure each node to resolve passwd lookups through an LDAP server that stores account information using the LDAP posixAccount schema.
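For the first option, a dry run along these lines can help keep the account identical on every node. This is a sketch under assumptions: the host names, user name and UID are placeholders I made up, root SSH access to the nodes is assumed, and the function only prints the commands rather than running them.

```shell
#!/bin/sh
# Placeholder values (assumptions, not from the question): adjust to your cluster.
NODES="worker1 worker2 worker3"
NEW_USER="alice"
NEW_UID=5001

# Print (do not execute) one useradd command per node.
# -u sets the numeric UID, -m creates the home directory.
print_useradd_commands() {
    for node in $NODES; do
        echo "ssh root@$node useradd -u $NEW_UID -m $NEW_USER"
    done
}
```

Review the output of print_useradd_commands first; if it looks right, the lines can be run by hand or piped to sh. Using the same UID on every node is a choice I made here to avoid file ownership mismatches, not something Hadoop requires in this form.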

You don't mention which distro you're using, so it's hard to point you further than this.

Travis Campbell