
I would like to set up HDFS permissions in CDH4, with the following requirements:

  1. Everyone can read everything from all HDFS directories
  2. Each user can only write to their own user directory on HDFS
  3. Except for one special user, who can write everywhere

This is a simplified version of the requirements, but it's a good start.

The question is - how do I configure this? Do I have to have Kerberos set up? The Cloudera security guide only discusses Kerberos, but I don't think I need a strong authentication scheme at this point.

A step-by-step guide would be really helpful, as I'm new to Hadoop.

yby

1 Answer


If you're running in non-Kerberos mode, dfs.permissions is essentially advisory. The NameNode does enforce permissions, but only against the client-supplied username, so anyone who figures out that they can spoof their username can become any other user (including a superuser such as hdfs). If you're comfortable with that risk, you don't need to set up Kerberos.
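To make the spoofing point concrete, here is a rough sketch of what it looks like from a client machine; the path is illustrative, but the HADOOP_USER_NAME mechanism is how a non-secure cluster identifies clients:

```shell
# In simple (non-Kerberos) authentication mode, HDFS trusts whatever
# username the client reports. Any user with access to a cluster client
# can override it via an environment variable and act as the superuser.
# (Illustrative only -- requires a configured Hadoop client; the path
# below is made up.)
HADOOP_USER_NAME=hdfs hadoop fs -rm -r /some/protected/path
```

This is why simple mode is only suitable when you trust everyone who can reach the cluster.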

  1. Set the default umask for files and directories with fs.permissions.umask-mode = 022. This causes all newly created files and directories to come out world-readable.
  2. Set the permissions on each /user/username directory to 755.
  3. Set up a new unix group called "hadoop" and add your special user to it. In your hdfs-site.xml, set dfs.permissions.supergroup to hadoop, and make sure your hdfs user is part of this unix group. Any user in the hadoop group is then considered a superuser and can write anywhere.
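The three steps above could be sketched roughly like this (usernames, hostnames, and paths are illustrative, and the commands assume you run them as root on the appropriate nodes):

```shell
# Step 1: default umask, in core-site.xml on client/gateway nodes:
#   <property>
#     <name>fs.permissions.umask-mode</name>
#     <value>022</value>
#   </property>

# Step 2: home directory permissions, run as the hdfs superuser
# ("alice" is a made-up example user):
sudo -u hdfs hadoop fs -chmod 755 /user/alice

# Step 3: create the supergroup and put the privileged users in it
# (here "hdfs" plus an example special user "admin1"):
groupadd hadoop
usermod -a -G hadoop hdfs
usermod -a -G hadoop admin1

# ...and point HDFS at it, in hdfs-site.xml on the NameNode:
#   <property>
#     <name>dfs.permissions.supergroup</name>
#     <value>hadoop</value>
#   </property>
```

After changing hdfs-site.xml you'd need to restart the NameNode for the supergroup setting to take effect.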
Travis Campbell
  • Travis, could you please explain how users can spoof their username in HDFS? If, for example, we have some permissions set up for a Linux user named myUser - how can someone else act as myUser? – MiamiBeach Mar 14 '15 at 09:37
  • @MiamiBeach ... basically this: http://stackoverflow.com/questions/11041253/set-hadoop-system-user-for-client-embedded-in-java-webapp – Travis Campbell Apr 14 '15 at 21:54