Permissions error on webhdfs

Posted 2019-08-29 02:37

Question:

I'm working on using the REST interface to Hadoop's HDFS as a convenient way to store files over the network. To test, I installed Hadoop on my Mac (10.8.5) following these instructions:

http://importantfish.com/how-to-install-hadoop-on-mac-os-x/

That worked like a charm and I'm able to start hadoop and run a basic test:

hadoop jar hadoop-examples-1.1.2.jar pi 10 100

Now I'm using the pywebhdfs Python client to handle the HTTP requests to and from WebHDFS:

http://pythonhosted.org/pywebhdfs/

But I'm stumbling on a basic permissions error when I try to create a directory:

from pywebhdfs.webhdfs import PyWebHdfsClient  
hdfs = PyWebHdfsClient()  
my_dir = 'user/hdfs/data/new_dir'  
hdfs.make_dir(my_dir, permission=755)  

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/pywebhdfs/webhdfs.py", line 207, in make_dir
    _raise_pywebhdfs_exception(response.status_code, response.text)
  File "/Library/Python/2.7/site-packages/pywebhdfs/webhdfs.py", line 428, in _raise_pywebhdfs_exception
    raise errors.PyWebHdfsException(msg=message)
pywebhdfs.errors.PyWebHdfsException: {"RemoteException":{"exception":"AccessControlException","javaClassName":"org.apache.hadoop.security.AccessControlException","message":"Permission denied: user=webuser, access=WRITE, inode=\"user\":mlmiller:supergroup:rwxr-xr-x"}}

I've also tried specifying the user as 'hdfs' instead of the Python lib's default of 'webhdfs', but I get the same result. After 30 minutes of reading I gave up and realized I don't understand the interplay between HDFS users, Hadoop security (which I enabled following the install instructions), and my Unix user and permissions.

Answer 1:

The user_name you pass to PyWebHdfsClient needs to match a Unix user that has write permission on the directory you are trying to write to. By default, the user that starts the NameNode service is the HDFS "superuser".
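The error message in the question already says which user that is. As a rough sketch (the helper name `explain_denial` and the regular expression are my own, not part of pywebhdfs, and they assume the message keeps the format shown in the traceback), you can pull the relevant fields out of the response body like this:

```python
import json
import re


def explain_denial(error_text):
    """Extract who was denied, and who owns the inode, from a WebHDFS error body.

    Hypothetical helper: assumes the 'Permission denied' message format
    seen in the traceback from the question.
    """
    message = json.loads(error_text)["RemoteException"]["message"]
    pattern = r'user=(\w+), access=(\w+), inode="([^"]*)":(\w+):(\w+):([rwx-]+)'
    match = re.search(pattern, message)
    if match is None:
        return None
    keys = ("user", "access", "inode", "owner", "group", "mode")
    return dict(zip(keys, match.groups()))


# Response body copied from the traceback in the question.
error_text = r'{"RemoteException":{"exception":"AccessControlException","javaClassName":"org.apache.hadoop.security.AccessControlException","message":"Permission denied: user=webuser, access=WRITE, inode=\"user\":mlmiller:supergroup:rwxr-xr-x"}}'

info = explain_denial(error_text)
print("%(user)s was denied %(access)s on '%(inode)s', "
      "owned by %(owner)s:%(group)s with mode %(mode)s" % info)
```

Here the inode is owned by mlmiller with mode rwxr-xr-x, so only the owner has write access. The request was made as webuser, which falls under "other" and gets read and execute only.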

I wrote the pywebhdfs client you are using in response to a need at work. If you have any issues or would like to request features for the client itself, please open an issue on GitHub and I can address it.

https://github.com/ProjectMeniscus/pywebhdfs/issues

Thank you



Answer 2:

Figured this one out after stepping away and reading some more docs. WebHDFS expects you to specify a user value that matches the Unix user who launched HDFS from the shell. So the correct Python is:

from pywebhdfs.webhdfs import PyWebHdfsClient  
# Placeholder: replace with the Unix user who launched Hadoop  
user = '<unix_user_who_launched_hadoop>'  
hdfs = PyWebHdfsClient(user_name=user)  
my_dir = '%s/data/new_dir' % user  
hdfs.make_dir(my_dir, permission=755)
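For anyone wondering where the user actually ends up: with security off, the WebHDFS REST API takes it as a `user.name` query parameter on the request URL. Below is a minimal sketch of the MKDIRS URL for the call above. The function name `mkdirs_url` is made up for illustration, the host and port (`localhost`, `50070`) are assumptions about a default local install, and `mlmiller` is just the inode owner from the error message in the question. It is written for Python 3 and only builds the URL without sending anything.

```python
from urllib.parse import quote, urlencode


def mkdirs_url(path, user_name, host="localhost", port=50070, permission="755"):
    """Build the WebHDFS URL for an op=MKDIRS request (to be sent with HTTP PUT).

    Hypothetical helper; host and port defaults are assumptions.
    """
    query = urlencode([
        ("op", "MKDIRS"),
        ("user.name", user_name),    # the user HDFS checks permissions against
        ("permission", permission),  # octal mode for the new directory
    ])
    return "http://%s:%s/webhdfs/v1/%s?%s" % (host, port, quote(path.lstrip("/")), query)


user = "mlmiller"  # illustrative: the owner shown in the error message
url = mkdirs_url("%s/data/new_dir" % user, user)
print(url)
```

If `user.name` does not match a user with write access to the parent directory, the NameNode answers with the AccessControlException from the question, whichever client library sent the request.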