Our Hadoop cluster runs on a set of nodes and can only be accessed from those nodes: you SSH in and do your work there.
Since that is quite annoying, and (understandably) nobody wants to go anywhere near configuring access control so that the cluster is usable from outside for some people, I'm trying the next best thing, i.e. using SSH to run a SOCKS proxy into the cluster:
$ ssh -D localhost:10000 the.gateway cat
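For context, -D opens a local SOCKS5 listener that tunnels connections through the gateway; the trailing cat just keeps the session from exiting. A quick sanity check that the tunnel itself works, independent of Hadoop (assuming curl is available on your machine, and that the NameNode web UI is on its default port 50070):

```shell
# Open the dynamic (SOCKS) forward; -N skips running a remote command
ssh -N -D localhost:10000 the.gateway &

# Route the request through the SOCKS proxy; if the tunnel works,
# this should return the NameNode status page from inside the cluster.
curl --socks5-hostname localhost:10000 http://reachable.from.behind.proxy:50070/
```

The --socks5-hostname variant matters: it makes curl defer DNS resolution to the proxy, so hostnames that only resolve inside the cluster still work.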
There are whispers of SOCKS support (naturally I haven't found any documentation), and apparently the relevant settings go into core-site.xml:
<property>
  <name>fs.default.name</name>
  <value>hdfs://reachable.from.behind.proxy:1234/</value>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>reachable.from.behind.proxy:5678</value>
</property>
<property>
  <name>hadoop.rpc.socket.factory.class.default</name>
  <value>org.apache.hadoop.net.SocksSocketFactory</value>
</property>
<property>
  <name>hadoop.socks.server</name>
  <value>localhost:10000</value>
</property>
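One more thing worth trying if the default factory doesn't seem to be picked up: the original SOCKS patch (HADOOP-1822) also added per-protocol socket factory overrides. A sketch of forcing them explicitly, under the assumption that the client talks to HDFS and the JobTracker over the standard ClientProtocol and JobSubmissionProtocol RPC interfaces (property names from the Hadoop 1.x era):

```xml
<!-- Per-protocol socket factory overrides (assumption: Hadoop 1.x-era names) -->
<property>
  <name>hadoop.rpc.socket.factory.class.ClientProtocol</name>
  <value>org.apache.hadoop.net.SocksSocketFactory</value>
</property>
<property>
  <name>hadoop.rpc.socket.factory.class.JobSubmissionProtocol</name>
  <value>org.apache.hadoop.net.SocksSocketFactory</value>
</property>
```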
Except hadoop fs -ls / still fails, without any mention of SOCKS.
Any tips?
I'm only trying to run jobs, not administer the cluster. I only need to access HDFS and submit jobs, through SOCKS. (There seems to be an entirely separate topic about using SSL/proxies between the cluster nodes themselves; I don't want any of that. My machine shouldn't be part of the cluster, just a client.)
Is there any useful documentation on that? To illustrate my failure to turn up anything useful: I found the configuration values above by running the hadoop client under strace -f and checking which configuration files it read.
Is there a description anywhere of which configuration values the client even reacts to? (I have literally found zero reference documentation, just variously outdated tutorials; I hope I've been missing something.)
Is there a way to dump the configuration values it is actually using?
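Two commands that may help with exactly that (an assumption on my part: these exist in reasonably recent Hadoop clients; hdfs getconf appeared in the 2.x line, and Configuration has a main() that writes the merged configuration as XML):

```shell
# Query a single resolved configuration key
hdfs getconf -confKey fs.default.name

# Dump the entire merged client configuration as XML to stdout
hadoop org.apache.hadoop.conf.Configuration
```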
The original code to implement this was added in https://issues.apache.org/jira/browse/HADOOP-1822, but this article also notes that you have to change the socket factory class to the SOCKS one:
http://rainerpeter.wordpress.com/2014/02/12/connect-to-hdfs-using-a-proxy/
with
<property>
  <name>hadoop.rpc.socket.factory.class.default</name>
  <value>org.apache.hadoop.net.SocksSocketFactory</value>
</property>
Edit: Note that the properties go in different files: fs.default.name, hadoop.rpc.socket.factory.class.default and hadoop.socks.server belong in core-site.xml, while mapred.job.tracker belongs in mapred-site.xml.