In Dataproc how can I access the Spark and Hadoop job history servers?

Posted 2019-08-13 07:14

Question:

In Google Cloud Dataproc how can I access the Spark or Hadoop job history servers? I want to be able to look at my job history details when I run jobs.

Answer 1:

To do this, you will need to create an SSH tunnel to the cluster and then use a SOCKS proxy in your browser. Although the web interfaces are running on the cluster, firewall rules block direct connections to them for security.

To access the Spark or Hadoop job history server, you will first need to create an SSH tunnel to the master node of your cluster:

gcloud compute ssh --zone=<master-host-zone> \
  --ssh-flag="-D 1080" --ssh-flag="-N" --ssh-flag="-n" <master-host-name>
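Here "-D 1080" asks SSH to open a dynamic (SOCKS) port forward on local port 1080, "-N" skips running a remote command, and "-n" detaches stdin, so the command simply holds the tunnel open; leave it running while you browse. As a filled-in sketch (the zone and cluster name below are placeholders; Dataproc master nodes are normally named <cluster-name>-m):

gcloud compute ssh --zone=us-central1-a \
  --ssh-flag="-D 1080" --ssh-flag="-N" --ssh-flag="-n" my-cluster-m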

Once you have the SSH tunnel in place, you need to configure a browser to use a SOCKS proxy. Assuming you're using Chrome and know the path to Chrome on your system, you can launch Chrome with a SOCKS proxy using:

<Google Chrome executable path> \
  --proxy-server="socks5://localhost:1080" \
  --host-resolver-rules="MAP * 0.0.0.0 , EXCLUDE localhost" \
  --user-data-dir=/tmp/
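With Chrome pointed at the proxy, you can then browse to the web UIs by master host name; the host-resolver-rules flag forces name resolution to happen on the cluster side of the tunnel. As a sketch, assuming the default ports (YARN ResourceManager on 8088, MapReduce Job History Server on 19888, Spark History Server on 18080):

http://<master-host-name>:8088     (YARN ResourceManager)
http://<master-host-name>:19888    (MapReduce Job History Server)
http://<master-host-name>:18080    (Spark History Server)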

The full details on how to do this can be found in the Dataproc documentation on cluster web interfaces: https://cloud.google.com/dataproc/docs/concepts/accessing/cluster-web-interfaces