I am specifying a NodeInitializationAction for Dataproc as follows:
```java
ClusterConfig clusterConfig = new ClusterConfig();
clusterConfig.setGceClusterConfig(...);
clusterConfig.setMasterConfig(...);
clusterConfig.setWorkerConfig(...);
List<NodeInitializationAction> initActions = new ArrayList<>();
NodeInitializationAction action = new NodeInitializationAction();
action.setExecutableFile("gs://mybucket/myExecutableFile");
initActions.add(action);
clusterConfig.setInitializationActions(initActions);
```
Then later:
```java
Cluster cluster = new Cluster();
cluster.setProjectId("wide-isotope-147019");
cluster.setConfig(clusterConfig);
cluster.setClusterName("cat");
```
Finally, I invoke the dataproc.create operation with this cluster. I can see the cluster being created, but when I SSH into the master machine ("cat-m" in us-central1-f), I see no evidence that the script I specified was copied over or run.
So this leads to my questions:
- What evidence should I expect to see that the script was copied over and run? (edit: I found the script itself in /etc/google-dataproc/startup-scripts/dataproc-initialization-script-0).
- Where does the script get invoked from? I know it runs as the user root, but beyond that, I am not sure where to find it. I did not find it in the root directory.
- At what point does the cluster status reported after the Create call change from "CREATING" to "RUNNING"? Does this happen before or after the script gets invoked, and does it matter if the exit code of the script is non-zero?
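To make the last question concrete, here is a minimal sketch of how the transition out of "CREATING" could be watched from client code. It is not what I currently run: `ClusterStatePoller`, `waitWhileCreating`, and the `stateSupplier` parameter are hypothetical names, and the supplier is only a stand-in for whatever call fetches the cluster's current state from Dataproc.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

final class ClusterStatePoller {

    /**
     * Polls stateSupplier until it reports something other than "CREATING",
     * or until maxAttempts polls have been made. Returns the last state seen.
     * stateSupplier is a hypothetical stand-in for a real status lookup.
     */
    static String waitWhileCreating(Supplier<String> stateSupplier,
                                    int maxAttempts,
                                    long sleepMillis) throws InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        String state = stateSupplier.get();
        for (int attempt = 1; attempt < maxAttempts && "CREATING".equals(state); attempt++) {
            Thread.sleep(sleepMillis);
            state = stateSupplier.get();
        }
        return state;
    }

    public static void main(String[] args) throws InterruptedException {
        // Fake sequence of states, used here instead of a real cluster lookup.
        Iterator<String> fakeStates =
                Arrays.asList("CREATING", "CREATING", "RUNNING").iterator();
        String finalState = waitWhileCreating(fakeStates::next, 10, 0L);
        System.out.println("Final state: " + finalState);
    }
}
```

With a real supplier, comparing the time at which the state leaves "CREATING" against timestamps written by the init script would show whether the script ran before or after the change.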
Thanks in advance.