I am running a big job in cluster mode, and all I am interested in are two float numbers, which I want to read back somehow when the job succeeds.

Here is what I am trying:
from pyspark.context import SparkContext

if __name__ == "__main__":
    sc = SparkContext(appName='foo')
    # Write the two values to a local file on whatever machine the driver runs on
    f = open('foo.txt', 'w')
    pi = 3.14
    not_pi = 2.79
    f.write(str(pi) + "\n")
    f.write(str(not_pi) + "\n")
    f.close()
    sc.stop()
However, 'foo.txt' doesn't appear to be written anywhere (probably it gets written on an executor, or something like that). I also tried '/homes/gsamaras/foo.txt', which would be the pwd of the gateway, but that fails with: No such file or directory: '/homes/gsamaras/myfile.txt'.

How can I do that? The output of:
import os
import socket

print("Current working dir: %s" % os.getcwd())
print("Hostname: %s" % socket.gethostname())
suggests that the driver is actually a node of the cluster, which is why I don't see the file on my gateway.
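A stopgap I can think of, assuming I can ssh into the cluster nodes: write the file under an absolute path on the driver node and log the hostname, so I can copy it off afterwards. A minimal sketch (the path below is made up):

import socket

pi, not_pi = 3.14, 2.79      # the two values the job produces
out_path = '/tmp/foo.txt'    # hypothetical absolute path on the driver node
with open(out_path, 'w') as f:
    f.write(str(pi) + "\n")
    f.write(str(not_pi) + "\n")
# Log where the file landed, so it can be fetched later, e.g. with scp
print("wrote %s on %s" % (out_path, socket.gethostname()))

But fishing files off whichever node happened to run the driver feels fragile.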
Maybe I should write the file to HDFS somehow? Opening an hdfs:// path with plain open() won't work either:
Traceback (most recent call last):
  File "computeCostAndUnbalancedFactorkMeans.py", line 15, in <module>
    f = open('hdfs://myfile.txt','w')
IOError: [Errno 2] No such file or directory: 'hdfs://myfile.txt'
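For reference, the closest working thing I can think of is to let Spark itself write to HDFS, along the lines of the sketch below. The output path is made up, and saveAsTextFile creates a directory of part files rather than a single file:

from pyspark import SparkContext

if __name__ == "__main__":
    sc = SparkContext(appName='foo')
    pi, not_pi = 3.14, 2.79
    # Put the two values into a one-partition RDD and let Spark persist it;
    # the (hypothetical) path becomes a directory with a single part-00000
    # file containing one value per line
    sc.parallelize([pi, not_pi], 1).saveAsTextFile('hdfs:///user/gsamaras/foo_output')
    sc.stop()

I could then read the values back from the gateway with hdfs dfs -cat /user/gsamaras/foo_output/part-00000. But is that really the right way to get two numbers out of a job, or is there a more direct approach?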