This question already has an answer here: Python read file as stream from HDFS (3 answers)
I have a text file named mr.txt in the Hadoop file system under the /project1 directory. I need to write Python code that reads the first line of the file without downloading mr.txt to the local machine, but I am having trouble opening mr.txt from HDFS.
I tried:
open('hdfs:///project1/mr.txt','r')
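Install PySpark. In the pyspark shell `sc` already exists; in a standalone script, create it first:
from pyspark import SparkContext
sc = SparkContext.getOrCreate()
text = sc.textFile('hdfs:///project1/mr.txt')
first_line = text.first()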
Without knowing in more detail what your software is or where it is run...
You can use an NFS server to mount the HDFS volume and access it locally. If this option does not suit your needs, you can use Hadoop Streaming. Finally, if you are writing a Spark job, you can access HDFS as if it were your local filesystem.
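If neither Spark nor an NFS mount is an option, one more approach that may work is to stream the file through the `hdfs dfs -cat` command and stop after the first line. The sketch below assumes the `hdfs` CLI is installed and on the PATH of the machine running the script; the helper names `first_line_from_command` and `hdfs_first_line` are made up for illustration.

```python
import subprocess


def first_line_from_command(cmd):
    """Run cmd and return the first line of its stdout as a string.

    Only one line is read from the pipe; the process is then stopped,
    so the rest of the output is never consumed.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    try:
        line = proc.stdout.readline()  # bytes; b"" if there is no output
    finally:
        proc.stdout.close()
        proc.terminate()  # stop the command once we have what we need
        proc.wait()
    return line.decode("utf-8").rstrip("\r\n")


def hdfs_first_line(path):
    """Hypothetical helper: first line of an HDFS file via `hdfs dfs -cat`."""
    # Assumes the `hdfs` command is available on this machine.
    return first_line_from_command(["hdfs", "dfs", "-cat", path])


# Example call (needs a working Hadoop client):
# print(hdfs_first_line("hdfs:///project1/mr.txt"))
```

The file is piped rather than copied, so nothing is written to local disk. With an NFS mount in place, the built-in open() would work directly on the mounted path instead.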