Accessing external file in Python UDF

Posted 2019-05-06 19:14

Question:

I am using Hive with a Python UDF. I defined a SQL file in which I add the Python UDF and then call it. So far so good, and I can process my query results with my Python function. However, I now need to use an external .txt file inside my Python UDF. I uploaded that file to my cluster (into the same directory as the .sql and .py files) and also added it in my .sql file with this command:

ADD FILE /home/ra/stopWords.txt;

When I open this file in my Python UDF like this:

file = open("/home/ra/stopWords.txt", "r")

I get several errors. I cannot figure out how to add extra files and use them in Hive.

Any ideas?

Answer 1:

All added files are placed in the current working directory (./) of the UDF script, not at their original absolute paths.

If you add a single file using ADD FILE /dir1/dir2/dir3/myfile.txt, its path relative to the script will be

./myfile.txt

If you instead add a directory using ADD FILE /dir1/dir2, the directory is shipped recursively and the same file's path will be

./dir2/dir3/myfile.txt