Error 1121 importing external library in Pig UDF i

Posted 2019-08-04 00:42

Question:

I'm having a problem using the Python library simplejson in Jython to write a Pig UDF. I need it because jython-standalone-2.5.2.jar doesn't come with a JSON library. I'm using Apache Pig version 0.11.0-cdh4.4.0 (rexported), compiled Sep 03 2013, 20:25:46. According to the documentation (http://pig.apache.org/docs/r0.11.1/udf.html#python-advanced): "You can import Python modules in your Python script. Pig resolves Python dependencies recursively, which means Pig will automatically ship all dependent Python modules to the backend. Python modules should be found in the jython search path: JYTHON_HOME, JYTHON_PATH, or current directory."

So I downloaded the library from https://pypi.python.org/pypi/simplejson/ and unzipped it in my working directory, and my script then works in local mode (with -x local). However, in cluster mode I get this error in the task tracker's logs for the failed task:
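Since the documentation says modules are looked up on the Jython search path, a cheap first check when local mode and cluster mode disagree is to record what that path actually contains when the import runs. The following is a minimal sketch, assuming you can read the script's stdout in the task's logs; `check_import` is a hypothetical helper, and the syntax is kept compatible with Jython 2.5:

```python
import sys


def check_import(module_name):
    """Try to import module_name.

    Returns (True, module_file) on success, or (False, search_path) on
    failure, so you can see which directories were actually searched.
    """
    try:
        module = __import__(module_name)
    except ImportError:
        return False, list(sys.path)
    return True, getattr(module, '__file__', None)


if __name__ == '__main__':
    found, detail = check_import('simplejson')
    if found:
        print('simplejson imported from %s' % detail)
    else:
        print('simplejson not found; searched:')
        for entry in detail:
            print('  %s' % entry)
```

Comparing the printed search path from a local run with the one from a cluster task should show whether the directory holding simplejson is missing on the backend.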

Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 1121: Python Error. Traceback (most recent call last):
  File "ejercicio4-udfs.py", line 8, in <module>
ImportError: No module named simplejson

    at org.apache.pig.scripting.jython.JythonScriptEngine$Interpreter.execfile(JythonScriptEngine.java:231)
    at org.apache.pig.scripting.jython.JythonScriptEngine$Interpreter.init(JythonScriptEngine.java:158)
    at org.apache.pig.scripting.jython.JythonScriptEngine.getFunction(JythonScriptEngine.java:349)
    at org.apache.pig.scripting.jython.JythonFunction.<init>(JythonFunction.java:55)
    ... 92 more
Caused by: Traceback (most recent call last):
  File "ejercicio4-udfs.py", line 8, in <module>
ImportError: No module named simplejson

I've tried several things, such as zipping simplejson, registering the zip, and adding it to the path with sys.path.append('simplejson.zip'). I've also tried:

export JYTHONPATH=$JYTHONPATH:$(pwd)/simplejson.zip; pig script.pig

and also

pig -Dmapred.cache.files="simplejson.zip#simplejson.zip" -Dmapred.create.symlink=yes script.pig
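The sys.path.append('simplejson.zip') attempt relies on Python's zip-import support: a sys.path entry may be a zip archive, but only packages stored at the archive's root are importable, and a relative entry such as 'simplejson.zip' is resolved against the process's current directory at import time. The sketch below demonstrates that mechanism with a throwaway package. `make_package_zip`, `import_from_zip` and `demo_pkg` are hypothetical names. The sketch does not reproduce how Pig or the distributed cache ship files to the cluster, and whether Jython 2.5.2 behaves identically to CPython here is an assumption worth verifying.

```python
import os
import sys
import tempfile
import zipfile


def make_package_zip(zip_path, package_name, source):
    """Write a zip whose root holds package_name/__init__.py with source."""
    archive = zipfile.ZipFile(zip_path, 'w')
    try:
        # The package directory must sit at the root of the archive; a zip
        # laid out as simplejson-X.Y.Z/simplejson/... would not be importable.
        archive.writestr(package_name + '/__init__.py', source)
    finally:
        archive.close()


def import_from_zip(zip_path, package_name):
    """Put the zip archive on sys.path and import the package from it."""
    if zip_path not in sys.path:
        sys.path.append(zip_path)
    return __import__(package_name)


if __name__ == '__main__':
    workdir = tempfile.mkdtemp()
    zip_path = os.path.join(workdir, 'demo_pkg.zip')
    make_package_zip(zip_path, 'demo_pkg', 'GREETING = "loaded from zip"\n')
    module = import_from_zip(zip_path, 'demo_pkg')
    print(module.GREETING)  # prints: loaded from zip
```

If a hand-made simplejson.zip fails with this mechanism locally, checking its layout (for example with `unzip -l simplejson.zip`) is a quick way to rule out a nested top-level folder.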

Answer 1:

I don't know if my answer comes too late, but I managed to import simplejson in a UDF.

Here is how I did it:

I downloaded simplejson and put it into a lib folder. Then, in my UDF, I did this:

import sys
# Make the folder that contains the simplejson package importable
sys.path.append('/path/to/your/lib/folder')
import simplejson as json

After that, json.loads() worked without any problem on my cluster.

Hope it helps.
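Put into a complete UDF file, the answer's approach might look like the sketch below. Several parts are assumptions rather than details from the answer: '/path/to/your/lib/folder' is the answer's placeholder and is assumed to be reachable wherever the UDF runs (the answer does not say whether it was present on the worker nodes or shipped by Pig); `get_field` and its output schema are hypothetical; and the fallback to the standard json module plus the stand-in `outputSchema` exist only so the file can also be exercised with a regular Python interpreter. Pig normally provides `outputSchema` to Jython UDFs itself.

```python
import sys

# Placeholder from the answer: a folder containing the simplejson/ package
# directory. Assumed to be reachable wherever the UDF runs.
LIB_DIR = '/path/to/your/lib/folder'
if LIB_DIR not in sys.path:
    sys.path.append(LIB_DIR)

try:
    import simplejson as json  # Jython 2.5 ships no json module
except ImportError:
    import json  # fallback only for testing on interpreters that bundle json

try:
    outputSchema  # normally made available to Jython UDFs by Pig
except NameError:
    def outputSchema(schema):
        """No-op stand-in so this file also runs outside Pig."""
        def wrap(func):
            return func
        return wrap


@outputSchema('value:chararray')
def get_field(json_string, field):
    """Return one top-level field of a JSON object as JSON text, or None."""
    if json_string is None:
        return None
    try:
        record = json.loads(json_string)
    except ValueError:
        return None  # malformed JSON becomes a Pig null
    if not isinstance(record, dict) or field not in record:
        return None
    value = record[field]
    if value is None:
        return None  # JSON null becomes a Pig null
    # Re-serialize so numbers and nested objects fit a chararray column.
    return json.dumps(value)
```

From Pig, such a file would be registered with something like REGISTER 'udfs.py' USING jython AS myudfs; and called as myudfs.get_field(line, 'name') (file, alias and field names hypothetical). Returning JSON text keeps the schema to a single chararray; a real UDF would more likely declare typed fields for the values it needs.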