I'm having a problem using the Python library simplejson in Jython to write a Pig UDF. I need it because jython-standalone-2.5.2.jar doesn't come with a JSON library. I'm using Apache Pig version 0.11.0-cdh4.4.0 (rexported) compiled Sep 03 2013, 20:25:46. According to the documentation at http://pig.apache.org/docs/r0.11.1/udf.html#python-advanced: "You can import Python modules in your Python script. Pig resolves Python dependencies recursively, which means Pig will automatically ship all dependent Python modules to the backend. Python modules should be found in the jython search path: JYTHON_HOME, JYTHON_PATH, or current directory." So I downloaded the library from https://pypi.python.org/pypi/simplejson/ and unzipped it in my working directory, and my script works in local mode (with -x local). In cluster mode, however, I get this error in the logs of the failed task on the task tracker:
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 1121: Python Error. Traceback (most recent call last):
File "ejercicio4-udfs.py", line 8, in <module>
ImportError: No module named simplejson
at org.apache.pig.scripting.jython.JythonScriptEngine$Interpreter.execfile(JythonScriptEngine.java:231)
at org.apache.pig.scripting.jython.JythonScriptEngine$Interpreter.init(JythonScriptEngine.java:158)
at org.apache.pig.scripting.jython.JythonScriptEngine.getFunction(JythonScriptEngine.java:349)
at org.apache.pig.scripting.jython.JythonFunction.<init>(JythonFunction.java:55)
... 92 more
Caused by: Traceback (most recent call last):
File "ejercicio4-udfs.py", line 8, in <module>
ImportError: No module named simplejson
I've tried several things, such as zipping simplejson, registering the zip, and adding it to the module search path with sys.path.append('simplejson.zip'). I've also tried:
export JYTHONPATH=$JYTHONPATH:$(pwd)/simplejson.zip; pig script.pig
and also
pig -Dmapred.cache.files="simplejson.zip#simplejson.zip" -Dmapred.create.symlink=yes script.pig
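The sys.path attempt described above can be sketched as follows. This is a minimal illustration, not the actual ejercicio4-udfs.py: the helper names add_zip_to_sys_path and import_simplejson are hypothetical, and it assumes that simplejson.zip contains the simplejson/ package directory at its top level and that the archive is present in the working directory of the process doing the import. Whether that last assumption holds on the cluster backend is exactly what is in question here.

```python
import os
import sys


def add_zip_to_sys_path(zip_name, base_dir=None):
    """Prepend an archive to sys.path so packages at its top level can be imported.

    base_dir defaults to the current working directory (an assumption about
    where the archive ends up). Returns the absolute path that was added.
    """
    if base_dir is None:
        base_dir = os.getcwd()
    zip_path = os.path.join(base_dir, zip_name)
    # Avoid adding the same entry twice if the UDF file is loaded repeatedly.
    if zip_path not in sys.path:
        sys.path.insert(0, zip_path)
    return zip_path


def import_simplejson(zip_name='simplejson.zip'):
    """Try a normal import first, then retry with the archive on sys.path.

    Raises ImportError if simplejson cannot be found either way.
    """
    try:
        import simplejson
    except ImportError:
        add_zip_to_sys_path(zip_name)
        import simplejson
    return simplejson
```

In the UDF file, a call such as json = import_simplejson() would stand in for the plain import simplejson on line 8 of the traceback. Using an absolute path rather than the bare 'simplejson.zip' makes it easier to log where the import is actually looking.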