Python UDF version with Jython/Pig

Posted 2020-04-30 17:28

Question:

When I write a Python UDF for Pig, how do I know which version of Python it is using? Is it possible to use a specific version of Python?

Specifically, my UDF needs math.erf() from the math module, which was introduced in Python 2.7. I have Python 2.7 installed on my machine and a standalone Python program runs fine, but when I run the same code in Pig as a Python UDF, I get this:

AttributeError: type object 'org.python.modules.math' has no attribute 'erf'

My guess is that Jython implements some pre-2.7 version of Python. Is that right?

Thanks for your help!

Answer 1:

To get the version you are using you can do this:

myUDFS.py

#!/usr/bin/python

import sys

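# Pig makes the outputSchema decorator available to Jython UDFs; no import is needed.
@outputSchema('bar: chararray')
def my_func(foo):
    # Jython implements Python 2, so print is a statement here.
    print sys.version
    return foo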

If you run the script locally, the version is printed directly to stdout. If you run it remotely, you will have to check the logs on the job tracker to see the output of sys.version.
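
As an alternative to digging through logs, the UDF can return the version string so it shows up in the Pig output itself. The following is only a sketch: the function name interpreter_version and the schema field name are made up for illustration, and the fallback decorator exists only so the file can also be run outside Pig, where outputSchema is not defined.

```python
import sys

# Outside Pig the outputSchema decorator does not exist, so define a
# do-nothing stand-in. Inside Pig (Jython) the real one is used instead.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap


# Hypothetical UDF name; it ignores its argument and returns the
# interpreter's version string as a chararray.
@outputSchema('version: chararray')
def interpreter_version(_ignored=None):
    return sys.version
```

Calling this UDF on any field in a FOREACH ... GENERATE and dumping the result should then show the interpreter version next to the data, whether the job runs locally or on the cluster.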

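However, you are right about Jython being pre-2.7 (kind of). At the time of writing, the current stable version of Jython is 2.5.3, so this is the version that Pig is using; 2.7 exists only as a beta. Since math.erf() was added in Python 2.7, it is missing from Jython 2.5.3, which is why you get the AttributeError.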
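
If upgrading Jython is not an option, one possible workaround is to compute erf in pure Python inside the UDF module. The sketch below uses the Abramowitz and Stegun approximation (formula 7.1.26), whose absolute error is about 1.5e-7, so it is not a drop-in replacement where full double precision matters. The name erf_approx is made up for illustration, and the code sticks to syntax that Python 2.5 accepts.

```python
import math


def erf_approx(x):
    """Approximate erf(x) with Abramowitz and Stegun formula 7.1.26."""
    # erf is odd, so evaluate on |x| and restore the sign at the end.
    sign = 1.0
    if x < 0:
        sign = -1.0
    x = abs(x)

    # Coefficients of the 7.1.26 approximation.
    a1 = 0.254829592
    a2 = -0.284496736
    a3 = 1.421413741
    a4 = -1.453152027
    a5 = 1.061405429
    p = 0.3275911

    t = 1.0 / (1.0 + p * x)
    # Horner evaluation of a1*t + a2*t^2 + ... + a5*t^5.
    poly = ((((a5 * t + a4) * t + a3) * t + a2) * t + a1) * t
    return sign * (1.0 - poly * math.exp(-x * x))


# Prefer the built-in when the interpreter has it (Python 2.7 and later),
# otherwise fall back to the approximation.
erf = getattr(math, 'erf', erf_approx)
```

Inside the UDF, calling erf(x) instead of math.erf(x) then works on both old and new interpreters.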