How can I import Pandas with Jython?

Posted 2019-06-19 18:31

Question:

I'm new to Python, and I've installed Jython 2.7.0.

Java

import org.python.util.PythonInterpreter;
import org.python.core.*; 

public class Main {
    public static void main(String[] args) {
         PythonInterpreter interp = new PythonInterpreter(); 
         interp.execfile("D:/Users/JY/Desktop/test/for_java_test.py");  
         interp.close();
    }
}

Python

import pandas as pd
import ctypes

def main():
    data = pd.read_csv('for_test.csv')
    data_mean = data.a*2
    data_mean.to_csv('catch_test.csv',index=False)
    ctypes.windll.user32.MessageBoxW(0, "Done. Output: a * 2", "Output csv", 0)

if __name__ == '__main__':
    main()

Then I got this error:

Exception in thread "main" Traceback (most recent call last):
File "D:\Users\JYJU\Desktop\test_java\for_java_test.py", line 1, in <module>
    import pandas as pd
ImportError: No module named pandas

How can I fix this if I want to use pandas?

Answer 1:

You currently cannot use Pandas with Jython, because it depends on CPython-specific native extensions. One dependency is NumPy, the other is Cython (which is not itself a native CPython extension, but generates them).

Keep an eye on the JyNI project ("Jython Native Interface"). It enables Jython to use native CPython extensions, and its purpose is to solve exactly the kind of issue you ran into. However, it is still under heavy development and cannot yet load Pandas or NumPy into Jython, although both frameworks are high on the priority list.

(E.g. ctypes is already working to some extent.)

Also, JyNI is currently POSIX-only (tested on Linux and OS X).

If you don't need Jython specifically, but just some Java/Pandas interoperation, a solution that already works is to embed the CPython interpreter. JPY and JEP are projects that provide this. With either of them you should be able to interoperate between Java and Pandas (or any other CPython-specific framework).



Answer 2:

As far as I know, pandas is written in Cython and is a CPython extension. This means it is meant to be used with the CPython implementation of the Python language (the primary implementation, which most people use).

Jython is a Python implementation that runs Python programs on the JVM. It is used to integrate with Java libraries, to add Python scripting to Java programs, and so on.

Python modules implemented as CPython extensions (like pandas) are not necessarily compatible with all Python implementations (well-known implementations other than CPython include Jython, PyPy and IronPython).
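To check which implementation a script is actually running under, the standard-library platform module can be queried. The following is only a small sketch: the helper name describe_runtime is made up for illustration, and the syntax is kept compatible with both Python 2.7 (Jython) and Python 3.

```python
import platform


def describe_runtime():
    """Return (implementation name, whether pandas can be imported)."""
    # e.g. 'CPython', 'Jython', 'PyPy' or 'IronPython'
    implementation = platform.python_implementation()
    try:
        import pandas  # noqa: F401  (relies on CPython native extensions)
        has_pandas = True
    except ImportError:
        has_pandas = False
    return implementation, has_pandas


if __name__ == "__main__":
    name, has_pandas = describe_runtime()
    print("Running on %s; pandas importable: %s" % (name, has_pandas))
```

Run through the PythonInterpreter from the question, this would be expected to report Jython with pandas not importable, which matches the ImportError above.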

If you really have to use Jython and pandas together and cannot find another way to solve the issue, then I suggest running them in separate processes.

One process is your Jython application running on the JVM (either Java code calling Jython libraries, or Python code that needs integration with some Java libraries). A second process runs CPython and provides the operations that require pandas.

Then use some form of IPC (or an intermediary tool) to communicate: standard I/O, sockets, OS pipes, shared memory, Memcached, Redis, etc.

The Java process sends a request to the CPython process (or registers the request in shared storage), providing the processing parameters. The CPython process uses pandas to calculate the results and sends back a serialized form of them (or puts the results back in the shared storage).
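As one possible shape for the CPython side of this exchange, here is a minimal sketch that uses standard I/O with one JSON document per line. The line-based JSON format, the function names serve and handle, and the reply fields "ok", "result" and "error" are all assumptions chosen for illustration, not something the answer prescribes.

```python
import json
import sys


def serve(handler, requests=None, responses=None):
    """Read one JSON request per line and write one JSON reply per line."""
    requests = sys.stdin if requests is None else requests
    responses = sys.stdout if responses is None else responses
    for line in requests:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between requests
        try:
            reply = {"ok": True, "result": handler(json.loads(line))}
        except Exception as exc:
            # Report the failure to the caller instead of killing the worker.
            reply = {"ok": False, "error": str(exc)}
        responses.write(json.dumps(reply) + "\n")
        responses.flush()  # the other process is blocked waiting for this line


def handle(request):
    """Placeholder handler; a real worker would call pandas here."""
    return {"echo": request}


if __name__ == "__main__":
    serve(handle)
```

Flushing after every reply matters with pipes: without it the reply can sit in a buffer while the requesting process waits for it.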

This approach requires extra code, both to split the work across separate processes and to serialize the request and response (the format depends on the application and the data it processes).

For example, with the sample code in the question, the Java process can pass the CSV filename to the CPython process, which processes the CSV file using pandas, generates the result CSV file and returns the name of the new file to the Java process.
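A rough sketch of what that CPython script might look like is below. It mirrors the question's a * 2 calculation. The script name pandas_worker.py, the function name double_column and the command-line argument order are assumptions made for this example. The Java side could start it with something like java.lang.ProcessBuilder and read the printed filename from the process's standard output.

```python
import sys


def double_column(input_csv, output_csv, column="a"):
    """Write `column` of input_csv, multiplied by 2, to output_csv."""
    import pandas as pd  # needs CPython with pandas installed

    data = pd.read_csv(input_csv)
    (data[column] * 2).to_csv(output_csv, index=False)
    return output_csv


if __name__ == "__main__":
    # Hypothetical usage: python pandas_worker.py for_test.csv catch_test.csv
    if len(sys.argv) != 3:
        sys.exit("usage: pandas_worker.py INPUT_CSV OUTPUT_CSV")
    # Print the result filename so the calling process can read it from stdout.
    print(double_column(sys.argv[1], sys.argv[2]))
```

The ctypes message box from the original script is left out, since a worker driven by another process has no need for a GUI prompt.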