Python compute cluster

2020-05-21 05:53发布

问题:

Would it be possible to make a python cluster, by writing a telnet server, then telnet-ing the commands and output back-and-forth? Has anyone got a better idea for a python compute cluster? PS. Preferably for python 3.x, if anyone knows how.

回答1:

The Python wiki hosts a very comprehensive list of Python cluster computing libraries and tools. You might be especially interested in Parallel Python.

Edit: There is a new library that is IMHO especially good at clustering: execnet. It is small and simple. And it appears to have less bugs than, say, the standard multiprocessing module.



回答2:

You can see most of the third-party packages available for Python 3 listed here; relevant to cluster computation is mpi4py -- most other distributed computing tools such as pyro are still Python-2 only, but MPI is a leading standard for cluster distributed computation and well looking into (I have no direct experience using mpi4py with Python 3, yet, but by hearsay I believe it's a good implementation).

The main alternative is Python's own built-in multiprocessing, which also scales up pretty well if you have no interest in interfacing existing nodes that respect the MPI standards but may not be coded in Python.

There is no real added value in rolling your own (as Atwood says, don't reinvent the wheel, unless your purpose is just to better understand wheels!-) -- use one of the solid, tested, widespread solutions, already tested, debugged and optimized on your behalf!-)



回答3:

Look into these

http://www.parallelpython.com/

http://pyro.sourceforge.net/

I have used both and both are exellent for distributed computing
for more detailed list of options see http://wiki.python.org/moin/ParallelProcessing

and if you want to auto execute something on remote machine , better alternative to telnet is ssh as in http://pydsh.sourceforge.net/



回答4:

What kind of stuff do you want to do? You might want to check out hadoop. The backend, heavy lifting is done in java, but has a python interface, so you can write python scripts create and send the input, as well as process the results.



回答5:

If you need to write administrative scripts, take a look at the ClusterShell Python library too, or/and its parallel shell clush. It's useful when dealing with node sets also (man nodeset).



回答6:

I think IPython.parallel is the way to go. I've been using it extensively for the last year and a half. It allows you to work interactively with as many worker nodes as you want. If you are on AWS, StarCluster is a great way to get IPython.parallel up and running quickly and easily with as many EC2 nodes as you can afford. (It can also automatically install Hadoop, and a variety of other useful tools, if needed.) There are some tricks to using it. (For example, you don't want to send large amounts of data through the IPython.parallel interface itself. Better to distribute a script that will pull down chunks of data on each engine individually.) But overall, I've found it to be a remarkably easy way to do distributed processing (WAY better than Hadoop!)



回答7:

"Would it be possible to make a python cluster"

Yes.

I love yes/no questions. Anything else you want to know?

(Note that Python 3 has few third-party libraries yet, so you may wanna stay with Python 2 at the moment.)