What are the relative merits / downsides of various Python bundles (EPD / Anaconda) vs. a manual install?
I have installed EPD academic, and I have no issues with it. It provides more packages than I think I will ever need, and it is very easy to update using enpkg enstaller. The EPD academic licence requires yearly renewal, however, and the free version does not do updates as easily.
At the moment I really only use a handful of packages such as Pandas, NumPy, SciPy, matplotlib, IPython, Statsmodels and their respective dependencies.
For such limited use, am I better off with a manual install and pip install --upgrade 'package', or do the bundles offer anything over and above this?
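For anyone weighing the manual route, a quick way to see what is already installed (and at which version) is to ask the interpreter directly. This is only a minimal sketch: it assumes Python 3.8 or later for importlib.metadata, and the helper name installed_versions is made up for illustration. The package list is just the one from the question.

```python
from importlib import metadata

# Distribution names as they appear on PyPI (taken from the list above).
PACKAGES = ["pandas", "numpy", "scipy", "matplotlib", "ipython", "statsmodels"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:12s} {version or 'not installed'}")
```

Anything reported as not installed, or older than you want, is a candidate for pip install --upgrade.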
Update 2015: Nowadays I always recommend Anaconda. It includes lots of Python packages for scientific computing, data science, web development, etc. It also provides a superior environment tool, conda, which makes it easy to switch between environments, even between Python 2 and 3. It is also updated very quickly as soon as a new version of a package is released, and you can just do conda update packagename to update it.
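When switching between environments it is easy to lose track of which interpreter is actually running. The sketch below reports it from inside Python. It assumes conda's activation has set the CONDA_DEFAULT_ENV environment variable (it will be missing outside an activated conda environment), and describe_environment is a hypothetical helper name.

```python
import os
import sys


def describe_environment(environ=os.environ):
    """Summarise the running interpreter and the active conda environment, if any."""
    return {
        "python": sys.version.split()[0],           # e.g. "2.7.10" or "3.4.3"
        "prefix": sys.prefix,                        # install location of this interpreter
        "conda_env": environ.get("CONDA_DEFAULT_ENV"),  # None when no conda env is active
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key:10s} {value}")
```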
Original answer below:
On Windows, the complicated part is compiling the math packages, so I think a manual install is a viable option only if you are interested only in Python, without other packages.
You are therefore better off choosing either EPD (now Canopy) or Anaconda.
Anaconda has around 270 packages, including the most important ones for most scientific applications and data analysis, that is, NumPy, SciPy, Pandas, IPython, matplotlib and Scikit-learn.
So if this is enough for you, I would choose Anaconda.
Instead, if you are interested in other packages, and even more if you use any of the Enthought packages (Chaco for example is very useful for realtime data visualization), then EPD/Canopy is probably a better choice. The Academic version has a larger number of packages in the base install, and many more in the repository. Anaconda also includes Chaco.
I have tried various Windows distributions in the last year, trying to find one suitable for my work environment (behind a proxy, but without access to proxy configuration).
Here is my feedback from experience:
EPD/Canopy:
We had a license of EPD, but it was old and we were unable to update because of the weird proxy situation. In order to add some packages (such as a recent version of xlrd/xlwt), I compiled from source. To update SciPy and NumPy, I used the precompiled installers from http://www.lfd.uci.edu/~gohlke/pythonlibs/, but they would sometimes screw up compatibility. I loved having a fully configured Py2exe and Cython, and it simply worked out of the box.
After a while, I tried installing the free version of Canopy, but it lacks Cython and py2exe and some specific advanced packages I needed, so I never really used it.
Some of my colleagues bought the full Canopy license, but we're still not sure how they're going to update...
Python(x,y):
Not wanting to struggle with licenses, I installed Python(x,y) at home. The only downside I have noticed so far is that the standard installation requires you to select which packages you want. It's both a good and a bad point, because I can't be sure that my clients will have the exact same configuration as I do when I install. (The Enthought tool suite can be installed in Python(x,y).)
After using Python(x,y) for a while, I just noticed I installed the 32 bit version. Although it is not clear on their website, it seems they don't have a 64 bit version as of July 2015. I'm going to uninstall it and get a 64 bit distribution.
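If you are not sure which build you ended up with, the interpreter can tell you. A minimal sketch using only the standard library (interpreter_bits is a name made up here): the size of a C pointer, as reported by struct, gives the bitness of the running Python, not of the operating system.

```python
import struct
import sys


def interpreter_bits():
    """Return the pointer width of the running interpreter in bits (32 or 64)."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print(f"{interpreter_bits()}-bit Python {sys.version.split()[0]}")
```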
Anaconda:
When I first wrote this, Anaconda didn't seem to have enough packages yet. A couple of years later, it seems much better, so I'm going to give it a try!
Manual:
In order to avoid version compatibility issues with our old EPD version, I ended up using manual Python installation and adding additional packages from the LFD website linked above. It works great, but I would still suggest Canopy to a new user who requires advanced packages (like GDAL or PyFITS).
Summary: If you go for Canopy, get the full licence (Academic or purchased). Otherwise, go with Python(x,y); it will end up being the same.
On Ubuntu:
No need for a distribution. It's all relatively recent (+/- 6 months is tolerable) and pre-compiled. You just need to execute sudo apt-get install python python-scipy and it's there! Most advanced packages are there as well.
The other answers cover the ground quite nicely, so I just want to remark on one particular aspect that nobody has mentioned yet. It is probably fairly niche, but it may potentially make or break Anaconda or Canopy for some people under Linux systems:
Anaconda Python builds use the UCS4 Unicode mode, whereas Enthought Canopy uses UCS2.
What this means in practical terms is that if you rely on any extensions which you cannot compile yourself for whatever reason (e.g. pre-compiled proprietary libraries), and they happen not to be built for a Python version with the same mode, you may sooner or later run into errors that look something like undefined symbol: PyUnicodeUCS4_AsUTF8String.
According to PEP 513, UCS4 currently seems to be the more popular and recommended mode. Also, these UCS compatibility issues seem to affect only Python 2.x and 3.x versions before 3.3.
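To find out which mode a given interpreter was built with, you can look at sys.maxunicode: narrow (UCS2) builds report 0xFFFF and wide (UCS4) builds report 0x10FFFF. A small sketch, where unicode_build is a made-up helper name. On Python 3.3 and later the distinction no longer exists (PEP 393), so the function says so instead of guessing.

```python
import sys


def unicode_build(maxunicode=sys.maxunicode, version=sys.version_info):
    """Classify the interpreter's Unicode build mode from sys.maxunicode."""
    if version >= (3, 3):
        # PEP 393 removed the narrow/wide split; maxunicode is always 0x10FFFF.
        return "flexible (PEP 393)"
    # Narrow builds cap out at 0xFFFF, wide builds at 0x10FFFF.
    return "UCS4" if maxunicode > 0xFFFF else "UCS2"


if __name__ == "__main__":
    print(unicode_build())
```

A pre-compiled extension has to match this mode, which is where the undefined symbol errors come from.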
I used Anaconda for years and liked it quite a bit. Unfortunately, IPython Notebook (now Jupyter) is unavailable without the enterprise edition.
I want to use Jupyter notebooks in the classroom, so I switched to Canopy. It seems easy enough to install all of the packages we need. Admittedly, we haven't tested them all.