I am using HDP Version: 2.6.4
Can you provide step-by-step instructions on how to install libraries into the following Python directory under spark2?
sc.version (the Spark version) returns
res0: String = 2.2.0.2.6.4.0-91
The relevant spark2 interpreter property and its value are as follows:
zeppelin.pyspark.python: /usr/local/Python-3.4.8/bin/python3.4
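As a quick sanity check (my suggestion, assuming you have shell access to the sandbox), you can confirm that this binary exists and reports the expected version:

/usr/local/Python-3.4.8/bin/python3.4 --version   # should print "Python 3.4.8"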
The Python version and currently installed libraries are:
%spark2.pyspark
import pip
import sys
# Collect installed packages as "name==version" strings
# (pip.get_installed_distributions() exists on pip 9.x; it was removed in pip 10+)
installed_packages_list = sorted(["%s==%s" % (i.key, i.version) for i in pip.get_installed_distributions()])
print("--")
print(sys.version)
print("--")
print(installed_packages_list)
--
3.4.8 (default, May 30 2018, 11:05:04)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-18)]
--
['pip==9.0.1', 'setuptools==28.8.0']
Update 1: Running pip install [package name] actually leads to two problems:
1) HDP is pointing at Python 2.6 rather than Python 3.4.8.
2) pip3 is missing for some reason.
Therefore, I am thinking of installing Miniconda, pointing Zeppelin at it, and installing all the packages through conda to prevent conflicts between Python 2.6 and 3.4.8.
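A rough sketch of that Miniconda route (the install prefix, environment name, and example packages are my assumptions, and the download URL may change over time):

# Install Miniconda into a dedicated prefix (illustrative path)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O /tmp/miniconda.sh
bash /tmp/miniconda.sh -b -p /opt/miniconda3

# Create an isolated environment with the libraries you need (example packages)
/opt/miniconda3/bin/conda create -y -n zeppelin python numpy pandas

# Then, in the spark2 interpreter settings, set
#   zeppelin.pyspark.python = /opt/miniconda3/envs/zeppelin/bin/python
# and restart the interpreter.

This keeps the conda environment completely separate from both the system Python 2.6 and the hand-built 3.4.8.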
You need to open your terminal, type pip, and press the TAB key. The pip versions available on your sandbox will be listed. Use pip3 to install the packages you require; the command form stays the same: pip3 install "packageName". This makes the package available to the Python 3 installation you wish to use in Zeppelin.
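For example, from the sandbox shell (packageName is a placeholder; the python3.4 -m pip form is my addition, which removes any ambiguity about which interpreter the package lands in):

# List the pip commands available on the box (same effect as pressing TAB)
compgen -c pip

# Install into the Python 3 site-packages
pip3 install "packageName"

# Equivalent, but explicit about the target interpreter
/usr/local/Python-3.4.8/bin/python3.4 -m pip install "packageName"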
This was painful for us. The workaround that works is:
1) Install the packages from the terminal, using pip or pip3 accordingly.
2) zeppelin.pyspark.python on the spark interpreter is set to: python. This python did not recognize the packages we had installed using the terminal. We had to update zeppelin.pyspark.python to /usr/bin/python (the path to the python command; you can get it using 'which python').
Now the interpreter and Zeppelin notebooks were able to access all the packages we installed from the terminal.
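Put together, the workaround looks roughly like this (packageName is a placeholder, and the paths may differ on your box):

# 1) From the terminal, install with the pip that matches the target Python
pip3 install "packageName"

# 2) Find the full path of the python those packages belong to
which python    # e.g. /usr/bin/python

# 3) In the Zeppelin UI, open Interpreter -> spark2 and set
#      zeppelin.pyspark.python = /usr/bin/python   (the path printed above)
#    then save and restart the interpreter before re-running the notebook.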