I tried to follow the instructions in Easy Installation of an Optimized Theano on Current Ubuntu, but they don't work: whenever I run a Theano script using the GPU, I get the error message:
CUDA is installed, but device gpu is not available (error: Unable to get the number of gpus available: no CUDA-capable device is detected)
More specifically, following the instructions in the linked webpage, I executed the following steps:
# Install Theano
sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git
sudo pip install Theano
# Install Nvidia drivers and CUDA
sudo apt-get install nvidia-current
sudo apt-get install nvidia-cuda-toolkit
Then I rebooted and tried running:
THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python gpu_test.py # gpu_test.py comes from http://deeplearning.net/software/theano/tutorial/using_gpu.html
But I get:
f@f-Aurora-R4:~$ THEANO_FLAGS='mode=FAST_RUN,device=gpu,floatX=float32,cuda.root=/usr/lib/nvidia-cuda-toolkit' python gpu_test.py
WARNING (theano.sandbox.cuda): CUDA is installed, but device gpu is not available (error: Unable to get the number of gpus available: no CUDA-capable device is detected)
[Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
Looping 1000 times took 2.199992 seconds
Result is [ 1.23178029 1.61879337 1.52278066 ..., 2.20771813 2.29967761 1.62323284]
Used the cpu
(I tested the following on Ubuntu 14.04.4 LTS x64 and Kubuntu 14.04.4 LTS x64; I guess it should work on most Ubuntu variants.)
Installing Theano and configuring the GPU (CUDA)
The instructions on the official website are outdated. Instead, you can use the following instructions (assuming a freshly installed Kubuntu 14.04 LTS x64):
# Install Theano
sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git
sudo pip install Theano
# Install Nvidia drivers, CUDA and CUDA toolkit, following some instructions from http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
wget http://developer.download.nvidia.com/compute/cuda/7.5/Prod/local_installers/cuda-repo-ubuntu1404-7-5-local_7.5-18_amd64.deb # Got the link at https://developer.nvidia.com/cuda-downloads
sudo dpkg -i cuda-repo-ubuntu1404-7-5-local_7.5-18_amd64.deb
sudo apt-get update
sudo apt-get install cuda
sudo reboot
At that point, running nvidia-smi should work, but running nvcc won't work.
# Execute in the console, or add to ~/.bash_profile and then run "source ~/.bash_profile":
export PATH=/usr/local/cuda-7.5/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-7.5/lib64:$LD_LIBRARY_PATH
At that point, both nvidia-smi and nvcc should work.
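The same check can be scripted. The following is only a minimal sketch: find_on_path is a hypothetical helper (not part of Theano or CUDA) that walks the directories listed in PATH and reports whether nvidia-smi and nvcc can be found there. It uses only the Python standard library.

```python
import os


def find_on_path(name, path=None):
    """Return the full path of executable `name`, or None if it is not on PATH.

    `path` defaults to the PATH environment variable; passing it explicitly
    makes the helper easy to test.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        # Accept only regular files that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    for tool in ("nvidia-smi", "nvcc"):
        print("%s: %s" % (tool, find_on_path(tool) or "not found on PATH"))
```

If nvcc is reported as not found, the export PATH line above has probably not been applied to the current shell.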
To test whether Theano is able to use the GPU:
Copy-paste the following into gpu_test.py:
# Start gpu_test.py
# From http://deeplearning.net/software/theano/tutorial/using_gpu.html#using-gpu
from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time
vlen = 10 * 30 * 768 # 10 x #cores x # threads per core
iters = 1000
rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in xrange(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')
# End gpu_test.py
and run it:
THEANO_FLAGS='mode=FAST_RUN,device=gpu,floatX=float32' python gpu_test.py
which should return:
f@f-Aurora-R4:~$ THEANO_FLAGS='mode=FAST_RUN,device=gpu,floatX=float32' python gpu_test.py
Using gpu device 0: GeForce GTX 690
[GpuElemwise{exp,no_inplace}(<CudaNdarrayType(float32, vector)>), HostFromGpu(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 0.658292 seconds
Result is [ 1.23178029 1.61879349 1.52278066 ..., 2.20771813 2.29967761
1.62323296]
Used the gpu
To know your CUDA version:
nvcc -V
Example:
username@server:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17
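If you want to read the version from a script, the release number can be pulled out of that output with a regular expression. This is a rough sketch: parse_nvcc_release is a hypothetical helper, and the pattern assumes the "release X.Y, VX.Y.Z" wording shown in the example above, which other CUDA versions may format differently.

```python
import re


def parse_nvcc_release(text):
    """Extract (release, build) from `nvcc -V` output, or None if not found."""
    match = re.search(r"release (\d+\.\d+), V(\d+(?:\.\d+)*)", text)
    if match is None:
        return None
    return match.group(1), match.group(2)


# Sample text copied from the example output above.
sample = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17"""

print(parse_nvcc_release(sample))  # ('7.5', '7.5.17')
```

In practice the text would come from something like subprocess.check_output(["nvcc", "-V"]), decoded to a string first on Python 3.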
Adding cuDNN
To add cuDNN (instructions from http://deeplearning.net/software/theano/library/sandbox/cuda/dnn.html):
- Download cuDNN from https://developer.nvidia.com/rdp/cudnn-download (need registration, which is free)
tar -xvf cudnn-7.0-linux-x64-v3.0-prod.tgz
- Do one of the following
Option 1: Copy the *.h files to CUDA_ROOT/include and the *.so* files to CUDA_ROOT/lib64 (by default, CUDA_ROOT is /usr/local/cuda on Linux).
sudo cp cuda/lib64/* /usr/local/cuda/lib64/
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
Option 2:
export LD_LIBRARY_PATH=/home/user/path_to_CUDNN_folder/lib64:$LD_LIBRARY_PATH
export CPATH=/home/user/path_to_CUDNN_folder/include:$CPATH
export LIBRARY_PATH=/home/user/path_to_CUDNN_folder/lib64:$LIBRARY_PATH
By default, Theano will detect if it can use cuDNN. If so, it will use it. If not, Theano optimizations will not introduce cuDNN ops. So Theano will still work if the user did not introduce them manually.
To get an error if Theano cannot use cuDNN, use this Theano flag: optimizer_including=cudnn.
Example:
THEANO_FLAGS='mode=FAST_RUN,device=gpu,floatX=float32,optimizer_including=cudnn' python gpu_test.py
To know your cuDNN version:
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
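The same information can be read from Python. The sketch below assumes that cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL as plain integer macros, which is what the grep above relies on. cudnn_version is a hypothetical helper and the header excerpt is made up for illustration.

```python
import re


def cudnn_version(header_text):
    """Return (major, minor, patch) read from the text of cudnn.h, or None."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#define\s+%s\s+(\d+)" % name
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None  # macro missing: not a header layout this sketch knows
        parts.append(int(match.group(1)))
    return tuple(parts)


# Hypothetical excerpt; a real run would read /usr/local/cuda/include/cudnn.h.
sample = (
    "#define CUDNN_MAJOR      5\n"
    "#define CUDNN_MINOR      0\n"
    "#define CUDNN_PATCHLEVEL 5\n"
)
major, minor, patch = cudnn_version(sample)
print("cuDNN %d.%d.%d" % (major, minor, patch))  # cuDNN 5.0.5
```

As far as I know, the headers of that era combine these values as major * 1000 + minor * 100 + patch, so 5.0.5 corresponds to a version number of 5005.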
Adding CNMeM
The CNMeM library is a "Simple library to help the Deep Learning frameworks manage CUDA memory".
# Build CNMeM without the unit tests
git clone https://github.com/NVIDIA/cnmem.git cnmem
cd cnmem
mkdir build
cd build
sudo apt-get install -y cmake
cmake ..
make
# Copy files to proper location
sudo cp ../include/cnmem.h /usr/local/cuda/include
sudo cp *.so /usr/local/cuda/lib64/
cd ../..
To use it with Theano, you need to add the lib.cnmem flag. Example:
THEANO_FLAGS='mode=FAST_RUN,device=gpu,floatX=float32,lib.cnmem=0.8,optimizer_including=cudnn' python gpu_test.py
The first output of the script should be:
Using gpu device 0: GeForce GTX TITAN X (CNMeM is enabled with initial size: 80.0% of memory, cuDNN 5005)
lib.cnmem=0.8 means that CNMeM starts with a memory pool covering 80% of the GPU memory, as the message above indicates.
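As the flag list grows, it can be easier to assemble it in Python than to type it on the command line. A minimal sketch, assuming Theano reads the THEANO_FLAGS environment variable when it is first imported; build_theano_flags is a hypothetical helper, not a Theano function.

```python
import os


def build_theano_flags(flags):
    """Join (name, value) pairs into the comma-separated THEANO_FLAGS format."""
    return ",".join("%s=%s" % (name, value) for name, value in flags)


# The flags used in the examples above, kept in the same order.
flags = [
    ("mode", "FAST_RUN"),
    ("device", "gpu"),
    ("floatX", "float32"),
    ("lib.cnmem", 0.8),
    ("optimizer_including", "cudnn"),
]

os.environ["THEANO_FLAGS"] = build_theano_flags(flags)
print(os.environ["THEANO_FLAGS"])

# import theano  # would have to come after THEANO_FLAGS is set
```

Setting the variable after Theano has already been imported is not expected to have any effect, which is why the import is placed last.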
CNMeM has been reported to give some interesting speed improvements, and is supported by Theano, Torch, and Caffe.
Theano - source 1:
The speed up depend of many factor, like the shapes and the model itself. The speed up go from 0 to 2x faster.
Theano - source 2:
If you don't change the Theano flag allow_gc, you can expect 20% speed up on the GPU. In some case (small models), we saw a 50% speed up.
Running Theano on multiple CPU cores
As a side note, you can run Theano on multiple CPU cores with the OMP_NUM_THREADS=[number_of_cpu_cores] flag. Example:
OMP_NUM_THREADS=4 python gpu_test.py
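If you launch the script from Python rather than from the shell, the variable can be set in the child's environment. This is only a sketch: omp_env is a hypothetical helper, and defaulting to multiprocessing.cpu_count() is my assumption (it counts logical cores, which may be more than the number of physical cores).

```python
import multiprocessing
import os


def omp_env(num_threads=None, base_env=None):
    """Return a copy of the environment with OMP_NUM_THREADS set.

    `num_threads` defaults to the number of logical CPU cores.
    """
    env = dict(os.environ if base_env is None else base_env)
    if num_threads is None:
        num_threads = multiprocessing.cpu_count()
    env["OMP_NUM_THREADS"] = str(num_threads)
    return env


env = omp_env(4)
print(env["OMP_NUM_THREADS"])  # 4

# To launch the test script with this environment:
# import subprocess
# subprocess.call(["python", "gpu_test.py"], env=env)
```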
The script theano/misc/check_blas.py outputs information regarding which BLAS is used:
cd [theano_git_directory]
OMP_NUM_THREADS=4 python theano/misc/check_blas.py
To run Theano's test suite:
nosetests theano
or
sudo pip install nose-parameterized
python -c "import theano; theano.test()"
Common issues:
- Importing theano: AttributeError: 'module' object has no attribute 'find_graphviz'