I've just built a deep learning rig (AMD 12-core Threadripper, GeForce RTX 2080 Ti, 64 GB RAM). I originally wanted to install cuDNN and CUDA on Ubuntu 19.0, but the installation was too painful, and after reading around a bit I decided to switch to Windows 10...
After several installs of tensorflow-gpu, both inside and outside conda environments, I ran into further issues that I assumed were down to CUDA/cuDNN/TensorFlow version compatibility, so I uninstalled various versions of CUDA and TF. My output from nvcc --version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:04_Central_Daylight_Time_2018
Cuda compilation tools, release 10.0, V10.0.130
I've also attached the nvidia-smi output (which shows CUDA 11.0?!).
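Since three different CUDA versions show up here (10.0 from nvcc, 10.1 from TensorFlow's build info further down, 11.0 from nvidia-smi), it can help to pull the version out programmatically and line them up. A minimal sketch, assuming nvcc is on PATH and prints the "release X.Y" wording shown above; parse_nvcc_release and installed_nvcc_release are hypothetical helper names:

```python
import re
import subprocess
from typing import Optional


def parse_nvcc_release(text: str) -> Optional[str]:
    """Return the 'X.Y' release number from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+)", text)
    return match.group(1) if match else None


def installed_nvcc_release() -> Optional[str]:
    """Run nvcc (assumed to be on PATH) and parse its release number."""
    try:
        result = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None  # nvcc missing from PATH, or it exited with an error
    return parse_nvcc_release(result.stdout)
```

The result can then be compared with tf_build_info.cuda_version_number from the script below.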
I also have:
import tensorflow as tf
import keras
from keras import backend
from tensorflow.python.client import device_lib
from tensorflow.python.platform import build_info as tf_build_info

print("My device: {}".format(device_lib.list_local_devices()))
if tf.test.gpu_device_name():
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))
else:
    print("Please install GPU version of TF")
print("keras version: {0} | Backend used: {1}".format(keras.__version__, backend.backend()))
print("tensorflow version: {0} | Backend used: {1}".format(tf.__version__, backend.backend()))
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
print("CUDA: {0} | CUDnn: {1}".format(tf_build_info.cuda_version_number, tf_build_info.cudnn_version_number))
with output:
My device: [name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 12853915229880452239
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 9104897474
locality {
  bus_id: 1
  links {
  }
}
incarnation: 7328135816345461398
physical_device_desc: "device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:42:00.0, compute capability: 7.5"
]
Default GPU Device: /device:GPU:0
keras version: 2.3.1 | Backend used: tensorflow
tensorflow version: 2.1.0 | Backend used: tensorflow
Num GPUs Available: 1
CUDA: 10.1 | CUDnn: 7
So (I hope) my installation has at least partly worked. What I still don't know is whether the GPU is actually being used for training, or whether it is merely recognised while the CPU does the work. How can I tell which is the case?
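One rough way to tell the two cases apart is to watch GPU utilisation from a separate terminal while a model is fitting: if it stays near 0%, the work is most likely happening on the CPU. Memory use alone is a weaker signal, since TensorFlow reserves most GPU memory up front by default. A minimal sketch, assuming nvidia-smi is on PATH; parse_gpu_stats and poll_gpu are hypothetical helper names, while the --query-gpu fields are standard nvidia-smi options:

```python
import subprocess
import time
from typing import List, Tuple

# Assumes nvidia-smi is on PATH; one CSV line per GPU, values without units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(csv_text: str) -> List[Tuple[int, int, int]]:
    """Parse lines like '87, 9000, 11264' into (util %, used MiB, total MiB)."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, used, total = (int(field.strip()) for field in line.split(","))
        stats.append((util, used, total))
    return stats


def poll_gpu(seconds: float = 10.0, interval: float = 1.0) -> None:
    """Print utilisation of every GPU once per interval, for `seconds` seconds."""
    end = time.monotonic() + seconds
    while time.monotonic() < end:
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        for idx, (util, used, total) in enumerate(parse_gpu_stats(out)):
            print(f"GPU {idx}: {util}% busy, {used}/{total} MiB")
        time.sleep(interval)
```

Calling poll_gpu(seconds=30) from a separate Python session while training runs should show utilisation rise if the GPU is doing the work. Inside TensorFlow itself, calling tf.debugging.set_log_device_placement(True) before building the model should log the device each op is placed on.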
I also use PyCharm. There was a recommendation to install Visual Studio and carry out an additional step here:
5. Include cudnn.lib in your Visual Studio project.
Open the Visual Studio project and right-click on the project name.
Click Linker > Input > Additional Dependencies.
Add cudnn.lib and click OK.
I didn't do this step. I also read that I need to add the following to my environment variables, but that directory is empty on my machine:
SET PATH=C:\tools\cuda\bin;%PATH%
Can anyone confirm whether these steps are necessary?
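A quick way to check the PATH question is to look for the DLLs TensorFlow tries to load. A minimal sketch: the file names cudart64_101.dll and cudnn64_7.dll are an assumption based on the "CUDA: 10.1 | CUDnn: 7" build info printed above, and find_on_path is a hypothetical helper. Whether C:\tools\cuda\bin itself matters presumably depends on where cuDNN was unpacked; what counts is that the DLLs are found somewhere on PATH.

```python
import os
from typing import Dict, Iterable, Optional

# Assumed DLL names for TF 2.1 built against CUDA 10.1 / cuDNN 7 on Windows.
REQUIRED_DLLS = ("cudart64_101.dll", "cudnn64_7.dll")


def find_on_path(
    names: Iterable[str], path: Optional[str] = None
) -> Dict[str, Optional[str]]:
    """Map each file name to the first PATH directory containing it, or None."""
    search = path if path is not None else os.environ.get("PATH", "")
    dirs = search.split(os.pathsep)
    found: Dict[str, Optional[str]] = {}
    for name in names:
        found[name] = next(
            (d for d in dirs if d and os.path.isfile(os.path.join(d, name))), None
        )
    return found
```

Running find_on_path(REQUIRED_DLLS) on the rig would show which of the two DLLs, if any, Windows can currently find.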
Also, one of my Keras models requires a hyperparameter search:
from sklearn.model_selection import GridSearchCV, KFold

grid = GridSearchCV(estimator=model,
                    param_grid=param_grids,
                    n_jobs=-1,  # -1 uses all CPU cores
                    cv=KFold(),
                    verbose=10)
grid_result = grid.fit(X_standardized, Y)
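For reference, n_jobs=-1 follows joblib's convention, which scikit-learn uses: negative values count back from the number of CPUs, so -1 means all of them and -2 all but one. The sketch below reimplements that rule for illustration only; resolve_n_jobs is a hypothetical helper, not a scikit-learn or joblib function. joblib's default backend runs these jobs in separate worker processes, so on the rig each worker presumably imports its own copy of Keras/TensorFlow.

```python
import os
from typing import Optional


def resolve_n_jobs(n_jobs: int, n_cpus: Optional[int] = None) -> int:
    """Mimic joblib's rule: -1 = all CPUs, -2 = all but one, and so on."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1  # logical cores, e.g. with SMT enabled
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # Negative values count back from the CPU total, never below one worker.
        return max(1, n_cpus + 1 + n_jobs)
    return n_jobs
```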
This works fine on my MacBook Pro (MBP), assuming, of course, that n_jobs=-1 uses all CPU cores. On my deep learning rig, however, I get these errors:
ERROR: The process with PID 5156 (child process of PID 1184) could not be terminated.
Reason: Access is denied.
ERROR: The process with PID 1184 (child process of PID 6920) could not be terminated.
Reason: There is no running instance of the task.
2020-03-28 20:29:48.598918: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-03-28 20:29:48.599348: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-03-28 20:29:48.599655: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-03-28 20:29:48.603023: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-03-28 20:29:48.603649: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-03-28 20:29:48.604236: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-03-28 20:29:48.604773: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-03-28 20:29:48.605524: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-03-28 20:29:48.608151: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-03-28 20:29:48.608369: W tensorflow/stream_executor/stream.cc:2041] attempting to perform BLAS operation using StreamExecutor without BLAS support
2020-03-28 20:29:48.608559: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Internal: Blas GEMM launch failed : a.shape=(10, 8), b.shape=(8, 4), m=10, n=4, k=8
[[{{node dense_1/MatMul}}]]
C:\Users\me\PycharmProjects\untitled\venv\lib\site-packages\sklearn\model_selection\_validation.py:536: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details:
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(10, 8), b.shape=(8, 4), m=10, n=4, k=8
[[node dense_1/MatMul (defined at C:\Users\me\PycharmProjects\untitled\venv\lib\site-packages\keras\backend\tensorflow_backend.py:3009) ]] [Op:__inference_keras_scratch_graph_982]
Can I assume that GridSearchCV uses only the CPU, and not the GPU? Even so, when I time another method in my code, the MBP (2.8 GHz Intel Core i7) takes approximately 40 s, while the desktop (12-core Threadripper) takes approximately 43 s. Comparing the CPUs alone, I'd expect the desktop to be far faster than the MBP. Is my assumption wrong, then?
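To make the 40 s vs 43 s comparison less sensitive to one-off costs (first-call start-up, caching), the same function can be timed several times on each machine and the fastest run kept. A minimal sketch using the standard-library timeit module; best_time is a hypothetical helper name:

```python
import timeit
from typing import Callable


def best_time(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the fastest of `repeats` single calls of fn, in seconds."""
    # Taking the minimum filters out runs slowed by one-off noise.
    return min(timeit.repeat(fn, repeat=repeats, number=1))
```

For example, best_time(lambda: run_method(X_standardized, Y)), where run_method stands in for whichever method is being timed.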