Reproducing the issue
I had TensorFlow running a few days ago, but it has since stopped working. When I test it with the tutorial code, both mnist_softmax and mnist_deep fail. TensorFlow can still run the simple helloworld code.
What I've tried
- As with delton137, I've tried setting allow_growth to True or per_process_gpu_memory_fraction to 0.1, but this does not help.
- I've tried reinstalling my cudnn files.
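For reference, here is a minimal sketch of how these two GPU options are usually set in TensorFlow 1.x. The helper name make_gpu_session is hypothetical and is not part of the tutorial scripts. The sketch assumes the 1.x API (tf.ConfigProto and tf.Session), and neither setting fixed the failures reported here.

```python
def make_gpu_session(allow_growth=True, memory_fraction=None):
    """Create a TensorFlow 1.x session with explicit GPU memory options.

    make_gpu_session is a hypothetical helper name used for illustration.
    """
    # Imported inside the function so the helper can be defined
    # even on a machine where TensorFlow is not installed.
    import tensorflow as tf

    config = tf.ConfigProto()
    # Allocate GPU memory on demand instead of reserving it all up front.
    config.gpu_options.allow_growth = allow_growth
    if memory_fraction is not None:
        # Cap this process at a fraction of the GPU's total memory, e.g. 0.1.
        config.gpu_options.per_process_gpu_memory_fraction = memory_fraction
    return tf.Session(config=config)
```

Calling make_gpu_session(allow_growth=True) or make_gpu_session(allow_growth=False, memory_fraction=0.1) corresponds to the two attempts described above.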
Additional notes
I don't remember making any changes to my TensorFlow installation or my CUDA/cuDNN setup, so my best guess is that this might be an issue with a driver that auto-updated.
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No. Issue is reproducible using code from tensorflow tutorials.
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04.3 LTS
- TensorFlow installed from (source or binary): source
- TensorFlow version (use command below): v1.3.0-rc2-20-g0787eee 1.3.0
- Python version: Python 3.5.2 (default, Aug 18 2017, 17:48:00)
- Bazel version (if compiling from source): N/A
- CUDA/cuDNN version: CUDA release 8.0, V8.0.61 / libcudnn.so.6.0.21
- GPU model and memory: GeForce GTX 1080, 8GB, on 384.90 driver
Source code / logs
For the helloworld code in the REPL:
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
2017-10-26 21:56:00.418991: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-26 21:56:00.419027: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-26 21:56:00.419036: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-10-26 21:56:00.419046: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-26 21:56:00.419054: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-10-26 21:56:00.565143: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-10-26 21:56:00.565417: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 6.48GiB
2017-10-26 21:56:00.565432: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0
2017-10-26 21:56:00.565437: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0: Y
2017-10-26 21:56:00.565447: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
>>> print(sess.run(hello))
b'Hello, TensorFlow!'
For python3 mnist_deep.py
2017-10-26 21:37:56.993479: E tensorflow/stream_executor/cuda/cuda_dnn.cc:371] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2017-10-26 21:37:56.993560: E tensorflow/stream_executor/cuda/cuda_dnn.cc:338] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2017-10-26 21:37:56.993580: F tensorflow/core/kernels/conv_ops.cc:672] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)
For python3 mnist_softmax.py
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 6.50GiB
2017-10-26 21:53:16.150706: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0
2017-10-26 21:53:16.150712: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0: Y
2017-10-26 21:53:16.150723: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
2017-10-26 21:53:16.422081: E tensorflow/stream_executor/cuda/cuda_blas.cc:366] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2017-10-26 21:53:16.422132: W tensorflow/stream_executor/stream.cc:1756] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1327, in _do_call
return fn(*args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1306, in _run_fn
status, run_metadata)
File "/usr/lib/python3.5/contextlib.py", line 66, in __exit__
next(self.gen)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(100, 784), b.shape=(784, 10), m=100, n=10, k=784
[[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_arg_Placeholder_0_0/_9, Variable/read)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "mnist_softmax.py", line 78, in <module>
tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "mnist_softmax.py", line 65, in main
sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 895, in run
run_metadata_ptr)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1124, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1321, in _do_run
options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1340, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(100, 784), b.shape=(784, 10), m=100, n=10, k=784
[[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_arg_Placeholder_0_0/_9, Variable/read)]]
Caused by op 'MatMul', defined at:
File "mnist_softmax.py", line 78, in <module>
tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "mnist_softmax.py", line 42, in main
y = tf.matmul(x, W) + b
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py", line 1844, in matmul
a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 1289, in _mat_mul
transpose_b=transpose_b, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InternalError (see above for traceback): Blas GEMM launch failed : a.shape=(100, 784), b.shape=(784, 10), m=100, n=10, k=784
[[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_arg_Placeholder_0_0/_9, Variable/read)]]
Here is the output of nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90 Driver Version: 384.90 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1080 Off | 00000000:01:00.0 On | N/A |
| 34% 51C P0 35W / 180W | 1340MiB / 8110MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1250 G /usr/lib/xorg/Xorg 785MiB |
| 0 2426 G compiz 359MiB |
| 0 3840 G ...-token=44A975F4EE134A1BF9C8CD1C7223C977 107MiB |
| 0 4944 G ...-token=4F87ADEE5575E9B5125D41E08D43BF0E 83MiB |
+-----------------------------------------------------------------------------+
Try closing any sessions that are still active in other processes. See this thread:
TensorFlow: InternalError: Blas SGEMM launch failed
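One way to see which processes currently hold GPU memory is to parse the process table that nvidia-smi prints. The sketch below assumes rows shaped like the ones in the output above (GPU index, PID, type, process name, usage in MiB). parse_gpu_processes is a hypothetical helper, and the table layout may differ between driver versions.

```python
import re

# Matches a process row such as "| 0 1250 G /usr/lib/xorg/Xorg 785MiB |".
_ROW = re.compile(r"^\|\s+(\d+)\s+(\d+)\s+(\S+)\s+(.+?)\s+(\d+)MiB\s+\|\s*$")


def parse_gpu_processes(smi_text):
    """Return one dict per process row found in nvidia-smi text output."""
    rows = []
    for line in smi_text.splitlines():
        match = _ROW.match(line.strip())
        if match is None:
            continue  # header, border, or GPU summary line
        gpu, pid, kind, name, used = match.groups()
        rows.append({
            "gpu": int(gpu),
            "pid": int(pid),
            "type": kind,  # "G" = graphics, "C" = compute
            "name": name,
            "used_mib": int(used),
        })
    return rows


# Two rows copied from the nvidia-smi output in this report.
sample = """\
| GPU PID Type Process name Usage |
| 0 1250 G /usr/lib/xorg/Xorg 785MiB |
| 0 2426 G compiz 359MiB |
"""
procs = parse_gpu_processes(sample)
compute = [p for p in procs if "C" in p["type"]]
print(sum(p["used_mib"] for p in procs))  # 1144
print(len(compute))                       # 0
```

Another TensorFlow session would normally appear as a type C (compute) row. In the output posted above, all four rows are type G, so any leftover session would have to be checked for at the time the scripts fail.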