I get an error when I run my code, the error is:
tensorflow.python.framework.errors_impl.InternalError: Failed to create session.
Here is my code:
# -*- coding: utf-8 -*-
import ...
import ...
checkpoint='/home/vrview/tensorflow/example/char/data/model/'
MODEL_SAVE_PATH = "/home/vrview/tensorflow/example/char/data/model/"
def getAllImages(folder):
assert os.path.exists(folder)
assert os.path.isdir(folder)
imageList = os.listdir(folder)
imageList = [os.path.join(folder,item) for item in imageList ]
num=len(imageList)
return imageList,num
def get_labei():
img_dir, num = getAllImages(r"/home/vrview/tensorflow/example/char/data/model/file/")
for i in range(num):
image = Image.open(img_dir[i])
image = image.resize([56, 56])
image = np.array(image)
image_array = image
with tf.Graph().as_default():
image = tf.cast(image_array, tf.float32)
image_1 = tf.image.per_image_standardization(image)
image_2 = tf.reshape(image_1, [1, 56, 56, 3])
logit = color_inference.inference(image_2)
y = tf.nn.softmax(logit)
x = tf.placeholder(tf.float32, shape=[56, 56, 3])
saver = tf.train.Saver()
with tf.Session() as sess:
ckpt = tf.train.get_checkpoint_state(MODEL_SAVE_PATH)
if ckpt and ckpt.model_checkpoint_path:
global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1]
saver.restore(sess, ckpt.model_checkpoint_path)
print('Loading success, global_step is %s' % global_step)
prediction = sess.run(y)
max_index = np.argmax(prediction)
else:
print('No checkpoint file found')
path='/home/vrview/tensorflow/example/char/data/move_file/'+str(max_index)
isExists = os.path.exists(path)
if not isExists :
os.makedirs(path)
shutil.copyfile(img_dir[i], path)
def main(argv=None):
get_labei()
if __name__ == '__main__':
tf.app.run()
And here is my error:
Traceback (most recent call last):
File "/home/vrview/tensorflow/example/char/data/model/color_class_2.py", line 61, in <module>
tf.app.run()
File "/home/vrview/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/home/vrview/tensorflow/example/char/data/model/color_class_2.py", line 58, in main
get_labei()
File "/home/vrview/tensorflow/example/char/data/model/color_class_2.py", line 40, in get_labei
with tf.Session() as sess:
File "/home/vrview/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1187, in __init__
super(Session, self).__init__(target, graph, config=config)
File "/home/vrview/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 552, in __init__
self._session = tf_session.TF_NewDeprecatedSession(opts, status)
File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
self.gen.next()
File "/home/vrview/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Failed to create session.
maybe out of GPU memory? Try running with
Also please provide details about what platform you are using (operating system, architecture). Also include your TensorFlow version.
Were you able to create a simple session from python console. Something like this:
In my case it helped to revert back to tensorflow 1.9.0 as was suggested here (Anaconda had installed version 1.10.0). It automatically installs the correct version of Cuda (9.0 instead of 9.2 out of my head). Downgrading is simple in Anaconda:
conda install tensorflow=1.9.0
That worked for me. This setup works with Keras 2.2.2.
In the case I just solved, it was updating the GPU driver to the latest and installing the cuda toolkit. First, the ppa was added and GPU driver installed:
After adding the ppa, it showed options for driver versions, and 390 was the latest 'stable' version that was shown.
Then install the cuda toolkit:
Then reboot:
It updated the drivers to a newer version than the 390 originally installed in the first step (it was 410; this was a p2.xlarge instance on AWS).
Happened to me when I had a separate Tensorflow session running in another terminal. Closing that terminal made it work.
After you execute
your tensorflow may not use GPU. It may start training the model using CPU only.
You can find a better solution here. This doesn't require any restart, and you can apply it in server.
Are you using GPU? If yes, maybe it's just simply out of GPU Memory due to the previous process failed to be killed.
This ticket helps me identify the problem: https://github.com/tensorflow/tensorflow/issues/9549
To see your GPU status: in terminal,
nvidia-smi -l 2
to update your gpu stat every 2 secondsThis post shows you how to kill the process that currently taking all the memory of your GPU: https://www.quora.com/How-do-I-kill-all-the-computer-processes-shown-in-nvidia-smi