TensorFlow: Failed to create session

Posted 2020-07-13 01:08

I get the following error when I run my code:

tensorflow.python.framework.errors_impl.InternalError: Failed to create session.

Here is my code:

# -*- coding: utf-8 -*-
import os
import shutil

import numpy as np
import tensorflow as tf
from PIL import Image

import color_inference  # the asker's own model definition module (not shown)

checkpoint='/home/vrview/tensorflow/example/char/data/model/'
MODEL_SAVE_PATH = "/home/vrview/tensorflow/example/char/data/model/"

def getAllImages(folder):
    assert os.path.exists(folder)
    assert os.path.isdir(folder)
    imageList = os.listdir(folder)
    imageList = [os.path.join(folder, item) for item in imageList]
    num = len(imageList)
    return imageList, num

def get_labei():
    img_dir, num = getAllImages(r"/home/vrview/tensorflow/example/char/data/model/file/")
    for i in range(num):
        # Force 3 channels so the reshape to [1, 56, 56, 3] below always fits
        image = Image.open(img_dir[i]).convert('RGB')
        image = image.resize((56, 56))
        image_array = np.array(image)

        max_index = None
        with tf.Graph().as_default():
            image = tf.cast(image_array, tf.float32)
            image_1 = tf.image.per_image_standardization(image)
            image_2 = tf.reshape(image_1, [1, 56, 56, 3])

            logit = color_inference.inference(image_2)
            y = tf.nn.softmax(logit)

            saver = tf.train.Saver()
            with tf.Session() as sess:
                ckpt = tf.train.get_checkpoint_state(MODEL_SAVE_PATH)
                if ckpt and ckpt.model_checkpoint_path:
                    global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1]
                    saver.restore(sess, ckpt.model_checkpoint_path)
                    print('Loading success, global_step is %s' % global_step)
                    prediction = sess.run(y)
                    max_index = np.argmax(prediction)
                else:
                    print('No checkpoint file found')

        if max_index is None:
            continue  # no prediction, so there is no target folder for this image

        path = '/home/vrview/tensorflow/example/char/data/move_file/' + str(max_index)
        if not os.path.exists(path):
            os.makedirs(path)
        # copyfile() needs a destination file path; copy() also accepts a directory
        shutil.copy(img_dir[i], path)

def main(argv=None):
    get_labei()

if __name__ == '__main__':
    tf.app.run()

And here is my error:

Traceback (most recent call last):
  File "/home/vrview/tensorflow/example/char/data/model/color_class_2.py", line 61, in <module>
    tf.app.run()
  File "/home/vrview/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/home/vrview/tensorflow/example/char/data/model/color_class_2.py", line 58, in main
    get_labei()
  File "/home/vrview/tensorflow/example/char/data/model/color_class_2.py", line 40, in get_labei
    with tf.Session() as sess:
  File "/home/vrview/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1187, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/home/vrview/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 552, in __init__
    self._session = tf_session.TF_NewDeprecatedSession(opts, status)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/home/vrview/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Failed to create session.

7 answers
够拽才男人
Post #2 · 2020-07-13 01:40

Maybe you are out of GPU memory? Try running with:

export CUDA_VISIBLE_DEVICES=''

Also, please provide details about your platform (operating system, architecture) and your TensorFlow version.

Are you able to create a simple session from the Python console? Something like this:

import tensorflow as tf
hello = tf.constant('hi,tensorflow')
sess = tf.Session()
print(sess.run(hello))
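The export above changes the environment of the whole shell session. If you only want a single run to ignore the GPU, a minimal Python sketch can set the variable for one child process instead. It assumes Python 3.7+ (for capture_output and text), and run_cpu_only is a hypothetical helper name, not part of TensorFlow:

```python
import os
import subprocess
import sys


def run_cpu_only(args):
    """Run a command with every GPU hidden from CUDA.

    Only the child process sees CUDA_VISIBLE_DEVICES=''; the current
    Python process and the parent shell keep their original environment.
    """
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = ""  # empty device list: CUDA reports no GPUs
    return subprocess.run(args, env=env, capture_output=True, text=True)
```

For example, run_cpu_only([sys.executable, "color_class_2.py"]) would re-run the script from the traceback without a GPU. If the session is created successfully that way, GPU memory or the GPU setup is the likely culprit.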
Ridiculous、
Post #3 · 2020-07-13 01:43

In my case, reverting to TensorFlow 1.9.0, as was suggested here, helped (Anaconda had installed version 1.10.0). It automatically installs the matching CUDA version (9.0 instead of 9.2, if I remember correctly). Downgrading is simple in Anaconda:

conda install tensorflow=1.9.0

That worked for me. This setup works with Keras 2.2.2.
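After a downgrade like this, it can help to fail fast if a different TensorFlow build creeps back in, for example after an environment update. A minimal sketch: version_tuple and require_version are hypothetical helpers, and you would pass in tf.__version__. It compares versions numerically, because plain string comparison wrongly orders "1.10.0" before "1.9.0":

```python
import re


def version_tuple(version):
    """Turn '1.10.0' (or '1.9.0rc1') into (1, 10, 0) for numeric comparison.

    Suffixes such as 'rc1' are ignored, so a release candidate compares
    equal to the final release with the same numbers.
    """
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:  # stop at a component with no leading digits
            break
        parts.append(int(match.group()))
    return tuple(parts)


def require_version(installed, pinned):
    """Raise if the installed version differs from the pinned one."""
    if version_tuple(installed) != version_tuple(pinned):
        raise RuntimeError(
            "TensorFlow %s is installed, but this setup was tested with %s"
            % (installed, pinned)
        )
```

Usage: after import tensorflow as tf, call require_version(tf.__version__, "1.9.0") at the top of the script.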

孤傲高冷的网名
Post #4 · 2020-07-13 01:46

In the case I just solved, the fix was updating the GPU driver to the latest version and installing the CUDA toolkit. First, add the PPA and install the GPU driver:

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt install nvidia-390

After the PPA was added, several driver versions were offered; 390 was the latest 'stable' one listed.

Then install the CUDA toolkit:

sudo apt install nvidia-cuda-toolkit

Then reboot:

sudo reboot

This updated the driver to a newer version than the 390 installed in the first step (it ended up at 410; this was a p2.xlarge instance on AWS).
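To confirm which driver is actually active after the reboot, you can ask nvidia-smi directly. A minimal sketch, assuming nvidia-smi is on PATH (it ships with the NVIDIA driver) and Python 3.7+; driver_versions, parse_driver_versions and driver_major are hypothetical helpers, and driver_versions returns an empty list when no driver can be queried:

```python
import subprocess


def parse_driver_versions(output):
    """Split nvidia-smi CSV output (one line per GPU) into version strings."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def driver_versions():
    """Return the driver version reported for each GPU, or [] if unavailable."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # nvidia-smi is missing, or it failed because no driver is loaded
        return []
    return parse_driver_versions(result.stdout)


def driver_major(version):
    """'410.48' -> 410, handy for comparing against the 390 series."""
    return int(version.split(".")[0])
```

An empty result after the reboot suggests the driver did not load at all, which would also explain TensorFlow failing to create a GPU session.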

一纸荒年 Trace。
Post #5 · 2020-07-13 01:51

This happened to me when a separate TensorFlow session was running in another terminal. Closing that terminal fixed it.
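To find such a leftover process without hunting through terminals, nvidia-smi can list the processes that currently hold GPU memory. A minimal sketch, assuming nvidia-smi is on PATH and Python 3.7+; gpu_processes and parse_compute_apps are hypothetical helpers:

```python
import subprocess


def parse_compute_apps(output):
    """Turn 'pid, process_name, used_memory' CSV lines into tuples."""
    apps = []
    for line in output.splitlines():
        if line.count(",") < 2:
            continue  # blank line or an informational message
        pid, rest = line.split(",", 1)
        name, memory = rest.rsplit(",", 1)  # the name itself may contain commas
        if not pid.strip().isdigit():
            continue
        apps.append((int(pid), name.strip(), memory.strip()))
    return apps


def gpu_processes():
    """Return (pid, name, used_memory) for every process using a GPU."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-compute-apps=pid,process_name,used_memory",
             "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return []  # no nvidia-smi, or no usable driver
    return parse_compute_apps(result.stdout)
```

A python process in this list that you did not expect is a good candidate for the stray session.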

男人必须洒脱
Post #6 · 2020-07-13 01:52

After you execute

export CUDA_VISIBLE_DEVICES=''

TensorFlow can no longer see any GPU, so it will train the model on the CPU only.

You can find a better solution here; it doesn't require a restart, and you can apply it on a server.
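One commonly used alternative that keeps the GPU available (whether it is the solution this answer refers to is not known) is to let TensorFlow allocate GPU memory on demand instead of reserving nearly all of it up front, which TF 1.x does by default. A minimal sketch for the TF 1.x session API used in the question; the compat.v1 lookup is an assumption added so the same code also runs on newer releases, and make_gpu_config is a hypothetical helper:

```python
import tensorflow as tf

# TF >= 1.13 and 2.x expose the 1.x API under tf.compat.v1; older 1.x
# releases (like the one in the question) have it at the top level.
tf1 = getattr(getattr(tf, "compat", None), "v1", tf)


def make_gpu_config(memory_fraction=None):
    """Session config that grows GPU memory use only as needed."""
    config = tf1.ConfigProto()
    config.gpu_options.allow_growth = True  # allocate incrementally
    if memory_fraction is not None:
        # optional hard cap, as a fraction of each visible GPU's memory
        config.gpu_options.per_process_gpu_memory_fraction = memory_fraction
    return config
```

In the question's code this would mean replacing tf.Session() with tf.Session(config=make_gpu_config()). Note that it only helps when the memory is free to be claimed; it cannot reclaim memory held by another process.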

Deceive 欺骗
Post #7 · 2020-07-13 01:53

Are you using a GPU? If so, it may simply be out of GPU memory because a previous process was not killed.

This issue helped me identify the problem: https://github.com/tensorflow/tensorflow/issues/9549

To watch your GPU status, run this in a terminal; it refreshes the GPU stats every 2 seconds:

nvidia-smi -l 2

This post shows how to kill the processes currently taking up all of your GPU memory: https://www.quora.com/How-do-I-kill-all-the-computer-processes-shown-in-nvidia-smi
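Once you know the PID of the stale process (the PID column in the nvidia-smi table), you can also stop it from Python. A minimal sketch for Linux/macOS: stop_processes is a hypothetical helper that sends SIGTERM (like a plain kill PID), skips the current process, and ignores PIDs that have already exited. You still need permission to signal the target process; a PermissionError is deliberately left to propagate:

```python
import os
import signal


def stop_processes(pids):
    """Send SIGTERM to each PID except our own; return the PIDs signalled."""
    signalled = []
    for pid in pids:
        if pid == os.getpid():
            continue  # never kill the process doing the cleanup
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            continue  # the process already exited
        signalled.append(pid)
    return signalled
```

SIGTERM lets the process shut down cleanly and release its GPU memory; check nvidia-smi again afterwards before resorting to kill -9.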
