There is a warning message that I don't understand:
Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.90GiB.
The caller indicates that this is not a failure,
but may mean that there could be performance gains if more memory is available.
What does this sentence mean? I have read the source code, but I couldn't work it out.
My GPU has 6 GB of memory, yet the memory usage reported by my tfprof analysis is about 14 GB, which is far beyond the GPU's capacity. Does this warning mean that TensorFlow falls back to allocating host (CPU) memory, or that it has some clever algorithm for managing GPU memory?
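For what it's worth, I know the session can be given GPUOptions, e.g. to let the allocator grow its pool on demand instead of reserving nearly all GPU memory up front (a minimal sketch against the TF 1.x ConfigProto API; I don't know whether either setting is related to this warning):

import tensorflow as tf

config = tf.ConfigProto()
# Grow the GPU memory pool on demand instead of reserving it all up front.
config.gpu_options.allow_growth = True
# Or cap the pool at a fraction of the physical GPU memory:
# config.gpu_options.per_process_gpu_memory_fraction = 0.9
sess = tf.Session(config=config)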
The version of TensorFlow that I use is 1.2.
The GPU information is as follows:
- name: GeForce GTX TITAN Z
- major: 3 minor: 5 memoryClockRate (GHz) 0.8755
- Total memory: 5.94GiB
- Free memory: 5.87GiB
My Code:
#!/usr/bin/python3.4
import tensorflow as tf
import tensorflow.examples.tutorials.mnist.input_data as input_data
import os
os.environ["CUDA_VISIBLE_DEVICES"] = '0'
mnist = input_data.read_data_sets("MNIST_data", one_hot=True)
sess = tf.InteractiveSession(config=tf.ConfigProto(log_device_placement=True))
#sess = tf.InteractiveSession()
def weight_variable(shape):
    init = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(init)

def bias_variable(shape):
    init = tf.constant(0.1, shape=shape)
    return tf.Variable(init)

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])
x_image = tf.reshape(x, [-1, 28, 28, 1])
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
W_f1 = weight_variable([7*7*64, 1024])
b_f1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_f1) + b_f1)
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
W_f2 = weight_variable([1024, 10])
b_f2 = bias_variable([10])
y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_f2) + b_f2)
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_conv), reduction_indices=[1]))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# Note: these two placeholders are never fed; the names are rebound to
# NumPy arrays just before the final evaluation below.
test_images = tf.placeholder(tf.float32, [None, 784])
test_labels = tf.placeholder(tf.float32, [None, 10])
tf.global_variables_initializer().run()
run_metadata = tf.RunMetadata()
for i in range(100):
    batch = mnist.train.next_batch(10000)
    if i % 10 == 0:
        train_accuracy = accuracy.eval(feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})
        print("step %d, training accuracy %g" % (i, train_accuracy))
    sess.run(train_step,
             feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5},
             options=tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE),
             run_metadata=run_metadata)
tf.contrib.tfprof.model_analyzer.print_model_analysis(
    tf.get_default_graph(),
    run_meta=run_metadata,
    tfprof_options=tf.contrib.tfprof.model_analyzer.PRINT_ALL_TIMING_MEMORY)
test_images = mnist.test.images[0:300, :]
test_labels = mnist.test.labels[0:300, :]
print("test accuracy %g" % accuracy.eval({x: test_images, y_: test_labels, keep_prob: 1.0}))
The warnings:
2017-08-10 21:37:44.589635: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.90GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-08-10 21:37:46.208897: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.61GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
The result of tfprof:
==================Model Analysis Report======================
_TFProfRoot (0B/14854.97MB, 0us/7.00ms)
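My own back-of-envelope estimate of just the forward activations (float32, batch size 10000 as in my training loop) already comes to roughly 1.9 GB, so with gradients, Adam slots, and conv workspaces on top, the ~14 GB total from tfprof does not seem implausible to me:

# Rough estimate of forward activation sizes for batch size 10000 (float32).
# Gradients and cuDNN conv workspaces come on top of this, so the actual
# peak would be several times larger.
batch = 10000
floats = {
    'h_conv1': batch * 28 * 28 * 32,
    'h_pool1': batch * 14 * 14 * 32,
    'h_conv2': batch * 14 * 14 * 64,
    'h_pool2': batch * 7 * 7 * 64,
    'h_fc1':   batch * 1024,
}
for name, n in floats.items():
    print('%-8s %8.1f MB' % (name, n * 4 / 1e6))   # h_conv1 alone is ~1003.5 MB
print('total    %8.1f MB' % (sum(floats.values()) * 4 / 1e6))

So is the warning simply telling me that an optional, best-effort allocation of 3.90 GiB could not be satisfied, and the op fell back to a slower path with less memory?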