This Keras model seems to require 6GB+ of RAM using the TensorFlow backend. My back-of-the-envelope math suggests that storing the weights shouldn't require more than 500MB. What's going on?
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Dropout, Flatten
from keras.layers.convolutional import Convolution2D, MaxPooling2D
IMAGE_SIZE = 128
print('Build model...')
model = Sequential()
# three color channels, 128x128
# 16 conv filters, 3 rows, 3 columns
model.add(Convolution2D(16, 3, 3, input_shape=(3, IMAGE_SIZE, IMAGE_SIZE)))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(1))
model.add(Dense(3 * IMAGE_SIZE * IMAGE_SIZE))
model.compile(loss='mse', optimizer='sgd')
It's a convolution layer (16 3x3 filters) connected to a single neuron, and then that single neuron is connected to ~50k neurons.
I'm pretty new to Keras, so I imagine my misunderstanding is pretty fundamental, but I can't seem to figure it out.
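For reference, the weight count can be tallied by hand. This is only a rough sketch: it assumes the convolution uses no padding and a stride of 1 (so a 128x128 input becomes 126x126), and that every weight is a 4-byte float32. The constant and variable names are made up for illustration.

```python
# Back-of-the-envelope parameter count for the model above.
# Assumptions: no padding, stride 1, float32 weights.
IMAGE_SIZE = 128
CHANNELS = 3
FILTERS = 16
KERNEL = 3
BYTES_PER_FLOAT32 = 4

# Convolution: one 3x3 kernel per (filter, input channel) pair, plus one bias per filter.
conv_params = FILTERS * CHANNELS * KERNEL * KERNEL + FILTERS

# Without padding, each spatial dimension shrinks by KERNEL - 1.
conv_out = IMAGE_SIZE - KERNEL + 1
flattened = FILTERS * conv_out * conv_out

# Dense(1): one weight per flattened input, plus one bias.
dense1_params = flattened * 1 + 1

# Dense(3 * 128 * 128): one weight and one bias per output unit.
output_units = CHANNELS * IMAGE_SIZE * IMAGE_SIZE
dense2_params = 1 * output_units + output_units

total_params = conv_params + dense1_params + dense2_params
weight_megabytes = total_params * BYTES_PER_FLOAT32 / 1e6

print("total parameters:", total_params)
print("approx. weight storage (MB):", round(weight_megabytes, 2))
```

Under these assumptions the weights come to roughly 350k parameters, on the order of a megabyte or two, so weight storage alone is nowhere near 6GB.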
Turns out, my issue was including a path to CUDA 7.5 in my LD_LIBRARY_PATH, but including a path to CUDA 7.0 in PATH. Apparently this awkward combination triggers undefined behavior, which in my case produced a memory leak.
After running the code under valgrind, I found that the nvcc from 7.0 was essentially jumping into nonsense areas of the CUDA 7.5 library, which is not unexpected. It's actually pretty amazing that it leaked memory instead of just crashing, and that Theano had the same error.
Hopefully no one else will be plagued by this particular issue in the future, but if you are, double check your version paths!
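A quick way to double-check the version paths is to compare the CUDA versions named in PATH with those named in the library search path, assumed here to be `LD_LIBRARY_PATH`. The sketch below is a hypothetical helper, not part of any CUDA tooling. It only recognizes directory names of the form `cuda-7.5` or `cuda7.5`, so an unversioned symlink such as `/usr/local/cuda` will not be detected.

```python
import os
import re

# Matches versioned directory names such as "cuda-7.5" or "cuda7.5".
CUDA_VERSION = re.compile(r"cuda-?(\d+\.\d+)", re.IGNORECASE)


def cuda_versions(path_value):
    """Return the set of CUDA versions named in a path-list string."""
    versions = set()
    for entry in path_value.split(os.pathsep):
        match = CUDA_VERSION.search(entry)
        if match:
            versions.add(match.group(1))
    return versions


def find_mismatch(path, ld_library_path):
    """Return (versions_in_path, versions_in_ld) if they differ, else None."""
    in_path = cuda_versions(path)
    in_ld = cuda_versions(ld_library_path)
    if in_path and in_ld and in_path != in_ld:
        return in_path, in_ld
    return None


if __name__ == "__main__":
    mismatch = find_mismatch(
        os.environ.get("PATH", ""),
        os.environ.get("LD_LIBRARY_PATH", ""),
    )
    if mismatch:
        print("CUDA version mismatch: PATH has %s, LD_LIBRARY_PATH has %s" % mismatch)
    else:
        print("No versioned CUDA mismatch found.")
```

If this reports a mismatch, pointing both variables at the same CUDA install is the first thing to try.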
On my local machine, without a GPU-enabled TensorFlow installed, I still got the memory leak, which appeared to be a bug in the previous (0.7.0) version that has been resolved in the 0.7.1 release. Again, I haven't figured out why my non-GPU Theano backend also produced the leak, but after upgrading TensorFlow, the Theano backend doesn't leak either. It's a very strange thing, but I believe the general solution to this problem is "upgrade" and "double-check your env".
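To confirm that the installed TensorFlow is at least 0.7.1, something like the following might help. It is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+), and it assumes the package is registered under the distribution name `tensorflow`. The helper names are illustrative only.

```python
import re
from importlib import metadata


def parse_version(text):
    """Turn '0.7.1' into (0, 7, 1), keeping only the leading digits of each part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_at_least(installed, minimum):
    """True if the installed version is the same as or newer than the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("tensorflow")  # assumed distribution name
    if version is None:
        print("tensorflow is not installed")
    elif is_at_least(version, "0.7.1"):
        print("tensorflow %s is new enough" % version)
    else:
        print("tensorflow %s predates 0.7.1; consider upgrading" % version)
```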