Lasagne dropout layer does not utilize the GPU efficiently

Posted 2019-07-15 06:03

I am using Theano and Lasagne for a DNN speech enhancement project. I use a feed-forward network very similar to the mnist example in the Lasagne documentation (https://github.com/Lasagne/Lasagne/blob/master/examples/mnist.py). This network uses several dropout layers. I train the network on an Nvidia Titan X GPU. When I do not use dropout, GPU utilization is approximately 60% and one epoch takes around 60 s, but when I use dropout, GPU utilization drops to 8% and each epoch takes approximately 600 s. This happens regardless of whether the dropout rate is set to 20% or 0.1%.
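
For reference, here is a minimal sketch of the kind of network I mean. The layer sizes, input/output dimensions and dropout rate below are placeholders, not my actual configuration:

    from lasagne.layers import InputLayer, DenseLayer, DropoutLayer
    from lasagne.nonlinearities import rectify, identity

    def build_network(input_var=None, n_in=257, n_hidden=1024, n_out=257, drop_p=0.2):
        # Input: a batch of feature frames (dimensions are placeholders)
        network = InputLayer(shape=(None, n_in), input_var=input_var)
        # Two fully connected hidden layers, each followed by dropout
        for _ in range(2):
            network = DenseLayer(network, num_units=n_hidden, nonlinearity=rectify)
            network = DropoutLayer(network, p=drop_p)
        # Linear output layer (regression target for enhancement)
        return DenseLayer(network, num_units=n_out, nonlinearity=identity)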

Initially I thought this was due to the random number generator (RNG) used to generate the dropout mask not running on the GPU. However, in the code (https://github.com/Lasagne/Lasagne/blob/master/lasagne/layers/noise.py) it appears that rng_mrg is used, which should run on the GPU according to this link: http://deeplearning.net/software/theano/tutorial/examples.html#other-implementations

Running the theano profiler shows that "theano.sandbox.rng_mrg.mrg_uniform" takes up 86.7% of the execution time, which I do not understand.
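
In case it matters, this is roughly how I collect the profile. The toy graph below is only there to show the call; in practice I pass profile=True when compiling the training function of the real network:

    import numpy as np
    import theano
    import theano.tensor as T

    # Toy graph just to demonstrate profiling; the real case profiles the
    # training function built from the Lasagne network.
    x = T.matrix('x')
    y = T.tanh(x).sum()

    f = theano.function([x], y, profile=True)
    f(np.random.rand(64, 257).astype(theano.config.floatX))

    # Prints per-op timings; with the real training function this is where
    # theano.sandbox.rng_mrg.mrg_uniform shows up at 86.7%.
    f.profile.summary()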

If anyone has an idea of what is killing my GPU utilization, I would appreciate it.

2 Answers
倾城 Initia
Answer 1 · 2019-07-15 06:32

If you look at the code for mrg_uniform, you can see that it is a pure Python CPU implementation of the random generator. You can also see that there is a GPU version of that same generator, but the code you are running apparently does not use it.

So it is not so much that your GPU utilisation is going down as that your CPU utilisation is greatly increasing, because you are running a pure Python random generator. The solution is obviously to work out how to switch to a GPU-accelerated random generator.
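
You can check which op your compiled function actually contains; a quick sketch (the toy graph below just samples a uniform mask the way the dropout layer does):

    import theano
    from theano.sandbox.rng_mrg import MRG_RandomStreams

    # Sample a uniform mask the way the dropout layer does.
    srng = MRG_RandomStreams(seed=1234)
    mask = srng.uniform(size=(128, 1024))
    f = theano.function([], mask)

    # In the printed graph, "GPU_mrg_uniform" means sampling runs on the GPU;
    # plain "mrg_uniform" means it has fallen back to the CPU implementation.
    theano.printing.debugprint(f)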

不美不萌又怎样
Answer 2 · 2019-07-15 06:41

As pointed out by talonmies, the problem was that Lasagne was using the CPU version of the RNG (mrg_uniform) and not the GPU version (GPU_mrg_uniform). I have not yet found an elegant solution, but either of the following two hacks solves the problem.

Either change line 93 from cuda_enabled = False to cuda_enabled = True in

https://github.com/Theano/Theano/blob/master/theano/sandbox/cuda/__init__.py

or

change line 57 in https://github.com/Lasagne/Lasagne/blob/master/lasagne/layers/noise.py from

    self._srng = RandomStreams(get_rng().randint(1, 2147462579))

to

    self._srng = RandomStreams(get_rng().randint(1, 2147462579), use_cuda=True)

I also believe you should be able to achieve the same by calling theano.sandbox.cuda.use(enable_cuda=True) directly in the main script. However, for some reason this did not work for me.
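
The same idea can also be applied from user code without editing Lasagne's source. The sketch below relies on Lasagne's private _srng attribute (and the helper name is my own), so treat it as a workaround rather than a supported API; call it right after building the network, before get_output and theano.function are called:

    import lasagne
    from lasagne.random import get_rng
    from theano.sandbox.rng_mrg import MRG_RandomStreams

    def move_dropout_rng_to_gpu(network):
        # Swap the private _srng of every DropoutLayer for a GPU-enabled stream,
        # mirroring the noise.py edit above.
        for layer in lasagne.layers.get_all_layers(network):
            if isinstance(layer, lasagne.layers.DropoutLayer):
                layer._srng = MRG_RandomStreams(get_rng().randint(1, 2147462579),
                                                use_cuda=True)
        return network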
