Setup:
- Using an Amazon Linux system with an NVIDIA GPU
- I'm using Keras 1.0.1
- Running the Theano v0.8.2 backend
- Using CUDA and cuDNN
- THEANO_FLAGS="device=gpu,floatX=float32,lib.cnmem=1"
Everything works fine, but I run out of video memory on large models when I increase the batch size to speed up training. I figured that moving to a 4-GPU system would, in theory, either increase the total memory available or allow smaller batches to train faster. But observing the nvidia-smi stats, I can see that only one GPU is used by default:
+------------------------------------------------------+
| NVIDIA-SMI 361.42     Driver Version: 361.42         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID K520           Off  | 0000:00:03.0     Off |                  N/A |
| N/A   44C    P0    45W / 125W |   3954MiB /  4095MiB |     94%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GRID K520           Off  | 0000:00:04.0     Off |                  N/A |
| N/A   28C    P8    17W / 125W |     11MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GRID K520           Off  | 0000:00:05.0     Off |                  N/A |
| N/A   32C    P8    17W / 125W |     11MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GRID K520           Off  | 0000:00:06.0     Off |                  N/A |
| N/A   29C    P8    17W / 125W |     11MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      9862    C   python34                                      3941MiB |
+-----------------------------------------------------------------------------+
I know that with raw Theano you can explicitly use multiple GPUs. Does Keras support the use of multiple GPUs? If so, does it abstract this away, or do you need to map the GPUs to devices as in Theano and explicitly marshal computations to specific GPUs?
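(For context, the explicit multi-device code I have in mind in raw Theano looks roughly like the sketch below, adapted from Theano's multi-GPU documentation. It assumes the newer libgpuarray backend; dev0/dev1 are arbitrary context names mapped to physical GPUs via THEANO_FLAGS.)

# Run as, e.g.:
#   THEANO_FLAGS="contexts=dev0->cuda0;dev1->cuda1" python script.py
import numpy
import theano
import theano.tensor as T

# Each shared variable is pinned to one GPU context via `target`.
a = theano.shared(numpy.random.random((1024, 1024)).astype('float32'),
                  target='dev0')
b = theano.shared(numpy.random.random((1024, 1024)).astype('float32'),
                  target='dev1')

# Each dot product executes on the GPU that holds its operand.
f = theano.function([], [T.dot(a, a), T.dot(b, b)])
result_0, result_1 = f()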
Multi-GPU training in Theano is still experimental ("The code is rather new and is still considered experimental at this point. It has been tested and seems to perform correctly in all cases observed, but make sure to double-check your results before publishing a paper or anything of the sort.") and has not been integrated into Keras yet. However, you can use multiple GPUs with Keras on the TensorFlow backend: https://blog.keras.io/keras-as-a-simplified-interface-to-tensorflow-tutorial.html#multi-gpu-and-distributed-training.
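The approach in that tutorial is manual data parallelism via TensorFlow device scopes: define one shared Keras model on the CPU so its weights live in a single place, call it once per GPU to create replicas, and merge the replica outputs. A minimal sketch adapted from the linked tutorial (the layer sizes and the averaging merge are illustrative):

import numpy as np
import tensorflow as tf
from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense

sess = tf.Session()
K.set_session(sess)  # register the session so Keras layers use it

with tf.device('/cpu:0'):
    x = tf.placeholder(tf.float32, shape=(None, 784))
    # Shared model on the CPU: it acts as an op template and as the
    # single repository for the weights shared by all replicas.
    model = Sequential()
    model.add(Dense(32, activation='relu', input_dim=784))
    model.add(Dense(10, activation='softmax'))

with tf.device('/gpu:0'):
    output_0 = model(x)  # replica 0: these ops are placed on GPU 0

with tf.device('/gpu:1'):
    output_1 = model(x)  # replica 1: these ops are placed on GPU 1

with tf.device('/cpu:0'):
    preds = 0.5 * (output_0 + output_1)  # merge replica outputs on the CPU

data = np.random.random((64, 784)).astype('float32')
# Running only `preds` executes both GPU replicas plus the CPU merge op.
output_value = sess.run(preds, feed_dict={x: data})

Note that this sketch runs the same input through both replicas; for real data-parallel training each replica would receive a different slice of the batch.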