Why is my GPU slower than my CPU when training LSTM/RNN models?

Posted 2020-02-23 05:55

My machine has the following spec:

CPU: Xeon E5-1620 v4

GPU: Titan X (Pascal)

Ubuntu 16.04

Nvidia driver 375.26

CUDA toolkit 8.0

cuDNN 5.1

I've benchmarked the following Keras examples with TensorFlow as the backend:

SCRIPT NAME                  GPU       CPU
stateful_lstm.py              5sec      5sec 
babi_rnn.py                  10sec     12sec
imdb_bidirectional_lstm.py   240sec    116sec
imdb_lstm.py                 113sec    106sec

My GPU clearly outperforms my CPU in non-LSTM models:

SCRIPT NAME                  GPU       CPU
cifar10_cnn.py               12sec     123sec
imdb_cnn.py                  5sec      119sec
mnist_cnn.py                 3sec      47sec 
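
(For reference, one quick way to confirm that the TensorFlow backend actually sees the GPU, and to force a CPU-only run for comparison, is shown below; this is a sketch for the TensorFlow 1.x API, not part of the example scripts themselves:)

import os

# Uncommenting this line BEFORE TensorFlow is imported hides the GPU,
# which is how the CPU timings above can be reproduced on the same machine.
# os.environ["CUDA_VISIBLE_DEVICES"] = ""

from tensorflow.python.client import device_lib

# With the GPU visible, this should list a /gpu:0 entry for the Titan X.
print(device_lib.list_local_devices())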

Has anyone else experienced this?

4 Answers
Emotional °昔 · 2020-02-23 06:36

Just a tip: using a GPU pays off when

  1. your neural network is large, and

  2. your batch size is large.

That's what I found from searching around.

我命由我不由天 · 2020-02-23 06:40

I've run into a similar issue here:

Test 1

CPU: Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz

Ubuntu 14.04

imdb_bidirectional_lstm.py: 155s

Test 2

GPU: GTX 860m

Nvidia Driver: 369.30

CUDA Toolkit: v8.0

cuDNN: v6.0

imdb_bidirectional_lstm.py: 450s

Analysis

When I observed the GPU load curve, I found one interesting thing:

  • for the LSTM, GPU load jumps rapidly between ~80% and ~10%

(screenshot: GPU load curve)

This is mainly due to the sequential computation in the LSTM layer. Remember that an LSTM processes its input sequentially, updating the hidden state step by step; in other words, you must wait for the hidden state at time t-1 before you can compute the hidden state at time t.

That's a poor fit for GPU cores: a GPU consists of many small cores that excel at parallel computation, and a sequential computation can't fully utilize them. That's why we see GPU load around 10%-20% most of the time.

But in the backpropagation phase, the GPU can run the derivative computations in parallel, so we see the GPU load peak around 80%.
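
To make the data dependency concrete, here is a toy sketch of a recurrent forward pass (plain NumPy, illustrative names only, not the actual Keras/cuDNN kernels) showing why the time steps cannot be parallelized:

import numpy as np

def recurrent_forward(x, W_x, W_h, h0):
    # x:   (timesteps, input_dim) input sequence
    # W_x: (input_dim, hidden_dim) input-to-hidden weights
    # W_h: (hidden_dim, hidden_dim) hidden-to-hidden weights
    # h0:  (hidden_dim,) initial hidden state
    h = h0
    outputs = []
    for t in range(x.shape[0]):
        # h at time t depends on h at time t-1, so this loop is inherently serial;
        # only the matrix products inside one step can run in parallel on the GPU.
        h = np.tanh(x[t] @ W_x + h @ W_h)
        outputs.append(h)
    return np.stack(outputs)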

倾城 Initia · 2020-02-23 06:45

If you use Keras, use CuDNNLSTM in place of LSTM or CuDNNGRU in place of GRU. In my case (2x Tesla M60), I am seeing a 10x performance boost. By the way, I am using batch size 128, as suggested by @Alexey Golyshev.
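
For example (a minimal sketch assuming Keras 2.x with the TensorFlow backend; the layer sizes are arbitrary and only loosely follow the imdb_lstm.py example):

from keras.models import Sequential
from keras.layers import Embedding, Dense, CuDNNLSTM  # CuDNNLSTM needs a GPU + cuDNN

model = Sequential()
model.add(Embedding(20000, 128))
# Drop-in replacement for LSTM(128); runs on the fused cuDNN kernel.
model.add(CuDNNLSTM(128))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

Note that CuDNNLSTM only runs on the GPU and does not support every LSTM option (e.g. recurrent_dropout or non-default activations), so it is not a drop-in replacement in every case.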

The star · 2020-02-23 06:50

The batch size is too small. Try increasing it.

Results for my GTX 1050 Ti:

imdb_bidirectional_lstm.py
batch_size      time (sec)
32 (default)    252
64              131
96              87
128             66

imdb_lstm.py
batch_size      time (sec)
32 (default)    108
64              50
96              34
128             25
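
The timings above come from the stock Keras example scripts, where batch_size is just the argument passed to model.fit(). A self-contained sketch (synthetic data, hypothetical sizes) that shows the same trend of larger batches keeping the GPU busier:

import time
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

# Synthetic integer sequences standing in for the IMDB data, just to time fit().
x = np.random.randint(0, 20000, size=(5000, 80))
y = np.random.randint(0, 2, size=(5000,))

model = Sequential([Embedding(20000, 128), LSTM(128), Dense(1, activation='sigmoid')])
model.compile(loss='binary_crossentropy', optimizer='adam')

for batch_size in (32, 64, 96, 128):
    start = time.time()
    model.fit(x, y, batch_size=batch_size, epochs=1, verbose=0)
    print(batch_size, round(time.time() - start, 1), 'sec')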