I was using TensorFlow (CPU version) for my deep learning model, specifically the DNNRegressor Estimator for training, with a given set of parameters (network structure, hidden layers, alpha, etc.). Although I was able to reduce the loss, the model took a very long time to learn (approximately 3 days), running at about 9 seconds per 100 steps.
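For context, here is a minimal sketch of the kind of setup I am running. The feature names, layer sizes, hyperparameter values, and dummy data below are placeholders for illustration, not my actual ones:

```python
import numpy as np
import tensorflow as tf

# Placeholder numeric features for illustration only.
feature_columns = [
    tf.feature_column.numeric_column("feature_a"),
    tf.feature_column.numeric_column("feature_b"),
]

estimator = tf.estimator.DNNRegressor(
    feature_columns=feature_columns,
    hidden_units=[128, 64, 32],  # example network structure, not my real one
    optimizer=tf.train.ProximalAdagradOptimizer(
        learning_rate=0.1,
        l1_regularization_strength=0.001,  # the "alpha"-style regularization
    ),
    model_dir="/tmp/dnn_regressor_model",
)

# Dummy data just so the snippet runs end to end.
x = {"feature_a": np.random.rand(1000).astype(np.float32),
     "feature_b": np.random.rand(1000).astype(np.float32)}
y = np.random.rand(1000).astype(np.float32)

train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x=x, y=y, batch_size=128, num_epochs=None, shuffle=True)

estimator.train(input_fn=train_input_fn, steps=1000)
```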
I came across this article: https://medium.com/towards-data-science/how-to-traine-tensorflow-models-79426dabd304 and found that GPUs can be much faster to train on. So I took a p2.xlarge GPU instance from AWS (single GPU) with 4 vCPUs, 12 ECUs, and 61 GiB of memory.
But the training speed is the same, at 9 seconds per 100 steps. I am using the same code I used for Estimators on the CPU, because I read that Estimators use the GPU on their own. Here is my "nvidia-smi" command output.
- It shows that GPU memory is being used, but my Volatile GPU-Util is at 1%. I am not able to figure out what I am missing. Is it designed to work the same, or am I missing something? The global steps per second are the same for both the CPU and GPU runs of the Estimator. (As a first check, I am listing the devices TensorFlow sees; see the snippet after this list.)
- Do I have to explicitly change something in the DNNRegressor Estimator code? (The RunConfig sketch at the end is what I am experimenting with to log device placement.)
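This is the diagnostic snippet I am using to confirm whether TensorFlow itself can see the GPU, independent of what nvidia-smi reports:

```python
import tensorflow as tf
from tensorflow.python.client import device_lib

# List every device TensorFlow can see; a working GPU build should report
# an entry like "/device:GPU:0" alongside the CPU.
print(device_lib.list_local_devices())
print("GPU available:", tf.test.is_gpu_available())
```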
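And this is what I am experimenting with to log device placement through a RunConfig, in case the ops are silently landing on the CPU. The feature columns and model_dir are the same placeholders as in the first sketch:

```python
import tensorflow as tf

# Log where every op is placed (CPU vs GPU) in the console output.
session_config = tf.ConfigProto(log_device_placement=True)
run_config = tf.estimator.RunConfig(session_config=session_config)

# Same placeholder feature columns as in the first sketch.
feature_columns = [
    tf.feature_column.numeric_column("feature_a"),
    tf.feature_column.numeric_column("feature_b"),
]

estimator = tf.estimator.DNNRegressor(
    feature_columns=feature_columns,
    hidden_units=[128, 64, 32],
    config=run_config,
    model_dir="/tmp/dnn_regressor_device_check",
)
```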