TensorBoard doesn't show all data points

Posted 2019-03-20 23:49

Question:

I was running a very long training job (reinforcement learning, 20M steps) and writing a summary every 10k steps. Between steps 4M and 6M, I saw two peaks in the TensorBoard scalar chart for game score, then left it running overnight. In the morning, training was at about step 12M, but the peaks I had seen between steps 4M and 6M had disappeared from the chart. I zoomed in and found that TensorBoard had (randomly?) skipped some of the data points. I also tried exporting the data, but some data points, including the peaks, are missing from the exported .csv as well.

I looked for answers and found this on the TensorFlow GitHub page:

TensorBoard uses reservoir sampling to downsample your data so that it can be loaded into RAM. You can modify the number of elements it will keep per tag in tensorboard/backend/server.py.
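To see why earlier points can vanish as a run gets longer, here is a minimal sketch of textbook reservoir sampling (Algorithm R). It is not TensorBoard's actual implementation; the function name, the seed, and the capacity of 1000 are assumptions chosen for illustration. With a summary every 10k steps over 20M steps there are 2000 points, so a 1000-item reservoir starts evicting previously kept points once the run passes step 10M.

```python
import random


def reservoir_sample(stream, max_size, seed=0):
    """Keep a uniform random sample of at most max_size items from stream.

    Textbook Algorithm R; an illustration only, not TensorBoard's code.
    """
    rng = random.Random(seed)
    reservoir = []
    for n_seen, item in enumerate(stream, start=1):
        if len(reservoir) < max_size:
            # The reservoir is not full yet: keep everything.
            reservoir.append(item)
        else:
            # Keep the new item with probability max_size / n_seen,
            # evicting a uniformly chosen earlier item to make room.
            j = rng.randrange(n_seen)
            if j < max_size:
                reservoir[j] = item
    return reservoir


# One summary every 10k steps for 20M steps gives 2000 data points.
steps = range(10_000, 20_000_001, 10_000)
kept = sorted(reservoir_sample(steps, max_size=1000))
dropped = sorted(set(steps) - set(kept))
print(len(kept), "of", len(steps), "points kept")
print("first dropped steps:", dropped[:5])
```

Any single point, including a peak, survives only with probability 1000/2000 here, which is consistent with peaks being visible early in the run and missing later.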

Has anyone ever modified this server.py file? Where can I find it, and if I installed TensorFlow from source, do I have to recompile after modifying it?

Answer 1:

The comment is out of date: the limit is now set in tensorboard/backend/application.py, under the "Default Size Guidance". By default, it stores 1000 scalars. You can raise that limit as high as you like, or set it to 0 to store every scalar.

You don't need to recompile TensorBoard, or even download the source. You can just edit this file in your installed copy of TensorBoard.

If you installed TensorFlow using pip in a virtualenv (Ubuntu, Mac), then within your virtualenv directory the path to application.py should be something like lib/python2.7/site-packages/tensorflow/tensorboard/backend. If you modify that file, TensorBoard should pick up the new setting (when you run it from that virtualenv). If you're like me, you'll add a print statement too, so you can be sure you're running the modified code :)
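If you are not sure where the file lives in your environment, a short script run with the virtualenv's Python can print the candidate locations. This is a sketch under assumptions: the first candidate follows the layout described above, the second assumes TensorBoard is installed as a standalone `tensorboard` package, and `find_package_dir` is a hypothetical helper. The file may sit elsewhere or not exist in your version.

```python
import importlib.util
from pathlib import Path


def find_package_dir(package_name):
    """Return the directory of an installed top-level package, or None.

    Hypothetical helper; it looks the package up without importing it.
    """
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


# (package, path inside the package); both layouts are guesses to check.
candidates = [
    ("tensorflow", Path("tensorboard") / "backend" / "application.py"),
    ("tensorboard", Path("backend") / "application.py"),
]

for package_name, relative_path in candidates:
    pkg_dir = find_package_dir(package_name)
    if pkg_dir is None:
        print(f"{package_name}: not importable from this interpreter")
        continue
    target = pkg_dir / relative_path
    print(f"{target} (exists: {target.exists()})")
```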



Answer 2:

You don't have to change the source code for this; there is a flag called --samples_per_plugin.

Quoting from the help output:

--samples_per_plugin: An optional comma separated list of plugin_name=num_samples pairs to explicitly specify how many samples to keep per tag for that plugin. For unspecified plugins, TensorBoard randomly downsamples logged summaries to reasonable values to prevent out-of-memory errors for long running jobs. This flag allows fine control over that downsampling. Note that 0 means keep all samples of that type. For instance, "scalars=500,images=0" keeps 500 scalars and all images. Most users should not need to set this flag. (default: '')

So if you want to have a slider of 100 images, use:

tensorboard --samples_per_plugin images=100
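For the original question (missing scalar points), the same flag can be set to scalars=0 to keep every scalar, per the help text quoted above. The sketch below only assembles the command line from a dict of limits. The helper names and the log directory `runs/my_experiment` are hypothetical; `--logdir` is the usual TensorBoard flag for the log directory.

```python
import shlex


def samples_per_plugin_value(limits):
    """Format {"scalars": 0, "images": 100} as "scalars=0,images=100"."""
    for plugin, count in limits.items():
        if count < 0:
            raise ValueError(f"sample count for {plugin!r} must be >= 0")
    return ",".join(f"{plugin}={count}" for plugin, count in limits.items())


def tensorboard_command(logdir, limits):
    """Build the argument list for launching TensorBoard (hypothetical helper)."""
    return [
        "tensorboard",
        "--logdir", logdir,
        "--samples_per_plugin", samples_per_plugin_value(limits),
    ]


# 0 means "keep all samples of that type", according to the help text.
cmd = tensorboard_command("runs/my_experiment", {"scalars": 0})
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd)` if you prefer to launch TensorBoard from a script; keeping every sample uses more memory on long runs.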