I would like to "dump" the tensorboard histograms and plot them via matplotlib. I would have more scientific paper appealing plots.
I managed to hack the way through the Summary file using the tf.train.summary_iterator
and dump the histogram that I wanted to dump( tensorflow.core.framework.summary_pb2.HistogramProto
object).
By doing that and implementing what the java-script code does with the data (https://github.com/tensorflow/tensorboard/blob/c2fe054231fe77f3a5b05dbc519f713d2e738d1c/tensorboard/plugins/histogram/tf_histogram_dashboard/histogramCore.ts#L104), I managed to get something similar (same trends) with the tensorboard plots, but not the exact same plot.
Can I have some light on this?
Thanks
In order to plot a tensorboard histogram with matplotlib I am doing the following:
event_acc = EventAccumulator(path, size_guidance={
'histograms': STEP_COUNT,
})
event_acc.Reload()
tags = event_acc.Tags()
result = {}
for hist in tags['histograms']:
histograms = event_acc.Histograms(hist)
result[hist] = np.array([np.repeat(np.array(h.histogram_value.bucket_limit), np.array(h.histogram_value.bucket).astype(np.int)) for h in histograms])
return result
h.histogram_value.bucket_limit
gives me the value and h.histogram_value.bucket
the count of this value. So when i repeat the values accordingly (np.repeat(...)
), I get a huge array of expected size. This array can now be plotted with the default matplotlib logic.
The best solution is loading all events and reconstructing all the histogram (as the answer of @khuesmann) but not using EventAccumulator
but EventFileLoader
. This will give you a histogram per wall time and step as the ones Tensorboard plots. It can be extended to return a list of actions by timestep and wall time.
Don't forget to check which tag will you use.
from tensorboard.backend.event_processing.event_file_loader import EventFileLoader
# Just in case, PATH_OF_FILE is the path of the file, not the folder
loader = EventFileLoader(PATH_Of_FILE)
# Where to store values
wtimes,steps,actions = [],[],[]
for event in loader.Load():
wtime = event.wall_time
step = event.step
if len(event.summary.value) > 0:
summary = event.summary.value[0]
if summary.tag == HISTOGRAM_TAG:
wtimes += [wtime]*int(summary.histo.num)
steps += [step] *int(summary.histo.num)
for num,val in zip(summary.histo.bucket,summary.histo.bucket_limit):
actions += [val] *int(num)
bear in mind that tensorflow approximates the actions and treats the actions as continuous variables, so even if you have discrete actions (e.g. 0,1,3) you will end up actions as 0.2,0.4,0.9,1.4 ... in that case round the values will do it.
A good solution is the one from @khuesmann, but this only allows you to retrieve the accumulated histogram, not the histogram per step -- which is the one actually being showed in tensorboard.
If you want the distribution and so far, what I have understood is that Tensorboard usually compresses the histogram to decrease the memory used to store the data -- imagine storing a 2D histogram over 4 million steps, the memory can increase fast quickly. These compress histograms are accessible by doing this:
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator
n2n = EventAccumulator(PATH)
n2n.Reload()
# Check the tags under histograms and choose the one you want
n2n.Tags()
# This will give you the list used by tensorboard
# of the compress histograms by timestep and wall time
n2n.CompressedHistograms(HISTOGRAM_TAG)
The only problem is that it compresses the histogram to five percentiles (in Basic points they are 0, 668, 1587, 3085, 5000, 6915, 8413, 9332, 10000) which corresponds to (-Inf, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, Inf) in standard deviations. Check the code here.
I haven't read much, but it wouldn't be hard to reconstruct the temporal histograms that tensorboard shows. If I find a way to do it, I will post it here.
Why not just download the raw data (as CSV or JSON) from the tensorboard plots that you want, to import and use with matplotlib? See: Can I export a tensorflow summary to CSV?