I'm using the tflearn wrapper over tensorflow to build a model, and would like to add metadata (labels) to the resultant embedding visualization. Is there a way to link a metadata.tsv file to a saved checkpoint after training has already run?
I've created a projector_config.pbtxt file in the logdir of the checkpoint summaries, with the metadata.tsv being in the same folder. The config looks like this:
embeddings {
tensor_name: "Embedding/W"
metadata_path: "C:/tmp/tflearn_logs/shallow_lstm/"
}
and was created using the code from the docs - https://www.tensorflow.org/how_tos/embedding_viz/
I've commented out the tf.Session part in the hope of creating the metadata link without needing to do so directly within a Session object, but I'm not sure if that's possible.
import tensorflow as tf
from tensorflow.contrib.tensorboard.plugins import projector
#with tf.Session() as sess:
config = projector.ProjectorConfig()
# One can add multiple embeddings.
embedding = config.embeddings.add()
embedding.tensor_name = 'Embedding/W'
# Link this tensor to its metadata file (e.g. labels).
embedding.metadata_path = 'C:/tmp/tflearn_logs/shallow_lstm/'
# Saves a config file that TensorBoard will read during startup.
projector.visualize_embeddings(tf.summary.FileWriter('/tmp/tflearn_logs/shallow_lstm/'), config)
Below is a snap of the current embedding visualization. Note the empty metadata. Is there a way to directly attach the desired metafile to this embedding?
I had the same problem and it is solved now :)
Essentially, all you need to do is the following 3 steps:
- save a model checkpoint, supposing the checkpoint's directory is ckp_dir;
- place projector_config.pbtxt and metadata.tsv in ckp_dir;
- run tensorboard --logdir=ckp_dir and click the Embedding tab
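The three steps above can be sketched in Python (the directory name, tensor name, and labels below are placeholders, not values from the original model):

```python
import os

ckp_dir = "ckp_dir"  # directory that already contains the model checkpoint
os.makedirs(ckp_dir, exist_ok=True)

# projector_config.pbtxt links the embedding tensor to its labels file
config_text = (
    'embeddings {\n'
    '  tensor_name: "embedding_name"\n'
    '  metadata_path: "metadata.tsv"\n'
    '}\n'
)
with open(os.path.join(ckp_dir, "projector_config.pbtxt"), "w") as f:
    f.write(config_text)

# metadata.tsv: one label per line, in the same row order as the embedding matrix
labels = ["label_0", "label_1", "label_2"]  # placeholder labels
with open(os.path.join(ckp_dir, "metadata.tsv"), "w") as f:
    f.write("\n".join(labels) + "\n")

# then, from a terminal:  tensorboard --logdir=ckp_dir
```

Note that metadata.tsv has no header when it contains a single column; TensorBoard matches line i to row i of the embedding.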
The content of projector_config.pbtxt is:
embeddings {
tensor_name: "embedding_name"
metadata_path: "metadata.tsv"
}
This is the key to linking the embedding to metadata.tsv. In a tf.Session(), we often fetch the embedding's value with sess.run('embedding_name:0'). But in projector_config.pbtxt, we write just tensor_name: "embedding_name".
Generally, we can specify the checkpoint path and metadata_path in projector_config.pbtxt so that the checkpoint, projector_config.pbtxt, and metadata.tsv can live in different directories. But I think that is more complicated, so I just solved it as above.
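If you do want them in separate directories, I believe the ProjectorConfig proto also has a top-level model_checkpoint_path field for this; a sketch with hypothetical paths:

```
model_checkpoint_path: "/path/to/checkpoints/model.ckpt"
embeddings {
tensor_name: "embedding_name"
metadata_path: "/path/to/labels/metadata.tsv"
}
```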
The result is shown here.
Try this with your projector_config.pbtxt:
embeddings {
tensor_name: "Embedding/W"
metadata_path: "$LOGDIR/metadata.tsv"
}
Make sure your $LOGDIR is the same path you use to call tensorboard --logdir=$LOGDIR on your terminal; that is, it should be relative to your current directory (so it probably shouldn't include C:/..). Also include the filename in the metadata_path.
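The path rule above can be sanity-checked with a tiny sketch (the directory and file names here are illustrative, not taken from the question):

```python
import os

# Illustrative values -- substitute your own directory and file names.
logdir = "tflearn_logs/shallow_lstm"   # what you pass to tensorboard --logdir
metadata_path = "metadata.tsv"         # the value inside projector_config.pbtxt

# metadata_path should name the file itself (not just a directory),
# and it resolves relative to the logdir TensorBoard was started with.
resolved = os.path.join(logdir, metadata_path)
assert os.path.basename(resolved) == "metadata.tsv"
```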
Let me know if this works for you, too.
I stumbled upon the same problem trying to display words instead of indices for the word2vec tutorial. To achieve that, your projector_config.pbtxt should look like this:
embeddings {
tensor_name: "w_in"
metadata_path: "$LOGDIR/vocab.txt"
}
You might also want to modify the save_vocab function in the code linked above since, as is, it converts unicode to hex.
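For instance, a plain-text replacement might write the words directly in UTF-8 instead of hex-encoding them (a minimal sketch; the function name, vocabulary list, and filename below are assumptions, not the tutorial's actual variables):

```python
import io

def save_vocab_plain(vocab_words, path="vocab.txt"):
    """Write one word per line in UTF-8, in the embedding's row order."""
    with io.open(path, "w", encoding="utf-8") as f:
        for word in vocab_words:
            f.write(word + u"\n")

# Hypothetical usage with a tiny vocabulary:
save_vocab_plain([u"the", u"of", u"caf\u00e9"])
```

TensorBoard then displays the words themselves, including non-ASCII characters, rather than their hex forms.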
I'm having the same problem. [EDIT:] The way I can make it work is by creating a subdirectory, putting all the checkpoint files there, and using a full path to the metadata file. The 'trick' is that if you then give tensorboard --logdir that specific directory, it parses the metadata and displays the words in the plot instead of just the indices. The downside is that the embeddings are then no longer found when you use the base log dir as --logdir, so you always have to start a separate tensorboard instance to view the embeddings.
It's really quite annoying, and I can't imagine this is the only way to make it work, but I spent hours before that one finally worked...