Question:
Where can I find documentation about the Embedding Projector included in TensorBoard? There are some references to it here, but there's no step-by-step example/tutorial on how to use it.
Answer 1:
As far as I am aware, this is the only documentation about embedding visualization on the TensorFlow website. The code snippet there might not be very instructive for first-time users, so here is an example usage:
import os
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

LOG_DIR = 'logs'

mnist = input_data.read_data_sets('MNIST_data')
# Store the test images in a Variable so they end up in the checkpoint.
images = tf.Variable(mnist.test.images, name='images')

with tf.Session() as sess:
    saver = tf.train.Saver([images])
    sess.run(images.initializer)
    saver.save(sess, os.path.join(LOG_DIR, 'images.ckpt'))
Here we first create a TensorFlow variable (images) and then save it using tf.train.Saver. After executing the code we can launch TensorBoard by issuing the tensorboard --logdir=logs command and opening localhost:6006 in a browser.
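If you want to confirm that the checkpoint was actually written before starting TensorBoard, a quick sanity check is possible (a minimal sketch; the 'logs' path matches the LOG_DIR above):

import tensorflow as tf

# Prints the path of the most recent checkpoint in the log directory,
# e.g. 'logs/images.ckpt', or None if nothing was saved there.
print(tf.train.latest_checkpoint('logs'))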
However, this visualization is not very helpful because we do not see the different classes to which each data point belongs. In order to distinguish one class from another, one should provide some metadata:
import os
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
from tensorflow.contrib.tensorboard.plugins import projector

LOG_DIR = 'logs'
metadata = os.path.join(LOG_DIR, 'metadata.tsv')

mnist = input_data.read_data_sets('MNIST_data')
images = tf.Variable(mnist.test.images, name='images')

# Write one label per line, in the same order as the rows of the tensor.
with open(metadata, 'w') as metadata_file:
    for row in mnist.test.labels:
        metadata_file.write('%d\n' % row)

with tf.Session() as sess:
    saver = tf.train.Saver([images])
    sess.run(images.initializer)
    saver.save(sess, os.path.join(LOG_DIR, 'images.ckpt'))

config = projector.ProjectorConfig()
# One can add multiple embeddings.
embedding = config.embeddings.add()
embedding.tensor_name = images.name
# Link this tensor to its metadata file (e.g. labels).
embedding.metadata_path = metadata
# Saves a config file that TensorBoard will read during startup.
projector.visualize_embeddings(tf.summary.FileWriter(LOG_DIR), config)
Which gives us a projection in which each point carries its class label, so the digit clusters can be told apart.
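A side note on the metadata format: a single-column file like the one above needs no header, but once you want more than one column (hypothetical Index and Label columns here), the projector expects a tab-separated header row. A minimal sketch:

with open(metadata, 'w') as metadata_file:
    # A header row is required once there is more than one column.
    metadata_file.write('Index\tLabel\n')
    for i, label in enumerate(mnist.test.labels):
        metadata_file.write('%d\t%d\n' % (i, label))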
Answer 2:
Sadly, I cannot find more comprehensive documentation. Below I have collected all the related resources:
- How-to: https://www.tensorflow.org/how_tos/embedding_viz/
- Google Research Blog: announcement and animation
- Paper: https://arxiv.org/pdf/1611.05469v1.pdf
- Source: https://github.com/tensorflow/embedding-projector-standalone
- 2017 TF Dev Summit tutorial and code
- Issue #6322 has some pointers and examples
PS: Thanks for upvoting me. Now I can post all the links.
Answer 3:
It sounds like you want to get the Visualization section with t-SNE running on TensorBoard. As you've described, the TensorFlow API has only provided the bare essential commands in the how-to document.
I’ve uploaded my working solution with the MNIST dataset to my GitHub repo.
Original Stack Overflow answer: TensorBoard Embedding Example?
Answer 4:
@Ehsan
Your explanation is very good. The key here is that every Variable has to be initialized before the saver.save(...) call.
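For example (a minimal sketch; emb and the checkpoint path are hypothetical), either the per-variable initializer or the global one works, as long as it runs before saving:

import tensorflow as tf

emb = tf.Variable(tf.random_normal([100, 200]), name='embedding')

with tf.Session() as sess:
    # Either sess.run(emb.initializer) or the global initializer below;
    # saving an uninitialized Variable raises a FailedPreconditionError.
    sess.run(tf.global_variables_initializer())
    tf.train.Saver([emb]).save(sess, 'logs/model.ckpt')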
@Everyone
Also, the TensorBoard embedding projector simply visualizes instances of a saved Variable. It doesn't care whether they are words or images or anything else.
The official doc https://www.tensorflow.org/get_started/embedding_viz does not point out that it is a direct visualization of a matrix, which, in my opinion, introduces a lot of confusion.
Maybe you wonder what it means to visualize a matrix. A matrix can be interpreted as a collection of points in a space.
If I have a matrix with shape (100, 200), I can interpret it as a collection of 100 points, where each point has 200 dimensions. In other words, 100 points in a 200-dimensional space.
In the word2vec case, we have 100 words where each word is represented by a vector of length 200. The TensorBoard embedding projector simply uses PCA or t-SNE to visualize this collection (matrix).
Therefore, you can throw in any random matrix. If you throw in an image with shape (1080, 1920), it will visualize each row of this image as if it were a single point.
That being said, you can visualize the embedding of any Variable instances by simply saving them:

saver = tf.train.Saver([a, _list, of, wanted, variables])
...some code you may or may not have...
saver.save(sess, os.path.join(LOG_DIR, 'filename.ckpt'))
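To make that concrete, here is a minimal, self-contained sketch using the same TF 1.x API as the answers above (points, LOG_DIR, and the file names are hypothetical) that saves a random (100, 200) matrix and registers it with the projector:

import os
import numpy as np
import tensorflow as tf
from tensorflow.contrib.tensorboard.plugins import projector

LOG_DIR = 'logs'

# 100 points in a 200-dimensional space, as described above.
points = tf.Variable(np.random.randn(100, 200).astype(np.float32), name='points')

with tf.Session() as sess:
    sess.run(points.initializer)
    tf.train.Saver([points]).save(sess, os.path.join(LOG_DIR, 'points.ckpt'))

# Register the tensor so the projector picks it up on startup.
config = projector.ProjectorConfig()
embedding = config.embeddings.add()
embedding.tensor_name = points.name
projector.visualize_embeddings(tf.summary.FileWriter(LOG_DIR), config)

After running it, tensorboard --logdir=logs shows the 100 rows as individual points in the Projector tab.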
I will try to make a detailed tutorial later.