I’m talking about the audio features dataset available at https://research.google.com/audioset/download.html as a tar.gz archive consisting of frame-level audio tfrecords.
Extracting everything else from the tfrecord files works fine (I can read the keys video_id, start_time_seconds, end_time_seconds, and labels), but the embeddings actually needed for training do not seem to be there at all. When I iterate over the contents of any tfrecord file in the dataset, only those four keys are printed.
This is the code I'm using:
import tensorflow as tf
import numpy as np

def readTfRecordSamples(tfrecords_filename):
    record_iterator = tf.python_io.tf_record_iterator(path=tfrecords_filename)
    for string_record in record_iterator:
        example = tf.train.Example()
        example.ParseFromString(string_record)
        print(example)  # this prints the above-mentioned four keys but NOT audio_embedding
        # the first label can then be parsed like this:
        label = example.features.feature['labels'].int64_list.value[0]
        print('label 1: ' + str(label))
        # this, however, does not work:
        # audio_embedding = example.features.feature['audio_embedding'].bytes_list.value[0]

readTfRecordSamples('embeddings/01.tfrecord')
Is there any trick to extracting the 128-dimensional embeddings? Or are they really not in this dataset?
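For context, here is a minimal, self-contained sketch of what I suspect the record layout might be: a tf.train.SequenceExample whose context holds the four keys I can see, and whose feature_lists holds per-frame 'audio_embedding' bytes. The record built below is synthetic (the feature_lists key name and the 128-byte quantized frame are my assumptions, not confirmed by the download page), but it shows how such a record would be parsed if that guess is right:

```python
import tensorflow as tf

# Build a synthetic record shaped the way I *suspect* AudioSet stores it:
# context features for the visible keys, plus a feature list of raw bytes,
# one 128-byte entry per frame (assumption: 8-bit quantized embeddings).
frame = bytes(range(128))  # one hypothetical 128-byte embedding frame
seq = tf.train.SequenceExample(
    context=tf.train.Features(feature={
        'video_id': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[b'abc'])),
        'labels': tf.train.Feature(
            int64_list=tf.train.Int64List(value=[0, 137])),
    }),
    feature_lists=tf.train.FeatureLists(feature_list={
        'audio_embedding': tf.train.FeatureList(feature=[
            tf.train.Feature(
                bytes_list=tf.train.BytesList(value=[frame])),
        ]),
    }),
)
serialized = seq.SerializeToString()

# Parse it back the same way one would parse a string_record from the
# tfrecord iterator -- but as a SequenceExample instead of an Example.
parsed = tf.train.SequenceExample()
parsed.ParseFromString(serialized)
emb = parsed.feature_lists.feature_list['audio_embedding'].feature[0].bytes_list.value[0]
print(len(emb))  # number of bytes in the first frame's embedding
```

If the real records are SequenceExamples like this, parsing them with tf.train.Example (as in my code above) would silently drop the feature_lists part, which would explain why only the four context keys show up.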