Google Colab TPU and reading from disk while training

Posted 2019-06-03 16:03

Question:

I have 100k pictures, and they don't fit into RAM, so I need to read them from disk while training.

def extract_fn(x):
    x = tf.read_file(x)
    x = tf.image.decode_jpeg(x, channels=3)
    x = tf.image.resize_images(x, [64, 64])
    return x

dataset = tf.data.Dataset.from_tensor_slices(in_pics)
dataset = dataset.map(extract_fn)

But when I try to train, I get this error:

File system scheme '[local]' not implemented (file: '/content/anime-faces/black_hair/danbooru_2629248_487b383a8a6e7cc0e004383300477d66.jpg')

Can I work around this somehow? I also tried the TFRecords API and got the same error.

Answer 1:

The Cloud TPU you use in this scenario is not colocated on the VM where your Python code runs: it is a separate accelerator that cannot see the Colab VM's local filesystem, which is why the '[local]' file system scheme is unimplemented. The easiest fix is to stage your data in Google Cloud Storage (GCS) and point the TPU at it with a gs:// URI.
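One way to write path handling so the same code works before and after staging is `tf.io.gfile`, which uses one API for both local paths and gs:// URIs (in TF 1.x the module was `tf.gfile`). The bucket name below is a placeholder, not something from the original post:

```python
import tensorflow as tf

# tf.io.gfile speaks local paths and gs:// URIs through the same API,
# so once the images are staged on GCS only the path prefix changes.
def list_images(root):
    # root might be "/content/anime-faces" on the Colab VM, or
    # "gs://your-bucket/anime-faces" after staging ("your-bucket"
    # is a placeholder for a bucket you own).
    return tf.io.gfile.glob(root + "/*/*.jpg")
```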

To optimize performance when reading from GCS, add prefetch(AUTOTUNE) to your tf.data pipeline, and for small datasets (under roughly 50 GB) add cache().
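Put together, the pipeline might look like the sketch below. It uses TF 2.x names (`tf.io.read_file`, `tf.image.resize`); in the TF 1.x API of the original question these were `tf.read_file` and `tf.image.resize_images`. The bucket name in the example pattern is a placeholder:

```python
import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE

def extract_fn(path):
    # Read, decode and resize one image; runs lazily per element
    img = tf.io.read_file(path)
    img = tf.image.decode_jpeg(img, channels=3)
    return tf.image.resize(img, [64, 64])

def build_dataset(file_pattern, batch_size=64):
    # e.g. file_pattern = "gs://your-bucket/anime-faces/*/*.jpg"
    # ("your-bucket" is a placeholder for a bucket you own)
    files = tf.data.Dataset.list_files(file_pattern)
    return (files
            .map(extract_fn, num_parallel_calls=AUTOTUNE)
            .cache()               # only worthwhile if the data fits (< ~50 GB)
            .batch(batch_size)
            .prefetch(AUTOTUNE))   # overlap input work with TPU steps
```

Staging the images into the bucket is a one-off copy, for example with `gsutil -m cp -r /content/anime-faces gs://your-bucket/`.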