What is the best way to save image metadata alongside a tif?

Posted 2020-08-11 11:00

Question:

In my work as a grad student, I capture microscope images and use Python to save them as raw TIFFs. I would like to add metadata such as the name of the microscope I am using, the magnification level, and the imaging laser wavelength. These details are all important for how I post-process the images.

I should be able to do this with a tif, right? Since it has a header?

I was able to add to the info in a PIL image:

im.info['microscope'] = 'george'

but when I save and load that image, the info I added is gone.

I'm open to all suggestions. If I have to, I'll just save a separate .txt file with the metadata, but it would be really nice to have it embedded in the image.
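If the metadata does end up in a separate file, JSON is easier to parse back reliably than free-form text. A minimal sketch using only the standard library; the helper names `save_sidecar`/`load_sidecar`, the convention that `img.tif` gets a sidecar named `img.json`, and the example values are all hypothetical:

```python
import json
from pathlib import Path


def save_sidecar(image_path, metadata):
    """Write metadata as JSON next to the image (hypothetical helper)."""
    sidecar = Path(image_path).with_suffix(".json")  # img.tif -> img.json
    sidecar.write_text(json.dumps(metadata, indent=2, sort_keys=True), encoding="utf-8")
    return sidecar


def load_sidecar(image_path):
    """Read the JSON sidecar written by save_sidecar."""
    sidecar = Path(image_path).with_suffix(".json")
    return json.loads(sidecar.read_text(encoding="utf-8"))


# Example values are made up for illustration.
meta = {"microscope": "george", "magnification": 40, "laser_wavelength_nm": 488}
```

The drawback is the usual one for sidecars: the two files can get separated or renamed independently, which is why the answers below focus on embedding the metadata in the TIFF itself.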

Answer 1:

For internal use, try saving the metadata as JSON in the TIFF ImageDescription tag, e.g.

import json
import numpy
import tifffile  # http://www.lfd.uci.edu/~gohlke/code/tifffile.py.html

data = numpy.arange(256).reshape((16, 16)).astype('u1')
metadata = dict(microscope='george', shape=data.shape, dtype=data.dtype.str)
print(data.shape, data.dtype, metadata['microscope'])

# Serialize to JSON (ASCII-only by default) and store it in ImageDescription.
# metadata=None stops tifffile from adding its own shape description tag.
metadata = json.dumps(metadata)
tifffile.imwrite('microscope.tif', data, description=metadata, metadata=None)

with tifffile.TiffFile('microscope.tif') as tif:
    data = tif.asarray()
    # Value of the first ImageDescription tag on the first page (already a str)
    metadata = tif.pages[0].description
metadata = json.loads(metadata)
print(data.shape, data.dtype, metadata['microscope'])

Note that JSON uses Unicode strings, and that tuples such as shape come back as lists after json.loads.

To be compatible with other microscopy software, consider saving OME-TIFF files, which store defined metadata as XML in the ImageDescription tag.
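A minimal OME-TIFF sketch with tifffile, assuming a recent tifffile release that supports `ome=True` and the `PhysicalSizeX`/`PhysicalSizeY` metadata keys; the file name and pixel size are made-up example values. Instrument details such as the microscope name live in other OME-XML elements that this sketch does not populate:

```python
import numpy as np
import tifffile

data = np.zeros((64, 64), dtype=np.uint8)  # placeholder image

# Pixel sizes are assumed example values; OME's default unit is micrometres.
tifffile.imwrite(
    "microscope.ome.tif",
    data,
    ome=True,
    metadata={"axes": "YX", "PhysicalSizeX": 0.65, "PhysicalSizeY": 0.65},
)

with tifffile.TiffFile("microscope.ome.tif") as tif:
    is_ome = tif.is_ome
    ome_xml = tif.ome_metadata  # OME-XML string stored in ImageDescription
    roundtrip = tif.asarray()

print(is_ome)
print(ome_xml[:300])
```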



Answer 2:

I should be able to do this with a tif, right? Since it has a header?

No.

First, your premise is wrong, but that's a red herring. TIFF does have a header, but it doesn't allow you to store arbitrary metadata in it.

But TIFF is a tagged file format, a series of tags of different types stored in Image File Directories (IFDs), so the header isn't important here. And you can always create your own private tag (any ID > 32767) and store anything you want there.
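To make the "header plus tags" structure concrete, here is a minimal standard-library sketch that writes an uncompressed 8-bit grayscale TIFF by hand and adds one private ASCII tag. Tag ID 65000 is a hypothetical choice from the private range, the function names are made up, and the file omits XResolution/YResolution/ResolutionUnit, which the baseline spec lists as required, so treat it as an illustration rather than a production writer:

```python
import struct

PRIVATE_TAG = 65000          # hypothetical ID in the private range 32768-65535
SHORT, LONG, ASCII = 3, 4, 2  # TIFF field type codes


def write_tiff_with_private_tag(path, pixels, width, height, text):
    """Write a little-endian 8-bit grayscale TIFF plus one private ASCII tag."""
    if len(pixels) != width * height:
        raise ValueError("pixels must contain width * height bytes")
    payload = text.encode("ascii") + b"\x00"  # TIFF ASCII values are NUL-terminated
    if len(payload) <= 4:
        raise ValueError("this sketch only handles strings of 4+ characters")
    pixel_offset = 8                       # right after the 8-byte header
    text_offset = pixel_offset + len(pixels)
    ifd_offset = text_offset + len(payload)
    ifd_offset += ifd_offset % 2           # an IFD must start on a word boundary
    entries = [  # (tag, type, count, value-or-offset), sorted by tag
        (256, SHORT, 1, width),             # ImageWidth
        (257, SHORT, 1, height),            # ImageLength
        (258, SHORT, 1, 8),                 # BitsPerSample
        (259, SHORT, 1, 1),                 # Compression: none
        (262, SHORT, 1, 1),                 # PhotometricInterpretation: BlackIsZero
        (273, LONG, 1, pixel_offset),       # StripOffsets
        (278, SHORT, 1, height),            # RowsPerStrip
        (279, LONG, 1, len(pixels)),        # StripByteCounts
        (PRIVATE_TAG, ASCII, len(payload), text_offset),
    ]
    with open(path, "wb") as f:
        # Header: byte order "II", magic number 42, offset of the first IFD
        f.write(b"II" + struct.pack("<HI", 42, ifd_offset))
        f.write(bytes(pixels))
        f.write(payload)
        f.write(b"\x00" * (ifd_offset - text_offset - len(payload)))  # padding
        f.write(struct.pack("<H", len(entries)))
        for tag, typ, count, value in entries:
            # In a little-endian file, a SHORT in the 4-byte value field is
            # left-justified, which is the same bytes as packing it as a LONG.
            f.write(struct.pack("<HHII", tag, typ, count, value))
        f.write(struct.pack("<I", 0))       # no further IFDs


def read_private_tag(path, tag=PRIVATE_TAG):
    """Return the ASCII value of `tag` in the first IFD, or None if absent."""
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != b"II*\x00":
        raise ValueError("not a little-endian TIFF")
    (ifd_offset,) = struct.unpack_from("<I", data, 4)
    (n_entries,) = struct.unpack_from("<H", data, ifd_offset)
    for i in range(n_entries):
        entry = ifd_offset + 2 + 12 * i
        code, typ, count, value = struct.unpack_from("<HHII", data, entry)
        if code == tag and typ == ASCII:
            if count > 4:  # stored elsewhere; value is an offset
                raw = data[value:value + count]
            else:          # small values are stored inline in the entry
                raw = data[entry + 8:entry + 8 + count]
            return raw.rstrip(b"\x00").decode("ascii")
    return None
```

As the answer says, a generic TIFF reader will simply skip tag 65000; only code that knows the convention can interpret it.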

The problem is, nothing but your own code will have any idea what you stored there. So, what you probably want is to store EXIF or XMP or some other standardized format for extending TIFF with metadata. But even there, EXIF or whatever you choose isn't going to have a tag for "microscope", so ultimately you're going to end up having to store something like "microscope=george\nspam=eggs\n" in some string field, and then parse it back yourself.
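Storing and parsing a `key=value` string like the one above only takes a few lines. A minimal sketch; the function names are hypothetical, and every value comes back as a string, so numbers such as the magnification need converting again after parsing:

```python
def encode_kv(metadata):
    """Turn a dict into 'key=value' lines (hypothetical helper)."""
    lines = []
    for key, value in metadata.items():
        key, value = str(key), str(value)
        # '=' in a key or a newline anywhere would break parsing
        if "=" in key or "\n" in key or "\n" in value:
            raise ValueError(f"cannot encode {key!r}")
        lines.append(f"{key}={value}\n")
    return "".join(lines)


def decode_kv(text):
    """Parse 'key=value' lines back into a dict of strings."""
    result = {}
    for line in text.splitlines():
        if line:
            key, _, value = line.partition("=")  # split on the first '=' only
            result[key] = value
    return result


encoded = encode_kv({"microscope": "george", "spam": "eggs"})
```

Here `encoded` is exactly `"microscope=george\nspam=eggs\n"`, the string from the paragraph above.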

But the real problem is that PIL/Pillow doesn't give you an easy way to store EXIF or XMP or anything else like that.

First, Image.info isn't for arbitrary extra data. At save time, it's generally ignored.

If you look at the PIL docs for TIFF, you'll see that it reads additional data into a special attribute, Image.tag, and can save data by passing a tiffinfo keyword argument to the Image.save method. But that additional data is a mapping from TIFF tag IDs to binary hunks of data. You can get the Exif tag IDs from the undocumented PIL.ExifTags.TAGS dict (or by looking them up online yourself), but that's as much support as PIL is going to give you.
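With a reasonably recent Pillow, the `tiffinfo` route can at least put a string into a standard tag. A minimal sketch, assuming Pillow's default uncompressed TIFF writer and using the standard ImageDescription tag (270) to hold JSON; the helper names are hypothetical:

```python
import json

from PIL import Image

IMAGE_DESCRIPTION = 270  # standard TIFF ImageDescription tag ID


def save_with_description(im, path, metadata):
    """Save `im` as TIFF with metadata serialized as JSON in tag 270."""
    im.save(path, format="TIFF", tiffinfo={IMAGE_DESCRIPTION: json.dumps(metadata)})


def load_description(path):
    """Read tag 270 back through the tag_v2 mapping and parse the JSON."""
    with Image.open(path) as im:
        return json.loads(im.tag_v2[IMAGE_DESCRIPTION])
```

This still stores only a string; anything structured inside it is a private convention of your own code, as the answer points out.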

Also, note that accessing tag and using tiffinfo in the first place requires a reasonably up-to-date version of Pillow; older versions, and classic PIL, didn't support it. (Ironically, they did have partial EXIF support for JPG files, which was never finished and has been stripped out…) Also, although it doesn't seem to be documented, if you built Pillow without libtiff it seems to ignore tiffinfo.

So ultimately, what you're probably going to want to do is:

  • Pick a metadata format you want.
  • Use a different library than PIL/Pillow to read and write that metadata. (For example, you can use GExiv2 or pyexif for EXIF.)


Answer 3:

Tifffile is one option for saving microscopy images with lots of metadata in python.

It doesn't have a lot of external documentation, but the docstrings are great, so you can get a lot of info just by typing help(tifffile) in Python or by looking at the source code.

You can look at the TiffWriter.save function in the source code (line 750) for the different keyword arguments you can use to write metadata.

One is to use description, which accepts a string. It will show up as the tag "ImageDescription" when you read your image.

Another is to use the extratags argument, which accepts a list of tuples. That allows you to write any tag name that exists in TIFF.TAGS. One of the easiest ways is to write them as strings, because then you don't have to specify the length.

You can also write ImageJ metadata with ijmetadata, for which the acceptable types are listed in the source code here.

As an example, if you write the following:

import json
import numpy as np
import tifffile

save_name = "my_image.tif"  # output path (example name)
im = np.random.randint(0, 255, size=(150, 100), dtype=np.uint8)
# Description
description = "This is my description"
# Extratags
metadata_tag = json.dumps({"ChannelIndex": 1, "Slice": 5})
extra_tags = [("MicroManagerMetadata", 's', 0, metadata_tag, True),
              ("ProcessingSoftware", 's', 0, "my_spaghetti_code", True)]
# ImageJ metadata. 'Info' tag needs to be a string
ijinfo = {"InitialPositionList": [{"Label": "Pos1"}, {"Label": "Pos3"}]}
ijmetadata = {"Info": json.dumps(ijinfo)}
# Write file
tifffile.imsave(
    save_name,
    im,
    ijmetadata=ijmetadata,
    description=description,
    extratags=extra_tags,
)

You can see the following tags when you read the image:

frames = tifffile.TiffFile(save_name)
page = frames.pages[0]
print(page.tags["ImageDescription"].value)

Out: This is my description

print(page.tags["MicroManagerMetadata"].value)

Out: {'ChannelIndex': 1, 'Slice': 5}

print(page.tags["ProcessingSoftware"].value)

Out: my_spaghetti_code
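The code above uses an older tifffile API (`imsave`, `ijmetadata`, tag names in `extratags`). A sketch of roughly the same thing for a recent tifffile release, assuming `imwrite`, numeric tag codes in `extratags`, and ImageJ metadata passed through `metadata=` with `imagej=True`. Because the ImageJ format reserves ImageDescription, the ImageJ part goes into a second file; both file names are examples:

```python
import json

import numpy as np
import tifffile

save_name = "my_image.tif"        # example output paths
ij_save_name = "my_image_ij.tif"
im = np.random.randint(0, 255, size=(150, 100), dtype=np.uint8)

# extratags entries: (code, dtype, count, value, writeonce).
# dtype 2 is TIFF ASCII; the count is not used for string values.
metadata_tag = json.dumps({"ChannelIndex": 1, "Slice": 5})
extra_tags = [
    (51123, 2, 0, metadata_tag, True),      # MicroManagerMetadata
    (11, 2, 0, "my_spaghetti_code", True),  # ProcessingSoftware
]
tifffile.imwrite(
    save_name,
    im,
    description="This is my description",
    extratags=extra_tags,
    metadata=None,  # skip tifffile's own shape JSON in ImageDescription
)

# ImageJ metadata: the 'Info' entry must be a string.
ijinfo = {"InitialPositionList": [{"Label": "Pos1"}, {"Label": "Pos3"}]}
tifffile.imwrite(ij_save_name, im, imagej=True, metadata={"Info": json.dumps(ijinfo)})

with tifffile.TiffFile(save_name) as tif:
    tags = tif.pages[0].tags
    description = tags[270].value      # ImageDescription
    software = tags[11].value          # ProcessingSoftware
    mm_metadata = tags[51123].value    # MicroManagerMetadata
if isinstance(mm_metadata, str):       # tifffile may already decode this JSON
    mm_metadata = json.loads(mm_metadata)

with tifffile.TiffFile(ij_save_name) as tif:
    info = tif.imagej_metadata["Info"]

print(description, software, mm_metadata, json.loads(info))
```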



Answer 4:

You could try setting tags in the tag property of a TIFF image. This is an ImageFileDirectory object. See TiffImagePlugin.py.

Or, if you have libtiff installed, you can use the subprocess module to call the tiffset command to set a tag in the file after you have saved it. There are online references to the available tags.
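A minimal sketch of the subprocess approach. It assumes the `tiffset` tool from libtiff's utilities is on the PATH and uses its `-s tagnumber value` form; it also assumes a tag that libtiff already knows, such as ImageDescription (270). The helper names are hypothetical:

```python
import shutil
import subprocess


def tiffset_command(path, tag, value):
    """Build a tiffset invocation that sets one tag (hypothetical helper)."""
    return ["tiffset", "-s", str(tag), str(value), str(path)]


def set_tag_with_tiffset(path, tag, value):
    """Run tiffset on an already-saved file; fails loudly if it is missing."""
    if shutil.which("tiffset") is None:
        raise FileNotFoundError("tiffset (from libtiff's tools) is not on PATH")
    subprocess.run(tiffset_command(path, tag, value), check=True)


# e.g. set_tag_with_tiffset("image.tif", 270, "microscope=george")
```

Building the argument list separately from running it keeps the command easy to inspect and avoids shell quoting problems with values that contain spaces.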

According to this page:

If one needs more than 10 private tags or so, the TIFF specification suggests that, rather than using a large amount of private tags, one should instead allocate a single private tag, define it as datatype IFD, and use it to point to a so-called 'private IFD'. In that private IFD, one can next use whatever tags one wants. These private IFD tags do not need to be properly registered with Adobe, they live in a namespace of their own, private to the particular type of IFD.

Not sure if PIL supports this, though.