From the h5py docs, I see that I can cast an HDF dataset to another type using the astype method for datasets. This returns a context manager which performs the conversion on the fly.
However, I would like to read in a dataset stored as uint16 and then cast it to float32. Thereafter, I would like to extract various slices from this dataset in a different function as the cast type float32. The docs explain the use as
with dataset.astype('float32'):
    castdata = dataset[:]
This would cause the entire dataset to be read in and converted to float32, which is not what I want. I would like to have a reference to the dataset, but cast as float32, equivalent to numpy.astype. How do I create a reference to the .astype('float32') object so that I can pass it to another function for use?
An example:
import h5py as HDF
import numpy as np

intdata = (100*np.random.random(10)).astype('uint16')

# create the HDF dataset
def get_dataset_as_float():
    hf = HDF.File('data.h5', 'w')
    d = hf.create_dataset('data', data=intdata)
    print(d.dtype)
    # uint16
    with d.astype('float32'):
        # This won't work since the context expires. Returns a uint16 dataset reference
        return d
    # this works but causes the entire dataset to be read & converted
    # with d.astype('float32'):
    #     return d[:]
Furthermore, it seems like the astype context only applies when the data elements are accessed. This means that
def use_data():
    d = get_dataset_as_float()
    # this is a uint16 dataset
    # try to use it as a float32
    with d.astype('float32'):
        print(np.max(d))  # --> output is uint16
        print(np.max(d[:]))  # --> output is float32, but entire data is loaded
So is there not a numpy-esque way of using astype?
d.astype() returns an AstypeContext object. If you look at the source for AstypeContext you'll get a better idea of what's going on:
When you enter the AstypeContext, the ._local.astype attribute of your dataset gets updated to the new desired type, and when you exit the context it gets changed back to its original value.
You can therefore get more or less the behaviour you're looking for like this:
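A minimal sketch of that idea, assuming an h5py release in which astype() returns an AstypeContext and reads consult the private _local.astype attribute. The helper name get_dataset_as_type is made up for illustration, and since _local is not public API, other releases may ignore the attribute:

```python
import numpy as np


def get_dataset_as_type(d, dtype="float32"):
    """Return a second handle to dataset `d` whose reads are cast to `dtype`.

    Hypothetical helper. ASSUMPTION: `d` is an h5py.Dataset from a release
    where reads honour the private `_local.astype` attribute.
    """
    # Build a new Dataset object bound to the same HDF5 identifier, so the
    # original handle `d` keeps its own (unset) cast type.
    d_new = type(d)(d.id)
    # Private attribute: the output type that AstypeContext would normally
    # set on entry and reset on exit. Here it is simply left set.
    d_new._local.astype = np.dtype(dtype)
    return d_new
```

With the file from the question still open, d_new = get_dataset_as_type(d) gives a handle that can be passed to other functions and sliced there.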
When you now read from d_new, you will get float32 numpy arrays back rather than uint16.
Note that this doesn't update the .dtype attribute of d_new (which seems to be immutable). If you also wanted to change the dtype attribute, you'd probably need to subclass h5py.Dataset in order to do so.
The docs of astype seem to imply that reading it all into a new location is its purpose. Thus your return d[:] is the most reasonable choice if you are to reuse the float casting with many functions on separate occasions.
If you know what you need the casting for and only need it once, you could switch things around and do something like:
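One way this could look, as a sketch: the dataset is passed in as a parameter, read and cast once, and every function is applied to that single float copy. The helper name apply_as_float is hypothetical, and the cast is a plain NumPy conversion of the loaded values rather than the astype context manager, so it accepts any object with NumPy-style indexing (an h5py dataset or an ndarray):

```python
import numpy as np


def apply_as_float(dataset, *funcs, dtype="float32"):
    """Read `dataset` once, cast it to `dtype`, and apply each function.

    Hypothetical helper: `dataset` may be an h5py.Dataset or any object
    that supports NumPy-style indexing.
    """
    # dataset[...] loads the full contents; np.asarray then casts them
    # to the requested dtype (one cast, done in memory).
    data = np.asarray(dataset[...], dtype=dtype)
    # Apply every function to the same float array and collect the results.
    return tuple(f(data) for f in funcs)
```

If the dataset comes from an h5py file, calling the helper inside a with HDF.File('data.h5', 'r') as hf: block keeps the file open only for the duration of the read.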
In any case, you want to make sure that hf is closed before leaving the function, or else you will run into problems later on.
In general, I would suggest separating the casting and the loading/creating of the dataset entirely and passing the dataset as one of the function's parameters.
The above can be called as follows:
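For illustration, a call might look like the following. The helper is restated in compact form only so that the snippet runs on its own; its name apply_as_float is hypothetical, and an in-memory uint16 array stands in for the HDF5 dataset:

```python
import numpy as np


def apply_as_float(dataset, *funcs, dtype="float32"):
    # Hypothetical helper: read once, cast once, apply every function.
    data = np.asarray(dataset[...], dtype=dtype)
    return tuple(f(data) for f in funcs)


# Stand-in for the dataset: ten uint16 values, built as in the question.
intdata = (100 * np.random.random(10)).astype("uint16")

# Minimum, maximum and mean, each computed on the float32 copy.
minimum, maximum, mean = apply_as_float(intdata, np.min, np.max, np.mean)
print(minimum, maximum, mean)
```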