How do I traverse all the groups and datasets of an hdf5 file using h5py?
I want to retrieve all the contents of the file from a common root using a for loop or something similar.
visit() and visititems() are your friends here. Cf. http://docs.h5py.org/en/latest/high/group.html#Group.visit. Note that an h5py.File is also an h5py.Group. Example (not tested):
```python
import h5py

def visitor_func(name, node):
    if isinstance(node, h5py.Dataset):
        # node is a dataset
        print("dataset:", name)
    else:
        # node is a group
        print("group:", name)

with h5py.File('myfile.h5', 'r') as f:
    f.visititems(visitor_func)
```
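A self-contained variant of the same idea, which collects paths into lists instead of printing them, may make the behavior easier to check. The file name `example.h5` and its layout are made up for illustration. Both visit() (callback gets only the name) and visititems() (callback gets name and object) walk the whole tree recursively, pass names relative to the object they were called on, skip the starting group itself, and stop early if the callback returns anything other than None.

```python
import h5py

# Build a small throwaway file (hypothetical name and layout, for illustration only).
# create_dataset() with a slash-separated path creates the intermediate groups.
with h5py.File('example.h5', 'w') as f:
    f.create_dataset('group1/group2/dataset1', data=[1, 2, 3])
    f.create_dataset('group1/group3/dataset3', data=[4.0, 5.0])

groups, datasets = [], []

def collect(name, node):
    # name is the path relative to the object visititems() was called on
    if isinstance(node, h5py.Dataset):
        datasets.append(name)
    elif isinstance(node, h5py.Group):
        groups.append(name)
    # no return value (None) tells visititems() to keep walking

with h5py.File('example.h5', 'r') as f:
    f.visititems(collect)

print(groups)
print(datasets)
```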
Well this is kind of an old thread but I thought I'd contribute anyway. This is what I did in a similar situation. For a data structure set up like this:
```
[group1]
    [group2]
        dataset1
        dataset2
    [group3]
        dataset3
        dataset4
```
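I used:
```python
import h5py

datalist = []

def returnname(name):
    # visit() stops at the first callback that returns something other than None,
    # so this returns the first dataset name that has not been collected yet
    if 'dataset' in name and name not in datalist:
        return name
    return None

with h5py.File('myfile.h5', 'r') as f:
    while True:
        name = f['group1'].visit(returnname)
        if name is None:
            # every matching name has been collected
            break
        datalist.append(name)
```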
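The loop above calls visit() once per dataset, rescanning the tree each time, and it relies on the substring 'dataset' appearing in every dataset name. A single-pass alternative is sketched below, assuming you want every dataset under group1 regardless of its name. It checks the object type with visititems() instead; the file name `layout.h5` is hypothetical and only recreates the layout shown above.

```python
import h5py

# Recreate the layout from the answer in a throwaway file (hypothetical name)
with h5py.File('layout.h5', 'w') as f:
    for path in ['group1/group2/dataset1', 'group1/group2/dataset2',
                 'group1/group3/dataset3', 'group1/group3/dataset4']:
        f.create_dataset(path, data=[0])

datalist = []

def add_dataset(name, obj):
    # Keep datasets only; name is relative to group1, e.g. 'group2/dataset1'
    if isinstance(obj, h5py.Dataset):
        datalist.append(name)

with h5py.File('layout.h5', 'r') as f:
    f['group1'].visititems(add_dataset)

print(datalist)
```

Because the callback never returns a value, the walk visits every object once.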
I haven't found an h5py equivalent for os.walk.
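A small recursive generator over Group.items() gets fairly close to os.walk. `h5walk` below is a hypothetical helper, not part of h5py. It yields `(group_path, subgroup_names, dataset_names)` top-down, and it assumes the file contains no soft- or hard-link cycles, which would make it recurse forever. The file name `walk.h5` is also made up for the example.

```python
import h5py

def h5walk(group):
    """Yield (path, subgroup_names, dataset_names) for group and all subgroups, top-down.

    Hypothetical helper modeled on os.walk; assumes the file has no link cycles.
    """
    subgroups, datasets = [], []
    for key, obj in group.items():
        # Group.items() yields None for dangling soft links, so they fall through here
        if isinstance(obj, h5py.Group):
            subgroups.append(key)
        elif isinstance(obj, h5py.Dataset):
            datasets.append(key)
    yield group.name, subgroups, datasets
    # Recurse after yielding, so removing names from subgroups prunes the walk
    for key in subgroups:
        yield from h5walk(group[key])

# Throwaway file for the usage example (hypothetical name and layout)
with h5py.File('walk.h5', 'w') as f:
    f.create_dataset('group1/group2/dataset1', data=[1])
    f.create_dataset('group1/group3/dataset3', data=[2])
    f.create_dataset('top', data=[3])

with h5py.File('walk.h5', 'r') as f:
    walked = list(h5walk(f))
    for path, groups, dsets in walked:
        print(path, groups, dsets)
```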
This is a pretty old thread, but I found a solution to basically replicating the h5ls command in Python:
```python
import h5py

class H5ls:
    def __init__(self):
        # Store an empty list for dataset names
        self.names = []

    def __call__(self, name, h5obj):
        # groups have no dtype attribute but datasets do, so we can search on this
        # (committed/named datatypes also have one)
        if hasattr(h5obj, 'dtype') and name not in self.names:
            self.names += [name]
        # no return value (None), so visititems() keeps walking the whole file

if __name__ == "__main__":
    filename = 'myfile.h5'  # hypothetical path: point this at your own file
    df = h5py.File(filename, 'r')
    h5ls = H5ls()
    # this will now visit all objects inside the hdf5 file and store datasets in h5ls.names
    df.visititems(h5ls)
    df.close()
```
This code iterates through the whole HDF5 file and stores the names of all the datasets in h5ls.names. Hope this helps!
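To get a little closer to what `h5ls -r` prints, the same visititems() approach can also report each dataset's shape and dtype. This is a rough sketch that only approximates h5ls and does not reproduce its exact output format. `h5ls_lines` and the file name `ls_demo.h5` are hypothetical.

```python
import h5py
import numpy as np

def h5ls_lines(h5file):
    """Return one line per object: absolute path, kind, and shape/dtype for datasets."""
    lines = []

    def describe(name, obj):
        # obj.name is the absolute path, e.g. '/grp/matrix'
        if isinstance(obj, h5py.Dataset):
            lines.append(f"{obj.name}  Dataset {obj.shape} {obj.dtype}")
        elif isinstance(obj, h5py.Group):
            lines.append(f"{obj.name}  Group")

    h5file.visititems(describe)
    return lines

# Throwaway file for the usage example (hypothetical name and contents)
with h5py.File('ls_demo.h5', 'w') as f:
    f.create_dataset('grp/matrix', data=np.zeros((3, 4), dtype='float32'))
    f.create_dataset('vector', data=np.arange(5, dtype='int64'))

with h5py.File('ls_demo.h5', 'r') as f:
    listing = h5ls_lines(f)

for line in listing:
    print(line)
```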