Opinions on NetCDF vs HDF5 for storing scientific data?

Posted 2019-03-08 09:56

Question:

Anyone out there have enough experience w/ NetCDF and HDF5 to give some pluses / minuses about them as a way of storing scientific data?

I've used HDF5 and would like to read/write via Java but the interface is essentially a wrapper around the C libraries, which I have found confusing, so NetCDF seems intriguing but I know almost nothing about it.

edit: my application is "only" for datalogging, so that I get a file that has a self-describing format. Important features for me are being able to add arbitrary metadata, having fast write access for appending to byte arrays, and having single-writer / multiple-reader concurrency (strongly preferred but not a must-have. NetCDF docs say they have SWMR but don't say whether they support any mechanism for ensuring that two writers can't open the same file at once with disastrous results). I like the hierarchical aspect of HDF5 (in particular I love the directed-acyclic-graph hierarchy, much more flexible than a "regular" filesystem-like hierarchy), am reading the NetCDF docs now... if it only allows one dataset per file then it probably won't work for me. :(
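On the two-writers concern: one application-level guard, independent of whatever NetCDF or HDF5 do internally, is a sidecar lock file created atomically before the data file is opened for writing. The following is a minimal sketch in plain Python under that assumption; the `SingleWriterLock` class and the `.lock` suffix are hypothetical names, not part of either library.

```python
import os
import tempfile


class SingleWriterLock:
    """Advisory lock: a sidecar '<datafile>.lock' created with O_EXCL."""

    def __init__(self, data_path):
        self.lock_path = data_path + ".lock"  # hypothetical naming convention
        self._fd = None

    def acquire(self):
        """Return True if we became the writer, False if a lock file already exists."""
        try:
            # O_CREAT | O_EXCL fails if the lock file is already there.
            self._fd = os.open(self.lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False
        os.write(self._fd, str(os.getpid()).encode())  # record the owner for debugging
        return True

    def release(self):
        """Close and delete the lock file if we hold it."""
        if self._fd is not None:
            os.close(self._fd)
            os.remove(self.lock_path)
            self._fd = None


def demo():
    """Two would-be writers compete for the same (hypothetical) data file."""
    with tempfile.TemporaryDirectory() as tmp:
        data_path = os.path.join(tmp, "log.h5")
        first = SingleWriterLock(data_path)
        second = SingleWriterLock(data_path)
        got_first = first.acquire()
        got_second = second.acquire()         # refused while the first writer holds the lock
        first.release()
        got_after_release = second.acquire()  # succeeds once the lock file is gone
        second.release()
        return got_first, got_second, got_after_release
```

This is advisory only: it works if every writer checks the lock, and a writer that crashes leaves a stale lock file that has to be cleaned up by hand or by checking the recorded PID.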

update: looks like NetCDF-Java reads from netCDF-4 files but only writes to netCDF-3 files, which don't support hierarchical groups. darn.

update 2009-Jul-14: I am starting to get really upset with HDF5 in Java. The library available isn't that great and it has some major stumbling blocks that have to do with Java's abstraction layers (compound data types). A great file format for C but looks like I just lose. >:(

Answer 1:

I strongly suggest HDF5 instead of NetCDF. NetCDF is flat, and it gets very messy after a while if you are not able to classify stuff. Of course classification is also a matter of debate, but at least you have this flexibility.

We performed a careful evaluation of HDF5 vs. NetCDF when I wrote Q5Cost, and the final result was for HDF5 hands down.



Answer 2:

I'll have to admit using HDF5 is much easier in the long run. It's not hard to get simple data structures into NetCDF format, but manipulating them down the road is kind of a pain.

The "H" in HDF5 stands for "hierarchical", which translated (for me anyway) into a REALLY easy way to manipulate data, by just moving nodes around and referencing nodes from other places.

Can I ask what kind of project this is? I use these both for a lot of HPC scientific modeling tasks. Can I assume you're doing the same? If so, the trend I'm seeing is people moving to HDF5, but that might be different in your particular domain.

However you end up going, best of luck!



Answer 3:

NetCDF, starting with version 4.0 (2008), can read and write most HDF5 files, and provides access to the hierarchical features of HDF5 via the enhanced data model.

HDF5 is extremely feature-rich, and has some great performance features.

NetCDF has a simpler API, and a much wider tool base. There are many tools that handle netCDF data.



Answer 4:

I know this is an older post, and the original poster has indicated they've moved on, but for anyone that ends up here...the netCDF-Java library (as of 4.3.13) has netCDF-4 write support via the netCDF C library. It's still in beta, but it does work and feedback is certainly appreciated!

Please see the netCDF-Java reference docs for more details.



Answer 5:

Try writing some small sample application in each, and compare the experience. If future scalability of your code to parallel execution (via MPI or the like) is important to you, I know that HDF has a parallel implementation, which people are constantly working to improve. I'm not sure about NetCDF.

Late edit: For NetCDF, there is now Parallel NetCDF from Argonne. It works quite well, and the development team is quite active in improving it further.



Answer 6:

1) The netCDF-4 C library is a layer on top of the HDF5 C library. Its API is considered simpler than the HDF5 API, but in the end you have pretty much the same functionality. NetCDF does not support graphs, but HDF5 does. In fact, HDF5 does not prevent cycles in your graph, I think.

2) The HDF Group has a Java API on top of the HDF5 C library.

3) Unidata has the NetCDF-Java library, which is pure Java but can only read HDF5.
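To make the tree vs. graph distinction in point 1 concrete, here is a toy model in plain Python. No HDF5 or NetCDF library is involved; the `Group` class and `has_cycle` helper are hypothetical and only mimic the idea of groups holding named links to objects.

```python
class Group:
    """Toy stand-in for a group: a table of named links to other objects."""

    def __init__(self):
        self.links = {}

    def link(self, name, target):
        """Add a named link to `target` and return the target."""
        self.links[name] = target
        return target


def has_cycle(group, _on_path=None):
    """Depth-first search: True if following links can return to a group on the current path."""
    on_path = _on_path if _on_path is not None else set()
    if id(group) in on_path:
        return True
    on_path.add(id(group))
    for target in group.links.values():
        if isinstance(target, Group) and has_cycle(target, on_path):
            return True
    on_path.discard(id(group))
    return False


root = Group()
raw = root.link("raw", Group())
views = root.link("views", Group())
samples = raw.link("samples", [1.0, 2.0, 3.0])  # a list stands in for a dataset
views.link("latest", samples)                   # second path to the same object: a graph, not a tree

# Both paths reach the very same object, not a copy.
shared = root.links["raw"].links["samples"] is root.links["views"].links["latest"]
acyclic_before = not has_cycle(root)

views.link("back_to_root", root)                # nothing in this model stops a link that closes a loop
cyclic_after = has_cycle(root)
```

In a strictly tree-shaped layout each object has exactly one parent, so the `views/latest` link would have to be a copy. Code that walks a graph-shaped file recursively needs a visited check like the one in `has_cycle` to avoid looping forever.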



Answer 7:

NetCDF, which translates HDF5 into its own data model, looks and works great... until you find out that NetCDF doesn't support unsigned values! See also my question on how to detect unsigned values in existing HDF5 files using NetCDF.

Update: Actually, it turns out that although NetCDF-3 doesn't support unsigned values, NetCDF-4 does, even though the NetCDF API in Java for determining signedness is a little convoluted.
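When unsigned data ends up in a signed type, as it must in NetCDF-3, the reader has to reinterpret the bits. A minimal sketch in plain Python, assuming the values were stored as signed 8-bit integers and that the reader already knows they are meant to be unsigned (for example from an attribute such as the `_Unsigned = "true"` convention); the `as_unsigned` helper is a hypothetical name.

```python
import struct


def as_unsigned(values, bits=8):
    """Reinterpret two's-complement signed integers of the given width as unsigned."""
    mask = (1 << bits) - 1
    return [v & mask for v in values]


# Four bytes on disk, read back through a signed 8-bit type.
stored = list(struct.unpack("4b", bytes([0, 127, 128, 255])))
# stored == [0, 127, -128, -1]

recovered = as_unsigned(stored)
# recovered == [0, 127, 128, 255]
```

The same masking works for 16- and 32-bit values by passing `bits=16` or `bits=32`. In Java the result needs a wider type, since Java has no unsigned byte, short or int.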



Tags: hdf5 netcdf