HDF5.NET

For years, our company has been building a sophisticated simulation management system for one of our clients. We take a traditional high-performance-computing simulation built for Linux clusters and expose it to users essentially as a service through a web application. Users can go to what appears to be a global database of simulations, create scenarios, view outputs, and execute simulations transparently, while we handle all the details of data management, job scheduling, file format translation, and visualization.

Like many HPC simulations, this one uses the fantastic HDF5 library for storing huge amounts of simulation data on disk. (Inputs for this simulation are just a few hundred megabytes, but outputs are between 1 gigabyte and 2 terabytes for typical projects. A Palladium-led architecture review recommended changes to the schema used in these files, leading to 10x performance improvements, but that is another story for another day.) HDF5 is a file format well suited to very large arrays of data.

The web application to manage these simulations is written in .NET and hosted on Windows machines. .NET is an important part of the ecosystem for enterprise applications at this client, and a good choice for web development, but it’s a little uncommon in the world of scientific programming. There were no great .NET clients for reading HDF5 data, and in fact the .NET libraries written by The HDF Group were unusable.

We wrote our own thin HDF5 wrappers to access data with minimal memory allocation and copying. One of .NET’s great strengths is its powerful P/Invoke mechanism for calling into C APIs. But constructing all those API wrappers was painstaking and error-prone. Like many companies, our client is a heavy user of open source code but does not allow its staff or consultants to contribute their work back to the community, so crowd-sourcing the effort wasn’t an option.

So it’s with great excitement that I read today that The HDF Group has rebooted its efforts to provide a great HDF5 experience in .NET. They will focus on faithfully and efficiently reproducing their C API in C#, leaving the more idiomatic .NET patterns to others.

Cynics might argue that it would be best if The HDF Group stopped altogether providing HDF5 APIs in languages other than C and FORTRAN. The mixed record of The HDF Group’s attempts and the success of the Python family of interfaces (PyTables, h5py, pandas) lend some credibility to that argument. Other communities have made a deliberate decision: “So, in ZeroMQ, we aimed to make it easy to write bindings on top of the core library, and we stopped trying to make those bindings ourselves.” (Pieter Hintjens, ZeroMQ, p. 334)

Our goal is not to settle this question (for HDF5), but to develop a proposal for a .NET facility on top of the core HDF5 library, which would make it easy to write more specialized or more high-level .NET APIs.

It looks like the good folks at ILNumerics.NET are involved in the effort, and will no doubt take a stab at these more high-level APIs.

If working with scientific data and .NET is something your company does, we’d love to hear about it and see how we can help. From HPC programming to managing large datasets to advanced visualization, turning simulations into products is what we do.
