@luccadibe

This PR aims to fix and optimize the HDF5Reader implementation in SystemDS so that it can correctly read the So2Sat LCZ42 dataset (https://mediatum.ub.tum.de/1454690).

To that end, I added support for the HDF5 filter pipeline and attribute message types.
N-dimensional datasets with n > 2 are flattened into 2D matrices.
I also added support for inferring the HDF5 format from the .h5 file extension.
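
To make the flattening concrete, here is a minimal sketch of one possible mapping. It is not the Java code from this PR, and it assumes that the first dimension is kept as rows while all remaining dimensions are collapsed into columns in row-major order. The function names `flatten_shape` and `flat_column` and the example shape are hypothetical, chosen only for illustration.

```python
from math import prod


def flatten_shape(dims):
    """Map an n-d shape to (rows, cols).

    Assumption: the first dimension becomes the row count and the
    remaining dimensions are multiplied together into the column count.
    """
    if not dims:
        raise ValueError("shape must have at least one dimension")
    return dims[0], prod(dims[1:])


def flat_column(index, dims):
    """Column of an n-d index after flattening.

    This is the row-major offset of index[1:] within the collapsed
    trailing dimensions dims[1:].
    """
    if len(index) != len(dims):
        raise ValueError("index and shape must have the same rank")
    col = 0
    for i, d in zip(index[1:], dims[1:]):
        if not 0 <= i < d:
            raise IndexError("index out of bounds for shape")
        col = col * d + i
    return col


# Hypothetical 4-d dataset: 4 samples of 32 x 32 patches with 10 channels.
shape = (4, 32, 32, 10)
rows, cols = flatten_shape(shape)
# Element (2, 1, 0, 3) ends up in row 2 at the column computed below.
col = flat_column((2, 1, 0, 3), shape)
```

Under this assumption every sample stays on its own row, so row-wise operations on the resulting matrix still operate per sample.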

Apologies for the massive PR 🤕.
I benchmarked the performance of the new implementation and shared the results in this repo:
https://github.com/luccadibe/systemds-hdf5-reader-benchmark

The code still needs some work regarding code style and formatting. I am not sure whether I set up the formatter correctly as described in CONTRIBUTING.md; in some files I was getting a huge diff, so I tried to format only what I touched.

I am unsure how best to split this into multiple PRs, or whether that is even wanted.
I would appreciate some general feedback on this.
