Add support for multiple and n>2 dimensional HDF5 Datasets #2394
+1,821
−365
This PR aims to fix and optimize the HDF5Reader implementation in SystemDS so that it can correctly read the So2Sat LCZ42 dataset (https://mediatum.ub.tum.de/1454690).
For this, I added support for the HDF5 filter pipeline and attribute message types;
n-dimensional datasets with n>2 are flattened into 2D matrices.
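To make the flattening concrete, here is a minimal sketch of one way an n-dimensional shape could map to 2D. It assumes the first dimension is kept as the row count and all remaining dimensions are collapsed into columns in row-major order; the PR text does not spell out the exact layout, so treat that as an assumption. The function names `flattened_shape` and `flat_index` and the example shape are hypothetical and are not taken from the SystemDS code.

```python
from math import prod


def flattened_shape(dims):
    """Collapse an n-dimensional shape into (rows, cols).

    Assumption: the first dimension becomes the rows and the remaining
    dimensions are multiplied together to form the columns.
    """
    if not dims:
        raise ValueError("shape must have at least one dimension")
    # prod() of an empty sequence is 1, so a 1-D shape (n,) becomes (n, 1).
    return dims[0], prod(dims[1:])


def flat_index(index, dims):
    """Map an n-dimensional index to (row, col) in the flattened matrix."""
    if len(index) != len(dims):
        raise ValueError("index and shape must have the same rank")
    for i, d in zip(index, dims):
        if not 0 <= i < d:
            raise IndexError(f"index {i} out of range for dimension of size {d}")
    col = 0
    # Row-major (C order): the last dimension varies fastest.
    for i, d in zip(index[1:], dims[1:]):
        col = col * d + i
    return index[0], col


# Hypothetical 4-D dataset shape, used only for illustration.
shape = (4, 32, 32, 10)
print(flattened_shape(shape))
print(flat_index((1, 2, 3, 4), shape))
```

With this layout, each row of the resulting matrix holds one complete sample, which is usually what downstream matrix operations expect.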
I also added support for inferring the HDF5 format from the .h5 file extension.
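As a rough illustration of extension-based inference, the sketch below maps a file suffix to a format name. The mapping table, the `infer_format` helper, and the case-insensitive matching are assumptions made for this example; only the `.h5` extension is mentioned in this PR, and the actual SystemDS logic may differ.

```python
from pathlib import PurePath

# Only ".h5" is named in the PR description; other entries would be guesses.
EXTENSION_TO_FORMAT = {".h5": "hdf5"}


def infer_format(path, default=None):
    """Return the format implied by the file extension, or `default`."""
    # Assumption: matching ignores case, so "DATA.H5" is treated like "data.h5".
    suffix = PurePath(path).suffix.lower()
    return EXTENSION_TO_FORMAT.get(suffix, default)


print(infer_format("data/training.h5"))
print(infer_format("data/training.csv", default="unknown"))
```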
Apologies for the massive PR 🤕.
I benchmarked the performance of the new implementation and shared the results in this repo:
https://github.com/luccadibe/systemds-hdf5-reader-benchmark
The code still needs some work on style and formatting (I am not sure I set up the formatter correctly as described in CONTRIBUTING.md; in some files I was getting a huge diff, so I tried to format only what I touched).
I am unsure how best to split this into multiple PRs, or whether that is even wanted.
I would appreciate some general feedback on this.