Skip chunk coordinate enumeration in resize when array is only growing (#3650) #3702
+72
−1
Motivation
Array.resize() enumerates all chunk coordinates when delete_outside_chunks is True (its default value): the chunk coordinates for the old and new shapes are collected into Python set objects to compute which chunks to delete. #3650 notes that this is unbounded in memory, even when the array is only growing and no chunks need deletion.
In our case the array has approximately 220 million chunks, and we want to add a new time step. Building these two sets of tuples pushes memory use past 20 GB and does not complete (at least when deployed in a no-swap environment; I never left my laptop running long enough to see it finish). Furthermore, the set difference will always be empty, because growing a dimension can't produce chunks outside the new shape.
Fix
Rather than always running the O(total_chunks) check, first test whether every dimension of new_shape is >= the corresponding dimension of the old shape. If the array is only growing, skip the enumeration.
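As a rough illustration of the idea, here is a minimal standalone sketch. It is not the code in this PR and does not use zarr internals: the helper names (is_grow_only, chunk_coords, chunks_to_delete) are hypothetical, and a regular chunk grid is assumed.

```python
from itertools import product


def is_grow_only(old_shape, new_shape):
    # True when no dimension shrinks (hypothetical helper, not zarr API).
    return len(old_shape) == len(new_shape) and all(
        n >= o for o, n in zip(old_shape, new_shape)
    )


def chunk_coords(shape, chunks):
    # Yield every chunk coordinate of a regular grid (ceil division per dim).
    return product(*(range(-(-s // c)) for s, c in zip(shape, chunks)))


def chunks_to_delete(old_shape, new_shape, chunks):
    # Fast path: a grow-only resize cannot leave any chunk outside the new
    # shape, so skip the O(total_chunks) enumeration entirely.
    if is_grow_only(old_shape, new_shape):
        return set()
    # Slow path: the naive set difference described in the Motivation section.
    old = set(chunk_coords(old_shape, chunks))
    new = set(chunk_coords(new_shape, chunks))
    return old - new
```

With this shape of check, appending a time step costs one comparison per dimension regardless of how many chunks the array has; only resizes that shrink at least one dimension fall through to the enumeration.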
This is a targeted mitigation rather than a complete solution to the problem described in #3650. A more complete solution would construct the outside chunk coordinates from the shape diff, rather than naively enumerating all coordinates.
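One possible shape for that more complete approach is sketched below. This is an assumption about how it could work, not something implemented in this PR: outside_chunk_coords is a hypothetical name, and a regular chunk grid is assumed. The sketch partitions the outside region by the first dimension in which a chunk falls beyond the new grid, so each outside chunk is produced exactly once and nothing inside the new shape is ever enumerated.

```python
from itertools import product


def outside_chunk_coords(old_shape, new_shape, chunks):
    # Chunk-grid extents (ceil division) for the old and new shapes.
    old_grid = [-(-s // c) for s, c in zip(old_shape, chunks)]
    new_grid = [-(-s // c) for s, c in zip(new_shape, chunks)]
    for d in range(len(old_grid)):
        ranges = []
        for i, (o, n) in enumerate(zip(old_grid, new_grid)):
            if i < d:
                # Earlier dims stay inside both grids (avoids duplicates).
                ranges.append(range(min(o, n)))
            elif i == d:
                # Dim d is the first one beyond the new grid; empty if growing.
                ranges.append(range(n, o))
            else:
                # Later dims span the whole old grid.
                ranges.append(range(o))
        yield from product(*ranges)
```

The work done here is proportional to the number of chunks actually removed, so a grow-only resize yields nothing without needing a separate fast path.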
TODO:
docs/user-guide/*.md
changes/