Add vectorized array_indices_to_chunk_dim to eliminate loops #5
Open
maxrjones wants to merge 1 commit into jhamman:feature/rectilinear-chunk-grid from
zarr-developers#3534 refactored `IntArrayDimIndexer` and `CoordinateIndexer` to use `chunk_grid.array_index_to_chunk_coord()` for mapping array indices to chunk coordinates. This replaced the old vectorized `dim_sel // dim_chunk_len` with a Python for-loop calling `array_index_to_chunk_coord` per element, which introduces significant overhead for both regular and rectilinear chunk grids.

This PR adds `array_indices_to_chunk_dim()`, a vectorized method that maps an entire array of indices to chunk coordinates along a single dimension:

- `RegularChunkGrid`: `indices // chunk_size` (single numpy operation, O(n))
- `RectilinearChunkGrid`: `np.searchsorted(cumsum, indices, side='right') - 1` (vectorized binary search, O(n log m))

The two hot loops in `indexing.py` are replaced with one-line calls to this method, restoring the original performance characteristics for regular chunks while providing efficient vectorized indexing for rectilinear chunks.

New tests

- `test_chunk_grids/test_common.py`: 4 tests for `_is_nested_sequence()` covering basic sequences, non-sequences, `ChunkGrid` instances, and empty iterables
- `test_chunk_grids/test_rectilinear.py`: 5 end-to-end indexing tests for rectilinear arrays: slice at exact chunk boundaries, strided slicing, fancy indexing with `oindex`/`vindex`, boolean masks, and block indexing
- `test_indexing.py`: `TestRectilinearIndexing` class with 16 tests covering basic selection (1D/2D, strided), orthogonal selection (boolean/integer, mixed), coordinate/vindex selection, block selection, and set selection on rectilinear arrays with `chunks=[[5, 10, 15], [10, 20, 30, 40]]`
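
For reviewers who want to check the two mappings from the description by hand, here is a minimal standalone sketch. It is not the code in this PR: the function names `regular_chunk_dim` and `rectilinear_chunk_dim` are hypothetical, and it assumes that `cumsum` in the description means the chunk start offsets with a leading 0, which is what makes `side='right'` followed by `- 1` land on the containing chunk. The chunk sizes come from the first dimension of the test layout above.

```python
import numpy as np


def regular_chunk_dim(indices, chunk_size):
    """Map array indices to chunk coordinates for equally sized chunks."""
    # A single floor division over the whole index array.
    return np.asarray(indices) // chunk_size


def rectilinear_chunk_dim(indices, chunk_sizes):
    """Map array indices to chunk coordinates for variable-size chunks."""
    # Assumption: offsets are chunk start positions with a leading 0,
    # e.g. chunk_sizes [5, 10, 15] -> offsets [0, 5, 15, 30].
    offsets = np.concatenate(([0], np.cumsum(chunk_sizes)))
    # side="right" counts the offsets <= index; subtracting 1 gives the
    # chunk whose start is at or before the index.
    # No bounds checking is done here: an index past the last chunk
    # returns a coordinate equal to the number of chunks.
    return np.searchsorted(offsets, np.asarray(indices), side="right") - 1


def rectilinear_chunk_dim_loop(indices, chunk_sizes):
    """Per-element reference version, for comparison only."""
    offsets = np.concatenate(([0], np.cumsum(chunk_sizes)))
    out = []
    for idx in indices:
        chunk = 0
        # Advance while the next chunk starts at or before idx.
        while chunk + 1 < len(chunk_sizes) and offsets[chunk + 1] <= idx:
            chunk += 1
        out.append(chunk)
    return np.array(out)


indices = np.array([0, 4, 5, 14, 15, 29])
print(regular_chunk_dim(indices, 10))
print(rectilinear_chunk_dim(indices, [5, 10, 15]))
print(rectilinear_chunk_dim_loop(indices, [5, 10, 15]))
```

With chunk sizes `[5, 10, 15]`, indices 5 and 15 sit exactly on chunk boundaries and map to chunks 1 and 2, which is the case the boundary-slice tests are meant to cover.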