@Illviljan commented Dec 6, 2025

  • Closes #6528 (cumsum drops index coordinates)
  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

The non-flox version reduces the chunk size significantly (from 5 to 2 in this example):

import xarray as xr

x = xr.DataArray([1, 1, 1, 1, 1], name="x").chunk()
grp_idx = xr.DataArray([-1, 0, 0, -1, 1])
with xr.set_options(use_flox=False):
    print(x.groupby(grp_idx).cumsum())
<xarray.DataArray 'x' (dim_0: 5)> Size: 40B
dask.array<getitem, shape=(5,), dtype=int64, chunksize=(2,), chunktype=numpy.ndarray>
Dimensions without coordinates: dim_0

With flox the chunksize is retained:

x = xr.DataArray([1, 1, 1, 1, 1], name="x").chunk()
grp_idx = xr.DataArray([-1, 0, 0, -1, 1])
with xr.set_options(use_flox=True):
    print(x.groupby(grp_idx).cumsum())
<xarray.DataArray 'x' (dim_0: 5)> Size: 40B
dask.array<_finalize_scan, shape=(5,), dtype=int64, chunksize=(5,), chunktype=numpy.ndarray>
Dimensions without coordinates: dim_0
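
The outputs above show only the chunk layout, not the computed values. As a rough illustration of what a grouped cumulative sum computes on this input, here is a plain-Python sketch that involves no xarray, dask, or flox. The helper name `grouped_cumsum` is hypothetical and is not part of any of those libraries.

```python
from collections import defaultdict


def grouped_cumsum(values, labels):
    """Running sum of `values`, kept separately for each label.

    Hypothetical helper for illustration only; not an xarray or flox API.
    """
    running = defaultdict(int)  # label -> sum seen so far within that group
    out = []
    for value, label in zip(values, labels):
        running[label] += value
        out.append(running[label])
    return out


# Same data and group labels as in the examples above.
result = grouped_cumsum([1, 1, 1, 1, 1], [-1, 0, 0, -1, 1])
print(result)  # [1, 1, 2, 2, 1]
```

The result has the same length and order as the input, which is why a grouped scan, unlike a grouped reduction, has no inherent reason to change the chunk layout.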

Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
@github-actions bot added the topic-DataTree label Dec 13, 2025
**kwargs,
)

# Prefer Dataset.func(...) over Dataset.reduce(func, ...):
Contributor

Why is this needed?

cc @TomNicholas

Contributor Author

@Illviljan Dec 15, 2025

Because of #6528 and #10987 (comment). This PR fixes .cumsum so that Datasets no longer drop their coords, so DataTree must also call ds.cumsum instead of ds.reduce("cumsum"); otherwise the coordinates would still be dropped.
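
The dispatch rule described here can be sketched without xarray. The toy below assumes a stand-in class `ToyDataset` (hypothetical, not xarray's `Dataset`) whose generic `reduce` path drops coordinates while its dedicated `cumsum` method keeps them. `apply_by_name` is likewise a made-up helper that prefers the dedicated method when one exists; it is not the code in this PR.

```python
from itertools import accumulate


class ToyDataset:
    """Hypothetical stand-in for a dataset: a list of values plus coordinates."""

    def __init__(self, values, coords=None):
        self.values = list(values)
        self.coords = dict(coords or {})

    def reduce(self, func):
        # Generic path: applies func to the raw values and, in this toy,
        # returns a result without coordinates.
        return ToyDataset(func(self.values))

    def cumsum(self):
        # Dedicated path: same values, but coordinates are carried over.
        return ToyDataset(accumulate(self.values), coords=self.coords)


def apply_by_name(ds, name, fallback):
    """Prefer ds.<name>() when it exists; otherwise use ds.reduce(fallback)."""
    method = getattr(ds, name, None)
    if callable(method):
        return method()
    return ds.reduce(fallback)


def cumsum_list(values):
    return list(accumulate(values))


ds = ToyDataset([1, 2, 3], coords={"x": [10, 20, 30]})
via_method = apply_by_name(ds, "cumsum", cumsum_list)
via_reduce = ds.reduce(cumsum_list)

print(via_method.values, via_method.coords)  # [1, 3, 6] {'x': [10, 20, 30]}
print(via_reduce.values, via_reduce.coords)  # [1, 3, 6] {}
```

Both paths produce the same values; only the method path keeps the coordinates, which is the reason given above for routing DataTree through ds.cumsum.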

)
@pytest.mark.parametrize("func", ["cumsum", "cumprod"])
- def test_reduce_cumsum_test_dims(self, reduct, expected, func) -> None:
+ def test_reduce_cumsum_test_dims(self, reduct, func) -> None:
Contributor

Can we also assert that coordinates and indexes are preserved, please?

with xr.set_options(use_flox=use_flox):
if use_dask:
ds = ds.chunk()
if use_lazy_group_idx:
Contributor

use_lazy_group_idx is always False.

Labels: topic-DataTree, topic-groupby
