Conversation

@tadeja
Contributor

@tadeja tadeja commented Jan 23, 2026

Rationale for this change

See issue #48961
Pandas 3.0.0 changes the default string storage type; see https://github.com/pandas-dev/pandas/pull/62118/changes
and https://pandas.pydata.org/docs/whatsnew/v3.0.0.html#dedicated-string-data-type-by-default

What changes are included in this PR?

Updating several doctest examples from string to large_string.

Are these changes tested?

Yes, locally.

Are there any user-facing changes?

No.

Closes #48961

@AlenkaF
Member

AlenkaF commented Jan 26, 2026

Thank you @tadeja for looking into this!

One question regarding the bump of the Python version in the Sphinx & Numpydoc job: I think it would be good if the examples worked for users on either new or old pandas versions. What if we use ... (ELLIPSIS) instead of changing the string type? Or, even better, we could avoid pandas where possible and create a pyarrow table directly, like so:

arrow/python/pyarrow/table.pxi

Lines 1812 to 1814 in 95a3ed4

>>> table = pa.Table.from_arrays([[2, 4, 5, 100],
...                               ["Flamingo", "Horse", "Brittle stars", "Centipede"]],
...                              names=['n_legs', 'animals'])

@rok
Member

rok commented Jan 26, 2026

Agreed that it doesn't make sense for us to "test Pandas logic" especially in our docs. Agreed with @AlenkaF to instantiate the table in pyarrow. Using ellipsis in this case would hide the type and potentially increase user confusion :).

@AlenkaF
Member

AlenkaF commented Jan 26, 2026

Note that some examples demonstrate conversion from pandas to pyarrow, so in those cases we might remove the string column and keep only the integer ones?

@rok rok removed request for assignUser, jonkeane and kou January 26, 2026 18:06
Member

@rok rok left a comment

This looks good to me now. I think (hope) removing pandas from examples that don't require it streamlines things for readers.

  day: int64
  n_legs: int64
- animals: string
+ animals: ...string
Member

Interesting, I wasn't aware this works.

Member

Nice!
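
For readers wondering why `animals: ...string` works: with doctest's ELLIPSIS option, `...` in the expected output matches any substring, including an empty one, so the same line accepts both `string` and `large_string`. Below is a minimal sketch using only the standard library. It assumes the project's doctests run with ELLIPSIS enabled, which the thread implies but does not show, and the two `got_*` lines are hypothetical schema output written for illustration, not captured from a real run.

```python
import doctest

checker = doctest.OutputChecker()

# Expected line as written in the updated docstring.
want = "animals: ...string\n"

# Hypothetical actual output lines (assumption, for illustration only):
# one as an older pandas would produce, one as pandas 3.0 would produce.
got_old = "animals: string\n"
got_new = "animals: large_string\n"

# With ELLIPSIS, "..." may match nothing ("string") or "large_".
matches_old = checker.check_output(want, got_old, doctest.ELLIPSIS)
matches_new = checker.check_output(want, got_new, doctest.ELLIPSIS)

# Without the flag, "..." is compared literally and the match fails.
strict_new = checker.check_output(want, got_new, 0)

print(matches_old, matches_new, strict_new)  # True True False
```

The trade-off raised earlier in the thread still applies: the ellipsis keeps the example passing on both pandas versions, at the cost of not showing readers the exact type.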

Comment on lines 5132 to 5133
animals: string
-- schema metadata --
pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, ...
>>> reader.read_all()
Member

I think this is a good change, just pointing out there is some interesting behavior here.

@github-actions github-actions bot added awaiting merge and removed awaiting review labels Jan 26, 2026
@tadeja tadeja force-pushed the 48961-Doctest-fails-on-pandas-3.0 branch from 736837d to f224b15 Compare January 26, 2026 18:34
@rok
Member

rok commented Jan 26, 2026

@github-actions crossbow submit preview-docs

@apache apache deleted a comment from github-actions bot Jan 26, 2026
@apache apache deleted a comment from tadeja Jan 26, 2026
@github-actions

Revision: 186c0a9

Submitted crossbow builds: ursacomputing/crossbow @ actions-ca47b1b8be

Task Status
preview-docs GitHub Actions

@tadeja
Contributor Author

tadeja commented Jan 27, 2026

@AlenkaF this is ready for final review.

Development

Successfully merging this pull request may close these issues.

[Docs][Python] Doctest fails on pandas 3.0

3 participants