GH-48961: [Docs][Python] Doctest fails on pandas 3.0 #48969

tadeja · 2026-01-23T16:31:47Z

Rationale for this change

See issue #48961
pandas 3.0.0 changes the default string storage type; see https://github.com/pandas-dev/pandas/pull/62118/changes
and https://pandas.pydata.org/docs/whatsnew/v3.0.0.html#dedicated-string-data-type-by-default

What changes are included in this PR?

Updating several doctest examples from string to large_string.

Are these changes tested?

Yes, locally.

Are there any user-facing changes?

No.

Closes #48961

GitHub Issue: [Docs][Python] Doctest fails on pandas 3.0 #48961

AlenkaF · 2026-01-26T10:54:40Z

Thank you @tadeja for looking into this!

One question regarding the bump of the Python version in the Sphinx & Numpydoc job: I think it would be good if the examples worked for users with either a new or an old pandas version. What if we use ... (ELLIPSIS) instead of changing the string type? Or, even better, we could avoid pandas where possible and create a pyarrow table directly, like so:

arrow/python/pyarrow/table.pxi

Lines 1812 to 1814 in 95a3ed4

    >>> table = pa.Table.from_arrays([[2, 4, 5, 100],
    ...                               ["Flamingo", "Horse", "Brittle stars", "Centipede"]],
    ...                               names=['n_legs', 'animals'])

rok · 2026-01-26T13:35:20Z

Agreed that it doesn't make sense for us to "test Pandas logic" especially in our docs. Agreed with @AlenkaF to instantiate the table in pyarrow. Using ellipsis in this case would hide the type and potentially increase user confusion :).

AlenkaF · 2026-01-26T14:25:46Z

Note that some examples demonstrate conversion from pandas to pyarrow. In those cases, could we remove the string column and keep only the integer ones?

rok

This looks good to me now. I think (hope) that removing pandas from examples that don't require it streamlines things for readers.

rok · 2026-01-26T18:12:41Z

python/pyarrow/table.pxi

        day: int64
        n_legs: int64
-        animals: string
+        animals: ...string


Interesting, I wasn't aware this works.
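
The matching behavior can be checked with the standard library alone. The sketch below uses doctest.OutputChecker directly and assumes the ELLIPSIS option flag is enabled for the doctest run; the expected and actual strings are simplified stand-ins for the schema output.

```python
import doctest

checker = doctest.OutputChecker()

# Expected output as written in the docstring, with "..." as a wildcard.
want = "animals: ...string\n"

# With ELLIPSIS enabled, "..." may match any substring, including the empty one.
matches_string = checker.check_output(want, "animals: string\n", doctest.ELLIPSIS)
matches_large = checker.check_output(want, "animals: large_string\n", doctest.ELLIPSIS)

# Without the flag, "..." is compared literally and the check fails.
matches_without_flag = checker.check_output(want, "animals: large_string\n", 0)

print(matches_string, matches_large, matches_without_flag)
```

The wildcard accepts any prefix before "string", which is the reason given earlier in this thread for preferring examples that show the exact type.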

rok · 2026-01-26T18:18:43Z

python/pyarrow/table.pxi

        animals: string
-        -- schema metadata --
-        pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, ...
        >>> reader.read_all()


I think this is a good change, just pointing out there is some interesting behavior here.

rok · 2026-01-26T22:13:40Z

@github-actions crossbow submit preview-docs

github-actions · 2026-01-26T22:15:53Z

Revision: 186c0a9

Submitted crossbow builds: ursacomputing/crossbow @ actions-ca47b1b8be

Task	Status
preview-docs

tadeja · 2026-01-27T12:03:58Z

@AlenkaF this is ready for final review.

Generated doc pages: pyarrow.Table page and pyarrow.RecordBatch.

Both Sphinx jobs ran and completed the doctests successfully:

AMD64 Conda Python 3.12 Sphinx Documentation

    pandas 3.0.0 pypi_0 pypi
    ================== 385 passed, 2 skipped, 1 warning in 6.24s ===================

and AMD64 Conda Python 3.10 Sphinx & Numpydoc

    pandas 2.3.3 pypi_0 pypi
    ======================== 385 passed, 2 skipped in 5.63s ========================

There were two trivial cases where the output contains None under pandas 2.3.3 but NaN under pandas 3.0.0:

    1 4 None 2022.0
    1 4 NaN 2022.0

These are best resolved by populating pa.array with a string instead: first case and second case.

Note that I additionally removed pandas and replaced it with a pyarrow table in these three examples: def itercolumns, def remove_column and def join (although these are not currently causing failures, since their output does not involve the string vs. large_string difference).

More unnecessary pandas examples remain that could be simplified in the future (num_columns, num_rows, etc.).

tadeja requested review from AlenkaF, raulcd and rok as code owners January 23, 2026 16:31

github-actions bot added the Component: Python and awaiting review labels Jan 23, 2026

tadeja requested review from assignUser, jonkeane and kou as code owners January 23, 2026 18:09

rok removed request for assignUser, jonkeane and kou January 26, 2026 18:06

rok approved these changes Jan 26, 2026

github-actions bot added the awaiting merge label and removed the awaiting review label Jan 26, 2026

tadeja added 5 commits January 26, 2026 19:28

Fix DocTestFailure

05918e6

Fix DocTestFailure further

cf0c175

Update job Python 3.10 Sphinx & Numpydoc to 3.11

a6731cb

Update job 3.10 Sphinx & Numpydoc to 3.11

5193837

Alternative fix w/o pandas and revert CI

f224b15

tadeja force-pushed the 48961-Doctest-fails-on-pandas-3.0 branch from 736837d to f224b15 Compare January 26, 2026 18:34

Minor docs/ update to force docs_light job

186c0a9

github-actions bot added the Component: Documentation label Jan 26, 2026

apache deleted a comment from github-actions bot Jan 26, 2026

apache deleted a comment from tadeja Jan 26, 2026
