
@rok commented Dec 22, 2025

Rationale for this change

This is the second in a series of PRs adding type annotations to pyarrow and resolving #32609. It builds on top of, and should be merged after, #48618.

What changes are included in this PR?

This adds:

  • _types.pyi - Core type definitions including
  • _stubs_typing.pyi - Internal typing protocols and helpers used across stub files
  • error.pyi - Exception classes (ArrowException, ArrowInvalid, ArrowIOError, etc.)
  • Minimal placeholder stubs (lib.pyi, io.pyi, scalar.pyi) that use a module-level __getattr__ so imports resolve, deferring full annotations to subsequent PRs
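A placeholder stub can contain a module-level `def __getattr__(name: str) -> Any: ...`. Type checkers such as mypy and pyright then treat any name the stub does not declare as `Any`, so imports from those modules keep type-checking until real annotations land. Python offers the same hook at runtime (PEP 562). The sketch below builds a throwaway module with one declared name and a fallback `__getattr__` to show the lookup order: declared names win, and everything else falls through. The module name `placeholder_demo` and the attribute names are hypothetical. This only illustrates the lookup rule, not how pyarrow's stubs are loaded.

```python
import types
from typing import Any


def make_placeholder_module(name: str) -> types.ModuleType:
    """Build a module whose undeclared attributes fall back to __getattr__."""
    module = types.ModuleType(name)
    # A name the "stub" declares explicitly; normal attribute lookup finds it first.
    module.declared_constant = 42

    def __getattr__(attr: str) -> Any:
        # PEP 562: only called when normal attribute lookup on the module fails.
        if attr.startswith("_"):
            # Refuse private/dunder names so probes like hasattr(mod, "__path__")
            # do not make the module look like a package.
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        return f"<placeholder for {name}.{attr}>"

    module.__getattr__ = __getattr__
    return module


mod = make_placeholder_module("placeholder_demo")
print(mod.declared_constant)  # 42: declared names take precedence
print(mod.SomeArrowType)      # <placeholder for placeholder_demo.SomeArrowType>
```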

Are these changes tested?

Via CI type checks established in #48618.

Are there any user-facing changes?

Users will start seeing some minimal type annotations.

Comment on lines +49 to +60
Mask: TypeAlias = (
    Sequence[bool | None]
    | NDArray[np.bool_]
    | BooleanArray
    | ChunkedArray[Any]
)
Indices: TypeAlias = (
    Sequence[int | None]
    | NDArray[np.integer[Any]]
    | IntegerArray
    | ChunkedArray[Any]
)

@dangotbanned commented Dec 22, 2025

This isn't the most exciting suggestion, but it's something that constantly frustrates me 😅

Suggested change
from pyarrow import lib

IntegerType: TypeAlias = (
    lib.Int8Type
    | lib.Int16Type
    | lib.Int32Type
    | lib.Int64Type
    | lib.UInt8Type
    | lib.UInt16Type
    | lib.UInt32Type
    | lib.UInt64Type
)
Mask: TypeAlias = (
    Sequence[bool | None]
    | NDArray[np.bool_]
    | lib.Array[lib.Scalar[lib.BoolType]]
    | ChunkedArray[Any]
)
Indices: TypeAlias = (
    Sequence[int | None]
    | NDArray[np.integer[Any]]
    | lib.Array[lib.Scalar[IntegerType]]
    | ChunkedArray[Any]
)

An alternative would be to just use Array[Any].
Using the concrete subclasses (BooleanArray, IntegerArray) sadly requires either a carefully choreographed dance in the stubs or typing.cast calls everywhere in user code.
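The trade-off above can be made concrete with a self-contained sketch. It uses simplified, hypothetical stand-in classes (`DataType`, `BoolType`, `Scalar`, `Array`, `BooleanArray`), which are much simpler than pyarrow's. The `NDArray` and `ChunkedArray` members of `Mask` are left out. A factory whose declared return type is the generic `Array[Scalar[BoolType]]` is accepted by a parameter typed with the generic alias. A static checker would reject the same value for the alias that names the concrete `BooleanArray` subclass, unless the caller adds `typing.cast`. At runtime both calls behave identically; only the static check differs.

```python
from typing import Any, Generic, Optional, Sequence, TypeVar, Union, cast


class DataType: ...
class BoolType(DataType): ...


T = TypeVar("T", bound=DataType)


class Scalar(Generic[T]): ...


S = TypeVar("S", bound=Scalar[Any])


class Array(Generic[S]):
    def __init__(self, values: Sequence[Optional[bool]]) -> None:
        self.values = list(values)


class BooleanArray(Array[Scalar[BoolType]]): ...


# Alias naming the concrete subclass (the style used in the PR).
NarrowMask = Union[Sequence[Optional[bool]], BooleanArray]
# Alias using the generic parameterisation (the reviewer's suggestion).
GenericMask = Union[Sequence[Optional[bool]], Array[Scalar[BoolType]]]


def make_mask(values: Sequence[Optional[bool]]) -> Array[Scalar[BoolType]]:
    # Stands in for a stub whose declared return type is the generic form.
    return BooleanArray(values)


def _values(mask: Any) -> list:
    return mask.values if isinstance(mask, Array) else list(mask)


def count_true_narrow(mask: NarrowMask) -> int:
    return sum(1 for v in _values(mask) if v)


def count_true_generic(mask: GenericMask) -> int:
    return sum(1 for v in _values(mask) if v)


m = make_mask([True, None, False, True])
count_true_generic(m)                     # accepted by a type checker as-is
count_true_narrow(cast(BooleanArray, m))  # a checker rejects bare `m` here
```

Because `BooleanArray` subclasses `Array[Scalar[BoolType]]`, the generic alias still accepts concrete `BooleanArray` values. It is strictly more permissive for callers.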

@rok force-pushed the pyarrow-stubs-pr2-core-types branch from d3c5740 to 27d1c65 on January 26, 2026 12:57
@rok commented Jan 26, 2026

I've rebased this on the annotation infra check PR (#48618) to make sure we're on the right track.
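The CI scripts added in the commits below run `mypy`, `pyright` and `ty check` one after another. The shell version relies on `set -ex` to stop at the first failing checker. The batch version, as written in this commit, does not check `%ERRORLEVEL%` after `mypy` or `pyright`, so a failure in one of them can be masked by a later command that succeeds. The following is a hypothetical Python driver that runs every checker and fails if any of them failed. The function name `run_checkers` and the way it is invoked are assumptions, not part of the PR.

```python
import subprocess
import sys
from typing import List, Optional, Sequence


def run_checkers(commands: Sequence[Sequence[str]], cwd: Optional[str] = None) -> int:
    """Run every command even if an earlier one fails; return 1 if any failed."""
    failed: List[str] = []
    for cmd in commands:
        print("+", " ".join(cmd), flush=True)
        try:
            returncode = subprocess.run(list(cmd), cwd=cwd).returncode
        except FileNotFoundError:
            # The checker is not installed or not on PATH; count it as a failure.
            returncode = 127
        if returncode != 0:
            failed.append(" ".join(cmd))
    if failed:
        print("Failed:", "; ".join(failed), file=sys.stderr)
        return 1
    return 0


# Hypothetical CI usage, from the directory containing pyproject.toml:
#   sys.exit(run_checkers([["mypy"], ["pyright"], ["ty", "check"]]))
```

Running all checkers before failing means a single CI run reports every broken checker, rather than only the first one.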

rok and others added 21 commits January 26, 2026 20:05
…d script for including docstrings into stubfiles before building wheels.

diff --git c/.github/workflows/python.yml i/.github/workflows/python.yml
index e5d3679..4ca0f9b 100644
--- c/.github/workflows/python.yml
+++ i/.github/workflows/python.yml
@@ -239,6 +239,11 @@ jobs:
       - name: Test
         shell: bash
         run: ci/scripts/python_test.sh $(pwd) $(pwd)/build
+      - name: Test annotations
+        shell: bash
+        env:
+          PYARROW_TEST_ANNOTATIONS: "ON"
+        run: ci/scripts/python_test_type_annotations.sh $(pwd)/python

   windows:
     name: AMD64 Windows 2022 Python 3.13
@@ -296,3 +301,7 @@ jobs:
         shell: cmd
         run: |
           call "ci\scripts\python_test.bat" %cd%
+      - name: Test annotations
+        shell: cmd
+        run: |
+          call "ci\scripts\python_test_type_annotations.bat" %cd%\python
diff --git c/ci/scripts/python_test_type_annotations.bat i/ci/scripts/python_test_type_annotations.bat
new file mode 100644
index 0000000000..3446e32
--- /dev/null
+++ i/ci/scripts/python_test_type_annotations.bat
@@ -0,0 +1,38 @@
+@Rem Licensed to the Apache Software Foundation (ASF) under one
+@Rem or more contributor license agreements.  See the NOTICE file
+@Rem distributed with this work for additional information
+@Rem regarding copyright ownership.  The ASF licenses this file
+@Rem to you under the Apache License, Version 2.0 (the
+@Rem "License"); you may not use this file except in compliance
+@Rem with the License.  You may obtain a copy of the License at
+@Rem
+@Rem   http://www.apache.org/licenses/LICENSE-2.0
+@Rem
+@Rem Unless required by applicable law or agreed to in writing,
+@Rem software distributed under the License is distributed on an
+@Rem "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+@Rem KIND, either express or implied.  See the License for the
+@Rem specific language governing permissions and limitations
+@Rem under the License.
+
+@echo on
+
+set PYARROW_DIR=%1
+
+echo Annotation testing on Windows ...
+
+@Rem Install library stubs
+%PYTHON_CMD% -m pip install pandas-stubs scipy-stubs sphinx types-cffi types-psutil types-requests types-python-dateutil || exit /B 1
+
+@Rem Install other dependencies for type checking
+%PYTHON_CMD% -m pip install fsspec || exit /B 1
+
+@Rem Install type checkers
+%PYTHON_CMD% -m pip install mypy pyright ty || exit /B 1
+
+@Rem Run type checkers
+pushd %PYARROW_DIR%
+
+mypy
+pyright
+ty check
diff --git c/ci/scripts/python_test_type_annotations.sh i/ci/scripts/python_test_type_annotations.sh
new file mode 100755
index 0000000000..82610ce
--- /dev/null
+++ i/ci/scripts/python_test_type_annotations.sh
@@ -0,0 +1,40 @@
+#!/usr/bin/env bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+set -ex
+pyarrow_dir=${1}
+
+if [ "${PYARROW_TEST_ANNOTATIONS}" == "ON" ]; then
+  # Install library stubs
+  pip install pandas-stubs scipy-stubs sphinx types-cffi types-psutil types-requests types-python-dateutil
+
+  # Install type checkers
+  pip install mypy pyright ty
+
+  # Install other dependencies for type checking
+  pip install fsspec
+
+  # Run type checkers
+  pushd ${pyarrow_dir}
+  mypy
+  pyright
+  ty check;
+else
+  echo "Skipping type annotation tests";
+fi
diff --git c/ci/scripts/python_wheel_macos_build.sh i/ci/scripts/python_wheel_macos_build.sh
index bd61154..b64eee6 100755
--- c/ci/scripts/python_wheel_macos_build.sh
+++ i/ci/scripts/python_wheel_macos_build.sh
@@ -177,6 +177,11 @@ export CMAKE_PREFIX_PATH=${build_dir}/install
 export SETUPTOOLS_SCM_PRETEND_VERSION=${PYARROW_VERSION}

 pushd ${source_dir}/python
+# We first populate stub docstrings and then build the wheel
+python setup.py build_ext --inplace
+python -m pip install griffe libcst
+python ../dev/update_stub_docstrings.py pyarrow-stubs
+
 python setup.py bdist_wheel
 popd

diff --git c/ci/scripts/python_wheel_validate_contents.py i/ci/scripts/python_wheel_validate_contents.py
index 84fcaba..ee4a31a 100644
--- c/ci/scripts/python_wheel_validate_contents.py
+++ i/ci/scripts/python_wheel_validate_contents.py
@@ -35,6 +35,11 @@ def validate_wheel(path):
     assert not outliers, f"Unexpected contents in wheel: {sorted(outliers)}"
     print(f"The wheel: {wheels[0]} seems valid.")

+    candidates = [info for info in f.filelist if info.filename.endswith('compute.pyi')]
+    assert candidates, "compute.pyi not found in wheel"
+    content = f.read(candidates[0]).decode('utf-8', errors='replace')
+    assert '"""' in content, "compute.pyi missing docstrings (no triple quotes found)"
+

 def main():
     parser = argparse.ArgumentParser()
diff --git c/ci/scripts/python_wheel_windows_build.bat i/ci/scripts/python_wheel_windows_build.bat
index b4b7fed..3da7f60 100644
--- c/ci/scripts/python_wheel_windows_build.bat
+++ i/ci/scripts/python_wheel_windows_build.bat
@@ -135,6 +135,11 @@ pushd C:\arrow\python
 @Rem Build wheel
 %PYTHON_CMD% setup.py bdist_wheel || exit /B 1

+@Rem We first populate stub docstrings and then build the wheel
+%PYTHON_CMD% setup.py build_ext --inplace
+%PYTHON_CMD% -m pip install griffe libcst
+%PYTHON_CMD% ..\dev\update_stub_docstrings.py pyarrow-stubs
+
 @Rem Repair the wheel with delvewheel
 @Rem
 @Rem Since we bundled the Arrow C++ libraries ourselves, we only need to
diff --git c/ci/scripts/python_wheel_xlinux_build.sh i/ci/scripts/python_wheel_xlinux_build.sh
index a3fbeb3..977ef64 100755
--- c/ci/scripts/python_wheel_xlinux_build.sh
+++ i/ci/scripts/python_wheel_xlinux_build.sh
@@ -167,6 +167,11 @@ export ARROW_HOME=/tmp/arrow-dist
 export CMAKE_PREFIX_PATH=/tmp/arrow-dist

 pushd /arrow/python
+# We first populate stub docstrings and then build the wheel
+python setup.py build_ext --inplace
+python -m pip install griffe libcst
+python ../dev/update_stub_docstrings.py pyarrow-stubs
+
 python setup.py bdist_wheel

 echo "=== Strip symbols from wheel ==="
diff --git c/compose.yaml i/compose.yaml
index 2bd38a3..ae0a1d4 100644
--- c/compose.yaml
+++ i/compose.yaml
@@ -919,12 +919,14 @@ services:
     environment:
       <<: [*common, *ccache, *sccache]
       PYTEST_ARGS:  # inherit
+      PYARROW_TEST_ANNOTATIONS: "ON"
     volumes: *conda-volumes
     command: &python-conda-command
       ["
         /arrow/ci/scripts/cpp_build.sh /arrow /build &&
         /arrow/ci/scripts/python_build.sh /arrow /build &&
-        /arrow/ci/scripts/python_test.sh /arrow"]
+        /arrow/ci/scripts/python_test.sh /arrow &&
+        /arrow/ci/scripts/python_test_type_annotations.sh /arrow/python"]

   conda-python-emscripten:
     # Usage:
@@ -1001,6 +1003,7 @@ services:
       ARROW_S3: "OFF"
       ARROW_SUBSTRAIT: "OFF"
       ARROW_WITH_OPENTELEMETRY: "OFF"
+      PYARROW_TEST_ANNOTATIONS: "ON"
       SETUPTOOLS_SCM_PRETEND_VERSION:
     volumes: *ubuntu-volumes
     deploy: *cuda-deploy
@@ -1008,7 +1011,8 @@ services:
       /bin/bash -c "
         /arrow/ci/scripts/cpp_build.sh /arrow /build &&
         /arrow/ci/scripts/python_build.sh /arrow /build &&
-        /arrow/ci/scripts/python_test.sh /arrow"
+        /arrow/ci/scripts/python_test.sh /arrow &&
+        /arrow/ci/scripts/python_test_type_annotations.sh /arrow/python"

   debian-python:
     # Usage:
@@ -1500,6 +1504,7 @@ services:
         python: ${PYTHON}
     shm_size: *shm-size
     environment:
+      PYARROW_TEST_ANNOTATIONS: "ON"
       <<: [*common, *ccache, *sccache]
       PARQUET_REQUIRE_ENCRYPTION:  # inherit
       HYPOTHESIS_PROFILE:  # inherit
@@ -1510,7 +1515,8 @@ services:
         /arrow/ci/scripts/cpp_build.sh /arrow /build &&
         /arrow/ci/scripts/python_build.sh /arrow /build &&
         mamba uninstall -y numpy &&
-        /arrow/ci/scripts/python_test.sh /arrow"]
+        /arrow/ci/scripts/python_test.sh /arrow &&
+        /arrow/ci/scripts/python_test_type_annotations.sh /arrow/python"]

   conda-python-docs:
     # Usage:
@@ -1530,13 +1536,15 @@ services:
       BUILD_DOCS_CPP: "ON"
       BUILD_DOCS_PYTHON: "ON"
       PYTEST_ARGS: "--doctest-modules --doctest-cython"
+      PYARROW_TEST_ANNOTATIONS: "ON"
     volumes: *conda-volumes
     command:
       ["/arrow/ci/scripts/cpp_build.sh /arrow /build &&
         /arrow/ci/scripts/python_build.sh /arrow /build &&
         pip install -e /arrow/dev/archery[numpydoc] &&
         archery numpydoc --allow-rule GL10,PR01,PR03,PR04,PR05,PR10,RT03,YD01 &&
-        /arrow/ci/scripts/python_test.sh /arrow"]
+        /arrow/ci/scripts/python_test.sh /arrow &&
+        /arrow/ci/scripts/python_test_type_annotations.sh /arrow/python"]

   conda-python-dask:
     # Possible $DASK parameters:
diff --git c/docs/source/developers/python/development.rst i/docs/source/developers/python/development.rst
index d03b243..c23891e 100644
--- c/docs/source/developers/python/development.rst
+++ i/docs/source/developers/python/development.rst
@@ -42,7 +42,7 @@ Unit Testing
 ============

 We are using `pytest <https://docs.pytest.org/en/latest/>`_ to develop our unit
-test suite. After `building the project <build_pyarrow>`_ you can run its unit tests
+test suite. After `building the project <building.html>`_ you can run its unit tests
 like so:

 .. code-block::
@@ -101,6 +101,74 @@ The test groups currently include:
 * ``s3``: Tests for Amazon S3
 * ``tensorflow``: Tests that involve TensorFlow

+Type Checking
+=============
+
+PyArrow provides type stubs (``*.pyi`` files) for static type checking. These
+stubs are located in the ``pyarrow-stubs/`` directory and are automatically
+included in the distributed wheel packages.
+
+Running Type Checkers
+---------------------
+
+We support multiple type checkers. Their configurations are in
+``pyproject.toml``.
+
+**mypy**
+
+To run mypy on the PyArrow codebase:
+
+.. code-block::
+
+   $ cd arrow/python
+   $ mypy
+
+The mypy configuration is in the ``[tool.mypy]`` section of ``pyproject.toml``.
+
+**pyright**
+
+To run pyright:
+
+.. code-block::
+
+   $ cd arrow/python
+   $ pyright
+
+The pyright configuration is in the ``[tool.pyright]`` section of ``pyproject.toml``.
+
+**ty**
+
+To run ty (note: currently only partially configured):
+
+.. code-block::
+
+   $ cd arrow/python
+   $ ty check
+
+Maintaining Type Stubs
+-----------------------
+
+Type stubs for PyArrow are maintained in the ``pyarrow-stubs/``
+directory. These stubs mirror the structure of the main ``pyarrow/`` package.
+
+When adding or modifying public APIs:
+
+1. **Update the corresponding ``.pyi`` stub file** in ``pyarrow-stubs/``
+   to reflect the new or changed function/class signatures.
+
+2. **Include type annotations** where possible. For Cython modules or
+   dynamically generated APIs such as compute kernels add the corresponding
+   stub in ``pyarrow-stubs/``.
+
+3. **Run type checkers** to ensure the stubs are correct and complete.
+
+The stub files are automatically copied into the built wheel during the build
+process and will be included when users install PyArrow, enabling type checking
+in downstream projects and for users' IDEs.
+
+Note: ``py.typed`` marker file in the ``pyarrow/`` directory indicates to type
+checkers that PyArrow supports type checking according to :pep:`561`.
+
 Doctest
 =======

diff --git c/python/MANIFEST.in i/python/MANIFEST.in
index ed7012e..2840ba7 100644
--- c/python/MANIFEST.in
+++ i/python/MANIFEST.in
@@ -4,6 +4,7 @@ include ../NOTICE.txt

 global-include CMakeLists.txt
 graft pyarrow
+graft pyarrow-stubs
 graft cmake_modules

 global-exclude *.so
diff --git c/python/pyarrow-stubs/pyarrow/__init__.pyi i/python/pyarrow-stubs/pyarrow/__init__.pyi
new file mode 100644
index 0000000000..2a68a51
--- /dev/null
+++ i/python/pyarrow-stubs/pyarrow/__init__.pyi
@@ -0,0 +1,26 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""Type stubs for PyArrow.
+
+This is a placeholder stub file.
+Complete type annotations will be added in subsequent PRs.
+"""
+
+from typing import Any
+
+def __getattr__(name: str) -> Any: ...
diff --git c/python/pyarrow/py.typed i/python/pyarrow/py.typed
new file mode 100644
index 0000000000..13a8339
--- /dev/null
+++ i/python/pyarrow/py.typed
@@ -0,0 +1,16 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
diff --git c/python/pyproject.toml i/python/pyproject.toml
index 899144d..9f62f02 100644
--- c/python/pyproject.toml
+++ i/python/pyproject.toml
@@ -84,11 +84,11 @@ zip-safe=false
 include-package-data=true

 [tool.setuptools.packages.find]
-include = ["pyarrow"]
+include = ["pyarrow", "pyarrow.*"]
 namespaces = false

 [tool.setuptools.package-data]
-pyarrow = ["*.pxd", "*.pyx", "includes/*.pxd"]
+pyarrow = ["*.pxd", "*.pyx", "includes/*.pxd", "py.typed"]

 [tool.setuptools_scm]
 root = '..'
@@ -96,3 +96,39 @@ version_file = 'pyarrow/_generated_version.py'
 version_scheme = 'guess-next-dev'
 git_describe_command = 'git describe --dirty --tags --long --match "apache-arrow-[0-9]*.*"'
 fallback_version = '24.0.0a0'
+
+# TODO: Enable type checking once stubs are merged
+[tool.mypy]
+files = ["pyarrow-stubs"]
+mypy_path = "$MYPY_CONFIG_FILE_DIR/pyarrow-stubs"
+exclude = [
+    "^pyarrow/",
+    "^benchmarks/",
+    "^examples/",
+    "^scripts/",
+]
+
+# TODO: Enable type checking once stubs are merged
+[tool.pyright]
+pythonPlatform = "All"
+pythonVersion = "3.10"
+include = ["pyarrow-stubs"]
+exclude = [
+    "pyarrow",
+    "benchmarks",
+    "examples",
+    "scripts",
+    "build",
+]
+stubPath = "pyarrow-stubs"
+typeCheckingMode = "basic"
+
+# TODO: Enable type checking once stubs are merged
+[tool.ty.src]
+include = ["pyarrow-stubs"]
+exclude = [
+    "pyarrow",
+    "benchmarks",
+    "examples",
+    "scripts",
+]
diff --git c/python/setup.py i/python/setup.py
index a27bd3b..a25d2d7 100755
--- c/python/setup.py
+++ i/python/setup.py
@@ -121,8 +121,35 @@ class build_ext(_build_ext):

     def run(self):
         self._run_cmake()
+        self._copy_stubs()
         _build_ext.run(self)

+    def _copy_stubs(self):
+        """Copy .pyi stub files from pyarrow-stubs to the build directory."""
+        build_cmd = self.get_finalized_command('build')
+        build_lib = os.path.abspath(build_cmd.build_lib)
+
+        stubs_src = pjoin(setup_dir, 'pyarrow-stubs', 'pyarrow')
+        stubs_dest = pjoin(build_lib, 'pyarrow')
+
+        if os.path.exists(stubs_src):
+            print(f"-- Copying stub files from {stubs_src} to {stubs_dest}")
+            for root, dirs, files in os.walk(stubs_src):
+                # Calculate relative path from stubs_src
+                rel_dir = os.path.relpath(root, stubs_src)
+                dest_dir = pjoin(stubs_dest, rel_dir) if rel_dir != '.' else stubs_dest
+
+                # Create destination directory if needed
+                if not os.path.exists(dest_dir):
+                    os.makedirs(dest_dir)
+
+                # Copy .pyi files
+                for file in files:
+                    if file.endswith('.pyi'):
+                        src_file = pjoin(root, file)
+                        dest_file = pjoin(dest_dir, file)
+                        shutil.copy2(src_file, dest_file)
+
     # adapted from cmake_build_ext in dynd-python
     # github.com/libdynd/dynd-python
Co-authored-by: Raúl Cumplido <raulcumplido@gmail.com>
change bat
lint
add a popd and nicer logging for windows
ReplaceElipsis -> DocstringInserter
simplify
remove sphinx
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
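The `_copy_stubs` step added to `python/setup.py` in the first commit walks `pyarrow-stubs/pyarrow` and copies every `.pyi` file into the matching location under the build directory. For comparison, here is a standalone sketch of the same mirroring written with `pathlib`. The function name `copy_stub_tree` is hypothetical, and the example runs against a temporary directory rather than a real build.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path


def copy_stub_tree(stubs_src: Path, stubs_dest: Path) -> list[str]:
    """Mirror every .pyi file under stubs_src into stubs_dest, keeping subdirectories."""
    copied: list[str] = []
    if not stubs_src.exists():
        return copied  # Nothing to copy, like the os.path.exists() guard in setup.py.
    for src_file in sorted(stubs_src.rglob("*.pyi")):
        dest_file = stubs_dest / src_file.relative_to(stubs_src)
        dest_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dest_file)  # copy2 also preserves file metadata
        copied.append(dest_file.relative_to(stubs_dest).as_posix())
    return copied


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "pyarrow-stubs", "pyarrow")
    (src / "parquet").mkdir(parents=True)
    (src / "__init__.pyi").write_text("def __getattr__(name: str): ...\n")
    (src / "parquet" / "core.pyi").write_text("")
    (src / "notes.txt").write_text("not a stub")
    dest = Path(tmp, "build", "pyarrow")
    result = copy_stub_tree(src, dest)
    txt_copied = (dest / "notes.txt").exists()

print(result)  # ['__init__.pyi', 'parquet/core.pyi']
```

One behavioural difference: `setup.py` creates every source subdirectory in the destination, while this sketch only creates directories that actually contain a stub.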
@rok force-pushed the pyarrow-stubs-pr2-core-types branch from 27d1c65 to 3f9ed3b on January 26, 2026 19:06
@rok force-pushed the pyarrow-stubs-pr2-core-types branch from 3f9ed3b to 0ac95b0 on January 26, 2026 19:11
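The check added to `ci/scripts/python_wheel_validate_contents.py` looks inside the built wheel for `compute.pyi` and asserts that it contains a triple-quoted string. This is how it confirms that `update_stub_docstrings.py` ran before packaging. Since a wheel is a zip archive, the same check can be sketched as a standalone function using the standard-library `zipfile` module. The function names and the in-memory test archives below are hypothetical.

```python
from __future__ import annotations

import io
import zipfile


def stub_has_docstrings(wheel: zipfile.ZipFile, stub_name: str = "compute.pyi") -> bool:
    """Return True if the named stub is in the wheel and contains a triple-quoted string."""
    # Match the basename exactly: endswith("compute.pyi") would also match "_compute.pyi".
    candidates = [
        info for info in wheel.infolist()
        if info.filename.rsplit("/", 1)[-1] == stub_name
    ]
    if not candidates:
        raise FileNotFoundError(f"{stub_name} not found in wheel")
    content = wheel.read(candidates[0]).decode("utf-8", errors="replace")
    return '"""' in content


def fake_wheel(files: dict[str, str]) -> zipfile.ZipFile:
    """Build an in-memory archive standing in for a real wheel."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, text in files.items():
            zf.writestr(name, text)
    return zipfile.ZipFile(buf)  # reopen the same buffer for reading


print(stub_has_docstrings(fake_wheel({"pyarrow/compute.pyi": '"""Compute functions."""\n'})))  # True
```

The exact basename match is a small deviation from the `endswith` test in the PR. It avoids picking up a file such as `_compute.pyi` if the wheel ever ships one under that name.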