-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-32609: [Python] Add internal type system stubs (_types, error, _stubs_typing) #48622
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| Mask: TypeAlias = ( | ||
| Sequence[bool | None] | ||
| | NDArray[np.bool_] | ||
| | BooleanArray | ||
| | ChunkedArray[Any] | ||
| ) | ||
| Indices: TypeAlias = ( | ||
| Sequence[int | None] | ||
| | NDArray[np.integer[Any]] | ||
| | IntegerArray | ||
| | ChunkedArray[Any] | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't the most exciting suggestion, but it's something that constantly frustrates me 😅
| Mask: TypeAlias = ( | |
| Sequence[bool | None] | |
| | NDArray[np.bool_] | |
| | BooleanArray | |
| | ChunkedArray[Any] | |
| ) | |
| Indices: TypeAlias = ( | |
| Sequence[int | None] | |
| | NDArray[np.integer[Any]] | |
| | IntegerArray | |
| | ChunkedArray[Any] | |
| ) | |
| from pyarrow import lib | |
| IntegerType: TypeAlias = ( | |
| lib.Int8Type | |
| | lib.Int16Type | |
| | lib.Int32Type | |
| | lib.Int64Type | |
| | lib.UInt8Type | |
| | lib.UInt16Type | |
| | lib.UInt32Type | |
| | lib.UInt64Type | |
| ) | |
| Mask: TypeAlias = ( | |
| Sequence[bool | None] | |
| | NDArray[np.bool_] | |
| | lib.Array[lib.Scalar[lib.BoolType]] | |
| | ChunkedArray[Any] | |
| ) | |
| Indices: TypeAlias = ( | |
| Sequence[int | None] | |
| | NDArray[np.integer[Any]] | |
| | lib.Array[lib.Scalar[IntegerType]] | |
| | ChunkedArray[Any] | |
| ) |
An alternative would just be Array[Any].
Using the concrete subclasses requires the stubs to do a carefully choreographed dance, or the user to typing.cast everywhere - sadly
d3c5740 to
27d1c65
Compare
|
I've rebased this on the annotation infra check PR (#48618) to make sure we're on the right track. |
…d script for including docstrings into stubfiles before building wheels. diff --git c/.github/workflows/python.yml i/.github/workflows/python.yml index e5d3679..4ca0f9b 100644 --- c/.github/workflows/python.yml +++ i/.github/workflows/python.yml @@ -239,6 +239,11 @@ jobs: - name: Test shell: bash run: ci/scripts/python_test.sh $(pwd) $(pwd)/build + - name: Test annotations + shell: bash + env: + PYARROW_TEST_ANNOTATIONS: "ON" + run: ci/scripts/python_test_type_annotations.sh $(pwd)/python windows: name: AMD64 Windows 2022 Python 3.13 @@ -296,3 +301,7 @@ jobs: shell: cmd run: | call "ci\scripts\python_test.bat" %cd% + - name: Test annotations + shell: cmd + run: | + call "ci\scripts\python_test_type_annotations.bat" %cd%\python diff --git c/ci/scripts/python_test_type_annotations.bat i/ci/scripts/python_test_type_annotations.bat new file mode 100644 index 0000000000..3446e32 --- /dev/null +++ i/ci/scripts/python_test_type_annotations.bat @@ -0,0 +1,38 @@ +@Rem Licensed to the Apache Software Foundation (ASF) under one +@Rem or more contributor license agreements. See the NOTICE file +@Rem distributed with this work for additional information +@Rem regarding copyright ownership. The ASF licenses this file +@Rem to you under the Apache License, Version 2.0 (the +@Rem "License"); you may not use this file except in compliance +@Rem with the License. You may obtain a copy of the License at +@Rem +@Rem http://www.apache.org/licenses/LICENSE-2.0 +@Rem +@Rem Unless required by applicable law or agreed to in writing, +@Rem software distributed under the License is distributed on an +@Rem "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +@Rem KIND, either express or implied. See the License for the +@Rem specific language governing permissions and limitations +@Rem under the License. + +@echo on + +set PYARROW_DIR=%1 + +echo Annotation testing on Windows ... + +@Rem Install library stubs +%PYTHON_CMD% -m pip install pandas-stubs scipy-stubs sphinx types-cffi types-psutil types-requests types-python-dateutil || exit /B 1 + +@Rem Install other dependencies for type checking +%PYTHON_CMD% -m pip install fsspec || exit /B 1 + +@Rem Install type checkers +%PYTHON_CMD% -m pip install mypy pyright ty || exit /B 1 + +@Rem Run type checkers +pushd %PYARROW_DIR% + +mypy +pyright +ty check diff --git c/ci/scripts/python_test_type_annotations.sh i/ci/scripts/python_test_type_annotations.sh new file mode 100755 index 0000000000..82610ce --- /dev/null +++ i/ci/scripts/python_test_type_annotations.sh @@ -0,0 +1,40 @@ +#!/usr/bin/env bash +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +set -ex +pyarrow_dir=${1} + +if [ "${PYARROW_TEST_ANNOTATIONS}" == "ON" ]; then + # Install library stubs + pip install pandas-stubs scipy-stubs sphinx types-cffi types-psutil types-requests types-python-dateutil + + # Install type checkers + pip install mypy pyright ty + + # Install other dependencies for type checking + pip install fsspec + + # Run type checkers + pushd ${pyarrow_dir} + mypy + pyright + ty check; +else + echo "Skipping type annotation tests"; +fi diff --git c/ci/scripts/python_wheel_macos_build.sh i/ci/scripts/python_wheel_macos_build.sh index bd61154..b64eee6 100755 --- c/ci/scripts/python_wheel_macos_build.sh +++ i/ci/scripts/python_wheel_macos_build.sh @@ -177,6 +177,11 @@ export CMAKE_PREFIX_PATH=${build_dir}/install export SETUPTOOLS_SCM_PRETEND_VERSION=${PYARROW_VERSION} pushd ${source_dir}/python +# We first populate stub docstrings and then build the wheel +python setup.py build_ext --inplace +python -m pip install griffe libcst +python ../dev/update_stub_docstrings.py pyarrow-stubs + python setup.py bdist_wheel popd diff --git c/ci/scripts/python_wheel_validate_contents.py i/ci/scripts/python_wheel_validate_contents.py index 84fcaba..ee4a31a 100644 --- c/ci/scripts/python_wheel_validate_contents.py +++ i/ci/scripts/python_wheel_validate_contents.py @@ -35,6 +35,11 @@ def validate_wheel(path): assert not outliers, f"Unexpected contents in wheel: {sorted(outliers)}" print(f"The wheel: {wheels[0]} seems valid.") + candidates = [info for info in f.filelist if info.filename.endswith('compute.pyi')] + assert candidates, "compute.pyi not found in wheel" + content = f.read(candidates[0]).decode('utf-8', errors='replace') + assert '"""' in content, "compute.pyi missing docstrings (no triple quotes found)" + def main(): parser = argparse.ArgumentParser() diff --git c/ci/scripts/python_wheel_windows_build.bat i/ci/scripts/python_wheel_windows_build.bat index b4b7fed..3da7f60 100644 --- c/ci/scripts/python_wheel_windows_build.bat +++ i/ci/scripts/python_wheel_windows_build.bat @@ -135,6 +135,11 @@ pushd C:\arrow\python @Rem Build wheel %PYTHON_CMD% setup.py bdist_wheel || exit /B 1 +@Rem We first populate stub docstrings and then build the wheel +%PYTHON_CMD% setup.py build_ext --inplace +%PYTHON_CMD% -m pip install griffe libcst +%PYTHON_CMD% ..\dev\update_stub_docstrings.py pyarrow-stubs + @Rem Repair the wheel with delvewheel @Rem @Rem Since we bundled the Arrow C++ libraries ourselves, we only need to diff --git c/ci/scripts/python_wheel_xlinux_build.sh i/ci/scripts/python_wheel_xlinux_build.sh index a3fbeb3..977ef64 100755 --- c/ci/scripts/python_wheel_xlinux_build.sh +++ i/ci/scripts/python_wheel_xlinux_build.sh @@ -167,6 +167,11 @@ export ARROW_HOME=/tmp/arrow-dist export CMAKE_PREFIX_PATH=/tmp/arrow-dist pushd /arrow/python +# We first populate stub docstrings and then build the wheel +python setup.py build_ext --inplace +python -m pip install griffe libcst +python ../dev/update_stub_docstrings.py pyarrow-stubs + python setup.py bdist_wheel echo "=== Strip symbols from wheel ===" diff --git c/compose.yaml i/compose.yaml index 2bd38a3..ae0a1d4 100644 --- c/compose.yaml +++ i/compose.yaml @@ -919,12 +919,14 @@ services: environment: <<: [*common, *ccache, *sccache] PYTEST_ARGS: # inherit + PYARROW_TEST_ANNOTATIONS: "ON" volumes: *conda-volumes command: &python-conda-command [" /arrow/ci/scripts/cpp_build.sh /arrow /build && /arrow/ci/scripts/python_build.sh /arrow /build && - /arrow/ci/scripts/python_test.sh /arrow"] + /arrow/ci/scripts/python_test.sh /arrow && + /arrow/ci/scripts/python_test_type_annotations.sh /arrow/python"] conda-python-emscripten: # Usage: @@ -1001,6 +1003,7 @@ services: ARROW_S3: "OFF" ARROW_SUBSTRAIT: "OFF" ARROW_WITH_OPENTELEMETRY: "OFF" + PYARROW_TEST_ANNOTATIONS: "ON" SETUPTOOLS_SCM_PRETEND_VERSION: volumes: *ubuntu-volumes deploy: *cuda-deploy @@ -1008,7 +1011,8 @@ services: /bin/bash -c " /arrow/ci/scripts/cpp_build.sh /arrow /build && /arrow/ci/scripts/python_build.sh /arrow /build && - /arrow/ci/scripts/python_test.sh /arrow" + /arrow/ci/scripts/python_test.sh /arrow && + /arrow/ci/scripts/python_test_type_annotations.sh /arrow/python" debian-python: # Usage: @@ -1500,6 +1504,7 @@ services: python: ${PYTHON} shm_size: *shm-size environment: + PYARROW_TEST_ANNOTATIONS: "ON" <<: [*common, *ccache, *sccache] PARQUET_REQUIRE_ENCRYPTION: # inherit HYPOTHESIS_PROFILE: # inherit @@ -1510,7 +1515,8 @@ services: /arrow/ci/scripts/cpp_build.sh /arrow /build && /arrow/ci/scripts/python_build.sh /arrow /build && mamba uninstall -y numpy && - /arrow/ci/scripts/python_test.sh /arrow"] + /arrow/ci/scripts/python_test.sh /arrow && + /arrow/ci/scripts/python_test_type_annotations.sh /arrow/python"] conda-python-docs: # Usage: @@ -1530,13 +1536,15 @@ services: BUILD_DOCS_CPP: "ON" BUILD_DOCS_PYTHON: "ON" PYTEST_ARGS: "--doctest-modules --doctest-cython" + PYARROW_TEST_ANNOTATIONS: "ON" volumes: *conda-volumes command: ["/arrow/ci/scripts/cpp_build.sh /arrow /build && /arrow/ci/scripts/python_build.sh /arrow /build && pip install -e /arrow/dev/archery[numpydoc] && archery numpydoc --allow-rule GL10,PR01,PR03,PR04,PR05,PR10,RT03,YD01 && - /arrow/ci/scripts/python_test.sh /arrow"] + /arrow/ci/scripts/python_test.sh /arrow && + /arrow/ci/scripts/python_test_type_annotations.sh /arrow/python"] conda-python-dask: # Possible $DASK parameters: diff --git c/docs/source/developers/python/development.rst i/docs/source/developers/python/development.rst index d03b243..c23891e 100644 --- c/docs/source/developers/python/development.rst +++ i/docs/source/developers/python/development.rst @@ -42,7 +42,7 @@ Unit Testing ============ We are using `pytest <https://docs.pytest.org/en/latest/>`_ to develop our unit -test suite. After `building the project <build_pyarrow>`_ you can run its unit tests +test suite. After `building the project <building.html>`_ you can run its unit tests like so: .. code-block:: @@ -101,6 +101,74 @@ The test groups currently include: * ``s3``: Tests for Amazon S3 * ``tensorflow``: Tests that involve TensorFlow +Type Checking +============= + +PyArrow provides type stubs (``*.pyi`` files) for static type checking. These +stubs are located in the ``pyarrow-stubs/`` directory and are automatically +included in the distributed wheel packages. + +Running Type Checkers +--------------------- + +We support multiple type checkers. Their configurations are in +``pyproject.toml``. + +**mypy** + +To run mypy on the PyArrow codebase: + +.. code-block:: + + $ cd arrow/python + $ mypy + +The mypy configuration is in the ``[tool.mypy]`` section of ``pyproject.toml``. + +**pyright** + +To run pyright: + +.. code-block:: + + $ cd arrow/python + $ pyright + +The pyright configuration is in the ``[tool.pyright]`` section of ``pyproject.toml``. + +**ty** + +To run ty (note: currently only partially configured): + +.. code-block:: + + $ cd arrow/python + $ ty check + +Maintaining Type Stubs +----------------------- + +Type stubs for PyArrow are maintained in the ``pyarrow-stubs/`` +directory. These stubs mirror the structure of the main ``pyarrow/`` package. + +When adding or modifying public APIs: + +1. **Update the corresponding ``.pyi`` stub file** in ``pyarrow-stubs/`` + to reflect the new or changed function/class signatures. + +2. **Include type annotations** where possible. For Cython modules or + dynamically generated APIs such as compute kernels add the corresponding + stub in ``pyarrow-stubs/``. + +3. **Run type checkers** to ensure the stubs are correct and complete. + +The stub files are automatically copied into the built wheel during the build +process and will be included when users install PyArrow, enabling type checking +in downstream projects and for users' IDEs. + +Note: ``py.typed`` marker file in the ``pyarrow/`` directory indicates to type +checkers that PyArrow supports type checking according to :pep:`561`. + Doctest ======= diff --git c/python/MANIFEST.in i/python/MANIFEST.in index ed7012e..2840ba7 100644 --- c/python/MANIFEST.in +++ i/python/MANIFEST.in @@ -4,6 +4,7 @@ include ../NOTICE.txt global-include CMakeLists.txt graft pyarrow +graft pyarrow-stubs graft cmake_modules global-exclude *.so diff --git c/python/pyarrow-stubs/pyarrow/__init__.pyi i/python/pyarrow-stubs/pyarrow/__init__.pyi new file mode 100644 index 0000000000..2a68a51 --- /dev/null +++ i/python/pyarrow-stubs/pyarrow/__init__.pyi @@ -0,0 +1,26 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +"""Type stubs for PyArrow. + +This is a placeholder stub file. +Complete type annotations will be added in subsequent PRs. +""" + +from typing import Any + +def __getattr__(name: str) -> Any: ... diff --git c/python/pyarrow/py.typed i/python/pyarrow/py.typed new file mode 100644 index 0000000000..13a8339 --- /dev/null +++ i/python/pyarrow/py.typed @@ -0,0 +1,16 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. diff --git c/python/pyproject.toml i/python/pyproject.toml index 899144d..9f62f02 100644 --- c/python/pyproject.toml +++ i/python/pyproject.toml @@ -84,11 +84,11 @@ zip-safe=false include-package-data=true [tool.setuptools.packages.find] -include = ["pyarrow"] +include = ["pyarrow", "pyarrow.*"] namespaces = false [tool.setuptools.package-data] -pyarrow = ["*.pxd", "*.pyx", "includes/*.pxd"] +pyarrow = ["*.pxd", "*.pyx", "includes/*.pxd", "py.typed"] [tool.setuptools_scm] root = '..' @@ -96,3 +96,39 @@ version_file = 'pyarrow/_generated_version.py' version_scheme = 'guess-next-dev' git_describe_command = 'git describe --dirty --tags --long --match "apache-arrow-[0-9]*.*"' fallback_version = '24.0.0a0' + +# TODO: Enable type checking once stubs are merged +[tool.mypy] +files = ["pyarrow-stubs"] +mypy_path = "$MYPY_CONFIG_FILE_DIR/pyarrow-stubs" +exclude = [ + "^pyarrow/", + "^benchmarks/", + "^examples/", + "^scripts/", +] + +# TODO: Enable type checking once stubs are merged +[tool.pyright] +pythonPlatform = "All" +pythonVersion = "3.10" +include = ["pyarrow-stubs"] +exclude = [ + "pyarrow", + "benchmarks", + "examples", + "scripts", + "build", +] +stubPath = "pyarrow-stubs" +typeCheckingMode = "basic" + +# TODO: Enable type checking once stubs are merged +[tool.ty.src] +include = ["pyarrow-stubs"] +exclude = [ + "pyarrow", + "benchmarks", + "examples", + "scripts", +] diff --git c/python/setup.py i/python/setup.py index a27bd3b..a25d2d7 100755 --- c/python/setup.py +++ i/python/setup.py @@ -121,8 +121,35 @@ class build_ext(_build_ext): def run(self): self._run_cmake() + self._copy_stubs() _build_ext.run(self) + def _copy_stubs(self): + """Copy .pyi stub files from pyarrow-stubs to the build directory.""" + build_cmd = self.get_finalized_command('build') + build_lib = os.path.abspath(build_cmd.build_lib) + + stubs_src = pjoin(setup_dir, 'pyarrow-stubs', 'pyarrow') + stubs_dest = pjoin(build_lib, 'pyarrow') + + if os.path.exists(stubs_src): + print(f"-- Copying stub files from {stubs_src} to {stubs_dest}") + for root, dirs, files in os.walk(stubs_src): + # Calculate relative path from stubs_src + rel_dir = os.path.relpath(root, stubs_src) + dest_dir = pjoin(stubs_dest, rel_dir) if rel_dir != '.' else stubs_dest + + # Create destination directory if needed + if not os.path.exists(dest_dir): + os.makedirs(dest_dir) + + # Copy .pyi files + for file in files: + if file.endswith('.pyi'): + src_file = pjoin(root, file) + dest_file = pjoin(dest_dir, file) + shutil.copy2(src_file, dest_file) + # adapted from cmake_build_ext in dynd-python # github.com/libdynd/dynd-python
Co-authored-by: Raúl Cumplido <raulcumplido@gmail.com>
change bat lint add a popd and nicer logging for windows ReplaceElipsis -> DocstringInserter simplify remove sphinx
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
27d1c65 to
3f9ed3b
Compare
3f9ed3b to
0ac95b0
Compare
Rationale for this change
This is the second in series of PRs adding type annotations to pyarrow and resolving #32609. It builds on top of and should be merged after #48618.
What changes are included in this PR?
This adds:
_types.pyi- Core type definitions including_stubs_typing.pyi- Internal typing protocols and helpers used across stub fileserror.pyi- Exception classes (ArrowException,ArrowInvalid,ArrowIOError, etc.)lib.pyi,io.pyi,scalar.pyi- using__getattr__to allow imports to resolve while deferring to subsequent PRsAre these changes tested?
Via CI type checks established in #48618.
Are there any user-facing changes?
Users will start seeing some minimal annotated types.