Change function calls to use vectorcall #5948
base: master
Conversation
This reverts commit 418a034.
It cannot re-use the incoming tuple as before, because it is no longer a tuple at all. So a new tuple must be created, which then holds references for each member.
This would be easier with `if constexpr`
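As an illustration of the `if constexpr` idea (a hypothetical sketch with made-up names, not the PR's actual collector code): one function template can select between the tuple-building path and the flat-array path at compile time, instead of maintaining two separate variants.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration: with C++17 `if constexpr`, a single collector
// can branch at compile time on how arguments are packed.
template <bool BuildTuple>
std::size_t slots_needed(std::size_t nargs, std::size_t nkwargs) {
    if constexpr (BuildTuple) {
        // tuple path: only positional args go into the new tuple
        return nargs;
    } else {
        // vectorcall path: one flat array holds args plus kwarg values
        return nargs + nkwargs;
    }
}
```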
@b-pass I merged master, which had no conflicts. My idea was that maybe Cursor can figure out the root cause for these two CI failures:
I'll wait for this CI run to finish, to see if these errors appear again. When testing locally with Python 3.14.2, default and freethreaded, I saw this error (with both builds):

EDIT: Please see #5948 (comment) for the resolution.

I'm attaching the full build & test logs, JIC:

- pybind11_gcc_v3.14_df793163d58_default_tests_log_2025-12-26+193154.txt
- pybind11_gcc_v3.14_df793163d58_freethreaded_log_2025-12-26+193608.txt

I switched back to master, everything else being equal, and this error didn't appear. So it's definitely somehow connected to this PR.
The `mod_per_interpreter_gil`, `mod_shared_interpreter_gil`, and `mod_per_interpreter_gil_with_singleton` modules were being built but not installed into the wheel when using scikit-build-core (SKBUILD=true). This caused iOS (and potentially Android) CIBW tests to fail with ModuleNotFoundError.

Root cause analysis:
- The main test targets have install() commands (line 531)
- The PYBIND11_MULTIPLE_INTERPRETERS_TEST_MODULES were missing equivalent install() commands
- For regular CMake builds, this wasn't a problem because LIBRARY_OUTPUT_DIRECTORY places the modules next to pybind11_tests
- For wheel builds, only targets with explicit install() commands are included in the wheel

This issue was latent until commit fee2527 changed the test imports from `pytest.importorskip()` (graceful skip) to direct `import` statements (hard failure), which exposed the missing modules.

Failing tests:
- test_multiple_interpreters.py::test_independent_subinterpreters
- test_multiple_interpreters.py::test_dependent_subinterpreters

Error: ModuleNotFoundError: No module named 'mod_per_interpreter_gil'
Quick update: I think Cursor found the root cause for the CIBW / iOS failures, and a fix. I'm still looking into why we're not seeing the
Add numpy==2.4.0 requirement for Python 3.14 (both default and free-threaded builds).

NumPy 2.4.0 is the first version to provide official PyPI wheels for Python 3.14:
- numpy-2.4.0-cp314-cp314-manylinux_2_27_x86_64...whl (default)
- numpy-2.4.0-cp314-cp314t-manylinux_2_27_x86_64...whl (free-threaded)

Previously, CI was skipping all numpy-dependent tests for Python 3.14 because PIP_ONLY_BINARY was set and no wheels were available:

SKIPPED [...] test_numpy_array.py:8: could not import 'numpy': No module named 'numpy'

With this change, the full numpy test suite will run on Python 3.14, providing better test coverage for the newest Python version.

Note: Using exact pin (==2.4.0) rather than compatible release (~=2.4.0) to ensure reproducible CI results with the first known-working version.
**NumPy 2.3.5 vs 2.4.0 Observations**

**The Issue**

When testing this PR locally with Python 3.14 + NumPy 2.3.5, I observed a test failure. This occurred when calling

**What We Know For Sure**

**What's Undetermined**

The exact bug fix in NumPy between 2.3.5 and 2.4.0 that resolved this issue. I searched the NumPy 2.4.0 release notes but couldn't find a specific fix that clearly matches our observation. The release notes mention "many expired deprecations and bug fixes" without listing all of them individually.

**Resolution**

Added
Add `-v` to the pytest command in tests/pyproject.toml to help diagnose hanging tests in CIBW jobs (particularly iOS). This will show each test name as it runs, making it easier to identify which specific test is hanging.
…ed since 3.8. This means getting rid of simple_collector; we can do the same with a constexpr if in the unpacking_collector.
This makes more sense and saves a sort, and the small_vector implementation means it will actually take less space than a vector of size_t elements. The most common case is that all kwargs are used.
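To illustrate the space claim above (a hypothetical sketch, not the PR's actual `small_vector`): a small fixed inline buffer of byte-sized indices can occupy less memory than even the empty-state bookkeeping of a `std::vector<size_t>` (typically three pointers, 24 bytes on 64-bit ABIs), with no heap allocation at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: kwarg counts are tiny, so a few inline uint8_t
// slots suffice; the whole structure is 9 bytes for N == 8.
template <std::size_t N>
struct inline_index_vector {
    std::uint8_t data[N] = {};
    std::uint8_t count = 0;
    void push_back(std::uint8_t i) { data[count++] = i; }
    std::size_t size() const { return count; }
};
```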
Pyodide runs in a WebAssembly sandbox without POSIX signals, so `signal.setitimer` is not available. This causes pytest-timeout to crash with `AttributeError: module 'signal' has no attribute 'setitimer'` when timeout > 0. Override the test-command for Pyodide to keep timeout=0 (disabled).
Review of Changes Since Commit 849df0b

1. 849df0b - argument_vector cleanups

Addressed @Skylion007's review comments:

2. b517942 - Unified to vectorcall for all Python versions

This addressed the review question about why

Net result: -221 lines, +55 lines — a major simplification!

3. b4ca366 -
rwgk left a comment:
> the old `simple_collector` and `unpacking_collector` were removed entirely
Awesome to see the simplification! Looks even better to me now.
The push_back impl isn't quite right. push_back has an rvalue overload too, so it needs either both a const-ref and an rvalue overload (with a corresponding std::move) or to take by value and always move.
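The two shapes being described can be sketched like this (illustrative names, not the PR's container): either an overload pair that copies lvalues and moves rvalues, or a single by-value overload that always moves.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

template <typename T>
class mini_vec {
    std::vector<T> storage_;

public:
    // Option A: overload pair -- copies lvalues, moves rvalues.
    void push_back(const T &v) { storage_.push_back(v); }
    void push_back(T &&v) { storage_.push_back(std::move(v)); }
    // Option B (alternative): take by value and always move:
    //   void push_back(T v) { storage_.push_back(std::move(v)); }
    const T &back() const { return storage_.back(); }
    std::size_t size() const { return storage_.size(); }
};
```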
Skylion007 left a comment:
Needs a static assert that the heap vector (and maybe `iarray`?) is noexcept move constructible, technically, doesn't it? Otherwise you need some weird edge-case handling for OOM like you do with std::vector.
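The suggested check could look something like this (a sketch; the wrapper name and element types are placeholders): a `static_assert` guarantees that reallocation can move elements without an exception escaping mid-transfer, so no std::vector-style copy fallback is needed.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Sketch: constrain a container's element type so growth can rely on
// nothrow moves when transferring elements to new storage.
template <typename T>
struct checked_storage {
    static_assert(std::is_nothrow_move_constructible<T>::value,
                  "element type must be nothrow move constructible");
    T value;
};
```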
Random design question: why does the collector store PyObject pointers directly vs py::handle?
It's a good bit faster at the cost of this one scary reinterpret_cast.
At 6, the struct is 144 bytes (not 128 bytes as the comment said).
Fixed.
Not sure what you mean... elaborate?
It was to avoid having to reinterpret_cast the C++ type to the C type in the Vectorcall call.... But I've consolidated both the
Avoiding the reinterpret_cast is a valid reason, just curious.
Skylion007 left a comment:
I'm a bit confused why a bunch of list and tuple objects are being stored as generic objects. We don't do anything crazy to them, and they all inherit from py::object. The only difference is the implicit boolean checks, maybe, but those can easily be handled by unwrapping the pointer at any time and checking against nullptr.
include/pybind11/cast.h
Outdated
    if (PyList_Append(names_list.ptr(), name.release().ptr()) < 0) {
        throw error_already_set();
    }
    m_args.emplace_back(a.value);
Suggested change:

    - m_args.emplace_back(a.value);
    + m_args.emplace_back(std::move(a.value));
It's just a bit inconvenient in this class ... I can certainly add some explanation in a comment about it. EDIT: Turns out there is a way:
Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
They can be null if you "steal" a null handle.
…icit ownership

Introduce `ref_small_vector` to manage PyObject* references in `unpacking_collector`, replacing the previous `small_vector<object>` approach.

Primary goals:

1. **Maintainability**: The previous implementation relied on `sizeof(object) == sizeof(PyObject*)` and used a reinterpret_cast to pass the object array to vectorcall. This coupling to py::object's internal layout could break if someone adds a debug field or other member to py::handle/py::object in the future.
2. **Readability**: The new `push_back_steal()` vs `push_back_borrow()` API makes reference counting intent explicit at each call site, rather than relying on implicit py::object semantics.
3. **Intuitive code**: Storing `PyObject*` directly and passing it to `_PyObject_Vectorcall` without casts is straightforward and matches what the C API expects. No "scary" reinterpret_cast needed.

Additional benefits:
- `PyObject*` is trivially copyable, simplifying vector operations
- Batch decref in destructor (tight loop vs N individual object destructors)
- Self-documenting ownership semantics

Design consideration: We considered folding the ref-counting functionality directly into `small_vector` via template specialization for `PyObject*`. We decided against this because:
- It would give `small_vector<PyObject*, N>` a different interface than the generic `small_vector<T, N>` (steal/borrow vs push_back)
- Someone might want a non-ref-counting `small_vector<PyObject*, N>`
- The specialization behavior could surprise users expecting uniform semantics

A separate `ref_small_vector` type makes the ref-counting behavior explicit and self-documenting, while keeping `small_vector` generic and predictable.
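A toy model of the steal/borrow ownership API described above (assumptions: a dummy refcounted struct stands in for PyObject, and plain increments/decrements stand in for Py_INCREF/Py_DECREF):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct DummyObj { int refcount = 1; };

class ref_vector_sketch {
    std::vector<DummyObj *> ptrs_;

public:
    /// Take ownership: no incref now, decref on destruction.
    void push_back_steal(DummyObj *p) { ptrs_.push_back(p); }
    /// Borrow: incref now, decref on destruction.
    void push_back_borrow(DummyObj *p) {
        if (p != nullptr) { ++p->refcount; }
        ptrs_.push_back(p);
    }
    /// Batch decref in the destructor (real code would free at zero).
    ~ref_vector_sketch() {
        for (DummyObj *p : ptrs_) {
            if (p != nullptr) { --p->refcount; }
        }
    }
    std::size_t size() const { return ptrs_.size(); }
};
```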
I like it. It's less code than I thought it would be.
    if (m_names) {
        nargs -= m_names.size();
    }
    PyObject *result = _PyObject_Vectorcall(
Slight nit: this could have been a py::handle.
    /// destructor, and avoids the need for reinterpret_cast when passing to vectorcall.
    template <std::size_t InlineSize>
    class ref_small_vector {
    public:
Ugh, I don't like this, but I don't see a better way of doing it due to layout compatibility issues with py::object. Every other method of doing this requires a copy.
Not sure if you thought of this, but if you make push_back only take in objects, you can handle the reinterpret steal and reinterpret borrow using only move and release mechanics.
    }

    /// Add a pointer, taking ownership (no incref, will decref on destruction)
    void push_back_steal(PyObject *p) { m_ptrs.push_back(p); }
Suggested change:

    - void push_back_steal(PyObject *p) { m_ptrs.push_back(p); }
    + void push_back_steal(object &&o) { m_ptrs.push_back(o.release().ptr()); }
    #endif
        }
        args_list.append(std::move(o));
        m_args.push_back_steal(h.ptr()); // cast returns a new reference
Suggested change:

    - m_args.push_back_steal(h.ptr()); // cast returns a new reference
    + m_args.push_back_steal(std::move(o)); // cast returns a new reference
Then it can keep the good old object semantics.
These can be normal objects that you just release. The borrow semantics are a bit more awkward unless you use move-by-value implicit construction. This might actually be cleaner.
I explored what this change would look like. Here's the diff (Cursor-generated):
diff --git a/include/pybind11/cast.h b/include/pybind11/cast.h
--- a/include/pybind11/cast.h
+++ b/include/pybind11/cast.h
@@ -2279,8 +2279,9 @@ private:
// normal argument, possibly needing conversion
template <typename T>
void process(list & /*names_list*/, T &&x) {
- handle h = detail::make_caster<T>::cast(std::forward<T>(x), policy, {});
- if (!h) {
+ auto o = reinterpret_steal<object>(
+ detail::make_caster<T>::cast(std::forward<T>(x), policy, {}));
+ if (!o) {
#if !defined(PYBIND11_DETAILED_ERROR_MESSAGES)
throw cast_error_unable_to_convert_call_arg(std::to_string(m_args.size() - 1));
#else
@@ -2288,7 +2289,7 @@ private:
type_id<T>());
#endif
}
- m_args.push_back_steal(h.ptr()); // cast returns a new reference
+ m_args.push_back_steal(std::move(o));
}
// * unpacking
@@ -2297,7 +2298,7 @@ private:
return;
}
for (auto a : ap) {
- m_args.push_back_borrow(a.ptr());
+ m_args.push_back_borrow(a);
}
}
@@ -2328,7 +2329,7 @@ private:
#endif
}
names_list.append(std::move(name));
- m_args.push_back_borrow(a.value.ptr());
+ m_args.push_back_borrow(a.value);
}
// ** unpacking
@@ -2347,7 +2348,7 @@ private:
#endif
}
names_list.append(std::move(name));
- m_args.push_back_borrow(k.second.ptr());
+ m_args.push_back_borrow(k.second);
}
}
diff --git a/include/pybind11/detail/argument_vector.h b/include/pybind11/detail/argument_vector.h
--- a/include/pybind11/detail/argument_vector.h
+++ b/include/pybind11/detail/argument_vector.h
@@ -389,13 +389,13 @@ public:
return *this;
}
- /// Add a pointer, taking ownership (no incref, will decref on destruction)
- void push_back_steal(PyObject *p) { m_ptrs.push_back(p); }
+ /// Add an object, taking ownership (will decref on destruction)
+ void push_back_steal(object &&o) { m_ptrs.push_back(o.release().ptr()); }
- /// Add a pointer, borrowing (increfs now, will decref on destruction)
- void push_back_borrow(PyObject *p) {
- Py_XINCREF(p);
- m_ptrs.push_back(p);
+ /// Add a handle, borrowing (increfs now, will decref on destruction)
+ void push_back_borrow(handle h) {
+ Py_XINCREF(h.ptr());
+ m_ptrs.push_back(h.ptr());
}
/// Add a null pointer (for PY_VECTORCALL_ARGUMENTS_OFFSET slot)

The change compiles and local tests pass. However, I'd prefer to keep the current implementation for these reasons:
- No reduction in code size — the change is net +1 line. The `reinterpret_steal<object>(...)` wrapper and `std::move()` add verbosity that offsets the minor savings from dropping `.ptr()` at the borrow call sites.
- The current code is explicit and straightforward — `ref_small_vector` is a purpose-built container for managing `PyObject*` ownership. Having it accept raw pointers makes the ownership transfer explicit at the call site. When you see `push_back_steal(h.ptr())`, it's immediately clear you're handing off a raw pointer to a container that will manage its lifetime.
- The abstraction doesn't simplify anything — we still need to deal with the underlying `PyObject*` for the vectorcall API. Wrapping the interface in `object&&`/`handle` just moves the `.ptr()` and `.release()` calls inside the method, without reducing overall complexity.
I think in this case the added abstraction layer doesn't provide tangible benefits. The current approach is clear, direct, and easy to follow.
        detail::make_caster<T>::cast(std::forward<T>(x), policy, {}));
    if (!o) {
    void process(list & /*names_list*/, T &&x) {
        handle h = detail::make_caster<T>::cast(std::forward<T>(x), policy, {});
Converting this to a handle here makes me a bit uncomfortable. You could keep it in an object and just use the move ctor; the only downside is you'll have one extra call to Py_XDECREF on a nullptr.
I like how you have it done now, I just wonder if it could be cleaned up any more with .release().ptr() semantics.
    }
    m_kwargs[a.name] = std::move(a.value);
    names_list.append(std::move(name));
    m_args.push_back_borrow(a.value.ptr());
Yeah, both of these could just be a push_back(&&o) with release-ptr syntax, and then you don't have to duplicate the logic; while the implementation is PyObject*, the input is still py::object. The one downside is the extra xdecref call on the nullptr stored in the py::object, but I doubt that null check will be a bottleneck.
Suggested change:

    - m_args.push_back_borrow(a.value.ptr());
    + m_args.push_back_steal(reinterpret_borrow<object>(a.value));

Description
Change both the Python->C++ and the C++->Python function calls to use the faster "vector call" calling protocol.
This makes function calls faster, at the minor expense of having to handle the arguments slightly differently.
Suggested changelog entry: