gh-144569: Avoid creating temporary objects in BINARY_SLICE for list, tuple, and unicode #144590

cocolato wants to merge 10 commits into python:main

Conversation
I think the test failures that occurred are unrelated to this PR.
You forgot to add the new recorder to the BINARY_SLICE macro op.
Fidget-Spinner left a comment:

LGTM, just one comment, and please add a NEWS entry.
Python/bytecodes.c (outdated):
    PySlice_AdjustIndices(PyUnicode_GET_LENGTH(container_o), &istart, &istop, 1);
    res_o = PyUnicode_Substring(container_o, istart, istop);
}
DECREF_INPUTS();
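For reference, here is a minimal Python sketch (my own approximation, not CPython's actual code) of what `PySlice_AdjustIndices` does for a step of 1: negative bounds wrap around the length, and out-of-range bounds are clamped.

```python
def adjust_indices(length, start, stop):
    # Approximate PySlice_AdjustIndices(length, &start, &stop, 1):
    # negative indices count from the end; everything is clamped to [0, length].
    if start < 0:
        start = max(start + length, 0)
    else:
        start = min(start, length)
    if stop < 0:
        stop = max(stop + length, 0)
    else:
        stop = min(stop, length)
    return start, stop

assert adjust_indices(5, -2, 10) == (3, 5)   # "hello"[-2:10] -> "lo"
assert adjust_indices(5, 0, 3) == (0, 3)
```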
We should split this into the POP_TOP POP_TOP form; can you please do that in a follow-up PR?
Actually, I've changed my mind. Can you please open a PR to convert BINARY_SLICE to the POP_TOP form? Then, after we merge that PR, we can update this one.
Fidget-Spinner left a comment:
Let's leave the DECREF_INPUTS alone for now and try to remove it in a future PR.
The code has repeated instances that should be factored out into a micro-op.
Misc/NEWS.d/next/Core_and_Builtins/2026-02-08-13-14-00.gh-issue-144569.pjlJVe.rst
This is pretty close. I'm going to push some changes in, then wait a day to see if Mark has any objections tomorrow.
Thanks for your guidance!
1. We cannot exit after unpacking indices, as the stack now contains tagged ints, which may lead to a crash. We must insert a guard for the type before unpacking indices.
2. It is possible for an index not to fit in a tagged int, in which case we must deopt.
3. Recorded values do not always match the actual type we will see at runtime, so we must guard on that as well.
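Point 2 is observable from pure Python: a slice bound may be an arbitrarily large integer that cannot fit in any fixed-width (tagged) representation, yet slicing must still succeed because the bounds get clamped. A small illustration:

```python
lst = [1, 2, 3]

# Bounds far beyond any machine-word range are simply clamped to the list
# length, so a specialized uop assuming word-sized indices must deopt here.
assert lst[0:10**30] == [1, 2, 3]
assert lst[-10**30:2] == [1, 2]
```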
@cocolato please check out the latest commit and the extended commit message for the bugs fixed. Thanks!
Results using hyperfine: so we made it >10% faster. Nice!
LGTM! Thanks again for the fix, I learned a lot from it.
This is my first time learning about this tool, and it looks great. In the past, when I needed to run benchmarks locally,
At the same time, I've noticed that when
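The PR's numbers came from hyperfine; for a stdlib-only way to run a comparable local micro-benchmark, `timeit` works too. This is a generic sketch, not the benchmark used in the PR, and the timing it prints depends entirely on the machine and build:

```python
import timeit

# Time a plain list slice; on a build with the specialized uops this is the
# operation that should get faster (no PySliceObject is allocated).
total = timeit.timeit(
    "lst[10:20]",
    setup="lst = list(range(100))",
    number=1_000_000,
)
print(f"{total:.3f} s for 1,000,000 slices")
```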
markshannon left a comment:
Thanks for doing this.
This looks sound, but there are some inefficiencies.
Beware the size of uops. They will be converted to machine code, so keeping code size down is important for performance.
op(_UNPACK_INDICES, (container, start, stop -- container, sta, sto)) {
start and stop may well be constants. Slices like [:n] and [n:] are quite common, so this can be optimized by popping the None and replacing it with tagged(0) for the start, or tagged(MAX) for the end.
If both values are constant, we can eliminate the _UNPACK_INDICES. If only one is constant, we can replace _UNPACK_INDICES with a non-negative compact int guard and tagging.
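The constant-bound identities this optimization relies on can be checked at the Python level: an omitted start behaves like 0, and an omitted stop behaves like an arbitrarily large bound (using `sys.maxsize` below as a stand-in for tagged(MAX)):

```python
import sys

s = "hello"
n = 3
# [:n] is equivalent to an explicit 0 start; [n:] to a "maximal" stop.
assert s[:n] == s[0:n]
assert s[n:] == s[n:sys.maxsize]
assert s[:] == s[0:sys.maxsize]
```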
I believe this should be implemented after #144569 (comment). I will open another PR to do this.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests, along with any other requests in other reviews from core developers, that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase
Thank you for your review and corrections! I have made the requested changes; please review again.
Thanks for making the requested changes! @markshannon: please review the changes made to this pull request.
When the tier 2 optimizer can determine the type of the container being sliced, _BINARY_SLICE can be replaced with a specialized micro-op that bypasses the PySliceObject allocation entirely. This PR implements scalar replacement of BINARY_SLICE by adding the new tier 2 ops _BINARY_SLICE_LIST, _BINARY_SLICE_TUPLE, and _BINARY_SLICE_UNICODE.