Pickle PUT opcode size discrepancy #144410

@Legoclones

Bug report

Bug description:

Commit 59f247e fixed the OOM errors caused by PUT and LONG_BINPUT opcodes with large arguments by adding a sparse dictionary to the memo in the C _pickle module. This allows large memo indices to be used in C _pickle, but it has exposed another discrepancy in the PUT opcode.

In order to handle numbers > MAX_LONG (such as 999999999999999999999999), C _pickle must treat the numbers as PyLongs instead of a built-in type (like ssize_t or long). In the C _pickle source code for loading the PUT opcode during pickle deserialization, the memo index is first parsed as a PyLong, and then converted to a Py_ssize_t, which is really just a ssize_t under the hood.

cpython/Modules/_pickle.c

Lines 6544 to 6547 in 29acc08

key = PyLong_FromString(s, NULL, 10);
if (key == NULL)
    return -1;
idx = PyLong_AsSsize_t(key);

_Unpickler_MemoPut(UnpicklerObject *self, size_t idx, PyObject *value)

This is done because idx is eventually passed to _Unpickler_MemoPut(), which takes size_t idx as one of its arguments. However, it has the unintended side effect of making PUT indices > MAX_LONG invalid in C _pickle, even though the Python pickle module accepts them.
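To illustrate why the conversion fails, here is a minimal sketch that compares the PUT argument from this report against sys.maxsize, which is the largest value a Py_ssize_t can hold on the running build. The variable names are only for illustration.

```python
import sys

# PUT argument used in the payload from this report.
idx = 999999999999999999999999

# sys.maxsize is the largest value a Py_ssize_t can hold on this build.
print("sys.maxsize:     ", sys.maxsize)
print("idx bit length:  ", idx.bit_length())

# PyLong_AsSsize_t() can only succeed when the value fits in this range.
fits_in_ssize_t = idx <= sys.maxsize
print("fits in ssize_t: ", fits_in_ssize_t)
```

On both 32-bit and 64-bit builds the index is out of range, so the conversion cannot succeed no matter how the memo itself is stored.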

payload:      b'K\x01p999999999999999999999999\n.'

pickle:       1
_pickle.c:    FAILURE Python int too large to convert to C ssize_t
pickletools:
    0: K    BININT1    1
    2: p    PUT        999999999999999999999999
   28: .    STOP
highest protocol among opcodes = 1
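The comparison above can be reproduced with a short script. This is a sketch, assuming that pickle._loads (an underscore-prefixed, undocumented name) is the pure-Python loader and that pickle.loads is backed by the C _pickle module when it is available. The outcome of the C path depends on the CPython version, so it is only recorded here, not asserted.

```python
import pickle

# BININT1 1, PUT 999999999999999999999999, STOP
payload = b'K\x01p999999999999999999999999\n.'

# Pure-Python loader: the memo is a dict, so any non-negative int is a valid key.
py_result = pickle._loads(payload)

# Default loader (C _pickle when available): record the result or the error.
try:
    c_outcome = pickle.loads(payload)
except Exception as exc:
    c_outcome = f"FAILURE {type(exc).__name__}: {exc}"

print("pickle (Python):", py_result)
print("pickle.loads:   ", c_outcome)
```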

Making these indices valid would likely require a variant of _Unpickler_MemoPut() that accepts a PyLong, along with some way to use that PyLong as an index in the array.
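One possible shape for such a memo, modeled in Python, not C: small indices go in a dense list and everything else goes in a dictionary keyed by the arbitrary-precision integer. This is a hypothetical sketch only. The HybridMemo class, its dense_limit parameter, and its method names are assumptions for illustration and do not describe the actual _pickle.c implementation.

```python
_MISSING = object()  # sentinel, so that None can itself be memoized


class HybridMemo:
    """Hypothetical model: dense list for small indices, dict for the rest."""

    def __init__(self, dense_limit=1024):
        self.dense_limit = dense_limit
        self._dense = []   # fast path for small indices
        self._sparse = {}  # fallback keyed by Python int of any size

    def put(self, idx, value):
        if idx < 0:
            raise ValueError("negative PUT argument")
        if idx < self.dense_limit:
            if idx >= len(self._dense):
                # Grow the list just far enough to hold idx.
                self._dense.extend([_MISSING] * (idx + 1 - len(self._dense)))
            self._dense[idx] = value
        else:
            self._sparse[idx] = value

    def get(self, idx):
        if 0 <= idx < len(self._dense) and self._dense[idx] is not _MISSING:
            return self._dense[idx]
        if idx in self._sparse:
            return self._sparse[idx]
        raise KeyError(idx)


memo = HybridMemo()
memo.put(3, "small")
memo.put(999999999999999999999999, "huge")
print(memo.get(3), memo.get(999999999999999999999999))
```

The dictionary path never converts the key to a fixed-width integer, which is what would let indices beyond the ssize_t range remain valid in this model.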

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux

Labels: extension-modules (C modules in the Modules dir), type-bug (An unexpected behavior, bug, or error)
