
ShareableList corrupts multi-byte UTF-8 strings and bytes with trailing nulls #145261

@zetzschest


Bug report

Bug description:

Issue

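ShareableList sizes each string slot from the character count (len(item)) instead of the UTF-8 encoded byte count. A string containing multi-byte characters can therefore be truncated mid-character when it is stored, and reading it back raises UnicodeDecodeError or returns corrupted data. Additionally, values are read back with rstrip(b'\x00'), which strips legitimate trailing null bytes from bytes values.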
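The size mismatch can be illustrated without touching shared memory. The following is only a sketch: it assumes slots are rounded up to a multiple of 8 bytes and packed with struct's "s" format, which is how I read the current implementation. ALIGNMENT and slot_size are names made up for this illustration, not part of the module's API.

```python
import struct

ALIGNMENT = 8  # assumed slot granularity (illustrative, not a public constant)


def slot_size(length: int) -> int:
    """Round a length up to the next multiple of ALIGNMENT (hypothetical helper)."""
    return ALIGNMENT * (length // ALIGNMENT + 1)


text = '0\U00010000\U00010000'
encoded = text.encode('utf-8')

chars = len(text)              # characters in the string
nbytes = len(encoded)          # bytes needed once UTF-8 encoded
by_chars = slot_size(chars)    # slot sized from the character count
by_bytes = slot_size(nbytes)   # slot sized from the encoded byte count

# struct's "s" format silently truncates data longer than the field.
stored = struct.pack(f'{by_chars}s', encoded)
try:
    stored.rstrip(b'\x00').decode('utf-8')
    truncated_decodes = True
except UnicodeDecodeError:
    # The last character was cut in the middle of its 4-byte sequence.
    truncated_decodes = False

# With a slot sized from the byte count, the value survives the round trip.
roundtrip = struct.pack(f'{by_bytes}s', encoded).rstrip(b'\x00').decode('utf-8')

# Null padding cannot be told apart from a legitimate trailing null byte.
padded = struct.pack(f'{slot_size(1)}s', b'\x00')
stripped = padded.rstrip(b'\x00')

print(chars, nbytes, by_chars, by_bytes)
print(truncated_decodes, roundtrip == text, stripped)
```

Under these assumptions the string needs 9 bytes but gets an 8-byte slot, and the single null byte is indistinguishable from the padding around it.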

Note: The trailing null bytes issue was originally reported in #106939 (July 2023) and documented as a known issue with a workaround. This fix attempts to resolve both that long-standing issue and the newly discovered UTF-8 corruption bug.
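For reference, the documented workaround for the trailing-null issue is, as far as I understand it, to always append one extra non-zero byte when storing a value and to drop it again when fetching. A possible extension of that idea, sketched below, is to encode strings to UTF-8 bytes first so that the stored length is already a byte count. The helper names (to_slot, from_slot, simulate_slot) and the sentinel value are hypothetical, and simulate_slot only imitates the pad-then-rstrip round trip under the same 8-byte alignment assumption, so the sketch runs without creating a shared memory segment.

```python
import struct

SENTINEL = b'\x07'  # any single non-zero byte; the value is arbitrary (assumption)


def to_slot(value):
    """Encode str to UTF-8 bytes and append the sentinel (hypothetical helper)."""
    raw = value.encode('utf-8') if isinstance(value, str) else bytes(value)
    return raw + SENTINEL


def from_slot(raw, as_str=False):
    """Drop the sentinel; decode back to str when requested (hypothetical helper)."""
    body = raw[:-1]
    return body.decode('utf-8') if as_str else body


def simulate_slot(raw):
    """Imitate a store/fetch cycle: null-pad to a multiple of 8, then rstrip nulls."""
    size = 8 * (len(raw) // 8 + 1)
    return struct.pack(f'{size}s', raw).rstrip(b'\x00')


text = '0\U00010000\U00010000'
text_back = from_slot(simulate_slot(to_slot(text)), as_str=True)
bytes_back = from_slot(simulate_slot(to_slot(b'\x00')))

print(text_back == text, bytes_back)
```

With a real ShareableList the same helpers would wrap the values on the way in (to_slot(x)) and on the way out (from_slot(sl[i])). This trades one byte per value and loses the str/bytes distinction in the stored data, so the caller has to remember which slots hold text.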

Reproducer

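from multiprocessing.shared_memory import ShareableList

# String corruption: 3 characters, but 9 bytes once UTF-8 encoded
sl = ShareableList(['0\U00010000\U00010000'])
try:
    print(sl[0])  # UnicodeDecodeError
except UnicodeDecodeError as exc:
    print('UnicodeDecodeError:', exc)
finally:
    # Release the segment even when the read fails
    sl.shm.close()
    sl.shm.unlink()

# Bytes corruption: the only byte is a legitimate trailing null
sl = ShareableList([b'\x00'])
try:
    print(repr(sl[0]))  # b'' instead of b'\x00'
finally:
    sl.shm.close()
    sl.shm.unlink()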

CPython versions tested on:

CPython main branch

Operating systems tested on:

No response


Labels: stdlib, topic-multiprocessing, type-bug
