gh-140009: Optimize dict.items() symmetric difference via PyTuple_FromArray #144771
andrewloux wants to merge 3 commits into python:main
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
PyTuple_FromArray in dict.items() symmetric difference
Force-pushed from 714fb11 to 451bec2.
Title changed from "PyTuple_FromArray in dict.items() symmetric difference" to "dict.items() symmetric difference via PyTuple_FromArray".
@andrewloux Your own benchmarks show this change is performance neutral. Unless there are additional results that show why this change is a significant improvement, I suggest we close this. (I do believe this is a tiny net improvement, but in general we avoid making such small changes to reduce churn and potential unforeseen issues.)
Yup, totally makes sense - let's close this 👍🏽 Thanks @eendebakpt
Summary
This PR replaces PyTuple_Pack with PyTuple_FromArray in Objects/dictobject.c within the dictitems_xor_lock_held function.

By avoiding the variadic argument (va_args) processing overhead of PyTuple_Pack, we reduce the per-item cost of symmetric difference operations (dict.items() ^ dict.items()) that involve value mismatches. The change uses a stack-allocated array to pass arguments directly to the tuple constructor.

Benchmarks (PGO+LTO)
Validated using pyperf in --rigorous mode on a full production build.

Build: --enable-optimizations --with-lto (--rigorous mode)
Baseline: upstream/main
Candidate: pytuple-dictitems-xor-fromarray (714fb11)
Benchmarks: dict_items_xor_overlap_neq, dict_items_xor_disjoint, dict_items_xor_overlap_equal_control

Geometric mean: 1.00x faster (1.01x on target path)
Repro commands
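As a rough illustration only, the three workloads named in the benchmark list could be timed along the following lines. This is a minimal sketch, not the PR's actual benchmark script: it uses the standard-library timeit module rather than pyperf, and the dictionary size, the dictionary shapes, and the helper names (make_workloads, run_xor, bench) are assumptions inferred from the workload names.

```python
import timeit

N = 10_000  # hypothetical dictionary size; not taken from the PR


def make_workloads(n=N):
    """Build (left, right) dict pairs; shapes are guessed from the workload names."""
    base = {i: i for i in range(n)}
    return {
        # Same keys, every value differs: the path that builds result tuples.
        "dict_items_xor_overlap_neq": (base, {i: i + 1 for i in range(n)}),
        # No shared keys at all.
        "dict_items_xor_disjoint": (base, {i + n: i for i in range(n)}),
        # Same keys and same values: the result is empty.
        "dict_items_xor_overlap_equal_control": (base, dict(base)),
    }


def run_xor(a, b):
    """The operation under test: symmetric difference of two items views."""
    return a.items() ^ b.items()


def bench(repeat=5, number=20):
    """Return the best observed seconds per call for each workload."""
    results = {}
    for name, (a, b) in make_workloads().items():
        timings = timeit.repeat(lambda: run_xor(a, b), repeat=repeat, number=number)
        results[name] = min(timings) / number
    return results


if __name__ == "__main__":
    for name, seconds in bench().items():
        print(f"{name}: {seconds * 1e3:.3f} ms per call")
```

Running the same script against a baseline build and a patched build gives a coarse comparison; pyperf's --rigorous mode, as used in the PR, controls noise far better than a bare timeit loop.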
Analysis
The dict_items_xor_overlap_neq workload specifically exercises the modified path by comparing dictionaries with overlapping keys but unequal values, triggering a tuple creation for every mismatched entry.

While the aggregate effect is a micro-optimization, the results show a consistent improvement on the target path with a reduction in variance (±4.6 ms → ±2.5 ms) across multiple runs. Control workloads (equal and disjoint) remain neutral, confirming no regressions in non-target dictionary shapes.
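To make the distinction between the target path and the control paths concrete, the Python-level behavior can be sketched as follows. The example dictionaries and the helper name items_xor are illustrative assumptions; only the dict.items() ^ dict.items() operation itself comes from the PR.

```python
def items_xor(a, b):
    """Return the symmetric difference of two dicts' items as a set of tuples."""
    return a.items() ^ b.items()


left = {"x": 1, "y": 2, "z": 3}
right = {"x": 1, "y": 20, "w": 4}

result = items_xor(left, right)

# "x" matches in both key and value, so it drops out of the result.
# "y" has the same key but different values, so both (key, value) pairs appear.
# "z" and "w" each exist on one side only, so each appears once.
print(sorted(result))  # [('w', 4), ('y', 2), ('y', 20), ('z', 3)]
```

Every pair in the result is a newly built 2-tuple. The "y" case corresponds to the overlapping-keys, unequal-values shape that the Analysis describes as the target path.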