gh-145044: Increase performance in unsafe_object_compare by not using Py_DECREF() for immortal res_obj #145045
benediktjohannes wants to merge 6 commits into python:main
Conversation
eendebakpt left a comment:
The change itself is correct, but the performance gain is small.
Small addition for clarity: we measure A/B, which is the lower speed of A in comparison to B; this means that B is effectively not only 1.66% faster, but 1.688%.
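To spell out that conversion (assuming the reported 1.66% is A's slowdown measured relative to A's own runtime, which is my reading of the comment above):

$$\frac{t_A - t_B}{t_A} = 1.66\% \;\Rightarrow\; \frac{t_A - t_B}{t_B} = \frac{0.0166}{1 - 0.0166} \approx 1.688\%$$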
Please use pyperf instead of a custom tool for performance measurements; this is true for all benchmark proposals. Also, be sure to run the tests with a PGO+LTO build, not a DEBUG build. For immortal objects, we also don't have a clear pattern of whether to optimize or not. While it is an optimization, it complicates the branches a bit and adds mental burden (we need to remember that we are working on an immortal object here). So unless we have (1) more benchmarks and (2) macro benchmarks as well (small lists are not relevant IMO, they are in the realm of noise here), I'm not willing to support this proposal.
Thanks for the review! I’ll add benchmarks using pyperf as soon as possible (probably tomorrow), including larger sizes as well. I guess larger lists will show a similar or even better effect, because comparisons then matter even more relative to the other work done in sort, but the effects will probably be nearly the same since comparisons are important in all cases. And if it turns out that this actually improves performance by the expected amount, I think that merging this should be positive because the global impact in
I'm currently working on this and I have really great news to announce! This change seems to be even more positive for performance (around 5% or even more), because what was measured earlier was only list.sort(); many other list operations also seem to be improved by this change (some even more). For example, if you move the start of the measurement earlier in the scripts above, so that more of the executed code is included (which is also more realistic for many use cases, because the overall impact is what matters), you get a much bigger improvement in performance, and this probably applies to many other functions as well. So I think this should definitely be merged, because the improvement should matter for lists in many other respects too! I'll be working on testing this with
Unless I see real improvements with pyperf, I don't want to complicate the current code. The code is fine and easier to read as it is. Also, I don't know if you are using an LLM to write your answers, but could you shorten them or at least use paragraphs? It is hard to read them and catch your point.
I would also suggest running the pyperformance macrobenchmarks.
I'll try to shorten them, I'm sorry; I sometimes use too many words for simple sentences. 🤣 And I see your point, I'll add
I'll have a look at that! 👍
Thanks!
Hmmm, I guess the problem with macro benchmarks will be that sorting (or even lists in general) is only one part of those benchmarks, so measuring the impact will be extremely difficult because of noise: if we have around a 1%-5% impact on lists, this is maybe around 0.01%-0.05% overall, which is probably extremely hard to distinguish from noise. But I'll (of course) add some pyperf tests, and I think that if we measure an impact there, it's worth adding these changes, because the code should not become too complicated even with this change, and then we have a real user-visible effect.
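To put rough numbers on that estimate (the roughly 1% share of runtime spent in the affected list comparisons is only an assumption here, not something measured): with f the fraction of a macrobenchmark's runtime spent in those comparisons and s the relative speedup of that part,

$$\text{overall effect} \approx f \cdot s \approx 1\% \cdot (1\%\text{–}5\%) \approx 0.01\%\text{–}0.05\%$$

which matches the 0.01%-0.05% figure above.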
I hope that this message was not too long 😅 and is understandable.
Increase performance in unsafe_object_compare by not using Py_DECREF() for immortal res_obj
In Objects/listobject.c, the function unsafe_object_compare() currently performs an unconditional Py_DECREF(res_obj); after evaluating the result of tp_richcompare, regardless of whether res_obj is a boolean or another object type. Since PyBool_Check(res_obj) implies that res_obj is either Py_True or Py_False, both of which are immortal singletons, decrementing their reference count has no semantic effect. While Py_DECREF() is safe for immortal objects, it still triggers the decrement, which introduces measurable overhead (see benchmarks below).
Rationale
- PyBool_Check(res_obj) guarantees res_obj is either Py_True or Py_False. Both are immortal objects.
- Calling Py_DECREF() on immortal objects has no observable semantic effect.
- Avoiding the unnecessary decrement improves performance in microbenchmarks (see benchmarks below).
- This function sits on the hot path of list sorting (list.sort()), so even small savings per comparison accumulate measurably.
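For illustration, a minimal sketch of the proposed boolean fast path (a sketch of the idea only, not the exact diff; the earlier branches of unsafe_object_compare() are omitted):

```c
/* Tail of unsafe_object_compare() in Objects/listobject.c (sketch only).
   res_obj is the result of the cached tp_richcompare call and has already
   been checked against NULL and Py_NotImplemented in the omitted branches. */
if (PyBool_Check(res_obj)) {
    /* res_obj is Py_True or Py_False.  Both are immortal singletons, so the
       unconditional Py_DECREF(res_obj) of the current code can be skipped. */
    return res_obj == Py_True;
}
/* Non-boolean results keep the existing behaviour: convert to a truth value
   and drop the reference exactly once. */
int res = PyObject_IsTrue(res_obj);
Py_DECREF(res_obj);
return res;
```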
Correctness Considerations
- The Py_NotImplemented branch remains unchanged and still decrefs before the fallback.
- The res_obj == NULL case remains unchanged.
- Non-boolean results continue to be decremented exactly once.
- For boolean results Py_DECREF() is not triggered anymore, which is safe due to their immortal status.
- The existing comment regarding non-deterministic comparison functions (e.g. lambda a, b: int(random.random() * 3) - 1 in test_sort.py) should not be any problem for this change in my opinion (but please correct me if I'm mistaken anywhere).
Benchmarks
This was tested with two combined scripts in order to compare the original cpython (also called CPython in the benchmark results) and the so-called cpython_patch (also called CPython PATCH in the benchmark results). cpython_patch is an exact copy of cpython (created at the same time, from the same commits) with only the change proposed in this PR.
I use controller.py, which starts 2000 independent runs of both scripts, one after another and alternating each time, so that effects such as CPU boost are no longer relevant. I've also tested this with the order of the two scripts switched, so that starting the next iteration does not cause any relevant overhead before the test (one of the most representative benchmark results is shown down below).
Scripts
controller.py
performancetest.py
Results
The results printed to the console for my microbenchmark were:
The full log is viewable online (because it includes around 4000 lines of individual results) at: https://justpaste.it/fwgog