@adamchainz
Contributor

@adamchainz adamchainz commented Feb 12, 2026

Move the default docstring construction to occur on-access through a descriptor.
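For readers unfamiliar with the technique, here is a minimal sketch of building a class docstring lazily through a descriptor. It is not the code from this PR: `LazyDoc`, `build_default_doc`, `with_lazy_doc`, and `Point` are hypothetical names chosen for illustration, and the caching step (replacing the descriptor with the built string) is an assumption about one reasonable way to do it.

```python
import inspect

build_calls = []  # records every time a docstring is actually built


def build_default_doc(cls):
    """Hypothetical helper: build 'Name(signature)' for a class."""
    build_calls.append(cls.__name__)
    return cls.__name__ + str(inspect.signature(cls))


class LazyDoc:
    """Hypothetical non-data descriptor that builds __doc__ on first access."""

    def __get__(self, instance, owner):
        doc = build_default_doc(owner)
        # Cache the result by replacing the descriptor with the plain string,
        # so later accesses skip the signature computation entirely.
        owner.__doc__ = doc
        return doc


def with_lazy_doc(cls):
    """Hypothetical decorator: install LazyDoc only if no docstring exists."""
    if cls.__doc__ is None:
        cls.__doc__ = LazyDoc()
    return cls


@with_lazy_doc
class Point:
    def __init__(self, x: int, y: int = 0):
        self.x = x
        self.y = y


assert not build_calls  # nothing was built at class-creation time
print(Point.__doc__)    # first access builds and caches the docstring
print(Point.__doc__)    # second access returns the cached string
print(build_calls)      # the builder ran only once
```

The point of the sketch is that the expensive `inspect.signature()` call moves out of class creation and is paid only by code that actually reads `__doc__`.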

Verified with [tprof](https://github.com/adamchainz/tprof) and this script that generates 10k dataclasses:

```py
from dataclasses import dataclass

for i in range(10_000):
    @dataclass
    class Example:
        field1: int
        field2: str
        field3: float
```

Before:

```
$ tprof -t dataclasses._process_class example.py
🎯 tprof results:
 function                     calls total  mean ± σ       min … max
 dataclasses._process_class() 10000    5s 485μs ± 120μs 458μs … 6ms
```

After:

```
$ PYTHONPATH=Lib/ uvx tprof -t dataclasses._process_class example.py
🎯 tprof results:
 function                     calls total  mean ± σ       min … max
 dataclasses._process_class() 10000    3s 275μs ± 131μs 245μs … 6ms
```

The mean time spent in `_process_class()` has dropped from 485μs to 275μs, a ~43% time saving (admittedly skewed due to the small size of the dataclass).
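As a quick check, the relative saving can be recomputed from the two reported means. The variable names below are illustrative only:

```python
before_us = 485  # mean time per _process_class() call before the change
after_us = 275   # mean time per call after the change

saving = (before_us - after_us) / before_us  # fraction of time saved
speedup = before_us / after_us               # how many times faster

print(f"time saved: {saving:.1%}")
print(f"speedup: {speedup:.2f}x")
```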

@johnslavik
Member

johnslavik commented Feb 12, 2026

DocDescriptor isn't meant to be public API. Can we rename it to a private class?

@adamchainz adamchainz force-pushed the adamchainz/dataclass-lazy-docstring branch from b69f2e5 to e2293d9 Compare February 13, 2026 00:07
@adamchainz
Contributor Author

> DocDescriptor isn't meant to be public API. Can we rename it to a private class?

Good point, done!

@danielhollas
Contributor

Hehe, funny, I ended up doing the same trick in my recent PR #144387, with the main purpose of being able to lazy import the `inspect` module. It's great to see it improves dataclass creation as well (I had a hunch but didn't do careful benchmarking).
