-
Notifications
You must be signed in to change notification settings - Fork 245
Open
Labels
experimentDescribes an investigation or measurementDescribes an investigation or measurement
Description
API calls in cuda-bindings currently are made through 3 layers.
As an experiment to measure the performance impact of calling through these layers, I "flattened" the call so the top layer just directly calls the C function pointer in the library (currently handled by the bottom layer). The overhead of each of these layers is pretty small, by design, but there is still some Python exception handling, as well as our library initialization check (cuPythonInit()) along the way.
While we lose some safety and version independence doing this, it is useful as an experiment to see what the cost of that flexibility is.
Measuring this with the benchmark in #659, I do not see any measurable change. Branch predictors must be pretty good these days.
Before: Mean +- std dev: 2.77 us +- 0.37 us
After: Mean +- std dev: 2.76 us +- 0.21 us
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
experimentDescribes an investigation or measurementDescribes an investigation or measurement