ci: require tag-triggered artifacts for release uploads by rwgk · Pull Request #1606 · NVIDIA/cuda-python

rwgk · 2026-02-11T22:34:20Z

Summary

Build CI on release tag pushes (v*, cuda-core-v*, cuda-pathfinder-v*) so release artifacts are generated from tagged refs.
Make release run-id auto-detection require a successful tag-triggered CI run, instead of any successful run for the same commit SHA.
Add a release-wheel validator that enforces:
- wheel version exactly matches the requested git tag version
- no .dev / +local style versions
- expected component wheel(s) are present
Run the validator before both PyPI/TestPyPI publishing and GitHub Release wheel upload.
Add a release checklist reminder to wait for tag CI completion and use that run ID.
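
The three validator rules above can be sketched roughly as follows. This is a minimal illustration, not the validator added by this PR: the function name `validate_release_wheels`, its arguments, and the choice to ignore wheels of other components are all assumptions made for the example.

```python
import re

# Wheel filenames follow "{distribution}-{version}-{python}-{abi}-{platform}.whl"
# (an optional build tag, if present, sits between version and python tag).
WHEEL_RE = re.compile(r"^(?P<dist>[^-]+)-(?P<version>[^-]+)-.+\.whl$")


def validate_release_wheels(filenames, tag_version, expected_dists):
    """Hypothetical helper: return a list of problems; empty means OK."""
    problems = []
    seen = set()
    for name in filenames:
        match = WHEEL_RE.match(name)
        if match is None:
            problems.append(f"{name}: not a recognizable wheel filename")
            continue
        dist, version = match["dist"], match["version"]
        if dist not in expected_dists:
            continue  # assumption: wheels of other components are not checked
        seen.add(dist)
        if ".dev" in version or "+" in version:
            # Rule: no .dev / +local style versions.
            problems.append(f"{name}: dev/local version {version!r} is not releasable")
        elif version != tag_version:
            # Rule: wheel version must exactly match the tag version.
            problems.append(f"{name}: version {version!r} != tag version {tag_version!r}")
    # Rule: every expected component wheel must be present.
    for dist in sorted(set(expected_dists) - seen):
        problems.append(f"missing expected wheel for {dist}")
    return problems
```

With the wheel from the failed upload, `validate_release_wheels(["cuda_pathfinder-1.3.4.dev147+g2928117f6-py3-none-any.whl"], "1.3.4", ["cuda_pathfinder"])` reports one dev/local problem, while the rebuilt `cuda_pathfinder-1.3.4-py3-none-any.whl` passes with an empty list.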

Why

After moving to setuptools-scm, merge-triggered CI can produce wheels like X.Y.Z.devN+gHASH when tags are created later.

The release workflow previously selected runs by commit SHA, so it could pull those pre-tag artifacts and fail at upload time.

This PR makes release selection and validation explicitly tag-aware to prevent that class of failure.
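
To make the failure mode concrete, here is a simplified model of how a setuptools-scm style version depends on the distance to the nearest tag. It is only a sketch of the common "bump the last component and append .devN+gHASH" behavior; real setuptools-scm handles more cases (dirty trees, custom schemes). The function name `model_scm_version` and the inputs in the usage line (previous tag 1.3.3, distance 147) are illustrative assumptions.

```python
def model_scm_version(last_tag_version, distance, node):
    """Simplified, hypothetical model of a setuptools-scm style version.

    last_tag_version: version of the nearest reachable tag, e.g. "1.3.3"
    distance: number of commits since that tag
    node: abbreviated commit hash
    """
    if distance == 0:
        # The commit itself is tagged: the version is exactly the tag version.
        return last_tag_version
    # Otherwise guess the next release by bumping the last numeric component
    # and mark the build as a development version with a local hash suffix.
    parts = last_tag_version.split(".")
    parts[-1] = str(int(parts[-1]) + 1)
    return f"{'.'.join(parts)}.dev{distance}+g{node}"
```

Under these assumptions, `model_scm_version("1.3.3", 147, "2928117f6")` gives `1.3.4.dev147+g2928117f6` (what merge CI built before the tag existed), and `model_scm_version("1.3.4", 0, "2928117f6")` gives `1.3.4` (what a build on the tagged ref produces).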

Behavior Changes

release.yml auto-detection now fails fast (with an actionable message) if no successful CI run exists for the requested tag ref.
Release workflows fail before upload when wheel versions do not match tag-derived release versions.
Publishing from pre-tag merge artifacts is blocked by default.

How this works in practice

For normal merges, behavior stays the same.

Merging to main still triggers the same CI workflow with the same jobs/matrix and artifact upload steps as before.
No build commands were changed for merge CI, so merge artifacts are produced the same way.

The one trigger change is that ci.yml now also listens to release tag pushes:

v*
cuda-core-v*
cuda-pathfinder-v*

So the trigger behavior is now:

merge push to main => usual CI run (unchanged)
release tag push => additional CI run on the tag ref (new)

Example:

git tag cuda-pathfinder-v1.3.4
git push upstream cuda-pathfinder-v1.3.4

The git push command above now triggers CI. Before this PR, it did not.

Operationally, release is now tag-aware:

CI: Release auto-detects a run ID from a successful tag-triggered CI run.
If no successful tag CI run exists yet, auto-detection fails with a clear message (wait for tag CI and retry).
Even when a run-id is manually provided, wheel validation still runs and blocks upload if versions do not match the requested tag.

This prevents publishing pre-tag wheels such as X.Y.Z.devN+gHASH.
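
The tag-aware run selection described above might look roughly like this. It is a sketch, not the repository's lookup script: `pick_tag_run_id` is a hypothetical name, and the dictionary keys (`id`, `event`, `head_branch`, `conclusion`) are modeled on GitHub workflow-run records, with the assumption that a tag-triggered run reports the tag name in `head_branch`.

```python
def pick_tag_run_id(runs, tag):
    """Hypothetical helper: id of the newest successful tag-triggered run."""
    candidates = [
        run for run in runs
        if run["event"] == "push"
        and run["head_branch"] == tag  # assumption: tag pushes report the tag name here
        and run["conclusion"] == "success"
    ]
    if not candidates:
        # Fail fast with an actionable message instead of falling back to
        # any successful run for the same commit SHA.
        raise LookupError(
            f"No successful CI run found for tag {tag!r}; "
            "wait for the tag-triggered CI run to finish, then retry."
        )
    # Assumption: a larger run id means a more recent run.
    return max(candidates, key=lambda run: run["id"])["id"]
```

The key difference from SHA-based selection is that a successful merge-triggered run on `main` for the same commit no longer qualifies, because its ref is the branch rather than the tag.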

What Triggers What

Push to main: triggers CI (same as today).
Push to pull-request/<N> branch: triggers CI (same as today).
Push matching release tag (v*, cuda-core-v*, cuda-pathfinder-v*): triggers CI (new).
Manual CI: Release dispatch: pulls wheels from the selected/detected run ID, validates wheel versions against git-tag, then uploads/publishes.
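
The push-trigger rules in this list can be approximated in a few lines. This sketch uses Python's `fnmatch` as a stand-in for GitHub's filter-pattern matching, which is close enough for these simple patterns but not identical in general. The branch pattern `pull-request/*` and the function name `push_triggers_ci` are assumptions for illustration; only the three tag patterns come from this PR.

```python
from fnmatch import fnmatchcase

BRANCH_PATTERNS = ["main", "pull-request/*"]  # assumption: illustrative branch filters
TAG_PATTERNS = ["v*", "cuda-core-v*", "cuda-pathfinder-v*"]  # from this PR


def push_triggers_ci(ref):
    """Return True if a push to the given full ref would start CI."""
    if ref.startswith("refs/heads/"):
        name, patterns = ref[len("refs/heads/"):], BRANCH_PATTERNS
    elif ref.startswith("refs/tags/"):
        name, patterns = ref[len("refs/tags/"):], TAG_PATTERNS
    else:
        return False
    # Branch and tag filters are independent: matching either one is enough.
    return any(fnmatchcase(name, pattern) for pattern in patterns)
```

For example, `push_triggers_ci("refs/heads/main")` and `push_triggers_ci("refs/tags/cuda-pathfinder-v1.3.4")` are both true, while a push to an unrelated tag such as `refs/tags/nightly` is not.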

Auto-cancellation behavior

CI keeps:

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}-${{ github.event_name }}
  cancel-in-progress: true

This means cancellation only happens for runs with the same workflow + same ref + same event.

New push to main cancels older in-progress main push CI runs.
Tag-triggered runs do not cancel main runs.
New pushes to main do not cancel tag-triggered runs.
Different tags (even on the same commit) do not cancel each other because refs differ.
Re-pushing the exact same tag ref could cancel the earlier run for that same tag ref (rare in normal release flow).
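
The cancellation cases above follow from comparing concurrency group strings. The sketch below models only that string comparison, assuming the workflow is named "CI" and that both branch and tag pushes report the event name `push`; the helper names and the example tag names are hypothetical. It does not model everything GitHub Actions does, so observed behavior should still be checked experimentally.

```python
def concurrency_group(workflow, ref, event_name):
    """Mirror of the group expression: workflow-ref-event_name."""
    return f"{workflow}-{ref}-{event_name}"


def cancels(new_run, old_run):
    """A new run cancels an in-progress older run only if their groups match."""
    return concurrency_group(*new_run) == concurrency_group(*old_run)


# Example runs as (workflow, ref, event_name) tuples.
main_push = ("CI", "refs/heads/main", "push")
tag_push = ("CI", "refs/tags/cuda-pathfinder-v1.3.4", "push")
other_tag_push = ("CI", "refs/tags/cuda-core-v0.5.0", "push")
```

Under this model, `cancels(main_push, main_push)` and `cancels(tag_push, tag_push)` are true, while every mixed pair (`tag_push` vs `main_push`, `tag_push` vs `other_tag_push`) is false because the refs differ.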

Day-to-day impact

Developers merging PRs should see no workflow behavior change in normal development.
Release managers should expect one additional CI run after pushing a release tag.
Release publication should now be more predictable: only tag-versioned artifacts can pass validation.

Recommended release sequence

Merge PR(s) as usual.
Create and push release tag.
Wait for tag-triggered CI run to complete successfully.
Run CI: Release for that tag (leave run-id empty to auto-detect, or provide the known tag run ID).
If release validation fails, fix artifact/tag mismatch before retrying.

Build CI on release tags and reject wheel artifacts that do not match the requested tag version. This prevents setuptools-scm dev/local wheels from being published when tags are created after merge CI.

Co-authored-by: Cursor <cursoragent@cursor.com>

copy-pr-bot · 2026-02-11T22:34:23Z

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

rwgk · 2026-02-11T22:38:04Z

This was the initial prompt, using Cursor (GPT-5.3 Codex Extra High):

With commit da7eb1f5a97aa21d8f78098e13e7c4edad013530 (PR 1411) we started using setuptools-scm.
 
Today we ran into an unforeseen consequence of that change, when running our Release workflow (.github/workflows/release.yml).

To start with the concrete error:

Transparency log entry created with index: 941159205
Uploading distributions to https://test.pypi.org/legacy/
Uploading cuda_pathfinder-1.3.4.dev147+g2928117f6-py3-none-any.whl
WARNING  Error during upload. Retry with the --verbose option for more details.
ERROR    HTTPError: 400 Bad Request from https://test.pypi.org/legacy/
         Bad Request

Looking at this more high-level:

My PR #1596 was merged. The CI triggered automatically (.github/workflows/ci.yml) and built a bunch of wheels, including:

cuda_pathfinder-1.3.4.dev147+g2928117f6-py3-none-any.whl

After the GitHub Actions CI run finished I ran locally:

git tag cuda-pathfinder-v1.3.4
git push upstream cuda-pathfinder-v1.3.4

Then I manually triggered the Release workflow, which then generated the error above.

The quick solution we found was to manually trigger a rerun of the entire CI workflow, which then built the same wheel but with a different version number, based on the newly added tag:

cuda_pathfinder-1.3.4-py3-none-any.whl

Then manually triggering the Release workflow again, it succeeded publishing the wheel to PyPI.

Now the question: How can we make this work "right" in the future?

How have others solved this problem?

Naive thinking: What we want is to insert the tagging step between merging (when clicking the button in the github web UI) and triggering the CI workflow:

Click Merge button → trigger tagging → trigger CI workflow

But how can we define the desired tag to go with clicking the Merge button?

How much do we have to rethink our current overall workflow, to make all parts play nicely with each other?

For the code changes, I only had to say, Yes, Yes, ... a few times.

After Cursor generated a first draft of the PR description, I asked for clarifications, made some manual changes, and fed them back into Cursor, which then generated the PR description as initially posted.

leofang · 2026-02-12T02:10:05Z

.github/workflows/ci.yml

+    tags:
+      # Build release artifacts from tag refs so setuptools-scm resolves exact
+      # release versions instead of .dev+local variants.
+      - "v*"
+      - "cuda-core-v*"
+      - "cuda-pathfinder-v*"


Based on offline discussion we have 3 solutions:

1. Add another workflow which is a clone of "CI" (they can share most of the same code) that runs on tags, not every push. And then update the lookup-run-id script to look for artifacts in that new workflow. Those runs should always be guaranteed to have "releasable" versions.
   - This PR currently implements Solution 1.
   - I do not like this solution because I’ve seen several cases where we needed to walk back from a tag (i.e. re-tag) for various reasons, so I’d really like to separate tagging from releasing. But Solution 1 couples them together.
   - For wheels that were already tested on main, rebuilding & testing again takes unnecessary time before pushing packages out.

2. Make building the wheel an actual dependency of the release workflow. The current approach of "assume it's already been done and look for it" seems pretty brittle.
   - This is what numba-cuda uses today (trigger release workflow -> tag -> rebuild wheels -> push out without tests). But I would like to unify our treatment across repos in the future and walk away from it. The reason is that by heavily relying on GHA we cannot guarantee that the wheels we build at release time are bitwise-identical to what’s built (and tested) in the main branch; the infra could change asynchronously behind our back. We should not rebuild IMHO.

3. Add a tagging workflow that pushes an empty commit to main + a git tag.
   - Automate tagging (instead of manually pushing a tag to the upstream, which can be nerve-racking)
   - Rebuild & test on main
   - Decoupled from release workflow (requires manual triggering)

Generally: the simplest solution that meets all requirements is the best one. Solutions 1 (cloning/duplication) and 2 (convoluted) don't sound like the right direction.

Regarding 3: it sounds like training wheels (no pun intended). Do we need them for tagging? This PR seems to be very close to 3 already.

You might be right. I probably should rest and resume tomorrow...

This solution seems fine, except it means we only get CI on tagged commits on main. The "tags" metadata here acts as a filter, so this means "on branch main, only run this when there is a tag matching the pattern". I think we want to do /both/ every commit to main (which is useful for development and also people do like to have "development snapshots" to download) and the tagged commits again. That's why I suggested in (1) that we need to /clone/ the existing CI workflow to trigger on tags so that we also get tagged releases being built. And when I say "clone" it doesn't have to be literally copy-and-paste -- GHA has various ways to reuse code.

Actually, I stand corrected. I just experimented on my own fork and it does look like pushing a tag causes the same workflow to run on the same commit. The cancelation policy we have in place gets in the way, but we can remove that. So this does seem like a good approach.

I think we can keep the current cancellation policy, based on this Cursor-generated explanation (it gave me something similar yesterday, which it then distilled into the Auto-cancellation behavior section in the PR description):

on.push.branches and on.push.tags are additive (OR), so we still run CI on every push to main, and we also run CI on matching tag pushes.

Concurrency is currently:

group: ${{ github.workflow }}-${{ github.ref }}-${{ github.event_name }}

with cancel-in-progress: true.

Since github.ref differs (refs/heads/main vs refs/tags/<tag>), cancellations are scoped per ref:

new main push cancels older main run

tag run does not cancel main run

new main run does not cancel tag run

different tags do not cancel each other

So we keep the benefit of pruning stale branch CI without hurting tag-triggered release builds.

Experimentally, that doesn't seem to be how it works. In my testing on my own fork, the tag-triggered run canceled the branch-triggered run. But I'm fine with merging this, experimenting, and changing the cancellation config later if necessary.

Canceled runs can be manually re-triggered. I am comfortable with merging this PR.

cpcloud · 2026-02-13T17:32:13Z

Since I am actively not following this conversation, it might be a good test of the solution for me to make the next pathfinder release and verify that it is idiot proof and not full of sharp edges to cut ourselves on.

rwgk · 2026-02-13T17:57:03Z

That'll be awesome. I'll share the really tiny instructions I have in a doc.

rwgk · 2026-02-13T18:05:28Z

When do you guys want to merge?

Sooner, so we know asap everything except tagging works just like before?

Later, shortly before we make a release?

rwgk · 2026-02-13T22:27:19Z

I put up #1627: can we relax or remove the a/b/rc suffix stripping (e.g. to help testing this PR)?

rwgk mentioned this pull request Feb 12, 2026: [REL]: Add a workflow to tag a release #1610 (Draft)

rwgk mentioned this pull request Feb 13, 2026: [REL] Preserve a/b suffixes in tag_regex #1627 (Open)
