
ci: require tag-triggered artifacts for release uploads #1606

Open
rwgk wants to merge 1 commit into NVIDIA:main from rwgk:semantic-release


@rwgk
Collaborator

@rwgk rwgk commented Feb 11, 2026

Summary

  • Build CI on release tag pushes (v*, cuda-core-v*, cuda-pathfinder-v*) so release artifacts are generated from tagged refs.
  • Make release run-id auto-detection require a successful tag-triggered CI run, instead of any successful run for the same commit SHA.
  • Add a release-wheel validator that enforces:
    • wheel version exactly matches the requested git tag version
    • no .dev / +local style versions
    • expected component wheel(s) are present
  • Run the validator before both PyPI/TestPyPI publishing and GitHub Release wheel upload.
  • Add a release checklist reminder to wait for tag CI completion and use that run ID.
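
A minimal sketch of the validator checks listed above, not the script added in this PR. The function names, the tag regex, and the wheel-filename parsing are assumptions for illustration; a/b/rc tags are deliberately not handled here.

```python
import re

# Assumed tag shape: optional "<component>-" prefix, then "v", then a dotted numeric version.
TAG_RE = re.compile(r"^(?:(?P<component>[a-z0-9-]+)-)?v(?P<version>\d+(?:\.\d+)*)$")


def version_from_tag(tag):
    """Return the release version encoded in a tag such as cuda-pathfinder-v1.3.4."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"unrecognized release tag: {tag!r}")
    return match.group("version")


def check_wheels(tag, wheel_filenames, expected_components):
    """Return a list of problems; an empty list means the wheels look releasable."""
    release_version = version_from_tag(tag)
    problems = []
    seen = set()
    for filename in wheel_filenames:
        # Wheel filenames are "<name>-<version>-...-<platform>.whl"; names use underscores.
        fields = filename.split("-")
        if not filename.endswith(".whl") or len(fields) < 5:
            problems.append(f"{filename}: not a wheel filename")
            continue
        name, wheel_version = fields[0], fields[1]
        seen.add(name)
        if ".dev" in wheel_version or "+" in wheel_version:
            problems.append(f"{filename}: dev/local version {wheel_version}")
        elif wheel_version != release_version:
            problems.append(
                f"{filename}: version {wheel_version} != tag version {release_version}"
            )
    for component in expected_components:
        if component.replace("-", "_") not in seen:
            problems.append(f"missing wheel for {component}")
    return problems
```

With the wheel from the failed upload, `check_wheels("cuda-pathfinder-v1.3.4", ["cuda_pathfinder-1.3.4.dev147+g2928117f6-py3-none-any.whl"], ["cuda-pathfinder"])` reports one dev/local problem, while the rebuilt `cuda_pathfinder-1.3.4-py3-none-any.whl` passes.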

Why

After moving to setuptools-scm, merge-triggered CI can produce wheels like X.Y.Z.devN+gHASH when tags are created later.

The release workflow previously selected runs by commit SHA, so it could pull those pre-tag artifacts and fail at upload time.

This PR makes release selection and validation explicitly tag-aware to prevent that class of failure.

Behavior Changes

  • release.yml auto-detection now fails fast (with an actionable message) if no successful CI run exists for the requested tag ref.
  • Release workflows fail before upload when wheel versions do not match tag-derived release versions.
  • Publishing from pre-tag merge artifacts is blocked by default.

How this works in practice

For normal merges, behavior stays the same.

  • Merging to main still triggers the same CI workflow with the same jobs/matrix and artifact upload steps as before.
  • No build commands were changed for merge CI, so merge artifacts are produced the same way.

The one trigger change is that ci.yml now also listens to release tag pushes:

  • v*
  • cuda-core-v*
  • cuda-pathfinder-v*

So the trigger behavior is now:

  • merge push to main => usual CI run (unchanged)
  • release tag push => additional CI run on the tag ref (new)

Example:

git tag cuda-pathfinder-v1.3.4
git push upstream cuda-pathfinder-v1.3.4

The git push command above now triggers CI. Before this PR, it did not.

Operationally, release is now tag-aware:

  • CI: Release auto-detects a run ID from a successful tag-triggered CI run.
  • If no successful tag CI run exists yet, auto-detection fails with a clear message (wait for tag CI and retry).
  • Even when a run-id is manually provided, wheel validation still runs and blocks upload if versions do not match the requested tag.

This prevents publishing pre-tag wheels such as X.Y.Z.devN+gHASH.
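
A rough sketch of the tag-aware run selection, under stated assumptions: `pick_tag_run` is a hypothetical helper, and the run records are plain dicts shaped like GitHub workflow-run objects (`id`, `event`, `head_branch`, `conclusion`, `run_number`). It assumes that for a tag push `head_branch` carries the tag name; the actual lookup script in this PR may work differently.

```python
def pick_tag_run(runs, tag):
    """Return the id of the newest successful push-triggered run for `tag`."""
    candidates = [
        run
        for run in runs
        if run["event"] == "push"
        and run["head_branch"] == tag  # assumption: tag name appears here for tag pushes
        and run["conclusion"] == "success"
    ]
    if not candidates:
        raise LookupError(
            f"no successful tag-triggered CI run for {tag!r}; "
            "wait for tag CI to finish and retry"
        )
    # Prefer the most recent run if the tag CI was re-run.
    return max(candidates, key=lambda run: run["run_number"])["id"]


# Hypothetical data: a merge run and a tag run for the same commit.
runs = [
    {"id": 101, "event": "push", "head_branch": "main",
     "conclusion": "success", "run_number": 7},
    {"id": 102, "event": "push", "head_branch": "cuda-pathfinder-v1.3.4",
     "conclusion": "success", "run_number": 8},
]
selected = pick_tag_run(runs, "cuda-pathfinder-v1.3.4")
```

The merge run on `main` is ignored even though it succeeded, which is the point of selecting by tag ref rather than by commit SHA.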

What Triggers What

  • Push to main: triggers CI (same as today).
  • Push to pull-request/<N> branch: triggers CI (same as today).
  • Push matching release tag (v*, cuda-core-v*, cuda-pathfinder-v*): triggers CI (new).
  • Manual CI: Release dispatch: pulls wheels from the selected/detected run ID, validates wheel versions against git-tag, then uploads/publishes.
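
The push triggers above can be approximated as follows. This is an illustration only: `push_triggers_ci` is a hypothetical helper, `fnmatchcase` is a stand-in for GitHub's own glob matching (which, unlike `fnmatch`, does not let `*` match `/`), and treating `<N>` as digits-only is an assumption.

```python
from fnmatch import fnmatchcase

RELEASE_TAG_PATTERNS = ("v*", "cuda-core-v*", "cuda-pathfinder-v*")


def push_triggers_ci(ref):
    """Return True if a push to `ref` would start CI under the rules listed above."""
    if ref == "refs/heads/main":
        return True
    if ref.startswith("refs/heads/pull-request/"):
        # Assumption: <N> is a PR number, so only digits are accepted.
        return ref[len("refs/heads/pull-request/"):].isdigit()
    if ref.startswith("refs/tags/"):
        tag = ref[len("refs/tags/"):]
        # Branch and tag filters are additive: a matching tag push triggers CI too.
        return any(fnmatchcase(tag, pattern) for pattern in RELEASE_TAG_PATTERNS)
    return False
```

For example, `refs/tags/cuda-pathfinder-v1.3.4` matches, while an unrelated tag such as `refs/tags/nightly` does not.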

Auto-cancellation behavior

CI keeps:

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}-${{ github.event_name }}
  cancel-in-progress: true

This means cancellation only happens for runs with the same workflow + same ref + same event.

  • New push to main cancels older in-progress main push CI runs.
  • Tag-triggered runs do not cancel main runs.
  • New pushes to main do not cancel tag-triggered runs.
  • Different tags (even on the same commit) do not cancel each other because refs differ.
  • Re-pushing the exact same tag ref could cancel the earlier run for that same tag ref (rare in normal release flow).
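
The grouping logic behind these bullets can be modeled as a string comparison. This sketch only mirrors the `group:` expression; it does not verify how GitHub Actions evaluates it at run time, and `would_cancel` is a hypothetical helper.

```python
def concurrency_group(workflow, ref, event_name):
    """Mirror of the group expression: workflow, ref, and event name joined by '-'."""
    return f"{workflow}-{ref}-{event_name}"


def would_cancel(new_run, old_run):
    """With cancel-in-progress, a new run replaces an older one only in the same group."""
    return concurrency_group(*new_run) == concurrency_group(*old_run)


main_push = ("CI", "refs/heads/main", "push")
tag_push = ("CI", "refs/tags/cuda-pathfinder-v1.3.4", "push")
other_tag_push = ("CI", "refs/tags/cuda-core-v0.5.0", "push")
```

Under this model `would_cancel(main_push, main_push)` is true, and any pairing of `main_push`, `tag_push`, and `other_tag_push` with each other is false because the refs differ.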

Day-to-day impact

  • Developers merging PRs should see no workflow behavior change in normal development.
  • Release managers should expect one additional CI run after pushing a release tag.
  • Release publication should now be more predictable: only tag-versioned artifacts can pass validation.

Recommended release sequence

  1. Merge PR(s) as usual.
  2. Create and push release tag.
  3. Wait for tag-triggered CI run to complete successfully.
  4. Run CI: Release for that tag (leave run-id empty to auto-detect, or provide the known tag run ID).
  5. If release validation fails, fix artifact/tag mismatch before retrying.

Build CI on release tags and reject wheel artifacts that do not match the requested tag version. This prevents setuptools-scm dev/local wheels from being published when tags are created after merge CI.

Co-authored-by: Cursor <cursoragent@cursor.com>
@copy-pr-bot
Contributor

copy-pr-bot bot commented Feb 11, 2026

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@rwgk
Collaborator Author

rwgk commented Feb 11, 2026

This was the initial prompt, using Cursor (GPT-5.3 Codex Extra High):

With commit da7eb1f5a97aa21d8f78098e13e7c4edad013530 (PR 1411) we started using setuptools-scm.
 
Today we ran into an unforeseen consequence of that change, when running our Release workflow (.github/workflows/release.yml).

To start with the concrete error:

Transparency log entry created with index: 941159205
Uploading distributions to https://test.pypi.org/legacy/
Uploading cuda_pathfinder-1.3.4.dev147+g2928117f6-py3-none-any.whl
WARNING  Error during upload. Retry with the --verbose option for more details.
ERROR    HTTPError: 400 Bad Request from https://test.pypi.org/legacy/
         Bad Request

Looking at this more high-level:

My PR #1596 was merged. The CI triggered automatically (.github/workflows/ci.yml) and built a bunch of wheels, including:

cuda_pathfinder-1.3.4.dev147+g2928117f6-py3-none-any.whl

After the GitHub Actions CI run finished I ran locally:

git tag cuda-pathfinder-v1.3.4
git push upstream cuda-pathfinder-v1.3.4

Then I manually triggered the Release workflow, which then generated the error above.

The quick solution we found was to manually trigger a rerun of the entire CI workflow, which then built the same wheel but with a different version number, based on the newly added tag:

cuda_pathfinder-1.3.4-py3-none-any.whl

Then manually triggering the Release workflow again, it succeeded publishing the wheel to PyPI.

Now the question: How can we make this work "right" in the future?

How have others solved this problem?

Naive thinking: What we want is to insert the tagging step between merging (clicking the button in the GitHub web UI) and triggering the CI workflow:

Click Merge button → trigger tagging → trigger CI workflow

But how can we define the desired tag to go with clicking the Merge button?

How much do we have to rethink our current overall workflow, to make all parts play nicely with each other?

For the code changes, I only had to say, Yes, Yes, ... a few times.

After Cursor generated the initial PR description, I asked for clarifications, made some manual changes, fed them back into Cursor, which then generated the initial PR description as posted.

Comment on lines +18 to +23
tags:
  # Build release artifacts from tag refs so setuptools-scm resolves exact
  # release versions instead of .dev+local variants.
  - "v*"
  - "cuda-core-v*"
  - "cuda-pathfinder-v*"
Member

@leofang leofang Feb 12, 2026

Based on offline discussion we have 3 solutions

  1. add another workflow which is a clone of "CI" (they can share most of the same code) that runs on tags, not every push.  And then update the lookup-run-id script to look for artifacts in that new workflow.  Those runs should always be guaranteed to have "releasable" versions.
    • this PR currently implements Solution 1.
    • I do not like this solution because I’ve seen several cases where we needed to walk back from a tag (i.e. re-tag) for various reasons, so I’d really like to separate tagging from releasing. But Solution 1 couples them together.
    • For wheels that were already tested on main, rebuilding & testing again takes unnecessary time before pushing packages out
  2. Make building the wheel an actual dependency of the release workflow.  The current approach of "assume it's already been done and look for it" seems pretty brittle.
    • This is what numba-cuda uses today (trigger release workflow -> tag -> rebuild wheels -> push out without tests). But I would like to unify our treatment across repos in the future and walk away from it. The reason is that by heavily relying on GHA we cannot guarantee that the wheels we build at release time are bitwise-identical to what’s built (and tested) in the main branch; the infra could change asynchronously behind our back. We should not rebuild IMHO.
  3. Add a tagging workflow that pushes an empty commit to main + a git tag.
    • Automate tagging (instead of manually pushing a tag to the upstream, which can be nerve-racking)
    • rebuild & test on main
    • decoupled from release workflow (require manual triggering)

Collaborator Author

Generally: the simplest solution that meets all requirements is the best one. Solutions 1 (cloning/duplication) and 2 (convoluted) don't sound like the right direction.

Regarding 3: sounds like training wheels (no pun intended)? Do we need them for tagging? This PR seems to be very close to 3 already.

Member

You might be right. I probably should rest and resume tomorrow...

Contributor

This solution seems fine, except it means we only get CI on tagged commits on main. The "tags" metadata here acts as a filter, so this means "on branch main, only run this when there is a tag matching the pattern". I think we want to run on /both/ every commit to main (which is useful for development, and people also like to have "development snapshots" to download) and the tagged commits again. That's why I suggested in (1) that we need to /clone/ the existing CI workflow to trigger on tags so that we also get tagged releases being built. And when I say "clone" it doesn't have to be literally copy-and-paste: GHA has various ways to reuse code.

Contributor

Actually, I stand corrected. I just experimented on my own fork and it does look like pushing a tag causes the same workflow to run on the same commit. The cancelation policy we have in place gets in the way, but we can remove that. So this does seem like a good approach.

Collaborator Author

I think we can keep the current cancellation policy, based on this Cursor-generated explanation (it gave me something similar yesterday, which it then distilled into the Auto-cancellation behavior section in the PR description):

on.push.branches and on.push.tags are additive (OR), so we still run CI on every push to main, and we also run CI on matching tag pushes.

Concurrency is currently:

group: ${{ github.workflow }}-${{ github.ref }}-${{ github.event_name }}

with cancel-in-progress: true.

Since github.ref differs (refs/heads/main vs refs/tags/<tag>), cancellations are scoped per ref:

  • new main push cancels older main run
  • tag run does not cancel main run
  • new main run does not cancel tag run
  • different tags do not cancel each other

So we keep the benefit of pruning stale branch CI without hurting tag-triggered release builds.

Contributor

Experimentally, that doesn't seem to be how it works. In my testing on my own fork, the tag-triggered run canceled the branch-triggered run. But I'm fine with merging this and experimenting and changing the cancelation config later if necessary.

Member

Canceled runs can be manually re-triggered. I am comfortable with merging this PR.

@cpcloud
Contributor

cpcloud commented Feb 13, 2026

Since I am actively not following this conversation, it might be a good test of the solution for me to make the next pathfinder release and verify that it is idiot proof and not full of sharp edges to cut ourselves on.

@rwgk
Collaborator Author

rwgk commented Feb 13, 2026

> Since I am actively not following this conversation, it might be a good test of the solution for me to make the next pathfinder release and verify that it is idiot proof and not full of sharp edges to cut ourselves on.

That'll be awesome. I'll share the really tiny instructions I have in a doc.

@rwgk
Collaborator Author

rwgk commented Feb 13, 2026

When do you guys want to merge?

Sooner, so we know asap everything except tagging works just like before?

Later, shortly before we make a release?

@rwgk
Collaborator Author

rwgk commented Feb 13, 2026

I put up #1627: can we relax or remove the a/b/rc suffix stripping (e.g. to help testing this PR)?


4 participants