ci: require tag-triggered artifacts for release uploads#1606
ci: require tag-triggered artifacts for release uploads#1606rwgk wants to merge 1 commit intoNVIDIA:mainfrom
Conversation
Build CI on release tags and reject wheel artifacts that do not match the requested tag version. This prevents setuptools-scm dev/local wheels from being published when tags are created after merge CI. Co-authored-by: Cursor <cursoragent@cursor.com>
|
Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
This was the initial prompt, using Cursor(GPT-5.3 Codex Extra High): For the code changes, I only had to say, Yes, Yes, ... a few times. After Cursor generated the initial PR description, I asked for clarifications, made some manual changes, fed them back into Cursor, which then generated the initial PR description as posted. |
| tags: | ||
| # Build release artifacts from tag refs so setuptools-scm resolves exact | ||
| # release versions instead of .dev+local variants. | ||
| - "v*" | ||
| - "cuda-core-v*" | ||
| - "cuda-pathfinder-v*" |
There was a problem hiding this comment.
Based on offline discussion we have 3 solutions
- add another workflow which is a clone of "CI" (they can share most of the same code) that runs on tags, not every push. And then update the lookup-run-id script to look for artifacts in that new workflow. Those runs should always be guaranteed to have "releasable" versions.
- this PR currently implements Solution 1.
- I do not like this solution because I’ve seen several cases where we needed to walk back from a tag (i.e. re-tag) for various reasons, so I’d really like to separate tagging from releasing. But Solution 1 couples them together.
- For wheels that were already tested on main, rebuilding & testing again takes unnecessary time before pushing packages out
- Make building the wheel an actual dependency of the release workflow. The current approach of "assume it's already been done and look for it" seems pretty brittle.
- This is what numba-cuda uses today (trigger release workflow -> tag -> rebuild wheels -> push out without tests). But I would like to unify our treatment across repos in the future and walk away from it. The reason is that by heavily relying on GHA we cannot guarantee that the wheels we build at release time are bitwise-identical to what’s built (and tested) in the main branch; the infra could change asynchronously behind our back. We should not rebuild IMHO.
- Add a tagging workflow that pushes an empty commit to main + a git tag.
- Automate tagging (instead of manually pushing a tag to the upstream, which can be nerve wrecking)
- rebuild & test on main
- decoupled from release workflow (require manual triggering)
There was a problem hiding this comment.
Generally: the simplest solution that meets all requirements is the best one. 1 (cloning/duplication) and 2 (convoluting) don't sound like that's the direction.
Regrading 3, sounds like training wheels (no pun intended)? Do we need them, for tagging? This PR seems to be very close to 3 already?
There was a problem hiding this comment.
You might be right. I probably should rest and resume tomorrow...
There was a problem hiding this comment.
This solution seems fine, except it means we only get CI on tagged commits on main. The "tags" metadata here acts as a filter, so this means "on branch main, only run this when there is a tag matching the pattern". I think we want to do /both/ every commit to main (which is useful for development and also people do like to have "development snapshots" to download) and the tagged commits again. That's why I suggested in (1) that we need to /clone/ the existing CI workflow to trigger on tags so that we also get tagged releases being built. And when I say "clone" it doesn't have to be literally copy-and-paste -- GHA has various ways to reuse code.
There was a problem hiding this comment.
Actually, I stand corrected. I just experimented on my own fork and it does look like pushing a tag causes the same workflow to run on the same commit. The cancelation policy we have in place gets in the way, but we can remove that. So this does seem like a good approach.
There was a problem hiding this comment.
I think we can keep the current cancellation policy, based on this Cursor-generated explanation (it gave me something similar yesterday, which it then distilled into the Auto-cancellation behavior section in the PR description):
on.push.branches and on.push.tags are additive (OR), so we still run CI on every push to main, and we also run CI on matching tag pushes.
Concurrency is currently:
group: ${{ github.workflow }}-${{ github.ref }}-${{ github.event_name }}
with cancel-in-progress: true.
Since github.ref differs (refs/heads/main vs refs/tags/<tag>), cancellations are scoped per ref:
- new
mainpush cancels oldermainrun - tag run does not cancel
mainrun - new
mainrun does not cancel tag run - different tags do not cancel each other
So we keep the benefit of pruning stale branch CI without hurting tag-triggered release builds.
There was a problem hiding this comment.
Experimentally, that doesn't seem to be how it works. On my testing on my own fork, the tag-triggered run canceled the branch-triggered run. But I'm fine with merging this and experimenting and changing the cancelation config later if necessary.
There was a problem hiding this comment.
Canceled runs can be manually re-triggered. I am comfortable with merging this PR.
|
Since I am actively not following this conversation, it might be a good test of the solution for me to make the next pathfinder release and verify that it is idiot proof and not full of sharp edges to cut ourselves on. |
That'll be awesome. I'll share the really tiny instructions I have in a doc. |
|
When do you guys want to merge? Sooner, so we know asap everything except tagging works just like before? Later, shortly before we make a release? |
|
I put up #1627: can we relax or remove the a/b/rc suffix stripping (e.g. to help testing this PR)? |
Summary
CIon release tag pushes (v*,cuda-core-v*,cuda-pathfinder-v*) so release artifacts are generated from tagged refs..dev/+localstyle versionsWhy
After moving to
setuptools-scm, merge-triggered CI can produce wheels likeX.Y.Z.devN+gHASHwhen tags are created later.The release workflow previously selected runs by commit SHA, so it could pull those pre-tag artifacts and fail at upload time.
This PR makes release selection and validation explicitly tag-aware to prevent that class of failure.
Behavior Changes
release.ymlauto-detection now fails fast (with an actionable message) if no successful CI run exists for the requested tag ref.How this works in practice
For normal merges, behavior stays the same.
mainstill triggers the sameCIworkflow with the same jobs/matrix and artifact upload steps as before.The one trigger change is that
ci.ymlnow also listens to release tag pushes:v*cuda-core-v*cuda-pathfinder-v*So the trigger behavior is now:
main=> usual CI run (unchanged)Example:
The
git pushcommand above now triggersCI. Before this PR, it did not.Operationally, release is now tag-aware:
CI: Releaseauto-detects a run ID from a successful tag-triggered CI run.run-idis manually provided, wheel validation still runs and blocks upload if versions do not match the requested tag.This prevents publishing pre-tag wheels such as
X.Y.Z.devN+gHASH.What Triggers What
main: triggersCI(same as today).pull-request/<N>branch: triggersCI(same as today).v*,cuda-core-v*,cuda-pathfinder-v*): triggersCI(new).CI: Releasedispatch: pulls wheels from the selected/detected run ID, validates wheel versions againstgit-tag, then uploads/publishes.Auto-cancellation behavior
CIkeeps:This means cancellation only happens for runs with the same workflow + same ref + same event.
maincancels older in-progressmainpush CI runs.mainruns.maindo not cancel tag-triggered runs.Day-to-day impact
Recommended release sequence
CIrun to complete successfully.CI: Releasefor that tag (leaverun-idempty to auto-detect, or provide the known tag run ID).