Cannot run jobs in parallel in a scaleset with a long name in kubernetes mode #4368

@oradwell

Controller Version

0.12.1

Deployment Method

Helm

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

  1. Create a Runner ScaleSet with containerMode: kubernetes and a scale set name between 40 and 45 characters long (e.g. arc-prd-my-awesome-continuous-integration-1)
  2. Create a matrix job with at least 2 jobs
  3. One of the jobs will fail with Error: failed to create job pod: pods "arc-prd-my-awesome-continuous-integration-1-q5nb5-runn-workflow" already exists

Describe the bug

When the scale set name has 40 or more characters (still within the current 45-character limit) in kubernetes mode, the workflow pod creation code truncates the unique EphemeralRunner suffix out of the workflow pod name. As a result, all EphemeralRunners created in the same EphemeralRunnerSet end up with the same workflow pod name, and their workflow pods collide.

Take the EphemeralRunnerSet arc-prd-my-awesome-continuous-integration-1-q5nb5 and suppose 2 jobs are requested.

When job 1 comes in, a new runner (EphemeralRunner) pod arc-prd-my-awesome-continuous-integration-1-q5nb5-runner-f45dst is created for it, which then creates the workflow pod arc-prd-my-awesome-continuous-integration-1-q5nb5-runn-workflow.

When job 2 comes in, a new runner (EphemeralRunner) pod arc-prd-my-awesome-continuous-integration-1-q5nb5-runner-g8fb7 is created for it, which then tries to create the workflow pod arc-prd-my-awesome-continuous-integration-1-q5nb5-runn-workflow.

Job 2 fails in the GitHub UI with the error:

Error: failed to create job pod:
pods "arc-prd-my-awesome-continuous-integration-1-q5nb5-runn-workflow" already exists

As you can see, the workflow pods for job 1 and job 2 get the same name, which causes the naming collision and the pod "already exists" error.
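The collision can be reproduced outside the cluster with a minimal Python sketch. This is only a model of the behavior described above, not the hook's actual code: `workflow_pod_name` is a hypothetical helper, and the 63-character cap is an assumption inferred from the observed workflow pod name, which is exactly 63 characters long.

```python
# Assumption: the workflow pod name is the runner pod name cut to fit within
# 63 characters once "-workflow" is appended (inferred from the observed name).
MAX_POD_NAME_LENGTH = 63
WORKFLOW_SUFFIX = "-workflow"


def workflow_pod_name(runner_pod_name: str) -> str:
    """Hypothetical model of the naming: truncate first, then append the suffix."""
    keep = MAX_POD_NAME_LENGTH - len(WORKFLOW_SUFFIX)  # 54 characters
    return runner_pod_name[:keep] + WORKFLOW_SUFFIX


scale_set = "arc-prd-my-awesome-continuous-integration-1"  # 43 characters
runner_1 = f"{scale_set}-q5nb5-runner-f45dst"
runner_2 = f"{scale_set}-q5nb5-runner-g8fb7"

name_1 = workflow_pod_name(runner_1)
name_2 = workflow_pod_name(runner_2)

print(name_1)
print(name_2)
print("collision:", name_1 == name_2)
```

Under these assumptions the unique runner suffix starts at offset len(scale_set) + 14, so it is cut off entirely once the scale set name reaches 40 characters, which matches the threshold reported above.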

Describe the expected behavior

The workflow pod names for job 1 and job 2 should be different.
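One possible way to get there, sketched in Python purely as an illustration: shorten the middle of the runner pod name instead of its end, so the unique runner suffix always survives. `unique_workflow_pod_name` is a hypothetical helper, and the 63-character cap is the same assumption inferred from the observed pod name; the maintainers may well prefer a different scheme.

```python
# Assumption: names must fit in 63 characters including the "-workflow" suffix.
MAX_POD_NAME_LENGTH = 63
WORKFLOW_SUFFIX = "-workflow"


def unique_workflow_pod_name(runner_pod_name: str) -> str:
    """Hypothetical alternative: keep the last name segment, trim the prefix."""
    # Split off the final "-xxxxx" segment, which is unique per EphemeralRunner.
    prefix, _, unique = runner_pod_name.rpartition("-")
    tail = f"-{unique}{WORKFLOW_SUFFIX}"
    keep = MAX_POD_NAME_LENGTH - len(tail)
    # Trim the prefix to the remaining budget and avoid a doubled hyphen.
    return prefix[:keep].rstrip("-") + tail


scale_set = "arc-prd-my-awesome-continuous-integration-1"
fixed_1 = unique_workflow_pod_name(f"{scale_set}-q5nb5-runner-f45dst")
fixed_2 = unique_workflow_pod_name(f"{scale_set}-q5nb5-runner-g8fb7")

print(fixed_1)
print(fixed_2)
print("collision:", fixed_1 == fixed_2)
```

This trades away part of the EphemeralRunnerSet suffix in the name, so it only illustrates the idea of preserving the per-runner part.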

Additional Context

n/a

Controller Logs

ERROR	EphemeralRunner	Failed to create pod resource for ephemeral runner.	{"version": "0.12.1", "ephemeralrunner": {"name":"arc-prd-my-awesome-continuous-integration-1-q5nb5-runner-g8fb7","namespace":"arc-runner"}, "error": "pods \"arc-prd-my-awesome-continuous-integration-1-q5nb5-runn-workflow\" already exists"}

Runner Pod Logs

n/a

Metadata

Labels: bug, gha-runner-scale-set, needs triage
