Conversation
Add support for initializing and updating git submodules during repository cloning.
Added GitCloneSubmodules option to support recursive submodule initialization.
Resolved issues with git submodule handling for better cloning and URL resolution.
Resolved issues with git submodule handling to ensure robust cloning and URL resolution without relying on git binary calls.
*test.go file addendum
Adding tests for Submodules and two missing
|
I have read the CLA Document and I hereby sign the CLA |
johnstcn
left a comment
There was a problem hiding this comment.
Thanks for working on this @bjornrobertsson !
I have some comments below, and I think it would also be worthwhile adding an integration test (see: integration/integration_test.go and testutil/gittest/gittest.go).
| { | ||
| name: "absolute", | ||
| parentURL: "https://example.com/org/main.git", | ||
| subURL: "https://github.com/other/repo.git", | ||
| expect: "https://github.com/other/repo.git", | ||
| }, | ||
| { | ||
| name: "relativeSibling", | ||
| parentURL: "https://example.com/org/main.git", | ||
| subURL: "../deps/lib.git", | ||
| expect: "https://example.com/org/deps/lib.git", | ||
| }, | ||
| { | ||
| name: "relativeChild", | ||
| parentURL: "https://example.com/org/main.git", | ||
| subURL: "./extras/tool.git", | ||
| expect: "https://example.com/org/main.git/extras/tool.git", | ||
| }, | ||
| { | ||
| name: "badParent", | ||
| parentURL: "://bad", | ||
| subURL: "./child", | ||
| expectErr: "parse parent URL", | ||
| }, |
There was a problem hiding this comment.
Please also test with the URL schemes from TestSetupRepoAuth:
- SSH with scheme (
ssh://host.tld/repo) - SSH without scheme (
git@host.tld:repo/path) - Git scheme (
git://git@host.tld:repo/path) - Absolute local path (
/path/to/repo) - Relative paths without
./..(path/to/repo)
There was a problem hiding this comment.
Testing SSH and Git revealed negative user experience if ssh:// and git_username were passed. Lead to: #486
Otherwise testing are fine.
The location when extracted leads to the need to work around relative paths, a lot of work and testing led to the current methods as working
git/git.go
Outdated
| } | ||
|
|
||
| // initSubmodules recursively initializes and updates all submodules in the repository. | ||
| func initSubmodules(ctx context.Context, logf func(string, ...any), repo *git.Repository, opts CloneRepoOptions) error { |
There was a problem hiding this comment.
I'm always a bit wary about recursive functions. I'd feel better about this if we had a maximum depth of which to recurse. I'd wager that most repos won't need more than 2 iterations. If ENVBUILDER_GIT_CLONE_SUBMODULES were changed to accept a positive integer instead, this could function as the number of times to recurse.
There was a problem hiding this comment.
Changed the acceptance to 0-10, true, false - and added a Depth testing
| parentSrv = httptest.NewServer(opts.AuthMW(NewServer(fs))) | ||
| } | ||
| return parentSrv, submoduleSrv | ||
| } |
There was a problem hiding this comment.
I'm not 100% sure this needs to be its own function. I think we could probably in-line this into the relevant test for now.
git/git.go
Outdated
| if opts.Submodules { | ||
| err = initSubmodules(ctx, logf, repo, opts) | ||
| if err != nil { | ||
| return true, fmt.Errorf("init submodules: %w", err) | ||
| } | ||
| } |
There was a problem hiding this comment.
I just saw this -- apaprently go-git supports this natively?
https://pkg.go.dev/github.com/go-git/go-git#CloneOptions
https://pkg.go.dev/github.com/go-git/go-git#SubmoduleRescursivity
There was a problem hiding this comment.
This seems like the preferable option to me vs bespoke implementation. 👍🏻
There was a problem hiding this comment.
That function breaks because of the relative path, and reason for handling this manually (it was confounding and took some time).
There was a problem hiding this comment.
Oh, and v6 renames it, fixes the typo but we're not at v6 yet.
There was a problem hiding this comment.
I take it you're referencing this commit? We could fork and backport this fix to v5 until v6 comes out. WDYT?
There was a problem hiding this comment.
There is nothing concrete that the relative path issues are resolved, more they're not. Checks indicate that there are known bugs with:
- HTTPS URLs losing a slash
- Multiple relative paths (../../submodule.git) not working
git/git.go
Outdated
| if opts.Submodules { | ||
| err = initSubmodules(ctx, logf, repo, opts) | ||
| if err != nil { | ||
| return true, fmt.Errorf("init submodules: %w", err) | ||
| } | ||
| } |
There was a problem hiding this comment.
This seems like the preferable option to me vs bespoke implementation. 👍🏻
docs/env-variables.md
Outdated
| | `--git-clone-depth` | `ENVBUILDER_GIT_CLONE_DEPTH` | | The depth to use when cloning the Git repository. | | ||
| | `--git-clone-single-branch` | `ENVBUILDER_GIT_CLONE_SINGLE_BRANCH` | | Clone only a single branch of the Git repository. | | ||
| | `--git-clone-thinpack` | `ENVBUILDER_GIT_CLONE_THINPACK` | `true` | Git clone with thin pack compatibility enabled, ensuring that even when thin pack compatibility is activated,it will not be turned on for the domain dev.zaure.com. | | ||
| | `--git-clone-submodules` | `ENVBUILDER_GIT_CLONE_SUBMODULES` | | Recursively clone Git submodules after cloning the repository. | |
There was a problem hiding this comment.
We may also want to add --git-clone-submodules-depth to limit recursion depth.
There was a problem hiding this comment.
WDYT about --git-clone-submodules-depth=0 as "don't clone submodules (default)"?
There was a problem hiding this comment.
I just add conditional string eval, default, and 0 is 'false' and a number is Depth?
Transitional, it might be nicer if 'true' opens up a sub-menu for Depth in the template, poc:
validation {
regex = "^(true|false|[0-9]|10)$"
error = "Must be 'true', 'false', or a number from 0-10."
}
# and #
"ENVBUILDER_GIT_CLONE_SUBMODULES" : tostring(local.submodule_depth),
- Added SubmoduleDepth custom type that accepts 'true' (depth 10), 'false' (0), or positive integer - initSubmodules now tracks current depth and stops at max depth - Default is 0 (disabled) - submodules are not cloned unless explicitly enabled
git/git.go
Outdated
| } | ||
| } | ||
|
|
||
| // resolveSubmoduleURL resolves a potentially relative submodule URL against the parent repository URL |
There was a problem hiding this comment.
| // resolveSubmoduleURL resolves a potentially relative submodule URL against the parent repository URL |
git/git.go
Outdated
| // Use SingleBranch=false to fetch all branches so we can find the commit | ||
| subRepo, err := git.CloneContext(ctx, gitStorage, subFS, &git.CloneOptions{ | ||
| URL: resolvedURL, | ||
| Auth: opts.RepoAuth, |
There was a problem hiding this comment.
Security: This allows the parent repo auth to be exfiltrated by a malicious Git host. We should only forward auth if the submodule shares the same host as the parent repo. Or we could consider adding an auth allow list if users need more control.
There was a problem hiding this comment.
- Added extractHost(url) - extracts hostname from both standard URLs and SCP-like URLs
- Added SameHost(url1, url2) - compares hosts case-insensitively, ignoring ports
- Auth is only forwarded if SameHost(parentURL, submoduleURL) returns true
- Warning logged when auth is withheld: ⚠ Not forwarding auth to submodule (different host: evil.com)
git/git.go
Outdated
| if origin, hasOrigin := cfg.Remotes["origin"]; hasOrigin && len(origin.URLs) > 0 { | ||
| parentURL = origin.URLs[0] | ||
| } | ||
| logf("Parent repository URL: %s", parentURL) |
There was a problem hiding this comment.
Should we consider logging the redacted URL here? This URL could contain credentials.
There was a problem hiding this comment.
- Added with as many URI's as possible
testutil/gittest/gittest.go
Outdated
| url = %s | ||
| `, submoduleSrv.URL) | ||
| commits = append(commits, Commit(t, ".gitmodules", gitmodulesContent, "add submodule")) | ||
| _ = NewRepo(t, fs, commits...) |
There was a problem hiding this comment.
Does this actually create a valid submodule/parent module setup? It doesn't look like the test that uses this actually verifies the submodule is properly cloned? Ideally there would be a committed file that gets verified too.
We should probably expand this with regards to the security issue as well, and make sure we don't exfiltrate parent repo auth unless allowed?
| } | ||
|
|
||
| // Parse the parent URL | ||
| parentParsed, err := url.Parse(parentURL) |
There was a problem hiding this comment.
If the parent URL is SCP-like (git@github.com:org/repo.git), and the submodule is relative, we don't hit the guard above and url.Parse can't parse the SCP-like URL.
Suggestion: Use parentEP := transport.NewEndpoint(parentURL) instead? You should be able to just update the parentEP.Path and return a string instead of the url.URL construction below.
There was a problem hiding this comment.
You're right that url.Parse doesn't handle git@github.com:org/repo.git correctly - it ends up treating the whole thing as a path.
However, I'd prefer to keep the current implementation for now for a few reasons:
This function was built specifically to work around go-git's broken relative URL resolution. The native Submodules().Update() has known issues:
HTTPS URLs lose a slash when using filepath.Dir() internally
Multiple relative paths (../../submodule.git) don't resolve correctly
These bugs exist in v5 and there's no clear indication v6 fixes them
Most repos using SCP-style URLs (GitHub, GitLab, etc.) have submodules configured with absolute URLs or HTTPS URLs. And often they're even modified to HTTPS. The common case we're solving is HTTPS parent + relative submodule.
Using transport.NewEndpoint() risks reintroducing the bugs we fixed since it's the same go-git code path that has the known issues.
If we do see issues with SCP-like parent URLs in the wild, we can add detection similar to what we did in RedactURL and handle that case separately. But I'd rather not expand scope here without a concrete use case driving it.
There was a problem hiding this comment.
In that case, we should
- Specifically check for SCP-like submodule URL syntax (as opposed to "just" an invalid URL),
- Drop an ERROR / WARN log mentioning that these are not supported right now with a link to a follow-up issue,
- Document this limitation.
There was a problem hiding this comment.
@bjornrobertsson I had an LLM try out this suggestion, and from what I can tell it works as expected?
See change + test cases in this patch:
diff --git git/git.go git/git.go
index 65edbcf..8f048aa 100644
--- git/git.go
+++ git/git.go
@@ -509,32 +509,23 @@ func RedactURL(u string) string {
}
// ResolveSubmoduleURL resolves a potentially relative submodule URL against a parent repository URL.
-//
-// Limitation: SCP-like URLs (e.g., git@github.com:org/repo.git) are not supported as parent URLs
-// when the submodule uses a relative path. This is a known limitation.
-// See: https://github.com/coder/envbuilder/issues/492
func ResolveSubmoduleURL(parentURL, submoduleURL string) (string, error) {
// If the submodule URL is absolute (contains ://) or doesn't start with ./ or ../, return it as-is
if strings.Contains(submoduleURL, "://") || (!strings.HasPrefix(submoduleURL, "../") && !strings.HasPrefix(submoduleURL, "./")) {
return submoduleURL, nil
}
- // Check if parent URL is SCP-like (e.g., git@github.com:org/repo.git)
- // These cannot be properly parsed by net/url and relative submodule resolution is not supported.
- if scpLikeURLRegex.MatchString(parentURL) {
- return "", fmt.Errorf("relative submodule URL %q cannot be resolved: parent URL %q uses SCP-like syntax which is not supported for relative submodule resolution (see https://github.com/coder/envbuilder/issues/492)", submoduleURL, RedactURL(parentURL))
- }
-
- // Parse the parent URL
- parentParsed, err := url.Parse(parentURL)
+ // Parse the parent URL using go-git's endpoint parser, which handles
+ // SCP-like URLs (git@host:path) in addition to standard URLs.
+ parentEP, err := transport.NewEndpoint(parentURL)
if err != nil {
return "", fmt.Errorf("parse parent URL: %w", err)
}
- // For relative URLs, we need to resolve them against the parent's path
- // The parent path represents a repository (like a file in filesystem terms)
- // So ../something means "sibling repository"
- parentPath := strings.TrimSuffix(parentParsed.Path, "/")
+ // For relative URLs, we need to resolve them against the parent's path.
+ // The parent path represents a repository (like a file in filesystem terms),
+ // so ../something means "sibling repository".
+ parentPath := strings.TrimSuffix(parentEP.Path, "/")
// Split the submodule URL into components
// and manually walk up the directory tree for each ../
@@ -554,18 +545,9 @@ func ResolveSubmoduleURL(parentURL, submoduleURL string) (string, error) {
}
}
- // Clean the final path
- resolvedPath := path.Clean(currentPath)
-
- // Construct the absolute URL
- resolvedParsed := &url.URL{
- Scheme: parentParsed.Scheme,
- User: parentParsed.User,
- Host: parentParsed.Host,
- Path: resolvedPath,
- }
-
- return resolvedParsed.String(), nil
+ // Reconstruct the URL with the resolved path.
+ parentEP.Path = path.Clean(currentPath)
+ return parentEP.String(), nil
}
// initSubmodules recursively initializes and updates all submodules in the repository.
diff --git git/git_test.go git/git_test.go
index c3a0b69..a324a2a 100644
--- git/git_test.go
+++ git/git_test.go
@@ -807,16 +807,28 @@ func TestResolveSubmoduleURL(t *testing.T) {
expect: "https://example.com/org/main.git/extras/tool.git",
},
{
- name: "badParent",
- parentURL: "://bad",
- subURL: "./child",
- expectErr: "parse parent URL",
+ name: "scpRelativeSibling",
+ parentURL: "git@github.com:org/main.git",
+ subURL: "../deps/lib.git",
+ expect: "ssh://git@github.com/org/deps/lib.git",
},
{
- name: "scpParentWithRelativeSubmodule",
+ name: "scpRelativeChild",
parentURL: "git@github.com:org/main.git",
- subURL: "../other/submodule.git",
- expectErr: "SCP-like syntax which is not supported",
+ subURL: "./extras/tool.git",
+ expect: "ssh://git@github.com/org/main.git/extras/tool.git",
+ },
+ {
+ name: "scpMultiLevelUp",
+ parentURL: "git@github.com:a/b/c/repo.git",
+ subURL: "../../other/lib.git",
+ expect: "ssh://git@github.com/a/b/other/lib.git",
+ },
+ {
+ name: "httpsMultiLevelUp",
+ parentURL: "https://example.com/a/b/c/repo.git",
+ subURL: "../../other/lib.git",
+ expect: "https://example.com/a/b/other/lib.git",
},
{
name: "scpParentWithAbsoluteSubmodule",There was a problem hiding this comment.
This also supports ports in SCP-like URLs:
{
name: "scpWithPort",
parentURL: "git@github.com:2222:org/main.git",
subURL: "../deps/lib.git",
expect: "ssh://git@github.com:2222/org/deps/lib.git",
},- Add RedactURL function that handles all common URL formats: - Standard URLs with userinfo (user:pass, token-only, user-only) - URL-encoded credentials (p%40ss%3Aw0rd) - SCP-like URLs with any user (git@, deploy@, token@) - Various schemes (http, https, ssh, git, ftp, sftp) - IPv6 hosts - Uses net/url for scheme URLs (handles encoding automatically) - Uses regex for SCP-style URLs - Redact credentials from parent URL, submodule URL, and resolved URL logs - Add comprehensive tests covering all credential patterns
Addresses credential exfiltration risk where a malicious .gitmodules could point to an attacker-controlled server and receive the parent repo's auth. - Add extractHost() to parse host from standard URLs and SCP-like URLs - Add SameHost() to compare hosts (case-insensitive, ignores port) - Only forward RepoAuth to submodule if hosts match - Log warning when auth is not forwarded due to different host - Add comprehensive tests for SameHost()
- CreateGitServerWithSubmodule now creates a proper git submodule with: - .gitmodules file - Submodule config in .git/config - Gitlink entry (mode 160000) in the index pointing to submodule commit - Add CommitSubmodule helper to create proper submodule entries - Update TestGitSubmodules to verify the submodule file is actually cloned, not just that .gitmodules exists
- Add explicit check for SCP-like parent URLs when resolving relative submodules - Return clear error message with link to tracking issue - Document limitation in function doc comment - Add test cases for SCP parent with relative and absolute submodule URLs Addresses review feedback from johnstcn.
URLs like oauth2:token@host:path were being misinterpreted as having scheme 'oauth2'. Check SCP regex first, and also require parsed.Host to be non-empty for standard URL handling.
Move standard URL parsing back before SCP check, but require both scheme AND host to be non-empty. This correctly handles: - oauth2:token@host:path (scheme=oauth2, host=empty → SCP fallback) - https://user@[2001:db8::1]/path (scheme=https, host=[2001:db8::1] → standard URL)
Add submodule cloning support via native Go implementation
Summary
This PR resolves/mitigates #74
Changes
Since the existing submodule support in
git.godepends on execution of the git binary, this implementation circumvents that limitation by traversing the tree to clone submodules natively in Go.Behavior