Skip to content

Comments

feat: add git submodule support#485

Open
bjornrobertsson wants to merge 31 commits intomainfrom
74-git-submodule-support
Open

feat: add git submodule support#485
bjornrobertsson wants to merge 31 commits intomainfrom
74-git-submodule-support

Conversation

@bjornrobertsson
Copy link
Contributor

Add submodule cloning support via native Go implementation

Summary

This PR resolves/mitigates #74

Changes

Since the existing submodule support in git.go depends on execution of the git binary, this implementation circumvents that limitation by traversing the tree to clone submodules natively in Go.

Behavior

  • Functionality can be toggled via a parameter (ENVBUILDER_GIT_CLONE_SUBMODULES)
  • Only runs on initial workspace start
  • Does not repeat on subsequent starts (unless the repository is deleted)

Add support for initializing and updating git submodules during repository cloning.
Added GitCloneSubmodules option to support recursive submodule initialization.
Resolved issues with git submodule handling for better cloning and URL resolution.
Resolved issues with git submodule handling to ensure robust cloning and URL resolution without relying on git binary calls.
Adding tests for Submodules and two missing
@bjornrobertsson bjornrobertsson linked an issue Dec 11, 2025 that may be closed by this pull request
@bjornrobertsson
Copy link
Contributor Author

I have read the CLA Document and I hereby sign the CLA
recheck

Copy link
Member

@johnstcn johnstcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @bjornrobertsson !
I have some comments below, and I think it would also be worthwhile adding an integration test (see: integration/integration_test.go and testutil/gittest/gittest.go).

Comment on lines +505 to +528
{
name: "absolute",
parentURL: "https://example.com/org/main.git",
subURL: "https://github.com/other/repo.git",
expect: "https://github.com/other/repo.git",
},
{
name: "relativeSibling",
parentURL: "https://example.com/org/main.git",
subURL: "../deps/lib.git",
expect: "https://example.com/org/deps/lib.git",
},
{
name: "relativeChild",
parentURL: "https://example.com/org/main.git",
subURL: "./extras/tool.git",
expect: "https://example.com/org/main.git/extras/tool.git",
},
{
name: "badParent",
parentURL: "://bad",
subURL: "./child",
expectErr: "parse parent URL",
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also test with the URL schemes from TestSetupRepoAuth:

  • SSH with scheme (ssh://host.tld/repo)
  • SSH without scheme (git@host.tld:repo/path)
  • Git scheme (git://git@host.tld:repo/path)
  • Absolute local path (/path/to/repo)
  • Relative paths without ./.. (path/to/repo)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing SSH and Git revealed negative user experience if ssh:// and git_username were passed. Lead to: #486
Otherwise testing are fine.
The location when extracted leads to the need to work around relative paths, a lot of work and testing led to the current methods as working

git/git.go Outdated
}

// initSubmodules recursively initializes and updates all submodules in the repository.
func initSubmodules(ctx context.Context, logf func(string, ...any), repo *git.Repository, opts CloneRepoOptions) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm always a bit wary about recursive functions. I'd feel better about this if we had a maximum depth of which to recurse. I'd wager that most repos won't need more than 2 iterations. If ENVBUILDER_GIT_CLONE_SUBMODULES were changed to accept a positive integer instead, this could function as the number of times to recurse.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the acceptance to 0-10, true, false - and added a Depth testing

parentSrv = httptest.NewServer(opts.AuthMW(NewServer(fs)))
}
return parentSrv, submoduleSrv
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure this needs to be its own function. I think we could probably in-line this into the relevant test for now.

git/git.go Outdated
Comment on lines 145 to 150
if opts.Submodules {
err = initSubmodules(ctx, logf, repo, opts)
if err != nil {
return true, fmt.Errorf("init submodules: %w", err)
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like the preferable option to me vs bespoke implementation. 👍🏻

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That function breaks because of the relative path, and reason for handling this manually (it was confounding and took some time).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and v6 renames it, fixes the typo but we're not at v6 yet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it you're referencing this commit? We could fork and backport this fix to v5 until v6 comes out. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is nothing concrete that the relative path issues are resolved, more they're not. Checks indicate that there are known bugs with:

  • HTTPS URLs losing a slash
  • Multiple relative paths (../../submodule.git) not working

git/git.go Outdated
Comment on lines 145 to 150
if opts.Submodules {
err = initSubmodules(ctx, logf, repo, opts)
if err != nil {
return true, fmt.Errorf("init submodules: %w", err)
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like the preferable option to me vs bespoke implementation. 👍🏻

| `--git-clone-depth` | `ENVBUILDER_GIT_CLONE_DEPTH` | | The depth to use when cloning the Git repository. |
| `--git-clone-single-branch` | `ENVBUILDER_GIT_CLONE_SINGLE_BRANCH` | | Clone only a single branch of the Git repository. |
| `--git-clone-thinpack` | `ENVBUILDER_GIT_CLONE_THINPACK` | `true` | Git clone with thin pack compatibility enabled, ensuring that even when thin pack compatibility is activated,it will not be turned on for the domain dev.zaure.com. |
| `--git-clone-submodules` | `ENVBUILDER_GIT_CLONE_SUBMODULES` | | Recursively clone Git submodules after cloning the repository. |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may also want to add --git-clone-submodules-depth to limit recursion depth.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT about --git-clone-submodules-depth=0 as "don't clone submodules (default)"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me 👍🏻

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just add conditional string eval, default, and 0 is 'false' and a number is Depth?
Transitional, it might be nicer if 'true' opens up a sub-menu for Depth in the template, poc:

  validation {
    regex = "^(true|false|[0-9]|10)$"
    error = "Must be 'true', 'false', or a number from 0-10."
  }
# and #
    "ENVBUILDER_GIT_CLONE_SUBMODULES" : tostring(local.submodule_depth),

- Added SubmoduleDepth custom type that accepts 'true' (depth 10), 'false' (0), or positive integer
- initSubmodules now tracks current depth and stops at max depth
- Default is 0 (disabled) - submodules are not cloned unless explicitly enabled
@johnstcn johnstcn changed the title 74 git submodule support feat: add git submodule support Feb 6, 2026
git/git.go Outdated
}
}

// resolveSubmoduleURL resolves a potentially relative submodule URL against the parent repository URL
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// resolveSubmoduleURL resolves a potentially relative submodule URL against the parent repository URL

git/git.go Outdated
// Use SingleBranch=false to fetch all branches so we can find the commit
subRepo, err := git.CloneContext(ctx, gitStorage, subFS, &git.CloneOptions{
URL: resolvedURL,
Auth: opts.RepoAuth,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security: This allows the parent repo auth to be exfiltrated by a malicious Git host. We should only forward auth if the submodule shares the same host as the parent repo. Or we could consider adding an auth allow list if users need more control.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Added extractHost(url) - extracts hostname from both standard URLs and SCP-like URLs
  • Added SameHost(url1, url2) - compares hosts case-insensitively, ignoring ports
  • Auth is only forwarded if SameHost(parentURL, submoduleURL) returns true
  • Warning logged when auth is withheld: ⚠ Not forwarding auth to submodule (different host: evil.com)

git/git.go Outdated
if origin, hasOrigin := cfg.Remotes["origin"]; hasOrigin && len(origin.URLs) > 0 {
parentURL = origin.URLs[0]
}
logf("Parent repository URL: %s", parentURL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider logging the redacted URL here? This URL could contain credentials.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Added with as many URI's as possible

url = %s
`, submoduleSrv.URL)
commits = append(commits, Commit(t, ".gitmodules", gitmodulesContent, "add submodule"))
_ = NewRepo(t, fs, commits...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this actually create a valid submodule/parent module setup? It doesn't look like the test that uses this actually verifies the submodule is properly cloned? Ideally there would be a committed file that gets verified too.

We should probably expand this with regards to the security issue as well, and make sure we don't exfiltrate parent repo auth unless allowed?

}

// Parse the parent URL
parentParsed, err := url.Parse(parentURL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the parent URL is SCP-like (git@github.com:org/repo.git), and the submodule is relative, we don't hit the guard above and url.Parse can't parse the SCP-like URL.

Suggestion: Use parentEP := transport.NewEndpoint(parentURL) instead? You should be able to just update the parentEP.Path and return a string instead of the url.URL construction below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right that url.Parse doesn't handle git@github.com:org/repo.git correctly - it ends up treating the whole thing as a path.

However, I'd prefer to keep the current implementation for now for a few reasons:

This function was built specifically to work around go-git's broken relative URL resolution. The native Submodules().Update() has known issues:

HTTPS URLs lose a slash when using filepath.Dir() internally
Multiple relative paths (../../submodule.git) don't resolve correctly
These bugs exist in v5 and there's no clear indication v6 fixes them
Most repos using SCP-style URLs (GitHub, GitLab, etc.) have submodules configured with absolute URLs or HTTPS URLs. And often they're even modified to HTTPS. The common case we're solving is HTTPS parent + relative submodule.

Using transport.NewEndpoint() risks reintroducing the bugs we fixed since it's the same go-git code path that has the known issues.

If we do see issues with SCP-like parent URLs in the wild, we can add detection similar to what we did in RedactURL and handle that case separately. But I'd rather not expand scope here without a concrete use case driving it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, we should

  1. Specifically check for SCP-like submodule URL syntax (as opposed to "just" an invalid URL),
  2. Drop an ERROR / WARN log mentioning that these are not supported right now with a link to a follow-up issue,
  3. Document this limitation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bjornrobertsson I had an LLM try out this suggestion, and from what I can tell it works as expected?

See change + test cases in this patch:

diff --git git/git.go git/git.go
index 65edbcf..8f048aa 100644
--- git/git.go
+++ git/git.go
@@ -509,32 +509,23 @@ func RedactURL(u string) string {
 }
 
 // ResolveSubmoduleURL resolves a potentially relative submodule URL against a parent repository URL.
-//
-// Limitation: SCP-like URLs (e.g., git@github.com:org/repo.git) are not supported as parent URLs
-// when the submodule uses a relative path. This is a known limitation.
-// See: https://github.com/coder/envbuilder/issues/492
 func ResolveSubmoduleURL(parentURL, submoduleURL string) (string, error) {
 	// If the submodule URL is absolute (contains ://) or doesn't start with ./ or ../, return it as-is
 	if strings.Contains(submoduleURL, "://") || (!strings.HasPrefix(submoduleURL, "../") && !strings.HasPrefix(submoduleURL, "./")) {
 		return submoduleURL, nil
 	}
 
-	// Check if parent URL is SCP-like (e.g., git@github.com:org/repo.git)
-	// These cannot be properly parsed by net/url and relative submodule resolution is not supported.
-	if scpLikeURLRegex.MatchString(parentURL) {
-		return "", fmt.Errorf("relative submodule URL %q cannot be resolved: parent URL %q uses SCP-like syntax which is not supported for relative submodule resolution (see https://github.com/coder/envbuilder/issues/492)", submoduleURL, RedactURL(parentURL))
-	}
-
-	// Parse the parent URL
-	parentParsed, err := url.Parse(parentURL)
+	// Parse the parent URL using go-git's endpoint parser, which handles
+	// SCP-like URLs (git@host:path) in addition to standard URLs.
+	parentEP, err := transport.NewEndpoint(parentURL)
 	if err != nil {
 		return "", fmt.Errorf("parse parent URL: %w", err)
 	}
 
-	// For relative URLs, we need to resolve them against the parent's path
-	// The parent path represents a repository (like a file in filesystem terms)
-	// So ../something means "sibling repository"
-	parentPath := strings.TrimSuffix(parentParsed.Path, "/")
+	// For relative URLs, we need to resolve them against the parent's path.
+	// The parent path represents a repository (like a file in filesystem terms),
+	// so ../something means "sibling repository".
+	parentPath := strings.TrimSuffix(parentEP.Path, "/")
 
 	// Split the submodule URL into components
 	// and manually walk up the directory tree for each ../
@@ -554,18 +545,9 @@ func ResolveSubmoduleURL(parentURL, submoduleURL string) (string, error) {
 		}
 	}
 
-	// Clean the final path
-	resolvedPath := path.Clean(currentPath)
-
-	// Construct the absolute URL
-	resolvedParsed := &url.URL{
-		Scheme: parentParsed.Scheme,
-		User:   parentParsed.User,
-		Host:   parentParsed.Host,
-		Path:   resolvedPath,
-	}
-
-	return resolvedParsed.String(), nil
+	// Reconstruct the URL with the resolved path.
+	parentEP.Path = path.Clean(currentPath)
+	return parentEP.String(), nil
 }
 
 // initSubmodules recursively initializes and updates all submodules in the repository.
diff --git git/git_test.go git/git_test.go
index c3a0b69..a324a2a 100644
--- git/git_test.go
+++ git/git_test.go
@@ -807,16 +807,28 @@ func TestResolveSubmoduleURL(t *testing.T) {
 			expect:    "https://example.com/org/main.git/extras/tool.git",
 		},
 		{
-			name:      "badParent",
-			parentURL: "://bad",
-			subURL:    "./child",
-			expectErr: "parse parent URL",
+			name:      "scpRelativeSibling",
+			parentURL: "git@github.com:org/main.git",
+			subURL:    "../deps/lib.git",
+			expect:    "ssh://git@github.com/org/deps/lib.git",
 		},
 		{
-			name:      "scpParentWithRelativeSubmodule",
+			name:      "scpRelativeChild",
 			parentURL: "git@github.com:org/main.git",
-			subURL:    "../other/submodule.git",
-			expectErr: "SCP-like syntax which is not supported",
+			subURL:    "./extras/tool.git",
+			expect:    "ssh://git@github.com/org/main.git/extras/tool.git",
+		},
+		{
+			name:      "scpMultiLevelUp",
+			parentURL: "git@github.com:a/b/c/repo.git",
+			subURL:    "../../other/lib.git",
+			expect:    "ssh://git@github.com/a/b/other/lib.git",
+		},
+		{
+			name:      "httpsMultiLevelUp",
+			parentURL: "https://example.com/a/b/c/repo.git",
+			subURL:    "../../other/lib.git",
+			expect:    "https://example.com/a/b/other/lib.git",
 		},
 		{
 			name:      "scpParentWithAbsoluteSubmodule",

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also supports ports in SCP-like URLs:

		{
			name:      "scpWithPort",
			parentURL: "git@github.com:2222:org/main.git",
			subURL:    "../deps/lib.git",
			expect:    "ssh://git@github.com:2222/org/deps/lib.git",
		},

- Add RedactURL function that handles all common URL formats:
  - Standard URLs with userinfo (user:pass, token-only, user-only)
  - URL-encoded credentials (p%40ss%3Aw0rd)
  - SCP-like URLs with any user (git@, deploy@, token@)
  - Various schemes (http, https, ssh, git, ftp, sftp)
  - IPv6 hosts
- Uses net/url for scheme URLs (handles encoding automatically)
- Uses regex for SCP-style URLs
- Redact credentials from parent URL, submodule URL, and resolved URL logs
- Add comprehensive tests covering all credential patterns
Addresses credential exfiltration risk where a malicious .gitmodules could
point to an attacker-controlled server and receive the parent repo's auth.

- Add extractHost() to parse host from standard URLs and SCP-like URLs
- Add SameHost() to compare hosts (case-insensitive, ignores port)
- Only forward RepoAuth to submodule if hosts match
- Log warning when auth is not forwarded due to different host
- Add comprehensive tests for SameHost()
- CreateGitServerWithSubmodule now creates a proper git submodule with:
  - .gitmodules file
  - Submodule config in .git/config
  - Gitlink entry (mode 160000) in the index pointing to submodule commit
- Add CommitSubmodule helper to create proper submodule entries
- Update TestGitSubmodules to verify the submodule file is actually cloned,
  not just that .gitmodules exists
- Add explicit check for SCP-like parent URLs when resolving relative submodules
- Return clear error message with link to tracking issue
- Document limitation in function doc comment
- Add test cases for SCP parent with relative and absolute submodule URLs

Addresses review feedback from johnstcn.
URLs like oauth2:token@host:path were being misinterpreted as having
scheme 'oauth2'. Check SCP regex first, and also require parsed.Host
to be non-empty for standard URL handling.
Move standard URL parsing back before SCP check, but require both
scheme AND host to be non-empty. This correctly handles:
- oauth2:token@host:path (scheme=oauth2, host=empty → SCP fallback)
- https://user@[2001:db8::1]/path (scheme=https, host=[2001:db8::1] → standard URL)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Git submodule support

3 participants