Skip to content

Comments

MissingRemoteId: also check EGIT_REPO_URI#643

Draft
parona-source wants to merge 1 commit intopkgcore:masterfrom
parona-source:EGIT_REPO_URI_remoteid
Draft

MissingRemoteId: also check EGIT_REPO_URI#643
parona-source wants to merge 1 commit intopkgcore:masterfrom
parona-source:EGIT_REPO_URI_remoteid

Conversation

@parona-source
Copy link
Contributor

@parona-source parona-source commented Dec 22, 2023

Making it a draft as I'd like input if on if there is a better way to do to get the value of EGIT_REPO_URI with pkgcore's API.

Also I'm unsure if I correctly added the test, also a test scenario where EGIT_REPO_URI isnt set in the ebuild itself would be good to add.

Keep in mind that I want get the value of the variable post inherit as eclasses like kde.org set it as well.

@parona-source parona-source marked this pull request as ready for review December 25, 2023 11:16
urls = urls.union(pkg.homepage)

if "git-r3" in pkg.inherited and hasattr(pkg, "environment"):
egit_repo_uri = re.compile(r"EGIT_REPO_URI=\"(.*)\"")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally speaking, what you're up to here isn't safe due to bash being 'bash'. Minimally that has to be anchored to start/endline.

Personally, I'd rather not do regex on bash since it's not robust. EBD could be tweaked to spit back arbitrary requested variables which is far safer- as in, do the parse, then pull the raw value out of the bash env post render.

There may be other tooling for this that pkgcore/pkgcheck uses that I'm unaware of, however.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you be happy with this now?

egit_repo_uri = re.compile(r"^declare -- EGIT_REPO_URI=\"(.*)\"$")

Copy link
Contributor

@ferringb ferringb Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assume I'm never happy due to personality flaws. :)

egit_repo_uri = re.compile(r"^declare -[^ ]* EGIT_REPO_URI=([\"'])(?<content>.*)\1$")

Is probably tighter, albeit untested;

  • the declare part ignores if it's exported. It also ignores if it's an array... which is fine to ignore. Even I won't boil that particular ocean. :)
  • note the (?P<content>- named group. I suggest this since you should move this compile as a class var; in doing so, you want to disconnect any hardcoded regex positional assumptions, and use a named capture instead.

Note; you're hardcoding the whitespace usage. If this ever goes to tabs, etc. That problem is out of your scope- this is something that should be implemented centrally so any consumer, like you, just can iterate through "here's all the vars defined in that ebuild's rendered scope". That central implementation is the one that should be explicitly specific and anal about things I'm mentioning; we shouldn't do that in end consumer implementations like this, however.

Does that make sense? If so, any interest in tackling it as a seperate PR after this one? If not, just let me know and I'll ticket this for future work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that make sense? If so, any interest in tackling it? If not, just let me know and I'll ticket this for future work.

Yes it does. And yeah I can look into this. Another change to dive into pkgcheck internals :)

Copy link
Contributor

@ferringb ferringb Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you open a ticket laying out what you'd want in terms of API capabilities? I have notions enumerated below, but you're writing checks- I don't. The backend API's are where I stick to. Said plainly: I may be very much talking out of my ass, even if I'm open to wiring what you're asking for.

My rough thoughts here:

  • you clearly want a mapping of variables. Id' think the value objects should also have attributes exposing the bash bits like array or export, so y'all can write QA against "this shouldn't be exported" or "this variables typing must be array".

  • Is a mapping of functions actually useful? I think I've seen checks that try to detect if some "you must never override this phase" eclass export was overriden. NFC, but if a mapping of functions is added- down the line- I suspect adding an exposure of "this is the chain of overrides" might be useful. Or not; again, I just do internals, I don't write checks.

    If it's not immediately useful, I'd just suggest the object api of the parsed environment is designed for this to be added. IE, evolve the api capabilities; it doesn't have to be 'everything' up front.

  • I loathe that this shouldn't be part of the pkgcore API itself. Pkgcore is a package manager- thus minimal deps are required for it. Tree-sitter isn't a viable dep IMO, and ultimately, what you're doing should be moved from regex to tree-sitter to deal w/ bash being 'bash' in regards to ways to do stuff. The regex I proposed is solid- tree-sitter just ensures we're not sensitive to bash changes like this, thus why I think it's the end implementation, even if v1 is regex based.

    TL;DR: unless @thesamesam @arthurzam @mgorny are good w/ a tree-sitter dep in pkgcore, the addition we're talking about here would have to be at the pkgcheck layer. I'd suggest injecting a pkg wrapper for results from repos that laces this in; basically you'll extend the pkg object's exposed API. If this doesn't make sense, ask- it's buried, but it's something we do in a lot of places for situations like this.

^^^ that's my thoughts coming at this from the guts/internals standpoint, and guessing about what y'all who write checks may want. Feel free to ignore/expand/whatever :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, to be clear- thanks for whatever you can suggest. Checks are never going to be simple, but the more we can provide a solid API- and then document the shit out of it and how to write checks- the better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was poking around in portage itself due to an unrelated reason.

Their regex is quite something https://github.com/gentoo/portage/blob/master/bin/filter-bash-environment.py#L13

Copy link
Contributor

@ferringb ferringb Feb 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking about your API needs- we've got a few ways to do this in a fashion that "it works, no question":

  1. EBD wise, we've got a direct interp in that env. We can make it serialize the env vars back out, literally, one by one. This is guaranteed to always be 'perfect' but there is a perf hit. If the "one by one" doesn't make sense- EBD allows full transfer of all bash context, bidirectionally.
  2. We can obviously do regex against results- dump the full env, regex shit out. Being we control the serialization bash side, this is basically safe.
  3. Slightly insane notion- via the general EBD machinery, there's actually nothing stopping us from transfering an ebuild state back into EBD, and then leveraging the bash side to do the parse and spit out variables one by one (1. above, basically). This is probably more relevant for interaction w/ binpkgs or VDB.
  4. Or we throw this all through tree-sitter if it's available. Being we trust tree-sitter for pkgcheck parses, we should trust it at this level.

I'm fucking around with a build out of 1. and 2. locally. 3. is a trivial manipulation of our EBD objects, so we can do that easily.

The 4th option- tree-sitter- I'm not as familiar with. If someone wants to knock that out code wise, I'd appreciate.

@parona-source as to your portage regex, half that code looks like things I wrote and gave to portage, and I moved pkgcore away from due to the edge cases. The correct approach here is to control the interaction with bash- how you get it to spit shit out- and force it into a controlled manner. IE, export should never come out of a basic give me the declare statement for example, not unless you're doing it wrong, or you have long dead patterns in your regex.

In this scenario, any env parsing we do for QA will come from pkgcore's implementation of the env state capture and serialization. If we ever spit out an export in a state serialization, that's a bug at the serialization layer. No code above should try to compensate, it violates the internal contract.

tl;dr: we're pissy about not tolerating layer violations :)

Copy link
Contributor

@ferringb ferringb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Admittedly way the fuck late, but please give a repro command... preferably for modern pkgcheck. :)

I ran a quick pkgcheck against some ebuilds that had git-r3 as an inheritance- I could not see any reports generated related to this. IE, I need a modern repro please, unless this is no longer an issue..

Signed-off-by: Alfred Wingate <parona@protonmail.com>
@parona-source
Copy link
Contributor Author

parona-source commented Feb 19, 2026

One notable place where this useful is the kde overlay. A lot of the packages miss this remote.

https://github.com/gentoo/kde

(Limiting category as there are quite a lot missing this)

(pkgcheck-egit) ask@bigglane /home/ask/sources/kde.d/_ $ pkgcheck scan -k MissingRemoteId media-gfx/
media-gfx/kgeotag
  MissingRemoteId: missing <remote-id type="kde-invent">graphics/kgeotag</remote-id> (inferred from URI 'https://invent.kde.org/graphics/kgeotag')

media-gfx/kxstitch
  MissingRemoteId: missing <remote-id type="kde-invent">graphics/kxstitch</remote-id> (inferred from URI 'https://invent.kde.org/graphics/kxstitch')

media-gfx/kgraphviewer
  MissingRemoteId: missing <remote-id type="kde-invent">graphics/kgraphviewer</remote-id> (inferred from URI 'https://invent.kde.org/graphics/kgraphviewer')

media-gfx/skanpage
  MissingRemoteId: missing <remote-id type="kde-invent">utilities/skanpage</remote-id> (inferred from URI 'https://invent.kde.org/utilities/skanpage')

Addressing these in the repo would then make it easier to update them in ::gentoo and such would make my life easier by showing a nice invent.kde.org link for the packages in packages.gentoo.org.

Copy link
Contributor

@ferringb ferringb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drive by obesrvations/suggestions. The hasattr I'm particularly interested in.

The actual benefit of this, etc- that I'm not evaluating, just the implementation.

urls = set(filter(self.__filter_url, all_urls))
urls = sorted(urls.union(pkg.homepage), key=len)

if "git-r3" in pkg.inherited and hasattr(pkg, "environment"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is hasattr needed here?

I don't know this codepath fully- is there an implication that PackageRepoSource isn't guaranteed to surface the environment text data? Because from the pkgcore side, it should always be accessible afaik.

And if it isn't, I believe that is a bug and should be fixed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done a cursory trace of the ebuild src side- I can't find any scenario where pkg.environment isn't guaranteed to exist. If you can point me at any code you copied that pattern from, I'd appreciate- I can grep, I've just not yet, busy with other shit :)

Our pkg api isn't documented or enforced, even if it's consistent, but environment should be explicit. I'm poking at this discussion because I'm in the process of lacing ABC through that layer so people know what they can rely on, and I think this should be a guaranteed attribute.

if "git-r3" in pkg.inherited and hasattr(pkg, "environment"):
egit_repo_uri = re.compile(r"^declare -- EGIT_REPO_URI=\"(.*)\"$")
for env_line in pkg.environment.data.splitlines():
result = re.search(egit_repo_uri, env_line)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if result := re.search(egit_repo_uri, env_line):
  urls.append ...

urls = sorted(urls.union(pkg.homepage), key=len)

if "git-r3" in pkg.inherited and hasattr(pkg, "environment"):
egit_repo_uri = re.compile(r"^declare -- EGIT_REPO_URI=\"(.*)\"$")
Copy link
Contributor

@ferringb ferringb Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're hardcoding that bash currently uses " for these directives, and that the export statement doesn't inject crap like actual 'export' (which the env saving implementation should- declare -e iirc, if it's marked as exported). Any of those will break this; a code comment note, and/or an expansion of the regex is warranted.

For the regex, this should be moved to a class var; it both is easier to spot it, and it avoids re.compile relying on it's internal LRU to avoid the recompile. If it must recompile, it's surprisingly not cheap.

Your code here implies pkgcore should provide a better api for this, also.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also- considering that regex has mild complexity to get it right- as I mentioned for things like export or array- perhaps this should be broke out in a later PR as a util func, and other regex's rebased to it?

# Third generation eclass for easing maintenance of live ebuilds using
# git as remote repository.

case ${EAPI} in
Copy link
Contributor

@ferringb ferringb Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is out of scope for your PR, just an observation.

I suspect we should inject a function into ebuild/eclass environment tests that is require_eapi $@. It'll reduce this sort of boilerplate. It's our tests, this sort of helper would be fine.

What's others thoughts? And are there are other areas we should do this? Having the capability I'd expect would lead to finding usage.

IE, if we build the injection- is it just for this, or are we improving the general case of our tests?

@parona-source parona-source marked this pull request as draft February 19, 2026 21:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants