Add a new NDJSON / JSONL input source #4721
rosecodym
left a comment
This looks like a good start! In addition to my inline questions, I have this one: Are scans of individual paths cancellation-aware? It looks like the source is only cancellation-aware between paths.
trufflesteeeve
left a comment
This looks good to me; however, I'm waiting on others to review before I give my own approval.
camgunz
left a comment
Couple of nits and suggestions, but super clean and straightforward--looks great 👍🏻
    e.Metadata = *aux.Metadata
    if aux.DataB64 != nil {
        e.Data = *aux.DataB64
    } else {
        e.Data = []byte(*aux.Data)
    }
question: I thought about this a little but my brain sputtered out: do you think there's a significant performance cost to this copying? I think there might be? But if it's not reasonable to avoid then that's that 🤷🏻
In the first case (aux.DataB64 != nil), the value being copied is a byte slice ([]byte). Only the slice header is copied (24 bytes on 64-bit platforms), not the underlying data, so it's "essentially free", right?
In the second case, it's a string value being converted to a byte slice. A bit of investigation indicates that this does in general result in a copy of the data being made, since string and []byte differ in mutability in Go. That's unfortunate. We might be able to use unsafe.Slice + unsafe.StringData to do a zero-cost conversion, but we'd have to think hard about the safety of such a thing.
In any case, the serialization format that this input source accepts is JSON, which is very not great in terms of performance. It is very easy to generate, however, from just about any language you want. We can consider a better serialization format for a similar source in the future if performance here becomes a constraint.
    metadataJSON, err := entry.Metadata.MarshalJSON()
    if err != nil {
        ctx.Logger().Error(err, "failed to convert metadata to JSON")
suggestion: log the entry in a separate logging call at like... level 4 or 5 maybe
Cursor Bugbot has reviewed your changes and found 1 potential issue.
This adds a new input source to TruffleHog, accessible via trufflehog json-enumerator.

This input source requires a list of filenames, each of which is an NDJSON-formatted sequence of objects that take one of two forms:

Form 1:
{"data": "utf-8 string", "metadata": <non-null JSON value>}

Form 2:
{"data_b64": "base64-encoded bytestring", "metadata": <non-null JSON value>}

The data / data_b64 field specifies the content to be scanned. The metadata field is arbitrary, and is simply propagated downstream with scan results from the corresponding content.

Note that although trufflehog json-enumerator requires a list of filenames to be given, the NDJSON data that you wish to scan may not need to be first written to disk. On Linux and macOS, at least, you can use shell process substitution to set up a named pipe from a producer process, like trufflehog json-enumerator <(some-program-that-emits-ndjson).