Contributor

@bradlarsen bradlarsen commented Jan 29, 2026

This adds a new input source to TruffleHog, accessible via trufflehog json-enumerator.

This input source requires a list of filenames, each of which is an NDJSON-formatted sequence of objects that take one of two forms:

Form 1: {"data": "utf-8 string", "metadata": <non-null JSON value>}
Form 2: {"data_b64": "base64-encoded bytestring", "metadata": <non-null JSON value>}

The data / data_b64 field specifies the content to be scanned. The metadata field is arbitrary, and is simply propagated downstream with scan results from the corresponding content.

Note that although trufflehog json-enumerator requires a list of filenames, the NDJSON data you want to scan does not have to be written to disk first. On Linux and macOS, at least, you can use shell process substitution to pass a producer process's output as a filename, like trufflehog json-enumerator <(some-program-that-emits-ndjson).
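
For illustration, a producer can be very small. The following Go sketch emits one record in each form to standard output. The `record` type, the `encodeLine` helper, and the sample metadata values are assumptions made for this example, not part of TruffleHog.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// record mirrors the two accepted forms. Set exactly one of Data or DataB64.
type record struct {
	Data     *string `json:"data,omitempty"`
	DataB64  []byte  `json:"data_b64,omitempty"` // encoding/json emits []byte as base64
	Metadata any     `json:"metadata"`
}

// encodeLine renders one record as a single NDJSON line.
func encodeLine(r record) ([]byte, error) {
	b, err := json.Marshal(r)
	if err != nil {
		return nil, err
	}
	return append(b, '\n'), nil
}

func main() {
	text := "plain UTF-8 content"
	recs := []record{
		{Data: &text, Metadata: map[string]any{"origin": "example-1"}},
		// Arbitrary bytes that are not valid UTF-8 go in data_b64.
		{DataB64: []byte{0xff, 0x00, 0x01}, Metadata: "example-2"},
	}
	for _, r := range recs {
		line, err := encodeLine(r)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		os.Stdout.Write(line)
	}
}
```

Saved as, say, producer.go (a hypothetical name), this could be fed to the scanner with trufflehog json-enumerator <(go run producer.go).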

@bradlarsen bradlarsen requested a review from a team January 29, 2026 19:41
@bradlarsen bradlarsen requested review from a team as code owners January 29, 2026 19:41
Contributor

@rosecodym rosecodym left a comment

This looks like a good start! In addition to my inline questions, I have this one: Are scans of individual paths cancellation-aware? It looks like the source is only cancellation-aware between paths.

Contributor

@trufflesteeeve trufflesteeeve left a comment

This looks good to me, however I'm waiting on others to review before I give my own approval.

Contributor

@camgunz camgunz left a comment

Couple of nits and suggestions, but super clean and straightforward--looks great 👍🏻

Comment on lines +120 to +125
    e.Metadata = *aux.Metadata
    if aux.DataB64 != nil {
        e.Data = *aux.DataB64
    } else {
        e.Data = []byte(*aux.Data)
    }
Contributor

question: I thought about this a little but my brain sputtered out: do you think there's a significant performance cost to this copying? I think there might be? But if it's not reasonable to avoid then that's that 🤷🏻

Contributor Author

In the first case (aux.DataB64 != nil), the value being copied is a byte slice ([]byte). Only the slice header is copied (24 bytes on 64-bit platforms), not the underlying data, so it's "essentially free", right?

In the second case, it's a string value being converted to a byte slice. A bit of investigation indicates that this generally does copy the data, because strings are immutable in Go and byte slices are not. That's unfortunate. We might be able to use unsafe.Slice + unsafe.StringData to do a zero-copy conversion, but we'd have to think hard about the safety of such a thing.

In any case, the serialization format that this input source accepts is JSON, which is not great for performance. It is, however, very easy to generate from just about any language. We can consider a more efficient serialization format for a similar source in the future if performance here becomes a constraint.


    metadataJSON, err := entry.Metadata.MarshalJSON()
    if err != nil {
        ctx.Logger().Error(err, "failed to convert metadata to JSON")
Contributor

@camgunz camgunz Feb 10, 2026

suggestion: log the entry in a separate logging call at like... level 4 or 5 maybe


@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.



6 participants