feat: add GenAI evaluation OTel event support #1656
Evaluation frameworks like `strands_evals` need a way to export evaluation results as OpenTelemetry events so they can be visualized in any OTel-compatible backend (Datadog, Jaeger, Honeycomb, etc.). Currently there is no standard way to emit `gen_ai.evaluation.result` events on spans from within the SDK.
open-telemetry/semantic-conventions#3398
#1633
This PR adds a lightweight evaluation telemetry API to `strands.telemetry` that follows the proposed `gen_ai.evaluation.result` OTel semantic convention. The API is opt-in: no telemetry is emitted unless the developer explicitly calls these functions.
Public API Changes
New exports from `strands.telemetry`:
None-valued fields are omitted from OTel attributes. Spans that are None or non-recording are silently skipped.
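The two rules above (drop None-valued fields, skip None or non-recording spans) can be illustrated with a minimal sketch. This is not the PR's code: the dataclass fields, the attribute keys (taken from my reading of the proposed `gen_ai.evaluation.result` convention), the function signature, and the `FakeSpan` helper are all assumptions made for illustration. Only `is_recording()` and `add_event()` mirror methods that exist on OpenTelemetry spans.

```python
from dataclasses import dataclass
from typing import Optional

EVENT_NAME = "gen_ai.evaluation.result"


@dataclass
class EvaluationResult:
    """Hypothetical stand-in for the PR's dataclass; field names are assumed."""

    name: str
    score_value: Optional[float] = None
    score_label: Optional[str] = None
    explanation: Optional[str] = None
    response_id: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumption: a missing name is rejected at construction time.
        if not self.name:
            raise ValueError("evaluation name is required")

    def to_otel_attributes(self) -> dict:
        # Attribute keys are assumed from the proposed semantic convention.
        candidates = {
            "gen_ai.evaluation.name": self.name,
            "gen_ai.evaluation.score.value": self.score_value,
            "gen_ai.evaluation.score.label": self.score_label,
            "gen_ai.evaluation.explanation": self.explanation,
            "gen_ai.response.id": self.response_id,
        }
        # None-valued fields are left out of the attribute mapping.
        return {key: value for key, value in candidates.items() if value is not None}


def add_evaluation_event(span, result: EvaluationResult) -> None:
    """Attach one evaluation event to the span, or do nothing if it cannot record."""
    if span is None or not span.is_recording():
        return
    span.add_event(EVENT_NAME, attributes=result.to_otel_attributes())


class FakeSpan:
    """Illustrative span double exposing only the two methods used above."""

    def __init__(self, recording: bool = True) -> None:
        self._recording = recording
        self.events = []

    def is_recording(self) -> bool:
        return self._recording

    def add_event(self, name, attributes=None) -> None:
        self.events.append((name, dict(attributes or {})))
```

With a real tracer, the span would come from something like `trace.get_current_span()` in the OpenTelemetry API; the fake span is used here only so the sketch runs without any dependency.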
Use Cases
`response_id`
Related Issues
N/A — new feature
Documentation PR
N/A — docs update will follow separately
Type of Change
New feature
Testing
31 tests (25 unit + 6 property-based with Hypothesis):
- `EvaluationResult` dataclass construction and `to_otel_attributes()` mapping
- `EvaluationEventEmitter.emit()` span interaction
- `add_evaluation_event()` convenience function equivalence
- `set_test_suite_context()` / `set_test_case_context()` attribute correctness
- Edge cases: None span, non-recording span, missing name ValueError
- Public API export verification
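As a rough idea of what the edge-case tests listed above might look like, here is a sketch that uses `unittest.mock.MagicMock` as the span. The function under test, `emit_if_recording`, is a hypothetical stand-in written for this example, not the PR's emitter, and the attribute key is assumed from the proposed convention.

```python
from unittest.mock import MagicMock


def emit_if_recording(span, name, attributes):
    """Hypothetical stand-in for the emitter under test."""
    if not name:
        raise ValueError("evaluation name is required")
    if span is None or not span.is_recording():
        return
    span.add_event(
        "gen_ai.evaluation.result",
        attributes={"gen_ai.evaluation.name": name, **attributes},
    )


def test_none_span_is_skipped():
    # Must return quietly instead of raising AttributeError.
    emit_if_recording(None, "accuracy", {})


def test_non_recording_span_is_skipped():
    span = MagicMock()
    span.is_recording.return_value = False
    emit_if_recording(span, "accuracy", {})
    span.add_event.assert_not_called()


def test_missing_name_raises():
    try:
        emit_if_recording(MagicMock(), "", {})
    except ValueError:
        return
    raise AssertionError("expected ValueError for an empty name")


def test_recording_span_gets_one_event():
    span = MagicMock()
    span.is_recording.return_value = True
    emit_if_recording(span, "accuracy", {"gen_ai.evaluation.score.value": 0.9})
    span.add_event.assert_called_once_with(
        "gen_ai.evaluation.result",
        attributes={
            "gen_ai.evaluation.name": "accuracy",
            "gen_ai.evaluation.score.value": 0.9,
        },
    )
```

The functions follow the `test_` naming that pytest collects, but they use plain assertions and can also be called directly.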
I ran `hatch run prepare`
Checklist