The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place.
Evaluate your LLM's response with Prometheus and GPT4 💯
👩‍⚖️ Agent-as-a-Judge: The Magic for Open-Endedness
Dingo: A Comprehensive AI Data, Model and Application Quality Evaluation Tool
Inference-time scaling for LLMs-as-a-judge.
[ICLR 2025] xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation
A native policy enforcement layer for AI coding agents. Built on OPA/Rego.
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
CodeUltraFeedback: aligning large language models to coding preferences (TOSEM 2025)
The repository for the survey of Bias and Fairness in IR with LLMs.
Solving Inequality Proofs with Large Language Models.
First-of-its-kind AI benchmark for evaluating the protection capabilities of large language model (LLM) guard systems (guardrails and safeguards)
(NeurIPS 2025) Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?"
[ICLR 2026 Oral] Generative Universal Verifier as Multimodal Meta-Reasoner
A set of tools to create synthetically-generated data from documents
⚡️ The "1-Minute RAG Audit" — Generate QA datasets & evaluate RAG systems in Colab, Jupyter, or CLI. Privacy-first, async, visual reports.
Code and data for "Timo: Towards Better Temporal Reasoning for Language Models" (COLM 2024)
OmicsBench: Distinguishing Multi-Omics Reasoning from Shortcut Learning in Large Language Models
The official repository for our EMNLP 2024 paper, Themis: A Reference-free NLG Evaluation Language Model with Flexibility and Interpretability.
Code and data for Koo et al's ACL 2024 paper "Benchmarking Cognitive Biases in Large Language Models as Evaluators"
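Most of the projects listed above implement some variant of the same core LLM-as-a-judge loop: build a grading prompt around a question and a candidate answer, ask a judge model for a structured verdict, and parse a score out of its reply. The sketch below illustrates that general pattern only; it is not taken from any repository on this page, and `call_model` is a hypothetical callable standing in for whatever LLM client you use.

```python
# Minimal, generic sketch of the LLM-as-a-judge pattern.
# `call_model` is a hypothetical prompt -> completion callable supplied by
# the caller; the rubric wording is illustrative, not from any listed repo.
from dataclasses import dataclass
from typing import Callable
import re

JUDGE_PROMPT = """You are an impartial evaluator.
Question:
{question}

Candidate answer:
{answer}

Rate the answer from 1 (poor) to 5 (excellent) for correctness and helpfulness.
Reply in the form: Score: <1-5>. Reason: <one sentence>."""


@dataclass
class Verdict:
    score: int       # 1-5 rating extracted from the judge's reply
    rationale: str   # free-text justification from the judge
    raw: str         # full judge output, kept for auditing


def judge_answer(
    question: str,
    answer: str,
    call_model: Callable[[str], str],
) -> Verdict:
    """Ask a judge model to grade one answer and parse its verdict."""
    reply = call_model(JUDGE_PROMPT.format(question=question, answer=answer))
    match = re.search(r"Score:\s*([1-5])", reply)
    score = int(match.group(1)) if match else 1  # conservative fallback
    reason = reply.split("Reason:", 1)[-1].strip() if "Reason:" in reply else reply
    return Verdict(score=score, rationale=reason, raw=reply)


if __name__ == "__main__":
    # Stub judge for demonstration; replace with a real model call.
    fake_judge = lambda prompt: "Score: 4. Reason: Mostly correct, minor omission."
    print(judge_answer("What is 2 + 2?", "4", fake_judge))
```

The repositories above differ mainly in what they wrap around this loop: structured rubrics, pairwise or reference-based comparison, fine-tuned judge models, bias audits of the judge itself, or inference-time scaling of the verdict step.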