Skip to main content
Open Source · 100% Local · Zero Cloud

Know What Your
AI Agents Actually Do

Your AI agents make thousands of decisions you never see. AgentReplay captures every call, every token, every cost — so you can debug failures, catch hallucinations, and ship with confidence. All without sending a single byte to the cloud.

No cloud. Ever.Your data never leavesMac, Windows & Linux
Works with the tools you already use
🤖Claude Code
Cursor
🦜LangChain
🔗OpenAI
🅰️Anthropic
🦙LlamaIndex
📡OpenTelemetry
Desktop App

See It in Action

A native desktop app built with Tauri & React. 100% local — your data never leaves your machine.

Trace Explorer
Tracing

Trace Explorer

Browse and search all captured traces, spans, and agent interactions in a unified timeline view.

Why AgentReplay

AI Agents Fail in Ways You Can't Predict

You wouldn't deploy a web app without logs. Why would you deploy an AI agent without observability?

🔍

Agents Hallucinate Silently

Your agent returns confident answers that are completely wrong. Without tracing, you'll never know which retrieval step failed or which prompt caused the drift.

💸

Costs Spiral Without Warning

A single runaway agent loop can burn through your API budget in minutes. AgentReplay tracks every token and dollar in real-time — before the invoice shocks you.

🔒

Cloud Platforms See Everything

Sending your prompts, user data, and business logic to cloud observability platforms creates compliance risks. AgentReplay runs entirely on your machine. Nothing leaves.

⚡ Power Feature

Full Observability for Claude Code & AI Coding Tools

See exactly what Claude Code, Cursor, and other AI coding tools are doing behind the scenes. Every tool call, every file edit, every conversation — traced, stored, and searchable with persistent memory across sessions.

Trace every Claude Code tool use — bash, file edits, search, and multi-step reasoning chains
Persistent memory that survives restarts — your agent context is never lost
MCP Server integration — Claude and Cursor query your trace history for smarter decisions
Evaluate coding quality with built-in evaluators for tool correctness and trajectory efficiency
Cost tracking per session — know exactly what each coding task costs across providers
claude-code-session.trace
┌─ Session: refactor-auth-module
│  Duration: 14m 32s │ Cost: $0.47
│
├─ 🔍 Read src/auth/handler.rs
│  └─ 2,847 tokens analyzed
│
├─ 🧠 Reasoning: "The OAuth flow has
│  │  a race condition in token refresh"
│  └─ Confidence: 0.94
│
├─ ✏️  Edit src/auth/handler.rs
│  └─ +47 lines, -12 lines
│
├─ 🔧 bash: cargo test auth
│  └─ 23 tests passed ✓
│
├─ 📝 Memory saved:
│  └─ "auth module uses PKCE flow,
│     tokens stored in secure keychain"
│
└─ ✅ Session complete
   Total: 8 tool calls │ 12,451 tokens
Platform

Everything You Need. Nothing You Don't.

A complete observability platform that runs on your machine. No accounts, no API keys, no cloud dependencies.

📡

Trace Every LLM Call

OpenTelemetry-native ingestion captures every request, response, and token usage. Auto-instrument OpenAI, Anthropic, LangChain, and LlamaIndex with a single line.

🧪

20+ Built-in Evaluators

Hallucination detection, RAGAS faithfulness, G-Eval quality scoring, toxicity checks, tool correctness, and adversarial robustness testing — out of the box.

🔌

MCP Protocol Server

Your traces and memory become context for AI tools. Claude, Cursor, and any MCP client can query your agent history for smarter, context-aware responses.

📝

Prompt Versioning & A/B Tests

Semantic versioning, deployment environments, canary rollouts, and traffic splitting. Know which prompt version performs best in production.

🧠

Persistent Agent Memory

Observations, sessions, and context that survive restarts. Your agent never forgets what it learned — and neither do you.

💰

Real-Time Cost Tracking

Per-model, per-session, per-project cost breakdowns. Set budget alerts before costs spiral. Know exactly what each agent workflow costs.

🔎

Semantic Vector Search

HNSW index with local ONNX embeddings and 32× compression. Search your traces by meaning, not just keywords. Find the needle in a million spans.

🛡️

Privacy by Architecture

Not just a policy — it's the architecture. No network calls, no cloud sync, no telemetry. PII scrubbing, sensitivity flags, and compliance reports built in.

🖥️

Beautiful Desktop App

Native macOS, Windows, and Linux app. Real-time trace visualization, model comparison, prompt playground — all in a polished UI that feels like home.

Architecture

Built with Rust. Modular by Design.

16 focused crates — each with a single responsibility. Powered by SochDB for ACID storage and HNSW for sub-millisecond vector search.

TRANSPORTagentreplay-serveragentreplay-tauriagentreplay-cliAxum HTTP · gRPC · MCPDesktop App · CLI InterfaceINTELLIGENCEagentreplay-evalsagentreplay-promptsagentreplay-queryagentreplay-memory20+ EvaluatorsVersioned PromptsSTORAGEagentreplay-storageagentreplay-indexCOREagentreplay-coreagentreplay-observabilityEXTENSIBILITYagentreplay-pluginsagentreplay-experimentsagentreplay-telemetryagentreplay-dataSochDB ACID Storage · HNSW Vector Search · WASM Plugins · OpenTelemetry · Hybrid Logical Clock · SIMD Distance · Product Quantization
Compare

Why Not Just Use a Cloud Platform?

Cloud platforms are powerful — but they see your data. AgentReplay gives you the same capabilities with zero compromise on privacy.

CapabilityAgentReplayLangfuseArizeBraintrust
100% Local / Offline
Claude Code Observability
Persistent Agent Memory
Built-in MCP ServerPartial
20+ Evaluators
Prompt Versioning
Native Desktop App
No Account Required
Data Stays on DeviceSelf-host
Free ForeverFreemiumFreemiumFreemium
Quick Start

Instrument in 3 Lines. Debug in Seconds.

Add full observability to any AI agent without changing your business logic. Works with every major LLM provider.

import agentreplay
from openai import OpenAI

# One line — traces flow to your local AgentReplay server
agentreplay.init()

# Wrap your client. That's it. Every call is now traced.
client = agentreplay.wrap_openai(OpenAI())

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this PR"}]
)

# Tokens, latency, cost — all captured automatically.
# Open AgentReplay desktop app to see traces in real-time.
100+
REST API Endpoints
20+
Built-in Evaluators
0
Cloud Dependencies
3
SDKs (Python · JS · Rust)

Stop Flying Blind with Your AI Agents

Download AgentReplay, add three lines of code, and see everything your agents do — in real-time, on your machine, for free.