Benchspan

Sp26 · Pivot 2 of 3

2 people · Active

87° · Major Pivot

Before

Run agent benchmarks in minutes, not hours

After

Real-time threat detection for AI agents in production

Full description — before

Benchspan is a benchmarking platform for AI agents. If you're building an agent, you need to know whether it's getting better. But running benchmarks is slow, expensive, and fragile. You spend days writing glue code every time you want to run a new benchmark, runs take forever on your laptop, and when they fail halfway through, you burn hundreds of dollars in tokens with nothing to show for it. Benchspan fixes all of it. Onboard your agent once, and it works with every benchmark on the platform. We onboarded Claude Code in 37 lines of code. Running a benchmark becomes a single command, executed in parallel in the cloud. Every result goes to one place your whole team can see, with full trajectories, token usage, latency, and custom metrics. When runs partially fail, rerun just the subset that errored instead of starting from scratch. Compare runs side by side to see exactly where your agent is improving and where it's regressing.

Full description — after

We built the most accurate indirect prompt injection classifier available today, trained by the team behind Microsoft's Prompt Shields. Our model catches the attacks that generic guardrails miss: hidden instructions embedded in documents, emails, tool outputs, and retrieval results that manipulate your agent from the inside. Connect your observability stack, and Benchspan monitors every LLM call, tool invocation, and RAG retrieval in production. We learn your agents' normal behavior and flag data exfiltration, unauthorized tool access, and behavioral drift before they cause damage.

Category shift

Enterprise AI Agents → AI Cybersecurity

Summary

Benchspan fundamentally changed from an AI agent benchmarking platform (helping developers evaluate AI agent performance) to a real-time threat detection security product (monitoring AI agents in production for attacks and malicious behavior). The old and new products solve entirely different problems and serve different use cases, so this is a full product pivot.

Detected 2 months ago · 2026-04-12

Company journey — 3 pivots

Current: Institutional data broker for prediction markets

111.4° · Near Reinvention · 2026-04-30

Real-time threat detection for AI agents in production (viewing)

87.1° · Major Pivot · 2026-04-12

Run agent benchmarks in minutes, not hours

131.6° · Near Reinvention · 2026-03-26

Started as

Institutional data layer for prediction markets