Research

Dec 9, 2025

Omnigrep: State-of-the-Art Agentic Code Search

Omnigrep achieves a state-of-the-art F0.5 score of 0.475 on CodeSearchEval, outperforming Cognition's SWE-grep by 15% and Claude Code by 33% through multi-turn chain-of-thought reasoning.
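For readers unfamiliar with the metric: F0.5 is the F-beta score with beta = 0.5, a weighted harmonic mean of precision and recall that counts precision more heavily than recall. The sketch below shows only the standard formula. How CodeSearchEval defines precision and recall for a search result is not stated here, and the input values are made up for illustration.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score: weighted harmonic mean of precision and recall.

    beta < 1 favors precision, beta > 1 favors recall, beta = 1 gives F1.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        # Both precision and recall are zero; define the score as 0.
        return 0.0
    return (1 + b2) * precision * recall / denom


# Illustrative (made-up) values: a precise but low-recall retriever.
print(round(f_beta(0.6, 0.3), 3))            # F0.5 -> 0.5
print(round(f_beta(0.6, 0.3, beta=1.0), 3))  # F1   -> 0.4
```

With the same precision and recall, F0.5 comes out higher than F1 here because the precision term dominates, which is why the metric suits search tasks where returning irrelevant code is costlier than missing some relevant code.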

Dec 5, 2025

Paragon E2E: Natural Language End-to-End Testing

Paragon now supports end-to-end testing through natural language. Describe what you want to test in plain English, and Paragon writes, runs, and iterates on Playwright tests.

Nov 3, 2025

ReviewBenchLite: A Benchmark for Evaluating Code Review Agents' Capabilities with Production Issues

ReviewBenchLite is a benchmark for systematically evaluating proactive code review capabilities. It comprises 117 real-world issues from 25 Python repositories across five categories. Results show that specialized agents achieve up to 81.2% accuracy.

Oct 23, 2025

Introducing Paragon: The Next Generation of Autonomous Code Review

Deep dive into Paragon's architecture: how Polarity Heavy, Planner Agent, Worker Fleet, and Sandbox Verifier work together to achieve 94% accuracy and 6x faster execution than competitors.