DarwinLoop
Your AI agents, smarter with every user interaction
DarwinLoop is an open-source Python and Node.js library that detects implicit user feedback signals, computes prompt quality metrics, and autonomously evolves your agent's prompt — without touching your infrastructure, storing conversations, or requiring labeled data.
Prompt quality silently decays. Nothing in your stack catches it.
Prompts don't age well
Production agents accumulate failure modes silently. There's no alert when your prompt starts degrading.
The signal is there — and ignored
Every re-prompt, correction, and abandoned session is diagnostic data. Today it vanishes on every turn.
Manual review doesn't scale
Reading conversation logs to find failure patterns is economically irrational at production volume.
A closed loop from signal to safe deployment.
Intercept
Zero-latency I/O hook. Darwin wraps your agent without changes to its code. A single wrap call is the only instrumentation.
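To make the interception idea concrete, here is a minimal sketch of a wrapper that forwards each call unchanged and only enqueues the turn for later analysis. `InterceptedAgent` and `echo_agent` are hypothetical names used for illustration; this is not Darwin's actual implementation.

```python
import queue
from typing import Callable


class InterceptedAgent:
    """Hypothetical wrapper: forwards calls unchanged and records each turn."""

    def __init__(self, agent: Callable[[str], str]) -> None:
        self._agent = agent
        # Turns are only enqueued here; analysis would run off the request path.
        self.events: queue.Queue = queue.Queue()

    def __call__(self, user_input: str) -> str:
        output = self._agent(user_input)
        self.events.put_nowait((user_input, output))
        return output  # the caller sees exactly what the agent returned


def echo_agent(text: str) -> str:
    """Stand-in for a real agent (assumption for this example)."""
    return f"echo: {text}"


wrapped = InterceptedAgent(echo_agent)
reply = wrapped("hello")
```

The wrapped object is called exactly like the original agent, so the calling code does not need to change.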
Detect
Implicit signals extracted per turn: re-prompts, corrections, session abandonment. NFR, PFR, and AINPS computed in rolling windows.
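As an illustration of per-turn signal extraction, the sketch below flags a likely re-prompt when a user message closely resembles the previous one, then tracks the share of flagged turns in a rolling window. The similarity heuristic, the 0.8 cutoff, and the names `is_reprompt` and `RollingRate` are assumptions for this example; the definitions of NFR, PFR, and AINPS are not given here and are not reproduced.

```python
from collections import deque
from difflib import SequenceMatcher


def is_reprompt(previous: str, current: str, threshold: float = 0.8) -> bool:
    """Heuristic (assumption): near-duplicate consecutive messages are re-prompts."""
    ratio = SequenceMatcher(None, previous.lower(), current.lower()).ratio()
    return ratio >= threshold


class RollingRate:
    """Share of flagged turns over the most recent `window` turns."""

    def __init__(self, window: int) -> None:
        self._flags = deque(maxlen=window)  # old turns drop off automatically

    def add(self, flagged: bool) -> None:
        self._flags.append(flagged)

    @property
    def rate(self) -> float:
        return sum(self._flags) / len(self._flags) if self._flags else 0.0


reprompts = RollingRate(window=2)
reprompts.add(is_reprompt("how do I reset my password", "how do I reset my password?"))
reprompts.add(is_reprompt("how do I reset my password", "thanks, that worked"))
```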
Mutate
When metrics breach configurable thresholds, the Darwin agent generates a targeted prompt edit using GEPA — a reflective evolutionary algorithm. No labeled data required.
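The threshold trigger can be pictured as a simple comparison between current metric values and configured limits. This is a sketch under stated assumptions: the metric names, the limits, and the `breached` helper are hypothetical, and it assumes higher values are worse.

```python
from typing import Dict, List


def breached(metrics: Dict[str, float], max_allowed: Dict[str, float]) -> List[str]:
    """Return the names of metrics whose current value exceeds its limit."""
    return sorted(
        name
        for name, limit in max_allowed.items()
        if metrics.get(name, 0.0) > limit
    )


# Hypothetical metric names and limits, for illustration only.
current = {"reprompt_rate": 0.31, "abandonment_rate": 0.05}
limits = {"reprompt_rate": 0.25, "abandonment_rate": 0.10}

to_fix = breached(current, limits)  # a non-empty list would trigger a mutation
```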
Validate
Canary rollout with statistical significance testing. Auto-promote if better. Auto-rollback if worse.
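A canary rollout of this kind can be sketched in two parts: a deterministic rule that sends a fixed share of sessions to the candidate prompt, and a decision rule applied once a significance test has produced a p-value. The function names, the 0.05 significance level, and the use of a failure rate (lower is better) are assumptions for this example.

```python
import hashlib


def in_canary(session_id: str, canary_percent: int) -> bool:
    """Deterministically map a session to the canary group via a stable hash."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return bucket < canary_percent


def decide(baseline_failure_rate: float, canary_failure_rate: float,
           p_value: float, alpha: float = 0.05) -> str:
    """Promote or roll back only when the observed difference is significant."""
    if p_value >= alpha:
        return "continue"  # not enough evidence yet; keep collecting sessions
    if canary_failure_rate < baseline_failure_rate:
        return "promote"
    return "rollback"
```

Hashing the session ID keeps a given session on the same prompt version for its whole lifetime, which avoids mixing versions within one conversation.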
One wrap call. Your agent now evolves itself.
import darwin
agent = darwin.wrap(
    agent=my_agent,          # any LangChain, OpenAI, or Anthropic agent
    agent_id="support-bot",
    llm_api_key=OPENAI_KEY,  # BYOK: used by the mutation engine only
)
# That's it. Darwin runs in the background.
# Your agent now evolves autonomously.
Production-grade primitives, not a science project.
Privacy by design
Raw conversations never leave your process. Darwin stores hashes and anonymized summaries only.
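One way to store a hash instead of raw text is a keyed digest computed inside your own process. The sketch below uses HMAC-SHA256 from the Python standard library; the key handling, the `fingerprint` helper, and the record layout are assumptions for illustration, not Darwin's actual storage format.

```python
import hashlib
import hmac
import secrets

# Assumption: a per-deployment secret that never leaves the process.
SECRET_KEY = secrets.token_bytes(32)


def fingerprint(text: str, key: bytes = SECRET_KEY) -> str:
    """Keyed hash of a message: stable for deduplication, not reversible to text."""
    return hmac.new(key, text.encode("utf-8"), hashlib.sha256).hexdigest()


# Only the digest and the detected signal are kept; the raw text is discarded.
record = {
    "turn_hash": fingerprint("my order never arrived"),
    "signal": "re-prompt",
}
```

A keyed hash is used here rather than a plain SHA-256 so that short or common messages cannot be recovered by hashing guesses without the key.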
Three mutation levels
Injection (session-scoped), Soft Edit (auto-apply to new sessions), Hard Edit (GitHub PR for human approval).
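The three levels can be modeled as an enumeration with a routing step that decides how each edit is applied. The enum, the `route` function, and the returned action strings are hypothetical; they only restate the scopes described above.

```python
from enum import Enum


class MutationLevel(Enum):
    INJECTION = "injection"   # scoped to the current session
    SOFT_EDIT = "soft_edit"   # applied automatically to new sessions
    HARD_EDIT = "hard_edit"   # requires human approval through a pull request


def route(level: MutationLevel) -> str:
    """Map a mutation level to the action taken (illustrative labels only)."""
    actions = {
        MutationLevel.INJECTION: "apply to current session",
        MutationLevel.SOFT_EDIT: "apply to new sessions",
        MutationLevel.HARD_EDIT: "open pull request for review",
    }
    return actions[level]
```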
Local-first
SQLite by default. PostgreSQL/Supabase via pluggable interface. No SaaS dependency. pip install and go.
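A pluggable storage layer usually means a small interface with interchangeable backends. The sketch below defines a hypothetical `MetricStore` protocol and a SQLite-backed implementation using only the standard library; the interface, table schema, and method names are assumptions, not Darwin's actual API.

```python
import sqlite3
from typing import Optional, Protocol


class MetricStore(Protocol):
    """Hypothetical storage interface; a PostgreSQL backend would implement the same methods."""

    def record(self, agent_id: str, metric: str, value: float) -> None: ...
    def latest(self, agent_id: str, metric: str) -> Optional[float]: ...


class SQLiteStore:
    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS metrics ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "agent_id TEXT NOT NULL, metric TEXT NOT NULL, value REAL NOT NULL)"
        )

    def record(self, agent_id: str, metric: str, value: float) -> None:
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT INTO metrics (agent_id, metric, value) VALUES (?, ?, ?)",
                (agent_id, metric, value),
            )

    def latest(self, agent_id: str, metric: str) -> Optional[float]:
        row = self._conn.execute(
            "SELECT value FROM metrics WHERE agent_id = ? AND metric = ? "
            "ORDER BY id DESC LIMIT 1",
            (agent_id, metric),
        ).fetchone()
        return row[0] if row else None


store: MetricStore = SQLiteStore()  # in-memory database for the example
store.record("support-bot", "reprompt_rate", 0.12)
store.record("support-bot", "reprompt_rate", 0.08)
```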
Statistical validation
Fisher's exact test, Student's t-test, or threshold-based comparison. Configurable. Mutations are promoted only when the configured test shows the candidate is better.
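For readers unfamiliar with Fisher's exact test, the sketch below computes a one-sided p-value for a 2×2 table of failures and successes in the canary and baseline groups, using only `math.comb`. The function name and argument layout are assumptions for this example; it is a textbook formulation, not Darwin's internal code.

```python
from math import comb


def fisher_one_sided(canary_fail: int, canary_ok: int,
                     base_fail: int, base_ok: int) -> float:
    """P(canary failures <= observed) if both groups share one failure rate."""
    canary_n = canary_fail + canary_ok
    base_n = base_fail + base_ok
    total_fail = canary_fail + base_fail
    denom = comb(canary_n + base_n, total_fail)
    # Smallest feasible failure count in the canary group given the margins.
    k_min = max(0, total_fail - base_n)
    numer = sum(
        comb(canary_n, k) * comb(base_n, total_fail - k)
        for k in range(k_min, canary_fail + 1)
    )
    return numer / denom


# 1 failure in 4 canary sessions versus 3 failures in 4 baseline sessions.
p = fisher_one_sided(1, 3, 3, 1)  # 17/70, roughly 0.243: not significant at 0.05
```

In practice, `scipy.stats.fisher_exact` provides the same test, including a two-sided variant, for projects that already depend on SciPy.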
GEPA mutation engine
Prompt evolution via reflective reasoning + Pareto-optimal candidate selection. The algorithm that beat Claude Opus 4.1 on enterprise benchmarks at 90× lower cost.
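Pareto-optimal selection keeps every candidate that is not beaten on all objectives by some other candidate. The sketch below shows generic non-dominated filtering over per-task scores; the candidate names and scores are made up for illustration, and this is not a reproduction of GEPA's exact selection procedure.

```python
from typing import Dict, List, Tuple

Scores = Tuple[float, ...]


def dominates(a: Scores, b: Scores) -> bool:
    """True if `a` is at least as good as `b` everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: Dict[str, Scores]) -> List[str]:
    """Return the candidates that no other candidate dominates."""
    return [
        name
        for name, scores in candidates.items()
        if not any(
            dominates(other_scores, scores)
            for other, other_scores in candidates.items()
            if other != name
        )
    ]


# Hypothetical prompt variants scored on two tasks (higher is better).
variants = {"v1": (0.9, 0.4), "v2": (0.6, 0.8), "v3": (0.5, 0.3)}
front = pareto_front(variants)
```

Here `v3` is dropped because `v1` scores higher on both tasks, while `v1` and `v2` both survive because each is best on one task.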
"Caught a regression in our support agent we wouldn't have seen for weeks. The PR landed before standup."
Self-host today. Hosted Cloud when you need it.
Self-hosted
- Managed Darwin agent
- SQLite / Postgres
- Full mutation engine
- Community support
Hosted Cloud
- Team dashboard
- RBAC + multi-agent view
- Priority support
- Managed canary infra