zeuikli/claude-pilot-suite

24 stars · Last commit 2026-05-08

Claude Code execution playbook with 3 pilot modes: cost-first (Haiku), quality-first (Sonnet), ceiling-elevation (Opus). Quantitative escalation gates.

PluginTDD & Testing Debugging Documentation Design & UI DevOps & Infra Productivity AI & Prompting Data & ML Security

README preview

# Claude Pilot Suite

> **Version**: 0.4.1 (2026-05-08 — opus-pilot added; 5 benchmark-driven fixes for haiku/sonnet-pilot)
> **License**: MIT
> **Validated on**: Claude Sonnet 4.6 + Haiku 4.5 + Opus 4.7 (6-agent × 10-question pilot-vs-vanilla benchmark)

A portable, opinionated execution playbook for Claude Code that provides:

- **Haiku 4.5 quality close to Opus 4.7** (Haiku Pilot mode) — with 70–85% cost savings
- **Sonnet 4.6 quality floor protection + predictable escalation** (Sonnet Pilot mode) — with 55–65% cost savings
- **Opus 4.7 ceiling elevation** (Opus Pilot mode) — 5 structural mechanisms to exceed vanilla Opus baseline on hard analytical tasks

**What "quality floor protection" means**: on structured wiki-extraction tasks, Sonnet+SKILL ≈ Sonnet Plain (both ~283/300 vs Opus 286/300 in A/B/C benchmark). The SKILL's value is in *preventing quality drops* on complex tasks (ambiguous requirements, multi-step agentic, cross-module design) — not in boosting already-strong performance on routine extraction. Think of it as insurance: no premium on good days, prevents catastrophic failure on hard days.

Built on the empirical finding that **prompt structure + sub-agent delegation often matters more than raw model capability**. Documented in two arXiv-backed studies:

> AgentOpt (arxiv 2604.06296): On HotpotQA, Opus alone = 31.71%; Ministral planner + Opus solver = 74.27%.
> Augment Code AGENTS.md eval (2026-04): Best-quality docs ≈ one model tier upgrade. Bad docs are worse than none.

---

View full repository on GitHub →