FlineDev/TandemKit

27 stars · Last commit 2026-05-02

Planner/Generator/Evaluator orchestration harness for Claude Code (and Codex)

README preview

<p align="center">
  <img src="https://github.com/FlineDev/TandemKit/blob/main/Logo.png?raw=true" height="256" />
  <br><br>
  <a href="#how-it-works">How It Works</a> · <a href="#installation">Installation</a> · <a href="#mission-lifecycle">Mission Lifecycle</a> · <a href="#faq">FAQ</a>
</p>

# TandemKit

Describe your goal, approve the spec, then step away — Claude and Codex loop together until it's right.

TandemKit is a [Claude Code](https://docs.anthropic.com/en/docs/claude-code) plugin that runs three sessions — Planner, Generator, and Evaluator — with two of them pairing Claude and Codex as independent reviewers. You are only needed at two points: during **planning** (questions and spec approval) and at **review** (when evaluation passes and you give feedback or call it done). Between those two points, the Generator implements and the Evaluator verifies in a tight loop, with no manual review or copy-pasting from you. In both the Planner and Evaluator sessions, Claude automatically launches [Codex](https://openai.com/index/introducing-codex/) as a background task using the official [Codex plugin](https://github.com/openai/codex-plugin-cc), so two different models independently investigate and converge on a result — everything inside Claude Code.
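The Generator/Evaluator loop described above can be pictured roughly as follows. This is a minimal illustrative sketch, not TandemKit's actual implementation: `run_mission`, `Verdict`, the `generate`/`evaluate` callables, and the `max_rounds` cap are all hypothetical names and assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Verdict:
    """Hypothetical evaluator result: pass/fail plus feedback for the next round."""
    passed: bool
    feedback: str = ""


def run_mission(
    spec: str,
    generate: Callable[[str, str], str],
    evaluate: Callable[[str, str], Verdict],
    max_rounds: int = 5,  # assumed safety cap, not a documented TandemKit setting
) -> Tuple[Optional[str], List[Verdict]]:
    """Alternate generate and evaluate until a verdict passes or rounds run out."""
    feedback = ""
    history: List[Verdict] = []
    for _ in range(max_rounds):
        result = generate(spec, feedback)   # Generator implements against the spec
        verdict = evaluate(spec, result)    # Evaluator verifies independently
        history.append(verdict)
        if verdict.passed:
            return result, history          # hand back to the human for review
        feedback = verdict.feedback         # feed findings into the next round
    return None, history


# Stand-in callables so the sketch runs on its own.
def demo_generate(spec: str, feedback: str) -> str:
    return spec.upper() if feedback else spec


def demo_evaluate(spec: str, result: str) -> Verdict:
    if result.isupper():
        return Verdict(True)
    return Verdict(False, "use upper case")


result, history = run_mission("add login", demo_generate, demo_evaluate)
```

In this toy run the first attempt fails evaluation, the feedback is passed back to the generator, and the second attempt passes. That mirrors the idea that no human input is needed between spec approval and review.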

## Why TandemKit?

### Who Is It For?

You have a **Claude Max** subscription (which includes Claude Code) and a **ChatGPT** subscription (which includes Codex). You work on tasks complex enough to warrant the extra cost — TandemKit is not recommended for simple, small, or mechanical tasks, since the multi-session loop uses more tokens than a regular Claude session.

### The Reasoning