shinpr/rashomon

9 stars · Last commit 2026-04-04

Measure prompt and skill improvements with blind A/B comparison.


<p align="center">
  <img src="assets/rashomon-banner.jpg" width="600" alt="Rashomon">
</p>

<p align="center">
  <a href="https://claude.ai/code"><img src="https://img.shields.io/badge/Claude%20Code-Plugin-purple" alt="Claude Code"></a>
  <a href="LICENSE"><img src="https://img.shields.io/badge/License-MIT-blue" alt="License"></a>
</p>

**Know whether your skills actually improve agent behavior — not just look different.**

## Why rashomon?

> Inspired by the *Rashomon effect*: the idea that the same event can be described in different, even contradictory, ways depending on perspective.
> rashomon makes those differences explicit and comparable.

- Built a skill but unsure if it actually changes agent behavior?
- Iterating on skills and prompts by gut feel instead of evidence?
- Want proof that your changes made things better, not just different?
