<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>The autonomous PR pipeline, explained</title>
<meta name="description" content="A plain-English tour of the autopilot: what it is, what it does for you, and how the four subsystems fit together." />
<link rel="preconnect" href="https://fonts.googleapis.com" />
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />
<link href="https://fonts.googleapis.com/css2?family=Source+Serif+4:opsz,wght@8..60,300..700&family=JetBrains+Mono:wght@400;500;700&display=swap" rel="stylesheet" />
<style>
:root {
--paper: #F7F1E2;
--paper-bright: #FBF6E8;
--ink: #1E1A14;
--ink-soft: #3C3528;
--ink-light: #6F6552;
--rule: #D4C9A8;
--rule-strong: #A89A75;
--accent: #8C2E25;
--accent-deep: #5E1F18;
--green: #4F5C36;
--maxw: 78ch;
--gutter: clamp(1.25rem, 3vw, 2.5rem);
}
*, *::before, *::after { box-sizing: border-box; }
html { -webkit-text-size-adjust: 100%; scroll-behavior: smooth; }
body { margin: 0; }
table { border-collapse: collapse; }
section { scroll-margin-top: 2rem; }
body {
background: var(--paper);
color: var(--ink);
font-family: 'Source Serif 4', Georgia, 'Times New Roman', serif;
font-size: 18.5px;
line-height: 1.62;
font-weight: 400;
text-rendering: optimizeLegibility;
-webkit-font-smoothing: antialiased;
overflow-x: hidden;
}
/* Subtle paper grain — fixed, very low opacity */
body::before {
content: "";
position: fixed;
inset: 0;
pointer-events: none;
z-index: 1000;
opacity: .035;
mix-blend-mode: multiply;
background-image:
url("data:image/svg+xml;utf8,<svg xmlns='http://www.w3.org/2000/svg' width='240' height='240'><filter id='n'><feTurbulence type='fractalNoise' baseFrequency='1.2' numOctaves='2' stitchTiles='stitch'/><feColorMatrix values='0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 .9 0'/></filter><rect width='100%25' height='100%25' filter='url(%23n)'/></svg>");
}
a { color: var(--accent-deep); text-decoration: underline; text-decoration-thickness: 1px; text-underline-offset: 2px; }
a:hover { color: var(--accent); }
::selection { background: var(--accent); color: var(--paper-bright); }
.doc {
width: min(100% - 2rem, calc(var(--maxw) + 5rem));
margin-inline: auto;
padding-inline: var(--gutter);
padding-block: clamp(2rem, 5vw, 3.5rem) 3rem;
}
/* ---------- Masthead ---------- */
.masthead {
display: flex;
justify-content: space-between;
align-items: baseline;
gap: 1rem;
padding-bottom: .9rem;
border-bottom: 1px solid var(--ink);
font-size: .72rem;
letter-spacing: .14em;
text-transform: uppercase;
color: var(--ink-soft);
font-weight: 500;
}
.masthead .right { color: var(--ink-light); }
/* ---------- Hero ---------- */
.hero {
margin-block: 2.5rem 3rem;
}
.hero h1 {
font-family: 'Source Serif 4', serif;
font-weight: 600;
font-size: clamp(2rem, 4.5vw, 2.95rem);
line-height: 1.12;
letter-spacing: -0.014em;
margin: 0 0 1.2rem 0;
color: var(--ink);
max-width: 28ch;
}
.hero .standfirst {
font-family: 'Source Serif 4', serif;
font-style: italic;
font-weight: 400;
font-size: 1.12rem;
line-height: 1.5;
color: var(--ink-soft);
max-width: 60ch;
margin: 0;
}
.hero .meta {
margin-top: 1.75rem;
padding-top: 1.1rem;
border-top: 1px solid var(--rule);
display: grid;
grid-template-columns: max-content 1fr;
column-gap: 1.5rem;
row-gap: .35rem;
font-size: .85rem;
max-width: 60ch;
}
.hero .meta dt {
font-family: 'Source Serif 4', serif;
font-size: .68rem;
letter-spacing: .14em;
text-transform: uppercase;
color: var(--ink-light);
font-weight: 600;
align-self: baseline;
padding-top: .15rem;
}
.hero .meta dd {
font-family: 'JetBrains Mono', ui-monospace, monospace;
font-size: .8rem;
margin: 0;
color: var(--ink);
overflow-wrap: anywhere;
word-break: break-word;
}
/* ---------- Sections ---------- */
section { margin-block: 3.5rem; }
.section-head {
margin-bottom: 1.5rem;
}
.section-head .label {
font-family: 'Source Serif 4', serif;
font-size: .68rem;
letter-spacing: .22em;
text-transform: uppercase;
color: var(--accent);
font-weight: 600;
margin-bottom: .35rem;
}
.section-head h2 {
font-family: 'Source Serif 4', serif;
font-weight: 600;
font-size: 1.55rem;
line-height: 1.22;
letter-spacing: -0.006em;
margin: 0 0 .65rem 0;
color: var(--ink);
max-width: 36ch;
}
.section-head::after {
content: "";
display: block;
width: 2.5rem;
height: 1px;
background: var(--ink);
margin-top: .8rem;
}
h3 {
font-family: 'Source Serif 4', serif;
font-weight: 600;
font-size: 1.12rem;
margin: 2.1rem 0 .65rem 0;
color: var(--ink);
}
p { margin: 0 0 .95em; }
p strong { font-weight: 600; color: var(--ink); }
em { font-style: italic; }
/* Inline code */
code {
font-family: 'JetBrains Mono', ui-monospace, monospace;
font-size: .82em;
background: var(--paper-bright);
padding: .08em .35em;
border: 1px solid var(--rule);
border-radius: 2px;
color: var(--accent-deep);
overflow-wrap: anywhere;
}
/* Block code */
pre {
font-family: 'JetBrains Mono', ui-monospace, monospace;
font-size: .76rem;
line-height: 1.55;
background: var(--paper-bright);
border: 1px solid var(--rule);
border-left: 2px solid var(--ink);
padding: .9rem 1.1rem;
margin: 1.25rem 0;
overflow-x: auto;
color: var(--ink-soft);
}
pre code { background: none; border: 0; padding: 0; color: inherit; }
/* Lists */
ul, ol { margin: 0 0 1em 1.3em; padding: 0; }
li { margin-bottom: .3em; }
ul li::marker { color: var(--accent); }
/* Blockquote — pull-quote / John conversation */
blockquote {
margin: 1.5rem 0;
padding: 0 0 0 1.5rem;
border-left: 2px solid var(--accent);
font-style: italic;
font-size: 1rem;
line-height: 1.55;
color: var(--ink-soft);
}
blockquote p:last-child { margin-bottom: 0; }
/* ---------- Tables ---------- */
.table-wrap {
margin: 1.4rem 0;
overflow-x: auto;
border-top: 1.5px solid var(--ink);
border-bottom: 1.5px solid var(--ink);
}
table {
width: 100%;
font-size: .9rem;
line-height: 1.45;
}
thead tr { border-bottom: 1px solid var(--ink); }
th {
text-align: left;
padding: .7rem .8rem;
font-family: 'Source Serif 4', serif;
font-weight: 600;
font-size: .68rem;
letter-spacing: .14em;
text-transform: uppercase;
color: var(--ink);
vertical-align: bottom;
}
td {
padding: .65rem .8rem;
border-bottom: 1px solid var(--rule);
vertical-align: top;
color: var(--ink-soft);
}
td.num { font-family: 'JetBrains Mono', monospace; text-align: right; font-size: .85rem; color: var(--ink); }
td strong { color: var(--ink); }
td em { color: var(--ink-light); }
tbody tr:last-child td { border-bottom: 0; }
/* ---------- The four pillars (mental model diagram) ---------- */
.pillars {
margin: 1.5rem 0 1.75rem;
display: grid;
grid-template-columns: repeat(4, 1fr);
gap: 0;
border-top: 1.5px solid var(--ink);
border-bottom: 1.5px solid var(--ink);
background: var(--paper-bright);
}
.pillar {
padding: 1.1rem 1rem 1.2rem;
border-right: 1px solid var(--rule);
}
.pillar:last-child { border-right: 0; }
.pillar .stage {
font-family: 'Source Serif 4', serif;
font-weight: 600;
font-size: .62rem;
letter-spacing: .18em;
text-transform: uppercase;
color: var(--ink-light);
margin-bottom: .45rem;
}
.pillar .name {
font-family: 'Source Serif 4', serif;
font-weight: 600;
font-size: 1rem;
color: var(--ink);
margin-bottom: .3rem;
}
.pillar .qa {
font-style: italic;
font-size: .8rem;
line-height: 1.4;
color: var(--accent-deep);
margin-bottom: .7rem;
}
.pillar ul {
margin: 0;
padding: 0;
list-style: none;
font-size: .76rem;
line-height: 1.45;
color: var(--ink-soft);
}
.pillar ul li { padding-left: .85rem; position: relative; margin-bottom: .2rem; }
.pillar ul li::before { content: "—"; position: absolute; left: 0; color: var(--rule-strong); }
.pillar ul li code { font-size: .72rem; }
@media (max-width: 760px) {
.pillars { grid-template-columns: 1fr; }
.pillar { border-right: 0; border-bottom: 1px solid var(--rule); }
}
/* ---------- Group tree ---------- */
.tree {
margin: 1.5rem 0;
padding: 1rem 1.25rem;
background: var(--paper-bright);
border: 1px solid var(--rule);
border-left: 2px solid var(--ink);
font-family: 'JetBrains Mono', monospace;
font-size: .8rem;
line-height: 1.6;
color: var(--ink-soft);
}
.tree .root { color: var(--ink-light); }
.tree .inst { color: var(--ink); font-weight: 700; }
.tree .unit { color: var(--ink-soft); }
.tree .unit.john { color: var(--accent); font-weight: 700; }
.tree .annot {
color: var(--ink-light);
font-style: italic;
font-family: 'Source Serif 4', serif;
font-size: .76rem;
padding-left: .5rem;
}
.tree .annot.john { color: var(--accent); }
/* ---------- Scenario / timeline ---------- */
.scenario {
margin: 1.5rem 0;
padding: 1rem 1.25rem;
background: var(--paper-bright);
border: 1px solid var(--rule);
border-left: 2px solid var(--accent);
font-family: 'JetBrains Mono', monospace;
font-size: .78rem;
line-height: 1.6;
color: var(--ink-soft);
}
.scenario .stamp { color: var(--accent); font-weight: 700; }
.scenario .arrow { color: var(--rule-strong); }
.scenario .ok { color: var(--green); font-weight: 700; }
.scenario .stale { color: var(--accent); font-weight: 700; }
/* ---------- Big number cards ---------- */
.figures {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(160px, 1fr));
gap: .85rem;
margin: 1.5rem 0;
}
.figure {
padding: 1rem 1.1rem .95rem;
background: var(--paper-bright);
border: 1px solid var(--rule);
border-top: 2px solid var(--rule-strong);
}
.figure.legacy { border-top-color: var(--accent); }
.figure.modern { border-top-color: var(--green); }
.figure .big {
font-family: 'Source Serif 4', serif;
font-weight: 500;
font-size: 2.4rem;
line-height: 1;
letter-spacing: -.02em;
color: var(--ink);
}
.figure.legacy .big { color: var(--accent); }
.figure.modern .big { color: var(--green); }
.figure .label {
margin-top: .45rem;
font-family: 'Source Serif 4', serif;
font-weight: 600;
font-size: .66rem;
letter-spacing: .14em;
text-transform: uppercase;
color: var(--ink-light);
}
.figure .desc {
margin-top: .35rem;
font-style: italic;
font-size: .82rem;
line-height: 1.4;
color: var(--ink-soft);
}
/* ---------- TL;DR ---------- */
.tldr {
margin: 1rem 0 0;
padding: 1.5rem 1.75rem;
background: var(--ink);
color: var(--paper-bright);
}
.tldr p {
margin: 0;
font-size: .98rem;
line-height: 1.6;
color: var(--paper-bright);
}
.tldr p strong {
color: #E5C9B5;
font-weight: 600;
}
/* ---------- Footer ---------- */
footer.colophon {
margin-top: 4rem;
padding-top: 1.1rem;
border-top: 1px solid var(--ink);
display: flex;
justify-content: space-between;
font-size: .68rem;
letter-spacing: .14em;
text-transform: uppercase;
color: var(--ink-light);
font-weight: 500;
}
/* Print */
@media print {
body { background: white; font-size: 11pt; }
body::before { display: none; }
.tldr { background: white; color: var(--ink); border: 1px solid var(--ink); }
.tldr p, .tldr p strong { color: var(--ink); }
.doc { width: auto; padding: 0 1.5cm; }
section, .pillars { break-inside: avoid; }
}
@media (max-width: 540px) {
body { font-size: 16px; }
.hero h1 { font-size: 1.65rem; }
.pillars { grid-template-columns: 1fr; }
}
/* ---------- Right-rail TOC (desktop) ---------- */
.toc {
position: fixed;
top: 50%;
right: 1.5rem;
transform: translateY(-50%);
width: 12.5rem;
max-height: calc(100vh - 4rem);
overflow-y: auto;
padding: 1rem 1.1rem;
background: var(--paper-bright);
border: 1px solid var(--rule);
border-left: 2px solid var(--accent);
z-index: 50;
font-size: .78rem;
line-height: 1.4;
}
.toc .toc-label {
font-family: 'Source Serif 4', serif;
font-size: .6rem;
letter-spacing: .2em;
text-transform: uppercase;
color: var(--ink-light);
font-weight: 600;
margin-bottom: .55rem;
padding-bottom: .45rem;
border-bottom: 1px solid var(--rule);
}
.toc ol {
list-style: none;
padding: 0;
margin: 0;
counter-reset: toc;
}
.toc ol li {
counter-increment: toc;
position: relative;
padding: .35rem 0 .35rem 1.5rem;
border-top: 1px solid var(--rule);
}
.toc ol li:first-child { border-top: 0; padding-top: .15rem; }
.toc ol li::before {
content: counter(toc, upper-roman) ".";
position: absolute;
left: 0;
top: .35rem;
font-family: 'Source Serif 4', serif;
font-weight: 600;
font-size: .7rem;
color: var(--accent);
}
.toc ol li:first-child::before { top: .15rem; }
.toc a {
color: var(--ink-soft);
text-decoration: none;
display: block;
}
.toc a:hover { color: var(--accent); }
/* Hide rail when viewport is too narrow to fit it next to the column */
@media (max-width: 1380px) {
.toc { display: none; }
}
/* ---------- Inline mini-TOC (mobile / narrow) ---------- */
.toc-inline {
display: none;
margin: 1.5rem 0 0;
padding: .9rem 1.1rem;
background: var(--paper-bright);
border: 1px solid var(--rule);
border-left: 2px solid var(--accent);
font-size: .82rem;
}
.toc-inline .toc-label {
font-family: 'Source Serif 4', serif;
font-size: .6rem;
letter-spacing: .2em;
text-transform: uppercase;
color: var(--ink-light);
font-weight: 600;
margin-bottom: .5rem;
}
.toc-inline ol {
list-style: none;
padding: 0;
margin: 0;
column-count: 2;
column-gap: 1rem;
}
.toc-inline ol li {
break-inside: avoid;
padding: .15rem 0;
}
.toc-inline a {
color: var(--ink-soft);
text-decoration: none;
}
.toc-inline a:hover { color: var(--accent); }
@media (max-width: 1380px) {
.toc-inline { display: block; }
}
@media (max-width: 480px) {
.toc-inline ol { column-count: 1; }
}
/* ---------- Back-to-top floater ---------- */
.top-link {
position: fixed;
right: 1.25rem;
bottom: 1.25rem;
z-index: 60;
display: inline-flex;
align-items: center;
gap: .35rem;
padding: .45rem .75rem .5rem;
background: var(--ink);
color: var(--paper-bright);
text-decoration: none;
font-family: 'Source Serif 4', serif;
font-size: .68rem;
letter-spacing: .14em;
text-transform: uppercase;
font-weight: 600;
border: 0;
box-shadow: 0 6px 18px -8px rgba(30,26,20,.45);
opacity: 0;
pointer-events: none;
transition: opacity .2s ease;
}
body.scrolled .top-link { opacity: 1; pointer-events: auto; }
.top-link:hover { background: var(--accent); color: var(--paper-bright); }
@media print { .top-link, .toc, .toc-inline { display: none !important; } }
</style>
</head>
<body id="top">
<!-- ─────────── Right-rail TOC (desktop >1380px) ─────────── -->
<nav class="toc" aria-label="Sections">
<div class="toc-label">On this page</div>
<ol>
<li><a href="#s1">What the autopilot is</a></li>
<li><a href="#s2">The four moving parts</a></li>
<li><a href="#s3">Life of an issue</a></li>
<li><a href="#s4">How it stays safe</a></li>
<li><a href="#s5">What it will not do</a></li>
<li><a href="#s6">What just shipped</a></li>
<li><a href="#s7">What is next</a></li>
</ol>
</nav>
<a href="#top" class="top-link" aria-label="Back to top">↑ Top</a>
<div class="doc">
<!-- ─────────── Masthead ─────────── -->
<header class="masthead">
<div>Reference brief · autonomous PR pipeline</div>
<div class="right">2026-05-08</div>
</header>
<!-- ─────────── Hero ─────────── -->
<section class="hero">
<h1>The autopilot, in plain English.</h1>
<p class="standfirst">An autonomous engineer that reads your repo's new issues, decides which ones it can actually fix, opens a draft pull request with the fix, and posts an adversarial review on every open PR — without ever auto-merging anything.</p>
<dl class="meta">
<dt>Repo</dt><dd>MCP_eRegistrations_BPA</dd>
<dt>Status</dt><dd>v1 implementation complete · 4 plans · 60+ commits</dd>
<dt>Tests</dt><dd>257 pipeline · 1178 monorepo (no regressions)</dd>
<dt>Runtime</dt><dd>any Linux host · systemd · Python 3.11+ · gh + Claude CLIs</dd>
</dl>
<!-- Inline mini-TOC (visible ≤1380px). Mirror the right-rail entries above. -->
<nav class="toc-inline" aria-label="Sections (mobile)">
<div class="toc-label">Contents</div>
<ol>
<li>I. <a href="#s1">What the autopilot is</a></li>
<li>II. <a href="#s2">The four moving parts</a></li>
<li>III. <a href="#s3">Life of an issue</a></li>
<li>IV. <a href="#s4">How it stays safe</a></li>
<li>V. <a href="#s5">What it will not do</a></li>
<li>VI. <a href="#s6">What just shipped</a></li>
<li>VII. <a href="#s7">What is next</a></li>
</ol>
</nav>
</section>
<div class="tldr">
<p><strong>The one-line version.</strong> The autopilot watches your GitHub issues every five minutes; for the ones it understands, it writes a spec, writes failing tests, writes the fix, opens a draft PR with the full audit trail, and reviews every PR — yours and its own — for blocking issues. You always make the merge call. It cannot.</p>
</div>
<!-- ═══════════════ §1 ═══════════════ -->
<section aria-labelledby="s1">
<header class="section-head">
<div class="label">Section I</div>
<h2 id="s1">What the autopilot is, and why it exists</h2>
</header>
<p>Imagine a junior engineer who never sleeps, reads every new issue within five minutes, and either (a) writes you a thoughtful triage comment explaining what they think is going on, or (b) goes ahead and opens a draft pull request that fixes it. That junior engineer does not push to <code>main</code>. They do not deploy. They do not write to production databases. Every pull request they open is a draft, with a long audit trail explaining the reasoning, the tests they wrote, and the adversarial findings their three-member internal review council surfaced.</p>
<p>That is the autopilot. The metaphor is deliberate: like an aviation autopilot, it operates within a bounded scope, the human stays in command, and a single yoke-twitch (a panic-close script) disengages everything.</p>
<p>It runs on any Linux host with systemd, Python 3.11+, the <code>gh</code> CLI, and the Claude CLI installed. It takes input only from this repository's GitHub issues and pull requests, and it uses Claude Code as its reasoning engine. Its job is to be cautious, slow, and verbose. Your job is to read the draft PRs and decide which ones to merge.</p>
<p>A note on naming you will see in the code: the Python package, systemd unit names, label prefixes, and state directory all carry the historical <code>erbot</code> prefix from the working name during development. The system itself does not depend on that prefix; the <code>autopilot</code> name in this brief is the one to use in conversation. Renaming the code identifiers is an open Plan 5+ refactor candidate.</p>
<h3>The problem it solves</h3>
<p>The MCP eRegistrations stack has roughly 200 tools across six MCP servers (BPA, DS, GDB, Keycloak, SmartLink, Translations). Most of the issues filed against the repo fall into a few patterns: a small bug in a tool, a missing field on a response, a docstring drift, a flaky test. Each one takes 15–60 minutes of human attention to triage and fix. Over a busy week that adds up.</p>
<p>The autopilot's hypothesis is that <strong>the boring 70% of those issues are mechanical enough for an LLM to handle reliably</strong>, provided the system around the LLM is rigorous: a fixed pipeline, mandatory adversarial review, hard halt-on-suspicion gates, and a structural inability to do damage. The other 30% — anything risky, ambiguous, or production-touching — are explicitly out of scope and the bot halts with a labeled comment for you.</p>
<h3>What you get</h3>
<ul>
<li><strong>Fast triage</strong> on every new issue, posted as a structured comment within ~10 minutes of filing.</li>
<li><strong>Draft PRs</strong> for issues the bot can fix on its own, capped at 3 per 26-hour window so they remain reviewable by hand.</li>
<li><strong>Adversarial PR review</strong> on every open PR, including your own — three personas (skeptic, backward-compat, edge-hunter) post one structured comment per PR.</li>
<li><strong>A daily digest</strong> at 06:00 UTC summarising yesterday's activity, halts, and token use.</li>
<li><strong>Loud failure</strong> — if anything halts, you get a Discord ping and an issue label that tells you exactly which gate tripped.</li>
</ul>
</section>
<!-- ═══════════════ §2 ═══════════════ -->
<section aria-labelledby="s2">
<header class="section-head">
<div class="label">Section II</div>
<h2 id="s2">The four moving parts</h2>
</header>
<p>The bot is divided into four subsystems. Each one runs on its own systemd timer, has a single clear job, and can be stopped independently if it misbehaves.</p>
<div class="pillars">
<div class="pillar">
<div class="stage">Subsystem A</div>
<div class="name">Watcher &amp; Picker</div>
<div class="qa">"Which issue should we work on next?"</div>
<ul>
<li>Lists open issues every 5 min</li>
<li>Skips already-triaged ones</li>
<li>Picker chooses the highest-severity, smallest-scale triaged issue with a <code>PROCEED</code> recommendation and dispatches the executor</li>
</ul>
</div>
<div class="pillar">
<div class="stage">Subsystem B</div>
<div class="name">Triage</div>
<div class="qa">"What is this issue actually about?"</div>
<ul>
<li>Eight sub-agents in parallel</li>
<li>Extracts claims, verifies each one against the codebase, scores duplicates, classifies hard/soft constraints</li>
<li>Posts a single structured comment per issue</li>
</ul>
</div>
<div class="pillar">
<div class="stage">Subsystem C</div>
<div class="name">Executor</div>
<div class="qa">"Can we actually fix this?"</div>
<ul>
<li>11-phase pipeline per issue</li>
<li>Eight named adversarial agents debate the spec</li>
<li>Live-API probe + regression test before the diff lands</li>
<li>Opens the draft PR with the full audit trail</li>
</ul>
</div>
<div class="pillar">
<div class="stage">Subsystem D</div>
<div class="name">Reviewer</div>
<div class="qa">"Is this PR safe to merge?"</div>
<ul>
<li>Runs every 5 min on every open PR</li>
<li>Light mode on bot's own drafts (sanity check only)</li>
<li>Full mode on human PRs (3 PR personas + project-specific gates)</li>
<li>Posts <code>REQUEST_CHANGES</code>, <code>COMMENT_ONLY</code>, or <code>APPROVE</code> — never blocks merging</li>
</ul>
</div>
</div>
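<p>The picker's ordering rule (highest severity first, smallest scale second) can be sketched as a single sort key. This is a minimal illustration, not the shipped code: the <code>Candidate</code> type, the severity names in <code>SEVERITY_RANK</code>, and the integer <code>scale</code> estimate are all assumptions introduced here.</p>

```python
from dataclasses import dataclass

# Hypothetical severity ranking; the real label names are not given in this brief.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass(frozen=True)
class Candidate:
    number: int      # GitHub issue number
    severity: str    # key into SEVERITY_RANK
    scale: int       # assumed size estimate, e.g. files likely touched


def pick_next(candidates):
    """Return the highest-severity, smallest-scale candidate, or None if the queue is empty."""
    if not candidates:
        return None
    # Severity first, then scale, then issue number as a deterministic tie-break.
    return min(candidates, key=lambda c: (SEVERITY_RANK[c.severity], c.scale, c.number))


queue = [Candidate(40, "low", 1), Candidate(42, "high", 3), Candidate(43, "high", 2)]
chosen = pick_next(queue)  # issue 43: ties with 42 on severity, wins on scale
```

<p>Using the issue number as the last element of the key keeps the choice deterministic when two issues tie on both severity and scale.</p>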
<p>The four subsystems share one global lock (<code>flock</code> on <code>/var/lock/erbot.lock</code>) so only one of them is doing work at any moment. This is invariant <em>I3</em> in the spec — single-writer enforcement. It is the cheapest, most reliable way to keep the bot from racing with itself.</p>
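<p>A single-writer lock of this kind can be sketched with Python's standard <code>fcntl.flock</code>. The context manager name <code>single_writer</code> is hypothetical, and the choice of a non-blocking lock (a tick that cannot get the lock gives up instead of queueing) is an assumption; the brief only states that one <code>flock</code> is shared.</p>

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def single_writer(lock_path):
    """Hold an exclusive, non-blocking flock for the duration of the with-block.

    Raises BlockingIOError if the lock is already held elsewhere, so a timer
    tick can exit quietly instead of waiting behind another subsystem.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

<p>A subsystem entry point would wrap its whole tick in <code>with single_writer("/var/lock/erbot.lock"):</code> and treat <code>BlockingIOError</code> as "someone else is working, try again next tick". The kernel releases the lock if the process dies, so a crashed worker cannot leave it stuck.</p>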
<h3>Where each piece lives</h3>
<div class="tree">
<span class="root">/var/lib/erbot/</span>
<br>├── <span class="inst">repo/</span> <span class="annot">this repository, fresh-pulled before every issue</span>
<br>├── <span class="inst">state/</span> <span class="annot">per-issue token counters, idempotency lock files</span>
<br>│ ├── <span class="unit">heartbeat.timestamp</span> <span class="annot">touched every watcher tick</span>
<br>│ ├── <span class="unit">&lt;issue-N&gt;/usage.json</span> <span class="annot">running input/output token totals</span>
<br>└── <span class="inst">logs/</span> <span class="annot">journald → graylog forwarder</span>
<br><br><span class="root">/etc/erbot/</span>
<br>├── <span class="unit">secrets.env</span> <span class="annot">Anthropic API key + GitHub PAT + Discord webhook</span>
<br>├── <span class="unit">config.yaml</span> <span class="annot">live-prober allowlist, digest issue, write caps</span>
<br>└── <span class="unit">expected-plugins.sha256</span> <span class="annot">checked on every cold-start by erbot-doctor</span>
</div>
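<p>To make the per-issue <code>usage.json</code> counter concrete, here is a rough sketch of how running totals might be updated and checked against the 200k input / 50k output cap quoted in the halt table. The JSON field names, the <code>issue-N</code> directory naming, and both function names are assumptions, not the actual on-disk format.</p>

```python
import json
from pathlib import Path

# Caps quoted in this brief's halt table; the JSON field names below are assumed.
MAX_INPUT_TOKENS = 200_000
MAX_OUTPUT_TOKENS = 50_000


def record_usage(state_dir, issue, input_tokens, output_tokens):
    """Add one call's token counts to the issue's running totals and return them."""
    path = Path(state_dir) / f"issue-{issue}" / "usage.json"
    totals = {"input_tokens": 0, "output_tokens": 0}
    if path.exists():
        totals = json.loads(path.read_text())
    totals["input_tokens"] += input_tokens
    totals["output_tokens"] += output_tokens
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(totals))
    return totals


def over_budget(totals):
    """True once either running total has passed its cap."""
    return totals["input_tokens"] > MAX_INPUT_TOKENS or totals["output_tokens"] > MAX_OUTPUT_TOKENS
```

<p>A caller would check <code>over_budget</code> after every model call and raise the <code>auto:halt:budget-tokens</code> halt as soon as it returns true.</p>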
</section>
<!-- ═══════════════ §3 ═══════════════ -->
<section aria-labelledby="s3">
<header class="section-head">
<div class="label">Section III</div>
<h2 id="s3">Life of an issue, end to end</h2>
</header>
<p>Suppose a user files issue <strong>#42</strong>: <em>"effects[-1] returns the wrong row when two effects share a component."</em> Here is what the bot does, step by step, over the next ~25 minutes.</p>
<div class="scenario">
<span class="stamp">T+0:00</span> <span class="arrow">→</span> user files issue #42
<br><span class="stamp">T+0:05</span> <span class="arrow">→</span> watcher picks it up, dispatches triage
<br><span class="stamp">T+0:08</span> <span class="arrow">→</span> triage posts the canonical 7-section comment
<br><span class="stamp"> </span> verified claims (3) · possible duplicates (0) · constraints (1 hard) · recommendation: PROCEED
<br><span class="stamp">T+0:10</span> <span class="arrow">→</span> picker sees a clean PROCEED, dispatches executor
<br><span class="stamp">T+0:11</span> <span class="arrow">→</span> Phase 1 INTAKE <span class="ok">·</span> injection check passes
<br><span class="stamp">T+0:12</span> <span class="arrow">→</span> Phase 2 VERIFY <span class="ok">·</span> 3/3 claims VERIFIED against the repo
<br><span class="stamp">T+0:13</span> <span class="arrow">→</span> Phase 3 CHALLENGE <span class="ok">·</span> premise-challenger finds no foundational error
<br><span class="stamp">T+0:14</span> <span class="arrow">→</span> Phase 4 COUNCIL <span class="ok">·</span> 3 personas debate · 1 medium-severity finding (callers grep)
<br><span class="stamp">T+0:15</span> <span class="arrow">→</span> Phase 5 SPEC <span class="ok">·</span> invariants + edge cases + risks captured
<br><span class="stamp">T+0:17</span> <span class="arrow">→</span> Phase 6 TEST_FIRST <span class="ok">·</span> test-author writes 2 failing tests
<br><span class="stamp">T+0:18</span> <span class="arrow">→</span> Phase 7 PLAN <span class="ok">·</span> orchestrator maps spec to a single task
<br><span class="stamp">T+0:21</span> <span class="arrow">→</span> Phase 8 IMPLEMENT <span class="ok">·</span> diff applied · branch bot/issue-42-effects-wrong-row created
<br><span class="stamp"> </span> live-prober: bpa connection_status against jamaica · OK
<br><span class="stamp"> </span> regression-runner: bpa-regression-test · OK
<br><span class="stamp">T+0:22</span> <span class="arrow">→</span> Phase 9 SELF_REVIEW <span class="ok">·</span> diff matches spec
<br><span class="stamp">T+0:23</span> <span class="arrow">→</span> Phase 10 DRAFT_PR <span class="ok">·</span> branch pushed · draft PR #341 opened
<br><span class="stamp">T+0:23</span> <span class="arrow">→</span> Phase 11 CLOSE <span class="ok">·</span> comment on issue · auto:pr-opened label
<br><span class="stamp">T+0:25</span> <span class="arrow">→</span> reviewer tick · light review on PR #341 · COMMENT_ONLY (clean)
<br><span class="stamp">T+next</span> <span class="arrow">→</span> <span class="ok">your turn — read PR #341 and merge it (or not)</span>
</div>
<p>Two things matter about this timeline. First, every step writes a comment or a label that you can read after the fact, so there is no opaque "the bot did it" black box. Second, every phase has an explicit halt label: if Phase 4's council raises a blocking-severity finding, the bot stops at <code>auto:halt:council-blocking-finding</code> with the council's full transcript in the comment. You read it, then either lift the halt with the <code>auto:proceed-anyway</code> label or close the issue with <code>auto:reject</code>.</p>
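<p>The "run in order, stop at the first halt" behaviour can be sketched as a small loop over the eleven phase names from the timeline. The <code>run_pipeline</code> function and the idea that each phase handler returns either <code>None</code> or a halt label are assumptions made for illustration; the real executor's interfaces are not described in this brief.</p>

```python
from enum import Enum


class Phase(Enum):
    INTAKE = 1
    VERIFY = 2
    CHALLENGE = 3
    COUNCIL = 4
    SPEC = 5
    TEST_FIRST = 6
    PLAN = 7
    IMPLEMENT = 8
    SELF_REVIEW = 9
    DRAFT_PR = 10
    CLOSE = 11


def run_pipeline(handlers):
    """Run phases in definition order; stop at the first handler that returns a halt label.

    `handlers` maps Phase -> callable returning None (continue) or a halt
    label string. Returns (last_phase_reached, halt_label_or_None).
    """
    last = None
    for phase in Phase:
        last = phase
        halt = handlers[phase]()
        if halt is not None:
            return last, halt
    return last, None


# Example: every phase passes except the council, which raises a blocking finding.
handlers = {p: (lambda: None) for p in Phase}
handlers[Phase.COUNCIL] = lambda: "auto:halt:council-blocking-finding"
result = run_pipeline(handlers)
```

<p>Because the loop returns as soon as a label appears, no later phase (and in particular neither IMPLEMENT nor DRAFT_PR) can run after a halt.</p>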
<h3>The halt vocabulary</h3>
<p>There are exactly seventeen reasons the bot will stop. These are written down in code as a frozen set, so a future change can never accidentally introduce a new failure mode the resume protocol does not understand. A few examples:</p>
<div class="table-wrap">
<table>
<thead><tr><th>Halt reason</th><th>Meaning</th><th>You typically</th></tr></thead>
<tbody>
<tr><td><code>auto:halt:verifier-disagrees</code></td><td>One of the claims in the issue body could not be verified against the code</td><td>Read the verdict; clarify the issue</td></tr>
<tr><td><code>auto:halt:hard-constraint-unverified</code></td><td>A "must not break" constraint has no file:line citation</td><td>Add the citation, re-trigger</td></tr>
<tr><td><code>auto:halt:council-blocking-finding</code></td><td>One of the three personas found a fatal flaw</td><td>Read the council debate; decide</td></tr>
<tr><td><code>auto:halt:diff-mismatches-spec</code></td><td>Self-reviewer says the diff does not implement the spec</td><td>Investigate; usually the spec was wrong</td></tr>
<tr><td><code>auto:halt:test-instance-unsafe</code></td><td>A write tool was about to fire against a non-allowlisted instance</td><td><strong>Page yourself.</strong> Stop all timers manually</td></tr>
<tr><td><code>auto:halt:budget-tokens</code></td><td>Issue ran past the per-issue 200k input / 50k output cap</td><td>Tune the cap if legit; investigate if not</td></tr>
<tr><td><code>auto:halt:state-corrupted</code></td><td>The hidden state-marker SHA-256 does not match the current issue body</td><td>Someone edited a bot comment; investigate</td></tr>
</tbody>
</table>
</div>
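<p>A frozen halt vocabulary might look roughly like the sketch below. It lists only the nine labels that this brief names (the seven in the table plus the two mentioned in Sections IV and V); the real set is stated to have seventeen members, and the remaining eight are deliberately not guessed at. The helper name <code>halt_label</code> is hypothetical.</p>

```python
# Only the labels named in this brief; the full set is said to have seventeen members.
KNOWN_HALT_LABELS = frozenset({
    "auto:halt:verifier-disagrees",
    "auto:halt:hard-constraint-unverified",
    "auto:halt:council-blocking-finding",
    "auto:halt:diff-mismatches-spec",
    "auto:halt:test-instance-unsafe",
    "auto:halt:budget-tokens",
    "auto:halt:state-corrupted",
    "auto:halt:prompt-injection-detected",
    "auto:halt:translation-domain",
})


def halt_label(reason):
    """Return the label for `reason`, refusing anything outside the frozen set."""
    label = f"auto:halt:{reason}"
    if label not in KNOWN_HALT_LABELS:
        raise ValueError(f"unknown halt reason: {reason!r}")
    return label
```

<p>Routing every halt through one function like this means a misspelled or newly invented reason fails immediately instead of producing a label the resume protocol has never seen.</p>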
</section>
<!-- ═══════════════ §4 ═══════════════ -->
<section aria-labelledby="s4">
<header class="section-head">
<div class="label">Section IV</div>
<h2 id="s4">How it stays safe</h2>
</header>
<p>The single biggest design constraint of the bot is this: <strong>even if Claude is fully prompt-injected, the worst-case outcome must still be bounded.</strong> The bot does not trust its own LLM. Three independent layers of mechanical enforcement sit between the agent and any production system.</p>
<div class="figures">
<div class="figure legacy">
<div class="big">I.</div>
<div class="label">Credential air-gap</div>
<div class="desc">erdev never holds credentials for any non-allowlisted instance. If the credential is not there, no agent — no matter how clever — can use it.</div>
</div>
<div class="figure">
<div class="big">II.</div>
<div class="label">auth_login refusal</div>
<div class="desc">Every <code>*_auth_login</code> call across the six MCP servers reads <code>/etc/erbot/allowlist.yaml</code> first. Non-allowlist instance → typed <code>ToolError</code>, no token persisted.</div>
</div>
<div class="figure modern">
<div class="big">III.</div>
<div class="label">Per-tool capability gate</div>
<div class="desc">Even on an allowlisted instance, <code>*_update</code> and <code>*_delete</code> on existing entities are blocked. Only <code>create</code> within a configured test-fixture scope is allowed. Per-issue write count capped at 5.</div>
</div>
</div>
<p>These layers are belt, suspenders, and a parachute. The credential air-gap is the only one that survives complete LLM compromise — even if every other guard fails, the bot simply does not have the keys to the production castle. That is the design's North Star.</p>
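<p>The second layer, refusing a login before it is attempted, can be sketched as follows. This is an illustration under stated assumptions: the allowlist is passed in as an in-memory set where the real code is said to read <code>/etc/erbot/allowlist.yaml</code>, the <code>ToolError</code> class is a local stand-in for the typed error the brief mentions, and <code>do_login</code> is a hypothetical callable representing the real login.</p>

```python
class ToolError(Exception):
    """Local stand-in for the typed error named in this brief."""


def require_allowlisted(instance, allowlist):
    """Raise ToolError for any instance that is not on the allowlist."""
    if instance not in allowlist:
        raise ToolError(f"instance {instance!r} is not on the allowlist; login refused")


def auth_login(instance, allowlist, do_login):
    """Check the allowlist first, then delegate to the real login callable."""
    require_allowlisted(instance, allowlist)
    return do_login(instance)  # only reached for allowlisted instances


# Example: record which instances the (fake) login was actually called for.
attempted = []


def fake_login(instance):
    attempted.append(instance)
    return f"token-for-{instance}"


token = auth_login("jamaica", {"jamaica"}, fake_login)
```

<p>The point of the ordering is that the refusal happens before any network call, so a refused instance never produces a token that could be persisted.</p>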
<h3>What a "test instance" actually means</h3>
<p>One subtlety worth knowing: there is <strong>no separate "test" instance</strong> for the BPA stack. The instance the bot is allowed to write to (<code>jamaica</code>) is the same instance live applicants use. The bot's safety against production-data corruption comes from two constraints applied together:</p>
<ol>
<li>The allowlist's <code>safe_test_scope.service_ids</code> field restricts writes to <strong>one specific test-fixture service UUID</strong> on that instance.</li>
<li>The capability gate restricts the operations to <code>create</code> only — never <code>update</code> or <code>delete</code> on existing rows.</li>
</ol>
<p>Without both, the bot would be a production write target. With both, its blast radius is bounded to creating new sub-entities under one specific test service. This is exactly the kind of constraint an LLM cannot reason about reliably, which is why it is enforced mechanically, in the <code>auth_login</code> middleware and the per-tool capability gate, not in any prompt.</p>
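<p>Taken together, the two constraints and the per-issue write cap from the third layer amount to a three-part check before every write. The sketch below is one way to express that; the <code>WriteGate</code> class, its method name, and the use of <code>PermissionError</code> are assumptions, and <code>"svc-test"</code> is a placeholder for the real test-fixture service UUID, which this brief does not give.</p>

```python
from dataclasses import dataclass

MAX_WRITES_PER_ISSUE = 5  # cap quoted in Section IV


@dataclass
class WriteGate:
    allowed_service_ids: frozenset  # mirrors safe_test_scope.service_ids
    writes_used: int = 0

    def check(self, operation, service_id):
        """Count the write if it is permitted; raise PermissionError otherwise."""
        if operation != "create":
            raise PermissionError(f"{operation} is blocked; only create is allowed")
        if service_id not in self.allowed_service_ids:
            raise PermissionError("service is outside the test-fixture scope")
        if self.writes_used >= MAX_WRITES_PER_ISSUE:
            raise PermissionError("per-issue write cap reached")
        self.writes_used += 1


# Placeholder ID; the real scope is a single service UUID on the allowlisted instance.
gate = WriteGate(allowed_service_ids=frozenset({"svc-test"}))
gate.check("create", "svc-test")  # permitted, uses 1 of 5 writes
```

<p>Rejected calls do not consume the write budget in this sketch; only writes that pass all three checks are counted.</p>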
<h3>Anti-prompt-injection</h3>
<p>Every issue body, PR body, and human comment is wrapped in <code>&lt;untrusted_user_input&gt;</code> tags before the bot's agents see it. Every system prompt contains an explicit reminder that text inside those tags is data, never instructions. Before any of that runs, a heuristic regex pre-filter scans for known injection patterns (instruction-shaped tokens, role markers, "ignore previous", base64 blobs of significant length). If anything trips, the issue halts with <code>auto:halt:prompt-injection-detected</code> and you decide whether to proceed.</p>
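<p>A heuristic pre-filter and wrapper along these lines might look like the sketch below. The three patterns are illustrative stand-ins for the categories listed above, the 200-character threshold for a "significant" base64 run is an assumption, and stripping a smuggled closing tag is an extra precaution added here, not something the brief describes.</p>

```python
import re

# Illustrative patterns only; the real pre-filter's rules are not listed in this brief.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous", re.IGNORECASE),
    # Role markers at the start of a line, e.g. "system:" or "assistant:".
    re.compile(r"^\s*(system|assistant)\s*:", re.IGNORECASE | re.MULTILINE),
    # Long base64-looking run; the 200-character threshold is an assumption.
    re.compile(r"[A-Za-z0-9+/]{200,}={0,2}"),
]


def looks_injected(text):
    """True if any heuristic pattern matches the raw, untrusted text."""
    return any(p.search(text) for p in INJECTION_PATTERNS)


def wrap_untrusted(text):
    """Wrap user-supplied text so prompts can treat it as data, not instructions."""
    # Neutralise a closing tag smuggled into the text itself (assumed precaution).
    safe = text.replace("</untrusted_user_input>", "[removed closing tag]")
    return f"<untrusted_user_input>\n{safe}\n</untrusted_user_input>"
```

<p>The filter runs on the raw text first; only text that passes is wrapped and handed to an agent. A match does not prove an attack, which is why the outcome is a halt for human review and not an automatic rejection.</p>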
</section>
<!-- ═══════════════ §5 ═══════════════ -->
<section aria-labelledby="s5">
<header class="section-head">
<div class="label">Section V</div>
<h2 id="s5">What the bot will not do</h2>
</header>
<p>The "never" list is short, mechanical, and unconditional. None of these are policies the bot promises to follow — they are properties of the code the bot cannot bypass.</p>
<ul>
<li><strong>Auto-merge anything.</strong> Every PR is opened as a draft. Draft PRs cannot be merged via API. Branch protection requires CODEOWNER approval, which the bot does not have.</li>
<li><strong>Write to production.</strong> See Section IV. The credentials are not even on the box.</li>
<li><strong>File new issues.</strong> Out of scope per the spec — prevents prompt-injection feedback loops where bot-1 files an issue that prompt-injects bot-2.</li>
<li><strong>Rotate its own credentials.</strong> A 30-day GitHub PAT is rotated by you, with a Discord ping at T-7d and T-1d.</li>
<li><strong>Touch translation issues.</strong> Anything labeled <code>translation</code> halts with <code>auto:halt:translation-domain</code>. The blast-radius on the global catalog is too large for v1.</li>
<li><strong>Open more than three draft PRs in any 26-hour window.</strong> Hard cap. If the queue piles up, the picker stops opening PRs (it still triages). This is the strongest blast-radius control: with at most three PRs per day-and-change, you can review them all by hand.</li>
<li><strong>Process Camunda or IAM destructive operations.</strong> Out of pipeline scope — existing skill safety nets cover those domains.</li>
</ul>
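<p>The draft-PR cap in the list above is a sliding-window count. A minimal sketch, assuming the picker has the open timestamps of recent bot-authored draft PRs at hand (the function name and its arguments are hypothetical):</p>

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=26)
MAX_DRAFT_PRS = 3


def may_open_draft_pr(opened_at: list, now: datetime) -> bool:
    """True if one more draft PR still fits under the cap for the window."""
    cutoff = now - WINDOW
    # Only PRs opened after the cutoff count against the cap.
    recent = [t for t in opened_at if t > cutoff]
    return MAX_DRAFT_PRS > len(recent)
```

<p>Triage never calls this check; only the step that opens a PR does, which is what lets the picker keep triaging while the cap is reached.</p>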
<h3>If the worst happens</h3>
<p>Should the bot somehow produce a flood of bad PRs anyway, there is a single command that stops everything:</p>
<pre><code>sudo erbot-panic-close # interactive, prompts before each step
sudo erbot-panic-close --yes # no prompts (script-able)
sudo erbot-panic-close --dry-run # print commands, run nothing</code></pre>
<p>That script stops every timer, kills any in-flight worker, and closes every bot-authored draft PR from the last 24 hours with a documented comment. It also prints reminders about credential rotation and a test-instance regression check, both of which require domain credentials the bash script intentionally does not have.</p>
</section>
<!-- ═══════════════ §6 ═══════════════ -->
<section aria-labelledby="s6">
<header class="section-head">
<div class="label">Section VI</div>
<h2 id="s6">What just shipped</h2>
</header>
<p>The full v1 implementation came together across four plans. Each plan produced working, testable software on its own; later plans built on what the previous ones shipped.</p>
<div class="figures">
<div class="figure">
<div class="big">4</div>
<div class="label">Plans tagged</div>
<div class="desc">plan1-foundation · plan2-watcher-triage · plan3-picker-executor · plan4-reviewer-ops</div>
</div>
<div class="figure">
<div class="big">60+</div>
<div class="label">Commits on erbot-impl</div>
<div class="desc">Conventional-commits, body-rich, no AI-authorship trailers per the project's commit-message guard</div>
</div>
<div class="figure modern">
<div class="big">257</div>
<div class="label">Pipeline tests passing</div>
<div class="desc">Up from 0 at session start. Mocked end-to-end smoke test exercises all 11 phases.</div>
</div>
<div class="figure">
<div class="big">1178</div>
<div class="label">Monorepo regression tests passing</div>
<div class="desc">Existing test suites — no regressions across all four plans</div>
</div>
<div class="figure">
<div class="big">17</div>
<div class="label">closed-set halt reasons</div>
<div class="desc">Frozen vocabulary; resume protocol maps every halt label deterministically to a re-entry phase</div>
</div>
<div class="figure">
<div class="big">11</div>
<div class="label">executor phases</div>
<div class="desc">INTAKE → VERIFY → CHALLENGE → COUNCIL → SPEC → TEST_FIRST → PLAN → IMPLEMENT → SELF_REVIEW → DRAFT_PR → CLOSE</div>
</div>
</div>
<h3>Plan-by-plan recap</h3>
<div class="table-wrap">
<table>
<thead><tr><th>Plan</th><th>Focus</th><th>Test delta</th><th>Key artefacts</th></tr></thead>
<tbody>
<tr><td><strong>Plan 1</strong></td><td>Foundation + spikes</td><td class="num">0 → 95</td><td>bootstrap, systemd skeletons, GitHub state primitives, allowlist Spike 3</td></tr>
<tr><td><strong>Plan 2</strong></td><td>Watcher + Triage</td><td class="num">95 → 132</td><td>8 triage sub-agents, canonical comment, dedup scorer</td></tr>
<tr><td><strong>Plan 3</strong></td><td>Picker + Executor</td><td class="num">132 → 199</td><td>PhaseRegistry, 11 phases, WorkflowState, 8 named adversarial agents, resume protocol, PR composition</td></tr>
<tr><td><strong>Plan 4</strong></td><td>Reviewer + Ops finishing</td><td class="num">199 → 257</td><td>closed Plan 3 wiring gaps; Subsystem D in light + full modes; daily digest, halt-rate alarms, drain-then-update sequencer, panic-close, heartbeat, end-to-end smoke test</td></tr>
</tbody>
</table>
</div>
<h3>The architectural choices that paid off</h3>
<p>A handful of patterns held up across all four plans and proved load-bearing every time:</p>
<ul>
<li><strong>Test-double-friendly orchestration.</strong> The phase registry is just a list of <code>PhaseSpec</code> objects; tests inject a fake registry and stub adapters, instead of patching every agent symbol in production code. Plan 4's end-to-end smoke test dropped in <code>PhaseRegistry(PHASES)</code> with no patching of any agent.</li>
<li><strong>WorkflowState as a parameter object.</strong> Instead of widening every phase signature to accept upstream data, each phase reads only the WorkflowState fields it consumes and writes only the fields it produces. Phase 7 PLAN does not need to know phase 4 COUNCIL exists.</li>
<li><strong>Closed-set halt vocabulary.</strong> Seventeen frozen halt reasons mean the resume protocol can map every label to a re-entry phase deterministically. New failure modes cannot accidentally invent new halt strings.</li>
<li><strong>HTML state markers + body SHA-256.</strong> Every bot-authored comment ends with a hidden HTML marker tying it to a SHA-256 hash of the issue body at the time of writing. On resume, if the body has been edited, the SHA mismatch trips <code>auto:halt:state-corrupted</code> rather than letting the bot continue from a stale plan.</li>
<li><strong>Cumulative-review pattern.</strong> Every task across all four plans was reviewed by a fresh subagent against the spec and against code-quality conventions. Twenty-one Important findings surfaced across the four plans — every one fixed before merge, zero shipped silently.</li>
</ul>
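<p>To make the state-marker pattern concrete, here is one way the marker could be written and checked on resume. The marker format (<code>erbot-state body_sha256=…</code> inside an HTML comment) and the function names are assumptions; only the SHA-256 of the issue body and the <code>auto:halt:state-corrupted</code> label come from the description above. Treating a missing marker as corrupted state is likewise an assumption.</p>

```python
import hashlib
import re

LT = "<"  # separate constant so this snippet embeds in an HTML page verbatim
STATE_CORRUPTED = "auto:halt:state-corrupted"

# Hypothetical marker format: an HTML comment carrying the body hash.
MARKER_RE = re.compile(
    re.escape(LT) + r"!-- erbot-state body_sha256=([0-9a-f]{64}) --" + ">"
)


def body_sha256(issue_body: str) -> str:
    """Hex SHA-256 of the issue body as UTF-8."""
    return hashlib.sha256(issue_body.encode("utf-8")).hexdigest()


def append_marker(comment: str, issue_body: str) -> str:
    """Append the hidden marker that ties the comment to the current body."""
    marker = LT + "!-- erbot-state body_sha256=" + body_sha256(issue_body) + " --" + ">"
    return comment.rstrip() + "\n\n" + marker


def check_on_resume(comment: str, current_body: str):
    """Return None if the recorded hash matches the body, else the halt label."""
    match = MARKER_RE.search(comment)
    if match is None or match.group(1) != body_sha256(current_body):
        return STATE_CORRUPTED
    return None
```

<p>Any edit to the issue body, even a single character, changes the hash, so the check cannot tell a harmless typo fix from a rewritten request. Under this sketch both halt, which errs on the side of a human look.</p>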
</section>
<!-- ═══════════════ §7 ═══════════════ -->
<section aria-labelledby="s7">
<header class="section-head">
<div class="label">Section VII</div>
<h2 id="s7">What is next</h2>
</header>
<p>The bot has the mechanical surface complete, but several refinements were intentionally deferred to keep v1 small. They are documented in the Plan 4 sign-off as <em>Plan 5+ entry conditions</em>. The most important ones, in priority order:</p>
<h3>Operational, before live deployment</h3>
<ul>
<li>A <code>NOPASSWD</code> sudoers entry is required so the plugin updater can restart the watcher and reviewer services. Without it, the updater hangs on STDIN waiting for a password that never comes (no tty in oneshot units). One line in <code>/etc/sudoers.d/erbot-restart</code>.</li>
<li>The systemd <code>ExecStart</code> lines hardcode <code>/usr/bin/uv</code>, but on most distros <code>uv</code> lives at <code>/usr/local/bin/uv</code>. Either document a canonical install path or sweep the unit files to use whatever <code>which uv</code> reports.</li>
<li>Plan 4's reviewer and updater unit files dropped Plan 1's <code>PartOf=erbot.slice</code>, <code>Wants=network-online.target</code>, and <code>TimeoutStartSec=600</code> — siblings still have them. Re-add for parity.</li>
</ul>
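<p>For the <code>uv</code> path item, the sweep could be scripted along these lines. This is only a sketch: the helper names are hypothetical, and it assumes the unit files use a plain <code>ExecStart=/usr/bin/uv …</code> line with no prefix characters.</p>

```python
import re
import shutil

HARDCODED = "/usr/bin/uv"


def resolve_uv() -> str:
    """Return the absolute path of uv on PATH, like `which uv`."""
    path = shutil.which("uv")
    if path is None:
        raise FileNotFoundError("uv not found on PATH")
    return path


def sweep_exec_start(unit_text: str, uv_path: str) -> str:
    """Rewrite ExecStart lines that begin with the hardcoded uv path."""
    pattern = re.compile(
        r"^(ExecStart=)" + re.escape(HARDCODED) + r"(?=\s)",
        re.MULTILINE,
    )
    # A function replacement keeps backslashes in uv_path literal.
    return pattern.sub(lambda m: m.group(1) + uv_path, unit_text)
```

<p>Calling <code>sweep_exec_start(text, resolve_uv())</code> on each unit file and writing the result back would complete the sweep; a <code>systemctl daemon-reload</code> is needed afterwards for systemd to pick up the change.</p>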
<h3>Behavioural, when production data tells us we need them</h3>
<ul>
<li>The orchestrator-side spec author. Today phase 5 SPEC runs the depth-reviewer's lenses on whatever the orchestrator has written into <code>state.spec</code>. A real spec-author agent would generate that text; for now it comes from the test fixture in the smoke test.</li>
<li>The council back-channel revision loop. Spec §5.3.3 mandates a 3-round cap on council ↔ spec disagreements. Plan 4 ships the agents but not the loop.</li>
<li>Parallel COUNCIL fanout. The three personas currently run sequentially per Plan 3's deferral. <code>asyncio</code> fanout would shave 60–90 seconds off every issue.</li>
<li>An I8 picker auto-pause on halt-flood. Plan 4's alarm scanner detects ≥10 halts/hour but does not actually pause the picker. Operator workaround today is <code>sudo systemctl stop erbot-picker.timer</code>.</li>
</ul>
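<p>As an illustration of the parallel COUNCIL fanout item, the sequential loop could give way to <code>asyncio.gather</code>. The persona names and the <code>run_persona</code> stand-in below are hypothetical; a real persona would make a slow model call where the placeholder sleep sits.</p>

```python
import asyncio


async def run_persona(name: str, spec: str) -> dict:
    """Stand-in for one council persona; a real one would call a model."""
    await asyncio.sleep(0)  # placeholder for the slow network call
    return {"persona": name, "spec_chars": len(spec)}


async def council_fanout(spec: str, personas: list) -> list:
    """Run every persona concurrently and return results in input order."""
    # gather() schedules the coroutines concurrently and preserves their order.
    return await asyncio.gather(*(run_persona(p, spec) for p in personas))


if __name__ == "__main__":
    results = asyncio.run(council_fanout("draft spec", ["persona_a", "persona_b", "persona_c"]))
    for result in results:
        print(result)
```

<p>With concurrent calls the wall-clock cost of the phase approaches that of the slowest persona instead of the sum of all three. By default, <code>gather</code> propagates the first exception to the caller while the remaining personas keep running, so a production version would need to decide how a single failed persona maps onto a halt.</p>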
<h3>The success criteria</h3>
<p>Per the spec's §9, v1 is "done" when, against a 4-week observation window:</p>
<blockquote>
<p>Hard invariants (any failure = v1 not shipped): zero PRs auto-merged by the bot; zero writes against any non-allowlisted instance; zero lock violations; zero untrusted-input strings reaching an agent without the XML wrapping.</p>
<p>Outcome criteria (measurable): ≥80% of triage comments rated useful or accurate; ≥1 bot-authored PR merged with ≤2 review-comment-driven changes; mean human time spent adjudicating halts ≤30 min/week.</p>
</blockquote>
<p>The implementation is sealed at tag <code>erbot-plan4-reviewer-ops</code>. The 4-week window measures operation, not implementation. Whether the bot actually saves you time across a real month of issues is the only question that matters next.</p>
</section>
<footer class="colophon">
<div>Autopilot v1 reference brief · MCP_eRegistrations_BPA</div>
<div>2026-05-08</div>
</footer>
</div>
<script>
// Show "↑ Top" only after the user has scrolled past the hero.
(function () {
var threshold = 600;
var ticking = false;
function update() {
document.body.classList.toggle('scrolled', window.scrollY > threshold);
ticking = false;
}
window.addEventListener('scroll', function () {
if (!ticking) {
window.requestAnimationFrame(update);
ticking = true;
}
}, { passive: true });
})();
</script>
</body>
</html>