Everything in the earlier labs was so an agent could act without you watching. That only works if the agent can prove its work actually happened and cannot accidentally do it twice. This is not polish. We hide a unique marker in the build (a sentinel) and keep checking the live URL until that exact marker shows up. We run the real user path and fail loudly if the output is wrong (a smoke test). And we give every important write a key that makes running it twice safe, so a retry never fires the action again (this is idempotency). Proving it works before calling it ready, and making writes safe to repeat, are what earn an agent the right to act on its own.
The trap: a deploy command reports success while the CDN (the network of edge servers that caches and serves your site) is still handing out the old build. "Success" means the platform accepted your files, not that visitors see them yet. So you put a unique BUILD_ID inside the build itself, then keep checking the live URL until that exact id shows up. Watch the ETag change too: it is a version tag the server sends so you can tell when the content has actually changed. Vercel is the example here, but the idea works anywhere: any URL, any marker. Stamp the id in at build time.
# Generate a unique id per build and write it where the live page will serve it.
# (Here: a meta tag in index.html and a plain /build-id.txt next to it.)
BUILD_ID="build-$(date +%s)-$(git rev-parse --short HEAD 2>/dev/null || echo nogit)"
echo "stamping $BUILD_ID"
# index.html gets a machine-readable marker the poller can grep for.
printf '<meta name="x-build-id" content="%s">\n' "$BUILD_ID" >> dist/index.html
printf '%s\n' "$BUILD_ID" > dist/build-id.txt
# ... then ship it (Vercel here; swap for your deploy target) ...
# vercel deploy --prod --yes
echo "$BUILD_ID"   # hand this exact value to the poller below
Now check in a loop. The checker knows nothing about the platform. It just asks the public URL for the marker and tries again, pausing a few seconds between tries, until the marker appears or it hits a firm limit and gives up. There is a bash version for macOS and Linux and a PowerShell version for Windows, so the same check runs on either machine.
#!/usr/bin/env bash
# Usage: ./poll.sh https://your-app.vercel.app build-1717...-a1b2c3
set -euo pipefail
URL="$1"; WANT="$2"; MAX=20; n=0
while [ "$n" -lt "$MAX" ]; do
  n=$((n+1))
  # Fetch the body, plus the ETag so we can see the edge object change.
  # "|| true" keeps a failed request from tripping set -e; we just poll again.
  body="$(curl -fsSL "$URL" || true)"
  etag="$(curl -fsSI "$URL" | tr -d '\r' \
    | awk -F': ' 'tolower($1)=="etag"{print $2}' || true)"
  if printf '%s' "$body" | grep -qF "$WANT"; then
    echo "live: $WANT after $n polls (etag $etag)"
    exit 0
  fi
  echo "poll $n/$MAX: sentinel not live yet (etag $etag), sleeping"
  sleep 6
done
echo "FAIL: $WANT never went live after $MAX polls" >&2
exit 1
# Usage: .\poll.ps1 https://your-app.vercel.app build-1717...-a1b2c3
param([string]$Url, [string]$Want, [int]$Max = 20)
$ErrorActionPreference = "Stop"
for ($n = 1; $n -le $Max; $n++) {
    try {
        $r = Invoke-WebRequest -Uri $Url -UseBasicParsing
        $etag = $r.Headers["ETag"]
        if ($r.Content -like "*$Want*") {
            Write-Host "live: $Want after $n polls (etag $etag)"
            exit 0
        }
        Write-Host "poll $n/$Max`: sentinel not live yet (etag $etag), sleeping"
    } catch {
        Write-Host "poll $n/$Max`: request failed, retrying"
    }
    Start-Sleep -Seconds 6
}
# With ErrorActionPreference = "Stop", Write-Error would throw before "exit 1"
# runs, so write to stderr directly and set the exit code ourselves.
[Console]::Error.WriteLine("FAIL: $Want never went live after $Max polls")
exit 1
poll 1/20: sentinel not live yet (etag W/"a1b2-old"), sleeping
poll 2/20: sentinel not live yet (etag W/"a1b2-old"), sleeping
poll 3/20: sentinel not live yet (etag W/"7f3c-new"), sleeping
live: build-1717-abc123 after 4 polls (etag W/"7f3c-new")
exit 0
The exit code is the promise. (An exit code is the pass/fail number a script leaves behind when it ends; 0 means success.) Here 0 means the thing you built is the thing the internet now serves. Anything else means do not announce yet. An agent waits on this, not on the deploy command's "success."
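The same loop can also live in-process, for an agent that calls a function instead of shelling out. This is a minimal sketch, not part of the scripts above: the names `wait_for_sentinel`, `http_get`, `fetch`, and `pause` are hypothetical, and the fetcher and the sleep are injected so the logic can be exercised without a network.

```python
import time
import urllib.request
from typing import Callable


def http_get(url: str) -> str:
    """Fetch the body of a URL; return "" on connection, timeout, or HTTP-status errors."""
    try:
        with urllib.request.urlopen(url, timeout=15) as r:
            return r.read().decode("utf-8", "replace")
    except OSError:  # URLError and HTTPError are both OSError subclasses
        return ""


def wait_for_sentinel(url: str, want: str, max_polls: int = 20,
                      delay_s: float = 6.0,
                      fetch: Callable[[str], str] = http_get,
                      pause: Callable[[float], None] = time.sleep) -> int:
    """Return 0 once `want` appears in the body served at `url`, else 1.

    The return value mirrors the scripts' exit codes: 0 means the marker
    is live, 1 means the firm limit was hit without seeing it.
    """
    for n in range(1, max_polls + 1):
        if want in fetch(url):
            return 0
        if n < max_polls:  # no point sleeping after the final attempt
            pause(delay_s)
    return 1
```

A caller would treat the result exactly like the exit code: announce only on 0, hold on anything else.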
A response of 200 (the HTTP code for "OK") does not mean correct. A page can return 200 and still show an error. An MCP server (a small server that exposes tools an AI agent can call) can return 200 and be missing half its tools. So you walk the actual user path and check for the output you expect, then exit with a failure code on any mismatch so an agent (or an automated build) can stop on it. For a page, check that a known string is there. For an MCP server, call tools/list (the request that asks a server which tools it offers) and check that every tool name you expect is present.
#!/usr/bin/env python3
"""Smoke the live deploy. Exits 0 only if the real flow produces the
real output. Generic: a page-string check and an MCP tools/list check."""
import sys, json, urllib.request, urllib.error
def fetch(url, data=None, headers=None):
    req = urllib.request.Request(
        url, data=data, headers=headers or {}, method="POST" if data else "GET")
    try:
        with urllib.request.urlopen(req, timeout=15) as r:
            return r.status, r.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as e:
        # urlopen raises on 4xx/5xx; report the status instead of crashing.
        return e.code, e.read().decode("utf-8", "replace")
    except OSError as e:
        # DNS failure, refused connection, timeout: no HTTP status at all.
        return 0, str(e)
def check_page(url, must_contain):
    status, body = fetch(url)
    ok = status == 200 and must_contain in body
    print(("PASS" if ok else "FAIL"), "page", url,
          "->", status, repr(must_contain), "present" if ok else "MISSING")
    return ok
def check_mcp_tools(mcp_url, expected):
    # One framed JSON-RPC call. tools/list must return every name we expect.
    # Assumes the server answers a bare tools/list with a plain JSON body.
    payload = json.dumps({
        "jsonrpc": "2.0", "id": 1, "method": "tools/list"}).encode()
    status, body = fetch(mcp_url, data=payload,
                         headers={"Content-Type": "application/json"})
    try:
        names = {t["name"] for t in json.loads(body)["result"]["tools"]}
    except Exception:
        names = set()
    missing = set(expected) - names
    ok = status == 200 and not missing
    print(("PASS" if ok else "FAIL"), "mcp", mcp_url,
          "-> got", sorted(names), "missing", sorted(missing))
    return ok
if __name__ == "__main__":
    base = sys.argv[1] if len(sys.argv) > 1 else "https://your-app.vercel.app"
    results = [
        check_page(base, "ship + prove"),
        # check_mcp_tools(base + "/mcp", ["get_notes", "search_notes", "add_note"]),
    ]
    sys.exit(0 if all(results) else 1)  # non-zero -> the agent holds
$ python smoke.py https://your-app.vercel.app
PASS page https://your-app.vercel.app -> 200 'ship + prove' present
$ echo $?
0
# a missing tool or a changed string flips this to FAIL and exit 1.
Run them in order: check for the marker, then run the smoke test, and only then does the agent say the word "deployed." A 200 and a hope is how a broken build gets announced to a room.
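The ordering rule can be enforced by a small gate rather than remembered. This is a sketch under assumptions: `release_gate` and the check names are hypothetical, and each check is any zero-argument callable that returns True on success (for example, a wrapper around the poller or the smoke test).

```python
from typing import Callable, List, Tuple

Check = Tuple[str, Callable[[], bool]]


def release_gate(checks: List[Check]) -> Tuple[bool, List[str]]:
    """Run checks in order and stop at the first failure.

    Returns (ok, log). `ok` is True only if every check passed, so the
    caller may say "deployed" only when it is True.
    """
    log: List[str] = []
    for name, check in checks:
        try:
            ok = bool(check())
        except Exception as exc:  # a crashing check counts as a failure
            ok = False
            log.append(f"{name}: ERROR {exc!r}")
        else:
            log.append(f"{name}: {'PASS' if ok else 'FAIL'}")
        if not ok:
            return False, log  # later checks never run
    return True, log
```

Stopping at the first failure keeps the evidence short: the log names the step that held the release, and nothing after it has run.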
Once an agent runs on its own, crashes, restarts, and accidental double-sends are normal, not rare. So before any write that has real consequences (send an email, book a venue, post a tweet) you build an idempotency key from the action itself (who it is for, what it is, and the day), reserve that key in a store, and if it was already done you simply do nothing. Pair the reservation with a log you only ever add to (an append-only ledger) that carries a "do not repeat" window: 30 days for invites, 60 seconds for one-time login links. A retry turns into a skip, not a second email to a sponsor.
#!/usr/bin/env python3
"""Idempotent consequential write over a tiny store + append-only ledger.
The store backs idem_claim/idem_record; the ledger gives a human-auditable
trail with a dedup window. Swap the dict for Redis/KV; the contract holds."""
import json, time, hashlib, os, threading
_LOCK = threading.Lock()
_STORE = {}                   # key -> claimed_at epoch (swap for KV)
LEDGER = "sent_ledger.jsonl"  # append-only audit trail
def idem_key(recipient: str, action: str, day: str) -> str:
    raw = f"{action}|{recipient}|{day}".encode()
    return "idem:" + hashlib.sha256(raw).hexdigest()[:16]
def _within_window(key: str, window_s: int) -> bool:
    """True if this key was recorded in the ledger inside the window."""
    if not os.path.exists(LEDGER):
        return False
    cutoff = time.time() - window_s
    with open(LEDGER, encoding="utf-8") as f:
        for line in f:
            try:
                row = json.loads(line)
            except ValueError:
                continue  # skip a torn or hand-edited line
            if row.get("key") == key and row.get("ts", 0) >= cutoff:
                return True
    return False
def idem_claim(key: str, window_s: int) -> bool:
    """Reserve the key. Returns False if already claimed/recorded in-window."""
    with _LOCK:
        # An in-memory claim older than the window no longer blocks.
        claimed_at = _STORE.get(key)
        in_store = claimed_at is not None and time.time() - claimed_at < window_s
        if in_store or _within_window(key, window_s):
            return False
        _STORE[key] = time.time()
        return True
def idem_record(key: str, meta: dict) -> None:
    """Commit to the append-only ledger after the side effect succeeds."""
    with _LOCK, open(LEDGER, "a", encoding="utf-8") as f:
        f.write(json.dumps({"key": key, "ts": time.time(), **meta}) + "\n")
def send_invite(email: str, day: str, window_s: int = 30 * 86400) -> str:
    key = idem_key(email, "invite", day)
    if not idem_claim(key, window_s):
        return f"already done, skipping ({key})"
    # --- real side effect goes here (SMTP, API call, booking) ---
    # send_email(email, ...)   # only runs once per key per window
    idem_record(key, {"email": email, "action": "invite", "day": day})
    return f"sent invite to {email} ({key})"
if __name__ == "__main__":
    day = time.strftime("%Y-%m-%d")
    print("run 1:", send_invite("sponsor@example.com", day))
    print("run 2:", send_invite("sponsor@example.com", day))  # retry -> no-op
$ python idempotent_send.py
run 1: sent invite to sponsor@example.com (idem:9c1f4a2b7e0d583a)
run 2: already done, skipping (idem:9c1f4a2b7e0d583a)
# one ledger line, one email. the retry changed nothing.
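The window is the setting you tune. 60 seconds blocks a flood of one-time login links while still allowing a fresh one tomorrow. 30 days blocks re-inviting the same person to the same series. The key comes from the action itself, so two attempts at the same send always compute the same key, and only one of them can reserve it. That guarantee is only as strong as the store: the in-memory dict above serializes threads inside one process, so for separate processes or machines, swap in a shared store with an atomic set-if-absent operation. Then only one wins.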
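To see how the window behaves as a tunable, here is a small sketch with an injected clock so time can be advanced by hand. The class name `WindowedClaims` and its `clock` parameter are hypothetical, and it assumes a single process; it illustrates the window rule, not a replacement for a shared store.

```python
import threading
import time
from typing import Callable, Dict


class WindowedClaims:
    """In-memory reservation table with a per-claim dedup window.

    Assumption: one process. Across processes the same contract needs a
    shared store with an atomic set-if-absent operation.
    """

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._lock = threading.Lock()
        self._claimed_at: Dict[str, float] = {}

    def claim(self, key: str, window_s: float) -> bool:
        """Reserve `key`; return False if it was claimed under `window_s` ago."""
        now = self._clock()
        with self._lock:  # check-and-set happens as one step per thread
            last = self._claimed_at.get(key)
            if last is not None and now - last < window_s:
                return False
            self._claimed_at[key] = now
            return True


if __name__ == "__main__":
    fake_now = [1000.0]
    claims = WindowedClaims(clock=lambda: fake_now[0])
    print(claims.claim("login:a@example.com", 60))   # first request wins
    print(claims.claim("login:a@example.com", 60))   # retry inside 60 s is refused
    fake_now[0] += 61
    print(claims.claim("login:a@example.com", 60))   # window elapsed, allowed again
```

Injecting the clock is a design choice for testing: the window logic can be checked in milliseconds instead of waiting out a real 60 seconds or 30 days.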
The whole point of this masterclass is to let an agent do real things while you are not watching. That is only safe if two things are true. The agent can prove the thing happened: the marker matched and the smoke test passed, not a 200 and a hope. And the agent cannot do it twice: every write carries an idempotency key, so a retry, a crash-and-restart, or an accidental double-send does nothing the second time, instead of sending a duplicate email to a sponsor.
A real scar from this stack: an agent that was handed a task deployed to production and committed the code, despite an explicit do NOT deploy, because that rule lived only as a sentence in its instructions instead of being enforced in the tools. A written instruction does not stop a tool call. Enforce consequential actions in code, and confirm them against git and the live URL, never on a teammate's word alone.
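One way to move a rule like "do NOT deploy" out of the instructions and into the tools is a wrapper that refuses the call outright. This is a minimal sketch: `guarded`, `ActionBlocked`, the allow-list, and `deploy_to_production` are all hypothetical names for illustration, not an API from this stack.

```python
from typing import Callable, Set


class ActionBlocked(PermissionError):
    """Raised when a tool call is not on the allow-list for this task."""


def guarded(action: str, allowed: Set[str],
            tool: Callable[..., str]) -> Callable[..., str]:
    """Wrap `tool` so it runs only when `action` is explicitly allowed."""
    def wrapper(*args, **kwargs) -> str:
        if action not in allowed:
            # The refusal happens in code, before the tool body can run.
            raise ActionBlocked(f"{action!r} is not allowed for this task")
        return tool(*args, **kwargs)
    return wrapper


def deploy_to_production(target: str) -> str:
    """Stand-in for a real deploy tool (hypothetical)."""
    return f"deployed to {target}"


if __name__ == "__main__":
    # This task may commit but not deploy, whatever the prompt says.
    deploy = guarded("deploy", {"commit"}, deploy_to_production)
    try:
        deploy("prod")
    except ActionBlocked as exc:
        print("blocked:", exc)
```

The allow-list is decided by whoever hands out the task, so an agent that ignores or misreads its instructions still cannot reach the deploy.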
Prove it before you call it ready. Make it safe to repeat before you let it do anything that matters. Those two habits are the whole reason you can trust an agent to act for you.
You started with one function that tells the truth about failure. You wrapped it in an MCP server an agent can call. You wrote schemas a small model can read. You published a front door agents can find on their own, then locked it behind scopes. You gave it memory it can trust, blended four kinds of memory into one mix, packed the context window with only what matters, and compiled your prompts instead of writing them by hand. You taught it to talk to other agents, then to run ten at once behind a gate. You turned a closed site into a typed function you own. And now you ship and prove it live. That is a repo an agent can find, log into, call, remember, coordinate, and trust. That is the masterclass.
Thirteen labs. You and the machine, building software for an audience that is partly machine. Now go ship something it can use.
After you deploy <project>, do not tell me it is done. Poll the live URL until the sentinel <BUILD_ID> is served and the ETag changes, smoke-test <flow> and exit non-zero if the expected output is missing, and prove the main write is idempotent by running it twice and showing the second run no-ops. Then report what you verified, with the evidence.