● lab 12 | ~12 min | masterclass

No public API? Build one. Four layers, one of them disposable.

Most of the tools you actually use have no API (no official way for code to talk to them), or one locked behind enterprise approval. That does not stop an agent. The four-layer pattern turns any site you can log into in a browser into something your agent can drive. You capture your logged-in session, replay it with a client that pretends to be a real browser, wrap that in clean typed functions, and put a small command-line tool on top. It is built so that when the connection breaks, you fix one layer and the rest never notices. The order to prefer is always: an official command-line tool first, then an official code library, then the public web API, and only then this. This is the last resort, and it works on almost any site you can log into.

step 1

Layer 1: capture the session.

Log in once with your real browser, then save the login cookies to sessions/<site>.json. First, find the call you want. Open DevTools (the developer panel built into every browser, usually F12), go to the Network tab (it lists every request the page makes), filter to Fetch/XHR (the behind-the-scenes data requests, as opposed to images or styles), and do the one action you care about: send a message, run a search, open an inbox. The page fires one of those requests at some /api/... or /graphql address (the endpoint, the exact URL that does the work). That request, headers and all, is your target. The cookies riding along on it are what proves you are logged in. Save them atomically, meaning the write either fully finishes or does not happen at all, so a crash partway through never leaves you locked out.

capture_session.py

import json, os, tempfile

# You logged in with a real browser. These are the cookies that ride on the
# XHR you found in the Network tab -- the auth the endpoint actually checks.
# (In Ray's stack a CDP bootstrap reads these straight from the live tab via
# Storage.getCookies, so HttpOnly cookies are visible. Shown here as a literal
# so the shape is obvious.)
captured = {
    "session_id": "PASTE_FROM_DEVTOOLS",   # the long-lived auth cookie
    "csrf_token": "PASTE_FROM_DEVTOOLS",    # rotates; we re-read it per response later
}

SESSION_PATH = "sessions/example.json"

def save_atomic(path, data):
    """tmp + os.replace so a crash mid-write never corrupts a good session."""
    # Reject empty cookies first: an empty string still passes a
    # `"session_id" in data` check, then 403s every later request with a
    # cryptic auth error.
    for k, v in data.items():
        if not v or not isinstance(v, str):
            raise ValueError("cookie " + k + " is empty -- re-capture, do not save")
    directory = os.path.dirname(path) or "."   # bare filename -> current dir
    os.makedirs(directory, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
        os.replace(tmp, path)   # atomic on the same filesystem
    except BaseException:
        os.unlink(tmp)          # do not leave a half-written temp file behind
        raise

if __name__ == "__main__":
    save_atomic(SESSION_PATH, captured)
    print("saved " + str(len(captured)) + " cookies -> " + SESSION_PATH)

One rule that saves an hour: refuse to save empty cookie values. An empty string slips past a simple "is the cookie set" check, then makes every later request fail with a 403 (access denied) and an error message that points at the wrong cause. Much cheaper to catch it here than to chase it later.
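If you pull cookies from the live tab with CDP instead of pasting them, they arrive as a list of cookie objects, not the flat name-to-value shape saved above. Here is a minimal sketch of that conversion, assuming each object carries the name, value, domain, and expires fields that CDP cookie objects use. cookies_for_site and domain_matches are hypothetical helper names, not part of any library.

```python
import time


def domain_matches(cookie_domain, host):
    """True when a cookie set for cookie_domain would be sent to host."""
    cookie_domain = cookie_domain.lstrip(".").lower()
    host = host.lower()
    return host == cookie_domain or host.endswith("." + cookie_domain)


def cookies_for_site(cdp_cookies, host, now=None):
    """Flatten CDP-style cookie objects into {name: value} for one host.

    Skips cookies for other domains, cookies that have already expired,
    and cookies with an empty value (the same rule save_atomic enforces).
    """
    now = time.time() if now is None else now
    kept = {}
    for c in cdp_cookies:
        if not domain_matches(c.get("domain", ""), host):
            continue
        expires = c.get("expires", -1)
        # Assumption: a non-positive expires marks a session cookie (no expiry).
        if expires > 0 and expires <= now:
            continue
        if not c.get("value"):
            continue
        kept[c["name"]] = c["value"]
    return kept
```

The result can go straight into save_atomic. Filtering by host keeps unrelated cookies (trackers, other tabs) out of the session file.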

step 2

Layer 2: replay with curl_cffi.

Now call that saved endpoint with the saved cookies. The catch: when a program opens a secure connection, the way it does so leaves a signature (a TLS fingerprint), and a plain Python requests call has a signature that clearly is not a browser. A site that checks for this will deny you with a 403 even when your cookies are perfect. curl_cffi (a Python web client that can copy a real Chrome browser's signature) with the setting impersonate="chrome120" makes your connection look like Chrome. That one setting is the difference between blocked and through.

client.py

import json
from curl_cffi import requests   # pip install curl_cffi

ENDPOINT = "https://example.com/api/notes"   # the XHR you captured in step 1

def load_session(path="sessions/example.json"):
    with open(path) as f:
        return json.load(f)

def make_session():
    saved = load_session()
    # impersonate="chrome120" -> Chrome's real TLS fingerprint, not python's.
    # A plain requests/httpx fingerprint gets blocked; this gets through.
    s = requests.Session(impersonate="chrome120")
    s.cookies.update(saved)
    # Realistic headers still matter -- curl_cffi only fixes the transport.
    s.headers.update({
        "Accept": "application/json",
        "X-Csrf-Token": saved["csrf_token"],
        "Sec-Fetch-Site": "same-origin",
    })
    return s

def fetch_notes():
    s = make_session()
    r = s.get(ENDPOINT)
    r.raise_for_status()
    return r.json()

if __name__ == "__main__":
    print(fetch_notes())

That is the whole replay: load the cookies, build a Chrome-shaped session, call the endpoint, return the JSON. But everything so far hands back raw data and raw crashes, which is exactly what Lab 01 told you never to give an agent. Step 3 fixes that.
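The csrf_token comment in step 1 says the token rotates. One way to keep up, sketched under assumptions: after each response, compare the token the server just set with the saved one, and write the session file again only when it changed. refresh_csrf is a hypothetical helper. It takes the response cookies as a plain dict, so how you read them off the response object is left to you.

```python
def refresh_csrf(saved, response_cookies, name="csrf_token"):
    """Return (session, changed).

    saved            -- the dict loaded from sessions/<site>.json
    response_cookies -- cookies the last response set, as a plain dict
    name             -- cookie that carries the rotating token (assumed name)

    A missing or empty token never overwrites a good one.
    """
    fresh = response_cookies.get(name)
    if not fresh or fresh == saved.get(name):
        return saved, False
    updated = dict(saved)   # copy, so the caller's dict is not mutated
    updated[name] = fresh
    return updated, True
```

When changed is true, pass the new dict to save_atomic from step 1 and update the X-Csrf-Token header on the session before the next call.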

step 3

Layers 3 and 4: type it, then put a CLI on top.

Wrap the raw call as a typed verb, a clean function that always returns the same predictable result shape from Lab 01: .ok, .error, and .error_kind (a short label for what went wrong). Sort the failures that matter into those labels: 401 -> auth_expired (your login expired, re-capture it), 429 -> rate_limit (you are going too fast, slow down), and anything else -> transient (a one-off glitch, worth a retry). Then a tiny command-line tool calls that verb. The payoff is the whole point of the lab: your agent calls one clean function and never touches the raw connection.

verb.py

from dataclasses import dataclass
from curl_cffi.requests.errors import RequestsError
import client   # step 2

@dataclass
class NotesResult:
    notes: list | None = None
    error: str = ""
    error_kind: str = ""        # auth_expired | rate_limit | transient
    @property
    def ok(self):
        return self.error_kind == "" and self.notes is not None

def get_notes():
    """Typed verb. The agent calls this -- never the raw HTTP."""
    try:
        s = client.make_session()
        r = s.get(client.ENDPOINT)
        if r.status_code == 401:
            return NotesResult(error="session rejected", error_kind="auth_expired")
        if r.status_code == 429:
            return NotesResult(error="rate limited", error_kind="rate_limit")
        if r.status_code >= 400:
            return NotesResult(error="http " + str(r.status_code), error_kind="transient")
        return NotesResult(notes=r.json())
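    except (RequestsError, ValueError) as e:
        # never let a raw secret-bearing body into the message
        return NotesResult(error=type(e).__name__, error_kind="transient")
    except (OSError, KeyError):
        # session file missing, unreadable, or lacking csrf_token: re-capture
        return NotesResult(error="session file missing or incomplete", error_kind="auth_expired")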

cli.py

import argparse, json
from verb import get_notes

p = argparse.ArgumentParser()
p.add_argument("action", choices=["notes"])
args = p.parse_args()

r = get_notes()
print(json.dumps({"ok": r.ok, "error_kind": r.error_kind, "notes": r.notes}, indent=2))

running the cli -- a typed result, not a raw dump

$ python cli.py notes
{
  "ok": true,
  "error_kind": "",
  "notes": ["ships on vercel", "codes on windows"]
}

# session expired? same shape, no traceback, agent knows what to do:
{ "ok": false, "error_kind": "auth_expired", "notes": null }

An auth_expired result is a signal, not a crash. Your agent reads it, re-captures the session from step 1, and tries once more. That self-recovery only works because the failure came back as a clean label instead of a stack trace.
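That recovery can be sketched as a small wrapper on the agent side. This is an illustration under assumptions, not the lab's own code: call_with_recovery is a hypothetical name, verb is any function returning an object with .ok and .error_kind (such as get_notes), and recapture stands in for whatever re-runs your step 1 capture.

```python
import time


def call_with_recovery(verb, recapture, sleep=time.sleep, backoff_seconds=30.0):
    """Call verb once, react to the error label, then retry at most once.

    auth_expired -> re-capture the session, retry
    rate_limit   -> wait backoff_seconds, retry
    transient    -> retry straight away
    other labels -> return the failure untouched; do not guess
    """
    result = verb()
    if result.ok:
        return result
    if result.error_kind == "auth_expired":
        recapture()
    elif result.error_kind == "rate_limit":
        sleep(backoff_seconds)
    elif result.error_kind != "transient":
        return result
    return verb()   # exactly one retry, whatever it returns
```

A call such as call_with_recovery(get_notes, recapture_session) either comes back ok or hands the second failure to the agent with the same labeled shape. The single retry is deliberate: if a fresh session is rejected too, looping will not help.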

the fragile part is not the request. it is the session and the schema drift.

These internal endpoints come with no promises. There is no published contract and no guarantee they will keep working (no SLA, no service-level agreement). The id hashes that GraphQL APIs use to name a query rotate every few months, cookies expire on a timer, and fields get renamed with no warning. A careless scraper that bakes the request straight into its logic breaks the day any of that changes, and it takes your whole script down with it. The four-layer split is what survives. When the connection changes, you fix layer 2 only, and layers 3 and 4 (your typed functions, your command-line tool, and the agent calling them) never notice.

Real war stories from this exact stack:

• Twitter / X: the id hashes that name each GraphQL query rotate. When a call starts returning 404 (not found) on the /graphql/ path, you grab the fresh id from the DevTools Network tab and drop it in a captures file. Label that 404 as operation_stale rather than a generic not-found, so the error itself tells you which layer to fix.

• Namecheap: the server sends an anti-forgery token (a CSRF token, a value that proves the request came from the real logged-in page) in a cookie named x-ncpl-csrf, but it wants that value sent back in a header named x-ncpl-rcsrf, with an extra r. Miss that one-letter rename and every logged-in POST fails with a 403.

• When the request keeps drifting on you: run a fetch() from inside the already logged-in page using CDP (Chrome DevTools Protocol, a way to drive a real browser from code). The browser then builds the request with its own cookies and headers, so you sidestep the drift entirely.
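The first two stories each come down to a few lines in layer 2. Here is a hedged sketch: classify_status and csrf_headers are hypothetical helpers, the operation_stale label and the x-ncpl-csrf / x-ncpl-rcsrf names come from the stories above, and everything else is assumed.

```python
def classify_status(status_code, url):
    """Map an HTTP status to an error_kind label; "" means no error."""
    if status_code < 400:
        return ""
    if status_code == 401:
        return "auth_expired"
    if status_code == 429:
        return "rate_limit"
    # A 404 on a GraphQL path usually means the query id rotated, so the
    # label points at the captures file rather than at a missing resource.
    if status_code == 404 and "/graphql/" in url:
        return "operation_stale"
    return "transient"


def csrf_headers(cookies, cookie_name="x-ncpl-csrf", header_name="x-ncpl-rcsrf"):
    """Copy the CSRF value from its cookie into the differently named header."""
    token = cookies.get(cookie_name, "")
    if not token:
        raise ValueError("cookie " + cookie_name + " is missing or empty")
    return {header_name: token}
```

Passing the cookie and header names as parameters makes the one-letter rename explicit, so it is easy to spot and to change per site.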

your agent, on a typed auth_expired

verb returned error_kind=auth_expired, re-running the session bootstrap then retrying once.

That line is the whole point. The connection broke, the verb labeled the failure, and the agent fixed itself with no human involved. A careless scraper would have thrown a stack trace and stopped dead.

hand this to your coding agent

Reverse <site> for me. Capture my logged-in session to sessions/<site>.json,
find the XHR endpoint behind <action> in DevTools Network, build a curl_cffi
client that impersonates chrome120 and replays it, then wrap it as a typed verb
returning a Result with auth_expired / rate_limit / transient error_kinds, and
put a small CLI on top. Tell me which layer to fix when the endpoint hash rotates.

checkpoint

A site with no API is now something your agent can drive through one clean function, and it holds up when the site changes underneath it. The connection lives in a single layer. When a cookie expires, you re-capture the session in layer 1; when an id hash rotates, you fix layer 2. Either way, nothing above notices. That is the difference between a script that works today and a client your agent can rely on for months.

Ethics: do this only for sites you have a real account on and whose terms of service you have read, at personal-use scale, and never to reach anything your account was not granted access to.
