Scoping auth tokens in headless Chrome with an MV3 extension

March 26, 2024

To render an authenticated page, you hand a headless browser a bearer token — and it does its job perfectly, which is exactly the problem. A browser sprays that token onto every request it makes, onto assets that never needed it and any off-site origin it touches, while the PDF comes out flawless. The real question was never how to send the token — it’s how to scope it.

TL;DR

We generate accessible PDFs by driving headless Chrome against our authenticated docs site (tagged output for screen readers — the reason we left wkhtmltopdf). To fetch the pages, the browser must send Authorization: Bearer <token>.
The trap: a global header fix (the Chrome DevTools Protocol’s setExtraHTTPHeaders, or a proxy) staples the token onto every request the page fans out to — every asset that never needed it, and any off-site origin it references. It renders a flawless PDF, so the leak passes its own happy-path test.
The tempting server-side fix fails too: an edge rule (Azure Front Door) that attaches a token to “our traffic” can only key off a forgeable, caller-supplied signal, so anyone who sets it gets in.
The fix is per-URL scope: an MV3 extension with declarativeNetRequest rules — inject the token only on the app host, strip it back off the public asset paths.
Name the two boundaries: the inject rule’s host-scoped condition keeps the token off other origins (the real guarantee); the strip rule is same-host hygiene — not a cache guarantee.

The scenario: a PDF is just a browser you drove

We render PDFs by driving headless Chrome — and that’s a deliberate choice, not the path of least resistance. We used to run wkhtmltopdf, and we dropped it on purpose: it renders the page into a flat visual artifact with no document structure, so the resulting PDF is inaccessible — a screen reader sees an untagged blob. Headless Chrome, with --export-tagged-pdf, carries the page’s semantics into the PDF: heading structure, reading order, image alt text. So we point Chrome at the real page, let its own CSS and JavaScript do the layout, and print-to-pdf. Playwright drives it. The accessibility win only holds if the source is accessible — which means rendering the real, fully-styled, JS-hydrated site, not a stripped-down print view. The output is faithful to what a human sees because it is what a human sees.

But the site is behind auth. A logged-in user’s browser sends Authorization: Bearer <token> on every request without thinking about it; our headless browser has to do the same or every fetch comes back 401. So somewhere in the pipeline, we have to give the browser a token.

That sentence — “give the browser a token” — is where the security bug lives, and it’s easy to walk right past it.

The trap: a credential in a browser fans out

Here’s the reflex. You need an auth header on your requests, so you reach for the one-liner: the Chrome DevTools Protocol (CDP) Network.setExtraHTTPHeaders, or a forward proxy that rewrites requests. Either way you set Authorization: Bearer … once, globally, and move on.

It works. The PDF renders. Ship it.

The problem is what “globally” means to a browser. You were picturing the request — the GET for the authenticated HTML page. But a page isn’t a request; it’s a fan-out. Rendering one doc pulls fonts, images, CSS, JS bundles — dozens of sub-requests to whatever the page references. A global header override staples your bearer token onto every one of them. For a site whose assets are self-hosted, that’s mostly over-scoping within your own origin — the token riding onto /static/ resources that never needed it. But the same blunt override has no notion of origin at all: the day a page does reference something off-site, the credential goes there too, with nothing to stop it.

And nothing complains. The fonts still load, the images still load, the PDF still looks perfect — so it sails through review and into production. The failure mode is invisible precisely when everything appears to work.
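
To make the fan-out concrete, here is a minimal sketch that models a global override as a function with no condition at all. The URL list, the off-site font host, and the token value are made up for illustration; the real list is whatever your page happens to reference.

```javascript
// Hypothetical fan-out for one rendered page (illustrative URLs only).
const fanOut = [
  "https://app.example.com/docs/intro",
  "https://app.example.com/static/site.css",
  "https://app.example.com/static/logo.png",
  "https://fonts.example.net/inter.woff2",
];

// Models a global override: there is no condition, so every request gets the header.
function applyGlobalHeader(urls, token) {
  return urls.map((url) => ({
    url,
    headers: { Authorization: `Bearer ${token}` },
  }));
}

const sent = applyGlobalHeader(fanOut, "demo-token");
const offSite = sent.filter((r) => new URL(r.url).hostname !== "app.example.com");

console.log(`${sent.length} requests carried the token, ${offSite.length} of them off-site`);
```

Only the first URL needed the credential. The other three received it because the override never asked where the request was going.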

The fix isn’t a better header-setting trick. It’s deciding, per request, whether this particular URL should see the credential at all — and making that decision declarative so it can’t quietly drift. That’s the shape declarativeNetRequest is built for.

The alternative we rejected: one gate at the edge

Before the extension, the tempting fix was server-side. The PDF renderer isn’t the only thing that needs to get past auth — our automated tests hit the same authenticated site for the same reason. So why not solve authenticated-automation once, at the edge: write a rule in the CDN/edge layer (Azure Front Door) that recognizes “our traffic” and attaches a predefined token, and let both the renderer and the test suite ride through it? One rule, every automation client covered, no per-client token plumbing. That economy is exactly what makes it attractive.

It’s also what makes it unsafe. The edge has to decide whose traffic gets the token, and the only thing it can decide on is what the caller sends — an agent name, a header, an IP. Every one of those is forgeable. A rule that says “attach the credential when the agent name is our renderer” means anyone who sets that agent name is handed the credential. The gate is a string any client can type; the convenience (one rule covers all our automation) and the vulnerability (one spoofable claim unlocks the site for anyone) are the same property. A claim is not a credential. The correct answer keeps the real token on the client — where holding it is the proof — and that applies to the tests too: they should carry their own token, not be waved through by the edge.
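
The forgeability is easy to see in a sketch. The function below is not Azure Front Door rule syntax; it is a hypothetical stand-in for any edge rule that keys off a caller-supplied header, and the agent name is an assumption for illustration.

```javascript
// Hypothetical agent name the edge rule would look for (illustrative only).
const RENDERER_AGENT = "docs-pdf-renderer/1.0";

// Stand-in for an edge rule: decides from a header the caller controls.
function edgeWouldAttachToken(requestHeaders) {
  return requestHeaders["user-agent"] === RENDERER_AGENT;
}

const realRenderer = { "user-agent": RENDERER_AGENT };
const stranger = { "user-agent": RENDERER_AGENT }; // anyone can send this string
const ordinaryBrowser = { "user-agent": "Mozilla/5.0" };

console.log(edgeWouldAttachToken(realRenderer));    // true
console.log(edgeWouldAttachToken(stranger));        // true
console.log(edgeWouldAttachToken(ordinaryBrowser)); // false
```

The rule cannot tell the first two callers apart, because nothing in the request proves who sent it.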

Doing it correctly: scope the token per-URL with an MV3 extension

The correct version is a small Manifest V3 extension loaded into the headless browser. It does two jobs: get the token into the extension, then rewrite request headers per-URL using it. The header rewriting is where the security work actually lives, so take that first.

Inject on the app host, and only the app host

A storage.onChanged listener rebuilds the rules whenever the token changes — so token rotation is just “write the new value to storage.” It installs two rules, and the relationship between them is the whole point:

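async function updateRules() {
  // Read the values the bootstrap page stored; skip until both exist.
  const { token, host } = await chrome.storage.local.get(["token", "host"]);
  if (!token || !host) return;

  // dNR rules skip main_frame requests unless resourceTypes lists it,
  // and the authenticated page itself is a main_frame request.
  const resourceTypes = [
    "main_frame", "sub_frame", "stylesheet", "script", "image",
    "font", "xmlhttprequest", "media", "other"
  ];

  const rules = [
    {
      id: 2,
      priority: 50,                       // strip: must out-rank the inject below
      condition: { urlFilter: `|https://${host}/static/`, resourceTypes },
      action: {
        type: "modifyHeaders",
        requestHeaders: [
          { header: "Authorization", operation: "remove" }
        ]
      }
    },
    {
      id: 3,
      priority: 1,                        // inject: set token on any app-host request
      condition: { urlFilter: `|https://${host}/`, resourceTypes },
      action: {
        type: "modifyHeaders",
        requestHeaders: [
          { header: "Authorization", operation: "set", value: "Bearer " + token }
        ]
      }
    }
  ];
  // Replace both rules in one call so there is no window with a stale token.
  return chrome.declarativeNetRequest.updateSessionRules({
    addRules: rules,
    removeRuleIds: [2, 3]
  });
}

// Any storage write (first bootstrap or a rotation) rebuilds the rules.
chrome.storage.onChanged.addListener(updateRules);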

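We use updateSessionRules (not updateDynamicRules) deliberately: session rules live in memory and never persist to disk, so the rules themselves leave no credential behind across browser restarts. The flip side is that the rules vanish on restart, which is why the token bootstrap runs on every launch rather than once. One caveat: the token itself is still written to chrome.storage.local, which is backed by the profile directory on disk, so the no-residue property also depends on discarding that profile; chrome.storage.session is the in-memory alternative.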

The two scope boundaries the trap blurred are now explicit in this code, and they’re worth separating carefully:

Other sites: bounded by the condition. Rule 3’s condition is |https://${host}/. The | anchors to the start of the URL and ${host} is the single app host you stored, so the rule fires only on requests to that one origin. The trailing / after the host matters too: without it, a lookalike such as app.example.com.evil.test would match the same prefix. A request to any other host (a CDN domain, Google Fonts, an analytics beacon) doesn’t match, so no token is attached. The credential is bound to one origin by construction. This is the real guarantee, and it’s the one the global override threw away.

The app host’s own /static/ tree: bounded by the strip rule. Assets co-hosted under a path on the app host (/static/…) do match rule 3 (same host), so without more they’d carry the token too. Rule 2 removes it. The mechanism here is worth getting exactly right, because it’s easy to mis-state. For modifyHeaders, dNR does not pick a single winning rule the way it does for block or redirect; every matching rule is considered, in priority order. What makes the strip hold is a specific documented restriction: when a higher-priority rule removes a header, lower-priority rules can no longer modify it. So rule 2 (remove, priority 50) suppresses rule 3’s set on /static/ paths because it both removes and out-ranks it. The priority order is load-bearing and directional: if the set rule out-ranked the remove, the set would be applied first, the same restriction would then block the lower-priority remove, and the token would leak straight onto /static/. Get the ordering backwards and the bug is silent.

And be precise about what this buys even when it’s correct: those assets are public and don’t need a credential, so this is least-privilege hygiene — don’t attach a secret to requests that don’t use it. It is not a guarantee about a downstream cache. (More on that in the limits.)

Cross-origin scope is enforced by the rule’s condition; same-host hygiene by a higher-priority strip. Conflating the two is how “the token is scoped” turns into a sentence nobody can actually defend.
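
Both boundaries can be checked with a simplified model of the two rules. This is a sketch, not Chrome's matching engine: it assumes a plain left-anchored prefix match for the | anchor and reduces the priority restriction to "the highest-priority matching rule decides the header". The helper names are hypothetical.

```javascript
// Simplified model of the strip and inject rules (NOT Chrome's implementation).
function buildRules(host, token, { stripPriority = 50, injectPriority = 1 } = {}) {
  return [
    { id: 2, priority: stripPriority, prefix: `https://${host}/static/`, op: "remove" },
    { id: 3, priority: injectPriority, prefix: `https://${host}/`, op: "set", value: `Bearer ${token}` },
  ];
}

// Returns the Authorization value the request would carry, or null for none.
function resolveAuthorization(url, rules) {
  const matching = rules
    .filter((rule) => url.startsWith(rule.prefix)) // models the leading "|" anchor
    .sort((a, b) => b.priority - a.priority);
  if (matching.length === 0) return null;
  // Once the top rule sets or removes the header, lower-priority set/remove
  // operations on that header are not applied.
  const winner = matching[0];
  return winner.op === "set" ? winner.value : null;
}

const scoped = buildRules("app.example.com", "demo-token");
console.log(resolveAuthorization("https://app.example.com/docs/intro", scoped));      // token
console.log(resolveAuthorization("https://app.example.com/static/site.css", scoped)); // null
console.log(resolveAuthorization("https://app.example.com.evil.test/x", scoped));     // null

// Reversed priorities: the set out-ranks the remove and the token reaches /static/.
const reversed = buildRules("app.example.com", "demo-token", { stripPriority: 1, injectPriority: 50 });
console.log(resolveAuthorization("https://app.example.com/static/site.css", reversed));
```

A check like this is cheap to keep next to the extension, because the reversed-priority failure produces no error in the browser.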

Getting the token in: a host that never resolves

One puzzle remains: the token is born outside the browser, in the process that launches Chrome, and there’s no clean API to hand a value to an extension’s service worker. So we use navigation as the channel. The driver navigates to a sentinel host that is never meant to resolve:

https://bootstrap.example.invalid/?token=<url-escaped>&host=app.example.com

A priority-100 redirect rule catches that navigation inside the browser process — before any DNS lookup or network egress — and rewrites it to a page bundled in the extension:

const tokenPageUrl = chrome.runtime.getURL('/token.html');
chrome.declarativeNetRequest.updateSessionRules({
  addRules: [{
    id: 1,
    priority: 100,
    condition: {
      regexFilter: "^https://bootstrap\\.example\\.invalid/\\?(.+)$",
      resourceTypes: ["main_frame"]
    },
    action: { type: "redirect", redirect: { regexSubstitution: tokenPageUrl + "?\\1" } }
  }],
  removeRuleIds: [1]
});

The bundled page runs a tiny script that reads its own query string and writes to chrome.storage.local, which triggers the listener above:

const params = new URL(document.location).searchParams;
const items = {};
if (params.get("token")) items.token = params.get("token");
if (params.get("host"))  items.host  = params.get("host");
chrome.storage.local.set(items, () => { document.title = "Done"; });

Because the sentinel host never resolves, there’s no server, no listener, no DNS — the token is caught and stored entirely in-process.
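
On the driver side, the one detail worth a sketch is the URL escaping. The helper below is hypothetical and the token value is made up; the pattern mirrors the redirect rule above but is written as a JavaScript RegExp purely for illustration, since Chrome evaluates regexFilter with its own engine.

```javascript
// Sentinel host from the text above.
const SENTINEL = "https://bootstrap.example.invalid/";

// Hypothetical driver-side helper: builds the bootstrap URL.
function buildBootstrapUrl(token, host) {
  // URLSearchParams percent-encodes characters such as "+", "/" and "=".
  const query = new URLSearchParams({ token, host });
  return `${SENTINEL}?${query}`;
}

// Same shape as the rule's regexFilter, as a JS RegExp for illustration only.
const sentinelPattern = /^https:\/\/bootstrap\.example\.invalid\/\?(.+)$/;

const bootstrapUrl = buildBootstrapUrl("abc+/def==", "app.example.com");
const [, capturedQuery] = bootstrapUrl.match(sentinelPattern);

// What the bundled page reads back after the redirect carries the query over.
const readBack = new URLSearchParams(capturedQuery);
console.log(readBack.get("token")); // abc+/def==
console.log(readBack.get("host"));  // app.example.com
```

Skipping the escaping is a quiet failure mode: an unescaped + in a token is read back as a space, and the resulting 401s look like an expired credential.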

Launching it

var context = await playwright.Chromium.LaunchPersistentContextAsync(string.Empty, new()
{
    Headless = false, // real headless comes from the flag below
    Args = new[]
    {
        "--headless=new",
        $"--disable-extensions-except={extensionPath}",
        $"--load-extension={extensionPath}"
    }
});

--headless=new is what lets an extension load headlessly at all; the old mode wouldn’t. The Headless = false line looks contradictory but isn’t: it stops Playwright from injecting its own legacy --headless flag (the one that blocks extensions), leaving us free to supply --headless=new ourselves. After launch, the driver hits the sentinel URL and polls document.title until it reads "Done" before navigating to real content — a small handshake that absorbs the race between “rules registered” and “first real request goes out.”
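
The handshake itself is a bounded poll. A minimal sketch follows, written in JavaScript to match the extension code rather than the C# driver; the helper name and defaults are assumptions, and the usage comment assumes a Playwright-style page object.

```javascript
// Generic polling helper with a deadline (names and defaults are illustrative).
async function waitUntil(check, { timeoutMs = 5000, intervalMs = 50 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await check()) return;
    if (Date.now() >= deadline) {
      throw new Error("bootstrap handshake timed out");
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Usage, assuming a Playwright-style `page`:
//   await page.goto(bootstrapUrl);
//   await waitUntil(async () => (await page.title()) === "Done");
//   await page.goto(realContentUrl);
```

The timeout matters as much as the poll: if the extension failed to load, the title never changes, and a render job that fails fast is easier to diagnose than one that hangs.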

Honest limits

The approach closes the loud bug, but it isn’t magic, so name the soft spots:

The token rides in a URL. That should make you wince: query strings land in history and traces. But trace where this one goes: the sentinel host never resolves, the navigation is caught in-process, and the value never crosses the network. The only residue is local (the bootstrap tab’s in-memory history, or a Playwright trace if you’ve enabled one) on a throwaway headless context you tear down. For a short-lived, rotating token that’s a low-severity, local-only artifact. If you want zero residue, keep the token out of the URL altogether, for example by opening the bundled page directly and handing it the value via postMessage.
Stripping the request header is not response-cache security. The strip rule keeps the token off /static/ requests, but whether a shared cache serves one user’s authenticated response to another is governed entirely by the response side — Cache-Control: private/no-store and the cache key. A request-header rule can’t touch that. Read the strip as hygiene, not as a cache-safety control.
Path vs. separate host. The strip rule only does anything when public assets are co-hosted under a path (/static/). If yours live on a separate hostname (cdn.example.com), the inject rule’s host-scoped condition already excludes them and the strip is redundant — match on whatever actually distinguishes your public assets.

If you take only three things: handing a headless browser a credential silently widens its blast radius to every request the page fans out to, and it passes the happy-path test — so don’t trust “it renders fine”; the inject rule’s host-scoped condition is what keeps the token off other origins — that’s the guarantee that matters; and the higher-priority strip is same-host hygiene, not a promise about what a downstream cache does with the response.

Source: notes/scoping-auth-tokens-headless-chrome-mv3.md @ 15cd198