Incremental computation without a dependency manifest

November 21, 2023

The naive way to make a computation incremental is to teach it what changed. The better way is to teach it nothing — let every value remember the inputs it read, and re-derive only the ones whose inputs moved. A correct incremental recompute is a full computation where the unchanged parts return instantly.

TL;DR

Don’t hand-maintain a dependency graph. A Recorded<T> records its own dependencies at runtime, the first time you ask for its value, by watching what it reads. No input is registered, no edge is declared by hand.
The recording mechanism is an AsyncLocal call stack. Every Recorded and every Recorder.Read pushes itself while its factory runs; whatever runs inside attaches as a child. The graph is the call tree, captured.
Invalidation is a tree walk comparing change tokens. Re-evaluate a Recorded and it asks each child “did you change?”. The leaf decides: a Read reports change when its token no longer Equals its recomputed token. The interior aggregates: a Recorded reports change if any child did. No change ⇒ replay the cached value, recurse no further.
Identity, not value, scopes a run. A fresh BeginScope() per run is the cache key. Same Recorded instances, new scope ⇒ everything is rechecked exactly once; reference-equal scope ⇒ memoized.
The whole thing is ~250 lines. No graph database, no persisted object store, no content hashes on disk. It lives entirely in memory — enough to turn a full recompute over thousands of inputs into a near-instant refresh when a single input changes.

The version a reasonable person writes first

Take any engine that turns inputs into outputs and gets asked to recompute when something changes — a build, a query planner, a spreadsheet. The moment “recompute everything” gets too slow, someone proposes making it incremental, and the first design is almost always a manifest. Record, for every output, the set of inputs it touched. On the next run, check each input, diff against the manifest, find the outputs whose inputs changed, recompute those. It works. It is also a second program living beside the first — one whose entire job is to stay in sync with what the computation actually reads.

That second program is where incremental engines rot. Someone adds a step that reads a config value and forgets to register it. Now an edit to that config silently produces a stale output, and the only symptom is someone asking why their change didn’t show up. The manifest and the code drift, because nothing forces them to agree. Correctness depends on a human remembering to declare every edge, forever.

The insight that kills the manifest: the code already knows its dependencies — it reads them. You don’t need to declare what a computation depends on if you can watch it depend.

`Recorded<T>`: a memoized cell that records what it read

A note on naming: three types share one metaphor. A static Recorder watches a computation run; it captures a Recording of everything that computation read; and the memoized cell whose reads were captured is a Recorded<T>. (Agent, artifact, result — same verb, three forms.) One thing worth stressing: a Recorded<T> is pull-based. It does nothing when its inputs change; it does its work when you read it, re-validating its recording and recomputing only if something moved. There is no push, no callback that fires on change — the change is noticed lazily, on the next read.

Here is the entire public surface:

public class Recorded<T>
{
    private readonly Func<T> _valueFactory;

    public Recorded(Func<T> valueFactory) => _valueFactory = valueFactory;

    public T Value { get; }  // computes once, then replays until a dependency changes
}

You wrap a computation in a Recorded. The first time someone reads .Value, it runs the factory and caches the result. Every subsequent read returns the cached value — until one of the inputs that factory read has changed, at which point it recomputes. That’s it. From the caller’s side there is no incremental machinery at all; there is just a value that happens to be cheap to ask for twice.

If that shape feels familiar, it’s Lazy<T> with one extra power. Lazy<T> also computes once on first .Value and caches the result — but it caches forever, because it has no notion of an input that could move. Recorded<T> is the same pull-based cell that learns what its factory read, so it can notice when the cached value goes stale and recompute. The rest of this note is, essentially, how a Lazy<T> earns the right to invalidate itself.

Internally a Recorded caches two things, and keeping them straight makes the rest of this note easy to read:

the result — the T the factory returned, stored in _value.
the recording — a Recording node holding what the factory read (its child dependencies and their change-tokens), stored in _recording, so a later read can ask “is this still valid?” before trusting _value. The result is the answer; the recording is the proof you can still trust it.

The interesting part is how it knows what “an input that changed” means, given that nobody told it.

Dependency tracking: reads announce themselves on a call stack

The graph has exactly three kinds of node, and the rest of this note is just filling in what each one does. An interior Recorded<T> is a memoized cell; it has no opinion of its own and is stale only if something under it moved. A leaf Read is the only node that touches the outside world — it captures a cheap change token and is the sole place a real “did this move?” comparison happens. A leaf Write is a recorded side effect — it never causes invalidation, it just needs to re-fire when its branch is reused. Interior nodes aggregate; leaves decide (a Read) or act (a Write). Build the tree first, then mark the leaves, then replay the side effects — that’s the order below.

The bookkeeping is done by a static Recorder. We’ll build it up one concern at a time — it’s a partial class, and each section below adds the members it needs. The first concern is the call stack itself, plus the two primitives that push and pop it:

public static partial class Recorder
{
    private static readonly AsyncLocal<ImmutableStack<IRecording>> s_callstack = new();

    // push the node onto the current stack
    internal static void BeginRecording(IRecording recording)
    {
        var stack = s_callstack.Value ?? ImmutableStack<IRecording>.Empty;
        s_callstack.Value = stack.Push(recording);
    }

    // pop the node, then attach it as a child of whatever is now on top
    internal static void EndRecording(bool attachToParent = true)
    {
        var stack = s_callstack.Value;
        s_callstack.Value = stack = stack.Pop(out var completed);
        if (attachToParent && !stack.IsEmpty)
            stack.Peek().AddChild(completed);   // <-- the edge, recorded automatically
    }
}

AsyncLocal is the async-aware cousin of [ThreadStatic]: the value is bound to the logical operation — the ExecutionContext the runtime captures and restores at every await and Parallel.ForEach body — not to the thread it happens to run on. So it flows down with the work even as it hops threads, and a pooled thread picking up a new operation starts from that operation’s context, never the leftover value of whatever ran on the thread before. That property is what makes this safe in a computation that is aggressively parallel.

BeginRecording/EndRecording are the entire tree-wiring mechanism, and they’re worth reading closely. Begin is the easy half: push the node onto the current stack. End is the mirror — pop the node back off, and, this is the whole trick, attach it as a child of whatever is now on top, the parent it ran inside of. Nobody declared that parent-child edge. It exists because one read happened lexically inside another’s factory. The dependency graph is the call tree, and the call tree records itself. (We’ll see in a moment why the edge is added here, on pop, rather than the moment a child is read.)

When a Recorded computes, it pushes a node, runs the factory, and pops:

var recording = new Recording();
Recorder.BeginRecording(recording);
try
{
    _value = _valueFactory();    // anything read in here attaches to `recording`
    _recording = recording;      // publish the recording — only after the factory succeeded
    return _value!;
}
finally
{
    Recorder.EndRecording(attachToParent: recording.HasChildren);
}

The recording starts empty, but by the time the finally runs the factory has finished — and everything trackable it read (a Read, or an inner Recorded that itself read something) has already attached itself as a child by the same rule, one level down (that’s the “anything read in here attaches” comment). So recording.HasChildren is really asking, after the fact, did this computation read anything trackable? If it did, we attach it to its own parent so the parent re-checks it on invalidation. If it read nothing, it has no dependencies, can never go stale, and is pruned rather than wired into the graph as dead weight — it still computed and cached its value, it just isn’t worth re-checking.

The leaves of the graph are Recorder.Read calls — the points where the computation touches the outside world (a file’s bytes, an mtime, an env var, a row in a table). You call Recorder.Read from inside a Recorded’s factory, exactly where you’d otherwise read the input directly, and by the call-stack rule it attaches as a child of the Recorded it ran inside:

var output = new Recorded<byte[]>(() =>
{
    // this read is a leaf; it attaches to the Recorded above by the call-stack rule
    var bytes = Recorder.Read(() => File.ReadAllBytes(path));
    return Transform(bytes);
});

(What a Read captures, and how a leaf decides it changed, is the next section — here it’s enough that the read attaches itself as a leaf.)

This is where the attachToParent: recording.HasChildren gate from the factory pays off — and why EndRecording adds the edge on pop, rather than the moment a child is read, which would be the obvious place. The prune decision needs information that doesn’t exist yet at read time: whether the child itself read anything. Attaching eagerly would wire every constant leaf permanently into its parent’s subtree, and every later invalidation walk would pay to descend into nodes that can never change. Deferring the edge to EndRecording — after the child’s factory has run and its child-count is known — is what makes that pruning possible at all. (It also keeps a factory that throws from leaving a half-finished node in the graph: the parent publishes its _recording only on success, inside the try.)

flowchart TD
    P["Recorded&lt;Output&gt;<br/>compute(id)"] --> R1["Read: input bytes<br/>token = mtime"]
    P --> T["Recorded&lt;PartA&gt;<br/>derive()"]
    P --> L["Recorded&lt;PartB&gt;<br/>derive()"]
    T --> R2["Read: dependency bytes<br/>token = mtime"]
    L --> R3["Read: referent exists<br/>token = bool"]
    classDef watch fill:#dbeafe,stroke:#2563eb,color:#1e3a5f;
    classDef read fill:#dcfce7,stroke:#16a34a,color:#14532d;
    class P,T,L watch;
    class R1,R2,R3 read;

Invalidation: the leaf decides, the interior node aggregates

On the next run, you read the same Recorded.Value. Before recomputing, it checks whether its recording still holds:

var currentRecording = _recording;
if (currentRecording != null && !currentRecording.HasChanged())
{
    Recorder.AttachToParent(currentRecording);  // re-attach to whoever's asking now
    currentRecording.Replay();                 // re-emit any side effects
    value = _value!;
    return true;                               // cache hit — factory never runs
}

HasChanged() is polymorphic, and the two node kinds answer it in opposite ways. Start at the leaf, because the leaf is the only node that actually looks at anything.

The leaf decides. A Read is the one node that captures a change token — a value that stands in for “did this input move?” In the common case you don’t supply the token at all; you just wrap the read, and the value becomes its own token:

public static partial class Recorder
{
    public static T Read<T>(Func<T> valueFactory)
    {
        var recording = new ReadRecording<T>(valueFactory);  // valueFactory doubles as token factory
        BeginRecording(recording);
        try
        {
            var result = valueFactory();
            recording.ChangeToken = result;   // remember what we saw
            return result;
        }
        finally { EndRecording(); }
    }
}

The single-arg Read passes valueFactory straight through as the token factory, so the recording stores the returned value as its ChangeToken. On the next run, that’s the only place the engine touches the outside world to ask “did this move?” — it recomputes the token and compares it to the frozen one:

// ReadRecording<T>
internal T? ChangeToken { get; set; }                 // frozen: what we saw last run
public bool HasChanged() => !Equals(ChangeToken, _changeTokenFactory());
//                                  ↑ stored token    ↑ re-runs valueFactory now

Here _changeTokenFactory is valueFactory, so this re-runs the value and compares it to the value seen last time. Correct by construction, and exactly right when the value is cheap.

When the value is expensive, a second overload lets you supply a cheap proxy as the token — read a file’s last-write-time as the token, run the costly read only when that token moved. The HasChanged() line is identical; the only difference is that _changeTokenFactory is now the cheap token factory instead of valueFactory:

Recorder.Read(
    valueFactory:       () => File.ReadAllBytes(path),       // the value
    changeTokenFactory: () => File.GetLastWriteTimeUtc(path) // the token
);

The interior node aggregates. A Recorded has no token of its own; its HasChanged() just recurses into its children and reports change if any of them did. If a file’s mtime is unchanged, that Read leaf reports no change; if every leaf under a Recorded reports no change, the Recorded is clean and replays its cached value without ever running the factory again. A Recorded with thousands of clean descendants returns in the time it takes to walk a tree of Equals calls.

When the subtree is large, that walk parallelizes — past a small threshold it fans the children out across cores and short-circuits the moment any one reports a change:

Parallel.ForEach(_children, (child, loop) =>
{
    if (child.HasChanged()) { result = true; loop.Break(); }
});

The asymmetry is the whole point. Detecting that nothing changed is cheap and parallel; only the parts that did change pay the cost of recomputing. A full computation and an incremental recompute run the same code — the incremental one just gets cache hits on every Recorded whose inputs held still.

Non-pure functions: side effects ride along via `Write` and `Replay`

Not every computation is pure. Some accumulate side state as they run — beyond returning a value, they build up an index, a diagnostics log, a set of derived edges. If a clean Recorded skips its factory, those writes would vanish on an incremental run — the output would be missing state that a full run produced. So writes are recorded too, and replayed on a cache hit.

Recorder.Write is the next member of the partial class — structurally a twin of Read, minus the change token: push a node, run the action inside it, pop. Recording the action is the whole point — it’s what lets a cache hit re-fire the side effect later.

public static partial class Recorder
{
    public static void Write(Action action)
    {
        var recording = new WriteRecording(action);
        BeginRecording(recording);
        try { action(); }
        finally { EndRecording(); }
    }
}

You call it from inside a factory wherever you’d otherwise mutate shared state directly:

Recorder.Write(() => _diagnostics.Value.Add(warning));

A WriteRecording reports HasChanged() => false (a write is never itself a reason to invalidate) and, on replay, re-runs its action:

public bool HasChanged() => false;
public void Replay() => _action();

So when a Recorded replays a cached value, ReplayCore() walks the subtree and re-fires every recorded Write — serially for a small subtree, and (past the same fan-out threshold as the change-check) in parallel for a large one, so replayed writes must not depend on firing order. The incremental run reconstructs the same accumulated side state a full run would have — it just skips recomputing the values that fed them. The output is identical; only the work differs.

With Write and Replay in hand, the two recursive walks line up. Both traverse the same child tree, dispatched polymorphically through the Recording node types, and a single cache hit drives both — HasChanged() first to decide, then Replay() to re-emit — each bottoming out differently per leaf:

flowchart TD
    V["Recorded.Value (next run)"] --> HC["recording.HasChanged()"]
    HC -->|"any child changed"| MISS["miss: run factory, recompute"]
    HC -->|"all children clean"| RP["recording.Replay()"]
    RP --> RC["ReplayCore() — serial for a small subtree, else Parallel.ForEach"]
    RC --> C1["child.Replay()"]
    C1 -->|"Recorded node"| RC
    C1 -->|"WriteRecording"| WA["_action() — side effect re-fires"]
    C1 -->|"ReadRecording"| NO["{ } — no-op"]
    classDef hit fill:#dcfce7,stroke:#16a34a,color:#14532d;
    classDef miss fill:#fee2e2,stroke:#dc2626,color:#7f1d1d;
    class RP,RC,C1,WA,NO hit;
    class MISS miss;

HasChanged() short-circuits the moment one child reports change; Replay() is scope-guarded so a node re-emits at most once per run (a diamond doesn’t double-fire a shared Write). Reads replay to nothing — their job was the token check — so the only leaf that does anything on replay is a Write.

Scopes: reference identity is the cache key

So far the engine runs once. But picture it living inside a long-running server — a watch process, a language server — that must re-run the whole computation whenever something on disk moves. It needs a way to say “everything you cached is from the last pass; this is a new pass, re-check it all” — without throwing away the graph and rebuilding it. That signal is a scope: a fresh object minted per run — one rebuild pass, one “files changed, recompute” event — and the final pair of partial class members manage it.

public static partial class Recorder
{
    private static readonly object s_defaultScope = new();
    private static readonly AsyncLocal<IDisposable?> s_scope = new();   // static field, but per-run value

    // start a new run; dispose to end it. nesting is a bug, so it throws.
    public static IDisposable BeginScope()
    {
        if (s_scope.Value != null)
            throw new InvalidOperationException("Cannot start a nested scope.");
        return s_scope.Value = new DelegatingDisposable(() => s_scope.Value = null);
    }

    // the current run's identity — falls back to a process-wide default
    internal static object GetCurrentScope() => s_scope.Value ?? s_defaultScope;
}

GetCurrentScope returns a bare object; nothing reads its contents, only its reference identity. That identity is what makes one run “the same run” and the next one “a new run” — not the values. The field holding it is static, but its type is AsyncLocal — the same async-aware isolation the call stack relied on earlier — so two runs racing in the same process (two requests in a language server) each see their own scope through the one shared field; “static” here means process-wide reachable, not process-wide shared value. Each Recording stamps itself with the scope it was last checked in:

public bool HasChanged()
{
    var scope = Recorder.GetCurrentScope();
    if (scope == _hasChangedScope)   // already answered this scope — reuse
        return _hasChanged;

    lock (_children)
    {
        if (scope == _hasChangedScope) return _hasChanged;
        _hasChanged = HasChangedCore();   // walk children once
        _hasChangedScope = scope;
        return _hasChanged;
    }
}

Within a single run (one scope object), a node computes its changed-ness exactly once and memoizes the answer — a diamond-shaped graph doesn’t re-walk shared subtrees per parent. Open a new run with using (Recorder.BeginScope()) and you mint a fresh scope object for the duration of the block; now scope != _hasChangedScope everywhere, every node re-checks its tokens exactly once, and the whole graph refreshes in a single coherent pass before the using disposes it. That’s exactly the server loop: on each file-change notification, wrap the re-run in one BeginScope, and the same Recorded instances re-validate against the new scope — clean ones replay, dirty ones recompute. Same Recorded instances, new scope ⇒ a clean incremental pass. Reference-equal scope ⇒ pure memoization. Identity, not content, draws the boundary between runs.

Scoped<T> rounds this out: per-run mutable state (the error bag, the maps) keyed on the same scope object via a ConditionalWeakTable, so it’s collected when the run is, with no explicit teardown.

public T Value => _values.GetValue(Recorder.GetCurrentScope(), _ => _valueFactory());

You already know this engine — it renders your UI

If the three moves above feel familiar — a global “who’s computing right now” pointer, reads that auto-subscribe to it, a value that recomputes only when a tracked input moves — it’s because this is fine-grained reactivity, the same model under every modern front-end framework. Wrap a computation, read its inputs, get a cached value that invalidates itself. The vocabulary is all that changes:

This engine	Front-end equivalent	What it is
`Recorder.Read` (leaf, captures a token)	signal / ref / observable (Solid, Vue, MobX, Angular)	the tracked source of truth
`Recorded<T>` (memoized cell)	computed / derived / memo	a cached derivation that re-runs when its inputs move
`AsyncLocal` current recording	the “current observer” global (`activeEffect`, `activeConsumer`)	“whoever is computing now, attach to them”
`EndRecording` → `AddChild`	auto-subscription on read	the edge no one declared
`Recorder.Write` → `Replay`	effect (`watchEffect`, `effect`)	side effects that re-fire when deps change

The load-bearing trick is identical and has the same name in both worlds — automatic dependency tracking. It’s what separates this lineage from the “list your dependencies by hand” model: React’s useMemo(fn, [a, b]) arrays, Bazel’s declared inputs, the rotting manifest this note opened with. Outside the browser the same engine shows up in Rust’s salsa (which powers rust-analyzer), in spreadsheets (a formula subscribes to the cells it names), and in incremental compilers generally.

The one real divergence is push vs. pull, and it’s forced by the domain rather than chosen:

This engine pulls. An input changes with no callback to fire, so nothing happens until you next read .Value and the tree re-checks its tokens. Lazy, poll-on-read.
Front-end signals push. A signal.set(x) is a live write, so it eagerly marks dependents dirty through reverse subscriber links, then lets computeds recompute lazily on read. Push-pull.

Same dependency-graph skeleton, opposite trigger: a batch computation can’t be notified when its inputs move, so it polls; a UI is notified on every write, so it propagates. The closest cousins to this engine’s exact “compare a cheap token, replay if equal” check are Angular signals and MobX, which version-stamp dependencies and ask “did any of mine change since I last ran?” — structurally the same as HasChanged() memoized by scope identity.

The cache boundary is a local, runtime decision

There’s a quieter payoff to discovering edges from the call tree instead of declaring them: where you put the cache boundary is a per-cell choice you can move freely — even at runtime. Wrapping a sub-computation in a Recorded (cache it, recompute only on token change) or leaving it inline (recompute every time, hold no memory) is a one-line local edit, because no global manifest has to be kept in sync with that choice. The dependency edges re-derive themselves on the next run either way.

That makes the perf-vs-memory tradeoff tunable in place. Cache an expensive, rarely-changing derivation and you trade memory for skipped work; leave a cheap or huge-result one uncached and you reclaim the memory at the cost of recomputing it. Because the boundary is local, you can decide it per node — and even conditionally at runtime: notice a code path running hot under real load and you can wrap that one cell to start memoizing it, without re-declaring anything elsewhere. A static manifest fixes the granularity at authoring time; here the unit of caching is wherever you choose to draw a Recorded, whenever you choose to draw it.

Honest limits

This is a deliberately small design, and the smallness costs something.

It’s in-memory only. The graph and its cached values live in the process. Kill it and the next run is a cold full computation — there is no persisted object store, no on-disk content-addressed cache to warm from. (A persistent design — git-object-style content IDs, a SQLite-backed object store keyed by hash — is a natural extension and has been sketched, but it is not what ships here. Everything above is the actual in-memory mechanism, nothing aspirational.)
Change tokens must be cheap and honest. The whole model rests on Equals(token, recomputed_token) being both fast and a faithful proxy for “did the value change?”. A token that ticks without a real change forces a needless recompute; a token that doesn’t move when the input does produces a stale output. Choosing the token is the one place real judgment is required.
Re-deriving a token still touches the input. “Nothing changed” is cheap relative to recomputing values, but it is not free — you still probe every input (stat every file, ping every row version). For a tree of N inputs, detecting a no-op is O(N) token reads, not O(1). It wins because token reads are orders of magnitude cheaper than the work they gate, not because they’re zero.
Factories must be reentrancy-clean. A factory that tries to read its own Recorded.Value is a cycle, and the code throws rather than deadlock ("ValueFactory attempted to access the Value property of this instance."). The graph must be a DAG; the design doesn’t discover cycles for you, it just refuses to hang on them.
Replayed writes must be order-independent enough. Side effects are re-fired on cache hits, but into fresh per-scope state. If a write depended on observing another scope’s accumulated state, replay wouldn’t reproduce it. The model assumes writes are “add this item to this run’s collection,” which is the common case but not a universal one.

If you take only three things

Let dependencies record themselves. An AsyncLocal call stack turns “what did this computation read?” from a thing you declare into a thing you observe. The manifest that rots is the manifest you don’t have.
Make the cheap question the common one. An incremental engine spends most of its life confirming that nothing changed. Design so that confirming no-change is a parallel tree of Equals, and only genuine change pays for recomputation. Then incremental and full are the same code path.
Scope by identity, not by value. A fresh scope object per run, compared by reference, is a complete and tiny cache key — and pairs with ConditionalWeakTable to make per-run state self-cleaning. You don’t need a database to know which run you’re in.

Source: notes/incremental-computation-without-a-dependency-manifest.md @ 15cd198