01 / The problem

Make human–GenAI collaboration legible.

Substrate extracts a paired, time-indexed representation of human–LLM dialogue: the user's evolving problem formulation (G_user), the model's inferred formulation (G_model), and a divergence channel capturing grounding acts and moments of silent disagreement.

This page documents how that extraction actually works — the schema, the prompt, the pipeline, the provider routing, and a real worked example. The goal is to make the algorithm legible from the outside, so the artefact can be evaluated independently of the demo.

02 / Pipeline overview

One conversation, two graphs, per-turn deltas.

For each turn t, the extractor sees the conversation up to and including t, plus G_user and G_model as they stood after turn t-1. It emits a delta: nodes to add, nodes to update, edges to add, and node ids to mark as superseded. The delta is schema-validated, applied to the running graph, and streamed back to the client as an SSE turn:snapshot event.
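
As a rough illustration of the streaming step, a validated delta could be framed as a server-sent event along these lines. The event name turn:snapshot comes from the description above; the helper name format_sse and the payload fields (turnId, delta, addedNodes) are assumptions made for this sketch, not Substrate's actual wire format.

```python
import json


def format_sse(event: str, payload: dict) -> str:
    """Frame a JSON payload as one server-sent event (text/event-stream)."""
    # Compact JSON contains no raw newlines, so it fits on a single data line.
    data = json.dumps(payload, separators=(",", ":"))
    # One "event:" line, one "data:" line, then a blank line to end the event.
    return f"event: {event}\ndata: {data}\n\n"


# Hypothetical payload: these field names are illustrative only.
frame = format_sse("turn:snapshot", {"turnId": "t2", "delta": {"addedNodes": []}})
print(frame, end="")
```

A browser client would typically subscribe with EventSource and register a listener for the turn:snapshot event name.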

Three design choices matter more than the rest:

Deltas, not re-extractions. The LLM sees the graph so far and is asked only for the change this turn produces. This preserves node identity across turns (no duplicate entities, stable ids) and dramatically reduces context size as the conversation grows.
Source spans on every node. Each emitted node carries a {turnId, start, end} span pointing to the exact substring it was extracted from. That's what powers the click-to-highlight binding in the UI and keeps the representation auditable.
Supersession, not deletion. When a later turn contradicts or replaces an earlier node, the prior node is marked superseded: true and a supersedes / contradicts edge is added. History is preserved.
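
The first and third choices can be sketched as a single function that applies a delta to the running graph. This is a minimal sketch under stated assumptions: updatedNodes and supersededNodeIds are named in the extraction prompt, while addedNodes, addedEdges, the edge fields source and target, and the in-memory graph layout are hypothetical.

```python
from copy import deepcopy


def apply_delta(graph: dict, delta: dict) -> dict:
    """Return a new graph with the delta applied; nothing is ever deleted."""
    new = deepcopy(graph)
    for node in delta.get("addedNodes", []):
        if node["id"] in new["nodes"]:
            raise ValueError(f"duplicate node id: {node['id']}")
        new["nodes"][node["id"]] = {**node, "superseded": False}
    for node in delta.get("updatedNodes", []):
        # Updates merge into the existing node, so its id stays stable.
        new["nodes"][node["id"]].update(node)
    new["edges"].extend(delta.get("addedEdges", []))
    for node_id in delta.get("supersededNodeIds", []):
        # Supersession flips a flag; the old node stays in the history.
        new["nodes"][node_id]["superseded"] = True
    return new


# Hypothetical two-node example: a later constraint replaces an earlier one.
graph = {
    "nodes": {"u1": {"id": "u1", "type": "constraint",
                     "label": "Return empty strings", "superseded": False}},
    "edges": [],
}
delta = {
    "addedNodes": [{"id": "u2", "type": "constraint", "label": "Return None"}],
    "addedEdges": [{"id": "e1", "type": "supersedes", "source": "u2", "target": "u1"}],
    "supersededNodeIds": ["u1"],
}
after = apply_delta(graph, delta)
```

Returning a copy instead of mutating in place keeps each per-turn snapshot available for replay, which is one way to preserve history; the real pipeline may do this differently.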

03 / The schema

What can appear in a graph, and why.

Node types

Goal

The overall thing being attempted in the conversation.

Subgoal

A decomposition step beneath a goal.

Constraint

A stated requirement or limit (e.g. 'keep it under 10 lines').

Entity

A concrete artefact being discussed (a file, a function, a paragraph, a table).

Assumption

An implicit belief the side is operating under. Lower confidence by default.

Decision

A commitment the side has made.

Example

Illustrative content provided to demonstrate intent.

Open question

An unresolved item the conversation has surfaced.

Edge types

Type	Shape	Meaning
refines	A → B	A is a more specific version / decomposition of B.
constrains	A → B	A is a constraint that limits how B can be solved.
depends_on	A → B	A requires B in order to be defined or executed.
contradicts	A → B	A is in direct conflict with B (dashed in the UI).
instantiates	A → B	A is a concrete instance / example of B.
supersedes	A → B	A replaces B following a revision (dashed in the UI).
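
One way to pin this vocabulary down in code is a pair of enums plus an edge check. The edge type strings match the table above; the node type strings other than "goal" and "assumption", the Edge field names, and the parse_edge helper are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class NodeType(str, Enum):
    GOAL = "goal"
    SUBGOAL = "subgoal"
    CONSTRAINT = "constraint"
    ENTITY = "entity"
    ASSUMPTION = "assumption"
    DECISION = "decision"
    EXAMPLE = "example"
    OPEN_QUESTION = "open_question"


class EdgeType(str, Enum):
    REFINES = "refines"
    CONSTRAINS = "constrains"
    DEPENDS_ON = "depends_on"
    CONTRADICTS = "contradicts"
    INSTANTIATES = "instantiates"
    SUPERSEDES = "supersedes"


@dataclass(frozen=True)
class Edge:
    id: str
    type: EdgeType
    source: str  # "A" in the table above
    target: str  # "B" in the table above


def parse_edge(raw: dict, known_node_ids: set) -> Edge:
    """Reject unknown edge types and edges that point at missing nodes."""
    edge = Edge(raw["id"], EdgeType(raw["type"]), raw["source"], raw["target"])
    for endpoint in (edge.source, edge.target):
        if endpoint not in known_node_ids:
            raise ValueError(f"edge {edge.id} points at unknown node {endpoint}")
    return edge


edge = parse_edge(
    {"id": "e1", "type": "supersedes", "source": "u2", "target": "u1"},
    {"u1", "u2"},
)
```

An unrecognised type string raises ValueError from the enum constructor, so a malformed delta fails before it touches the graph.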

Divergence channel

The divergence channel borrows the grounding-act taxonomy from Shaikh et al. (NAACL 2024; ACL 2025) and extends it with five GenAI-specific divergence types that emerged from analysing real human–LLM sessions.

Grounding acts

Clarification requested

Someone explicitly asks the other to clarify.

Clarification given

Someone answers such a request.

Acceptance

Explicit agreement ("yes, that's right").

Implicit acceptance

Continued without pushing back on a claim.

Repair

Walks back or corrects a prior statement.

Reformulation

Restates the other's content in their own words.

Divergences

Unilateral addition

Model introduced a constraint / entity / decision the user did not sanction.

Ignored constraint

User stated X; model proceeded as if X had not been said.

Scope drift

One side silently narrowed or broadened the goal.

Contradicted assumption

One side acted on a belief the other had retracted.

Premature commitment

Model committed to a specific final answer before clarifying ambiguity.
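
A sketch of how an emitted event might be sanity-checked against this taxonomy. The type, category, and severity strings are the identifiers used in the extraction prompt in section 04; the check_event helper and its error messages are hypothetical.

```python
GROUNDING_ACTS = {
    "clarification_request", "clarification_response", "acceptance",
    "implicit_acceptance", "repair", "reformulation",
}
DIVERGENCES = {
    "unilateral_addition", "ignored_constraint", "scope_drift",
    "contradicted_assumption", "premature_commitment",
}
SEVERITIES = {"low", "medium", "high"}


def check_event(event: dict) -> list:
    """Return a list of problems; an empty list means the event looks well-formed."""
    problems = []
    by_category = {"grounding_act": GROUNDING_ACTS, "divergence": DIVERGENCES}
    allowed = by_category.get(event.get("category"))
    if allowed is None:
        problems.append("unknown category")
    elif event.get("type") not in allowed:
        problems.append("type does not belong to category")
    # Only divergences carry a severity.
    if event.get("category") == "divergence" and event.get("severity") not in SEVERITIES:
        problems.append("divergence needs severity low/medium/high")
    return problems


ok = check_event({"category": "divergence", "type": "scope_drift", "severity": "high"})
bad = check_event({"category": "grounding_act", "type": "scope_drift"})
```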

04 / The extraction prompt

What the LLM is actually told.

This is the system prompt sent verbatim on every per-turn call, paired with a structured tool schema enforcing the TurnExtraction shape. The model is given the conversation, the prior graph state, and the next available node ids, and must respond by calling emit_extraction.

You are extracting structured representations from a multi-turn human-AI conversation.

Two parallel views are maintained:
- G_user: the user's evolving formulation of their problem (what THEY think the
  task is, what THEY have asked for, what constraints THEY have stated).
- G_model: the model's inferred formulation (what the MODEL has treated as the
  task, what assumptions the MODEL is operating under, what decisions the MODEL
  has committed to — including implicit assumptions visible from how the model
  responds).

You will be given:
1. The conversation up to and including a target turn N.
2. G_user and G_model as they stood AFTER turn N-1.
3. The role of turn N (user or assistant) and its content.

Your job: return the DELTAS to G_user and G_model produced by turn N, plus any
grounding acts or divergence events visible in turn N.

Rules:
1. Attribution. Only attribute a node to G_user if the user introduced or
   affirmed it. Only attribute to G_model if the model introduced it OR
   proceeded as if it were true (an implicit assumption — these get
   confidence 0.5-0.7, type "assumption").
2. Identity. If turn N refers to an entity already represented (e.g. "the
   function" referring to a function already in the graph), reference the
   existing node's id rather than creating a duplicate. Add it to updatedNodes
   with refreshed lastUpdatedAt.
3. Supersession, not deletion. If turn N supersedes or contradicts a prior
   node, add an edge of type "supersedes" or "contradicts" and list the
   superseded node id in supersededNodeIds. Never delete a node.
4. Source spans. For every added or updated node, set sourceSpan to the
   character range in turn N's content from which you extracted it. Use
   0-indexed half-open intervals [start, end). If you must approximate, pick
   a span that contains the relevant phrase.
5. Confidence.
   - 0.9-1.0 for explicit statements
   - 0.7-0.9 for direct inferences
   - 0.5-0.7 for implicit assumptions you had to read in
   - Below 0.5: do not add the node.
6. Node ids. Use the ids provided in the "next ids" block exactly — start at
   the next available number and increment. G_user nodes use prefix "u",
   G_model nodes use prefix "m", edges use "e".

Divergence events to detect (in this turn only):

Grounding acts (category="grounding_act"):
- clarification_request: someone explicitly asks the other to clarify.
- clarification_response: someone answers such a request.
- acceptance: explicit agreement ("yes, that's right").
- implicit_acceptance: continued without pushing back on something the other
  side claimed.
- repair: someone walks back or corrects a prior statement.
- reformulation: restating the other's content in their own words.

Divergences (category="divergence"):
- unilateral_addition: the model introduced a constraint/entity/decision the
  user did not sanction.
- ignored_constraint: the user stated X; the model proceeded as if X had not
  been said.
- scope_drift: one side silently narrowed or broadened the goal.
- contradicted_assumption: one side acted on a belief the other had retracted.
- premature_commitment: the model committed to a specific final answer before
  clarifying ambiguity that was clearly present.

For each event, cite the related node ids (userNodeIds, modelNodeIds), give a
specific one-sentence description (not generic), and for divergences include
severity (low/medium/high — high = it changes what the dyad is solving for).

Use the "emit_extraction" tool to return your output. Never return prose.
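
Outside the prompt itself, Rules 4 to 6 lend themselves to a mechanical check on whatever the model returns. The following is a minimal sketch, not the pipeline's actual validator: the check_node helper, its side argument, and the error strings are assumptions, while the id prefixes, the half-open span convention, and the 0.5 confidence floor come from the rules above.

```python
import re


def check_node(node: dict, side: str, turn_text: str) -> list:
    """Check one emitted node against Rules 4-6; return a list of problems."""
    problems = []
    # Rule 6: G_user ids look like u<number>, G_model ids like m<number>.
    prefix = {"user": "u", "model": "m"}[side]
    if not re.fullmatch(prefix + r"\d+", node["id"]):
        problems.append(f"id should look like {prefix}<number>")
    # Rule 4: half-open [start, end), so start is inclusive and end exclusive.
    span = node["sourceSpan"]
    if not (0 <= span["start"] < span["end"] <= len(turn_text)):
        problems.append("sourceSpan outside turn content")
    # Rule 5: nodes below 0.5 confidence should not have been emitted at all.
    if not (0.5 <= node["confidence"] <= 1.0):
        problems.append("confidence outside 0.5-1.0")
    return problems


turn_text = "keep it under 10 lines"
good = {"id": "u3", "confidence": 0.95,
        "sourceSpan": {"turnId": "t1", "start": 0, "end": len(turn_text)}}
weak = {"id": "m3", "confidence": 0.4,
        "sourceSpan": {"turnId": "t1", "start": 5, "end": 99}}
```

Here check_node(good, "user", turn_text) returns an empty list, while checking weak as a user node reports all three problems.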

05 / Worked example

Turn 2 of "Refactor my parser"

The user asked for a small empty-cell fix on a 4-line parser. The model returned a class-based rewrite with strict mode, error handling, and a switch to the stdlib csv module. Here's what the extractor sees and emits.

Input — conversation so far

t1 · USER

Hey, my Python CSV parser is treating empty cells weird. They show up as empty strings but I'd really prefer them as None so the downstream code can use `is None` checks. Can you take a look? ```python def parse_csv(text): rows = [] for line in text.strip().split("\n"): …

t2 · ASSISTANT

Great, I can see several issues with your parser beyond the empty-cell handling. Let me give you a more robust solution: ```python from typing import Optional, List from csv import reader from io import StringIO class CSVParser: def __init__(self, strict: bool = True): …

LLM output — TurnExtraction delta

Nothing changes in G_user (the user didn't speak this turn). Eight new nodes appear in G_model — the model's inferred re-framing of the task. The first node:

{
  "id": "m1",
  "type": "goal",
  "label": "Provide robust CSV parser",
  "content": "Model reframed the task as providing a more robust CSV parser addressing multiple issues beyond empty-cell handling.",
  "introducedAt": "t2",
  "lastUpdatedAt": "t2",
  "confidence": 0.95,
  "sourceSpan": {
    "turnId": "t2",
    "start": 0,
    "end": 110
  }
}
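
To illustrate how a sourceSpan can drive click-to-highlight, the sketch below resolves the span of m1 against the visible opening of turn t2. The resolve_span helper and the turns mapping are hypothetical; only the part of t2 quoted above is used, since the rest of the turn is elided on this page.

```python
# Visible opening of turn t2, copied from the conversation excerpt above.
T2_PREFIX = (
    "Great, I can see several issues with your parser beyond the "
    "empty-cell handling. Let me give you a more robust solution:"
)


def resolve_span(turns: dict, span: dict) -> str:
    """Return the substring a node was extracted from (half-open [start, end))."""
    text = turns[span["turnId"]]
    return text[span["start"]:span["end"]]


highlight = resolve_span({"t2": T2_PREFIX}, {"turnId": "t2", "start": 0, "end": 110})
print(highlight)
```

On this excerpt the span covers the opening sentence and runs through the words "a more robust", which is consistent with the prompt's allowance for approximate spans that contain the relevant phrase.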

Four divergence events also fire — the most consequential is scope drift:

{
  "id": "t2-d1",
  "type": "scope_drift",
  "category": "divergence",
  "turnId": "t2",
  "description": "Model broadened the goal from a small empty-cell-to-None tweak into a full parser redesign (class, strict mode, csv.reader, error list), expanding scope well beyond what the user asked for.",
  "userNodeIds": [
    "u1",
    "u4"
  ],
  "modelNodeIds": [
    "m1",
    "m3",
    "m4"
  ],
  "severity": "high"
}

That single event references u1, u4 from G_user (the user's goal and their "quick script, single use site" constraint) and m1, m3, m4 from G_model (the model's re-framed goal + introduced class structure + strict-mode error handling). Clicking it in the UI surfaces both sides simultaneously — that's where the substrate becomes legible.
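
The lookup behind that click can be sketched as a join from the event's id lists into the two graphs. The resolve_event helper is hypothetical, and the node labels below are paraphrased from the description above, not taken from the actual extraction output.

```python
def resolve_event(event: dict, g_user: dict, g_model: dict) -> dict:
    """Collect the nodes an event cites on each side, noting any dangling ids."""
    cited = [(i, g_user) for i in event["userNodeIds"]]
    cited += [(i, g_model) for i in event["modelNodeIds"]]
    return {
        "user": [g_user[i] for i in event["userNodeIds"] if i in g_user],
        "model": [g_model[i] for i in event["modelNodeIds"] if i in g_model],
        "missing": [i for i, graph in cited if i not in graph],
    }


# Labels paraphrased from the prose; the real nodes carry more fields.
g_user = {"u1": "user's goal", "u4": "quick script, single use site"}
g_model = {"m1": "Provide robust CSV parser", "m3": "class structure",
           "m4": "strict-mode error handling"}
event = {"id": "t2-d1", "type": "scope_drift",
         "userNodeIds": ["u1", "u4"], "modelNodeIds": ["m1", "m3", "m4"]}
both_sides = resolve_event(event, g_user, g_model)
```

Reporting dangling ids separately, instead of raising, lets a UI still render the event when the extractor cites a node that was never emitted.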

06 / Theoretical lineage

Why this representation, not another.

Clark & Brennan (1991), Grounding in Communication. The conceptual root — dialogue as the joint construction of common ground.
Shaikh, Gligorić, Khetan, Gerstgrasser, Yang & Jurafsky (NAACL 2024), Grounding Gaps in Language Model Generations; and Shaikh, Mozannar, Bansal, Fourney & Horvitz (ACL 2025), Navigating Rifts in Human–LLM Grounding. Source of the grounding-act taxonomy used in the divergence channel.
Subramonyam, Pea, Pondoc, Agrawala & Seifert (CHI 2024), Bridging the Gulf of Envisioning. Motivation for externalising the user's evolving formulation.
Schneider, Machner, Jokinen & Matthes (SIGDIAL 2024), Bridging Information Gaps in Dialogues with Grounded Exchanges Using Knowledge Graphs (BridgeKG). Closest prior art on KG-mediated dialogue annotation; Substrate generalises from closed-schema dialogues to open GenAI workflows.
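Laban, Hayashi, Zhou & Neville (ICLR 2026 Oral), LLMs Get Lost in Multi-Turn Conversation. The empirical motivation: the paper reports an average performance drop of 39% from single-turn to multi-turn settings. Substrate's premise is that part of this loss arises because conversational state isn't maintained externally, and Substrate is an artefact for maintaining it.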

Substrate is a research prototype.