CONCEPT / THESIS DEMO · AI-HUMAN INTERFACE

Intent Canvas
A shared typed artifact between humans and AI

Every conversation with an AI agent today is four translations per turn: my intent into prose, prose into the model’s graph, graph into the model’s prose, prose back into my head. Three chances for drift, every turn. This is the prototype of a different interface — one where the human and the model edit the same typed structure, and the model’s response is a diff of that structure, approved row by row, with each row citing the exact constraint it came from.

Why this exists: I spent four years at an ASIC-regulated broker watching compliance, risk, and product teams try to use AI tools on work that gets audited. The prompts got longer. The drift stayed. The problem isn’t prompt quality; it’s that prose was never meant to be a wire protocol between two systems that don’t share a vocabulary. The demo below, built on a mock 12-holding portfolio, is the shape of what I think replaces it. No real LLM call — this is a deterministic thesis demo, built so the interaction itself is the argument.
01 · The translation tax

We’re using prose as a wire protocol between two machines that don’t share a vocabulary.

Every conversation with an AI agent is a small act of translation. I have a structured intent in my head — a decision about my portfolio, a constraint on a disclosure, a sort order for a list of bonds — and I flatten it into English. The model reads the English, rebuilds a structured representation of its own, acts on that representation, and re-flattens the result back into English for me to read. Four translations per turn. Three chances for drift.

In a casual context that drift is the running joke of AI products: the model confidently gets something slightly wrong, you laugh, you retype the prompt. In finance it’s a compliance finding. A disclosure that reads as “margin requirements may change” instead of “margin requirements will change above 25% concentration” is the difference between an audit note and a headline.

The fix people reach for is better prompts. Longer prompts. System prompts with twenty bullet points. I spent two years writing those prompts for my own team at an ASIC-regulated broker. The prompts got longer. The drift stayed.

The problem isn’t that our prose is bad. It’s that prose was never supposed to be the protocol.

02 · The thesis

A shared, editable, typed artifact that the human and the model both look at — and neither owns alone.

An Intent Canvas is a structured representation of what the user is trying to do, rendered as a graph the user can see and edit. The model doesn’t read the user’s prose and guess at the graph. The model reads the graph. Every node is typed: constraint, entity, target, time-window, exclusion, preference. Every edge is labeled: applies-to, limits, requires, blocks.

When the user types a sentence — “reduce my tech concentration to under 30% by quarter-end” — the system extracts four nodes (Target: concentration, Entity: sector=tech, Constraint: ≤30%, Time-window: Q2 close) and shows them on a canvas. The user can correct any of them before the model proposes an action. Got the sector wrong? Click. Fix. Done. No re-prompting. No arguing with a chat bubble about what you meant.
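The extraction output for that sentence can be sketched as plain data. This is illustrative, not a real extractor API: the `Node` shape, ids, and the parsed deadline date are assumptions for the example.

```python
from dataclasses import dataclass

# Illustrative node shape -- the six typed kinds from the thesis,
# represented as plain data the user can inspect and edit.
@dataclass
class Node:
    id: int
    kind: str    # target | entity | constraint | time-window | exclusion | preference
    label: str
    value: str

# What the extractor might emit for:
# "reduce my tech concentration to under 30% by quarter-end"
nodes = [
    Node(1, "target",      "concentration", "portfolio concentration"),
    Node(2, "entity",      "sector",        "tech"),
    Node(3, "constraint",  "<=",            "0.30"),
    Node(4, "time-window", "deadline",      "2025-06-30"),  # hypothetical Q2 close, parsed to an absolute date
]

# Edges are labeled, not implicit: the entity is the subject of the
# target, the constraint bounds it, the target requires the window.
edges = [
    (2, "applies-to", 1),
    (3, "limits",     1),
    (1, "requires",   4),
]
```

Because the nodes are plain data, "click, fix, done" is a one-field edit rather than a re-prompt.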

The model’s response is not prose either. It’s a Semantic Diff: a set of proposed changes to the canvas, each with its own rationale citation. “I’m adding a node: Rebalance trade on AAPL, −2.1% position. Rationale: current AAPL weight 7.8%, tech sector weight 34.2%, largest contributor to over-limit.”

The human approves each proposed change with one click. No prompt engineering. No drift. Every approval is a signed row in an audit log.

03 · The loop

Five steps. No prose in the middle.

How a portfolio rebalance actually flows through the canvas — once at the start, once at the end, and nowhere else.

  1. Prose in.
     User types one sentence, or picks a preset. This is the only place prose touches the system.

  2. Graph extraction.
     A parser — deterministic rules plus a small model — turns the sentence into typed nodes on the canvas. The user sees them immediately, rendered in plain view.

  3. Graph edit (optional).
     User adjusts any extracted node. This replaces prompt engineering: if the model got “tech” wrong and the user meant “large-cap tech excluding NVDA”, the user clicks the node and edits.

  4. Semantic Diff proposed.
     The model reads the canvas (not the prose) and proposes a set of changes. Each change carries its own citation — which node in the canvas triggered it, which portfolio position it affects, what the dollar impact is, what the confidence level is.

  5. Line-by-line approval.
     User approves, rejects, or edits each proposed change. The approved diff is applied. The rejected and edited rows are logged as training signal for the next turn.

The prose step happens once. The canvas state is what persists across turns. Ten turns in, the canvas is a rich structured object; the prose log is a footnote.
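The five steps can be sketched as plain functions. Everything below is a deterministic mock in the spirit of the demo: function names and dict shapes are illustrative, and a production version would call a structured-output API inside `propose()`.

```python
def extract(sentence: str) -> dict:
    """Step 2: prose in, typed nodes out (deterministic mock)."""
    return {"nodes": [{"kind": "constraint", "label": "<=30%"}], "prose": sentence}

def edit(canvas: dict, fixes: list) -> dict:
    """Step 3: the user corrects nodes directly -- no re-prompting."""
    canvas["nodes"].extend(fixes)
    return canvas

def propose(canvas: dict) -> list:
    """Step 4: the model reads the canvas, not the prose."""
    return [{"op": "add",
             "node": {"kind": "action", "label": "rebalance"},
             "rationale_source": [0]}]   # pointer back into the canvas

def approve(diff: list, decisions: list) -> list:
    """Step 5: row-by-row approval; only approved rows are applied."""
    return [row for row, ok in zip(diff, decisions) if ok]

canvas = extract("reduce my tech concentration to under 30% by quarter-end")
canvas = edit(canvas, [{"kind": "exclusion", "label": "exclude NVDA"}])
applied = approve(propose(canvas), [True])
# The canvas dict, not the prose string, is what persists into the next turn.
```

Note where the prose stops: after `extract()` returns, nothing downstream reads the sentence again.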

04 · The demo (live)

“Reduce my tech concentration.”

Below is a working Intent Canvas on a mock 12-holding portfolio. Pick one of the three preset prompts (or type your own sentence), watch the graph extraction happen, edit the nodes if the extractor got anything wrong, and review the proposed Semantic Diff. This is deterministic — no real LLM call, no API key. The mock mirrors the structural behaviour described above. The demo isn’t the point; the shape of the interaction is.

Step 1

Prose in

Pick a preset, or type your own sentence.

Step 2

Intent Graph — editable

Click any node to edit its label. Drag to reposition. This is the shared artifact the model will read.

Pick a preset above, or type a sentence and press Extract to canvas. Nodes will appear here.

• Target: what is being changed.
• Entity: the subject (sector, ticker, book).
• Constraint: the rule (≤, ≥, Reg T…).
• Time: the horizon.
• Exclusion: a negation.
• Preference: a soft hint.
Step 3

Semantic Diff — the model’s response

Each row carries a rationale citation, an evidence block, and a confidence score. Approve, reject, or edit row by row.

Once the canvas is populated, press Propose Semantic Diff above to see the model’s proposed changes.

Step 4

Portfolio — before and after

12 mock holdings. Applying the approved diff changes the sector weight totals. No real orders placed — pressing Apply logs to console.

Before: Tech 34.2% · After (if approved): Tech —
      05 · Intent Graph — anatomy

      Six node types, four edge types. No free-text fields on the graph itself.

      The canvas is deliberately small. The six node types below cover roughly 90% of the product-rebalance and disclosure-edit intents I saw in four years at ACY. The remaining 10% falls back to a free-text Note node that the model is told to treat as informational, never authoritative.

      • Target

        The thing being changed. “Portfolio concentration”, “Disclosure wording”, “Sort order”. There is exactly one target per active intent.

      • Entity

        The subject. “Sector=tech”, “Order book=equities”, “Client=institutional”. Typed, not free text.

      • Constraint

        The rule. “≤30%”, “Reg T compliant”, “No hedge funds as counterparty”. Machine-readable predicate.

      • Time-window

        The horizon. “Quarter-end”, “Before market open”, “By Friday”. Parsed to an absolute timestamp at extraction.

      • Exclusion

        A negation. “Exclude NVDA”, “Not including options”, “Minus Treasury positions”.

      • Preference

        A soft hint. “Prefer lower tax-lot impact”, “Favour liquid names”. Influences but does not force.

      And four edge types:

      • applies-to — an entity is the subject of a target.
      • limits — a constraint bounds a target.
      • requires — one node is a precondition for another.
      • blocks — an exclusion prevents a target.
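The closed vocabulary above is small enough to write down as enums, with one invariant per edge kind. This is a sketch of the type system, not a production schema; the `validate_edge` rules are my reading of the edge definitions above.

```python
from enum import Enum

# Six node kinds, four edge kinds, no free text on the graph itself.
# (The Note escape hatch deliberately sits outside this enum -- the
# model treats it as informational, never authoritative.)
class NodeKind(Enum):
    TARGET = "target"
    ENTITY = "entity"
    CONSTRAINT = "constraint"
    TIME_WINDOW = "time-window"
    EXCLUSION = "exclusion"
    PREFERENCE = "preference"

class EdgeKind(Enum):
    APPLIES_TO = "applies-to"
    LIMITS = "limits"
    REQUIRES = "requires"
    BLOCKS = "blocks"

def validate_edge(kind: EdgeKind, src: NodeKind, dst: NodeKind) -> bool:
    """One invariant per edge kind, e.g. only constraints may 'limit'."""
    rules = {
        EdgeKind.APPLIES_TO: src == NodeKind.ENTITY and dst == NodeKind.TARGET,
        EdgeKind.LIMITS:     src == NodeKind.CONSTRAINT and dst == NodeKind.TARGET,
        EdgeKind.BLOCKS:     src == NodeKind.EXCLUSION and dst == NodeKind.TARGET,
        EdgeKind.REQUIRES:   True,  # any node may be a precondition for another
    }
    return rules[kind]
```

Rejecting an ill-typed edge at construction time is what keeps "no free-text fields on the graph" enforceable rather than aspirational.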
      06 · Semantic Diff — anatomy

      A diff is not a message. It’s a set of proposed graph mutations with citations.

      Every row has the same shape, whether it adds a node, removes a node, or mutates an existing one. The format borrows from git diff — with every change annotated by why it was proposed, not just what it does.

      + Add node: Rebalance trade
        kind:            action
        target:          AAPL
        delta:           −2.1% position (−$64,300)
        rationale-source: node#3 (Constraint ≤30%), node#1 (Target: concentration)
        evidence:        current AAPL 7.8% of portfolio;
                         tech sector 34.2%; over-limit by 4.2pp
        confidence:      0.91
        status:          [approve] [reject] [edit]
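The row above can be modeled as a single typed structure. Field names mirror the format shown; the exact names and the `op` vocabulary are illustrative choices, not a fixed spec.

```python
from dataclasses import dataclass

@dataclass
class DiffRow:
    op: str                  # "add" | "remove" | "mutate"
    node_label: str          # e.g. "Rebalance trade"
    kind: str                # e.g. "action"
    target: str              # e.g. "AAPL"
    delta: str               # e.g. "-2.1% position (-$64,300)"
    rationale_source: list   # node ids in the canvas, e.g. [3, 1]
    evidence: str
    confidence: float
    status: str = "pending"  # "pending" | "approved" | "rejected" | "edited"

# The example row from the text, as data rather than display format.
row = DiffRow(
    op="add", node_label="Rebalance trade", kind="action", target="AAPL",
    delta="-2.1% position (-$64,300)",
    rationale_source=[3, 1],
    evidence="current AAPL 7.8% of portfolio; tech sector 34.2%; over-limit by 4.2pp",
    confidence=0.91,
)
```

Every row starts as `pending`; approval is a state change on the row, not a new message in a thread.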

      This format matters because it makes the model’s reasoning auditable at the granularity of a row. When a compliance officer asks “why did the model propose this trade”, the answer isn’t a paragraph. It’s two pointers into a typed graph plus the specific dollar numbers that triggered the action. No prose paraphrase. No re-interpretation.

      Rejected rows don’t disappear. They’re logged with the reject reason, which becomes input to the next session’s extractor tuning. A human who says “no, don’t touch NVDA” is teaching the next extractor to add an exclusion node automatically.

      The audit log is a flat table of approved rows. Each row traceable to a node. Each node traceable to the prose sentence that created it. The chain is complete in either direction.
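That two-way chain is mechanical enough to sketch. All ids, names, and timestamps here are invented for illustration; the point is that the trace is a dictionary walk, not an interpretation exercise.

```python
# Prose sentences are logged once, at extraction time.
prose_log = {0: "reduce my tech concentration to under 30% by quarter-end"}

# Each canvas node remembers which sentence created it.
nodes = {
    1: {"kind": "target",     "label": "concentration", "from_prose": 0},
    3: {"kind": "constraint", "label": "<=30%",         "from_prose": 0},
}

# Each approved row remembers which nodes triggered it, plus who and when.
audit_log = [
    {"row_id": 7, "op": "add", "rationale_source": [3, 1],
     "approved_by": "j.chen", "approved_at": "2025-04-02T10:14:09Z"},
]

def trace(row: dict) -> list:
    """Walk one approved row back to the sentence(s) that caused it."""
    return sorted({prose_log[nodes[n]["from_prose"]] for n in row["rationale_source"]})
```

Running `trace` on the logged row returns the single originating sentence; walking the other direction (sentence to nodes to rows) is the same lookup inverted.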

      07 · Why this matters for finance

      Four regulatory primitives the canvas satisfies that a chat transcript doesn’t.

      None of these regulations mention AI. They mention records, rationales, and approvals. The canvas format maps directly; a prose chat transcript doesn’t.

      FINRA Rule 2111

      Suitability

      A suitability decision needs a documented match between a customer profile and the recommended action. The canvas stores the customer profile as nodes; the Semantic Diff stores the recommended action with citations back to those nodes. The canvas is the suitability audit trail by construction, not by retrofit.

      MiFID II Art. 24

      Best execution

      Article 24(1) requires firms to act in the client’s best interest. In UI terms that means the system must show the client the rationale for every recommended action in a form they can verify. The Semantic Diff row — with its evidence block and rationale-source pointers — is exactly that form.

      SEC Rule 17a-4

      Record-keeping

The rule requires trading recommendations and their basis to be retained in an immutable form for three to six years. The canvas state plus the approved diff log is a structured, replayable record. A prose chat transcript satisfies retention; it doesn’t give you replayability.

      SOX §302

      Internal controls

      Every material financial decision needs a signed chain of approvals. The line-by-line approval model on the Semantic Diff is a signed chain of approvals — each approved row is who-approved-what-when, in a form a SOX auditor can walk without asking follow-up questions.

      08 · Where the numbers land

      Three separate cost lines the canvas compresses.

      These aren’t marketing numbers. They’re the three cost lines I’d defend in a technical review, with the method note attached.

      Token cost per turn

      Roughly 40–60% lower

      The prose-only protocol has to carry the full chat history into each turn to maintain context. With a canvas, only the canvas state (typed nodes, usually under 500 tokens for a complex rebalance) is carried. For a 10-turn session the prose baseline accumulates ~8,000 tokens of rolling history; the canvas baseline carries ~500 tokens of state plus ~200 tokens of new input per turn.

      Method Rough estimate against a typical 10-turn rebalance session with commercial model pricing at the time of writing. Production variance will be higher; this is the directional claim.
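The quoted figures reduce to back-of-envelope arithmetic. This sketch assumes linear history growth (an assumption, not a measurement) and counts input tokens only, so it lands above the deliberately conservative 40–60% headline.

```python
turns = 10
history_growth = 800   # prose history grows ~800 tokens/turn -> ~8,000 carried by turn 10
canvas_state = 500     # typed canvas state, re-sent each turn
new_input = 200        # fresh user input per turn

# Prose baseline: each turn carries the accumulated history so far.
prose_total = sum(history_growth * t for t in range(1, turns + 1))

# Canvas baseline: each turn carries only state plus new input.
canvas_total = turns * (canvas_state + new_input)
```

Under these round numbers the session totals are 44,000 vs 7,000 input tokens; real sessions with output tokens and shorter histories compress the gap, which is why the claim stays directional.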

      Hallucination rate

      Near-zero on structured outputs

The model is never asked to produce free-text rationales — only to mutate a typed graph with a constrained schema. Structured-output prompting (OpenAI response_format: json_schema, Anthropic tool use, Google controlled generation) ships with schema validation; malformed responses are retried at the API layer before reaching the user. The remaining error surface is semantic (a wrong sector classification), not structural (a fabricated company name).

      Method Observed pattern from structured-output API documentation; the structural error class is eliminated by validation at the transport layer, leaving only semantic misclassification as the residual.
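The retry-at-the-transport-layer pattern looks like this. A minimal sketch with a hand-rolled key check standing in for real schema enforcement; a production version would lean on the provider's json_schema or tool-use validation instead. The mock responses are invented for the example.

```python
import json

# Minimal stand-in for a schema: required keys and their types.
SCHEMA_KEYS = {"op": str, "target": str, "confidence": float}

def is_structurally_valid(raw: str) -> bool:
    """Reject non-JSON and rows missing required typed fields."""
    try:
        row = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return all(isinstance(row.get(k), t) for k, t in SCHEMA_KEYS.items())

def call_with_retry(model_call, max_retries: int = 3):
    """Retry malformed responses so only valid rows reach the user."""
    for _ in range(max_retries):
        raw = model_call()
        if is_structurally_valid(raw):
            return json.loads(raw)
    raise ValueError("model never produced a schema-valid row")

# Mock model: fails structurally once, then succeeds.
responses = iter(['{"op": "add"}',
                  '{"op": "add", "target": "AAPL", "confidence": 0.91}'])
row = call_with_retry(lambda: next(responses))
```

The user never sees the first, malformed response; what survives validation can only be wrong semantically, which is exactly the residual error class described above.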

      Compliance review time

      Roughly 3–5× faster

      At ACY I watched compliance review a chat-log-style disclosure session. The reviewer read the prose, reconstructed the decision tree, compared it to the source rule. ~45 minutes per session. With a Semantic Diff log, the reviewer scans the approved-row table — each row already citing the rule — in about 10 minutes.

      Method Internal observation across eight disclosure reviews between 2023–2025. Not benchmarked against production yet. The claim is directional; the mechanism (row-level citations vs prose decode) is the point.

      09 · Scope discipline

      Five things this case study is explicitly not trying to solve.

      1. 01

        Not a replacement for chat.

        Chat is fine for exploration, for reading, for loose conversation. The canvas is for action. Most products will have both surfaces.

      2. 02

        Not a universal protocol.

        The six node types and four edge types were derived from portfolio-management and disclosure-editing intents. A triage medical workflow has a different vocabulary. The pattern generalizes; the specific types don’t.

      3. 03

        Not live-trained.

        This demo is deterministic. No real LLM call, no API key, no retraining. The mock is calibrated to behave the way the thesis predicts. A production version wires the Semantic Diff format into a real structured-output API.

      4. 04

        Not a trading system.

        No real orders are placed. The portfolio is 12 mock holdings with plausible numbers. The demo stops at the approval step — Apply logs to the browser console, not to a brokerage.

      5. 05

        Not a finished product.

This is a working thesis demonstration. Six node types cover maybe 90% of my target domain; the remaining 10% lives in a free-text escape hatch that a production version would formalize. If you want to collaborate on extending the type system, reach out.

      10 · References & provenance

      Where the thinking comes from — and what I’m deliberately not drawing on.

      Direct theoretical sources
      • Donald Norman, The Design of Everyday Things — the gulf of execution and gulf of evaluation is exactly what prose-only interfaces widen, and what a shared canvas narrows.
      • Herbert Clark, Using Language — his work on common ground between interlocutors. The canvas is the physical artifact of common ground; before it existed, common ground had to be rebuilt by inference every turn.
• Jakob Nielsen, 10 Usability Heuristics, #1 “visibility of system status” — the canvas is system status made visible, with the twist that the user can edit it.
      Structured-output precedent
      • OpenAI response_format: json_schema documentation.
      • Anthropic tool-use + structured tool_result patterns.
      • Google Gemini controlled generation.
      • These are the API-level hooks that make a canvas-based agent implementable today, without model retraining.
      Finance reference material
      • FINRA Rule 2111 (suitability), FINRA.org rulebook.
      • MiFID II Directive 2014/65/EU, Article 24 — client best interest.
      • SEC Rule 17a-4, 17 CFR 240.17a-4 — record-keeping.
      • SOX §302 (15 USC §7241) — internal controls on financial reporting.
      What I’m deliberately not using
      • “AI agents” as a genre — the case study avoids that framing. Agents are an implementation detail; this case is about the interface between a human and whatever is on the other side of the wire.
      • Chain-of-thought prose traces — CoT is a training artifact, not a user-facing audit format. Auditors don’t read CoT; they read signed approvals.

      If you’re working on any of these primitives — canvas UI for AI agents, structured-output prompting patterns, auditable AI-human workflows in regulated domains — I’d like to compare notes. Email is the fastest path.

      Portfolio threads

      Where this case study sits in the larger web

      Every problem we solve for clients has multiple valid approaches — different costs, different ROI, different risk profiles. These threads show how the approach on this page compares to others in the portfolio.

Thread · Concentration, Risk & Agents
Portfolio-level math primitives — HHI, beta, VaR, regime — rendered into UI defaults and AI-assisted decision surfaces.

Thread · Regulatory Routing & Disclosure
How upstream regulation and macro prints become downstream product defaults and Legal-safe disclosure.

Thread · Evidence & Verification Discipline
How we prove design claims with data — A/B, pooled-SD, cohort analysis, and the rigor behind every number quoted on this site.