A product designer’s case study · open source, MIT

Ed Agent

The operating model, runnable.

I’m a product designer. The whole job is finding the solution to a visual problem, a product problem, the friction a user or client actually feels. When working alongside AI made my own practice fragile, that was a real problem. So I designed a solution, and the real thing runs live on this page.

9 Human-gated stages
2 Deliberation checkpoints
5 Mission squads
0 Dependencies · runs in your browser

Why a designer builds an agent

A designer’s job is to find the solution — not to add problems, and not to invent ones that were never there.

Problem-solving is the work, whatever the surface: the aesthetic problem, the product problem, the friction a user feels, the risk a client carries. AI didn’t change that job. It changed the material I was solving with.

And the new material was fragile. Hand a task to an agent and the reasoning behind it vanishes; the output can look rigorous and say nothing; a locally perfect step can quietly dig a global pit. Those aren’t hypotheticals — they’re the friction I hit in my own practice.

So I treated it like any other design problem: name it honestly, then build the solution. A way of working that keeps a person’s judgment in the loop, surfaces the right question at the right moment, and refuses to pretend it understands more than it does. That solution is Ed Agent.

The four problems below are stated plainly, not dramatised — a designer names a real problem before solving it, and never manufactures a fake one.

The four problems, named plainly

What kept breaking when AI did the work.

Real friction from real practice. Each is a problem a designer would refuse to ship around — and Ed Agent is the answer to all four.

01
Trust over authorship

You didn’t write it — so can you trust it?

Defect rates, production incidents, and the reviewer’s role all shift. The question stops being “is it correct” and becomes “should I trust this” — provenance, verification, blast radius, the gap between confidence and evidence.

02
Lost intent

Hand it all to AI and the why disappears.

Without the chain of intent and reasoning, the agent can only guess the project — never understand it. So the intent has to be captured at the door, and an unstated goal named as the first risk, not silently assumed.

03
Blind to the whole

Locally right, globally wrong.

Agents are strong on the local move and weak on the whole. The result is the technically-right, business-wrong step — perfectly correct, and quietly digging a pit. Every decision needs tracing back to the goal it was supposed to serve.

04
Confident filler

Looks rigorous. Reads professional. Says little.

AI is fluent in over-defensive work that seems sound and, on a second read, is ceremony — empty catches, grand claims with no numbers. Substance and ceremony have to be told apart, deliberately.
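
A minimal sketch of what telling substance from ceremony could look like, assuming two simple heuristics taken from the examples above: an empty `catch` block, and a sentence that makes a grand claim without any number. The name `scanSubstance`, the word list, and both regexes are hypothetical illustrations, not Ed Agent's actual scan.

```typescript
// Hypothetical heuristics; the real substance scan may work differently.
type SubstanceFlag = { kind: "empty-catch" | "unquantified-claim"; excerpt: string };

// Illustrative list of "grand" words that ought to come with a number.
const GRAND_WORDS = /\b(significantly|dramatically|massively|best-in-class|world-class)\b/i;

// Matches `catch (e) {}` or `catch {}` with nothing but whitespace inside.
const EMPTY_CATCH = /catch\s*(\([^)]*\))?\s*\{\s*\}/g;

function scanSubstance(text: string): SubstanceFlag[] {
  const flags: SubstanceFlag[] = [];

  // 1. Empty catches: error handling that handles nothing.
  const catches: string[] = text.match(EMPTY_CATCH) || [];
  for (const hit of catches) {
    flags.push({ kind: "empty-catch", excerpt: hit });
  }

  // 2. Grand claims with no numbers: split into rough sentences and flag
  //    any that use a grand word but contain no digit.
  const sentences = text.split(/[.!?\n]+/);
  for (const raw of sentences) {
    const sentence = raw.trim();
    if (sentence.length > 0 && GRAND_WORDS.test(sentence) && !/\d/.test(sentence)) {
      flags.push({ kind: "unquantified-claim", excerpt: sentence });
    }
  }
  return flags;
}
```

A heuristic like this only surfaces candidates for a second read. Whether a flagged line is really ceremony remains a human call.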

Live · running the real assessors in your browser

The console

Not a video, not a mockup. The deterministic brain, the same logic the CLI runs, is ported into this page. Drive a run, audit a diff, run the 總導師 (chief mentor) review, or call the eds-mcp engine. Everything below computes live.

ed-agent · orchestration console
real logic · live
Run · requirement → 9 stages
Audit · should I trust this?
Optimize · 總導師 review
eds-mcp · quantify a surface

Type a requirement (or pick one). Ed Agent detects the mission + squad, streams the nine stages, and stops at the FRAME checkpoint if the business intent is missing — the run stays in deliberation until you answer. Then the TRUST checkpoint runs live.
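
The pause-at-FRAME behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions: the source does not say where FRAME sits among the nine stages, so `FRAME_AFTER_STAGE` is a made-up position, and the names `RunProgress` and `driveRun` are hypothetical. The TRUST checkpoint that follows is not modelled here.

```typescript
// Sketch only: where FRAME sits among the nine stages is an assumption.
const STAGE_COUNT = 9;
const FRAME_AFTER_STAGE = 1; // hypothetical position of the FRAME checkpoint

type RunProgress =
  | { phase: "deliberation"; completedStages: number; question: string }
  | { phase: "streamed"; completedStages: number };

// Streams stages in order; pauses at FRAME when the business intent is missing.
function driveRun(requirement: string, businessIntent?: string): RunProgress {
  const intentMissing = !businessIntent || businessIntent.trim() === "";
  let completed = 0;
  for (let stage = 1; stage <= STAGE_COUNT; stage++) {
    completed = stage;
    if (stage === FRAME_AFTER_STAGE && intentMissing) {
      // The run stays here until a human supplies the intent.
      return {
        phase: "deliberation",
        completedStages: completed,
        question: `What business outcome should "${requirement}" serve?`,
      };
    }
  }
  return { phase: "streamed", completedStages: completed };
}
```

Calling `driveRun` with only a requirement returns the `deliberation` phase with an open question. Calling it again with an intent lets all nine stages stream.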

Point Ed Agent at an existing artifact — a diff, a snippet, a design note — and ask “should I trust this?” No build. State the goal and the non-goals and it will flag a technically-right, business-wrong local optimum that violates one.

The 總導師 review squad. Paste content (a deck line, a case study, marketing copy) and it runs the SOP: a blind-score diagnostic, a ban on AI-tone filler, quantify-or-flag for every claim, and no blind praise. The output is the three-part format.

Ed Agent doesn’t just talk about compliance — when the surface is regulated, it calls the real eds-mcp engine to show the numbers: the guardrail components the surface should map to, the regulatory anchors for the jurisdiction, and the WCAG token-contrast pass rate of the system it would ship on.

The assessors above are the real, deterministic Ed Agent logic, ported client-side. The eds-mcp figures are a faithful slice of the live engine (v1.16.0).

How it thinks

Nine stages. Two checkpoints. The judgment stays human.

The spine is universal; the middle five swap their content per mission. Two deliberation checkpoints sit at phase boundaries — the agent stops and surfaces the questions only a human can answer. Click any node.

The honest line

Ed Agent does not pretend to understand your business — a harness can’t. It forces the intent to be captured, runs deterministic trust / coherence / substance checks, and puts the right question at the right node. A run is shippable only when both gates are cleared and both checkpoints are closed. Surface the question; never make the call.
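
The shippable rule stated above reduces to a small predicate. A minimal sketch, assuming hypothetical shapes: the source does not name the two gates, so the gate labels are placeholders, and the checkpoint labels FRAME and TRUST are taken from the console description. `isShippable` and `blockers` are illustrative names.

```typescript
// Hypothetical shapes: the source names neither the two gates nor these fields.
interface Gate { label: string; cleared: boolean }
interface Checkpoint { label: "FRAME" | "TRUST"; closed: boolean }

interface RunVerdictInput {
  gates: [Gate, Gate];
  checkpoints: [Checkpoint, Checkpoint];
}

// Shippable only when every gate is cleared AND every checkpoint is closed.
function isShippable(run: RunVerdictInput): boolean {
  return run.gates.every((g) => g.cleared) && run.checkpoints.every((c) => c.closed);
}

// Lists what still blocks the run, so the open item is surfaced, not hidden.
function blockers(run: RunVerdictInput): string[] {
  const openGates = run.gates.filter((g) => !g.cleared).map((g) => `gate: ${g.label}`);
  const openCheckpoints = run.checkpoints
    .filter((c) => !c.closed)
    .map((c) => `checkpoint: ${c.label}`);
  return openGates.concat(openCheckpoints);
}
```

Note that the predicate never closes a checkpoint itself. It only reports the verdict, which keeps the call with the person.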

One harness, five squads

The roster swaps to the kind of work.

The mission is auto-detected from the requirement; the squad — named specialists — swaps with it. Build a feature, run a campaign, draft a contract, ship regulated finance, or review and optimize anything.
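
One way mission detection from a requirement could work is plain keyword matching. The sketch below is an assumption, not the documented detector: the `Mission` labels paraphrase the five kinds of work listed above, and the keyword table, the tie-breaking rule, and the `feature` fallback are all invented for illustration. Squad rosters are left out because the source does not list them.

```typescript
// Illustrative keyword table; the real detector's signals are not documented here.
type Mission = "feature" | "campaign" | "contract" | "regulated-finance" | "optimize";

const MISSION_KEYWORDS: { mission: Mission; keywords: string[] }[] = [
  { mission: "regulated-finance", keywords: ["kyc", "aml", "compliance", "regulated"] },
  { mission: "contract", keywords: ["contract", "agreement", "clause"] },
  { mission: "campaign", keywords: ["campaign", "launch", "audience"] },
  { mission: "optimize", keywords: ["review", "optimize", "critique"] },
  { mission: "feature", keywords: ["build", "feature", "implement"] },
];

// Picks the mission with the most keyword hits; ties go to the earlier row,
// and a requirement with no hits falls back to "feature" (an arbitrary default).
function detectMission(requirement: string): Mission {
  const text = requirement.toLowerCase();
  let best: Mission = "feature";
  let bestHits = 0;
  for (const row of MISSION_KEYWORDS) {
    const hits = row.keywords.filter((k) => text.indexOf(k) !== -1).length;
    if (hits > bestHits) {
      best = row.mission;
      bestHits = hits;
    }
  }
  return best;
}
```

Because the table is deterministic, the same requirement always maps to the same mission, which makes a wrong detection easy to reproduce and correct.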

eds-mcp · integrated, not separate

The design-system engine, driven twice over.

eds-mcp is the other half of Ed Chen’s machine layer: the institutional-finance design system exposed as an engine — 65 component contracts across 14 regulated domains, 88 regulatory frameworks, tokens-first and dual-theme. Ed Agent drives it twice:

① The regulated-finance mission uses it to build: scaffolding compliant, token-correct, accessible UI.
② The optimize mission calls it to quantify a surface under review: mapping guardrail components, naming the jurisdiction’s anchors, and reporting the WCAG token-contrast pass rate.

That is the difference between asserting “accessible and compliant” and showing it. Try it live in the console’s eds-mcp tab.

Live slice · eds-mcp v1.16.0
65 Component contracts
14 Regulated domains
88 Regulatory frameworks
18 Contrast pairs · 3 below AA

The same numbers the console returns — pulled from the real engine, not asserted.
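
For readers who want to see how a token-contrast pass rate can be computed, here is a minimal sketch using the WCAG 2.x relative-luminance and contrast-ratio formulas. It assumes the AA threshold for normal-size text (4.5:1); the source does not say which AA threshold eds-mcp applies. The function names and the sample colours are illustrative and are not eds-mcp tokens. With the slice above, 18 pairs with 3 below AA would give 15 of 18, about 83%.

```typescript
// WCAG 2.x relative luminance and contrast ratio for sRGB hex colours.
// Sample colours used with these helpers are made up; they are not eds-mcp tokens.

// Converts one 0-255 sRGB channel to its linear-light value.
function linearizeChannel(value8bit: number): number {
  const c = value8bit / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a "#rrggbb" colour.
function relativeLuminance(hex: string): number {
  const r = parseInt(hex.slice(1, 3), 16);
  const g = parseInt(hex.slice(3, 5), 16);
  const b = parseInt(hex.slice(5, 7), 16);
  return 0.2126 * linearizeChannel(r) + 0.7152 * linearizeChannel(g) + 0.0722 * linearizeChannel(b);
}

// Contrast ratio is (lighter + 0.05) / (darker + 0.05), ranging from 1 to 21.
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

const AA_NORMAL_TEXT = 4.5; // assumed threshold: WCAG AA for normal-size text

// Fraction of foreground/background pairs that meet the AA threshold.
function aaPassRate(pairs: [string, string][]): number {
  const passing = pairs.filter(([fg, bg]) => contrastRatio(fg, bg) >= AA_NORMAL_TEXT).length;
  return passing / pairs.length;
}
```

As a usage example, `aaPassRate([["#000000", "#ffffff"], ["#777777", "#ffffff"]])` counts the first pair as passing and the second as falling just short of 4.5:1.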

Where design is heading

The next UX has two users: the person, and the machine.

For years the work was to remove friction for one user — a human. It still is. But there is a second user in the room now. AI systems hit their own walls: a goal they were never told, a constraint nobody set, a context they can only guess at. The seam between the two — where the machine stops and a person decides — is itself a surface to design. The deliberation checkpoints in Ed Agent are that seam, drawn on purpose. Breaking the barrier between human and machine is just UX, one layer deeper.

And the tool is never the job. A product designer doesn’t owe every problem a Figma file or a wall of code. Some of the best work is a process, a constraint, or a refusal — the deliberate choice not to build.

Because the most common real problem is a manufactured one. Over-thinking breeds pseudo-needs; over-engineering ships ceremony. A designer who actually serves the client subtracts — finds the most reasonable, most cost-effective path, even when that path is less software, not more. That, to me, is what considering the client really means.

What this tool proves about the thesis

The fix for “working with AI made my practice fragile” was not more code. It was designing the judgment seam — the moment the machine surfaces a question and a human makes the call.

And it is built to fight the manufactured problem: the substance scan flags over-defensive ceremony, the coherence check catches the technically-right, business-wrong step. It subtracts pseudo-work instead of generating more of it.

The point

This is an operating model, made runnable.

Ed Agent is the runnable form of how Ed Chen works with AI in regulated design: a small team plus an agent fleet sustaining many product lines. Agents ingest, analyse, research, produce and review at velocity; design judgment and human sign-off stay with the operator.

● Real — runs live on this page

  • Mission + squad detection from the requirement
  • Intent capture · unstated intent flagged as the #1 risk
  • Trust score — provenance, verification, blast radius, confidence-vs-evidence
  • Global coherence — the technically-right, business-wrong local optimum
  • Substance scan — substance vs over-defensive ceremony
  • AI-tone scan · blind score · no-blind-praise verdict (English + Chinese)
  • The deliberation checkpoint flow + the shippable verdict

○ Illustrative — needs the Node runtime / MCP host

  • Writing the scaffolded HTML / CSS / JS files to disk
  • The MCP server that drops into Claude / Cursor / Codex
  • The append-only Ed_agents_Claude.md memory ledger across sessions
  • The host loop that hands open questions to a host LLM and back

Velocity, not autonomy — disclosed, not hidden. The full source is on GitHub.