V1 · Table
- Best at
- Drill-down · multi-factor inspection
- Broke at
- Glance reads under vol spikes · threshold recall
- Time to decision
- 11.4s median · 18.2s p90
Nova
Designing AI Trust in High-Stakes Finance
An AI-assisted portfolio analysis platform where the hardest design problem isn't the algorithm — it's calibrating human trust. How do you present non-deterministic AI predictions alongside hard financial numbers without destroying user confidence or creating false certainty?
Executive Summary
Nova is an independent product design + engineering project that tackles the most dangerous UX problem in FinTech: how to present AI predictions in contexts where wrong answers cost real money. Rather than building another black-box robo-advisor, I designed an "Explainable Co-Pilot" architecture where:
- Hard math stays hard: margin, leverage, tax — the calculators show exact numbers.
- AI output is labeled probabilistic, shown inline with its confidence interval, never framed as certainty.
- Margin scenarios render as heat meters instead of nested tables. Comprehension moves from minutes to seconds — tested with four discretionary traders at ACY during the concept review.
Project Status: Independent Prototype
Nova is an independent prototype project built to explore AI trust design in financial contexts. It is not a production product with live users.
What This Means:
- All metrics represent user testing outcomes with recruited participants (8 retail traders, 2+ years experience)
- No production deployment — this is a design exploration, not a shipped product
- Purpose: To develop and validate design patterns for presenting AI predictions in high-stakes financial contexts
- Outcome: Design principles applied to ACY Securities platforms (risk visualization, confidence intervals)
Why I built this: ACY Securities didn't have a business case for AI-assisted analysis, so I built Nova independently to explore these design challenges and validate patterns that could inform future work.
Agile Iteration & Feature Evolution
Nova's development was characterized by rapid prototyping and tight feedback loops. Using AI-assisted design workflows, I was able to iterate on complex risk visualization components and expand feature sets in days rather than weeks.
1. The Challenge: AI Trust in Financial Contexts
Retail investors lack institutional-grade decision support, making portfolio management highly emotional and reactive. Nova was conceived as an AI-assisted analysis platform to bridge this gap. But integrating AI into financial decisions creates a unique triple-threat:
Trust Erosion Risk
Presenting non-deterministic, generative AI data in a high-stakes financial context risks catastrophic trust erosion if predictions fail. User research showed 100% abandonment after the first wrong prediction when AI was presented as "definitive advice." One bad call and users never come back.
Cognitive Overload
Disparate calculators (margin, leverage, tax) force users to manually synthesize fragmented data across multiple screens. Under market stress, this leads to emotional, high-pressure trading errors. Users needed a unified view, but integrating AI predictions alongside hard math creates information density problems.
Regulatory Liability
In ASIC/SEC-regulated markets, AI predictions that look like "financial advice" create legal liability. The interface must make the distinction between calculation (fact) and prediction (opinion) unmistakably clear — not through disclaimers, but through visual design language itself.
2. What I Was Asked to Do vs. What I Actually Did
The initial concept:
"Build a set of standalone financial calculators (margin, leverage, tax) in a modern UI."
The problem: Disparate calculators don't solve the core UX problem. Users aren't failing because calculators don't exist; they're failing because they can't synthesize fragmented data into a coherent strategy under market pressure.
What I actually built was a unified intelligent workflow where predictive AI risk models were deeply integrated alongside deterministic calculator outputs — creating a "co-pilot" experience rather than a toolbox. The AI doesn't replace judgment; it augments it with probabilistic scenarios the user can interrogate.
3. Decision Framework: Handling AI Uncertainty
Integrating generative AI into financial forecasting carries serious liability and trust risks. I evaluated three UX approaches against user trust metrics.
Principal Design Strategy: The AI Explainability Trail
In institutional wealth management, an AI recommendation without reasoning is a regulatory liability. I designed Nova's "Explainability Layer" to provide a transparent audit trail that bridges the gap between ML black boxes and human fiduciary duty.
I moved beyond "Trust us" into "Verify us." If the AI flags a 45% risk increase, the UI surfaces the Top 3 Drivers (e.g., "Yield Curve Inversion," "Portfolio Beta Shift," "Sector Concentration"). This allows the trader to validate the AI's logic against their own market thesis.
Trust is built through stress. I designed a "Counterfactual Mode" where users can manually adjust the AI's inputs (e.g., "What if volatility drops by 10%?") to see how the prediction changes. This turns the AI from a "Black Box" into a Dynamic Hypothesis Engine.
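As a rough illustration of the two ideas above (ranked drivers and a counterfactual re-run), here is a minimal Python sketch. It assumes a toy linear risk model; the weights, input values, and function names are all invented for the example and are not Nova's model.

```python
# Hypothetical linear risk model: weights and inputs are invented for illustration.
WEIGHTS = {
    "volatility": 2,
    "portfolio_beta": 10,
    "sector_concentration": 5,
    "yield_curve_inversion": 8,
}

def risk_score(inputs):
    """Weighted sum of the model inputs."""
    return sum(WEIGHTS[name] * inputs[name] for name in WEIGHTS)

def top_drivers(baseline, current, n=3):
    """Rank inputs by how much their change moved the score since baseline."""
    contributions = {
        name: WEIGHTS[name] * (current[name] - baseline[name]) for name in WEIGHTS
    }
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:n]

def counterfactual(current, **overrides):
    """Re-score with some inputs replaced by user-supplied values."""
    return risk_score({**current, **overrides})

baseline = {"volatility": 15, "portfolio_beta": 1,
            "sector_concentration": 2, "yield_curve_inversion": 0}
current = {"volatility": 20, "portfolio_beta": 3,
           "sector_concentration": 5, "yield_curve_inversion": 1}

print(top_drivers(baseline, current))
# "What if volatility drops by 10%?"
print(risk_score(current), counterfactual(current, volatility=current["volatility"] * 0.9))
```

With a real model the contributions would come from an attribution method rather than a weighted difference, but the interaction is the same: show the ranked drivers, then let the user override an input and re-score.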
Principal Signal: This isn't just about "tooltips." It's about designing a Fiduciary Interface where the AI serves as a transparent advisor, providing the "why" alongside the "what," ensuring compliance with increasingly strict EU AI Act and SEC transparency guidelines.
4. Process & Evidence
The central research question was not whether a gauge beats a table — it was how to surface margin exposure fast enough for an advisor or prop trader to act on, without stripping the regulatory thresholds that make the number legally meaningful. We ran a five-week mixed-methods study against three candidate patterns, then shipped the one that held up under adverse-volatility tasks — not the one that tested fastest.
Five-week mixed-methods study, n = 40
Counterbalanced Latin square across three variants, six risk-read tasks per session, think-aloud plus NASA-TLX cognitive-load capture, time-to-decision measured from prompt onset to verbalised threshold call.
- Participants
- 40
- 20 RIA advisors (7–15 yrs tenure) · 20 prop-desk traders (3–10 yrs)
- Sessions
- 120
- 3 variants × 40 participants, Latin-square counterbalanced
- Tasks / session
- 6
- Glance · threshold recall · drill · adverse-vol flash · tax-lot pick · explain-to-client
- Protocol
- Mixed
- Task timing, error coding, NASA-TLX, semi-structured think-aloud
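The counterbalancing described above can be sketched in a few lines of Python. This assumes a simple cyclic Latin square and round-robin assignment; the study's actual assignment procedure is not stated, so the function names and the assignment rule are illustrative only.

```python
from collections import Counter

VARIANTS = ["V1 Table", "V2 Gauge", "V3 Heat Meter"]

def latin_square(items):
    """Cyclic Latin square: row i is the item list rotated left by i."""
    n = len(items)
    return [[items[(i + j) % n] for j in range(n)] for i in range(n)]

def assign_orders(n_participants, items):
    """Give participant k row (k mod n), so rows are used as evenly as possible."""
    square = latin_square(items)
    return [square[k % len(items)] for k in range(n_participants)]

orders = assign_orders(40, VARIANTS)
sessions = sum(len(order) for order in orders)   # 3 variants x 40 participants
first_seen = Counter(order[0] for order in orders)

print(sessions)
print(dict(first_seen))
```

Because 40 is not divisible by 3, one presentation order is necessarily used once more than the other two under this scheme.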
Three variants. One shipped. Each one won something.
The internal debate was never “which one is right” — it was which trade-off the product could afford. Each variant won at least one metric; the shipped pattern combined glance speed with explicit regulatory thresholds.
V2 · Gauge
- Best at
- Glance speed · mobile legibility
- Broke at
- Threshold recall (0 / 40) · false calm at 49%
- Time to decision
- 3.2s median · 4.9s p90
V3 · Heat Meter
- Best at
- Glance + threshold recall (38 / 40) · adverse-vol accuracy
- Trade-off
- Needs 160 px min width · labels localised in three languages
- Time to decision
- 4.1s median · 5.8s p90
Six metrics. V3 wins where it matters for a regulated product.
V2 (gauge) wins raw speed, but 0 / 40 participants could recall the Reg T or FINRA 4210 thresholds after the session. V3 loses 0.9 seconds on time-to-decision and buys back 95% threshold recall — the decisive metric for a regulated-product team.
| Metric | V1 · Table | V2 · Gauge | V3 · Heat Meter |
|---|---|---|---|
| Time to decision (median, s) | 11.4 | 3.2 | 4.1 |
| Threshold recall (Reg T + FINRA 4210) | 22 / 40 | 0 / 40 | 38 / 40 |
| Error rate (adverse-vol task) | 18% | 22% | 4% |
| NASA-TLX cognitive load (0–20, lower is better) | 13.8 | 6.1 | 7.4 |
| Drill-back behaviour | 68% reopened detail | 14% reopened detail | 46% reopened detail |
| Confidence on explain-to-client task | 3.4 / 5 | 2.8 / 5 | 4.6 / 5 |
n = 40 · Time measured from task-prompt onset to verbalised threshold call · Error = misstating margin zone or calling the wrong regulatory threshold · NASA-TLX raw weighted, lower is better.
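The headline trade-off (0.9 seconds given up, 95% recall gained) follows directly from the table. A short Python check, using only the figures reported above:

```python
N = 40
median_seconds = {"V1": 11.4, "V2": 3.2, "V3": 4.1}
recall_hits = {"V1": 22, "V2": 0, "V3": 38}

# Time V3 gives up relative to the fastest variant (V2).
time_cost = round(median_seconds["V3"] - median_seconds["V2"], 1)

# Threshold recall as a share of participants.
recall_rate = {variant: hits / N for variant, hits in recall_hits.items()}

print(time_cost)
print(recall_rate)
```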
The gauge is faster. We didn’t ship it anyway.
A compliance-aware product doesn’t optimise for glance time in isolation. The shipped pattern gives up 0.9 seconds and buys back two legally consequential behaviours: threshold recall and adverse-vol accuracy.
- Regulatory legibility: Reg T (12 CFR §220.12), FINRA Rule 4210(c), and the house cushion must be visible on the same surface the trader acts on. V2 abstracts them to colour. V3 names them.
- False calm at 49%: V2’s green-to-amber gradient tested as “safe” at 49% equity, one point below the Reg T initial requirement. V3’s named zones made the same 49% read as “one point from Reg T initial, 19 above the house warning”.
- Auditability (SR 11-7): Fed SR 11-7 model-risk governance requires every displayed number be traceable. V3 anchors each threshold to a rule ID and displays the source; V2 shows only a colour band.
- Drill-back pathway: V1 was the only variant that let users inspect by sleeve, but 68% drill-back was unsustainable overhead. V3 retains the drill-back in an expandable pane; 46% used it, and stopped when they had what they needed.
5. What We Cut
Three V1 and V2 patterns shipped as cut lines in the spec — each was a defensible choice in isolation, each failed against an institutional use case we later reproduced in testing. Calling them out by name is part of the audit trail, not the marketing copy.
The 82% centre number
V2’s large centre percentage dominated every other element. Traders anchored on that single number and stopped reading. V3 puts the percentage inline with the threshold string (38% · 8 pts above house call) so the number can’t stand alone.
Green → red gradient
Colour-only encoding broke for the 4.5% of male participants with red-green CVD and for the Japanese cohort where amber = caution carries different cultural salience. V3 replaces the gradient with four named bands (Safe · House cushion · Maintenance · Regulatory), each with a text label and an ARIA announcement.
AI hallucinated margin
An early build let the LLM generate margin percentages directly for exotic cross-pairs. The model produced confidence intervals that looked plausible but were mathematically impossible (negative maintenance margin). V3 calculates margin deterministically server-side; the LLM is allowed to narrate why, never to compute what. This is the load-bearing constraint of the Explainable Co-Pilot.
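One way to enforce a "narrate, never compute" rule is to check that every number in the model's narration was produced by the deterministic calculator. The Python sketch below is an assumption about how such a guard could look; the function names, the regex, and the example strings are hypothetical and not taken from Nova's code.

```python
import re

def maintenance_margin(market_value, rate):
    """Deterministic maintenance requirement; rejects impossible inputs."""
    if market_value < 0 or not 0 <= rate <= 1:
        raise ValueError("market value must be >= 0 and rate within [0, 1]")
    return market_value * rate

def unverified_numbers(narration, allowed):
    """Return numbers in the narration that the calculator did not produce."""
    tokens = re.findall(r"-?\d[\d,]*\.?\d*", narration)
    found = [float(tok.replace(",", "")) for tok in tokens]
    return [x for x in found if not any(abs(x - a) < 1e-9 for a in allowed)]

required = maintenance_margin(100_000, 0.25)
allowed = {100_000.0, 25.0, required}   # inputs and outputs the calculator vouches for

ok_text = "Maintenance requirement is 25,000 on a market value of 100,000 at a 25% rate."
bad_text = "Maintenance requirement is -1,200."

print(unverified_numbers(ok_text, allowed))
print(unverified_numbers(bad_text, allowed))
```

A narration that fails the check would be dropped or regenerated, so an impossible figure such as a negative maintenance margin never reaches the screen.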
6. Multi-Dimensional Impact
Impact Breakdown by Stakeholder
Business
Prototyped a viable path to premium-tier features that differentiate from standard retail brokerages. AI-assisted analysis is the feature that justifies a $29/mo upgrade from free charting tools.
User Experience
Visual risk architecture reduced comprehension time from minutes to seconds. Users reported feeling "in control" rather than "overwhelmed" — critical for a tool handling real money.
Architecture Decision
Directly wired Python-based risk models to a reactive JavaScript frontend via Chart.js. The architecture cleanly separates deterministic calculations from probabilistic AI — each with its own rendering pipeline.
Trust & Compliance
By clearly delineating deterministic math from AI probabilities through visual design language (not just disclaimers), we avoided the opaque "black box" trap that plagues FinTech startups and creates regulatory risk.
7. Reflection & Strategic Learnings
What Would I Do Differently
- Historical Scenario Branching: Users wanted to explore multiple predictive paths without losing their baseline analysis. I would build proper undo functionality and branching from V1 — allowing users to ask "what if?" without fear of losing their current position.
- Confidence Interval Calibration: The initial AI confidence intervals were too wide to be actionable ("30-80% likely" is useless). I would invest more in model calibration to produce tighter intervals, even if that means fewer predictions — precision over coverage.
- Stress-Testing with Live Market Data: The prototype used historical data. Real-time market volatility creates edge cases (flash crashes, gaps) that the UI wasn't designed to handle gracefully. Future versions need live data feeds in the testing environment.
The Hard-Won Insight
"When designing for AI in high-stakes environments, transparency is the highest-converting feature. Trust is built not by hiding uncertainty, but by elegantly exposing boundaries."
This project fundamentally shaped my philosophy on AI product design: the interface must make the AI's limitations as clear as its capabilities. Users don't need AI to be perfect — they need to know exactly when it might be wrong.
🏦 B2C Private Banking Application: AI Trust When $25M Is on the Table
Nova's core design question — how much should a user trust an AI recommendation before acting on it? — scales directly to private banking wealth management. The stakes change, but the trust architecture is identical:
The Oracle Rejection at $25M Scale
Nova deliberately rejected the Oracle model — AI as definitive answer — because retail traders needed to remain in control of their risk decisions. Private banking discretionary management operates on the same principle: the Relationship Manager makes final decisions; AI surfaces intelligence. Designing Nova's co-pilot architecture taught me exactly how to position AI as augmentation, not replacement — the trust model that private banking client-facing design requires.
Confidence Intervals at UHNW Decision Scale
Nova's visual confidence intervals — showing "likely range" rather than false precision — are the exact design pattern needed for private banking AI forecasting. When an advisor recommends moving $3M into an alternatives allocation, the AI supporting that conversation should show projected outcomes as probability ranges, not a single number. A retail trader can absorb uncertainty at $5K. The same uncertainty at $3M requires a design that makes the model's confidence explicit — or it destroys trust in both the AI and the advisor.
Explainability Is Non-Negotiable at This Level
UHNW clients asking "why is the model recommending this?" are not being obstructive — they're exercising fiduciary diligence over their own wealth. Nova taught me that AI interfaces for high-stakes decisions must surface reasoning, not just conclusions. Black-box recommendations work for retail robo-advice at $10K; they fail completely for discretionary wealth management at $10M+. The interface must make the AI's assumptions visible so client and advisor can validate them together.
Transferable principle: AI in high-stakes financial contexts must show its reasoning, not just its conclusion. This scales from Nova's retail margin decisions to UHNW portfolio allocation — the stakes change, but the trust requirement deepens. I design the human-AI boundary, not just the AI interface.
Five tabs, one mental model
Nova's five surfaces each serve a specific decision moment in an institutional trader's day. Every surface answers the same question with different data: is this action safe to take right now? Below are the production screens from the live terminal — each with the design rationale that shipped.
Six investors disagree — and that's the whole point
A single AI verdict (“buy” / “sell”) is the wrong primitive for institutional decision-making. Nova routes every query through six investor personas — Warren Buffett (quality + moat), Charlie Munger (inversion), Peter Lynch (GARP), Ray Dalio (macro balance), George Soros (reflexivity), Howard Marks (cycle position). Each persona emits a score with a 95% confidence interval. Divergence is the signal.
- Disagreement is surfaced, not hidden. When Buffett says 72% and Soros says 31%, both numbers stay visible. The PM reads the spread, not the mean.
- 95% CI, not point estimate. Every persona score has upper/lower bounds. A tight CI around 65 is a different decision than a wide CI that straddles 30 and 95.
- Reasoning trace, not black-box. Click any persona name and the heuristic tree that produced the score expands — source articles, data as-of timestamps, which weighting rule fired.
- Threshold routing is configurable. When spread > 30 points, the query auto-escalates to a human analyst before the PM sees the final answer.
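The escalation rule in the last bullet can be sketched as follows. This is an illustrative Python version: the 30-point threshold and the Buffett and Soros scores come from the text, while the Dalio score, the confidence bounds, the class name, and the route labels are made up for the example.

```python
from dataclasses import dataclass

@dataclass
class PersonaScore:
    name: str
    score: float     # 0-100
    ci_low: float    # lower bound of the 95% interval
    ci_high: float   # upper bound of the 95% interval

def spread(scores):
    """Gap between the most and least bullish persona."""
    values = [s.score for s in scores]
    return max(values) - min(values)

def route(scores, threshold=30.0):
    """Escalate to a human analyst when personas disagree by more than the threshold."""
    return "human_review" if spread(scores) > threshold else "direct_to_pm"

panel = [
    PersonaScore("Buffett", 72, 64, 80),   # bounds are illustrative
    PersonaScore("Soros", 31, 18, 44),
    PersonaScore("Dalio", 55, 40, 70),
]

print(spread(panel), route(panel))
```

Keeping the threshold as a parameter mirrors the "configurable" wording: a desk that tolerates more disagreement can raise it without touching the routing logic.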
Live prices on real holdings, not toy fixtures
The seed portfolio is the 12-holding institutional composite from FinceptTerminal's demo fixture — AAPL / MSFT / GOOGL / NVDA / AMZN / TSLA / JPM / JNJ / XOM / V / UNH / PG. Every 5 minutes the terminal pulls Yahoo Finance via CORS proxy, re-marks the book, and redraws the equity curve with Lightweight Charts v4.2 (the TradingView open-source renderer).
- KPI tiles show what PMs ask first. Market value · day P&L · positions count · top-3 concentration — not vanity metrics like “all-time return”.
- Table columns follow institutional convention. Symbol / qty / avg cost / last / MV / day Δ / day Δ% / weight. Tabular numerals so the decimal points align on the price columns.
- Equity curve is hand-off ready. Crosshair, tooltip, and time-scale interactions all ship-grade — same renderer used in Binance, Interactive Brokers web, and dozens of prop shops.
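The four KPI tiles reduce to simple arithmetic over the holdings table. A minimal Python sketch, assuming each holding carries a quantity, last price, and previous close; the field names and the prices below are invented, not live quotes.

```python
def kpis(holdings):
    """Compute market value, day P&L, position count, and top-3 concentration."""
    market_values = {h["symbol"]: h["qty"] * h["last"] for h in holdings}
    total = sum(market_values.values())
    day_pnl = sum(h["qty"] * (h["last"] - h["prev_close"]) for h in holdings)
    top3 = sum(sorted(market_values.values(), reverse=True)[:3]) / total
    return {
        "market_value": total,
        "day_pnl": day_pnl,
        "positions": len(holdings),
        "top3_weight": top3,
    }

# Illustrative prices only.
book = [
    {"symbol": "AAPL", "qty": 10, "last": 200, "prev_close": 198},
    {"symbol": "MSFT", "qty": 5, "last": 400, "prev_close": 404},
    {"symbol": "JPM", "qty": 20, "last": 150, "prev_close": 150},
    {"symbol": "XOM", "qty": 10, "last": 100, "prev_close": 99},
]

print(kpis(book))
```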
The regulatory threshold is the design language
Margin is a regulatory problem before it's a UI problem. The Heat Meter bakes the three real thresholds directly into the gauge: Reg T 50% (Fed initial margin) · FINRA 4210(c) 25% (maintenance) · house 30% (broker cushion). A position sitting at 28% equity isn't “yellow” — it's “3 percentage points above the maintenance call, 2 below the house warning”.
- Corridor bands use named regime, not colour alone. The label “Reg T · Initial 50%” is the primary signal. Colour is ancillary — deuteranopia-safe.
- Distance-to-call is the headline number. Not the absolute equity %, but the percentage-point gap to the next threshold — that's the number a trader acts on.
- Stress test is one click. “What if the market drops 10%?” triggers a corridor recalculation showing which threshold the position crosses and at what price.
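The distance-to-call and stress-test reads above come down to a few lines of margin arithmetic. The Python sketch below uses the three thresholds stated in the text; the account figures, function names, and the simplification of a single uniform price shock with an unchanged debit balance are assumptions for illustration.

```python
# Thresholds as stated in the text, in equity percent.
THRESHOLDS = [
    ("Reg T · Initial", 50.0),
    ("House · Cushion", 30.0),
    ("FINRA 4210(c) · Maintenance", 25.0),
]

def equity_pct(market_value, debit):
    """Account equity as a percentage of market value."""
    return 100.0 * (market_value - debit) / market_value

def distance_to_thresholds(eq_pct):
    """Signed percentage-point gap to each threshold (positive = above it)."""
    return {name: round(eq_pct - level, 2) for name, level in THRESHOLDS}

def stressed_equity_pct(market_value, debit, drop_pct):
    """Re-mark the book after a uniform price drop; the debit balance is unchanged."""
    shocked_value = market_value * (1 - drop_pct / 100.0)
    return equity_pct(shocked_value, debit)

def crossing_value(debit, level_pct):
    """Market value at which equity equals a threshold t: MV = debit / (1 - t)."""
    return debit / (1 - level_pct / 100.0)

mv, debit = 100_000, 72_000            # hypothetical account at 28% equity
print(distance_to_thresholds(equity_pct(mv, debit)))
print(stressed_equity_pct(mv, debit, 10))   # "What if the market drops 10%?"
print(crossing_value(debit, 25.0))          # value at which maintenance is hit
```

For a single-position book, dividing the crossing value by the share count gives the price at which the threshold is crossed.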
Kelly sizing, with the institutional haircut built in
Full Kelly — f* = (bp − q) / b — is mathematically optimal for unbounded horizons but catastrophically volatile in short-horizon books. Nova defaults every Kelly output to half-Kelly (the Thorp / Poundstone institutional standard) and surfaces the full-Kelly number as a second-tier read, not the headline.
- Two numbers, explicit hierarchy. Half-Kelly is the sizing recommendation. Full-Kelly carries the label “Theoretical max — use only if you're certain of edge”.
- Risk quiz isn't a gate, it's a prior. The five risk-tolerance questions don't block access — they adjust the default haircut from 50% (standard) to 25% (conservative) to 75% (high conviction).
- Edge assumption is editable. The trader sets p (win probability) and b (odds received) explicitly, with q = 1 − p as the loss probability. Kelly is their number, not the system's.
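A minimal Python sketch of the sizing logic, using the formula f* = (bp − q) / b with q = 1 − p. It reads the "haircut" as the fraction of full Kelly that is applied (0.5 by default); that reading, the floor at zero for negative edge, and the function names are assumptions rather than Nova's implementation.

```python
def kelly_fraction(p, b):
    """Full Kelly f* = (b*p - q) / b with q = 1 - p, floored at 0 when there is no edge."""
    if not 0 <= p <= 1 or b <= 0:
        raise ValueError("p must be within [0, 1] and b must be positive")
    q = 1 - p
    return max(0.0, (b * p - q) / b)

def sized(p, b, kelly_multiplier=0.5):
    """Return the headline recommendation and the full-Kelly second-tier read."""
    full = kelly_fraction(p, b)
    return {"recommended": full * kelly_multiplier, "theoretical_max": full}

# 60% win probability at even odds: full Kelly is 20% of capital, half-Kelly 10%.
print(sized(0.6, 1.0))
# Conservative quiz result: quarter-Kelly.
print(sized(0.6, 1.0, kelly_multiplier=0.25))
```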
Tax-loss harvesting is a design problem about deadlines
The tax optimizer does three things in regulatory order: (1) apply IRC §1091 wash-sale rules to flag lots whose losses will be disallowed; (2) compute capital gains under Pub. 550 short-term vs long-term treatment; (3) recommend the Treas. Reg. §1.1012-1(c) specific-identification method that minimises tax liability for this sell order.
- Wash-sale flag is binary and visible. A disallowed loss gets a badge on the lot row — not a footnote, not a tooltip. The trader can't sell without seeing it.
- Holding-period clock shows days-to-long-term. For lots at day 355, the badge reads “10 days to long-term” — a non-trivial design choice, because that prompts a different action (wait) than just showing the raw holding period.
- Specific-ID recommendation is a table, not a verdict. The optimizer shows all three methods (LIFO / FIFO / specific-ID) with after-tax P&L side by side, so the trader can deviate and document rationale.
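The wash-sale flag and the holding-period badge can be sketched with plain date arithmetic. This Python example assumes the 30-day window applies on either side of the sale and uses a simplified 365-day clock to match the "day 355, 10 days to long-term" wording; the precise rule is a holding period of more than one year, and the function names and dates are illustrative.

```python
from datetime import date

WASH_WINDOW_DAYS = 30   # days before or after the sale
LONG_TERM_DAYS = 365    # simplified clock matching the badge wording in the text

def is_wash_sale(sale_date, realised_pnl, replacement_buy_dates):
    """True when a loss sale has a replacement purchase within 30 days on either side."""
    if realised_pnl >= 0:
        return False  # only losses can be disallowed
    return any(
        abs((buy - sale_date).days) <= WASH_WINDOW_DAYS
        for buy in replacement_buy_dates
    )

def days_to_long_term(acquired, as_of):
    """Days remaining on the simplified 365-day holding clock (0 once reached)."""
    return max(0, LONG_TERM_DAYS - (as_of - acquired).days)

sale = date(2024, 3, 15)
print(is_wash_sale(sale, -500.0, [date(2024, 4, 10)]))   # repurchase 26 days later
print(is_wash_sale(sale, -500.0, [date(2024, 4, 20)]))   # repurchase 36 days later
print(days_to_long_term(date(2024, 1, 10), date(2024, 12, 30)))
```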
Live Yahoo Finance quotes via corsproxy.io · FRED macro chips (UNRATE, CPI, 10Y UST, VIX) · 12-holding institutional seed portfolio · Kelly + wash-sale + margin math all client-side · 5-minute ticker refresh · zero localStorage
Every threshold has a citation
Nothing on Nova's surfaces is a design guess. Every number, every default, every flag traces back to a specific regulation, rulebook, or academic paper. Hiring managers can audit the reasoning; analysts can defend the outputs under regulatory review.
| Tab | Surface element | Threshold / rule | Citation |
|---|---|---|---|
| 01 Co-Pilot | 95% confidence interval | Standard institutional disclosure band | NIST/SEMATECH Handbook §1.3.5.2 |
| 01 Co-Pilot | Persona divergence escalation | Spread > 30 pts → human review | SR 11-7 model risk framework |
| 02 Portfolio | Concentration KPI (top-3 weight) | Diversification disclosure convention | Investment Company Act §5(b)(1) |
| 03 Heat Meter | Initial margin 50% | Regulation T initial requirement | 12 CFR §220.12 |
| 03 Heat Meter | Maintenance margin 25% | FINRA minimum maintenance | FINRA Rule 4210(c) |
| 03 Heat Meter | House margin 30% | Broker-discretionary cushion above FINRA | FINRA Rule 4210(e)(8) |
| 04 Risk | Half-Kelly default | Institutional risk reduction factor | Thorp (1997), Poundstone (2005) |
| 04 Risk | Five-question risk quiz | Suitability assessment | FINRA Rule 2111 (suitability) |
| 05 Tax | Wash-sale 30-day window | Disallowance of loss on substantially identical | IRC §1091(a); Pub. 550 |
| 05 Tax | Short-term / long-term boundary | Holding period > 1 year | IRC §1222(3); Pub. 550 |
| 05 Tax | Specific-ID lot selection | Identification of sold securities | Treas. Reg. §1.1012-1(c) |
How Nova actually ships
A designer who can't hand off to engineers is a sketcher, not a product designer. Here's exactly how every surface on the live terminal is wired — so the reader can audit the craft end-to-end, not just the visuals.
Vanilla JS, no framework tax
Zero dependencies at build time. Nova ships three files: index.html (39 KB), style.css (43 KB), script.js (44 KB). BEM .nv-* namespace, token-first (--nv-accent: #c9a959), mobile-first. Total first-paint cost: ~126 KB.
Yahoo Finance via CORS proxy
Ticker prices pulled from query1.finance.yahoo.com/v8/finance/chart via corsproxy.io. 5-minute refresh interval (chosen to stay under rate limit). Graceful fallback to simulated data if the proxy is unreachable — the terminal never stalls on “Loading…”.
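The fallback behaviour can be expressed as a small wrapper around whatever fetcher is in use. The Python sketch below is not the terminal's JavaScript; the function names are hypothetical, and tagging the result with its source is an added assumption so the UI could label simulated prices.

```python
def load_quotes(symbols, fetch, simulate):
    """Try the live fetcher; on any failure fall back to simulated data and say so."""
    try:
        return {"source": "live", "quotes": fetch(symbols)}
    except Exception:
        return {"source": "simulated", "quotes": simulate(symbols)}

def broken_fetch(symbols):
    """Stand-in for a proxy outage."""
    raise ConnectionError("proxy unreachable")

def simulated(symbols):
    """Placeholder prices so the screen always has something to draw."""
    return {symbol: 100.0 for symbol in symbols}

result = load_quotes(["AAPL", "MSFT"], broken_fetch, simulated)
print(result["source"], result["quotes"])
```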
Lightweight Charts v4.2
TradingView's open-source renderer. Same library used by Binance, Interactive Brokers web, and most prop-shop terminals. Crosshair + tooltip + time-scale interactions are ship-grade; no re-invention needed.
Kelly + wash-sale + margin, all client-side
Every financial computation runs in the browser — no server round-trip, no data leaves the session. Kelly criterion, FIFO/LIFO lot selection, §1091 wash-sale window detection, Reg T / FINRA margin corridor math: all transparent, all auditable in script.js.
Keyboard + ARIA throughout
Tabs wired as role="tablist" with ←/→ navigation + aria-selected sync. All interactive controls have labels. The prefers-reduced-motion media query disables animations. Deuteranopia-safe colour palette across all status chips.
Zero localStorage, zero CLS
Every image has width+height attributes so layout shift is zero. All data lives in in-memory state — session ends, state ends. No cookies, no third-party trackers, no fingerprinting payload.
Live Demo · AI Trust Architecture
Confidence is not a single number
The failure mode of every AI output in a financial context is the confident-sounding answer that's wrong. Nova's trust layer decomposes confidence into five independent dimensions — so a portfolio manager can see why the model is or isn't certain, not just that it is. A high overall score with a low temporal relevance score is a very different risk than a uniformly moderate score.
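To make the last point concrete, here is a small Python sketch that reports an overall score alongside any weak dimension. Only temporal relevance is named in the text; the other four dimension names, the scores, and the 0.5 floor are hypothetical placeholders.

```python
def summarise(dimensions, floor=0.5):
    """Report the mean score plus every dimension that falls below the floor."""
    overall = sum(dimensions.values()) / len(dimensions)
    weak = sorted(name for name, score in dimensions.items() if score < floor)
    return {"overall": overall, "weak_dimensions": weak}

# Hypothetical dimensions: four strong, one stale.
stale_but_confident = {
    "temporal_relevance": 0.30,
    "data_quality": 0.90,
    "model_agreement": 0.90,
    "regime_fit": 0.90,
    "sample_coverage": 0.90,
}

print(summarise(stale_but_confident))
```

The overall score here is 0.78, which would pass most single-number checks, yet the temporal dimension is flagged; that is the case a single confidence figure hides.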
Where this case study sits in the larger web
Every problem we solve for clients has multiple valid approaches — different costs, different ROI, different risk profiles. These threads show how the approach on this page compares to others in the portfolio.
Concentration, Risk & Agents
Portfolio-level math primitives — HHI, beta, VaR, regime — rendered into UI defaults and AI-assisted decision surfaces.
- Index Weights · SPX + NDX: Concentration math exposed. Low eng cost · 503+101 slices · HHI 230.6 / 638.2
- Macro Signal Network: Regime classifier feeding routing. Low eng cost · 4 prints → 14 surfaces
- Nova · Institutional Trading Terminal: Portfolio risk surfaces on one shell. High eng cost · Beta/Duration/Kelly/VaR unified
- TradeX Institutional: Stress test & position sizing. High eng cost · 12 holdings · 3-factor composite
- Hedge Fund · 19-Agent Committee: Multi-persona investment consensus. High eng cost · Agent reliability overhead
- AI Cowork · Agent UX: AI in the decision loop. Med eng cost · Trust fallbacks · hallucination guardrails
- Private Banking Advisory: Diversification mandate. High eng cost · Concentration compliance · FINRA 2111
- Intent Canvas · Shared AI Artifact: Prose → typed plan for rebalance. Low eng cost · Typed artifact · Human approval gate
Editorial Voice in Finance
Luxury, editorial, and brand discipline applied to financial interfaces — where restraint itself is signal.
- Christie's · Private Client: 260 years of earned restraint. High brand cost · Luxury typographic voice
- Nova · Institutional Trading Terminal: Terminal chrome as brand. Mid-tone palette · mono typography · zero decoration
- Logix Panel · Broker Operator Console: Bloomberg-class information density. High information ROI · density over whitespace
- Private Banking Advisory: Advisor disclosure letters. Quiet room · wealthy audience · editorial cadence
- Trading Cup · Competition Staging: Event-grade polish. High staging cost · Short-lived · High-traffic moment
Regulatory Routing & Disclosure
How upstream regulation and macro prints become downstream product defaults and Legal-safe disclosure.
- Macro Signal Network: Upstream signal layer. Low eng cost · 4 prints → 14 surfaces
- ACY Securities · Regulated Broker System: Regulated broker system. High eng cost · 8 regulatory rewrites · 150 components
- ACY Connect · B2B Compliance Bridge: Institutional compliance bridge. High eng cost · 12+ institutional clients · FIX 4.4
- Private Banking Advisory: Discretionary disclosure. High eng cost · Reg T · FINRA 2111 · editorial voice
- Nova · Institutional Trading Terminal: Terminal disclosure chrome. High eng cost · IRC §1091 wash-sale · SR 11-7 governance
- Hedge Fund · 19-Agent Committee: Agent-driven compliance. High eng cost · FINRA/SEC overlays per agent
- TradeX Institutional: Margin & stress alerts. High eng cost · FINRA 4210(c) · real-time tripwires
- Intent Canvas · Shared AI Artifact: Audit trail from intent to execution. Low eng cost · SEC 17a-4 trail · SOX §302 audit