Six phases, end to end. What we do, what we produce, what we won't do.
■SECTION.01
The full flow
Six phases. Not every engagement runs all of them. Each engagement shape selects a subset. Durations below are reference ranges.
■ 6 phases · Total 10–14 weeks, full flow
Phase 01 · 1 week
Discovery · We scope the problem before we quote the work.
A thirty-minute call, then a structured brief written together. We separate the stated problem from the observed problem, and mark where AI is leverage and where it adds risk without return.
■ Artifacts
·Engagement brief
·Scope proposal
·Risk register
■ Inputs · from you
·Your existing research, metrics, PRDs
·Access to the team that will ship the work
■ Outputs · to you
·Fixed-scope proposal
·Timeline
·Roles for each phase
Phase 02 · 2–3 weeks
Research · Real users in real contexts. Synthesis as sessions run, not after.
Twelve moderated sessions across four segments is the default. Moderation, transcription, and synthesis run in parallel so findings land in days. Sessions happen where the work happens: analyst desks, internal tools, vehicle cabins.
■ Artifacts
·Trust model
·Feature cut-list
·Segment map
·Eval corpus seed
■ Inputs · from you
·Recruited participants
·Product-side observer on 3+ sessions
■ Outputs · to you
·Research synthesis doc
·Evidence-tagged findings
·Interaction spec seed
Phase 03 · 1–2 weeks
Shape · One decisive artifact. Cuts live here, not in build.
Research consolidates into a single document: trust model, cut-list, interaction spec, and eval corpus seed. This phase decides what does not get built. Some engagements end here with a recommendation not to ship.
■ Artifacts
·The shape doc
·Interaction spec
·Feature cut-list with rationale
■ Inputs · from you
·Stakeholder review session
·Engineering feasibility check
■ Outputs · to you
·Greenlit feature set
·Build plan
·Eval harness spec
Phase 04 · 4–8 weeks
Build · The studio that ran the sessions ships the code.
Frontend, backend, model integration, auth, data pipelines, audit trail. Same team across phases. Architectural decisions stay tethered to session evidence. The eval harness runs from week one, not bolted on at v2.
Phase 05 · runs alongside build
Evals & Monitoring · The corpus seeded in research is live in production.
Retrieval accuracy, latency, drift, edge-case alerts. Dashboards live in your Grafana, Datadog, or Langfuse, operated on your side from cutover; on exit, your team runs the tooling without us.
■ Artifacts
·Eval dashboards
·Drift detection alerts
·Weekly quality reports
■ Inputs · from you
·Monitoring stack access
·Ops review cadence
■ Outputs · to you
·Live quality signal
·Regression catches
·Runbook updates
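To make the drift alerting concrete, a minimal sketch of the weekly check, assuming metrics from the last stable run sit in a flat JSON file. The function name, file layout, and metric keys are ours for the example, not the delivered tooling; the 2% tolerance mirrors the drift gate in the evaluation harness section below.

import json

DRIFT_TOLERANCE = 0.02  # "any metric slip > 2% opens an incident"

def check_drift(current: dict[str, float], baseline_path: str) -> list[str]:
    """Compare this week's frozen-slice metrics against the last stable run."""
    with open(baseline_path) as f:
        baseline: dict[str, float] = json.load(f)
    regressions = []
    for metric, stable in baseline.items():
        now = current.get(metric, 0.0)  # a metric that vanished counts as a full slip
        if stable - now > DRIFT_TOLERANCE:
            regressions.append(f"{metric}: {stable:.3f} -> {now:.3f}")
    return regressions  # any entry here opens an incident per the runbook

# Usage: check_drift({"hit_at_5": 0.89}, "last_stable_run.json")
# returns ["hit_at_5: 0.920 -> 0.890"] if the stable run recorded 0.92.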
Phase 06 · 1 week
Handoff · You own the system. The studio documents the handoff to receiving-team standard.
Architecture docs, eval-harness walkthrough, runbooks per failure mode, onboarding guide for the receiving owner. Thirty days of post-engagement Slack with a hard end date. After that, the code is yours and the engagement is closed.
■ Artifacts
·Handoff dossier
·Onboarding walkthrough
·Ops runbook
·System architecture diagram
■ Inputs · from you
·Target owner on your team
·Final review sign-off
■ Outputs · to you
·Clean cutover
·30-day support window
·Referenceable engagement
■SECTION.02
Operating principles
01
No handoff between research and build
Models that separate research from build leak signal. The insight that matters is the one caught in session 7, and it never reaches engineering through a deck. The studio carries every phase end to end.
02
Evals seeded from research, measured on your workload
The corpus comes directly from session questions and flows. Production is measured against observed user intent, not synthetic QA. Quality is proved on your workload, not generic benchmarks.
03
Recommend cutting when research does not support shipping
When the research does not support shipping, we recommend cutting the feature. The research deposit covers the work; losing the build fee is the correct trade against shipping something that erodes user trust.
04
Compliance at the architecture layer
Audit trail, approval workflows, data residency, and AI disclosure patterns are designed in phase 01, not retrofitted at launch. EU AI Act Articles 13 and 14 alignment lives at the architecture layer, not in a footer line.
05
You own the system on exit
Eval harness, dashboards, architecture diagrams, and runbooks live in your stack. When the engagement closes, your team operates the system without us. Repeat work happens because the first engagement was worth it, not because you cannot leave.
■SECTION.03
Scope boundaries
What each engagement shape covers, and what it does not. Procurement reviews work better when the boundary is published.
UX Research
■ In scope
+Moderated sessions with your users
+Synthesis doc + cut-list
+Interaction spec seed
+Decision-ready recommendation
□ Out of scope
−Writing production code
−Long-form market research reports
−Recruitment outside the agreed segments
AI Product Build
■ In scope
+Frontend, backend, model integration
+Eval harness from day one
+Audit trail + auth
+Monitoring wired in pre-launch
□ Out of scope
−Greenfield discovery research (assumes research is done)
−Ops team headcount replacement
−Post-ship marketing launch
Retained Partnership
■ In scope
+All six phases across sequenced features
+Standing review cadence
+Roadmap sparring
+Eval and monitoring continuity past launch
□ Out of scope
−Resourcing a full product team
−Non-AI generic engineering work
−Headcount substitution
■SECTION.04
Which phases your engagement uses
UX Research
Phases 01–03
Discovery, research, shape. Output is the shape doc. Build happens later, with the studio or with your team.
AI Product Build
Phases 03–06
Assumes research already done (by us or you). Shape tightens it, then build, evals, and handoff.
Retained Partnership
Phases 01–06
All phases, six-month block, multiple features sequenced. Research runs in parallel with build.
■SECTION.05
Evaluation harness
Default instrumentation. Metrics seeded from the research corpus, gates set in the shape doc against your workload. No ship without green, no "we'll add monitoring later".
Metric · What we measure · Ship / regression gate
Retrieval hit@k · Correct source in top-k for the research-corpus query set · k=5 ≥ 0.92 before cutover
Answer faithfulness · Claim-level support check against retrieved sources · ≥ 0.95, zero tolerance on fabricated citations
Refusal calibration · System declines when confidence is below threshold · False-accept on known-unanswerable set < 5%
Latency p95 · End-to-end response under measured production load · Set in shape doc, usually < 3s interactive / < 12s async
Drift signal · Weekly regression on frozen eval slice vs last stable run · Any metric slip > 2% opens an incident
Edge-case coverage · Hand-curated adversarial set from session transcripts · Grows monotonically; no removals without sign-off
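As a sketch of how small a ship gate is in practice: a minimal Python version of the hit@k check, assuming a query set exported from the research corpus. EvalCase and run_retrieval are illustrative names, not a published harness API; wire run_retrieval to your actual retriever.

from dataclasses import dataclass

@dataclass
class EvalCase:
    query: str
    correct_source_id: str  # ground truth tagged during research synthesis

def run_retrieval(query: str, k: int) -> list[str]:
    """Placeholder for your retriever: return the top-k source ids."""
    raise NotImplementedError

def hit_at_k(cases: list[EvalCase], k: int = 5) -> float:
    """Fraction of queries whose correct source appears in the top-k results."""
    hits = sum(case.correct_source_id in run_retrieval(case.query, k) for case in cases)
    return hits / len(cases)

def ship_gate(cases: list[EvalCase]) -> None:
    score = hit_at_k(cases, k=5)
    # Gate from the table above: hit@5 must clear 0.92 before cutover.
    assert score >= 0.92, f"hit@5 = {score:.3f} is below the 0.92 ship gate"

The other rows gate the same way: a frozen eval set, a scalar, a threshold from the shape doc, and a hard stop in CI when the number is red.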
■SECTION.06
Sample deliverables
The artifacts clients leave with. Stock titles and formats, so the definition of done is set before kickoff.
■Phase 01
Engagement Brief
PDF · 6–10 pages · problem, assumptions, exit criteria
■Phase 02
Research Synthesis · Evidence Tagged
Doc · findings linked to session timestamps + transcripts
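To make "evidence tagged" concrete: an illustrative shape for a single finding, assuming findings key to session ids and transcript timestamps. Field names here are invented for the example, not the delivered schema.

# One finding, tagged back to the sessions that support it (illustrative schema).
finding = {
    "id": "F-014",
    "claim": "Analysts distrust answers that arrive without a visible source link",
    "evidence": [
        {"session": "S07", "timestamp": "00:23:41", "transcript": "s07-transcript.txt"},
        {"session": "S11", "timestamp": "00:08:05", "transcript": "s11-transcript.txt"},
    ],
    "feeds": ["trust model", "eval corpus seed"],
}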