Six phases, end to end. What we do, what we produce, what we won't do.
■SECTION.01
The full flow
Six phases. Not every engagement runs all of them. Each engagement shape selects a subset. Durations below are reference ranges.
■ 6 phases · Total 10–14 weeks, full flow
Phase 01 · 1 week
Discovery · We scope the problem before we quote the work.
A thirty-minute call, then a structured brief written together. We separate the stated problem from the observed problem, and mark where AI is leverage and where it adds risk without return.
■ Artifacts
·Engagement brief
·Scope proposal
·Risk register
■ Inputs · from you
·Your existing research, metrics, PRDs
·Access to the team that will ship the work
■ Outputs · to you
·Fixed-scope proposal
·Timeline
·Roles for each phase
Phase 02 · 2–3 weeks
Research · Real users in real contexts. Synthesis as sessions run, not after.
Twelve moderated sessions across four segments is the default. Moderation, transcription, and synthesis run in parallel so findings land in days. Sessions happen where the work happens: analyst desks, internal tools, vehicle cabins.
■ Artifacts
·Trust model
·Feature cut-list
·Segment map
·Eval corpus seed
■ Inputs · from you
·Recruited participants
·Product-side observer on 3+ sessions
■ Outputs · to you
·Research synthesis doc
·Evidence-tagged findings
·Interaction spec seed
Phase 03 · 1–2 weeks
Shape · One decisive artifact. Cuts live here, not in build.
Research consolidates into a single document: trust model, cut-list, interaction spec, and eval corpus seed. This phase decides what does not get built. Some engagements end here with a recommendation not to ship.
■ Artifacts
·The shape doc
·Interaction spec
·Feature cut-list with rationale
■ Inputs · from you
·Stakeholder review session
·Engineering feasibility check
■ Outputs · to you
·Greenlit feature set
·Build plan
·Eval harness spec
Phase 04 · 4–8 weeks
Build · The studio that ran the sessions ships the code.
Frontend, backend, model integration, auth, data pipelines, audit trail. Same team across phases. Architectural decisions stay tethered to session evidence. The eval harness runs from week one, not bolted on at v2.
Phase 05 · runs alongside build
Evals & Monitoring · The corpus seeded in research is live in production.
Retrieval accuracy, latency, drift, edge-case alerts. Dashboards live in your Grafana, Datadog, or Langfuse, operated on your side from cutover; on exit, your team runs the tooling without us.
■ Artifacts
·Eval dashboards
·Drift detection alerts
·Weekly quality reports
■ Inputs · from you
·Monitoring stack access
·Ops review cadence
■ Outputs · to you
·Live quality signal
·Regression catches
·Runbook updates
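To make the drift alerting concrete, a minimal sketch of the weekly check, assuming metrics from the last stable run sit in a flat JSON file. The function name, file layout, and metric keys are ours for the example, not the delivered tooling; the 2% tolerance mirrors the drift gate in the evaluation harness section below.

import json

DRIFT_TOLERANCE = 0.02  # "any metric slip > 2% opens an incident"

def check_drift(current: dict[str, float], baseline_path: str) -> list[str]:
    """Compare this week's frozen-slice metrics against the last stable run."""
    with open(baseline_path) as f:
        baseline: dict[str, float] = json.load(f)
    regressions = []
    for metric, stable in baseline.items():
        now = current.get(metric, 0.0)  # a metric that vanished counts as a full slip
        if stable - now > DRIFT_TOLERANCE:
            regressions.append(f"{metric}: {stable:.3f} -> {now:.3f}")
    return regressions  # any entry here opens an incident per the runbook

# Usage: check_drift({"hit_at_5": 0.89}, "last_stable_run.json")
# returns ["hit_at_5: 0.920 -> 0.890"] if the stable run recorded 0.92.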
Phase 06 · 1 week
Handoff · You own the system. The studio documents the handoff to receiving-team standard.
Architecture docs, eval-harness walkthrough, runbooks per failure mode, onboarding guide for the receiving owner. Thirty days of post-engagement Slack with a hard end date. After that, the code is yours and the engagement is closed.
■ Artifacts
·Handoff dossier
·Onboarding walkthrough
·Ops runbook
·System architecture diagram
■ Inputs · from you
·Target owner on your team
·Final review sign-off
■ Outputs · to you
·Clean cutover
·30-day support window
·Referenceable engagement
■SECTION.02
Operating principles
01
No handoff between research and build
Models that separate research from build leak signal. The insight that matters is the one caught in session 7, and it never reaches engineering through a deck. The studio carries every phase end to end.
02
Evals seeded from research, measured on your workload
The corpus comes directly from session questions and flows. Production is measured against observed user intent, not synthetic QA. Quality is proved on your workload, not generic benchmarks.
03
Recommend cutting when research does not support shipping
When the research does not support shipping, we recommend cutting the feature. The research deposit covers the work; losing the build fee is the correct trade against shipping something that erodes user trust.
04
Compliance at the architecture layer
Audit trail, approval workflows, data residency, and AI disclosure patterns are designed in phase 01, not retrofitted at launch. EU AI Act Articles 13 and 14 alignment lives at the architecture layer, not in a footer line.
05
You own the system on exit
Eval harness, dashboards, architecture diagrams, and runbooks live in your stack. When the engagement closes, your team operates the system without us. Repeat work happens because the first engagement was worth it, not because you cannot leave.
■SECTION.03
Scope boundaries
What each engagement shape covers, and what it does not. Procurement reviews work better when the boundary is published.
UX Research
■ In scope
+Moderated sessions with your users
+Synthesis doc + cut-list
+Interaction spec seed
+Decision-ready recommendation
□ Out of scope
−Writing production code
−Long-form market research reports
−Recruitment outside the agreed segments
AI Product Build
■ In scope
+Frontend, backend, model integration
+Eval harness from day one
+Audit trail + auth
+Monitoring wired in pre-launch
□ Out of scope
−Greenfield discovery research (assumes research is done)
−Ops team headcount replacement
−Post-ship marketing launch
Retained Partnership
■ In scope
+All six phases across sequenced features
+Standing review cadence
+Roadmap sparring
+Eval and monitoring continuity past launch
□ Out of scope
−Resourcing a full product team
−Non-AI generic engineering work
−Headcount substitution
■SECTION.04
Which phases your engagement uses
UX Research
Phases 01–03
Discovery, research, shape. Output is the shape doc. Build happens later, with the studio or with your team.
AI Product Build
Phases 03–06
Assumes research already done (by us or you). Shape tightens it, then build, evals, and handoff.
Retained Partnership
Phases 01–06
All phases, six-month block, multiple features sequenced. Research runs in parallel with build.
■SECTION.05
Evaluation harness
Default instrumentation. Metrics seeded from the research corpus, gates set in the shape doc against your workload. No ship without green, no "we'll add monitoring later".
Metric · What we measure · Ship / regression gate
Retrieval hit@k · Correct source in top-k for the research-corpus query set · k=5 ≥ 0.92 before cutover
Answer faithfulness · Claim-level support check against retrieved sources · ≥ 0.95, zero tolerance on fabricated citations
Refusal calibration · System declines when confidence is below threshold · False-accept on known-unanswerable set < 5%
Latency p95 · End-to-end response under measured production load · Set in shape doc, usually < 3s interactive / < 12s async
Drift signal · Weekly regression on frozen eval slice vs last stable run · Any metric slip > 2% opens an incident
Edge-case coverage · Hand-curated adversarial set from session transcripts · Grows monotonically; no removals without sign-off
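As a sketch of how small a ship gate is in practice: a minimal Python version of the hit@k check, assuming a query set exported from the research corpus. EvalCase and run_retrieval are illustrative names, not a published harness API; wire run_retrieval to your actual retriever.

from dataclasses import dataclass

@dataclass
class EvalCase:
    query: str
    correct_source_id: str  # ground truth tagged during research synthesis

def run_retrieval(query: str, k: int) -> list[str]:
    """Placeholder for your retriever: return the top-k source ids."""
    raise NotImplementedError

def hit_at_k(cases: list[EvalCase], k: int = 5) -> float:
    """Fraction of queries whose correct source appears in the top-k results."""
    hits = sum(case.correct_source_id in run_retrieval(case.query, k) for case in cases)
    return hits / len(cases)

def ship_gate(cases: list[EvalCase]) -> None:
    score = hit_at_k(cases, k=5)
    # Gate from the table above: hit@5 must clear 0.92 before cutover.
    assert score >= 0.92, f"hit@5 = {score:.3f} is below the 0.92 ship gate"

The other rows gate the same way: a frozen eval set, a scalar, a threshold from the shape doc, and a hard stop in CI when the number is red.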
■SECTION.06
Sample deliverables
The artifacts clients leave with. Stock titles and formats, so the definition of done is set before kickoff.
■Phase 01
Engagement Brief
PDF · 6–10 pages · problem, assumptions, exit criteria
■Phase 02
Research Synthesis · Evidence Tagged
Doc · findings linked to session timestamps + transcripts
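To make "evidence tagged" concrete: an illustrative shape for a single finding, assuming findings key to session ids and transcript timestamps. Field names here are invented for the example, not the delivered schema.

# One finding, tagged back to the sessions that support it (illustrative schema).
finding = {
    "id": "F-014",
    "claim": "Analysts distrust answers that arrive without a visible source link",
    "evidence": [
        {"session": "S07", "timestamp": "00:23:41", "transcript": "s07-transcript.txt"},
        {"session": "S11", "timestamp": "00:08:05", "transcript": "s11-transcript.txt"},
    ],
    "feeds": ["trust model", "eval corpus seed"],
}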