A major search client needed a behavior-led account of how people actually use images to meet daily information needs, not another feature-led survey of visual search. The studio ran a two-phase study: a seven-day remote digital diary with 30 participants across the US and India captured behavior in context, followed by 16 sixty-minute depth interviews grounded in the diary entries. The work surfaced six mental-model archetypes, three distinct journey types, and an eight-step behavioral map. Together these reframed visual search as a behavior-specific mode rather than a default, with image collection treated as load-bearing to shopping decisions.
A seven-day remote digital diary with 30 participants across the US and India captured visual information needs in the moment, including triggers, tools, and outcomes.
The six archetypes: text-based searcher, image-collector, image-sharer, feedback-finder, reverse-image searcher, and true visual searcher. Each carries a distinct expectation about what a visual search tool should do.
Visual search sits outside most participants' go-to patterns. It is reached for in specific moments rather than used as a daily query mode, which reshapes how and where visual search surfaces should appear.
Saving, grouping, and revisiting images is how many participants move through shopping and planning. A tool that supports collection directly matches a real, repeated behavior.
Image-initiated, text-initiated, and exploratory journeys all fit a shared eight-step behavioral map (trigger, frame, choose tool, query, scan, refine, act, save/share). The journey type, not the step list, determines product design choices.
US and India participants differed in what started a visual information need more than in how the need resolved. The behavioral shape is stable. The entry conditions are market-specific.
Visual search was reframed from a feature to a behavior-specific mode with six distinct user archetypes.
The research gave the product team a behavior-led framework for where visual search belongs and how it should show up. Product decisions moved from feature-led framing to archetype-led framing, with image collection treated as a load-bearing behavior rather than an adjacent nicety.