AI + Experimentation = Growth | Inside Adobe’s Agentic AI https://conversion.com/blog/ai-experimentation-growth-inside-adobes-agentic-ai/ (23 Oct 2025)

  • In this special episode, Matt Wright talks with Brent Kostak, Product Marketing Lead for Optimization and Experimentation at Adobe, and David Arbour, Senior Research Scientist at Adobe Research. Together, they explore the launch of Adobe’s Experimentation Accelerator, a new AI-first platform built to automate and scale experimentation programs across enterprises. The conversation dives into how AI is transforming experimentation from manual testing into a continuous, insight-driven process powered by Adobe’s new Agent Orchestrator platform and specialized AI agents.

    The guests discuss key use cases and business challenges the Accelerator addresses, from automating experiment analysis and identifying high-impact opportunities to unifying data across teams. Arbour explains how the system grounds AI reasoning in statistical rigor and historical data, ensuring consistency, replicability, and reliability. Kostak highlights early beta results showing 200% increases in experiment variation velocity and major gains in ARR impact, while customers like AAA Northeast have discovered new strategic insights for campaign optimization.

    Looking ahead, both guests predict that experimentation will become tightly embedded into daily workflows, with AI agents proposing, interpreting, and even executing experiments automatically. Their advice for successful adoption: focus on organizational readiness, define success without leading with AI, and treat AI as a collaborative augmentation tool rather than a replacement. Adobe’s agentic AI framework, which combines data, content, and journey orchestration, is positioned to make experimentation faster, smarter, and more connected than ever before.

    To view the complete podcast and transcript click here.


  • Introductions

    Matt Wright:
    Hi everyone, and welcome to a very special episode. Today, we’re joined by the team behind Adobe’s upcoming Experimentation Accelerator, diving into the potential of AI-powered experimentation.

    We’ve got two special guests: Brent Kostak from Adobe and David Arbour.

    I probably can’t do justice introducing all that you both do, so I’ll let you take it from here. Brent, can you start with a quick intro, and then we’ll hand it to David?

    Brent Kostak:
    Thanks, Matt, and thanks to the Conversion team for having us.

    I’m Brent Kostak, and I lead product marketing for optimization and experimentation at Adobe.

    We’re here to discuss some of the AI-first application launches for Experimentation Accelerator and upcoming innovations.

    David Arbour:
    Thanks, Brent, and thanks again for having us. I’m David Arbour, a Senior Research Scientist at Adobe Research. I work on experimentation, causal inference, and AI in support of those two areas. A lot of my focus lately has been on this new initiative we’ll be discussing.

    Matt Wright:
    Fantastic. It’s great to have you both here.
    Before we dive in, I’d love to get Adobe’s perspective on this: what are the top use cases for experimentation and optimization?

  • Top use cases for experimentation and optimization

    Brent Kostak:
    Great question. Adobe’s been in this market for a long time with Adobe Target and Journey Optimizer.

    From a use-case standpoint, we’re seeing things expand beyond traditional UI/UX teams into channel marketers and lifecycle journey experts. The main use cases we’re seeing today include:

    • Driving higher-impact campaigns and customer journeys – focusing on higher conversion and revenue uplift across the end-to-end journey.

    • Automating experimentation analysis – so teams can prioritize faster and understand both what they’re testing and why.

    • Growth-focused experimentation – helping both marketing and product teams improve subscription and service growth.

    These three use cases align with our core personas around optimization, experimentation, and growth.

    David Arbour:
    I completely agree with that. From my perspective, it’s never been easier to create experiment content—but it’s never been harder to extract meaningful insights from it.

    We all know we need to experiment a lot, and generating variants is easy now. The real challenge is making that analysis approachable and reusable so each experiment builds long-term learning, not just one-off results.

    “It’s never been easier to create content for experiments. And I think it’s maybe never been harder to get meaningful insights on top of that. The goal is to make that analysis approachable and also reusable in future contexts, so you are learning from it over time.”
  • What is experimentation accelerator & key challenges

    Matt Wright:
    Let’s talk specifically about Experimentation Accelerator and AI-guided experimentation. What is it, and how would you describe it to someone new?

    Brent Kostak:
    Sure. The Adobe Journey Optimizer Experimentation Accelerator is a new AI-first application we’re launching on September 30th.

    It’s designed for Adobe Target and Journey Optimizer customers to:

    • Accelerate and automate experimentation analysis

    • Identify high-impact opportunities to scale growth

    • Scale experimentation programs across the enterprise

    It’s built on Adobe’s Agent Orchestrator platform, aligning with Adobe’s broader agentic AI innovation strategy.

    Essentially, it helps teams automate analysis, spot where to test next, and mature their experimentation programs across people, processes, and technology.

  • Challenges and pain points addressed

    Matt Wright:
    What are some of the problems or challenges this helps solve?

    David Arbour:
    It’s honestly never been a better time to work on experimentation. A few years ago, I wouldn’t have said that!

    Here’s what we’ve seen:

    • It’s easy to create lots of test variants.

    • But many teams then realize their experiment is underpowered, or they’re unsure how to interpret results.

    That leads to a few key needs:

    1. Making it clear why you’re running an experiment—what decision you’re trying to inform.

    2. Using AI to identify relationships between variants—what worked, what didn’t—and using those learnings to guide the next test.

    AI can help reveal patterns across experiments, surfacing which attributes tend to drive success.

    Brent Kostak:
    Exactly. There’s also an organizational shift happening.

    We’re seeing companies use AI agents and workflows to align business goals and experimentation efforts across different teams.

    By creating a centralized integration point, we’re helping teams share learnings across units, not just in PowerPoint decks or Jira tickets. Instead, they can collaborate in an automated, conversational way, breaking silos and scaling insights.

    David Arbour:
    Right—and that’s crucial. Too often, experiment results live in slide decks that no one revisits.

    If someone leaves the organization, that knowledge disappears.
    What we’re building reduces that friction—making experimentation insights persistent, searchable, and actionable long-term.

    “By creating a centralized integration point, we’re helping teams share learnings across units, not just in PowerPoint decks or Jira tickets. Instead, they can collaborate in an automated, conversational way, breaking silos and scaling insights.”
  • How it works: AI insights & reliability

    How AI Experiment Insights Work

    Matt Wright:
    Let’s dig into how this actually works.
    Sometimes when people use AI, they give it a lot of context but get shallow answers. How do you make sure that doesn’t happen here?

    David Arbour:
    Great question—and honestly, it’s what keeps me up at night.
    You could just dump experiment data into a large language model (LLM) and ask, “Why did this work?” Sometimes the answer looks plausible, but you run into two major problems:

    1. Omissions – Important details get skipped.

    2. Hallucinations – AI makes things up.

    So instead, we take a different approach:

    • We extract representations (attributes and features) of the content itself.

    • We learn from historical experiments how those attributes relate to performance.

    • We anchor results in fixed, auditable scores — meaning if you recheck in a month, you’ll get the same answer.

    This makes it grounded, consistent, and tied to real experimental data, not random AI text generation.

    We also use these patterns to recommend what to test next — for example:

    “We noticed empathetic tone performs better. Consider adding that to your next campaign.”

    Everything stays fact-based and replicable, not just “the model said so.”
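    (For readers who want a concrete picture of that pattern, here is a minimal, hypothetical sketch: describe each variant by content attributes, relate those attributes to lift across historical experiments, and report the resulting fixed scores. All attribute names and numbers are invented for illustration; this is not Adobe’s implementation.)

    ```python
    # Illustrative sketch only (invented data); not Adobe's implementation.
    # Idea: score content attributes against historical experiment results,
    # so explanations come from fixed, auditable numbers rather than
    # free-form LLM text.
    import numpy as np
    from sklearn.linear_model import Ridge

    # One row per past variant: [empathetic_tone, has_urgency, scaled_word_count]
    X_history = np.array([
        [1, 0, 0.4],
        [1, 1, 0.6],
        [0, 1, 0.8],
        [0, 0, 0.5],
    ])
    lift_history = np.array([0.031, 0.024, -0.008, 0.002])  # observed lifts

    model = Ridge(alpha=1.0).fit(X_history, lift_history)

    # Fixed attribute scores: re-running this on the same history next month
    # returns the same numbers, so the insight is auditable.
    attribute_scores = dict(zip(
        ["empathetic_tone", "has_urgency", "scaled_word_count"],
        model.coef_.round(4),
    ))
    print(attribute_scores)

    # A new variant is then explained (and a next test suggested) in terms
    # of those scores, e.g. "empathetic tone tends to lift performance".
    new_variant = np.array([[1, 0, 0.5]])
    print("expected lift:", round(float(model.predict(new_variant)[0]), 4))
    ```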

    Handling Conflicts and Scaling

    Matt Wright:
    What happens when results conflict or change over time? How does the model handle that?

    David Arbour:
    That’s where we blend AI reasoning with classical statistics.
    Classical stats gives guarantees; AI gives interpretability.
    We model time-based factors like:

    • Seasonality

    • Brand differences

    • Audience shifts

    The more experiments you run, the more accurate the model becomes.
    We start with a baseline, but over time, it becomes tailored to your data — your customers, your industry, your campaigns.

    Brent Kostak:
    Exactly — and this connects to Adobe’s three pillars:
    Data, Content, and Journeys.

    • Data: What David just described—context, modeling, and insights.

    • Content: Making sure AI-generated content is on-brand and compliant.

    • Journeys: Ensuring cross-channel orchestration—no conflicting experiments across campaigns or channels.

    All of this runs on the Adobe Experience Platform, so you get enterprise-level scalability and visibility.

  • AI agents, assistants & Orchestrator platform

    Understanding Adobe’s AI Framework

    Matt Wright:
    Adobe often mentions “AI capabilities,” “AI features,” and “AI agents.”
    How are those different?

    Brent Kostak:
    Good question. Think of it in three layers:

    1. AI Assistant – The conversational interface within Adobe apps (for example, “Summarize my experiments”).

    2. AI Agents – Specialized reasoning models that take action or query data across systems.

    3. Agent Orchestrator Platform – The layer that manages and coordinates all those agents across Adobe Experience Cloud.

    The Experimentation Agent powers Experimentation Accelerator, but other related ones include:

    • Journey Agent – connects campaigns and touchpoints.

    • Data Insights Agent – drives analytics and interpretation.

    • Audience Agent – helps understand and segment users.

    These all interconnect through the Agent Orchestrator, enabling consistent reasoning and data flow.
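    (As a rough illustration of that three-layer idea, here is a deliberately simplified sketch. The class names, agents, and routing logic are invented for this post and do not reflect Adobe’s actual Agent Orchestrator APIs.)

    ```python
    # Simplified, invented sketch of the assistant / agent / orchestrator
    # layering; not Adobe's Agent Orchestrator API.
    from typing import Callable, Dict

    class Agent:
        """A specialized reasoning component with a narrow set of tools."""
        def __init__(self, name: str, tools: Dict[str, Callable[[str], str]]):
            self.name = name
            self.tools = tools

        def handle(self, request: str) -> str:
            # A real agent would plan with an LLM; here we just match a tool
            # whose name appears in the request.
            for tool_name, tool in self.tools.items():
                if tool_name in request:
                    return tool(request)
            return f"{self.name}: no tool matched '{request}'"

    class Orchestrator:
        """Routes a conversational request to the right specialized agent."""
        def __init__(self, agents: Dict[str, Agent]):
            self.agents = agents

        def route(self, domain: str, request: str) -> str:
            return self.agents[domain].handle(request)

    experimentation_agent = Agent("experimentation", {
        "summarize": lambda r: "3 experiments live, 1 significant winner",
        "underperforming": lambda r: "Variant C trails control by 4%",
    })
    orchestrator = Orchestrator({"experimentation": experimentation_agent})

    # The assistant layer would turn a user's chat message into this call:
    print(orchestrator.route("experimentation", "show me underperforming variants"))
    ```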

    How Agents Work and Specialization

    Matt Wright:
    Are all Adobe agents built the same way?

    David Arbour:
    They share an orchestrator, but each is specialized for its use case.
    For example, the Experimentation Agent focuses deeply on testing logic, analysis, and insight delivery.

    Two big design priorities:

    1. Rich features – The agent must understand and access the right experiment data.

    2. Natural conversation – Translating user intent correctly (e.g., “show me underperforming variants” → actual statistical query).

    It’s surprisingly complex to get both right, so we spend a lot of time aligning human language to technical action.

    Building a Custom Agent

    Matt Wright:
    What’s it like to build one of these agents? How do you evaluate or optimize it?

    David Arbour:
    It’s part technical, part sociological.
    In classical machine learning (pre-2020), you had labels and accuracy metrics.
    Now, success depends on human feedback loops — understanding what’s helpful and in-scope for real users.

    We annotate hundreds of examples:

    • Was the answer correct?

    • Was it frustrating?

    • Did it go out of scope?

    Then we refine prompts and training examples until the experience feels natural and reliable.

    Future Evolution and Adaptability

    Matt Wright:
    How will this evolve as LLMs advance?

    David Arbour:
    We’re moving toward adaptive reasoning.
    Right now, each agent has a defined set of tools. In the future, a single agent could dynamically compose hundreds of specialized tools under the hood.

    For example:

    “Find the best-performing segment in my last experiment.”
    The agent could then identify segments, analyze attributes, and even propose new campaign paths — all in one flow.

    It’s like giving the system more Lego blocks to build richer workflows.

    Brent Kostak:
    And because this runs on Adobe Experience Platform, everything’s grounded in real-time customer profiles and behavioral data.

    Partners can use Agent Composer, SDK, and Registry to build their own custom agents or fine-tune ours for specific business use cases.

    That’s a major differentiator versus point solutions in the market.

    “Everything’s grounded in real-time customer profiles and behavioral data.”
  • Beta results & real-world impact

    Early Customer Impact

    Matt Wright:
    You’ve had a beta running for a while. What business impact have you seen so far?

    Brent Kostak:
    Yes, we’ve been in beta for several months.
    Two great examples:

    1. Adobe.com (Customer Zero)

    • Massive internal program managing Adobe’s website testing.

    • Using Experimentation Accelerator delivered:

      • 200% more experiment variations

      • Higher win rates

      • Over 200% increase in ARR impact per test

    It wasn’t just faster testing—it was smarter, more consistent outcomes.

    2. AAA Northeast (Beta Partner)

    • Used AI Experiment Insights to guide a new member benefits campaign.

    • AI highlighted which messaging and engagement pathways would perform best.

    • They learned not only which variants worked but also why — enabling broader marketing strategy insights.

    These weren’t just “conversion bumps” — they gained contextual understanding that shaped future campaigns.

  • What sets Adobe apart

    Differentiation in the Market

    Matt Wright:
    Other platforms are releasing agentic AI features too. What sets Adobe apart?

    David Arbour:
    Two main things:

    1. Grounding in statistical rigor

    2. Integration across the experience stack

    Many competitors emphasize “velocity”—run 100x more experiments. But:

    • Without 100x more users or better analysis, that’s just noise.

    • It leads to frustration or weakened stats.

    Adobe instead focuses on analyzing smarter, not just testing faster.

    We keep statistical integrity (confidence sequences, causal analysis) while using AI to scale insight generation.

    That means:

    • You can run more variants safely.

    • You can reuse learnings for future experiments.

    • You don’t lose trust in your data.
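    (David’s velocity-versus-noise point is easy to make concrete with a quick power calculation. The sketch below uses invented traffic numbers and a standard normal approximation; it is not Adobe’s methodology, just the arithmetic showing that splitting fixed traffic across more arms inflates the minimum detectable effect.)

    ```python
    # Rough illustration with invented numbers: splitting a fixed visitor
    # budget across more arms raises the minimum detectable effect (MDE)
    # for each comparison, so extra velocity without extra traffic or
    # better analysis mostly buys noise.
    from math import sqrt
    from scipy.stats import norm

    visitors = 100_000          # weekly traffic available to the test
    baseline_rate = 0.05        # baseline conversion rate
    alpha, power = 0.05, 0.80
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)

    for arms in (2, 5, 10, 50):
        n_per_arm = visitors // arms
        # Two-proportion MDE under a normal approximation at the baseline rate.
        mde_abs = z * sqrt(2 * baseline_rate * (1 - baseline_rate) / n_per_arm)
        print(f"{arms:>3} arms -> {n_per_arm:>6} visitors/arm, "
              f"MDE ~ {mde_abs / baseline_rate:.1%} relative lift")
    ```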

    Brent Kostak:
    Exactly. And our Experiment Insights combine:

    1. Content patterns – what’s working in messaging and design

    2. Audience data – who’s responding

    3. Test behavior – what’s driving causal impact

    Plus, customers can bring their own multi-armed bandit or modeling approaches to fine-tune their programs.
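    (For readers who haven’t met the multi-armed bandit approaches Brent mentions, here is a minimal Thompson-sampling sketch for a conversion-style metric, with invented numbers. It illustrates the general technique, not any vendor-specific implementation.)

    ```python
    # Generic Thompson-sampling bandit on simulated conversion data;
    # numbers are invented and this is vendor-agnostic.
    import numpy as np

    rng = np.random.default_rng(7)
    true_rates = [0.050, 0.055, 0.048]       # unknown in a real test
    successes = np.ones(len(true_rates))     # Beta(1, 1) priors
    failures = np.ones(len(true_rates))

    for _ in range(20_000):                  # one simulated visitor per step
        sampled = rng.beta(successes, failures)
        arm = int(np.argmax(sampled))        # serve the arm that looks best now
        converted = rng.random() < true_rates[arm]
        successes[arm] += converted
        failures[arm] += 1 - converted

    traffic = successes + failures - 2       # subtract the prior pseudo-counts
    print("traffic share:", np.round(traffic / traffic.sum(), 3))
    print("estimated rates:", np.round(successes / (successes + failures), 4))
    ```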

    And then there’s Adaptive Experiments — a new, human-in-the-loop method that lets teams:

    • Add or remove variants mid-experiment

    • Maintain statistical validity

    • Accelerate iteration

    That’s groundbreaking — it challenges the old “flush your data” rule and opens up new adaptive workflows.

  • Future of experimentation & final advice

    The Future (Next Year and Beyond)

    Matt Wright:
    Looking ahead — how do you see experimentation programs evolving a year from now?

    David Arbour:
    Experimentation will become tightly integrated into everyday workflows.
    Instead of being a separate process, agents will:

    • Spot opportunities

    • Propose experiments

    • Even run them automatically if approved

    It’ll shift from isolated testing to continuous, intelligent learning.

    Brent Kostak:
    Totally agree.
    Experimentation is expanding beyond conversion metrics.
    Teams will start optimizing:

    • Prompt quality in conversational AI

    • Engagement experience in real-time interactions

    • Operational efficiency inside organizations

    Optimization will mean more than “higher revenue”—it’ll mean better experience orchestration.

    How to Adopt AI Successfully

    Matt Wright:
    Three-quarters of AI initiatives reportedly fail. Any advice on successful adoption?

    David Arbour:
    Yes — define success without using the word “AI.”
    Focus on the use case and measurable outcome.

    AI should support your goal, not be the goal.
    It’s easy to get 85% of the way fast—but that last 15% is where failure happens if the goal isn’t clear.

    Brent Kostak:
    Exactly.
    It’s about organizational readiness as much as technology.

    Teams that succeed think about:

    • How AI will augment human workflows

    • How automation fits with existing analytics and experimentation culture

    • How to operationalize insights across business units

    Without that, you may get short-term gains but miss transformational potential.

    Closing

    Matt Wright:
    This has been fascinating. There’s so much more coming in the next six months, I’m sure.
    Thank you both for sharing your insights.

    Brent Kostak:
    Thank you, Matt.
    Keep an eye out for:

    • Adobe Summit announcements

    • New podcasts and events

    • Ongoing thought leadership from our teams

    Matt Wright:
    Fantastic. Thanks again, Brent and David — and thanks to everyone for listening.

  • Links

    1. Adobe Experimentation Accelerator 
    2. Upcoming Adobe Events

The Conversion maturity model: How mature is your experimentation program? https://conversion.com/blog/conversion-maturity-model/ (27 Aug 2025)

  • How do you take an immature experimentation organization – the kind that runs one or two A/B tests a month – and turn it into a Booking.com?

    This is a question that we – and many of our clients – have been trying to answer for years. 

    We’ve approached  this problem from many different angles. We’ve developed a number of models and frameworks to support us in this work, and we’ve tailored – and implemented – maturation plans for a huge range of clients: for some of the most mature experimentation organizations on the planet, as well as complete newbies. 

    Throughout all of this work – and through our broader work in experimentation – we’ve gradually been able to put together a map of the experimentation maturity landscape. 

    This map, which has become our flagship maturity model, is proving to be an invaluable resource, allowing us to tell our clients:

    1. How mature their experimentation function is relative to the best in the business
    2. Where, specifically, they’re doing well – and where they’re falling down
    3. How they can remedy shortcomings and take meaningful strides towards maturity

    Throughout the remainder of this blog post, we’re going to share  this maturity model with you. 

    …but first:

    There are already tons of existing maturity models kicking about in our industry. Why did we feel the need to develop another one?


  • 1. Not another maturity model: why we felt the need to develop a new maturity model

    A couple of months ago, I was working with an ambitious client to try and mature their experimentation program. 

    According to our 3 V model of experimentation success, this client  was doing everything right:

    • Velocity – the speed from ideation to launch was extremely fast
    • Volume – they were launching lots of experiments each month
    • Value – the experiments they were launching were driving real, demonstrable business growth

    Unfortunately, this 3 V analysis was missing something. 

    While this team was doing lots of things right – they’d matured immensely since we’d started working together –  I knew there was still tons of room for improvement. 

    To give two examples:

    1. The client was using experimentation to drive real business value, but this value was siloed to a couple of teams and had produced zero impact in other areas of the business. The most mature experimentation teams we work with use experimentation to make better decisions across every area of their business. 
    2. The client was using research to inform experiments, but their research was extremely infrequent. The most mature experimentation teams we work with tend to have an ‘always-on’ research mentality, which allows a true Mixed Methods approach to develop. 

    My first port of call in trying to solve this problem was to look to the PACET model that we’d developed many years ago. 

    PACET essentially breaks an experimentation function down into 5 factors – Process, Accountability, Culture, Expertise, and Technology – and attempts to identify and remedy any bottlenecks that are harming program performance. 

    [Image: The PACET model – Process, Accountability, Culture, Expertise, Technology]

    Unfortunately, for all its many strengths, the trouble with PACET is that it doesn’t provide a clear series of stepping stones that an experimentation program can use to benchmark and mature its approach. Put another way, I needed a map – with clear milestones – that I could use to benchmark my client’s maturity and help them level up. 

    The good news is that I work for the world’s leading experimentation agency (!), so I decided to tap into the collective experience of our 40+ strong consulting team to begin building a comprehensive map of the experimentation maturity landscape.  

    After much trial, error, discussion, iteration, refinement, etc., we’ve now arrived at a model that is delivering real value for clients, providing them with specific goals and actions that they are  using to mature their programs at breakneck speed. 

    We’re hopeful that this model can do the same for your program too, so here it is.

  • 2. The five stages of program maturity

    At the highest level, our maturity model breaks down the spectrum of experimentation maturity into five discrete stages. 

    These  stages range from teams that are running the odd ad-hoc test to companies like Duolingo and Microsoft that run thousands of tests each year and use experimentation to inform decisions across every area of the business.

    Here are the 5 stages:

    1. Reactive

    Teams at the reactive stage are characterized by ad hoc, sporadic testing initiated by individuals without strategic direction or leadership support. These organizations typically run occasional experiments focused on low-hanging fruit, with basic A/B tests that lack proper documentation or knowledge sharing.

    There’s no clear program goal or defined KPIs, and ROI tracking hasn’t even crossed their minds. The testing tool stands alone without integration to other data platforms, and research is extremely limited. These teams are essentially testing wherever they can, whenever they can, without any formal framework or stopping protocol.

    The key challenge at this stage is the complete absence of structure – no hypothesis framework, no pipeline of experiments, and crucially, no buy-in from leadership. Results are barely documented or shared, leading to a cycle of “spaghetti testing” where learnings are lost and mistakes are repeated.

    2. Emerging

    At the emerging stage, experimentation begins to gain traction within specific teams or projects. While there’s still no formal framework or strategy, experiments become more regular and organized. Teams start building awareness of testing’s value and demonstrating wins to gain broader support.

    These programs typically have some individuals championing experimentation, but lack official buy-in or formalized processes. Volume and velocity are slower than optimal, but teams are beginning to identify blockers. A backlog starts forming, though without clear prioritization methods.

    The key development here is that KPIs are being assessed and questioned, with ROI showing for some experiments. Research remains sporadic – conducted when time allows or specific questions arise. While results are shared, cross-functional learning remains limited due to the lack of integration between testing tools and data platforms.

    3. Strategic

    Strategic programs represent a significant maturity leap. Experimentation is now recognized as a strategic activity with clear buy-in from senior leadership. A dedicated team leads or governs experimentation efforts, with established frameworks for hypotheses, experiment plans, and summaries.

    These organizations have defined success metrics with a primary KPI closest to the business goal. Experiments align with business objectives and ladder up to an overarching goal. Research is conducted regularly with a cohesive plan, and a culture of experimentation is taking shape.

    The testing tool is integrated with supporting analytics platforms, enabling behavioral analysis and advanced experiments like multi-armed bandits, multivariate tests, and personalization. Teams focus on aligning strategically and establishing standardized processes, with consistent approaches to prioritization and clear stopping protocols.

    4. Integrated

    Integrated organizations have experimentation embedded across the entire company. There’s a company-wide vision with a shared roadmap, and most of the business is empowered to experiment. Strategic goals align with business objectives while pushing the boundaries of established norms.

    Cross-functional collaboration is frequent, with learnings applied across teams. KPIs are clearly defined, tracked regularly, and insights are shared business-wide and acted upon. These companies actively scale their experimentation efforts, understanding that volume and velocity may dip temporarily in favor of more complex tests.

    A constant research loop with clear questions feeds the experimentation pipeline. The rigorous processes ensure excellent documentation of insights and outcomes. The backlog is consistently fed with high-quality experiments, and prioritization is automated and easily utilized. These teams focus on scaling experimentation across the organization.

    5. Optimized

    Optimized organizations represent the pinnacle of experimentation maturity – think Amazon, Netflix, or Booking.com. Experimentation is fundamental to their business model, integrated into everything they do across every channel. There’s a test-and-learn culture at all levels, with every employee empowered to run experiments.

    These companies use sophisticated frameworks, tools, and processes, leveraging AI and automation to speed up and scale. They run experiments in every aspect of business – online and offline – with clear measures balancing learning and earning goals.

    The process is continuously refined, with everyone following and optimizing the delivery process. Insights are well-documented, shared continuously, and feed back into strategy. Research is consistent and constant, bringing in new methodologies. These organizations often outgrow commercial testing tools and build their own, pioneering complex measurement and experimentation approaches. They focus on innovation and competitive advantage through experimentation.

  • 3. The four areas of maturity

    Now, some of you might have read the preceding section and thought:

    ‘Hollldd up – I feel like we tick some of the criteria for this stage but not others.’

    If this is you, you’re not alone: we found the same for almost all of the clients we used this model with. 

    This is where the maturity areas come in: in essence, we’ve taken the various criteria that define each stage of the model, and we’ve clustered these criteria into four primary areas, which are:

    1. Experiment goals
    2. Delivery and process
    3. Strategy and culture
    4. Data & tools

    Organizations rarely mature evenly – you might be Strategic in experiment goals but only Emerging in data and tools.  By introducing the areas dimension into our model, we’re  able to identify the specific places where each program is falling short. Once we’ve diagnosed weaknesses, we’re then in a much stronger position to begin fixing them.  

    1. Experiment Goals

    This dimension examines what you’re trying to achieve with experimentation. Are experiments random and goalless (Reactive), or do they ladder up to strategic business objectives (Strategic)? The most mature programs have company-wide goals that push boundaries and treat both learning and earning as valuable outcomes.

    2. Delivery and Process

    How efficiently and effectively do you run experiments? This covers everything from tracking velocity through each stage of the experimentation process to having clear frameworks and stopping protocols. Mature programs have rigorous, well-documented processes that everyone follows, with continuous optimization of the process itself.

    For an example of optimizing the optimization process itself, check out this blog post. 

    3. Strategy and Culture

    The cultural dimension is often the hardest to change but also the most impactful. It encompasses leadership buy-in, how widely experimentation is adopted, and whether there’s a true test-and-learn mindset across the business. Advanced programs have experimentation embedded in their DNA, with everyone from C-suite to individual contributors running tests.

    4. Data and Tools

    This covers both the research feeding your experiments and the technical infrastructure supporting them. Mature programs have constant research loops, integrated tech stacks, and advanced testing capabilities. They’ve often moved beyond commercial tools to custom solutions that support their scale and complexity.

    By assessing where you stand on each dimension, you can create targeted improvement plans. For instance, if you’re Strategic in goals but Emerging in tools, you know to focus on tech stack integration and research capabilities.

  • 4. Bringing it all together: how to actually apply this stuff

    Now that you understand the maturity stages and areas, we’re going to finish up this article by sharing the step-by-step process that we’ve been using to help our clients level up their maturity. 

    Here it is:

    Step #1: Honest Assessment

    Gather your experimentation team and stakeholders to evaluate where you currently stand on each area. Use the detailed criteria in this file to score yourselves objectively. Don’t aim for perfection – even being aware of your gaps is valuable progress.

    Step #2: Identify Your Constraints

    Look for the areas where your program is least mature. These are your primary constraints holding back overall program maturity. You can’t jump from Reactive to Optimized overnight, but you can identify the specific blockers preventing you from reaching the next stage.
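    If it helps to make that concrete, the tiny sketch below (with invented scores) captures the idea: rate each of the four areas on the five-stage scale and treat the lowest-scoring area as the constraint to tackle first.

    ```python
    # Invented example scores: 1 = Reactive ... 5 = Optimized.
    stages = {1: "Reactive", 2: "Emerging", 3: "Strategic", 4: "Integrated", 5: "Optimized"}

    area_scores = {
        "Experiment goals": 3,
        "Delivery and process": 3,
        "Strategy and culture": 2,
        "Data and tools": 2,
    }

    constraint = min(area_scores, key=area_scores.get)
    print(f"Least mature area (primary constraint): {constraint} "
          f"-> currently {stages[area_scores[constraint]]}")
    ```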

    Step #3: Create Your Roadmap

    Based on your assessment, identify 2-3 concrete actions that will move you forward in the next 6 months. For example, if you’re Emerging in Data & Tools, you might focus on:

    • Establishing defined KPIs with proper tracking
    • Implementing continuous research practices
    • Integrating your testing tool with analytics platforms

    Here’s an example of 3 actions that we chose to focus on with one of our clients recently:

    Step #4: Set Realistic Timelines

    Most organizations take 3-5 years to move from Reactive to Optimized. Plan for steady progress:

    1. Year 1: Move from Reactive to Strategic, focusing on the fundamentals
    2. Years 2-3: Progress to Integrated, scaling successful practices
    3. Years 3-5: Push toward Optimized, innovating and leading your industry

    Step #5: Regular Reviews

    Reassess your maturity every 6 months. Celebrate progress in specific areas  while identifying new constraints. Remember, maturity isn’t just about running more tests – it’s about building a sustainable system that drives continuous improvement and innovation.

    The key is starting where you are and taking consistent steps forward. Even small improvements in process, culture, or tools can unlock significant value when compounded over time.

  • Thanks for reading! If you’d like to chat about how we can help you improve your program’s maturity – or if you’d just like to chat about experimentation in general! – please do get in touch. We’re passionate about experimentation and always happy to share notes.

    Feel free to reach out to us by filling out our contact us form!

UX Research: Moderated vs. Unmoderated Testing https://conversion.com/blog/ux-research-moderated-vs-unmoderated-testing/ (30 Jul 2025)

  • If you want to know what your users are doing on your website, A/B tests are your go-to tool. But if you’re looking to uncover why they’re doing it, user testing is the key to unlocking those insights.

    One common question that arises is whether to conduct moderated or unmoderated user testing to gather insights. Both approaches can uncover the “why” behind user behavior insights that pure analytics or A/B testing alone might miss. 

    In fact, at Conversion, a GAIN specialist, we regularly use both moderated and unmoderated testing methods as complementary tools in our experimentation strategy. This article breaks down the differences, pros, and cons of each approach, and how to decide which method (or combination) is right for your needs.


  • What is moderated user testing?

    Moderated user testing involves a researcher actively guiding and observing a participant through tasks in real time. The session can be in-person or remote (via video call), but in either case, a moderator is present to introduce tasks, ask follow-up questions, and probe the participant’s thoughts. 

    For example, a researcher might ask a user to complete a product purchase on a website, observe where they encounter friction, and ask, “What are you thinking at this step?”

    Moderated sessions are fundamentally qualitative, capturing rich observations and direct feedback from users. This makes them extremely powerful for uncovering why users behave a certain way.

    A skilled moderator can dig into users’ motivations, clarify any confusion on the spot, and explore unexpected behaviors in depth. At Conversion, we frequently run moderated interviews or usability tests to inform our design hypotheses. 

    For example, in our work with Whirlpool Corporation, we ran an A/B test that highlighted performance issues with an interstitial element. To better understand why users were reacting negatively, we conducted moderated user interviews, which revealed that the interstitial felt “unexpected” and disrupted the experience. These insights guided a redesign of the feature, which we then validated through further experimentation.

    Pros of Moderated Testing:

    • Deep qualitative insights: Moderated sessions allow direct observation of body language, tone, and facial expressions, and let you ask “why” in the moment. This yields a deeper understanding of user motivations and frustrations. You’re not just seeing what users do, but learning why they do it, which is crucial for discovering new test ideas and solutions. 
    • Real-time flexibility: The moderator can clarify task instructions or follow interesting tangents as they arise. If a user gets stuck or confused, the facilitator can ask them to elaborate (or can adjust the task in future sessions). This adaptability helps ensure you’re gathering meaningful feedback rather than useless data points. 
    • Identify subtle usability issues: Because you can probe users’ thought processes, moderated testing often surfaces UX problems that might be overlooked in clickstream data. Minor friction points, cognitive hesitations, or emotional reactions become evident when you’re watching and listening to a user live. 

    Cons of Moderated Testing:

    • Time and resource-intensive: Each session requires a researcher’s active involvement and often involves one-on-one scheduling with participants. This typically means smaller sample sizes. Moderated tests are highly insightful but don’t scale easily; running 5–10 sessions can already be a significant time investment. 
    • Moderator bias and variability: The quality of insights depends on the moderator’s skill. Poorly worded questions or unconscious cues can bias participants’ responses. (For example, leading a user, “Did you find the checkout confusing?” can plant that idea.) Consistency is key; a structured discussion guide and training help mitigate this risk. 

    • Higher cost per participant: Due to the labor-intensive nature of moderated studies, they can be more expensive on a per-user basis. They may also require specific facilities or conferencing tools. However, the return on insight is often worth the cost when the goal is to gain a deep understanding of complex user journeys or critical conversion issues.

  • What is unmoderated user testing?

    In unmoderated user testing, participants complete assigned tasks independently without the presence of a live facilitator. They might be given a scenario (e.g. “Find and purchase a pair of running shoes on our site”) via a testing platform or survey, and their screen actions and comments are recorded for later analysis. Unmoderated tests are often conducted remotely, with users participating from their homes or offices at their convenience.

    Because there is no moderator involved in real-time, unmoderated testing relies on a well-crafted test plan. Tasks and questions must be crystal clear, since participants can’t ask for clarification during the test. The upside is that unmoderated studies can gather data from more users in a shorter time, often at a lower cost. 

    For example, when a client requested eye-tracking research on a new landing page, we leveraged a remote tool (Sticky) to conduct an unmoderated test with a broad pool of participants. This approach allowed us to collect dozens of eye-tracking sessions simultaneously and follow them with a survey, rather than scheduling each participant individually.

    Pros of Unmoderated Testing:

    • Scalable and fast: Unmoderated tests can be deployed to many participants at once. You might get results from 20, 50, or more users within a day or two, which is impractical with fully moderated sessions. This makes unmoderated testing ideal when you need quick, directional feedback or a larger sample to increase confidence in findings.

    • Natural user behavior: Participants complete tasks in their environment, on their own devices, without a researcher potentially looking over their shoulder. This can lead to more natural behavior. Users are less likely to feel observed or pressured, so you may catch genuine stumbling blocks in the experience. (However, note that lack of guidance can also mean they wander off-track, a double-edged sword.)

    • Lower cost per participant: In many cases, unmoderated testing is a cost-effective option. You don’t need to pay a moderator for each session or rent a lab. Many online UX testing platforms offer panel participants and automated recording, which helps drive down costs. For straightforward usability checks or A/B test follow-ups, unmoderated studies can be a budget-friendly way to gather qualitative data at scale.

    Cons of Unmoderated Testing:

    • Limited depth of insight: Without a moderator, you can’t ask participants follow-up questions in the moment or clarify their responses. You might know what they did (e.g., 3 out of 5 users failed to find the wishlist), but you often have to infer the why from their screen recordings or written comments. In other words, unmoderated tests tend to surface the symptoms of UX issues; you may still need moderated research to diagnose the root causes.

    • Rigid test script: The tasks and questions must be carefully designed upfront. If participants misinterpret a task, there’s no way to correct the course during the session. Common pitfalls include users not understanding what they’re asked to do, or question wording inadvertently biasing their behavior. 

    For example, if your task prompt is ambiguous, participants might do entirely different things, yielding unusable data. Unmoderated testing leaves little room for error in research design.

    • No immediate observation of emotions: While many unmoderated tools capture video or audio of the user, it’s not the same as being in the room to notice subtle cues. You might miss non-verbal signals or the ability to probe an offhand remark. In unmoderated sessions, you get what you get: the recorded behavior or survey answers, and sometimes that can feel a bit hollow compared to a rich conversation with a user.

    Tip: Since unmoderated studies lack a live facilitator, it is crucial to pilot-test your setup. Run through the test internally or with a couple of trial users first to catch any confusing instructions or technical glitches. As our team emphasizes, proper experiment design is just as necessary in research as it is in A/B testing. 

    A quick pilot can save you from wasting dozens of participant sessions on flawed tasks. In one of our projects, we piloted an unmoderated test and discovered that the initial instructions weren’t providing enough context, causing confusion. 

    We adjusted the wording and timing, re-ran the pilot, and only then launched the complete study, avoiding what could have been a costly mistake. 

    When done right, this upfront effort can even yield unexpected insights. 

    “The seemingly ‘failed’ result of the pilot test actually gave us a huge A-ha moment on how users perceived these pages… and drastically shifted our strategic approach to the A/B variations themselves,” notes Nick So, our VP of Delivery.

    In other words, a misstep in testing design can itself reveal something fundamental about user expectations, as long as you’re paying attention.

  • Which testing method should you use?

    Both moderated and unmoderated testing have a place in a robust optimization and UX research program. The best choice depends on your goals, resources, and the stage of the project. Here are some guidelines to help you decide:

    • Use moderated testing for exploratory research and complex scenarios. When you need to gain a deep understanding of user motivations or when evaluating a complex flow or prototype, a moderated session is invaluable. 

    The ability to ask “why did you do that?” is key to uncovering insights that drive innovative hypotheses. 

    For instance, when a complex flow needs a comprehensive assessment to inform a decision, sitting down with users (via Zoom or in person) to watch them work through it will likely reveal pain points that numbers alone won’t show.

    Moderated testing is also preferable if your target audience is highly specific or the tasks are high-stakes. You wouldn’t want a dozen users floundering in an unmoderated test that deals with, say, sensitive financial data or intricate B2B workflows. Instead, a moderated approach allows you to prioritize quality over quantity in the feedback.

    • Use unmoderated testing for validation and fast feedback loops if you have a relatively clear idea of what you want to test, for example, the usability of a new feature or a content comprehension check. Unmoderated studies can quickly confirm whether users succeed or struggle. 

    They are great for getting broad input on straightforward questions. Maybe you want 50 people’s first impressions of a homepage hero image: an unmoderated test or on-site survey can gather that data within hours.

    Unmoderated testing also shines when you need to benchmark an experience (e.g., how long does it take on average for users to find an item using your site search?) or when you want to test with users across many time zones without scheduling. Just remember to keep tasks specific and straightforward, and invest time in writing clear instructions (again, pilot testing is your friend here).

    • Consider a mixed approach for the best of both worlds. Moderated and unmoderated testing are not mutually exclusive; instead, they complement each other. Using them together can amplify their strengths. 

    At Conversion, our philosophy is to mix methods to get a 360° view of the user. Quantitative techniques like A/B tests or analytics tell us what is happening, while qualitative research tells us why.

    A moderated interview might reveal an unexpected user need, which you can then validate at scale with an unmoderated survey to see how widespread that sentiment is. 

    Alternatively, you might start with unmoderated usability sessions to identify the most common UX issues, and then follow up with moderated sessions to delve deeper into those specific problems. In practice, we often alternate between the two. 

    In our experience, a blended strategy drives the most significant impact. One of our ongoing partnerships is a great example: with Whirlpool Corporation, we established a regular cadence of both A/B testing and UX research. This mixed-methods program enables us to continually gather qualitative insights to inform new experiments, and quantitative results to measure their impact. The Whirlpool team gets to see the whole picture, not just that a change improved revenue by X%, but why it resonated with customers (or didn’t).

    Their Senior Optimisation Manager put it well: “Conversion has become a trusted partner… quite literally an extension of our in-house capability,” a nod to how seamlessly we integrate research with testing.

  • Integrating user research into experimentation

    When it comes to moderated vs. unmoderated testing, the answer isn’t one or the other; it’s figuring out when to use each, and often using both. 

    Moderated sessions offer depth and discovery, while unmoderated sessions offer scale and speed. The true power of conversion optimization lies in uniting these methods within an overarching experimentation framework. By doing so, you ensure that every A/B test is not just a shot in the dark but a data-informed hypothesis grounded in real user behavior and feedback.

    Above all, remain user-centric. Any test or optimization should ultimately serve the needs of your users. Moderated and unmoderated research are tools to keep you connected to those needs, whether through the voice of a single user in an interview or the patterns of thousands of users clicking through your funnel. The companies that win in CRO are those that never lose sight of the customer experience behind the metrics. 

    In the end, the moderated vs. unmoderated question isn’t a competition at all. It’s a collaboration. When used thoughtfully together, they ensure your UX research is both broad and deep, and your optimization efforts are both data-driven and user-informed. That is the formula for creating digital experiences that not only convert but also delight.

  • The Author

    Christopher Barlow – User Experience Consultant

Using experimentation to find a product’s optimal price (and increase RPV by 16% in the process) https://conversion.com/blog/using-experimentation-to-find-a-products-optimal-price/ (14 May 2025)

  • What question keeps product leaders up at night?

    Crying babies, anxious dogs, general unease about the state of the world. Oh, and:

    “Is our product priced right?”

    And that’s for good reason – pricing can make or break your business.

    If you get it right, you unlock massive value – selling your product at the optimised price to balance profit and demand. (But get it wrong and you leave serious money on the table.)

    The problem is, there’s not always a clear answer to the question. It helps if you’re selling a product in a competitive market, as that at least gives you a starting point.

    But what if you’re introducing a disruptive product, without the customer having a clear idea of what the product should cost? When there’s no precedent, you can set whatever price you want…


  • How do you decide what to charge?

    That was the challenge for our client – a SaaS brand in an emerging industry with great traction.

    They weren’t strangers to experimentation. We’d run experiments for them previously on their landing pages and sign-up flow – and their approach was mature.

    Now they wanted to see – is our product priced right?

    Like many SaaS businesses, our client offered a monthly subscription – but customers could save if they committed to a quarterly or annual plan. The plans were otherwise identical – the only difference was the level of commitment.

    A rough-and-ready, anonymised version of what our client’s plan page looks like. Note: in a bid to maintain client confidentiality, all future screenshots are anonymised versions of the originals.

    Our goal was simple: find out the optimal amount for each tier.

    The only problem was… how do you do this?

  • Will customers tell you what they’d pay?

    Many people start with the Gabor-Granger method – a research study that, at its simplest, asks customers whether they’d buy a product at different price points:

    1. You recruit research participants for the study – matching your typical customer demographic.
    2. Each participant is presented with the product and asked whether they would purchase it at different price points (eg at $10, $15, $20, and so on).
    3. When you aggregate the answers from all participants, you can plot price vs demand and calculate which combination gives you the most revenue.

    So let’s say 100 people would buy it at $10, and 80 people would buy it at $15, but only 55 people would buy it at $20… What’s the right price point?

    The revenue for each price point would be $1,000, $1,200, and $1,100 respectively. In other words, it suggests the $15 price point would lead to the most revenue.
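    That calculation is simple enough to script. The snippet below just re-uses the made-up numbers from the example above, multiplies stated willingness-to-buy by price, and picks the revenue-maximizing point.

    ```python
    # Gabor-Granger style revenue curve using the illustrative numbers above.
    demand_at_price = {10: 100, 15: 80, 20: 55}  # price -> people who said they'd buy

    revenue = {price: price * buyers for price, buyers in demand_at_price.items()}
    best_price = max(revenue, key=revenue.get)

    for price, rev in revenue.items():
        print(f"${price}: {demand_at_price[price]} buyers -> ${rev:,} revenue")
    print(f"Revenue-maximizing price in this sample: ${best_price}")
    ```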

  • But…are people reliable?

    Good point – most people aren’t reliable. Especially when it comes to something as irrational as pricing. People will often make an emotional decision to buy a product, then justify it retrospectively – and the price is often far less important than we think.

    It becomes even harder when we ask people to vocalise their response to different price points.

    The Gabor-Granger method relies on participants telling you whether they would buy at different price points (attitudinal), as opposed to actually observing whether they would buy at those price points (behavioural).

    The chart below shows how research methods vary.

    All research methods have their strengths and weaknesses. The Gabor-Granger Method is no exception. Here it is plotted against a range of other research methods in terms of 1) behavioural vs. attitudinal slant and 2) quantitative vs. qualitative data type.

    No one method is perfect – and we’d never recommend that a client trust a Gabor-Granger study on its own – but these studies are useful for getting initial insight, especially for highlighting discrepancies between current and optimal price.

  • Enter: Mixed-methods research

    Take another look at the chart of research methods above.

    The best approach by far is mixed-method research. Instead of relying on one method (eg analytics), we triangulate opinion through multiple methods (eg analytics x surveys x usability tests).

    So how does this apply to pricing for our SaaS client?

    First, we ran a Gabor-Granger study on all three subscription tiers (monthly, quarterly and annual). This allowed us to map demand for each tier against revenue generated at each price point:

    According to the Gabor-Granger study, the prices of the quarterly and annual plans were already at – or even above – their revenue-maximizing prices. Our opportunity lay with the monthly plan, which was priced significantly below the revenue-maximizing level.

    As you can see from the graphs for the quarterly and annual tiers, the Gabor-Granger study indicated that any increase in price would reduce demand. What’s more, revenue would drop in parallel – so there was no elasticity.

    But – take another look at the monthly chart. It shows that there was significant room to increase the price. At the time, the price of the monthly subscription was 25USD per month – but the study indicated that they would drive the most revenue by increasing the price to more than double at 51USD per month!

    So… what did we do? Change the price to 51USD and A/B test it?

    Not quite…

  • Balancing risk and reward

    Pricing tests are generally considered to be one of the riskier forms of A/B test. (It’s how Amazon got in trouble in 2021.)

    And the more radical the price increase, the more risky price tests are… and we’re talking about a 2x increase here.

    To minimise risk while maximising insight, we therefore chose to run this as an A/B/C test with the following variations:

    • Control – price stays at 25USD/mo
    • Variation 1 (V1) – lower risk – 33USD/mo
    • Variation 2 (V2) – medium risk – 41USD/mo

    We ran two variations against the original 25USD monthly price. In the first, we changed the price on the monthly subscription to 33USD (or 1.06USD per day) and in the second we changed it to 41USD (or 1.32USD per day).

    Note: we could have increased the price all the way up to 51USD like the Gabor-Granger study suggested, but a price increase of this size was perceived by the client as being particularly high-risk. So we capped ourselves at 41USD, with an option to test more in the future depending on the first A/B test’s results.

  • A/B testing price

    So, with all of that said, what was the result?

    As expected, monthly subscriptions fell for both variants – by 17% for V1 and by 27% for V2. In both cases, this decrease in demand was steeper than the Gabor-Granger study had predicted.

    What’s more, total revenue generated by the monthly plan had also fallen in both variants – by 8% in V1 and by 7% in V2.

    So, in other words, the Gabor-Granger had been slightly optimistic in its predictions…

    but – and this is quite a big but (I cannot lie) – here’s a twist I’ve been holding back.

  • But wait, there’s more – price anchoring and framing

    Let’s take a quick detour to explore pricing psychology.

    In previous experiments, this client’s customers had been particularly sensitive to price framing and price anchoring effects:

    • Price framing is when the price of a product or service is positioned in a certain way to make it less or more appealing to customers, e.g. maybe you frame the price of your subscription product in terms of daily vs. monthly cost (like we’ve done here)
    • Price anchoring is when a higher price is used first to give the user a baseline to compare against (think of it like a decoy).

    Based on some survey feedback, we’d previously tried framing the price in terms of monthly cost rather than daily cost and it had resulted in a 22% fall in total subs. Read: this client’s users were extremely sensitive to price framing.

     

    That means we can’t just look at the monthly pricing in isolation – we have to consider the overall impact on quarterly and annual sales as well.

    Given customers’ sensitivity to price framing and anchoring, there was a chance that raising the monthly plan price would make the discounts for a quarterly or annual plan more attractive.

    Did this bear out?

    Yes.

    In a big way.

    Demand for quarterly subscriptions increased by 33% in V1 and by 67% in V2.

    So by increasing the price of the monthly plan, we slightly reduced revenue from monthly subs but massively increased revenue from quarterly subs.

    All in all, this netted out at an increase in revenue per visitor (RPV) – our primary metric for this test – of 16% with 99% statistical significance.
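
    If you’re wondering how a dip in monthly-plan revenue can still net out to a double-digit RPV gain, the toy calculation below shows the arithmetic. Every figure in it (visitor counts, plan mix, the quarterly price) is hypothetical; only the direction of the effect mirrors what we saw in the test.

```python
# Toy illustration of how raising the monthly price can lift revenue per visitor (RPV)
# even though monthly-plan revenue dips, because more visitors pick the quarterly plan.
# Every figure below is hypothetical - these are not the client's actual numbers.

VISITORS = 100_000          # visitors per variant (hypothetical)
QUARTERLY_PRICE = 60        # USD per quarter (hypothetical, unchanged across variants)

def rpv(monthly_price, monthly_buyers, quarterly_buyers):
    """Blended revenue per visitor across the monthly and quarterly plans."""
    revenue = monthly_buyers * monthly_price + quarterly_buyers * QUARTERLY_PRICE
    return revenue / VISITORS

control = rpv(monthly_price=25, monthly_buyers=2_000, quarterly_buyers=700)
variant = rpv(monthly_price=41, monthly_buyers=1_150, quarterly_buyers=950)

print(f"Control RPV:   ${control:.3f}")   # monthly revenue 50,000; quarterly 42,000
print(f"Variation RPV: ${variant:.3f}")   # monthly revenue down ~6%; quarterly up ~36%
print(f"Relative change in RPV: {variant / control - 1:+.1%}")
```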

    As you can imagine, this result caused quite a stir within the business. On its face, we’d given the client the means of increasing revenue by 16% overnight.

    So did they pop the champagne and roll the new pricing strategy straight out?

    Again, not quite…

  • Holding horses: evaluating long term impact

    Increasing the price of the monthly subscription by more than 60% was a huge decision for the client – and not one that they, or we, were willing to leave to chance in any way.

    The initial data suggested that revenue would rise if we increased the price to 41USD/mo, but we still needed to monitor long-term metrics like churn rate and LTV. After all, if short-term revenue rose but churn rose or LTV tanked, then there would be no point in making this change.

    As a result, we’ve been working with the client to track a range of long-term metrics across different cohorts. Once that data is in, we – and they – will be in a much stronger position to understand the full range of consequences and whether this price increase is likely to be a viable option for them.

    Looking beyond this specific experiment: with something as sensitive as price, we would never assume that what works on one market/region can be straightforwardly carried across to other markets/regions.

    We’ve been working with the client – using the lethal Gabor-Granger/experiment combination – to identify those markets where the price is already optimal and those where it is ripe for optimization.

    And you know what?

    The client’s product leader has never slept better!

    tl;dr

    • The Gabor-Granger method can be a great way of ascertaining demand and revenue for your product at different price points.
    • That said, Gabor-Granger studies are limited in many respects, so the best way to prove – or disprove – their results is with an A/B test.
    • Pricing experimentation isn’t for everyone. It can be risky and requires a high level of maturity and sophistication. But when done right, it can offer some of the strongest ROI of any type of experiment.

The post Using experimentation to find a product’s optimal price (and increase RPV by 16% in the process) appeared first on Conversion.

]]>
8 ways to get marketing and product teams experimenting together https://conversion.com/blog/8-ways-to-get-marketing-and-product-teams-experimenting-together/ Mon, 17 Feb 2025 12:15:24 +0000 https://conversion.com/?p=8159 The post 8 ways to get marketing and product teams experimenting together appeared first on Conversion.

]]>
  • There are many good reasons why marketing and product departments should work together on A/B testing, from driving efficiencies to creating better experiences. However, achieving harmonious cross-team collaboration is difficult, and most businesses want more than “good reasons.”

    So here’s the TLDR: A study by Kameleoon found that companies are 81% more likely to grow when product and marketing strategies are aligned. The same study showed that less than 50% of managers felt confident their product and marketing-led growth programs were aligned.

    The picture is clear: collaborative testing efforts make good business sense, but companies struggle to make it happen. It doesn’t help that terms like collaboration, culture, and alignment are difficult to action. So, in this article, I share practical ways to get your product and marketing teams working on experimentation together.

  • Contents

  • Build a strong foundation for teams to collaborate

    No matter how far you are into your journey of working with colleagues from other departments, it’s good to take a step back and ensure you have all the groundwork in place. You’ll need strong foundations to build on. Here’s what you need to do first.

    1. Define roles and responsibilities

    There are many moving parts and different tasks involved in good experimentation. To avoid finger-pointing, identify who is responsible for specific tasks, who makes decisions, and who needs to be kept in the loop. The bigger the business, the more complex this becomes, but it should still be agreed upon with representatives from all teams present.

    Make it formal and add structure with a RACI chart. The completed RACI chart will look different for every business depending on the structure of your testing team (centralized, center of excellence (COE), or decentralized), maturity, resources, and department complexity.

    Doing the RACI exercise will help managers ensure they have enough resources to support experimentation as things ramp up.

    2. Choose a unifying testing tool

    It’s hard to work together if you’re using separate testing tools and statistical engines. Teams stay in their silos, finger point, and fail to see how their experimentation work is affecting others.

    Different teams have different needs. Marketing teams tend to build more frontend tests using WYSIWYG editors, whereas product teams typically run server-side tests that need a developer. Everyone wants AI.

    If your experimentation tool only supports one way of building tests, it will either cause unnecessary bottlenecks for developer resources or restrict teams in where and how they test. Neither is a good outcome.

    The best way to encourage collaboration is to choose a tool that supports both testing approaches, offers AI features, and provides dashboards that show how each test affects the other team’s metrics. Kameleoon’s research found that organizations that used a single platform for experimentation were 70% more likely to grow significantly than those that didn’t.

    “An enterprise experimentation platform that combines strong web and feature experimentation capabilities is critical for bringing all teams together around a unified experimentation approach.”

    Peter Ernst, Director of Digital Experiments, Providence Health & Services

    3. Standardize with templates and processes

    This feels like an easy one to skip over. However, consistent documents and templates mean people from different teams can pick up and understand any test document immediately. This consistency allows you to store all your data in a test repository, and teams can easily view previous experiments. Not only this, but consistent reporting means meta-analysis is possible, leading to great insights into your program.

    This consistency is even more important when it comes to prioritization. All teams must use the same method to judge test ideas. Without this, testing becomes a battleground, and tests will be prioritized not in the business’s interest but in the individual’s.

    4. Work with the same data and KPIs

    It’s rare to find teams using the same tech stack even when doing the same thing. This causes the dreaded data silo where each department holds its own stash of user data and separate KPIs. This leads to situations, for example, where marketers focused on acquisition drive more leads, but those leads may turn out to churn or provide low value. Great companies insist their teams understand how acquisition affects adoption and vice versa.

    Collaborating teams need access to reliable data across the user journey that is consistently calculated with visibility of each other’s metrics. Sharing segments, audiences, and KPIs is easy when all teams use the same unified A/B testing platform. And when things are easy, teams do it more often. So make sure your testing tool has these features natively available. In the Kameleoon research, we saw that teams who share data, KPIs, and segments increase test velocity by 3-4x.

  • Good companies test. Great companies work together to test flywheels

    Given that companies have overall goals, you’d expect everyone to pull in the same direction. However, approaches to reaching those goals vary by team. Not to mention, department-specific goals and priorities creep in. This leads to teams fighting over resources to get their work done. No one wins, including the business, in these situations.

    That’s why showing teams how experimentation outcomes impact other teams’ goals is imperative. For the greatest business success, teams must work together to test their flywheel, answering questions like:

    • How does our acquisition strategy affect adoption/retention?
    • What levers can we pull to improve customer lifetime value?
    • What new channels or strategies can be used once we maximize paid acquisition?

    The level of willingness to align on goals will depend on organizational structure and how experimentation is perceived internally. As a starting point, teams must have visibility of all teams’ metrics and KPIs.

    Here are some additional action points to help product and marketing teams work together.

    5. Get input from AI & different teams before running user research

    There’s a lost opportunity when user research is done without other teams’ involvement. For example, marketing-led research might not ask users how product features impact their perception of the brand as it’s not something marketing can impact. Still, the insights are valuable for both teams and help improve the business flywheel. After all, the same customer sees your marketing and then experiences the product. Both aspects impact their overall experience. Collaborative user research, with input from AI, creates better insights, a holistic business view, and can draw teams together around shared problems.

    6. Build test roadmaps together

    Your team’s priorities and goals decide what goes onto your roadmap. But there is only so much test bandwidth and resource to go around. If teams don’t prioritize and build testing roadmaps together, it can lead to chaos – not to mention wasted effort and potential test conflicts when teams aren’t aware of what the other is doing.

    Working on a short and longer-term roadmap with swimlanes is a good approach. But in the real world, things change, so regular standups and monthly meetings can help keep individuals on the same page.

    7. Invite AI and people from different departments to ideation sessions

    There are hundreds, if not thousands, of ways to solve user problems. While common UX patterns offer some advantages for users (reducing the mental load needed to understand how something works), they can leave your experience looking and feeling like everyone else’s.

    Divergent thinking can create novel solutions to problems, and involving AI alongside individuals from different backgrounds and disciplines can boost idea creation. Involving various teams in ideation sessions also means individuals are more “bought-in” to a test than if they weren’t involved at this stage.

    8. Celebrate improving the company flywheel

    A lot of work goes into cross-team collaboration. It doesn’t happen overnight and won’t always be plain sailing, but the advantages for individuals, teams, and the company make it worthwhile. So, when the wins come, celebrate them.

    If individuals are incentivized, using outcomes based on collective testing efforts can help motivate them, too. Rewards that can be enjoyed by all teams together will also strengthen work bonds and foster shared understanding.

  • Takeaways: 8 practical ways to get marketing & product experimenting together

    Here are eight practical ways to support the marketing and product team in working collaboratively and testing together.

    1. Define roles and responsibilities
    2. Choose a unifying testing tool
    3. Standardize with templates and processes
    4. Work with the same data and KPIs
    5. Get input from AI & different teams before running user research
    6. Build test roadmaps together
    7. Invite AI and people from different departments to ideation sessions
    8. Celebrate improving the company flywheel

     

    Feature image by Christina @ wocintechchat.com on Unsplash

The post 8 ways to get marketing and product teams experimenting together appeared first on Conversion.

]]>
Unlocking Insights: The Power of Painted Door Tests https://conversion.com/blog/unlocking-insights-the-power-of-painted-door-tests/ Mon, 28 Oct 2024 16:49:39 +0000 https://conversion.com/?p=6350 The post Unlocking Insights: The Power of Painted Door Tests appeared first on Conversion.

]]>
  • In the fast-paced digital world, businesses constantly seek ways to understand their customers better and innovate without unnecessary risk. The problem? Developing new features or products based on assumptions can be costly and often leads to disappointing results.

    Enter painted door tests.

    Painted door tests are a clever, cost-effective type of experiment used to gauge user interest in potential new features before fully developing them.

    If you have ever clicked on an ad or a button on a website and been met by a ‘coming soon’ or ‘this service isn’t available yet’ message, you may have been part of a painted door test. This post will examine what precisely a painted door test is, how it works, why it is valuable, and how to implement it effectively.

  • Contents

  • What is a Painted Door Test?

    The name ‘painted door’ comes from an architectural technique in which a door is painted on a wall for aesthetics rather than serving any functional purpose. A painted door test is a similar concept. It is a form of A/B testing in which a new feature, product, or service is presented to users as if it exists, often through a button or a link, without being fully functional.

    When users click on this “painted door,” they are either informed that the feature is not yet available or redirected to a survey or a different page. The purpose is simple: to measure user interest in a potential new offering. These tests allow companies to make data-driven decisions, ensuring that resources are allocated to features with real user demand.

    The great thing about painted door tests is that they are versatile. They can be used in multiple different ways, including:

    • New Features: Testing interest in new functionalities within an existing product.
    • Product Variations: Gauging demand for different versions of a product.
    • Service Offerings: Exploring user interest in additional services or support options.

    It is important to remember that the painted door test should look and act as similar to the real thing as possible. A clear call to action should sit as close as possible to where the user would naturally select the product on the site, if it were truly available, to gain the most accurate data. Later, we will take a closer look at how to implement painted door tests effectively, but first, we will examine why you should be using them.

  • Why Use Painted Door Tests?

    Understanding painted door tests is just the first step. This section explores how painted door tests enable data-driven decisions and enhance the overall user experience by prioritizing features that resonate.

    So: why should you consider using painted door tests?

    Gathering Information at Minimal Cost: When looking at your next experiment, think of the smallest possible experiment – in terms of time and resources – you could complete to test your hypothesis successfully. Painted door tests are a great example of a Minimum Viable Experiment (MVE). They are low-cost and quick to build, and when done correctly, they should give you everything you need to validate or invalidate your hypothesis. This method means you can test multiple ideas quickly and affordably. The reduced financial risk is particularly advantageous for startups and small businesses with limited budgets. We go into more detail about why MVEs are essential in an experimentation strategy here.

    A standout example of why you should consider a painted door test instead of going big comes from one of our own experiments. Early research conducted while working with a real estate company suggested that adding a map feature to the property search function would increase inquiries.

    Once we had created our hypothesis, we dedicated a lot of time and resources to developing this feature on-site. It was an extensive project, as this feature was more complicated than we initially thought. The results: the map had no impact on user behavior.

    What should we have done? We should have tested the hypothesis with a simple, low-cost painted door test. By replicating the feature with a button and call-to-action (CTA), we would have been able to see whether anyone would use the Google Maps functionality.

    When considering the execution options for our painted door test, we could have explored a couple of strategic approaches. One option could have been to incorporate a “View Map” CTA that, upon clicking, would display a message informing users that the feature isn’t currently available. This approach could help gauge interest without full functionality in place. Alternatively, we could have implemented the same “View Map” CTA but linked it directly to Google Maps. This would provide users with immediate map functionality, albeit external to our site, offering a seamless experience while still allowing us to measure engagement.

    If the painted door test showed that customers were interested, we could have used this data and implemented the feature. However, if the test showed that customers were not interested in this feature, we could have looked at alternative methods to increase inquiries.

    Build Better Products with Less Risk: Use actual user data to guide product development rather than assumptions or hunches. This approach leads to more informed and strategic decision-making. Companies can prioritize features that demonstrate user demand, fostering a deeper connection with their customers and enhancing their satisfaction and engagement. Painted door tests help ensure that only the most promising ideas move forward. By validating concepts early, companies can focus on high-potential projects and avoid costly missteps.

    Below is an excellent example of how conducting a painted door test has provided concrete data for the product team and saved nearly a year of product development time. This example highlights the importance of gathering real-time user feedback on new products.

    Domino’s, a well-known pizza restaurant chain, sought to introduce a new “premium” cookie but faced indecision among four potential options. Traditionally, product development at Domino’s spans 12 months, involving extensive market research with uncertain real-world demand, risking significant time and financial investments.

    We proposed replacing the lengthy R&D cycle with a swift 1-week painted door test. Collaborating with Domino’s R&D team, we added four “fake” cookies to the menu and measured customer interest by tracking “Add to basket” clicks. We offered these cookies at two different price points across various stores to test pricing, creating eight test variations in all.

    Before the test, the Domino’s product team believed that Chocolate Orange would be the clear winner. However, the experiment revealed that customers were far more interested in the Salted Caramel flavor, which had a 32% higher conversion rate than the initially favored flavor.

    During the test, we also found that customers were willing to pay a higher price, providing valuable insights into price elasticity. This approach delivered concrete, actionable data from real-world purchasing behavior, bypassing the limitations of traditional market research.

    Our method proved to be a low-risk, cost-effective strategy that significantly accelerated Domino’s product development process while yielding reliable insights and high rewards.

    Enhanced User Experience: Businesses can create more compelling and user-friendly products by focusing on users’ desired features. This not only improves customer satisfaction but also fosters loyalty and long-term engagement. Look at Gousto, a food delivery company that wanted to test whether adding more payment options at checkout would increase completed orders. Rather than implementing these options without data, we helped Gousto conduct a painted door test.

    Gousto Experiment

    Gousto utilized painted door experiments to assess the viability of integrating PayPal and other payment methods into their platform. Instead of fully implementing these options right out of the gate, we strategically tested user interest and behavior.

    The core of the experiment involved adding PayPal and GPay call-to-action buttons at checkout, even though these payment options were not functional. Users who clicked these buttons were given a message explaining that the respective payment method was unavailable. This setup allowed Gousto to measure the click-through rates and gauge user interest in these alternative payment options.

    While the painted door experiments initially resulted in a marginal decline in signups, the valuable learnings and insights gained far outweighed this short-term effect. These tests allowed Gousto to identify segments of their user base eager to use PayPal or GPay, potentially increasing conversion rates once these options were fully implemented. More importantly, the experiments provided crucial data for forecasting the impact on revenue once additional payment methods were rolled out.

  • Potential Negatives of Painted Door Tests

    While painted door tests provide valuable insights, there are potential drawbacks. As with any test, evaluating the positives and negatives is important to mitigate any unnecessary risks. Below are two main points to consider when considering painted door tests for your next experiment.


    1. Misleading Metrics: Since painted door tests measure initial interest, they may not always accurately predict actual user behavior once a feature is fully developed and implemented. Users might click on a novel feature out of curiosity, but their engagement could differ when the feature is fully functional.

    a. It is important to note that painted door tests should be part of a broader strategy. If you have a feature that was successful through a painted door test, it is still best practice to test the fully developed feature before going live. This enables you to gain more data and further validate the customer’s interest in the feature.

    2. User Frustration: Users may feel disappointed or frustrated if they click on a painted door expecting a new feature, only to find out it’s not yet available or functional. This can impact user trust and satisfaction, especially if the testing process is not transparent or well-explained, affecting the conversion rate.

    a. This frustration can and should be mitigated by making the painted door test as transparent as possible. There should always be clear messaging supporting these tests and you should be open with customers about what is happening.
    b. The good news is that, as with any A/B test, painted door tests can be paused if they significantly affect the conversion rate. We can also serve the test to a small percentage of the traffic to reduce the risk.


     

  • How to Implement a Painted Door Test

    Hopefully, the information above helps you decide whether painted door tests would be a good fit for your experimentation program. Once you know this, it’s time to implement. This requires a structured approach to ensure meaningful results.

    This section provides a guide to executing these tests effectively. Each step is crucial in gathering actionable insights, from defining a clear hypothesis to creating a compelling design and effective user engagement tracking.

    Our hypothesis framework

    Identify the Hypothesis: Determine what you want to test and what the intended result is. For instance, “Will users be interested in a one-click checkout feature?” Having a well-defined hypothesis helps set clear goals and metrics for success. You can find a detailed post here if you are interested in how we build our hypotheses.

    It’s important to note that a key difference between a painted door test and a traditional A/B test lies in how success is measured. In a traditional A/B test, a statistically significant uplift in your primary metric signifies a winner. A painted door test operates differently: the control group sees zero clicks on the non-existent feature, so the variant will always show a significant uplift in clicks.

    This doesn’t automatically deem it a success. Instead, success in a painted door test requires upfront criteria, such as a minimum percentage of users engaging with the feature. For instance, would at least 20% of users need to click on the feature to justify further investment? Defining these benchmarks beforehand is crucial to making informed decisions about rolling out new products.
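
    To make “defining these benchmarks beforehand” concrete, here’s a minimal sketch of one reasonable approach: checking the observed engagement rate against a pre-registered minimum threshold with a one-sided binomial test. The 20% threshold, traffic numbers, and significance level are all assumptions for illustration, not a prescribed standard.

```python
from scipy.stats import binomtest

# Pre-registered success criterion for a hypothetical painted door test:
# at least 20% of exposed users must click the fake feature's CTA.
MIN_ENGAGEMENT = 0.20

exposed_users = 5_000   # users who saw the painted door (hypothetical)
clicks = 1_150          # users who clicked it (hypothetical)

observed_rate = clicks / exposed_users

# One-sided test: is the true click rate credibly above the pre-registered threshold?
result = binomtest(clicks, exposed_users, p=MIN_ENGAGEMENT, alternative="greater")

print(f"Observed engagement: {observed_rate:.1%}")
print(f"p-value vs {MIN_ENGAGEMENT:.0%} threshold: {result.pvalue:.4f}")

if result.pvalue < 0.05:
    print("Meets the pre-registered bar -> worth investing in the real feature.")
else:
    print("Does not clearly meet the bar -> iterate or park the idea.")
```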

    Create the Painted Door: Design a visual element that clearly suggests the new feature. This could be a button, banner, or link. Ensure it stands out and is placed in a relevant area of your website or app. The design should be enticing enough to attract clicks while fitting seamlessly within the user interface.

    Track Engagement: Use analytics tools to track clicks on the painted door and measure the number of users showing interest. Tools like Google Analytics, Hotjar, or custom tracking scripts can provide the necessary data. Proper tracking is essential to capture all relevant interactions and metrics.

    Don’t just look at click-through rates; analyze user behavior before and after the click, demographic data, and qualitative feedback. This holistic approach provides a deeper understanding of user intent and potential barriers.

    Provide Feedback: Once a user clicks on your painted door, they must be informed via a popup or redirected to a page explaining that the feature is not yet available but is under consideration. Optionally, you can collect feedback or email addresses for future updates. This step is crucial for maintaining user trust and gathering valuable insights.

    Analyze Results: Assess the data to understand the level of interest. High engagement indicates strong user interest, while low engagement may suggest the feature is not worth pursuing. It could also indicate that the idea should be iterated on in a different area/way on-site. Look at click-through rates, user feedback, and any patterns or trends in the data to gain a comprehensive understanding. Use this data to decide whether to develop the feature further.

    Iterate and Test: The speed and adaptability of painted door tests lead to iteration. Once you have analyzed your data, you can iterate and test the fully developed feature before going live. If the data isn’t significant enough, you can run another test with a different design or in a different area on-site.

    Top tip: make sure not to run too many painted door tests simultaneously.

  • A new tool in your CRO toolkit

    Painted door tests are a strategic tool in the Conversion Rate Optimisation (CRO) toolkit. They offer a low-cost, high-value method of understanding user preferences. Businesses can use these to make informed decisions, reduce risk, and maximize the potential for successful product launches.

    However, it’s important to note that painted door tests are just one part of a broader CRO strategy. For optimal results, consider integrating painted door tests with other methods, such as more A/B testing, user surveys, and usability testing. A multi-faceted approach ensures a comprehensive understanding of user behavior and preferences.

    Here are two more valuable tips to keep in mind when running your painted door tests:

    • Actionable Insights: Refine your approach by using the data collected from the painted door test. Whether you tweak the feature based on user feedback or pivot to a new idea, let the insights guide your decisions.
    • Follow-up: Use the opportunity to gather user feedback or interest via surveys or sign-ups. This additional data can provide deeper insights into user preferences and expectations. Consider offering an incentive for providing feedback, such as a discount or early access to the feature.

    Ready to explore more CRO techniques? Check out our latest case studies showcasing how we’ve helped businesses like yours achieve remarkable growth.

The post Unlocking Insights: The Power of Painted Door Tests appeared first on Conversion.

]]>
The Conversion Experiment Repository: generating a competitive edge for our clients https://conversion.com/blog/experiment-repository/ Thu, 19 Sep 2024 16:25:37 +0000 https://conversion.com/?p=7660 The post The Conversion Experiment Repository: generating a competitive edge for our clients appeared first on Conversion.

]]>
  • Here at Conversion, we’re always looking for ways to create an unfair competitive advantage for our clients. Nothing allows us to do this more effectively than our Experiment Repository.

    As far as we know, our Experiment Repository is the largest, most robustly tagged collection of experiment data in the world. By putting this one-of-a-kind resource at our clients’ disposal, we’re able to give each of them a sizable, one-of-a-kind edge over their competition.

    In this post, we’re going to start out by introducing our Experiment Repository, before then walking through some of the most impactful ways we’re using the repository to generate previously untapped value for our clients.

    Insofar as business experimentation is concerned, experiment repositories are still a relatively underexplored resource. Our hope is that by sharing some of the techniques we’ve developed to get the most out of our own repository, we’ll inspire and empower others to do the same with theirs.

    So, without further ado…

    Note: for those teams looking to build an experiment repository from scratch, we have another piece of content that will serve your interests better than this one (click here). The piece you’re reading now can be thought of as part two in the series: how to use your repository – once it’s already built – to drive results.

  • Contents

  • Our Experiment Repository

    Since opening our doors 15+ years ago, we’ve stored and tagged every experiment we’ve ever run.

    As the world’s largest experimentation agency, this means we now have a database of more than 20,000 experiments, which have been run across countless websites, industries, verticals, company sizes, maturity levels, and more.

    What’s more, throughout this period we’ve dedicated huge amounts of energy to developing some of the most advanced taxonomies our industry has to offer. These taxonomies allow us to slice up our data in novel ways to unearth patterns that would otherwise have remained buried beneath the noise.

    As a result of all this effort, we now have what we believe to be the largest (in terms of size and breadth), most operational experiment repository on the planet.

    High-level view of our experiment repository

    This puts us in a unique position.

    We have access to data that no other experimentation team has access to. Working out what to do with all of this – how to turn it to our clients’ advantage – has been a unique and ongoing challenge.

    After many years of trial, error, and iteration, we’ve now arrived at some extremely well-validated techniques for exploiting this resource in our clients’ favor.

    In fact, our Experiment Repository has become the single greatest source of value that we’re able to offer our clients – and as we trial increasingly innovative techniques and technologies, the value of the repository is only growing with each passing year.

    Throughout the remainder of this piece, we’re going to share some of the most effective techniques we’ve come up with – so far – with you.

  • 7 ways that we’re using our repository to generate a competitive edge

    1. Database research

    Most experimentation teams are limited to the same set of research methodologies – things like analytics, surveys, user research, etc.

    While we ourselves still use these kinds of methodologies extensively, we also have access to a completely novel research methodology of our own:

    Database research.

    Database research involves querying our experiment repository to unearth macro-trends that we use to develop better hypotheses and produce more successful experiments.

    In conjunction with data from other methodologies, the value of this information can’t really be overstated. For example,

    • Analytics may tell you where users are dropping out of your funnel.
    • User testing may tell you why they are dropping out of your funnel.
    • Database research tells you how past clients solved this problem – and therefore how you might be able to do so too.

    It allows us to enrich all of our research data with powerful contextual information, and it ultimately means that we’re able to drive results more quickly and more consistently than we would otherwise have been able to.

    To give an example:

    Imagine that we’ve just started working with a new client. Without the database, we would be forced to proceed from a standing start, blindly testing the waters with early experiments and then trying to refine and iterate as we go.

    But with our database, we’re able to take insights from past experiment programs and apply them directly to our new clients’ programs. For example:

    • Which kinds of levers tend to be most – and least – effective within this industry, vertical, and company size?
    • Which kinds of psychological principles tend to be most – and least – powerful for users of this kind of website?
    • Whereabouts on these kinds of websites – e.g. on product pages or the basket – do tests tend to be most impactful?
    • What design patterns tend to perform best in different situations?
    • Etc.

    By using the database to answer these kinds of questions, we can get a head start and begin driving results from day one of the program.
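
    To give a flavour of what this kind of database research looks like in practice, here’s a minimal pandas sketch that filters a repository by industry and summarises historical performance by lever. The column names and rows are invented for illustration; they don’t reflect our actual schema or data.

```python
import pandas as pd

# Hypothetical slice of an experiment repository. Column names and values are
# invented for illustration only; they do not reflect the real schema or data.
repo = pd.DataFrame({
    "industry":  ["finance", "finance", "finance", "retail", "finance", "retail"],
    "lever":     ["trust", "urgency", "trust", "trust", "social proof", "urgency"],
    "page_area": ["product", "basket", "homepage", "product", "signup", "basket"],
    "won":       [True, False, True, True, False, True],
    "uplift":    [0.031, -0.004, 0.052, 0.018, -0.021, 0.027],
})

# "Database research": restrict to the new client's industry, then summarise
# how each lever has historically performed there.
finance = repo[repo["industry"] == "finance"]
summary = (
    finance.groupby("lever")
           .agg(tests=("won", "size"),
                win_rate=("won", "mean"),
                avg_uplift=("uplift", "mean"))
           .sort_values("win_rate", ascending=False)
)
print(summary)
```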

    Moreover, the value of the database isn’t just limited to the start of a program. We’re also able to use insights from our database to solve the problems of mature programs too.

    Consider this example:

    Through survey responses, we’d discovered that one of our financial services clients had a trust problem: their website visitors did not know who they (our client) were and therefore did not find them to be a credible brand.

    Adding social proof in the form of reviews, ratings, and testimonials is often the first port of call for addressing issues of this kind, but before going with this tack, we decided to run some database research.

    We queried our repository to discover how past clients in the same industry had solved problems tied to the Trust lever. Interestingly, we discovered that Social Proof was actually an extremely ineffective lever in this specific niche. Financial services clients do not generally want to hear about how other clients have benefited from the service; instead, they want to feel that the service is highly exclusive and that it can offer them a unique edge.

    As one example of many, social proof had a profoundly negative effect when added to Motley Fool’s email capture modal.

    The Authority lever, on the other hand, which involves appealing to credible institutions and authority figures in an industry, tends to be much more effective in this particular niche – so we deployed this lever and were able to achieve a string of strong winners for the client.

    In this instance, our repository allowed us to avoid a potential blind alley of testing and meant we were able to apply a highly effective solution to the problem on our first attempt.

    2. Sharpen experiment executions

    A good experiment concept is made up of two parts:

    • The hypothesis – a prediction you want to validate
    • The execution – how you intend to validate it

    It’s possible to have an extremely strong, data-backed hypothesis but a poor execution. If this is the case, you may find that your test loses even though your hypothesis was actually correct.

    This is a real problem.

    Declaring a test a loser when its hypothesis was correct can result in vast sums of money being left on the table.

    Thankfully, this is a danger that our experiment repository is helping us mitigate.

    When developing the executions for our hypotheses, we subject our concepts to several ‘actions of rigor’ to ensure that they’re as strong and as thought-out as they possibly can be.

    Database research offers one such action of rigor. In this context, database research involves querying our repository in various different ways to understand how we might make our execution more effective.

    Consider this example:

    One of our clients had a single-page free-trial funnel.

    Various different research methodologies had suggested that a multi-step funnel would be more effective, since it would elicit completion bias and thereby motivate the user to progress through the funnel.

    As part of this test, we knew we were going to need to design a new progress bar. To make this new progress bar as effective as possible, we filtered our repository by ‘component’ and ‘industry’ so that we could find all of our past tests that involved progress bars from the same or adjacent verticals.

    When we did this, a clear pattern emerged: low-detail progress bars have a much higher win-rate than high-detail progress bars.

    Our meta-components study showed that, in the aggregate, low-detail progression bars perform better than high-detail progression bars

    We were able to use this insight to build a new, low-detail progress bar for the test – and the experiment ultimately resulted in a 7.5% increase in signups.

    Newly designed funnel with simplified progression bar delivers a CR uplift of 7.5%

    This is just one example of the many ways we’ve been able to use our repository to sharpen up our executions and generate as much impact for our clients as possible.

    3. Machine-learning assisted prioritization

    Every experimentation team has more test ideas than they can ever conceivably run. This creates the need for prioritizing some ideas – and deprioritizing others.

    Over the years, many serviceable prioritization tools have emerged, but they all fall short in at least one of two ways:

    1. Subjectivity – they rely too heavily on gut-feel and too little on cold hard data.
    2. One size fits all – they judge every test by the same set of criteria when criteria should be dynamic, based on the unique context of each test.

    To solve this problem, we decided to train an advanced machine learning model on all of the data in our database.

    Though only the first iteration, this model – dubbed Confidence AI – is now able to predict whether an A/B test will win with ~63% accuracy. Based on standard industry win rates, this makes the model several times better at predicting A/B test results than the average practitioner.

    Confidence AI computes a confidence score for each experiment concept

    By embedding this tool into each of our client’s Experimentation Operating Systems, we’re able to use Confidence AI to dynamically prioritize our test ideas when new results and insights come in.

    Ultimately, this means we can zero in on winning test ideas far more quickly, while deprioritizing avenues of testing that appear – based on the model – to be potentially less fruitful.
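
    Confidence AI itself is proprietary, but purely to illustrate the general shape of the idea, here’s a minimal sketch of training a simple classifier on tagged experiment metadata to estimate a win probability for a new concept. The features, data, and choice of model are all assumptions for illustration – this is not how Confidence AI is actually built.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy training data standing in for a tagged experiment repository.
# Every feature, value, and label here is invented purely for illustration.
history = pd.DataFrame({
    "lever":     ["trust", "urgency", "clarity", "trust", "social proof", "clarity", "urgency", "trust"],
    "page_area": ["product", "basket", "checkout", "homepage", "signup", "product", "checkout", "basket"],
    "industry":  ["finance", "retail", "retail", "finance", "finance", "travel", "retail", "travel"],
    "won":       [1, 0, 1, 1, 0, 1, 0, 1],
})
X, y = history.drop(columns="won"), history["won"]

# One-hot encode the categorical tags, then fit a simple classifier on win/lose labels.
model = Pipeline([
    ("encode", ColumnTransformer([("cat", OneHotEncoder(handle_unknown="ignore"), list(X.columns))])),
    ("classify", LogisticRegression(max_iter=1000)),
])
model.fit(X, y)

# Score a new test idea: the predicted win probability becomes one input into
# prioritisation, alongside human judgement and the rest of the research data.
idea = pd.DataFrame([{"lever": "trust", "page_area": "product", "industry": "finance"}])
print(f"Predicted win probability: {model.predict_proba(idea)[0, 1]:.2f}")
```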

    There are various nuances and niceties to the way we use Confidence AI in our work. If you’d like to learn more, click here.

    4. Optimizing our own methodology

    Our proprietary methodology here at Conversion is one of the primary reasons that leading brands like Microsoft, Whirlpool, and Adblock have chosen to work with us.

    In fact, when we apply for awards or enter into competitive pitches, our methodology invariably achieves the highest score possible.

    One of the reasons for this is our repository:

    Our repository allows us to gain a bird’s-eye view on what’s working with our methodology and what’s not. Over time, we’ve been able to use this data to question our assumptions, invalidate company lore, and refine how we approach experimentation.

    To give one (of many) examples:

    Many people in our industry hold the assumption that ‘the bigger the build, the bigger the uplift.’ On its face, this makes sense: experiments with bigger builds are generally assumed to involve more ambitious ideas; ambitious ideas have the potential to move the needle in a big way.

    But here at Conversion, we believe in using data to put hypotheses to the test – so that’s what we did.

    We used our database to compare the relationship between build size (in terms of dev time) and win-rate/uplift.

    To our surprise, we found that tweaks had the same average win-rate and uplift as large tests.

    Contrary to industry lore, data from our repository shows that experiments with longer build times actually perform worse than those with shorter build times
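
    As a rough sketch of the kind of analysis behind this finding, the snippet below buckets experiments by development time and compares win rate and average uplift across the buckets. The numbers are invented for illustration; the real analysis runs over the full repository.

```python
import pandas as pd

# Invented experiment-level data: development time (hours) and outcome.
# This only sketches the shape of the analysis; the real repository is far larger.
experiments = pd.DataFrame({
    "dev_hours": [4, 6, 5, 30, 42, 8, 55, 3, 28, 60, 7, 35],
    "won":       [1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0],
    "uplift":    [0.021, 0.0, 0.034, 0.027, -0.01, 0.018, 0.0, 0.04, -0.005, 0.03, 0.0, 0.002],
})

# Bucket experiments into 'tweak' vs 'large build' and compare win rate and average uplift.
experiments["size"] = pd.cut(experiments["dev_hours"], bins=[0, 10, 1_000],
                             labels=["tweak", "large build"])
print(
    experiments.groupby("size", observed=True)
               .agg(tests=("won", "size"),
                    win_rate=("won", "mean"),
                    avg_uplift=("uplift", "mean"))
)
```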

    This insight has tremendous practical significance. After all, if small tests are just as likely to win as large ones, why waste resources building larger tests?

    Since running this analysis, we now seek to validate our hypotheses using the smallest experiment possible – or what we call a Minimum Viable Experiment (MVE). This approach allows us to gather data at speed, and ensures that when we do decide to invest in a big, resource expensive experiment, its odds of winning are significantly higher.

    Our MVE-centered approach

    By running analyses of this kind and using them to challenge our assumptions, we’ve been able to develop a highly novel approach to experimentation that delivers outstanding results time after time.

    5. Unearth macro-trends within a program

    The points we’ve been talking about so far have mainly been focussed on the way we use our agency-wide repository to generate value for our clients – but we also create individual repositories for each of our clients too.

    The final three points in this post relate primarily to these individual client repositories.

    All of the data from within our main repository is filtered off into individual client repositories that contain only the tests and research insights that we’ve unearthed for each specific client.

    One of the main advantages of these client repositories is that they allow us to cut up a client’s data and unearth patterns and trends that have accumulated throughout the course of our work together.

    This kind of analysis can be incredibly powerful.

    With one client, for example, we’d been running hundreds of tests per year for a couple of years. This meant they had several hundred tests in their repository.

    When we ran our analysis, we found that some sub-levers performed extremely well, some showed promise – and some performed poorly.

    We sorted each Lever into an Exploit, Explore, and Abandon category

    We split these sub-levers out into three groups:

    • Exploit – this has been extremely effective in the past; let’s do more of it
    • Explore – this looks promising but we don’t have enough data to be sure; let’s test the waters and see what comes up
    • Abandon – we’ve run lots of tests on this but it doesn’t seem to be effective on this website
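
    As a sketch of how a categorisation like this could be automated, the snippet below applies a simple rule based on win rate and sample size. The thresholds are arbitrary assumptions for illustration, not the criteria we actually use.

```python
import pandas as pd

# Hypothetical per-sub-lever summary from a single client's repository.
levers = pd.DataFrame({
    "sub_lever": ["scarcity", "social proof", "clarity", "authority", "urgency"],
    "tests":     [14, 9, 3, 12, 2],
    "win_rate":  [0.57, 0.22, 0.67, 0.17, 0.50],
})

def categorise(row, min_tests=5, good_win_rate=0.40):
    """Arbitrary illustrative rule: enough data + strong win rate -> Exploit;
    too little data -> Explore; enough data but weak win rate -> Abandon."""
    if row["tests"] < min_tests:
        return "Explore"
    return "Exploit" if row["win_rate"] >= good_win_rate else "Abandon"

levers["category"] = levers.apply(categorise, axis=1)
print(levers)
```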

    What’s more, our analysis also revealed that many of our client’s wins and biggest uplifts tended to cluster on specific areas of the website.

    Here’s an example of what this kind of analysis might look like.

    With this analysis in hand, we were then able to build out our client’s experiment roadmap, focussing the majority of our tests on the levers and areas of the website that our analysis revealed to be particularly promising.

    Ultimately, the following quarter’s results turned out to be the most impressive results this client had ever seen. In fact, in large part thanks to this analysis, we were able to hit our annual revenue target – which was itself the biggest target we had ever been set – by the end of March, i.e. with 9 months to spare.

    To read the full story, click here.

    6. Wider impact

    Experimentation isn’t just about driving winning tests or unearthing insights – it’s about using experiment results to inform key decisions. What use is a game-changing insight unearthed through experimentation if nobody in the organization hears about it?

    Unfortunately, many experimentation teams struggle with this final piece of this equation. Generating winners and unearthing insights is one thing; finding ways to make those results percolate through an organization, so that they can inform important decisions, is another.

    By providing a centralized database of insights that anyone in our client’s organization can access and filter according to their needs, our client repositories do a tremendous job of increasing the ‘impact radius’ of our programs. They grant all teams – from design, product, engineering, and even the c-suite – ready access to our findings, which means these findings have a greater chance of informing decisions across the entire organization.

    To give a recent example:

    One of our clients had brought in an external design agency to redesign large portions of its website. By providing this design agency with access to our experiment repository, they were able to use past experiment findings and program-wide patterns to steer away from unsuccessful design patterns – and towards successful ones.

    From a conversion standpoint, this ultimately meant that the newly designed pages were much more effective than they probably would have been had the design agency operated without access to our insights.

    7. Gain buy-in for your program

    Tying in with the point above, one of the key advantages of having a centralised experiment repository is that it becomes a lot easier to monitor program-wide metrics, including win-rate, revenue, and ROI.

    This is obviously extremely important.

    Money speaks.

    If you can easily point to the bottom line impact of your program, it becomes much easier to evidence the program’s value, fight for budget, and grow your program.

    Our experiment repositories provide our clients with an easily accessible dashboard that updates as soon as new experiment results come in, providing an up-to-the-minute account of the program’s progress and success.

    This ROI tracking functionality has proven to be extremely important for many of our clients, allowing them to easily demonstrate ROI and ultimately generate leadership buy-in and enthusiasm for the program.

The post The Conversion Experiment Repository: generating a competitive edge for our clients appeared first on Conversion.

]]>
Spotlight : Conversion x Best Egg https://conversion.com/blog/spotlight-best-egg/ Tue, 03 Sep 2024 12:09:51 +0000 https://conversion.com/?p=7567 The post Spotlight : Conversion x Best Egg appeared first on Conversion.

]]>
  • Contents


  • Who is Best Egg?

    In this spotlight, we explore the partnership between Conversion and Best Egg, focusing on the experimentation program. We’ll cover how we built the program, the challenges we’ve overcome, and the successes we’ve achieved. Plus, we’ll look at what’s next as we plan for the future. Whether you’re into experimentation or want to improve your own program, this spotlight has insights you won’t want to miss.

    Joining us is Meghan Metzner, a seasoned Product Management Director at Best Egg, a financial technology company dedicated to helping people feel more confident about their finances. Best Egg offers a range of solutions designed to provide straightforward, reliable financial support. With over $21 billion in personal loans funded, Best Egg has earned a reputation for empowering customers to tackle their financial challenges with ease. Meghan’s leadership and the company’s innovative approach have been instrumental in shaping their successful experimentation program.

  • How Experimentation Inspired My Career

    [0.00 – 1.20]

    James (Conversion): Thanks for joining me this morning, Meghan. It’s great to see you. I’m excited to chat about everything experimentation and your experience at Best Egg. That may be the perfect place for us to start. Tell me a little bit about yourself and your current role.

    Meghan (Best Egg): Thanks for having me. I’m excited to be here and speak with you. My career started in public relations and traditional marketing, and I did that for about 2 to 3 years. As I spent time in that role, I became more interested in the digital space. It seemed like that’s where the world was going and what I wanted to explore.

    I started to shift my focus more into digital channels such as email marketing, paid search, and some paid social advertising, and that’s when I started to stumble into the website optimization space. That’s when I felt like I found my home, and that ultimately led me into product management. 

    That’s my role today at Best Egg. I’m a product manager, and I love understanding and learning about customer problems and how we can help solve them to ultimately support the business. So, it’s been all about building new features and optimizing. These experiences have brought me so much fulfillment so far in my career. 

    James (Conversion): Awesome. It keeps things interesting. Never a dull day.

    Meghan (Best Egg): Sure does.

  • How Experimentation Supports Best Egg’s Digital Strategy

    [1.21 – 2.56]

    James (Conversion): Tell me about Best Egg and the digital experience there. What are you working on and trying to optimize in that space?

    Meghan (Best Egg): Best Egg is a financial technology company – you might hear the term fintech. Everything we do ladders up to one goal: we’re always trying to provide flexible solutions that help people with limited savings feel confident navigating their everyday financial lives.

    We’re supporting customers through a growing suite of products. We have personal loan products, a flexible rent product, and all of these different financial health tools. It’s all in an effort to make people feel more money confident.

    In terms of our overarching goals and the things that keep us up at night: creating experiences that really differentiate us from others by offering things like payment flexibility, products that we feel truly guide people’s progress, and experiences that make it easy for the customer.

    We always want to make it easy, and we want to guide them. Right now, when we think about the business, we want to increase our personal loan value. We’re hyper-focused on scaling these new products and then building a cross-sell strategy for our customers now that we have more products to offer them.

    James (Conversion): There are a lot of different places to play there as well – all different areas to optimize and experiences to improve.

    Meghan (Best Egg): It’s opened up our testing program with you all, which has been fun.

  • How does the Experimentation program fit into the Digital Strategy

    [2.56 – 4.09]

    James (Conversion): Let’s talk a little bit about the experimentation program and how it fits into this digital strategy that you guys are working towards.

    Meghan (Best Egg): I would say a majority of our work with your team is focused on what we call the unauthenticated website experience. That’s what I’m the product manager of. That essentially means that, with [Conversion], we’re creating testing strategies and optimizations for folks who visit our website before they’ve ever even logged into an account.

    We’re also starting to dip our toe into the water on how to optimize experiences for actual logins. Things like creating an online account, username and password reset. If we’re talking about KPIs for tests, the thing we’re hyper-focused on with [Conversion] is application starts and login rates.

    James (Conversion): It’s kind of like the new customer and the existing customer.

    Meghan (Best Egg): Exactly.

    James (Conversion): There are two very different cohorts to work on as well.

    Meghan (Best Egg): Two very different ways to attack those cohorts when we’ve been working with you guys and how we think about them.

  • How Conversion Helped De-Risk Best Egg’s Rebrand

    [4.10 – 7.15]

    James (Conversion): [Best Egg] have been working with us for a number of years now. We’ve had a lot of fantastic work together. What have we accomplished, and what have we been focused on over the last couple of years in the experimentation program?

    Meghan (Best Egg): The first thing that comes to mind is de-risking our rebrand. About two years ago, we went through a total company rebrand – more than just the website. As you would expect, it touched a lot of different things, and [Conversion] were so helpful in instrumenting this idea of ‘do no harm’ testing.

    It gave us super valuable insight into what to expect once we pushed this major shift in look and feel [of the brand] live on Best Egg. Thankfully we didn’t have to, but gathering these insights gave us the opportunity – if major harm was going to be done – to make some adjustments and optimizations before we pushed the whole thing live, which was huge.

    Now that that’s behind us, we’ve gotten a chance to dig into more of that optimization and performative testing. I would say we’ve moved into this world of thinking about the two things I mentioned earlier, all about increasing loan value and then scaling our new products.

    Our bread and butter product is our personal loan product. When we began our relationship with [Conversion], all the testing strategy and optimization was focused solely on – and still is in huge part driven by – those personal loan application leads. That’s what we’ve thought about day in and day out.

    Now it’s been an interesting shift: even when those personal loan application lead starts at the top of the funnel aren’t a test’s primary KPI, we’re still always looking at ensuring that those tests aren’t harming that metric. Just to illustrate: it’s still a very important metric, and that is our bread and butter.

    It’s been an interesting shift. But along the lines of those personal loan application starts: in our work with [Conversion], we’ve had many tests that improved that metric by up to 4-5%. What’s so interesting is that things like simply changing the copy of a CTA will give you that lift, which is cool to see.

    In terms of scaling these products, we’re making a lot of plans to test and optimize these experiences for things like vehicle equity loans, which are still a relatively new product. I think for us, in terms of our work together with Conversion, our biggest area of opportunity is with any new product we launch just focusing on things like content comprehension.

    Some of these products can be complex for the average user [to understand] – what they are and how they will help them. Things like copy testing and content hierarchy testing are great places to start optimizing those experiences for users.

    James (Conversion): I remember there have been some pretty surprising small lifts, but big impact tests in the history there.

    Meghan (Best Egg): Absolutely.

    James (Conversion): Which is cool to see. The flip side of that, of course, is the big lift and the big change things. Speaking about the redesign, something that I remember quite clearly and admire is the culture at Best Egg. The number of people that are coming to our calls, collaborating, the receptiveness to trying new things and following the data.

  • Building An Experimentation Culture At Best Egg

    [7.16 – 9.31]

    James (Conversion): Can you tell me a little bit more about what that culture is like [at Best Egg] and what it takes to establish that culture?

    Meghan (Best Egg): We’re a team that embraces that test-and-learn culture, and working with [Conversion] just reinforces that within our culture internally, which is fantastic. Our laser focus and obsession with serving the customer, and following that data, has also really contributed to the culture that we’ve created here.

    The biggest thing in terms of our work together is, when we have a losing test, we don’t view it as a failure. We view it as a meaningful insight. That just helps guide further optimization and iteration into our product and the testing roadmap on BestEgg.com.

    It’s interesting to watch as we craft our roadmap with you all from our testing agenda; we have that framework in place. Once you’ve read out the results of a test, maybe it wasn’t considered a winner, but you glean a very interesting insight from it. You just continue iterating and shifting around the roadmap as you identify these opportunities from these “losing tests”.

    That has been embraced throughout the whole company. Not just from the product organization or our work together. You hear it from every single department, which is fantastic.

    James (Conversion): That’s awesome. I’ve got to admit, some of my favorite experiments through my career have come from what were originally losing tests, because the insights are usually pretty interesting.

    It’s also pretty admirable that you guys have managed to build that culture throughout the organization. It’s one thing for the optimization team to love and admire losing tests. It’s another for the design teams, the product teams, and the executive teams.

  • Using Mixed Methods To Solve Best Egg’s Friction Points

    [9.32 – 13.55]

    James (Conversion): We’ve done some cool things recently on combining methodologies, what we like to call mixed methodologies [at Conversion]: that blend of the quantitative work of A/B testing and behavioral data that we use, and then the qualitative arm of attitudinal data, digging a little bit deeper into the why behind the behaviors we’re observing, through user interviews and those sorts of things.

    We’ve done some cool work, and I think your teams have done some great work integrating those two methods across the organization and extracting more insight from them. Can you tell me a little about what that’s been like over the last couple of years?

    Meghan (Best Egg): In our work with Conversion, that’s been one of the biggest unlocks that you all have unveiled to us. You have shown us we’re pretty good at gathering that quantitative data, but there was a huge opportunity for us to start leveraging our own internal insights team to gather more of that qualitative data.

    That was one of the most valuable pieces of insight from an annual business review that we conducted together, really rethinking our testing lifecycle. We started digging into this, bringing insights in more upfront in the process. We were able to take our understanding of the problem and work with design to come up with multiple different variations that we hypothesized would solve that problem.

    What was happening previously is, we were looking at data, but we were looking at web analytics data and seeing the friction points. Then our teams were kind of just deducing from there: well, maybe this is what’s happening, or maybe this is why it’s a friction point.

    Now we have that unlock saying, hey, here’s the data, here’s the friction points, let’s go out and ask consumers and customers why this is a friction point. We can craft design variations now that we feel like we have a better shot at addressing the problem or the opportunity.

    Historically, it would be, hey, here’s one, maybe two design variations that we passed to [Conversion] that we were still just deducing would solve the problem. We’ve shifted the way we think about that. Now, once we have an understanding of the problem, we work closely with our design team and all other areas of our pod, including our insights team, to focus on those problems and come up with 4 to 5 different design solutions. Then we go back to our internal insights team and say, ‘Hey, now that we understand the problem from you all, let’s get these design solutions in front of folks and get their preference for which one they think solves this issue or which one resonates with them most.’

    Once we do that, we’ve whittled down the A/B test variations that we ultimately pass to [Conversion]. By doing this, we’ve been able to give our test variations a better chance of success because, again, it all goes back to the why. Now we finally have those two arms working more hand in hand, and it has turned into this continuous feedback loop with the team. I can’t stress enough how valuable it is that we’ve started doing that.

    James (Conversion): Hearing directly from customers and having that direct feedback. Any surprises that you can think of, where you put forward designs and thought, ‘I’m placing my money on this one,’ and the user feedback or the test comes back and says, no, it wasn’t that one?

    Meghan (Best Egg): Yes, there are always surprises like that. Things such as color choices, as silly as that sounds. There are times when we’ll put a radical design variation in front of customers, and I’m thinking there’s no way that’s going to come out as something we pass over to Conversion. Sure enough, it does.

    It just reinforces the thought of, when we get together in our work together, we have meaningful insights to bring to the table and have great ideas, but we aren’t our customers. That just continues to be reinforced as we work like that.

    James (Conversion): The humility. You’ve got to still put the ideas out there.

    Meghan (Best Egg): Let them tell us exactly.

  • Navigating Challenges In Best Egg’s Digital Experience

    [13.55 – 16.27]

    James (Conversion): Let’s talk a bit about some of the biggest aha moments, the challenges, the successes of the program. Let’s start with the hard stuff. What have been the challenges of doing this work, building this program, and collaborating with these teams? It’s not always easy.

    Meghan (Best Egg): We’ve had some interesting challenges. I would say the first one, I just finished talking about the two arms and how wonderful it is that they’re working together. That is all true, but a challenge can be when qualitative and quantitative data don’t show the same results.

    You’re left scratching your head for a minute. I think what that shows us is we are learning that what people say they will do or maybe even think they will do isn’t what they’ll always actually do. It just reinforces the importance, though, of having both of those arms working together. So you’re getting the full picture and more reliable results.

    That’s been the first big challenge. The second challenge that’s been a really fun one to tackle is that as Best Egg has evolved, we’ve had to shift how we think about our primary KPIs. At the start of our journey together, we were essentially a single product company and it made discussing our main metric easy. We were looking to impact personal loan application starts. That was it.

    Now that we’ve shifted into this multi-product world, the conversations are getting so much more interesting about, identifying what that right primary KPI is, and it’s more challenging. Are we optimizing for personal loan leads or is it for a different product lead? Is it logins? Are we okay if there is an impact on personal loan leads if we see this fantastic result for logins?

    It’s become a much more dynamic conversation, which is really interesting but still challenging nonetheless.

    James (Conversion): It’s a balancing act, right? As you said at the outset when we were speaking, you’ve got three different audiences all going through the same web experience. Frequently, they’re all experiencing the same touchpoints as they start their journeys. So it’s about balancing and creating that basket of metrics, and deciding where your tolerance is for decreases in some metrics in order to improve the others.

  • Building A Successful Experimentation Program

    [16.26 – 17.33]

    Meghan (Best Egg): The tolerance [for experimenting] has evolved. When we started our work together, there was less tolerance for harming that top-of-funnel personal loan application starts.

    We have much more tolerance now if we slightly harm that top-of-funnel personal loan metric but see another primary KPI for a specific test be successful. We’ll look down to the bottom of the funnel now and say, okay, it didn’t impact the funded rate, so we’re moving forward with this test. We just didn’t have those conversations two years ago.

    James (Conversion): It helps that you’ve got representatives from the analytics team and the data team in the room reviewing and planning these tests; getting ahead of what could happen is a big component. When you plan a test, you can say this could go south or impact these metrics, so let’s think about and plan for that in advance, and decide how we’re going to answer those questions. That’s half the battle.

  • Unlocking Success By Understanding The User: Strategies For Deep Customer Insights

    [17.34 -18.52]

    James (Conversion): What about device types and user groups? Is anything weird there that’s happened?

    Meghan (Best Egg): Yes, we’ve run quite a few tests now where we’re seeing directionally different results on desktop versus mobile, so we’re still trying to peel back the onion to understand why. It’s an interesting challenge. Going back to that iterative nature of testing, we’re gathering all of these insights and learnings, and some of that will start making its way onto our testing roadmap. What should these experiences look like for a mobile user versus a desktop user? It’s clear the behavior is very different on those device types.

    James (Conversion): When will the industry finally solve the question of whether people are just browsing on their phones, shopping on their phones, or returning to another device?

    Meghan (Best Egg): That all goes back to starting to really think about those three cohorts more. Are there relationships there? For somebody that’s a brand-new visitor, are they mostly on a mobile device or mostly on a desktop device? Because if they’re still in research mode, that may change and evolve the way we design things for that cohort. So it’s super interesting to start digging into that.

  • Best Egg’s Biggest Experimentation Wins

    [18.53 – 23.00]

    James (Conversion): How about the successes with that? With hard work, sometimes comes rewards.

    Meghan (Best Egg): Just as many successes, so that’s always fun. The biggest one that comes to mind is that, when I started my work in website optimization, I was very eager to dig into these technically complex overhaul experiences that we could test, just to see what happens.

    What I’m finding the longer I’m in this game is that sometimes, and a lot of the time, the most simple, low-effort tests give you the most bang for your buck. One that comes to mind is a simple copy change on a CTA that gave us a 4% lift in application starts. Who doesn’t love that? You’re seeing this positive impact, it’s not technically complex, and you’re not waiting forever to implement it natively in production. Those have been fun successes that we’ve had.

    Another example I can think of is, again, working with our insights team and learning that folks, especially this new visitor cohort, just don’t know a lot about us yet. They’re not sure whether they should trust us. So, in our work with [Conversion], we landed on a testing variation where we simply added what we call trust badges fairly prominently across a lot of our main pages, with the Better Business Bureau and Trustpilot badging. Not super complex to do, and it gave us a 3% lift in application starts. Those have been some big wins with [Conversion].

    James (Conversion): I was going to say, I remember, actually, with the trust badging, it was that same ABR [annual business review] you were speaking about earlier. I remember us going through a levers analysis, looking through the Levers Framework at all the different areas we had tested in and where we’d maybe under-explored others.

    Trust being a low one, the whole team kind of looked at that and went, there’s probably something there; I think trust would be impactful. Lo and behold, not too long later, there’s the Easter egg.

    Meghan (Best Egg): We continue to test into that. To your point, because that was a less explored area, we’ve done things now like adding actual trust components across the website. We just wrapped up a test where one variation was just our customer testimonials and another was actual Trustpilot testimonials.

    That was a winner that’s in our backlog to implement natively. It’s so great to have the partnership with [Conversion] to expose areas within that whole methodology of, “Hey, we haven’t explored that. We should tap into that and see what happens.”

    James (Conversion): I love the small wins as well, you mentioned, right? The small copy and CTA stuff. Hindsight’s always 20/20, but it’s fun to look back on those and think, “How did I not see that before?” It seems so obvious in hindsight.

    There are so many of those little tweaks that you can make as you go through the user experience. They’ve been a certain way for so long, but I feel like the whole team overlooks them. Then you take one final or secondary glance and go, wait a minute, maybe this copy is holding people back a bit, there’s some anxiety, and there can be some big unlocks. Then, as you mentioned earlier, follow-up tests and new insights lead to more wins. That’s what keeps this interesting.

    Meghan (Best Egg): The other thing I’m starting to learn and see, and let’s use the CTA as an example: that was a winner, right? A 4% lift in application starts.

    That doesn’t mean you just set it and forget it that way for three years. Give it its time, give it its space, then take a look at it again, because there are always macro-environmental changes happening where maybe a specific piece of copy, or whatever it may be that you’re testing, just isn’t quite as relevant anymore. So capitalizing on what’s happening in the here and now from that macro space is super important.

  • Unlocking Success By Understanding The User: Strategies For Deep Customer Insights

    [23.01 – 26.15]

    James (Conversion): There’s the challenge of being in the product, and the experience you guys have there, where you work in the products day in and day out. It’s intuitive to you and the team, right? There are things that you might overlook in how your customers understand the suite of products, the differences, and the choices they need to make. How has testing helped there?

    Meghan (Best Egg): I think that this idea has definitely been around for a while. Again, I can’t stress enough the importance of [Conversion] illuminating how we should leverage our insights team more.

    The first concrete example of how we put that into practice was, we’re going to take a look at our home page, the most highly trafficked page on the website. Let’s see how we can understand it. Are there detractors for why people aren’t starting an application? Let’s just learn more about what’s happening there.

    There’s been this reasonable hypothesis that if we removed a certain element from our home page, that’s the unlock; we’re going to open up our top of the funnel. But if we were to remove that element, it would be super technically complex. It would have a ton of downstream effects on the way the rest of the business works.

    That’s not to say we shouldn’t explore these things. We absolutely should. Let’s be disruptive, especially if we think there’s opportunity there. What’s so important about working with insights is that the data debunked the idea that this was the primary detractor. It is a detractor, but it’s more of a secondary one.

    First, people just want to know more about the products. They just want to hit the site and understand quickly and concisely what we offer and the details behind each offering. That now becomes a different conversation in terms of complexity and level of effort. So we’ve spent a lot of time and a bulk of the roadmap with [Conversion] lately exploring those two areas of opportunity, going back to making sure that the qualitative data and the quantitative data are aligned.

    We did test with [Conversion], from an A/B test perspective, removing that component, and we found that it did harm overall application starts. There is an opportunity in different ways of engagement that leads us to want to continue exploring that further. It’s helped validate priorities, in a sense, given everybody’s limited time.

    We’re going to start hitting hard on these product descriptions and understanding what the unlock is there. That’s all happening in real time with [Conversion], which is exciting to see. It’s a concrete example of how these insights, your team, and getting these understandings of why upfront are so important.

  • Website Personalisation Is Crucial For Success

    [26.16 – 28.05]

    James (Conversion): Lots that we’ve already accomplished, and I can already hear plenty that’s on the near-term roadmap. What about the long-term roadmap? Where does this take us, and where does it take Best Egg? What does the future look like?

    Meghan (Best Egg): I think, in a nutshell, mainly talking about the unauthenticated experience at Best Egg, it’s starting to dig into how we can evolve from using the website as more of an application-start mechanism, which is how I kind of think about it, to creating distinct experiences for those three primary cohorts I mentioned earlier.

    How are we creating a distinct experience for that first-time visitor who knows nothing about us? How are we creating an experience for our existing customers? Then how can we support returning visitors that haven’t yet converted, and how do we ultimately get them to take action?

    So in the spirit of that vision for the experiences, we’re looking at how we can better leverage data and those specific audiences within our tests with [Conversion] to get more focused learnings for each cohort. We’ll continue optimizing the website to serve these limited-savings customers, but meet them exactly where they are in that journey.

    James (Conversion): It’s journey optimization and sort of a small step into the world of personalization.

    Meghan (Best Egg): Definitely, I would say long term vision, we’ll really start digging into personalization for sure.

    James (Conversion): Meghan, it’s been great chatting with you. I appreciate the time and talking about the great work that you and the team are doing, and I can’t wait to see what’s next.

Evolutionary Design: Crafting Websites That Adapt and Thrive https://conversion.com/blog/evolutionary-design-crafting-websites-that-adapt-and-thrive/ Tue, 27 Aug 2024 13:28:15 +0000 https://conversion.com/?p=6348 The post Evolutionary Design: Crafting Websites That Adapt and Thrive appeared first on Conversion.

  • Most redesign projects will fail, but businesses will often be unable to identify why. Have you experienced the frustration of a website redesign that backfired, leading to a decline in conversion rates? Or does your brand find itself caught in a costly cycle, redesigning the website every few years without seeing significant improvements in customer experience or sales?

    We’re here to explain why complete redesigns can harm businesses and why it may not be the solution you’re looking for. This post presents an alternative approach to traditional redesign: evolutionary redesign. Discover what evolutionary redesign entails and how it can help you boost your conversion rate.

  • Contents

  • What is Evolutionary Design and why it’s better

    To understand evolutionary design fully, it is important to examine why traditional redesigns can harm businesses. Traditional design cycles within a company often include an expensive and lengthy process of implementing all design changes as one project. Although this has become the most common way businesses approach redesign, it has major pitfalls. Implementing several changes at once means that if any aspect of the design negatively affects the conversion rate, finding the source of the issue can be extremely difficult.

    In contrast, evolutionary redesign focuses on testing individual changes one by one. Separating complicated redesigns into smaller individual changes allows a de-risked approach to implementing design changes.

    Why is this so important? During a complete redesign, it might appear that the overall conversion rate has decreased. Yet, this dip could be attributed to only a handful of changes having a negative effect. Meanwhile, some changes could be driving positive results, but the negative impact of the majority overshadows them. Since these positive changes remain hidden, it becomes impossible to identify and refine them through iteration.
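
    To make this concrete, here is a minimal simulation sketch, written in Python, of what happens when a few changes are bundled into one release versus tested in isolation. The baseline conversion rate, effect sizes, and traffic figures are invented purely for illustration, and the effects are assumed to combine additively; this is not data from any real programme.

    ```python
    # Illustrative only: invented baseline, effect sizes, and traffic.
    # Assumes, for simplicity, that individual effects combine additively.
    import random

    random.seed(42)

    BASELINE_CR = 0.050            # 5% baseline conversion rate (assumed)
    CHANGE_EFFECTS = {             # relative lift each change would cause on its own
        "new_hero_image": +0.03,
        "shorter_form":   +0.04,
        "new_navigation": -0.09,   # the hidden loser
    }
    VISITORS_PER_ARM = 200_000

    def observed_cr(true_cr: float, visitors: int) -> float:
        """Simulate one test arm and return its observed conversion rate."""
        conversions = sum(random.random() < true_cr for _ in range(visitors))
        return conversions / visitors

    # Batched redesign: every change ships at once, so the effects net out.
    batched_cr = BASELINE_CR * (1 + sum(CHANGE_EFFECTS.values()))
    print(f"control           {observed_cr(BASELINE_CR, VISITORS_PER_ARM):.4f}")
    print(f"batched redesign  {observed_cr(batched_cr, VISITORS_PER_ARM):.4f}  <- winners hidden by the loser")

    # Evolutionary approach: each change gets its own test arm.
    for change, lift in CHANGE_EFFECTS.items():
        cr = observed_cr(BASELINE_CR * (1 + lift), VISITORS_PER_ARM)
        print(f"{change:16s}  {cr:.4f}")
    ```

    In the batched arm, the net result looks like a small loss even though two of the three changes are genuinely positive; only the isolated arms reveal which change to keep and which to discard.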

    The same principle applies to successful redesigns, too: while your overall conversion rate may have gone up, certain unidentified changes—changes that may have seemed like no-brainers—could still be pulling it down (even if things have netted out with a positive result).

    For an example of such a ‘no-brainer’ change, consider the below:

    Based on some user research, the Domino’s team was keen to switch from a vertical scroll to a horizontal scroll on its menu. We urged them to test. They did. The change turned out to be detrimental to the site, losing 3.13% of overall revenue, which, given the company’s size, would have equated to a loss of over $100 million.

    Evolutionary Design allows us to remove any changes that negatively affect conversion rates, meaning only the changes that have positive effects are implemented. Consider the scenario where this alteration had been executed through a traditional batch redesign. Despite the identical adverse effects, pinpointing the root cause of the decline in conversion rate would have posed an impossible challenge, and the loss in conversion rate would have been far more significant over a prolonged period.

    This argument applies to companies of all sizes, whether start-ups or established brands that have been online for years. In the next section, we’ll explore the redesign strategies of Amazon and Marks & Spencer and why one succeeded where the other failed.

  • M&S vs Amazon: Two very different approaches to redesign

    In the realm of website redesigns, Marks & Spencer, the British retail giant, embarked on a two-year journey to overhaul their online platform. Their ambitious project, which culminated in the launch of the new website in 2014, came with a hefty price tag of £150 million.

    This transformation encompassed two significant shifts: firstly, the migration of their backend away from Amazon, and secondly, the adoption of a more visually engaging “magazine” style frontend for their e-commerce interface.

    However, despite meticulous planning and investment, the launch of the revamped website in February 2014 was met with disappointment. Sales plummeted within three months, and the company’s stock price took a hit. Upon reflection, it became evident that many technical and usability issues had plagued the user experience, contributing to the website’s lackluster performance.

    In hindsight, consolidating all the changes into a single release proved to be a double-edged sword for M&S. While it forced them to confront and prioritize the issues, it also hindered their ability to address them incrementally and adaptively.

    Had M&S embraced an evolutionary design strategy, they could have mitigated these setbacks and capitalized on increased sales in the preceding two years. This example serves as a poignant lesson in the importance of iterative refinement in the ever-evolving landscape of online retail.

    “Our success at Amazon is a function of how many experiments we do per year, per month, per week, per day.” ― Jeff Bezos.

    Now, let’s compare M&S’s redesign approach with Amazon’s. When was the last time Amazon revamped its website? “Never” and “Constantly” are both valid responses.
    Amazon consistently tests and enhances its website, refining various aspects such as the Buy Now button, 1-Click ordering, suggested products, and checkout. As a result, alterations to its website occur frequently, often in subtle ways.

    Jeff Bezos, the founder of Amazon, is renowned for emphasizing the importance of experimentation in the company’s success. He has famously stated that Amazon’s prosperity is contingent on the volume of experiments conducted annually, monthly, and daily. Bezos advocates that doubling the number of experiments undertaken each year will lead to a corresponding increase in inventiveness.

    Amazon doesn’t stop at just one round of testing. They continuously iterate on their designs based on the results of A/B tests and other data sources. By constantly refining and optimizing their website, Amazon ensures it remains competitive and maximizes conversion opportunities.

  • How we use Evolutionary Design to iteratively build websites that convert

    We don’t just tell our clients that evolutionary design works; we show them.

    Here are two real-life examples, taken from our own work, of why you should move to an iterative design approach.

    BuildDirect

    BuildDirect, a leader in high-quality flooring, was looking to redesign its homepage. Instead of approaching this through a traditional redesign and implementing several changes simultaneously, we did this iteratively by testing various new components separately in an A/B/n test.

    This method allowed us to split out all of the changes into unique variants and therefore meant that only the most positive changes were implemented, including one (variation C) that produced a substantial +16.9% uplift. More importantly, this approach allowed us to quickly identify and prune any iterations that were negatively affecting the site. If we had packaged them up into a wholesale redesign, the gains would have been mostly canceled out by the losses. Instead, we saw a significant increase in conversions.
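
    For readers who want to see the mechanics, here is a hedged sketch of how each arm of an A/B/n test like this can be compared against control with a two-proportion z-test. The visitor and conversion counts below are invented for illustration and are not BuildDirect’s actual data; a real programme would also account for multiple comparisons.

    ```python
    # Illustrative A/B/n read-out: compare each variation to control.
    # Counts are invented; real read-outs also correct for multiple comparisons.
    from math import sqrt
    from statistics import NormalDist

    def lift_and_p_value(control, variation):
        """control/variation are (conversions, visitors); returns (relative lift, two-sided p)."""
        (c_conv, c_n), (v_conv, v_n) = control, variation
        p_c, p_v = c_conv / c_n, v_conv / v_n
        pooled = (c_conv + v_conv) / (c_n + v_n)
        se = sqrt(pooled * (1 - pooled) * (1 / c_n + 1 / v_n))
        z = (p_v - p_c) / se
        return (p_v - p_c) / p_c, 2 * (1 - NormalDist().cdf(abs(z)))

    control = (1_150, 25_000)                      # (conversions, visitors)
    variations = {"A": (1_190, 25_000), "B": (1_080, 25_000), "C": (1_345, 25_000)}

    for name, counts in variations.items():
        lift, p = lift_and_p_value(control, counts)
        decision = "implement" if p < 0.05 and lift > 0 else "prune or iterate"
        print(f"Variation {name}: lift {lift:+.1%}, p = {p:.3f} -> {decision}")
    ```

    Only the variations that are individually positive and significant go on to be implemented; the rest are pruned, which is the whole point of splitting the changes out.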

    WeBoost

    Original product page design

    WeBoost, a company offering cell phone signal boosters, sought to revamp significant portions of its website. Rather than approaching this with a traditional redesign project, we spent just over a year completing experiments for Weboost to ensure every change had the intended impact.

    During this redesign process, we worked on Weboost’s category pages. We removed a lot of the technical information on these pages to direct customers straight to the product page for more information.

    Our category page test results revealed a fascinating insight: the placement of technical information significantly impacts user behavior. When the technical details were available on the category pages, users didn’t rely as heavily on the product pages for clear information.

    However, once we removed this information from the category pages, the clarity—or lack thereof—on the product pages became more apparent and detrimental to the user experience. As a result, we noticed that when customers reached the product pages, they encountered several issues. Key purchasing information was buried below the fold, vital package content was hidden in dense text, and multiple CTAs competed for attention.

    Initial testing on the product pages yielded small but significant wins, prompting us to address above-the-fold issues. In variation A, we redesigned the hero section to tackle:

    • Anxiety: The primary CTA was below the fold.
    • Clarity: Product features were buried in the Specification section.
    • Value Proposition: Strong reviews were hidden.

    By moving the CTA and clarifying product features above the fold, we saw a 27% increase in completed orders.

    When running tests for WeBoost, balancing speed and the precision of isolating impacts was essential. While our best practice involves testing individual changes separately to accurately measure their effects, there are times when constraints like limited time or budget necessitate a different approach.

    We might conduct tests where multiple elements are changed simultaneously. This method allows for quicker insights and faster implementation, but it does come with a trade-off: the inability to pinpoint which specific changes drove the results. By acknowledging this trade-off, we can ensure that our strategy remains flexible and pragmatic, adapting to the needs of each client.

    Exploring Further with Variation B

    Following the success of Variation A, it was crucial to ensure we retained the increases we saw. To achieve this, we isolated one specific change in Variation B, allowing us to accurately measure its impact.

    We introduced variation B to address below-the-fold issues. We implemented tabulated information, a strategy that had previously proven effective on the weBoost homepage.

    Our hypothesis was that organizing content into clear tabs would improve eye flow, reduce cognitive load, and facilitate information retrieval, thereby driving conversions.

    Despite strong support from our strategy team and the weBoost executive team, variation B unexpectedly underperformed, resulting in an 8% decrease in orders compared to the control. When isolated, the tab change led to a significant 27.6% reduction in sales compared to variation A.

    It’s tempting for companies to dive headfirst into complete redesigns based on intuition or assumptions about what might work better. However, this experiment underscores the crucial role of evolutionary design. Although Variation B didn’t succeed, weBoost adopted the successful changes from Variation A. If we had combined all the changes from both Variations A and B into a single redesign of the product page, it would have been nearly impossible to identify the specific design elements causing the decrease. Additionally, the effective changes from Variation A would have been hidden.

  • Next time you think about a redesign, think evolutionary.

    The journey through Evolutionary Design’s intricacies unveils a critical perspective shift from traditional website redesigns.

    The cautionary tale of Marks & Spencer’s ambitious but ultimately flawed overhaul is an example of the dangers of redesign. In contrast, Amazon’s constant evolution underscores the importance of iterative design in maintaining competitiveness and maximizing conversion opportunities.

    By embracing Evolutionary Design, businesses can move past the limitations of opinion-driven overhauls and pivot towards a data-driven, user-centric paradigm.

    So, the next time the idea of a redesign comes up, remember Evolutionary Design. Embrace the ethos of continual improvement and harness data-driven decision-making. It’s the key to enduring success.

Maximising Experimentation Impact: The ALARM Protocol https://conversion.com/blog/maximising-experimentation-impact-the-alarm-protocol/ Tue, 23 Jul 2024 15:53:49 +0000 https://conversion.com/?p=6309 The post Maximising Experimentation Impact: The ALARM Protocol appeared first on Conversion.

  • As experimenters, we often overlook the distinction between a hypothesis and its execution. A hypothesis represents a theory we aim to validate, while an execution is how we specifically plan to validate it. It’s conceivable to possess a robust hypothesis supported by evidence and yet execute it poorly.

    This raises a question:

    What measures can we take to ensure our execution effectively validates our hypothesis?

    The ALARM protocol is not just another tool in the world of experimentation. It’s a comprehensive framework that allows you to scrutinize your experiment concepts from every angle, helping you test your hypotheses as effectively as possible.

    In this post, we’ll offer an overview of our ALARM protocol before showing how to apply it to your experiment concepts to ensure they are as strong as possible.

  • Contents

    What is the ALARM Protocol
    Applying the ALARM Protocol

  • What is the ALARM Protocol?

    Broadly speaking, an experiment will lose for one of two reasons.

    1. The hypothesis is incorrect – e.g. adding social proof to your product page was not an effective way of increasing trust.
    2. The execution is poor – e.g. maybe social proof was effective, but the specific way you executed your hypothesis was the problem.

    The ALARM protocol is a framework developed by our team of 20+ experimentation consultants to tackle the second item on this list: poor executions.

    By passing experiment concepts through our ALARM protocol, we can ensure that our executions are as strong as possible before they’re eventually built. This means that when we decide to invest time into an experiment, we can be confident that it is the most effective execution possible for that specific hypothesis.

    Generally speaking, we produce the best experiments by questioning ourselves. We are undoubtedly missing an opportunity if we always do the first thing we think of. The ALARM protocol is here to guide us and ensure our success.

    The ALARM Protocol

    ALARM – an acronym for Alternative executions, Loss factors, Audience and Area, Rigour, and MDE & MVE – is a structured approach to concept evaluation. Each component plays a crucial role in the process, and understanding them all individually is key to effectively applying the ALARM protocol.
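
    As a rough illustration of how a team might operationalise the acronym, here is a minimal Python sketch of the protocol as a pre-build checklist. The class name, fields, and thresholds are our own assumptions for the example, not Conversion’s internal tooling.

    ```python
    # A hypothetical pre-build checklist modelled on the ALARM acronym.
    from dataclasses import dataclass, field

    @dataclass
    class AlarmReview:
        hypothesis: str
        alternative_executions: list = field(default_factory=list)  # A: other ways to test it
        loss_factors: list = field(default_factory=list)            # L: reasons it might lose (aim for 4+)
        audience_and_area: str = ""                                  # A: where and to whom it runs
        rigour_actions: list = field(default_factory=list)          # R: repository checks, principles, etc. (aim for 2+)
        bold_enough_for_mde: bool = False                            # M: an MVE that can still hit the MDE

        def gaps(self):
            """List the checks that still need attention before the build starts."""
            issues = []
            if not self.alternative_executions:
                issues.append("No alternative executions considered")
            if len(self.loss_factors) < 4:
                issues.append("Fewer than four loss factors written down")
            if not self.audience_and_area:
                issues.append("Audience and area not reviewed")
            if len(self.rigour_actions) < 2:
                issues.append("Fewer than two actions of rigour applied")
            if not self.bold_enough_for_mde:
                issues.append("Not yet confirmed bold enough to hit the MDE")
            return issues

    review = AlarmReview(hypothesis="Adding social proof to the product page increases trust")
    print(review.gaps())  # every check is flagged until the concept has been worked through
    ```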

  • Applying the ALARM Protocol

    A: What ALTERNATIVE EXECUTIONS could be used to test this hypothesis? Why didn’t we pick those?

    The first step in the ALARM protocol prompts us to consider alternative execution strategies that we might be able to use to test our hypothesis better. By exploring different approaches, we can uncover hidden opportunities and mitigate potential risks associated with our chosen approach.

    Example


    To understand the importance of considering alternative executions, take a look at the following example from our own work:

    For the first experiment in this sequence, we added a comparison modal to the product category page so that users could compare the features of different products. We’d done our research, and we were confident in the concept, but when we ran the experiment, the result was flat.

    We wanted to understand the why behind this result, so we dug into our data, and here’s what we found:

    • Due to its low visibility—the modal only appeared after at least two products had been selected—only a very small proportion of users engaged with the new feature.
    • Users that engaged with the modal actually did have a significantly higher conversion rate.

    Given that users who engaged with the modal were, in fact, converting more reliably, we hypothesised that increasing the visibility of the modal would increase its conversion rate.

    Therefore, in an iteration of the experiment, we chose to enhance the prominence of the comparison feature. Although the design remained the same as in the initial experiment, the modal was shown by default once the user landed on the category page. This meant the modal was visible to all users, regardless of whether or not they had selected to compare items. This change increased the visibility of the feature and ultimately resulted in an 11.7% improvement in revenue.
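
    A quick back-of-the-envelope sketch, using purely illustrative numbers rather than the experiment’s real data, shows why visibility mattered so much here: when only a small share of users ever sees a change, even a genuine lift among those users is heavily diluted in the site-wide result.

    ```python
    # Illustrative numbers only; assumes only exposed users benefit and that
    # exposed and unexposed users start from a similar baseline conversion rate.
    exposure_rate = 0.05        # share of users who ever opened the comparison modal
    lift_if_exposed = 0.20      # relative lift among users who engaged with it

    sitewide_lift = exposure_rate * lift_if_exposed
    print(f"Site-wide lift at {exposure_rate:.0%} exposure: {sitewide_lift:.1%}")  # ~1%, easily lost in noise

    # Showing the modal by default raises exposure, so more of the lift survives.
    for exposure in (0.25, 0.50, 1.00):
        print(f"Exposure {exposure:.0%} -> expected site-wide lift {exposure * lift_if_exposed:.1%}")
    ```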

    The above example underscores the crucial importance of exploring a wide range of executions. If we had applied the ALARM protocol, would we have questioned whether this execution was bold enough earlier and got winning results sooner?

    L: Write down at least four reasons that the test might LOSE. Should we adapt the execution to mitigate those risks? If not, why not?

    Once we’ve considered potential alternative executions, we need to identify potential reasons for a concept’s failure and proactively mitigate these risks. By understanding and addressing potential pitfalls upfront, we can increase the likelihood of success.

    Example

    Hypothesis Example

    In the above experiment, we identified two key areas that may have caused a loss during this experiment.

    Firstly, the user could question whether the rating is based on a reliable number of people. We can and should mitigate this risk. When considering the mitigable risk, we propose that disclosing the number of reviews contributing to the score will increase confidence that these reviews are from real customers.

    Secondly, the user might find Trustpilot more credible than Feefo, as it is a more prominent and well-known review site in the UK. However, we can’t risk displaying the Trustpilot score as it is too low and may, therefore, have the opposite of the intended effect. Although a smaller and lesser-known review site, the Feefo score is much better, with enough customer reviews mitigating the first loss factor. This factor cannot be mitigated—we are taking a considered risk to learn.

    A: Is there a better choice of AUDIENCE & AREA to maximize our chance of a winner?

    The next step in the ALARM protocol asks us to consider whether we would have a higher chance of success if we tested the concept on a different site page or to a wider audience.

    Is there a risk that the change is too early or too late in the journey? Will the execution shrink your audience? For example, if users are only exposed to the experiment change when clicking on a tooltip or scrolling down the page, you are shrinking your audience as not everyone will see the change.

    Example

    Examining the impact of the selected area on experiment results is crucial. Typically, the chosen area directly influences the kind of audience that will be interacting with the experiment.

    In the experiment above, which was conducted on a vehicle rental website, we tested placing step-by-step instructions for the booking process on the homepage, the earliest point in the user’s journey. This test did not lead to a significant uplift, but while this outcome was unfavorable, it allowed us to learn and iterate.

    We conducted a second test implementing the step-by-step instructions for the booking process on the location page, which yielded positive results despite occurring later in the user’s journey. This test shows the value of questioning the page you choose upfront and thinking about this more carefully. Exploring multiple areas is vital in producing results with the highest impact.

    The iteration process above – prompted by using the ALARM protocol – holds significance, especially when the initial experiment did not provide any significant results. Using the protocol, instead of prematurely deeming the experiment a failure, we considered an alternative area that could be valuable. This approach determines the next course of action: if successful in a new location, it wins; if unsuccessful, it prompts consideration of an alternative approach.

    R: Have we taken at least two actions of RIGOR to ensure the execution is as good as possible?

    Here at Conversion, we have several predefined methods to ensure our execution plans are robust and well-thought-out. For example, maybe we can use our experiment repository to see if we can use learnings from previous experiments on similar websites to inspire our concept. Or maybe we explore our library of psychological principles to see if one of them can be applied to our experiment.

    This step in the ALARM protocol is where we apply at least two of these actions of rigor to our concept. By conducting thorough research, gathering data-driven insights, and finding supporting psychological principles, we can refine our concepts and maximize their potential for success.

     

    Example

    When looking at rigor, one impactful way to corroborate your experiment is to look at supporting psychological principles. Two examples of psychological principles we often use within our experiments include:

    • Social Proof: Including the number of customer reviews may lower the risk in the customer’s eyes. Although this doesn’t fully mitigate the perceived risk, it does reinforce that other customers have chosen and had a positive experience with this company.
    • Picture Superiority Effect: Using an image or icon, like the stars shown above, alongside the review count makes it easier for the customer to perceive the positive reviews. It removes the need to read, and stars are widely recognized as a signal of trustworthiness.

    M: Is your concept bold enough to hit your Minimum Detectable Effect (MDE)?

    The ALARM protocol’s final step is evaluating the proposed concept based on whether or not it is likely to be bold enough to hit our minimum detectable effect (MDE). For us here at Conversion, this is a slightly more nuanced issue than you might think:

    Our experiment database shows that experiments with a small build size are just as likely to win as those with a large build size. Our philosophy, therefore, is to attempt to validate our hypotheses with the smallest experiments possible, i.e. the minimum viable experiment (MVE).

    The balance we need to strike here is to ensure that our experiment is small enough to validate our hypothesis with minimum effort while being bold enough to hit our MDE. Generally speaking, there are usually ways to increase the experiment’s boldness without necessarily increasing the build size.

    For instance, imagine an experiment where you want to add reassurance messages like “You are free to cancel anytime.” A less bold approach might overlook how to make this content ‘pop’ on the page. A bolder strategy, however, could simply involve placing the content more prominently or integrating it into a site-wide banner for increased visibility across multiple pages.

    Example

    Remember, for every idea you have, you should ask what the smallest thing you can test is that proves your hypothesis could be correct.

    In the above experiment, we were looking to optimize the value statement lever. To do this, we simply adjusted the copy of the 3-for-2 roundel, which resulted in a +3.55% uplift in transactions. This experiment was an MVE – it involved a simple copy change – but it was also bold enough to hit our minimum detectable effect – the roundel was displayed prominently across the site, where lots of people would see it.
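
    To make the MDE side of that trade-off concrete, here is a minimal sketch of the standard two-proportion sample-size calculation. The baseline conversion rate and MDE values are illustrative assumptions; the point is simply that a bolder execution, one with a larger expected effect, needs far less traffic to reach significance.

    ```python
    # Standard two-proportion power calculation; all inputs are illustrative.
    from math import ceil
    from statistics import NormalDist

    def visitors_per_arm(baseline_cr, relative_mde, alpha=0.05, power=0.80):
        """Visitors needed in each arm to detect a relative lift of `relative_mde`."""
        p1 = baseline_cr
        p2 = baseline_cr * (1 + relative_mde)
        z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance threshold
        z_beta = NormalDist().inv_cdf(power)            # desired statistical power
        variance = p1 * (1 - p1) + p2 * (1 - p2)
        return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

    for mde in (0.02, 0.05, 0.10):                      # 2%, 5%, 10% relative MDE
        print(f"MDE {mde:.0%} -> ~{visitors_per_arm(0.04, mde):,} visitors per arm")
    ```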

  • A Journey Towards Excellence

    As shown above, the ALARM protocol should guide the depth of execution in an experiment, ensuring a thorough understanding of the data, lever, hypothesis, and execution strategy. By following these steps, we can navigate potential risks and optimize our approach for success.

    In the examples provided, we identified areas of potential issues and proposed mitigation strategies, highlighting the importance of addressing risks where possible and accepting calculated risks to facilitate learning.

    Integrating the ALARM protocol ensures that every concept is rigorously evaluated before proceeding. It is a structured framework for fostering innovation and ensuring that our concepts have the best possible chance of success.

    If you’re curious about how this works or have any questions, please contact us! We love talking about experimentation, and we’re always eager to share what we know!
