
Rebuilding Mailchimp’s A/B Testing Experience
A unified A/B testing framework that simplifies creation, increases adoption, and fits seamlessly into real marketing workflows

Defined and delivered the vision for Mailchimp’s A/B testing platform—securing a dedicated initiative and driving a 140% increase in exploration, 23% repeat usage growth, and a clearer path for marketers to test with confidence.
CONTRIBUTIONS
Researched qualitatively & quantitatively to understand the problem space
Presented vision & impact to leadership, leading to the formation of a dedicated team for this initiative
Created & communicated references such as a design system & principles for consistency
Led design in collaboration with designers, engineers, and product managers across multiple teams utilizing A/B testing
Role
Lead product designer
Team
1 product manager
4 engineers
Timeline
2024 - 2025
Problem
What is A/B testing
A/B testing is an experimentation method that helps marketers compare variations of a campaign to understand what resonates with their audience and make data-driven decisions.
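To make the comparison concrete, here is a minimal sketch of how a winner between two variants is typically judged, using a two-proportion z-test on click counts. The numbers, names, and function below are illustrative assumptions, not Mailchimp's implementation.

```python
# Minimal sketch: comparing click-through rates of two campaign variants
# with a two-proportion z-test. All figures are hypothetical.
from math import sqrt, erf

def compare_variants(clicks_a, sends_a, clicks_b, sends_b):
    p_a, p_b = clicks_a / sends_a, clicks_b / sends_b
    p_pool = (clicks_a + clicks_b) / (sends_a + sends_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / sends_a + 1 / sends_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_a, p_b, p_value

# Variant B (e.g. a new subject line) only "wins" if its lift is unlikely to be noise.
print(compare_variants(clicks_a=120, sends_a=5000, clicks_b=165, sends_b=5000))
```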
Opportunity gap
Although A/B testing is critical for marketers, only 0.34% of eligible Mailchimp users had established usage of the tool, pointing to issues with the product experience.
95% of users had not used A/B testing in Mailchimp
USER NEED
Marketers need A/B testing as an essential method to:
Understand their audience
Improve strategy & tactics
Increase performance & key metrics
59% of companies are A/B testing marketing channels
"If we’re not testing, we’re not marketing"
USER PROBLEM
Mailchimp users are rarely testing due to:
Poor discoverability
Limited functionality
Low interpretability & actionability
Less than 1% of Mailchimp users are A/B testing
"I would get frustrated at this point and stop A/B testing"
Why it matters
As businesses grow, so must their marketing capabilities.
If marketers can't perform essential actions such as testing, they lose confidence in the platform and churn toward more capable tools.
"The biggest difference between Mailchimp and (competitor), and one of the main reasons we switched, was the ability to A/B test automations”
Key metrics
Success for an experience that improves user discoverability, usability, and interpretability can be measured by the following metrics (a rough sketch of how they might be computed follows the list):
Usage rate
Completion rate
Repeat usage rate
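As a rough illustration of how these three rates could be derived from product analytics, here is a small sketch over a hypothetical event log. The event names and schema are assumptions for illustration, not Mailchimp's actual analytics model.

```python
# Hypothetical event log: (user_id, event). Real data would come from tools
# like Looker or Amplitude; this only illustrates the metric definitions.
from collections import defaultdict

events = [
    ("u1", "test_started"), ("u1", "test_sent"),
    ("u2", "test_started"),
    ("u1", "test_started"), ("u1", "test_sent"),
]
eligible_users = {"u1", "u2", "u3"}

by_user = defaultdict(list)
for user, event in events:
    by_user[user].append(event)

started = {u for u, evs in by_user.items() if "test_started" in evs}
completed = {u for u, evs in by_user.items() if "test_sent" in evs}
repeated = {u for u, evs in by_user.items() if evs.count("test_sent") >= 2}

usage_rate = len(started) / len(eligible_users)     # tried a test at all
completion_rate = len(completed) / len(started)     # finished setup and sent
repeat_usage_rate = len(repeated) / len(completed)  # came back to test again
print(usage_rate, completion_rate, repeat_usage_rate)
```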
Mailchimp's current A/B testing experience doesn't meet users' needs
Solution
Final experience
A unified A/B testing experience that works across channels, supports simple to advanced tests, and provides clearer guidance and insights at every step.
Discoverability
A/B testing entry points were aligned with marketers' natural workflow of creating a base campaign and then adding a variable to test.
Functionality
A/B testing capabilities were expanded & refined to enable and guide users in creating effective tests.
Interpretability
Results were presented in a way that allowed for easy analysis, insight gathering, and direction for subsequent actions.
Results
The team launched A/B testing for SMS, and afterwards for forms. After a month of collecting SMS data, we found the following results:
Discoverability:
Increase of 140% exploration rate
Functionality:
12% increase in completion rate
Interpretability:
220% increase in repeat usage
This shows that the changes were effective. With monitored results for forms, as well as deeper quantitative & qualitative research, we can make any needed adjustments and continue to roll out to other marketing channels.
Research
Goals
I took the lead in conducting research from start to finish of this project, with the following goals:
Manage & prioritize research methods throughout the process
Understand user behaviors & attitudes
Generate insights that drive data informed decision making
Communicate & collaborate with cross-functional stakeholders
Methods
Each stage of the process required different methodologies.
I utilized existing research & data to inform plans for subsequent exploratory research. After each round of designs & releases, additional research & monitoring were needed to ensure we were heading in the right direction.
Current landscape
Synthesis of 2 past research studies
Data analysis using Looker, Amplitude, & custom data
Competitive research across 5 relevant competitors
Heuristic evaluation of the current experience
Research papers on best practices
New insights
Primary, in-depth interviews with 20 target users
Diary study & analysis of artifacts
Concept testing with 10 users
Interviews with 2 subject matter experts
Card sorting
Surveys
Continual refinement
Usability testing, both moderated and unmoderated
Consultations with an internal subject matter expert
Data monitoring of usage & key actions
Data on current product usage, competitive research, and qualitative research with user artifacts
Insights & outcomes
Research was used to create & inform the following:
Presentation to leadership that justified the formation of a dedicated team
Strategic direction based on user perspectives & behaviors
Clear mental model to align product information architecture
Guidelines on how AI can best assist marketers with testing
Strategy
Project priorities & goals
When considering designs & inter-team negotiations, it was essential to outline each team's priorities.
Design
Meet user needs & minimize friction
Scalability & consistency
User guidance, including utilizing AI capabilities
Engineering
Technology stack capable of supporting future features
Minimize duplicate codebases
Reusability
Business & product
Contribute to key company metrics
Speed to delivery
Reduce risk
When considering end goals for this initiative, it was also essential to align on how elements of the experience would ladder up to the high-level impact the business cared about.
Lining up product improvements with user & business needs
Key questions
I worked with product and tech leads to answer the following questions. I did so by anchoring direction to the research, understanding the priorities of each partner, and finding compromise where needed.
Where to introduce testing first?
To balance risk while maximizing learning, we launched A/B testing in SMS first—a smaller, highly motivated user group—before expanding to email, Mailchimp’s highest-visibility and highest-impact channel.
What scope of testing is allowed?
We defined the right balance between flexibility and guardrails by evaluating test scopes—from individual fields to cross-channel assets. Based on user mental models and best practices, we enabled testing entire assets, ensuring meaningful insights without over-complicating setup.
How can AI be leveraged to help?
To meet rising AI expectations without creating low-quality output, we focused AI on high-value moments: generating test inspiration and helping interpret results. Research showed these areas delivered clarity and confidence without replacing user intent.
Design process
Team alignment
Alignment among stakeholders was needed to ensure cohesive communication.
Some areas of alignment included a glossary based on technical documentation & intuitive common terminology, which was helpful both internally and in guiding product content language.
Reference material for internal alignment
Design principles
The following principles were used to guide feature prioritization and design decisions:
Flexibility:
Adapt to unique & evolving user needs
Guidance:
Provide starting points & education moments in context
Consistency:
Utilize patterns that apply to all areas of testing
User flow & touch points
Creating a map of the overall user flow & touch points for A/B testing helped orient and contextualize the end-to-end experience throughout the design process.
High level user flow through the product
Ideation
In a team session, I led a workshop to ideate in the context of the varying stages of the user journey.
Tied to each stage were insights related to user needs & problems to help direct the ideas.
Ideation workshop with context of the product workflow
Explorations were made for various flows, layouts, and features.
Low-fidelity mockups were quickly created for breadth of ideation.
Low-fidelity explorations of diverging concepts
Prototyping & testing
Critical features were prioritized for testing, as well as experiences that answered open questions about user expectations.
Testing prototypes across common scenarios
An initial moderated test with 10 users allowed us to ask questions and understand nuanced behaviors.
Afterwards, multiple unmoderated tests were conducted for quick validation. The results were synthesized and presented to the team to establish clear direction.
Analysis from concept & usability testing
Design decisions
Several key design decisions were made, utilizing design principles, collaboration with cross-functional teams, and evidence from qualitative & quantitative research. Below are some major decisions that were reached through testing.
Hybrid workflow
A workflow that is familiar when creating field variants while building the base campaign, and then clearly an A/B test when refining and reviewing test details.
The tension was whether to fully integrate A/B testing into the base campaign experience, keep it fully separate, or use a hybrid.
Synthesis of user behaviors during testing showed that users need a familiar environment when creating base campaigns & field variants; however, a separate testing environment allows for intuitive test setup and review.
Shifting experiences from base email editing, to A/B testing
Independent campaign assets
Independently editable campaign variations were created to empower users to be the expert in how they craft their tests.
Though best practice is to change only one variable at a time, sometimes what is tested, such as subject matter, is more nuanced and requires changing multiple fields like subject line & content to test adequately.
This led us to prioritize the ability for users to independently edit each campaign, which also aligns with what users expect when creating a test.
Each campaign version can be edited independently
AI assistance with choice at varying levels
Users expected AI to help generate alternatives to test, but they needed to still feel in control.
Through choice and reasoning for generated options, users gained confidence and a sense of autonomy in creating the test they wanted.
AI suggestions at varying levels of the creation process, with explanations for each option
Comparison
Reviewing variations side by side is essential to establish confidence before testing.
We noticed users flipping back and forth between campaign versions to check the variables & constants before sending out their test. To better facilitate this need for visual verification, side-by-side comparison was added.
Providing a way to quickly compare versions
Key deliverables
Design system
Established components & patterns for A/B testing, which were integrated into the larger product design system.
Roadmap & designs
Roadmap & designs for A/B testing across all campaign types and across crawl, walk, run capabilities.
Sunset plan
Sunset plan & flow to archive the old experience and transition users to the new one.
SMS A/B testing launch
Launched A/B testing for SMS which showed improvements in performance across all metrics.
Reflections
Initial launch takeaways
The new A/B testing experience has proven effective across SMS and forms, establishing a strong foundation for expansion into additional surface areas.
While broader rollout phases were defined, leadership changes and shifting priorities introduced timeline variability.
Despite this, the future state of A/B testing at Mailchimp remains clearly defined—with a shared vision, reference guidelines, and validated user value ready to scale when priorities realign.
What I learned
Single source of truth
When a feature spans multiple surfaces, a single, shared source of truth is critical to keep product, design, and engineering aligned; it reduces downstream rework and maintains a consistent user experience.
Juggling priorities
Teams operate against different success metrics; understanding those incentives early allows you to frame the product in terms of shared outcomes, accelerating alignment and decision-making.
