The 9 essential steps to a successful A/B test
16/07/2024
Summary
Here are 9 steps to follow BEFORE starting an A/B test:
- Identify a problem to solve
- Identify multiple solutions
- Identify the KPIs influenced by the solution
- Socialize the problem and solutions with other teams
- Check that the impacted KPIs are measured and measurable
- Measure current KPIs, project expected gains
- Calculate the number of conversions needed, the time required and the number of experiments testable simultaneously
- Determine success criteria
- Decide what to do in case of success and in case of failure
No serious optimization program can succeed in the long term without following these nine steps. Here’s who should be responsible for each step, and why it matters.
1. Why identify a problem to solve? [Business Team]
- Because it’s not reasonable to test a solution that doesn’t address any problem. “What if we tried to…”: the ideas we have while brushing our teeth in the morning, or after chatting with a neighbor, are perhaps interesting, but they have little chance of fitting into the company’s overall strategy or significantly improving the business.
- Because while I’m testing a solution to a non-existent problem, I’m losing a slot that could test a solution to an identified, documented problem.
2. Why identify multiple solutions? [Business Team]
- Because some solutions are not technically or legally feasible, and some take too long: can I wait twelve months to verify that my solution really increases the renewal rate?
- Because some solutions are more likely to succeed than others.
3. Why identify all the KPIs influenced by the solution? [Business & Optimization Teams]
- Because some solutions influence secondary KPIs: when I increase the number of credit cards being updated (micro KPI), I improve the retention rate (macro KPI).
- Because a solution can influence KPIs we do not want to modify: I add a new payment method in order to increase the Conversion Rate (CR%) at checkout. This new payment method takes market share from another method; my transaction volume for that other method drops below a threshold and my transaction fees increase.
4. Why socialize the problem and solutions with other teams? [All Teams]
- Because a proposed solution may affect other teams and tests as previously illustrated with payment methods.
- Because the solution may run counter to another team’s project: I want to introduce new products to cover a wider price range while another team is trying to reduce the number of products to maintain.
5. Why check that the impacted KPIs are measured and measurable? [Business & Optimization Teams]
- Because some KPIs aren’t measured (or not yet): I build a feature so that customers can update their credit card because it’s necessary, but since I have no objective for the number of updated cards, there is no KPI, and I never felt the need to measure this quantity.
- Because I’m on the wrong track if I try to solve a problem that cannot be measured: how can I claim there is a problem with a quantity I cannot measure? The problem is certainly poorly identified.
6. Why measure current KPIs and project expected gains? [Business Team]
- Because I need to know the current state of the KPIs to calculate the current Customer Lifetime Value (CLV); only then can I estimate the gains and calculate the future CLV (see the sketch after this list).
- Because it’s useless to test a solution that doesn’t bring a significant gain, or that requires unachievable KPI increases: if my test aims to change the new product mix with no booking impact, there’s no point testing solutions requiring a conversion rate (CR%) greater than 100%.
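To make the projection concrete, here is a minimal sketch in Python, assuming a simple subscription-style CLV model (margin per period divided by churn); the margin, retention rate, and expected uplift below are hypothetical placeholders, not figures from this article.

```python
# Minimal sketch of step 6, assuming a simple subscription-style CLV
# model: CLV = margin per period x expected number of periods, where
# the expected number of periods is 1 / (1 - retention rate).
# All figures are hypothetical placeholders.

def clv(margin_per_period: float, retention_rate: float) -> float:
    """Expected lifetime value of one customer under the model above."""
    return margin_per_period / (1.0 - retention_rate)

current_clv = clv(margin_per_period=20.0, retention_rate=0.80)  # 100.00
# Hypothesis to validate with the test: the solution lifts retention
# from 80% to 82%.
future_clv = clv(margin_per_period=20.0, retention_rate=0.82)   # ~111.11

print(f"current CLV: {current_clv:.2f}")
print(f"projected CLV: {future_clv:.2f}")
print(f"expected gain per customer: {future_clv - current_clv:.2f}")
```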
7. Why calculate the conversions needed, the time required and the number of experiments? [Optimization Team]
- Because the result of an A/B test, to be considered valid, needs to be significant (enough data) and must reach a sufficient level of confidence*.
- Because you don’t conclude a test after only 5 conversions in each experiment.
- Because there is generally weekly seasonality (more visits on weekends, for example), a test must always run over a whole number of weeks.
- Because each experiment needs the same amount of traffic and conversions, the more experiments you run simultaneously, the longer the test takes: if testing 2 experiments requires 3 weeks, testing 6 experiments will require 9 weeks (see the sketch below).
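As an illustration of this step, here is a minimal sketch in Python using the standard two-proportion sample-size formula (standard library only); the baseline conversion rate, target uplift, weekly traffic, and significance/power settings are hypothetical placeholders.

```python
# Minimal sketch of step 7, using the standard two-proportion
# sample-size formula (Python standard library only). The baseline
# conversion rate, target uplift and weekly traffic are hypothetical.
import math
from statistics import NormalDist

def sample_size_per_experiment(p1: float, p2: float,
                               alpha: float = 0.05,
                               power: float = 0.80) -> int:
    """Visitors needed in EACH experiment to detect a move from
    conversion rate p1 to p2 at the given significance and power."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided test at 95% confidence
    z_beta = z.inv_cdf(power)            # 80% statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

baseline_cr, target_cr = 0.050, 0.055    # hypothetical +10% relative uplift
n = sample_size_per_experiment(baseline_cr, target_cr)

weekly_visitors = 40_000                 # hypothetical total site traffic
for experiments in (2, 6):
    total = n * experiments              # traffic is split across experiments
    weeks = math.ceil(total / weekly_visitors)  # round up to whole weeks
    print(f"{experiments} experiments: {n:,} visitors each, "
          f"{total:,} in total, ~{weeks} weeks")
```

Since the total traffic required grows linearly with the number of experiments running simultaneously, the duration grows roughly linearly too, which is the 3-weeks-versus-9-weeks rule of thumb above.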
8. Why determine success criteria? [Business Team]
- Because agreeing before the results allows you to stay honest.
- Because when a test doesn’t give the expected results, we are tempted to make concessions and lower our requirements, both to avoid the feeling of having failed and out of attachment to a solution we created ourselves. Say I test the introduction of a new product to my portfolio with the objective of increasing volume and booking at the same time: if the result only shows a gain in booking, I will be tempted to accept the solution anyway, so that all this work doesn’t feel useless.
9. Why decide what to do in case of success and failure? [Business Team]
- Because optimization is a long-distance race: you must always have a list of solutions to test and always know the next step.
- Because I have several options after a test: in case of failure, (1) keep the current version; in case of success, (2) implement the tested solution immediately or (3) launch a phase 2 of the test.
- Because sometimes failure isn’t an option: the test meant to validate that the company’s new visual identity has no impact on booking turns out to be a bitter failure. What do we do? Do we accept the financial losses? Do we cancel the new visual identity?
Key Takeaways
- The Business team must be in charge of improving the business. The Optimization team is in charge of execution, advice, and keeping the history of past tests.
- An optimization program is a scientific approach aiming to validate hypotheses regarding improvements. It’s a team sport that requires method.
- A solution that doesn’t address a problem, just like a problem without data, has no place in an optimization program.
* The confidence level (or confidence index) of a test is the probability that the observed results aren’t due to chance but reflect a real impact on user behavior. A confidence level above 95% is generally required for a result to be considered valid.
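For illustration, a minimal sketch of how such a confidence level can be computed from raw counts with a two-sided two-proportion z-test (Python standard library only); the visitor and conversion counts are hypothetical.

```python
# Minimal sketch of the footnote: computing the confidence level of an
# A/B result with a two-sided two-proportion z-test (standard library
# only). Visitor and conversion counts are hypothetical.
from statistics import NormalDist

def confidence_level(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """1 - p-value: the probability that the observed difference
    between A and B is not due to chance."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = abs(p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(z))
    return 1 - p_value

conf = confidence_level(conv_a=500, n_a=10_000, conv_b=585, n_b=10_000)
print(f"confidence level: {conf:.1%}")  # above 95% -> considered valid
```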