Conducting controlled experiments is the best way of determining whether a site or app redesign would lead to improvements on key metrics. One barrier is the amount of time or resources it takes to run experiments. You may have a low traffic site, you may want to detect small differences in key metrics (i.e. fractions of a percent), or you may want to get experiment results faster. Here are some suggestions on how to run experiments more efficiently.
Imagine a company that sells a line of products and services. This company will likely have multiple goals for its website:
- to sell its products and services online
- to collect user information for sales prospects
- to drive brand awareness and loyalty
- to provide online support for existing customers
Let’s say the company has identified 20 KPIs (Key Performance Indicators) that measure the success of these four goals, and it is committed to optimizing the conversion of these goals by running many experiments. Should the company launch the treatment if some KPIs perform better but others perform worse than the control (i.e. the original site)?
To know how your website’s doing, you need to define your Key Performance Indicators. If you’re running a blog (or some sort of content publishing site), and your goal is to increase user engagement, your Key Performance Indicators may include number of visits, pageviews per visit, and visit duration.
I haven’t watched American Idol in years, but I still remember the cringe-worthy auditions of those who claim to be the next Idol yet can’t carry a tune to save their lives. For some, it’s hard to objectively evaluate their own talent when there is so much at stake.
Psychologists have coined the term “motivated reasoning,” a tendency for people to reason in ways that allow them to form or maintain desirable beliefs (e.g. that they can sing). They may readily accept information that supports their beliefs as valid but question information that challenges their beliefs (remember how angry those contestants were at the Idol judges?).
In a similar vein, research should not be conducted by those who have a stake in how the research findings turn out. This may seem obvious, but I’m surprised by how often it still happens.
There’s a recent term called “Google Statisticians.” No, they are not statisticians who work for Google; they are people who do statistical analyses by googling words like “how to do significance testing” or “how to calculate p.”
As biostatistician Jeff Leek pointed out, most analyses are no longer performed by statisticians, as data are now abundant and cheap to collect. Long gone are the days of door-to-door surveys, and phone surveys are almost a thing of the past. Online surveys are everywhere due to platforms like LimeSurvey and the powerful Google Consumer Surveys that make it easy to collect and analyze survey data. Log file data is free and overwhelming in size. There’s even software geared towards non-statisticians that automates statistical analyses.
My main recommendation: include in your analysis only the users impacted by the change, and exclude those who are not.
- Let’s say you have an e-commerce site. You want to test whether certain changes to your checkout page would increase conversion (% of users purchasing).
- You want to run a 2 x 2 Multi-Variable experiment with 1 control and 3 treatment groups.
- Your current conversion is 5%; you want to detect relative changes in conversion as small as 10%, i.e. from 5% to 5.5% (with the conventional 80% probability of detection and a 95% confidence level).
- According to this table in my blog post, you would need 30,400 users in each group, or 30,400 x 4 = 121,600 users in total visiting your site. (That’s a lot!)
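
For a rough sense of where a figure like 30,400 can come from, here is a minimal sketch. The table itself isn’t reproduced here, so this assumes a common rule of thumb (about 16 x variance / difference² users per group for roughly 80% power at a 95% confidence level) and reads the 10% as a relative change, from 5% to 5.5%. Under those assumptions it arrives at the same numbers as the example above. The function name and parameters are made up for illustration.

```python
import math


def users_per_group(baseline_rate, relative_change):
    """Approximate users needed per group (rule-of-thumb sketch).

    Assumes ~80% power and a 95% confidence level, using the
    16 * variance / delta^2 approximation. Not an exact power analysis.
    """
    # Variance of a yes/no outcome (converted or not) at the baseline rate.
    variance = baseline_rate * (1 - baseline_rate)
    # Absolute difference to detect, e.g. 5% * 10% = 0.5 percentage points.
    delta = baseline_rate * relative_change
    n = 16 * variance / delta ** 2
    # Round up; the tiny tolerance keeps floating-point noise from adding a user.
    return math.ceil(n - 1e-9)


# The e-commerce example: 5% baseline conversion, 10% relative change, 4 groups.
per_group = users_per_group(baseline_rate=0.05, relative_change=0.10)
groups = 4
total = per_group * groups
print(per_group, total)
```

Note how quickly the requirement shrinks if you are willing to detect only larger changes: doubling the detectable change cuts the users needed per group to a quarter, because the difference is squared in the denominator.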
To calculate how many people you need in your experiment, you need to know 3 things:
1. How many groups are in your experiment?
- In an A/B experiment with a control and treatment group, you have 2 groups.
- In a 2 x 2 Multi-Variable experiment with 1 control and 3 treatment groups, you have 4 groups.
- The more groups you have, the more people you need.