Four Suggestions for Running Experiments More Efficiently

Conducting controlled experiments is the best way of determining whether a site or app redesign would lead to improvements on key metrics. One barrier is the amount of time and resources it takes to run them. You may have a low-traffic site, you may want to detect small differences in key metrics (e.g. fractions of a percent), or you may simply want results faster. Here are some suggestions on how to run experiments more efficiently.

#1 – Filter out users who are not impacted by the change. For example, if you estimate that only 20% of users will see your redesign, you should include only these users (and exclude the remaining 80%) when testing for significance, which will reduce the number of users (and therefore time) needed for your experiment (see my earlier blog post).
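To see roughly why filtering helps, here is a minimal sketch using the standard normal-approximation sample size formula for a two-sided, two-sample test. The 20% share comes from the example above; the standard deviation and the lift are made-up numbers, and the sketch assumes the KPI has the same variance among impacted users as among all users, which may not hold on your site.

```python
from statistics import NormalDist

def users_per_group(sigma, delta, alpha=0.05, power=0.8):
    """Approximate users per group for a two-sided, two-sample z-test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return 2 * ((z_alpha + z_beta) * sigma / delta) ** 2

impacted_share = 0.20  # share of users who see the redesign (from the example)
sigma = 1.0            # hypothetical standard deviation of the KPI
delta = 0.1            # hypothetical lift among impacted users

# Analysing everyone: the lift is diluted to impacted_share * delta.
n_all = users_per_group(sigma, impacted_share * delta)

# Analysing impacted users only: the full lift is visible,
# but only 20% of incoming traffic qualifies for the analysis.
n_impacted = users_per_group(sigma, delta)
traffic_needed_filtered = n_impacted / impacted_share

print(f"traffic per group, no filtering: {n_all:,.0f}")
print(f"traffic per group, filtered:     {traffic_needed_filtered:,.0f}")
print(f"reduction factor:                {n_all / traffic_needed_filtered:.1f}x")
```

Under these assumptions the traffic needed shrinks by a factor of 1 / impacted_share, because the required sample size grows with the inverse square of the effect you are trying to detect.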

#2 – Pick binary instead of continuous Key Performance Indicators (KPIs). Binary KPIs are ones that can only take one of two values. Conversion is an example: either the user converted (e.g. made a purchase on your e-commerce site) or did not convert during the experiment period. Quite often, binary KPIs have a lower variance than continuous KPIs, which makes it easier to detect differences between the control and the treatment.

The variance of a binary KPI can never exceed 0.25. This follows from the variance formula p(1 – p), in which p is the proportion of users who converted. The variance is exactly 0.25 when 50% of your users convert; any conversion rate higher or lower than 50% gives a variance below 0.25. On the other hand, the variance of continuous KPIs (e.g. revenue generated per user) will often be much, much higher than 0.25. The higher the variance, the bigger the sample size needed.
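A small sketch of the p(1 – p) formula, followed by a comparison on a made-up set of ten users. The revenue figures are invented purely for illustration.

```python
from statistics import pvariance

def binary_variance(p):
    """Variance of a 0/1 KPI when a proportion p of users convert."""
    return p * (1 - p)

# Variance peaks at a 50% conversion rate and shrinks on either side.
rates = [0.05, 0.25, 0.50, 0.75, 0.95]
for p in rates:
    print(f"conversion rate {p:.2f} -> variance {binary_variance(p):.4f}")

# Hypothetical revenue per user for ten users (made-up numbers).
revenue = [0, 0, 0, 0, 0, 0, 0, 0, 25, 120]
# The binary version of the same data: did the user buy anything?
converted = [1 if r > 0 else 0 for r in revenue]

conversion_rate = sum(converted) / len(converted)
conversion_variance = binary_variance(conversion_rate)
revenue_variance = pvariance(revenue)

print(f"variance of conversion (binary):   {conversion_variance:.2f}")
print(f"variance of revenue (continuous):  {revenue_variance:.2f}")
```

Keep in mind that the two variances are on different scales. What drives sample size is the variance relative to the square of the difference you want to detect, so the raw numbers are only indicative.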

#3 – Run A/B experiments instead of Multivariate experiments. If you have a low-traffic site (e.g. fewer than 30,000 visitors a month), forget about Multivariate testing. Even the simplest form of Multivariate experiment requires 4 groups of users (1 control, 3 treatments), and without enough traffic, you won’t be able to detect differences. Run a simple A/B experiment instead, and make sure that the redesign in the treatment is quite different from the control to maximize the chances of detecting differences between the two variants.
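As a rough illustration, the sketch below compares the smallest difference you could expect to detect when a month of traffic is split into 2 groups versus 4. It uses the usual normal approximation, assumes a hypothetical 5% baseline conversion rate, and ignores any correction for comparing several treatments at once (which would make the 4-group case look even worse).

```python
from math import sqrt
from statistics import NormalDist

def minimum_detectable_difference(sigma, n_per_group, alpha=0.05, power=0.8):
    """Smallest absolute difference a two-sided z-test can detect at the given power."""
    z = NormalDist()
    z_total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return z_total * sigma * sqrt(2 / n_per_group)

monthly_visitors = 30_000            # traffic figure from the example above
baseline_rate = 0.05                 # hypothetical conversion rate
sigma = sqrt(baseline_rate * (1 - baseline_rate))

# A/B test: traffic split across 2 groups.
mde_ab = minimum_detectable_difference(sigma, monthly_visitors / 2)
# Simplest multivariate test: traffic split across 4 groups.
mde_mvt = minimum_detectable_difference(sigma, monthly_visitors / 4)

print(f"A/B (2 groups):          {mde_ab:.4f}")
print(f"Multivariate (4 groups): {mde_mvt:.4f}")
print(f"ratio:                   {mde_mvt / mde_ab:.2f}")
```

Halving the users per group raises the minimum detectable difference by a factor of the square root of 2, so each treatment in the 4-group design has to move the KPI noticeably more before you can see it.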

#4 – Use Pre-Experiment Data to control for user differences. Let’s say you’re interested in comparing the control vs. the treatment on a continuous KPI – for example, visit frequency or visit duration per user. How users score on these KPIs will depend not just on whether they are in the control or the treatment (assuming that the treatment had an impact), but also on users themselves. Some users will visit your site more often or spend more time on your site than others for various reasons; these user differences make it harder to detect differences between your treatment and your control.

You can reduce the influence of user differences by using data on these users from before they were assigned to the control or treatment conditions (i.e. pre-experiment data) as a covariate in your analysis. For instance, you can take each user's visit frequency over the 2 weeks before your experiment as a measure of user differences in visit frequency, and control for it when comparing the visit frequency of the control vs. the treatment. According to the Bing Data Mining team at Microsoft, controlling for pre-experiment data as covariates can reduce the number of users needed for an experiment by 50%. (For more information, look up ANCOVA or Analysis of Covariance.)
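Here is a minimal sketch of the idea on simulated data. It uses a simple linear covariate adjustment (subtracting the part of each user's KPI that is predicted by their pre-experiment value), which is in the spirit of ANCOVA but is not necessarily the exact procedure the Bing team used. All numbers, including the size of the lift, are invented for the simulation.

```python
import random
from statistics import mean, pvariance

rng = random.Random(42)

# Simulated users (entirely hypothetical numbers).
n = 2000
pre = [rng.gauss(10, 3) for _ in range(n)]   # visits in the 2 weeks before
group = [i % 2 for i in range(n)]            # 0 = control, 1 = treatment
true_lift = 0.3
# Visits during the experiment: user habit + noise + treatment effect.
post = [x + rng.gauss(0, 1.5) + true_lift * g for x, g in zip(pre, group)]

def covariance(xs, ys):
    """Population covariance of two equally long lists."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Slope of the KPI on the pre-experiment covariate.
theta = covariance(pre, post) / pvariance(pre)
pre_mean = mean(pre)
# Remove the part of each user's KPI explained by their pre-experiment behaviour.
adjusted = [y - theta * (x - pre_mean) for x, y in zip(pre, post)]

def lift(values):
    """Difference in means, treatment minus control."""
    treat = [v for v, g in zip(values, group) if g == 1]
    ctrl = [v for v, g in zip(values, group) if g == 0]
    return mean(treat) - mean(ctrl)

variance_ratio = pvariance(adjusted) / pvariance(post)
print(f"estimated lift, raw KPI:      {lift(post):.3f}")
print(f"estimated lift, adjusted KPI: {lift(adjusted):.3f}")
print(f"variance remaining after adjustment: {variance_ratio:.0%}")
```

The stronger the correlation between pre-experiment and in-experiment behaviour, the more variance the adjustment removes. The simulation above builds in a strong correlation, so the reduction you would see on real data could be larger or smaller.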